Lisp machine

Lisp machine

Lisp machines are general-purpose computers designed to efficiently run Lisp as their main software and programming language, usually via hardware support. They are an example of a high-level language computer architecture. In a sense, they were the first commercial single-user workstations. Despite being modest in number (perhaps 7,000 units total as of 1988) Lisp machines commercially pioneered some now-commonplace technologies, including networking innovations such as Chaosnet, and effective garbage collection. Several firms built and sold Lisp machines in the 1980s: Symbolics (3600, 3640, XL1200, MacIvory, and other models), Lisp Machines Incorporated (LMI Lambda), Texas Instruments (Explorer, MicroExplorer), and Xerox (Interlisp-D workstations). The operating systems were written in Lisp Machine Lisp, Interlisp (Xerox), and later partly in Common Lisp. == History == === Historical context === Artificial intelligence (AI) computer programs of the 1960s and 1970s intrinsically required what was then considered a huge amount of computer power, as measured in processor time and memory space. The power requirements of AI research were exacerbated by the Lisp symbolic programming language, when commercial hardware was designed and optimized for assembly- and Fortran-like programming languages. At first, the cost of such computer hardware meant that it had to be shared among many users. As integrated circuit technology shrank the size and cost of computers in the 1960s and early 1970s, and the memory needs of AI programs began to exceed the address space of the most common research computer, the Digital Equipment Corporation (DEC) PDP-10, researchers considered a new approach: a computer designed specifically to develop and run large artificial intelligence programs, and tailored to the semantics of the Lisp language. To provide consistent performance for interactive programs, these machines would often not be shared, but would be dedicated to a single user at a time. === Initial development === In 1973, Richard Greenblatt and Thomas Knight, programmers at Massachusetts Institute of Technology (MIT) Artificial Intelligence Laboratory (AI Lab), began what would become the MIT Lisp Machine Project when they first began building a computer hardwired to run certain basic Lisp operations, rather than run them in software, in a 24-bit tagged architecture. The machine also did incremental (or Arena) garbage collection. More specifically, since Lisp variables are typed at runtime rather than compile time, a simple addition of two variables could take five times as long on conventional hardware, due to test and branch instructions. Lisp Machines ran the tests in parallel with the more conventional single instruction additions. If the simultaneous tests failed, then the result was discarded and recomputed; this meant in many cases a speed increase by several factors. This simultaneous checking approach was used as well in testing the bounds of arrays when referenced, and other memory management necessities (not merely garbage collection or arrays). Type checking was further improved and automated when the conventional byte word of 32 bits was lengthened to 36 bits for Symbolics 3600-model Lisp machines and eventually to 40 bits or more (usually, the excess bits not accounted for by the following were used for error-correcting codes). The first group of extra bits were used to hold type data, making the machine a tagged architecture, and the remaining bits were used to implement compressed data representation (CDR) coding (wherein the usual linked list elements are compressed to occupy roughly half the space), aiding garbage collection by reportedly an order of magnitude. A further improvement was two microcode instructions which specifically supported Lisp functions, reducing the cost of calling a function to as little as 20 clock cycles, in some Symbolics implementations. The first machine was called the CONS machine (named after the list construction operator cons in Lisp). Often it was affectionately referred to as the Knight machine, perhaps since Knight wrote his master's thesis on the subject; it was extremely well received. It was subsequently improved into a version called CADR (a pun; in Lisp, the cadr function, which returns the second item of a list, is pronounced /ˈkeɪ.dəɹ/ or /ˈkɑ.dəɹ/, as some pronounce the word "cadre") which was based on essentially the same architecture. About 25 of what were essentially prototype CADRs were sold within and without MIT for ~$50,000; it quickly became the favorite machine for hacking – many of the most favored software tools were quickly ported to it (e.g. Emacs was ported from ITS in 1975). It was so well received at an AI conference held at MIT in 1978 that Defense Advanced Research Projects Agency (DARPA) began funding its development. === Commercializing MIT Lisp machine technology === In 1979, Russell Noftsker, being convinced that Lisp machines had a bright commercial future due to the strength of the Lisp language and the enabling factor of hardware acceleration, proposed to Greenblatt that they commercialize the technology. In a counter-intuitive move for an AI Lab hacker, Greenblatt acquiesced, hoping perhaps that he could recreate the informal and productive atmosphere of the Lab in a real business. These ideas and goals were considerably different from those of Noftsker. The two negotiated at length, but neither would compromise. As the proposed firm could succeed only with the full and undivided assistance of the AI Lab hackers as a group, Noftsker and Greenblatt decided that the fate of the enterprise was up to them, and so the choice should be left to the hackers. The ensuing discussions of the choice divided the lab into two factions. In February 1979, matters came to a head. The hackers sided with Noftsker, believing that a commercial venture-fund-backed firm had a better chance of surviving and commercializing Lisp machines than Greenblatt's proposed self-sustaining start-up. Greenblatt lost the battle. It was at this juncture that Symbolics, Noftsker's enterprise, slowly came together. While Noftsker was paying his staff a salary, he had no building or any equipment for the hackers to work on. He bargained with Patrick Winston that, in exchange for allowing Symbolics' staff to keep working out of MIT, Symbolics would let MIT use internally and freely all the software Symbolics developed. A consultant from CDC, who was trying to put together a natural language computer application with a group of West-coast programmers, came to Greenblatt, seeking a Lisp machine for his group to work with, about eight months after the disastrous conference with Noftsker. Greenblatt had decided to start his own rival Lisp machine firm, but he had done nothing. The consultant, Alexander Jacobson, decided that the only way Greenblatt was going to start the firm and build the Lisp machines that Jacobson desperately needed was if Jacobson pushed and otherwise helped Greenblatt launch the firm. Jacobson pulled together business plans, a board, a partner for Greenblatt (one F. Stephen Wyle). The newfound firm was named LISP Machine, Inc. (LMI), and was funded by CDC orders, via Jacobson. Around this time Symbolics (Noftsker's firm) began operating. It had been hindered by Noftsker's promise to give Greenblatt a year's head start, and by severe delays in procuring venture capital. Symbolics still had the major advantage that while 3 or 4 of the AI Lab hackers had gone to work for Greenblatt, 14 other hackers had signed onto Symbolics. Two AI Lab people were not hired by either: Richard Stallman and Marvin Minsky. Stallman, however, blamed Symbolics for the decline of the hacker community that had centered around the AI lab. For two years, from 1982 to the end of 1983, Stallman worked by himself to clone the output of the Symbolics programmers, with the aim of preventing them from gaining a monopoly on the lab's computers. Regardless, after a series of internal battles, Symbolics did get off the ground in 1980/1981, selling the CADR as the LM-2, while Lisp Machines, Inc. sold it as the LMI-CADR. Symbolics did not intend to produce many LM-2s, since the 3600 family of Lisp machines was supposed to ship quickly, but the 3600s were repeatedly delayed, and Symbolics ended up producing ~100 LM-2s, each of which sold for $70,000. Both firms developed second-generation products based on the CADR: the Symbolics 3600 and the LMI-LAMBDA (of which LMI managed to sell ~200). The 3600, which shipped a year late, expanded on the CADR by widening the machine word to 36-bits, expanding the address space to 28-bits, and adding hardware to accelerate certain common functions that were implemented in microcode on the CADR. The LMI-LAMBDA, which came out a year after the 3600, in 1983, was compatible with the CADR (it could run CADR microcode), but hardware differences existed. Texas Instruments (TI) joined the fray whe

Advanced automation functions

In automation production technology the actions performed by an automated process are executed by a program of instructions which is run during a work cycle. To execute work cycle programs, an automated system should be available to execute these advanced functions. == Safety monitoring == If there is a need for workers in an automated system, a safety monitoring is required for the occupational safety and health of the workers. In a safety monitoring various steps can take place including a complete stop of the system, sounding an alarm or reducing the operating speed. Usually, limiting switches are sensors like temperature probes, heat and smoke detectors or pressure sensitive floor pads. == Maintenance and repair diagnostics == There are three modes of operations which are used in a cycle of maintenance and repair diagnostics: status monitoring, failure diagnostics and recommendation of the repair procedure. In the status monitoring mode, the current system status is displayed. The failure diagnostics mode takes place when a failure occurs. The system will then suggest an adequate repair procedure to a team of experts. == Error detection and recovery == The error detection mode is a step to determine if and when a failure occurs in automated system. The possible errors can be divided into three categories. random errors, systematic errors and aberrations. While in the error recovery mode, remedy actions take place for all detected errors.

AI-assisted reverse engineering

AI-assisted reverse engineering (AIARE) is a branch of computer science that leverages artificial intelligence (AI), notably machine learning (ML) strategies, to augment and automate the process of reverse engineering. The latter involves breaking down a product, system, or process to comprehend its structure, design, and functionality. AIARE was primarily introduced in the early years of the 21st century, witnessing substantial advancements from the mid-2010s onwards. == Overview == Conventionally, reverse engineering is conducted by specialists who dismantle a system to grasp its working principles, often for the purposes of reproduction, modification, enhancement of compatibility, or forensic examination. This method, while efficient, can be laborious and time-intensive, particularly when dealing with intricate software or hardware systems. AIARE integrates machine learning algorithms to either partially automate or augment this process. It is capable of detecting patterns, relationships, structures, and potential vulnerabilities within the analyzed system, frequently surpassing human experts in speed and accuracy. This has rendered AIARE a critical tool in numerous fields, including cybersecurity, software development, and hardware design and analysis. == Techniques == AIARE encompasses several AI methodologies: === Supervised learning === Supervised learning employs tagged data to train models to recognize system components, their operations, and their interconnections. This method is particularly helpful in software analysis to discover vulnerabilities or enhance compatibility. === Unsupervised learning === Unsupervised learning is utilized to detect concealed patterns and structures in untagged data. It proves beneficial in comprehending complex systems where there's no evident labeling or mapping of components. === Reinforcement learning === Reinforcement learning is employed to build models that progressively refine their system understanding through a process of trial and error. This method is often implemented when deciphering a system's functionality under various circumstances or configurations. === Deep learning === Deep learning is employed for analysis of high-dimensional data. For instance, deep learning techniques can aid in examining the layout and connections of integrated circuits (ICs), substantially reducing the manual effort required for reverse engineering. == Benefits == === Usable Security === AIARE expands usable security as reverse engineering is traditionally slow and highly specialized as it produces dense, low-level information (usually in Assembly or C) when using tools like Ghidra. The use of multiple different methods to interface with models today (such as through chat bots like ChatGPT) greatly reduces the barrier to entry by providing a clear way to interact with the user and even providing meaningful decompiled source code. In addition, either done automatically or through prompt engineering, a model is capable of producing a high-level summary and explanation of its reverse engineering efforts in human-readable form that doesn't require much knowledge on code. === Speedup === AIARE is capable of processing data much faster than humans, providing a boost in speed when analyzing said data. In the context of computer security, this can greatly speed up incident management or response and malware detection as AIARE can be automated to drastically reduce the manual effort usually associated with reverse engineering. == Limitations == In an effort to improve readability for reverse engineering, AI-generated code may introduce erroneous bugs not present in the source. This compromises the correctness of the code if not carefully validated and will throw off reverse engineering efforts. Additionally, AIARE's weakness in zero-shot prompting makes gathering accurate data without reference data in the prompt more inconsistent, thus requiring a user to provide some quality data of their own that hurts its usability.

Weak stability boundary

Weak stability boundary (WSB), including low-energy transfer, is a concept introduced by Edward Belbruno in 1987. The concept explained how a spacecraft could change orbits using very little fuel. Weak stability boundary is defined for the three-body problem. This problem considers the motion of a particle P of negligible mass moving with respect to two larger bodies, P1, P2, modeled as point masses, where these bodies move in circular or elliptical orbits with respect to each other, and P2 is smaller than P1. The force between the three bodies is the classical Newtonian gravitational force. For example, P1 is the Earth, P2 is the Moon and P is a spacecraft; or P1 is the Sun, P2 is Jupiter and P is a comet, etc. This model is called the restricted three-body problem. The weak stability boundary defines a region about P2 where P is temporarily captured. This region is in position-velocity space. Capture means that the Kepler energy between P and P2 is negative. This is also called weak capture. == Background == This boundary was defined for the first time by Edward Belbruno of Princeton University in 1987. He described a Low-energy transfer which would allow a spacecraft to change orbits using very little fuel. It was for motion about Moon (P2) with P1 = Earth. It is defined algorithmically by monitoring cycling motion of P about the Moon and finding the region where cycling motion transitions between stable and unstable after one cycle. Stable motion means P can completely cycle about the Moon for one cycle relative to a reference section, starting in weak capture. P needs to return to the reference section with negative Kepler energy. Otherwise, the motion is called unstable, where P does not return to the reference section within one cycle or if it returns, it has non-negative Kepler energy. The set of all transition points about the Moon comprises the weak stability boundary, W. The motion of P is sensitive or chaotic as it moves about the Moon within W. A mathematical proof that the motion within W is chaotic was given in 2004. This is accomplished by showing that the set W about an arbitrary body P2 in the restricted three-body problem contains a hyperbolic invariant set of fractional dimension consisting of the infinitely many intersections Hyperbolic manifolds. The weak stability boundary was originally referred to as the fuzzy boundary. This term was used since the transition between capture and escape defined in the algorithm is not well defined and limited by the numerical accuracy. This defines a "fuzzy" location for the transition points. It is also due the inherent chaos in the motion of P near the transition points. It can be thought of as a fuzzy chaos region. As is described in an article in Discover magazine, the WSB can be roughly viewed as the fuzzy edge of a region, referred to as a gravity well, about a body (the Moon), where its force of gravity becomes small enough to be dominated by force of gravity of another body (the Earth) and the motion there is chaotic. A much more general algorithm defining W was given in 2007. It defines W relative to n-cycles, where n = 1,2,3,..., yielding boundaries of order n. This gives a much more complex region consisting of the union of all the weak stability boundaries of order n. This definition was explored further in 2010. The results suggested that W consists, in part, of the hyperbolic network of invariant manifolds associated to the Lyapunov orbits about the L1, L2 Lagrange points near P2. The explicit determination of the set W about P2 = Jupiter, where P1 is the Sun, is described in "Computation of Weak Stability Boundaries: Sun-Jupiter Case". It turns out that a weak stability region can also be defined relative to the larger mass point, P1. A proof of the existence of the weak stability boundary about P1 was given in 2012, but a different definition is used. The chaos of the motion is analytically proven in "Geometry of Weak Stability Boundaries". The boundary is studied in "Applicability and Dynamical Characterization of the Associated Sets of the Algorithmic Weak Stability Boundary in the Lunar Sphere of Influence". == Applications == There are a number of important applications for the weak stability boundary (WSB). Since the WSB defines a region of temporary capture, it can be used, for example, to find transfer trajectories from the Earth to the Moon that arrive at the Moon within the WSB region in weak capture, which is called ballistic capture for a spacecraft. No fuel is required for capture in this case. This was numerically demonstrated in 1987. This is the first reference for ballistic capture for spacecraft and definition of the weak stability boundary. The boundary was operationally demonstrated to exist in 1991 when it was used to find a ballistic capture transfer to the Moon for Japan's Hiten spacecraft. Other missions have used the same transfer type as Hiten, including Grail, Capstone, Danuri, Hakuto-R Mission 1 and SLIM. The WSB for Mars is studied in "Earth-Mars Transfers with Ballistic Capture" and ballistic capture transfers to Mars are computed. The BepiColombo mission of ESA should achieve ballistic capture at the WSB of Mercury in November 2026. The WSB region can be used in the field of Astrophysics. It can be defined for stars within open star clusters. This is done in "Chaotic Exchange of Solid Material Between Planetary Systems: Implications for the Lithopanspermia Hypothesis" to analyze the capture of solid material that may have arrived on the Earth early in the age of the Solar System to study the validity of the lithopanspermia hypothesis. Numerical explorations of trajectories for P starting in the WSB region about P2 show that after the particle P escapes P2 at the end of weak capture, it moves about the primary body, P1, in a near resonant orbit, in resonance with P2 about P1. This property was used to study comets that move in orbits about the Sun in orbital resonance with Jupiter, which change resonance orbits by becoming weakly captured by Jupiter. An example of such a comet is 39P/Oterma. This property of change of resonance of orbits about P1 when P is weakly captured by the WSB of P2 has an interesting application to the field of quantum mechanics to the motion of an electron about the proton in a hydrogen atom. The transition motion of an electron about the proton between different energy states described by the Schrödinger equation is shown to be equivalent to the change of resonance of P about P1 via weak capture by P2 for a family of transitioning resonance orbits. This gives a classical model using chaotic dynamics with Newtonian gravity for the motion of an electron.

Algorithm engineering

Algorithm engineering focuses on the design, analysis, implementation, optimization, profiling and experimental evaluation of computer algorithms, bridging the gap between algorithmics theory and practical applications of algorithms in software engineering. It is a general methodology for algorithmic research. == Origins == In 1995, a report from an NSF-sponsored workshop "with the purpose of assessing the current goals and directions of the Theory of Computing (TOC) community" identified the slow speed of adoption of theoretical insights by practitioners as an important issue and suggested measures to reduce the uncertainty by practitioners whether a certain theoretical breakthrough will translate into practical gains in their field of work, and tackle the lack of ready-to-use algorithm libraries, which provide stable, bug-free and well-tested implementations for algorithmic problems and expose an easy-to-use interface for library consumers. But also, promising algorithmic approaches have been neglected due to difficulties in mathematical analysis. The term "algorithm engineering" was first used with specificity in 1997, with the first Workshop on Algorithm Engineering (WAE97), organized by Giuseppe F. Italiano. == Difference from algorithm theory == Algorithm engineering does not intend to replace or compete with algorithm theory, but tries to enrich, refine and reinforce its formal approaches with experimental algorithmics (also called empirical algorithmics). This way it can provide new insights into the efficiency and performance of algorithms in cases where the algorithm at hand is less amenable to algorithm theoretic analysis, formal analysis pessimistically suggests bounds which are unlikely to appear on inputs of practical interest, the algorithm relies on the intricacies of modern hardware architectures like data locality, branch prediction, instruction stalls, instruction latencies which the machine model used in Algorithm Theory is unable to capture in the required detail, the crossover between competing algorithms with different constant costs and asymptotic behaviors needs to be determined. == Methodology == Some researchers describe algorithm engineering's methodology as a cycle consisting of algorithm design, analysis, implementation and experimental evaluation, joined by further aspects like machine models or realistic inputs. They argue that equating algorithm engineering with experimental algorithmics is too limited, because viewing design and analysis, implementation and experimentation as separate activities ignores the crucial feedback loop between those elements of algorithm engineering. === Realistic models and real inputs === While specific applications are outside the methodology of algorithm engineering, they play an important role in shaping realistic models of the problem and the underlying machine, and supply real inputs and other design parameters for experiments. === Design === Compared to algorithm theory, which usually focuses on the asymptotic behavior of algorithms, algorithm engineers need to keep further requirements in mind: Simplicity of the algorithm, implementability in programming languages on real hardware, and allowing code reuse. Additionally, constant factors of algorithms have such a considerable impact on real-world inputs that sometimes an algorithm with worse asymptotic behavior performs better in practice due to lower constant factors. === Analysis === Some problems can be solved with heuristics and randomized algorithms in a simpler and more efficient fashion than with deterministic algorithms. Unfortunately, this makes even simple randomized algorithms difficult to analyze because there are subtle dependencies to be taken into account. === Implementation === Huge semantic gaps between theoretical insights, formulated algorithms, programming languages and hardware pose a challenge to efficient implementations of even simple algorithms, because small implementation details can have rippling effects on execution behavior. The only reliable way to compare several implementations of an algorithm is to spend an considerable amount of time on tuning and profiling, running those algorithms on multiple architectures, and looking at the generated machine code. === Experiments === See: Experimental algorithmics === Application engineering === Implementations of algorithms used for experiments differ in significant ways from code usable in applications. While the former prioritizes fast prototyping, performance and instrumentation for measurements during experiments, the latter requires thorough testing, maintainability, simplicity, and tuning for particular classes of inputs. === Algorithm libraries === Stable, well-tested algorithm libraries like LEDA play an important role in technology transfer by speeding up the adoption of new algorithms in applications. Such libraries reduce the required investment and risk for practitioners, because it removes the burden of understanding and implementing the results of academic research. == Conferences == Two main conferences on Algorithm Engineering are organized annually, namely: Symposium on Experimental Algorithms (SEA), established in 1997 (formerly known as WEA). SIAM Meeting on Algorithm Engineering and Experiments (ALENEX), established in 1999. The 1997 Workshop on Algorithm Engineering (WAE'97) was held in Venice (Italy) on September 11–13, 1997. The Third International Workshop on Algorithm Engineering (WAE'99) was held in London, UK in July 1999. The first Workshop on Algorithm Engineering and Experimentation (ALENEX99) was held in Baltimore, Maryland on January 15–16, 1999. It was sponsored by DIMACS, the Center for Discrete Mathematics and Theoretical Computer Science (at Rutgers University), with additional support from SIGACT, the ACM Special Interest Group on Algorithms and Computation Theory, and SIAM, the Society for Industrial and Applied Mathematics.

Eigenface

An eigenface ( EYE-gən-) is the name given to a set of eigenvectors when used in the computer vision problem of human face recognition. The approach of using eigenfaces for recognition was developed by Sirovich and Kirby and used by Matthew Turk and Alex Pentland in face classification. The eigenvectors are derived from the covariance matrix of the probability distribution over the high-dimensional vector space of face images. The eigenfaces themselves form a basis set of all images used to construct the covariance matrix. This produces dimension reduction by allowing the smaller set of basis images to represent the original training images. Classification can be achieved by comparing how faces are represented by the basis set. == History == The eigenface approach began with a search for a low-dimensional representation of face images. Sirovich and Kirby showed that principal component analysis could be used on a collection of face images to form a set of basis features. These basis images, known as eigenpictures, could be linearly combined to reconstruct images in the original training set. If the training set consists of M images, principal component analysis could form a basis set of N images, where N < M. The reconstruction error is reduced by increasing the number of eigenpictures; however, the number needed is always chosen less than M. For example, if you need to generate a number of N eigenfaces for a training set of M face images, you can say that each face image can be made up of "proportions" of all the K "features" or eigenfaces: Face image1 = (23% of E1) + (2% of E2) + (51% of E3) + ... + (1% En). In 1991 M. Turk and A. Pentland expanded these results and presented the eigenface method of face recognition. In addition to designing a system for automated face recognition using eigenfaces, they showed a way of calculating the eigenvectors of a covariance matrix such that computers of the time could perform eigen-decomposition on a large number of face images. Face images usually occupy a high-dimensional space and conventional principal component analysis was intractable on such data sets. Turk and Pentland's paper demonstrated ways to extract the eigenvectors based on matrices sized by the number of images rather than the number of pixels. Once established, the eigenface method was expanded to include methods of preprocessing to improve accuracy. Multiple manifold approaches were also used to build sets of eigenfaces for different subjects and different features, such as the eyes. == Generation == A set of eigenfaces can be generated by performing a mathematical process called principal component analysis (PCA) on a large set of images depicting different human faces. Informally, eigenfaces can be considered a set of "standardized face ingredients", derived from statistical analysis of many pictures of faces. Any human face can be considered to be a combination of these standard faces. For example, one's face might be composed of the average face plus 10% from eigenface 1, 55% from eigenface 2, and even −3% from eigenface 3. Remarkably, it does not take many eigenfaces combined together to achieve a fair approximation of most faces. Also, because a person's face is not recorded by a digital photograph, but instead as just a list of values (one value for each eigenface in the database used), much less space is taken for each person's face. The eigenfaces that are created will appear as light and dark areas that are arranged in a specific pattern. This pattern is how different features of a face are singled out to be evaluated and scored. There will be a pattern to evaluate symmetry, whether there is any style of facial hair, where the hairline is, or an evaluation of the size of the nose or mouth. Other eigenfaces have patterns that are less simple to identify, and the image of the eigenface may look very little like a face. The technique used in creating eigenfaces and using them for recognition is also used outside of face recognition: handwriting recognition, lip reading, voice recognition, sign language/hand gestures interpretation and medical imaging analysis. Therefore, some do not use the term eigenface, but prefer to use 'eigenimage'. === Practical implementation === To create a set of eigenfaces, one must: Prepare a training set of face images. The pictures constituting the training set should have been taken under the same lighting conditions, and must be normalized to have the eyes and mouths aligned across all images. They must also be all resampled to a common pixel resolution (r × c). Each image is treated as one vector, simply by concatenating the rows of pixels in the original image, resulting in a single column with r × c elements. For this implementation, it is assumed that all images of the training set are stored in a single matrix T, where each column of the matrix is an image. Subtract the mean. The average image a has to be calculated and then subtracted from each original image in T. Calculate the eigenvectors and eigenvalues of the covariance matrix S. Each eigenvector has the same dimensionality (number of components) as the original images, and thus can itself be seen as an image. The eigenvectors of this covariance matrix are therefore called eigenfaces. They are the directions in which the images differ from the mean image. Usually this will be a computationally expensive step (if at all possible), but the practical applicability of eigenfaces stems from the possibility to compute the eigenvectors of S efficiently, without ever computing S explicitly, as detailed below. Choose the principal components. Sort the eigenvalues in descending order and arrange eigenvectors accordingly. The number of principal components k is determined arbitrarily by setting a threshold ε on the total variance. Total variance ⁠ v = ( λ 1 + λ 2 + . . . + λ n ) {\displaystyle v=(\lambda _{1}+\lambda _{2}+...+\lambda _{n})} ⁠, n = number of components, and λ {\displaystyle \lambda } represents component eigenvalue. k is the smallest number that satisfies ( λ 1 + λ 2 + . . . + λ k ) v > ϵ {\displaystyle {\frac {(\lambda _{1}+\lambda _{2}+...+\lambda _{k})}{v}}>\epsilon } These eigenfaces can now be used to represent both existing and new faces: we can project a new (mean-subtracted) image on the eigenfaces and thereby record how that new face differs from the mean face. The eigenvalues associated with each eigenface represent how much the images in the training set vary from the mean image in that direction. Information is lost by projecting the image on a subset of the eigenvectors, but losses are minimized by keeping those eigenfaces with the largest eigenvalues. For instance, working with a 100 × 100 image will produce 10,000 eigenvectors. In practical applications, most faces can typically be identified using a projection on between 100 and 150 eigenfaces, so that most of the 10,000 eigenvectors can be discarded. === Matlab example code === Here is an example of calculating eigenfaces with Extended Yale Face Database B. To evade computational and storage bottleneck, the face images are sampled down by a factor 4×4=16. Note that although the covariance matrix S generates many eigenfaces, only a fraction of those are needed to represent the majority of the faces. For example, to represent 95% of the total variation of all face images, only the first 43 eigenfaces are needed. To calculate this result, implement the following code: === Computing the eigenvectors === Performing PCA directly on the covariance matrix of the images is often computationally infeasible. If small images are used, say 100 × 100 pixels, each image is a point in a 10,000-dimensional space and the covariance matrix S is a matrix of 10,000 × 10,000 = 108 elements. However the rank of the covariance matrix is limited by the number of training examples: if there are N training examples, there will be at most N − 1 eigenvectors with non-zero eigenvalues. If the number of training examples is smaller than the dimensionality of the images, the principal components can be computed more easily as follows. Let T be the matrix of preprocessed training examples, where each column contains one mean-subtracted image. The covariance matrix can then be computed as S = TTT and the eigenvector decomposition of S is given by S v i = T T T v i = λ i v i {\displaystyle \mathbf {Sv} _{i}=\mathbf {T} \mathbf {T} ^{T}\mathbf {v} _{i}=\lambda _{i}\mathbf {v} _{i}} However TTT is a large matrix, and if instead we take the eigenvalue decomposition of T T T u i = λ i u i {\displaystyle \mathbf {T} ^{T}\mathbf {T} \mathbf {u} _{i}=\lambda _{i}\mathbf {u} _{i}} then we notice that by pre-multiplying both sides of the equation with T, we obtain T T T T u i = λ i T u i {\displaystyle \mathbf {T} \mathbf {T} ^{T}\mathbf {T} \mathbf {u} _{i}=\lambda _{i}\mathbf {T} \mathbf {u} _{i}} Meaning that, if ui is an eigenvector of TTT, then vi = Tui is an eigenvector of S. If we have

System of record

A system of record (SOR) or source system of record (SSoR) is a data management term for an information storage system (commonly implemented on a computer system running a database management system) that is the authoritative data source for a given data element or piece of information, like for example a row (or record) in a table. In data vault it is referred to as the record source. == Background == The need to identify systems of record can become acute in organizations where management information systems have been built by taking output data from multiple source systems, re-processing this data, and then re-presenting the result for a new business use. In these cases, multiple information systems may disagree about the same piece of information. These disagreements may stem from semantic differences, differences in opinion, use of different sources, differences in the timing of the extract, transform, load processes that create the data they report against, or may simply be the result of bugs. == Use == The integrity and validity of any data set is open to question when there is no traceable connection to a good source, and listing a source system of record is a solution to this. Where the integrity of the data is vital, if there is an agreed system of record, the data element must either be linked to, or extracted directly from it. In other cases, the provenance and estimated data quality should be documented. The "system of record" approach is a good fit for environments where both: there is a single authority over all data consumers, and all consumers have similar needs == Trade-offs == In diverse environments, one instead needs to support the presence of multiple opinions. Consumers may accept different authorities or may differ on what constitutes an authoritative source—researchers may prefer carefully vetted data, while tactical military systems may require the most recent credible report.