AI Chat Vumc

AI Chat Vumc — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Biohybrid microswimmer

    Biohybrid microswimmer

    A biohybrid microswimmer also known as biohybrid nanorobot, can be defined as a microswimmer that consist of both biological and artificial constituents, for instance, one or several living microorganisms attached to one or various synthetic parts. In recent years nanoscopic and mesoscopic objects have been designed to collectively move through direct inspiration from nature or by harnessing its existing tools. Small mesoscopic to nanoscopic systems typically operate at low Reynolds numbers (Re ≪ 1), and understanding their motion becomes challenging. For locomotion to occur, the symmetry of the system must be broken. In addition, collective motion requires a coupling mechanism between the entities that make up the collective. To develop mesoscopic to nanoscopic entities capable of swarming behaviour, it has been hypothesised that the entities are characterised by broken symmetry with a well-defined morphology, and are powered with some material capable of harvesting energy. If the harvested energy results in a field surrounding the object, then this field can couple with the field of a neighbouring object and bring some coordination to the collective behaviour. Such robotic swarms have been categorised by an online expert panel as among the 10 great unresolved group challenges in the area of robotics. Although investigation of their underlying mechanism of action is still in its infancy, various systems have been developed that are capable of undergoing controlled and uncontrolled swarming motion by harvesting energy (e.g., light, thermal, etc.). Over the past decade, biohybrid microrobots, in which living mobile microorganisms are physically integrated with untethered artificial structures, have gained growing interest to enable the active locomotion and cargo delivery to a target destination. In addition to the motility, the intrinsic capabilities of sensing and eliciting an appropriate response to artificial and environmental changes make cell-based biohybrid microrobots appealing for transportation of cargo to the inaccessible cavities of the human body for local active delivery of diagnostic and therapeutic agents. == Background == Biohybrid microswimmers can be defined as microswimmers that consist of both biological and artificial constituents, for instance, one or several living microorganisms attached to one or various synthetic parts. The pioneers of this field, ahead of their time, were Montemagno and Bachand with a 1999 work regarding specific attachment strategies of biological molecules to nanofabricated substrates enabling the preparation of hybrid inorganic/organic nanoelectromechanical systems, so called NEMS. They described the production of large amounts of F1-ATPase from the thermophilic bacteria Bacillus PS3 for the preparation of F1-ATPase biomolecular motors immobilized on a nanoarray pattern of gold, copper or nickel produced by electron beam lithography. These proteins were attached to one micron microspheres tagged with a synthetic peptide. Consequently, they accomplished the preparation of a platform with chemically active sites and the development of biohybrid devices capable of converting energy of biomolecular motors into useful work. One of the most fundamental questions in science is what defines life. Collective motion is one of the hallmarks of life. This is commonly observed in nature at various dimensional levels as energized entities gather, in a concerted effort, into motile aggregated patterns. These motile aggregated events can be noticed, among many others, as dynamic swarms; e.g., unicellular organisms such as bacteria, locust swarms, or the flocking behaviour of birds. Ever since Newton established his equations of motion, the mystery of motion on the microscale has emerged frequently in scientific history, as famously demonstrated by a couple of articles that should be discussed briefly. First, an essential concept, popularized by Osborne Reynolds, is that the relative importance of inertia and viscosity for the motion of a fluid depends on certain details of the system under consideration. The Reynolds number Re, named in his honor, quantifies this comparison as a dimensionless ratio of characteristic inertial and viscous forces: R e = ρ u l μ {\displaystyle \mathrm {Re} ={\frac {\rho ul}{\mu }}} Here, ρ represents the density of the fluid; u is a characteristic velocity of the system (for instance, the velocity of a swimming particle); l is a characteristic length scale (e.g., the swimmer size); and μ is the viscosity of the fluid. Taking the suspending fluid to be water, and using experimentally observed values for u, one can determine that inertia is important for macroscopic swimmers like fish (Re = 100), while viscosity dominates the motion of microscale swimmers like bacteria (Re = 10−4). The overwhelming importance of viscosity for swimming at the micrometer scale has profound implications for swimming strategy. This has been discussed memorably by E. M. Purcell, who invited the reader into the world of microorganisms and theoretically studied the conditions of their motion. In the first place, propulsion strategies of large scale swimmers often involve imparting momentum to the surrounding fluid in periodic discrete events, such as vortex shedding, and coasting between these events through inertia. This cannot be effective for microscale swimmers like bacteria: due to the large viscous damping, the inertial coasting time of a micron-sized object is on the order of 1 μs. The coasting distance of a microorganism moving at a typical speed is about 0.1 angstroms (Å). Purcell concluded that only forces that are exerted in the present moment on a microscale body contribute to its propulsion, so a constant energy conversion method is essential. Microorganisms have optimized their metabolism for continuous energy production, while purely artificial microswimmers (microrobots) must obtain energy from the environment, since their on-board-storage-capacity is very limited. As a further consequence of the continuous dissipation of energy, biological and artificial microswimmers do not obey the laws of equilibrium statistical physics, and need to be described by non-equilibrium dynamics. Mathematically, Purcell explored the implications of low Reynolds number by taking the Navier-Stokes equation and eliminating the inertial terms: μ ∇ 2 u − ∇ p = 0 {\displaystyle {\begin{aligned}\mu \nabla ^{2}\mathbf {u} -{\boldsymbol {\nabla }}p&={\boldsymbol {0}}\\\end{aligned}}} where u {\displaystyle \mathbf {u} } is the velocity of the fluid and ∇ p {\displaystyle {\boldsymbol {\nabla }}p} is the gradient of the pressure. As Purcell noted, the resulting equation — the Stokes equation — contains no explicit time dependence. This has some important consequences for how a suspended body (e.g., a bacterium) can swim through periodic mechanical motions or deformations (e.g., of a flagellum). First, the rate of motion is practically irrelevant for the motion of the microswimmer and of the surrounding fluid: changing the rate of motion will change the scale of the velocities of the fluid and of the microswimmer, but it will not change the pattern of fluid flow. Secondly, reversing the direction of mechanical motion will simply reverse all velocities in the system. These properties of the Stokes equation severely restrict the range of feasible swimming strategies. Recent publications of biohybrid microswimmers include the use of sperm cells, contractive muscle cells, and bacteria as biological components, as they can efficiently convert chemical energy into movement, and additionally are capable of performing complicated motion depending on environmental conditions. In this sense, biohybrid microswimmer systems can be described as the combination of different functional components: cargo and carrier. The cargo is an element of interest to be moved (and possibly released) in a customized way. The carrier is the component responsible for the movement of the biohybrid, transporting the desired cargo, which is linked to its surface. The great majority of these systems rely on biological motile propulsion for the transportation of synthetic cargo for targeted drug delivery/ There are also examples of the opposite case: artificial microswimmers with biological cargo systems. Over the past decade, biohybrid microrobots, in which living mobile microorganisms are physically integrated with untethered artificial structures, have gained growing interest to enable the active locomotion and cargo delivery to a target destination. In addition to the motility, the intrinsic capabilities of sensing and eliciting an appropriate response to artificial and environmental changes make cell-based biohybrid microrobots appealing for transportation of cargo to the inaccessible cavities of the human body for local active delivery of diagnostic and therapeutic agents. Active locomotion, targeting and steering of concentrated therape

    Read more →
  • Analogical modeling

    Analogical modeling

    Analogical modeling (AM) is a formal theory of exemplar based analogical reasoning, proposed by Royal Skousen, professor of Linguistics and English language at Brigham Young University in Provo, Utah. It is applicable to language modeling and other categorization tasks. Analogical modeling is related to connectionism and nearest neighbor approaches, in that it is data-based rather than abstraction-based; but it is distinguished by its ability to cope with imperfect datasets (such as caused by simulated short term memory limits) and to base predictions on all relevant segments of the dataset, whether near or far. In language modeling, AM has successfully predicted empirically valid forms for which no theoretical explanation was known (see the discussion of Finnish morphology in Skousen et al. 2002). == Implementation == === Overview === An exemplar-based model consists of a general-purpose modeling engine and a problem-specific dataset. Within the dataset, each exemplar (a case to be reasoned from, or an informative past experience) appears as a feature vector: a row of values for the set of parameters that define the problem. For example, in a spelling-to-sound task, the feature vector might consist of the letters of a word. Each exemplar in the dataset is stored with an outcome, such as a phoneme or phone to be generated. When the model is presented with a novel situation (in the form of an outcome-less feature vector), the engine algorithmically sorts the dataset to find exemplars that helpfully resemble it, and selects one, whose outcome is the model's prediction. The particulars of the algorithm distinguish one exemplar-based modeling system from another. In AM, we think of the feature values as characterizing a context, and the outcome as a behavior that occurs within that context. Accordingly, the novel situation is known as the given context. Given the known features of the context, the AM engine systematically generates all contexts that include it (all of its supracontexts), and extracts from the dataset the exemplars that belong to each. The engine then discards those supracontexts whose outcomes are inconsistent (this measure of consistency will be discussed further below), leaving an analogical set of supracontexts, and probabilistically selects an exemplar from the analogical set with a bias toward those in large supracontexts. This multilevel search exponentially magnifies the likelihood of a behavior's being predicted as it occurs reliably in settings that specifically resemble the given context. === Analogical modeling in detail === AM performs the same process for each case it is asked to evaluate. The given context, consisting of n variables, is used as a template to generate 2 n {\displaystyle 2^{n}} supracontexts. Each supracontext is a set of exemplars in which one or more variables have the same values that they do in the given context, and the other variables are ignored. In effect, each is a view of the data, created by filtering for some criteria of similarity to the given context, and the total set of supracontexts exhausts all such views. Alternatively, each supracontext is a theory of the task or a proposed rule whose predictive power needs to be evaluated. It is important to note that the supracontexts are not equal peers one with another; they are arranged by their distance from the given context, forming a hierarchy. If a supracontext specifies all of the variables that another one does and more, it is a subcontext of that other one, and it lies closer to the given context. (The hierarchy is not strictly branching; each supracontext can itself be a subcontext of several others, and can have several subcontexts.) This hierarchy becomes significant in the next step of the algorithm. The engine now chooses the analogical set from among the supracontexts. A supracontext may contain exemplars that only exhibit one behavior; it is deterministically homogeneous and is included. It is a view of the data that displays regularity, or a relevant theory that has never yet been disproven. A supracontext may exhibit several behaviors, but contain no exemplars that occur in any more specific supracontext (that is, in any of its subcontexts); in this case it is non-deterministically homogeneous and is included. Here there is no great evidence that a systematic behavior occurs, but also no counterargument. Finally, a supracontext may be heterogeneous, meaning that it exhibits behaviors that are found in a subcontext (closer to the given context), and also behaviors that are not. Where the ambiguous behavior of the nondeterministically homogeneous supracontext was accepted, this is rejected because the intervening subcontext demonstrates that there is a better theory to be found. The heterogeneous supracontext is therefore excluded. This guarantees that we see an increase in meaningfully consistent behavior in the analogical set as we approach the given context. With the analogical set chosen, each appearance of an exemplar (for a given exemplar may appear in several of the analogical supracontexts) is given a pointer to every other appearance of an exemplar within its supracontexts. One of these pointers is then selected at random and followed, and the exemplar to which it points provides the outcome. This gives each supracontext an importance proportional to the square of its size, and makes each exemplar likely to be selected in direct proportion to the sum of the sizes of all analogically consistent supracontexts in which it appears. Then, of course, the probability of predicting a particular outcome is proportional to the summed probabilities of all the exemplars that support it. (Skousen 2002, in Skousen et al. 2002, pp. 11–25, and Skousen 2003, both passim) === Formulas === Given a context with n {\displaystyle n} elements: total number of pairings: n 2 {\displaystyle n^{2}} number of agreements for outcome i: n i 2 {\displaystyle n_{i}^{2}} number of disagreements for outcome i: n i ( n − n i ) {\displaystyle n_{i}(n-n_{i})} total number of agreements: ∑ n i 2 {\displaystyle \sum {n_{i}^{2}}} total number of disagreements: ∑ n i ( n − n i ) = n 2 − ∑ n i 2 {\displaystyle \sum {n_{i}(n-n_{i})}=n^{2}-\sum {n_{i}^{2}}} === Example === This terminology is best understood through an example. In the example used in the second chapter of Skousen (1989), each context consists of three variables with potential values 0-3 Variable 1: 0,1,2,3 Variable 2: 0,1,2,3 Variable 3: 0,1,2,3 The two outcomes for the dataset are e and r, and the exemplars are: 3 1 0 e 0 3 2 r 2 1 0 r 2 1 2 r 3 1 1 r We define a network of pointers like so: The solid lines represent pointers between exemplars with matching outcomes; the dotted lines represent pointers between exemplars with non-matching outcomes. The statistics for this example are as follows: n = 5 {\displaystyle n=5} n r = 4 {\displaystyle n_{r}=4} n e = 1 {\displaystyle n_{e}=1} total number of pairings: n 2 = 25 {\displaystyle n^{2}=25} number of agreements for outcome r: n r 2 = 16 {\displaystyle n_{r}^{2}=16} number of agreements for outcome e: n e 2 = 1 {\displaystyle n_{e}^{2}=1} number of disagreements for outcome r: n r ( n − n r ) = 4 {\displaystyle n_{r}(n-n_{r})=4} number of disagreements for outcome e: n e ( n − n e ) = 4 {\displaystyle n_{e}(n-n_{e})=4} total number of agreements: n r 2 + n e 2 = 17 {\displaystyle n_{r}^{2}+n_{e}^{2}=17} total number of disagreements: n r ( n − n r ) + n e ( n − n e ) = n 2 − ( n r 2 + n e 2 ) = 8 {\displaystyle n_{r}(n-n_{r})+n_{e}(n-n_{e})=n^{2}-(n_{r}^{2}+n_{e}^{2})=8} uncertainty or fraction of disagreement: 8 / 25 = .32 {\displaystyle 8/25=.32} Behavior can only be predicted for a given context; in this example, let us predict the outcome for the context "3 1 2". To do this, we first find all of the contexts containing the given context; these contexts are called supracontexts. We find the supracontexts by systematically eliminating the variables in the given context; with m variables, there will generally be 2 m {\displaystyle 2^{m}} supracontexts. The following table lists each of the sub- and supracontexts; x means "not x", and - means "anything". These contexts are shown in the venn diagram below: The next step is to determine which exemplars belong to which contexts in order to determine which of the contexts are homogeneous. The table below shows each of the subcontexts, their behavior in terms of the given exemplars, and the number of disagreements within the behavior: Analyzing the subcontexts in the table above, we see that there is only 1 subcontext with any disagreements: "3 1 2", which in the dataset consists of "3 1 0 e" and "3 1 1 r". There are 2 disagreements in this subcontext; 1 pointing from each of the exemplars to the other (see the pointer network pictured above). Therefore, only supracontexts containing this subcontext will contain any disagreements. We use a simple rule to identify the homogeneous supraco

    Read more →
  • Dendrogram

    Dendrogram

    A dendrogram is a diagram representing a tree graph. This diagrammatic representation is frequently used in different contexts: in hierarchical clustering, it illustrates the arrangement of the clusters produced by the corresponding analyses. in computational biology, it shows the clustering of genes or samples, sometimes in the margins of heatmaps. in phylogenetics, it displays the evolutionary relationships among various biological taxa. In this case, the dendrogram is also called a phylogenetic tree. The name dendrogram derives from the two ancient greek words δένδρον (déndron), meaning "tree", and γράμμα (grámma), meaning "drawing, mathematical figure". == Clustering example == For a clustering example, suppose that five taxa ( a {\displaystyle a} to e {\displaystyle e} ) have been clustered by UPGMA based on a matrix of genetic distances. The hierarchical clustering dendrogram would show a column of five nodes representing the initial data (here individual taxa), and the remaining nodes represent the clusters to which the data belong, with the arrows representing the distance (dissimilarity). The distance between merged clusters is monotone, increasing with the level of the merger: the height of each node in the plot is proportional to the value of the intergroup dissimilarity between its two daughters (the nodes on the right representing individual observations all plotted at zero height).

    Read more →
  • KXEN Inc.

    KXEN Inc.

    KXEN was an American software company which existed from 1998 to 2013 when it was acquired by SAP AG. == History == KXEN was founded in June 1998 by Roger Haddad and Michel Bera. It was based in San Francisco, California with offices in Paris and London. On September 10, 2013, SAP AG announced plans to acquire KXEN. On October 1, 2013, a letter to KXEN customers announced the acquisition closed. KXEN primarily marketed predictive analytics software. == Predictive analytics == InfiniteInsight is a predictive modeling suite developed by KXEN that assists analytic professionals, and business executives to extract information from data. Among other functions, InfiniteInsight is used for variable importance, classification, regression, segmentation, time series, product recommendation, as described and expressed by the Java Data Mining interface, and for social network analysis. InfiniteInsight allows prediction of a behavior or a value, the forecast of a time series or the understanding of a group of individuals with similar behavior. Advanced functions include behavioral modeling, exporting the model code into different target environments or building predictive models on top of SAS or SPSS data files. Competitors are SAS Enterprise Miner, IBM SPSS Modeler, and Statistica. Open source predictive tools like the R package or Weka are also competitors, since they provide similar features free of charge.

    Read more →
  • Eigenface

    Eigenface

    An eigenface ( EYE-gən-) is the name given to a set of eigenvectors when used in the computer vision problem of human face recognition. The approach of using eigenfaces for recognition was developed by Sirovich and Kirby and used by Matthew Turk and Alex Pentland in face classification. The eigenvectors are derived from the covariance matrix of the probability distribution over the high-dimensional vector space of face images. The eigenfaces themselves form a basis set of all images used to construct the covariance matrix. This produces dimension reduction by allowing the smaller set of basis images to represent the original training images. Classification can be achieved by comparing how faces are represented by the basis set. == History == The eigenface approach began with a search for a low-dimensional representation of face images. Sirovich and Kirby showed that principal component analysis could be used on a collection of face images to form a set of basis features. These basis images, known as eigenpictures, could be linearly combined to reconstruct images in the original training set. If the training set consists of M images, principal component analysis could form a basis set of N images, where N < M. The reconstruction error is reduced by increasing the number of eigenpictures; however, the number needed is always chosen less than M. For example, if you need to generate a number of N eigenfaces for a training set of M face images, you can say that each face image can be made up of "proportions" of all the K "features" or eigenfaces: Face image1 = (23% of E1) + (2% of E2) + (51% of E3) + ... + (1% En). In 1991 M. Turk and A. Pentland expanded these results and presented the eigenface method of face recognition. In addition to designing a system for automated face recognition using eigenfaces, they showed a way of calculating the eigenvectors of a covariance matrix such that computers of the time could perform eigen-decomposition on a large number of face images. Face images usually occupy a high-dimensional space and conventional principal component analysis was intractable on such data sets. Turk and Pentland's paper demonstrated ways to extract the eigenvectors based on matrices sized by the number of images rather than the number of pixels. Once established, the eigenface method was expanded to include methods of preprocessing to improve accuracy. Multiple manifold approaches were also used to build sets of eigenfaces for different subjects and different features, such as the eyes. == Generation == A set of eigenfaces can be generated by performing a mathematical process called principal component analysis (PCA) on a large set of images depicting different human faces. Informally, eigenfaces can be considered a set of "standardized face ingredients", derived from statistical analysis of many pictures of faces. Any human face can be considered to be a combination of these standard faces. For example, one's face might be composed of the average face plus 10% from eigenface 1, 55% from eigenface 2, and even −3% from eigenface 3. Remarkably, it does not take many eigenfaces combined together to achieve a fair approximation of most faces. Also, because a person's face is not recorded by a digital photograph, but instead as just a list of values (one value for each eigenface in the database used), much less space is taken for each person's face. The eigenfaces that are created will appear as light and dark areas that are arranged in a specific pattern. This pattern is how different features of a face are singled out to be evaluated and scored. There will be a pattern to evaluate symmetry, whether there is any style of facial hair, where the hairline is, or an evaluation of the size of the nose or mouth. Other eigenfaces have patterns that are less simple to identify, and the image of the eigenface may look very little like a face. The technique used in creating eigenfaces and using them for recognition is also used outside of face recognition: handwriting recognition, lip reading, voice recognition, sign language/hand gestures interpretation and medical imaging analysis. Therefore, some do not use the term eigenface, but prefer to use 'eigenimage'. === Practical implementation === To create a set of eigenfaces, one must: Prepare a training set of face images. The pictures constituting the training set should have been taken under the same lighting conditions, and must be normalized to have the eyes and mouths aligned across all images. They must also be all resampled to a common pixel resolution (r × c). Each image is treated as one vector, simply by concatenating the rows of pixels in the original image, resulting in a single column with r × c elements. For this implementation, it is assumed that all images of the training set are stored in a single matrix T, where each column of the matrix is an image. Subtract the mean. The average image a has to be calculated and then subtracted from each original image in T. Calculate the eigenvectors and eigenvalues of the covariance matrix S. Each eigenvector has the same dimensionality (number of components) as the original images, and thus can itself be seen as an image. The eigenvectors of this covariance matrix are therefore called eigenfaces. They are the directions in which the images differ from the mean image. Usually this will be a computationally expensive step (if at all possible), but the practical applicability of eigenfaces stems from the possibility to compute the eigenvectors of S efficiently, without ever computing S explicitly, as detailed below. Choose the principal components. Sort the eigenvalues in descending order and arrange eigenvectors accordingly. The number of principal components k is determined arbitrarily by setting a threshold ε on the total variance. Total variance ⁠ v = ( λ 1 + λ 2 + . . . + λ n ) {\displaystyle v=(\lambda _{1}+\lambda _{2}+...+\lambda _{n})} ⁠, n = number of components, and λ {\displaystyle \lambda } represents component eigenvalue. k is the smallest number that satisfies ( λ 1 + λ 2 + . . . + λ k ) v > ϵ {\displaystyle {\frac {(\lambda _{1}+\lambda _{2}+...+\lambda _{k})}{v}}>\epsilon } These eigenfaces can now be used to represent both existing and new faces: we can project a new (mean-subtracted) image on the eigenfaces and thereby record how that new face differs from the mean face. The eigenvalues associated with each eigenface represent how much the images in the training set vary from the mean image in that direction. Information is lost by projecting the image on a subset of the eigenvectors, but losses are minimized by keeping those eigenfaces with the largest eigenvalues. For instance, working with a 100 × 100 image will produce 10,000 eigenvectors. In practical applications, most faces can typically be identified using a projection on between 100 and 150 eigenfaces, so that most of the 10,000 eigenvectors can be discarded. === Matlab example code === Here is an example of calculating eigenfaces with Extended Yale Face Database B. To evade computational and storage bottleneck, the face images are sampled down by a factor 4×4=16. Note that although the covariance matrix S generates many eigenfaces, only a fraction of those are needed to represent the majority of the faces. For example, to represent 95% of the total variation of all face images, only the first 43 eigenfaces are needed. To calculate this result, implement the following code: === Computing the eigenvectors === Performing PCA directly on the covariance matrix of the images is often computationally infeasible. If small images are used, say 100 × 100 pixels, each image is a point in a 10,000-dimensional space and the covariance matrix S is a matrix of 10,000 × 10,000 = 108 elements. However the rank of the covariance matrix is limited by the number of training examples: if there are N training examples, there will be at most N − 1 eigenvectors with non-zero eigenvalues. If the number of training examples is smaller than the dimensionality of the images, the principal components can be computed more easily as follows. Let T be the matrix of preprocessed training examples, where each column contains one mean-subtracted image. The covariance matrix can then be computed as S = TTT and the eigenvector decomposition of S is given by S v i = T T T v i = λ i v i {\displaystyle \mathbf {Sv} _{i}=\mathbf {T} \mathbf {T} ^{T}\mathbf {v} _{i}=\lambda _{i}\mathbf {v} _{i}} However TTT is a large matrix, and if instead we take the eigenvalue decomposition of T T T u i = λ i u i {\displaystyle \mathbf {T} ^{T}\mathbf {T} \mathbf {u} _{i}=\lambda _{i}\mathbf {u} _{i}} then we notice that by pre-multiplying both sides of the equation with T, we obtain T T T T u i = λ i T u i {\displaystyle \mathbf {T} \mathbf {T} ^{T}\mathbf {T} \mathbf {u} _{i}=\lambda _{i}\mathbf {T} \mathbf {u} _{i}} Meaning that, if ui is an eigenvector of TTT, then vi = Tui is an eigenvector of S. If we have

    Read more →
  • Multidimensional analysis

    Multidimensional analysis

    In statistics, econometrics and related fields, multidimensional analysis (MDA) is a data analysis process that groups data into two categories: data dimensions and measurements. For example, a data set consisting of the number of wins for a single football team at each of several years is a single-dimensional (in this case, longitudinal) data set. A data set consisting of the number of wins for several football teams in a single year is also a single-dimensional (in this case, cross-sectional) data set. A data set consisting of the number of wins for several football teams over several years is a two-dimensional data set. == Higher dimensions == In many disciplines, two-dimensional data sets are also called panel data. While, strictly speaking, two- and higher-dimensional data sets are "multi-dimensional", the term "multidimensional" tends to be applied only to data sets with three or more dimensions. For example, some forecast data sets provide forecasts for multiple target periods, conducted by multiple forecasters, and made at multiple horizons. The three dimensions provide more information than can be gleaned from two-dimensional panel data sets. == Software == Computer software for MDA include Online analytical processing (OLAP) for data in relational databases, pivot tables for data in spreadsheets, and Array DBMSs for general multi-dimensional data (such as raster data) in science, engineering, and business.

    Read more →
  • Induction of regular languages

    Induction of regular languages

    In computational learning theory, induction of regular languages refers to the task of learning a formal description (e.g. grammar) of a regular language from a given set of example strings. Although E. Mark Gold has shown that not every regular language can be learned this way (see language identification in the limit), approaches have been investigated for a variety of subclasses. They are sketched in this article. For learning of more general grammars, see Grammar induction. == Definitions == A regular language is defined as a (finite or infinite) set of strings that can be described by one of the mathematical formalisms called "finite automaton", "regular grammar", or "regular expression", all of which have the same expressive power. Since the latter formalism leads to shortest notations, it shall be introduced and used here. Given a set Σ of symbols (a.k.a. alphabet), a regular expression can be any of ∅ (denoting the empty set of strings), ε (denoting the singleton set containing just the empty string), a (where a is any character in Σ; denoting the singleton set just containing the single-character string a), r + s (where r and s are, in turn, simpler regular expressions; denoting their set's union) r ⋅ s (denoting the set of all possible concatenations of strings from r's and s's set), r + (denoting the set of n-fold repetitions of strings from r's set, for any n ≥ 1), or r (similarly denoting the set of n-fold repetitions, but also including the empty string, seen as 0-fold repetition). For example, using Σ = {0,1}, the regular expression (0+1+ε)⋅(0+1) denotes the set of all binary numbers with one or two digits (leading zero allowed), while 1⋅(0+1)⋅0 denotes the (infinite) set of all even binary numbers (no leading zeroes). Given a set of strings (also called "positive examples"), the task of regular language induction is to come up with a regular expression that denotes a set containing all of them. As an example, given {1, 10, 100}, a "natural" description could be the regular expression 1⋅0, corresponding to the informal characterization "a 1 followed by arbitrarily many (maybe even none) 0's". However, (0+1) and 1+(1⋅0)+(1⋅0⋅0) is another regular expression, denoting the largest (assuming Σ = {0,1}) and the smallest set containing the given strings, and called the trivial overgeneralization and undergeneralization, respectively. Some approaches work in an extended setting where also a set of "negative example" strings is given; then, a regular expression is to be found that generates all of the positive, but none of the negative examples. == Lattice of automata == Dupont et al. have shown that the set of all structurally complete finite automata generating a given input set of example strings forms a lattice, with the trivial undergeneralized and the trivial overgeneralized automaton as bottom and top element, respectively. Each member of this lattice can be obtained by factoring the undergeneralized automaton by an appropriate equivalence relation. For the above example string set {1, 10, 100}, the picture shows at its bottom the undergeneralized automaton Aa,b,c,d in grey, consisting of states a, b, c, and d. On the state set {a,b,c,d}, a total of 15 equivalence relations exist, forming a lattice. Mapping each equivalence E to the corresponding quotient automaton language L(Aa,b,c,d / E) obtains the partially ordered set shown in the picture. Each node's language is denoted by a regular expression. The language may be recognized by quotient automata w.r.t. different equivalence relations, all of which are shown below the node. An arrow between two nodes indicates that the lower node's language is a proper subset of the higher node's. If both positive and negative example strings are given, Dupont et al. build the lattice from the positive examples, and then investigate the separation border between automata that generate some negative example and such that do not. Most interesting are those automata immediately below the border. In the picture, separation borders are shown for the negative example strings 11 (green), 1001 (blue), 101 (cyan), and 0 (red). Coste and Nicolas present an own search method within the lattice, which they relate to Mitchell's version space paradigm. To find the separation border, they use a graph coloring algorithm on the state inequality relation induced by the negative examples. Later, they investigate several ordering relations on the set of all possible state fusions. Kudo and Shimbo use the representation by automaton factorizations to give a unique framework for the following approaches (sketched below): k-reversible languages and the "tail clustering" follow-up approach, Successor automata and the predecessor-successor method, and pumping-based approaches (framework-integration challenged by Luzeaux, however). Each of these approaches is shown to correspond to a particular kind of equivalence relations used for factorization. == Approaches == === k-reversible languages === Angluin considers so-called "k-reversible" regular automata, that is, deterministic automata in which each state can be reached from at most one state by following a transition chain of length k. Formally, if Σ, Q, and δ denote the input alphabet, the state set, and the transition function of an automaton A, respectively, then A is called k-reversible if: ∀a0, ..., ak ∈ Σ ∀s1, s2 ∈ Q: δ(s1, a0...ak) = δ(s2, a0...ak) ⇒ s1 = s2, where δ means the homomorphic extension of δ to arbitrary words. Angluin gives a cubic algorithm for learning of the smallest k-reversible language from a given set of input words; for k = 0, the algorithm has even almost linear complexity. The required state uniqueness after k + 1 given symbols forces unifying automaton states, thus leading to a proper generalization different from the trivial undergeneralized automaton. This algorithm has been used to learn simple parts of English syntax; later, an incremental version has been provided. Another approach based on k-reversible automata is the tail clustering method. === Successor automata === From a given set of input strings, Vernadat and Richetin build a so-called successor automaton, consisting of one state for each distinct character and a transition between each two adjacent characters' states. For example, the singleton input set {aabbaabb} leads to an automaton corresponding to the regular expression (a+⋅b+). An extension of this approach is the predecessor-successor method which generalizes each character repetition immediately to a Kleene + and then includes for each character the set of its possible predecessors in its state. Successor automata can learn exactly the class of local languages. Since each regular language is the homomorphic image of a local language, grammars from the former class can be learned by lifting, if an appropriate (depending on the intended application) homomorphism is provided. In particular, there is such a homomorphism for the class of languages learnable by the predecessor-successor method. The learnability of local languages can be reduced to that of k-reversible languages. === Early approaches === Chomsky and Miller (1957) used the pumping lemma: they guess a part v of an input string uvw and try to build a corresponding cycle into the automaton to be learned; using membership queries they ask, for appropriate k, which of the strings uw, uvvw, uvvvw, ..., uvkw also belongs to the language to be learned, thereby refining the structure of their automaton. In 1959, Solomonoff generalized this approach to context-free languages, which also obey a pumping lemma. === Cover automata === Câmpeanu et al. learn a finite automaton as a compact representation of a large finite language. Given such a language F, they search a so-called cover automaton A such that its language L(A) covers F in the following sense: L(A) ∩ Σ≤ l = F, where l is the length of the longest string in F, and Σ≤ l denotes the set of all strings not longer than l. If such a cover automaton exists, F is uniquely determined by A and l. For example, F = {ad, read, reread } has l = 6 and a cover automaton corresponding to the regular expression (r⋅e)⋅a⋅d. For two strings x and y, Câmpeanu et al. define x ~ y if xz ∈ F ⇔ yz ∈ F for all strings z of a length such that both xz and yz are not longer than l. Based on this relation, whose lack of transitivity causes considerable technical problems, they give an O(n4) algorithm to construct from F a cover automaton A of minimal state count. Moreover, for union, intersection, and difference of two finite languages they provide corresponding operations on their cover automata. Păun et al. improve the time complexity to O(n2). === Residual automata === For a set S of strings and a string u, the Brzozowski derivative u−1S is defined as the set of all rest-strings obtainable from a string in S by cutting off its prefix u (if possible), formally: u−1S = {v ∈ Σ: uv ∈ S}, cf. picture. Denis et al. define a

    Read more →
  • Quantum neural network

    Quantum neural network

    Quantum neural networks are computational neural network models which are based on the principles of quantum mechanics. The first ideas on quantum neural computation were published independently in 1995 by Subhash Kak and Ron Chrisley, engaging with the theory of quantum mind, which posits that quantum effects play a role in cognitive function. However, typical research in quantum neural networks involves combining classical artificial neural network models (which are widely used in machine learning for the important task of pattern recognition) with the advantages of quantum information in order to develop more efficient algorithms. One important motivation for these investigations is the difficulty to train classical neural networks, especially in big data applications. The hope is that features of quantum computing such as quantum parallelism or the effects of interference and entanglement can be used as resources. Since the technological implementation of a quantum computer is still in a premature stage, such quantum neural network models are mostly theoretical proposals that await their full implementation in physical experiments. Most Quantum neural networks are developed as feed-forward networks. Similar to their classical counterparts, this structure intakes input from one layer of qubits, and passes that input onto another layer of qubits. This layer of qubits evaluates this information and passes on the output to the next layer. Eventually the path leads to the final layer of qubits. The layers do not have to be of the same width, meaning they don't have to have the same number of qubits as the layer before or after it. This structure is trained on which path to take similar to classical artificial neural networks. This is discussed in a lower section. Quantum neural networks refer to three different categories: Quantum computer with classical data, classical computer with quantum data, and quantum computer with quantum data. == Examples == Quantum neural network research is still in its infancy, and a conglomeration of proposals and ideas of varying scope and mathematical rigor have been put forward. Most of them are based on the idea of replacing classical binary or McCulloch-Pitts neurons with a qubit (which can be called a "quron"), resulting in neural units that can be in a superposition of the state 'firing' and 'resting'. === Quantum perceptrons === A lot of proposals attempt to find a quantum equivalent for the perceptron unit from which neural nets are constructed. A problem is that nonlinear activation functions do not immediately correspond to the mathematical structure of quantum theory, since a quantum evolution is described by linear operations and leads to probabilistic observation. Ideas to imitate the perceptron activation function with a quantum mechanical formalism reach from special measurements to postulating non-linear quantum operators (a mathematical framework that is disputed). A direct implementation of the activation function using the circuit-based model of quantum computation has recently been proposed by Schuld, Sinayskiy and Petruccione based on the quantum phase estimation algorithm. === Quantum networks === At a larger scale, researchers have attempted to generalize neural networks to the quantum setting. One way of constructing a quantum neuron is to first generalise classical neurons and then generalising them further to make unitary gates. Interactions between neurons can be controlled quantumly, with unitary gates, or classically, via measurement of the network states. This high-level theoretical technique can be applied broadly, by taking different types of networks and different implementations of quantum neurons, such as photonically implemented neurons and quantum reservoir processor (quantum version of reservoir computing). Most learning algorithms follow the classical model of training an artificial neural network to learn the input-output function of a given training set and use classical feedback loops to update parameters of the quantum system until they converge to an optimal configuration. Learning as a parameter optimisation problem has also been approached by adiabatic models of quantum computing. Quantum neural networks can be applied to algorithmic design: given qubits with tunable mutual interactions, one can attempt to learn interactions following the classical backpropagation rule from a training set of desired input-output relations, taken to be the desired output algorithm's behavior. The quantum network thus 'learns' an algorithm. === Quantum associative memory === The first quantum associative memory algorithm was introduced by Dan Ventura and Tony Martinez in 1999. The authors do not attempt to translate the structure of artificial neural network models into quantum theory, but propose an algorithm for a circuit-based quantum computer that simulates associative memory. The memory states (in Hopfield neural networks saved in the weights of the neural connections) are written into a superposition, and a Grover-like quantum search algorithm retrieves the memory state closest to a given input. As such, this is not a fully content-addressable memory, since only incomplete patterns can be retrieved. The first truly content-addressable quantum memory, which can retrieve patterns also from corrupted inputs, was proposed by Carlo A. Trugenberger. Both memories can store an exponential (in terms of n qubits) number of patterns but can be used only once due to the no-cloning theorem and their destruction upon measurement. Trugenberger, however, has shown that his probabilistic model of quantum associative memory can be efficiently implemented and re-used multiples times for any polynomial number of stored patterns, a large advantage with respect to classical associative memories. === Classical neural networks inspired by quantum theory === A substantial amount of interest has been given to a "quantum-inspired" model that uses ideas from quantum theory to implement a neural network based on fuzzy logic. == Training == Quantum Neural Networks can be theoretically trained similarly to training classical/artificial neural networks. A key difference lies in communication between the layers of a neural networks. For classical neural networks, at the end of a given operation, the current perceptron copies its output to the next layer of perceptron(s) in the network. However, in a quantum neural network, where each perceptron is a qubit, this would violate the no-cloning theorem. A proposed generalized solution to this is to replace the classical fan-out method with an arbitrary unitary that spreads out, but does not copy, the output of one qubit to the next layer of qubits. Using this fan-out Unitary ( U f {\displaystyle U_{f}} ) with a dummy state qubit in a known state (Ex. | 0 ⟩ {\displaystyle |0\rangle } in the computational basis), also known as an Ancilla bit, the information from the qubit can be transferred to the next layer of qubits. This process adheres to the quantum operation requirement of reversibility. Using this quantum feed-forward network, deep neural networks can be executed and trained efficiently. A deep neural network is essentially a network with many hidden-layers, as seen in the sample model neural network above. Since the Quantum neural network being discussed uses fan-out Unitary operators, and each operator only acts on its respective input, only two layers are used at any given time. In other words, no Unitary operator is acting on the entire network at any given time, meaning the number of qubits required for a given step depends on the number of inputs in a given layer. Since Quantum Computers are notorious for their ability to run multiple iterations in a short period of time, the efficiency of a quantum neural network is solely dependent on the number of qubits in any given layer, and not on the depth of the network. === Cost functions === To determine the effectiveness of a neural network, a cost function is used, which essentially measures the proximity of the network's output to the expected or desired output. In a Classical Neural Network, the weights ( w {\displaystyle w} ) and biases ( b {\displaystyle b} ) at each step determine the outcome of the cost function C ( w , b ) {\displaystyle C(w,b)} . When training a Classical Neural network, the weights and biases are adjusted after each iteration, and given equation 1 below, where y ( x ) {\displaystyle y(x)} is the desired output and a out ( x ) {\displaystyle a^{\text{out}}(x)} is the actual output, the cost function is optimized when C ( w , b ) {\displaystyle C(w,b)} = 0. For a quantum neural network, the cost function is determined by measuring the fidelity of the outcome state ( ρ out {\displaystyle \rho ^{\text{out}}} ) with the desired outcome state ( ϕ out {\displaystyle \phi ^{\text{out}}} ), seen in Equation 2 below. In this case, the Unitary operators are adjusted after each it

    Read more →
  • Security of the Java software platform

    Security of the Java software platform

    The Java software platform provides a number of features designed for improving the security of Java applications. This includes enforcing runtime constraints through the use of the Java Virtual Machine (JVM), a security manager that sandboxes untrusted code from the rest of the operating system, and a suite of security APIs that Java developers can utilise. Despite this, criticism has been directed at the programming language, and Oracle, due to an increase in malicious programs that revealed security vulnerabilities in the JVM, which were subsequently not properly addressed by Oracle in a timely manner. == Security features == === The JVM === The binary form of programs running on the Java platform is not native machine code but an intermediate bytecode. The JVM performs verification on this bytecode before running it to prevent the program from performing unsafe operations such as branching to incorrect locations, which may contain data rather than instructions. It also allows the JVM to enforce runtime constraints such as array bounds checking. This means that Java programs are significantly less likely to suffer from memory safety flaws such as buffer overflow than programs written in languages such as C which do not provide such memory safety guarantees. The platform does not allow programs to perform certain potentially unsafe operations such as pointer arithmetic or unchecked type casts. It manages memory allocation and initialization and provides automatic garbage collection which in many cases (but not all) relieves the developer from manual memory management. This contributes to type safety and memory safety. === Security manager === The platform provides a security manager which allows users to run untrusted bytecode in a "sandboxed" environment designed to protect them from malicious or poorly written software by preventing the untrusted code from accessing certain platform features and APIs. For example, untrusted code might be prevented from reading or writing files on the local filesystem, running arbitrary commands with the current user's privileges, accessing communication networks, accessing the internal private state of objects using reflection, or causing the JVM to exit. The security manager also allows Java programs to be cryptographically signed; users can choose to allow code with a valid digital signature from a trusted entity to run with full privileges in circumstances where it would otherwise be untrusted. Users can also set fine-grained access control policies for programs from different sources. For example, a user may decide that only system classes should be fully trusted, that code from certain trusted entities may be allowed to read certain specific files, and that all other code should be fully sandboxed. === Security APIs === The Java Class Library provides a number of APIs related to security, such as standard cryptographic algorithms, authentication, and secure communication protocols. === The sun.misc.Unsafe class === sun.misc.Unsafe is an internal utility class in the Java programming language which is a collection of low-level unsafe operations. While it is not a part of the official Java Class Library, it is called internally by the Java libraries. It resides in an unofficial Java module named jdk.unsupported. Beginning in Java 11, it has been partially migrated to jdk.internal.misc.Unsafe (which resides in module java.base). Its primary feature is to allow direct memory management (similar to C memory management) and memory address manipulation, manipulating objects and fields, thread manipulation, and concurrency primitives. Its declaration is: public final class Unsafe;, and it is a singleton class with a private constructor. It contains the following methods, many of which are declared native (invoking Java Native Interface): static Unsafe getUnsafe(): retrieves the Unsafe instance. It uses sun.reflect.Reflection to do so. int getInt(Object o, long offset): fetches a value (a field or array element) in the object at the given offset. (There are corresponding getBoolean(), getByte(), getShort(), getChar(), getLong(), getFloat(), and getDouble() methods as well.) void putInt(Object o, long offset, int x): stores a value into an object at the given offset. (There are corresponding putBoolean(), putByte(), putShort(), putChar(), putLong(), putFloat(), and putDouble() methods as well.) Object getObject(Object o, long offset): fetches a reference value from an object at the given offset. void putObject(Object o, long offset, Object x): stores a reference value into an object at the given offset. int getInt(long address): fetches a value at the given address. (There are corresponding getBoolean(), getByte(), getShort(), getChar(), getLong(), getFloat(), and getDouble() methods as well.) void putInt(long address, int x): stores a value into the given address. (There are corresponding putBoolean(), putByte(), putShort(), putChar(), putLong(), putFloat(), and putDouble() methods as well.) long getAddress(long address): fetches a native pointer from a given address. void putAddress(long address, long x): stores a native pointer into a given address. long allocateMemory(long bytes): allocates a block of native memory of the given size (similar to malloc()). long reallocateMemory(long address, long bytes): resizes a block of native memory to the given size (similar to realloc()). void setMemory(Object o, long offset, long bytes, byte value), void setMemory(long address, long bytes, byte value): sets all bytes in a block of memory to a fixed value (similar to memset()). void copyMemory(Object srcBase, long srcOffset, Object destBase, long destOffset, long bytes), void copyMemory(long srcAddress, long destAddress, long bytes): sets all bytes in a given block of memory to a copy of another block (similar to memcpy()). void freeMemory(long address): deallocates a block of native memory obtained from allocateMemory() or reallocateMemory(), similar to free()). long staticFieldOffset(Field f): obtains the location of a given field in the storage allocation of its class. long objectFieldOffset(Field f): obtains the location of a given static field in conjunction with staticFieldBase(). Object staticFieldBase(Field f): obtains the location of a given static field in conjunction with staticFieldOffset(). void ensureClassInitialized(Class c): ensures the given class has been initialized. int arrayBaseOffset(Class arrayClass): obtains the offset of the first element in the storage allocation of a given array class. int arrayIndexScale(Class arrayClass): obtains the scale factor for addressing elements in the storage allocation of a given array class. static int addressSize(): obtains the size (in bytes) of a native pointer. int pageSize(): obtains the size (in bytes) of a native memory page. Class defineClass(String name, byte[] b, int off, int len, ClassLoader loader, ProtectionDomain protectionDomain): signals to the JVM to define a class without security checks. Class defineAnonymousClass(Class hostClass, byte[] data, Object[] cpPatches): signals to the JVM to define a class but do not make it known to the class loader or system directory. Object allocateInstance(Class cls) throws InstantiationException: allocates an instance of a class without running its constructor. void monitorEnter(Object o): locks an object. void monitorExit(Object o): unlocks an object. boolean tryMonitorEnter(Object o): tries to lock an object, returning whether the lock succeeded. void throwException(Throwable ee): throws an exception without telling the verifier. final boolean compareAndSwapInt(Object o, long offset, int expected, int x): updates a variable to x if it is holding expected, returning whether the operation succeeded. (There are corresponding compareAndSwapLong() and compareAndSwapObject() methods as well.) int getIntVolatile(Object o, long offset): volatile version of getInt(). (There are corresponding getBooleanVolatile(), getByteVolatile(), getShortVolatile(), getCharVolatile(), getLongVolatile(), getFloatVolatile(), getDoubleVolatile(), and getObjectVolatile() methods as well.) void putIntVolatile(Object o, long offset, int x): volatile version of putInt(). (There are corresponding putBooleanVolatile(), putByteVolatile(), putShortVolatile(), putCharVolatile(), putLongVolatile(), putFloatVolatile(), putDoubleVolatile(), and putObjectVolatile() methods as well.) void putOrderedInt(Object o, long offset, int x): version of putIntVolatile() not guaranteeing immediate visibility of storage to other threads. (There are corresponding putOrderedLong() and putOrderedObject() methods as well.) void unpark(Object thread): unblocks a thread. void park(boolean isAbsolute, long time): blocks the current thread. int getLoadAverage(double[] loadavg, int nelems): gets the load average in the system run queue assigned to available processors averaged over various periods of time. void invokeCleaner(ByteBuffe

    Read more →
  • Low-rank approximation

    Low-rank approximation

    In mathematics, low-rank approximation refers to the process of approximating a given matrix by a matrix of lower rank. More precisely, it is a minimization problem, in which the cost function measures the fit between a given matrix (the data) and an approximating matrix (the optimization variable), subject to a constraint that the approximating matrix has reduced rank. The problem is used for mathematical modeling and data compression. The rank constraint is related to a constraint on the complexity of a model that fits the data. In applications, often there are other constraints on the approximating matrix apart from the rank constraint, e.g., non-negativity and Hankel structure. Low-rank approximation is closely related to numerous other techniques, including principal component analysis, factor analysis, total least squares, latent semantic analysis, orthogonal regression, and dynamic mode decomposition. == Definition == Given structure specification S : R n p → R m × n {\displaystyle {\mathcal {S}}:\mathbb {R} ^{n_{p}}\to \mathbb {R} ^{m\times n}} , vector of structure parameters p ∈ R n p {\displaystyle p\in \mathbb {R} ^{n_{p}}} , norm ‖ ⋅ ‖ {\displaystyle \|\cdot \|} , and desired rank r {\displaystyle r} , minimize over p ^ ‖ p − p ^ ‖ subject to rank ⁡ ( S ( p ^ ) ) ≤ r . {\displaystyle {\text{minimize}}\quad {\text{over }}{\widehat {p}}\quad \|p-{\widehat {p}}\|\quad {\text{subject to}}\quad \operatorname {rank} {\big (}{\mathcal {S}}({\widehat {p}}){\big )}\leq r.} == Applications == Linear system identification, in which case the approximating matrix is Hankel structured. Machine learning, in which case the approximating matrix is nonlinearly structured. Recommender systems, in which cases the data matrix has missing values and the approximation is categorical. Distance matrix completion, in which case there is a positive definiteness constraint. Natural language processing, in which case the approximation is nonnegative. Computer algebra, in which case the approximation is Sylvester structured. Matrix product states, in which case the approximation is usually rescaled to have fixed Frobenius norm. == Basic low-rank approximation problem == The unstructured problem with fit measured by the Frobenius norm, i.e., minimize over D ^ ‖ D − D ^ ‖ F subject to rank ⁡ ( D ^ ) ≤ r {\displaystyle {\text{minimize}}\quad {\text{over }}{\widehat {D}}\quad \|D-{\widehat {D}}\|_{\text{F}}\quad {\text{subject to}}\quad \operatorname {rank} {\big (}{\widehat {D}}{\big )}\leq r} has an analytic solution in terms of the singular value decomposition of the data matrix. The result is referred to as the matrix approximation lemma or Eckart–Young–Mirsky theorem. This problem was originally solved by Erhard Schmidt in the infinite dimensional context of integral operators (although his methods easily generalize to arbitrary compact operators on Hilbert spaces) and later rediscovered by C. Eckart and G. Young. L. Mirsky generalized the result to arbitrary unitarily invariant norms. Let D = U Σ V ⊤ ∈ R m × n , m ≥ n {\displaystyle D=U\Sigma V^{\top }\in \mathbb {R} ^{m\times n},\quad m\geq n} be the singular value decomposition of D {\displaystyle D} , where Σ =: diag ⁡ ( σ 1 , … , σ r ) {\displaystyle \Sigma =:\operatorname {diag} (\sigma _{1},\ldots ,\sigma _{r})} , where r ≤ min { m , n } = n {\displaystyle r\leq \min\{m,n\}=n} , is the m × n {\displaystyle m\times n} rectangular diagonal matrix with r {\displaystyle r} non-zero singular values σ 1 ≥ … ≥ σ r > σ r + 1 = … = σ n = 0 {\displaystyle \sigma _{1}\geq \ldots \geq \sigma _{r}>\sigma _{r+1}=\ldots =\sigma _{n}=0} . For a given k ∈ { 1 , … , r } {\displaystyle k\in \{1,\dots ,r\}} , partition U {\displaystyle U} , Σ {\displaystyle \Sigma } , and V {\displaystyle V} as follows: U =: [ U 1 U 2 ] , Σ =: [ Σ 1 0 0 Σ 2 ] , and V =: [ V 1 V 2 ] , {\displaystyle U=:{\begin{bmatrix}U_{1}&U_{2}\end{bmatrix}},\quad \Sigma =:{\begin{bmatrix}\Sigma _{1}&0\\0&\Sigma _{2}\end{bmatrix}},\quad {\text{and}}\quad V=:{\begin{bmatrix}V_{1}&V_{2}\end{bmatrix}},} where U 1 {\displaystyle U_{1}} is m × k {\displaystyle m\times k} , Σ 1 {\displaystyle \Sigma _{1}} is k × k {\displaystyle k\times k} , and V 1 {\displaystyle V_{1}} is n × k {\displaystyle n\times k} . Then the rank k {\displaystyle k} matrix D ^ ∗ := U 1 Σ 1 V 1 ⊤ , {\displaystyle {\widehat {D}}^{}:=U_{1}\Sigma _{1}V_{1}^{\top },} obtained from the truncated singular value decomposition is such that ‖ D − D ^ ∗ ‖ F = min rank ⁡ ( D ^ ) ≤ k ‖ D − D ^ ‖ F = σ k + 1 2 + ⋯ + σ r 2 . {\displaystyle \|D-{\widehat {D}}^{}\|_{\text{F}}=\min _{\operatorname {rank} ({\widehat {D}})\leq k}\|D-{\widehat {D}}\|_{\text{F}}={\sqrt {\sigma _{k+1}^{2}+\cdots +\sigma _{r}^{2}}}.} The minimizer D ^ ∗ {\displaystyle {\widehat {D}}^{}} is unique if and only if σ k > σ k + 1 {\displaystyle \sigma _{k}>\sigma _{k+1}} . == Proof of Eckart–Young–Mirsky theorem (for spectral norm) == Let A ∈ R m × n {\displaystyle A\in \mathbb {R} ^{m\times n}} be a real (possibly rectangular) matrix with m ≤ n {\displaystyle m\leq n} . Suppose that A = U Σ V ⊤ {\displaystyle A=U\Sigma V^{\top }} is the singular value decomposition of A {\displaystyle A} . Recall that U {\displaystyle U} and V {\displaystyle V} are orthogonal matrices, and Σ {\displaystyle \Sigma } is an m × n {\displaystyle m\times n} diagonal matrix with entries ( σ 1 , σ 2 , ⋯ , σ m ) {\displaystyle (\sigma _{1},\sigma _{2},\cdots ,\sigma _{m})} such that σ 1 ≥ σ 2 ≥ ⋯ ≥ σ m ≥ 0 {\displaystyle \sigma _{1}\geq \sigma _{2}\geq \cdots \geq \sigma _{m}\geq 0} . We claim that the best rank- k {\displaystyle k} approximation to A {\displaystyle A} in the spectral norm, denoted by ‖ ⋅ ‖ 2 {\displaystyle \|\cdot \|_{2}} , is given by A k := ∑ i = 1 k σ i u i v i ⊤ {\displaystyle A_{k}:=\sum _{i=1}^{k}\sigma _{i}u_{i}v_{i}^{\top }} where u i {\displaystyle u_{i}} and v i {\displaystyle v_{i}} denote the i {\displaystyle i} th column of U {\displaystyle U} and V {\displaystyle V} , respectively. First, note that we have ‖ A − A k ‖ 2 = ‖ ∑ i = 1 n σ i u i v i ⊤ − ∑ i = 1 k σ i u i v i ⊤ ‖ 2 = ‖ ∑ i = k + 1 n σ i u i v i ⊤ ‖ 2 = σ k + 1 {\displaystyle \|A-A_{k}\|_{2}=\left\|\sum _{i=1}^{\color {red}{n}}\sigma _{i}u_{i}v_{i}^{\top }-\sum _{i=1}^{\color {red}{k}}\sigma _{i}u_{i}v_{i}^{\top }\right\|_{2}=\left\|\sum _{i=\color {red}{k+1}}^{n}\sigma _{i}u_{i}v_{i}^{\top }\right\|_{2}=\sigma _{k+1}} Therefore, we need to show that if B k = X Y ⊤ {\displaystyle B_{k}=XY^{\top }} where X {\displaystyle X} and Y {\displaystyle Y} have k {\displaystyle k} columns then ‖ A − A k ‖ 2 = σ k + 1 ≤ ‖ A − B k ‖ 2 {\displaystyle \|A-A_{k}\|_{2}=\sigma _{k+1}\leq \|A-B_{k}\|_{2}} . Since Y {\displaystyle Y} has k {\displaystyle k} columns, then there must be a nontrivial linear combination of the first k + 1 {\displaystyle k+1} columns of V {\displaystyle V} , i.e., w = γ 1 v 1 + ⋯ + γ k + 1 v k + 1 , {\displaystyle w=\gamma _{1}v_{1}+\cdots +\gamma _{k+1}v_{k+1},} such that Y ⊤ w = 0 {\displaystyle Y^{\top }w=0} . Without loss of generality, we can scale w {\displaystyle w} so that ‖ w ‖ 2 = 1 {\displaystyle \|w\|_{2}=1} or (equivalently) γ 1 2 + ⋯ + γ k + 1 2 = 1 {\displaystyle \gamma _{1}^{2}+\cdots +\gamma _{k+1}^{2}=1} . Therefore, ‖ A − B k ‖ 2 2 ≥ ‖ ( A − B k ) w ‖ 2 2 = ‖ A w ‖ 2 2 = γ 1 2 σ 1 2 + ⋯ + γ k + 1 2 σ k + 1 2 ≥ σ k + 1 2 . {\displaystyle \|A-B_{k}\|_{2}^{2}\geq \|(A-B_{k})w\|_{2}^{2}=\|Aw\|_{2}^{2}=\gamma _{1}^{2}\sigma _{1}^{2}+\cdots +\gamma _{k+1}^{2}\sigma _{k+1}^{2}\geq \sigma _{k+1}^{2}.} The result follows by taking the square root of both sides of the above inequality. == Proof of Eckart–Young–Mirsky theorem (for Frobenius norm) == Let A ∈ R m × n {\displaystyle A\in \mathbb {R} ^{m\times n}} be a real (possibly rectangular) matrix with m ≤ n {\displaystyle m\leq n} . Suppose that A = U Σ V ⊤ {\displaystyle A=U\Sigma V^{\top }} is the singular value decomposition of A {\displaystyle A} . We claim that the best rank k {\displaystyle k} approximation to A {\displaystyle A} in the Frobenius norm, denoted by ‖ ⋅ ‖ F {\displaystyle \|\cdot \|_{F}} , is given by A k = ∑ i = 1 k σ i u i v i ⊤ {\displaystyle A_{k}=\sum _{i=1}^{k}\sigma _{i}u_{i}v_{i}^{\top }} where u i {\displaystyle u_{i}} and v i {\displaystyle v_{i}} denote the i {\displaystyle i} th column of U {\displaystyle U} and V {\displaystyle V} , respectively. First, note that we have ‖ A − A k ‖ F 2 = ‖ ∑ i = k + 1 n σ i u i v i ⊤ ‖ F 2 = ∑ i = k + 1 n σ i 2 {\displaystyle \|A-A_{k}\|_{F}^{2}=\left\|\sum _{i=k+1}^{n}\sigma _{i}u_{i}v_{i}^{\top }\right\|_{F}^{2}=\sum _{i=k+1}^{n}\sigma _{i}^{2}} Therefore, we need to show that if B k = X Y ⊤ {\displaystyle B_{k}=XY^{\top }} where X {\displaystyle X} and Y {\displaystyle Y} have k {\displaystyle k} columns then ‖ A − A k ‖ F 2 = ∑ i = k + 1 n σ i 2 ≤ ‖ A − B k ‖ F 2 . {\displaystyle \|A-A_{k}\|_{F}^{2}=\sum _{i=k+1}^{n}\sigma _{i}^{2}\leq \|A-B_{k}\|_{F}^{2}.} By the triangle inequality with the spectral norm

    Read more →
  • Induction of regular languages

    Induction of regular languages

    In computational learning theory, induction of regular languages refers to the task of learning a formal description (e.g. grammar) of a regular language from a given set of example strings. Although E. Mark Gold has shown that not every regular language can be learned this way (see language identification in the limit), approaches have been investigated for a variety of subclasses. They are sketched in this article. For learning of more general grammars, see Grammar induction. == Definitions == A regular language is defined as a (finite or infinite) set of strings that can be described by one of the mathematical formalisms called "finite automaton", "regular grammar", or "regular expression", all of which have the same expressive power. Since the latter formalism leads to shortest notations, it shall be introduced and used here. Given a set Σ of symbols (a.k.a. alphabet), a regular expression can be any of ∅ (denoting the empty set of strings), ε (denoting the singleton set containing just the empty string), a (where a is any character in Σ; denoting the singleton set just containing the single-character string a), r + s (where r and s are, in turn, simpler regular expressions; denoting their set's union) r ⋅ s (denoting the set of all possible concatenations of strings from r's and s's set), r + (denoting the set of n-fold repetitions of strings from r's set, for any n ≥ 1), or r (similarly denoting the set of n-fold repetitions, but also including the empty string, seen as 0-fold repetition). For example, using Σ = {0,1}, the regular expression (0+1+ε)⋅(0+1) denotes the set of all binary numbers with one or two digits (leading zero allowed), while 1⋅(0+1)⋅0 denotes the (infinite) set of all even binary numbers (no leading zeroes). Given a set of strings (also called "positive examples"), the task of regular language induction is to come up with a regular expression that denotes a set containing all of them. As an example, given {1, 10, 100}, a "natural" description could be the regular expression 1⋅0, corresponding to the informal characterization "a 1 followed by arbitrarily many (maybe even none) 0's". However, (0+1) and 1+(1⋅0)+(1⋅0⋅0) is another regular expression, denoting the largest (assuming Σ = {0,1}) and the smallest set containing the given strings, and called the trivial overgeneralization and undergeneralization, respectively. Some approaches work in an extended setting where also a set of "negative example" strings is given; then, a regular expression is to be found that generates all of the positive, but none of the negative examples. == Lattice of automata == Dupont et al. have shown that the set of all structurally complete finite automata generating a given input set of example strings forms a lattice, with the trivial undergeneralized and the trivial overgeneralized automaton as bottom and top element, respectively. Each member of this lattice can be obtained by factoring the undergeneralized automaton by an appropriate equivalence relation. For the above example string set {1, 10, 100}, the picture shows at its bottom the undergeneralized automaton Aa,b,c,d in grey, consisting of states a, b, c, and d. On the state set {a,b,c,d}, a total of 15 equivalence relations exist, forming a lattice. Mapping each equivalence E to the corresponding quotient automaton language L(Aa,b,c,d / E) obtains the partially ordered set shown in the picture. Each node's language is denoted by a regular expression. The language may be recognized by quotient automata w.r.t. different equivalence relations, all of which are shown below the node. An arrow between two nodes indicates that the lower node's language is a proper subset of the higher node's. If both positive and negative example strings are given, Dupont et al. build the lattice from the positive examples, and then investigate the separation border between automata that generate some negative example and such that do not. Most interesting are those automata immediately below the border. In the picture, separation borders are shown for the negative example strings 11 (green), 1001 (blue), 101 (cyan), and 0 (red). Coste and Nicolas present an own search method within the lattice, which they relate to Mitchell's version space paradigm. To find the separation border, they use a graph coloring algorithm on the state inequality relation induced by the negative examples. Later, they investigate several ordering relations on the set of all possible state fusions. Kudo and Shimbo use the representation by automaton factorizations to give a unique framework for the following approaches (sketched below): k-reversible languages and the "tail clustering" follow-up approach, Successor automata and the predecessor-successor method, and pumping-based approaches (framework-integration challenged by Luzeaux, however). Each of these approaches is shown to correspond to a particular kind of equivalence relations used for factorization. == Approaches == === k-reversible languages === Angluin considers so-called "k-reversible" regular automata, that is, deterministic automata in which each state can be reached from at most one state by following a transition chain of length k. Formally, if Σ, Q, and δ denote the input alphabet, the state set, and the transition function of an automaton A, respectively, then A is called k-reversible if: ∀a0, ..., ak ∈ Σ ∀s1, s2 ∈ Q: δ(s1, a0...ak) = δ(s2, a0...ak) ⇒ s1 = s2, where δ means the homomorphic extension of δ to arbitrary words. Angluin gives a cubic algorithm for learning of the smallest k-reversible language from a given set of input words; for k = 0, the algorithm has even almost linear complexity. The required state uniqueness after k + 1 given symbols forces unifying automaton states, thus leading to a proper generalization different from the trivial undergeneralized automaton. This algorithm has been used to learn simple parts of English syntax; later, an incremental version has been provided. Another approach based on k-reversible automata is the tail clustering method. === Successor automata === From a given set of input strings, Vernadat and Richetin build a so-called successor automaton, consisting of one state for each distinct character and a transition between each two adjacent characters' states. For example, the singleton input set {aabbaabb} leads to an automaton corresponding to the regular expression (a+⋅b+). An extension of this approach is the predecessor-successor method which generalizes each character repetition immediately to a Kleene + and then includes for each character the set of its possible predecessors in its state. Successor automata can learn exactly the class of local languages. Since each regular language is the homomorphic image of a local language, grammars from the former class can be learned by lifting, if an appropriate (depending on the intended application) homomorphism is provided. In particular, there is such a homomorphism for the class of languages learnable by the predecessor-successor method. The learnability of local languages can be reduced to that of k-reversible languages. === Early approaches === Chomsky and Miller (1957) used the pumping lemma: they guess a part v of an input string uvw and try to build a corresponding cycle into the automaton to be learned; using membership queries they ask, for appropriate k, which of the strings uw, uvvw, uvvvw, ..., uvkw also belongs to the language to be learned, thereby refining the structure of their automaton. In 1959, Solomonoff generalized this approach to context-free languages, which also obey a pumping lemma. === Cover automata === Câmpeanu et al. learn a finite automaton as a compact representation of a large finite language. Given such a language F, they search a so-called cover automaton A such that its language L(A) covers F in the following sense: L(A) ∩ Σ≤ l = F, where l is the length of the longest string in F, and Σ≤ l denotes the set of all strings not longer than l. If such a cover automaton exists, F is uniquely determined by A and l. For example, F = {ad, read, reread } has l = 6 and a cover automaton corresponding to the regular expression (r⋅e)⋅a⋅d. For two strings x and y, Câmpeanu et al. define x ~ y if xz ∈ F ⇔ yz ∈ F for all strings z of a length such that both xz and yz are not longer than l. Based on this relation, whose lack of transitivity causes considerable technical problems, they give an O(n4) algorithm to construct from F a cover automaton A of minimal state count. Moreover, for union, intersection, and difference of two finite languages they provide corresponding operations on their cover automata. Păun et al. improve the time complexity to O(n2). === Residual automata === For a set S of strings and a string u, the Brzozowski derivative u−1S is defined as the set of all rest-strings obtainable from a string in S by cutting off its prefix u (if possible), formally: u−1S = {v ∈ Σ: uv ∈ S}, cf. picture. Denis et al. define a

    Read more →
  • Yooreeka

    Yooreeka

    Yooreeka is a library for data mining, machine learning, soft computing, and mathematical analysis. The project started with the code of the book "Algorithms of the Intelligent Web". Although the term "Web" prevailed in the title, in essence, the algorithms are valuable in any software application. It covers all major algorithms and provides many examples. Yooreeka 2.x is licensed under the Apache License rather than the somewhat more restrictive LGPL (which was the license of v1.x). The library is written 100% in the Java language. == Algorithms == The following algorithms are covered: Clustering Hierarchical—Agglomerative (e.g. MST single link; ROCK) and Divisive Partitional (e.g. k-means) Classification Bayesian Decision trees Neural Networks Rule based (via Drools) Recommendations Collaborative filtering Content based Search PageRank DocRank Personalization

    Read more →
  • Vulnerability assessment (computing)

    Vulnerability assessment (computing)

    Vulnerability assessment is a process of defining, identifying and classifying the security holes in information technology systems. An attacker can exploit a vulnerability to violate the security of a system. Some known vulnerabilities are Authentication Vulnerability, Authorization Vulnerability and Input Validation Vulnerability. == Purpose == Before deploying a system, it first must go through from a series of vulnerability assessments that will ensure that the build system is secure from all the known security risks. When a new vulnerability is discovered, the system administrator can again perform an assessment, discover which modules are vulnerable, and start the patch process. After the fixes are in place, another assessment can be run to verify that the vulnerabilities were actually resolved. This cycle of assess, patch, and re-assess has become the standard method for many organizations to manage their security issues. The primary purpose of the assessment is to find the vulnerabilities in the system, but the assessment report conveys to stakeholders that the system is secured from these vulnerabilities. If an intruder gained access to a network consisting of vulnerable Web servers, it is safe to assume that he gained access to those systems as well. Because of assessment report, the security administrator will be able to determine how intrusion occurred, identify compromised assets and take appropriate security measures to prevent critical damage to the system. == Assessment types == Depending on the system a vulnerability assessment can have many types and level. === Host assessment === A host assessment looks for system-level vulnerabilities such as insecure file permissions, application level bugs, backdoor and Trojan horse installations. It requires specialized tools for the operating system and software packages being used, in addition to administrative access to each system that should be tested. Host assessment is often very costly in term of time, and thus is only used in the assessment of critical systems. Tools like COPS and Tiger are popular in host assessment. === Network assessment === In a network assessment one assess the network for known vulnerabilities. It locates all systems on a network, determines what network services are in use, and then analyzes those services for potential vulnerabilities. This process does not require any configuration changes on the systems being assessed. Unlike host assessment, network assessment requires little computational cost and effort. == Vulnerability assessment vs penetration testing == Vulnerability assessment and penetration testing are two different testing methods. They are differentiated on the basis of certain specific parameters. == Regulatory requirements == Vulnerability assessments are mandated or strongly recommended by several regulatory frameworks. In the United States healthcare sector, the Health Insurance Portability and Accountability Act (HIPAA) Security Rule requires covered entities to conduct periodic evaluations of their security posture, and a December 2024 Notice of Proposed Rulemaking would explicitly require vulnerability scanning at least every six months for systems containing electronic protected health information. The Payment Card Industry Data Security Standard (PCI DSS) requires quarterly vulnerability scans for organizations that process credit card transactions, and the NIST Cybersecurity Framework includes vulnerability assessment as a core component of its Identify function.

    Read more →
  • LPBoost

    LPBoost

    Linear Programming Boosting (LPBoost) is a supervised classifier from the boosting family of classifiers. LPBoost maximizes a margin between training samples of different classes, and thus also belongs to the class of margin classifier algorithms. Consider a classification function f : X → { − 1 , 1 } , {\displaystyle f:{\mathcal {X}}\to \{-1,1\},} which classifies samples from a space X {\displaystyle {\mathcal {X}}} into one of two classes, labelled 1 and -1, respectively. LPBoost is an algorithm for learning such a classification function, given a set of training examples with known class labels. LPBoost is a machine learning technique especially suited for joint classification and feature selection in structured domains. == LPBoost overview == As in all boosting classifiers, the final classification function is of the form f ( x ) = ∑ j = 1 J α j h j ( x ) , {\displaystyle f({\boldsymbol {x}})=\sum _{j=1}^{J}\alpha _{j}h_{j}({\boldsymbol {x}}),} where α j {\displaystyle \alpha _{j}} are non-negative weightings for weak classifiers h j : X → { − 1 , 1 } {\displaystyle h_{j}:{\mathcal {X}}\to \{-1,1\}} . Each individual weak classifier h j {\displaystyle h_{j}} may be just a little bit better than random, but the resulting linear combination of many weak classifiers can perform very well. LPBoost constructs f {\displaystyle f} by starting with an empty set of weak classifiers. Iteratively, a single weak classifier to add to the set of considered weak classifiers is selected, added and all the weights α {\displaystyle {\boldsymbol {\alpha }}} for the current set of weak classifiers are adjusted. This is repeated until no weak classifiers to add remain. The property that all classifier weights are adjusted in each iteration is known as totally-corrective property. Early boosting methods, such as AdaBoost do not have this property and converge slower. == Linear program == More generally, let H = { h ( ⋅ ; ω ) | ω ∈ Ω } {\displaystyle {\mathcal {H}}=\{h(\cdot ;\omega )|\omega \in \Omega \}} be the possibly infinite set of weak classifiers, also termed hypotheses. One way to write down the problem LPBoost solves is as a linear program with infinitely many variables. The primal linear program of LPBoost, optimizing over the non-negative weight vector α {\displaystyle {\boldsymbol {\alpha }}} , the non-negative vector ξ {\displaystyle {\boldsymbol {\xi }}} of slack variables and the margin ρ {\displaystyle \rho } is the following. min α , ξ , ρ − ρ + D ∑ n = 1 ℓ ξ n sb.t. ∑ ω ∈ Ω y n α ω h ( x n ; ω ) + ξ n ≥ ρ , n = 1 , … , ℓ , ∑ ω ∈ Ω α ω = 1 , ξ n ≥ 0 , n = 1 , … , ℓ , α ω ≥ 0 , ω ∈ Ω , ρ ∈ R . {\displaystyle {\begin{array}{cl}{\underset {{\boldsymbol {\alpha }},{\boldsymbol {\xi }},\rho }{\min }}&-\rho +D\sum _{n=1}^{\ell }\xi _{n}\\{\textrm {sb.t.}}&\sum _{\omega \in \Omega }y_{n}\alpha _{\omega }h({\boldsymbol {x}}_{n};\omega )+\xi _{n}\geq \rho ,\qquad n=1,\dots ,\ell ,\\&\sum _{\omega \in \Omega }\alpha _{\omega }=1,\\&\xi _{n}\geq 0,\qquad n=1,\dots ,\ell ,\\&\alpha _{\omega }\geq 0,\qquad \omega \in \Omega ,\\&\rho \in {\mathbb {R} }.\end{array}}} Note the effects of slack variables ξ ≥ 0 {\displaystyle {\boldsymbol {\xi }}\geq 0} : their one-norm is penalized in the objective function by a constant factor D {\displaystyle D} , which—if small enough—always leads to a primal feasible linear program. Here we adopted the notation of a parameter space Ω {\displaystyle \Omega } , such that for a choice ω ∈ Ω {\displaystyle \omega \in \Omega } the weak classifier h ( ⋅ ; ω ) : X → { − 1 , 1 } {\displaystyle h(\cdot ;\omega ):{\mathcal {X}}\to \{-1,1\}} is uniquely defined. When the above linear program was first written down in early publications about boosting methods it was disregarded as intractable due to the large number of variables α {\displaystyle {\boldsymbol {\alpha }}} . Only later it was discovered that such linear programs can indeed be solved efficiently using the classic technique of column generation. === Column generation for LPBoost === In a linear program a column corresponds to a primal variable. Column generation is a technique to solve large linear programs. It typically works in a restricted problem, dealing only with a subset of variables. By generating primal variables iteratively and on-demand, eventually the original unrestricted problem with all variables is recovered. By cleverly choosing the columns to generate the problem can be solved such that while still guaranteeing the obtained solution to be optimal for the original full problem, only a small fraction of columns has to be created. ==== LPBoost dual problem ==== Columns in the primal linear program corresponds to rows in the dual linear program. The equivalent dual linear program of LPBoost is the following linear program. max λ , γ γ sb.t. ∑ n = 1 ℓ y n h ( x n ; ω ) λ n + γ ≤ 0 , ω ∈ Ω , 0 ≤ λ n ≤ D , n = 1 , … , ℓ , ∑ n = 1 ℓ λ n = 1 , γ ∈ R . {\displaystyle {\begin{array}{cl}{\underset {{\boldsymbol {\lambda }},\gamma }{\max }}&\gamma \\{\textrm {sb.t.}}&\sum _{n=1}^{\ell }y_{n}h({\boldsymbol {x}}_{n};\omega )\lambda _{n}+\gamma \leq 0,\qquad \omega \in \Omega ,\\&0\leq \lambda _{n}\leq D,\qquad n=1,\dots ,\ell ,\\&\sum _{n=1}^{\ell }\lambda _{n}=1,\\&\gamma \in \mathbb {R} .\end{array}}} For linear programs the optimal value of the primal and dual problem are equal. For the above primal and dual problems, the optimal value is equal to the negative 'soft margin'. The soft margin is the size of the margin separating positive from negative training instances minus positive slack variables that carry penalties for margin-violating samples. Thus, the soft margin may be positive although not all samples are linearly separated by the classification function. The latter is called the 'hard margin' or 'realized margin'. ==== Convergence criterion ==== Consider a subset of the satisfied constraints in the dual problem. For any finite subset we can solve the linear program and thus satisfy all constraints. If we could prove that of all the constraints which we did not add to the dual problem no single constraint is violated, we would have proven that solving our restricted problem is equivalent to solving the original problem. More formally, let γ ∗ {\displaystyle \gamma ^{}} be the optimal objective function value for any restricted instance. Then, we can formulate a search problem for the 'most violated constraint' in the original problem space, namely finding ω ∗ ∈ Ω {\displaystyle \omega ^{}\in \Omega } as ω ∗ = argmax ω ∈ Ω ∑ n = 1 ℓ y n h ( x n ; ω ) λ n . {\displaystyle \omega ^{}={\underset {\omega \in \Omega }{\textrm {argmax}}}\sum _{n=1}^{\ell }y_{n}h({\boldsymbol {x}}_{n};\omega )\lambda _{n}.} That is, we search the space H {\displaystyle {\mathcal {H}}} for a single decision stump h ( ⋅ ; ω ∗ ) {\displaystyle h(\cdot ;\omega ^{})} maximizing the left hand side of the dual constraint. If the constraint cannot be violated by any choice of decision stump, none of the corresponding constraint can be active in the original problem and the restricted problem is equivalent. ==== Penalization constant ==== D {\displaystyle D} The positive value of penalization constant D {\displaystyle D} has to be found using model selection techniques. However, if we choose D = 1 ℓ ν {\displaystyle D={\frac {1}{\ell \nu }}} , where ℓ {\displaystyle \ell } is the number of training samples and 0 < ν < 1 {\displaystyle 0<\nu <1} , then the new parameter ν {\displaystyle \nu } has the following properties. ν {\displaystyle \nu } is an upper bound on the fraction of training errors; that is, if k {\displaystyle k} denotes the number of misclassified training samples, then k ℓ ≤ ν {\displaystyle {\frac {k}{\ell }}\leq \nu } . ν {\displaystyle \nu } is a lower bound on the fraction of training samples outside or on the margin. == Algorithm == Input: Training set X = { x 1 , … , x ℓ } {\displaystyle X=\{{\boldsymbol {x}}_{1},\dots ,{\boldsymbol {x}}_{\ell }\}} , x i ∈ X {\displaystyle {\boldsymbol {x}}_{i}\in {\mathcal {X}}} Training labels Y = { y 1 , … , y ℓ } {\displaystyle Y=\{y_{1},\dots ,y_{\ell }\}} , y i ∈ { − 1 , 1 } {\displaystyle y_{i}\in \{-1,1\}} Convergence threshold θ ≥ 0 {\displaystyle \theta \geq 0} Output: Classification function f : X → { − 1 , 1 } {\displaystyle f:{\mathcal {X}}\to \{-1,1\}} Initialization Weights, uniform λ n ← 1 ℓ , n = 1 , … , ℓ {\displaystyle \lambda _{n}\leftarrow {\frac {1}{\ell }},\quad n=1,\dots ,\ell } Edge γ ← 0 {\displaystyle \gamma \leftarrow 0} Hypothesis count J ← 1 {\displaystyle J\leftarrow 1} Iterate h ^ ← argmax ω ∈ Ω ∑ n = 1 ℓ y n h ( x n ; ω ) λ n {\displaystyle {\hat {h}}\leftarrow {\underset {\omega \in \Omega }{\textrm {argmax}}}\sum _{n=1}^{\ell }y_{n}h({\boldsymbol {x}}_{n};\omega )\lambda _{n}} if ∑ n = 1 ℓ y n h ^ ( x n ) λ n + γ ≤ θ {\displaystyle \sum _{n=1}^{\ell }y_{n}{\hat {h}}({\boldsymbol {x}}_{n})\lambda _{n}+\gamma \leq \theta } then break h J ← h ^ {\displaystyle h_{J}\leftarrow {\hat {h}}} J

    Read more →
  • World Programming System

    World Programming System

    The World Programming System, also known as WPS Analytics or WPS, is a software product developed by a company called World Programming (acquired by Altair Engineering). WPS Analytics supports users of mixed ability to access and process data and to perform data science tasks. It has interactive visual programming tools using data workflows, and it has coding tools supporting the use of the SAS language mixed with Python, R and SQL. == About == WPS can use programs written in the language of SAS without the need for translating them into any other language. In this regard WPS is compatible with the SAS system. WPS has a built-in language interpreter able to process the language of SAS and produce similar results. WPS is available to run on z/OS, Windows, macOS, Linux (x86, Armv8 64-bit, IBM Power LE, IBM Z), and AIX. On all supported platforms, programs written in the language of SAS can be executed from a WPS command line interface, often referred to as running in batch mode. WPS can also be used from a graphical user interface known as the WPS Workbench for managing, editing and running programs written in the language of SAS. The WPS Workbench user interface is based on Eclipse. WPS version 4 (released in March 2018) introduced a drag-and-drop workflow canvas providing interactive blocks for data retrieval, blending and preparation, data discovery and profiling, predictive modelling powered by machine learning algorithms, model performance validation and scorecards. WPS version 3 (released in February 2012) provided a new client/server architecture that allows the WPS Workbench GUI to execute SAS programs on remote server installations of WPS in a network or cloud. The resulting output, data sets, logs, etc., can then all be viewed and manipulated from inside the Workbench as if the workloads had been executed locally. SAS programs do not require any special language statements to use this feature. == Summary of main features == Runs on Windows, macOS, z/OS, Linux (x86, Armv8 64-bit, IBM Power LE, IBM Z), and AIX An integrated development environment based on Eclipse for Linux, macOS and Windows. Support for language of SAS elements. Support for the language of SAS Macros. Matrix Programming support using PROC IML. Support for generating band plots, bar charts, box plots, bubble plots, contour plots, dendrogram plots, ellipse plots, fringe plots, heat maps, high-low plots, histograms, loess plots, needle plots, pie charts, penalised b-spline, radar charts, reference lines, scatter plots, series plots, step plots, regression plots and vector plots. Support for statistical procedures ACECLUS, ASSOCRULES, ANOVA, BIN, BOXPLOT, CANCORR, CANDISC, CLUSTER, CORRESP, DISCRIM, DISTANCE, FACTOR, FASTCLUS, FREQ, GAM, GANNO, GENMOD, GLIMMIX, GLM, GLMMOD, GLMSELECT, ICLIFETEST, KDE, LIFEREG, LIFETEST, LOESS, LOGISTIC, MDS, MEANS, MI, MIANALYSE, MIXED, MODECLUS, NESTED, NLIN, NPAR1WAY, PHREG, PLAN, PLS, POWER, PRINCOMP, PROBIT, QUANTREG, RBF, REG, ROBUSTREG, RSREG, SCORE, SEGMENT, SIMNORMAL, STANDARD, STDSIZE, STDRATE, STEPDISC, SUMMARY, SURVEYMEANS, SURVEYSELECT, TPSPLINE, TRANSREG, TREE, TTEST, UNIVARIATE, VARCLUS, VARCOMP Support for time series procedures ARIMA, AUTOREG, ESM, EXPAND, FORECAST, LOAN, SEVERITY, SPECTRA, TIMESERIES, X12 Support for machine learning procedures DECISIONFOREST, DECISIONTREE, GMM, MLP, OPTIMALBIN, SEGMENT, SVM Support for ODS. Reads and writes SAS datasets (compressed or uncompressed). Access: Actian Matrix (previously known as ParAccel), DASD, DB2, Excel, Greenplum, Hadoop, Informix, Kognitio Archived 2012-08-24 at the Wayback Machine, MariaDB, MySQL, Netezza, ODBC, OLEDB, Oracle, PostgreSQL, SAND, Snowflake, SPSS/PSPP, SQL Server, Sybase, Sybase IQ, Teradata, VSAM, Vertica and XML. Support for SAS Tape Format. Direct output of reports to CSV, PDF and HTML. Support to connect WPS systems programmatically, remote submit parts of a program to execute on connected remote servers, upload and download data between the connected systems. Support for Hadoop Support for R Support for Python == Industry recognition == Gartner recognized World Programming in their Cool Vendors in Data Science, 2014 Report. == Lawsuit == In 2010 World Programming defended its use of the language of SAS in the High Court of England and Wales in SAS Institute Inc. v World Programming Ltd. The software was the subject of a lawsuit by SAS Institute. The EU Court of Justice ruled in favor of World Programming, stating that the copyright protection does not extend to the software functionality, the programming language used and the format of the data files used by the program. It stated that there is no copyright infringement when a company which does not have access to the source code of a program studies, observes and tests that program to create another program with the same functionality.

    Read more →