A virtual data room (sometimes called a VDR or Deal Room) is an online repository of information that is used for the storing and distribution of documents. In many cases, a virtual data room is used to facilitate the due diligence process during an M&A transaction, loan syndication, or private equity and venture capital transactions. This due diligence process has traditionally used a physical data room to accomplish the disclosure of documents. For reasons of cost, efficiency and security, virtual data rooms have widely replaced the more traditional physical data room. A virtual data room is an extranet to which the bidders and their advisers are given access via the internet. An extranet is essentially a website with limited controlled access, using a secure log-on supplied by the vendor, which can be disabled at any time, by the vendor, if a bidder withdraws. Much of the information released is confidential and restrictions are applied to the viewer's ability to release this to third parties (by means of forwarding, copying or printing). This can be effectively applied to protect the data using digital rights management. The virtual data room provides access to secure documents for authorized users through a dedicated web site, or through secure agent applications. In the process of mergers and acquisitions the data room is set up as part of the central repository of data relating to companies or divisions being acquired or sold. The data room enables the interested parties to view information relating to the business in a controlled environment where confidentiality can be preserved. Conventionally this was achieved by establishing a supervised, physical data room in secure premises with controlled access. In most cases, with a physical data room, only one bidder team can access the room at a time. A virtual data room is designed to have the same advantages as a conventional data room (controlling access, viewing, copying and printing, etc.) 
with fewer disadvantages. Due to their increased efficiency, many businesses and industries have moved to using virtual data rooms instead of physical data rooms. In 2006, a spokesperson for a company which sets up virtual deal rooms was reported as claiming that virtual deal rooms shortened the bidding process by about thirty days compared to physical data rooms. In the process of startup fundraising, a virtual data room is set up to be a central location for key data, documents, and financials. These are shared with venture capital and angel investors, allowing them to streamline due diligence. == Application == Any business dealing with private data can apply VDRs when secure transaction processing is required. This includes financial institutions that need to negotiate confidential customer information without involving third parties. VDRs have traditionally been used for IPOs and real estate asset management. Technology companies may use them to exchange and review code or confidential data needed for operations. The same is true for clients, who entrust their valuable code only to the most qualified people in the organisation. The code is not something that can be printed out and brought in a folder. It resides on a computer and must be used together. VDRs can find application in any business that manages data in the form of documents, especially law firms, financial advisers or the B2B sector. The latter work with documents that must always be handled and controlled confidentially, and it is difficult to store them securely when they are on a server that other people can access. In addition, in B2B, it is important to close the deal as quickly as possible: the average sales cycle is one to three months. A VDR can be compared to a locked filing cabinet where all those folders and documents are kept. 
It automates the mathematics of pricing to prevent revenue leakage, and initially integrates CRM to ensure accurate synchronisation of all account data, which is important for B2B in particular and sales in general. While virtual data rooms offer many advantages, they are not suitable for every industry. For example, some governments may decide to continue using physical data rooms for highly confidential information sharing. In such cases, the damage from potential cyberattacks and data breaches is judged to exceed the benefits offered by virtual data rooms, and the use of VDRs is not considered. Data breaches particularly affected the US healthcare system from March 2021 to March 2022: according to IBM Security, the average cost of a breach in the sector reached a record high of $10.1 million.
Identi.ca
identi.ca is a free and open-source social networking and blogging service based on the pump.io software, using the Activity Streams protocol. Identi.ca stopped accepting new registrations in 2013, but continues to operate alongside several other pump.io-based hosts provided by E14N which continue to accept new registrations. == Features == Identi.ca is similar to social networking sites like Facebook and Google+, allowing unlimited length status updates, rich text, and images. The Activity Streams protocol supports many kinds of activities such as games. OpenFarmGame is a prototype application for an Activity Streams-based game. Previous features from its StatusNet version such as hashtags, groups, and global search are not supported. == History == === StatusNet === The service received more than 8,000 registrations and 19,000 updates within the first 24 hours of publicly launching on July 2, 2008, and reached its 1,000,000th notice on November 4, 2008. In January 2009, identi.ca received investment funds from venture capital group Montreal Start Up. On March 30, 2009, Control Yourself (since renamed StatusNet Inc) announced that Identi.ca was to become part of a hosted microblogging service called status.net to be launched in May 2009. Status.net offers individual microblogs under a subdomain to be chosen by the customer. Identi.ca will remain a free service. All notices will be published under the Creative Commons Attribution 3.0 license by default, but paying customers will be free to choose a different license. Formerly based on StatusNet, a micro-blogging software package built on the OStatus specification (and earlier based on the OpenMicroBlogging specification), Identi.ca allowed users to send text updates (known as "notices") up to 140 characters long. While similar to Twitter in both concept and operation, Identi.ca/StatusNet provided many features not currently implemented by Twitter, including XMPP support and personal tag clouds. 
In addition, Identi.ca/StatusNet allowed free export and exchange of personal and "friend" data based on the FOAF standard; therefore, notices could be fed into a Twitter account or other service, and also imported into a private system similar to Yammer. === pump.io === Developer Evan Prodromou chose to move the site to the pump.io software platform, then in development, because pump.io offered more features and was technically more advanced. Registration on Identi.ca was closed in December 2012 in preparation for the switch to pump.io software (the popularity of Identi.ca and "official" Status.net hosting were considered a hindrance to the creation of a federated social network). The conversion was completed on 12 July 2013. The limit of 140 characters per post was removed (in StatusNet, it was a setting, not an inherent limitation); posts can now contain formatting and images. Groups, hashtags, and a page listing popular posts are not yet implemented in pump.io.
Parity benchmark
Parity problems are widely used as benchmark problems in genetic programming but were inherited from the artificial neural network community. Parity is calculated by summing all the binary inputs and reporting whether the sum is odd or even. This is considered difficult because a very simple artificial neural network cannot solve it, and because all inputs need to be considered: a change to any one of them changes the answer.
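
The parity calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not a benchmark harness; the function names `parity`, `flip` and `truth_table` are hypothetical and chosen only for this example.

```python
from itertools import product


def parity(bits):
    """Return 1 if the sum of the binary inputs is odd, else 0."""
    return sum(bits) % 2


def flip(bits, i):
    """Return a copy of bits with input i inverted."""
    flipped = list(bits)
    flipped[i] ^= 1
    return flipped


def truth_table(n):
    """Enumerate all 2**n input patterns with their parity (the full benchmark set)."""
    return [(list(bits), parity(bits)) for bits in product([0, 1], repeat=n)]


if __name__ == "__main__":
    for bits, target in truth_table(3):
        print(bits, "->", target)
```

Flipping any single input with `flip` always changes the result of `parity`, which is the property that forces a learner to take every input into account.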
Elastic net regularization
In statistics and, in particular, in the fitting of linear or logistic regression models, the elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods. Elastic net regularization is typically more accurate than either method with regard to reconstruction. == Specification == The elastic net method overcomes the limitations of the LASSO (least absolute shrinkage and selection operator) method, which uses a penalty function based on $\|\beta\|_{1} = \sum_{j=1}^{p} |\beta_{j}|$. Use of this penalty function has several limitations. For example, in the "large p, small n" case (high-dimensional data with few examples), the LASSO selects at most n variables before it saturates. Also, if there is a group of highly correlated variables, then the LASSO tends to select one variable from the group and ignore the others. To overcome these limitations, the elastic net adds a quadratic part ($\|\beta\|^{2}$) to the penalty, which when used alone is ridge regression (known also as Tikhonov regularization). The estimates from the elastic net method are defined by $\hat{\beta} \equiv \operatorname{argmin}_{\beta} \left( \|y - X\beta\|^{2} + \lambda_{2}\|\beta\|^{2} + \lambda_{1}\|\beta\|_{1} \right)$. The quadratic penalty term makes the loss function strongly convex, and it therefore has a unique minimum. The elastic net method includes the LASSO and ridge regression: in other words, each of them is a special case where $\lambda_{1} = \lambda, \lambda_{2} = 0$ or $\lambda_{1} = 0, \lambda_{2} = \lambda$. 
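
To make the objective above concrete, the following pure-Python sketch minimizes $\|y - X\beta\|^{2} + \lambda_{2}\|\beta\|^{2} + \lambda_{1}\|\beta\|_{1}$ by cyclical coordinate descent with soft-thresholding. It is an illustration under stated assumptions, not the algorithm of any particular package: the function names (`soft_threshold`, `objective`, `elastic_net_cd`), the fixed number of sweeps and the absence of an intercept or feature standardization are all choices made for this example. Setting the derivative with respect to a single coefficient to zero gives the update $\beta_{j} = S(x_{j}^{\top} r_{j}, \lambda_{1}/2) / (x_{j}^{\top} x_{j} + \lambda_{2})$, where $r_{j}$ is the residual that ignores feature $j$ and $S$ is the soft-threshold operator.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def objective(X, y, beta, lam1, lam2):
    """Elastic net objective: squared error + lam2 * ||beta||^2 + lam1 * ||beta||_1."""
    sq_err = sum(
        (yi - sum(xij * bj for xij, bj in zip(row, beta))) ** 2
        for row, yi in zip(X, y)
    )
    return sq_err + lam2 * sum(b * b for b in beta) + lam1 * sum(abs(b) for b in beta)


def elastic_net_cd(X, y, lam1, lam2, n_sweeps=200):
    """Cyclical coordinate descent on the elastic net objective (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0      # x_j . r_j, with r_j the residual excluding feature j
            norm_sq = 0.0  # x_j . x_j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm_sq += X[i][j] ** 2
            denom = norm_sq + lam2
            beta[j] = soft_threshold(rho, lam1 / 2.0) / denom if denom > 0 else 0.0
    return beta


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0]]
    y = [2.0, 0.0]
    for lam1, lam2 in [(0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)]:
        beta = elastic_net_cd(X, y, lam1, lam2)
        print(lam1, lam2, beta, objective(X, y, beta, lam1, lam2))
```

With $\lambda_{1} = 0$ the update reduces to ridge regression, and with $\lambda_{2} = 0$ it reduces to the LASSO, mirroring the two special cases stated above.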
Meanwhile, the naive version of the elastic net method finds an estimator in a two-stage procedure: first, for each fixed $\lambda_{2}$, it finds the ridge regression coefficients, and then it performs a LASSO-type shrinkage. This kind of estimation incurs a double amount of shrinkage, which leads to increased bias and poor predictions. To improve the prediction performance, the coefficients of the naive version of the elastic net are sometimes rescaled by multiplying the estimated coefficients by $(1 + \lambda_{2})$. Examples of where the elastic net method has been applied are: support vector machines, metric learning, portfolio optimization, and cancer prognosis. == Reduction to support vector machine == It was proven in 2014 that the elastic net can be reduced to the linear support vector machine. A similar reduction was previously proven for the LASSO in 2014. The authors showed that for every instance of the elastic net, an artificial binary classification problem can be constructed such that the hyper-plane solution of a linear support vector machine (SVM) is identical to the solution $\beta$ (after re-scaling). The reduction immediately enables the use of highly optimized SVM solvers for elastic net problems. It also enables the use of GPU acceleration, which is often already used for large-scale SVM solvers. The reduction is a simple transformation of the original data and regularization constants $X \in \mathbb{R}^{n \times p}, y \in \mathbb{R}^{n}, \lambda_{1} \geq 0, \lambda_{2} \geq 0$ into new artificial data instances and a regularization constant that specify a binary classification problem and the SVM regularization constant $X_{2} \in \mathbb{R}^{2p \times n}, y_{2} \in \{-1, 1\}^{2p}, C \geq 0$. Here, $y_{2}$ consists of binary labels $-1, 1$. 
When $2p > n$ it is typically faster to solve the linear SVM in the primal, whereas otherwise the dual formulation is faster. Some authors have referred to the transformation as Support Vector Elastic Net (SVEN) and provided MATLAB pseudo-code for it. == Software == "Glmnet: Lasso and elastic-net regularized generalized linear models" is software implemented as an R source package and as a MATLAB toolbox. It includes fast algorithms for estimation of generalized linear models with ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two penalties (the elastic net) using cyclical coordinate descent, computed along a regularization path. JMP Pro 11 includes elastic net regularization, using the Generalized Regression personality with Fit Model. "pensim: Simulation of high-dimensional data and parallelized repeated penalized regression" implements an alternate, parallelised "2D" tuning method of the ℓ parameters, a method claimed to result in improved prediction accuracy. scikit-learn includes linear regression and logistic regression with elastic net regularization. SVEN is a Matlab implementation of Support Vector Elastic Net. This solver reduces the Elastic Net problem to an instance of SVM binary classification and uses a Matlab SVM solver to find the solution. Because SVM is easily parallelizable, the code can be faster than Glmnet on modern hardware. SpaSM is a Matlab implementation of sparse regression, classification and principal component analysis, including elastic net regularized regression. Apache Spark provides support for Elastic Net Regression in its MLlib machine learning library. The method is available as a parameter of the more general LinearRegression class. In SAS, the procedure Glmselect and the SAS Viya procedure Regselect support the use of elastic net regularization for model selection.
Multi-label classification
In machine learning, multi-label classification or multi-output classification is a variant of the classification problem where multiple nonexclusive labels may be assigned to each instance. Multi-label classification is a generalization of multiclass classification, which is the single-label problem of categorizing instances into precisely one of several (greater than or equal to two) classes. In the multi-label problem the labels are nonexclusive and there is no constraint on how many of the classes the instance can be assigned to. The formulation of multi-label learning was first introduced by Shen et al. in the context of Semantic Scene Classification, and later gained popularity across various areas of machine learning. Formally, multi-label classification is the problem of finding a model that maps inputs x to binary vectors y; that is, it assigns a value of 0 or 1 for each element (label) in y. == Problem transformation methods == Several problem transformation methods exist for multi-label classification, and can be roughly broken down into: === Transformation into binary classification problems === The baseline approach, called the binary relevance method, amounts to independently training one binary classifier for each label. Given an unseen sample, the combined model then predicts all labels for this sample for which the respective classifiers predict a positive result. Although this method of dividing the task into multiple binary tasks may resemble superficially the one-vs.-all (OvA) and one-vs.-rest (OvR) methods for multiclass classification, it is essentially different from both, because a single classifier under binary relevance deals with a single label, without any regard to other labels whatsoever. A classifier chain is an alternative method for transforming a multi-label classification problem into several binary classification problems. 
It differs from binary relevance in that labels are predicted sequentially, and the output of all previous classifiers (i.e. positive or negative for a particular label) is input as features to subsequent classifiers. Classifier chains have been applied, for instance, in HIV drug resistance prediction. Bayesian networks have also been applied to optimally order the classifiers in classifier chains. When the problem is transformed into multiple binary classifications, the likelihood function reads $L = \prod_{i=1}^{n} \left( \prod_{k} \left( \prod_{j_{k}} p_{k,j_{k}}(x_{i})^{\delta_{y_{i,k}, j_{k}}} \right) \right)$, where index $i$ runs over the samples, index $k$ runs over the labels, $j_{k}$ indicates the binary outcomes 0 or 1, $\delta_{a,b}$ indicates the Kronecker delta, and $y_{i,k} \in \{0, 1\}$ indicates the multi-hot encoded labels of sample $i$. === Transformation into multi-class classification problem === The label powerset (LP) transformation treats every label combination present in the training set as a single class of one multi-class classification problem. For example, if possible labels for an example were A, B, and C, the label powerset representation of this problem is a multi-class classification problem with the classes [0 0 0], [1 0 0], [0 1 0], [0 0 1], [1 1 0], [1 0 1], [0 1 1], and [1 1 1], where for example [1 0 1] denotes an example where labels A and C are present and label B is absent. === Ensemble methods === A set of multi-class classifiers can be used to create a multi-label ensemble classifier. For a given example, each classifier outputs a single class (corresponding to a single label in the multi-label problem). 
These predictions are then combined by an ensemble method, usually a voting scheme where every class that receives a requisite percentage of votes from individual classifiers (often referred to as the discrimination threshold) is predicted as a present label in the multi-label output. However, more complex ensemble methods exist, such as committee machines. Another variation is the random k-labelsets (RAKEL) algorithm, which uses multiple LP classifiers, each trained on a random subset of the actual labels; label prediction is then carried out by a voting scheme. A set of multi-label classifiers can be used in a similar way to create a multi-label ensemble classifier. In this case, each classifier votes once for each label it predicts rather than for a single label. == Adapted algorithms == Some classification algorithms/models have been adapted to the multi-label task, without requiring problem transformations. Examples include: k-nearest neighbors: the ML-kNN algorithm extends the k-NN classifier to multi-label data; decision trees: "Clare" is an adapted C4.5 algorithm for multi-label classification, in which the modification involves the entropy calculations, while MMC, MMDT, and SSC (a refined MMDT) can classify multi-labeled data based on multi-valued attributes without transforming the attributes into single values, and are also named multi-valued and multi-labeled decision tree classification methods; kernel methods for vector output; neural networks: BP-MLL is an adaptation of the popular back-propagation algorithm for multi-label learning. == Learning paradigms == Based on learning paradigms, the existing multi-label classification techniques can be classified into batch learning and online machine learning. Batch learning algorithms require all the data samples to be available beforehand. They train the model using the entire training data and then predict the test sample using the found relationship. 
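
Two of the ideas described in the preceding sections, the label powerset transformation and threshold-based voting over multi-label predictions, can be sketched in plain Python. This is a minimal illustration only: the function names `label_powerset_encode` and `vote`, the list-of-lists label format and the default threshold of 0.5 are assumptions made for the example, not part of any particular library.

```python
def label_powerset_encode(Y):
    """Map each distinct label vector in Y to one multi-class id.

    Returns the encoded class ids and a table to decode ids back to label vectors.
    Only label combinations that actually occur in Y receive a class.
    """
    class_of = {}
    encoded = []
    for row in Y:
        key = tuple(row)
        if key not in class_of:
            class_of[key] = len(class_of)
        encoded.append(class_of[key])
    decode = {cid: list(key) for key, cid in class_of.items()}
    return encoded, decode


def vote(predictions, threshold=0.5):
    """Combine multi-label predictions from several classifiers.

    A label is predicted as present when the fraction of classifiers voting
    for it reaches the discrimination threshold.
    """
    n_models = len(predictions)
    n_labels = len(predictions[0])
    return [
        1 if sum(p[k] for p in predictions) / n_models >= threshold else 0
        for k in range(n_labels)
    ]


if __name__ == "__main__":
    # Labels A, B, C in multi-hot form; [1, 0, 1] means A and C present.
    Y = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
    classes, decode = label_powerset_encode(Y)
    print(classes, decode)
    print(vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))
```

Any multi-class learner could then be trained on the encoded class ids, and its predictions mapped back to label vectors with the decode table; the choice of learner is left open here.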
The online learning algorithms, on the other hand, incrementally build their models in sequential iterations. In iteration $t$, an online algorithm receives a sample $x_{t}$ and predicts its label(s) $\hat{y}_{t}$ using the current model; the algorithm then receives $y_{t}$, the true label(s) of $x_{t}$, and updates its model based on the sample-label pair $(x_{t}, y_{t})$. == Multi-label stream classification == Data streams are possibly infinite sequences of data that continuously and rapidly grow over time. Multi-label stream classification (MLSC) is the version of the multi-label classification task that takes place in data streams. It is sometimes also called online multi-label classification. The difficulties of multi-label classification (exponential number of possible label sets, capturing dependencies between labels) are combined with the difficulties of data streams (time and memory constraints, addressing an infinite stream with finite means, concept drifts). Many MLSC methods resort to ensemble methods in order to increase their predictive performance and deal with concept drifts. Below are the most widely used ensemble methods in the literature: Online Bagging (OzaBagging)-based methods: Since the number of copies K of a given data point in a bootstrap sample is approximately Poisson(1)-distributed for large datasets, each incoming data instance in a data stream can be weighted according to a draw from the Poisson(1) distribution to mimic bootstrapping in an online setting. This is called Online Bagging (OzaBagging). Many multi-label methods that use Online Bagging have been proposed in the literature, each of which utilizes different problem transformation methods. EBR, ECC, EPS, EBRT, EBMT, ML-Random Rules are examples of such methods. ADWIN Bagging-based methods: Online Bagging methods for MLSC are sometimes combined with explicit concept drift detection mechanisms such as ADWIN (Adaptive Window). 
ADWIN keeps a variable-sized window to detect changes in the distribution of the data, and improves the ensemble by resetting the components that perform poorly when there is a drift in the incoming data. Generally, the letter 'a' is used as a subscript in the name of such ensembles to indicate the usage of ADWIN change detector. EaBR, EaCC, EaHTPS are examples of such multi-label ensembles. GOOWE-ML-based methods: Interpreting the relevance scores of each component of the ensemble as vectors in the label space and solving a least squares problem at the end of each batch, Geometrically-Optimum Online-Weighted Ensemble for Multi-label Classification (GOOWE-ML) is proposed. The ensemble tries to minimize the distance between the weighted prediction of its components and the ground truth vector for each instance over a batch. Unlike Online Bagging and ADWIN Bagging, GOOWE-ML utilizes a weighted voting scheme where better performing components of the ensemble are given more weight. The GOOWE-ML ensemble grows over time, and the lowest weight component is replaced by a new component when it is full at the end of a batch. GOBR, GOCC, GOPS, GORT are the proposed GOOWE-ML-based multi-label ensembles. Multiple Windows : Here, BR models that use a sliding window are replaced with two windows for each label, one for relevant and one for non-relevant examples. Instances are oversampled or undersampled according to a load factor that is kept
Wispr
Wispr AI is a software company founded in 2021 by Tanay Kothari and Sahaj Garg that develops voice-based interfaces for computers and other devices. The company’s main product, Wispr Flow, is an AI-powered speech-to-text application available on macOS, Windows and iOS. == History == Wispr was founded in 2021 with the goal of building a non-invasive wearable device that would allow users to control smartphones without touch input. The device was intended to translate neurological signals into actions and to enable silent text entry by mouthing words, drawing on techniques similar to brain–computer interfaces. Early funding was directed toward this hardware-focused effort. After around three years of development, Wispr concluded that contemporary AI systems were not sufficient for the requirements of the wearable device. The company shifted its focus to Flow voice dictation software, the software layer originally built for the wearable, and in 2024 released a macOS application based on this platform. == Wispr Flow == Wispr Flow (often referred to as Flow) is a speech-to-text application for macOS, Windows and iOS. It provides real-time dictation and transcription in more than 100 languages and can operate across applications, including email clients, messaging platforms and chatbots. In June 2025 Wispr released an iOS version that functions as a third-party keyboard, allowing voice input in any app. == Technology == Wispr Flow is based on automatic speech recognition (ASR) and other AI models. The system adapts to individual users over time, learning their vocabulary and preferred style with the aim of reducing manual editing. Flow operates through configurable “Flow Sessions”, defined as time windows during which the app has access to the microphone; users can set session timeouts or disable automatic time limits. 
== Users and Adoption == Wispr initially targeted users such as venture capitalists, entrepreneurs and executives who process large volumes of text and often work in private or flexible environments. The user base later expanded via platforms such as Product Hunt to students, software developers, writers, lawyers and consultants. Flow has also been adopted by users with conditions such as ADHD, dyslexia, paralysis and carpal tunnel syndrome. About 40% of users are in the United States, 30% in Europe and the remaining 30% in other regions. More than 30% of users come from non-technical backgrounds. Flow supports 104 languages, with approximately 40% of dictations in English and 60% in other languages, including Spanish, French, German, Dutch, Hindi and Mandarin. Wispr has reported monthly user growth above 50%, a six-month active-user retention rate of about 80%, a payment rate around 19%, and revenue of approximately US$3.8 million between July 2024 and July 2025. == Development == Wispr has announced plans for an Android application and maintains waiting lists for Android, Linux and web versions of Flow. The company is developing shared-context features for teams so that the software can recognize common terminology within organizations and has stated that it aims to evolve Flow into a broader AI assistant for tasks such as messaging, note-taking and reminders. Wispr has also reported working with unnamed AI hardware partners on interaction layers for future devices. == Funding == In 2025 Wispr raised US$30 million in a Series A funding round led by Menlo Ventures, with participation from NEA, 8VC and several individual investors, including Evan Sharp and Henry Ward. Earlier investors include Neo, MVP Ventures and AIX Ventures. In November of that same year, the company raised a US$25 million Series A extension led by Notable Capital, with participation from Flight Fund, bringing its total funding to US$81 million. 
Wispr competes with other AI-based dictation and voice-input tools, including Aqua, Talktastic, Superwhisper and Betterdication.
Dimensionality reduction
Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Working in high-dimensional spaces can be undesirable for many reasons; raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable. Dimensionality reduction is common in fields that deal with large numbers of observations and/or large numbers of variables, such as signal processing, speech recognition, neuroinformatics, and bioinformatics. Methods are commonly divided into linear and nonlinear approaches. Linear approaches can be further divided into feature selection and feature extraction. Dimensionality reduction can be used for noise reduction, data visualization, cluster analysis, or as an intermediate step to facilitate other analyses. == Feature selection == The process of feature selection aims to find a suitable subset of the input variables (features, or attributes) for the task at hand. The three strategies are: the filter strategy (e.g., information gain), the wrapper strategy (e.g., accuracy-guided search), and the embedded strategy (features are added or removed while building the model based on prediction errors). Data analysis such as regression or classification can be done in the reduced space more accurately than in the original space. == Feature projection == Feature projection (also called feature extraction) transforms the data from the high-dimensional space to a space of fewer dimensions. The data transformation may be linear, as in principal component analysis (PCA), but many nonlinear dimensionality reduction techniques also exist. For multidimensional data, tensor representation can be used in dimensionality reduction through multilinear subspace learning. 
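
As a concrete illustration of a linear feature projection such as PCA, the following pure-Python sketch centers the data, builds the covariance matrix, finds its dominant eigenvector and projects the data onto it. It is a sketch under stated assumptions rather than a reference implementation: power iteration is swapped in for a full eigendecomposition, only the first component is computed, and the function names (`mean_center`, `covariance`, `leading_component`, `project`) are hypothetical.

```python
import math


def mean_center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already centered data."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]


def leading_component(cov, n_iter=500):
    """Dominant eigenvector of cov by power iteration (unit length).

    Assumption: the start vector is not orthogonal to the dominant
    eigenvector; a full eigendecomposition avoids this caveat.
    """
    d = len(cov)
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(centered, v):
    """One-dimensional coordinates of each centered row along direction v."""
    return [sum(r[j] * v[j] for j in range(len(v))) for r in centered]


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
    centered = mean_center(data)
    v = leading_component(covariance(centered))
    print(v, project(centered, v))
```

For data that lie exactly on a line, as in the usage example, the single retained coordinate reconstructs the centered data without loss; for real data, the discarded directions carry the remaining variance.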
=== Principal component analysis (PCA) === The main linear technique for dimensionality reduction, principal component analysis, performs a linear mapping of the data to a lower-dimensional space in such a way that the variance of the data in the low-dimensional representation is maximized. In practice, the covariance (and sometimes the correlation) matrix of the data is constructed and the eigenvectors of this matrix are computed. The eigenvectors that correspond to the largest eigenvalues (the principal components) can now be used to reconstruct a large fraction of the variance of the original data. Moreover, the first few eigenvectors can often be interpreted in terms of the large-scale physical behavior of the system, because they often contribute the vast majority of the system's energy, especially in low-dimensional systems. Still, this must be proved on a case-by-case basis as not all systems exhibit this behavior. The original space (with dimension of the number of points) has been reduced (with data loss, but hopefully retaining the most important variance) to the space spanned by a few eigenvectors. === Non-negative matrix factorization (NMF) === NMF decomposes a non-negative matrix into the product of two non-negative ones, and has been a promising tool in fields where only non-negative signals exist, such as astronomy. NMF has been well known since the multiplicative update rule of Lee & Seung, which has been continuously developed: the inclusion of uncertainties, the consideration of missing data and parallel computation, sequential construction which leads to the stability and linearity of NMF, as well as other updates including handling missing data in digital image processing. 
With a stable component basis during construction, and a linear modeling process, sequential NMF is able to preserve the flux in direct imaging of circumstellar structures in astronomy, one of the methods of detecting exoplanets, especially for circumstellar discs. In comparison with PCA, NMF does not remove the mean of the matrices, which leads to physical non-negative fluxes; therefore NMF is able to preserve more information than PCA, as demonstrated by Ren et al. === Kernel PCA === Principal component analysis can be employed in a nonlinear way by means of the kernel trick. The resulting technique, called kernel PCA, is capable of constructing nonlinear mappings that maximize the variance in the data. === Graph-based kernel PCA === Other prominent nonlinear techniques include manifold learning techniques such as Isomap, locally linear embedding (LLE), Hessian LLE, Laplacian eigenmaps, and methods based on tangent space analysis. These techniques assume that the high-dimensional input data lies near a low-dimensional manifold embedded in the ambient space, and construct a low-dimensional representation using a cost function that retains local properties of the data; they can be viewed as defining a graph-based kernel for kernel PCA. More recently, techniques have been proposed that, instead of defining a fixed kernel, try to learn the kernel using semidefinite programming. The most prominent example of such a technique is maximum variance unfolding (MVU). The central idea of MVU is to exactly preserve all pairwise distances between nearest neighbors (in the inner product space) while maximizing the distances between points that are not nearest neighbors. An alternative approach to neighborhood preservation is through the minimization of a cost function that measures differences between distances in the input and output spaces. 
Important examples of such techniques include: classical multidimensional scaling, which is identical to PCA; Isomap, which uses geodesic distances in the data space; diffusion maps, which use diffusion distances in the data space; t-distributed stochastic neighbor embedding (t-SNE), which minimizes the divergence between distributions over pairs of points; and curvilinear component analysis. A different approach to nonlinear dimensionality reduction is through the use of autoencoders, a special kind of feedforward neural networks with a bottleneck hidden layer. The training of deep encoders is typically performed using a greedy layer-wise pre-training (e.g., using a stack of restricted Boltzmann machines) that is followed by a finetuning stage based on backpropagation. === Linear discriminant analysis (LDA) === Linear discriminant analysis (LDA) is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. === Generalized discriminant analysis (GDA) === GDA deals with nonlinear discriminant analysis using kernel function operator. The underlying theory is close to the support-vector machines (SVM) insofar as the GDA method provides a mapping of the input vectors into high-dimensional feature space. Similar to LDA, the objective of GDA is to find a projection for the features into a lower dimensional space by maximizing the ratio of between-class scatter to within-class scatter. === Autoencoder === Autoencoders can be used to learn nonlinear dimension reduction functions and codings together with an inverse function from the coding to the original representation. === t-SNE === T-distributed Stochastic Neighbor Embedding (t-SNE) is a nonlinear dimensionality reduction technique useful for the visualization of high-dimensional datasets. 
It is not recommended for use in analysis such as clustering or outlier detection since it does not necessarily preserve densities or distances well. === UMAP === Uniform manifold approximation and projection (UMAP) is a nonlinear dimensionality reduction technique. Visually, it is similar to t-SNE, but it assumes that the data is uniformly distributed on a locally connected Riemannian manifold and that the Riemannian metric is locally constant or approximately locally constant. == Dimension reduction == For high-dimensional datasets, dimension reduction is usually performed prior to applying a k-nearest neighbors (k-NN) algorithm in order to mitigate the curse of dimensionality. Feature extraction and dimension reduction can be combined in one step, using principal component analysis (PCA), linear discriminant analysis (LDA), canonical correlation analysis (CCA), or non-negative matrix factorization (NMF) techniques to pre-process the data, followed by clustering via k-NN on feature vectors in a reduced-dimension space. In machine learning, this process is also called low-dimensional embedding. For high-dimensional datasets (e.g., when performing similarity search on live video streams, DNA data, or high-dimensional time series), running a fast approximate k-NN search using locality-sensitive hashing, random projection, "sketches", or other high-dimensional similarity search techniques from the VLDB conference toolbox may be the only fe