AI Email Helper


  • ViBe


    ViBe is a background subtraction algorithm that was presented at the IEEE ICASSP 2009 conference and refined in later publications. More precisely, it is a software module for extracting background information from moving images. It was developed by Olivier Barnich and Marc Van Droogenbroeck of the Montefiore Institute, University of Liège, Belgium. ViBe is patented: the patent covers aspects such as stochastic replacement, spatial diffusion, and non-chronological handling. ViBe is written in the C programming language and has been implemented on CPU, GPU and FPGA.

    == Technical description ==

    === Pixel model and classification process ===

    Many advanced techniques estimate the temporal probability density function (pdf) of a pixel x. ViBe takes a different approach: it restricts the influence of a value in the polychromatic space to its local neighborhood. In practice, ViBe does not estimate the pdf, but uses a set of previously observed sample values as the pixel model. To classify a value p_t(x), it is compared to its closest values among the set of samples.

    === Model update: Sample values lifespan policy ===

    ViBe ensures a smooth, exponentially decaying lifespan for the sample values that constitute the pixel models. This lets ViBe deal with concomitant events using a single model of reasonable size for each pixel. It is achieved by choosing at random which sample to replace when updating a pixel model. Once the sample to be discarded has been chosen, the new value replaces it. Because the discarded value is chosen at random, the pixel model that results from updating a given model with a given sample cannot be predicted.

    === Model update: Spatial Consistency ===

    To ensure the spatial consistency of the whole image model and to handle practical situations such as small camera movements or slowly evolving background objects, ViBe uses a technique similar to the one used for the updating process: it chooses at random, and updates, a pixel model in the neighborhood of the current pixel. Let N_G(x) and p(x) denote the spatial neighborhood of a pixel x and its value, respectively. If it has been decided to update the set of samples of x by inserting p(x), then ViBe also uses this value p(x) to update the set of samples of one of the pixels in the neighborhood N_G(x), chosen at random. As a result, ViBe produces spatially coherent results directly, without any post-processing method.

    === Model initialization ===

    Although the model could easily recover from any type of initialization, for example a set of random values, it is convenient to get an accurate background estimate as soon as possible. Ideally, a segmentation algorithm should be able to segment video sequences starting from the second frame, with the first frame used to initialize the model. Since no temporal information is available prior to the second frame, ViBe populates the pixel models with values found in the spatial neighborhood of each pixel; more precisely, it initializes the background model with values taken randomly from each pixel's neighborhood in the first frame. The background estimate is therefore valid starting from the second frame of a video sequence.
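    The steps described above (neighborhood-based initialization, classification against stored samples, random replacement, and propagation to a random neighbor) can be sketched in a few lines of Python. This is only an illustrative sketch for small grayscale frames stored as nested lists, not the reference C implementation. The function names, the 8-connected neighborhood, the matching rule (at least `min_matches` samples within `radius`), the update probability `1 / subsampling`, the default parameter values, and the choice to update only pixels classified as background are all assumptions that the text above does not state.

```python
import random

def neighbours(x, y, width, height):
    """8-connected neighbourhood of pixel (x, y), clipped to the image."""
    return [(x + dx, y + dy)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dx, dy) != (0, 0)
            and 0 <= x + dx < width and 0 <= y + dy < height]

def init_model(frame, n_samples, rng):
    """Fill every pixel model with values drawn at random from the pixel's
    spatial neighbourhood in the first frame (frame must be at least 2x2)."""
    height, width = len(frame), len(frame[0])
    model = []
    for y in range(height):
        row = []
        for x in range(width):
            ng = neighbours(x, y, width, height)
            picks = (rng.choice(ng) for _ in range(n_samples))
            row.append([frame[ny][nx] for nx, ny in picks])
        model.append(row)
    return model

def step(model, frame, rng, radius=20, min_matches=2, subsampling=16):
    """Classify one frame and update the model in place.

    Returns a mask with 1 for foreground pixels and 0 for background pixels.
    The default parameter values are illustrative assumptions.
    """
    height, width = len(frame), len(frame[0])
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            value = frame[y][x]
            # Compare the new value with the stored samples of this pixel.
            matches = sum(1 for s in model[y][x] if abs(s - value) < radius)
            if matches < min_matches:
                mask[y][x] = 1          # too few close samples: foreground
                continue
            # Assumed policy: update with probability 1 / subsampling.
            if rng.randrange(subsampling) == 0:
                # Replace a randomly chosen sample of this pixel's model.
                own = model[y][x]
                own[rng.randrange(len(own))] = value
                # Insert the same value into a randomly chosen neighbour's model.
                nx, ny = rng.choice(neighbours(x, y, width, height))
                other = model[ny][nx]
                other[rng.randrange(len(other))] = value
    return mask
```

    A typical use is to call `init_model` once on the first frame and then call `step` on every later frame. Because the sample to discard is picked uniformly at random, a stored sample has no fixed age limit, which is what produces the smoothly decaying lifespan described above.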

  • Random forest


    Random forests, or random decision forests, are an ensemble learning method for classification, regression and other tasks that works by creating a multitude of decision trees during training. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the output is the average of the predictions of the trees. Random forests correct for decision trees' habit of overfitting to their training set.

    The first algorithm for random decision forests was created in 1995 by Tin Kam Ho using the random subspace method, which, in Ho's formulation, is a way to implement the "stochastic discrimination" approach to classification proposed by Eugene Kleinberg. An extension of the algorithm was developed by Leo Breiman and Adele Cutler, who registered "Random Forests" as a trademark in 2006 (as of 2019, owned by Minitab, Inc.). The extension combines Breiman's "bagging" idea and random selection of features, introduced first by Ho and later independently by Amit and Geman, in order to construct a collection of decision trees with controlled variance.

    == History ==

    The general method of random decision forests was first proposed by Salzberg and Heath in 1993, with a method that used a randomized decision tree algorithm to create multiple trees and then combine them using majority voting. This idea was developed further by Ho in 1995. Ho established that forests of trees splitting with oblique hyperplanes can gain accuracy as they grow without suffering from overtraining, as long as the forests are randomly restricted to be sensitive to only selected feature dimensions. A subsequent work along the same lines concluded that other splitting methods behave similarly, as long as they are randomly forced to be insensitive to some feature dimensions. This observation that a more complex classifier (a larger forest) gets more accurate nearly monotonically is in sharp contrast to the common belief that the complexity of a classifier can only grow to a certain level of accuracy before being hurt by overfitting. The explanation of the forest method's resistance to overtraining can be found in Kleinberg's theory of stochastic discrimination.

    The early development of Breiman's notion of random forests was influenced by the work of Amit and Geman, who introduced the idea of searching over a random subset of the available decisions when splitting a node, in the context of growing a single tree. The idea of random subspace selection from Ho was also influential in the design of random forests. This method grows a forest of trees, and introduces variation among the trees by projecting the training data into a randomly chosen subspace before fitting each tree or each node. Finally, the idea of randomized node optimization, where the decision at each node is selected by a randomized procedure rather than a deterministic optimization, was first introduced by Thomas G. Dietterich.

    The proper introduction of random forests was made in a paper by Leo Breiman, which has become one of the world's most cited papers. This paper describes a method of building a forest of uncorrelated trees using a CART-like procedure, combined with randomized node optimization and bagging. In addition, the paper combines several ingredients, some previously known and some novel, which form the basis of the modern practice of random forests, in particular using out-of-bag error as an estimate of the generalization error and measuring variable importance through permutation. The paper also offers the first theoretical result for random forests in the form of a bound on the generalization error which depends on the strength of the trees in the forest and their correlation.
    == Algorithm ==

    === Preliminaries: decision tree learning ===

    Decision trees are a popular method for various machine learning tasks. Tree learning is almost "an off-the-shelf procedure for data mining", say Hastie et al., "because it is invariant under scaling and various other transformations of feature values, is robust to inclusion of irrelevant features, and produces inspectable models. However, they are seldom accurate".

    In particular, trees that are grown very deep tend to learn highly irregular patterns: they overfit their training sets, i.e. have low bias, but very high variance. Random forests are a way of averaging multiple deep decision trees, trained on different parts of the same training set, with the goal of reducing the variance. This comes at the expense of a small increase in the bias and some loss of interpretability, but generally greatly boosts the performance of the final model.

    === Bagging ===

    The training algorithm for random forests applies the general technique of bootstrap aggregating, or bagging, to tree learners. Given a training set $X = x_1, \ldots, x_n$ with responses $Y = y_1, \ldots, y_n$, bagging repeatedly ($B$ times) selects a random sample with replacement of the training set and fits a tree $f_b$ to each of these samples.

    After training, predictions for an unseen sample $x'$ can be made by averaging the predictions from all the individual regression trees on $x'$:

        $$\hat{f} = \frac{1}{B} \sum_{b=1}^{B} f_b(x')$$

    or by taking the plurality vote in the case of classification trees.

    This bootstrapping procedure leads to better model performance because it decreases the variance of the model, without increasing the bias. This means that while the predictions of a single tree are highly sensitive to noise in its training set, the average of many trees is not, as long as the trees are not correlated.
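    The bagging procedure for regression can be sketched as follows. To keep the example short, the base learner is a depth-1 regression tree (a "stump") on one-dimensional inputs rather than a deep tree, so this is a simplified illustration of the resampling and averaging steps only. All function names are hypothetical.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on 1-D inputs: choose the threshold
    with the lowest squared error and predict the mean response per side."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not right:                      # nothing to split off
            continue
        lm, rm = mean(left), mean(right)
        err = (sum((y - lm) ** 2 for y in left)
               + sum((y - rm) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:                       # all inputs identical
        constant = mean(ys)
        return lambda x: constant
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def bagging_fit(xs, ys, n_trees, rng):
    """Draw B = n_trees bootstrap samples (with replacement, same size as
    the training set) and fit one tree f_b to each."""
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees

def bagging_predict(trees, x):
    """Average the predictions of all trees at x."""
    return mean([tree(x) for tree in trees])
```

    For classification, `bagging_predict` would take the plurality vote of the trees instead of the mean.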
    Simply training many trees on a single training set would give strongly correlated trees (or even the same tree many times, if the training algorithm is deterministic); bootstrap sampling is a way of de-correlating the trees by showing them different training sets.

    Additionally, an estimate of the uncertainty of the prediction can be made as the standard deviation of the predictions from all the individual regression trees on $x'$:

        $$\sigma = \sqrt{\frac{\sum_{b=1}^{B} \left(f_b(x') - \hat{f}\right)^2}{B - 1}}.$$

    The number $B$ of samples (equivalently, of trees) is a free parameter. Typically, a few hundred to several thousand trees are used, depending on the size and nature of the training set. $B$ can be optimized using cross-validation, or by observing the out-of-bag error: the mean prediction error on each training sample $x_i$, using only the trees that did not have $x_i$ in their bootstrap sample. The training and test error tend to level off after some number of trees have been fit.

    === From bagging to random forests ===

    The above procedure describes the original bagging algorithm for trees. Random forests also include another type of bagging scheme: they use a modified tree learning algorithm that selects, at each candidate split in the learning process, a random subset of the features. This process is sometimes called "feature bagging". The reason for doing this is the correlation of the trees in an ordinary bootstrap sample: if one or a few features are very strong predictors for the response variable (target output), these features will be selected in many of the $B$ trees, causing them to become correlated. An analysis of how bagging and random subspace projection contribute to accuracy gains under different conditions is given by Ho.

    Typically, for a classification problem with $p$ features, $\sqrt{p}$ (rounded down) features are used in each split.
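    The quantities introduced above (the per-prediction standard deviation, the out-of-bag error, and the random feature subset drawn at each split) are small enough to sketch directly. The helper names below are hypothetical. Trees are represented as plain callables, and each tree is paired with the set of training indices that appeared in its bootstrap sample. Squared error is used for the out-of-bag estimate as one possible choice of prediction error.

```python
import math
import random
from statistics import mean, stdev

def prediction_with_uncertainty(tree_predictions):
    """Return (f_hat, sigma): the mean of the per-tree predictions and their
    sample standard deviation (B - 1 in the denominator)."""
    return mean(tree_predictions), stdev(tree_predictions)

def oob_error(xs, ys, trees, bootstrap_indices):
    """Mean squared out-of-bag error: sample i is predicted only by the
    trees whose bootstrap sample did not contain index i."""
    errors = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        preds = [tree(x) for tree, idx in zip(trees, bootstrap_indices)
                 if i not in idx]
        if preds:                          # skip samples seen by every tree
            errors.append((mean(preds) - y) ** 2)
    return mean(errors)

def random_feature_subset(p, rng):
    """Indices of the floor(sqrt(p)) features considered at one split of a
    classification tree (at least one feature)."""
    k = max(1, math.isqrt(p))
    return rng.sample(range(p), k)
```

    Drawing a fresh subset at every split, rather than once per tree, is what distinguishes this scheme from simply training each tree on a fixed projection of the data.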
    For regression problems the inventors recommend $p/3$ (rounded down) with a minimum node size of 5 as the default. In practice, the best values for these parameters should be tuned on a case-by-case basis for every problem.

    === ExtraTrees ===

    Adding one further step of randomization yields extremely randomized trees, or ExtraTrees. As with ordinary random forests, they are an ensemble of individual trees, but there are two main differences: (1) each tree is trained using the whole learning sample (rather than a bootstrap sample), and (2) the top-down splitting is randomized: for each feature under consideration, a number of random cut-points are selected, instead of computing the locally optimal cut-point (based on, e.g., information gain or the Gini impurity). The values are chosen from a uniform distribution within the feature's empirical range (in the tree's training set). Then, of all the randomly chosen splits, the split that yields the highest score is chosen to split the node.

    Similar to ordinary random forests, the number of randomly selected features to be considered at each node can be specified. Default values for this parameter are $\sqrt{p}$ for classification and $p$ for regression, where $p$ is the number of features in the model.

    === Random forests for high-dimensional data ===

    The basic random forest procedure may

  • ARKA descriptors in QSAR


    In computational chemistry and cheminformatics, ARKA descriptors in QSAR are a class of molecular descriptors used in quantitative structure–activity relationship (QSAR) modeling (or related approaches such as QSPR and QSTR), a computational method for predicting the biological activity or toxicity of chemical compounds based on their molecular structure. Molecular descriptors are numerical values that summarize information about a molecule's structure, topology, geometry, or physicochemical properties in a form suitable for machine learning or statistical modeling. ARKA (Arithmetic Residuals in K-Groups Analysis) descriptors differ from traditional descriptors by encoding atomic-level information through recursive autoregression techniques, which aim to capture subtle structural patterns and improve predictive accuracy. They are designed to be both interpretable and well-suited to modeling nonlinear relationships in QSAR studies.

    == Comparisons ==

    While QSAR is essentially a similarity-based approach, the occurrence of activity/property cliffs may greatly reduce the predictive accuracy of the developed models. Activity cliffs are compounds that are similar in structure but differ considerably in activity. The ARKA approach is a supervised dimensionality reduction technique, developed by the DTC Laboratory, Jadavpur University, that can identify activity cliffs in a data set. The basic idea of the ARKA descriptors is to group the conventional QSAR descriptors based on a predefined criterion and then assign a weight to each descriptor in each group.

    ARKA descriptors have been used by multiple researchers to identify activity cliffs in QSAR studies and to develop classification-based and regression-based QSAR models with acceptable quality statistics. A tutorial presentation on the ARKA descriptors is available. A multi-class ARKA framework has also been proposed for improved q-RASAR model generation.

  • Nonlinear dimensionality reduction


    Nonlinear dimensionality reduction (NLDR), also known as manifold learning, is any of various related techniques that aim to project high-dimensional data, potentially existing across non-linear manifolds which cannot be adequately captured by linear decomposition methods, onto lower-dimensional latent manifolds. The goal is either to visualize the data in the low-dimensional space or to learn the mapping itself (from the high-dimensional space to the low-dimensional embedding, or vice versa). The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.

    == Applications of NLDR ==

    High-dimensional data can be hard for machines to work with, requiring significant time and space for analysis. It also presents a challenge for humans, since it is hard to visualize or understand data in more than three dimensions. Reducing the dimensionality of a data set, while keeping its essential features relatively intact, can make algorithms more efficient and allow analysts to visualize trends and patterns.

    The reduced-dimensional representations of data are often referred to as "intrinsic variables". This description implies that these are the values from which the data was produced. For example, consider a dataset that contains images of a letter 'A', which has been scaled and rotated by varying amounts. Each image has 32×32 pixels and can be represented as a vector of 1024 pixel values. Each such vector is a sample on a two-dimensional manifold in 1024-dimensional space (a Hamming space). The intrinsic dimensionality is two, because two variables (rotation and scale) were varied in order to produce the data. Information about the shape or look of a letter 'A' is not part of the intrinsic variables because it is the same in every instance. Nonlinear dimensionality reduction will discard the correlated information (the letter 'A') and recover only the varying information (rotation and scale).

    By comparison, if principal component analysis, which is a linear dimensionality reduction algorithm, is used to reduce this same dataset into two dimensions, the resulting values are not so well organized. This demonstrates that the high-dimensional vectors (each representing a letter 'A') that sample this manifold vary in a non-linear manner.

    NLDR therefore has several applications in the field of computer vision. For example, consider a robot that uses a camera to navigate in a closed static environment. The images obtained by that camera can be considered to be samples on a manifold in high-dimensional space, and the intrinsic variables of that manifold will represent the robot's position and orientation.

    Invariant manifolds are of general interest for model order reduction in dynamical systems. In particular, if there is an attracting invariant manifold in the phase space, nearby trajectories will converge onto it and stay on it indefinitely, rendering it a candidate for dimensionality reduction of the dynamical system. While such manifolds are not guaranteed to exist in general, the theory of spectral submanifolds (SSM) gives conditions for the existence of unique attracting invariant objects in a broad class of dynamical systems. Active research in NLDR seeks to unfold the observation manifolds associated with dynamical systems to develop modeling techniques.

    Some of the more prominent nonlinear dimensionality reduction techniques are listed below.

    == Important concepts ==

    === Sammon's mapping ===

    Sammon's mapping is one of the first and most popular NLDR techniques.
    === Self-organizing map ===

    The self-organizing map (SOM, also called Kohonen map) and its probabilistic variant generative topographic mapping (GTM) use a point representation in the embedded space to form a latent variable model based on a non-linear mapping from the embedded space to the high-dimensional space. These techniques are related to work on density networks, which are also based around the same probabilistic model.

    === Kernel principal component analysis ===

    Perhaps the most widely used algorithm for dimensionality reduction is kernel PCA. PCA begins by computing the covariance matrix of the $m \times n$ matrix $\mathbf{X}$:

        $$C = \frac{1}{m} \sum_{i=1}^{m} \mathbf{x}_i \mathbf{x}_i^{\mathsf{T}}.$$

    It then projects the data onto the first $k$ eigenvectors of that matrix. By comparison, KPCA begins by computing the covariance matrix of the data after being transformed into a higher-dimensional space:

        $$C = \frac{1}{m} \sum_{i=1}^{m} \Phi(\mathbf{x}_i) \Phi(\mathbf{x}_i)^{\mathsf{T}}.$$

    It then projects the transformed data onto the first $k$ eigenvectors of that matrix, just like PCA. It uses the kernel trick to factor away much of the computation, such that the entire process can be performed without actually computing $\Phi(\mathbf{x})$. Of course, $\Phi$ must be chosen such that it has a known corresponding kernel. Unfortunately, it is not trivial to find a good kernel for a given problem, so KPCA does not yield good results with some problems when using standard kernels. For example, it is known to perform poorly with these kernels on the Swiss roll manifold. However, one can view certain other methods that perform well in such settings (e.g., Laplacian Eigenmaps, LLE) as special cases of kernel PCA by constructing a data-dependent kernel matrix.
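    The kernel trick mentioned above can be illustrated with a small pure-Python sketch that never forms $\Phi(\mathbf{x})$. It builds the kernel matrix $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$, centers it (which corresponds to subtracting the mean of the transformed data), and finds the leading eigenvector by power iteration. The RBF kernel, the `gamma` value, the fixed start vector, and the function names are assumptions chosen for illustration. A practical implementation would use a linear algebra library and extract several components. The sketch also assumes the points are not all identical, since otherwise the centered matrix is zero.

```python
import math

def rbf_kernel(a, b, gamma):
    """k(a, b) = exp(-gamma * ||a - b||^2), one common standard kernel."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def centered_kernel_matrix(points, gamma):
    """Kernel matrix after double centering, i.e. the Gram matrix of the
    transformed points with their feature-space mean removed."""
    m = len(points)
    k = [[rbf_kernel(p, q, gamma) for q in points] for p in points]
    row_mean = [sum(row) / m for row in k]
    total_mean = sum(row_mean) / m
    return [[k[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(m)] for i in range(m)]

def top_eigenpair(matrix, iterations=500):
    """Leading eigenvalue and unit eigenvector of a symmetric positive
    semi-definite matrix, found by power iteration."""
    m = len(matrix)
    v = [math.sin(i + 1.0) for i in range(m)]   # fixed, non-constant start
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(m)) for i in range(m)]
    return sum(vi * wi for vi, wi in zip(v, mv)), v

def kpca_first_coordinate(points, gamma=1.0):
    """Coordinate of every training point along the first kernel principal
    axis: sqrt(eigenvalue) times the eigenvector entry."""
    eigenvalue, v = top_eigenpair(centered_kernel_matrix(points, gamma))
    return [math.sqrt(eigenvalue) * vi for vi in v]
```

    On two well-separated clusters, the first coordinate takes one sign on each cluster, and the coordinates sum to approximately zero because the centered kernel matrix annihilates constant vectors.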
    KPCA has an internal model, so it can be used to map points onto its embedding that were not available at training time.

    === Principal curves and manifolds ===

    Principal curves and manifolds give the natural geometric framework for nonlinear dimensionality reduction and extend the geometric interpretation of PCA by explicitly constructing an embedded manifold, and by encoding using standard geometric projection onto the manifold. This approach was originally proposed by Trevor Hastie in his 1984 thesis, which he formally introduced in 1989. This idea has been explored further by many authors. How to define the "simplicity" of the manifold is problem-dependent; however, it is commonly measured by the intrinsic dimensionality and/or the smoothness of the manifold. Usually, the principal manifold is defined as a solution to an optimization problem. The objective function includes a quality of data approximation and some penalty terms for the bending of the manifold. The popular initial approximations are generated by linear PCA and Kohonen's SOM.

    === Laplacian eigenmaps ===

    Laplacian eigenmaps uses spectral techniques to perform dimensionality reduction. This technique relies on the basic assumption that the data lies in a low-dimensional manifold in a high-dimensional space. This algorithm cannot embed out-of-sample points, but techniques based on reproducing kernel Hilbert space regularization exist for adding this capability. Such techniques can be applied to other nonlinear dimensionality reduction algorithms as well.

    Traditional techniques like principal component analysis do not consider the intrinsic geometry of the data. Laplacian eigenmaps builds a graph from neighborhood information of the data set. Each data point serves as a node on the graph and connectivity between nodes is governed by the proximity of neighboring points (using e.g. the k-nearest neighbor algorithm). The graph thus generated can be considered as a discrete approximation of the low-dimensional manifold in the high-dimensional space. Minimization of a cost function based on the graph ensures that points close to each other on the manifold are mapped close to each other in the low-dimensional space, preserving local distances.

    The eigenfunctions of the Laplace–Beltrami operator on the manifold serve as the embedding dimensions, since under mild conditions this operator has a countable spectrum that is a basis for square integrable functions on the manifold (compare to Fourier series on the unit circle manifold). Attempts to place Laplacian eigenmaps on solid theoretical ground have met with some success, as under certain nonrestrictive assumptions, the graph Laplacian matrix has been shown to converge to the Laplace–Beltrami operator as the number of points goes to infinity.

    === Isomap ===

    Isomap is a combination of the Floyd–Warshall algorithm with classic multidimensional scaling (MDS). Classic MDS takes a matrix of pair-wise distances between all points and computes a position for each point. Isomap assumes that the pair-wise distances are only known between neighboring points, and uses the Floyd–Warshall algorithm to compute the pair-wise distances between all other points. This effectively estimates the full matrix of pair-wise geodesic distances between all of the points. Isomap th

  • Vulnerability Discovery Model


    A Vulnerability Discovery Model (VDM) applies software reliability models to vulnerability discovery event data in order to predict future discovery events. Thorough presentations of VDM techniques are available in the literature. Numerous model implementations are available in the MCMCBayes open source repository. VDM examples include:

      • Alhazmi-Malaiya: time-based model (Alhazmi-Malaiya Logistic (AML) model)
      • Alhazmi-Malaiya: effort-based model
      • Rescorla: Quadratic Model and Exponential Model
      • Anderson: Thermodynamic Model
      • Kim: Weibull Model
      • Linear Model
      • Hump-Shaped Model
      • Independent and Dependent Model
      • Vulnerability Discovery Modeling using Bayesian model averaging
      • Multivariate Vulnerability Discovery Models
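    As an illustration of what such a model looks like in practice, the sketch below evaluates the logistic curve commonly attributed to the time-based Alhazmi-Malaiya (AML) model, in which the cumulative number of discovered vulnerabilities grows slowly at first, then faster, then saturates. The listing above does not give this formula, so the parameterisation $\Omega(t) = B / (B C e^{-A B t} + 1)$, the function name, and the example parameter values are assumptions that should be checked against the original papers before use.

```python
import math

def aml_cumulative(t, a, b, c):
    """Cumulative vulnerabilities at time t under the assumed AML form
    Omega(t) = B / (B * C * exp(-A * B * t) + 1).

    b is the saturation level (total vulnerabilities eventually found);
    a and c shape how quickly the curve rises.
    """
    return b / (b * c * math.exp(-a * b * t) + 1.0)

# Hypothetical parameters, for illustration only.
curve = [aml_cumulative(t, a=0.002, b=100.0, c=0.5) for t in range(0, 61, 12)]
```

    Fitting such a model to real discovery data means estimating `a`, `b` and `c` from observed cumulative counts, for example by least squares or by the Bayesian approaches named in the list.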

  • List of datasets for machine-learning research


    These datasets are used in machine learning (ML) research and have been cited in peer-reviewed academic journals. Datasets are an integral part of the field of machine learning. Major advances in this field can result from advances in learning algorithms (such as deep learning), computer hardware, and, less intuitively, the availability of high-quality training datasets. High-quality labeled training datasets for supervised and semi-supervised machine-learning algorithms are usually difficult and expensive to produce because of the large amount of time needed to label the data. Although they do not need to be labeled, high-quality unlabeled datasets for unsupervised learning can also be difficult and costly to produce. Many organizations, including governments, publish and share their datasets, often using common metadata formats (such as Croissant). The datasets are classified, based on the licenses, into two groups: open data and non-open data. The datasets from various governmental-bodies are presented in List of open government data sites. The datasets are ported on open data portals. They are made available for searching, depositing and accessing through interfaces like Open API. The datasets are made available as various sorted types and subtypes. == List of sorting used for datasets == The data portal is classified based on its type of license. The open source license based data portals are known as open data portals which are used by many government organizations and academic institutions. == List of open data portals == == List of portals suitable for multiple types of applications == The data portal sometimes lists a wide variety of subtypes of datasets pertaining to many machine learning applications. == List of portals suitable for a specific subtype of applications == The data portals which are suitable for a specific subtype of machine learning application are listed in the subsequent sections. 
== Image data == == Text data == These datasets consist primarily of text for tasks such as natural language processing, sentiment analysis, translation, and cluster analysis. === Reviews === === News articles === === Messages === === Twitter and tweets === === Dialogues === === Legal === === Other text === == Sound data == These datasets consist of sounds and sound features used for tasks such as speech recognition and speech synthesis. === Speech === === Music === === Other sounds === == Signal data == Datasets containing electric signal information requiring some sort of signal processing for further analysis. === Electrical === === Motion-tracking === === Other signals === == Chemical data == Datasets from physical systems. === Chemical Reactions with transition states (TS) === === OpenReACT-CHON-EFH === OpenReACT-CHON-EFH (Open Reaction Dataset of Atomic ConfiguraTions comprising C, H, O and N with Energies, Forces and Hessians) is a 2025 open-access benchmark for machine-learning interatomic potentials. RTP set – 35,087 stationary-point geometries (reactant, transition state and product) drawn from 11,961 elementary reactions, each labeled with density-functional energies, atomic forces and full Hessian matrices at the ωB97X-D/6-31G(d) level. IRC set – 34,248 structures along 600 minimum-energy reaction paths, used to test extrapolation beyond trained stationary points. NMS set – 62,527 off-equilibrium geometries generated by normal-mode sampling to probe model robustness under thermal perturbations. The collection underpins the study Does Hessian Data Improve the Performance of Machine Learning Potentials? and was used to train and benchmark the machine-learning interatomic potentials reported therein. The dataset itself is distributed under a CC licence via Figshare. == Physical data == Datasets from physical systems. 
=== High-energy physics === === Systems === === Astronomy === === Earth science === === Other physical === == Biological data == Datasets from biological systems. === Human === === Animal === === Fungi === === Plant === === Microbe === === Drug discovery === == Anomaly data == == Question answering data == This section includes datasets that deal with structured data. == Dialog or instruction prompted data == This section includes datasets that contain multi-turn text with at least two actors, a "user" and an "agent". The user makes requests of the agent, which carries them out. == Cybersecurity == == Climate and sustainability == == Code data == == Multivariate data == === Financial === === Weather === === Census === === Transit === === Internet === === Games === === Other multivariate === == Curated repositories of datasets == As datasets come in myriad formats and can sometimes be difficult to use, considerable work has gone into curating and standardizing the format of datasets to make them easier to use for machine learning research. OpenML: Web platform with Python, R, Java, and other APIs for downloading hundreds of machine learning datasets, evaluating algorithms on datasets, and benchmarking algorithm performance against dozens of other algorithms. PMLB: A large, curated repository of benchmark datasets for evaluating supervised machine learning algorithms. Provides classification and regression datasets in a standardized format that are accessible through a Python API. Metatext NLP: https://metatext.io/datasets, a web repository maintained by the community, containing nearly 1000 benchmark datasets, and counting. Provides many tasks from classification to QA, and various languages from English and Portuguese to Arabic. Appen: Off The Shelf and Open Source Datasets hosted and maintained by the company. 
These biological, image, physical, question answering, signal, sound, text, and video resources number over 250 and can be applied to over 25 different use cases.

  • International Conference on Computer Vision


    The International Conference on Computer Vision (ICCV) is a research conference sponsored by the Institute of Electrical and Electronics Engineers (IEEE) held every other year. It is considered to be one of the top conferences in computer vision, alongside CVPR and ECCV, and it is held in years in which ECCV is not. The conference is usually spread over four to five days. Typically, experts in the focus areas give tutorial talks on the first day, then the technical sessions (and poster sessions in parallel) follow. Recent conferences have also had an increasing number of focused workshops and a commercial exhibition.

    == Awards ==

    === Azriel Rosenfeld Lifetime Achievement Award ===

    The Azriel Rosenfeld Award, or Azriel Rosenfeld Lifetime Achievement Award, recognizes researchers who have made significant contributions to the field of computer vision over their careers. It is named in memory of computer scientist and mathematician Azriel Rosenfeld. The following people have received this award:

    === Helmholtz Prize ===

    The ICCV Helmholtz Prize, known as the Test of Time Award before 2013, is awarded every other year at the ICCV, recognizing ICCV papers from ten or more years earlier that had a significant impact on computer vision research. Winners are selected by the IEEE Computer Society's Technical Committee on Pattern Analysis and Machine Intelligence. The award is named after the 19th century physician and physicist Hermann von Helmholtz, and the ICCV's award is not related to the various Helmholtz Prizes in physics, or the Hermann von Helmholtz Prize in neuroscience.

    === Marr Prize ===

    The ICCV best-paper award is the Marr Prize, named after British neuroscientist David Marr.

    === Mark Everingham Prize ===

    The Mark Everingham Prize is an award given yearly by the Technical Committee on Pattern Analysis and Machine Intelligence of the IEEE Computer Society at the IEEE International Conference on Computer Vision or the European Conference on Computer Vision to commemorate the late Mark Everingham, "one of the rising stars of computer vision", and to encourage others to follow in his footsteps by acting to further progress in the computer vision community as a whole. The prize is given to a researcher, or a team of researchers, who have made a selfless contribution of significant benefit to other members of the computer vision community. The Mark Everingham Prize for Rigorous Evaluation was an award given in 2012 at the British Machine Vision Conference.

    === PAMI Distinguished Researcher Award ===

    The PAMI Distinguished Researcher Award (until 2013 called Significant Researcher Award) is awarded to candidates whose research projects have significantly contributed to the progress of computer vision. Awards are made based on major research contributions, as well as the role of those contributions in influencing and inspiring other research. Candidates are nominated by the community. The following people have received this award:

    == Conference list ==

    The conference is usually held in the spring in various international locations.

  • Neural Networks (journal)

    Neural Networks (journal)

    Neural Networks is a monthly peer-reviewed scientific journal and an official journal of the International Neural Network Society, European Neural Network Society, and Japanese Neural Network Society. == History == The journal was established in 1988 and is published by Elsevier. It covers all aspects of research on artificial neural networks. The founding editor-in-chief was Stephen Grossberg (Boston University). The current editors-in-chief are DeLiang Wang (Ohio State University) and Taro Toyoizumi (RIKEN Center for Brain Science). == Abstracting and indexing == The journal is abstracted and indexed in Scopus and the Science Citation Index Expanded. According to the Journal Citation Reports, the journal has a 2022 impact factor of 7.8.

  • TimeTiger

    TimeTiger

    TimeTiger is a time and project tracking app developed by Indigo Technologies Ltd. in Toronto, Ontario, Canada. Indigo was founded in 1997 and initially released TimeTiger in 1998. == Company == The company was incorporated in 1997 and began operations as a custom software developer. TimeTiger (internally called TaskMaster) was developed as a tool to help with Indigo's own project planning and estimating. After releasing TimeTiger as a commercial product in 1998, Indigo shifted its focus to time and project management solutions. TimeTiger first introduced support for web-based time logging in 2000, to appeal to workers who were not already tracking their time for billing reasons. Subsequent development emphasized project analysis tools. == Features == Web-based electronic time log; "To Do" list to monitor project and non-project activities; pivot table report designer; role-based access control. == Software integration == Reports can be exported to Microsoft Excel or saved as Excel-compatible HTML files. Microsoft Project files can be imported and exported. A Software Development Kit is available.

  • Radial basis function

    Radial basis function

    In mathematics, a radial basis function (RBF) is a real-valued function $\varphi$ whose value depends only on the distance between the input and some fixed point: either the origin, so that $\varphi(\mathbf{x}) = \hat{\varphi}(\|\mathbf{x}\|)$, or some other fixed point $\mathbf{c}$, called a center, so that $\varphi(\mathbf{x}) = \hat{\varphi}(\|\mathbf{x} - \mathbf{c}\|)$. Any function $\varphi$ that satisfies the property $\varphi(\mathbf{x}) = \hat{\varphi}(\|\mathbf{x}\|)$ is a radial function. The distance is usually Euclidean distance, although other metrics are sometimes used. Radial basis functions are often used as a collection $\{\varphi_k\}_k$ which forms a basis for some function space of interest, hence the name. Sums of radial basis functions are typically used to approximate given functions. This approximation process can also be interpreted as a simple kind of neural network; this was the context in which they were originally applied to machine learning, in work by David Broomhead and David Lowe in 1988, which stemmed from Michael J. D. Powell's seminal research from 1977. RBFs are also used as a kernel in support vector classification. The technique has proven effective and flexible enough that radial basis functions are now applied in a variety of engineering applications. == Definition == A radial function is a function $\varphi : [0, \infty) \to \mathbb{R}$. When paired with a norm $\|\cdot\| : V \to [0, \infty)$ on a vector space, a function of the form $\varphi_{\mathbf{c}} = \varphi(\|\mathbf{x} - \mathbf{c}\|)$ is said to be a radial kernel centered at $\mathbf{c} \in V$. 
A radial function and the associated radial kernels are said to be radial basis functions if, for any finite set of nodes $\{\mathbf{x}_k\}_{k=1}^{n} \subseteq V$, all of the following conditions are true: === Examples === Commonly used types of radial basis functions include (writing $r = \|\mathbf{x} - \mathbf{x}_i\|$ and using $\varepsilon$ to indicate a shape parameter that can be used to scale the input of the radial kernel): == Approximation == Radial basis functions are typically used to build up function approximations of the form $y(\mathbf{x}) = \sum_{i=1}^{N} w_i \, \varphi(\|\mathbf{x} - \mathbf{x}_i\|)$, where the approximating function $y(\mathbf{x})$ is represented as a sum of $N$ radial basis functions, each associated with a different center $\mathbf{x}_i$ and weighted by an appropriate coefficient $w_i$. The weights $w_i$ can be estimated using the matrix methods of linear least squares, because the approximating function is linear in the weights $w_i$. Approximation schemes of this kind have been used in particular for time series prediction and control of nonlinear systems exhibiting sufficiently simple chaotic behaviour, and for 3D reconstruction in computer graphics (for example, hierarchical RBF and Pose Space Deformation). == RBF Network == The sum can also be interpreted as a rather simple single-layer type of artificial neural network called a radial basis function network, with the radial basis functions taking on the role of the activation functions of the network. It can be shown that any continuous function on a compact interval can in principle be interpolated with arbitrary accuracy by a sum of this form, if a sufficiently large number $N$ of radial basis functions is used. The approximant $y(\mathbf{x})$ is differentiable with respect to the weights $w_i$. 
The weights could thus be learned using any of the standard iterative methods for neural networks. Using radial basis functions in this manner yields a reasonable interpolation approach provided that the fitting set has been chosen such that it covers the entire range systematically (equidistant data points are ideal). However, without a polynomial term that is orthogonal to the radial basis functions, estimates outside the fitting set tend to perform poorly. == RBFs for PDEs == Radial basis functions are used to approximate functions and so can be used to discretize and numerically solve partial differential equations (PDEs). This was first done in 1990 by E. J. Kansa, who developed the first RBF-based numerical method. It is called the Kansa method and was used to solve the elliptic Poisson equation and the linear advection-diffusion equation. The function values at points $\mathbf{x}$ in the domain are approximated by a linear combination of RBFs, and the derivatives are approximated by applying the differential operator to each RBF in that combination, where $N$ is the number of points in the discretized domain, $d$ the dimension of the domain, and $\lambda$ the scalar coefficients, which are unchanged by the differential operator. Different numerical methods based on radial basis functions were developed thereafter, among them the RBF-FD method, the RBF-QR method and the RBF-PUM method.
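
The weighted-sum approximation described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes a Gaussian kernel, one-dimensional inputs, and one center per data point, so the least-squares problem reduces to a square linear system. The function names, the shape parameter `eps`, and the sample data are all hypothetical.

```python
import math


def gaussian_rbf(r, eps=1.0):
    """Gaussian radial kernel (an assumed choice): phi(r) = exp(-(eps * r)^2)."""
    return math.exp(-(eps * r) ** 2)


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_rbf(centers, targets, eps=1.0):
    """Return weights w_i such that y(x_j) equals targets[j] at every center."""
    phi = [[gaussian_rbf(abs(xj - xi), eps) for xi in centers] for xj in centers]
    return solve_linear_system(phi, targets)


def evaluate_rbf(x, centers, weights, eps=1.0):
    """y(x) = sum_i w_i * phi(|x - x_i|)."""
    return sum(w * gaussian_rbf(abs(x - c), eps) for w, c in zip(weights, centers))


# Equidistant 1-D nodes, which the text describes as the ideal fitting set.
centers = [i * 0.5 for i in range(9)]
targets = [math.sin(c) for c in centers]
weights = fit_rbf(centers, targets, eps=1.5)
```

Because the approximant is linear in the weights, no iterative training is needed here; `evaluate_rbf(c, centers, weights, eps=1.5)` reproduces the target at each center `c` up to rounding. With more data points than centers, the same design matrix would be passed to a least-squares solver instead.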

  • Multilayer perceptron

    Multilayer perceptron

    In deep learning, a multilayer perceptron (MLP) is a kind of modern feedforward neural network consisting of fully connected neurons with nonlinear activation functions, organized in layers, notable for being able to distinguish data that is not linearly separable. Modern neural networks are trained using backpropagation and are colloquially referred to as "vanilla" networks. MLPs grew out of an effort to improve on single-layer perceptrons, which could only be applied to linearly separable data. A perceptron traditionally used a Heaviside step function as its nonlinear activation function. However, the backpropagation algorithm requires that modern MLPs use continuous activation functions such as sigmoid or ReLU. Multilayer perceptrons form the basis of deep learning, and are applicable across a vast set of diverse domains. == Timeline == In 1943, Warren McCulloch and Walter Pitts proposed the binary artificial neuron as a logical model of biological neural networks. In 1958, Frank Rosenblatt proposed the multilayered perceptron model, consisting of an input layer, a hidden layer with randomized weights that did not learn, and an output layer with learnable connections. In 1962, Rosenblatt published many variants and experiments on perceptrons in his book Principles of Neurodynamics, including up to 2 trainable layers by "back-propagating errors". However, it was not the backpropagation algorithm, and he did not have a general method for training multiple layers. In 1965, Alexey Grigorevich Ivakhnenko and Valentin Lapa published Group Method of Data Handling. It was one of the first deep learning methods, used to train an eight-layer neural net in 1971. In 1967, Shun'ichi Amari reported the first multilayered neural network trained by stochastic gradient descent, which was able to classify non-linearly separable pattern classes. Amari's student Saito conducted the computer experiments, using a five-layered feedforward network with two learning layers. 
Backpropagation was independently developed multiple times in the early 1970s. The earliest published instance was Seppo Linnainmaa's master's thesis (1970). Paul Werbos developed it independently in 1971, but had difficulty publishing it until 1982. In 1986, David E. Rumelhart et al. popularized backpropagation. In 2003, interest in backpropagation networks returned due to the successes of deep learning being applied to language modelling by Yoshua Bengio with co-authors. In 2021, a very simple NN architecture combining two deep MLPs with skip connections and layer normalizations was designed and called MLP-Mixer; its realizations featuring 19 to 431 million parameters were shown to be comparable to vision transformers of similar size on ImageNet and similar image classification tasks. == Mathematical foundations == === Activation function === If a multilayer perceptron has a linear activation function in all neurons, that is, a linear function that maps the weighted inputs to the output of each neuron, then linear algebra shows that any number of layers can be reduced to a two-layer input-output model. In MLPs some neurons use a nonlinear activation function that was developed to model the frequency of action potentials, or firing, of biological neurons. The two historically common activation functions are both sigmoids, and are described by $y(v_i) = \tanh(v_i)$ and $y(v_i) = (1 + e^{-v_i})^{-1}$. The first is a hyperbolic tangent that ranges from −1 to 1, while the other is the logistic function, which is similar in shape but ranges from 0 to 1. Here $y_i$ is the output of the $i$th node (neuron) and $v_i$ is the weighted sum of the input connections. Alternative activation functions have been proposed, including the rectifier and softplus functions. 
More specialized activation functions include radial basis functions (used in radial basis networks, another class of supervised neural network models). In recent developments of deep learning the rectified linear unit (ReLU) is more frequently used as one of the possible ways to overcome the numerical problems related to the sigmoids. === Layers === The MLP consists of three or more layers (an input and an output layer with one or more hidden layers) of nonlinearly-activating nodes. Since MLPs are fully connected, each node in one layer connects with a certain weight $w_{ij}$ to every node in the following layer. === Learning === Learning occurs in the perceptron by changing connection weights after each piece of data is processed, based on the amount of error in the output compared to the expected result. This is an example of supervised learning, and is carried out through backpropagation, a generalization of the least mean squares algorithm in the linear perceptron. We can represent the degree of error in an output node $j$ in the $n$th data point (training example) by $e_j(n) = d_j(n) - y_j(n)$, where $d_j(n)$ is the desired target value for the $n$th data point at node $j$, and $y_j(n)$ is the value produced by the perceptron at node $j$ when the $n$th data point is given as an input. The node weights can then be adjusted based on corrections that minimize the error in the entire output for the $n$th data point, given by $\mathcal{E}(n) = \frac{1}{2} \sum_{\text{output node } j} e_j^2(n)$. 
Using gradient descent, the change in each weight $w_{ji}$ is $\Delta w_{ji}(n) = -\eta \frac{\partial \mathcal{E}(n)}{\partial v_j(n)} y_i(n)$, where $y_i(n)$ is the output of the previous neuron $i$, and $\eta$ is the learning rate, which is selected to ensure that the weights quickly converge to a response, without oscillations. In the previous expression, $\frac{\partial \mathcal{E}(n)}{\partial v_j(n)}$ denotes the partial derivative of the error $\mathcal{E}(n)$ with respect to the weighted sum $v_j(n)$ of the input connections of neuron $j$. The derivative to be calculated depends on the induced local field $v_j$, which itself varies. It is easy to prove that for an output node this derivative can be simplified to $-\frac{\partial \mathcal{E}(n)}{\partial v_j(n)} = e_j(n) \phi'(v_j(n))$, where $\phi'$ is the derivative of the activation function described above, which itself does not vary. The analysis is more difficult for the change in weights to a hidden node, but it can be shown that the relevant derivative is $-\frac{\partial \mathcal{E}(n)}{\partial v_j(n)} = \phi'(v_j(n)) \sum_k -\frac{\partial \mathcal{E}(n)}{\partial v_k(n)} w_{kj}(n)$. This depends on the change in weights of the $k$th nodes, which represent the output layer. 
So to change the hidden layer weights, the output layer weights change according to the derivative of the activation function, and so this algorithm represents a backpropagation of the activation function.
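
The update rules in the Learning section can be traced in code. The following sketch assumes one hidden layer, the logistic activation for every node, no separate bias terms, and a single training example; the network size, initial weights, and learning rate are illustrative values chosen here, not taken from the article. Writing $\delta_j = -\partial \mathcal{E} / \partial v_j$, the weight change becomes $\Delta w_{ji} = \eta \, \delta_j \, y_i$.

```python
import math


def phi(v):
    """Logistic activation, one of the two sigmoids named in the text."""
    return 1.0 / (1.0 + math.exp(-v))


def phi_prime(v):
    """Derivative of the logistic function."""
    s = phi(v)
    return s * (1.0 - s)


def forward(x, w_hidden, w_out):
    """Return (v_hidden, y_hidden, v_out, y_out) for one input vector x."""
    v_hidden = [sum(w * xi for w, xi in zip(row, x)) for row in w_hidden]
    y_hidden = [phi(v) for v in v_hidden]
    v_out = [sum(w * yh for w, yh in zip(row, y_hidden)) for row in w_out]
    y_out = [phi(v) for v in v_out]
    return v_hidden, y_hidden, v_out, y_out


def error(x, d, w_hidden, w_out):
    """E(n) = 1/2 * sum_j e_j(n)^2 over the output nodes."""
    y_out = forward(x, w_hidden, w_out)[3]
    return 0.5 * sum((dj - yj) ** 2 for dj, yj in zip(d, y_out))


def backprop_step(x, d, w_hidden, w_out, eta=0.5):
    """Apply one gradient-descent update in place; return the local gradients."""
    v_hidden, y_hidden, v_out, y_out = forward(x, w_hidden, w_out)
    # Output nodes: delta_j = e_j * phi'(v_j), with e_j = d_j - y_j.
    delta_out = [(dj - yj) * phi_prime(vj) for dj, yj, vj in zip(d, y_out, v_out)]
    # Hidden nodes: delta_j = phi'(v_j) * sum_k delta_k * w_kj,
    # computed before any output weight is modified.
    delta_hidden = [
        phi_prime(v_hidden[j]) * sum(delta_out[k] * w_out[k][j] for k in range(len(w_out)))
        for j in range(len(w_hidden))
    ]
    # Weight change: delta_w_ji = eta * delta_j * y_i.
    for j, row in enumerate(w_out):
        for i in range(len(row)):
            row[i] += eta * delta_out[j] * y_hidden[i]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += eta * delta_hidden[j] * x[i]
    return delta_hidden, delta_out


# One illustrative training example and small hand-picked initial weights.
x, d = [1.0, 0.5, -1.0], [1.0]
w_hidden = [[0.1, -0.2, 0.05], [0.3, 0.1, -0.1]]
w_out = [[0.2, -0.3]]
before = error(x, d, w_hidden, w_out)
for _ in range(50):
    backprop_step(x, d, w_hidden, w_out)
after = error(x, d, w_hidden, w_out)
```

Repeating the step on the same example lowers the error, so `after` ends up smaller than `before`. A practical trainer would cycle over many examples and usually add bias weights, both of which are omitted here to keep the correspondence with the formulas visible.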

  • Linear genetic programming

    Linear genetic programming

    "Linear genetic programming" is unrelated to "linear programming". Linear genetic programming (LGP) is a particular method of genetic programming wherein computer programs in a population are represented as a sequence of register-based instructions from an imperative programming language or machine language. The adjective "linear" stems from the fact that each LGP program is a sequence of instructions and the sequence of instructions is normally executed sequentially. Like in other programs, the data flow in LGP can be modeled as a graph that will visualize the potential multiple usage of register contents and the existence of structurally noneffective code (introns) which are two main differences of this genetic representation from the more common tree-based genetic programming (TGP) variant. Like other Genetic Programming methods, Linear genetic programming requires the input of data to run the program population on. Then, the output of the program (its behaviour) is judged against some target behaviour, using a fitness function. However, LGP is generally more efficient than tree genetic programming due to its two main differences mentioned above: Intermediate results (stored in registers) can be reused and a simple intron removal algorithm exists that can be executed to remove all non-effective code prior to programs being run on the intended data. These two differences often result in compact solutions and substantial computational savings compared to the highly constrained data flow in trees and the common method of executing all tree nodes in TGP. Furthermore, LGP naturally has multiple outputs by defining multiple output registers and easily cooperates with control flow operations. Linear genetic programming has been applied in many domains, including system modeling and system control with considerable success. 
Linear genetic programming should not be confused with linear tree programs in tree genetic programming, which are programs composed of a variable number of unary functions and a single terminal. Note that linear tree GP differs from bit string genetic algorithms, since a population may contain programs of different lengths and there may be more than two types of functions or more than two types of terminals. == Examples of LGP programs == Because LGP programs are basically represented by a linear sequence of instructions, they are simpler to read and to operate on than their tree-based counterparts. For example, a simple program written to solve a Boolean function problem with 3 inputs (in R1, R2, R3) and one output (in R0), could read like this: R1, R2, R3 have to be declared as input (read-only) registers, while R0 and R4 are declared as calculation (read-write) registers. This program is very simple, having just 5 instructions. But mutation and crossover operators could work to increase the length of the program, as well as to change the content of each of its instructions. Note that one instruction is non-effective, or an intron (marked), since it does not impact the output register R0. Recognition of those instructions is the basis for the intron removal algorithm, which is used to analyze code prior to execution. Technically, this happens by copying an individual and then running the intron removal once on the copy. The copy with removed introns is then executed as many times as dictated by the number of training cases. Notably, the original individual is left intact, so as to continue participating in the evolutionary process. It is only the executed copy that is compressed by removing these "structural" introns. Another simple program, this one written in the LGP language Slash/A, looks like a series of instructions separated by a slash: By representing such code in bytecode format, i.e. 
as an array of bytes each representing a different instruction, one can perform mutation operations simply by changing an element of such an array.
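
The intron removal step described above can be sketched as a single backward pass over the instruction list. The article does not spell out the algorithm, so the approach below (tracking the set of registers that can still affect the output) is an assumed, commonly used formulation. The tuple encoding `(dest, op, src_a, src_b)` and the 5-instruction program are hypothetical and are not the listing from the original article.

```python
def remove_structural_introns(program, output_registers):
    """Return a copy holding only instructions that can affect the outputs.

    Each instruction is a tuple (dest, op, src_a, src_b). The program is
    scanned backwards while tracking which registers are still needed.
    """
    needed = set(output_registers)
    effective = []
    for instr in reversed(program):
        dest, _op, src_a, src_b = instr
        if dest in needed:
            effective.append(instr)
            # dest is overwritten here, so earlier writes to it only
            # matter if this instruction also reads it as a source.
            needed.discard(dest)
            needed.update((src_a, src_b))
    effective.reverse()
    return effective


def run(program, inputs):
    """Execute a Boolean register program and return the value of R0."""
    ops = {
        "AND": lambda a, b: a and b,
        "OR": lambda a, b: a or b,
        "XOR": lambda a, b: a != b,
    }
    # R0 and R4 are calculation registers; R1..R3 come from `inputs`.
    regs = {"R0": False, "R4": False, **inputs}
    for dest, op, src_a, src_b in program:
        regs[dest] = ops[op](regs[src_a], regs[src_b])
    return regs["R0"]


# Hypothetical program with inputs R1..R3 and output R0.
program = [
    ("R4", "AND", "R2", "R3"),
    ("R0", "OR", "R1", "R4"),
    ("R4", "AND", "R1", "R2"),  # intron: R4 is overwritten before being read
    ("R4", "OR", "R3", "R0"),
    ("R0", "XOR", "R0", "R4"),
]
effective = remove_structural_introns(program, ["R0"])
```

The function returns a new list and leaves `program` untouched, mirroring the point that only the executed copy is compressed while the original individual keeps taking part in evolution.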

  • Fuse Mediation Router

    Fuse Mediation Router

    Fuse Mediation Router is an open source tool for integrating services using Enterprise Integration Patterns based on Apache Camel for use in enterprise IT organizations. It is certified, productized and fully supported by the people who wrote the code. Fuse Mediation Router uses a standard method of notation to go from diagram to implementation without coding. Fuse Mediation Router is a rule-based routing and process mediation engine that combines the ease of basic POJO development with the clarity of the standard Enterprise Integration Patterns. It can be deployed inside any container or be used stand-alone, and works directly with any kind of transport or messaging model to rapidly integrate existing services and applications. Fuse Mediation Router is now a part of Red Hat JBoss Fuse. == Tooling == FuseSource offers graphical, Eclipse-based tooling for Apache Camel for download.

  • Optical neural network

    Optical neural network

    An optical neural network is a physical implementation of an artificial neural network with optical components. Early optical neural networks used a photorefractive volume hologram to interconnect arrays of input neurons to arrays of outputs, with synaptic weights in proportion to the multiplexed hologram's strength. Volume holograms were further multiplexed using spectral hole burning to add one dimension of wavelength to space, achieving four-dimensional interconnects of two-dimensional arrays of neural inputs and outputs. This research led to extensive research on alternative methods using the strength of the optical interconnect for implementing neuronal communications. Some artificial neural networks that have been implemented as optical neural networks include the Hopfield neural network and the Kohonen self-organizing map with liquid crystal spatial light modulators. Optical neural networks can also be based on the principles of neuromorphic engineering, creating neuromorphic photonic systems. Typically, these systems encode information in the networks using spikes, mimicking the functionality of spiking neural networks in optical and photonic hardware. Photonic devices that have demonstrated neuromorphic functionalities include (among others) vertical-cavity surface-emitting lasers, integrated photonic modulators, optoelectronic systems based on superconducting Josephson junctions, and systems based on resonant tunnelling diodes. == Electrochemical vs. optical neural networks == Biological neural networks function on an electrochemical basis, while optical neural networks use electromagnetic waves. Optical interfaces to biological neural networks can be created with optogenetics, but such an interface is not the same as an optical neural network. In biological neural networks there exist many different mechanisms for dynamically changing the state of the neurons, including short-term and long-term synaptic plasticity. 
Synaptic plasticity is among the electrophysiological phenomena used to control the efficiency of synaptic transmission, long-term for learning and memory, and short-term for short transient changes in synaptic transmission efficiency. Implementing this with optical components is difficult, and ideally requires advanced photonic materials. Properties that might be desirable in photonic materials for optical neural networks include the ability to change their efficiency of transmitting light, based on the intensity of incoming light. == Rising Era of Optical Neural Networks == With the increasing significance of computer vision in various domains, the computational cost of these tasks has increased, making it more important to develop new approaches to accelerating processing. Optical computing has emerged as a potential alternative to GPU acceleration for modern neural networks, particularly considering the looming obsolescence of Moore's Law. Consequently, optical neural networks have garnered increased attention in the research community. Presently, two primary methods of optical neural computing are under research: silicon photonics-based and free-space optics. Each approach has its benefits and drawbacks; while silicon photonics may offer superior speed, it lacks the massive parallelism that free-space optics can deliver. Given the substantial parallelism capabilities of free-space optics, researchers have focused on taking advantage of it. One implementation, proposed by Lin et al., involves the training and fabrication of phase masks for a handwritten digit classifier. By stacking 3D-printed phase masks, light passing through the fabricated network can be read by a photodetector array of ten detectors, each representing one of the ten digit classes. Although this network can achieve terahertz-range classification, it lacks flexibility, as the phase masks are fabricated for a specific task and cannot be retrained. 
An alternative method for classification in free-space optics, introduced by Chang et al., employs a 4F system that is based on the convolution theorem to perform convolution operations. This system uses two lenses to execute the Fourier transforms of the convolution operation, enabling passive conversion into the Fourier domain without power consumption or latency. However, the convolution operation kernels in this implementation are also fabricated phase masks, limiting the device's functionality to specific convolutional layers of the network only. In contrast, Li et al. proposed a technique involving kernel tiling to use the parallelism of the 4F system while using a Digital Micromirror Device (DMD) instead of a phase mask. This approach allows users to upload various kernels into the 4F system and execute the entire network's inference on a single device. Unfortunately, modern neural networks, which were primarily developed during the CPU/GPU era, are not designed for 4F systems, mostly because they tend to use a lower resolution and a high number of channels in their feature maps. == Other Implementations == In 2007 there was one model of optical neural network: the Programmable Optical Array/Analogic Computer (POAC). It had been implemented in the year 2000 and reported based on a modified Joint Fourier Transform Correlator (JTC) and Bacteriorhodopsin (BR) as a holographic optical memory. Full parallelism, large array size and the speed of light are three promises offered by POAC to implement an optical CNN. These have been investigated in recent years together with their practical limitations and considerations, yielding the design of the first portable POAC version. The practical details, covering hardware (optical setups) and software (optical templates), were published. 
However, POAC is a general-purpose, programmable array computer that has a wide range of applications, including image processing, pattern recognition, target tracking, real-time video processing, document security, and optical switching. == Progress in the 2020s == Taichi from Tsinghua University in Beijing is a hybrid ONN that combines the power efficiency and parallelism of optical diffraction and the configurability of optical interference. Taichi offers 13.96 million parameters. Taichi avoids the high error rates that afflict deep (multi-layer) networks by combining clusters of fewer-layer diffractive units with arrays of interferometers for reconfigurable computation. Its encoding protocol divides large network models into sub-models that can be distributed across multiple chiplets in parallel. Taichi achieved 91.89% accuracy in tests with the Omniglot database. It was also used to generate music in the style of Bach and images in the styles of Van Gogh and Munch. The developers claimed an energy efficiency of up to 160 trillion operations per second per watt and an area efficiency of 880 trillion multiply-accumulate operations per square millimetre, making it 10^3 times more energy efficient than the NVIDIA H100, and 10^2 times more energy efficient and 10 times more area efficient than previous ONNs. A time dimension has recently been introduced into diffractive neural networks by femtosecond (fs) laser lithography of perovskite hydration. The temporal behaviour of the neuron can be modulated by the fs laser at the nanoscale, enabling a programmable holographic neural network with temporal evolution functionality, i.e., the functionality can change with time under the hydration stimuli. An in-memory temporal inference functionality was demonstrated to mimic the function evolution of the human brain, i.e., the functionality can change over time from simple digit image classification to more complicated digit and clothing product image classification. 
This is the first introduction of a time dimension into an optical neural network, laying a foundation for future brain-like photonic chip development.
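
The 4F systems discussed earlier rely on the convolution theorem: a convolution in the spatial domain becomes a pointwise product in the Fourier domain. The sketch below is a one-dimensional digital analogue of that idea, not a model of the optics. It assumes circular convolution and a naive discrete Fourier transform, and the signal and kernel values are made up for illustration.

```python
import cmath


def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse when requested)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            signal[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
            for t in range(n)
        )
        out.append(total / n if inverse else total)
    return out


def circular_convolution(signal, kernel):
    """Direct circular convolution, used as the reference result."""
    n = len(signal)
    return [
        sum(signal[m] * kernel[(i - m) % n] for m in range(n))
        for i in range(n)
    ]


def fourier_convolution(signal, kernel):
    """Transform both inputs, multiply pointwise, and transform back."""
    product = [a * b for a, b in zip(dft(signal), dft(kernel))]
    return [value.real for value in dft(product, inverse=True)]


signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 0.0]
# Kernel zero-padded to the signal length so both transforms match in size.
kernel = [0.25, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
direct = circular_convolution(signal, kernel)
via_fourier = fourier_convolution(signal, kernel)
```

Both routes give the same values up to floating-point rounding. In the optical setting the two transforms are performed by lenses and the pointwise product by a mask or a DMD in the Fourier plane, which is why the choice of kernel representation determines how flexible the device is.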

  • Fitness approximation

    Fitness approximation

    Fitness approximation aims to approximate the objective or fitness functions in evolutionary optimization by building up machine learning models based on data collected from numerical simulations or physical experiments. The machine learning models for fitness approximation are also known as meta-models or surrogates, and evolutionary optimization based on approximated fitness evaluations is also known as surrogate-assisted evolutionary approximation. Fitness approximation in evolutionary optimization can be seen as a sub-area of data-driven evolutionary optimization. == Approximate models in function optimization == === Motivation === In many real-world optimization problems, including engineering problems, the number of fitness function evaluations needed to obtain a good solution dominates the optimization cost. In order to obtain efficient optimization algorithms, it is crucial to use prior information gained during the optimization process. Conceptually, a natural approach to utilizing the known prior information is building a model of the fitness function to assist in the selection of candidate solutions for evaluation. A variety of techniques for constructing such a model (often also referred to as a surrogate, metamodel or approximation model) have been considered for computationally expensive optimization problems. === Approaches === Common approaches to constructing approximate models based on learning and interpolation from known fitness values of a small population include: low-degree polynomials and regression models; Fourier surrogate modeling; artificial neural networks, including multilayer perceptrons and radial basis function networks; and support vector machines. Due to the limited number of training samples and high dimensionality encountered in engineering design optimization, constructing a globally valid approximate model remains difficult. As a result, evolutionary algorithms using such approximate fitness functions may converge to local optima. 
Therefore, it can be beneficial to selectively use the original fitness function together with the approximate model.
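
One way to combine the original fitness function with an approximate model is to pre-screen candidates with the surrogate and spend a true evaluation only on the most promising one. The sketch below illustrates that loop under several assumptions: the "expensive" function is a stand-in quadratic, the surrogate is an inverse-distance-weighted average of the nearest evaluated points (a simpler technique than the model types listed above, used here only to keep the code short), and all names and parameter values are hypothetical.

```python
import random


def true_fitness(x):
    """Stand-in for an expensive simulation (hypothetical); lower is better."""
    return (x - 2.0) ** 2


def surrogate(x, archive, k=3):
    """Predict fitness from the k nearest evaluated points, weighted by 1/distance."""
    nearest = sorted(archive, key=lambda item: abs(item[0] - x))[:k]
    weights = [1.0 / (abs(xi - x) + 1e-9) for xi, _ in nearest]
    return sum(w * fi for w, (_, fi) in zip(weights, nearest)) / sum(weights)


def optimize(generations=20, offspring=10, seed=0):
    """Run a surrogate-assisted search; return (best_x, best_f, true evaluations)."""
    rng = random.Random(seed)
    # Initial archive: a few candidates evaluated with the original function.
    archive = [(x, true_fitness(x)) for x in (rng.uniform(-5, 5) for _ in range(5))]
    evaluations = len(archive)
    for _ in range(generations):
        parent = min(archive, key=lambda item: item[1])[0]
        candidates = [parent + rng.gauss(0.0, 0.5) for _ in range(offspring)]
        # Pre-screen with the cheap surrogate, then spend one true
        # evaluation on the candidate it ranks best.
        best_guess = min(candidates, key=lambda c: surrogate(c, archive))
        archive.append((best_guess, true_fitness(best_guess)))
        evaluations += 1
    best_x, best_f = min(archive, key=lambda item: item[1])
    return best_x, best_f, evaluations


best_x, best_f, evaluations = optimize()
```

With these settings the loop screens 200 candidates but calls `true_fitness` only 25 times. Since every archived value comes from the original function, the reported best solution is never based on a surrogate prediction alone, although a surrogate this crude can still steer the search toward a local optimum.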
