AI Detector Accuracy

AI Detector Accuracy — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • DryvIQ

    DryvIQ

    DryvIQ is a software application that enables businesses to migrate on-site system files and associated data across storage and content management platforms, as well as create synchronized hybrid storage systems. == History == Before it was DryvIQ, the software SkySync was released in 2013 by Ann Arbor, Michigan based company, Portal Architects, Inc. The company created SkySync, a back-end, administrative application designed to transfer content across storage platforms, after abandoning 18 months of development on a desktop application called SkyBrary in 2011. Between 2014 and 2015, Portal Architects established partnerships with the following companies: Autodesk, Box, Dropbox, Egnyte, EMC, Google, Syncplicity, Huddle, IBM, Microsoft, OpenText, Oracle, Citrix ShareFile, Hightail and Internet2. SkySync (currently DryvIQ) was named a "Cool Vendor in Content Management" by Gartner in 2015. In 2022, SkySync changed its name to DryvIQ, which is now what the company is currently known as. == Overview == DryvIQ is a software application that syncs, migrates or backs up files including their associated properties, metadata, versions, user accounts and permissions across on-premises and Cloud-based storage platforms. The software deploys on a server, virtual machine or within Microsoft Azure, Amazon Web Services or other cloud computing services.

    Read more →
  • Alec Radford

    Alec Radford

    Alec Radford is an American artificial intelligence researcher. == Biography == Radford grew up in Texas. He graduated from Cistercian Preparatory School in 2011, where he became an Eagle Scout, and dropped out of Olin College in August 2014, where he and fellow students Slater Victoroff, Diana Yuan, and Madison May had formed the startup Indico in their dorm room. In 2015, the quartet were joined by Luke Metz and the firm and the Facebook AI research lab in New York used generative adversarial networks to create realistic low pixel images. A demonstration of Indico's technology was used without proper attribution in an April 2016 demonstration by Nvidia chief executive Jensen Huang. Radford joined OpenAI around 2016, where he worked on natural-language processing. The following year, Radford trained a neural network on Amazon reviews. The model was fairly basic, with layers which allowed for human understanding. Upon exploring it, he saw that it had a special neuron linked to the sentiment of the reviews, which it had created on its own. This was a drastic improvement from previous neural networks that had analysed sentiment, because they had to be told to do so and specially trained on data that was explicitly labeled according to sentiment. This development made OpenAI chief scientist Ilya Sutskever consider that a future model, using more diverse language data, could map far more structures of meaning, eventually becoming a "learned core module" for superintelligence. In 2018, Radford was the lead author on OpenAI's seminal research paper on generative pre-trained transformers, which form the foundation of ChatGPT. At OpenAI, he worked on early GPT models, Whisper, a speech recognition model, and the image generator DALL-E. He left OpenAI in December 2024 to pursue independent research. Around March 2025, Radford joined Thinking Machines Lab as an advisor. He joined along with Bob McGrew who was previously the chief research officer of OpenAI. In April 2026, Radford, Nick Levine, and David Duvenaud released Talkie, an AI model trained on books, newspapers, scientific journals, patents, and case law published before December 31, 1930. When asked about the state of the world in 2026, it stated that one billion people would live in Europe, that London and New York would be connected by steamships that transit between the two in ten days, and "winter will be passed in Paris, and the summer in London."

    Read more →
  • Linde–Buzo–Gray algorithm

    Linde–Buzo–Gray algorithm

    The Linde–Buzo–Gray algorithm (named after its creators Yoseph Linde, Andrés Buzo and Robert M. Gray, who designed it in 1980) is an iterative vector quantization algorithm to improve a small set of vectors (codebook) to represent a larger set of vectors (training set), such that it will be locally optimal. It combines Lloyd's Algorithm with a splitting technique in which larger codebooks are built from smaller codebooks by splitting each code vector in two. The core idea of the algorithm is that by splitting the codebook such that all code vectors from the previous codebook are present, the new codebook must be as good as the previous one or better. == Description == The Linde–Buzo–Gray algorithm may be implemented as follows: algorithm linde-buzo-gray is input: set of training vectors training, codebook to improve old-codebook output: codebook that is twice the size and better or as good as old-codebook new-codebook ← {} for each old-codevector in old-codebook do insert old-codevector into new-codebook insert old-codevector + 𝜖 into new-codebook where 𝜖 is a small vector return lloyd(new-codebook, training) algorithm lloyd is input: codebook to improve, set of training vectors training output: improved codebook do previous-codebook ← codebook clusters ← divide training into |codebook| clusters, where each cluster contains all vectors in training who are best represented by the corresponding vector in codebook for each cluster cluster in clusters do the corresponding code vector in codebook ← the centroid of all training vectors in cluster while difference in error representing training between codebook and previous-codebook > 𝜖 return codebook

    Read more →
  • ML.NET

    ML.NET

    ML.NET is a free software machine learning library for the C# and F# programming languages. It also supports Python models when used together with NimbusML. The preview release of ML.NET included transforms for feature engineering like n-gram creation, and learners to handle binary classification, multi-class classification, and regression tasks. Additional ML tasks like anomaly detection and recommendation systems have since been added, and other approaches like deep learning will be included in future versions. == Machine learning == ML.NET brings model-based Machine Learning analytic and prediction capabilities to existing .NET developers. The framework is built upon .NET Core and .NET Standard inheriting the ability to run cross-platform on Linux, Windows and macOS. Although the ML.NET framework is new, its origins began in 2002 as a Microsoft Research project named TMSN (text mining search and navigation) for use internally within Microsoft products. It was later renamed to TLC (the learning code) around 2011. ML.NET was derived from the TLC library and has largely surpassed its parent says Dr. James McCaffrey, Microsoft Research. Developers can train a Machine Learning Model or reuse an existing Model by a 3rd party and run it on any environment offline. This means developers do not need to have a background in Data Science to use the framework. Support for the open-source Open Neural Network Exchange (ONNX) Deep Learning model format was introduced from build 0.3 in ML.NET. The release included other notable enhancements such as Factorization Machines, LightGBM, Ensembles, LightLDA transform and OVA. The ML.NET integration of TensorFlow is enabled from the 0.5 release. Support for x86 & x64 applications was added to build 0.7 including enhanced recommendation capabilities with Matrix Factorization. A full roadmap of planned features have been made available on the official GitHub repo. The first stable 1.0 release of the framework was announced at Build (developer conference) 2019. It included the addition of a Model Builder tool and AutoML (Automated Machine Learning) capabilities. Build 1.3.1 introduced a preview of Deep Neural Network training using C# bindings for Tensorflow and a Database loader which enables model training on databases. The 1.4.0 preview added ML.NET scoring on ARM processors and Deep Neural Network training with GPU's for Windows and Linux. === Performance === Microsoft's paper on machine learning with ML.NET demonstrated it is capable of training sentiment analysis models using large datasets while achieving high accuracy. Its results showed 95% accuracy on Amazon's 9GB review dataset. === Model builder === The ML.NET CLI is a Command-line interface which uses ML.NET AutoML to perform model training and pick the best algorithm for the data. The ML.NET Model Builder preview is an extension for Visual Studio that uses ML.NET CLI and ML.NET AutoML to output the best ML.NET Model using a GUI. === Model explainability === AI fairness and explainability has been an area of debate for AI Ethicists in recent years. A major issue for Machine Learning applications is the black box effect where end users and the developers of an application are unsure of how an algorithm came to a decision or whether the dataset contains bias. Build 0.8 included model explainability API's that had been used internally in Microsoft. It added the capability to understand the feature importance of models with the addition of 'Overall Feature Importance' and 'Generalized Additive Models'. When there are several variables that contribute to the overall score, it is possible to see a breakdown of each variable and which features had the most impact on the final score. The official documentation demonstrates that the scoring metrics can be output for debugging purposes. During training & debugging of a model, developers can preview and inspect live filtered data. This is possible using the Visual Studio DataView tools. === Infer.NET === Microsoft Research announced the popular Infer.NET model-based machine learning framework used for research in academic institutions since 2008 has been released open source and is now part of the ML.NET framework. The Infer.NET framework utilises probabilistic programming to describe probabilistic models which has the added advantage of interpretability. The Infer.NET namespace has since been changed to Microsoft.ML.Probabilistic consistent with ML.NET namespaces. === NimbusML Python support === Microsoft acknowledged that the Python programming language is popular with Data Scientists, so it has introduced NimbusML the experimental Python bindings for ML.NET. This enables users to train and use machine learning models in Python. It was made open source similar to Infer.NET. === Machine learning in the browser === ML.NET allows users to export trained models to the Open Neural Network Exchange (ONNX) format. This establishes an opportunity to use models in different environments that don't use ML.NET. It would be possible to run these models in the client side of a browser using ONNX.js, a JavaScript client-side framework for deep learning models created in the Onnx format. === AI School Machine Learning Course === Along with the rollout of the ML.NET preview, Microsoft rolled out free AI tutorials and courses to help developers understand techniques needed to work with the framework.

    Read more →
  • Adversarial machine learning

    Adversarial machine learning

    Adversarial machine learning is the study of the attacks on machine learning algorithms, and of the defenses against such attacks. Machine learning techniques are mostly designed to work on specific problem sets, under the assumption that the training and test data are generated from the same statistical distribution (IID). However, this assumption is often violated in practical high-stake applications, where users may intentionally supply fabricated data that violates the statistical assumption. Most common attacks in adversarial machine learning include evasion attacks, data poisoning attacks, Byzantine attacks and model extraction. == History == At the MIT Spam Conference in January 2004, John Graham-Cumming showed that a machine-learning spam filter could be used to defeat another machine-learning spam filter by automatically learning which words to add to a spam email to get the email classified as not spam. In 2004, Nilesh Dalvi and others noted that linear classifiers used in spam filters could be defeated by simple "evasion attacks" as spammers inserted "good words" into their spam emails. (Around 2007, some spammers added random noise to fuzz words within "image spam" in order to defeat OCR-based filters.) In 2006, Marco Barreno and others published "Can Machine Learning Be Secure?", outlining a broad taxonomy of attacks. As late as 2013 many researchers continued to hope that non-linear classifiers (such as support vector machines and neural networks) might be robust to adversaries, until Battista Biggio and others demonstrated the first gradient-based attacks on such machine-learning models (2012–2013). In 2012, deep neural networks began to dominate computer vision problems; starting in 2014, Christian Szegedy and others demonstrated that deep neural networks could be fooled by adversaries, again using a gradient-based attack to craft adversarial perturbations. Further work would show that adversarial attacks are harder to produce in uncontrolled environments, due to the different environmental constraints that cancel out the effect of noise. For example, any small rotation or slight illumination on an adversarial image can destroy the adversariality. In addition, researchers such as Google Brain's Nick Frosst point out that it is much easier to make self-driving cars miss stop signs by physically removing the sign itself, rather than creating adversarial examples. Frosst also believes that the adversarial machine learning community incorrectly assumes models trained on a certain data distribution will also perform well on a completely different data distribution. He suggests that a new approach to machine learning should be explored, and is currently working on a unique neural network that has characteristics more similar to human perception than state-of-the-art approaches. While adversarial machine learning continues to be heavily rooted in academia, large tech companies such as Google, Microsoft, and IBM have begun curating documentation and open source code bases to allow others to concretely assess the robustness of machine learning models and minimize the risk of adversarial attacks. === Examples === Examples include attacks in spam filtering, where spam messages are obfuscated through the misspelling of "bad" words or the insertion of "good" words; attacks in computer security, such as obfuscating malware code within network packets or modifying the characteristics of a network flow to mislead intrusion detection; attacks in biometric recognition where fake biometric traits may be exploited to impersonate a legitimate user; or to compromise users' template galleries that adapt to updated traits over time. Researchers showed that by changing only one-pixel it was possible to fool deep learning algorithms. Others 3-D printed a toy turtle with a texture engineered to make Google's object detection AI classify it as a rifle regardless of the angle from which the turtle was viewed. Creating the turtle required only low-cost commercially available 3-D printing technology. A machine-tweaked image of a dog was shown to look like a cat to both computers and humans. A 2019 study reported that humans can guess how machines will classify adversarial images. Researchers discovered methods for perturbing the appearance of a stop sign such that an autonomous vehicle classified it as a merge or speed limit sign. A data poisoning filter called Nightshade was released in 2023 by researchers at the University of Chicago. It was created for use by visual artists to put on their artwork to corrupt the data set of text-to-image models, which usually scrape their data from the internet without the consent of the image creator. McAfee attacked Tesla's former Mobileye system, fooling it into driving 50 mph over the speed limit, simply by adding a two-inch strip of black tape to a speed limit sign. Adversarial patterns on glasses or clothing designed to deceive facial-recognition systems or license-plate readers, have led to a niche industry of "stealth streetwear". An adversarial attack on a neural network can allow an attacker to inject algorithms into the target system. Researchers can also create adversarial audio inputs to disguise commands to intelligent assistants in benign-seeming audio; a parallel literature explores human perception of such stimuli. Clustering algorithms are used in security applications. Malware and computer virus analysis aims to identify malware families, and to generate specific detection signatures. In the context of malware detection, researchers have proposed methods for adversarial malware generation that automatically craft binaries to evade learning-based detectors while preserving malicious functionality. Optimization-based attacks such as GAMMA use genetic algorithms to inject benign content (for example, padding or new PE sections) into Windows executables, framing evasion as a constrained optimization problem that balances misclassification success with the size of the injected payload and showing transferability to commercial antivirus products. Complementary work uses generative adversarial networks (GANs) to learn feature-space perturbations that cause malware to be classified as benign; Mal-LSGAN, for instance, replaces the standard GAN loss with a least-squares objective and modified activation functions to improve training stability and produce adversarial malware examples that substantially reduce true positive rates across multiple detectors. == Challenges in applying machine learning to security == Researchers have observed that the constraints under which machine-learning techniques function in the security domain are different from those of common benchmark domains. Security data may change over time, include mislabeled samples, or reflect adversarial behavior, which complicates evaluation and reproducibility. === Data collection issues === Security datasets vary across formats, including binaries, network traces, and log files. Studies have reported that the process of converting these sources into features can introduce bias or inconsistencies. In addition, time-based leakage can occur when related malware samples are not properly separated across training and testing splits, which may lead to overly optimistic results. === Labeling and ground truth challenges === Malware labels are often unstable because different antivirus engines may classify the same sample in conflicting ways. Ceschin et al. note that families may be renamed or reorganized over time, causing further discrepancies in ground truth and reducing the reliability of benchmarks. === Concept drift === Because malware creators continuously adapt their techniques, the statistical properties of malicious samples also change. This form of concept drift has been widely documented and may reduce model performance unless systems are updated regularly or incorporate mechanisms for incremental learning. === Feature robustness === Researchers differentiate between features that can be easily manipulated and those that are more resistant to modification. For example, simple static attributes, such as header fields, may be altered by attackers, while structural features, such as control-flow graphs, are generally more stable but computationally expensive to extract. === Class imbalance === In realistic deployment environments, the proportion of malicious samples can be extremely low, ranging from 0.01% to 2% of total data. This unbalanced distribution causes models to develop a bias towards the majority class, achieving high accuracy but failing to identify malicious samples. Prior approaches to this problem have included both data-level solutions and sequence-specific models. Methods like n-gram and Long Short-Term Memory (LSTM) networks can model sequential data, but their performance has been shown to decline significantly when malware samples are realistically proportioned in the training set, demonstrating the limitations in

    Read more →
  • Hyper basis function network

    Hyper basis function network

    In machine learning, a Hyper basis function network, or HyperBF network, is a generalization of radial basis function (RBF) networks concept, where the Mahalanobis-like distance is used instead of the Euclidean distance measure. Hyper basis function networks were first introduced by Poggio and Girosi in the 1990 paper “Networks for Approximation and Learning”. == Network Architecture == The typical HyperBF network structure consists of a real input vector x ∈ R n {\displaystyle x\in \mathbb {R} ^{n}} , a hidden layer of activation functions and a linear output layer. The output of the network is a scalar function of the input vector, ϕ : R n → R {\displaystyle \phi :\mathbb {R} ^{n}\to \mathbb {R} } , is given by where N {\displaystyle N} is a number of neurons in the hidden layer, μ j {\displaystyle \mu _{j}} and a j {\displaystyle a_{j}} are the center and weight of neuron j {\displaystyle j} . The activation function ρ j ( | | x − μ j | | ) {\displaystyle \rho _{j}(||x-\mu _{j}||)} at the HyperBF network takes the following form where R j {\displaystyle R_{j}} is a positive definite d × d {\displaystyle d\times d} matrix. Depending on the application, the following types of matrices R j {\displaystyle R_{j}} are usually considered R j = 1 2 σ 2 I d × d {\displaystyle R_{j}={\frac {1}{2\sigma ^{2}}}\mathbb {I} _{d\times d}} , where σ > 0 {\displaystyle \sigma >0} . This case corresponds to the regular RBF network. R j = 1 2 σ j 2 I d × d {\displaystyle R_{j}={\frac {1}{2\sigma _{j}^{2}}}\mathbb {I} _{d\times d}} , where σ j > 0 {\displaystyle \sigma _{j}>0} . In this case, the basis functions are radially symmetric, but are scaled with different width. R j = d i a g ( 1 2 σ j 1 2 , . . . , 1 2 σ j z 2 ) I d × d {\displaystyle R_{j}=diag\left({\frac {1}{2\sigma _{j1}^{2}}},...,{\frac {1}{2\sigma _{jz}^{2}}}\right)\mathbb {I} _{d\times d}} , where σ j i > 0 {\displaystyle \sigma _{ji}>0} . Every neuron has an elliptic shape with a varying size. Positive definite matrix, but not diagonal. == Training == Training HyperBF networks involves estimation of weights a j {\displaystyle a_{j}} , shape and centers of neurons R j {\displaystyle R_{j}} and μ j {\displaystyle \mu _{j}} . Poggio and Girosi (1990) describe the training method with moving centers and adaptable neuron shapes. The outline of the method is provided below. Consider the quadratic loss of the network H [ ϕ ∗ ] = ∑ i = 1 N ( y i − ϕ ∗ ( x i ) ) 2 {\displaystyle H[\phi ^{}]=\sum _{i=1}^{N}(y_{i}-\phi ^{}(x_{i}))^{2}} . The following conditions must be satisfied at the optimum: where R j = W T W {\displaystyle R_{j}=W^{T}W} . Then in the gradient descent method the values of a j , μ j , W {\displaystyle a_{j},\mu _{j},W} that minimize H [ ϕ ∗ ] {\displaystyle H[\phi ^{}]} can be found as a stable fixed point of the following dynamic system: where ω {\displaystyle \omega } determines the rate of convergence. Overall, training HyperBF networks can be computationally challenging. Moreover, the high degree of freedom of HyperBF leads to overfitting and poor generalization. However, HyperBF networks have an important advantage that a small number of neurons is enough for learning complex functions.

    Read more →
  • Innovation Center for Artificial Intelligence

    Innovation Center for Artificial Intelligence

    The Innovation Center for Artificial Intelligence (ICAI) is a Dutch national network focused on joint technology development between academia, industry and government in the area of artificial intelligence (AI). The initiative was launched in April 2018 and is based at Amsterdam Science Park. As of 2024, the director of the ICAI is Maarten de Rijke. In November 2018, ICAI announced its contribution to AINED, the first iteration of the Dutch National AI Strategy. In January 2023, Maastricht University announced the ROBUST program, led by the Innovation Center for Artificial Intelligence (ICAI) and supported by the University of Amsterdam and others. This initiative focuses on advancing research in trustworthy AI technology across various sectors, notably healthcare and energy, in the Netherlands. The program's plan includes the creation of 17 new labs and the appointment of PhD candidates, backed by a €25 million funding from the Dutch Research Council (NWO). == Labs == The ICAI network is linked to several collaborative labs: Thira Lab (Imaging): Thirona, Delft Imaging Systems and Radboud UMC, founded March 2019 AIMLab (AI for Medical Imaging): Uva and Inception Institute of Artificial Intelligence from the United Arab Emirates, founded March 2019 AFL (AI for Fintech): ING and Delft University of Technology, founded March 2019 Police Lab AI: Dutch National Police, founded January 2019 Elsevier AI Lab: Uva and Elsevier, founded October 2018 AIRLab Delft (AI for Retail Robotics): TU Delft Robotics and AholdDelhaize, founded November 2018 Quva Lab (Deep Vision): Uva and Qualcomm, founded 2016 (prior to ICAI) AIRLab Amsterdam (AI for Retail): Uva and AholdDelhaize, founded April 2018 DeltaLab (Deep Learning Technologies Amsterdam): Uva and Bosch, founded April 2017 (prior to ICAI) AI4SE (AI for Software Engineering Lab) Delft University of Technology and JetBrains, founded October 2023 Atlas Lab: Uva and TomTom (TOM2)

    Read more →
  • Predictive Model Markup Language

    Predictive Model Markup Language

    The Predictive Model Markup Language (PMML) is an XML-based predictive model interchange format conceived by Robert Lee Grossman, then the director of the National Center for Data Mining at the University of Illinois at Chicago. PMML provides a way for analytic applications to describe and exchange predictive models produced by data mining and machine learning algorithms. It supports common models such as logistic regression and other feedforward neural networks. Version 0.9 was published in 1998. Subsequent versions have been developed by the Data Mining Group. Since PMML is an XML-based standard, the specification comes in the form of an XML schema. PMML itself is a mature standard with over 30 organizations having announced products supporting PMML. == PMML components == A PMML file can be described by the following components: Header: contains general information about the PMML document, such as copyright information for the model, its description, and information about the application used to generate the model such as name and version. It also contains an attribute for a timestamp which can be used to specify the date of model creation. Data Dictionary: contains definitions for all the possible fields used by the model. It is here that a field is defined as continuous, categorical, or ordinal (attribute optype). Depending on this definition, the appropriate value ranges are then defined as well as the data type (such as, string or double). Data Transformations: transformations allow for the mapping of user data into a more desirable form to be used by the mining model. PMML defines several kinds of simple data transformations. Normalization: map values to numbers, the input can be continuous or discrete. Discretization: map continuous values to discrete values. Value mapping: map discrete values to discrete values. Functions (custom and built-in): derive a value by applying a function to one or more parameters. Aggregation: used to summarize or collect groups of values. Model: contains the definition of the data mining model. E.g., A multi-layered feedforward neural network is represented in PMML by a "NeuralNetwork" element which contains attributes such as: Model Name (attribute modelName) Function Name (attribute functionName) Algorithm Name (attribute algorithmName) Activation Function (attribute activationFunction) Number of Layers (attribute numberOfLayers) This information is then followed by three kinds of neural layers which specify the architecture of the neural network model being represented in the PMML document. These attributes are NeuralInputs, NeuralLayer, and NeuralOutputs. Besides neural networks, PMML allows for the representation of many other types of models including support vector machines, association rules, Naive Bayes classifier, clustering models, text models, decision trees, and different regression models. Mining Schema: a list of all fields used in the model. This can be a subset of the fields as defined in the data dictionary. It contains specific information about each field, such as: Name (attribute name): must refer to a field in the data dictionary Usage type (attribute usageType): defines the way a field is to be used in the model. Typical values are: active, predicted, and supplementary. Predicted fields are those whose values are predicted by the model. Outlier Treatment (attribute outliers): defines the outlier treatment to be use. In PMML, outliers can be treated as missing values, as extreme values (based on the definition of high and low values for a particular field), or as is. Missing Value Replacement Policy (attribute missingValueReplacement): if this attribute is specified then a missing value is automatically replaced by the given values. Missing Value Treatment (attribute missingValueTreatment): indicates how the missing value replacement was derived (e.g. as value, mean or median). Targets: allows for post-processing of the predicted value in the format of scaling if the output of the model is continuous. Targets can also be used for classification tasks. In this case, the attribute priorProbability specifies a default probability for the corresponding target category. It is used if the prediction logic itself did not produce a result. This can happen, e.g., if an input value is missing and there is no other method for treating missing values. Output: this element can be used to name all the desired output fields expected from the model. These are features of the predicted field and so are typically the predicted value itself, the probability, cluster affinity (for clustering models), standard error, etc. The latest release of PMML, PMML 4.1, extended Output to allow for generic post-processing of model outputs. In PMML 4.1, all the built-in and custom functions that were originally available only for pre-processing became available for post-processing too. == PMML 4.0, 4.1, 4.2 and 4.3 == PMML 4.0 was released on June 16, 2009. Examples of new features included: Improved Pre-Processing Capabilities: Additions to built-in functions include a range of Boolean operations and an If-Then-Else function. Time Series Models: New exponential Smoothing models; also place holders for ARIMA, Seasonal Trend Decomposition, and Spectral density estimation, which are to be supported in the near future. Model Explanation: Saving of evaluation and model performance measures to the PMML file itself. Multiple Models: Capabilities for model composition, ensembles, and segmentation (e.g., combining of regression and decision trees). Extensions of Existing Elements: Addition of multi-class classification for Support Vector Machines, improved representation for Association Rules, and the addition of Cox Regression Models. PMML 4.1 was released on December 31, 2011. New features included: New model elements for representing Scorecards, k-Nearest Neighbors (KNN) and Baseline Models. Simplification of multiple models. In PMML 4.1, the same element is used to represent model segmentation, ensemble, and chaining. Overall definition of field scope and field names. A new attribute that identifies for each model element if the model is ready or not for production deployment. Enhanced post-processing capabilities (via the Output element). PMML 4.2 was released on February 28, 2014. New features include: Transformations: New elements for implementing text mining New built-in functions for implementing regular expressions: matches, concat, and replace Simplified outputs for post-processing Enhancements to Scorecard and Naive Bayes model elements PMML 4.3 was released on August 23, 2016. New features include: New Model Types: Gaussian Process Bayesian Network New built-in functions Usage clarifications Documentation improvements Version 4.4 was released in November 2019. == Release history == == Data Mining Group == The Data Mining Group is a consortium managed by the Center for Computational Science Research, Inc., a nonprofit founded in 2008. The Data Mining Group also developed a standard called Portable Format for Analytics, or PFA, which is complementary to PMML.

    Read more →
  • Solomonoff's theory of inductive inference

    Solomonoff's theory of inductive inference

    Solomonoff's theory of inductive inference proves that, under its common sense assumptions (axioms), the best possible scientific model is the shortest algorithm that generates the empirical data under consideration. In addition to the choice of data, other assumptions are that, to avoid the post-hoc fallacy, the programming language must be chosen prior to the data and that the environment being observed is generated by an unknown algorithm. This is also called a theory of induction. Due to its basis in the dynamical (state-space model) character of Algorithmic Information Theory, it encompasses statistical as well as dynamical information criteria for model selection. It was introduced by Ray Solomonoff, based on probability theory and theoretical computer science. In essence, Solomonoff's induction derives the posterior probability of any computable theory, given a sequence of observed data. This posterior probability is derived from Bayes' rule and some universal prior, that is, a prior that assigns a positive probability to any computable theory. Solomonoff proved that this induction is incomputable (or more precisely, lower semi-computable), but noted that "this incomputability is of a very benign kind", and that it "in no way inhibits its use for practical prediction" (as it can be approximated from below more accurately with more computational resources). It is only "incomputable" in the benign sense that no scientific consensus is able to prove that the best current scientific theory is the best of all possible theories. However, Solomonoff's theory does provide an objective criterion for deciding among the current scientific theories explaining a given set of observations. Solomonoff's induction naturally formalizes Occam's razor by assigning larger prior credences to theories that require a shorter algorithmic description. == Origin == === Philosophical === The theory is based in philosophical foundations, and was founded by Ray Solomonoff around 1960. It is a mathematically formalized combination of Occam's razor and the Principle of Multiple Explanations. All computable theories which perfectly describe previous observations are used to calculate the probability of the next observation, with more weight put on the shorter computable theories. Marcus Hutter's universal artificial intelligence builds upon this to calculate the expected value of an action. === Principle === Solomonoff's induction has been argued to be the computational formalization of pure Bayesianism. To understand, recall that Bayesianism derives the posterior probability P [ T | D ] {\displaystyle \mathbb {P} [T|D]} of a theory T {\displaystyle T} given data D {\displaystyle D} by applying Bayes rule, which yields P [ T | D ] = P [ D | T ] P [ T ] P [ D | T ] P [ T ] + ∑ A ≠ T P [ D | A ] P [ A ] {\displaystyle \mathbb {P} [T|D]={\frac {\mathbb {P} [D|T]\mathbb {P} [T]}{\mathbb {P} [D|T]\mathbb {P} [T]+\sum _{A\neq T}\mathbb {P} [D|A]\mathbb {P} [A]}}} where theories A {\displaystyle A} are alternatives to theory T {\displaystyle T} . For this equation to make sense, the quantities P [ D | T ] {\displaystyle \mathbb {P} [D|T]} and P [ D | A ] {\displaystyle \mathbb {P} [D|A]} must be well-defined for all theories T {\displaystyle T} and A {\displaystyle A} . In other words, any theory must define a probability distribution over observable data D {\displaystyle D} . Solomonoff's induction essentially boils down to demanding that all such probability distributions be computable. Interestingly, the set of computable probability distributions is a subset of the set of all programs, which is countable. Similarly, the sets of observable data considered by Solomonoff were finite. Without loss of generality, we can thus consider that any observable data is a finite bit string. As a result, Solomonoff's induction can be defined by only invoking discrete probability distributions. Solomonoff's induction then allows to make probabilistic predictions of future data F {\displaystyle F} , by simply obeying the laws of probability. Namely, we have P [ F | D ] = E T [ P [ F | T , D ] ] = ∑ T P [ F | T , D ] P [ T | D ] {\displaystyle \mathbb {P} [F|D]=\mathbb {E} _{T}[\mathbb {P} [F|T,D]]=\sum _{T}\mathbb {P} [F|T,D]\mathbb {P} [T|D]} . This quantity can be interpreted as the average predictions P [ F | T , D ] {\displaystyle \mathbb {P} [F|T,D]} of all theories T {\displaystyle T} given past data D {\displaystyle D} , weighted by their posterior credences P [ T | D ] {\displaystyle \mathbb {P} [T|D]} . === Mathematical === The proof of the "razor" is based on the known mathematical properties of a probability distribution over a countable set. These properties are relevant because the infinite set of all programs is a denumerable set. The sum S of the probabilities of all programs must be exactly equal to one (as per the definition of probability) thus the probabilities must roughly decrease as we enumerate the infinite set of all programs, otherwise S will be strictly greater than one. To be more precise, for every ϵ {\displaystyle \epsilon } > 0, there is some length l such that the probability of all programs longer than l is at most ϵ {\displaystyle \epsilon } . This does not, however, preclude very long programs from having very high probability. Fundamental ingredients of the theory are the concepts of algorithmic probability and Kolmogorov complexity. The universal prior probability of any prefix p of a computable sequence x is the sum of the probabilities of all programs (for a universal computer) that compute something starting with p. Given some p and any computable but unknown probability distribution from which x is sampled, the universal prior and Bayes' theorem can be used to predict the yet unseen parts of x in optimal fashion. == Mathematical guarantees == === Solomonoff's completeness === The remarkable property of Solomonoff's induction is its completeness. In essence, the completeness theorem guarantees that the expected cumulative errors made by the predictions based on Solomonoff's induction are upper-bounded by the Kolmogorov complexity of the (stochastic) data generating process. The errors can be measured using the Kullback–Leibler divergence or the square of the difference between the induction's prediction and the probability assigned by the (stochastic) data generating process. === Solomonoff's uncomputability === Unfortunately, Solomonoff also proved that Solomonoff's induction is uncomputable. In fact, he showed that computability and completeness are mutually exclusive: any complete theory must be uncomputable. The proof of this is derived from a game between the induction and the environment. Essentially, any computable induction can be tricked by a computable environment, by choosing the computable environment that negates the computable induction's prediction. This fact can be regarded as an instance of the no free lunch theorem. == Modern applications == === Artificial intelligence === Though Solomonoff's inductive inference is not computable, several AIXI-derived algorithms approximate it in order to make it run on a modern computer. The more computing power they are given, the closer their predictions are to the predictions of inductive inference (their mathematical limit is Solomonoff's inductive inference). Another direction of inductive inference is based on E. Mark Gold's model of learning in the limit from 1967 and has developed since then more and more models of learning. The general scenario is the following: Given a class S of computable functions, is there a learner (that is, recursive functional) which for any input of the form (f(0),f(1),...,f(n)) outputs a hypothesis (an index e with respect to a previously agreed on acceptable numbering of all computable functions; the indexed function may be required consistent with the given values of f). A learner M learns a function f if almost all its hypotheses are the same index e, which generates the function f; M learns S if M learns every f in S. Basic results are that all recursively enumerable classes of functions are learnable while the class REC of all computable functions is not learnable. Many related models have been considered and also the learning of classes of recursively enumerable sets from positive data is a topic studied from Gold's pioneering paper in 1967 onwards. A far reaching extension of the Gold’s approach is developed by Schmidhuber's theory of generalized Kolmogorov complexities, which are kinds of super-recursive algorithms.

    Read more →
  • Nortel Speech Server

    Nortel Speech Server

    The Nortel Speech Server (formerly known as Periphonics Speech Processing Platform) in telecommunications is a speech processing system that was originally developed by Nortel. Following the bankruptcy of Nortel, it is now sold by Avaya. The system is primarily used for large vocabulary speech recognition, natural language understanding, text-to-speech, and speaker verification. The Nortel Speech Server was based on the Periphonics OSCAR platform. The original OSCAR Platform was based upon Solaris servers. The current range of Speech Servers is Windows based. Nortel Speech Server is a component of the MPS 500, MPS 1000, and ICP platforms. On MPS systems, it may be used to stream prerecorded audio.

    Read more →
  • Jan Leike

    Jan Leike

    Jan Leike (born 1986 or 1987) is an AI alignment researcher who has worked at DeepMind and OpenAI. He joined Anthropic in May 2024. == Education == Jan Leike obtained his undergraduate degree from the University of Freiburg in Germany. After earning a master's degree in computer science, he pursued a PhD in machine learning at the Australian National University under the supervision of Marcus Hutter. == Career == Leike made a six-month postdoctoral fellowship at the Future of Humanity Institute before joining DeepMind to focus on empirical AI safety research, where he collaborated with Shane Legg. === OpenAI === In 2021, Leike joined OpenAI. In June 2023, he and Ilya Sutskever became the co-leaders of the newly introduced "superalignment" project, which aimed to determine how to align future artificial superintelligences within four years to ensure their safety. This project involved automating AI alignment research using relatively advanced AI systems. At the time, Sutskever was OpenAI's Chief Scientist, and Leike was the Head of Alignment. Leike was featured in Time's list of the 100 most influential personalities in AI, both in 2023 and in 2024. In May 2024, Leike announced his resignation from OpenAI, following the departure of Sutskever, Daniel Kokotajlo and several other AI safety employees from the company. Leike wrote that "Over the past years, safety culture and processes have taken a backseat to shiny products", and that he "gradually lost trust" in OpenAI's leadership. In May 2024, Leike joined Anthropic, an AI company founded by former OpenAI employees.

    Read more →
  • Jailbreak (computer science)

    Jailbreak (computer science)

    In computer security, jailbreaking is defined as the act of removing limitations that a vendor attempted to hard-code or hard-wire into its hardware and/or software. It is a form of privilege escalation. The term may have originated with the use of toolsets to break out of a chroot or jail in UNIX-like operating systems. This allowed the user to see files outside of the file system that the administrator intended to make available to the application or user in question. The term was first used in its modern meaning in the iPhone/iOS jailbreaking community and has also been used as a term for PlayStation Portable hacking; these devices have repeatedly been subject to jailbreaks, allowing the execution of arbitrary code, and sometimes have had those jailbreaks disabled by vendor updates, especially in the case of iOS devices. == iOS jailbreaking == iOS systems including the iPhone, iPad, and iPod Touch have been subject to iOS jailbreaking efforts since they were released, and continuing with each firmware update. iOS jailbreaking tools have included the option to install package frontends such as Cydia and Installer.app, third-party alternatives to the App Store, as a way to find and install system tweaks and binaries. To prevent iOS jailbreaking, Apple has made the device boot ROM execute checks for SHSH blobs in order to disallow uploads of custom kernels and prevent software downgrades to earlier, jailbreakable firmware. In an "untethered" jailbreak, the iBoot environment is changed to execute a boot ROM exploit and allow submission of a patched low level bootloader or hack the kernel to submit the jailbroken kernel after the SHSH check. == Other phones == A similar method of jailbreaking exists for S60 Platform smartphones, where utilities such as HelloOX allow the execution of unsigned code and full access to system files. or edited firmware (similar to the M33 hacked firmware used for the PlayStation Portable) to circumvent restrictions on unsigned code. Nokia has since issued updates to curb unauthorized jailbreaking, in a manner similar to Apple. Rooting is the equivalent concept for Android phones and other devices. == Console jailbreaking == In the case of gaming consoles, jailbreaking is often used to execute homebrew games. In 2011, Sony, with assistance from law firm Kilpatrick Stockton, sued 21-year-old George Hotz and associates of the group fail0verflow for jailbreaking the PlayStation 3 (see Sony Computer Entertainment America v. George Hotz and PlayStation Jailbreak). == AI jailbreaks == Jailbreaking can also occur in systems and software that use generative artificial intelligence models, such as ChatGPT. In jailbreaking attacks on artificial intelligence systems, users are able to manipulate the system to behave differently than it was intended, making it possible to reveal information about how the model was instructed by the vendor (the "system prompt") or to induce it to respond in an anomalous or harmful way. These attacks typically simply require prompting the AIs with specific phrasal templates - no software is typically required, although software could theoretically be used to "industrialise" such exploits, and some research has been done in this direction. In 2024, a consortium of AI firms founded HackAPrompt.com, a competition to encourage users to find new and effective AI jailbreaking techniques. These and other findings from "ethical hackers" have been used by AI model providers to try to improve AI safety.

    Read more →
  • Intelligent automation

    Intelligent automation

    Intelligent automation (IA), or intelligent process automation, is a software term that refers to a combination of artificial intelligence (AI) and robotic process automation (RPA). Companies use intelligent automation to cut costs and streamline tasks by using artificial-intelligence-powered robotic software to mitigate repetitive tasks. As it accumulates data, the system learns in an effort to improve its efficiency. Intelligent automation applications consist of, but are not limited to, pattern analysis, data assembly, and classification. The term is similar to hyperautomation, a concept identified by research group Gartner as being one of the top technology trends of 2020. == Technology == Intelligent automation applies the assembly line concept of breaking tasks into repetitive steps to improve business processes. Rather than having humans perform each step, intelligent automation can replace steps with an intelligent software robot, improving efficiency. Intelligent automation integrates robotic process automation (RPA) with artificial intelligence techniques (such as machine learning, natural-language processing, and computer vision) enabling systems to interpret data, make decisions, and adapt to changing inputs. Modern platforms use a layered architecture combining workflow orchestration, low-code tools, integration middleware, and AI services to coordinate bots and data pipelines across organisational systems. == Applications == Intelligent automation is used to process unstructured content. Common real-world applications include self-driving cars, self-checkouts at grocery stores, smart home assistants, and appliances. Businesses can apply data and machine learning to build predictive analytics that react to consumer behavior changes, or to implement RPA to improve manufacturing floor operations. For example, the technology has also been used to automate the workflow behind distributing COVID-19 vaccines. Data provided by hospital systems’ electronic health records can be processed to identify and educate patients, and schedule vaccinations. Intelligent automation can provide real-time insights on profitability and efficiency. However, in an April 2022 survey by Alchemmy, despite three quarters of businesses acknowledging the importance of Artificial Intelligence to their future development, just a quarter of business leaders (25%) considered Intelligent Automation a “game changer” in understanding current performance. 42% of CTOs see “shortage of talent” as the main obstacle to implementing Intelligent Automation in their business, while 36% of CEOs see ‘upskilling and professional development of existing workforce’ as the most significant adoption barrier. IA is becoming increasingly accessible for firms of all sizes. With this in mind, it is expected to continue to grow rapidly in all industries. This technology has the potential to change the workforce. As it advances, it will be able to perform increasingly complex and difficult tasks. In addition, this may expose certain workforce issues as well as change how tasks are allocated. Tools such as Semrush's AI Visibility Toolkit and Enterprise AIO reflect these developments by analysing how entities are referenced and represented within responses produced by large-language-model-based systems. == Benefits == Streamline processes: Repetitive manual tasks can put a strain on the workforce. However, with AI agents, these tasks can be automated to allow teams to focus on more important matters that require human cognition. Intelligent automation can also be used to mitigate tasks with human error which in turn increases proficiency. This allows the opportunity for firms to scale production without the traditional negative consequences such as reduced quality or increased risk. Customer service improvement: Customer service can be significantly improved, providing the firm with a competitive advantage. IA utilizing chat features allows for instant curated responses to customers. In addition, it can give updates to customers, make appointments, manage calls, and personalize campaigns. Flexibility: Due to the wide range of applications, IA is useful across a variety of fields, technologies, projects and industries. In addition, IA can be integrated with current automated systems in place. This allows for optimized systems unique to each firm to best fit their individual needs. == Capabilities == Cognitive automation: Employs AI techniques to assist humans in decision-making and task completion Natural language processing: Allows computers to automate knowledge work Business process management: Enhances the consistency and agility of corporate operations Process mining: Applies data mining methods to discover, analyze, and improve business processes Intelligent document processing: Utilizes OCR and other advanced technologies to extract data from documents and convert it into structured, usable data Computer vision: Allows computers to extract information from digital images, videos, and other visual inputs Integration automation: Establishes a unified platform with automated workflows that integrate data, applications, and devices.

    Read more →
  • 20Q

    20Q

    20Q is a computerized game of twenty questions that began as a test in artificial intelligence (AI). It was invented by Robin Burgener in 1988. The game was made handheld by Radica in 2003, but was discontinued in 2011 because Techno Source took the license for 20Q handheld devices. The game 20Q is based on the spoken parlor game known as twenty questions, and is both a website and a handheld device. 20Q asks the player to think of something and will then try to guess what they are thinking of with twenty yes-or-no questions. If it fails to guess in 20 questions, it will ask an additional 5 questions. If it fails to guess even with 25 (or 30) questions, the player is declared the winner. Sometimes the first guess of the object can be asked at question 14. == Principle and history == The principle is that the player thinks of something and the 20Q artificial intelligence asks a series of questions before guessing what the player is thinking. This artificial intelligence learns on its own with the information relayed back to the players who interact with it, and is not programmed. The player can answer these questions with: Yes, No, Unknown, and Sometimes. The experiment is based on the classic word game of Twenty Questions, and on the computer game "Animals," popular in the early 1970s, which used a somewhat simpler method to guess an animal. The 20Q AI uses an artificial neural network to pick the questions and to guess. After the player has answered the twenty questions posed (sometimes fewer), 20Q makes a guess. If it is incorrect, it asks more questions, then guesses again. It makes guesses based on what it has learned; it is not programmed with information or what the inventor thinks. Answers to any question are based on players’ interpretations of the questions asked. Newer editions were made for different categories, such as music 20Q which has the player think of a song, and Harry Potter 20Q, which has the player think of something from the world of the Harry Potter series. The 20Q AI can draw its own conclusions on how to interpret the information. It can be described as more of a folk taxonomy than a taxonomy. Its knowledge develops with every game played. In this regard, the online version of the 20Q AI can be inaccurate because it gathers its answers from what people think rather than from what people know. Limitations of taxonomy are often overcome by the AI itself because it can learn and adapt. For example, if the player was thinking of a "Horse" and answered "No" to the question "Is it an animal?," the AI will, nevertheless, guess correctly, despite being told that a horse is not an animal. Patent applications in the US and Europe were submitted in 2005. In August 2014, 20Q.net Inc., with Brashworks Studios, developed and released an iOS iPad version available at the Apple iTunes store. == Game show == On June 13, 2009, GSN began a TV version of the game, hosted by Cat Deeley, with Hal Sparks as the voice of Mr. Q.

    Read more →
  • Catastrophic interference

    Catastrophic interference

    Catastrophic interference, also known as catastrophic forgetting, is the tendency of an artificial neural network to abruptly and drastically forget previously learned information upon learning new information. Neural networks are an important part of the connectionist approach to cognitive science. The issue of catastrophic interference when modeling human memory with connectionist models was originally brought to the attention of the scientific community by research from McCloskey and Cohen (1989), and Ratcliff (1990). It is a radical manifestation of the 'sensitivity-stability' dilemma or the 'stability-plasticity' dilemma. Specifically, these problems refer to the challenge of making an artificial neural network that is sensitive to, but not disrupted by, new information. Lookup tables and connectionist networks lie on the opposite sides of the stability plasticity spectrum. The former remains completely stable in the presence of new information but lacks the ability to generalize, i.e. infer general principles, from new inputs. On the other hand, connectionist networks like the standard backpropagation network can generalize to unseen inputs, but they are sensitive to new information. Backpropagation models can be analogized to human memory insofar as they have a similar ability to generalize, but these networks often exhibit less stability than human memory. Notably, these backpropagation networks are susceptible to catastrophic interference. This is an issue when modelling human memory, because unlike these networks, humans typically do not show catastrophic forgetting. == Discovery == The term catastrophic interference was originally coined by McCloskey and Cohen (1989) but was also brought to the attention of the scientific community by research from Ratcliff (1990). === The Sequential Learning Problem: McCloskey and Cohen (1989) === McCloskey and Cohen (1989) noted the problem of catastrophic interference during two different experiments with backpropagation neural network modelling. Experiment 1: Learning the ones and twos addition facts In their first experiment they trained a standard backpropagation neural network on a single training set consisting of 17 single-digit ones problems (i.e., 1 + 1 through 9 + 1, and 1 + 2 through 1 + 9) until the network could represent and respond properly to all of them. The error between the actual output and the desired output steadily declined across training sessions, which reflected that the network learned to represent the target outputs better across trials. Next, they trained the network on a single training set consisting of 17 single-digit twos problems (i.e., 2 + 1 through 2 + 9, and 1 + 2 through 9 + 2) until the network could represent, respond properly to all of them. They noted that their procedure was similar to how a child would learn their addition facts. Following each learning trial on the twos facts, the network was tested for its knowledge on both the ones and twos addition facts. Like the ones facts, the twos facts were readily learned by the network. However, McCloskey and Cohen noted the network was no longer able to properly answer the ones addition problems even after one learning trial of the twos addition problems. The output pattern produced in response to the ones facts often resembled an output pattern for an incorrect number more closely than the output pattern for a correct number. This is considered to be a drastic amount of error. Furthermore, the problems 2+1 and 1+2, which were included in both training sets, even showed dramatic disruption during the first learning trials of the twos facts. Experiment 2: Replication of Barnes and Underwood (1959) study In their second connectionist model, McCloskey and Cohen attempted to replicate the study on retroactive interference in humans by Barnes and Underwood (1959). They trained the model on A-B and A-C lists and used a context pattern in the input vector (input pattern), to differentiate between the lists. Specifically the network was trained to respond with the right B response when shown the A stimulus and A-B context pattern and to respond with the correct C response when shown the A stimulus and the A-C context pattern. When the model was trained concurrently on the A-B and A-C items then the network readily learned all of the associations correctly. In sequential training the A-B list was trained first, followed by the A-C list. After each presentation of the A-C list, performance was measured for both the A-B and A-C lists. They found that the amount of training on the A-C list in Barnes and Underwood study that lead to 50% correct responses, lead to nearly 0% correct responses by the backpropagation network. Furthermore, they found that the network tended to show responses that looked like the C response pattern when the network was prompted to give the B response pattern. This indicated that the A-C list apparently had overwritten the A-B list. This could be likened to learning the word dog, followed by learning the word stool and then finding that you think of the word stool when presented with the word dog. McCloskey and Cohen tried to reduce interference through a number of manipulations including changing the number of hidden units, changing the value of the learning rate parameter, overtraining on the A-B list, freezing certain connection weights, changing target values 0 and 1 instead 0.1 and 0.9. However, none of these manipulations satisfactorily reduced the catastrophic interference exhibited by the networks. Overall, McCloskey and Cohen (1989) concluded that: at least some interference will occur whenever new learning alters the weights involved in representing old learning the greater the amount of new learning, the greater the disruption in old knowledge interference was catastrophic in the backpropagation networks when learning was sequential but not concurrent === Constraints Imposed by Learning and Forgetting Functions: Ratcliff (1990) === Ratcliff (1990) used multiple sets of backpropagation models applied to standard recognition memory procedures, in which the items were sequentially learned. After inspecting the recognition performance models he found two major problems: Well-learned information was catastrophically forgotten as new information was learned in both small and large backpropagation networks. Even one learning trial with new information resulted in a significant loss of the old information, paralleling the findings of McCloskey and Cohen (1989). Ratcliff also found that the resulting outputs were often a blend of the previous input and the new input. In larger networks, items learned in groups (e.g. AB then CD) were more resistant to forgetting than were items learned singly (e.g. A then B then C...). However, the forgetting for items learned in groups was still large. Adding new hidden units to the network did not reduce interference. Discrimination between the studied items and previously unseen items decreased as the network learned more. This finding contradicts studies on human memory, which indicated that discrimination increases with learning. Ratcliff attempted to alleviate this problem by adding 'response nodes' that would selectively respond to old and new inputs. However, this method did not work as these response nodes would become active for all inputs. A model which used a context pattern also failed to increase discrimination between new and old items. == Proposed solutions == The main cause of catastrophic interference seems to be overlap in the representations at the hidden layer of distributed neural networks. In a distributed representation, each input tends to create changes in the weights of many of the nodes. Catastrophic forgetting occurs because when many of the weights where "knowledge is stored" are changed, it is unlikely for prior knowledge to be kept intact. During sequential learning, the inputs become mixed, with the new inputs being superimposed on top of the old ones. Another way to conceptualize this is by visualizing learning as a movement through a weight space. This weight space can be likened to a spatial representation of all of the possible combinations of weights that the network could possess. When a network first learns to represent a set of patterns, it finds a point in the weight space that allows it to recognize all of those patterns. However, when the network then learns a new set of patterns, it will move to a place in the weight space for which the only concern is the recognition of the new patterns. To recognize both sets of patterns, the network must find a place in the weight space suitable for recognizing both the new and the old patterns. Below are a number of techniques which have empirical support in successfully reducing catastrophic interference in backpropagation neural networks: === Orthogonality === Many of the early techniques in reducing representational overlap involved making either the input vecto

    Read more →