AI Detector Jobs

AI Detector Jobs — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Inbox by Gmail

    Inbox by Gmail

    Inbox by Gmail was an email service developed by Google. Announced on a limited invitation-only basis on October 22, 2014, it was officially released to the public on May 28, 2015. Inbox was shut down by Google on April 2, 2019. Available on the web, and through mobile apps for Android and iOS, Inbox by Gmail aimed to improve email productivity and organization through several key features. Bundles gathered emails on the same topic together; highlighted surface key details from messages, reminders and assists; and a "snooze" functionality enabled users to control when specific information would appear. Updates to the service enabled an "undo send" feature; a "Smart Reply" feature that automatically generated short reply examples for certain emails; integration with Google Calendar for event organization, previews of newsletters; and a "Save to Inbox" feature that let users save links for later use. Inbox by Gmail received generally positive reviews. At its launch, it was called "minimalist and lovely, full of layers and easy to navigate", with features deemed helpful in finding the right messages—one reviewer noted that the service felt "a lot like the future of email". However, it also received criticism, particularly for a low density of information, algorithms that needed tweaking, and because the service required users to "give up the control" of organizing their own email, meaning that "Anyone who already has a system for organizing their emails will likely find themselves fighting Google's system". Google noted in March 2016 that 10% of all replies on mobile originated from Inbox's Smart Reply feature. Google announced it would discontinue Inbox by Gmail in March 2019, with many of its features integrated into Gmail proper. == Features == Inbox by Gmail scanned the user's incoming Gmail messages for information. It gathered email messages related to the same overall topic into an organized bundle, with a title describing the bundle's content. For example, flight tickets, car rentals, and hotel reservations were grouped under "Travel", giving the user an easier overview of emails. Users could also group emails together manually, to "teach" the Inbox how the user worked. The service highlighted key details and important information in messages, such as flight itineraries, event information, photos and documents. Inbox could retrieve updated information from the Internet, including the real-time status of flights and package deliveries. Users could set reminders to bring up important messages later. When a user needed particular information, Inbox could assist the user by displaying the necessary details. Where Inbox highlights information was not needed immediately, users could "snooze" a message or reminder, with options to make the information reappear at a later time or specific location. In June 2015, Google added an "Undo Send" feature to Inbox, giving the user 10 seconds to undo sending a message. In November 2015, Google added "Smart Reply" functionality to the mobile apps. With Smart Reply, Inbox determined which emails could be answered with a short reply, generating three example responses from which the user could select one with a single tap. Smart Reply (initially available only on the Android and iOS mobile apps) was added to the Inbox website in March 2016, Google announcing that "10% of all your replies on mobile already use Smart Reply". By May 2017, Google said Smart Reply was driving about 12% of replies in inbox on mobile. In April 2016, Google updated Inbox with three new features; Google Calendar event organization, newsletter previews, and a "Save to Inbox" functionality that let the user save links for later use, rather than having to email links to themselves. In December 2017, Google introduced an "Unsubscribe" card that let users easily unsubscribe from mailing lists. The card appeared for email messages (from specific senders) that the user had not opened for a month. A few popular Inbox by Gmail features were subsequently added to Gmail: "Snoozing" of emails Nudges: Gmail could move old messages back to the top of the inbox when it thought a follow up or reply might be required. Hover actions: Placing the mouse cursor over a certain part of the message could quickly effect an action, such as archiving, without its being opened. Smart reply: This feature employed boilerplate text to suggest appropriate replies. Google reportedly wished, at a time then to be decided, to add the "bundles" feature to Gmail, which at the time was available only in Inbox for Gmail. By March 2020, many Inbox features were still missing from Gmail. == Platforms == Inbox by Gmail was announced on a limited invitation-only basis on October 22, 2014, available on the web, and through the Android and iOS mobile operating systems. It was officially released to the public on May 28, 2015. == Reception == David Pierce of The Verge praised the service, writing that it was "minimalist and lovely, full of layers and easy to navigate. It's remarkably fast and smooth on all platforms, and far better on iOS than the Gmail app". However, he criticized the app's low density of information, with only a few emails visible on the screen at a time, making it "a bit of a challenge" for users who need to go through "hundreds of emails" every day. Although positive that "Inbox feels a lot like the future of email", Pierce wrote that there was "plenty of algorithm tweaking and design condensing to do", with particular attention needed on a "compact view" for denser view of information on the screen. Sarah Mitroff of CNET also praised Inbox, writing, "Not only is it visually appealing, it's also full of features that help you find every message you need, when you need it". She added that users must "give up the control" to organize their email, and that it "won't vibe with everyone", but admitted that "if you're willing ... the app will reward you with a smarter and cleaner inbox." Mitroff noted that, initially, users had to coach the app about which bundle was appropriate for certain emails, writing, "It's a tedious process at first, by [sic] in just a few days Inbox starts to get it right." Regarding any downsides of the service, Mitroff wrote that "Inbox has a built-in strategy for managing your emails that works best on its own. Anyone who already has a system for organizing their emails will likely find themselves fighting Google's system". == Discontinuation and legacy == Google ended the service in March 2019. Google called Inbox "a great place to experiment with new ideas" and noted that many of those ideas had been migrated to Gmail. The company wanted, going forward, to focus its resources on a single email system. Several services, like Shortwave, attempted to resurrect some of the features of Inbox by Gmail to attract its old users. Similarly, Inbox Reborn, an actively maintained browser extension developed by a team of volunteer developers from around the world since 2018, aims to recreate the core features and visual style of Inbox by Gmail within the standard Gmail interface. The project continues to focus on preserving functionalities such as email bundling and streamlined workflows to provide users with a familiar productivity experience. Afterwards, most people moved to Spark, Spike, or Newton. According to a product manager at Google, a "more focused approach" regarding email was the companies goal. This is likely the reason they moved away from Inbox.

    Read more →
  • Absorbing Markov chain

    Absorbing Markov chain

    In the mathematical theory of probability, an absorbing Markov chain is a Markov chain in which every state can reach an absorbing state. An absorbing state is a state that, once entered, cannot be left. Like general Markov chains, there can be continuous-time absorbing Markov chains with an infinite state space. However, this article concentrates on the discrete-time discrete-state-space case. == Formal definition == A Markov chain is an absorbing chain if there is at least one absorbing state and it is possible to go from any state to at least one absorbing state in a finite number of steps. In an absorbing Markov chain, a state that is not absorbing is called transient. === Canonical form === Let an absorbing Markov chain with transition matrix P have t transient states and r absorbing states. The rows of P represent sources, while columns represent destinations. By ordering the transient states before the absorbing states, it can be assumed that P has the form P = [ Q R 0 I r ] , {\displaystyle P={\begin{bmatrix}Q&R\\\mathbf {0} &I_{r}\end{bmatrix}},} where Q is a t-by-t matrix, R is a nonzero t-by-r matrix, 0 is an r-by-t zero matrix, and Ir is the r-by-r identity matrix. Thus, Q describes the probability of transitioning from some transient state to another while R describes the probability of transitioning from some transient state to some absorbing state. The probability of transitioning from i to j in exactly k steps is the (i,j)-entry of Pk, further computed below. When considering only transient states, the probability is found in the upper left of Pk, the (i,j)-entry of Qk. == Fundamental matrix == === Expected number of visits to a transient state === A basic property about an absorbing Markov chain is the expected number of visits to a transient state j starting from a transient state i (before being absorbed). This can be established to be given by the (i, j) entry of so-called fundamental matrix N, obtained by summing Qk for all k (from 0 to ∞). It can be proven that N := ∑ k = 0 ∞ Q k = ( I t − Q ) − 1 , {\displaystyle N:=\sum _{k=0}^{\infty }Q^{k}=(I_{t}-Q)^{-1},} where It is the t-by-t identity matrix. The computation of this formula is the matrix equivalent of the geometric series of scalars, ∑ k = 0 ∞ q k = 1 1 − q {\displaystyle {\textstyle \sum }_{k=0}^{\infty }q^{k}={\tfrac {1}{1-q}}} . With the matrix N in hand, also other properties of the Markov chain are easy to obtain. === Expected number of steps before being absorbed === The expected number of steps before being absorbed in any absorbing state, when starting in transient state i can be computed via a sum over transient states. The value is given by the ith entry of the vector t := N 1 , {\displaystyle \mathbf {t} :=N\mathbf {1} ,} where 1 is a length-t column vector whose entries are all 1. === Absorbing probabilities === By induction, P k = [ Q k ( I t − Q k ) N R 0 I r ] . {\displaystyle P^{k}={\begin{bmatrix}Q^{k}&(I_{t}-Q^{k})NR\\\mathbf {0} &I_{r}\end{bmatrix}}.} The probability of eventually being absorbed in the absorbing state j when starting from transient state i is given by the (i,j)-entry of the matrix B := N R {\displaystyle B:=NR} . The number of columns of this matrix equals the number of absorbing states r. An approximation of those probabilities can also be obtained directly from the (i,j)-entry of P k {\displaystyle P^{k}} for a large enough value of k, when i is the index of a transient, and j the index of an absorbing state. This is because ( lim k → ∞ P k ) i , t + j = B i , j {\displaystyle \left(\lim _{k\to \infty }P^{k}\right)_{i,t+j}=B_{i,j}} . === Transient visiting probabilities === The probability of visiting transient state j when starting at a transient state i is the (i,j)-entry of the matrix H := ( N − I t ) ( N dg ) − 1 , {\displaystyle H:=(N-I_{t})(N_{\operatorname {dg} })^{-1},} where Ndg is the diagonal matrix with the same diagonal as N. === Variance on number of transient visits === The variance on the number of visits to a transient state j with starting at a transient state i (before being absorbed) is the (i,j)-entry of the matrix N 2 := N ( 2 N dg − I t ) − N sq , {\displaystyle N_{2}:=N(2N_{\operatorname {dg} }-I_{t})-N_{\operatorname {sq} },} where Nsq is the Hadamard product of N with itself (i.e. each entry of N is squared). === Variance on number of steps === The variance on the number of steps before being absorbed when starting in transient state i is the ith entry of the vector ( 2 N − I t ) t − t sq , {\displaystyle (2N-I_{t})\mathbf {t} -\mathbf {t} _{\operatorname {sq} },} where tsq is the Hadamard product of t with itself (i.e., as with Nsq, each entry of t is squared). == Examples == === String generation === Consider the process of repeatedly flipping a fair coin until the sequence (heads, tails, heads) appears. This process is modeled by an absorbing Markov chain with transition matrix P = [ 1 / 2 1 / 2 0 0 0 1 / 2 1 / 2 0 1 / 2 0 0 1 / 2 0 0 0 1 ] . {\displaystyle P={\begin{bmatrix}1/2&1/2&0&0\\0&1/2&1/2&0\\1/2&0&0&1/2\\0&0&0&1\end{bmatrix}}.} The first state represents the empty string, the second state the string "H", the third state the string "HT", and the fourth state the string "HTH". Although in reality, the coin flips cease after the string "HTH" is generated, the perspective of the absorbing Markov chain is that the process has transitioned into the absorbing state representing the string "HTH" and, therefore, cannot leave. For this absorbing Markov chain, the fundamental matrix is N = ( I − Q ) − 1 = ( [ 1 0 0 0 1 0 0 0 1 ] − [ 1 / 2 1 / 2 0 0 1 / 2 1 / 2 1 / 2 0 0 ] ) − 1 = [ 1 / 2 − 1 / 2 0 0 1 / 2 − 1 / 2 − 1 / 2 0 1 ] − 1 = [ 4 4 2 2 4 2 2 2 2 ] . {\displaystyle {\begin{aligned}N&=(I-Q)^{-1}=\left({\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}}-{\begin{bmatrix}1/2&1/2&0\\0&1/2&1/2\\1/2&0&0\end{bmatrix}}\right)^{-1}\\[4pt]&={\begin{bmatrix}1/2&-1/2&0\\0&1/2&-1/2\\-1/2&0&1\end{bmatrix}}^{-1}={\begin{bmatrix}4&4&2\\2&4&2\\2&2&2\end{bmatrix}}.\end{aligned}}} The expected number of steps starting from each of the transient states is t = N 1 = [ 4 4 2 2 4 2 2 2 2 ] [ 1 1 1 ] = [ 10 8 6 ] . {\displaystyle \mathbf {t} =N\mathbf {1} ={\begin{bmatrix}4&4&2\\2&4&2\\2&2&2\end{bmatrix}}{\begin{bmatrix}1\\1\\1\end{bmatrix}}={\begin{bmatrix}10\\8\\6\end{bmatrix}}.} Therefore, the expected number of coin flips before observing the sequence (heads, tails, heads) is 10, the entry for the state representing the empty string. === Games of chance === Games based entirely on chance can be modeled by an absorbing Markov chain. A classic example of this is the ancient Indian board game Snakes and Ladders. The graph on the left plots the probability mass in the lone absorbing state that represents the final square as the transition matrix is raised to larger and larger powers. To determine the expected number of turns to complete the game, compute the vector t as described above and examine tstart, which is approximately 39.2. === Infectious disease testing === Infectious disease testing, either of blood products or in medical clinics, is often taught as an example of an absorbing Markov chain. The public U.S. Centers for Disease Control and Prevention (CDC) model for HIV and for hepatitis B, for example, illustrates the property that absorbing Markov chains can lead to the detection of disease, versus the loss of detection through other means. In the standard CDC model, the Markov chain has five states, a state in which the individual is uninfected, then a state with infected but undetectable virus, a state with detectable virus, and absorbing states of having quit/been lost from the clinic, or of having been detected (the goal). The typical rates of transition between the Markov states are the probability p per unit time of being infected with the virus, w for the rate of window period removal (time until virus is detectable), q for quit/loss rate from the system, and d for detection, assuming a typical rate λ {\displaystyle \lambda } at which the health system administers tests of the blood product or patients in question. It follows that we can "walk along" the Markov model to identify the overall probability of detection for a person starting as undetected, by multiplying the probabilities of transition to each next state of the model as: p ( p + q ) w ( w + q ) d ( d + q ) {\displaystyle {\frac {p}{(p+q)}}{\frac {w}{(w+q)}}{\frac {d}{(d+q)}}} . The subsequent total absolute number of false negative tests—the primary CDC concern—would then be the rate of tests, multiplied by the probability of reaching the infected but undetectable state, times the duration of staying in the infected undetectable state: p ( p + q ) 1 ( w + q ) λ {\displaystyle {\frac {p}{(p+q)}}{\frac {1}{(w+q)}}\lambda } .

    Read more →
  • List of text mining software

    List of text mining software

    Text mining computer programs are available from many commercial and open source companies and sources. == Commercial == Angoss – Angoss Text Analytics provides entity and theme extraction, topic categorization, sentiment analysis and document summarization capabilities via the embedded AUTINDEX – is a commercial text mining software package based on sophisticated linguistics by IAI (Institute for Applied Information Sciences), Saarbrücken. DigitalMR – social media listening & text+image analytics tool for market research. FICO Score – leading provider of analytics. General Sentiment – Social Intelligence platform that uses natural language processing to discover affinities between the fans of brands with the fans of traditional television shows in social media. Stand alone text analytics to capture social knowledge base on billions of topics stored to 2004. IBM LanguageWare – the IBM suite for text analytics (tools and Runtime). IBM SPSS – provider of Modeler Premium (previously called IBM SPSS Modeler and IBM SPSS Text Analytics), which contains advanced NLP-based text analysis capabilities (multi-lingual sentiment, event and fact extraction), that can be used in conjunction with Predictive Modeling. Text Analytics for Surveys provides the ability to categorize survey responses using NLP-based capabilities for further analysis or reporting. Inxight – provider of text analytics, search, and unstructured visualization technologies. (Inxight was bought by Business Objects that was bought by SAP AG in 2008). Language Computer Corporation – text extraction and analysis tools, available in multiple languages. Lexalytics – provider of a text analytics engine used in Social Media Monitoring, Voice of Customer, Survey Analysis, and other applications. Salience Engine. The software provides the unique capability of merging the output of unstructured, text-based analysis with structured data to provide additional predictive variables for improved predictive models and association analysis. Linguamatics – provider of natural language processing (NLP) based enterprise text mining and text analytics software, I2E, for high-value knowledge discovery and decision support. Mathematica – provides built in tools for text alignment, pattern matching, clustering and semantic analysis. See Wolfram Language, the programming language of Mathematica. MATLAB offers Text Analytics Toolbox for importing text data, converting it to numeric form for use in machine and deep learning, sentiment analysis and classification tasks. Medallia – offers one system of record for survey, social, text, written and online feedback. NetMiner – software for network analysis and text mining. Supports social media and bibliographic data collection, NLP for english and chinese, sentiment analysis, work co-occurrence network(text network analysis) and visualization. NetOwl – suite of multilingual text and entity analytics products, including entity extraction, link and event extraction, sentiment analysis, geotagging, name translation, name matching, and identity resolution, among others. PolyAnalyst - text analytics environment. PoolParty Semantic Suite - graph-based text mining platform. RapidMiner with its Text Processing Extension – data and text mining software. SAS – SAS Text Miner and Teragram; commercial text analytics, natural language processing, and taxonomy software used for Information Management. Sketch Engine – a corpus manager and analysis software which providing creating text corpora from uploaded texts or the Web including part-of-speech tagging and lemmatization or detecting a particular website. Sysomos – provider social media analytics software platform, including text analytics and sentiment analysis on online consumer conversations. WordStat – Content analysis and text mining add-on module of QDA Miner for analyzing large amounts of text data. == Open source == Carrot2 – text and search results clustering framework. GATE – general Architecture for Text Engineering, an open-source toolbox for natural language processing and language engineering. Gensim – large-scale topic modelling and extraction of semantic information from unstructured text (Python). KH Coder – for Quantitative Content Analysis or Text Mining The KNIME Text Processing extension. Natural Language Toolkit (NLTK) – a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python programming language. OpenNLP – natural language processing. Orange with its text mining add-on. The PLOS Text Mining Collection. The programming language R provides a framework for text mining applications in the package tm. The Natural Language Processing task view contains tm and other text mining library packages. spaCy – open-source Natural Language Processing library for Python Stanbol – an open source text mining engine targeted at semantic content management. Voyant Tools – a web-based text analysis environment, created as a scholarly project.

    Read more →
  • Multiple discriminant analysis

    Multiple discriminant analysis

    Multiple Discriminant Analysis (MDA) is a multivariate dimensionality reduction technique. It has been used to predict signals as diverse as neural memory traces and corporate failure. MDA is not directly used to perform classification. It merely supports classification by yielding a compressed signal amenable to classification. The method described in Duda et al. (2001) §3.8.3 projects the multivariate signal down to an M−1 dimensional space where M is the number of categories. MDA is useful because most classifiers are strongly affected by the curse of dimensionality. In other words, when signals are represented in very-high-dimensional spaces, the classifier's performance is catastrophically impaired by the overfitting problem. This problem is reduced by compressing the signal down to a lower-dimensional space as MDA does. MDA has been used to reveal neural codes.

    Read more →
  • Robot learning

    Robot learning

    Robot learning is a research field at the intersection of machine learning and robotics. It studies techniques allowing a robot to acquire novel skills or adapt to its environment through learning algorithms. The embodiment of the robot, situated in a physical embedding, provides at the same time specific difficulties (e.g. high-dimensionality, real time constraints for collecting data and learning) and opportunities for guiding the learning process (e.g. sensorimotor synergies, motor primitives). Example of skills that are targeted by learning algorithms include sensorimotor skills such as locomotion, grasping, active object categorization, as well as interactive skills such as joint manipulation of an object with a human peer, and linguistic skills such as the grounded and situated meaning of human language. Learning can happen either through autonomous self-exploration or through guidance from a human teacher, like for example in robot learning by imitation. Robot learning can be closely related to adaptive control, reinforcement learning as well as developmental robotics which considers the problem of autonomous lifelong acquisition of repertoires of skills. While machine learning is frequently used by computer vision algorithms employed in the context of robotics, these applications are usually not referred to as "robot learning". == Imitation learning == Many research groups are developing techniques where robots learn by imitating. This includes various techniques for learning from demonstration (sometimes also referred to as "programming by demonstration") and observational learning. == Sharing learned skills and knowledge == In Tellex's "Million Object Challenge", the goal is robots that learn how to spot and handle simple items and upload their data to the cloud to allow other robots to analyze and use the information. RoboBrain is a knowledge engine for robots which can be freely accessed by any device wishing to carry out a task. The database gathers new information about tasks as robots perform them, by searching the Internet, interpreting natural language text, images, and videos, object recognition as well as interaction. The project is led by Ashutosh Saxena at Stanford University. RoboEarth is a project that has been described as a "World Wide Web for robots" − it is a network and database repository where robots can share information and learn from each other and a cloud for outsourcing heavy computation tasks. The project brings together researchers from five major universities in Germany, the Netherlands and Spain and is backed by the European Union. Google Research, DeepMind, and Google X have decided to allow their robots share their experiences. == Vision-language-action model == Research groups and companies are developing vision-language-action models, foundation models that allow robotic control through the combination of vision and language. Google DeepMind, Figure AI and Hugging Face are actively working on that.

    Read more →
  • Wolfram Mathematica

    Wolfram Mathematica

    Wolfram Mathematica (also known as Mathematica) is a software system with built-in libraries for several areas of technical computing that allows machine learning, statistics, symbolic computation, data manipulation, network analysis, time series analysis, NLP, optimization, plotting functions and various types of data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other programming languages. It was conceived by Stephen Wolfram, and is developed by Wolfram Research of Champaign, Illinois. The Wolfram Language is the programming language used in Mathematica. Mathematica 1.0 was released on June 23, 1988 in Champaign, Illinois and Santa Clara, California. Mathematica's Wolfram Language is fundamentally based on Lisp; for example, the Mathematica command Most is identically equal to the Lisp command butlast. == Notebook interface == Mathematica is split into two parts: the kernel and the front end. The kernel interprets expressions (Wolfram Language code) and returns result expressions, which can then be displayed by the front end. The original front end, designed by Theodore Gray in 1988, consists of a notebook interface and allows the creation and editing of notebook documents that can contain code, plaintext, images, and graphics. Code development is also supported through support in a range of standard integrated development environment (IDE) including Eclipse, IntelliJ IDEA, Atom, Vim, Visual Studio Code and Git. The Mathematica Kernel also includes a command line front end. Other interfaces include JMath, based on GNU Readline and WolframScript which runs self-contained Mathematica programs (with arguments) from the UNIX command line. == High-performance computing == Capabilities for high-performance computing were extended with the introduction of packed arrays in version 4 (1999) and sparse matrices (version 5, 2003), and by adopting the GNU Multiple Precision Arithmetic Library to evaluate high-precision arithmetic. Version 5.2 (2005) added automatic multi-threading when computations are performed on multi-core computers. This release included CPU-specific optimized libraries. In addition Mathematica is supported by third party specialist acceleration hardware such as ClearSpeed. In 2002, gridMathematica was introduced to allow user level parallel programming on heterogeneous clusters and multiprocessor systems and in 2008 parallel computing technology was included in all Mathematica licenses including support for grid technology such as Windows HPC Server 2008, Microsoft Compute Cluster Server and Sun Grid. Support for CUDA and OpenCL GPU hardware was added in 2010. == Extensions == As of Version 14, there are 6,602 built-in functions and symbols in the Wolfram Language. Stephen Wolfram announced the launch of the Wolfram Function Repository in June 2019 as a way for the public Wolfram community to contribute functionality to the Wolfram Language. There are currently more than 3000 functions contributed as Resource Functions. In addition to the Wolfram Function Repository, there is a Wolfram Data Repository with computable data and the Wolfram Neural Net Repository for machine learning. Wolfram Mathematica is the basis of the Combinatorica package, which adds discrete mathematics functionality in combinatorics and graph theory to the program. == Connections to other applications, programming languages, and services == Communication with other applications can be done using a protocol called Wolfram Symbolic Transfer Protocol (WSTP). It allows communication between the Wolfram Mathematica kernel and the front end and provides a general interface between the kernel and other applications. Wolfram Research freely distributes a developer kit for linking applications written in the programming language C to the Mathematica kernel through WSTP using J/Link., a Java program that can ask Mathematica to perform computations. Similar functionality is achieved with .NET /Link, but with .NET programs instead of Java programs. Other languages that connect to Mathematica include Haskell, AppleScript, Racket, Visual Basic, Python, and Clojure. Mathematica supports the generation and execution of Modelica models for systems modeling and connects with Wolfram System Modeler. Links are also available to many third-party software packages and APIs. Mathematica can also capture real-time data from a variety of sources and can read and write to public blockchains (Bitcoin, Ethereum, and ARK). It supports import and export of over 220 data, image, video, sound, computer-aided design (CAD), geographic information systems (GIS), document, and biomedical formats. In 2019, support was added for compiling Wolfram Language code to LLVM. Version 12.3 of the Wolfram Language added support for Arduino. == Computable data == Mathematica is also integrated with Wolfram Alpha, an online answer engine that provides additional data, some of which is kept updated in real time, for users who use Mathematica with an internet connection. Some of the data sets include astronomical, chemical, geopolitical, language, biomedical, airplane, and weather data, in addition to mathematical data (such as knots and polyhedra). == Reception == BYTE in 1989 listed Mathematica as among the "Distinction" winners of the BYTE Awards, stating that it "is another breakthrough Macintosh application ... it could enable you to absorb the algebra and calculus that seemed impossible to comprehend from a textbook". Mathematica has been criticized for being closed source. Wolfram Research claims keeping Mathematica closed source is central to its business model and the continuity of the software.

    Read more →
  • Elastic map

    Elastic map

    Elastic maps provide a tool for nonlinear dimensionality reduction. By their construction, they are a system of elastic springs embedded in the data space. This system approximates a low-dimensional manifold. The elastic coefficients of this system allow the switch from completely unstructured k-means clustering (zero elasticity) to the estimators located closely to linear PCA manifolds (for high bending and low stretching modules). With some intermediate values of the elasticity coefficients, this system effectively approximates non-linear principal manifolds. This approach is based on a mechanical analogy between principal manifolds, that are passing through "the middle" of the data distribution, and elastic membranes and plates. The method was developed by A.N. Gorban, A.Y. Zinovyev and A.A. Pitenko in 1996–1998. == Energy of elastic map == Let S {\displaystyle {\mathcal {S}}} be a data set in a finite-dimensional Euclidean space. Elastic map is represented by a set of nodes w j {\displaystyle {\bf {w}}_{j}} in the same space. Each datapoint s ∈ S {\displaystyle s\in {\mathcal {S}}} has a host node, namely the closest node w j {\displaystyle {\bf {w}}_{j}} (if there are several closest nodes then one takes the node with the smallest number). The data set S {\displaystyle {\mathcal {S}}} is divided into classes K j = { s | w j is a host of s } {\displaystyle K_{j}=\{s\ |\ {\bf {w}}_{j}{\mbox{ is a host of }}s\}} . The approximation energy D is the distortion D = 1 2 ∑ j = 1 k ∑ s ∈ K j ‖ s − w j ‖ 2 {\displaystyle D={\frac {1}{2}}\sum _{j=1}^{k}\sum _{s\in K_{j}}\|s-{\bf {w}}_{j}\|^{2}} , which is the energy of the springs with unit elasticity which connect each data point with its host node. It is possible to apply weighting factors to the terms of this sum, for example to reflect the standard deviation of the probability density function of any subset of data points { s i } {\displaystyle \{s_{i}\}} . On the set of nodes an additional structure is defined. Some pairs of nodes, ( w i , w j ) {\displaystyle ({\bf {w}}_{i},{\bf {w}}_{j})} , are connected by elastic edges. Call this set of pairs E {\displaystyle E} . Some triplets of nodes, ( w i , w j , w k ) {\displaystyle ({\bf {w}}_{i},{\bf {w}}_{j},{\bf {w}}_{k})} , form bending ribs. Call this set of triplets G {\displaystyle G} . The stretching energy is U E = 1 2 λ ∑ ( w i , w j ) ∈ E ‖ w i − w j ‖ 2 {\displaystyle U_{E}={\frac {1}{2}}\lambda \sum _{({\bf {w}}_{i},{\bf {w}}_{j})\in E}\|{\bf {w}}_{i}-{\bf {w}}_{j}\|^{2}} , The bending energy is U G = 1 2 μ ∑ ( w i , w j , w k ) ∈ G ‖ w i − 2 w j + w k ‖ 2 {\displaystyle U_{G}={\frac {1}{2}}\mu \sum _{({\bf {w}}_{i},{\bf {w}}_{j},{\bf {w}}_{k})\in G}\|{\bf {w}}_{i}-2{\bf {w}}_{j}+{\bf {w}}_{k}\|^{2}} , where λ {\displaystyle \lambda } and μ {\displaystyle \mu } are the stretching and bending moduli respectively. The stretching energy is sometimes referred to as the membrane, while the bending energy is referred to as the thin plate term. For example, on the 2D rectangular grid the elastic edges are just vertical and horizontal edges (pairs of closest vertices) and the bending ribs are the vertical or horizontal triplets of consecutive (closest) vertices. The total energy of the elastic map is thus U = D + U E + U G . {\displaystyle U=D+U_{E}+U_{G}.} The position of the nodes { w j } {\displaystyle \{{\bf {w}}_{j}\}} is determined by the mechanical equilibrium of the elastic map, i.e. its location is such that it minimizes the total energy U {\displaystyle U} . == Expectation-maximization algorithm == For a given splitting of dataset S {\displaystyle {\mathcal {S}}} in classes K j {\displaystyle K_{j}} , minimization of the quadratic functional U {\displaystyle U} is a linear problem with the sparse matrix of coefficients. Therefore, similar to principal component analysis or k-means, a splitting method is used: For given { w j } {\displaystyle \{{\bf {w}}_{j}\}} find { K j } {\displaystyle \{K_{j}\}} ; For given { K j } {\displaystyle \{K_{j}\}} minimize U {\displaystyle U} and find { w j } {\displaystyle \{{\bf {w}}_{j}\}} ; If no change, terminate. This expectation-maximization algorithm guarantees a local minimum of U {\displaystyle U} . For improving the approximation various additional methods are proposed. For example, the softening strategy is used. This strategy starts with a rigid grids (small length, small bending and large elasticity modules λ {\displaystyle \lambda } and μ {\displaystyle \mu } coefficients) and finishes with soft grids (small λ {\displaystyle \lambda } and μ {\displaystyle \mu } ). The training goes in several epochs, each epoch with its own grid rigidness. Another adaptive strategy is growing net: one starts from a small number of nodes and gradually adds new nodes. Each epoch goes with its own number of nodes. == Applications == Most important applications of the method and free software are in bioinformatics for exploratory data analysis and visualisation of multidimensional data, for data visualisation in economics, social and political sciences, as an auxiliary tool for data mapping in geographic informational systems and for visualisation of data of various nature. The method is applied in quantitative biology for reconstructing the curved surface of a tree leaf from a stack of light microscopy images. This reconstruction is used for quantifying the geodesic distances between trichomes and their patterning, which is a marker of the capability of a plant to resist to pathogenes. Recently, the method is adapted as a support tool in the decision process underlying the selection, optimization, and management of financial portfolios. The method of elastic maps has been systematically tested and compared with several machine learning methods on the applied problem of identification of the flow regime of a gas-liquid flow in a pipe. There are various regimes: Single phase water or air flow, Bubbly flow, Bubbly-slug flow, Slug flow, Slug-churn flow, Churn flow, Churn-annular flow, and Annular flow. The simplest and most common method used to identify the flow regime is visual observation. This approach is, however, subjective and unsuitable for relatively high gas and liquid flow rates. Therefore, the machine learning methods are proposed by many authors. The methods are applied to differential pressure data collected during a calibration process. The method of elastic maps provided a 2D map, where the area of each regime is represented. The comparison with some other machine learning methods is presented in Table 1 for various pipe diameters and pressure. Here, ANN stands for the backpropagation artificial neural networks, SVM stands for the support vector machine, SOM for the self-organizing maps. The hybrid technology was developed for engineering applications. In this technology, elastic maps are used in combination with Principal Component Analysis (PCA), Independent Component Analysis (ICA) and backpropagation ANN. The textbook provides a systematic comparison of elastic maps and self-organizing maps (SOMs) in applications to economic and financial decision-making.

    Read more →
  • Variable kernel density estimation

    Variable kernel density estimation

    In statistics, adaptive or "variable-bandwidth" kernel density estimation is a form of kernel density estimation in which the size of the kernels used in the estimate are varied depending upon either the location of the samples or the location of the test point. It is a particularly effective technique when the sample space is multi-dimensional. == Rationale == Given a set of samples, { x → i } {\displaystyle \lbrace {\vec {x}}_{i}\rbrace } , we wish to estimate the density, P ( x → ) {\displaystyle P({\vec {x}})} , at a test point, x → {\displaystyle {\vec {x}}} : P ( x → ) ≈ W n h D {\displaystyle P({\vec {x}})\approx {\frac {W}{nh^{D}}}} W = ∑ i = 1 n w i {\displaystyle W=\sum _{i=1}^{n}w_{i}} w i = K ( x → − x → i h ) {\displaystyle w_{i}=K\left({\frac {{\vec {x}}-{\vec {x}}_{i}}{h}}\right)} where n is the number of samples, K is the "kernel", h is its width and D is the number of dimensions in x → {\displaystyle {\vec {x}}} . The kernel can be thought of as a simple, linear filter. Using a fixed filter width may mean that in regions of low density, all samples will fall in the tails of the filter with very low weighting, while regions of high density will find an excessive number of samples in the central region with weighting close to unity. To fix this problem, we vary the width of the kernel in different regions of the sample space. There are two methods of doing this: balloon and pointwise estimation. In a balloon estimator, the kernel width is varied depending on the location of the test point. In a pointwise estimator, the kernel width is varied depending on the location of the sample. For multivariate estimators, the parameter, h, can be generalized to vary not just the size, but also the shape of the kernel. This more complicated approach will not be covered here. == Balloon estimators == A common method of varying the kernel width is to make it inversely proportional to the density at the test point: h = k [ n P ( x → ) ] 1 / D {\displaystyle h={\frac {k}{\left[nP({\vec {x}})\right]^{1/D}}}} where k is a constant. If we back-substitute the estimated PDF, and assuming a Gaussian kernel function, we can show that W is a constant: W = k D ( 2 π ) D / 2 {\displaystyle W=k^{D}(2\pi )^{D/2}} A similar derivation holds for any kernel whose normalising function is of the order hD, although with a different constant factor in place of the (2 π)D/2 term. This produces a generalization of the k-nearest neighbour algorithm. That is, a uniform kernel function will return the KNN technique. There are two components to the error: a variance term and a bias term. The variance term is given as: e 1 = P ∫ K 2 n h D {\displaystyle e_{1}={\frac {P\int K^{2}}{nh^{D}}}} . The bias term is found by evaluating the approximated function in the limit as the kernel width becomes much larger than the sample spacing. By using a Taylor expansion for the real function, the bias term drops out: e 2 = h 2 n ∇ 2 P {\displaystyle e_{2}={\frac {h^{2}}{n}}\nabla ^{2}P} An optimal kernel width that minimizes the error of each estimate can thus be derived. == Use for statistical classification == The method is particularly effective when applied to statistical classification. There are two ways we can proceed: the first is to compute the PDFs of each class separately, using different bandwidth parameters, and then compare them as in Taylor. Alternatively, we can divide up the sum based on the class of each sample: P ( j , x → ) ≈ 1 n ∑ i = 1 , c i = j n w i {\displaystyle P(j,{\vec {x}})\approx {\frac {1}{n}}\sum _{i=1,c_{i}=j}^{n}w_{i}} where ci is the class of the ith sample. The class of the test point may be estimated through maximum likelihood.

    Read more →
  • Confusion network

    Confusion network

    A confusion network (sometimes called a word confusion network or informally known as a sausage) is a natural language processing method that combines outputs from multiple automatic speech recognition or machine translation systems. Confusion networks are simple linear directed acyclic graphs with the property that each a path from the start node to the end node goes through all the other nodes. The set of words represented by edges between two nodes is called a confusion set. In machine translation, the defining characteristic of confusion networks is that they allow multiple ambiguous inputs, deferring committal translation decisions until later stages of processing. This approach is used in the open source machine translation software Moses and the proprietary translation API in IBM Bluemix Watson.

    Read more →
  • Analogical modeling

    Analogical modeling

    Analogical modeling (AM) is a formal theory of exemplar based analogical reasoning, proposed by Royal Skousen, professor of Linguistics and English language at Brigham Young University in Provo, Utah. It is applicable to language modeling and other categorization tasks. Analogical modeling is related to connectionism and nearest neighbor approaches, in that it is data-based rather than abstraction-based; but it is distinguished by its ability to cope with imperfect datasets (such as caused by simulated short term memory limits) and to base predictions on all relevant segments of the dataset, whether near or far. In language modeling, AM has successfully predicted empirically valid forms for which no theoretical explanation was known (see the discussion of Finnish morphology in Skousen et al. 2002). == Implementation == === Overview === An exemplar-based model consists of a general-purpose modeling engine and a problem-specific dataset. Within the dataset, each exemplar (a case to be reasoned from, or an informative past experience) appears as a feature vector: a row of values for the set of parameters that define the problem. For example, in a spelling-to-sound task, the feature vector might consist of the letters of a word. Each exemplar in the dataset is stored with an outcome, such as a phoneme or phone to be generated. When the model is presented with a novel situation (in the form of an outcome-less feature vector), the engine algorithmically sorts the dataset to find exemplars that helpfully resemble it, and selects one, whose outcome is the model's prediction. The particulars of the algorithm distinguish one exemplar-based modeling system from another. In AM, we think of the feature values as characterizing a context, and the outcome as a behavior that occurs within that context. Accordingly, the novel situation is known as the given context. Given the known features of the context, the AM engine systematically generates all contexts that include it (all of its supracontexts), and extracts from the dataset the exemplars that belong to each. The engine then discards those supracontexts whose outcomes are inconsistent (this measure of consistency will be discussed further below), leaving an analogical set of supracontexts, and probabilistically selects an exemplar from the analogical set with a bias toward those in large supracontexts. This multilevel search exponentially magnifies the likelihood of a behavior's being predicted as it occurs reliably in settings that specifically resemble the given context. === Analogical modeling in detail === AM performs the same process for each case it is asked to evaluate. The given context, consisting of n variables, is used as a template to generate 2 n {\displaystyle 2^{n}} supracontexts. Each supracontext is a set of exemplars in which one or more variables have the same values that they do in the given context, and the other variables are ignored. In effect, each is a view of the data, created by filtering for some criteria of similarity to the given context, and the total set of supracontexts exhausts all such views. Alternatively, each supracontext is a theory of the task or a proposed rule whose predictive power needs to be evaluated. It is important to note that the supracontexts are not equal peers one with another; they are arranged by their distance from the given context, forming a hierarchy. If a supracontext specifies all of the variables that another one does and more, it is a subcontext of that other one, and it lies closer to the given context. (The hierarchy is not strictly branching; each supracontext can itself be a subcontext of several others, and can have several subcontexts.) This hierarchy becomes significant in the next step of the algorithm. The engine now chooses the analogical set from among the supracontexts. A supracontext may contain exemplars that only exhibit one behavior; it is deterministically homogeneous and is included. It is a view of the data that displays regularity, or a relevant theory that has never yet been disproven. A supracontext may exhibit several behaviors, but contain no exemplars that occur in any more specific supracontext (that is, in any of its subcontexts); in this case it is non-deterministically homogeneous and is included. Here there is no great evidence that a systematic behavior occurs, but also no counterargument. Finally, a supracontext may be heterogeneous, meaning that it exhibits behaviors that are found in a subcontext (closer to the given context), and also behaviors that are not. Where the ambiguous behavior of the nondeterministically homogeneous supracontext was accepted, this is rejected because the intervening subcontext demonstrates that there is a better theory to be found. The heterogeneous supracontext is therefore excluded. This guarantees that we see an increase in meaningfully consistent behavior in the analogical set as we approach the given context. With the analogical set chosen, each appearance of an exemplar (for a given exemplar may appear in several of the analogical supracontexts) is given a pointer to every other appearance of an exemplar within its supracontexts. One of these pointers is then selected at random and followed, and the exemplar to which it points provides the outcome. This gives each supracontext an importance proportional to the square of its size, and makes each exemplar likely to be selected in direct proportion to the sum of the sizes of all analogically consistent supracontexts in which it appears. Then, of course, the probability of predicting a particular outcome is proportional to the summed probabilities of all the exemplars that support it. (Skousen 2002, in Skousen et al. 2002, pp. 11–25, and Skousen 2003, both passim) === Formulas === Given a context with n {\displaystyle n} elements: total number of pairings: n 2 {\displaystyle n^{2}} number of agreements for outcome i: n i 2 {\displaystyle n_{i}^{2}} number of disagreements for outcome i: n i ( n − n i ) {\displaystyle n_{i}(n-n_{i})} total number of agreements: ∑ n i 2 {\displaystyle \sum {n_{i}^{2}}} total number of disagreements: ∑ n i ( n − n i ) = n 2 − ∑ n i 2 {\displaystyle \sum {n_{i}(n-n_{i})}=n^{2}-\sum {n_{i}^{2}}} === Example === This terminology is best understood through an example. In the example used in the second chapter of Skousen (1989), each context consists of three variables with potential values 0-3 Variable 1: 0,1,2,3 Variable 2: 0,1,2,3 Variable 3: 0,1,2,3 The two outcomes for the dataset are e and r, and the exemplars are: 3 1 0 e 0 3 2 r 2 1 0 r 2 1 2 r 3 1 1 r We define a network of pointers like so: The solid lines represent pointers between exemplars with matching outcomes; the dotted lines represent pointers between exemplars with non-matching outcomes. The statistics for this example are as follows: n = 5 {\displaystyle n=5} n r = 4 {\displaystyle n_{r}=4} n e = 1 {\displaystyle n_{e}=1} total number of pairings: n 2 = 25 {\displaystyle n^{2}=25} number of agreements for outcome r: n r 2 = 16 {\displaystyle n_{r}^{2}=16} number of agreements for outcome e: n e 2 = 1 {\displaystyle n_{e}^{2}=1} number of disagreements for outcome r: n r ( n − n r ) = 4 {\displaystyle n_{r}(n-n_{r})=4} number of disagreements for outcome e: n e ( n − n e ) = 4 {\displaystyle n_{e}(n-n_{e})=4} total number of agreements: n r 2 + n e 2 = 17 {\displaystyle n_{r}^{2}+n_{e}^{2}=17} total number of disagreements: n r ( n − n r ) + n e ( n − n e ) = n 2 − ( n r 2 + n e 2 ) = 8 {\displaystyle n_{r}(n-n_{r})+n_{e}(n-n_{e})=n^{2}-(n_{r}^{2}+n_{e}^{2})=8} uncertainty or fraction of disagreement: 8 / 25 = .32 {\displaystyle 8/25=.32} Behavior can only be predicted for a given context; in this example, let us predict the outcome for the context "3 1 2". To do this, we first find all of the contexts containing the given context; these contexts are called supracontexts. We find the supracontexts by systematically eliminating the variables in the given context; with m variables, there will generally be 2 m {\displaystyle 2^{m}} supracontexts. The following table lists each of the sub- and supracontexts; x means "not x", and - means "anything". These contexts are shown in the venn diagram below: The next step is to determine which exemplars belong to which contexts in order to determine which of the contexts are homogeneous. The table below shows each of the subcontexts, their behavior in terms of the given exemplars, and the number of disagreements within the behavior: Analyzing the subcontexts in the table above, we see that there is only 1 subcontext with any disagreements: "3 1 2", which in the dataset consists of "3 1 0 e" and "3 1 1 r". There are 2 disagreements in this subcontext; 1 pointing from each of the exemplars to the other (see the pointer network pictured above). Therefore, only supracontexts containing this subcontext will contain any disagreements. We use a simple rule to identify the homogeneous supraco

    Read more →
  • Yooreeka

    Yooreeka

    Yooreeka is a library for data mining, machine learning, soft computing, and mathematical analysis. The project started with the code of the book "Algorithms of the Intelligent Web". Although the term "Web" prevailed in the title, in essence, the algorithms are valuable in any software application. It covers all major algorithms and provides many examples. Yooreeka 2.x is licensed under the Apache License rather than the somewhat more restrictive LGPL (which was the license of v1.x). The library is written 100% in the Java language. == Algorithms == The following algorithms are covered: Clustering Hierarchical—Agglomerative (e.g. MST single link; ROCK) and Divisive Partitional (e.g. k-means) Classification Bayesian Decision trees Neural Networks Rule based (via Drools) Recommendations Collaborative filtering Content based Search PageRank DocRank Personalization

    Read more →
  • Representer theorem

    Representer theorem

    For computer science, in statistical learning theory, a representer theorem is any of several related results stating that a minimizer f ∗ {\displaystyle f^{}} of a regularized empirical risk functional defined over a reproducing kernel Hilbert space can be represented as a finite linear combination of kernel products evaluated on the input points in the training set data. == Formal statement == The following Representer Theorem and its proof are due to Schölkopf, Herbrich, and Smola: Theorem: Consider a positive-definite real-valued kernel k : X × X → R {\displaystyle k:{\mathcal {X}}\times {\mathcal {X}}\to \mathbb {R} } on a non-empty set X {\displaystyle {\mathcal {X}}} with a corresponding reproducing kernel Hilbert space H k {\displaystyle H_{k}} . Let there be given a training sample ( x 1 , y 1 ) , … , ( x n , y n ) ∈ X × R {\displaystyle (x_{1},y_{1}),\dotsc ,(x_{n},y_{n})\in {\mathcal {X}}\times \mathbb {R} } , a strictly increasing real-valued function g : [ 0 , ∞ ) → R {\displaystyle g\colon [0,\infty )\to \mathbb {R} } , and an arbitrary error function E : ( X × R 2 ) n → R ∪ { ∞ } {\displaystyle E\colon ({\mathcal {X}}\times \mathbb {R} ^{2})^{n}\to \mathbb {R} \cup \lbrace \infty \rbrace } , which together define the following regularized empirical risk functional on H k {\displaystyle H_{k}} : f ↦ E ( ( x 1 , y 1 , f ( x 1 ) ) , … , ( x n , y n , f ( x n ) ) ) + g ( ‖ f ‖ ) . {\displaystyle f\mapsto E\left((x_{1},y_{1},f(x_{1})),\ldots ,(x_{n},y_{n},f(x_{n}))\right)+g\left(\lVert f\rVert \right).} Then, any minimizer of the empirical risk f ∗ = argmin f ∈ H k { E ( ( x 1 , y 1 , f ( x 1 ) ) , … , ( x n , y n , f ( x n ) ) ) + g ( ‖ f ‖ ) } , ( ∗ ) {\displaystyle f^{}={\underset {f\in H_{k}}{\operatorname {argmin} }}\left\lbrace E\left((x_{1},y_{1},f(x_{1})),\ldots ,(x_{n},y_{n},f(x_{n}))\right)+g\left(\lVert f\rVert \right)\right\rbrace ,\quad ()} admits a representation of the form: f ∗ ( ⋅ ) = ∑ i = 1 n α i k ( ⋅ , x i ) , {\displaystyle f^{}(\cdot )=\sum _{i=1}^{n}\alpha _{i}k(\cdot ,x_{i}),} where α i ∈ R {\displaystyle \alpha _{i}\in \mathbb {R} } for all 1 ≤ i ≤ n {\displaystyle 1\leq i\leq n} . Proof: Define a mapping φ : X → H k φ ( x ) = k ( ⋅ , x ) {\displaystyle {\begin{aligned}\varphi \colon {\mathcal {X}}&\to H_{k}\\\varphi (x)&=k(\cdot ,x)\end{aligned}}} (so that φ ( x ) = k ( ⋅ , x ) {\displaystyle \varphi (x)=k(\cdot ,x)} is itself a map X → R {\displaystyle {\mathcal {X}}\to \mathbb {R} } ). Since k {\displaystyle k} is a reproducing kernel, then φ ( x ) ( x ′ ) = k ( x ′ , x ) = ⟨ φ ( x ′ ) , φ ( x ) ⟩ , {\displaystyle \varphi (x)(x')=k(x',x)=\langle \varphi (x'),\varphi (x)\rangle ,} where ⟨ ⋅ , ⋅ ⟩ {\displaystyle \langle \cdot ,\cdot \rangle } is the inner product on H k {\displaystyle H_{k}} . Given any x 1 , … , x n {\displaystyle x_{1},\ldots ,x_{n}} , one can use orthogonal projection to decompose any f ∈ H k {\displaystyle f\in H_{k}} into a sum of two functions, one lying in span ⁡ { φ ( x 1 ) , … , φ ( x n ) } {\displaystyle \operatorname {span} \left\lbrace \varphi (x_{1}),\ldots ,\varphi (x_{n})\right\rbrace } , and the other lying in the orthogonal complement: f = ∑ i = 1 n α i φ ( x i ) + v , {\displaystyle f=\sum _{i=1}^{n}\alpha _{i}\varphi (x_{i})+v,} where ⟨ v , φ ( x i ) ⟩ = 0 {\displaystyle \langle v,\varphi (x_{i})\rangle =0} for all i {\displaystyle i} . The above orthogonal decomposition and the reproducing property together show that applying f {\displaystyle f} to any training point x j {\displaystyle x_{j}} produces f ( x j ) = ⟨ ∑ i = 1 n α i φ ( x i ) + v , φ ( x j ) ⟩ = ∑ i = 1 n α i ⟨ φ ( x i ) , φ ( x j ) ⟩ , {\displaystyle f(x_{j})=\left\langle \sum _{i=1}^{n}\alpha _{i}\varphi (x_{i})+v,\varphi (x_{j})\right\rangle =\sum _{i=1}^{n}\alpha _{i}\langle \varphi (x_{i}),\varphi (x_{j})\rangle ,} which we observe is independent of v {\displaystyle v} . Consequently, the value of the error function E {\displaystyle E} in () is likewise independent of v {\displaystyle v} . For the second term (the regularization term), since v {\displaystyle v} is orthogonal to ∑ i = 1 n α i φ ( x i ) {\displaystyle \sum _{i=1}^{n}\alpha _{i}\varphi (x_{i})} and g {\displaystyle g} is strictly monotonic, we have g ( ‖ f ‖ ) = g ( ‖ ∑ i = 1 n α i φ ( x i ) + v ‖ ) = g ( ‖ ∑ i = 1 n α i φ ( x i ) ‖ 2 + ‖ v ‖ 2 ) ≥ g ( ‖ ∑ i = 1 n α i φ ( x i ) ‖ ) . {\displaystyle {\begin{aligned}g\left(\lVert f\rVert \right)&=g\left(\lVert \sum _{i=1}^{n}\alpha _{i}\varphi (x_{i})+v\rVert \right)\\&=g\left({\sqrt {\lVert \sum _{i=1}^{n}\alpha _{i}\varphi (x_{i})\rVert ^{2}+\lVert v\rVert ^{2}}}\right)\\&\geq g\left(\lVert \sum _{i=1}^{n}\alpha _{i}\varphi (x_{i})\rVert \right).\end{aligned}}} Therefore, setting v = 0 {\displaystyle v=0} does not affect the first term of (), while it strictly decreases the second term. Consequently, any minimizer f ∗ {\displaystyle f^{}} in () must have v = 0 {\displaystyle v=0} , i.e., it must be of the form f ∗ ( ⋅ ) = ∑ i = 1 n α i φ ( x i ) = ∑ i = 1 n α i k ( ⋅ , x i ) , {\displaystyle f^{}(\cdot )=\sum _{i=1}^{n}\alpha _{i}\varphi (x_{i})=\sum _{i=1}^{n}\alpha _{i}k(\cdot ,x_{i}),} which is the desired result. == Generalizations == The Theorem stated above is a particular example of a family of results that are collectively referred to as "representer theorems"; here we describe several such. The first statement of a representer theorem was due to Kimeldorf and Wahba for the special case in which E ( ( x 1 , y 1 , f ( x 1 ) ) , … , ( x n , y n , f ( x n ) ) ) = 1 n ∑ i = 1 n ( f ( x i ) − y i ) 2 , g ( ‖ f ‖ ) = λ ‖ f ‖ 2 {\displaystyle {\begin{aligned}E\left((x_{1},y_{1},f(x_{1})),\ldots ,(x_{n},y_{n},f(x_{n}))\right)&={\frac {1}{n}}\sum _{i=1}^{n}(f(x_{i})-y_{i})^{2},\\g(\lVert f\rVert )&=\lambda \lVert f\rVert ^{2}\end{aligned}}} for λ > 0 {\displaystyle \lambda >0} . Schölkopf, Herbrich, and Smola generalized this result by relaxing the assumption of the squared-loss cost and allowing the regularizer to be any strictly monotonically increasing function g ( ⋅ ) {\displaystyle g(\cdot )} of the Hilbert space norm. It is possible to generalize further by augmenting the regularized empirical risk functional through the addition of unpenalized offset terms. For example, Schölkopf, Herbrich, and Smola also consider the minimization f ~ ∗ = argmin ⁡ { E ( ( x 1 , y 1 , f ~ ( x 1 ) ) , … , ( x n , y n , f ~ ( x n ) ) ) + g ( ‖ f ‖ ) ∣ f ~ = f + h ∈ H k ⊕ span ⁡ { ψ p ∣ 1 ≤ p ≤ M } } , ( † ) {\displaystyle {\tilde {f}}^{}=\operatorname {argmin} \left\lbrace E\left((x_{1},y_{1},{\tilde {f}}(x_{1})),\ldots ,(x_{n},y_{n},{\tilde {f}}(x_{n}))\right)+g\left(\lVert f\rVert \right)\mid {\tilde {f}}=f+h\in H_{k}\oplus \operatorname {span} \lbrace \psi _{p}\mid 1\leq p\leq M\rbrace \right\rbrace ,\quad (\dagger )} i.e., we consider functions of the form f ~ = f + h {\displaystyle {\tilde {f}}=f+h} , where f ∈ H k {\displaystyle f\in H_{k}} and h {\displaystyle h} is an unpenalized function lying in the span of a finite set of real-valued functions { ψ p : X → R ∣ 1 ≤ p ≤ M } {\displaystyle \lbrace \psi _{p}\colon {\mathcal {X}}\to \mathbb {R} \mid 1\leq p\leq M\rbrace } . Under the assumption that the n × M {\displaystyle n\times M} matrix ( ψ p ( x i ) ) i p {\displaystyle \left(\psi _{p}(x_{i})\right)_{ip}} has rank M {\displaystyle M} , they show that the minimizer f ~ ∗ {\displaystyle {\tilde {f}}^{}} in ( † ) {\displaystyle (\dagger )} admits a representation of the form f ~ ∗ ( ⋅ ) = ∑ i = 1 n α i k ( ⋅ , x i ) + ∑ p = 1 M β p ψ p ( ⋅ ) {\displaystyle {\tilde {f}}^{}(\cdot )=\sum _{i=1}^{n}\alpha _{i}k(\cdot ,x_{i})+\sum _{p=1}^{M}\beta _{p}\psi _{p}(\cdot )} where α i , β p ∈ R {\displaystyle \alpha _{i},\beta _{p}\in \mathbb {R} } and the β p {\displaystyle \beta _{p}} are all uniquely determined. The conditions under which a representer theorem exists were investigated by Argyriou, Micchelli, and Pontil, who proved the following: Theorem: Let X {\displaystyle {\mathcal {X}}} be a nonempty set, k {\displaystyle k} a positive-definite real-valued kernel on X × X {\displaystyle {\mathcal {X}}\times {\mathcal {X}}} with corresponding reproducing kernel Hilbert space H k {\displaystyle H_{k}} , and let R : H k → R {\displaystyle R\colon H_{k}\to \mathbb {R} } be a differentiable regularization function. Then given a training sample ( x 1 , y 1 ) , … , ( x n , y n ) ∈ X × R {\displaystyle (x_{1},y_{1}),\ldots ,(x_{n},y_{n})\in {\mathcal {X}}\times \mathbb {R} } and an arbitrary error function E : ( X × R 2 ) m → R ∪ { ∞ } {\displaystyle E\colon ({\mathcal {X}}\times \mathbb {R} ^{2})^{m}\to \mathbb {R} \cup \lbrace \infty \rbrace } , a minimizer f ∗ = argmin f ∈ H k { E ( ( x 1 , y 1 , f ( x 1 ) ) , … , ( x n , y n , f ( x n ) ) ) + R ( f ) } ( ‡ ) {\displaystyle f^{}={\underset {f\in H_{k}}{\operatorname {argmin} }}\left\lbrace E\left((x_{1},y_{1},f(x_{1})),\ldots ,(x_{n},y_{n},f(x_{n}))\right)+R(f)\right\rbrace \quad (\ddagger )} of the regularized empirical risk admits a repr

    Read more →
  • Ed (chatbot)

    Ed (chatbot)

    Ed was a chatbot co-developed by the Los Angeles Unified School District and AllHere Education. Described as a learning acceleration platform, it was the first personal assistant for students in the United States. Part of the district's Individual Acceleration Plan, it was able to interact with students both verbally and visually, offering support in 100 languages. The chatbot was launched on March 20, 2024, as part of the district's plan for academic recovery from the COVID-19 pandemic and to improve overall academic performance. Utilizing artificial intelligence, Ed organizes data and reports on grades, test scores, and attendance, creating individualized plans for each student. After the company behind it, AllHere, collapsed, the district shuttered operations of the chatbot on June 14, 2024. The firm is under investigation by the US Federal Bureau of Investigation. == History == On February 14, 2022, Alberto M. Carvalho became the Superintendent of the Los Angeles Unified School District, pledging to give the district a full academic recovery from the COVID-19 pandemic. In December 2022, he announced the Individual Acceleration Plan for the district, which aimed to provide each student with a unique progress report and help them determine if they were on track to graduate. The district faced criticism from disability advocates for its management of Individualized Education Programs, and in April 2022, the United States Department of Education announced that the district had failed to provide appropriate educational services to students with disabilities during the pandemic. The district had been grappling with significant absenteeism issues since the pandemic, which led to declining academic performance and disengagement among students. On February 17, 2023, the district issued a request for proposals to develop a fully integrated portal system. Later that year, they signed a $6 million, five-year contract with AllHere Education, a Boston-based company founded in 2016. The introduction of Ed follows the public launch of ChatGPT, which has been utilized by both teachers and students in educational settings. On August 4, 2023, during an annual address at the Walt Disney Concert Hall, Carvalho and the Los Angeles Unified School District announced the launch of Ed. The district invested $4 million into the chatbot, with Carvalho noting that this cost would be halved thanks to donor and grant funding. The chatbot was launched on March 20, 2024. Following its launch, a press conference was held to address security and technology concerns. Carvalho stated that the district had collaborated with security companies and incorporated filters to screen for threatening language. Months after its launch, AllHere Education furloughed most of its staff on June 14, citing their “current financial position” on its website as the reason. After learning about the furlough, the district terminated its dealings with AllHere Education. However, it stated its intention to bring the chatbot back in the future once officials determine the best course of action. Carvalho announced that he would appoint an independent task force to review what went wrong with AllHere Education and the chatbot. On February 25, 2026, the FBI served a search warrant on Carvalho’s home and office in connection with AllHere. The FBI also raided the LAUSD's headquarters. == Service == The chatbot was described as a personal assistant and a "one-stop shop for parents and students" who want to see information about a student's attendance and grades, as well as other resources from the district. Additionally, the application can function as an alarm clock, provide daily lunch menus from the school cafeteria, and offer updates on the location of school buses. The chatbot also helps students and parents who do not speak English as their first language by translating displayed information into approximately 100 different languages. The application can also help with submitting applications and give updates on progress and upcoming assignments. The district stated that the primary goal of Ed was to actively motivate students to complete homework and other tasks. == Reception == The chatbot received a mostly positive reception among parents and observers upon its launch. Some parents and teachers expressed caution about the technology, voicing concerns that the district's push for its implementation lacked public accountability. Rob Nelson from the University of Pennsylvania described the district's strategy as risky, saying that the release felt "like the beginning of a Clippy-level disaster". After the chatbot's shutdown, The 74 criticized it for misusing student data. Chris Whiteley, a former software engineer at AllHere Education, alleged that the data collected by the chatbot likely violated the district's data privacy rules.

    Read more →
  • ARKA descriptors in QSAR

    ARKA descriptors in QSAR

    In computational chemistry and cheminformatics, ARKA descriptors in QSAR are a class of molecular descriptors used in quantitative structure–activity relationship (QSAR) modeling (or related approaches such as QSPR and QSTR), a computational method for predicting the biological activity or toxicity of chemical compounds based on their molecular structure. Molecular descriptors are numerical values that summarize information about a molecule's structure, topology, geometry, or physicochemical properties in a form suitable for machine learning or statistical modeling. ARKA (Arithmetic Residuals in K-Groups Analysis) descriptors differ from traditional descriptors by encoding atomic-level information through recursive autoregression techniques, which aim to capture subtle structural patterns and improve predictive accuracy. They are designed to be both interpretable and well-suited to modeling nonlinear relationships in QSAR studies. == Comparisons == While QSAR is essentially a similarity-based approach, the occurrence of activity/property cliffs may greatly reduce the predictive accuracy of the developed models. The novel Arithmetic Residuals in K-groups Analysis (ARKA) approach is a supervised dimensionality reduction technique developed by the DTC Laboratory, Jadavpur University that can easily identify activity cliffs in a data set. Activity cliffs are similar in their structures but differ considerably in their activity. The basic idea of the ARKA descriptors is to group the conventional QSAR descriptors based on a predefined criterion and then assign weightage to each descriptor in each group. ARKA descriptors have also been used to develop classification-based and regression-based QSAR models with acceptable quality statistics. The ARKA descriptors have been used for the identification of activity cliffs in QSAR studies and/or model development by multiple researchers. A tutorial presentation on the ARKA descriptors is available. Recently a multi-class ARKA framework has been proposed for improved q-RASAR model generation.

    Read more →
  • Oja's rule

    Oja's rule

    Oja's learning rule, or simply Oja's rule, named after Finnish computer scientist Erkki Oja (Finnish pronunciation: [ˈojɑ], AW-yuh), is a model of how neurons in the brain or in artificial neural networks change connection strength, or learn, over time. It is a modification of the standard Hebb's Rule that, through multiplicative normalization, solves all stability problems and generates an algorithm for principal components analysis. This is a computational form of an effect which is believed to happen in biological neurons. == Theory == Oja's rule requires a number of simplifications to derive, but in its final form it is demonstrably stable, unlike Hebb's rule. It is a single-neuron special case of the Generalized Hebbian Algorithm. However, Oja's rule can also be generalized in other ways to varying degrees of stability and success. === Formula === Consider a simplified model of a neuron y {\displaystyle y} that returns a linear combination of its inputs x using presynaptic weights w: y ( x ) = ∑ j = 1 m x j w j {\displaystyle \,y(\mathbf {x} )~=~\sum _{j=1}^{m}x_{j}w_{j}} Oja's rule defines the change in presynaptic weights w given the output response y {\displaystyle y} of a neuron to its inputs x to be Δ w = w n + 1 − w n = η y n ( x n − y n w n ) , {\displaystyle \,\Delta \mathbf {w} ~=~\mathbf {w} _{n+1}-\mathbf {w} _{n}~=~\eta \,y_{n}(\mathbf {x} _{n}-y_{n}\mathbf {w} _{n}),} where η is the learning rate which can also change with time. Note that the bold symbols are vectors and n defines a discrete time iteration. The rule can also be made for continuous iterations as d w d t = η y ( t ) ( x ( t ) − y ( t ) w ( t ) ) . {\displaystyle \,{\frac {d\mathbf {w} }{dt}}~=~\eta \,y(t)(\mathbf {x} (t)-y(t)\mathbf {w} (t)).} === Derivation === The simplest learning rule known is Hebb's rule, which states in conceptual terms that neurons that fire together, wire together. In component form as a difference equation, it is written Δ w = η y ( x n ) x n {\displaystyle \,\Delta \mathbf {w} ~=~\eta \,y(\mathbf {x} _{n})\mathbf {x} _{n}} , or in scalar form with implicit n-dependence, w i ( n + 1 ) = w i ( n ) + η y ( x ) x i {\displaystyle \,w_{i}(n+1)~=~w_{i}(n)+\eta \,y(\mathbf {x} )x_{i}} , where y(xn) is again the output, this time explicitly dependent on its input vector x. Hebb's rule has synaptic weights approaching infinity with a positive learning rate. We can stop this by normalizing the weights so that each weight's magnitude is restricted between 0, corresponding to no weight, and 1, corresponding to being the only input neuron with any weight. We do this by normalizing the weight vector to be of length one: w i ( n + 1 ) = w i ( n ) + η y ( x ) x i ( ∑ j = 1 m [ w j ( n ) + η y ( x ) x j ] p ) 1 / p {\displaystyle \,w_{i}(n+1)~=~{\frac {w_{i}(n)+\eta \,y(\mathbf {x} )x_{i}}{\left(\sum _{j=1}^{m}[w_{j}(n)+\eta \,y(\mathbf {x} )x_{j}]^{p}\right)^{1/p}}}} . Note that in Oja's original paper, p=2, corresponding to quadrature (root sum of squares), which is the familiar Cartesian normalization rule. However, any type of normalization, even linear, will give the same result without loss of generality. For a small learning rate | η | ≪ 1 {\displaystyle |\eta |\ll 1} the equation can be expanded as a Power series in η {\displaystyle \eta } . w i ( n + 1 ) = w i ( n ) ( ∑ j w j p ( n ) ) 1 / p + η ( y x i ( ∑ j w j p ( n ) ) 1 / p − w i ( n ) ∑ j y x j w j p − 1 ( n ) ( ∑ j w j p ( n ) ) ( 1 + 1 / p ) ) + O ( η 2 ) {\displaystyle \,w_{i}(n+1)~=~{\frac {w_{i}(n)}{\left(\sum _{j}w_{j}^{p}(n)\right)^{1/p}}}~+~\eta \left({\frac {yx_{i}}{\left(\sum _{j}w_{j}^{p}(n)\right)^{1/p}}}-{\frac {w_{i}(n)\sum _{j}yx_{j}w_{j}^{p-1}(n)}{\left(\sum _{j}w_{j}^{p}(n)\right)^{(1+1/p)}}}\right)~+~O(\eta ^{2})} . For small η, our higher-order terms O(η2) go to zero. We again make the specification of a linear neuron, that is, the output of the neuron is equal to the sum of the product of each input and its synaptic weight to the power of p-1, which in the case of p=2 is synaptic weight itself, or y ( x ) = ∑ j = 1 m x j w j p − 1 {\displaystyle \,y(\mathbf {x} )~=~\sum _{j=1}^{m}x_{j}w_{j}^{p-1}} . We also specify that our weights normalize to 1, which will be a necessary condition for stability, so | w | = ( ∑ j = 1 m w j p ) 1 / p = 1 {\displaystyle \,|\mathbf {w} |~=~\left(\sum _{j=1}^{m}w_{j}^{p}\right)^{1/p}~=~1} , which, when substituted into our expansion, gives Oja's rule, or w i ( n + 1 ) = w i ( n ) + η y ( x i − w i ( n ) y ) {\displaystyle \,w_{i}(n+1)~=~w_{i}(n)+\eta \,y(x_{i}-w_{i}(n)y)} . === Stability and PCA === In analyzing the convergence of a single neuron evolving by Oja's rule, one extracts the first principal component, or feature, of a data set. Furthermore, with extensions using the Generalized Hebbian Algorithm, one can create a multi-Oja neural network that can extract as many features as desired, allowing for principal components analysis. A principal component aj is extracted from a dataset x through some associated vector qj, or aj = qj⋅x, and we can restore our original dataset by taking x = ∑ j a j q j {\displaystyle \mathbf {x} ~=~\sum _{j}a_{j}\mathbf {q} _{j}} . In the case of a single neuron trained by Oja's rule, we find the weight vector converges to q1, or the first principal component, as time or number of iterations approaches infinity. We can also define, given a set of input vectors Xi, that its correlation matrix Rij = XiXj has an associated eigenvector given by qj with eigenvalue λj. The variance of outputs of our Oja neuron σ2(n) = ⟨y2(n)⟩ then converges with time iterations to the principal eigenvalue, or lim n → ∞ σ 2 ( n ) = λ 1 {\displaystyle \lim _{n\rightarrow \infty }\sigma ^{2}(n)~=~\lambda _{1}} . These results are derived using Lyapunov function analysis, and they show that Oja's neuron necessarily converges on strictly the first principal component if certain conditions are met in our original learning rule. Most importantly, our learning rate η is allowed to vary with time, but only such that its sum is divergent but its power sum is convergent, that is ∑ n = 1 ∞ η ( n ) = ∞ , ∑ n = 1 ∞ η ( n ) p < ∞ , p > 1 {\displaystyle \sum _{n=1}^{\infty }\eta (n)=\infty ,~~~\sum _{n=1}^{\infty }\eta (n)^{p}<\infty ,~~~p>1} . Our output activation function y(x(n)) is also allowed to be nonlinear and nonstatic, but it must be continuously differentiable in both x and w and have derivatives bounded in time. == Applications == Oja's rule was originally described in Oja's 1982 paper, but the principle of self-organization to which it is applied is first attributed to Alan Turing in 1952. PCA has also had a long history of use before Oja's rule formalized its use in network computation in 1989. The model can thus be applied to any problem of self-organizing mapping, in particular those in which feature extraction is of primary interest. Therefore, Oja's rule has an important place in image and speech processing. It is also useful as it expands easily to higher dimensions of processing, thus being able to integrate multiple outputs quickly. A canonical example is its use in binocular vision. === Biology and Oja's subspace rule === There is clear evidence for both long-term potentiation and long-term depression in biological neural networks, along with a normalization effect in both input weights and neuron outputs. However, while there is no direct experimental evidence yet of Oja's rule active in a biological neural network, a biophysical derivation of a generalization of the rule is possible. Such a derivation requires retrograde signalling from the postsynaptic neuron, which is biologically plausible (see neural backpropagation), and takes the form of Δ w i j ∝ ⟨ x i y j ⟩ − ϵ ⟨ ( c p r e ∗ ∑ k w i k y k ) ⋅ ( c p o s t ∗ y j ) ⟩ , {\displaystyle \Delta w_{ij}~\propto ~\langle x_{i}y_{j}\rangle -\epsilon \left\langle \left(c_{\mathrm {pre} }\sum _{k}w_{ik}y_{k}\right)\cdot \left(c_{\mathrm {post} }y_{j}\right)\right\rangle ,} where as before wij is the synaptic weight between the ith input and jth output neurons, x is the input, y is the postsynaptic output, and we define ε to be a constant analogous the learning rate, and cpre and cpost are presynaptic and postsynaptic functions that model the weakening of signals over time. Note that the angle brackets denote the average and the ∗ operator is a convolution. By taking the pre- and post-synaptic functions into frequency space and combining integration terms with the convolution, we find that this gives an arbitrary-dimensional generalization of Oja's rule known as Oja's Subspace, namely Δ w = C x ⋅ w − w ⋅ C y . {\displaystyle \Delta w~=~Cx\cdot w-w\cdot Cy.}

    Read more →