AI Chat Video

AI Chat Video — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • KidDesk

    KidDesk

    KidDesk is an alternative desktop software application. The early childhood learning company Hatch Early Childhood created KidDesk; it subsequently went to Edmark, which was bought by IBM then sold to Riverdeep (now Houghton Mifflin Harcourt Learning Technology). KidDesk is compatible with Microsoft Windows 95 and newer, as well as Apple System 7 and newer. KidDesk can be set to start when the computer starts up, and can only be exited through password entry. Adults choose what programs are included for the child to use, what icon represented the desk, and customize the software programs available for use. == History == Edmark first started shipping KidDesk in 1992. In 1993, Edmark updated KidDesk with KidDesk Family Edition for Macintosh and DOS, adding more desk accessories and desk styles (Sometimes included as a free exclusive offer with the Early Learning House and Thinkin' Things Series). In 1995, KidDesk Family Edition was enhanced for Windows 95, and released one month after the new operating system shipped. In 1998, Edmark developed KidDesk Internet Safe. The Internet Safe edition was written for Windows 95, Windows 98, and Macintosh (including OS8). In 2008, HMH ported KidDesk Family Edition was to run on Windows Vista and in 2011 version 3.07 of KidDesk Family Edition was released as part of the 'Young Explorer' suite which is fully supported on Windows XP, Windows Vista and Windows 7. == Features == A picture editor incorporated into the desk. Used both in the Adult settings menu and in the desk itself. KidDesk users can edit their user logo with a pixel grid paint program. A calendar incorporated into the desk. This allows the user to set dates that the user finds important, and allows the date to be marked with a picture or text. A password exit feature. For security reasons, the adult can set a password so that KidDesk can only be exited if it is entered. As an extra security measure, the password exit function could only be accessed if the user pressed the ctrl + alt + A keyboard buttons simultaneously. A skin changer with several themes - farm, princess, sports, ocean, etc. These themes can be changed. The e-mail and voicemail features are customizable depending on the KidDesk installation. The ability to add websites that can be accessed on KidDesk, and the ability to block hyperlinks, JavaScript, data entry, etc., on said sites was an added for the 'Internet Safe' edition released in 1998. KidDesk Internet Safe edition is available in Spanish and Brazilian-Portuguese versions. == Reception == KidDesk was given a platinum award at the 1994 Oppenheim Toy Portfolio Awards. The judges praised the program's security features allowing "configur[ation] so that kids never have access to the possibly destructive DOS prompt", and concluded that "[i]f you and your kids share a computer, you need to install Kiddesk immediately!" === Awards === Since 1992, KidDesk has won 15 major awards.

    Read more →
  • Probabilistic automaton

    Probabilistic automaton

    In mathematics and computer science, the probabilistic automaton (PA) is a generalization of the nondeterministic finite automaton; it includes the probability of a given transition into the transition function, turning it into a transition matrix. Thus, the probabilistic automaton also generalizes the concepts of a Markov chain and of a subshift of finite type. The languages recognized by probabilistic automata are called stochastic languages; these include the regular languages as a subset. The number of stochastic languages is uncountable. The concept was introduced by Michael O. Rabin in 1963; a certain special case is sometimes known as the Rabin automaton (not to be confused with the subclass of ω-automata also referred to as Rabin automata). In recent years, a variant has been formulated in terms of quantum probabilities, the quantum finite automaton. == Informal Description == For a given initial state and input character, a deterministic finite automaton (DFA) has exactly one next state, and a nondeterministic finite automaton (NFA) has a set of next states. A probabilistic automaton (PA) instead has a weighted set (or vector) of next states, where the weights must sum to 1 and therefore can be interpreted as probabilities (making it a stochastic vector). The notions states and acceptance must also be modified to reflect the introduction of these weights. The state of the machine as a given step must now also be represented by a stochastic vector of states, and a state accepted if its total probability of being in an acceptance state exceeds some cut-off. A PA is in some sense a half-way step from deterministic to non-deterministic, as it allows a set of next states but with restrictions on their weights. However, this is somewhat misleading, as the PA utilizes the notion of the real numbers to define the weights, which is absent in the definition of both DFAs and NFAs. This additional freedom enables them to decide languages that are not regular, such as the p-adic languages with irrational parameters. As such, PAs are more powerful than both DFAs and NFAs (which are famously equally powerful). == Formal Definition == The probabilistic automaton may be defined as an extension of a nondeterministic finite automaton ( Q , Σ , δ , q 0 , F ) {\displaystyle (Q,\Sigma ,\delta ,q_{0},F)} , together with two probabilities: the probability P {\displaystyle P} of a particular state transition taking place, and with the initial state q 0 {\displaystyle q_{0}} replaced by a stochastic vector giving the probability of the automaton being in a given initial state. For the ordinary non-deterministic finite automaton, one has a finite set of states Q {\displaystyle Q} a finite set of input symbols Σ {\displaystyle \Sigma } a transition function δ : Q × Σ → ℘ ( Q ) {\displaystyle \delta :Q\times \Sigma \to \wp (Q)} a set of states F {\displaystyle F} distinguished as accepting (or final) states F ⊆ Q {\displaystyle F\subseteq Q} . Here, ℘ ( Q ) {\displaystyle \wp (Q)} denotes the power set of Q {\displaystyle Q} . By use of currying, the transition function δ : Q × Σ → ℘ ( Q ) {\displaystyle \delta :Q\times \Sigma \to \wp (Q)} of a non-deterministic finite automaton can be written as a membership function δ : Q × Σ × Q → { 0 , 1 } {\displaystyle \delta :Q\times \Sigma \times Q\to \{0,1\}} so that δ ( q , a , q ′ ) = 1 {\displaystyle \delta (q,a,q^{\prime })=1} if q ′ ∈ δ ( q , a ) {\displaystyle q^{\prime }\in \delta (q,a)} and 0 {\displaystyle 0} otherwise. The curried transition function can be understood to be a matrix with matrix entries [ θ a ] q q ′ = δ ( q , a , q ′ ) {\displaystyle \left[\theta _{a}\right]_{qq^{\prime }}=\delta (q,a,q^{\prime })} The matrix θ a {\displaystyle \theta _{a}} is then a square matrix, whose entries are zero or one, indicating whether a transition q → a q ′ {\displaystyle q{\stackrel {a}{\rightarrow }}q^{\prime }} is allowed by the NFA. Such a transition matrix is always defined for a non-deterministic finite automaton. The probabilistic automaton replaces these matrices by a family of right stochastic matrices P a {\displaystyle P_{a}} , for each symbol a in the alphabet Σ {\displaystyle \Sigma } so that the probability of a transition is given by [ P a ] q q ′ {\displaystyle \left[P_{a}\right]_{qq^{\prime }}} A state change from some state to any state must occur with probability one, of course, and so one must have ∑ q ′ [ P a ] q q ′ = 1 {\displaystyle \sum _{q^{\prime }}\left[P_{a}\right]_{qq^{\prime }}=1} for all input letters a {\displaystyle a} and internal states q {\displaystyle q} . The initial state of a probabilistic automaton is given by a row vector v {\displaystyle v} , whose components are the probabilities of the individual initial states q {\displaystyle q} , that add to 1: ∑ q [ v ] q = 1 {\displaystyle \sum _{q}\left[v\right]_{q}=1} The transition matrix acts on the right, so that the state of the probabilistic automaton, after consuming the input string a b c {\displaystyle abc} , would be v P a P b P c {\displaystyle vP_{a}P_{b}P_{c}} In particular, the state of a probabilistic automaton is always a stochastic vector, since the product of any two stochastic matrices is a stochastic matrix, and the product of a stochastic vector and a stochastic matrix is again a stochastic vector. This vector is sometimes called the distribution of states, emphasizing that it is a discrete probability distribution. Formally, the definition of a probabilistic automaton does not require the mechanics of the non-deterministic automaton, which may be dispensed with. Formally, a probabilistic automaton PA is defined as the tuple ( Q , Σ , P , v , F ) {\displaystyle (Q,\Sigma ,P,v,F)} . A Rabin automaton is one for which the initial distribution v {\displaystyle v} is a coordinate vector; that is, has zero for all but one entries, and the remaining entry being one. == Stochastic languages == The set of languages recognized by probabilistic automata are called stochastic languages. They include the regular languages as a subset. Let F = Q accept ⊆ Q {\displaystyle F=Q_{\text{accept}}\subseteq Q} be the set of "accepting" or "final" states of the automaton. By abuse of notation, Q accept {\displaystyle Q_{\text{accept}}} can also be understood to be the column vector that is the membership function for Q accept {\displaystyle Q_{\text{accept}}} ; that is, it has a 1 at the places corresponding to elements in Q accept {\displaystyle Q_{\text{accept}}} , and a zero otherwise. This vector may be contracted with the internal state probability, to form a scalar. The language recognized by a specific automaton is then defined as L η = { s ∈ Σ ∗ | v P s Q accept > η } {\displaystyle L_{\eta }=\{s\in \Sigma ^{}\vert vP_{s}Q_{\text{accept}}>\eta \}} where Σ ∗ {\displaystyle \Sigma ^{}} is the set of all strings in the alphabet Σ {\displaystyle \Sigma } (so that is the Kleene star). The language depends on the value of the cut-point η {\displaystyle \eta } , normally taken to be in the range 0 ≤ η < 1 {\displaystyle 0\leq \eta <1} . A language is called η-stochastic if and only if there exists some PA that recognizes the language, for fixed η {\displaystyle \eta } . A language is called stochastic if and only if there is some 0 ≤ η < 1 {\displaystyle 0\leq \eta <1} for which L η {\displaystyle L_{\eta }} is η-stochastic. A cut-point is said to be an isolated cut-point if and only if there exists a δ > 0 {\displaystyle \delta >0} such that | v P ( s ) Q accept − η | ≥ δ {\displaystyle \vert vP(s)Q_{\text{accept}}-\eta \vert \geq \delta } for all s ∈ Σ ∗ {\displaystyle s\in \Sigma ^{}} == Properties == Every regular language is stochastic, and more strongly, every regular language is η-stochastic. A weak converse is that every 0-stochastic language is regular; however, the general converse does not hold: there are stochastic languages that are not regular. Every η-stochastic language is stochastic, for some 0 < η < 1 {\displaystyle 0<\eta <1} . Every stochastic language is representable by a Rabin automaton. If η {\displaystyle \eta } is an isolated cut-point, then L η {\displaystyle L_{\eta }} is a regular language. == p-adic languages == The p-adic languages provide an example of a stochastic language that is not regular, and also show that the number of stochastic languages is uncountable. A p-adic language is defined as the set of strings L η ( p ) = { n 1 n 2 n 3 … | 0 ≤ n k < p and 0. n 1 n 2 n 3 … > η } {\displaystyle L_{\eta }(p)=\{n_{1}n_{2}n_{3}\ldots \vert 0\leq n_{k}\eta \}} in the letters 0 , 1 , 2 , … , ( p − 1 ) {\displaystyle 0,1,2,\ldots ,(p-1)} . That is, a p-adic language is merely the set of real numbers in [0, 1], written in base-p, such that they are greater than η {\displaystyle \eta } . It is straightforward to show that all p-adic languages are stochastic. In particular, this implies that the number of stochastic languages is uncountable. A p-adic

    Read more →
  • AI Content Generators: Free vs Paid (2026)

    AI Content Generators: Free vs Paid (2026)

    Shopping for the best AI content generator? An AI content generator is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI content generator slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.

    Read more →
  • IBM alignment models

    IBM alignment models

    The IBM alignment models are a sequence of increasingly complex models used in statistical machine translation to train a translation model and an alignment model, starting with lexical translation probabilities and moving to reordering and word duplication. They underpinned the majority of statistical machine translation systems for almost twenty years starting in the early 1990s, until neural machine translation began to dominate. These models offer principled probabilistic formulation and (mostly) tractable inference. The IBM alignment models were published in parts in 1988 and 1990, and the entire series is published in 1993. Every author of the 1993 paper subsequently went to the hedge fund Renaissance Technologies. The original work on statistical machine translation at IBM proposed five models, and a model 6 was proposed later. The sequence of the six models can be summarized as: Model 1: lexical translation Model 2: additional absolute alignment model Model 3: extra fertility model Model 4: added relative alignment model Model 5: fixed deficiency problem. Model 6: Model 4 combined with a HMM alignment model in a log linear way == Mathematical setup == The IBM alignment models translation as a conditional probability model. For each source-language ("foreign") sentence f {\displaystyle f} , we generate both a target-language ("English") sentence e {\displaystyle e} and an alignment a {\displaystyle a} . The problem then is to find a good statistical model for p ( e , a | f ) {\displaystyle p(e,a|f)} , the probability that we would generate English language sentence e {\displaystyle e} and an alignment a {\displaystyle a} given a foreign sentence f {\displaystyle f} . The meaning of an alignment grows increasingly complicated as the model version number grew. See Model 1 for the most simple and understandable version. == Model 1 == === Word alignment === Given any foreign-English sentence pair ( e , f ) {\displaystyle (e,f)} , an alignment for the sentence pair is a function of type { 1 , . , . . . , l e } → { 0 , 1 , . , . . . , l f } {\displaystyle \{1,.,...,l_{e}\}\to \{0,1,.,...,l_{f}\}} . That is, we assume that the English word at location i {\displaystyle i} is "explained" by the foreign word at location a ( i ) {\displaystyle a(i)} . For example, consider the following pair of sentences It will surely rain tomorrow -- 明日 は きっと 雨 だWe can align some English words to corresponding Japanese words, but not everyone:it -> ? will -> ? surely -> きっと rain -> 雨 tomorrow -> 明日This in general happens due to the different grammar and conventions of speech in different languages. English sentences require a subject, and when there is no subject available, it uses a dummy pronoun it. Japanese verbs do not have different forms for future and present tense, and the future tense is implied by the noun 明日 (tomorrow). Conversely, the topic-marker は and the grammar word だ (roughly "to be") do not correspond to any word in the English sentence. So, we can write the alignment as 1-> 0; 2 -> 0; 3 -> 3; 4 -> 4; 5 -> 1where 0 means that there is no corresponding alignment. Thus, we see that the alignment function is in general a function of type { 1 , . , . . . , l e } → { 0 , 1 , . , . . . , l f } {\displaystyle \{1,.,...,l_{e}\}\to \{0,1,.,...,l_{f}\}} . Future models will allow one English world to be aligned with multiple foreign words. === Statistical model === Given the above definition of alignment, we can define the statistical model used by Model 1: Start with a "dictionary". Its entries are of form t ( e i | f j ) {\displaystyle t(e_{i}|f_{j})} , which can be interpreted as saying "the foreign word f j {\displaystyle f_{j}} is translated to the English word e i {\displaystyle e_{i}} with probability t ( e i | f j ) {\displaystyle t(e_{i}|f_{j})} ". After being given a foreign sentence f {\displaystyle f} with length l f {\displaystyle l_{f}} , we first generate an English sentence length l e {\displaystyle l_{e}} uniformly in a range U n i f o r m [ 1 , 2 , . . . , N ] {\displaystyle Uniform[1,2,...,N]} . In particular, it does not depend on f {\displaystyle f} or l f {\displaystyle l_{f}} . Then, we generate an alignment uniformly in the set of all possible alignment functions { 1 , . , . . . , l e } → { 0 , 1 , . , . . . , l f } {\displaystyle \{1,.,...,l_{e}\}\to \{0,1,.,...,l_{f}\}} . Finally, for each English word e 1 , e 2 , . . . e l e {\displaystyle e_{1},e_{2},...e_{l_{e}}} , generate each one independently of every other English word. For the word e i {\displaystyle e_{i}} , generate it according to t ( e i | f a ( i ) ) {\displaystyle t(e_{i}|f_{a(i)})} . Together, we have the probability p ( e , a | f ) = 1 / N ( 1 + l f ) l e ∏ i = 1 l e t ( e i | f a ( i ) ) {\displaystyle p(e,a|f)={\frac {1/N}{(1+l_{f})^{l_{e}}}}\prod _{i=1}^{l_{e}}t(e_{i}|f_{a(i)})} IBM Model 1 uses very simplistic assumptions on the statistical model, in order to allow the following algorithm to have closed-form solution. === Learning from a corpus === If a dictionary is not provided at the start, but we have a corpus of English-foreign language pairs { ( e ( k ) , f ( k ) ) } k {\displaystyle \{(e^{(k)},f^{(k)})\}_{k}} (without alignment information), then the model can be cast into the following form: fixed parameters: the foreign sentences { f ( k ) } k {\displaystyle \{f^{(k)}\}_{k}} . learnable parameters: the entries of the dictionary t ( e i | f j ) {\displaystyle t(e_{i}|f_{j})} . observable variables: the English sentences { e ( k ) } k {\displaystyle \{e^{(k)}\}_{k}} . latent variables: the alignments { a ( k ) } k {\displaystyle \{a^{(k)}\}_{k}} In this form, this is exactly the kind of problem solved by expectation–maximization algorithm. Due to the simplistic assumptions, the algorithm has a closed-form, efficiently computable solution, which is the solution to the following equations: { max t ′ ∑ k ∑ i ∑ a ( k ) t ( a ( k ) | e ( k ) , f ( k ) ) ln ⁡ t ( e i ( k ) | f a ( k ) ( i ) ( k ) ) ∑ x t ′ ( e x | f y ) = 1 ∀ y {\displaystyle {\begin{cases}\max _{t'}\sum _{k}\sum _{i}\sum _{a^{(k)}}t(a^{(k)}|e^{(k)},f^{(k)})\ln t(e_{i}^{(k)}|f_{a^{(k)}(i)}^{(k)})\\\sum _{x}t'(e_{x}|f_{y})=1\quad \forall y\end{cases}}} This can be solved by Lagrangian multipliers, then simplified. For a detailed derivation of the algorithm, see chapter 4 and. In short, the EM algorithm goes as follows:INPUT. a corpus of English-foreign sentence pairs { ( e ( k ) , f ( k ) ) } k {\displaystyle \{(e^{(k)},f^{(k)})\}_{k}} INITIALIZE. matrix of translations probabilities t ( e x | f y ) {\displaystyle t(e_{x}|f_{y})} .This could either be uniform or random. It is only required that every entry is positive, and for each y {\displaystyle y} , the probability sums to one: ∑ x t ( e x | f y ) = 1 {\displaystyle \sum _{x}t(e_{x}|f_{y})=1} . LOOP. until t ( e x | f y ) {\displaystyle t(e_{x}|f_{y})} converges: t ( e x | f y ) ← t ( e x | f y ) λ y ∑ k , i , j δ ( e x , e i ( k ) ) δ ( f y , f j ( k ) ) ∑ j ′ t ( e i ( k ) | f j ′ ( k ) ) {\displaystyle t(e_{x}|f_{y})\leftarrow {\frac {t(e_{x}|f_{y})}{\lambda _{y}}}\sum _{k,i,j}{\frac {\delta (e_{x},e_{i}^{(k)})\delta (f_{y},f_{j}^{(k)})}{\sum _{j'}t(e_{i}^{(k)}|f_{j'}^{(k)})}}} where each λ y {\displaystyle \lambda _{y}} is a normalization constant that makes sure each ∑ x t ( e x | f y ) = 1 {\displaystyle \sum _{x}t(e_{x}|f_{y})=1} .RETURN. t ( e x | f y ) {\displaystyle t(e_{x}|f_{y})} .In the above formula, δ {\displaystyle \delta } is the Dirac delta function -- it equals 1 if the two entries are equal, and 0 otherwise. The index notation is as follows: k {\displaystyle k} ranges over English-foreign sentence pairs in corpus; i {\displaystyle i} ranges over words in English sentences; j {\displaystyle j} ranges over words in foreign language sentences; x {\displaystyle x} ranges over the entire vocabulary of English words in the corpus; y {\displaystyle y} ranges over the entire vocabulary of foreign words in the corpus. === Limitations === There are several limitations to the IBM model 1. No fluency: Given any sentence pair ( e , f ) {\displaystyle (e,f)} , any permutation of the English sentence is equally likely: p ( e | f ) = p ( e ′ | f ) {\displaystyle p(e|f)=p(e'|f)} for any permutation of the English sentence e {\displaystyle e} into e ′ {\displaystyle e'} . No length preference: The probability of each length of translation is equal: ∑ e has length l p ( e | f ) = 1 N {\displaystyle \sum _{e{\text{ has length }}l}p(e|f)={\frac {1}{N}}} for any l ∈ { 1 , 2 , . . . , N } {\displaystyle l\in \{1,2,...,N\}} . Does not explicitly model fertility: some foreign words tend to produce a fixed number of English words. For example, for German-to-English translation, ja is usually omitted, and zum is usually translated to one of to the, for the, to a, for a. == Model 2 == Model 2 allows alignment to be conditional on sentence lengths. That is, we have a probability distribution p a ( j | i , l e , l f ) {\displaystyle

    Read more →
  • Lexalytics

    Lexalytics

    Lexalytics, Inc. provides sentiment and intent analysis to an array of companies using SaaS and cloud based technology. Salience 6, the engine behind Lexalytics, was built as an on-premises, multi-lingual text analysis engine. It is leased to other companies who use it to power filtering and reputation management programs. In July, 2015 Lexalytics acquired Semantria to be used as a cloud option for its technology. In September, 2021 Lexalytics was acquired by CX company InMoment. == History == Lexalytics spun into existence in January 2003 out of a content management startup called Lightspeed. Lightspeed consolidated on America's West Coast. Jeff Catlin, a Lightspeed General Manager, and Mike Marshall, a Lighstpeed Principal Engineer, convinced investors to give them the East Coast company so as to avoid shutdown costs. Catlin and Marshall renamed the operation Lexalytics. Catlin took on the role of chief executive officer with Marshall working as Chief Technology Officer. Lexalytics opted to not accept venture cash. Instead, the company initially shared sales and marketing expenses with U.K. based document management company Infonic. The partner companies soon formed a joint venture in July 2008, which was later dissolved. Since then, Lexalytics has worked with many other companies, like Bottlenose, Salesforce, Thomson Reuters, Oracle and DataSift. Relationships with social media monitoring companies like Datasift tend to find Lexalytics’ Salience engine baked into the product itself. Lexalytics is used similarly to monitor sentiment as it relates to stock trading. In December 2014, Lexalytics announced the latest iteration to its sentiment analysis engine, Salience 6. Earlier that year Lexalytics acquired Semantria in a bid to appeal to a wider variety of business models. Created by former Lexalytics Marketing Director Oleg Rogynskyy, Semantria is a SaaS text mining service offered as an API and Excel based plugin that measures sentiment. The goal of the acquisition, which cost Lexalytics less than US$10 million, was to expand the customer base both within the United States and abroad with multilingual support. The engine that powers Semantria, Salience, is grounded in its deep learning ability. An example of this is its concept matrix, which allows Salience an understanding of concepts and relationship between concepts based on a detailed reading of the entire repository of Wikipedia. This matrix allows Salience to use Wikipedia for automatic categorization. Along with features like the concept matrix, Salience supports 16 international languages. The engine has earned Lexalytics a spot on EContent's “Top 100 Companies in the Digital Content Industry” List for 2014–2015. In September 2018, Lexalytics launched document data extraction market using natural language processing (NLP).

    Read more →
  • MedSLT

    MedSLT

    MedSLT is a medium-ranged open source spoken language translator developed by the University of Geneva. It is funded by the Swiss National Science Foundation. The system has been designed for the medical domain. It currently covers the doctor-patient diagnosis dialogues for the domains of headache, chest and abdominal pain in English, French, Japanese, Spanish, Catalan and Arabic. The vocabulary used ranges from 350 to 1000 words depending on the domain and language pair. == Motivation for creating MedSLT == With more than 6000 languages worldwide, language barriers become an increasing problem for healthcare. The lack of medical interpreters can lead to disastrous consequences. These range from prolonged hospital stays to wrong diagnosis and medication. A study found that only about half of the 23 million people with limited proficiency in English in the United States had been provided with a medical interpreter. Millions of refugees and immigrants worldwide face similar problems, although not always as severe. The gap between need and availability of language services might be closed with speech translation systems. == Challenges == The biggest challenge is and was to develop an ideal system, though it is not possible to do so at this moment. This system would fit the needs of doctors and the patients alike, and would provide accurate and flexible translation. A realisation of an ideal translation tool is impossible without the use of unrestricted language and a large vocabulary. Medical professionals demand high reliability from translation. This favours rule-based architectures over data-driven. The latter are more suitable for inexperienced users. Rule-based architectures achieve higher accuracy especially if used by experts. Though it is highly desirable to build a bidirectional system supporting a two-way dialogue, which concentrates on patient-centered communication, the patients will have difficult access to the system. Most patients have no experience with such systems. Less reliable results for translation from the patient-to-doctor direction are the outcome. To overcome this the system needs to provide either easy access or an integrated help tool to guide the users through the process. Although controlled rule-based systems achieve good results, they are brittle. To receive good translations the user needs to be familiar with the system and has to know what is covered by the grammar. Covering different sub-domains (headache, chest and abdominal pain) and language pairs presents additional problems. A shared structure and grammar for all subdomains and language pairs minimises development and maintenance costs. The integration of new doctor and patient languages is also a key challenge. Adding new languages should be quick and rather simple, because he system has to be used in many countries to cover multiple language pairs. Direct translation from source to target language proves to be rather difficult. Using interlingua for unidirectional translation instead of a bidirectional approach helps to simplify the translation process. On top of this, the system has to run on different platforms, because mobility is a key issue for many attending physicians. A portable version addresses these issues, but has to deal with the heavy load of the translation process. == The MedSLT system == The system's speech recognition is based on the Nuance 8.5 platform that supports grammar-based language models. All grammars used for recognition, analysis and generation are compiled from a small set of unification grammars. These core grammars are created by the open-source Regulus Grammar Compiler and are automatically specialised using corpus-driven methods. The specialisation considers both the task (recognition, analysis and generation) and the sub-domain (headache, chest and abdominal pain). The specialisation uses the explanation-based learning algorithm to create a treebank from the training corpus. These examples are divided into sets of subtrees by using domain- and grammar-specific rules (also known as "operationality criteria" in machine translation). The subtree rules are combined into a single rule, creating a specialised unification grammar. The grammar is compiled to an executable form, for analysis and generation by a parser or generator, and for recognition of a CFG grammar. A CFG grammar is required for the Nuance engine. Compilation by Nuance-specific criteria turns the grammar into speech recognition packages. The final step uses the training corpus again for statistical tuning of the language model. MedSLT translation processes are based on a rule-based interlingua. The interlingua is treated as an actual language (it is a very simple version of English) and is specified by a Regulus grammar. This grammar does not take account of complex surface syntax phenomena of real languages like movement or agreement. A set of rules is the base for translating the source language semantic representation to interlingua. Another set of rules covers the translation from interlingua to the target language. The semantic representations are converted to surface words using a target language grammar. Defining semantics for a specific domain enables the developers to specify interlingua with a small, tightly constraint semantic grammar. The translations based on interlingua match direct translations almost perfectly, because the development shifts to a decoupled monolingual architecture. A set of combined interlingua corpora, with one corpus per sub-domain, is the core of this architecture. All source language development corpora are translated to interlingua. These are sorted and grouped together with the corresponding source language examples. The interlingua forms are then translated into each target language, and the results are attached together. This organisation improves the translation process. There is no duplicated effort for multilingual regression testing, because each parsing and generation step is performed once. This allows more frequent testing. The representation language used for all forms is Almost Flat Functional semantics. AFF is derived from the Spoken Language Translator, the precursor of MEdSLT. SLT uses Quasi Logical Form, a logical based representation language. QLF is an expressive yet very complex language, causing high development and maintenance costs. A minimal solution was planned for the medical translator. Early versions of the system utilised a language using simple feature-value lists. These lists were supplemented with an optional level of nesting to represent subordinate clauses (i.e. embedded clauses). Determiners were not included, because they are hard to translate and it is difficult to reliably distinguish and recognise them. This way, translation rules became a lot simpler, because only a list of feature-value pairs had to be mapped to another list of pairs. The language turned out to be underconstrained. Adding natural sortal constraints to the grammar solved this problem, but also returned the language to a more expressive formalism. The newly created AFF combines elements of QLF and the feature-value list semantics. This version of flat semantics is enhanced with additional functional markings. This together with a relatively small vocabulary solved the ambiguity problem of the original flat representation language without creating overly complex rules. In addition, the syntactic structures are treated carefully by a compromise of linguistic and engineering traditions. The grammars are in fact retrieved from linguistically motivated resource, using corpus-based methods. They are driven by small sets of examples. This results in simpler and flatter domain-specific grammars. The semantics are less sophisticated and represent a minimal approach in the engineering tradition. Each lexical item contributes a set of feature-value pairs. This leads to simple-to-write translation rules. There are only lists of features-value pairs to map to other feature-value pairs. However, as a result the machine translation channel model becomes underspecified and is weakened, whereas the target language model is strengthened. An intelligent help module is integrated into the system to support users in utilising the full coverage of the grammars. This tool provides the user with examples as close as possible to the users original utterance. The output is based on a library. Each sub-domain and language pair has its own library. The contents are extracted from the combined interlingua corpora. The help module scans the corpus for the tagged source language form mapped with the corresponding target language form. Additionally a second statistical recogniser is used as backup. The results are used to select similar examples from the library. According to the generation preferences, one of the derived strings is picked and the target language string is realised as spoken language. Some statistical corpus based meth

    Read more →
  • Global Language Monitor

    Global Language Monitor

    The Global Language Monitor (GLM) is a company based in Austin, Texas, that analyzes trends in the English language. == History == Founded in Silicon Valley in 2003 by Paul J.J. Payack, the GLM describes its role as "a media analytics company that documents, analyzes and tracks cultural trends in language the world over, with a particular emphasis upon International and Global English". In April 2008, GLM moved its headquarters from San Diego to Austin. In July 2020, GLM announced that the word covid was its Top Word of 2020 for English. The company has been repeatedly criticized by linguists for promoting misinformation about language. Writing on Language Log, the linguist Ben Zimmer accused it of "hoodwink[ing] unsuspecting journalists on a range of pseudoscientific claims".

    Read more →
  • Struc2vec

    Struc2vec

    struc2vec is a framework to generate node vector representations on a graph that preserve the structural identity. In contrast to node2vec, that optimizes node embeddings so that nearby nodes in the graph have similar embedding, struc2vec captures the roles of nodes in a graph, even if structurally similar nodes are far apart in the graph. It learns low-dimensional representations for nodes in a graph, generating random walks through a constructed multi-layer graph starting at each graph node. It is useful for machine learning applications where the downstream application is more related with the structural equivalence of the nodes (e.g., it can be used to detect nodes in networks with similar functions, such as interns in the social network of a corporation). struc2vec identifies nodes that play a similar role based solely on the structure of the graph, for example computing the structural identity of individuals in social networks. In particular, struc2vec employs a degree-based method to measure the pairwise structural role similarity, which is then adopted to build the multi-layer graph. Moreover, the distance between the latent representation of nodes is strongly correlated to their structural similarity. The framework contains three optimizations: reducing the length of degree sequences considered, reducing the number of pairwise similarity calculations, and reducing the number of layers in the generated graph. struc2vec follows the intuition that random walks through a graph can be treated as sentences in a corpus. Each node in a graph is treated as an individual word, and short random walk is treated as a sentence. In its final phase, the algorithm employs Gensim's word2vec algorithm to learn embeddings based on biased random walks. Sequences of nodes are fed into a skip-gram or continuous bag of words model and traditional machine-learning techniques for classification can be used. It is considered a useful framework to learn node embeddings based on structural equivalence.

    Read more →
  • Conditional random field

    Conditional random field

    Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without considering "neighbouring" samples, a CRF can take context into account. To do so, the predictions are modelled as a graphical model, which represents the presence of dependencies between the predictions. The kind of graph used depends on the application. For example, in natural language processing, "linear chain" CRFs are popular, for which each prediction is dependent only on its immediate neighbours. In image processing, the graph typically connects locations to nearby and/or similar locations to enforce that they receive similar predictions. Other examples where CRFs are used are: labeling or parsing of sequential data for natural language processing or biological sequences, part-of-speech tagging, shallow parsing, named entity recognition, gene finding, peptide critical functional region finding, and object recognition and image segmentation in computer vision. == Description == CRFs are a type of discriminative undirected probabilistic graphical model. Lafferty, McCallum and Pereira define a CRF on observations X {\displaystyle {\boldsymbol {X}}} and random variables Y {\displaystyle {\boldsymbol {Y}}} as follows: Let G = ( V , E ) {\displaystyle G=(V,E)} be a graph such that Y = ( Y v ) v ∈ V {\displaystyle {\boldsymbol {Y}}=({\boldsymbol {Y}}_{v})_{v\in V}} , so that Y {\displaystyle {\boldsymbol {Y}}} is indexed by the vertices of G {\displaystyle G} . Then ( X , Y ) {\displaystyle ({\boldsymbol {X}},{\boldsymbol {Y}})} is a conditional random field when each random variable Y v {\displaystyle {\boldsymbol {Y}}_{v}} , conditioned on X {\displaystyle {\boldsymbol {X}}} , obeys the Markov property with respect to the graph; that is, its probability is dependent only on its neighbours in G and not its past states: P ( Y v | X , { Y w : w ≠ v } ) = P ( Y v | X , { Y w : w ∼ v } ) {\displaystyle P({\boldsymbol {Y}}_{v}|{\boldsymbol {X}},\{{\boldsymbol {Y}}_{w}:w\neq v\})=P({\boldsymbol {Y}}_{v}|{\boldsymbol {X}},\{{\boldsymbol {Y}}_{w}:w\sim v\})} , where w ∼ v {\displaystyle {\mathit {w}}\sim v} means that w {\displaystyle w} and v {\displaystyle v} are neighbors in G {\displaystyle G} . What this means is that a CRF is an undirected graphical model whose nodes can be divided into exactly two disjoint sets X {\displaystyle {\boldsymbol {X}}} and Y {\displaystyle {\boldsymbol {Y}}} , the observed and output variables, respectively; the conditional distribution p ( Y | X ) {\displaystyle p({\boldsymbol {Y}}|{\boldsymbol {X}})} is then modeled. === Inference === For general graphs, the problem of exact inference in CRFs is intractable. The inference problem for a CRF is basically the same as for an MRF and the same arguments hold. However, there exist special cases for which exact inference is feasible: If the graph is a chain or a tree, message passing algorithms yield exact solutions. The algorithms used in these cases are analogous to the forward-backward and Viterbi algorithm for the case of HMMs. If the CRF only contains pair-wise potentials and the energy is submodular, combinatorial min cut/max flow algorithms yield exact solutions. If exact inference is impossible, several algorithms can be used to obtain approximate solutions. These include: Loopy belief propagation Alpha expansion Mean field inference Linear programming relaxations === Parameter learning === Learning the parameters θ {\displaystyle \theta } is usually done by maximum likelihood learning for p ( Y i | X i ; θ ) {\displaystyle p(Y_{i}|X_{i};\theta )} . If all nodes have exponential family distributions and all nodes are observed during training, this optimization is convex. It can be solved for example using gradient descent algorithms, or Quasi-Newton methods such as the L-BFGS algorithm. On the other hand, if some variables are unobserved, the inference problem has to be solved for these variables. Exact inference is intractable in general graphs, so approximations have to be used. === Examples === In sequence modeling, the graph of interest is usually a chain graph. An input sequence of observed variables X {\displaystyle X} represents a sequence of observations and Y {\displaystyle Y} represents a hidden (or unknown) state variable that needs to be inferred given the observations. The Y i {\displaystyle Y_{i}} are structured to form a chain, with an edge between each Y i − 1 {\displaystyle Y_{i-1}} and Y i {\displaystyle Y_{i}} . As well as having a simple interpretation of the Y i {\displaystyle Y_{i}} as "labels" for each element in the input sequence, this layout admits efficient algorithms for: model training, learning the conditional distributions between the Y i {\displaystyle Y_{i}} and feature functions from some corpus of training data. decoding, determining the probability of a given label sequence Y {\displaystyle Y} given X {\displaystyle X} . inference, determining the most likely label sequence Y {\displaystyle Y} given X {\displaystyle X} . The conditional dependency of each Y i {\displaystyle Y_{i}} on X {\displaystyle X} is defined through a fixed set of feature functions of the form f ( i , Y i − 1 , Y i , X ) {\displaystyle f(i,Y_{i-1},Y_{i},X)} , which can be thought of as measurements on the input sequence that partially determine the likelihood of each possible value for Y i {\displaystyle Y_{i}} . The model assigns each feature a numerical weight and combines them to determine the probability of a certain value for Y i {\displaystyle Y_{i}} . Linear-chain CRFs have many of the same applications as conceptually simpler hidden Markov models (HMMs), but relax certain assumptions about the input and output sequence distributions. An HMM can loosely be understood as a CRF with very specific feature functions that use constant probabilities to model state transitions and emissions. Conversely, a CRF can loosely be understood as a generalization of an HMM that makes the constant transition probabilities into arbitrary functions that vary across the positions in the sequence of hidden states, depending on the input sequence. Notably, in contrast to HMMs, CRFs can contain any number of feature functions, the feature functions can inspect the entire input sequence X {\displaystyle X} at any point during inference, and the range of the feature functions need not have a probabilistic interpretation. == Variants == === Higher-order CRFs and semi-Markov CRFs === CRFs can be extended into higher order models by making each Y i {\displaystyle Y_{i}} dependent on a fixed number k {\displaystyle k} of previous variables Y i − k , . . . , Y i − 1 {\displaystyle Y_{i-k},...,Y_{i-1}} . In conventional formulations of higher order CRFs, training and inference are only practical for small values of k {\displaystyle k} (such as k ≤ 5), since their computational cost increases exponentially with k {\displaystyle k} . However, another recent advance has managed to ameliorate these issues by leveraging concepts and tools from the field of Bayesian nonparametrics. Specifically, the CRF-infinity approach constitutes a CRF-type model that is capable of learning infinitely-long temporal dynamics in a scalable fashion. This is effected by introducing a novel potential function for CRFs that is based on the Sequence Memoizer (SM), a nonparametric Bayesian model for learning infinitely-long dynamics in sequential observations. To render such a model computationally tractable, CRF-infinity employs a mean-field approximation of the postulated novel potential functions (which are driven by an SM). This allows for devising efficient approximate training and inference algorithms for the model, without undermining its capability to capture and model temporal dependencies of arbitrary length. There exists another generalization of CRFs, the semi-Markov conditional random field (semi-CRF), which models variable-length segmentations of the label sequence Y {\displaystyle Y} . This provides much of the power of higher-order CRFs to model long-range dependencies of the Y i {\displaystyle Y_{i}} , at a reasonable computational cost. Finally, large-margin models for structured prediction, such as the structured Support Vector Machine can be seen as an alternative training procedure to CRFs. === Latent-dynamic conditional random field === Latent-dynamic conditional random fields (LDCRF) or discriminative probabilistic latent variable models (DPLVM) are a type of CRFs for sequence tagging tasks. They are latent variable models that are trained discriminatively. In an LDCRF, like in any sequence tagging task, given a sequence of observations x = x 1 , … , x n {\displaystyle x_{1},\dots ,x_{n}} , the main problem the model must solve is how to assign a sequence of labels y = y 1 , … , y n {\displaystyle y_{1},\dots ,y_{n}} from one finite set

    Read more →
  • Collostructional analysis

    Collostructional analysis

    Collostructional analysis is a family of methods developed by (in alphabetical order) Stefan Th. Gries (University of California, Santa Barbara) and Anatol Stefanowitsch (Free University of Berlin). Collostructional analysis aims at measuring the degree of attraction or repulsion that words exhibit to constructions, where the notion of construction has so far been that of Goldberg's construction grammar. == Collostructional methods == Collostructional analysis so far comprises three different methods: collexeme analysis, to measure the degree of attraction/repulsion of a lemma to a slot in one particular construction; distinctive collexeme analysis, to measure the preference of a lemma to one particular construction over another, functionally similar construction; multiple distinctive collexeme analysis extends this approach to more than two alternative constructions; covarying collexeme analysis, to measure the degree of attraction of lemmas in one slot of a construction to lemmas in another slot of the same construction. == Input frequencies == Collostructional analysis requires frequencies of words and constructions and is similar to a wide variety of collocation statistics. It differs from raw frequency counts by providing not only observed co-occurrence frequencies of words and constructions, but also (i) a comparison of the observed frequency to the one expected by chance; thus, collostructional analysis can distinguish attraction and repulsion of words and constructions; (ii) a measure of the strength of the attraction or repulsion; this is usually the log-transformed p-value of a Fisher-Yates exact test. == Versus other collocation statistics == Collostructional analysis differs from most collocation statistics such that (i) it measures not the association of words to words, but of words to syntactic patterns or constructions; thus, it takes syntactic structure more seriously than most collocation-based analyses; (ii) it has so far only used the most precise statistics, namely the Fisher-Yates exact test based on the hypergeometric distribution; thus, unlike t-scores, z-scores, chi-square tests etc., the analysis is not based on, and does not violate, any distributional assumptions.

    Read more →
  • Ben Goertzel

    Ben Goertzel

    Ben Goertzel is a computer scientist, artificial intelligence (AI) researcher, and businessman. He helped popularize the term artificial general intelligence (AGI). == Early life and education == Three of Goertzel's Jewish great-grandparents immigrated to New York from Lithuania and Poland (in the Russian Empire). Goertzel's father is Ted Goertzel, a former professor of sociology at Rutgers University. Goertzel left high school after the tenth grade to attend Bard College at Simon's Rock, where he graduated with a bachelor's degree in Quantitative Studies. Goertzel graduated with a PhD in mathematics from Temple University under the supervision of Avi Lin in 1990, at age 23. == Career == Goertzel is the founder and CEO of SingularityNET, a project which was founded to distribute artificial intelligence data via blockchains. He is a leading developer of the OpenCog framework for artificial general intelligence. Goertzel was an associate and grant recipient of Jeffrey Epstein. He received a $100,000 grant from the Jeffrey Epstein Foundation for artificial general intelligence research in 2001. When interviewed by The New York Times about Epstein in 2019, Goertzel said, "I have no desire to talk about Epstein right now... The stuff I'm reading about him in the papers is pretty disturbing and goes way beyond what I thought his misdoings and kinks were. Yecch." === Sophia the Robot === Goertzel was the Chief Scientist of Hanson Robotics, the company that created the Sophia robot. As of 2018, Sophia's architecture includes scripting software, a chat system, and OpenCog, an AI system designed for general reasoning. Experts in the field have treated the project mostly as a PR stunt, stating that Hanson's claims that Sophia was "basically alive" are "grossly misleading" because the project does not involve AI technology, while computer scientist Yann LeCun, then Meta's chief AI scientist, made several unflattering remarks including calling the project "complete bullshit". === Views on AI === In May 2007, Goertzel spoke at a Google tech talk about his approach to creating artificial general intelligence. He defines intelligence as the ability to detect patterns in the world and in the agent itself, measurable in terms of emergent behavior of "achieving complex goals in complex environments". A "baby-like" artificial intelligence is initialized, then trained as an agent in a simulated or virtual world such as Second Life to produce a more powerful intelligence. Knowledge is represented in a network whose nodes and links carry probabilistic truth values as well as "attention values", with the attention values resembling the weights in a neural network. Several algorithms operate on this network, the central one being a combination of a probabilistic inference engine and a custom version of evolutionary programming. The 2012 documentary The Singularity by independent filmmaker Doug Wolens discussed Goertzel's views on AGI. In 2023 Goertzel postulated that artificial intelligence could replace up to 80 percent of human jobs in the coming years "without having an AGI, by my guess. Not with ChatGPT exactly as a product. But with systems of that nature". At the Web Summit 2023 in Rio de Janeiro, Goertzel spoke out against efforts to curb AI research and that AGI is only a few years away. Goertzel's belief is that AGI will be a net positive for humanity by assisting with societal problems such as, but not limited to, climate change.

    Read more →
  • AI Content Generators Reviews: What Actually Works in 2026

    AI Content Generators Reviews: What Actually Works in 2026

    In search of the best AI content generator? An AI content generator is software that uses machine learning to help you get more done — it turns a rough idea into a polished result in seconds. When choosing one, weigh output quality, pricing, export formats, and how well it fits the tools you already use. Whether you are a beginner or a pro, the right AI content generator slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →
  • Solomonoff's theory of inductive inference

    Solomonoff's theory of inductive inference

    Solomonoff's theory of inductive inference proves that, under its common sense assumptions (axioms), the best possible scientific model is the shortest algorithm that generates the empirical data under consideration. In addition to the choice of data, other assumptions are that, to avoid the post-hoc fallacy, the programming language must be chosen prior to the data and that the environment being observed is generated by an unknown algorithm. This is also called a theory of induction. Due to its basis in the dynamical (state-space model) character of Algorithmic Information Theory, it encompasses statistical as well as dynamical information criteria for model selection. It was introduced by Ray Solomonoff, based on probability theory and theoretical computer science. In essence, Solomonoff's induction derives the posterior probability of any computable theory, given a sequence of observed data. This posterior probability is derived from Bayes' rule and some universal prior, that is, a prior that assigns a positive probability to any computable theory. Solomonoff proved that this induction is incomputable (or more precisely, lower semi-computable), but noted that "this incomputability is of a very benign kind", and that it "in no way inhibits its use for practical prediction" (as it can be approximated from below more accurately with more computational resources). It is only "incomputable" in the benign sense that no scientific consensus is able to prove that the best current scientific theory is the best of all possible theories. However, Solomonoff's theory does provide an objective criterion for deciding among the current scientific theories explaining a given set of observations. Solomonoff's induction naturally formalizes Occam's razor by assigning larger prior credences to theories that require a shorter algorithmic description. == Origin == === Philosophical === The theory is based in philosophical foundations, and was founded by Ray Solomonoff around 1960. It is a mathematically formalized combination of Occam's razor and the Principle of Multiple Explanations. All computable theories which perfectly describe previous observations are used to calculate the probability of the next observation, with more weight put on the shorter computable theories. Marcus Hutter's universal artificial intelligence builds upon this to calculate the expected value of an action. === Principle === Solomonoff's induction has been argued to be the computational formalization of pure Bayesianism. To understand, recall that Bayesianism derives the posterior probability P [ T | D ] {\displaystyle \mathbb {P} [T|D]} of a theory T {\displaystyle T} given data D {\displaystyle D} by applying Bayes rule, which yields P [ T | D ] = P [ D | T ] P [ T ] P [ D | T ] P [ T ] + ∑ A ≠ T P [ D | A ] P [ A ] {\displaystyle \mathbb {P} [T|D]={\frac {\mathbb {P} [D|T]\mathbb {P} [T]}{\mathbb {P} [D|T]\mathbb {P} [T]+\sum _{A\neq T}\mathbb {P} [D|A]\mathbb {P} [A]}}} where theories A {\displaystyle A} are alternatives to theory T {\displaystyle T} . For this equation to make sense, the quantities P [ D | T ] {\displaystyle \mathbb {P} [D|T]} and P [ D | A ] {\displaystyle \mathbb {P} [D|A]} must be well-defined for all theories T {\displaystyle T} and A {\displaystyle A} . In other words, any theory must define a probability distribution over observable data D {\displaystyle D} . Solomonoff's induction essentially boils down to demanding that all such probability distributions be computable. Interestingly, the set of computable probability distributions is a subset of the set of all programs, which is countable. Similarly, the sets of observable data considered by Solomonoff were finite. Without loss of generality, we can thus consider that any observable data is a finite bit string. As a result, Solomonoff's induction can be defined by only invoking discrete probability distributions. Solomonoff's induction then allows to make probabilistic predictions of future data F {\displaystyle F} , by simply obeying the laws of probability. Namely, we have P [ F | D ] = E T [ P [ F | T , D ] ] = ∑ T P [ F | T , D ] P [ T | D ] {\displaystyle \mathbb {P} [F|D]=\mathbb {E} _{T}[\mathbb {P} [F|T,D]]=\sum _{T}\mathbb {P} [F|T,D]\mathbb {P} [T|D]} . This quantity can be interpreted as the average predictions P [ F | T , D ] {\displaystyle \mathbb {P} [F|T,D]} of all theories T {\displaystyle T} given past data D {\displaystyle D} , weighted by their posterior credences P [ T | D ] {\displaystyle \mathbb {P} [T|D]} . === Mathematical === The proof of the "razor" is based on the known mathematical properties of a probability distribution over a countable set. These properties are relevant because the infinite set of all programs is a denumerable set. The sum S of the probabilities of all programs must be exactly equal to one (as per the definition of probability) thus the probabilities must roughly decrease as we enumerate the infinite set of all programs, otherwise S will be strictly greater than one. To be more precise, for every ϵ {\displaystyle \epsilon } > 0, there is some length l such that the probability of all programs longer than l is at most ϵ {\displaystyle \epsilon } . This does not, however, preclude very long programs from having very high probability. Fundamental ingredients of the theory are the concepts of algorithmic probability and Kolmogorov complexity. The universal prior probability of any prefix p of a computable sequence x is the sum of the probabilities of all programs (for a universal computer) that compute something starting with p. Given some p and any computable but unknown probability distribution from which x is sampled, the universal prior and Bayes' theorem can be used to predict the yet unseen parts of x in optimal fashion. == Mathematical guarantees == === Solomonoff's completeness === The remarkable property of Solomonoff's induction is its completeness. In essence, the completeness theorem guarantees that the expected cumulative errors made by the predictions based on Solomonoff's induction are upper-bounded by the Kolmogorov complexity of the (stochastic) data generating process. The errors can be measured using the Kullback–Leibler divergence or the square of the difference between the induction's prediction and the probability assigned by the (stochastic) data generating process. === Solomonoff's uncomputability === Unfortunately, Solomonoff also proved that Solomonoff's induction is uncomputable. In fact, he showed that computability and completeness are mutually exclusive: any complete theory must be uncomputable. The proof of this is derived from a game between the induction and the environment. Essentially, any computable induction can be tricked by a computable environment, by choosing the computable environment that negates the computable induction's prediction. This fact can be regarded as an instance of the no free lunch theorem. == Modern applications == === Artificial intelligence === Though Solomonoff's inductive inference is not computable, several AIXI-derived algorithms approximate it in order to make it run on a modern computer. The more computing power they are given, the closer their predictions are to the predictions of inductive inference (their mathematical limit is Solomonoff's inductive inference). Another direction of inductive inference is based on E. Mark Gold's model of learning in the limit from 1967 and has developed since then more and more models of learning. The general scenario is the following: Given a class S of computable functions, is there a learner (that is, recursive functional) which for any input of the form (f(0),f(1),...,f(n)) outputs a hypothesis (an index e with respect to a previously agreed on acceptable numbering of all computable functions; the indexed function may be required consistent with the given values of f). A learner M learns a function f if almost all its hypotheses are the same index e, which generates the function f; M learns S if M learns every f in S. Basic results are that all recursively enumerable classes of functions are learnable while the class REC of all computable functions is not learnable. Many related models have been considered and also the learning of classes of recursively enumerable sets from positive data is a topic studied from Gold's pioneering paper in 1967 onwards. A far reaching extension of the Gold’s approach is developed by Schmidhuber's theory of generalized Kolmogorov complexities, which are kinds of super-recursive algorithms.

    Read more →
  • Cognitive computer

    Cognitive computer

    A cognitive computer is a computer that hardwires artificial intelligence and machine learning algorithms into an integrated circuit that closely reproduces the behavior of the human brain. It generally adopts a neuromorphic engineering approach. Synonyms include neuromorphic chip and cognitive chip. In 2023, IBM's proof-of-concept NorthPole chip (optimized for 2-, 4- and 8-bit precision) achieved remarkable performance in image recognition. In 2013, IBM developed Watson, a cognitive computer that uses neural networks and deep learning techniques. The following year, it developed the 2014 TrueNorth microchip architecture which is designed to be closer in structure to the human brain than the von Neumann architecture used in conventional computers. In 2017, Intel also announced its version of a cognitive chip in "Loihi, which it intended to be available to university and research labs in 2018. Intel (most notably with its Pohoiki Beach and Springs systems), Qualcomm, and others are improving neuromorphic processors steadily. == IBM TrueNorth chip == TrueNorth was a neuromorphic CMOS integrated circuit produced by IBM in 2014. It is a manycore processor network on a chip design, with 4096 cores, each one having 256 programmable simulated neurons for a total of just over a million neurons. In turn, each neuron has 256 programmable "synapses" that convey the signals between them. Hence, the total number of programmable synapses is just over 268 million (228). Its basic transistor count is 5.4 billion. In 2023 Zhejiang University and Alibaba developed Darwin a neuromorphic chip The darwin3 chip was designed around 2023 so it is fairly modern compared to IBM's TrueNorth or Intel's LoihI. === Details === Memory, computation, and communication are handled in each of the 4096 neurosynaptic cores, TrueNorth circumvents the von Neumann-architecture bottleneck and is very energy-efficient, with IBM claiming a power consumption of 70 milliwatts and a power density that is 1/10,000th of conventional microprocessors. The SyNAPSE chip operates at lower temperatures and power because it only draws power necessary for computation. Skyrmions have been proposed as models of the synapse on a chip. The neurons are emulated using a Linear-Leak Integrate-and-Fire (LLIF) model, a simplification of the leaky integrate-and-fire model. According to IBM, it does not have a clock, operates on unary numbers, and computes by counting to a maximum of 19 bits. The cores are event-driven by using both synchronous and asynchronous logic, and are interconnected through an asynchronous packet-switched mesh network on chip (NOC). IBM developed a new network to program and use TrueNorth. It included a simulator, a new programming language, an integrated programming environment, and libraries. This lack of backward compatibility with any previous technology (e.g., C++ compilers) poses serious vendor lock-in risks and other adverse consequences that may prevent it from commercialization in the future. === Research === In 2018, a cluster of TrueNorth network-linked to a master computer was used in stereo vision research that attempted to extract the depth of rapidly moving objects in a scene. == IBM NorthPole chip == In 2023, IBM released its NorthPole chip, which is a proof-of-concept for dramatically improving performance by intertwining compute with memory on-chip, thus eliminating the Von Neumann bottleneck. It blends approaches from IBM's 2014 TrueNorth system with modern hardware designs to achieve speeds about 4,000 times faster than TrueNorth. It can run ResNet-50 or Yolo-v4 image recognition tasks about 22 times faster, with 25 times less energy and 5 times less space, when compared to GPUs which use the same 12-nm node process that it was fabricated with. It includes 224 MB of RAM and 256 processor cores and can perform 2,048 operations per core per cycle at 8-bit precision, and 8,192 operations at 2-bit precision. It runs at between 25 and 425 MHz. This is an inferencing chip, but it cannot yet handle GPT-4 because of memory and accuracy limitations == Intel Loihi chip == === Pohoiki Springs === Pohoiki Springs is a system that incorporates Intel's self-learning neuromorphic chip, named Loihi, introduced in 2017, perhaps named after the Hawaiian seamount Lōʻihi. Intel claims Loihi is about 1000 times more energy efficient than general-purpose computing systems used to train neural networks. In theory, Loihi supports both machine learning training and inference on the same silicon independently of a cloud connection, and more efficiently than convolutional neural networks or deep learning neural networks. Intel points to a system for monitoring a person's heartbeat, taking readings after events such as exercise or eating, and using the chip to normalize the data and work out the ‘normal’ heartbeat. It can then spot abnormalities and deal with new events or conditions. The first iteration of the chip was made using Intel's 14 nm fabrication process and houses 128 clusters of 1,024 artificial neurons each for a total of 131,072 simulated neurons. This offers around 130 million synapses, far less than the human brain's 800 trillion synapses, and behind IBM's TrueNorth. Loihi is available for research purposes among more than 40 academic research groups as a USB form factor. In October 2019, researchers from Rutgers University published a research paper to demonstrate the energy efficiency of Intel's Loihi in solving simultaneous localization and mapping. In March 2020, Intel and Cornell University published a research paper to demonstrate the ability of Intel's Loihi to recognize different hazardous materials, which could eventually aid to "diagnose diseases, detect weapons and explosives, find narcotics, and spot signs of smoke and carbon monoxide". === Pohoiki Beach === Intel's Loihi 2, named Pohoiki Beach, was released in September 2021 with 64 cores. It boasts faster speeds, higher-bandwidth inter-chip communications for enhanced scalability, increased capacity per chip, a more compact size due to process scaling, and improved programmability. === Hala Point === Hala Point packages 1,152 Loihi 2 processors produced on Intel 3 process node in a six-rack-unit chassis. The system supports up to 1.15 billion neurons and 128 billion synapses distributed over 140,544 neuromorphic processing cores, consuming 2,600 watts of power. It includes over 2,300 embedded x86 processors for ancillary computations. Intel claimed in 2024 that Hala Point was the world’s largest neuromorphic system. It uses Loihi 2 chips. It is claimed to offer 10x more neuron capacity and up to 12x higher performance. The Darwin3 chip exceeds these specs. Hala Point provides up to 20 quadrillion operations per second, (20 petaops), with efficiency exceeding 15 trillion (8-bit) operations s−1 W−1 on conventional deep neural networks. Hala Point integrates processing, memory and communication channels in a massively parallelized fabric, providing 16 PB s−1 of memory bandwidth, 3.5 PB s−1 of inter-core communication bandwidth, and 5 TB s−1 of inter-chip bandwidth. The system can process its 1.15 billion neurons 20 times faster than a human brain. Its neuron capacity is roughly equivalent to that of an owl brain or the cortex of a capuchin monkey. Loihi-based systems can perform inference and optimization using 100 times less energy at speeds as much as 50 times faster than CPU/GPU architectures. Intel claims that Hala Point can create LLMs. Much further research is needed == SpiNNaker == SpiNNaker (Spiking Neural Network Architecture) is a massively parallel, manycore supercomputer architecture designed by the Advanced Processor Technologies Research Group at the Department of Computer Science, University of Manchester. == Criticism == Critics argue that a room-sized computer – as in the case of IBM's Watson – is not a viable alternative to a three-pound human brain. Some also cite the difficulty for a single system to bring so many elements together, such as the disparate sources of information as well as computing resources. In 2021, The New York Times released Steve Lohr's article "What Ever Happened to IBM’s Watson?". He wrote about some costly failures of IBM Watson. One of them, a cancer-related project called the Oncology Expert Advisor, was abandoned in 2016 as a costly failure. During the collaboration, Watson could not use patient data. Watson struggled to decipher doctors’ notes and patient histories. The development of LLMs has placed a new emphasis on cognitive computers, because the Transformer technology that underpins LLMs demands huge energy for GPUs and PCs. Cognitive computers use significantly less energy, but the details of STDPs and neuron models cannot yet match the accuracy of backprop, and so ANN to SNN weight translations such as QAT and PQT or progressive quantization are becoming popular, with their own limitations.

    Read more →
  • How to Choose an AI Pair Programmer

    How to Choose an AI Pair Programmer

    In search of the best AI pair programmer? An AI pair programmer is software that uses machine learning to help you get more done — it turns a rough idea into a polished result in seconds. When choosing one, weigh output quality, pricing, export formats, and how well it fits the tools you already use. Whether you are a beginner or a pro, the right AI pair programmer slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.

    Read more →