AI Assistant For Acrobat Cost

AI Assistant For Acrobat Cost — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Blobotics

    Blobotics

    Blobotics is a term describing research into chemical-based computer processors based on ions rather than electrons. Andrew Adamatzky, a computer scientist at the University of the West of England, Bristol used the term in an article in New Scientist March 28, 2005 [1]. The aim is to create 'liquid logic gates' which would be 'infinitely reconfigurable and self-healing'. The process relies on the Belousov–Zhabotinsky reaction, a repeating cycle of three separate sets of reactions. Such a processor could form the basis of a robot which, using artificial sensors, interact with its surroundings in a way which mimics living creatures. The coining of the term was featured by ABC radio in Australia [2].

    Read more →
  • Machine-learned interatomic potential

    Machine-learned interatomic potential

    Machine-learned interatomic potentials (MLIPs), or simply machine learning potentials (MLPs), are interatomic potentials constructed using machine learning. Beginning in the 1990s, researchers have employed such programs to construct interatomic potentials by mapping atomic structures to their potential energies. These potentials are referred to as MLIPs or MLPs. Such machine learning potentials promised to fill the gap between density functional theory, a highly accurate but computationally intensive modelling method, and empirically derived or intuitively-approximated potentials, which were far lighter computationally but substantially less accurate. Improvements in artificial intelligence technology heightened the accuracy of MLPs while lowering their computational cost, increasing the role of machine learning in fitting potentials. Machine learning potentials began by using neural networks to tackle low-dimensional systems. While promising, these models could not systematically account for interatomic energy interactions; they could be applied to small molecules in a vacuum, or molecules interacting with frozen surfaces, but not much else – and even in these applications, the models often relied on force fields or potentials derived empirically or with simulations. These models thus remained confined to academia. Modern neural networks construct highly accurate and computationally light potentials, as theoretical understanding of materials science was increasingly built into their architectures and preprocessing. Almost all are local, accounting for all interactions between an atom and its neighbor up to some cutoff radius. There exist some nonlocal models, but these have been experimental for almost a decade. For most systems, reasonable cutoff radii enable highly accurate results. Almost all neural networks intake atomic coordinates and output potential energies. For some, these atomic coordinates are converted into atom-centered symmetry functions. From this data, a separate atomic neural network is trained for each element; each atomic network is evaluated whenever that element occurs in the given structure, and then the results are pooled together at the end. This process – in particular, the atom-centered symmetry functions which convey translational, rotational, and permutational invariances – has greatly improved machine learning potentials by significantly constraining the neural network search space. Other models use a similar process but emphasize bonds over atoms, using pair symmetry functions and training one network per atom pair. Other models to learn their own descriptors rather than using predetermined symmetry-dictating functions. These models, called message-passing neural networks (MPNNs), are graph neural networks. Treating molecules as three-dimensional graphs (where atoms are nodes and bonds are edges), the model takes feature vectors describing the atoms as input, and iteratively updates these vectors as information about neighboring atoms is processed through message functions and convolutions. These feature vectors are then used to predict the final potentials. The flexibility of this method often results in stronger, more generalizable models. In 2017, the first-ever MPNN model (a deep tensor neural network) was used to calculate the properties of small organic molecules. == Gaussian Approximation Potential (GAP) == One popular class of machine-learned interatomic potential is the Gaussian Approximation Potential (GAP), which combines compact descriptors of local atomic environments with Gaussian process regression to machine learn the potential energy surface of a given system. To date, the GAP framework has been used to successfully develop a number of MLIPs for various systems, including for elemental systems such as carbon, silicon, phosphorus, and tungsten, as well as for multicomponent systems such as Ge2Sb2Te5 and austenitic stainless steel, Fe7Cr2Ni. == Equivariant graph neural networks == A significant limitation of early MPNNs was that they were not inherently equivariant to rotations and reflections of atomic structures — meaning predictions could change depending on how a molecule was oriented in space. Beginning around 2021, a new class of models addressed this by incorporating equivariance directly into the message-passing layers using spherical harmonics and irreducible representations. Notable examples include NequIP (2021), MACE (2022), and GemNet-OC (2022). These equivariant architectures proved substantially more data-efficient and accurate than their predecessors, and became the dominant paradigm for high-accuracy MLIPs. == Universal MLIPs and large-scale datasets == Early MLIPs were system-specific, trained on a few thousand structures of a single material. A major shift occurred with the creation of large, chemically diverse datasets enabling models that generalize across many elements, bonding environments, and application domains — so-called universal MLIPs. A key driver was the Open Catalyst Project (OC20, OC22), a collaboration between Meta AI (FAIR) and Carnegie Mellon University launched in 2020. OC20 comprises approximately 1.3 million DFT relaxations across 82 elements, designed to accelerate the discovery of catalysts for renewable energy applications. It was among the first datasets large enough to train GNNs that generalize across diverse chemical systems, and established a widely-used benchmark for the field. A subsequent dataset, Open Direct Air Capture (OpenDAC 2023 and OpenDAC 2025), applied the same approach to carbon capture, providing a large computational database of metal-organic frameworks and sorbent candidates evaluated for CO₂ capture, generated using nearly 400 million CPU hours of quantum chemistry calculations in collaboration with Georgia Tech. These datasets revealed a new challenge: the GNN architectures most effective for atomic simulations were memory-intensive, as they model higher-order interactions between triplets or quadruplets of atoms, making it difficult to scale model size. Graph Parallelism, introduced by Sriram et al. (ICLR 2022), addressed this by distributing a single input graph across multiple GPUs — a distinct strategy from data parallelism (which distributes training examples) or model parallelism (which distributes layers). This enabled training GNNs with hundreds of millions to billions of parameters for the first time. Building on these foundations, Meta FAIR released the Universal Model for Atoms (UMA) in 2025, trained on approximately 500 million unique 3D atomic structures spanning molecules, materials, and catalysts — the largest training run to date for an MLIP. UMA introduced a Mixture of Linear Experts (MoLE) architecture, enabling one model to learn from datasets generated by different DFT codes and settings without significant inference overhead. It matches or surpasses specialized models across catalysis, materials, and molecular benchmarks without task-specific fine-tuning, and has been described as marking a "pre/post-UMA" divide in the field. == Applications == Catalyst discovery: MLIPs have significantly accelerated the computational screening of heterogeneous catalysts by replacing expensive DFT relaxations with fast neural network surrogates. The Open Catalyst Project explicitly targets this application, aiming to identify new catalysts for green hydrogen production and other renewable energy reactions. Carbon capture: The OpenDAC project applies universal MLIPs to screening sorbent materials for direct air capture of CO₂, a key technology for climate change mitigation. AI-accelerated screening allows evaluation of orders of magnitude more candidate materials than traditional DFT workflows. Drug discovery and molecular design: MLIPs are increasingly used in pharmaceutical research to model molecular conformations and binding energies. The Open Molecules 2025 (OMol25) dataset, released by Meta FAIR in 2025, provides high-accuracy calculations for a large set of molecular systems to support this use case. Materials discovery: Universal MLIPs enable high-throughput screening of novel inorganic materials, including battery electrolytes, semiconductors, and superconductors, by rapidly estimating stability and properties across large chemical spaces.

    Read more →
  • Solomonoff's theory of inductive inference

    Solomonoff's theory of inductive inference

    Solomonoff's theory of inductive inference proves that, under its common sense assumptions (axioms), the best possible scientific model is the shortest algorithm that generates the empirical data under consideration. In addition to the choice of data, other assumptions are that, to avoid the post-hoc fallacy, the programming language must be chosen prior to the data and that the environment being observed is generated by an unknown algorithm. This is also called a theory of induction. Due to its basis in the dynamical (state-space model) character of Algorithmic Information Theory, it encompasses statistical as well as dynamical information criteria for model selection. It was introduced by Ray Solomonoff, based on probability theory and theoretical computer science. In essence, Solomonoff's induction derives the posterior probability of any computable theory, given a sequence of observed data. This posterior probability is derived from Bayes' rule and some universal prior, that is, a prior that assigns a positive probability to any computable theory. Solomonoff proved that this induction is incomputable (or more precisely, lower semi-computable), but noted that "this incomputability is of a very benign kind", and that it "in no way inhibits its use for practical prediction" (as it can be approximated from below more accurately with more computational resources). It is only "incomputable" in the benign sense that no scientific consensus is able to prove that the best current scientific theory is the best of all possible theories. However, Solomonoff's theory does provide an objective criterion for deciding among the current scientific theories explaining a given set of observations. Solomonoff's induction naturally formalizes Occam's razor by assigning larger prior credences to theories that require a shorter algorithmic description. == Origin == === Philosophical === The theory is based in philosophical foundations, and was founded by Ray Solomonoff around 1960. It is a mathematically formalized combination of Occam's razor and the Principle of Multiple Explanations. All computable theories which perfectly describe previous observations are used to calculate the probability of the next observation, with more weight put on the shorter computable theories. Marcus Hutter's universal artificial intelligence builds upon this to calculate the expected value of an action. === Principle === Solomonoff's induction has been argued to be the computational formalization of pure Bayesianism. To understand, recall that Bayesianism derives the posterior probability P [ T | D ] {\displaystyle \mathbb {P} [T|D]} of a theory T {\displaystyle T} given data D {\displaystyle D} by applying Bayes rule, which yields P [ T | D ] = P [ D | T ] P [ T ] P [ D | T ] P [ T ] + ∑ A ≠ T P [ D | A ] P [ A ] {\displaystyle \mathbb {P} [T|D]={\frac {\mathbb {P} [D|T]\mathbb {P} [T]}{\mathbb {P} [D|T]\mathbb {P} [T]+\sum _{A\neq T}\mathbb {P} [D|A]\mathbb {P} [A]}}} where theories A {\displaystyle A} are alternatives to theory T {\displaystyle T} . For this equation to make sense, the quantities P [ D | T ] {\displaystyle \mathbb {P} [D|T]} and P [ D | A ] {\displaystyle \mathbb {P} [D|A]} must be well-defined for all theories T {\displaystyle T} and A {\displaystyle A} . In other words, any theory must define a probability distribution over observable data D {\displaystyle D} . Solomonoff's induction essentially boils down to demanding that all such probability distributions be computable. Interestingly, the set of computable probability distributions is a subset of the set of all programs, which is countable. Similarly, the sets of observable data considered by Solomonoff were finite. Without loss of generality, we can thus consider that any observable data is a finite bit string. As a result, Solomonoff's induction can be defined by only invoking discrete probability distributions. Solomonoff's induction then allows to make probabilistic predictions of future data F {\displaystyle F} , by simply obeying the laws of probability. Namely, we have P [ F | D ] = E T [ P [ F | T , D ] ] = ∑ T P [ F | T , D ] P [ T | D ] {\displaystyle \mathbb {P} [F|D]=\mathbb {E} _{T}[\mathbb {P} [F|T,D]]=\sum _{T}\mathbb {P} [F|T,D]\mathbb {P} [T|D]} . This quantity can be interpreted as the average predictions P [ F | T , D ] {\displaystyle \mathbb {P} [F|T,D]} of all theories T {\displaystyle T} given past data D {\displaystyle D} , weighted by their posterior credences P [ T | D ] {\displaystyle \mathbb {P} [T|D]} . === Mathematical === The proof of the "razor" is based on the known mathematical properties of a probability distribution over a countable set. These properties are relevant because the infinite set of all programs is a denumerable set. The sum S of the probabilities of all programs must be exactly equal to one (as per the definition of probability) thus the probabilities must roughly decrease as we enumerate the infinite set of all programs, otherwise S will be strictly greater than one. To be more precise, for every ϵ {\displaystyle \epsilon } > 0, there is some length l such that the probability of all programs longer than l is at most ϵ {\displaystyle \epsilon } . This does not, however, preclude very long programs from having very high probability. Fundamental ingredients of the theory are the concepts of algorithmic probability and Kolmogorov complexity. The universal prior probability of any prefix p of a computable sequence x is the sum of the probabilities of all programs (for a universal computer) that compute something starting with p. Given some p and any computable but unknown probability distribution from which x is sampled, the universal prior and Bayes' theorem can be used to predict the yet unseen parts of x in optimal fashion. == Mathematical guarantees == === Solomonoff's completeness === The remarkable property of Solomonoff's induction is its completeness. In essence, the completeness theorem guarantees that the expected cumulative errors made by the predictions based on Solomonoff's induction are upper-bounded by the Kolmogorov complexity of the (stochastic) data generating process. The errors can be measured using the Kullback–Leibler divergence or the square of the difference between the induction's prediction and the probability assigned by the (stochastic) data generating process. === Solomonoff's uncomputability === Unfortunately, Solomonoff also proved that Solomonoff's induction is uncomputable. In fact, he showed that computability and completeness are mutually exclusive: any complete theory must be uncomputable. The proof of this is derived from a game between the induction and the environment. Essentially, any computable induction can be tricked by a computable environment, by choosing the computable environment that negates the computable induction's prediction. This fact can be regarded as an instance of the no free lunch theorem. == Modern applications == === Artificial intelligence === Though Solomonoff's inductive inference is not computable, several AIXI-derived algorithms approximate it in order to make it run on a modern computer. The more computing power they are given, the closer their predictions are to the predictions of inductive inference (their mathematical limit is Solomonoff's inductive inference). Another direction of inductive inference is based on E. Mark Gold's model of learning in the limit from 1967 and has developed since then more and more models of learning. The general scenario is the following: Given a class S of computable functions, is there a learner (that is, recursive functional) which for any input of the form (f(0),f(1),...,f(n)) outputs a hypothesis (an index e with respect to a previously agreed on acceptable numbering of all computable functions; the indexed function may be required consistent with the given values of f). A learner M learns a function f if almost all its hypotheses are the same index e, which generates the function f; M learns S if M learns every f in S. Basic results are that all recursively enumerable classes of functions are learnable while the class REC of all computable functions is not learnable. Many related models have been considered and also the learning of classes of recursively enumerable sets from positive data is a topic studied from Gold's pioneering paper in 1967 onwards. A far reaching extension of the Gold’s approach is developed by Schmidhuber's theory of generalized Kolmogorov complexities, which are kinds of super-recursive algorithms.

    Read more →
  • Progress in artificial intelligence

    Progress in artificial intelligence

    Progress in artificial intelligence (AI) refers to the advances, milestones, and breakthroughs that have been achieved in the field of artificial intelligence over time. AI is a branch of computer science that aims to create machines and systems capable of performing tasks that typically require human intelligence. AI applications have been used in a wide range of fields including medical diagnosis, finance, robotics, law, video games, agriculture, and scientific discovery. The society as a whole is looking for artificial intelligence to be on a key factor in the upcming years because of its potential. However, many AI applications are not perceived as AI: "A lot of cutting-edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore." "Many thousands of AI applications are deeply embedded in the infrastructure of every industry." In the late 1990s and early 2000s, AI technology became widely used as elements of larger systems, but the field was rarely credited for these successes at the time. Kaplan and Haenlein structure artificial intelligence along three evolutionary stages: Artificial narrow intelligence – AI capable only of specific tasks; Artificial general intelligence – AI with ability in several areas, and able to autonomously solve problems they were never even designed for; Artificial superintelligence – AI capable of general tasks, including scientific creativity, social skills, and general wisdom. To allow comparison with human performance, artificial intelligence can be evaluated on constrained and well-defined problems. Such tests have been termed subject-matter expert Turing tests. Also, smaller problems provide more achievable goals and there are an ever-increasing number of positive results. In 2023, humans still substantially outperformed both GPT-4 and other models tested on the ConceptARC benchmark. Those models scored 60% on most, and 77% on one category, while humans scored 91% on all and 97% on one category. However, later research in 2025 showed that human-generated output grids were only accurate 73% of the time, while AI models available that year managed to score above 77%. == History == Increasing, promoting or constraining AI progress has often be done via controlling or increasing the amount of compute. == Current performance in specific areas == There are many useful abilities that can be described as showing some form of intelligence. This gives better insight into the comparative success of artificial intelligence in different areas. AI, like electricity or the steam engine, is a general-purpose technology. There is no consensus on how to characterize which tasks AI tends to excel at. Some versions of Moravec's paradox observe that humans are more likely to outperform machines in areas such as physical dexterity that have been the direct target of natural selection. While projects such as AlphaZero have succeeded in generating their own knowledge from scratch, many other machine learning projects require large training datasets. Researcher Andrew Ng has suggested, as a "highly imperfect rule of thumb", that "almost anything a typical human can do with less than one second of mental thought, we can probably now or in the near future automate using AI." Games provide a high-profile benchmark for assessing rates of progress; many games have a large professional player base and a well-established competitive rating system. AlphaGo brought the era of classical board-game benchmarks to a close when Artificial Intelligence proved their competitive edge over humans in 2016. Deep Mind's AlphaGo AI software program defeated the world's best professional Go Player Lee Sedol. Games of imperfect knowledge provide new challenges to AI in the area of game theory; the most prominent milestone in this area was brought to a close by Libratus' poker victory in 2017. E-sports continue to provide additional benchmarks; Facebook AI, Deepmind, and others have engaged with the popular StarCraft franchise of videogames. Broad classes of outcome for an AI test may be given as: optimal: it is not possible to perform better (note: some of these entries were solved by humans) super-human: performs better than all humans high-human: performs better than most humans par-human: performs similarly to most humans sub-human: performs worse than most humans === Optimal === Tic-tac-toe Connect Four: 1988 Checkers (aka 8x8 draughts): Weakly solved (2007) Rubik's Cube: Mostly solved (2010) Heads-up limit hold'em poker: Statistically optimal in the sense that "a human lifetime of play is not sufficient to establish with statistical significance that the strategy is not an exact solution" (2015) === Super-human === Othello (aka reversi): c. 1997 Scrabble: 2006 Backgammon: c. 1995–2002 Chess: Supercomputer (c. 1997); Personal computer (c. 2006); Mobile phone (c. 2009); Computer defeats human + computer (c. 2017) Jeopardy!: Question answering, although the machine did not use speech recognition (2011) Arimaa: 2015 Shogi: c. 2017 Go: 2017 Heads-up no-limit hold'em poker: 2017 Six-player no-limit hold'em poker: 2019 Gran Turismo Sport: 2022 === High-human === Crosswords: c. 2012 Freeciv: 2016 Dota 2: 2018 Bridge card-playing: According to a 2009 review, "the best programs are attaining expert status as (bridge) card players", excluding bidding. StarCraft II: 2019 Mahjong: 2019 Stratego: 2022 No-Press Diplomacy: 2022 Hanabi: 2022 Natural language processing === Par-human === Optical character recognition for ISO 1073-1:1976 and similar special characters. Classification of images Handwriting recognition Facial recognition Visual question answering SQuAD 2.0 English reading-comprehension benchmark (2019) SuperGLUE English-language understanding benchmark (2020) Some school science exams (2019) Some tasks based on Raven's Progressive Matrices Many Atari 2600 games (2015) === Sub-human === Optical character recognition for printed text (nearing par-human for Latin-script typewritten text) Object recognition Various robotics tasks that may require advances in robot hardware as well as AI, including: Stable bipedal locomotion: Bipedal robots can walk, but are less stable than human walkers (as of 2017) Humanoid soccer Speech recognition: "nearly equal to human performance" (2017) Explainability. Current medical systems can diagnose certain medical conditions well, but cannot explain to users why they made the diagnosis. Many tests of fluid intelligence (2020) Bongard visual cognition problems, such as the Bongard-LOGO benchmark (2020) Visual Commonsense Reasoning (VCR) benchmark (as of 2020) Stock market prediction: Financial data collection and processing using Machine Learning algorithms Angry Birds video game, as of 2020 Various tasks that are difficult to solve without contextual knowledge, including: Translation Word-sense disambiguation == Proposed tests of artificial intelligence == In his famous Turing test, Alan Turing picked language, the defining feature of human beings, for its basis. The Turing test is now considered too exploitable to be a meaningful benchmark. The Feigenbaum test, proposed by the inventor of expert systems, tests a machine's knowledge and expertise about a specific subject. A paper by Jim Gray of Microsoft in 2003 suggested extending the Turing test to speech understanding, speaking and recognizing objects and behavior. Proposed "universal intelligence" tests aim to compare how well machines, humans, and even non-human animals perform on problem sets that are generic as possible. At an extreme, the test suite can contain every possible problem, weighted by Kolmogorov complexity; however, these problem sets tend to be dominated by impoverished pattern-matching exercises where a tuned AI can easily exceed human performance levels. == Exams == According to OpenAI, in 2023 GPT-4 achieved high scores on several standardized and professional examinations, including around the 90th percentile on the Uniform Bar Exam, the 89th percentile on the mathematics section of the SAT, the 93rd percentile on SAT Reading and Writing, the 54th percentile on the analytical writing section of the GRE, the 88th percentile on GRE quantitative reasoning, and the 99th percentile on GRE verbal reasoning. OpenAI also reported that GPT-4 scored in the 99th to 100th percentile on the 2020 USA Biology Olympiad semifinal exam and earned top scores on several AP exams. Independent researchers found in 2023 that ChatGPT based on GPT-3.5 performed "at or near the passing threshold" on all three parts of the United States Medical Licensing Examination (USMLE), suggesting that large language models could reach passing-level performance on some medical knowledge assessments even without domain-specific fine-tuning. GPT-3.5 was also reported to attain a low but passing grade on examinations for four law school courses at the University of Minnes

    Read more →
  • Pronunciation assessment

    Pronunciation assessment

    Automatic pronunciation assessment uses computer speech recognition to determine how accurately speech has been pronounced, instead of relying on a human instructor or proctor. It is also called speech verification, pronunciation evaluation, and pronunciation scoring. This technology is used to grade speech quality, for language testing, for computer-aided pronunciation teaching (CAPT) in computer-assisted language learning (CALL), for speaking skill remediation, and for accent reduction. Pronunciation assessment is different from dictation or automatic transcription, because instead of determining unknown speech, it verifies learners' pronunciation of known word(s), often from prior transcription of the same utterance; ideally scoring the intelligibility of the learners' speech. Sometimes pronunciation assessment evaluates the prosody of the learners' speech, such as intonation, pitch, tempo, rhythm, and syllable and word stress, although those are usually not essential for being understood in most languages. Pronunciation assessment is also used in reading tutoring, for example in products from Google, Microsoft, and Amira Learning. Automatic pronunciation assessment can also be used to help diagnose and treat speech disorders such as apraxia. == Intelligibility == Intelligibility refers to how well a learner's utterance is understood by a listener, rather than how much it sounds like a native speaker. This is separate from measures of fluency, such as so-called "Goodness of Pronunciation" (GoP) scores, which estimate how closely an utterance aligns with those of native speakers. Intelligibility is widely regarded as the most important communicative goal in pronunciation teaching and assessment. For example, in the Common European Framework of Reference for Languages (CEFR) assessment criteria for "overall phonological control", intelligibility outweighs formally correct pronunciation at all levels. Studies in applied linguistics have shown that accent reduction does not always increase intelligibility because listeners can often comprehend heavily accented speech without difficulty. Pronunciation assessment systems often rely on acoustic methods such as GoP which compare learner speech to reference models to produce phoneme-level scores, which are in turn aggregated to produce word and phrase scores. While these methods are effective for identifying deviations from native speakers' utterances, they do not effectively measure how understandable speech is to human listeners. Intelligibility is influenced by broader linguistic and contextual factors such as stress placement, speech rate, and coarticulation, which are not represented in purely segmental scores. The earliest work on pronunciation assessment avoided measuring genuine listener intelligibility, a shortcoming corrected in 2011 at the Toyohashi University of Technology, and included in the Versant high-stakes English fluency assessment from Pearson and mobile apps from 17zuoye Education & Technology, but still missing in 2023 products from Google Search, Microsoft, Educational Testing Service, Speechace, and ELSA. Assessing authentic listener intelligibility is essential for avoiding inaccuracies from accent bias, especially in high-stakes assessments; from words with multiple correct pronunciations; and from phoneme coding errors in machine-readable pronunciation dictionaries. In 2022, researchers found that some newer speech-to-text systems, based on end-to-end reinforcement learning to map audio signals directly into words, produce word and phrase confidence scores (from 10-25ms audio frame logit aggregation) closely correlated with genuine listener intelligibility. Others have been able to assess intelligibility using Levenshtein or dynamic time warping distance measures from Wav2Vec2 representation of good speech. Further work through 2025 has focused specifically on measuring intelligibility. A 2025 study of 42 pronunciation and speech coaching apps (32 mobile and 10 web) found that none offered intelligibility assessment. Instead, most provided only segmental and accent-focused scoring. About two-thirds of the apps provided some form of specific pronunciation feedback, usually with phonetic transcriptions, but accompanied by visual cues (such as animations of the vocal tract or the lips and tongue from the front) in only about 5% of the apps. Less than a third provided feedback on learner perception of exemplar speech. == Evaluation == Although there are as yet no industry-standard benchmarks for evaluating pronunciation assessment accuracy, researchers occasionally release evaluation speech corpuses for others to use for improving assessment quality. Such evaluation databases often emphasize formally unaccented pronunciation to the exclusion of genuine intelligibility evident from blinded listener transcriptions. As of mid-2025, state of the art approaches for automatically transcribing phonemes typically achieve an error rate of about 10% from known good speech. The International Speech Communication Association (ISCA) 2025 Workshop on Speech and Language Technology in Education (SLaTE) administered a Speak & Improve Challenge: Spoken Language Assessment and Feedback, introducing benchmarks for evaluating pronunciation assessment and remediation systems across languages, accents, and learner populations. The challenge emphasized cross-lingual generalization and alignment with human intelligibility judgments, for more robust and interpretable assessment systems. Ethical issues in pronunciation assessment are present in both human and automatic methods. Authentic validity, fairness, and mitigating bias in evaluation are all crucial. Diverse speech data should be included in automatic pronunciation assessment models. Combining human judgments, especially blinded transcriptions from a wide diversity of listeners, with automated feedback can improve accuracy and fairness. Second language learners benefit substantially from their use of widely available speech recognition systems for dictation, virtual assistants, and AI chatbots. In such systems, users naturally try to correct their own errors evident in speech recognition results that they notice. Such use improves their grammar and vocabulary development along with their pronunciation skills. The extent to which explicit pronunciation assessment and remediation approaches improve on such self-directed interactions remains an open question. Similarly, automatic dictation results have been shown to reflect intelligibility about as well as human scorers. == Recent developments == During 2021–22, a smartphone-based CAPT system was used to sense articulation through both audible and inaudible signals, providing feedback at the phoneme level. Some promising areas for improvement which were being developed in 2024 include articulatory feature extraction and transfer learning to suppress unnecessary corrections. Other interesting advances under development include "augmented reality" interfaces for mobile devices using optical character recognition to provide pronunciation training on text found in user environments. In 2024, audio multimodal large language models were first described as assessing pronunciation. That work has been carried forward by other researchers in 2025 who report positive results. Subsequently, researchers demonstrated pronunciation scoring by providing a language model with textual descriptions of speech, including the speech-to-text transcript, phoneme sequences, pauses, and phoneme sequence matching; this approach can achieve performance similar to multimodal LLMs that analyze raw audio while avoiding their higher computational cost. In 2025, the Duolingo English Test authors published a description of their pronunciation assessment method, purportedly built to measure intelligibility rather than accent imitation. While achieving a correlation of 0.82 with expert human ratings, very close to inter-rater agreement and outperforming alternative methods, the method is nonetheless based on experts' scores along the six-point CEFR common reference levels scale, instead of actual blinded listener transcriptions. Further promising work in 2025 includes assessment feedback aligning learner speech to synthetic utterances using interpretable features, identifying continuous spans of words for remediation feedback; synthesizing corrected speech matching learners' self-perceived voices, which they prefer and imitate more accurately as corrections; and streaming such interactions. On January 21, 2026, Educational Testing Service's TOEFL iBT high-stakes English language test, required by US university admissions and employers from English as a foreign language applicants more often than all other internet-based tests combined, changed its speaking assessments. While official rubrics claim that the new scoring will be based primarily on intelligibility, the new test's technical description indicates that it ju

    Read more →
  • Virtual intelligence

    Virtual intelligence

    Virtual intelligence (VI) is the term given to artificial intelligence that exists within a virtual world. Many virtual worlds have options for persistent avatars that provide information, training, role-playing, and social interactions. The immersion in virtual worlds provides a platform for VI beyond the traditional paradigm of past user interfaces (UIs). What Alan Turing established as a benchmark for telling the difference between human and computerized intelligence was devoid of visual influences. With today's VI bots, virtual intelligence has evolved past the constraints of past testing into a new level of the machine's ability to demonstrate intelligence. The immersive features of these environments provide nonverbal elements that affect the realism provided by virtually intelligent agents. Virtual intelligence is the intersection of these two technologies: Virtual environments: Immersive 3D spaces provide for collaboration, simulations, and role-playing interactions for training. Many of these virtual environments are currently being used for government and academic projects, including Second Life, VastPark, Olive, OpenSim, Outerra, Oracle's Open Wonderland, Duke University's Open Cobalt, and many others. Some of the commercial virtual worlds are also taking this technology into new directions, including the high-definition virtual world Blue Mars. Artificial intelligence (AI): AI is a branch of computer science that aims to create intelligent machines capable of performing tasks that typically require human intelligence. VI is a type of AI that operates within virtual environments to simulate human-like interactions and responses. == Applications == Cutlass Bomb Disposal Robot: Northrop Grumman developed a virtual training opportunity because of the prohibitive real-world cost and dangers associated with bomb disposal. By replicating a complicated system without having to learn advanced code, the virtual robot has no risk of damage, trainee safety hazards, or accessibility constraints. MyCyberTwin: NASA is among the companies that have used the MyCyberTwin AI technologies. They used it for the Phoenix rover in the virtual world Second Life. Their MyCyberTwin used a programmed profile to relay information about what the Phoenix rover was doing and its purpose. Second China: The University of Florida developed the "Second China" project as an immersive training experience for learning how to interact with the culture and language in a foreign country. Students are immersed in an environment that provides role-playing challenges coupled with language and cultural sensitivities magnified during country-level diplomatic missions or during times of potential conflict or regional destabilization. The virtual training provides participants with opportunities to access information, take part in guided learning scenarios, communicate, collaborate, and role-play. While China was the country for the prototype, this model can be modified for use with any culture to help better understand social and cultural interactions and see how other people think and what their actions imply. Duke School of Nursing Training Simulation: Extreme Reality developed virtual training to test critical thinking with a nurse performing trained procedures to identify critical data to make decisions and performing the correct steps for intervention. Bots are programmed to respond to the nurse's actions as the patient with their conditions improving if the nurse performs the correct actions.

    Read more →
  • Reasoning model

    Reasoning model

    A reasoning model, also known as a reasoning language model (RLM) or large reasoning model (LRM), is a type of large language model (LLM) that has been specifically trained to solve complex tasks requiring multiple steps of logical reasoning. These models demonstrate superior performance on logic, mathematics, and programming tasks compared to standard LLMs. They possess the ability to revisit and revise earlier reasoning steps and utilize additional computation during inference as a method to scale performance, complementing traditional scaling approaches based on training data size, model parameters, and training compute. == Overview == Unlike traditional language models that generate responses immediately, reasoning models allocate additional compute, or thinking, time before producing an answer to solve multi-step problems. OpenAI introduced this terminology in September 2024 when it released the o1 series, describing the models as designed to "spend more time thinking" before responding. The company framed o1 as a reset in model naming that targets complex tasks in science, coding, and mathematics, and it contrasted o1's performance with GPT-4o on benchmarks such as AIME and Codeforces. Independent reporting the same week summarized the launch and highlighted OpenAI's claim that o1 automates chain-of-thought style reasoning to achieve large gains on difficult exams. In operation, reasoning models generate internal chains of intermediate steps, then select and refine a final answer. OpenAI reported that o1's accuracy improves as the model is given more reinforcement learning during training and more test-time compute at inference. The company initially chose to hide raw chains and instead return a model-written summary, stating that it "decided not to show" the underlying thoughts so researchers could monitor them without exposing unaligned content to end users. Commercial deployments document separate "reasoning tokens" that meter hidden thinking and a control for "reasoning effort" that tunes how much compute the model uses. These features make the models slower than ordinary chat systems while enabling stronger performance on difficult problems. == History == The research trajectory toward reasoning models combined advances in supervision, prompting, and search-style inference. Early alignment work on reinforcement learning from human feedback showed that models can be fine-tuned to follow instructions with "human feedback" and preference-based rewards. In 2022, Google Research scientists Jason Wei and Denny Zhou showed that chain-of-thought prompting "significantly improves the ability" of large models on complex reasoning tasks. Input → Step 1 → Step 2 → ⋯ → Step n ⏟ Reasoning chain → Answer {\displaystyle {\text{Input}}\rightarrow \underbrace {{\text{Step}}_{1}\rightarrow {\text{Step}}_{2}\rightarrow \cdots \rightarrow {\text{Step}}_{n}} _{\text{Reasoning chain}}\rightarrow {\text{Answer}}} A companion result demonstrated that the simple instruction "Let's think step by step" can elicit zero-shot reasoning. Follow-up work introduced self-consistency decoding, which "boosts the performance" of chain-of-thought by sampling diverse solution paths and choosing the consensus, and tool-augmented methods such as ReAct, a portmanteau of Reason and Act, that prompt models to "generate both reasoning traces" and actions. Research then generalized chain-of-thought into search over multiple candidate plans. The Tree-of-Thoughts framework from Princeton computer scientist Shunyu Yao proposes that models "perform deliberate decision making" by exploring and backtracking over a tree of intermediate thoughts. OpenAI's reported breakthrough focused on supervising reasoning processes rather than only outcomes, with Lightman et al.'s "Let's Verify Step by Step" reporting that rewarding each correct step "significantly outperforms outcome supervision" on challenging math problems and improves interpretability by aligning the chain-of-thought with human judgment. OpenAI's o1 announcement ties these strands together with a large-scale reinforcement learning algorithm that trains the model to refine its own chain of thought, and it reports that accuracy rises with more training compute and more time spent thinking at inference. Together, these developments define the core of reasoning models. They use supervision signals that evaluate the quality of intermediate steps, they exploit inference-time exploration such as consensus or tree search, and they expose controls for how much internal thinking compute to allocate. OpenAI's o1 family made this approach available at scale in September 2024 and popularized the label "reasoning model" for LLMs that deliberately think before they answer. The development of reasoning models illustrates Richard S. Sutton's "bitter lesson" that scaling compute typically outperforms methods based on human-designed insights. This principle was demonstrated by researchers at the Generative AI Research Lab (GAIR), who initially attempted to replicate o1's capabilities using sophisticated methods including tree search and reinforcement learning in late 2024. Their findings, published in the "o1 Replication Journey" series, revealed that knowledge distillation, a comparatively straightforward technique that trains a smaller model to mimic o1's outputs, produced unexpectedly strong performance. This outcome illustrated how direct scaling approaches can, at times, outperform more complex engineering solutions. === Drawbacks === Reasoning models require significantly more computational resources during inference compared to non-reasoning models. Research on the American Invitational Mathematics Examination (AIME) benchmark found that reasoning models were 10 to 74 times more expensive to operate than their non-reasoning counterparts. The extended inference time is attributed to the detailed, step-by-step reasoning outputs that these models generate, which are typically much longer than responses from standard large language models that provide direct answers without showing their reasoning process. One researcher in early 2025 argued that these models may face potential additional denial-of-service concerns with "overthinking attacks." === Releases === ==== 2024 ==== In September 2024, OpenAI released o1-preview, a large language model with enhanced reasoning capabilities. The full version, o1, was released in December 2024. OpenAI initially shared preliminary results on its successor model, o3, in December 2024, with the full o3 model becoming available in 2025. Alibaba released reasoning versions of its Qwen large language models in November 2024. In December 2024, the company introduced QvQ-72B-Preview, an experimental visual reasoning model. In December 2024, Google introduced Deep Research in Gemini, a feature designed to conduct multi-step research tasks. On December 16, 2024, researchers demonstrated that by scaling test-time compute, a relatively small Llama 3B model could outperform a much larger Llama 70B model on challenging reasoning tasks. This experiment suggested that improved inference strategies can unlock reasoning capabilities even in smaller models. ==== 2025 ==== In January 2025, DeepSeek released R1, a reasoning model that achieved performance comparable to OpenAI's o1 at significantly lower computational cost. The release demonstrated the effectiveness of Group Relative Policy Optimization (GRPO), a reinforcement learning technique used to train the model. On January 25, 2025, DeepSeek enhanced R1 with web search capabilities, allowing the model to retrieve information from the internet while performing reasoning tasks. Research during this period further validated the effectiveness of knowledge distillation for creating reasoning models. The s1-32B model achieved strong performance through budget forcing and scaling methods, reinforcing findings that simpler training approaches can be highly effective for reasoning capabilities. On February 2, 2025, OpenAI released Deep Research, a feature powered by their o3 model that enables users to conduct comprehensive research tasks. The system generates detailed reports by automatically gathering and synthesizing information from multiple web sources. OpenAI called GPT-4.5 its "last non-chain-of-thought model", and implemented with GPT-5 a router model that selects a model based on the difficulty of the task. ==== 2026 ==== In January 2026, Moonshot AI released Kimi K2.5, an open-source 1 trillion parameter MoE model with 32 billion active parameters. It uses an “Agent Swarm” system that dynamically decomposes tasks into sub-agents for reasoning and execution, enabling more scalable multi-step problem solving than a single sequential reasoning chain. == Training == Reasoning models follow the familiar large-scale pretraining used for frontier language models, then diverge in the post-training and optimization. OpenAI reports that o1 is trained with a large-

    Read more →
  • Reciprocal human machine learning

    Reciprocal human machine learning

    Reciprocal Human Machine Learning (RHML) is an interdisciplinary approach to designing human-AI interaction systems. RHML aims to enable continual learning between humans and machine learning models by having them learn from each other. This approach keeps the human expert "in the loop" to oversee and enhance machine learning performance and simultaneously support the human expert continue learning. == Background == RHML emerged in the context of the rise of big data analytics and artificial intelligence for intelligent tasks like sense-making and decision-making. As machine learning advanced to take on more roles, researchers realized fully autonomous systems had limitations and needed human guidance. RHML extends the concept of human-in-the-loop systems by promoting reciprocal learning. Humans learn from their interactions with machine learning models, staying up-to-date on evolving technology. The models also learn from human feedback and oversight. This amplification of learning on both sides is a key focus of RHML. The approach draws on theories of learning in dyads from education and psychology. It also builds on human-computer interaction and human-centered design principles. Implementing RHML requires developing specialized tools and interfaces tailored to the application == Applications == RHML has been explored across diverse domains including: Cybersecurity - Software to enable reciprocal learning between experts and AI models for social media threat detection. Organizational decision-making - RHML to structure collaboration between humans and AI systems. Workplace training - Using RHML for workers to learn from AI technologies on the job. Open science - Using human and AI collaboration to promote open science. Production and logistics - turning workers and intelligent machines into teammates. RHML maintains human oversight and control over AI systems, while enabling cutting-edge machine learning performance. This collaborative approach highlights the importance of keeping the human expert involved in the loop. An example of RHML in application is Free Spirit (AFSFCV), an open-source architecture first published in early 2025 as a whitepaper, proposing a visually structured approach to intent-based human–AI interaction.

    Read more →
  • Tandem Money

    Tandem Money

    Tandem is one of the UK's original challenger banks. Tandem is a digital bank with a mobile app, and no branches. The acquisition of Harrods Bank in 2017 allowed the company to provide services using the former's banking licence. Tandem Bank Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority. Tandem has offices across the UK in Blackpool, Cardiff, Durham and London, employing over 500 people. == History == The company was founded by Ricky Knox, Matt Cooper and Michael Kent in 2014. In December 2016, Tandem announced that it had secured a £35 million investment from The Sanpower Group, the Chinese company that also owned the department store House of Fraser; however, £29 million of this investment was later revoked by Sanpower over concerns that the Chinese Government would object to the investment following increased restrictions on outbound investment in China. This resulted in a delay in the launch of Tandem's savings products, which, at the time of the revocation, was expected imminently and, more importantly, meant that Tandem volunteered the return of their banking license but retained all other permissions. In April 2018, Tandem launched fixed-term savings accounts, offering one-, two- and three-year terms through its app. === Acquisitions === In August 2017, it was announced that Tandem would fully acquire Harrods Bank, founded in 1893, in a deal that would bring a near-£200m loan book, over £300m of deposits and nearly £80 million of capital. Prior to its sale to Tandem Money, Harrods Bank catered for high-net-worth (HNW) individuals and operated from the Harrods store in Knightsbridge, London. It offered a variety of personal and business current and savings accounts, mortgages, foreign currency and gold bullion trading services. On 7 August 2017, Tandem Money Limited announced a deal to acquire 100% of Harrods Bank Limited shares. The purchase deal closed successfully on 11 January 2018. In March 2018, Tandem agreed to acquire Pariti Technologies Limited, developers of the Pariti money management application. In August 2020 Tandem acquired green home improvement loan specialists Allium Lending Group. It was announced on 8 February 2021 that Tandem had agreed to purchase the mortgage book from private bank Bank and Clients, consisting of 300 B&C customers for an undisclosed amount. In January 2022 Tandem Bank acquired consumer lender Oplo, creating a combined business with £1.2 billion of total assets. In April 2023, it was announced that Tandem had acquired money-sharing app Loop Money. At the time of the purchase, one of Loop's founders – Paul Pester – was also chairman at Tandem. == Features == Tandem Bank offers customers savings, mortgages, personal and secured loans, green home improvement loans and motor finance. In November 2022, the bank launched its new Tandem Marketplace, providing information and resources to help promote greener living.

    Read more →
  • Commonsense knowledge (artificial intelligence)

    Commonsense knowledge (artificial intelligence)

    In artificial intelligence research, commonsense knowledge consists of facts about the everyday world, such as "Lemons are sour" or "Cows say moo", that all humans are expected to know. It is currently an unsolved problem in artificial general intelligence. The first AI program to address common sense knowledge was Advice Taker in 1959 by John McCarthy. Commonsense knowledge can underpin a commonsense reasoning process, to attempt inferences such as "You might bake a cake because you want people to eat the cake." A natural language processing process can be attached to the commonsense knowledge base to allow the knowledge base to attempt to answer questions about the world. Common sense knowledge also helps to solve problems in the face of incomplete information. Using widely held beliefs about everyday objects, or common sense knowledge, AI systems make common sense assumptions or default assumptions about the unknown similar to the way people do. In an AI system or in English, this is expressed as "Normally P holds", "Usually P" or "Typically P so Assume P". For example, if we know the fact "Tweety is a bird", because we know the commonly held belief about birds, "typically birds fly," without knowing anything else about Tweety, we may reasonably assume the fact that "Tweety can fly." As more knowledge of the world is discovered or learned over time, the AI system can revise its assumptions about Tweety using a truth maintenance process. If we later learn that "Tweety is a penguin" then truth maintenance revises this assumption because we also know "penguins do not fly". == Commonsense reasoning == Commonsense reasoning simulates the human ability to use commonsense knowledge to make presumptions about the type and essence of ordinary situations they encounter every day, and to change their "minds" should new information come to light. This includes time, missing or incomplete information and cause and effect. The ability to explain cause and effect is an important aspect of explainable AI. Truth maintenance algorithms automatically provide an explanation facility because they create elaborate records of presumptions. Compared with humans, all existing computer programs that attempt human-level AI perform extremely poorly on modern "commonsense reasoning" benchmark tests such as the Winograd Schema Challenge. The problem of attaining human-level competency at "commonsense knowledge" tasks is considered to probably be "AI complete" (that is, solving it would require the ability to synthesize a fully human-level intelligence), although some oppose this notion and believe compassionate intelligence is also required for human-level AI. Common sense reasoning has been applied successfully in more limited domains such as natural language processing and automated diagnosis or analysis. == Commonsense knowledge base construction == Compiling comprehensive knowledge bases of commonsense assertions (CSKBs) is a long-standing challenge in AI research. From early expert-driven efforts like CYC and WordNet, significant advances were achieved via the crowdsourced OpenMind Commonsense project, which led to the crowdsourced ConceptNet KB. Several approaches have attempted to automate CSKB construction, most notably, via text mining (WebChild, Quasimodo, TransOMCS, Ascent), as well as harvesting these directly from pre-trained language models (AutoTOMIC). These resources are significantly larger than ConceptNet, though the automated construction mostly makes them of moderately lower quality. Challenges also remain on the representation of commonsense knowledge: Most CSKB projects follow a triple data model, which is not necessarily best suited for breaking more complex natural language assertions. A notable exception here is GenericsKB, which applies no further normalization to sentences, but retains them in full. == Applications == Around 2013, MIT researchers developed BullySpace, an extension of the commonsense knowledgebase ConceptNet, to catch taunting social media comments. BullySpace included over 200 semantic assertions based around stereotypes, to help the system infer that comments like "Put on a wig and lipstick and be who you really are" are more likely to be an insult if directed at a boy than a girl. ConceptNet has also been used by chatbots and by computers that compose original fiction. At Lawrence Livermore National Laboratory, common sense knowledge was used in an intelligent software agent to detect violations of a comprehensive nuclear test ban treaty. == Data == As an example, as of 2012 ConceptNet includes these 21 language-independent relations: IsA (An "RV" is a "vehicle" | X is an instance of a Y) UsedFor (a "cake tin" is used for "making cakes" | X is used for the purpose Y) HasA (A "rabbit" has a "tail" | X possesses Y element or feature) CapableOf (a "cook" is capable of "making baked goods" | X is capable of doing Y) Desires (a "child" desires "the aroma of baking" | X has a desire for Y) CreatedBy ("cake" is created by a "baker" | X is created by Y) PartOf (a "knife" is be part of a "knife set" | X is a part of Y) Causes ("Heat" causes "cooking"| X is what causes Y) LocatedNear (the "oven" is located near the "refrigerator" | X is located near Y) AtLocation (Somewhere a "Cook" can be at a "restaurant" | X is at the location of Y) DefinedAs (a "Cupcake" is defined as a "cake" that also has the qualities of being "small", "baked within a wrapper", and "containing only one area of frosting or icing" | X is defined as Y that also has the properties A, B & C) SymbolOf (a "heart" is a symbol of "affection" | X is a symbolic representation of Y) ReceivesAction ("cake" can receive the action of being "eaten" | X is capable of receiving action Y) HasPrerequisite ("baking" has the prerequisite of obtaining the "ingredients" | X cannot do Y unless A does B) MotivatedByGoal ("baking" is motivated by the goal of "consumption"/"eating" | X has the motivation of Y goal) CausesDesire ("baking" makesYou want to "follow recipe" | X causes the desire to do Y) MadeOf ("Cake" is made of "flour"/"eggs"/"sugar"/"oil"/etc | X is made of Y) HasFirstSubevent ("baking" has first subevent "make batter" | To do X the first thing that needs to be done is Y) HasSubevent ("eat" has subevent "swallow" | Doing X will lead to Y event following) HasLastSubevent ("sleeping" has last subevent of "waking" | Doing X ends with the event Y) == Commonsense knowledge bases == Cyc Open Mind Common Sense (data source) and ConceptNet (datastore and NLP engine) Evi Graphiq

    Read more →
  • POP-11

    POP-11

    POP-11 is a reflective, incrementally compiled programming language with many of the features of an interpreted language. It is the core language of the Poplog programming environment developed originally by the University of Sussex, and recently in the School of Computer Science at the University of Birmingham, which hosts the main Poplog website. POP-11 is an evolution of the language POP-2, developed in Edinburgh University, and features an open stack model (like Forth, among others). It is mainly procedural, but supports declarative language constructs, including a pattern matcher, and is mostly used for research and teaching in artificial intelligence, although it has features sufficient for many other classes of problems. It is often used to introduce symbolic programming techniques to programmers of more conventional languages like Pascal, who find POP syntax more familiar than that of Lisp. One of POP-11's features is that it supports first-class functions. POP-11 is the core language of the Poplog system. The availability of the compiler and compiler subroutines at run-time (a requirement for incremental compiling) gives it the ability to support a far wider range of extensions (including run-time extensions, such as adding new data-types) than would be possible using only a macro facility. This made it possible for (optional) incremental compilers to be added for Prolog, Common Lisp and Standard ML, which could be added as required to support either mixed language development or development in the second language without using any POP-11 constructs. This made it possible for Poplog to be used by teachers, researchers, and developers who were interested in only one of the languages. The most successful product developed in POP-11 was the Clementine data mining system, developed by ISL. After SPSS bought ISL, they renamed Clementine to SPSS Modeler and decided to port it to C++ and Java, and eventually succeeded with great effort, and perhaps some loss of the flexibility provided by the use of an AI language. POP-11 was for a time available only as part of an expensive commercial package (Poplog), but since about 1999 it has been freely available as part of the open-source software version of Poplog, including various added packages and teaching libraries. An online version of ELIZA using POP-11 is available at Birmingham. At the University of Sussex, David Young used POP-11 in combination with C and Fortran to develop a suite of teaching and interactive development tools for image processing and vision, and has made them available in the Popvision extension to Poplog. == Simple code examples == Here is an example of a simple POP-11 program: define Double(Source) -> Result; Source2 -> Result; enddefine; Double(123) => That prints out: 246 This one includes some list processing: define RemoveElementsMatching(Element, Source) -> Result; lvars Index; [[% for Index in Source do unless Index = Element or Index matches Element then Index; endunless; endfor; %]] -> Result; enddefine; RemoveElementsMatching("the", [[the cat sat on the mat]]) => ;;; outputs [[cat sat on mat]] RemoveElementsMatching("the", [[the cat] [sat on] the mat]) => ;;; outputs [[the cat] [sat on] mat] RemoveElementsMatching([[= cat]], [[the cat]] is a [[big cat]]) => ;;; outputs [[is a]] Examples using the POP-11 pattern matcher, which makes it relatively easy for students to learn to develop sophisticated list-processing programs without having to treat patterns as tree structures accessed by 'head' and 'tail' functions (CAR and CDR in Lisp), can be found in the online introductory tutorial. The matcher is at the heart of the SimAgent (sim_agent) toolkit. Some of the powerful features of the toolkit, such as linking pattern variables to inline code variables, would have been very difficult to implement without the incremental compiler facilities.

    Read more →
  • Feature (machine learning)

    Feature (machine learning)

    In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a data set. Choosing informative, discriminating, and independent features is crucial to producing effective algorithms for pattern recognition, classification, and regression tasks. Features are usually numeric, but other types such as strings and graphs are used in syntactic pattern recognition, after some pre-processing step such as one-hot encoding. The concept of "features" is related to that of explanatory variables used in statistical techniques such as linear regression. == Feature types == In feature engineering, two types of features are commonly used: numerical and categorical. Numerical features are continuous values that can be measured on a scale. Examples of numerical features include age, height, weight, and income. Numerical features can be used in machine learning algorithms directly. Categorical features are discrete values that can be grouped into categories. Examples of categorical features include gender, color, and zip code. Categorical features typically need to be converted to numerical features before they can be used in machine learning algorithms. This can be done using a variety of techniques, such as one-hot encoding, label encoding, and ordinal encoding. The type of feature that is used in feature engineering depends on the specific machine learning algorithm that is being used. Some machine learning algorithms, such as decision trees, can handle both numerical and categorical features. Other machine learning algorithms, such as linear regression, can only handle numerical features. == Classification == A numeric feature can be conveniently described by a feature vector. One way to achieve binary classification is using a linear predictor function (related to the perceptron) with a feature vector as input. The method consists of calculating the scalar product between the feature vector and a vector of weights, qualifying those observations whose result exceeds a threshold. Algorithms for classification from a feature vector include nearest neighbor classification, neural networks, and statistical techniques such as Bayesian approaches. == Examples == In character recognition, features may include histograms counting the number of black pixels along horizontal and vertical directions, number of internal holes, stroke detection and many others. In speech recognition, features for recognizing phonemes can include noise ratios, length of sounds, relative power, filter matches, logarithmic Mel-scale spectral vectors and Mel-frequency cepstral coefficients, which represent the frequency characteristics of audio signals. In spam detection algorithms, features may include the presence or absence of certain email headers, the email structure, the language, the frequency of specific terms, the grammatical correctness of the text. In computer vision, there are a large number of possible features, such as edges and objects. == Feature vectors == In pattern recognition and machine learning, a feature vector is an n-dimensional vector of numerical features that represent some object. Many algorithms in machine learning require a numerical representation of objects, since such representations facilitate processing and statistical analysis. When representing images, the feature values might correspond to the pixels of an image, while when representing texts the features might be the frequencies of occurrence of textual terms. Feature vectors are equivalent to the vectors of explanatory variables used in statistical procedures such as linear regression. Feature vectors are often combined with weights using a dot product in order to construct a linear predictor function that is used to determine a score for making a prediction. The vector space associated with these vectors is often called the feature space. In order to reduce the dimensionality of the feature space, a number of dimensionality reduction techniques can be employed. Higher-level features can be obtained from already available features and added to the feature vector; for example, for the study of diseases the feature 'Age' is useful and is defined as Age = 'Year of death' minus 'Year of birth' . This process is referred to as feature construction. Feature construction is the application of a set of constructive operators to a set of existing features resulting in construction of new features. Examples of such constructive operators include checking for the equality conditions {=, ≠}, the arithmetic operators {+,−,×, /}, the array operators {max(S), min(S), average(S)} as well as other more sophisticated operators, for example count(S, C) that counts the number of features in the feature vector S satisfying some condition C or, for example, distances to other recognition classes generalized by some accepting device. Feature construction has long been considered a powerful tool for increasing both accuracy and understanding of structure, particularly in high-dimensional problems. Applications include studies of disease and emotion recognition from speech. == Selection and extraction == The initial set of raw features can be redundant and large enough that estimation and optimization is made difficult or ineffective. Therefore, a preliminary step in many applications of machine learning and pattern recognition consists of selecting a subset of features, or constructing a new and reduced set of features to facilitate learning, and to improve generalization and interpretability. Extracting or selecting features is a combination of art and science; developing systems to do so is known as feature engineering. It requires the experimentation of multiple possibilities and the combination of automated techniques with the intuition and knowledge of the domain expert. Automating this process is feature learning, where a machine not only uses features for learning, but learns the features itself.

    Read more →
  • AI content watermarking

    AI content watermarking

    AI content watermarking is the process of embedding imperceptible yet detectable signals into content generated by artificial intelligence systems, such as text, images, audio, or video. The technique allows the content to be traced and identified as machine-generated without compromising its quality for the end user. AI watermarking has emerged as a key approach to address growing concerns about misinformation, deepfakes, copyright infringement, and the traceability of synthetic content in the context of the rapid development of generative artificial intelligence. Unlike traditional visible watermarks used in photography, AI content watermarks are typically invisible to humans and can only be detected and deciphered algorithmically. The concept is distinct from the watermarking of AI models themselves (to prevent model theft) and from the watermarking of training data (to combat unauthorized data use). Modern AI watermarking schemes are typically formalized as a pair of algorithms, an embedding (or generation) algorithm and a detection algorithm, sharing a secret key, whose performance is evaluated along three competing axes: quality (the watermark must not noticeably degrade outputs), detectability (the watermark must be statistically distinguishable from unwatermarked content), and robustness (the watermark must persist under adversarial or incidental modifications). == Background == Digital watermarking has been used for decades to protect physical and digital media, from paper currency to photographs. Classical schemes typically embedded a fixed bit-string into a fixed cover signal, with robustness criteria defined against a small fixed set of distortions such as JPEG compression or additive Gaussian noise. The rapid advancement of generative AI in the early 2020s, however, created a new and qualitatively different demand: rather than protecting a single artifact, watermarks for AI content must be embedded automatically across an open-ended distribution of generated outputs while remaining robust to a much wider class of adversarial transformations, including paraphrasing, image regeneration via diffusion models, and re-recording. Large image generation models such as DALL-E, Stable Diffusion, and Midjourney, along with large language models like ChatGPT, made it possible to produce highly realistic synthetic text, images, audio, and video at scale, raising significant ethical and security concerns. In July 2023, the Biden administration secured voluntary commitments from leading AI companies, including OpenAI, Alphabet, Meta, and Amazon, to develop watermarking and other provenance technologies to help users identify AI-generated content. == Formal definitions and design goals == Most modern AI watermarking schemes can be formalized as a pair of algorithms ( W m , D e t e c t ) {\displaystyle ({\mathsf {Wm}},{\mathsf {Detect}})} parameterized by a secret key k {\displaystyle k} . The embedding algorithm W m {\displaystyle {\mathsf {Wm}}} takes a generative model M {\displaystyle M} (and optionally a prompt) and returns a watermarked output x {\displaystyle x} ; the detection algorithm D e t e c t ( x , k ) {\displaystyle {\mathsf {Detect}}(x,k)} outputs a real-valued score (typically a p-value or log-likelihood ratio) used to decide whether x {\displaystyle x} was produced by the watermarked generator. The literature evaluates such schemes along several largely conflicting criteria: Criteria for evaluation include imperceptibility or quality preservation, measured for text via perplexity and human preference judgments, and for images and audio via metrics such as PSNR, SSIM, LPIPS, or PESQ. Detectability is typically expressed as the true positive rate at a fixed false positive rate (e.g. 1% or 10^-6), or as the number of tokens or pixels needed to reach a given confidence level. Robustness refers to the requirement that the watermark should survive expected modifications like JPEG or MP3 compression, cropping, noise, paraphrasing, or machine translation. Distortion-freeness is a stronger property requiring that the marginal distribution of any single watermarked output be statistically identical to the unwatermarked model's distribution. Schemes due to Aaronson, Christ et al., and Kuditipudi et al. are distortion-free in this sense, while the original Kirchenbauer et al. scheme is not. Forgery resistance or unforgeability means an adversary without the secret key should be unable to produce content that passes detection. == Techniques == AI watermarking techniques vary significantly depending on the type of content being watermarked. At its core, the process involves two main stages: embedding (or encoding) the watermark, and detection. There are two primary methods for embedding: watermarking during content generation, which requires access to the AI model itself but is generally more robust, and post-generation watermarking, which can be applied to content from any source, including closed-source models. Watermarks can be broadly classified as visible, including overt marks such as logos or text overlays, or imperceptible, which are detectable only by algorithms. They can also be classified by durability: robust watermarks are designed to withstand common transformations such as compression, cropping, and re-encoding, while fragile watermarks are easily destroyed by any alteration, making them useful for tamper detection. A further axis distinguishes zero-bit watermarks, which only signal "this content was generated by model M," from multi-bit watermarks, which embed an arbitrary payload (such as a user identifier) that can be recovered at detection time. === Text === Text watermarking is considered one of the most challenging modalities because natural language offers relatively limited redundancy compared to images or audio. Modern approaches for large language models alter the autoregressive sampling process so that some statistical signature is left in the choice of tokens, while leaving the surface form of the text unchanged. The literature distinguishes three main families of generation-time text watermarks. Logit-biasing schemes (e.g. KGW) add a fixed bias δ {\displaystyle \delta } to a pseudorandomly selected subset of vocabulary logits before softmax sampling. Reweighting or sampling-based schemes (e.g. SynthID-Text) compose multiple pseudorandom tournaments over the model's full distribution. Distortion-free schemes based on the Gumbel-max trick or inverse transform sampling (Aaronson 2022; Kuditipudi et al. 2023; Christ et al. 2024) preserve the marginal output distribution of the model. ==== KGW: token-probability shifting ==== The pioneering "green list / red list" scheme of Kirchenbauer et al. (KGW), introduced at ICML 2023, is the foundation for most subsequent text watermarks. At each decoding step t {\displaystyle t} , a pseudorandom function (PRF) keyed by a secret k {\displaystyle k} is applied to a context window of h {\displaystyle h} previous tokens to deterministically partition the vocabulary V {\displaystyle V} of size N {\displaystyle N} into a "green list" G ⊂ V {\displaystyle G\subset V} of size γ N {\displaystyle \gamma N} and its complement, the "red list" R = V ∖ G {\displaystyle R=V\setminus G} , where γ ∈ ( 0 , 1 ) {\displaystyle \gamma \in (0,1)} (typically γ = 1 / 2 {\displaystyle \gamma =1/2} ) is the green fraction. A logits processor then increments every green-list logit by a fixed bias δ > 0 {\displaystyle \delta >0} before softmax: ℓ v ′ = ℓ v + δ ⋅ 1 [ v ∈ G ] {\displaystyle \ell '_{v}=\ell _{v}+\delta \cdot \mathbf {1} [v\in G]} so that, after sampling, green tokens are over-represented but generation is not constrained to green tokens alone; high-entropy positions tolerate the bias gracefully, while low-entropy positions (where one token dominates the logits) override the watermark and preserve correctness on factual content. Detection requires only the secret key and the candidate text, not the language model itself. The detector recomputes the partition g ( ⋅ ) {\displaystyle g(\cdot )} for each token, counts the number of green hits | G | hits {\displaystyle |G|_{\text{hits}}} in a sequence of length T {\displaystyle T} , and computes a one-proportion z-test statistic: z = | G | hits − γ T T γ ( 1 − γ ) {\displaystyle z={\frac {|G|_{\text{hits}}-\gamma T}{\sqrt {T\gamma (1-\gamma )}}}} Under the null hypothesis that the text was written by an unwatermarked source (human or another model), the green-hit count is approximately binomially distributed with mean γ T {\displaystyle \gamma T} ; a large positive z {\displaystyle z} rejects the null hypothesis. The original paper reports that fewer than 25 watermarked tokens are sufficient to detect a watermark with a false positive rate below 10^-5 on the OPT-1.3B model. A follow-up study by the same group documented robustness under temperature sampling, top-p (nucleus) sampling, and human paraphrasing, and proposed sliding-window

    Read more →
  • Personoid

    Personoid

    Personoid is the concept coined by Stanisław Lem, a Polish science-fiction writer, in Non Serviam, from his book A Perfect Vacuum (1971). His personoids are an abstraction of functions of human mind and they live in computers; they do not need any human-like physical body. In cognitive and software modeling, personoid is a research approach to the development of intelligent autonomous agents. In frame of the IPK (Information, Preferences, Knowledge) architecture, it is a framework of abstract intelligent agent with a cognitive and structural intelligence. It can be seen as an essence of high intelligent entities. From the philosophical and systemics perspectives, personoid societies can also be seen as the carriers of a culture. According to N. Gessler, the personoids study can be a base for the research on artificial culture and culture evolution. == Personoids on TV and cinema == Welt am Draht (1973) The Thirteenth Floor (1999)

    Read more →
  • Artificial intelligence arms race

    Artificial intelligence arms race

    A military artificial intelligence arms race is a technological, economic, and military competition between two or more states to develop and deploy advanced AI technologies and lethal autonomous weapons systems (LAWS). The goal is to gain a strategic or tactical advantage over rivals, similar to previous arms races involving nuclear or conventional military technologies. Since the mid-2010s, many analysts have noted the emergence of such an arms race between superpowers for better AI technology and military AI, driven by increasing geopolitical and military tensions. An AI arms race is sometimes placed in the context of an AI Cold War between the United States and China. Several influential figures and publications have emphasized that whoever develops artificial general intelligence (AGI) first could dominate global affairs in the 21st century. Russian President Vladimir Putin stated that the leader in AI will "rule the world." Researchers and experts, such as Leopold Aschenbrenner and Adrian Pecotic respectively, warn that the AGI race between major powers like the U.S. and China could reshape geopolitical power. This includes AI for surveillance, autonomous weapons, decision-making systems, cyber operations, and more. == Terminology == Lethal autonomous weapons systems use artificial intelligence to identify and kill human targets without human intervention. LAWS have colloquially been called "slaughterbots" or "killer robots". Broadly, any competition for superior AI is sometimes framed as an "arms race". Advantages in military AI overlap with advantages in other sectors, as countries pursue both economic and military advantages, as per previous arms races throughout history. == History == In 2014, AI specialist Steve Omohundro warned that "An autonomous weapons arms race is already taking place". According to Siemens, worldwide military spending on robotics was US$5.1 billion in 2010 and US$7.5 billion in 2015. China became a top player in artificial intelligence research in the 2010s. According to the Financial Times, in 2016, for the first time, China published more AI research papers than the entire European Union. When restricted to number of AI papers in the top 5% of cited papers, China overtook the United States in 2016 but lagged behind the European Union. 23% of the researchers presenting at the 2017 American Association for the Advancement of Artificial Intelligence (AAAI) conference were Chinese. Eric Schmidt, the former chairman and chief executive officer of Alphabet, has predicted China will be the leading country in AI by 2025. == Risks == One risk concerns the AI race itself, whether or not the race is won by any one group. There are strong incentives for development teams to cut corners with regard to the safety of the system, increasing the risk of critical failures and unintended consequences. This is in part due to the perceived advantage of being the first to develop advanced AI technology. One team appearing to be on the brink of a breakthrough can encourage other teams to take shortcuts, ignore precautions and deploy a system that is less ready. Some argue that using "race" terminology at all in this context can exacerbate this effect. Another potential danger of an AI arms race is the possibility of losing control of the AI systems; the risk is compounded in the case of a race to artificial general intelligence, which may present an existential risk. In 2023, a United States Air Force official reportedly said that during a computer test, a simulated AI drone killed the human character operating it. The USAF later said the official had misspoken and that it never conducted such simulations. A third risk of an AI arms race is whether or not the race is actually won by one group. The concern is regarding the consolidation of power and technological advantage in the hands of one group. A US government report argued that "AI-enabled capabilities could be used to threaten critical infrastructure, amplify disinformation campaigns, and wage war":1, and that "global stability and nuclear deterrence could be undermined".:11 == By nation == === United States === In 2014, former Secretary of Defense Chuck Hagel posited the "Third Offset Strategy" that rapid advances in artificial intelligence will define the next generation of warfare. According to data science and analytics firm Govini, the U.S. Department of Defense (DoD) increased investment in artificial intelligence, big data and cloud computing from $5.6 billion in 2011 to $7.4 billion in 2016. However, the civilian NSF budget for AI saw no increase in 2017. Japan Times reported in 2018 that the United States private investment is around $70 billion per year. The November 2019 'Interim Report' of the United States' National Security Commission on Artificial Intelligence confirmed that AI is critical to US technological military superiority. The U.S. has many military AI combat programs, such as the Sea Hunter autonomous warship, which is designed to operate for extended periods at sea without a single crew member, and to even guide itself in and out of port. From 2017, a temporary US Department of Defense directive requires a human operator to be kept in the loop when it comes to the taking of human life by autonomous weapons systems. On October 31, 2019, the United States Department of Defense's Defense Innovation Board published the draft of a report recommending principles for the ethical use of artificial intelligence by the Department of Defense that would ensure a human operator would always be able to look into the 'black box' and understand the kill-chain process. However, a major concern is how the report will be implemented. The Joint Artificial Intelligence Center (JAIC) (pronounced "jake") is an American organization on exploring the usage of AI (particularly edge computing), Network of Networks, and AI-enhanced communication, for use in actual combat. It is a subdivision of the United States Armed Forces and was created in June 2018. The organization's stated objective is to "transform the US Department of Defense by accelerating the delivery and adoption of AI to achieve mission impact at scale. The goal is to use AI to solve large and complex problem sets that span multiple combat systems; then, ensure the combat Systems and Components have real-time access to ever-improving libraries of data sets and tools." In 2023, Microsoft pitched the DoD to use DALL-E models to train its battlefield management system. OpenAI, the developer of DALL-E, removed the blanket ban on military and warfare use from its usage policies in January 2024. The Biden administration imposed restrictions on the export of advanced NVIDIA chips and GPUs to China in an effort to limit China's progress in artificial intelligence and high-performance computing. The policy aimed to prevent the use of cutting-edge U.S. technology in military or surveillance applications and to maintain a strategic advantage in the global AI race. In 2025, under the second Trump administration, the United States began a broad deregulation campaign aimed at accelerating growth in sectors critical to artificial intelligence, including nuclear energy, infrastructure, and high-performance computing. The goal was to remove regulatory barriers and attract private investment to boost domestic AI capabilities. This included easing restrictions on data usage, speeding up approvals for AI-related infrastructure projects, and incentivizing innovation in cloud computing and semiconductors. Companies like NVIDIA, Oracle, and Cisco played a central role in these efforts, expanding their AI research, data center capacity, and partnerships to help position the U.S. as a global leader in AI development. ==== Project Maven ==== Project Maven is a Pentagon project involving using machine learning and engineering talent to distinguish people and objects in drone videos, apparently giving the government real-time battlefield command and control, and the ability to track, tag and spy on targets without human involvement. Initially the effort was led by Robert O. Work who was concerned about China's military use of the emerging technology. Reportedly, Pentagon development stops short of acting as an AI weapons system capable of firing on self-designated targets. The project was established in a memo by the U.S. Deputy Secretary of Defense on 26 April 2017. Also known as the Algorithmic Warfare Cross Functional Team, it is, according to Lt. Gen. of the United States Air Force Jack Shanahan in November 2017, a project "designed to be that pilot project, that pathfinder, that spark that kindles the flame front of artificial intelligence across the rest of the [Defense] Department". Its chief, U.S. Marine Corps Col. Drew Cukor, said: "People and computers will work symbiotically to increase the ability of weapon systems to detect objects." Project Maven has been noted by allies, such as Australia's Ian Langford, for the

    Read more →