Georgetown–IBM experiment

The Georgetown–IBM experiment was an influential demonstration of machine translation, performed on January 7, 1954. Developed jointly by Georgetown University and IBM, the experiment involved the completely automatic translation of more than sixty Russian sentences into English. == Background == Conceived and performed primarily to attract governmental and public interest and funding by showing the possibilities of machine translation, it was by no means a fully featured system: it had only six grammar rules and 250 lexical items in its vocabulary (of stems and endings). Words in the vocabulary were drawn from the fields of politics, law, mathematics, chemistry, metallurgy, communications and military affairs. The vocabulary was punched onto punch cards. The complete dictionary was never fully shown (only the extended one from Garvin's article). Apart from general topics, the system was specialized in the domain of organic chemistry. The translation was carried out on an IBM 701 mainframe computer (launched in April 1953). The Georgetown–IBM experiment is the best-known result of the MIT conference of June 1952, to which all active researchers in the machine translation field were invited. At the conference, Duncan Harkin of the US Department of Defense suggested that his department would finance a new machine translation project. Jerome Wiesner supported the idea and offered funding from the Research Laboratory of Electronics at MIT. Leon Dostert had been invited to the project for his previous experience with the automatic correction of translations (back then 'mechanical translation'); his interpretation system had a strong impact on the Nuremberg War Crimes Tribunal. The linguistic part of the demonstration was carried out for the most part by the linguist Paul Garvin, who also had a good knowledge of Russian.
Over 60 Romanized Russian statements from a wide range of political, legal, mathematical, and scientific topics were entered into the machine by a computer operator who knew no Russian, and the resulting English translations appeared on a printer. The sentences to be translated were carefully selected, and many operations for the demonstration were fitted to specific words and sentences. In addition, there was no relational or sentence analysis that could recognize the sentence structure. The approach was mostly 'lexicographical', based on a dictionary in which a specific word was connected with specific rules and steps. == Algorithm == The algorithm first translates Russian words into numerical codes, then performs a case analysis on each numerical code to choose between possible English word translations, reorder the English words, or omit some English words. The flowchart of the algorithm is reproduced in the cited literature (see Table 1 for the six rules). == Translation examples == Figure 2 of the cited literature shows how the system analyzes the sentence Vyelyichyina ugla opryedyelyayetsya otnoshyenyiyem dlyini dugi k radyiusu. == Reception == Well publicized by journalists and perceived as a success, the experiment did encourage governments to invest in computational linguistics. The authors claimed that within three or five years, machine translation could well be a solved problem. However, real progress was much slower, and after the ALPAC report in 1966, which found that ten years of research had failed to fulfill expectations, funding was reduced dramatically. The demonstration was given widespread coverage in the foreign press, but only a small fraction of journalists drew attention to previous machine translation attempts.

Stripe, Inc.

Stripe, Inc. is an Irish and American multinational financial services and software as a service (SaaS) company dual-headquartered in South San Francisco, California, United States, and Dublin, Ireland. The company primarily offers payment-processing software and application programming interfaces for e-commerce websites and mobile applications. Stripe is the largest privately owned financial technology company, with a valuation of about $159 billion and over $1.9 trillion in payment volume processed in 2025, processing transactions for 5 million businesses that year. == History == Irish entrepreneur brothers John and Patrick Collison founded Stripe in Palo Alto, California, in 2010, and serve as the company's president and CEO, respectively. In 2011 the company received a $2 million investment, including contributions from Elon Musk, PayPal founder Peter Thiel, Irish entrepreneur Liam Casey, and venture capital firms Sequoia Capital, Andreessen Horowitz, and SV Angel. In 2012 the company moved from Palo Alto to San Francisco. In March 2013, Stripe made its first acquisition, Kickoff, a chat and task-management application. In October 2019, the company announced that it would move from the South of Market area to Oyster Point in the neighbouring city of South San Francisco in 2021. In February 2021, Mark Carney, former governor of the Bank of Canada and of the Bank of England, was appointed to the company's board. Carney stepped down from his role with the company in 2025 in order to run for the leadership of the Liberal Party of Canada. In October 2021, Stripe acquired the accountancy platform Recko, whose solution was to be added to Stripe's existing suite of financial tools. In January 2022, Stripe entered a five-year partnership with Ford Motor Company. Through the deal, Stripe would handle transactions for consumer vehicle orders and reservations. That same month, Stripe partnered with Spotify to help the company monetize subscriptions.
In April 2022, Twitter announced that it would partner with Stripe to pilot cryptocurrency pay-outs for a limited number of users on the platform. In the same month, Stripe announced a strategic partnership with UK-based financial technology company ION. The Wall Street Journal reported in July 2022 that the company's internal share price had fallen, causing its implied valuation to drop from $95 billion to $74 billion. In November 2022, the company announced that it intended to lay off some 14% of its workforce. Throughout 2022 and 2023, the company announced a number of large enterprise customers, including Airbnb, Amazon, Microsoft, Uber, BMW, Maersk, Zara, Lotus, Alaska Airlines, Le Monde, and Toyota. The company also announced in March 2023 that OpenAI was working with Stripe to commercialize its generative AI technology. In January 2025, Stripe sent layoff notices to nearly 300 workers, primarily affecting roles in Product, Operations and Engineering. The company drew controversy when it mistakenly sent a cartoon picture of a duck to the laid-off employees. Stripe's Chief People Officer Rob McIntosh later apologized for the mistake. After re-enabling cryptocurrency pay-ins in April 2024, starting with USDC, Stripe completed the acquisition of Bridge in February 2025. The acquisition of the two-year-old stablecoin platform company was valued at $1.1 billion. In June 2025, the company acquired Privy, which powers crypto wallets. In September 2025, Stripe announced that it was powering Instant Checkout in ChatGPT and released the Agentic Commerce Protocol for agentic commerce, which was co-developed with OpenAI. In October 2025, the company opened its second headquarters in Dublin, Ireland. In February 2026, Stripe was valued at $159 billion in a tender offer posted for employees and shareholders.
The valuation in the tender offer was about 70% higher than Stripe's previous valuation of $91.5 billion, published in February 2025. Stripe also announced that its total volume increased to $1.9 trillion in 2025, a 34% increase from 2024. == Technology company == === Payment processing === Stripe provides application programming interfaces that web developers can use to integrate payment processing into their websites and mobile applications. The company introduced Stripe Connect in 2012, a multiparty payments solution that lets software developers embed payments natively into their products. In April 2018, Stripe released antifraud tools, branded "Radar", that block fraudulent transactions. The same year, it expanded its services to include a billing product for online businesses, allowing them to manage recurring subscription revenue and invoicing. Stripe's point-of-sale service, Terminal, was made available to US users on 11 June 2019, having previously been invitation-only. Terminal is currently available in Australia, Canada, France, Germany, Ireland, the Netherlands, New Zealand, Singapore, and the United Kingdom. The service offers physical credit-card readers designed to work with Stripe. On 5 September 2019, Stripe launched a merchant cash-advance scheme called Stripe Capital. The scheme allows Stripe merchants to request an advance on future payments they expect to process through their Stripe merchant account. In June 2021, the company launched Stripe Tax, a service that lets businesses automatically calculate and collect sales tax, VAT, and GST, initially rolling out to 30 countries and all US states. As of 2025, it has been made available in 102 countries. In May 2021, Stripe introduced Payment Links, a no-code product allowing businesses to create a link to a checkout page and begin accepting payments on social platforms or direct channels.
In January 2022, Stripe agreed to acquire Terminal manufacturing partner BBPOS, allowing the company to bring the hardware development of Terminal readers in-house. In February, it was announced as Apple's first partner on in-person Tap to Pay, which enables businesses to accept contactless payments using an iPhone and a partner-enabled iOS app. In May, Stripe announced Data Pipeline, a tool for Stripe users who store data with Amazon Redshift or Snowflake Data Cloud. Data Pipeline syncs Stripe data and reports with Amazon Redshift or Snowflake Data Cloud, where they can be queried in combination with other business information. That month, the company also introduced Stripe Financial Connections, enabling businesses to establish direct connections with their customers’ bank accounts to verify accounts for payments and pay-outs, check balances to reduce payment failures, and cut fraud by confirming bank account ownership. In September 2023, Stripe announced that its optimized checkout suite allowed businesses to offer their customers more than 100 payment methods. In May 2025, Stripe announced a new AI foundational model for payments, and introduced stablecoin powered accounts. === Corporate finance === In July 2018, Stripe introduced Stripe Issuing, a product that allows online businesses and platforms to create their own physical and digital credit and debit cards. === Atlas === On 14 February 2016, the company launched the Atlas platform to help start-ups register as US corporations, targeting foreign entrepreneurs. The platform was originally invitation-only. In March 2016, Cuba was added to the list of countries covered under the program. Originally, companies registered using Atlas were set up as Delaware-based C corporations. As of 30 April 2018, the option to be registered as limited liability companies was added. Companies set up using Atlas automatically had a business bank account and Stripe merchant account set up. 
=== Link === In May 2021, Stripe launched Link, a service for saving and auto-filling payment details when paying via Stripe. The service supported payments in over 185 countries and Stripe reported plans to make it available to platform businesses through its API. In September 2025, Patrick Collison announced that Link had surpassed 200 million users. === Other === In 2018, Stripe started a publishing company named Stripe Press to promote ideas that support businesses. In 2019, Stripe began offering loans and credit cards to businesses in the United States. The company stated that loans are approved automatically using machine-learning models, with no human intervention. The following year, the company introduced Stripe Treasury, which provides its platform users APIs to embed financial services, allowing their customers to send, receive, and store funds. In October 2020, Stripe announced Stripe Climate, a service for businesses to fund atmospheric carbon research and capture. In 2022, Stripe started a new subsidiary called Frontier that would direct spending on carbon removal. It announced $925 million in funding from major Silicon Valley companies to fund start up companies performing carbon capture to kick-start the industry. Stripe Identity, launched in Ju

Lazy learning

(Not to be confused with the lazy learning regime; see Neural tangent kernel.) In machine learning, lazy learning is a learning method in which generalization of the training data is, in theory, delayed until a query is made to the system, as opposed to eager learning, where the system tries to generalize the training data before receiving queries. The primary motivation for employing lazy learning, as in the k-nearest neighbors algorithm used by online recommendation systems ("people who viewed/purchased/listened to this movie/item/tune also ..."), is that the data set is continuously updated with new entries (e.g., new items for sale at Amazon, new movies to view at Netflix, new clips at YouTube, new music at Spotify or Pandora). Because of the continuous updates, the "training data" would be rendered obsolete in a relatively short time, especially in areas like books and movies, where new best-sellers or hit movies/music are published/released continuously. Therefore, one cannot really talk of a "training phase". Lazy classifiers are most useful for large, continuously changing datasets with few attributes that are commonly queried. Specifically, even if a large set of attributes exists (for example, books have a year of publication, author/s, publisher, title, edition, ISBN, selling price, etc.), recommendation queries rely on far fewer attributes, e.g., purchase or viewing co-occurrence data, and user ratings of items purchased/viewed. == Advantages == The main advantage gained in employing a lazy learning method is that the target function will be approximated locally, such as in the k-nearest neighbor algorithm. Because the target function is approximated locally for each query to the system, lazy learning systems can simultaneously solve multiple problems and deal successfully with changes in the problem domain. At the same time they can reuse a lot of theoretical and applied results from linear regression modelling (notably the PRESS statistic) and control.
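The local, query-time approximation described above can be illustrated with a minimal k-nearest-neighbours sketch. The class name `LazyKNN`, the Euclidean distance and the majority vote are illustrative assumptions, not a reference implementation; the point is only that "training" amounts to storing examples, so newly added entries affect the very next query without any retraining step.

```python
import math
from collections import Counter


class LazyKNN:
    """Toy lazy classifier: stores examples, generalizes only at query time."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []  # list of (feature tuple, label) pairs

    def add(self, features, label):
        # "Training" is just storage; no model is fitted here.
        self.examples.append((tuple(features), label))

    def predict(self, query):
        if not self.examples:
            raise ValueError("no stored examples to compare against")
        # All the work happens now: rank stored examples by distance to the query.
        nearest = sorted(self.examples, key=lambda ex: math.dist(ex[0], query))
        votes = Counter(label for _, label in nearest[: self.k])
        return votes.most_common(1)[0][0]


model = LazyKNN(k=3)
for point, label in [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
                     ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]:
    model.add(point, label)

before = model.predict((0.5, 0.5))

# New entries arrive; the next query reflects them immediately.
for point in [(0.4, 0.5), (0.5, 0.4), (0.6, 0.6)]:
    model.add(point, "c")

after = model.predict((0.5, 0.5))
```

Because every query scans all stored examples, this sketch also exhibits the evaluation-time cost usually attributed to lazy methods.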
It is said that the advantage of this system is achieved if the predictions using a single training set are developed for only a few objects. This can be demonstrated in the case of the k-NN technique, which is instance-based and in which the function is only estimated locally. == Disadvantages == Theoretical disadvantages with lazy learning include: The large space requirement to store the entire training dataset. In practice, this is not an issue because of advances in hardware and the relatively small number of attributes (e.g., co-occurrence frequency) that need to be stored. Particularly noisy training data increases the case base unnecessarily, because no abstraction is made during the training phase. In practice, as stated earlier, lazy learning is applied to situations where any learning performed in advance soon becomes obsolete because of changes in the data. Also, for the problems for which lazy learning is optimal, "noisy" data does not really occur: the purchaser of a book has either bought another book or hasn't. Lazy learning methods are usually slower to evaluate. In practice, for very large databases with high concurrency loads, the answers are not computed at actual query time but precomputed on a periodic basis (e.g., nightly) in anticipation of future queries, and stored. This way, the next time new queries are asked about existing entries in the database, the answers are merely looked up rapidly instead of having to be computed on the fly, which would almost certainly bring a high-concurrency multi-user system to its knees. Larger training data also entail increased cost. In particular, there is a fixed computational budget: a processor can only process a limited number of training data points. There are standard techniques to improve re-computation efficiency so that a particular answer is not recomputed unless the data that impact this answer have changed (e.g., new items, new purchases, new views).
In other words, the stored answers are updated incrementally. This approach, used by large e-commerce or media sites, has long been used in the Entrez portal of the National Center for Biotechnology Information (NCBI) to precompute similarities between the different items in its large datasets: biological sequences, 3-D protein structures, published-article abstracts, etc. Because "find similar" queries are asked so frequently, the NCBI uses highly parallel hardware to perform nightly recomputation. The recomputation is performed only for new entries in the datasets against each other and against existing entries: the similarity between two existing entries need not be recomputed. == Examples of Lazy Learning Methods == K-nearest neighbors, which is a special case of instance-based learning. Local regression. Lazy naive Bayes rules, which are extensively used in commercial spam detection software. Here, the spammers keep getting smarter and revising their spamming strategies, and therefore the learning rules must also be continually updated.
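The incremental precomputation strategy described in this article (compare new entries against each other and against existing entries, never recompute existing pairs) can be sketched as follows. The class `SimilarityCache`, the use of Jaccard similarity over feature sets, and the `computations` counter are illustrative assumptions; they are not a description of how Entrez or any particular site implements it.

```python
class SimilarityCache:
    """Toy store of precomputed pairwise similarities, updated incrementally."""

    def __init__(self):
        self.items = {}        # item id -> set of features
        self.scores = {}       # frozenset({id_a, id_b}) -> similarity
        self.computations = 0  # how many pairwise similarities were computed

    @staticmethod
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def add(self, item_id, features):
        if item_id in self.items:
            raise ValueError(f"item {item_id!r} already stored")
        features = set(features)
        # Only new-versus-existing pairs are computed; old pairs are untouched.
        for other_id, other in self.items.items():
            pair = frozenset((item_id, other_id))
            self.scores[pair] = self.jaccard(features, other)
            self.computations += 1
        self.items[item_id] = features

    def most_similar(self, item_id, n=1):
        # A "find similar" query is a lookup over stored answers.
        ranked = sorted(
            (other for other in self.items if other != item_id),
            key=lambda other: (-self.scores[frozenset((item_id, other))], other),
        )
        return ranked[:n]


cache = SimilarityCache()
cache.add("a", {1, 2, 3})
cache.add("b", {2, 3, 4})
cache.add("c", {7, 8})
first_answer = cache.most_similar("a")

cache.add("d", {1, 2, 3, 9})  # costs three comparisons, not a full rebuild
second_answer = cache.most_similar("a")
```

Adding the fourth item triggers three new comparisons rather than recomputing all six pairs, which is the saving the text attributes to incremental updates.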

Intelligent database

Until the 1980s, databases were viewed as computer systems that stored record-oriented business data such as manufacturing inventories, bank records, and sales transactions. A database system was not expected to merge numeric data with text, images, or multimedia information, nor was it expected to automatically notice patterns in the data it stored. In the late 1980s the concept of an intelligent database was put forward as a system that manages information (rather than data) in a way that appears natural to users and that goes beyond simple record keeping. The term was introduced in 1989 by the book Intelligent Databases by Kamran Parsaye, Mark Chignell, Setrag Khoshafian and Harry Wong. The concept postulated three levels of intelligence for such systems: high-level tools, the user interface and the database engine. The high-level tools manage data quality and automatically discover relevant patterns in the data through a process called data mining. This layer often relies on artificial intelligence techniques. The user interface uses hypermedia in a form that uniformly manages text, images and numeric data. The intelligent database engine supports the other two layers, often merging relational database techniques with object orientation. In the twenty-first century, intelligent databases have become widespread: for example, hospital databases can call up patient histories consisting of charts, text and x-ray images with just a few mouse clicks, and many corporate databases include decision support tools based on sales pattern analysis.

Agentive logic

Agentive logic (also called the logic of action or logic of agency) is the field of philosophical logic and logic in computer science that studies formal representations of agents, their actions, and their abilities. An agentive logic in the narrower sense is a formal system whose primitive operators express that an agent does something, can do something, or sees to it that something is the case. Agentive logics generalise modal logic by adding modalities indexed to agents and to actions. Typical examples include: STIT logics (from sees to it that) with operators of the form $[i\ \mathsf{stit}:\varphi]$, meaning that agent $i$ sees to it that $\varphi$ holds; dynamic logics of action with program-like modalities $[\alpha]\varphi$ and $\langle\alpha\rangle\varphi$, meaning, roughly, that after every (respectively, some) execution of action $\alpha$, $\varphi$ holds; and logics with explicit agentive operators such as "can do", "brings about", or "is able to ensure". Agentive logics are used in action theory in philosophy, in the semantics of natural language, in the theory of program verification, and in artificial intelligence, where they underpin formalisms for reasoning about actions, planning, and intelligent agents. == Terminology and scope == The adjective agentive derives from the Latin agens ("one who acts") and originally referred to the grammatical agent of a verb. In logical contexts it designates operators or predicates whose primary argument position is an agent rather than a proposition alone, for example $A_{i}\varphi$ ("agent $i$ does $\varphi$") or $C_{i}\varphi$ ("agent $i$ can bring about $\varphi$").
In contemporary literature, agentive logic is sometimes used narrowly for formal reconstructions of St. Anselm's modal account of facere ("to do"). More broadly, the term is used interchangeably with logic of action or logic of agency to cover a family of modal and dynamic logics designed to capture the structure of action and choice. == Historical background == === Medieval and early modern roots === Medieval logicians already explored analogies between modalities of action and alethic modalities such as possibility and necessity, for instance in discussions of obligation and power. An influential early agentive analysis is due to St. Anselm (11th century), who treated "doing $\varphi$" as a kind of modal operator on propositions, anticipating later modal logics of agency. Modern reconstructions of Anselm's theory show that the resulting "agentive logic" can be modelled with neighbourhood semantics and satisfies a recognisable square of opposition. === Modern logic of action === Modern study of the logic of action began in the mid-20th century, parallel to developments in deontic logic and tense logic. Early systems were proposed by Georg Henrik von Wright, Stig Kanger, and others, often motivated by questions about norms and responsibility. From the 1960s onward, two largely independent but eventually converging traditions emerged: a branching-time tradition, culminating in STIT logics, emphasising agents' choices among possible futures; and dynamic logics of programs and actions, developed within computer science to reason about program execution. In the 1990s and 2000s, action logics were further developed in connection with knowledge representation, planning, and multi-agent systems in AI, and with dynamic and update semantics in linguistics.
== Core ideas == Despite their diversity, most agentive logics share some general themes: Agents are treated as explicit indices of modal operators, as in $[i\ \mathsf{does}]\varphi$ or $C_{i}\varphi$. Actions are represented either implicitly, via changes between possible worlds along an accessibility relation, or explicitly, as terms denoting primitive and composite actions. Choice and ability are captured by modalities describing what an agent can ensure, usually relative to assumptions about the environment and other agents. Formal properties such as closure under composition, interaction between different agents, and connections to obligation (what an agent ought to do) and knowledge (what an agent knows how to do) are investigated. == STIT logics == STIT ("sees to it that") logics, originating in work by Nuel Belnap and collaborators, treat agency in a branching-time framework. A STIT model consists of a partially ordered set of moments with a tree-like structure, sets of histories (maximal branches through the tree), and, for each agent at each moment, a partition of the histories through that moment representing the choices available to the agent. Intuitively, an agent's action at a moment determines which equivalence class (choice cell) of histories becomes actual; a formula $[i\ \mathsf{stit}:\varphi]$ is true at a history–moment pair if $\varphi$ holds on all histories in the choice cell corresponding to the agent's current action.
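The truth condition just given can be sketched for a single moment of a toy model. Everything named here (the four histories, the agents `a` and `b`, the proposition `p`, and the helper functions) is a hypothetical illustration; propositions are simplified to predicates on histories, and the tree of moments is omitted.

```python
def choice_cell(partition, history):
    """Return the cell of the agent's choice partition that contains `history`."""
    for cell in partition:
        if history in cell:
            return cell
    raise ValueError(f"history {history!r} is not covered by the partition")


def stit(choices, agent, prop, history):
    """True if `prop` holds on every history in the agent's current choice cell."""
    cell = choice_cell(choices[agent], history)
    return all(prop(h) for h in cell)


# Four histories pass through the moment under consideration.
# Agent a chooses between {h1, h2} and {h3, h4}; agent b between {h1, h3} and {h2, h4}.
choices = {
    "a": [{"h1", "h2"}, {"h3", "h4"}],
    "b": [{"h1", "h3"}, {"h2", "h4"}],
}

# Hypothetical proposition p, true on every history except h4.
p_histories = {"h1", "h2", "h3"}


def p(history):
    return history in p_histories


a_ensures_p_at_h1 = stit(choices, "a", p, "h1")  # cell {h1, h2}: p holds throughout
a_ensures_p_at_h3 = stit(choices, "a", p, "h3")  # cell {h3, h4}: p fails on h4
b_ensures_p_at_h1 = stit(choices, "b", p, "h1")  # cell {h1, h3}: p holds throughout
```

In this sketch `p` is true on `h3` itself, yet agent `a` does not see to it that `p` there, because the same choice also allows `h4`.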
Different STIT operators have been distinguished, notably: the Chellas STIT operator, often written $[i\ \mathsf{cstit}:\varphi]$, which requires only that the agent's choice guarantees $\varphi$; and the deliberative STIT operator, $[i\ \mathsf{dstit}:\varphi]$, which additionally requires that $\varphi$ is not already historically necessary. STIT frameworks have been extended with group agency operators, temporal modalities, epistemic operators, and deontic operators to study responsibility, collective action, and obligations under indeterminism. == Dynamic logics of action == Dynamic logic was originally developed to reason about the behaviour of computer programs, treating program execution as a kind of action. In propositional dynamic logic (PDL), action terms $\alpha, \beta, \dots$ denote abstract programs or actions, and formulas of the form $[\alpha]\varphi$ and $\langle\alpha\rangle\varphi$ express that all, respectively some, terminating executions of $\alpha$ lead to states where $\varphi$ holds. From the standpoint of agentive logic, dynamic logic provides: a language for building complex actions from primitives via sequencing, choice, and iteration (e.g., $\alpha;\beta$, $\alpha\cup\beta$, $\alpha^{*}$); a Kripke semantics in which actions correspond to labelled accessibility relations; and proof systems (such as Hoare logic and weakest precondition calculi) for reasoning about the correctness of action sequences. Extensions such as concurrent dynamic logic add operators for parallel composition, allowing reasoning about interacting processes and concurrent actions. John-Jules Ch.
Meyer and others have argued that dynamic logic is a natural base for logics of agents, by adding modalities for knowledge, belief, and ability on top of the action modalities. Dynamic logics have also been applied to normative reasoning, yielding dynamic deontic logics where actions are related to obligations and permissions, and to dynamic epistemic logics in which information-changing actions such as announcements are modelled as programs. == Situation calculus and other action formalisms == In artificial intelligence, reasoning about action and change is often based on first-order languages that explicitly represent situations, events, and fluents (time-varying properties). The best known is situation calculus, introduced by John McCarthy and developed extensively by Raymond Reiter. In such formalisms: action terms name primitive actions; a function symbol (often $\mathsf{do}$) maps an action and a situation to a successor situation; and axioms describe which fluents hold in which situations and how actions change them. Reiter's successor state axioms give compact specifications of how each fluent changes under all actions, and precondition axioms specify when actions are possible. Related formalisms include the event calculus and fluent calculus, which provide alternative ways of representing events and their effects. While these systems are often first-order rather than modal, they are closely related to agentive logics: their action terms and transition structures can be seen as providing models for dynamic or STIT-style modalities, and conversely, dynamic logics can be used as abstract specification languages for such AI formalisms. == Ability, agency, and related modalities == Many agentive logics introduce explicit operators for ability or "can-do"

Flektor

Flektor was a web application that allowed users to create and "mash up" their own content (photos, videos, music, etc.) and share it via email, on the social networking websites MySpace, Facebook, Blogger, Digg and eBay, or on personal blogs. The company's website (Flektor.com) launched on April 2, 2007, and over 40,000 people were using its features just one month later. Flektor closed down in January 2009. Flektor offered tools and widgets that included audio, video, photos, text, and approximately 100 effects, transitions and filters to be used with media. Users could create personalized slideshows, polls, postcards, and streaming video projects, which the website called "fleks". Flektor also offered Chat (used as a MySpace add-on) and Movie Editor, which provided the ability to edit content and assets together. Users of Flektor could import media from websites like Photobucket and Google's YouTube, and then edit their content with the site's editing tools. Flektor's competitors included Slide.com (founded by PayPal co-founder Max Levchin), RockYou!, Yahoo's JumpCut and Brightcove. == History == Flektor was created by Jason Rubin, Andy Gavin and former HBO executive Jason R. Kay. Both Rubin and Gavin spent most of their careers in the video game industry developing games for publishers like Electronic Arts, Universal Interactive Studios and Sony Computer Entertainment America. They founded a successful game development studio called Naughty Dog and were responsible for games such as Crash Bandicoot and Jak and Daxter. After selling Naughty Dog to Sony, Rubin focused on a comic book series called Iron and the Maiden before teaming up again with Gavin to venture into the web industry with Flektor. Jason Kay spent four years at Home Box Office, working as a consultant to the EVP of Business Development. They recruited former employee and then Naughty Dog Lead Programmer Scott Shumaker to lead the technology team along with Gavin.
Ryan Evans joined shortly thereafter, spearheading product development. Flektor was based in Culver City, California. In May 2007, the company was sold to Fox Interactive Media, a division of News Corp., for more than $20 million. The deal coincided with Fox's acquisition of Photobucket, an image-hosting and sharing website. Fox Interactive Media already owned MySpace, IGN Entertainment, FOXSports.com, AmericanIdol.com and Rotten Tomatoes. After the acquisition, Rubin, Gavin and Kay departed, leaving the studio in the hands of Shumaker and Evans. In the fall of 2007, Flektor partnered with its sister company, MySpace, and MTV to provide instant audience feedback via polls for the interactive MySpace/MTV Presidential Dialogues series with presidential candidates Senator Barack Obama, Senator John McCain and John Edwards. Flektor's polling system enabled hosts John McLaughlin and Geoffrey Garin to tailor their questions to subjects of voter interest. In the fall of 2008, Flektor built the official site for the 2008 Presidential debates, hosted at MyDebates. In January 2009, due to a company directive to focus on the core MySpace property, Fox Interactive announced that Flektor would be shut down, with some of its technology being incorporated into MySpace.

Agent verification

Agent verification is the activity of gaining assurance that purposeful artificial constructs act in accordance with their specifications. While primitive forms of inorganic agents have been used in manufacturing for centuries, the study of artificial agents did not begin until the mid-20th century. Foundational work on such agents was closely bound up with the emergence of artificial intelligence as an academic discipline. Early agents deployed in industrial control systems and in computing, however, were often controlled by quite simple logic that did not involve artificial intelligence as such. When deployed as part of a multi-agent system, even such simple agents could require special agent-oriented testing methods, as their collective behaviour was challenging to verify with traditional testing techniques. Difficulties in providing assurances that agents will not behave in dangerous ways became more prevalent after the introduction of LLM agents, especially after the rapid acceleration of their deployment in 2025. The verification of agent behaviour can be conducted by formal or informal methods. Informal verification requires less mathematical skill. But when agents are part of systems where errors carry significant risks, such as danger to human life, environmental damage or major financial loss, formal verification is preferred. Both regulators and system designers favour formal verification because it provides a high degree of mathematical certainty. It is not, however, always possible to formally verify all aspects of an agent-based system's behaviour, especially where newer LLM-based agents are concerned, due in part to their high degree of autonomy. Accordingly, agent verification for low-impact deployments might be carried out only with informal methods, while for high-impact deployments it may be performed with a mix of formal and informal techniques.
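As a rough illustration of the more rigorous end of that spectrum, the sketch below exhaustively explores every reachable state of a very simple if-then agent and checks a safety property over all of them, rather than sampling a few test runs. The thermostat-style controller, the one-degree-per-step environment model, the temperature bounds and all function names are hypothetical assumptions chosen to keep the state space tiny; real formal verification tools work on far richer models.

```python
from collections import deque

LOW, HIGH = 18, 22     # hypothetical comfort band (degrees)
T_MIN, T_MAX = 0, 40   # bounds of the modelled temperature range


def thermostat(temp, heating):
    """Simple if-then agent: decide whether the heater should be on."""
    if temp < LOW:
        return True
    if temp > HIGH:
        return False
    return heating  # inside the band, keep the previous setting


def step(state):
    """Assumed environment: +1 degree per step when heating, -1 otherwise."""
    temp, heating = state
    heating = thermostat(temp, heating)
    temp = temp + 1 if heating else temp - 1
    return (max(T_MIN, min(T_MAX, temp)), heating)


def reachable(initial_states):
    """Breadth-first exploration of every state reachable from the initial ones."""
    seen = set(initial_states)
    queue = deque(seen)
    while queue:
        nxt = step(queue.popleft())
        if nxt not in seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen


def violations(states):
    """Safety property: temperature never leaves the band by more than one degree."""
    return {s for s in states if not (LOW - 1 <= s[0] <= HIGH + 1)}


initial = [(t, h) for t in range(LOW, HIGH + 1) for h in (True, False)]
states = reachable(initial)
bad_states = violations(states)
```

An empty `bad_states` set means the property holds for every state the model can reach from the chosen initial states, which is a stronger claim than passing a handful of test cases, but only as trustworthy as the assumed environment model.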
== Terminology == In academia, the term agent verification is often defined to mean activity concerned with gaining assurance that the agent behaves in accordance with its specification, whether by processes such as testing or simulation. 'Verification' is typically contrasted with 'validation', the latter meaning activity concerned with checking that the specification itself meets user or real-world needs. Such definitions are not universally adhered to, however; for example, in some workplaces and documents, the words 'verification' and 'validation' are used synonymously. Efforts to gain confidence in agents have intensified sharply since 2025 due to the rapid rollout of LLM agents, and different terms are sometimes used in the commercial sector. There, the term 'agent verification' can be used in the same sense as in academia, but the same activity is sometimes covered by broader and more ambiguous terms such as 'agent governance', 'agent observability' or 'AI agent policing'. == History == === Classical agents === The theoretical underpinnings for artificial (inorganic) agents emerged in the mid-20th century, with the establishment of cybernetics and artificial intelligence. Oliver Selfridge's 1958 paper Pandemonium: A Paradigm for Learning was an important early theoretical contribution in establishing agent-oriented architecture. Practical implementations of agents for real-world applications became widespread in the 1990s, after the introduction of the belief–desire–intention software model (BDI) and agent-oriented programming. Purely digital agents were deployed in computer infrastructure for purposes such as monitoring, while agents connected to real-world sensors and actuators were increasingly used in industrial control systems. 
While the concept of artificial agents was interwoven with early artificial intelligence studies from the start, early agents lacked general-purpose reasoning capabilities, often having only simple if-then logic. Even a device as simple as a thermostat, which has a sensor and a means of acting, can be considered a proto-agent in this sense. Verifying the behaviour of a simple single-agent system is not generally especially difficult, but it can be a different matter when several simple agents coexist in the same system. Craig Reynolds's work on boids showed that relatively complex, "intelligent" behaviour can emerge from a number of such simple agents working together in a multi-agent system (MAS). By the 1990s, even the behaviour of a single-agent system could sometimes be quite complex; in accordance with the belief–desire–intention software model, agents could have beliefs that might evolve over time. Agents controlled by quite large decision tree models were increasingly introduced, and these had new vulnerabilities to adversarial attack. It was becoming increasingly apparent that traditional software verification methods had limitations for testing such agents, or even for more primitive agents when they were deployed as part of a MAS. It was the use of agents in industrial control systems, sometimes associated with robotics, that lent urgency to the practice of agent verification. Informal testing might be acceptable for digital agents used, say, to monitor whether each of an organisation's computers is properly licensed. But with an increasing potential for faulty agents to cause a large fire at a chemical manufacturing plant, a botched medical operation, or even a crashed aircraft, the need to develop reliable means of verifying the behaviour of such agents was considered urgent. The Foundation for Intelligent Physical Agents was established in 1996. 
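The thermostat case mentioned above can be made concrete with a small illustrative sketch. It is not drawn from any cited source: the thresholds, the safety bounds, the integer temperature model and the function names (`agent`, `successors`, `check_safety`) are all assumptions chosen for illustration. The sketch models a thermostat agent with simple if-then logic and then exhaustively explores every state it can reach, checking a safety property in each one. This is the basic idea behind explicit-state model checking, reduced to a toy.

```python
from collections import deque

# Hypothetical toy model: integer temperatures and a heater that is on or off.
LOW, HIGH = 18, 22      # assumed thermostat switching thresholds
T_MIN, T_MAX = 15, 25   # assumed safety bounds the agent must never leave


def agent(temp, heater_on):
    """Simple if-then control logic; returns the new heater setting."""
    if temp <= LOW:
        return True
    if temp >= HIGH:
        return False
    return heater_on  # between thresholds: keep the current setting


def successors(state):
    """All states reachable in one step (the environment is nondeterministic)."""
    temp, heater_on = state
    heater_on = agent(temp, heater_on)
    # Assumed dynamics: heater on -> temperature stays or rises by 1;
    # heater off -> temperature stays or falls by 1.
    deltas = (0, 1) if heater_on else (0, -1)
    return {(temp + d, heater_on) for d in deltas}


def check_safety(initial_states, is_safe):
    """Breadth-first search over reachable states.

    Returns the first unsafe state found (a counterexample), or None if
    every reachable state satisfies is_safe.
    """
    seen = set(initial_states)
    queue = deque(initial_states)
    while queue:
        state = queue.popleft()
        if not is_safe(state):
            return state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


if __name__ == "__main__":
    start = [(20, False)]
    result = check_safety(start, lambda s: T_MIN <= s[0] <= T_MAX)
    print("counterexample:", result)
```

Because the reachable state space here is tiny and finite, the search terminates quickly. The same brute-force approach becomes impractical as several interacting agents multiply the number of joint states, which is one reason multi-agent systems are harder to verify than single agents.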
From the late 1990s, a growing number of industry- and university-based scientists began working on the problem, with researchers publishing papers on the verification of both single- and multi-agent systems. Much of this work showed how formal verification techniques like model checking could be used to gain a high level of assurance that agent-based systems would conform to their specification. A 2018 systematic review covering 231 studies found that model checking was the most common technique for agent verification, with theorem proving the second most commonly used formal verification method. In the first two decades of the 21st century, agents run by AI became more common, with Siri and Alexa being well-known examples. But such agents still lacked general reasoning capabilities and did not pose pressing new problems for agent verification. === General purpose reasoning agents === The advent of LLMs created huge potential for further use of artificial agents, as agents based on them could have general-purpose cognitive abilities. Agents run by LLMs (and occasionally non-LLM foundation models) have a similar vulnerability to adversarial attack as those run by decision tree models. The wider scope of actions available to LLM agents has created new challenges for their verification, over and above those present for classical agents. For example, the LLM's neural network endows it with infinite domains, a particular challenge for traditional formal verification techniques. Academics began to study the problems involved in verifying LLM agents from 2018. Deployment of such agents began to accelerate in late 2023 after OpenAI's "function-calling" API was made available, and especially after Anthropic's late 2024 introduction of the Model Context Protocol (MCP), a standardised way for LLM agents to gain contextual awareness and to act on the world by calling various external tools. 
The rapid rollout of LLM agents following MCP's release has seen the task of agent verification receive increased attention within academia and from the private sector. In 2024 and 2025, several startups focusing on LLM agent verification were founded in both Europe and the US to meet growing demand. == Approaches == === Formal verification === Formal verification involves proving the correctness of some or all aspects of a system using mathematical methods. Such methods can range from manual formal proof to verification assisted by automated theorem provers like Isabelle. For agent verification, model checking is by far the most frequently used formal verification method; for pre-LLM models it was often complemented with techniques using computation tree logic. Another common method is theorem proving. Formal verification provides a higher degree of confidence than informal methods, but it is not always used, even when it is possible. Sometimes a person or organisation developing software agents will not have the necessary skills, or may not see it as worth the effort if the agents cannot cause much harm even if they malfunction. When agents are deployed in systems where errors could have serious consequences, the ability of formal verification methods to provide mathematical certainty tends to be strongly preferred by both regulators and designers themselves. But even for high impact systems, formal verificatio