The Interim Measures for the Management of Generative AI Services (Chinese: 生成式人工智能服务管理暂行办法; pinyin: Shēngchéng shì réngōng zhìnéng fúwù guǎnlǐ zànxíng bànfǎ) are a set of regulations governing public-facing generative artificial intelligence services in China. Issued on 10 July 2023 and effective from 15 August 2023, they were China's first binding regulation specifically targeting generative AI. They have been described as among the earliest such regulations adopted by any country. The measures were jointly issued by the Cyberspace Administration of China (CAC) and six other national bodies: the National Development and Reform Commission, the Ministry of Education, the Ministry of Science and Technology, the Ministry of Industry and Information Technology, the Ministry of Public Security, and the National Radio and Television Administration. Among the measures' most prominent requirements is that generative AI services must uphold Core Socialist Values and must not generate content that could subvert state power, harm national security, or undermine social stability. The measures also require providers of public-facing generative AI services to undergo security assessments and register their algorithms with the CAC. As of December 2025, 748 generative AI services had completed the filing process at the national level. == Background == The Interim Measures build on two earlier sets of regulations targeting specific algorithm applications. The Administrative Provisions on Algorithm Recommendation for Internet Information Services, effective from March 2022, established China's algorithm registry and required providers of recommendation algorithms with "public opinion properties or social mobilization capabilities" to file with the CAC and undergo security assessments. The Administrative Provisions on Deep Synthesis of Internet Information Services, effective from January 2023, extended similar requirements to algorithms used for generating synthetic media such as deepfakes. In April 2023, the CAC released a draft of the generative AI regulation for public comment. The draft included several requirements that attracted attention, including that generated content should "embody Core Socialist Values" and that training data should be "true and accurate". The public consultation period ran until May 2023. The final version, published in July 2023, was substantially revised from the draft. According to an analysis by the Future of Privacy Forum, changes appeared to reflect feedback from industry stakeholders including Baidu, Xiaomi, SenseTime, and others, as well as input from government-affiliated research institutes. The final measures adopted a more permissive tone, with the CAC describing its approach as "inclusive and prudent" (包容审慎) and emphasising "classified and graded" (分类分级) supervision. == Scope == The measures apply to services that use generative AI technology to provide text, images, audio, video, or other content to the public within mainland China (Article 2). They do not apply to organisations that develop or use generative AI internally without offering services to the domestic public, such as industry associations, enterprises, and research institutions. Overseas providers whose services are accessible to users in China are also subject to the measures. == Key provisions == === Content requirements === Article 4 sets out the core content obligations. Providers and users of generative AI services must uphold the Core Socialist Values. The measures prohibit generating content that incites subversion of national sovereignty or the socialist system, endangers national security or the nation's image, incites separatism, promotes terrorism or extremism, promotes ethnic hatred or discrimination, or contains violence, obscenity, or false information prohibited by law. These content prohibitions largely mirror those in Article 12 of the Cybersecurity Law and in prior regulations governing online content. Article 4 also requires that models be designed and trained to avoid discrimination, that services respect intellectual property rights, and that providers take effective measures to improve the transparency and accuracy of generated content. === Training data and labelling === Article 7 requires providers to ensure that training data is of high quality and legitimately sourced, and that it does not infringe upon intellectual property rights. Where personal information is used, consent must be obtained. The final version of this provision removed language from the draft that would have held providers responsible for the "legitimacy" of all pretraining data, replacing it with a requirement to "employ effective measures to improve the quality of training data". Article 8 requires providers to establish labelling rules for training data and to conduct quality assessments of data annotations. Article 12 requires that generated images, videos, and other synthetic content be labelled as AI-generated. === User rights and privacy === Article 11 requires providers to protect user privacy, to minimise the collection and retention of personal data, and to refrain from unlawfully sharing user information. Users have the right to request review, correction, or deletion of their personal information. Article 10 requires providers to take measures to prevent excessive dependence on or addiction to generative AI services by minors. === Security assessment and algorithm filing === Article 17 requires that providers of generative AI services with "public opinion properties or the capacity for social mobilization" (具有舆论属性或者社会动员能力) carry out security assessments and complete algorithm filing procedures in accordance with the Administrative Provisions on Algorithm Recommendation for Internet Information Services. == Implementation == === Algorithm filing process === In practice, the filing requirements under the Interim Measures have developed into a two-tier process. The first tier is the standard algorithm filing (算法备案) under the pre-existing Algorithm Recommendation Provisions, which involves submitting information about an algorithm's design, purpose, and data sources to the CAC. This process is primarily a registration mechanism. For public-facing generative AI products, there is an additional, more rigorous process commonly referred to as the "large model filing" (大模型备案). This involves submitting a security self-assessment report, data annotation rules, a keyword blocking list, and evaluation test question sets. The process includes technical testing at the provincial level, followed by review at the national CAC level. The algorithm filing targets specific algorithms, while the large model filing evaluates the broader system architecture, training data, model parameters, and potential social impact. The CAC publishes lists of generative AI services that have successfully completed the filing process. The first such list was published on 2 April 2024. According to the CAC's year-end announcements, 302 generative AI services had completed national-level filing by the end of 2024 (of which 238 were new that year), alongside 105 applications that completed local-level registration. By the end of 2025, the cumulative total had risen to 748 national-level filings and 435 local-level registrations. === Content compliance and testing === According to the Carnegie Endowment, the CAC has conducted compliance audits of generative AI services with a particular focus on ensuring appropriate responses to queries about politically sensitive topics. The large model filing process requires providers to pass both provincial-level and national-level technical testing before their services can be made available to the public. On 1 March 2024, the National Technical Committee 260 on Cybersecurity (TC260) published TC260-003, the Basic Security Requirements for Generative AI Services (生成式人工智能服务安全基本要求), a technical standard that provides detailed guidance on the security assessments required under the Interim Measures. The standard covers requirements for training data safety, model security, and content safety evaluation, and is used as a reference for the filing process. == Analysis == === Relationship to broader Chinese internet regulation === The content requirements in the Interim Measures extend China's existing framework for online information control to generative AI. Legal scholars have noted that the "Core Socialist Values" provision and the specific content prohibitions are consistent with longstanding requirements imposed on internet platforms under the Cybersecurity Law and related regulations. The Asia Society Policy Institute has described the Chinese government's highest regulatory priority in this area as retaining control of information, noting that content-related obligations receive stricter enforcement than other provisions. === Nature of the filing system === The character of the filing system has been debated by scholars. Angela Huyue Zh
Automation integrator
An automation integrator is a systems integrator company or individual who makes different versions of automation hardware and software work together, generally combining several subsystems to work together as one large system. The title may refer to those who only integrate hardware, although these will often work with software integrators. Software created by automation integrators allows devices to communicate with each other, as well as collecting and reporting data. The magazine Control Engineering publishes an annual “Automation Integrator Guide” which lists over 2,000 automation integrators. They also give an annual system integrator of the year award to three automation integration firms. The Control System Integrators Association (CSIA) maintains a buyers' guide of over 1200 member and nonmember systems integrators known as the Industrial Automation Exchange, or CSIA Exchange for short. == Certification == The Control System Integrators Association (CSIA) certifies automation integrators, through an audit based on 79 critical criteria from the best practices manual. Companies must be associate members of the CSIA to be eligible for certification. Integrators can also receive certification through a program launched in 2012 by the Robotics Industries Association. == Industries == Automation Integrators work in a wide variety of industries which use robotics and automation. Some of the most common include:
Agent verification
Agent verification is activity to gain assurances that purposeful artificial constructs act in accordance with their specifications. While primitive forms of inorganic agents have been used in manufacturing for centuries, the study of artificial agents did not begin until the mid 20th century. Foundational work on such agents was closely bound with the emergence of artificial intelligence as an academic discipline. Early agents deployed for industrial control systems and in computing were often controlled by quite simple logic however, not involving artificial intelligence as such. When deployed as part of a multi-agent system, even such simple agents could require special agent orientated testing methods, as their collective behaviour was challenging to verify with traditional testing techniques. Difficulties in providing assurances that agents will not behave in dangerous ways became more prevalent after the introduction of LLM agents, especially after the rapid acceleration of their deployment in 2025. The verification of agent behaviour can be conducted by formal or informal methods. Informal verification requires less mathematical skill. But when agents are part of systems where errors have significant risks — such as danger to human life, environmental damage or major financial loss — formal verification is preferred. Both regulators and system designers themselves like formal verification as it provides a high degree of mathematical certainty. It is not however always possible to formally test all aspects of an agent based system's behaviour, especially where newer LLM based agents are concerned, due in part to their high degree of autonomy. Accordingly, agent verification for low impact deployments might be carried out only with informal methods, while for high impact deployments, it may be performed with a mix of formal and informal techniques. == Terminology == In academia, the term agent verification is often defined to mean activity concerned with gaining assurance that the agent behaves in accordance with its specification - whether by processes such as testing or simulation. 'Verification' is typically contrasted with 'validation', the latter meaning activity concerned with checking that the specification itself meets user or real world needs. Such definitions are not universally adhered to however - for example, in some workplaces and documents, the words 'verification' and 'validation' can be used synonymously. Efforts to gain confidence in Agents have intensified sharply since 2025 due to the rapid roll out of LLM agents; different terms are sometimes used in the commercial sector. Here the term 'agent verification' can be used in the same sense as it is in academia, but sometimes the same activity can be covered by more ambiguous and wider ranging terms such as 'Agent governance' , 'Agent observability' or 'AI agent policing'. == History == === Classical agents === The theoretical underpinnings for artificial (inorganic) agents emerged in the mid 20th century, with establishment of cybernetics and artificial intelligence. Oliver Selfridge's 1958 Pandemonium - A Paradigm for Learning paper was an important early theoretical contribution in establishing agent oriented architecture. Practical implementations of agents for real world applications began to become widespread in the 1990s, after the introduction of the belief–desire–intention software model (BDI), and agent-oriented programming. Pure digital agents were deployed in computer infrastructure for purposes such as monitoring, while agents connected to real-world sensors and actuators were increasingly used in industrial control systems. While the concept of artificial agents was interwoven with early artificial intelligence studies right from the start, early agents lacked general purpose reasoning capabilities, often only having simple if then logic. Even a device as simple as a thermostat, which has a sensor and a means of acting, can be considered a proto agent in this sense. Verifying the behaviours of a simple single agent system is not generally especially difficult, but it can be a different matter when several simple agents coexist in the same system. Craig Reynolds's work on boids showed that relatively complex, "intelligent" behaviour can emerge from a number of such simple agents working together in a Multi-agent system (MAS). By the 1990s, even the behaviour of a single agent system could sometimes be quite complex; in accordance with the Belief–desire–intention software model, agents could have believes that might evolve over time. Agents were increasingly introduced that were controlled by quite large decision tree models, which had new vulnerabilities to adversarial attack. It was becoming increasingly apparent that traditional software verification methods had limitations for testing such agents, or even for the more primitive type of agents when they were deployed as part of a MAS. It was the use of agents for industrial control systems, sometimes associated with robotics, that lent urgency to the practice of agent verification. Informal testing might be acceptable for digital agents used say to monitor whether each of an organisation's computers are properly licensed. But with an increasing potential for faulty agents to result in a failure that might cause a large fire to break out at a chemical manufacturing plant, a botched medical operation, or even a crashed aircraft, the need to develop reliable means of verifying behaviour of such agents was considered urgent. The Foundation for Intelligent Physical Agents was established in 1996. From the late 90s, a growing number of industry and university based scientists began working on the problem, with researchers publishing papers on the verification of both single and multi agent systems. Much of this work showed how formal verification techniques like model checking could be used to gain a high level of assurance that agent based systems would conform with their specification. A 2018 systematic review covering 231 studies found that model checking was the most common technique for agent verification, with theorem proving the second most commonly used formal verification method. In the first two decades of the 20th century, agents run by AI became more common, with Siri and Alexa being well known examples. But such agents still lacked general reasoning capabilities and did not pose new pressing problems for agent verification. === General purpose reasoning agents === The advent of LLMs created huge potential for further use of artificial agents, as agents based on them could have general purpose cognitive abilities. Agents run by LLMs (and occasionally non-LLM foundation models) have similar vulnerability to adversarial attack as those run by decision tree models. The wider scope of actions for LLM agents has created new challenges for their verification, over and above those present for classical agents. For example, the LLM's neural network endows it with infinite domains, an especial challenge for traditional formal verification techniques. Academics began to study the problems involved in verifying LLM agents from 2018. Deployment of such agents began to accelerate in late 2023 after OpenAI's "function-calling" API was made available, and especially after Anthropic's late 2024 introduction of Model Context Protocol (MCP), a standardised way for LLM agents to gain contextual awareness, and to act on the world by calling various external tools. The rapid rollout of LLM agents following MCP's release has seen the task of agent verification receive increased attention within academia, and also from the private sector. In 2024 and 2025 several startups focusing on LLM agent verification have been founded in both Europe and the US to meet growing demand. == Approaches == === Formal verification === Formal verification involves proving the correctness of some or all aspects of a system using mathematical methods. Such methods can range from manual formal proof, to verification assisted with automated theorem provers like Isabelle. For agent verification, model checking is by far the most frequently used formal verification method; for pre-LLM models it was often complemented with techniques using computation tree logic. Another common method is theorem proving. Formal verification provides a higher degree of confidence than informal methods, but it is not always used, even when it is possible. Sometimes a person or organisation developing software agents won't have the necessary skills, or may not see it as worth the effort if the agent(s) will not have the ability to cause much harm even if they malfunction. When agents are deployed in systems where errors could have serious consequences, the ability of formal verification methods to provide mathematical certainty tends to be strongly preferred by both regulators and designers themselves. But even for high impact systems, formal verificatio
Algorithmic probability
In algorithmic information theory, algorithmic probability, also known as Solomonoff probability, is a mathematical method of assigning a prior probability to a given observation. It was invented by Ray Solomonoff in the 1960s. It is used in inductive inference theory and analyses of algorithms. In his general theory of inductive inference, Solomonoff uses the method together with Bayes' rule to obtain probabilities of prediction for an algorithm's future outputs. In the mathematical formalism used, the observations have the form of finite binary strings viewed as outputs of Turing machines, and the universal prior is a probability distribution over the set of finite binary strings calculated from a probability distribution over programs (that is, inputs to a universal Turing machine). The prior is universal in the Turing-computability sense, i.e. no string has zero probability. It is not computable, but it can be approximated. Formally, the probability P {\displaystyle P} is not a probability and it is not computable. It is only "lower semi-computable" and a "semi-measure". By "semi-measure", it means that 0 ≤ ∑ x P ( x ) < 1 {\displaystyle 0\leq \sum _{x}P(x)<1} . That is, the "probability" does not actually sum up to one, unlike actual probabilities. This is because some inputs to the Turing machine causes it to never halt, which means the probability mass allocated to those inputs is lost. By "lower semi-computable", it means there is a Turing machine that, given an input string x {\displaystyle x} , can print out a sequence y 1 < y 2 < ⋯ {\displaystyle y_{1} Progress in artificial intelligence (AI) refers to the advances, milestones, and breakthroughs that have been achieved in the field of artificial intelligence over time. AI is a branch of computer science that aims to create machines and systems capable of performing tasks that typically require human intelligence. AI applications have been used in a wide range of fields including medical diagnosis, finance, robotics, law, video games, agriculture, and scientific discovery. The society as a whole is looking for artificial intelligence to be on a key factor in the upcming years because of its potential. However, many AI applications are not perceived as AI: "A lot of cutting-edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore." "Many thousands of AI applications are deeply embedded in the infrastructure of every industry." In the late 1990s and early 2000s, AI technology became widely used as elements of larger systems, but the field was rarely credited for these successes at the time. Kaplan and Haenlein structure artificial intelligence along three evolutionary stages: Artificial narrow intelligence – AI capable only of specific tasks; Artificial general intelligence – AI with ability in several areas, and able to autonomously solve problems they were never even designed for; Artificial superintelligence – AI capable of general tasks, including scientific creativity, social skills, and general wisdom. To allow comparison with human performance, artificial intelligence can be evaluated on constrained and well-defined problems. Such tests have been termed subject-matter expert Turing tests. Also, smaller problems provide more achievable goals and there are an ever-increasing number of positive results. In 2023, humans still substantially outperformed both GPT-4 and other models tested on the ConceptARC benchmark. Those models scored 60% on most, and 77% on one category, while humans scored 91% on all and 97% on one category. However, later research in 2025 showed that human-generated output grids were only accurate 73% of the time, while AI models available that year managed to score above 77%. == History == Increasing, promoting or constraining AI progress has often be done via controlling or increasing the amount of compute. == Current performance in specific areas == There are many useful abilities that can be described as showing some form of intelligence. This gives better insight into the comparative success of artificial intelligence in different areas. AI, like electricity or the steam engine, is a general-purpose technology. There is no consensus on how to characterize which tasks AI tends to excel at. Some versions of Moravec's paradox observe that humans are more likely to outperform machines in areas such as physical dexterity that have been the direct target of natural selection. While projects such as AlphaZero have succeeded in generating their own knowledge from scratch, many other machine learning projects require large training datasets. Researcher Andrew Ng has suggested, as a "highly imperfect rule of thumb", that "almost anything a typical human can do with less than one second of mental thought, we can probably now or in the near future automate using AI." Games provide a high-profile benchmark for assessing rates of progress; many games have a large professional player base and a well-established competitive rating system. AlphaGo brought the era of classical board-game benchmarks to a close when Artificial Intelligence proved their competitive edge over humans in 2016. Deep Mind's AlphaGo AI software program defeated the world's best professional Go Player Lee Sedol. Games of imperfect knowledge provide new challenges to AI in the area of game theory; the most prominent milestone in this area was brought to a close by Libratus' poker victory in 2017. E-sports continue to provide additional benchmarks; Facebook AI, Deepmind, and others have engaged with the popular StarCraft franchise of videogames. Broad classes of outcome for an AI test may be given as: optimal: it is not possible to perform better (note: some of these entries were solved by humans) super-human: performs better than all humans high-human: performs better than most humans par-human: performs similarly to most humans sub-human: performs worse than most humans === Optimal === Tic-tac-toe Connect Four: 1988 Checkers (aka 8x8 draughts): Weakly solved (2007) Rubik's Cube: Mostly solved (2010) Heads-up limit hold'em poker: Statistically optimal in the sense that "a human lifetime of play is not sufficient to establish with statistical significance that the strategy is not an exact solution" (2015) === Super-human === Othello (aka reversi): c. 1997 Scrabble: 2006 Backgammon: c. 1995–2002 Chess: Supercomputer (c. 1997); Personal computer (c. 2006); Mobile phone (c. 2009); Computer defeats human + computer (c. 2017) Jeopardy!: Question answering, although the machine did not use speech recognition (2011) Arimaa: 2015 Shogi: c. 2017 Go: 2017 Heads-up no-limit hold'em poker: 2017 Six-player no-limit hold'em poker: 2019 Gran Turismo Sport: 2022 === High-human === Crosswords: c. 2012 Freeciv: 2016 Dota 2: 2018 Bridge card-playing: According to a 2009 review, "the best programs are attaining expert status as (bridge) card players", excluding bidding. StarCraft II: 2019 Mahjong: 2019 Stratego: 2022 No-Press Diplomacy: 2022 Hanabi: 2022 Natural language processing === Par-human === Optical character recognition for ISO 1073-1:1976 and similar special characters. Classification of images Handwriting recognition Facial recognition Visual question answering SQuAD 2.0 English reading-comprehension benchmark (2019) SuperGLUE English-language understanding benchmark (2020) Some school science exams (2019) Some tasks based on Raven's Progressive Matrices Many Atari 2600 games (2015) === Sub-human === Optical character recognition for printed text (nearing par-human for Latin-script typewritten text) Object recognition Various robotics tasks that may require advances in robot hardware as well as AI, including: Stable bipedal locomotion: Bipedal robots can walk, but are less stable than human walkers (as of 2017) Humanoid soccer Speech recognition: "nearly equal to human performance" (2017) Explainability. Current medical systems can diagnose certain medical conditions well, but cannot explain to users why they made the diagnosis. Many tests of fluid intelligence (2020) Bongard visual cognition problems, such as the Bongard-LOGO benchmark (2020) Visual Commonsense Reasoning (VCR) benchmark (as of 2020) Stock market prediction: Financial data collection and processing using Machine Learning algorithms Angry Birds video game, as of 2020 Various tasks that are difficult to solve without contextual knowledge, including: Translation Word-sense disambiguation == Proposed tests of artificial intelligence == In his famous Turing test, Alan Turing picked language, the defining feature of human beings, for its basis. The Turing test is now considered too exploitable to be a meaningful benchmark. The Feigenbaum test, proposed by the inventor of expert systems, tests a machine's knowledge and expertise about a specific subject. A paper by Jim Gray of Microsoft in 2003 suggested extending the Turing test to speech understanding, speaking and recognizing objects and behavior. Proposed "universal intelligence" tests aim to compare how well machines, humans, and even non-human animals perform on problem sets that are generic as possible. At an extreme, the test suite can contain every possible problem, weighted by Kolmogorov complexity; however, these problem sets tend to be dominated by impoverished pattern-matching exercises where a tuned AI can easily exceed human performance levels. == Exams == According to OpenAI, in 2023 GPT-4 achieved high scores on several standardized and professional examinations, including around the 90th percentile on the Uniform Bar Exam, the 89th percentile on the mathematics section of the SAT, the 93rd percentile on SAT Reading and Writing, the 54th percentile on the analytical writing section of the GRE, the 88th percentile on GRE quantitative reasoning, and the 99th percentile on GRE verbal reasoning. OpenAI also reported that GPT-4 scored in the 99th to 100th percentile on the 2020 USA Biology Olympiad semifinal exam and earned top scores on several AP exams. Independent researchers found in 2023 that ChatGPT based on GPT-3.5 performed "at or near the passing threshold" on all three parts of the United States Medical Licensing Examination (USMLE), suggesting that large language models could reach passing-level performance on some medical knowledge assessments even without domain-specific fine-tuning. GPT-3.5 was also reported to attain a low but passing grade on examinations for four law school courses at the University of Minnes BevQ is a queue management mobile application developed by Faircode Technologies of Kochi, Kerala. It is provided by the Kerala State Beverages Corporation under Government of Kerala. == History == This app was released together by the Government of Kerala and the Kerala State Beverages Corporation in order to implement social distancing in the liquor stores Kerala in the case of the COVID-19 pandemic in Kerala and to reduce the congestion of people. The BevQ App was released by Faircode Technologies on 27 May 2020 on the Google Play Store. In January 2021, the app was withdrawn as bars had opened. In June 2021, there was a commitment from the Kerala CM that the App will be relaunched again. It has been reported that over 132,000 new users downloaded the app in the 48 hours after the announcement. == Achievements == The BEVQ app, which works only in the state of Kerala, beat all other Indian food and drink apps in 2020 to see the highest growth in year-on-year sessions, according to the State of Mobile 2021 report by App Annie. The app even beat the likes of Domino’s, which is used all across India. Around 300 government Liquor shops and 900 private liquor shops were enlisted in the platform. More than 200 million unique users registered in the platform. About 250,000 tokens were given out a day. Progress in artificial intelligence (AI) refers to the advances, milestones, and breakthroughs that have been achieved in the field of artificial intelligence over time. AI is a branch of computer science that aims to create machines and systems capable of performing tasks that typically require human intelligence. AI applications have been used in a wide range of fields including medical diagnosis, finance, robotics, law, video games, agriculture, and scientific discovery. The society as a whole is looking for artificial intelligence to be on a key factor in the upcming years because of its potential. However, many AI applications are not perceived as AI: "A lot of cutting-edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore." "Many thousands of AI applications are deeply embedded in the infrastructure of every industry." In the late 1990s and early 2000s, AI technology became widely used as elements of larger systems, but the field was rarely credited for these successes at the time. Kaplan and Haenlein structure artificial intelligence along three evolutionary stages: Artificial narrow intelligence – AI capable only of specific tasks; Artificial general intelligence – AI with ability in several areas, and able to autonomously solve problems they were never even designed for; Artificial superintelligence – AI capable of general tasks, including scientific creativity, social skills, and general wisdom. To allow comparison with human performance, artificial intelligence can be evaluated on constrained and well-defined problems. Such tests have been termed subject-matter expert Turing tests. Also, smaller problems provide more achievable goals and there are an ever-increasing number of positive results. In 2023, humans still substantially outperformed both GPT-4 and other models tested on the ConceptARC benchmark. Those models scored 60% on most, and 77% on one category, while humans scored 91% on all and 97% on one category. However, later research in 2025 showed that human-generated output grids were only accurate 73% of the time, while AI models available that year managed to score above 77%. == History == Increasing, promoting or constraining AI progress has often be done via controlling or increasing the amount of compute. == Current performance in specific areas == There are many useful abilities that can be described as showing some form of intelligence. This gives better insight into the comparative success of artificial intelligence in different areas. AI, like electricity or the steam engine, is a general-purpose technology. There is no consensus on how to characterize which tasks AI tends to excel at. Some versions of Moravec's paradox observe that humans are more likely to outperform machines in areas such as physical dexterity that have been the direct target of natural selection. While projects such as AlphaZero have succeeded in generating their own knowledge from scratch, many other machine learning projects require large training datasets. Researcher Andrew Ng has suggested, as a "highly imperfect rule of thumb", that "almost anything a typical human can do with less than one second of mental thought, we can probably now or in the near future automate using AI." Games provide a high-profile benchmark for assessing rates of progress; many games have a large professional player base and a well-established competitive rating system. AlphaGo brought the era of classical board-game benchmarks to a close when Artificial Intelligence proved their competitive edge over humans in 2016. Deep Mind's AlphaGo AI software program defeated the world's best professional Go Player Lee Sedol. Games of imperfect knowledge provide new challenges to AI in the area of game theory; the most prominent milestone in this area was brought to a close by Libratus' poker victory in 2017. E-sports continue to provide additional benchmarks; Facebook AI, Deepmind, and others have engaged with the popular StarCraft franchise of videogames. Broad classes of outcome for an AI test may be given as: optimal: it is not possible to perform better (note: some of these entries were solved by humans) super-human: performs better than all humans high-human: performs better than most humans par-human: performs similarly to most humans sub-human: performs worse than most humans === Optimal === Tic-tac-toe Connect Four: 1988 Checkers (aka 8x8 draughts): Weakly solved (2007) Rubik's Cube: Mostly solved (2010) Heads-up limit hold'em poker: Statistically optimal in the sense that "a human lifetime of play is not sufficient to establish with statistical significance that the strategy is not an exact solution" (2015) === Super-human === Othello (aka reversi): c. 1997 Scrabble: 2006 Backgammon: c. 1995–2002 Chess: Supercomputer (c. 1997); Personal computer (c. 2006); Mobile phone (c. 2009); Computer defeats human + computer (c. 2017) Jeopardy!: Question answering, although the machine did not use speech recognition (2011) Arimaa: 2015 Shogi: c. 2017 Go: 2017 Heads-up no-limit hold'em poker: 2017 Six-player no-limit hold'em poker: 2019 Gran Turismo Sport: 2022 === High-human === Crosswords: c. 2012 Freeciv: 2016 Dota 2: 2018 Bridge card-playing: According to a 2009 review, "the best programs are attaining expert status as (bridge) card players", excluding bidding. StarCraft II: 2019 Mahjong: 2019 Stratego: 2022 No-Press Diplomacy: 2022 Hanabi: 2022 Natural language processing === Par-human === Optical character recognition for ISO 1073-1:1976 and similar special characters. Classification of images Handwriting recognition Facial recognition Visual question answering SQuAD 2.0 English reading-comprehension benchmark (2019) SuperGLUE English-language understanding benchmark (2020) Some school science exams (2019) Some tasks based on Raven's Progressive Matrices Many Atari 2600 games (2015) === Sub-human === Optical character recognition for printed text (nearing par-human for Latin-script typewritten text) Object recognition Various robotics tasks that may require advances in robot hardware as well as AI, including: Stable bipedal locomotion: Bipedal robots can walk, but are less stable than human walkers (as of 2017) Humanoid soccer Speech recognition: "nearly equal to human performance" (2017) Explainability. Current medical systems can diagnose certain medical conditions well, but cannot explain to users why they made the diagnosis. Many tests of fluid intelligence (2020) Bongard visual cognition problems, such as the Bongard-LOGO benchmark (2020) Visual Commonsense Reasoning (VCR) benchmark (as of 2020) Stock market prediction: Financial data collection and processing using Machine Learning algorithms Angry Birds video game, as of 2020 Various tasks that are difficult to solve without contextual knowledge, including: Translation Word-sense disambiguation == Proposed tests of artificial intelligence == In his famous Turing test, Alan Turing picked language, the defining feature of human beings, for its basis. The Turing test is now considered too exploitable to be a meaningful benchmark. The Feigenbaum test, proposed by the inventor of expert systems, tests a machine's knowledge and expertise about a specific subject. A paper by Jim Gray of Microsoft in 2003 suggested extending the Turing test to speech understanding, speaking and recognizing objects and behavior. Proposed "universal intelligence" tests aim to compare how well machines, humans, and even non-human animals perform on problem sets that are generic as possible. At an extreme, the test suite can contain every possible problem, weighted by Kolmogorov complexity; however, these problem sets tend to be dominated by impoverished pattern-matching exercises where a tuned AI can easily exceed human performance levels. == Exams == According to OpenAI, in 2023 GPT-4 achieved high scores on several standardized and professional examinations, including around the 90th percentile on the Uniform Bar Exam, the 89th percentile on the mathematics section of the SAT, the 93rd percentile on SAT Reading and Writing, the 54th percentile on the analytical writing section of the GRE, the 88th percentile on GRE quantitative reasoning, and the 99th percentile on GRE verbal reasoning. OpenAI also reported that GPT-4 scored in the 99th to 100th percentile on the 2020 USA Biology Olympiad semifinal exam and earned top scores on several AP exams. Independent researchers found in 2023 that ChatGPT based on GPT-3.5 performed "at or near the passing threshold" on all three parts of the United States Medical Licensing Examination (USMLE), suggesting that large language models could reach passing-level performance on some medical knowledge assessments even without domain-specific fine-tuning. GPT-3.5 was also reported to attain a low but passing grade on examinations for four law school courses at the University of MinnesProgress in artificial intelligence
BevQ
Progress in artificial intelligence