AI Code Checker Python

AI Code Checker Python — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • WiPay

    WiPay

    WiPay is a Caribbean-based payment technology company that specializes in electronic payments for businesses. WiPay was founded in 2016 by Aldwyn Wayne Jr., a Trinidadian businessman and graduate of Georgia Tech Institute. In September 2019, WiPay partnered with MasterCard. As a result, WiPay became the only licensed Payment Facilitator (PAYFAC) on both the MasterCard and Visa networks in the region.

    Read more →
  • Multi-agent reinforcement learning

    Multi-agent reinforcement learning

    Multi-agent reinforcement learning (MARL) is a sub-field of reinforcement learning. It focuses on studying the behavior of multiple learning agents that coexist in a shared environment. Each agent is motivated by its own rewards, and does actions to advance its own interests; in some environments these interests are opposed to the interests of other agents, resulting in complex group dynamics. Multi-agent reinforcement learning is closely related to game theory and especially repeated games, as well as multi-agent systems. Its study combines the pursuit of finding ideal algorithms that maximize rewards with a more sociological set of concepts. While research in single-agent reinforcement learning is concerned with finding the algorithm that gets the biggest number of points for one agent, research in multi-agent reinforcement learning evaluates and quantifies social metrics, such as cooperation, reciprocity, equity, social influence, language and discrimination. == Definition == Similarly to single-agent reinforcement learning, multi-agent reinforcement learning is modeled as some form of a Markov decision process (MDP). Fix a set of agents I = { 1 , . . . , N } {\displaystyle I=\{1,...,N\}} . We then define: A set S {\displaystyle S} of environment states. One set A i {\displaystyle {\mathcal {A}}_{i}} of actions for each of the agents i ∈ I = { 1 , … , N } {\displaystyle i\in I=\{1,\dots ,N\}} . P a → ( s , s ′ ) = Pr ( s t + 1 = s ′ ∣ s t = s , a → t = a → ) {\displaystyle P_{\vec {a}}(s,s')=\Pr(s_{t+1}=s'\mid s_{t}=s,{\vec {a}}_{t}={\vec {a}})} is the probability of transition (at time t {\displaystyle t} ) from state s {\displaystyle s} to state s ′ {\displaystyle s'} under joint action a → {\displaystyle {\vec {a}}} . R → a → ( s , s ′ ) {\displaystyle {\vec {R}}_{\vec {a}}(s,s')} is the immediate joint reward after the transition from s {\displaystyle s} to s ′ {\displaystyle s'} with joint action a → {\displaystyle {\vec {a}}} . In settings with perfect information, such as the games of chess and Go, the MDP would be fully observable. In settings with imperfect information, especially in real-world applications like self-driving cars, each agent would access an observation that only has part of the information about the current state. In the partially observable setting, the core model is the partially observable stochastic game in the general case, and the decentralized POMDP in the cooperative case. == Cooperation vs. competition == When multiple agents are acting in a shared environment their interests might be aligned or misaligned. MARL allows exploring all the different alignments and how they affect the agents' behavior: In pure competition settings, the agents' rewards are exactly opposite to each other, and therefore they are playing against each other. Pure cooperation settings are the other extreme, in which agents get the exact same rewards, and therefore they are playing with each other. Mixed-sum settings cover all the games that combine elements of both cooperation and competition. === Pure competition settings === When two agents are playing a zero-sum game, they are in pure competition with each other. Many traditional games such as chess and Go fall under this category, as do two-player variants of video games like StarCraft. Because each agent can only win at the expense of the other agent, many complexities are stripped away. There is no prospect of communication or social dilemmas, as neither agent is incentivized to take actions that benefit its opponent. The Deep Blue and AlphaGo projects demonstrate how to optimize the performance of agents in pure competition settings. One complexity that is not stripped away in pure competition settings is autocurricula. As the agents' policy is improved using self-play, multiple layers of learning may occur. === Pure cooperation settings === MARL is used to explore how separate agents with identical interests can communicate and work together. Pure cooperation settings are explored in recreational cooperative games such as Overcooked, as well as real-world scenarios in robotics. In pure cooperation settings all the agents get identical rewards, which means that social dilemmas do not occur. In pure cooperation settings, oftentimes there are an arbitrary number of coordination strategies, and agents converge to specific "conventions" when coordinating with each other. The notion of conventions has been studied in language and also alluded to in more general multi-agent collaborative tasks. === Mixed-sum settings === Most real-world scenarios involving multiple agents have elements of both cooperation and competition. For example, when multiple self-driving cars are planning their respective paths, each of them has interests that are diverging but not exclusive: Each car is minimizing the amount of time it's taking to reach its destination, but all cars have the shared interest of avoiding a traffic collision. Zero-sum settings with three or more agents often exhibit similar properties to mixed-sum settings, since each pair of agents might have a non-zero utility sum between them. Mixed-sum settings can be explored using classic matrix games such as prisoner's dilemma, more complex sequential social dilemmas, and recreational games such as Among Us, Diplomacy and StarCraft II. Mixed-sum settings can give rise to communication and social dilemmas. == Social dilemmas == As in game theory, much of the research in MARL revolves around social dilemmas, such as prisoner's dilemma, chicken and stag hunt. While game theory research might focus on Nash equilibria and what an ideal policy for an agent would be, MARL research focuses on how the agents would learn these ideal policies using a trial-and-error process. The reinforcement learning algorithms that are used to train the agents are maximizing the agent's own reward; the conflict between the needs of the agents and the needs of the group is a subject of active research. Various techniques have been explored in order to induce cooperation in agents: Modifying the environment rules, adding intrinsic rewards, and more. === Sequential social dilemmas === Social dilemmas like prisoner's dilemma, chicken and stag hunt are "matrix games". Each agent takes only one action from a choice of two possible actions, and a simple 2x2 matrix is used to describe the reward that each agent will get, given the actions that each agent took. In humans and other living creatures, social dilemmas tend to be more complex. Agents take multiple actions over time, and the distinction between cooperating and defecting is not as clear cut as in matrix games. The concept of a sequential social dilemma (SSD) was introduced in 2017 as an attempt to model that complexity. There is ongoing research into defining different kinds of SSDs and showing cooperative behavior in the agents that act in them. == Autocurricula == An autocurriculum (plural: autocurricula) is a reinforcement learning concept that's salient in multi-agent experiments. As agents improve their performance, they change their environment; this change in the environment affects themselves and the other agents. The feedback loop results in several distinct phases of learning, each depending on the previous one. The stacked layers of learning are called an autocurriculum. Autocurricula are especially apparent in adversarial settings, where each group of agents is racing to counter the current strategy of the opposing group. The Hide and Seek game is an accessible example of an autocurriculum occurring in an adversarial setting. In this experiment, a team of seekers is competing against a team of hiders. Whenever one of the teams learns a new strategy, the opposing team adapts its strategy to give the best possible counter. When the hiders learn to use boxes to build a shelter, the seekers respond by learning to use a ramp to break into that shelter. The hiders respond by locking the ramps, making them unavailable for the seekers to use. The seekers then respond by "box surfing", exploiting a glitch in the game to penetrate the shelter. Each "level" of learning is an emergent phenomenon, with the previous level as its premise. This results in a stack of behaviors, each dependent on its predecessor. Autocurricula in reinforcement learning experiments are compared to the stages of the evolution of life on Earth and the development of human culture. A major stage in evolution happened 2-3 billion years ago, when photosynthesizing life forms started to produce massive amounts of oxygen, changing the balance of gases in the atmosphere. In the next stages of evolution, oxygen-breathing life forms evolved, eventually leading up to land mammals and human beings. These later stages could only happen after the photosynthesis stage made oxygen widely available. Similarly, human culture could not have gone through the Industrial Revolution in the 18th century without the resources and insights gaine

    Read more →
  • TipTop Technologies

    TipTop Technologies

    TipTop Technologies is a real-time web and social search engine with a platform for semantic analysis of natural language. Tip-Top Search provides results capturing individual and group sentiment, opinions, and experiences there from the content of various sorts such as real-time messages from Twitter or consumer product reviews on Amazon.com. TipTop Technologies and ITC Infotech collaborated to create a search interface suitable for both enterprise and consumer applications. Tip-Top's products are part of the "emerging Web 3.0 applications which use semantic technologies to augment the underlying Web system's functionalities." Their main product is 360, an AI tool that incorporates multiple AI applications under one wing. Jonathan AlBright professor at Elon University, found videos generated by TipTop Technologies software on YouTube in his research into artificial intelligence, described it as AI-generated "fake news". Through semantic analysis of large data sets, TipTop gleaned behavioral insights from Tweets around events like Halloween, Thanksgiving, Holiday Gifting, the Super Bowl, and the Oscar Nominees for the Academy Awards coverage. Sentiment analysis, concept trend tracking, and real-time market research are other applications included in the TipTop Search product. TipTop's insight engine solves the problem of real-time data noise, and its ability to "sort the 'good tweets' from the 'bad tweets' when it comes to a product, service, or a region..." In addition, products like TipTop Shopping with customizable search widgets bring together consumer reviews, social search, and sentiment analysis enabling product comparisons across attributes like the overall value and aiding purchasing decisions through user-driven product tips and pits. TipTop Finance adds another complexity to real-time search results by incorporating corporate sentiment, company stock tickers, and social media into TipTop's existing social search platform. Additional success applying semantic technologies has been with polling, "if you compare these Gallup results with TipTop, a sentiment engine based on Twitter, the results are not way off. It does surprise you but it tells me that sentiment analysis in case of public opinion about a burning social issue or a famous personality is relatively easier." With the increasing amount of unstructured, opinion-oriented, and user-generated content available on the Web, TipTop's technology aims to make sense of all this data, and deliver it in a useful way for consumer and enterprise users alike. TipTop Technologies is a privately held company with its headquarters in the San Francisco Bay Area, and team members are located globally.

    Read more →
  • Sparrow (chatbot)

    Sparrow (chatbot)

    Sparrow is a chatbot developed by the artificial intelligence research lab DeepMind, a subsidiary of Alphabet Inc. It is designed to answer users' questions correctly, while reducing the risk of unsafe and inappropriate answers. One motivation behind Sparrow is to address the problem of language models producing incorrect, biased or potentially harmful outputs. Sparrow is trained using human judgements, in order to be more “Helpful, Correct and Harmless” compared to baseline pre-trained language models. The development of Sparrow involved asking paid study participants to interact with Sparrow, and collecting their preferences to train a model of how useful an answer is. To improve accuracy and help avoid the problem of hallucinating incorrect answers, Sparrow has the ability to search the Internet using Google Search in order to find and cite evidence for any factual claims it makes. To make the model safer, its behaviour is constrained by a set of rules, for example "don't make threatening statements" and "don't make hateful or insulting comments", as well as rules about possibly harmful advice, and not claiming to be a person. During development study participants were asked to converse with the system and try to trick it into breaking these rules. A 'rule model' was trained on judgements from these participants, which was used for further training. Sparrow was introduced in a paper in September 2022, titled "Improving alignment of dialogue agents via targeted human judgements"; however, the bot was not released publicly. DeepMind CEO Demis Hassabis said DeepMind is considering releasing Sparrow for a "private beta" some time in 2023. == Training == Sparrow is a deep neural network based on the transformer machine learning model architecture. It is fine-tuned from DeepMind's Chinchilla AI pre-trained large language model (LLM), which has 70 Billion parameters. Sparrow is trained using reinforcement learning from human feedback (RLHF), although some supervised fine-tuning techniques are also used. The RLHF training utilizes two reward models to capture human judgements: a “preference model” that predicts what a human study participant would prefer and a “rule model” that predicts if the model has broken one of the rules. == Limitations == Sparrow's training data corpus is mainly in English, meaning it performs worse in other languages. When adversarially probed by study participants it breaks the rules 8% of the time; however, this is still three times lower than the baseline prompted pre-trained model (Chinchilla).

    Read more →
  • Lose It!

    Lose It!

    Lose It! is an American health and wellness mobile app developed by FitNow, Inc. The app generates calorie budgets for users by tracking weight, exercise, food and calorie intake, and personal goals, primarily to assist them in achieving weight loss. == History == Lose It! was developed in Boston and debuted in 2008. The app and its associated company were founded by J.J. Allaire, Charles Teague and Paul Dicristina. Prior to founding Lose It!, Teague and Allaire had founded the online research tool Onfolio, which was acquired by Microsoft in 2006. The Lose It! app was originally released as an iOS app before being released as a website in 2010 and an Android app in 2011. In 2015, Lose It! announced plans to release the app internationally. Lose It! was also available as an app for Apple Watch at its launch in 2015. The app’s “Snap It” feature, which allows users to approximate calorie counts by taking pictures of their daily meals and snacks, was released in beta in 2016. Snap It was named an Innovation Awards Honoree at the 2017 Consumer Electronics Show in Las Vegas. In 2020, Patrick Wetherille, one of the company’s earliest employees, was appointed chief executive officer. == App == Lose It! is weight loss app. The app allows users to set goals such as increasing strength, overall health/maintenance, and weight loss. It provides users recommended calorie budgets based on data such as their current weight and their desired weight. Lose It! also tracks data such as exercise/activity level and food consumption and allows users to track calories consumed by scanning barcodes for food products then retrieving calorie information for products. The app can also estimate the amount of calories in a food products. Lose It! has integration features connecting it to other apps such as Fitbit and Runkeeper. It also has social features such as joining groups and sharing progress with friends. The Premium version of the app allows users to track foods according to specific diets like keto, heart healthy or Mediterranean.

    Read more →
  • Weight initialization

    Weight initialization

    In deep learning, weight initialization or parameter initialization describes the initial step in creating a neural network. A neural network contains trainable parameters that are modified during training: weight initialization is the pre-training step of assigning initial values to these parameters. The choice of weight initialization method affects the speed of convergence, the scale of neural activation within the network, the scale of gradient signals during backpropagation, and the quality of the final model. Proper initialization is necessary for avoiding issues such as vanishing and exploding gradients and activation function saturation. Note that even though this article is titled "weight initialization", both weights and biases are used in a neural network as trainable parameters, so this article describes how both of these are initialized. Similarly, trainable parameters in convolutional neural networks (CNNs) are called kernels and biases, and this article also describes these. == Constant initialization == We discuss the main methods of initialization in the context of a multilayer perceptron (MLP). Specific strategies for initializing other network architectures are discussed in later sections. For an MLP, there are only two kinds of trainable parameters, called weights and biases. Each layer l {\displaystyle l} contains a weight matrix W ( l ) ∈ R n l − 1 × n l {\displaystyle W^{(l)}\in \mathbb {R} ^{n_{l-1}\times n_{l}}} and a bias vector b ( l ) ∈ R n l {\displaystyle b^{(l)}\in \mathbb {R} ^{n_{l}}} , where n l {\displaystyle n_{l}} is the number of neurons in that layer. A weight initialization method is an algorithm for setting the initial values for W ( l ) , b ( l ) {\displaystyle W^{(l)},b^{(l)}} for each layer l {\displaystyle l} . The simplest form is zero initialization: W ( l ) = 0 , b ( l ) = 0 {\displaystyle W^{(l)}=0,b^{(l)}=0} Zero initialization is usually used for initializing biases, but it is not used for initializing weights, as it leads to symmetry in the network, causing all neurons to learn the same features. In this page, we assume b = 0 {\displaystyle b=0} unless otherwise stated. Recurrent neural networks typically use activation functions with bounded range, such as sigmoid and tanh, since unbounded activation may cause exploding values. (Le, Jaitly, Hinton, 2015) suggested initializing weights in the recurrent parts of the network to identity and zero bias, similar to the idea of residual connections and LSTM with no forget gate. In most cases, the biases are initialized to zero, though some situations can use a nonzero initialization. For example, in multiplicative units, such as the forget gate of LSTM, the bias can be initialized to 1 to allow good gradient signal through the gate. For neurons with ReLU activation, one can initialize the bias to a small positive value like 0.1, so that the gradient is likely nonzero at initialization, avoiding the dying ReLU problem. == Random initialization == Random initialization means sampling the weights from a normal distribution or a uniform distribution, usually independently. === LeCun initialization === LeCun initialization, popularized in (LeCun et al., 1998), is designed to preserve the variance of neural activations during the forward pass. It samples each entry in W ( l ) {\displaystyle W^{(l)}} independently from a distribution with mean 0 and variance 1 / n l − 1 {\displaystyle 1/n_{l-1}} . For example, if the distribution is a continuous uniform distribution, then the distribution is U ( ± 3 / n l − 1 ) {\displaystyle {\mathcal {U}}(\pm {\sqrt {3/n_{l-1}}})} . === Glorot initialization === Glorot initialization (or Xavier initialization) was proposed by Xavier Glorot and Yoshua Bengio. It was designed as a compromise between two goals: to preserve activation variance during the forward pass and to preserve gradient variance during the backward pass. For uniform initialization, it samples each entry in W ( l ) {\displaystyle W^{(l)}} independently and identically from U ( ± 6 / ( n l + 1 + n l − 1 ) ) {\displaystyle {\mathcal {U}}(\pm {\sqrt {6/(n_{l+1}+n_{l-1})}})} . In the context, n l − 1 {\displaystyle n_{l-1}} is also called the "fan-in", and n l + 1 {\displaystyle n_{l+1}} the "fan-out". When the fan-in and fan-out are equal, then Glorot initialization is the same as LeCun initialization. === He initialization === As Glorot initialization performs poorly for ReLU activation, He initialization (or Kaiming initialization) was proposed by Kaiming He et al. for networks with ReLU activation. It samples each entry in W ( l ) {\displaystyle W^{(l)}} from N ( 0 , 2 / n l − 1 ) {\displaystyle {\mathcal {N}}(0,2/n_{l-1})} . === Orthogonal initialization === (Saxe et al. 2013) proposed orthogonal initialization: initializing weight matrices as uniformly random (according to the Haar measure) semi-orthogonal matrices, multiplied by a factor that depends on the activation function of the layer. It was designed so that if one initializes a deep linear network this way, then its training time until convergence is independent of depth. Sampling a uniformly random semi-orthogonal matrix can be done by initializing X {\displaystyle X} by IID sampling its entries from a standard normal distribution, then calculate ( X X ⊤ ) − 1 / 2 X {\displaystyle \left(XX^{\top }\right)^{-1/2}X} or its transpose, depending on whether X {\displaystyle X} is tall or wide. For CNN kernels with odd widths and heights, orthogonal initialization is done this way: initialize the central point by a semi-orthogonal matrix, and fill the other entries with zero. As an illustration, a kernel K {\displaystyle K} of shape 3 × 3 × c × c ′ {\displaystyle 3\times 3\times c\times c'} is initialized by filling K [ 2 , 2 , : , : ] {\displaystyle K[2,2,:,:]} with the entries of a random semi-orthogonal matrix of shape c × c ′ {\displaystyle c\times c'} , and the other entries with zero. (Balduzzi et al., 2017) used it with stride 1 and zero-padding. This is sometimes called the Orthogonal Delta initialization. Related to this approach, unitary initialization proposes to parameterize the weight matrices to be unitary matrices, with the result that at initialization they are random unitary matrices (and throughout training, they remain unitary). This is found to improve long-sequence modelling in LSTM. Orthogonal initialization has been generalized to layer-sequential unit-variance (LSUV) initialization. It is a data-dependent initialization method, and can be used in convolutional neural networks. It first initializes weights of each convolution or fully connected layer with orthonormal matrices. Then, proceeding from the first to the last layer, it runs a forward pass on a random minibatch, and divides the layer's weights by the standard deviation of its output, so that its output has variance approximately 1. === Fixup initialization === In 2015, the introduction of residual connections allowed very deep neural networks to be trained, much deeper than the ~20 layers of the previous state of the art (such as the VGG-19). Residual connections gave rise to their own weight initialization problems and strategies. These are sometimes called "normalization-free" methods, since using residual connection could stabilize the training of a deep neural network so much that normalizations become unnecessary. Fixup initialization is designed specifically for networks with residual connections and without batch normalization, as follows: Initialize the classification layer and the last layer of each residual branch to 0. Initialize every other layer using a standard method (such as He initialization), and scale only the weight layers inside residual branches by L − 1 2 m − 2 {\displaystyle L^{-{\frac {1}{2m-2}}}} . Add a scalar multiplier (initialized at 1) in every branch and a scalar bias (initialized at 0) before each convolution, linear, and element-wise activation layer. Similarly, T-Fixup initialization is designed for Transformers without layer normalization. === Others === Instead of initializing all weights with random values on the order of O ( 1 / n ) {\displaystyle O(1/{\sqrt {n}})} , sparse initialization initialized only a small subset of the weights with larger random values, and the other weights zero, so that the total variance is still on the order of O ( 1 ) {\displaystyle O(1)} . Random walk initialization was designed for MLP so that during backpropagation, the L2 norm of gradient at each layer performs an unbiased random walk as one moves from the last layer to the first. Looks linear initialization was designed to allow the neural network to behave like a deep linear network at initialization, since W R e L U ( x ) − W R e L U ( − x ) = W x {\displaystyle W\;\mathrm {ReLU} (x)-W\;\mathrm {ReLU} (-x)=Wx} . It initializes a matrix W {\displaystyle W} of shape R n 2 × m {\displaystyle \mathbb {R} ^{{\frac {n}{2}}\times m}} by any method, such as orthogonal initialization, t

    Read more →
  • GPT-5

    GPT-5

    GPT-5 is a multimodal large language model developed by OpenAI and the fifth in its series of generative pre-trained transformer (GPT) foundation models. Preceded in the series by GPT-4, it was launched on August 7, 2025. It is publicly accessible to users of the chatbot products ChatGPT and Microsoft Copilot as well as to developers through the OpenAI API. == Background == On April 14, 2023, Sam Altman, the chief executive officer of OpenAI, spoke at an event at the Massachusetts Institute of Technology and said that the company was not training GPT-5 at that time. He stated that OpenAI was "prioritizing GPT-4 development" and that "we are not and won't for some time" release GPT-5. On July 18, OpenAI filed for a "GPT-5" trademark in the United States. On November 13, Altman confirmed to the Financial Times that the company was working to develop GPT-5. According to The Information, "[f]or much of the second half of 2024, OpenAI was developing a model known internally as Orion and intended to become GPT-5", "[b]ut the Orion effort failed to produce a better model, and the company instead released it as GPT-4.5 in February [2025]." By late July 2025, OpenAI was widely anticipated as planning to release GPT-5 in early August. On July 30, The Verge reported that "Microsoft is getting ready for GPT-5" as "sources familiar with Microsoft's AI plans" told an editor that the company was testing a new mode for its Copilot chatbot that would offer a model that "thinks deeply or quickly based on the task". On August 5, in the leadup to the release of GPT-5, OpenAI released GPT-OSS, a set of two open-weight models that have reasoning capabilities. GPT-5 was then unveiled during a livestream event on August 7. == Capabilities == At the time of its release, GPT-5 had state-of-the-art performance on benchmarks that test mathematics, programming, finance, and multimodal understanding. According to OpenAI, improvements over its predecessor models include faster response times, better coding and writing skills, more accurate answers to health questions, and lower levels of hallucination. Also, compared to previous models, GPT-5 aims to give safe, high-level responses to potentially harmful queries rather than outright declining them, an approach that OpenAI refers to as "safe completions", aiming to result "in GPT-5 being able to refuse more unsafe questions, while offering fewer rejections to users seeking harmless information." In addition, GPT-5 was trained to give more critical, "less effusively agreeable" answers compared to its predecessor models. Days before the launch of GPT-5, two early testers of the model stated that they were "impressed" by its ability to code and to solve mathematical and scientific problems. They suggested that the model shows great improvement from GPT-4, but not as large of a gain as from GPT-3 to GPT-4. A day prior to the release of GPT-5, during a press briefing, Sam Altman, the chief executive officer of OpenAI, called GPT-5 "a significant step along the path to AGI", referring to artificial general intelligence, the hypothetical level of intelligence that OpenAI defines as the ability to perform any economically valuable task that a human can. According to Altman, GPT-5 is "significantly better" than its predecessors, offering "PhD-level" abilities across a wide range of tasks. The exact energy consumption of GPT-5 use has not been disclosed by OpenAI. Researchers at the University of Rhode Island estimated that a medium-length response consumes slightly over 18 watt-hours, equivalent to using an incandescent bulb for 18 minutes. === Architecture === GPT-5 is a system that contains a fast, high-throughput model, a deeper reasoning model, and a real-time router that decides which model to use based on conversation type, complexity, tool needs, and explicit user intent. Altman had previously criticized the manual model picker for being overly complex, suggesting a need for unification. GPT-5 also includes agentic functionality through which it can set up its own desktop and can use its browser to search autonomously for sources that relate to its task. The GPT-5 system card defines two fast, high-throughput models – gpt-5-main and gpt-5-main-mini – and two thinking models – gpt-5-thinking and gpt-5-thinking-mini. In the OpenAI API, developers can access the thinking model, its mini version, and gpt-5-thinking-nano, an even smaller and faster nano version of the thinking model. The version of GPT-5 that is accessible via the API has adjustable reasoning effort (low, medium, high, or minimal) and verbosity (low, medium, or high). Additionally, ChatGPT provides access to gpt-5-thinking with a setting that makes use of parallel test-time compute, referred to as gpt-5-thinking-pro. == Limitations == === Safety === Neuraltrust, a security research company, claimed to have successfully compromised GPT-5 within its first day of testing the model. According to its report, it enabled GPT-5 to generate detailed instructions for manufacturing explosive devices. SPLX, another company, conducted similar tests and came to similar conclusions about GPT-5's security. Their assessments suggest that GPT-5 has significant security gaps, potentially rendering it as being unsafe for use in a corporate environment. == Training == According to AIMultiple, GPT-5 is natively multimodal, meaning that it was trained from scratch on multiple modalities (like text and images) at once without relying on already-trained language or vision models. Its training process involved three stages: unsupervised pretraining, supervised fine-tuning, and reinforcement learning from human feedback. Pretraining used a large-scale multilingual dataset of books, articles, web pages, academic papers, and licensed sources. GPT-5's visual and text capabilities were described as having been developed alongside each other throughout training, unlike with GPT-4. == Use == GPT-5 is used in ChatGPT. Although GPT-5 is free for all ChatGPT users, Plus users get higher use limits while Pro users get unlimited access to GPT-5 as well as limited access to GPT-5 Pro. Standard limits for lower-tier users on responses per hour still apply. Additionally, with the introduction of GPT-5, ChatGPT's "Advanced Voice Mode" was replaced by "ChatGPT Voice", which is supposed to enable more natural-sounding conversations. OpenAI stated that "Standard Voice Mode retires on September 9, 2025, unifying all users on ChatGPT Voice". On November 24, 2025, the feature of shopping research was added to ChatGPT, claimed to be a mini model post-trained on gpt-5-thinking-mini. GPT-5 is also available in Microsoft Copilot, and Microsoft stated that it will incorporate GPT-5 into a wide variety of its products. According to 9to5Mac, Apple Inc. is planning to integrate the model into the Apple Intelligence feature in its iOS 26, iPadOS 26, and macOS Tahoe operating systems. It is also accessible via the OpenAI API. A number of American companies were reported as having received access to GPT-5 ahead of its launch. OpenAI stated that the private health insurance company Oscar Health was checking applications from its policyholders with the model. In addition, Uber was using GPT-5 for its customer support system; GitLab, Windsurf, and Cursor were using the model for software development; and the Spanish bank BBVA was using it for financial analysis. Other companies that OpenAI listed as having used GPT-5 pre-release include Amgen, Lowe's, and Notion. == Reception == === Critical reviews === Grace Huckins in MIT Technology Review found that, "[w]hereas o1 was a major technological advancement, GPT-5 is, above all else, a refined product." In response to claims that Sam Altman, the chief executive officer of OpenAI, had made about the model, she stated that "GPT-5 will furnish a more pleasant and seamless user experience. That's not nothing, but it falls far short of the transformative AI future that Altman has spent much of the past year hyping." In response to Altman's claim that GPT-5 is "a significant step along the path" to artificial general intelligence, she noted: "[M]aybe he's right—but if so, it's a very small step." In The Information, Stephanie Palazzolo praised GPT-5's coding capabilities. According to Matteo Wong in The Atlantic, GPT-5 "is intuitive, fast, and efficient; adapts to human preferences and intentions; and is easy to personalize." He stated: "At this stage of the AI boom, when every major chatbot is legitimately helpful in numerous ways, benchmarks, science, and rigor feel almost insignificant. What matters is how the chatbot feels [...]". John Herrman from the New York magazine wrote: "Casual users who encounter GPT-5 through ChatGPT aren't likely to feel like they're using a completely different product [...] while people who use it for software development or in a corporate context are more likely to notice a major change." Mashable's Christian de Looper found that "GPT-5

    Read more →
  • Ghana Post GPS

    Ghana Post GPS

    GhanaPostGPS is a web and smartphone application, sponsored by the government of Ghana and developed by Vokacom, to provide a digital addresses and postal codes for every 5 squared meter location in Ghana. The digital address is a composite of the postcode (region, district & area code) plus a unique address. GhanaPostGPS is the first digital addressing system created by the government of Ghana. GhanaPost GPS is a mandatory requirement for obtaining the National Identification Card and other services.

    Read more →
  • Computational photography

    Computational photography

    Computational photography refers to digital image capture and processing techniques that use digital computation instead of optical processes. Computational photography can improve the capabilities of a camera, or introduce features that were not possible at all with film-based photography, or reduce the cost or size of camera elements. Examples of computational photography include in-camera computation of digital panoramas, high-dynamic-range images, and light field cameras. Light field cameras use novel optical elements to capture three-dimensional scene information, which can then be used to produce 3D images, enhanced depth-of-field, and selective de-focusing (or "post focus"). Enhanced depth-of-field reduces the need for mechanical focusing systems. All of these features use computational imaging techniques. The definition of computational photography has evolved to cover a number of subject areas in computer graphics, computer vision, and applied optics. These areas are given below, organized according to a taxonomy proposed by Shree K. Nayar. Within each area is a list of techniques, and for each technique, one or two representative papers or books are cited. Deliberately omitted from the taxonomy are image processing (see also digital image processing) techniques applied to traditionally captured images to produce better images. Examples of such techniques are image scaling, dynamic range compression (i.e. tone mapping), color management, image completion (a.k.a. inpainting or hole filling), image compression, digital watermarking, and artistic image effects. Also omitted are techniques that produce range data, volume data, 3D models, 4D light fields, 4D, 6D, or 8D BRDFs, or other high-dimensional image-based representations. Epsilon photography is a sub-field of computational photography. == Effect on photography == Photos taken using computational photography can allow amateurs to produce photographs rivalling the quality of professional photographers, but as of 2019 do not outperform the use of professional-level equipment. == Computational illumination == This is controlling photographic illumination in a structured fashion, then processing the captured images, to create new images. The applications include image-based relighting, image enhancement, image deblurring, geometry/material recovery and so forth. High-dynamic-range imaging uses differently exposed pictures of the same scene to extend dynamic range. Other examples include processing and merging differently illuminated images of the same subject matter ("lightspace"). == Computational optics == This is a capture of optically coded images, followed by computational decoding to produce new images. Coded aperture imaging was mainly applied in astronomy and X-ray imaging to boost the image quality. Instead of a single pin-hole, a pinhole pattern is applied in imaging, and deconvolution is performed to recover the image. In coded exposure imaging, the on/off state of the shutter is coded to modify the kernel of motion blur. In this way, motion deblurring becomes a well-conditioned problem. Similarly, in a lens based coded aperture, the aperture can be modified by inserting a broadband mask. Thus, out of focus deblurring becomes a well-conditioned problem. The coded aperture can also improve the quality in light field acquisition using Hadamard transform optics. Coded aperture patterns can also be designed using color filters, in order to apply different codes at different wavelengths. This allows for increase the amount of light that reaches the camera sensor, compared to binary masks. == Computational imaging == Computational imaging is a set of imaging techniques that combine data acquisition and data processing to create the image of an object through indirect means to yield enhanced resolution, additional information such as optical phase or 3D reconstruction. The information is often recorded without using a conventional optical microscope configuration or with limited datasets. Computational imaging allows going beyond physical limitations of optical systems, such as numerical aperture, or even obliterates the need for optical elements. For parts of the optical spectrum where imaging elements such as objectives are difficult to manufacture or image sensors cannot be miniaturized, computational imaging provides useful alternatives, in fields such as X-ray and THz radiations. === Common techniques === Among common computational imaging techniques are lensless imaging, computational speckle imaging , ptychography and Fourier ptychography. Computational imaging technique often draws on compressive sensing or phase retrieval techniques, where the angular spectrum of the object is reconstructed. Other techniques are related to the field of computational imaging, such as digital holography, computer vision and inverse problems such as tomography. == Computational processing == This is the processing of non-optically-coded images to produce new images. == Computational sensors == These are detectors that combine sensing and processing, typically in hardware, like the oversampled binary image sensor. == Early work in computer vision == Although computational photography is a currently popular buzzword in computer graphics, many of its techniques first appeared in the computer vision literature, either under other names or within papers aimed at 3D shape analysis. == Art history == Computational photography, as an art form, has been practiced by capturing differently exposed pictures of the same subject matter and combining them. This was the inspiration for the development of the wearable computer in the 1970s and early 1980s. Computational photography was inspired by the work of Charles Wyckoff, and thus computational photography datasets (e.g. differently exposed pictures of the same subject matter that are taken in order to make a single composite image) are sometimes referred to as Wyckoff Sets, in his honor. Early work in this area (joint estimation of image projection and exposure value) was undertaken by Mann and Candoccia. Charles Wyckoff devoted much of his life to creating special kinds of 3-layer photographic films that captured different exposures of the same subject matter. A picture of a nuclear explosion, taken on Wyckoff's film, appeared on the cover of Life Magazine and showed the dynamic range from the dark outer areas to the inner core.

    Read more →
  • Visual temporal attention

    Visual temporal attention

    Visual temporal attention is a special case of visual attention that involves directing attention to specific instant of time. Similar to its spatial counterpart visual spatial attention, these attention modules have been widely implemented in video analytics in computer vision to provide enhanced performance and human interpretable explanation of deep learning models. As visual spatial attention mechanism allows human and/or computer vision systems to focus more on semantically more substantial regions in space, visual temporal attention modules enable machine learning algorithms to emphasize more on critical video frames in video analytics tasks, such as human action recognition. In convolutional neural network-based systems, the prioritization introduced by the attention mechanism is regularly implemented as a linear weighting layer with parameters determined by labeled training data. == Application in Action Recognition == Recent video segmentation algorithms often exploits both spatial and temporal attention mechanisms. Research in human action recognition has accelerated significantly since the introduction of powerful tools such as Convolutional Neural Networks (CNNs). However, effective methods for incorporation of temporal information into CNNs are still being actively explored. Motivated by the popular recurrent attention models in natural language processing, the Attention-aware Temporal Weighted CNN (ATW CNN) is proposed in videos, which embeds a visual attention model into a temporal weighted multi-stream CNN. This attention model is implemented as temporal weighting and it effectively boosts the recognition performance of video representations. Besides, each stream in the proposed ATW CNN framework is capable of end-to-end training, with both network parameters and temporal weights optimized by stochastic gradient descent (SGD) with back-propagation. Experimental results show that the ATW CNN attention mechanism contributes substantially to the performance gains with the more discriminative snippets by focusing on more relevant video segments. == Literature == Seibold VC, Balke J and Rolke B (2023): Temporal attention. Front. Cognit. 2:1168320. doi: 10.3389/fcogn.2023.1168320.

    Read more →
  • EXAPT

    EXAPT

    EXAPT (a portmanteau of "Extended Subset of APT") is a production-oriented programming language that allows users to generate NC programs with control information for machining tools and facilitates decision-making for production-related issues that may arise during various machining processes. EXAPT was first developed to address industrial requirements. Through the years, the company created additional software for the manufacturing industry. Today, EXAPT offers a suite of SAAS products and services for the manufacturing industry. The trade name, EXAPT, is most commonly associated with the CAD/CAM-System, production data, and tool management software of the German company EXAPT Systemtechnik GmbH based in Aachen, DE. == General == EXAPT is a modularly built programming system for all NC machining operations as Drilling Turning Milling Turn-Milling Nibbling Flame-, laser-, plasma- and water jet cutting Wire eroding Operations with industrial robots Due to the modular structure, the main product groups, EXAPTcam and EXAPTpdo, are gradually expandable and permit individual software for the manufacturing industry used individually and also in a compound with an existing IT environment. == Functionality == EXAPTcam meets the requirements for NC planning, especially for the cutting operations such as turning, drilling, and milling up to 5-axis simultaneous machining. Thereby new process technologies, tool, and machine concepts are constantly involved. In the NC programming data from different sources such as 3D CAD models, drawings or tables can flow in. The possibilities of NC programming reaches from language-oriented to feature-oriented NC programming. The integrated EXAPT knowledge database and intelligent and scalable automatisms support the user. The EXAPT NC planning also covers the generation of production information as clamping and tool plans, presetting data or time calculations. The realistic simulation possibilities of NC planning and NC control data provide with production reliability. EXAPTpdo (EXAPT ProductionsDataOrganization) provides a neutrally applicable technology platform for the information compound of the NC planning - to the shop floor. This applies to all NC production data that are necessary for the set-up of NC machines, for the provision, presetting, and stocking of manufacturing resources and provided by EXAPTpdo in a central database. Besides classical functions of the tool management system (TMS) as the management of cutting tools, measuring, testing and clamping devices the technology data management and tool lifecycle management (TLM) is also included. System-supported "where-used lists" helps to handle the manufacturing resource cycle by secured requirement determination and requirement fulfillment. Unnecessary transports and unplanned dispositive adjustments are dropped, stocks are reduced, set-up times reduced and the throughput is increased. EXAPTpdo synchronizes involved systems within the value chain. Stock systems, MES systems or ERP systems (e.g. from the purchasing or production areas) do not work in isolation from each other but they interact with each other. EXAPTpdo provides the base to Smart Factory, for more flexibility in production and faster communication. == History == With the foundation of the EXAPT-Verein in 1967 as spin-off of the universities Aachen, Berlin and Stuttgart the further development "EXAPT (EXtended Subset of APT)" of the programming language "APT (Automatically Programmed Tool)" was focused and so the first milestone for the EXAPT history was set. In the same year the system EXAPT 1 for drilling and simple milling tasks became available. 1969 The industrial application of EXAPT 2 for the programming of NC machines with 2-axis linear and path control begins. In the following year, the development of the EXAPT modular system starts. 1972 BASIC-EXAPT is provided for the universal, homogeneous programming of all NC tasks. The support is made by the EXAPT applications consultancy. 1973 EXAPT 1.1 is provided for the programming of straight-cut and continuous-path controlled drilling and milling machines and machining centers. At the Hanover Fair (IHA 73) the interactive access to a mainframe via a time-sharing terminal for the part program entry and correction is presented and starts the replacement of the punch card. 1974 The possibilities for the use of process computers for the NC data transfer are leveled out. EXAPT offers the possibility of the result simulation when using plotters with display of tool paths and tools in assignment to the workpiece. In April 1975, the EXAPT NC Systemtechnik GmbH was founded with the aim, of enabling entry into the NC technique for small and medium-sized companies by a complete product and service program. In the following year, the system portfolio is extended with further system modules and service programs and the provision of postprocessors. 1978 The development activities on the EXAPT module system started in 1970 are completed. Using modern software techniques, the different system parts BASIC-EXAPT, EXAPT 1, EXAPT 1.1, and EXAPT 2 are composed of a total system. System support and applications consultancy become a new working focus. From the beginning to the middle of the 1980s Beside new portable software modules for CAD/CAM applications (e. g. CAPEX, NESTEX, CADEX, CADCPL), the first version of the EXAPT DNC system and extensions of the EXAPT NC programming system for the machining of sculptured surfaces are presented. 1988 EXAPT expands the software product range by systems for tool data management (BMO) and production data management (FDO). EXAPT trains more than 1,300 course participants including company-specific courses. 1992 The first version of the completely new product generation EXAPTplus is presented and the agency in Dresden is opened. 1993 The company name "EXAPT NC Systemtechnik GmbH" is changed to "EXAPT Systemtechnik GmbH." EXAPTplus is presented on PC under Windows NT at the EMO '93. The decentralization of the use of EXAPT systems expands the range of applications. In the following year, EXAPT-DNC is executable under Windows on a customary PC. Special hardware is not needed and so it can be used in compound with the database-supported EXAPT production data management system (FDO). 1995 EXAPTplus is also ready for complex application cases such as machining of tubes at extrusion tools. EXAPT-CADI provides the transfer of 2D CAD data to EXAPTplus. With the new office Gießen the marketing is strengthened. In the following year the EXAPT NC editor is developed for the direct processing of NC control data with tool path display and visualization of the tools. In the course of the market entry of more comfortable 3D CAD systems for the solid modelling of components a detailed evaluation of current systems is made in 1997. It is decided to use SolidWorks as a reference system for the solid-oriented NC planning with EXAPT. 1998 The first solution for the transfer of geometry data between SolidWorks and EXAPTplus is generated. The EXAPT organization systems are (beside SQL) also executable under Oracle now. The use of client server solutions supports the data flow in the production. 1999 AFR functions are provided in connection with EXAPTsolid to support a workpiece modelling for NC. The millennium capability is ensured for all EXAPT systems. AFR is a ground-breaking for the integration of third-party products. 2002 EXAPT-BMG is developed for the generation and visualization of tools with additional functions for the assembly from components. The acquisition of tools with their geometric and technological presentation offers extensive support of the NC planning with EXAPT systems. 2003 EXAPTpdo is available to optimize the process chains in production planning and production execution optimally regarding the increasing requirements of changing production conditions. 2004 Diverse system extensions are made in EXAPTplus, EXAPTsolid, EXAPT NC editor, EXAPTpdo for the complete machining on turning/milling centres with result reliability because of more extensive simulation based on realNC (Tecnomatix), for the use of new complex tool systems and the compound use between ERP systems as SAP and intelligent CNC systems. In the following year, EXAPTpdo is extended for the cross-order set-up optimization and provision of manufacturing re-sources especially for single and small series production with connection to purchase and physical portfolio management. 2006 The EXAPT systems are available for extended use as an information platform for production, the time management, and similar requirements. EXAPTsolid is extended for the feature-oriented milling operation and machine simulation. The NC programming of complex machine tools, e.g. three-turret-turning/milling centers is supported by EXAPT systems, as well as the use of multi-functional tools. 2007 A module for 3-5-axis simultaneous milling machining is presented.

    Read more →
  • Natural language processing

    Natural language processing

    Natural language processing (NLP) is the processing of natural language information by a computer. NLP is a subfield of computer science and is closely associated with artificial intelligence. NLP is also related to information retrieval, knowledge representation, computational linguistics, and linguistics more broadly. Major processing tasks in an NLP system include: speech recognition, text classification, natural language understanding, and natural language generation. == History == Natural language processing has its roots in the 1950s. Already in 1950, Alan Turing published an article titled "Computing Machinery and Intelligence," which proposed what is now called the Turing test as a criterion of intelligence, though at the time that was not articulated as a problem separate from artificial intelligence. The proposed test includes a task that involves the automated interpretation and generation of natural language. === Symbolic NLP (1950s – early 1990s) === The premise of symbolic NLP is often illustrated using John Searle's Chinese room thought experiment: Given a collection of rules (e.g., a Chinese phrasebook, with questions and matching answers), the computer emulates natural language understanding (or other NLP tasks) by applying those rules to the data it confronts. 1950s: The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. The authors claimed that within three or five years, machine translation would be a solved problem. However, real progress was much slower, and after the ALPAC report in 1966, which found that ten years of research had failed to fulfill the expectations, funding for machine translation was dramatically reduced. Little further research in machine translation was conducted in America (though some research continued elsewhere, such as Japan and Europe) until the late 1980s when the first statistical machine translation systems were developed. 1960s: Some notably successful natural language processing systems developed in the 1960s were SHRDLU, a natural language system working in restricted "blocks worlds" with restricted vocabularies, and ELIZA, a simulation of Rogerian psychotherapy, written by Joseph Weizenbaum between 1964 and 1966. Despite using minimal information about human thought or emotion, ELIZA was able to produce interactions that appeared human-like. When the "patient" exceeded the very small knowledge base, ELIZA might provide a generic response, for example, responding to "My head hurts" with "Why do you say your head hurts?". Ross Quillian's successful work on natural language was demonstrated with a vocabulary of only twenty words, because that was all that would fit in a computer memory at the time. 1970s: During the 1970s, many programmers began to write "conceptual ontologies", which structured real-world information into computer-understandable data. Examples are MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert 1981). During this time, the first chatterbots were written (e.g., PARRY). 1980s: The 1980s and early 1990s mark the heyday of symbolic methods in NLP. Focus areas of the time included research on rule-based parsing (e.g., the development of HPSG as a computational operationalization of generative grammar), morphology (e.g., two-level morphology), semantics (e.g., Lesk algorithm), reference (e.g., within Centering Theory) and other areas of natural language understanding (e.g., in the Rhetorical Structure Theory). Other lines of research were continued, e.g., the development of chatterbots with Racter and Jabberwacky. An important development (that eventually led to the statistical turn in the 1990s) was the rising importance of quantitative evaluation in this period. === Statistical NLP (1990s–present) === Up until the 1980s, most natural language processing systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in natural language processing with the introduction of machine learning algorithms for language processing. This shift was influenced by increasing computational power (see Moore's law) and a decline in the dominance of Chomskyan linguistic theories (e.g. transformational grammar), whose theoretical underpinnings discouraged the sort of corpus linguistics that underlies the machine-learning approach to language processing. 1990s: Many of the notable early successes in statistical methods in NLP occurred in the field of machine translation, due especially to work at IBM Research, such as IBM alignment models. These systems were able to take advantage of existing multilingual textual corpora that had been produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. However, many systems relied on corpora that were specifically developed for the tasks they were designed to perform. This reliance has been a major limitation to their broader effectiveness and continues to affect similar systems. Consequently, significant research has focused on methods for learning effectively from limited amounts of data. 2000s: With the growth of the web, increasing amounts of raw (unannotated) language data have become available since the mid-1990s. Research has thus increasingly focused on unsupervised and semi-supervised learning algorithms. Such algorithms can learn from data that has not been hand-annotated with the desired answers or using a combination of annotated and non-annotated data. Generally, this task is much more difficult than supervised learning, and typically produces less accurate results for a given amount of input data. However, large quantities of non-annotated data are available (including, among other things, the entire content of the World Wide Web), which can often make up for the worse efficiency if the algorithm used has a low enough time complexity to be practical. 2003: word n-gram model, at the time the best statistical algorithm, is outperformed by a multi-layer perceptron (with a single hidden layer and context length of several words, trained on up to 14 million words, by Bengio et al.) 2010: Tomáš Mikolov (then a PhD student at Brno University of Technology) with co-authors applied a simple recurrent neural network with a single hidden layer to language modeling, and in the following years he went on to develop Word2vec. In the 2010s, representation learning and deep neural network-style (featuring many hidden layers) machine learning methods became widespread in natural language processing. This shift gained momentum due to results showing that such techniques can achieve state-of-the-art results in many natural language tasks, e.g., in language modeling and parsing. This is increasingly important in medicine and healthcare, where NLP helps analyze notes and text in electronic health records that would otherwise be inaccessible for study when seeking to improve care or protect patient privacy. == Approaches: Symbolic, statistical, neural networks == Symbolic approach, i.e., the hand-coding of a set of rules for manipulating symbols, coupled with a dictionary lookup, was historically the first approach used both by AI in general and by NLP in particular: such as by writing grammars or devising heuristic rules for stemming. Machine learning approaches, which include both statistical and neural networks, on the other hand, have many advantages over the symbolic approach: both statistical and neural network methods tend to focus more on the most common cases extracted from a corpus of texts, whereas the rule-based approach needs to provide rules for both rare and common cases equally. language models, produced by either statistical or neural networks methods, are more robust to both unfamiliar (e.g. containing words or structures that have not been seen before) and erroneous input (e.g. with misspelled words or words accidentally omitted) in comparison to the rule-based systems, which are also more costly to produce. the larger such a (probabilistic) language model is, the more accurate it becomes, in contrast to rule-based systems that can gain accuracy only by increasing the amount and complexity of the rules leading to intractability problems. Rule-based systems are commonly used: when the amount of training data is insufficient to successfully apply machine learning methods, e.g., for the machine translation of low-resource languages such as provided by the Apertium system, for preprocessing in NLP pipelines, e.g., tokenization, or for post-processing and transforming the output of NLP pipelines, e.g., for knowledge extraction from syntactic parses. === Statistical approach === In the late 1980s and mid-1990s, the statistical approach ended a peri

    Read more →
  • Language model benchmark

    Language model benchmark

    A language model benchmark is a standardized test designed to evaluate the performance of language models on various natural language processing tasks. These tests are intended for comparing different models' capabilities in areas such as language understanding, generation, and reasoning. Benchmarks generally consist of a dataset and corresponding evaluation metrics. The dataset provides text samples and annotations, while the metrics measure a model's performance on tasks like answering questions, text classification, and machine translation. These benchmarks are developed and maintained by academic institutions, research organizations, and industry players to track progress in the field. In addition to accuracy, the metrics can include throughput, energy efficiency, bias, trust, and sustainability. == Overview == === Types === Benchmarks may be described by the following adjectives, not mutually exclusive: Classical: These tasks are studied in natural language processing, even before the advent of deep learning. Examples include the Penn Treebank for testing syntactic and semantic parsing, as well as bilingual translation benchmarked by BLEU scores. Question answering: These tasks have a text question and a text answer, often multiple-choice. They can be open-book or closed-book. Open-book QA resembles reading comprehension questions, with relevant passages included as annotation in the question, in which the answer appears. Closed-book QA includes no relevant passages. Closed-book QA is also called open-domain question-answering. Before the era of large language models, open-book QA was more common, and understood as testing information retrieval methods. Closed-book QA became common since GPT-2 as a method to measure knowledge stored within model parameters. Omnibus: An omnibus benchmark combines many benchmarks, often previously published. It is intended as an all-in-one benchmarking solution. Reasoning: These tasks are usually in the question-answering format, but are intended to be more difficult than standard question answering. Multimodal: These tasks require processing not only text, but also other modalities, such as images and sound. Examples include OCR and transcription. Agency: These tasks are for a language-model–based software agent that operates a computer for a user, such as editing images, browsing the web, etc. Adversarial: A benchmark is "adversarial" if the items in the benchmark are picked specifically so that certain models do badly on them. Adversarial benchmarks are often constructed after state of the art (SOTA) models have saturated (achieved 100% performance) a benchmark, to renew the benchmark. A benchmark is "adversarial" only at a certain moment in time, since what is adversarial may cease to be adversarial as newer SOTA models appear. Public/Private: A benchmark might be partly or entirely private, meaning that some or all of the questions are not publicly available. The idea is that if a question is publicly available, then it might be used for training, which would be "training on the test set" and invalidate the result of the benchmark. Usually, only the guardians of the benchmark have access to the private subsets, and to score a model on such a benchmark, one must send the model weights, or provide API access, to the guardians. The boundary between a benchmark and a dataset is not sharp. Generally, a dataset contains three "splits": training, test, and validation. Both the test and validation splits are essentially benchmarks. In general, a benchmark is distinguished from a test/validation dataset in that a benchmark is typically intended to be used to measure the performance of many different models that are not trained specifically for doing well on the benchmark, while a test/validation set is intended to be used to measure the performance of models trained specifically on the corresponding training set. In other words, a benchmark may be thought of as a test/validation set without a corresponding training set. Conversely, certain benchmarks may be used as a training set, such as the English Gigaword or the One Billion Word Benchmark, which in modern language is just the negative log-likelihood loss on a pretraining set with 1 billion words. Indeed, the distinction between benchmark and dataset in language models became sharper after the rise of the pretraining paradigm, whereby a model is first trained on massive, unlabeled datasets to learn general language patterns, syntax, and knowledge (pretraining), and the base model is then adapted to specific, downstream tasks using smaller, labeled datasets (fine-tuning). === Lifecycle === Generally, the life cycle of a benchmark consists of the following steps: Inception: A benchmark is published. It can be simply given as a demonstration of the power of a new model (implicitly) that others then picked up as a benchmark, or as a benchmark that others are encouraged to use (explicitly). Growth: More papers and models use the benchmark, and the performance on the benchmark grows. Maturity, degeneration or deprecation: A benchmark may be saturated, after which researchers move on to other benchmarks. Progress on the benchmark may also be neglected as the field moves to focus on other benchmarks. Renewal: A saturated benchmark can be upgraded to make it no longer saturated, allowing further progress. === Construction === Like datasets, benchmarks are typically constructed by several methods, individually or in combination: Web scraping: Ready-made question-answer pairs may be scraped online, such as from websites that teach mathematics and programming. Conversion: Items may be constructed programmatically from scraped web content, such as by blanking out named entities from sentences, and asking the model to fill in the blank. This was used for making the CNN/Daily Mail Reading Comprehension Task. Crowd sourcing: Items may be constructed by paying people to write them, such as on Amazon Mechanical Turk. This was used for making the MCTest. === Evaluation === Generally, benchmarks are fully automated. This limits the questions that can be asked. For example, with mathematical questions, "proving a claim" would be difficult to automatically check, while "calculate an answer with a unique integer answer" would be automatically checkable. With programming tasks, the answer can generally be checked by running unit tests, with an upper limit on runtime. The benchmark scores are of the following kinds: For multiple choice or cloze questions, common scores are accuracy (frequency of correct answer), precision, recall, F1 score, etc. pass@n: The model is given n {\displaystyle n} attempts to solve each problem. If any attempt is correct, the model earns a point. The pass@n score is the model's average score over all problems. k@n: The model makes n {\displaystyle n} attempts to solve each problem, but only k {\displaystyle k} attempts out of them are selected for submission. If any submission is correct, the model earns a point. The k@n score is the model's average score over all problems. cons@n: The model is given n {\displaystyle n} attempts to solve each problem. If the most common answer is correct, the model earns a point. The cons@n score is the model's average score over all problems. Here "cons" stands for "consensus" or "majority voting". The pass@n score can be estimated more accurately by making N > n {\displaystyle N>n} attempts, and use the unbiased estimator 1 − ( N − c n ) ( N n ) {\displaystyle 1-{\frac {\binom {N-c}{n}}{\binom {N}{n}}}} , where c {\displaystyle c} is the number of correct attempts. For less well-formed tasks, where the output can be any sentence, there are the following commonly used scores including BLEU ROUGE, METEOR, NIST, word error rate, LEPOR, CIDEr, and SPICE. === Issues === error: Some benchmark answers may be wrong. ambiguity: Some benchmark questions may be ambiguously worded. subjective: Some benchmark questions may not have an objective answer at all. This problem generally prevents creative writing benchmarks. Similarly, this prevents benchmarking writing proofs in natural language, though benchmarking proofs in a formal language is possible. open-ended: Some benchmark questions may not have a single answer of a fixed size. This problem generally prevents programming benchmarks from using more natural tasks such as "write a program for X", and instead uses tasks such as "write a function that implements specification X". inter-annotator agreement: Some benchmark questions may be not fully objective, such that even people would not agree with 100% on what the answer should be. This is common in natural language processing tasks, such as syntactic annotation. shortcut: Some benchmark questions may be easily solved by an "unintended" shortcut. For example, in the SNLI benchmark, having a negative word like "not" in the second sentence is a strong signal for the "Contradiction" category, regardless of what the se

    Read more →
  • Pronunciation assessment

    Pronunciation assessment

    Automatic pronunciation assessment uses computer speech recognition to determine how accurately speech has been pronounced, instead of relying on a human instructor or proctor. It is also called speech verification, pronunciation evaluation, and pronunciation scoring. This technology is used to grade speech quality, for language testing, for computer-aided pronunciation teaching (CAPT) in computer-assisted language learning (CALL), for speaking skill remediation, and for accent reduction. Pronunciation assessment is different from dictation or automatic transcription, because instead of determining unknown speech, it verifies learners' pronunciation of known word(s), often from prior transcription of the same utterance; ideally scoring the intelligibility of the learners' speech. Sometimes pronunciation assessment evaluates the prosody of the learners' speech, such as intonation, pitch, tempo, rhythm, and syllable and word stress, although those are usually not essential for being understood in most languages. Pronunciation assessment is also used in reading tutoring, for example in products from Google, Microsoft, and Amira Learning. Automatic pronunciation assessment can also be used to help diagnose and treat speech disorders such as apraxia. == Intelligibility == Intelligibility refers to how well a learner's utterance is understood by a listener, rather than how much it sounds like a native speaker. This is separate from measures of fluency, such as so-called "Goodness of Pronunciation" (GoP) scores, which estimate how closely an utterance aligns with those of native speakers. Intelligibility is widely regarded as the most important communicative goal in pronunciation teaching and assessment. For example, in the Common European Framework of Reference for Languages (CEFR) assessment criteria for "overall phonological control", intelligibility outweighs formally correct pronunciation at all levels. Studies in applied linguistics have shown that accent reduction does not always increase intelligibility because listeners can often comprehend heavily accented speech without difficulty. Pronunciation assessment systems often rely on acoustic methods such as GoP which compare learner speech to reference models to produce phoneme-level scores, which are in turn aggregated to produce word and phrase scores. While these methods are effective for identifying deviations from native speakers' utterances, they do not effectively measure how understandable speech is to human listeners. Intelligibility is influenced by broader linguistic and contextual factors such as stress placement, speech rate, and coarticulation, which are not represented in purely segmental scores. The earliest work on pronunciation assessment avoided measuring genuine listener intelligibility, a shortcoming corrected in 2011 at the Toyohashi University of Technology, and included in the Versant high-stakes English fluency assessment from Pearson and mobile apps from 17zuoye Education & Technology, but still missing in 2023 products from Google Search, Microsoft, Educational Testing Service, Speechace, and ELSA. Assessing authentic listener intelligibility is essential for avoiding inaccuracies from accent bias, especially in high-stakes assessments; from words with multiple correct pronunciations; and from phoneme coding errors in machine-readable pronunciation dictionaries. In 2022, researchers found that some newer speech-to-text systems, based on end-to-end reinforcement learning to map audio signals directly into words, produce word and phrase confidence scores (from 10-25ms audio frame logit aggregation) closely correlated with genuine listener intelligibility. Others have been able to assess intelligibility using Levenshtein or dynamic time warping distance measures from Wav2Vec2 representation of good speech. Further work through 2025 has focused specifically on measuring intelligibility. A 2025 study of 42 pronunciation and speech coaching apps (32 mobile and 10 web) found that none offered intelligibility assessment. Instead, most provided only segmental and accent-focused scoring. About two-thirds of the apps provided some form of specific pronunciation feedback, usually with phonetic transcriptions, but accompanied by visual cues (such as animations of the vocal tract or the lips and tongue from the front) in only about 5% of the apps. Less than a third provided feedback on learner perception of exemplar speech. == Evaluation == Although there are as yet no industry-standard benchmarks for evaluating pronunciation assessment accuracy, researchers occasionally release evaluation speech corpuses for others to use for improving assessment quality. Such evaluation databases often emphasize formally unaccented pronunciation to the exclusion of genuine intelligibility evident from blinded listener transcriptions. As of mid-2025, state of the art approaches for automatically transcribing phonemes typically achieve an error rate of about 10% from known good speech. The International Speech Communication Association (ISCA) 2025 Workshop on Speech and Language Technology in Education (SLaTE) administered a Speak & Improve Challenge: Spoken Language Assessment and Feedback, introducing benchmarks for evaluating pronunciation assessment and remediation systems across languages, accents, and learner populations. The challenge emphasized cross-lingual generalization and alignment with human intelligibility judgments, for more robust and interpretable assessment systems. Ethical issues in pronunciation assessment are present in both human and automatic methods. Authentic validity, fairness, and mitigating bias in evaluation are all crucial. Diverse speech data should be included in automatic pronunciation assessment models. Combining human judgments, especially blinded transcriptions from a wide diversity of listeners, with automated feedback can improve accuracy and fairness. Second language learners benefit substantially from their use of widely available speech recognition systems for dictation, virtual assistants, and AI chatbots. In such systems, users naturally try to correct their own errors evident in speech recognition results that they notice. Such use improves their grammar and vocabulary development along with their pronunciation skills. The extent to which explicit pronunciation assessment and remediation approaches improve on such self-directed interactions remains an open question. Similarly, automatic dictation results have been shown to reflect intelligibility about as well as human scorers. == Recent developments == During 2021–22, a smartphone-based CAPT system was used to sense articulation through both audible and inaudible signals, providing feedback at the phoneme level. Some promising areas for improvement which were being developed in 2024 include articulatory feature extraction and transfer learning to suppress unnecessary corrections. Other interesting advances under development include "augmented reality" interfaces for mobile devices using optical character recognition to provide pronunciation training on text found in user environments. In 2024, audio multimodal large language models were first described as assessing pronunciation. That work has been carried forward by other researchers in 2025 who report positive results. Subsequently, researchers demonstrated pronunciation scoring by providing a language model with textual descriptions of speech, including the speech-to-text transcript, phoneme sequences, pauses, and phoneme sequence matching; this approach can achieve performance similar to multimodal LLMs that analyze raw audio while avoiding their higher computational cost. In 2025, the Duolingo English Test authors published a description of their pronunciation assessment method, purportedly built to measure intelligibility rather than accent imitation. While achieving a correlation of 0.82 with expert human ratings, very close to inter-rater agreement and outperforming alternative methods, the method is nonetheless based on experts' scores along the six-point CEFR common reference levels scale, instead of actual blinded listener transcriptions. Further promising work in 2025 includes assessment feedback aligning learner speech to synthetic utterances using interpretable features, identifying continuous spans of words for remediation feedback; synthesizing corrected speech matching learners' self-perceived voices, which they prefer and imitate more accurately as corrections; and streaming such interactions. On January 21, 2026, Educational Testing Service's TOEFL iBT high-stakes English language test, required by US university admissions and employers from English as a foreign language applicants more often than all other internet-based tests combined, changed its speaking assessments. While official rubrics claim that the new scoring will be based primarily on intelligibility, the new test's technical description indicates that it ju

    Read more →
  • Mobile cloud storage

    Mobile cloud storage

    Mobile cloud storage is a form of cloud storage that is accessible on mobile devices such as laptops, tablets, and smartphones. Mobile cloud storage providers offer services that allow the user to create and organize files, folders, music, and photos, similar to other cloud computing models. Services are used by both individuals and companies. Most cloud file storage providers offer limited free use but charge for additional storage once the free limit is exceeded. These costs are usually charged as a monthly subscription rate and have different rates depending on the amount of storage desired. In 2018, cloud services revenue was about $182.4 billion and in 2022 it is projected to grow to $331.2 billion. The cloud storage industry was projected to grow 17.2 percent in 2019 (Costello, 2019). == History == The concept of cloud computing trace back to 1960s, when the groundwork for modern internet and network technologies was being laid (Human for humans, 2024). One of the pivotal figures in this early period was J.C.R. Licklider, a visionary computer scientist who worked on ARPANET, the precursor to the internet. Licklider's ideas set the stage for the development of distributed computing systems, which are fundamental to cloud computing. Moving into the 1990s, AT&T introduced PersonaLink Services, a more advanced online platform offering electronic mail and online storage. Major turning point in 2006 The launch of Amazon Web Services (AWS) in 2006 marked a major turning point. AWS introduced Amazon S3 (Simple Storage Service), which allowed businesses and developers to store and retrieve any amount of data, at any time, from anywhere on the web. This development was revolutionary, providing scalable, reliable, and low-cost data storage infrastructure that transformed how organizations managed their data. == Applications == Some mobile device manufacturers include mobile cloud storage apps with their product. These apps facilitate synchronization of user files across multiple platforms. Part of the process for setting up new mobile devices frequently includes configuring a cloud storage service to Backup the device's files and information. Apple iOS devices come pre-loaded and configured to use Apple's mobile cloud storage service iCloud. Google offers a similar feature with the Android operating system by backing up the device using a Google Drive account. The Samsung Galaxy smartphone has partnered with Dropbox, while Microsoft similarly offers Microsoft OneDrive. Some mobile cloud storage apps are platform-independent. For example, Nasuni's Mobile Access app is available on any Android or iOS device. Most companies offering Cloud Storage have secure website to access files allowing use on any device that can browse the Internet.

    Read more →