AI For Business For Dummies

AI For Business For Dummies — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Vision transformer


    A vision transformer (ViT) is a transformer designed for computer vision. A ViT decomposes an input image into a series of patches (rather than text into tokens), serializes each patch into a vector, and maps it to a smaller dimension with a single matrix multiplication. These vector embeddings are then processed by a transformer encoder as if they were token embeddings. ViTs were designed as alternatives to convolutional neural networks (CNNs) in computer vision applications. They differ from CNNs in inductive biases, training stability, and data efficiency. Compared to CNNs, ViTs are less data efficient, but have higher capacity. Some of the largest modern computer vision models are ViTs, such as one with 22B parameters. After the original ViT was published, many variants were proposed, including hybrid architectures that combine features of ViTs and CNNs. ViTs have found application in image recognition, image segmentation, weather prediction, and autonomous driving.

    == History ==

    Transformers were introduced in "Attention Is All You Need" (2017), and have found widespread use in natural language processing. A 2019 paper applied ideas from the Transformer to computer vision. Specifically, its authors started with a ResNet, a standard convolutional neural network used for computer vision, and replaced all convolutional kernels with the self-attention mechanism found in a Transformer. This resulted in superior performance. However, the resulting model is not a Vision Transformer. In 2020, an encoder-only Transformer was adapted for computer vision, yielding the ViT, which reached state of the art in image classification, overcoming the previous dominance of CNNs. The masked autoencoder (2022) extended ViT to work with unsupervised training. The vision transformer and the masked autoencoder, in turn, stimulated new developments in convolutional neural networks. Subsequently, there was cross-fertilization between the previous CNN approach and the ViT approach.
In 2021, some important variants of the Vision Transformer were proposed. These variants are mainly intended to be more efficient, more accurate or better suited to a specific domain. Two studies improved the efficiency and robustness of ViT by adding a CNN as a preprocessor. The Swin Transformer achieved state-of-the-art results on some object detection datasets such as COCO, by using convolution-like sliding windows of attention and the pyramid process from classical computer vision.

== Overview ==

The basic architecture, used by the original 2020 paper, is as follows. In summary, it is a BERT-like encoder-only Transformer. The input image is of type $\mathbb{R}^{H \times W \times C}$, where $H, W, C$ are height, width, and channel (RGB). It is then split into square-shaped patches of type $\mathbb{R}^{P \times P \times C}$. Each patch is pushed through a linear operator to obtain a vector ("patch embedding"). The position of the patch is also transformed into a vector by "position encoding" (the paper tried no embedding, 1D embedding, 2D embedding, and relative embedding; 1D was adopted). The two vectors are added, then pushed through several Transformer encoders. The attention mechanism in a ViT repeatedly transforms the representation vectors of image patches, incorporating more and more semantic relations between the patches of an image. This is analogous to natural language processing, where representation vectors flowing through a transformer incorporate more and more semantic relations between words, from syntax to semantics. The above architecture turns an image into a sequence of vector representations. To use these for downstream applications, an additional head needs to be trained to interpret them. For example, to use it for classification, one can add a shallow MLP on top of it that outputs a probability distribution over classes.
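The patch-embedding step described in the Overview can be sketched in a few lines of plain Python. This is an illustrative toy, not the original paper's code: the function names, the tiny sizes (an 8 x 8 RGB image, 4 x 4 patches, embedding width 5) and the random initial values are all assumptions chosen so the example runs without any libraries.

```python
import random

def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened P*P*C vectors."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = []
            for row in range(top, top + patch):
                for col in range(left, left + patch):
                    flat.extend(image[row][col])  # append the C channel values
            patches.append(flat)
    return patches

def embed_patches(patches, weight, positions):
    """Project each flattened patch with one matrix, then add its position vector."""
    embedded = []
    for index, flat in enumerate(patches):
        # weight has one row per output dimension, so each dot product is one output
        projected = [sum(x * w for x, w in zip(flat, row)) for row in weight]
        embedded.append([p + q for p, q in zip(projected, positions[index])])
    return embedded

rng = random.Random(0)
H, W, C, P, D = 8, 8, 3, 4, 5  # toy sizes, assumed for illustration only
image = [[[rng.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
weight = [[rng.gauss(0, 0.02) for _ in range(P * P * C)] for _ in range(D)]
patches = patchify(image, P)
# One position vector per patch index, standing in for a learned 1D position embedding.
positions = [[rng.gauss(0, 0.02) for _ in range(D)] for _ in patches]
tokens = embed_patches(patches, weight, positions)
print(len(tokens), len(tokens[0]))  # 4 5
```

In a trained model the projection matrix and the position vectors would be learned parameters; here they are random stand-ins, and the resulting `tokens` list is what a Transformer encoder would consume.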
The original paper uses a linear-GeLU-linear-softmax network.

== Variants ==

=== Original ViT ===

The original ViT was an encoder-only Transformer trained with supervision to predict the image label from the patches of the image. As in the case of BERT, it uses a special token on the input side, and the corresponding output vector is used as the only input of the final output MLP head. The special token is an architectural hack to allow the model to compress all information relevant for predicting the image label into one vector. Transformers found their initial applications in natural language processing tasks, as demonstrated by language models such as BERT and GPT-3. By contrast, the typical image processing system uses a convolutional neural network (CNN). Well-known projects include Xception, ResNet, EfficientNet, DenseNet, and Inception. Transformers measure the relationships between pairs of input tokens (words in the case of text strings), termed attention. The cost is quadratic in the number of tokens. For images, the basic unit of analysis is the pixel. However, computing relationships for every pixel pair in a typical image is prohibitive in terms of memory and computation. Instead, ViT computes relationships among small sections of the image (e.g., 16x16 pixels), at a drastically reduced cost. The sections (with positional embeddings) are placed in a sequence. The embeddings are learnable vectors. Each section is arranged into a linear sequence and multiplied by the embedding matrix. The result, together with the position embedding, is fed to the transformer.

=== Architectural improvements ===

==== Pooling ====

After the ViT processes an image, it produces some embedding vectors. These must be converted to a single class probability prediction by some kind of network. The original ViT and the Masked Autoencoder used a dummy [CLS] token, in emulation of the BERT language model.
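The step of collapsing the encoder's output vectors into one vector for the classifier can be illustrated with two simple readouts: taking the vector at the special-token position, or averaging every output vector. This is a minimal sketch in plain Python; the function names and the toy values are hypothetical, and a real model would pass the result on to a classification head.

```python
def cls_readout(outputs):
    """Return the output vector at position 0, where the special token sits."""
    return list(outputs[0])

def mean_readout(outputs):
    """Return the element-wise average of all output vectors."""
    count = len(outputs)
    return [sum(column) / count for column in zip(*outputs)]

# Hypothetical encoder outputs: three vectors of width 2.
outputs = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
print(cls_readout(outputs))   # [1.0, 0.0]
print(mean_readout(outputs))  # [3.0, 2.0]
```

The first readout depends on a dedicated token being present in the input sequence, while the second works on the patch outputs alone.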
The output at [CLS] is the classification token, which is then processed by a LayerNorm-feedforward-softmax module into a probability distribution. Global average pooling (GAP) does not use the dummy token, but simply takes the average of all output tokens as the classification token. It was mentioned in the original ViT as being equally good. Multihead attention pooling (MAP) applies a multiheaded attention block to pooling. Specifically, it takes as input a list of vectors $x_1, x_2, \dots, x_n$, which might be thought of as the output vectors of a layer of a ViT. The output from MAP is $\mathrm{MultiheadedAttention}(q, V, V)$, where $q$ is a trainable query vector, and $V$ is the matrix whose rows are $x_1, x_2, \dots, x_n$. This was first proposed in the Set Transformer architecture. Later papers demonstrated that GAP and MAP both perform better than BERT-like pooling. A variant of MAP was proposed as class attention, which applies MAP, then feedforward, then MAP again. Re-attention was proposed to allow training deep ViT. It changes the multiheaded attention module.

=== Masked Autoencoder ===

The Masked Autoencoder took inspiration from denoising autoencoders and context encoders. It has two ViTs put end-to-end. The first one ("encoder") takes in image patches with positional encoding, and outputs vectors representing each patch. The second one (called "decoder", even though it is still an encoder-only Transformer) takes in vectors with positional encoding and outputs image patches again.

==== Training ====

During training, input images (224 px × 224 px in the original implementation) are split along a designated number of lines on each axis, producing image patches.
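For a 224 px input like the one mentioned above, the size of the patch grid and the number of pairwise attention scores can be worked out directly. The sketch below assumes 16-pixel patches (the example size used earlier for ViT sections) and hypothetical helper names; it only does arithmetic and does not model a network.

```python
def patch_grid(image_side, patch_side):
    """Return (patches per axis, total patches) for a square image."""
    if image_side % patch_side:
        raise ValueError("image side must be divisible by the patch side")
    per_axis = image_side // patch_side
    return per_axis, per_axis * per_axis

def attention_pairs(num_tokens):
    """Self-attention scores every token against every token."""
    return num_tokens * num_tokens

per_axis, num_patches = patch_grid(224, 16)  # 16 is an assumed patch size
pixel_tokens = 224 * 224                     # one token per pixel, for comparison

print(per_axis, num_patches)          # 14 196
print(attention_pairs(num_patches))   # 38416
print(attention_pairs(pixel_tokens))  # 2517630976
```

Under these assumptions, treating patches rather than pixels as tokens cuts the number of attention scores per layer by a factor of 65,536, which is the quadratic saving the text refers to.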
A certain percentage of patches are selected to be masked out by mask tokens, while all others are retained in the image. The network is tasked with reconstructing the image from the remaining unmasked patches. Mask tokens in the original implementation are learnable vector quantities. A linear projection with positional embeddings is then applied to the vector of unmasked patches. Experiments varying the mask ratio on networks trained on the ImageNet-1K dataset found that a 75% mask ratio achieved high performance on both fine-tuning and linear probing of the encoder's latent space. The MAE processes only unmasked patches during training, increasing the efficiency of data processing in the encoder and lowering the memory usage of the transformer. A less computationally intensive ViT is used for the decoder in the original implementation of the MAE. Masked patches are added back to the output of the encoder block as mask tokens and both are fed into the decoder. A reconstruction loss is computed for the masked patches to assess network performance.

==== Prediction ====

In prediction, the decoder architecture is discarded entirely. The input image is split into patches by the same algorithm as in training, but no patches are masked out. A linear projection wi
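The masking bookkeeping used in Masked Autoencoder training can be sketched as follows. This is a hedged illustration only: the function names, the use of Python's `random` module for sampling, the string stand-in for the learnable mask token, and the tuple stand-ins for encoder outputs are all assumptions, not the original implementation.

```python
import random

def split_visible_and_masked(num_patches, mask_ratio, seed=0):
    """Randomly choose which patch indices are masked and which stay visible."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked

def decoder_input(encoded_visible, visible, num_patches, mask_token):
    """Put encoder outputs back at their positions; fill the rest with the mask token."""
    sequence = [mask_token] * num_patches
    for vector, index in zip(encoded_visible, visible):
        sequence[index] = vector
    return sequence

visible, masked = split_visible_and_masked(num_patches=196, mask_ratio=0.75)
# Stand-in for the encoder: it only ever sees the visible patches.
encoded_visible = [("encoded", index) for index in visible]
sequence = decoder_input(encoded_visible, visible, 196, mask_token="[MASK]")
print(len(visible), len(masked), len(sequence))  # 49 147 196
```

With a 75% mask ratio the encoder in this sketch processes 49 of 196 patches, which is where the efficiency gain described above comes from; the reconstruction loss would then be computed only at the masked positions.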

  • LG ThinQ


    LG ThinQ (pronounced as "think-cue"; sometimes known as LG webOS) is a smart home and artificial intelligence brand launched by LG Electronics in 2017, featuring products that are equipped with voice control and artificial intelligence technology. The brand was originally launched for home appliances and consumer electronics, such as televisions, smart home devices, mobile devices, refrigerators, air conditioners and related services. The name was first used in 2011 for LG's THINQ-branded smart appliances, which were introduced at the Consumer Electronics Show in Las Vegas. In December 2017, LG announced ThinQ as a unified brand for artificial intelligence-enabled home appliances, consumer electronics and services. In February 2018, LG announced the LG V30S ThinQ, the first phone to carry the "ThinQ" branding.

    == History ==

    The branding was first introduced in 2011 at the Consumer Electronics Show (CES) in Las Vegas as THINQ. The first ThinQ product was a smart refrigerator, with features such as smart savings options, food management system, washing machine, oven and robotic vacuum cleaner and different software on the LCD screen on the fridge. The unified branding was then officially launched as ThinQ at CES 2017 as an artificial intelligence-based brand for all of LG's smart products. The company announced DeepThinQ, a deep-learning technology for connected products, and later opened an Artificial Intelligence Lab in Seoul to coordinate research involving voice, video, sensors and machine learning. In December 2017, LG announced ThinQ as a brand designation for home appliances, consumer electronics, and services incorporating artificial intelligence, applied to its 2018 product lineup. In 2018, LG extended the ThinQ brand to smartphones with the LG V30S ThinQ. The phone used ThinQ branding for AI camera features, including image recognition and shooting-mode recommendations.
That year, LG also used ThinQ branding on televisions with smart-assistant features, as manufacturers increasingly added voice assistants to TV platforms. In 2022, LG introduced ThinQ UP, a software-upgradable appliance concept that allows compatible appliances to receive new features through the ThinQ app. The program included appliances such as refrigerators, washing machines, dryers, ovens and dishwashers, and was covered as part of a wider move toward upgradeable connected appliances. In 2024, LG introduced ThinQ ON, an AI-powered smart home hub designed to connect LG appliances and other smart home devices. It expanded ThinQ from an appliance-control platform into a broader smart home system.

== Platform and app ==

LG ThinQ operates as a smart home platform and mobile app for connecting compatible LG appliances and consumer electronics. The app is used to control and monitor supported products, including kitchen appliances, laundry appliances, air purifiers, vacuum cleaners and televisions. Depending on the product and market, the ThinQ app can provide remote control, status monitoring, downloadable appliance cycles, diagnostic support, maintenance alerts and software-based feature updates. In 2024, LG introduced ThinQ ON as a hub for the ThinQ platform. The device supports Matter, Thread and Wi-Fi connectivity and includes a built-in voice assistant. The Verge described the product as part of LG's effort to expand ThinQ from an appliance-control platform into a broader smart home system competing with platforms such as Samsung SmartThings and Apple Home.

== Features ==

LG ThinQ products use connected-device features and voice control to interact with users, and use sensor data together with features such as product recognition and learning-engine technologies to enhance their abilities. Deep ThinQ (or LG ThinQ AI) was introduced as LG's own AI platform.
It was reported that it could engage in two-way conversations with users and could educate itself according to users' behaviour patterns and habits. At the 2017 ThinQ launch, LG said the brand would cover products and services using artificial intelligence technologies from LG and partner companies. ThinQ features vary by product category. On appliances, the platform may support remote operation, product-status notifications, downloaded cycles and diagnostic functions. On televisions, ThinQ branding has been associated with voice-control and smart-assistant features. In 2018, LG ThinQ-branded TVs added support for Google Assistant and Alexa voice commands. As of August 30, 2018, LG's ThinQ products communicate with each other for tasks such as going to an event or following a recipe. They have sensors for communicating with other ThinQ devices and appliances.

== Products ==

LG ThinQ branding and connectivity features have been used across several LG product categories, including home appliances, televisions, air conditioners and mobile devices.

Home appliances

LG has applied ThinQ branding and app connectivity to home appliances such as refrigerators, washing machines, dryers, dishwashers, cooking appliances, air purifiers and vacuum cleaners. Through the ThinQ app, compatible appliances can be monitored or controlled remotely. Some compatible appliances can also receive downloadable cycles, diagnostic support, maintenance alerts and software-based feature updates through ThinQ UP.

Televisions and home entertainment

LG has used ThinQ branding on smart televisions and other home entertainment products. In 2018, LG ThinQ-branded televisions added support for smart-assistant voice commands, including Google Assistant.
Smartphones

      • LG G6 (ThinQ branding was added to the startup screen in an update)
      • LG V30 (ThinQ branding was added to the startup screen in an update)
      • LG V30S ThinQ
      • LG V35 ThinQ
      • LG G7 ThinQ
      • LG V40 ThinQ
      • LG G8 ThinQ
      • LG G8s ThinQ
      • LG G8x ThinQ
      • LG V50 ThinQ
      • LG V60 ThinQ
      • LG Velvet (generally considered a ThinQ product in other countries)

  • Semantic similarity network


    A semantic similarity network (SSN) is a special form of semantic network designed to represent concepts and their semantic similarity. Its main contribution is reducing the complexity of calculating semantic distances. Bendeck (2004, 2008) introduced the concept of semantic similarity networks (SSN) as the specialization of a semantic network to measure semantic similarity from ontological representations. Implementations include genetic information handling. The concept is formally defined (Bendeck 2008) as a directed graph, with concepts represented as nodes and semantic similarity relations as edges. The relationships are grouped into relation types. The concepts and relations contain attribute values used to evaluate the semantic similarity between concepts. The semantic similarity relationships of the SSN represent several of the general relationship types of the standard semantic network, reducing the complexity of the (normally very large) network for calculations of semantics. SSNs define relation types as templates (and a taxonomy of relations) for semantic similarity attributes that are common to relations of the same type. The SSN representation allows propagation algorithms to calculate semantic similarities faster, including stop conditions within a specified threshold. This reduces the computation time and power required for calculation. More recent publications on semantic matching and semantic similarity networks can be found in (Bendeck 2019). A specific semantic similarity network application in healthcare was presented at the Healthcare information exchange Format (FHIR European Conference) 2019. Recent developments in artificial intelligence (such as ChatGPT, which is based on a large language model) rely strongly on evolutionary computation; the next level will be to include semantic unification (as in semantic networks and this semantic similarity network) to extend the current models with more powerful understanding tools.
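The idea of propagating similarity over a directed graph with a threshold-based stop condition can be illustrated with a small sketch. Everything here is an assumption made for illustration: the source does not give Bendeck's actual algorithm, so the rule of multiplying edge similarities along a path, the function name, and the example concepts and weights are all hypothetical.

```python
from collections import deque

def propagate_similarity(graph, source, threshold):
    """Spread similarity outward from `source`, pruning paths below `threshold`.

    `graph` maps a concept to a list of (neighbor, relation_type, similarity)
    edges, with each similarity in the range (0, 1].
    """
    best = {source: 1.0}
    queue = deque([source])
    while queue:
        concept = queue.popleft()
        for neighbor, _relation_type, similarity in graph.get(concept, []):
            score = best[concept] * similarity  # assumed combination rule
            if score < threshold:
                continue  # stop condition: too dissimilar to keep exploring
            if score > best.get(neighbor, 0.0):
                best[neighbor] = score
                queue.append(neighbor)
    return best

# Hypothetical concepts, relation types and similarity attributes.
graph = {
    "gene": [("protein", "encodes", 0.9), ("genome", "part_of", 0.8)],
    "protein": [("enzyme", "is_a", 0.7)],
    "genome": [("organism", "belongs_to", 0.5)],
}
scores = propagate_similarity(graph, "gene", threshold=0.5)
print(sorted(scores))  # ['enzyme', 'gene', 'genome', 'protein']
```

In this toy run the path from "gene" to "organism" scores 0.8 × 0.5 = 0.4, which falls below the threshold, so that branch is never expanded; pruning of this kind is what keeps the computation small on a large network.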

  • Pretext


    A pretext (adj.: pretextual) is an excuse to do something or say something that is not accurate. Pretexts may be based on a half-truth or developed in the context of a misleading fabrication. Pretexts have been used to conceal the true purpose or rationale behind actions and words. They are often heard in political speeches. In US law, a pretext usually describes false reasons that hide the true intentions or motivations for a legal action. If a party can establish a prima facie case for the proffered evidence, the opposing party must prove that these reasons were "pretextual" or false. This can be accomplished by directly demonstrating that the motivations behind the presentation of evidence are false, or indirectly by evidence that the motivations are not "credible". In Griffith v. Schnitzer, an employment discrimination case, a jury award was reversed by a Court of Appeals because the evidence was not sufficient to show that the defendant's reasons were "pretextual". That is, the defendant's evidence was either undisputed, or the plaintiff's was "irrelevant subjective assessments and opinions". A "pretextual" arrest by law enforcement officers is one carried out for illegal purposes such as to conduct an unjustified search and seizure. As one example of pretext, in the 1880s, the Chinese government raised money on the pretext of modernizing the Chinese navy. Instead, these funds were diverted to repair a ship-shaped, two-story pavilion which had been originally constructed for the mother of the Qianlong Emperor. This pretext and the Marble Barge are famously linked with Empress Dowager Cixi. This architectural folly, known today as the Marble Boat (Shifang), is "moored" on Lake Kunming in what the empress renamed the "Garden for Cultivating Harmony" (Yiheyuan). Another example of pretext was demonstrated in the speeches of the Roman orator Cato the Elder (234–149 BC). For Cato, every public speech became a pretext for a comment about Carthage.
The Roman statesman had come to believe that the prosperity of ancient Carthage represented an eventual and inevitable danger to Rome. In the Senate, Cato famously ended every speech by proclaiming his opinion that Carthage had to be destroyed (Carthago delenda est). This oft-repeated phrase was the ultimate conclusion of all logical argument in every oration, regardless of the subject of the speech. This pattern persisted until his death in 149, which was the year in which the Third Punic War began. In other words, any subject became a pretext for reminding his fellow senators of the dangers Carthage represented.

== Uses in warfare ==

The early years of Japan's Tokugawa shogunate were unsettled, with warring factions battling for power. The causes for the fighting were in part pretextual, but the outcome brought diminished armed conflicts after the Siege of Osaka in 1614–1615. The next two-and-a-half centuries of Japanese history were comparatively peaceful under the successors of Tokugawa Ieyasu and the bakufu government he established.

=== United States ===

During the War of 1812, US President James Madison was often accused of using impressment of American sailors by the Royal Navy as a pretext to invade Canada. The sinking of the USS Maine in 1898 was blamed on the Spanish, despite early reports of it having been an accident, contributing to U.S. entry into the Spanish–American War. The slogan "Remember the Maine! To hell with Spain!" was used as a rallying cry. Some have argued that United States President Franklin D. Roosevelt used the attack on Pearl Harbor by Japanese forces on December 7, 1941, as a pretext to enter World War II.
American soldiers and supplies had been assisting British and Soviet operations for almost a year by this point, and the United States had thus "chosen a side". However, because of the political climate in the United States at the time, and campaign promises by Roosevelt that he would not send American troops to fight in foreign wars, Roosevelt could not declare war for fear of public backlash. The attack on Pearl Harbor united the American people's resolve against the Axis powers and created the bellicose atmosphere in which to declare war. The 1964 Gulf of Tonkin incident, later revealed to have been partly provoked and partly not to have happened, was used to bring the United States fully into the Vietnam War. United States President George W. Bush used the September 11 attacks and faulty intelligence about the existence of weapons of mass destruction as a pretext for the war in Iraq.

== Social engineering ==

A type of social engineering called pretexting uses a pretext to elicit information fraudulently from a target. The pretext in this case includes research into the identity of a certain authorized person or personality type in order to establish legitimacy in the mind of the target.

  • Apptek


    Applications Technology (AppTek) is a U.S. company headquartered in McLean, Virginia that specializes in artificial intelligence and machine learning for human language technologies. The company provides both managed and professional services for natural language processing (NLP) technologies including automatic speech recognition (ASR), neural machine translation (MT), natural-language understanding (NLU) and neural speech synthesis. AppTek's Head of Science, Prof. Dr.-Ing. Hermann Ney, was awarded the IEEE James L. Flanagan Speech and Audio Processing Award in 2019 and the ISCA Medal for Scientific Achievement in 2021 for his work in natural language processing.

    == History ==

    AppTek was acquired in 1998 by Lernout & Hauspie (at the time a NASDAQ publicly traded company). AppTek then organized a management buy-out and went private again in 2001. In 2014, the company sold its hybrid machine translation technology to eBay and has since rebuilt the platform on modern neural approaches to machine translation. In 2020, SOSi acquired a non-controlling interest in AppTek and became an exclusive reseller of AppTek products for U.S. federal, state, and local government entities.

  • Portable Format for Analytics


    The Portable Format for Analytics (PFA) is a JSON-based predictive model interchange format conceived and developed by Jim Pivarski. PFA provides a way for analytic applications to describe and exchange predictive models produced by analytics and machine learning algorithms. It supports common models such as logistic regression and decision trees. Version 0.8 was published in 2015. Subsequent versions have been developed by the Data Mining Group. As a predictive model interchange format developed by the Data Mining Group, PFA is complementary to the DMG's XML-based standard called the Predictive Model Markup Language or PMML.

    == Release history ==

    == Data Mining Group ==

    The Data Mining Group is a consortium managed by the Center for Computational Science Research, Inc., a nonprofit founded in 2008.

    == Examples ==

    Reverse array:

        # reverse input array of doubles
        input: {"type": "array", "items": "double"}
        output: {"type": "array", "items": "double"}
        action:
          - let: {x: input}
          - let: {z: input}
          - let: {l: {a.len: [x]}}
          # start at the last valid index (l - 1); index l would be past the end
          - let: {i: {"-": [l, 1]}}
          - while: {">=": [i, 0]}
            do:
              # copy x[l - i - 1] into z[i]
              - set: {z: {attr: z, path: [i], to: {attr: x, path: [{"-": [{"-": [l, i]}, 1]}]}}}
              - set: {i: {"-": [i, 1]}}
          - z

    Bubblesort:

        input: {"type": "array", "items": "double"}
        output: {"type": "array", "items": "double"}
        action:
          - let: {A: input}
          - let: {N: {a.len: [A]}}
          - let: {n: {"-": [N, 1]}}
          - let: {i: 0}
          - let: {s: 0.0}
          - while: {">=": [n, 0]}
            do:
              - set: {i: 0}
              - while: {"<=": [i, {"-": [n, 1]}]}
                do:
                  # swap neighbours that are out of order, using s as the temporary
                  - if: {">": [{attr: A, path: [i]}, {attr: A, path: [{"+": [i, 1]}]}]}
                    then:
                      - set: {s: {attr: A, path: [i]}}
                      - set: {A: {attr: A, path: [i], to: {attr: A, path: [{"+": [i, 1]}]}}}
                      - set: {A: {attr: A, path: [{"+": [i, 1]}], to: s}}
                  - set: {i: {"+": [i, 1]}}
              - set: {n: {"-": [n, 1]}}
          - A

    == Implementations ==

      • Hadrian (Java/Scala/JVM): Hadrian is a complete implementation of PFA in Scala, which can be accessed through any JVM language, principally Java. It focuses on model deployment, so it is flexible (can run in restricted environments) and fast.
      • Titus (Python 2.x): Titus is a complete, independent implementation of PFA in pure Python. It focuses on model development, so it includes model producers and PFA manipulation tools in addition to runtime execution. Currently, it works for Python 2.
      • Titus 2 (Python 3.x): Titus 2 is a fork of Titus that supports Python 3.
      • Aurelius (R): Aurelius is a toolkit for generating PFA in the R programming language. It focuses on porting models to PFA from their R equivalents. To validate or execute scoring engines, Aurelius sends them to Titus through rPython (so both must be installed).
      • Antinous (model development in Jython): Antinous is a model-producer plugin for Hadrian that allows Jython code to be executed anywhere a PFA scoring engine would go. It also has a library of model-producing algorithms.
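Because a PFA document is plain JSON with `input`, `output` and `action` sections (the same three sections used in the examples above), it can be assembled with any JSON library. The sketch below uses only Python's standard `json` module and does not call Titus or Hadrian; the helper names `make_engine` and `missing_fields` are hypothetical, and the engine shown simply adds 100 to its input.

```python
import json

REQUIRED_FIELDS = ("input", "output", "action")

def make_engine(input_type, output_type, action):
    """Assemble a PFA-style document as a plain dictionary."""
    return {"input": input_type, "output": output_type, "action": action}

def missing_fields(document):
    """List the top-level sections this sketch expects but the document lacks."""
    return [name for name in REQUIRED_FIELDS if name not in document]

# An engine that takes a double and returns that value plus 100.
engine = make_engine("double", "double", [{"+": ["input", 100]}])
text = json.dumps(engine, indent=2)

print(text)
print(missing_fields(engine))               # []
print(missing_fields({"input": "double"}))  # ['output', 'action']
```

The check here is deliberately shallow: it only confirms that the three sections exist, whereas validating types and executing the action would require one of the implementations listed above.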

  • Danilo McGarry


    Danilo McGarry (born 1985) is a British tech executive, writer, and speaker who has led AI initiatives in finance and healthcare.

    == Early life and education ==

    Danilo McGarry was born in 1985. He received a Bachelor of Science (BSc) with honors in Business Management from the University of Bath.

    == Career ==

    McGarry began his career in technology and financial services, with positions at companies including Motorola, JPMorgan Chase, and BNP Paribas. He then joined the Royal Bank of Canada (RBC) as an analyst and later became a director, leading transformation initiatives involving robotic process automation (RPA) in the bank's capital markets operations. McGarry subsequently moved into leadership roles focused on AI. At Citigroup, he served as Head of Artificial Intelligence and Machine Learning, where he launched an AI-driven robotics and automation initiative. At UnitedHealth Group (UHG), he held a senior role in the company's automation program, which used a large fleet of software robots in its healthcare operations. In December 2019, McGarry was appointed Global Head of AI & Automation at Alter Domus, a multinational financial services firm. In this role, he established a new AI and automation department. He left the firm in late 2023 to establish his own businesses. In 2025, the Chartered Institute of Personnel and Development (CIPD) appointed him as its strategic adviser on artificial intelligence.

  • Open Neural Network Exchange


    The Open Neural Network Exchange (ONNX) [ˈɒnɪks] is an open-source artificial intelligence ecosystem of technology companies and research organizations that establish open standards for representing machine learning algorithms and software tools, with the aim of providing a standard format for representing machine learning models. ONNX is available on GitHub.

    == History ==

    ONNX was originally named Toffee and was developed by the PyTorch team at Facebook. In September 2017 it was renamed to ONNX and announced by Facebook and Microsoft. Later, IBM, Huawei, Intel, AMD, Arm and Qualcomm announced support for the initiative. In October 2017, Microsoft announced that it would add its Cognitive Toolkit and Project Brainwave platform to the initiative. In November 2019 ONNX was accepted as a graduate project in Linux Foundation AI. In October 2020 Zetane Systems became a member of the ONNX ecosystem.

    == Intent ==

    The initiative targets:

    === Framework interoperability ===

    Enable developers to move machine learning models between different frameworks, which may be used at different stages of the development process, such as training, architecture design, or deployment on mobile devices.

    === Shared optimization ===

    Provide a common representation that can be used by hardware vendors and other developers to apply optimizations to artificial neural network models across multiple machine learning frameworks.

    == Contents ==

    ONNX provides definitions of an extensible computation graph model, built-in operators and standard data types, focused on inferencing (evaluation). The container format is Protocol Buffers. Each computation dataflow graph is a list of nodes that form an acyclic graph. Nodes have inputs and outputs. Each node is a call to an operator. Metadata documents the graph. Built-in operators are to be available on each ONNX-supporting framework. ONNX models can be trained in a single framework, such as PyTorch or TensorFlow, and then exported to ONNX.
This format allows models to be transferred from the training framework to other environments for testing or deployment. Once a model is in ONNX format, it can be executed in different runtime systems or on various hardware platforms, such as GPUs or specialized AI accelerators. Using a common format enables the same model representation to be used across multiple systems and frameworks.
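The graph model described under Contents (a list of nodes, each a call to an operator with named inputs and outputs) can be illustrated with a toy evaluator. This is a sketch in plain Python and does not use the `onnx` package or its file format: the dictionary layout, the `run_graph` helper and the scalar operators are assumptions for illustration, and the nodes are assumed to be listed in an order where every input is computed before it is used.

```python
# Toy scalar stand-ins for three operators; real operators act on tensors.
OPERATORS = {
    "Mul": lambda a, b: a * b,
    "Add": lambda a, b: a + b,
    "Relu": lambda a: max(0.0, a),
}

def run_graph(nodes, operators, inputs):
    """Evaluate nodes in list order, storing each result under its output name."""
    values = dict(inputs)
    for node in nodes:
        arguments = [values[name] for name in node["inputs"]]
        values[node["output"]] = operators[node["op"]](*arguments)
    return values

# y = Relu(x * w + b), written as three operator calls.
nodes = [
    {"op": "Mul", "inputs": ["x", "w"], "output": "xw"},
    {"op": "Add", "inputs": ["xw", "b"], "output": "z"},
    {"op": "Relu", "inputs": ["z"], "output": "y"},
]

print(run_graph(nodes, OPERATORS, {"x": 2.0, "w": 3.0, "b": 1.0})["y"])   # 7.0
print(run_graph(nodes, OPERATORS, {"x": 2.0, "w": -3.0, "b": 1.0})["y"])  # 0.0
```

Because the graph is just data, a different runtime could evaluate the same `nodes` list with its own operator implementations, which is the portability argument made above.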

  • GPT-5


    GPT-5 is a multimodal large language model developed by OpenAI and the fifth in its series of generative pre-trained transformer (GPT) foundation models. Preceded in the series by GPT-4, it was launched on August 7, 2025. It is publicly accessible to users of the chatbot products ChatGPT and Microsoft Copilot as well as to developers through the OpenAI API.

    == Background ==

    On April 14, 2023, Sam Altman, the chief executive officer of OpenAI, spoke at an event at the Massachusetts Institute of Technology and said that the company was not training GPT-5 at that time. He stated that OpenAI was "prioritizing GPT-4 development" and that "we are not and won't for some time" release GPT-5. On July 18, OpenAI filed for a "GPT-5" trademark in the United States. On November 13, Altman confirmed to the Financial Times that the company was working to develop GPT-5. According to The Information, "[f]or much of the second half of 2024, OpenAI was developing a model known internally as Orion and intended to become GPT-5", "[b]ut the Orion effort failed to produce a better model, and the company instead released it as GPT-4.5 in February [2025]." By late July 2025, OpenAI was widely anticipated as planning to release GPT-5 in early August. On July 30, The Verge reported that "Microsoft is getting ready for GPT-5" as "sources familiar with Microsoft's AI plans" told an editor that the company was testing a new mode for its Copilot chatbot that would offer a model that "thinks deeply or quickly based on the task". On August 5, in the leadup to the release of GPT-5, OpenAI released GPT-OSS, a set of two open-weight models that have reasoning capabilities. GPT-5 was then unveiled during a livestream event on August 7.

    == Capabilities ==

    At the time of its release, GPT-5 had state-of-the-art performance on benchmarks that test mathematics, programming, finance, and multimodal understanding.
According to OpenAI, improvements over its predecessor models include faster response times, better coding and writing skills, more accurate answers to health questions, and lower levels of hallucination. Also, compared to previous models, GPT-5 aims to give safe, high-level responses to potentially harmful queries rather than outright declining them, an approach that OpenAI refers to as "safe completions", aiming to result "in GPT-5 being able to refuse more unsafe questions, while offering fewer rejections to users seeking harmless information." In addition, GPT-5 was trained to give more critical, "less effusively agreeable" answers compared to its predecessor models. Days before the launch of GPT-5, two early testers of the model stated that they were "impressed" by its ability to code and to solve mathematical and scientific problems. They suggested that the model shows great improvement from GPT-4, but not as large of a gain as from GPT-3 to GPT-4. A day prior to the release of GPT-5, during a press briefing, Sam Altman, the chief executive officer of OpenAI, called GPT-5 "a significant step along the path to AGI", referring to artificial general intelligence, the hypothetical level of intelligence that OpenAI defines as the ability to perform any economically valuable task that a human can. According to Altman, GPT-5 is "significantly better" than its predecessors, offering "PhD-level" abilities across a wide range of tasks. The exact energy consumption of GPT-5 use has not been disclosed by OpenAI. Researchers at the University of Rhode Island estimated that a medium-length response consumes slightly over 18 watt-hours, equivalent to using an incandescent bulb for 18 minutes. === Architecture === GPT-5 is a system that contains a fast, high-throughput model, a deeper reasoning model, and a real-time router that decides which model to use based on conversation type, complexity, tool needs, and explicit user intent. 
Altman had previously criticized the manual model picker for being overly complex, suggesting a need for unification. GPT-5 also includes agentic functionality through which it can set up its own desktop and can use its browser to search autonomously for sources that relate to its task. The GPT-5 system card defines two fast, high-throughput models – gpt-5-main and gpt-5-main-mini – and two thinking models – gpt-5-thinking and gpt-5-thinking-mini. In the OpenAI API, developers can access the thinking model, its mini version, and gpt-5-thinking-nano, an even smaller and faster nano version of the thinking model. The version of GPT-5 that is accessible via the API has adjustable reasoning effort (low, medium, high, or minimal) and verbosity (low, medium, or high). Additionally, ChatGPT provides access to gpt-5-thinking with a setting that makes use of parallel test-time compute, referred to as gpt-5-thinking-pro. == Limitations == === Safety === Neuraltrust, a security research company, claimed to have successfully compromised GPT-5 within its first day of testing the model. According to its report, it enabled GPT-5 to generate detailed instructions for manufacturing explosive devices. SPLX, another company, conducted similar tests and came to similar conclusions about GPT-5's security. Their assessments suggest that GPT-5 has significant security gaps, potentially rendering it as being unsafe for use in a corporate environment. == Training == According to AIMultiple, GPT-5 is natively multimodal, meaning that it was trained from scratch on multiple modalities (like text and images) at once without relying on already-trained language or vision models. Its training process involved three stages: unsupervised pretraining, supervised fine-tuning, and reinforcement learning from human feedback. Pretraining used a large-scale multilingual dataset of books, articles, web pages, academic papers, and licensed sources. 
GPT-5's visual and text capabilities were described as having been developed alongside each other throughout training, unlike with GPT-4. == Use == GPT-5 is used in ChatGPT. Although GPT-5 is free for all ChatGPT users, Plus users get higher use limits while Pro users get unlimited access to GPT-5 as well as limited access to GPT-5 Pro. Standard limits for lower-tier users on responses per hour still apply. Additionally, with the introduction of GPT-5, ChatGPT's "Advanced Voice Mode" was replaced by "ChatGPT Voice", which is supposed to enable more natural-sounding conversations. OpenAI stated that "Standard Voice Mode retires on September 9, 2025, unifying all users on ChatGPT Voice". On November 24, 2025, the feature of shopping research was added to ChatGPT, claimed to be a mini model post-trained on gpt-5-thinking-mini. GPT-5 is also available in Microsoft Copilot, and Microsoft stated that it will incorporate GPT-5 into a wide variety of its products. According to 9to5Mac, Apple Inc. is planning to integrate the model into the Apple Intelligence feature in its iOS 26, iPadOS 26, and macOS Tahoe operating systems. It is also accessible via the OpenAI API. A number of American companies were reported as having received access to GPT-5 ahead of its launch. OpenAI stated that the private health insurance company Oscar Health was checking applications from its policyholders with the model. In addition, Uber was using GPT-5 for its customer support system; GitLab, Windsurf, and Cursor were using the model for software development; and the Spanish bank BBVA was using it for financial analysis. Other companies that OpenAI listed as having used GPT-5 pre-release include Amgen, Lowe's, and Notion. == Reception == === Critical reviews === Grace Huckins in MIT Technology Review found that, "[w]hereas o1 was a major technological advancement, GPT-5 is, above all else, a refined product." 
In response to claims that Sam Altman, the chief executive officer of OpenAI, had made about the model, she stated that "GPT-5 will furnish a more pleasant and seamless user experience. That's not nothing, but it falls far short of the transformative AI future that Altman has spent much of the past year hyping." In response to Altman's claim that GPT-5 is "a significant step along the path" to artificial general intelligence, she noted: "[M]aybe he's right—but if so, it's a very small step." In The Information, Stephanie Palazzolo praised GPT-5's coding capabilities. According to Matteo Wong in The Atlantic, GPT-5 "is intuitive, fast, and efficient; adapts to human preferences and intentions; and is easy to personalize." He stated: "At this stage of the AI boom, when every major chatbot is legitimately helpful in numerous ways, benchmarks, science, and rigor feel almost insignificant. What matters is how the chatbot feels [...]". John Herrman from the New York magazine wrote: "Casual users who encounter GPT-5 through ChatGPT aren't likely to feel like they're using a completely different product [...] while people who use it for software development or in a corporate context are more likely to notice a major change." Mashable's Christian de Looper found that "GPT-5

    Read more →
  • Journal of Experimental and Theoretical Artificial Intelligence

    Journal of Experimental and Theoretical Artificial Intelligence

    The Journal of Experimental and Theoretical Artificial Intelligence is a quarterly peer-reviewed scientific journal published by Taylor & Francis. It covers all aspects of artificial intelligence and was established in 1989. The editor-in-chief is Eric Dietrich (Binghamton University); the deputy editors-in-chief are Li Pheng Khoo (School of Mechanical & Aerospace Engineering, Nanyang Technological University) and Antonio Lieto (Department of Computer Science, University of Turin).

    == Abstracting and indexing ==

    According to the Journal Citation Reports, the journal has a 2020/2021 impact factor of 2.340.

    Read more →
  • Allen's interval algebra

    Allen's interval algebra

    Allen's interval algebra is a calculus for temporal reasoning that was introduced by James F. Allen in 1983. The calculus defines possible relations between time intervals and provides a composition table that can be used as a basis for reasoning about temporal descriptions of events.

    == Formal description ==

    === Relations ===

    The following 13 base relations capture the possible relations between two intervals. To see that the 13 relations are exhaustive, note that each endpoint of $X$ can be at 5 possible locations relative to $Y$: before, at the start, within, at the end, after. These give $5+4+3+2+1=15$ possible relative positions for the start and the end of $X$. Of these, we cannot have $X_{0}=X_{1}=Y_{0}$ since $X_{0}<X_{1}$

    Read more →
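
As an illustration of the 13 base relations in Allen's interval algebra, the sketch below classifies a pair of intervals by comparing endpoints. The relation names (`before`, `meets`, `overlaps`, and so on) are the conventional ones and are not listed in the excerpt above; the function name `allen_relation` and the representation of an interval as a `(start, end)` tuple are assumptions made for this example.

```python
from itertools import product


def allen_relation(x, y):
    """Return the Allen base relation that holds between intervals x and y.

    Each interval is a (start, end) pair with start < end.
    """
    (x0, x1), (y0, y1) = x, y
    if not (x0 < x1 and y0 < y1):
        raise ValueError("intervals must satisfy start < end")
    if x1 < y0:
        return "before"
    if y1 < x0:
        return "after"
    if x1 == y0:
        return "meets"
    if y1 == x0:
        return "met-by"
    # From here on the interiors intersect: x0 < y1 and y0 < x1.
    if x0 == y0 and x1 == y1:
        return "equals"
    if x0 == y0:
        return "starts" if x1 < y1 else "started-by"
    if x1 == y1:
        return "finishes" if x0 > y0 else "finished-by"
    if x0 < y0:
        return "overlaps" if x1 < y1 else "contains"
    # Remaining case: x0 > y0.
    return "during" if x1 < y1 else "overlapped-by"


# Slide every integer interval inside [0, 7] past a fixed interval y
# and collect the distinct relations that occur.
y = (2, 5)
found = {
    allen_relation((a, b), y)
    for a, b in product(range(8), repeat=2)
    if a < b
}
print(sorted(found))
print(len(found))
```

Enumerating small integer intervals against a fixed `y` is a quick empirical check that exactly 13 distinct relations occur, in line with the exhaustiveness argument.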

  • Knowledge value chain

    Knowledge value chain

    A knowledge value chain is a sequence of intellectual tasks by which knowledge workers build their employer's unique competitive advantage and/or social and environmental benefit. As an example, the components of a research and development project form a knowledge value chain. Productivity improvements in a knowledge value chain may come from knowledge integration in its original sense of data systems consolidation. Improvements also flow from the knowledge integration that occurs when knowledge management techniques are applied to the continuous improvement of a business process or processes. The term first started coming into common use around 1999, appearing in management-related talks and papers. It was registered as a trademark in 2004 by TW Powell Co., a Manhattan company. The knowledge value chain processes are knowledge acquisition, knowledge storage, knowledge dissemination, and knowledge application.

    Read more →
  • Isotropic position

    Isotropic position

    In the fields of machine learning, the theory of computation, and random matrix theory, a probability distribution over vectors is said to be in isotropic position if its covariance matrix is proportional to the identity matrix.

    == Formal definitions ==

    Let $D$ be a distribution over vectors in the vector space $\mathbb{R}^{n}$. Then $D$ is in isotropic position if, for a vector $v$ sampled from the distribution, $\mathbb{E}\,vv^{\mathsf{T}}=\mathrm{Id}$. A set of vectors is said to be in isotropic position if the uniform distribution over that set is in isotropic position. In particular, every orthonormal set of vectors is isotropic. As a related definition, a convex body $K$ in $\mathbb{R}^{n}$ is called isotropic if it has volume $|K|=1$, center of mass at the origin, and there is a constant $\alpha>0$ such that $\int_{K}\langle x,y\rangle^{2}\,dx=\alpha^{2}|y|^{2}$ for all vectors $y$ in $\mathbb{R}^{n}$; here $|\cdot|$ stands for the standard Euclidean norm.
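
The definition of isotropic position for a finite set of vectors can be checked numerically. The sketch below computes $\mathbb{E}\,vv^{\mathsf{T}}$ for the uniform distribution over a set and tests whether it is a positive multiple of the identity. The helper names `second_moment` and `is_isotropic` and the tolerance are assumptions made for this example. For an orthonormal basis of $\mathbb{R}^{n}$ the matrix comes out as $\tfrac{1}{n}\mathrm{Id}$, which matches the "proportional to the identity" wording; rescaling each vector by $\sqrt{n}$ gives exactly $\mathrm{Id}$.

```python
import math


def second_moment(vectors):
    """Return E[v v^T] for the uniform distribution over `vectors`."""
    m = len(vectors)
    n = len(vectors[0])
    return [
        [sum(v[i] * v[j] for v in vectors) / m for j in range(n)]
        for i in range(n)
    ]


def is_isotropic(vectors, tol=1e-9):
    """True if E[v v^T] is a positive multiple of the identity (within tol)."""
    moment = second_moment(vectors)
    n = len(moment)
    scale = moment[0][0]
    if scale <= tol:
        return False
    return all(
        abs(moment[i][j] - (scale if i == j else 0.0)) <= tol
        for i in range(n)
        for j in range(n)
    )


# Standard basis of R^3: E[v v^T] is Id / 3.
basis = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# The same basis scaled by sqrt(3): E[v v^T] is Id up to rounding.
scaled = [tuple(math.sqrt(3) * c for c in v) for v in basis]

# A set that is not isotropic: the two vectors are not orthogonal.
skewed = [(1.0, 0.0), (1.0, 1.0)]

print(is_isotropic(basis), is_isotropic(scaled), is_isotropic(skewed))
```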

    Read more →
  • Semantic analysis (knowledge representation)

    Semantic analysis (knowledge representation)

    Semantic analysis is a method for eliciting and representing knowledge about organisations. Initially, the problem must be defined by domain experts and passed to the project analyst(s). The next step is the generation of candidate affordances, which produces a list of semantic units that may be included in the schema. Candidate grouping follows, in which some of the semantic units that will appear in the schema are placed in simple groups. Finally, the groups are integrated into an ontology chart. Semantic analysis always starts from the problem definition; if the definition is not clear, the analyst must draw on relevant literature, interviews with the stakeholders, and other techniques to collect supplementary information. All assumptions made must be genuine and must not limit the system.

    Read more →
  • Guideline execution engine

    Guideline execution engine

    A guideline execution engine is a computer program that can interpret a clinical guideline represented in a computerized format and perform actions towards the user of an electronic medical record. A guideline execution engine needs to communicate with a host clinical information system; the Virtual Medical Record (vMR) is one possible interface. The engine's main function is to manage instances of executed guidelines for individual patients.

    == Architecture ==

    The following modules are generally needed for any engine: an interface to the clinical information system, a module for loading new guidelines, a guideline interpreter module, a clinical events parser, and an alert/recommendations dispatch.

    == Guideline Interchange Format ==

    The Guideline Interchange Format (GLIF) is a computer representation format for clinical guidelines. Guidelines represented in GLIF can be executed using a guideline execution engine. The format has gone through several versions as it has been improved; GLIF3 was introduced in 2003.

    == Use of third party workflow engine as a guideline execution engine ==

    Some commercial electronic health record systems use a workflow engine to execute clinical guidelines. RetroGuide and HealthFlow are examples of such an approach.
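
To make the module list under Architecture more concrete, here is a toy sketch of how those pieces might fit together. Everything in it is hypothetical: the `Guideline` and `GuidelineEngine` classes, the `name=value` event format, and the example rule are invented for illustration. They do not reflect GLIF, vMR, or any real engine.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Guideline:
    """Hypothetical computerized guideline: a name, one rule, one message."""
    name: str
    applies: Callable[[dict], bool]  # condition evaluated on a parsed event
    recommendation: str


@dataclass
class GuidelineEngine:
    guidelines: list = field(default_factory=list)
    alerts: list = field(default_factory=list)

    def load(self, guideline):
        """Guideline loading module: register a new guideline."""
        self.guidelines.append(guideline)

    def parse_event(self, raw):
        """Clinical events parser for the assumed 'name=value' format."""
        name, _, value = raw.partition("=")
        return {"name": name.strip(), "value": float(value)}

    def handle(self, patient_id, raw_event):
        """Interpreter plus dispatch: evaluate each guideline on one event.

        In a real system the event would arrive through the interface to
        the host clinical information system; here it is passed in directly.
        """
        event = self.parse_event(raw_event)
        raised = [
            (patient_id, g.name, g.recommendation)
            for g in self.guidelines
            if g.applies(event)
        ]
        self.alerts.extend(raised)  # stand-in for the alert dispatch module
        return raised


engine = GuidelineEngine()
engine.load(
    Guideline(
        name="hypertension-check",
        applies=lambda e: e["name"] == "systolic_bp" and e["value"] >= 180,
        recommendation="Recheck blood pressure",
    )
)
high = engine.handle("patient-1", "systolic_bp=182")
normal = engine.handle("patient-1", "systolic_bp=120")
print(high, normal)
```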

    Read more →