Semantic query

Semantic query

Semantic queries allow for queries and analytics of associative and contextual nature. Semantic queries enable the retrieval of both explicitly and implicitly derived information based on syntactic, semantic and structural information contained in data. They are designed to deliver precise results (possibly the distinctive selection of one single piece of information) or to answer more fuzzy and wide open questions through pattern matching and digital reasoning. Semantic queries work on named graphs, linked data or triples. This enables the query to process the actual relationships between information and infer the answers from the network of data. This is in contrast to semantic search, which uses semantics (meaning of language constructs) in unstructured text to produce a better search result. (See natural language processing.) From a technical point of view, semantic queries are precise relational-type operations much like a database query. They work on structured data and therefore have the possibility to utilize comprehensive features like operators (e.g. >, < and =), namespaces, pattern matching, subclassing, transitive relations, semantic rules and contextual full text search. The semantic web technology stack of the W3C is offering SPARQL to formulate semantic queries in a syntax similar to SQL. Semantic queries are used in triplestores, graph databases, semantic wikis, natural language and artificial intelligence systems. == Background == Relational databases represent all relationships between data in an implicit manner only. For example, the relationships between customers and products (stored in two content-tables and connected with an additional link-table) only come into existence in a query statement (SQL in the case of relational databases) written by a developer. Writing the query demands exact knowledge of the database schema. Linked-Data represent all relationships between data in an explicit manner. In the above example, no query code needs to be written. The correct product for each customer can be fetched automatically. Whereas this simple example is trivial, the real power of linked-data comes into play when a network of information is created (customers with their geo-spatial information like city, state and country; products with their categories within sub- and super-categories). Now the system can automatically answer more complex queries and analytics that look for the connection of a particular location with a product category. The development effort for this query is omitted. Executing a semantic query is conducted by walking the network of information and finding matches (also called Data Graph Traversal). Another important aspect of semantic queries is that the type of the relationship can be used to incorporate intelligence into the system. The relationship between a customer and a product has a fundamentally different nature than the relationship between a neighbourhood and its city. The latter enables the semantic query engine to infer that a customer living in Manhattan is also living in New York City whereas other relationships might have more complicated patterns and "contextual analytics". This process is called inference or reasoning and is the ability of the software to derive new information based on given facts. == Articles == Velez, Golda (2008). "Semantics Help Wall Street Cope With Data Overload". Wall Street & Technology. wallstreetandtech.com. Zhifeng, Xiao (2009). "Spatial information semantic query based on SPARQL". In Liu, Yaolin; Tang, Xinming (eds.). International Symposium on Spatial Analysis, Spatial-Temporal Data Modeling, and Data Mining. Vol. 7492. SPIE. pp. 74921P. Bibcode:2009SPIE.7492E..60X. doi:10.1117/12.838556. S2CID 62191842. Aquin, Mathieu (2010). "Watson, more than a Semantic Web search engine" (PDF). Semantic Web Journal. Dworetzky, Tom (2011). "How Siri Works: iPhone's 'Brain' Comes from Natural Language Processing". International Business Times. Horwitt, Elisabeth (2011). "The semantic Web gets down to business". computerworld.com. Rodriguez, Marko (2011). "Graph Pattern Matching with Gremlin". Marko A. Rodriguez. markorodriguez.com on Graph Computing. Sequeda, Juan (2011). "SPARQL Nuts & Bolts". Cambridge Semantics. Freitas, Andre (2012). "Querying Heterogeneous Datasets on the Linked Data Web" (PDF). IEEE Internet Computing. Kauppinen, Tomi (2012). "Using the SPARQL Package in R to handle Spatial Linked Data". linkedscience.org. Lorentz, Alissa (2013). "With Big Data, Context is a Big Issue". Wired.

H2O (software)

H2O is an open-source, in-memory, distributed machine learning and predictive analytics platform developed by the company H2O.ai (previously 0xdata). The software uses a distributed architecture for parallel processing on standard hardware. It supports algorithms for large-scale data analysis and model deployment. H2O is primarily used by data scientists and developers for statistical modeling and data-driven decision-making. The platform is designed to handle in-memory computations across a distributed computing environment. It offers implementations for numerous statistical and machine learning algorithms, which are accessible through various programming interfaces. The software is released under the Apache License 2.0. == Functionality and features == H2O provides a suite of supervised and unsupervised machine learning algorithms. Its core functions include: Supervised learning: algorithms in the field of statistics, data mining and machine learning such as generalized linear models, random forests, gradient boosting and deep learning are implemented for classification and regression tasks. Unsupervised learning: including K-Means clustering and principal component analysis. Automated machine learning: a features designed to automate the processes of model selection, tuning, and ensemble creation. The software can ingest data from various sources, including the Hadoop Distributed File System, Amazon S3, SQL databases, as well as local file systems. It operates natively on Apache Spark clusters through Sparkling Water. Proponents claim that improved performance is achieved compared to other analysis tools. The software is distributed free of charge, under a business model based on the development of individual applications and support. == Architecture == H2O is primarily written in Java. It uses a distributed architecture that allows the platform to cluster nodes for parallel processing and in-memory storage of data and models. Users interact with the H2O platform through several primary interfaces: Programming language interfaces: APIs are provided for the R and Python programming languages, and various Apache offerings (Apache Hadoop and Spark, as well as Maven). H2O Flow: a graphical web-based interactive computational environment that functions as a notebook interface for data exploration, model building, and scripting. REST-API: allows for integration with other applications and frameworks such as Microsoft Excel or RStudio. With the H2O Machine Learning Integration Nodes, KNIME offers algorithmic workflows. While the algorithm executes, approximate results are displayed, so that users can track the progress and intervene if needed. == History, influences, and extensions == The software project was initiated by the company 0xdata, which later changed its name to H2O.ai. The three Stanford professors Stephen P. Boyd, Robert Tibshirani and Trevor Hastie form a panel that advises H2O on scientific issues. Since its inception, H2O provides open-source machine learning libraries for enterprise use. The core H2O platform is often complemented by offerings from H2O.ai, such as H2O Driverless AI. == Reception == H2O is referenced in peer-reviewed literature regarding automated machine learning (AutoML). The platform has been categorized as a "Leader" and a "Strong Performer" in industry reports by Forrester Research. H2O (the open-source platform) and the associated commercial platform Driverless AI have been recurring winners of InfoWorld's most prestigious awards, including both the Best of Open Source Software ("Bossies") and the Technology of the Year awards.

Computer audition

Computer audition (CA) or machine listening is the general field of study of algorithms and systems for audio interpretation by machines. Since the notion of what it means for a machine to "hear" is very broad and somewhat vague, computer audition attempts to bring together several disciplines that originally dealt with specific problems or had a concrete application in mind. The engineer Paris Smaragdis, interviewed in Technology Review, talks about these systems — "software that uses sound to locate people moving through rooms, monitor machinery for impending breakdowns, or activate traffic cameras to record accidents." Inspired by models of human audition, CA deals with questions of representation, transduction, grouping, use of musical knowledge and general sound semantics for the purpose of performing intelligent operations on audio and music signals by the computer. Technically this requires a combination of methods from the fields of signal processing, auditory modelling, music perception and cognition, pattern recognition, and machine learning, as well as more traditional methods of artificial intelligence for musical knowledge representation. == Applications == Like computer vision versus image processing, computer audition versus audio engineering deals with understanding of audio rather than processing. It also differs from problems of speech understanding by machine since it deals with general audio signals, such as natural sounds and musical recordings. Applications of computer audition are widely varying, and include search for sounds, genre recognition, acoustic monitoring, music transcription, score following, audio texture, music improvisation, emotion in audio and so on. == Related disciplines == Computer Audition overlaps with the following disciplines: Music information retrieval: methods for search and analysis of similarity between music signals. Auditory scene analysis: understanding and description of audio sources and events. Computational musicology and mathematical music theory: use of algorithms that employ musical knowledge for analysis of music data. Computer music: use of computers in creative musical applications. Machine musicianship: audition driven interactive music systems. == Areas of study == Since audio signals are interpreted by the human ear–brain system, that complex perceptual mechanism should be simulated somehow in software for "machine listening". In other words, to perform on par with humans, the computer should hear and understand audio content much as humans do. Analyzing audio accurately involves several fields: electrical engineering (spectrum analysis, filtering, and audio transforms); artificial intelligence (machine learning and sound classification); psychoacoustics (sound perception); cognitive sciences (neuroscience and artificial intelligence); acoustics (physics of sound production); and music (harmony, rhythm, and timbre). Furthermore, audio transformations such as pitch shifting, time stretching, and sound object filtering, should be perceptually and musically meaningful. For best results, these transformations require perceptual understanding of spectral models, high-level feature extraction, and sound analysis/synthesis. Finally, structuring and coding the content of an audio file (sound and metadata) could benefit from efficient compression schemes, which discard inaudible information in the sound. Computational models of music and sound perception and cognition can lead to a more meaningful representation, a more intuitive digital manipulation and generation of sound and music in musical human-machine interfaces. The study of CA could be roughly divided into the following sub-problems: Representation: signal and symbolic. This aspect deals with time-frequency representations, both in terms of notes and spectral models, including pattern playback and audio texture. Feature extraction: sound descriptors, segmentation, onset, pitch and envelope detection, chroma, and auditory representations. Musical knowledge structures: analysis of tonality, rhythm, and harmonies. Sound similarity: methods for comparison between sounds, sound identification, novelty detection, segmentation, and clustering. Sequence modeling: matching and alignment between signals and note sequences. Source separation: methods of grouping of simultaneous sounds, such as multiple pitch detection and time-frequency clustering methods. Auditory cognition: modeling of emotions, anticipation and familiarity, auditory surprise, and analysis of musical structure. Multi-modal analysis: finding correspondences between textual, visual, and audio signals. === Representation issues === Computer audition deals with audio signals that can be represented in a variety of fashions, from direct encoding of digital audio in two or more channels to symbolically represented synthesis instructions. Audio signals are usually represented in terms of analogue or digital recordings. Digital recordings are samples of acoustic waveform or parameters of audio compression algorithms. One of the unique properties of musical signals is that they often combine different types of representations, such as graphical scores and sequences of performance actions that are encoded as MIDI files. Since audio signals usually comprise multiple sound sources, then unlike speech signals that can be efficiently described in terms of specific models (such as source-filter model), it is hard to devise a parametric representation for general audio. Parametric audio representations usually use filter banks or sinusoidal models to capture multiple sound parameters, sometimes increasing the representation size in order to capture internal structure in the signal. Additional types of data that are relevant for computer audition are textual descriptions of audio contents, such as annotations, reviews, and visual information in the case of audio-visual recordings. === Features === Description of contents of general audio signals usually requires extraction of features that capture specific aspects of the audio signal. Generally speaking, one could divide the features into signal or mathematical descriptors such as energy, description of spectral shape etc., statistical characterization such as change or novelty detection, special representations that are better adapted to the nature of musical signals or the auditory system, such as logarithmic growth of sensitivity (bandwidth) in frequency or octave invariance (chroma). Since parametric models in audio usually require very many parameters, the features are used to summarize properties of multiple parameters in a more compact or salient representation. === Musical knowledge === Finding specific musical structures is possible by using musical knowledge as well as supervised and unsupervised machine learning methods. Examples of this include detection of tonality according to distribution of frequencies that correspond to patterns of occurrence of notes in musical scales, distribution of note onset times for detection of beat structure, distribution of energies in different frequencies to detect musical chords and so on. === Sound similarity and sequence modeling === Comparison of sounds can be done by comparison of features with or without reference to time. In some cases an overall similarity can be assessed by close values of features between two sounds. In other cases when temporal structure is important, methods of dynamic time warping need to be applied to "correct" for different temporal scales of acoustic events. Finding repetitions and similar sub-sequences of sonic events is important for tasks such as texture synthesis and machine improvisation. === Source separation === Since one of the basic characteristics of general audio is that it comprises multiple simultaneously sounding sources, such as multiple musical instruments, people talking, machine noises or animal vocalization, the ability to identify and separate individual sources is very desirable. Unfortunately, there are no methods that can solve this problem in a robust fashion. Existing methods of source separation rely sometimes on correlation between different audio channels in multi-channel recordings. The ability to separate sources from stereo signals requires different techniques than those usually applied in communications where multiple sensors are available. Other source separation methods rely on training or clustering of features in mono recording, such as tracking harmonically related partials for multiple pitch detection. Some methods, before explicit recognition, rely on revealing structures in data without knowing the structures (like recognizing objects in abstract pictures without attributing them meaningful labels) by finding the least complex data representations, for instance describing audio scenes as generated by a few tone patterns and their trajectories (polyphonic voices) and acoustical contours drawn by a tone (c

Rademacher complexity

In computational learning theory (machine learning and theory of computation), Rademacher complexity, named after Hans Rademacher, measures richness of a class of sets with respect to a probability distribution. The concept can also be extended to real valued functions. == Definitions == === Rademacher complexity of a set === Given a set A ⊆ R m {\displaystyle A\subseteq \mathbb {R} ^{m}} , the Rademacher complexity of A is defined as follows: Rad ⁡ ( A ) := 1 m E σ [ sup a ∈ A ∑ i = 1 m σ i a i ] {\displaystyle \operatorname {Rad} (A):={\frac {1}{m}}\mathbb {E} _{\sigma }\left[\sup _{a\in A}\sum _{i=1}^{m}\sigma _{i}a_{i}\right]} where σ 1 , σ 2 , … , σ m {\displaystyle \sigma _{1},\sigma _{2},\dots ,\sigma _{m}} are independent random variables drawn from the Rademacher distribution i.e. Pr ( σ i = + 1 ) = Pr ( σ i = − 1 ) = 1 / 2 {\displaystyle \Pr(\sigma _{i}=+1)=\Pr(\sigma _{i}=-1)=1/2} for i ∈ { 1 , 2 , … , m } {\displaystyle i\in \{1,2,\dots ,m\}} , and a = ( a 1 , … , a m ) ∈ A {\displaystyle a=(a_{1},\ldots ,a_{m})\in A} . Some authors take the absolute value of the sum before taking the supremum, but if A {\displaystyle A} is symmetric this makes no difference. === Rademacher complexity of a function class === Let S = { z 1 , z 2 , … , z m } ⊆ Z {\displaystyle S=\{z_{1},z_{2},\dots ,z_{m}\}\subseteq Z} be a sample of points and consider a function class F {\displaystyle {\mathcal {F}}} of real-valued functions over Z {\displaystyle Z} . Then, the empirical Rademacher complexity of F {\displaystyle {\mathcal {F}}} given S {\displaystyle S} is defined as: Rad S ⁡ ( F ) = 1 m E σ [ sup f ∈ F | ∑ i = 1 m σ i f ( z i ) | ] {\displaystyle \operatorname {Rad} _{S}({\mathcal {F}})={\frac {1}{m}}\mathbb {E} _{\sigma }\left[\sup _{f\in {\mathcal {F}}}\left|\sum _{i=1}^{m}\sigma _{i}f(z_{i})\right|\right]} This can also be written using the previous definition: Rad S ⁡ ( F ) = Rad ⁡ ( F ∘ S ) {\displaystyle \operatorname {Rad} _{S}({\mathcal {F}})=\operatorname {Rad} ({\mathcal {F}}\circ S)} where F ∘ S {\displaystyle {\mathcal {F}}\circ S} denotes function composition, i.e.: F ∘ S := { ( f ( z 1 ) , … , f ( z m ) ) ∣ f ∈ F } {\displaystyle {\mathcal {F}}\circ S:=\{(f(z_{1}),\ldots ,f(z_{m}))\mid f\in {\mathcal {F}}\}} The worst case empirical Rademacher complexity is Rad ¯ m ( F ) = sup S = { z 1 , … , z m } Rad S ⁡ ( F ) {\displaystyle {\overline {\operatorname {Rad} }}_{m}({\mathcal {F}})=\sup _{S=\{z_{1},\dots ,z_{m}\}}\operatorname {Rad} _{S}({\mathcal {F}})} Let P {\displaystyle P} be a probability distribution over Z {\displaystyle Z} . The Rademacher complexity of the function class F {\displaystyle {\mathcal {F}}} with respect to P {\displaystyle P} for sample size m {\displaystyle m} is: Rad P , m ⁡ ( F ) := E S ∼ P m [ Rad S ⁡ ( F ) ] {\displaystyle \operatorname {Rad} _{P,m}({\mathcal {F}}):=\mathbb {E} _{S\sim P^{m}}\left[\operatorname {Rad} _{S}({\mathcal {F}})\right]} where the above expectation is taken over an identically independently distributed (i.i.d.) sample S = ( z 1 , z 2 , … , z m ) {\displaystyle S=(z_{1},z_{2},\dots ,z_{m})} generated according to P {\displaystyle P} . == Intuition == The Rademacher complexity is typically applied on a function class of models that are used for classification, with the goal of measuring their ability to classify points drawn from a probability space under arbitrary labellings. When the function class is rich enough, it contains functions that can appropriately adapt for each arrangement of labels, simulated by the random draw of σ i {\displaystyle \sigma _{i}} under the expectation, so that this quantity in the sum is maximized. The Rademacher complexity of a set A {\displaystyle A} can be rewritten as Rad ⁡ ( A ) := 1 m E σ [ sup a ∈ A ∑ i = 1 m σ i a i ] = 1 m 2 m ∑ σ ∈ { − 1 / m , + 1 / m } m [ sup a ∈ A ⟨ σ , a ⟩ ] . {\displaystyle \operatorname {Rad} (A):={\frac {1}{m}}\mathbb {E} _{\sigma }\left[\sup _{a\in A}\sum _{i=1}^{m}\sigma _{i}a_{i}\right]={\frac {1}{{\sqrt {m}}2^{m}}}\sum _{\sigma \in \{-1/{\sqrt {m}},+1/{\sqrt {m}}\}^{m}}\left[\sup _{a\in A}\langle \sigma ,a\rangle \right].} Each term in the summation is the farthest distance of the set A {\displaystyle A} from the origin, along a unit-length direction σ {\displaystyle \sigma } . The directions are along the vertices of a hypercube. Thus, we can also write it as Rad ⁡ ( A ) = 1 2 m 1 2 m − 1 ∑ σ ∈ { − 1 / m , + 1 / m } m / { − 1 , + 1 } [ sup a ∈ A ⟨ σ , a ⟩ − inf a ∈ A ⟨ σ , a ⟩ ] {\displaystyle \operatorname {Rad} (A)={\frac {1}{2{\sqrt {m}}}}{\frac {1}{2^{m-1}}}\sum _{\sigma \in \{-1/{\sqrt {m}},+1/{\sqrt {m}}\}^{m}/\{-1,+1\}}\left[\sup _{a\in A}\langle \sigma ,a\rangle -\inf _{a\in A}\langle \sigma ,a\rangle \right]} Here, the set { − 1 / m , + 1 / m } m / { − 1 , + 1 } {\displaystyle \{-1/{\sqrt {m}},+1/{\sqrt {m}}\}^{m}/\{-1,+1\}} denotes half of the vertices of a hypercube, selected so that each diagonal has exactly one vertex selected. In words, this states that 2 m Rad ⁡ ( A ) {\displaystyle 2{\sqrt {m}}\operatorname {Rad} (A)} is precisely the average width of the set A {\displaystyle A} along all diagonal directions of a hypercube. == Examples == A singleton set has 0 width in any direction, so it has Rademacher complexity 0. The set A = { ( 1 , 1 ) , ( 1 , 2 ) } ⊆ R 2 {\displaystyle A=\{(1,1),(1,2)\}\subseteq \mathbb {R} ^{2}} has average width 1 / 2 {\displaystyle 1/{\sqrt {2}}} along the two diagonal directions of the square, so it has Rademacher complexity 1 / 4 {\displaystyle 1/4} . The unit cube [ 0 , 1 ] m {\displaystyle [0,1]^{m}} has constant width m {\displaystyle {\sqrt {m}}} along the diagonal directions, so it has Rademacher complexity 1 / 2 {\displaystyle 1/2} . Similarly, the unit cross-polytope { x ∈ R m : ‖ x ‖ 1 ≤ 1 } {\displaystyle \{x\in \mathbb {R} ^{m}:\|x\|_{1}\leq 1\}} has constant width 2 / m {\displaystyle 2/{\sqrt {m}}} along the diagonal directions, so it has Rademacher complexity 1 / m {\displaystyle 1/m} . == Using the Rademacher complexity == The Rademacher complexity can be used to derive data-dependent upper-bounds on the learnability of function classes. Intuitively, a function-class with smaller Rademacher complexity is easier to learn. === Bounding the representativeness === In machine learning, it is desired to have a training set that represents the true distribution of some sample data S {\displaystyle S} . This can be quantified using the notion of representativeness. Denote by P {\displaystyle P} the probability distribution from which the samples are drawn. Denote by H {\displaystyle H} the set of hypotheses (potential classifiers) and denote by F {\displaystyle {\mathcal {F}}} the corresponding set of error functions, i.e., for every hypothesis h ∈ H {\displaystyle h\in H} , there is a function f h ∈ F {\displaystyle f_{h}\in F} , that maps each training sample (features,label) to the error of the classifier h {\displaystyle h} (note in this case hypothesis and classifier are used interchangeably). For example, in the case that h {\displaystyle h} represents a binary classifier, the error function is a 0–1 loss function, i.e. the error function f h {\displaystyle f_{h}} returns 0 if h {\displaystyle h} correctly classifies a sample and 1 else. We omit the index and write f {\displaystyle f} instead of f h {\displaystyle f_{h}} when the underlying hypothesis is irrelevant. Define: L P ( f ) := E z ∼ P [ f ( z ) ] {\displaystyle L_{P}(f):=\mathbb {E} _{z\sim P}[f(z)]} – the expected error of some error function f ∈ F {\displaystyle f\in {\mathcal {F}}} on the real distribution P {\displaystyle P} ; L S ( f ) := 1 m ∑ i = 1 m f ( z i ) {\displaystyle L_{S}(f):={1 \over m}\sum _{i=1}^{m}f(z_{i})} – the estimated error of some error function f ∈ F {\displaystyle f\in {\mathcal {F}}} on the sample S {\displaystyle S} . The representativeness of the sample S {\displaystyle S} , with respect to P {\displaystyle P} and F {\displaystyle {\mathcal {F}}} , is defined as: Rep P ⁡ ( F , S ) := sup f ∈ F ( L P ( f ) − L S ( f ) ) {\displaystyle \operatorname {Rep} _{P}({\mathcal {F}},S):=\sup _{f\in F}(L_{P}(f)-L_{S}(f))} Smaller representativeness is better, since it provides a way to avoid overfitting: it means that the true error of a classifier is not much higher than its estimated error, and so selecting a classifier that has low estimated error will ensure that the true error is also low. Note however that the concept of representativeness is relative and hence can not be compared across distinct samples. The expected representativeness of a sample can be bounded above by the Rademacher complexity of the function class: If F {\displaystyle {\mathcal {F}}} is a set of functions with range within [ 0 , 1 ] {\displaystyle [0,1]} , then Rad P , m ⁡ ( F ) − ln ⁡ 2 2 m ≤ E S ∼ P m [ Rep P ⁡ ( F , S ) ] ≤ 2 Rad P , m ⁡ ( F ) {\displaystyle \operatorname {Rad} _{P,m}({\mathcal {F}})-{\sqrt {\frac {\ln 2}{2m}}}\leq \mathbb {E} _{S\sim P^{m}}[\operatorname {Rep} _{P}({\

Reasoning model

A reasoning model, also known as a reasoning language model (RLM) or large reasoning model (LRM), is a type of large language model (LLM) that has been specifically trained to solve complex tasks requiring multiple steps of logical reasoning. These models demonstrate superior performance on logic, mathematics, and programming tasks compared to standard LLMs. They possess the ability to revisit and revise earlier reasoning steps and utilize additional computation during inference as a method to scale performance, complementing traditional scaling approaches based on training data size, model parameters, and training compute. == Overview == Unlike traditional language models that generate responses immediately, reasoning models allocate additional compute, or thinking, time before producing an answer to solve multi-step problems. OpenAI introduced this terminology in September 2024 when it released the o1 series, describing the models as designed to "spend more time thinking" before responding. The company framed o1 as a reset in model naming that targets complex tasks in science, coding, and mathematics, and it contrasted o1's performance with GPT-4o on benchmarks such as AIME and Codeforces. Independent reporting the same week summarized the launch and highlighted OpenAI's claim that o1 automates chain-of-thought style reasoning to achieve large gains on difficult exams. In operation, reasoning models generate internal chains of intermediate steps, then select and refine a final answer. OpenAI reported that o1's accuracy improves as the model is given more reinforcement learning during training and more test-time compute at inference. The company initially chose to hide raw chains and instead return a model-written summary, stating that it "decided not to show" the underlying thoughts so researchers could monitor them without exposing unaligned content to end users. Commercial deployments document separate "reasoning tokens" that meter hidden thinking and a control for "reasoning effort" that tunes how much compute the model uses. These features make the models slower than ordinary chat systems while enabling stronger performance on difficult problems. == History == The research trajectory toward reasoning models combined advances in supervision, prompting, and search-style inference. Early alignment work on reinforcement learning from human feedback showed that models can be fine-tuned to follow instructions with "human feedback" and preference-based rewards. In 2022, Google Research scientists Jason Wei and Denny Zhou showed that chain-of-thought prompting "significantly improves the ability" of large models on complex reasoning tasks. Input → Step 1 → Step 2 → ⋯ → Step n ⏟ Reasoning chain → Answer {\displaystyle {\text{Input}}\rightarrow \underbrace {{\text{Step}}_{1}\rightarrow {\text{Step}}_{2}\rightarrow \cdots \rightarrow {\text{Step}}_{n}} _{\text{Reasoning chain}}\rightarrow {\text{Answer}}} A companion result demonstrated that the simple instruction "Let's think step by step" can elicit zero-shot reasoning. Follow-up work introduced self-consistency decoding, which "boosts the performance" of chain-of-thought by sampling diverse solution paths and choosing the consensus, and tool-augmented methods such as ReAct, a portmanteau of Reason and Act, that prompt models to "generate both reasoning traces" and actions. Research then generalized chain-of-thought into search over multiple candidate plans. The Tree-of-Thoughts framework from Princeton computer scientist Shunyu Yao proposes that models "perform deliberate decision making" by exploring and backtracking over a tree of intermediate thoughts. OpenAI's reported breakthrough focused on supervising reasoning processes rather than only outcomes, with Lightman et al.'s "Let's Verify Step by Step" reporting that rewarding each correct step "significantly outperforms outcome supervision" on challenging math problems and improves interpretability by aligning the chain-of-thought with human judgment. OpenAI's o1 announcement ties these strands together with a large-scale reinforcement learning algorithm that trains the model to refine its own chain of thought, and it reports that accuracy rises with more training compute and more time spent thinking at inference. Together, these developments define the core of reasoning models. They use supervision signals that evaluate the quality of intermediate steps, they exploit inference-time exploration such as consensus or tree search, and they expose controls for how much internal thinking compute to allocate. OpenAI's o1 family made this approach available at scale in September 2024 and popularized the label "reasoning model" for LLMs that deliberately think before they answer. The development of reasoning models illustrates Richard S. Sutton's "bitter lesson" that scaling compute typically outperforms methods based on human-designed insights. This principle was demonstrated by researchers at the Generative AI Research Lab (GAIR), who initially attempted to replicate o1's capabilities using sophisticated methods including tree search and reinforcement learning in late 2024. Their findings, published in the "o1 Replication Journey" series, revealed that knowledge distillation, a comparatively straightforward technique that trains a smaller model to mimic o1's outputs, produced unexpectedly strong performance. This outcome illustrated how direct scaling approaches can, at times, outperform more complex engineering solutions. === Drawbacks === Reasoning models require significantly more computational resources during inference compared to non-reasoning models. Research on the American Invitational Mathematics Examination (AIME) benchmark found that reasoning models were 10 to 74 times more expensive to operate than their non-reasoning counterparts. The extended inference time is attributed to the detailed, step-by-step reasoning outputs that these models generate, which are typically much longer than responses from standard large language models that provide direct answers without showing their reasoning process. One researcher in early 2025 argued that these models may face potential additional denial-of-service concerns with "overthinking attacks." === Releases === ==== 2024 ==== In September 2024, OpenAI released o1-preview, a large language model with enhanced reasoning capabilities. The full version, o1, was released in December 2024. OpenAI initially shared preliminary results on its successor model, o3, in December 2024, with the full o3 model becoming available in 2025. Alibaba released reasoning versions of its Qwen large language models in November 2024. In December 2024, the company introduced QvQ-72B-Preview, an experimental visual reasoning model. In December 2024, Google introduced Deep Research in Gemini, a feature designed to conduct multi-step research tasks. On December 16, 2024, researchers demonstrated that by scaling test-time compute, a relatively small Llama 3B model could outperform a much larger Llama 70B model on challenging reasoning tasks. This experiment suggested that improved inference strategies can unlock reasoning capabilities even in smaller models. ==== 2025 ==== In January 2025, DeepSeek released R1, a reasoning model that achieved performance comparable to OpenAI's o1 at significantly lower computational cost. The release demonstrated the effectiveness of Group Relative Policy Optimization (GRPO), a reinforcement learning technique used to train the model. On January 25, 2025, DeepSeek enhanced R1 with web search capabilities, allowing the model to retrieve information from the internet while performing reasoning tasks. Research during this period further validated the effectiveness of knowledge distillation for creating reasoning models. The s1-32B model achieved strong performance through budget forcing and scaling methods, reinforcing findings that simpler training approaches can be highly effective for reasoning capabilities. On February 2, 2025, OpenAI released Deep Research, a feature powered by their o3 model that enables users to conduct comprehensive research tasks. The system generates detailed reports by automatically gathering and synthesizing information from multiple web sources. OpenAI called GPT-4.5 its "last non-chain-of-thought model", and implemented with GPT-5 a router model that selects a model based on the difficulty of the task. ==== 2026 ==== In January 2026, Moonshot AI released Kimi K2.5, an open-source 1 trillion parameter MoE model with 32 billion active parameters. It uses an “Agent Swarm” system that dynamically decomposes tasks into sub-agents for reasoning and execution, enabling more scalable multi-step problem solving than a single sequential reasoning chain. == Training == Reasoning models follow the familiar large-scale pretraining used for frontier language models, then diverge in the post-training and optimization. OpenAI reports that o1 is trained with a large-

Artificial intelligence of things

Artificial Intelligence of Things (AIoT) is the combination of artificial intelligence (AI) technologies with the Internet of things (IoT) infrastructure to create systems capable of sensing, learning, and acting on data without continuous human intervention. While IoT focuses on connectivity and sensor data collection, AI enables IoT devices to analyse data in real time and produce actionable outputs, including automated decisions at the edge. == Applications == === Manufacturing and predictive maintenance === Manufacturing accounts for the largest share of AIoT adoption by industry vertical. A common application is predictive maintenance, where sensors measuring vibration, temperature, current draw, and acoustic emissions feed machine learning models trained to detect signatures that precede equipment failure. These systems can flag developing faults weeks or months in advance, and in more advanced deployments can autonomously adjust machine parameters such as motor speed or cooling cycles to delay or prevent failure. === Other industries === In healthcare, AIoT enables remote patient monitoring through wearable devices that collect vital signs and apply AI models to detect anomalies or predict deterioration. In logistics, GPS and telematics sensors combined with AI models support real-time route optimisation, vehicle maintenance prediction, and fuel cost forecasting. Smart building systems use occupancy, temperature, and energy sensors with AI to dynamically adjust HVAC and lighting, reducing energy consumption. == Architecture == AIoT systems typically operate across three layers: a device layer of sensors and actuators that collect data, a connectivity layer that transmits data via protocols such as MQTT or HTTP, and a compute layer where AI models process the data either in the cloud or at the edge. The trend toward edge-based processing, where inference runs on low-cost processors near the data source rather than in a centralised cloud, has accelerated as hardware costs have fallen and applications increasingly require sub-second response times. == Market == Market sizing estimates for AIoT vary significantly depending on scope and definition. Fortune Business Insights valued the AIoT market at USD 35.65 billion in 2023, projecting growth to USD 253.86 billion by 2030 at a compound annual growth rate of 32.4%. Grand View Research estimated the broader market at USD 171.4 billion in 2024 with a CAGR of 31.7% through 2030, reflecting a wider definition that includes AI-integrated hardware components. North America accounted for approximately 40% of global market share in 2024, with the Asia-Pacific region projected as the fastest-growing market.

Data annotation

Data annotation is the process of labeling or tagging relevant metadata within a dataset to enable machines to interpret the data accurately. The dataset can take various forms, including images, audio files, video footage, or text. == Applications == Data is a fundamental component in the development of artificial intelligence (AI). Training AI models, particularly in computer vision and natural language processing, requires large volumes of annotated data. Proper annotation ensures that machine learning algorithms can recognize patterns and make accurate predictions. Common types of data annotation include classification, bounding boxes, semantic segmentation, and keypoint annotation. Data annotation is used in AI-driven fields, including healthcare, autonomous vehicles, retail, security, and entertainment. By accurately labeling data, machine learning models can perform complex tasks such as object detection, sentiment analysis, and speech recognition with greater precision. This growing demand has led to the emergence of specialized sectors and platforms dedicated to AI training and human-in-the-loop workflows, which often utilize Reinforcement Learning from Human Feedback (RLHF) to refine model behavior. == In computer vision == === Image classification === Image classification, also known as image categorization, involves assigning predefined labels to images. Machine learning algorithms trained on classified images can later recognize objects and differentiate between categories. For instance, an AI model trained to recognize furniture styles can distinguish between Georgian and Rococo armchairs. === Semantic segmentation === Semantic segmentation assigns each pixel in an image to a specific class, such as trees, vehicles, humans, or buildings. This type of annotation enables machine learning models to differentiate objects by grouping similar pixels, allowing for a detailed understanding of an image. === Bounding boxes === Bounding box annotation involves drawing rectangular boxes around objects in an image. This technique is commonly used in autonomous driving, security surveillance, and retail analytics to detect and classify objects such as pedestrians, vehicles, and products on store shelves. === 3D cuboids === 3D cuboid annotation enhances traditional bounding boxes by adding depth, enabling models to predict an object's spatial orientation, movement, and size. This method is particularly useful for autonomous vehicles and robotics, where understanding object dimensions and depth is critical. === Polygonal annotation === For objects with irregular shapes, such as curved or multi-sided items, polygonal annotation provides more precise labeling than bounding boxes. This technique is often used in applications that require detailed object recognition, such as medical imaging or aerial mapping. === Keypoint annotation === Keypoint annotation marks specific points on an object, such as facial landmarks or body joints, to enable tracking and motion analysis. This method is widely used in facial recognition, emotion detection, sports analytics, and augmented reality applications.