Structured prediction

Structured prediction

Structured prediction or structured output learning is an umbrella term for supervised machine learning techniques that involves predicting structured objects, rather than discrete or real values. Similar to commonly used supervised learning techniques, structured prediction models are typically trained by means of observed data in which the predicted value is compared to the ground truth, and this is used to adjust the model parameters. Due to the complexity of the model and the interrelations of predicted variables, the processes of model training and inference are often computationally infeasible, so approximate inference and learning methods are used. == Applications == An example application is the problem of translating a natural language sentence into a syntactic representation such as a parse tree. This can be seen as a structured prediction problem in which the structured output domain is the set of all possible parse trees. Structured prediction is used in a wide variety of domains including bioinformatics, natural language processing (NLP), speech recognition, and computer vision. === Example: sequence tagging === Sequence tagging is a class of problems prevalent in NLP in which input data are often sequential, for instance sentences of text. The sequence tagging problem appears in several guises, such as part-of-speech tagging (POS tagging) and named entity recognition. In POS tagging, for example, each word in a sequence must be 'tagged' with a class label representing the type of word: The main challenge of this problem is to resolve ambiguity: in the above example, the words "sentence" and "tagged" in English can also be verbs. While this problem can be solved by simply performing classification of individual tokens, this approach does not take into account the empirical fact that tags do not occur independently; instead, each tag displays a strong conditional dependence on the tag of the previous word. This fact can be exploited in a sequence model such as a hidden Markov model or conditional random field that predicts the entire tag sequence for a sentence (rather than just individual tags) via the Viterbi algorithm. == Techniques == Probabilistic graphical models form a large class of structured prediction models. In particular, Bayesian networks and random fields are popular. Other algorithms and models for structured prediction include inductive logic programming, case-based reasoning, structured SVMs, Markov logic networks, Probabilistic Soft Logic, and constrained conditional models. The main techniques are: Conditional random fields Structured support vector machines Structured k-nearest neighbours Recurrent neural networks, in particular Elman networks Transformers. === Structured perceptron === One of the easiest ways to understand algorithms for general structured prediction is the structured perceptron by Collins. This algorithm combines the perceptron algorithm for learning linear classifiers with an inference algorithm (classically the Viterbi algorithm when used on sequence data) and can be described abstractly as follows: First, define a function ϕ ( x , y ) {\displaystyle \phi (x,y)} that maps a training sample x {\displaystyle x} and a candidate prediction y {\displaystyle y} to a vector of length n {\displaystyle n} ( x {\displaystyle x} and y {\displaystyle y} may have any structure; n {\displaystyle n} is problem-dependent, but must be fixed for each model). Let G E N {\displaystyle GEN} be a function that generates candidate predictions. Then: Let w {\displaystyle w} be a weight vector of length n {\displaystyle n} For a predetermined number of iterations: For each sample x {\displaystyle x} in the training set with true output t {\displaystyle t} : Make a prediction y ^ {\displaystyle {\hat {y}}} : y ^ = a r g m a x { y ∈ G E N ( x ) } ( w T , ϕ ( x , y ) ) {\displaystyle {\hat {y}}={\operatorname {arg\,max} }\,\{y\in GEN(x)\}\,(w^{T},\phi (x,y))} Update w {\displaystyle w} (from y ^ {\displaystyle {\hat {y}}} towards t {\displaystyle t} ): w = w + c ( − ϕ ( x , y ^ ) + ϕ ( x , t ) ) {\displaystyle w=w+c(-\phi (x,{\hat {y}})+\phi (x,t))} , where c {\displaystyle c} is the learning rate. In practice, finding the argmax over G E N ( x ) {\displaystyle {GEN}({x})} is done using an algorithm such as Viterbi or a max-sum, rather than an exhaustive search through an exponentially large set of candidates. The idea of learning is similar to that for multiclass perceptrons.

Distributed Common Ground System

The Distributed Common Ground System (DCGS) is a system which produces military intelligence for multiple branches of the American military. == DCGS Programs == DCGS-N - DCGS for the United States Navy DCGS-A - DCGS for the United States Army AF DCGS - DCGS for the United States Air Force DCGS-MC - DCGS for the United States Marine Corps DCGS-SOF - DCGS for the United States Special Operations Forces IS&A Support Center - DCGS-A Help Desk for the United States Army - https://dcgsahelp.max.gov/ - Max.gov sunset 15 December 2023 == Description == While in U.S. Air Force use, the system produces intelligence collected by the U-2 Dragonlady, RQ-4 Global Hawk, MQ-9 Reaper and MQ-1 Predator. The previous system of similar use was the Deployable Ground Station (DGS), which was first deployed in July 1994. Subsequent version of DGS were developed from 1995 through 2009. Although officially designated a "weapons system", it consists of computer hardware and software connected together in a computer network, devoted to processing and dissemination of information such as images. The 480th Intelligence, Surveillance and Reconnaissance Wing of the Air Combat Command operates and maintains the USAF system. A plan envisioned in 1998 was to develop interoperable systems for the Army and Navy, in addition to the Air Force. By 2006, version 10.6 was deployed by the Air Force, and a version known as DCGS-A was developed for the Army. After a 2010 report by General Michael T. Flynn, the program was intended to use cloud computing and be as easy to use as an iPad, which soldiers over a few years were commonly using. By April 2011, project manager Colonel Charles Wells announced version 3 of the Army system (code named "Griffin") was being deployed in the US war in Afghanistan. In January 2012, the United States Army Communications-Electronics Research, Development and Engineering Center hosted a meeting based on the DCGS-A early experience. It brought together technology providers in the hope of developing more integrated systems using cloud computing with open architectures, compared to previously specialized custom-built systems. A major contractor was Lockheed Martin, with computers supplied by Silicon Graphics International out of its Chippewa Falls, Wisconsin office. Software known as the Analyst's Notebook, originally developed by i2 Limited, was included in DCGS-A. IBM acquired i2 in 2011. Some US Army personnel reported using a Palantir Technologies product to improve their ability to predict locations of improvised explosive devices. An April 2012 report recommending further study after initial success. Palantir software was rated easy to use, but did not have the flexibility and wide number of data sources of DCGS-A. In July 2012, Congressman Duncan D. Hunter (from California, the state where Palantir is based) complained of US DoD obstacles to its wider use. Although a limited test in August 2011 by the Test and Evaluation Command had recommended deployment, operation problems of DCGS-A included the baseline system was "not operationally effective" with reboots on average about every 8 hours. A set of improvements was identified in November 2012. The press reported some of the shortcomings uncovered by General Genaro Dellarocco in the tests. The ambitious goal of integrating 473 data sources for 75 million reports proved to be challenging, after spending an estimated $2.3 billion on the Army system alone. In May 2013 Politico reported that Palantir lobbyists and some anonymous returning veterans continued to advocate the use of its software, despite its interoperability limits. In particular, members of special forces and US Marines were not required to use the official Army system. Similar stories appeared in other publications, with Army representatives (such as Major General Mary A. Legere) citing the limitations of various systems. Congressman Hunter was a member of the House Armed Services Committee which required a review of the program, after two other members of congress sent an open letter to Secretary of Defense Leon Panetta. The Senate Defense Appropriations Subcommittee included testimony from Army Chief of Staff General Ray Odierno. The 130th Engineer Brigade (United States) has found the system to be "unstable, slow, not friendly and a major hindrance to operations". The equivalent system for the United States Navy was planned for initial deployment by 2015, and within a shipboard network called Consolidated Afloat Networks and Enterprise Services (CANES) by 2016. Some early testing was announced in 2009 aboard the aircraft carrier USS Harry Truman. A portion of the software, a distributed data framework for the DCGS integration backbone (DIB) version 4, was submitted to an open-source software repository of the Codice Foundation on GitHub. The framework was new for DIB version 4, replacing the legacy DIB portal with an Ozone Widget Framework interface. It was written in the Java programming language. == DCGS-A == Distributed Common Ground System-Army (DCGS-A) is the United States Army's primary system to post data, process information, and disseminate Intelligence, Surveillance and Reconnaissance (ISR) information about the threat, weather, and terrain to echelons. DCGS-A provides commanders the ability to task battle-space sensors and receive intelligence information from multiple sources. === Promotion === An August 17, 2011, UPI article quoted i2 Chief Executive Officer Robert Griffin who commented on DCGS-A's best-of-breed approach to development. The article detailed the Army contracting with i2 for Analyst's Notebook software. "With its open architecture, Analyst's Notebook supports the Army's strategy to employ and integrate best-of-breed solutions from across the industry to meet the dynamic needs users face in the field on a daily basis." A February 1, 2012, article in the Army web page quoted Mark Kitz, DCGS-A technical director. DCGS-A "uses the latest in cloud technology to rapidly gather, collaborate and share intelligence data from multiple sources to deliver a common operating picture. DCGS-A is able to rapidly adapt to changing operational environments by leveraging an iterative development model and open architecture allowing for collaboration with multiple government, industry and academic partners." A July 2012 article in SIGNAL Magazine, monthly publication of the Armed Forces Communications and Electronics Association, promoted DCGS-A as taking advantage of technological environments with which young soldiers are familiar. The article quoted the DCGS-A program manager, Col. Charles Wells on the systems benefits. The article also included Lockheed Martin's DCGS-A program manager. The Milwaukee Journal Sentinel published an article May 4, 2012, about Wisconsin-located companies helping DCGS-A with cloud computing technology. The article promoted the speed when cloud computing processes intelligence and cost savings by analyzing data in the field. === The U.S. Army's 2011 Posture Statement === The U.S. Army released its 2011 Army Posture Statement March 2. It included a statement on DCGS-A: “The Distributed Common Ground System-Army (DCGS-A) is the Army's premier intelligence, surveillance, and reconnaissance (ISR) enterprise for the tasking of sensors, analysis and processing of data, exploitation of data, and dissemination of intelligence (TPED) across all echelons. It is the Army component of the larger Defense Intelligence Information Enterprise (DI2E) and interoperable with other Service DCGS programs. Under the DI2E framework, USD (I) hopes to provide COCOM Joint Intelligence Operations Centers (JIOCs) capabilities interoperable with DCGS-A through a Cloud/widget approach. DCGS-A connects tactical, operational, and theater-level commanders to hundreds of intelligence and intelligence-related data sources at all classification levels and allows them to focus efforts of the entire ISR community on their information requirements. === Comparisons === Some Ground Commanders who describe DCGS-A as "unwieldy and unreliable, hard to learn and difficult to use," supporting alternative software from Palantir Technologies. Palantir software supports small unit situational awareness, but is not sufficiently funded to support the broader role that DCGS-A fulfills. == Operators == 480th Intelligence, Surveillance and Reconnaissance Wing 9th Intelligence Squadron 13th Intelligence Squadron 548th Intelligence, Surveillance and Reconnaissance Group 548 Operational Support Squadron 48th Intelligence Squadron 101st Intelligence Squadron 113th Air Support Operations Squadron 127th Command and Control Squadron 161st Intelligence Squadron

The Best Free AI Logo Maker for Beginners

Curious about the best AI logo maker? An AI logo maker is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI logo maker slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

The Best Free AI Paraphrasing Tool for Beginners

Trying to pick the best AI paraphrasing tool? An AI paraphrasing tool is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI paraphrasing tool slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

AI Blog Writers: Free vs Paid (2026)

Shopping for the best AI blog writer? An AI blog writer is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI blog writer slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

Kolmogorov–Arnold Networks

Kolmogorov–Arnold Networks (KANs) are a type of artificial neural network architecture inspired by the Kolmogorov–Arnold representation theorem, also known as the superposition theorem. Unlike traditional multilayer perceptrons (MLPs), which rely on fixed activation functions and linear weights, KANs replace each weight with a learnable univariate function, often represented using splines. == History == KANs (Kolmogorov–Arnold Networks) were proposed by Liu et al. (2024) as a generalization of the Kolmogorov–Arnold representation theorem (KART), aiming to outperform MLPs in small-scale AI and scientific tasks. Before KANs, numerous studies explored KART's connections to neural networks or used it as a basis for designing new network architectures. In the 1980s and 1990s, early research applied KART to neural network design. Kůrková et al. (1992), Hecht-Nielsen (1987), and Nees (1994) established theoretical foundations for multilayer networks based on KART. Igelnik et al. (2003) introduced the Kolmogorov Spline Network using cubic splines to model complex functions. Sprecher (1996, 1997) introduced numerical methods for building network layers, while Nakamura et al. (1993) created activation functions with guaranteed approximation accuracy. These works linked KART's theoretical potential with practical neural network implementation. KART has also been used in other computational and theoretical fields. Coppejans (2004) developed nonparametric regression estimators using B-splines, Bryant (2008) applied it to high-dimensional image tasks, Liu (2015) investigated theoretical applications in optimal transport and image encryption, and more recently, Polar and Poluektov (2021) used Urysohn operators for efficient KART construction, while Fakhoury et al. (2022) introduced ExSpliNet, integrating KART with probabilistic trees and multivariate B-splines for improved function approximation. == Architecture == KANs are based on the Kolmogorov–Arnold representation theorem, which was linked to the 13th Hilbert problem. Given x = ( x 1 , x 2 , … , x n ) {\displaystyle x=(x_{1},x_{2},\dots ,x_{n})} consisting of n variables, a multivariate continuous function f ( x ) {\displaystyle f(x)} can be represented as: f ( x ) = f ( x 1 , … , x n ) = ∑ q = 1 2 n + 1 Φ q ( ∑ p = 1 n φ q , p ( x p ) ) {\displaystyle f(x)=f(x_{1},\dots ,x_{n})=\sum _{q=1}^{2n+1}\Phi _{q}\left(\sum _{p=1}^{n}\varphi _{q,p}(x_{p})\right)} (1) This formulation contains two nested summations: an outer and an inner sum. The outer sum ∑ q = 1 2 n + 1 {\displaystyle \sum _{q=1}^{2n+1}} aggregates 2 n + 1 {\displaystyle 2n+1} terms, each involving a function Φ q : R → R {\displaystyle \Phi _{q}:\mathbb {R} \to \mathbb {R} } . The inner sum ∑ p = 1 n {\displaystyle \sum _{p=1}^{n}} computes n terms for each q, where each term φ q , p : [ 0 , 1 ] → R {\displaystyle \varphi _{q,p}:[0,1]\to \mathbb {R} } is a continuous function of the single variable x p {\displaystyle x_{p}} . The inner continuous functions φ q , p {\displaystyle \varphi _{q,p}} are universal, independent of f {\displaystyle f} , while the outer functions Φ q {\displaystyle \Phi _{q}} depend on the specific function f {\displaystyle f} being represented. The representation (1) holds for all multivariate functions f {\displaystyle f} as proved in . If f {\displaystyle f} is continuous, then the outer functions Φ q {\displaystyle \Phi _{q}} are continuous; if f {\displaystyle f} is discontinuous, then the corresponding Φ q {\displaystyle \Phi _{q}} are generally discontinuous, while the inner functions φ q , p {\displaystyle \varphi _{q,p}} remain the same universal functions. Liu et al. proposed the name KAN. A general KAN network consisting of L layers takes x to generate the output as: K A N ( x ) = ( Φ L − 1 ∘ Φ L − 2 ∘ ⋯ ∘ Φ 1 ∘ Φ 0 ) x {\displaystyle \mathrm {KAN} (x)=(\Phi ^{L-1}\circ \Phi ^{L-2}\circ \cdots \circ \Phi ^{1}\circ \Phi ^{0})x} (3) Here, Φ l {\displaystyle \Phi ^{l}} is the function matrix of the l-th KAN layer or a set of pre-activations. Let i denote the neuron of the l-th layer and j the neuron of the (l+1)-th layer. The activation function φ j , i l {\displaystyle \varphi _{j,i}^{l}} connects (l, i) to (l+1, j): φ j , i l , l = 0 , … , L − 1 , i = 1 , … , n l , j = 1 , … , n l + 1 {\displaystyle \varphi _{j,i}^{l},\quad l=0,\dots ,L-1,\;i=1,\dots ,n_{l},\;j=1,\dots ,n_{l+1}} (4) where nl is the number of nodes of the l-th layer. Thus, the function matrix Φ l {\displaystyle \Phi ^{l}} can be represented as an n l + 1 × n l {\displaystyle n_{l+1}\times n_{l}} matrix of activations: x l + 1 = ( φ 1 , 1 l ( ⋅ ) φ 1 , 2 l ( ⋅ ) ⋯ φ 1 , n l l ( ⋅ ) φ 2 , 1 l ( ⋅ ) φ 2 , 2 l ( ⋅ ) ⋯ φ 2 , n l l ( ⋅ ) ⋮ ⋮ ⋱ ⋮ φ n l + 1 , 1 l ( ⋅ ) φ n l + 1 , 2 l ( ⋅ ) ⋯ φ n l + 1 , n l l ( ⋅ ) ) x l {\displaystyle x^{l+1}={\begin{pmatrix}\varphi _{1,1}^{l}(\cdot )&\varphi _{1,2}^{l}(\cdot )&\cdots &\varphi _{1,n_{l}}^{l}(\cdot )\\\varphi _{2,1}^{l}(\cdot )&\varphi _{2,2}^{l}(\cdot )&\cdots &\varphi _{2,n_{l}}^{l}(\cdot )\\\vdots &\vdots &\ddots &\vdots \\\varphi _{n_{l+1},1}^{l}(\cdot )&\varphi _{n_{l+1},2}^{l}(\cdot )&\cdots &\varphi _{n_{l+1},n_{l}}^{l}(\cdot )\end{pmatrix}}x^{l}} == Implementations == To make the KAN layers optimizable, the inner function is formed by the combination of spline and basic functions as the formula: φ ( x ) = w b b ( x ) + w s spline ( x ) {\displaystyle \varphi (x)=w_{b}\,b(x)+w_{s}\,{\text{spline}}(x)} where b ( x ) {\displaystyle b(x)} is the basic function, usually defined as s i l u ( x ) = x / ( 1 + e x ) {\displaystyle silu(x)=x/(1+e^{x})} and w b {\displaystyle w_{b}} is the base weight matrix. Also, w s {\displaystyle w_{s}} is the spline weight matrix and spline ( x ) {\displaystyle {\text{spline}}(x)} is the spline function. The spline function can be a sum of B-splines. spline ( x ) = ∑ i c i B i ( x ) {\displaystyle {\text{spline}}(x)=\sum _{i}c_{i}B_{i}(x)} Many studies suggested to use other polynomial and curve functions instead of B-spline to create new KAN variants. == Functions used == The choice of functional basis strongly influences the performance of KANs. Common function families include: B-splines: Provide locality, smoothness, and interpretability; they are the most widely used in current implementations. RBFs (include Gaussian RBFs): Capture localized features in data and are effective in approximating functions with non-linear or clustered structures. Chebyshev polynomials: Offer efficient approximation with minimized error in the maximum norm, making them useful for stable function representation. Rational function: Useful for approximating functions with singularities or sharp variations, as they can model asymptotic behavior better than polynomials. Fourier series: Capture periodic patterns effectively and are particularly useful in domains such as physics-informed machine learning. Wavelet functions (DoG, Mexican hat, Morlet, and Shannon): Used for feature extraction as they can capture both high-frequency and low-frequency data components. Piecewise linear functions: Provide efficient approximation for multivariate functions in KANs. == Usage == In some modern neural architectures like convolutional neural networks (CNNs), recurrent neural networks (RNNs), and Transformers, KANs are typically used as drop-in substitutes for MLP layers. Despite KANs' general-purpose design, researchers have created and used them for a number of tasks: Scientific machine learning (SciML): Function fitting, partial differential equations (PDEs) and physical/mathematical laws. Continual learning: KANs better preserve previously learned information during incremental updates, avoiding catastrophic forgetting due to the locality of spline adjustments. Graph neural networks: Extensions such as Kolmogorov–Arnold Graph Neural Networks (KA-GNNs) integrate KAN modules into message-passing architectures, showing improvements in molecular property prediction tasks. Sensor data processing: Kolmogorov–Arnold Networks (KANs) have recently been applied to sensor data processing due to their ability to model complex nonlinear relationships with relatively few parameters and improved interpretability compared to conventional multilayer perceptrons. Applications include industrial soft sensors, biomedical signal analysis, remote sensing, and environmental monitoring systems. == Drawbacks == KANs can be computationally intensive and require a large number of parameters due to their use of polynomial functions to capture data.

Topic model

In natural language processing, a topic model is a type of probabilistic, neural, or algebraic model for discovering the abstract topics that occur in a collection of documents. Topic modeling is a frequently used text mining tool for discovering hidden semantic features and structures in a text. The topics produced by topic models are generated through a variety of mathematical frameworks, including probabilistic generative models, matrix factorization methods based on word co-occurrence, and clustering algorithms applied to semantic embeddings. Topic models are commonly used to organize and discover latent features in large collections of unstructured text and other forms of big data. Beyond text mining, topic models have also been used to uncover latent structures in fields such as genetic information, bioinformatics, computer vision, and social networks. == History == An early topic model was described by Papadimitriou, Raghavan, Tamaki and Vempala in 1998. Another one, called probabilistic latent semantic analysis (PLSA), was created by Thomas Hofmann in 1999. Latent Dirichlet allocation (LDA), perhaps the most common topic model currently in use, is a generalization of PLSA. Developed by David Blei, Andrew Ng, and Michael I. Jordan in 2002, LDA introduces sparse Dirichlet prior distributions over document-topic and topic-word distributions, encoding the intuition that documents cover a small number of topics and that topics often use a small number of words. Other topic models are generally extensions on LDA, such as Pachinko allocation, which improves on LDA by modeling correlations between topics in addition to the word correlations which constitute topics. Hierarchical latent tree analysis (HLTA) is an alternative to LDA, which models word co-occurrence using a tree of latent variables and the states of the latent variables, which correspond to soft clusters of documents, are interpreted as topics. == Topic models for context information == Approaches for temporal information include Block and Newman's determination of the temporal dynamics of topics in the Pennsylvania Gazette during 1728–1800. Griffiths & Steyvers used topic modeling on abstracts from the journal PNAS to identify topics that rose or fell in popularity from 1991 to 2001 whereas Lamba & Madhusushan used topic modeling on full-text research articles retrieved from DJLIT journal from 1981 to 2018. In the field of library and information science, Lamba & Madhusudhan applied topic modeling on different Indian resources like journal articles and electronic theses and resources (ETDs). Nelson has been analyzing change in topics over time in the Richmond Times-Dispatch to understand social and political changes and continuities in Richmond during the American Civil War. Yang, Torget and Mihalcea applied topic modeling methods to newspapers from 1829 to 2008. Mimno used topic modelling with 24 journals on classical philology and archaeology spanning 150 years to look at how topics in the journals change over time and how the journals become more different or similar over time. Yin et al. introduced a topic model for geographically distributed documents, where document positions are explained by latent regions which are detected during inference. Chang and Blei included network information between linked documents in the relational topic model, to model the links between websites. The author-topic model by Rosen-Zvi et al. models the topics associated with authors of documents to improve the topic detection for documents with authorship information. HLTA was applied to a collection of recent research papers published at major AI and Machine Learning venues. The resulting model is called The AI Tree. The resulting topics are used to index the papers at aipano.cse.ust.hk to help researchers track research trends and identify papers to read, and help conference organizers and journal editors identify reviewers for submissions. To improve the qualitative aspects and coherency of generated topics, some researchers have explored the efficacy of "coherence scores", or otherwise how computer-extracted clusters (i.e. topics) align with a human benchmark. Coherence scores are metrics for optimising the number of topics to extract from a document corpus. == Algorithms == In practice, researchers attempt to fit appropriate model parameters to the data corpus using one of several heuristics for maximum likelihood fit. A survey by D. Blei describes this suite of algorithms. Several groups of researchers starting with Papadimitriou et al. have attempted to design algorithms with provable guarantees. Assuming that the data were actually generated by the model in question, they try to design algorithms that probably find the model that was used to create the data. Techniques used here include singular value decomposition (SVD) and the method of moments. In 2012 an algorithm based upon non-negative matrix factorization (NMF) was introduced that also generalizes to topic models with correlations among topics. Since 2017, neural networks has been leveraged in topic modeling in order to improve the speed of inference, and leading to further advancements like vONTSS, which allows humans to incorporate domain knowledge via weakly supervised learning. In 2018, a new approach to topic models was proposed based on the stochastic block model. Topic modeling has leveraged LLMs through contextual embedding and fine tuning. == Applications of topic models == === To quantitative biomedicine === Topic models are being used also in other contexts. For examples uses of topic models in biology and bioinformatics research emerged. Recently topic models has been used to extract information from dataset of cancers' genomic samples. In this case topics are biological latent variables to be inferred. === To analysis of music and creativity === Topic models can be used for analysis of continuous signals like music. For instance, they were used to quantify how musical styles change in time, and identify the influence of specific artists on later music creation.