AI for Business

Explore the best AI for Business — independent reviews, comparisons, pricing and step-by-step how-to guides, curated by Aizhi.

  • Production (computer science)

    In computer science, a production or production rule is a rewrite rule that replaces some symbols with other symbols. A finite set of productions $P$ is the main component in the specification of a formal grammar (specifically a generative grammar). In such grammars, a set of productions is a special case of a relation on the set of strings $V^*$ (where ${}^*$ is the Kleene star operator) over a finite set of symbols $V$, called a vocabulary, that defines which non-empty strings can be substituted with others. The set of productions is thus a special kind of subset $P \subset V^* \times V^*$, and productions are written in the form $u \to v$ to mean that $(u, v) \in P$ (not to be confused with $\to$ used as function notation, since there may be multiple rules for the same $u$). Given two subsets $A, B \subset V^*$, productions can be restricted to satisfy $P \subset A \times B$, in which case productions are said to be "of the form $A \to B$". Different choices and constructions of $A, B$ lead to different types of grammars. In general, any production of the form $u \to \epsilon$, where $\epsilon$ is the empty string (sometimes also denoted $\lambda$), is called an erasing rule, while productions that would produce strings out of nowhere, namely of the form $\epsilon \to v$, are never allowed.
    In order to allow the production rules to create meaningful sentences, the vocabulary is partitioned into disjoint sets $\Sigma$ and $N$ with two different roles: $\Sigma$ denotes the terminal symbols, known as an alphabet, containing the symbols allowed in a sentence; $N$ denotes the nonterminal symbols, containing a distinguished start symbol $S \in N$, which are needed together with the production rules to define how sentences are built. In the most general case of an unrestricted grammar, a production $u \to v$ is allowed to map arbitrary strings $u$ and $v$ over $V$ (terminals and nonterminals), as long as $u$ is not empty. So unrestricted grammars have productions of the form $V^* \setminus \{\epsilon\} \to V^*$ or, if we want to disallow changing finished sentences, $V^* N V^* = (V^* \setminus \Sigma^*) \to V^*$, where $V^* N V^*$ indicates concatenation and forces a nonterminal symbol to always be present on the left-hand side of the productions, and $\setminus$ denotes set difference. If we do not allow the start symbol to occur in $v$ (the word on the right-hand side), we have to replace $V^*$ with $(V \setminus \{S\})^*$ on the right-hand side. The other types of formal grammar in the Chomsky hierarchy impose additional restrictions on what constitutes a production. Notably, in a context-free grammar, the left-hand side of a production must be a single nonterminal symbol.
    So context-free productions are of the form $N \to V^*$.

    == Grammar generation ==

    To generate a string in the language, one begins with a string consisting of only a single start symbol, and then successively applies the rules (any number of times, in any order) to rewrite this string. This stops when a string containing only terminals is obtained. The language consists of all the strings that can be generated in this manner. Any particular sequence of legal choices taken during this rewriting process yields one particular string in the language. If a single string has more than one distinct parse tree (equivalently, more than one leftmost derivation), the grammar is said to be ambiguous.

    For example, assume the alphabet consists of $a$ and $b$, with the start symbol $S$, and we have the following rules:

    1. $S \rightarrow aSb$
    2. $S \rightarrow ba$

    We start with $S$ and can choose a rule to apply to it. If we choose rule 1, we replace $S$ with $aSb$ and obtain the string $aSb$. If we choose rule 1 again, we replace $S$ with $aSb$ and obtain the string $aaSbb$. This process is repeated until we only have symbols from the alphabet (i.e., $a$ and $b$). If we now choose rule 2, we replace $S$ with $ba$ and obtain the string $aababb$, and are done. We can write this series of choices more briefly, using symbols: $S \Rightarrow aSb \Rightarrow aaSbb \Rightarrow aababb$. The language of the grammar is the set of all the strings that can be generated using this process: $\{ba, abab, aababb, aaababbb, \dotsc\}$.
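The rewriting process described above can be sketched in Python. This is a minimal illustration, not a parser: it assumes every symbol is a single character, stores the productions as a list of (u, v) pairs (so one left-hand side may have several rules), and enumerates the example grammar's language breadth-first. The names `one_step`, `generate`, `PRODUCTIONS`, and `NONTERMINALS` are hypothetical.

```python
from collections import deque

# Productions as (u, v) pairs: a relation, so the same u may appear in several rules.
PRODUCTIONS = [("S", "aSb"), ("S", "ba")]
NONTERMINALS = {"S"}

def one_step(form, productions):
    """Yield every string obtained by rewriting one occurrence of some u as v."""
    for u, v in productions:
        start = form.find(u)
        while start != -1:
            yield form[:start] + v + form[start + len(u):]
            start = form.find(u, start + 1)

def generate(start, productions, nonterminals, max_len):
    """Breadth-first enumeration of the terminal strings of length <= max_len.

    Pruning sentential forms longer than max_len is only safe when no rule
    shortens a string, i.e. when the grammar has no erasing rules.
    """
    seen, language = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if not any(ch in nonterminals for ch in form):
            language.add(form)          # only terminals left: a sentence of the language
            continue
        for nxt in one_step(form, productions):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(language, key=len)

print(generate("S", PRODUCTIONS, NONTERMINALS, max_len=8))
# prints: ['ba', 'abab', 'aababb', 'aaababbb']
```

Because `one_step` searches for any occurrence of any left-hand side `u`, the same function also applies rules of an unrestricted grammar, where `u` can be longer than one symbol; only the length-based pruning depends on the grammar being non-erasing.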

  • Regularization perspectives on support vector machines

    Within mathematical analysis, regularization perspectives on support-vector machines provide a way of interpreting support-vector machines (SVMs) in the context of other regularization-based machine-learning algorithms. SVM algorithms categorize binary data, with the goal of fitting the training-set data in a way that minimizes the average of the hinge-loss function plus the L2 norm of the learned weights. This strategy avoids overfitting via Tikhonov regularization in the L2-norm sense, and also corresponds to trading off the bias and variance of the estimator of the weights. Estimators with lower mean squared error predict better, or generalize better, when given unseen data. Specifically, Tikhonov regularization algorithms produce a decision boundary that minimizes the average training-set error, while a term penalizing the L2 norm of the weights keeps the decision boundary from becoming excessively complicated or overfitting the training data. The training- and test-set errors can be measured with accuracy, precision, AUC-ROC, precision-recall, and other metrics. Regularization perspectives on support-vector machines interpret SVM as a special case of Tikhonov regularization, specifically Tikhonov regularization with the hinge loss as the loss function. This provides a theoretical framework with which to analyze SVM algorithms and compare them to other algorithms with the same goal: to generalize without overfitting. SVM was first proposed in 1995 by Corinna Cortes and Vladimir Vapnik, and framed geometrically as a method for finding hyperplanes that can separate multidimensional data into two categories. This traditional geometric interpretation of SVMs provides useful intuition about how SVMs work, but is difficult to relate to other machine-learning techniques for avoiding overfitting, like regularization, early stopping, sparsity and Bayesian inference.
    However, once it was discovered that SVM is also a special case of Tikhonov regularization, regularization perspectives on SVM provided the theory necessary to fit SVM within a broader class of algorithms. This has enabled detailed comparisons between SVM and other forms of Tikhonov regularization, and theoretical grounding for why it is beneficial to use SVM's loss function, the hinge loss.

    == Theoretical background ==

    In the statistical learning theory framework, an algorithm is a strategy for choosing a function $f\colon \mathbf{X} \to \mathbf{Y}$ given a training set $S = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ of inputs $x_i$ and their labels $y_i$ (the labels are usually $\pm 1$). Regularization strategies avoid overfitting by choosing a function that fits the data but is not too complex. Specifically:

    $$f = \underset{f \in \mathcal{H}}{\operatorname{argmin}} \left\{ \frac{1}{n} \sum_{i=1}^{n} V(y_i, f(x_i)) + \lambda \|f\|_{\mathcal{H}}^2 \right\},$$

    where $\mathcal{H}$ is a hypothesis space of functions, $V\colon \mathbf{Y} \times \mathbf{Y} \to \mathbb{R}$ is the loss function, $\|\cdot\|_{\mathcal{H}}$ is a norm on the hypothesis space of functions, and $\lambda > 0$ is the regularization parameter. When $\mathcal{H}$ is a reproducing kernel Hilbert space, there exists a kernel function $K\colon \mathbf{X} \times \mathbf{X} \to \mathbb{R}$ whose values on the training inputs, $\mathbf{K}_{ij} = K(x_i, x_j)$, form an $n \times n$ symmetric positive-semidefinite matrix $\mathbf{K}$.
    By the representer theorem,

    $$f(x_i) = \sum_{j=1}^{n} c_j \mathbf{K}_{ij}, \quad \text{and} \quad \|f\|_{\mathcal{H}}^2 = \langle f, f \rangle_{\mathcal{H}} = \sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j K(x_i, x_j) = c^T \mathbf{K} c.$$

    == Special properties of the hinge loss ==

    The simplest and most intuitive loss function for categorization is the misclassification loss, or 0–1 loss, which is 0 if $f(x_i) = y_i$ and 1 if $f(x_i) \neq y_i$, i.e. the Heaviside step function on $-y_i f(x_i)$. However, this loss function is not convex, which makes the regularization problem very difficult to minimize computationally. Therefore, we look for convex substitutes for the 0–1 loss. The hinge loss,

    $$V\big(y_i, f(x_i)\big) = \big(1 - y_i f(x_i)\big)_+, \quad \text{where } (s)_+ = \max(s, 0),$$

    provides such a convex relaxation. In fact, the hinge loss is the tightest convex upper bound to the 0–1 misclassification loss function, and with infinite data returns the Bayes-optimal solution:

    $$f_b(x) = \begin{cases} 1, & p(1 \mid x) > p(-1 \mid x), \\ -1, & p(1 \mid x) < p(-1 \mid x). \end{cases}$$
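The Tikhonov objective with the hinge loss can be written entirely in terms of the coefficients $c$ and the kernel matrix $\mathbf{K}$, as the representer theorem suggests. The NumPy sketch below does exactly that; the Gaussian (RBF) kernel, the toy data, and the plain subgradient-descent solver (`rbf_kernel`, `fit`, and the `lam`, `lr`, `steps` settings are all hypothetical names and values) are illustrative assumptions, and practical SVM libraries typically solve the dual quadratic program instead.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||X_i - Z_j||^2); an illustrative choice of K."""
    sq_dists = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * sq_dists)

def hinge(y, fx):
    """Hinge loss (1 - y f(x))_+."""
    return np.maximum(0.0, 1.0 - y * fx)

def zero_one(y, fx):
    """0-1 loss: 1 when y f(x) <= 0 (a misclassification or a tie), else 0."""
    return (y * fx <= 0).astype(float)

def objective(c, K, y, lam):
    """Empirical hinge risk plus lam * c^T K c, with f(x_i) = sum_j c_j K_ij."""
    fx = K @ c
    return hinge(y, fx).mean() + lam * c @ K @ c

def fit(K, y, lam=0.01, lr=0.1, steps=2000):
    """Plain subgradient descent on the representer coefficients c."""
    c = np.zeros(len(y))
    for _ in range(steps):
        active = (y * (K @ c) < 1).astype(float)   # points whose hinge loss is non-zero
        # Subgradient of the mean hinge term is -(1/n) K (active * y) because K is symmetric;
        # the gradient of lam * c^T K c is 2 * lam * K c.
        grad = -(K @ (active * y)) / len(y) + 2.0 * lam * (K @ c)
        c = c - lr * grad
    return c

# Toy, linearly separable data (illustrative only).
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
K = rbf_kernel(X, X)
c = fit(K, y)
print("predicted labels:", np.sign(K @ c))
print("objective at c:", objective(c, K, y, 0.01), "vs. at c = 0:", objective(np.zeros(len(y)), K, y, 0.01))
```

Evaluating `hinge` and `zero_one` on a grid of margins $y f(x)$ is a quick way to see that the hinge loss never falls below the 0–1 loss, which is the upper-bound property the section relies on.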

  • Is an AI Marketing Tool Worth It in 2026?

    Trying to pick the best AI marketing tool? An AI marketing tool is software that uses machine learning to automate or speed up marketing tasks, and it can scale from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI marketing tool fits into your existing workflow and should pay for itself. This guide breaks down the top picks, their pros and cons, and who each one is best for.

  • Boris Katz

    Boris Gershevich Katz (Russian: Борис Гершевич Кац; born October 5, 1947) is an American computer scientist and principal research scientist at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology in Cambridge, where he heads the laboratory's InfoLab Group. His research interests include natural language processing and understanding, machine learning, and intelligent information access. His brother Victor Kac is a mathematician at MIT. Katz was able to leave the USSR before the end of the Cold War with the help of U.S. Senator Ted Kennedy. Over the last several decades, he has been developing the START natural language system, which allows users to access various types of information using English.

    == Biography ==

    Boris Katz was born on October 5, 1947, in Chișinău, to Hersh Katz (died 1976) and Hayki (Klara) Landman (born 1921, Lipcani, Briceni District; died 2006, Cambridge, Middlesex County), who had moved to Chișinău before the war from Lipcani, a town in northern Bessarabia. He graduated from Moscow State University and defended his thesis as a candidate of physical and mathematical sciences in 1975 under the supervision of Evgenii M. Landis. In November 1978, he left for the United States thanks to the personal intervention of Senator Edward M. Kennedy. He currently lives in Boston and heads the InfoLab research group at the MIT Computer Science and Artificial Intelligence Laboratory. Boris Katz is the creator of the START information processing system (available on the Internet since 1993) and the author of several works on the processing, generation, and understanding of natural language, machine learning, and accelerated access to multimedia information.
    == Family ==

    Brothers: Victor Kac, American mathematician and professor at the Massachusetts Institute of Technology; Mikhail Gershevich Katz, Israeli mathematician, graduate of Harvard and Columbia (Ph.D., 1984) universities, professor at Bar-Ilan University, and author of the monograph "Systolic Geometry and Topology" (Mathematical Surveys and Monographs, vol. 137, American Mathematical Society: Providence, 2007). Daughter: Luba Katz, a bioinformatics scientist (her husband is Alan Jasanoff, a neuroimaging scientist and professor at MIT, the son of Harvard University professors Jay Jasanoff and Sheila Jasanoff).

    == Past works ==

    - A Knowledge Entry System for Subject Matter Experts: the goal of the SHAKEN project is to enable subject matter experts, without any assistance from AI technologists, to assemble models of processes and mechanisms so that questions about them can be answered by declarative inference and simulation.
    - Exploiting lexical regularities in designing natural language systems
    - Word sense disambiguation for information retrieval
    - HIKE (HPKB integrated knowledge environment): a query interface and integrated knowledge environment for HPKB
    - Quantitative evaluation of passage retrieval algorithms for question answering
    - Sticky notes for the semantic web
    - Question answering from the web using knowledge annotation and knowledge mining techniques
    - The role of context in question answering systems

  • Rademacher complexity

    In computational learning theory (machine learning and theory of computation), Rademacher complexity, named after Hans Rademacher, measures the richness of a class of sets with respect to a probability distribution. The concept can also be extended to real-valued functions.

    == Definitions ==

    === Rademacher complexity of a set ===

    Given a set $A \subseteq \mathbb{R}^m$, the Rademacher complexity of $A$ is defined as follows:

    $$\operatorname{Rad}(A) := \frac{1}{m} \mathbb{E}_{\sigma}\left[\sup_{a \in A} \sum_{i=1}^{m} \sigma_i a_i\right]$$

    where $\sigma_1, \sigma_2, \dots, \sigma_m$ are independent random variables drawn from the Rademacher distribution, i.e. $\Pr(\sigma_i = +1) = \Pr(\sigma_i = -1) = 1/2$ for $i \in \{1, 2, \dots, m\}$, and $a = (a_1, \ldots, a_m) \in A$. Some authors take the absolute value of the sum before taking the supremum, but if $A$ is symmetric this makes no difference.

    === Rademacher complexity of a function class ===

    Let $S = \{z_1, z_2, \dots, z_m\} \subseteq Z$ be a sample of points and consider a function class $\mathcal{F}$ of real-valued functions over $Z$.
    Then the empirical Rademacher complexity of $\mathcal{F}$ given $S$ is defined as:

    $$\operatorname{Rad}_S(\mathcal{F}) = \frac{1}{m} \mathbb{E}_{\sigma}\left[\sup_{f \in \mathcal{F}} \sum_{i=1}^{m} \sigma_i f(z_i)\right]$$

    This can also be written using the previous definition:

    $$\operatorname{Rad}_S(\mathcal{F}) = \operatorname{Rad}(\mathcal{F} \circ S)$$

    where $\mathcal{F} \circ S$ denotes the set of value vectors of $\mathcal{F}$ on the sample (the notation borrows from function composition), i.e.:

    $$\mathcal{F} \circ S := \{(f(z_1), \ldots, f(z_m)) \mid f \in \mathcal{F}\}$$

    The worst-case empirical Rademacher complexity is

    $$\overline{\operatorname{Rad}}_m(\mathcal{F}) = \sup_{S = \{z_1, \dots, z_m\}} \operatorname{Rad}_S(\mathcal{F})$$

    Let $P$ be a probability distribution over $Z$. The Rademacher complexity of the function class $\mathcal{F}$ with respect to $P$ for sample size $m$ is:

    $$\operatorname{Rad}_{P,m}(\mathcal{F}) := \mathbb{E}_{S \sim P^m}\left[\operatorname{Rad}_S(\mathcal{F})\right]$$

    where the above expectation is taken over an independent and identically distributed (i.i.d.) sample $S = (z_1, z_2, \dots, z_m)$ generated according to $P$.

    == Intuition ==

    The Rademacher complexity is typically applied to a function class of models used for classification, with the goal of measuring their ability to classify points drawn from a probability space under arbitrary labellings.
    When the function class is rich enough, it contains functions that can adapt to each arrangement of labels, simulated by the random draw of $\sigma_i$ under the expectation, so that the quantity in the sum is maximized. The Rademacher complexity of a set $A$ can be rewritten as

    $$\operatorname{Rad}(A) := \frac{1}{m} \mathbb{E}_{\sigma}\left[\sup_{a \in A} \sum_{i=1}^{m} \sigma_i a_i\right] = \frac{1}{\sqrt{m}\, 2^m} \sum_{\sigma \in \{-1/\sqrt{m}, +1/\sqrt{m}\}^m} \left[\sup_{a \in A} \langle \sigma, a \rangle\right].$$

    Each term in the summation is the farthest (signed) extent of the set $A$ along a unit-length direction $\sigma$. The directions are along the vertices of a hypercube. Pairing each vertex $\sigma$ with its antipode $-\sigma$ and using $\sup_{a \in A} \langle -\sigma, a \rangle = -\inf_{a \in A} \langle \sigma, a \rangle$, we can also write it as

    $$\operatorname{Rad}(A) = \frac{1}{2\sqrt{m}} \frac{1}{2^{m-1}} \sum_{\sigma \in \{-1/\sqrt{m}, +1/\sqrt{m}\}^m / \{-1, +1\}} \left[\sup_{a \in A} \langle \sigma, a \rangle - \inf_{a \in A} \langle \sigma, a \rangle\right]$$

    Here, the set $\{-1/\sqrt{m}, +1/\sqrt{m}\}^m / \{-1, +1\}$ denotes half of the vertices of a hypercube, selected so that each diagonal has exactly one vertex selected. In words, this states that $2\sqrt{m}\operatorname{Rad}(A)$ is precisely the average width of the set $A$ along all diagonal directions of a hypercube.

    == Examples ==

    A singleton set has 0 width in any direction, so it has Rademacher complexity 0.
    The set $A = \{(1,1), (1,2)\} \subseteq \mathbb{R}^2$ has average width $1/\sqrt{2}$ along the two diagonal directions of the square, so it has Rademacher complexity $1/4$. The unit cube $[0,1]^m$ has constant width $\sqrt{m}$ along the diagonal directions, so it has Rademacher complexity $1/2$. Similarly, the unit cross-polytope $\{x \in \mathbb{R}^m : \|x\|_1 \leq 1\}$ has constant width $2/\sqrt{m}$ along the diagonal directions, so it has Rademacher complexity $1/m$.

    == Using the Rademacher complexity ==

    The Rademacher complexity can be used to derive data-dependent upper bounds on the learnability of function classes. Intuitively, a function class with smaller Rademacher complexity is easier to learn.

    === Bounding the representativeness ===

    In machine learning, it is desirable for the training sample $S$ to represent the true distribution from which it is drawn. This can be quantified using the notion of representativeness. Denote by $P$ the probability distribution from which the samples are drawn. Denote by $H$ the set of hypotheses (potential classifiers) and denote by $\mathcal{F}$ the corresponding set of error functions, i.e., for every hypothesis $h \in H$, there is a function $f_h \in \mathcal{F}$ that maps each training sample (features, label) to the error of the classifier $h$ (here, hypothesis and classifier are used interchangeably). For example, in the case that $h$ represents a binary classifier, the error function is a 0–1 loss function, i.e. the error function $f_h$ returns 0 if $h$ correctly classifies a sample and 1 otherwise.
    We omit the index and write $f$ instead of $f_h$ when the underlying hypothesis is irrelevant. Define:

    - $L_P(f) := \mathbb{E}_{z \sim P}[f(z)]$, the expected error of some error function $f \in \mathcal{F}$ on the real distribution $P$;
    - $L_S(f) := \frac{1}{m} \sum_{i=1}^{m} f(z_i)$, the estimated error of some error function $f \in \mathcal{F}$ on the sample $S$.

    The representativeness of the sample $S$, with respect to $P$ and $\mathcal{F}$, is defined as:

    $$\operatorname{Rep}_P(\mathcal{F}, S) := \sup_{f \in \mathcal{F}} \big(L_P(f) - L_S(f)\big)$$

    Smaller representativeness is better, since it provides a way to avoid overfitting: it means that the true error of a classifier is not much higher than its estimated error, and so selecting a classifier that has low estimated error will ensure that the true error is also low. Note, however, that the concept of representativeness is relative and hence cannot be compared across distinct samples.

    The expected representativeness of a sample can be bounded above by the Rademacher complexity of the function class. If $\mathcal{F}$ is a set of functions with range within $[0,1]$, then

    $$\operatorname{Rad}_{P,m}(\mathcal{F}) - \sqrt{\frac{\ln 2}{2m}} \leq \mathbb{E}_{S \sim P^m}\left[\operatorname{Rep}_P(\mathcal{F}, S)\right] \leq 2 \operatorname{Rad}_{P,m}(\mathcal{F})$$
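For a finite set $A$, the expectation over $\sigma$ in the definition of $\operatorname{Rad}(A)$ is just an average over all $2^m$ sign vectors, so it can be computed exactly for small $m$. The Python sketch below (the helper names are hypothetical) checks the examples from the Examples section. For the unit cube and the cross-polytope it passes only their vertices, which is enough because a linear function $\langle \sigma, a \rangle$ attains its supremum over a polytope at a vertex.

```python
import itertools

def rademacher_complexity(A):
    """Exact Rad(A) for a finite set A of points in R^m: average over all 2^m sign vectors."""
    A = [tuple(a) for a in A]
    m = len(A[0])
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=m):
        # sup over a in A of sum_i sigma_i * a_i (a max, since A is finite)
        total += max(sum(s * x for s, x in zip(sigma, a)) for a in A)
    return total / 2 ** m / m      # E_sigma[...] then the 1/m factor

def cube_vertices(m):
    """Vertices of the unit cube [0, 1]^m."""
    return list(itertools.product((0, 1), repeat=m))

def cross_polytope_vertices(m):
    """Vertices +e_i and -e_i of the unit l1 ball in R^m."""
    vertices = []
    for i in range(m):
        for sign in (-1, 1):
            e = [0] * m
            e[i] = sign
            vertices.append(tuple(e))
    return vertices

print(rademacher_complexity([(1, 1)]),
      rademacher_complexity([(1, 1), (1, 2)]),
      rademacher_complexity(cube_vertices(4)),
      rademacher_complexity(cross_polytope_vertices(4)))
# prints: 0.0 0.25 0.5 0.25
```

Exact enumeration costs $2^m \cdot |A| \cdot m$ operations, so for realistic sample sizes one would instead estimate the expectation by averaging over randomly drawn sign vectors; the exhaustive version is used here only because it reproduces the closed-form values without sampling error.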

  • FrameNet

    FrameNet is a group of online lexical databases based upon the theory of meaning known as frame semantics, developed by linguist Charles J. Fillmore. The project's fundamental notion is simple: most words' meanings may be best understood in terms of a semantic frame, which is a description of a certain kind of event, relation, or entity and its participants. As an illustration, the act of cooking usually requires the following: a cook, the food being cooked, a container to hold the food while it is being cooked, and a heating instrument. Within FrameNet, this act is represented by a frame named Apply_heat, and its components (Cook, Food, Container, and Heating_instrument) are referred to as frame elements (FEs). The Apply_heat frame also lists a number of words that evoke it, known as lexical units (LUs), like fry, bake, boil, and broil. Other frames are simpler. For example, Placing only has an agent or cause, a theme (something that is placed), and the location where it is placed. Some frames are more complex, like Revenge, which contains more FEs (offender, injury, injured party, avenger, and punishment). As the Apply_heat and Revenge examples illustrate, FrameNet's role is to define the frames and to annotate sentences to demonstrate how the FEs fit syntactically around the word that evokes the frame.

    == Concepts ==

    === Frames ===

    A frame is a schematic representation of a situation involving various participants, props, and other conceptual roles. Examples of frame names are Being_born and Locative_relation. A frame in FrameNet contains a textual description of what it represents (a frame definition), associated frame elements, lexical units, example sentences, and frame-to-frame relations.

    === Frame elements ===

    Frame elements (FEs) provide additional information to the semantic structure of a sentence. Each frame has a number of core and non-core FEs, which can be thought of as semantic roles.
    Core FEs are essential to the meaning of the frame, while non-core FEs are generally descriptive (such as time, place, manner, etc.). For example:

    - The only core FE of the Being_born frame is called Child; non-core FEs include Time, Place, Means, etc.
    - Core FEs of the Commerce_goods-transfer frame include the Seller, Buyer, and Goods, while non-core FEs include a Place, Purpose, etc.

    FrameNet includes shallow data on the syntactic roles that frame elements play in the example sentences. For example, for a sentence like "She was born about AD 460", FrameNet would mark "She" as a noun phrase referring to the Child frame element, and "about AD 460" as a noun phrase corresponding to the Time frame element. Details of how frame elements can be realized in a sentence are important because they reveal information about the subcategorization frames of a verb, as well as its possible diathesis alternations (e.g. "John broke the window" vs. "The window broke").

    === Lexical units ===

    Lexical units (LUs) are lemmas, with their part of speech, that evoke a specific frame. In other words, when an LU is identified in a sentence, that specific LU can be associated with its specific frame(s). For each frame, there may be many LUs associated with that frame, and there may also be many frames that share a specific LU; this is typically the case with LUs that have multiple word senses. Alongside the frame, each lexical unit is associated with specific frame elements by means of the annotated example sentences. For example, lexical units that evoke the Complaining frame (or, to be precise, more specific perspectivized versions of it) include the verbs complain, grouse, lament, and others.

    === Example sentences ===

    Frames are associated with example sentences, and frame elements are marked within the sentences. Thus, the sentence "She was born about AD 460" is associated with the frame Being_born, while "She" is marked as the frame element Child and "about AD 460" is marked as Time.
    From the start, the FrameNet project has been committed to looking at evidence from actual language use as found in text collections like the British National Corpus. Based on such example sentences, automatic semantic role labeling tools are able to determine frames and mark frame elements in new sentences.

    === Valences ===

    FrameNet also exposes statistics on the valence of each frame; that is, the number and position of the frame elements within example sentences. The sentence "She was born about AD 460" falls in the valence pattern NP Ext, INI --, NP Dep, which occurs twice in FrameNet's annotation report for the born.v lexical unit, namely:

    - She was born about AD 460, daughter and granddaughter of Roman and Byzantine emperors, whose family had been prominent in Roman politics for over 700 years.
    - He was soon posted to north Africa, and never met their only child, a daughter born 8 June 1941.

    === Frame relations ===

    FrameNet additionally captures relationships between different frames using relations. These include the following:

    - Inheritance: When one frame is a more specific version of another, more abstract, parent frame. Anything that is true about the parent frame must also be true about the child frame, and a mapping is specified between the frame elements of the parent and the frame elements of the child.
    - Perspectivization: A neutral frame is connected to a frame with a specific perspective of the same scenario. For example, Commerce_goods-transfer is considered from the perspective of the buyer in Commerce_buy and from that of the seller in Commerce_sell.
    - Subframe: Some frames refer to complex scenarios that consist of several individual states or events that can be described by separate frames. For example, Criminal_process is composed of Arrest, Trial, and so on.
    - Precedence: This relation captures the temporal order that holds between subframes of a complex frame. For example, within the Cycle_of_life_and_death frame, the subframe Death is preceded by the subframe Being_born.
    - Causative and Inchoative: These two relations mark, for causative- and inchoative-aspect frames, the separate stative frame they refer to. For example, the stative Position_on_a_scale (e.g. "She had a high salary") is described by the causative Cause_change_of_scalar_position (e.g. "She raised his salary") and by the inchoative Change_position_on_a_scale frame (e.g. "Her salary increased").
    - Using: This relation marks a frame that in some way involves another frame. For example, Judgment_communication uses both Judgment and Statement, but does not inherit from either of them because there is no clear correspondence of frame elements.
    - See also: Connects frames that bear some resemblance but need to be distinguished carefully.

    == Applications ==

    FrameNet has proven to be useful in a number of computational applications, because computers need additional knowledge in order to recognize that "John sold a car to Mary" and "Mary bought a car from John" describe essentially the same situation, despite using two quite different verbs, different prepositions and a different word order. FrameNet has been used in applications like question answering, paraphrasing, recognizing textual entailment, and information extraction, either directly or by means of semantic role labeling tools. The first automatic system for semantic role labeling (SRL, sometimes also referred to as "shallow semantic parsing") was developed by Daniel Gildea and Daniel Jurafsky based on FrameNet in 2002. Semantic role labeling has since become one of the standard tasks in natural language processing, and the latest version (1.7) of FrameNet is now fully supported in the Natural Language Toolkit.
Since frames are essentially semantic descriptions, they are similar across languages, and several projects have arisen over the years that have relied on the original FrameNet as the basis for additional non-English FrameNets, for Spanish, Japanese, German, and Polish, among others.
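To make the relationships between frames, frame elements, lexical units, and annotated sentences concrete, here is a toy in-memory model in Python. It is not FrameNet's data format and not the NLTK API (NLTK's reader lives in `nltk.corpus.framenet` and needs the `framenet_v17` data package); the classes `Frame` and `Annotation` and the functions `frames_evoked_by` and `check` are hypothetical, the inventory contains only examples from the text, and treating all four Apply_heat FEs as core is an assumption made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class Frame:
    name: str
    core_fes: set[str]
    non_core_fes: set[str] = field(default_factory=set)
    lexical_units: set[str] = field(default_factory=set)  # "lemma.pos", e.g. "born.v"

@dataclass
class Annotation:
    sentence: str
    frame: str
    target: str                   # lexical unit that evokes the frame
    fe_spans: dict[str, str]      # frame element name -> annotated text span

# Toy inventory built from the examples in the text, not real FrameNet data.
FRAMES = {
    "Being_born": Frame("Being_born", {"Child"}, {"Time", "Place", "Means"}, {"born.v"}),
    # Assumption for this sketch: all four Apply_heat FEs are treated as core.
    "Apply_heat": Frame(
        "Apply_heat",
        {"Cook", "Food", "Container", "Heating_instrument"},
        lexical_units={"fry.v", "bake.v", "boil.v", "broil.v"},
    ),
}

def frames_evoked_by(lu):
    """Names of all frames listing this lexical unit (a polysemous LU can evoke several)."""
    return sorted(f.name for f in FRAMES.values() if lu in f.lexical_units)

def check(ann):
    """Return (FEs not defined by the frame, core FEs with no annotated span)."""
    frame = FRAMES[ann.frame]
    unknown = set(ann.fe_spans) - (frame.core_fes | frame.non_core_fes)
    unrealized_core = frame.core_fes - set(ann.fe_spans)
    return unknown, unrealized_core

ann = Annotation("She was born about AD 460", "Being_born", "born.v",
                 {"Child": "She", "Time": "about AD 460"})
print(frames_evoked_by("born.v"), check(ann))
```

An unrealized core FE reported by `check` is not necessarily an annotation error: FrameNet can record a frame element as null-instantiated, which is what the INI entry in the valence pattern quoted earlier indicates.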

  • Markov chain Monte Carlo

    In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it, i.e. the Markov chain's equilibrium distribution matches the target distribution. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution. Markov chain Monte Carlo methods are used to study probability distributions that are too complex or too high-dimensional to study with analytic techniques alone. Various algorithms exist for constructing such Markov chains, including the Metropolis–Hastings algorithm. == General explanation == Markov chain Monte Carlo methods create samples from a continuous random variable, with probability density proportional to a known function. These samples can be used to evaluate an integral over that variable, such as its expected value or variance. In practice, an ensemble of chains is generally run, starting from a set of arbitrarily chosen points that are sufficiently distant from each other. These chains are stochastic processes of "walkers" which move around randomly according to an algorithm that looks for places with a reasonably high contribution to the integral to move into next, assigning them higher probabilities. Random walk Monte Carlo methods are a kind of random simulation or Monte Carlo method. However, whereas the random samples of the integrand used in a conventional Monte Carlo integration are statistically independent, those used in MCMC are autocorrelated. Correlations of samples introduce the need to use the Markov chain central limit theorem when estimating the error of mean values. These algorithms create Markov chains such that they have an equilibrium distribution which is proportional to the function given.
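To make the "walker" picture concrete, here is a minimal sketch of random-walk Metropolis. This is the special case of Metropolis–Hastings with a symmetric Gaussian proposal. The function name, `step_size`, `seed` and the 5,000-step burn-in in the usage lines are illustrative choices, not part of the text. The target only needs to be known up to a constant, which is why the sampler takes a log-density.

```python
import math
import random


def random_walk_metropolis(log_density, x0, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D target density.

    log_density: log of a function proportional to the target density
    (the normalizing constant is never needed).
    Returns the chain [X_0, X_1, ..., X_n_steps]; successive states are
    autocorrelated, not independent draws.
    """
    rng = random.Random(seed)
    x, log_px = x0, log_density(x0)
    chain = [x]
    for _ in range(n_steps):
        # Symmetric Gaussian proposal: the Hastings correction cancels,
        # so the acceptance ratio is just p(y) / p(x).
        y = rng.gauss(x, step_size)
        log_py = log_density(y)
        # Accept with probability min(1, p(y) / p(x)), computed in log space.
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_py - log_px:
            x, log_px = y, log_py
        chain.append(x)  # on rejection the current state is repeated
    return chain


# Usage: sample a standard normal known only up to a constant,
# discarding an (arbitrary) burn-in of 5,000 steps.
samples = random_walk_metropolis(lambda x: -0.5 * x * x, 0.0, 50_000)[5_000:]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean ~ {mean:.3f}, variance ~ {var:.3f}")  # should be near 0 and 1
```

Because rejected proposals repeat the current state, neighbouring samples are correlated. The error of `mean` is therefore larger than the independent-sample formula would suggest, which is the point the paragraph makes about the Markov chain central limit theorem.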
== History == The development of MCMC methods is deeply rooted in the early exploration of Monte Carlo (MC) techniques in the mid-20th century, particularly in physics. These developments were marked by the Metropolis algorithm proposed by Nicholas Metropolis, Arianna W. Rosenbluth, Marshall Rosenbluth, Augusta H. Teller, and Edward Teller in 1953, which was designed to tackle high-dimensional integration problems using early computers. Then in 1970, W. K. Hastings generalized this algorithm and inadvertently introduced the component-wise updating idea, later known as Gibbs sampling. Simultaneously, the theoretical foundations for Gibbs sampling were being developed, such as the Hammersley–Clifford theorem from Julian Besag's 1974 paper. Although the seeds of MCMC were sown earlier, including the formal naming of Gibbs sampling in image processing by Stuart Geman and Donald Geman (1984) and the data augmentation method by Martin A. Tanner and Wing Hung Wong (1987), its "revolution" in mainstream statistics largely followed demonstrations of the universality and ease of implementation of sampling methods (especially Gibbs sampling) for complex statistical (particularly Bayesian) problems, spurred by increasing computational power and software like BUGS. This transformation was accompanied by significant theoretical advancements, such as Luke Tierney's (1994) rigorous treatment of MCMC convergence, and Jun S. Liu, Wong, and Augustine Kong's (1994, 1995) analysis of Gibbs sampler structure. Subsequent developments further expanded the MCMC toolkit, including particle filters (Sequential Monte Carlo) for sequential problems, perfect sampling aiming for exact simulation (Jim Propp and David B. Wilson, 1996), RJMCMC (Peter J. Green, 1995) for handling variable-dimension models, and deeper investigations into convergence diagnostics and the central limit theorem.
Overall, the evolution of MCMC represents a paradigm shift in statistical computation, enabling the analysis of numerous previously intractable complex models and continually expanding the scope and impact of statistics. == Mathematical setting == Suppose $(X_n)$ is a Markov chain in the general state space $\mathcal{X}$ with specific properties. We are interested in the limiting behavior of the partial sums $S_n(h) = \frac{1}{n}\sum_{i=1}^{n} h(X_i)$ as $n$ goes to infinity. In particular, we hope to establish the law of large numbers and the central limit theorem for MCMC. In the following, we state some definitions and theorems necessary for the important convergence results. In short, we need the existence of an invariant measure and Harris recurrence to establish the law of large numbers for MCMC (the ergodic theorem), and we need aperiodicity, irreducibility, and extra conditions such as reversibility to ensure that the central limit theorem holds for MCMC. === Irreducibility and aperiodicity === Recall that in the discrete setting, a Markov chain is said to be irreducible if it is possible to reach any state from any other state in a finite number of steps with positive probability. However, in the continuous setting, point-to-point transitions have zero probability. In this case, φ-irreducibility generalizes irreducibility by using a reference measure φ on the measurable space $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$.
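The ergodic average $S_n(h)$ can be watched converging on a small example. This is a minimal sketch assuming a hypothetical two-state chain with transition matrix `K` and the test function `h(x) = x`. A finite irreducible chain like this one is trivially recurrent and has a unique invariant distribution π. Its closed form for two states is π = (b/(a+b), a/(a+b)), where a and b are the switching probabilities. The helper names are illustrative.

```python
import random


def simulate_chain(K, x0, n, seed=0):
    """Return X_1, ..., X_n for a finite Markov chain with transition matrix K."""
    rng = random.Random(seed)
    states = range(len(K))
    x, path = x0, []
    for _ in range(n):
        x = rng.choices(states, weights=K[x])[0]  # draw X_{i+1} ~ K(x, .)
        path.append(x)
    return path


def ergodic_average(h, path):
    """S_n(h) = (1/n) * sum_{i=1}^{n} h(X_i)."""
    return sum(h(x) for x in path) / len(path)


def stationary_two_state(K):
    """Invariant distribution pi of a two-state chain, in closed form."""
    a, b = K[0][1], K[1][0]  # probabilities of switching 0 -> 1 and 1 -> 0
    return (b / (a + b), a / (a + b))


K = [[0.9, 0.1],
     [0.5, 0.5]]
path = simulate_chain(K, x0=0, n=100_000)
pi = stationary_two_state(K)
# With h(x) = x, S_n(h) estimates pi(1) = 1/6 as n grows.
print(ergodic_average(lambda x: x, path), pi[1])
```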
Definition (φ-irreducibility) Given a measure $\varphi$ defined on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$, the Markov chain $(X_n)$ with transition kernel $K(x, y)$ is φ-irreducible if, for every $A \in \mathcal{B}(\mathcal{X})$ with $\varphi(A) > 0$ and every $x \in \mathcal{X}$, there exists $n$ such that $K^n(x, A) > 0$ (equivalently, $P_x(\tau_A < \infty) > 0$, where $\tau_A = \inf\{n \geq 1 ; X_n \in A\}$ is the first $n$ for which the chain enters the set $A$). This is a more general definition of irreducibility for a Markov chain on a non-discrete state space. In the discrete case, an irreducible Markov chain is said to be aperiodic if it has period 1. Formally, the period of a state $\omega \in \mathcal{X}$ is defined as $$d(\omega) := \gcd\{m \geq 1 \,;\, K^m(\omega, \omega) > 0\}.$$ For the general (non-discrete) case, we define aperiodicity in terms of small sets: Definition (Cycle length and small sets) A φ-irreducible Markov chain $(X_n)$ has a cycle of length $d$ if there exists a small set $C$, an associated integer $M$, and a probability distribution $\nu_M$ such that $d$ is the greatest common divisor of $$\{m \geq 1 \,;\, \exists\, \delta_m > 0 \text{ such that } C \text{ is small for } \nu_m \geq \delta_m \nu_M\}.$$ A set $C$ is called small if there exist $m \in \mathbb{N}^*$ and a nonzero measure $\nu_m$ such that $$K^m(x, A) \geq \nu_m(A), \quad \forall x \in C,\ \forall A \in \mathcal{B}(\mathcal{X}).$$ === Harris recurrence === Definition (Harris recurrence) A set $A$ is Harris recurrent if $P_x(\eta_A = \infty) = 1$ for all $x \in A$, where $\eta_A = \sum_{n=1}^{\infty} \mathbb{I}_A(X_n)$ is the number of visits of the chain $(X_n)$ to the set $A$. The chain $(X_n)$ is said to be Harris recurrent if there exists a measure $\psi$ such that the chain is $\psi$-irreducible and every measurable set $A$ with $\psi(A) > 0$ is Harris recurrent. A useful criterion for verifying Harris recurrence is the following: Proposition If for every $A \in \mathcal{B}(\mathcal{X})$ we have $P_x(\tau_A < \infty) = 1$ for every $x \in A$, then $P_x(\eta_A = \infty) = 1$ for all $x \in \mathcal{X}$, and the chain $(X_n)$ is Harris recurrent. This definition is only needed when the state space $\mathcal{X}$ is uncountable. In the countable case, recurrence corresponds to $\mathbb{E}_x[\eta_x] = \infty$, which is equivalent to $P_x(\tau_x < \infty) = 1$ for all $x \in \mathcal{X}$.
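In the discrete case, both irreducibility and the period $d(\omega)$ can be checked directly from a finite transition matrix. This is a minimal sketch: it does not cover the general small-set definition. Irreducibility is tested by graph search over positive transitions. The period is the gcd of the return times $m$ with $K^m(\omega, \omega) > 0$. Scanning $m$ up to 3N for an N-state chain is our own default cap; we believe it is sufficient for irreducible finite chains, and callers can raise it. All function names are illustrative.

```python
from math import gcd


def mat_mult(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_irreducible(K):
    """True if every state can reach every other state with positive probability."""
    n = len(K)
    for start in range(n):
        seen, stack = {start}, [start]
        while stack:  # depth-first search over transitions with K[i][j] > 0
            i = stack.pop()
            for j in range(n):
                if K[i][j] > 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if len(seen) < n:
            return False
    return True


def period(K, state, max_m=None):
    """d(state) = gcd{m >= 1 : K^m(state, state) > 0}, scanning m = 1..max_m.

    Returns 0 if no return to `state` is possible within max_m steps.
    """
    max_m = max_m or 3 * len(K)
    d, P = 0, K  # P holds K^m
    for m in range(1, max_m + 1):
        if P[state][state] > 0:
            d = gcd(d, m)
        P = mat_mult(P, K)
    return d


flip = [[0, 1], [1, 0]]        # deterministic alternation: period 2
lazy = [[0.5, 0.5], [0.5, 0.5]]  # can stay put: aperiodic
print(is_irreducible(flip), period(flip, 0), period(lazy, 0))  # True 2 1
```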

  • The Best Free AI Bug Finder for Beginners

    Shopping for the best AI bug finder? An AI bug finder is software that uses machine learning to help you find bugs in your code, and it tends to improve as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI bug finder slots into your workflow and can quickly pay for itself. We tested the leading options and ranked them by quality, value, and ease of use.

  • Normalization (image processing)

    In image processing, normalization is a process that changes the range of pixel intensity values, a kind of intensity mapping. Applications include photographs with poor contrast due to glare, for example. A typical case is contrast stretching. In more general fields of data processing, such as digital signal processing, it is referred to as dynamic range expansion. The purpose of dynamic range expansion in the various applications is usually to bring the image, or other type of signal, into a range that is more familiar or normal to the senses, hence the term normalization. Often, the motivation is to achieve consistency in dynamic range for a set of data, signals, or images to avoid mental distraction or fatigue. For example, a newspaper will strive to make all of the images in an issue share a similar range of grayscale. Auto-normalization in image processing software typically normalizes to the full dynamic range of the number system specified in the image file format. == Definition == Normalization transforms an n-dimensional grayscale image $I : \{\mathbb{X} \subseteq \mathbb{R}^n\} \rightarrow \{\text{Min}, \ldots, \text{Max}\}$ with intensity values in the range $(\text{Min}, \text{Max})$ into a new image $I_N : \{\mathbb{X} \subseteq \mathbb{R}^n\} \rightarrow \{\text{newMin}, \ldots, \text{newMax}\}$ with intensity values in the range $(\text{newMin}, \text{newMax})$.
The linear normalization of a grayscale digital image is performed according to the formula $$I_N = (I - \text{Min})\frac{\text{newMax} - \text{newMin}}{\text{Max} - \text{Min}} + \text{newMin}$$ For example, if the intensity range of the image is 50 to 180 and the desired range is 0 to 255, the process entails subtracting 50 from each pixel intensity, making the range 0 to 130. Then each pixel intensity is multiplied by 255/130, making the range 0 to 255. Normalization can also be non-linear, when the relationship between $I$ and $I_N$ is not linear. An example of non-linear normalization is when the normalization follows a sigmoid function, in which case the normalized image is computed according to the formula $$I_N = (\text{newMax} - \text{newMin})\frac{1}{1 + e^{-\frac{I - \beta}{\alpha}}} + \text{newMin}$$ where $\alpha$ defines the width of the input intensity range and $\beta$ defines the intensity around which the range is centered. Gamma correction (a power-law transform) and log/inverse-log transforms are also common transformation functions. === Colorspace === Intensity operations generally operate on a colorspace that maps to the human perception of lightness without intentionally changing the other properties. This can be done, for example, by operating on the L component of the CIELAB color space, or approximately by operating on the Y component of YCbCr. It is also possible to operate on each of the RGB color channels, though the result will not always make sense. == Contrast stretching == Contrast stretching is a fundamental technique of spatial-domain image enhancement. The basic intent of this contrast enhancement technique is to adjust the local contrast in the image so as to bring out the clear regions or objects in the image.
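Both formulas are short enough to express directly. This is a minimal sketch on a flat list of grayscale intensities. Min and Max are taken from the data itself, which is an assumption; a caller could instead pass the nominal range of the file format. Mapping a flat image (Max = Min, where the formula divides by zero) to `new_min` is an arbitrary choice. The function names are illustrative.

```python
import math


def linear_normalize(pixels, new_min=0, new_max=255):
    """I_N = (I - Min) * (newMax - newMin) / (Max - Min) + newMin."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # Flat image: Max - Min = 0, so the formula is undefined.
        # Returning new_min everywhere is an arbitrary choice.
        return [float(new_min)] * len(pixels)
    return [(p - lo) * (new_max - new_min) / (hi - lo) + new_min for p in pixels]


def sigmoid_normalize(pixels, alpha, beta, new_min=0, new_max=255):
    """Non-linear normalization: logistic curve centred at beta with width alpha."""
    return [(new_max - new_min) / (1.0 + math.exp(-(p - beta) / alpha)) + new_min
            for p in pixels]


# The worked example: intensities spanning 50..180 stretched to 0..255.
print(linear_normalize([50, 115, 180]))  # [0.0, 127.5, 255.0]
print(sigmoid_normalize([50, 115, 180], alpha=20, beta=115))
```

The results are floats. For an 8-bit output they would still need rounding and clipping to integers.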
Low-contrast images often result from poor or non-uniform lighting conditions, a limited dynamic range of the imaging sensor, or improper settings of the lens aperture. This operation changes the intensity of the pixels in the input image to obtain an enhanced image. It can be based on a number of techniques, namely local, global, dark and bright levels of contrast. Contrast is considered the amount of color or gray differentiation that lies among the different features in an image. Contrast enhancement improves the quality of an image by increasing the luminance difference between the foreground and background. A contrast stretching transformation can be achieved by: Stretching the dark range of input values into a wider range of output values: this increases the brightness of the darker areas in the image to enhance details and improve visibility. Shifting the mid-range of input values: this adjusts the brightness levels of the mid-tones in the image to improve overall contrast and clarity. Compressing the bright range of input values: this reduces the brightness of the brighter areas in the image to prevent overexposure, resulting in a more balanced and visually appealing image. It can be described by the following piecewise function (with intensities scaled to [0, 1]): $$I_N = \begin{cases} \frac{s_1}{r_1} I & \text{if } I < r_1 \\ \frac{s_2 - s_1}{r_2 - r_1}(I - r_1) + s_1 & \text{if } r_1 \leq I \leq r_2 \\ \frac{1 - s_2}{1 - r_2}(I - r_2) + s_2 & \text{if } I > r_2 \end{cases}$$ where $(r_1, s_1)$ defines the transition point between the "dark" range and the "main" range, and $(r_2, s_2)$ defines the transition point between the "main" range and the "bright" range.
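The three-segment mapping can be sketched as a plain function. This is a minimal sketch, assuming intensities scaled to [0, 1] and transition points with r1 < r2. It is written so that the segments meet at (r1, s1) and (r2, s2) and the top segment ends at (1, 1). The function name is illustrative.

```python
def contrast_stretch(I, r1, s1, r2, s2):
    """Piecewise-linear contrast stretch of one intensity I in [0, 1].

    (r1, s1): transition from the "dark" to the "main" segment.
    (r2, s2): transition from the "main" to the "bright" segment.
    Assumes r1 < r2; the segments join continuously at both points.
    """
    if I < r1:
        return s1 / r1 * I                          # dark range: 0 -> s1
    if I <= r2:
        return (s2 - s1) / (r2 - r1) * (I - r1) + s1  # main range: s1 -> s2
    return (1 - s2) / (1 - r2) * (I - r2) + s2      # bright range: s2 -> 1


# Expand the mid-tones [0.3, 0.7] to [0.1, 0.9], compressing both ends.
print([round(contrast_stretch(v, 0.3, 0.1, 0.7, 0.9), 3) for v in (0.0, 0.3, 0.5, 0.7, 1.0)])
```

With (r1, s1) = (r_min, 0) and (r2, s2) = (r_max, 1), the middle branch reduces to the ordinary linear stretch $(I - r_{\text{min}})/(r_{\text{max}} - r_{\text{min}})$.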
A typical linear stretch is obtained when $(r_1, s_1) = (r_{\text{min}}, 0)$ and $(r_2, s_2) = (r_{\text{max}}, 1)$, where $r_{\text{min}}$ and $r_{\text{max}}$ denote the minimum and maximum levels in the source image. === Global contrast stretching === Global contrast stretching considers all color palette ranges at once to determine the maximum and minimum values for the entire RGB color image. This approach uses the combination of RGB colors to derive a single maximum and minimum value for contrast stretching across the entire image. === Local contrast stretching === Local contrast stretching (LCS) is an image enhancement method that focuses on locally adjusting each pixel's value to improve the visualization of structures within an image, particularly in both the darkest and lightest portions. It operates by using sliding windows, known as kernels, which traverse the image. The central pixel within each kernel is adjusted using the following formula: $$I_p(x, y) = 255 \times \frac{I_0(x, y) - \min}{\max - \min}$$ where $I_p(x, y)$ is the color level of the output pixel $(x, y)$ after the contrast stretching process, $I_0(x, y)$ is the color level of the input pixel $(x, y)$, $\max$ is the maximum color level of the input image within the selected kernel, and $\min$ is the minimum color level of the input image within the selected kernel. A piecewise form (see above) may also be used. LCS can be applied to the three color channels of an image separately.
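Local contrast stretching is the same min/max formula evaluated per pixel over a sliding window. This is a minimal sketch for a single grayscale channel stored as a list of rows. Several choices here are assumptions not fixed by the text: an odd kernel size `k`, windows clipped at the image border, and flat windows (max = min) keeping the input value. For a color image, it would be called once per channel.

```python
def local_contrast_stretch(image, k=3):
    """Local contrast stretching of one grayscale channel.

    image: list of rows of intensities. Each output pixel is
    255 * (I0 - min) / (max - min), with min and max taken over the
    k x k kernel centred on that pixel (clipped at the image border).
    """
    h, w = len(image), len(image[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            lo, hi = min(window), max(window)
            if hi > lo:
                out[y][x] = 255.0 * (image[y][x] - lo) / (hi - lo)
            else:
                # Flat window: the formula divides by zero, so keep the input value.
                out[y][x] = float(image[y][x])
    return out


img = [[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]]
print(local_contrast_stretch(img))
```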

  • The Best Free AI Copywriting Tool for Beginners

    Curious about the best AI copywriting tool? An AI copywriting tool is software that uses machine learning to help you write copy faster; the best ones combine speed, accuracy, and an easy-to-use interface. Hands-on testing shows that real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI copywriting tool slots into your workflow and can quickly pay for itself. This guide breaks down the top picks, their pros and cons, and who each one is best for.

  • Best AI Photo Editors in 2026

    Shopping for the best AI photo editor? An AI photo editor is software that uses machine learning to help you edit photos faster, and it tends to improve as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI photo editor slots into your workflow and can quickly pay for itself. We tested the leading options and ranked them by quality, value, and ease of use.

  • Multiple sequence alignment

    Multiple sequence alignment (MSA) is the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. These alignments are used to infer evolutionary relationships via phylogenetic analysis and can highlight homologous features between sequences. Alignments highlight mutation events such as point mutations (single amino acid or nucleotide changes), insertion mutations and deletion mutations, and alignments are used to assess sequence conservation and infer the presence and activity of protein domains, tertiary structures, secondary structures, and individual amino acids or nucleotides. Multiple sequence alignments require more sophisticated methodologies than pairwise alignments, as they are more computationally complex. Most multiple sequence alignment programs use heuristic methods rather than global optimization because identifying the optimal alignment between more than a few sequences of moderate length is prohibitively computationally expensive. However, heuristic methods generally cannot guarantee high-quality solutions and have been shown to fail to yield near-optimal solutions on benchmark test cases. 
== Problem statement == Given $m$ sequences $S_i$, $i = 1, \cdots, m$, of the form below: $$S := \begin{cases} S_1 = (S_{11}, S_{12}, \ldots, S_{1n_1}) \\ S_2 = (S_{21}, S_{22}, \ldots, S_{2n_2}) \\ \quad \vdots \\ S_m = (S_{m1}, S_{m2}, \ldots, S_{mn_m}) \end{cases}$$ a multiple sequence alignment of this set of sequences $S$ is obtained by inserting as many gaps as needed into each sequence $S_i$ until the modified sequences $S'_i$ all have the same length $L \geq \max\{n_i \mid i = 1, \ldots, m\}$ and no column of the modified sequences consists only of gaps. The mathematical form of an MSA of the above sequence set is shown below: $$S' := \begin{cases} S'_1 = (S'_{11}, S'_{12}, \ldots, S'_{1L}) \\ S'_2 = (S'_{21}, S'_{22}, \ldots, S'_{2L}) \\ \quad \vdots \\ S'_m = (S'_{m1}, S'_{m2}, \ldots, S'_{mL}) \end{cases}$$ To recover each original sequence $S_i$ from $S'_i$, remove all gaps. == Graphing approach == A general approach when calculating multiple sequence alignments is to use graphs to identify all of the different alignments. When finding alignments via graph, a complete alignment is created in a weighted graph that contains a set of vertices and a set of edges. Each of the graph edges has a weight based on a certain heuristic that helps to score each alignment or subset of the original graph.
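The conditions in the problem statement translate directly into a validity check. This is a minimal sketch, assuming sequences are Python strings and gaps are written as `-`. Both the gap symbol and the function name are illustrative choices.

```python
GAP = "-"  # assumed gap symbol


def is_valid_msa(originals, aligned):
    """Check the MSA conditions from the problem statement:
    - every aligned row S'_i has the same length L >= max n_i,
    - removing gaps from row i gives back sequence S_i,
    - no column consists only of gaps.
    """
    if not aligned or len(originals) != len(aligned):
        return False
    L = len(aligned[0])
    if any(len(row) != L for row in aligned):
        return False
    if L < max(len(s) for s in originals):
        return False
    if any(row.replace(GAP, "") != s for row, s in zip(aligned, originals)):
        return False
    return all(any(row[c] != GAP for row in aligned) for c in range(L))


seqs = ["ACGT", "AGT", "ACT"]
print(is_valid_msa(seqs, ["ACGT", "A-GT", "AC-T"]))     # True
print(is_valid_msa(seqs, ["ACGT-", "A-GT-", "AC-T-"]))  # False: all-gap column
```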
=== Tracing alignments === When determining the best-suited alignment for each MSA, a trace is usually generated. A trace is a set of realized, or corresponding and aligned, vertices that has a specific weight based on the edges that are selected between corresponding vertices. When choosing traces for a set of sequences, it is necessary to choose a trace with maximum weight to get the best alignment of the sequences. == Alignment methods == There are various alignment methods used within multiple sequence alignment to maximize the scores and correctness of alignments. Each is usually based on a certain heuristic with an insight into the evolutionary process. Most try to replicate evolution to get the most realistic alignment possible to best predict relations between sequences. === Dynamic programming === A direct method for producing an MSA uses the dynamic programming technique to identify the globally optimal alignment solution. For proteins, this method usually involves two sets of parameters: a gap penalty and a substitution matrix assigning scores or probabilities to the alignment of each possible pair of amino acids based on the similarity of the amino acids' chemical properties and the evolutionary probability of the mutation. For nucleotide sequences, a similar gap penalty is used, but a much simpler substitution matrix, wherein only identical matches and mismatches are considered, is typical. The scores in the substitution matrix may be either all positive or a mix of positive and negative in the case of a global alignment, but must be both positive and negative in the case of a local alignment. For n individual sequences, the naive method requires constructing the n-dimensional equivalent of the matrix formed in standard pairwise sequence alignment. The search space thus increases exponentially with increasing n and is also strongly dependent on sequence length.
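The pairwise case of the dynamic programming method is the classic Needleman–Wunsch table. The naive MSA method generalizes this table to n dimensions. This is a minimal sketch using the nucleotide-style scoring mentioned above, namely match/mismatch scores with a linear gap penalty. The particular values `match=1`, `mismatch=-1` and `gap=-2` are illustrative assumptions, not standard parameters.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally optimal pairwise alignment by dynamic programming.

    Uses a match/mismatch substitution score and a linear gap penalty.
    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # F[i][j] = best score for aligning the prefixes a[:i] and b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align pair
                          F[i - 1][j] + gap,                          # gap in b
                          F[i][j - 1] + gap)                          # gap in a

    # Trace back from F[n][m] to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

For two sequences of lengths $n_1$ and $n_2$, the table has $(n_1 + 1)(n_2 + 1)$ cells. Extending it to N sequences multiplies in one more axis per sequence, which is where the exponential cost of the naive MSA comes from.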
Expressed with the big O notation commonly used to measure computational complexity, a naïve MSA takes $O(\text{Length}^{\text{Nseqs}})$ time to produce. Finding the global optimum for n sequences in this way has been shown to be an NP-complete problem. In 1989, building on the Carrillo–Lipman algorithm, Altschul introduced a practical method that uses pairwise alignments to constrain the n-dimensional search space. In this approach, pairwise dynamic programming alignments are performed on each pair of sequences in the query set, and only the space near the n-dimensional intersection of these alignments is searched for the n-way alignment. The MSA program optimizes the sum of all of the pairs of characters at each position in the alignment (the so-called sum-of-pairs score) and has been implemented in a software program for constructing multiple sequence alignments. In 2019, Hosseininasab and van Hoeve showed that by using decision diagrams, MSA may be modeled in polynomial space complexity. === Progressive alignment construction === The most widely used approach to multiple sequence alignments uses a heuristic search known as the progressive technique (also known as the hierarchical or tree method), developed by Da-Fei Feng and Doolittle in 1987. Progressive alignment builds up a final MSA by combining pairwise alignments, beginning with the most similar pair and progressing to the most distantly related. All progressive alignment methods require two stages: a first stage in which the relationships between the sequences are represented as a phylogenetic tree, called a guide tree, and a second stage in which the MSA is built by adding the sequences sequentially to the growing MSA according to the guide tree. The initial guide tree is determined by an efficient clustering method such as neighbor-joining or unweighted pair group method with arithmetic mean (UPGMA), and may use distances based on the number of identical two-letter sub-sequences (as in FASTA) rather than a dynamic programming alignment.
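The sum-of-pairs objective mentioned above scores an existing alignment column by column. This is a minimal sketch with the same illustrative match/mismatch/gap values as a linear-gap scheme. Two conventions are assumptions: a gap paired with a gap scores 0, and gaps are written as `-`. Real programs typically use substitution matrices and affine gap penalties instead.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score: score every pair of rows in every column and add up.

    Gap-gap pairs score 0 (a common convention, assumed here).
    """
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


msa = ["ACGT",
       "A-GT",
       "AC-T"]
print(sum_of_pairs(msa))  # 0: columns score 3, -3, -3, 3
```

For two rows, the sum-of-pairs score reduces to the ordinary pairwise alignment score under the same parameters.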
Progressive alignments are not guaranteed to be globally optimal. The primary problem is that when errors are made at any stage in growing the MSA, these errors are then propagated through to the final result. Performance is also particularly bad when all of the sequences in the set are rather distantly related. Most modern progressive methods modify their scoring function with a secondary weighting function that assigns scaling factors to individual members of the query set in a nonlinear fashion based on their phylogenetic distance from their nearest neighbors. This corrects for non-random selection of the sequences given to the alignment program. Progressive alignment methods are efficient enough to implement on a large scale for many (100s to 1000s) sequences. A popular progressive alignment method has been the Clustal family. ClustalW is used extensively for phylogenetic tree construction, in spite of the authors' explicit warnings that unedited alignments should not be used in such studies, and as input for protein structure prediction by homology modeling. The European Bioinformatics Institute (EMBL-EBI) announced that ClustalW2 would expire in August 2015. They recommend Clustal Omega, which is based on seeded guide trees and HMM profile-profile techniques for protein alignments. An alternative tool for progressive DNA alignments is multiple alignment using fast Fourier transform (MAFFT). Another common progressive alignment method, T-Coffee, is slower than Clustal and its derivatives but generally produces more accurate alignments for distantly related sequence sets. T-Coffee calculates pairwise alignments by combining the direct alignment of the pair with indirect alignments that align each sequence of the pair to a third sequence. It uses the output from Clustal as well as another local alignment program, LALIGN, which finds multiple regions of local alignment between two sequences.
The resulting alignment and phylogenetic tree are used as a guide to produce new and more accurate w

  • LRE Map

    The LRE Map (Language Resources and Evaluation) is a large, freely accessible database of resources for natural language processing. The distinctive feature of the LRE Map is that its records are collected during the submission process of several major natural language processing conferences. The records are then cleaned and gathered into a global database called the "LRE Map". The LRE Map is intended to be an instrument for collecting information about language resources and to become, at the same time, a community for users, a place to share and discover resources, discuss opinions, provide feedback, discover new trends, etc. It is an instrument for discovering, searching and documenting language resources, here intended in a broad sense, as both data and tools. The large amount of information contained in the Map can be analyzed in many different ways. For instance, the LRE Map can provide information about the most frequent type of resource, the most represented language, the applications for which resources are used or are being developed, the proportion of new resources vs. already existing ones, or the way in which resources are distributed to the community. == Context == Several institutions worldwide maintain catalogues of language resources (ELRA, LDC, NICT Universal Catalogue, ACL Data and Code Repository, OLAC, LT World, etc.). However, it has been estimated that only 10% of existing resources are known, either through distribution catalogues or via direct publicity by providers (web sites and the like). The rest remain hidden, emerging only briefly when a resource is presented in a research paper or report at a conference. Even then, a resource may remain in the background simply because the focus of the research is not on the resource per se. == History == The LRE Map originated under the name "LREC Map" during the preparation of the LREC 2010 conference.
More specifically, the idea was discussed within the FLaReNet project, and, in collaboration with ELRA and the Institute of Computational Linguistics of CNR in Pisa, the Map was put in place at LREC 2010. The LREC organizers asked the authors to provide some basic information about all the resources (in a broad sense, i.e. including tools, standards and evaluation packages), either used or created, described in their papers. All these descriptors were then gathered in a global matrix called the LREC Map. The same methodology and requirements for authors have since been applied and extended to other conferences, namely COLING-2010, EMNLP-2010, RANLP-2011, LREC 2012, LREC 2014 and LREC 2016. After this generalization to other conferences, the LREC Map was renamed the LRE Map. == Size and content == The size of the database increases over time. The data collected amount to 4776 entries. Each resource is described according to the following attributes: Resource type, e.g. lexicon, annotation tool, tagger/parser. Resource production status, e.g. newly created finished, existing-updated. Resource availability, e.g. freely available, from data center. Resource modality, e.g. speech, written, sign language. Resource use, e.g. named entity recognition, language identification, machine translation. Resource language, e.g. English, 23 European Union languages, official languages of India. == Uses == The LRE Map is an important tool for charting the NLP field. Unlike other studies based on subjective scoring, the LRE Map is built from factual data. The map has great potential for many uses in addition to being an information-gathering tool: it is a valuable instrument for monitoring the evolution of the field (useful for funders), if applied in different contexts and times. It can be seen as a huge joint effort, the beginning of an even larger cooperative action not just among a few leaders but among all researchers.
It is also an "educational" means towards broad acknowledgment of the need for meta-research activities with the active involvement of many. It is also instrumental in introducing the new notion of "citation of resources", which could provide an award and a means of scholarly recognition for researchers engaged in resource creation. It is used to help organize conferences in the field, such as LREC. == Derived matrices == The data were then cleaned and sorted by Joseph Mariani (CNRS-LIMSI IMMI) and Gil Francopoulo (CNRS-LIMSI IMMI + Tagmatica) in order to compute the various matrices of the final FLaReNet reports. One of them, the matrix for written data at LREC 2010, shows that English is the most studied language, followed by French and German, and then Italian and Spanish. == Future == The LRE Map has been extended to the Language Resources and Evaluation Journal and other conferences.

  • Michael L. Littman

    Michael Lederman Littman (born August 30, 1966) is a computer scientist, researcher, educator, and author whose research focuses on reinforcement learning. He is a University Professor of Computer Science at Brown University, where he has taught since 2012, and as of July 2025 he is also the university’s inaugural Associate Provost for Artificial Intelligence.

== Career ==

Before graduate school, Littman worked with Thomas Landauer at Bellcore and was granted a patent for one of the earliest systems for cross-language information retrieval. He received his Ph.D. in computer science from Brown University in 1996 and was a professor at Duke University from 1996 to 1999. At Duke, he worked on PROVERB, an automated crossword solver that won an AAAI Outstanding Paper Award in 1999 and competed in the American Crossword Puzzle Tournament. From 2000 to 2002, he worked at AT&T. From 2002 to 2012, he was a professor at Rutgers University, where he chaired the department from 2009 to 2012. In summer 2012, he returned to Brown University as a full professor. He has also taught at the Georgia Institute of Technology, where he was listed as an adjunct professor. From 2022 to 2025, Littman served as the Division Director for Information and Intelligent Systems (the AI division) at the National Science Foundation. After completing his term, he returned to Brown University as its first Associate Provost for Artificial Intelligence, a role in which he coordinates the intersection of AI with research, teaching, operations, policy, and communication across the university.

== Research ==

Littman's research interests are varied but have focused mostly on reinforcement learning and related fields, including machine learning more generally, game theory, computer networking, solving partially observable Markov decision processes, and computer solving of analogy problems, among other areas.
He is also interested in computing education more broadly and has written a book on programming for everyone.

== Leadership and Service ==

Littman chaired the study panel for the 2021 report of the One Hundred Year Study on Artificial Intelligence (AI100) and will chair the standing committee for the 2026 report. During his time at the National Science Foundation, he co-led the development of the 2023 update of the National Artificial Intelligence Research and Development Strategic Plan.

== Personal Notes ==

Littman is known for his playful approach to communication. As part of his teaching outreach, he has produced multiple educational and parody videos, for example a machine-learning version of Michael Jackson’s "Thriller" with his frequent collaborator Charles Lee Isbell Jr. He has also been noted for riding an electric unicycle to his office at the NSF.

== Awards ==

    • Elected an ACM Fellow in 2018 for "contributions to the design and analysis of sequential decision-making algorithms in artificial intelligence".
    • IFAAMAS Influential Paper Award (2014).
    • AAAI “Shakey” Award for Overfitting: Machine Learning Music Video (2014).
    • Warren I. Susman Award for Excellence in Teaching, Rutgers (2011).
    • Elected an AAAI Fellow in 2010 for "significant contributions to the fields of reinforcement learning, decision making under uncertainty, and statistical language applications".
    • AAAI “Shakey” Award for Short Video for Aibo Ingenuity (2007).
    • Robert B. Cox Award, Duke (1999).
    • AAAI Outstanding Paper Award (1999).

  • Ziad Obermeyer

    Ziad Obermeyer (Arabic: زياد أوبرماير) is a Lebanese American physician and researcher whose work focuses on machine learning, health policy, and clinical decision-making in medicine. He is the Blue Cross of California Distinguished Associate Professor at the UC Berkeley School of Public Health, a Chan Zuckerberg Biohub investigator, and a research associate at the National Bureau of Economic Research. He is known for his research on racial bias in health care algorithms and on the use of artificial intelligence in health care.

== Early life and education ==

Obermeyer was born in Beirut, Lebanon, and raised in Cambridge, Massachusetts. He earned a Bachelor of Arts degree from Harvard College and a Master of Philosophy (M.Phil.) in History and Science from the University of Cambridge, and received his Doctor of Medicine (M.D.) from Harvard Medical School in 2008. Before pursuing medicine, he worked as a consultant at McKinsey & Company, advising pharmaceutical and global health clients in New Jersey, Geneva, and Tokyo. After completing his medical degree, he trained as an emergency physician at Mass General Brigham (MGB) in Boston, Massachusetts, and later continued to practice emergency medicine at the Fort Defiance Indian Hospital on the Navajo Nation in Arizona.

== Academic career ==

Obermeyer was an Assistant Professor at Harvard Medical School from 2014 to 2020. In 2020, he joined the University of California, Berkeley as an Associate Professor and the Blue Cross of California Distinguished Professor at the School of Public Health.

== Research focus ==

=== Algorithmic racial bias in healthcare ===

In 2019, Obermeyer and economist Sendhil Mullainathan examined a commercial healthcare algorithm from UnitedHealth Group, used by hospitals and insurers to identify patients with complex health needs.
The study found that the algorithm underestimated the health needs of Black patients compared with white patients with similar conditions, because it used health care costs as a proxy for health needs, and that reformulating the algorithm so that it no longer relied on costs as that proxy would reduce racial bias. In 2020, Obermeyer analyzed an algorithm used to allocate CARES Act relief funding to hospitals; the study identified allocation patterns that favored hospitals with higher revenues over hospitals serving larger numbers of COVID-19 patients and predominantly Black patient populations.

=== Clinical decision-making ===

In 2021, Obermeyer and colleagues used machine learning models to examine physician decision-making in cardiac care. The study found that physicians misdiagnose cases when they rely on symptoms representative of a heart attack, such as chest pain, over other symptoms.

=== Pain assessment ===

Obermeyer developed a deep learning approach to investigate the severity of osteoarthritis in underserved communities.

== Policy and regulatory work ==

Following the publication of the 2019 algorithmic racial bias study, the New York Department of Financial Services and Department of Health launched an investigation into UnitedHealth Group's algorithm and, citing discriminatory business practices, asked the company to stop using it. Also in response to the study, in December 2019, Democratic Senators Cory Booker and Ron Wyden sent letters to the Federal Trade Commission and the Centers for Medicare and Medicaid Services asking them to investigate potential discrimination against marginalized communities in healthcare decision-making algorithms. The senators also wrote to major healthcare companies, including Aetna and Blue Cross Blue Shield, asking about their internal safeguards against racial bias in their technology. In 2021, Obermeyer and colleagues at the University of Chicago Booth School of Business released the Algorithmic Bias Playbook, a resource for policymakers and technical teams in healthcare on how to measure and mitigate algorithmic racial bias. Obermeyer testified before the U.S.
Senate Finance Committee in February 2024 on artificial intelligence in healthcare, recommending transparency requirements for AI developers and independent evaluations of algorithms. In December 2025, he testified before the United States House Committee on Oversight and Government Reform on the role of AI in affordable healthcare and the impact of its integration on the workforce.

== Organizations ==

In 2021, Obermeyer cofounded Nightingale Open Science, a non-profit that creates new medical imaging datasets and makes them available for research, and Dandelion Health, a health data analytics company. In June 2023, Dandelion Health launched a program, funded by the Gordon and Betty Moore Foundation and the SCAN Foundation, to audit and evaluate the performance of algorithms and identify potential racial, ethnic, and geographic bias. In 2025, Dandelion Health partnered with the American Heart Association to power an AI assessment lab for cardiovascular algorithms. Obermeyer is a founding faculty member of the joint University of California, Berkeley–University of California, San Francisco program in computational precision health.

== Recognition ==

TIME magazine named Obermeyer one of the 100 most influential people in artificial intelligence in 2023. He has been a Chan Zuckerberg Biohub Investigator since 2022 and a Research Associate at the National Bureau of Economic Research since 2023. He was designated an Emerging Leader by the National Academy of Medicine in 2020. His racial bias study received the Willard G. Manning Memorial Award for the Best Research in Health Econometrics from the American Society of Health Economists (ASHEcon) in 2021 and the Responsible Business Education Award from the Financial Times in 2022.
