AI App Just Like Chatgpt

AI App Just Like Chatgpt — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Automated negotiation

    Automated negotiation

    Automated negotiation is a form of interaction in systems that are composed of multiple autonomous agents, in which the aim is to reach agreements through an iterative process of making offers. Automated negotiation can be employed for many tasks human negotiators regularly engage in, such as bargaining and joint decision making. The main topics in automated negotiation revolve around the design of protocols and negotiating strategies. == History == Through digitization, the beginning of the 21st century has seen a growing interest in the automation of negotiation and e-negotiation systems, for example in the setting of e-commerce. This interest is fueled by the promise of automated agents being able to negotiate on behalf of human negotiators, and to find better outcomes than human negotiators. == Examples == Examples of automated negotiation include: Online dispute resolution, in which disagreements between parties are settled. Sponsored search auction, where bids are placed on advertisement keywords. Content negotiation, in which user agents negotiate over HTTP about how to best represent a web resource. Negotiation support systems, in which negotiation decision-making activities are supported by an information system.

    Read more →
  • Margin classifier

    Margin classifier

    In machine learning (ML), a margin classifier is a type of classification model which is able to give an associated distance from the decision boundary for each data sample. For instance, if a linear classifier is used, the distance (typically Euclidean, though others may be used) of a sample from the separating hyperplane is the margin of that sample. The notion of margins is important in several ML classification algorithms, as it can be used to bound the generalization error of these classifiers. These bounds are frequently shown using the VC dimension. The generalization error bound in boosting algorithms and support vector machines is particularly prominent. == Margin for boosting algorithms == The margin for an iterative boosting algorithm given a dataset with two classes can be defined as follows: the classifier is given a sample pair ( x , y ) {\displaystyle (x,y)} , where x ∈ X {\displaystyle x\in X} is a domain space and y ∈ Y = { − 1 , + 1 } {\displaystyle y\in Y=\{-1,+1\}} is the sample's label. The algorithm then selects a classifier h j ∈ C {\displaystyle h_{j}\in C} at each iteration j {\displaystyle j} where C {\displaystyle C} is a space of possible classifiers that predict real values. This hypothesis is then weighted by α j ∈ R {\displaystyle \alpha _{j}\in R} as selected by the boosting algorithm. At iteration t {\displaystyle t} , the margin of a sample x {\displaystyle x} can thus be defined as y ∑ j t α j h j ( x ) ∑ | α j | . {\displaystyle {\frac {y\sum _{j}^{t}\alpha _{j}h_{j}(x)}{\sum |\alpha _{j}|}}.} By this definition, the margin is positive if the sample is labeled correctly, or negative if the sample is labeled incorrectly. This definition may be modified and is not the only way to define the margin for boosting algorithms. However, there are reasons why this definition may be appealing. == Examples of margin-based algorithms == Many classifiers can give an associated margin for each sample. However, only some classifiers utilize information of the margin while learning from a dataset. Many boosting algorithms rely on the notion of a margin to assign weight to samples. If a convex loss is utilized (as in AdaBoost or LogitBoost, for instance) then a sample with a higher margin will receive less (or equal) weight than a sample with a lower margin. This leads the boosting algorithm to focus weight on low-margin samples. In non-convex algorithms (e.g., BrownBoost), the margin still dictates the weighting of a sample, though the weighting is non-monotone with respect to the margin. == Generalization error bounds == One theoretical motivation behind margin classifiers is that their generalization error may be bound by the algorithm parameters and a margin term. An example of such a bound is for the AdaBoost algorithm. Let S {\displaystyle S} be a set of m {\displaystyle m} data points, sampled independently at random from a distribution D {\displaystyle D} . Assume the VC-dimension of the underlying base classifier is d {\displaystyle d} and m ≥ d ≥ 1 {\displaystyle m\geq d\geq 1} . Then, with probability 1 − δ {\displaystyle 1-\delta } , we have the bound: P D ( y ∑ j t α j h j ( x ) ∑ | α j | ≤ 0 ) ≤ P S ( y ∑ j t α j h j ( x ) ∑ | α j | ≤ θ ) + O ( 1 m d log 2 ⁡ ( m / d ) / θ 2 + log ⁡ ( 1 / δ ) ) {\displaystyle P_{D}\left({\frac {y\sum _{j}^{t}\alpha _{j}h_{j}(x)}{\sum |\alpha _{j}|}}\leq 0\right)\leq P_{S}\left({\frac {y\sum _{j}^{t}\alpha _{j}h_{j}(x)}{\sum |\alpha _{j}|}}\leq \theta \right)+O\left({\frac {1}{\sqrt {m}}}{\sqrt {d\log ^{2}(m/d)/\theta ^{2}+\log(1/\delta )}}\right)} for all θ > 0 {\displaystyle \theta >0} .

    Read more →
  • Sigmoid function

    Sigmoid function

    A sigmoid function is any mathematical function whose graph has a characteristic S-shaped or sigmoid curve. A common example of a sigmoid function is the logistic function. Other sigmoid functions are given in the Examples section. In some fields, most notably in the context of artificial neural networks, the term "sigmoid function" is used as a synonym for "logistic function". Special cases of sigmoid functions include the Gompertz curve (used in modeling systems that saturate at large values of x) and the ogee curve (used in the spillway of some dams). Sigmoid functions have domain of all real numbers, with return (response) value commonly monotonically increasing but could be decreasing. Sigmoid functions most often show a return value (y axis) in the range 0 to 1. Another commonly used range is from −1 to 1. There is also the Heaviside step function, which instantaneously transitions between 0 and 1. A wide variety of sigmoid functions including the logistic and hyperbolic tangent functions have been used as the activation function of artificial neurons. Sigmoid curves are also common in statistics as cumulative distribution functions (which go from 0 to 1), such as the integrals of the logistic density, the normal density, and Student's t probability density functions. The logistic sigmoid function is invertible, and its inverse is the logit function. == Theory == In mathematics, a unitary sigmoid function is a bounded sigmoid-type function normalized to the unit range, typically with lower and upper asymptotes at 0 and 1. The theory proposed by Grebenc distinguishes three kinds of unitary sigmoid functions according to their asymptotic behavior and the presence or absence of oscillation near the asymptotes. A general form of a unitary sigmoid function is y = A S ( f ( x ) ) + B , {\displaystyle y=A\,S(f(x))+B,} where S {\displaystyle S} is an increasing sigmoid function, f ( x ) {\displaystyle f(x)} is a transformation of the independent variable, and A {\displaystyle A} and B {\displaystyle B} are constants controlling scaling and translation. === Classification === ==== 1st kind ==== A unitary sigmoid function of the first kind is a bounded increasing function that approaches its lower and upper asymptotes monotonically, without oscillation. This class includes many of the standard sigmoid functions used in statistics, biomathematics, and engineering, such as the logistic function and related generalizations. ==== 2nd kind ==== A unitary sigmoid function of the second kind is a bounded increasing function that oscillates near the upper asymptote while preserving an overall sigmoid transition. ==== 3rd kind ==== A unitary sigmoid function of the third kind is a bounded increasing function that oscillates near both the lower and upper asymptotes. These functions retain the global shape of a sigmoid curve but exhibit oscillatory behavior in the vicinity of both limiting states. === Taxonomy === The tables below show the taxonomy of unitary sigmoid functions of all three kinds. Table 1. Taxonomy matrix with examples of sigmoid functions of the 1st kind Table 2. Taxonomy matrix with examples of sigmoid functions of the 2nd kind on the unbounded interval Table 3. Taxonomy matrix with examples of sigmoid functions of the 3rd kind === Construction methods === The same theory presents a list of 30 methods for constructing sigmoid functions.. These include algebraic transformations, integration and convolution methods, constructions from bell-shaped functions, solutions of ordinary and partial differential equations, recursive schemes, stochastic differential equations, feedback systems, and chaotic systems. M0: Construction method for sigmoid functions not evident or intuitive M1: Inverse of singularity functions M2: Sigmoid functions of embedded positive functions M3: Rising a sigmoid function to the power M4: Exponentiating a sigmoid function M5: Symmetric sigmoid functions derived from asymmetric ones M6: Sigmoid functions of the reciprocal independent variable M7: Embedding a sigmoid function into other function M8: Sum of sigmoid functions M9: Multiplication of sigmoid functions M10: Integral of the product of an increasing and a decreasing function M11: Derivation from lambda (bell-shaped) functions M12: Integration of lambda (bell-shaped) function M13: Integration of the sum of lambda (bell-shaped) functions M14: Integration of the product of two lambda (bell-shaped) functions M15: Integration of the difference of two shifted sigmoid functions M16: Integration of the product of two shifted sigmoid functions M17: Convolution of sigmoid functions M18: Integration of the product of lambda and sigmoid function M19: Solutions of ordinary differential equations M20: Solutions of partial differential equation (PDE) M21: Solutions of functional differential equation (FDE) M22: Sum of a sigmoid function and some derivatives M23: Combination of sigmoid functions, its derivative and integral M24: Filtering sigmoid functions M25: Special cases of Gauss hypergeometric functions M26: Feedback closed-loop systems M27: Recursive functions M28: Recursive time-delayed feed-forward loops M29: Solutions of stochastic differential equation M30: Chaotic sigmoid functions Consult reference for more details. == Definition == A sigmoid function is a bounded, differentiable, real function that is defined for all real input values and has a positive derivative at each point. == Properties == In general, a sigmoid function is monotonic, and has a first derivative which is bell shaped. Conversely, the integral of any continuous, non-negative, bell-shaped function (with one local maximum and no local minimum, unless degenerate) will be sigmoidal. Thus the cumulative distribution functions for many common probability distributions are sigmoidal. One such example is the error function, which is related to the cumulative distribution function of a normal distribution; another is the arctan function, which is related to the cumulative distribution function of a Cauchy distribution. A sigmoid function is constrained by a pair of horizontal asymptotes as x → ± ∞ {\displaystyle x\rightarrow \pm \infty } . A sigmoid function is convex for values less than a particular point, and it is concave for values greater than that point: in many of the examples here, that point is 0. == Examples == Logistic function f ( x ) = 1 1 + e − x {\displaystyle f(x)={\frac {1}{1+e^{-x}}}} Hyperbolic tangent (shifted and scaled version of the logistic function, above) f ( x ) = tanh ⁡ x = e x − e − x e x + e − x {\displaystyle f(x)=\tanh x={\frac {e^{x}-e^{-x}}{e^{x}+e^{-x}}}} Arctangent function f ( x ) = arctan ⁡ x {\displaystyle f(x)=\arctan x} Gudermannian function f ( x ) = gd ⁡ ( x ) = ∫ 0 x d t cosh ⁡ t = 2 arctan ⁡ ( tanh ⁡ ( x 2 ) ) {\displaystyle f(x)=\operatorname {gd} (x)=\int _{0}^{x}{\frac {dt}{\cosh t}}=2\arctan \left(\tanh \left({\frac {x}{2}}\right)\right)} Error function f ( x ) = erf ⁡ ( x ) = 2 π ∫ 0 x e − t 2 d t {\displaystyle f(x)=\operatorname {erf} (x)={\frac {2}{\sqrt {\pi }}}\int _{0}^{x}e^{-t^{2}}\,dt} Generalised logistic function f ( x ) = ( 1 + e − x ) − α , α > 0 {\displaystyle f(x)=\left(1+e^{-x}\right)^{-\alpha },\quad \alpha >0} Smoothstep function f ( x ) = { ( ∫ 0 1 ( 1 − u 2 ) N d u ) − 1 ∫ 0 x ( 1 − u 2 ) N d u , | x | ≤ 1 sgn ⁡ ( x ) | x | ≥ 1 N ∈ Z ≥ 1 {\displaystyle f(x)={\begin{cases}{\displaystyle \left(\int _{0}^{1}\left(1-u^{2}\right)^{N}du\right)^{-1}\int _{0}^{x}\left(1-u^{2}\right)^{N}\ du},&|x|\leq 1\\\\\operatorname {sgn}(x)&|x|\geq 1\\\end{cases}}\quad N\in \mathbb {Z} \geq 1} Some algebraic functions, for example f ( x ) = x 1 + x 2 {\displaystyle f(x)={\frac {x}{\sqrt {1+x^{2}}}}} and in a more general form f ( x ) = x ( 1 + | x | k ) 1 / k {\displaystyle f(x)={\frac {x}{\left(1+|x|^{k}\right)^{1/k}}}} Up to shifts and scaling, many sigmoids are special cases of f ( x ) = φ ( φ ( x , β ) , α ) , {\displaystyle f(x)=\varphi (\varphi (x,\beta ),\alpha ),} where φ ( x , λ ) = { ( 1 − λ x ) 1 / λ λ ≠ 0 e − x λ = 0 {\displaystyle \varphi (x,\lambda )={\begin{cases}(1-\lambda x)^{1/\lambda }&\lambda \neq 0\\e^{-x}&\lambda =0\\\end{cases}}} is the inverse of the negative Box–Cox transformation, and α < 1 {\displaystyle \alpha <1} and β < 1 {\displaystyle \beta <1} are shape parameters. Smooth transition function normalized to (−1,1): f ( x ) = { 2 1 + e − 2 m x 1 − x 2 − 1 , | x | < 1 sgn ⁡ ( x ) | x | ≥ 1 = { tanh ⁡ ( m x 1 − x 2 ) , | x | < 1 sgn ⁡ ( x ) | x | ≥ 1 {\displaystyle {\begin{aligned}f(x)&={\begin{cases}{\displaystyle {\frac {2}{1+e^{-2m{\frac {x}{1-x^{2}}}}}}-1},&|x|<1\\\\\operatorname {sgn}(x)&|x|\geq 1\\\end{cases}}\\&={\begin{cases}{\displaystyle \tanh \left(m{\frac {x}{1-x^{2}}}\right)},&|x|<1\\\\\operatorname {sgn}(x)&|x|\geq 1\\\end{cases}}\end{aligned}}} using the hyperbolic tangent mentioned above. Here, m {\displaystyle m} is a free parameter encoding the slope at x = 0 {\displaystyle x=0} , which must be great

    Read more →
  • Robust principal component analysis

    Robust principal component analysis

    Robust Principal Component Analysis (RPCA) is a modification of the widely used statistical procedure of principal component analysis (PCA) which works well with respect to grossly corrupted observations. A number of different approaches exist for Robust PCA, including an idealized version of Robust PCA, which aims to recover a low-rank matrix L0 from highly corrupted measurements M = L0 +S0. This decomposition in low-rank and sparse matrices can be achieved by techniques such as Principal Component Pursuit method (PCP), Stable PCP, Quantized PCP, Block based PCP, and Local PCP. Then, optimization methods are used such as the Augmented Lagrange Multiplier Method (ALM), Alternating Direction Method (ADM), Fast Alternating Minimization (FAM), Iteratively Reweighted Least Squares (IRLS ) or alternating projections (AP). == Algorithms == === Non-convex method === The 2014 guaranteed algorithm for the robust PCA problem (with the input matrix being M = L + S {\displaystyle M=L+S} ) is an alternating minimization type algorithm. The computational complexity is O ( m n r 2 log ⁡ 1 ϵ ) {\displaystyle O\left(mnr^{2}\log {\frac {1}{\epsilon }}\right)} where the input is the superposition of a low-rank (of rank r {\displaystyle r} ) and a sparse matrix of dimension m × n {\displaystyle m\times n} and ϵ {\displaystyle \epsilon } is the desired accuracy of the recovered solution, i.e., ‖ L ^ − L ‖ F ≤ ϵ {\displaystyle \|{\widehat {L}}-L\|_{F}\leq \epsilon } where L {\displaystyle L} is the true low-rank component and L ^ {\displaystyle {\widehat {L}}} is the estimated or recovered low-rank component. Intuitively, this algorithm performs projections of the residual onto the set of low-rank matrices (via the SVD operation) and sparse matrices (via entry-wise hard thresholding) in an alternating manner - that is, low-rank projection of the difference the input matrix and the sparse matrix obtained at a given iteration followed by sparse projection of the difference of the input matrix and the low-rank matrix obtained in the previous step, and iterating the two steps until convergence. This alternating projections algorithm is later improved by an accelerated version, coined AccAltProj. The acceleration is achieved by applying a tangent space projection before projecting the residue onto the set of low-rank matrices. This trick improves the computational complexity to O ( m n r log ⁡ 1 ϵ ) {\displaystyle O\left(mnr\log {\frac {1}{\epsilon }}\right)} with a much smaller constant in front while it maintains the theoretically guaranteed linear convergence. Another fast version of accelerated alternating projections algorithm is IRCUR. It uses the structure of CUR decomposition in alternating projections framework to dramatically reduces the computational complexity of RPCA to O ( max { m , n } r 2 log ⁡ ( m ) log ⁡ ( n ) log ⁡ 1 ϵ ) {\displaystyle O\left(\max\{m,n\}r^{2}\log(m)\log(n)\log {\frac {1}{\epsilon }}\right)} === Convex relaxation === This method consists of relaxing the rank constraint r a n k ( L ) {\displaystyle rank(L)} in the optimization problem to the nuclear norm ‖ L ‖ ∗ {\displaystyle \|L\|_{}} and the sparsity constraint ‖ S ‖ 0 {\displaystyle \|S\|_{0}} to ℓ 1 {\displaystyle \ell _{1}} -norm ‖ S ‖ 1 {\displaystyle \|S\|_{1}} . The resulting program can be solved using methods such as the method of Augmented Lagrange Multipliers. === Deep-learning augmented method === Some recent works propose RPCA algorithms with learnable/training parameters. Such a learnable/trainable algorithm can be unfolded as a deep neural network whose parameters can be learned via machine learning techniques from a given dataset or problem distribution. The learned algorithm will have superior performance on the corresponding problem distribution. == Applications == RPCA has many real life important applications particularly when the data under study can naturally be modeled as a low-rank plus a sparse contribution. Following examples are inspired by contemporary challenges in computer science, and depending on the applications, either the low-rank component or the sparse component could be the object of interest: === Video surveillance === Given a sequence of surveillance video frames, it is often required to identify the activities that stand out from the background. If we stack the video frames as columns of a matrix M, then the low-rank component L0 naturally corresponds to the stationary background and the sparse component S0 captures the moving objects in the foreground. === Face recognition === Images of a convex, Lambertian surface under varying illuminations span a low-dimensional subspace. This is one of the reasons for effectiveness of low-dimensional models for imagery data. In particular, it is easy to approximate images of a human's face by a low-dimensional subspace. To be able to correctly retrieve this subspace is crucial in many applications such as face recognition and alignment. It turns out that RPCA can be applied successfully to this problem to exactly recover the face.

    Read more →
  • IBM ALP

    IBM ALP

    IBM Assembly Language Processor (ALP) is an assembler written by IBM for 32-bit OS/2 Warp (OS/2 3.0), which was released in 1994. ALP accepts source programs compatible with Microsoft Macro Assembler (MASM) version 5.1, which was originally used to build many of the device drivers included with OS/2. For OS/2 versions 3 and 4, ALP was distributed, along with other tools and documentation, as part of the Device Driver Kit (DDK). The DDK was withdrawn in 2004 as part of IBM's discontinuance of OS/2.

    Read more →
  • ARKA descriptors in QSAR

    ARKA descriptors in QSAR

    In computational chemistry and cheminformatics, ARKA descriptors in QSAR are a class of molecular descriptors used in quantitative structure–activity relationship (QSAR) modeling (or related approaches such as QSPR and QSTR), a computational method for predicting the biological activity or toxicity of chemical compounds based on their molecular structure. Molecular descriptors are numerical values that summarize information about a molecule's structure, topology, geometry, or physicochemical properties in a form suitable for machine learning or statistical modeling. ARKA (Arithmetic Residuals in K-Groups Analysis) descriptors differ from traditional descriptors by encoding atomic-level information through recursive autoregression techniques, which aim to capture subtle structural patterns and improve predictive accuracy. They are designed to be both interpretable and well-suited to modeling nonlinear relationships in QSAR studies. == Comparisons == While QSAR is essentially a similarity-based approach, the occurrence of activity/property cliffs may greatly reduce the predictive accuracy of the developed models. The novel Arithmetic Residuals in K-groups Analysis (ARKA) approach is a supervised dimensionality reduction technique developed by the DTC Laboratory, Jadavpur University that can easily identify activity cliffs in a data set. Activity cliffs are similar in their structures but differ considerably in their activity. The basic idea of the ARKA descriptors is to group the conventional QSAR descriptors based on a predefined criterion and then assign weightage to each descriptor in each group. ARKA descriptors have also been used to develop classification-based and regression-based QSAR models with acceptable quality statistics. The ARKA descriptors have been used for the identification of activity cliffs in QSAR studies and/or model development by multiple researchers. A tutorial presentation on the ARKA descriptors is available. Recently a multi-class ARKA framework has been proposed for improved q-RASAR model generation.

    Read more →
  • Stochastic block model

    Stochastic block model

    The stochastic block model is a generative model for random graphs. This model tends to produce graphs containing communities, subsets of nodes characterized by being connected with one another with particular edge densities. For example, edges may be more common within communities than between communities. Its mathematical formulation was first introduced in 1983 in the field of social network analysis by Paul W. Holland et al. The stochastic block model is important in statistics, machine learning, and network science, where it serves as a useful benchmark for the task of recovering community structure in graph data. == Definition == The stochastic block model takes the following parameters: The number n {\displaystyle n} of vertices; a partition of the vertex set { 1 , … , n } {\displaystyle \{1,\ldots ,n\}} into disjoint subsets C 1 , … , C r {\displaystyle C_{1},\ldots ,C_{r}} , called communities; a symmetric r × r {\displaystyle r\times r} matrix P {\displaystyle P} of edge probabilities. The edge set is then sampled at random as follows: any two vertices u ∈ C i {\displaystyle u\in C_{i}} and v ∈ C j {\displaystyle v\in C_{j}} are connected by an edge with probability P i j {\displaystyle P_{ij}} . An example problem is: given a graph with n {\displaystyle n} vertices, where the edges are sampled as described, recover the groups C 1 , … , C r {\displaystyle C_{1},\ldots ,C_{r}} . == Special cases == If the probability matrix is a constant, in the sense that P i j = p {\displaystyle P_{ij}=p} for all i , j {\displaystyle i,j} , then the result is the Erdős–Rényi model G ( n , p ) {\displaystyle G(n,p)} . This case is degenerate—the partition into communities becomes irrelevant—but it illustrates a close relationship to the Erdős–Rényi model. The planted partition model is the special case that the values of the probability matrix P {\displaystyle P} are a constant p {\displaystyle p} on the diagonal and another constant q {\displaystyle q} off the diagonal. Thus two vertices within the same community share an edge with probability p {\displaystyle p} , while two vertices in different communities share an edge with probability q {\displaystyle q} . Sometimes it is this restricted model that is called the stochastic block model. The case where p > q {\displaystyle p>q} is called an assortative model, while the case p < q {\displaystyle p P j k {\displaystyle P_{ii}>P_{jk}} whenever j ≠ k {\displaystyle j\neq k} : all diagonal entries dominate all off-diagonal entries. A model is called weakly assortative if P i i > P i j {\displaystyle P_{ii}>P_{ij}} whenever i ≠ j {\displaystyle i\neq j} : each diagonal entry is only required to dominate the rest of its own row and column. Disassortative forms of this terminology exist, by reversing all inequalities. For some algorithms, recovery might be easier for block models with assortative or disassortative conditions of this form. == Typical statistical tasks == Much of the literature on algorithmic community detection addresses three statistical tasks: detection, partial recovery, and exact recovery. === Detection === The goal of detection algorithms is simply to determine, given a sampled graph, whether the graph has latent community structure. More precisely, a graph might be generated, with some known prior probability, from a known stochastic block model, and otherwise from a similar Erdos-Renyi model. The algorithmic task is to correctly identify which of these two underlying models generated the graph. === Partial recovery === In partial recovery, the goal is to approximately determine the latent partition into communities, in the sense of finding a partition that is correlated with the true partition significantly better than a random guess. === Exact recovery === In exact recovery, the goal is to recover the latent partition into communities exactly. The community sizes and probability matrix may be known or unknown. == Statistical lower bounds and threshold behavior == Stochastic block models exhibit a sharp threshold effect reminiscent of percolation thresholds. Suppose that we allow the size n {\displaystyle n} of the graph to grow, keeping the community sizes in fixed proportions. If the probability matrix remains fixed, tasks such as partial and exact recovery become feasible for all non-degenerate parameter settings. However, if we scale down the probability matrix at a suitable rate as n {\displaystyle n} increases, we observe a sharp phase transition: for certain settings of the parameters, it will become possible to achieve recovery with probability tending to 1, whereas on the opposite side of the parameter threshold, the probability of recovery tends to 0 no matter what algorithm is used. For partial recovery, the appropriate scaling is to take P i j = P ~ i j / n {\displaystyle P_{ij}={\tilde {P}}_{ij}/n} for fixed P ~ {\displaystyle {\tilde {P}}} , resulting in graphs of constant average degree. In the case of two equal-sized communities, in the assortative planted partition model with probability matrix P = ( p ~ / n q ~ / n q ~ / n p ~ / n ) , {\displaystyle P=\left({\begin{array}{cc}{\tilde {p}}/n&{\tilde {q}}/n\\{\tilde {q}}/n&{\tilde {p}}/n\end{array}}\right),} partial recovery is feasible with probability 1 − o ( 1 ) {\displaystyle 1-o(1)} whenever ( p ~ − q ~ ) 2 > 2 ( p ~ + q ~ ) {\displaystyle ({\tilde {p}}-{\tilde {q}})^{2}>2({\tilde {p}}+{\tilde {q}})} , whereas any estimator fails partial recovery with probability 1 − o ( 1 ) {\displaystyle 1-o(1)} whenever ( p ~ − q ~ ) 2 < 2 ( p ~ + q ~ ) {\displaystyle ({\tilde {p}}-{\tilde {q}})^{2}<2({\tilde {p}}+{\tilde {q}})} . For exact recovery, the appropriate scaling is to take P i j = P ~ i j log ⁡ n / n {\displaystyle P_{ij}={\tilde {P}}_{ij}\log n/n} , resulting in graphs of logarithmic average degree. Here a similar threshold exists: for the assortative planted partition model with r {\displaystyle r} equal-sized communities, the threshold lies at p ~ − q ~ = r {\displaystyle {\sqrt {\tilde {p}}}-{\sqrt {\tilde {q}}}={\sqrt {r}}} . In fact, the exact recovery threshold is known for the fully general stochastic block model. == Algorithms == In principle, exact recovery can be solved in its feasible range using maximum likelihood, but this amounts to solving a constrained or regularized cut problem such as minimum bisection that is typically NP-complete. Hence, no known efficient algorithms will correctly compute the maximum-likelihood estimate in the worst case. However, a wide variety of algorithms perform well in the average case, and many high-probability performance guarantees have been proven for algorithms in both the partial and exact recovery settings. Successful algorithms include spectral clustering of the vertices, semidefinite programming, forms of belief propagation, and community detection among others. == Variants == Several variants of the model exist. One minor tweak allocates vertices to communities randomly, according to a categorical distribution, rather than in a fixed partition. More significant variants include the degree-corrected stochastic block model, the hierarchical stochastic block model, the geometric block model, censored block model and the mixed-membership block model. == Topic models == Stochastic block model have been recognised to be a topic model on bipartite networks. In a network of documents and words, Stochastic block model can identify topics: group of words with a similar meaning. == Extensions to signed graphs == Signed graphs allow for both favorable and adverse relationships and serve as a common model choice for various data analysis applications, e.g., correlation clustering. The stochastic block model can be trivially extended to signed graphs by assigning both positive and negative edge weights or equivalently using a difference of adjacency matrices of two stochastic block models. == DARPA/MIT/AWS Graph Challenge: streaming stochastic block partition == GraphChallenge encourages community approaches to developing new solutions for analyzing graphs and sparse data derived from social media, sensor feeds, and scientific data to enable relationships between events to be discovered as they unfold in the field. Streaming stochastic block partition is one of the challenges since 2017. Spectral clustering has demonstrated outstanding performance compared to the original and even improved base algorithm, matching its quality of clusters while being multiple orders of magnitude faster.

    Read more →
  • Language identification in the limit

    Language identification in the limit

    Language identification in the limit is a formal model for inductive inference of formal languages, mainly by computers (see machine learning and induction of regular languages). It was introduced by E. Mark Gold in a technical report and a journal article with the same title. In this model, a teacher provides to a learner some presentation (i.e. a sequence of strings) of some formal language. The learning is seen as an infinite process. Each time the learner reads an element of the presentation, it should provide a representation (e.g. a formal grammar) for the language. Gold defines that a learner can identify in the limit a class of languages if, given any presentation of any language in the class, the learner will produce only a finite number of wrong representations, and then stick with the correct representation. However, the learner need not be able to announce its correctness; and the teacher might present a counterexample to any representation arbitrarily long after. Gold defined two types of presentations: Text (positive information): an enumeration of all strings the language consists of. Complete presentation (positive and negative information): an enumeration of all possible strings, each with a label indicating if the string belongs to the language or not. == Learnability == This model is an early attempt to formally capture the notion of learnability. Gold's journal article introduces for contrast the stronger models Finite identification (where the learner has to announce correctness after a finite number of steps), and Fixed-time identification (where correctness has to be reached after an apriori-specified number of steps). A weaker formal model of learnability is the Probably approximately correct learning (PAC) model, introduced by Leslie Valiant in 1984. == Examples == It is instructive to look at concrete examples (in the tables) of learning sessions the definition of identification in the limit speaks about. A fictitious session to learn a regular language L over the alphabet {a,b} from text presentation:In each step, the teacher gives a string belonging to L, and the learner answers a guess for L, encoded as a regular expression. In step 3, the learner's guess is not consistent with the strings seen so far; in step 4, the teacher gives a string repeatedly. After step 6, the learner sticks to the regular expression (ab+ba). If this happens to be a description of the language L the teacher has in mind, it is said that the learner has learned that language.If a computer program for the learner's role would exist that was able to successfully learn each regular language, that class of languages would be identifiable in the limit. Gold has shown that this is not the case. A particular learning algorithm always guessing L to be just the union of all strings seen so far:If L is a finite language, the learner will eventually guess it correctly, however, without being able to tell when. Although the guess didn't change during step 3 to 6, the learner couldn't be sure to be correct.Gold has shown that the class of finite languages is identifiable in the limit, however, this class is neither finitely nor fixed-time identifiable. Learning from complete presentation by telling:In each step, the teacher gives a string and tells whether it belongs to L (green) or not (red, struck-out). Each possible string is eventually classified in this way by the teacher. Learning from complete presentation by request:The learner gives a query string, the teacher tells whether it belongs to L (yes) or not (no); the learner then gives a guess for L, followed by the next query string. In this example, the learner happens to query in each step just the same string as given by the teacher in example 3.In general, Gold has shown that each language class identifiable in the request-presentation setting is also identifiable in the telling-presentation setting, since the learner, instead of querying a string, just needs to wait until it is eventually given by the teacher. == Gold's theorem == More formally, a language L {\displaystyle L} is a nonempty set, and its elements are called sentences. a language family is a set of languages. a language-learning environment E {\displaystyle E} for a language L {\displaystyle L} is a stream of sentences from L {\displaystyle L} , such that each sentence in L {\displaystyle L} appears at least once. a language learner is a function f {\displaystyle f} that sends a list of sentences to a language. This is interpreted as saying that, after seeing sentences a 1 , a 2 . . . , a n {\displaystyle a_{1},a_{2}...,a_{n}} in that order, the language learner guesses that the language that produces the sentences should be f ( a 1 , . . . , a n ) {\displaystyle f(a_{1},...,a_{n})} . Note that the learner is not obliged to be correct — it could very well guess a language that does not even contain a 1 , . . . , a n {\displaystyle a_{1},...,a_{n}} . a language learner f {\displaystyle f} learns a language L {\displaystyle L} in environment E = ( a 1 , a 2 , . . . ) {\displaystyle E=(a_{1},a_{2},...)} if the learner always guesses L {\displaystyle L} after seeing enough examples from the environment. a language learner f {\displaystyle f} learns a language L {\displaystyle L} if it learns L {\displaystyle L} in any environment E {\displaystyle E} for L {\displaystyle L} . a language family is learnable if there exists a language learner that can learn all languages in the family. Notes: In the context of Gold's theorem, sentences need only be distinguishable. They need not be anything in particular, such as finite strings (as usual in formal linguistics). Learnability is not a concept for individual languages. Any individual language L {\displaystyle L} could be learned by a trivial learner that always guesses L {\displaystyle L} . Learnability is not a concept for individual learners. A language family is learnable if, and only if, there exists some learner that can learn the family. It does not matter how well the learner performs for learning languages outside the family. Gold's theorem is easily bypassed if negative examples are allowed. In particular, the language family { L 1 , L 2 , . . . , L ∞ } {\displaystyle \{L_{1},L_{2},...,L_{\infty }\}} can be learned by a learner that always guesses L ∞ {\displaystyle L_{\infty }} until it receives the first negative example ¬ a n {\displaystyle \neg a_{n}} , where a n ∈ L n + 1 ∖ L n {\displaystyle a_{n}\in L_{n+1}\setminus L_{n}} , at which point it always guesses L n {\displaystyle L_{n}} . == Learnability characterization == Dana Angluin gave the characterizations of learnability from text (positive information) in a 1980 paper. If a learner is required to be effective, then an indexed class of recursive languages is learnable in the limit if there is an effective procedure that uniformly enumerates tell-tales for each language in the class (Condition 1). It is not hard to see that if an ideal learner (i.e., an arbitrary function) is allowed, then an indexed class of languages is learnable in the limit if each language in the class has a tell-tale (Condition 2). == Language classes learnable in the limit == The table shows which language classes are identifiable in the limit in which learning model. On the right-hand side, each language class is a superclass of all lower classes. Each learning model (i.e. type of presentation) can identify in the limit all classes below it. In particular, the class of finite languages is identifiable in the limit by text presentation (cf. Example 2 above), while the class of regular languages is not. Pattern Languages, introduced by Dana Angluin in another 1980 paper, are also identifiable by normal text presentation; they are omitted in the table, since they are above the singleton and below the primitive recursive language class, but incomparable to the classes in between. == Sufficient conditions for learnability == Condition 1 in Angluin's paper is not always easy to verify. Therefore, people come up with various sufficient conditions for the learnability of a language class. See also Induction of regular languages for learnable subclasses of regular languages. === Finite thickness === A class of languages has finite thickness if every non-empty set of strings is contained in at most finitely many languages of the class. This is exactly Condition 3 in Angluin's paper. Angluin showed that if a class of recursive languages has finite thickness, then it is learnable in the limit. A class with finite thickness certainly satisfies MEF-condition and MFF-condition; in other words, finite thickness implies M-finite thickness. === Finite elasticity === A class of languages is said to have finite elasticity if for every infinite sequence of strings s 0 , s 1 , . . . {\displaystyle s_{0},s_{1},...} and every infinite sequence of languages in the class L 1 , L 2 , . . . {\displaystyle L_{1},L_{2},...} , there exists a finite number n such

    Read more →
  • Mix automation

    Mix automation

    In music recording, mix automation allows the mixing console to remember the mixing engineer's dynamic adjustment of faders during a musical piece in the post-production editing process. A timecode is necessary for the synchronization of automation. Modern mixing consoles and digital audio workstations use comprehensive mix automation. The need for automated mixing originated from the late 1970s transition form 8-track to 16-track and then 24-track multitrack recording, as mixing could be laborious and require multiple people and hands, and the results could be almost impossible to reproduce. With 48-track recording - synchronized twin 24-track recorders (for a net 46 audio tracks, with one on each machine for SMPTE timecode) - came larger recording and mixing consoles with even more channel faders to manage during mixdown. Manufacturers, such as Neve Electronics (now AMS Neve) and Solid State Logic (SSL), both English companies, developed systems that enabled one engineer to oversee every detail of a complex mix, although the computers required to power these desks remained a rarity into the late 1970s. According to record producer Roy Thomas Baker, Queen's 1975 single "Bohemian Rhapsody" was one of the first mixes to be done with automation. == Types == Voltage Controlled Automation fader levels are regulated by voltage-controlled amplifiers (VCA). VCAs control the audio level and not the actual fader. Moving Fader Automation a motor is attached to the fader, which then can be controlled by the console, digital audio workstation (DAW), or user. Software Controlled Automation the software can be internal to the console, or external as part of a DAW. The virtual fader can be adjusted in the software by the user. MIDI Automation the communications protocol MIDI can be used to send messages to the console to control automation. == Modes == Auto Write used the first time automation is created or when writing over existing automation Auto Touch writes automation data only while a fader is touched/faders return to any previously automated position after release Auto Latch starts writing automation data when a fader is touched/stays in position after release Auto Read digital Audio Workstation performs the written automation Auto Off automation is temporarily disabled All of these include the mute button. If mute is pressed during writing of automation, the audio track will be muted during playback of that automation. Depending on software, other parameters such as panning, sends, and plug-in controls can be automated as well. In some cases, automation can be written using a digital potentiometer instead of a fader.

    Read more →
  • Julia (programming language)

    Julia (programming language)

    Julia is a dynamic general-purpose programming language. As a high-level language, distinctive aspects of Julia's design include a type system with parametric polymorphism, the use of multiple dispatch as a core programming paradigm, just-in-time compilation and a parallel garbage collection implementation. Notably, Julia does not support classes with encapsulated methods but instead relies on the types of all of a function's arguments to determine which method will be called. By default, Julia is run similarly to scripting languages, using its runtime, and allows for interactions, but Julia programs can also be compiled to small binary standalone executables (or to small libraries for e.g. Python), with e.g. the JuliaC.jl compiler. Julia programs can reuse libraries from other languages, and vice versa. Julia has interoperability with C, C++, Fortran, Rust, Python, and R. Additionally, some Julia packages have bindings to be used from Python and R as libraries. Julia is supported by programmer tools like IDEs (see below) and by notebooks like Pluto.jl, Jupyter, and since 2025, Google Colab officially supports Julia natively. Julia is sometimes used in embedded systems (e.g. has been used in a satellite in space on a Raspberry Pi Compute Module 4; 64-bit Pis work best with Julia, and Julia is supported in Raspbian). == History == Work on Julia began in 2009, when Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and Alan Edelman set out to create a free language that was both high-level and fast. On 14 February 2012, the team launched a website with a blog post explaining the language's mission. In an interview with InfoWorld in April 2012, Karpinski said about the name of the language, Julia: "There's no good reason, really. It just seemed like a pretty name." Bezanson said he chose the name on the recommendation of a friend, then years later wrote: Maybe julia stands for "Jeff's uncommon lisp is automated"? Julia's syntax is stable, since version 1.0 in 2018, and Julia has a backward compatibility guarantee for 1.x and also a stability promise for the documented (stable) API, while in the years before in the early development prior to 0.7 the syntax (and semantics) was changed in new versions. All of the (registered package) ecosystem uses the new and improved syntax, and in most cases relies on new APIs that have been added regularly, and in some cases minor additional syntax added in a forward compatible way e.g. in Julia 1.7. In the 10 years since the 2012 launch of pre-1.0 Julia, the community has grown. The Julia package ecosystem has over 11.8 million lines of code (including docs and tests). The JuliaCon academic conference for Julia users and developers has been held annually since 2014 with JuliaCon2020 welcoming over 28,900 unique viewers, and then JuliaCon2021 breaking all previous records (with more than 300 JuliaCon2021 presentations available for free on YouTube, up from 162 the year before), and 43,000 unique viewers during the conference. Three of the Julia co-creators are the recipients of the 2019 James H. Wilkinson Prize for Numerical Software (awarded every four years) "for the creation of Julia, an innovative environment for the creation of high-performance tools that enable the analysis and solution of computational science problems." Also, Alan Edelman, professor of applied mathematics at MIT, has been selected to receive the 2019 IEEE Computer Society Sidney Fernbach Award "for outstanding breakthroughs in high-performance computing, linear algebra, and computational science and for contributions to the Julia programming language." Version 0.3 was released in August 2014. Both Julia 0.7 and version 1.0 were released on 8 August 2018. Julia 1.4 added syntax for generic array indexing to handle e.g. 0-based arrays. The memory model was also changed. Julia 1.5 released in August 2020 added record and replay debugging support, for Mozilla's rr tool. The release changed the behavior in the REPL (to soft scope) to the one used in Jupyter, but keeps full compatible with non-REPL code (that retains hard scope). Julia 1.6 was the largest release since 1.0, and it was the long-term support (LTS) version for the longest time. Since Julia 1.7 development is back to time-based releases, and it was released in November 2021 with e.g. a new default random-number generator and Julia 1.7.3 fixed at least one security issue. Julia 1.8 added options for hiding source code when compiling Julia source code to executables. Julia 1.9 has added the ability to precompile packages to native machine code, done automatically; to improve precompilation of packages a new package PrecompileTools.jl was introduced, for use by package developers. Julia 1.10 was released on 25 December 2023 with new features such as parallel garbage collection. Julia 1.11 was released on 7 October 2024, and with it 1.10.5 became the next long-term support (LTS) version (i.e. those became the only two supported versions), since replaced by 1.10.10 released on 27 June, and 1.6 is no longer an LTS version. Julia 1.11 adds e.g. the new public keyword to signal safe public API (Julia users are advised to use such API, not internals, of Julia or packages, and package authors advised to use the keyword, generally indirectly, e.g. prefixed with the @compat macro, from Compat.jl, to also support older Julia versions, at least the LTS version). Julia 1.12 was released on 7 October 2025 (and 1.12.5 on 9 February 2026), and with it a JuliaC.jl package including the juliac compiler that works with it, for making rather small binary executables (much smaller than was possible before; through the use of new so-called trimming feature). Julia 1.10 LTS is an officially still-supported branch, but the 1.11 branch has also been maintained after 1.12 release, with 1.11.8 released and then 1.11.9 released on 8 February 2026. === JuliaCon === Since 2014, the Julia Community has hosted an annual Julia Conference focused on developers and users. The first JuliaCon took place in Chicago and kickstarted the annual occurrence of the conference. Since 2014, the conference has taken place across a number of locations including MIT and the University of Maryland, Baltimore. The event audience has grown from a few dozen people to over 28,900 unique attendees during JuliaCon 2020, which took place virtually. JuliaCon 2021 also took place virtually with keynote addresses from professors William Kahan, the primary architect of the IEEE 754 floating-point standard (which virtually all CPUs and languages, including Julia, use), Jan Vitek, Xiaoye Sherry Li, and Soumith Chintala, a co-creator of PyTorch. JuliaCon grew to 43,000 unique attendees and more than 300 presentations (still freely accessible, plus for older years). JuliaCon 2022 will also be virtual held between July 27 and July 29, 2022, for the first time in several languages, not just in English. === Sponsors === The Julia language became a NumFOCUS fiscally sponsored project in 2014 in an effort to ensure the project's long-term sustainability. Jeremy Kepner at MIT Lincoln Laboratory was the founding sponsor of the Julia project in its early days. In addition, funds from the Gordon and Betty Moore Foundation, the Alfred P. Sloan Foundation, Intel, and agencies such as NSF, DARPA, NIH, NASA, and FAA have been essential to the development of Julia. Mozilla, the maker of Firefox web browser, with its research grants for H1 2019, sponsored "a member of the official Julia team" for the project "Bringing Julia to the Browser", meaning to Firefox and other web browsers. The Julia language is also supported by individual donors on GitHub. === The Julia company === JuliaHub, Inc. was founded in 2015 as Julia Computing, Inc. by Viral B. Shah, Deepak Vinchhi, Alan Edelman, Jeff Bezanson, Stefan Karpinski and Keno Fischer. In June 2017, Julia Computing raised US$4.6 million in seed funding from General Catalyst and Founder Collective, the same month was "granted $910,000 by the Alfred P. Sloan Foundation to support open-source Julia development, including $160,000 to promote diversity in the Julia community", and in December 2019 the company got $1.1 million funding from the US government to "develop a neural component machine learning tool to reduce the total energy consumption of heating, ventilation, and air conditioning (HVAC) systems in buildings". In July 2021, Julia Computing announced they raised a $24 million Series A round led by Dorilton Ventures, which also owns Formula One team Williams Racing, that partnered with Julia Computing. Williams' Commercial Director said: "Investing in companies building best-in-class cloud technology is a strategic focus for Dorilton and Julia's versatile platform, with revolutionary capabilities in simulation and modelling, is hugely relevant to our business. We look forward to embedding Julia Computing in the world's most technologically advanced sport". In June 2023, JuliaHub received (again, now

    Read more →
  • Diffusion model

    Diffusion model

    In machine learning, diffusion models, also known as diffusion-based generative models or score-based generative models, are a class of latent variable generative models. A diffusion model consists of two major components: the forward diffusion process, and the reverse sampling process. The goal of diffusion models is to learn a diffusion process for a given dataset, such that the process can generate new elements that are distributed similarly as the original dataset. A diffusion model models data as generated by a diffusion process, whereby a new datum performs a random walk with drift through the space of all possible data. A trained diffusion model can be sampled in many ways, with different efficiency and quality. There are various equivalent formalisms, including Markov chains, denoising diffusion probabilistic models, noise conditioned score networks, and stochastic differential equations. They are typically trained using variational inference. The model responsible for denoising is typically called its "backbone". The backbone may be of any kind, but they are typically U-nets or transformers. As of 2024, diffusion models are mainly used for computer vision tasks, including image denoising, inpainting, super-resolution, image generation, and video generation. These typically involve training a neural network to sequentially denoise images blurred with Gaussian noise. The model is trained to reverse the process of adding noise to an image. After training to convergence, it can be used for image generation by starting with an image composed of random noise, and applying the network iteratively to denoise the image. Diffusion-based image generators have seen widespread commercial interest, such as Stable Diffusion and DALL-E. These models typically combine diffusion models with other models, such as text-encoders and cross-attention modules to allow text-conditioned generation. Other than computer vision, diffusion models have also found applications in natural language processing such as text generation and summarization, sound generation, and reinforcement learning. == Denoising diffusion model == === Non-equilibrium thermodynamics === Diffusion models were introduced in 2015 as a method to train a model that can sample from a highly complex probability distribution. They used techniques from non-equilibrium thermodynamics, especially diffusion. Consider, for example, how one might model the distribution of all naturally occurring photos. Each image is a point in the space of all images, and the distribution of naturally occurring photos is a "cloud" in space, which, by repeatedly adding noise to the images, diffuses out to the rest of the image space, until the cloud becomes all but indistinguishable from a Gaussian distribution N ( 0 , I ) {\displaystyle {\mathcal {N}}(0,I)} . A model that can approximately undo the diffusion can then be used to sample from the original distribution. This is studied in "non-equilibrium" thermodynamics, as the starting distribution is not in equilibrium, unlike the final distribution. The equilibrium distribution is the Gaussian distribution N ( 0 , I ) {\displaystyle {\mathcal {N}}(0,I)} , with pdf ρ ( x ) ∝ e − 1 2 ‖ x ‖ 2 {\displaystyle \rho (x)\propto e^{-{\frac {1}{2}}\|x\|^{2}}} . This is just the Maxwell–Boltzmann distribution of particles in a potential well V ( x ) = 1 2 ‖ x ‖ 2 {\displaystyle V(x)={\frac {1}{2}}\|x\|^{2}} at temperature 1. The initial distribution, being very much out of equilibrium, would diffuse towards the equilibrium distribution, making biased random steps that are a sum of pure randomness (like a Brownian walker) and gradient descent down the potential well. The randomness is necessary: if the particles were to undergo only gradient descent, then they will all fall to the origin, collapsing the distribution. === Denoising Diffusion Probabilistic Model (DDPM) === The 2020 paper proposed the Denoising Diffusion Probabilistic Model (DDPM), which improves upon the previous method by variational inference. ==== Forward diffusion ==== To present the model, some notation is required. β 1 , . . . , β T ∈ ( 0 , 1 ) {\displaystyle \beta _{1},...,\beta _{T}\in (0,1)} are fixed constants. α t := 1 − β t {\displaystyle \alpha _{t}:=1-\beta _{t}} α ¯ t := α 1 ⋯ α t {\displaystyle {\bar {\alpha }}_{t}:=\alpha _{1}\cdots \alpha _{t}} σ t := 1 − α ¯ t {\displaystyle \sigma _{t}:={\sqrt {1-{\bar {\alpha }}_{t}}}} σ ~ t := σ t − 1 σ t β t {\displaystyle {\tilde {\sigma }}_{t}:={\frac {\sigma _{t-1}}{\sigma _{t}}}{\sqrt {\beta _{t}}}} μ ~ t ( x t , x 0 ) := α t ( 1 − α ¯ t − 1 ) x t + α ¯ t − 1 ( 1 − α t ) x 0 σ t 2 {\displaystyle {\tilde {\mu }}_{t}(x_{t},x_{0}):={\frac {{\sqrt {\alpha _{t}}}(1-{\bar {\alpha }}_{t-1})x_{t}+{\sqrt {{\bar {\alpha }}_{t-1}}}(1-\alpha _{t})x_{0}}{\sigma _{t}^{2}}}} N ( μ , Σ ) {\displaystyle {\mathcal {N}}(\mu ,\Sigma )} is the normal distribution with mean μ {\displaystyle \mu } and variance Σ {\displaystyle \Sigma } , and N ( x | μ , Σ ) {\displaystyle {\mathcal {N}}(x|\mu ,\Sigma )} is the probability density at x {\displaystyle x} . A vertical bar denotes conditioning. A forward diffusion process starts at some starting point x 0 ∼ q {\displaystyle x_{0}\sim q} , where q {\displaystyle q} is the probability distribution to be learned, then repeatedly adds noise to it by x t = 1 − β t x t − 1 + β t z t {\displaystyle x_{t}={\sqrt {1-\beta _{t}}}x_{t-1}+{\sqrt {\beta _{t}}}z_{t}} where z 1 , . . . , z T {\displaystyle z_{1},...,z_{T}} are IID (Independent and identically distributed random variables) samples from N ( 0 , I ) {\displaystyle {\mathcal {N}}(0,I)} . The coefficients 1 − β t {\displaystyle {\sqrt {1-\beta _{t}}}} and β t {\displaystyle {\sqrt {\beta _{t}}}} ensure that Var ( X t ) = I {\displaystyle {\mbox{Var}}(X_{t})=I} assuming that Var ( X 0 ) = I {\displaystyle {\mbox{Var}}(X_{0})=I} . The values of β t {\displaystyle \beta _{t}} are chosen such that for any starting distribution of x 0 {\displaystyle x_{0}} , if it has finite second moment, then lim t → ∞ x t | x 0 {\displaystyle \lim _{t\to \infty }x_{t}|x_{0}} converges to N ( 0 , I ) {\displaystyle {\mathcal {N}}(0,I)} . The entire diffusion process then satisfies q ( x 0 : T ) = q ( x 0 ) q ( x 1 | x 0 ) ⋯ q ( x T | x T − 1 ) = q ( x 0 ) N ( x 1 | α 1 x 0 , β 1 I ) ⋯ N ( x T | α T x T − 1 , β T I ) {\displaystyle q(x_{0:T})=q(x_{0})q(x_{1}|x_{0})\cdots q(x_{T}|x_{T-1})=q(x_{0}){\mathcal {N}}(x_{1}|{\sqrt {\alpha _{1}}}x_{0},\beta _{1}I)\cdots {\mathcal {N}}(x_{T}|{\sqrt {\alpha _{T}}}x_{T-1},\beta _{T}I)} or ln ⁡ q ( x 0 : T ) = ln ⁡ q ( x 0 ) − ∑ t = 1 T 1 2 β t ‖ x t − 1 − β t x t − 1 ‖ 2 + C {\displaystyle \ln q(x_{0:T})=\ln q(x_{0})-\sum _{t=1}^{T}{\frac {1}{2\beta _{t}}}\|x_{t}-{\sqrt {1-\beta _{t}}}x_{t-1}\|^{2}+C} where C {\displaystyle C} is a normalization constant and often omitted. In particular, we note that x 1 : T | x 0 {\displaystyle x_{1:T}|x_{0}} is a Gaussian process, which affords us considerable freedom in reparameterization. For example, by standard manipulation with Gaussian process, x t | x 0 ∼ N ( α ¯ t x 0 , σ t 2 I ) {\displaystyle x_{t}|x_{0}\sim N\left({\sqrt {{\bar {\alpha }}_{t}}}x_{0},\sigma _{t}^{2}I\right)} x t − 1 | x t , x 0 ∼ N ( μ ~ t ( x t , x 0 ) , σ ~ t 2 I ) {\displaystyle x_{t-1}|x_{t},x_{0}\sim {\mathcal {N}}({\tilde {\mu }}_{t}(x_{t},x_{0}),{\tilde {\sigma }}_{t}^{2}I)} In particular, notice that for large t {\displaystyle t} , the variable x t | x 0 ∼ N ( α ¯ t x 0 , σ t 2 I ) {\displaystyle x_{t}|x_{0}\sim N\left({\sqrt {{\bar {\alpha }}_{t}}}x_{0},\sigma _{t}^{2}I\right)} converges to N ( 0 , I ) {\displaystyle {\mathcal {N}}(0,I)} . That is, after a long enough diffusion process, we end up with some x T {\displaystyle x_{T}} that is very close to N ( 0 , I ) {\displaystyle {\mathcal {N}}(0,I)} , with all traces of the original x 0 ∼ q {\displaystyle x_{0}\sim q} gone. For example, since x t | x 0 ∼ N ( α ¯ t x 0 , σ t 2 I ) {\displaystyle x_{t}|x_{0}\sim N\left({\sqrt {{\bar {\alpha }}_{t}}}x_{0},\sigma _{t}^{2}I\right)} we can sample x t | x 0 {\displaystyle x_{t}|x_{0}} directly "in one step", instead of going through all the intermediate steps x 1 , x 2 , . . . , x t − 1 {\displaystyle x_{1},x_{2},...,x_{t-1}} . ==== Backward diffusion ==== The key idea of DDPM is to use a neural network parametrized by θ {\displaystyle \theta } . The network takes in two arguments x t , t {\displaystyle x_{t},t} , and outputs a vector μ θ ( x t , t ) {\displaystyle \mu _{\theta }(x_{t},t)} and a matrix Σ θ ( x t , t ) {\displaystyle \Sigma _{\theta }(x_{t},t)} , such that each step in the forward diffusion process can be approximately undone by x t − 1 ∼ N ( μ θ ( x t , t ) , Σ θ ( x t , t ) ) {\displaystyle x_{t-1}\sim {\mathcal {N}}(\mu _{\theta }(x_{t},t),\Sigma _{\theta }(x_{t},t))} . This then gives us a backward diffusion process p θ {\displaystyle p_{\theta }} defined by p θ ( x T ) = N ( x T | 0 , I ) {\displaystyle p_{\theta }(x

    Read more →
  • Linear discriminant analysis

    Linear discriminant analysis

    Linear discriminant analysis (LDA), normal discriminant analysis (NDA), canonical variates analysis (CVA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics and other fields, to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification. LDA is closely related to analysis of variance (ANOVA) and regression analysis, which also attempt to express one dependent variable as a linear combination of other features or measurements. However, ANOVA uses categorical independent variables and a continuous dependent variable, whereas discriminant analysis has continuous independent variables and a categorical dependent variable (i.e. the class label). Logistic regression and probit regression are more similar to LDA than ANOVA is, as they also explain a categorical variable by the values of continuous independent variables. These other methods are preferable in applications where it is not reasonable to assume that the independent variables have a normal distribution, which is a fundamental assumption of the LDA method. LDA is also closely related to principal component analysis (PCA) and factor analysis in that they both look for linear combinations of variables which best explain the data. LDA explicitly attempts to model the difference between the classes of data. PCA, in contrast, does not take into account any difference in class, and factor analysis builds the feature combinations based on similarities rather than differences. Discriminant analysis is also different from factor analysis in that it is not an interdependence technique: a distinction between independent variables and dependent variables (also called criterion variables) must be made. LDA works when the measurements made on independent variables for each observation are continuous quantities. When dealing with categorical independent variables, the equivalent technique is discriminant correspondence analysis. Discriminant analysis is used when groups are known a priori (unlike in cluster analysis). Each case must have a score on one or more quantitative predictor measures, and a score on a group measure. In simple terms, discriminant function analysis is classification - the act of distributing things into groups, classes or categories of the same type. == History == The original dichotomous discriminant analysis was developed by Sir Ronald Fisher in 1936. It is different from an ANOVA or MANOVA, which is used to predict one (ANOVA) or multiple (MANOVA) continuous dependent variables by one or more independent categorical variables. Discriminant function analysis is useful in determining whether a set of variables is effective in predicting category membership. == LDA for two classes == Consider a set of observations x → {\displaystyle {\vec {x}}} (also called features, attributes, variables or measurements) for each sample of an object or event with known class y {\displaystyle y} . This set of samples is called the training set in a supervised learning context. The classification problem is then to find a good predictor for the class y {\displaystyle y} of any sample of the same distribution (not necessarily from the training set) given only an observation x → {\displaystyle {\vec {x}}} . LDA approaches the problem by assuming that the conditional probability density functions p ( x → | y = 0 ) {\displaystyle p({\vec {x}}|y=0)} and p ( x → | y = 1 ) {\displaystyle p({\vec {x}}|y=1)} are both the normal distribution with mean and covariance parameters ( μ → 0 , Σ 0 ) {\displaystyle \left({\vec {\mu }}_{0},\Sigma _{0}\right)} and ( μ → 1 , Σ 1 ) {\displaystyle \left({\vec {\mu }}_{1},\Sigma _{1}\right)} , respectively. Under this assumption, the Bayes-optimal solution is to predict points as being from the second class if the log of the likelihood ratios is bigger than some threshold T, so that: 1 2 ( x → − μ → 0 ) T Σ 0 − 1 ( x → − μ → 0 ) + 1 2 ln ⁡ | Σ 0 | − 1 2 ( x → − μ → 1 ) T Σ 1 − 1 ( x → − μ → 1 ) − 1 2 ln ⁡ | Σ 1 | > T {\displaystyle {\frac {1}{2}}({\vec {x}}-{\vec {\mu }}_{0})^{\mathrm {T} }\Sigma _{0}^{-1}({\vec {x}}-{\vec {\mu }}_{0})+{\frac {1}{2}}\ln |\Sigma _{0}|-{\frac {1}{2}}({\vec {x}}-{\vec {\mu }}_{1})^{\mathrm {T} }\Sigma _{1}^{-1}({\vec {x}}-{\vec {\mu }}_{1})-{\frac {1}{2}}\ln |\Sigma _{1}|\ >\ T} Without any further assumptions, the resulting classifier is referred to as quadratic discriminant analysis (QDA). LDA instead makes the additional simplifying homoscedasticity assumption (i.e. that the class covariances are identical, so Σ 0 = Σ 1 = Σ {\displaystyle \Sigma _{0}=\Sigma _{1}=\Sigma } ) and that the covariances have full rank. In this case, several terms cancel: x → T Σ 0 − 1 x → = x → T Σ 1 − 1 x → {\displaystyle {\vec {x}}^{\mathrm {T} }\Sigma _{0}^{-1}{\vec {x}}={\vec {x}}^{\mathrm {T} }\Sigma _{1}^{-1}{\vec {x}}} x → T Σ i − 1 μ → i = μ → i T Σ i − 1 x → {\displaystyle {\vec {x}}^{\mathrm {T} }{\Sigma _{i}}^{-1}{\vec {\mu }}_{i}={{\vec {\mu }}_{i}}^{\mathrm {T} }{\Sigma _{i}}^{-1}{\vec {x}}} because both sides are scalar and transpose to each other ( Σ i {\displaystyle \Sigma _{i}} is Hermitian) and the above decision criterion becomes a threshold on the dot product w → T x → > c {\displaystyle {\vec {w}}^{\mathrm {T} }{\vec {x}}>c} for some threshold constant c, where w → = Σ − 1 ( μ → 1 − μ → 0 ) {\displaystyle {\vec {w}}=\Sigma ^{-1}({\vec {\mu }}_{1}-{\vec {\mu }}_{0})} c = 1 2 w → T ( μ → 1 + μ → 0 ) {\displaystyle c={\frac {1}{2}}\,{\vec {w}}^{\mathrm {T} }({\vec {\mu }}_{1}+{\vec {\mu }}_{0})} This means that the criterion of an input x → {\displaystyle {\vec {x}}} being in a class y {\displaystyle y} is purely a function of this linear combination of the known observations. It is often useful to see this conclusion in geometrical terms: the criterion of an input x → {\displaystyle {\vec {x}}} being in a class y {\displaystyle y} is purely a function of projection of multidimensional-space point x → {\displaystyle {\vec {x}}} onto vector w → {\displaystyle {\vec {w}}} (thus, we only consider its direction). In other words, the observation belongs to y {\displaystyle y} if corresponding x → {\displaystyle {\vec {x}}} is located on a certain side of a hyperplane perpendicular to w → {\displaystyle {\vec {w}}} . The location of the plane is defined by the threshold c {\displaystyle c} . == Assumptions == The assumptions of discriminant analysis are the same as those for MANOVA. The analysis is quite sensitive to outliers and the size of the smallest group must be larger than the number of predictor variables. Multivariate normality: Independent variables are normal for each level of the grouping variable. Homogeneity of variance/covariance (homoscedasticity): Variances among group variables are the same across levels of predictors. Can be tested with Box's M statistic. It has been suggested, however, that linear discriminant analysis be used when covariances are equal, and that quadratic discriminant analysis may be used when covariances are not equal. Independence: Participants are assumed to be randomly sampled, and a participant's score on one variable is assumed to be independent of scores on that variable for all other participants. It has been suggested that discriminant analysis is relatively robust to slight violations of these assumptions, and it has also been shown that discriminant analysis may still be reliable when using dichotomous variables (where multivariate normality is often violated). == Discriminant functions == Discriminant analysis works by creating one or more linear combinations of predictors, creating a new latent variable for each function. These functions are called discriminant functions. The number of functions possible is either N g − 1 {\displaystyle N_{g}-1} where N g {\displaystyle N_{g}} = number of groups, or p {\displaystyle p} (the number of predictors), whichever is smaller. The first function created maximizes the differences between groups on that function. The second function maximizes differences on that function, but also must not be correlated with the previous function. This continues with subsequent functions with the requirement that the new function not be correlated with any of the previous functions. Given group j {\displaystyle j} , with R j {\displaystyle \mathbb {R} _{j}} sets of sample space, there is a discriminant rule such that if x ∈ R j {\displaystyle x\in \mathbb {R} _{j}} , then x ∈ j {\displaystyle x\in j} . Discriminant analysis then, finds “good” regions of R j {\displaystyle \mathbb {R} _{j}} to minimize classification error, therefore leading to a high percent correct classified in the classification table. Each function is given a discriminant score to determine how well it predicts group placement. Structure Corr

    Read more →
  • AppBlock

    AppBlock

    AppBlock is a software tool for managing screen time that limits access to selected mobile applications and websites. Developed by the Czech studio MobileSoft, it is distributed for Android and iOS devices as well as through browser extensions for Google Chrome, Microsoft Edge and Brave, and as desktop solutions. The application is used primarily to restrict time spent on social media and similar distracting services while working and studying. By 2025, the application reported 700,000 monthly active users, with the domestic Czech market accounting for less than one percent of its total user base and revenue. == History == === Origins === AppBlock was created by the Czech software studio MobileSoft, based in Hradec Králové. The studio was founded in 2012 by Miroslav Novosvětský, who remains the sole owner. The idea for the application arose from the use of browser-based website blockers on desktop computers. AppBlock was conceived as a way to reduce the time spent on mobile devices. === Early releases === In its early phase, AppBlock was available only for phones running on Android. Early versions allowed users to limit access to selected applications and websites during specified periods. From the outset, the application was distributed internationally rather than only within the Czech market, and early coverage reported a multi-million number of downloads worldwide. === Expansion of functionality === Over time, AppBlock has expanded beyond basic application blocking to include additional functions related to limiting procrastination and managing attention. The development of AppBlock accelerated during the COVID-19 pandemic. Following a reduction in external client orders, the studio reallocated resources from contract development to the application. Increased digital content consumption during lockdowns contributed to a rise in the application's usage and revenue. As the application developed, it became the company's product with the largest user base. Novosvětský described an increase in downloads over a twelve-month period, which he linked in part to the company's activities abroad, including participation in events focused on mobile marketing in the United States. These activities were an important factor in the further development of AppBlock. === Internationalization and market expansion === Within roughly the first eight years of the company's existence, MobileSoft became active both in the domestic Czech market and in the United States, supported among other things by participation in the CzechAccelerator program, which is intended to help Czech firms enter foreign markets. In mid-August 2021 the developers launched a version for iOS, which soon began to attract paying users. The expansion to iOS was accompanied by plans for cooperation with the Procrastination.com platform, intended to complement the blocking functions with educational content related to digital media use, sleep and work habits. By 2025, AppBlock was localised into 15 languages, with the largest share of users in the United States, the United Kingdom, Germany, and France, with recent growth in Brazil, and usage extending across several continents. AppBlock has reached more than 10 million installations. In the same period its creators announced plans to refine existing functions and to expand support beyond mobile phones to desktop use, including through support for additional web browsers. == Features == === Supported platforms === AppBlock is distributed as a mobile application for Android and iOS users through Google Play and the Apple App Store. Browser extensions for desktop systems are available for Google Chrome, Microsoft Edge and Brave. === Functionality === AppBlock's core function is to restrict access to selected applications and websites. The mobile application shows a list of installed apps and lets the user select which ones to block. It also includes tools to block specific websites and, on iOS, to block certain phrases entered in the Safari browser. AppBlock can mute notifications from selected applications, so alerts from those apps do not appear while blocking is active. In addition to choosing which apps or content to block, the software also offers an allowlist mode, where only selected applications remain accessible and all others are blocked. Blocking rules are organized into configurable schedules, called profiles. Users can create profiles that define time periods when selected apps and websites are unavailable. Newer versions also allow profiles to be activated automatically based on the time of day, days of the week, the device's location, or connection to specific Wi-Fi networks. The iOS version lets users set limits on how often or how long certain apps can be used before they are blocked, and it can track and restrict screen time for individual apps. In addition to these recurring rules, AppBlock includes a Quick Block feature that temporarily blocks selected apps and websites with a single action, without requiring a separate long-term schedule. Strict Mode is an optional setting that limits the ability to change blocking once it is active. For a specified period, it prevents editing AppBlock's rules and can be configured to stop the app from being uninstalled during that time. While Strict Mode is enabled, users cannot modify or disable the restrictions they have set. Deactivation requires specific verification steps, such as connecting the device to a charger or obtaining approval from a designated contact person. The mobile application also includes statistical and reporting features. In addition to blocking, AppBlock lets users view statistics and data about their use of applications and websites, including screen-time summaries and focus sessions that silence notifications and enforce blocking during defined work or study periods. Browser extensions for desktop environments apply AppBlock's website-blocking functions on Windows and macOS systems through supported web browsers. == Business model == AppBlock uses a freemium revenue model. The basic version of the application is available free of charge and allows blocking of up to three applications at the same time. The premium version removes this limit and adds further configuration options. In 2020, the application shifted from a one-time payment structure to a subscription model. By 2021, AppBlock had more than seven thousand paying users and annual revenue of about four million Czech crowns. By 2025, annual revenue reached approximately 4 million US dollars (80 million CZK) before taxes and platform fees, with roughly 20 percent of active users subscribing to the paid version. == Usage == AppBlock limits access to selected applications and websites in order to reduce smartphone overuse and digital distraction. It is used to block social media, games and other services considered addictive, with the aim of reducing frequent checking of mobile devices and creating time intervals in which these services are unavailable. Reported use cases of AppBlock cover work, students, parents, ADHD, mental health, well-being and business. The application is used both by individual users and within workplace initiatives in which employees install it to reduce digital distractions during working hours.

    Read more →
  • Detrended correspondence analysis

    Detrended correspondence analysis

    Detrended correspondence analysis (DCA) is a multivariate statistical technique widely used by ecologists to find the main factors or gradients in large, species-rich but usually sparse data matrices that typify ecological community data. DCA is frequently used to suppress artifacts inherent in most other multivariate analyses when applied to gradient data. == History == DCA was created in 1979 by Mark Hill of the United Kingdom's Institute for Terrestrial Ecology (now merged into Centre for Ecology and Hydrology) and implemented in FORTRAN code package called DECORANA (Detrended Correspondence Analysis), a correspondence analysis method. DCA is sometimes erroneously referred to as DECORANA; however, DCA is the underlying algorithm, while DECORANA is a tool implementing it. == Issues addressed == According to Hill and Gauch, DCA suppresses two artifacts inherent in most other multivariate analyses when applied to gradient data. An example is a time-series of plant species colonising a new habitat; early successional species are replaced by mid-successional species, then by late successional ones (see example below). When such data are analysed by a standard ordination such as a correspondence analysis: the ordination scores of the samples will exhibit the 'edge effect', i.e. the variance of the scores at the beginning and the end of a regular succession of species will be considerably smaller than that in the middle, when presented as a graph the points will be seen to follow a horseshoe shaped curve rather than a straight line ('arch effect'), even though the process under analysis is a steady and continuous change that human intuition would prefer to see as a linear trend. Outside ecology, the same artifacts occur when gradient data are analysed (e.g. soil properties along a transect running between 2 different geologies, or behavioural data over the lifespan of an individual) because the curved projection is an accurate representation of the shape of the data in multivariate space. Ter Braak and Prentice (1987, p. 121) cite a simulation study analysing two-dimensional species packing models resulting in a better performance of DCA compared to CA. == Method == DCA is an iterative algorithm that has shown itself to be a highly reliable and useful tool for data exploration and summary in community ecology (Shaw 2003). It starts by running a standard ordination (CA or reciprocal averaging) on the data, to produce the initial horse-shoe curve in which the 1st ordination axis distorts into the 2nd axis. It then divides the first axis into segments (default = 26), and rescales each segment to have mean value of zero on the 2nd axis - this effectively squashes the curve flat. It also rescales the axis so that the ends are no longer compressed relative to the middle, so that 1 DCA unit approximates to the same rate of turnover all the way through the data: the rule of thumb is that 4 DCA units mean that there has been a total turnover in the community. Ter Braak and Prentice (1987, p. 122) warn against the non-linear rescaling of the axes due to robustness issues and recommend using detrending-by-polynomials only. == Drawbacks == No significance tests are available with DCA, although there is a constrained (canonical) version called DCCA in which the axes are forced by Multiple linear regression to correlate optimally with a linear combination of other (usually environmental) variables; this allows testing of a null model by Monte-Carlo permutation analysis. == Example == The example shows an ideal data set: The species data is in rows, samples in columns. For each sample along the gradient, a new species is introduced but another species is no longer present. The result is a sparse matrix. Ones indicate the presence of a species in a sample. Except at the edges each sample contains five species. The plot of the first two axes of the correspondence analysis result on the right hand side clearly shows the disadvantages of this procedure: the edge effect, i.e. the points are clustered at the edges of the first axis, and the arch effect. == Software == An open source implementation of DCA, based on the original FORTRAN code, is available in the vegan R-package.

    Read more →
  • Tanagra (machine learning)

    Tanagra (machine learning)

    Tanagra is a free suite of machine learning software for research and academic purposes developed by Ricco Rakotomalala at the Lumière University Lyon 2, France. Tanagra supports several standard data mining tasks such as: Visualization, Descriptive statistics, Instance selection, feature selection, feature construction, regression, factor analysis, clustering, classification and association rule learning. Tanagra is an academic project. It is widely used in French-speaking universities. Tanagra is frequently used in real studies and in software comparison papers. == History == The development of Tanagra was started in June 2003. The first version was distributed in December 2003. Tanagra is the successor of Sipina, another free data mining tool which is intended only for supervised learning tasks (classification), especially the interactive and visual construction of decision trees. Sipina is still available online and is maintained. Tanagra is an "open source project" as every researcher can access the source code and add their own algorithms, as long as they agree and conform to the software distribution license. The main purpose of the Tanagra project is to give researchers and students a user-friendly data mining software, conforming to the present norms of the software development in this domain (especially in the design of its GUI and the way to use it), and allowing the analyzation of either real or synthetic data. From 2006, Ricco Rakotomalala made an important documentation effort. A large number of tutorials are published on a dedicated website. They describe the statistical and machine learning methods and their implementation with Tanagra on real case studies. The use of other free data mining tools on the same problems is also widely described. The comparison of the tools enables readers to understand the possible differences in the presentation of results. == Description == Tanagra works similarly to current data mining tools. The user can design visually a data mining process in a diagram. Each node is a statistical or machine learning technique, the connection between two nodes represents the data transfer. But unlike the majority of tools which are based on the workflow paradigm, Tanagra is very simplified. The treatments are represented in a tree diagram. The results are displayed in an HTML format. This makes it is easy to export the outputs in order to visualize the results in a browser. It is also possible to copy the result tables to a spreadsheet. Tanagra makes a good compromise between statistical approaches (e.g. parametric and nonparametric statistical tests), multivariate analysis methods (e.g. factor analysis, correspondence analysis, cluster analysis, regression) and machine learning techniques (e.g. neural network, support vector machine, decision trees, random forest).

    Read more →