AI Email Plugin For Outlook

AI Email Plugin For Outlook — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Multimodal representation learning

    Multimodal representation learning

    Multimodal representation learning is a subfield of representation learning focused on integrating and interpreting information from different modalities, such as text, images, audio, or video, by projecting them into a shared latent space. This allows for semantically similar content across modalities to be mapped to nearby points within that space, facilitating a unified understanding of diverse data types. By automatically learning meaningful features from each modality and capturing their inter-modal relationships, multimodal representation learning enables a unified representation that enhances performance in cross-media analysis tasks such as video classification, event detection, and sentiment analysis. It also supports cross-modal retrieval and translation, including image captioning, video description, and text-to-image synthesis. == Motivation == The primary motivations for multimodal representation learning arise from the inherent nature of real-world data and the limitations of unimodal approaches. Since multimodal data offers complementary and supplementary information about an object or event from different perspectives, it is more informative than relying on a single modality. A key motivation is to narrow the heterogeneity gap that exists between different modalities by projecting their features into a shared semantic subspace. This allows semantically similar content across modalities to be represented by similar vectors, facilitating the understanding of relationships and correlations between them. Multimodal representation learning aims to leverage the unique information provided by each modality to achieve a more comprehensive and accurate understanding of concepts. These unified representations are crucial for improving performance in various cross-media analysis tasks such as video classification, event detection, and sentiment analysis. They also enable cross-modal retrieval, allowing users to search and retrieve content across different modalities. Additionally, it facilitates cross-modal translation, where information can be converted from one modality to another, as seen in applications like image captioning and text-to-image synthesis. The abundance of ubiquitous multimodal data in real-world applications, including understudied areas like healthcare, finance, and human-computer interaction (HCI), further motivates the development of effective multimodal representation learning techniques. == Approaches and methods == === Canonical-correlation analysis based methods === Canonical-correlation analysis (CCA) was first introduced in 1936 by Harold Hotelling and is a fundamental approach for multimodal learning. CCA aims to find linear relationships between two sets of variables. Given two data matrices X ∈ R n × p {\displaystyle X\in \mathbb {R} ^{n\times p}} and Y ∈ R n × q {\displaystyle Y\in \mathbb {R} ^{n\times q}} representing different modalities, CCA finds projection vectors w x ∈ R p {\displaystyle w_{x}\in \mathbb {R} ^{p}} and w y ∈ R q {\displaystyle w_{y}\in \mathbb {R} ^{q}} that maximizes the correlation between the projected variables: ρ = max w x , w y w x ⊤ Σ x y w y w x ⊤ Σ x x w x w y ⊤ Σ y y w y {\displaystyle \rho =\max _{w_{x},w_{y}}{\frac {w_{x}^{\top }\Sigma _{xy}w_{y}}{{\sqrt {w_{x}^{\top }\Sigma _{xx}w_{x}}}{\sqrt {w_{y}^{\top }\Sigma _{yy}w_{y}}}}}} such that Σ x x {\displaystyle \Sigma _{xx}} and Σ y y {\displaystyle \Sigma _{yy}} are the within-modality covariance matrices, and Σ x y {\displaystyle \Sigma _{xy}} is the between-modality covariance matrix. However, standard CCA is limited by its linearity, which led to the development of nonlinear extensions, such as kernel CCA and deep CCA. ==== Kernel CCA ==== Kernel canonical correlation analysis (KCCA) extends traditional CCA to capture nonlinear relationships between modalities by implicitly mapping the data into high dimensional feature spaces using kernel functions. Given kernel functions K x {\displaystyle K_{x}} and K y {\displaystyle K_{y}} with corresponding Gram matrices K x ∈ R n × n {\displaystyle K_{x}\in \mathbb {R} ^{n\times n}} and K y ∈ R n × n {\displaystyle K_{y}\in \mathbb {R} ^{n\times n}} , KCCA seeks coefficients α {\displaystyle \alpha } and β {\displaystyle \beta } that maximize: ρ = max α , β α ⊤ K x K y β α ⊤ K x 2 α β ⊤ K y 2 β {\displaystyle \rho =\max _{\alpha ,\beta }{\frac {\alpha ^{\top }K_{x}Ky\beta }{{\sqrt {\alpha ^{\top }K_{x}^{2}\alpha }}{\sqrt {\beta ^{\top }K_{y}^{2}\beta }}}}} To prevent overfitting, regularization terms are typically added, resulting in: ρ = max α , β α T K x K y β α T ( K x 2 + λ x K x ) α β T ( K y 2 + λ y K y ) β {\displaystyle \rho =\max _{\alpha ,\beta }{\frac {\alpha ^{T}K_{x}K_{y}\beta }{{\sqrt {\alpha ^{T}\left(K_{x}^{2}+\lambda _{x}K_{x}\right)\alpha }}{\sqrt {\;\beta ^{T}\left(K_{y}^{2}+\lambda _{y}K_{y}\right)\beta }}}}} where λ x {\displaystyle \lambda _{x}} and λ y {\displaystyle \lambda _{y}} are regularization parameters. KCCA has proven effective for tasks such as cross-modal retrieval and semantic analysis, though it faces computational challenges with large datasets due to its O ( n 2 ) {\displaystyle O(n^{2})} memory requirement for sorting kernel matrices. KCCA was proposed independently by several researchers. ==== Deep CCA ==== Deep canonical correlation analysis (DCCA), introduced in 2013, employs neural networks to learn nonlinear transformations for maximizing the correlation between modalities. DCCA uses separate neural networks f x {\displaystyle f_{x}} and f y {\displaystyle f_{y}} for each modality to transform the original data before applying CCA: max W x , W y , θ x , θ y corr ⁡ ( f x ( X ; θ x ) , f y ( Y ; θ y ) ) {\displaystyle \max _{W_{x},W_{y},\theta _{x},\theta _{y}}\operatorname {corr} \left(f_{x}(X;\theta _{x}),f_{y}(Y;\theta _{y})\right)} where θ x {\displaystyle \theta _{x}} and θ y {\displaystyle \theta _{y}} represent the parameters of the neural networks, and W x {\displaystyle W_{x}} and W y {\displaystyle W_{y}} are the CCA projection matrices. The correlation objective is computed as: corr ⁡ ( H x , H y ) = tr ⁡ ( T − 1 / 2 H x T H y S − 1 / 2 ) {\displaystyle \operatorname {corr} (H_{x},H_{y})=\operatorname {tr} \left(T^{-1/2}H_{x}^{T}H_{y}S^{-1/2}\right)} where H x = f x ( X ) {\displaystyle H_{x}=f_{x}(X)} and H y = f y ( Y ) {\displaystyle H_{y}=f_{y}(Y)} are the network outputs, T = H x T H x + r x I {\displaystyle T=H_{x}^{T}H_{x}+r_{x}I} , S = H y T H y + r y I {\displaystyle S=H_{y}^{T}H_{y}+r_{y}I} and r x , r y {\displaystyle r_{x},r_{y}} are the regularization parameters. DCCA overcomes the limitations of linear CCA and kernel CCA by learning complex nonlinear relationships while maintaining computational efficiency for large datasets through mini-batch optimization. === Graph-based methods === Graph-based approaches for multimodal representation learning leverage graph structure to model relationships between entities across different modalities. These methods typically represent each modality as a graph and then learn embedding that preserve cross-modal similarities, enabling more effective joint representation of heterogeneous data. One such method is cross-modal graph neural networks (CMGNNs) that extend traditional graph neural networks (GNNs) to handle data from multiple modalities by constructing graphs that capture both intra-modal and inter-modal relationships. These networks model interactions across modalities by representing them as nodes and their relationships as edges. Other graph-based methods include Probabilistic Graphical Models (PGMs) such as deep belief networks (DBN) and deep Boltzmann machines (DBM). These models can learn a joint representation across modalities, for instance, a multimodal DBN achieves this by adding a shared restricted Boltzmann Machine (RBM) hidden layer on top of modality-specific DBNs. Additionally, the structure of data in some domains like Human-Computer Interaction (HCI), such as the view hierarchy of app screens, can potentially be modeled using graph-like structures. The field of graph representation learning is also relevant, with ongoing progress in developing evaluation benchmarks. === Diffusion maps === Another set of methods relevant to multimodal representation learning are based on diffusion maps and their extensions to handle multiple modalities. ==== Multi-view diffusion maps ==== Multi-view diffusion maps address the challenge of achieving multi-view dimensionality reduction by effectively utilizing the availability of multiple views to extract a coherent low-dimensional representation of the data. The core idea is to exploit both the intrinsic relations within each view and the mutual relations between the different views, defining a cross-view model where a random walk process implicitly hops between objects in different views. A multi-view kernel matrix is constructed by combining these relations, defining a cross-view diffusion process and associ

    Read more →
  • Application-release automation

    Application-release automation

    Application-release automation (ARA) refers to the process of packaging and deploying an application or update of an application from development, across various environments, and ultimately to production. ARA solutions must combine the capabilities of deployment automation, environment management and modeling, and release coordination. == Relationship with DevOps == ARA tools help cultivate DevOps best practices by providing a combination of automation, environment modeling and workflow-management capabilities. These practices help teams deliver software rapidly, reliably and responsibly. ARA tools achieve a key DevOps goal of implementing continuous delivery with a large quantity of releases quickly. == Relationship with deployment == ARA is more than just software-deployment automation – it deploys applications using structured release-automation techniques that allow for an increase in visibility for the whole team. It combines workload automation and release-management tools as they relate to release packages, as well as movement through different environments within the DevOps pipeline. ARA tools help regulate deployments, how environments are created and deployed, and how and when releases are deployed. == ARA Solutions == All ARA solutions must include capabilities in automation, environment modeling, and release coordination. Additionally, the solution must provide this functionality without reliance on other tools.

    Read more →
  • Neural network Gaussian process

    Neural network Gaussian process

    A Neural Network Gaussian Process (NNGP) is a Gaussian process (GP) obtained as the limit of a certain type of sequence of neural networks. Specifically, a wide variety of network architectures converges to a GP in the infinitely wide limit, in the sense of distribution. The concept constitutes an intensional definition, i.e., a NNGP is just a GP, but distinguished by how it is obtained. == Motivation == Bayesian networks are a modeling tool for assigning probabilities to events, and thereby characterizing the uncertainty in a model's predictions. Deep learning and artificial neural networks are approaches used in machine learning to build computational models which learn from training examples. Bayesian neural networks merge these fields. They are a type of neural network whose parameters and predictions are both probabilistic. While standard neural networks often assign high confidence even to incorrect predictions, Bayesian neural networks can more accurately evaluate how likely their predictions are to be correct. Computation in artificial neural networks is usually organized into sequential layers of artificial neurons. The number of neurons in a layer is called the layer width. When we consider a sequence of Bayesian neural networks with increasingly wide layers (see figure), they converge in distribution to a NNGP. This large width limit is of practical interest, since the networks often improve as layers get wider. And the process may give a closed form way to evaluate networks. NNGPs also appears in several other contexts: It describes the distribution over predictions made by wide non-Bayesian artificial neural networks after random initialization of their parameters, but before training; it appears as a term in neural tangent kernel prediction equations; it is used in deep information propagation to characterize whether hyperparameters and architectures will be trainable. It is related to other large width limits of neural networks. === Scope === The first correspondence result had been established in the 1995 PhD thesis of Radford M. Neal, then supervised by Geoffrey Hinton at University of Toronto. Neal cites David J. C. MacKay as inspiration, who worked in Bayesian learning. Today the correspondence is proven for: Single hidden layer Bayesian neural networks; deep fully connected networks as the number of units per layer is taken to infinity; convolutional neural networks as the number of channels is taken to infinity; transformer networks as the number of attention heads is taken to infinity; recurrent networks as the number of units is taken to infinity. In fact, this NNGP correspondence holds for almost any architecture: Generally, if an architecture can be expressed solely via matrix multiplication and coordinatewise nonlinearities (i.e., a tensor program), then it has an infinite-width GP. This in particular includes all feedforward or recurrent neural networks composed of multilayer perceptron, recurrent neural networks (e.g., LSTMs, GRUs), (nD or graph) convolution, pooling, skip connection, attention, batch normalization, and/or layer normalization. === Illustration === Every setting of a neural network's parameters θ {\displaystyle \theta } corresponds to a specific function computed by the neural network. A prior distribution p ( θ ) {\displaystyle p(\theta )} over neural network parameters therefore corresponds to a prior distribution over functions computed by the network. As neural networks are made infinitely wide, this distribution over functions converges to a Gaussian process for many architectures. The notation used in this section is the same as the notation used below to derive the correspondence between NNGPs and fully connected networks, and more details can be found there. The figure to the right plots the one-dimensional outputs z L ( ⋅ ; θ ) {\displaystyle z^{L}(\cdot ;\theta )} of a neural network for two inputs x {\displaystyle x} and x ∗ {\displaystyle x^{}} against each other. The black dots show the function computed by the neural network on these inputs for random draws of the parameters from p ( θ ) {\displaystyle p(\theta )} . The red lines are iso-probability contours for the joint distribution over network outputs z L ( x ; θ ) {\displaystyle z^{L}(x;\theta )} and z L ( x ∗ ; θ ) {\displaystyle z^{L}(x^{};\theta )} induced by p ( θ ) {\displaystyle p(\theta )} . This is the distribution in function space corresponding to the distribution p ( θ ) {\displaystyle p(\theta )} in parameter space, and the black dots are samples from this distribution. For infinitely wide neural networks, since the distribution over functions computed by the neural network is a Gaussian process, the joint distribution over network outputs is a multivariate Gaussian for any finite set of network inputs. == Discussion == === Infinitely wide fully connected network === This section expands on the correspondence between infinitely wide neural networks and Gaussian processes for the specific case of a fully connected architecture. It provides a proof sketch outlining why the correspondence holds, and introduces the specific functional form of the NNGP for fully connected networks. The proof sketch closely follows the approach by Novak and coauthors. ==== Network architecture specification ==== Consider a fully connected artificial neural network with inputs x {\displaystyle x} , parameters θ {\displaystyle \theta } consisting of weights W l {\displaystyle W^{l}} and biases b l {\displaystyle b^{l}} for each layer l {\displaystyle l} in the network, pre-activations (pre-nonlinearity) z l {\displaystyle z^{l}} , activations (post-nonlinearity) y l {\displaystyle y^{l}} , pointwise nonlinearity ϕ ( ⋅ ) {\displaystyle \phi (\cdot )} , and layer widths n l {\displaystyle n^{l}} . For simplicity, the width n L + 1 {\displaystyle n^{L+1}} of the readout vector z L {\displaystyle z^{L}} is taken to be 1. The parameters of this network have a prior distribution p ( θ ) {\displaystyle p(\theta )} , which consists of an isotropic Gaussian for each weight and bias, with the variance of the weights scaled inversely with layer width. This network is illustrated in the figure to the right, and described by the following set of equations: x ≡ input y l ( x ) = { x l = 0 ϕ ( z l − 1 ( x ) ) l > 0 z i l ( x ) = ∑ j W i j l y j l ( x ) + b i l W i j l ∼ N ( 0 , σ w 2 n l ) b i l ∼ N ( 0 , σ b 2 ) ϕ ( ⋅ ) ≡ nonlinearity y l ( x ) , z l − 1 ( x ) ∈ R n l × 1 n L + 1 = 1 θ = { W 0 , b 0 , … , W L , b L } {\displaystyle {\begin{aligned}x&\equiv {\text{input}}\\y^{l}(x)&=\left\{{\begin{array}{lcl}x&&l=0\\\phi \left(z^{l-1}(x)\right)&&l>0\end{array}}\right.\\z_{i}^{l}(x)&=\sum _{j}W_{ij}^{l}y_{j}^{l}(x)+b_{i}^{l}\\W_{ij}^{l}&\sim {\mathcal {N}}\left(0,{\frac {\sigma _{w}^{2}}{n^{l}}}\right)\\b_{i}^{l}&\sim {\mathcal {N}}\left(0,\sigma _{b}^{2}\right)\\\phi (\cdot )&\equiv {\text{nonlinearity}}\\y^{l}(x),z^{l-1}(x)&\in \mathbb {R} ^{n^{l}\times 1}\\n^{L+1}&=1\\\theta &=\left\{W^{0},b^{0},\dots ,W^{L},b^{L}\right\}\end{aligned}}} ==== ==== z l | y l {\displaystyle z^{l}|y^{l}} is a Gaussian process We first observe that the pre-activations z l {\displaystyle z^{l}} are described by a Gaussian process conditioned on the preceding activations y l {\displaystyle y^{l}} . This result holds even at finite width. Each pre-activation z i l {\displaystyle z_{i}^{l}} is a weighted sum of Gaussian random variables, corresponding to the weights W i j l {\displaystyle W_{ij}^{l}} and biases b i l {\displaystyle b_{i}^{l}} , where the coefficients for each of those Gaussian variables are the preceding activations y j l {\displaystyle y_{j}^{l}} . Because they are a weighted sum of zero-mean Gaussians, the z i l {\displaystyle z_{i}^{l}} are themselves zero-mean Gaussians (conditioned on the coefficients y j l {\displaystyle y_{j}^{l}} ). Since the z l {\displaystyle z^{l}} are jointly Gaussian for any set of y l {\displaystyle y^{l}} , they are described by a Gaussian process conditioned on the preceding activations y l {\displaystyle y^{l}} . The covariance or kernel of this Gaussian process depends on the weight and bias variances σ w 2 {\displaystyle \sigma _{w}^{2}} and σ b 2 {\displaystyle \sigma _{b}^{2}} , as well as the second moment matrix K l {\displaystyle K^{l}} of the preceding activations y l {\displaystyle y^{l}} , z i l ∣ y l ∼ G P ( 0 , σ w 2 K l + σ b 2 ) K l ( x , x ′ ) = 1 n l ∑ i y i l ( x ) y i l ( x ′ ) {\displaystyle {\begin{aligned}z_{i}^{l}\mid y^{l}&\sim {\mathcal {GP}}\left(0,\sigma _{w}^{2}K^{l}+\sigma _{b}^{2}\right)\\K^{l}(x,x')&={\frac {1}{n^{l}}}\sum _{i}y_{i}^{l}(x)y_{i}^{l}(x')\end{aligned}}} The effect of the weight scale σ w 2 {\displaystyle \sigma _{w}^{2}} is to rescale the contribution to the covariance matrix from K l {\displaystyle K^{l}} , while the bias is shared for all inputs, and so σ b 2 {\displaystyle \sigma _{b}^{2}} makes the z i l {\displaystyle z_{i}^{l}} for different datapoints more similar and

    Read more →
  • Neural network Gaussian process

    Neural network Gaussian process

    A Neural Network Gaussian Process (NNGP) is a Gaussian process (GP) obtained as the limit of a certain type of sequence of neural networks. Specifically, a wide variety of network architectures converges to a GP in the infinitely wide limit, in the sense of distribution. The concept constitutes an intensional definition, i.e., a NNGP is just a GP, but distinguished by how it is obtained. == Motivation == Bayesian networks are a modeling tool for assigning probabilities to events, and thereby characterizing the uncertainty in a model's predictions. Deep learning and artificial neural networks are approaches used in machine learning to build computational models which learn from training examples. Bayesian neural networks merge these fields. They are a type of neural network whose parameters and predictions are both probabilistic. While standard neural networks often assign high confidence even to incorrect predictions, Bayesian neural networks can more accurately evaluate how likely their predictions are to be correct. Computation in artificial neural networks is usually organized into sequential layers of artificial neurons. The number of neurons in a layer is called the layer width. When we consider a sequence of Bayesian neural networks with increasingly wide layers (see figure), they converge in distribution to a NNGP. This large width limit is of practical interest, since the networks often improve as layers get wider. And the process may give a closed form way to evaluate networks. NNGPs also appears in several other contexts: It describes the distribution over predictions made by wide non-Bayesian artificial neural networks after random initialization of their parameters, but before training; it appears as a term in neural tangent kernel prediction equations; it is used in deep information propagation to characterize whether hyperparameters and architectures will be trainable. It is related to other large width limits of neural networks. === Scope === The first correspondence result had been established in the 1995 PhD thesis of Radford M. Neal, then supervised by Geoffrey Hinton at University of Toronto. Neal cites David J. C. MacKay as inspiration, who worked in Bayesian learning. Today the correspondence is proven for: Single hidden layer Bayesian neural networks; deep fully connected networks as the number of units per layer is taken to infinity; convolutional neural networks as the number of channels is taken to infinity; transformer networks as the number of attention heads is taken to infinity; recurrent networks as the number of units is taken to infinity. In fact, this NNGP correspondence holds for almost any architecture: Generally, if an architecture can be expressed solely via matrix multiplication and coordinatewise nonlinearities (i.e., a tensor program), then it has an infinite-width GP. This in particular includes all feedforward or recurrent neural networks composed of multilayer perceptron, recurrent neural networks (e.g., LSTMs, GRUs), (nD or graph) convolution, pooling, skip connection, attention, batch normalization, and/or layer normalization. === Illustration === Every setting of a neural network's parameters θ {\displaystyle \theta } corresponds to a specific function computed by the neural network. A prior distribution p ( θ ) {\displaystyle p(\theta )} over neural network parameters therefore corresponds to a prior distribution over functions computed by the network. As neural networks are made infinitely wide, this distribution over functions converges to a Gaussian process for many architectures. The notation used in this section is the same as the notation used below to derive the correspondence between NNGPs and fully connected networks, and more details can be found there. The figure to the right plots the one-dimensional outputs z L ( ⋅ ; θ ) {\displaystyle z^{L}(\cdot ;\theta )} of a neural network for two inputs x {\displaystyle x} and x ∗ {\displaystyle x^{}} against each other. The black dots show the function computed by the neural network on these inputs for random draws of the parameters from p ( θ ) {\displaystyle p(\theta )} . The red lines are iso-probability contours for the joint distribution over network outputs z L ( x ; θ ) {\displaystyle z^{L}(x;\theta )} and z L ( x ∗ ; θ ) {\displaystyle z^{L}(x^{};\theta )} induced by p ( θ ) {\displaystyle p(\theta )} . This is the distribution in function space corresponding to the distribution p ( θ ) {\displaystyle p(\theta )} in parameter space, and the black dots are samples from this distribution. For infinitely wide neural networks, since the distribution over functions computed by the neural network is a Gaussian process, the joint distribution over network outputs is a multivariate Gaussian for any finite set of network inputs. == Discussion == === Infinitely wide fully connected network === This section expands on the correspondence between infinitely wide neural networks and Gaussian processes for the specific case of a fully connected architecture. It provides a proof sketch outlining why the correspondence holds, and introduces the specific functional form of the NNGP for fully connected networks. The proof sketch closely follows the approach by Novak and coauthors. ==== Network architecture specification ==== Consider a fully connected artificial neural network with inputs x {\displaystyle x} , parameters θ {\displaystyle \theta } consisting of weights W l {\displaystyle W^{l}} and biases b l {\displaystyle b^{l}} for each layer l {\displaystyle l} in the network, pre-activations (pre-nonlinearity) z l {\displaystyle z^{l}} , activations (post-nonlinearity) y l {\displaystyle y^{l}} , pointwise nonlinearity ϕ ( ⋅ ) {\displaystyle \phi (\cdot )} , and layer widths n l {\displaystyle n^{l}} . For simplicity, the width n L + 1 {\displaystyle n^{L+1}} of the readout vector z L {\displaystyle z^{L}} is taken to be 1. The parameters of this network have a prior distribution p ( θ ) {\displaystyle p(\theta )} , which consists of an isotropic Gaussian for each weight and bias, with the variance of the weights scaled inversely with layer width. This network is illustrated in the figure to the right, and described by the following set of equations: x ≡ input y l ( x ) = { x l = 0 ϕ ( z l − 1 ( x ) ) l > 0 z i l ( x ) = ∑ j W i j l y j l ( x ) + b i l W i j l ∼ N ( 0 , σ w 2 n l ) b i l ∼ N ( 0 , σ b 2 ) ϕ ( ⋅ ) ≡ nonlinearity y l ( x ) , z l − 1 ( x ) ∈ R n l × 1 n L + 1 = 1 θ = { W 0 , b 0 , … , W L , b L } {\displaystyle {\begin{aligned}x&\equiv {\text{input}}\\y^{l}(x)&=\left\{{\begin{array}{lcl}x&&l=0\\\phi \left(z^{l-1}(x)\right)&&l>0\end{array}}\right.\\z_{i}^{l}(x)&=\sum _{j}W_{ij}^{l}y_{j}^{l}(x)+b_{i}^{l}\\W_{ij}^{l}&\sim {\mathcal {N}}\left(0,{\frac {\sigma _{w}^{2}}{n^{l}}}\right)\\b_{i}^{l}&\sim {\mathcal {N}}\left(0,\sigma _{b}^{2}\right)\\\phi (\cdot )&\equiv {\text{nonlinearity}}\\y^{l}(x),z^{l-1}(x)&\in \mathbb {R} ^{n^{l}\times 1}\\n^{L+1}&=1\\\theta &=\left\{W^{0},b^{0},\dots ,W^{L},b^{L}\right\}\end{aligned}}} ==== ==== z l | y l {\displaystyle z^{l}|y^{l}} is a Gaussian process We first observe that the pre-activations z l {\displaystyle z^{l}} are described by a Gaussian process conditioned on the preceding activations y l {\displaystyle y^{l}} . This result holds even at finite width. Each pre-activation z i l {\displaystyle z_{i}^{l}} is a weighted sum of Gaussian random variables, corresponding to the weights W i j l {\displaystyle W_{ij}^{l}} and biases b i l {\displaystyle b_{i}^{l}} , where the coefficients for each of those Gaussian variables are the preceding activations y j l {\displaystyle y_{j}^{l}} . Because they are a weighted sum of zero-mean Gaussians, the z i l {\displaystyle z_{i}^{l}} are themselves zero-mean Gaussians (conditioned on the coefficients y j l {\displaystyle y_{j}^{l}} ). Since the z l {\displaystyle z^{l}} are jointly Gaussian for any set of y l {\displaystyle y^{l}} , they are described by a Gaussian process conditioned on the preceding activations y l {\displaystyle y^{l}} . The covariance or kernel of this Gaussian process depends on the weight and bias variances σ w 2 {\displaystyle \sigma _{w}^{2}} and σ b 2 {\displaystyle \sigma _{b}^{2}} , as well as the second moment matrix K l {\displaystyle K^{l}} of the preceding activations y l {\displaystyle y^{l}} , z i l ∣ y l ∼ G P ( 0 , σ w 2 K l + σ b 2 ) K l ( x , x ′ ) = 1 n l ∑ i y i l ( x ) y i l ( x ′ ) {\displaystyle {\begin{aligned}z_{i}^{l}\mid y^{l}&\sim {\mathcal {GP}}\left(0,\sigma _{w}^{2}K^{l}+\sigma _{b}^{2}\right)\\K^{l}(x,x')&={\frac {1}{n^{l}}}\sum _{i}y_{i}^{l}(x)y_{i}^{l}(x')\end{aligned}}} The effect of the weight scale σ w 2 {\displaystyle \sigma _{w}^{2}} is to rescale the contribution to the covariance matrix from K l {\displaystyle K^{l}} , while the bias is shared for all inputs, and so σ b 2 {\displaystyle \sigma _{b}^{2}} makes the z i l {\displaystyle z_{i}^{l}} for different datapoints more similar and

    Read more →
  • Shape context

    Shape context

    Shape context is a feature descriptor used in object recognition. Serge Belongie and Jitendra Malik proposed the term in their paper "Matching with Shape Contexts" in 2000. == Theory == The shape context is intended to be a way of describing shapes that allows for measuring shape similarity and the recovering of point correspondences. The basic idea is to pick n points on the contours of a shape. For each point pi on the shape, consider the n − 1 vectors obtained by connecting pi to all other points. The set of all these vectors is a rich description of the shape localized at that point but is far too detailed. The key idea is that the distribution over relative positions is a robust, compact, and highly discriminative descriptor. So, for the point pi, the coarse histogram of the relative coordinates of the remaining n − 1 points, h i ( k ) = # { q ≠ p i : ( q − p i ) ∈ bin ( k ) } {\displaystyle h_{i}(k)=\#\{q\neq p_{i}:(q-p_{i})\in {\mbox{bin}}(k)\}} is defined to be the shape context of p i {\displaystyle p_{i}} . The bins are normally taken to be uniform in log-polar space. The fact that the shape context is a rich and discriminative descriptor can be seen in the figure below, in which the shape contexts of two different versions of the letter "A" are shown. (a) and (b) are the sampled edge points of the two shapes. (c) is the diagram of the log-polar bins used to compute the shape context. (d) is the shape context for the point marked with a circle in (a), (e) is that for the point marked as a diamond in (b), and (f) is that for the triangle. As can be seen, since (d) and (e) are the shape contexts for two closely related points, they are quite similar, while the shape context in (f) is very different. For a feature descriptor to be useful, it needs to have certain invariances. In particular it needs to be invariant to translation, scaling, small perturbations, and, depending on the application, rotation. Translational invariance comes naturally to shape context. Scale invariance is obtained by normalizing all radial distances by the mean distance α {\displaystyle \alpha } between all the point pairs in the shape although the median distance can also be used. Shape contexts are empirically demonstrated to be robust to deformations, noise, and outliers using synthetic point set matching experiments. One can provide complete rotational invariance in shape contexts. One way is to measure angles at each point relative to the direction of the tangent at that point (since the points are chosen on edges). This results in a completely rotationally invariant descriptor. But of course this is not always desired since some local features lose their discriminative power if not measured relative to the same frame. Many applications in fact forbid rotational invariance e.g. distinguishing a "6" from a "9". == Use in shape matching == A complete system that uses shape contexts for shape matching consists of the following steps (which will be covered in more detail in the Details of Implementation section): Randomly select a set of points that lie on the edges of a known shape and another set of points on an unknown shape. Compute the shape context of each point found in step 1. Match each point from the known shape to a point on an unknown shape. To minimize the cost of matching, first choose a transformation (e.g. affine, thin plate spline, etc.) that warps the edges of the known shape to the unknown (essentially aligning the two shapes). Then select the point on the unknown shape that most closely corresponds to each warped point on the known shape. Calculate the "shape distance" between each pair of points on the two shapes. Use a weighted sum of the shape context distance, the image appearance distance, and the bending energy (a measure of how much transformation is required to bring the two shapes into alignment). To identify the unknown shape, use a nearest-neighbor classifier to compare its shape distance to shape distances of known objects. == Details of implementation == === Step 1: Finding a list of points on shape edges === The approach assumes that the shape of an object is essentially captured by a finite subset of the points on the internal or external contours on the object. These can be simply obtained using the Canny edge detector and picking a random set of points from the edges. Note that these points need not and in general do not correspond to key-points such as maxima of curvature or inflection points. It is preferable to sample the shape with roughly uniform spacing, though it is not critical. === Step 2: Computing the shape context === This step is described in detail in the Theory section. === Step 3: Computing the cost matrix === Consider two points p and q that have normalized K-bin histograms (i.e. shape contexts) g(k) and h(k). As shape contexts are distributions represented as histograms, it is natural to use the χ2 test statistic as the "shape context cost" of matching the two points: C S = 1 2 ∑ k = 1 K [ g ( k ) − h ( k ) ] 2 g ( k ) + h ( k ) {\displaystyle C_{S}={\frac {1}{2}}\sum _{k=1}^{K}{\frac {[g(k)-h(k)]^{2}}{g(k)+h(k)}}} The values of this range from 0 to 1. In addition to the shape context cost, an extra cost based on the appearance can be added. For instance, it could be a measure of tangent angle dissimilarity (particularly useful in digit recognition): C A = 1 2 ‖ ( cos ⁡ ( θ 1 ) sin ⁡ ( θ 1 ) ) − ( cos ⁡ ( θ 2 ) sin ⁡ ( θ 2 ) ) ‖ {\displaystyle C_{A}={\frac {1}{2}}{\begin{Vmatrix}{\dbinom {\cos(\theta _{1})}{\sin(\theta _{1})}}-{\dbinom {\cos(\theta _{2})}{\sin(\theta _{2})}}\end{Vmatrix}}} This is half the length of the chord in unit circle between the unit vectors with angles θ 1 {\displaystyle \theta _{1}} and θ 2 {\displaystyle \theta _{2}} . Its values also range from 0 to 1. Now the total cost of matching the two points could be a weighted-sum of the two costs: C = ( 1 − β ) C S + β C A {\displaystyle C=(1-\beta )C_{S}+\beta C_{A}\!\,} Now for each point pi on the first shape and a point qj on the second shape, calculate the cost as described and call it Ci,j. This is the cost matrix. === Step 4: Finding the matching that minimizes total cost === Now, a one-to-one matching π ( i ) {\displaystyle \pi (i)} that matches each point pi on shape 1 and qj on shape 2 that minimizes the total cost of matching, H ( π ) = ∑ i C ( p i , q π ( i ) ) {\displaystyle H(\pi )=\sum _{i}C\left(p_{i},q_{\pi (i)}\right)} is needed. This can be done in O ( N 3 ) {\displaystyle O(N^{3})} time using the Hungarian method, although there are more efficient algorithms. To have robust handling of outliers, one can add "dummy" nodes that have a constant but reasonably large cost of matching to the cost matrix. This would cause the matching algorithm to match outliers to a "dummy" if there is no real match. === Step 5: Modeling transformation === Given the set of correspondences between a finite set of points on the two shapes, a transformation T : R 2 → R 2 {\displaystyle T:\mathbb {R} ^{2}\to \mathbb {R} ^{2}} can be estimated to map any point from one shape to the other. There are several choices for this transformation, described below. ==== Affine ==== The affine model is a standard choice: T ( p ) = A p + o {\displaystyle T(p)=Ap+o\!} . The least squares solution for the matrix A {\displaystyle A} and the translational offset vector o is obtained by: o = 1 n ∑ i = 1 n ( p i − q π ( i ) ) , A = ( Q + P ) t {\displaystyle o={\frac {1}{n}}\sum _{i=1}^{n}\left(p_{i}-q_{\pi (i)}\right),A=(Q^{+}P)^{t}} Where P = ( 1 p 11 p 12 ⋮ ⋮ ⋮ 1 p n 1 p n 2 ) {\displaystyle P={\begin{pmatrix}1&p_{11}&p_{12}\\\vdots &\vdots &\vdots \\1&p_{n1}&p_{n2}\end{pmatrix}}} with a similar expression for Q {\displaystyle Q\!} . Q + {\displaystyle Q^{+}\!} is the pseudoinverse of Q {\displaystyle Q\!} . ==== Thin plate spline ==== The thin plate spline (TPS) model is the most widely used model for transformations when working with shape contexts. A 2D transformation can be separated into two TPS function to model a coordinate transform: T ( x , y ) = ( f x ( x , y ) , f y ( x , y ) ) {\displaystyle T(x,y)=\left(f_{x}(x,y),f_{y}(x,y)\right)} where each of the ƒx and ƒy have the form: f ( x , y ) = a 1 + a x x + a y y + ∑ i = 1 n ω i U ( ‖ ( x i , y i ) − ( x , y ) ‖ ) , {\displaystyle f(x,y)=a_{1}+a_{x}x+a_{y}y+\sum _{i=1}^{n}\omega _{i}U\left({\begin{Vmatrix}(x_{i},y_{i})-(x,y)\end{Vmatrix}}\right),} and the kernel function U ( r ) {\displaystyle U(r)\!} is defined by U ( r ) = r 2 log ⁡ r 2 {\displaystyle U(r)=r^{2}\log r^{2}\!} . The exact details of how to solve for the parameters can be found elsewhere but it essentially involves solving a linear system of equations. The bending energy (a measure of how much transformation is needed to align the points) will also be easily obtained. ==== Regularized TPS ==== The TPS formulation above has exact matching requirement for the pairs of points on the two shapes. For noisy data, it is best to

    Read more →
  • Scale space implementation

    Scale space implementation

    In the areas of computer vision, image analysis and signal processing, the notion of scale-space representation is used for processing measurement data at multiple scales, and specifically enhance or suppress image features over different ranges of scale (see the article on scale space). A special type of scale-space representation is provided by the Gaussian scale space, where the image data in N dimensions is subjected to smoothing by Gaussian convolution. Most of the theory for Gaussian scale space deals with continuous images, whereas one when implementing this theory will have to face the fact that most measurement data are discrete. Hence, the theoretical problem arises concerning how to discretize the continuous theory while either preserving or well approximating the desirable theoretical properties that lead to the choice of the Gaussian kernel (see the article on scale-space axioms). This article describes basic approaches for this that have been developed in the literature, see also for an in-depth treatment regarding the topic of approximating the Gaussian smoothing operation and the Gaussian derivative computations in scale-space theory, and for a complementary treatment regarding hybrid discretization methods. == Statement of the problem == The Gaussian scale-space representation of an N-dimensional continuous signal, f C ( x 1 , ⋯ , x N , t ) , {\displaystyle f_{C}\left(x_{1},\cdots ,x_{N},t\right),} is obtained by convolving fC with an N-dimensional Gaussian kernel: g N ( x 1 , ⋯ , x N , t ) . {\displaystyle g_{N}\left(x_{1},\cdots ,x_{N},t\right).} In other words: L ( x 1 , ⋯ , x N , t ) = ∫ u 1 = − ∞ ∞ ⋯ ∫ u N = − ∞ ∞ f C ( x 1 − u 1 , ⋯ , x N − u N , t ) ⋅ g N ( u 1 , ⋯ , u N , t ) d u 1 ⋯ d u N . {\displaystyle L\left(x_{1},\cdots ,x_{N},t\right)=\int _{u_{1}=-\infty }^{\infty }\cdots \int _{u_{N}=-\infty }^{\infty }f_{C}\left(x_{1}-u_{1},\cdots ,x_{N}-u_{N},t\right)\cdot g_{N}\left(u_{1},\cdots ,u_{N},t\right)\,du_{1}\cdots du_{N}.} However, for implementation, this definition is impractical, since it is continuous. When applying the scale space concept to a discrete signal fD, different approaches can be taken. This article is a brief summary of some of the most frequently used methods. == Separability == Using the separability property of the Gaussian kernel g N ( x 1 , … , x N , t ) = G ( x 1 , t ) ⋯ G ( x N , t ) {\displaystyle g_{N}\left(x_{1},\dots ,x_{N},t\right)=G\left(x_{1},t\right)\cdots G\left(x_{N},t\right)} the N-dimensional convolution operation can be decomposed into a set of separable smoothing steps with a one-dimensional Gaussian kernel G along each dimension L ( x 1 , ⋯ , x N , t ) = ∫ u 1 = − ∞ ∞ ⋯ ∫ u N = − ∞ ∞ f C ( x 1 − u 1 , ⋯ , x N − u N , t ) G ( u 1 , t ) d u 1 ⋯ G ( u N , t ) d u N , {\displaystyle L(x_{1},\cdots ,x_{N},t)=\int _{u_{1}=-\infty }^{\infty }\cdots \int _{u_{N}=-\infty }^{\infty }f_{C}(x_{1}-u_{1},\cdots ,x_{N}-u_{N},t)G(u_{1},t)\,du_{1}\cdots G(u_{N},t)\,du_{N},} where G ( x , t ) = 1 2 π t e − x 2 2 t {\displaystyle G(x,t)={\frac {1}{\sqrt {2\pi t}}}e^{-{\frac {x^{2}}{2t}}}} and the standard deviation of the Gaussian σ is related to the scale parameter t according to t = σ2. Separability will be assumed in all that follows, even when the kernel is not exactly Gaussian, since separation of the dimensions is the most practical way to implement multidimensional smoothing, especially at larger scales. Therefore, the rest of the article focuses on the one-dimensional case. == The sampled Gaussian kernel == When implementing the one-dimensional smoothing step in practice, the presumably simplest approach is to convolve the discrete signal fD with a sampled Gaussian kernel: L ( x , t ) = ∑ n = − ∞ ∞ f ( x − n ) G ( n , t ) {\displaystyle L(x,t)=\sum _{n=-\infty }^{\infty }f(x-n)\,G(n,t)} where G ( n , t ) = 1 2 π t e − n 2 2 t {\displaystyle G(n,t)={\frac {1}{\sqrt {2\pi t}}}e^{-{\frac {n^{2}}{2t}}}} (with t = σ2) which in turn is truncated at the ends to give a filter with finite impulse response L ( x , t ) = ∑ n = − M M f ( x − n ) G ( n , t ) {\displaystyle L(x,t)=\sum _{n=-M}^{M}f(x-n)\,G(n,t)} for M chosen sufficiently large (see error function) such that 2 ∫ M ∞ G ( u , t ) d u = 2 ∫ M t ∞ G ( v , 1 ) d v < ε . {\displaystyle 2\int _{M}^{\infty }G(u,t)\,du=2\int _{\frac {M}{\sqrt {t}}}^{\infty }G(v,1)\,dv<\varepsilon .} A common choice is to set M to a constant C times the standard deviation of the Gaussian kernel M = C σ + 1 = C t + 1 {\displaystyle M=C\sigma +1=C{\sqrt {t}}+1} where C is often chosen somewhere between 3 and 6. Using the sampled Gaussian kernel can, however, lead to implementation problems, in particular when computing higher-order derivatives at finer scales by applying sampled derivatives of Gaussian kernels. When accuracy and robustness are primary design criteria, alternative implementation approaches should therefore be considered. For small values of ε (10−6 to 10−8) the errors introduced by truncating the Gaussian are usually negligible. For larger values of ε, however, there are many better alternatives to a rectangular window function. For example, for a given number of points, a Hamming window, Blackman window, or Kaiser window will do less damage to the spectral and other properties of the Gaussian than a simple truncation will. Notwithstanding this, since the Gaussian kernel decreases rapidly at the tails, the main recommendation is still to use a sufficiently small value of ε such that the truncation effects are no longer important. == The discrete Gaussian kernel == A more refined approach is to convolve the original signal with the discrete Gaussian kernel T(n, t) L ( x , t ) = ∑ n = − ∞ ∞ f ( x − n ) T ( n , t ) {\displaystyle L(x,t)=\sum _{n=-\infty }^{\infty }f(x-n)\,T(n,t)} where T ( n , t ) = e − t I n ( t ) {\displaystyle T(n,t)=e^{-t}I_{n}(t)} and I n ( t ) {\displaystyle I_{n}(t)} denotes the modified Bessel functions of integer order, n. This is the discrete counterpart of the continuous Gaussian in that it is the solution to the discrete diffusion equation (discrete space, continuous time), just as the continuous Gaussian is the solution to the continuous diffusion equation. This filter can be truncated in the spatial domain as for the sampled Gaussian L ( x , t ) = ∑ n = − M M f ( x − n ) T ( n , t ) {\displaystyle L(x,t)=\sum _{n=-M}^{M}f(x-n)\,T(n,t)} or can be implemented in the Fourier domain using a closed-form expression for its discrete-time Fourier transform: T ^ ( θ , t ) = ∑ n = − ∞ ∞ T ( n , t ) e − i θ n = e t ( cos ⁡ θ − 1 ) . {\displaystyle {\widehat {T}}(\theta ,t)=\sum _{n=-\infty }^{\infty }T(n,t)\,e^{-i\theta n}=e^{t(\cos \theta -1)}.} With this frequency-domain approach, the scale-space properties transfer exactly to the discrete domain, or with excellent approximation using periodic extension and a suitably long discrete Fourier transform to approximate the discrete-time Fourier transform of the signal being smoothed. Moreover, higher-order derivative approximations can be computed in a straightforward manner (and preserving scale-space properties) by applying small support central difference operators to the discrete scale space representation. As with the sampled Gaussian, a plain truncation of the infinite impulse response will in most cases be a sufficient approximation for small values of ε, while for larger values of ε it is better to use either a decomposition of the discrete Gaussian into a cascade of generalized binomial filters or alternatively to construct a finite approximate kernel by multiplying by a window function. If ε has been chosen too large such that effects of the truncation error begin to appear (for example as spurious extrema or spurious responses to higher-order derivative operators), then the options are to decrease the value of ε such that a larger finite kernel is used, with cutoff where the support is very small, or to use a tapered window. == Recursive filters == Since computational efficiency is often important, low-order recursive filters are often used for scale-space smoothing. For example, Young and van Vliet use a third-order recursive filter with one real pole and a pair of complex poles, applied forward and backward to make a sixth-order symmetric approximation to the Gaussian with low computational complexity for any smoothing scale. By relaxing a few of the axioms, Lindeberg concluded that good smoothing filters would be "normalized Pólya frequency sequences", a family of discrete kernels that includes all filters with real poles at 0 < Z < 1 and/or Z > 1, as well as with real zeros at Z < 0. For symmetry, which leads to approximate directional homogeneity, these filters must be further restricted to pairs of poles and zeros that lead to zero-phase filters. To match the transfer function curvature at zero frequency of the discrete Gaussian, which ensures an approximate semi-group property of additive t, two poles at Z = 1 + 2 t − ( 1 + 2 t ) 2 − 1 {\displaystyle

    Read more →
  • Attensity

    Attensity

    Attensity was an American company that provided social analytics and engagement applications for social customer relationship management (social CRM). Attensity's text analytics software applications extracted facts, relationships and sentiment from unstructured data. == History == Attensity was founded in 2000. An early investor in Attensity was In-Q-Tel, which funds technology to support the missions of the US Government and the broader DOD. InTTENSITY, an independent company that has combined Inxight with Attensity Software (the only joint development project that combines two InQTel funded software packages), was the exclusive distributor and outlet for Attensity in the Federal Market. In 2009, Attensity Corp., then based in Palo Alto, merged with Germany's Empolis and Living-e AG to form Attensity Group. In 2010, Attensity Group acquired Biz360, a provider of social media monitoring and market intelligence solutions. In early 2012, Attensity Group divested itself of the Empolis business unit via a management buyout; that unit currently conducts business under its pre-merger name. Attensity Group was a closely held private company. Its majority shareholder was Aeris Capital, a private Swiss investment office advising a high-net-worth individual and his charitable foundation. Foundation Capital, Granite Ventures, and Scale Venture Partners were among Biz360's investors and thus became shareholders in Attensity Group. In February 2016, Attensity's IP assets were acquired by InContact, and Attensity closed.

    Read more →
  • Apertus (LLM)

    Apertus (LLM)

    Apertus is a public large language model, developed by the Swiss AI Initiative (a collaboration between EPFL, ETH Zurich, and the Swiss National Supercomputing Centre). It was released on September 2, 2025, under the free and open-source Apache 2.0 license. Designed initially for business and research use cases around the world, Apertus was trained on over 1800 languages, and comes in 8 billion or 70 billion parameter versions and is available on Hugging Face for download. The model was developed aiming to adhere to European copyright law, and is one of the first examples of AI as a public good in the vein of AI Sovereignty. It is also the first large model to comply with the European Union's Artificial Intelligence Act. At its launch, the model creators emphasized multilinguality, transparency, and auditability as priorities in contrast to commercial frontier model. While international reception was largely positive, the first iteration was significantly behind the capabilities of frontier models and needs adaptation for many use cases with chatbots being a secondary but not a primary use case. As of late 2025, it was considered the largest and most capable fully open model. The capability of future models will depend in part on how much more funding can be secured.

    Read more →
  • Vulnerabilities Equities Process

    Vulnerabilities Equities Process

    The Vulnerabilities Equities Process (VEP) is a process used by the U.S. federal government to determine on a case-by-case basis how it should treat zero-day computer security vulnerabilities: whether to disclose them to the public to help improve general computer security, or to keep them secret for offensive use against the government's adversaries. The VEP was first developed during the period 2008–2009, but only became public in 2016, when the government released a redacted version of the VEP in response to a FOIA request by the Electronic Frontier Foundation. Following public pressure for greater transparency in the wake of the Shadow Brokers affair, the U.S. government made a more public disclosure of the VEP process in November 2017. == Participants == According to the VEP plan published in 2017, the Equities Review Board (ERB) is the primary forum for interagency deliberation and determinations concerning the VEP. The ERB meets monthly, but may also be convened sooner if an immediate need arises. The ERB consists of representatives from the following agencies: Office of Management and Budget Office of the Director of National Intelligence (including the Intelligence Community-Security Coordination Center) United States Department of the Treasury United States Department of State United States Department of Justice (including the Federal Bureau of Investigation and the National Cyber Investigative Joint Task Force) Department of Homeland Security (including the National Cybersecurity and Communications Integration Center and the United States Secret Service) United States Department of Energy United States Department of Defense (to include the National Security Agency, including Information Assurance and Signals Intelligence elements), United States Cyber Command, and DoD Cyber Crime Center) United States Department of Commerce Central Intelligence Agency The National Security Agency serves as the executive secretariat for the VEP. == Process == According to the November 2017 version of the VEP, the process is as follows: === Submission and notification === When an agency finds a vulnerability, it will notify the VEP secretariat as soon as is possible. The notification will include a description of the vulnerability and the vulnerable products or systems, together with the agency's recommendation to either disseminate or restrict the vulnerability information. The secretariat will then notify all participants of the submission within one business day, requesting them to respond if they have an relevant interest. === Equity and discussions === An agency expressing an interest must indicate whether it concurs with the original recommendation to disseminate or restrict within five business days. If it does not, it will hold discussions with the submitting agency and the VEP secretariat within seven business days to attempt to reach consensus. If no consensus is reached, the participants will suggest options for the Equities Review Board. === Determination to disseminate or restrict === Decisions whether to disclose or restrict a vulnerability should be made quickly, in full consultation with all concerned agencies, and in the overall best interest of the competing interests of the missions of the U.S. government. As far as possible, determinations should be based on rational, objective methodologies, taking into account factors such as prevalence, reliance, and severity. If the review board members cannot reach consensus, they will vote on a preliminary determination. If an agency with an equity disputes that decision, they may, by providing notice to the VEP secretariat, elect to contest the preliminary determination. If no agency contests a preliminary determination, it will be treated as a final decision. === Handling and follow-on actions === If vulnerability information is released, this will be done as quickly as possible, preferably within seven business days. Disclosure of vulnerabilities will be conducted according to guidelines agreed on by all members. The submitting agency is presumed to be most knowledgeable about the vulnerability and, as such, will be responsible for disseminating vulnerability information to the vendor. The submitting agency may elect to delegate dissemination responsibility to another agency on its behalf. The releasing agency will promptly provide a copy of the disclosed information to the VEP secretariat for record keeping. Additionally, the releasing agency is expected to follow up so the ERB can determine whether the vendor's action meets government requirements. If the vendor chooses not to address a vulnerability, or is not acting with urgency consistent with the risk of the vulnerability, the releasing agency will notify the secretariat, and the government may take other mitigation steps. == Criticism == The VEP process has been criticized for a number of deficiencies, including restriction by non-disclosure agreements, lack of risk ratings, special treatment for the NSA, and less than whole-hearted commitment to disclosure as the default option. == UK equivalent == British intelligence agencies—GCHQ in particular—follow a similar approach, also known as the Equities Process, to determine whether to disclose or retain security vulnerabilities. The Investigatory Powers Act 2016 was amended in 2022 to bring oversight of the operation of the process within the remit of the Investigatory Powers Commissioner. Details of the process were made public in 2018.

    Read more →
  • Airfair

    Airfair

    AirFair was a mobile travel application that checks flights, and shows whether a traveler is owed compensation. == History == AirFair was developed in 2016 by Allay Logic Ltd; a Newcastle-based tech-company. == Services == AirFair offered a free flight check to see if compensation is owed. The app could indicate how much the person is owed within minutes whether the flight was delayed, cancelled or the traveler is refused boarding.

    Read more →
  • Mean shift

    Mean shift

    Mean shift is a non-parametric feature-space mathematical analysis technique for locating the maxima of a density function, a so-called mode-seeking algorithm. Application domains include cluster analysis in computer vision and image processing. == History == The mean shift procedure is usually credited to work by Fukunaga and Hostetler in 1975. It is, however, reminiscent of earlier work by Schnell in 1964. == Overview == Mean shift is a procedure for locating the maxima—the modes—of a density function given discrete data sampled from that function. This is an iterative method, and we start with an initial estimate x {\displaystyle x} . Let a kernel function K ( x i − x ) {\displaystyle K(x_{i}-x)} be given. This function determines the weight of nearby points for re-estimation of the mean. Typically a Gaussian kernel on the distance to the current estimate is used, K ( x i − x ) = e − c | | x i − x | | 2 {\displaystyle K(x_{i}-x)=e^{-c||x_{i}-x||^{2}}} . The weighted mean of the density in the window determined by K {\displaystyle K} is m ( x ) = ∑ x i ∈ N ( x ) K ( x i − x ) x i ∑ x i ∈ N ( x ) K ( x i − x ) {\displaystyle m(x)={\frac {\sum _{x_{i}\in N(x)}K(x_{i}-x)x_{i}}{\sum _{x_{i}\in N(x)}K(x_{i}-x)}}} where N ( x ) {\displaystyle N(x)} is the neighborhood of x {\displaystyle x} , a set of points for which K ( x i − x ) ≠ 0 {\displaystyle K(x_{i}-x)\neq 0} . The difference m ( x ) − x {\displaystyle m(x)-x} is called mean shift in Fukunaga and Hostetler. The mean-shift algorithm now sets x ← m ( x ) {\displaystyle x\leftarrow m(x)} , and repeats the estimation until m ( x ) {\displaystyle m(x)} converges. Although the mean shift algorithm has been widely used in many applications, a rigid proof for the convergence of the algorithm using a general kernel in a high dimensional space is still not known. Aliyari Ghassabeh showed the convergence of the mean shift algorithm in one dimension with a differentiable, convex, and strictly decreasing profile function. However, the one-dimensional case has limited real world applications. Also, the convergence of the algorithm in higher dimensions with a finite number of the stationary (or isolated) points has been proved. However, sufficient conditions for a general kernel function to have finite stationary (or isolated) points have not been provided. Gaussian Mean-Shift is an Expectation–maximization algorithm. == Details == Let data be a finite set S {\displaystyle S} embedded in the n {\displaystyle n} -dimensional Euclidean space, X {\displaystyle X} . Let K {\displaystyle K} be a flat kernel that is the characteristic function of the λ {\displaystyle \lambda } -ball in X {\displaystyle X} , In each iteration of the algorithm, s ← m ( s ) {\displaystyle s\leftarrow m(s)} is performed for all s ∈ S {\displaystyle s\in S} simultaneously. The first question, then, is how to estimate the density function given a sparse set of samples. One of the simplest approaches is to just smooth the data, e.g., by convolving it with a fixed kernel of width h {\displaystyle h} , where x i {\displaystyle x_{i}} are the input samples and k ( r ) {\displaystyle k(r)} is the kernel function (or Parzen window). h {\displaystyle h} is the only parameter in the algorithm and is called the bandwidth. This approach is known as kernel density estimation or the Parzen window technique. Once we have computed f ( x ) {\displaystyle f(x)} from the equation above, we can find its local maxima using gradient ascent or some other optimization technique. The problem with this "brute force" approach is that, for higher dimensions, it becomes computationally prohibitive to evaluate f ( x ) {\displaystyle f(x)} over the complete search space. Instead, mean shift uses a variant of what is known in the optimization literature as multiple restart gradient descent. Starting at some guess for a local maximum, y k {\displaystyle y_{k}} , which can be a random input data point x 1 {\displaystyle x_{1}} , mean shift computes the gradient of the density estimate f ( x ) {\displaystyle f(x)} at y k {\displaystyle y_{k}} and takes an uphill step in that direction. == Types of kernels == Kernel definition: Let X {\displaystyle X} be the n {\displaystyle n} -dimensional Euclidean space, R n {\displaystyle \mathbb {R} ^{n}} . The norm of x {\displaystyle x} is a non-negative number, ‖ x ‖ 2 = x ⊤ x ≥ 0 {\displaystyle \|x\|^{2}=x^{\top }x\geq 0} . A function K : X → R {\displaystyle K:X\rightarrow \mathbb {R} } is said to be a kernel if there exists a profile, k : [ 0 , ∞ ] → R {\displaystyle k:[0,\infty ]\rightarrow \mathbb {R} } , such that K ( x ) = k ( ‖ x ‖ 2 ) {\displaystyle K(x)=k(\|x\|^{2})} and k is non-negative. k is non-increasing: k ( a ) ≥ k ( b ) {\displaystyle k(a)\geq k(b)} if a < b {\displaystyle a Read more →

  • DreamLab

    DreamLab

    DreamLab was a volunteer computing Android and iOS app launched in 2015 by Imperial College London and the Vodafone Foundation. It was discontinued on 2nd April 2025. == Description == The app helped to research cancer, COVID-19, new drugs and tropical cyclones. To do this, DreamLab accessed part of the device's processing power, with the user's consent, while the owner charged their smartphone, to speed up the calculations of the algorithms from Imperial College London. The aim of the tropical cyclone project was to prepare for climate change risks. Other projects aimed to find existing drugs and food molecules that could help people with COVID-19 and other diseases. The performance of 100,000 smartphones would reach the annual output of all research computers at Imperial College in just three months, with a nightly runtime of six hours. The app was developed in 2015 by the Garvan Institute of Medical Research in Sydney and the Vodafone Foundation. In May 2020, the project had over 490,000 registered users.

    Read more →
  • SMART Health Card

    SMART Health Card

    The SMART Health Card framework is an open source immunity passport program designed to store and share medical information in paper or digital form. It was initially launched as a vaccine passport during the COVID-19 pandemic, but is envisioned for use for other infectious diseases. SMART Health Cards include a QR code which can be scanned and verified using the official SMART Health Card Verifier mobile app, supported by Apple and Android. It was rolled out by the Vaccination Credential Initiative (VCI) based on technology developed at Boston Children's Hospital, and standards set by Health Level Seven International (HL7) and the World Wide Web Consortium (W3C). It is recognized by the International Organization for Standardization. == History == === Founding === In February 2009, United States president Barack Obama signed an economic stimulus package which included $19 billion in funds for investment in health information technology. The following month, researchers from Boston Children's Hospital and Harvard Medical School, Kenneth Mandl and Isaac Kohane, published an article in The New England Journal of Medicine calling for the modernization of electronic health records through API integrations on mobile devices. In April 2010, the pair secured a $15 million grant through the Office of the National Coordinator for Health Information Technology's Strategic Health IT Advanced Research Projects (SHARP) program. With this federal funding, the researchers began development of an interoperable healthcare IT platform they called "Substitutable Medical Applications and Reusable Technologies" (SMART). The first iteration of the platform API was previewed later that year, and "SMART Classic" was released in 2011. In 2013, SMART adopted the open-source Fast Health Interoperability Resources (FHIR) standard developed by Health Level Seven International (HL7). The newly named SMART on FHIR platform was debuted in February 2014 at the Health Information Management Systems Society conference. === 21st Century Cures Act === According to SMART Health IT, Mandl successfully lobbied for the inclusion of a universal API requirement in the 21st Century Cures Act, signed into law on December 13, 2016. The team also advocated for a federal rule establishing SMART as the universal API. In 2019, the Office of the National Coordinator for Health Information Technology published the "final rule" specifying the SMART framework as the standard to satisfy the requirements of the 21st Century Cures Act; the rule was implemented in June 2020. === COVID-19 === The SMART Health Card framework was deployed as a "de facto standard" for vaccine passports in the COVID-19 pandemic in the United States and other international jurisdictions. On January 14, 2021, the Mitre Corporation announced the launch of a new public–private partnership called the Vaccination Credential Initiative (VCI) alongside the CARIN Alliance, Cerner, Change Healthcare, The Commons Project Foundation, Epic Systems, Evernorth, Mayo Clinic, Microsoft, Oracle, Safe Health, and Salesforce. VCI's purpose was to employ the SMART Health Card framework in order to create a unified proof-of-vaccination system for COVID-19 vaccines.The California Department of Public Health introduced a Digital Covid-19 Vaccine Record portal in June 2021, allowing individuals to verify their vaccination status using the SMART Health Card reader. On August 5, 2021, New York Governor Andrew Cuomo announced the introduction of the "Excelsior Pass Plus" which would expand its Excelsior Pass program into other states and internationally by connecting it to the SMART Health Card system. As of August 27, 2021, 415,000 citizens of Louisiana had added their COVID-19 vaccination status to their state-run, SMART Health Card enabled LA Wallet. On September 8, 2021, Hawaii governor David Ige announced the rollout of the state's Hawaiʻi SMART Health Card. County-level health departments across the United States partnered with VaccineCheck to issue SMART Health Cards by verifying vaccine cards provided by the Centers for Disease Control and Prevention. The Government of Canada spent CAD$4.6 million to develop a proof-of-vaccination credential on the SMART Health Card framework, enabling its ArriveCAN travel application to store, recognize and verify credentials from every province, territory and foreign country. Since October 2021, Canadian provinces and territories used the SMART Health Card format as a requirement by the federal government, including British Columbia, Newfoundland and Labrador, the Northwest Territories, Nova Scotia, Nunavut, Ontario, Quebec, Saskatchewan and the Yukon. On October 13, 2021, the American Immunization Registry Association (AIRA) published a statement encouraging adoption of SMART Health Cards as a common standard "where allowed by local law and policy." "SMARTHealth.Cards" was listed as a supporting member of AIRA through the VCI. A SMART Health Cards Global Forum was held on October 28, 2021. The event featured keynote speakers Andy Slavitt (former Senior Pandemic Advisor to President Joe Biden’s COVID-19 pandemic response team) and Mike Leavitt (former United States Secretary of Health and Human Services). On December 20, 2021, Japan's Ministry of Health, Labour and Welfare launched its COVID-19 Vaccination Certificate Application using the SMART Health Card. By January 2022, about 80% of Americans who had received a COVID-19 vaccine had access to a SMART Health Card through their state governments, local businesses, universities and healthcare systems. == Participants == === Developers === SMART Health IT is based out of the Computational Health Informatics Program (CHIP) at the Boston Children's Hospital. CHIP's related projects include Apache cTAKES, Genomic Information Commons, HealthMap, and VaccineFinder. The SMART Health Card's project sponsor is HL7 International's Public Health Work Group, consisting of representatives from Allscripts, the Altarum Institute, Tennessee Department of Health and Washington State Department of Health. === Issuers === Official registries of authorized SMART Health Card issuers are maintained by SMART Health IT, the Vaccination Credential Initiative, and the CommonTrust Network. Authorized issuers include:

    Read more →
  • Albert One

    Albert One

    Albert One is an artificial intelligence chatbot created by Robby Garner and designed to mimic the way humans make conversations using a multi-faceted approach in natural language programming. == History == In both 1998 and 1999, Albert One won the Loebner Prize Contest, a competition between chatterbots. Some parts of Albert were deployed on the internet beginning in 1995, to gather information about what kinds of things people would say to a chatterbot. Another element of Albert One involved the building of a large database of human statements, and associated replies. This portion of the project was tested at the 1994-1997 Loebner Prize contests. Albert was the first of Robby Garner's multifaceted bots. The Albert One system was composed of several subsystems. Among those were a version of Eliza, the therapist, Elivs, another Eliza-like bot, and several other helper applications working together in a hierarchical arrangement. As a continuation of the stimulus-response library, various other database queries and assertions were tested to arrive at each of Albert's responses. Robby went on to develop networked examples of this kind of hierarchical "glue" at The Turing Hub.

    Read more →
  • Kruti

    Kruti

    Kruti is a multilingual AI agent and chatbot developed by the Indian company Ola Krutrim. It is designed to perform real-world tasks for users, such as booking taxis and ordering food, by integrating directly with various online services. It is notable for its ability to understand and respond in multiple Indian languages. Developed by a team founded by Bhavish Aggarwal, Kruti functions as an "agentic" AI, meaning it can reason, plan, and execute multi-step tasks to fulfill a user's request. The backend technology combines several open-source large language models with Ola's proprietary Krutrim V2 model. The system was developed to work primarily on smartphones, addressing the Indian market's specific needs, including language diversity and potential bandwidth constraints. Kruti was officially released in June 2025, replacing an earlier chatbot from the company that was also named Krutrim. Initially supporting 13 languages, the company plans to expand its capabilities to 22 Indian languages. == Background == Kruti is an improved version of Ola's Krutrim chatbot, which was first launched in 2023 and was intended to be replaced by Kruti. It was officially released on 12 June 2025 as an upgrade to passive chatbots, with support for text and voice in 13 Indian languages. As an agentic AI, it can execute tasks with customization and reasoning, providing adaptive answers based on user preferences and past interactions. Kruti is optimized for smartphone usage and designed to accommodate bandwidth constraints and usage patterns in India. To ensure scalability and cost-effective performance, it combines various open-source large language models with Ola's own Krutrim V2, which has 12 billion parameters. Its speech recognition is built to identify regional Indian languages, dialects, and accents. Due to its integration with numerous apps and services, Kruti is context-aware and can proactively complete tasks. Initially connected only with Ola ecosystem services, Krutrim intends to expand and incorporate various Indian services into Kruti, with the goal of adding services from Blinkit, Swiggy, and Uber with respective voice command support. On 20 June 2025, Krutrim acquired the AI platform BharatSah‘AI’yak to increase its involvement in government, education, and agriculture projects. This acquisition will allow Kruti to assist in broadening the scope of BharatSah'AI'yak's work on India-centric, vernacular retrieval-augmented generation AI bots. == Development == Kruti is designed to perform tasks with minimal user input, accepting documents, images, and text, without requiring users to switch between applications. Its agentic framework breaks queries into sub-tasks executed by multiple agents working sequentially or concurrently, with reported accuracy exceeding 90%. Kruti connects to company databases and APIs via the Model Context Protocol and presents responses as summaries, tables, or narratives adapted to user behaviour. The system supports payments via credit/debit cards and UPI. The underlying stack, which includes foundation models and AI training and inference systems, is intended to support adaptation across sectors such as healthcare, education, and finance. Ola Cabs and the Open Network for Digital Commerce have begun integrating Kruti into their platforms pending broader reliability testing.

    Read more →