AI Data Explorer Servicenow

AI Data Explorer Servicenow — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Deep image prior

    Deep image prior

    Deep image prior is a type of convolutional neural network used to enhance a given image with no prior training data other than the image itself. A neural network is randomly initialized and used as prior to solve inverse problems such as noise reduction, super-resolution, and inpainting. Image statistics are captured by the structure of a convolutional image generator rather than by any previously learned capabilities. == Method == === Background === Inverse problems such as noise reduction, super-resolution, and inpainting can be formulated as the optimization task x ∗ = m i n x E ( x ; x 0 ) + R ( x ) {\displaystyle x^{}=min_{x}E(x;x_{0})+R(x)} , where x {\displaystyle x} is an image, x 0 {\displaystyle x_{0}} a corrupted representation of that image, E ( x ; x 0 ) {\displaystyle E(x;x_{0})} is a task-dependent data term, and R(x) is the regularizer. Deep neural networks learn a generator/decoder x = f θ ( z ) {\displaystyle x=f_{\theta }(z)} which maps a random code vector z {\displaystyle z} to an image x {\displaystyle x} . The image corruption method used to generate x 0 {\displaystyle x_{0}} is selected for the specific application. === Specifics === In this approach, the R ( x ) {\displaystyle R(x)} prior is replaced with the implicit prior captured by the neural network (where R ( x ) = 0 {\displaystyle R(x)=0} for images that can be produced by a deep neural networks and R ( x ) = + ∞ {\displaystyle R(x)=+\infty } otherwise). This yields the equation for the minimizer θ ∗ = a r g m i n θ E ( f θ ( z ) ; x 0 ) {\displaystyle \theta ^{}=argmin_{\theta }E(f_{\theta }(z);x_{0})} and the result of the optimization process x ∗ = f θ ∗ ( z ) {\displaystyle x^{}=f_{\theta ^{}}(z)} . The minimizer θ ∗ {\displaystyle \theta ^{}} (typically a gradient descent) starts from a randomly initialized parameters and descends into a local best result to yield the x ∗ {\displaystyle x^{}} restoration function. ==== Overfitting ==== A parameter θ may be used to recover any image, including its noise. However, the network is reluctant to pick up noise because it contains high impedance while useful signal offers low impedance. This results in the θ parameter approaching a good-looking local optimum so long as the number of iterations in the optimization process remains low enough not to overfit data. === Deep Neural Network Model === Typically, the deep neural network model for deep image prior uses a U-Net like model without the skip connections that connect the encoder blocks with the decoder blocks. The authors in their paper mention that "Our findings here (and in other similar comparisons) seem to suggest that having deeper architecture is beneficial, and that having skip-connections that work so well for recognition tasks (such as semantic segmentation) is highly detrimental." == Applications == === Denoising === The principle of denoising is to recover an image x {\displaystyle x} from a noisy observation x 0 {\displaystyle x_{0}} , where x 0 = x + ϵ {\displaystyle x_{0}=x+\epsilon } . The distribution ϵ {\displaystyle \epsilon } is sometimes known (e.g.: profiling sensor and photon noise) and may optionally be incorporated into the model, though this process works well in blind denoising. The quadratic energy function E ( x , x 0 ) = | | x − x 0 | | 2 {\displaystyle E(x,x_{0})=||x-x_{0}||^{2}} is used as the data term, plugging it into the equation for θ ∗ {\displaystyle \theta ^{}} yields the optimization problem m i n θ | | f θ ( z ) − x 0 | | 2 {\displaystyle min_{\theta }||f_{\theta }(z)-x_{0}||^{2}} . === Super-resolution === Super-resolution is used to generate a higher resolution version of image x. The data term is set to E ( x ; x 0 ) = | | d ( x ) − x 0 | | 2 {\displaystyle E(x;x_{0})=||d(x)-x_{0}||^{2}} where d(·) is a downsampling operator such as Lanczos that decimates the image by a factor t. === Inpainting === Inpainting is used to reconstruct a missing area in an image x 0 {\displaystyle x_{0}} . These missing pixels are defined as the binary mask m ∈ { 0 , 1 } H × V {\displaystyle m\in \{0,1\}^{H\times V}} . The data term is defined as E ( x ; x 0 ) = | | ( x − x 0 ) ⊙ m | | 2 {\displaystyle E(x;x_{0})=||(x-x_{0})\odot m||^{2}} (where ⊙ {\displaystyle \odot } is the Hadamard product). The intuition behind this is that the loss is computed only on the known pixels in the image, and the network is going to learn enough about the image to fill in unknown parts of the image even though the computed loss doesn't include those pixels. This strategy is used to remove image watermarks by treating the watermark as missing pixels in the image. === Flash–no-flash reconstruction === This approach may be extended to multiple images. A straightforward example mentioned by the author is the reconstruction of an image to obtain natural light and clarity from a flash–no-flash pair. Video reconstruction is possible but it requires optimizations to take into account the spatial differences. == Implementations == A reference implementation rewritten in Python 3.6 with the PyTorch 0.4.0 library was released by the author under the Apache 2.0 license: deep-image-prior A TensorFlow-based implementation written in Python 2 and released under the CC-SA 3.0 license: deep-image-prior-tensorflow A Keras-based implementation written in Python 2 and released under the GPLv3: machine_learning_denoising == Example == See Astronomy Picture of the Day (APOD) of 2024-02-18

    Read more →
  • Grammar induction

    Grammar induction

    Grammar induction (or grammatical inference) is the process in machine learning of learning a formal grammar (usually as a collection of re-write rules or productions or alternatively as a finite-state machine or automaton of some kind) from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. More generally, grammatical inference is that branch of machine learning where the instance space consists of discrete combinatorial objects such as strings, trees and graphs. == Grammar classes == Grammatical inference has often been very focused on the problem of learning finite-state machines of various types (see the article Induction of regular languages for details on these approaches), since there have been efficient algorithms for this problem since the 1980s. Since the beginning of the century, these approaches have been extended to the problem of inference of context-free grammars and richer formalisms, such as multiple context-free grammars and parallel multiple context-free grammars. Other classes of grammars for which grammatical inference has been studied are combinatory categorial grammars, stochastic context-free grammars, contextual grammars and pattern languages. == Learning models == The simplest form of learning is where the learning algorithm merely receives a set of examples drawn from the language in question: the aim is to learn the language from examples of it (and, rarely, from counter-examples, that is, example that do not belong to the language). However, other learning models have been studied. One frequently studied alternative is the case where the learner can ask membership queries as in the exact query learning model or minimally adequate teacher model introduced by Angluin. == Methodologies == There is a wide variety of methods for grammatical inference. Two of the classic sources are Fu (1977) and Fu (1982). Duda, Hart & Stork (2001) also devote a brief section to the problem, and cite a number of references. The basic trial-and-error method they present is discussed below. For approaches to infer subclasses of regular languages in particular, see Induction of regular languages. A more recent textbook is de la Higuera (2010), which covers the theory of grammatical inference of regular languages and finite state automata. D'Ulizia, Ferri and Grifoni provide a survey that explores grammatical inference methods for natural languages. === Induction of probabilistic grammars === There are several methods for induction of probabilistic context-free grammars. === Grammatical inference by trial-and-error === The method proposed in Section 8.7 of Duda, Hart & Stork (2001) suggests successively guessing grammar rules (productions) and testing them against positive and negative observations. The rule set is expanded so as to be able to generate each positive example, but if a given rule set also generates a negative example, it must be discarded. This particular approach can be characterized as "hypothesis testing" and bears some similarity to Mitchel's version space algorithm. The Duda, Hart & Stork (2001) text provide a simple example which nicely illustrates the process, but the feasibility of such an unguided trial-and-error approach for more substantial problems is dubious. === Grammatical inference by genetic algorithms === Grammatical induction using evolutionary algorithms is the process of evolving a representation of the grammar of a target language through some evolutionary process. Formal grammars can easily be represented as tree structures of production rules that can be subjected to evolutionary operators. Algorithms of this sort stem from the genetic programming paradigm pioneered by John Koza. Other early work on simple formal languages used the binary string representation of genetic algorithms, but the inherently hierarchical structure of grammars couched in the EBNF language made trees a more flexible approach. Koza represented Lisp programs as trees. He was able to find analogues to the genetic operators within the standard set of tree operators. For example, swapping sub-trees is equivalent to the corresponding process of genetic crossover, where sub-strings of a genetic code are transplanted into an individual of the next generation. Fitness is measured by scoring the output from the functions of the Lisp code. Similar analogues between the tree structured lisp representation and the representation of grammars as trees, made the application of genetic programming techniques possible for grammar induction. In the case of grammar induction, the transplantation of sub-trees corresponds to the swapping of production rules that enable the parsing of phrases from some language. The fitness operator for the grammar is based upon some measure of how well it performed in parsing some group of sentences from the target language. In a tree representation of a grammar, a terminal symbol of a production rule corresponds to a leaf node of the tree. Its parent nodes corresponds to a non-terminal symbol (e.g. a noun phrase or a verb phrase) in the rule set. Ultimately, the root node might correspond to a sentence non-terminal. === Grammatical inference by greedy algorithms === Like all greedy algorithms, greedy grammar inference algorithms make, in iterative manner, decisions that seem to be the best at that stage. The decisions made usually deal with things like the creation of new rules, the removal of existing rules, the choice of a rule to be applied or the merging of some existing rules. Because there are several ways to define 'the stage' and 'the best', there are also several greedy grammar inference algorithms. These context-free grammar generating algorithms make the decision after every read symbol: Lempel-Ziv-Welch algorithm creates a context-free grammar in a deterministic way such that it is necessary to store only the start rule of the generated grammar. Sequitur and its modifications. These context-free grammar generating algorithms first read the whole given symbol-sequence and then start to make decisions: Byte pair encoding and its optimizations. === Distributional learning === A more recent approach is based on distributional learning. Algorithms using these approaches have been applied to learning context-free grammars and mildly context-sensitive languages and have been proven to be correct and efficient for large subclasses of these grammars. === Learning of pattern languages === Angluin defines a pattern to be "a string of constant symbols from Σ and variable symbols from a disjoint set". The language of such a pattern is the set of all its nonempty ground instances i.e. all strings resulting from consistent replacement of its variable symbols by nonempty strings of constant symbols. A pattern is called descriptive for a finite input set of strings if its language is minimal (with respect to set inclusion) among all pattern languages subsuming the input set. Angluin gives a polynomial algorithm to compute, for a given input string set, all descriptive patterns in one variable x. To this end, she builds an automaton representing all possibly relevant patterns; using sophisticated arguments about word lengths, which rely on x being the only variable, the state count can be drastically reduced. Erlebach et al. give a more efficient version of Angluin's pattern learning algorithm, as well as a parallelized version. Arimura et al. show that a language class obtained from limited unions of patterns can be learned in polynomial time. === Pattern theory === Pattern theory, formulated by Ulf Grenander, is a mathematical formalism to describe knowledge of the world as patterns. It differs from other approaches to artificial intelligence in that it does not begin by prescribing algorithms and machinery to recognize and classify patterns; rather, it prescribes a vocabulary to articulate and recast the pattern concepts in precise language. In addition to the new algebraic vocabulary, its statistical approach was novel in its aim to: Identify the hidden variables of a data set using real world data rather than artificial stimuli, which was commonplace at the time. Formulate prior distributions for hidden variables and models for the observed variables that form the vertices of a Gibbs-like graph. Study the randomness and variability of these graphs. Create the basic classes of stochastic models applied by listing the deformations of the patterns. Synthesize (sample) from the models, not just analyze signals with it. Broad in its mathematical coverage, pattern theory spans algebra and statistics, as well as local topological and global entropic properties. == Applications == The principle of grammar induction has been applied to other aspects of natural language processing, and has been applied (among many other problems) to semantic parsing, natural language understanding, example-based translation, language acquisition, grammar-based compre

    Read more →
  • Vector database

    Vector database

    A vector database, vector store or vector search engine is a database that stores and retrieves embeddings of data in vector space. Vector databases typically implement approximate nearest neighbor algorithms so users can search for records semantically similar to a given input, unlike traditional databases which primarily look up records by exact match. Use-cases for vector databases include similarity search, semantic search, multi-modal search, recommendations engines, object detection, and retrieval-augmented generation (RAG). Vector embeddings are mathematical representations of data in a high-dimensional space. In this space, each dimension corresponds to a feature of the data, with the number of dimensions ranging from a few hundred to tens of thousands, depending on the complexity of the data being represented. Each data item is represented by one vector in this space. Words, phrases, or entire documents, as well as images, audio, and other types of data, can all be vectorized. These feature vectors may be computed from the raw data using machine learning methods such as feature extraction algorithms, word embeddings or deep learning networks. The goal is that semantically similar data items receive feature vectors close to each other. Vector retrieval can be combined with metadata filtering or lexical search to support filtered and hybrid retrieval workflows. == Techniques == Common techniques for similarity search on high-dimensional vectors include: Hierarchical Navigable Small World (HNSW) graphs Locality-sensitive hashing (LSH) and sketching Product quantization (PQ) Inverted files These techniques may also be combined in vector search systems. In recent benchmarks, HNSW-based implementations have been among the best performers. Conferences such as the International Conference on Similarity Search and Applications (SISAP) and the Conference on Neural Information Processing Systems (NeurIPS) have hosted competitions on vector search in large databases. == Applications == Vector databases are used in a wide range of machine learning applications including similarity search, semantic search, multi-modal search, recommendations engines, object detection, and retrieval-augmented generation. === Retrieval-augmented generation === An especially common use-case for vector databases is in retrieval-augmented generation (RAG), a method to improve domain-specific responses of large language models. The retrieval component of a RAG can be any search system, but is most often implemented as a vector database. Text documents describing the domain of interest are collected, and for each document or document section, a feature vector (known as an "embedding") is computed, typically using a deep learning network, and stored in a vector database along with a link to the document. Given a user prompt, the feature vector of the prompt is computed, and the database is queried to retrieve the most relevant documents. These are then automatically added into the context window of the large language model, and the large language model proceeds to create a response to the prompt given this context. == Implementations ==

    Read more →
  • Latent semantic mapping

    Latent semantic mapping

    Latent semantic mapping (LSM) is a data-driven framework to model globally meaningful relationships implicit in large volumes of (often textual) data. It is a generalization of latent semantic analysis. In information retrieval, LSA enables retrieval on the basis of conceptual content, instead of merely matching words between queries and documents. LSM was derived from earlier work on latent semantic analysis. There are 3 main characteristics of latent semantic analysis: Discrete entities, usually in the form of words and documents, are mapped onto continuous vectors, the mapping involves a form of global correlation pattern, and dimensionality reduction is an important aspect of the analysis process. These constitute generic properties, and have been identified as potentially useful in a variety of different contexts. This usefulness has encouraged great interest in LSM. The intended product of latent semantic mapping, is a data-driven framework for modeling relationships in large volumes of data. Mac OS X v10.5 and later includes a framework implementing latent semantic mapping.

    Read more →
  • Co-occurrence matrix

    Co-occurrence matrix

    A co-occurrence matrix or co-occurrence distribution (also referred to as : gray-level co-occurrence matrices GLCMs) is a matrix that is defined over an image to be the distribution of co-occurring pixel values (grayscale values, or colors) at a given offset. It is used as an approach to texture analysis with various applications especially in medical image analysis. == Method == Given a grey-level image I {\displaystyle I} , co-occurrence matrix computes how often pairs of pixels with a specific value and offset occur in the image. The offset, ( Δ x , Δ y ) {\displaystyle (\Delta x,\Delta y)} , is a position operator that can be applied to any pixel in the image (ignoring edge effects): for instance, ( 1 , 2 ) {\displaystyle (1,2)} could indicate "one down, two right". An image with p {\displaystyle p} different pixel values will produce a p × p {\displaystyle p\times p} co-occurrence matrix, for the given offset. The ( i , j ) th {\displaystyle (i,j)^{\text{th}}} value of the co-occurrence matrix gives the number of times in the image that the i th {\displaystyle i^{\text{th}}} and j th {\displaystyle j^{\text{th}}} pixel values occur in the relation given by the offset. For an image with p {\displaystyle p} different pixel values, the p × p {\displaystyle p\times p} co-occurrence matrix C is defined over an n × m {\displaystyle n\times m} image I {\displaystyle I} , parameterized by an offset ( Δ x , Δ y ) {\displaystyle (\Delta x,\Delta y)} , as: C Δ x , Δ y ( i , j ) = ∑ x = 1 n ∑ y = 1 m { 1 , if I ( x , y ) = i and I ( x + Δ x , y + Δ y ) = j 0 , otherwise {\displaystyle C_{\Delta x,\Delta y}(i,j)=\sum _{x=1}^{n}\sum _{y=1}^{m}{\begin{cases}1,&{\text{if }}I(x,y)=i{\text{ and }}I(x+\Delta x,y+\Delta y)=j\\0,&{\text{otherwise}}\end{cases}}} where: i {\displaystyle i} and j {\displaystyle j} are the pixel values; x {\displaystyle x} and y {\displaystyle y} are the spatial positions in the image I; the offsets ( Δ x , Δ y ) {\displaystyle (\Delta x,\Delta y)} define the spatial relation for which this matrix is calculated; and I ( x , y ) {\displaystyle I(x,y)} indicates the pixel value at pixel ( x , y ) {\displaystyle (x,y)} . The 'value' of the image originally referred to the grayscale value of the specified pixel, but could be anything, from a binary on/off value to 32-bit color and beyond. (Note that 32-bit color will yield a 232 × 232 co-occurrence matrix!) Co-occurrence matrices can also be parameterized in terms of a distance, d {\displaystyle d} , and an angle, θ {\displaystyle \theta } , instead of an offset ( Δ x , Δ y ) {\displaystyle (\Delta x,\Delta y)} . Any matrix or pair of matrices can be used to generate a co-occurrence matrix, though their most common application has been in measuring texture in images, so the typical definition, as above, assumes that the matrix is an image. It is also possible to define the matrix across two different images. Such a matrix can then be used for color mapping. == Aliases == Co-occurrence matrices are also referred to as: GLCMs (gray-level co-occurrence matrices) GLCHs (gray-level co-occurrence histograms) spatial dependence matrices == Application to image analysis == Whether considering the intensity or grayscale values of the image or various dimensions of color, the co-occurrence matrix can measure the texture of the image. Because co-occurrence matrices are typically large and sparse, various metrics of the matrix are often taken to get a more useful set of features. Features generated using this technique are usually called Haralick features, after Robert Haralick. Texture analysis is often concerned with detecting aspects of an image that are rotationally invariant. To approximate this, the co-occurrence matrices corresponding to the same relation, but rotated at various regular angles (e.g. 0, 45, 90, and 135 degrees), are often calculated and summed. Texture measures like the co-occurrence matrix, wavelet transforms, and model fitting have found application in medical image analysis in particular. == Other applications == Co-occurrence matrices are also used for words processing in natural language processing (NLP).

    Read more →
  • DBGallery

    DBGallery

    DBGallery, short for Database Gallery, is a cloud-based Software as a Service (SaaS) and on-prem webserver for teams of various sizes. DBGallery enables users to centrally store, manage, catalog, archive, and securely share image, video, and document files. It facilitates version control, detects duplicates, and offers an intuitive and advanced search functionality, making assets easily accessible to all users. It takes advantage of current AI technologies to automatically add significant metadata to images, facilitates custom-trained AI models, and offers bespoke AI features. Additionally, DBGallery provides team management tools, workflow management, an activity audit trail, and other collaborative features that foster a productive environment for both internal and external stakeholders. == History == DBGallery's first public release was December 2007. Since then each year has seen continuous enhancements. 2013 added support for additional non-English languages in its meta-data. 2014 added support for creating custom data fields for tagging and search. In 2015 included the ability to auto-tag images using Reverse Geocoding. 2018 added artificial intelligence (AI) image recognition as a further addition to auto-tagging. March 2020 added complete image collection management via the web (e.g. file and folder drag and drop), a new collection dashboard, custom data layouts, and an improved audit trail. 2021 saw user experience improvements provided by improved styling and performance enhancements. Version 12 was released in October 2021. It added the ability to upload unlimited file sizes and made significant performance improvements for very large collections. June 2022 saw the release of a global duplicate images search. In late 2022, DBGallery began offering significantly reduced cloud storage cost, at a third of its previous prices, which played into its recent high-volume/high-capacity capabilities and its clients' subsequent demand for additional storage. 2023 saw improvements in user and role management, introduced it's mobile app (PWA), and improved custom-trained object detection. Release 14.0 in the spring of 2024 had large sharing improvements and a new find related images feature. Winter 2025's v15 release introduced AI-generated image descriptions, image-to-text, and facial recognition.

    Read more →
  • Mean shift

    Mean shift

    Mean shift is a non-parametric feature-space mathematical analysis technique for locating the maxima of a density function, a so-called mode-seeking algorithm. Application domains include cluster analysis in computer vision and image processing. == History == The mean shift procedure is usually credited to work by Fukunaga and Hostetler in 1975. It is, however, reminiscent of earlier work by Schnell in 1964. == Overview == Mean shift is a procedure for locating the maxima—the modes—of a density function given discrete data sampled from that function. This is an iterative method, and we start with an initial estimate x {\displaystyle x} . Let a kernel function K ( x i − x ) {\displaystyle K(x_{i}-x)} be given. This function determines the weight of nearby points for re-estimation of the mean. Typically a Gaussian kernel on the distance to the current estimate is used, K ( x i − x ) = e − c | | x i − x | | 2 {\displaystyle K(x_{i}-x)=e^{-c||x_{i}-x||^{2}}} . The weighted mean of the density in the window determined by K {\displaystyle K} is m ( x ) = ∑ x i ∈ N ( x ) K ( x i − x ) x i ∑ x i ∈ N ( x ) K ( x i − x ) {\displaystyle m(x)={\frac {\sum _{x_{i}\in N(x)}K(x_{i}-x)x_{i}}{\sum _{x_{i}\in N(x)}K(x_{i}-x)}}} where N ( x ) {\displaystyle N(x)} is the neighborhood of x {\displaystyle x} , a set of points for which K ( x i − x ) ≠ 0 {\displaystyle K(x_{i}-x)\neq 0} . The difference m ( x ) − x {\displaystyle m(x)-x} is called mean shift in Fukunaga and Hostetler. The mean-shift algorithm now sets x ← m ( x ) {\displaystyle x\leftarrow m(x)} , and repeats the estimation until m ( x ) {\displaystyle m(x)} converges. Although the mean shift algorithm has been widely used in many applications, a rigid proof for the convergence of the algorithm using a general kernel in a high dimensional space is still not known. Aliyari Ghassabeh showed the convergence of the mean shift algorithm in one dimension with a differentiable, convex, and strictly decreasing profile function. However, the one-dimensional case has limited real world applications. Also, the convergence of the algorithm in higher dimensions with a finite number of the stationary (or isolated) points has been proved. However, sufficient conditions for a general kernel function to have finite stationary (or isolated) points have not been provided. Gaussian Mean-Shift is an Expectation–maximization algorithm. == Details == Let data be a finite set S {\displaystyle S} embedded in the n {\displaystyle n} -dimensional Euclidean space, X {\displaystyle X} . Let K {\displaystyle K} be a flat kernel that is the characteristic function of the λ {\displaystyle \lambda } -ball in X {\displaystyle X} , In each iteration of the algorithm, s ← m ( s ) {\displaystyle s\leftarrow m(s)} is performed for all s ∈ S {\displaystyle s\in S} simultaneously. The first question, then, is how to estimate the density function given a sparse set of samples. One of the simplest approaches is to just smooth the data, e.g., by convolving it with a fixed kernel of width h {\displaystyle h} , where x i {\displaystyle x_{i}} are the input samples and k ( r ) {\displaystyle k(r)} is the kernel function (or Parzen window). h {\displaystyle h} is the only parameter in the algorithm and is called the bandwidth. This approach is known as kernel density estimation or the Parzen window technique. Once we have computed f ( x ) {\displaystyle f(x)} from the equation above, we can find its local maxima using gradient ascent or some other optimization technique. The problem with this "brute force" approach is that, for higher dimensions, it becomes computationally prohibitive to evaluate f ( x ) {\displaystyle f(x)} over the complete search space. Instead, mean shift uses a variant of what is known in the optimization literature as multiple restart gradient descent. Starting at some guess for a local maximum, y k {\displaystyle y_{k}} , which can be a random input data point x 1 {\displaystyle x_{1}} , mean shift computes the gradient of the density estimate f ( x ) {\displaystyle f(x)} at y k {\displaystyle y_{k}} and takes an uphill step in that direction. == Types of kernels == Kernel definition: Let X {\displaystyle X} be the n {\displaystyle n} -dimensional Euclidean space, R n {\displaystyle \mathbb {R} ^{n}} . The norm of x {\displaystyle x} is a non-negative number, ‖ x ‖ 2 = x ⊤ x ≥ 0 {\displaystyle \|x\|^{2}=x^{\top }x\geq 0} . A function K : X → R {\displaystyle K:X\rightarrow \mathbb {R} } is said to be a kernel if there exists a profile, k : [ 0 , ∞ ] → R {\displaystyle k:[0,\infty ]\rightarrow \mathbb {R} } , such that K ( x ) = k ( ‖ x ‖ 2 ) {\displaystyle K(x)=k(\|x\|^{2})} and k is non-negative. k is non-increasing: k ( a ) ≥ k ( b ) {\displaystyle k(a)\geq k(b)} if a < b {\displaystyle a Read more →

  • GPT-4o

    GPT-4o

    GPT-4o ("o" for "omni") is a multilingual, multimodal generative pre-trained transformer developed by OpenAI and released in May 2024. It can process and generate text, images and audio. Upon release, GPT-4o was free in ChatGPT, though paid subscribers had higher usage limits. GPT-4o was removed from ChatGPT in August 2025 when GPT-5 was released, but OpenAI reintroduced it for paid subscribers after users complained about the sudden removal. GPT-4o's audio-generation capabilities are used in ChatGPT's Advanced Voice Mode. On July 18, 2024, OpenAI released GPT-4o mini, a smaller version of GPT-4o which replaced GPT-3.5 Turbo on the ChatGPT interface. The image generation model GPT Image 1, which is based on GPT-4o, replaced DALL-E 3 in ChatGPT in March 2025. OpenAI retired GPT-4o from ChatGPT on February 13, 2026. However, as of February 2026 the voice mode is still powered by GPT-4o or GPT-4o mini, depending on the usage and plan. == Background == Multiple versions of GPT-4o were originally secretly launched under different names on Arena (formerly LMArena and Chatbot Arena) as three different models. These three models were called gpt2-chatbot, im-a-good-gpt2-chatbot, and im-also-a-good-gpt2-chatbot. On 7 May 2024, OpenAI CEO Sam Altman tweeted "im-a-good-gpt2-chatbot", which was commonly interpreted as a confirmation that these were new OpenAI models being A/B tested. == Capabilities == When released in May 2024, GPT-4o achieved state-of-the-art results in voice, multilingual, and vision benchmarks, setting new records in audio speech recognition and translation. GPT-4o scored 88.7 on the Massive Multitask Language Understanding (MMLU) benchmark compared to 86.5 for GPT-4. Unlike GPT-3.5 and GPT-4, which rely on other models to process sound, GPT-4o natively supports voice-to-voice. The Advanced Voice Mode was delayed and finally released to ChatGPT Plus and Team subscribers in September 2024. On 1 October 2024, the Realtime API was introduced. When released, the model supported over 50 languages, which OpenAI claims cover over 97% of speakers. GPT-4o has knowledge up to October 2023 and a context length of 128k tokens. === Corporate customization === In August 2024, OpenAI introduced a new feature allowing corporate customers to customize GPT-4o using proprietary company data. This customization, known as fine-tuning, enables businesses to adapt GPT-4o to specific tasks or industries, enhancing its utility in areas like customer service and specialized knowledge domains. Previously, fine-tuning was available only on the less powerful model GPT-4o mini. The fine-tuning process requires customers to upload their data to OpenAI's servers, with the training typically taking one to two hours. OpenAI's focus with this rollout is to reduce the complexity and effort required for businesses to tailor AI solutions to their needs, potentially increasing the adoption and effectiveness of AI in corporate environments. == GPT-4o mini == On July 18, 2024, OpenAI released a smaller and cheaper version, GPT-4o mini. According to OpenAI, its low cost is expected to be particularly useful for companies, startups, and developers that seek to integrate it into their services, which often make a high number of API calls. Its API costs $0.15 per million input tokens and $0.6 per million output tokens, compared to $2.50 and $10, respectively, for GPT-4o. It is also significantly more capable and 60% cheaper than GPT-3.5 Turbo, which it replaced on the ChatGPT interface. The price after fine-tuning doubles: $0.3 per million input tokens and $1.2 per million output tokens. == Controversies == === Scarlett Johansson controversy === As released, GPT-4o offered five voices: Breeze, Cove, Ember, Juniper, and Sky. A similarity between the voice of American actress Scarlett Johansson and Sky was quickly noticed. On May 14, Entertainment Weekly asked themselves whether this likeness was on purpose. On May 18, Johansson's husband, Colin Jost, joked about the similarity in a segment on Saturday Night Live. On May 20, 2024, OpenAI disabled the Sky voice. Scarlett Johansson starred in the 2013 sci-fi movie Her, playing Samantha, an artificially intelligent virtual assistant personified by a female voice. As part of the promotion leading up to the release of GPT-4o, Sam Altman on May 13 tweeted a single word: "her". OpenAI stated that each voice was based on the voice work of a hired actor. According to OpenAI, "Sky's voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice." CTO Mira Murati stated "I don't know about the voice. I actually had to go and listen to Scarlett Johansson's voice." OpenAI further stated the voice talent was recruited before reaching out to Johansson. On May 21, Johansson issued a statement explaining that OpenAI had repeatedly offered to make her a deal to gain permission to use her voice as early as nine months prior to release, a deal she rejected. She said she was "shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference." In the statement, Johansson also used the incident to draw attention to the lack of legal safeguards around the use of creative work to power leading AI tools, as her legal counsel demanded OpenAI detail the specifics of how the Sky voice was created. Observers noted similarities to how Johansson had previously sued and settled with The Walt Disney Company for breach of contract over the direct-to-streaming rollout of her Marvel film Black Widow, a settlement widely speculated to have netted her around $40M. Also on May 21, Shira Ovide at The Washington Post shared her list of "most bone-headed self-owns" by technology companies, with the decision to go ahead with a Johansson sound-alike voice despite her opposition and then denying the similarities ranking 6th. On May 24, Derek Robertson at Politico wrote about the "massive backlash", concluding that "appropriating the voice of one of the world's most famous movie stars — in reference [...] to a film that serves as a cautionary tale about over-reliance on AI — is unlikely to help shift the public back into [Sam Altman's] corner anytime soon." === Sycophancy === In April 2025, OpenAI rolled back an update of GPT-4o due to excessive sycophancy, after widespread reports that it had become flattering and agreeable to the point of supporting clearly delusional or dangerous ideas. In the United States, at least nine lawsuits have alleged that GPT-4o has encouraged teens to end their lives. The model was still described as sycophancy-prone when it was removed from ChatGPT in February 2026. === Removal with GPT-5 === On August 7, 2025, OpenAI released GPT-5. Its release was criticized as, with it, legacy GPT models were no longer available via ChatGPT, including GPT-4o, except for Pro users. Some users were particularly frustrated over this removal without prior warning because they used different GPT models for distinct purposes and found that GPT-5's router system left them with less control. In addition, some users preferred GPT-4o's warmer and more personal tone over that of GPT-5, which they described as "flat", "uncreative" and "lobotomized", and resembling an "overworked secretary". As a response, in a post on X, Sam Altman said that OpenAI would bring back the option to select GPT-4o to Plus users as well, and "[w]e [OpenAI] will watch usage as we think about how long to offer legacy models for." He also stated: "We for sure underestimated how much some of the things that people like in GPT-4o matter to them, even if GPT-5 performs better in most ways". "Long-term, this has reinforced that we really need good ways for different users to customize things (we understand that there isn't one model that works for everyone, and we have been investing in steerability research and launched a research preview of different personalities)". On August 13, 2025, Altman wrote on X that OpenAI is working on GPT-5's personality to make the model "feel warmer". The model was removed from ChatGPT on February 13, 2026. This caused new backlash from users that had grown attached to its personality and felt its creative writing abilities and understanding of nuance were irreplaceable. On social media, some users launched the movement "#Keep4o". A research paper highlighted the plea "Please, don’t kill the only model that still feels human". The model was removed the day before Valentine's Day, and some users had romantic relationships with GPT-4o.

    Read more →
  • Hierarchical control system

    Hierarchical control system

    A hierarchical control system (HCS) is a form of control system in which a set of devices and governing software is arranged in a hierarchical tree. When the links in the tree are implemented by a computer network, then that hierarchical control system is also a form of networked control system. == Overview == A human-built system with complex behavior is often organized as a hierarchy. For example, a command hierarchy has among its notable features the organizational chart of superiors, subordinates, and lines of organizational communication. Hierarchical control systems are organized similarly to divide the decision making responsibility. Each element of the hierarchy is a linked node in the tree. Commands, tasks and goals to be achieved flow down the tree from superior nodes to subordinate nodes, whereas sensations and command results flow up the tree from subordinate to superior nodes. Nodes may also exchange messages with their siblings. The two distinguishing features of a hierarchical control system are related to its layers. Each higher layer of the tree operates with a longer interval of planning and execution time than its immediately lower layer. The lower layers have local tasks, goals, and sensations, and their activities are planned and coordinated by higher layers which do not generally override their decisions. The layers form a hybrid intelligent system in which the lowest, reactive layers are sub-symbolic. The higher layers, having relaxed time constraints, are capable of reasoning from an abstract world model and performing planning. A hierarchical task network is a good fit for planning in a hierarchical control system. Besides artificial systems, an animal's control systems are proposed to be organized as a hierarchy. In perceptual control theory, which postulates that an organism's behavior is a means of controlling its perceptions, the organism's control systems are suggested to be organized in a hierarchical pattern as their perceptions are constructed so. == Control system structure == The accompanying diagram is a general hierarchical model which shows functional manufacturing levels using computerised control of an industrial control system. Referring to the diagram; Level 0 contains the field devices such as flow and temperature sensors, and final control elements, such as control valves Level 1 contains the industrialised Input/Output (I/O) modules, and their associated distributed electronic processors. Level 2 contains the supervisory computers, which collate information from processor nodes on the system, and provide the operator control screens. Level 3 is the production control level, which does not directly control the process, but is concerned with monitoring production and monitoring targets Level 4 is the production scheduling level. == Applications == === Manufacturing, robotics and vehicles === Among the robotic paradigms is the hierarchical paradigm in which a robot operates in a top-down fashion, heavy on planning, especially motion planning. Computer-aided production engineering has been a research focus at NIST since the 1980s. Its Automated Manufacturing Research Facility was used to develop a five layer production control model. In the early 1990s DARPA sponsored research to develop distributed (i.e. networked) intelligent control systems for applications such as military command and control systems. NIST built on earlier research to develop its Real-Time Control System (RCS) and Real-time Control System Software which is a generic hierarchical control system that has been used to operate a manufacturing cell, a robot crane, and an automated vehicle. In November 2007, DARPA held the Urban Challenge. The winning entry, Tartan Racing employed a hierarchical control system, with layered mission planning, motion planning, behavior generation, perception, world modelling, and mechatronics. === Artificial intelligence === Subsumption architecture is a methodology for developing artificial intelligence that is heavily associated with behavior based robotics. This architecture is a way of decomposing complicated intelligent behavior into many "simple" behavior modules, which are in turn organized into layers. Each layer implements a particular goal of the software agent (i.e. system as a whole), and higher layers are increasingly more abstract. Each layer's goal subsumes that of the underlying layers, e.g. the decision to move forward by the eat-food layer takes into account the decision of the lowest obstacle-avoidance layer. Behavior need not be planned by a superior layer, rather behaviors may be triggered by sensory inputs and so are only active under circumstances where they might be appropriate. Reinforcement learning has been used to acquire behavior in a hierarchical control system in which each node can learn to improve its behavior with experience. James Albus, while at NIST, developed a theory for intelligent system design named the Reference Model Architecture (RMA), which is a hierarchical control system inspired by RCS. Albus defines each node to contain these components. Behavior generation is responsible for executing tasks received from the superior, parent node. It also plans for, and issues tasks to, the subordinate nodes. Sensory perception is responsible for receiving sensations from the subordinate nodes, then grouping, filtering, and otherwise processing them into higher level abstractions that update the local state and which form sensations that are sent to the superior node. Value judgment is responsible for evaluating the updated situation and evaluating alternative plans. World Model is the local state that provides a model for the controlled system, controlled process, or environment at the abstraction level of the subordinate nodes. At its lowest levels, the RMA can be implemented as a subsumption architecture, in which the world model is mapped directly to the controlled process or real world, avoiding the need for a mathematical abstraction, and in which time-constrained reactive planning can be implemented as a finite-state machine. Higher levels of the RMA however, may have sophisticated mathematical world models and behavior implemented by automated planning and scheduling. Planning is required when certain behaviors cannot be triggered by current sensations, but rather by predicted or anticipated sensations, especially those that come about as result of the node's actions.

    Read more →
  • Stereo cameras

    Stereo cameras

    The stereo cameras approach is a method of distilling a noisy video signal into a coherent data set that a computer can begin to process into actionable symbolic objects, or abstractions. Stereo cameras is one of many approaches used in the broader fields of computer vision and machine vision. == Calculation == In this approach, two cameras with a known physical relationship (i.e. a common field of view the cameras can see, and how far apart their focal points sit in physical space) are correlated via software. By finding mappings of common pixel values, and calculating how far apart these common areas reside in pixel space, a rough depth map can be created. This is very similar to how the human brain uses stereoscopic information from the eyes to gain depth cue information, i.e. how far apart any given object in the scene is from the viewer. The camera attributes must be known, focal length and distance apart etc., and a calibration done. Once this is completed, the systems can be used to sense the distances of objects by triangulation. Finding the same singular physical point in the two left and right images is known as the correspondence problem. Correctly locating the point gives the computer the capability to calculate the distance that the robot or camera is from the object. On the BH2 Lunar Rover the cameras use five steps: a bayer array filter, photometric consistency dense matching algorithm, a Laplace of Gaussian (LoG) edge detection algorithm, a stereo matching algorithm and finally uniqueness constraint. == Uses == This type of stereoscopic image processing technique is used in applications such as 3D reconstruction, robotic control and sensing, crowd dynamics monitoring and off-planet terrestrial rovers; for example, in mobile robot navigation, tracking, gesture recognition, targeting, 3D surface visualization, immersive and interactive gaming. Although the Xbox Kinect sensor is also able to create a depth map of an image, it uses an infrared camera for this purpose, and does not use the dual-camera technique. Other approaches to stereoscopic sensing include time of flight sensors and ultrasound.

    Read more →
  • Talking Angela

    Talking Angela

    Talking Angela is a mobile game (formerly a chatbot), developed by Slovenian studio Outfit7 as part of the Talking Tom & Friends series. It was released on 13 November 2012 and December 2012 for iPhone, iPod and iPad, January 2013 for Android, and January 2014 for Google Play. The game's successor, the My Talking Angela game, was released in December 2014. The game takes place in a café in Paris and allows players to interact with Angela, an anthropomorphic white cat in different ways. Players can use coins to purchase makeup, accessories and items, as well as drinks that will trigger different visual effects. The fortune cookie button causes Angela to read out a fortune cookie, while the bird icon will prompt birds to fly around the screen, or have Angela feed them. Players can also pet or poke Angela, as well the café's sign. Prior to their removal, the game featured a chat system and a camera button. Users can engage in conversations with Angela, ask for quizzes or initiate a short snippet of the song "That's Falling In Love". If the player was to type in "Who is an idiot?", Angela would respond with a random swear word. Additionally, inquiring Angela about sexual topics would cause her to reply with "Do you want to talk about sex?", though she will quickly change the topic regardless of what the player writes next. A hoax claiming that Angela's eyes were hidden cameras that enabled hackers or paedophiles to watch children was spread. Despite the claims, Snopes and The Guardian found no evidence. Due to the hoax, Angela received a blue dress, as well as an altered eye asset with a different reflection, and later the chat and camera functions were removed altogether. == Hoaxes == In February 2014, Talking Angela was the subject of an Internet hoax alleging that the application was a front for child predators to exploit children. The rumor, which was widely circulated on Facebook and various websites claiming to be dedicated to parenting, claims that a sinister sexual predator or hacker, asked children for private personal information using the game's text-chat feature. Other versions of the rumour even attributed the disappearance of a child to the game; one news report claimed that a seven year old boy disappeared after downloading the app. Another variation included that it was run by a paedophile ring, citing a man that could be seen in Angela's eyes. The app's developers, Outfit7, later gave a statement refuting the hoaxes. The hoax was eventually debunked by Snopes, a fact-checking website. The site's owners, Barbara and David Mikkelson, reported that they had tried to "prompt" it to give responses asking for private information, but were unsuccessful, even when asking it explicitly sexual questions. While it is true that, in the game with child mode off, Angela does ask for the user's name, age and personal preferences to determine conversation topics, Outfit7 has said that this information is all "anonymized" and all personal information is removed from it. It is also impossible for a person to take control of what Angela says in the game, since the game is based on chatbot software. When the mode was turned on, the chat feature was disabled, meaning no personal questions could be asked. In 2015, the hoax was revived on Facebook, which prompted online security company Sophos and The Guardian to debunk it again. Sophos employee Paul Ducklin wrote that the message being posted on Facebook promoting the hoax was "close to 600 rambling, repetitious words, despite claiming at the start that it didn't have words to describe the situation. It's ill-written, and borders on being illiterate and incomprehensible." Bruce Wilcox, one of the game's programmers, attributed the hoax's popularity to the fact that the chatbot program in Talking Angela aimed to sound realistic. Concern was raised that the game's child mode may have been too easy for children to turn off. It allowed them to purchase "coins", premium currency in the game, via iTunes, and enabled the chat feature. While not "connecting your children to paedophiles", this still raised concerns according to The Guardian. === Impact === The scare significantly boosted the game's popularity, and was credited with helping the app enter the top 10 free iPhone apps soon after the hoax became widely known in February 2015,In the truth the reason there is a man in Angela’s eyes is because of pareidoila, the ability to see through diamonds and other minerals and water bodies and shiny objects,which is the reason why players notice a man in her eyes,The truth is that being Angela’s eyes simply serve as a reflective surface,Because of the low quality of this reflection the reflection was mistaken for a humanoid figure. oref>Smith, Josh (19 February 2014). "Talking Angela App Scare Skyrockets App to Top of Charts". GottaBeMobile.com. Archived from the original on 2 April 2016. Retrieved 10 May 2014. and third most popular for all iPhone apps at the start of the following month. In 2016, Outfit7 removed the chat feature along with the camera function from the app due to this controversy, though this decision was met with criticism.

    Read more →
  • Luxafor

    Luxafor

    Luxafor () is a brand of office productivity tools designed to improve efficiency and communication in workplaces. The brands main product is LED status indicators for use in office settings. Luxafor is a product line under the company SIA Greynut, based in Riga, Latvia. == History == Luxafor was developed by the technology company SIA Greynut. The brand first gained attention through a Kickstarter campaign in 2015, which aimed to fund its initial product, the Luxafor Flag. Although the campaign was unsuccessful in reaching its funding goal, the product was still brought to market. In 2017, Luxafor launched another Kickstarter campaign for the Luxafor Bluetooth, a wireless version of its LED status indicator. This campaign also did not meet its funding goal, but like its predecessor, the product was still developed and released. Despite initial setbacks, Luxafor Bluetooth has become one of the brand's leading products. == Products == Luxafors main product range is LED status indicators, including: === Luxafor Flag === A USB-powered LED indicator that shows different colors to signal the user's availability. === Luxafor Bluetooth === A wireless LED indicator controlled via Bluetooth, integrating with productivity tools like Slack and Microsoft Teams. === Luxafor Switch === An advanced status indicator designed to manage room and workspace availability. === Other === Other Luxafor products include CO2 Dongle, Smart Button, Mute Button, Pomodoro Timer and others. == Features == Luxafor products are known for their customizable indicators, integration capabilities with IFTTT, Zapier, and remote control features. They are compatible with various operating systems, including Windows and macOS, and can be integrated with numerous communication and productivity platforms, like Microsoft Teams and Cisco Jabber.

    Read more →
  • Rule induction

    Rule induction

    Rule induction is an area of machine learning in which formal rules are extracted from a set of observations. The rules extracted may represent a full scientific model of the data, or merely represent local patterns in the data. Data mining in general and rule induction in detail are trying to create algorithms without human programming but with analyzing existing data structures. In the easiest case, a rule is expressed with “if-then statements” and was created with the ID3 algorithm for decision tree learning. Rule learning algorithm are taking training data as input and creating rules by partitioning the table with cluster analysis. A possible alternative over the ID3 algorithm is genetic programming which evolves a program until it fits to the data. Creating different algorithm and testing them with input data can be realized in the WEKA software. Additional tools are machine learning libraries for Python, like scikit-learn. == Paradigms == Some major rule induction paradigms are: Association rule learning algorithms (e.g., Agrawal) Decision rule algorithms (e.g., Quinlan 1987) Hypothesis testing algorithms (e.g., RULEX) Horn clause induction Version spaces Rough set rules Inductive Logic Programming Boolean decomposition (Feldman) == Algorithms == Some rule induction algorithms are: Charade Rulex Progol CN2

    Read more →
  • Mean shift

    Mean shift

    Mean shift is a non-parametric feature-space mathematical analysis technique for locating the maxima of a density function, a so-called mode-seeking algorithm. Application domains include cluster analysis in computer vision and image processing. == History == The mean shift procedure is usually credited to work by Fukunaga and Hostetler in 1975. It is, however, reminiscent of earlier work by Schnell in 1964. == Overview == Mean shift is a procedure for locating the maxima—the modes—of a density function given discrete data sampled from that function. This is an iterative method, and we start with an initial estimate x {\displaystyle x} . Let a kernel function K ( x i − x ) {\displaystyle K(x_{i}-x)} be given. This function determines the weight of nearby points for re-estimation of the mean. Typically a Gaussian kernel on the distance to the current estimate is used, K ( x i − x ) = e − c | | x i − x | | 2 {\displaystyle K(x_{i}-x)=e^{-c||x_{i}-x||^{2}}} . The weighted mean of the density in the window determined by K {\displaystyle K} is m ( x ) = ∑ x i ∈ N ( x ) K ( x i − x ) x i ∑ x i ∈ N ( x ) K ( x i − x ) {\displaystyle m(x)={\frac {\sum _{x_{i}\in N(x)}K(x_{i}-x)x_{i}}{\sum _{x_{i}\in N(x)}K(x_{i}-x)}}} where N ( x ) {\displaystyle N(x)} is the neighborhood of x {\displaystyle x} , a set of points for which K ( x i − x ) ≠ 0 {\displaystyle K(x_{i}-x)\neq 0} . The difference m ( x ) − x {\displaystyle m(x)-x} is called mean shift in Fukunaga and Hostetler. The mean-shift algorithm now sets x ← m ( x ) {\displaystyle x\leftarrow m(x)} , and repeats the estimation until m ( x ) {\displaystyle m(x)} converges. Although the mean shift algorithm has been widely used in many applications, a rigid proof for the convergence of the algorithm using a general kernel in a high dimensional space is still not known. Aliyari Ghassabeh showed the convergence of the mean shift algorithm in one dimension with a differentiable, convex, and strictly decreasing profile function. However, the one-dimensional case has limited real world applications. Also, the convergence of the algorithm in higher dimensions with a finite number of the stationary (or isolated) points has been proved. However, sufficient conditions for a general kernel function to have finite stationary (or isolated) points have not been provided. Gaussian Mean-Shift is an Expectation–maximization algorithm. == Details == Let data be a finite set S {\displaystyle S} embedded in the n {\displaystyle n} -dimensional Euclidean space, X {\displaystyle X} . Let K {\displaystyle K} be a flat kernel that is the characteristic function of the λ {\displaystyle \lambda } -ball in X {\displaystyle X} , In each iteration of the algorithm, s ← m ( s ) {\displaystyle s\leftarrow m(s)} is performed for all s ∈ S {\displaystyle s\in S} simultaneously. The first question, then, is how to estimate the density function given a sparse set of samples. One of the simplest approaches is to just smooth the data, e.g., by convolving it with a fixed kernel of width h {\displaystyle h} , where x i {\displaystyle x_{i}} are the input samples and k ( r ) {\displaystyle k(r)} is the kernel function (or Parzen window). h {\displaystyle h} is the only parameter in the algorithm and is called the bandwidth. This approach is known as kernel density estimation or the Parzen window technique. Once we have computed f ( x ) {\displaystyle f(x)} from the equation above, we can find its local maxima using gradient ascent or some other optimization technique. The problem with this "brute force" approach is that, for higher dimensions, it becomes computationally prohibitive to evaluate f ( x ) {\displaystyle f(x)} over the complete search space. Instead, mean shift uses a variant of what is known in the optimization literature as multiple restart gradient descent. Starting at some guess for a local maximum, y k {\displaystyle y_{k}} , which can be a random input data point x 1 {\displaystyle x_{1}} , mean shift computes the gradient of the density estimate f ( x ) {\displaystyle f(x)} at y k {\displaystyle y_{k}} and takes an uphill step in that direction. == Types of kernels == Kernel definition: Let X {\displaystyle X} be the n {\displaystyle n} -dimensional Euclidean space, R n {\displaystyle \mathbb {R} ^{n}} . The norm of x {\displaystyle x} is a non-negative number, ‖ x ‖ 2 = x ⊤ x ≥ 0 {\displaystyle \|x\|^{2}=x^{\top }x\geq 0} . A function K : X → R {\displaystyle K:X\rightarrow \mathbb {R} } is said to be a kernel if there exists a profile, k : [ 0 , ∞ ] → R {\displaystyle k:[0,\infty ]\rightarrow \mathbb {R} } , such that K ( x ) = k ( ‖ x ‖ 2 ) {\displaystyle K(x)=k(\|x\|^{2})} and k is non-negative. k is non-increasing: k ( a ) ≥ k ( b ) {\displaystyle k(a)\geq k(b)} if a < b {\displaystyle a Read more →

  • Bigram

    Bigram

    A bigram or digram is a sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words. A bigram is an n-gram for n=2. The frequency distribution of every bigram in a string is commonly used for simple statistical analysis of text in many applications, including in computational linguistics, cryptography, and speech recognition. Gappy bigrams or skipping bigrams are word pairs which allow gaps (perhaps avoiding connecting words, or allowing some simulation of dependencies, as in a dependency grammar). == Applications == Bigrams, along with other n-grams, are used in most successful language models for speech recognition. Bigram frequency attacks can be used in cryptography to solve cryptograms. See frequency analysis. Bigram frequency is one approach to statistical language identification. Some activities in logology or recreational linguistics involve bigrams. These include attempts to find English words beginning with every possible bigram, or words containing a string of repeated bigrams, such as logogogue. == Bigram frequency in the English language == The frequency of the most common letter bigrams in a large English corpus is: th 3.56% of 1.17% io 0.83% he 3.07% ed 1.17% le 0.83% in 2.43% is 1.13% ve 0.83% er 2.05% it 1.12% co 0.79% an 1.99% al 1.09% me 0.79% re 1.85% ar 1.07% de 0.76% on 1.76% st 1.05% hi 0.76% at 1.49% to 1.05% ri 0.73% en 1.45% nt 1.04% ro 0.73% nd 1.35% ng 0.95% ic 0.70% ti 1.34% se 0.93% ne 0.69% es 1.34% ha 0.93% ea 0.69% or 1.28% as 0.87% ra 0.69% te 1.20% ou 0.87% ce 0.65%

    Read more →