Edge inference

Edge inference

Edge inference is the process of running machine learning or deep learning models on local devices (edge devices) such as smartphones, IoT devices, embedded systems, and edge servers instead of centralized cloud computing infrastructure. A key feature of edge computing is edge inference, which allows for real-time data processing, low latency, and improved privacy by reducing the amount of data sent to remote servers.

Client-side persistent data

Client-side persistent data or CSPD is a term used in computing for storing data required by web applications to complete internet tasks on the client-side as needed rather than exclusively on the server. As a framework it is one solution to the needs of Occasionally connected computing or OCC. A major challenge for HTTP as a stateless protocol has been asynchronous tasks. The AJAX pattern using XMLHttpRequest was first introduced by Microsoft in the context of the Outlook e-mail product. The first CSPD were the 'cookies' introduced by the Netscape Navigator. ActiveX components which have entries in the Windows registry can also be viewed as a form of client-side persistence.

Persian Speech Corpus

The Persian Speech Corpus is a Modern Persian speech corpus for speech synthesis. The corpus contains phonetic and orthographic transcriptions of about 2.5 hours of Persian speech aligned with recorded speech on the phoneme level, including annotations of word boundaries. Previous spoken corpora of Persian include FARSDAT, which consists of read aloud speech from newspaper texts from 100 Persian speakers and the Telephone FARsi Spoken language DATabase (TFARSDAT) which comprises seven hours of read and spontaneous speech produced by 60 native speakers of Persian from ten regions of Iran. The Persian Speech Corpus was built using the same methodologies laid out in the doctoral project on Modern Standard Arabic of Nawar Halabi at the University of Southampton. The work was funded by MicroLinkPC, who own an exclusive license to commercialise the corpus, though the corpus is available for non-commercial use through the corpus' website. It is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The corpus was built for speech synthesis purposes, but has been used for building HMM based voices in Persian. It can also be used to automatically align other speech corpora with their phonetic transcript and could be used as part of a larger corpus for training speech recognition systems. == Contents == The corpus is downloadable from its website, and contains the following: 396 .wav files containing spoken utterances 396 .lab files containing text utterances 396 .TextGrid files containing the phoneme labels with time stamps of the boundaries where these occur in the .wav files. phonetic-transcript.txt which has the form "[wav_filename]" "[Phoneme Sequence]" in every line orthographic-transcript.txt which has the form "[wav_filename]" "[Orthographic Transcript]" in every line

Kernel method

In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best known member is the support-vector machine (SVM). These methods involve using linear classifiers to solve nonlinear problems. The general task of pattern analysis is to find and study general types of relations (for example clusters, rankings, principal components, correlations, classifications) in datasets. For many algorithms that solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a user-specified feature map: in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over all pairs of data points computed using inner products. The feature map in kernel machines is infinite dimensional but only requires a finite dimensional matrix from user-input according to the representer theorem. Kernel machines are slow to compute for datasets larger than a couple of thousand examples without parallel processing. Kernel methods owe their name to the use of kernel functions, which enable them to operate in a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space, but rather by simply computing the inner products between the images of all pairs of data in the feature space. This operation is often computationally cheaper than the explicit computation of the coordinates. This approach is called the "kernel trick". Kernel functions have been introduced for sequence data, graphs, text, images, as well as vectors. Algorithms capable of operating with kernels include the kernel perceptron, support-vector machines (SVM), Gaussian processes, principal components analysis (PCA), canonical correlation analysis, ridge regression, spectral clustering, linear adaptive filters and many others. Most kernel algorithms are based on convex optimization or eigenproblems and are statistically well-founded. Typically, their statistical properties are analyzed using statistical learning theory (for example, using Rademacher complexity). == Motivation and informal explanation == Kernel methods can be thought of as instance-based learners: rather than learning some fixed set of parameters corresponding to the features of their inputs, they instead "remember" the i {\displaystyle i} -th training example ( x i , y i ) {\displaystyle (\mathbf {x} _{i},y_{i})} and learn for it a corresponding weight w i {\displaystyle w_{i}} . Prediction for unlabeled inputs, i.e., those not in the training set, are treated by the application of a similarity function k {\displaystyle k} , called a kernel, between the unlabeled input x ′ {\displaystyle \mathbf {x'} } and each of the training inputs x i {\displaystyle \mathbf {x} _{i}} . For instance, a kernelized binary classifier typically computes a weighted sum of similarities y ^ = sgn ⁡ ∑ i = 1 n w i y i k ( x i , x ′ ) , {\displaystyle {\hat {y}}=\operatorname {sgn} \sum _{i=1}^{n}w_{i}y_{i}k(\mathbf {x} _{i},\mathbf {x'} ),} where y ^ ∈ { − 1 , + 1 } {\displaystyle {\hat {y}}\in \{-1,+1\}} is the kernelized binary classifier's predicted label for the unlabeled input x ′ {\displaystyle \mathbf {x'} } whose hidden true label y {\displaystyle y} is of interest; k : X × X → R {\displaystyle k\colon {\mathcal {X}}\times {\mathcal {X}}\to \mathbb {R} } is the kernel function that measures similarity between any pair of inputs x , x ′ ∈ X {\displaystyle \mathbf {x} ,\mathbf {x'} \in {\mathcal {X}}} ; the sum ranges over the n labeled examples { ( x i , y i ) } i = 1 n {\displaystyle \{(\mathbf {x} _{i},y_{i})\}_{i=1}^{n}} in the classifier's training set, with y i ∈ { − 1 , + 1 } {\displaystyle y_{i}\in \{-1,+1\}} ; the w i ∈ R {\displaystyle w_{i}\in \mathbb {R} } are the weights for the training examples, as determined by the learning algorithm; the sign function sgn {\displaystyle \operatorname {sgn} } determines whether the predicted classification y ^ {\displaystyle {\hat {y}}} comes out positive or negative. Kernel classifiers were described as early as the 1960s, with the invention of the kernel perceptron. They rose to great prominence with the popularity of the support-vector machine (SVM) in the 1990s, when the SVM was found to be competitive with neural networks on tasks such as handwriting recognition. == Mathematics: the kernel trick == The kernel trick avoids the explicit mapping that is needed to get linear learning algorithms to learn a nonlinear function or decision boundary. For all x {\displaystyle \mathbf {x} } and x ′ {\displaystyle \mathbf {x'} } in the input space X {\displaystyle {\mathcal {X}}} , certain functions k ( x , x ′ ) {\displaystyle k(\mathbf {x} ,\mathbf {x'} )} can be expressed as an inner product in another space V {\displaystyle {\mathcal {V}}} . The function k : X × X → R {\displaystyle k\colon {\mathcal {X}}\times {\mathcal {X}}\to \mathbb {R} } is often referred to as a kernel or a kernel function. The word "kernel" is used in mathematics to denote a weighting function for a weighted sum or integral. Certain problems in machine learning have more structure than an arbitrary weighting function k {\displaystyle k} . The computation is made much simpler if the kernel can be written in the form of a "feature map" φ : X → V {\displaystyle \varphi \colon {\mathcal {X}}\to {\mathcal {V}}} which satisfies k ( x , x ′ ) = ⟨ φ ( x ) , φ ( x ′ ) ⟩ V . {\displaystyle k(\mathbf {x} ,\mathbf {x'} )=\langle \varphi (\mathbf {x} ),\varphi (\mathbf {x'} )\rangle _{\mathcal {V}}.} The key restriction is that ⟨ ⋅ , ⋅ ⟩ V {\displaystyle \langle \cdot ,\cdot \rangle _{\mathcal {V}}} must be a proper inner product. On the other hand, an explicit representation for φ {\displaystyle \varphi } is not necessary, as long as V {\displaystyle {\mathcal {V}}} is an inner product space. The alternative follows from Mercer's theorem: an implicitly defined function φ {\displaystyle \varphi } exists whenever the space X {\displaystyle {\mathcal {X}}} can be equipped with a suitable measure ensuring the function k {\displaystyle k} satisfies Mercer's condition. Mercer's theorem is similar to a generalization of the result from linear algebra that associates an inner product to any positive-definite matrix. In fact, Mercer's condition can be reduced to this simpler case. If we choose as our measure the counting measure μ ( T ) = | T | {\displaystyle \mu (T)=|T|} for all T ⊂ X {\displaystyle T\subset X} , which counts the number of points inside the set T {\displaystyle T} , then the integral in Mercer's theorem reduces to a summation ∑ i = 1 n ∑ j = 1 n k ( x i , x j ) c i c j ≥ 0. {\displaystyle \sum _{i=1}^{n}\sum _{j=1}^{n}k(\mathbf {x} _{i},\mathbf {x} _{j})c_{i}c_{j}\geq 0.} If this summation holds for all finite sequences of points ( x 1 , … , x n ) {\displaystyle (\mathbf {x} _{1},\dotsc ,\mathbf {x} _{n})} in X {\displaystyle {\mathcal {X}}} and all choices of n {\displaystyle n} real-valued coefficients ( c 1 , … , c n ) {\displaystyle (c_{1},\dots ,c_{n})} (cf. positive definite kernel), then the function k {\displaystyle k} satisfies Mercer's condition. Some algorithms that depend on arbitrary relationships in the native space X {\displaystyle {\mathcal {X}}} would, in fact, have a linear interpretation in a different setting: the range space of φ {\displaystyle \varphi } . The linear interpretation gives us insight about the algorithm. Furthermore, there is often no need to compute φ {\displaystyle \varphi } directly during computation, as is the case with support-vector machines. Some cite this running time shortcut as the primary benefit. Researchers also use it to justify the meanings and properties of existing algorithms. Theoretically, a Gram matrix K ∈ R n × n {\displaystyle \mathbf {K} \in \mathbb {R} ^{n\times n}} with respect to { x 1 , … , x n } {\displaystyle \{\mathbf {x} _{1},\dotsc ,\mathbf {x} _{n}\}} (sometimes also called a "kernel matrix"), where K i j = k ( x i , x j ) {\displaystyle K_{ij}=k(\mathbf {x} _{i},\mathbf {x} _{j})} , must be positive semi-definite (PSD). Empirically, for machine learning heuristics, choices of a function k {\displaystyle k} that do not satisfy Mercer's condition may still perform reasonably if k {\displaystyle k} at least approximates the intuitive idea of similarity. Regardless of whether k {\displaystyle k} is a Mercer kernel, k {\displaystyle k} may still be referred to as a "kernel". If the kernel function k {\displaystyle k} is also a covariance function as used in Gaussian processes, then the Gram matrix K {\displaystyle \mathbf {K} } can also be called a covariance matrix. == Applications == Application areas of kernel methods are diverse and include geostatistics, kriging, inverse distance weighting, 3D reconstruction, bioinformatics, cheminformatics, information extraction and handwriting recognition. == Popular kernels == Fisher kernel Graph kernels Kernel smoother Polynomial kernel Radial basis function kern

Ordination (statistics)

Ordination or gradient analysis, in multivariate analysis, is a method complementary to data clustering, and used mainly in exploratory data analysis (rather than in hypothesis testing). In contrast to cluster analysis, ordination orders quantities in a (usually lower-dimensional) latent space. In the ordination space, quantities that are near each other share attributes (i.e., are similar to some degree), and dissimilar objects are farther from each other. Such relationships between the objects, on each of several axes or latent variables, are then characterized numerically and/or graphically in a biplot. The first ordination method, principal components analysis, was suggested by Karl Pearson in 1901. == Methods == Ordination methods can broadly be categorized in eigenvector-, algorithm-, or model-based methods. Many classical ordination techniques, including principal components analysis, correspondence analysis (CA) and its derivatives (detrended correspondence analysis, canonical correspondence analysis, and redundancy analysis, belong to the first group). The second group includes some distance-based methods such as non-metric multidimensional scaling, and machine learning methods such as T-distributed stochastic neighbor embedding and nonlinear dimensionality reduction. The third group includes model-based ordination methods, which can be considered as multivariate extensions of Generalized Linear Models. Model-based ordination methods are more flexible in their application than classical ordination methods, so that it is for example possible to include random-effects. Unlike in the aforementioned two groups, there is no (implicit or explicit) distance measure in the ordination. Instead, a distribution needs to be specified for the responses as is typical for statistical models. These and other assumptions, such as the assumed mean-variance relationship, can be validated with the use of residual diagnostics, unlike in other ordination methods. == Applications == Ordination can be used on the analysis of any set of multivariate objects. It is frequently used in several environmental or ecological sciences, particularly plant community ecology. It is also used in genetics and systems biology for microarray data analysis and in psychometrics.

Business Controls Corporation

Business Controls Corporation is a privately held computer company that developed an application-program-generator and also a series of accounting software packages. These packages were widely enough used for various business magazines to have back-of-the-book ads for companies seeking accountants with experience in one or more of them. Computer magazines ran coverage for their SB-5 application-program-generator as from time to time new versions were released, each with new or improved features. == Early days == The company's initial offerings were packages for the DEC PDP-8, although Business Controls Corporation also wrote custom-written programs for customers. Large customers with mainframes who also used smaller systems for departmental use and distributed processing also used BCC's services. == SB-5 == The addition of an application-program-generator named SB-5 that, from specifications, could generate COBOL code was a major step forward. Although this began with supporting the DEC PDP-11, they subsequently began to support COBOL on DEC's DECsystem-10 & DECSYSTEM-20. VAX support came later. The specifications also permitted COBOL inserts and overrides: SB-5 could build an application that was all COBOL, yet only code the portions that varied from BCC's "vanilla" accounting packages. === Similar offerings === A similar idea was done for the IBM mainframe world in the form of a series of application-program-generators from Dylakor Corporation. They were named DYL-250, DYL-260, DYL-270 & DYL-280. Dylakor was acquired by Computer Associates. The specific syntax was different, but it had wider use, and - a mark of success and recognition in the industry - syntax-compatible implementations were released by a competitor. Still another alternative was Peat Marwick Mitchell's PMM2170 application-program-generator package. Like the others, it supported COBOL inserts and overrides. === Extended integration === Business Controls Corporation subsequently extended SB-5's feature set to provide support for System 1022, a product for the DECsystem-10 & DECSYSTEM-20; 1022's vendor also had a VAX/VMS (later OpenVMS) product, System 1032.

Shattered set

A class of sets is said to shatter another set if it is possible to "pick out" any element of that set using intersection. The concept of shattered sets plays an important role in Vapnik–Chervonenkis theory, also known as VC-theory. Shattering and VC-theory are used in the study of empirical processes as well as in statistical computational learning theory. == Definition == Suppose A is a set and C is a class of sets. The class C shatters the set A if for each subset a of A, there is some element c of C such that a = c ∩ A . {\displaystyle a=c\cap A.} Equivalently, C shatters A when their intersection is equal to A's power set: P(A) = { c ∩ A | c ∈ C }. We employ the letter C to refer to a "class" or "collection" of sets, as in a Vapnik–Chervonenkis class (VC-class). The set A is often assumed to be finite because, in empirical processes, we are interested in the shattering of finite sets of data points. == Example == We will show that the class of all discs in the plane (two-dimensional space) does not shatter every set of four points on the unit circle, yet the class of all convex sets in the plane does shatter every finite set of points on the unit circle. Let A be a set of four points on the unit circle and let C be the class of all discs. To test where C shatters A, we attempt to draw a disc around every subset of points in A. First, we draw a disc around the subsets of each isolated point. Next, we try to draw a disc around every subset of point pairs. This turns out to be doable for adjacent points, but impossible for points on opposite sides of the circle. Any attempt to include those points on the opposite side will necessarily include other points not in that pair. Hence, any pair of opposite points cannot be isolated out of A using intersections with class C and so C does not shatter A. As visualized below: Because there is some subset which can not be isolated by any disc in C, we conclude then that A is not shattered by C. And, with a bit of thought, we can prove that no set of four points is shattered by this C. However, if we redefine C to be the class of all elliptical discs, we find that we can still isolate all the subsets from above, as well as the points that were formerly problematic. Thus, this specific set of 4 points is shattered by the class of elliptical discs. Visualized below: With a bit of thought, we could generalize that any set of finite points on a unit circle could be shattered by the class of all convex sets (visualize connecting the dots). == Shatter coefficient == To quantify the richness of a collection C of sets, we use the concept of shattering coefficients (also known as the growth function). For a collection C of sets s ⊂ Ω {\displaystyle s\subset \Omega } , Ω {\displaystyle \Omega } being any space, often a sample space, we define the nth shattering coefficient of C as S C ( n ) = max ∀ x 1 , x 2 , … , x n ∈ Ω card ⁡ { { x 1 , x 2 , … , x n } ∩ s , s ∈ C } {\displaystyle S_{C}(n)=\max _{\forall x_{1},x_{2},\dots ,x_{n}\in \Omega }\operatorname {card} \{\,\{\,x_{1},x_{2},\dots ,x_{n}\}\cap s,s\in C\}} where card {\displaystyle \operatorname {card} } denotes the cardinality of the set and x 1 , x 2 , … , x n ∈ Ω {\displaystyle x_{1},x_{2},\dots ,x_{n}\in \Omega } is any set of n points,. S C ( n ) {\displaystyle S_{C}(n)} is the largest number of subsets of any set A of n points that can be formed by intersecting A with the sets in collection C. For example, if set A contains 3 points, its power set, P ( A ) {\displaystyle P(A)} , contains 2 3 = 8 {\displaystyle 2^{3}=8} elements. If C shatters A, its shattering coefficient(3) would be 8 and S C ( 2 ) {\displaystyle S_{C}(2)} would be 2 2 = 4 {\displaystyle 2^{2}=4} . However, if one of those sets in P ( A ) {\displaystyle P(A)} cannot be obtained through intersections in c, then S C ( 3 ) {\displaystyle S_{C}(3)} would only be 7. If none of those sets can be obtained, S C ( 3 ) {\displaystyle S_{C}(3)} would be 0. Additionally, if S C ( 2 ) = 3 {\displaystyle S_{C}(2)=3} , for example, then there is an element in the set of all 2-point sets from A that cannot be obtained from intersections with C. It follows from this that S C ( 3 ) {\displaystyle S_{C}(3)} would also be less than 8 (i.e. C would not shatter A) because we have already located a "missing" set in the smaller power set of 2-point sets. This example illustrates some properties of S C ( n ) {\displaystyle S_{C}(n)} : S C ( n ) ≤ 2 n {\displaystyle S_{C}(n)\leq 2^{n}} for all n because { s ∩ A | s ∈ C } ⊆ P ( A ) {\displaystyle \{s\cap A|s\in C\}\subseteq P(A)} for any A ⊆ Ω {\displaystyle A\subseteq \Omega } . If S C ( n ) = 2 n {\displaystyle S_{C}(n)=2^{n}} , that means there is a set of cardinality n, which can be shattered by C. If S C ( N ) < 2 N {\displaystyle S_{C}(N)<2^{N}} for some N > 1 {\displaystyle N>1} then S C ( n ) < 2 n {\displaystyle S_{C}(n)<2^{n}} for all n ≥ N {\displaystyle n\geq N} . The third property means that if C cannot shatter any set of cardinality N then it can not shatter sets of larger cardinalities. == Vapnik–Chervonenkis class == If A cannot be shattered by C, there will be a smallest value of n that makes the shatter coefficient(n) less than 2 n {\displaystyle 2^{n}} because as n gets larger, there are more sets that could be missed. Alternatively, there is also a largest value of n for which the S C ( n ) {\displaystyle S_{C}(n)} is still 2 n {\displaystyle 2^{n}} , because as n gets smaller, there are fewer sets that could be omitted. The extreme of this is S C ( 0 ) {\displaystyle S_{C}(0)} (the shattering coefficient of the empty set), which must always be 2 0 = 1 {\displaystyle 2^{0}=1} . These statements lends themselves to defining the VC dimension of a class C as: V C ( C ) = min n { n : S C ( n ) < 2 n } {\displaystyle VC(C)={\underset {n}{\min }}\{n:S_{C}(n)<2^{n}\}\,} or, alternatively, as V C 0 ( C ) = max n { n : S C ( n ) = 2 n } . {\displaystyle VC_{0}(C)={\underset {n}{\max }}\{n:S_{C}(n)=2^{n}\}.\,} Note that V C ( C ) = V C 0 ( C ) + 1. {\displaystyle VC(C)=VC_{0}(C)+1.} . The VC dimension is usually defined as V C 0 {\displaystyle VC_{0}} , the largest cardinality of points chosen that will still shatter A (i.e. n such that S C ( n ) = 2 n {\displaystyle S_{C}(n)=2^{n}} ). Altneratively, if for any n there is a set of cardinality n which can be shattered by C, then S C ( n ) = 2 n {\displaystyle S_{C}(n)=2^{n}} for all n and the VC dimension of this class C is infinite. A class with finite VC dimension is called a Vapnik–Chervonenkis class or VC class. A class C is uniformly Glivenko–Cantelli if and only if it is a VC class.