Zassenhaus algorithm

Zassenhaus algorithm

In mathematics, the Zassenhaus algorithm is a method to calculate a basis for the intersection and sum of two subspaces of a vector space. It is named after Hans Zassenhaus, but no publication of this algorithm by him is known. It is used in computer algebra systems. == Algorithm == === Input === Let V be a vector space and U, W two finite-dimensional subspaces of V with the following spanning sets: U = ⟨ u 1 , … , u n ⟩ {\displaystyle U=\langle u_{1},\ldots ,u_{n}\rangle } and W = ⟨ w 1 , … , w k ⟩ . {\displaystyle W=\langle w_{1},\ldots ,w_{k}\rangle .} Finally, let B 1 , … , B m {\displaystyle B_{1},\ldots ,B_{m}} be linearly independent vectors so that u i {\displaystyle u_{i}} and w i {\displaystyle w_{i}} can be written as u i = ∑ j = 1 m a i , j B j {\displaystyle u_{i}=\sum _{j=1}^{m}a_{i,j}B_{j}} and w i = ∑ j = 1 m b i , j B j . {\displaystyle w_{i}=\sum _{j=1}^{m}b_{i,j}B_{j}.} === Output === The algorithm computes the base of the sum U + W {\displaystyle U+W} and a base of the intersection U ∩ W {\displaystyle U\cap W} . === Algorithm === The algorithm creates the following block matrix of size ( ( n + k ) × ( 2 m ) ) {\displaystyle ((n+k)\times (2m))} : ( a 1 , 1 a 1 , 2 ⋯ a 1 , m a 1 , 1 a 1 , 2 ⋯ a 1 , m ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ a n , 1 a n , 2 ⋯ a n , m a n , 1 a n , 2 ⋯ a n , m b 1 , 1 b 1 , 2 ⋯ b 1 , m 0 0 ⋯ 0 ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ b k , 1 b k , 2 ⋯ b k , m 0 0 ⋯ 0 ) {\displaystyle {\begin{pmatrix}a_{1,1}&a_{1,2}&\cdots &a_{1,m}&a_{1,1}&a_{1,2}&\cdots &a_{1,m}\\\vdots &\vdots &&\vdots &\vdots &\vdots &&\vdots \\a_{n,1}&a_{n,2}&\cdots &a_{n,m}&a_{n,1}&a_{n,2}&\cdots &a_{n,m}\\b_{1,1}&b_{1,2}&\cdots &b_{1,m}&0&0&\cdots &0\\\vdots &\vdots &&\vdots &\vdots &\vdots &&\vdots \\b_{k,1}&b_{k,2}&\cdots &b_{k,m}&0&0&\cdots &0\end{pmatrix}}} Using elementary row operations, this matrix is transformed to the row echelon form. Then, it has the following shape: ( c 1 , 1 c 1 , 2 ⋯ c 1 , m ∙ ∙ ⋯ ∙ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ c q , 1 c q , 2 ⋯ c q , m ∙ ∙ ⋯ ∙ 0 0 ⋯ 0 d 1 , 1 d 1 , 2 ⋯ d 1 , m ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ 0 0 ⋯ 0 d ℓ , 1 d ℓ , 2 ⋯ d ℓ , m 0 0 ⋯ 0 0 0 ⋯ 0 ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ 0 0 ⋯ 0 0 0 ⋯ 0 ) {\displaystyle {\begin{pmatrix}c_{1,1}&c_{1,2}&\cdots &c_{1,m}&\bullet &\bullet &\cdots &\bullet \\\vdots &\vdots &&\vdots &\vdots &\vdots &&\vdots \\c_{q,1}&c_{q,2}&\cdots &c_{q,m}&\bullet &\bullet &\cdots &\bullet \\0&0&\cdots &0&d_{1,1}&d_{1,2}&\cdots &d_{1,m}\\\vdots &\vdots &&\vdots &\vdots &\vdots &&\vdots \\0&0&\cdots &0&d_{\ell ,1}&d_{\ell ,2}&\cdots &d_{\ell ,m}\\0&0&\cdots &0&0&0&\cdots &0\\\vdots &\vdots &&\vdots &\vdots &\vdots &&\vdots \\0&0&\cdots &0&0&0&\cdots &0\end{pmatrix}}} Here, ∙ {\displaystyle \bullet } stands for arbitrary numbers, and the vectors ( c p , 1 , c p , 2 , … , c p , m ) {\displaystyle (c_{p,1},c_{p,2},\ldots ,c_{p,m})} for every p ∈ { 1 , … , q } {\displaystyle p\in \{1,\ldots ,q\}} and ( d p , 1 , … , d p , m ) {\displaystyle (d_{p,1},\ldots ,d_{p,m})} for every p ∈ { 1 , … , ℓ } {\displaystyle p\in \{1,\ldots ,\ell \}} are nonzero. Then ( y 1 , … , y q ) {\displaystyle (y_{1},\ldots ,y_{q})} with y i := ∑ j = 1 m c i , j B j {\displaystyle y_{i}:=\sum _{j=1}^{m}c_{i,j}B_{j}} is a basis of U + W {\displaystyle U+W} and ( z 1 , … , z ℓ ) {\displaystyle (z_{1},\ldots ,z_{\ell })} with z i := ∑ j = 1 m d i , j B j {\displaystyle z_{i}:=\sum _{j=1}^{m}d_{i,j}B_{j}} is a basis of U ∩ W {\displaystyle U\cap W} . === Proof of correctness === First, we define π 1 : V × V → V , ( a , b ) ↦ a {\displaystyle \pi _{1}:V\times V\to V,(a,b)\mapsto a} to be the projection to the first component. Let H := { ( u , u ) ∣ u ∈ U } + { ( w , 0 ) ∣ w ∈ W } ⊆ V × V . {\displaystyle H:=\{(u,u)\mid u\in U\}+\{(w,0)\mid w\in W\}\subseteq V\times V.} Then π 1 ( H ) = U + W {\displaystyle \pi _{1}(H)=U+W} and H ∩ ( 0 × V ) = 0 × ( U ∩ W ) {\displaystyle H\cap (0\times V)=0\times (U\cap W)} . Also, H ∩ ( 0 × V ) {\displaystyle H\cap (0\times V)} is the kernel of π 1 | H {\displaystyle {\pi _{1}|}_{H}} , the projection restricted to H. Therefore, dim ⁡ ( H ) = dim ⁡ ( U + W ) + dim ⁡ ( U ∩ W ) {\displaystyle \dim(H)=\dim(U+W)+\dim(U\cap W)} . The Zassenhaus algorithm calculates a basis of H. In the first m columns of this matrix, there is a basis y i {\displaystyle y_{i}} of U + W {\displaystyle U+W} . The rows of the form ( 0 , z i ) {\displaystyle (0,z_{i})} (with z i ≠ 0 {\displaystyle z_{i}\neq 0} ) are obviously in H ∩ ( 0 × V ) {\displaystyle H\cap (0\times V)} . Because the matrix is in row echelon form, they are also linearly independent. All rows which are different from zero ( ( y i , ∙ ) {\displaystyle (y_{i},\bullet )} and ( 0 , z i ) {\displaystyle (0,z_{i})} ) are a basis of H, so there are dim ⁡ ( U ∩ W ) {\displaystyle \dim(U\cap W)} such z i {\displaystyle z_{i}} s. Therefore, the z i {\displaystyle z_{i}} s form a basis of U ∩ W {\displaystyle U\cap W} . == Example == Consider the two subspaces U = ⟨ ( 1 − 1 0 1 ) , ( 0 0 1 − 1 ) ⟩ {\displaystyle U=\left\langle \left({\begin{array}{r}1\\-1\\0\\1\end{array}}\right),\left({\begin{array}{r}0\\0\\1\\-1\end{array}}\right)\right\rangle } and W = ⟨ ( 5 0 − 3 3 ) , ( 0 5 − 3 − 2 ) ⟩ {\displaystyle W=\left\langle \left({\begin{array}{r}5\\0\\-3\\3\end{array}}\right),\left({\begin{array}{r}0\\5\\-3\\-2\end{array}}\right)\right\rangle } of the vector space R 4 {\displaystyle \mathbb {R} ^{4}} . Using the standard basis, we create the following matrix of dimension ( 2 + 2 ) × ( 2 ⋅ 4 ) {\displaystyle (2+2)\times (2\cdot 4)} : ( 1 − 1 0 1 1 − 1 0 1 0 0 1 − 1 0 0 1 − 1 5 0 − 3 3 0 0 0 0 0 5 − 3 − 2 0 0 0 0 ) . {\displaystyle \left({\begin{array}{rrrrrrrr}1&-1&0&1&&1&-1&0&1\\0&0&1&-1&&0&0&1&-1\\\\5&0&-3&3&&0&0&0&0\\0&5&-3&-2&&0&0&0&0\end{array}}\right).} Using elementary row operations, we transform this matrix into the following matrix: ( 1 0 0 0 ∙ ∙ ∙ ∙ 0 1 0 − 1 ∙ ∙ ∙ ∙ 0 0 1 − 1 ∙ ∙ ∙ ∙ 0 0 0 0 1 − 1 0 1 ) {\displaystyle \left({\begin{array}{rrrrrrrrr}1&0&0&0&&\bullet &\bullet &\bullet &\bullet \\0&1&0&-1&&\bullet &\bullet &\bullet &\bullet \\0&0&1&-1&&\bullet &\bullet &\bullet &\bullet \\\\0&0&0&0&&1&-1&0&1\end{array}}\right)} (Some entries have been replaced by " ∙ {\displaystyle \bullet } " because they are irrelevant to the result.) Therefore ( ( 1 0 0 0 ) , ( 0 1 0 − 1 ) , ( 0 0 1 − 1 ) ) {\displaystyle \left(\left({\begin{array}{r}1\\0\\0\\0\end{array}}\right),\left({\begin{array}{r}0\\1\\0\\-1\end{array}}\right),\left({\begin{array}{r}0\\0\\1\\-1\end{array}}\right)\right)} is a basis of U + W {\displaystyle U+W} , and ( ( 1 − 1 0 1 ) ) {\displaystyle \left(\left({\begin{array}{r}1\\-1\\0\\1\end{array}}\right)\right)} is a basis of U ∩ W {\displaystyle U\cap W} .

Visual servoing

Visual servoing, also known as vision-based robot control and abbreviated VS, is a technique which uses feedback information extracted from a vision sensor (visual feedback) to control the motion of a robot. One of the earliest papers that talks about visual servoing was from the SRI International Labs in 1979. == Visual servoing taxonomy == There are two fundamental configurations of the robot end-effector (hand) and the camera: Eye-in-hand, or end-point open-loop control, where the camera is attached to the moving hand and observing the relative position of the target. Eye-to-hand, or end-point closed-loop control, where the camera is fixed in the world and observing the target and the motion of the hand. Visual Servoing control techniques are broadly classified into the following types: Image-based (IBVS) Position/pose-based (PBVS) Hybrid approach IBVS was proposed by Weiss and Sanderson. The control law is based on the error between current and desired features on the image plane, and does not involve any estimate of the pose of the target. The features may be the coordinates of visual features, lines or moments of regions. IBVS has difficulties with motions very large rotations, which has come to be called camera retreat. PBVS is a model-based technique (with a single camera). This is because the pose of the object of interest is estimated with respect to the camera and then a command is issued to the robot controller, which in turn controls the robot. In this case the image features are extracted as well, but are additionally used to estimate 3D information (pose of the object in Cartesian space), hence it is servoing in 3D. Hybrid approaches use some combination of the 2D and 3D servoing. There have been a few different approaches to hybrid servoing 2-1/2-D Servoing Motion partition-based Partitioned DOF Based == Survey == The following description of the prior work is divided into 3 parts Survey of existing visual servoing methods. Various features used and their impacts on visual servoing. Error and stability analysis of visual servoing schemes. === Survey of existing visual servoing methods === Visual servo systems, also called servoing, have been around since the early 1980s , although the term visual servo itself was only coined in 1987. Visual Servoing is, in essence, a method for robot control where the sensor used is a camera (visual sensor). Servoing consists primarily of two techniques, one involves using information from the image to directly control the degrees of freedom (DOF) of the robot, thus referred to as Image Based Visual Servoing (IBVS). While the other involves the geometric interpretation of the information extracted from the camera, such as estimating the pose of the target and parameters of the camera (assuming some basic model of the target is known). Other servoing classifications exist based on the variations in each component of a servoing system , e.g. the location of the camera, the two kinds are eye-in-hand and hand–eye configurations. Based on the control loop, the two kinds are end-point-open-loop and end-point-closed-loop. Based on whether the control is applied to the joints (or DOF) directly or as a position command to a robot controller the two types are direct servoing and dynamic look-and-move. Being one of the earliest works the authors proposed a hierarchical visual servo scheme applied to image-based servoing. The technique relies on the assumption that a good set of features can be extracted from the object of interest (e.g. edges, corners and centroids) and used as a partial model along with global models of the scene and robot. The control strategy is applied to a simulation of a two and three DOF robot arm. Feddema et al. introduced the idea of generating task trajectory with respect to the feature velocity. This is to ensure that the sensors are not rendered ineffective (stopping the feedback) for any the robot motions. The authors assume that the objects are known a priori (e.g. CAD model) and all the features can be extracted from the object. The work by Espiau et al. discusses some of the basic questions in visual servoing. The discussions concentrate on modeling of the interaction matrix, camera, visual features (points, lines, etc..). In an adaptive servoing system was proposed with a look-and-move servoing architecture. The method used optical flow along with SSD to provide a confidence metric and a stochastic controller with Kalman filtering for the control scheme. The system assumes (in the examples) that the plane of the camera and the plane of the features are parallel., discusses an approach of velocity control using the Jacobian relationship s˙ = Jv˙ . In addition the author uses Kalman filtering, assuming that the extracted position of the target have inherent errors (sensor errors). A model of the target velocity is developed and used as a feed-forward input in the control loop. Also, mentions the importance of looking into kinematic discrepancy, dynamic effects, repeatability, settling time oscillations and lag in response. Corke poses a set of very critical questions on visual servoing and tries to elaborate on their implications. The paper primarily focuses the dynamics of visual servoing. The author tries to address problems like lag and stability, while also talking about feed-forward paths in the control loop. The paper also, tries to seek justification for trajectory generation, methodology of axis control and development of performance metrics. Chaumette in provides good insight into the two major problems with IBVS. One, servoing to a local minima and second, reaching a Jacobian singularity. The author show that image points alone do not make good features due to the occurrence of singularities. The paper continues, by discussing the possible additional checks to prevent singularities namely, condition numbers of J_s and Jˆ+_s, to check the null space of ˆ J_s and J^T_s . One main point that the author highlights is the relation between local minima and unrealizable image feature motions. Over the years many hybrid techniques have been developed. These involve computing partial/complete pose from Epipolar Geometry using multiple views or multiple cameras. The values are obtained by direct estimation or through a learning or a statistical scheme. While others have used a switching approach that changes between image-based and position-based on a Lyapnov function. The early hybrid techniques that used a combination of image-based and pose-based (2D and 3D information) approaches for servoing required either a full or partial model of the object in order to extract the pose information and used a variety of techniques to extract the motion information from the image. used an affine motion model from the image motion in addition to a rough polyhedral CAD model to extract the object pose with respect to the camera to be able to servo onto the object (on the lines of PBVS). 2-1/2-D visual servoing developed by Malis et al. is a well known technique that breaks down the information required for servoing into an organized fashion which decouples rotations and translations. The papers assume that the desired pose is known a priori. The rotational information is obtained from partial pose estimation, a homography, (essentially 3D information) giving an axis of rotation and the angle (by computing the eigenvalues and eigenvectors of the homography). The translational information is obtained from the image directly by tracking a set of feature points. The only conditions being that the feature points being tracked never leave the field of view and that a depth estimate be predetermined by some off-line technique. 2-1/2-D servoing has been shown to be more stable than the techniques that preceded it. Another interesting observation with this formulation is that the authors claim that the visual Jacobian will have no singularities during the motions. The hybrid technique developed by Corke and Hutchinson, popularly called portioned approach partitions the visual (or image) Jacobian into motions (both rotations and translations) relating X and Y axes and motions related to the Z axis. outlines the technique, to break out columns of the visual Jacobian that correspond to the Z axis translation and rotation (namely, the third and sixth columns). The partitioned approach is shown to handle the Chaumette Conundrum discussed in. This technique requires a good depth estimate in order to function properly. outlines a hybrid approach where the servoing task is split into two, namely main and secondary. The main task is keep the features of interest within the field of view. While the secondary task is to mark a fixation point and use it as a reference to bring the camera to the desired pose. The technique does need a depth estimate from an off-line procedure. The paper discusses two examples for which depth estimates are obtained from robot odometry and by assuming that all

Radial basis function kernel

In machine learning, the radial basis function kernel, or RBF kernel, is a popular kernel function used in various kernelized learning algorithms. In particular, it is commonly used in support vector machine classification. The RBF kernel on two samples x , x ′ ∈ R k {\displaystyle \mathbf {x} ,\mathbf {x'} \in \mathbb {R} ^{k}} , represented as feature vectors in some input space, is defined as K ( x , x ′ ) = exp ⁡ ( − ‖ x − x ′ ‖ 2 2 σ 2 ) {\displaystyle K(\mathbf {x} ,\mathbf {x'} )=\exp \left(-{\frac {\|\mathbf {x} -\mathbf {x'} \|^{2}}{2\sigma ^{2}}}\right)} ‖ x − x ′ ‖ 2 {\displaystyle \textstyle \|\mathbf {x} -\mathbf {x'} \|^{2}} may be recognized as the squared Euclidean distance between the two feature vectors. σ {\displaystyle \sigma } is a free parameter. An equivalent definition involves a parameter γ = 1 2 σ 2 {\displaystyle \textstyle \gamma ={\tfrac {1}{2\sigma ^{2}}}} : K ( x , x ′ ) = exp ⁡ ( − γ ‖ x − x ′ ‖ 2 ) {\displaystyle K(\mathbf {x} ,\mathbf {x'} )=\exp(-\gamma \|\mathbf {x} -\mathbf {x'} \|^{2})} Since the value of the RBF kernel decreases with distance and ranges between zero (in the infinite-distance limit) and one (when x = x'), it has a ready interpretation as a similarity measure. The feature space of the kernel has an infinite number of dimensions; for σ = 1 {\displaystyle \sigma =1} , its expansion using the multinomial theorem is: exp ⁡ ( − 1 2 ‖ x − x ′ ‖ 2 ) = exp ⁡ ( 2 2 x ⊤ x ′ − 1 2 ‖ x ‖ 2 − 1 2 ‖ x ′ ‖ 2 ) = exp ⁡ ( x ⊤ x ′ ) exp ⁡ ( − 1 2 ‖ x ‖ 2 ) exp ⁡ ( − 1 2 ‖ x ′ ‖ 2 ) = ∑ j = 0 ∞ ( x ⊤ x ′ ) j j ! exp ⁡ ( − 1 2 ‖ x ‖ 2 ) exp ⁡ ( − 1 2 ‖ x ′ ‖ 2 ) = ∑ j = 0 ∞ ∑ n 1 + n 2 + ⋯ + n k = j exp ⁡ ( − 1 2 ‖ x ‖ 2 ) x 1 n 1 ⋯ x k n k n 1 ! ⋯ n k ! exp ⁡ ( − 1 2 ‖ x ′ ‖ 2 ) x ′ 1 n 1 ⋯ x ′ k n k n 1 ! ⋯ n k ! = ⟨ φ ( x ) , φ ( x ′ ) ⟩ {\displaystyle {\begin{alignedat}{2}\exp \left(-{\frac {1}{2}}\|\mathbf {x} -\mathbf {x'} \|^{2}\right)&=\exp \left({\frac {2}{2}}\mathbf {x} ^{\top }\mathbf {x'} -{\frac {1}{2}}\|\mathbf {x} \|^{2}-{\frac {1}{2}}\|\mathbf {x'} \|^{2}\right)\\[5pt]&=\exp \left(\mathbf {x} ^{\top }\mathbf {x'} \right)\exp \left(-{\frac {1}{2}}\|\mathbf {x} \|^{2}\right)\exp \left(-{\frac {1}{2}}\|\mathbf {x'} \|^{2}\right)\\[5pt]&=\sum _{j=0}^{\infty }{\frac {(\mathbf {x} ^{\top }\mathbf {x'} )^{j}}{j!}}\exp \left(-{\frac {1}{2}}\|\mathbf {x} \|^{2}\right)\exp \left(-{\frac {1}{2}}\|\mathbf {x'} \|^{2}\right)\\[5pt]&=\sum _{j=0}^{\infty }\quad \sum _{n_{1}+n_{2}+\dots +n_{k}=j}\exp \left(-{\frac {1}{2}}\|\mathbf {x} \|^{2}\right){\frac {x_{1}^{n_{1}}\cdots x_{k}^{n_{k}}}{\sqrt {n_{1}!\cdots n_{k}!}}}\exp \left(-{\frac {1}{2}}\|\mathbf {x'} \|^{2}\right){\frac {{x'}_{1}^{n_{1}}\cdots {x'}_{k}^{n_{k}}}{\sqrt {n_{1}!\cdots n_{k}!}}}\\[5pt]&=\langle \varphi (\mathbf {x} ),\varphi (\mathbf {x'} )\rangle \end{alignedat}}} φ ( x ) = exp ⁡ ( − 1 2 ‖ x ‖ 2 ) ( a ℓ 0 ( 0 ) , a 1 ( 1 ) , … , a ℓ 1 ( 1 ) , … , a 1 ( j ) , … , a ℓ j ( j ) , … ) {\displaystyle \varphi (\mathbf {x} )=\exp \left(-{\frac {1}{2}}\|\mathbf {x} \|^{2}\right)\left(a_{\ell _{0}}^{(0)},a_{1}^{(1)},\dots ,a_{\ell _{1}}^{(1)},\dots ,a_{1}^{(j)},\dots ,a_{\ell _{j}}^{(j)},\dots \right)} where ℓ j = ( k + j − 1 j ) {\displaystyle \ell _{j}={\tbinom {k+j-1}{j}}} , a ℓ ( j ) = x 1 n 1 ⋯ x k n k n 1 ! ⋯ n k ! | n 1 + n 2 + ⋯ + n k = j ∧ 1 ≤ ℓ ≤ ℓ j {\displaystyle a_{\ell }^{(j)}={\frac {x_{1}^{n_{1}}\cdots x_{k}^{n_{k}}}{\sqrt {n_{1}!\cdots n_{k}!}}}\quad |\quad n_{1}+n_{2}+\dots +n_{k}=j\wedge 1\leq \ell \leq \ell _{j}} == Approximations == Because support vector machines and other models employing the kernel trick do not scale well to large numbers of training samples or large numbers of features in the input space, several approximations to the RBF kernel (and similar kernels) have been introduced. Typically, these take the form of a function z that maps a single vector to a vector of higher dimensionality, approximating the kernel: ⟨ z ( x ) , z ( x ′ ) ⟩ ≈ ⟨ φ ( x ) , φ ( x ′ ) ⟩ = K ( x , x ′ ) {\displaystyle \langle z(\mathbf {x} ),z(\mathbf {x'} )\rangle \approx \langle \varphi (\mathbf {x} ),\varphi (\mathbf {x'} )\rangle =K(\mathbf {x} ,\mathbf {x'} )} where φ {\displaystyle \textstyle \varphi } is the implicit mapping embedded in the RBF kernel. === Fourier random features === One way to construct such a z is to randomly sample from the Fourier transformation of the kernel φ ( x ) = 1 D [ cos ⁡ ⟨ w 1 , x ⟩ , sin ⁡ ⟨ w 1 , x ⟩ , … , cos ⁡ ⟨ w D , x ⟩ , sin ⁡ ⟨ w D , x ⟩ ] T {\displaystyle \varphi (x)={\frac {1}{\sqrt {D}}}[\cos \langle w_{1},x\rangle ,\sin \langle w_{1},x\rangle ,\ldots ,\cos \langle w_{D},x\rangle ,\sin \langle w_{D},x\rangle ]^{T}} where w 1 , . . . , w D {\displaystyle w_{1},...,w_{D}} are independent samples from the normal distribution N ( 0 , σ − 2 I ) {\displaystyle N(0,\sigma ^{-2}I)} . Theorem: E ⁡ [ ⟨ φ ( x ) , φ ( y ) ⟩ ] = e ‖ x − y ‖ 2 / ( 2 σ 2 ) . {\displaystyle \operatorname {E} [\langle \varphi (x),\varphi (y)\rangle ]=e^{\|x-y\|^{2}/(2\sigma ^{2})}.} Proof: It suffices to prove the case of D = 1 {\displaystyle D=1} . Use the trigonometric identity cos ⁡ ( a − b ) = cos ⁡ ( a ) cos ⁡ ( b ) + sin ⁡ ( a ) sin ⁡ ( b ) {\displaystyle \cos(a-b)=\cos(a)\cos(b)+\sin(a)\sin(b)} , the spherical symmetry of Gaussian distribution, then evaluate the integral ∫ − ∞ ∞ cos ⁡ ( k x ) e − x 2 / 2 2 π d x = e − k 2 / 2 . {\displaystyle \int _{-\infty }^{\infty }{\frac {\cos(kx)e^{-x^{2}/2}}{\sqrt {2\pi }}}dx=e^{-k^{2}/2}.} Theorem: Var ⁡ [ ⟨ φ ( x ) , φ ( y ) ⟩ ] = O ( D − 1 ) {\displaystyle \operatorname {Var} [\langle \varphi (x),\varphi (y)\rangle ]=O(D^{-1})} . (Appendix A.2). === Nyström method === Another approach uses the Nyström method to approximate the eigendecomposition of the Gram matrix K, using only a random sample of the training set.

LogitBoost

In machine learning and computational learning theory, LogitBoost is a boosting algorithm formulated by Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The original paper casts the AdaBoost algorithm into a statistical framework. Specifically, if one considers AdaBoost as a generalized additive model and then applies the cost function of logistic regression, one can derive the LogitBoost algorithm. == Minimizing the LogitBoost cost function == LogitBoost can be seen as a convex optimization. Specifically, given that we seek an additive model of the form f = ∑ t α t h t {\displaystyle f=\sum _{t}\alpha _{t}h_{t}} the LogitBoost algorithm minimizes the logistic loss: ∑ i log ⁡ ( 1 + e − y i f ( x i ) ) {\displaystyle \sum _{i}\log \left(1+e^{-y_{i}f(x_{i})}\right)}

Ordination (statistics)

Ordination or gradient analysis, in multivariate analysis, is a method complementary to data clustering, and used mainly in exploratory data analysis (rather than in hypothesis testing). In contrast to cluster analysis, ordination orders quantities in a (usually lower-dimensional) latent space. In the ordination space, quantities that are near each other share attributes (i.e., are similar to some degree), and dissimilar objects are farther from each other. Such relationships between the objects, on each of several axes or latent variables, are then characterized numerically and/or graphically in a biplot. The first ordination method, principal components analysis, was suggested by Karl Pearson in 1901. == Methods == Ordination methods can broadly be categorized in eigenvector-, algorithm-, or model-based methods. Many classical ordination techniques, including principal components analysis, correspondence analysis (CA) and its derivatives (detrended correspondence analysis, canonical correspondence analysis, and redundancy analysis, belong to the first group). The second group includes some distance-based methods such as non-metric multidimensional scaling, and machine learning methods such as T-distributed stochastic neighbor embedding and nonlinear dimensionality reduction. The third group includes model-based ordination methods, which can be considered as multivariate extensions of Generalized Linear Models. Model-based ordination methods are more flexible in their application than classical ordination methods, so that it is for example possible to include random-effects. Unlike in the aforementioned two groups, there is no (implicit or explicit) distance measure in the ordination. Instead, a distribution needs to be specified for the responses as is typical for statistical models. These and other assumptions, such as the assumed mean-variance relationship, can be validated with the use of residual diagnostics, unlike in other ordination methods. == Applications == Ordination can be used on the analysis of any set of multivariate objects. It is frequently used in several environmental or ecological sciences, particularly plant community ecology. It is also used in genetics and systems biology for microarray data analysis and in psychometrics.

Decision tree pruning

Pruning is a data compression technique in machine learning and search algorithms that reduces the size of decision trees by removing sections of the tree that are non-critical and redundant to classify instances. Pruning reduces the complexity of the final classifier, and hence improves predictive accuracy by the reduction of overfitting. One of the questions that arises in a decision tree algorithm is the optimal size of the final tree. A tree that is too large risks overfitting the training data and poorly generalizing to new samples. A small tree might not capture important structural information about the sample space. However, it is hard to tell when a tree algorithm should stop because it is impossible to tell if the addition of a single extra node will dramatically decrease error. This problem is known as the horizon effect. A common strategy is to grow the tree until each node contains a small number of instances then use pruning to remove nodes that do not provide additional information. Pruning should reduce the size of a learning tree without reducing predictive accuracy as measured by a cross-validation set. There are many techniques for tree pruning that differ in the measurement that is used to optimize performance. == Techniques == Pruning processes can be divided into two types (pre- and post-pruning). Pre-pruning procedures prevent a complete induction of the training set by replacing a stop () criterion in the induction algorithm (e.g. max. Tree depth or information gain (Attr)> minGain). Pre-pruning methods are considered to be more efficient because they do not induce an entire set, but rather trees remain small from the start. Prepruning methods share a common problem, the horizon effect. This is to be understood as the undesired premature termination of the induction by the stop () criterion. Post-pruning (or just pruning) is the most common way of simplifying trees. Here, nodes and subtrees are replaced with leaves to reduce complexity. Pruning can not only significantly reduce the size but also improve the classification accuracy of unseen objects. It may be the case that the accuracy of the assignment on the train set deteriorates, but the accuracy of the classification properties of the tree increases overall. The procedures are differentiated on the basis of their approach in the tree (top-down or bottom-up). === Bottom-up pruning === These procedures start at the last node in the tree (the lowest point). Following recursively upwards, they determine the relevance of each individual node. If the relevance for the classification is not given, the node is dropped or replaced by a leaf. The advantage is that no relevant sub-trees can be lost with this method. These methods include Reduced Error Pruning (REP), Minimum Cost Complexity Pruning (MCCP), or Minimum Error Pruning (MEP). === Top-down pruning === In contrast to the bottom-up method, this method starts at the root of the tree. Following the structure below, a relevance check is carried out which decides whether a node is relevant for the classification of all n items or not. By pruning the tree at an inner node, it can happen that an entire sub-tree (regardless of its relevance) is dropped. One of these representatives is pessimistic error pruning (PEP), which brings quite good results with unseen items. == Pruning algorithms == === Reduced error pruning === One of the simplest forms of pruning is reduced error pruning. Starting at the leaves, each node is replaced with its most popular class. If the prediction accuracy is not affected then the change is kept. While somewhat naive, reduced error pruning has the advantage of simplicity and speed. === Cost complexity pruning === Cost complexity pruning generates a series of trees ⁠ T 0 … T m {\displaystyle T_{0}\dots T_{m}} ⁠ where ⁠ T 0 {\displaystyle T_{0}} ⁠ is the initial tree and ⁠ T m {\displaystyle T_{m}} ⁠ is the root alone. At step ⁠ i {\displaystyle i} ⁠, the tree is created by removing a subtree from tree ⁠ i − 1 {\displaystyle i-1} ⁠ and replacing it with a leaf node with value chosen as in the tree building algorithm. The subtree that is removed is chosen as follows: Define the error rate of tree ⁠ T {\displaystyle T} ⁠ over data set ⁠ S {\displaystyle S} ⁠ as ⁠ err ⁡ ( T , S ) {\displaystyle \operatorname {err} (T,S)} ⁠. The subtree t {\displaystyle t} that minimizes err ⁡ ( prune ⁡ ( T , t ) , S ) − err ⁡ ( T , S ) | leaves ⁡ ( T ) | − | leaves ⁡ ( prune ⁡ ( T , t ) ) | {\displaystyle {\frac {\operatorname {err} (\operatorname {prune} (T,t),S)-\operatorname {err} (T,S)}{\left\vert \operatorname {leaves} (T)\right\vert -\left\vert \operatorname {leaves} (\operatorname {prune} (T,t))\right\vert }}} is chosen for removal. The function ⁠ prune ⁡ ( T , t ) {\displaystyle \operatorname {prune} (T,t)} ⁠ defines the tree obtained by pruning the subtrees ⁠ t {\displaystyle t} ⁠ from the tree ⁠ T {\displaystyle T} ⁠. Once the series of trees has been created, the best tree is chosen by generalized accuracy as measured by a training set or cross-validation. == Examples == Pruning could be applied in a compression scheme of a learning algorithm to remove the redundant details without compromising the model's performances. In neural networks, pruning removes entire neurons or layers of neurons.

PVLV

The primary value learned value (PVLV) model is a possible explanation for the reward-predictive firing properties of dopamine (DA) neurons. It simulates behavioral and neural data on Pavlovian conditioning and the midbrain dopaminergic neurons that fire in proportion to unexpected rewards. It is an alternative to the temporal-differences (TD) algorithm. It is used as part of Leabra.