AI for Business

Explore the best AI for Business — independent reviews, comparisons, pricing and step-by-step how-to guides, curated by Aizhi.

  • Video browsing

    Video browsing

    Video browsing, also known as exploratory video search, is the interactive process of skimming through video content in order to satisfy some information need or to interactively check if the video content is relevant. While originally proposed to help users inspecting a single video through visual thumbnails, modern video browsing tools enable users to quickly find desired information in a video archive by iterative human–computer interaction through an exploratory search approach. Many of these tools presume a smart user that wants features to interactively inspect video content, as well as automatic content filtering features. For that purpose, several video interaction features are usually provided, such as sophisticated navigation in video or search by a content-based query. Video browsing tools often build on lower-level video content analysis, such as shot transition detection, keyframe extraction, semantic concept detection, and create a structured content overview of the video file or video archive. Furthermore, they usually provide sophisticated navigation features, such as advanced timelines, visual seeker bars or a list of selected thumbnails, as well as means for content querying. Examples of content queries are shot filtering through visual concepts (e.g., only shots showing cars), through some specific characteristics (e.g., color or motion filtering), through user-provided sketches (e.g., a visually drawn sketch), or through content-based similarity search. == History == Video browsing was originally proposed by Iranian engineer Farshid Arman, Taiwanese computer scientist Arding Hsu, and computer scientist Ming-Yee Chiu, while working at Siemens, and it was presented at the ACM International Conference in August 1993. They described a shot detection algorithm for compressed video that was originally encoded with discrete cosine transform (DCT) video coding standards such as JPEG, MPEG and H.26x. The basic idea was that, since the DCT coefficients are mathematically related to the spatial domain and represent the content of each frame, they can be used to detect the differences between video frames. In the algorithm, a subset of blocks in a frame and a subset of DCT coefficients for each block are used as motion vector representation for the frame. By operating on compressed DCT representations, the algorithm significantly reduces the computational requirements for decompression and enables effective video browsing. The algorithm represents separate shots of a video sequence by an r-frame, a thumbnail of the shot framed by a motion tracking region. A variation of this concept was later adopted for QBIC video content mosaics, where each r-frame is a salient still from the shot it represents. === Video Notebook === Modern video browsing solutions include Video Notebook, a Menlo Park startup founded in 2021 by Mike Lanza, which uses computer vision to extract slides and optical character recognition and speech recognition to facilitate video search. The software can be either used on the client side (using a browser extension), where the slides and text are extracted while the video is watched (e.g. on a video platform like YouTube or Udemy), or on the server side. Processed videos, which can be viewed in the Video Notebook web app, feature a video browsing user interface with extracted timestamped slides, a search bar for querying the video (or a collection of videos), and text chapters. Video Notebook customers include organisations like Ernst & Young. === Video Browser Showdown === The Video Browser Showdown (VBS) is an annual live evaluation competition for exploratory video search tools, where international researchers use video browsing tools to solve ad-hoc video search tasks on a moderately large data set as fast as possible. The main goal of the VBS, which started in 2012 at the International Conference on MultiMedia Modeling (MMM), is to advance the performance of video browsing tools. Since 2016, the VBS also collaborates with TRECVID. The aim of the VBS is to evaluate video browsing tools for efficiency at known-item search (KIS) tasks with a well-defined data set in direct comparison to other tools.

    Read more →
  • Parity benchmark

    Parity benchmark

    Parity problems are widely used as benchmark problems in genetic programming but inherited from the artificial neural network community. Parity is calculated by summing all the binary inputs and reporting if the sum is odd or even. This is considered difficult because: a very simple artificial neural network cannot solve it, and all inputs need to be considered and a change to any one of them changes the answer.

    Read more →
  • Jackknife variance estimates for random forest

    Jackknife variance estimates for random forest

    In statistics, jackknife variance estimates for random forest are a way to estimate the variance in random forest models, in order to eliminate the bootstrap effects. == Jackknife variance estimates == The sampling variance of bagged learners is: V ( x ) = V a r [ θ ^ ∞ ( x ) ] {\displaystyle V(x)=Var[{\hat {\theta }}^{\infty }(x)]} Jackknife estimates can be considered to eliminate the bootstrap effects. The jackknife variance estimator is defined as: V ^ j = n − 1 n ∑ i = 1 n ( θ ^ ( − i ) − θ ¯ ) 2 {\displaystyle {\hat {V}}_{j}={\frac {n-1}{n}}\sum _{i=1}^{n}({\hat {\theta }}_{(-i)}-{\overline {\theta }})^{2}} In some classification problems, when random forest is used to fit models, jackknife estimated variance is defined as: V ^ j = n − 1 n ∑ i = 1 n ( t ¯ ( − i ) ⋆ ( x ) − t ¯ ⋆ ( x ) ) 2 {\displaystyle {\hat {V}}_{j}={\frac {n-1}{n}}\sum _{i=1}^{n}({\overline {t}}_{(-i)}^{\star }(x)-{\overline {t}}^{\star }(x))^{2}} Here, t ⋆ {\displaystyle t^{\star }} denotes a decision tree after training, t ( − i ) ⋆ {\displaystyle t_{(-i)}^{\star }} denotes the result based on samples without i t h {\displaystyle ith} observation. == Examples == E-mail spam problem is a common classification problem, in this problem, 57 features are used to classify spam e-mail and non-spam e-mail. Applying IJ-U variance formula to evaluate the accuracy of models with m=15,19 and 57. The results shows in paper( Confidence Intervals for Random Forests: The jackknife and the Infinitesimal Jackknife ) that m = 57 random forest appears to be quite unstable, while predictions made by m=5 random forest appear to be quite stable, this results is corresponding to the evaluation made by error percentage, in which the accuracy of model with m=5 is high and m=57 is low. Here, accuracy is measured by error rate, which is defined as: E r r o r R a t e = 1 N ∑ i = 1 N ∑ j = 1 M y i j , {\displaystyle ErrorRate={\frac {1}{N}}\sum _{i=1}^{N}\sum _{j=1}^{M}y_{ij},} Here N is also the number of samples, M is the number of classes, y i j {\displaystyle y_{ij}} is the indicator function which equals 1 when i t h {\displaystyle ith} observation is in class j, equals 0 when in other classes. No probability is considered here. There is another method which is similar to error rate to measure accuracy: l o g l o s s = 1 N ∑ i = 1 N ∑ j = 1 M y i j l o g ( p i j ) {\displaystyle logloss={\frac {1}{N}}\sum _{i=1}^{N}\sum _{j=1}^{M}y_{ij}log(p_{ij})} Here N is the number of samples, M is the number of classes, y i j {\displaystyle y_{ij}} is the indicator function which equals 1 when i t h {\displaystyle ith} observation is in class j, equals 0 when in other classes. p i j {\displaystyle p_{ij}} is the predicted probability of i t h {\displaystyle ith} observation in class j {\displaystyle j} .This method is used in Kaggle These two methods are very similar. == Modification for bias == When using Monte Carlo MSEs for estimating V I J ∞ {\displaystyle V_{IJ}^{\infty }} and V J ∞ {\displaystyle V_{J}^{\infty }} , a problem about the Monte Carlo bias should be considered, especially when n is large, the bias is getting large: E [ V ^ I J B ] − V ^ I J ∞ ≈ n ∑ b = 1 B ( t b ⋆ − t ¯ ⋆ ) 2 B {\displaystyle E[{\hat {V}}_{IJ}^{B}]-{\hat {V}}_{IJ}^{\infty }\approx {\frac {n\sum _{b=1}^{B}(t_{b}^{\star }-{\bar {t}}^{\star })^{2}}{B}}} To eliminate this influence, bias-corrected modifications are suggested: V ^ I J − U B = V ^ I J B − n ∑ b = 1 B ( t b ⋆ − t ¯ ⋆ ) 2 B {\displaystyle {\hat {V}}_{IJ-U}^{B}={\hat {V}}_{IJ}^{B}-{\frac {n\sum _{b=1}^{B}(t_{b}^{\star }-{\bar {t}}^{\star })^{2}}{B}}} V ^ J − U B = V ^ J B − ( e − 1 ) n ∑ b = 1 B ( t b ⋆ − t ¯ ⋆ ) 2 B {\displaystyle {\hat {V}}_{J-U}^{B}={\hat {V}}_{J}^{B}-(e-1){\frac {n\sum _{b=1}^{B}(t_{b}^{\star }-{\bar {t}}^{\star })^{2}}{B}}}

    Read more →
  • Error-driven learning

    Error-driven learning

    In reinforcement learning, error-driven learning is a method for adjusting a model's (intelligent agent's) parameters based on the difference between its output results and the ground truth. These models stand out as they depend on environmental feedback, rather than explicit labels or categories. They are based on the idea that language acquisition involves the minimization of the prediction error (MPSE). By leveraging these prediction errors, the models consistently refine expectations and decrease computational complexity. Typically, these algorithms are operated by the GeneRec algorithm. Error-driven learning has widespread applications in cognitive sciences and computer vision. These methods have also found successful application in natural language processing (NLP), including areas like part-of-speech tagging, parsing, named entity recognition (NER), machine translation (MT), speech recognition (SR), and dialogue systems. == Formal Definition == Error-driven learning models are ones that rely on the feedback of prediction errors to adjust the expectations or parameters of a model. The key components of error-driven learning include the following: A set S {\displaystyle S} of states representing the different situations that the learner can encounter. A set A {\displaystyle A} of actions that the learner can take in each state. A prediction function P ( s , a ) {\displaystyle P(s,a)} that gives the learner's current prediction of the outcome of taking action a {\displaystyle a} in state s {\displaystyle s} . An error function E ( o , p ) {\displaystyle E(o,p)} that compares the actual outcome o {\displaystyle o} with the prediction p {\displaystyle p} and produces an error value. An update rule U ( p , e ) {\displaystyle U(p,e)} that adjusts the prediction p {\displaystyle p} in light of the error e {\displaystyle e} . == Algorithms == Error-driven learning algorithms refer to a category of reinforcement learning algorithms that leverage the disparity between the real output and the expected output of a system to regulate the system's parameters. Typically applied in supervised learning, these algorithms are provided with a collection of input-output pairs to facilitate the process of generalization. The widely utilized error backpropagation learning algorithm is known as GeneRec, a generalized recirculation algorithm primarily employed for gene prediction in DNA sequences. Many other error-driven learning algorithms are derived from alternative versions of GeneRec. == Applications == === Cognitive science === Simpler error-driven learning models effectively capture complex human cognitive phenomena and anticipate elusive behaviors. They provide a flexible mechanism for modeling the brain's learning process, encompassing perception, attention, memory, and decision-making. By using errors as guiding signals, these algorithms adeptly adapt to changing environmental demands and objectives, capturing statistical regularities and structure. Furthermore, cognitive science has led to the creation of new error-driven learning algorithms that are both biologically acceptable and computationally efficient. These algorithms, including deep belief networks, spiking neural networks, and reservoir computing, follow the principles and constraints of the brain and nervous system. Their primary aim is to capture the emergent properties and dynamics of neural circuits and systems. === Computer vision === Computer vision is a complex task that involves understanding and interpreting visual data, such as images or videos. In the context of error-driven learning, the computer vision model learns from the mistakes it makes during the interpretation process. When an error is encountered, the model updates its internal parameters to avoid making the same mistake in the future. This repeated process of learning from errors helps improve the model's performance over time. For NLP to do well at computer vision, it employs deep learning techniques. This form of computer vision is sometimes called neural computer vision (NCV), since it makes use of neural networks. NCV therefore interprets visual data based on a statistical, trial and error approach and can deal with context and other subtleties of visual data. === Natural Language Processing === ==== Part-of-speech tagging ==== Part-of-speech (POS) tagging is a crucial component in Natural Language Processing (NLP). It helps resolve human language ambiguity at different analysis levels. In addition, its output (tagged data) can be used in various applications of NLP such as information extraction, information retrieval, question Answering, speech eecognition, text-to-speech conversion, partial parsing, and grammar correction. ==== Parsing ==== Parsing in NLP involves breaking down a text into smaller pieces (phrases) based on grammar rules. If a sentence cannot be parsed, it may contain grammatical errors. In the context of error-driven learning, the parser learns from the mistakes it makes during the parsing process. When an error is encountered, the parser updates its internal model to avoid making the same mistake in the future. This iterative process of learning from errors helps improve the parser's performance over time. In conclusion, error-driven learning plays a crucial role in improving the accuracy and efficiency of NLP parsers by allowing them to learn from their mistakes and adapt their internal models accordingly. ==== Named entity recognition (NER) ==== NER is the task of identifying and classifying entities (such as persons, locations, organizations, etc.) in a text. Error-driven learning can help the model learn from its false positives and false negatives and improve its recall and precision on (NER). In the context of error-driven learning, the significance of NER is quite profound. Traditional sequence labeling methods identify nested entities layer by layer. If an error occurs in the recognition of an inner entity, it can lead to incorrect identification of the outer entity, leading to a problem known as error propagation of nested entities. This is where the role of NER becomes crucial in error-driven learning. By accurately recognizing and classifying entities, it can help minimize these errors and improve the overall accuracy of the learning process. Furthermore, deep learning-based NER methods have shown to be more accurate as they are capable of assembling words, enabling them to understand the semantic and syntactic relationship between various words better. ==== Machine translation ==== Machine translation is a complex task that involves converting text from one language to another. In the context of error-driven learning, the machine translation model learns from the mistakes it makes during the translation process. When an error is encountered, the model updates its internal parameters to avoid making the same mistake in the future. This iterative process of learning from errors helps improve the model's performance over time. ==== Speech recognition ==== Speech recognition is a complex task that involves converting spoken language into written text. In the context of error-driven learning, the speech recognition model learns from the mistakes it makes during the recognition process. When an error is encountered, the model updates its internal parameters to avoid making the same mistake in the future. This iterative process of learning from errors helps improve the model's performance over time. ==== Dialogue systems ==== Dialogue systems are a popular NLP task as they have promising real-life applications. They are also complicated tasks since many NLP tasks deserving study are involved. In the context of error-driven learning, the dialogue system learns from the mistakes it makes during the dialogue process. When an error is encountered, the model updates its internal parameters to avoid making the same mistake in the future. This iterative process of learning from errors helps improve the model's performance over time. == Advantages == Error-driven learning has several advantages over other types of machine learning algorithms: They can learn from feedback and correct their mistakes, which makes them adaptive and robust to noise and changes in the data. They can handle large and high-dimensional data sets, as they do not require explicit feature engineering or prior knowledge of the data distribution. They can achieve high accuracy and performance, as they can learn complex and nonlinear relationships between the input and the output. == Limitations == Although error driven learning has its advantages, their algorithms also have the following limitations: They can suffer from overfitting, which means that they memorize the training data and fail to generalize to new and unseen data. This can be mitigated by using regularization techniques, such as adding a penalty term to the loss function, or reducing the complexity of the model. They can be sensitive to the choice of

    Read more →
  • Spotify Live

    Spotify Live

    Spotify Live, formerly Spotify Greenroom, was a social audio app by Spotify, that allowed users to host or participate in live-audio virtual environments called "room" for conversations. Each room had a maximum capacity of 1000 people. The app was available on Android and iOS, competing with Twitter Spaces and Clubhouse in the social media segment. It was shut down on April 30, 2023. == History == In October 2020, Betty Labs released Locker Room exclusively on the iOS App Store. The app featured virtual audio chat rooms for sports enthusiasts. In late March 2021, Spotify acquired Betty Labs for $50 million and announced plans to rebrand the app with a broader focus on sports, music, and pop culture. On June 16, 2021, Spotify launched the app as Spotify Greenroom on Android (early access) and iOS, expanding its scope beyond just sports. At launch, Spotify introduced the Greenroom Creator Fund to support creators and shows, serving as a rival to Clubhouse's Creator First Accelerator Program. The fund aimed to provide a monetization path for podcasters integrating Greenroom into their verified Spotify accounts. By July 2021, the app had accumulated over 140,000 iOS installs and 100,000 Android installs. In August 2021, Spotify collaborated with the WWE to produce professional wrestling-related podcasts, many of which would be recorded by The Ringer, Spotify's in-house podcasting team, using Greenroom. In March 2022, Spotify Greenroom announced its rebranding as Spotify Live and its migration to the main Spotify app. After a year, Spotify announced it would shut down the Spotify Live app at the end of April 2023. == Features == Greenroom allowed users to create or join a room, which, in the context of the application, was a virtual space for real-time voice chats. Users could only create a room within a pre-defined group, representing either a brand or a generic category. If a user chose to create a room, they became the host, with the ability to invite people, control who could talk, and enable features like recording and the Discussions tab during room creation. Enabling recording displayed a disclaimer informing users that the conversation was being recorded, and the audio, recorded in mp4 format, would be sent to the host via email after the room concluded. If the Discussions tab was enabled, users could send text messages in the public chat section. The host also had the authority to ban users if necessary. When joining a room, a user could opt to be a listener or request to become a speaker. Users had the freedom to follow or block others and join groups at their discretion. Notifications about new rooms in joined groups would be sent to users. Additionally, users could discover new individuals and groups using the search tab. == Partnered creators == By October 2021, Spotify had a variety of partnered creators aimed at boosting traffic and validating its vertically integrated podcast model. These creators primarily focused on Generation Z. In-house Spotify talent, such as The Ringer, produced sports-related content. Simultaneously, the company recruited creators from various social channels to grow Greenroom's audience while also promoting its integration with Spotify and Anchor. Each verified Spotify partner had their Greenroom shows featured in both the Greenroom app and their profiles on the Spotify app. This was part of the company's strategy leading into the 2022 ramp-up to compete with Clubhouse. == Platforms == The app was accessible on both Android and iOS platforms, and users could download the app from their respective app stores. Android users needed Android 8 or above to launch the app, while iOS consumers required iOS 13 or later to run it.

    Read more →
  • Vanishing gradient problem

    Vanishing gradient problem

    In machine learning, the vanishing gradient problem is the problem of greatly diverging gradient magnitudes between earlier and later layers encountered when training neural networks with backpropagation. In such methods, neural network weights are updated proportional to their partial derivative of the loss function. As the number of forward propagation steps in a network increases, for instance due to greater network depth, the gradients of earlier weights are calculated with increasingly many multiplications. These multiplications shrink the gradient magnitude. Consequently, the gradients of earlier weights will be exponentially smaller than the gradients of later weights. This difference in gradient magnitude might introduce instability in the training process, slow it, or halt it entirely. For instance, consider the hyperbolic tangent activation function. The gradients of this function are in range [0,1]. The product of repeated multiplication with such gradients decreases exponentially. The inverse problem, when weight gradients at earlier layers get exponentially larger, is called the exploding gradient problem. Backpropagation allowed researchers to train supervised deep artificial neural networks from scratch, initially with little success. Hochreiter's diplom thesis of 1991 formally identified the reason for this failure in the "vanishing gradient problem", which not only affects many-layered feedforward networks, but also recurrent networks. The latter are trained by unfolding them into very deep feedforward networks, where a new layer is created for each time-step of an input sequence processed by the network (the combination of unfolding and backpropagation is termed backpropagation through time). == Prototypical models == This section is based on the paper On the difficulty of training Recurrent Neural Networks by Pascanu, Mikolov, and Bengio. === Recurrent network model === A generic recurrent network has hidden states h 1 , h 2 , … {\displaystyle h_{1},h_{2},\dots } , inputs u 1 , u 2 , … {\displaystyle u_{1},u_{2},\dots } , and outputs x 1 , x 2 , … {\displaystyle x_{1},x_{2},\dots } . Let it be parameterized by θ {\displaystyle \theta } , so that the system evolves as ( h t , x t ) = F ( h t − 1 , u t , θ ) {\displaystyle (h_{t},x_{t})=F(h_{t-1},u_{t},\theta )} Often, the output x t {\displaystyle x_{t}} is a function of h t {\displaystyle h_{t}} , as some x t = G ( h t ) {\displaystyle x_{t}=G(h_{t})} . The vanishing gradient problem already presents itself clearly when x t = h t {\displaystyle x_{t}=h_{t}} , so we simplify our notation to the special case with: x t = F ( x t − 1 , u t , θ ) {\displaystyle x_{t}=F(x_{t-1},u_{t},\theta )} Now, take its differential: d x t = ∇ θ F ( x t − 1 , u t , θ ) d θ + ∇ x F ( x t − 1 , u t , θ ) d x t − 1 = ∇ θ F ( x t − 1 , u t , θ ) d θ + ∇ x F ( x t − 1 , u t , θ ) [ ∇ θ F ( x t − 2 , u t − 1 , θ ) d θ + ∇ x F ( x t − 2 , u t − 1 , θ ) d x t − 2 ] ⋮ = [ ∇ θ F ( x t − 1 , u t , θ ) + ∇ x F ( x t − 1 , u t , θ ) ∇ θ F ( x t − 2 , u t − 1 , θ ) + ⋯ ] d θ {\displaystyle {\begin{aligned}dx_{t}&=\nabla _{\theta }F(x_{t-1},u_{t},\theta )d\theta +\nabla _{x}F(x_{t-1},u_{t},\theta )dx_{t-1}\\&=\nabla _{\theta }F(x_{t-1},u_{t},\theta )d\theta +\nabla _{x}F(x_{t-1},u_{t},\theta )\left[\nabla _{\theta }F(x_{t-2},u_{t-1},\theta )d\theta +\nabla _{x}F(x_{t-2},u_{t-1},\theta )dx_{t-2}\right]\\&\;\;\vdots \\&=\left[\nabla _{\theta }F(x_{t-1},u_{t},\theta )+\nabla _{x}F(x_{t-1},u_{t},\theta )\nabla _{\theta }F(x_{t-2},u_{t-1},\theta )+\cdots \right]d\theta \end{aligned}}} Training the network requires us to define a loss function to be minimized. Let it be L ( x T , u 1 , … , u T ) {\displaystyle L(x_{T},u_{1},\dots ,u_{T})} , then minimizing it by gradient descent gives Δ θ = − η ⋅ [ ∇ x L ( x T ) ( ∇ θ F ( x t − 1 , u t , θ ) + ∇ x F ( x t − 1 , u t , θ ) ∇ θ F ( x t − 2 , u t − 1 , θ ) + ⋯ ) ] T {\displaystyle \Delta \theta =-\eta \cdot \left[\nabla _{x}L(x_{T})\left(\nabla _{\theta }F(x_{t-1},u_{t},\theta )+\nabla _{x}F(x_{t-1},u_{t},\theta )\nabla _{\theta }F(x_{t-2},u_{t-1},\theta )+\cdots \right)\right]^{T}} where η {\displaystyle \eta } is the learning rate. The vanishing/exploding gradient problem appears because there are repeated multiplications, of the form ∇ x F ( x t − 1 , u t , θ ) ∇ x F ( x t − 2 , u t − 1 , θ ) ∇ x F ( x t − 3 , u t − 2 , θ ) ⋯ {\displaystyle \nabla _{x}F(x_{t-1},u_{t},\theta )\nabla _{x}F(x_{t-2},u_{t-1},\theta )\nabla _{x}F(x_{t-3},u_{t-2},\theta )\cdots } ==== Example: recurrent network with sigmoid activation ==== For a concrete example, consider a typical recurrent network defined by x t = F ( x t − 1 , u t , θ ) = W rec σ ( x t − 1 ) + W in u t + b {\displaystyle x_{t}=F(x_{t-1},u_{t},\theta )=W_{\text{rec}}\sigma (x_{t-1})+W_{\text{in}}u_{t}+b} where θ = ( W rec , W in ) {\displaystyle \theta =(W_{\text{rec}},W_{\text{in}})} is the network parameter, σ {\displaystyle \sigma } is the sigmoid activation function, applied to each vector coordinate separately, and b {\displaystyle b} is the bias vector. Then, ∇ x F ( x t − 1 , u t , θ ) = W rec diag ⁡ ( σ ′ ( x t − 1 ) ) {\displaystyle \nabla _{x}F(x_{t-1},u_{t},\theta )=W_{\text{rec}}\operatorname {diag} (\sigma '(x_{t-1}))} , and so ∇ x F ( x t − 1 , u t , θ ) ∇ x F ( x t − 2 , u t − 1 , θ ) ⋯ ∇ x F ( x t − k , u t − k + 1 , θ ) = W rec diag ⁡ ( σ ′ ( x t − 1 ) ) W rec diag ⁡ ( σ ′ ( x t − 2 ) ) ⋯ W rec diag ⁡ ( σ ′ ( x t − k ) ) {\displaystyle {\begin{aligned}&\nabla _{x}F(x_{t-1},u_{t},\theta )\nabla _{x}F(x_{t-2},u_{t-1},\theta )\cdots \nabla _{x}F(x_{t-k},u_{t-k+1},\theta )\\&=W_{\text{rec}}\operatorname {diag} (\sigma '(x_{t-1}))W_{\text{rec}}\operatorname {diag} (\sigma '(x_{t-2}))\cdots W_{\text{rec}}\operatorname {diag} (\sigma '(x_{t-k}))\end{aligned}}} Since | σ ′ | ≤ 1 {\displaystyle \left|\sigma '\right|\leq 1} , the operator norm of the above multiplication is bounded above by ‖ W rec ‖ k {\displaystyle \left\|W_{\text{rec}}\right\|^{k}} . So if the spectral radius of W rec {\displaystyle W_{\text{rec}}} is γ < 1 {\displaystyle \gamma <1} , then at large k {\displaystyle k} , the above multiplication has operator norm bounded above by γ k → 0 {\displaystyle \gamma ^{k}\to 0} . This is the prototypical vanishing gradient problem. The effect of a vanishing gradient is that the network cannot learn long-range effects. Recall Equation (loss differential): ∇ θ L = ∇ x L ( x T , u 1 , … , u T ) [ ∇ θ F ( x t − 1 , u t , θ ) + ∇ x F ( x t − 1 , u t , θ ) ∇ θ F ( x t − 2 , u t − 1 , θ ) + ⋯ ] {\displaystyle \nabla _{\theta }L=\nabla _{x}L(x_{T},u_{1},\dots ,u_{T})\left[\nabla _{\theta }F(x_{t-1},u_{t},\theta )+\nabla _{x}F(x_{t-1},u_{t},\theta )\nabla _{\theta }F(x_{t-2},u_{t-1},\theta )+\cdots \right]} The components of ∇ θ F ( x , u , θ ) {\displaystyle \nabla _{\theta }F(x,u,\theta )} are just components of σ ( x ) {\displaystyle \sigma (x)} and u {\displaystyle u} , so if u t , u t − 1 , … {\displaystyle u_{t},u_{t-1},\dots } are bounded, then ‖ ∇ θ F ( x t − k − 1 , u t − k , θ ) ‖ {\displaystyle \left\|\nabla _{\theta }F(x_{t-k-1},u_{t-k},\theta )\right\|} is also bounded by some M > 0 {\displaystyle M>0} , and so the terms in ∇ θ L {\displaystyle \nabla _{\theta }L} decay as M γ k {\displaystyle M\gamma ^{k}} . This means that, effectively, ∇ θ L {\displaystyle \nabla _{\theta }L} is affected only by the first O ( γ − 1 ) {\displaystyle O(\gamma ^{-1})} terms in the sum. If γ ≥ 1 {\displaystyle \gamma \geq 1} , the above analysis does not quite work. For the prototypical exploding gradient problem, the next model is clearer. === Dynamical systems model === Following (Doya, 1993), consider this one-neuron recurrent network with sigmoid activation: x t + 1 = ( 1 − ε ) x t + ε σ ( w x t + b ) + ε w ′ u t {\displaystyle x_{t+1}=(1-\varepsilon )x_{t}+\varepsilon \sigma (wx_{t}+b)+\varepsilon w'u_{t}} At the small ε {\displaystyle \varepsilon } limit, the dynamics of the network becomes d x d t = − x ( t ) + σ ( w x ( t ) + b ) + w ′ u ( t ) {\displaystyle {\frac {dx}{dt}}=-x(t)+\sigma (wx(t)+b)+w'u(t)} Consider first the autonomous case, with u = 0 {\displaystyle u=0} . Set w = 5.0 {\displaystyle w=5.0} , and vary b {\displaystyle b} in [ − 3 , − 2 ] {\displaystyle [-3,-2]} . As b {\displaystyle b} decreases, the system has 1 stable point, then has 2 stable points and 1 unstable point, and finally has 1 stable point again. Explicitly, the stable points are ( x , b ) = ( x , ln ⁡ ( x 1 − x ) − 5 x ) {\displaystyle (x,b)=\left(x,\ln \left({\frac {x}{1-x}}\right)-5x\right)} . Now consider Δ x ( T ) Δ x ( 0 ) {\displaystyle {\frac {\Delta x(T)}{\Delta x(0)}}} and Δ x ( T ) Δ b {\displaystyle {\frac {\Delta x(T)}{\Delta b}}} , where T {\displaystyle T} is large enough that the system has settled into one of the stable points. If ( x ( 0 ) , b ) {\displaystyle (x(0),b)} puts the system very close to an unstable point, then a tiny variation in x ( 0 ) {\displaystyle x(0)} or b {\displaystyle b} wo

    Read more →
  • Optical character recognition

    Optical character recognition

    Optical character recognition (OCR) or optical character reader is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo (for example the text on signs and billboards in a landscape photo) or from subtitle text superimposed on an image (for example: from a television broadcast). Widely used as a form of data entry from printed paper data records – whether passport documents, invoices, bank statements, computerized receipts, business cards, mail, printed data, or any suitable documentation – it is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as cognitive computing, machine translation, (extracted) text-to-speech, key data and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision. Early versions needed to be trained with images of each character, and worked on one font at a time. Advanced systems capable of producing a high degree of accuracy for most fonts are now common, and with support for a variety of image file format inputs. Some systems are capable of reproducing formatted output that closely approximates the original page including images, columns, and other non-textual components. == History == Early optical character recognition may be traced to technologies involving telegraphy and creating reading devices for the blind. In 1914, Emanuel Goldberg developed a machine that read characters and converted them into standard telegraph code. Concurrently, Edmund Fournier d'Albe developed the Optophone, a handheld scanner that when moved across a printed page, produced tones that corresponded to specific letters or characters. In the late 1920s and into the 1930s, Emanuel Goldberg developed what he called a "Statistical Machine" for searching microfilm archives using an optical code recognition system. In 1931, he was granted US Patent number 1,838,389 for the invention. The patent was acquired by IBM. === Visually impaired users === In 1974, Ray Kurzweil started the company Kurzweil Computer Products, Inc. and continued development of omni-font OCR, which could recognize text printed in virtually any font. (Kurzweil is often credited with inventing omni-font OCR, but it was in use by companies, including CompuScan, in the late 1960s and 1970s.) Kurzweil used the technology to create a reading machine for blind people to have a computer read text to them out loud. The device included a CCD-type flatbed scanner and a text-to-speech synthesizer. On January 13, 1976, the finished product was unveiled during a widely reported news conference headed by Kurzweil and the leaders of the National Federation of the Blind. In 1978, Kurzweil Computer Products began selling a commercial version of the optical character recognition computer program. LexisNexis was one of the first customers, and bought the program to upload legal paper and news documents onto its nascent online databases. Two years later, Kurzweil sold his company to Xerox, which eventually spun it off as Scansoft, which merged with Nuance Communications. In the 2000s, OCR was made available online as a service (WebOCR), in a cloud computing environment, and in mobile applications like real-time translation of foreign-language signs on a smartphone. With the advent of smartphones and smartglasses, OCR can be used in internet connected mobile device applications that extract text captured using the device's camera. These devices that do not have built-in OCR functionality will typically use an OCR API to extract the text from the image file captured by the device. The OCR API returns the extracted text, along with information about the location of the detected text in the original image back to the device app for further processing (such as text-to-speech) or display. Various commercial and open source OCR systems are available for most common writing systems, including Latin, Cyrillic, Arabic, Hebrew, Indic, Bengali (Bangla), Devanagari, Tamil, Chinese, Japanese, and Korean characters. == Applications == OCR engines have been developed into software applications specializing in various subjects such as receipts, invoices, checks, and legal billing documents. The software can be used for: Entering data for business documents, e.g. checks, passports, invoices, bank statements and receipts Automatic number-plate recognition Passport recognition and information extraction in airports Automatically extracting key information from insurance documents Traffic-sign recognition Extracting business card information into a contact list Creating textual versions of printed documents, e.g. book scanning for Project Gutenberg Making electronic images of printed documents searchable, e.g. Google Books Converting handwriting in real-time to control a computer (pen computing) Defeating or testing the robustness of CAPTCHA anti-bot systems, though these are specifically designed to prevent OCR. Assistive technology for blind and visually impaired users Writing instructions for vehicles by identifying CAD images in a database that are appropriate to the vehicle design as it changes in real time Making scanned documents searchable by converting them to PDFs == Types == Optical character recognition (OCR) – targets typewritten text, one glyph or character at a time. Optical word recognition – targets typewritten text, one word at a time (for languages that use a space as a word divider). Usually just called "OCR". Intelligent character recognition (ICR) – also targets handwritten printscript or cursive text one glyph or character at a time, usually involving machine learning. Intelligent word recognition (IWR) – also targets handwritten printscript or cursive text, one word at a time. This is especially useful for languages where glyphs are not separated in cursive script. OCR is generally an offline process, which analyses a static document. There are cloud based services which provide an online OCR API service. Handwriting movement analysis can be used as input to handwriting recognition. Instead of merely using the shapes of glyphs and words, this technique is able to capture motion, such as the order in which segments are drawn, the direction, and the pattern of putting the pen down and lifting it. This additional information can make the process more accurate. This technology is also known as "online character recognition", "dynamic character recognition", "real-time character recognition", and "intelligent character recognition". == Techniques == === Pre-processing === OCR software often pre-processes images to improve the chances of successful recognition. Techniques include: De-skewing – if the document was not aligned properly when scanned, it may need to be tilted a few degrees clockwise or counterclockwise in order to make lines of text perfectly horizontal or vertical. Despeckling – removal of positive and negative spots, smoothing edges Binarization – conversion of an image from color or greyscale to black-and-white (called a binary image because there are two colors). The task is performed as a simple way of separating the text (or any other desired image component) from the background. The task of binarization is necessary since most commercial recognition algorithms work only on binary images, as it is simpler to do so. In addition, the effectiveness of binarization influences to a significant extent the quality of character recognition, and careful decisions are made in the choice of the binarization employed for a given input image type; since the quality of the method used to obtain the binary result depends on the type of image (scanned document, scene text image, degraded historical document, etc.). Line removal – Cleaning up non-glyph boxes and lines Layout analysis or zoning – Identification of columns, paragraphs, captions, etc. as distinct blocks. Especially important in multi-column layouts and tables. Line and word detection – Establishment of a baseline for word and character shapes, separating words as necessary. Script recognition – In multilingual documents, the script may change at the level of the words and hence, identification of the script is necessary, before the right OCR can be invoked to handle the specific script. Character isolation or segmentation – For per-character OCR, multiple characters that are connected due to image artifacts must be separated; single characters that are broken into multiple pieces due to artifacts must be connected. Normalization of aspect ratio and scale Segmentation of fixed-pitch fonts is accomplished relatively simply by aligning the image to a uniform grid based on where vertical grid lines will least often intersect black areas. For proportional fonts, more sophisticated techniques are needed because whitespace bet

    Read more →
  • Ordination (statistics)

    Ordination (statistics)

    Ordination or gradient analysis, in multivariate analysis, is a method complementary to data clustering, and used mainly in exploratory data analysis (rather than in hypothesis testing). In contrast to cluster analysis, ordination orders quantities in a (usually lower-dimensional) latent space. In the ordination space, quantities that are near each other share attributes (i.e., are similar to some degree), and dissimilar objects are farther from each other. Such relationships between the objects, on each of several axes or latent variables, are then characterized numerically and/or graphically in a biplot. The first ordination method, principal components analysis, was suggested by Karl Pearson in 1901. == Methods == Ordination methods can broadly be categorized in eigenvector-, algorithm-, or model-based methods. Many classical ordination techniques, including principal components analysis, correspondence analysis (CA) and its derivatives (detrended correspondence analysis, canonical correspondence analysis, and redundancy analysis, belong to the first group). The second group includes some distance-based methods such as non-metric multidimensional scaling, and machine learning methods such as T-distributed stochastic neighbor embedding and nonlinear dimensionality reduction. The third group includes model-based ordination methods, which can be considered as multivariate extensions of Generalized Linear Models. Model-based ordination methods are more flexible in their application than classical ordination methods, so that it is for example possible to include random-effects. Unlike in the aforementioned two groups, there is no (implicit or explicit) distance measure in the ordination. Instead, a distribution needs to be specified for the responses as is typical for statistical models. These and other assumptions, such as the assumed mean-variance relationship, can be validated with the use of residual diagnostics, unlike in other ordination methods. == Applications == Ordination can be used on the analysis of any set of multivariate objects. It is frequently used in several environmental or ecological sciences, particularly plant community ecology. It is also used in genetics and systems biology for microarray data analysis and in psychometrics.

    Read more →
  • JAUS Tool Set

    JAUS Tool Set

    The JAUS Tool Set (JTS) is a software engineering tool for the design of software services used in a distributed computing environment. JTS provides a graphical user interface (GUI) and supporting tools for the rapid design, documentation, and implementation of service interfaces that adhere to the Society of Automotive Engineers' standard AS5684A, the JAUS Service Interface Design Language (JSIDL). JTS is designed to support the modeling, analysis, implementation, and testing of the protocol for an entire distributed system. == Overview == The JAUS Tool Set (JTS) is a set of open source software specification and development tools accompanied by an open source software framework to develop Joint Architecture for Unmanned Systems (JAUS) designs and compliant interface implementations for simulations and control of robotic components per SAE-AS4 standards. JTS consists of the components: GUI based Service Editor: The Service Editor (referred to as the GUI in this document) provides a user friendly interface with which a system designer can specify and analyze formal specifications of Components and Services defined using the JAUS Service Interface Definition Language (JSIDL). Validator: A syntactic and semantic validator provides on-the-fly validation of specifications entered (or imported) by the user with respect to JSIDL syntax and semantics is integrated into the GUI. Specification Repository: A repository (or database) that is integrated into the GUI that allows for the storage of and encourages the reuse of existing formal specifications. C++ Code Generator: The Code Generator automatically generates C++ code that has a 1:1 mapping to the formal specifications. The generated code includes all aspects of the service, including the implementations of marshallers and unmarshallers for messages, and implementations of finite-state machines for protocol behavior that are effectively decoupled from application behavior. Document Generator: The Document Generator automatically generates documentation for sets of Service Definitions. Documents may be generated in several formats. Software Framework: The software framework implements the transport layer specification AS5669A, and provides the interfaces necessary to integrate the auto-generated C++ code with the transport layer implementation. Present transport options include UDP and TCP in wired or wireless networks, as well as serial connections. The transport layer itself is modular, and allows end-users to add additional support as needed. Wireshark Plugin: The Wireshark plugin implements a plugin to the popular network protocol analyzer called Wireshark. This plugin allows for the live capture and offline analysis of JAUS message-based communication at runtime. A built-in repository facilitates easy reuse of service interfaces and implementations traffic across the wire. The JAUS Tool Set can be downloaded from www.jaustoolset.org User documentation and community forum are also available at the site. == Release history == Following a successful Beta test, Version 1.0 of the JAUS Tool Set was released in July 2010. The initial offering focused on core areas of User Interface, HTML document generation, C++ code generation, and the software framework. The Version 1.1 update was released in October 2010. In addition to bug fixes and UI improvements, this version offered several important upgrades including enhancement to the Validator, Wireshark plug-in, and generated code. The JTS 2.0 release is scheduled for the second quarter of 2011 and further refines the Tool Set functionality: Protocol Validation: Currently, JTS provides validation for message creation, to ensure users cannot create invalid messages specifications. That capability does not currently exist for protocol definitions, but is being added. This will help ensure that users create all necessary elements of a service definition, and reduce user error. C# and Java Code Generation: Currently, JTS generates cross-platform C++ code. However, other languages including Java and C# are seeing a dramatic increase in their use in distributed systems, particularly in the development of graphical clients to embedded services. MS Word Document Generation: HTML and JSIDL output is supported, but native Office-Open-XML (OOXML) based MS Word generation has advantages in terms of output presentation, and ease of use for integration with other documents. Therefore, we plan to integrate MS Word service document generation. In addition, the development team has several additional goals that are not-yet-scheduled for a particular release window: Protocol Verification: This involves converting the JSIDL definition of a service into a PROMELA model, for validation by the SPIN model checking tool. Using PROMELA to model client and server interfaces will allow developers to formally validate JAUS services. End User Experience: We plan to conduct formal User Interface testing. This involves defining a set of tasks and use cases, asking users with various levels of JAUS experience to accomplish those tasks, and measuring performance and collecting feedback, to look for areas where the overall user experience can be improved. Improved Service Re-Use: JSIDL allows for inheritance of protocol descriptions, much like object-oriented programming languages allow child classes to re-use and extend behaviors defined by the parent class. At present, the generated code 'flattens' these state machines into a series of nested states which gives the correct interface behavior, but only if each single leaf (child) service is generated within its own component. This limits service re-use and can lead to a copy-and-paste of the same implementation across multiple components. The team is evaluating other inheritance solutions that would allow for multiple leaf (child) services to share access to a common parent, but at present the approach is sufficient to address the requirements of the JAUS Core Service Set. == Domains and application == The JAUS Tool Set is based on the JAUS Service Interface Definition Language (JSIDL), which was originally developed for application within the unmanned systems, or robotics, communities. As such, JTS has quickly gained acceptance as a tool for generation of services and interfaces compliant with the SAE AS-4 "JAUS" publications. Although usage statistics are not available, the Tool Set has been downloaded by representatives of US Army, Navy, Marines, and numerous defense contractors. It was also used in a commercial product called the JAUS Expansion Module sold by DeVivo AST, Inc. Since the JSIDL schema is independent of the data being exchanged, however, the Tool Set can be used for the design and implementation of a Service Oriented Architecture for any distributed systems environment that uses binary encoded message exchange. JSIDL is built on a two-layered architecture that separates the application layer and the transport layer, effectively decoupling the data being exchanges from the details of how that data moves from component to component. Furthermore, since the schema itself is widely generic, it's possible to define messages for any number of domains including but not limited to industrial control systems, remote monitoring and diagnostics, and web-based applications. == Licensing == JTS is released under the open source BSD license. The JSIDL Standard is available from the SAE. The Jr Middleware on which the Software Framework (Transport Layer) is based is open source under LGPL. Other packages distributed with JTS may have different licenses. == Sponsors == Development of the JAUS Tool Set was sponsored by several United States Department of Defense organizations: Office of Under Secretary of Defense for Acquisition, Technology & Logistics / Unmanned Warfare. Navy Program Executive Officer Littoral and Mine Navy Program Executive Officer Unmanned Aviation and Strike Weapons Office of Naval Research Air Force Research Lab

    Read more →
  • Semidefinite embedding

    Semidefinite embedding

    Maximum Variance Unfolding (MVU), also known as Semidefinite Embedding (SDE), is an algorithm in computer science that uses semidefinite programming to perform non-linear dimensionality reduction of high-dimensional vectorial input data. It is motivated by the observation that kernel Principal Component Analysis (kPCA) does not reduce the data dimensionality, as it leverages the Kernel trick to non-linearly map the original data into an inner-product space. == Algorithm == MVU creates a mapping from the high dimensional input vectors to some low dimensional Euclidean vector space in the following steps: A neighbourhood graph is created. Each input is connected with its k-nearest input vectors (according to Euclidean distance metric) and all k-nearest neighbors are connected with each other. If the data is sampled well enough, the resulting graph is a discrete approximation of the underlying manifold. The neighbourhood graph is "unfolded" with the help of semidefinite programming. Instead of learning the output vectors directly, the semidefinite programming aims to find an inner product matrix that maximizes the pairwise distances between any two inputs that are not connected in the neighbourhood graph while preserving the nearest neighbors distances. The low-dimensional embedding is finally obtained by application of multidimensional scaling on the learned inner product matrix. The steps of applying semidefinite programming followed by a linear dimensionality reduction step to recover a low-dimensional embedding into a Euclidean space were first proposed by Linial, London, and Rabinovich. == Optimization formulation == Let X {\displaystyle X\,\!} be the original input and Y {\displaystyle Y\,\!} be the embedding. If i , j {\displaystyle i,j\,\!} are two neighbors, then the local isometry constraint that needs to be satisfied is: | X i − X j | 2 = | Y i − Y j | 2 {\displaystyle |X_{i}-X_{j}|^{2}=|Y_{i}-Y_{j}|^{2}\,\!} Let G , K {\displaystyle G,K\,\!} be the Gram matrices of X {\displaystyle X\,\!} and Y {\displaystyle Y\,\!} (i.e.: G i j = X i ⋅ X j , K i j = Y i ⋅ Y j {\displaystyle G_{ij}=X_{i}\cdot X_{j},K_{ij}=Y_{i}\cdot Y_{j}\,\!} ). We can express the above constraint for every neighbor points i , j {\displaystyle i,j\,\!} in term of G , K {\displaystyle G,K\,\!} : G i i + G j j − G i j − G j i = K i i + K j j − K i j − K j i {\displaystyle G_{ii}+G_{jj}-G_{ij}-G_{ji}=K_{ii}+K_{jj}-K_{ij}-K_{ji}\,\!} In addition, we also want to constrain the embedding Y {\displaystyle Y\,\!} to center at the origin: 0 = | ∑ i Y i | 2 ⇔ ( ∑ i Y i ) ⋅ ( ∑ i Y i ) ⇔ ∑ i , j Y i ⋅ Y j ⇔ ∑ i , j K i j {\displaystyle 0=|\sum _{i}Y_{i}|^{2}\Leftrightarrow (\sum _{i}Y_{i})\cdot (\sum _{i}Y_{i})\Leftrightarrow \sum _{i,j}Y_{i}\cdot Y_{j}\Leftrightarrow \sum _{i,j}K_{ij}} As described above, except the distances of neighbor points are preserved, the algorithm aims to maximize the pairwise distance of every pair of points. The objective function to be maximized is: T ( Y ) = 1 2 N ∑ i , j | Y i − Y j | 2 {\displaystyle T(Y)={\dfrac {1}{2N}}\sum _{i,j}|Y_{i}-Y_{j}|^{2}} Intuitively, maximizing the function above is equivalent to pulling the points as far away from each other as possible and therefore "unfold" the manifold. The local isometry constraint Let τ = m a x { η i j | Y i − Y j | 2 } {\displaystyle \tau =max\{\eta _{ij}|Y_{i}-Y_{j}|^{2}\}\,\!} where η i j := { 1 if i is a neighbour of j 0 otherwise . {\displaystyle \eta _{ij}:={\begin{cases}1&{\mbox{if}}\ i{\mbox{ is a neighbour of }}j\\0&{\mbox{otherwise}}.\end{cases}}} prevents the objective function from diverging (going to infinity). Since the graph has N points, the distance between any two points | Y i − Y j | 2 ≤ N τ {\displaystyle |Y_{i}-Y_{j}|^{2}\leq N\tau \,\!} . We can then bound the objective function as follows: T ( Y ) = 1 2 N ∑ i , j | Y i − Y j | 2 ≤ 1 2 N ∑ i , j ( N τ ) 2 = N 3 τ 2 2 {\displaystyle T(Y)={\dfrac {1}{2N}}\sum _{i,j}|Y_{i}-Y_{j}|^{2}\leq {\dfrac {1}{2N}}\sum _{i,j}(N\tau )^{2}={\dfrac {N^{3}\tau ^{2}}{2}}\,\!} The objective function can be rewritten purely in the form of the Gram matrix: T ( Y ) = 1 2 N ∑ i , j | Y i − Y j | 2 = 1 2 N ∑ i , j ( Y i 2 + Y j 2 − Y i ⋅ Y j − Y j ⋅ Y i ) = 1 2 N ( ∑ i , j Y i 2 + ∑ i , j Y j 2 − ∑ i , j Y i ⋅ Y j − ∑ i , j Y j ⋅ Y i ) = 1 2 N ( ∑ i , j Y i 2 + ∑ i , j Y j 2 − 0 − 0 ) = 1 N ( ∑ i Y i 2 ) = 1 N ( T r ( K ) ) {\displaystyle {\begin{aligned}T(Y)&{}={\dfrac {1}{2N}}\sum _{i,j}|Y_{i}-Y_{j}|^{2}\\&{}={\dfrac {1}{2N}}\sum _{i,j}(Y_{i}^{2}+Y_{j}^{2}-Y_{i}\cdot Y_{j}-Y_{j}\cdot Y_{i})\\&{}={\dfrac {1}{2N}}(\sum _{i,j}Y_{i}^{2}+\sum _{i,j}Y_{j}^{2}-\sum _{i,j}Y_{i}\cdot Y_{j}-\sum _{i,j}Y_{j}\cdot Y_{i})\\&{}={\dfrac {1}{2N}}(\sum _{i,j}Y_{i}^{2}+\sum _{i,j}Y_{j}^{2}-0-0)\\&{}={\dfrac {1}{N}}(\sum _{i}Y_{i}^{2})={\dfrac {1}{N}}(Tr(K))\\\end{aligned}}\,\!} Finally, the optimization can be formulated as: Maximize T r ( K ) subject to K ⪰ 0 , ∑ i j K i j = 0 and G i i + G j j − G i j − G j i = K i i + K j j − K i j − K j i , ∀ i , j where η i j = 1 , {\displaystyle {\begin{aligned}&{\text{Maximize}}&&Tr(\mathbf {K} )\\&{\text{subject to}}&&\mathbf {K} \succeq 0,\sum _{ij}\mathbf {K} _{ij}=0\\&{\text{and}}&&G_{ii}+G_{jj}-G_{ij}-G_{ji}=K_{ii}+K_{jj}-K_{ij}-K_{ji},\forall i,j{\mbox{ where }}\eta _{ij}=1,\end{aligned}}} After the Gram matrix K {\displaystyle K\,\!} is learned by semidefinite programming, the output Y {\displaystyle Y\,\!} can be obtained via Cholesky decomposition. In particular, the Gram matrix can be written as K i j = ∑ α = 1 N ( λ α V α i V α j ) {\displaystyle K_{ij}=\sum _{\alpha =1}^{N}(\lambda _{\alpha }V_{\alpha i}V_{\alpha j})\,\!} where V α i {\displaystyle V_{\alpha i}\,\!} is the i-th element of eigenvector V α {\displaystyle V_{\alpha }\,\!} of the eigenvalue λ α {\displaystyle \lambda _{\alpha }\,\!} . It follows that the α {\displaystyle \alpha \,\!} -th element of the output Y i {\displaystyle Y_{i}\,\!} is λ α V α i {\displaystyle {\sqrt {\lambda _{\alpha }}}V_{\alpha i}\,\!} .

    Read more →
  • Synaptic transistor

    Synaptic transistor

    A synaptic transistor is an electrical device that can learn in ways similar to a neural synapse. It optimizes its own properties for the functions it has carried out in the past. The device mimics the behavior of the property of neurons called spike-timing-dependent plasticity, or STDP. == Structure == Its structure is similar to that of a field effect transistor, where an ionic liquid takes the place of the gate insulating layer between the gate electrode and the conducting channel. That channel is composed of samarium nickelate (SmNiO3, or SNO) rather than the field effect transistor's doped silicon. == Function == A synaptic transistor has a traditional immediate response whose amount of current that passes between the source and drain contacts varies with voltage applied to the gate electrode. It also produces a much slower learned response such that the conductivity of the SNO layer varies in response to the transistor's STDP history, essentially by shuttling oxygen ions between the SNO and the ionic liquid. The analog of strengthening a synapse is to increase the SNO's conductivity, which essentially increases gain. Similarly, weakening a synapse is analogous to decreasing the SNO's conductivity, lowering the gain. The input and output of the synaptic transistor are continuous analog values, rather than digital on-off signals. While the physical structure of the device has the potential to learn from history, it contains no way to bias the transistor to control the memory effect. An external supervisory circuit converts the time delay between input and output into a voltage applied to the ionic liquid that either drives ions into the SNO or removes them. A network of such devices can learn particular responses to "sensory inputs", with those responses being learned through experience rather than explicitly programmed.

    Read more →
  • Rule-based machine learning

    Rule-based machine learning

    Rule-based machine learning (RBML) is a term in computer science intended to encompass any machine learning method that identifies, learns, or evolves 'rules' to store, manipulate or apply. The defining characteristic of a rule-based machine learner is the identification and utilization of a set of relational rules that collectively represent the knowledge captured by the system. Rule-based machine learning approaches include learning classifier systems, association rule learning, artificial immune systems, and any other method that relies on a set of rules, each covering contextual knowledge. While rule-based machine learning is conceptually a type of rule-based system, it is distinct from traditional rule-based systems, which are often hand-crafted, and other rule-based decision makers. This is because rule-based machine learning applies some form of learning algorithm such as Rough sets theory to identify and minimise the set of features and to automatically identify useful rules, rather than a human needing to apply prior domain knowledge to manually construct rules and curate a rule set. == Rules == Rules typically take the form of an '{IF:THEN} expression', (e.g. {IF 'condition' THEN 'result'}, or as a more specific example, {IF 'red' AND 'octagon' THEN 'stop-sign}). An individual rule is not in itself a model, since the rule is only applicable when its condition is satisfied. Therefore rule-based machine learning methods typically comprise a set of rules, or knowledge base, that collectively make up the prediction model usually known as decision algorithm. Rules can also be interpreted in various ways depending on the domain knowledge, data types(discrete or continuous) and in combinations. == RIPPER == Repeated incremental pruning to produce error reduction (RIPPER) is a propositional rule learner proposed by William W. Cohen as an optimized version of IREP.

    Read more →
  • Voice activity detection

    Voice activity detection

    Voice activity detection (VAD), also known as speech activity detection or speech detection, is the detection of the presence or absence of human speech, used in speech processing. The main uses of VAD are in speaker diarization, speech coding and speech recognition. It can facilitate speech processing, and can also be used to deactivate some processes during non-speech section of an audio session: it can avoid unnecessary coding/transmission of silence packets in Voice over Internet Protocol (VoIP) applications, saving on computation and on network bandwidth. VAD is an important enabling technology for a variety of speech-based applications. Therefore, various VAD algorithms have been developed that provide varying features and compromises between latency, sensitivity, accuracy and computational cost. Some VAD algorithms also provide further analysis, for example whether the speech is voiced, unvoiced or sustained. Voice activity detection is usually independent of language. It was first investigated for use on time-assignment speech interpolation (TASI) systems. == Algorithm overview == The typical design of a VAD algorithm is as follows: There may first be a noise reduction stage, e.g. via spectral subtraction. Then some features or quantities are calculated from a section of the input signal. A classification rule is applied to classify the section as speech or non-speech – often this classification rule finds when a value exceeds a certain threshold. There may be some feedback in this sequence, in which the VAD decision is used to improve the noise estimate in the noise reduction stage, or to adaptively vary the threshold(s). These feedback operations improve the VAD performance in non-stationary noise (i.e. when the noise varies a lot). A representative set of recently published VAD methods formulates the decision rule on a frame by frame basis using instantaneous measures of the divergence distance between speech and noise. The different measures which are used in VAD methods include spectral slope, correlation coefficients, log likelihood ratio, cepstral, weighted cepstral, and modified distance measures. Independently from the choice of VAD algorithm, a compromise must be made between having voice detected as noise, or noise detected as voice (between false positive and false negative). A VAD operating in a mobile phone must be able to detect speech in the presence of a range of very diverse types of acoustic background noise. In these difficult detection conditions it is often preferable that a VAD should fail-safe, indicating speech detected when the decision is in doubt, to lower the chance of losing speech segments. The biggest difficulty in the detection of speech in this environment is the very low signal-to-noise ratios (SNRs) that are encountered. It may be impossible to distinguish between speech and noise using simple level detection techniques when parts of the speech utterance are buried below the noise. == Applications == VAD is an integral part of different speech communication systems such as audio conferencing, echo cancellation, speech recognition, speech encoding, speaker recognition and hands-free telephony. In the field of multimedia applications, VAD allows simultaneous voice and data applications. Similarly, in Universal Mobile Telecommunications Systems (UMTS), it controls and reduces the average bit rate and enhances overall coding quality of speech. In cellular radio systems (for instance GSM and CDMA systems) based on Discontinuous Transmission (DTX) mode, VAD is essential for enhancing system capacity by reducing co-channel interference and power consumption in portable digital devices. In speech processing applications, voice activity detection plays an important role since non-speech frames are often discarded. For a wide range of applications such as digital mobile radio, Digital Simultaneous Voice and Data (DSVD) or speech storage, it is desirable to provide a discontinuous transmission of speech-coding parameters. Advantages can include lower average power consumption in mobile handsets, higher average bit rate for simultaneous services like data transmission, or a higher capacity on storage chips. However, the improvement depends mainly on the percentage of pauses during speech and the reliability of the VAD used to detect these intervals. On the one hand, it is advantageous to have a low percentage of speech activity. On the other hand, clipping, that is the loss of milliseconds of active speech, should be minimized to preserve quality. This is the crucial problem for a VAD algorithm under heavy noise conditions. === Use in telemarketing === One controversial application of VAD is in conjunction with predictive dialers used by telemarketing firms. In order to maximize agent productivity, telemarketing firms set up predictive dialers to call more numbers than they have agents available, knowing most calls will end up in either "Ring – No Answer" or answering machines. When a person answers, they typically speak briefly ("Hello", "Good evening", etc.) and then there is a brief period of silence. Answering machine messages are usually 3–15 seconds of continuous speech. By setting VAD parameters correctly, dialers can determine whether a person or a machine answered the call and, if it's a person, transfer the call to an available agent. If it detects an answering machine message, the dialer hangs up. Often, even when the system correctly detects a person answering the call, no agent may be available, resulting in a "silent call". Call screening with a multi-second message like "please say who you are, and I may pick up the phone" will frustrate such automated calls. == Performance evaluation == To evaluate a VAD, its output using test recordings is compared with those of an "ideal" VAD – created by hand-annotating the presence or absence of voice in the recordings. The performance of a VAD is commonly evaluated on the basis of the following four parameters: FEC (Front End Clipping): clipping introduced in passing from noise to speech activity; MSC (Mid Speech Clipping): clipping due to speech misclassified as noise; OVER: noise interpreted as speech due to the VAD flag remaining active in passing from speech activity to noise; NDS (Noise Detected as Speech): noise interpreted as speech within a silence period. Although the method described above provides useful objective information concerning the performance of a VAD, it is only an approximate measure of the subjective effect. For example, the effects of speech signal clipping can at times be hidden by the presence of background noise, depending on the model chosen for the comfort noise synthesis, so some of the clipping measured with objective tests is in reality not audible. It is therefore important to carry out subjective tests on VADs, the main aim of which is to ensure that the clipping perceived is acceptable. In VoIP applications, front-end clipping can be reduced by rewinding to shortly before the detection and sending very slightly delayed data. This kind of test requires a certain number of listeners to judge recordings containing the processing results of the VADs being tested, giving marks to several speech sequences on the following features: Quality; Comprehension difficulty; Audibility of clipping. These marks are then used to calculate average results for each of the features listed above, thus providing a global estimate of the behavior of the VAD being tested. To conclude, whereas objective methods are very useful in an initial stage to evaluate the quality of a VAD, subjective methods are more significant. As they require the participation of several people for a few days, increasing cost, they are generally only used when a proposal is about to be standardized. == Implementations == One early standard VAD is that developed by British Telecom for use in the Pan-European digital cellular mobile telephone service in 1991. It uses inverse filtering trained on non-speech segments to filter out background noise, so that it can then more reliably use a simple power-threshold to decide if a voice is present. The G.729 standard calculates the following features for its VAD: line spectral frequencies, full-band energy, low-band energy (<1 kHz), and zero-crossing rate. It applies a simple classification using a fixed decision boundary in the space defined by these features, and then applies smoothing and adaptive correction to improve the estimate. The GSM standard includes two VAD options developed by ETSI. Option 1 computes the SNR in nine bands and applies a threshold to these values. Option 2 calculates different parameters: channel power, voice metrics, and noise power. It then thresholds the voice metrics using a threshold that varies according to the estimated SNR. The Speex audio compression library uses a procedure named Improved Minima Controlled Recursive Averaging, which uses a smoothed representation of spectral pow

    Read more →
  • Geographical cluster

    Geographical cluster

    A geographical cluster is a localized anomaly, usually an excess of something given the distribution or variation of something else. Often it is considered as an incidence rate that is unusual in that there is more of some variable than might be expected. Examples would include: a local excess disease rate, a crime hot spot, areas of high unemployment, accident blackspots, unusually high positive residuals from a model, high concentrations of flora or fauna, physical features or events like earthquake epicenters etc... Identifying these extreme regions may be useful in that there could be implicit geographical associations with other variables that can be identified and would be of interest. Pattern detection via the identification of such geographical clusters is a very simple and generic form of geographical analysis that has many applications in many different contexts. The emphasis is on localized clustering or patterning because this may well contain the most useful information. A geographical cluster is different from a high concentration as it is generally second order, involving the factoring in of the distribution of something else. == Geographical cluster detection == Identifying geographical clusters can be an important stage in a geographical analysis. Mapping the locations of unusual concentrations may help identify causes of these. Some techniques include the Geographical Analysis Machine and Besag and Newell's cluster detection method.

    Read more →
  • Variable-order Bayesian network

    Variable-order Bayesian network

    Variable-order Bayesian network (VOBN) models provide an important extension of both the Bayesian network models and the variable-order Markov models. VOBN models are used in machine learning in general and have shown great potential in bioinformatics applications. These models extend the widely used position weight matrix (PWM) models, Markov models, and Bayesian network (BN) models. In contrast to the BN models, where each random variable depends on a fixed subset of random variables, in VOBN models these subsets may vary based on the specific realization of observed variables. The observed realizations are often called the context and, hence, VOBN models are also known as context-specific Bayesian networks. The flexibility in the definition of conditioning subsets of variables turns out to be a real advantage in classification and analysis applications, as the statistical dependencies between random variables in a sequence of variables (not necessarily adjacent) may be taken into account efficiently, and in a position-specific and context-specific manner.

    Read more →