AI Coding Agents Comparison

AI Coding Agents Comparison — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • ArcObjects

    ArcObjects

    ArcObjects is a development environment of the ArcGIS family of applications. Using Visual Basic for Applications, C# or Java SDK for ArcGIS, it allows developers to extend these applications.ArcObjects is a library of Component Object Model (COM) components that build up the foundation of Esri's ArcGIS platform. ArcObjects is written primarily in the C++ programming language. Since ArcGIS is completely built on top of ArcObjects, the ArcGIS platform can be fully customized and extended by making use of its COM services and capabilities. This allows for easy extension of the ArcObjects data model with any programming language that is compatible with COM, such as Visual Basic, C#, Visual Basic.NET, Java and Python. COM enables components to be reused at a binary level, meaning developers do not require access to the source code of ArcObjects in order to extend the ArcGIS platform. For this reason, an ArcObjects programmer can make use of any type inside the ArcObjects system without knowing the implementation details of the type, only needing to know what the type is able to do. The ArcObjects data model is based on the COM standard, which makes it compatible with other COM objects and applications. This allows for easy integration and collaboration with other systems that are also based on the COM standard. The ArcGIS platform was built using ArcObjects types, such as classes, interfaces, and enumerations. ArcObjects use COM interfaces to organize and communicate properties and methods of its classes, ensuring compatibility with other COM-based objects and systems. When working with an ArcObjects COM class, its properties and methods are accessed solely through one of its implemented interfaces via the process of Query Interface (QI). Multiple interfaces are commonly available for classes in ArcObjects. For example, it is possible to query for additional interfaces implemented by an object after instantiation via the process of QI. Although only one interface can be used when instantiating an object, multiple interfaces are often available for classes in ArcObjects, allowing for greater flexibility and compatibility with other systems based on the COM standard.

    Read more →
  • Evolutionary programming

    Evolutionary programming

    Evolutionary programming is an evolutionary algorithm, where a share of new population is created by mutation of previous population without crossover. Evolutionary programming differs from evolution strategy ES( μ + λ {\displaystyle \mu +\lambda } ) in one detail. All individuals are selected for the new population, while in ES( μ + λ {\displaystyle \mu +\lambda } ), every individual has the same probability to be selected. It is one of the four major evolutionary algorithm paradigms. == History == It was first used by Lawrence J. Fogel in the US in 1960 in order to use simulated evolution as a learning process aiming to generate artificial intelligence. It was used to evolve finite-state machines as predictors.

    Read more →
  • Proximal policy optimization

    Proximal policy optimization

    Proximal policy optimization (PPO) is a reinforcement learning (RL) algorithm for training an intelligent agent. Specifically, it is a policy gradient method, often used for deep RL when the policy network is very large. == History == The predecessor to PPO, Trust Region Policy Optimization (TRPO), was published in 2015. It addressed the instability issue of another algorithm, the Deep Q-Network (DQN), by using the trust region method to limit the KL divergence between the old and new policies. However, TRPO uses the Hessian matrix (a matrix of second derivatives) to enforce the trust region, but the Hessian is inefficient for large-scale problems. PPO was published in 2017. It was essentially an approximation of TRPO that does not require computing the Hessian. The KL divergence constraint was approximated by simply clipping the policy gradient. Since 2018, PPO was the default RL algorithm at OpenAI. PPO has been applied to many areas, such as controlling a robotic arm, beating professional players at Dota 2 (OpenAI Five), and playing Atari games. == TRPO == TRPO, the predecessor of PPO, is an on-policy algorithm. It can be used for environments with either discrete or continuous action spaces. The pseudocode is as follows: Input: initial policy parameters θ 0 {\textstyle \theta _{0}} , initial value function parameters ϕ 0 {\textstyle \phi _{0}} Hyperparameters: KL-divergence limit δ {\textstyle \delta } , backtracking coefficient α {\textstyle \alpha } , maximum number of backtracking steps K {\textstyle K} for k = 0 , 1 , 2 , … {\textstyle k=0,1,2,\ldots } do Collect set of trajectories D k = { τ i } {\textstyle {\mathcal {D}}_{k}=\left\{\tau _{i}\right\}} by running policy π k = π ( θ k ) {\textstyle \pi _{k}=\pi \left(\theta _{k}\right)} in the environment. Compute rewards-to-go R ^ t {\textstyle {\hat {R}}_{t}} . Compute advantage estimates, A ^ t {\textstyle {\hat {A}}_{t}} (using any method of advantage estimation) based on the current value function V ϕ k {\textstyle V_{\phi _{k}}} . Estimate policy gradient as g ^ k = 1 | D k | ∑ τ ∈ D k ∑ t = 0 T ∇ θ log ⁡ π θ ( a t ∣ s t ) | θ k A ^ t {\displaystyle {\hat {g}}_{k}=\left.{\frac {1}{\left|{\mathcal {D}}_{k}\right|}}\sum _{\tau \in {\mathcal {D}}_{k}}\sum _{t=0}^{T}\nabla _{\theta }\log \pi _{\theta }\left(a_{t}\mid s_{t}\right)\right|_{\theta _{k}}{\hat {A}}_{t}} Use the conjugate gradient algorithm to compute x ^ k ≈ H ^ k − 1 g ^ k {\displaystyle {\hat {x}}_{k}\approx {\hat {H}}_{k}^{-1}{\hat {g}}_{k}} where H ^ k {\textstyle {\hat {H}}_{k}} is the Hessian of the sample average KL-divergence. Update the policy by backtracking line search with θ k + 1 = θ k + α j 2 δ x ^ k T H ^ k x ^ k x ^ k {\displaystyle \theta _{k+1}=\theta _{k}+\alpha ^{j}{\sqrt {\frac {2\delta }{{\hat {x}}_{k}^{T}{\hat {H}}_{k}{\hat {x}}_{k}}}}{\hat {x}}_{k}} where j ∈ { 0 , 1 , 2 , … K } {\textstyle j\in \{0,1,2,\ldots K\}} is the smallest value which improves the sample loss and satisfies the sample KL-divergence constraint. Fit value function by regression on mean-squared error: ϕ k + 1 = arg ⁡ min ϕ 1 | D k | T ∑ τ ∈ D k ∑ t = 0 T ( V ϕ ( s t ) − R ^ t ) 2 {\displaystyle \phi _{k+1}=\arg \min _{\phi }{\frac {1}{\left|{\mathcal {D}}_{k}\right|T}}\sum _{\tau \in {\mathcal {D}}_{k}}\sum _{t=0}^{T}\left(V_{\phi }\left(s_{t}\right)-{\hat {R}}_{t}\right)^{2}} typically via some gradient descent algorithm. == PPO == The pseudocode is as follows: Input: initial policy parameters θ 0 {\textstyle \theta _{0}} , initial value function parameters ϕ 0 {\textstyle \phi _{0}} for k = 0 , 1 , 2 , … {\textstyle k=0,1,2,\ldots } do Collect set of trajectories D k = { τ i } {\textstyle {\mathcal {D}}_{k}=\left\{\tau _{i}\right\}} by running policy π k = π ( θ k ) {\textstyle \pi _{k}=\pi \left(\theta _{k}\right)} in the environment. Compute rewards-to-go R ^ t {\textstyle {\hat {R}}_{t}} . Compute advantage estimates, A ^ t {\textstyle {\hat {A}}_{t}} (using any method of advantage estimation) based on the current value function V ϕ k {\textstyle V_{\phi _{k}}} . Update the policy by maximizing the PPO-Clip objective: θ k + 1 = arg ⁡ max θ 1 | D k | T ∑ τ ∈ D k ∑ t = 0 T min ( π θ ( a t ∣ s t ) π θ k ( a t ∣ s t ) A π θ k ( s t , a t ) , g ( ϵ , A π θ k ( s t , a t ) ) ) {\displaystyle \theta _{k+1}=\arg \max _{\theta }{\frac {1}{\left|{\mathcal {D}}_{k}\right|T}}\sum _{\tau \in {\mathcal {D}}_{k}}\sum _{t=0}^{T}\min \left({\frac {\pi _{\theta }\left(a_{t}\mid s_{t}\right)}{\pi _{\theta _{k}}\left(a_{t}\mid s_{t}\right)}}A^{\pi _{\theta _{k}}}\left(s_{t},a_{t}\right),\quad g\left(\epsilon ,A^{\pi _{\theta _{k}}}\left(s_{t},a_{t}\right)\right)\right)} typically via stochastic gradient ascent with Adam. Fit value function by regression on mean-squared error: ϕ k + 1 = arg ⁡ min ϕ 1 | D k | T ∑ τ ∈ D k ∑ t = 0 T ( V ϕ ( s t ) − R ^ t ) 2 {\displaystyle \phi _{k+1}=\arg \min _{\phi }{\frac {1}{\left|{\mathcal {D}}_{k}\right|T}}\sum _{\tau \in {\mathcal {D}}_{k}}\sum _{t=0}^{T}\left(V_{\phi }\left(s_{t}\right)-{\hat {R}}_{t}\right)^{2}} typically via some gradient descent algorithm. Like all policy gradient methods, PPO is used for training an RL agent whose actions are determined by a differentiable policy function by gradient ascent. Intuitively, a policy gradient method takes small policy update steps, so the agent can reach higher and higher rewards in expectation. Policy gradient methods may be unstable: A step size that is too big may direct the policy in a suboptimal direction, thus having little possibility of recovery; a step size that is too small lowers the overall efficiency. To solve the instability, PPO implements a clip function that constrains the policy update of an agent from being too large, so that larger step sizes may be used without negatively affecting the gradient ascent process. === Basic concepts === To begin the PPO training process, the agent is set in an environment to perform actions based on its current input. In the early phase of training, the agent can freely explore solutions and keep track of the result. Later, with a certain amount of transition samples and policy updates, the agent will select an action to take by randomly sampling from the probability distribution P ( A | S ) {\displaystyle P(A|S)} generated by the policy network. The actions that are most likely to be beneficial will have the highest probability of being selected from the random sample. After an agent arrives at a different scenario (a new state) by acting, it is rewarded with a positive reward or a negative reward. The objective of an agent is to maximize the cumulative reward signal across sequences of states, known as episodes. === Policy gradient laws: the advantage function === The advantage function (denoted as A {\displaystyle A} ) is central to PPO, as it tries to answer the question of whether a specific action of the agent is better or worse than some other possible action in a given state. By definition, the advantage function is an estimate of the relative value for a selected action. If the output of this function is positive, it means that the action in question is better than the average return, so the possibilities of selecting that specific action will increase. The opposite is true for a negative advantage output. The advantage function can be defined as A = Q − V {\displaystyle A=Q-V} , where Q {\displaystyle Q} is the discounted sum of rewards (the total weighted reward for the completion of an episode) and V {\displaystyle V} is the baseline estimate. Since the advantage function is calculated after the completion of an episode, the program records the outcome of the episode. Therefore, calculating advantage is essentially an unsupervised learning problem. The baseline estimate comes from the value function that outputs the expected discounted sum of an episode starting from the current state. In the PPO algorithm, the baseline estimate will be noisy (with some variance), as it also uses a neural network, like the policy function itself. With Q {\displaystyle Q} and V {\displaystyle V} computed, the advantage function is calculated by subtracting the baseline estimate from the actual discounted return. If A > 0 {\displaystyle A>0} , the actual return of the action is better than the expected return from experience; if A < 0 {\displaystyle A<0} , the actual return is worse. === Ratio function === In PPO, the ratio function ( r t {\displaystyle r_{t}} ) calculates the probability of selecting action a {\displaystyle a} in state s {\displaystyle s} given the current policy network, divided by the previous probability under the old policy. In other words: If r t ( θ ) > 1 {\displaystyle r_{t}(\theta )>1} , where θ {\displaystyle \theta } are the policy network parameters, then selecting action a {\displaystyle a} in state s {\displaystyle s} is more likely based on the current policy than the previous policy. If 0 ≤ r t ( θ ) < 1 {\displaystyle 0\leq r_{t}(\theta )<1} , then selecting actio

    Read more →
  • Moral graph

    Moral graph

    In graph theory, a moral graph is used to find the equivalent undirected form of a directed acyclic graph. It is a key step of the junction tree algorithm, used in belief propagation on graphical models. The moralized counterpart of a directed acyclic graph is formed by adding edges between all pairs of non-adjacent nodes that have a common child, and then making all edges in the graph undirected. Equivalently, a moral graph of a directed acyclic graph G is an undirected graph in which each node of the original G is now connected to its Markov blanket. The name stems from the fact that, in a moral graph, two nodes that have a common child are required to be married by sharing an edge. Moralization may also be applied to mixed graphs, called in this context "chain graphs". In a chain graph, a connected component of the undirected subgraph is called a chain. Moralization adds an undirected edge between any two vertices that both have outgoing edges to the same chain, and then forgets the orientation of the directed edges of the graph. == Weakly recursively simplicial == A graph is weakly recursively simplicial if it has a simplicial vertex and the subgraph after removing a simplicial vertex and some edges (possibly none) between its neighbours is weakly recursively simplicial. A graph is moral if and only if it is weakly recursively simplicial. A chordal graph (a.k.a., recursive simplicial) is a special case of weakly recursively simplicial when no edge is removed during the elimination process. Therefore, a chordal graph is also moral. But a moral graph is not necessarily chordal. == Recognising moral graphs == Unlike chordal graphs that can be recognised in polynomial time, Verma & Pearl (1993) proved that deciding whether or not a graph is moral is NP-complete.

    Read more →
  • Zolostays

    Zolostays

    Zolostays is a real-tech co-living focused startup that provides ready-to-move rooms/beds. It was founded in 2015 by Nikhil Sikri, Akhil Sikri and Sneha Choudhry. == Overview == During the pandemic, Zolo provided 75 of rent-free accommodation to those who lost their jobs. Zolo uses bulk inventory in usually residential township and ties up with real estate companies to make the rooms/beds available. Zolostays has both revenue sharing and leased model. == History == Zolostays was founded in 2015 to solve the problem of students and young professionals who would move to temporarily go to other cities to study and work and look for affordable housing. In 2020, it was operating in 10 Indian cities. It has four round of funding, with total $98 million.

    Read more →
  • Evolutionary attractor

    Evolutionary attractor

    An evolutionary attractor is a point in an evolutionary space where a selection process will always drive trait values towards that point from the region around it. Because of the importance of evolution through natural selection, often such an evolutionary space will be defined by genetic or phenotypic traits, or possibly both. In this case the selection process will be a form of natural selection. The existence of an evolutionary attractor in a biological evolutionary space does not always imply that it can be reached from all points in that evolutionary space, nor does it identify what will happen when the evolutionary attractor is reached. While an evolutionary attractor may represent a point in evolutionary space that is resistant to further selection, such as an evolutionarily stable strategy, other possibilities are available. Because identification of an evolutionary attractor on its own does not describe everything about the evolutionary space in which it lies, this has led to interest in the evolutionary dynamics surrounding evolutionary attractors and in evolutionary spaces in general. (Theoretical biologists and mathematicians working in the area may prefer the terms adaptive dynamics or evolutionary invasion analysis to evolutionary dynamics.) These fields use differential equations which allows a more complete understanding of the dynamics in evolutionary spaces including the existence or otherwise of evolutionary attractors. Advances in the study of molecular evolution have also led to the identification of evolutionary attractors at a molecular level. Because biological evolutionary processes have been studied using evolutionary game theory, a technique inspired by game theory originally derived to address economic problems, not only can evolutionary attractors be found in biology but economists studying evolutionary economic models have also identified evolutionary attractors. Evolution in biology has also inspired evolutionary computation in computer science. Many algorithms in this field use a form of selection inspired by natural selection to generate results through evolutionary algorithms. This is therefore another area in which evolutionary attractors have been identified. == Evolutionary attractors in biology == It is not probably not surprising that biology is the field where most examples of evolutionary attractors have been identified, given the importance of evolution through natural selection. === Evolutionary attractors in adaptive landscapes === An evolutionary attractor is a point in genetic and/or phenotypic trait space, that evolution will always drive trait values towards via a selection process. The concept of an evolutionary attractor arose in population genetics following the origin of the adaptive landscape originally proposed by Sewall Wright in 1932. The height of a point in an adaptive landscape is a measure of evolutionary fitness. If a point in an adaptive landscape is a peak, then selection will always drive traits towards it and it will be an evolutionary attractor. While population genetics deals with discrete genetic traits, quantitative genetics extended such concepts to deal with continuous genetic traits, where the concept of evolutionary attractor is also valid. === Evolutionary attractors in evolutionary game models === Evolutionary game theory introduced into evolutionary biology concepts originally used in economics, with the advantage that evolution could be studied in relation to strategic choices made in animal conflicts. This is of particular interest because of the concept of the evolutionarily stable strategy or ESS, a strategy that once established is resistant to invasion by other strategies. ESSs will not always be evolutionary attractors, but if they are they will persist over evolutionary time. === Dynamics around evolutionary attractors in biology === Evolutionary attractors in biology do not exist in isolation. By definition they must exist in an evolutionary trait space where selection drives all traits towards them from a region immediately around them. That is, they must be convergence stable. Eshel (1983) modified the definition of an ESS by considering individually advantageous reduction from a majority deviation: he created the term continuous stability. A continuously stable ESS can be shown to be convergence stable, therefore it will act as an evolutionary attractor. But the nature of evolutionary trait spaces in biology means that it is not possible to guarantee that the region of convergence to the evolutionary attractor covers the whole of the trait space, nor that there is only one evolutionary attractor in a particular trait space. These issues have led to the emergence of the related fields of evolutionary dynamics, adaptive dynamics and evolutionary invasion analysis, all of which use differential equations to understand the dynamics in evolutionary trait spaces. Hence, if one or more evolutionary attractor exists in an evolutionary trait space, they provide techniques to understand the dynamics in that trait space around the evolutionary attractor. === Evolutionary attractors in an ecological context === Evolution in biology does not take place in single species in isolation. Ecological interaction of species leads to coevolution. Important examples of this are host-parasite or host-pathogen interaction, which can make both the dynamics around evolutionary attractors more complex, and the occurrence and number of evolutionary attractors more diverse. Evolutionary attractors have been identified in the analysis of evolutionary epidemiology of plant pathogens. In the above study working on plant populations the authors were able to identify evolutionary attractors using methods from adaptive dynamics. A model applied to the analysis of a maize (Zea mays L.) virus identified convergence stable equilibria through simulation modelling. A related model identified evolutionary attractors in the interaction of plants with fungal pathogens. === Evolutionary attractors in molecular genetics === As mentioned above much of the consideration of evolutionary attractors in biology has been through investigation of selection at a genetic or phenotypic level or both, in a single species or in coevolving species. Advances in the study of molecular genetics now allow the study of evolutionary attractors to be taken to a molecular genetic level. Wilson et. al (2019) studied the evolution of gene regulatory networks and identified the emergence of evolutionary attractors. == Evolutionary attractors in economics == Evolutionary game theory as applied in biology was inspired by game theory originally devised for applications in economics. Game theory remains an active field of research outside of biology, and thus it is not surprising that researchers in evolutionary economics use evolutionary game theory. Evolutionary attractors have been demonstrated by economists studying the evolutionary dynamics of market entry with market dynamics based on the replicator dynamics of biological evolutionary games. == Evolutionary attractors in computing == Evolutionary computation is a branch of computer science inspired by biological evolution. Many algorithms in evolutionary computation use a form of selection. Thus evolutionary attractors have been identified in computer science as well as in biology and economics. Evolutionary algorithms have generated evolutionary attractors, probably because of the similarity between adaptive hill-climbing in evolutionary heuristics and the adaptive landscape originated to explain evolution through natural selection.

    Read more →
  • Evolutionary programming

    Evolutionary programming

    Evolutionary programming is an evolutionary algorithm, where a share of new population is created by mutation of previous population without crossover. Evolutionary programming differs from evolution strategy ES( μ + λ {\displaystyle \mu +\lambda } ) in one detail. All individuals are selected for the new population, while in ES( μ + λ {\displaystyle \mu +\lambda } ), every individual has the same probability to be selected. It is one of the four major evolutionary algorithm paradigms. == History == It was first used by Lawrence J. Fogel in the US in 1960 in order to use simulated evolution as a learning process aiming to generate artificial intelligence. It was used to evolve finite-state machines as predictors.

    Read more →
  • BrownBoost

    BrownBoost

    BrownBoost is a boosting algorithm that may be robust to noisy datasets. BrownBoost is an adaptive version of the boost by majority algorithm. As is the case for all boosting algorithms, BrownBoost is used in conjunction with other machine learning methods. BrownBoost was introduced by Yoav Freund in 2001. == Motivation == AdaBoost performs well on a variety of datasets; however, it can be shown that AdaBoost does not perform well on noisy data sets. This is a result of AdaBoost's focus on examples that are repeatedly misclassified. In contrast, BrownBoost effectively "gives up" on examples that are repeatedly misclassified. The core assumption of BrownBoost is that noisy examples will be repeatedly mislabeled by the weak hypotheses and non-noisy examples will be correctly labeled frequently enough to not be "given up on." Thus only noisy examples will be "given up on," whereas non-noisy examples will contribute to the final classifier. In turn, if the final classifier is learned from the non-noisy examples, the generalization error of the final classifier may be much better than if learned from noisy and non-noisy examples. The user of the algorithm can set the amount of error to be tolerated in the training set. Thus, if the training set is noisy (say 10% of all examples are assumed to be mislabeled), the booster can be told to accept a 10% error rate. Since the noisy examples may be ignored, only the true examples will contribute to the learning process. == Algorithm description == BrownBoost uses a non-convex potential loss function, thus it does not fit into the AdaBoost framework. The non-convex optimization provides a method to avoid overfitting noisy data sets. However, in contrast to boosting algorithms that analytically minimize a convex loss function (e.g. AdaBoost and LogitBoost), BrownBoost solves a system of two equations and two unknowns using standard numerical methods. The only parameter of BrownBoost ( c {\displaystyle c} in the algorithm) is the "time" the algorithm runs. The theory of BrownBoost states that each hypothesis takes a variable amount of time ( t {\displaystyle t} in the algorithm) which is directly related to the weight given to the hypothesis α {\displaystyle \alpha } . The time parameter in BrownBoost is analogous to the number of iterations T {\displaystyle T} in AdaBoost. A larger value of c {\displaystyle c} means that BrownBoost will treat the data as if it were less noisy and therefore will give up on fewer examples. Conversely, a smaller value of c {\displaystyle c} means that BrownBoost will treat the data as more noisy and give up on more examples. During each iteration of the algorithm, a hypothesis is selected with some advantage over random guessing. The weight of this hypothesis α {\displaystyle \alpha } and the "amount of time passed" t {\displaystyle t} during the iteration are simultaneously solved in a system of two non-linear equations ( 1. uncorrelated hypothesis w.r.t example weights and 2. hold the potential constant) with two unknowns (weight of hypothesis α {\displaystyle \alpha } and time passed t {\displaystyle t} ). This can be solved by bisection (as implemented in the JBoost software package) or Newton's method (as described in the original paper by Freund). Once these equations are solved, the margins of each example ( r i ( x j ) {\displaystyle r_{i}(x_{j})} in the algorithm) and the amount of time remaining s {\displaystyle s} are updated appropriately. This process is repeated until there is no time remaining. The initial potential is defined to be 1 m ∑ j = 1 m 1 − erf ( c ) = 1 − erf ( c ) {\displaystyle {\frac {1}{m}}\sum _{j=1}^{m}1-{\mbox{erf}}({\sqrt {c}})=1-{\mbox{erf}}({\sqrt {c}})} . Since a constraint of each iteration is that the potential be held constant, the final potential is 1 m ∑ j = 1 m 1 − erf ( r i ( x j ) / c ) = 1 − erf ( c ) {\displaystyle {\frac {1}{m}}\sum _{j=1}^{m}1-{\mbox{erf}}(r_{i}(x_{j})/{\sqrt {c}})=1-{\mbox{erf}}({\sqrt {c}})} . Thus the final error is likely to be near 1 − erf ( c ) {\displaystyle 1-{\mbox{erf}}({\sqrt {c}})} . However, the final potential function is not the 0–1 loss error function. For the final error to be exactly 1 − erf ( c ) {\displaystyle 1-{\mbox{erf}}({\sqrt {c}})} , the variance of the loss function must decrease linearly w.r.t. time to form the 0–1 loss function at the end of boosting iterations. This is not yet discussed in the literature and is not in the definition of the algorithm below. The final classifier is a linear combination of weak hypotheses and is evaluated in the same manner as most other boosting algorithms. == BrownBoost learning algorithm definition == Input: m {\displaystyle m} training examples ( x 1 , y 1 ) , … , ( x m , y m ) {\displaystyle (x_{1},y_{1}),\ldots ,(x_{m},y_{m})} where x j ∈ X , y j ∈ Y = { − 1 , + 1 } {\displaystyle x_{j}\in X,\,y_{j}\in Y=\{-1,+1\}} The parameter c {\displaystyle c} Initialise: s = c {\displaystyle s=c} . (The value of s {\displaystyle s} is the amount of time remaining in the game) r i ( x j ) = 0 {\displaystyle r_{i}(x_{j})=0} ∀ j {\displaystyle \forall j} . The value of r i ( x j ) {\displaystyle r_{i}(x_{j})} is the margin at iteration i {\displaystyle i} for example x j {\displaystyle x_{j}} . While s > 0 {\displaystyle s>0} : Set the weights of each example: W i ( x j ) = e − ( r i ( x j ) + s ) 2 c {\displaystyle W_{i}(x_{j})=e^{-{\frac {(r_{i}(x_{j})+s)^{2}}{c}}}} , where r i ( x j ) {\displaystyle r_{i}(x_{j})} is the margin of example x j {\displaystyle x_{j}} Find a classifier h i : X → { − 1 , + 1 } {\displaystyle h_{i}:X\to \{-1,+1\}} such that ∑ j W i ( x j ) h i ( x j ) y j > 0 {\displaystyle \sum _{j}W_{i}(x_{j})h_{i}(x_{j})y_{j}>0} Find values α , t {\displaystyle \alpha ,t} that satisfy the equation: ∑ j h i ( x j ) y j e − ( r i ( x j ) + α h i ( x j ) y j + s − t ) 2 c = 0 {\displaystyle \sum _{j}h_{i}(x_{j})y_{j}e^{-{\frac {(r_{i}(x_{j})+\alpha h_{i}(x_{j})y_{j}+s-t)^{2}}{c}}}=0} . (Note this is similar to the condition E W i + 1 [ h i ( x j ) y j ] = 0 {\displaystyle E_{W_{i+1}}[h_{i}(x_{j})y_{j}]=0} set forth by Schapire and Singer. In this setting, we are numerically finding the W i + 1 = exp ⁡ ( ⋯ ⋯ ) {\displaystyle W_{i+1}=\exp \left({\frac {\cdots }{\cdots }}\right)} such that E W i + 1 [ h i ( x j ) y j ] = 0 {\displaystyle E_{W_{i+1}}[h_{i}(x_{j})y_{j}]=0} .) This update is subject to the constraint ∑ ( Φ ( r i ( x j ) + α h ( x j ) y j + s − t ) − Φ ( r i ( x j ) + s ) ) = 0 {\displaystyle \sum \left(\Phi \left(r_{i}(x_{j})+\alpha h(x_{j})y_{j}+s-t\right)-\Phi \left(r_{i}(x_{j})+s\right)\right)=0} , where Φ ( z ) = 1 − erf ( z / c ) {\displaystyle \Phi (z)=1-{\mbox{erf}}(z/{\sqrt {c}})} is the potential loss for a point with margin r i ( x j ) {\displaystyle r_{i}(x_{j})} Update the margins for each example: r i + 1 ( x j ) = r i ( x j ) + α h ( x j ) y j {\displaystyle r_{i+1}(x_{j})=r_{i}(x_{j})+\alpha h(x_{j})y_{j}} Update the time remaining: s = s − t {\displaystyle s=s-t} Output: H ( x ) = sign ( ∑ i α i h i ( x ) ) {\displaystyle H(x)={\textrm {sign}}\left(\sum _{i}\alpha _{i}h_{i}(x)\right)} == Empirical results == In preliminary experimental results with noisy datasets, BrownBoost outperformed AdaBoost's generalization error; however, LogitBoost performed as well as BrownBoost. An implementation of BrownBoost can be found in the open source software JBoost.

    Read more →
  • Test data management

    Test data management

    Test data management (TDM) is a process in software testing concerned with the creation, preparation, and control of data used for testing software systems. It involves supplying datasets required to execute test cases and verifying system behaviour under defined conditions. Test data management is an integral part of the software development lifecycle (SDLC) and is utilized in both manual and automated testing processes. It is applied in environments that use continuous integration and DevOps practices, where test execution requires consistent and repeatable data conditions. == Overview == Test data management includes the generation, selection, and preparation of data for testing purposes, as well as its distribution across test environments. It also involves controlling data versions and ensuring that datasets correspond to specific test scenarios. In many cases, production data is adapted for testing through techniques such as masking or subsetting to reduce size and remove sensitive content. Test data management ensures that test cases are executed with relevant, consistent, and readily available data. This reduces variability in test results and supports reproducibility across test cycles. == Importance == The role of test data management has expanded with the growth of complex, data-driven systems and regulatory requirements governing data usage. Testing often depends on data that reflects real-world conditions, but direct use of production data may introduce security and privacy risks. As a result, organizations apply methods such as data masking and anonymization to meet compliance requirements, including those set by the California Privacy Rights Act (CPRA) and Europe’s General Data Protection Regulation (GDPR). Inadequate control of test data can lead to incomplete test coverage, unreliable test results, or delays in testing processes due to unavailable or inconsistent datasets. == Techniques and tools == Test data management leverages various techniques for preparing and controlling data used in testing. These include the generation of synthetic data, the extraction of subsets from production datasets, and the modification of data to remove or obscure sensitive information. A key technical requirement in these processes is maintaining referential integrity, or ensuring that relationships between data entities remain consistent across different tables and systems after masking or subsetting. Data virtualization is also used to provide access to datasets without full replication. These methods may be implemented using software tools that automate data preparation, masking, and distribution.

    Read more →
  • Multispectral pattern recognition

    Multispectral pattern recognition

    Multispectral remote sensing is the collection and analysis of reflected, emitted, or back-scattered energy from an object or an area of interest in multiple bands of regions of the electromagnetic spectrum (Jensen, 2005). Subcategories of multispectral remote sensing include hyperspectral, in which hundreds of bands are collected and analyzed, and ultraspectral remote sensing where many hundreds of bands are used (Logicon, 1997). The main purpose of multispectral imaging is the potential to classify the image using multispectral classification. This is a much faster method of image analysis than is possible by human interpretation. == Multispectral remote sensing systems == Remote sensing systems gather data via instruments typically carried on satellites in orbit around the Earth. The remote sensing scanner detects the energy that radiates from the object or area of interest. This energy is recorded as an analog electrical signal and converted into a digital value though an A-to-D conversion. There are several multispectral remote sensing systems that can be categorized in the following way: === Multispectral imaging using discrete detectors and scanning mirrors === Landsat Multispectral Scanner (MSS) Landsat Thematic Mapper (TM) NOAA Geostationary Operational Environmental Satellite (GOES) NOAA Advanced Very High Resolution Radiometer (AVHRR) NASA and ORBIMAGE, Inc., Sea-viewing Wide field-of-view Sensor (SeaWiFS) Daedalus, Inc., Aircraft Multispectral Scanner (AMS) NASA Airborne Terrestrial Applications Sensor (ATLAS) === Multispectral imaging using linear arrays === SPOT 1, 2, and 3 High Resolution Visible (HRV) sensors and Spot 4 and 5 High Resolution Visible Infrared (HRVIR) and vegetation sensor Indian Remote Sensing System (IRS) Linear Imaging Self-scanning Sensor (LISS) Space Imaging, Inc. (IKONOS) Digital Globe, Inc. (QuickBird) ORBIMAGE, Inc. (OrbView-3) ImageSat International, Inc. (EROS A1) NASA Terra Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) NASA Terra Multiangle Imaging Spectroradiometer (MISR) === Imaging spectrometry using linear and area arrays === NASA Jet Propulsion Laboratory Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) Compact Airborne Spectrographic Imager 3 (CASI 3) NASA Terra Moderate Resolution Imaging Spectrometer (MODIS) NASA Earth Observer (EO-1) Advanced Land Imager (ALI), Hyperion, and LEISA Atmospheric Corrector (LAC) === Satellite analog and digital photographic systems === Russian SPIN-2 TK-350, and KVR-1000 NASA Space Shuttle and International Space Station Imagery == Multispectral classification methods == A variety of methods can be used for the multispectral classification of images: Algorithms based on parametric and nonparametric statistics that use ratio-and interval-scaled data and nonmetric methods that can also incorporate nominal scale data (Duda et al., 2001), Supervised or unsupervised classification logic, Hard or soft (fuzzy) set classification logic to create hard or fuzzy thematic output products, Per-pixel or object-oriented classification logic, and Hybrid approaches == Supervised classification == In this classification method, the identity and location of some of the land-cover types are obtained beforehand from a combination of fieldwork, interpretation of aerial photography, map analysis, and personal experience. The analyst would locate sites that have similar characteristics to the known land-cover types. These areas are known as training sites because the known characteristics of these sites are used to train the classification algorithm for eventual land-cover mapping of the remainder of the image. Multivariate statistical parameters (means, standard deviations, covariance matrices, correlation matrices, etc.) are calculated for each training site. All pixels inside and outside of the training sites are evaluated and allocated to the class with the more similar characteristics. === Classification scheme === The first step in the supervised classification method is to identify the land-cover and land-use classes to be used. Land-cover refers to the type of material present on the site (e.g. water, crops, forest, wet land, asphalt, and concrete). Land-use refers to the modifications made by people to the land cover (e.g. agriculture, commerce, settlement). All classes should be selected and defined carefully to properly classify remotely sensed data into the correct land-use and/or land-cover information. To achieve this purpose, it is necessary to use a classification system that contains taxonomically correct definitions of classes. If a hard classification is desired, the following classes should be used: Mutually exclusive: there is not any taxonomic overlap of any classes (i.e., rain forest and evergreen forest are distinct classes). Exhaustive: all land-covers in the area have been included. Hierarchical: sub-level classes (e.g., single-family residential, multiple-family residential) are created, allowing that these classes can be included in a higher category (e.g., residential). Some examples of hard classification schemes are: American Planning Association Land-Based Classification System United States Geological Survey Land-use/Land-cover Classification System for Use with Remote Sensor Data U.S. Department of the Interior Fish and Wildlife Service U.S. National Vegetation and Classification System International Geosphere-Biosphere Program IGBP Land Cover Classification System === Training sites === Once the classification scheme is adopted, the image analyst may select training sites in the image that are representative of the land-cover or land-use of interest. If the environment where the data was collected is relatively homogeneous, the training data can be used. If different conditions are found in the site, it would not be possible to extend the remote sensing training data to the site. To solve this problem, a geographical stratification should be done during the preliminary stages of the project. All differences should be recorded (e.g. soil type, water turbidity, crop species, etc.). These differences should be recorded on the imagery and the selection training sites made based on the geographical stratification of this data. The final classification map would be a composite of the individual stratum classifications. After the data are organized in different training sites, a measurement vector is created. This vector would contain the brightness values for each pixel in each band in each training class. The mean, standard deviation, variance-covariance matrix, and correlation matrix are calculated from the measurement vectors. Once the statistics from each training site are determined, the most effective bands for each class should be selected. The objective of this discrimination is to eliminate the bands that can provide redundant information. Graphical and statistical methods can be used to achieve this objective. Some of the graphic methods are: Bar graph spectral plots Cospectral mean vector plots Feature space plots Cospectral parallelepiped or ellipse plots === Classification algorithm === The last step in supervised classification is selecting an appropriate algorithm. The choice of a specific algorithm depends on the input data and the desired output. Parametric algorithms are based on the fact that the data is normally distributed. If the data is not normally distributed, nonparametric algorithms should be used. The more common nonparametric algorithms are: One-dimensional density slicing Parallelipiped Minimum distance Nearest-neighbor Expert system analysis Convolutional neural network == Unsupervised classification == Unsupervised classification (also known as clustering) is a method of partitioning remote sensor image data in multispectral feature space and extracting land-cover information. Unsupervised classification require less input information from the analyst compared to supervised classification because clustering does not require training data. This process consists in a series of numerical operations to search for the spectral properties of pixels. From this process, a map with m spectral classes is obtained. Using the map, the analyst tries to assign or transform the spectral classes into thematic information of interest (i.e. forest, agriculture, urban). This process may not be easy because some spectral clusters represent mixed classes of surface materials and may not be useful. The analyst has to understand the spectral characteristics of the terrain to be able to label clusters as a specific information class. There are hundreds of clustering algorithms. Two of the most conceptually simple algorithms are the chain method and the ISODATA method. === Chain method === The algorithm used in this method operates in a two-pass mode (it passes through the multispectral dataset two times. In the first pass, the program reads through the dataset and sequentially builds clusters (groups of p

    Read more →
  • List of datasets for machine-learning research

    List of datasets for machine-learning research

    These datasets are used in machine learning (ML) research and have been cited in peer-reviewed academic journals. Datasets are an integral part of the field of machine learning. Major advances in this field can result from advances in learning algorithms (such as deep learning), computer hardware, and, less intuitively, the availability of high-quality training datasets. High-quality labeled training datasets for supervised and semi-supervised machine-learning algorithms are usually difficult and expensive to produce because of the large amount of time needed to label the data. Although they do not need to be labeled, high-quality unlabeled datasets for unsupervised learning can also be difficult and costly to produce. Many organizations, including governments, publish and share their datasets, often using common metadata formats (such as Croissant). The datasets are classified, based on the licenses, into two groups: open data and non-open data. The datasets from various governmental-bodies are presented in List of open government data sites. The datasets are ported on open data portals. They are made available for searching, depositing and accessing through interfaces like Open API. The datasets are made available as various sorted types and subtypes. == List of sorting used for datasets == The data portal is classified based on its type of license. The open source license based data portals are known as open data portals which are used by many government organizations and academic institutions. == List of open data portals == == List of portals suitable for multiple types of applications == The data portal sometimes lists a wide variety of subtypes of datasets pertaining to many machine learning applications. == List of portals suitable for a specific subtype of applications == The data portals which are suitable for a specific subtype of machine learning application are listed in the subsequent sections. == Image data == == Text data == These datasets consist primarily of text for tasks such as natural language processing, sentiment analysis, translation, and cluster analysis. === Reviews === === News articles === === Messages === === Twitter and tweets === === Dialogues === === Legal === === Other text === == Sound data == These datasets consist of sounds and sound features used for tasks such as speech recognition and speech synthesis. === Speech === === Music === === Other sounds === == Signal data == Datasets containing electric signal information requiring some sort of signal processing for further analysis. === Electrical === === Motion-tracking === === Other signals === == Chemical data == Datasets from physical systems. === Chemical Reactions with transition states (TS) === === OpenReACT-CHON-EFH === OpenReACT-CHON-EFH (Open Reaction Dataset of Atomic ConfiguraTions comprising C, H, O and N with Energies, Forces and Hessians) is a 2025 open-access benchmark for machine-learning interatomic potentials. RTP set – 35,087 stationary-point geometries (reactant, transition state and product) drawn from 11,961 elementary reactions, each labeled with density-functional energies, atomic forces and full Hessian matrices at the ωB97X-D/6-31G(d) level. IRC set – 34,248 structures along 600 minimum-energy reaction paths, used to test extrapolation beyond trained stationary points. NMS set – 62,527 off-equilibrium geometries generated by normal-mode sampling to probe model robustness under thermal perturbations. The collection underpins the study Does Hessian Data Improve the Performance of Machine Learning Potentials? and was used to train and benchmark the machine-learning interatomic potentials reported therein. The dataset itself is distributed under a CC licence via Figshare. == Physical data == Datasets from physical systems. === High-energy physics === === Systems === === Astronomy === === Earth science === === Other physical === == Biological data == Datasets from biological systems. === Human === === Animal === === Fungi === === Plant === === Microbe === === Drug discovery === == Anomaly data == == Question answering data == This section includes datasets that deals with structured data. == Dialog or instruction prompted data == This section includes datasets that contains multi-turn text with at least two actors, a "user" and an "agent". The user makes requests for the agent, which performs the request. == Cybersecurity == == Climate and sustainability == == Code data == == Multivariate data == === Financial === === Weather === === Census === === Transit === === Internet === === Games === === Other multivariate === == Curated repositories of datasets == As datasets come in myriad formats and can sometimes be difficult to use, there has been considerable work put into curating and standardizing the format of datasets to make them easier to use for machine learning research. OpenML: Web platform with Python, R, Java, and other APIs for downloading hundreds of machine learning datasets, evaluating algorithms on datasets, and benchmarking algorithm performance against dozens of other algorithms. PMLB: A large, curated repository of benchmark datasets for evaluating supervised machine learning algorithms. Provides classification and regression datasets in a standardized format that are accessible through a Python API. Metatext NLP: https://metatext.io/datasets web repository maintained by community, containing nearly 1000 benchmark datasets, and counting. Provides many tasks from classification to QA, and various languages from English, Portuguese to Arabic. Appen: Off The Shelf and Open Source Datasets hosted and maintained by the company. These biological, image, physical, question answering, signal, sound, text, and video resources number over 250 and can be applied to over 25 different use cases.

    Read more →
  • Shattered set

    Shattered set

    A class of sets is said to shatter another set if it is possible to "pick out" any element of that set using intersection. The concept of shattered sets plays an important role in Vapnik–Chervonenkis theory, also known as VC-theory. Shattering and VC-theory are used in the study of empirical processes as well as in statistical computational learning theory. == Definition == Suppose A is a set and C is a class of sets. The class C shatters the set A if for each subset a of A, there is some element c of C such that a = c ∩ A . {\displaystyle a=c\cap A.} Equivalently, C shatters A when their intersection is equal to A's power set: P(A) = { c ∩ A | c ∈ C }. We employ the letter C to refer to a "class" or "collection" of sets, as in a Vapnik–Chervonenkis class (VC-class). The set A is often assumed to be finite because, in empirical processes, we are interested in the shattering of finite sets of data points. == Example == We will show that the class of all discs in the plane (two-dimensional space) does not shatter every set of four points on the unit circle, yet the class of all convex sets in the plane does shatter every finite set of points on the unit circle. Let A be a set of four points on the unit circle and let C be the class of all discs. To test where C shatters A, we attempt to draw a disc around every subset of points in A. First, we draw a disc around the subsets of each isolated point. Next, we try to draw a disc around every subset of point pairs. This turns out to be doable for adjacent points, but impossible for points on opposite sides of the circle. Any attempt to include those points on the opposite side will necessarily include other points not in that pair. Hence, any pair of opposite points cannot be isolated out of A using intersections with class C and so C does not shatter A. As visualized below: Because there is some subset which can not be isolated by any disc in C, we conclude then that A is not shattered by C. And, with a bit of thought, we can prove that no set of four points is shattered by this C. However, if we redefine C to be the class of all elliptical discs, we find that we can still isolate all the subsets from above, as well as the points that were formerly problematic. Thus, this specific set of 4 points is shattered by the class of elliptical discs. Visualized below: With a bit of thought, we could generalize that any set of finite points on a unit circle could be shattered by the class of all convex sets (visualize connecting the dots). == Shatter coefficient == To quantify the richness of a collection C of sets, we use the concept of shattering coefficients (also known as the growth function). For a collection C of sets s ⊂ Ω {\displaystyle s\subset \Omega } , Ω {\displaystyle \Omega } being any space, often a sample space, we define the nth shattering coefficient of C as S C ( n ) = max ∀ x 1 , x 2 , … , x n ∈ Ω card ⁡ { { x 1 , x 2 , … , x n } ∩ s , s ∈ C } {\displaystyle S_{C}(n)=\max _{\forall x_{1},x_{2},\dots ,x_{n}\in \Omega }\operatorname {card} \{\,\{\,x_{1},x_{2},\dots ,x_{n}\}\cap s,s\in C\}} where card {\displaystyle \operatorname {card} } denotes the cardinality of the set and x 1 , x 2 , … , x n ∈ Ω {\displaystyle x_{1},x_{2},\dots ,x_{n}\in \Omega } is any set of n points,. S C ( n ) {\displaystyle S_{C}(n)} is the largest number of subsets of any set A of n points that can be formed by intersecting A with the sets in collection C. For example, if set A contains 3 points, its power set, P ( A ) {\displaystyle P(A)} , contains 2 3 = 8 {\displaystyle 2^{3}=8} elements. If C shatters A, its shattering coefficient(3) would be 8 and S C ( 2 ) {\displaystyle S_{C}(2)} would be 2 2 = 4 {\displaystyle 2^{2}=4} . However, if one of those sets in P ( A ) {\displaystyle P(A)} cannot be obtained through intersections in c, then S C ( 3 ) {\displaystyle S_{C}(3)} would only be 7. If none of those sets can be obtained, S C ( 3 ) {\displaystyle S_{C}(3)} would be 0. Additionally, if S C ( 2 ) = 3 {\displaystyle S_{C}(2)=3} , for example, then there is an element in the set of all 2-point sets from A that cannot be obtained from intersections with C. It follows from this that S C ( 3 ) {\displaystyle S_{C}(3)} would also be less than 8 (i.e. C would not shatter A) because we have already located a "missing" set in the smaller power set of 2-point sets. This example illustrates some properties of S C ( n ) {\displaystyle S_{C}(n)} : S C ( n ) ≤ 2 n {\displaystyle S_{C}(n)\leq 2^{n}} for all n because { s ∩ A | s ∈ C } ⊆ P ( A ) {\displaystyle \{s\cap A|s\in C\}\subseteq P(A)} for any A ⊆ Ω {\displaystyle A\subseteq \Omega } . If S C ( n ) = 2 n {\displaystyle S_{C}(n)=2^{n}} , that means there is a set of cardinality n, which can be shattered by C. If S C ( N ) < 2 N {\displaystyle S_{C}(N)<2^{N}} for some N > 1 {\displaystyle N>1} then S C ( n ) < 2 n {\displaystyle S_{C}(n)<2^{n}} for all n ≥ N {\displaystyle n\geq N} . The third property means that if C cannot shatter any set of cardinality N then it can not shatter sets of larger cardinalities. == Vapnik–Chervonenkis class == If A cannot be shattered by C, there will be a smallest value of n that makes the shatter coefficient(n) less than 2 n {\displaystyle 2^{n}} because as n gets larger, there are more sets that could be missed. Alternatively, there is also a largest value of n for which the S C ( n ) {\displaystyle S_{C}(n)} is still 2 n {\displaystyle 2^{n}} , because as n gets smaller, there are fewer sets that could be omitted. The extreme of this is S C ( 0 ) {\displaystyle S_{C}(0)} (the shattering coefficient of the empty set), which must always be 2 0 = 1 {\displaystyle 2^{0}=1} . These statements lends themselves to defining the VC dimension of a class C as: V C ( C ) = min n { n : S C ( n ) < 2 n } {\displaystyle VC(C)={\underset {n}{\min }}\{n:S_{C}(n)<2^{n}\}\,} or, alternatively, as V C 0 ( C ) = max n { n : S C ( n ) = 2 n } . {\displaystyle VC_{0}(C)={\underset {n}{\max }}\{n:S_{C}(n)=2^{n}\}.\,} Note that V C ( C ) = V C 0 ( C ) + 1. {\displaystyle VC(C)=VC_{0}(C)+1.} . The VC dimension is usually defined as V C 0 {\displaystyle VC_{0}} , the largest cardinality of points chosen that will still shatter A (i.e. n such that S C ( n ) = 2 n {\displaystyle S_{C}(n)=2^{n}} ). Altneratively, if for any n there is a set of cardinality n which can be shattered by C, then S C ( n ) = 2 n {\displaystyle S_{C}(n)=2^{n}} for all n and the VC dimension of this class C is infinite. A class with finite VC dimension is called a Vapnik–Chervonenkis class or VC class. A class C is uniformly Glivenko–Cantelli if and only if it is a VC class.

    Read more →
  • FuseBase

    FuseBase

    FuseBase (previously Nimbus Note and Nimbus Platform) is a B2B SaaS platform. It is among the first to support the Model Context Protocol (MCP), an open standard enabling seamless integration of AI agents with external tools, systems, and data sources. == History == The platform was founded in 2014 as Nimbus Note, the platform started as a cross-platform note-taking and information management tool. As it evolved into Nimbus Platform, it added project management and client portal capabilities. In 2023, the company rebranded as FuseBase, pivoting to connect and automate both internal and external collaboration through AI Agents and cutting-edge protocol adoption like MCP. At the same time, FuseBase was named Product of the Year on Product Hunt. == Technical overview == The platform integrates the Model Context Protocol (MCP), an open-source framework created by Anthropic. MCP allows AI models to securely access and interact with external data, tools, and systems. This enables FuseBase AI Agents to gather relevant context, perform actions, and provide more advanced automation.

    Read more →
  • Winner-take-all (computing)

    Winner-take-all (computing)

    Winner-take-all is a computational principle applied in computational models of neural networks by which neurons compete with each other for activation. In the classical form, only the neuron with the highest activation stays active while all other neurons shut down; however, other variations allow more than one neuron to be active, for example the soft winner take-all, by which a power function is applied to the neurons. == Neural networks == In the theory of artificial neural networks, winner-take-all networks are a case of competitive learning in recurrent neural networks. Output nodes in the network mutually inhibit each other, while simultaneously activating themselves through reflexive connections. After some time, only one node in the output layer will be active, namely the one corresponding to the strongest input. Thus the network uses nonlinear inhibition to pick out the largest of a set of inputs. Winner-take-all is a general computational primitive that can be implemented using different types of neural network models, including both continuous-time and spiking networks. Winner-take-all networks are commonly used in computational models of the brain, particularly for distributed decision-making or action selection in the cortex. Important examples include hierarchical models of vision, and models of selective attention and recognition. They are also common in artificial neural networks and neuromorphic analog VLSI circuits. It has been formally proven that the winner-take-all operation is computationally powerful compared to other nonlinear operations, such as thresholding. In many practical cases, there is not only one single neuron which becomes active but there are exactly k neurons which become active for a fixed number k. This principle is referred to as k-winners-take-all. === Example algorithm === Consider a single linear neuron, with inputs x 1 , … , x n {\displaystyle x_{1},\dots ,x_{n}} . Each input has weight w i {\displaystyle w_{i}} , and the output of the neuron is ∑ i w i x i {\displaystyle \sum _{i}w_{i}x_{i}} . In the Instar learning rule, on each input vector, the weight vectors are modified according to Δ w i = η ( x i − w i ) {\displaystyle \Delta w_{i}=\eta (x_{i}-w_{i})} where η {\displaystyle \eta } is the learning rate. This rule is unsupervised, since we need just the input vector, not a reference output. Now, consider multiple linear neurons y 1 , … , y m {\displaystyle y_{1},\dots ,y_{m}} . The output of each satisfies y i = ∑ j w i j x j {\displaystyle y_{i}=\sum _{j}w_{ij}x_{j}} . In the winner-take-all algorithm, the weights are modified as follows. Given an input vector x {\displaystyle x} , each output is computed. The neuron with the largest output is selected, and the weights going into that neuron are modified according to the Instar learning rule. All other weights remain unchanged. The k-winners-take-all rule is similar, except that the Instar learning rule is applied to the weights going into the k neurons with the largest outputs. == Circuit example == A simple, but popular CMOS winner-take-all circuit is shown on the right. This circuit was originally proposed by Lazzaro et al. (1989) using MOS transistors biased to operate in the weak-inversion or subthreshold regime. In the particular case shown there are only two inputs (IIN,1 and IIN,2), but the circuit can be easily extended to multiple inputs in a straightforward way. It operates on continuous-time input signals (currents) in parallel, using only two transistors per input. In addition, the bias current IBIAS is set by a single global transistor that is common to all the inputs. The largest of the input currents sets the common potential VC. As a result, the corresponding output carries almost all the bias current, while the other outputs have currents that are close to zero. Thus, the circuit selects the larger of the two input currents, i.e., if IIN,1 > IIN,2, we get IOUT,1 = IBIAS and IOUT,2 = 0. Similarly, if IIN,2 > IIN,1, we get IOUT,1 = 0 and IOUT,2 = IBIAS. A SPICE-based DC simulation of the CMOS winner-take-all circuit in the two-input case is shown on the right. As shown in the top subplot, the input IIN,1 was fixed at 6nA, while IIN,2 was linearly increased from 0 to 10nA. The bottom subplot shows the two output currents. As expected, the output corresponding to the larger of the two inputs carries the entire bias current (10nA in this case), forcing the other output current nearly to zero. == Other uses == In stereo matching algorithms, following the taxonomy proposed by Scharstein and Szelliski, winner-take-all is a local method for disparity computation. Adopting a winner-take-all strategy, the disparity associated with the minimum or maximum cost value is selected at each pixel. It is axiomatic that in the electronic commerce market, early dominant players such as AOL or Yahoo! get most of the rewards. By 1998, one study found the top 5% of all web sites garnered more than 74% of all traffic. The winner-take-all hypothesis in economics suggests that once a technology or a firm gets ahead, it will do better and better over time, whereas lagging technology and firms will fall further behind. See First-mover advantage.

    Read more →
  • GNU Octave

    GNU Octave

    GNU Octave is a scientific programming language for scientific computing and numerical computation. Among other things, Octave can be used to solve linear and nonlinear problems numerically and to perform other numerical experiments using a language that is mostly compatible with MATLAB. It may also be used as a batch-oriented language. As part of the GNU Project, it is free software under the terms of the GNU General Public License. == History == The project was conceived around 1988. At first it was intended to be a companion to a chemical reactor design course. Full development was started by John W. Eaton in 1992. The first alpha release dates back to 4 January 1993 and on 17 February 1994 version 1.0 was released. Version 9.2.0 was released on 7 June 2024. The program is named after Octave Levenspiel, a former professor of the principal author. Levenspiel was known for his ability to perform quick back-of-the-envelope calculations. == Development history == == Developments == In addition to use on desktops for personal scientific computing, Octave is used in academia and industry. For example, Octave was used on a massive parallel computer at Pittsburgh Supercomputing Center to find vulnerabilities related to guessing social security numbers. Acceleration with OpenCL or CUDA is also possible with use of GPUs. == Technical details == Octave is written in C++ using the C++ standard library. Octave uses an interpreter to execute the Octave scripting language. Octave is extensible using dynamically loadable modules. Octave interpreter has an OpenGL-based graphics engine to create plots, graphs and charts and to save or print them. Alternatively, gnuplot can be used for the same purpose. Octave includes a graphical user interface (GUI) in addition to the traditional command-line interface (CLI); see #User interfaces for details. == Octave, the language == The Octave language is an interpreted programming language. It is a structured programming language (similar to C) and supports many common C standard library functions, and also certain UNIX system calls and functions. However, it does not support passing arguments by reference although function arguments are copy-on-write to avoid unnecessary duplication. Octave programs consist of a list of function calls or a script. The syntax is matrix-based and provides various functions for matrix operations. It supports various data structures and allows object-oriented programming. Its syntax is very similar to MATLAB, and careful programming of a script will allow it to run on both Octave and MATLAB. Because Octave is made available under the GNU General Public License, it may be freely changed, copied and used. The program runs on Microsoft Windows and most Unix and Unix-like operating systems, including Linux, Android, and macOS. == Notable features == === Command and variable name completion === Typing a TAB character on the command line causes Octave to attempt to complete variable, function, and file names (similar to Bash's tab completion). Octave uses the text before the cursor as the initial portion of the name to complete. === Command history === When running interactively, Octave saves the commands typed in an internal buffer so that they can be recalled and edited. === Data structures === Octave includes a limited amount of support for organizing data in structures. In this example, we see a structure x with elements a, b, and c, (an integer, an array, and a string, respectively): === Short-circuit Boolean operators === Octave's && and || logical operators are evaluated in a short-circuit fashion (like the corresponding operators in the C language), in contrast to the element-by-element operators & and |. === Increment and decrement operators === Octave includes the C-like increment and decrement operators ++ and -- in both their prefix and postfix forms. Octave also does augmented assignment, e.g. x += 5. === Unwind-protect === Octave supports a limited form of exception handling modelled after the unwind_protect of Lisp. The general form of an unwind_protect block looks like this: As a general rule, GNU Octave recognizes as termination of a given block either the keyword end (which is compatible with the MATLAB language) or a more specific keyword endblock or, in some cases, end_block. As a consequence, an unwind_protect block can be terminated either with the keyword end_unwind_protect as in the example, or with the more portable keyword end. The cleanup part of the block is always executed. In case an exception is raised by the body part, cleanup is executed immediately before propagating the exception outside the block unwind_protect. GNU Octave also supports another form of exception handling (compatible with the MATLAB language): This latter form differs from an unwind_protect block in two ways. First, exception_handling is only executed when an exception is raised by body. Second, after the execution of exception_handling the exception is not propagated outside the block (unless a rethrow( lasterror ) statement is explicitly inserted within the exception_handling code). === Variable-length argument lists === Octave has a mechanism for handling functions that take an unspecified number of arguments without explicit upper limit. To specify a list of zero or more arguments, use the special argument varargin as the last (or only) argument in the list. varargin is a cell array containing all the input arguments. === Variable-length return lists === A function can be set up to return any number of values by using the special return value varargout. For example: === C++ integration === It is also possible to execute Octave code directly in a C++ program. For example, here is a code snippet for calling rand([10,1]): C and C++ code can be integrated into GNU Octave by creating oct files, or using the MATLAB compatible MEX files. == MATLAB compatibility == Octave has been built with MATLAB compatibility in mind, and shares many features with MATLAB: % Script: myscript.m a = 5; b = a 2 % Function: myfunc.m function result = myfunc(x) result = x^2 + 3; end Matrices as fundamental data type. Built-in support for complex numbers. Powerful built-in math functions and extensive function libraries. Extensibility in the form of user-defined functions. Octave treats incompatibility with MATLAB as a bug; therefore, it could be considered a software clone, which does not infringe software copyright as per Lotus v. Borland court case. MATLAB scripts from the MathWorks' FileExchange repository in principle are compatible with Octave. However, while they are often provided and uploaded by users under an Octave compatible and proper open source BSD license, the FileExchange Terms of use prohibit any usage beside MathWorks' proprietary MATLAB. === Syntax compatibility === There are a few purposeful, albeit minor, syntax additions Archived 2012-04-26 at the Wayback Machine: Comment lines can be prefixed with the # character as well as the % character; Various C-based operators ++, --, +=, =, /= are supported; Elements can be referenced without creating a new variable by cascaded indexing, e.g. [1:10](3); Strings can be defined with the double-quote " character as well as the single-quote ' character; When the variable type is single (a single-precision floating-point number), Octave calculates the "mean" in the single-domain (MATLAB in double-domain) which is faster but gives less accurate results; Blocks can also be terminated with more specific Control structure keywords, i.e., endif, endfor, endwhile, etc.; Functions can be defined within scripts and at the Octave prompt; Presence of a do-until loop (similar to do-while in C). === Function compatibility === Many, but not all, of the numerous MATLAB functions are available in GNU Octave, some of them accessible through packages in Octave Forge. The functions available as part of either core Octave or Forge packages are listed online Archived 2024-03-14 at the Wayback Machine. A list of unavailable functions is included in the Octave function __unimplemented.m__. Unimplemented functions are also listed under many Octave Forge packages in the Octave Wiki. When an unimplemented function is called the following error message is shown: == User interfaces == Octave comes with an official graphical user interface (GUI) and an integrated development environment (IDE) based on Qt. It has been available since Octave 3.8, and has become the default interface (over the command-line interface) with the release of Octave 4.0. It was well-received by an EDN contributor, who wrote "[Octave] now has a very workable GUI" in reviewing the then-new GUI in 2014. Several 3rd-party graphical front-ends have also been developed, like ToolboX for coding education. == GUI applications == With Octave code, the user can create GUI applications. See GUI Development (GNU Octave (version 7.1.0)). Below are some examples: Button, edit control, checkboxTextboxListbox wit

    Read more →