AI Avatar Tools

AI Avatar Tools — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Huroof

    Huroof

    Huroof (Arabic: حروف, lit. 'letters') is an Android kids application produced by the Islamic State, specifically the Islamic States' Al-Himmah Library, which is targeted towards kids in order to teach kids the Arabic alphabet, and to also get kids to support the Islamic State and its practices. == Application == Huroof uses child-like appearances on the main menu, and throughout multiple of Huroof's in-game games for learning the alphabet, a lot of the games reference jihadist concepts, including imagery of weapons (such as missile, tank, cannon, sword,...), 'violent' images, as well as Islamic State imagery, including the flag of the Islamic State, Huroof uses nasheeds from Ajnad Media Foundation for audio production in the app. Reportedly, Huroof was released via Telegram channels of the Islamic State, as well as other file sharing websites. It is not the first moblie app released by Islamic State, but it is the first time they released a moblie application targeting children. === Nasheed game === In the Huroof app, there's a game where you listen to a radio, with the Al-Bayan logo on it, and learn the Arabic alphabet while the nasheed plays. === Writing game === In Huroof, there's a game where you can write out letters of the Arabic alphabet, as well as numbers while a small child tells you what they are. === Letter choosing game === In the app, there's a game they shows you images, and you choose which letter that image/item starts with.

    Read more →
  • Dendrogram

    Dendrogram

    A dendrogram is a diagram representing a tree graph. This diagrammatic representation is frequently used in different contexts: in hierarchical clustering, it illustrates the arrangement of the clusters produced by the corresponding analyses. in computational biology, it shows the clustering of genes or samples, sometimes in the margins of heatmaps. in phylogenetics, it displays the evolutionary relationships among various biological taxa. In this case, the dendrogram is also called a phylogenetic tree. The name dendrogram derives from the two ancient greek words δένδρον (déndron), meaning "tree", and γράμμα (grámma), meaning "drawing, mathematical figure". == Clustering example == For a clustering example, suppose that five taxa ( a {\displaystyle a} to e {\displaystyle e} ) have been clustered by UPGMA based on a matrix of genetic distances. The hierarchical clustering dendrogram would show a column of five nodes representing the initial data (here individual taxa), and the remaining nodes represent the clusters to which the data belong, with the arrows representing the distance (dissimilarity). The distance between merged clusters is monotone, increasing with the level of the merger: the height of each node in the plot is proportional to the value of the intergroup dissimilarity between its two daughters (the nodes on the right representing individual observations all plotted at zero height).

    Read more →
  • Proximal policy optimization

    Proximal policy optimization

    Proximal policy optimization (PPO) is a reinforcement learning (RL) algorithm for training an intelligent agent. Specifically, it is a policy gradient method, often used for deep RL when the policy network is very large. == History == The predecessor to PPO, Trust Region Policy Optimization (TRPO), was published in 2015. It addressed the instability issue of another algorithm, the Deep Q-Network (DQN), by using the trust region method to limit the KL divergence between the old and new policies. However, TRPO uses the Hessian matrix (a matrix of second derivatives) to enforce the trust region, but the Hessian is inefficient for large-scale problems. PPO was published in 2017. It was essentially an approximation of TRPO that does not require computing the Hessian. The KL divergence constraint was approximated by simply clipping the policy gradient. Since 2018, PPO was the default RL algorithm at OpenAI. PPO has been applied to many areas, such as controlling a robotic arm, beating professional players at Dota 2 (OpenAI Five), and playing Atari games. == TRPO == TRPO, the predecessor of PPO, is an on-policy algorithm. It can be used for environments with either discrete or continuous action spaces. The pseudocode is as follows: Input: initial policy parameters θ 0 {\textstyle \theta _{0}} , initial value function parameters ϕ 0 {\textstyle \phi _{0}} Hyperparameters: KL-divergence limit δ {\textstyle \delta } , backtracking coefficient α {\textstyle \alpha } , maximum number of backtracking steps K {\textstyle K} for k = 0 , 1 , 2 , … {\textstyle k=0,1,2,\ldots } do Collect set of trajectories D k = { τ i } {\textstyle {\mathcal {D}}_{k}=\left\{\tau _{i}\right\}} by running policy π k = π ( θ k ) {\textstyle \pi _{k}=\pi \left(\theta _{k}\right)} in the environment. Compute rewards-to-go R ^ t {\textstyle {\hat {R}}_{t}} . Compute advantage estimates, A ^ t {\textstyle {\hat {A}}_{t}} (using any method of advantage estimation) based on the current value function V ϕ k {\textstyle V_{\phi _{k}}} . Estimate policy gradient as g ^ k = 1 | D k | ∑ τ ∈ D k ∑ t = 0 T ∇ θ log ⁡ π θ ( a t ∣ s t ) | θ k A ^ t {\displaystyle {\hat {g}}_{k}=\left.{\frac {1}{\left|{\mathcal {D}}_{k}\right|}}\sum _{\tau \in {\mathcal {D}}_{k}}\sum _{t=0}^{T}\nabla _{\theta }\log \pi _{\theta }\left(a_{t}\mid s_{t}\right)\right|_{\theta _{k}}{\hat {A}}_{t}} Use the conjugate gradient algorithm to compute x ^ k ≈ H ^ k − 1 g ^ k {\displaystyle {\hat {x}}_{k}\approx {\hat {H}}_{k}^{-1}{\hat {g}}_{k}} where H ^ k {\textstyle {\hat {H}}_{k}} is the Hessian of the sample average KL-divergence. Update the policy by backtracking line search with θ k + 1 = θ k + α j 2 δ x ^ k T H ^ k x ^ k x ^ k {\displaystyle \theta _{k+1}=\theta _{k}+\alpha ^{j}{\sqrt {\frac {2\delta }{{\hat {x}}_{k}^{T}{\hat {H}}_{k}{\hat {x}}_{k}}}}{\hat {x}}_{k}} where j ∈ { 0 , 1 , 2 , … K } {\textstyle j\in \{0,1,2,\ldots K\}} is the smallest value which improves the sample loss and satisfies the sample KL-divergence constraint. Fit value function by regression on mean-squared error: ϕ k + 1 = arg ⁡ min ϕ 1 | D k | T ∑ τ ∈ D k ∑ t = 0 T ( V ϕ ( s t ) − R ^ t ) 2 {\displaystyle \phi _{k+1}=\arg \min _{\phi }{\frac {1}{\left|{\mathcal {D}}_{k}\right|T}}\sum _{\tau \in {\mathcal {D}}_{k}}\sum _{t=0}^{T}\left(V_{\phi }\left(s_{t}\right)-{\hat {R}}_{t}\right)^{2}} typically via some gradient descent algorithm. == PPO == The pseudocode is as follows: Input: initial policy parameters θ 0 {\textstyle \theta _{0}} , initial value function parameters ϕ 0 {\textstyle \phi _{0}} for k = 0 , 1 , 2 , … {\textstyle k=0,1,2,\ldots } do Collect set of trajectories D k = { τ i } {\textstyle {\mathcal {D}}_{k}=\left\{\tau _{i}\right\}} by running policy π k = π ( θ k ) {\textstyle \pi _{k}=\pi \left(\theta _{k}\right)} in the environment. Compute rewards-to-go R ^ t {\textstyle {\hat {R}}_{t}} . Compute advantage estimates, A ^ t {\textstyle {\hat {A}}_{t}} (using any method of advantage estimation) based on the current value function V ϕ k {\textstyle V_{\phi _{k}}} . Update the policy by maximizing the PPO-Clip objective: θ k + 1 = arg ⁡ max θ 1 | D k | T ∑ τ ∈ D k ∑ t = 0 T min ( π θ ( a t ∣ s t ) π θ k ( a t ∣ s t ) A π θ k ( s t , a t ) , g ( ϵ , A π θ k ( s t , a t ) ) ) {\displaystyle \theta _{k+1}=\arg \max _{\theta }{\frac {1}{\left|{\mathcal {D}}_{k}\right|T}}\sum _{\tau \in {\mathcal {D}}_{k}}\sum _{t=0}^{T}\min \left({\frac {\pi _{\theta }\left(a_{t}\mid s_{t}\right)}{\pi _{\theta _{k}}\left(a_{t}\mid s_{t}\right)}}A^{\pi _{\theta _{k}}}\left(s_{t},a_{t}\right),\quad g\left(\epsilon ,A^{\pi _{\theta _{k}}}\left(s_{t},a_{t}\right)\right)\right)} typically via stochastic gradient ascent with Adam. Fit value function by regression on mean-squared error: ϕ k + 1 = arg ⁡ min ϕ 1 | D k | T ∑ τ ∈ D k ∑ t = 0 T ( V ϕ ( s t ) − R ^ t ) 2 {\displaystyle \phi _{k+1}=\arg \min _{\phi }{\frac {1}{\left|{\mathcal {D}}_{k}\right|T}}\sum _{\tau \in {\mathcal {D}}_{k}}\sum _{t=0}^{T}\left(V_{\phi }\left(s_{t}\right)-{\hat {R}}_{t}\right)^{2}} typically via some gradient descent algorithm. Like all policy gradient methods, PPO is used for training an RL agent whose actions are determined by a differentiable policy function by gradient ascent. Intuitively, a policy gradient method takes small policy update steps, so the agent can reach higher and higher rewards in expectation. Policy gradient methods may be unstable: A step size that is too big may direct the policy in a suboptimal direction, thus having little possibility of recovery; a step size that is too small lowers the overall efficiency. To solve the instability, PPO implements a clip function that constrains the policy update of an agent from being too large, so that larger step sizes may be used without negatively affecting the gradient ascent process. === Basic concepts === To begin the PPO training process, the agent is set in an environment to perform actions based on its current input. In the early phase of training, the agent can freely explore solutions and keep track of the result. Later, with a certain amount of transition samples and policy updates, the agent will select an action to take by randomly sampling from the probability distribution P ( A | S ) {\displaystyle P(A|S)} generated by the policy network. The actions that are most likely to be beneficial will have the highest probability of being selected from the random sample. After an agent arrives at a different scenario (a new state) by acting, it is rewarded with a positive reward or a negative reward. The objective of an agent is to maximize the cumulative reward signal across sequences of states, known as episodes. === Policy gradient laws: the advantage function === The advantage function (denoted as A {\displaystyle A} ) is central to PPO, as it tries to answer the question of whether a specific action of the agent is better or worse than some other possible action in a given state. By definition, the advantage function is an estimate of the relative value for a selected action. If the output of this function is positive, it means that the action in question is better than the average return, so the possibilities of selecting that specific action will increase. The opposite is true for a negative advantage output. The advantage function can be defined as A = Q − V {\displaystyle A=Q-V} , where Q {\displaystyle Q} is the discounted sum of rewards (the total weighted reward for the completion of an episode) and V {\displaystyle V} is the baseline estimate. Since the advantage function is calculated after the completion of an episode, the program records the outcome of the episode. Therefore, calculating advantage is essentially an unsupervised learning problem. The baseline estimate comes from the value function that outputs the expected discounted sum of an episode starting from the current state. In the PPO algorithm, the baseline estimate will be noisy (with some variance), as it also uses a neural network, like the policy function itself. With Q {\displaystyle Q} and V {\displaystyle V} computed, the advantage function is calculated by subtracting the baseline estimate from the actual discounted return. If A > 0 {\displaystyle A>0} , the actual return of the action is better than the expected return from experience; if A < 0 {\displaystyle A<0} , the actual return is worse. === Ratio function === In PPO, the ratio function ( r t {\displaystyle r_{t}} ) calculates the probability of selecting action a {\displaystyle a} in state s {\displaystyle s} given the current policy network, divided by the previous probability under the old policy. In other words: If r t ( θ ) > 1 {\displaystyle r_{t}(\theta )>1} , where θ {\displaystyle \theta } are the policy network parameters, then selecting action a {\displaystyle a} in state s {\displaystyle s} is more likely based on the current policy than the previous policy. If 0 ≤ r t ( θ ) < 1 {\displaystyle 0\leq r_{t}(\theta )<1} , then selecting actio

    Read more →
  • Q-learning

    Q-learning

    Q-learning is a reinforcement learning algorithm that trains an agent to assign values to its possible actions based on its current state, without requiring a model of the environment (model-free). It can handle problems with stochastic transitions and rewards without requiring adaptations. For example, in a grid maze, an agent learns to reach an exit worth 10 points. At a junction, Q-learning might assign a higher value to moving right than left if right gets to the exit faster, improving this choice by trying both directions over time. For any finite Markov decision process, Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over any and all successive steps, starting from the current state. Q-learning can identify an optimal action-selection policy for any given finite Markov decision process, given infinite exploration time and a partly random policy. "Q" refers to the function that the algorithm computes: the expected reward—that is, the quality—of an action taken in a given state. == Reinforcement learning == Reinforcement learning involves an agent, a set of states S {\displaystyle {\mathcal {S}}} , and a set A {\displaystyle {\mathcal {A}}} of actions per state. By performing an action a ∈ A {\displaystyle a\in {\mathcal {A}}} , the agent transitions from state to state. Executing an action in a specific state provides the agent with a reward (a numerical score). The goal of the agent is to maximize its total reward. It does this by adding the maximum reward attainable from future states to the reward for achieving its current state, effectively influencing the current action by the potential future reward. This potential reward is a weighted sum of expected values of the rewards of all future steps starting from the current state. As an example, consider the process of boarding a train, in which the reward is measured by the negative of the total time spent boarding (alternatively, the cost of boarding the train is equal to the boarding time). One strategy is to enter the train door as soon as they open, minimizing the initial wait time for yourself. If the train is crowded, however, then you will have a slow entry after the initial action of entering the door as people are fighting you to depart the train as you attempt to board. The total boarding time, or cost, is then: 0 seconds wait time + 15 seconds fight time On the next day, by random chance (exploration), you decide to wait and let other people depart first. This initially results in a longer wait time. However, less time is spent fighting the departing passengers. Overall, this path has a higher reward than that of the previous day, since the total boarding time is now: 5 second wait time + 0 second fight time Through exploration, despite the initial (patient) action resulting in a larger cost (or negative reward) than in the forceful strategy, the overall cost is lower, thus revealing a more rewarding strategy. == Algorithm == After Δ t {\displaystyle \Delta t} steps into the future the agent will decide some next step. The weight for this step is calculated as γ Δ t {\displaystyle \gamma ^{\Delta t}} , where γ {\displaystyle \gamma } (the discount factor) is a number between 0 and 1 ( 0 ≤ γ ≤ 1 {\displaystyle 0\leq \gamma \leq 1} ). Assuming γ < 1 {\displaystyle \gamma <1} , it has the effect of valuing rewards received earlier higher than those received later (reflecting the value of a "good start"). γ {\displaystyle \gamma } may also be interpreted as the probability to succeed (or survive) at every step Δ t {\displaystyle \Delta t} . The algorithm, therefore, has a function that calculates the quality of a state–action combination: Q : S × A → R {\displaystyle Q:{\mathcal {S}}\times {\mathcal {A}}\to \mathbb {R} } . Before learning begins, ⁠ Q {\displaystyle Q} ⁠ is initialized to a possibly arbitrary fixed value (chosen by the programmer). Then, at each time t {\displaystyle t} the agent selects an action A t {\displaystyle A_{t}} , observes a reward R t + 1 {\displaystyle R_{t+1}} , enters a new state S t + 1 {\displaystyle S_{t+1}} (that may depend on both the previous state S t {\displaystyle S_{t}} and the selected action), and Q {\displaystyle Q} is updated. The core of the algorithm is a Bellman equation as a simple value iteration update, using the weighted average of the current value and the new information: Q n e w ( S t , A t ) ← ( 1 − α ⏟ learning rate ) ⋅ Q ( S t , A t ) ⏟ current value + α ⏟ learning rate ⋅ ( R t + 1 ⏟ reward + γ ⏟ discount factor ⋅ max a Q ( S t + 1 , a ) ⏟ estimate of optimal future value ⏟ new value (temporal difference target) ) {\displaystyle Q^{new}(S_{t},A_{t})\leftarrow (1-\underbrace {\alpha } _{\text{learning rate}})\cdot \underbrace {Q(S_{t},A_{t})} _{\text{current value}}+\underbrace {\alpha } _{\text{learning rate}}\cdot {\bigg (}\underbrace {\underbrace {R_{t+1}} _{\text{reward}}+\underbrace {\gamma } _{\text{discount factor}}\cdot \underbrace {\max _{a}Q(S_{t+1},a)} _{\text{estimate of optimal future value}}} _{\text{new value (temporal difference target)}}{\bigg )}} where R t + 1 {\displaystyle R_{t+1}} is the reward received when moving from the state S t {\displaystyle S_{t}} to the state S t + 1 {\displaystyle S_{t+1}} , and α {\displaystyle \alpha } is the learning rate ( 0 < α ≤ 1 ) {\displaystyle (0<\alpha \leq 1)} . Note that Q n e w ( S t , A t ) {\displaystyle Q^{new}(S_{t},A_{t})} is the sum of three terms: ( 1 − α ) Q ( S t , A t ) {\displaystyle (1-\alpha )Q(S_{t},A_{t})} : the current value (weighted by one minus the learning rate) α R t + 1 {\displaystyle \alpha \,R_{t+1}} : the reward R t + 1 {\displaystyle R_{t+1}} to obtain if action A t {\displaystyle A_{t}} is taken when in state S t {\displaystyle S_{t}} (weighted by learning rate) α γ max a Q ( S t + 1 , a ) {\displaystyle \alpha \gamma \max _{a}Q(S_{t+1},a)} : the maximum reward that can be obtained from state S t + 1 {\displaystyle S_{t+1}} (weighted by learning rate and discount factor) An episode of the algorithm ends when state S t + 1 {\displaystyle S_{t+1}} is a final or terminal state. However, Q-learning can also learn in non-episodic tasks (as a result of the property of convergent infinite series). If the discount factor is lower than 1, the action values are finite even if the problem can contain infinite loops or paths. For all final states s f {\displaystyle s_{f}} , Q ( s f , a ) {\displaystyle Q(s_{f},a)} is never updated, but is set to the reward value r {\displaystyle r} observed for state s f {\displaystyle s_{f}} . In most cases, Q ( s f , a ) {\displaystyle Q(s_{f},a)} can be taken to equal zero. == Influence of variables == === Learning rate === The learning rate or step size determines to what extent newly acquired information overrides old information. A factor of 0 makes the agent learn nothing (exclusively exploiting prior knowledge), while a factor of 1 makes the agent consider only the most recent information (ignoring prior knowledge to explore possibilities). In fully deterministic environments, a learning rate of α t = 1 {\displaystyle \alpha _{t}=1} is optimal. When the problem is stochastic, the algorithm converges under some technical conditions on the learning rate that require it to decrease to zero. In practice, often a constant learning rate is used, such as α t = 0.1 {\displaystyle \alpha _{t}=0.1} for all t {\displaystyle t} . === Discount factor === The discount factor ⁠ γ {\displaystyle \gamma } ⁠ determines the importance of future rewards. A factor of 0 will make the agent "myopic" (or short-sighted) by only considering current rewards, i.e. r t {\displaystyle r_{t}} (in the update rule above), while a factor approaching 1 will make it strive for a long-term high reward. If the discount factor meets or exceeds 1, the action values may diverge. For ⁠ γ = 1 {\displaystyle \gamma =1} ⁠, without a terminal state, or if the agent never reaches one, all environment histories become infinitely long, and utilities with additive, undiscounted rewards generally become infinite. Even with a discount factor only slightly lower than 1, Q-function learning leads to propagation of errors and instabilities when the value function is approximated with an artificial neural network. In that case, starting with a lower discount factor and increasing it towards its final value accelerates learning. === Initial conditions (Q0) === Since Q-learning is an iterative algorithm, it implicitly assumes an initial condition before the first update occurs. High initial values, also known as "optimistic initial conditions", can encourage exploration: no matter what action is selected, the update rule will cause it to have lower values than the other alternative, thus increasing their choice probability. The first reward r {\displaystyle r} can be used to reset the initial conditions. According to this idea, the first time an action is taken the reward is used to set the value

    Read more →
  • Yorba (software)

    Yorba (software)

    Yorba is a web-based personal information management platform for finding, monitoring, or deleting online accounts and subscriptions. Yorba is a participating member of Consumer Reports’ Data Rights Protocol (DRP) consortium that develops open technical standards for exercising consumer data rights under laws including the California Consumer Privacy Act. == History == Yorba began as a research project around 2021. It was founded by Chris Zeunstrom (CEO), Nolan Cabeje (CDO) and David Schmudde (CTO). Zeunstrom says he began developing Yorba after growing frustrated with managing numerous email accounts, noting overloaded inboxes create distraction and potential security vulnerabilities. Yorba’s early development was also influenced by security issues he encountered at a previous company, which had been affected by data breaches at a time when such incidents were becoming increasingly common. In 2023, Yorba launched a private beta as a public benefit corporation funded through a give-back model operated by Zeunstrom's New York-based design firm, Ruca. In January 2024, Yorba entered public beta and reported over 1,000 users, including 160 premium subscribers. At the time of the public beta launch, Yorba integrated with Gmail and announced plans to expand compatibility to other online services and cloud storage providers. In September 2024, Yorba completed conformance testing under the Data Rights Protocol, an initiative developed by Consumer Reports, to establish a standard and open-source framework for securely transmitting consumer data rights requests under laws like the California Consumer Privacy Act. Yorba was named among twelve participating companies that implemented the protocol alongside OneTrust and Consumer Reports’ own Permission Slip app. Yorba was one of nine startups selected as 2025 finalist in the Santander X Global Awards international entrepreneurship competition. == Features == Yorba scans user inbox history data to identify online accounts, mailing lists, and possible data breaches. It uses natural language processing and machine learning to identify a user's accounts, services, and subscriptions. The platform prompts password resets for compromised accounts and locates unused accounts. The platform also supports mailing list management by identifying and helping users unsubscribe from newsletters. Paid subscribers can locate and cancel recurring charges. Yorba links with financial institutions in the U.S., Canada, and EU via Plaid Inc. to detect recurring charges and delete unwanted subscriptions. == Privacy and Ethics == Yorba's founder has openly criticized dark patterns that make canceling services difficult, citing personal frustration with inbox clutter as part of his inspiration for Yorba. Yorba offers privacy policy analysis in partnership with Amsterdam-based nonprofit Terms of Service; Didn’t Read, assigning grades based on invasiveness or ethical concerns. As of 2024, the company described its pricing as designed to cover operational costs and sustain the platform without outside investment.

    Read more →
  • Confirmatory blockmodeling

    Confirmatory blockmodeling

    Confirmatory blockmodeling is a deductive approach in blockmodeling, where a blockmodel (or part of it) is prespecify before the analysis, and then the analysis is fit to this model. When only a part of analysis is prespecify (like individual cluster(s) or location of the block types), it is called partially confirmatory blockmodeling. This is so-called indirect approach, where the blockmodeling is done on the blockmodel fitting (e.g., a priori hypothesized blockmodel). Opposite approach to the confirmatory blockmodeling is an inductive exploratory blockmodeling.

    Read more →
  • Multiple correspondence analysis

    Multiple correspondence analysis

    In statistics, multiple correspondence analysis (MCA) is a data analysis technique for nominal categorical data, used to detect and represent underlying structures in a data set. It does this by representing data as points in a low-dimensional Euclidean space. The procedure thus appears to be the counterpart of principal component analysis for categorical data. MCA can be viewed as an extension of simple correspondence analysis (CA) in that it is applicable to a large set of categorical variables. == As an extension of correspondence analysis == MCA is performed by applying the CA algorithm to either an indicator matrix (also called complete disjunctive table – CDT) or a Burt table formed from these variables. An indicator matrix is an individuals × variables matrix, where the rows represent individuals and the columns are dummy variables representing categories of the variables. Analyzing the indicator matrix allows the direct representation of individuals as points in geometric space. The Burt table is the symmetric matrix of all two-way cross-tabulations between the categorical variables, and has an analogy to the covariance matrix of continuous variables. Analyzing the Burt table is a more natural generalization of simple correspondence analysis, and individuals or the means of groups of individuals can be added as supplementary points to the graphical display. In the indicator matrix approach, associations between variables are uncovered by calculating the chi-square distance between different categories of the variables and between the individuals (or respondents). These associations are then represented graphically as "maps", which eases the interpretation of the structures in the data. Oppositions between rows and columns are then maximized, in order to uncover the underlying dimensions best able to describe the central oppositions in the data. As in factor analysis or principal component analysis, the first axis is the most important dimension, the second axis the second most important, and so on, in terms of the amount of variance accounted for. The number of axes to be retained for analysis is determined by calculating modified eigenvalues. == Details == Since MCA is adapted to draw statistical conclusions from categorical variables (such as multiple choice questions), the first thing one needs to do is to transform quantitative data (such as age, size, weight, day time, etc) into categories (using for instance statistical quantiles). When the dataset is completely represented as categorical variables, one is able to build the corresponding so-called complete disjunctive table. We denote this table X {\displaystyle X} . If I {\displaystyle I} persons answered a survey with J {\displaystyle J} multiple choices questions with 4 answers each, X {\displaystyle X} will have I {\displaystyle I} rows and 4 J {\displaystyle 4J} columns. More theoretically, assume X {\displaystyle X} is the completely disjunctive table of I {\displaystyle I} observations of K {\displaystyle K} categorical variables. Assume also that the k {\displaystyle k} -th variable have J k {\displaystyle J_{k}} different levels (categories) and set J = ∑ k = 1 K J k {\displaystyle J=\sum _{k=1}^{K}J_{k}} . The table X {\displaystyle X} is then a I × J {\displaystyle I\times J} matrix with all coefficient being 0 {\displaystyle 0} or 1 {\displaystyle 1} . Set the sum of all entries of X {\displaystyle X} to be N {\displaystyle N} and introduce Z = X / N {\displaystyle Z=X/N} . In an MCA, there are also two special vectors: first r {\displaystyle r} , that contains the sums along the rows of Z {\displaystyle Z} , and c {\displaystyle c} , that contains the sums along the columns of Z {\displaystyle Z} . Note D r = diag ( r ) {\displaystyle D_{r}={\text{diag}}(r)} and D c = diag ( c ) {\displaystyle D_{c}={\text{diag}}(c)} , the diagonal matrices containing r {\displaystyle r} and c {\displaystyle c} respectively as diagonal. With these notations, computing an MCA consists essentially in the singular value decomposition of the matrix: M = D r − 1 / 2 ( Z − r c T ) D c − 1 / 2 {\displaystyle M=D_{r}^{-1/2}(Z-rc^{T})D_{c}^{-1/2}} The decomposition of M {\displaystyle M} gives you P {\displaystyle P} , Δ {\displaystyle \Delta } and Q {\displaystyle Q} such that M = P Δ Q T {\displaystyle M=P\Delta Q^{T}} with P, Q two unitary matrices and Δ {\displaystyle \Delta } is the generalized diagonal matrix of the singular values (with the same shape as Z {\displaystyle Z} ). The positive coefficients of Δ 2 {\displaystyle \Delta ^{2}} are the eigenvalues of Z {\displaystyle Z} . The interest of MCA comes from the way observations (rows) and variables (columns) in Z {\displaystyle Z} can be decomposed. This decomposition is called a factor decomposition. The coordinates of the observations in the factor space are given by F = D r − 1 / 2 P Δ {\displaystyle F=D_{r}^{-1/2}P\Delta } The i {\displaystyle i} -th rows of F {\displaystyle F} represent the i {\displaystyle i} -th observation in the factor space. And similarly, the coordinates of the variables (in the same factor space as observations!) are given by G = D c − 1 / 2 Q Δ {\displaystyle G=D_{c}^{-1/2}Q\Delta } == Recent works and extensions == In recent years, several students of Jean-Paul Benzécri have refined MCA and incorporated it into a more general framework of data analysis known as geometric data analysis. This involves the development of direct connections between simple correspondence analysis, principal component analysis and MCA with a form of cluster analysis known as Euclidean classification. Two extensions have great practical use. It is possible to include, as active elements in the MCA, several quantitative variables. This extension is called factor analysis of mixed data (see below). Very often, in questionnaires, the questions are structured in several issues. In the statistical analysis it is necessary to take into account this structure. This is the aim of multiple factor analysis which balances the different issues (i.e. the different groups of variables) within a global analysis and provides, beyond the classical results of factorial analysis (mainly graphics of individuals and of categories), several results (indicators and graphics) specific of the group structure. == Application fields == In the social sciences, MCA is arguably best known for its application by Pierre Bourdieu, notably in his books La Distinction, Homo Academicus and The State Nobility. Bourdieu argued that there was an internal link between his vision of the social as spatial and relational --– captured by the notion of field, and the geometric properties of MCA. Sociologists following Bourdieu's work most often opt for the analysis of the indicator matrix, rather than the Burt table, largely because of the central importance accorded to the analysis of the 'cloud of individuals'. == Multiple correspondence analysis and principal component analysis == MCA can also be viewed as a PCA applied to the complete disjunctive table. To do this, the CDT must be transformed as follows. Let y i k {\displaystyle y_{ik}} denote the general term of the CDT. y i k {\displaystyle y_{ik}} is equal to 1 if individual i {\displaystyle i} possesses the category k {\displaystyle k} and 0 if not. Let denote p k {\displaystyle p_{k}} , the proportion of individuals possessing the category k {\displaystyle k} . The transformed CDT (TCDT) has as general term: x i k = y i k / p k − 1 {\displaystyle x_{ik}=y_{ik}/p_{k}-1} The unstandardized PCA applied to TCDT, the column k {\displaystyle k} having the weight p k {\displaystyle p_{k}} , leads to the results of MCA. This equivalence is fully explained in a book by Jérôme Pagès. It plays an important theoretical role because it opens the way to the simultaneous treatment of quantitative and qualitative variables. Two methods simultaneously analyze these two types of variables: factor analysis of mixed data and, when the active variables are partitioned in several groups: multiple factor analysis. This equivalence does not mean that MCA is a particular case of PCA as it is not a particular case of CA. It only means that these methods are closely linked to one another, as they belong to the same family: the factorial methods. == Software == There are numerous software of data analysis that include MCA, such as STATA and SPSS. The R package FactoMineR also features MCA. This software is related to a book describing the basic methods for performing MCA . There is also a Python package for [1] which works with numpy array matrices; the package has not been implemented yet for Spark dataframes.

    Read more →
  • Ordination (statistics)

    Ordination (statistics)

    Ordination or gradient analysis, in multivariate analysis, is a method complementary to data clustering, and used mainly in exploratory data analysis (rather than in hypothesis testing). In contrast to cluster analysis, ordination orders quantities in a (usually lower-dimensional) latent space. In the ordination space, quantities that are near each other share attributes (i.e., are similar to some degree), and dissimilar objects are farther from each other. Such relationships between the objects, on each of several axes or latent variables, are then characterized numerically and/or graphically in a biplot. The first ordination method, principal components analysis, was suggested by Karl Pearson in 1901. == Methods == Ordination methods can broadly be categorized in eigenvector-, algorithm-, or model-based methods. Many classical ordination techniques, including principal components analysis, correspondence analysis (CA) and its derivatives (detrended correspondence analysis, canonical correspondence analysis, and redundancy analysis, belong to the first group). The second group includes some distance-based methods such as non-metric multidimensional scaling, and machine learning methods such as T-distributed stochastic neighbor embedding and nonlinear dimensionality reduction. The third group includes model-based ordination methods, which can be considered as multivariate extensions of Generalized Linear Models. Model-based ordination methods are more flexible in their application than classical ordination methods, so that it is for example possible to include random-effects. Unlike in the aforementioned two groups, there is no (implicit or explicit) distance measure in the ordination. Instead, a distribution needs to be specified for the responses as is typical for statistical models. These and other assumptions, such as the assumed mean-variance relationship, can be validated with the use of residual diagnostics, unlike in other ordination methods. == Applications == Ordination can be used on the analysis of any set of multivariate objects. It is frequently used in several environmental or ecological sciences, particularly plant community ecology. It is also used in genetics and systems biology for microarray data analysis and in psychometrics.

    Read more →
  • Scanimate

    Scanimate

    Scanimate is an analog computer animation (video synthesizer) system created by Lee Harrison III of Denver, Colorado. Harrison had developed its predecessor, ANIMAC, which generated used a motion capture system, based on a body suit with potentiometers. In contrast, Scanimate included TV technology. Scanimate's successor was called Caesar, and used a digital computer to control the analog system. The eight Scanimate systems were used to produce much of the video-based animation seen on television between most of the 1970s and early 1980s in commercials, promotions, and show openings. One of the major advantages the Scanimate system had over film-based animation and computer animation was the ability to create animations in real time. The speed with which animation could be produced on the system because of this, as well as its range of possible effects, helped it to supersede film-based animation techniques for television graphics. By the mid-1980s, it was superseded by digital computer animation, which produced sharper images and more sophisticated 3D imagery. Animations created on Scanimate and similar analog computer animation systems have a number of characteristic features that distinguish them from film-based animation: the motion is extremely fluid, using all 60 fields per second (in NTSC format video) or 50 fields (in PAL format video) rather than the 24 frames per second that film uses; the colors are much brighter and more saturated; and the images have a very "electronic" look that results from the direct manipulation of video signals through which the Scanimate produces the images. == How it works == A special high-resolution (around 945 lines) monochrome camera records high-contrast artwork. The image is then displayed on a high-resolution screen. Unlike a normal monitor, its deflection signals are passed through a special analog computer that enables the operator to bend the image in a variety of ways. The image is then shot from the screen by either a film camera or a video camera. In the case of a video camera, this signal is then fed into a colorizer, a device that takes certain shades of grey and turns it into color as well as transparency. The idea behind this is that the output of the Scanimate itself is always monochrome. Another advantage of the colorizer is that it gives the operator the ability to continuously add layers of graphics. This makes possible the creation of very complex graphics. This is done by using two video recorders. The background is played by one recorder and then recorded by another one. This process is repeated for every layer. This requires very high-quality video recorders (such as both the Ampex VR-2000 or IVC's IVC-9000 of Scanimate's era, the IVC-9000 being used quite frequently for Scanimate composition due to its very high generational quality between re-recordings). == Current usage == Two of the Scanimates are still in use at ZFx studios in Asheville, North Carolina. The original "Black Swan" R&D machine has been updated with more modern power supplies and can produce material in standard or 1080P high definition video. The "white Pearl" machine is the last one produced and is being kept in its original configuration for historical purposes by David Sieg at ZFx inc. The machines are installed in a working production environment with Grass Valley switchers, Kaleidoscope digital video effects systems, and Accom digital disk recorders for layering. == Use in television, music and films == === Music videos === Let's Groove by Earth, Wind & Fire Get Down on It by Kool & the Gang Blame It on the Boogie by The Jacksons Knock on Wood by Amii Stewart Popcorn Love by New Edition === TV programs/movies === === TV channels/home video/TV productions ===

    Read more →
  • Constellation model

    Constellation model

    The constellation model is a probabilistic, generative model for category-level object recognition in computer vision. Like other part-based models, the constellation model attempts to represent an object class by a set of N parts under mutual geometric constraints. Because it considers the geometric relationship between different parts, the constellation model differs significantly from appearance-only, or "bag-of-words" representation models, which explicitly disregard the location of image features. The problem of defining a generative model for object recognition is difficult. The task becomes significantly complicated by factors such as background clutter, occlusion, and variations in viewpoint, illumination, and scale. Ideally, we would like the particular representation we choose to be robust to as many of these factors as possible. In category-level recognition, the problem is even more challenging because of the fundamental problem of intra-class variation. Even if two objects belong to the same visual category, their appearances may be significantly different. However, for structured objects such as cars, bicycles, and people, separate instances of objects from the same category are subject to similar geometric constraints. For this reason, particular parts of an object such as the headlights or tires of a car still have consistent appearances and relative positions. The Constellation Model takes advantage of this fact by explicitly modeling the relative location, relative scale, and appearance of these parts for a particular object category. Model parameters are estimated using an unsupervised learning algorithm, meaning that the visual concept of an object class can be extracted from an unlabeled set of training images, even if that set contains "junk" images or instances of objects from multiple categories. It can also account for the absence of model parts due to appearance variability, occlusion, clutter, or detector error. == History == The idea for a "parts and structure" model was originally introduced by Fischler and Elschlager in 1973. This model has since been built upon and extended in many directions. The Constellation Model, as introduced by Dr. Perona and his colleagues, was a probabilistic adaptation of this approach. In the late '90s, Burl et al. revisited the Fischler and Elschlager model for the purpose of face recognition. In their work, Burl et al. used manual selection of constellation parts in training images to construct a statistical model for a set of detectors and the relative locations at which they should be applied. In 2000, Weber et al. made the significant step of training the model using a more unsupervised learning process, which precluded the necessity for tedious hand-labeling of parts. Their algorithm was particularly remarkable because it performed well even on cluttered and occluded image data. Fergus et al. then improved upon this model by making the learning step fully unsupervised, having both shape and appearance learned simultaneously, and accounting explicitly for the relative scale of parts. == The method of Weber and Welling et al. == In the first step, a standard interest point detection method, such as Harris corner detection, is used to generate interest points. Image features generated from the vicinity of these points are then clustered using k-means or another appropriate algorithm. In this process of vector quantization, one can think of the centroids of these clusters as being representative of the appearance of distinctive object parts. Appropriate feature detectors are then trained using these clusters, which can be used to obtain a set of candidate parts from images. As a result of this process, each image can now be represented as a set of parts. Each part has a type, corresponding to one of the aforementioned appearance clusters, as well as a location in the image space. === Basic generative model === Weber & Welling here introduce the concept of foreground and background. Foreground parts correspond to an instance of a target object class, whereas background parts correspond to background clutter or false detections. Let T be the number of different types of parts. The positions of all parts extracted from an image can then be represented in the following "matrix," X o = ( x 11 , x 12 , ⋯ , x 1 N 1 x 21 , x 22 , ⋯ , x 2 N 2 ⋮ x T 1 , x T 2 , ⋯ , x T N T ) {\displaystyle X^{o}={\begin{pmatrix}x_{11},x_{12},{\cdots },x_{1N_{1}}\\x_{21},x_{22},{\cdots },x_{2N_{2}}\\\vdots \\x_{T1},x_{T2},{\cdots },x_{TN_{T}}\end{pmatrix}}} where N i {\displaystyle N_{i}\,} represents the number of parts of type i ∈ { 1 , … , T } {\displaystyle i\in \{1,\dots ,T\}} observed in the image. The superscript o indicates that these positions are observable, as opposed to missing. The positions of unobserved object parts can be represented by the vector x m {\displaystyle x^{m}\,} . Suppose that the object will be composed of F {\displaystyle F\,} distinct foreground parts. For notational simplicity, we assume here that F = T {\displaystyle F=T\,} , though the model can be generalized to F > T {\displaystyle F>T\,} . A hypothesis h {\displaystyle h\,} is then defined as a set of indices, with h i = j {\displaystyle h_{i}=j\,} , indicating that point x i j {\displaystyle x_{ij}\,} is a foreground point in X o {\displaystyle X^{o}\,} . The generative probabilistic model is defined through the joint probability density p ( X o , x m , h ) {\displaystyle p(X^{o},x^{m},h)\,} . === Model details === The rest of this section summarizes the details of Weber & Welling's model for a single component model. The formulas for multiple component models are extensions of those described here. To parametrize the joint probability density, Weber & Welling introduce the auxiliary variables b {\displaystyle b\,} and n {\displaystyle n\,} , where b {\displaystyle b\,} is a binary vector encoding the presence/absence of parts in detection ( b i = 1 {\displaystyle b_{i}=1\,} if h i > 0 {\displaystyle h_{i}>0\,} , otherwise b i = 0 {\displaystyle b_{i}=0\,} ), and n {\displaystyle n\,} is a vector where n i {\displaystyle n_{i}\,} denotes the number of background candidates included in the i t h {\displaystyle i^{th}} row of X o {\displaystyle X^{o}\,} . Since b {\displaystyle b\,} and n {\displaystyle n\,} are completely determined by h {\displaystyle h\,} and the size of X o {\displaystyle X^{o}\,} , we have p ( X o , x m , h ) = p ( X o , x m , h , n , b ) {\displaystyle p(X^{o},x^{m},h)=p(X^{o},x^{m},h,n,b)\,} . By decomposition, p ( X o , x m , h , n , b ) = p ( X o , x m | h , n , b ) p ( h | n , b ) p ( n ) p ( b ) {\displaystyle p(X^{o},x^{m},h,n,b)=p(X^{o},x^{m}|h,n,b)p(h|n,b)p(n)p(b)\,} The probability density over the number of background detections can be modeled by a Poisson distribution, p ( n ) = ∏ i = 1 T 1 n i ! ( M i ) n i e − M i {\displaystyle p(n)=\prod _{i=1}^{T}{\frac {1}{n_{i}!}}(M_{i})^{n_{i}}e^{-M_{i}}} where M i {\displaystyle M_{i}\,} is the average number of background detections of type i {\displaystyle i\,} per image. Depending on the number of parts F {\displaystyle F\,} , the probability p ( b ) {\displaystyle p(b)\,} can be modeled either as an explicit table of length 2 F {\displaystyle 2^{F}\,} , or, if F {\displaystyle F\,} is large, as F {\displaystyle F\,} independent probabilities, each governing the presence of an individual part. The density p ( h | n , b ) {\displaystyle p(h|n,b)\,} is modeled by p ( h | n , b ) = { 1 ∏ f = 1 F N f b f , if h ∈ H ( b , n ) 0 , for other h {\displaystyle p(h|n,b)={\begin{cases}{\frac {1}{\textstyle \prod _{f=1}^{F}N_{f}^{b_{f}}}},&{\mbox{if }}h\in H(b,n)\\0,&{\mbox{for other }}h\end{cases}}} where H ( b , n ) {\displaystyle H(b,n)\,} denotes the set of all hypotheses consistent with b {\displaystyle b\,} and n {\displaystyle n\,} , and N f {\displaystyle N_{f}\,} denotes the total number of detections of parts of type f {\displaystyle f\,} . This expresses the fact that all consistent hypotheses, of which there are ∏ f = 1 F N f b f {\displaystyle \textstyle \prod _{f=1}^{F}N_{f}^{b_{f}}} , are equally likely in the absence of information on part locations. And finally, p ( X o , x m | h , n ) = p f g ( z ) p b g ( x b g ) {\displaystyle p(X^{o},x^{m}|h,n)=p_{fg}(z)p_{bg}(x_{bg})\,} where z = ( x o x m ) {\displaystyle z=(x^{o}x^{m})\,} are the coordinates of all foreground detections, observed and missing, and x b g {\displaystyle x_{bg}\,} represents the coordinates of the background detections. Note that foreground detections are assumed to be independent of the background. p f g ( z ) {\displaystyle p_{fg}(z)\,} is modeled as a joint Gaussian with mean μ {\displaystyle \mu \,} and covariance Σ {\displaystyle \Sigma \,} . === Classification === The ultimate objective of this model is to classify images into classes "object present" (class C 1 {\displaystyle C_{1}\,} ) and "object absent" (class C 0 {\displaystyle C_{0}\,} ) given t

    Read more →
  • Diffusion map

    Diffusion map

    Diffusion maps is a dimensionality reduction or feature extraction algorithm introduced by Coifman and Lafon which computes a family of embeddings of a data set into Euclidean space (often low-dimensional) whose coordinates can be computed from the eigenvectors and eigenvalues of a diffusion operator on the data. The Euclidean distance between points in the embedded space is equal to the "diffusion distance" between probability distributions centered at those points. Different from linear dimensionality reduction methods such as principal component analysis (PCA), diffusion maps are part of the family of nonlinear dimensionality reduction methods which focus on discovering the underlying manifold that the data has been sampled from. By integrating local similarities at different scales, diffusion maps give a global description of the data-set. Compared with other methods, the diffusion map algorithm is robust to noise perturbation and computationally inexpensive. == Definition of diffusion maps == Following and , diffusion maps can be defined in four steps. === Connectivity === Diffusion maps exploit the relationship between heat diffusion and random walk Markov chain. The basic observation is that if we take a random walk on the data, walking to a nearby data-point is more likely than walking to another that is far away. Let ( X , A , μ ) {\displaystyle (X,{\mathcal {A}},\mu )} be a measure space, where X {\displaystyle X} is the data set and μ {\displaystyle \mu } represents the distribution of the points on X {\displaystyle X} . Based on this, the connectivity k {\displaystyle k} between two data points, x {\displaystyle x} and y {\displaystyle y} , can be defined as the probability of walking from x {\displaystyle x} to y {\displaystyle y} in one step of the random walk. Usually, this probability is specified in terms of a kernel function of the two points: k : X × X → R {\displaystyle k:X\times X\rightarrow \mathbb {R} } . For example, the popular Gaussian kernel: k ( x , y ) = exp ⁡ ( − | | x − y | | 2 ϵ ) {\displaystyle k(x,y)=\exp \left(-{\frac {||x-y||^{2}}{\epsilon }}\right)} More generally, the kernel function has the following properties k ( x , y ) = k ( y , x ) {\displaystyle k(x,y)=k(y,x)} ( k {\displaystyle k} is symmetric) k ( x , y ) ≥ 0 ∀ x , y {\displaystyle k(x,y)\geq 0\,\,\forall x,y} ( k {\displaystyle k} is positivity preserving). The kernel constitutes the prior definition of the local geometry of the data-set. Since a given kernel will capture a specific feature of the data set, its choice should be guided by the application that one has in mind. This is a major difference with methods such as principal component analysis, where correlations between all data points are taken into account at once. Given ( X , k ) {\displaystyle (X,k)} , we can then construct a reversible discrete-time Markov chain on X {\displaystyle X} (a process known as the normalized graph Laplacian construction): d ( x ) = ∫ X k ( x , y ) d μ ( y ) {\displaystyle d(x)=\int _{X}k(x,y)d\mu (y)} and define: p ( x , y ) = k ( x , y ) d ( x ) {\displaystyle p(x,y)={\frac {k(x,y)}{d(x)}}} Although the new normalized kernel does not inherit the symmetric property, it does inherit the positivity-preserving property and gains a conservation property: ∫ X p ( x , y ) d μ ( y ) = 1 {\displaystyle \int _{X}p(x,y)d\mu (y)=1} === Diffusion process === From p ( x , y ) {\displaystyle p(x,y)} we can construct a transition matrix of a Markov chain ( M {\displaystyle M} ) on X {\displaystyle X} . In other words, p ( x , y ) {\displaystyle p(x,y)} represents the one-step transition probability from x {\displaystyle x} to y {\displaystyle y} , and M t {\displaystyle M^{t}} gives the t-step transition matrix. We define the diffusion matrix L {\displaystyle L} (it is also a version of graph Laplacian matrix) L i , j = k ( x i , x j ) {\displaystyle L_{i,j}=k(x_{i},x_{j})\,} We then define the new kernel L i , j ( α ) = k ( α ) ( x i , x j ) = L i , j ( d ( x i ) d ( x j ) ) α {\displaystyle L_{i,j}^{(\alpha )}=k^{(\alpha )}(x_{i},x_{j})={\frac {L_{i,j}}{(d(x_{i})d(x_{j}))^{\alpha }}}\,} or equivalently, L ( α ) = D − α L D − α {\displaystyle L^{(\alpha )}=D^{-\alpha }LD^{-\alpha }\,} where D is a diagonal matrix and D i , i = ∑ j L i , j . {\displaystyle D_{i,i}=\sum _{j}L_{i,j}.} We apply the graph Laplacian normalization to this new kernel: M = ( D ( α ) ) − 1 L ( α ) , {\displaystyle M=({D}^{(\alpha )})^{-1}L^{(\alpha )},\,} where D ( α ) {\displaystyle D^{(\alpha )}} is a diagonal matrix and D i , i ( α ) = ∑ j L i , j ( α ) . {\displaystyle {D}_{i,i}^{(\alpha )}=\sum _{j}L_{i,j}^{(\alpha )}.} p ( x j , t | x i ) = M i , j t {\displaystyle p(x_{j},t|x_{i})=M_{i,j}^{t}\,} One of the main ideas of the diffusion framework is that running the chain forward in time (taking larger and larger powers of M {\displaystyle M} ) reveals the geometric structure of X {\displaystyle X} at larger and larger scales (the diffusion process). Specifically, the notion of a cluster in the data set is quantified as a region in which the probability of escaping this region is low (within a certain time t). Therefore, t not only serves as a time parameter, but it also has the dual role of scale parameter. The eigendecomposition of the matrix M t {\displaystyle M^{t}} yields M i , j t = ∑ l λ l t ψ l ( x i ) ϕ l ( x j ) {\displaystyle M_{i,j}^{t}=\sum _{l}\lambda _{l}^{t}\psi _{l}(x_{i})\phi _{l}(x_{j})\,} where { λ l } {\displaystyle \{\lambda _{l}\}} is the sequence of eigenvalues of M {\displaystyle M} and { ψ l } {\displaystyle \{\psi _{l}\}} and { ϕ l } {\displaystyle \{\phi _{l}\}} are the biorthogonal left and right eigenvectors respectively. Due to the spectrum decay of the eigenvalues, only a few terms are necessary to achieve a given relative accuracy in this sum. ==== Parameter α and the diffusion operator ==== The reason to introduce the normalization step involving α {\displaystyle \alpha } is to tune the influence of the data point density on the infinitesimal transition of the diffusion. In some applications, the sampling of the data is generally not related to the geometry of the manifold we are interested in describing. In this case, we can set α = 1 {\displaystyle \alpha =1} and the diffusion operator approximates the Laplace–Beltrami operator. We then recover the Riemannian geometry of the data set regardless of the distribution of the points. To describe the long-term behavior of the point distribution of a system of stochastic differential equations, we can use α = 0.5 {\displaystyle \alpha =0.5} and the resulting Markov chain approximates the Fokker–Planck diffusion. With α = 0 {\displaystyle \alpha =0} , it reduces to the classical graph Laplacian normalization. === Diffusion distance === The diffusion distance at time t {\displaystyle t} between two points can be measured as the similarity of two points in the observation space with the connectivity between them. It is given by D t ( x i , x j ) 2 = ∑ y ( p ( y , t | x i ) − p ( y , t | x j ) ) 2 ϕ 0 ( y ) {\displaystyle D_{t}(x_{i},x_{j})^{2}=\sum _{y}{\frac {(p(y,t|x_{i})-p(y,t|x_{j}))^{2}}{\phi _{0}(y)}}} where ϕ 0 ( y ) {\displaystyle \phi _{0}(y)} is the stationary distribution of the Markov chain, given by the first left eigenvector of M {\displaystyle M} . Explicitly: ϕ 0 ( y ) = d ( y ) ∑ z ∈ X d ( z ) {\displaystyle \phi _{0}(y)={\frac {d(y)}{\sum _{z\in X}d(z)}}} Intuitively, D t ( x i , x j ) {\displaystyle D_{t}(x_{i},x_{j})} is small if there is a large number of short paths connecting x i {\displaystyle x_{i}} and x j {\displaystyle x_{j}} . There are several interesting features associated with the diffusion distance, based on our previous discussion that t {\displaystyle t} also serves as a scale parameter: Points are closer at a given scale (as specified by D t ( x i , x j ) {\displaystyle D_{t}(x_{i},x_{j})} ) if they are highly connected in the graph, therefore emphasizing the concept of a cluster. This distance is robust to noise, since the distance between two points depends on all possible paths of length t {\displaystyle t} between the points. From a machine learning point of view, the distance takes into account all evidences linking x i {\displaystyle x_{i}} to x j {\displaystyle x_{j}} , allowing us to conclude that this distance is appropriate for the design of inference algorithms based on the majority of preponderance. === Diffusion process and low-dimensional embedding === The diffusion distance can be calculated using the eigenvectors by D t ( x i , x j ) 2 = ∑ l λ l 2 t ( ψ l ( x i ) − ψ l ( x j ) ) 2 {\displaystyle D_{t}(x_{i},x_{j})^{2}=\sum _{l}\lambda _{l}^{2t}(\psi _{l}(x_{i})-\psi _{l}(x_{j}))^{2}\,} So the eigenvectors can be used as a new set of coordinates for the data. The diffusion map is defined as: Ψ t ( x ) = ( λ 1 t ψ 1 ( x ) , λ 2 t ψ 2 ( x ) , … , λ k t ψ k ( x ) ) {\displaystyle \Psi _{t}(x)=(\lambda _{1}^{t}\psi _{1}(x),\lambda _{2}^{t}\psi _{2}(x),\ld

    Read more →
  • List of datasets for machine-learning research

    List of datasets for machine-learning research

    These datasets are used in machine learning (ML) research and have been cited in peer-reviewed academic journals. Datasets are an integral part of the field of machine learning. Major advances in this field can result from advances in learning algorithms (such as deep learning), computer hardware, and, less intuitively, the availability of high-quality training datasets. High-quality labeled training datasets for supervised and semi-supervised machine-learning algorithms are usually difficult and expensive to produce because of the large amount of time needed to label the data. Although they do not need to be labeled, high-quality unlabeled datasets for unsupervised learning can also be difficult and costly to produce. Many organizations, including governments, publish and share their datasets, often using common metadata formats (such as Croissant). The datasets are classified, based on the licenses, into two groups: open data and non-open data. The datasets from various governmental-bodies are presented in List of open government data sites. The datasets are ported on open data portals. They are made available for searching, depositing and accessing through interfaces like Open API. The datasets are made available as various sorted types and subtypes. == List of sorting used for datasets == The data portal is classified based on its type of license. The open source license based data portals are known as open data portals which are used by many government organizations and academic institutions. == List of open data portals == == List of portals suitable for multiple types of applications == The data portal sometimes lists a wide variety of subtypes of datasets pertaining to many machine learning applications. == List of portals suitable for a specific subtype of applications == The data portals which are suitable for a specific subtype of machine learning application are listed in the subsequent sections. == Image data == == Text data == These datasets consist primarily of text for tasks such as natural language processing, sentiment analysis, translation, and cluster analysis. === Reviews === === News articles === === Messages === === Twitter and tweets === === Dialogues === === Legal === === Other text === == Sound data == These datasets consist of sounds and sound features used for tasks such as speech recognition and speech synthesis. === Speech === === Music === === Other sounds === == Signal data == Datasets containing electric signal information requiring some sort of signal processing for further analysis. === Electrical === === Motion-tracking === === Other signals === == Chemical data == Datasets from physical systems. === Chemical Reactions with transition states (TS) === === OpenReACT-CHON-EFH === OpenReACT-CHON-EFH (Open Reaction Dataset of Atomic ConfiguraTions comprising C, H, O and N with Energies, Forces and Hessians) is a 2025 open-access benchmark for machine-learning interatomic potentials. RTP set – 35,087 stationary-point geometries (reactant, transition state and product) drawn from 11,961 elementary reactions, each labeled with density-functional energies, atomic forces and full Hessian matrices at the ωB97X-D/6-31G(d) level. IRC set – 34,248 structures along 600 minimum-energy reaction paths, used to test extrapolation beyond trained stationary points. NMS set – 62,527 off-equilibrium geometries generated by normal-mode sampling to probe model robustness under thermal perturbations. The collection underpins the study Does Hessian Data Improve the Performance of Machine Learning Potentials? and was used to train and benchmark the machine-learning interatomic potentials reported therein. The dataset itself is distributed under a CC licence via Figshare. == Physical data == Datasets from physical systems. === High-energy physics === === Systems === === Astronomy === === Earth science === === Other physical === == Biological data == Datasets from biological systems. === Human === === Animal === === Fungi === === Plant === === Microbe === === Drug discovery === == Anomaly data == == Question answering data == This section includes datasets that deals with structured data. == Dialog or instruction prompted data == This section includes datasets that contains multi-turn text with at least two actors, a "user" and an "agent". The user makes requests for the agent, which performs the request. == Cybersecurity == == Climate and sustainability == == Code data == == Multivariate data == === Financial === === Weather === === Census === === Transit === === Internet === === Games === === Other multivariate === == Curated repositories of datasets == As datasets come in myriad formats and can sometimes be difficult to use, there has been considerable work put into curating and standardizing the format of datasets to make them easier to use for machine learning research. OpenML: Web platform with Python, R, Java, and other APIs for downloading hundreds of machine learning datasets, evaluating algorithms on datasets, and benchmarking algorithm performance against dozens of other algorithms. PMLB: A large, curated repository of benchmark datasets for evaluating supervised machine learning algorithms. Provides classification and regression datasets in a standardized format that are accessible through a Python API. Metatext NLP: https://metatext.io/datasets web repository maintained by community, containing nearly 1000 benchmark datasets, and counting. Provides many tasks from classification to QA, and various languages from English, Portuguese to Arabic. Appen: Off The Shelf and Open Source Datasets hosted and maintained by the company. These biological, image, physical, question answering, signal, sound, text, and video resources number over 250 and can be applied to over 25 different use cases.

    Read more →
  • Thai QR Payment

    Thai QR Payment

    Thai QR Payment or PromptPay (พร้อมเพย์) is a real-time payment system in Thailand that allows money transfers through digital channels using identifiers linked to a bank account, including a mobile phone number, citizen identification number, tax identification number or bank account number. The system was introduced in 2016 as part of Thailand's national e-payment infrastructure and was developed under the National e-Payment Master Plan, a government programme intended to expand digital payment infrastructure and reduce the use of cash in everyday transactions. It is owned by National ITMX ltd and Bank of Thailand and developed by Vocalink, a group by Mastercard == History == PromptPay (originally AnyID) is one of the National e-Payment projects and policies by Thailand, to regulate and standardize electronic payments to follow the technologies with internet and smartphones that is expanding and bringing technology into Finance and Commerce. By 22 December 2015, The First Prayut cabinet have approved the project as a national infastructure PromptPay has also been used in cross-border payment linkages with other real-time payment systems in Southeast Asia. In April 2021, the Monetary Authority of Singapore and the Bank of Thailand launched a linkage between Singapore's PayNow and Thailand's PromptPay, allowing customers of participating banks to send money between the two countries using a mobile phone number. In June 2021, the central banks of Thailand and Malaysia launched a cross-border QR payment linkage between PromptPay and Malaysia's DuitNow system. == Services == PromptPay's Services have included Encrypted Transactions and Payment between Two Individuals (C2C) Government Infrastructure Payment Tax Returns Individual PromptPay e-Wallet Thai QR Payment Pay Alert e-Donation Cross Border QR Payment

    Read more →
  • Low-rank approximation

    Low-rank approximation

    In mathematics, low-rank approximation refers to the process of approximating a given matrix by a matrix of lower rank. More precisely, it is a minimization problem, in which the cost function measures the fit between a given matrix (the data) and an approximating matrix (the optimization variable), subject to a constraint that the approximating matrix has reduced rank. The problem is used for mathematical modeling and data compression. The rank constraint is related to a constraint on the complexity of a model that fits the data. In applications, often there are other constraints on the approximating matrix apart from the rank constraint, e.g., non-negativity and Hankel structure. Low-rank approximation is closely related to numerous other techniques, including principal component analysis, factor analysis, total least squares, latent semantic analysis, orthogonal regression, and dynamic mode decomposition. == Definition == Given structure specification S : R n p → R m × n {\displaystyle {\mathcal {S}}:\mathbb {R} ^{n_{p}}\to \mathbb {R} ^{m\times n}} , vector of structure parameters p ∈ R n p {\displaystyle p\in \mathbb {R} ^{n_{p}}} , norm ‖ ⋅ ‖ {\displaystyle \|\cdot \|} , and desired rank r {\displaystyle r} , minimize over p ^ ‖ p − p ^ ‖ subject to rank ⁡ ( S ( p ^ ) ) ≤ r . {\displaystyle {\text{minimize}}\quad {\text{over }}{\widehat {p}}\quad \|p-{\widehat {p}}\|\quad {\text{subject to}}\quad \operatorname {rank} {\big (}{\mathcal {S}}({\widehat {p}}){\big )}\leq r.} == Applications == Linear system identification, in which case the approximating matrix is Hankel structured. Machine learning, in which case the approximating matrix is nonlinearly structured. Recommender systems, in which cases the data matrix has missing values and the approximation is categorical. Distance matrix completion, in which case there is a positive definiteness constraint. Natural language processing, in which case the approximation is nonnegative. Computer algebra, in which case the approximation is Sylvester structured. Matrix product states, in which case the approximation is usually rescaled to have fixed Frobenius norm. == Basic low-rank approximation problem == The unstructured problem with fit measured by the Frobenius norm, i.e., minimize over D ^ ‖ D − D ^ ‖ F subject to rank ⁡ ( D ^ ) ≤ r {\displaystyle {\text{minimize}}\quad {\text{over }}{\widehat {D}}\quad \|D-{\widehat {D}}\|_{\text{F}}\quad {\text{subject to}}\quad \operatorname {rank} {\big (}{\widehat {D}}{\big )}\leq r} has an analytic solution in terms of the singular value decomposition of the data matrix. The result is referred to as the matrix approximation lemma or Eckart–Young–Mirsky theorem. This problem was originally solved by Erhard Schmidt in the infinite dimensional context of integral operators (although his methods easily generalize to arbitrary compact operators on Hilbert spaces) and later rediscovered by C. Eckart and G. Young. L. Mirsky generalized the result to arbitrary unitarily invariant norms. Let D = U Σ V ⊤ ∈ R m × n , m ≥ n {\displaystyle D=U\Sigma V^{\top }\in \mathbb {R} ^{m\times n},\quad m\geq n} be the singular value decomposition of D {\displaystyle D} , where Σ =: diag ⁡ ( σ 1 , … , σ r ) {\displaystyle \Sigma =:\operatorname {diag} (\sigma _{1},\ldots ,\sigma _{r})} , where r ≤ min { m , n } = n {\displaystyle r\leq \min\{m,n\}=n} , is the m × n {\displaystyle m\times n} rectangular diagonal matrix with r {\displaystyle r} non-zero singular values σ 1 ≥ … ≥ σ r > σ r + 1 = … = σ n = 0 {\displaystyle \sigma _{1}\geq \ldots \geq \sigma _{r}>\sigma _{r+1}=\ldots =\sigma _{n}=0} . For a given k ∈ { 1 , … , r } {\displaystyle k\in \{1,\dots ,r\}} , partition U {\displaystyle U} , Σ {\displaystyle \Sigma } , and V {\displaystyle V} as follows: U =: [ U 1 U 2 ] , Σ =: [ Σ 1 0 0 Σ 2 ] , and V =: [ V 1 V 2 ] , {\displaystyle U=:{\begin{bmatrix}U_{1}&U_{2}\end{bmatrix}},\quad \Sigma =:{\begin{bmatrix}\Sigma _{1}&0\\0&\Sigma _{2}\end{bmatrix}},\quad {\text{and}}\quad V=:{\begin{bmatrix}V_{1}&V_{2}\end{bmatrix}},} where U 1 {\displaystyle U_{1}} is m × k {\displaystyle m\times k} , Σ 1 {\displaystyle \Sigma _{1}} is k × k {\displaystyle k\times k} , and V 1 {\displaystyle V_{1}} is n × k {\displaystyle n\times k} . Then the rank k {\displaystyle k} matrix D ^ ∗ := U 1 Σ 1 V 1 ⊤ , {\displaystyle {\widehat {D}}^{}:=U_{1}\Sigma _{1}V_{1}^{\top },} obtained from the truncated singular value decomposition is such that ‖ D − D ^ ∗ ‖ F = min rank ⁡ ( D ^ ) ≤ k ‖ D − D ^ ‖ F = σ k + 1 2 + ⋯ + σ r 2 . {\displaystyle \|D-{\widehat {D}}^{}\|_{\text{F}}=\min _{\operatorname {rank} ({\widehat {D}})\leq k}\|D-{\widehat {D}}\|_{\text{F}}={\sqrt {\sigma _{k+1}^{2}+\cdots +\sigma _{r}^{2}}}.} The minimizer D ^ ∗ {\displaystyle {\widehat {D}}^{}} is unique if and only if σ k > σ k + 1 {\displaystyle \sigma _{k}>\sigma _{k+1}} . == Proof of Eckart–Young–Mirsky theorem (for spectral norm) == Let A ∈ R m × n {\displaystyle A\in \mathbb {R} ^{m\times n}} be a real (possibly rectangular) matrix with m ≤ n {\displaystyle m\leq n} . Suppose that A = U Σ V ⊤ {\displaystyle A=U\Sigma V^{\top }} is the singular value decomposition of A {\displaystyle A} . Recall that U {\displaystyle U} and V {\displaystyle V} are orthogonal matrices, and Σ {\displaystyle \Sigma } is an m × n {\displaystyle m\times n} diagonal matrix with entries ( σ 1 , σ 2 , ⋯ , σ m ) {\displaystyle (\sigma _{1},\sigma _{2},\cdots ,\sigma _{m})} such that σ 1 ≥ σ 2 ≥ ⋯ ≥ σ m ≥ 0 {\displaystyle \sigma _{1}\geq \sigma _{2}\geq \cdots \geq \sigma _{m}\geq 0} . We claim that the best rank- k {\displaystyle k} approximation to A {\displaystyle A} in the spectral norm, denoted by ‖ ⋅ ‖ 2 {\displaystyle \|\cdot \|_{2}} , is given by A k := ∑ i = 1 k σ i u i v i ⊤ {\displaystyle A_{k}:=\sum _{i=1}^{k}\sigma _{i}u_{i}v_{i}^{\top }} where u i {\displaystyle u_{i}} and v i {\displaystyle v_{i}} denote the i {\displaystyle i} th column of U {\displaystyle U} and V {\displaystyle V} , respectively. First, note that we have ‖ A − A k ‖ 2 = ‖ ∑ i = 1 n σ i u i v i ⊤ − ∑ i = 1 k σ i u i v i ⊤ ‖ 2 = ‖ ∑ i = k + 1 n σ i u i v i ⊤ ‖ 2 = σ k + 1 {\displaystyle \|A-A_{k}\|_{2}=\left\|\sum _{i=1}^{\color {red}{n}}\sigma _{i}u_{i}v_{i}^{\top }-\sum _{i=1}^{\color {red}{k}}\sigma _{i}u_{i}v_{i}^{\top }\right\|_{2}=\left\|\sum _{i=\color {red}{k+1}}^{n}\sigma _{i}u_{i}v_{i}^{\top }\right\|_{2}=\sigma _{k+1}} Therefore, we need to show that if B k = X Y ⊤ {\displaystyle B_{k}=XY^{\top }} where X {\displaystyle X} and Y {\displaystyle Y} have k {\displaystyle k} columns then ‖ A − A k ‖ 2 = σ k + 1 ≤ ‖ A − B k ‖ 2 {\displaystyle \|A-A_{k}\|_{2}=\sigma _{k+1}\leq \|A-B_{k}\|_{2}} . Since Y {\displaystyle Y} has k {\displaystyle k} columns, then there must be a nontrivial linear combination of the first k + 1 {\displaystyle k+1} columns of V {\displaystyle V} , i.e., w = γ 1 v 1 + ⋯ + γ k + 1 v k + 1 , {\displaystyle w=\gamma _{1}v_{1}+\cdots +\gamma _{k+1}v_{k+1},} such that Y ⊤ w = 0 {\displaystyle Y^{\top }w=0} . Without loss of generality, we can scale w {\displaystyle w} so that ‖ w ‖ 2 = 1 {\displaystyle \|w\|_{2}=1} or (equivalently) γ 1 2 + ⋯ + γ k + 1 2 = 1 {\displaystyle \gamma _{1}^{2}+\cdots +\gamma _{k+1}^{2}=1} . Therefore, ‖ A − B k ‖ 2 2 ≥ ‖ ( A − B k ) w ‖ 2 2 = ‖ A w ‖ 2 2 = γ 1 2 σ 1 2 + ⋯ + γ k + 1 2 σ k + 1 2 ≥ σ k + 1 2 . {\displaystyle \|A-B_{k}\|_{2}^{2}\geq \|(A-B_{k})w\|_{2}^{2}=\|Aw\|_{2}^{2}=\gamma _{1}^{2}\sigma _{1}^{2}+\cdots +\gamma _{k+1}^{2}\sigma _{k+1}^{2}\geq \sigma _{k+1}^{2}.} The result follows by taking the square root of both sides of the above inequality. == Proof of Eckart–Young–Mirsky theorem (for Frobenius norm) == Let A ∈ R m × n {\displaystyle A\in \mathbb {R} ^{m\times n}} be a real (possibly rectangular) matrix with m ≤ n {\displaystyle m\leq n} . Suppose that A = U Σ V ⊤ {\displaystyle A=U\Sigma V^{\top }} is the singular value decomposition of A {\displaystyle A} . We claim that the best rank k {\displaystyle k} approximation to A {\displaystyle A} in the Frobenius norm, denoted by ‖ ⋅ ‖ F {\displaystyle \|\cdot \|_{F}} , is given by A k = ∑ i = 1 k σ i u i v i ⊤ {\displaystyle A_{k}=\sum _{i=1}^{k}\sigma _{i}u_{i}v_{i}^{\top }} where u i {\displaystyle u_{i}} and v i {\displaystyle v_{i}} denote the i {\displaystyle i} th column of U {\displaystyle U} and V {\displaystyle V} , respectively. First, note that we have ‖ A − A k ‖ F 2 = ‖ ∑ i = k + 1 n σ i u i v i ⊤ ‖ F 2 = ∑ i = k + 1 n σ i 2 {\displaystyle \|A-A_{k}\|_{F}^{2}=\left\|\sum _{i=k+1}^{n}\sigma _{i}u_{i}v_{i}^{\top }\right\|_{F}^{2}=\sum _{i=k+1}^{n}\sigma _{i}^{2}} Therefore, we need to show that if B k = X Y ⊤ {\displaystyle B_{k}=XY^{\top }} where X {\displaystyle X} and Y {\displaystyle Y} have k {\displaystyle k} columns then ‖ A − A k ‖ F 2 = ∑ i = k + 1 n σ i 2 ≤ ‖ A − B k ‖ F 2 . {\displaystyle \|A-A_{k}\|_{F}^{2}=\sum _{i=k+1}^{n}\sigma _{i}^{2}\leq \|A-B_{k}\|_{F}^{2}.} By the triangle inequality with the spectral norm

    Read more →
  • Minimum Population Search

    Minimum Population Search

    In evolutionary computation, Minimum Population Search (MPS) is a computational method that optimizes a problem by iteratively trying to improve a set of candidate solutions with regard to a given measure of quality. It solves a problem by evolving a small population of candidate solutions by means of relatively simple arithmetical operations. MPS is a metaheuristic as it makes few or no assumptions about the problem being optimized and can search very large spaces of candidate solutions. For problems where finding the precise global optimum is less important than finding an acceptable local optimum in a fixed amount of time, using a metaheuristic such as MPS may be preferable to alternatives such as brute-force search or gradient descent. MPS is used for multidimensional real-valued functions but does not use the gradient of the problem being optimized, which means MPS does not require for the optimization problem to be differentiable as is required by classic optimization methods such as gradient descent and quasi-newton methods. MPS can therefore also be used on optimization problems that are not even continuous, are noisy, change over time, etc. == Background == In a similar way to Differential evolution, MPS uses difference vectors between the members of the population in order to generate new solutions. It attempts to provide an efficient use of function evaluations by maintaining a small population size. If the population size is smaller than the dimensionality of the search space, then the solutions generated through difference vectors will be constrained to the n − 1 {\displaystyle n-1} dimensional hyperplane. A smaller population size will lead to a more restricted subspace. With a population size equal to the dimensionality of the problem ( n = d ) {\displaystyle (n=d)} , the “line/hyperplane points” in MPS will be generated within a d − 1 {\displaystyle d-1} dimensional hyperplane. Taking a step orthogonal to this hyperplane will allow the search process to cover all the dimensions of the search space. Population size is a fundamental parameter in the performance of population-based heuristics. Larger populations promote exploration, but they also allow fewer generations, and this can reduce the chance of convergence. Searching with a small population can increase the chances of convergence and the efficient use of function evaluations, but it can also induce the risk of premature convergence. If the risk of premature convergence can be avoided, then a population-based heuristic could benefit from the efficiency and faster convergence rate of a smaller population. To avoid premature convergence, it is important to have a diversified population. By including techniques for explicitly increasing diversity and exploration, it is possible to have smaller populations with less risk of premature convergence. === Thresheld Convergence === Thresheld Convergence (TC) is a diversification technique which attempts to separate the processes of exploration and exploitation. TC uses a “threshold” function to establish a minimum search step, and managing this step makes it possible to influence the transition from exploration to exploitation, convergence is thus “held” back until the last stages of the search process. The goal of a controlled transition is to avoid an early concentration of the population around a few search regions and avoid the loss of diversity which can cause premature convergence. Thresheld Convergence has been successfully applied to several population-based metaheuristics such as Particle Swarm Optimization, Differential evolution, Evolution strategies, Simulated annealing and Estimation of Distribution Algorithms. The ideal case for Thresheld Convergence is to have one sample solution from each attraction basin, and for each sample solution to have the same relative fitness with respect to its local optimum. Enforcing a minimum step aims to achieve this ideal case. In MPS Thresheld Convergence is specifically used to preserve diversity and avoid premature convergence by establishing a minimum search step. By disallowing new solutions which are too close to members of the current population, TC forces a strong exploration during the early stages of the search while preserving the diversity of the (small) population. == Algorithm == A basic variant of the MPS algorithm works by having a population of size equal to the dimension of the problem. New solutions are generated by exploring the hyperplane defined by the current solutions (by means of difference vectors) and performing an additional orthogonal step in order to avoid getting caught in this hyperplane. The step sizes are controlled by the Thresheld Convergence technique, which gradually reduces step sizes as the search process advances. An outline for the algorithm is given below: Generate the first initial population. Allowing these solutions to lie near the bounds of the search space generally gives good results: s k = ( r s 1 ∗ b o u n d 1 / 2 , r s 2 ∗ b o u n d 2 / 2 , . . . , r s n ∗ b o u n d n / 2 ) {\displaystyle s_{k}=(rs_{1}bound_{1}/2,rs_{2}bound_{2}/2,...,rs_{n}bound_{n}/2)} where s k {\displaystyle s_{k}} is the k {\displaystyle k} -th population member, r s i {\displaystyle rs_{i}} are random numbers which can be −1 or 1, and the b o u n d i {\displaystyle bound_{i}} are the lower and upper bounds on each dimension. While a stop condition is not reached: Update threshold convergence values ( m i n _ s t e p {\displaystyle min\_step} and m a x _ s t e p {\displaystyle max\_step} ) Calculate the centroid of the current population ( x c {\displaystyle x_{c}} ) For each member of the population ( x i {\displaystyle x_{i}} ), generate a new offspring as follows: Uniformly generate a scaling factor ( F i {\displaystyle F_{i}} ) between − m a x _ s t e p {\displaystyle -max\_step} and m a x _ s t e p {\displaystyle max\_step} Generate a vector ( x o {\displaystyle x_{o}} ) orthogonal to the difference vector between x i {\displaystyle x_{i}} and x c {\displaystyle x_{c}} Calculate a scaling factor for the orthogonal vector: m i n _ o r t h = s q r t ( m a x ( m i n _ s t e p 2 − F i 2 , 0 ) ) {\displaystyle min\_orth=sqrt(max(min\_step^{2}-F_{i}^{2},0))} m a x _ o r t h = s q r t ( m a x ( m a x _ s t e p 2 − F i 2 , 0 ) ) {\displaystyle max\_orth=sqrt(max(max\_step^{2}-F_{i}^{2},0))} o r t h _ s t e p = u n i f o r m ( m i n _ o r t h , m a x _ o r t h ) {\displaystyle orth\_step=uniform(min\_orth,max\_orth)} Generate the new solution by adding the difference and the orthogonal vectors to the original solution n e w _ s o l u t i o n = x i + F i ∗ ( x i − x c ) ∗ o r t h _ s t e p ∗ x o {\displaystyle new\_solution=x_{i}+F_{i}(x_{i}-x_{c})orth\_stepx_{o}} Pick the best members between the old population and the new one by discarding the least fit members. Return the single best solution or the best population found as the final result.

    Read more →