AI Coding Wiki

AI Coding Wiki — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Exercism

    Exercism is a free, open-source online coding platform that offers code practice and mentorship in 77 different programming languages.

    == History ==

    Software developer Katrina Owen created Exercism while she was teaching programming at Jumpstart Labs. The platform began as an internal tool to solve the problem of her own students not receiving feedback on the coding problems they were practicing. Owen put the site publicly online and found that people were sharing it with their friends, practicing together and giving each other feedback. Within 12 months, the site had grown organically to more than 6,000 users who had submitted code or feedback, and hundreds of volunteers were contributing to the languages or tooling on the platform. In 2016, Jeremy Walker joined as co-founder and CEO. In July 2018, the site was relaunched with a new design centered on a formal mentoring mode, at which point Owen stepped back from day-to-day involvement.

    == Product ==

    In the past, the website differed from other coding platforms by requiring students to download exercises through a command-line client, solve them on their own computers, and then submit the solution for feedback, at which point they could also view others' solutions to the same problem. Since its second relaunch in 2021, solutions can be edited and submitted through a web editor, though the command-line client remains available. Exercism has tracks for 74 programming languages. Notable languages taught include ABAP, C, C#, C++, CoffeeScript, Delphi, Elm, Erlang, F#, Gleam, Go, Java, JavaScript, Julia, Kotlin, Objective-C, PHP, Python, Raku, Red, Ruby, Rust, Scala, Swift, and V (Vlang). In 2023, the site launched a "12 in 23" challenge for users to learn the basics of 12 different languages, one per month in 2023.

    == Open source ==

    The Exercism codebase is open source. In April 2016, it consisted of 50 repositories, including website code, API code, command-line code and, most of all, over 40 stand-alone repositories for different language tracks. As of February 2024, Exercism has 14,344 contributors and 19,603 mentors, and maintains 366 repositories.

  • Random forest

    A random forest, or random decision forest, is an ensemble learning method for classification, regression and other tasks that works by creating a multitude of decision trees during training. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the output is the average of the predictions of the trees. Random forests correct for decision trees' habit of overfitting to their training set. The first algorithm for random decision forests was created in 1995 by Tin Kam Ho using the random subspace method, which, in Ho's formulation, is a way to implement the "stochastic discrimination" approach to classification proposed by Eugene Kleinberg. An extension of the algorithm was developed by Leo Breiman and Adele Cutler, who registered "Random Forests" as a trademark in 2006 (as of 2019, owned by Minitab, Inc.). The extension combines Breiman's "bagging" idea and random selection of features, introduced first by Ho and later independently by Amit and Geman, in order to construct a collection of decision trees with controlled variance.

    == History ==

    The general method of random decision forests was first proposed by Salzberg and Heath in 1993, with a method that used a randomized decision tree algorithm to create multiple trees and then combine them using majority voting. This idea was developed further by Ho in 1995. Ho established that forests of trees splitting with oblique hyperplanes can gain accuracy as they grow without suffering from overtraining, as long as the forests are randomly restricted to be sensitive to only selected feature dimensions. A subsequent work along the same lines concluded that other splitting methods behave similarly, as long as they are randomly forced to be insensitive to some feature dimensions.
    This observation that a more complex classifier (a larger forest) gets more accurate nearly monotonically is in sharp contrast to the common belief that the complexity of a classifier can only grow to a certain level of accuracy before being hurt by overfitting. The explanation of the forest method's resistance to overtraining can be found in Kleinberg's theory of stochastic discrimination.

    The early development of Breiman's notion of random forests was influenced by the work of Amit and Geman, who introduced the idea of searching over a random subset of the available decisions when splitting a node, in the context of growing a single tree. The idea of random subspace selection from Ho was also influential in the design of random forests. This method grows a forest of trees, and introduces variation among the trees by projecting the training data into a randomly chosen subspace before fitting each tree or each node. Finally, the idea of randomized node optimization, where the decision at each node is selected by a randomized procedure rather than a deterministic optimization, was first introduced by Thomas G. Dietterich.

    The proper introduction of random forests was made in a paper by Leo Breiman, which has become one of the world's most cited papers. This paper describes a method of building a forest of uncorrelated trees using a CART-like procedure, combined with randomized node optimization and bagging. In addition, this paper combines several ingredients, some previously known and some novel, which form the basis of the modern practice of random forests, in particular using out-of-bag error as an estimate of the generalization error, and measuring variable importance through permutation. The paper also offers the first theoretical result for random forests in the form of a bound on the generalization error which depends on the strength of the trees in the forest and their correlation.

    == Algorithm ==

    === Preliminaries: decision tree learning ===

    Decision trees are a popular method for various machine learning tasks. Tree learning is almost "an off-the-shelf procedure for data mining", say Hastie et al., "because it is invariant under scaling and various other transformations of feature values, is robust to inclusion of irrelevant features, and produces inspectable models. However, they are seldom accurate". In particular, trees that are grown very deep tend to learn highly irregular patterns: they overfit their training sets, i.e. have low bias, but very high variance. Random forests are a way of averaging multiple deep decision trees, trained on different parts of the same training set, with the goal of reducing the variance. This comes at the expense of a small increase in the bias and some loss of interpretability, but generally greatly boosts the performance of the final model.

    === Bagging ===

    The training algorithm for random forests applies the general technique of bootstrap aggregating, or bagging, to tree learners. Given a training set $X = x_1, \ldots, x_n$ with responses $Y = y_1, \ldots, y_n$, bagging repeatedly ($B$ times) selects a random sample with replacement of the training set and fits a tree to each sample, giving one tree $f_b$ per sample $b = 1, \ldots, B$. After training, predictions for unseen samples $x'$ can be made by averaging the predictions from all the individual regression trees on $x'$: $\hat{f} = \frac{1}{B} \sum_{b=1}^{B} f_b(x')$, or by taking the plurality vote in the case of classification trees.

    This bootstrapping procedure leads to better model performance because it decreases the variance of the model, without increasing the bias. This means that while the predictions of a single tree are highly sensitive to noise in its training set, the average of many trees is not, as long as the trees are not correlated.
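The bagging procedure described above can be sketched in plain Python. This is only an illustrative sketch under stated assumptions: the one-split `fit_stump` learner is a hypothetical stand-in for a full regression tree, the inputs are one-dimensional, and the names `bagged_fit` and `bagged_predict` are invented for this example.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a one-split regression 'tree' on 1-D inputs (hypothetical stand-in learner)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not right:  # threshold is the largest x, so no split is possible
            continue
        left_mean, right_mean = statistics.fmean(left), statistics.fmean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x is identical, so predict the overall mean
        mean = statistics.fmean(ys)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def bagged_fit(xs, ys, B, rng):
    """Fit B learners, each on a bootstrap sample (n draws with replacement)."""
    n = len(xs)
    models = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagged_predict(models, x):
    """Average the per-model predictions: f_hat = (1/B) * sum_b f_b(x)."""
    return statistics.fmean(f(x) for f in models)


# Toy data: a step from 0 to 10 between x = 4 and x = 5.
xs = list(range(10))
ys = [0.0] * 5 + [10.0] * 5
forest = bagged_fit(xs, ys, B=50, rng=random.Random(0))
low, high = bagged_predict(forest, 0), bagged_predict(forest, 9)
```

For classification, the averaging step in `bagged_predict` would be replaced by a plurality vote over the per-model class predictions.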
    Simply training many trees on a single training set would give strongly correlated trees (or even the same tree many times, if the training algorithm is deterministic); bootstrap sampling is a way of de-correlating the trees by showing them different training sets.

    Additionally, an estimate of the uncertainty of the prediction can be made as the standard deviation of the predictions from all the individual regression trees on $x'$: $\sigma = \sqrt{\frac{\sum_{b=1}^{B} (f_b(x') - \hat{f})^2}{B-1}}$.

    The number $B$ of samples (equivalently, of trees) is a free parameter. Typically, a few hundred to several thousand trees are used, depending on the size and nature of the training set. $B$ can be optimized using cross-validation, or by observing the out-of-bag error: the mean prediction error on each training sample $x_i$, using only the trees that did not have $x_i$ in their bootstrap sample. The training and test error tend to level off after some number of trees have been fit.

    === From bagging to random forests ===

    The above procedure describes the original bagging algorithm for trees. Random forests also include another type of bagging scheme: they use a modified tree learning algorithm that selects, at each candidate split in the learning process, a random subset of the features. This process is sometimes called "feature bagging". The reason for doing this is the correlation of the trees in an ordinary bootstrap sample: if one or a few features are very strong predictors for the response variable (target output), these features will be selected in many of the $B$ trees, causing them to become correlated. An analysis of how bagging and random subspace projection contribute to accuracy gains under different conditions is given by Ho.

    Typically, for a classification problem with $p$ features, $\sqrt{p}$ (rounded down) features are used in each split.
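Three pieces of bookkeeping from the preceding paragraphs are small enough to sketch directly: the uncertainty estimate $\sigma$, the out-of-bag set for one tree, and the random feature subset drawn at a split. The function names below are hypothetical and chosen for illustration only; this is a sketch under those assumptions, not any library's API.

```python
import math
import random
import statistics


def prediction_with_uncertainty(tree_predictions):
    """Return (f_hat, sigma) from the B per-tree predictions for one input x'."""
    f_hat = statistics.fmean(tree_predictions)
    sigma = statistics.stdev(tree_predictions)  # sample std dev, divides by B - 1
    return f_hat, sigma


def out_of_bag_indices(n, bootstrap_indices):
    """Indices of the n training samples that a tree's bootstrap sample never drew."""
    return sorted(set(range(n)) - set(bootstrap_indices))


def split_candidates(p, rng):
    """Random feature subset for one split: floor(sqrt(p)) of the p feature indices."""
    return rng.sample(range(p), math.isqrt(p))


f_hat, sigma = prediction_with_uncertainty([1.0, 2.0, 3.0])
oob = out_of_bag_indices(5, [0, 0, 2, 4, 4])
features = split_candidates(16, random.Random(1))
```

The out-of-bag error for a training sample is then computed by averaging only the predictions of trees whose out-of-bag set contains that sample.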
    For regression problems the inventors recommend $p/3$ (rounded down) with a minimum node size of 5 as the default. In practice, the best values for these parameters should be tuned on a case-by-case basis for every problem.

    === ExtraTrees ===

    Adding one further step of randomization yields extremely randomized trees, or ExtraTrees. As with ordinary random forests, they are an ensemble of individual trees, but there are two main differences: (1) each tree is trained using the whole learning sample (rather than a bootstrap sample), and (2) the top-down splitting is randomized: for each feature under consideration, a number of random cut-points are selected, instead of computing the locally optimal cut-point (based on, e.g., information gain or the Gini impurity). The values are chosen from a uniform distribution within the feature's empirical range (in the tree's training set). Then, of all the randomly chosen splits, the split that yields the highest score is chosen to split the node. Similar to ordinary random forests, the number of randomly selected features to be considered at each node can be specified. Default values for this parameter are $\sqrt{p}$ for classification and $p$ for regression, where $p$ is the number of features in the model.

    === Random forests for high-dimensional data ===

    The basic random forest procedure may

  • Absorbing Markov chain

    In the mathematical theory of probability, an absorbing Markov chain is a Markov chain in which every state can reach an absorbing state. An absorbing state is a state that, once entered, cannot be left. Like general Markov chains, there can be continuous-time absorbing Markov chains with an infinite state space. However, this article concentrates on the discrete-time discrete-state-space case.

    == Formal definition ==

    A Markov chain is an absorbing chain if there is at least one absorbing state and it is possible to go from any state to at least one absorbing state in a finite number of steps. In an absorbing Markov chain, a state that is not absorbing is called transient.

    === Canonical form ===

    Let an absorbing Markov chain with transition matrix $P$ have $t$ transient states and $r$ absorbing states. The rows of $P$ represent sources, while columns represent destinations. By ordering the transient states before the absorbing states, it can be assumed that $P$ has the form $P = \begin{bmatrix} Q & R \\ \mathbf{0} & I_r \end{bmatrix}$, where $Q$ is a $t$-by-$t$ matrix, $R$ is a nonzero $t$-by-$r$ matrix, $\mathbf{0}$ is an $r$-by-$t$ zero matrix, and $I_r$ is the $r$-by-$r$ identity matrix. Thus, $Q$ describes the probability of transitioning from some transient state to another, while $R$ describes the probability of transitioning from some transient state to some absorbing state.

    The probability of transitioning from $i$ to $j$ in exactly $k$ steps is the $(i,j)$-entry of $P^k$, further computed below. When considering only transient states, the probability is found in the upper left of $P^k$, the $(i,j)$-entry of $Q^k$.

    == Fundamental matrix ==

    === Expected number of visits to a transient state ===

    A basic property of an absorbing Markov chain is the expected number of visits to a transient state $j$ starting from a transient state $i$ (before being absorbed). This can be established to be given by the $(i,j)$-entry of the so-called fundamental matrix $N$, obtained by summing $Q^k$ for all $k$ (from 0 to ∞).
    It can be proven that $N := \sum_{k=0}^{\infty} Q^k = (I_t - Q)^{-1}$, where $I_t$ is the $t$-by-$t$ identity matrix. The computation of this formula is the matrix equivalent of the geometric series of scalars, $\sum_{k=0}^{\infty} q^k = \tfrac{1}{1-q}$. With the matrix $N$ in hand, other properties of the Markov chain are also easy to obtain.

    === Expected number of steps before being absorbed ===

    The expected number of steps before being absorbed in any absorbing state, when starting in transient state $i$, can be computed via a sum over transient states. The value is given by the $i$th entry of the vector $\mathbf{t} := N\mathbf{1}$, where $\mathbf{1}$ is a length-$t$ column vector whose entries are all 1.

    === Absorbing probabilities ===

    By induction, $P^k = \begin{bmatrix} Q^k & (I_t - Q^k) N R \\ \mathbf{0} & I_r \end{bmatrix}$. The probability of eventually being absorbed in the absorbing state $j$ when starting from transient state $i$ is given by the $(i,j)$-entry of the matrix $B := NR$. The number of columns of this matrix equals the number of absorbing states $r$.

    An approximation of those probabilities can also be obtained directly from the $(i,j)$-entry of $P^k$ for a large enough value of $k$, when $i$ is the index of a transient state and $j$ the index of an absorbing state. This is because $\left(\lim_{k\to\infty} P^k\right)_{i,t+j} = B_{i,j}$.

    === Transient visiting probabilities ===

    The probability of visiting transient state $j$ when starting at a transient state $i$ is the $(i,j)$-entry of the matrix $H := (N - I_t)(N_{\operatorname{dg}})^{-1}$, where $N_{\operatorname{dg}}$ is the diagonal matrix with the same diagonal as $N$.
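The quantities $N$, $\mathbf{t}$ and $B$ defined above can be computed exactly with rational arithmetic. The sketch below uses the coin-flip chain from this article's string-generation example (transient states: empty string, "H", "HT"; absorbing state: "HTH"). The helper names `inverse` and `matmul` are hypothetical, and the small Gauss-Jordan routine is an assumption made for self-containment rather than a recommended numerical method.

```python
from fractions import Fraction as F


def inverse(m):
    """Invert a square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(m)
    # Augment each row with the matching row of the identity matrix.
    a = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        pivot_value = a[col][col]
        a[col] = [v / pivot_value for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


half = F(1, 2)
Q = [[half, half, 0], [0, half, half], [half, 0, 0]]  # transient -> transient
R = [[0], [0], [half]]                                # transient -> absorbing

size = len(Q)
I_minus_Q = [[F(int(i == j)) - Q[i][j] for j in range(size)] for i in range(size)]

N = inverse(I_minus_Q)          # fundamental matrix N = (I_t - Q)^(-1)
t = [sum(row) for row in N]     # expected steps to absorption, t = N 1
B = matmul(N, R)                # absorption probabilities, B = N R
```

With a single absorbing state, every entry of `B` is 1, which is a quick sanity check that the chain is indeed absorbing.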

    === Variance on number of transient visits ===

    The variance on the number of visits to a transient state $j$ when starting at a transient state $i$ (before being absorbed) is the $(i,j)$-entry of the matrix $N_2 := N(2N_{\operatorname{dg}} - I_t) - N_{\operatorname{sq}}$, where $N_{\operatorname{sq}}$ is the Hadamard product of $N$ with itself (i.e. each entry of $N$ is squared).

    === Variance on number of steps ===

    The variance on the number of steps before being absorbed when starting in transient state $i$ is the $i$th entry of the vector $(2N - I_t)\mathbf{t} - \mathbf{t}_{\operatorname{sq}}$, where $\mathbf{t}_{\operatorname{sq}}$ is the Hadamard product of $\mathbf{t}$ with itself (i.e., as with $N_{\operatorname{sq}}$, each entry of $\mathbf{t}$ is squared).

    == Examples ==

    === String generation ===

    Consider the process of repeatedly flipping a fair coin until the sequence (heads, tails, heads) appears. This process is modeled by an absorbing Markov chain with transition matrix $P = \begin{bmatrix} 1/2 & 1/2 & 0 & 0 \\ 0 & 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 & 1/2 \\ 0 & 0 & 0 & 1 \end{bmatrix}$. The first state represents the empty string, the second state the string "H", the third state the string "HT", and the fourth state the string "HTH". Although in reality the coin flips cease after the string "HTH" is generated, the perspective of the absorbing Markov chain is that the process has transitioned into the absorbing state representing the string "HTH" and, therefore, cannot leave.

    For this absorbing Markov chain, the fundamental matrix is $N = (I - Q)^{-1} = \left( \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \begin{bmatrix} 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 \\ 1/2 & 0 & 0 \end{bmatrix} \right)^{-1} = \begin{bmatrix} 1/2 & -1/2 & 0 \\ 0 & 1/2 & -1/2 \\ -1/2 & 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 4 & 4 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 2 \end{bmatrix}$.

    The expected number of steps starting from each of the transient states is $\mathbf{t} = N\mathbf{1} = \begin{bmatrix} 4 & 4 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 10 \\ 8 \\ 6 \end{bmatrix}$. Therefore, the expected number of coin flips before observing the sequence (heads, tails, heads) is 10, the entry for the state representing the empty string.

    === Games of chance ===

    Games based entirely on chance can be modeled by an absorbing Markov chain. A classic example of this is the ancient Indian board game Snakes and Ladders. The graph on the left plots the probability mass in the lone absorbing state that represents the final square as the transition matrix is raised to larger and larger powers. To determine the expected number of turns to complete the game, compute the vector $\mathbf{t}$ as described above and examine $t_{\text{start}}$, which is approximately 39.2.

    === Infectious disease testing ===

    Infectious disease testing, either of blood products or in medical clinics, is often taught as an example of an absorbing Markov chain. The public U.S. Centers for Disease Control and Prevention (CDC) model for HIV and for hepatitis B, for example, illustrates the property that absorbing Markov chains can lead to the detection of disease, versus the loss of detection through other means. In the standard CDC model, the Markov chain has five states: a state in which the individual is uninfected, a state with infected but undetectable virus, a state with detectable virus, and absorbing states of having quit or been lost from the clinic, or of having been detected (the goal).
    The typical rates of transition between the Markov states are the probability $p$ per unit time of being infected with the virus, $w$ for the rate of window period removal (time until virus is detectable), $q$ for quit/loss rate from the system, and $d$ for detection, assuming a typical rate $\lambda$ at which the health system administers tests of the blood product or patients in question. It follows that we can "walk along" the Markov model to identify the overall probability of detection for a person starting as undetected, by multiplying the probabilities of transition to each next state of the model as: $\frac{p}{p+q} \cdot \frac{w}{w+q} \cdot \frac{d}{d+q}$. The subsequent total absolute number of false negative tests, the primary CDC concern, would then be the rate of tests, multiplied by the probability of reaching the infected but undetectable state, times the duration of staying in the infected undetectable state: $\frac{p}{p+q} \cdot \frac{1}{w+q} \cdot \lambda$.
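The two closing formulas are plain products of ratios, so they are easy to evaluate for trial parameter values. The sketch below is illustrative only: the function names and the numeric rates are hypothetical placeholders, not values from the CDC model.

```python
def detection_probability(p, w, d, q):
    """Overall detection probability: p/(p+q) * w/(w+q) * d/(d+q)."""
    return (p / (p + q)) * (w / (w + q)) * (d / (d + q))


def false_negative_tests(p, w, q, test_rate):
    """Expected false-negative tests: p/(p+q) * 1/(w+q) * lambda."""
    return (p / (p + q)) * (1.0 / (w + q)) * test_rate


# Hypothetical rates, chosen so the arithmetic is easy to check by hand.
equal_rates = detection_probability(p=1.0, w=1.0, d=1.0, q=1.0)
no_dropout = detection_probability(p=0.2, w=0.5, d=0.3, q=0.0)
missed = false_negative_tests(p=1.0, w=1.0, q=1.0, test_rate=4.0)
```

Setting the quit/loss rate `q` to zero makes every factor equal to 1, reflecting that with no dropout every individual is eventually detected.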

  • Sigmoid function

    A sigmoid function is any mathematical function whose graph has a characteristic S-shaped or sigmoid curve. A common example of a sigmoid function is the logistic function. Other sigmoid functions are given in the Examples section. In some fields, most notably in the context of artificial neural networks, the term "sigmoid function" is used as a synonym for "logistic function". Special cases of sigmoid functions include the Gompertz curve (used in modeling systems that saturate at large values of x) and the ogee curve (used in the spillway of some dams). Sigmoid functions have a domain of all real numbers, with a return (response) value that is commonly monotonically increasing but may be decreasing. Sigmoid functions most often show a return value (y axis) in the range 0 to 1. Another commonly used range is from −1 to 1. There is also the Heaviside step function, which instantaneously transitions between 0 and 1. A wide variety of sigmoid functions, including the logistic and hyperbolic tangent functions, have been used as the activation function of artificial neurons. Sigmoid curves are also common in statistics as cumulative distribution functions (which go from 0 to 1), such as the integrals of the logistic density, the normal density, and Student's t probability density functions. The logistic sigmoid function is invertible, and its inverse is the logit function.

    == Theory ==

    In mathematics, a unitary sigmoid function is a bounded sigmoid-type function normalized to the unit range, typically with lower and upper asymptotes at 0 and 1. The theory proposed by Grebenc distinguishes three kinds of unitary sigmoid functions according to their asymptotic behavior and the presence or absence of oscillation near the asymptotes.
    A general form of a unitary sigmoid function is $y = A\,S(f(x)) + B$, where $S$ is an increasing sigmoid function, $f(x)$ is a transformation of the independent variable, and $A$ and $B$ are constants controlling scaling and translation.

    === Classification ===

    ==== 1st kind ====

    A unitary sigmoid function of the first kind is a bounded increasing function that approaches its lower and upper asymptotes monotonically, without oscillation. This class includes many of the standard sigmoid functions used in statistics, biomathematics, and engineering, such as the logistic function and related generalizations.

    ==== 2nd kind ====

    A unitary sigmoid function of the second kind is a bounded increasing function that oscillates near the upper asymptote while preserving an overall sigmoid transition.

    ==== 3rd kind ====

    A unitary sigmoid function of the third kind is a bounded increasing function that oscillates near both the lower and upper asymptotes. These functions retain the global shape of a sigmoid curve but exhibit oscillatory behavior in the vicinity of both limiting states.

    === Taxonomy ===

    The tables below show the taxonomy of unitary sigmoid functions of all three kinds.

    Table 1. Taxonomy matrix with examples of sigmoid functions of the 1st kind
    Table 2. Taxonomy matrix with examples of sigmoid functions of the 2nd kind on the unbounded interval
    Table 3. Taxonomy matrix with examples of sigmoid functions of the 3rd kind

    === Construction methods ===

    The same theory presents a list of 30 methods for constructing sigmoid functions. These include algebraic transformations, integration and convolution methods, constructions from bell-shaped functions, solutions of ordinary and partial differential equations, recursive schemes, stochastic differential equations, feedback systems, and chaotic systems.

    M0: Construction method for sigmoid functions not evident or intuitive
    M1: Inverse of singularity functions
    M2: Sigmoid functions of embedded positive functions
    M3: Raising a sigmoid function to a power
    M4: Exponentiating a sigmoid function
    M5: Symmetric sigmoid functions derived from asymmetric ones
    M6: Sigmoid functions of the reciprocal independent variable
    M7: Embedding a sigmoid function into another function
    M8: Sum of sigmoid functions
    M9: Multiplication of sigmoid functions
    M10: Integral of the product of an increasing and a decreasing function
    M11: Derivation from lambda (bell-shaped) functions
    M12: Integration of a lambda (bell-shaped) function
    M13: Integration of the sum of lambda (bell-shaped) functions
    M14: Integration of the product of two lambda (bell-shaped) functions
    M15: Integration of the difference of two shifted sigmoid functions
    M16: Integration of the product of two shifted sigmoid functions
    M17: Convolution of sigmoid functions
    M18: Integration of the product of a lambda and a sigmoid function
    M19: Solutions of ordinary differential equations
    M20: Solutions of partial differential equations (PDE)
    M21: Solutions of functional differential equations (FDE)
    M22: Sum of a sigmoid function and some derivatives
    M23: Combination of a sigmoid function, its derivative and integral
    M24: Filtering sigmoid functions
    M25: Special cases of Gauss hypergeometric functions
    M26: Feedback closed-loop systems
    M27: Recursive functions
    M28: Recursive time-delayed feed-forward loops
    M29: Solutions of stochastic differential equations
    M30: Chaotic sigmoid functions

    Consult the reference for more details.

    == Definition ==

    A sigmoid function is a bounded, differentiable, real function that is defined for all real input values and has a positive derivative at each point.

    == Properties ==

    In general, a sigmoid function is monotonic, and has a first derivative which is bell shaped.
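The definition and the first property can be checked numerically for the logistic function, the common example named in the introduction. The sketch below is a minimal illustration; the function names are chosen for this example, and the closed-form derivative $s(1-s)$ and the identity $\tanh x = 2\,\sigma(2x) - 1$ are standard facts about the logistic function rather than statements taken from this article.

```python
import math


def logistic(x):
    """Logistic sigmoid 1 / (1 + e^(-x)), arranged to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logistic_derivative(x):
    """Derivative s * (1 - s): positive everywhere, bell shaped, peaking at x = 0."""
    s = logistic(x)
    return s * (1.0 - s)


def logit(y):
    """Inverse of the logistic function on the open interval (0, 1)."""
    return math.log(y / (1.0 - y))


grid = [i / 2 for i in range(-20, 21)]  # -10.0, -9.5, ..., 10.0
values = [logistic(x) for x in grid]
slopes = [logistic_derivative(x) for x in grid]
```

On this grid the values stay strictly inside (0, 1) and increase from left to right, while the slopes are positive with their maximum of 0.25 at `x = 0`.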
    Conversely, the integral of any continuous, non-negative, bell-shaped function (with one local maximum and no local minimum, unless degenerate) will be sigmoidal. Thus the cumulative distribution functions for many common probability distributions are sigmoidal. One such example is the error function, which is related to the cumulative distribution function of a normal distribution; another is the arctan function, which is related to the cumulative distribution function of a Cauchy distribution.

    A sigmoid function is constrained by a pair of horizontal asymptotes as $x \rightarrow \pm\infty$.

    A sigmoid function is convex for values less than a particular point, and it is concave for values greater than that point: in many of the examples here, that point is 0.

    == Examples ==

    Logistic function: $f(x) = \frac{1}{1 + e^{-x}}$

    Hyperbolic tangent (shifted and scaled version of the logistic function, above): $f(x) = \tanh x = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$

    Arctangent function: $f(x) = \arctan x$

    Gudermannian function: $f(x) = \operatorname{gd}(x) = \int_{0}^{x} \frac{dt}{\cosh t} = 2\arctan\left(\tanh\left(\frac{x}{2}\right)\right)$

    Error function: $f(x) = \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^{2}}\,dt$

    Generalised logistic function: $f(x) = \left(1 + e^{-x}\right)^{-\alpha}, \quad \alpha > 0$

    Smoothstep function: $f(x) = \begin{cases} \left(\int_{0}^{1} \left(1-u^{2}\right)^{N} du\right)^{-1} \int_{0}^{x} \left(1-u^{2}\right)^{N} du, & |x| \leq 1 \\ \operatorname{sgn}(x), & |x| \geq 1 \end{cases} \quad N \in \mathbb{Z}_{\geq 1}$

    Some algebraic functions, for example $f(x) = \frac{x}{\sqrt{1+x^{2}}}$ and, in a more general form, $f(x) = \frac{x}{\left(1+|x|^{k}\right)^{1/k}}$

    Up to shifts and scaling, many sigmoids are special cases of $f(x) = \varphi(\varphi(x,\beta),\alpha)$, where $\varphi(x,\lambda) = \begin{cases} (1-\lambda x)^{1/\lambda} & \lambda \neq 0 \\ e^{-x} & \lambda = 0 \end{cases}$ is the inverse of the negative Box–Cox transformation, and $\alpha < 1$ and $\beta < 1$ are shape parameters.

    Smooth transition function normalized to (−1,1): $f(x) = \begin{cases} \frac{2}{1+e^{-2m\frac{x}{1-x^{2}}}} - 1, & |x| < 1 \\ \operatorname{sgn}(x), & |x| \geq 1 \end{cases} = \begin{cases} \tanh\left(m\frac{x}{1-x^{2}}\right), & |x| < 1 \\ \operatorname{sgn}(x), & |x| \geq 1 \end{cases}$ using the hyperbolic tangent mentioned above. Here, $m$ is a free parameter encoding the slope at $x = 0$, which must be great

  • Clips (software)

    Clips is a discontinued mobile video editing software application created by Apple Inc. It was released onto the iOS App Store on April 6, 2017, for free. Initially, it was only available on 64-bit devices running iOS 10.3 or later; as of version 3.1.3, it requires iOS 16.0 or later. Apple describes it as an app for "making and sharing fun videos with text, effects, graphics, and more." Its final release was on May 9, 2024, before it was removed from the App Store on October 10, 2025.

    == Features ==

    After launching the app, the user sees the view of the front-facing camera. The app allows the user to create a new clip by tapping on a red record button, or to use photos or videos from the device's photo library. Once a clip is recorded, it can be added to a project timeline shown at the bottom of the screen. The user can share their project on social media platforms. The user can also add filters and effects to the project. "Live Titles" (available in several styles) can also be created by dictating to the device.

  • Julia (programming language)

    Julia is a dynamic general-purpose programming language. As a high-level language, distinctive aspects of Julia's design include a type system with parametric polymorphism, the use of multiple dispatch as a core programming paradigm, just-in-time compilation and a parallel garbage collection implementation. Notably, Julia does not support classes with encapsulated methods but instead relies on the types of all of a function's arguments to determine which method will be called. By default, Julia is run similarly to scripting languages, using its runtime, and allows for interactions, but Julia programs can also be compiled to small binary standalone executables (or to small libraries for e.g. Python), with e.g. the JuliaC.jl compiler. Julia programs can reuse libraries from other languages, and vice versa. Julia has interoperability with C, C++, Fortran, Rust, Python, and R. Additionally, some Julia packages have bindings to be used from Python and R as libraries. Julia is supported by programmer tools such as IDEs and by notebooks such as Pluto.jl and Jupyter; since 2025, Google Colab has also officially supported Julia natively. Julia is sometimes used in embedded systems (e.g. it has been used in a satellite in space on a Raspberry Pi Compute Module 4; 64-bit Pis work best with Julia, and Julia is supported in Raspbian).

    == History ==

    Work on Julia began in 2009, when Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and Alan Edelman set out to create a free language that was both high-level and fast. On 14 February 2012, the team launched a website with a blog post explaining the language's mission. In an interview with InfoWorld in April 2012, Karpinski said about the name of the language, Julia: "There's no good reason, really. It just seemed like a pretty name." Bezanson said he chose the name on the recommendation of a friend, then years later wrote: Maybe julia stands for "Jeff's uncommon lisp is automated"?
Julia's syntax is stable, since version 1.0 in 2018, and Julia has a backward compatibility guarantee for 1.x and also a stability promise for the documented (stable) API, while in the years before in the early development prior to 0.7 the syntax (and semantics) was changed in new versions. All of the (registered package) ecosystem uses the new and improved syntax, and in most cases relies on new APIs that have been added regularly, and in some cases minor additional syntax added in a forward compatible way e.g. in Julia 1.7. In the 10 years since the 2012 launch of pre-1.0 Julia, the community has grown. The Julia package ecosystem has over 11.8 million lines of code (including docs and tests). The JuliaCon academic conference for Julia users and developers has been held annually since 2014 with JuliaCon2020 welcoming over 28,900 unique viewers, and then JuliaCon2021 breaking all previous records (with more than 300 JuliaCon2021 presentations available for free on YouTube, up from 162 the year before), and 43,000 unique viewers during the conference. Three of the Julia co-creators are the recipients of the 2019 James H. Wilkinson Prize for Numerical Software (awarded every four years) "for the creation of Julia, an innovative environment for the creation of high-performance tools that enable the analysis and solution of computational science problems." Also, Alan Edelman, professor of applied mathematics at MIT, has been selected to receive the 2019 IEEE Computer Society Sidney Fernbach Award "for outstanding breakthroughs in high-performance computing, linear algebra, and computational science and for contributions to the Julia programming language." Version 0.3 was released in August 2014. Both Julia 0.7 and version 1.0 were released on 8 August 2018. Julia 1.4 added syntax for generic array indexing to handle e.g. 0-based arrays. The memory model was also changed. 
Julia 1.5, released in August 2020, added record and replay debugging support for Mozilla's rr tool. The release changed the behavior in the REPL (to soft scope) to match the one used in Jupyter, while remaining fully compatible with non-REPL code (which retains hard scope). Julia 1.6 was the largest release since 1.0, and it was the long-term support (LTS) version for the longest time. Since Julia 1.7, development has been back to time-based releases; 1.7 was released in November 2021 with, for example, a new default random-number generator, and Julia 1.7.3 fixed at least one security issue. Julia 1.8 added options for hiding source code when compiling Julia source code to executables. Julia 1.9 added the ability to precompile packages to native machine code, which is done automatically; to improve precompilation of packages, a new package, PrecompileTools.jl, was introduced for use by package developers. Julia 1.10 was released on 25 December 2023 with new features such as parallel garbage collection. Julia 1.11 was released on 7 October 2024, and with it 1.10.5 became the next long-term support (LTS) version (i.e. those became the only two supported versions), since replaced by 1.10.10, released on 27 June; 1.6 is no longer an LTS version. Julia 1.11 adds, for example, the new public keyword to signal safe public API. Julia users are advised to use such API rather than internals of Julia or of packages, and package authors are advised to use the keyword, generally indirectly (e.g. prefixed with the @compat macro from Compat.jl), so as to also support older Julia versions, at least the LTS version. Julia 1.12 was released on 7 October 2025 (and 1.12.5 on 9 February 2026), and with it the JuliaC.jl package, including the juliac compiler that works with it, for making rather small binary executables (much smaller than was possible before, through the use of the new so-called trimming feature). 
Julia 1.10 LTS is an officially still-supported branch, but the 1.11 branch has also been maintained after the 1.12 release, with 1.11.8 released and then 1.11.9 released on 8 February 2026. === JuliaCon === Since 2014, the Julia Community has hosted an annual Julia Conference focused on developers and users. The first JuliaCon took place in Chicago and kickstarted the annual occurrence of the conference. Since 2014, the conference has taken place across a number of locations including MIT and the University of Maryland, Baltimore. The event audience has grown from a few dozen people to over 28,900 unique attendees during JuliaCon 2020, which took place virtually. JuliaCon 2021 also took place virtually, with keynote addresses from professors William Kahan (the primary architect of the IEEE 754 floating-point standard, which virtually all CPUs and languages, including Julia, use), Jan Vitek, Xiaoye Sherry Li, and Soumith Chintala, a co-creator of PyTorch. JuliaCon 2021 grew to 43,000 unique attendees and more than 300 presentations (still freely accessible, as are those of earlier years). JuliaCon 2022 was also virtual, held between July 27 and July 29, 2022, and for the first time in several languages, not just in English. === Sponsors === The Julia language became a NumFOCUS fiscally sponsored project in 2014 in an effort to ensure the project's long-term sustainability. Jeremy Kepner at MIT Lincoln Laboratory was the founding sponsor of the Julia project in its early days. In addition, funds from the Gordon and Betty Moore Foundation, the Alfred P. Sloan Foundation, Intel, and agencies such as NSF, DARPA, NIH, NASA, and FAA have been essential to the development of Julia. Mozilla, the maker of the Firefox web browser, with its research grants for H1 2019, sponsored "a member of the official Julia team" for the project "Bringing Julia to the Browser", meaning to Firefox and other web browsers. The Julia language is also supported by individual donors on GitHub. 
=== The Julia company === JuliaHub, Inc. was founded in 2015 as Julia Computing, Inc. by Viral B. Shah, Deepak Vinchhi, Alan Edelman, Jeff Bezanson, Stefan Karpinski and Keno Fischer. In June 2017, Julia Computing raised US$4.6 million in seed funding from General Catalyst and Founder Collective, the same month was "granted $910,000 by the Alfred P. Sloan Foundation to support open-source Julia development, including $160,000 to promote diversity in the Julia community", and in December 2019 the company got $1.1 million funding from the US government to "develop a neural component machine learning tool to reduce the total energy consumption of heating, ventilation, and air conditioning (HVAC) systems in buildings". In July 2021, Julia Computing announced they raised a $24 million Series A round led by Dorilton Ventures, which also owns Formula One team Williams Racing, that partnered with Julia Computing. Williams' Commercial Director said: "Investing in companies building best-in-class cloud technology is a strategic focus for Dorilton and Julia's versatile platform, with revolutionary capabilities in simulation and modelling, is hugely relevant to our business. We look forward to embedding Julia Computing in the world's most technologically advanced sport". In June 2023, JuliaHub received (again, now

  • Swish function

    Swish function

    The swish function is a family of mathematical functions defined as follows: $$\operatorname{swish}_\beta(x) = x \operatorname{sigmoid}(\beta x) = \frac{x}{1+e^{-\beta x}},$$ where $\beta$ can be constant (usually set to 1) or trainable and "sigmoid" refers to the logistic function. The swish family was designed to smoothly interpolate between a linear function and the rectified linear unit (ReLU) function. When considering positive values, swish is a particular case of a doubly parameterized sigmoid shrinkage function. Variants of the swish function include Mish. == Special values == For $\beta = 0$, the function is linear: $f(x) = x/2$. For $\beta = 1$, the function is the Sigmoid Linear Unit (SiLU). For $\beta = 1.702$, the function approximates GELU. With $\beta \to \infty$, the function converges to ReLU. Thus, the swish family smoothly interpolates between a linear function and the ReLU function. Since $\operatorname{swish}_\beta(x) = \operatorname{swish}_1(\beta x)/\beta$, all instances of swish have the same shape as the default $\operatorname{swish}_1$, zoomed by $\beta$. One usually sets $\beta > 0$. When $\beta$ is trainable, this constraint can be enforced by $\beta = e^{b}$, where $b$ is trainable. 
The default case has the series expansion $$\operatorname{swish}_1(x) = \frac{x}{2} + \frac{x^2}{4} - \frac{x^4}{48} + \frac{x^6}{480} + O\left(x^8\right)$$ and satisfies the identities $$\begin{aligned}\operatorname{swish}_1(x) &= \frac{x}{2}\tanh\left(\frac{x}{2}\right) + \frac{x}{2}\\ \operatorname{swish}_1(x) + \operatorname{swish}_{-1}(x) &= x\\ \operatorname{swish}_1(x) - \operatorname{swish}_{-1}(x) &= x\tanh\left(\frac{x}{2}\right)\end{aligned}$$ The last two identities follow from $\operatorname{sigmoid}(x) + \operatorname{sigmoid}(-x) = 1$ and $\operatorname{sigmoid}(x) - \operatorname{sigmoid}(-x) = \tanh(x/2)$. == Derivatives == Because $\operatorname{swish}_\beta(x) = \operatorname{swish}_1(\beta x)/\beta$, it suffices to calculate the derivatives for the default case. $$\operatorname{swish}_1'(x) = \frac{x + \sinh(x)}{4\cosh^2\left(\frac{x}{2}\right)} + \frac{1}{2}$$ so $\operatorname{swish}_1'(x) - \frac{1}{2}$ is odd. $$\operatorname{swish}_1''(x) = \frac{1 - \frac{x}{2}\tanh\left(\frac{x}{2}\right)}{2\cosh^2\left(\frac{x}{2}\right)}$$ so $\operatorname{swish}_1''(x)$ is even. == History == SiLU was first proposed alongside the GELU in 2016, then proposed again in 2017 as the Sigmoid-weighted Linear Unit (SiL) in reinforcement learning. The SiLU/SiL was then proposed once more as the SWISH over a year after its initial discovery, originally without the learnable parameter $\beta$, so that $\beta$ implicitly equaled 1. The swish paper was then updated to propose the activation with the learnable parameter $\beta$. 
In 2017, after performing analysis on ImageNet data, researchers from Google indicated that using this function as an activation function in artificial neural networks improves the performance, compared to ReLU and sigmoid functions. It is believed that one reason for the improvement is that the swish function helps alleviate the vanishing gradient problem during backpropagation.
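
The special values and the scaling relation described above can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation; the function names `sigmoid`, `swish`, and `relu` are chosen here for illustration, and a large finite β stands in for the limit β → ∞.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def swish(x: float, beta: float = 1.0) -> float:
    """swish_beta(x) = x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


def relu(x: float) -> float:
    """Rectified linear unit, used only for comparison."""
    return max(0.0, x)


if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(
            f"x={x:+.1f}  beta=0: {swish(x, 0.0):+.4f}  "
            f"beta=1: {swish(x, 1.0):+.4f}  "
            f"beta=50: {swish(x, 50.0):+.4f}  relu: {relu(x):+.4f}"
        )
```

With β = 0 the output is x/2, and with a large β (50 is an arbitrary choice) the values are numerically close to ReLU, which matches the interpolation property stated in the text.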

  • Crossover (evolutionary algorithm)

    Crossover (evolutionary algorithm)

    Crossover in evolutionary algorithms and evolutionary computation, also called recombination, is a genetic operator used to combine the genetic information of two parents to generate new offspring. It is one way to stochastically generate new solutions from an existing population, and is analogous to the crossover that happens during sexual reproduction in biology. New solutions can also be generated by cloning an existing solution, which is analogous to asexual reproduction. Newly generated solutions may be mutated before being added to the population. The aim of recombination is to transfer good characteristics from two different parents to one child. Different algorithms in evolutionary computation may use different data structures to store genetic information, and each genetic representation can be recombined with different crossover operators. Typical data structures that can be recombined with crossover are bit arrays, vectors of real numbers, or trees. The list of operators presented below is by no means complete and serves mainly as an exemplary illustration of this dyadic genetic operator type. More operators and more details can be found in the literature. == Crossover for binary arrays == Traditional genetic algorithms store genetic information in a chromosome represented by a bit array. Crossover methods for bit arrays are popular and an illustrative example of genetic recombination. === One-point crossover === A point on both parents' chromosomes is picked randomly, and designated a 'crossover point'. Bits to the right of that point are swapped between the two parent chromosomes. This results in two offspring, each carrying some genetic information from both parents. === Two-point and k-point crossover === In two-point crossover, two crossover points are picked randomly from the parent chromosomes. The bits in between the two points are swapped between the parent organisms. 
Two-point crossover is equivalent to performing two single-point crossovers with different crossover points. This strategy can be generalized to k-point crossover for any positive integer k, picking k crossover points. === Uniform crossover === In uniform crossover, typically, each bit is chosen from either parent with equal probability. Other mixing ratios are sometimes used, resulting in offspring which inherit more genetic information from one parent than the other. In a uniform crossover, the chromosome is not divided into segments; instead, each gene is treated separately. In effect, a coin is flipped for each gene to decide which parent it is inherited from. == Crossover for integer or real-valued genomes == The crossover operators presented above, and most other crossover operators for bit strings, can also be applied to integer or real-valued genomes whose genes each consist of an integer or real-valued number. Instead of individual bits, integer or real-valued numbers are then simply copied into the child genome. The offspring lie on the remaining corners of the hyperbody spanned by the two parents $P_1 = (1.5, 6, 8)$ and $P_2 = (7, 2, 1)$, as exemplified in the accompanying image for the three-dimensional case. === Discrete recombination === If the rules of the uniform crossover for bit strings are applied during the generation of the offspring, this is also called discrete recombination. 
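
The bit-array operators described above can be sketched in a few lines. The code below is an illustrative sketch rather than a canonical implementation: the function names, the explicit `points` argument (crossover points are passed in instead of being drawn at random, so that results are reproducible), and the `p_swap` parameter are assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple

Genome = List[int]


def k_point_crossover(
    a: Sequence[int], b: Sequence[int], points: Sequence[int]
) -> Tuple[Genome, Genome]:
    """Swap every second segment between the parents, cutting at `points`."""
    if len(a) != len(b):
        raise ValueError("parents must have the same length")
    child1, child2 = list(a), list(b)
    swap = False  # the segment left of the first cut is kept as is
    prev = 0
    for cut in sorted(points) + [len(a)]:
        if swap:
            child1[prev:cut], child2[prev:cut] = child2[prev:cut], child1[prev:cut]
        swap = not swap
        prev = cut
    return child1, child2


def one_point_crossover(
    a: Sequence[int], b: Sequence[int], point: int
) -> Tuple[Genome, Genome]:
    """Bits to the right of `point` are swapped between the parents."""
    return k_point_crossover(a, b, [point])


def uniform_crossover(
    a: Sequence[int], b: Sequence[int], rng: random.Random, p_swap: float = 0.5
) -> Tuple[Genome, Genome]:
    """For each gene, a biased coin decides which parent each child copies."""
    child1, child2 = [], []
    for gene_a, gene_b in zip(a, b):
        if rng.random() < p_swap:
            gene_a, gene_b = gene_b, gene_a
        child1.append(gene_a)
        child2.append(gene_b)
    return child1, child2


if __name__ == "__main__":
    zeros, ones = [0] * 8, [1] * 8
    print(one_point_crossover(zeros, ones, 3))
    print(k_point_crossover(zeros, ones, [2, 5]))
    print(uniform_crossover(zeros, ones, random.Random(0)))
```

Because the functions only copy genes, they work unchanged for integer or real-valued genomes; applied that way, `uniform_crossover` corresponds to what the text calls discrete recombination.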
=== Intermediate recombination === In this recombination operator, the allele values of the child genome $a_i$ are generated by mixing the alleles of the two parent genomes $a_{i,P_1}$ and $a_{i,P_2}$: $$a_i = a_{i,P_1}\cdot\beta_i + a_{i,P_2}\cdot\left(1-\beta_i\right) \quad\text{with}\quad \beta_i \in \left[-d, 1+d\right],$$ where $\beta_i$ is drawn uniformly at random for each gene $i$. The choice of the interval $[-d, 1+d]$ means that the offspring's allele values can come not only from the interior of the hyperbody spanned by the allele values of the parent genes but also from a certain neighbourhood around it. A value of $0.25$ is recommended for $d$ to counteract the tendency of the range of allele values to shrink that otherwise exists at $d=0$. The adjacent figure shows, for the two-dimensional case, the range of possible new alleles of the two exemplary parents $P_1 = (3, 6)$ and $P_2 = (9, 2)$ in intermediate recombination. The offspring of discrete recombination, $C_1$ and $C_2$, are also plotted. Intermediate recombination satisfies the arithmetic calculation of the allele values of the child genome required by virtual alphabet theory. Discrete and intermediate recombination are used as a standard in the evolution strategy. == Crossover for permutations == For combinatorial tasks, genomes are usually permutations of a set, and crossover operators designed specifically for such genomes are used. The underlying set is usually a subset of $\mathbb{N}$ or $\mathbb{N}_0$. 
If 1-point, n-point, or uniform crossover for integer genomes is used for such genomes, a child genome may contain some values twice while others are missing. This can be remedied by genetic repair, e.g. by replacing the redundant genes, position by position, with the missing ones from the other child genome. In order to avoid the generation of invalid offspring, special crossover operators for permutations have been developed which fulfill the basic requirements of such operators for permutations, namely that all elements of the initial permutation are also present in the new one and only the order is changed. A distinction can be made between combinatorial tasks where all sequences are admissible and those where there are constraints in the form of inadmissible partial sequences. A well-known representative of the first task type is the traveling salesman problem (TSP), where the goal is to visit a set of cities exactly once on the shortest tour. An example of the constrained task type is the scheduling of multiple workflows. Workflows involve sequence constraints on some of the individual work steps. For example, a thread cannot be cut until the corresponding hole has been drilled in a workpiece. Such problems are also called order-based permutations. In the following, two crossover operators are presented as examples: the partially mapped crossover (PMX), motivated by the TSP, and the order crossover (OX1), designed for order-based permutations. A second offspring can be produced in each case by exchanging the parent chromosomes. === Partially mapped crossover (PMX) === The PMX operator was designed as a recombination operator for TSP-like problems. The explanation of the procedure is illustrated by an example: === Order crossover (OX1) === The order crossover goes back to Davis in its original form and is presented here in a slightly generalized version with more than two crossover points. 
It transfers information about the relative order from the second parent to the offspring. First, the number and position of the crossover points are determined randomly. The resulting gene sequences are then processed as described below: Among other things, order crossover is well suited for scheduling multiple workflows, when used in conjunction with 1- and n-point crossover. === Further crossover operators for permutations === Over time, a large number of crossover operators for permutations have been proposed, so the following list is only a small selection. For more information, the reader is referred to the literature: cycle crossover (CX), order-based crossover (OX2), position-based crossover (POS), edge recombination, voting recombination (VR), alternating-positions crossover (AP), maximal preservative crossover (MPX), merge crossover (MX), and sequential constructive crossover operator (SCX). The usual approach to solving TSP-like problems by genetic or, more generally, evolutionary algorithms, presented earlier, is either to repair illegal descendants or to adjust the operators appropriately so that illegal offspring do not arise in the first place. Alternatively, Riazi suggests the use of a double chromosome representation, which avoids illegal offspring.
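
To make the idea of order crossover concrete, here is a minimal sketch of the commonly taught two-cut-point form of OX1. Note that this is not the generalized multi-point version mentioned in the text, whose processing steps are not reproduced in this excerpt. The function name `order_crossover` and the explicit `cut1`/`cut2` arguments (normally chosen at random) are assumptions made for this example.

```python
from typing import Hashable, List, Sequence


def order_crossover(
    p1: Sequence[Hashable], p2: Sequence[Hashable], cut1: int, cut2: int
) -> List[Hashable]:
    """Two-cut-point order crossover (OX1 sketch).

    The child keeps p1[cut1:cut2] in place. The remaining positions are
    filled, starting after the second cut and wrapping around, with the
    genes of p2 in the order they appear in p2 (also read from the second
    cut onward), skipping genes already copied from p1.
    """
    n = len(p1)
    if len(p2) != n or not 0 <= cut1 <= cut2 <= n:
        raise ValueError("parents must match in length and cuts must be ordered")
    child: List[Hashable] = [None] * n
    child[cut1:cut2] = p1[cut1:cut2]
    kept = set(p1[cut1:cut2])
    # Genes of the second parent in their relative order, minus the kept ones.
    donors = [p2[(cut2 + i) % n] for i in range(n)]
    donors = [gene for gene in donors if gene not in kept]
    # Free positions, visited from the second cut onward with wrap-around.
    free_positions = [(cut2 + i) % n for i in range(n - (cut2 - cut1))]
    for position, gene in zip(free_positions, donors):
        child[position] = gene
    return child


if __name__ == "__main__":
    parent1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    parent2 = [4, 5, 2, 1, 8, 7, 6, 9, 3]
    print(order_crossover(parent1, parent2, 3, 7))
    # Exchanging the parents yields the second offspring.
    print(order_crossover(parent2, parent1, 3, 7))
```

Every child is again a valid permutation, so no genetic repair step is needed, in contrast to applying 1-point or uniform crossover directly to permutation genomes.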

  • Nextcloud

    Nextcloud

    Nextcloud is a modular workspace platform designed to provide teams and businesses with a comprehensive environment for digital collaboration. Beyond central data management, it integrates office suites such as Collabora Online and EuroOffice for seamless, cooperative workflows. The platform features built-in tools for chat, videoconferencing, and a privacy-focused AI assistant capable of running entirely on local LLMs. Supported by a rich ecosystem of apps, it can be hosted in the cloud or on premises and can scale up to millions of users. It has been translated into over 100 languages. == Features == Nextcloud files are stored in conventional directory structures, accessible via WebDAV if necessary. A SQLite, MySQL/MariaDB or PostgreSQL database is required to provide additional functionality like permissions, shares, and comments. Nextcloud can synchronize with local clients running Windows (Windows 8.1 and above), macOS (10.14 or later), Linux and FreeBSD. Nextcloud permits user and group administration locally or via different backends like OpenID or LDAP. Content can be shared inside the system by defining granular read/write permissions between users and groups. Nextcloud users can create public URLs when sharing files. Logging of file-related actions, as well as disallowing access based on file access rules, is also available. Security options like brute-force protection and multi-factor authentication using TOTP, WebAuthn, OAuth2, and OpenID Connect are available. Nextcloud has planned new features such as monitoring capabilities, full-text search and Kerberos authentication, as well as audio/video conferencing, expanded federation and smaller user interface improvements. == History == In April 2016 Frank Karlitschek and most core contributors left ownCloud Inc. These included some of ownCloud's staff, according to sources close to the ownCloud community. Karlitschek and many of these contributors went on to fork ownCloud, creating Nextcloud. 
The fork was preceded by a blog post of Karlitschek announcing his departure and raising questions about the management of the ownCloud, its community, and priorities between growth, money, and sustainability. There have been no official statements about the reason for the fork. However, Karlitschek mentioned the fork several times in a talk at the 2018 FOSDEM conference and in two appearances on the FLOSS Weekly podcast, emphasizing cultural mismatch between open source developers and business oriented people not used to the open source community. On June 2, within 12 hours of the announcement of the fork, the American entity "ownCloud Inc." announced that it is shutting down with immediate effect, stating that "[...] main lenders in the US have cancelled our credit. Following American law, we are forced to close the doors of ownCloud, Inc. with immediate effect and terminate the contracts of 8 employees." ownCloud Inc. accused Karlitschek of poaching developers, while Nextcloud developers such as Arthur Schiwon stated that he "decided to quit because not everything in the ownCloud Inc. company world evolved as I imagined". ownCloud GmbH continued operations, secured financing from new investors and took over the business of ownCloud Inc. In April 2018 Informationstechnikzentrum Bund (ITZBund) reported Nextcloud won the tender for "Bundescloud" (Germany government cloud) project. In August 2019 it was announced that the governments of France, Sweden and the Netherlands would use Nextcloud for file transfer. In January 2020 Nextcloud 18 "Nextcloud Hub" was released. The major change was direct integration with an Office suite (OnlyOffice) and Nextcloud announced that their goal was to compete with Office 365 and Google Docs. A partnership with Ionos was revealed – its hosting location in Germany and compliance with GDPR should support the goal of data sovereignty. 
In spring 2020, remote work and web conferencing usage increased due to the COVID-19 pandemic, and Nextcloud released version 19 with the Talk chat and videoconferencing app integrated into the application core. Communication with an optional "high performance back-end" allows self-hosting of web conferences with more than 10 participants. Collabora Online was introduced as another integrated office suite. In August 2021 Nextcloud was chosen as a collaboration platform for the European cloud software GAIA-X. In a September 2021 European Commission report it was mentioned as "the most widely deployed Open Source content collaboration platform". Following the 2025 United States tariffs against the European Union, fear of overreliance on US cloud providers such as Microsoft 365 and Google Workspace increased, with Nextcloud being one of the foremost contenders to replace them. Some governmental organisations, including the European Data Protection Supervisor and the German state of Schleswig-Holstein, have since switched from Microsoft's SharePoint to Nextcloud. According to Nextcloud, during the first 5 months of 2025, customer interest in the software had tripled.

  • Cross-entropy

    Cross-entropy

    In information theory, the cross-entropy between two probability distributions $p$ and $q$, over the same underlying set of events, measures the average number of bits needed to identify an event drawn from the set when the coding scheme used for the set is optimized for an estimated probability distribution $q$, rather than the true distribution $p$. == Definition == The cross-entropy of the distribution $q$ relative to a distribution $p$ over a given set is defined as follows: $$H(p,q) = -\operatorname{E}_p[\log q],$$ where $\operatorname{E}_p[\cdot]$ is the expected value operator with respect to the distribution $p$. The definition may be formulated using the Kullback–Leibler divergence $D_{\mathrm{KL}}(p \parallel q)$, the divergence of $p$ from $q$ (also known as the relative entropy of $p$ with respect to $q$): $$H(p,q) = H(p) + D_{\mathrm{KL}}(p \parallel q),$$ where $H(p)$ is the entropy of $p$. For discrete probability distributions $p$ and $q$ with the same support $\mathcal{X}$, this means $$H(p,q) = -\sum_{x\in\mathcal{X}} p(x)\,\log q(x).$$ The situation for continuous distributions is analogous. We have to assume that $p$ and $q$ are absolutely continuous with respect to some reference measure $r$ (usually $r$ is a Lebesgue measure on a Borel σ-algebra). Let $P$ and $Q$ be probability density functions of $p$ and $q$ with respect to $r$. 
Then $$-\int_{\mathcal{X}} P(x)\,\log Q(x)\,\mathrm{d}x = \operatorname{E}_p[-\log Q],$$ and therefore $$H(p,q) = -\int_{\mathcal{X}} P(x)\,\log Q(x)\,\mathrm{d}x.$$ NB: The notation $H(p,q)$ is also used for a different concept, the joint entropy of $p$ and $q$. == Motivation == In information theory, the Kraft–McMillan theorem establishes that any directly decodable coding scheme for coding a message to identify one value $x_i$ out of a set of possibilities $\{x_1,\ldots,x_n\}$ can be seen as representing an implicit probability distribution $q(x_i) = \left(\frac{1}{2}\right)^{\ell_i}$ over $\{x_1,\ldots,x_n\}$, where $\ell_i$ is the length of the code for $x_i$ in bits. Therefore, cross-entropy can be interpreted as the expected message length per datum when a wrong distribution $q$ is assumed while the data actually follows a distribution $p$. That is why the expectation is taken over the true probability distribution $p$ and not $q$. Indeed, the expected message length under the true distribution $p$ is $$\begin{aligned}\operatorname{E}_p[\ell] &= -\operatorname{E}_p\left[\frac{\ln q(x)}{\ln(2)}\right]\\ &= -\operatorname{E}_p\left[\log_2 q(x)\right]\\ &= -\sum_{x_i} p(x_i)\,\log_2 q(x_i)\\ &= -\sum_{x} p(x)\,\log_2 q(x) = H(p,q).\end{aligned}$$ == Estimation == There are many situations where cross-entropy needs to be measured but the distribution of $p$ is unknown. 
An example is language modeling, where a model is created based on a training set $T$, and then its cross-entropy is measured on a test set to assess how accurate the model is in predicting the test data. In this example, $p$ is the true distribution of words in any corpus, and $q$ is the distribution of words as predicted by the model. Since the true distribution is unknown, cross-entropy cannot be directly calculated. In these cases, an estimate of cross-entropy is calculated using the following formula: $$H(T,q) = -\sum_{i=1}^{N} \frac{1}{N}\log_2 q(x_i),$$ where $N$ is the size of the test set, and $q(x)$ is the probability of event $x$ estimated from the training set. In other words, $q(x_i)$ is the probability estimate of the model that the i-th word of the text is $x_i$. The sum is averaged over the $N$ words of the test set. This is a Monte Carlo estimate of the true cross-entropy, where the test set is treated as samples from $p(x)$. == Relation to maximum likelihood == The cross-entropy arises in classification problems when introducing a logarithm in the guise of the log-likelihood function. This section concerns the estimation of the probabilities of different discrete outcomes. To this end, denote a parametrized family of distributions by $q_\theta$, with $\theta$ subject to the optimization effort. Consider a given finite sequence of $N$ values $x_i$ from a training set, obtained from conditionally independent sampling. The likelihood assigned to any considered parameter $\theta$ of the model is then given by the product over all probabilities $q_\theta(X = x_i)$. 
Repeated occurrences are possible, leading to equal factors in the product. If the count of occurrences of the value equal to $x$ is denoted by $\#x$, then the frequency of that value equals $\#x/N$. If $p(X=x)$ is the underlying probability distribution, for large $N$ we expect $p(X=x) \approx \#x/N$, by the law of large numbers. Writing our likelihood function as the product of observations from the distribution $q_\theta$: $$\begin{aligned}\mathcal{L}(\theta;\mathbf{x}) &= \prod_i q_\theta(X=x_i) = \prod_x q_\theta(X=x)^{\#x}\\ &\approx \prod_x q_\theta(X=x)^{N\cdot p(X=x)} = \exp\log\left[\prod_x q_\theta(X=x)^{N\cdot p(X=x)}\right]\\ &= \exp\left(\sum_x N\cdot p(X=x)\,\log q_\theta(X=x)\right),\end{aligned}$$ where we have used the calculation rules for the logarithm in the final line. Notice how the exponent contains a $-H(p,q_\theta)$ term. Taking the logarithm of both sides gives: $$\log\mathcal{L}(\theta;\mathbf{x}) = -N\cdot H(p,q_\theta).$$ Since the logarithm is a monotonically increasing function, the maximizing value of $\theta$ is unaffected by this final step. Similarly, the maximizing value of $\theta$ is unaffected by the factor of $N$. So we observe that the likelihood maximization amounts to minimization of the cross-entropy. == Cross-entropy minimization == Cross-entropy minimization is frequently used in optimization and rare-event probability estimation. 
When comparing a distribution $q$ against a fixed reference distribution $p$, cross-entropy and KL divergence are identical up to an additive constant (since $p$ is fixed). According to Gibbs' inequality, both take on their minimal values when $p=q$, which is $0$ for KL divergence and $\mathrm{H}(p)$ for cross-entropy. In the engineering literature, the principle of minimizing KL divergence (Kullback's "Principle of Minimum Discrimination Information") is often called the Principle of Minimum Cross-Entropy (MCE), or Minxent. However, as discussed in the article Kullback–Leibler divergence, sometimes the distribution $q$ is the fixed prior reference distribution, and the distribution $p$ is optimized to be as close to $q$ as possible, subject to some constraint. In this case the two minimizations are not equivalent. This has led to some ambiguity in the literature, with some authors attempting to resolve the inconsistency by restating cross-entropy to be $D_{\mathrm{KL}}(p\parallel q)$, rather than H (

  • Almeida–Pineda recurrent backpropagation

    Almeida–Pineda recurrent backpropagation

    Almeida–Pineda recurrent backpropagation is an extension to the backpropagation algorithm that is applicable to recurrent neural networks. It is a type of supervised learning. It was described somewhat cryptically in Richard Feynman's senior thesis, and rediscovered independently in the context of artificial neural networks by both Fernando Pineda and Luis B. Almeida. A recurrent neural network for this algorithm consists of some input units, some output units and possibly some hidden units. For a given set of (input, target) states, the network is trained to settle into a stable activation state with the output units in the target state, based on a given input state clamped on the input units.
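
The settling step described above can be illustrated with a minimal sketch. It only shows a recurrent network relaxing to a stable activation state while its input units are clamped; the weight-update rule of the algorithm is not shown. The `tanh` activation, the update rule `x <- tanh(W x)`, the 4-unit layout, and the weight values are all assumptions made for illustration, with weights chosen small enough that the iteration converges.

```python
import math


def settle(weights, clamped, tol=1e-10, max_iters=1000):
    """Iterate x <- tanh(W x), holding clamped units fixed, until x stops changing."""
    n_units = len(weights)
    x = [0.0] * n_units
    for idx, value in clamped.items():
        x[idx] = value
    for _ in range(max_iters):
        new_x = [math.tanh(sum(w * xj for w, xj in zip(row, x))) for row in weights]
        # Clamped input units keep their externally imposed values.
        for idx, value in clamped.items():
            new_x[idx] = value
        change = max(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if change < tol:
            return x
    raise RuntimeError("network did not settle")


# Hypothetical network: units 0 and 1 are inputs, unit 2 is hidden, unit 3 is the output.
# Row i holds the weights into unit i; units 2 and 3 feed each other (recurrence).
W = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.5, -0.3, 0.0, 0.1],
    [0.2, 0.4, 0.3, 0.0],
]
clamped_inputs = {0: 1.0, 1: -0.5}
state = settle(W, clamped_inputs)
print("settled state:", [round(v, 6) for v in state])
```

Training would then compare the settled output unit with its target and adjust `W`; that part depends on details not given in this excerpt.
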

  • Log-linear model

    Log-linear model

    A log-linear model is a mathematical model that takes the form of a function whose logarithm equals a linear combination of the parameters of the model, which makes it possible to apply (possibly multivariate) linear regression. That is, it has the general form $\exp\left(c+\sum_{i}w_{i}f_{i}(X)\right)$, in which the $f_{i}(X)$ are quantities that are functions of the variable $X$, in general a vector of values, while $c$ and the $w_{i}$ stand for the model parameters. The term may specifically be used for a log-linear plot or graph, which is a type of semi-log plot, or for Poisson regression for contingency tables, a type of generalized linear model. Log-linear models apply where the output quantity lies in the range 0 to ∞, for values of the independent variables $X$, or more immediately, the transformed quantities $f_{i}(X)$, in the range −∞ to +∞. This may be contrasted to logistic models, similar to the logistic function, for which the output quantity lies in the range 0 to 1. Thus the contexts where these models are useful or realistic often depend on the range of the values being modelled.
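
The remark that a log-linear model allows linear regression can be sketched as follows: taking the logarithm of the output turns $\exp(c + w f(X))$ into a straight line in $f(X)$, which ordinary least squares can fit. This is a minimal sketch under stated assumptions: a single feature with $f(X)=X$, noise-free hypothetical data, and hypothetical function names `fit_log_linear` and `predict`.

```python
import math


def fit_log_linear(xs, ys):
    """Least-squares fit of ln(y) = c + w * x; returns (c, w)."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (lg - mean_log) for x, lg in zip(xs, logs))
    w = sxl / sxx
    c = mean_log - w * mean_x
    return c, w


def predict(c, w, x):
    """Model output exp(c + w * x), which is always positive."""
    return math.exp(c + w * x)


# Hypothetical noise-free data generated from y = exp(0.5 + 1.2 * x).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.5 + 1.2 * x) for x in xs]

c, w = fit_log_linear(xs, ys)
print(f"c = {c:.6f}, w = {w:.6f}")
```

Because the prediction is an exponential, it stays in the range 0 to ∞ for any real input, matching the output range described above.
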

  • Artificial intelligence and elections

    Artificial intelligence and elections

    As artificial intelligence (AI) has become more mainstream, there is growing concern about how it will influence elections. Potential targets of AI include election processes, election offices, election officials and election vendors. There are also global efforts to improve elections using AI. == Tactics == Generative AI capabilities allow the creation of misleading content. Examples include text-to-video, deepfake videos, text-to-image, AI-altered images, text-to-speech, voice cloning, and text-to-text. In the context of an election, a deepfake video of a candidate may propagate information that the candidate does not endorse. Chatbots could spread misinformation about election locations, times or voting methods. In contrast to the methods available to malicious actors in the past, these techniques require little technical skill and can spread rapidly. LLM-generated messages have the capacity to persuade humans on political issues. Researchers have begun to investigate how persuasive people rate LLM-generated messages to be. On policy issues, LLM-generated messages were rated 2.91 for smartness, compared with 2.80 for human-generated messages. The LLM-generated messages were often more technical and analytical than human-generated messages. Generative AI has been used to micro-target people during tight political elections. The development of targeted large language models has raised concern that they will be used to scale microtargeting readily. Rephrased inputs have been used to generate fraudulent emails and phishing websites. Rephrasing inputs for microtargeting does not violate OpenAI's terms of use. There are no safeguards to prevent the use of rephrasing to create fraudulent emails. Political campaign managers have access to these tools, allowing them to create targeted content. 
== Usage by country == === Argentina === ==== 2023 elections ==== During the 2023 Argentine primary elections, Javier Milei's team distributed AI-generated images, including a fabricated image of his rival Sergio Massa, which drew 3 million views. The team also created an unofficial Instagram account entitled "AI for the Homeland." Sergio Massa's team also distributed AI-generated images and videos. === Bangladesh === ==== 2024 elections ==== In the run-up to the 2024 Bangladeshi general election, deepfake videos of female opposition politicians appeared. Rumin Farhana was pictured in a bikini while Nipun Ray was shown in a swimming pool. === Canada === ==== 2025 elections ==== In the run-up to the 2025 Canadian federal election, the use of AI tools is likely to figure prominently. India, Pakistan and Iran are all expected to make efforts to subvert the national vote using disinformation campaigns to deceive voters and sway diaspora communities. A report by the Canadian Centre for Cyber Security, "Cyber Threats to Canada's Democratic Process: 2025 Update", states that malicious actors including China and Russia "are most likely to use generative AI as a means of creating and spreading disinformation, designed to sow division among Canadians and push narratives conducive to the interests of foreign states". === France === ==== 2024 elections ==== In the 2024 French legislative election, deepfake videos appeared: i) Some purported to show the family of Marine Le Pen. In the videos, young women, supposedly Le Pen's nieces, are seen skiing, dancing and at the beach "while making fun of France’s racial minorities". However, the family members do not exist. The videos drew over 2 million views on social media. ii) A deepfake video of a France24 broadcast, seen on social media, appeared to report that the Ukrainian leadership had "tried to lure French president Emmanuel Macron to Ukraine to assassinate him and then blame his death on Russia". 
=== Ghana === ==== 2024 elections ==== During the months before the December 2024 Ghanaian general election, a network of at least 171 fake accounts was used to spam social media. A group identified as "@TheTPatriots" used posts to promote the New Patriotic Party, although it is not known whether the two are connected. All the network's posts were "highly likely" to have been generated by ChatGPT, and the network appears to be the "first secretly partisan network using AI to influence elections in Ghana". The opposition National Democratic Congress was also criticized, with its leader John Mahama being called a drunkard. === India === ==== 2024 elections ==== In the 2024 Indian general election, politicians used deepfakes in their campaign materials. These deepfakes included politicians who had died prior to the election. Muthuvel Karunanidhi's party posted material with his likeness even though he had died in 2018. The All-India Anna Dravidian Progressive Federation party posted a video featuring an audio clip of Jayaram Jayalalithaa even though she had died in 2016. The Deepfakes Analysis Unit (DAU) is an open source platform created in March 2024 for the public to share misleading content and assess whether it had been AI-generated. AI was also used to translate political speeches in real time. This translation ability was widely used to reach more voters. === Indonesia === ==== 2024 elections ==== In the 2024 Indonesian presidential election, Prabowo Subianto made extensive use of AI-generated art in his campaign, which ranged from images of himself as an adorable child to various child portrayals in his advertisements. The Indonesian Children's Protection Commission condemned these ads, labeling them as a form of misuse. Other candidates, Anies Baswedan and Ganjar Pranowo, also incorporated AI art into their campaigns. Throughout the election period, all presidential candidates faced attacks from deepfakes, both in video and audio formats. 
=== Ireland === ==== 2024 elections ==== In the last weeks of the 2024 Irish general election, a spoof election poster appeared in Dublin featuring "an AI-generated candidate with three arms". The candidate is called Aidan Irwin, but no one stood in the election under that name. A slogan on the poster says "put matters into artificial intelligence’s hands". The convincing election poster shows a man who "has six fingers on one hand, three arms, and a distorted thumb". === New Zealand === ==== 2023 elections ==== In May 2023, ahead of the New Zealand general election in October 2023, the New Zealand National Party published a "series of AI-generated political advertisements" on its Instagram account. After confirming that the images were faked, a party spokesperson said that it was "an innovative way to drive our social media". === Pakistan === ==== 2024 elections ==== AI was used by the imprisoned ex-Prime Minister Imran Khan and his media team in the 2024 Pakistani general election: i) AI-generated audio of his voice was added to a video clip and broadcast at a virtual rally. ii) An op-ed in The Economist published under Khan's name was later claimed by Khan himself to have been written by AI, which his team later denied. The article was liked and shared on social media by thousands of users. === South Africa === ==== 2024 elections ==== In the 2024 South African general election, there were several uses of AI content: i) A deepfake video of Joe Biden emerged on social media showing him saying that "The U.S. would place sanctions on SA and declare it an enemy state if the African National Congress (ANC) won". ii) In a deepfake video, Donald Trump was shown endorsing the uMkhonto weSizwe party. It was posted to social media and was viewed more than 158,000 times. iii) Less than 3 months before the elections, a deepfake video showed U.S. rapper Eminem endorsing the Economic Freedom Fighters party while criticizing the ANC. 
The deepfake was viewed on social media more than 173,000 times. === South Korea === ==== 2022 elections ==== In the 2022 South Korean presidential election, a committee for presidential candidate Yoon Suk Yeol released an AI avatar, 'AI Yoon Seok-yeol', that would campaign in places the candidate could not go. Rival candidate Lee Jae-myung introduced a chatbot that provided information about the candidate's pledges. ==== 2024 elections ==== Deepfakes were used to spread misinformation before the 2024 South Korean legislative election, with one source reporting 129 deepfake violations of election laws within a two-week period. Seoul hosted the 2024 Summit for Democracy, a virtual gathering of world leaders initiated by US President Joe Biden in 2021. The focus of the summit was on digital threats to democracy including artificial intelligence and deepfakes. === Taiwan === ==== 2024 elections ==== AI-generated content was used during the 2024 Taiwanese presidential election. Among the media were: i) A deepfake video of General Secretary of the Chinese Communist Party Xi Jinping which showed him supporting the presidential elections. Created on social media, the video was "widely circulated

  • Spiking neural network

    Spiking neural network

    Spiking neural networks (SNNs) are artificial neural networks (ANNs) that mimic natural neural networks. These models use the timing of discrete spikes as the main information carrier. In addition to neuronal and synaptic state, SNNs incorporate the concept of time into their operating model. The idea is that neurons in the SNN do not transmit information at each propagation cycle (as happens with typical multi-layer perceptron networks), but rather transmit information only when a membrane potential (an intrinsic quality of the neuron related to its membrane electrical charge) reaches a specific value, called the threshold. When the membrane potential reaches the threshold, the neuron fires and generates a signal that travels to other neurons which, in turn, increase or decrease their potentials in response to this signal. A neuron model that fires at the moment of threshold crossing is also called a spiking neuron model. While spike rates can be considered the analogue of the variable output of a traditional ANN, neurobiology research has indicated that high-speed processing cannot be performed solely through a rate-based scheme. For example, humans can perform an image recognition task requiring no more than 10 ms of processing time per neuron through the successive layers (going from the retina to the temporal lobe). This time window is too short for rate-based encoding. The precise spike timings in a small set of spiking neurons also have a higher information coding capacity compared with a rate-based approach. The most prominent spiking neuron model is the leaky integrate-and-fire model. In that model, the momentary activation level (modeled as a differential equation) is normally considered to be the neuron's state, with incoming spikes pushing this value higher or lower, until the state eventually either decays or, if the firing threshold is reached, the neuron fires. After firing, the state variable is reset to a lower value. 
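
The leaky integrate-and-fire behavior described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes the common form $dv/dt = (-(v - v_\text{rest}) + R\,I)/\tau$, integrates it with a simple Euler step, and drives the neuron with a constant input current instead of incoming spikes. The function name `simulate_lif` and all parameter values are hypothetical.

```python
def simulate_lif(input_current, steps=1000, dt=0.1, tau=10.0,
                 v_rest=0.0, v_reset=0.0, v_threshold=1.0, resistance=1.0):
    """Euler-integrate a leaky integrate-and-fire neuron and return its spike times."""
    v = v_rest
    spike_times = []
    for step in range(steps):
        # Leak pulls v toward v_rest; the input current pushes it upward.
        v += dt * (-(v - v_rest) + resistance * input_current) / tau
        if v >= v_threshold:
            spike_times.append(step * dt)
            v = v_reset  # after firing, the state is reset to a lower value
    return spike_times


weak = simulate_lif(0.5)     # steady state stays below threshold
medium = simulate_lif(1.5)   # crosses threshold repeatedly
strong = simulate_lif(3.0)   # crosses threshold more often

print(len(weak), len(medium), len(strong))
```

With a subthreshold input the potential simply decays toward its steady state and no spike occurs, while stronger inputs produce more spikes in the same time window, which is the basis of a rate code.
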
Various decoding methods exist for interpreting the outgoing spike train as a real-value number, relying on either the frequency of spikes (rate-code), the time-to-first-spike after stimulation, or the interval between spikes. == History == Many multi-layer artificial neural networks are fully connected, receiving input from every neuron in the previous layer and signalling every neuron in the subsequent layer. Although these networks have achieved breakthroughs, they do not match biological networks and do not mimic neurons. The biology-inspired Hodgkin–Huxley model of a spiking neuron was proposed in 1952. This model described how action potentials are initiated and propagated. Communication between neurons, which requires the exchange of chemical neurotransmitters in the synaptic gap, is described in models such as the integrate-and-fire model, FitzHugh–Nagumo model (1961–1962), and Hindmarsh–Rose model (1984). The leaky integrate-and-fire model (or a derivative) is commonly used as it is easier to compute than Hodgkin–Huxley. While the notion of an artificial spiking neural network became popular only in the twenty-first century, studies between 1980 and 1995 supported the concept. The first models of this type of ANN appeared to simulate non-algorithmic intelligent information processing systems. However, the notion of the spiking neural network as a mathematical model was first worked on in the early 1970s. As of 2019 SNNs lagged behind ANNs in accuracy, but the gap is decreasing, and has vanished on some tasks. == Underpinnings == Information in the brain is represented as action potentials (neuron spikes), which may group into spike trains or coordinated waves. A fundamental question of neuroscience is to determine whether neurons communicate by a rate or temporal code. Temporal coding implies that a single spiking neuron can replace hundreds of hidden units on a conventional neural net. 
SNNs define a neuron's current state as its potential (possibly modeled as a differential equation). An input pulse causes the potential to rise and then gradually decline. Encoding schemes can interpret these pulse sequences as a number, considering pulse frequency and pulse interval. Using the precise time of pulse occurrence, a neural network can consider more information and offer better computing properties. SNNs compute in the continuous domain. Such neurons test for activation only when their potentials reach a certain value. When a neuron is activated, it produces a signal that is passed to connected neurons, accordingly raising or lowering their potentials. The SNN approach produces a continuous output instead of the binary output of traditional ANNs. Pulse trains are not easily interpretable, hence the need for encoding schemes. However, a pulse train representation may be more suited for processing spatiotemporal data (or real-world sensory data classification). SNNs connect neurons only to nearby neurons so that they process input blocks separately (similar to CNN using filters). They consider time by encoding information as pulse trains so as not to lose information. This avoids the complexity of a recurrent neural network (RNN). Impulse neurons are more powerful computational units than traditional artificial neurons. SNNs are theoretically more powerful than so called "second-generation networks" defined as ANNs "based on computational units that apply activation function with a continuous set of possible output values to a weighted sum (or polynomial) of the inputs"; however, SNN training issues and hardware requirements limit their use. Although unsupervised biologically inspired learning methods are available such as Hebbian learning and STDP, no effective supervised training method is suitable for SNNs that can provide better performance than second-generation networks. 
Spike-based activation of SNNs is not differentiable, so gradient descent-based backpropagation (BP) is not directly applicable. SNNs have much larger computational costs for simulating realistic neural models than traditional ANNs. Pulse-coupled neural networks (PCNN) are often confused with SNNs. A PCNN can be seen as a kind of SNN. Researchers are actively working on various topics. The first concerns differentiability. The expressions for both the forward- and backward-learning methods contain the derivative of the neural activation function, which is not differentiable because a neuron's output is 1 when it spikes and 0 otherwise. This all-or-nothing behavior disrupts gradients and makes these neurons unsuitable for gradient-based optimization. Approaches to resolving it include: i) resorting to entirely biologically inspired local learning rules for the hidden units; ii) translating conventionally trained "rate-based" NNs to SNNs; iii) smoothing the network model to be continuously differentiable; iv) defining an SG (Surrogate Gradient) as a continuous relaxation of the real gradients. The second concerns the optimization algorithm. Standard BP can be expensive in terms of computation, memory, and communication and may be poorly suited to the hardware that implements it (e.g., a computer, brain, or neuromorphic device). Incorporating additional neuron dynamics such as Spike Frequency Adaptation (SFA) is a notable advance, enhancing efficiency and computational power. These neurons sit between biological complexity and computational complexity. Originating from biological insights, SFA offers significant computational benefits by reducing power usage, especially in cases of repetitive or intense stimuli. This adaptation improves signal/noise clarity and introduces an elementary short-term memory at the neuron level, which in turn improves accuracy and efficiency. This was mostly achieved using compartmental neuron models. 
Simpler neuron models with adaptive thresholds are an indirect way of achieving SFA. SFA equips SNNs with improved learning capabilities, even with constrained synaptic plasticity, and elevates computational efficiency. This feature lessens the demand on network layers by decreasing the need for spike processing, thus lowering computational load and memory access time, both essential aspects of neural computation. Moreover, SNNs utilizing neurons capable of SFA achieve levels of accuracy that rival those of conventional ANNs, while also requiring fewer neurons for comparable tasks. This efficiency streamlines the computational workflow and conserves space and energy, while maintaining technical integrity. High-performance deep spiking neural networks can operate with 0.3 spikes per neuron. == Applications == SNNs can in principle be applied to the same applications as traditional ANNs. In addition, SNNs can model the central nervous system of biological organisms, such as an insect seeking food without prior knowledge of the environment. Due to their relative realism, they can be used to study biological neural circuits. Starting with a hypothesis about the topology of a biological neuronal circuit and its functi

  • Genotypic and phenotypic repair

    Genotypic and phenotypic repair

    Genotypic and phenotypic repair are optional components of an evolutionary algorithm (EA). An EA reproduces essential elements of biological evolution as a computer algorithm in order to solve demanding optimization or planning tasks, at least approximately. A candidate solution is represented by a (usually linear) data structure that plays the role of an individual's chromosome. New solution candidates are generated by mutation and crossover operators following the example of biology. These offspring may be defective, which is corrected or compensated for by genotypic or phenotypic repair. == Description == Genotypic repair, also known as genetic repair, is the removal or correction of impermissible entries in the chromosome that violate restrictions. In phenotypic repair, the corrections are only made in the genotype-phenotype mapping and the chromosome remains unchanged. Michalewicz wrote about the importance of restrictions in real-world applications: "In general, constraints are an integral part of the formulation of any problem". Restriction violations are application-specific, so whether repair is useful, and which type, depends on the problem at hand. Violations can usually also be handled by a correspondingly extended evaluation, and the problem determines which measures are possible and which is the most suitable. If a phenotypic repair is feasible, then it is usually the most efficient of these measures. Surveys of repair methods used as constraint-handling techniques are available in the literature. Violations of the range limits of genes should be avoided as far as possible by the formulation of the genome. If this is not possible or if restrictions within the search space defined by the genome are involved, their violations are usually handled by the evaluation. This can be done, for example, by penalty functions that lower the fitness. Repair is often also required for combinatorial tasks. 
The application of a 1- or n-point crossover operator can, for example, lead to genes being missing in one of the child genomes that are present in duplicate in the other. In this case, a suitable genotypic repair measure is to move the surplus genes to the other genome in a positional manner. The use of the aforementioned operators in combinatorial tasks has also proven to be useful in combination with crossover types specially developed for permutations, at least for certain problems. Particularly in combinatorial problems, it has been observed that genotypic repair can promote premature convergence to a suboptimum, but can also significantly accelerate a successful search. Studies on various tasks have shown that this is application-dependent. An effective measure to avoid premature convergence is generally the use of structured populations instead of the usual panmictic ones. Sequence restrictions play a role in many scheduling tasks, for example when it comes to planning workflows. If, for example, it is specified that step A must be carried out before step B and the gene of step B is located before the gene of A in the chromosome, then there is an impermissible gene sequence. This is because the scheduling operation of step B requires the planned end of step A for correct scheduling, but this is not yet scheduled at the time gene B is processed. The problem can be solved in two ways: The scheduling operation of step B is postponed until the gene from step A has been processed. The genome remains unchanged and the repair only influences the genotype-phenotype mapping. Since only the phenotype is changed, this is referred to as phenotypic repair. If, on the other hand, the gene of step B is moved behind the gene of step A, this is a genotypic repair. The same applies to the alternative shift of gene A in front of gene B. 
In this case, genotypic repair has the disadvantage that it prevents a meaningful restructuring of the gene sequence in the chromosome if this requires several intermediate steps (mutations) that at least partially violate restrictions.
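
The two ways of handling the sequence restriction in the scheduling example can be sketched as follows. This is a minimal illustration under stated assumptions: the chromosome is a plain list of step names, `predecessors` is a hypothetical mapping from a step to the set of steps that must be scheduled first, and the function name `scheduling_order` is invented for this sketch. Postponing a step until its predecessors are processed gives the phenotypic repair; writing the resulting order back into the chromosome gives one possible genotypic repair.

```python
def scheduling_order(chromosome, predecessors):
    """Process genes left to right, postponing a step until its predecessors are scheduled."""
    scheduled = []
    done = set()
    waiting = []
    for gene in chromosome:
        waiting.append(gene)
        progress = True
        while progress:
            progress = False
            for step in list(waiting):
                if predecessors.get(step, set()) <= done:
                    waiting.remove(step)
                    scheduled.append(step)
                    done.add(step)
                    progress = True
    if waiting:
        raise ValueError("precedence constraints cannot be satisfied")
    return scheduled


# Hypothetical example: step A must be carried out before step B,
# but the gene of B is located before the gene of A.
chromosome = ["B", "A", "C"]
predecessors = {"B": {"A"}}

# Phenotypic repair: only the mapping to a schedule changes, the chromosome stays as it is.
schedule = scheduling_order(chromosome, predecessors)

# Genotypic repair: the corrected order is written back as a new chromosome.
repaired_chromosome = list(schedule)

print("chromosome:", chromosome)
print("schedule:", schedule)
print("repaired chromosome:", repaired_chromosome)
```

In the phenotypic variant, later mutations still act on the original gene order, which avoids the restructuring problem noted above for genotypic repair.
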
