AI Chatbot Companion

AI Chatbot Companion — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Pixel-art scaling algorithms

    Pixel-art scaling algorithms

    Pixel art scaling algorithms are graphical filters that attempt to enhance the appearance of hand-drawn 2D pixel art graphics. These algorithms are a form of automatic image enhancement. Pixel art scaling algorithms employ methods significantly different than the common methods of image rescaling, which have the goal of preserving the appearance of images. As pixel art graphics are commonly used at very low resolutions, they employ careful coloring of individual pixels. This results in graphics that rely on a high amount of stylized visual cues to define complex shapes. Several specialized algorithms have been developed to handle re-scaling of such graphics. These specialized algorithms can improve the appearance of pixel-art graphics, but in doing so they introduce changes. Such changes may be undesirable, especially if the goal is to faithfully reproduce the original appearance. Since a typical application of this technology is improving the appearance of fourth-generation and earlier video games on arcade and console emulators, many pixel art scaling algorithms are designed to run in real-time for sufficiently small input images at 60-frames per second. This places constraints on the type of programming techniques that can be used for this sort of real-time processing. Many work only on specific scale factors. 2× is the most common scale factor, while 3×, 4×, 5×, and 6× exist but are less used. == Algorithms == === SAA5050 'Diagonal Smoothing' === The Mullard SAA5050 Teletext character generator chip (1980) used a primitive pixel scaling algorithm to generate higher-resolution characters on the screen from a lower-resolution representation from its internal ROM. Internally, each character shape was defined on a 5 × 9 pixel grid, which was then interpolated by smoothing diagonals to give a 10 × 18 pixel character, with a characteristically angular shape, surrounded to the top and the left by two pixels of blank space. The algorithm only works on monochrome source data, and assumes the source pixels will be logically true or false depending on whether they are 'on' or 'off'. Pixels 'outside the grid pattern' are assumed to be off. The algorithm works as follows: A B C --\ 1 2 D E F --/ 3 4 1 = B | (A & E & !B & !D) 2 = B | (C & E & !B & !F) 3 = E | (!A & !E & B & D) 4 = E | (!C & !E & B & F) Note that this algorithm, like the Eagle algorithm below, has a flaw: If a pattern of 4 pixels in a hollow diamond shape appears, the hollow will be obliterated by the expansion. The SAA5050's internal character ROM carefully avoids ever using this pattern. The degenerate case: becomes: === EPX/Scale2×/AdvMAME2× === Eric's Pixel Expansion (EPX) is an algorithm developed by Eric Johnston at LucasArts around 1992, when porting the SCUMM engine games from the IBM PC (which ran at 320 × 200 × 256 colors) to the early color Macintosh computers, which ran at more or less double that resolution. The algorithm works as follows, expanding P into 4 new pixels based on P's surroundings: 1=P; 2=P; 3=P; 4=P; IF C==A => 1=A IF A==B => 2=B IF D==C => 3=C IF B==D => 4=D IF of A, B, C, D, three or more are identical: 1=2=3=4=P Later implementations of this same algorithm (as AdvMAME2× and Scale2×, developed around 2001) are slightly more efficient but functionally identical: 1=P; 2=P; 3=P; 4=P; IF C==A AND C!=D AND A!=B => 1=A IF A==B AND A!=C AND B!=D => 2=B IF D==C AND D!=B AND C!=A => 3=C IF B==D AND B!=A AND D!=C => 4=D AdvMAME2× is available in DOSBox via the scaler=advmame2x dosbox.conf option. The AdvMAME4×/Scale4× algorithm is just EPX applied twice to get 4× resolution. ==== Scale3×/AdvMAME3× and ScaleFX ==== The AdvMAME3×/Scale3× algorithm (available in DOSBox via the scaler=advmame3x dosbox.conf option) can be thought of as a generalization of EPX to the 3× case. The corner pixels are calculated identically to EPX. 1=E; 2=E; 3=E; 4=E; 5=E; 6=E; 7=E; 8=E; 9=E; IF D==B AND D!=H AND B!=F => 1=D IF (D==B AND D!=H AND B!=F AND E!=C) OR (B==F AND B!=D AND F!=H AND E!=A) => 2=B IF B==F AND B!=D AND F!=H => 3=F IF (H==D AND H!=F AND D!=B AND E!=A) OR (D==B AND D!=H AND B!=F AND E!=G) => 4=D 5=E IF (B==F AND B!=D AND F!=H AND E!=I) OR (F==H AND F!=B AND H!=D AND E!=C) => 6=F IF H==D AND H!=F AND D!=B => 7=D IF (F==H AND F!=B AND H!=D AND E!=G) OR (H==D AND H!=F AND D!=B AND E!=I) => 8=H IF F==H AND F!=B AND H!=D => 9=F There is also a variant improved over Scale3× called ScaleFX, developed by Sp00kyFox, and a version combined with Reverse-AA called ScaleFX-Hybrid. === Eagle === Eagle works as follows: for every in pixel, we will generate 4 out pixels. First, set all 4 to the color of the pixel we are currently scaling (as nearest-neighbor). Next look at the three pixels above, to the left, and diagonally above left: if all three are the same color as each other, set the top left pixel of our output square to that color in preference to the nearest-neighbor color. Work similarly for all four pixels, and then move to the next one. Assume an input matrix of 3 × 3 pixels where the centermost pixel is the pixel to be scaled, and an output matrix of 2 × 2 pixels (i.e., the scaled pixel) first: |Then . . . --\ CC |S T U --\ 1 2 . C . --/ CC |V C W --/ 3 4 . . . |X Y Z | IF V==S==T => 1=S | IF T==U==W => 2=U | IF V==X==Y => 3=X | IF W==Z==Y => 4=Z Thus if we have a single black pixel on a white background it will vanish. This is a bug in the Eagle algorithm but is solved by other algorithms such as EPX, 2xSaI, and HQ2x. === 2×SaI === 2×SaI, short for 2× Scale and Interpolation engine, was inspired by Eagle. It was designed by Derek Liauw Kie Fa, also known as Kreed, primarily for use in console and computer emulators, and it has remained fairly popular in this niche. Many of the most popular emulators, including ZSNES and VisualBoyAdvance, offer this scaling algorithm as a feature. Several slightly different versions of the scaling algorithm are available, and these are often referred to as Super 2×SaI and Super Eagle. The 2xSaI family works on a 4 × 4 matrix of pixels where the pixel marked A below is scaled: I E F J G A B K --\ W X H C D L --/ Y Z M N O P For 16-bit pixels, they use pixel masks which change based on whether the 16-bit pixel format is 565 or 555. The constants colorMask, lowPixelMask, qColorMask, qLowPixelMask, redBlueMask, and greenMask are 16-bit masks. The lower 8 bits are identical in either pixel format. Two interpolation functions are described: INTERPOLATE(uint32 A, UINT32 B). -- linear midpoint of A and B if (A == B) return A; return ( ((A & colorMask) >> 1) + ((B & colorMask) >> 1) + (A & B & lowPixelMask) ); Q_INTERPOLATE(uint32 A, uint32 B, uint32 C, uint32 D) -- bilinear interpolation; A, B, C, and D's average x = ((A & qColorMask) >> 2) + ((B & qColorMask) >> 2) + ((C & qColorMask) >> 2) + ((D & qColorMask) >> 2); y = (A & qLowPixelMask) + (B & qLowPixelMask) + (C & qLowPixelMask) + (D & qLowPixelMask); y = (y >> 2) & qLowPixelMask; return x + y; The algorithm checks A, B, C, and D for a diagonal match such that A==D and B!=C, or the other way around, or if they are both diagonals or if there is no diagonal match. Within these, it checks for three or four identical pixels. Based on these conditions, the algorithm decides whether to use one of A, B, C, or D, or an interpolation among only these four, for each output pixel. The 2xSaI arbitrary scaler can enlarge any image to any resolution and uses bilinear filtering to interpolate pixels. Since Kreed released the source code under the GNU General Public License, it is freely available to anyone wishing to utilize it in a project released under that license. Developers wishing to use it in a non-GPL project would be required to rewrite the algorithm without using any of Kreed's existing code. It is available in DOSBox via scaler=2xsai option. === hqnx family === Maxim Stepin's hq2x, hq3x, and hq4x are for scale factors of 2:1, 3:1, and 4:1 respectively. Each work by comparing the color value of each pixel to those of its eight immediate neighbors, marking the neighbors as close or distant, and using a pre-generated lookup table to find the proper proportion of input pixels' values for each of the 4, 9 or 16 corresponding output pixels. The hq3x family will perfectly smooth any diagonal line whose slope is ±0.5, ±1, or ±2 and which is not anti-aliased in the input; one with any other slope will alternate between two slopes in the output. It will also smooth very tight curves. Unlike 2xSaI, it anti-aliases the output. hqnx was initially created for the Super NES emulator ZSNES. The author of bsnes has released a space-efficient implementation of hq2x to the public domain. A port to shaders, which has comparable quality to the early versions of xBR, is available. Before the port, a shader called "scalehq" has often been confused for hqx. === xBR family === There are 6 filters in this family: xBR , xBRZ, xBR-Hybrid, Super xBR, xBR+3D and Super xBR+3D. xBR ("scale by rules"), cre

    Read more →
  • FastICA

    FastICA

    FastICA is an efficient and popular algorithm for independent component analysis invented by Aapo Hyvärinen at Helsinki University of Technology. Like most ICA algorithms, FastICA seeks an orthogonal rotation of prewhitened data, through a fixed-point iteration scheme, that maximizes a measure of non-Gaussianity of the rotated components. Non-gaussianity serves as a proxy for statistical independence, which is a very strong condition and requires infinite data to verify. FastICA can also be alternatively derived as an approximative Newton iteration. == Algorithm == === Prewhitening the data === Let the X := ( x i j ) ∈ R N × M {\displaystyle \mathbf {X} :=(x_{ij})\in \mathbb {R} ^{N\times M}} denote the input data matrix, M {\displaystyle M} the number of columns corresponding with the number of samples of mixed signals and N {\displaystyle N} the number of rows corresponding with the number of independent source signals. The input data matrix X {\displaystyle \mathbf {X} } must be prewhitened, or centered and whitened, before applying the FastICA algorithm to it. Centering the data entails demeaning each component of the input data X {\displaystyle \mathbf {X} } , that is, for each i = 1 , … , N {\displaystyle i=1,\ldots ,N} and j = 1 , … , M {\displaystyle j=1,\ldots ,M} . After centering, each row of X {\displaystyle \mathbf {X} } has an expected value of 0 {\displaystyle 0} . Whitening the data requires a linear transformation L : R N × M → R N × M {\displaystyle \mathbf {L} :\mathbb {R} ^{N\times M}\to \mathbb {R} ^{N\times M}} of the centered data so that the components of L ( X ) {\displaystyle \mathbf {L} (\mathbf {X} )} are uncorrelated and have variance one. More precisely, if X {\displaystyle \mathbf {X} } is a centered data matrix, the covariance of L x := L ( X ) {\displaystyle \mathbf {L} _{\mathbf {x} }:=\mathbf {L} (\mathbf {X} )} is the ( N × N ) {\displaystyle (N\times N)} -dimensional identity matrix, that is, A common method for whitening is by performing an eigenvalue decomposition on the covariance matrix of the centered data X {\displaystyle \mathbf {X} } , E { X X T } = E D E T {\displaystyle E\left\{\mathbf {X} \mathbf {X} ^{T}\right\}=\mathbf {E} \mathbf {D} \mathbf {E} ^{T}} , where E {\displaystyle \mathbf {E} } is the matrix of eigenvectors and D {\displaystyle \mathbf {D} } is the diagonal matrix of eigenvalues. The whitened data matrix is defined thus by === Single component extraction === The iterative algorithm finds the direction for the weight vector w ∈ R N {\displaystyle \mathbf {w} \in \mathbb {R} ^{N}} that maximizes a measure of non-Gaussianity of the projection w T X {\displaystyle \mathbf {w} ^{T}\mathbf {X} } , with X ∈ R N × M {\displaystyle \mathbf {X} \in \mathbb {R} ^{N\times M}} denoting a prewhitened data matrix as described above. Note that w {\displaystyle \mathbf {w} } is a column vector. To measure non-Gaussianity, FastICA relies on a nonquadratic nonlinear function f ( u ) {\displaystyle f(u)} , its first derivative g ( u ) {\displaystyle g(u)} , and its second derivative g ′ ( u ) {\displaystyle g^{\prime }(u)} . Hyvärinen states that the functions are useful for general purposes, while may be highly robust. The steps for extracting the weight vector w {\displaystyle \mathbf {w} } for single component in FastICA are the following: Randomize the initial weight vector w {\displaystyle \mathbf {w} } Let w + ← E { X g ( w T X ) T } − E { g ′ ( w T X ) } w {\displaystyle \mathbf {w} ^{+}\leftarrow E\left\{\mathbf {X} g(\mathbf {w} ^{T}\mathbf {X} )^{T}\right\}-E\left\{g'(\mathbf {w} ^{T}\mathbf {X} )\right\}\mathbf {w} } , where E { . . . } {\displaystyle E\left\{...\right\}} means averaging over all column-vectors of matrix X {\displaystyle \mathbf {X} } Let w ← w + / ‖ w + ‖ {\displaystyle \mathbf {w} \leftarrow \mathbf {w} ^{+}/\|\mathbf {w} ^{+}\|} If not converged, go back to 2 === Multiple component extraction === The single unit iterative algorithm estimates only one weight vector which extracts a single component. Estimating additional components that are mutually "independent" requires repeating the algorithm to obtain linearly independent projection vectors - note that the notion of independence here refers to maximizing non-Gaussianity in the estimated components. Hyvärinen provides several ways of extracting multiple components with the simplest being the following. Here, 1 M {\displaystyle \mathbf {1_{M}} } is a column vector of 1's of dimension M {\displaystyle M} . Algorithm FastICA Input: C {\displaystyle C} Number of desired components Input: X ∈ R N × M {\displaystyle \mathbf {X} \in \mathbb {R} ^{N\times M}} Prewhitened matrix, where each column represents an N {\displaystyle N} -dimensional sample, where C <= N {\displaystyle C<=N} Output: W ∈ R N × C {\displaystyle \mathbf {W} \in \mathbb {R} ^{N\times C}} Un-mixing matrix where each column projects X {\displaystyle \mathbf {X} } onto independent component. Output: S ∈ R C × M {\displaystyle \mathbf {S} \in \mathbb {R} ^{C\times M}} Independent components matrix, with M {\displaystyle M} columns representing a sample with C {\displaystyle C} dimensions. for p in 1 to C: w p ← {\displaystyle \mathbf {w_{p}} \leftarrow } Random vector of length N while w p {\displaystyle \mathbf {w_{p}} } changes w p ← 1 M X g ( w p T X ) T − 1 M g ′ ( w p T X ) 1 M w p {\displaystyle \mathbf {w_{p}} \leftarrow {\frac {1}{M}}\mathbf {X} g(\mathbf {w_{p}} ^{T}\mathbf {X} )^{T}-{\frac {1}{M}}g'(\mathbf {w_{p}} ^{T}\mathbf {X} )\mathbf {1_{M}} \mathbf {w_{p}} } w p ← w p − ∑ j = 1 p − 1 ( w p T w j ) w j {\displaystyle \mathbf {w_{p}} \leftarrow \mathbf {w_{p}} -\sum _{j=1}^{p-1}(\mathbf {w_{p}} ^{T}\mathbf {w_{j}} )\mathbf {w_{j}} } w p ← w p ‖ w p ‖ {\displaystyle \mathbf {w_{p}} \leftarrow {\frac {\mathbf {w_{p}} }{\|\mathbf {w_{p}} \|}}} output W ← [ w 1 , … , w C ] {\displaystyle \mathbf {W} \leftarrow {\begin{bmatrix}\mathbf {w_{1}} ,\dots ,\mathbf {w_{C}} \end{bmatrix}}} output S ← W T X {\displaystyle \mathbf {S} \leftarrow \mathbf {W^{T}} \mathbf {X} }

    Read more →
  • TabPFN

    TabPFN

    TabPFN (Tabular Prior-data Fitted Network) is a machine learning model for tabular datasets proposed in 2022. It uses a transformer architecture. It is intended for supervised classification and regression analysis on tabular datasets, particularly focusing on small- to medium-sized datasets. The latest version, TabPFN-3, was released in May 2026 and supports datasets with up to one million rows and 200 features. == History == TabPFN was first introduced in a 2022 pre-print and presented at ICLR 2023. TabPFN v2 was published in 2025 in Nature by Hollmann and co-authors. The source code is published on GitHub under a modified Apache License and on PyPi. Writing for ICLR blogs, McCarter states that the model has attracted attention due to its performance on small dataset benchmarks. TabPFN v2.5 was released on November 6, 2025. TabPFN-3 was released on May 12, 2026. Prior Labs, founded in 2024, aims to commercialize TabPFN. As of April 2026, the open-source TabPFN repository had more than 6,000 stars on GitHub. == Overview and pre-training == TabPFN supports classification, regression and generative tasks. It leverages "Prior-Data Fitted Networks" models to model tabular data. By using a transformer pre-trained on synthetic tabular datasets, TabPFN avoids benchmark contamination and costs of curating real-world data. TabPFN v2 was pre-trained on approximately 130 million such datasets. Synthetic datasets are generated using causal models or Bayesian neural networks; this can include simulating missing values, imbalanced data, and noise. Random inputs are passed through these models to generate outputs, with a bias towards simpler causal structures. During pre-training, TabPFN predicts the masked target values of new data points given training data points and their known targets, effectively learning a generic learning algorithm that is executed by running a neural network forward pass. The new dataset is then processed in a single forward pass without retraining. The model's transformer encoder processes features and labels by alternating attention across rows and columns. TabPFN v2 handles numerical and categorical features, missing values, and supports tasks like regression and synthetic data generation, while TabPFN-2.5 scales this approach to datasets with up to 50,000 rows and 2,000 features. TabPFN-3 introduced a redesigned architecture with row-compression, an attention-based many-class decoder, native missing-value handling, and inference optimizations such as row chunking and a reduced key-value cache, with benchmark-validated regimes of up to 1 million rows with 200 features, 100,000 rows with 2,000 features, or 1,000 rows with 20,000 features. Since TabPFN is pre-trained, in contrast to other deep learning methods, it does not require costly hyperparameter optimization. == Research == TabPFN is the subject of on-going research. Applications for TabPFN have been investigated for domains such as chemoproteomics, insurance risk classification, and metagenomics. In clinical research, TabPFN was used in a study on the early detection of pancreatic cancer from blood samples, where it was combined with metabolomic data and reported high diagnostic performance. == Applications == TabPFN has been used in industrial and biomedical contexts. Hitachi Ltd. has been reported to use the model for predictive maintenance in rail networks, with its use described as helping to identify track issues earlier and reduce manual inspections. In the biomedical domain, Oxford Cancer Analytics has used TabPFN in the analysis of proteomic data in lung disease research. A 2025 ML Contests report noted that the winners of DrivenData's PREPARE challenge used TabPFN to generate features for gradient-boosted decision tree models. == Limitations == TabPFN has been criticized for its "one large neural network is all you need" approach to modeling problems. Further, its performance is limited in high-dimensional and large-scale datasets. == Scaling Mode == In late November 2025, Prior Labs introduced ‘‘Scaling Mode’’, an operating mode for TabPFN designed to remove the fixed upper bound on training set size. Earlier versions of TabPFN had been optimized and validated primarily for datasets of up to 100,000 rows, whereas Scaling Mode was reported to extend support to substantially larger datasets, with benchmarked experiments on datasets containing up to 10 million rows. According to Prior Labs, Scaling Mode preserves the existing TabPFN architecture, including its alternating row-attention and column-attention design, as well as the same feature-count limits as prior releases. Inference remains based on a single forward pass rather than dataset-specific gradient-descent training, while scalability is described as being constrained primarily by available compute and memory resources. Prior Labs reported benchmark results on an internal collection of datasets ranging from 1 million to 10 million rows across industry and scientific applications. In these benchmarks, Scaling Mode was compared with CatBoost, XGBoost, LightGBM, and TabPFN 2.5 using 50,000-row subsampling. The company stated that predictive performance improved monotonically with increasing training set size and that no diminishing returns from scaling were observed within the tested range. Prior Labs also announced the release through company and executive social media channels. TabPFN-3 later incorporated related scaling improvements, including row chunking and a reduced key-value cache, into the model architecture and inference pipeline.

    Read more →
  • Synaptic weight

    Synaptic weight

    In neuroscience and computer science, synaptic weight refers to the strength or amplitude of a connection between two nodes, corresponding in biology to the amount of influence the firing of one neuron has on another. The term is typically used in artificial and biological neural network research. == Computation == In a computational neural network, a vector or set of inputs x {\displaystyle {\textbf {x}}} and outputs y {\displaystyle {\textbf {y}}} , or pre- and post-synaptic neurons respectively, are interconnected with synaptic weights represented by the matrix w {\displaystyle w} , where for a linear neuron y j = ∑ i w i j x i or y = w x {\displaystyle y_{j}=\sum _{i}w_{ij}x_{i}~~{\textrm {or}}~~{\textbf {y}}=w{\textbf {x}}} . where the rows of the synaptic matrix represent the vector of synaptic weights for the output indexed by j {\displaystyle j} . The synaptic weight is changed by using a learning rule, the most basic of which is Hebb's rule, which is usually stated in biological terms as Neurons that fire together, wire together. Computationally, this means that if a large signal from one of the input neurons results in a large signal from one of the output neurons, then the synaptic weight between those two neurons will increase. The rule is unstable, however, and is typically modified using such variations as Oja's rule, radial basis functions or the backpropagation algorithm. == Biology == For biological networks, the effect of synaptic weights is not as simple as for linear neurons or Hebbian learning. However, biophysical models such as BCM theory have seen some success in mathematically describing these networks. In the mammalian central nervous system, signal transmission is carried out by interconnected networks of nerve cells, or neurons. For the basic pyramidal neuron, the input signal is carried by the axon, which releases neurotransmitter chemicals into the synapse which is picked up by the dendrites of the next neuron, which can then generate an action potential which is analogous to the output signal in the computational case. The synaptic weight in this process is determined by several variable factors: How well the input signal propagates through the axon (see myelination), The amount of neurotransmitter released into the synapse and the amount that can be absorbed in the following cell (determined by the number of AMPA and NMDA receptors on the cell membrane and the amount of intracellular calcium and other ions), The number of such connections made by the axon to the dendrites, How well the signal propagates and integrates in the postsynaptic cell. The changes in synaptic weight that occur is known as synaptic plasticity, and the process behind long-term changes (long-term potentiation and depression) is still poorly understood. Hebb's original learning rule was originally applied to biological systems, but has had to undergo many modifications as a number of theoretical and experimental problems came to light.

    Read more →
  • Artificial intelligence

    Artificial intelligence

    Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of research in engineering, mathematics and computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals. High-profile applications of AI include advanced web search engines, chatbots, virtual assistants, autonomous vehicles, and play and analysis in strategy games (e.g., chess and Go). Since the 2020s, generative AI has become widely available to generate images, audio, and videos from text prompts. The traditional goals of AI research include learning, reasoning, knowledge representation, planning, natural language processing, and perception, as well as support for robotics. To reach these goals, AI researchers have used techniques including state space search and mathematical optimization, formal logic, artificial neural networks, and methods based on statistics, operations research, and economics. AI also draws upon psychology, linguistics, philosophy, neuroscience, and other fields. Some companies, such as OpenAI, Google DeepMind and Meta, aim to create artificial general intelligence (AGI) – AI that can complete virtually any cognitive task at least as well as a human. Artificial intelligence was founded as an academic discipline in 1956, and the field went through multiple cycles of optimism throughout its history, followed by periods of disappointment and loss of funding, known as AI winters. Funding and interest increased substantially after 2012, when graphics processing units began being used to accelerate neural networks, and deep learning outperformed previous AI techniques. This growth accelerated further after 2017 with the transformer architecture. In the 2020s, an AI boom has coincided with advances in generative AI, which allowed for the creation and modification of media. In addition to AI safety and unintended consequences and harms from the use of AI, ethical concerns, AI's long-term effects, and potential existential risks have prompted discussions of AI regulation. == Goals == The general problem of simulating (or creating) intelligence has been broken into subproblems. These consist of particular traits or capabilities that researchers expect an intelligent system to display. The traits described below have received the most attention and cover the scope of AI research. === Reasoning and problem-solving === Early researchers developed algorithms that imitated step-by-step reasoning that humans use when they solve puzzles or make logical deductions. By the late 1980s and 1990s, methods were developed for dealing with uncertain or incomplete information, employing concepts from probability and economics. Many of these algorithms are insufficient for solving large reasoning problems because they experience a "combinatorial explosion": They become exponentially slower as the problems grow. Even humans rarely use the step-by-step deduction that early AI research could model. They solve most of their problems using fast, intuitive judgments. Accurate and efficient reasoning is an unsolved problem. === Knowledge representation === Knowledge representation and knowledge engineering allow AI programs to answer questions intelligently and make deductions about real-world facts. Formal knowledge representations are used in content-based indexing and retrieval, scene interpretation, clinical decision support, knowledge discovery (mining "interesting" and actionable inferences from large databases), and other areas. A knowledge base is a body of knowledge represented in a form that can be used by a program. An ontology is the set of objects, relations, concepts, and properties used by a particular domain of knowledge. Knowledge bases need to represent things such as objects, properties, categories, and relations between objects; situations, events, states, and time; causes and effects; knowledge about knowledge (what we know about what other people know); default reasoning (things that humans assume are true until they are told differently and will remain true even when other facts are changing); and many other aspects and domains of knowledge. Among the most difficult problems in knowledge representation are the breadth of commonsense knowledge (the set of atomic facts that the average person knows is enormous); and the sub-symbolic form of most commonsense knowledge (much of what people know is not represented as "facts" or "statements" that they could express verbally). There is also the difficulty of knowledge acquisition, the problem of obtaining knowledge for AI applications. === Planning and decision-making === An "agent" is any entity (artificial or not) that perceives and takes actions in the world. A rational agent has goals or preferences and takes actions to make them happen. In automated planning, the agent has a specific goal. In automated decision-making, the agent has preferences—there are some situations it would prefer to be in, and some situations it is trying to avoid. The decision-making agent assigns a number to each situation (called the "utility") that measures how much the agent prefers it. For each possible action, it can calculate the "expected utility": the utility of all possible outcomes of the action, weighted by the probability that the outcome will occur. It can then choose the action with the maximum expected utility. In classical planning, the agent knows exactly what the effect of any action will be. In most real-world problems, however, the agent may not be certain about the situation they are in (it is "unknown" or "unobservable") and it may not know for certain what will happen after each possible action (it is not "deterministic"). It must choose an action by making a probabilistic guess and then reassess the situation to see if the action worked. Alongside thorough testing and improvement based on previous decisions, having an explanation for why the agent took certain decisions is a way to build trust, especially when the decisions have to be relied upon. In some problems, the agent's preferences may be uncertain, especially if there are other agents or humans involved. These can be learned (e.g., with inverse reinforcement learning), or the agent can seek information to improve its preferences. Information value theory can be used to weigh the value of exploratory or experimental actions. The space of possible future actions and situations is typically intractably large, so the agents must take actions and evaluate situations while being uncertain of what the outcome will be. A Markov decision process has a transition model that describes the probability that a particular action will change the state in a particular way and a reward function that supplies the utility of each state and the cost of each action. A policy associates a decision with each possible state. The policy could be calculated (e.g., by iteration), be heuristic, or it can be learned. Game theory describes the rational behavior of multiple interacting agents and is used in AI programs that make decisions that involve other agents. === Learning === Machine learning is the study of programs that can improve their performance on a given task automatically. It has been a part of AI from the beginning. There are several kinds of machine learning. Unsupervised learning analyzes a stream of data and finds patterns and makes predictions without any other guidance. Supervised learning requires labeling the training data with the expected answers, and comes in two main varieties: classification (where the program must learn to predict what category the input belongs in) and regression (where the program must deduce a numeric function based on numeric input). In reinforcement learning, the agent is rewarded for good responses and punished for bad ones. The agent learns to choose responses that are classified as "good". Transfer learning is when the knowledge gained from one problem is applied to a new problem. Deep learning is a type of machine learning that runs inputs through biologically inspired artificial neural networks for all of these types of learning. Computational learning theory can assess learners by computational complexity, by sample complexity (how much data is required), or by other notions of optimization. === Natural language processing === Natural language processing (NLP) allows programs to read, write and communicate in human languages. Specific problems include speech recognition, speech synthesis, machine translation, information extraction, information retrieval and question answering. Early work, based on Noam Chomsky's generative grammar and semantic networks, had difficulty with word-sense disambiguation unless

    Read more →
  • Sample exclusion dimension

    Sample exclusion dimension

    In computational learning theory, sample exclusion dimensions arise in the study of exact concept learning with queries. In algorithmic learning theory, a concept over a domain X is a Boolean function over X. Here we only consider finite domains. A partial approximation S of a concept c is a Boolean function over Y ⊆ X {\displaystyle Y\subseteq X} such that c is an extension to S. Let C be a class of concepts and c be a concept (not necessarily in C). Then a specifying set for c w.r.t. C, denoted by S is a partial approximation S of c such that C contains at most one extension to S. If we have observed a specifying set for some concept w.r.t. C, then we have enough information to verify a concept in C with at most one more mind change. The exclusion dimension, denoted by XD(C), of a concept class is the maximum of the size of the minimum specifying set of c' with respect to C, where c' is a concept not in C.

    Read more →
  • LogitBoost

    LogitBoost

    In machine learning and computational learning theory, LogitBoost is a boosting algorithm formulated by Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The original paper casts the AdaBoost algorithm into a statistical framework. Specifically, if one considers AdaBoost as a generalized additive model and then applies the cost function of logistic regression, one can derive the LogitBoost algorithm. == Minimizing the LogitBoost cost function == LogitBoost can be seen as a convex optimization. Specifically, given that we seek an additive model of the form f = ∑ t α t h t {\displaystyle f=\sum _{t}\alpha _{t}h_{t}} the LogitBoost algorithm minimizes the logistic loss: ∑ i log ⁡ ( 1 + e − y i f ( x i ) ) {\displaystyle \sum _{i}\log \left(1+e^{-y_{i}f(x_{i})}\right)}

    Read more →
  • Spiking neural network

    Spiking neural network

    Spiking neural networks (SNNs) are artificial neural networks (ANN) that mimic natural neural networks. These models leverage timing of discrete spikes as the main information carrier. In addition to neuronal and synaptic state, SNNs incorporate the concept of time into their operating model. The idea is that neurons in the SNN do not transmit information at each propagation cycle (as it happens with typical multi-layer perceptron networks), but rather transmit information only when a membrane potential—an intrinsic quality of the neuron related to its membrane electrical charge—reaches a specific value, called the threshold. When the membrane potential reaches the threshold, the neuron fires, and generates a signal that travels to other neurons which, in turn, increase or decrease their potentials in response to this signal. A neuron model that fires at the moment of threshold crossing is also called a spiking neuron model. While spike rates can be considered the analogue of the variable output of a traditional ANN, neurobiology research indicated that high speed processing cannot be performed solely through a rate-based scheme. For example humans can perform an image recognition task requiring no more than 10ms of processing time per neuron through the successive layers (going from the retina to the temporal lobe). This time window is too short for rate-based encoding. The precise spike timings in a small set of spiking neurons also has a higher information coding capacity compared with a rate-based approach. The most prominent spiking neuron model is the leaky integrate-and-fire model. In that model, the momentary activation level (modeled as a differential equation) is normally considered to be the neuron's state, with incoming spikes pushing this value higher or lower, until the state eventually either decays or—if the firing threshold is reached—the neuron fires. After firing, the state variable is reset to a lower value. Various decoding methods exist for interpreting the outgoing spike train as a real-value number, relying on either the frequency of spikes (rate-code), the time-to-first-spike after stimulation, or the interval between spikes. == History == Many multi-layer artificial neural networks are fully connected, receiving input from every neuron in the previous layer and signalling every neuron in the subsequent layer. Although these networks have achieved breakthroughs, they do not match biological networks and do not mimic neurons. The biology-inspired Hodgkin–Huxley model of a spiking neuron was proposed in 1952. This model described how action potentials are initiated and propagated. Communication between neurons, which requires the exchange of chemical neurotransmitters in the synaptic gap, is described in models such as the integrate-and-fire model, FitzHugh–Nagumo model (1961–1962), and Hindmarsh–Rose model (1984). The leaky integrate-and-fire model (or a derivative) is commonly used as it is easier to compute than Hodgkin–Huxley. While the notion of an artificial spiking neural network became popular only in the twenty-first century, studies between 1980 and 1995 supported the concept. The first models of this type of ANN appeared to simulate non-algorithmic intelligent information processing systems. However, the notion of the spiking neural network as a mathematical model was first worked on in the early 1970s. As of 2019 SNNs lagged behind ANNs in accuracy, but the gap is decreasing, and has vanished on some tasks. == Underpinnings == Information in the brain is represented as action potentials (neuron spikes), which may group into spike trains or coordinated waves. A fundamental question of neuroscience is to determine whether neurons communicate by a rate or temporal code. Temporal coding implies that a single spiking neuron can replace hundreds of hidden units on a conventional neural net. SNNs define a neuron's current state as its potential (possibly modeled as a differential equation). An input pulse causes the potential to rise and then gradually decline. Encoding schemes can interpret these pulse sequences as a number, considering pulse frequency and pulse interval. Using the precise time of pulse occurrence, a neural network can consider more information and offer better computing properties. SNNs compute in the continuous domain. Such neurons test for activation only when their potentials reach a certain value. When a neuron is activated, it produces a signal that is passed to connected neurons, accordingly raising or lowering their potentials. The SNN approach produces a continuous output instead of the binary output of traditional ANNs. Pulse trains are not easily interpretable, hence the need for encoding schemes. However, a pulse train representation may be more suited for processing spatiotemporal data (or real-world sensory data classification). SNNs connect neurons only to nearby neurons so that they process input blocks separately (similar to CNN using filters). They consider time by encoding information as pulse trains so as not to lose information. This avoids the complexity of a recurrent neural network (RNN). Impulse neurons are more powerful computational units than traditional artificial neurons. SNNs are theoretically more powerful than so called "second-generation networks" defined as ANNs "based on computational units that apply activation function with a continuous set of possible output values to a weighted sum (or polynomial) of the inputs"; however, SNN training issues and hardware requirements limit their use. Although unsupervised biologically inspired learning methods are available such as Hebbian learning and STDP, no effective supervised training method is suitable for SNNs that can provide better performance than second-generation networks. Spike-based activation of SNNs is not differentiable, thus gradient descent-based backpropagation (BP) is not available. SNNs have much larger computational costs for simulating realistic neural models than traditional ANNs. Pulse-coupled neural networks (PCNN) are often confused with SNNs. A PCNN can be seen as a kind of SNN. Researchers are actively working on various topics. The first concerns differentiability. The expressions for both the forward- and backward-learning methods contain the derivative of the neural activation function which is not differentiable because a neuron's output is either 1 when it spikes, and 0 otherwise. This all-or-nothing behavior disrupts gradients and makes these neurons unsuitable for gradient-based optimization. Approaches to resolving it include: resorting to entirely biologically inspired local learning rules for the hidden units translating conventionally trained "rate-based" NNs to SNNs smoothing the network model to be continuously differentiable defining an SG (Surrogate Gradient) as a continuous relaxation of the real gradients The second concerns the optimization algorithm. Standard BP can be expensive in terms of computation, memory, and communication and may be poorly suited to the hardware that implements it (e.g., a computer, brain, or neuromorphic device). Incorporating additional neuron dynamics such as Spike Frequency Adaptation (SFA) is a notable advance, enhancing efficiency and computational power. These neurons sit between biological complexity and computational complexity. Originating from biological insights, SFA offers significant computational benefits by reducing power usage, especially in cases of repetitive or intense stimuli. This adaptation improves signal/noise clarity and introduces an elementary short-term memory at the neuron level, which in turn, improves accuracy and efficiency. This was mostly achieved using compartmental neuron models. The simpler versions are of neuron models with adaptive thresholds, are an indirect way of achieving SFA. It equips SNNs with improved learning capabilities, even with constrained synaptic plasticity, and elevates computational efficiency. This feature lessens the demand on network layers by decreasing the need for spike processing, thus lowering computational load and memory access time—essential aspects of neural computation. Moreover, SNNs utilizing neurons capable of SFA achieve levels of accuracy that rival those of conventional ANNs, while also requiring fewer neurons for comparable tasks. This efficiency streamlines the computational workflow and conserves space and energy, while maintaining technical integrity. High-performance deep spiking neural networks can operate with 0.3 spikes per neuron. == Applications == SNNs can in principle be applied to the same applications as traditional ANNs. In addition, SNNs can model the central nervous system of biological organisms, such as an insect seeking food without prior knowledge of the environment. Due to their relative realism, they can be used to study biological neural circuits. Starting with a hypothesis about the topology of a biological neuronal circuit and its functi

    Read more →
  • Web-based simulation

    Web-based simulation

    Web-based simulation (WBS) is the invocation of computer simulation services over the World Wide Web, specifically through a web browser. Increasingly, the web is being looked upon as an environment for providing modeling and simulation applications, and as such, is an emerging area of investigation within the simulation community. == Application == Web-based simulation is used in several contexts: In e-learning, various principles can quickly be illustrated to students by means of interactive computer animations, for example during lecture demonstrations and computer exercises. In distance learning, web-based simulation may provide an alternative to installing expensive simulation software on the student computer, or an alternative to expensive laboratory equipment. In software engineering, web-based emulation allows application development and testing on one platform for other target platforms, for example for various mobile operating systems or mobile web browsers, without the need of target hardware or locally installed emulation software. In online computer games, 3D environments can be simulated, and old home computers and video game consoles can be emulated, allowing the user to play old computer games in the web browser. In medical education, nurse education and allied health education (like sonographer training), web-based simulations can be used for learning and practicing clinical healthcare procedures. Web-based procedural simulations emphasize the cognitive elements such as the steps of the procedure, the decisions, the tools/devices to be used, and the correct anatomical location. == Client-side vs server-side approaches == Web-based simulation can take place either on the server side or on the client side. In server-side simulation, the numerical calculations and visualization (generation of plots and other computer graphics) is carried out on the web server, while the interactive graphical user interface (GUI) often partly is provided by the client-side, for example using server-side scripting such as PHP or CGI scripts, interactive services based on Ajax or a conventional application software remotely accessed through a VNC Java applet. In client-side simulation, the simulation program is downloaded from the server side but completely executed on the client side, for example using Java applets, Flash animations, JavaScript, or some mathematical software viewer plug-in. Server-side simulation is not scalable for many simultaneous users, but places fewer demands on the user computer performance and web-browser plug-ins than client-side simulation. The term on-line simulation sometimes refers to server-side web-based simulation, sometimes to symbiotic simulation, i.e. a simulation that interacts in real-time with a physical system. The upcoming cloud-computing technologies can be used for new server-side simulation approaches. For instance, there are multi-agent-simulation applications which are deployed on cloud-computing instances and act independently. This allows simulations to be highly scalable. == Existing tools == AgentSheets – graphically programmed tool for creating web-based The Sims-like simulation games, and for teaching beginner students programming. AnyLogic – a graphically programmed tool that generates Java code for discrete-event simulation, system dynamics and agent-based models Easy Java Simulations – a tool for modelling and visualization of physical phenomenons, that automatically generates Java code from mathematical expressions. ExploreLearning Gizmos – a large library of interactive online simulations for math and science education in grades 3–12. FreeFem++ Javascript Version – FreeFem++ is a free and open source PDE solver using the finite element method. GNU Octave web interfaces – MATLAB compatible open-source software Lanner Group Ltd L-SIM Server – Java-based discrete-event simulation engine which supports model standards such as BPMN 2.0 Nanohub – web 2.0 in-browser interactive simulation of nanotechnology NetLogo – a multi-agent programming language and integrated modeling environment that runs on the Java Virtual Machine OpenPlaG – PHP-based function graph plotter for the use on websites OpenEpi – web-based packet of tools for biostatistics Recursive Porous Agent Simulation Toolkit (Repast) – agent-based modeling and simulation toolkit implemented in Java and many other languages SageMath – open-source numerical-analysis software with web interface, based on the Python programming language SimScale – web-based simulation platform supporting computational fluid dynamics, solid mechanics, and thermodynamics StarLogo – agent-based simulation language written in Java. VisSim viewer – graphically programmed data-flow diagrams for simulation of dynamical systems webMathematica and Mathematica Player – a computer algebra system and programming language. VisualSim Architect – VisualSim Explorer enables system-level models to be embedded in documents for viewing, simulation and analysis from within a web browser without any local software installation.

    Read more →
  • Minimum Population Search

    Minimum Population Search

    In evolutionary computation, Minimum Population Search (MPS) is a computational method that optimizes a problem by iteratively trying to improve a set of candidate solutions with regard to a given measure of quality. It solves a problem by evolving a small population of candidate solutions by means of relatively simple arithmetical operations. MPS is a metaheuristic as it makes few or no assumptions about the problem being optimized and can search very large spaces of candidate solutions. For problems where finding the precise global optimum is less important than finding an acceptable local optimum in a fixed amount of time, using a metaheuristic such as MPS may be preferable to alternatives such as brute-force search or gradient descent. MPS is used for multidimensional real-valued functions but does not use the gradient of the problem being optimized, which means MPS does not require for the optimization problem to be differentiable as is required by classic optimization methods such as gradient descent and quasi-newton methods. MPS can therefore also be used on optimization problems that are not even continuous, are noisy, change over time, etc. == Background == In a similar way to Differential evolution, MPS uses difference vectors between the members of the population in order to generate new solutions. It attempts to provide an efficient use of function evaluations by maintaining a small population size. If the population size is smaller than the dimensionality of the search space, then the solutions generated through difference vectors will be constrained to the n − 1 {\displaystyle n-1} dimensional hyperplane. A smaller population size will lead to a more restricted subspace. With a population size equal to the dimensionality of the problem ( n = d ) {\displaystyle (n=d)} , the “line/hyperplane points” in MPS will be generated within a d − 1 {\displaystyle d-1} dimensional hyperplane. Taking a step orthogonal to this hyperplane will allow the search process to cover all the dimensions of the search space. Population size is a fundamental parameter in the performance of population-based heuristics. Larger populations promote exploration, but they also allow fewer generations, and this can reduce the chance of convergence. Searching with a small population can increase the chances of convergence and the efficient use of function evaluations, but it can also induce the risk of premature convergence. If the risk of premature convergence can be avoided, then a population-based heuristic could benefit from the efficiency and faster convergence rate of a smaller population. To avoid premature convergence, it is important to have a diversified population. By including techniques for explicitly increasing diversity and exploration, it is possible to have smaller populations with less risk of premature convergence. === Thresheld Convergence === Thresheld Convergence (TC) is a diversification technique which attempts to separate the processes of exploration and exploitation. TC uses a “threshold” function to establish a minimum search step, and managing this step makes it possible to influence the transition from exploration to exploitation, convergence is thus “held” back until the last stages of the search process. The goal of a controlled transition is to avoid an early concentration of the population around a few search regions and avoid the loss of diversity which can cause premature convergence. Thresheld Convergence has been successfully applied to several population-based metaheuristics such as Particle Swarm Optimization, Differential evolution, Evolution strategies, Simulated annealing and Estimation of Distribution Algorithms. The ideal case for Thresheld Convergence is to have one sample solution from each attraction basin, and for each sample solution to have the same relative fitness with respect to its local optimum. Enforcing a minimum step aims to achieve this ideal case. In MPS Thresheld Convergence is specifically used to preserve diversity and avoid premature convergence by establishing a minimum search step. By disallowing new solutions which are too close to members of the current population, TC forces a strong exploration during the early stages of the search while preserving the diversity of the (small) population. == Algorithm == A basic variant of the MPS algorithm works by having a population of size equal to the dimension of the problem. New solutions are generated by exploring the hyperplane defined by the current solutions (by means of difference vectors) and performing an additional orthogonal step in order to avoid getting caught in this hyperplane. The step sizes are controlled by the Thresheld Convergence technique, which gradually reduces step sizes as the search process advances. An outline for the algorithm is given below: Generate the first initial population. Allowing these solutions to lie near the bounds of the search space generally gives good results: s k = ( r s 1 ∗ b o u n d 1 / 2 , r s 2 ∗ b o u n d 2 / 2 , . . . , r s n ∗ b o u n d n / 2 ) {\displaystyle s_{k}=(rs_{1}bound_{1}/2,rs_{2}bound_{2}/2,...,rs_{n}bound_{n}/2)} where s k {\displaystyle s_{k}} is the k {\displaystyle k} -th population member, r s i {\displaystyle rs_{i}} are random numbers which can be −1 or 1, and the b o u n d i {\displaystyle bound_{i}} are the lower and upper bounds on each dimension. While a stop condition is not reached: Update threshold convergence values ( m i n _ s t e p {\displaystyle min\_step} and m a x _ s t e p {\displaystyle max\_step} ) Calculate the centroid of the current population ( x c {\displaystyle x_{c}} ) For each member of the population ( x i {\displaystyle x_{i}} ), generate a new offspring as follows: Uniformly generate a scaling factor ( F i {\displaystyle F_{i}} ) between − m a x _ s t e p {\displaystyle -max\_step} and m a x _ s t e p {\displaystyle max\_step} Generate a vector ( x o {\displaystyle x_{o}} ) orthogonal to the difference vector between x i {\displaystyle x_{i}} and x c {\displaystyle x_{c}} Calculate a scaling factor for the orthogonal vector: m i n _ o r t h = s q r t ( m a x ( m i n _ s t e p 2 − F i 2 , 0 ) ) {\displaystyle min\_orth=sqrt(max(min\_step^{2}-F_{i}^{2},0))} m a x _ o r t h = s q r t ( m a x ( m a x _ s t e p 2 − F i 2 , 0 ) ) {\displaystyle max\_orth=sqrt(max(max\_step^{2}-F_{i}^{2},0))} o r t h _ s t e p = u n i f o r m ( m i n _ o r t h , m a x _ o r t h ) {\displaystyle orth\_step=uniform(min\_orth,max\_orth)} Generate the new solution by adding the difference and the orthogonal vectors to the original solution n e w _ s o l u t i o n = x i + F i ∗ ( x i − x c ) ∗ o r t h _ s t e p ∗ x o {\displaystyle new\_solution=x_{i}+F_{i}(x_{i}-x_{c})orth\_stepx_{o}} Pick the best members between the old population and the new one by discarding the least fit members. Return the single best solution or the best population found as the final result.

    Read more →
  • Absorbing Markov chain

    Absorbing Markov chain

    In the mathematical theory of probability, an absorbing Markov chain is a Markov chain in which every state can reach an absorbing state. An absorbing state is a state that, once entered, cannot be left. Like general Markov chains, there can be continuous-time absorbing Markov chains with an infinite state space. However, this article concentrates on the discrete-time discrete-state-space case. == Formal definition == A Markov chain is an absorbing chain if there is at least one absorbing state and it is possible to go from any state to at least one absorbing state in a finite number of steps. In an absorbing Markov chain, a state that is not absorbing is called transient. === Canonical form === Let an absorbing Markov chain with transition matrix P have t transient states and r absorbing states. The rows of P represent sources, while columns represent destinations. By ordering the transient states before the absorbing states, it can be assumed that P has the form P = [ Q R 0 I r ] , {\displaystyle P={\begin{bmatrix}Q&R\\\mathbf {0} &I_{r}\end{bmatrix}},} where Q is a t-by-t matrix, R is a nonzero t-by-r matrix, 0 is an r-by-t zero matrix, and Ir is the r-by-r identity matrix. Thus, Q describes the probability of transitioning from some transient state to another while R describes the probability of transitioning from some transient state to some absorbing state. The probability of transitioning from i to j in exactly k steps is the (i,j)-entry of Pk, further computed below. When considering only transient states, the probability is found in the upper left of Pk, the (i,j)-entry of Qk. == Fundamental matrix == === Expected number of visits to a transient state === A basic property about an absorbing Markov chain is the expected number of visits to a transient state j starting from a transient state i (before being absorbed). This can be established to be given by the (i, j) entry of so-called fundamental matrix N, obtained by summing Qk for all k (from 0 to ∞). It can be proven that N := ∑ k = 0 ∞ Q k = ( I t − Q ) − 1 , {\displaystyle N:=\sum _{k=0}^{\infty }Q^{k}=(I_{t}-Q)^{-1},} where It is the t-by-t identity matrix. The computation of this formula is the matrix equivalent of the geometric series of scalars, ∑ k = 0 ∞ q k = 1 1 − q {\displaystyle {\textstyle \sum }_{k=0}^{\infty }q^{k}={\tfrac {1}{1-q}}} . With the matrix N in hand, also other properties of the Markov chain are easy to obtain. === Expected number of steps before being absorbed === The expected number of steps before being absorbed in any absorbing state, when starting in transient state i can be computed via a sum over transient states. The value is given by the ith entry of the vector t := N 1 , {\displaystyle \mathbf {t} :=N\mathbf {1} ,} where 1 is a length-t column vector whose entries are all 1. === Absorbing probabilities === By induction, P k = [ Q k ( I t − Q k ) N R 0 I r ] . {\displaystyle P^{k}={\begin{bmatrix}Q^{k}&(I_{t}-Q^{k})NR\\\mathbf {0} &I_{r}\end{bmatrix}}.} The probability of eventually being absorbed in the absorbing state j when starting from transient state i is given by the (i,j)-entry of the matrix B := N R {\displaystyle B:=NR} . The number of columns of this matrix equals the number of absorbing states r. An approximation of those probabilities can also be obtained directly from the (i,j)-entry of P k {\displaystyle P^{k}} for a large enough value of k, when i is the index of a transient, and j the index of an absorbing state. This is because ( lim k → ∞ P k ) i , t + j = B i , j {\displaystyle \left(\lim _{k\to \infty }P^{k}\right)_{i,t+j}=B_{i,j}} . === Transient visiting probabilities === The probability of visiting transient state j when starting at a transient state i is the (i,j)-entry of the matrix H := ( N − I t ) ( N dg ) − 1 , {\displaystyle H:=(N-I_{t})(N_{\operatorname {dg} })^{-1},} where Ndg is the diagonal matrix with the same diagonal as N. === Variance on number of transient visits === The variance on the number of visits to a transient state j with starting at a transient state i (before being absorbed) is the (i,j)-entry of the matrix N 2 := N ( 2 N dg − I t ) − N sq , {\displaystyle N_{2}:=N(2N_{\operatorname {dg} }-I_{t})-N_{\operatorname {sq} },} where Nsq is the Hadamard product of N with itself (i.e. each entry of N is squared). === Variance on number of steps === The variance on the number of steps before being absorbed when starting in transient state i is the ith entry of the vector ( 2 N − I t ) t − t sq , {\displaystyle (2N-I_{t})\mathbf {t} -\mathbf {t} _{\operatorname {sq} },} where tsq is the Hadamard product of t with itself (i.e., as with Nsq, each entry of t is squared). == Examples == === String generation === Consider the process of repeatedly flipping a fair coin until the sequence (heads, tails, heads) appears. This process is modeled by an absorbing Markov chain with transition matrix P = [ 1 / 2 1 / 2 0 0 0 1 / 2 1 / 2 0 1 / 2 0 0 1 / 2 0 0 0 1 ] . {\displaystyle P={\begin{bmatrix}1/2&1/2&0&0\\0&1/2&1/2&0\\1/2&0&0&1/2\\0&0&0&1\end{bmatrix}}.} The first state represents the empty string, the second state the string "H", the third state the string "HT", and the fourth state the string "HTH". Although in reality, the coin flips cease after the string "HTH" is generated, the perspective of the absorbing Markov chain is that the process has transitioned into the absorbing state representing the string "HTH" and, therefore, cannot leave. For this absorbing Markov chain, the fundamental matrix is N = ( I − Q ) − 1 = ( [ 1 0 0 0 1 0 0 0 1 ] − [ 1 / 2 1 / 2 0 0 1 / 2 1 / 2 1 / 2 0 0 ] ) − 1 = [ 1 / 2 − 1 / 2 0 0 1 / 2 − 1 / 2 − 1 / 2 0 1 ] − 1 = [ 4 4 2 2 4 2 2 2 2 ] . {\displaystyle {\begin{aligned}N&=(I-Q)^{-1}=\left({\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}}-{\begin{bmatrix}1/2&1/2&0\\0&1/2&1/2\\1/2&0&0\end{bmatrix}}\right)^{-1}\\[4pt]&={\begin{bmatrix}1/2&-1/2&0\\0&1/2&-1/2\\-1/2&0&1\end{bmatrix}}^{-1}={\begin{bmatrix}4&4&2\\2&4&2\\2&2&2\end{bmatrix}}.\end{aligned}}} The expected number of steps starting from each of the transient states is t = N 1 = [ 4 4 2 2 4 2 2 2 2 ] [ 1 1 1 ] = [ 10 8 6 ] . {\displaystyle \mathbf {t} =N\mathbf {1} ={\begin{bmatrix}4&4&2\\2&4&2\\2&2&2\end{bmatrix}}{\begin{bmatrix}1\\1\\1\end{bmatrix}}={\begin{bmatrix}10\\8\\6\end{bmatrix}}.} Therefore, the expected number of coin flips before observing the sequence (heads, tails, heads) is 10, the entry for the state representing the empty string. === Games of chance === Games based entirely on chance can be modeled by an absorbing Markov chain. A classic example of this is the ancient Indian board game Snakes and Ladders. The graph on the left plots the probability mass in the lone absorbing state that represents the final square as the transition matrix is raised to larger and larger powers. To determine the expected number of turns to complete the game, compute the vector t as described above and examine tstart, which is approximately 39.2. === Infectious disease testing === Infectious disease testing, either of blood products or in medical clinics, is often taught as an example of an absorbing Markov chain. The public U.S. Centers for Disease Control and Prevention (CDC) model for HIV and for hepatitis B, for example, illustrates the property that absorbing Markov chains can lead to the detection of disease, versus the loss of detection through other means. In the standard CDC model, the Markov chain has five states, a state in which the individual is uninfected, then a state with infected but undetectable virus, a state with detectable virus, and absorbing states of having quit/been lost from the clinic, or of having been detected (the goal). The typical rates of transition between the Markov states are the probability p per unit time of being infected with the virus, w for the rate of window period removal (time until virus is detectable), q for quit/loss rate from the system, and d for detection, assuming a typical rate λ {\displaystyle \lambda } at which the health system administers tests of the blood product or patients in question. It follows that we can "walk along" the Markov model to identify the overall probability of detection for a person starting as undetected, by multiplying the probabilities of transition to each next state of the model as: p ( p + q ) w ( w + q ) d ( d + q ) {\displaystyle {\frac {p}{(p+q)}}{\frac {w}{(w+q)}}{\frac {d}{(d+q)}}} . The subsequent total absolute number of false negative tests—the primary CDC concern—would then be the rate of tests, multiplied by the probability of reaching the infected but undetectable state, times the duration of staying in the infected undetectable state: p ( p + q ) 1 ( w + q ) λ {\displaystyle {\frac {p}{(p+q)}}{\frac {1}{(w+q)}}\lambda } .

    Read more →
  • Genetic operator

    Genetic operator

    A genetic operator is an operator used in evolutionary algorithms (EA) to guide the algorithm towards a solution to a given problem. There are three main types of operators (mutation, crossover and selection), which must work in conjunction with one another in order for the algorithm to be successful. Genetic operators are used to create and maintain genetic diversity (mutation operator), combine existing solutions (also known as chromosomes) into new solutions (crossover) and select between solutions (selection). The classic representatives of evolutionary algorithms include genetic algorithms, evolution strategies, genetic programming and evolutionary programming. In his book discussing the use of genetic programming for the optimization of complex problems, computer scientist John Koza has also identified an 'inversion' or 'permutation' operator; however, the effectiveness of this operator has never been conclusively demonstrated and this operator is rarely discussed in the field of genetic programming. For combinatorial problems, however, these and other operators tailored to permutations are frequently used by other EAs. Mutation (or mutation-like) operators are said to be unary operators, as they only operate on one chromosome at a time. In contrast, crossover operators are said to be binary operators, as they operate on two chromosomes at a time, combining two existing chromosomes into one new chromosome. == Operators == Genetic variation is a necessity for the process of evolution. Genetic operators used in evolutionary algorithms are analogous to those in the natural world: survival of the fittest, or selection; reproduction (crossover, also called recombination); and mutation. === Selection === Selection operators give preference to better candidate solutions (chromosomes), allowing them to pass on their 'genes' to the next generation (iteration) of the algorithm. The best solutions are determined using some form of objective function (also known as a 'fitness function' in evolutionary algorithms), before being passed to the crossover operator. Different methods for choosing the best solutions exist, for example, fitness proportionate selection and tournament selection. A further or the same selection operator is used to determine the individuals for being selected to form the next parental generation. The selection operator may also ensure that the best solution(s) from the current generation always become(s) a member of the next generation without being altered; this is known as elitism or elitist selection. === Crossover === Crossover is the process of taking more than one parent solutions (chromosomes) and producing a child solution from them. By recombining portions of good solutions, the evolutionary algorithm is more likely to create a better solution. As with selection, there are a number of different methods for combining the parent solutions, including the edge recombination operator (ERO) and the 'cut and splice crossover' and 'uniform crossover' methods. The crossover method is often chosen to closely match the chromosome's representation of the solution; this may become particularly important when variables are grouped together as building blocks, which might be disrupted by a non-respectful crossover operator. Similarly, crossover methods may be particularly suited to certain problems; the ERO is considered a good option for solving the travelling salesman problem. === Mutation === The mutation operator encourages genetic diversity amongst solutions and attempts to prevent the evolutionary algorithm converging to a local minimum by stopping the solutions becoming too close to one another. In mutating the current pool of solutions, a given solution may change between slightly and entirely from the previous solution. By mutating the solutions, an evolutionary algorithm can reach an improved solution solely through the mutation operator. Again, different methods of mutation may be used; these range from a simple bit mutation (flipping random bits in a binary string chromosome with some low probability) to more complex mutation methods in which genes in the solution are changed, for example by adding a random value from the Gaussian distribution to the current gene value. As with the crossover operator, the mutation method is usually chosen to match the representation of the solution within the chromosome. == Combining operators == While each operator acts to improve the solutions produced by the evolutionary algorithm working individually, the operators must work in conjunction with each other for the algorithm to be successful in finding a good solution. Using the selection operator on its own will tend to fill the solution population with copies of the best solution from the population. If the selection and crossover operators are used without the mutation operator, the algorithm will tend to converge to a local minimum, that is, a good but sub-optimal solution to the problem. Using the mutation operator on its own leads to a random walk through the search space. Only by using all three operators together can the evolutionary algorithm become a noise-tolerant global search algorithm, yielding good solutions to the problem at hand.

    Read more →
  • Projection-slice theorem

    Projection-slice theorem

    In mathematics, the projection-slice theorem, central slice theorem or Fourier slice theorem in two dimensions states that the results of the following two calculations are equal: Take a two-dimensional function f(r), project (e.g. using the Radon transform) it onto a (one-dimensional) line, and do a Fourier transform of that projection. Take that same function, but do a two-dimensional Fourier transform first, and then slice the function through its origin, parallel to the projection line. In operator terms, if F1 and F2 are the 1- and 2-dimensional Fourier transform operators mentioned above, P1 is the projection operator (which projects a 2-D function onto a 1-D line), S1 is a slice operator (which extracts a 1-D central slice from a function), then F 1 P 1 = S 1 F 2 . {\displaystyle F_{1}P_{1}=S_{1}F_{2}.} This idea can be extended to higher dimensions. This theorem is used, for example, in the analysis of medical CT scans where a "projection" is an x-ray image of an internal organ. The Fourier transforms of these images are seen to be slices through the Fourier transform of the 3-dimensional density of the internal organ, and these slices can be interpolated to build up a complete Fourier transform of that density. The inverse Fourier transform is then used to recover the 3-dimensional density of the object. This technique was first derived by Ronald N. Bracewell in 1956 for a radio-astronomy problem. == The projection-slice theorem in N dimensions == In N dimensions, the projection-slice theorem states that the Fourier transform of the projection of an N-dimensional function f(r) onto an m-dimensional linear submanifold is equal to an m-dimensional slice of the N-dimensional Fourier transform of that function consisting of an m-dimensional linear submanifold through the origin in the Fourier space which is parallel to the projection submanifold. In operator terms: F m P m = S m F N . {\displaystyle F_{m}P_{m}=S_{m}F_{N}.\,} == The generalized Fourier-slice theorem == In addition to generalizing to N dimensions, the projection-slice theorem can be further generalized with an arbitrary change of basis. For convenience of notation, we consider the change of basis to be represented as B, an N-by-N invertible matrix operating on N-dimensional column vectors. Then the generalized Fourier-slice theorem can be stated as F m P m B = S m B − T | B − T | F N {\displaystyle F_{m}P_{m}B=S_{m}{\frac {B^{-T}}{|B^{-T}|}}F_{N}} where B − T = ( B − 1 ) T {\displaystyle B^{-T}=(B^{-1})^{T}} is the transpose of the inverse of the change of basis transform. == Proof in two dimensions == The projection-slice theorem is easily proven for the case of two dimensions. Without loss of generality, we can take the projection line to be the x-axis. There is no loss of generality because if we use a shifted and rotated line, the law still applies. Using a shifted line (in y) gives the same projection and therefore the same 1D Fourier transform results. The rotated function is the Fourier pair of the rotated Fourier transform, for which the theorem again holds. If f(x, y) is a two-dimensional function, then the projection of f(x, y) onto the x axis is p(x) where p ( x ) = ∫ − ∞ ∞ f ( x , y ) d y . {\displaystyle p(x)=\int _{-\infty }^{\infty }f(x,y)\,dy.} The Fourier transform of f ( x , y ) {\displaystyle f(x,y)} is F ( k x , k y ) = ∫ − ∞ ∞ ∫ − ∞ ∞ f ( x , y ) e − 2 π i ( x k x + y k y ) d x d y . {\displaystyle F(k_{x},k_{y})=\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }f(x,y)\,e^{-2\pi i(xk_{x}+yk_{y})}\,dxdy.} The slice is then s ( k x ) {\displaystyle s(k_{x})} s ( k x ) = F ( k x , 0 ) = ∫ − ∞ ∞ ∫ − ∞ ∞ f ( x , y ) e − 2 π i x k x d x d y {\displaystyle s(k_{x})=F(k_{x},0)=\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }f(x,y)\,e^{-2\pi ixk_{x}}\,dxdy} = ∫ − ∞ ∞ [ ∫ − ∞ ∞ f ( x , y ) d y ] e − 2 π i x k x d x {\displaystyle =\int _{-\infty }^{\infty }\left[\int _{-\infty }^{\infty }f(x,y)\,dy\right]\,e^{-2\pi ixk_{x}}dx} = ∫ − ∞ ∞ p ( x ) e − 2 π i x k x d x {\displaystyle =\int _{-\infty }^{\infty }p(x)\,e^{-2\pi ixk_{x}}dx} which is just the Fourier transform of p(x). The proof for higher dimensions is easily generalized from the above example. == The FHA cycle == If the two-dimensional function f(r) is circularly symmetric, it may be represented as f(r), where r = |r|. In this case the projection onto any projection line will be the Abel transform of f(r). The two-dimensional Fourier transform of f(r) will be a circularly symmetric function given by the zeroth-order Hankel transform of f(r), which will therefore also represent any slice through the origin. The projection-slice theorem then states that the Fourier transform of the projection equals the slice or F 1 A 1 = H , {\displaystyle F_{1}A_{1}=H,} where A1 represents the Abel-transform operator, projecting a two-dimensional circularly symmetric function onto a one-dimensional line, F1 represents the 1-D Fourier-transform operator, and H represents the zeroth-order Hankel-transform operator. == Extension to fan beam or cone-beam CT == The projection-slice theorem is suitable for CT image reconstruction with parallel beam projections. It does not directly apply to fanbeam or conebeam CT. The theorem was extended to fan-beam and conebeam CT image reconstruction by Shuang-ren Zhao in 1995.

    Read more →
  • GNU Octave

    GNU Octave

    GNU Octave is a scientific programming language for scientific computing and numerical computation. Among other things, Octave can be used to solve linear and nonlinear problems numerically and to perform other numerical experiments using a language that is mostly compatible with MATLAB. It may also be used as a batch-oriented language. As part of the GNU Project, it is free software under the terms of the GNU General Public License. == History == The project was conceived around 1988. At first it was intended to be a companion to a chemical reactor design course. Full development was started by John W. Eaton in 1992. The first alpha release dates back to 4 January 1993 and on 17 February 1994 version 1.0 was released. Version 9.2.0 was released on 7 June 2024. The program is named after Octave Levenspiel, a former professor of the principal author. Levenspiel was known for his ability to perform quick back-of-the-envelope calculations. == Development history == == Developments == In addition to use on desktops for personal scientific computing, Octave is used in academia and industry. For example, Octave was used on a massive parallel computer at Pittsburgh Supercomputing Center to find vulnerabilities related to guessing social security numbers. Acceleration with OpenCL or CUDA is also possible with use of GPUs. == Technical details == Octave is written in C++ using the C++ standard library. Octave uses an interpreter to execute the Octave scripting language. Octave is extensible using dynamically loadable modules. Octave interpreter has an OpenGL-based graphics engine to create plots, graphs and charts and to save or print them. Alternatively, gnuplot can be used for the same purpose. Octave includes a graphical user interface (GUI) in addition to the traditional command-line interface (CLI); see #User interfaces for details. == Octave, the language == The Octave language is an interpreted programming language. It is a structured programming language (similar to C) and supports many common C standard library functions, and also certain UNIX system calls and functions. However, it does not support passing arguments by reference although function arguments are copy-on-write to avoid unnecessary duplication. Octave programs consist of a list of function calls or a script. The syntax is matrix-based and provides various functions for matrix operations. It supports various data structures and allows object-oriented programming. Its syntax is very similar to MATLAB, and careful programming of a script will allow it to run on both Octave and MATLAB. Because Octave is made available under the GNU General Public License, it may be freely changed, copied and used. The program runs on Microsoft Windows and most Unix and Unix-like operating systems, including Linux, Android, and macOS. == Notable features == === Command and variable name completion === Typing a TAB character on the command line causes Octave to attempt to complete variable, function, and file names (similar to Bash's tab completion). Octave uses the text before the cursor as the initial portion of the name to complete. === Command history === When running interactively, Octave saves the commands typed in an internal buffer so that they can be recalled and edited. === Data structures === Octave includes a limited amount of support for organizing data in structures. In this example, we see a structure x with elements a, b, and c, (an integer, an array, and a string, respectively): === Short-circuit Boolean operators === Octave's && and || logical operators are evaluated in a short-circuit fashion (like the corresponding operators in the C language), in contrast to the element-by-element operators & and |. === Increment and decrement operators === Octave includes the C-like increment and decrement operators ++ and -- in both their prefix and postfix forms. Octave also does augmented assignment, e.g. x += 5. === Unwind-protect === Octave supports a limited form of exception handling modelled after the unwind_protect of Lisp. The general form of an unwind_protect block looks like this: As a general rule, GNU Octave recognizes as termination of a given block either the keyword end (which is compatible with the MATLAB language) or a more specific keyword endblock or, in some cases, end_block. As a consequence, an unwind_protect block can be terminated either with the keyword end_unwind_protect as in the example, or with the more portable keyword end. The cleanup part of the block is always executed. In case an exception is raised by the body part, cleanup is executed immediately before propagating the exception outside the block unwind_protect. GNU Octave also supports another form of exception handling (compatible with the MATLAB language): This latter form differs from an unwind_protect block in two ways. First, exception_handling is only executed when an exception is raised by body. Second, after the execution of exception_handling the exception is not propagated outside the block (unless a rethrow( lasterror ) statement is explicitly inserted within the exception_handling code). === Variable-length argument lists === Octave has a mechanism for handling functions that take an unspecified number of arguments without explicit upper limit. To specify a list of zero or more arguments, use the special argument varargin as the last (or only) argument in the list. varargin is a cell array containing all the input arguments. === Variable-length return lists === A function can be set up to return any number of values by using the special return value varargout. For example: === C++ integration === It is also possible to execute Octave code directly in a C++ program. For example, here is a code snippet for calling rand([10,1]): C and C++ code can be integrated into GNU Octave by creating oct files, or using the MATLAB compatible MEX files. == MATLAB compatibility == Octave has been built with MATLAB compatibility in mind, and shares many features with MATLAB: % Script: myscript.m a = 5; b = a 2 % Function: myfunc.m function result = myfunc(x) result = x^2 + 3; end Matrices as fundamental data type. Built-in support for complex numbers. Powerful built-in math functions and extensive function libraries. Extensibility in the form of user-defined functions. Octave treats incompatibility with MATLAB as a bug; therefore, it could be considered a software clone, which does not infringe software copyright as per Lotus v. Borland court case. MATLAB scripts from the MathWorks' FileExchange repository in principle are compatible with Octave. However, while they are often provided and uploaded by users under an Octave compatible and proper open source BSD license, the FileExchange Terms of use prohibit any usage beside MathWorks' proprietary MATLAB. === Syntax compatibility === There are a few purposeful, albeit minor, syntax additions Archived 2012-04-26 at the Wayback Machine: Comment lines can be prefixed with the # character as well as the % character; Various C-based operators ++, --, +=, =, /= are supported; Elements can be referenced without creating a new variable by cascaded indexing, e.g. [1:10](3); Strings can be defined with the double-quote " character as well as the single-quote ' character; When the variable type is single (a single-precision floating-point number), Octave calculates the "mean" in the single-domain (MATLAB in double-domain) which is faster but gives less accurate results; Blocks can also be terminated with more specific Control structure keywords, i.e., endif, endfor, endwhile, etc.; Functions can be defined within scripts and at the Octave prompt; Presence of a do-until loop (similar to do-while in C). === Function compatibility === Many, but not all, of the numerous MATLAB functions are available in GNU Octave, some of them accessible through packages in Octave Forge. The functions available as part of either core Octave or Forge packages are listed online Archived 2024-03-14 at the Wayback Machine. A list of unavailable functions is included in the Octave function __unimplemented.m__. Unimplemented functions are also listed under many Octave Forge packages in the Octave Wiki. When an unimplemented function is called the following error message is shown: == User interfaces == Octave comes with an official graphical user interface (GUI) and an integrated development environment (IDE) based on Qt. It has been available since Octave 3.8, and has become the default interface (over the command-line interface) with the release of Octave 4.0. It was well-received by an EDN contributor, who wrote "[Octave] now has a very workable GUI" in reviewing the then-new GUI in 2014. Several 3rd-party graphical front-ends have also been developed, like ToolboX for coding education. == GUI applications == With Octave code, the user can create GUI applications. See GUI Development (GNU Octave (version 7.1.0)). Below are some examples: Button, edit control, checkboxTextboxListbox wit

    Read more →
  • Apache Giraph

    Apache Giraph

    Apache Giraph is an Apache project to perform graph processing on big data. Giraph utilizes Apache Hadoop's MapReduce implementation to process graphs. Facebook used Giraph with some performance improvements to analyze one trillion edges using 200 machines in 4 minutes. Giraph is based on a paper published by Google about its own graph processing system called Pregel. It can be compared to other Big Graph processing libraries such as Cassovary. As of September 2023, it is no longer actively developed.

    Read more →