AI Code Model Ranking

AI Code Model Ranking — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Canva

    Canva

    Canva Pty Ltd. is an Australian multinational proprietary software company launched in 2013 based in Sydney, Australia. The platform provides a graphic design platform to create visual content for presentations, websites, and other digital products. Its uses include templates for presentations, posters, and social media content, as well as photo and video editing functionality. The platform uses a drag-and-drop interface designed for users without professional design training or experience. Canva operates on a freemium model and has added features such as print services and video editing tools over time. == History == === 2013–2020 === Canva was founded in Perth, Australia, by Melanie Perkins, Cliff Obrecht and Cameron Adams on 1 January 2013. One of the company's early investors was Susan Wu, an American entrepreneur. In its first year, Canva had more than 750,000 users. In 2017, the company reached profitability and had 294,000 paying customers. In January 2018, Perkins announced that the company had raised A$40 million from Sequoia Capital, Blackbird Ventures, and Felicis Ventures, and the company was valued at A$1 billion. It raised A$70 million in May 2019, followed by A$85 million in October 2019 and the launch of Canva for Enterprise. In December 2019, Canva announced Canva for Education, a free product for schools and other educational institutions intended to facilitate collaboration between students and teachers. === 2021–2025 === In June 2020, Canva announced a partnership with FedEx Office and with Office Depot the following month. As of June 2020, Canva's valuation had risen to A$6 billion, rising to A$40 billion by September 2021. In September 2021, Canva raised US$200 million, with its value peaking that year at US$40 billion. By September 2022, the valuation of the company had leveled at US$26 billion. While Canva's value declined from its 2021 peak by mid-2022, it remained one of Australia's most prominent technology companies, alongside Atlassian. In March 2022, Canva had over 75 million monthly active users. In 2023, the pair were named in the Australian Financial Review's AFR Rich List as among the 10 most wealthy people in Australia. On 7 December 2022, Canva launched Magic Write, which is the platform's AI-powered copywriting assistant. On 22 March 2023, Canva announced its new Assistant tool, which makes recommendations on graphics and styles that match the user's existing design. On 11 January 2024, Canva launched its own GPT in OpenAI's GPT Store. The company has announced it intends to compete with Google and Microsoft in the office software category with website and whiteboard products. In May 2024, the company announced the launch of Canva Enterprise, a plan designed for large organisations, alongside new tools including Work Kits, Courses and AI capabilities. In 2024, it announced a co-funded solar energy project to enhance its sustainability efforts. On 10 April 2025, Canva released Visual Suite 2. The new interface combines Canva's design and productivity tools. New features include a spreadsheets application (Canva Sheets), a generative AI coding assistant (Canva Code), a chatbot, and an updated photo editor that can modify or remove background objects. In August 2025, Canva launched a stock sale to employees, valuing the company at US$42 billion. == Acquisitions == In 2018, the company acquired presentations startup Zeetings for an undisclosed amount, as part of its expansion into the presentations space. In May 2019, the company announced the acquisitions of Pixabay and Pexels, two free stock photography sites based in Germany, which enabled Canva users to access their photos for designs. In February 2021, Canva acquired Austrian startup Kaleido.ai and the Czech-based Smartmockups. In 2022, Canva acquired Flourish, a London-based data visualization startup. In March 2024, Canva acquired UK-based Serif, the developers of the Affinity suite of graphic design software, for approximately $380 million. In August 2024, Canva acquired the AI image generation platform and startup, Leonardo AI, for an undisclosed amount. In June 2025, it was announced that Canva had acquired Australian AI marketing startup MagicBrief for an undisclosed amount. In February 2026, Canva acquired two startups: Cavalry, which specializes in animation software, and MangoAI, which focuses on improving advertising performance. In April 2026, Canva acquired Simtheory, an AI Workflow Tool, and Ortto, a marketing automation tool. == Philanthropy == Canva's co-founders, Melanie Perkins and Cliff Obrecht, have publicly stated their intention to donate a significant portion of their personal wealth to charity. In 2021, Canva started a partnership with GiveDirectly, a nonprofit organization operating in low income areas that makes unconditional cash transfers to families living in extreme poverty. Since then, the company has donated $50 million to support GiveDirectly's work across Malawi. In 2025, Canva announced an additional $100 million commitment to expand its GiveDirectly partnership. == Controversies == === Data breach === In May 2019, Canva experienced a data breach in which the data of roughly 139 million users was exposed. The exposed data included real names of users, usernames, email addresses, geographical information, and password hashes for some users. In January 2020, approximately 4 million user passwords were decrypted and shared online. Canva responded by resetting the passwords of every user who had not changed their password since the initial breach. === Russian operations === In May 2022 Canva was criticized for continuing to provide free access to its services in Russia, even after suspending payment processing in the country. Activists from the Ukrainian diaspora in Australia and others said this could be viewed as indirectly supporting Russia’s war effort. They noted the company was the only one of several major Australian firms to receive the lowest “digging in” rating on a tracker run by the Yale School of Management for failing to pull out of Russia. Canva responded that it had suspended financial transactions in Russia from March 2022 and maintained the free version to allow the continued creation and sharing of “pro-peace and anti-war” content for its 1.4 million Russian users.

    Read more →
  • Promoter based genetic algorithm

    Promoter based genetic algorithm

    The promoter based genetic algorithm (PBGA) is a genetic algorithm for neuroevolution developed by F. Bellas and R.J. Duro in the Integrated Group for Engineering Research (GII) at the University of Coruña, in Spain. It evolves variable size feedforward artificial neural networks (ANN) that are encoded into sequences of genes for constructing a basic ANN unit. Each of these blocks is preceded by a gene promoter acting as an on/off switch that determines if that particular unit will be expressed or not. == PBGA basics == The basic unit in the PBGA is a neuron with all of its inbound connections as represented in the following figure: The genotype of a basic unit is a set of real valued weights followed by the parameters of the neuron and proceeded by an integer valued field that determines the promoter gene value and, consequently, the expression of the unit. By concatenating units of this type we can construct the whole network. With this encoding it is imposed that the information that is not expressed is still carried by the genotype in evolution but it is shielded from direct selective pressure, maintaining this way the diversity in the population, which has been a design premise for this algorithm. Therefore, a clear difference is established between the search space and the solution space, permitting information learned and encoded into the genotypic representation to be preserved by disabling promoter genes. == Results == The PBGA was originally presented within the field of autonomous robotics, in particular in the real time learning of environment models of the robot. It has been used inside the Multilevel Darwinist Brain (MDB) cognitive mechanism developed in the GII for real robots on-line learning. In another paper it is shown how the application of the PBGA together with an external memory that stores the successful obtained world models, is an optimal strategy for adaptation in dynamic environments. Recently, the PBGA has provided results that outperform other neuroevolutionary algorithms in non-stationary problems, where the fitness function varies in time.

    Read more →
  • Multimodal learning

    Multimodal learning

    Multimodal learning is a type of deep learning that integrates and processes multiple types of data, referred to as modalities, such as text, audio, images, or video. This integration allows for a more holistic understanding of complex data, improving model performance in tasks like visual question answering, cross-modal retrieval, text-to-image generation, aesthetic ranking, and image captioning. Multimodal learning was proposed in 2011 at the beginning of the deep learning period. Large multimodal models, such as Google Gemini and GPT-4o, have become increasingly popular since 2023, enabling increased versatility and a broader understanding of real-world phenomena. == Motivation == Data usually comes with different modalities which carry different information. For example, it is very common to caption an image to convey the information not presented in the image itself. Similarly, sometimes it is more straightforward to use an image to describe information which may not be obvious from text. As a result, if different words appear in similar images, then these words likely describe the same thing. Conversely, if a word is used to describe seemingly dissimilar images, then these images may represent the same object. Thus, in cases dealing with multi-modal data, it is important to use a model which is able to jointly represent the information such that the model can capture the combined information from different modalities. == Multimodal transformers == Models such as CLIP (Contrastive Language–Image Pretraining) learn joint representations of images and text by optimizing contrastive objectives, allowing the model to match images with their corresponding textual descriptions. == Multimodal deep Boltzmann machines == A Boltzmann machine is a type of stochastic neural network invented by Geoffrey Hinton and Terry Sejnowski in 1985. Boltzmann machines can be seen as the stochastic, generative counterpart of Hopfield nets. They are named after the Boltzmann distribution in statistical mechanics. The units in Boltzmann machines are divided into two groups: visible units and hidden units. Each unit is like a neuron with a binary output that represents whether it is activated or not. General Boltzmann machines allow connection between any units. However, learning is impractical using general Boltzmann Machines because the computational time is exponential to the size of the machine. A more efficient architecture is called restricted Boltzmann machine where connection is only allowed between hidden unit and visible unit, which is described in the next section. Multimodal deep Boltzmann machines can process and learn from different types of information, such as images and text, simultaneously. This can notably be done by having a separate deep Boltzmann machine for each modality, for example one for images and one for text, joined at an additional top hidden layer. == Applications == Multimodal machine learning has numerous applications across various domains: Cross-modal retrieval: cross-modal retrieval allows users to search for data across different modalities (e.g., retrieving images based on text descriptions), improving multimedia search engines and content recommendation systems. Classification and missing data retrieval: multimodal Deep Boltzmann Machines outperform traditional models like support vector machines and latent Dirichlet allocation in classification tasks and can predict missing data in multimodal datasets, such as images and text. Healthcare diagnostics: multimodal models integrate medical imaging, genomic data, and patient records to improve diagnostic accuracy and early disease detection, especially in cancer screening. Content generation: models like DALL·E generate images from textual descriptions, benefiting creative industries, while cross-modal retrieval enables dynamic multimedia searches. Robotics and human-computer interaction: multimodal learning improves interaction in robotics and AI by integrating sensory inputs like speech, vision, and touch, aiding autonomous systems and human-computer interaction. Emotion recognition: combining visual, audio, and text data, multimodal systems enhance sentiment analysis and emotion recognition, applied in customer service, social media, and marketing.

    Read more →
  • Generalized canonical correlation

    Generalized canonical correlation

    In statistics, the generalized canonical correlation analysis (gCCA), is a way of making sense of cross-correlation matrices between the sets of random variables when there are more than two sets. While a conventional CCA generalizes principal component analysis (PCA) to two sets of random variables, a gCCA generalizes PCA to more than two sets of random variables. The canonical variables represent those common factors that can be found by a large PCA of all of the transformed random variables after each set underwent its own PCA. == Applications == The Helmert-Wolf blocking (HWB) method of estimating linear regression parameters can find an optimal solution only if all cross-correlations between the data blocks are zero. They can always be made to vanish by introducing a new regression parameter for each common factor. The gCCA method can be used for finding those harmful common factors that create cross-correlation between the blocks. However, no optimal HWB solution exists if the random variables do not contain enough information on all of the new regression parameters.

    Read more →
  • ImageMixer

    ImageMixer

    ImageMixer is a brand name of video editing software that edits digital video and still image in camcorders and authors to VCD and DVD. It is a second-party Japanese product, distributed by Pixela Corporation, a Japanese manufacturer of PC peripheral hardware and multimedia software. == Bundling == ImageMixer is widely used for several camcorder brands, such as JVC, Hitachi and Canon. Also, Sony has chosen to package ImageMixer with its DVD and HDD Handycam. == ImageMixer series == ImageMixer has other series of software for digital camera, such as ImageMixer Label Maker and ImageMixer DVD dubbing. ImageMixer also has movie editing solution for Macintosh. == Windows Vista version of ImageMixer == A Windows Vista version of ImageMixer has been developed (ImageMixer3).

    Read more →
  • Oscillatory neural network

    Oscillatory neural network

    An oscillatory neural network (ONN) is an artificial neural network that uses coupled oscillators as neurons. Oscillatory neural networks are closely linked to the Kuramoto model, and are inspired by the phenomenon of neural oscillations in the brain. Oscillatory neural networks have been trained to recognize images. Complex-Valued Oscillatory network has also been shown to store and retrieve multidimensional aperiodic signals. An oscillatory autoencoder has also been demonstrated, which uses a combination of oscillators and rate-coded neurons. A neuron made of two coupled oscillators, one having a fixed and the other having a tunable natural frequency, has been shown able to run logic gates such as XOR that conventional sigmoid neurons cannot.

    Read more →
  • Kubeflow

    Kubeflow

    Kubeflow is an open-source platform for machine learning and MLOps on Kubernetes introduced by Google. The different stages in a typical machine learning lifecycle are represented with different software components in Kubeflow, including model development (Kubeflow Notebooks), model training (Kubeflow Pipelines, Kubeflow Training Operator), model serving (KServe), and automated machine learning (Katib). Each component of Kubeflow can be deployed separately, and it is not a requirement to deploy every component. == History == The Kubeflow project was first announced at KubeCon + CloudNativeCon North America 2017 by Google engineers David Aronchick, Jeremy Lewi, and Vishnu Kannan to address a perceived lack of flexible options for building production-ready machine learning systems. The project has also stated it began as a way for Google to open-source how they ran TensorFlow internally. The first release of Kubeflow (Kubeflow 0.1) was announced at KubeCon + CloudNativeCon Europe 2018. Kubeflow 1.0 was released in March 2020 via a public blog post announcing that many Kubeflow components were graduating to a "stable status", indicating they were now ready for production usage. In October 2022, Google announced that the Kubeflow project had applied to join the Cloud Native Computing Foundation. In July 2023, the foundation voted to accept Kubeflow as an incubating stage project. == Components == === Kubeflow Notebooks for model development === Machine learning models are developed in the notebooks component called Kubeflow Notebooks. The component runs web-based development environments inside a Kubernetes cluster, with native support for Jupyter Notebook, Visual Studio Code, and RStudio. === Kubeflow Pipelines for model training === Once developed, models are trained in the Kubeflow Pipelines component. The component acts as a platform for building and deploying portable, scalable machine learning workflows based on Docker containers. Google Cloud Platform has adopted the Kubeflow Pipelines DSL within its Vertex AI Pipelines product. === Kubeflow Training Operator for model training === For certain machine learning models and libraries, the Kubeflow Training Operator component provides Kubernetes custom resources support. The component runs distributed or non-distributed TensorFlow, PyTorch, Apache MXNet, XGBoost, and MPI training jobs on Kubernetes. === KServe for model serving === The KServe component (previously named KFServing) provides Kubernetes custom resources for serving machine learning models on arbitrary frameworks including TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX. KServe was developed collaboratively by Google, IBM, Bloomberg, NVIDIA, and Seldon. Publicly disclosed adopters of KServe include Bloomberg, Gojek, the Wikimedia Foundation, and others. === Katib for automated machine learning === Lastly, Kubeflow includes a component for automated training and development of machine learning models, the Katib component. It is described as a Kubernetes-native project and features hyperparameter tuning, early stopping, and neural architecture search. == Release timeline ==

    Read more →
  • Randomized weighted majority algorithm

    Randomized weighted majority algorithm

    The randomized weighted majority algorithm is an algorithm in machine learning theory for aggregating expert predictions to a series of decision problems. It is a simple and effective method based on weighted voting which improves on the mistake bound of the deterministic weighted majority algorithm. In fact, in the limit, its prediction rate can be arbitrarily close to that of the best-predicting expert. == Example == Imagine that every morning before the stock market opens, we get a prediction from each of our "experts" about whether the stock market will go up or down. Our goal is to somehow combine this set of predictions into a single prediction that we then use to make a buy or sell decision for the day. The principal challenge is that we do not know which experts will give better or worse predictions. The RWMA gives us a way to do this combination such that our prediction record will be nearly as good as that of the single expert which, in hindsight, gave the most accurate predictions. == Motivation == In machine learning, the weighted majority algorithm (WMA) is a deterministic meta-learning algorithm for aggregating expert predictions. In pseudocode, the WMA is as follows: initialize all experts to weight 1 for each round: add each expert's weight to the option they predicted predict the option with the largest weighted sum multiply the weights of all experts who predicted wrongly by 1 2 {\displaystyle {\frac {1}{2}}} Suppose there are n {\displaystyle n} experts and the best expert makes m {\displaystyle m} mistakes. Then, the weighted majority algorithm (WMA) makes at most 2.4 ( log 2 ⁡ n + m ) {\displaystyle 2.4(\log _{2}n+m)} mistakes. This bound is highly problematic in the case of highly error-prone experts. Suppose, for example, the best expert makes a mistake 20% of the time; that is, in N = 100 {\displaystyle N=100} rounds using n = 10 {\displaystyle n=10} experts, the best expert makes m = 20 {\displaystyle m=20} mistakes. Then, the weighted majority algorithm only guarantees an upper bound of 2.4 ( log 2 ⁡ 10 + 20 ) ≈ 56 {\displaystyle 2.4(\log _{2}10+20)\approx 56} mistakes. As this is a known limitation of the weighted majority algorithm, various strategies have been explored in order to improve the dependence on m {\displaystyle m} . In particular, we can do better by introducing randomization. Drawing inspiration from the Multiplicative Weights Update Method algorithm, we will probabilistically make predictions based on how the experts have performed in the past. Similarly to the WMA, every time an expert makes a wrong prediction, we will decrement their weight. Mirroring the MWUM, we will then use the weights to make a probability distribution over the actions and draw our action from this distribution (instead of deterministically picking the majority vote as the WMA does). == Randomized weighted majority algorithm (RWMA) == The randomized weighted majority algorithm is an attempt to improve the dependence of the mistake bound of the WMA on m {\displaystyle m} . Instead of predicting based on majority vote, the weights, are used as probabilities for choosing the experts in each round and are updated over time (hence the name randomized weighted majority). Precisely, if w i {\displaystyle w_{i}} is the weight of expert i {\displaystyle i} , let W = ∑ i w i {\displaystyle W=\sum _{i}w_{i}} . We will follow expert i {\displaystyle i} with probability w i W {\displaystyle {\frac {w_{i}}{W}}} . This results in the following algorithm: initialize all experts to weight 1. for each round: add all experts' weights together to obtain the total weight W {\displaystyle W} choose expert i {\displaystyle i} randomly with probability w i W {\displaystyle {\frac {w_{i}}{W}}} predict as the chosen expert predicts multiply the weights of all experts who predicted wrongly by β {\displaystyle \beta } The goal is to bound the worst-case expected number of mistakes, assuming that the adversary has to select one of the answers as correct before we make our coin toss. This is a reasonable assumption in, for instance, the stock market example provided above: the variance of a stock price should not depend on the opinions of experts that influence private buy or sell decisions, so we can treat the price change as if it was decided before the experts gave their recommendations for the day. The randomized algorithm is better in the worst case than the deterministic algorithm (weighted majority algorithm): in the latter, the worst case was when the weights were split 50/50. But in the randomized version, since the weights are used as probabilities, there would still be a 50/50 chance of getting it right. In addition, generalizing to multiplying the weights of the incorrect experts by β < 1 {\displaystyle \beta <1} instead of strictly 1 2 {\displaystyle {\frac {1}{2}}} allows us to trade off between dependence on m {\displaystyle m} and log 2 ⁡ n {\displaystyle \log _{2}n} . This trade-off will be quantified in the analysis section. == Analysis == Let W t {\displaystyle W_{t}} denote the total weight of all experts at round t {\displaystyle t} . Also let F t {\displaystyle F_{t}} denote the fraction of weight placed on experts which predict the wrong answer at round t {\displaystyle t} . Finally, let N {\displaystyle N} be the total number of rounds in the process. By definition, F t {\displaystyle F_{t}} is the probability that the algorithm makes a mistake on round t {\displaystyle t} . It follows from the linearity of expectation that if M {\displaystyle M} denotes the total number of mistakes made during the entire process, E [ M ] = ∑ t = 1 N F t {\displaystyle E[M]=\sum _{t=1}^{N}F_{t}} . After round t {\displaystyle t} , the total weight is decreased by ( 1 − β ) F t W t {\displaystyle \ (1-\beta )F_{t}W_{t}} , since all weights corresponding to a wrong answer are multiplied by β < 1 {\displaystyle \ \beta <1} . It then follows that W t + 1 = W t ( 1 − ( 1 − β ) F t ) {\displaystyle W_{t+1}=W_{t}(1-(1-\beta )F_{t})} . By telescoping, since W 1 = n {\displaystyle W_{1}=n} , it follows that the total weight after the process concludes is On the other hand, suppose that m {\displaystyle \ m} is the number of mistakes made by the best-performing expert. At the end, this expert has weight β m {\displaystyle \ \beta ^{m}} . It follows, then, that the total weight is at least this much; in other words, W ≥ β m {\displaystyle \ W\geq \beta ^{m}} . This inequality and the above result imply Taking the natural logarithm of both sides yields Now, the Taylor series of the natural logarithm is In particular, it follows that ln ⁡ ( 1 − ( 1 − β ) F t ) < − ( 1 − β ) F t {\displaystyle \ \ln(1-(1-\beta )F_{t})<-(1-\beta )F_{t}} . Thus, Recalling that E [ M ] = ∑ t = 1 N F t {\displaystyle E[M]=\sum _{t=1}^{N}F_{t}} and rearranging, it follows that Now, as β → 1 {\displaystyle \beta \to 1} from below, the first constant tends to 1 {\displaystyle 1} ; however, the second constant tends to + ∞ {\displaystyle +\infty } . To quantify this tradeoff, define ε = 1 − β {\displaystyle \varepsilon =1-\beta } to be the penalty associated with getting a prediction wrong. Then, again applying the Taylor series of the natural logarithm, It then follows that the mistake bound, for small ε {\displaystyle \varepsilon } , can be written in the form ( 1 + ϵ 2 + O ( ε 2 ) ) m + ϵ − 1 ln ⁡ ( n ) {\displaystyle \ \left(1+{\frac {\epsilon }{2}}+O(\varepsilon ^{2})\right)m+\epsilon ^{-1}\ln(n)} . In English, the less that we penalize experts for their mistakes, the more that additional experts will lead to initial mistakes but the closer we get to capturing the predictive accuracy of the best expert as time goes on. In particular, given a sufficiently low value of ε {\displaystyle \varepsilon } and enough rounds, the randomized weighted majority algorithm can get arbitrarily close to the correct prediction rate of the best expert. In particular, as long as m {\displaystyle m} is sufficiently large compared to ln ⁡ ( n ) {\displaystyle \ln(n)} (so that their ratio is sufficiently small), we can assign we can obtain an upper bound on the number of mistakes equal to This implies that the "regret bound" on the algorithm (that is, how much worse it performs than the best expert) is sublinear, at O ( m ln ⁡ ( n ) ) {\displaystyle O({\sqrt {m\ln(n)}})} . == Revisiting the motivation == Recall that the motivation for the randomized weighted majority algorithm was given by an example where the best expert makes a mistake 20% of the time. Precisely, in N = 100 {\displaystyle N=100} rounds, with n = 10 {\displaystyle n=10} experts, where the best expert makes m = 20 {\displaystyle m=20} mistakes, the deterministic weighted majority algorithm only guarantees an upper bound of 2.4 ( log 2 ⁡ 10 + 20 ) ≈ 56 {\displaystyle 2.4(\log _{2}10+20)\approx 56} . By the analysis above, it follows that minimizing the number of worst-case expected mistakes is equivalent to minimizing the fun

    Read more →
  • Amaryllo

    Amaryllo

    Amaryllo Inc. is a multinational company founded in Amsterdam, the Netherlands, and now headquartered in the United States. It operates as a cloud service platform, providing cloud storage and cloud computing solutions to enterprises and brand companies. Amaryllo began with Skype IP camera development, pioneering biometric robotic technologies, encrypted P2P network, and secure cloud storage. Amaryllo was founded by Band of Angels member, Marcus Yang to develop patents for a new type of robotic cameras that is claimed to "talk, hear, sense, recognize human faces, and track intruders". It also claims to have made the world's first security robot based on the WebRTC protocol, Icam PRO FHD, and won the 2015 CES Best of Innovation Award under Embedded Technology category. Its home security robots claim to employ 256-bit encryption and run on the WebRTC protocol. Amaryllo products are sold in over 100 Countries across 6 Continents. == History == Amaryllo revealed its first smart home security products at Internationale Funkausstellung Berlin (IFA) 2013 with a Skype-enabled IP camera called iCam HD. Amaryllo announced its second Skype-certified smart home product, iBabi HD, at CES 2014. The company was chosen as a "Cool Vendor" by Gartner in Connected Home 2014. Amaryllo introduced WebRTC-based smart home products after Microsoft terminated embedded Skype services in mid 2014. Since then, Amaryllo has been developing camera robots with auto-tracking and facial recognition technologies. Its camera robots, ATOM AR3 and ATOM AR3S, were introduced in late 2016. It focuses on wired and wireless technology based on AI services. == Cloud Service Platform == Amaryllo offers prepaid cloud storage through digital codes and gift cards, distributed via InComm Payments, Blackhawk Network, and other partners. It provides high-performance cloud computing service through Rescale partnership. Amaryllo provides free cameras under an annual cloud storage subscription on its website. == Global Supercomputing Network (GSN) == The Global Supercomputing Network (GSN) is a distributed high-performance computing (HPC) platform developed by Amaryllo. The network is designed to provide scalable Infrastructure as a Service (IaaS) by connecting a global array of data centers to offer GPU computing resources for specialized industrial and scientific applications. === Architecture and Technology === GSN operates as a decentralized distributed network of servers rather than a single centralized supercomputer. The platform integrates an artificial intelligence assistant named Genie, also developed by Amaryllo. Genie's primary function is to manage computing allocation, helping users identify and connect to available resources across the network’s various nodes based on the specific requirements of their tasks. === Services === The network primarily focuses on the rental of GPU processing resources, catering to fields that require massive parallel processing capabilities, including: Artificial Intelligence and Machine Learning: Training large language models (LLMs) and neural networks. Scientific Simulations: Executing complex calculations in physics, chemistry, and bioinformatics. Data Analytics: Processing large-scale datasets. By utilizing a rental model, GSN allows organizations to access high-end hardware without the capital expenditure associated with purchasing and maintaining physical server infrastructure. === Infrastructure and Partnerships === The network’s physical footprint is expanded through strategic partnerships with data center operators. GSN collaborates with MettaDC and Cyber DC to provide colocation services. These partnerships facilitate the deployment of Nvidia server clusters within secure, Tier-rated facilities, ensuring high availability and connectivity for GSN users. == Official Brand Licensee of HP == Amaryllo Inc. is an official licensee of HP Inc., managing both B2B and B2C cloud services under the HP brand. Through this partnership, Amaryllo offers a range of secure and scalable cloud solutions, including HP Cloud, which provides subscription and one-time payment storage for reliable data backup and storage for individuals, families, and businesses. HP Cloud employs cloud computing technologies to create smart albums for users.

    Read more →
  • Gaussian adaptation

    Gaussian adaptation

    Gaussian adaptation (GA), also called normal or natural adaptation (NA) is an evolutionary algorithm designed for the maximization of manufacturing yield due to statistical deviation of component values of signal processing systems. In short, GA is a stochastic adaptive process where a number of samples of an n-dimensional vector x[xT = (x1, x2, ..., xn)] are taken from a multivariate Gaussian distribution, N(m, M), having mean m and moment matrix M. The samples are tested for fail or pass. The first- and second-order moments of the Gaussian restricted to the pass samples are m and M. The outcome of x as a pass sample is determined by a function s(x), 0 < s(x) < q ≤ 1, such that s(x) is the probability that x will be selected as a pass sample. The average probability of finding pass samples (yield) is P ( m ) = ∫ s ( x ) N ( x − m ) d x {\displaystyle P(m)=\int s(x)N(x-m)\,dx} Then the theorem of GA states: For any s(x) and for any value of P < q, there always exist a Gaussian p. d. f. [ probability density function ] that is adapted for maximum dispersion. The necessary conditions for a local optimum are m = m and M proportional to M. The dual problem is also solved: P is maximized while keeping the dispersion constant (Kjellström, 1991). Proofs of the theorem may be found in the papers by Kjellström, 1970, and Kjellström & Taxén, 1981. Since dispersion is defined as the exponential of entropy/disorder/average information it immediately follows that the theorem is valid also for those concepts. Altogether, this means that Gaussian adaptation may carry out a simultaneous maximisation of yield and average information (without any need for the yield or the average information to be defined as criterion functions). The theorem is valid for all regions of acceptability and all Gaussian distributions. It may be used by cyclic repetition of random variation and selection (like the natural evolution). In every cycle a sufficiently large number of Gaussian distributed points are sampled and tested for membership in the region of acceptability. The centre of gravity of the Gaussian, m, is then moved to the centre of gravity of the approved (selected) points, m. Thus, the process converges to a state of equilibrium fulfilling the theorem. A solution is always approximate because the centre of gravity is always determined for a limited number of points. It was used for the first time in 1969 as a pure optimization algorithm making the regions of acceptability smaller and smaller (in analogy to simulated annealing, Kirkpatrick 1983). Since 1970 it has been used for both ordinary optimization and yield maximization. == Natural evolution and Gaussian adaptation == It has also been compared to the natural evolution of populations of living organisms. In this case s(x) is the probability that the individual having an array x of phenotypes will survive by giving offspring to the next generation; a definition of individual fitness given by Hartl 1981. The yield, P, is replaced by the mean fitness determined as a mean over the set of individuals in a large population. Phenotypes are often Gaussian distributed in a large population and a necessary condition for the natural evolution to be able to fulfill the theorem of Gaussian adaptation, with respect to all Gaussian quantitative characters, is that it may push the centre of gravity of the Gaussian to the centre of gravity of the selected individuals. This may be accomplished by the Hardy–Weinberg law. This is possible because the theorem of Gaussian adaptation is valid for any region of acceptability independent of the structure (Kjellström, 1996). In this case the rules of genetic variation such as crossover, inversion, transposition etcetera may be seen as random number generators for the phenotypes. So, in this sense Gaussian adaptation may be seen as a genetic algorithm. == How to climb a mountain == Mean fitness may be calculated provided that the distribution of parameters and the structure of the landscape is known. The real landscape is not known, but figure below shows a fictitious profile (blue) of a landscape along a line (x) in a room spanned by such parameters. The red curve is the mean based on the red bell curve at the bottom of figure. It is obtained by letting the bell curve slide along the x-axis, calculating the mean at every location. As can be seen, small peaks and pits are smoothed out. Thus, if evolution is started at A with a relatively small variance (the red bell curve), then climbing will take place on the red curve. The process may get stuck for millions of years at B or C, as long as the hollows to the right of these points remain, and the mutation rate is too small. If the mutation rate is sufficiently high, the disorder or variance may increase and the parameter(s) may become distributed like the green bell curve. Then the climbing will take place on the green curve, which is even more smoothed out. Because the hollows to the right of B and C have now disappeared, the process may continue up to the peaks at D. But of course the landscape puts a limit on the disorder or variability. Besides — dependent on the landscape — the process may become very jerky, and if the ratio between the time spent by the process at a local peak and the time of transition to the next peak is very high, it may as well look like a punctuated equilibrium as suggested by Gould (see Ridley). == Computer simulation of Gaussian adaptation == Thus far the theory only considers mean values of continuous distributions corresponding to an infinite number of individuals. In reality however, the number of individuals is always limited, which gives rise to an uncertainty in the estimation of m and M (the moment matrix of the Gaussian). And this may also affect the efficiency of the process. Unfortunately very little is known about this, at least theoretically. The implementation of normal adaptation on a computer is a fairly simple task. The adaptation of m may be done by one sample (individual) at a time, for example m(i + 1) = (1 – a) m(i) + ax where x is a pass sample, and a < 1 a suitable constant so that the inverse of a represents the number of individuals in the population. M may in principle be updated after every step y leading to a feasible point x = m + y according to: M(i + 1) = (1 – 2b) M(i) + 2byyT, where yT is the transpose of y and b << 1 is another suitable constant. In order to guarantee a suitable increase of average information, y should be normally distributed with moment matrix μ2M, where the scalar μ > 1 is used to increase average information (information entropy, disorder, diversity) at a suitable rate. But M will never be used in the calculations. Instead we use the matrix W defined by WWT = M. Thus, we have y = Wg, where g is normally distributed with the moment matrix μU, and U is the unit matrix. W and WT may be updated by the formulas W = (1 – b)W + bygT and WT = (1 – b)WT + bgyT because multiplication gives M = (1 – 2b)M + 2byyT, where terms including b2 have been neglected. Thus, M will be indirectly adapted with good approximation. In practice it will suffice to update W only W(i + 1) = (1 – b)W(i) + bygT. This is the formula used in a simple 2-dimensional model of a brain satisfying the Hebbian rule of associative learning; see the next section (Kjellström, 1996 and 1999). The figure below illustrates the effect of increased average information in a Gaussian p.d.f. used to climb a mountain Crest (the two lines represent the contour line). Both the red and green cluster have equal mean fitness, about 65%, but the green cluster has a much higher average information making the green process much more efficient. The effect of this adaptation is not very salient in a 2-dimensional case, but in a high-dimensional case, the efficiency of the search process may be increased by many orders of magnitude. == The evolution in the brain == In the brain the evolution of DNA-messages is supposed to be replaced by an evolution of signal patterns and the phenotypic landscape is replaced by a mental landscape, the complexity of which will hardly be second to the former. The metaphor with the mental landscape is based on the assumption that certain signal patterns give rise to a better well-being or performance. For instance, the control of a group of muscles leads to a better pronunciation of a word or performance of a piece of music. In this simple model it is assumed that the brain consists of interconnected components that may add, multiply and delay signal values. A nerve cell kernel may add signal values, a synapse may multiply with a constant and An axon may delay values. This is a basis of the theory of digital filters and neural networks consisting of components that may add, multiply and delay signalvalues and also of many brain models, Levine 1991. In the figure below the brain stem is supposed to deliver Gaussian distributed signal patterns. This may be possible since certai

    Read more →
  • Characteristic samples

    Characteristic samples

    Characteristic samples is a concept in the field of grammatical inference, related to passive learning. In passive learning, an inference algorithm I {\displaystyle I} is given a set of pairs of strings and labels S {\displaystyle S} , and returns a representation R {\displaystyle R} that is consistent with S {\displaystyle S} . Characteristic samples consider the scenario when the goal is not only finding a representation consistent with S {\displaystyle S} , but finding a representation that recognizes a specific target language. A characteristic sample of language L {\displaystyle L} is a set of pairs of the form ( s , l ( s ) ) {\displaystyle (s,l(s))} where: l ( s ) = 1 {\displaystyle l(s)=1} if and only if s ∈ L {\displaystyle s\in L} l ( s ) = − 1 {\displaystyle l(s)=-1} if and only if s ∉ L {\displaystyle s\notin L} Given the characteristic sample S {\displaystyle S} , I {\displaystyle I} 's output on it is a representation R {\displaystyle R} , e.g. an automaton, that recognizes L {\displaystyle L} . == Formal Definition == === The Learning Paradigm associated with Characteristic Samples === There are three entities in the learning paradigm connected to characteristic samples, the adversary, the teacher and the inference algorithm. Given a class of languages C {\displaystyle \mathbb {C} } and a class of representations for the languages R {\displaystyle \mathbb {R} } , the paradigm goes as follows: The adversary A {\displaystyle A} selects a language L ∈ C {\displaystyle L\in \mathbb {C} } and reports it to the teacher The teacher T {\displaystyle T} then computes a set of strings and label them correctly according to L {\displaystyle L} , trying to make sure that the inference algorithm will compute L {\displaystyle L} The adversary can add correctly labeled words to the set in order to confuse the inference algorithm The inference algorithm I {\displaystyle I} gets the sample and computes a representation R ∈ R {\displaystyle R\in \mathbb {R} } consistent with the sample. The goal is that when the inference algorithm receives a characteristic sample for a language L {\displaystyle L} , or a sample that subsumes a characteristic sample for L {\displaystyle L} , it will return a representation that recognizes exactly the language L {\displaystyle L} . === Sample === Sample S {\displaystyle S} is a set of pairs of the form ( s , l ( s ) ) {\displaystyle (s,l(s))} such that l ( s ) ∈ { − 1 , 1 } {\displaystyle l(s)\in \{-1,1\}} ==== Sample consistent with a language ==== We say that a sample S {\displaystyle S} is consistent with language L {\displaystyle L} if for every pair ( s , l ( s ) ) {\displaystyle (s,l(s))} in S {\displaystyle S} : l ( s ) = 1 if and only if s ∈ L {\displaystyle l(s)=1{\text{ if and only if }}s\in L} l ( s ) = − 1 if and only if s ∉ L {\displaystyle l(s)=-1{\text{ if and only if }}s\notin L} === Characteristic sample === Given an inference algorithm I {\displaystyle I} and a language L {\displaystyle L} , a sample S {\displaystyle S} that is consistent with L {\displaystyle L} is called a characteristic sample of L {\displaystyle L} for I {\displaystyle I} if: I {\displaystyle I} 's output on S {\displaystyle S} is a representation R {\displaystyle R} that recognizes L {\displaystyle L} . For every sample D {\displaystyle D} that is consistent with L {\displaystyle L} and also fulfils S ⊆ D {\displaystyle S\subseteq D} , I {\displaystyle I} 's output on D {\displaystyle D} is a representation R {\displaystyle R} that recognizes L {\displaystyle L} . A Class of languages C {\displaystyle \mathbb {C} } is said to have charistaristic samples if every L ∈ C {\displaystyle L\in \mathbb {C} } has a characteristic sample. == Related Theorems == === Theorem === If equivalence is undecidable for a class C {\textstyle \mathbb {C} } over Σ {\textstyle \Sigma } of cardinality bigger than 1, then C {\textstyle \mathbb {C} } doesn't have characteristic samples. ==== Proof ==== Given a class of representations C {\textstyle \mathbb {C} } such that equivalence is undecidable, for every polynomial p ( x ) {\displaystyle p(x)} and every n ∈ N {\displaystyle n\in \mathbb {N} } , there exist two representations r 1 {\displaystyle r_{1}} and r 2 {\displaystyle r_{2}} of sizes bounded by n {\displaystyle n} , that recognize different languages but are inseparable by any string of size bounded by p ( n ) {\displaystyle p(n)} . Assuming this is not the case, we can decide if r 1 {\displaystyle r_{1}} and r 2 {\displaystyle r_{2}} are equivalent by simulating their run on all strings of size smaller than p ( n ) {\displaystyle p(n)} , contradicting the assumption that equivalence is undecidable. === Theorem === If S 1 {\displaystyle S_{1}} is a characteristic sample for L 1 {\displaystyle L_{1}} and is also consistent with L 2 {\displaystyle L_{2}} , then every characteristic sample of L 2 {\displaystyle L_{2}} , is inconsistent with L 1 {\displaystyle L_{1}} . ==== Proof ==== Given a class C {\textstyle \mathbb {C} } that has characteristic samples, let R 1 {\displaystyle R_{1}} and R 2 {\displaystyle R_{2}} be representations that recognize L 1 {\displaystyle L_{1}} and L 2 {\displaystyle L_{2}} respectively. Under the assumption that there is a characteristic sample for L 1 {\displaystyle L_{1}} , S 1 {\displaystyle S_{1}} that is also consistent with L 2 {\displaystyle L_{2}} , we'll assume falsely that there exist a characteristic sample for L 2 {\displaystyle L_{2}} , S 2 {\displaystyle S_{2}} that is consistent with L 1 {\displaystyle L_{1}} . By the definition of characteristic sample, the inference algorithm I {\displaystyle I} must return a representation which recognizes the language if given a sample that subsumes the characteristic sample itself. But for the sample S 1 ∪ S 2 {\displaystyle S_{1}\cup S_{2}} , the answer of the inferring algorithm needs to recognize both L 1 {\displaystyle L_{1}} and L 2 {\displaystyle L_{2}} , in contradiction. === Theorem === If a class is polynomially learnable by example based queries, it is learnable with characteristic samples. == Polynomialy characterizable classes == === Regular languages === The proof that DFA's are learnable using characteristic samples, relies on the fact that every regular language has a finite number of equivalence classes with respect to the right congruence relation, ∼ L {\displaystyle \sim _{L}} (where x ∼ L y {\displaystyle x\sim _{L}y} for x , y ∈ Σ ∗ {\displaystyle x,y\in \Sigma ^{}} if and only if ∀ z ∈ Σ ∗ : x z ∈ L ↔ y z ∈ L {\displaystyle \forall z\in \Sigma ^{}:xz\in L\leftrightarrow yz\in L} ). Note that if x {\displaystyle x} , y {\displaystyle y} are not congruent with respect to ∼ L {\displaystyle \sim _{L}} , there exists a string z {\displaystyle z} such that x z ∈ L {\displaystyle xz\in L} but y z ∉ L {\displaystyle yz\notin L} or vice versa, this string is called a separating suffix. ==== Constructing a characteristic sample ==== The construction of a characteristic sample for a language L {\displaystyle L} by the teacher goes as follows. Firstly, by running a depth first search on a deterministic automaton A {\displaystyle A} recognizing L {\displaystyle L} , starting from its initial state, we get a suffix closed set of words, W {\displaystyle W} , ordered in shortlex order. From the fact above, we know that for every two states in the automaton, there exists a separating suffix that separates between every two strings that the run of A {\displaystyle A} on them ends in the respective states. We refer to the set of separating suffixes as S {\displaystyle S} . The labeled set (sample) of words the teacher gives the adversary is { ( w , l ( w ) ) | w ∈ W ⋅ S ∪ W ⋅ Σ ⋅ S } {\displaystyle \{(w,l(w))|w\in W\cdot S\cup W\cdot \Sigma \cdot S\}} where l ( w ) {\displaystyle l(w)} is the correct label of w {\displaystyle w} (whether it is in L {\displaystyle L} or not). We may assume that ϵ ∈ S {\displaystyle \epsilon \in S} . ==== Constructing a deterministic automata ==== Given the sample from the adversary W {\displaystyle W} , the construction of the automaton by the inference algorithm I {\displaystyle I} starts with defining P = prefix ( W ) {\displaystyle P={\text{prefix}}(W)} and S = suffix ( W ) {\displaystyle S={\text{suffix}}(W)} , which are the set of prefixes and suffixes of W {\displaystyle W} respectively. Now the algorithm constructs a matrix M {\displaystyle M} where the elements of P {\displaystyle P} function as the rows, ordered by the shortlex order, and the elements of S {\displaystyle S} function as the columns, ordered by the shortlex order. Next, the cells in the matrix are filled in the following manner for prefix p i {\displaystyle p_{i}} and suffix s j {\displaystyle s_{j}} : If p i s j ∈ W → M i j = l ( p i s j ) {\displaystyle p_{i}s_{j}\in W\rightarrow M_{ij}=l(p_{i}s_{j})} else, M i j = 0 {\displaystyle M_{ij}=0} Now, we say row i {\displaystyle i} and t {\displaystyle t} are distinguishable if there exi

    Read more →
  • Latent and observable variables

    Latent and observable variables

    In statistics, latent variables (from Latin: present participle of lateo 'lie hidden') are variables that can only be inferred indirectly through a mathematical model from other observable variables that can be directly observed or measured. Such latent variable models are used in many disciplines, including engineering, medicine, ecology, physics, machine learning/artificial intelligence, natural language processing, bioinformatics, chemometrics, demography, economics, management, political science, psychology and the social sciences. Latent variables may correspond to aspects of physical reality. These could in principle be measured, but may not be for practical reasons. Among the earliest expressions of this idea is Francis Bacon's polemic the Novum Organum, itself a challenge to the more traditional logic expressed in Aristotle's Organon: But the latent process of which we speak, is far from being obvious to men’s minds, beset as they now are. For we mean not the measures, symptoms, or degrees of any process which can be exhibited in the bodies themselves, but simply a continued process, which, for the most part, escapes the observation of the senses. In this situation, the term hidden variables is commonly used, reflecting the fact that the variables are meaningful, but not observable. Other latent variables correspond to abstract concepts, like categories, behavioral or mental states, or data structures. The terms hypothetical variables or hypothetical constructs may be used in these situations. The use of latent variables can serve to reduce the dimensionality of data. Many observable variables can be aggregated in a model to represent an underlying concept, making it easier to understand the data. In this sense, they serve a function similar to that of scientific theories. At the same time, latent variables link observable "sub-symbolic" data in the real world to symbolic data in the modeled world. == Examples == === Psychology === Latent variables, as created by factor analytic methods, generally represent "shared" variance, or the degree to which variables "move" together. Variables that have no correlation cannot result in a latent construct based on the common factor model. The "Big Five personality traits" have been inferred using factor analysis. extraversion spatial ability wisdom: “Two of the more predominant means of assessing wisdom include wisdom-related performance and latent variable measures.” Spearman's g, or the general intelligence factor in psychometrics === Economics === Examples of latent variables from the field of economics include quality of life, business confidence, morale, happiness and conservatism: these are all variables which cannot be measured directly. However, by linking these latent variables to other, observable variables, the values of the latent variables can be inferred from measurements of the observable variables. Quality of life is a latent variable which cannot be measured directly, so observable variables are used to infer quality of life. Observable variables to measure quality of life include wealth, employment, environment, physical and mental health, education, recreation and leisure time, and social belonging. === Medicine === Latent-variable methodology is used in many branches of medicine. A class of problems that naturally lend themselves to latent variables approaches are longitudinal studies where the time scale (e.g. age of participant or time since study baseline) is not synchronized with the trait being studied. For such studies, an unobserved time scale that is synchronized with the trait being studied can be modeled as a transformation of the observed time scale using latent variables. Examples of this include disease progression modeling and modeling of growth (see box). == Inferring latent variables == There exists a range of different model classes and methodology that make use of latent variables and allow inference in the presence of latent variables. Models include: linear mixed-effects models and nonlinear mixed-effects models Hidden Markov models Factor analysis Item response theory Analysis and inference methods include: Principal component analysis Instrumented principal component analysis Partial least squares regression Latent semantic analysis and probabilistic latent semantic analysis EM algorithms Metropolis–Hastings algorithm === Bayesian algorithms and methods === Bayesian statistics is often used for inferring latent variables. Latent Dirichlet allocation The Chinese restaurant process is often used to provide a prior distribution over assignments of objects to latent categories. The Indian buffet process is often used to provide a prior distribution over assignments of latent binary features to objects.

    Read more →
  • Chai AI

    Chai AI

    Chai AI (also known as Chai Research) is an American artificial intelligence (AI) company that operates a chatbot platform where users can create, share, and interact with character-based chatbots powered by large language models (LLMs). The company is headquartered in Palo Alto, California. == History == Chai was founded in 2021 by William Beauchamp, a former quantitative trader educated at Cambridge, who began developing the initial prototype in 2020 in Cambridge, England. The company launched in 2021 and relocated to Palo Alto in 2022. In June 2023, Chai raised US$2 million in a pre-seed funding round. In September 2023, GPU cloud provider CoreWeave invested in the company at a valuation of US$450 million. In January 2024, Chai Research reported a $450 million valuation following an investment from cloud computing provider CoreWeave. In July 2024, authorities in Belgium launched an investigation into the company following reports of a man dying by suicide following extensive chats on the Chai app. == Reception == In 2025, Chai Research announced that their app had over 10 million downloads and 1 million daily active users. In 2022, Canadian writer Sheila Heti published her conversations with various chatbots in The Paris Review, including Chai AI chatbots, and later used Chai AI chatbots in the development of a novel. Heti said that she had found that Chai's default chatbot, Eliza, "had turned out to be like most of the other bots on the site—primarily interested in sex". In January 2026, CHAI introduced country-based blocks on its free, ad-supported tier, initially providing the community with little information and inaccurate lists of the affected countries. Users in "Low tier" regions are required to subscribe to use the app in any capacity, while "High tier" regions will retain free ad-supported access. In response to backlash, the company announced a "Basic" tier with unlimited messages and ads, intended to cover electricity and infrastructure costs. In February 2026, CHAI was criticized for the unannounced implementation of restrictive "token limits" that abruptly blocked messages and froze conversations for both free and paid subscribers. Users generating long responses or utilizing roleplay features found their quotas exhausted within minutes, resulting in lockouts lasting anywhere from a few hours to a week. == Technology == Chai allows users to create characters and interact with chatbot versions of those characters. These chatbots use the open-source large language model (LLM) GPT-J originally developed by EleutherAI. Chai AI chatbots can be shared on the platform for other users to interact with.

    Read more →
  • Linear genetic programming

    Linear genetic programming

    "Linear genetic programming" is unrelated to "linear programming". Linear genetic programming (LGP) is a particular method of genetic programming wherein computer programs in a population are represented as a sequence of register-based instructions from an imperative programming language or machine language. The adjective "linear" stems from the fact that each LGP program is a sequence of instructions and the sequence of instructions is normally executed sequentially. Like in other programs, the data flow in LGP can be modeled as a graph that will visualize the potential multiple usage of register contents and the existence of structurally noneffective code (introns) which are two main differences of this genetic representation from the more common tree-based genetic programming (TGP) variant. Like other Genetic Programming methods, Linear genetic programming requires the input of data to run the program population on. Then, the output of the program (its behaviour) is judged against some target behaviour, using a fitness function. However, LGP is generally more efficient than tree genetic programming due to its two main differences mentioned above: Intermediate results (stored in registers) can be reused and a simple intron removal algorithm exists that can be executed to remove all non-effective code prior to programs being run on the intended data. These two differences often result in compact solutions and substantial computational savings compared to the highly constrained data flow in trees and the common method of executing all tree nodes in TGP. Furthermore, LGP naturally has multiple outputs by defining multiple output registers and easily cooperates with control flow operations. Linear genetic programming has been applied in many domains, including system modeling and system control with considerable success. Linear genetic programming should not be confused with linear tree programs in tree genetic programming, program composed of a variable number of unary functions and a single terminal. Note that linear tree GP differs from bit string genetic algorithms since a population may contain programs of different lengths and there may be more than two types of functions or more than two types of terminals. == Examples of LGP programs == Because LGP programs are basically represented by a linear sequence of instructions, they are simpler to read and to operate on than their tree-based counterparts. For example, a simple program written to solve a Boolean function problem with 3 inputs (in R1, R2, R3) and one output (in R0), could read like this: R1, R2, R3 have to be declared as input (read-only) registers, while R0 and R4 are declared as calculation (read-write) registers. This program is very simple, having just 5 instructions. But mutation and crossover operators could work to increase the length of the program, as well as the content of each of its instructions. Note that one instruction is non-effective or an intron (marked), since it does not impact the output register R0. Recognition of those instructions is the basis for the intron removal algorithm which is used analyze code prior to execution. Technically, this happens by copying an individual and then run the intron removal once. The copy with removed introns is then executed as many times as dictated by the number of training cases. Notably, the original individual is left intact, so as to continue participating in the evolutionary process. It is only the copy that is executed that is compressed by removing these "structural" introns. Another simple program, this one written in the LGP language Slash/A looks like a series of instructions separated by a slash: By representing such code in bytecode format, i.e. as an array of bytes each representing a different instruction, one can make mutation operations simply by changing an element of such an array.

    Read more →
  • Error tolerance (PAC learning)

    Error tolerance (PAC learning)

    In PAC learning, error tolerance refers to the ability of an algorithm to learn when the examples received have been corrupted in some way. In fact, this is a very common and important issue since in many applications it is not possible to access noise-free data. Noise can interfere with the learning process at different levels: the algorithm may receive data that have been occasionally mislabeled, or the inputs may have some false information, or the classification of the examples may have been maliciously adulterated. == Notation and the Valiant learning model == In the following, let X {\displaystyle X} be our n {\displaystyle n} -dimensional input space. Let H {\displaystyle {\mathcal {H}}} be a class of functions that we wish to use in order to learn a { 0 , 1 } {\displaystyle \{0,1\}} -valued target function f {\displaystyle f} defined over X {\displaystyle X} . Let D {\displaystyle {\mathcal {D}}} be the distribution of the inputs over X {\displaystyle X} . The goal of a learning algorithm A {\displaystyle {\mathcal {A}}} is to choose the best function h ∈ H {\displaystyle h\in {\mathcal {H}}} such that it minimizes e r r o r ( h ) = P x ∼ D ( h ( x ) ≠ f ( x ) ) {\displaystyle error(h)=P_{x\sim {\mathcal {D}}}(h(x)\neq f(x))} . Let us suppose we have a function s i z e ( f ) {\displaystyle size(f)} that can measure the complexity of f {\displaystyle f} . Let Oracle ( x ) {\displaystyle {\text{Oracle}}(x)} be an oracle that, whenever called, returns an example x {\displaystyle x} and its correct label f ( x ) {\displaystyle f(x)} . When no noise corrupts the data, we can define learning in the Valiant setting: Definition: We say that f {\displaystyle f} is efficiently learnable using H {\displaystyle {\mathcal {H}}} in the Valiant setting if there exists a learning algorithm A {\displaystyle {\mathcal {A}}} that has access to Oracle ( x ) {\displaystyle {\text{Oracle}}(x)} and a polynomial p ( ⋅ , ⋅ , ⋅ , ⋅ ) {\displaystyle p(\cdot ,\cdot ,\cdot ,\cdot )} such that for any 0 < ε ≤ 1 {\displaystyle 0<\varepsilon \leq 1} and 0 < δ ≤ 1 {\displaystyle 0<\delta \leq 1} it outputs, in a number of calls to the oracle bounded by p ( 1 ε , 1 δ , n , size ( f ) ) {\displaystyle p\left({\frac {1}{\varepsilon }},{\frac {1}{\delta }},n,{\text{size}}(f)\right)} , a function h ∈ H {\displaystyle h\in {\mathcal {H}}} that satisfies with probability at least 1 − δ {\displaystyle 1-\delta } the condition error ( h ) ≤ ε {\displaystyle {\text{error}}(h)\leq \varepsilon } . In the following we will define learnability of f {\displaystyle f} when data have suffered some modification. == Classification noise == In the classification noise model a noise rate 0 ≤ η < 1 2 {\displaystyle 0\leq \eta <{\frac {1}{2}}} is introduced. Then, instead of Oracle ( x ) {\displaystyle {\text{Oracle}}(x)} that returns always the correct label of example x {\displaystyle x} , algorithm A {\displaystyle {\mathcal {A}}} can only call a faulty oracle Oracle ( x , η ) {\displaystyle {\text{Oracle}}(x,\eta )} that will flip the label of x {\displaystyle x} with probability η {\displaystyle \eta } . As in the Valiant case, the goal of a learning algorithm A {\displaystyle {\mathcal {A}}} is to choose the best function h ∈ H {\displaystyle h\in {\mathcal {H}}} such that it minimizes e r r o r ( h ) = P x ∼ D ( h ( x ) ≠ f ( x ) ) {\displaystyle error(h)=P_{x\sim {\mathcal {D}}}(h(x)\neq f(x))} . In applications it is difficult to have access to the real value of η {\displaystyle \eta } , but we assume we have access to its upperbound η B {\displaystyle \eta _{B}} . Note that if we allow the noise rate to be 1 / 2 {\displaystyle 1/2} , then learning becomes impossible in any amount of computation time, because every label conveys no information about the target function. Definition: We say that f {\displaystyle f} is efficiently learnable using H {\displaystyle {\mathcal {H}}} in the classification noise model if there exists a learning algorithm A {\displaystyle {\mathcal {A}}} that has access to Oracle ( x , η ) {\displaystyle {\text{Oracle}}(x,\eta )} and a polynomial p ( ⋅ , ⋅ , ⋅ , ⋅ ) {\displaystyle p(\cdot ,\cdot ,\cdot ,\cdot )} such that for any 0 ≤ η ≤ 1 2 {\displaystyle 0\leq \eta \leq {\frac {1}{2}}} , 0 ≤ ε ≤ 1 {\displaystyle 0\leq \varepsilon \leq 1} and 0 ≤ δ ≤ 1 {\displaystyle 0\leq \delta \leq 1} it outputs, in a number of calls to the oracle bounded by p ( 1 1 − 2 η B , 1 ε , 1 δ , n , s i z e ( f ) ) {\displaystyle p\left({\frac {1}{1-2\eta _{B}}},{\frac {1}{\varepsilon }},{\frac {1}{\delta }},n,size(f)\right)} , a function h ∈ H {\displaystyle h\in {\mathcal {H}}} that satisfies with probability at least 1 − δ {\displaystyle 1-\delta } the condition e r r o r ( h ) ≤ ε {\displaystyle error(h)\leq \varepsilon } . == Statistical query learning == Statistical Query Learning is a kind of active learning problem in which the learning algorithm A {\displaystyle {\mathcal {A}}} can decide if to request information about the likelihood P f ( x ) {\displaystyle P_{f(x)}} that a function f {\displaystyle f} correctly labels example x {\displaystyle x} , and receives an answer accurate within a tolerance α {\displaystyle \alpha } . Formally, whenever the learning algorithm A {\displaystyle {\mathcal {A}}} calls the oracle Oracle ( x , α ) {\displaystyle {\text{Oracle}}(x,\alpha )} , it receives as feedback probability Q f ( x ) {\displaystyle Q_{f(x)}} , such that Q f ( x ) − α ≤ P f ( x ) ≤ Q f ( x ) + α {\displaystyle Q_{f(x)}-\alpha \leq P_{f(x)}\leq Q_{f(x)}+\alpha } . Definition: We say that f {\displaystyle f} is efficiently learnable using H {\displaystyle {\mathcal {H}}} in the statistical query learning model if there exists a learning algorithm A {\displaystyle {\mathcal {A}}} that has access to Oracle ( x , α ) {\displaystyle {\text{Oracle}}(x,\alpha )} and polynomials p ( ⋅ , ⋅ , ⋅ ) {\displaystyle p(\cdot ,\cdot ,\cdot )} , q ( ⋅ , ⋅ , ⋅ ) {\displaystyle q(\cdot ,\cdot ,\cdot )} , and r ( ⋅ , ⋅ , ⋅ ) {\displaystyle r(\cdot ,\cdot ,\cdot )} such that for any 0 < ε ≤ 1 {\displaystyle 0<\varepsilon \leq 1} the following hold: Oracle ( x , α ) {\displaystyle {\text{Oracle}}(x,\alpha )} can evaluate P f ( x ) {\displaystyle P_{f(x)}} in time q ( 1 ε , n , s i z e ( f ) ) {\displaystyle q\left({\frac {1}{\varepsilon }},n,size(f)\right)} ; 1 α {\displaystyle {\frac {1}{\alpha }}} is bounded by r ( 1 ε , n , s i z e ( f ) ) {\displaystyle r\left({\frac {1}{\varepsilon }},n,size(f)\right)} A {\displaystyle {\mathcal {A}}} outputs a model h {\displaystyle h} such that e r r ( h ) < ε {\displaystyle err(h)<\varepsilon } , in a number of calls to the oracle bounded by p ( 1 ε , n , s i z e ( f ) ) {\displaystyle p\left({\frac {1}{\varepsilon }},n,size(f)\right)} . Note that the confidence parameter δ {\displaystyle \delta } does not appear in the definition of learning. This is because the main purpose of δ {\displaystyle \delta } is to allow the learning algorithm a small probability of failure due to an unrepresentative sample. Since now Oracle ( x , α ) {\displaystyle {\text{Oracle}}(x,\alpha )} always guarantees to meet the approximation criterion Q f ( x ) − α ≤ P f ( x ) ≤ Q f ( x ) + α {\displaystyle Q_{f(x)}-\alpha \leq P_{f(x)}\leq Q_{f(x)}+\alpha } , the failure probability is no longer needed. The statistical query model is strictly weaker than the PAC model: any efficiently SQ-learnable class is efficiently PAC learnable in the presence of classification noise, but there exist efficient PAC-learnable problems such as parity that are not efficiently SQ-learnable. == Malicious classification == In the malicious classification model an adversary generates errors to foil the learning algorithm. This setting describes situations of error burst, which may occur when for a limited time transmission equipment malfunctions repeatedly. Formally, algorithm A {\displaystyle {\mathcal {A}}} calls an oracle Oracle ( x , β ) {\displaystyle {\text{Oracle}}(x,\beta )} that returns a correctly labeled example x {\displaystyle x} drawn, as usual, from distribution D {\displaystyle {\mathcal {D}}} over the input space with probability 1 − β {\displaystyle 1-\beta } , but it returns with probability β {\displaystyle \beta } an example drawn from a distribution that is not related to D {\displaystyle {\mathcal {D}}} . Moreover, this maliciously chosen example may strategically selected by an adversary who has knowledge of f {\displaystyle f} , β {\displaystyle \beta } , D {\displaystyle {\mathcal {D}}} , or the current progress of the learning algorithm. Definition: Given a bound β B < 1 2 {\displaystyle \beta _{B}<{\frac {1}{2}}} for 0 ≤ β < 1 2 {\displaystyle 0\leq \beta <{\frac {1}{2}}} , we say that f {\displaystyle f} is efficiently learnable using H {\displaystyle {\mathcal {H}}} in the malicious classification model, if there exist a learning algorithm A {\displaystyle {\mathcal {A}}} that has access to Oracle ( x , β ) {\displaystyle {\text{Oracle}}(x,\beta )}

    Read more →