Best AI Image Generators

Best AI Image Generators — hands-on reviews, top picks, pricing, pros and cons and a practical how-to guide on Aizhi.

  • Supertoroid

    Supertoroid

    In geometry and computer graphics, a supertoroid or supertorus is usually understood to be a family of doughnut-like surfaces (technically, a topological torus) whose shape is defined by mathematical formulas similar to those that define the superellipsoids. The plural of "supertorus" is either supertori or supertoruses. The family was described and named by Alan Barr in 1994. Barr's supertoroids have been fairly popular in computer graphics as a convenient model for many objects, such as smooth frames for rectangular things. One quarter of a supertoroid can provide a smooth and seamless 90-degree joint between two superquadric cylinders. However, they are not algebraic surfaces (except in special cases). == Formulas == Alan Barr's supertoroids are defined by parametric equations similar to the trigonometric equations of the torus, except that the sine and cosine terms are raised to arbitrary powers. Namely, the generic point P(u, v) of the surface is given by P ( u , v ) = ( X ( u , v ) Y ( u , v ) Z ( u , v ) ) = ( ( a + C u s ) C v t ( b + C u s ) S v t S u s ) {\displaystyle P(u,v)=\left({\begin{array}{c}X(u,v)\\Y(u,v)\\Z(u,v)\end{array}}\right)=\left({\begin{array}{c}(a+C_{u}^{s})C_{v}^{t}\\(b+C_{u}^{s})S_{v}^{t}\\S_{u}^{s}\end{array}}\right)} where C θ ε = sgn ⁡ ( cos ⁡ θ ) | cos ⁡ θ | ε , S θ ε = sgn ⁡ ( sin ⁡ θ ) | sin ⁡ θ | ε , {\displaystyle {\begin{aligned}C_{\theta }^{\varepsilon }&=\operatorname {sgn} (\cos \theta )\,\left|\,\cos \theta \,\right|^{\varepsilon },\\S_{\theta }^{\varepsilon }&=\operatorname {sgn} (\sin \theta )\ \left|\,\sin \theta \ \right|^{\varepsilon },\end{aligned}}} sgn is the sign function, and the parameters u, v range from 0 to 360 degrees (0 to 2π radians). In these formulas, the parameter s > 0 controls the "squareness" of the vertical sections, t > 0 controls the squareness of the horizontal sections, and a, b ≥ 1 are the major radii in the x and y directions. With s = t = 1 and a = b = R one obtains the ordinary torus with major radius R and minor radius 1, with the center at the origin and rotational symmetry about the z-axis. In general, the supertorus defined as above spans the intervals: − ( a + 1 ) ≤ x ≤ + ( a + 1 ) − ( b + 1 ) ≤ y ≤ + ( b + 1 ) − 1 ≤ z ≤ + 1 {\displaystyle {\begin{array}{rcccl}-(a+1)&\leq &x&\leq &+(a+1)\\[4pt]-(b+1)&\leq &y&\leq &+(b+1)\\[4pt]-1&\leq &z&\leq &+1\end{array}}} The whole shape is symmetric about the planes x = 0, y = 0, and z = 0. The hole runs in the z direction and spans the intervals − ( a − 1 ) ≤ x ≤ + ( a − 1 ) − ( b − 1 ) ≤ y ≤ + ( b − 1 ) − ∞ ≤ z ≤ + ∞ {\displaystyle {\begin{array}{rcccl}-(a-1)&\leq &x&\leq &+(a-1)\\[4pt]-(b-1)&\leq &y&\leq &+(b-1)\\[4pt]-\infty &\leq &z&\leq &+\infty \end{array}}} A curve of constant u on this surface is a horizontal Lamé curve with exponent ⁠ 2 t , {\displaystyle {\tfrac {2}{t}},} ⁠ scaled in x and y and displaced in z. A curve of constant v, projected on the plane x = 0 or y = 0, is a Lamé curve with exponent ⁠ 2 s , {\displaystyle {\tfrac {2}{s}},} ⁠ scaled and horizontally shifted. If v = 0, the curve is planar and spans the intervals: a − 1 ≤ x ≤ a + 1 − 1 ≤ z ≤ + 1 {\displaystyle {\begin{array}{rcccl}a-1&\leq &x&\leq &a+1\\[4pt]-1&\leq &z&\leq &+1\end{array}}} and similarly if v = 90°, 180°, 270°. The curve is also planar if a = b. In general, if a ≠ b and v is not a multiple of 90 degrees, the curve of constant v will not be planar; and, conversely, a vertical plane section of the supertorus will not be a Lamé curve. The basic supertoroid shape defined above is often modified by non-uniform scaling to yield supertoroids of specific width, length, and vertical thickness. == Plotting code == The following GNU Octave code generates plots of a supertorus:

    Read more →
  • Universal psychometrics

    Universal psychometrics

    Universal psychometrics encompasses psychometrics instruments that could measure the psychological properties of any intelligent agent. Up until the early 21st century, psychometrics relied heavily on psychological tests that require the subject to cooperate and answer questions, the most famous example being an intelligence test. Such methods are only applicable to the measurement of human psychological properties. As a result, some researchers have proposed the idea of universal psychometrics - they suggest developing testing methods that allow for the measurement of non-human entities' psychological properties. For example, it has been suggested that the Turing test is a form of universal psychometrics. This test involves having testers (without any foreknowledge) attempt to distinguish a human from a machine by interacting with both (while not being to see either individuals). It is supposed that if the machine is equally intelligent to a human, the testers will not be able to distinguish between the two, i.e., their guesses will not be better than chance. Thus, Turing test could measure the intelligence (a psychological variable) of an AI. Other instruments proposed for universal psychometrics include reinforcement learning and measuring the ability to predict complexity.

    Read more →
  • Alexander Y. Tetelbaum

    Alexander Y. Tetelbaum

    Alexander Y. Tetelbaum (born August 16, 1948) is a Ukrainian American computer scientist, inventor, and academic who has contributed to electronic design automation (EDA) and artificial intelligence (AI) since the late 1960s; and holds 46 U.S. patents in EDA and related fields. Tetelbaum is the founding president of International Solomon University, the first Jewish university in Ukraine, established during a period of renewed efforts to address antisemitism in Ukraine. == Early life and education == He graduated from a Kyiv mathematical high school with a silver medal in 1966. Tetelbaum enrolled at the Kyiv Polytechnic Institute (KPI), now National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" in 1966, graduating in 1972 with an MS in Electronics with honors. He earned his PhD in Electrical and Computer Engineering from KPI in 1975, with a dissertation on electronic design automation, and his Doctor of Engineering Science in 1986. == Academic career == Tetelbaum began his academic career at KPI in 1973 as a junior scientist, becoming a professor in the Computer and Electrical Engineering Department in 1980. Later, he founded and served as president of International Solomon University in Kyiv from 1991 to 1996, the first Jewish university in Ukraine. The university became a major academic center for computer science and Jewish studies in the post-Soviet era. He was a visiting and adjunct professor at Michigan State University from 1993 to 1996. == Professional career == Tetelbaum worked as an engineer at the Kiev Institute of Cybernetics from 1972 to 1973, and later, he led the Design Automation Lab at Kyiv Polytechnic Institute from 1975 to 1987. In the United States, he served as EDA manager at Silicon Graphics Corporation from 1996 to 1998 and principal engineer at LSI Corporation from 1998 to 2012. He founded and served as CEO of Abelite Design Automation, Inc., from 2012 to 2022. == Contributions in computer science == Tetelbaum has contributed to electronic design automation (EDA) and artificial intelligence (AI) since the 1960s. His early work included methods for EDA, particularly physical design automation and mathematical optimization; and he developed force-directed placement and topological routing methods. Tetelbaum generalized Rent's rule for hierarchical systems and large blocks, proposing a graph-based framework that extends applicability to arbitrary partition sizes with improved accuracy. Additional IEEE and related conference contributions from the mid-1990s include: "Path Search for Complicated Function", 1995 IEEE International Symposium on Circuits and Systems "A Performance-driven Placement Approach of Standard Cells" (International Conference on Intelligent Systems, 1995) "Framework of a New Methodology for Behavioral to Physical Design Linkage" (38th Midwest Symposium on Circuits and Systems, 1996) Statistical timing design and variations Test Methodologies These and other works and patents contributed to timing-driven placement, crosstalk reduction, clock tree synthesis, and interconnect optimization in VLSI design. == Patents == Tetelbaum holds 46 U.S. patents in EDA and related fields. Notable examples include: For the full list of patents, see Justia Patents or Google Patents. == Publications == === Early publications in the Soviet Union === Before the appearance of American books on electronic design automation (EDA), Tetelbaum published several scientific books and monographs on the subject in Russian/Ukrainian. Electronic Design Automation, Kiev: Znanie Publisher, 1975. Planar Design of Electronic Circuits, Kiev: Znanie Publisher, 1977. Formal Design of Computer Systems, Moscow: Sovetskoe Radio, 1979. CAD of Electronic Equipment: Topological Approach, Kiev: Vyssha Shkola, 1980; 2nd ed. 1981. Automated Design of Electronic Circuits (1981) CAD of VLSI Circuits, Kiev: Vyssha Shkola, 1983. Topological Algorithms of Multilayer Printed Circuit Boards Routing, Moscow: Radio i Svyaz, 1983. CAD of VLSI Circuits on Master Slice Chips, Moscow: Radio i Svyaz, 1988. Increasing the Effectiveness of CAD Systems, Kiev: UMKVO, 1991. === Scientific Monographs (English) === Minimum Number of Timing Signoff Corners (2022) Interviewing AI (2026) The AI Debate (2026) New Nostradamus Predictions: 2026: The Next Decade & Beyond (2035–2050+) (2026) For a consolidated record of Tetelbaum's publications, see Alexander Y. Tetelbaum, Wikidata Q4720205. === Other publications === Tetelbaum also published educational books on problem-solving methods: Yes-No Puzzles-Games Puzzle Games for Kids Solving Non-Standard Problems Solving Non-Standard Very Hard Problems Additionally, Tetelbaum published three thrillers: Omerta Operations Executive Director Eruption Yacht Finally, he published his memoir and an entertaining book: Unfinished Equations Artificially Intelligent Humor

    Read more →
  • Leakage (machine learning)

    Leakage (machine learning)

    In statistics and machine learning, leakage (also known as data leakage or target leakage) refers to the use of information during model training that would not be available at prediction time. This results in overly optimistic performance estimates, as the model appears to perform better during evaluation than it actually would in a production environment. Leakage is often subtle and indirect, making it difficult to detect and eliminate. It can lead a statistician or modeler to select a suboptimal model, which may be outperformed by a leakage-free alternative. == Leakage modes == Leakage can occur at multiple stages of the machine learning workflow. Broadly, its sources can be divided into two categories: those arising from features and those arising from training examples. === Feature leakage === Feature or column-wise leakage is caused by the inclusion of columns which are one of the following: a duplicate label, a proxy for the label, or the label itself. These features, known as anachronisms, will not be available when the model is used for predictions, and result in leakage if included when the model is trained. For example, including a "MonthlySalary" column when predicting "YearlySalary"; or "MinutesLate" when predicting "IsLate". === Training example leakage === Row-wise leakage is caused by improper sharing of information between rows of data. Types of row-wise leakage include: Premature featurization; leaking from premature featurization before Cross-validation/Train/Test split (must fit MinMax/ngrams/etc on only the train split, then transform the test set) Duplicate rows between train/validation/test (for example, oversampling a dataset to pad its size before splitting; or, different rotations/augmentations of a single image; bootstrap sampling before splitting; or duplicating rows to up sample the minority class) Non-independent and identically distributed random (non-IID) data Time leakage (for example, splitting a time-series dataset randomly instead of newer data in test set using a train/test split or rolling-origin cross-validation) Group leakage—not including a grouping split column (for example, Andrew Ng's group had 100k x-rays of 30k patients, meaning ~3 images per patient. The paper used random splitting instead of ensuring that all images of a patient were in the same split. Hence the model partially memorized the patients instead of learning to recognize pneumonia in chest x-rays.) A 2023 review found data leakage to be "a widespread failure mode in machine-learning (ML)-based science", having affected at least 294 academic publications across 17 disciplines, and causing a potential reproducibility crisis. == Detection == Data leakage in machine learning can be detected through various methods, focusing on performance analysis, feature examination, data auditing, and model behavior analysis. Performance-wise, unusually high accuracy or significant discrepancies between training and test results often indicate leakage. Inconsistent cross-validation outcomes may also signal issues. Feature examination involves scrutinizing feature importance rankings and ensuring temporal integrity in time series data. A thorough audit of the data pipeline is crucial, reviewing pre-processing steps, feature engineering, and data splitting processes. Detecting duplicate entries across dataset splits is also important. For language models, the Min-K% method can detect the presence of data in a pretraining dataset. It presents a sentence suspected to be present in the pretraining dataset, and computes the log-likelihood of each token, then compute the average of the lowest K of these. If this exceeds a threshold, then the sentence is likely present. This method is improved by comparing against a baseline of the mean and variance. Analyzing model behavior can reveal leakage. Models relying heavily on counter-intuitive features or showing unexpected prediction patterns warrant investigation. Performance degradation over time when tested on new data may suggest earlier inflated metrics due to leakage. Advanced techniques include backward feature elimination, where suspicious features are temporarily removed to observe performance changes. Using a separate hold-out dataset for final validation before deployment is advisable.

    Read more →
  • Image texture

    Image texture

    An image texture is the small-scale structure perceived on an image, based on the spatial arrangement of color or intensities. It can be quantified by a set of metrics calculated in image processing. Image texture metrics give us information about the whole image or selected regions. Image textures can be artificially created or found in natural scenes captured in an image. Image textures are one way that can be used to help in segmentation or classification of images. For more accurate segmentation the most useful features are spatial frequency and an average grey level. To analyze an image texture in computer graphics, there are two ways to approach the issue: structured approach and statistical approach. == Structured approach == A structured approach sees an image texture as a set of primitive texels in some regular or repeated pattern. This works well when analyzing artificial textures. To obtain a structured description a characterization of the spatial relationship of the texels is gathered by using Voronoi tessellation of the texels. == Statistical approach == A statistical approach sees an image texture as a quantitative measure of the arrangement of intensities in a region. In general this approach is easier to compute and is more widely used, since natural textures are made of patterns of irregular subelements. === Edge detection === The use of edge detection is to determine the number of edge pixels in a specified region, helps determine a characteristic of texture complexity. After edges have been found the direction of the edges can also be applied as a characteristic of texture and can be useful in determining patterns in the texture. These directions can be represented as an average or in a histogram. Consider a region with N pixels. the gradient-based edge detector is applied to this region by producing two outputs for each pixel p: the gradient magnitude Mag(p) and the gradient direction Dir(p). The edgeness per unit area can be defined by F e d g e n e s s = | { p | M a g ( p ) > T } | N {\displaystyle F_{edgeness}={\frac {|\{p|Mag(p)>T\}|}{N}}} for some threshold T. To include orientation with edgeness histograms for both gradient magnitude and gradient direction can be used. Hmag(R) denotes the normalized histogram of gradient magnitudes of region R, and Hdir(R) denotes the normalized histogram of gradient orientations of region R. Both are normalized according to the size NR Then F m a g , d i r = ( H m a g ( R ) , H d i r ( R ) ) {\displaystyle F_{mag,dir}=(H_{mag}(R),H_{dir}(R))} is a quantitative texture description of region R. === Co-occurrence matrices === The co-occurrence matrix captures numerical features of a texture using spatial relations of similar gray tones. Numerical features computed from the co-occurrence matrix can be used to represent, compare, and classify textures. The following are a subset of standard features derivable from a normalized co-occurrence matrix: A n g u l a r 2 n d M o m e n t = ∑ i ∑ j p [ i , j ] 2 C o n t r a s t = ∑ i = 1 N g ∑ j = 1 N g n 2 p [ i , j ] , where | i − j | = n C o r r e l a t i o n = ∑ i = 1 N g ∑ j = 1 N g ( i j ) p [ i , j ] − μ x μ y σ x σ y E n t r o p y = − ∑ i ∑ j p [ i , j ] l n ( p [ i , j ] ) {\displaystyle {\begin{aligned}Angular{\text{ }}2nd{\text{ }}Moment&=\sum _{i}\sum _{j}p[i,j]^{2}\\Contrast&=\sum _{i=1}^{Ng}\sum _{j=1}^{Ng}n^{2}p[i,j]{\text{, where }}|i-j|=n\\Correlation&={\frac {\sum _{i=1}^{Ng}\sum _{j=1}^{Ng}(ij)p[i,j]-\mu _{x}\mu _{y}}{\sigma _{x}\sigma _{y}}}\\Entropy&=-\sum _{i}\sum _{j}p[i,j]ln(p[i,j])\\\end{aligned}}} where p [ i , j ] {\displaystyle p[i,j]} is the [ i , j ] {\displaystyle [i,j]} th entry in a gray-tone spatial dependence matrix, and Ng is the number of distinct gray-levels in the quantized image. One negative aspect of the co-occurrence matrix is that the extracted features do not necessarily correspond to visual perception. It is used in dentistry for the objective evaluation of lesions [DOI: 10.1155/2020/8831161], treatment efficacy [DOI: 10.3390/ma13163614; DOI: 10.11607/jomi.5686; DOI: 10.3390/ma13173854; DOI: 10.3390/ma13132935] and bone reconstruction during healing [DOI: 10.5114/aoms.2013.33557; DOI: 10.1259/dmfr/22185098; EID: 2-s2.0-81455161223; DOI: 10.3390/ma13163649]. === Laws texture energy measures === Another approach is to use local masks to detect various types of texture features. Laws originally used four vectors representing texture features to create sixteen 2D masks from the outer products of the pairs of vectors. The four vectors and relevant features were as follows: L5 = [ +1 +4 6 +4 +1 ] (Level) E5 = [ -1 -2 0 +2 +1 ] (Edge) S5 = [ -1 0 2 0 -1 ] (Spot) R5 = [ +1 -4 6 -4 +1 ] (Ripple) To these 4, a fifth is sometimes added: W5 = [ -1 +2 0 -2 +1 ] (Wave) From Laws' 4 vectors, 16 5x5 "energy maps" are then filtered down to 9 in order to remove certain symmetric pairs. For instance, L5E5 measures vertical edge content and E5L5 measures horizontal edge content. The average of these two measures is the "edginess" of the content. The resulting 9 maps used by Laws are as follows: L5E5/E5L5 L5R5/R5L5 E5S5/S5E5 S5S5 R5R5 L5S5/S5L5 E5E5 E5R5/R5E5 S5R5/R5S5 Running each of these nine maps over an image to create a new image of the value of the origin ([2,2]) results in 9 "energy maps," or conceptually an image with each pixel associated with a vector of 9 texture attributes. === Autocorrelation and power spectrum === The autocorrelation function of an image can be used to detect repetitive patterns of textures. == Texture segmentation == The use of image texture can be used as a description for regions into segments. There are two main types of segmentation based on image texture, region based and boundary based. Though image texture is not a perfect measure for segmentation it is used along with other measures, such as color, that helps solve segmenting in image. === Region based === Attempts to group or cluster pixels based on texture properties. === Boundary based === Attempts to group or cluster pixels based on edges between pixels that come from different texture properties.

    Read more →
  • Superintelligence ban

    Superintelligence ban

    Superintelligence ban refers to proposed legal, ethical, or policy measures intended to restrict or prohibit the development of artificial superintelligence, AI systems that would surpass human cognitive abilities in nearly all domains. The idea arises from concerns that such systems could become uncontrollable, potentially posing existential threats to humanity or causing severe social and economic disruption. == Background == The concept of limiting or banning superintelligence research has roots in early 21st-century debates on artificial general intelligence (AGI) safety. Thinkers such as Nick Bostrom and Eliezer Yudkowsky warned that self-improving AI could rapidly exceed human oversight. As advanced models like large-scale language models and autonomous agents began demonstrating complex reasoning abilities, policymakers and ethicists increasingly discussed the need for legal constraints on the creation of systems capable of recursive self-improvement. In October 2025, the Future of Life Institute published a statement calling for "a prohibition on the development of superintelligence, not lifted before there is broad scientific consensus that it will be done safely and controllably, and strong public buy-in." This statement was signed by various public personalities, such as Richard Branson and Steve Wozniak, and AI experts, such as Yoshua Bengio and Geoffrey Hinton. == Rationale == Supporters of a superintelligence ban argue that once AI systems surpass human intelligence, traditional containment, alignment, and control methods may fail. They contend that even limited experimentation with such systems could lead to irreversible outcomes, including loss of human decision-making power or unintended global harm. Some propose international treaties modeled after the nuclear non-proliferation framework to prevent a competitive AI arms race. Opponents argue that a ban would be difficult to define and enforce, given the lack of a precise threshold distinguishing advanced AGI from superintelligence. They also warn that excessive restriction could slow scientific progress, hinder beneficial automation, and encourage unregulated underground research. == Global discussion == Although no government has enacted an explicit superintelligence ban, the idea has been debated within the European Union, United Nations, and several independent AI safety organizations. The Future of Life Institute, Center for AI Safety, and other organizations have called for international cooperation to manage risks associated with the pursuit of superintelligent systems. In 2024 and 2025, proposals for a temporary moratorium on frontier AI research were circulated among major technology firms and research institutes, reflecting growing public concern over the trajectory of AI capabilities.

    Read more →
  • The AI Con

    The AI Con

    The AI Con: How to Fight Big Tech's Hype and Create the Future We Want is a 2025 non-fiction book by linguist Emily M. Bender and sociologist Alex Hanna. It argues that much of what is labeled "artificial intelligence" is a misleading term that obscures ordinary automation while concentrating power in a small number of technology firms. The book was published in May 2025 by Harper in the United States and Bodley Head in the United Kingdom. It was developed alongside the authors' long-running podcast Mystery AI Hype Theater 3000, which critiques exaggerated claims about AI. == Synopsis == The authors present AI as a marketing umbrella that encourages audiences to infer understanding and agency where none exist. They argue readers should treat such language skeptically and to separate specific automated tasks from broad claims of intelligence. The book describes a recurring hype cycle in which corporate narratives justify data and labor extraction, the replacement of human services with cheaper substitutes, and the diversion of attention from present harms to speculative futures. While acknowledging limited uses such as pattern recognition, the authors argue that contemporary systems are best understood as text and media generators shaped by training data and human labor, not as thinking or reasoning entities. A central theme is the social and environmental cost of scaling these systems, including increased energy and water use, the appropriation of creative work for training, and the outsourcing of ghost work to low-paid data workers worldwide. These costs are linked to workplace effects, with the authors arguing that automation rarely eliminates jobs outright and more often degrades them through surveillance, work intensification, and unpaid oversight. As alternatives to passive adoption, the authors propose concrete responses: asking precise questions about what is being automated and why, demanding transparency about data and evaluation, and practicing what they call strategic refusal when deployment conflicts with evidence or values. The book also develops a vocabulary for public debate, rejecting both boosterish and doomerish narratives as grounded in the same assumption that AI is a singular, autonomous force. The authors recommend reading strategies such as favoring trusted human sources over automated summaries and using humor to deflate inflated claims. They describe a link between language to policy and power, arguing that precise terminology can help policymakers and the public resist austerity-driven automation and demand accountability for errors and harms. == Reception == The Guardian praised the book's myth-busting approach and its analysis of how hype erodes cultural and civic life by normalizing synthetic media as a substitute for human judgment. Kirkus Reviews described it as a contrarian account that catalogs concrete risks while cutting through speculative predictions. An interview in Business Insider highlighted the authors' accessible frameworks, including their proposal to describe chatbots as conversation simulators and to evaluate systems in terms of values, labor, and evidence. Coverage in GeekWire emphasized the book's call for resistance through collective bargaining, stronger data rights, and a norm of rejecting deployments that fail basic standards of necessity and evaluation. Some reviews were more critical. A review in LLRX argued that the book's tone could be overly polemical and that it gave limited attention to potential benefits claimed for generative systems. Coverage in the Financial Times, focused on Bender's broader public scholarship, situated the book within her long-standing critique of anthropomorphic narratives about large language models and her advocacy for more democratic oversight of automated systems.

    Read more →
  • Bayesian programming

    Bayesian programming

    Bayesian programming is a formalism and a methodology for having a technique to specify probabilistic models and solve problems when less than the necessary information is available. Edwin T. Jaynes proposed that probability could be considered as an alternative and an extension of logic for rational reasoning with incomplete and uncertain information. In his founding book Probability Theory: The Logic of Science he developed this theory and proposed what he called "the robot," which was not a physical device, but an inference engine to automate probabilistic reasoning—a kind of Prolog for probability instead of logic. Bayesian programming is a formal and concrete implementation of this "robot". Bayesian programming may also be seen as an algebraic formalism to specify graphical models such as, for instance, Bayesian networks, dynamic Bayesian networks, Kalman filters or hidden Markov models. Indeed, Bayesian programming is more general than Bayesian networks and has a power of expression equivalent to probabilistic factor graphs. == Formalism == A Bayesian program is a means of specifying a family of probability distributions. The constituent elements of a Bayesian program are presented below: Program { Description { Specification ( π ) { Variables Decomposition Forms Identification (based on δ ) Question {\displaystyle {\text{Program}}{\begin{cases}{\text{Description}}{\begin{cases}{\text{Specification}}(\pi ){\begin{cases}{\text{Variables}}\\{\text{Decomposition}}\\{\text{Forms}}\\\end{cases}}\\{\text{Identification (based on }}\delta )\end{cases}}\\{\text{Question}}\end{cases}}} A program is constructed from a description and a question. A description is constructed using some specification ( π {\displaystyle \pi } ) as given by the programmer and an identification or learning process for the parameters not completely specified by the specification, using a data set ( δ {\displaystyle \delta } ). A specification is constructed from a set of pertinent variables, a decomposition and a set of forms. Forms are either parametric forms or questions to other Bayesian programs. A question specifies which probability distribution has to be computed. === Description === The purpose of a description is to specify an effective method of computing a joint probability distribution on a set of variables { X 1 , X 2 , ⋯ , X N } {\displaystyle \left\{X_{1},X_{2},\cdots ,X_{N}\right\}} given a set of experimental data δ {\displaystyle \delta } and some specification π {\displaystyle \pi } . This joint distribution is denoted as: P ( X 1 ∧ X 2 ∧ ⋯ ∧ X N ∣ δ ∧ π ) {\displaystyle P\left(X_{1}\wedge X_{2}\wedge \cdots \wedge X_{N}\mid \delta \wedge \pi \right)} . To specify preliminary knowledge π {\displaystyle \pi } , the programmer must undertake the following: Define the set of relevant variables { X 1 , X 2 , ⋯ , X N } {\displaystyle \left\{X_{1},X_{2},\cdots ,X_{N}\right\}} on which the joint distribution is defined. Decompose the joint distribution (break it into relevant independent or conditional probabilities). Define the forms of each of the distributions (e.g., for each variable, one of the list of probability distributions). ==== Decomposition ==== Given a partition of { X 1 , X 2 , … , X N } {\displaystyle \left\{X_{1},X_{2},\ldots ,X_{N}\right\}} containing K {\displaystyle K} subsets, K {\displaystyle K} variables are defined L 1 , ⋯ , L K {\displaystyle L_{1},\cdots ,L_{K}} , each corresponding to one of these subsets. Each variable L k {\displaystyle L_{k}} is obtained as the conjunction of the variables { X k 1 , X k 2 , ⋯ } {\displaystyle \left\{X_{k_{1}},X_{k_{2}},\cdots \right\}} belonging to the k t h {\displaystyle k^{th}} subset. Recursive application of Bayes' theorem leads to: P ( X 1 ∧ X 2 ∧ ⋯ ∧ X N ∣ δ ∧ π ) = P ( L 1 ∧ ⋯ ∧ L K ∣ δ ∧ π ) = P ( L 1 ∣ δ ∧ π ) × P ( L 2 ∣ L 1 ∧ δ ∧ π ) × ⋯ × P ( L K ∣ L K − 1 ∧ ⋯ ∧ L 1 ∧ δ ∧ π ) {\displaystyle {\begin{aligned}&P\left(X_{1}\wedge X_{2}\wedge \cdots \wedge X_{N}\mid \delta \wedge \pi \right)\\={}&P\left(L_{1}\wedge \cdots \wedge L_{K}\mid \delta \wedge \pi \right)\\={}&P\left(L_{1}\mid \delta \wedge \pi \right)\times P\left(L_{2}\mid L_{1}\wedge \delta \wedge \pi \right)\times \cdots \times P\left(L_{K}\mid L_{K-1}\wedge \cdots \wedge L_{1}\wedge \delta \wedge \pi \right)\end{aligned}}} Conditional independence hypotheses then allow further simplifications. A conditional independence hypothesis for variable L k {\displaystyle L_{k}} is defined by choosing some variable X n {\displaystyle X_{n}} among the variables appearing in the conjunction L k − 1 ∧ ⋯ ∧ L 2 ∧ L 1 {\displaystyle L_{k-1}\wedge \cdots \wedge L_{2}\wedge L_{1}} , labelling R k {\displaystyle R_{k}} as the conjunction of these chosen variables and setting: P ( L k ∣ L k − 1 ∧ ⋯ ∧ L 1 ∧ δ ∧ π ) = P ( L k ∣ R k ∧ δ ∧ π ) {\displaystyle P\left(L_{k}\mid L_{k-1}\wedge \cdots \wedge L_{1}\wedge \delta \wedge \pi \right)=P\left(L_{k}\mid R_{k}\wedge \delta \wedge \pi \right)} We then obtain: P ( X 1 ∧ X 2 ∧ ⋯ ∧ X N ∣ δ ∧ π ) = P ( L 1 ∣ δ ∧ π ) × P ( L 2 ∣ R 2 ∧ δ ∧ π ) × ⋯ × P ( L K ∣ R K ∧ δ ∧ π ) {\displaystyle {\begin{aligned}&P\left(X_{1}\wedge X_{2}\wedge \cdots \wedge X_{N}\mid \delta \wedge \pi \right)\\={}&P\left(L_{1}\mid \delta \wedge \pi \right)\times P\left(L_{2}\mid R_{2}\wedge \delta \wedge \pi \right)\times \cdots \times P\left(L_{K}\mid R_{K}\wedge \delta \wedge \pi \right)\end{aligned}}} Such a simplification of the joint distribution as a product of simpler distributions is called a decomposition, derived using the chain rule. This ensures that each variable appears at the most once on the left of a conditioning bar, which is the necessary and sufficient condition to write mathematically valid decompositions. ==== Forms ==== Each distribution P ( L k ∣ R k ∧ δ ∧ π ) {\displaystyle P\left(L_{k}\mid R_{k}\wedge \delta \wedge \pi \right)} appearing in the product is then associated with either a parametric form (i.e., a function f μ ( L k ) {\displaystyle f_{\mu }\left(L_{k}\right)} ) or a question to another Bayesian program P ( L k ∣ R k ∧ δ ∧ π ) = P ( L ∣ R ∧ δ ^ ∧ π ^ ) {\displaystyle P\left(L_{k}\mid R_{k}\wedge \delta \wedge \pi \right)=P\left(L\mid R\wedge {\widehat {\delta }}\wedge {\widehat {\pi }}\right)} . When it is a form f μ ( L k ) {\displaystyle f_{\mu }\left(L_{k}\right)} , in general, μ {\displaystyle \mu } is a vector of parameters that may depend on R k {\displaystyle R_{k}} or δ {\displaystyle \delta } or both. Learning takes place when some of these parameters are computed using the data set δ {\displaystyle \delta } . An important feature of Bayesian programming is this capacity to use questions to other Bayesian programs as components of the definition of a new Bayesian program. P ( L k ∣ R k ∧ δ ∧ π ) {\displaystyle P\left(L_{k}\mid R_{k}\wedge \delta \wedge \pi \right)} is obtained by some inferences done by another Bayesian program defined by the specifications π ^ {\displaystyle {\widehat {\pi }}} and the data δ ^ {\displaystyle {\widehat {\delta }}} . This is similar to calling a subroutine in classical programming and provides an easy way to build hierarchical models. === Question === Given a description (i.e., P ( X 1 ∧ X 2 ∧ ⋯ ∧ X N ∣ δ ∧ π ) {\displaystyle P\left(X_{1}\wedge X_{2}\wedge \cdots \wedge X_{N}\mid \delta \wedge \pi \right)} ), a question is obtained by partitioning { X 1 , X 2 , ⋯ , X N } {\displaystyle \left\{X_{1},X_{2},\cdots ,X_{N}\right\}} into three sets: the searched variables, the known variables and the free variables. The 3 variables S e a r c h e d {\displaystyle Searched} , K n o w n {\displaystyle Known} and F r e e {\displaystyle Free} are defined as the conjunction of the variables belonging to these sets. A question is defined as the set of distributions: P ( S e a r c h e d ∣ Known ∧ δ ∧ π ) {\displaystyle P\left(Searched\mid {\text{Known}}\wedge \delta \wedge \pi \right)} made of many "instantiated questions" as the cardinal of K n o w n {\displaystyle Known} , each instantiated question being the distribution: P ( Searched ∣ Known ∧ δ ∧ π ) {\displaystyle P\left({\text{Searched}}\mid {\text{Known}}\wedge \delta \wedge \pi \right)} === Inference === Given the joint distribution P ( X 1 ∧ X 2 ∧ ⋯ ∧ X N ∣ δ ∧ π ) {\displaystyle P\left(X_{1}\wedge X_{2}\wedge \cdots \wedge X_{N}\mid \delta \wedge \pi \right)} , it is always possible to compute any possible question using the following general inference: P ( Searched ∣ Known ∧ δ ∧ π ) = ∑ Free [ P ( Searched ∧ Free ∣ Known ∧ δ ∧ π ) ] = ∑ Free [ P ( Searched ∧ Free ∧ Known ∣ δ ∧ π ) ] P ( Known ∣ δ ∧ π ) = ∑ Free [ P ( Searched ∧ Free ∧ Known ∣ δ ∧ π ) ] ∑ Free ∧ Searched [ P ( Searched ∧ Free ∧ Known ∣ δ ∧ π ) ] = 1 Z × ∑ Free [ P ( Searched ∧ Free ∧ Known ∣ δ ∧ π ) ] {\displaystyle {\begin{aligned}&P\left({\text{Searched}}\mid {\text{Known}}\wedge \delta \wedge \pi \right)\\={}&\sum _{\text{Free}}\left[P\left({\text{Searched}}\wedge {\text{Free}}\mid {\text{Known}}\wedge \delta \wedge \

    Read more →
  • Instance selection

    Instance selection

    Instance selection (or dataset reduction, or dataset condensation) is an important data pre-processing step that can be applied in many machine learning (or data mining) tasks. Approaches for instance selection can be applied for reducing the original dataset to a manageable volume, leading to a reduction of the computational resources that are necessary for performing the learning process. Algorithms of instance selection can also be applied for removing noisy instances, before applying learning algorithms. This step can improve the accuracy in classification problems. Algorithm for instance selection should identify a subset of the total available data to achieve the original purpose of the data mining (or machine learning) application as if the whole data had been used. Considering this, the optimal outcome of IS would be the minimum data subset that can accomplish the same task with no performance loss, in comparison with the performance achieved when the task is performed using the whole available data. Therefore, every instance selection strategy should deal with a trade-off between the reduction rate of the dataset and the classification quality. == Instance selection algorithms == The literature provides several different algorithms for instance selection. They can be distinguished from each other according to several different criteria. Considering this, instance selection algorithms can be grouped in two main classes, according to what instances they select: algorithms that preserve the instances at the boundaries of classes and algorithms that preserve the internal instances of the classes. Within the category of algorithms that select instances at the boundaries it is possible to cite DROP3, ICF and LSBo. On the other hand, within the category of algorithms that select internal instances, it is possible to mention ENN and LSSm. In general, algorithm such as ENN and LSSm are used for removing harmful (noisy) instances from the dataset. They do not reduce the data as the algorithms that select border instances, but they remove instances at the boundaries that have a negative impact on the data mining task. They can be used by other instance selection algorithms, as a filtering step. For example, the ENN algorithm is used by DROP3 as the first step, and the LSSm algorithm is used by LSBo. There is also another group of algorithms that adopt different selection criteria. For example, the algorithms LDIS, CDIS and XLDIS select the densest instances in a given arbitrary neighborhood. The selected instances can include both, border and internal instances. The LDIS and CDIS algorithms are very simple and select subsets that are very representative of the original dataset. Besides that, since they search by the representative instances in each class separately, they are faster (in terms of time complexity and effective running time) than other algorithms, such as DROP3 and ICF. Besides that, there is a third category of algorithms that, instead of selecting actual instances of the dataset, select prototypes (that can be synthetic instances). In this category it is possible to include PSSA, PSDSP and PSSP. The three algorithms adopt the notion of spatial partition (a hyperrectangle) for identifying similar instances and extract prototypes for each set of similar instances. In general, these approaches can also be modified for selecting actual instances of the datasets. The algorithm ISDSP adopts a similar approach for selecting actual instances (instead of prototypes).

    Read more →
  • AI Security Institute

    AI Security Institute

    The AI Security Institute (AISI) is a research organisation under the Department for Science, Innovation and Technology, UK, that aims "to equip governments with a scientific understanding of the risks posed by advanced AI". It conducts research and develop and test mitigations. Previously, it was known as the AI Safety Institute. Its creation followed world's first major AI Safety Summit that was held in Bletchley Park in 2023. The institute's professed goal is "building the world's leading understanding of advanced AI risks and solutions, to inform governments so they can keep the public safe". It is designed like a startup in the government "combining the authority of government with the expertise and agility of the private sector". AISI has made access agreements with Anthropic, Google and OpenAI to test their models before release. It has an open source platform called Inspect that permits companies, governments and academics to run standardised safety tests for AI usage. Among the works AISI has done is the reported detection of multiple serious vulnerabilities that could enable development of biological weapons; the vulnerabilities were fixed before the model was launched. It conducts research on diverse fields of AI application. One study by AISI found that LLMs post-trained for political persuasiveness became systematically less accurate and up to 51% more persuasive on political issues. AISI has also worked on the usage of AI for emotional needs. It found that nearly 10 percent of UK citizens used systems like chatbots for emotional purposes on a weekly basis. It found that "systems are now outperforming PhD-level researchers on scientific knowledge tests and helping non-experts succeed at lab work that would previously have been out of reach" in a report published in December 2025. Former chief AI officer of GCHQ Adam Beaumont is the institution's interim director. UK prime minister's AI advisor Jade Leung is the chief technology officer.

    Read more →
  • Statistical learning theory

    Statistical learning theory

    Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis. Statistical learning theory deals with the statistical inference problem of finding a predictive function based on data. Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, and bioinformatics. == Introduction == The goals of learning are understanding and prediction. Learning falls into many categories, including supervised learning, unsupervised learning, online learning, and reinforcement learning. From the perspective of statistical learning theory, supervised learning is best understood. Supervised learning involves learning from a training set of data. Every point in the training is an input–output pair, where the input maps to an output. The learning problem consists of inferring the function that maps between the input and the output, such that the learned function can be used to predict the output from future input. Depending on the type of output, supervised learning problems are either problems of regression or problems of classification. If the output takes a continuous range of values, it is a regression problem. Using Ohm's law as an example, a regression could be performed with voltage as input and current as an output. The regression would find the functional relationship between voltage and current to be R {\displaystyle R} , such that V = I R {\displaystyle V=IR} Classification problems are those for which the output will be an element from a discrete set of labels. Classification is very common for machine learning applications. In facial recognition, for instance, a picture of a person's face would be the input, and the output label would be that person's name. The input would be represented by a large multidimensional vector whose elements represent pixels in the picture. After learning a function based on the training set data, that function is validated on a test set of data, data that did not appear in the training set. == Formal description == Take X {\displaystyle X} to be the vector space of all possible inputs, and Y {\displaystyle Y} to be the vector space of all possible outputs. Statistical learning theory takes the perspective that there is some unknown probability distribution over the product space Z = X × Y {\displaystyle Z=X\times Y} , i.e. there exists some unknown p ( z ) = p ( x , y ) {\displaystyle p(z)=p(\mathbf {x} ,y)} . The training set is made up of n {\displaystyle n} samples from this probability distribution, and is notated S = { ( x 1 , y 1 ) , … , ( x n , y n ) } = { z 1 , … , z n } {\displaystyle S=\{(\mathbf {x} _{1},y_{1}),\dots ,(\mathbf {x} _{n},y_{n})\}=\{\mathbf {z} _{1},\dots ,\mathbf {z} _{n}\}} Every x i {\displaystyle \mathbf {x} _{i}} is an input vector from the training data, and y i {\displaystyle y_{i}} is the output that corresponds to it. In this formalism, the inference problem consists of finding a function f : X → Y {\displaystyle f:X\to Y} such that f ( x ) ∼ y {\displaystyle f(\mathbf {x} )\sim y} . Let H {\displaystyle {\mathcal {H}}} be a space of functions f : X → Y {\displaystyle f:X\to Y} called the hypothesis space. The hypothesis space is the space of functions the algorithm will search through. Let V ( f ( x ) , y ) {\displaystyle V(f(\mathbf {x} ),y)} be the loss function, a metric for the difference between the predicted value f ( x ) {\displaystyle f(\mathbf {x} )} and the actual value y {\displaystyle y} . The expected risk is defined to be I [ f ] = ∫ X × Y V ( f ( x ) , y ) p ( x , y ) d x d y {\displaystyle I[f]=\int _{X\times Y}V(f(\mathbf {x} ),y)\,p(\mathbf {x} ,y)\,d\mathbf {x} \,dy} The target function, the best possible function f {\displaystyle f} that can be chosen, is given by the f {\displaystyle f} that satisfies f = argmin h ∈ H ⁡ I [ h ] {\displaystyle f=\mathop {\operatorname {argmin} } _{h\in {\mathcal {H}}}I[h]} Because the probability distribution p ( x , y ) {\displaystyle p(\mathbf {x} ,y)} is unknown, a proxy measure for the expected risk must be used. This measure is based on the training set, a sample from this unknown probability distribution. It is called the empirical risk I S [ f ] = 1 n ∑ i = 1 n V ( f ( x i ) , y i ) {\displaystyle I_{S}[f]={\frac {1}{n}}\sum _{i=1}^{n}V(f(\mathbf {x} _{i}),y_{i})} A learning algorithm that chooses the function f S {\displaystyle f_{S}} that minimizes the empirical risk is called empirical risk minimization. == Loss functions == The choice of loss function is a determining factor on the function f S {\displaystyle f_{S}} that will be chosen by the learning algorithm. The loss function also affects the convergence rate for an algorithm. It is important for the loss function to be convex. Different loss functions are used depending on whether the problem is one of regression or one of classification. === Regression === The most common loss function for regression is the square loss function (also known as the L2-norm). This familiar loss function is used in Ordinary Least Squares regression. The form is: V ( f ( x ) , y ) = ( y − f ( x ) ) 2 {\displaystyle V(f(\mathbf {x} ),y)=(y-f(\mathbf {x} ))^{2}} The absolute value loss (also known as the L1-norm) is also sometimes used: V ( f ( x ) , y ) = | y − f ( x ) | {\displaystyle V(f(\mathbf {x} ),y)=|y-f(\mathbf {x} )|} === Classification === In some sense the 0-1 indicator function is the most natural loss function for classification. It takes the value 0 if the predicted output is the same as the actual output, and it takes the value 1 if the predicted output is different from the actual output. For binary classification with Y = { − 1 , 1 } {\displaystyle Y=\{-1,1\}} , this is: V ( f ( x ) , y ) = θ ( − y f ( x ) ) {\displaystyle V(f(\mathbf {x} ),y)=\theta (-yf(\mathbf {x} ))} where θ {\displaystyle \theta } is the Heaviside step function. == Regularization == In machine learning problems, a major problem that arises is that of overfitting. Because learning is a prediction problem, the goal is not to find a function that most closely fits the (previously observed) data, but to find one that will most accurately predict output from future input. Empirical risk minimization runs this risk of overfitting: finding a function that matches the data exactly but does not predict future output well. Overfitting is symptomatic of unstable solutions; a small perturbation in the training set data would cause a large variation in the learned function. It can be shown that if the stability for the solution can be guaranteed, generalization and consistency are guaranteed as well. Regularization can solve the overfitting problem and give the problem stability. Regularization can be accomplished by restricting the hypothesis space H {\displaystyle {\mathcal {H}}} . A common example would be restricting H {\displaystyle {\mathcal {H}}} to linear functions: this can be seen as a reduction to the standard problem of linear regression. H {\displaystyle {\mathcal {H}}} could also be restricted to polynomial of degree p {\displaystyle p} , exponentials, or bounded functions on L1. Restriction of the hypothesis space avoids overfitting because the form of the potential functions are limited, and so does not allow for the choice of a function that gives empirical risk arbitrarily close to zero. One example of regularization is Tikhonov regularization. This consists of minimizing 1 n ∑ i = 1 n V ( f ( x i ) , y i ) + γ ‖ f ‖ H 2 {\displaystyle {\frac {1}{n}}\sum _{i=1}^{n}V(f(\mathbf {x} _{i}),y_{i})+\gamma \left\|f\right\|_{\mathcal {H}}^{2}} where γ {\displaystyle \gamma } is a fixed and positive parameter, the regularization parameter. Tikhonov regularization ensures existence, uniqueness, and stability of the solution. == Bounding empirical risk == Consider a binary classifier f : X → { 0 , 1 } {\displaystyle f:{\mathcal {X}}\to \{0,1\}} . We can apply Hoeffding's inequality to bound the probability that the empirical risk deviates from the true risk to be a Sub-Gaussian distribution. P ( | R ^ ( f ) − R ( f ) | ≥ ϵ ) ≤ 2 e − 2 n ϵ 2 {\displaystyle \mathbb {P} (|{\hat {R}}(f)-R(f)|\geq \epsilon )\leq 2e^{-2n\epsilon ^{2}}} But generally, when we do empirical risk minimization, we are not given a classifier; we must choose it. Therefore, a more useful result is to bound the probability of the supremum of the difference over the whole class. P ( sup f ∈ F | R ^ ( f ) − R ( f ) | ≥ ϵ ) ≤ 2 S ( F , n ) e − n ϵ 2 / 8 ≈ n d e − n ϵ 2 / 8 {\displaystyle \mathbb {P} {\bigg (}\sup _{f\in {\mathcal {F}}}|{\hat {R}}(f)-R(f)|\geq \epsilon {\bigg )}\leq 2S({\mathcal {F}},n)e^{-n\epsilon ^{2}/8}\approx n^{d}e^{-n\epsilon ^{2}/8}} where S ( F , n ) {\displaystyle S({\mathcal {F}},n)} is the shattering number and n {\displaystyle n} is the number of samples in your dataset. The exponential term comes from Hoeffding but there is an extra cost of taking the supremum over the whole cla

    Read more →
  • Artificial intelligence in education

    Artificial intelligence in education

    Artificial intelligence in education (often abbreviated as AIEd) is a subfield of educational technology that studies how to use artificial intelligence to create learning environments. Considerations in the field include data-driven decision-making, AI ethics, data privacy and AI literacy. Concerns include the potential for cheating, over-reliance, equity of access, reduced critical thinking, and the perpetuation of misinformation and bias. == History == Efforts to integrate AI into educational contexts have often followed technological advancement in the history of artificial intelligence. In the 1960s, educators and researchers began developing computer-based instruction systems, such as PLATO, developed by the University of Illinois. In the 1970s and 1980s, intelligent tutoring systems (ITS) were being adapted for classroom instruction. The International Artificial Intelligence in Education Society was founded in 1993. Coinciding with the AI boom of the 2020s, the use of large language models in the global north has been promoted and funded by venture capital and big tech. Companies creating AI services have targeted students and educational institutions as customers. Similarly, pre-AI boom educational companies have expanded their use of AI technologies. These commercial incentives for AIEd use may be related to a potential AI bubble. In the U.S., bipartisan support of AI development in K-12 education has been expressed, but specific implementations and best practices remain contentious. == Theory == AIEd applies theory from education studies, machine learning, and related fields. A 2019 review of the previous decade of studies found that most research prioritized technological design over pedagogical integration. Ouyang and Jiao (2021) propose three paradigms for AI in education, which follow roughly from least to most learner-centered and from requiring least to most technical complexity from the AI systems: AI-directed, learner-as-recipient: AIEd systems present a pre-set curriculum based on statistical patterns that do not adjust to learner's feedback. AI-supported, learner-as-collaborator: Systems that incorporate responsiveness to learner's feedback through, for example, natural language processing, wherein AI can support knowledge construction. AI-empowered, learner-as-leader: This model seeks to position AI as a supplement to human intelligence wherein learners take agency and AI provides consistent and actionable feedback. Some scholars place AI in education within a socio-technical framework. This positions AI alongside other emerging educational technologies, such as computing, the internet, and social media. The framework of Tsao, Heinrichs and Camit (2025) draws on new materialism and posthumanism, specifically Donna Haraway's concept of sympoiesis (making-with). This perspective views learning as an entanglement of human and non-human actors (students, teachers, and AI algorithms), where knowledge is co-composed in contact zones between human context and algorithmic prediction. AI agents have been trained on biased datasets, and thus continue to perpetuate societal biases. Since LLMs were created to produce human-like text, algorithmic bias can be introduced and reproduced. AI's data processing and monitoring reinforce neoliberal approaches to education rather than addressing inequalities. == Applications == Uses of generative AI chatbots in education have included assessment and feedback, machine translations, proof-reading exam question generation and copy editing, or as virtual assistants. Emotional AI in education is the study and development of systems that can detect learners' emotions or provide emotional support in learning. == Usage == === Schools and educators === Following the release of ChatGPT in November 2022, some schools and large school districts blocked access to the site and issued warnings that the use of such tools would be seen as cheating. Governmental and non-governmental organizations such as UNESCO, Article 4 of the European Union's AI Act, and the U.S. Department of Education have published reports advocating for specific AIEd approaches. National higher-education bodies have also published guidance on generative AI, including Ireland's Higher Education Authority, which issued a policy framework for higher education teaching and learning in December 2025. In 2024, UNESCO released updated global guidance for generative AI in education, emphasizing ethical use, teacher training, and data protection to ensure responsible integration of AI tools in learning environments. According to Taso (2025), policy implementation in higher education is interpreted and enacted differently by various organizations. These decentralized policies can lead to inconsistent enforcement and confusion among students regarding what constitutes acceptable use, with the burden of ethical navigation falling on individual teachers and students. AI integration in classrooms has created new forms of invisible labour for educators, who must navigate ambiguous policies, redesign assessments to be AI-resilient, and adjudicate potential academic integrity violations. The use of AI detection tools has also been criticised for creating an adversarial relationship between students and institutions, where students may be falsely accused of misconduct based on probabilistic software. AIEd advocates say that efforts should be made towards increasing global accessibility and training educators to serve underprivileged areas. === Students === Reliance on generative AI has been linked with reduced academic self-esteem and performance, and heightened learned helplessness. Algorithm errors and hallucinations are common flaws in AI agents, making them less trustworthy and reliable. According to a 2025 survey from Inside Higher Ed, 85% of higher education students use generative AI technology in some way, with 25% using AI to complete assignments for them. The most common reason cited for using AI to cheat was pressure to get high grades. 97% of students wanted some form of action from schools on the threat to academic integrity caused by AI, with the most popular options being clearer policies and more education about ethical uses of AI. In September 2025, The Atlantic published an op-ed from a high school senior arguing that the normalization of AI cheating was eroding critical thinking, academic integrity, creativity, and the shared student experience.

    Read more →
  • SAP NetWeaver Visual Composer

    SAP NetWeaver Visual Composer

    SAP NetWeaver Visual Composer is SAP’s web-based software modelling tool. It enables business process specialists and developers to create business application components, without coding. Visual Composer produces applications in a declarative form, enabling code-free execution mode for multiple runtime environments. It provides application lifecycle support by maintaining the connection between an application and its model throughout its lifecycle. Visual Composer is designed with an open architecture, which enables developers to extend its design-time environment and modelling language, as well as to integrate external data services. The tool aims to increase productivity by reducing development effort time, and narrowing the gap between application definition and implementation. Starting with a blank canvas, the Visual Composer user, typically a business process specialist, draws the application in Visual Composer Storyboard (workspace), without writing code, to prototype, design and produce applications. A typical workflow for creating, deploying and running an application using Visual Composer is: Create a model Discover data services and add them to the model Select necessary UI elements and add them to the model Connect model elements to define the model logic and data flow Edit the layout Arranging the UI elements and the controls of the application on forms and tables. Deploy the model This step includes compilation, validation and deployment to a selected environment. Run the application The application can run using different runtime environment (such as Adobe Flex and HTML). In 2014 a runtime environment was introduced that is utilizing HTML5 capabilities of SAPUI5.

    Read more →
  • Label noise

    Label noise

    Label noise refers to errors or inaccuracies in the class labels of data instances. This is a widespread issue in machine learning datasets, arising from human annotator mistakes, unclear labeling instructions, automated labeling methods, or adversarial attacks in supervised learning. Label noise can be roughly divided into random noise, where labels are flipped independently of input features, and systematic noise, where mislabeling is dependent on certain patterns or biases in the data. Label noise can be damaging to model performance, especially for complex models that may overfit to noisy labels rather than generalizable patterns. Many approaches have been proposed to deal with the effects of label noise, including robust loss functions, noise-tolerant algorithms, data cleaning methods, and semi-supervised learning approaches. To reduce the impact of wrong labels during training, techniques like label smoothing, sample reweighting and using trusted validation sets are used. The role of noise-robust training paradigms and curriculum learning strategies to improve resilience against mislabeled data is also explored in recent research.

    Read more →
  • Gödel machine

    Gödel machine

    A Gödel machine is a hypothetical self-improving computer program that solves problems in an optimal way. It uses a recursive self-improvement protocol in which it rewrites its own code when it can prove the new code provides a better strategy. The machine was invented by Jürgen Schmidhuber (first proposed in 2003), but is named after Kurt Gödel who inspired the mathematical theories. The Gödel machine is often discussed when dealing with issues of meta-learning, also known as "learning to learn." Applications include automating human design decisions and transfer of knowledge between multiple related tasks, and may lead to design of more robust and general learning architectures. Though theoretically possible, no full implementation has been created. The Gödel machine is often compared with Marcus Hutter's AIXI, another formal specification for an artificial general intelligence. Schmidhuber points out that the Gödel machine could start out by implementing AIXItl as its initial sub-program, and self-modify after it finds proof that another algorithm for its search code will be better. == Limitations == Traditional problems solved by a computer only require one input and provide some output. Computers of this sort had their initial algorithm hardwired. This does not take into account the dynamic natural environment, and thus was a goal for the Gödel machine to overcome. The Gödel machine has limitations of its own, however. According to Gödel's First Incompleteness Theorem, any formal system that encompasses arithmetic is either flawed or allows for statements that cannot be proved in the system. Hence even a Gödel machine with unlimited computational resources must ignore those self-improvements whose effectiveness it cannot prove. == Variables of interest == There are three variables that are particularly useful in the run time of the Gödel machine. At some time t {\displaystyle t} , the variable time {\displaystyle {\text{time}}} will have the binary equivalent of t {\displaystyle t} . This is incremented steadily throughout the run time of the machine. Any input meant for the Gödel machine from the natural environment is stored in variable x {\displaystyle x} . It is likely the case that x {\displaystyle x} will hold different values for different values of variable time {\displaystyle {\text{time}}} . The outputs of the Gödel machine are stored in variable y {\displaystyle y} , where y ( t ) {\displaystyle y(t)} would be the output bit-string at some time t {\displaystyle t} . At any given time t {\displaystyle t} , where ( 1 ≤ t ≤ T ) {\displaystyle (1\leq t\leq T)} , the goal is to maximize future success or utility. A typical utility function follows the pattern u ( s , E n v ) : S × E → R {\displaystyle u(s,\mathrm {Env} ):S\times E\rightarrow \mathbb {R} } : u ( s , E n v ) = E μ [ ∑ τ = time T r ( τ ) ∣ s , E n v ] {\displaystyle u(s,\mathrm {Env} )=E_{\mu }{\Bigg [}\sum _{\tau ={\text{time}}}^{T}r(\tau )\mid s,\mathrm {Env} {\Bigg ]}} where r ( t ) {\displaystyle r(t)} is a real-valued reward input (encoded within s ( t ) {\displaystyle s(t)} ) at time t {\displaystyle t} , E μ [ ⋅ ∣ ⋅ ] {\displaystyle E_{\mu }[\cdot \mid \cdot ]} denotes the conditional expectation operator with respect to some possibly unknown distribution μ {\displaystyle \mu } from a set M {\displaystyle M} of possible distributions ( M {\displaystyle M} reflects whatever is known about the possibly probabilistic reactions of the environment), and the above-mentioned time = time ⁡ ( s ) {\displaystyle {\text{time}}=\operatorname {time} (s)} is a function of state s {\displaystyle s} which uniquely identifies the current cycle. Note that we take into account the possibility of extending the expected lifespan through appropriate actions. == Instructions used by proof techniques == The nature of the six proof-modifying instructions below makes it impossible to insert an incorrect theorem into proof, thus trivializing proof verification. === get-axiom(n) === Appends the n-th axiom as a theorem to the current theorem sequence. Below is the initial axiom scheme: Hardware Axioms formally specify how components of the machine could change from one cycle to the next. Reward Axioms define the computational cost of hardware instruction and the physical cost of output actions. Related Axioms also define the lifetime of the Gödel machine as scalar quantities representing all rewards/costs. Environment Axioms restrict the way new inputs x are produced from the environment, based on previous sequences of inputs y. Uncertainty Axioms/String Manipulation Axioms are standard axioms for arithmetic, calculus, probability theory, and string manipulation that allow for the construction of proofs related to future variable values within the Gödel machine. Initial State Axioms contain information about how to reconstruct parts or all of the initial state. Utility Axioms describe the overall goal in the form of utility function u. === apply-rule(k, m, n) === Takes in the index k of an inference rule (such as Modus tollens, Modus ponens), and attempts to apply it to the two previously proved theorems m and n. The resulting theorem is then added to the proof. === delete-theorem(m) === Deletes the theorem stored at index m in the current proof. This helps to mitigate storage constraints caused by redundant and unnecessary theorems. Deleted theorems can no longer be referenced by the above apply-rule function. === set-switchprog(m, n) === Replaces switchprog S pm:n, provided it is a non-empty substring of S p. === check() === Verifies whether the goal of the proof search has been reached. A target theorem states that given the current axiomatized utility function u (Item 1f), the utility of a switch from p to the current switchprog would be higher than the utility of continuing the execution of p (which would keep searching for alternative switchprogs). === state2theorem(m, n) === Takes in two arguments, m and n, and attempts to convert the contents of Sm:n into a theorem. == Example applications == === Time-limited NP-hard optimization === The initial input to the Gödel machine is the representation of a connected graph with a large number of nodes linked by edges of various lengths. Within given time T it should find a cyclic path connecting all nodes. The only real-valued reward will occur at time T. It equals 1 divided by the length of the best path found so far (0 if none was found). There are no other inputs. The by-product of maximizing expected reward is to find the shortest path findable within the limited time, given the initial bias. === Fast theorem proving === Prove or disprove as quickly as possible that all even integers > 2 are the sum of two primes (Goldbach’s conjecture). The reward is 1/t, where t is the time required to produce and verify the first such proof. === Maximizing expected reward with bounded resources === A cognitive robot that needs at least 1 liter of gasoline per hour interacts with a partially unknown environment, trying to find hidden, limited gasoline depots to occasionally refuel its tank. It is rewarded in proportion to its lifetime, and dies after at most 100 years or as soon as its tank is empty or it falls off a cliff, and so on. The probabilistic environmental reactions are initially unknown but assumed to be sampled from the axiomatized Speed Prior, according to which hard-to-compute environmental reactions are unlikely. This permits a computable strategy for making near-optimal predictions. One by-product of maximizing expected reward is to maximize expected lifetime.

    Read more →