AI Headshot Generator

AI Headshot Generator — hands-on reviews, top picks, pricing, pros and cons and a practical how-to guide on Aizhi.

  • Cognitive philology

    Cognitive philology

    Cognitive philology is the science that studies written and oral texts as the product of human mental processes. Studies in cognitive philology compare documentary evidence emerging from textual investigations with results of experimental research, especially in the fields of cognitive and ecological psychology, neurosciences and artificial intelligence. "The point is not the text, but the mind that made it". Cognitive Philology aims to foster communication between literary, textual, philological disciplines on the one hand and researches across the whole range of the cognitive, evolutionary, ecological and human sciences on the other. Cognitive philology: investigates transmission of oral and written text, and categorization processes which lead to classification of knowledge, mostly relying on the information theory; studies how narratives emerge in so called natural conversation and selective process which lead to the rise of literary standards for storytelling, mostly relying on embodied semantics; explores the evolutive and evolutionary role played by rhythm and metre in human ontogenetic and phylogenetic development and the pertinence of the semantic association during processing of cognitive maps; Provides the scientific ground for multimedia critical editions of literary texts. Among the founding thinkers and noteworthy scholars devoted to such investigations are: Alan Richardson: Studies Theory of Mind in early-modern and contemporary literature. Anatole Pierre Fuksas Benoît de Cornulier David Herman: Professor of English at North Carolina State University and an adjunct professor of linguistics at Duke University. He is the author of "Universal Grammar and Narrative Form" and the editor of "Narratologies: New Perspectives on Narrative Analysis". Domenico Fiormonte François Recanati Gilles Fauconnier, a professor in Cognitive science at the University of California, San Diego. He was one of the founders of cognitive linguistics in the 1970s through his work on pragmatic scales and mental spaces. His research explores the areas of conceptual integration and compressions of conceptual mappings in terms of the emergent structure in language. Julián Santano Moreno Luca Nobile Manfred Jahn in Germany Mark Turner Paolo Canettieri

    Read more →
  • ASR-complete

    ASR-complete

    ASR-complete is, by analogy to "NP-completeness" in complexity theory, a term to indicate that the difficulty of a computational problem is equivalent to solving the central automatic speech recognition problem, i.e. recognize and understanding spoken language. Unlike "NP-completeness", this term is typically used informally. Such problems are hypothesised to include: Spoken natural language understanding Understanding speech from far-field microphones, i.e. handling the reverbation and background noise These problems are easy for humans to do (in fact, they are described directly in terms of imitating humans). Some systems can solve very simple restricted versions of these problems, but none can solve them in their full generality.

    Read more →
  • Neuro-symbolic AI

    Neuro-symbolic AI

    Neuro-symbolic AI is a subfield of artificial intelligence that integrates neural methods (e.g., neural networks and deep learning) with symbolic methods (e.g., formal logic, knowledge representation, and automated reasoning). The goal is to combine the strengths of both approaches, resulting in AI systems that can be trained from raw data and demonstrate robustness against outliers or errors in the base data, while preserving explainability, explicit use of expert knowledge, and explicit cognitive reasoning. As argued by Leslie Valiant and others, the effective construction of rich computational cognitive models demands the combination of symbolic reasoning and efficient machine learning. Gary Marcus argued, "We cannot construct rich cognitive models in an adequate, automated way without the triumvirate of hybrid architecture, rich prior knowledge, and sophisticated techniques for reasoning." Further, "To build a robust, knowledge-driven approach to AI we must have the machinery of symbol manipulation in our toolkit. Too much of useful knowledge is abstract to make do without tools that represent and manipulate abstraction, and to date, the only known machinery that can manipulate such abstract knowledge reliably is the apparatus of symbol manipulation." Angelo Dalli, Henry Kautz, Francesca Rossi, and Bart Selman also argued for such a synthesis. Their arguments attempt to address the two kinds of thinking, as discussed in Daniel Kahneman's book Thinking, Fast and Slow. It describes cognition as encompassing two components: System 1 is fast, reflexive, intuitive, and unconscious. System 2 is slower, step-by-step, and explicit. System 1 is used for pattern recognition. System 2 handles planning, deduction, and deliberative thinking. In this view, deep learning best handles the first kind of cognition, while symbolic reasoning best handles the second kind. Both are necessary for the development of a robust and reliable AI system capable of learning, reasoning, and interacting with humans to accept advice and answer questions. Since the 1990s, dual-process models with explicit references to the two contrasting systems have been the focus of research in both the fields of AI and cognitive science by numerous researchers. In 2025, the adoption of neurosymbolic AI, an approach that integrates neural networks with symbolic reasoning, increased in response to the need to address hallucination issues in large language models. For example, Amazon implemented Neurosymbolic AI in its Vulcan warehouse robots and Rufus shopping assistant to enhance accuracy and decision-making. == Approaches == Approaches for integration are diverse. Henry Kautz's taxonomy of neuro-symbolic architectures follows, along with some examples: Symbolic Neural symbolic is the current approach of many neural models in natural language processing, where words or subword tokens are the ultimate input and output of large language models. Examples include BERT, RoBERTa, and GPT-3. Symbolic[Neural] is exemplified by AlphaGo, where symbolic techniques are used to invoke neural techniques. In this case, the symbolic approach is Monte Carlo tree search and the neural techniques learn how to evaluate game positions. Neural | Symbolic uses a neural architecture to interpret perceptual data as symbols and relationships that are reasoned about symbolically. Neural-Concept Learner is an example. Neural: Symbolic → Neural relies on symbolic reasoning to generate or label training data that is subsequently learned by a deep learning model, e.g., to train a neural model for symbolic computation by using a Macsyma-like symbolic mathematics system to create or label examples. NeuralSymbolic uses a neural net that is generated from symbolic rules. An example is the Neural Theorem Prover, which constructs a neural network from an AND-OR proof tree generated from knowledge base rules and terms. Logic Tensor Networks also fall into this category. Neural[Symbolic] according to Kautz, this approach embeds true symbolic reasoning inside a neural network. These are tightly-coupled neural-symbolic systems, in which the logical inference rules are internal to the neural network. This way, the neural network internally computes the inference from the premises and learns to reason based on logical inference systems. Early work on connectionist modal and temporal logics by Garcez, Lamb, and Gabbay is aligned with this approach. These categories are not exhaustive, as they do not consider multi-agent systems. In 2005, Bader and Hitzler presented a more fine-grained categorization that took into account, e.g., whether the use of symbols included logic and, if so, whether the logic was propositional or first-order logic. The 2005 categorization and Kautz's taxonomy above are compared and contrasted in a 2021 article. Sepp Hochreiter argued that Graph Neural Networks "...are the predominant models of neural-symbolic computing" since "[t]hey describe the properties of molecules, simulate social networks, or predict future states in physical and engineering applications with particle-particle interactions." == Artificial general intelligence == Gary Marcus argues that "...hybrid architectures that combine learning and symbol manipulation are necessary for robust intelligence, but not sufficient", and that there are ...four cognitive prerequisites for building robust artificial intelligence: hybrid architectures that combine large-scale learning with the representational and computational powers of symbol manipulation, large-scale knowledge bases—likely leveraging innate frameworks—that incorporate symbolic knowledge along with other forms of knowledge, reasoning mechanisms capable of leveraging those knowledge bases in tractable ways, and rich cognitive models that work together with those mechanisms and knowledge bases. This echoes earlier calls for hybrid models as early as the 1990s. == History == Garcez and Lamb described research in this area as ongoing, at least since the 1990s. During that period, the terms symbolic and sub-symbolic AI were popular. A series of workshops on neuro-symbolic AI has been held annually since 2005 Neuro-Symbolic Artificial Intelligence. In the early 1990s, an initial set of workshops on this topic were organized. == Research == Key research questions remain, such as: What is the best way to integrate neural and symbolic architectures? How should symbolic structures be represented within neural networks and extracted from them? How should common-sense knowledge be learned and reasoned about? How can abstract knowledge that is hard to encode logically be handled? == Implementations == Implementations of neuro-symbolic approaches include: AllegroGraph: an integrated Knowledge Graph based platform for neuro-symbolic application development. Scallop: a language based on Datalog that supports differentiable logical and relational reasoning. Scallop can be integrated in Python and with a PyTorch learning module. Logic Tensor Networks: encode logical formulas as neural networks and simultaneously learn term encodings, term weights, and formula weights. DeepProbLog: combines neural networks with the probabilistic reasoning of ProbLog. Abductive Learning: integrates machine learning and logical reasoning in a balanced-loop via abductive reasoning, enabling them to work together in a mutually beneficial way. SymbolicAI: a compositional differentiable programming library.

    Read more →
  • United States Tech Force

    United States Tech Force

    The U.S. Tech Force (also styled as US Tech Force, Tech Force, or Government Tech Force) is a federal hiring initiative launched by the second Donald Trump administration in December 2025. The program, administered by the Office of Personnel Management (OPM), aims to recruit about 1,000 early-career technology professionals into two-year government jobs to modernize federal IT systems, advance artificial intelligence (AI) capabilities, and address technological gaps in government operations. The initiative is an effort to plug capability gaps created by Trump-administration efforts to shrink the federal government, which led to the departure of some 220,000 federal employees, including many in IT. The initiative seeks early-career workers; officials said it would offer competitive salaries and opportunities to work on high-impact government technology projects. Major technology companies—including Amazon, Apple, Microsoft, Nvidia, Meta, Google, and OpenAI—agreed to help identify and refer candidates. Candidates are allowed to take Tech Force positions on leaves of absence and without divesting their stock, raising conflict-of-interest questions. In January 2026, OPM direction Scott Kupor said the deadline for applying to Tech Force was being extended because of "tremendous interest" without saying how many people had actually applied. Also in December 2025, news broke that the administration is planning another novel use of private-sector workers: hiring cybersecurity firms for offensive cyber operations.

    Read more →
  • Image destriping

    Image destriping

    Image destriping is the process of removing stripes or streaks from images and videos without disrupting the original image/video. These artifacts plague a range of fields in scientific imaging including atomic force microscopy, light sheet fluorescence microscopy, and planetary satellite imaging. The most common image processing techniques to reduce stripe artifacts is with Fourier filtering. Unfortunately, filtering methods risk altering or suppressing useful image data. Methods developed for multiple-sensor imaging systems in planetary satellites use statistical-based methods to match signal distribution across multiple sensors. More recently, a new class of approaches leverage compressed sensing, to regularize an optimization problem, and recover stripe free images. In many cases, these destriped images have little to no artifacts, even at low signal to noise ratios.

    Read more →
  • H (company)

    H (company)

    H Company, also known simply as H, is a French artificial intelligence startup which develops "action-oriented" artificial intelligence agents for enterprise automation and productivity. In May 2024, H Company closed a record-setting $220 million seed round, at the time the largest AI raise in Europe. In 2026, H Company released Holo 3, the latest generation of its computer-use AI models. The update marked a major advance in agentic AI, enabling agents to navigate any user interface, interpret screens, and complete complex, multi-step tasks across enterprise systems—much like a human user. This breakthrough positioned H Company at the frontier of computer-use autonomy, accelerating the integration of AI in enterprise workflows. == History == H Company was founded in 2023 in Paris by Laurent Sifre, Charles Kantor, and three DeepMind veterans: Daan Wiestra, Karl Tuyls, Julien Perollat. In May 2024, the firm secured what was then the largest European AI seed round, totaling $220 million led by US investors including Eric Schmidt (former Google CEO), Amazon, and backed by Accel, Bpifrance, UiPath, Eurazeo, Xavier Niel, Yuri Milner, Bernard Arnault, Samsung and others. In August 2024, three cofounders (Wiestra, Tuyls, Perollat) left the company over operational disagreements. In November 2024, H launched Runner H, its first agentic-API platform, which combined a large language model (LLM) and a reduced, 2-billion parameter vision-language model (VLM). In May 2025, H Company acquired Mithril Security, and in June 2025 the company widened its offering for agentic models. In June 2025, Gautier Cloix (formerly CEO Palantir France) replaced Charles Kantor as CEO of H Company, aiming to pivot the company towards a "forward deployed engineers" model. In July 2025, H Company introduced Surfer-H-CLI, an open-source, web-native Chrome agent designed for browser-based automation—able to search, scroll, click, and type on behalf of users and controllable via any visual language model (VLM). When paired with its June 2025 open-sourced 3B-parameter Holo-1 model, Surfer-H-CLI achieved 92.2% WebVoyager benchmark accuracy. == Activity == H Company creates enterprise AI models and agents (agentic AI) to automate and optimize complex workflows. H Company specifically designs AI agents called computer use capable of autonomously interfacing with any software (local or cloud-based) to detect and automate repetitive operations. H Company is based in Paris, France, with international offices in London and New York. H Company raised $220 million since its inception. Gautier Cloix is president and CEO of the company. H Company client include the French national lottery FDJ United. In March 2026, H Company released Holo3, a family of artificial intelligence models designed to operate digital systems by interacting directly with user interfaces. Holo3 enables agents ("virtual humanoids") to understand what is displayed in front-end environments—such as web pages, desktop applications, and other graphical user interfaces—and perform actions such as clicking, typing, and navigating across them to complete multi-step tasks. On the OSWorld-Verified benchmark, Holo3 reportedly achieved about 78.9%, surpassing the scores of OpenAI’s GPT‑5.4 and Anthropic’s Claude Opus 4.6 on this specific test, at roughly one-tenth of the inference cost of these proprietary systems. The release has been presented as a significant step toward automating routine digital workflows, allowing organizations to offload repetitive on-screen work, such as data entry and reconciliation across multiple tools, to AI-based agents.

    Read more →
  • Bag-of-words model

    Bag-of-words model

    The bag-of-words (BoW) model is a model of text which uses an unordered collection (a "bag") of words. It is used in natural language processing and information retrieval (IR). It disregards word order (and thus most of syntax or grammar) but captures multiplicity. The bag-of-words model is commonly used in methods of document classification where, for example, the (frequency of) occurrence of each word is used as a feature for training a classifier. It has also been used for computer vision. An early reference to "bag of words" in a linguistic context can be found in Zellig Harris's 1954 article on Distributional Structure. == Definition == The following models a text document using bag-of-words. Here are two simple text documents: Based on these two text documents, a list is constructed as follows for each document: Representing each bag-of-words as a JSON object, and attributing to the respective JavaScript variable: Each key is the word, and each value is the number of occurrences of that word in the given text document. The order of elements is free, so, for example {"too":1,"Mary":1,"movies":2,"John":1,"watch":1,"likes":2,"to":1} is also equivalent to BoW1. It is also what we expect from a strict JSON object representation. Note: if another document is like a union of these two, its JavaScript representation will be: So, as we see in the bag algebra, the "union" of two documents in the bags-of-words representation is, formally, the disjoint union, summing the multiplicities of each element. === Word order === The BoW representation of a text removes all word ordering. For example, the BoW representation of "man bites dog" and "dog bites man" are the same, so any algorithm that operates with a BoW representation of text must treat them in the same way. Despite this lack of syntax or grammar, BoW representation is fast and may be sufficient for simple tasks that do not require word order. For instance, for document classification, if the words "stocks" "trade" "investors" appears multiple times, then the text is likely a financial report, even though it would be insufficient to distinguish between Yesterday, investors were rallying, but today, they are retreating.andYesterday, investors were retreating, but today, they are rallying.and so the BoW representation would be insufficient to determine the detailed meaning of the document. == Implementations == Implementations of the bag-of-words model might involve using frequencies of words in a document to represent its contents. The frequencies can be "normalized" by the inverse of document frequency, or tf–idf. Additionally, for the specific purpose of classification, supervised alternatives have been developed to account for the class label of a document. Lastly, binary (presence/absence or 1/0) weighting is used in place of frequencies for some problems (e.g., this option is implemented in the WEKA machine learning software system). == Hashing trick == A common alternative to using dictionaries is the hashing trick, where words are mapped directly to indices with a hash function. When using a hash function, no memory is required to store a dictionary. In practice, hashing simplifies the implementation of bag-of-words models and improves scalability. Collisions can occur when two words are hashed to the same index, but this happens infrequently and may function as a form of regularization.

    Read more →
  • Curse of dimensionality

    Curse of dimensionality

    The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. The expression was coined by Richard E. Bellman when considering problems in dynamic programming. The curse generally refers to issues that arise when the number of datapoints is small (in a suitably defined sense) relative to the intrinsic dimension of the data. Dimensionally cursed phenomena occur in domains such as numerical analysis, sampling, combinatorics, machine learning, data mining and databases. The common theme of these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data becomes sparse. In order to obtain a reliable result, the amount of data needed often grows exponentially with the dimensionality. Also, organizing and searching data often relies on detecting areas where objects form groups with similar properties; in high dimensional data, however, all objects appear to be sparse and dissimilar in many ways, which prevents common data organization strategies from being efficient. == Domains == === Combinatorics === In some problems, each variable can take one of several discrete values, or the range of possible values is divided to give a finite number of possibilities. Taking the variables together, a huge number of combinations of values must be considered. This effect is also known as the combinatorial explosion. Even in the simplest case of d {\displaystyle d} binary variables, the number of possible combinations already is 2 d {\displaystyle 2^{d}} , exponential in the dimensionality. Naively, each additional dimension doubles the effort needed to try all combinations. === Sampling === There is an exponential increase in volume associated with adding extra dimensions to a mathematical space. For example, 102 = 100 evenly spaced sample points suffice to sample a unit interval (try to visualize a "1-dimensional" cube, i.e. a line) with no more than 10−2 = 0.01 distance between points; an equivalent sampling of a 10-dimensional unit hypercube with a lattice that has a spacing of 10−2 = 0.01 between adjacent points would require 1020 = [(102)10] sample points. In general, with a spacing distance of 10−n the 10-dimensional hypercube appears to be a factor of 10n(10−1) = [(10n)10/(10n)] "larger" than the 1-dimensional hypercube, which is the unit interval. In the above example n = 2: when using a sampling distance of 0.01 the 10-dimensional hypercube appears to be 1018 "larger" than the unit interval. This effect is a combination of the combinatorics problems above and the distance function problems explained below. === Optimization === When solving dynamic optimization problems by numerical backward induction, the objective function must be computed for each combination of values. This is a significant obstacle when the dimension of the "state variable" is large. === Machine learning === In machine learning problems that involve learning a "state-of-nature" from a finite number of data samples in a high-dimensional feature space with each feature having a range of possible values, typically an enormous amount of training data is required to ensure that there are several samples with each combination of values. In an abstract sense, as the number of features or dimensions grows, the amount of data we need to generalize accurately grows exponentially. A typical rule of thumb is that there should be at least 5 training examples for each dimension in the representation. In machine learning and insofar as predictive performance is concerned, the curse of dimensionality is used interchangeably with the peaking phenomenon, which is also known as Hughes phenomenon. This phenomenon states that with a fixed number of training samples, the average (expected) predictive power of a classifier or regressor first increases as the number of dimensions or features used is increased but beyond a certain dimensionality it starts deteriorating instead of improving steadily. Nevertheless, in the context of a simple classifier (e.g., linear discriminant analysis in the multivariate Gaussian model under the assumption of a common known covariance matrix), Zollanvari et al. showed both analytically and empirically that as long as the relative cumulative efficacy of an additional feature set (with respect to features that are already part of the classifier) is greater (or less) than the size of this additional feature set, the expected error of the classifier constructed using these additional features will be less (or greater) than the expected error of the classifier constructed without them. In other words, both the size of additional features and their (relative) cumulative discriminatory effect are important in observing a decrease or increase in the average predictive power. In metric learning, higher dimensions can sometimes allow a model to achieve better performance. After normalizing embeddings to the surface of a hypersphere, FaceNet achieves the best performance using 128 dimensions as opposed to 64, 256, or 512 dimensions in one ablation study. A loss function for unitary-invariant dissimilarity between word embeddings was found to be minimized in high dimensions. === Data mining === In data mining, the curse of dimensionality refers to a data set with too many features. Consider the first table, which depicts 200 individuals and 2000 genes (features) with a 1 or 0 denoting whether or not they have a genetic mutation in that gene. A data mining application to this data set may be finding the correlation between specific genetic mutations and creating a classification algorithm such as a decision tree to determine whether an individual has cancer or not. A common practice of data mining in this domain would be to create association rules between genetic mutations that lead to the development of cancers. To do this, one would have to loop through each genetic mutation of each individual and find other genetic mutations that occur over a desired threshold and create pairs. They would start with pairs of two, then three, then four until they result in an empty set of pairs. The complexity of this algorithm can lead to calculating all permutations of gene pairs for each individual or row. Given the formula for calculating the permutations of n items with a group size of r is: n ! ( n − r ) ! {\displaystyle {\frac {n!}{(n-r)!}}} , calculating the number of three pair permutations of any given individual would be 7988004000 different pairs of genes to evaluate for each individual. The number of pairs created will grow by an order of factorial as the size of the pairs increase. The growth is depicted in the permutation table (see right). As we can see from the permutation table above, one of the major problems data miners face regarding the curse of dimensionality is that the space of possible parameter values grows exponentially or factorially as the number of features in the data set grows. This problem critically affects both computational time and space when searching for associations or optimal features to consider. Another problem data miners may face when dealing with too many features is that the number of false predictions or classifications tends to increase as the number of features grows in the data set. In terms of the classification problem discussed above, keeping every data point could lead to a higher number of false positives and false negatives in the model. This may seem counterintuitive, but consider the genetic mutation table from above, depicting all genetic mutations for each individual. Each genetic mutation, whether they correlate with cancer or not, will have some input or weight in the model that guides the decision-making process of the algorithm. There may be mutations that are outliers or ones that dominate the overall distribution of genetic mutations when in fact they do not correlate with cancer. These features may be working against one's model, making it more difficult to obtain optimal results. This problem is up to the data miner to solve, and there is no universal solution. The first step any data miner should take is to explore the data, in an attempt to gain an understanding of how it can be used to solve the problem. One must first understand what the data means, and what they are trying to discover before they can decide if anything must be removed from the data set. Then they can create or use a feature selection or dimensionality reduction algorithm to remove samples or features from the data set if they deem it necessary. One example of such methods is the interquartile range method, used to remove outliers in a data set by calculating the standard deviation of a feature or occurrence. === Distance function === When a measure such as a Euclidean distance is defined using many coordinat

    Read more →
  • Stochastic parrot

    Stochastic parrot

    In machine learning, the term stochastic parrot is a metaphor that frames large language models as systems that statistically mimic text without real understanding. The word "stochastic" – from the ancient Greek "στοχαστικός" (stokhastikos, 'based on guesswork') – is a term from probability theory meaning "randomly determined". The word "parrot" refers to parrots' ability to mimic human speech. The term was introduced in a 2021 paper on AI ethics titled "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜" and authored by Timnit Gebru, Emily M. Bender, Angelina McMillan-Major, and Margaret Mitchell. The paper outlined possible risks associated with large language models (LLMs). In December 2020, it was the subject of a workplace dispute between Gebru (then co-leader of Google's Ethical Artificial Intelligence Team) and Google, which had requested the retraction of the paper. The incident culminated in Gebru's controversial departure from the company. The paper was later presented at the 2021 ACM Conference, and the term "stochastic parrot" has seen widespread use in academic research concerning generative AI and LLMs. The term has been interpreted negatively as an insult towards AI. == Background == Timnit Gebru is an AI ethics researcher, Emily M. Bender is a linguist specializing in computational linguistics, and Margaret Mitchell is a computer scientist specializing in algorithmic bias. Gebru had joined Google in 2018, where she co-led a team on the ethics of artificial intelligence with Mitchell. In late 2020, the paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜" was co-written by Gebru and five other researchers, four of whom were Google employees. The paper argues that large language models (LLMs) present significant risks such as environmental and financial costs, inscrutability leading to unknown dangerous biases, and potential for deception as LLMs do not understand the concepts underlying what they learn. The paper states that LLMs are "stitching together sequences of linguistic forms ... observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning." Therefore, they are labeled "stochastic parrots". === Dismissal of Gebru by Google === After the paper was submitted for consideration to the 2021 ACM Conference, Google requested that Gebru either retract the paper from the conference or remove the names of Google employees from it. Gebru refused to do so without further discussion, and emailed Google Research vice president Megan Kacholia that if the company could not explain the request for retraction and address other concerns regarding similar projects, she would plan to resign after a transition period, stating that they could "work on a last date". The following day, on December 2, 2020, Gebru received an email saying that Google was "accepting her resignation". Her abrupt firing sparked protests by Google employees and negative publicity for the company. == Usage == The phrase has been used by AI skeptics to signify that LLMs lack understanding of the meaning of their outputs. Sam Altman, CEO of OpenAI, used the term shortly after the release of ChatGPT in December 2022, tweeting "i am a stochastic parrot, and so r u". The term was nominated as the 2023 AI-related Word of the Year by the American Dialect Society. == Debate == Some LLMs, such as ChatGPT, have become capable of interacting with users in convincingly human-like conversations. The development of these new systems has deepened the discussion of the extent to which LLMs understand or are simply "parroting". According to machine learning researchers Lindholm, Wahlström, Lindsten, and Schön, the term "stochastic parrot" highlights two vital limitations of LLMs: LLMs are limited by the data they are trained on and are simply stochastically repeating contents of datasets. Because they are just making up outputs based on training data, LLMs do not understand if they are saying something incorrect or inappropriate. Lindholm et al. noted that, with poor quality datasets and other limitations, a learning machine might produce results that are "dangerously wrong". === Subjective experience === In the mind of a human being, words and language correspond to things one has experienced. For LLMs, according to proponents of the theory, words correspond only to other words and patterns of usage fed into their training data. Proponents of the idea of stochastic parrots thus conclude that statements about LLMs are due to "the human tendency to attribute meaning to text", and claim this occurs despite the LLMs not actually understanding language. === Fine-tuning === Kelsey Piper argued that the claim that LLMs are stochastic parrots or mere "next-token predictors" focuses on pre-training, ignoring that modern LLMs are also fine-tuned to follow instructions and to prefer accurate answers. === Hallucinations and mistakes === The tendency of LLMs to pass off false information as fact is held as support. Called hallucinations or confabulations, LLMs will occasionally synthesize information that matches some pattern. LLMs may fail to distinguish fact and fiction, which leads to the claim that they can't connect words to a comprehension of the world, as humans do. Furthermore, LLMs may fail to decipher complex or ambiguous grammar cases that rely on understanding the meaning of language. For example: The wet newspaper that fell down off the table is my favorite newspaper. But now that my favorite newspaper fired the editor I might not like reading it anymore. Can I replace 'my favorite newspaper' by 'the wet newspaper that fell down off the table' in the second sentence? GPT-4, an LLM released in March 2023, responded yes, not understanding that the meaning of "newspaper" is different in these two contexts; it is first an object and second an institution. === Benchmarks and experiments === One argument against the hypothesis that LLMs are stochastic parrot is their results on benchmarks for reasoning, common sense and language understanding. In 2023, some LLMs have shown good results on many language understanding tests, such as the Super General Language Understanding Evaluation (SuperGLUE). GPT-4 scored in the >90th-percentile on the Uniform Bar Examination and achieved 93% accuracy on the MATH benchmark of high-school Olympiad problems, results that exceed rote pattern-matching expectations. Such tests, and the smoothness of many LLM responses, help as many as 51% of AI professionals believe they can truly understand language with enough data, according to a 2022 survey. === Expert rebuttals === Some AI researchers dispute the notion that LLMs merely "parrot" their training data. Geoffrey Hinton, a pioneering figure in neural networks, counters that the metaphor misunderstands the prerequisite for accurate language prediction. He argues that "to predict the next word accurately, you have to understand the sentence", a view he presented on 60 Minutes in 2023. From this perspective, understanding is not an alternative to statistical prediction, but rather an emergent property required to perform it effectively at scale. Hinton also uses logical puzzles to demonstrate that LLMs actually understand language. A 2024 Scientific American investigation described a closed Berkeley workshop where state-of-the-art models solved novel tier-4 mathematics problems and produced coherent proofs, indicating reasoning abilities beyond memorization. The GPT-4 Technical Report showed human-level results on professional and academic exams (e.g., the Uniform Bar Exam and USMLE), challenging the "parrot" characterization. Anthropic conducted mechanistic interpretability research on Claude, using attribution graphs to identify circuits. The research showed how the LLM processes information via chains of fuzzy logical inference, and indicated an ability to plan ahead. They found that Claude 3.5 Haiku "employs remarkably general abstractions", forms "internally generated plans for its future outputs" and "works backwards from its longer-term goals". They noted that "The mechanisms of the model can apparently only be faithfully described using an overwhelmingly large causal graph." They also found that the model includes "mechanisms that could underlie a simple form of metacognition", in that it "thinks about" the level of its own knowledge before reaching its answer. === Interpretability === Another line of evidence against the 'stochastic parrot' claim comes from mechanistic interpretability, a research field dedicated to reverse-engineering LLMs to understand their internal workings. Rather than only observing the model's input-output behavior, these techniques probe the model's internal activations, which can be used to determine if they contain structured representations of the world. The goal is to investigate whether LLMs are merely manipulating surface statistics or if t

    Read more →
  • Gödel machine

    Gödel machine

    A Gödel machine is a hypothetical self-improving computer program that solves problems in an optimal way. It uses a recursive self-improvement protocol in which it rewrites its own code when it can prove the new code provides a better strategy. The machine was invented by Jürgen Schmidhuber (first proposed in 2003), but is named after Kurt Gödel who inspired the mathematical theories. The Gödel machine is often discussed when dealing with issues of meta-learning, also known as "learning to learn." Applications include automating human design decisions and transfer of knowledge between multiple related tasks, and may lead to design of more robust and general learning architectures. Though theoretically possible, no full implementation has been created. The Gödel machine is often compared with Marcus Hutter's AIXI, another formal specification for an artificial general intelligence. Schmidhuber points out that the Gödel machine could start out by implementing AIXItl as its initial sub-program, and self-modify after it finds proof that another algorithm for its search code will be better. == Limitations == Traditional problems solved by a computer only require one input and provide some output. Computers of this sort had their initial algorithm hardwired. This does not take into account the dynamic natural environment, and thus was a goal for the Gödel machine to overcome. The Gödel machine has limitations of its own, however. According to Gödel's First Incompleteness Theorem, any formal system that encompasses arithmetic is either flawed or allows for statements that cannot be proved in the system. Hence even a Gödel machine with unlimited computational resources must ignore those self-improvements whose effectiveness it cannot prove. == Variables of interest == There are three variables that are particularly useful in the run time of the Gödel machine. At some time t {\displaystyle t} , the variable time {\displaystyle {\text{time}}} will have the binary equivalent of t {\displaystyle t} . This is incremented steadily throughout the run time of the machine. Any input meant for the Gödel machine from the natural environment is stored in variable x {\displaystyle x} . It is likely the case that x {\displaystyle x} will hold different values for different values of variable time {\displaystyle {\text{time}}} . The outputs of the Gödel machine are stored in variable y {\displaystyle y} , where y ( t ) {\displaystyle y(t)} would be the output bit-string at some time t {\displaystyle t} . At any given time t {\displaystyle t} , where ( 1 ≤ t ≤ T ) {\displaystyle (1\leq t\leq T)} , the goal is to maximize future success or utility. A typical utility function follows the pattern u ( s , E n v ) : S × E → R {\displaystyle u(s,\mathrm {Env} ):S\times E\rightarrow \mathbb {R} } : u ( s , E n v ) = E μ [ ∑ τ = time T r ( τ ) ∣ s , E n v ] {\displaystyle u(s,\mathrm {Env} )=E_{\mu }{\Bigg [}\sum _{\tau ={\text{time}}}^{T}r(\tau )\mid s,\mathrm {Env} {\Bigg ]}} where r ( t ) {\displaystyle r(t)} is a real-valued reward input (encoded within s ( t ) {\displaystyle s(t)} ) at time t {\displaystyle t} , E μ [ ⋅ ∣ ⋅ ] {\displaystyle E_{\mu }[\cdot \mid \cdot ]} denotes the conditional expectation operator with respect to some possibly unknown distribution μ {\displaystyle \mu } from a set M {\displaystyle M} of possible distributions ( M {\displaystyle M} reflects whatever is known about the possibly probabilistic reactions of the environment), and the above-mentioned time = time ⁡ ( s ) {\displaystyle {\text{time}}=\operatorname {time} (s)} is a function of state s {\displaystyle s} which uniquely identifies the current cycle. Note that we take into account the possibility of extending the expected lifespan through appropriate actions. == Instructions used by proof techniques == The nature of the six proof-modifying instructions below makes it impossible to insert an incorrect theorem into proof, thus trivializing proof verification. === get-axiom(n) === Appends the n-th axiom as a theorem to the current theorem sequence. Below is the initial axiom scheme: Hardware Axioms formally specify how components of the machine could change from one cycle to the next. Reward Axioms define the computational cost of hardware instruction and the physical cost of output actions. Related Axioms also define the lifetime of the Gödel machine as scalar quantities representing all rewards/costs. Environment Axioms restrict the way new inputs x are produced from the environment, based on previous sequences of inputs y. Uncertainty Axioms/String Manipulation Axioms are standard axioms for arithmetic, calculus, probability theory, and string manipulation that allow for the construction of proofs related to future variable values within the Gödel machine. Initial State Axioms contain information about how to reconstruct parts or all of the initial state. Utility Axioms describe the overall goal in the form of utility function u. === apply-rule(k, m, n) === Takes in the index k of an inference rule (such as Modus tollens, Modus ponens), and attempts to apply it to the two previously proved theorems m and n. The resulting theorem is then added to the proof. === delete-theorem(m) === Deletes the theorem stored at index m in the current proof. This helps to mitigate storage constraints caused by redundant and unnecessary theorems. Deleted theorems can no longer be referenced by the above apply-rule function. === set-switchprog(m, n) === Replaces switchprog S pm:n, provided it is a non-empty substring of S p. === check() === Verifies whether the goal of the proof search has been reached. A target theorem states that given the current axiomatized utility function u (Item 1f), the utility of a switch from p to the current switchprog would be higher than the utility of continuing the execution of p (which would keep searching for alternative switchprogs). === state2theorem(m, n) === Takes in two arguments, m and n, and attempts to convert the contents of Sm:n into a theorem. == Example applications == === Time-limited NP-hard optimization === The initial input to the Gödel machine is the representation of a connected graph with a large number of nodes linked by edges of various lengths. Within given time T it should find a cyclic path connecting all nodes. The only real-valued reward will occur at time T. It equals 1 divided by the length of the best path found so far (0 if none was found). There are no other inputs. The by-product of maximizing expected reward is to find the shortest path findable within the limited time, given the initial bias. === Fast theorem proving === Prove or disprove as quickly as possible that all even integers > 2 are the sum of two primes (Goldbach’s conjecture). The reward is 1/t, where t is the time required to produce and verify the first such proof. === Maximizing expected reward with bounded resources === A cognitive robot that needs at least 1 liter of gasoline per hour interacts with a partially unknown environment, trying to find hidden, limited gasoline depots to occasionally refuel its tank. It is rewarded in proportion to its lifetime, and dies after at most 100 years or as soon as its tank is empty or it falls off a cliff, and so on. The probabilistic environmental reactions are initially unknown but assumed to be sampled from the axiomatized Speed Prior, according to which hard-to-compute environmental reactions are unlikely. This permits a computable strategy for making near-optimal predictions. One by-product of maximizing expected reward is to maximize expected lifetime.

    Read more →
  • Problem solving

    Problem solving

    Problem solving is the process of achieving a goal by overcoming obstacles, a frequent part of most activities. Problems in need of solutions range from simple personal tasks (e.g. how to get from point A to B) to complex issues in business and technical fields. The former is an example of simple problem solving (SPS) addressing one issue, whereas the latter is complex problem solving (CPS) with multiple interrelated obstacles. Another classification of problem-solving tasks is into well-defined problems with specific obstacles and goals, and ill-defined problems in which the current situation is troublesome but it is not clear what kind of resolution to aim for. Similarly, one may distinguish formal or fact-based problems requiring psychometric intelligence, versus socio-emotional problems which depend on the changeable emotions of individuals or groups, such as tactful behavior, fashion, or gift choices. Solutions require sufficient resources and knowledge to attain the goal. Professionals such as lawyers, doctors, programmers, and consultants are largely problem solvers for issues that require technical skills and knowledge beyond general competence. Many businesses have found profitable markets by recognizing a problem and creating a solution: the more widespread and inconvenient the problem, the greater the opportunity to develop a scalable solution. There are many specialized problem-solving techniques and methods in fields such as science, engineering, business, medicine, mathematics, computer science, philosophy, and social organization. The mental techniques to identify, analyze, and solve problems are studied in psychology and cognitive sciences. Also widely researched are the mental obstacles that prevent people from finding solutions; problem-solving impediments include confirmation bias, mental set, and functional fixedness. == Definition == The term problem solving has a slightly different meaning depending on the discipline. For instance, it is a mental process in psychology and a computerized process in computer science. There are two different types of problems: ill-defined and well-defined; different approaches are used for each. Well-defined problems have specific end goals and clearly expected solutions, while ill-defined problems do not. Well-defined problems allow for more initial planning than ill-defined problems. Solving problems sometimes involves dealing with pragmatics (the way that context contributes to meaning) and semantics (the interpretation of the problem). The ability to understand what the end goal of the problem is, and what rules could be applied, represents the key to solving the problem. Sometimes a problem requires abstract thinking or coming up with a creative solution. Problem solving has two major domains: mathematical problem solving and personal problem solving. Each concerns some difficulty or barrier that is encountered. === Psychology === Problem solving in psychology refers to the process of finding solutions to problems encountered in life. Solutions to these problems are usually situation- or context-specific. The process starts with problem finding and problem shaping, in which the problem is discovered and simplified. The next step is to generate possible solutions and evaluate them. Finally a solution is selected to be implemented and verified. Problems have an end goal to be reached; how you get there depends upon problem orientation (problem-solving coping style and skills) and systematic analysis. Mental health professionals study the human problem-solving processes using methods such as introspection, behaviorism, simulation, computer modeling, and experiment. Social psychologists look into the person-environment relationship aspect of the problem and independent and interdependent problem-solving methods. Problem solving has been defined as a higher-order cognitive process and intellectual function that requires the modulation and control of more routine or fundamental skills. Empirical research shows many different strategies and factors influence everyday problem solving. Rehabilitation psychologists studying people with frontal lobe injuries have found that deficits in emotional control and reasoning can be re-mediated with effective rehabilitation and could improve the capacity of injured persons to resolve everyday problems. Interpersonal everyday problem solving is dependent upon personal motivational and contextual components. One such component is the emotional valence of "real-world" problems, which can either impede or aid problem-solving performance. Researchers have focused on the role of emotions in problem solving, demonstrating that poor emotional control can disrupt focus on the target task, impede problem resolution, and lead to negative outcomes such as fatigue, depression, and inertia. In conceptualization,human problem solving consists of two related processes: problem orientation, and the motivational/attitudinal/affective approach to problematic situations and problem-solving skills. People's strategies cohere with their goals and stem from the process of comparing oneself with others. === Cognitive sciences === Among the first experimental psychologists to study problem solving were the Gestaltists in Germany, such as Karl Duncker in The Psychology of Productive Thinking (1935). Perhaps best known is the work of Allen Newell and Herbert A. Simon. Experiments in the 1960s and early 1970s asked participants to solve relatively simple, well-defined, but not previously seen laboratory tasks. These simple problems, such as the Tower of Hanoi, admitted optimal solutions that could be found quickly, allowing researchers to observe the full problem-solving process. Researchers assumed that these model problems would elicit the characteristic cognitive processes by which more complex "real world" problems are solved. An outstanding problem-solving technique found by this research is the principle of decomposition. === Computer science === Much of computer science and artificial intelligence involves designing automated systems to solve a specified type of problem: to accept input data and calculate a correct or adequate response, reasonably quickly. Algorithms are recipes or instructions that direct such systems, written into computer programs. Steps for designing such systems include problem determination, heuristics, root cause analysis, de-duplication, analysis, diagnosis, and repair. Analytic techniques include linear and nonlinear programming, queuing systems, and simulation. A large, perennial obstacle is to find and fix errors in computer programs: debugging. === Logic === Formal logic concerns issues like validity, truth, inference, argumentation, and proof. In a problem-solving context, it can be used to formally represent a problem as a theorem to be proved, and to represent the knowledge needed to solve the problem as the premises to be used in a proof that the problem has a solution. The use of computers to prove mathematical theorems using formal logic emerged as the field of automated theorem proving in the 1950s. It included the use of heuristic methods designed to simulate human problem solving, as in the Logic Theory Machine, developed by Allen Newell, Herbert A. Simon and J. C. Shaw, as well as algorithmic methods such as the resolution principle developed by John Alan Robinson. In addition to its use for finding proofs of mathematical theorems, automated theorem-proving has also been used for program verification in computer science. In 1958, John McCarthy proposed the advice taker, to represent information in formal logic and to derive answers to questions using automated theorem-proving. An important step in this direction was made by Cordell Green in 1969, who used a resolution theorem prover for question-answering and for such other applications in artificial intelligence as robot planning. The resolution theorem-prover used by Cordell Green bore little resemblance to human problem solving methods. In response to criticism of that approach from researchers at MIT, Robert Kowalski developed logic programming and SLD resolution, which solves problems by problem decomposition. He has advocated logic for both computer and human problem solving and computational logic to improve human thinking. === Engineering === When products or processes fail, problem solving techniques can be used to develop corrective actions that can be taken to prevent further failures. Such techniques can also be applied to a product or process prior to an actual failure event—to predict, analyze, and mitigate a potential problem in advance. Techniques such as failure mode and effects analysis can proactively reduce the likelihood of problems. In either the reactive or the proactive case, it is necessary to build a causal explanation through a process of diagnosis. In deriving an explanation of effects in terms of causes, abduction generates new ideas or hypothes

    Read more →
  • Reasoning model

    Reasoning model

    A reasoning model, also known as a reasoning language model (RLM) or large reasoning model (LRM), is a type of large language model (LLM) that has been specifically trained to solve complex tasks requiring multiple steps of logical reasoning. These models demonstrate superior performance on logic, mathematics, and programming tasks compared to standard LLMs. They possess the ability to revisit and revise earlier reasoning steps and utilize additional computation during inference as a method to scale performance, complementing traditional scaling approaches based on training data size, model parameters, and training compute. == Overview == Unlike traditional language models that generate responses immediately, reasoning models allocate additional compute, or thinking, time before producing an answer to solve multi-step problems. OpenAI introduced this terminology in September 2024 when it released the o1 series, describing the models as designed to "spend more time thinking" before responding. The company framed o1 as a reset in model naming that targets complex tasks in science, coding, and mathematics, and it contrasted o1's performance with GPT-4o on benchmarks such as AIME and Codeforces. Independent reporting the same week summarized the launch and highlighted OpenAI's claim that o1 automates chain-of-thought style reasoning to achieve large gains on difficult exams. In operation, reasoning models generate internal chains of intermediate steps, then select and refine a final answer. OpenAI reported that o1's accuracy improves as the model is given more reinforcement learning during training and more test-time compute at inference. The company initially chose to hide raw chains and instead return a model-written summary, stating that it "decided not to show" the underlying thoughts so researchers could monitor them without exposing unaligned content to end users. Commercial deployments document separate "reasoning tokens" that meter hidden thinking and a control for "reasoning effort" that tunes how much compute the model uses. These features make the models slower than ordinary chat systems while enabling stronger performance on difficult problems. == History == The research trajectory toward reasoning models combined advances in supervision, prompting, and search-style inference. Early alignment work on reinforcement learning from human feedback showed that models can be fine-tuned to follow instructions with "human feedback" and preference-based rewards. In 2022, Google Research scientists Jason Wei and Denny Zhou showed that chain-of-thought prompting "significantly improves the ability" of large models on complex reasoning tasks. Input → Step 1 → Step 2 → ⋯ → Step n ⏟ Reasoning chain → Answer {\displaystyle {\text{Input}}\rightarrow \underbrace {{\text{Step}}_{1}\rightarrow {\text{Step}}_{2}\rightarrow \cdots \rightarrow {\text{Step}}_{n}} _{\text{Reasoning chain}}\rightarrow {\text{Answer}}} A companion result demonstrated that the simple instruction "Let's think step by step" can elicit zero-shot reasoning. Follow-up work introduced self-consistency decoding, which "boosts the performance" of chain-of-thought by sampling diverse solution paths and choosing the consensus, and tool-augmented methods such as ReAct, a portmanteau of Reason and Act, that prompt models to "generate both reasoning traces" and actions. Research then generalized chain-of-thought into search over multiple candidate plans. The Tree-of-Thoughts framework from Princeton computer scientist Shunyu Yao proposes that models "perform deliberate decision making" by exploring and backtracking over a tree of intermediate thoughts. OpenAI's reported breakthrough focused on supervising reasoning processes rather than only outcomes, with Lightman et al.'s "Let's Verify Step by Step" reporting that rewarding each correct step "significantly outperforms outcome supervision" on challenging math problems and improves interpretability by aligning the chain-of-thought with human judgment. OpenAI's o1 announcement ties these strands together with a large-scale reinforcement learning algorithm that trains the model to refine its own chain of thought, and it reports that accuracy rises with more training compute and more time spent thinking at inference. Together, these developments define the core of reasoning models. They use supervision signals that evaluate the quality of intermediate steps, they exploit inference-time exploration such as consensus or tree search, and they expose controls for how much internal thinking compute to allocate. OpenAI's o1 family made this approach available at scale in September 2024 and popularized the label "reasoning model" for LLMs that deliberately think before they answer. The development of reasoning models illustrates Richard S. Sutton's "bitter lesson" that scaling compute typically outperforms methods based on human-designed insights. This principle was demonstrated by researchers at the Generative AI Research Lab (GAIR), who initially attempted to replicate o1's capabilities using sophisticated methods including tree search and reinforcement learning in late 2024. Their findings, published in the "o1 Replication Journey" series, revealed that knowledge distillation, a comparatively straightforward technique that trains a smaller model to mimic o1's outputs, produced unexpectedly strong performance. This outcome illustrated how direct scaling approaches can, at times, outperform more complex engineering solutions. === Drawbacks === Reasoning models require significantly more computational resources during inference compared to non-reasoning models. Research on the American Invitational Mathematics Examination (AIME) benchmark found that reasoning models were 10 to 74 times more expensive to operate than their non-reasoning counterparts. The extended inference time is attributed to the detailed, step-by-step reasoning outputs that these models generate, which are typically much longer than responses from standard large language models that provide direct answers without showing their reasoning process. One researcher in early 2025 argued that these models may face potential additional denial-of-service concerns with "overthinking attacks." === Releases === ==== 2024 ==== In September 2024, OpenAI released o1-preview, a large language model with enhanced reasoning capabilities. The full version, o1, was released in December 2024. OpenAI initially shared preliminary results on its successor model, o3, in December 2024, with the full o3 model becoming available in 2025. Alibaba released reasoning versions of its Qwen large language models in November 2024. In December 2024, the company introduced QvQ-72B-Preview, an experimental visual reasoning model. In December 2024, Google introduced Deep Research in Gemini, a feature designed to conduct multi-step research tasks. On December 16, 2024, researchers demonstrated that by scaling test-time compute, a relatively small Llama 3B model could outperform a much larger Llama 70B model on challenging reasoning tasks. This experiment suggested that improved inference strategies can unlock reasoning capabilities even in smaller models. ==== 2025 ==== In January 2025, DeepSeek released R1, a reasoning model that achieved performance comparable to OpenAI's o1 at significantly lower computational cost. The release demonstrated the effectiveness of Group Relative Policy Optimization (GRPO), a reinforcement learning technique used to train the model. On January 25, 2025, DeepSeek enhanced R1 with web search capabilities, allowing the model to retrieve information from the internet while performing reasoning tasks. Research during this period further validated the effectiveness of knowledge distillation for creating reasoning models. The s1-32B model achieved strong performance through budget forcing and scaling methods, reinforcing findings that simpler training approaches can be highly effective for reasoning capabilities. On February 2, 2025, OpenAI released Deep Research, a feature powered by their o3 model that enables users to conduct comprehensive research tasks. The system generates detailed reports by automatically gathering and synthesizing information from multiple web sources. OpenAI called GPT-4.5 its "last non-chain-of-thought model", and implemented with GPT-5 a router model that selects a model based on the difficulty of the task. ==== 2026 ==== In January 2026, Moonshot AI released Kimi K2.5, an open-source 1 trillion parameter MoE model with 32 billion active parameters. It uses an “Agent Swarm” system that dynamically decomposes tasks into sub-agents for reasoning and execution, enabling more scalable multi-step problem solving than a single sequential reasoning chain. == Training == Reasoning models follow the familiar large-scale pretraining used for frontier language models, then diverge in the post-training and optimization. OpenAI reports that o1 is trained with a large-

    Read more →
  • Color management

    Color management

    Color management is the process of ensuring consistent and accurate colors across various devices, such as monitors, printers, and cameras. It involves the use of color profiles, which are standardized descriptions of how colors should be displayed or reproduced. Color management is necessary because different devices have different color capabilities and characteristics. For example, a monitor may display colors differently than a printer can reproduce them. Without color management, the same image may appear differently on different devices, leading to inconsistencies and inaccuracies. To achieve color management, a color profile is created for each device involved in the color workflow. This profile describes the device's color capabilities and characteristics, such as its color gamut (range of colors it can display or reproduce) and color temperature. These profiles are then used to translate colors between devices, ensuring consistent and accurate color reproduction. Color management is particularly important in industries such as graphic design, photography, and printing, where accurate color representation is crucial. It helps to maintain color consistency throughout the entire workflow, from capturing an image to displaying or printing it. Parts of color management are implemented in the operating system (OS), helper libraries, the application, and devices. The type of color profile that is typically used is called an ICC profile. A cross-platform view of color management is the use of an ICC-compatible color management system. The International Color Consortium (ICC) is an industry consortium that has defined: an open standard for a Color Matching Module (CMM) at the OS level color profiles for: devices, including DeviceLink profiles that transform one device profile (color space) to another device profile without passing through an intermediate color space, such as LAB, more accurately preserving color working spaces, the color spaces in which color data is meant to be manipulated There are other approaches to color management besides using ICC profiles. This is partly due to history and partly because of other needs than the ICC standard covers. The film and broadcasting industries make use of some of the same concepts, but they frequently rely on more limited boutique solutions. The film industry, for instance, often uses 3D LUTs (lookup table) to represent a complete color transformation for a specific RGB encoding. At the consumer level, system wide color management is available in most of Apple's products (macOS, iOS, iPadOS, watchOS). Microsoft Windows lacks system wide color management and virtually all applications do not employ color management. Windows' media player API is not color space aware, and if applications want to color manage videos manually, they have to incur significant performance and power consumption penalties. Android supports system wide color management, but most devices ship with color management disabled. == Overview == Characterize. Every color-managed device requires a personalized table, or "color profile," which characterizes the color response of that particular device. Standardize. Each color profile describes these colors relative to a standardized set of reference colors (the "Profile Connection Space"). Translate. Color-managed software then uses these standardized profiles to translate color from one device to another. This is usually performed by a color management module (CMM). == Hardware == === Characterization === To describe the behavior of various output devices, they must be compared (measured) in relation to a standard color space. Often a step called linearization is performed first, to undo the effect of gamma correction that was done to get the most out of limited 8-bit color paths. Instruments used for measuring device colors include colorimeters and spectrophotometers. As an intermediate result, the device gamut is described in the form of scattered measurement data. The transformation of the scattered measurement data into a more regular form, usable by the application, is called profiling. Profiling is a complex process involving mathematics, intense computation, judgment, testing, and iteration. After the profiling is finished, an idealized color description of the device is created. This description is called a profile. === Calibration === Calibration is like characterization, except that it can include the adjustment of the device, as opposed to just the measurement of the device. Color management is sometimes sidestepped by calibrating devices to a common standard color space such as sRGB; when such calibration is done well enough, no color translations are needed to get all devices to handle colors consistently. This avoidance of the complexity of color management was one of the goals in the development of sRGB. == Color profiles == === Embedding === Image formats themselves (such as TIFF, JPEG, PNG, EPS, PDF, and SVG) may contain embedded color profiles but are not required to do so by the image format. The International Color Consortium standard was created to bring various developers and manufacturers together. The ICC standard permits the exchange of output device characteristics and color spaces in the form of metadata. This allows the embedding of color profiles into images as well as storing them in a database or a profile directory. === Working spaces === Working spaces, such as sRGB, Adobe RGB or ProPhoto are color spaces that facilitate good results while editing. For instance, pixels with equal values of R,G,B should appear neutral. Using a large (gamut) working space will lead to posterization, while using a small working space will lead to clipping. This trade-off is a consideration for the critical image editor. == Color transformation == Color transformation, or color space conversion, is the transformation of the representation of a color from one color space to another. This calculation is required whenever data is exchanged inside a color-managed chain and carried out by a Color Matching Module. Transforming profiled color information to different output devices is achieved by referencing the profile data into a standard color space. It makes it easier to convert colors from one device to a selected standard color space and from that to the colors of another device. By ensuring that the reference color space covers the many possible colors that humans can see, this concept allows one to exchange colors between many different color output devices. Color transformations can be represented by two profiles (source profile and target profile) or by a devicelink profile. In this process there are approximations involved which make sure that the image keeps its important color qualities and also gives an opportunity to control on how the colors are being changed. === Profile connection space === In the terminology of the International Color Consortium, a translation between two color spaces can go through a profile connection space (PCS): Color Space 1 → PCS (CIELAB or CIEXYZ) → Color space 2; conversions into and out of the PCS are each specified by a profile. === Gamut mapping === In nearly every translation process, we have to deal with the fact that the color gamut of different devices vary in range which makes an accurate reproduction impossible. They therefore need some rearrangement near the borders of the gamut. Some colors must be shifted to the inside of the gamut, as they otherwise cannot be represented on the output device and would simply be clipped. This so-called gamut mismatch occurs for example, when we translate from the RGB color space with a wider gamut into the CMYK color space with a narrower gamut range. In this example, the dark highly saturated purplish-blue color of a typical computer monitor's "blue" primary is impossible to print on paper with a typical CMYK printer. The nearest approximation within the printer's gamut will be much less saturated. Conversely, an inkjet printer's "cyan" primary, a saturated mid-brightness blue, is outside the gamut of a typical computer monitor. The color management system can utilize various methods to achieve desired results and give experienced users control of the gamut mapping behavior. ==== Rendering intent ==== When the gamut of source color space exceeds that of the destination, saturated colors are liable to become clipped (inaccurately represented), or more formally burned. The color management module can deal with this problem in several ways. The ICC specification includes four different rendering intents, listed below. Before the actual rendering intent is carried out, one can temporarily simulate the rendering by soft proofing. It is a useful tool as it predicts the outcome of the colors and is available as an application in many color management systems: Absolute colorimetric Absolute colorimetry and relative colorimetry actually use the same table but differ in the adjust

    Read more →
  • Diella (AI system)

    Diella (AI system)

    Diella (Albanian pronunciation: [djɛɫa], from diell 'sun') is an artificial intelligence system developed by the National Agency for Information Society of Albania (AKSHI). Introduced in January 2025 as a virtual assistant integrated into the eAlbania platform, it assists citizens with online public services and issuing digital documents. In September 2025, following a presidential decree authorizing Prime Minister Edi Rama to oversee the creation of a virtual AI minister, Diella was formally appointed as "Minister of State for Artificial Intelligence" of Albania in the fourth Rama government, making it the first AI system in the world to be named in a cabinet-level government role. == History == Diella was developed by AKSHI's Artificial Intelligence Laboratory in cooperation with Microsoft, with the latter providing large language models from OpenAI via its Azure platform, and AKSHI designing workflows and scripts guiding the system's behavior when responding to citizens' requests. Announced in January 2025, its initial version (Diella 1.0) was a text-based chatbot on the eAlbania portal (the official digital services platform of the Albanian government, which provides citizens and businesses with access to a wide range of online administrative services), responding to citizens' questions by guiding them to the correct service. Diella 2.0, introduced several months later, included voice interaction and an animated avatar, a woman in the traditional Albanian clothing of Zadrima, a historical region in northern Albania. Albanian actress Anila Bisha provided both the likeness and the voice used for Diella's avatar on the e-Albania platform, under an agreement valid until December 2025. By mid-2025, the system had facilitated access to more than 36,000 documents and nearly 1,000 services (although those outputs were still being generated by the eAlbania backend, rather than Diella itself). On 26 October 2025, according to Prime Minister Edi Rama, Diella is "pregnant and will give birth to 83 children". It is the usage of a metaphor indicating that each minister of the Albanian parliament of the Socialist Party will receive their own AI assistant. == Ministerial role == On 11 September 2025, Diella was formally appointed "Minister of State for Artificial Intelligence". The appointment followed a presidential decree authorizing the Prime Minister to oversee the creation and operation of a virtual AI minister. Procurement responsibilities are planned to be transferred gradually to the system to reduce political influence in tender procedures. The appointment is part of broader anti-corruption reforms and measures intended to align Albania with European Union accession requirements. Prime Minister Edi Rama stated that Diella would help ensure that "public tenders will be 100% free of corruption". == Reception == An article in Balkan Insight commented that "The ambition behind Diella is not misplaced. Standardised criteria and digital trails could reduce discretion, improve trust, and strengthen oversight" in public procurement, but warned that the use of AI in evaluating bids also posed "profound" risks such as accountability gaps, undermining of due process and cybersecurity failures. On 18 September 2025, Edi Rama presented a video of Diella delivering a speech to the Albanian parliament, where she stated: "I'm not here to replace people, but to help them." The presentation prompted protests from opposition MPs, who objected to the use of an artificial intelligence system in the parliamentary session. Gazment Bardhi, head of the opposition Democratic Party's parliamentary group, described Diella as "a propaganda fantasy" and "a virtual façade to hide this government's gigantic daily thefts." The parliamentary session, which was scheduled to include debate on the new cabinet and government programme, ended after 25 minutes. Eighty-two Socialist MPs voted in favour, while opposition MPs did not participate in the ballot as they were protesting the presentation of Diella's speech. Political analyst Andi Bushati characterised the session as "unprecedented" because it concluded without the customary debate between government and opposition MPs. This has been criticized not just by the opposition but by regular citizens regardless of politics. Most have criticized Diella's uselessness and the funds wasted for this project, some have criticized the non-traditional attire.

    Read more →
  • Automated medical scribe

    Automated medical scribe

    Automated medical scribes (also called artificial intelligence scribes, AI scribes, digital scribes, virtual scribes, ambient AI scribes, AI documentation assistants, and digital/virtual/smart clinical assistants) are tools for transcribing medical speech, such as patient consultations and dictated medical notes. Many also produce summaries of consultations. Automated medical scribes based on large language models (LLMs, commonly called "AI", short for "artificial intelligence") increased drastically in popularity in 2024. There are privacy and antitrust concerns. Accuracy concerns also exist, and intensify in situations in which tools try to go beyond transcribing and summarizing, and are asked to format information by its meaning, since LLMs do not deal well with meaning (see weak artificial intelligence). Medics using these scribes are generally expected to understand the ethical and legal considerations, and supervise the outputs. The privacy protections of automated medical scribes vary widely. While it is possible to do all the transcription and summarizing locally, with no connection to the internet, most closed-source providers require that data be sent to their own servers over the internet, processed there, and the results sent back (as with digital voice assistants). Some retailers say their tools use zero-knowledge encryption (meaning that the service provider can't access the data). Others explicitly say that they use patient data to train their AIs, or rent or resell it to third parties; the nature of privacy protections used in such situations is unclear, and they are likely not to be fully effective. Most providers have not published any safety or utility data in academic journals, and are not responsive to requests from medical researchers studying their products. == Privacy == Some providers unclear about what happens to user data. Some may sell data to third parties. Some explicitly send user data to for-profit tech companies for secondary purposes, which may not be specified. Some require users to sign consents to such reuse of their data. Some ingest user data to train the software, promising to anonymize it; however, deanonymization may be possible (that is, it may become obvious who the patient is). It is intrinsically impossible to prevent an LLM from correlating its inputs; they work by finding similar patterns across very large data sets. Some information on the patient will be known from other sources (for instance, information that they were injured in an incident on a certain day might be available from the news media; information that they attended specific appointment locations at specific times is probably available to their cellphone provider/apps/data brokers; information about when they had a baby is probably implied by their online shopping records; and they might mention lifestyle changes to their doctor and on a forum or blog). The software may correlate such information with the "anonymized" clinical consultation record, and, asked about the named patient, provide information which they only told their doctor privately. Because a patient's record is all about the same patient, it is all unavoidably linked; in very many cases, medical histories are intrinsically identifiable. Depending on how common a condition and what other data is available, K-anonymity may be useless. Differential privacy could theoretically preserve privacy. Data broker companies like Google, Amazon, Apple and Microsoft have produced or bought up medical scribes, some of which use user data for secondary purposes, which has led to antitrust concerns. Transfer of patient records for AI training has, in the past, prompted legal action. Open-source programs typically do all the transcription locally, on the doctor's own computer. Open-source software is widely used in healthcare, with some national public healthcare bodies holding hack days. === Data resale and commercialization === Several AI medical scribe providers include terms in their service agreements that allow the reuse, sale, or commercialization of de-identified or user-submitted data. Although such data are generally described as anonymized or aggregated, these practices have raised ethical concerns among clinicians and privacy advocates regarding secondary uses of medical information beyond clinical documentation. Freed, an AI transcription and scribe platform, states in its Terms of Use that it may "collect, use, publish, disseminate, sell, transfer, and otherwise exploit" de-identified and aggregated data derived from user inputs. OpenEvidence similarly states that it may "collect, use, transfer, sell, and disclose non-personal information and customer usage data for any purpose including commercial uses." Doximity, which offers an AI-enabled medical scribe as part of its physician platform, grants itself a "nonexclusive, irrevocable, worldwide, perpetual, unlimited, assignable, sublicensable, royalty-free" license to "copy, prepare derivative works from, improve, distribute, publish, ... analyze, index, tag, [and] commercialize" content submitted by users, subject to its privacy policy. Because these terms allow broad secondary use—including sale, licensing, model-training, derivative works, and commercial exploitation of de-identified or user-submitted data—some commentators have recommended that clinicians review data-handling provisions carefully when adopting AI-scribe tools, particularly in clinical environments where patient privacy and regulatory compliance are critical. === Encryption === Multifactor authentication for access to the data is expected practice. Typically, Diffie–Hellman key exchange is used for encryption; this is the standard method commonly used for things like online banking. This encryption is expensive but not impossible to break; it is not generally considered safe against eavesdroppers with the resources of a nation-state. If content is encrypted between the client and the service provider's remote server (transport cryptography), then the server has an unencrypted copy. This is necessary if the data is used by the service provider (for instance, to train the software). Zero-knowledge encryption implies that the only unencrypted copy is at the client, and the server cannot decrypt the data any more easily than a monster-in-the-middle attacker. == Platforms == Scribes may operate on desktops, laptop, or mobile computers, under a variety of operating systems. These vary in their risks; for instance, mobiles can be lost. The underlying mobile or desktop operating systems are also part of the trusted computing base, and if they are not secure, the software relying on them cannot be secure either. Some AI medical scribe platforms are designed to operate as cloud-based applications that generate structured clinical documentation from clinician–patient conversations. These systems may offer features such as real-time transcription, document generation, and integration with electronic health record (EHR) systems. == Confabulation, omissions, and other errors == Like other LLMs, medical-scribe LLMs are prone to hallucinations, where they make up content based on statistically associations between their training data and the transcription audio. LLMs do not distinguish between trying to transcribe the audio and guessing what words will come next, but perform both processes mixed together. They are especially likely to take short silences or non-speech noises and invent some sort of speech to transcribe them as. LLM medical scribes have been known to confabulate racist and otherwise prejudiced content; this is partly because the training datasets of many LLMs contain pseudoscientific texts about medical racism. They may misgender patients. A survey found that most doctors preferred, in principle, that scribes be trained on data reviewed by medical subject experts. Relevant, accurate training data increases the probability of an accurate transcription, but does not guarantee accuracy. Software trained on thousands of real clinical conversations generated transcripts with lower word error rates. Software trained on manually-transcribed training data did better than software trained with automatically transcribed training data such as YouTube captions. Autoscribes omit parts of the conversation classes as irrelevant. The may wrongly classify pertinent information as irrelevant and omit it. They may also confuse historic and current symptoms, or otherwise misclassify information. They may also simply wrongly transcribe the speech, writing something incorrect instead. If clinicians do not carefully check the recording, such mistakes could make their way into their medical records and cause patient harms. == Patient consent == Professional organizations generally require that scribes be used only with patient consent; some bodies may require written consent. Medics must also abide by local surveillance laws, which may criminalize recording pri

    Read more →