Logic Theorist

Logic Theorist

Logic Theorist is a computer program completed in 1956 by Allen Newell, Herbert A. Simon, and Cliff Shaw. It was the first program deliberately engineered to perform automated reasoning, and has been described as "the first artificial intelligence program". Logic Theorist proved 38 of the first 52 theorems in chapter two of Whitehead and Bertrand Russell's Principia Mathematica, and found new and shorter proofs for some of them. == History == In 1955, when Newell and Simon began to work on the Logic Theorist, the field of artificial intelligence did not yet exist; the term "artificial intelligence" would not be coined until the following summer. Simon was a political scientist who had previously studied the way bureaucracies function as well as developing his theory of bounded rationality (for which he would later win the Nobel Memorial Prize in Economic Sciences in 1978). He believed the study of business organizations requires, like artificial intelligence, an insight into the nature of human problem solving and decision making. Simon has stated that when consulting at RAND Corporation in the early 1950s, he saw a printer typing out a map, using ordinary letters and punctuation as symbols. This led him to think that a machine that could manipulate symbols could simulate decision making and possibly even the process of human thought. The program that printed the map had been written by Newell, a RAND scientist studying logistics and organization theory. For Newell, the decisive moment was in 1954 when Oliver Selfridge came to RAND to describe his work on pattern matching. Watching the presentation, Newell suddenly understood how the interaction of simple, programmable units could accomplish complex behavior, including the intelligent behavior of human beings. "It all happened in one afternoon," he would later say. It was a rare moment of scientific epiphany. "I had such a sense of clarity that this was a new path, and one I was going to go down. I haven't had that sensation very many times. I'm pretty skeptical, and so I don't normally go off on a toot, but I did on that one. Completely absorbed in it—without existing with the two or three levels consciousness so that you're working, and aware that you're working, and aware of the consequences and implications, the normal mode of thought. No. Completely absorbed for ten to twelve hours." Newell and Simon began to talk about the possibility of teaching machines to think. Their first project was a program that could prove mathematical theorems like the ones used in Bertrand Russell and Alfred North Whitehead's Principia Mathematica. They enlisted the help of computer programmer Cliff Shaw, also from RAND, to develop the program. (Newell says "Cliff was the genuine computer scientist of the three".) The first version was hand-simulated: they wrote the program onto 3x5 cards and, as Simon recalled:In January 1956, we assembled my wife and three children together with some graduate students. To each member of the group, we gave one of the cards, so that each one became, in effect, a component of the computer program ... Here was nature imitating art imitating nature. They succeeded in showing that the program could successfully prove theorems as well as a talented mathematician. Eventually Shaw was able to run the program on the computer at RAND's Santa Monica facility. In the summer of 1956, John McCarthy, Marvin Minsky, Claude Shannon and Nathan Rochester organized a conference on the subject of what they called "artificial intelligence" (a term coined by McCarthy for the occasion). Newell and Simon proudly presented the group with the Logic Theorist. It was met with a lukewarm reception. Pamela McCorduck writes "the evidence is that nobody save Newell and Simon themselves sensed the long-range significance of what they were doing." Simon confides that "we were probably fairly arrogant about it all" and adds: They didn't want to hear from us, and we sure didn't want to hear from them: we had something to show them! ... In a way it was ironic because we already had done the first example of what they were after; and second, they didn't pay much attention to it. Logic Theorist soon proved 38 of the first 52 theorems in chapter 2 of the Principia Mathematica. The proof of theorem 2.85 was actually more elegant than the proof produced laboriously by hand by Russell and Whitehead (2026-03-20: What is called here Theorem 2.85 is, in fact, numbered as 2.53 in the page 107 of the 1963 Cambridge University Press edition (https://www.uhu.es/francisco.moreno/gii_mac/docs/Principia_Mathematica_vol1.pdf) and which appears, under the same 2.53 number, on page 112 of the 1910 CUP Edition, according to the digitalization on wikibooks (https://en.wikisource.org/wiki/Russell_%26_Whitehead%27s_Principia_Mathematica/Part_1/Section_A#Discussion_2)). Simon was able to show the new proof to Russell himself who "responded with delight". They attempted to publish the new proof in The Journal of Symbolic Logic, but it was rejected on the grounds that a new proof of an elementary mathematical theorem was not notable, apparently overlooking the fact that one of the authors was a computer program. Newell and Simon formed a lasting partnership, founding one of the first AI laboratories at the Carnegie Institute of Technology and developing a series of influential artificial intelligence programs and ideas, including the General Problem Solver, Soar, and their unified theory of cognition. == Architecture == The Logic Theorist is a program that performs logical processes on logical expressions. The Logic Theorist operates on the following principles: === Expressions === An expression is made of elements. There are two kinds of memories: working and storage. Each working memory contains a single element. The Logic Theorist usually uses 1 to 3 working memories. Each storage memory is a list representing a full expression or a set of elements. In particular, it contains all the axioms and proven logical theorems. An expression is an abstract syntax tree, each node being an element with up to 11 attributes. For example, the logical expression ¬ P → ( Q ∧ ¬ P ) {\displaystyle \neg P\to (Q\wedge \neg P)} is represented as a tree with a root element representing → {\displaystyle \to } . Among the attributes of the root element are pointers to the two elements representing the subexpressions ¬ P {\displaystyle \neg P} and Q ∧ ¬ P {\displaystyle Q\wedge \neg P} . === Processes === There are four kinds of processes, from the lowest to the highest level. Instruction: These are similar to assembly code. They may either perform a primitive operation on an expression in working memory, or perform a conditional jump to another instruction. An example is "put the right sub-element of working-memory 1 to working-memory 2" Elementary process: These are similar to subroutines. A sequence of instructions that can be called. Method: A sequence of elementary processes. There are 4 methods: substitution: given an expression, it attempts to transform it to a proven theorem or axiom by substitutions of variables and logical connectives. detachment: given expression B {\displaystyle B} , it attempts to find a proven theorem or axiom of form A → B ′ {\displaystyle A\to B'} , where B ′ {\displaystyle B'} yields B {\displaystyle B} after substitution, then attempts to prove A {\displaystyle A} by substitution. chaining forward: given expression A → C {\displaystyle A\to C} , it attempts to find for a proven theorem or axiom of form A → B {\displaystyle A\to B} , then attempt to prove B → C {\displaystyle B\to C} by substitution. chaining backward: given expression A → C {\displaystyle A\to C} , it attempts to find for a proven theorem or axiom of form B → C {\displaystyle B\to C} , then attempt to prove A → B {\displaystyle A\to B} by substitution. executive control method: This method applies each of the 4 methods in sequence to each theorem to be proved. == Logic Theorist's influence on AI == Logic Theorist introduced several concepts that would be central to AI research: Reasoning as search Logic Theorist explored a search tree: the root was the initial hypothesis, each branch was a deduction based on the rules of logic. Somewhere in the tree was the goal: the proposition the program intended to prove. The pathway along the branches that led to the goal was a proof – a series of statements, each deduced using the rules of logic, that led from the hypothesis to the proposition to be proved. Heuristics Newell and Simon realized that the search tree would grow exponentially and that they needed to "trim" some branches, using "rules of thumb" to determine which pathways were unlikely to lead to a solution. They called these ad hoc rules "heuristics", using a term introduced by George Pólya in his classic book on mathematical proof, How to Solve It. (Newell had taken courses from Pólya at Stanford). Heuristics would become an important area o

Poop Map

Poop Map is a social app where users can track on a map where and when they defecate. In addition to logging location and time of each bowel movement, users can also add a photo, "like" other users' logs, and rate each account. The social elements of the app allow for groups of users to create a competitive league. Certain behaviors unlock achievements in-app. == Development == The app was created by app developer Nino Uzelac. It was launched in July 2013. == Popularity == The app charted at number one on the Apple App Store charts in 2021 after going viral on TikTok. As of September 2024, the app has a 4.8 rating on the App Store and more than 58,000 ratings. It also has more than one million downloads on the Google Play Store. Poop Map is notably popular among hikers, and has been written about in the outdoors magazine Outside.

Kolmogorov–Arnold Networks

Kolmogorov–Arnold Networks (KANs) are a type of artificial neural network architecture inspired by the Kolmogorov–Arnold representation theorem, also known as the superposition theorem. Unlike traditional multilayer perceptrons (MLPs), which rely on fixed activation functions and linear weights, KANs replace each weight with a learnable univariate function, often represented using splines. == History == KANs (Kolmogorov–Arnold Networks) were proposed by Liu et al. (2024) as a generalization of the Kolmogorov–Arnold representation theorem (KART), aiming to outperform MLPs in small-scale AI and scientific tasks. Before KANs, numerous studies explored KART's connections to neural networks or used it as a basis for designing new network architectures. In the 1980s and 1990s, early research applied KART to neural network design. Kůrková et al. (1992), Hecht-Nielsen (1987), and Nees (1994) established theoretical foundations for multilayer networks based on KART. Igelnik et al. (2003) introduced the Kolmogorov Spline Network using cubic splines to model complex functions. Sprecher (1996, 1997) introduced numerical methods for building network layers, while Nakamura et al. (1993) created activation functions with guaranteed approximation accuracy. These works linked KART's theoretical potential with practical neural network implementation. KART has also been used in other computational and theoretical fields. Coppejans (2004) developed nonparametric regression estimators using B-splines, Bryant (2008) applied it to high-dimensional image tasks, Liu (2015) investigated theoretical applications in optimal transport and image encryption, and more recently, Polar and Poluektov (2021) used Urysohn operators for efficient KART construction, while Fakhoury et al. (2022) introduced ExSpliNet, integrating KART with probabilistic trees and multivariate B-splines for improved function approximation. == Architecture == KANs are based on the Kolmogorov–Arnold representation theorem, which was linked to the 13th Hilbert problem. Given x = ( x 1 , x 2 , … , x n ) {\displaystyle x=(x_{1},x_{2},\dots ,x_{n})} consisting of n variables, a multivariate continuous function f ( x ) {\displaystyle f(x)} can be represented as: f ( x ) = f ( x 1 , … , x n ) = ∑ q = 1 2 n + 1 Φ q ( ∑ p = 1 n φ q , p ( x p ) ) {\displaystyle f(x)=f(x_{1},\dots ,x_{n})=\sum _{q=1}^{2n+1}\Phi _{q}\left(\sum _{p=1}^{n}\varphi _{q,p}(x_{p})\right)} (1) This formulation contains two nested summations: an outer and an inner sum. The outer sum ∑ q = 1 2 n + 1 {\displaystyle \sum _{q=1}^{2n+1}} aggregates 2 n + 1 {\displaystyle 2n+1} terms, each involving a function Φ q : R → R {\displaystyle \Phi _{q}:\mathbb {R} \to \mathbb {R} } . The inner sum ∑ p = 1 n {\displaystyle \sum _{p=1}^{n}} computes n terms for each q, where each term φ q , p : [ 0 , 1 ] → R {\displaystyle \varphi _{q,p}:[0,1]\to \mathbb {R} } is a continuous function of the single variable x p {\displaystyle x_{p}} . The inner continuous functions φ q , p {\displaystyle \varphi _{q,p}} are universal, independent of f {\displaystyle f} , while the outer functions Φ q {\displaystyle \Phi _{q}} depend on the specific function f {\displaystyle f} being represented. The representation (1) holds for all multivariate functions f {\displaystyle f} as proved in . If f {\displaystyle f} is continuous, then the outer functions Φ q {\displaystyle \Phi _{q}} are continuous; if f {\displaystyle f} is discontinuous, then the corresponding Φ q {\displaystyle \Phi _{q}} are generally discontinuous, while the inner functions φ q , p {\displaystyle \varphi _{q,p}} remain the same universal functions. Liu et al. proposed the name KAN. A general KAN network consisting of L layers takes x to generate the output as: K A N ( x ) = ( Φ L − 1 ∘ Φ L − 2 ∘ ⋯ ∘ Φ 1 ∘ Φ 0 ) x {\displaystyle \mathrm {KAN} (x)=(\Phi ^{L-1}\circ \Phi ^{L-2}\circ \cdots \circ \Phi ^{1}\circ \Phi ^{0})x} (3) Here, Φ l {\displaystyle \Phi ^{l}} is the function matrix of the l-th KAN layer or a set of pre-activations. Let i denote the neuron of the l-th layer and j the neuron of the (l+1)-th layer. The activation function φ j , i l {\displaystyle \varphi _{j,i}^{l}} connects (l, i) to (l+1, j): φ j , i l , l = 0 , … , L − 1 , i = 1 , … , n l , j = 1 , … , n l + 1 {\displaystyle \varphi _{j,i}^{l},\quad l=0,\dots ,L-1,\;i=1,\dots ,n_{l},\;j=1,\dots ,n_{l+1}} (4) where nl is the number of nodes of the l-th layer. Thus, the function matrix Φ l {\displaystyle \Phi ^{l}} can be represented as an n l + 1 × n l {\displaystyle n_{l+1}\times n_{l}} matrix of activations: x l + 1 = ( φ 1 , 1 l ( ⋅ ) φ 1 , 2 l ( ⋅ ) ⋯ φ 1 , n l l ( ⋅ ) φ 2 , 1 l ( ⋅ ) φ 2 , 2 l ( ⋅ ) ⋯ φ 2 , n l l ( ⋅ ) ⋮ ⋮ ⋱ ⋮ φ n l + 1 , 1 l ( ⋅ ) φ n l + 1 , 2 l ( ⋅ ) ⋯ φ n l + 1 , n l l ( ⋅ ) ) x l {\displaystyle x^{l+1}={\begin{pmatrix}\varphi _{1,1}^{l}(\cdot )&\varphi _{1,2}^{l}(\cdot )&\cdots &\varphi _{1,n_{l}}^{l}(\cdot )\\\varphi _{2,1}^{l}(\cdot )&\varphi _{2,2}^{l}(\cdot )&\cdots &\varphi _{2,n_{l}}^{l}(\cdot )\\\vdots &\vdots &\ddots &\vdots \\\varphi _{n_{l+1},1}^{l}(\cdot )&\varphi _{n_{l+1},2}^{l}(\cdot )&\cdots &\varphi _{n_{l+1},n_{l}}^{l}(\cdot )\end{pmatrix}}x^{l}} == Implementations == To make the KAN layers optimizable, the inner function is formed by the combination of spline and basic functions as the formula: φ ( x ) = w b b ( x ) + w s spline ( x ) {\displaystyle \varphi (x)=w_{b}\,b(x)+w_{s}\,{\text{spline}}(x)} where b ( x ) {\displaystyle b(x)} is the basic function, usually defined as s i l u ( x ) = x / ( 1 + e x ) {\displaystyle silu(x)=x/(1+e^{x})} and w b {\displaystyle w_{b}} is the base weight matrix. Also, w s {\displaystyle w_{s}} is the spline weight matrix and spline ( x ) {\displaystyle {\text{spline}}(x)} is the spline function. The spline function can be a sum of B-splines. spline ( x ) = ∑ i c i B i ( x ) {\displaystyle {\text{spline}}(x)=\sum _{i}c_{i}B_{i}(x)} Many studies suggested to use other polynomial and curve functions instead of B-spline to create new KAN variants. == Functions used == The choice of functional basis strongly influences the performance of KANs. Common function families include: B-splines: Provide locality, smoothness, and interpretability; they are the most widely used in current implementations. RBFs (include Gaussian RBFs): Capture localized features in data and are effective in approximating functions with non-linear or clustered structures. Chebyshev polynomials: Offer efficient approximation with minimized error in the maximum norm, making them useful for stable function representation. Rational function: Useful for approximating functions with singularities or sharp variations, as they can model asymptotic behavior better than polynomials. Fourier series: Capture periodic patterns effectively and are particularly useful in domains such as physics-informed machine learning. Wavelet functions (DoG, Mexican hat, Morlet, and Shannon): Used for feature extraction as they can capture both high-frequency and low-frequency data components. Piecewise linear functions: Provide efficient approximation for multivariate functions in KANs. == Usage == In some modern neural architectures like convolutional neural networks (CNNs), recurrent neural networks (RNNs), and Transformers, KANs are typically used as drop-in substitutes for MLP layers. Despite KANs' general-purpose design, researchers have created and used them for a number of tasks: Scientific machine learning (SciML): Function fitting, partial differential equations (PDEs) and physical/mathematical laws. Continual learning: KANs better preserve previously learned information during incremental updates, avoiding catastrophic forgetting due to the locality of spline adjustments. Graph neural networks: Extensions such as Kolmogorov–Arnold Graph Neural Networks (KA-GNNs) integrate KAN modules into message-passing architectures, showing improvements in molecular property prediction tasks. Sensor data processing: Kolmogorov–Arnold Networks (KANs) have recently been applied to sensor data processing due to their ability to model complex nonlinear relationships with relatively few parameters and improved interpretability compared to conventional multilayer perceptrons. Applications include industrial soft sensors, biomedical signal analysis, remote sensing, and environmental monitoring systems. == Drawbacks == KANs can be computationally intensive and require a large number of parameters due to their use of polynomial functions to capture data.

Accelerated Linear Algebra

XLA (Accelerated Linear Algebra) is an open-source compiler for machine learning developed by the OpenXLA project. XLA is designed to improve the performance of machine learning models by optimizing the computation graphs at a lower level, making it particularly useful for large-scale computations and high-performance machine learning models. Key features of XLA include: Compilation of Computation Graphs: Compiles computation graphs into efficient machine code. Optimization Techniques: Applies operation fusion, memory optimization, and other techniques. Hardware Support: Optimizes models for various hardware, including CPUs, GPUs, and NPUs. Improved Model Execution Time: Aims to reduce machine learning models' execution time for both training and inference. Seamless Integration: Can be used with existing machine learning code with minimal changes. XLA represents a significant step in optimizing machine learning models, providing developers with tools to enhance computational efficiency and performance. == OpenXLA Project == OpenXLA Project is an open-source machine learning compiler and infrastructure initiative intended to provide a common set of tools for compiling and deploying machine learning models across different frameworks and hardware platforms. It provides a modular compilation stack that can be used by major deep learning frameworks like JAX, PyTorch, and TensorFlow. The project focuses on supplying shared components for optimization, portability, and execution across CPUs, GPUs, and specialized accelerators. Its design emphasizes interoperability between frameworks and a standardized set of representations for model computation. == Components == The OpenXLA ecosystem includes several core components: XLA – A deep learning compiler that optimizes computational graphs for multiple hardware targets. PJRT – A runtime interface that allows different back-ends to connect to XLA through a consistent API. StableHLO – A high-level operator set intended to serve as a stable, portable representation for ML models across compilers and frameworks. Shardy – An MLIR-based system for describing and transforming models that run in distributed or multi-device environments. Additional profiling, testing, and integration tools maintained under the OpenXLA organization. == Users and adopters == Several machine learning frameworks can use or interoperate with OpenXLA components, including JAX, TensorFlow, and parts of the PyTorch ecosystem. The project is developed with participation from multiple hardware and software organizations that contribute back-end integrations, testing, or specifications for their devices. This includes Alibaba, Amazon Web Services, AMD, Anyscale, Apple, Arm, Cerebras, Google, Graphcore, Hugging Face, Intel, Meta, NVIDIA and SiFive. == Supported target devices == x86-64 ARM64 NVIDIA GPU AMD GPU Intel GPU Apple GPU Google TPU AWS Trainium, Inferentia Cerebras Graphcore IPU == Governance == OpenXLA is developed as a community project with its work carried out in public repositories, discussion forums, and design meetings. Some components, such as StableHLO, began with stewardship from specific organizations and have outlined plans for more formal and distributed governance models as the project matures. == History == The project was announced in 2022 as an effort to coordinate development of ML compiler technologies across major AI companies, notably: Alibaba, Amazon Web Services, AMD, Anyscale, Apple, Arm, Cerebras, Google, Graphcore, Hugging Face, Intel, Meta, NVIDIA and SiFive.. It consolidated the XLA compiler, introduced StableHLO as a portable operator set, and created a unified structure for additional tools. Development continues within multiple repositories under the OpenXLA umbrella. It was founded by Eugene Burmako, James Rubin, Magnus Hyttsten, Mehdi Amini, Navid Khajouei, and Thea Lamkin from Google's Machine Learning organization.

Instance-based learning

In machine learning, instance-based learning (sometimes called memory-based learning) is a family of learning algorithms that, instead of performing explicit generalization, compare new problem instances with instances seen in training, which have been stored in memory. Because computation is postponed until a new instance is observed, these algorithms are sometimes referred to as "lazy." It is called instance-based because it constructs hypotheses directly from the training instances themselves. This means that the hypothesis complexity can grow with the data: in the worst case, a hypothesis is a list of n training items and the computational complexity of classifying a single new instance is O(n). One advantage that instance-based learning has over other methods of machine learning is its ability to adapt its model to previously unseen data. Instance-based learners may simply store a new instance or throw an old instance away. Examples of instance-based learning algorithms are the k-nearest neighbors algorithm, kernel machines and RBF networks. These store (a subset of) their training set; when predicting a value/class for a new instance, they compute distances or similarities between this instance and the training instances to make a decision. To battle the memory complexity of storing all training instances, as well as the risk of overfitting to noise in the training set, instance reduction algorithms have been proposed.

Dailyhunt

Dailyhunt (formerly Newshunt) is an Indian content and news aggregator application based in Bangalore, India that provides local language content in 14 Indian languages from multiple content providers. Viru serves as Founder of Dailyhunt with Co-founder Umang Bedi. == History == Dailyhunt, earlier called Newshunt, was created as a Symbian app in 2009 by two ex-Nokia employees Umesh Kulkarni and Chandrashekhar Sohoni. Later in 2011, Newshunt became available on the Android platform. It was by that time that Virendra Gupta, founder of Verse acquired the application. Virendra Gupta, better known as Viru, had started Verse in 2007 as a value-added service (VAS) company. In 2011, he acquired Newshunt from its owners Umesh and Chandrashekhar. Umesh became the CTO and stayed on to oversee its transition towards the smartphone era. In 2015, Viru renamed Newshunt as Dailyhunt. In early 2018, Viru roped in Umang Bedi, to be the President of Dailyhunt and lead the business with him while focusing on making the benefits of the platform available to a larger audience. Umang was elevated to co-founder in 2020. == Funding == In September 2014, Dailyhunt (then known as Newshunt) closed its Series B funding of INR 1 billion ( or approx $12 million in 2014) from Sequoia Capital India. The Series C funding round was led by Falcon Capital and was closed with $40 million in February 2015. In October 2016, the company received its Series D funding of $25 million from ByteDance and a Series E funding of $6.39 million from Falcon Edge Capital in September 2018. Additionally, Dailyhunt raised $3 Mn (INR 21.75 Cr) in a Series F funding round from Stonebridge Capital in August 2019. Other investors of Dailyhunt include Matrix Partners India, Omidyar Network, Goldman Sachs and Sofina. == Tie-ups and partnerships == In January 2021, Dailyhunt partnered with Twitter to bring ‘Twitter Moments’ to the Indian social app. Dailyhunt app now has a dedicated tab called “Twitter Moments India” to showcase curated tweets pertaining to news and other events. In January 2021, Dailyhunt announced the premiere of Season 2 of the popular show QuoteUnquote with KK (Kapil Khandelwal) on the app. It was the first podcast to have been launched on the Dailyhunt app. In September 2020, Dailyhunt signed up as an Associate Sponsor with Star Sports for Dream 11 IPL 2020. In May 2020, Snapdeal partnered with Dailyhunt to add new content on marketplace. In March 2019, Discovery Communications India, the factual entertainment network, entered into a multi-year partnership with Dailyhunt to showcase short-form content.

Environmental impact of AI

The environmental impact of the design, training, deployment and use of artificial intelligence includes the greenhouse gas emissions from generating electricity for data centres and computing hardware, operational and upstream water use, and material impacts from hardware manufacturing, mining and electronic waste. Estimating AI's environmental effects can be difficult because results depend on how impacts are measured, including whether accounting includes only model computation or also data-centre overhead, idle capacity, hardware manufacture, and local electricity supply. As these issues have received greater attention, governments and regulators have increasingly considered data-centre reporting requirements, energy-efficiency standards, and broader transparency measures for AI-related resource use. == Carbon footprint and energy use == AI-related energy use arises at multiple stages, including model training, fine-tuning, inference, storage, networking, and supporting infrastructure such as cooling and power conversion. === Individual level === Published estimates of energy use per AI request vary widely across models, tasks and measurement methods. A benchmark study presented at the 2024 ACM Conference on Fairness, Accountability, and Transparency found substantial differences between task types, with lower energy use for some text tasks and much higher energy use for image generation in the study's test conditions. In that benchmark, simple classification tasks consumed about 0.002–0.007 Wh per prompt on average (about 9% of a smartphone charge for 1,000 prompts), while text generation and text summarisation each used about 0.05 Wh per prompt; image generation averaged 2.91 Wh per prompt, and the least efficient image model in the study used 11.49 Wh per image (roughly equivalent to half a smartphone charge). First-party measurements in production environments have also been published. A 2025 Google study on Gemini assistant serving reported median per-prompt energy, emissions, and water-use estimates under the authors' accounting framework, while noting that different system boundaries can produce substantially different results. The study reported a median text-prompt estimate of about 0.24 Wh, which is roughly as much energy as watching nine seconds of television. The study also stated that software and infrastructure improvements reduced energy use by a factor of 33 and carbon emissions by a factor of 44 for a typical prompt over one year within the authors' framework. Researchers at the University of Michigan measured the energy consumption of various Meta Llama 3.1 models released in 2024 and found that smaller language models (8 billion parameters) use about 114 joules (0.03167 Wh) per response, while larger models (405 billion parameters) require up to 6,700 joules (1.861 Wh) per response. This corresponds to the energy needed to run a microwave oven for roughly one-tenth of a second and eight seconds, respectively. Comparisons between AI systems and human labour for specific tasks have produced mixed results and remain sensitive to assumptions about output quality, workload and system boundaries. A 2024 study in Scientific Reports reported 130 to 2900 times lower estimated carbon emissions for selected AI systems than for human writers and illustrators under its assumptions. A later Scientific Reports paper reported a counterexample for programming tasks under its assumptions, finding 5 to 19 times higher estimated emissions for the evaluated AI system than for human programmers on the benchmark used in that study. === System level === ==== Energy use and efficiency ==== AI electricity intensity depends not only on model architecture but also on hardware and facility efficiency. Data-centre operators commonly report Power usage effectiveness (PUE), which measures the ratio of total facility energy to IT equipment energy; a lower PUE indicates less overhead energy for cooling and other supporting infrastructure. Operators may also publish metrics and case studies on hardware efficiency, cooling systems and power sourcing. In its 2024 environmental report, Google stated that its 2023 total greenhouse gas emissions increased 13% year over year, primarily because of increased data-centre energy consumption and supply-chain emissions, while also reporting lower PUE than industry averages for its own facilities. The International Energy Agency has also reported that data centres remain a relatively small share of global electricity use overall, but that their local effects can be much more pronounced because demand is geographically concentrated. ==== Carbon footprint ==== At system level, AI contributes to rising electricity demand in data centres and related infrastructure. The International Energy Agency estimated that data centres used about 415 TWh of electricity in 2024, or around 1.5% of global electricity consumption, and projected that data-centre electricity use could rise to about 945 TWh by 2030, with AI identified as the main driver of that growth alongside other digital services. The carbon footprint of AI systems depends strongly on electricity sources, hardware efficiency, utilisation rates, and what stages are included in the accounting. Training large models can require substantial electricity, while total lifecycle impacts also depend on deployment scale and the amount of inference performed after training. Early analyses of frontier-model development reported rapid historical growth in training compute for selected systems, although later trends have depended on changes in model design, hardware and efficiency gains. Accounting methods that include upstream or embodied impacts, such as hardware manufacture and facilities construction, can materially affect estimates of AI-related emissions. === Decisions and strategies by individual companies === Large technology companies have reported that the expansion of AI and cloud infrastructure affects their sustainability targets, electricity demand, and resource use. Google, for example, attributed part of its emissions growth in 2023 to increased data-centre energy consumption and supply-chain emissions in its 2024 environmental report. Cloud and AI companies have also announced measures intended to reduce environmental impacts, including investment in more efficient hardware, low-carbon electricity procurement, alternative cooling systems, and water stewardship programmes. The extent, comparability, and third-party verification of such disclosures vary between firms and jurisdictions. == Water usage == Data centres can use water directly for cooling and indirectly through the water used in electricity generation, depending on the local energy mix. Public reporting on data-centre water use has often been inconsistent, making comparisons between operators and regions difficult. To standardise operational reporting, The Green Grid proposed the metric water usage effectiveness (WUE), defined as annual site water use divided by IT equipment energy use. WUE does not by itself measure local water stress, source sustainability, or all upstream water impacts. Studies of AI water use also distinguish between water withdrawal and water consumption. Research on AI-specific water use has argued that the water footprint of AI systems can be difficult to observe and may vary substantially by location, cooling design, and electricity source. A 2025 Communications of the ACM article summarised methods for estimating AI water footprints and emphasised the distinction between water withdrawal and water consumption. Li and colleagues estimated that global AI water withdrawal could reach 4.2–6.6 billion cubic metres in 2027 under the scenarios examined in their article. Using GPT-3, released by OpenAI in 2020, as an example, they estimated that training the model in Microsoft's U.S. data centres could consume about 700,000 litres of onsite water and about 5.4 million litres in total when offsite electricity-related water use was included; they also estimated that 10–50 medium-length GPT-3 responses could consume about 500 mL of water, depending on when and where the model was deployed. Published prompt-level estimates have also varied by system and accounting framework: the 2025 Google study on Gemini assistant serving reported a median text-prompt estimate of about 0.26 mL under its framework. Location can materially affect the significance of data-centre water use. Research on U.S. data centres found that one-fifth of servers' direct water footprint came from moderately to highly water-stressed watersheds, while nearly half of servers were fully or partially powered by plants located in water-stressed regions. A 2025 Reuters report, citing data from Verisk Maplecroft and NatureFinance, said that an average mid-sized data centre uses about 1.4 million litres of water per day for cooling and that Phoenix would experience a 32% increase in annual water stress if currently pl