"The Sword in the Stoned" is the fifth episode of the second season of the American fantasy comedy television series Ted. Written by Julius Sharpe, and directed by Seth MacFarlane, it premiered on the American streaming service Peacock, along with the rest of season two, on March 5, 2026. The series acts as a precursor to the Ted film franchise, showcasing the childhood lives of the protagonists. The series, set in 1994, focuses on John Bennett (Max Burkholder), the series' primary protagonist, an awkward high-school aged boy; along with Ted (MacFarlane), the series' titular anthropomorphic teddy bear. The two live with John's family, Susan (Alanna Ubach), his mild mannered mother, and Matty (Scott Grimes), his conservative father. Also residing with the family is Blaire (Giorgia Whigham), his radically liberal cousin whom often clashes with Matty. In the episode, Ted and John join the school play so they can have more extracurricular activities for their college applications, but the latter grows a connection with the school's popular teenager, Erin (Francesca Xuereb). Concurrently, Susan and Matty get a job at Dunkin' Donuts to help with their financial troubles, and Matty is given an opportunity to tell off Bill Clinton. Burkholder wore prop armor during the episode's play scenes. Bill Clinton’s appearance in the episode was portrayed by MacFarlane. After conventional makeup and visual techniques failed to convincingly resemble Clinton, the production used artificial intelligence to digitally replace MacFarlane's face with Clinton's likeness. Upon release, the episode received generally positive reviews from critics, though the use of AI in the Clinton scene was polarizing among audiences and reviewers. == Plot == John tells Ted that he is the last single guy left at their school, to which Ted points out the popular, single cheerleader, Erin, but John dismisses this. At home, Blaire tells John that he needs extracurricular activities to get into college, while Susan and Matty discuss their financial troubles, especially regarding John's college tuition. Looking over their options, they decide to audition for a school production of the play Camelot. Matty takes a job at Dunkin' Donuts, despite being told that nobody will give him a tip, and having to wear an incorrect name tag. Waiting for their auditions, John and Ted watch several poor auditions for the play before seeing Erin's, who delivers a flawless performance; John and Ted do less serious auditions, getting cast as knights, while Erin gets the role of Guinevere. Matty complains about his low salary, and Susan decides to get a job at Dunkin' Donuts beside him to help earn more income. Erin clashes with Lancelot's actor while rehearsing, and John compliments her performance, which she ignores, but, seeing Ted and John give good performances in a repetition exercise, she becomes interested in him, particularly since he treats her better than her stage-partner. Matty and Susan watch an employee training video, explaining how they should treat customers politely, not affecting Matty's nihilistic attitude. The manager announces that Bill Clinton is visiting their Dunkin' Donuts for publicity, and Matty sees this as a chance to tell Bill off. John and Erin practice lines, as she reveals the show is being taped so it can be sent to Emerson College in hopes of her getting in; Erin asks John to go out with her after the show. At dinner, Matty enthusiastically reveals what he plans to tell Bill, as John becomes stressed about the play when Susan tells there will be a large audience. Bill comes to the Dunkin' Donuts, and, seeing Matty is nervously insulting him, stages a private meeting with him, where Bill yells at Matty, calling him a loser before posing for a picture with Matty and subsequently throwing the cold coffee onto him. To ease the pressure, Ted and John take edibles from Blaire, but learn at the show that they contained mushrooms, causing them to stress further. On stage, Ted and John yell nervously that they're on drugs as the latter urinates in his costume, causing Erin to angrily storm off. == Production == "The Sword in the Stoned" was directed by series creator and lead Seth MacFarlane, and written by Julius Sharpe in his third and final writing credit for the series. When Ted and John are doing repetition exercises, they tackle each other to the ground, which required a stuntman named Ashton to play the role of Ted, according to Max Burkholder, who portrays John. Burkholder also recalled that, when Ted was choking John in the scene, he kept making a noise during the choking, which made Bill, the cameraman, laugh, despite being a "stone face" that never laughs, noting that seeing him be amused by the noise he was making assured Burkholder that what he was doing was "hilarious". Burkholder found the filming of the play scenes "weird", as he was put in fake armor with a hose inside his suit—which was filled with water mixed with yellow food coloring—that was made to create the urine stream that comes out of John's armor in the episode; he also noted that it took around 45 minutes to put on and take off the armor. He revealed that he himself had to urinate during the filming, as doing a scene about a character having to do so "really [broke] my brain", with the fact that it took 45 minutes to get the suit off adding to the frustration. Jennifer Ashley Connell, who worked for wardrobe, had to repeatedly go to Burkholder quickly between takes to dry off his pants with two hair dryers to make it look like the fake urine hadn't already streamed down his pants, so they could get as many shots of it as possible. Francesca Xuereb guest stars in the episode as Erin, the cheerleader who stars in the play. Incumbent president Bill Clinton was portrayed by MacFarlane, with artificial intelligence (AI) being used to digitally make MacFarlane's face look like Clinton's during post-production. Before settling on AI, the crew tried to use traditional computer-generated imagery and prosthetics, which made him look "terrifying", resulting in them deciding that AI would give them a more accurate look. One of the original technologies considered was one where, after scanning MacFarlane, a mesh of his head was created, and they had to use computer graphics to replace MacFarlane's face with Clinton's. An issue was faced, however, when they found the archival footage used as reference from the Clinton Library—an official Presidential Library containing information related to Clinton—to be extremely low-quality, making it hard to properly emulate his face, since only still images were of acceptable quality, and there weren't references of his moving face to work off of. A forensic artist was hired to help with this, and they created a 3D model of Clinton's head in ZBrush, based off of his presidential portrait. The model head worked for still frames, but movement was still difficult to do realistically, due to it being made for a "single-point perspective", which made details like the cheekbones or other minor issues more noticeable when using it for the scene. Since this did not work, AI was ultimately chosen through the studio Deep Voodoo, which used large language models to teach the tool how to correctly replicate Clinton's appearance. Defending the episode's use of AI, MacFarlane noted that the crew did not want people to focus on the tool being used, trying to utilize it in a way that wouldn't distract from the humor and narrative. Like the rest of the series, the episode was shot using ViewScreen; MacFarlane was able to act live with the cast as Ted due to ViewScreen, a technology that allows the production crew to visualize what Ted will look like in each scene in real time. == Release and reception == "The Sword in the Stoned" was first released on March 5, 2026, on the American streaming service Peacock, along with the rest of the second season. Nate Richards of Collider highlighted the Dunkin' Donuts subplot as an example of Scott Grimes delivering a "lot of laughs" through his performance as Matty. Dustin Rowles of Pajiba called "The Sword in the Stoned" one of the season's many episodes he'd recommend, particularly for the scenes of Ted and John being high on mushrooms during the play. Oppositely, Nick Valdez of ComicBook.com ranked the episode as the worst of the second season, criticizing it for not having a "huge impact" on the Bennett family dynamic like other episodes of the season do, and Susan and Matty's side story as the main reason he felt it was "[kept] from being great". Valdez noted the episode for likely being an advertisement for Dunkin' Donuts, calling the plot's ending scene involving Clinton the reason "it just all sticks out like a sore thumb". === Response to AI usage === The episode's use of AI for MacFarlane's portrayal of Clinton proved controversial, mainly on social media, where audiences asserted that the crew should have gotten an actor that resembl
Rule-based machine translation
Rule-based machine translation (RBMT) is a classical approach of machine translation systems based on linguistic information about source and target languages. Such information is retrieved from (unilingual, bilingual or multilingual) dictionaries and grammars covering the main semantic, morphological, and syntactic regularities of each language. Having input sentences, an RBMT system generates output sentences on the basis of analysis of both the source and the target languages involved. RBMT has been progressively superseded by more efficient methods, particularly neural machine translation. == History == The first RBMT systems were developed in the early 1970s. The most important steps of this evolution were the emergence of the following RBMT systems: Systran Japanese MT systems Today, other common RBMT systems include: Apertium GramTrans == Types of RBMT == There are three different types of rule-based machine translation systems: Direct Systems (Dictionary Based Machine Translation) map input to output with basic rules. Transfer RBMT Systems (Transfer Based Machine Translation) employ morphological and syntactical analysis. Interlingual RBMT Systems (Interlingua) use an abstract meaning. RBMT systems can also be characterized as the systems opposite to Example-based Systems of Machine Translation (Example Based Machine Translation), whereas Hybrid Machine Translations Systems make use of many principles derived from RBMT. == Basic principles == The main approach of RBMT systems is based on linking the structure of the given input sentence with the structure of the demanded output sentence, necessarily preserving their unique meaning. The following example can illustrate the general frame of RBMT: A girl eats an apple. Source Language = English; Demanded Target Language = German Minimally, to get a German translation of this English sentence one needs: A dictionary that will map each English word to an appropriate German word. Rules representing regular English sentence structure. Rules representing regular German sentence structure. And finally, we need rules according to which one can relate these two structures together. Accordingly, we can state the following stages of translation: 1st: getting basic part-of-speech information of each source word: a = indef.article; girl = noun; eats = verb; an = indef.article; apple = noun 2nd: getting syntactic information about the verb "to eat": NP-eat-NP; here: eat – Present Simple, 3rd Person Singular, Active Voice 3rd: parsing the source sentence: (NP an apple) = the object of eat Often only partial parsing is sufficient to get to the syntactic structure of the source sentence and to map it onto the structure of the target sentence. 4th: translate English words into German a (category = indef.article) => ein (category = indef.article) girl (category = noun) => Mädchen (category = noun) eat (category = verb) => essen (category = verb) an (category = indef. article) => ein (category = indef.article) apple (category = noun) => Apfel (category = noun) 5th: Mapping dictionary entries into appropriate inflected forms (final generation): A girl eats an apple. => Ein Mädchen isst einen Apfel. == Ontologies == An ontology is a formal representation of knowledge that includes the concepts (such as objects, processes etc.) in a domain and some relations between them. If the stored information is of linguistic nature, one can speak of a lexicon. In NLP, ontologies can be used as a source of knowledge for machine translation systems. With access to a large knowledge base, rule-based systems can be enabled to resolve many (especially lexical) ambiguities on their own. In the following classic examples, as humans, we are able to interpret the prepositional phrase according to the context because we use our world knowledge, stored in our lexicons:I saw a man/star/molecule with a microscope/telescope/binoculars.Since the syntax does not change, a traditional rule-based machine translation system may not be able to differentiate between the meanings. With a large enough ontology as a source of knowledge however, the possible interpretations of ambiguous words in a specific context can be reduced. === Building ontologies === The ontology generated for the PANGLOSS knowledge-based machine translation system in 1993 may serve as an example of how an ontology for NLP purposes can be compiled: A large-scale ontology is necessary to help parsing in the active modules of the machine translation system. In the PANGLOSS example, about 50,000 nodes were intended to be subsumed under the smaller, manually-built upper (abstract) region of the ontology. Because of its size, it had to be created automatically. The goal was to merge the two resources LDOCE online and WordNet to combine the benefits of both: concise definitions from Longman, and semantic relations allowing for semi-automatic taxonomization to the ontology from WordNet. A definition match algorithm was created to automatically merge the correct meanings of ambiguous words between the two online resources, based on the words that the definitions of those meanings have in common in LDOCE and WordNet. Using a similarity matrix, the algorithm delivered matches between meanings including a confidence factor. This algorithm alone, however, did not match all meanings correctly on its own. A second hierarchy match algorithm was therefore created which uses the taxonomic hierarchies found in WordNet (deep hierarchies) and partially in LDOCE (flat hierarchies). This works by first matching unambiguous meanings, then limiting the search space to only the respective ancestors and descendants of those matched meanings. Thus, the algorithm matched locally unambiguous meanings (for instance, while the word seal as such is ambiguous, there is only one meaning of seal in the animal subhierarchy). Both algorithms complemented each other and helped constructing a large-scale ontology for the machine translation system. The WordNet hierarchies, coupled with the matching definitions of LDOCE, were subordinated to the ontology's upper region. As a result, the PANGLOSS MT system was able to make use of this knowledge base, mainly in its generation element. == Components == The RBMT system contains: a SL morphological analyser - analyses a source language word and provides the morphological information; a SL parser - is a syntax analyser which analyses source language sentences; a translator - used to translate a source language word into the target language; a TL morphological generator - works as a generator of appropriate target language words for the given grammatica information; a TL parser - works as a composer of suitable target language sentences; Several dictionaries - more specifically a minimum of three dictionaries: a SL dictionary - needed by the source language morphological analyser for morphological analysis, a bilingual dictionary - used by the translator to translate source language words into target language words, a TL dictionary - needed by the target language morphological generator to generate target language words. The RBMT system makes use of the following: a Source Grammar for the input language which builds syntactic constructions from input sentences; a Source Lexicon which captures all of the allowable vocabulary in the domain; Source Mapping Rules which indicate how syntactic heads and grammatical functions in the source language are mapped onto domain concepts and semantic roles in the interlingua; a Domain Model/Ontology which defines the classes of domain concepts and restricts the fillers of semantic roles for each class; Target Mapping Rules which indicate how domain concepts and semantic roles in the interlingua are mapped onto syntactic heads and grammatical functions in the target language; a Target Lexicon which contains appropriate target lexemes for each domain concept; a Target Grammar for the target language which realizes target syntactic constructions as linearized output sentences. == Advantages == No bilingual texts are required. This makes it possible to create translation systems for languages that have no texts in common, or even no digitized data whatsoever. Domain independent. Rules are usually written in a domain independent manner, so the vast majority of rules will "just work" in every domain, and only a few specific cases per domain may need rules written for them. No quality ceiling. Every error can be corrected with a targeted rule, even if the trigger case is extremely rare. This is in contrast to statistical systems where infrequent forms will be washed away by default. Total control. Because all rules are hand-written, you can easily debug a rule-based system to see exactly where a given error enters the system, and why. Reusability. Because RBMT systems are generally built from a strong source language analysis that is fed to a transfer step and target language generator, the source language analysis and targe
Evaluation of binary classifiers
Evaluation of a binary classifier typically assigns a numerical value, or values, to a classifier that represent its accuracy. An example is error rate, which measures how frequently the classifier makes a mistake. There are many metrics that can be used; different fields have different preferences. For example, in medicine sensitivity and specificity are often used, while in computer science precision and recall are preferred. An important distinction is between metrics that are independent of the prevalence or skew (how often each class occurs in the population), and metrics that depend on the prevalence – both types are useful, but they have very different properties. Often, evaluation is used to compare two methods of classification, so that one can be adopted and the other discarded. Such comparisons are more directly achieved by a form of evaluation that results in a single unitary metric rather than a pair of metrics. == Contingency table == Given a data set, a classification (the output of a classifier on that set) gives two numbers: the number of positives and the number of negatives, which add up to the total size of the set. To evaluate a classifier, one compares its output to another reference classification – ideally a perfect classification, but in practice the output of another gold standard test – and cross tabulates the data into a 2×2 contingency table, comparing the two classifications. One then evaluates the classifier relative to the gold standard by computing summary statistics of these 4 numbers. Generally these statistics will be scale invariant (scaling all the numbers by the same factor does not change the output), to make them independent of population size, which is achieved by using ratios of homogeneous functions, most simply homogeneous linear or homogeneous quadratic functions. Say we test some people for the presence of a disease. Some of these people have the disease, and our test correctly says they are positive. They are called true positives (TP). Some have the disease, but the test incorrectly claims they don't. They are called false negatives (FN). Some don't have the disease, and the test says they don't – true negatives (TN). Finally, there might be healthy people who have a positive test result – false positives (FP). These can be arranged into a 2×2 contingency table (confusion matrix), conventionally with the test result on the vertical axis and the actual condition on the horizontal axis. These numbers can then be totaled, yielding both a grand total and marginal totals. Totaling the entire table, the number of true positives, false negatives, true negatives, and false positives add up to 100% of the set. Totaling the columns (adding vertically) the number of true positives and false positives add up to 100% of the test positives, and likewise for negatives. Totaling the rows (adding horizontally), the number of true positives and false negatives add up to 100% of the condition positives (conversely for negatives). The basic marginal ratio statistics are obtained by dividing the 2×2=4 values in the table by the marginal totals (either rows or columns), yielding 2 auxiliary 2×2 tables, for a total of 8 ratios. These ratios come in 4 complementary pairs, each pair summing to 1, and so each of these derived 2×2 tables can be summarized as a pair of 2 numbers, together with their complements. Further statistics can be obtained by taking ratios of these ratios, ratios of ratios, or more complicated functions. The contingency table and the most common derived ratios are summarized below; see sequel for details. Note that the rows correspond to the condition actually being positive or negative (or classified as such by the gold standard), as indicated by the color-coding, and the associated statistics are prevalence-independent, while the columns correspond to the test being positive or negative, and the associated statistics are prevalence-dependent. There are analogous likelihood ratios for prediction values, but these are less commonly used, and not depicted above. == Pairs of metrics == Often accuracy is evaluated with a pair of metrics composed in a standard pattern. === Sensitivity and specificity === The fundamental prevalence-independent statistics are sensitivity and specificity. Sensitivity or True Positive Rate (TPR), also known as recall, is the proportion of people that tested positive and are positive (True Positive, TP) of all the people that actually are positive (Condition Positive, CP = TP + FN). It can be seen as the probability that the test is positive given that the patient is sick. With higher sensitivity, fewer actual cases of disease go undetected (or, in the case of the factory quality control, fewer faulty products go to the market). Specificity (SPC) or True Negative Rate (TNR) is the proportion of people that tested negative and are negative (True Negative, TN) of all the people that actually are negative (Condition Negative, CN = TN + FP). As with sensitivity, it can be looked at as the probability that the test result is negative given that the patient is not sick. With higher specificity, fewer healthy people are labeled as sick (or, in the factory case, fewer good products are discarded). The relationship between sensitivity and specificity, as well as the performance of the classifier, can be visualized and studied using the Receiver Operating Characteristic (ROC) curve. In theory, sensitivity and specificity are independent in the sense that it is possible to achieve 100% in both (such as in the red/blue ball example given above). In more practical, less contrived instances, however, there is usually a trade-off, such that they are inversely proportional to one another to some extent. This is because we rarely measure the actual thing we would like to classify; rather, we generally measure an indicator of the thing we would like to classify, referred to as a surrogate marker. The reason why 100% is achievable in the ball example is because redness and blueness is determined by directly detecting redness and blueness. However, indicators are sometimes compromised, such as when non-indicators mimic indicators or when indicators are time-dependent, only becoming evident after a certain lag time. The following example of a pregnancy test will make use of such an indicator. Modern pregnancy tests do not use the pregnancy itself to determine pregnancy status; rather, human chorionic gonadotropin is used, or hCG, present in the urine of gravid females, as a surrogate marker to indicate that a woman is pregnant. Because hCG can also be produced by a tumor, the specificity of modern pregnancy tests cannot be 100% (because false positives are possible). Also, because hCG is present in the urine in such small concentrations after fertilization and early embryogenesis, the sensitivity of modern pregnancy tests cannot be 100% (because false negatives are possible). === Positive and negative predictive values === In addition to sensitivity and specificity, the performance of a binary classification test can be measured with positive predictive value (PPV), also known as precision, and negative predictive value (NPV). The positive prediction value answers the question "If the test result is positive, how well does that predict an actual presence of disease?". It is calculated as TP/(TP + FP); that is, it is the proportion of true positives out of all positive results. The negative prediction value is the same, but for negatives, naturally. ==== Impact of prevalence on predictive values ==== Prevalence has a significant impact on prediction values. As an example, suppose there is a test for a disease with 99% sensitivity and 99% specificity. If 2000 people are tested and the prevalence (in the sample) is 50%, 1000 of them are sick and 1000 of them are healthy. Thus about 990 true positives and 990 true negatives are likely, with 10 false positives and 10 false negatives. The positive and negative prediction values would be 99%, so there can be high confidence in the result. However, if the prevalence is only 5%, so of the 2000 people only 100 are really sick, then the prediction values change significantly. The likely result is 99 true positives, 1 false negative, 1881 true negatives and 19 false positives. Of the 19+99 people tested positive, only 99 really have the disease – that means, intuitively, that given that a patient's test result is positive, there is only 84% chance that they really have the disease. On the other hand, given that the patient's test result is negative, there is only 1 chance in 1882, or 0.05% probability, that the patient has the disease despite the test result. === Precision and recall === Precision and recall can be interpreted as (estimated) conditional probabilities: Precision is given by P ( C = P | C ^ = P ) {\displaystyle P(C=P|{\hat {C}}=P)} while recall is given by P ( C ^ = P | C = P ) {\displaystyle P({\hat {C}}=P|C=P)} , where C ^ {\
Environmental impact of AI
The environmental impact of the design, training, deployment and use of artificial intelligence includes the greenhouse gas emissions from generating electricity for data centres and computing hardware, operational and upstream water use, and material impacts from hardware manufacturing, mining and electronic waste. Estimating AI's environmental effects can be difficult because results depend on how impacts are measured, including whether accounting includes only model computation or also data-centre overhead, idle capacity, hardware manufacture, and local electricity supply. As these issues have received greater attention, governments and regulators have increasingly considered data-centre reporting requirements, energy-efficiency standards, and broader transparency measures for AI-related resource use. == Carbon footprint and energy use == AI-related energy use arises at multiple stages, including model training, fine-tuning, inference, storage, networking, and supporting infrastructure such as cooling and power conversion. === Individual level === Published estimates of energy use per AI request vary widely across models, tasks and measurement methods. A benchmark study presented at the 2024 ACM Conference on Fairness, Accountability, and Transparency found substantial differences between task types, with lower energy use for some text tasks and much higher energy use for image generation in the study's test conditions. In that benchmark, simple classification tasks consumed about 0.002–0.007 Wh per prompt on average (about 9% of a smartphone charge for 1,000 prompts), while text generation and text summarisation each used about 0.05 Wh per prompt; image generation averaged 2.91 Wh per prompt, and the least efficient image model in the study used 11.49 Wh per image (roughly equivalent to half a smartphone charge). First-party measurements in production environments have also been published. A 2025 Google study on Gemini assistant serving reported median per-prompt energy, emissions, and water-use estimates under the authors' accounting framework, while noting that different system boundaries can produce substantially different results. The study reported a median text-prompt estimate of about 0.24 Wh, which is roughly as much energy as watching nine seconds of television. The study also stated that software and infrastructure improvements reduced energy use by a factor of 33 and carbon emissions by a factor of 44 for a typical prompt over one year within the authors' framework. Researchers at the University of Michigan measured the energy consumption of various Meta Llama 3.1 models released in 2024 and found that smaller language models (8 billion parameters) use about 114 joules (0.03167 Wh) per response, while larger models (405 billion parameters) require up to 6,700 joules (1.861 Wh) per response. This corresponds to the energy needed to run a microwave oven for roughly one-tenth of a second and eight seconds, respectively. Comparisons between AI systems and human labour for specific tasks have produced mixed results and remain sensitive to assumptions about output quality, workload and system boundaries. A 2024 study in Scientific Reports reported 130 to 2900 times lower estimated carbon emissions for selected AI systems than for human writers and illustrators under its assumptions. A later Scientific Reports paper reported a counterexample for programming tasks under its assumptions, finding 5 to 19 times higher estimated emissions for the evaluated AI system than for human programmers on the benchmark used in that study. === System level === ==== Energy use and efficiency ==== AI electricity intensity depends not only on model architecture but also on hardware and facility efficiency. Data-centre operators commonly report Power usage effectiveness (PUE), which measures the ratio of total facility energy to IT equipment energy; a lower PUE indicates less overhead energy for cooling and other supporting infrastructure. Operators may also publish metrics and case studies on hardware efficiency, cooling systems and power sourcing. In its 2024 environmental report, Google stated that its 2023 total greenhouse gas emissions increased 13% year over year, primarily because of increased data-centre energy consumption and supply-chain emissions, while also reporting lower PUE than industry averages for its own facilities. The International Energy Agency has also reported that data centres remain a relatively small share of global electricity use overall, but that their local effects can be much more pronounced because demand is geographically concentrated. ==== Carbon footprint ==== At system level, AI contributes to rising electricity demand in data centres and related infrastructure. The International Energy Agency estimated that data centres used about 415 TWh of electricity in 2024, or around 1.5% of global electricity consumption, and projected that data-centre electricity use could rise to about 945 TWh by 2030, with AI identified as the main driver of that growth alongside other digital services. The carbon footprint of AI systems depends strongly on electricity sources, hardware efficiency, utilisation rates, and what stages are included in the accounting. Training large models can require substantial electricity, while total lifecycle impacts also depend on deployment scale and the amount of inference performed after training. Early analyses of frontier-model development reported rapid historical growth in training compute for selected systems, although later trends have depended on changes in model design, hardware and efficiency gains. Accounting methods that include upstream or embodied impacts, such as hardware manufacture and facilities construction, can materially affect estimates of AI-related emissions. === Decisions and strategies by individual companies === Large technology companies have reported that the expansion of AI and cloud infrastructure affects their sustainability targets, electricity demand, and resource use. Google, for example, attributed part of its emissions growth in 2023 to increased data-centre energy consumption and supply-chain emissions in its 2024 environmental report. Cloud and AI companies have also announced measures intended to reduce environmental impacts, including investment in more efficient hardware, low-carbon electricity procurement, alternative cooling systems, and water stewardship programmes. The extent, comparability, and third-party verification of such disclosures vary between firms and jurisdictions. == Water usage == Data centres can use water directly for cooling and indirectly through the water used in electricity generation, depending on the local energy mix. Public reporting on data-centre water use has often been inconsistent, making comparisons between operators and regions difficult. To standardise operational reporting, The Green Grid proposed the metric water usage effectiveness (WUE), defined as annual site water use divided by IT equipment energy use. WUE does not by itself measure local water stress, source sustainability, or all upstream water impacts. Studies of AI water use also distinguish between water withdrawal and water consumption. Research on AI-specific water use has argued that the water footprint of AI systems can be difficult to observe and may vary substantially by location, cooling design, and electricity source. A 2025 Communications of the ACM article summarised methods for estimating AI water footprints and emphasised the distinction between water withdrawal and water consumption. Li and colleagues estimated that global AI water withdrawal could reach 4.2–6.6 billion cubic metres in 2027 under the scenarios examined in their article. Using GPT-3, released by OpenAI in 2020, as an example, they estimated that training the model in Microsoft's U.S. data centres could consume about 700,000 litres of onsite water and about 5.4 million litres in total when offsite electricity-related water use was included; they also estimated that 10–50 medium-length GPT-3 responses could consume about 500 mL of water, depending on when and where the model was deployed. Published prompt-level estimates have also varied by system and accounting framework: the 2025 Google study on Gemini assistant serving reported a median text-prompt estimate of about 0.26 mL under its framework. Location can materially affect the significance of data-centre water use. Research on U.S. data centres found that one-fifth of servers' direct water footprint came from moderately to highly water-stressed watersheds, while nearly half of servers were fully or partially powered by plants located in water-stressed regions. A 2025 Reuters report, citing data from Verisk Maplecroft and NatureFinance, said that an average mid-sized data centre uses about 1.4 million litres of water per day for cooling and that Phoenix would experience a 32% increase in annual water stress if currently pl
Coherent extrapolated volition
Coherent extrapolated volition (CEV) is a theoretical framework in the field of AI alignment describing an approach by which an artificial superintelligence (ASI) would act on a benevolent supposition of what humans would want if they were more knowledgeable, more rational, had more time to think, and had matured together as a society, as opposed to humanity's current individual or collective preferences. It was proposed by Eliezer Yudkowsky in 2004 as part of his work on friendly AI. == Concept == CEV proposes that an advanced AI system should derive its goals by extrapolating the idealized volition of humanity. This means aggregating and projecting human preferences into a coherent utility function that reflects what people would desire under ideal epistemic and moral conditions. The aim is to ensure that AI systems are aligned with humanity's true interests, rather than with transient or poorly informed preferences. In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted. == Debate == Yudkowsky and Nick Bostrom note that CEV has several interesting properties. It is designed to be humane and self-correcting, by capturing the source of human values instead of trying to list them. It avoids the difficulty of laying down an explicit, fixed list of rules. It encapsulates moral growth, preventing flawed current moral beliefs from getting locked in. It limits the influence that a small group of programmers can have on what the ASI would value, thus also reducing the incentives to build ASI first. And it keeps humanity in charge of its destiny. CEV also faces significant theoretical and practical challenges. Bostrom notes that CEV has "a number of free parameters that could be specified in various ways, yielding different versions of the proposal." One such parameter is the extrapolation base (whose extrapolated volition is taken into account). For example, whether it should include people with severe dementia, patients in a vegetative state, foetuses, or embryos. He also notes that if CEV's extrapolation base only includes humans, there is a risk that the result would be ungenerous toward other animals and digital minds. One possible solution would be to include a mechanism to expand CEV's extrapolation base. == Variants and alternatives == A proposed theoretical alternative to CEV is to rely on an artificial superintelligence's superior cognitive capabilities to figure out what is morally right, and let it act accordingly. It is also possible to combine both techniques, for instance with the ASI following CEV except when it is morally impermissible. In another review, a philosophical analysis explores CEV through the lens of social trust in autonomous systems. Drawing on Anthony Giddens' concept of "active trust", the author proposes an evolution of CEV into "Coherent, Extrapolated and Clustered Volition" (CECV). This formulation aims to better reflect the moral preferences of diverse cultural groups, thus offering a more pragmatic ethical framework for designing AI systems that earn public trust while accommodating societal diversity.
Cloudflare
Cloudflare, Inc., is an American technology company headquartered in San Francisco, California, that provides a range of internet services, including content delivery network (CDN) services, cloud cybersecurity, DDoS mitigation, and ICANN-accredited domain registration. The company's services act primarily as a reverse proxy between website visitors and a customer's hosting provider, improving performance and protecting against malicious traffic. Cloudflare was founded in 2009 by Matthew Prince, Lee Holloway, and Michelle Zatlyn. The company went public on the New York Stock Exchange in 2019 under the ticker symbol NET. Cloudflare has since expanded its offerings to include edge computing through its Workers platform, a public DNS resolver (1.1.1.1), and a VPN-like service known as WARP. In recent years, the company has integrated artificial intelligence into its infrastructure, acquiring companies such as Replicate and launching tools to manage AI bots and scrapers. According to W3Techs, Cloudflare is used by approximately 21.3% of all websites on the Internet as of January 2026. The company has been the subject of controversy regarding its policy of content neutrality. While Cloudflare executives have historically advocated for remaining a neutral infrastructure provider, the company has terminated services for specific high-profile websites associated with hate speech and violence, including The Daily Stormer, 8chan, and Kiwi Farms, following significant public pressure. Cloudflare has also faced criticism and litigation regarding copyright infringement by websites using its services, notably losing a lawsuit against Japanese publishers in 2025. The company experienced significant global outages in late 2025 which disrupted services for major platforms internationally. == History == Cloudflare was founded on July 26, 2009, by Matthew Prince, Lee Holloway, and Michelle Zatlyn. Prince and Holloway had previously collaborated on Project Honey Pot, a product of Unspam Technologies that partly inspired the basis of Cloudflare. In 2009, the company was venture-capital funded. On August 15, 2019, Cloudflare submitted its S-1 filing for an initial public offering on the New York Stock Exchange under the stock ticker NET. It opened for public trading on September 13, 2019, at $15 per share. According to the company, the name 'Cloudflare' was chosen, over the initial 'WebWall', because it best described what they were trying to do: build a "firewall in the cloud." In 2020, Cloudflare co-founder and COO Michelle Zatlyn was named president. Cloudflare has acquired web-services and security companies, including StopTheHacker (February 2014), CryptoSeal (June 2014), Eager Platform Co. (December 2016), Neumob (November 2017), S2 Systems (January 2020), Linc (December 2020), Zaraz (December 2021), Vectrix (February 2022), Area 1 Security (February 2022), Nefeli Networks (March 2024), BastionZero (May 2024), and Kivera (October 2024). Replicate (November 2025), and Human Native (January 2026). Since at least 2017, Cloudflare has used a wall of lava lamps at its San Francisco headquarters as a source of randomness for encryption keys, alongside double pendulums at its London offices and a Geiger counter at its Singapore offices. The lava lamp installation implements the Lavarand method, where a camera transforms the unpredictable shapes of the "lava" blobs into a digital image. In Q4 2022, Cloudflare provided paid services to 162,086 customers. In October 2024, Cloudflare won a lawsuit against patent troll Sable Networks. Sable paid Cloudflare $225,000, granted it a royalty-free license to its patent portfolio, and dedicated its patents to the public by abandoning its patent rights. In November 2025, it was announced Cloudflare had agreed to acquire Replicate, a San Francisco–based platform that enables software developers to run, fine-tune, and deploy open-source machine-learning models via an API without managing infrastructure. In January 2026, Cloudflare released an analysis regarding BGP routing leaks observed from the Venezuelan state-owned ISP CANTV (AS8048), which occurred on January 2 coincides with the arrest of Nicolás Maduro. While some security researchers had speculated that the outages were linked to U.S. cyber operations, Cloudflare's data indicated that the anomalies were consistent with a pattern of "insufficient routing export and import policies" by the ISP rather than malicious external interference. In January 2026, Cloudflare acquired Human Native, an AI data marketplace that brokers transactions between developers and content creators, for an undisclosed amount. On January 16, 2026, Cloudflare acquired The Astro Technology Company, the developers behind the open-source web framework Astro. In May 2026, Cloudflare announced the elimination of approximately 1,100 positions, around 20 percent of its workforce, in a restructuring the company attributed to the rapid adoption of artificial intelligence tools. The announcement coincided with the company's first-quarter 2026 earnings, which reported a record $639.8 million in quarterly revenue, a 34 percent year-over-year increase. CEO Matthew Prince stated the cuts were not driven by performance concerns but reflected roles made obsolete by AI, and that Cloudflare expected to employ more people by the end of 2027 than at any point during 2026. == Products == Cloudflare provides network and security products for consumers and businesses, utilizing edge computing, reverse proxies for web traffic, data center interconnects, and a content distribution network to serve content across its network of servers. It supports transport layer protocols TCP, UDP, QUIC, and many application layer protocols such as DNS over HTTPS, SMTP, and HTTP/2 with support for HTTP/2 Server Push. As of 2023, Cloudflare handles an average of 45 million HTTP requests per second. As of 2024, Cloudflare servers are powered by AMD EPYC 9684X processors. Cloudflare also provides analysis and reports on large-scale outages, including Verizon's October 2024 outage. === Artificial intelligence === In 2023, Cloudflare launched "Workers AI", a framework allowing for use of Nvidia GPU's within Cloudflare's network. In 2024, Cloudflare launched a tool that prevents bots from scraping websites. To build automatic bot detector models, the company analyzed "AI" bots and crawler traffic. The company also launched an "AI" assistant to generate charts based on queries by leveraging "Workers AI". Cloudflare announced plans in September 2024 to launch a marketplace where website owners can sell "AI" model providers access to scrape their site's content. In March 2025, Cloudflare announced a new feature called "AI Labyrinth", which combats unauthorized "AI" data scraping by serving fake "AI"-generated content to LLM bots. In July, the company rolled out a permission-based setting to allow websites to automatically block online bots from scraping data and content. Cloudflare released AutoRAG into beta in 2025. AutoRAG (retrieval augmented generation) creates a vector database of a website's unstructured content to identify relationships between concepts. It is part of an initiative with Microsoft, alongside their NLWeb standard, to make websites easier for people and automated systems to query. Cloudflare and GoDaddy partnered in April 2026 to enable AI Crawl Control features on GoDaddy hosted websites. This would allow site owners to decide how AI bot crawlers interact with their content. === DDoS mitigation === Cloudflare provides free and paid DDoS mitigation services that protect customers from distributed denial of service (DDoS) attacks. Cloudflare received media attention in June 2011 for providing DDoS mitigation for the website of LulzSec, a black hat hacking group. In March 2013, The Spamhaus Project was targeted by a DDoS attack that Cloudflare reported exceeded 300 gigabits per second (Gbit/s). Patrick Gilmore, of Akamai, stated that at the time it was "the largest publicly announced DDoS attack in the history of the Internet". While trying to defend Spamhaus against the DDoS attacks, Cloudflare ended up being attacked as well; Google and other companies eventually came to Spamhaus' defense and helped it to absorb the unprecedented amount of attack traffic. In 2014, Cloudflare began providing free DDoS mitigation for artists, activists, journalists, and human rights groups under the name "Project Galileo". In 2017, it extended the service to electoral infrastructure and political campaigns under the name "Athenian Project". By 2025, more than 2,900 users and organizations were participating in Project Galileo, including 31 US states. In February 2014, Cloudflare claimed to have mitigated an NTP reflection attack against an unnamed European customer, which it stated peaked at 400 Gbit/s. In November 2014, it reported a 500 Gbit/s DDoS attack in Hong Kong. In July 2021, the company claimed to have absorbed a DDoS atta
PagedAttention
PagedAttention is an attention algorithm for efficient serving of large language models (LLMs). It was introduced in 2023 by Woosuk Kwon and colleagues in the paper Efficient Memory Management for Large Language Model Serving with PagedAttention, alongside the vLLM serving engine. The method stores the key–value cache used during autoregressive decoding in fixed-size blocks that can be mapped to non-contiguous physical memory, borrowing ideas from virtual memory, paging, and operating system design. == Background == In transformer inference, the key–value cache grows with sequence length and the number of concurrent requests. Kwon et al. argued that earlier serving systems typically reserved contiguous cache regions in advance, which caused reserved space, internal fragmentation, and external fragmentation. In their experiments, the paper reported that the effective memory utilization of previous systems could fall as low as 20.4%. == Description == PagedAttention partitions the cache of each sequence into fixed-size KV blocks. A request's cache is represented as a sequence of logical blocks, while a block table maps those logical blocks to physical GPU-memory blocks. As a result, neighboring logical blocks do not need to be contiguous in physical memory, and new blocks can be allocated on demand as generation proceeds. The design also makes it easier to share cache state across related decoding paths. In vLLM, physical blocks can be reference-counted and shared among requests or branches, with block-granularity copy-on-write used when a shared block must be modified. The original paper applied this design to parallel sampling, beam search, and prompts with shared prefixes. == Mathematical formulation == For a query token i {\displaystyle i} in causal self-attention, the standard attention output can be written as a i j = exp ( q i ⊤ k j / d ) ∑ t = 1 i exp ( q i ⊤ k t / d ) , o i = ∑ j = 1 i a i j v j {\displaystyle a_{ij}={\frac {\exp(\mathbf {q} _{i}^{\top }\mathbf {k} _{j}/{\sqrt {d}})}{\sum _{t=1}^{i}\exp(\mathbf {q} _{i}^{\top }\mathbf {k} _{t}/{\sqrt {d}})}},\;\mathbf {o} _{i}=\sum _{j=1}^{i}a_{ij}\mathbf {v} _{j}} where q i {\displaystyle \mathbf {q} _{i}} , k j {\displaystyle \mathbf {k} _{j}} , and v j {\displaystyle \mathbf {v} _{j}} are the query, key, and value vectors, and d {\displaystyle d} is the attention dimension. If the cache is partitioned into blocks of size B {\displaystyle B} , the key and value blocks may be written as K j = ( k ( j − 1 ) B + 1 , … , k j B ) , V j = ( v ( j − 1 ) B + 1 , … , v j B ) {\displaystyle \mathbf {K} _{j}=(\mathbf {k} _{(j-1)B+1},\ldots ,\mathbf {k} _{jB}),\;\mathbf {V} _{j}=(\mathbf {v} _{(j-1)B+1},\ldots ,\mathbf {v} _{jB})} PagedAttention then performs the computation blockwise: A i j = exp ( q i ⊤ K j / d ) ∑ t = 1 ⌈ i / B ⌉ exp ( q i ⊤ K t / d ) , o i = ∑ j = 1 ⌈ i / B ⌉ V j A i j ⊤ {\displaystyle \mathbf {A} _{ij}={\frac {\exp(\mathbf {q} _{i}^{\top }\mathbf {K} _{j}/{\sqrt {d}})}{\sum _{t=1}^{\lceil i/B\rceil }\exp(\mathbf {q} _{i}^{\top }\mathbf {K} _{t}/{\sqrt {d}})}},\;\mathbf {o} _{i}=\sum _{j=1}^{\lceil i/B\rceil }\mathbf {V} _{j}\mathbf {A} _{ij}^{\top }} where A i j {\displaystyle \mathbf {A} _{ij}} is the vector of attention scores for the j {\displaystyle j} -th KV block. In the formulation given by Kwon et al., this preserves the causal attention calculation while allowing the key and value blocks to reside in non-contiguous physical memory. == Performance and use == The vLLM paper reported that, on its evaluated workloads, the use of PagedAttention and the associated memory-management design improved serving throughput by 2–4× over the compared baselines, including FasterTransformer and Orca, while preserving model outputs. In experiments on OPT-13B with the Alpaca trace, the paper also reported memory savings of 6.1–9.8% for parallel sampling and 37.6–55.2% for beam search through KV-block sharing. A 2024 survey of LLM serving systems described PagedAttention as having become an industry norm in LLM serving frameworks, citing support in TGI, vLLM, and TensorRT-LLM. == Limitations and alternatives == Subsequent work has described trade-offs in the approach. The 2025 vAttention paper argued that PagedAttention requires attention kernels to be rewritten to support paging and increases software complexity, portability issues, redundancy, and execution overhead, proposing instead a memory manager that keeps the cache contiguous in virtual memory while relying on demand paging for physical allocation. === vAttention === Unlike PagedAttention, vAttention does not introduce a different attention rule; it retains the standard attention computation Attention ( q i , K , V ) = softmax ( q i K ⊤ s c a l e ) V . {\displaystyle \operatorname {Attention} (q_{i},K,V)=\operatorname {softmax} \left({\frac {q_{i}K^{\top }}{\mathrm {scale} }}\right)V.} In the notation of Prabhu et al., the key and value tensors for a request seen so far are K , V ∈ R L ′ × ( H × D ) {\displaystyle K,V\in \mathbb {R} ^{L'\times (H\times D)}} , where L ′ {\displaystyle L'} is the context length seen so far, H {\displaystyle H} is the number of KV heads on a worker, and D {\displaystyle D} is the dimension of each KV head. In systems prior to PagedAttention, the K cache (or V cache) at each layer of a worker is typically allocated as a 4D tensor of shape [ B , L , H , D ] , {\displaystyle [B,L,H,D],} where B {\displaystyle B} is batch size and L {\displaystyle L} is the maximum context length supported by the model. vAttention preserves this contiguous virtual-memory view while deferring physical-memory allocation to runtime. A serving framework maintains separate K and V tensors for each layer, so vAttention reserves 2 N {\displaystyle 2N} virtual-memory buffers on a worker, where N {\displaystyle N} is the number of layers managed by that worker. The maximum size of one virtual-memory buffer is B S = B × S , {\displaystyle BS=B\times S,} where S {\displaystyle S} is the maximum size of a single request's per-layer K cache (or V cache) on a worker. The paper defines S = L × H × D × P , {\displaystyle S=L\times H\times D\times P,} where P {\displaystyle P} is the number of bytes needed to store one element. In this formulation, vAttention keeps the KV cache contiguous in virtual memory and relies on demand paging for physical allocation, rather than modifying the attention kernel to operate over non-contiguous KV-cache blocks.