AI Face Swap App

AI Face Swap App — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Legendre moment

    Legendre moment

    In mathematics, Legendre moments are a type of image moment and are achieved by using the Legendre polynomial. Legendre moments are used in areas of image processing including: pattern and object recognition, image indexing, line fitting, feature extraction, edge detection, and texture analysis. Legendre moments have been studied as a means to reduce image moment calculation complexity by limiting the amount of information redundancy through approximation. == Legendre moments == Source: With order of m + n, and object intensity function f(x,y): L m n = ( 2 m + 1 ) ( 2 n + 1 ) 4 ∫ − 1 1 ∫ − 1 1 P m ( x ) P n ( y ) f ( x , y ) d x d y {\displaystyle L_{mn}={\frac {(2m+1)(2n+1)}{4}}\int \limits _{-1}^{1}\int \limits _{-1}^{1}P_{m}(x)P_{n}(y)f(x,y)\,dx\,dy} where m,n = 1, 2, 3, ...∞ with the nth-order Legendre polynomials being: P n ( x ) = ∑ k = 0 n a k , n x k = ( − 1 ) n 2 n n ! ( d d x ) [ ( 1 − x 2 ) n ] {\displaystyle P_{n}(x)=\sum _{k=0}^{n}a_{k,n}x^{k}={\frac {(-1)^{n}}{2^{n}n!}}\left({\frac {d}{dx}}\right)[(1-x^{2})^{n}]} which can also be written: P n ( x ) = ∑ k = 0 D ( n ) ( − 1 ) k ( 2 n − 2 k ) ! 2 n k ! ( n − k ) ! ( n − 2 k ) ! x n − 2 k = ( 2 n ) ! 2 n ( n ! ) 2 x n − ( 2 n − 2 ) ! 2 n 1 ! ( n − 1 ) ! ( n − 2 ) ! x n − 2 + ⋯ {\displaystyle {\begin{aligned}P_{n}(x)&=\sum _{k=0}^{D(n)}(-1)^{k}{\frac {(2n-2k)!}{2^{n}k!(n-k)!(n-2k)!}}x^{n-2k}\\[5pt]&={\frac {(2n)!}{2^{n}(n!)^{2}}}x^{n}-{\frac {(2n-2)!}{2^{n}1!(n-1)!(n-2)!}}x^{n-2}+\cdots \end{aligned}}} where D(n) = floor(n/2). The set of Legendre polynomials {Pn(x)} form an orthogonal set on the interval [−1,1]: ∫ − 1 1 P n ( x ) P m ( x ) d x = 2 2 n + 1 δ n m {\displaystyle \int _{-1}^{1}P_{n}(x)P_{m}(x)\,dx={\frac {2}{2n+1}}\delta _{nm}} A recurrence relation can be used to compute the Legendre polynomial: ( n + 1 ) P n + 1 ( x ) − ( 2 n + 1 ) x P n ( x ) + n P n − 1 ( x ) = 0 {\displaystyle (n+1)P_{n+1}(x)-(2n+1)xP_{n}(x)+nP_{n-1}(x)=0} f(x,y) can be written as an infinite series expansion in terms of Legendre polynomials [−1 ≤ x,y ≤ 1.]: f ( x , y ) = ∑ m = 0 ∞ ∑ n = 0 ∞ λ m n P m ( x ) P n ( y ) {\displaystyle f(x,y)=\sum _{m=0}^{\infty }\sum _{n=0}^{\infty }\lambda _{mn}P_{m}(x)P_{n}(y)}

    Read more →
  • Thompson sampling

    Thompson sampling

    Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that address the exploration–exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief. == Description == Consider a set of contexts X {\displaystyle {\mathcal {X}}} , a set of actions A {\displaystyle {\mathcal {A}}} , and rewards in R {\displaystyle \mathbb {R} } . The aim of the player is to play actions under the various contexts, such as to maximize the cumulative rewards. Specifically, in each round, the player obtains a context x ∈ X {\displaystyle x\in {\mathcal {X}}} , plays an action a ∈ A {\displaystyle a\in {\mathcal {A}}} and receives a reward r ∈ R {\displaystyle r\in \mathbb {R} } following a distribution that depends on the context and the issued action. The elements of Thompson sampling are as follows: a likelihood function P ( r | θ , a , x ) {\displaystyle P(r|\theta ,a,x)} ; a set Θ {\displaystyle \Theta } of parameters θ {\displaystyle \theta } of the distribution of r {\displaystyle r} ; a prior distribution P ( θ ) {\displaystyle P(\theta )} on these parameters; past observations triplets D = { ( x ; a ; r ) } {\displaystyle {\mathcal {D}}=\{(x;a;r)\}} ; a posterior distribution P ( θ | D ) ∝ P ( D | θ ) P ( θ ) {\displaystyle P(\theta |{\mathcal {D}})\propto P({\mathcal {D}}|\theta )P(\theta )} , where P ( D | θ ) {\displaystyle P({\mathcal {D}}|\theta )} is the likelihood function. Thompson sampling consists of playing the action a ∗ ∈ A {\displaystyle a^{\ast }\in {\mathcal {A}}} according to the probability that it maximizes the expected reward; action a ∗ {\displaystyle a^{\ast }} is chosen with probability ∫ I [ E ( r | a ∗ , x , θ ) = max a ′ E ( r | a ′ , x , θ ) ] P ( θ | D ) d θ , {\displaystyle \int \mathbb {I} \left[\mathbb {E} (r|a^{\ast },x,\theta )=\max _{a'}\mathbb {E} (r|a',x,\theta )\right]P(\theta |{\mathcal {D}})d\theta ,} where I {\displaystyle \mathbb {I} } is the indicator function. In practice, the rule is implemented by sampling. In each round, parameters θ ∗ {\displaystyle \theta ^{\ast }} are sampled from the posterior P ( θ | D ) {\displaystyle P(\theta |{\mathcal {D}})} , and an action a ∗ {\displaystyle a^{\ast }} chosen that maximizes E [ r | θ ∗ , a ∗ , x ] {\displaystyle \mathbb {E} [r|\theta ^{\ast },a^{\ast },x]} , i.e. the expected reward given the sampled parameters, the action, and the current context. Conceptually, this means that the player instantiates their beliefs randomly in each round according to the posterior distribution, and then acts optimally according to them. In most practical applications, it is computationally onerous to maintain and sample from a posterior distribution over models. As such, Thompson sampling is often used in conjunction with approximate sampling techniques. == History == Thompson sampling was originally described by Thompson in 1933. It was subsequently rediscovered numerous times independently in the context of multi-armed bandit problems. A first proof of convergence for the bandit case has been shown in 1997. The first application to Markov decision processes was in 2000. A related approach (see Bayesian control rule) was published in 2010. In 2010 it was also shown that Thompson sampling is instantaneously self-correcting. Asymptotic convergence results for contextual bandits were published in 2011. Thompson Sampling has been widely used in many online learning problems including A/B testing in website design and online advertising, and accelerated learning in decentralized decision making. A Double Thompson Sampling (D-TS) algorithm has been proposed for dueling bandits, a variant of traditional MAB, where feedback comes in the form of pairwise comparison. == Relationship to other approaches == === Probability matching === Probability matching is a decision strategy in which predictions of class membership are proportional to the class base rates. Thus, if in the training set positive examples are observed 60% of the time, and negative examples are observed 40% of the time, the observer using a probability-matching strategy will predict (for unlabeled examples) a class label of "positive" on 60% of instances, and a class label of "negative" on 40% of instances. === Bayesian control rule === A generalization of Thompson sampling to arbitrary dynamical environments and causal structures, known as Bayesian control rule, has been shown to be the optimal solution to the adaptive coding problem with actions and observations. In this formulation, an agent is conceptualized as a mixture over a set of behaviours. As the agent interacts with its environment, it learns the causal properties and adopts the behaviour that minimizes the relative entropy to the behaviour with the best prediction of the environment's behaviour. If these behaviours have been chosen according to the maximum expected utility principle, then the asymptotic behaviour of the Bayesian control rule matches the asymptotic behaviour of the perfectly rational agent. The setup is as follows. Let a 1 , a 2 , … , a T {\displaystyle a_{1},a_{2},\ldots ,a_{T}} be the actions issued by an agent up to time T {\displaystyle T} , and let o 1 , o 2 , … , o T {\displaystyle o_{1},o_{2},\ldots ,o_{T}} be the observations gathered by the agent up to time T {\displaystyle T} . Then, the agent issues the action a T + 1 {\displaystyle a_{T+1}} with probability: P ( a T + 1 | a ^ 1 : T , o 1 : T ) , {\displaystyle P(a_{T+1}|{\hat {a}}_{1:T},o_{1:T}),} where the "hat"-notation a ^ t {\displaystyle {\hat {a}}_{t}} denotes the fact that a t {\displaystyle a_{t}} is a causal intervention (see Causality), and not an ordinary observation. If the agent holds beliefs θ ∈ Θ {\displaystyle \theta \in \Theta } over its behaviors, then the Bayesian control rule becomes P ( a T + 1 | a ^ 1 : T , o 1 : T ) = ∫ Θ P ( a T + 1 | θ , a ^ 1 : T , o 1 : T ) P ( θ | a ^ 1 : T , o 1 : T ) d θ {\displaystyle P(a_{T+1}|{\hat {a}}_{1:T},o_{1:T})=\int _{\Theta }P(a_{T+1}|\theta ,{\hat {a}}_{1:T},o_{1:T})P(\theta |{\hat {a}}_{1:T},o_{1:T})\,d\theta } , where P ( θ | a ^ 1 : T , o 1 : T ) {\displaystyle P(\theta |{\hat {a}}_{1:T},o_{1:T})} is the posterior distribution over the parameter θ {\displaystyle \theta } given actions a 1 : T {\displaystyle a_{1:T}} and observations o 1 : T {\displaystyle o_{1:T}} . In practice, the Bayesian control amounts to sampling, at each time step, a parameter θ ∗ {\displaystyle \theta ^{\ast }} from the posterior distribution P ( θ | a ^ 1 : T , o 1 : T ) {\displaystyle P(\theta |{\hat {a}}_{1:T},o_{1:T})} , where the posterior distribution is computed using Bayes' rule by only considering the (causal) likelihoods of the observations o 1 , o 2 , … , o T {\displaystyle o_{1},o_{2},\ldots ,o_{T}} and ignoring the (causal) likelihoods of the actions a 1 , a 2 , … , a T {\displaystyle a_{1},a_{2},\ldots ,a_{T}} , and then by sampling the action a T + 1 ∗ {\displaystyle a_{T+1}^{\ast }} from the action distribution P ( a T + 1 | θ ∗ , a ^ 1 : T , o 1 : T ) {\displaystyle P(a_{T+1}|\theta ^{\ast },{\hat {a}}_{1:T},o_{1:T})} . === Upper-confidence-bound (UCB) algorithms === Thompson sampling and upper-confidence bound algorithms share a fundamental property that underlies many of their theoretical guarantees. Roughly speaking, both algorithms allocate exploratory effort to actions that might be optimal and are in this sense "optimistic". Leveraging this property, one can translate regret bounds established for UCB algorithms to Bayesian regret bounds for Thompson sampling or unify regret analysis across both these algorithms and many classes of problems.

    Read more →
  • Deepfake

    Deepfake

    Deepfakes (a portmanteau of 'deep learning' and 'fake') are images, videos, or audio that have been edited or generated using artificial intelligence, AI-based tools or audio-video editing software. They may depict real or fictional people and are considered a form of synthetic media, that is media that is usually created by artificial intelligence systems by combining various media elements into a new media artifact. While the act of creating fake content is not new, deepfakes uniquely leverage machine learning and artificial intelligence techniques, including facial recognition algorithms and artificial neural networks such as variational autoencoders and generative adversarial networks (GANs). In turn, the field of image forensics has worked to develop techniques to detect manipulated images. Deepfakes have garnered widespread attention for their potential use in creating child sexual abuse material, celebrity pornographic videos, revenge porn, fake news, hoaxes, bullying, and financial fraud. Academics have raised concerns about the potential for deepfakes to promote disinformation and hate speech, as well as interfere with elections. In response, the information technology industry and governments have proposed recommendations and methods to detect and mitigate their use. Academic research has also delved deeper into the factors driving deepfake engagement online as well as potential countermeasures to malicious application of deepfakes. From traditional entertainment to gaming, deepfake technology has evolved to be increasingly convincing and available to the public, allowing for the disruption of the entertainment and media industries. == History == Photo manipulation was developed in the 19th century and soon applied to motion pictures. Technology steadily improved during the 20th century, and more quickly with the advent of digital video. Deepfake technology has been developed by researchers at academic institutions beginning in the 1990s, and later by amateurs in online communities. More recently, the methods have been adopted by industry. The development of generative adversarial networks (GANs) in the mid-2010s represented a key technical turning point in the evolution of deepfakes. GANs allowed for the creation of highly realistic fake images and videos by training competing neural networks, achieving a much improved visual fidelity over previous methods of creating the content using rules or by using autoencoders, and formed the basis for modern deepfake methods. === Academic research === Academic research related to deepfakes is split between the field of computer vision, a sub-field of computer science, which develops techniques for creating and identifying deepfakes, and humanities and social science approaches that study the social, ethical, aesthetic implications as well as journalistic and informational implications of deepfakes. As deepfakes have risen in prominence in popularity with innovations provided by AI tools, significant research has gone into detection methods and defining the factors driving engagement with deepfakes on the internet. Deepfakes have been shown to appear on social media platforms and other parts of the internet for purposes ranging from entertainment and education related to deepfakes to misinformation to elicit strong reactions. There are gaps in research related to the propagation of deepfakes on social media. Negativity and emotional response are the primary driving factors for users sharing deepfakes. === Social science and humanities approaches to deepfakes === In cinema studies, deepfakes illustrate how "the human face is emerging as a central object of ambivalence in the digital age". Video artists have used deepfakes to "playfully rewrite film history by retrofitting canonical cinema with new star performers". Film scholar Christopher Holliday analyses how altering the gender and race of performers in familiar movie scenes destabilizes gender classifications and categories. The concept of "queering" deepfakes is also discussed in Oliver M. Gingrich's discussion of media artworks that use deepfakes to reframe gender, including British artist Jake Elwes' Zizi: Queering the Dataset, an artwork that uses deepfakes of drag queens to intentionally play with gender. The aesthetic potentials of deepfakes are also beginning to be explored. Theatre historian John Fletcher notes that early demonstrations of deepfakes are presented as performances, and situates these in the context of theater, discussing "some of the more troubling paradigm shifts" that deepfakes represent as a performance genre. While most English-language academic studies of deepfakes focus on the Western anxieties about disinformation and pornography, digital anthropologist Gabriele de Seta has analyzed the Chinese reception of deepfakes, which are known as huanlian, which translates to "changing faces". The Chinese term does not contain the "fake" of the English deepfake, and de Seta argues that this cultural context may explain why the Chinese response has centered on practical regulatory measures to "fraud risks, image rights, economic profit, and ethical imbalances". === Computer science research on deepfakes === A landmark early project was the "Video Rewrite" program, published in 1997. The program modified existing video footage of a person speaking to depict that person mouthing the words from a different audio track. It was the first system to fully automate this kind of facial reanimation, and it did so using machine learning techniques to make connections between the sounds produced by a video's subject and the shape of the subject's face. Contemporary academic projects have focused on creating more realistic videos and improving deepfake techniques. The "Synthesizing Obama" program, published in 2017, modifies video footage of former president Barack Obama to depict him mouthing the words contained in a separate audio track. The project lists as a main research contribution to its photorealistic technique for synthesizing mouth shapes from audio. The "Face2Face" program, published in 2016, modifies video footage of a person's face to depict them mimicking another person's facial expressions. The project highlights its primary research contribution as the development of the first method for re-enacting facial expressions in real time using a camera that does not capture depth, enabling the technique to work with common consumer cameras. Researchers have also shown that deepfakes are expanding into other domains such as medical imagery. In this work, it was shown how an attacker can automatically inject or remove lung cancer in a patient's 3D CT scan. The result was so convincing that it fooled three radiologists and a state-of-the-art lung cancer detection AI. To demonstrate the threat, the authors successfully performed the attack on a hospital in a White hat penetration test. A survey of deepfakes, published in May 2020, provides a timeline of how the creation and detection of deepfakes have advanced over the last few years. The survey identifies that researchers have been focusing on resolving the following challenges of deepfake creation: Generalization. High-quality deepfakes are often achieved by training on hours of footage of the target. This challenge is to minimize the amount of training data and the time to train the model required to produce quality images and to enable the execution of trained models on new identities (unseen during training). Paired Training. Training a supervised model can produce high-quality results, but requires data pairing. This is the process of finding examples of inputs and their desired outputs for the model to learn from. Data pairing is laborious and impractical when training on multiple identities and facial behaviors. Some solutions include self-supervised training (using frames from the same video), the use of unpaired networks such as Cycle-GAN, or the manipulation of network embeddings. Identity leakage. This is where the identity of the driver (i.e., the actor controlling the face in a reenactment) is partially transferred to the generated face. Some solutions proposed include attention mechanisms, few-shot learning, disentanglement, boundary conversions, and skip connections. Occlusions. When part of the face is obstructed with a hand, hair, glasses, or any other item then artifacts can occur. A common occlusion is a closed mouth which hides the inside of the mouth and the teeth. Some solutions include image segmentation during training and in-painting. Temporal coherence. In videos containing deepfakes, artifacts such as flickering and jitter can occur because the network has no context of the preceding frames. Some researchers provide this context or use novel temporal coherence losses to help improve realism. As the technology improves, the interference is diminishing. Overall, deepfakes are expected to have several implications in media and society, med

    Read more →
  • DialogOS

    DialogOS

    DialogOS is a graphical programming environment to design computer system which can converse through voice with the user. Dialogs are clicked together in a Flowchart. DialogOS includes bindings to control Lego Mindstorms robots by voice and has bindings to SQL databases, as well as a generic plugin architecture to integrate with other types of backends. DialogOS is used in computer science courses in schools and universities to teach programming and to introduce beginners in the basic principles of human/computer interaction and dialog design. It has also been used in research systems. DialogOS was initially developed commercially by CLT Sprachtechnologie GmbH until its liquidation in 2017. The rights were then acquired by Saarland University and the software was released as open-source. == Bindings to Lego Mindstorms NXT == DialogOS can control the LEGO Mindstorms NXT Series. It uses sensor-nodes to obtain values for the following sensors: noise sensor ultrasonic sensor touch sensor luminosity sensor

    Read more →
  • Information extraction

    Information extraction

    Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents and other electronically represented sources. Typically, this involves processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing like automatic annotation and content extraction out of images/audio/video/documents could be seen as information extraction. Recent advances in NLP techniques have allowed for significantly improved performance compared to previous years. An example is the extraction from newswire reports of corporate mergers, such as denoted by the formal relation: MergerBetween ⁡ ( c o m p a n y 1 , c o m p a n y 2 , d a t e ) {\displaystyle \operatorname {MergerBetween} (\mathrm {company} _{1},\mathrm {company} _{2},\mathrm {date} )} , from an online news sentence such as: "Yesterday, New York based Foo Inc. announced their acquisition of Bar Corp." A broad goal of IE is to allow computation to be done on the previously unstructured data. A more specific goal is to allow automated reasoning about the logical form of the input data. Structured data is semantically well-defined data from a chosen target domain, interpreted with respect to category and context. Information extraction is the part of a greater puzzle which deals with the problem of devising automatic methods for text management, beyond its transmission, storage and display. The discipline of information retrieval (IR) has developed automatic methods, typically of a statistical flavor, for indexing large document collections and classifying documents. Another complementary approach is that of natural language processing (NLP) which has solved the problem of modelling human language processing with considerable success when taking into account the magnitude of the task. In terms of both difficulty and emphasis, IE deals with tasks in between both IR and NLP. In terms of input, IE assumes the existence of a set of documents in which each document follows a template, i.e. describes one or more entities or events in a manner that is similar to those in other documents but differing in the details. An example, consider a group of newswire articles on Latin American terrorism with each article presumed to be based upon one or more terroristic acts. We also define for any given IE task a template, which is a(or a set of) case frame(s) to hold the information contained in a single document. For the terrorism example, a template would have slots corresponding to the perpetrator, victim, and weapon of the terroristic act, and the date on which the event happened. An IE system for this problem is required to "understand" an attack article only enough to find data corresponding to the slots in this template. == History == Information extraction dates back to the late 1970s in the early days of NLP. An early commercial system from the mid-1980s was JASPER built for Reuters by the Carnegie Group Inc with the aim of providing real-time financial news to financial traders. Beginning in 1987, IE was spurred by a series of Message Understanding Conferences. MUC is a competition-based conference that focused on the following domains: MUC-1 (1987), MUC-3 (1989): Naval operations messages. MUC-3 (1991), MUC-4 (1992): Terrorism in Latin American countries. MUC-5 (1993): Joint ventures and microelectronics domain. MUC-6 (1995): News articles on management changes. MUC-7 (1998): Satellite launch reports. Considerable support came from the U.S. Defense Advanced Research Projects Agency (DARPA), who wished to automate mundane tasks performed by government analysts, such as scanning newspapers for possible links to terrorism. == Present significance == The present significance of IE pertains to the growing amount of information available in unstructured form. Tim Berners-Lee, inventor of the World Wide Web, refers to the existing Internet as the web of documents and advocates that more of the content be made available as a web of data. Until this transpires, the web largely consists of unstructured documents lacking semantic metadata. Knowledge contained within these documents can be made more accessible for machine processing by means of transformation into relational form, or by marking-up with XML tags. An intelligent agent monitoring a news data feed requires IE to transform unstructured data into something that can be reasoned with. A typical application of IE is to scan a set of documents written in a natural language and populate a database with the information extracted. == Tasks and subtasks == Applying information extraction to text is linked to the problem of text simplification in order to create a structured view of the information present in free text. The overall goal being to create a more easily machine-readable text to process the sentences. Typical IE tasks and subtasks include: Template filling: Extracting a fixed set of fields from a document, e.g. extract perpetrators, victims, time, etc. from a newspaper article about a terrorist attack. Event extraction: Given an input document, output zero or more event templates. For instance, a newspaper article might describe multiple terrorist attacks. Knowledge Base Population: Fill a database of facts given a set of documents. Typically the database is in the form of triplets, (entity 1, relation, entity 2), e.g. (Barack Obama, Spouse, Michelle Obama) Named entity recognition: recognition of known entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions, by employing existing knowledge of the domain or information extracted from other sentences. Typically the recognition task involves assigning a unique identifier to the extracted entity. A simpler task is named entity detection, which aims at detecting entities without having any existing knowledge about the entity instances. For example, in processing the sentence "M. Smith likes fishing", named entity detection would denote detecting that the phrase "M. Smith" does refer to a person, but without necessarily having (or using) any knowledge about a certain M. Smith who is (or, "might be") the specific person whom that sentence is talking about. Coreference resolution: detection of coreference and anaphoric links between text entities. In IE tasks, this is typically restricted to finding links between previously extracted named entities. For example, "International Business Machines" and "IBM" refer to the same real-world entity. If we take the two sentences "M. Smith likes fishing. But he doesn't like biking", it would be beneficial to detect that "he" is referring to the previously detected person "M. Smith". Relationship extraction: identification of relations between entities, such as: PERSON works for ORGANIZATION (extracted from the sentence "Bill works for IBM.") PERSON located in LOCATION (extracted from the sentence "Bill is in France.") Semi-structured information extraction which may refer to any IE that tries to restore some kind of information structure that has been lost through publication, such as: Table extraction: finding and extracting tables from documents. Table information extraction : extracting information in structured manner from the tables. This task is more complex than table extraction, as table extraction is only the first step, while understanding the roles of the cells, rows, columns, linking the information inside the table and understanding the information presented in the table are additional tasks necessary for table information extraction. Comments extraction : extracting comments from the actual content of articles in order to restore the link between authors of each of the sentences Language and vocabulary analysis Terminology extraction: finding the relevant terms for a given corpus Audio extraction Template-based music extraction: finding relevant characteristic in an audio signal taken from a given repertoire; for instance time indexes of occurrences of percussive sounds can be extracted in order to represent the essential rhythmic component of a music piece. Note that this list is not exhaustive and that the exact meaning of IE activities is not commonly accepted and that many approaches combine multiple sub-tasks of IE in order to achieve a wider goal. Machine learning, statistical analysis and/or natural language processing are often used in IE. IE on non-text documents is becoming an increasingly interesting topic in research, and information extracted from multimedia documents can now be expressed in a high level structure as it is done on text. This naturally leads to the fusion of extracted information from multiple kinds of documents and sources. == World Wide Web applications == IE has been the focus of the MUC conferences. The proliferation of the Web, however, intensified the need for developing IE systems that help people

    Read more →
  • For a Breath I Tarry

    For a Breath I Tarry

    "For a Breath I Tarry" is a 1966 post-apocalyptic novelette by American writer Roger Zelazny, which was nominated for the Hugo Award for Best Novelette in 1967. Set in a future long after the self-extinction of humanity, the novelette recounts the tale of Frost, a sentient machine. Although humans have caused their own extinction, the sentient machines that they created continue the work of rebuilding a shattered Earth. Along the way, the story explores the differences between humanity and machines, the former experiencing the world qualitatively, while the latter doing so quantitatively. This difference is illustrated through philosophical conversations between Frost and another machine named Mordel. Frost's goal of becoming human, along with literary allusions, drives the plot and sets the tone of the novelette. These allusions include the first chapter of the Book of Job, in both situation and language, since verses are both quoted directly and paraphrased. In addition, the first three chapters of the Book of Genesis are echoed. Finally, Frost and Mordel enter into a Faustian bargain, though with better results than in the original story. The other major character is the Beta Machine, Frost's peer in the Southern Hemisphere. (Frost controls the Northern Hemisphere.) The novelette hints that though being a machine, Beta has a feminine personality. After Frost has succeeded in his millennium-long quest to become human (via recovered DNA), Beta agrees to join him in becoming human—suggesting the possibility of rebirth for the human race. The novelette has appeared in collections of Zelazny's works and in anthologies. The title is from a phrase in the poet A. E. Housman's collection A Shropshire Lad.

    Read more →
  • International Conference on Automated Planning and Scheduling

    International Conference on Automated Planning and Scheduling

    The International Conference on Automated Planning and Scheduling (ICAPS) is a leading international academic conference in automated planning and scheduling held annually for researchers and practitioners in planning and scheduling. ICAPS is supported by the National Science Foundation, the journal Artificial Intelligence, and other supporters. == The IPC and PDDL == ICAPS conducts the International Planning Competition (IPC), a competition scheduled every few years that empirically evaluates state-of-the-art planning systems on a collection of benchmark problems. The Planning Domain Definition Language (PDDL) was developed mainly to make the 1998/2000 International Planning Competition possible, and then evolved with each competition. PDDL is an attempt to standardize Artificial Intelligence (AI) planning languages. PDDL was first developed by Drew McDermott and his colleagues in 1998, inspired by STRIPS, ADL, and other sources. == History == The ICAPS conferences began in 2003 as a merge of two bi-annual conferences, the International Conference on Artificial Intelligence Planning and Scheduling (AIPS) and the European Conference on Planning (ECP). == List of events ==

    Read more →
  • Pulsar (social listening platform)

    Pulsar (social listening platform)

    Pulsar is a software platform for social media monitoring, audience intelligence and social listening that allows organizations to monitor and analyze online conversations across social media, news, and other digital sources. The platform combines social media listening, media monitoring, trend analysis, and audience segmentation to help users understand public discussions and audience behavior in real time. The platform is a social listening platform, which aggregates data from networks such as X, Facebook, Instagram, and forums) and applies artificial intelligence for text and sentiment analysis. Pulsar is offered as a cloud-based Software as a Service (SaaS) tool and insights consultancy. It has been part of Pulsar Group (formerly Access Intelligence), a publicly listed group of communications software products, since 2019. As well as commercial uses, the platform has been used in peer-reviewed academic research analysing online discourse. The platform is listed on the UK government's G-Cloud 14 Digital Marketplace for the provision of social listening and audience intelligence services. == History == Pulsar originated in the early 2010s as a project within Face, a London-based innovation and market research consultancy. The platform's first product, Pulsar TRAC, launched in 2013 as a social media analytics tool. Pulsar TRAC was designed to measure the reach of conversations, mapping brand audiences, and tracking how content spreads through networks. The development was led by Dr Francesco D'Orazio, who created the Pulsar brand and led the development of the platform while serving as VP of Product and Innovation at Face. Face itself had been acquired by the Cello Group Plc (a UK-based advisory firm) in 2012, and Pulsar became part of Cello's portfolio of research and data tools. In January 2017, Cello Group made a significant investment to scale Pulsar and announced the merger of Face's qualitative research business into Pulsar, unifying both under the Pulsar brand for global expansion. In 2018, Pulsar opened an office in Los Angeles to better serve its growing U.S. client base in media, healthcare, and entertainment sectors and Francesco D'Orazio was appointed CEO. The company focused on developing new products amid a wave of consolidation in the social listening industry. In October 2019, Pulsar was acquired by Access Intelligence Plc (now Pulsar Group), an AIM-listed communications software company. The group, which also owns PR and media tools Isentia, Vuelio and ResponseSource, integrated Pulsar to their end-to-end marketing and communications insights offering. Pulsar established a new office in Sydney, Australia in 2022 as part of this global expansion, adding to its existing offices in London and Los Angeles. In 2023, Pulsar Group (then Access Intelligence) was recognised as one of Europe's fastest growing companies by the Financial Times. In May 2024, Access Intelligence PLC changed its name to Pulsar Group PLC. The company has since continued to develop its platform. In March 2025 it introduced new tool Narratives AI, described as a "search engine for public opinion" and the first of its kind for analyzing public narratives and their evolutions in both social media and the news. In October 2025, Pulsar launched Insight Agents, a set of AI agents embedded into the platform advertised to "proactively anticipate user needs or issues, carry out routine tasks, uncover anomalies in your datasets, and prompt responses at scale, 24/7." == Products == Pulsar's architecture integrates four main products into a single interface. The core product suite is often broken into three main components: Pulsar TRAC (for social listening and audience analysis), Pulsar TRENDS (for trend discovery and analysis), and Pulsar CORE (for owned-channel and web analytics). Pulsar's fourth product is Narratives AI. === Pulsar TRAC === Pulsar TRAC is a social listening and audience intelligence platform that allows users to configure searches that track public conversations and measure audience behaviour. Pulsar TRAC is focused on conversation insights and audience segmentations - the platform is reported to collect and analyse data from a wide range of sources, including major social networks, forums, news and review sites, and ecommerce platforms, with real-time visualisations and AI-supported analytics used to find patterns and communities of interest. Pulsar TRAC can be incorporated into workflows with other audience tools, such as an integration with Audiense that connects TRAC's conversation insights to external audience-segmentation datasets. === Pulsar CORE === Pulsar CORE centres on the analysis of owned-channel data, such as brand social media profiles, website interaction and other in-house digital assets, to generate audience and content insights. CORE can monitor published content, evaluate competitors, and extract demographic and behavioural segmentation from owned channels. === Narratives AI === Narratives AI is a tool within the Pulsar audience intelligence platform that uses artificial intelligence to detect, cluster and analyse narratives forming across social and news media. It was launched in March 2025 as a standalone search interface that processes real-time and historical data to find cultural trends, behaviours and beliefs. It uses clustering algorithms and visualisation to show how conversations form and spread online, and their relative importance within wider discourse. == Notable features == === Insight Agents === Pulsar's Insight Agents are AI-powered agents within the Pulsar platform designed to automate and augment common tasks in media, social, audience and narrative intelligence. Branded as TeamMates, these agents are grouped into four functional types: Sentinels for real-time monitoring, anomaly detection and alerting Oracles for forecasting and scenario planning Custodians for governance, compliance and policy enforcement Analysts for research, reporting and recommendations Each agent is trained on Pulsar's multi-source data and domain-specific workflows. In February 2026, Pulsar introduced 'Crisis Oracle,' an AI-driven system designed to quantify narrative momentum and predict reputational risk. == Academic research == Pulsar has been used as a data collection and analysis tool in peer-reviewed academic research across public health, infodemiology, veterinary science, and policy research. Published uses include a World Health Organization report on infodemic management, a Journal of Medical Internet Research study on headache and migraine discourse across Japan, Germany, and France, a Frontiers in Big Data study of Long COVID narratives, and Frontiers in Veterinary Science studies on canine chronic kidney disease and oral medication administration in dogs.

    Read more →
  • SAP StreamWork

    SAP StreamWork

    SAP StreamWork is an enterprise collaboration tool from SAP SE released in March 2010, and discontinued in December 2015. StreamWork allowed real-time collaboration like Google Wave, but focused on business activities such as analyzing data, planning meetings, and making decisions. It incorporated technology from Box.net and Evernote to allow users to connect to online files and documents, and document-reader technology from Scribd allowed users to view documents directly within its environment. StreamWork supported the OpenSocial set of application programming interfaces (APIs), allowing it to connect to tools built by third-party developers, such as Google Docs. A version of StreamWork intended for large enterprises used a virtual appliance based on Novell's SUSE Linux Enterprise to connect it to business systems, including those from SAP.

    Read more →
  • Privacy Lost

    Privacy Lost

    Privacy Lost is a 2023 short science fiction film directed by Peter Stoel and Robert Berger. It follows a family using augmented reality (AR) and artificial intelligence (AI) devices capable of reading emotional states, raising questions about privacy and manipulation. == Premise == Privacy Lost follows a family using AR glasses that capture and interpret emotions in real time. As the parents argue in a restaurant, their emotional states and even hidden feelings become visible through these glasses. An AI-driven waiter adapts its appearance for each family member, employing emotional data to influence their decisions. == Cast == Brian Kant as Waiter Michael Krass as Husband Estelle Levinson as Waitress Thor van der Linden as Scotty Carlijn van Ramshorst as Wife == Production == Filming took place at HeadQ Productions, a virtual studio located in Amsterdam. The creators sought to depict a near-future scenario in which real-time emotion analysis becomes part of daily interactions. The film was screened at the Augmented World Expo (AWE), where it was noted for its thematic focus on AI-driven manipulation and emotional tracking. The depiction of AR glasses and AI characters integrates modern visual effects to show how devices might analyze emotional responses in real time. It also depicts how AI-driven interactions could influence consumer decisions, pointing to concerns over potential misuse. == Themes == Privacy Lost focuses on the intersection of advanced AI capabilities and AR environments, showing how real-time emotional analysis can be leveraged for targeted persuasion. The film aims to highlight the social and ethical implications of emerging AR and AI technologies, underlining how establishing clear regulatory frameworks for them is necessary to protect individual privacy, govern the storage of emotion-based data, and prevent manipulative practices. Critics describe the film’s theme as dystopian and note that such a reality is unlikely to occur in the near future. However, despite the exaggerated scenario, the film emphasizes the importance of a responsible approach by developers toward emerging technologies.

    Read more →
  • Sword Art Online

    Sword Art Online

    Sword Art Online (Japanese: ソードアート・オンライン, Hepburn: Sōdo Āto Onrain) is a Japanese light novel series written by Reki Kawahara and illustrated by abec. The series takes place in the 2020s and focuses on protagonists Kazuto "Kirito" Kirigaya and Asuna Yuuki as they play through various virtual reality MMORPG worlds, and later their involvement in the matters of a simulated civilization. Kawahara originally released the series as a web novel on his website from 2002 to 2008. The light novels began publication on ASCII Media Works' Dengeki Bunko imprint from April 10, 2009, with a spin-off series launching in October 2012. The series has spawned twelve manga adaptations published by ASCII Media Works and Kadokawa. The novels and the manga adaptations have been licensed for release in North America by Yen Press. An anime television series produced by A-1 Pictures, known simply as Sword Art Online, aired in Japan between July and December 2012, with a television film Sword Art Online: Extra Edition airing on December 31, 2013, and a second season, titled Sword Art Online II, airing between July and December 2014. An animated film titled Sword Art Online the Movie: Ordinal Scale, featuring an original story by Kawahara, premiered in Japan and Southeast Asia on February 18, 2017, and was released in the United States on March 9, 2017. A spin-off anime series titled Sword Art Online Alternative: Gun Gale Online premiered in April 2018, while a third season titled Sword Art Online: Alicization aired from October 2018 to September 2020. An anime film adaptation of Sword Art Online: Progressive titled Sword Art Online Progressive: Aria of a Starless Night premiered on October 30, 2021. A second film titled Sword Art Online Progressive: Scherzo of Deep Night premiered on October 22, 2022. Many video games based on the series have been released for consoles, PC, and mobile devices. Sword Art Online has achieved widespread commercial success, with the light novels having over 30 million copies sold worldwide. The anime series has received mixed to positive reviews, with praise for its animation, musical score, and exploration of the psychological aspects of virtual reality, but it has also been met with criticisms for its pacing and writing. == Synopsis == === Setting === The light novel series spans several virtual reality worlds, beginning with the game, Sword Art Online (SAO), which is set in a world known as Aincrad. Each world is built on a game engine called Cardinal system, which was initially developed specifically for SAO by Akihiko Kayaba, but was later duplicated for Alfheim Online (ALO), and a consolidated package is later given to Kirito in the form of the World Seed, who had it leaked online with the successful intention of reviving the virtual reality industry. A third world known as Gun Gale Online (GGO) appears in the third arc and is stylized as a first-person shooter game instead of a role-playing game, and is the main setting of Alternative Gun Gale Online. It was created using the World Seed by an American company. A fourth world appears in the fourth arc known as the Underworld (UW). The world itself was created using the World Seed as a base, but it is as realistic as the real world due to using many powerful government resources to keep it running. === Plot === In 2022, a virtual reality massively multiplayer online role-playing game (VRMMORPG) called Sword Art Online (SAO) was released. With the NerveGear, a helmet that stimulates the user's five senses via their brain, players can experience and control their in-game characters with their minds. Both the game and the NerveGear were created by Akihiko Kayaba. On November 6, 10,000 players log into SAO's mainframe cyberspace for the first time, only to discover that they are unable to log out. Kayaba appears and tells the players that they must beat all 100 floors of Aincrad, a steel castle which is the setting of SAO, if they wish to be free. He also states that those who suffer in-game deaths or forcibly remove the NerveGear out-of-game will suffer real-life deaths. A player named Kazuto "Kirito" Kirigaya is one of 1,000 testers in the game's previous closed beta. With the advantage of previous VR gaming experience and a drive to protect other beta testers from discrimination, he isolates himself from the greater groups and plays the game alone, bearing the mantle of "beater", a portmanteau of "beta tester" and "cheater". As the players progress through the game Kirito eventually befriends a young woman named Asuna Yuuki, forming a relationship with and later marrying her in-game. After the duo discover the identity of Kayaba's secret ID, who was playing as "Heathcliff", the leader of the guild Asuna joined in, they confront and destroy him, freeing themselves and the other players from the game. In the real world, Kazuto discovers that 300 SAO players, including Asuna, remain trapped in their NerveGear. As he goes to the hospital to see Asuna, he meets Asuna's father Shouzou Yuuki who is asked by an associate of his, Nobuyuki Sugou, to make a decision, which Sugou later reveals to be his marriage with Asuna, angering Kazuto. Several months later, he is informed by Agil, another SAO survivor, that a figure similar to Asuna was spotted on "The World Tree" in another VRMMORPG cyberspace called Alfheim Online (ALO). Assisted in-game by his cousin and adoptive sister Suguha "Leafa" Kirigaya and Yui, a navigation pixie (originally an AI from SAO), he quickly learns that the trapped players in ALO are part of a plan conceived by Sugou to perform illegal experiments on their minds. The goal is to create the perfect mind-control for financial gain and to subjugate Asuna, whom he intends to marry in the real world, to assume control of her family's corporation. Kirito eventually stops the experiment and rescues the remaining 300 SAO players, foiling Sugou's plans. Before leaving ALO to see Asuna, Kayaba, who has uploaded his mind to the Internet using an experimental, destructively high-powered version of NerveGear at the cost of his life, entrusts Kirito with The Seed – a package program designed to create virtual worlds. Kazuto eventually reunites with Asuna in the real world after thwarting an attack from Sugou and The Seed is released onto the Internet, reviving Aincrad as other VRMMORPGs begin to thrive. One year after the events of SAO, at the prompting of a government official investigating strange occurrences in VR, Kazuto takes on a job to investigate a series of murders involving another VRMMORPG called Gun Gale Online (GGO), the AmuSphere (the successor of the NerveGear), and a player called Death Gun. Aided by a female player named Shino "Sinon" Asada, he participates in a gunfight tournament called the Bullet of Bullets (BoB) and discovers the truth behind the murders, which originated with a player who participated in a player-killing guild in SAO. Through his and Sinon's efforts, two suspects are captured, though the third suspect, Johnny Black, escapes. Kazuto is later recruited to test an experimental FullDive machine, Soul Translator (STL), which has an interface far more realistic and complex than the previous machine he had played, to help RATH, a research and development organization under the Ministry of Defense (MOD), develop an artificial intelligence named A.L.I.C.E. He tests the STL by entering the Underworld (UW), a virtual reality cyberspace created with The Seed package. In the UW, the flow of time proceeds a thousand times faster than in the real world, and Kirito's memories of what happens inside are restricted. However, when Johnny Black ambushes and mortally wounds Kazuto with suxamethonium chloride, RATH recovers Kazuto and places him back into the STL to preserve his mind while attempts are made to save him. During his time in Underworld, Kirito befriends Eugeo, a carver in a small village of Rulid, and helps him on a journey to save Alice Zuberg, his friend who was taken by a group of highly skilled warriors known as the Integrity Knights for accidentally breaking a rule of the Axiom Church, the leaders of the Human Empire. He and Eugeo soon find themselves uncovering the secrets of the Axiom Church, led by a woman only known as "The Administrator", and the true purpose of Underworld itself, while unbeknownst to them, a war against the opposing Dark Territory is brewing on the horizon. They meet Alice, now an Integrity Knight, and though she does not remember them, Kirito helps her remember her true identity: a form of true artificial intelligence known as A.L.I.C.E. In the battle against the Administrator, Kirito manages to slay her, though Eugeo dies in the process, to Kirito's dismay. Meanwhile, in the real world, conflict escalates as American forces raid RATH's facility in the Ocean Turtle in an effort to take A.L.I.C.E. for purposes unknown. Two of the attackers - Gabriel "Vecta" Miller and Vassago "Prince of Hell" Cassals - take contr

    Read more →
  • For a Breath I Tarry

    For a Breath I Tarry

    "For a Breath I Tarry" is a 1966 post-apocalyptic novelette by American writer Roger Zelazny, which was nominated for the Hugo Award for Best Novelette in 1967. Set in a future long after the self-extinction of humanity, the novelette recounts the tale of Frost, a sentient machine. Although humans have caused their own extinction, the sentient machines that they created continue the work of rebuilding a shattered Earth. Along the way, the story explores the differences between humanity and machines, the former experiencing the world qualitatively, while the latter doing so quantitatively. This difference is illustrated through philosophical conversations between Frost and another machine named Mordel. Frost's goal of becoming human, along with literary allusions, drives the plot and sets the tone of the novelette. These allusions include the first chapter of the Book of Job, in both situation and language, since verses are both quoted directly and paraphrased. In addition, the first three chapters of the Book of Genesis are echoed. Finally, Frost and Mordel enter into a Faustian bargain, though with better results than in the original story. The other major character is the Beta Machine, Frost's peer in the Southern Hemisphere. (Frost controls the Northern Hemisphere.) The novelette hints that though being a machine, Beta has a feminine personality. After Frost has succeeded in his millennium-long quest to become human (via recovered DNA), Beta agrees to join him in becoming human—suggesting the possibility of rebirth for the human race. The novelette has appeared in collections of Zelazny's works and in anthologies. The title is from a phrase in the poet A. E. Housman's collection A Shropshire Lad.

    Read more →
  • Environmental impact of AI

    Environmental impact of AI

    The environmental impact of the design, training, deployment and use of artificial intelligence includes the greenhouse gas emissions from generating electricity for data centres and computing hardware, operational and upstream water use, and material impacts from hardware manufacturing, mining and electronic waste. Estimating AI's environmental effects can be difficult because results depend on how impacts are measured, including whether accounting includes only model computation or also data-centre overhead, idle capacity, hardware manufacture, and local electricity supply. As these issues have received greater attention, governments and regulators have increasingly considered data-centre reporting requirements, energy-efficiency standards, and broader transparency measures for AI-related resource use. == Carbon footprint and energy use == AI-related energy use arises at multiple stages, including model training, fine-tuning, inference, storage, networking, and supporting infrastructure such as cooling and power conversion. === Individual level === Published estimates of energy use per AI request vary widely across models, tasks and measurement methods. A benchmark study presented at the 2024 ACM Conference on Fairness, Accountability, and Transparency found substantial differences between task types, with lower energy use for some text tasks and much higher energy use for image generation in the study's test conditions. In that benchmark, simple classification tasks consumed about 0.002–0.007 Wh per prompt on average (about 9% of a smartphone charge for 1,000 prompts), while text generation and text summarisation each used about 0.05 Wh per prompt; image generation averaged 2.91 Wh per prompt, and the least efficient image model in the study used 11.49 Wh per image (roughly equivalent to half a smartphone charge). First-party measurements in production environments have also been published. A 2025 Google study on Gemini assistant serving reported median per-prompt energy, emissions, and water-use estimates under the authors' accounting framework, while noting that different system boundaries can produce substantially different results. The study reported a median text-prompt estimate of about 0.24 Wh, which is roughly as much energy as watching nine seconds of television. The study also stated that software and infrastructure improvements reduced energy use by a factor of 33 and carbon emissions by a factor of 44 for a typical prompt over one year within the authors' framework. Researchers at the University of Michigan measured the energy consumption of various Meta Llama 3.1 models released in 2024 and found that smaller language models (8 billion parameters) use about 114 joules (0.03167 Wh) per response, while larger models (405 billion parameters) require up to 6,700 joules (1.861 Wh) per response. This corresponds to the energy needed to run a microwave oven for roughly one-tenth of a second and eight seconds, respectively. Comparisons between AI systems and human labour for specific tasks have produced mixed results and remain sensitive to assumptions about output quality, workload and system boundaries. A 2024 study in Scientific Reports reported 130 to 2900 times lower estimated carbon emissions for selected AI systems than for human writers and illustrators under its assumptions. A later Scientific Reports paper reported a counterexample for programming tasks under its assumptions, finding 5 to 19 times higher estimated emissions for the evaluated AI system than for human programmers on the benchmark used in that study. === System level === ==== Energy use and efficiency ==== AI electricity intensity depends not only on model architecture but also on hardware and facility efficiency. Data-centre operators commonly report Power usage effectiveness (PUE), which measures the ratio of total facility energy to IT equipment energy; a lower PUE indicates less overhead energy for cooling and other supporting infrastructure. Operators may also publish metrics and case studies on hardware efficiency, cooling systems and power sourcing. In its 2024 environmental report, Google stated that its 2023 total greenhouse gas emissions increased 13% year over year, primarily because of increased data-centre energy consumption and supply-chain emissions, while also reporting lower PUE than industry averages for its own facilities. The International Energy Agency has also reported that data centres remain a relatively small share of global electricity use overall, but that their local effects can be much more pronounced because demand is geographically concentrated. ==== Carbon footprint ==== At system level, AI contributes to rising electricity demand in data centres and related infrastructure. The International Energy Agency estimated that data centres used about 415 TWh of electricity in 2024, or around 1.5% of global electricity consumption, and projected that data-centre electricity use could rise to about 945 TWh by 2030, with AI identified as the main driver of that growth alongside other digital services. The carbon footprint of AI systems depends strongly on electricity sources, hardware efficiency, utilisation rates, and what stages are included in the accounting. Training large models can require substantial electricity, while total lifecycle impacts also depend on deployment scale and the amount of inference performed after training. Early analyses of frontier-model development reported rapid historical growth in training compute for selected systems, although later trends have depended on changes in model design, hardware and efficiency gains. Accounting methods that include upstream or embodied impacts, such as hardware manufacture and facilities construction, can materially affect estimates of AI-related emissions. === Decisions and strategies by individual companies === Large technology companies have reported that the expansion of AI and cloud infrastructure affects their sustainability targets, electricity demand, and resource use. Google, for example, attributed part of its emissions growth in 2023 to increased data-centre energy consumption and supply-chain emissions in its 2024 environmental report. Cloud and AI companies have also announced measures intended to reduce environmental impacts, including investment in more efficient hardware, low-carbon electricity procurement, alternative cooling systems, and water stewardship programmes. The extent, comparability, and third-party verification of such disclosures vary between firms and jurisdictions. == Water usage == Data centres can use water directly for cooling and indirectly through the water used in electricity generation, depending on the local energy mix. Public reporting on data-centre water use has often been inconsistent, making comparisons between operators and regions difficult. To standardise operational reporting, The Green Grid proposed the metric water usage effectiveness (WUE), defined as annual site water use divided by IT equipment energy use. WUE does not by itself measure local water stress, source sustainability, or all upstream water impacts. Studies of AI water use also distinguish between water withdrawal and water consumption. Research on AI-specific water use has argued that the water footprint of AI systems can be difficult to observe and may vary substantially by location, cooling design, and electricity source. A 2025 Communications of the ACM article summarised methods for estimating AI water footprints and emphasised the distinction between water withdrawal and water consumption. Li and colleagues estimated that global AI water withdrawal could reach 4.2–6.6 billion cubic metres in 2027 under the scenarios examined in their article. Using GPT-3, released by OpenAI in 2020, as an example, they estimated that training the model in Microsoft's U.S. data centres could consume about 700,000 litres of onsite water and about 5.4 million litres in total when offsite electricity-related water use was included; they also estimated that 10–50 medium-length GPT-3 responses could consume about 500 mL of water, depending on when and where the model was deployed. Published prompt-level estimates have also varied by system and accounting framework: the 2025 Google study on Gemini assistant serving reported a median text-prompt estimate of about 0.26 mL under its framework. Location can materially affect the significance of data-centre water use. Research on U.S. data centres found that one-fifth of servers' direct water footprint came from moderately to highly water-stressed watersheds, while nearly half of servers were fully or partially powered by plants located in water-stressed regions. A 2025 Reuters report, citing data from Verisk Maplecroft and NatureFinance, said that an average mid-sized data centre uses about 1.4 million litres of water per day for cooling and that Phoenix would experience a 32% increase in annual water stress if currently pl

    Read more →
  • AI art

    AI art

    Artificial intelligence visual art, or AI art, is visual artwork generated or enhanced through the implementation of artificial intelligence (AI) programs, most commonly using text-to-image models. The process of automated art-making has existed since antiquity. The field of artificial intelligence was founded in the 1950s, and artists began to create art with artificial intelligence shortly after the discipline's founding. A select number of these creations have been showcased in museums and have been recognized with awards. Throughout its history, AI has raised many philosophical questions related to the human mind, artificial beings, and the nature of art in human–AI collaboration. During the AI boom of the 2020s, text-to-image models such as Midjourney, DALL-E and Stable Diffusion became widely available to the public, allowing users to quickly generate imagery with little effort. Commentary about AI art in the 2020s has often focused on issues related to copyright, deception, defamation, and its impact on more traditional artists, including technological unemployment. In August 2023, the US Supreme Court ruled that AI art is ineligible for copyright due to failure to meet human authorship. In March 2026, it declined to hear a case over whether AI-generated art can be subject to copyright. == History == === Early history === Automated art dates back at least to the automata of ancient Greek civilization, when inventors such as Daedalus and Hero of Alexandria were described as designing machines capable of writing text, generating sounds, and playing music. Creative automatons have flourished throughout history, such as Maillardet's automaton, created around 1800 and capable of creating multiple drawings and poems. Also in the 19th century, Ada Lovelace, wrote that "computing operations" could potentially be used to generate music and poems. In 1950, Alan Turing's paper "Computing Machinery and Intelligence" focused on whether machines can mimic human behavior convincingly. Shortly after, the academic discipline of artificial intelligence was founded at a research workshop at Dartmouth College in 1956. Since its founding, AI researchers have explored philosophical questions about the nature of the human mind and the consequences of creating artificial beings with human-like intelligence; these issues have previously been explored by myth, fiction, and philosophy since antiquity. === Artistic history === Since the founding of AI in the 1950s, artists have used artificial intelligence to create artistic works. These works were sometimes referred to as algorithmic art, computer art, digital art, or new media art. One of the first significant AI art systems is AARON, developed by Harold Cohen beginning in the late 1960s at the University of California at San Diego. AARON uses a symbolic rule-based approach to generate technical images in the era of GOFAI programming, and it was developed by Cohen with the goal of being able to code the act of drawing. AARON was exhibited in 1972 at the Los Angeles County Museum of Art. From 1973 to 1975, Cohen refined AARON during a residency at the Artificial Intelligence Laboratory at Stanford University. In 2024, the Whitney Museum of American Art exhibited AI art from throughout Cohen's career, including re-created versions of his early robotic drawing machines. Karl Sims has exhibited art created with artificial life since the 1980s. He received an M.S. in computer graphics from the MIT Media Lab in 1987 and was artist-in-residence from 1990 to 1996 at the supercomputer manufacturer and artificial intelligence company Thinking Machines. In both 1991 and 1992, Sims won the Golden Nica award at Prix Ars Electronica for his videos using artificial evolution. In 1997, Sims created the interactive artificial evolution installation Galápagos for the NTT InterCommunication Center in Tokyo. Sims received an Emmy Award in 2019 for outstanding achievement in engineering development. In 1999, Scott Draves and a team of several engineers created and released Electric Sheep as a free software screensaver. Electric Sheep is a volunteer computing project for animating and evolving fractal flames, which are distributed to networked computers that display them as a screensaver. The screensaver used AI to create an infinite animation by learning from its audience. In 2001, Draves won the Fundacion Telefónica Life 4.0 prize for Electric Sheep. In 2014, Stephanie Dinkins began working on Conversations with Bina48. For the series, Dinkins recorded her conversations with BINA48, a social robot that resembles a middle-aged black woman. In 2019, Dinkins won the Creative Capital award for her creation of an evolving artificial intelligence based on the "interests and culture(s) of people of color." In 2015, Sougwen Chung began Mimicry (Drawing Operations Unit: Generation 1), an ongoing collaboration between the artist and a robotic arm. In 2019, Chung won the Lumen Prize for her continued performances with a robotic arm that uses AI to attempt to draw in a manner similar to Chung. In 2018, an auction sale of artificial intelligence art was held at Christie's in New York where the AI artwork Edmond de Belamy sold for US$432,500, which was almost 45 times higher than its estimate of US$7,000–10,000. The artwork was created by Obvious, a Paris-based collective. In 2024, Japanese film generAIdoscope was released. The film was co-directed by Hirotaka Adachi, Takeshi Sone, and Hiroki Yamaguchi. All video, audio, and music in the film were created with artificial intelligence. In 2025, the Japanese anime television series Twins Hinahima was released. The anime was produced and animated with AI assistance during the process of cutting and conversion of photographs into anime illustrations and later retouched by art staff. Most of the remaining parts such as characters and logos were hand-drawn with various software. === Technical history === Deep learning, characterized by its multi-layer structure that attempts to mimic the human brain, first came about in the 2010s, causing a significant shift in the world of AI art. During the deep learning era, there are mainly these types of designs for generative art: autoregressive models, diffusion models, GANs, normalizing flows. In 2014, Ian Goodfellow and colleagues at Université de Montréal developed the generative adversarial network (GAN), a type of deep neural network capable of learning to mimic the statistical distribution of input data such as images. The GAN uses a "generator" to create new images and a "discriminator" to decide which created images are considered successful. Unlike previous algorithmic art that followed hand-coded rules, generative adversarial networks could learn a specific aesthetic by analyzing a dataset of example images. In 2015, a team at Google released DeepDream, a program that uses a convolutional neural network to find and enhance patterns in images via algorithmic pareidolia. The process creates deliberately over-processed images with a dream-like appearance reminiscent of a psychedelic experience. Later, in 2017, a conditional GAN learned to generate 1000 image classes of ImageNet, a large visual database designed for use in visual object recognition software research. By conditioning the GAN on both random noise and a specific class label, this approach enhanced the quality of image synthesis for class-conditional models. Autoregressive models were used for image generation, such as PixelRNN (2016), which autoregressively generates one pixel after another with a recurrent neural network. Immediately after the Transformer architecture was proposed in Attention Is All You Need (2018), it was used for autoregressive generation of images, but without text conditioning. The website Artbreeder, launched in 2018, uses the models StyleGAN and BigGAN to allow users to generate and modify images such as faces, landscapes, and paintings. In the 2020s, text-to-image models, which generate images based on prompts, became widely used, marking yet another shift in the creation of AI-generated artworks. In 2021, using the influential large language generative pre-trained transformer models that are used in GPT-2 and GPT-3, OpenAI released a series of images created with the text-to-image AI model DALL-E 1. It is an autoregressive generative model with essentially the same architecture as GPT-3. Along with this, later in 2021, EleutherAI released the open source VQGAN-CLIP based on OpenAI's CLIP model. Diffusion models, generative models used to create synthetic data based on existing data, were first proposed in 2015, but they only became better than GANs in early 2021. Latent diffusion model was published in December 2021 and became the basis for the later Stable Diffusion (August 2022), developed through a collaboration between Stability AI, CompVis Group at LMU Munich, and Runway. In 2022, Midjourney was released, followed by Google Brain's Imagen and Pa

    Read more →
  • The Great Automatic Grammatizator

    The Great Automatic Grammatizator

    The Great Automatic Grammatizator (published in the U.S. as The Umbrella Man and Other Stories) is a posthumous 1998 collection of thirteen short stories written by British author Roald Dahl. The stories were selected for teenagers from Dahl's adult works. All the stories included were published elsewhere originally; their sources are noted below. The stories, with the exception of the war story "Katina", possess a deadpan, ironic, bizarre, or even macabre sense of humor. They generally end with unexpected plot twists. == Stories == "The Great Automatic Grammatizator" (from Someone Like You): A mechanically-minded man reasons that the rules of grammar are fixed by certain, almost mathematical principles. By exploiting this idea, he is able to create a mammoth machine that can write a prize-winning novel in roughly fifteen minutes. The story ends on a fearful note, as more and more of the world's writers are forced into licensing their names—and all hope of human creativity—to the machine. "Mrs. Bixby and the Colonel's Coat" (from Kiss Kiss): Mrs. Bixby cheats on her dentist husband with a rich, dashing colonel. When their relationship breaks off, the colonel offers Mrs. Bixby a gorgeous and expensive mink coat. In an attempt to explain the coat away, Mrs. Bixby sets up an elaborate trick with the help of a pawn shop—but her husband learns of the ruse and manages to turn the tables. "The Butler" (from More Tales of the Unexpected): An obnoxious and newly wealthy couple employs a butler and chef to impress dinner guests. The butler recommends that the husband buy expensive wines to please his guests, and the man slavishly follows the idea. The butler and the chef reap the rewards of this idea, while making fools of the "fashionable" couple. "Man from the South" (from Someone Like You): At a seaside resort in Jamaica, a strange old man makes a bet with an American man in his late teens. If the young man's cigarette lighter can spark ten times without fail, the American will win a brand-new Cadillac car—but failure means losing the little finger of his right hand. The high-tension wager ensues, and with only a few sparks left, a woman—who knows only too well the cost of the old man's bets—appears and stops the madness. "The Landlady" (from Kiss Kiss): A young man traveling to London on business stops at a bed and breakfast along the way, where a strange and slightly dotty landlady eagerly welcomes him. The eccentric nature of the house, and the news that only two other young men have ever stayed there, confuse and frighten the young man. In the end, the landlady—who indulges in the hobby of taxidermy—and the boy share a drink of tea that tastes of bitter almonds, and the landlady softly smiles at what may be her latest stuffing project. "Parson's Pleasure" (from Kiss Kiss): A man discovers an extremely rare piece of Chippendale furniture at the farm of some boorish ranchers. He desperately attempts to buy the piece cheap, in the hope of selling it at auction to earn a huge profit. He manages to buy the piece "for firewood", only for the ranchers to destroy it in an attempt to make it fit into his car. "The Umbrella Man" (from More Tales of the Unexpected): On a rainy day, a mother and daughter meet a gentlemanly old man on a street corner, who offers them a beautiful silk umbrella in exchange for a pound note. They trade, and the daughter notices that the "feeble" old man suddenly seems much sprier. They follow him, and discover that the gentleman is a con artist who visits various pubs, has a drink, and then steals another umbrella to continue the cycle. "Katina" (from Over to You: Ten Stories of Flyers and Flying): A group of RAF pilots stationed in Greece during World War II discover a hauntingly beautiful young girl, whose "family is beneath the rubble." She becomes their squadron's unofficial "mascot". In the end, her fragile life is taken as she stands defiantly against a rain of bullets from Nazi aircraft, shaking her fists at the heavens. "The Way Up to Heaven" (from Kiss Kiss): Mrs. Foster suffers from a chronic phobia of being late for appointments. Her husband enjoys the cruel sport of purposely delaying their activities, just to rile his wife. On the day when Mrs. Foster is due to fly to Paris to visit her grandchildren, her husband engages in his usual tricks. But as Mrs. Foster rushes from their taxi to the house to find him, she hears a strange noise—and turns triumphantly toward her cab. It is only when she returns, and calls a man to "repair the lift" that was stuck between floors in the house, that readers guess Mr. Foster's fate. "Royal Jelly" (from Kiss Kiss): New parents fear for the life of their little girl, who is sickly and dangerously underweight. The husband, a beekeeper, remembers hearing of the miraculous royal jelly used by bees to transform one particular larva into a queen. He adds the mixture to his daughter's bottles, and she puts on weight at an astonishing rate. The mother senses that something is amiss, and the husband confesses his actions—along with the fact that he himself swallowed buckets of the jelly for months in an attempt to cure his impotence. The royal jelly did the trick—but the strange side-effects include a disturbing metamorphosis for both father and daughter. "Vengeance is Mine Inc." (from More Tales of the Unexpected): Two brothers who are short of cash bemoan their fate over breakfast while reading the society column of a newspaper. They hit upon a scheme to take revenge on cruel tabloid writers in exchange for money from wealthy patrons. The unconventional plan works, and the brothers line their pockets with the spoils of their plans. "Taste" (from Someone Like You): A rich man with a beautiful young daughter hosts a dinner party, inviting a famous connoisseur of fine wines. When the rich man boasts that he has a wine that the expert cannot identify, the stakes become frighteningly high: if he can guess the name and vintage of the wine, he will win his daughter's hand. After an elaborate show, the expert guesses correctly; however, the family's maid appears and inadvertently exposes the guest as a cheat, thus saving the girl. "Neck" (from Someone Like You): A newspaper heir finds himself suddenly engaged to the voluptuous and controlling Lady Tutton. He loses all control of his life, and only his trusted butler and friends realize how broken he is by her control. A weekend trip to their estate, however, proves the perfect opportunity for Lord Tutton to engage in revenge against his wicked wife: her head is trapped in a valuable piece of wooden sculpture, and he must decide whether to use a saw or an axe to cut her free. == Publication details == Dahl, Roald (19 January 2004). The Umbrella Man and Other Stories. Speak. ISBN 9780142400876. == Reception == Groff Conklin in 1954 called the short story "The Great Automatic Grammatizator" "an awe-inspiring fantasy-satire ... an unforgettable bit of biting nonsense".

    Read more →