AI art

Artificial intelligence visual art, or AI art, is visual artwork generated or enhanced using artificial intelligence (AI) programs, most commonly text-to-image models. The process of automated art-making has existed since antiquity. The field of artificial intelligence was founded in the 1950s, and artists began to create art with artificial intelligence shortly after the discipline's founding. A select number of these creations have been showcased in museums and have been recognized with awards. Throughout its history, AI has raised many philosophical questions related to the human mind, artificial beings, and the nature of art in human–AI collaboration. During the AI boom of the 2020s, text-to-image models such as Midjourney, DALL-E and Stable Diffusion became widely available to the public, allowing users to quickly generate imagery with little effort. Commentary about AI art in the 2020s has often focused on issues related to copyright, deception, defamation, and its impact on more traditional artists, including technological unemployment. In August 2023, a US federal district court ruled that AI-generated art is ineligible for copyright because it lacks human authorship. In March 2026, the US Supreme Court declined to hear a case over whether AI-generated art can be subject to copyright. == History == === Early history === Automated art dates back at least to the automata of ancient Greek civilization, when inventors such as Daedalus and Hero of Alexandria were described as designing machines capable of writing text, generating sounds, and playing music. Creative automatons have flourished throughout history, such as Maillardet's automaton, created around 1800 and capable of creating multiple drawings and poems. Also in the 19th century, Ada Lovelace wrote that "computing operations" could potentially be used to generate music and poems. 
In 1950, Alan Turing's paper "Computing Machinery and Intelligence" focused on whether machines can mimic human behavior convincingly. Shortly after, the academic discipline of artificial intelligence was founded at a research workshop at Dartmouth College in 1956. Since its founding, AI researchers have explored philosophical questions about the nature of the human mind and the consequences of creating artificial beings with human-like intelligence; these issues have previously been explored by myth, fiction, and philosophy since antiquity. === Artistic history === Since the founding of AI in the 1950s, artists have used artificial intelligence to create artistic works. These works were sometimes referred to as algorithmic art, computer art, digital art, or new media art. One of the first significant AI art systems is AARON, developed by Harold Cohen beginning in the late 1960s at the University of California at San Diego. AARON uses a symbolic rule-based approach to generate technical images in the era of GOFAI programming, and it was developed by Cohen with the goal of being able to code the act of drawing. AARON was exhibited in 1972 at the Los Angeles County Museum of Art. From 1973 to 1975, Cohen refined AARON during a residency at the Artificial Intelligence Laboratory at Stanford University. In 2024, the Whitney Museum of American Art exhibited AI art from throughout Cohen's career, including re-created versions of his early robotic drawing machines. Karl Sims has exhibited art created with artificial life since the 1980s. He received an M.S. in computer graphics from the MIT Media Lab in 1987 and was artist-in-residence from 1990 to 1996 at the supercomputer manufacturer and artificial intelligence company Thinking Machines. In both 1991 and 1992, Sims won the Golden Nica award at Prix Ars Electronica for his videos using artificial evolution. 
In 1997, Sims created the interactive artificial evolution installation Galápagos for the NTT InterCommunication Center in Tokyo. Sims received an Emmy Award in 2019 for outstanding achievement in engineering development. In 1999, Scott Draves and a team of several engineers created and released Electric Sheep as a free software screensaver. Electric Sheep is a volunteer computing project for animating and evolving fractal flames, which are distributed to networked computers that display them as a screensaver. The screensaver used AI to create an infinite animation by learning from its audience. In 2001, Draves won the Fundacion Telefónica Life 4.0 prize for Electric Sheep. In 2014, Stephanie Dinkins began working on Conversations with Bina48. For the series, Dinkins recorded her conversations with BINA48, a social robot that resembles a middle-aged black woman. In 2019, Dinkins won the Creative Capital award for her creation of an evolving artificial intelligence based on the "interests and culture(s) of people of color." In 2015, Sougwen Chung began Mimicry (Drawing Operations Unit: Generation 1), an ongoing collaboration between the artist and a robotic arm. In 2019, Chung won the Lumen Prize for her continued performances with a robotic arm that uses AI to attempt to draw in a manner similar to Chung. In 2018, an auction sale of artificial intelligence art was held at Christie's in New York where the AI artwork Edmond de Belamy sold for US$432,500, which was almost 45 times higher than its estimate of US$7,000–10,000. The artwork was created by Obvious, a Paris-based collective. In 2024, Japanese film generAIdoscope was released. The film was co-directed by Hirotaka Adachi, Takeshi Sone, and Hiroki Yamaguchi. All video, audio, and music in the film were created with artificial intelligence. In 2025, the Japanese anime television series Twins Hinahima was released. 
The anime was produced and animated with AI assistance during the process of cutting and conversion of photographs into anime illustrations and later retouched by art staff. Most of the remaining parts such as characters and logos were hand-drawn with various software. === Technical history === Deep learning, characterized by its multi-layer structure that attempts to mimic the human brain, first came about in the 2010s, causing a significant shift in the world of AI art. During the deep learning era, there are mainly these types of designs for generative art: autoregressive models, diffusion models, GANs, normalizing flows. In 2014, Ian Goodfellow and colleagues at Université de Montréal developed the generative adversarial network (GAN), a type of deep neural network capable of learning to mimic the statistical distribution of input data such as images. The GAN uses a "generator" to create new images and a "discriminator" to decide which created images are considered successful. Unlike previous algorithmic art that followed hand-coded rules, generative adversarial networks could learn a specific aesthetic by analyzing a dataset of example images. In 2015, a team at Google released DeepDream, a program that uses a convolutional neural network to find and enhance patterns in images via algorithmic pareidolia. The process creates deliberately over-processed images with a dream-like appearance reminiscent of a psychedelic experience. Later, in 2017, a conditional GAN learned to generate 1000 image classes of ImageNet, a large visual database designed for use in visual object recognition software research. By conditioning the GAN on both random noise and a specific class label, this approach enhanced the quality of image synthesis for class-conditional models. Autoregressive models were used for image generation, such as PixelRNN (2016), which autoregressively generates one pixel after another with a recurrent neural network. 
Immediately after the Transformer architecture was proposed in Attention Is All You Need (2017), it was used for autoregressive generation of images, but without text conditioning. The website Artbreeder, launched in 2018, uses the models StyleGAN and BigGAN to allow users to generate and modify images such as faces, landscapes, and paintings. In the 2020s, text-to-image models, which generate images based on prompts, became widely used, marking yet another shift in the creation of AI-generated artworks. In 2021, building on the generative pre-trained transformer architecture used in GPT-2 and GPT-3, OpenAI released a series of images created with the text-to-image AI model DALL-E 1. It is an autoregressive generative model with essentially the same architecture as GPT-3. Later in 2021, EleutherAI released the open source VQGAN-CLIP, based on OpenAI's CLIP model. Diffusion models, generative models used to create synthetic data based on existing data, were first proposed in 2015, but they only became better than GANs in early 2021. The latent diffusion model was published in December 2021 and became the basis for the later Stable Diffusion (August 2022), developed through a collaboration between Stability AI, CompVis Group at LMU Munich, and Runway. In 2022, Midjourney was released, followed by Google Brain's Imagen and Pa

Hi uTandem

Hi uTandem, also known as uTandem, is a free language exchange mobile app. It helps people connect with other language learners to carry out face-to-face language exchange sessions, and it also offers learners lists of businesses in the field of language learning or language exchange. == Use == Hi uTandem is built around the concept of language exchange, a method of language learning based on mutual oral linguistic exchange between partners. Ideally, each partner is a native speaker of the language they are helping their counterpart to learn. The app is designed for users to chat with other users and translate messages, to find suitable language partners, and to locate language schools, bars, cafés and language exchange groups around them. == Team and development == Hi uTandem was released in January 2016. The initial idea was conceived by Alberto Rodríguez as part of a team of eight young Spaniards. Hi uTandem belongs to the company Velvor Tech S.L., founded by the same members and registered in Ronda (Spain). == Reception == Hi uTandem was listed on the Top 4 Apps to Learn Languages list by ElPlural.com, and since its launch it has been featured in numerous online and print sources, including 20 minutos, Europapress, ABC Andalucía and Telefónica's Think Big Blog.

Kruti

Kruti is a multilingual AI agent and chatbot developed by the Indian company Ola Krutrim. It is designed to perform real-world tasks for users, such as booking taxis and ordering food, by integrating directly with various online services. It is notable for its ability to understand and respond in multiple Indian languages. Developed by a team founded by Bhavish Aggarwal, Kruti functions as an "agentic" AI, meaning it can reason, plan, and execute multi-step tasks to fulfill a user's request. The backend technology combines several open-source large language models with Ola's proprietary Krutrim V2 model. The system was developed to work primarily on smartphones, addressing the Indian market's specific needs, including language diversity and potential bandwidth constraints. Kruti was officially released in June 2025, replacing an earlier chatbot from the company that was also named Krutrim. It initially supported 13 languages, and the company plans to expand support to 22 Indian languages. == Background == Kruti is an improved version of Ola's Krutrim chatbot, which was first launched in 2023 and which Kruti was intended to replace. It was officially released on 12 June 2025 as an upgrade over passive chatbots, with support for text and voice in 13 Indian languages. As an agentic AI, it can execute tasks with customization and reasoning, providing adaptive answers based on user preferences and past interactions. Kruti is optimized for smartphone usage and designed to accommodate bandwidth constraints and usage patterns in India. To ensure scalability and cost-effective performance, it combines various open-source large language models with Ola's own Krutrim V2, which has 12 billion parameters. Its speech recognition is built to identify regional Indian languages, dialects, and accents. Due to its integration with numerous apps and services, Kruti is context-aware and can proactively complete tasks. 
Kruti was initially connected only with Ola ecosystem services. Krutrim intends to incorporate various other Indian services into Kruti, with the goal of adding services from Blinkit, Swiggy, and Uber, each with voice command support. On 20 June 2025, Krutrim acquired the AI platform BharatSah'AI'yak to increase its involvement in government, education, and agriculture projects. This acquisition will allow Kruti to assist in broadening the scope of BharatSah'AI'yak's work on India-centric, vernacular retrieval-augmented generation AI bots. == Development == Kruti is designed to perform tasks with minimal user input, accepting documents, images, and text, without requiring users to switch between applications. Its agentic framework breaks queries into sub-tasks executed by multiple agents working sequentially or concurrently, with reported accuracy exceeding 90%. Kruti connects to company databases and APIs via the Model Context Protocol and presents responses as summaries, tables, or narratives adapted to user behaviour. The system supports payments via credit/debit cards and UPI. The underlying stack, which includes foundation models and AI training and inference systems, is intended to support adaptation across sectors such as healthcare, education, and finance. Ola Cabs and the Open Network for Digital Commerce have begun integrating Kruti into their platforms pending broader reliability testing.

Quantum natural language processing

Quantum natural language processing (QNLP) is the application of quantum computing to natural language processing (NLP). It computes word embeddings as parameterised quantum circuits that can, in principle, solve some NLP tasks faster than classical computers. It is inspired by categorical quantum mechanics and the DisCoCat framework, making use of string diagrams to translate from grammatical structure to quantum processes. == Theory == The first quantum algorithm for natural language processing, due to Zeng and Coecke, used the DisCoCat framework and Grover's algorithm to show a quadratic quantum speedup for a text classification task. It was later shown that quantum language processing is BQP-complete, i.e. quantum language models are more expressive than their classical counterparts, unless quantum mechanics can be efficiently simulated by classical computers. These two theoretical results assume fault-tolerant quantum computation and a QRAM, i.e. an efficient way to load classical data onto a quantum computer. Thus, they are not applicable to the noisy intermediate-scale quantum (NISQ) computers available today. == Experiments == The algorithm of Zeng and Coecke was adapted to the constraints of NISQ computers and implemented on IBM quantum computers to solve binary classification tasks. Instead of loading classical word vectors onto a quantum memory, the word vectors are computed directly as the parameters of quantum circuits. These parameters are optimised using methods from quantum machine learning to solve data-driven tasks such as question answering, machine translation and even algorithmic music composition.

Ontology learning

Ontology learning (ontology extraction, ontology augmentation generation, ontology generation, or ontology acquisition) is the automatic or semi-automatic creation of ontologies, including extracting the corresponding domain's terms and the relationships between the concepts that these terms represent from a corpus of natural language text, and encoding them with an ontology language for easy retrieval. As building ontologies manually is extremely labor-intensive and time-consuming, there is great motivation to automate the process. Typically, the process starts by extracting terms and concepts or noun phrases from plain text using linguistic processors such as part-of-speech tagging and phrase chunking. Then statistical or symbolic techniques are used to extract relation signatures, often based on pattern-based or definition-based hypernym extraction techniques. == Procedure == Ontology learning (OL) is used to (semi-)automatically extract whole ontologies from natural language text. The process is usually split into the following eight tasks, which are not all necessarily applied in every ontology learning system. === Domain terminology extraction === During the domain terminology extraction step, domain-specific terms are extracted, which are used in the following step (concept discovery) to derive concepts. Relevant terms can be determined, e.g., by calculation of the TF/IDF values or by application of the C-value / NC-value method. The resulting list of terms has to be filtered by a domain expert. In the subsequent step, similarly to coreference resolution in information extraction, the OL system determines synonyms, because they share the same meaning and therefore correspond to the same concept. The most common methods therefore are clustering and the application of statistical similarity measures. 
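As a rough illustration of the TF/IDF scoring mentioned above, the following minimal Python sketch ranks candidate terms in a tiny, made-up corpus. The function name tf_idf, the toy documents, and the particular weighting (relative term frequency multiplied by the natural logarithm of the inverse document frequency) are assumptions chosen for illustration; practical systems vary in the exact formula and usually add linguistic preprocessing first.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Assumed weighting: (count / document length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical, pre-tokenized toy corpus (not taken from any real dataset).
corpus = [
    ["the", "enzyme", "binds", "the", "substrate"],
    ["the", "court", "heard", "the", "appeal"],
    ["the", "enzyme", "cleaves", "the", "protein"],
]
scores = tf_idf(corpus)

# Candidate terms of the first document, highest score first.
ranked = sorted(scores[0].items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

Under this weighting, a word that occurs in every document (here "the") scores zero, while words concentrated in few documents score higher. The ranked list is only a candidate list; as noted above, it would still be filtered by a domain expert.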
=== Concept discovery === In the concept discovery step, terms are grouped into meaning-bearing units, which correspond to an abstraction of the world and therefore to concepts. The grouped terms are the domain-specific terms and their synonyms, which were identified in the domain terminology extraction step. === Concept hierarchy derivation === In the concept hierarchy derivation step, the OL system tries to arrange the extracted concepts in a taxonomic structure. This is mostly achieved with unsupervised hierarchical clustering methods. Because the result of such methods is often noisy, a supervision step, e.g., user evaluation, is added. A further method for deriving a concept hierarchy uses lexical patterns that indicate a sub- or supersumption relationship. Patterns like “X, that is a Y” or “X is a Y” indicate that X is a subclass of Y. Such patterns can be analyzed efficiently, but they often occur too infrequently to extract enough sub- or supersumption relationships. Instead, bootstrapping methods have been developed, which learn these patterns automatically and therefore ensure broader coverage. === Learning of non-taxonomic relations === In the learning of non-taxonomic relations step, relationships are extracted that do not express any sub- or supersumption. Such relationships are, e.g., works-for or located-in. There are two common approaches to solve this subtask. The first is based upon the extraction of anonymous associations, which are named appropriately in a second step. The second approach extracts verbs, which indicate a relationship between entities, represented by the surrounding words. The results of both approaches need to be evaluated by an ontologist to ensure accuracy. === Rule discovery === During rule discovery, axioms (formal descriptions of concepts) are generated for the extracted concepts. 
This can be achieved, e.g., by analyzing the syntactic structure of a natural language definition and applying transformation rules to the resulting dependency tree. The result of this process is a list of axioms, which is then combined into a concept description. This output is then evaluated by an ontologist. === Ontology population === At this step, the ontology is augmented with instances of concepts and properties. For the augmentation with instances of concepts, methods based on the matching of lexico-syntactic patterns are used. Instances of properties are added through the application of bootstrapping methods, which collect relation tuples. === Concept hierarchy extension === In this step, the OL system tries to extend the taxonomic structure of an existing ontology with further concepts. This can be performed in a supervised manner with a trained classifier or in an unsupervised manner via the application of similarity measures. === Frame and event detection === During frame/event detection, the OL system tries to extract complex relationships from text, e.g., who departed from where to what place and when. Approaches range from applying support vector machines (SVMs) with kernel methods to semantic role labeling (SRL) to deep semantic parsing techniques. == Tools == DOG4DAG (Dresden Ontology Generator for Directed Acyclic Graphs) is an ontology generation plugin for Protégé 4.1 and OBO-Edit 2.1. It allows for term generation, sibling generation, definition generation, and relationship induction. Because it is integrated into both editors, DOG4DAG allows ontology extension for all common ontology formats (e.g., OWL and OBO). It is limited largely to EBI and BioPortal lookup service extensions.
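The pattern-based approach described under concept hierarchy derivation can be illustrated with a small Python sketch. It matches only the two example patterns quoted there (“X, that is a Y” and “X is a Y”) with a single regular expression; the function name, the regular expression, and the sample sentences are assumptions made for illustration, and single-word matching like this is far cruder than the noun-phrase chunking a real system would use.

```python
import re

# Matches "X is a Y", "X is an Y" and "X, that is a Y" for single-word X and Y.
# This is an illustrative simplification, not a complete pattern inventory.
SUBCLASS_PATTERN = re.compile(r"\b(\w+)(?:, that)? is an? (\w+)")


def extract_subclass_pairs(text):
    """Return (subclass, superclass) candidates found in the text."""
    return [
        (match.group(1).lower(), match.group(2).lower())
        for match in SUBCLASS_PATTERN.finditer(text)
    ]


# Hypothetical input sentences.
sample = "A sparrow is a bird. The oak, that is a tree, grows slowly."
pairs = extract_subclass_pairs(sample)
print(pairs)
```

Each extracted pair is only a candidate sub-/superclass relationship. Because such patterns are rare in running text, the bootstrapping methods mentioned above would start from a few pairs like these and learn further patterns automatically.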

Microsoft Forms

Microsoft Forms (formerly Office 365 Forms) is an online survey creator, part of Microsoft 365. == Usage == Forms allows users to create surveys and quizzes with automatic marking. The data can be exported to Microsoft Excel or Power BI dashboards, or viewed live using the Present feature. == Phishing and fraud == Following a wave of phishing attacks utilizing Microsoft 365 in early 2021, Microsoft uses algorithms to automatically detect and block phishing attempts made with Microsoft Forms. Microsoft also advises Forms users not to submit personal information, such as passwords, in a form or survey. It also places a similar advisory underneath the “Submit” button in every form created with Forms, warning users not to give out their password.

Teknomo–Fernandez algorithm

The Teknomo–Fernandez algorithm (TF algorithm) is an efficient algorithm for generating the background image of a given video sequence. By assuming that the background image is shown in the majority of the video, the algorithm is able to generate a good background image of a video in $O(R)$ time using only a small number of binary operations and Boolean bit operations, which require little memory and are available as built-in operators in many programming languages such as C, C++, and Java. == History == People tracking from videos usually involves some form of background subtraction to segment foreground from background. Once foreground images are extracted, the desired algorithms (such as those for motion tracking, object tracking, and facial recognition) may be executed using these images. However, background subtraction requires that the background image is already available, and unfortunately this is not always the case. Traditionally, the background image is searched for manually or automatically from the video images when there are no objects. More recently, automatic background generation through object detection, median filtering, medoid filtering, approximated median filtering, linear predictive filter, non-parametric model, Kalman filter, and adaptive smoothing have been suggested; however, most of these methods have high computational complexity and are resource-intensive. The Teknomo–Fernandez algorithm is also an automatic background generation algorithm. Its advantage, however, is its computational speed of only $O(R)$ time, depending on the resolution $R$ of an image, and its accuracy gained within a manageable number of frames. At least three frames from a video are needed to produce the background image, assuming that for every pixel position, the background occurs in the majority of the video's frames. Furthermore, it can be performed for both grayscale and colored videos. 
== Assumptions == The camera is stationary. The light of the environment changes only slowly relative to the motions of the people in the scene. People do not occupy the same place in the scene for most of the time. More generally, the algorithm works whenever the following single important assumption holds: for each pixel position, the majority of the pixel values in the entire video contain the pixel value of the actual background image (at that position). As long as each part of the background is shown in the majority of the video, the entire background image need not appear in any single frame, and the algorithm is still expected to work accurately. == Background image generation == === Equations === For three frames of an image sequence, $x_1$, $x_2$, and $x_3$, the background image $B$ is obtained using $B = x_3 (x_1 \oplus x_2) + x_1 x_2$, where $\oplus$ denotes the bitwise exclusive-or (XOR) operator, juxtaposition denotes bitwise AND, and $+$ denotes bitwise OR. The Boolean mode function $S$ takes the value 1 when the number of 1 entries is larger than half of the number of images, that is, $S = \begin{cases} 1, & \text{if } \sum_{i=1}^{n} x_i \geq \left\lfloor \frac{n}{2} \right\rfloor + 1 \text{ and } n \geq 3 \\ 0, & \text{otherwise} \end{cases}$ For three images, the background image $B$ can be taken as the value $\bar{x}_1 x_2 x_3 + x_1 \bar{x}_2 x_3 + x_1 x_2 \bar{x}_3 + x_1 x_2 x_3$. === Background generation algorithm === At the first level, three frames are selected at random from the image sequence to produce a background image by combining them using the first equation. Three such first-level images are then combined in the same way to yield a better background image at the second level. 
The procedure is repeated until the desired level $L$ is reached. == Theoretical accuracy == At level $\ell$, the probability $p_\ell$ that the modal bit predicted is the actual modal bit is given by the equation $p_\ell = (p_{\ell-1})^3 + 3 (p_{\ell-1})^2 (1 - p_{\ell-1})$. The table below gives the computed probability values across several levels using some specific initial probabilities. It can be observed that even if the modal bit at the considered position appears in as few as 60% of the frames, the probability of accurate modal bit determination is already more than 99% at 6 levels. == Space complexity == The space requirement of the Teknomo–Fernandez algorithm is given by the function $O(RF + R3^L)$, depending on the resolution $R$ of the image, the number $F$ of frames in the video, and the desired number $L$ of levels. However, the fact that $L$ will probably not exceed 6 reduces the space complexity to $O(RF)$. == Time complexity == The entire algorithm runs in $O(R)$ time, depending only on the resolution of the image. Computing the modal bit for each bit can be done in $O(1)$ time, while the computation of the resulting image from the three given images can be done in $O(R)$ time. The number of images to be processed in $L$ levels is $O(3^L)$. However, since $L \leq 6$, this is actually $O(1)$, and thus the algorithm runs in $O(R)$ time. == Variants == A variant of the Teknomo–Fernandez algorithm that incorporates the Monte Carlo method, named CRF, has been developed. Two different configurations of CRF were implemented: CRF9,2 and CRF81,1. 
Experiments on some colored video sequences showed that the CRF configurations outperform the TF algorithm in terms of accuracy. However, the TF algorithm remains more efficient in terms of processing time. == Applications == Object detection, face detection, face recognition, pedestrian detection, video surveillance, motion capture, human-computer interaction, content-based video coding, traffic monitoring, and real-time gesture recognition.
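The three-frame equation and the level-by-level procedure from the background image generation section, together with the accuracy recurrence, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the authors' implementation: frames are represented as plain lists of integer pixel values, frames are sampled with replacement, and all function names and the sample frames are hypothetical.

```python
import random


def majority3(x1, x2, x3):
    """Bitwise majority of three integers: B = x3 (x1 XOR x2) + x1 x2."""
    return (x3 & (x1 ^ x2)) | (x1 & x2)


def combine(f1, f2, f3):
    """Combine three frames (equal-length lists of pixel integers) pixel by pixel."""
    return [majority3(a, b, c) for a, b, c in zip(f1, f2, f3)]


def background(frames, level, rng):
    """Estimate the background image at the given level.

    Level 0 is a randomly chosen frame, level 1 combines three such frames,
    and level k combines three independently built level k-1 images.
    Sampling with replacement is an assumption of this sketch.
    """
    if level == 0:
        return rng.choice(frames)
    return combine(*(background(frames, level - 1, rng) for _ in range(3)))


def modal_bit_probability(p0, levels):
    """Iterate p_l = p_(l-1)^3 + 3 p_(l-1)^2 (1 - p_(l-1)) for the given levels."""
    p = p0
    for _ in range(levels):
        p = p ** 3 + 3 * p ** 2 * (1 - p)
    return p


# Hypothetical 4-pixel grayscale frames; the true background is [10, 20, 30, 40]
# and each pixel shows the background in the majority of frames.
frames = [
    [10, 20, 30, 40],
    [99, 20, 30, 40],
    [10, 20, 77, 40],
    [10, 20, 30, 40],
    [10, 55, 30, 40],
]
estimate = background(frames, level=3, rng=random.Random(0))
print(estimate)
print(modal_bit_probability(0.6, 6))
```

Because each level only applies a handful of bitwise operations per pixel, the work per combined image is proportional to the resolution, and a level-$L$ estimate touches $3^L$ sampled frames, which matches the complexity discussion above. The last line evaluates the recurrence for an initial probability of 0.6 over six levels.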