DALL-E, DALL-E 2, and DALL-E 3 (stylised DALL·E) are text-to-image models developed by OpenAI that use deep learning to generate digital images from natural language descriptions known as prompts. The first version of DALL-E was announced in January 2021. In the following year, its successor DALL-E 2 was released. DALL-E 3 was released natively into ChatGPT for ChatGPT Plus and ChatGPT Enterprise customers in October 2023, with availability via OpenAI's API and "Labs" platform provided in early November. Microsoft implemented the model in Bing's Image Creator tool and plans to implement it in its Designer app. Through Bing's Image Creator tool, Microsoft Copilot also runs on DALL-E 3. In March 2025, DALL-E 3 was replaced in ChatGPT by GPT Image's native image-generation capabilities. == History and background == DALL-E was revealed by OpenAI in a blog post on 5 January 2021, and uses a version of GPT-3 modified to generate images. On 6 April 2022, OpenAI announced DALL-E 2, a successor designed to generate more realistic images at higher resolutions that "can combine concepts, attributes, and styles". On 20 July 2022, DALL-E 2 entered a beta phase with invitations sent to 1 million waitlisted individuals; users could generate a certain number of images for free every month and could purchase more. Access had previously been restricted to pre-selected users for a research preview due to concerns about ethics and safety. On 28 September 2022, DALL-E 2 was opened to everyone and the waitlist requirement was removed. In September 2023, OpenAI announced its latest image model, DALL-E 3, capable of understanding "significantly more nuance and detail" than previous iterations. Earlier, in early November 2022, OpenAI had released DALL-E 2 as an API, allowing developers to integrate the model into their own applications. Microsoft unveiled its implementation of DALL-E 2 in its Designer app and in the Image Creator tool included in Bing and Microsoft Edge.
The API operates on a cost-per-image basis, with prices varying depending on image resolution. Volume discounts are available to companies working with OpenAI's enterprise team. The software's name is a portmanteau of the names of the animated Pixar robot character WALL-E and the Spanish surrealist artist Salvador Dalí. In February 2024, OpenAI began adding watermarks to DALL-E generated images, containing metadata in the C2PA (Coalition for Content Provenance and Authenticity) standard promoted by the Content Authenticity Initiative. == Technology == The first generative pre-trained transformer (GPT) model was developed by OpenAI in 2018, using a Transformer architecture. This first iteration, GPT-1, was scaled up to produce GPT-2 in 2019; in 2020, it was scaled up again to produce GPT-3, with 175 billion parameters. === DALL-E === DALL-E has three components: a discrete VAE, an autoregressive decoder-only Transformer model (12 billion parameters) similar to GPT-3, and a CLIP pair of image encoder and text encoder. The discrete VAE can convert an image to a sequence of tokens, and conversely, convert a sequence of tokens back to an image. This is necessary as the Transformer model does not directly process image data. The input to the Transformer model is a tokenised image caption followed by a sequence of image tokens. The image caption is in English, tokenised by byte pair encoding (vocabulary size 16384), and can be up to 256 tokens long. Each image is a 256×256 RGB image, which the discrete variational autoencoder maps to a 32×32 grid of tokens (vocabulary size 8192), each corresponding to an 8×8 pixel patch. This gives 32 × 32 = 1024 image tokens, so a complete input sequence is at most 256 + 1024 = 1280 tokens long. DALL-E was developed and announced to the public in conjunction with CLIP (Contrastive Language-Image Pre-training). CLIP is a separate model based on contrastive learning that was trained on 400 million pairs of images with text captions scraped from the Internet.
Its role is to "understand and rank" DALL-E's output by predicting which caption from a list of 32,768 captions randomly selected from the dataset (of which one was the correct answer) is most appropriate for an image. A trained CLIP pair is used to filter a larger initial list of images generated by DALL-E to select the image that is closest to the text prompt. === DALL-E 2 === DALL-E 2 uses 3.5 billion parameters, a smaller number than its predecessor. Instead of an autoregressive Transformer, DALL-E 2 uses a diffusion model conditioned on CLIP image embeddings, which, during inference, are generated from CLIP text embeddings by a prior model. Stable Diffusion, released a few months later, is likewise a diffusion model conditioned on CLIP embeddings, although it conditions directly on CLIP text embeddings and runs the diffusion process in a latent space. === DALL-E 3 === Although a technical report was published for DALL-E 3, it does not include training or implementation details of the model, instead focusing on the improved prompt-following capabilities developed for DALL-E 3. == Capabilities == DALL-E can generate imagery in multiple styles, including photorealistic imagery, paintings, and emoji. It can "manipulate and rearrange" objects in its images, and can correctly place design elements in novel compositions without explicit instruction. Thom Dunn, writing for BoingBoing, remarked: "For example, when asked to draw a daikon radish blowing its nose, sipping a latte, or riding a unicycle, DALL-E often draws the handkerchief, hands, and feet in plausible locations." DALL-E showed the ability to "fill in the blanks" to infer appropriate details without specific prompts, such as adding Christmas imagery to prompts commonly associated with the celebration, and adding appropriately placed shadows to images whose prompts did not mention them. Furthermore, DALL-E exhibits a broad understanding of visual and design trends. DALL-E can produce images for a wide variety of arbitrary descriptions from various viewpoints with only rare failures.
Mark Riedl, an associate professor at the Georgia Tech School of Interactive Computing, found that DALL-E could blend concepts (described as a key element of human creativity). Its visual reasoning ability is sufficient to solve Raven's Matrices (visual tests often administered to humans to measure intelligence). DALL-E 3 follows complex prompts with more accuracy and detail than its predecessors, and is able to generate more coherent and accurate text. DALL-E 3 is integrated into ChatGPT Plus. === Image modification === Given an existing image, DALL-E 2 and DALL-E 3 can produce "variations" of the image as individual outputs based on the original, as well as edit the image to modify or expand upon it. The "inpainting" and "outpainting" abilities of these models use context from an image to fill in missing areas using a medium consistent with the original, following a given prompt. For example, this can be used to insert a new subject into an image, or expand an image beyond its original borders. According to OpenAI, "Outpainting takes into account the image’s existing visual elements — including shadows, reflections, and textures — to maintain the context of the original image." === Technical limitations === DALL-E 2's language understanding has limits. It is sometimes unable to distinguish "A yellow book and a red vase" from "A red book and a yellow vase" or "A panda making latte art" from "Latte art of a panda". It generates images of an astronaut riding a horse when presented with the prompt "a horse riding an astronaut". It also fails to generate the correct images in a variety of circumstances. Requesting more than three objects, negation, numbers, and connected sentences may result in mistakes, and object features may appear on the wrong object. Additional limitations include generating text, ambigrams and other forms of typography, which often results in dream-like gibberish. 
The model also has a limited capacity to address scientific information, such as astronomy or medical imagery. == Ethical concerns == DALL-E 2's reliance on public datasets influences its results and leads to algorithmic bias in some cases, such as generating higher numbers of men than women for requests that do not mention gender. DALL-E 2's training data was filtered to remove violent and sexual imagery, but this was found to increase bias in some cases, such as reducing the frequency of women being generated. OpenAI hypothesise that this may be because women were more likely to be sexualised in the training data, which caused the filter to influence results. In September 2022, OpenAI confirmed to The Verge that DALL-E invisibly inserts phrases into user prompts to address bias in results; for instance, "black man" and "Asian woman" are inserted into prompts that do not specify gender or race. OpenAI claims to address concerns about potential "racy content" (generated content containing nudity or sexual material) in DALL-E 3 through input/output filters, blocklists, ChatGPT refusals, and model-level interventions. However, DALL-E 3 continues to disproportionately represent people as White, female, and youthful. Users are able to somewhat remedy
Database
In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database. Before digital storage and retrieval of data became widespread, index cards were used for data storage in a wide range of applications and environments: in the home to record and store recipes, shopping lists, contact information and other organizational data; in business to record presentation notes, project research and notes, and contact information; in schools as flash cards or other visual aids; and in academic research to hold data such as bibliographical citations or notes in a card file. Professional book indexers used index cards in the creation of book indexes until they were replaced by indexing software in the 1980s and 1990s. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases spans formal techniques and practical considerations, including data modeling, efficient data representation and storage, query languages, security and privacy of sensitive data, and distributed computing issues, including supporting concurrent access and fault tolerance. Computer scientists may classify database management systems according to the database models that they support. Relational databases became dominant in the 1980s. These model data as rows and columns in a series of tables, and the vast majority use SQL for writing and querying data. 
In the 2000s, non-relational databases became popular, collectively referred to as NoSQL, because they use different query languages. == Terminology and overview == Formally, a "database" refers to a set of related data accessed through the use of a "database management system" (DBMS), which is an integrated set of computer software that allows users to interact with one or more databases and provides access to all of the data contained in the database (although restrictions may exist that limit access to particular data). The DBMS provides various functions that allow entry, storage and retrieval of large quantities of information and provides ways to manage how that information is organized. Because of the close relationship between them, the term "database" is often used casually to refer to both a database and the DBMS used to manipulate it. Outside the world of professional information technology, the term database is often used to refer to any collection of related data (such as a spreadsheet or a card index) as size and usage requirements typically necessitate use of a database management system. Existing DBMSs provide various functions that allow management of a database and its data which can be classified into four main functional groups: Data definition – Creation, modification and removal of definitions that detail how the data is to be organized. Update – Insertion, modification, and deletion of the data itself. Retrieval – Selecting data according to specified criteria (e.g., a query, a position in a hierarchy, or a position in relation to other data) and providing that data either directly to the user, or making it available for further processing by the database itself or by other applications. The retrieved data may be made available in a more or less direct form without modification, as it is stored in the database, or in a new form obtained by altering it or combining it with existing data from the database. 
Administration – Registering and monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control, and recovering information that has been corrupted by some event such as an unexpected system failure. Both a database and its DBMS conform to the principles of a particular database model. "Database system" refers collectively to the database model, database management system, and database. Physically, database servers are dedicated computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with generous memory and RAID disk arrays used for stable storage. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large-volume transaction processing environments. DBMSs are found at the heart of most database applications. DBMSs may be built around a custom multitasking kernel with built-in networking support, but modern DBMSs typically rely on a standard operating system to provide these functions. Since DBMSs comprise a significant market, computer and storage vendors often take into account DBMS requirements in their own development plans. Databases and DBMSs can be categorized according to the database model(s) that they support (such as relational or XML), the type(s) of computer they run on (from a server cluster to a mobile phone), the query language(s) used to access the database (such as SQL or XQuery), and their internal engineering, which affects performance, scalability, resilience, and security. == History == The sizes, capabilities, and performance of databases and their respective DBMSs have grown in orders of magnitude. These performance increases were enabled by the technology progress in the areas of processors, computer memory, computer storage, and computer networks. 
The concept of a database was made possible by the emergence of direct access storage media such as magnetic disks, which became widely available in the mid-1960s; earlier systems relied on sequential storage of data on magnetic tape. The subsequent development of database technology can be divided into three eras based on data model or structure: navigational, SQL/relational, and post-relational. The two main early navigational data models were the hierarchical model and the CODASYL model (network model). These were characterized by the use of pointers (often physical disk addresses) to follow relationships from one record to another. The relational model, first proposed in 1970 by Edgar F. Codd, departed from this tradition by insisting that applications should search for data by content, rather than by following links. The relational model employs sets of ledger-style tables, each used for a different type of entity. Only in the mid-1980s did computing hardware become powerful enough to allow the wide deployment of relational systems (DBMSs plus applications). By the early 1990s, however, relational systems dominated in all large-scale data processing applications, and as of 2018 they remain dominant: IBM Db2, Oracle, MySQL, and Microsoft SQL Server are the most searched DBMS. The dominant database language, standardized SQL for the relational model, has influenced database languages for other data models. Object databases were developed in the 1980s to overcome the inconvenience of object–relational impedance mismatch, which led to the coining of the term "post-relational" and also the development of hybrid object–relational databases. The next generation of post-relational databases in the late 2000s became known as NoSQL databases, introducing fast key–value stores and document-oriented databases. 
A competing "next generation" known as NewSQL databases attempted new implementations that retained the relational/SQL model while aiming to match the high performance of NoSQL compared to commercially available relational DBMSs. === 1960s, navigational DBMS === The introduction of the term database coincided with the availability of direct-access storage (disks and drums) from the mid-1960s onwards. The term represented a contrast with the tape-based systems of the past, allowing shared interactive use rather than daily batch processing. The Oxford English Dictionary cites a 1962 report by the System Development Corporation of California as the first to use the term "data-base" in a specific technical sense. As computers grew in speed and capability, a number of general-purpose database systems emerged; by the mid-1960s a number of such systems had come into commercial use. Interest in a standard began to grow, and Charles Bachman, author of one such product, the Integrated Data Store (IDS), founded the Database Task Group within CODASYL, the group responsible for the creation and standardization of COBOL. In 1971, the Database Task Group delivered their standard, which generally became known as the CODASYL approach, and soon a number of commercial products based on this approach entered the market. The CODASYL approach of
Linde–Buzo–Gray algorithm
The Linde–Buzo–Gray algorithm (named after its creators Yoseph Linde, Andrés Buzo and Robert M. Gray, who designed it in 1980) is an iterative vector quantization algorithm to improve a small set of vectors (codebook) to represent a larger set of vectors (training set), such that it will be locally optimal. It combines Lloyd's Algorithm with a splitting technique in which larger codebooks are built from smaller codebooks by splitting each code vector in two. The core idea of the algorithm is that by splitting the codebook such that all code vectors from the previous codebook are present, the new codebook must be as good as the previous one or better.

== Description ==
The Linde–Buzo–Gray algorithm may be implemented as follows:

algorithm linde-buzo-gray is
    input: set of training vectors training, codebook to improve old-codebook
    output: codebook that is twice the size and better or as good as old-codebook

    new-codebook ← {}

    for each old-codevector in old-codebook do
        insert old-codevector into new-codebook
        insert old-codevector + 𝜖 into new-codebook where 𝜖 is a small vector

    return lloyd(new-codebook, training)

algorithm lloyd is
    input: codebook to improve, set of training vectors training
    output: improved codebook

    do
        previous-codebook ← codebook

        clusters ← divide training into |codebook| clusters, where each cluster contains all vectors in training who are best represented by the corresponding vector in codebook

        for each cluster cluster in clusters do
            the corresponding code vector in codebook ← the centroid of all training vectors in cluster

    while difference in error representing training between codebook and previous-codebook > 𝜖

    return codebook
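
The pseudocode above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: it assumes squared Euclidean distance as the error measure, adds the same small constant to every component as the splitting perturbation, and leaves a code vector unchanged if its cluster is empty. The function names, the `tolerance` and `max_iterations` parameters, and the sample data are all choices made for this example.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (the error measure assumed in this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty collection of vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def nearest_index(vector: Sequence[float], codebook: Sequence[Vector]) -> int:
    """Index of the code vector that best represents `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def distortion(codebook: Sequence[Vector], training: Sequence[Vector]) -> float:
    """Mean error of representing each training vector by its nearest code vector."""
    total = sum(squared_distance(v, codebook[nearest_index(v, codebook)])
                for v in training)
    return total / len(training)


def lloyd(codebook: Sequence[Vector], training: Sequence[Vector],
          tolerance: float = 1e-9, max_iterations: int = 100) -> List[Vector]:
    """Lloyd iteration: reassign clusters and recompute centroids until
    the error stops improving by more than `tolerance`."""
    codebook = [list(c) for c in codebook]
    error = distortion(codebook, training)
    for _ in range(max_iterations):
        clusters: List[List[Vector]] = [[] for _ in codebook]
        for v in training:
            clusters[nearest_index(v, codebook)].append(v)
        # Assumption: an empty cluster keeps its previous code vector.
        codebook = [centroid(cluster) if cluster else old
                    for cluster, old in zip(clusters, codebook)]
        new_error = distortion(codebook, training)
        if error - new_error <= tolerance:
            break
        error = new_error
    return codebook


def linde_buzo_gray(training: Sequence[Vector], old_codebook: Sequence[Vector],
                    epsilon: float = 1e-3) -> List[Vector]:
    """Split every code vector in two, then refine the doubled codebook."""
    new_codebook: List[Vector] = []
    for code_vector in old_codebook:
        new_codebook.append(list(code_vector))
        new_codebook.append([x + epsilon for x in code_vector])
    return lloyd(new_codebook, training)


# Usage: grow a one-vector codebook into two code vectors.
training = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
start = [centroid(training)]
print(linde_buzo_gray(training, start))  # [[0.0, 0.5], [10.0, 10.5]]
```

Calling `linde_buzo_gray` repeatedly on its own output doubles the codebook each time, which is how larger codebooks are built from smaller ones.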
Dr.Fill
Dr.Fill is a computer program that solves American-style crossword puzzles. It was developed by Matt Ginsberg and described by Ginsberg in an article in the Journal of Artificial Intelligence Research. Ginsberg claims in that article that Dr.Fill is among the top fifty crossword solvers in the world. == History == Dr.Fill participated in the 2012 American Crossword Puzzle Tournament, finishing 141st of approximately 650 entrants with a total score of just over 10,000 points. The appearance led to a variety of descriptions of Dr.Fill in the popular press, including The Economist, the San Francisco Chronicle and Gizmodo. A description of Dr.Fill appeared on the front page of the March 17, 2012 New York Times. Dr.Fill's score in 2013 improved to 10,550, which would have earned it 92nd place. Videos of the program solving the problems from the tournament are available on YouTube. The score in 2014 improved further to 10,790, which would have tied for 67th place. A video of the program solving the first six puzzles from that tournament, together with a talk given by Ginsberg describing its performance, can be found on YouTube. Dr.Fill has largely continued to improve since the 2014 event. In 2015, it scored 10,920 points and finished in 55th place. In 2016, it scored 11,205 points and finished in 41st place. In 2017, it scored 11,795 and finished in 11th place. In 2018, it scored 10,740 points, dropping to 78th place. Dr.Fill returned to "form" in 2019, once again scoring 11,795 and finishing in 14th place. The 2020 ACPT was cancelled due to COVID-19, and Dr.Fill participated as a non-competitor in the Boswords tournament instead. The program outperformed the humans, scoring 11,218 points (fast solves with a total of one mistake) while the best scoring human scored 10,994 points (slower solves but no mistakes). The 2021 ACPT was virtual, again due to COVID-19. 
The Dr.Fill effort was joined by the Berkeley NLP Group, creating a hybrid system named Berkeley Crossword Solver, and Dr.Fill won the main event, scoring 12,825 points, ahead of Erik Agard, the highest scoring human, with 12,810 points. The tournament itself was won by Tyler Hinman (12,760 points), who completed the championship puzzle perfectly in three minutes. Dr.Fill also completed that puzzle perfectly, but in 49 seconds. After Dr.Fill's win in the main event, Ginsberg announced on August 8, 2021, that both he and Dr.Fill would be retiring from crosswords. == Algorithm == As described by Ginsberg, Dr.Fill works by converting a crossword to a weighted constraint satisfaction problem and then attempting to maximize the probability that the fill is correct. Probabilities for individual words or phrases in the puzzle are computed using relatively simple statistical techniques based on features such as previous appearances of the clue, number of Google hits for the fill, and so on. In doing this, Dr.Fill is attempting to solve a problem similar to that tackled by the Jeopardy!-playing program Watson. However, Dr.Fill runs on a laptop instead of a supercomputer, and Ginsberg remarks that Watson is far more effective than Dr.Fill at solving this portion of the problem. Instead of computational horsepower, Dr.Fill relies on the constraints provided by crossing words to refine its answers. A variety of techniques from artificial intelligence are applied to attempt to find the most likely fill. These include a small amount of lookahead, limited discrepancy search, and postprocessing. Ginsberg remarks that postprocessing was chosen over branch and bound because the two techniques are mutually incompatible and postprocessing was found to be more effective in this domain.
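
The idea of treating a crossword as a weighted constraint satisfaction problem can be illustrated with a toy sketch. Everything in it is hypothetical: the three slots, the candidate words, and their probabilities are invented, the score of a fill is assumed to be the product of independent per-word probabilities, and exhaustive enumeration stands in for the lookahead, limited discrepancy search, and postprocessing that Dr.Fill is described as using. It shows only how crossing constraints can override the locally most probable answer.

```python
from itertools import product
from math import exp, log

# Hypothetical candidate answers per slot, with invented probabilities.
CANDIDATES = {
    "1-Across": {"BAT": 0.6, "CAT": 0.4},
    "1-Down": {"COW": 0.9, "BOW": 0.1},
    "2-Down": {"TAN": 0.7, "TEN": 0.3},
}

# Each crossing says: letter i of slot a must equal letter j of slot b.
CROSSINGS = [
    ("1-Across", 0, "1-Down", 0),
    ("1-Across", 2, "2-Down", 0),
]


def is_consistent(fill, crossings):
    """True if every pair of crossing words agrees on the shared letter."""
    return all(fill[a][i] == fill[b][j] for a, i, b, j in crossings)


def most_probable_fill(candidates, crossings):
    """Brute-force search for the consistent fill with the highest
    log-probability (sum of per-word log-probabilities)."""
    slots = list(candidates)
    best_fill, best_score = None, float("-inf")
    for words in product(*(candidates[slot] for slot in slots)):
        fill = dict(zip(slots, words))
        if not is_consistent(fill, crossings):
            continue
        score = sum(log(candidates[slot][word]) for slot, word in fill.items())
        if score > best_score:
            best_fill, best_score = fill, score
    return best_fill, best_score


fill, score = most_probable_fill(CANDIDATES, CROSSINGS)
print(fill, round(exp(score), 3))
```

In this example "BAT" is the most probable answer for 1-Across on its own, but the strong candidate "COW" at 1-Down makes "CAT" the better choice for the puzzle as a whole. Brute force is only workable for a handful of slots, which is why a real solver needs a guided search.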
Alexey Chervonenkis
Alexey Yakovlevich Chervonenkis (Russian: Алексей Яковлевич Червоненкис; 7 September 1938 – 22 September 2014) was a Soviet and Russian mathematician. Along with Vladimir Vapnik, he was one of the main developers of the Vapnik–Chervonenkis theory, also known as the "fundamental theory of learning", an important part of computational learning theory. Chervonenkis held joint appointments with the Russian Academy of Sciences and Royal Holloway, University of London. Chervonenkis got lost in Losiny Ostrov National Park on 22 September 2014 and was later found dead during a search operation near Mytishchi, a suburb of Moscow. He had died of hypothermia.
AFNLP
AFNLP (Asian Federation of Natural Language Processing Associations) is the organization that coordinates natural language processing related activities and events in the Asia-Pacific region.

== Foundation ==
AFNLP was founded on 4 October 2000.

== Member Associations ==
ALTA: Australasian Language Technology Association
ANLP (Japan): Association of Natural Language Processing
ROCLING (Taiwan): ROC Computational Linguistics Society
SIG-KLC (Korea): SIG-Korean Language Computing of Korea Information Science Society

== Existing Asian Initiatives ==
NLPRS: Natural Language Processing Pacific Rim Symposium
IRAL: International Workshop on Information Retrieval with Asian Languages
PACLING: Pacific Association for Computational Linguistics
PACLIC: Pacific Asia Conference on Language, Information and Computation
PRICAI: Pacific Rim International Conference on AI
ICCPOL: International Conference on Computer Processing of Oriental Languages
ROCLING: Research on Computational Linguistics Conference

== Conferences ==
IJCNLP-04: The 1st International Joint Conference on Natural Language Processing in Hainan Island, China
IJCNLP-05: The 2nd International Joint Conference on Natural Language Processing in Jeju Island, Korea
IJCNLP-08: The 3rd International Joint Conference on Natural Language Processing in Hyderabad, India
ACL-IJCNLP-2009: Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics (ACL) and 4th International Joint Conference on Natural Language Processing (IJCNLP) in Singapore
IJCNLP-11: The 5th International Joint Conference on Natural Language Processing in Chiang Mai, Thailand
International Journal on Artificial Intelligence Tools
The International Journal on Artificial Intelligence Tools was founded in 1992 and is published by World Scientific. It covers research on artificial intelligence (AI) tools, including new architectures, languages and algorithms. Topics include AI in Bioinformatics, Cognitive Informatics, Knowledge-Based/Expert Systems and Object-Oriented Programming for AI.

== Abstracting and indexing ==
The journal is abstracted and indexed in:
Inspec
Science Citation Index Expanded
ISI Alerting Services
CompuMath Citation Index
Current Contents/Engineering, Computing, and Technology