AI Assistant Jetbrains Plugin

AI Assistant Jetbrains Plugin — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

Speech recognition

Speech recognition (automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT)) is a sub-field of computational linguistics concerned with methods and technologies that translate spoken language into text or other interpretable forms. Speech recognition applications include voice user interfaces, where the user speaks to a device, which "listens" and processes the audio. Common voice applications include interpreting commands for calling, call routing, home automation, and aircraft control. These applications are called direct voice input. Productivity applications include searching audio recordings, creating transcripts, and dictation. Speech recognition can be used to analyse speaker characteristics, such as identifying native language using pronunciation assessment. Voice recognition (speaker identification) refers to identifying the speaker, rather than speech contents. Recognizing the speaker can simplify the task of translating speech in systems trained on a specific person's voice. It can also be used to authenticate the speaker as part of a security process. == History == Applications for speech recognition developed over many decades, with progress accelerated due to advances in deep learning and the use of big data. These advances are reflected in an increase in academic papers, and greater system adoption. Key areas of growth include vocabulary size, more accurate recognition for unfamiliar speakers (speaker independence), and faster processing speed. === Pre-1970 === 1952 – Bell Labs researchers, Stephen Balashek, R. Biddulph, and K. H. Davis, built Audrey for single-speaker digit recognition. Their system located the formants in the power spectrum of each utterance. 1960 – Gunnar Fant developed and published the source–filter model of speech production. 1962 – IBM's 16-word "Shoebox" machine's speech recognition debuted at the 1962 World's Fair. 1966 – Linear predictive coding, a speech coding method, was proposed by Fumitada Itakura of Nagoya University and Shuzo Saito of Nippon Telegraph and Telephone. 1969 – Funding at Bell Labs came to a halt for several years after the company's head engineer, John R. Pierce, wrote an open letter criticizing speech recognition research. This defunding lasted until Pierce retired and James L. Flanagan took over. Raj Reddy was the first person to work on continuous speech recognition, as a graduate student at Stanford University in the late 1960s. Previous systems required users to pause after each word. Reddy's system issued spoken commands for playing chess. Around this time, Soviet researchers invented the dynamic time warping (DTW) algorithm and used it to create a recognizer capable of operating on a 200-word vocabulary. DTW processed speech by dividing it into short frames (e.g. 10 ms segments) and treating each frame as a unit. Speaker independence, however, remained unsolved. === 1970–1990 === 1971 – DARPA funded a five-year speech recognition research project, Speech Understanding Research, seeking a minimum vocabulary size of 1,000 words. The project considered speech understanding a key to achieving progress in speech recognition, which was later disproved. BBN, IBM, Carnegie Mellon (CMU), and Stanford Research Institute participated. 1972 – The IEEE Acoustics, Speech, and Signal Processing group held a conference in Newton, Massachusetts. 1976 – The first ICASSP was held in Philadelphia, which became a major venue for publishing on speech recognition. During the late 1960s, Leonard Baum developed the mathematics of Markov chains at the Institute for Defense Analysis. A decade later, at CMU, Raj Reddy's students James Baker and Janet M. Baker began using the hidden Markov model (HMM) for speech recognition. James Baker had learned about HMMs while at the Institute for Defense Analysis. HMMs enabled researchers to combine sources of knowledge, such as acoustics, language, and syntax, in a unified probabilistic model. By the mid-1980s, Fred Jelinek's team at IBM created a voice-activated typewriter called Tangora, which could handle a 20,000-word vocabulary. Jelinek's statistical approach placed less emphasis on emulating human brain processes in favor of statistical modelling. (Jelinek's group independently discovered the application of HMMs to speech.) This was controversial among linguists since HMMs are too simplistic to account for many features of human languages. However, the HMM proved to be a highly useful way for modelling speech and replaced dynamic time warping as the dominant speech recognition algorithm in the 1980s. 1982 – Dragon Systems, founded by James and Janet M. Baker, was one of IBM's few competitors. === Practical speech recognition === The 1980s also saw the introduction of the n-gram language model. 1987 – The back-off model enabled language models to use multiple-length n-grams, and CSELT used HMM to recognize languages (in software and hardware, e.g. RIPAC). At the end of the DARPA program in 1976, the best computer available to researchers was the PDP-10 with 4 MB of RAM. It could take up to 100 minutes to decode 30 seconds of speech. Practical products included: 1984 – the Apricot Portable was released with up to 4096 words support, of which only 64 could be held in RAM at a time. 1987 – a recognizer from Kurzweil Applied Intelligence 1990 – Dragon Dictate, a consumer product released in 1990. AT&T deployed the Voice Recognition Call Processing service in 1992 to route telephone calls without a human operator. The technology was developed by Lawrence Rabiner and others at Bell Labs. By the early 1990s, the vocabulary of the typical commercial speech recognition system had exceeded the average human vocabulary. Reddy's former student, Xuedong Huang, developed the Sphinx-II system at CMU. Sphinx-II was the first to do speaker-independent, large vocabulary, continuous speech recognition, and it won DARPA's 1992 evaluation. Handling continuous speech with a large vocabulary was a major milestone. Huang later founded the speech recognition group at Microsoft in 1993. Reddy's student Kai-Fu Lee joined Apple, where, in 1992, he helped develop the Casper speech interface prototype. Lernout & Hauspie, a Belgium-based speech recognition company, acquired other companies, including Kurzweil Applied Intelligence in 1997 and Dragon Systems in 2000. L&H was used in Windows XP. L&H was an industry leader until an accounting scandal destroyed it in 2001. L&H speech technology was bought by ScanSoft, which became Nuance in 2005. Apple licensed Nuance software for its digital assistant Siri. ==== 2000s ==== In the 2000s, DARPA sponsored two speech recognition programs: Effective Affordable Reusable Speech-to-Text (EARS) in 2002, followed by Global Autonomous Language Exploitation (GALE) in 2005. Four teams participated in EARS: IBM; a team led by BBN with LIMSI and the University of Pittsburgh; Cambridge University; and a team composed of ICSI, SRI, and the University of Washington. EARS funded the collection of the Switchboard telephone speech corpus, which contained 260 hours of recorded conversations from over 500 speakers. The GALE program focused on Arabic and Mandarin broadcast news. Google's first effort at speech recognition came in 2007 after recruiting Nuance researchers. Its first product, GOOG-411, was a telephone-based directory service. Since at least 2006, the U.S. National Security Agency has employed keyword spotting, allowing analysts to index large volumes of recorded conversations and identify speech containing "interesting" keywords. Other government research programs focused on intelligence applications, such as DARPA's EARS program and IARPA's Babel program. In the early 2000s, speech recognition was dominated by hidden Markov models combined with feed-forward artificial neural networks (ANN). Later, speech recognition was taken over by long short-term memory (LSTM), a recurrent neural network (RNN) published by Sepp Hochreiter & Jürgen Schmidhuber in 1997. LSTM RNNs avoid the vanishing gradient problem and can learn "Very Deep Learning" tasks that require memories of events that happened thousands of discrete time steps earlier, which is important for speech. Around 2007, LSTMs trained with Connectionist Temporal Classification (CTC) began to outperform. In 2015, Google reported a 49 percent error‑rate reduction in its speech recognition via CTC‑trained LSTM. Transformers, a type of neural network based solely on attention, were adopted in computer vision and language modelling, and then to speech recognition. Deep feed-forward (non-recurrent) networks for acoustic modelling were introduced in 2009 by Geoffrey Hinton and his students at the University of Toronto, and by Li Deng and colleagues at Microsoft Research. In contrast to the prioer incremental improvements, deep learning decreased error rates by 30%. Both shallow and deep forms (e.g., recurrent nets) of ANNs had been explored since the 1980s. Howev
Read more →
Polyworld

Polyworld is a cross-platform (Linux, Mac OS X) program written by Larry Yaeger to evolve Artificial Intelligence through natural selection and evolutionary algorithms. It uses the Qt graphics toolkit and OpenGL to display a graphical environment in which a population of trapezoid agents search for food, mate, have offspring, and prey on each other. The population is typically only in the hundreds, as each individual is rather complex and the environment consumes considerable computer resources. The graphical environment is necessary since the individuals actually move around the 2-D plane and must be able to "see." Since some basic abilities, like eating carcasses or randomly generated food, seeing other individuals, mating or fighting with them, etc., are possible, a number of interesting behaviours have been observed to spontaneously arise after prolonged evolution, such as cannibalism, predators and prey, and mimicry. Each individual makes decisions based on a neural net using Hebbian learning; the neural net is derived from each individual's genome. The genome does not merely specify the wiring of the neural nets, but also determines their size, speed, color, mutation rate and a number of other factors. The genome is randomly mutated at a set probability, which are also changed in descendant organisms.
Read more →
Whisper (speech recognition system)

Whisper is a machine learning model for speech recognition and transcription, created by OpenAI and first released as open-source software in September 2022. It is capable of transcribing speech in English and multiple other languages, and can translate several non-English languages into English. Whisper is a weakly-supervised deep learning acoustic model, made using an encoder-decoder transformer architecture. OpenAI claims that the combination of different training data and post-training filtering used in its development has led to improved recognition of accents, background noise, and jargon compared to previous approaches. While the model does not outperform larger, more specialized models and still experiences AI hallucination, it has been showed to be useful for general sound recognition and has many applications across different industries. == Background == Speech recognition has had a long history in research; the first approaches made use of statistical methods, such as dynamic time warping, and later hidden Markov models. At around the 2010s, deep neural network approaches became more common for speech recognition models, which were enabled by the availability of large datasets ("big data") and increased computational performance. Early approaches to deep learning in speech recognition included convolutional neural networks, which were limited due to their inability to capture sequential data, which later led to developments of Seq2seq approaches, which include recurrent neural networks, which made use of long short-term memory. Transformers, introduced in 2017 by Google, displaced many prior state-of-the-art approaches across a wide range in machine learning, and started becoming the core neural architecture in fields such as language modeling and computer vision. Weakly-supervised approaches to training acoustic models were recognized in the early 2020s as promising for speech recognition approaches using deep neural networks. According to a NYT report, in 2021 OpenAI believed they exhausted sources of higher-quality data to train their large language models and decided to complement scraped web text with transcriptions of YouTube videos and podcasts, and developed Whisper to solve this task. Whisper Large V2 was released on December 8, 2022, followed by Whisper Large V3 being released in November 2023, during the OpenAI Dev Day. In March 2025, OpenAI released new transcription models based on GPT-4o and GPT-4o mini, both of which have lower error rates than Whisper. == Architecture == The Whisper architecture is based on an encoder-decoder transformer. Input audio is resampled to 16,000 Hertz (Hz) and converted to an 80-channel Log-magnitude Mel spectrogram using 25 ms windows with a 10 ms stride. The spectrogram is then normalized to a [-1, 1] range with near-zero mean. The encoder takes this Mel spectrogram as input and processes it. It first passes through two convolutional layers. Sinusoidal positional embeddings are added. It is then processed by a series of Transformer encoder blocks (with pre-activation residual connections). The encoder's output is layer normalized. The decoder is a standard transformer decoder. It has the same width and Transformer blocks as the encoder. It uses learned positional embeddings and tied input-output token representations (using the same weight matrix for both the input and output embeddings). It uses a byte-pair encoding tokenizer, of the same kind as used in GPT-2. English-only models use the GPT-2 vocabulary, while multilingual models employ a re-trained multilingual vocabulary with the same number of words. Special tokens are used to allow the decoder to perform multiple tasks: Tokens that denote language (one unique token per language). Tokens that specify task (<|transcribe|> or <|translate|>). Tokens that specify if no timestamps are present (<|notimestamps|>). If the token is not present, then the decoder predicts timestamps relative to the segment, and quantized to 20 ms intervals. <|nospeech|> for voice activity detection. <|startoftranscript|>, and <|endoftranscript|> . Any text that appears before <|startoftranscript|> is not generated by the decoder, but given to the decoder as context. Loss is only computed over non-contextual parts of the sequence, i.e. tokens between these two special tokens. == Training data == The training dataset consists of 680,000 hours of labeled audio-transcript pairs sourced from the internet using semi-supervised learning. This includes 117,000 hours in 96 non-English languages and 125,000 hours of X→English translation data, where X stands for any non-English language. Preprocessing involved standardization of transcripts, filtering to remove machine-generated transcripts using heuristics (e.g., punctuation, capitalization), language identification and matching with transcripts, fuzzy deduplication, and deduplication with evaluation datasets to avoid data contamination. Speechless segments were also included to allow voice activity detection training. For the files still remaining after the filtering process, audio files were then broken into 30-second segments paired with the subset of the transcript that occurs within that time. If this predicted spoken language differed from the language of the text transcript associated with the audio, that audio-transcript pair was not used for training the speech recognition models, but instead for training translation. The model was trained using the AdamW optimizer with gradient norm clipping and a linear learning rate decay with warmup, with batch size 256 segments. Training proceeded for 1 million updates (approximately 2-3 epochs). No data augmentation or regularization, except for the Large V2 model, which used SpecAugment, Stochastic Depth, and BPE Dropout. The training used data parallelism with float16, dynamic loss scaling, and activation checkpointing. === Post-training filtering === After training the first model, researchers ran it on different subsets of the training data, each representing a distinct source. Data sources were ranked by a combination of their error rate and size. Manual inspection of the top-ranked sources (high error, large size) helped determine if the source was low quality (e.g., partial transcriptions, inaccurate alignment). After training, it was fine-tuned to suppress the prediction of speaker names and low-quality sources were then removed. == Capacity == While Whisper does not outperform models which specialize in the LibriSpeech dataset, when tested across many datasets, it is more robust and makes 55.2% fewer errors than other models. Whisper has a differing error rate with respect to transcribing different languages, with a higher word error rate in languages not well-represented in the training data. The authors found that multi-task learning improved overall performance compared to models specialized to one task. They conjectured that the best Whisper model trained is still underfitting the dataset, and larger models and longer training can result in better models. Third-party evaluations have found varying levels of AI hallucination. A study of transcripts of public meetings found hallucinations in eight out of every 10 transcripts, while an engineer discovered hallucinations in "about half" of 100 hours of transcriptions and a developer identified them in "nearly every one" of 26,000 transcripts. A study of 13,140 short audio segments (averaging 10 seconds) found 187 hallucinations (1.4%), 38% of which generated text that could be harmful because it inserted false references to things like race, non-existent medications, or violent events that were not in the audio. == Applications == The model has been used as the base for many applications, such as a unified model for speech recognition and more general sound recognition. Whisper has also been integrated into the workflow of biomedical research. In 2025, a study on Alzheimer's disease detection used the model to transcribe spontaneous speech recordings. The transcripts that were generated by the model were combined with LLM vector embeddings and traditional classifiers to help classify the patients' health. Another application is when OVALYTICS incorporated Whisper to transcribe YouTube videos and automate content moderation systems, which improved its detection of offensive content. The model has also been used in academic libraries and cultral heritage institutions to generate transcripts and captions for their digitized audiovisual collections. In a 2025 case study, Emory University Libraries found that Whisper reduced the labor used in transcription by around 30-35%, shifting work from text creation to text correction. However, human review is still necessary to make sure accuracy, formatting, and accessibility are all standard.
Read more →
WebCrow

The WebCrow is a research project carried out at the Information Engineering Department of the University of Siena with the purpose of automatically solving crosswords. == The Project == The scientific relevance of the project can be understood considering that cracking crosswords requires human-level knowledge. Unlike chess and related games and there is no closed world configuration space. A first nucleus of technology, such as search engines, information retrieval, and machine learning techniques enable computers to enfold with semantics real-life concepts. The project is based on a software system whose major assumption is to attack crosswords making use of the Web as its primary source of knowledge. WebCrow is very fast and often thrashes human challengers in competitions, especially on multi language crossword schemes. A distinct feature of the WebCrow software system is to combine properly natural language processing (NLP) techniques, the Google web search engine, and constraint satisfaction algorithms from artificial intelligence to acquire knowledge and to fill the schema. The most important component of WebCrow is the Web Search Module (WSM), which implements a domain specific web based question answering algorithm. The way WebCrow approaches crosswords solving is quite different with respect to humans: Whereas we tend to first answer clues we are sure of and then proceed filling the schema by exploiting the already answered clues as hints, WebCrow uses two clearly distinct stages. In the first one, it processes all the clues and tries to answer them all: For each clue it finds many possible candidates and sorts them according to complex ranking models mainly based on a probability criteria. In the second stage, WebCrow uses constraint satisfaction algorithms to fill the grid with the overall most likely combination of clue answers. In order to interact with Google, first of all, WebCrow needs to compose queries on the basis of the given clues. This is done by query expansion, whose purpose is to convert the clue into a query expressed by a simplified and more appropriate language for Google. The retrieved documents are parsed so as to extract a list of word candidates that are congruent with the crossword length constraints. Crosswords can hardly be faced by using encyclopedic knowledge only, since many clues are wordplays or are otherwise purposefully very ambiguous. This enigmatic component of crosswords is faced by a massive use of database of solved crosswords, and by automatic reasoning on a properly organized knowledge base of wired rules. Last but not the least, the final constraint satisfaction step is very effective to fill the correct candidate, even though, unlike humans, the system can not rely on very high confidence on the correctness of the answer. == Competitions == WebCrow speed and effectiveness has been tested many times in man-machine competitions on Italian, English and multi-language crosswords The outcome of the tests is that WebCrow can successfully compete with average human players on single language schemes and reaches expert level performance in multi-language crosswords. However, WebCrow has not reached expert level in single-language crosswords, yet. === ECAI-06 Competition === On August 30, 2006, at the European Conference on Artificial Intelligence (ECAI2006), 25 conference attendees and 53 internet connected crosswords lovers, competed with WebCrow in an official challenge organized within the conference program. The challenge consisted in 5 different crosswords (2 in Italian, 2 in English and one multi-language in Italian and English) and 15 minutes were assigned for each crossword. WebCrow ranked 21 out of 74 participants in the Italian competition, and won both the bilingual and English competitions. === Other Competitions === Several competitions have been held in Florence, Italy within the Creativity Festival in December 2006, and another official conference competition took place in Hyderabad, India in January 2007, within the International Conference of Artificial Intelligence, where it ranked second out of 25 participants.
Read more →
Mistral Vibe

Mistral Vibe or Vibe (Le Chat until May 2026), is a chatbot that uses generative artificial intelligence developed in France by Mistral AI. Mistral Vibe is available in iOS and Android. Its services are operated on a freemium model. == History == In February 2024, Mistral AI released Le Chat. In January 2025, Mistral AI made a content deal with Agence France-Presse (AFP) that lets Le Chat query AFP's entire archive dating back to 1983. On 6 February 2025, a mobile app for Le Chat was released for iOS and Android, and a subscription tier, Pro, was introduced at a cost of $14.99 per month. In July 2025, Mistral AI released Voxtral, an open-source language model that understands and generates audio. Mistral introduced a voice mode for chatting that uses Voxtral, and projects, which allows grouping chats and files. In September 2025, Le Chat introduced the capability to remember previous conversations. In May 2026, Mistral AI announced the rebrand from Le Chat to Mistral Vibe and new features were introduced at the same time.
Read more →
AI Safety Summit 2023

The AI Safety Summit 2023 was an international conference on the safety and regulation of artificial intelligence. Organized by the British government, it was held in November 2023 at Bletchley Park, Milton Keynes, England. The event was the first ever global summit on artificial intelligence. The event led to the release of the Bletchley Declaration, which focused on "identifying AI safety risks of shared concern" and "building respective risk-based policies" to "ensure that the benefits of the technology can be harnessed responsibly for good and for all." == Background == The prime minister of the United Kingdom at the time, Rishi Sunak, made AI one of the priorities of his government, announcing that the UK would host a global AI Safety conference in autumn 2023. == Venue == Bletchley Park was a World War II codebreaking facility established by the British government on the site of a Victorian manor and is in the British city of Milton Keynes. It has played an important role in the history of computing, with some of the first modern computers being built at the facility. == Outcomes == 28 countries at the summit, including the United States, China, Australia, and the European Union, have issued an agreement known as the Bletchley Declaration, calling for international co-operation to manage the challenges and risks of artificial intelligence. The Bletchley Declaration affirms that AI should be designed, developed, deployed, and used in a manner that is safe, human-centric, trustworthy and responsible. Emphasis has been placed on regulating "Frontier AI", a term for the latest and most powerful AI systems. Concerns that have been raised at the summit include the potential use of AI for terrorism, criminal activity, and warfare, as well as existential risk posed to humanity as a whole.The president of the United States, Joe Biden, signed an executive order requiring AI developers to share safety results with the US government. The US government also announced the creation of an American AI Safety Institute, as part of the National Institute of Standards and Technology. The tech entrepreneur Elon Musk and Sunak did a live interview on AI safety on 2 November on X. == Notable attendees == The following individuals attended the summit: Rishi Sunak, Prime Minister of the United Kingdom Kamala Harris, Vice President of the United States Charles III, King of the United Kingdom (attending virtually) Elon Musk, CEO of Tesla, owner of X, SpaceX, Neuralink, and xAI Giorgia Meloni, Prime Minister of Italy Ursula von der Leyen, President of the European Commission Sam Altman, CEO of OpenAI Nick Clegg, former British politician and president of global affairs at Meta Platforms Mustafa Suleyman, co-founder of DeepMind Michelle Donelan, UK secretary of state for Science, Innovation and Technology Věra Jourová, the European Commission’s vice-president for Values and Transparency Gina Raimondo, United States secretary of commerce Wu Zhaohui, Chinese vice-minister of science and technology == Global AI Summit series ==
Read more →
Sunspring

Sunspring is a 2016 experimental science fiction short film entirely written by an artificial intelligence bot using neural networks. It was conceived by BAFTA-nominated filmmaker Oscar Sharp and NYU AI researcher Ross Goodwin and produced by film production company, End Cue along with Allison Friedman and Andrew Swett. It stars Thomas Middleditch, Elisabeth Grey, and Humphrey Ker as three people, namely H, H2, and C, living in a future world and eventually connecting with each other through a love triangle. The script of the film was authored by a recurrent neural network called long short-term memory (LSTM) by an AI bot named Benjamin. Originally made for the Sci-Fi-London film festival's 48hr Challenge, it was released online by technology news website Ars Technica on 9 June 2016. == Premise == Sunspring narrates the story of three people - H (Middleditch), H2 (Grey), and C (Ker) - set in a futuristic world and entangled with murder and love. == Cast == Thomas Middleditch as H Elisabeth Grey as H2 Humphrey Ker as C == Production == Oscar Sharp originally created the film for the 48hr Film Challenge contest of Sci-Fi-London, a film festival which focuses on science fiction. For the challenge, contestants are given a set of prompts (mostly props and lines) that have to appear in a movie they make over the next two days. It eventually contested in the festival and was nominated among the final top ten films Sharp collaborated with his longtime associate Ross Goodwin, an AI researcher in New York University to create the AI bot, which was initially called Jetson. The bot, which later came to call itself Benjamin, wrote the screenplay including stage directions and dialog. The garbled script was then interpreted by Sharp who directed the actors to construe the plot points themselves and enact the play. According to Ars Technica, the final plot turned out to be a tale of romance and murder, set in a dark future world. === Benjamin, the automatic screenwriter === Called the world's first automatic screenwriter, Benjamin is a self-improving LSTM RNN machine intelligence trained on human screenplays conceived by Goodwin and Sharp. It was trained to write the screenplay by feeding it with a corpus of dozens of sci-fi screenplays found online—mostly movies from the 1980s and 90s. == Music == The film contains a song from Brooklyn-based electro-acoustic duo Tiger and Man, with lyrics written by Benjamin using a database of 30,000 folk songs. As well as a score written by composer Andrew Orkin. == Reception == CNet called it "a beautiful, bizarre sci-fi novelty." Critic Amanda Kooser said, "...probably won't start a rush for replacing human screenwriters with machines. Some day, neural networks may get better at imitating the art of coherent storytelling, but we're not there yet. That doesn't mean "Sunspring" isn't entertaining or worthy of viewing. It is. It's a thought experiment come to life, a novelty." As of April 2019, it has surpassed 1 million views on YouTube.
Read more →
Speech-generating device

Speech-generating devices (SGDs), also known as voice output communication aids, are electronic augmentative and alternative communication (AAC) systems used to supplement or replace speech or writing for individuals with severe speech impairments, enabling them to verbally communicate. SGDs are important for people who have limited means of interacting verbally, as they allow individuals to become active participants in communication interactions. They are particularly helpful for patients with amyotrophic lateral sclerosis (ALS) but recently have been used for children with predicted speech deficiencies. There are several input and display methods for users of varying abilities to make use of SGDs. Some SGDs have multiple pages of symbols to accommodate a large number of utterances, and thus only a portion of the symbols available are visible at any one time, with the communicator navigating the various pages. Speech-generating devices can produce electronic voice output by using digitized recordings of natural speech or through speech synthesis—which may carry less emotional information but can permit the user to speak novel messages. The content, organization, and updating of the vocabulary on an SGD is influenced by a number of factors, such as the user's needs and the contexts that the device will be used in. The development of techniques to improve the available vocabulary and rate of speech production is an active research area. Vocabulary items should be of high interest to the user, be frequently applicable, have a range of meanings, and be pragmatic in functionality. There are multiple methods of accessing messages on devices: directly or indirectly, or using specialized access devices—although the specific access method will depend on the skills and abilities of the user. SGD output is typically much slower than speech, although rate enhancement strategies can increase the user's rate of output, resulting in enhanced efficiency of communication. The first known SGD was prototyped in the mid-1970s, and rapid progress in hardware and software development has meant that SGD capabilities can now be integrated into devices like smartphones. Notable users of SGDs include Stephen Hawking, Roger Ebert, Tony Proudfoot, and Pete Frates (founder of the ALS Ice Bucket Challenge). Speech-generating systems may be dedicated devices developed solely for AAC, or non-dedicated devices such as computers running additional software to allow them to function as AAC devices. == History == SGDs have their roots in early electronic communication aids. The first such aid was a sip-and-puff typewriter controller named the patient-operated selector mechanism (Naman) prototyped by Reg Maling in the United Kingdom in 1960. POSSUM scanned through a set of symbols on an illuminated display. Researchers at Delft University in the Netherlands created the lightspot-operated typewriter (LOT) in 1970, which made use of small movements of the head to point a small spot of light at a matrix of characters, each equipped with a photoelectric cell. Although it was commercially unsuccessful, the LOT was well received by its users. In 1966, Barry Romich, a freshman engineering student at Case Western Reserve University, and Ed Prentke, an engineer at Highland View Hospital in Cleveland, Ohio, formed a partnership, creating the Prentke Romich Company. In 1969, the company produced its first communication device, a typing system based on a discarded Teletype machine. In 1979, Mark Dahmke developed software for a vocal communication aid program using the Computalker CT-1 analog speech synthesizer with a microcomputer. The software utilized phonemes to generate speech, assisting individuals with communication impairments in constructing words and sentences. Dahmke's work contributed to the advancement of assistive technology for people with disabilities. Notably, he designed the "Vocabulary Management System" for Bill Rush, a student with cerebral palsy. This early speech synthesis technology facilitated improved communication for Rush and was featured in a 1980 issue of LIFE Magazine. Dahmke's contributions have influenced the development of augmentative and alternative communication (AAC) technologies. During the 1970s and early 1980s, several other companies emerged that have since become prominent manufacturers of SGDs. Toby Churchill founded Toby Churchill Ltd in 1973, after losing his speech following encephalitis. In the US, Dynavox (then known as Sentient Systems Technology) grew out of a student project at Carnegie-Mellon University, created in 1982 to help a young woman with cerebral palsy to communicate. Beginning in the 1980s, improvements in technology led to a greatly increased number, variety, and performance of commercially available communication devices, and a reduction in their size and price. Alternative methods of access such as Target Scanning (also known as eye pointing) calibrate the movement of a user's eyes to direct an SGD to produce the desired speech. Scanning, in which alternatives are presented to the user sequentially, became available on communication devices. Speech output possibilities included both digitized and synthesized speech. Rapid progress in hardware and software development continued, including projects funded by the European Community. The first commercially available dynamic screen speech-generating devices were developed in the 1990s. Software was developed that allowed the computer-based production of communication boards. High-tech devices have continued to become smaller and lighter, while increasing accessibility and capability; communication devices can be accessed using eye-tracking systems, perform as a computer for word-processing and Internet use, and as an environmental control device for independent access to other equipment such as TV, radio and telephones. Stephen Hawking came to be associated with the unique voice of his particular synthesis equipment. Hawking was unable to speak due to a combination of disabilities caused by ALS, and an emergency tracheotomy. In the past 20 or so years SGD have gained popularity amongst young children with speech deficiencies, such as autism, Down syndrome, and predicted brain damage due to surgery. Starting in the early 2000s, specialists saw the benefit of using SGDs not only for adults but for children, as well. Neuro-linguists found that SGDs were just as effective in helping children who were at risk for temporary language deficits after undergoing brain surgery as it is for patients with ALS. In particular, digitized SGDs have been used as communication aids for pediatric patients during the recovery process. == Access methods == There are many methods of accessing messages on devices: directly, indirectly, and with specialized access devices. Direct access methods involve physical contact with the system, by using a keyboard or a touch screen. Users accessing SGDs indirectly and through specialized devices must manipulate an object in order to access the system, such as maneuvering a joystick, head mouse, optical head pointer, light pointer, infrared pointer, or switch access scanner. The specific access method will depend on the skills and abilities of the user. With direct selection a body part, pointer, adapted mouse, joystick, or eye tracking could be used, whereas switch access scanning is often used for indirect selection. Unlike direct selection (e.g., typing on a keyboard, touching a screen), users of Target Scanning can only make selections when the scanning indicator (or cursor) of the electronic device is on the desired choice. Those who are unable to point typically calibrate their eyes to use eye gaze as a way to point and blocking as a way to select desired words and phrases. The speed and pattern of scanning, as well as the way items are selected, are individualized to the physical, visual and cognitive capabilities of the user. == Message construction == Augmentative and alternative communication is typically much slower than speech, with users generally producing 8–10 words per minute. Rate enhancement strategies can increase the user's rate of output to around 12–15 words per minute, and as a result enhance the efficiency of communication. In any given SGD there may be a large number of vocal expressions that facilitate efficient and effective communication, including greetings, expressing desires, and asking questions. Some SGDs have multiple pages of symbols to accommodate a large number of vocal expressions, and thus only a portion of the symbols available are visible at any one time, with the communicator navigating the various pages. Speech-generating devices generally display a set of selections either using a dynamically changing screen, or a fixed display. There are two main options for increasing the rate of communication for an SGD: encoding, and prediction. Encoding permits a user to produce a word, sentence or phrase using only on
Read more →
Learning curve (machine learning)

In machine learning (ML), a learning curve (or training curve) is a graphical representation that shows how a model's performance on a training set (and usually a validation set) changes with the number of training iterations (epochs) or the amount of training data. Typically, the number of training epochs or training set size is plotted on the x-axis, and the value of the loss function (and possibly some other metric such as the cross-validation score) on the y-axis. Synonyms include error curve, experience curve, improvement curve and generalization curve. More abstractly, learning curves plot the difference between learning effort and predictive performance, where "learning effort" usually means the number of training samples, and "predictive performance" means accuracy on testing samples. Learning curves have many useful purposes in ML, including: choosing model parameters during design, adjusting optimization to improve convergence, and diagnosing problems such as overfitting (or underfitting). Learning curves can also be tools for determining how much a model benefits from adding more training data, and whether the model suffers more from a variance error or a bias error. If both the validation score and the training score converge to a certain value, then the model will no longer significantly benefit from more training data. == Formal definition == When creating a function to approximate the distribution of some data, it is necessary to define a loss function L ( f θ ( X ) , Y ) {\displaystyle L(f_{\theta }(X),Y)} to measure how good the model output is (e.g., accuracy for classification tasks or mean squared error for regression). We then define an optimization process which finds model parameters θ {\displaystyle \theta } such that L ( f θ ( X ) , Y ) {\displaystyle L(f_{\theta }(X),Y)} is minimized, referred to as θ ∗ {\displaystyle \theta ^{}} . === Training curve for amount of data === If the training data is { x 1 , x 2 , … , x n } , { y 1 , y 2 , … y n } {\displaystyle \{x_{1},x_{2},\dots ,x_{n}\},\{y_{1},y_{2},\dots y_{n}\}} and the validation data is { x 1 ′ , x 2 ′ , … x m ′ } , { y 1 ′ , y 2 ′ , … y m ′ } {\displaystyle \{x_{1}',x_{2}',\dots x_{m}'\},\{y_{1}',y_{2}',\dots y_{m}'\}} , a learning curve is the plot of the two curves i ↦ L ( f θ ∗ ( X i , Y i ) ( X i ) , Y i ) {\displaystyle i\mapsto L(f_{\theta ^{}(X_{i},Y_{i})}(X_{i}),Y_{i})} i ↦ L ( f θ ∗ ( X i , Y i ) ( X i ′ ) , Y i ′ ) {\displaystyle i\mapsto L(f_{\theta ^{}(X_{i},Y_{i})}(X_{i}'),Y_{i}')} where X i = { x 1 , x 2 , … x i } {\displaystyle X_{i}=\{x_{1},x_{2},\dots x_{i}\}} === Training curve for number of iterations === Many optimization algorithms are iterative, repeating the same step (such as backpropagation) until the process converges to an optimal value. Gradient descent is one such algorithm. If θ i ∗ {\displaystyle \theta _{i}^{}} is the approximation of the optimal θ {\displaystyle \theta } after i {\displaystyle i} steps, a learning curve is the plot of i ↦ L ( f θ i ∗ ( X , Y ) ( X ) , Y ) {\displaystyle i\mapsto L(f_{\theta _{i}^{}(X,Y)}(X),Y)} i ↦ L ( f θ i ∗ ( X , Y ) ( X ′ ) , Y ′ ) {\displaystyle i\mapsto L(f_{\theta _{i}^{}(X,Y)}(X'),Y')}
Read more →
Document AI

Document AI, also known as Document Intelligence, refers to a field of technology that employs machine learning (ML) techniques, such as natural language processing (NLP). These techniques are used to develop computer models capable of analyzing documents in a manner akin to human review. Through NLP, computer systems are able to understand relationships and contextual nuances in document contents, which facilitates the extraction of information and insights. Additionally, this technology enables the categorization and organization of the documents themselves. The applications of Document AI extend to processing and parsing a variety of semi-structured documents, such as forms, tables, receipts, invoices, tax forms, contracts, loan agreements, and financial reports. == Key features == Machine learning is utilized in Document AI to extract information from both printed and digital documents. This technology recognizes images, text, and characters in various languages, aiding in the extraction of insights from unstructured documents. The use of this technology can improve the speed and quality of decision-making in document analysis. Additionally, the automation of data extraction and validation can contribute to increased efficiency in document analysis processes. Since the early 2020s, the integration of large language models has extended Document AI beyond extraction toward generative tasks, including the automated drafting of forms, contracts, and document summaries. == Example == A business letter contains information in the form of text, as well as other types of information, such as the position of the text. For instance, a typical letter contains two addresses before the body of the text. The address at the very top (sometimes aligned to the right) is the sender address. This is normally followed by the date of the letter, with the place of writing. After this, the receiver address is listed. The distinction between the sender address and the receiver address is conveyed solely by the position of the address on the page, i.e. there is no textual indication like Sender: in front of the addresses. == Data dimensions and ML architecture == Data is typically distinguished into spatial data and time-series data, the former includes things like images, maps and graphs, while the latter includes signals such as stock prices or voice recordings. Document AI combines text data, which has a time dimension, with other types of data, such as the position of an address in a business letter, which is spatial. Historically in machine learning spatial data was analyzed using a convolutional neural network, and temporal data using a recurrent neural network. With the advent of dimension-type agnostic transformer architecture, these two different types of dimension can be more easily combined, Document AI is an example of this. == Benchmarks == Several public datasets are used to evaluate Document AI systems. FUNSD (Form Understanding in Noisy Scanned Documents) contains 199 annotated forms with token- and block-level labels for form understanding tasks. CORD (Consolidated Receipt Dataset) supports key information extraction from receipts. DocVQA contains approximately 50,000 questions over 12,000 document images for layout-aware visual question answering. == Common uses == Document AI systems are used to automate document processing and information extraction in business and financial workflows, including invoice and receipt processing, data entry automation, anomaly detection, mortgage processing, loan portfolio monitoring, credit risk management, and fraud detection such as counterfeit currency and fraudulent checks. They are also applied in regulatory compliance and contract analysis, including assessing changes in legal and regulatory documents. In real estate, Document AI supports document classification and structured information extraction for standardized processing and analytics. With the adoption of generative AI, Document AI systems can also generate and pre-fill structured documents such as contracts or business forms from natural language prompts.
Read more →
Darwin among the Machines

"Darwin among the Machines" is a letter to the editor published in The Press newspaper on 13 June 1863 in Christchurch, New Zealand. The title, which was chosen by the author, references the work of Charles Darwin. Written by Samuel Butler but signed Cellarius, the letter raised the possibility that machines were a kind of "mechanical life" undergoing constant evolution, and that eventually machines might supplant humans as the dominant species. == Book of the Machines == Butler developed this and subsequent articles into The Book of the Machines, three chapters of Erewhon, published anonymously in 1872. The Erewhonian society Butler envisioned had long ago undergone a revolution that destroyed most mechanical inventions. The narrator of the story finds a book that details the reasons for this revolution, which he translates for the reader. Despite the initial popularity of Erewhon, Butler commented in the preface to the second edition that reviewers had "in some cases been inclined to treat the chapters on Machines as an attempt to reduce Mr. Darwin's theory to an absurdity." He protested that "few things would be more distasteful to me than any attempt to laugh at Mr. Darwin", but also added "I am surprised, however, that the book at which such an example of the specious misuse of analogy would seem most naturally levelled should have occurred to no reviewer; neither shall I mention the name of the book here, though I should fancy that the hint given will suffice", which may suggest that the chapter on Machines was in fact a satire intended to illustrate the "specious misuse of analogy", even if the target was not Darwin; Butler, fearing that he had offended Darwin, wrote him a letter explaining that the actual target was Joseph Butler's 1736 The Analogy of Religion, Natural and Revealed, to the Constitution and Course of Nature. The Victorian scholar Herbert Sussman has suggested that although Butler's exploration of machine evolution was intended to be whimsical, he may also have been genuinely interested in the notion that living organisms are a type of mechanism and was exploring this notion with his writings on machines, while the philosopher Louis Flaccus called it "a mixture of fun, satire, and thoughtful speculation." == Evolution of Global Intelligence == George Dyson applies Butler's original premise to the artificial life and intelligence of Alan Turing in Darwin Among the Machines: The Evolution of Global Intelligence (1998) ISBN 0-7382-0030-1, to suggest that the internet is a living, sentient being. Dyson's main claim is that the evolution of a conscious mind from today's technology is inevitable. It is not clear whether this will be a single mind or multiple minds, how smart that mind would be, and even if we will be able to communicate with it. He also clearly suggests that there are forms of intelligence on Earth that we are currently unable to understand. From the book: "What mind, if any, will become apprehensive of the great coiling of ideas now under way is not a meaningless question, but it is still too early in the game to expect an answer that is meaningful to us."
Read more →
Knights of Sidonia

Knights of Sidonia (Japanese: シドニアの騎士, Hepburn: Shidonia no Kishi) is a Japanese manga series written and illustrated by Tsutomu Nihei. It was serialized by Kodansha's seinen manga magazine Monthly Afternoon between April 2009 and September 2015, with its chapters collected in 15 tankōbon volumes. It tells the story of Nagate Tanikaze, an "under-dweller" destined to become a Garde pilot, whose mission is to defend the generation ship Sidonia from a hostile alien species called Gauna. The manga was licensed for English release in North America by Vertical. An anime television series adaptation was produced by Polygon Pictures. The first season aired from April to June 2014; the second between April and June 2015. An anime film sequel titled Knights of Sidonia: Love Woven in the Stars premiered in June 2021. In 2015, Knights of Sidonia received the 39th Kodansha Manga Award in the general category, as well as the 47th Seiun Award in the Best Comic category in 2016. == Plot == === Setting === The story is set in the year 3394, a thousand years after mankind flees from Earth after it was destroyed by a race of shapeshifting aliens called the Gauna (奇居子（ガウナ）), aboard hundreds of colossal spacecraft created from the remains of the planet. One such ship is the Sidonia, which has developed its own human culture closely based on that of Japan where human cloning, asexual reproduction, and human genetic engineering, such as granting humans photosynthesis, are commonplace. It is also revealed that the top echelons of this society have secretly been granted immortality. With a population of over 500,000 people, Sidonia is possibly the last human settlement remaining, as the fates of the other ships are unknown. Little is known about the true nature of the Gauna or their motivation for attacking humanity. At any given time, a Gauna consists of a nearly impenetrable core protected by a dense layer of malleable flesh known as "placenta" (胞衣, ena). Once the ena is shed away and the core is destroyed, the Gauna's body disintegrates. While Sidonia itself is heavily armed with an arsenal of high-output beam cannons and mass cannons including slow but powerful planet-destroying warheads, it is primarily defended by large mechanized weapons called Gardes (衛人, Morito) whose weaponry and mobility is powered by "Higgs particles" (ヘイグス粒子, Heigusu Ryūshi), armed with a high-output beam cannon for long range assaults and a special spear known as "Kabizashi" for close combat. The tip of the kabizashi is made of a rare and little-understood material which has the unique property of being able to destroy a Gauna's core. Later the Gardes are also equipped with firearms whose ammunition have the same material of the Kabizashi after a means to artificially mass-produce it is discovered. Most people in the surviving human population are screened and drafted as Garde pilots at a young age, if they are shown to be capable of piloting them. === Story === The story follows the adventures of Garde pilot Nagate Tanikaze, who lived in the underground layer of Sidonia since birth and was raised by his grandfather. Never having met anyone else, he trains himself in an old Guardian pilot simulator every day, eventually mastering it. After his grandfather's death, he emerges to the surface and is selected as a Garde pilot, just as Sidonia is once again threatened by the Gauna. == Media == === Manga === Written and illustrated by Tsutomu Nihei, Knights of Sidonia was serialized in Kodansha's seinen manga magazine Monthly Afternoon from April 25, 2009, to September 25, 2015. It was compiled in 15 tankōbon volumes. The manga has been licensed in North America by Vertical, who released all 15 volumes in English between February 5, 2013, and April 26, 2016. === Anime === An anime television series adaptation, produced by Polygon Pictures, aired its first season from April 10 to June 26, 2014, on MBS and later on TBS, CBC and BS-TBS. The series was directed by Kōbun Shizuno, assisted by Hiroyuki Seshita, with scripts by Sadayuki Murai and character designs by Yuki Moriyama. The opening theme song is "Sidonia" (シドニア, Shidonia), performed by Angela, while the ending theme song is "Show" (掌 -show-, Shō), performed by Eri Kitamura. A second season aired from April 11 to June 26, 2015. For the second season, the opening theme song is "Kishi Kōshinkyoku" (騎士行進曲, Knight March), performed by Angela, while the ending theme song is "Requiem" (鎮魂歌 -レクイエム-, Rekuiemu), performed by CustomiZ. The series was localized and streamed by Netflix in all of its territories since July 4, 2014, becoming the service's first original anime, as well as the first anime series on Netflix available in Dolby Vision/HDR. The first season has been licensed for home video release by Sentai Filmworks. The second season was released on Netflix on July 3, 2015, and has been licensed by Sentai Filmworks for home video distribution. In July 2021, Funimation announced they acquired the streaming rights from Netflix to both seasons. === Films === A compilation film of the first season with additional scenes and re-edited sound effects was released on March 6, 2015. A new anime film, titled Knights of Sidonia: Love Woven in the Stars, was announced on July 3, 2020. Hiroyuki Seshita served as chief director, while Tadahiro Yoshihira served as director for the new film, with Polygon Pictures returning for production. Sadayuki Murai and Tetsuya Yamada returned to write scripts, while Shūji Katayama composed the music. The rest of the staff and cast returned to reprise their roles. The first four minutes of the film were shown on YouTube on April 28, 2021. The film was set to premiere on May 14, 2021, but was delayed to June 4, 2021, due to the COVID-19 pandemic. Funimation screened the film in international theaters starting on September 13, 2021. == Reception == === Manga === Knights of Sidonia won the 39th Kodansha Manga Award in the general category in 2015. The manga won the 47th Seiun Award in the Best Comic category in 2016. It also won the Best Seinen category at the 26th Salón del Manga de Barcelona in 2020. It was one of the Jury Recommended works in the Manga Division at the 17th Japan Media Arts Festival in 2013. The Young Adult Library Services Association listed Knights of Sidonia in its 2014 list of Top 10 Graphic Novels for Teens. Carlo Santos from Anime News Network gave the first manga volume a B, stating, "It is got a young man piloting a giant robot against alien enemies, but Knight of Sidonia is no Neon Genesis Evangelion. Yet it is not as bleak or incomprehensible as Tsutomu Nihei works like Blame! or Biomega, either—rather, it is the best of both worlds, bringing Nihei's hard sci-fi mentality into a more conventional space-adventure environment". === Anime === The anime series received positive reviews, even from famous members of the Japanese anime/game industry, like Hideo Kojima, creator of the Metal Gear series, who claims that "It's a kind of anime that we haven't seen for a while that has that sci-fi spirit. Using digital technology cultivated through games, it creates animation that encapsulates Japan's cultural assets like manga, cel animation, kanji, giant robots, etc. What's born is a unique made-in-Japan work that could never be cooked up in Hollywood. Japanese culture has lost its 'cool', and Knights of Sidonia will be the white knight that saves it". Other industry pros left acknowledgements as well, including Akiko Higashimura, Digitarou and Yoshinao Dao.
Read more →
Aporia (company)

Aporia is a machine learning observability platform based in Tel Aviv, Israel. The company has a US office located in San Jose, California. Aporia has developed software for monitoring and controlling undetected defects and failures used by other companies to detect and report anomalies, and warn in the early stages of faults. == History == Aporia was founded in 2019 by Liran Hason and Alon Gubkin. In April 2021, the company raised a $5 million seed round for its monitoring platform for ML models. In February 2022, the company closed a Series A round of $25 million for its ML observability platform. Aporia was named by Forbes as the Next Billion-Dollar Company in June 2022. In November, the company partnered with ClearML, an MLOPs platform, to improve ML pipeline optimization. In January 2023, Aporia launched Direct Data Connectors, a novel technology allowing organizations to monitor their ML models in minutes (previously the process of integrating ML monitoring into a customer’s cloud environment took weeks or more.) DDC (Direct Data Connectors) enables users to connect Aporia to their preferred data source and monitor all of their data at once, without data sampling or data duplication (which is a huge security risk for major organizations. In April 2023, Aporia announced the company partnered with Amazon Web Services (AWS) to provide more reliable ML observability to AWS consumers by deploying Aporia's architecture to their AWS environment, this will allow customers to monitor their models in production regardless of platform.
Read more →
Distributed artificial intelligence

Distributed Artificial Intelligence (DAI) (also called Decentralized Artificial Intelligence) is a melding of artificial intelligence with distributed computing. From artificial intelligence comes the theory and technology for constructing or analyzing an intelligent system. But where artificial intelligence uses psychology as a source of ideas, inspiration, and metaphor, DAI uses sociology, economics, and management science for inspiration. Where the focus of artificial intelligence is on the individual, the focus of DAI is on the group. Distributed computing provides the computational substrate on which this group focus can occur. Using techniques from artificial intelligence, communication theory, control theory, and interaction theory, it produces a cooperative solution to problems by a decentralized group of computational entities (agents). DAI is closely related to and a predecessor of the field of multi-agent systems. They are distinguished generally by multi-agent systems being open, where the entities might arise from different interests and have individual goals, and distributed artificial-intelligence systems, where the entities have common goals. There are numerous applications and tools. == Definition == Distributed Artificial Intelligence (DAI) is an approach to solving complex learning, planning, and decision-making problems. It is embarrassingly parallel, thus able to exploit large scale computation and spatial distribution of computing resources. These properties allow it to solve problems that require the processing of very large data sets. DAI systems consist of autonomous learning processing nodes (agents), that are distributed, often at a very large scale. DAI nodes can act independently, and partial solutions are integrated by communication between nodes, often asynchronously. By virtue of their scale, DAI systems are robust and elastic, and by necessity, loosely coupled. Furthermore, DAI systems are built to be adaptive to changes in the problem definition or underlying data sets due to the scale and difficulty in redeployment. DAI systems do not require all the relevant data to be aggregated in a single location, in contrast to monolithic or centralized Artificial Intelligence systems, which have tightly coupled and geographically close processing nodes. Therefore, DAI systems often operate on sub-samples or hashed impressions of very large datasets. In addition, the source dataset may change or be updated during the course of the execution of a DAI system. == Development == In 1975 distributed artificial intelligence emerged as a subfield of artificial intelligence that dealt with interactions of intelligent agents. As a scientific discipline, it progressed through a series of workshops in the USA (International Workshop on Distributed Artificial Intelligence, held in 13 editions from 1978 - 1994), Europe (Workshop on Modelling Autonomous Agents in a Multi-Agent World https://link.springer.com/conference/maamaw), and Asia (Multi-Agent and Cooperative Computation Workshop (MACC) https://sites.google.com/view/sig-macc/macc-workshop?authuser=0). Distributed artificial intelligence systems were conceived as a group of intelligent entities, called agents, that interacted by cooperation, by coexistence, or by competition. DAI is categorized into multi-agent systems and distributed problem solving. In multi-agent systems the main focus is how agents coordinate their knowledge and activities. For distributed problem solving the major focus is how the problem is decomposed and the solutions are synthesized. == Goals == The objectives of Distributed Artificial Intelligence are to solve the reasoning, planning, learning and perception problems of artificial intelligence, especially if they require large data, by distributing the problem to autonomous processing nodes (agents). To reach the objective, DAI requires: A distributed system with robust and elastic computation on unreliable and failing resources that are loosely coupled Coordination of the actions and communication of the nodes Subsamples of large data sets and online machine learning There are many reasons for wanting to distribute intelligence or cope with multi-agent systems. Mainstream problems in DAI research include the following: Parallel problem solving: mainly deals with how classic artificial intelligence concepts can be modified, so that multiprocessor systems and clusters of computers can be used to speed up calculation. Distributed problem solving (DPS): the concept of agent, autonomous entities that can communicate with each other, was developed to serve as an abstraction for developing DPS systems. See below for further details. Multi-Agent Based Simulation (MABS): a branch of DAI that builds the foundation for simulations that need to analyze not only phenomena at macro level but also at micro level, as it is in many social simulation scenarios. == Approaches == Two types of DAI has emerged: In Multi-agent systems agents coordinate their knowledge and activities and reason about the processes of coordination. Agents are physical or virtual entities that can act, perceive their environment, and communicate with other agents. An agent is autonomous and has skills to achieve goals. The agents change the state of their environment by their actions. There are a number of different coordination techniques. In distributed problem solving the work is divided among nodes and the knowledge is shared. The main concerns are task decomposition and synthesis of the knowledge and solutions. DAI can apply a bottom-up approach to AI, similar to the subsumption architecture as well as the traditional top-down approach of AI. In addition, DAI can also be a vehicle for emergence. === Challenges === The challenges in Distributed AI are: How to carry out communication and interaction of agents and which communication language or protocols should be used. How to ensure the coherency of agents. How to synthesise the results among 'intelligent agents' group by formulation, description, decomposition and allocation. == Applications and tools == Areas where DAI have been applied are: Electronic commerce, e.g. for trading strategies the DAI system learns financial trading rules from subsamples of very large samples of financial data Networks, e.g. in telecommunications the DAI system controls the cooperative resources in a WLAN network Routing, e.g. model vehicle flow in transport networks Scheduling, e.g. flow shop scheduling where the resource management entity ensures local optimization and cooperation for global and local consistency Search engines, e.g. in LLM federated search like Ithy where document retrieval and analysis are distributed to DAI agents before aggregation Multi-Agent systems, e.g. artificial life, the study of simulated life Electric power systems, e.g. Condition Monitoring Multi-Agent System (COMMAS) applied to transformer condition monitoring, and IntelliTEAM II Automatic Restoration System DAI integration in tools has included: ECStar is a distributed rule-based learning system. == Agents == === Systems: Agents and multi-agents === Notion of Agents: Agents can be described as distinct entities with standard boundaries and interfaces designed for problem solving. Notion of Multi-Agents: Multi-Agent system is defined as a network of agents which are loosely coupled working as a single entity like society for problem solving that an individual agent cannot solve. === Software agents === The key concept used in DPS and MABS is the abstraction called software agents. An agent is a virtual (or physical) autonomous entity that has an understanding of its environment and acts upon it. An agent is usually able to communicate with other agents in the same system to achieve a common goal, that one agent alone could not achieve. This communication system uses an agent communication language. A first classification that is useful is to divide agents into: reactive agent – A reactive agent is not much more than an automaton that receives input, processes it and produces an output. deliberative agent – A deliberative agent in contrast should have an internal view of its environment and is able to follow its own plans. hybrid agent – A hybrid agent is a mixture of reactive and deliberative, that follows its own plans, but also sometimes directly reacts to external events without deliberation. Well-recognized agent architectures that describe how an agent is internally structured are: ASMO (emergence of distributed modules) BDI (Believe Desire Intention, a general architecture that describes how plans are made) InterRAP (A three-layer architecture, with a reactive, a deliberative and a social layer) PECS (Physics, Emotion, Cognition, Social, describes how those four parts influences the agents behavior). Soar (a rule-based approach)
Read more →
Taylor Swift deepfake pornography controversy

In late January 2024, sexually explicit AI-generated deepfake images of American musician Taylor Swift were proliferated on social media platforms 4chan and X (formerly Twitter). Several artificial images of Swift of a sexual or violent nature were quickly spread, with one post reported to have been seen over 47 million times before its eventual removal. The images led Microsoft to enhance Microsoft Designer's text-to-image model to prevent future abuse. Moreover, these images prompted responses from anti-sexual assault advocacy groups, US politicians, Swifties, and Microsoft CEO Satya Nadella, among others, and it has been suggested that Swift's influence could result in new legislation regarding the creation of deepfake pornography. A similar controversy emerged in August 2025, when The Verge reported AI image and video tool Grok Imagine generated sexually explicit images and videos of Swift from an otherwise innocuous text prompt. == Background == American musician Taylor Swift has been the target of misogyny and slut-shaming throughout her career. American technology corporation Microsoft offers AI image creators called Microsoft Designer and Bing Image Creator, which employ censorship safeguards to prevent users from generating unsafe or objectionable content. Members of a Telegram group discussed ways to circumvent these censors to create pornographic images of celebrities. Graphika, a disinformation research firm, traced the creation of the images back to a 4chan community. == Reactions == For some, the deepfake images of Swift immediately became a source of controversy and outrage. Other internet users found them humorous and absurd, such as the image making it appear as though Swift was to engage in sexual intercourse with Oscar the Grouch. The images drew condemnations from Rape, Abuse & Incest National Network and SAG-AFTRA. The latter group, who had been following issues regarding AI-generated media prior to Swift's involvement, considered the images "upsetting, harmful and deeply concerning." Microsoft CEO Satya Nadella, whose company's products were believed to be used to make these images, responded to the controversy as "alarming and terrible", further stating his belief that "we all benefit when the online world is a safe world." === Taylor Swift === A source close to Swift told the Daily Mail that she would be considering legal action, saying, "Whether or not legal action will be taken is being decided, but there is one thing that is clear: These fake AI-generated images are abusive, offensive, exploitative, and done without Taylor's consent and/or knowledge." === Politicians === White House press secretary Karine Jean-Pierre expressed concern over the counterfeit images, deeming them "alarming", and emphasized the obligation of social media platforms to curb the dissemination of misinformation. Several members of American politics called for legislation against AI-generated pornography. Later in the month, a bipartisan bill was introduced by US senators Dick Durbin, Lindsey Graham, Amy Klobuchar and Josh Hawley. The bill would allow victims to sue individuals who produced or possessed "digital forgeries" with intent to distribute, or those who received the material knowing it was made without consent. The European Union struck a deal in February 2024 on a similar bill that would criminalize deepfake pornography, as well as online harassment and revenge porn, by mid-2027. === Social media platforms === X responded to the sharing of these images on their own website with claims they would suspend accounts that participated in their spread. Despite this, the photos continued to be reshared among accounts of X, and spread to other platforms including Instagram and Reddit. X enforces a "synthetic and manipulated media policy", which has been criticized for its efficacy. They briefly blocked searches of Swift's name on January 27, 2024, reinstating them two days later. === Swifties === Fans of Taylor Swift, known as Swifties, responded to the circulation of these images by pushing the hashtag #ProtectTaylorSwift to trend on X. They also flooded other hashtags related to the images with more positive images and videos of her live performances. == Cultural significance == Deepfake pornography has remained highly controversial and has affected figures from other celebrities to ordinary people, most of whom are women. Journalists have opined that the involvement of a prominent public figure such as Swift in the dissemination of AI-generated pornography could bring public awareness and political reform to the issue.
Read more →