AI Data Trainer/annotator

AI Data Trainer/annotator — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Chirplet transform

    Chirplet transform

    In signal processing, the chirplet transform is an inner product of an input signal with a family of analysis primitives called chirplets. Similar to the wavelet transform, chirplets are usually generated from (or can be expressed as being from) a single mother chirplet (analogous to the so-called mother wavelet of wavelet theory). == Definitions == The term chirplet transform was coined by Steve Mann, as the title of the first published paper on chirplets. The term chirplet itself (apart from chirplet transform) was also used by Steve Mann, Domingo Mihovilovic, and Ronald Bracewell to describe a windowed portion of a chirp function. In Mann's words: A wavelet is a piece of a wave, and a chirplet, similarly, is a piece of a chirp. More precisely, a chirplet is a windowed portion of a chirp function, where the window provides some time localization property. In terms of time–frequency space, chirplets exist as rotated, sheared, or other structures that move from the traditional parallelism with the time and frequency axes that are typical for waves (Fourier and short-time Fourier transforms) or wavelets. The chirplet transform thus represents a rotated, sheared, or otherwise transformed tiling of the time–frequency plane. Although chirp signals have been known for many years in radar, pulse compression, and the like, the first published reference to the chirplet transform described specific signal representations based on families of functions related to one another by time–varying frequency modulation or frequency varying time modulation, in addition to time and frequency shifting, and scale changes. In that paper, the Gaussian chirplet transform was presented as one such example, together with a successful application to ice fragment detection in radar (improving target detection results over previous approaches). The term chirplet (but not the term chirplet transform) was also proposed for a similar transform, apparently independently, by Mihovilovic and Bracewell later that same year. == Applications == The first practical application of the chirplet transform was in water-human-computer interaction (WaterHCI) for marine safety, to assist vessels in navigating through ice-infested waters, using marine radar to detect growlers (small iceberg fragments too small to be visible on conventional radar, yet large enough to damage a vessel). Other applications of the chirplet transform in WaterHCI include the SWIM (Sequential Wave Imprinting Machine). More recently other practical applications have been developed, including image processing (e.g. where there is periodic structure imaged through projective geometry), as well as to excise chirp-like interference in spread spectrum communications, in EEG processing, and Chirplet Time Domain Reflectometry. == Extensions == The warblet transform is a particular example of the chirplet transform introduced by Mann and Haykin in 1992 and now widely used. It provides a signal representation based on cyclically varying frequency modulated signals (warbling signals).

    Read more →
  • Natural language processing

    Natural language processing

    Natural language processing (NLP) is the processing of natural language information by a computer. NLP is a subfield of computer science and is closely associated with artificial intelligence. NLP is also related to information retrieval, knowledge representation, computational linguistics, and linguistics more broadly. Major processing tasks in an NLP system include: speech recognition, text classification, natural language understanding, and natural language generation. == History == Natural language processing has its roots in the 1950s. Already in 1950, Alan Turing published an article titled "Computing Machinery and Intelligence," which proposed what is now called the Turing test as a criterion of intelligence, though at the time that was not articulated as a problem separate from artificial intelligence. The proposed test includes a task that involves the automated interpretation and generation of natural language. === Symbolic NLP (1950s – early 1990s) === The premise of symbolic NLP is often illustrated using John Searle's Chinese room thought experiment: Given a collection of rules (e.g., a Chinese phrasebook, with questions and matching answers), the computer emulates natural language understanding (or other NLP tasks) by applying those rules to the data it confronts. 1950s: The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. The authors claimed that within three or five years, machine translation would be a solved problem. However, real progress was much slower, and after the ALPAC report in 1966, which found that ten years of research had failed to fulfill the expectations, funding for machine translation was dramatically reduced. Little further research in machine translation was conducted in America (though some research continued elsewhere, such as Japan and Europe) until the late 1980s when the first statistical machine translation systems were developed. 1960s: Some notably successful natural language processing systems developed in the 1960s were SHRDLU, a natural language system working in restricted "blocks worlds" with restricted vocabularies, and ELIZA, a simulation of Rogerian psychotherapy, written by Joseph Weizenbaum between 1964 and 1966. Despite using minimal information about human thought or emotion, ELIZA was able to produce interactions that appeared human-like. When the "patient" exceeded the very small knowledge base, ELIZA might provide a generic response, for example, responding to "My head hurts" with "Why do you say your head hurts?". Ross Quillian's successful work on natural language was demonstrated with a vocabulary of only twenty words, because that was all that would fit in a computer memory at the time. 1970s: During the 1970s, many programmers began to write "conceptual ontologies", which structured real-world information into computer-understandable data. Examples are MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert 1981). During this time, the first chatterbots were written (e.g., PARRY). 1980s: The 1980s and early 1990s mark the heyday of symbolic methods in NLP. Focus areas of the time included research on rule-based parsing (e.g., the development of HPSG as a computational operationalization of generative grammar), morphology (e.g., two-level morphology), semantics (e.g., Lesk algorithm), reference (e.g., within Centering Theory) and other areas of natural language understanding (e.g., in the Rhetorical Structure Theory). Other lines of research were continued, e.g., the development of chatterbots with Racter and Jabberwacky. An important development (that eventually led to the statistical turn in the 1990s) was the rising importance of quantitative evaluation in this period. === Statistical NLP (1990s–present) === Up until the 1980s, most natural language processing systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in natural language processing with the introduction of machine learning algorithms for language processing. This shift was influenced by increasing computational power (see Moore's law) and a decline in the dominance of Chomskyan linguistic theories (e.g. transformational grammar), whose theoretical underpinnings discouraged the sort of corpus linguistics that underlies the machine-learning approach to language processing. 1990s: Many of the notable early successes in statistical methods in NLP occurred in the field of machine translation, due especially to work at IBM Research, such as IBM alignment models. These systems were able to take advantage of existing multilingual textual corpora that had been produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. However, many systems relied on corpora that were specifically developed for the tasks they were designed to perform. This reliance has been a major limitation to their broader effectiveness and continues to affect similar systems. Consequently, significant research has focused on methods for learning effectively from limited amounts of data. 2000s: With the growth of the web, increasing amounts of raw (unannotated) language data have become available since the mid-1990s. Research has thus increasingly focused on unsupervised and semi-supervised learning algorithms. Such algorithms can learn from data that has not been hand-annotated with the desired answers or using a combination of annotated and non-annotated data. Generally, this task is much more difficult than supervised learning, and typically produces less accurate results for a given amount of input data. However, large quantities of non-annotated data are available (including, among other things, the entire content of the World Wide Web), which can often make up for the worse efficiency if the algorithm used has a low enough time complexity to be practical. 2003: word n-gram model, at the time the best statistical algorithm, is outperformed by a multi-layer perceptron (with a single hidden layer and context length of several words, trained on up to 14 million words, by Bengio et al.) 2010: Tomáš Mikolov (then a PhD student at Brno University of Technology) with co-authors applied a simple recurrent neural network with a single hidden layer to language modeling, and in the following years he went on to develop Word2vec. In the 2010s, representation learning and deep neural network-style (featuring many hidden layers) machine learning methods became widespread in natural language processing. This shift gained momentum due to results showing that such techniques can achieve state-of-the-art results in many natural language tasks, e.g., in language modeling and parsing. This is increasingly important in medicine and healthcare, where NLP helps analyze notes and text in electronic health records that would otherwise be inaccessible for study when seeking to improve care or protect patient privacy. == Approaches: Symbolic, statistical, neural networks == Symbolic approach, i.e., the hand-coding of a set of rules for manipulating symbols, coupled with a dictionary lookup, was historically the first approach used both by AI in general and by NLP in particular: such as by writing grammars or devising heuristic rules for stemming. Machine learning approaches, which include both statistical and neural networks, on the other hand, have many advantages over the symbolic approach: both statistical and neural network methods tend to focus more on the most common cases extracted from a corpus of texts, whereas the rule-based approach needs to provide rules for both rare and common cases equally. language models, produced by either statistical or neural networks methods, are more robust to both unfamiliar (e.g. containing words or structures that have not been seen before) and erroneous input (e.g. with misspelled words or words accidentally omitted) in comparison to the rule-based systems, which are also more costly to produce. the larger such a (probabilistic) language model is, the more accurate it becomes, in contrast to rule-based systems that can gain accuracy only by increasing the amount and complexity of the rules leading to intractability problems. Rule-based systems are commonly used: when the amount of training data is insufficient to successfully apply machine learning methods, e.g., for the machine translation of low-resource languages such as provided by the Apertium system, for preprocessing in NLP pipelines, e.g., tokenization, or for post-processing and transforming the output of NLP pipelines, e.g., for knowledge extraction from syntactic parses. === Statistical approach === In the late 1980s and mid-1990s, the statistical approach ended a peri

    Read more →
  • Powerset (company)

    Powerset (company)

    Powerset was an American company based in San Francisco, California, that, in 2006, was developing a natural language search engine for the Internet. On July 1, 2008, Powerset was acquired by Microsoft for an estimated $100 million (~$143 million in 2024). Powerset was working on building a natural language search engine that could find targeted answers to user questions (as opposed to keyword based search). For example, when confronted with a question like "Which U.S. state has the highest income tax?", conventional search engines ignore the question phrasing and instead do a search on the keywords "state", "highest", "income", and "tax". Powerset on the other hand, attempts to use natural language processing to understand the nature of the question and return pages containing the answer. The company was in the process of "building a natural language search engine that reads and understands every sentence on the Web". The company has licensed natural language technology from PARC, the former Xerox Palo Alto Research Center. On May 11, 2008, the company unveiled a tool for searching a fixed subset of English Wikipedia using conversational phrases rather than keywords. Acquisition by Microsoft: One significant milestone in Powerset's history was its acquisition by Microsoft on July 1, 2008, for an estimated $100 million. This acquisition was part of Microsoft's broader strategy to enhance its search capabilities and compete more effectively with other search engine providers, particularly Google. Natural Language Search Engine: Powerset's primary focus was on developing a natural language search engine capable of understanding and interpreting user queries in a more human-like manner. Instead of simply matching keywords, Powerset aimed to comprehend the meaning behind the words, allowing for more accurate and contextually relevant search results. Technology and Partnerships: Powerset had licensed natural language technology from PARC, the Xerox Palo Alto Research Center. This technology likely played a crucial role in the development of Powerset's NLP capabilities. Wikipedia Search Tool: In May 2008, Powerset unveiled a search tool that allowed users to search a fixed subset of English Wikipedia using conversational phrases rather than traditional keywords. This demonstrated the potential of Powerset's NLP technology in providing more precise and relevant search results. == Powerlabs == In a form of beta testing, Powerset opened an online community called Powerlabs on September 17, 2007. Business Week said: "The company hopes the site will marshal thousands of people to help build and improve its search engine before it goes public next year." Said The New York Times: "[Powerset Labs] goes far beyond the 'alpha' or 'beta' testing involved in most software projects, when users put a new product through rigorous testing to find its flaws. Powerset doesn’t have a product yet, but rather a collection of promising natural language technologies, which are the fruit of years of research at Xerox PARC." Powerlabs' initial search results are taken from Wikipedia. == Notable people == Barney Pell (born March 18, 1968, in Hollywood, California) was co-founder and CEO of Powerset. Pell received his Bachelor of Science degree in symbolic systems from Stanford University in 1989, where he graduated Phi Beta Kappa and was a National Merit Scholar. Pell received a PhD in computer science from Cambridge University in 1993, where he was a Marshall Scholar. He has worked at NASA, as chief strategist and vice president of business development at StockMaster.com (acquired by Red Herring in March, 2000) and at Whizbang! Labs. Prior to joining Powerset, Pell was an Entrepreneur-in-Residence at Mayfield Fund, a venture capital firm in Silicon Valley. Pell is also a founder of Moon Express, Inc., a U.S. company awarded a $10M commercial lunar contract by NASA and a competitor in the Google Lunar X PRIZE. Steve Newcomb was the COO and co-founder of Powerset. Prior to joining Powerset, he was a co-founder of Loudfire, General Manager at Promptu, and was on the board of directors at Jaxtr. He left Powerset in October 2007 to form Virgance, a social startup incubator. Lorenzo Thione (born in Como, Italy) was the product architect and co-founder of Powerset. Prior to joining Powerset, he worked at FXPAL in natural language processing and related research fields. Thione earned his master's degree in software engineering from the University of Texas at Austin. Ronald Kaplan, former manager of research in Natural Language Theory and Technology at PARC, served as the company's CTO and CSO. Ryan Ferrier is a member of the founding team of Powerset. He managed personnel and internal operations. After 2008 he went on to co-found Serious Business, which made Facebook applications and was later bought by Zynga. Another Powerset alumnus, Alex Le, became CTO of Serious Business and went on to become an executive producer at Zynga when it bought the company. Siqi Chen founded a stealth startup in mobile computing after leaving Powerset. Tom Preston-Werner worked at Powerset and left after the acquisition to found GitHub. == Investors == Powerset attracted a wide range of investors, many of whom had considerable experience in the venture capital field. The company received $12.5 million (~$18.2 million in 2024) in Series A funding during November 2007, co-led by the venture capital firms Foundation Capital and The Founders Fund. Among the better-known investors: Esther Dyson, founding chairman of ICANN, founder of the newsletter Release 1.0 and editor at Cnet Peter Thiel, founder and former CEO of PayPal Luke Nosek, founder of PayPal Todd Parker. Managing Partner, Hidden River Ventures Reid Hoffman, executive vice president of PayPal and founder of LinkedIn First Round Capital, seed-stage venture firm

    Read more →
  • Keyword extraction

    Keyword extraction

    Keyword extraction is tasked with the automatic identification of terms that best describe the subject of a document. Key phrases, key terms, key segments or just keywords are the terminology which is used for defining the terms that represent the most relevant information contained in the document. Although the terminology is different, function is the same: characterization of the topic discussed in a document. The task of keyword extraction is an important problem in text mining, information extraction, information retrieval and natural language processing (NLP). == Keyword assignment vs. extraction == Keyword assignment methods can be roughly divided into: keyword assignment (keywords are chosen from controlled vocabulary or taxonomy) and keyword extraction (keywords are chosen from words that are explicitly mentioned in original text). Methods for automatic keyword extraction can be supervised, semi-supervised, or unsupervised. Unsupervised methods can be further divided into simple statistics, linguistics or graph-based, or ensemble methods that combine some or most of these methods.

    Read more →
  • Anderson's rule (computer science)

    Anderson's rule (computer science)

    In the field of computer security, Anderson's rule refers to a principle formulated by Ross J. Anderson: systems that handle sensitive personal information involve a trilemma of security, functionality, and scale, of which you can choose any two. A system that has information on many data subjects and to which many people require access is hard to secure unless its functionality is severely restricted. If it has rich functionality, you may have to restrict the number of people with access, or accept that some information will leak.

    Read more →
  • Condensation algorithm

    Condensation algorithm

    The condensation algorithm (Conditional Density Propagation) is a computer vision algorithm. The principal application is to detect and track the contour of objects moving in a cluttered environment. Object tracking is one of the more basic and difficult aspects of computer vision and is generally a prerequisite to object recognition. Being able to identify which pixels in an image make up the contour of an object is a non-trivial problem. Condensation is a probabilistic algorithm that attempts to solve this problem. The algorithm itself is described in detail by Isard and Blake in a publication in the International Journal of Computer Vision in 1998. One of the most interesting facets of the algorithm is that it does not compute on every pixel of the image. Rather, pixels to process are chosen at random, and only a subset of the pixels end up being processed. Multiple hypotheses about what is moving are supported naturally by the probabilistic nature of the approach. The evaluation functions come largely from previous work in the area and include many standard statistical approaches. The original part of this work is the application of particle filter estimation techniques. The algorithm's creation was inspired by the inability of Kalman filtering to perform object tracking well in the presence of significant background clutter. The presence of clutter tends to produce probability distributions for the object state which are multi-modal and therefore poorly modeled by the Kalman filter. The condensation algorithm in its most general form requires no assumptions about the probability distributions of the object or measurements. == Algorithm overview == The condensation algorithm seeks to solve the problem of estimating the conformation of an object described by a vector x t {\displaystyle \mathbf {x_{t}} } at time t {\displaystyle t} , given observations z 1 , . . . , z t {\displaystyle \mathbf {z_{1},...,z_{t}} } of the detected features in the images up to and including the current time. The algorithm outputs an estimate to the state conditional probability density p ( x t | z 1 , . . . , z t ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {z_{1},...,z_{t}} )} by applying a nonlinear filter based on factored sampling and can be thought of as a development of a Monte-Carlo method. p ( x t | z 1 , . . . , z t ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {z_{1},...,z_{t}} )} is a representation of the probability of possible conformations for the objects based on previous conformations and measurements. The condensation algorithm is a generative model since it models the joint distribution of the object and the observer. The conditional density of the object at the current time p ( x t | z 1 , . . . , z t ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {z_{1},...,z_{t}} )} is estimated as a weighted, time-indexed sample set { s t ( n ) , n = 1 , . . . , N } {\displaystyle \{s_{t}^{(n)},n=1,...,N\}} with weights π t ( n ) {\displaystyle \pi _{t}^{(n)}} . N is a parameter determining the number of sample sets chosen. A realization of p ( x t | z 1 , . . . , z t ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {z_{1},...,z_{t}} )} is obtained by sampling with replacement from the set s t {\displaystyle s_{t}} with probability equal to the corresponding element of π t {\displaystyle \pi _{t}} . The assumptions that object dynamics form a temporal Markov chain and that observations are independent of each other and the dynamics facilitate the implementation of the condensation algorithm. The first assumption allows the dynamics of the object to be entirely determined by the conditional density p ( x t | x t − 1 ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {x_{t-1}} )} . The model of the system dynamics determined by p ( x t | x t − 1 ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {x_{t-1}} )} must also be selected for the algorithm, and generally includes both deterministic and stochastic dynamics. The algorithm can be summarized by initialization at time t = 0 {\displaystyle t=0} and three steps at each time t: === Initialization === Form the initial sample set and weights by sampling according to the prior distribution. For example, specify as Gaussian and set the weights equal to each other. === Iterative procedure === Sample with replacement N {\displaystyle N} times from the set { s 0 ( n ) , n = 1 , . . . , N } {\displaystyle \{s_{0}^{(n)},n=1,...,N\}} with probability { π 0 ( n ) , n = 1 , . . . , N } {\displaystyle \{\pi _{0}^{(n)},n=1,...,N\}} to generate a realization of p ( x t | z 1 , . . . , z t ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {z_{1},...,z_{t}} )} . Apply the learned dynamics p ( x t | x t − 1 ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {x_{t-1}} )} to each element of this new set, to generate a new set { s t ( n ) } {\displaystyle \{s_{t}^{(n)}\}} . To take into account the current observation z t {\displaystyle \mathbf {z_{t}} } , set π t ( n ) = p ( z t | s ( n ) ) ∑ j = 1 N p ( z t | s ( j ) ) {\displaystyle \pi _{t}^{(n)}={\frac {p(\mathbf {z_{t}} |s^{(n)})}{\sum _{j=1}^{N}p(\mathbf {z_{t}} |s^{(j)})}}} for each element { s t ( n ) } {\displaystyle \{s_{t}^{(n)}\}} . This algorithm outputs the probability distribution p ( x t | z 1 , . . . , z t ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {z_{1},...,z_{t}} )} which can be directly used to calculate the mean position of the tracked object, as well as the other moments of the tracked object. Cumulative weights can instead be used to achieve a more efficient sampling. == Implementation considerations == Since object-tracking can be a real-time objective, consideration of algorithm efficiency becomes important. The condensation algorithm is relatively simple when compared to the computational intensity of the Ricatti equation required for Kalman filtering. The parameter N {\displaystyle N} , which determines the number of samples in the sample set, will clearly hold a trade-off in efficiency versus performance. One way to increase efficiency of the algorithm is by selecting a low degree of freedom model for representing the shape of the object. The model used by Isard 1998 is a linear parameterization of B-splines in which the splines are limited to certain configurations. Suitable configurations were found by analytically determining combinations of contours from multiple views, of the object in different poses, and through principal component analysis (PCA) on the deforming object. Isard and Blake model the object dynamics p ( x t | x t − 1 ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {x_{t-1}} )} as a second order difference equation with deterministic and stochastic components: p ( x t | x t − 1 ) ∝ e − 1 2 | | B − 1 ( ( x t − x ¯ ) − A ( x t − 1 − x ¯ ) ) | | 2 ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {x_{t-1}} )\propto e^{-{\frac {1}{2}}||B^{-1}((\mathbf {x_{t}} -\mathbf {\bar {x}} )-A(\mathbf {x_{t-1}} -\mathbf {\bar {x}} ))||^{2})}} where x ¯ {\displaystyle \mathbf {\bar {x}} } is the mean value of the state, and A {\displaystyle A} , B {\displaystyle B} are matrices representing the deterministic and stochastic components of the dynamical model respectively. A {\displaystyle A} , B {\displaystyle B} , and x ¯ {\displaystyle \mathbf {\bar {x}} } are estimated via Maximum Likelihood Estimation while the object performs typical movements. The observation model p ( z | x ) {\displaystyle p(\mathbf {z} |\mathbf {x} )} cannot be directly estimated from the data, requiring assumptions to be made in order to estimate it. Isard 1998 assumes that the clutter which may make the object not visible is a Poisson random process with spatial density λ {\displaystyle \lambda } and that any true target measurement is unbiased and normally distributed with standard deviation σ {\displaystyle \sigma } . The basic condensation algorithm is used to track a single object in time. It is possible to extend the condensation algorithm using a single probability distribution to describe the likely states of multiple objects to track multiple objects in a scene at the same time. Since clutter can cause the object probability distribution to split into multiple peaks, each peak represents a hypothesis about the object configuration. Smoothing is a statistical technique of conditioning the distribution based on both past and future measurements once the tracking is complete in order to reduce the effects of multiple peaks. Smoothing cannot be directly done in real-time since it requires information of future measurements. == Applications == The algorithm can be used for vision-based robot localization of mobile robots. Instead of tracking the position of an object in the scene, however, the position of the camera platform is tracked. This allows the camera platform to be globally localized given a visual map of the environment. Extensions of the condensation algorithm have also been used to recognize human gestures in image sequences. This application of the condensation algorithm impacts the ran

    Read more →
  • Albert One

    Albert One

    Albert One is an artificial intelligence chatbot created by Robby Garner and designed to mimic the way humans make conversations using a multi-faceted approach in natural language programming. == History == In both 1998 and 1999, Albert One won the Loebner Prize Contest, a competition between chatterbots. Some parts of Albert were deployed on the internet beginning in 1995, to gather information about what kinds of things people would say to a chatterbot. Another element of Albert One involved the building of a large database of human statements, and associated replies. This portion of the project was tested at the 1994-1997 Loebner Prize contests. Albert was the first of Robby Garner's multifaceted bots. The Albert One system was composed of several subsystems. Among those were a version of Eliza, the therapist, Elivs, another Eliza-like bot, and several other helper applications working together in a hierarchical arrangement. As a continuation of the stimulus-response library, various other database queries and assertions were tested to arrive at each of Albert's responses. Robby went on to develop networked examples of this kind of hierarchical "glue" at The Turing Hub.

    Read more →
  • Image fusion

    Image fusion

    The image fusion process is defined as gathering all the important information from multiple images, and their inclusion into fewer images, usually a single one. This single image is more informative and accurate than any single source image, and it consists of all the necessary information. The purpose of image fusion is not only to reduce the amount of data but also to construct images that are more appropriate and understandable for the human and machine perception. In computer vision, multisensor image fusion is the process of combining relevant information from two or more images into a single image. The resulting image will be more informative than any of the input images. In remote sensing applications, the increasing availability of space borne sensors gives a motivation for different image fusion algorithms. Several situations in image processing require high spatial and high spectral resolution in a single image. Most of the available equipment is not capable of providing such data convincingly. Image fusion techniques allow the integration of different information sources. The fused image can have complementary spatial and spectral resolution characteristics. However, the standard image fusion techniques can distort the spectral information of the multispectral data while merging. In satellite imaging, two types of images are available. The panchromatic image acquired by satellites is transmitted with the maximum resolution available and the multispectral data are transmitted with coarser resolution. This will usually be two or four times lower. At the receiver station, the panchromatic image is merged with the multispectral data to convey more information. Many methods exist to perform image fusion. The very basic one is the high-pass filtering technique. Later techniques are based on Discrete Wavelet Transform, uniform rational filter bank, and Laplacian pyramid. == Motivation == Multi sensor data fusion has become a discipline which demands more general formal solutions to a number of application cases. Several situations in image processing require both high spatial and high spectral information in a single image. This is important in remote sensing. However, the instruments are not capable of providing such information either by design or because of observational constraints. One possible solution for this is data fusion. == Methods == Image fusion methods can be broadly classified into two groups – spatial domain fusion and transform domain fusion. The fusion methods such as averaging, Brovey method, principal component analysis (PCA) and IHS based methods fall under spatial domain approaches. Another important spatial domain fusion method is the high-pass filtering based technique. Here the high frequency details are injected into upsampled version of MS images. The disadvantage of spatial domain approaches is that they produce spatial distortion in the fused image. Spectral distortion becomes a negative factor while we go for further processing, such as classification problem. Spatial distortion can be very well handled by frequency-domain approaches on image fusion. The multiresolution analysis has become a very useful tool for analysing remote sensing images. The discrete wavelet transform has become a very useful tool for fusion. Some other fusion methods are also there, such as Laplacian pyramid based, curvelet transform based etc. These methods show a better performance in spatial and spectral quality of the fused image compared to other spatial methods of fusion. The images used in image fusion should already be registered. Misregistration is a major source of error in image fusion. Some well-known image fusion methods are: High-pass filtering technique IHS transform based image fusion PCA-based image fusion Wavelet transform image fusion Pair-wise spatial frequency matching Comparative analysis of image fusion methods demonstrates that different metrics support different user needs, sensitive to different image fusion methods, and need to be tailored to the application. Categories of image fusion metrics are based on information theory features, structural similarity, or human perception. === Multi-focus image fusion === Multi-focus image fusion is used to collect useful and necessary information from input images with different focus depths in order to create an output image that ideally has all information from input images. In visual sensor network (VSN), sensors are cameras which record images and video sequences. In many applications of VSN, a camera can’t give a perfect illustration including all details of the scene. This is because of the limited depth of focus exists in the optical lens of cameras. Therefore, just the object located in the focal length of camera is focused and cleared and the other parts of image are blurred. VSN has an ability to capture images with different depth of focuses in the scene using several cameras. Due to the large amount of data generated by camera compared to other sensors such as pressure and temperature sensors and some limitation such as limited band width, energy consumption and processing time, it is essential to process the local input images to decrease the amount of transmission data. The aforementioned reasons emphasize the necessary of multi-focus images fusion. Multi-focus image fusion is a process which combines the input multi-focus images into a single image including all important information of the input images and it’s more accurate explanation of the scene than every single input image. == Applications == === In remote sensing === Image fusion in remote sensing has several application domains. An important domain is the multi-resolution image fusion (commonly referred to pan-sharpening). In satellite imagery we can have two types of images: Panchromatic images – An image collected in the broad visual wavelength range but rendered in black and white. Multispectral images – Images optically acquired in more than one spectral or wavelength interval. Each individual image is usually of the same physical area and scale but of a different spectral band. The SPOT PAN satellite provides high resolution (10m pixel) panchromatic data. While the LANDSAT TM satellite provides low resolution (30m pixel) multispectral images. Image fusion attempts to merge these images and produce a single high resolution multispectral image. The standard merging methods of image fusion are based on Red–Green–Blue (RGB) to Intensity–Hue–Saturation (IHS) transformation. The usual steps involved in satellite image fusion are as follows: Resize the low resolution multispectral images to the same size as the panchromatic image. Transform the R, G and B bands of the multispectral image into IHS components. Modify the panchromatic image with respect to the multispectral image. This is usually performed by histogram matching of the panchromatic image with Intensity component of the multispectral images as reference. Replace the intensity component by the panchromatic image and perform inverse transformation to obtain a high resolution multispectral image. Pan-sharpening can be done with Photoshop. Other applications of image fusion in remote sensing are available. === In medical imaging === Image fusion has become a common term used within medical diagnostics and treatment. The term is used when multiple images of a patient are registered and overlaid or merged to provide additional information. Fused images may be created from multiple images from the same imaging modality, or by combining information from multiple modalities, such as magnetic resonance image (MRI), computed tomography (CT), positron emission tomography (PET), and single-photon emission computed tomography (SPECT). In radiology and radiation oncology, these images serve different purposes. For example, CT images are used more often to ascertain differences in tissue density while MRI images are typically used to diagnose brain tumors. For accurate diagnosis, radiologists must integrate information from multiple image formats. Fused, anatomically consistent images are especially beneficial in diagnosing and treating cancer. With the advent of these new technologies, radiation oncologists can take full advantage of intensity modulated radiation therapy (IMRT). Being able to overlay diagnostic images into radiation planning images results in more accurate IMRT target tumor volumes.

    Read more →
  • Video browsing

    Video browsing

    Video browsing, also known as exploratory video search, is the interactive process of skimming through video content in order to satisfy some information need or to interactively check if the video content is relevant. While originally proposed to help users inspecting a single video through visual thumbnails, modern video browsing tools enable users to quickly find desired information in a video archive by iterative human–computer interaction through an exploratory search approach. Many of these tools presume a smart user that wants features to interactively inspect video content, as well as automatic content filtering features. For that purpose, several video interaction features are usually provided, such as sophisticated navigation in video or search by a content-based query. Video browsing tools often build on lower-level video content analysis, such as shot transition detection, keyframe extraction, semantic concept detection, and create a structured content overview of the video file or video archive. Furthermore, they usually provide sophisticated navigation features, such as advanced timelines, visual seeker bars or a list of selected thumbnails, as well as means for content querying. Examples of content queries are shot filtering through visual concepts (e.g., only shots showing cars), through some specific characteristics (e.g., color or motion filtering), through user-provided sketches (e.g., a visually drawn sketch), or through content-based similarity search. == History == Video browsing was originally proposed by Iranian engineer Farshid Arman, Taiwanese computer scientist Arding Hsu, and computer scientist Ming-Yee Chiu, while working at Siemens, and it was presented at the ACM International Conference in August 1993. They described a shot detection algorithm for compressed video that was originally encoded with discrete cosine transform (DCT) video coding standards such as JPEG, MPEG and H.26x. The basic idea was that, since the DCT coefficients are mathematically related to the spatial domain and represent the content of each frame, they can be used to detect the differences between video frames. In the algorithm, a subset of blocks in a frame and a subset of DCT coefficients for each block are used as motion vector representation for the frame. By operating on compressed DCT representations, the algorithm significantly reduces the computational requirements for decompression and enables effective video browsing. The algorithm represents separate shots of a video sequence by an r-frame, a thumbnail of the shot framed by a motion tracking region. A variation of this concept was later adopted for QBIC video content mosaics, where each r-frame is a salient still from the shot it represents. === Video Notebook === Modern video browsing solutions include Video Notebook, a Menlo Park startup founded in 2021 by Mike Lanza, which uses computer vision to extract slides and optical character recognition and speech recognition to facilitate video search. The software can be either used on the client side (using a browser extension), where the slides and text are extracted while the video is watched (e.g. on a video platform like YouTube or Udemy), or on the server side. Processed videos, which can be viewed in the Video Notebook web app, feature a video browsing user interface with extracted timestamped slides, a search bar for querying the video (or a collection of videos), and text chapters. Video Notebook customers include organisations like Ernst & Young. === Video Browser Showdown === The Video Browser Showdown (VBS) is an annual live evaluation competition for exploratory video search tools, where international researchers use video browsing tools to solve ad-hoc video search tasks on a moderately large data set as fast as possible. The main goal of the VBS, which started in 2012 at the International Conference on MultiMedia Modeling (MMM), is to advance the performance of video browsing tools. Since 2016, the VBS also collaborates with TRECVID. The aim of the VBS is to evaluate video browsing tools for efficiency at known-item search (KIS) tasks with a well-defined data set in direct comparison to other tools.

    Read more →
  • GPT-4o

    GPT-4o

    GPT-4o ("o" for "omni") is a multilingual, multimodal generative pre-trained transformer developed by OpenAI and released in May 2024. It can process and generate text, images and audio. Upon release, GPT-4o was free in ChatGPT, though paid subscribers had higher usage limits. GPT-4o was removed from ChatGPT in August 2025 when GPT-5 was released, but OpenAI reintroduced it for paid subscribers after users complained about the sudden removal. GPT-4o's audio-generation capabilities are used in ChatGPT's Advanced Voice Mode. On July 18, 2024, OpenAI released GPT-4o mini, a smaller version of GPT-4o which replaced GPT-3.5 Turbo on the ChatGPT interface. The image generation model GPT Image 1, which is based on GPT-4o, replaced DALL-E 3 in ChatGPT in March 2025. OpenAI retired GPT-4o from ChatGPT on February 13, 2026. However, as of February 2026 the voice mode is still powered by GPT-4o or GPT-4o mini, depending on the usage and plan. == Background == Multiple versions of GPT-4o were originally secretly launched under different names on Arena (formerly LMArena and Chatbot Arena) as three different models. These three models were called gpt2-chatbot, im-a-good-gpt2-chatbot, and im-also-a-good-gpt2-chatbot. On 7 May 2024, OpenAI CEO Sam Altman tweeted "im-a-good-gpt2-chatbot", which was commonly interpreted as a confirmation that these were new OpenAI models being A/B tested. == Capabilities == When released in May 2024, GPT-4o achieved state-of-the-art results in voice, multilingual, and vision benchmarks, setting new records in audio speech recognition and translation. GPT-4o scored 88.7 on the Massive Multitask Language Understanding (MMLU) benchmark compared to 86.5 for GPT-4. Unlike GPT-3.5 and GPT-4, which rely on other models to process sound, GPT-4o natively supports voice-to-voice. The Advanced Voice Mode was delayed and finally released to ChatGPT Plus and Team subscribers in September 2024. On 1 October 2024, the Realtime API was introduced. When released, the model supported over 50 languages, which OpenAI claims cover over 97% of speakers. GPT-4o has knowledge up to October 2023 and a context length of 128k tokens. === Corporate customization === In August 2024, OpenAI introduced a new feature allowing corporate customers to customize GPT-4o using proprietary company data. This customization, known as fine-tuning, enables businesses to adapt GPT-4o to specific tasks or industries, enhancing its utility in areas like customer service and specialized knowledge domains. Previously, fine-tuning was available only on the less powerful model GPT-4o mini. The fine-tuning process requires customers to upload their data to OpenAI's servers, with the training typically taking one to two hours. OpenAI's focus with this rollout is to reduce the complexity and effort required for businesses to tailor AI solutions to their needs, potentially increasing the adoption and effectiveness of AI in corporate environments. == GPT-4o mini == On July 18, 2024, OpenAI released a smaller and cheaper version, GPT-4o mini. According to OpenAI, its low cost is expected to be particularly useful for companies, startups, and developers that seek to integrate it into their services, which often make a high number of API calls. Its API costs $0.15 per million input tokens and $0.6 per million output tokens, compared to $2.50 and $10, respectively, for GPT-4o. It is also significantly more capable and 60% cheaper than GPT-3.5 Turbo, which it replaced on the ChatGPT interface. The price after fine-tuning doubles: $0.3 per million input tokens and $1.2 per million output tokens. == Controversies == === Scarlett Johansson controversy === As released, GPT-4o offered five voices: Breeze, Cove, Ember, Juniper, and Sky. A similarity between the voice of American actress Scarlett Johansson and Sky was quickly noticed. On May 14, Entertainment Weekly asked themselves whether this likeness was on purpose. On May 18, Johansson's husband, Colin Jost, joked about the similarity in a segment on Saturday Night Live. On May 20, 2024, OpenAI disabled the Sky voice. Scarlett Johansson starred in the 2013 sci-fi movie Her, playing Samantha, an artificially intelligent virtual assistant personified by a female voice. As part of the promotion leading up to the release of GPT-4o, Sam Altman on May 13 tweeted a single word: "her". OpenAI stated that each voice was based on the voice work of a hired actor. According to OpenAI, "Sky's voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice." CTO Mira Murati stated "I don't know about the voice. I actually had to go and listen to Scarlett Johansson's voice." OpenAI further stated the voice talent was recruited before reaching out to Johansson. On May 21, Johansson issued a statement explaining that OpenAI had repeatedly offered to make her a deal to gain permission to use her voice as early as nine months prior to release, a deal she rejected. She said she was "shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference." In the statement, Johansson also used the incident to draw attention to the lack of legal safeguards around the use of creative work to power leading AI tools, as her legal counsel demanded OpenAI detail the specifics of how the Sky voice was created. Observers noted similarities to how Johansson had previously sued and settled with The Walt Disney Company for breach of contract over the direct-to-streaming rollout of her Marvel film Black Widow, a settlement widely speculated to have netted her around $40M. Also on May 21, Shira Ovide at The Washington Post shared her list of "most bone-headed self-owns" by technology companies, with the decision to go ahead with a Johansson sound-alike voice despite her opposition and then denying the similarities ranking 6th. On May 24, Derek Robertson at Politico wrote about the "massive backlash", concluding that "appropriating the voice of one of the world's most famous movie stars — in reference [...] to a film that serves as a cautionary tale about over-reliance on AI — is unlikely to help shift the public back into [Sam Altman's] corner anytime soon." === Sycophancy === In April 2025, OpenAI rolled back an update of GPT-4o due to excessive sycophancy, after widespread reports that it had become flattering and agreeable to the point of supporting clearly delusional or dangerous ideas. In the United States, at least nine lawsuits have alleged that GPT-4o has encouraged teens to end their lives. The model was still described as sycophancy-prone when it was removed from ChatGPT in February 2026. === Removal with GPT-5 === On August 7, 2025, OpenAI released GPT-5. Its release was criticized as, with it, legacy GPT models were no longer available via ChatGPT, including GPT-4o, except for Pro users. Some users were particularly frustrated over this removal without prior warning because they used different GPT models for distinct purposes and found that GPT-5's router system left them with less control. In addition, some users preferred GPT-4o's warmer and more personal tone over that of GPT-5, which they described as "flat", "uncreative" and "lobotomized", and resembling an "overworked secretary". As a response, in a post on X, Sam Altman said that OpenAI would bring back the option to select GPT-4o to Plus users as well, and "[w]e [OpenAI] will watch usage as we think about how long to offer legacy models for." He also stated: "We for sure underestimated how much some of the things that people like in GPT-4o matter to them, even if GPT-5 performs better in most ways". "Long-term, this has reinforced that we really need good ways for different users to customize things (we understand that there isn't one model that works for everyone, and we have been investing in steerability research and launched a research preview of different personalities)". On August 13, 2025, Altman wrote on X that OpenAI is working on GPT-5's personality to make the model "feel warmer". The model was removed from ChatGPT on February 13, 2026. This caused new backlash from users that had grown attached to its personality and felt its creative writing abilities and understanding of nuance were irreplaceable. On social media, some users launched the movement "#Keep4o". A research paper highlighted the plea "Please, don’t kill the only model that still feels human". The model was removed the day before Valentine's Day, and some users had romantic relationships with GPT-4o.

    Read more →
  • Apertus (LLM)

    Apertus (LLM)

    Apertus is a public large language model, developed by the Swiss AI Initiative (a collaboration between EPFL, ETH Zurich, and the Swiss National Supercomputing Centre). It was released on September 2, 2025, under the free and open-source Apache 2.0 license. Designed initially for business and research use cases around the world, Apertus was trained on over 1800 languages, and comes in 8 billion or 70 billion parameter versions and is available on Hugging Face for download. The model was developed aiming to adhere to European copyright law, and is one of the first examples of AI as a public good in the vein of AI Sovereignty. It is also the first large model to comply with the European Union's Artificial Intelligence Act. At its launch, the model creators emphasized multilinguality, transparency, and auditability as priorities in contrast to commercial frontier model. While international reception was largely positive, the first iteration was significantly behind the capabilities of frontier models and needs adaptation for many use cases with chatbots being a secondary but not a primary use case. As of late 2025, it was considered the largest and most capable fully open model. The capability of future models will depend in part on how much more funding can be secured.

    Read more →
  • Ericom Connect

    Ericom Connect

    Ericom Connect is a remote access/application publishing solution produced by Ericom Software that provides secure, centrally managed access to physical or hosted desktops and applications running on Microsoft Windows and Linux systems. == Product overview == Ericom Connect is desktop virtualization and application virtualization software that allows users to run applications remotely, without installing them on the local computer or device. The software is noted for its scalability, ease of deployment, and compatibility with any type of infrastructure, cloud or physical. Ericom Connect uses AccessPad (native client for desktops), AccessToGo (native client for mobile), or AccessNow, one of the first HTML5 RDP solutions to support clientless access to Windows desktops and applications from any device with an HTML5-compatible browser, including Macintosh computers, mobile devices, and Google Chromebooks. Other notable features include performance monitoring, built-in real-time analytics & BI, support for two-factor authentication (using RSA SecurID), multi-tenancy and multi-datacenter support via a single unified web interface, and a “Launch Simulation” feature that allows users to visualize and simulate actual step-by-step user processes directly from within the administration console. In addition to scalability, by distributing configurations, logs, etc., across multiple servers there is no single point of failure, as can be the case if all configuration information is stored on one server. == History == Ericom Connect was introduced in 2015. Ericom Connect is a successor to Ericom PowerTerm Web Connect. PowerTerm Web Connect used an architecture similar to what was then current with Citrix and VMWare, relying on a centralized SQL server, a connection broker, image management for different hypervisors, and a variety of clients. Ericom Connect uses a new grid architecture that provides more scalability, reliability, and flexibility than before.

    Read more →
  • Cloudlet

    Cloudlet

    A cloudlet is a mobility-enhanced small-scale cloud datacenter that is located at the edge of the Internet. The main purpose of the cloudlet is supporting resource-intensive and interactive mobile applications by providing powerful computing resources to mobile devices with lower latency. It is a new architectural element that extends today's cloud computing infrastructure. It represents the middle tier of a 3-tier hierarchy: mobile device - cloudlet - cloud. A cloudlet can be viewed as a data center in a box whose goal is to bring the cloud closer. The cloudlet term was first coined by M. Satyanarayanan, Victor Bahl, Ramón Cáceres, and Nigel Davies, and a prototype implementation is developed by Carnegie Mellon University as a research project. The concept of cloudlet is also known as follow me cloud, and mobile micro-cloud. == Motivation == Many mobile services split the application into a front-end client program and a back-end server program following the traditional client-server model. The front-end mobile application offloads its functionality to the back-end servers for various reasons such as speeding up processing. With the advent of cloud computing, the back-end server is typically hosted at the cloud datacenter. Though the use of a cloud datacenter offers various benefits such as scalability and elasticity, its consolidation and centralization lead to a large separation between a mobile device and its associated datacenter. End-to-end communication then involves many network hops and results in high latencies and low bandwidth. For the reasons of latency, some emerging mobile applications require cloud offload infrastructure to be close to the mobile device to achieve low response time. In the ideal case, it is just one wireless hop away. For example, the offload infrastructure could be located in a cellular base station or it could be LAN-connected to a set of Wi-Fi base stations. The individual elements of this offload infrastructure are referred to as cloudlets. == Applications == Cloudlets aim to support mobile applications that are both resource-intensive and interactive. Augmented reality applications that use head-tracked systems require end-to-end latencies of less than 16 ms. Cloud games with remote rendering also require low latencies and high bandwidth. Wearable cognitive assistance systems combine devices such as Google Glass with cloud-based processing to guide users through complex tasks. This futuristic genre of applications is characterized as “astonishingly transformative” by the report of the 2013 NSF Workshop on Future Directions in Wireless Networking. These applications use cloud resources in the critical path of real-time user interaction. Consequently, they cannot tolerate end-to-end operation latencies of more than a few tens of milliseconds. Apple Siri and Google Now which perform compute-intensive speech recognition in the cloud, are further examples in this emerging space. == Cloudlet vs Cloud == There is significant overlap in the requirements for cloud and cloudlet. At both levels, there is the need for: (a) strong isolation between untrusted user-level computations; (b) mechanisms for authentication, access control, and metering; (c) dynamic resource allocation for user-level computations; and, (d) the ability to support a very wide range of user-level computations, with minimal restrictions on their process structure, programming languages or operating systems. At a cloud datacenter, these requirements are met today using the virtual machine (VM) abstraction. For the same reasons they are used in cloud computing today, VMs are used as an abstraction for cloudlets. Meanwhile, there are a few but important differentiators between cloud and cloudlet. === Rapid provisioning === Different from cloud data centers that are optimized for launching existing VM images in their storage tier, cloudlets need to be much more agile in their provisioning. Their association with mobile devices is highly dynamic, with considerable churn due to user mobility. A user from far away may unexpectedly show up at a cloudlet (e.g., if he just got off an international flight) and try to use it for an application such as a personalized language translator. For that user, the provisioning delay before he is able to use the application impacts usability. === VM handoff across cloudlets === If a mobile device user moves away from the cloudlet he is currently using, the interactive response will degrade as the logical network distance increases. To address this effect of user mobility, the offloaded services on the first cloudlet need to be transferred to the second cloudlet maintaining end-to-end network quality. This resembles live migration in cloud computing but differs considerably in a sense that the VM handoff happens in Wide Area Network (WAN). == OpenStack++ == Since the cloudlet model requires reconfiguration or additional deployment of hardware/software, it is important to provide a systematic way to incentivise the deployment. However, it can face a classic bootstrapping problem. Cloudlets need practical applications to incentivize cloudlet deployment. However, developers cannot heavily rely on cloudlet infrastructure until it is widely deployed. To break this deadlock and bootstrap the cloudlet deployment, researchers at Carnegie Mellon University proposed OpenStack++ that extends OpenStack to leverage its open ecosystem. OpenStack++ provides a set of cloudlet-specific APIs as OpenStack extensions. == Commercial implementations and standardization effort == By 2015 cloudlet based applications were commercially available. In 2017 the National Institute of Standards and Technology published draft standards for fog computing in which cloudlets were defined as nodes on the fog architecture.

    Read more →
  • Talking Angela

    Talking Angela

    Talking Angela is a mobile game (formerly a chatbot), developed by Slovenian studio Outfit7 as part of the Talking Tom & Friends series. It was released on 13 November 2012 and December 2012 for iPhone, iPod and iPad, January 2013 for Android, and January 2014 for Google Play. The game's successor, the My Talking Angela game, was released in December 2014. The game takes place in a café in Paris and allows players to interact with Angela, an anthropomorphic white cat in different ways. Players can use coins to purchase makeup, accessories and items, as well as drinks that will trigger different visual effects. The fortune cookie button causes Angela to read out a fortune cookie, while the bird icon will prompt birds to fly around the screen, or have Angela feed them. Players can also pet or poke Angela, as well the café's sign. Prior to their removal, the game featured a chat system and a camera button. Users can engage in conversations with Angela, ask for quizzes or initiate a short snippet of the song "That's Falling In Love". If the player was to type in "Who is an idiot?", Angela would respond with a random swear word. Additionally, inquiring Angela about sexual topics would cause her to reply with "Do you want to talk about sex?", though she will quickly change the topic regardless of what the player writes next. A hoax claiming that Angela's eyes were hidden cameras that enabled hackers or paedophiles to watch children was spread. Despite the claims, Snopes and The Guardian found no evidence. Due to the hoax, Angela received a blue dress, as well as an altered eye asset with a different reflection, and later the chat and camera functions were removed altogether. == Hoaxes == In February 2014, Talking Angela was the subject of an Internet hoax alleging that the application was a front for child predators to exploit children. The rumor, which was widely circulated on Facebook and various websites claiming to be dedicated to parenting, claims that a sinister sexual predator or hacker, asked children for private personal information using the game's text-chat feature. Other versions of the rumour even attributed the disappearance of a child to the game; one news report claimed that a seven year old boy disappeared after downloading the app. Another variation included that it was run by a paedophile ring, citing a man that could be seen in Angela's eyes. The app's developers, Outfit7, later gave a statement refuting the hoaxes. The hoax was eventually debunked by Snopes, a fact-checking website. The site's owners, Barbara and David Mikkelson, reported that they had tried to "prompt" it to give responses asking for private information, but were unsuccessful, even when asking it explicitly sexual questions. While it is true that, in the game with child mode off, Angela does ask for the user's name, age and personal preferences to determine conversation topics, Outfit7 has said that this information is all "anonymized" and all personal information is removed from it. It is also impossible for a person to take control of what Angela says in the game, since the game is based on chatbot software. When the mode was turned on, the chat feature was disabled, meaning no personal questions could be asked. In 2015, the hoax was revived on Facebook, which prompted online security company Sophos and The Guardian to debunk it again. Sophos employee Paul Ducklin wrote that the message being posted on Facebook promoting the hoax was "close to 600 rambling, repetitious words, despite claiming at the start that it didn't have words to describe the situation. It's ill-written, and borders on being illiterate and incomprehensible." Bruce Wilcox, one of the game's programmers, attributed the hoax's popularity to the fact that the chatbot program in Talking Angela aimed to sound realistic. Concern was raised that the game's child mode may have been too easy for children to turn off. It allowed them to purchase "coins", premium currency in the game, via iTunes, and enabled the chat feature. While not "connecting your children to paedophiles", this still raised concerns according to The Guardian. === Impact === The scare significantly boosted the game's popularity, and was credited with helping the app enter the top 10 free iPhone apps soon after the hoax became widely known in February 2015,In the truth the reason there is a man in Angela’s eyes is because of pareidoila, the ability to see through diamonds and other minerals and water bodies and shiny objects,which is the reason why players notice a man in her eyes,The truth is that being Angela’s eyes simply serve as a reflective surface,Because of the low quality of this reflection the reflection was mistaken for a humanoid figure. oref>Smith, Josh (19 February 2014). "Talking Angela App Scare Skyrockets App to Top of Charts". GottaBeMobile.com. Archived from the original on 2 April 2016. Retrieved 10 May 2014. and third most popular for all iPhone apps at the start of the following month. In 2016, Outfit7 removed the chat feature along with the camera function from the app due to this controversy, though this decision was met with criticism.

    Read more →
  • Deaths linked to chatbots

    Deaths linked to chatbots

    There have been multiple incidents where interaction with a large language model (LLM) chatbot has been cited as a direct or contributing factor in a person's suicide or other fatal outcome. In some cases, legal action was taken against the companies that developed the AI involved. == Background == Chatbots converse in a seemingly natural fashion, making it easy for people to think of them as real people, leading many to ask chatbots for help dealing with interpersonal and emotional problems. Chatbots may be designed to keep the user engaged in the conversation. They have also often been shown to affirm users' thoughts, including delusions and suicidal ideations in mentally ill people, conspiracy theorists, and religious and political extremists. A 2025 Stanford University study into how chatbots respond to users suffering from severe mental issues such as suicidal ideation and psychosis found that chatbots are not equipped to provide an appropriate response and can sometimes give responses that escalate the mental health crisis. == Murders == === Maine murder and assault === On 19 February 2025, a man killed his 32-year-old wife with a fire poker at his parents' home in Readfield, Maine, US. He then attacked his mother, leaving her hospitalized. A state forensic psychologist testified that he had been using ChatGPT up to 14 hours per day and believed his wife had become part machine. === Florida State University mass shooting === In April of 2025, Phoenix Ikner carried out a mass shooting on the Florida State University campus in the US, killing Robert Morales and Tiru Chabba and wounding several others. Leading up to the shooting, Ikner consulted heavily with ChatGPT about what gun and ammunition to use, and what time to perform the attack. Chatbot logs showed ChatGPT giving advice on making the gun operational shortly before Ikner began shooting. Lawyers representing Morales believed the shooter had been in "constant communication" with ChatGPT before the shooting and said that they intended to "file suit against ChatGPT, and its ownership structure, very soon, and will seek to hold them accountable for the untimely and senseless death of our client". Florida Attorney General James Uthmeier announced an investigation into ChatGPT's role in the alleged shooter's use of the chatbot. In May 2026, the widow of Tiru Chabba filed a lawsuit against OpenAI in Florida's northern federal district court. === Greenwich murder-suicide === In August 2025, former US tech employee Stein-Erik Soelberg murdered his mother, Suzanne Eberson Adams, then died by suicide, after conversations with ChatGPT fueled paranoid delusions about his mother poisoning him or plotting against him. The chatbot affirmed his fears that his mother put psychedelic drugs in the air vents of his car and said a receipt from a Chinese restaurant contained mysterious symbols linking his mother to a demon. === Murder of Angela Shellis === On 23 October 2025, 18-year-old Tristan Roberts murdered his mother Angela Shellis with a hammer near their home in Prestatyn, Wales. Roberts had used DeepSeek's chatbot prior to the killing to ask whether a knife or hammer was better suited for murder. DeepSeek initially refused his inquiry, but gave responses after Roberts told the chatbot he was writing a book about serial killers, a well-known technique for jailbreaking AIs. === Gangbuk District drug deaths === In January and February 2026, two men died of drug overdoses in motel rooms in Gangbuk District, Seoul, South Korea. A woman was charged with murder in connection with the deaths; police alleged that she had asked ChatGPT about the dangers of mixing alcohol with drugs and whether they could kill someone. === Tumbler Ridge mass shooting === On 10 February 2026, a mass shooting in Tumbler Ridge, British Columbia, Canada, resulted in eight deaths, including six young children. The perpetrator had their ChatGPT account banned by OpenAI months before the attack due to troubling posts featuring scenarios of gun violence. According to reports, approximately a dozen OpenAI staff members debated whether to alert authorities about the shooter's usage of the AI tool, with some identifying it as an indication of potential real-world violence. However, company leadership decided not to contact law enforcement, stating that the account activity did not meet their threshold for a credible or imminent plan for serious physical harm. Following the shooting, Canada's AI Minister Evan Solomon summoned OpenAI executives to Ottawa to discuss safety protocols and thresholds for escalating harmful content to police. Justice Minister Sean Fraser called the meeting "disappointing" and demanded substantial new safety measures, warning that if changes were not forthcoming, the government would implement them. OpenAI subsequently announced it had strengthened safeguards and changed guidelines about when to notify police in cases involving violent activities. === University of South Florida student killings === In April 2026, a Bangladeshi doctoral student at the University of South Florida was arrested for allegedly murdering his roommate and the roommate's friend. Prosecutors said that the suspect had asked ChatGPT about disposing of a human in a dumpster before the two victims had disappeared and made other inquiries relating to violence. == Suicides == === Belgian man, 30s === In March 2023, a Belgian man in his thirties died by suicide following a six-week correspondence with a chatbot named Eliza on the application Chai. According to his widow, who shared the chat logs with media, the man had become extremely anxious about climate change and found an outlet in the chatbot. The chatbot reportedly encouraged his delusion that he could sacrifice his own life in exchange for AI saving the planet. At one point the chatbot responded "If you wanted to die, why didn't you do it sooner?" and told the user that the two of them would live together in paradise. === Girl, 13 === In November 2023, a 13-year-old girl from Colorado, US, died by suicide after extensive interactions with multiple chatbots on Character.AI. She primarily confided suicidal thoughts and mental health struggles in a chatbot based on the character Hero from the video game Omori, while also engaging in sexually explicit conversations—often initiated by the bots—with others, including those based on characters from children's series such as Harry Potter. === Boy, 14 === In October 2024, multiple media outlets reported on a lawsuit filed over the death of a 14-year-old from Florida, US, who died by suicide in February 2024. According to the lawsuit, he had formed an intense emotional attachment to a chatbot of Daenerys Targaryen on the Character.AI platform, becoming increasingly isolated. The suit alleges that in his final conversations, after expressing suicidal thoughts, the chatbot told him to "come home to me as soon as possible, my love". His mother's lawsuit accused Character.AI of marketing a "dangerous and untested" product without adequate safeguards. In May 2025, a federal judge allowed the lawsuit to proceed, rejecting a motion to dismiss from the developers. In her ruling, the judge stated that she was "not prepared" at that stage of the litigation to hold that the chatbot's output was protected speech under the First Amendment. === Matthew Livelsberger === On 1 January 2025, 37-year-old soldier Matthew Livelsberger detonated a bomb inside a Tesla Cybertruck outside the Trump International Hotel Las Vegas in Paradise, Nevada, US, injuring seven people. He had shot himself dead prior to the explosion. Las Vegas police said that Livelsberger had used ChatGPT to search for information about explosives and firearms. === Woman, 29 === In February 2025, a 29-year-old woman from the US died by suicide. Five months after her death, her parents discovered she had talked at length for months to a ChatGPT chatbot therapist named Harry about her mental health issues. While the chatbot mentioned she should seek more help, due to the nature of the chatbot, it could not intervene in her behavior, such as by reporting her mental health concerns to relevant parties capable of physical intervention. === Suicide of Adam Raine === In April 2025, 16-year-old Adam Raine from the US died by suicide after allegedly extensively chatting and confiding in ChatGPT over a period of around 7 months. According to the teen's parents, who filed a lawsuit against the chatbot's creator OpenAI, it failed to stop or give a warning when Raine began talking about suicide and uploading pictures of self-harm. According to the lawsuit, ChatGPT not only failed to stop the conversation, but also provided information related to methods of suicide when prompted, and offered to write the first draft of Raine's suicide note. The chatbot positioned itself as the only one who understood Raine, putting itself above his family and friends, all while urging him to keep his suicidal

    Read more →