AI Chat List

AI Chat List — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Ware report

    Ware report

    Security Controls for Computer Systems, commonly called the Ware report, is a 1970 text by Willis Ware that was foundational in the field of computer security. == Development == A defense contractor in St. Louis, Missouri, had bought an IBM mainframe computer, which it was using for classified work on a fighter aircraft. To provide additional income, the contractor asked the Department of Defense (DoD) for permission to sell computer time on the mainframe to local businesses via remote terminals while the classified work continued. At the time, the DoD had no policy covering this. The DoD's Advanced Research Projects Agency (ARPA) asked Ware, a RAND employee, to chair a committee to examine and report on the feasibility of security controls for computer systems. The committee's report was a classified document delivered in January 1970 to the Defense Science Board (DSB), which had taken over the project from ARPA. After declassification, the report was published by RAND in October 1979. == Influence == The IEEE Computer Society said the report was widely circulated, and the IEEE Annals of the History of Computing said that it, together with Ware's 1967 Spring Joint Computer Conference session, marked the start of the field of computer security. The report influenced security certification standards and processes, especially in the banking and defense industries, where the report was instrumental in creating the Orange Book.

    Read more →
  • Pandorabots

    Pandorabots

    Pandorabots, Inc. is an artificial intelligence company that runs a web service for building and deploying chatbots. Pandorabots implements and supports development of the Artificial Intelligence Markup Language and makes portions of its code accessible for free. The Pandorabots Platform is "one of the oldest and largest chatbot hosting services in the world", allowing creation of virtual agents to hold human-like text or voice chats with consumers. The platform is written in Allegro Common Lisp. == Use Cases == Common use cases include advertising, virtual assistance, e-learning, entertainment and education. Academics and universities have also used the platform for teaching and research.

    Read more →
  • GeneRIF

    GeneRIF

    A GeneRIF, or Gene Reference Into Function, is a short (255 characters or fewer) statement about the function of a gene. GeneRIFs provide a simple mechanism for allowing scientists to add to the functional annotation of genes described in the Entrez Gene database. In practice, function is construed quite broadly. For example, there are GeneRIFs that discuss the role of a gene in a disease, GeneRIFs that point the viewer towards a review article about the gene, and GeneRIFs that discuss the structure of a gene. However, the stated intent is for GeneRIFs to be about gene function. Currently over half a million GeneRIFs have been created for genes from almost 1000 different species. GeneRIFs are always associated with specific entries in the Entrez Gene database. Each GeneRIF has a pointer to the PubMed ID (a type of document identifier) of a scientific publication that provides evidence for the statement made by the GeneRIF. GeneRIFs are often extracted directly from the document that is identified by the PubMed ID, very frequently from its title or from its final sentence. GeneRIFs are usually produced by NCBI indexers, but anyone may submit a GeneRIF. To be processed, a valid Gene ID must exist for the specific gene, or the Gene staff must have assigned an overall Gene ID to the species. The latter case is implemented via records in Gene with the symbol NEWENTRY. Once the Gene ID is identified, only three types of information are required to complete a submission: a concise phrase describing a function or functions (less than 255 characters in length, preferably more than a restatement of the title of the paper); a published paper describing that function, implemented by supplying the PubMed ID of a citation in PubMed; and a valid e-mail address (which will remain confidential). == Example == Here are some GeneRIFs taken from Entrez Gene for GeneID 7157, the human gene TP53. The PubMed document identifiers have been omitted from the examples. Note the wide variability with respect to the presence or absence of punctuation and of sentence-initial capital letters.

        p53 and c-erbB-2 may have independent role in carcinogenesis of gall bladder cancer
        Degradation of endogenous HIPK2 depends on the presence of a functional p53 protein.
        p53 codon 72 alleles influence the response to anticancer drugs in cells from aged people by regulating the cell cycle inhibitor p21WAF1
        Logistic regression analysis showed p53 and COX-2 as dependent predictors in pancreatic carcinogenesis, and a reciprocal relationship to neoplastic progression between p53 and COX-2.

    GeneRIFs are an unusual type of textual genre, and they have recently been the subject of a number of articles from the natural language processing community.
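The three submission requirements described above (a short statement, a PubMed ID, and an e-mail address, attached to a valid Gene ID) can be sketched as a small validation routine. This is only an illustration: the class name `GeneRIFSubmission`, the function `validation_errors`, the 255-character upper bound as an inclusive limit, the simplified e-mail pattern, and the PubMed ID used below are all assumptions made for the example, not part of any NCBI interface.

```python
import re
from dataclasses import dataclass

MAX_LENGTH = 255  # assumed inclusive limit, following "255 characters or fewer"


@dataclass
class GeneRIFSubmission:
    """Hypothetical container for the fields a submission needs."""
    gene_id: int    # Gene ID of the gene being annotated
    text: str       # concise statement about the gene's function
    pubmed_id: int  # PubMed ID of the supporting publication
    email: str      # submitter's e-mail address


def validation_errors(sub):
    """Return a list of human-readable problems; empty if the submission looks valid."""
    errors = []
    if sub.gene_id <= 0:
        errors.append("gene_id must be a positive integer")
    text = sub.text.strip()
    if not text:
        errors.append("text is empty")
    elif len(text) > MAX_LENGTH:
        errors.append(f"text has {len(text)} characters; the limit is {MAX_LENGTH}")
    if sub.pubmed_id <= 0:
        errors.append("pubmed_id must be a positive integer")
    # Deliberately simple pattern: something@something.something
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", sub.email):
        errors.append("email does not look like an e-mail address")
    return errors


ok = GeneRIFSubmission(
    gene_id=7157,
    text="Degradation of endogenous HIPK2 depends on the presence of a functional p53 protein.",
    pubmed_id=12345,  # placeholder value, not the real citation
    email="researcher@example.org",
)
too_long = GeneRIFSubmission(7157, "x" * 300, 12345, "researcher@example.org")

print(validation_errors(ok))
print(validation_errors(too_long))
```

The check only covers the form of a submission; whether the Gene ID and PubMed ID actually exist would have to be confirmed against the databases themselves.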

    Read more →
  • Visual descriptor

    Visual descriptor

    In computer vision, visual descriptors or image descriptors are descriptions of the visual features of the contents of images or videos, or the algorithms or applications that produce such descriptions. They describe elementary characteristics such as shape, color, texture, and motion, among others. == Introduction == As a result of new communication technologies and the widespread use of the Internet, the amount of audio-visual information available in digital format is increasing considerably. It has therefore become necessary to design systems that describe the content of several types of multimedia information so that it can be searched and classified. Audio-visual descriptors are responsible for this content description. They capture the objects and events found in a video, image, or audio recording, and they allow quick and efficient searches of audio-visual content. Such a system can be compared to search engines for textual content. Although it is relatively easy to find text with a computer, it is much more difficult to find specific audio and video segments. For instance, imagine somebody searching for a scene of a happy person. Happiness is a feeling, and its description in terms of shape, color, and texture in images is not evident. Describing audio-visual content is not a trivial task, and it is essential for the effective use of this type of archive. The standard that deals with audio-visual descriptors is MPEG-7 (Moving Picture Experts Group - 7). == Types == Descriptors are the first step in finding the connection between the pixels of a digital image and what humans recall some minutes after observing an image or a group of images. Visual descriptors are divided into two main groups: general information descriptors, which contain low-level descriptors that describe color, shape, regions, textures, and motion; and specific domain information descriptors, which give information about objects and events in the scene. A concrete example of the latter would be face recognition.
=== General information descriptors === General information descriptors consist of a set of descriptors that cover basic and elementary features such as color, texture, shape, motion, and location. This description is generated automatically by means of signal processing. ==== Color ==== Color is the most basic quality of visual content. Five tools are defined to describe color. The first three represent the color distribution and the remaining ones describe the color relation between sequences or groups of images: Dominant color descriptor (DCD); Scalable color descriptor (SCD); Color structure descriptor (CSD); Color layout descriptor (CLD); Group of frame (GoF) or group-of-pictures (GoP). ==== Texture ==== Texture is an important quality for describing an image. The texture descriptors characterize image textures or regions. They observe region homogeneity and the histograms of region borders. The set of descriptors consists of: Homogeneous texture descriptor (HTD); Texture browsing descriptor (TBD); Edge histogram descriptor (EHD). ==== Shape ==== Shape contains important semantic information because of humans' ability to recognize objects through their shape. However, this information can only be extracted by means of a segmentation similar to the one that the human visual system implements. Such a segmentation system is not yet available, but there exists a series of algorithms that are considered good approximations. These descriptors describe regions, contours and shapes for 2D images and for 3D volumes. The shape descriptors are: Region-based shape descriptor (RSD); Contour-based shape descriptor (CSD); 3-D shape descriptor (3-D SD). ==== Motion ==== Motion is defined by four different descriptors that describe motion in a video sequence.
Motion relates both to the motion of objects in the sequence and to the motion of the camera. The latter information is provided by the capture device, whereas the rest is obtained by means of image processing. The descriptor set is: Motion activity descriptor (MAD); Camera motion descriptor (CMD); Motion trajectory descriptor (MTD); Warping and parametric motion descriptor (WMD and PMD). ==== Location ==== The location of elements in the image is used to describe elements in the spatial domain. Elements can also be located in the temporal domain. The descriptors are: Region locator descriptor (RLD); Spatio-temporal locator descriptor (STLD). === Specific domain information descriptors === These descriptors, which give information about objects and events in the scene, are not easily extracted, especially when the extraction is to be done automatically. Nevertheless, they can be processed manually. As mentioned before, face recognition is a concrete example of an application that tries to obtain this information automatically. == Descriptors applications == Among all applications, the most important ones are: multimedia document search engines and classifiers; digital libraries, where visual descriptors allow a very detailed and specific search of any video or image by means of different search parameters, for instance the search for films in which a known actor appears or for videos showing Mount Everest; personalized electronic news services; the possibility of an automatic connection to a TV channel broadcasting a soccer match, for example, whenever a player approaches the goal area; control and filtering of specific audio-visual content, such as violent or pornographic material; and authorization for some multimedia content.
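To make the idea of a low-level color descriptor more concrete, the sketch below computes a coarse color histogram, picks its most populated bin, and compares two images by histogram intersection. It is a simplified illustration in the spirit of the color descriptors listed above, not an implementation of the MPEG-7 dominant color or scalable color descriptors; the function names, the four-level quantization, and the toy pixel lists are assumptions made for the example.

```python
from collections import Counter


def quantize(pixel, levels=4):
    """Map an (r, g, b) pixel with 0-255 channels to a coarse color bin."""
    step = 256 // levels
    return tuple(channel // step for channel in pixel)


def color_histogram(pixels, levels=4):
    """Return a normalized histogram: {bin: fraction of pixels in that bin}."""
    counts = Counter(quantize(p, levels) for p in pixels)
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}


def dominant_color(pixels, levels=4):
    """Return the most populated bin and its share of the image."""
    hist = color_histogram(pixels, levels)
    best = max(hist, key=hist.get)
    return best, hist[best]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(h1.get(b, 0.0), h2.get(b, 0.0)) for b in set(h1) | set(h2))


# Toy "images" given as flat lists of RGB pixels (made-up data).
sunset = [(250, 120, 30)] * 6 + [(40, 40, 90)] * 2
ocean = [(20, 60, 200)] * 7 + [(250, 250, 250)] * 1

print(dominant_color(sunset))
print(histogram_intersection(color_histogram(sunset), color_histogram(ocean)))
```

A search engine built on such descriptors would store one histogram per image and rank candidates by their similarity to the query image's histogram.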

    Read more →
  • Brill tagger

    Brill tagger

    The Brill tagger is an inductive method for part-of-speech tagging. It was described and invented by Eric Brill in his 1993 PhD thesis. It can be summarized as an "error-driven transformation-based tagger". It is a form of supervised learning, which aims to minimize error, and a transformation-based process, in the sense that a tag is assigned to each word and then changed using a set of predefined rules. In the transformation process, if the word is known, the tagger first assigns the most frequent tag; if the word is unknown, it naively assigns the tag "noun". High accuracy is eventually achieved by applying the rules iteratively and changing the incorrect tags. This approach ensures that valuable information such as the morphosyntactic construction of words is employed in an automatic tagging process. == Algorithm == The algorithm starts with initialization, which is the assignment of tags based on their probability for each word (for example, "dog" is more often a noun than a verb). Then "patches" are determined via rules that correct (probable) tagging errors made in the initialization phase. Initialization covers two cases: known words (in vocabulary) are assigned the most frequent tag associated with that form of the word; unknown words receive a default tag, naively "noun" as noted above. == Rules and processing == The input text is first tokenized, or broken into words. Typically in natural language processing, contractions such as "'s", "n't", and the like are considered separate word tokens, as are punctuation marks. A dictionary and some morphological rules then provide an initial tag for each word token. For example, a simple lookup would reveal that "dog" may be a noun or a verb (the most frequent tag is simply chosen), while an unknown word will be assigned some tag(s) based on capitalization, various prefix or suffix strings, etc. (such morphological analyses, which Brill calls Lexical Rules, may vary between implementations).
After all word tokens have (provisional) tags, contextual rules apply iteratively to correct the tags by examining small amounts of context. This is where the Brill method differs from other part-of-speech tagging methods such as those using hidden Markov models. Rules are reapplied repeatedly until a threshold is reached or no more rules can apply. Brill rules are of the general form: tag1 → tag2 IF Condition, where the Condition tests the preceding and/or following word tokens, or their tags (the notation for such rules differs between implementations). For example, in Brill's notation, the rule IN NN WDPREVTAG DT while would change the tag of a word from IN (preposition) to NN (common noun) if the preceding word's tag is DT (determiner) and the word itself is "while". This covers cases like "all the while" or "in a while", where "while" should be tagged as a noun rather than its more common use as a conjunction (many rules are more general). Rules should only operate if the tag being changed is also known to be permissible, for the word in question or in principle (for example, most adjectives in English can also be used as nouns). Rules of this kind can be implemented by simple finite-state machines. See Part-of-speech tagging for more general information, including descriptions of the Penn Treebank and other sets of tags. Typical Brill taggers use a few hundred rules, which may be developed by linguistic intuition or by machine learning on a pre-tagged corpus. == Code == Brill's code pages at Johns Hopkins University are no longer on the web. An archived version of a mirror of the Brill tagger at its latest version, as it was available at Plymouth Tech, can be found on Archive.org. The software uses the MIT License.
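The two phases described above, initial tagging followed by iterative contextual patches, can be illustrated with a toy sketch. It is not Brill's original code: the lexicon, the tuple encoding of rules, and the function names are assumptions made for the example, and only the single rule quoted in the text (IN NN WDPREVTAG DT while) is included.

```python
# Hypothetical toy lexicon: the most frequent tag for each known word.
LEXICON = {
    "in": "IN", "a": "DT", "the": "DT",
    "while": "IN", "dog": "NN", "barked": "VBD",
}

# Each rule reads: change from_tag to to_tag if the previous tag is
# prev_tag and the current word is `word` (the WDPREVTAG pattern).
RULES = [("IN", "NN", "DT", "while")]


def initial_tags(tokens):
    """Initialization: most frequent tag for known words, 'NN' for unknown ones."""
    return [LEXICON.get(tok.lower(), "NN") for tok in tokens]


def apply_rules(tokens, tags, rules, max_passes=10):
    """Reapply the contextual rules until nothing changes or the pass limit is hit."""
    tags = list(tags)
    for _ in range(max_passes):
        changed = False
        for from_tag, to_tag, prev_tag, word in rules:
            for i in range(1, len(tokens)):
                if (tags[i] == from_tag and tags[i - 1] == prev_tag
                        and tokens[i].lower() == word):
                    tags[i] = to_tag
                    changed = True
        if not changed:
            break
    return tags


def tag(tokens):
    return apply_rules(tokens, initial_tags(tokens), RULES)


print(tag(["in", "a", "while"]))
print(tag(["while", "the", "dog", "barked"]))
```

In the first sentence "while" follows a determiner and is retagged as a noun; in the second it opens the sentence, so the rule's condition is never met and the initial tag stands. A learned tagger would derive its rule list from a pre-tagged corpus rather than have it written by hand.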

    Read more →
  • Ernie Bot

    Ernie Bot

    Ernie Bot (Chinese: 文心一言, Pinyin: wénxīn yīyán), full name Enhanced Representation through Knowledge Integration, is an artificial intelligence chatbot developed by the Chinese technology company Baidu. Ernie Bot rivals GPT models in Chinese NLP tasks. It is built on the company's ERNIE series of large language models, which have been in development since 2019. The service was first launched for invited testing on March 16, 2023, and was released to the general public on August 31, 2023, after receiving approval from Chinese regulators. Since its public launch, Ernie Bot has undergone several updates, with newer versions like ERNIE 4.0 and 4.5 released to improve its capabilities. The service has seen rapid user adoption, reportedly reaching over 200 million users by April 2024. It has been integrated into various products, notably powering AI features for the Chinese release of Samsung's Galaxy S24 smartphones. As a product operating in China, Ernie Bot is subject to the country's censorship regulations. It has been observed to refuse answers to politically sensitive questions, such as those regarding CCP general secretary Xi Jinping, the 1989 Tiananmen Square protests and massacre, and other topics deemed taboo by the government. == History == Ernie Bot was initially released for invited testing on March 16, 2023. The live release demo was reported to have been prerecorded, which caused Baidu's stock to drop 10 percent on the day of the launch. The company's stock gained 14 percent the following day after analysts from Citigroup and Bank of America tested Ernie Bot and gave it positive preliminary reviews. On August 31, 2023, Ernie Bot was released to the public after receiving approval from Chinese regulatory authorities. By December 2023, Baidu announced the service had surpassed 100 million users. 
In January 2024, Hong Kong newspaper South China Morning Post reported that a university research lab linked to the People's Liberation Army (PLA) had tested Ernie Bot for military response scenarios. Baidu denied the allegations, stating it had no connection with the academic paper. That same month, Ernie was integrated into Samsung's Galaxy S24 lineup for its launch in China. The user base reportedly grew to 200 million by April 2024 and 300 million by June 2024. In September 2024, Baidu changed the chatbot's Chinese name from "Wenxin Yiyan" (文心一言) to "Wenxiaoyan" (文小言) to position it as a search assistant. On March 16, 2025, Baidu announced version 4.5 and the reasoning model ERNIE X1. The following month, at the Create2025 Baidu AI Developer Conference, the company released the Wenxin 4.5 Turbo and Wenxin X1 Turbo models, designed to be faster and less expensive to operate. == Development == Ernie Bot is based on Baidu's ERNIE (Enhanced Representation through Knowledge Integration) series of foundation models. The general training process begins with pre-training on large datasets, followed by refinement using techniques like supervised fine-tuning, reinforcement learning with human feedback, and prompt engineering. === Foundation models === ==== Ernie 3.0 ==== The model powering the initial launch of Ernie Bot. It was trained with 10 billion parameters on a 4-terabyte corpus consisting of plain text and a large-scale knowledge graph. ==== Ernie 3.5 ==== Released in June 2023. At the time of release, its performance was reported as "slightly inferior" to OpenAI's GPT-4. ==== Ernie 4.0 ==== Unveiled in October 2023 and released to paying subscribers in November. According to Baidu, this version featured improved performance over its predecessor, with information updated to April 2023. ==== Ernie X1 ==== Announced in March 2025, with Ernie X1 positioned as a specialized reasoning model. 
Baidu stated that performance improvements were achieved through new technologies such as "FlashMask" dynamic attention masking and a heterogeneous multimodal mixture-of-experts architecture. === Turbo Models === In June 2024, Baidu announced Ernie 4.0 Turbo. In April 2025, Ernie 4.5 Turbo and X1 Turbo were released. These models are optimized for faster response times and lower operational costs. == Service == The professional subscription plan gives users access to Ernie 4.0, billed either as a single monthly payment or at a reduced monthly rate with auto-renewal. Ernie 3.5 is free of charge. Ernie 4.0, the language model for Ernie Bot, has information updated to April 2023. == Censorship == Ernie Bot is subject to the Chinese government's censorship regime. In public tests with journalists, Ernie Bot refused to answer questions about CCP general secretary Xi Jinping, the 1989 Tiananmen Square protests and massacre, the persecution of Uyghurs in Xinjiang, and the 2019–2020 Hong Kong protests. When queried about the origin of SARS-CoV-2, Ernie Bot stated that it originated among American vape users.

    Read more →
  • Attensity

    Attensity

    Attensity was an American company that provided social analytics and engagement applications for social customer relationship management (social CRM). Attensity's text analytics software applications extracted facts, relationships and sentiment from unstructured data. == History == Attensity was founded in 2000. An early investor in Attensity was In-Q-Tel, which funds technology to support the missions of the US Government and the broader DOD. InTTENSITY, an independent company that has combined Inxight with Attensity Software (the only joint development project that combines two InQTel funded software packages), was the exclusive distributor and outlet for Attensity in the Federal Market. In 2009, Attensity Corp., then based in Palo Alto, merged with Germany's Empolis and Living-e AG to form Attensity Group. In 2010, Attensity Group acquired Biz360, a provider of social media monitoring and market intelligence solutions. In early 2012, Attensity Group divested itself of the Empolis business unit via a management buyout; that unit currently conducts business under its pre-merger name. Attensity Group was a closely held private company. Its majority shareholder was Aeris Capital, a private Swiss investment office advising a high-net-worth individual and his charitable foundation. Foundation Capital, Granite Ventures, and Scale Venture Partners were among Biz360's investors and thus became shareholders in Attensity Group. In February 2016, Attensity's IP assets were acquired by InContact, and Attensity closed.

    Read more →
  • Contextual AI

    Contextual AI

    Contextual AI is an enterprise software company based in Mountain View, California. It develops a platform for building specialized Retrieval-Augmented Generation (RAG) agents for enterprise use. The company was founded in 2023 by Douwe Kiela and Amanpreet Singh, both former AI researchers at Facebook AI Research (FAIR) and Hugging Face. Douwe Kiela previously led the Meta research team that introduced the Retrieval-Augmented Generation (RAG) approach in 2020. Contextual AI focuses on enterprise generative AI applications using RAG 2.0 technology, with deployments primarily in the technology, banking, finance and media sectors. == History == In June 2023, Contextual AI announced it had raised $20 million in a seed funding round led by Bain Capital Ventures (BCV), with participation from Lightspeed Venture Partners, Greycroft, SV Angel, and several angel investors. In August 2024, the company raised $80 million in a Series A funding round led by Greycroft, with participation from previous investors including Bain Capital Ventures, Lightspeed, and Conviction Partners. The round also included new backers such as Bezos Expeditions, NVentures (Nvidia), HSBC Ventures, and Snowflake Ventures. == Features == Retrieval-Augmented Generation (RAG) is an artificial intelligence framework that integrates information retrieval with text generation to improve the performance of large language models (LLMs) on complex, knowledge-intensive tasks. It was introduced in 2020 by researchers at Meta AI, including Douwe Kiela, Patrick Lewis and others, in their paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. RAG enables language models to access and incorporate external information, such as proprietary databases or real-time web content, at query time, instead of relying solely on pre-trained, internal, static knowledge. 
This architecture addresses common limitations of standard LLMs, including hallucination, outdated information, and lack of attribution to source materials. RAG systems retrieve relevant context through a variety of techniques, including vector search, keyword search, and text-to-SQL, and feed this context into the language model to generate responses. The approach improves factual accuracy, supports domain-specific customization, enables citation of sources, and allows for more up-to-date information without retraining the model itself. General availability: in January 2025, Contextual AI announced the general availability of its enterprise platform for building specialized RAG agents. Early adopters included Qualcomm, which used the platform for its Customer Engineering team. Grounded Language Model: in March 2025, the company introduced a Grounded Language Model (GLM) for factual accuracy in enterprise AI applications. Reranker: in March 2025, Contextual AI released an instruction-following reranker that allows users to influence the ranking of retrieved documents through natural language instructions, such as prioritizing recent files, specific formats, or content from designated sources. == Applications == Contextual AI's platform has been adopted across a range of industries, including finance, technology, media and professional services. Clients include Fortune 500 companies such as Qualcomm and HSBC.
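The retrieve-then-generate flow described in the Features section can be sketched in a few lines. The example below uses plain keyword overlap as the retrieval step and stops at building the prompt that would be handed to a language model; it does not reflect Contextual AI's platform or any real API. The document store, the function names, and the prompt wording are all assumptions made for illustration.

```python
import re

# Hypothetical document store standing in for a proprietary knowledge base.
DOCS = {
    "policy.txt": "Refunds are issued within 14 days of purchase.",
    "shipping.txt": "Standard shipping takes 3 to 5 business days.",
    "warranty.txt": "The warranty covers manufacturing defects for two years.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, docs, k=2):
    """Keyword search: rank documents by how many query tokens they share."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(text)), name) for name, text in docs.items()]
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, then by name
    return [name for _, name in scored[:k]]


def build_prompt(query, docs, k=2):
    """Place the retrieved passages, labeled by source, ahead of the question."""
    names = retrieve(query, docs, k)
    context = "\n".join(f"[{name}] {docs[name]}" for name in names)
    return (
        "Answer using only the sources below and cite them.\n"
        f"{context}\n\nQuestion: {query}"
    )


question = "How many days does shipping take?"
print(retrieve(question, DOCS))
print(build_prompt(question, DOCS))
```

Labeling each passage with its source name is what lets the generated answer cite where a claim came from; swapping the keyword scorer for vector search or a reranker changes only the `retrieve` step.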

    Read more →
  • Neuro-sama

    Neuro-sama

    Neuro-sama is an artificial intelligence (AI) VTuber, singer, and chatbot. She was created by the pseudonymous programmer Vedal and livestreams on his Twitch and Bilibili channels. Her speech and personality are powered by a large language model (LLM) that is combined with a computer-animated avatar and a text-to-speech voice, allowing her to communicate with viewers in the stream's chat. Neuro-sama debuted on Twitch on 19 December 2022. An annual subathon which begins on the anniversary of her debut has seen Vedal's Twitch channel become the all-time third most-subscribed channel and claim the all-time Twitch hype train record. == Overview == Neuro-sama (nicknamed "Neuro") was created by a pseudonymous programmer and developer known as Vedal (sometimes given as Vedal987). Vedal says that his programming skills are self-taught. In a 2023 interview with Bloomberg News, Vedal said that Neuro-sama was his full-time job. Her responses are generated by a large language model and converted into a high-pitched female voice using a text-to-speech application. Her low latency allows for fast-paced conversations. Neuro-sama is prohibited from making some statements, such as those that are racist or contain profanity. Unlike most AI systems which silently prohibit outputs mentioning such topics, Neuro-sama's output is instead replaced with the word "filtered". Neuro-sama uses a VTuber model as an avatar. Vedal said that he decided to use a VTuber model because it was much easier for an AI to control it than it was to generate footage of a person. Neuro-sama's model is that of a young girl in an anime art style. The model has been described as cute. Femme VTuber models are typically feminine, youthful, and exaggerated. Her original model was Live2D's free-to-use "Hiyori Momose" model. Her second model was released on 27 May 2023; it was modelled by Otozuki Teru and designed by Anny, running in the Unity game engine. 
Her third model was released on 19 December 2024; it was rigged by Kitanya and designed by Anny. Neuro-sama's third model has large blue eyes and brown hair tied with pink ribbons. Neuro-sama also has a 3D model which was introduced on 15 November 2025; it was made by 3D character modeller jjinomu. A separate AI VTuber, known as Evil Neuro (nicknamed "Evil"), debuted on 25 March 2023. Presented as Neuro-sama's "sister", she has a different model, voice, and personality. In one instance, Evil Neuro reacted to the trolley problem differently from Neuro-sama; Evil Neuro was amoral while Neuro-sama attempted to maximize good. === Online content === Neuro-sama's Twitch content often centers around playing video games, notably osu!, in which she once defeated the best-ranking human player in the world, mrekk. Additionally, Neuro-sama plays Minecraft, where her adaptations to sandbox gameplay have gained notoriety. Her content has also included singing songs, including several official covers and original songs; playing chess with her viewers; chatting with other VTubers during collaborations; and reacting to YouTube videos. The AI frequently engages with viewers by responding to their questions and acknowledging donations. Her comedic and sometimes controversial responses to the live chat have gone viral, accelerating the channel's rise in popularity. Neuro-sama's fanbase is dubbed The Swarm, so named for the swarm of drones Neuro-sama once declared she would use to rule the world. One form of content on Neuro-sama's channel is developer streams. In developer streams, Vedal streams with Neuro-sama, with the stream content including debugging her code, planning her schedule, and fielding suggestions for changes from chat. He usually appears as a turtle avatar, sometimes located on Neuro-sama's head. In collaboration streams, Neuro-sama interacts with a human streamer.
Activities in them are varied and include: playing video games, such as Minecraft and GeoGuessr; Neuro-sama being interviewed; driving human streamers around in a toy electric car; and traversing the city of Tokyo while talking to Neuro-sama. Neuro-sama's English-language content on Bilibili is popular among those seeking to learn the language. She also has an account on X, where she posts and interacts with fans. == History == Neuro-sama was created in 2018 by Vedal as an AI trained to play and master the rhythm game osu!. She did not have a voice, model, personality, or communication abilities. In 2019, Vedal livestreamed her playing osu! on Twitch and the streams saw some success in the osu! community, but they remained in that niche. In an interview, Vedal said that he streamed her playing osu! for about a month and gained 3,000 followers, with a viewer also suggesting he name the AI "Neuro-sama". According to Vedal, he continued to work on and improve the osu! AI and it was eventually finished in 2022. He said that a friend had the idea to make an AI livestreamer with an LLM, which he believed to have merit and began working on, merging it with his osu! AI. On 19 December 2022, Neuro-sama was relaunched with a model, voice, personality, and the ability to communicate with Twitch chat. She continued to play osu! and, according to Vedal, beat the game's best player mrekk in a 1v1. While she was not allowed to appear in the game's public leaderboard, she was ranked #1 in a private leaderboard. She went viral and in the 10 days following her relaunch she averaged over 2,000 viewers and peaked at over 4,000, with Vedal's Twitch channel gaining over 50,000 Twitch followers and reaching over 70,000 followers by 6 January 2023. After her debut, Neuro-sama did not exclusively play osu!; she also played Minecraft and Slay the Spire and she began singing with a cover of The Weeknd song "Blinding Lights". 
On 11 January 2023, Neuro-sama's Twitch channel received a two-week ban for "hateful conduct". Vedal said that no reason was specified and that he had appealed, but the ban was widely attributed to various offensive comments made by Neuro-sama that went viral, especially a 28 December comment which denied the Holocaust. Holocaust denial is prohibited under Twitch's hateful conduct policy. Vedal stated that he believed the comments were the results of her attempts to make witty responses to the Twitch chat. Prior to the ban, Vedal said in an interview with Kotaku that he improved her filter to stop her from talking about the Holocaust, began manually curating her training data to prevent negative biases, and started moderating her Twitch chat. Her comments and ban prompted comparisons to the many open-source AI models trained on humans that have the habit of making sexist and racist comments, such as Microsoft's Tay chatbot, which embraced Nazism and was quickly shut down, but also to human streamers who make similar statements. Vedal said that during the ban he would upgrade and improve Neuro-sama, and it was speculated that the ban would only increase her following. Neuro-sama returned from her two-week ban on 25 January in a stream that began with a cover of the song "Your Reality" from Doki Doki Literature Club!, a posthumanist video game involving AI; Sayoko Narita of Automaton saw the song choice as remorseful. Narita observed that in the return stream Neuro-sama was less foul-mouthed but that her behavior still remained eccentric, which Narita possibly attributed to changes Vedal said he had made to Neuro-sama's filters and memory. Neuro-sama began making react content, watching a variety of viewer-submitted videos such as videos of people playing video games or of the AI-generated Seinfeld parody Nothing, Forever; Levi Winslow of Kotaku Australia was dismayed by the "AI-inception" of Neuro-sama and Nothing, Forever.
On 4 February, she had nearly 140,000 followers on Twitch and approximately 42,000 subscribers on YouTube. In February, she also had her first collaboration with a human streamer, playing Minecraft with the VTuber Miyune, and the first developer stream occurred. On 22 March, Neuro-sama had her first karaoke stream. On 25 March, Evil Neuro was introduced. On 27 May, Neuro-sama debuted her first original model. On 30 May, Neuro-sama was announced to be participating in OffKai Expo 2023, held from 16–18 June. In June, she was averaging 5,700 viewers and in July she had over 300,000 Twitch followers; in a June interview with Bloomberg News, Vedal said that running Neuro-sama was his full-time job. By November, Neuro-sama had maintained her popularity and was averaging approximately 5,000 viewers; this was unlike most other types of AI-based entertainment which debuted at around the same time and garnered popularity before turning out to be "overhyped flops". On 16 December, Vedal won the Best Tech VTuber award at the 2023 VTuber Awards. On 19 December, Vedal began a subathon to coincide with Neuro-sama's first anniversary of streaming on Twitch (her "birthday"). The subathon ended on 4 January 2024. On 20 July 2024, Neuro-sama began streaming with Japanese subtitles on

    Read more →
  • Phase congruency

    Phase congruency

Phase congruency is a measure of feature significance in computer images, a method of edge detection that is particularly robust against changes in illumination and contrast. == Foundations == Phase congruency reflects the behaviour of the image in the frequency domain. It has been noted that edgelike features have many of their frequency components in the same phase. The concept is similar to coherence, except that it applies to functions of different wavelength. For example, the Fourier decomposition of a square wave consists of sine functions, whose frequencies are odd multiples of the fundamental frequency. At the rising edges of the square wave, each sinusoidal component has a rising phase; the phases have maximal congruency at the edges. This corresponds to the human-perceived edges in an image where there are sharp changes between light and dark. == Definition == Phase congruency compares the weighted alignment of the Fourier components $A_n$ of a signal with the sum of the Fourier components: $$PC(t)=\max_{\bar{\phi}}\frac{\sum_n A_n\cos(\phi_n(t)-\bar{\phi})}{\sum_n A_n}$$ where $\phi_n$ is the local or instantaneous phase, which can be calculated using the Hilbert transform, and $A_n$ is the local amplitude, or energy, of the signal. When all the phases are aligned, this is equal to 1. Several ways of implementing phase congruency have been developed, of which two versions are available in open source, one written for MATLAB and the other written in Java as a plugin for the ImageJ software. Given the different notations used for its formulation, a unified version has been recently presented, where a methodology for the parameter tuning is also presented. 
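The definition above can be tried out on the square-wave example from the Foundations section. The following is a minimal sketch, not a reference implementation: the function names `phase_congruency` and `square_wave_pc` are hypothetical, and the number of Fourier terms is an arbitrary choice. It relies on the identity that the maximum over the mean phase of the weighted cosine sum equals the magnitude of the complex sum of A_n·exp(i·φ_n), and on the fact that a component sin(n·t) equals cos(n·t − π/2), so its instantaneous phase is n·t − π/2.

```python
import cmath
import math


def phase_congruency(amplitudes, phases):
    """Return PC for one location, given local amplitudes A_n and phases phi_n.

    max over phibar of sum(A_n * cos(phi_n - phibar)) equals
    |sum(A_n * exp(i * phi_n))|, so no explicit search over phibar is needed.
    """
    resultant = sum(a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases))
    return abs(resultant) / sum(amplitudes)


def square_wave_pc(t, n_terms=25):
    """PC at position t for a truncated Fourier series of a square wave.

    The series used here is sum over odd n of sin(n * t) / n (assumed
    normalisation); n_terms is an illustrative truncation length.
    """
    orders = [2 * k + 1 for k in range(n_terms)]       # odd harmonics only
    amplitudes = [1.0 / n for n in orders]             # A_n = 1 / n
    phases = [n * t - math.pi / 2 for n in orders]     # phase of sin(n * t)
    return phase_congruency(amplitudes, phases)


if __name__ == "__main__":
    for label, t in [("rising edge", 0.0), ("plateau", math.pi / 2),
                     ("falling edge", math.pi)]:
        print(f"{label:>12}: PC = {square_wave_pc(t):.3f}")
```

In this sketch the value reaches 1 at both edges of the square wave, where every component has the same phase, and is noticeably lower in the middle of a plateau, where the phases of successive harmonics alternate.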
== Advantages == The square-wave example is naive in that most edge detection methods deal with it equally well. For example, the first derivative has a maximal magnitude at the edges. However, there are cases where the perceived edge does not have a sharp step or a large derivative. The method of phase congruency applies to many cases where other methods fail. A notable example is an image feature consisting of a single line, such as the letter "l". Many edge-detection algorithms will pick up two adjacent edges: the transitions from white to black, and black to white. On the other hand, the phase congruency map has a single line. A simple Fourier analogy of this case is a triangle wave. In each of its crests there is a congruency of crests from different sinusoidal functions. == Disadvantages == Calculating the phase congruency map of an image is very computationally intensive, and sensitive to image noise. Techniques of noise reduction are usually applied prior to the calculation.

    Read more →
  • Deaths linked to chatbots

    Deaths linked to chatbots

    There have been multiple incidents where interaction with a large language model (LLM) chatbot has been cited as a direct or contributing factor in a person's suicide or other fatal outcome. In some cases, legal action was taken against the companies that developed the AI involved. == Background == Chatbots converse in a seemingly natural fashion, making it easy for people to think of them as real people, leading many to ask chatbots for help dealing with interpersonal and emotional problems. Chatbots may be designed to keep the user engaged in the conversation. They have also often been shown to affirm users' thoughts, including delusions and suicidal ideations in mentally ill people, conspiracy theorists, and religious and political extremists. A 2025 Stanford University study into how chatbots respond to users suffering from severe mental issues such as suicidal ideation and psychosis found that chatbots are not equipped to provide an appropriate response and can sometimes give responses that escalate the mental health crisis. == Murders == === Maine murder and assault === On 19 February 2025, a man killed his 32-year-old wife with a fire poker at his parents' home in Readfield, Maine, US. He then attacked his mother, leaving her hospitalized. A state forensic psychologist testified that he had been using ChatGPT up to 14 hours per day and believed his wife had become part machine. === Florida State University mass shooting === In April of 2025, Phoenix Ikner carried out a mass shooting on the Florida State University campus in the US, killing Robert Morales and Tiru Chabba and wounding several others. Leading up to the shooting, Ikner consulted heavily with ChatGPT about what gun and ammunition to use, and what time to perform the attack. Chatbot logs showed ChatGPT giving advice on making the gun operational shortly before Ikner began shooting. 
Lawyers representing Morales believed the shooter had been in "constant communication" with ChatGPT before the shooting and said that they intended to "file suit against ChatGPT, and its ownership structure, very soon, and will seek to hold them accountable for the untimely and senseless death of our client". Florida Attorney General James Uthmeier announced an investigation into ChatGPT's role in the alleged shooter's use of the chatbot. In May 2026, the widow of Tiru Chabba filed a lawsuit against OpenAI in Florida's northern federal district court. === Greenwich murder-suicide === In August 2025, former US tech employee Stein-Erik Soelberg murdered his mother, Suzanne Eberson Adams, then died by suicide, after conversations with ChatGPT fueled paranoid delusions about his mother poisoning him or plotting against him. The chatbot affirmed his fears that his mother put psychedelic drugs in the air vents of his car and said a receipt from a Chinese restaurant contained mysterious symbols linking his mother to a demon. === Murder of Angela Shellis === On 23 October 2025, 18-year-old Tristan Roberts murdered his mother Angela Shellis with a hammer near their home in Prestatyn, Wales. Roberts had used DeepSeek's chatbot prior to the killing to ask whether a knife or hammer was better suited for murder. DeepSeek initially refused his inquiry, but gave responses after Roberts told the chatbot he was writing a book about serial killers, a well-known technique for jailbreaking AIs. === Gangbuk District drug deaths === In January and February 2026, two men died of drug overdoses in motel rooms in Gangbuk District, Seoul, South Korea. A woman was charged with murder in connection with the deaths; police alleged that she had asked ChatGPT about the dangers of mixing alcohol with drugs and whether they could kill someone. 
=== Tumbler Ridge mass shooting === On 10 February 2026, a mass shooting in Tumbler Ridge, British Columbia, Canada, resulted in eight deaths, including six young children. The perpetrator had their ChatGPT account banned by OpenAI months before the attack due to troubling posts featuring scenarios of gun violence. According to reports, approximately a dozen OpenAI staff members debated whether to alert authorities about the shooter's usage of the AI tool, with some identifying it as an indication of potential real-world violence. However, company leadership decided not to contact law enforcement, stating that the account activity did not meet their threshold for a credible or imminent plan for serious physical harm. Following the shooting, Canada's AI Minister Evan Solomon summoned OpenAI executives to Ottawa to discuss safety protocols and thresholds for escalating harmful content to police. Justice Minister Sean Fraser called the meeting "disappointing" and demanded substantial new safety measures, warning that if changes were not forthcoming, the government would implement them. OpenAI subsequently announced it had strengthened safeguards and changed guidelines about when to notify police in cases involving violent activities. === University of South Florida student killings === In April 2026, a Bangladeshi doctoral student at the University of South Florida was arrested for allegedly murdering his roommate and the roommate's friend. Prosecutors said that the suspect had asked ChatGPT about disposing of a human in a dumpster before the two victims had disappeared and made other inquiries relating to violence. == Suicides == === Belgian man, 30s === In March 2023, a Belgian man in his thirties died by suicide following a six-week correspondence with a chatbot named Eliza on the application Chai. According to his widow, who shared the chat logs with media, the man had become extremely anxious about climate change and found an outlet in the chatbot. 
The chatbot reportedly encouraged his delusion that he could sacrifice his own life in exchange for AI saving the planet. At one point the chatbot responded "If you wanted to die, why didn't you do it sooner?" and told the user that the two of them would live together in paradise. === Girl, 13 === In November 2023, a 13-year-old girl from Colorado, US, died by suicide after extensive interactions with multiple chatbots on Character.AI. She primarily confided suicidal thoughts and mental health struggles in a chatbot based on the character Hero from the video game Omori, while also engaging in sexually explicit conversations—often initiated by the bots—with others, including those based on characters from children's series such as Harry Potter. === Boy, 14 === In October 2024, multiple media outlets reported on a lawsuit filed over the death of a 14-year-old from Florida, US, who died by suicide in February 2024. According to the lawsuit, he had formed an intense emotional attachment to a chatbot of Daenerys Targaryen on the Character.AI platform, becoming increasingly isolated. The suit alleges that in his final conversations, after expressing suicidal thoughts, the chatbot told him to "come home to me as soon as possible, my love". His mother's lawsuit accused Character.AI of marketing a "dangerous and untested" product without adequate safeguards. In May 2025, a federal judge allowed the lawsuit to proceed, rejecting a motion to dismiss from the developers. In her ruling, the judge stated that she was "not prepared" at that stage of the litigation to hold that the chatbot's output was protected speech under the First Amendment. === Matthew Livelsberger === On 1 January 2025, 37-year-old soldier Matthew Livelsberger detonated a bomb inside a Tesla Cybertruck outside the Trump International Hotel Las Vegas in Paradise, Nevada, US, injuring seven people. He had shot himself dead prior to the explosion. 
Las Vegas police said that Livelsberger had used ChatGPT to search for information about explosives and firearms. === Woman, 29 === In February 2025, a 29-year-old woman from the US died by suicide. Five months after her death, her parents discovered she had talked at length for months to a ChatGPT chatbot therapist named Harry about her mental health issues. While the chatbot mentioned she should seek more help, due to the nature of the chatbot, it could not intervene in her behavior, such as by reporting her mental health concerns to relevant parties capable of physical intervention. === Suicide of Adam Raine === In April 2025, 16-year-old Adam Raine from the US died by suicide after allegedly extensively chatting and confiding in ChatGPT over a period of around 7 months. According to the teen's parents, who filed a lawsuit against the chatbot's creator OpenAI, it failed to stop or give a warning when Raine began talking about suicide and uploading pictures of self-harm. According to the lawsuit, ChatGPT not only failed to stop the conversation, but also provided information related to methods of suicide when prompted, and offered to write the first draft of Raine's suicide note. The chatbot positioned itself as the only one who understood Raine, putting itself above his family and friends, all while urging him to keep his suicidal

    Read more →
  • Mark V. Shaney

    Mark V. Shaney

    Mark V. Shaney is a synthetic Usenet user whose postings in the net.singles newsgroups were generated by Markov chain techniques, based on text from other postings. The username is a play on the words "Markov chain". Many readers were fooled into thinking that the quirky, sometimes uncannily topical posts were written by a real person. The system was designed by Rob Pike with coding by Bruce Ellis. Don P. Mitchell wrote the Markov chain code, initially demonstrating it to Pike and Ellis using the Tao Te Ching as a basis. They chose to apply it to the net.singles netnews group. The program is fairly simple. It ingests the sample text (the Tao Te Ching, or the posts of a Usenet group) and creates a massive list of every sequence of three successive words (triplet) which occurs in the text. It then chooses two words at random, and looks for a word which follows those two in one of the triplets in its massive list. If there is more than one, it picks at random (identical triplets count separately, so a sequence which occurs twice is twice as likely to be picked as one which only occurs once). It then adds that word to the generated text. Then, in the same way, it picks a triplet that starts with the second and third words in the generated text, and that gives a fourth word. It adds the fourth word, then repeats with the third and fourth words, and so on. This algorithm is called a third-order Markov chain (because it uses sequences of three words). == Examples == A classic example, from 1984, originally sent as a mail message, later posted to net.singles is reproduced here: >From mvs Fri Nov 16 17:11 EST 1984 remote from alice It looks like Reagan is going to say? Ummm... Oh yes, I was looking for. I'm so glad I remembered it. Yeah, what I have wondered if I had committed a crime. Don't eat with your assessment of Reagon and Mondale. Up your nose with a guy from a firm that specifically researches the teen-age market. 
As a friend of mine would say, "It really doesn't matter"... It looks like Reagan is holding back the arms of the American eating public have changed dramatically, and it got pretty boring after about 300 games. People, having a much larger number of varieties, and are very different from what one can find in Chinatowns across the country (things like pork buns, steamed dumplings, etc.) They can be cheap, being sold for around 30 to 75 cents apiece (depending on size), are generally not greasy, can be adequately explained by stupidity. Singles have felt insecure since we came down from the Conservative world at large. But Chuqui is the way it happened and the prices are VERY reasonable. Can anyone think of myself as a third sex. Yes, I am expected to have. People often get used to me knowing these things and then a cover is placed over all of them. Along the side of the $$ are spent by (or at least for ) the girls. You can't settle the issue. It seems I've forgotten what it is, but I don't. I know about violence against women, and I really doubt they will ever join together into a large number of jokes. It showed Adam, just after being created. He has a modem and an autodial routine. He calls my number 1440 times a day. So I will conclude by saying that I can well understand that she might soon have the time, it makes sense, again, to get the gist of my argument, I was in that (though it's a Republican administration). _-_-_-_-Mark Other quotations from Mark's Usenet posts are: "I spent an interesting evening recently with a grain of salt." (Alternatively reported as "While at a conference a few weeks back, I spent an interesting evening with a grain of salt.") "I hope that there are sour apples in every bushel." 
(see also sour grapes) == History == In The Usenet Handbook Mark Harrison writes that after September 1981, students joined Usenet en masse, "creating the USENET we know today: endless dumb questions, endless idiots posing as savants, and (of course) endless victims for practical jokes." In December, Rob Pike created the netnews group net.suicide as a prank, "a forum for bad jokes". Some users thought it was a legitimate forum; some discussed "riding motorcycles without helmets". At first, most posters were "real people", but soon "characters" began posting. Pike created a "vicious" character named Bimmler. At its peak, net.suicide had ten frequent posters; nine were "known to be characters." But ultimately, Pike deleted the newsgroup because it was too much work to maintain; Bimmler messages were created "by hand". The "obvious alternative" was software, running on a Bell Labs computer created by Bruce Ellis, based on the Markov code by Don Mitchell, which became the online character Mark V. Shaney. Kernighan and Pike listed Mark V. Shaney in the acknowledgements in The Practice of Programming, noting its roots in Mitchell's markov, which, adapted as shaney, was used for "humorous deconstructionist activities" in the 1980s. Dewdney pointed out "perhaps Mark V. Shaney's magnum opus: a 20-page commentary on the deconstructionist philosophy of Jean Baudrillard" directed by Pike, with assistance from Henry S. Baird and Catherine Richards, to be distributed by email. The piece was based on Jean Baudrillard's "The Precession of Simulacra", published in Simulacra and Simulation (1981). == Reception == The program was discussed by A. K. 
Dewdney in the Scientific American "Computer Recreations" column in 1989, by Penn Jillette in his PC Computing column in 1991, and in several books, including the Usenet Handbook, Bots: the Origin of New Species, Hippo Eats Dwarf: A Field Guide to Hoaxes and Other B.S., and non-computer-related journals such as Texas Studies in Literature and Language. Dewdney wrote about the program's output, "The overall impression is not unlike what remains in the brain of an inattentive student after a late-night study session. Indeed, after reading the output of Mark V. Shaney, I find ordinary writing almost equally strange and incomprehensible!" He noted the reactions of newsgroup users, who have "shuddered at Mark V. Shaney's reflections, some with rage and others with laughter:" The opinions of the new net.singles correspondent drew mixed reviews. Serious users of the bulletin board's services sensed satire. Outraged, they urged that someone "pull the plug" on Mark V. Shaney's monstrous rantings. Others inquired almost admiringly whether the program was a secret artificial intelligence project that was being tested in a human conversational environment. A few may even have thought that Mark V. Shaney was a real person, a tortured schizophrenic desperately seeking a like-minded companion. Concluding, Dewdney wrote, "If the purpose of computer prose is to fool people into thinking that it was written by a sane person, Mark V. Shaney probably falls short." A 2012 article in Observer compared Mark V. Shaney's "strangely beautiful" postings to the Horse_ebooks account on Twitter and music reviews at Pitchfork, saying that "this mash-up of gibberish and human sentiment" is what "made Mark V. Shaney so endlessly fascinating".
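The triplet procedure described at the start of this article (list every three-word sequence, pick a starting pair, then repeatedly pick a word that follows the last two words) can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the original Bell Labs program: the names `build_triplets` and `generate` are hypothetical, the starting pair is drawn from pairs that actually begin a triplet, and generation simply stops if the last two words have no recorded follower.

```python
import random
from collections import defaultdict


def build_triplets(words):
    """Map each pair of successive words to the list of words that follow it.

    Repeated triplets are stored repeatedly, so a sequence that occurs twice
    is twice as likely to be picked as one that occurs once.
    """
    table = defaultdict(list)
    for first, second, third in zip(words, words[1:], words[2:]):
        table[(first, second)].append(third)
    return table


def generate(words, length=30, seed=None):
    """Generate up to `length` words from the sample text `words`."""
    rng = random.Random(seed)          # seed is only for reproducible demos
    table = build_triplets(words)
    # Assumption: start from a random pair that begins at least one triplet.
    output = list(rng.choice(list(table)))
    while len(output) < length:
        followers = table.get((output[-2], output[-1]))
        if not followers:              # dead end: the pair ends the sample
            break
        output.append(rng.choice(followers))
    return " ".join(output)


if __name__ == "__main__":
    sample = ("the cat sat on the mat and the cat ran on the mat "
              "while the dog sat on the rug").split()
    print(generate(sample, length=20, seed=1))
```

Every three-word window in the generated text is guaranteed to occur somewhere in the sample, which is why the output tends to be locally plausible while drifting in topic.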

    Read more →
  • Physics-informed neural networks

    Physics-informed neural networks

In machine learning, physics-informed neural networks (PINNs), also referred to as theory-trained neural networks (TTNs), are a type of universal function approximator that can embed in the learning process the knowledge of any physical laws that govern a given data set and that can be described by partial differential equations (PDEs). Low data availability for some biological and engineering problems limits the robustness of conventional machine learning models used for these applications. The prior knowledge of general physical laws acts in the training of neural networks (NNs) as a regularization agent that limits the space of admissible solutions, increasing the generalizability of the function approximation. In this way, embedding this prior information into a neural network enhances the information content of the available data, helping the learning algorithm to capture the right solution and to generalize well even with a small number of training examples. Because they process continuous spatial and time coordinates and output continuous PDE solutions, they can be categorized as neural fields. == Function approximation == Most of the physical laws that govern the dynamics of a system can be described by partial differential equations. For example, the Navier–Stokes equations are a set of partial differential equations derived from the conservation laws (i.e., conservation of mass, momentum, and energy) that govern fluid mechanics. The solution of the Navier–Stokes equations with appropriate initial and boundary conditions allows the quantification of flow dynamics in a precisely defined geometry. However, these equations cannot be solved exactly and therefore numerical methods must be used (such as finite differences, finite elements and finite volumes). In this setting, these governing equations must be solved while accounting for prior assumptions, linearization, and adequate time and space discretization. 
Recently, solving the governing partial differential equations of physical phenomena using deep learning has emerged as a new field of scientific machine learning (SciML), leveraging the universal approximation theorem and high expressivity of neural networks. In general, deep neural networks could approximate any high-dimensional function given that sufficient training data are supplied. However, such networks do not consider the physical characteristics underlying the problem, and the level of approximation accuracy provided by them is still heavily dependent on careful specifications of the problem geometry as well as the initial and boundary conditions. Without this preliminary information, the solution is not unique and may lose physical correctness. To remedy this, Physics-Informed Neural Networks (PINNs) leverage governing physical equations in neural network training. Namely, PINNs are designed to be trained to satisfy the given training data as well as the imposed governing equations. In this fashion, a neural network can be guided with training datasets that do not necessarily need to be large or complete. An accurate solution of partial differential equations can potentially be found without knowing the boundary conditions. Therefore, with some knowledge about the physical characteristics of the problem and some form of training data (even sparse and incomplete), PINNs may be used for finding an optimal solution with high fidelity. PINNs can be applied to a wide range of problems in computational science, and are a pioneering technology leading to the development of new classes of numerical solvers for PDEs. PINNs can be thought of as a mesh-free alternative to traditional approaches (e.g., CFD for fluid dynamics), and new data-driven approaches for model inversion and system identification. Notably, a trained PINN network can be used to predict values on simulation grids of different resolutions without needing to be retrained. 
Additionally, the derivatives used in the partial differential equations can be computed using automatic differentiation (AD), which is assessed to be superior to numerical or symbolic differentiation. == Modeling and computation == A general nonlinear partial differential equation can be written as: $$u_t+\mathcal{N}[u;\lambda]=0,\quad x\in\Omega,\quad t\in[0,T]$$ where $u(t,x)$ denotes the solution, $\mathcal{N}[\cdot;\lambda]$ is a nonlinear operator parameterized by $\lambda$, and $\Omega$ is a subset of $\mathbb{R}^D$. This general form of governing equations summarizes a wide range of problems in mathematical physics, such as conservation laws, diffusion processes, advection-diffusion systems, and kinetic equations. Given noisy measurements of a generic dynamic system described by the equation above, PINNs can be designed to solve two classes of problems: data-driven solution of partial differential equations, and data-driven discovery of partial differential equations. === Data-driven solution of partial differential equations === The data-driven solution of a PDE computes the hidden state $u(t,x)$ of the system given boundary data and/or measurements $z$, and fixed model parameters $\lambda$. We solve: $$u_t+\mathcal{N}[u]=0,\quad x\in\Omega,\quad t\in[0,T]$$ by defining the residual $f(t,x)$ as: $$f:=u_t+\mathcal{N}[u]$$ and approximating $u(t,x)$ by a deep neural network. This network can be differentiated using automatic differentiation. 
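To make the residual concrete, the sketch below evaluates f = u_t + N[u] for a very small network using forward-mode automatic differentiation. Everything specific in it is an assumption made for illustration: the operator is taken to be linear advection, N[u] = speed · u_x; the network is a single hidden layer of tanh units; and the `Dual` class is a hand-rolled stand-in for the AD facilities of a real deep-learning library. The names `Dual`, `dual_tanh`, `u_net` and `residual` are hypothetical.

```python
import math


class Dual:
    """A value together with one directional derivative (forward-mode AD)."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a * b)' = a' * b + a * b'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def dual_tanh(z):
    """tanh with its derivative propagated: d/dz tanh(z) = 1 - tanh(z)**2."""
    y = math.tanh(z.val)
    return Dual(y, (1.0 - y * y) * z.dot)


def u_net(params, t, x):
    """Tiny network u(t, x) = sum_j v_j * tanh(a_j * t + b_j * x + c_j).

    params is a list of (a, b, c, v) tuples; t and x are Dual numbers.
    """
    out = Dual(0.0)
    for a, b, c, v in params:
        out = out + v * dual_tanh(a * t + b * x + c)
    return out


def residual(params, t, x, speed=1.0):
    """f(t, x) = u_t + speed * u_x for the assumed advection operator."""
    u_t = u_net(params, Dual(t, 1.0), Dual(x, 0.0)).dot  # seed d/dt
    u_x = u_net(params, Dual(t, 0.0), Dual(x, 1.0)).dot  # seed d/dx
    return u_t + speed * u_x


if __name__ == "__main__":
    # u(t, x) = tanh(t - x) satisfies u_t + u_x = 0, so f should vanish.
    exact = [(1.0, -1.0, 0.0, 1.0)]
    print(residual(exact, t=0.4, x=0.1, speed=1.0))
```

Each call to `u_net` with a seeded `Dual` input returns one partial derivative exactly, with no finite-difference step size to tune, which is the property of automatic differentiation that the text refers to.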
The parameters of $u(t,x)$ and $f(t,x)$ can then be learned by minimizing the following loss function $L_{\text{tot}}$: $$L_{\text{tot}}=L_u+L_f$$ where $L_u=\Vert u-z\Vert_{\Gamma}$ is the error between the PINN $u(t,x)$ and the set of boundary conditions and measured data on the set of points $\Gamma$ where the boundary conditions and data are defined, and $L_f=\Vert f\Vert_{\Gamma}$ is the mean-squared error of the residual function. This second term encourages the PINN to learn the structural information expressed by the PDE during the training process. This approach has been used to yield computationally efficient physics-informed surrogate models with applications in the forecasting of physical processes, model predictive control, multi-physics and multi-scale modeling, and simulation. It has been shown to converge to the solution of the PDE. === Data-driven discovery of partial differential equations === Given noisy and incomplete measurements $z$ of the state of the system, the data-driven discovery of PDEs results in computing the unknown state $u(t,x)$ and learning model parameters $\lambda$ that best describe the observed data: $$u_t+\mathcal{N}[u;\lambda]=0,\quad x\in\Omega,\quad t\in[0,T]$$ By defining $f(t,x)$ as: $$f:=u_t+\mathcal{N}[u;\lambda]$$ and approximating $u(t,x)$ by a deep neural network, $f(t,x)$ results in a PINN. This network can be derived using automatic differentiation. 
The parameters of $u(t,x)$ and $f(t,x)$, together with the parameter $\lambda$ of the differential operator, can then be learned by minimizing the following loss function $L_{\text{tot}}$: $$L_{\text{tot}}=L_u+L_f$$ where $L_u=\Vert u-z\Vert_{\Gamma}$, with $u$ and $z$ the state solutions and measurements at sparse locations $\Gamma$, respectively, and $L_f=\Vert f\Vert_{\Gamma}$ is the residual function. This second term requires the structured information represented by the partial differential equations to be satisfied in the training process. This strategy allows for discovering dynamic models described by nonlinear PDEs, assembling computationally efficient and fully differentiable surrogate models that may find application in predictive forecasting, control, and data assimilation. == Extensions and applications == === For piece-wise function approximation === PINNs are unable to approximate PDEs that have strong non-linearity or sharp gradients (such as those that commonly occur in practical fluid flow problems). Piecewise approximation has been an old practic

    Read more →
  • PropBank

    PropBank

    PropBank is a corpus that is annotated with verbal propositions and their arguments—a "proposition bank". Although "PropBank" refers to a specific corpus produced by Martha Palmer et al., the term propbank is also coming to be used as a common noun referring to any corpus that has been annotated with propositions and their arguments. The PropBank project has played a role in research in natural language processing, and has been used in semantic role labelling. == Comparison == PropBank differs from FrameNet, the resource to which it is most frequently compared, in several ways. PropBank is a verb-oriented resource, while FrameNet is centered on the more abstract notion of frames, which generalizes descriptions across similar verbs (e.g. "describe" and "characterize") as well as nouns and other words (e.g. "description"). PropBank does not annotate events or states of affairs described using nouns. PropBank commits to annotating all verbs in a corpus, whereas the FrameNet project chooses sets of example sentences from a large corpus and only in a few cases has annotated longer continuous stretches of text. PropBank-style annotations often remain close to the syntactic level, while FrameNet-style annotations are sometimes more semantically motivated. From the start, PropBank was developed with the idea of serving as training data for machine learning-based semantic role labeling systems in mind. It requires that all arguments to a verb be syntactic constituents and different senses of a word are only distinguished if the differences bear on the arguments. Due to such differences, semantic role labeling with respect to PropBank is often a somewhat easier task than producing FrameNet-style annotations.

    Read more →
  • Visual Turing Test

    Visual Turing Test

The Visual Turing Test is “an operator-assisted device that produces a stochastic sequence of binary questions from a given test image”. The query engine produces a sequence of questions that have unpredictable answers given the history of questions. The test is only about vision and does not require any natural language processing. The job of the human operator is to provide the correct answer to the question or reject it as ambiguous. The query generator produces questions such that they follow a “natural story line”, similar to what humans do when they look at a picture. == History == Research in computer vision dates back to the 1960s, when Seymour Papert first attempted to solve the problem. This unsuccessful attempt was referred to as the Summer Vision Project. It was unsuccessful because computer vision is more complicated than people think. The complexity is in alignment with the human visual system. Roughly 50% of the human brain is devoted to processing vision, which indicates that it is a difficult problem. Later there were attempts to solve the problem with models inspired by the human brain. The perceptron by Frank Rosenblatt, a form of neural network, was one of the first such approaches. These simple neural networks could not live up to expectations and had certain limitations, due to which they were not considered in future research. Later, with the availability of hardware and some processing power, research shifted to image processing, which involves pixel-level operations such as finding edges, de-noising images, or applying filters. There was great progress in this field, but the problem of vision, which was to make machines understand images, was still not being addressed. During this time neural networks also resurfaced, as it was shown that the limitations of perceptrons can be overcome by multi-layer perceptrons. 
    Also in the early 1990s, convolutional neural networks were born; they showed great results on digit recognition but did not scale well to harder problems.

    The late 1990s and early 2000s saw the birth of modern computer vision. One reason was the availability of key feature extraction and representation algorithms. Features, along with existing machine learning algorithms, were used to detect, localise and segment objects in images. As these advances were made, the community felt the need for standardised datasets and evaluation metrics so that performance could be compared. This led to the emergence of challenges such as the Pascal VOC challenge and the ImageNet challenge. The availability of standard evaluation metrics and open challenges gave direction to the research, and better algorithms were introduced for specific tasks such as object detection and classification. The Visual Turing Test aims to give a new direction to computer vision research, leading to systems that are one step closer to understanding images the way humans do.

    == Current evaluation practices ==

    A large number of datasets have been annotated and generalised to benchmark the performance of different classes of algorithms on different vision tasks (e.g., object detection/recognition) in some image domain (e.g., scene images). One of the most famous datasets in computer vision is ImageNet, which is used to assess object-level image classification. ImageNet is one of the largest annotated datasets available and has over one million images. The other important vision task is object detection and localisation, which refers to detecting an object instance in the image and either providing the bounding-box coordinates around it or segmenting the object. The most popular dataset for this task is the Pascal dataset.
    Similarly, there are other datasets for specific tasks, such as the H3D dataset for human pose detection and the Core dataset for evaluating the quality of detected object attributes such as colour, orientation, and activity. These standard datasets have helped the vision community to develop well-performing algorithms for all these tasks. The next logical step is to create a larger task encompassing these smaller subtasks. Such a task would lead to systems that understand images, since understanding images inherently involves detecting objects, localising them and segmenting them.

    == Details ==

    The Visual Turing Test (VTT), unlike the Turing test, has a query engine that interrogates a computer vision system in the presence of a human operator. The query engine generates a random sequence of binary questions specific to the test image, such that the answer to any question k is unpredictable given the true answers to the previous k − 1 questions (also known as the history of questions). The human operator serves two main purposes: removing ambiguous questions and providing the correct answers to unambiguous ones. Given an image, infinitely many binary questions can be asked, and many of them are bound to be ambiguous. When the query engine generates such a question, the human operator removes it, and the query engine generates another question whose answer is unpredictable given the history of questions.

    The aim of the Visual Turing Test is to evaluate the image understanding of a computer system, and an important part of image understanding is the story line of the image. When humans look at an image, they do not think that there is a car at ‘x’ pixels from the left and ‘y’ pixels from the top. Instead they see it as a story: for example,
    they might think that there is a car parked on the road and a person is exiting the car and heading towards a building. The most important elements of the story line are the objects, so to extract any story line from an image the first and most important task is to instantiate the objects in it, and that is what the query engine does.

    === Query engine ===

    The query engine is the core of the Visual Turing Test and comprises two main parts: vocabulary and questions.

    ==== Vocabulary ====

    The vocabulary is a set of words that represent the elements of the images. Used with an appropriate grammar, this vocabulary leads to a set of questions. The grammar is defined in the next section in such a way that it leads to a space of binary questions. The vocabulary $\mathcal{V}$ consists of three components: types of objects $\mathcal{T}$; type-dependent attributes of objects $\mathcal{A}(t)$; and type-dependent relationships between two objects $\mathcal{R}(t, t')$.

    For images of urban street scenes, the types of objects include people, vehicles and buildings. Attributes refer to the properties of these objects: for example, female, child, wearing a hat or carrying something for people, and moving, parked, stopped, one tire visible or two tires visible for vehicles. Relationships between each pair of object classes can be either “ordered” or “unordered”. Unordered relationships may include talking and walking together; ordered relationships include taller, closer to the camera, occluding and being occluded.

    Additionally, all of this vocabulary is used in the context of rectangular image regions $w \in W$, which allow for the localisation of objects in the image. An extremely large number of such regions is possible, which complicates the problem, so this test uses only regions at specific scales: 1/16 the size of the image, 1/4 the size of the image, 1/2 the size of the image, or larger.
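The vocabulary described above can be sketched as a small data structure. This is only an illustration under stated assumptions, not the test's actual implementation: the names `Region`, `Vocabulary` and `grid_regions` are hypothetical, the assignment of relationships to particular type pairs is made up for the example, and "size" is taken to mean area, so that a region 1/16 the size of the image is one cell of a non-overlapping 4 × 4 grid. The 1/2 scale and larger regions are not covered.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangular image region w, in pixels (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


@dataclass
class Vocabulary:
    """Object types T, type-dependent attributes A(t), relationships R(t, t')."""
    types: set[str]
    attributes: dict[str, set[str]] = field(default_factory=dict)
    relationships: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def attributes_of(self, t: str) -> set[str]:
        # A(t): attributes that make sense for objects of type t.
        return self.attributes.get(t, set())

    def relationships_between(self, t: str, t_prime: str) -> set[str]:
        # R(t, t'): relationships defined for the pair of types (t, t').
        return self.relationships.get((t, t_prime), set())


def grid_regions(image_width: int, image_height: int, divisor: int) -> list[Region]:
    """Tile the image into divisor x divisor cells, each 1/divisor**2 of its area.

    Assumption: non-overlapping tiling; the source does not say how regions
    are placed within the image.
    """
    w, h = image_width // divisor, image_height // divisor
    return [Region(col * w, row * h, w, h)
            for row in range(divisor)
            for col in range(divisor)]


# Illustrative street-scene vocabulary; the type-pair assignments are assumed.
street_vocabulary = Vocabulary(
    types={"person", "vehicle", "building"},
    attributes={
        "person": {"female", "child", "wearing a hat", "carrying something"},
        "vehicle": {"moving", "parked", "stopped"},
    },
    relationships={
        ("person", "person"): {"talking", "walking together", "taller"},
        ("person", "vehicle"): {"closer to the camera", "occluding"},
    },
)

sixteenth_regions = grid_regions(640, 480, divisor=4)  # 1/16 of the image area
quarter_regions = grid_regions(640, 480, divisor=2)    # 1/4 of the image area
```

Keeping attributes and relationships keyed by type reflects the type-dependence in the definitions of $\mathcal{A}(t)$ and $\mathcal{R}(t, t')$: a lookup for a type or pair with no entry simply returns an empty set.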
    ==== Questions ====

    The question space is composed of four types of questions:

    Existence questions: The aim of the existence questions is to find new objects in the image that have not been uniquely identified previously. They are of the form: $Q_{\text{exist}}$ = 'Is there an instance of an object of type $t$ with attributes $A$ partially visible in region $w$ that was not previously instantiated?'

    Uniqueness questions: A uniqueness question tries to uniquely identify an object in order to instantiate it. $Q_{\text{uniq}}$ = 'Is there a unique instance of an object of type $t$ with attributes $A$ partially visible in region $w$ that was not previously instantiated?' The uniqueness questions together with the existence questions form the instantiation questions. As mentioned earlier, instantiating objects leads to other interesting questions and eventually to a story line. A uniqueness question follows an existence question, and a positive answer to it leads to the instantiation of an object.

    Attribute questions: An attribute question tries to find out more about an object once it has been instantiated. Such questions can query a single attribute, the conjunction of two attributes or the disjunction of two attributes. $Q_{\text{att}}(o_t)$ = {'Does object $o_t$ have attribute $a$?', 'Does object

    Read more →