AI Analytical Thinking

AI Analytical Thinking — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Vector database

    Vector database

    A vector database, vector store or vector search engine is a database that stores and retrieves embeddings of data in vector space. Vector databases typically implement approximate nearest neighbor algorithms so users can search for records semantically similar to a given input, unlike traditional databases which primarily look up records by exact match. Use-cases for vector databases include similarity search, semantic search, multi-modal search, recommendations engines, object detection, and retrieval-augmented generation (RAG). Vector embeddings are mathematical representations of data in a high-dimensional space. In this space, each dimension corresponds to a feature of the data, with the number of dimensions ranging from a few hundred to tens of thousands, depending on the complexity of the data being represented. Each data item is represented by one vector in this space. Words, phrases, or entire documents, as well as images, audio, and other types of data, can all be vectorized. These feature vectors may be computed from the raw data using machine learning methods such as feature extraction algorithms, word embeddings or deep learning networks. The goal is that semantically similar data items receive feature vectors close to each other. Vector retrieval can be combined with metadata filtering or lexical search to support filtered and hybrid retrieval workflows. == Techniques == Common techniques for similarity search on high-dimensional vectors include: Hierarchical Navigable Small World (HNSW) graphs Locality-sensitive hashing (LSH) and sketching Product quantization (PQ) Inverted files These techniques may also be combined in vector search systems. In recent benchmarks, HNSW-based implementations have been among the best performers. Conferences such as the International Conference on Similarity Search and Applications (SISAP) and the Conference on Neural Information Processing Systems (NeurIPS) have hosted competitions on vector search in large databases. == Applications == Vector databases are used in a wide range of machine learning applications including similarity search, semantic search, multi-modal search, recommendations engines, object detection, and retrieval-augmented generation. === Retrieval-augmented generation === An especially common use-case for vector databases is in retrieval-augmented generation (RAG), a method to improve domain-specific responses of large language models. The retrieval component of a RAG can be any search system, but is most often implemented as a vector database. Text documents describing the domain of interest are collected, and for each document or document section, a feature vector (known as an "embedding") is computed, typically using a deep learning network, and stored in a vector database along with a link to the document. Given a user prompt, the feature vector of the prompt is computed, and the database is queried to retrieve the most relevant documents. These are then automatically added into the context window of the large language model, and the large language model proceeds to create a response to the prompt given this context. == Implementations ==

    Read more →
  • Is an AI Photo Editor Worth It in 2026?

    Is an AI Photo Editor Worth It in 2026?

    Shopping for the best AI photo editor? An AI photo editor is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI photo editor slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →
  • Mona Diab

    Mona Diab

    Mona Talat Diab (Arabic: منى طلعت دياب) is a computer science professor and director of Carnegie Mellon University's Language Technologies Institute. Previously, she was a professor at George Washington University and a research scientist with Facebook AI. Her research focuses on natural language processing, computational linguistics, cross lingual/multilingual processing, computational socio-pragmatics, Arabic language processing, and applied machine learning. == Education == Diab completed her M.Sc. in computer science with a major in machine learning and artificial intelligence at The George Washington University (1997) and her Ph.D. in computational linguistics at the University of Maryland, Linguistics Department and University of Maryland Institute for Advanced Computer Studies (UMIACS) in 2003, under the supervision of Philip Resnik. She was also a postdoctoral research scientist at Stanford University (2003–2005) under the mentorship of Dan Jurafsky, where she was a part of the Stanford NLP Group. == Career == After her postdoc at Stanford, Diab took a position as research scientist (principal investigator) at the Center for Computational Learning Systems (CCLS) in Columbia University, where she was also adjunct professor in the computer science department. In 2013 she joined the George Washington University as an associate professor, where she was promoted to full professor in 2017. Diab is the founder and director of the GW NLP lab CARE4Lang. Diab served as an elected faculty senator at Columbia University for 6 years (2007–2012) and an elected faculty senator at GW (2013–2014). She served the computational linguistics community as elected member, secretary and president of ACL SIGLEX (2005–2016) and elected president of ACL SIGSemitic. She currently serves as the elected VP-elect for ACL SIGDAT. In 2017 Diab joined Amazon AWS AI Deep Learning Group for Human Language Technologies, where she led the AWS Lex project for task oriented dialogue systems for enterprises. A couple of years later, she moved to Facebook AI as a research scientist. In the fall of 2023, she became the director of CMU's Language Technologies Institute -- the first full time director since the passing of its founder Jaime Carbonell. == Research == Diab's research interests include several areas in computational linguistics/natural language processing, like conversational AI, computational lexical semantics, multilingual and cross lingual processing, social media processing with an emphasis on computational socio- pragmatics, information extraction & text analytics, machine translation. Besides this, she also has special interests in Arabic NLP and low resource scenarios. Diab co-established two research trends in the computational linguistics field, computational approaches to linguistic code switching in 2007 and semantic textual similarity in 2010. Diab together with Nizar Habash and Owen Rambow, co-founded CADIM in 2005, a global reference point in Arabic dialect processing. In 2012, Diab together with Eneko Agirre and Johan Bos, brought together two ACL communities SIGLEX and SIGSEM and established the 1st tier conference SEM. == Awards and recognition == Selected as one of top 150 leaders and visionaries in AI nationwide to participate in White House AI Summit in Government, Washington, D.C., US, September 2019 March 2017: 3 Muslim Women in STEM You Should Know About, Teen Vogue, March 2017 May 2017: Behind Every Strong Woman Is...Another Strong Woman: Ten women give thanks to the women who supported them on the way up. Elle, May 2017. Google Faculty Research Award – Tharwa++: Building a multidialectal Arabic Lexical Repository, (PI), 09.2015 –12.2016. Google Faculty Research Award – Nuanced Sentiment and Perspective Analysis for Arabic Social Media Text, (PI), 12.2014 –12.2015 QNRF Best Poster Award – Ossama Obeid, Houda Bouamor, Wajdi Zaghouani, Mahmoud Ghoneim, Abdelati Hawwari, Mona Diab, Kemal Oflazer. (2016) MANDIAC: A Web-based Annotation System For Manual Arabic Diacritization. Proceedings of the 2nd Workshop on Arabic Corpora and Processing Tools, LREC 2016. Best Paper Award – Aminian, Maryam, Mahmoud Ghoneim, Mona Diab. (2015) Unsupervised False Friend Disambiguation Using Contextual Word Clusters and Parallel Word Alignments. In Proceedings of Workshop 9th Semantics Syntax Statistical Translation, NAACL 2015, Denver CO, US. == Publications == Diab has over 250 publications, and she is an acting editor for several scientific journals. === Selected publications === Semeval-2012 task 6: A pilot on semantic textual similarity. E. Agirre, D. Cer, M. Diab, A. Gonzalez-Agirre. SEM 2012: The First Joint Conference on Lexical and Computational Semantics–Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012) Predictive linguistic features of schizophrenia. ES Kayi, M Diab, L Pauselli, M Compton, G Coppersmith. arXiv preprint arXiv:1810.09377 Ideological perspective detection using semantic features. H Elfardy, M Diab, C Callison-Burch – Proceedings of SEM 2015 DeSePtion: Dual sequence prediction and adversarial examples for improved fact-checking. Christopher Hidey, Tuhin Chakrabarty, Tariq Alhindi, Siddharth Varia, Kriste Krstovski, Mona Diab, Smaranda Muresan, 2020 Does Causal Coherence Predict Online Spread of Social Media? Pedram Hosseini, Mona Diab, David A Broniatowski. Proceedings of International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation, 2019. Diversity, Density, and Homogeneity: Quantitative Characteristic Metrics for Text Collections. YA Lai, X Zhu, Y Zhang, M Diab, arXiv preprint arXiv:2003.08529, 2020 Readability of written medicine information materials in Arabic language: expert and consumer evaluation. S Al Aqeel, N Abanmy, A Aldayel, H Al-Khalifa, M Al-Yahya, M Diab. BMC health services research 18 (1), 1–7, 2019 Unsupervised word mapping using structural similarities in monolingual embeddings. H Aldarmaki, M Mohan, M Diab – Transactions of the Association for Computational Linguistics, 2018 An unsupervised method for word sense tagging using parallel corpora M Diab, P Resnik. Proceedings of ACL 2002 Overview for the first shared task on language identification in code-switched data. Thamar Solorio, Elizabeth Blair, Suraj Maharjan, Steven Bethard, Mona Diab, Mahmoud Ghoneim, Abdelati Hawwari, Fahad AlGhamdi, Julia Hirschberg, Alison Chang, Pascale Fung. Proceedings of the First Workshop on Computational Approaches to Code Switching, 2014 Modeling sentences in the latent space. W Guo, M Diab – ACL 20 12 Task-based evaluation of multiword expressions: a pilot study in statistical machine translation. M Carpuat, M Diab – NAACL-HLT 2010 Rumor detection and classification for twitter data. S Hamidian, MT Diab – arXiv preprint arXiv:1912.08926, 2019 Subgroup detection in ideological discussions. A Abu-Jbara, P Dasigi, M Diab, D Radev – ACL 2012 Madamira: A fast, comprehensive tool for morphological analysis and disambiguation of arabic. A. Pasha, M. Al-Badrashiny, M. Diab, A. El Kholy, R. Eskander, N. Habash, M. Pooleery, O. Rambow, R. Roth. LREC 14, 1094–1101. 2014 Context-Aware Self-Attentive Natural Language Understanding for Task-Oriented Chatbots. A. Gupta, P. Zhang, G. Lalwani, M. Diab. EMNLP 2019 A multitask learning approach for diacritic restoration. S. Alqahtani, A. Mishra, M. Diab. ACL 2020

    Read more →
  • Michael Kohlhase

    Michael Kohlhase

    Michael Kohlhase (born 13 September 1964, in Erlangen) is a German computer scientist and professor at University of Erlangen–Nuremberg, where he is head of the KWARC research group (Knowledge Adaptation and Reasoning for Content). == Academic Positions == Michael Kohlhase is president of the OpenMath Society and a trustee of the Interest Group for Mathematical Knowledge Management (MKM). He was a trustee of the Conference on Automated Deduction and the CALCULEMUS Interest Group. He has been Conference Chair of CADE-21 and Program Chair of the KI-2006, MKM-2005, and CALCULEMUS-2000 conferences and has served on the Programme Committees of more than three dozen international conferences. Kohlhase holds an adjunct associate professorship at Carnegie Mellon University and was (2006–2008) vice director of the Department of Safe and Secure Cognitive Systems at German Research Centre for Artificial Intelligence (DFKI) Lab Bremen. In 2014, he became a member of the Global Digital Mathematics Library Working Group of the IMU. == Academic career == Michael Kohlhase obtained a degree in Mathematics (1989) from University of Bonn, a doctorate (1994) and habilitation (1999) in Computer Science at Saarland University. He has pursued his doctoral and post-doctoral research in extended research visits at Carnegie Mellon University, University of Amsterdam, the University of Edinburgh, and SRI International. From 2000–2003, he has conducted research and taught at the School of Computer Science at Carnegie Mellon University, where he was appointed to an adjunct associate professor. In September 2003 he was appointed as Professor of Computer Science at Jacobs University Bremen (International University Bremen until 2007), and 2006–2008 he was vice director of the Department of Safe and Secure Cognitive Systems of the German Research Centre for Artificial Intelligence (DFKI) Bremen. Since September 2016 he holds the Professorship for Knowledge Representation and Processing at University of Erlangen–Nuremberg. He has authored or edited four books and published almost 100 peer-reviewed papers. == Awards and Scholarships == 2000 3-year Heisenberg-Stipend of the Deutsche Forschungsgemeinschaft (DFG). 1996 AKI-prize, dissertation prize of the "Arbeitsgemeinschaft deutscher KI-Institute (AKI)" 1991 dissertation stipend of the Studienstiftung (German National Academic Foundation) 1986 masters stipend of Studienstiftung == Research interests == Michael Kohlhase's current research interests include Automated theorem proving and knowledge representation for mathematics, inference-based techniques for natural language processing and semantics, and computer-supported education. Much of his concrete work is based on web-based content markup formats like MathML, OpenMath, and OMDoc and systems for managing this data, e.g. semantic search engines for mathematical formulae, semantic extensions to LaTeX, or converting legacy LaTeX documents from the arXiv.

    Read more →
  • Digital Michelangelo Project

    Digital Michelangelo Project

    The Digital Michelangelo Project was a pioneering initiative undertaken during the 1998–1999 academic year to digitize the sculptures and architecture of Michelangelo using advanced laser scanning technology. The project was led by a team of 30 faculty, staff, and students from Stanford University and the University of Washington, with the aim of creating high-resolution 3D models of Michelangelo's works for scholarly, educational, and preservation purposes. == Objectives == The primary goals of the Digital Michelangelo Project were: To apply recent advancements in laser rangefinder technology for digitizing large cultural artifacts. To create detailed digital archives of Michelangelo's sculptures and architectural spaces for future study and analysis. To explore potential educational and curatorial applications for 3D scanned data. === Artworks digitized === The project involved scanning several iconic works by Michelangelo, including: David The Unfinished Slaves (Atlas, Awakening, Bearded, and Youthful) St. Matthew The allegorical statues from the Medici tombs (Night, Day, Dawn, and Dusk) The architectural interiors of the Tribuna del David at the Galleria dell'Accademia and the New Sacristy in the Medici Chapels. == Technology and methodology == === 3D scanning === The project's primary scanner was a laser triangulation rangefinder mounted on a motorized gantry, custom-built by Cyberware Inc. The scanner used a laser sheet to project onto an object, capturing its shape through triangulation. Multiple scans were taken from various angles and combined into a single, detailed 3D mesh. The resolution achieved was fine enough to capture even Michelangelo's chisel marks, with triangles approximately 0.25 mm on each side. In addition to shape data, color data was captured using a spotlight and a secondary camera, enabling the creation of textured 3D models. === Data processing === The project developed a software suite for processing the scanned data. This included: Aligning and merging multiple scans into a seamless 3D model. Filling holes in the geometry caused by inaccessible areas. Correcting color data for lighting inconsistencies and shadowing. Non-photorealistic rendering techniques were also applied, highlighting surface features such as Michelangelo’s chisel marks for enhanced visualization. == Logistical challenges == The scale and complexity of the project presented several challenges: Data size: The dataset for David alone comprised 2 billion polygons and 7,000 color images, occupying 60 GB of storage. Artifact safety: Ensuring the safety of the statues during scanning required extensive crew training, foam-encased equipment, and collision-prevention mechanisms. == Applications and impact == The digitized models have numerous potential applications: Art history: Allowing precise measurements and geometric analysis, such as determining chisel types or evaluating structural balance. Education: Providing new ways to study art, including interactive viewing from unconventional angles and with custom lighting. Museum curation: Enhancing visitor experiences through interactive kiosks and virtual models. The project demonstrated the potential for 3D technology to preserve and disseminate cultural heritage. == Data distribution == The project's models are available through Stanford University for scholarly purposes, under strict licensing due to Italian intellectual property laws. === ScanView === To provide public access to the 3D models while respecting usage restrictions, the project developed ScanView, a client/server rendering system. ScanView allows users to view and interact with high-resolution 3D models without downloading the data. The client component consists of a freely available viewer program and simplified 3D models. Users can navigate these models locally, adjusting position, orientation, lighting, and surface appearance. When a user finalizes a view, the client queries a remote server for a high-resolution rendering of the model, which is sent back to overwrite the simplified version on the user’s screen. A typical query-response cycle takes 1–2 seconds, depending on network conditions. To protect the models from unauthorized reconstruction, the system employs several security measures, including: Encrypting queries Perturbing viewpoint and lighting parameters Adding noise and warping rendered images Compressing images before transmission ScanView operates on Windows-based PCs and provides access to selected models, including David and St. Matthew, as well as other artifacts such as fragments of the Forma Urbis Romae and items from the Stanford 3D Scanning Repository. == Sponsors == The Digital Michelangelo Project was supported by Stanford University, Interval Research Corporation, and the Paul G. Allen Foundation for the Arts.

    Read more →
  • General Internet Corpus of Russian

    General Internet Corpus of Russian

    General Internet Corpus of Russian (GICR) is a corpus of Russian internet texts that has been accessible on request through an online query interface since 2013. The corpus includes rich text materials from the blogosphere, social networks, major news sources and literary magazines. == Goals of the project == The project has the status of an educational and scientific one, and many tasks of computational linguistics are solved by independent researchers and research groups with the materials obtained by GICR. While other corpus projects of Russian are focused on fiction and edited texts, General Internet Corpus provides linguists timely opportunity to learn the language as it is, with all the slang and regional peculiarities. Corpus gives the opportunity to carry out research in Linguistic research of a wide range: dialectological research, study of word distribution, study of the language of the social networks, study of the influence of gender, age and other factors on the language, frequency of words, fixed expressions and different constructions, stylistic features of texts of different segments of the Internet, etc. Social media analysis Corpus-based machine learning for evaluating automatic tagging At various times, student papers and independent researches were carried out on the project material by students, graduates and employees of MSU, MIPT, Russian State Humanitarian University, Novosibirsk State University, Higher School of Economics, Russian Academy of Sciences, SFU, CSU, SGMP, IAAS of MSU. Scientific project leaders: Belikov V. - RSUH, Moscow, Russia Selegey V. - RSUH, ABBYY, Moscow, Russia Sharoff S. - RSUH, Moscow, Russia; University of Leeds, UK The organizations involved in support of GICR: Russian State University of Humanities ABBYY Company Moscow Institute of Physics and Technology Skolkovo Institute of Science and Technology == Size and content of the corpus == Corpus size for the summer 2016 is 19.8 billion tokens, of which 49% are from VKontakte, 40% are from LiveJournal, another 4% - from Mail.ru Blogs and News, and 2% - from Russian Magazine Hall. The sources collected in news segment are: RIA Novosti, Regnum, Lenta.ru, Rosbalt. Texts are provided with metamarkup (by date of creation of the text, sex, place and year of birth of the author, Internet genre, etc.); all texts are provided with automatic morphological tagging and lemmatization. Most of the texts collected are of 2013–2014 years of creation, although in some segments, such as in Russian Magazine Hall, there are some texts collected since 1994. GICR is one of the few mega-corpora projects nowadays, which means its available size is reaching several billion of words. == Access == Currently the interface of GICR is in beta stage, so access to the search in the corpora is provided and is free, but is available for researchers on request.

    Read more →
  • Richard Zemel

    Richard Zemel

    Richard Stanley Zemel (born 1963) is a Canadian-American computer scientist and professor at Columbia University, Department of Computer Science, and a leading figure in the field of machine learning and computer vision. Zemel studied the history of science at Harvard University and obtained his B.A. in 1984. He continued his study at the Department of Computer Science of the University of Toronto under the supervision of Geoffrey Hinton. He obtained his M.Sc. and Ph.D. both in computer science in 1989 and 1994, respectively.

    Read more →
  • Phraselator

    Phraselator

    The Phraselator is a weatherproof handheld language translation device developed by Applied Data Systems and VoxTec, a former division of the military contractor Marine Acoustics, located in Annapolis, Maryland, USA. It was designed to serve as a handheld computer device that translates English into one of 40 different languages. == The device == The Phraselator is a small speech translation PDA-sized device designed to aid in interpretation. The device does not produce synthesized speech like that utilized by Stephen Hawking; instead, it plays pre-recorded foreign language MP3 files. Users can select the phrase they wish to convey from an English list on the screen or speak into the device. It then uses speech recognition technology called DynaSpeak, developed by SRI International, to play the proper sound file. The accuracy of the speech recognition software is over 70 percent according to software developer Jack Buchanan. The device can also record replies for translation later. Pre-recorded phrases are stored on Secure Digital flash memory cards. A 128 MB card can hold up to 12,000 phrases in four or five languages. Users can download phrase modules from the official website, which contained over 300,000 phrases as of March 2005. Users can also construct their own custom phrase modules. Earlier devices were known to have run on an SA-1110 Strong Arm 206 MHz CPU with 32MB SDRAM and 32MB onboard Flash RAM. A newer model, the P2, was released in 2004 and developed according to feedback from U.S. soldiers. It translates one way from English to approximately 60 other languages. It has a directional microphone, a larger library of phrases and a longer battery life. The 2004 release was created by and utilizes a computer board manufactured by InHand Electronics, Inc. In the future, the device will be able to display pictures so users can ask questions such as "Have you seen this person?" Developer Ace Sarich notes that the device is inferior to human interpreter. Conclusions derived from a Nepal field test conducted by U.S. and Nepal based NGO Himalayan Aid in 2004 seemed to confirm Sarich's comparisons: The very concept of using a machine as a communication point between individuals seemed to actually encourage a more limited form of interaction between tester and respondent. Usually, when limited language skills are present between parties, the genuine struggle and desire to communicate acts as a display of good will – we openly display our weakness in this regard – and the result is a more relaxed and human encounter. This was not necessarily present with the Phraselator as all parties abandoned learning about each other and instead focused on learning how to work with the device. As a tool for bridging any cultural differences or communicating effectively at any length, the Phraselator would not be recommended. This device, at least in the form tested, would best be used in large-scale operations where there is no time for language training and there is a need to communicate fixed ideas, quickly, over the greatest distance by employing large amounts of unskilled users. Large humanitarian or natural disasters in remote areas of third-world countries might be an effective example. == Origin == The original idea for the device came from Lee Morin, a Navy doctor in Operation Desert Storm. To communicate with patients, he played Arabic audio files from his laptop. He informed Ace Sarich, the vice president of VoxTec, about the idea. VoxTec won a DARPA Small Business Innovation Research grant in early 2001 to develop a military-grade handheld phrase translator. During its development, the Phraselator was tested and evaluated by scientists from the Army Research Laboratory. The device was first field tested in Afghanistan in 2001. By 2002, about 500 Phraselators were built for soldiers around the world with another 250 ordered by the U.S. Special Forces. The device cost $2000 to develop and could convert spoken English into one of 200,000 recorded commands and questions in 30 languages. However, the device could only translate one-way. At the time, the only existing two-way voice translator that could convert speech back and forth between languages was the Audio Voice Translation Guide System, or TONGUES, which was developed by Carnegie Mellon University for Lockheed Martin. As part of a DARPA program known as the Spoken Language Communication and Translation System for Tactical Use, SRI International has further developed two-way translation software for use in Iraq called IraqComm in 2006 which contains a vocabulary of 40,000 English words and 50,000 words in Iraqi Arabic. == Notable users == The handheld translator was recently used by U.S. troops while providing relief to tsunami victims in early 2005. About 500 prototypes of the device were provided to U.S. military forces in Operation Enduring Freedom. Units loaded with Haitian dialects have been provided to U.S. troops in Haiti. Army military police have used it in Kandahar to communicate with POWs. In late 2004, the U.S. Navy began to augment some ships with a version of the device attached to large speakers in order to broadcast clear voice instructions up to 400 yards (370 m) away. Corrections officers and law enforcement in Oneida County, New York, have tested the device. Hospital emergency rooms and health departments have also evaluated it. Several Native American tribes such as the Choctaw Nation, the Ponca, and the Comanche Nation have also used the device to preserve their dying languages. Various law enforcement agencies, such as the Los Angeles Police Department, also use the phraselator in their patrol cars. == Awards == In March 2004, DARPA director Dr. Tony Tether presented the Small Business Innovative Research Award to the VoxTec division of Marine Acoustics at DARPATech 2004 in Anaheim, CA. The device was recently listed as one of "Ten Emerging Technologies That Will Change Your World" in MIT's Technology Review. == Pop culture == Software developer Jack Buchanan believes that building a device similar to the fictional universal translator seen in Star Trek would be harder than building the Enterprise. The device was mentioned in a list of "Top 10 Star Trek Tech" on Space.com.

    Read more →
  • Confusion matrix

    Confusion matrix

    In machine learning, a confusion matrix, also known as error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one. In unsupervised learning it is usually called a matching matrix. The term is used specifically in the problem of statistical classification. Each row of the matrix represents the instances in an actual class while each column represents the instances in a predicted class, or vice versa – both variants are found in the literature. The diagonal of the matrix therefore represents all instances that are correctly predicted. The name stems from the fact that it makes it easy to identify whether the system is confusing two classes (i.e., commonly mislabeling one class as another). The confusion matrix has its origins in human perceptual studies of auditory stimuli. It was adapted for machine learning studies and used by Frank Rosenblatt, among other early researchers, to compare human and machine classifications of visual (and later auditory) stimuli. It is a special kind of contingency table, with two dimensions ("actual" and "predicted"), and identical sets of "classes" in both dimensions (each combination of dimension and class is a variable in the contingency table). == Example == Given a sample of 12 individuals, 8 that have been diagnosed with cancer and 4 that are cancer-free, where individuals with cancer belong to class 1 (positive) and non-cancer individuals belong to class 0 (negative), we can display that data as follows: Assume that we have a classifier that distinguishes between individuals with and without cancer in some way, we can take the 12 individuals and run them through the classifier. The classifier then makes 9 accurate predictions and misses 3: 2 individuals with cancer wrongly predicted as being cancer-free (sample 1 and 2), and 1 person without cancer that is wrongly predicted to have cancer (sample 9). Notice, that if we compare the actual classification set to the predicted classification set, there are 4 different outcomes that could result in any particular column: The actual classification is positive and the predicted classification is positive (1,1). This is called a true positive result because the positive sample was correctly identified by the classifier. The actual classification is positive and the predicted classification is negative (1,0). This is called a false negative result because the positive sample is incorrectly identified by the classifier as being negative. The actual classification is negative and the predicted classification is positive (0,1). This is called a false positive result because the negative sample is incorrectly identified by the classifier as being positive. The actual classification is negative and the predicted classification is negative (0,0). This is called a true negative result because the negative sample gets correctly identified by the classifier. We can then perform the comparison between actual and predicted classifications and add this information to the table, making correct results appear in green so they are more easily identifiable. The template for any binary confusion matrix uses the four kinds of results discussed above (true positives, false negatives, false positives, and true negatives) along with the positive and negative classifications. The four outcomes can be formulated in a 2×2 confusion matrix, as follows: The color convention of the three data tables above were picked to match this confusion matrix, in order to easily differentiate the data. Now, we can simply total up each type of result, substitute into the template, and create a confusion matrix that will concisely summarize the results of testing the classifier: In this confusion matrix, of the 8 samples with cancer, the system judged that 2 were cancer-free, and of the 4 samples without cancer, it predicted that 1 did have cancer. All correct predictions are located in the diagonal of the table (highlighted in green), so it is easy to visually inspect the table for prediction errors, as values outside the diagonal will represent them. By summing up the 2 rows of the confusion matrix, one can also deduce the total number of positive (P) and negative (N) samples in the original dataset, i.e. P = T P + F N {\displaystyle P=TP+FN} and N = F P + T N {\displaystyle N=FP+TN} . == Table of confusion == In predictive analytics, a table of confusion (sometimes also called a confusion matrix) is a table with two rows and two columns that reports the number of true positives, false negatives, false positives, and true negatives. This allows more detailed analysis than simply observing the proportion of correct classifications (accuracy). Accuracy will yield misleading results if the data set is unbalanced; that is, when the numbers of observations in different classes vary greatly. For example, if there were 95 cancer samples and only 5 non-cancer samples in the data, a particular classifier might classify all the observations as having cancer. The overall accuracy would be 95%, but in more detail the classifier would have a 100% recognition rate (sensitivity) for the cancer class but a 0% recognition rate for the non-cancer class. F1 score is even more unreliable in such cases, and here would yield over 97.4%, whereas informedness removes such bias and yields 0 as the probability of an informed decision for any form of guessing (here always guessing cancer). According to Davide Chicco and Giuseppe Jurman, the most informative metric to evaluate a confusion matrix is the Matthews correlation coefficient (MCC). Other metrics can be included in a confusion matrix, each of them having their significance and use. Some researchers have argued that the confusion matrix, and the metrics derived from it, do not truly reflect a model's knowledge. In particular, the confusion matrix cannot show whether correct predictions were reached through sound reasoning or merely by chance (a problem known in philosophy as epistemic luck). It also does not capture situations where the facts used to make a prediction later change or turn out to be wrong (defeasibility). This means that while the confusion matrix is a useful tool for measuring classification performance, it may give an incomplete picture of a model’s true reliability. == Confusion matrices with more than two categories == Confusion matrix is not limited to binary classification and can be used in multi-class classifiers as well. The confusion matrices discussed above have only two conditions: positive and negative. For example, the table below summarizes communication of a whistled language between two speakers, with zero values omitted for clarity. == Confusion matrices in multi-label and soft-label classification == Confusion matrices are not limited to single-label classification (where only one class is present) or hard-label settings (where classes are either fully present, 1, or absent, 0). They can also be extended to Multi-label classification (where multiple classes can be predicted at once) and soft-label classification (where classes can be partially present). One such extension is the Transport-based Confusion Matrix (TCM), which builds on the theory of optimal transport and the principle of maximum entropy. TCM applies to single-label, multi-label, and soft-label settings. It retains the familiar structure of the standard confusion matrix: a square matrix sized by the number of classes, with diagonal entries indicating correct predictions and off-diagonal entries indicating confusion. In the single-label case, TCM is identical to the standard confusion matrix. TCM follows the same reasoning as the standard confusion matrix: if class A is overestimated (its predicted value is greater than its label value) and class B is underestimated (its predicted value is less than its label value), A is considered confused with B, and the entry (B, A) is increased. If a class is both predicted and present, it is correctly identified, and the diagonal entry (A, A) increases. Optimal transport and maximum entropy are used to determine the extent to which these entries are updated. TCM enables clearer comparison between predictions and labels in complex classification tasks, while maintaining a consistent matrix format across settings.

    Read more →
  • Ronald J. Williams

    Ronald J. Williams

    Ronald James Williams (1945 – February 16, 2024) was an American mathematician and computer scientist who spent the majority of his career at Northeastern University. He is considered one of the pioneers of neural networks. In 1986, he co-authored the seminal paper in Nature on the backpropagation algorithm along with David Rumelhart and Geoffrey Hinton, which triggered a boom in neural network research. == Education and career == Williams was born in Southern California. He studied at California Institute of Technology as a undergraduate student and received a B.S. in mathematics there in 1966. He received his M.A. and Ph.D. in mathematics, both at University of California, San Diego (UCSD) in 1972 and 1975, respectively. His Ph.D. thesis was supervised by Donald Werner Anderson. He worked for a defense contractor for some time after graduation. From 1983 to 1986, Williams was a member of the Parallel Distributed Processing research group headed by David Rumelhart at the Institute for Cognitive Science at UCSD. In 1986, Williams accepted a professorship in computer science at Northeastern University in Boston, where he remained afterwards. In addition to the backpropagation paper, Williams made fundamental contributions to the fields of recurrent neural networks, where he, along with David Zipser, invented the teacher forcing algorithm and made important contributions to backpropagation through time. In reinforcement learning, Williams introduced the REINFORCE algorithm in 1992, which became the first policy gradient method. Besides his works on neural networks, Williams, together with Wenxu Tong and Mary Jo Ondrechen, developed Partial Order Optimum Likelihood (POOL), a machine learning method used in the prediction of active amino acids in protein structures. POOL is a maximum likelihood method with a monotonicity constraint and is a general predictor of properties that depend monotonically on the input features.

    Read more →
  • Best AI Video Editors in 2026

    Best AI Video Editors in 2026

    Shopping for the best AI video editor? An AI video editor is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI video editor slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →
  • Adam Tauman Kalai

    Adam Tauman Kalai

    Adam Tauman Kalai is an American computer scientist who specializes in artificial intelligence and works at OpenAI. == Education and career == Kalai graduated from Harvard University in 1996 with a BA in computer science and received a MA and PhD, both in computer science, from Carnegie Mellon University in 1999 and 2001, respectively. His doctoral advisor was Avrim Blum. After graduation, Kalai did his postdoctoral research at Massachusetts Institute of Technology under Santosh Vempala until 2003. Kalai became a faculty member at the Toyota Technological Institute at Chicago from 2003 to 2006, followed by a stint as an assistant professor at Georgia Institute of Technology from 2007 to 2008. He joined Microsoft Research in 2008 and subsequently moved to OpenAI in 2023. == Contributions == Kalai is known for his algorithm for generating random factored numbers (see Bach's algorithm), for co-inventing the cooperative-competitive value (coco value), for efficiently learning learning mixtures of Gaussians, for the Blum-Kalai-Wasserman algorithm for learning parity with noise, and for the intractability of the folk theorem in game theory. More recently, Kalai is known for identifying and reducing gender bias in word embeddings, which are a representation of words commonly used in AI systems. In 2026, he coauthored a Nature paper on hallucinations in large language models. == Personal life == Kalai is the son of game theorist Ehud Kalai and is married to cryptographer Yael Tauman Kalai.

    Read more →
  • PARRY

    PARRY

    PARRY was an early example of a chatbot, implemented in 1972 by psychiatrist Kenneth Colby. == History == PARRY was written in 1972 by psychiatrist Kenneth Colby, then at Stanford University. While ELIZA was a simulation of a Rogerian therapist, PARRY attempted to simulate a person with paranoid schizophrenia. The program implemented a crude model of the behavior of a person with paranoid schizophrenia based on concepts, conceptualizations, and beliefs (judgements about conceptualizations: accept, reject, neutral). It also embodied a conversational strategy, and as such was a much more serious and advanced program than ELIZA. It was described as "ELIZA with attitude". PARRY was tested in the early 1970s using a variation of the Turing Test. A group of experienced psychiatrists analysed a combination of real patients and computers running PARRY through teleprinters. Another group of 33 psychiatrists were shown transcripts of the conversations. The two groups were then asked to identify which of the "patients" were human and which were computer programs. The psychiatrists were able to make the correct identification only 48 percent of the time — a figure consistent with random guessing. PARRY and ELIZA (also known as "the Doctor") interacted several times. The most famous of these exchanges occurred at the ICCC 1972, where PARRY and ELIZA were hooked up over ARPANET and responded to each other.

    Read more →
  • Theano (software)

    Theano (software)

    Theano is a Python library and optimizing compiler for manipulating and evaluating mathematical expressions, especially matrix-valued ones. In Theano, computations are expressed using a NumPy-esque syntax and compiled to run efficiently on either CPU or GPU architectures. == History == Theano is an open source project primarily developed by the Montreal Institute for Learning Algorithms (MILA) at the Université de Montréal. The name of the software references the ancient philosopher Theano, long associated with the development of the golden mean. On 28 September 2017, Pascal Lamblin posted a message from Yoshua Bengio, Head of MILA: major development would cease after the 1.0 release due to competing offerings by strong industrial players. Theano 1.0.0 was then released on 15 November 2017. On 17 May 2018, Chris Fonnesbeck wrote on behalf of the PyMC development team that the PyMC developers will officially assume control of Theano maintenance once the MILA development team steps down. On 29 January 2021, they started using the name Aesara for their fork of Theano. On 29 Nov 2022, the PyMC development team announced that the PyMC developers will fork the Aesara project under the name PyTensor. == Sample code == The following code is the original Theano's example. It defines a computational graph with 2 scalars a and b of type double and an operation between them (addition) and then creates a Python function f that does the actual computation. == Examples == === Matrix Multiplication (Dot Product) === The following code demonstrates how to perform matrix multiplication using Theano, which is essential for linear algebra operations in many machine learning tasks. === Gradient Calculation === The following code uses Theano to compute the gradient of a simple operation (like a neuron) with respect to its input. This is useful in training machine learning models (backpropagation). === Building a Simple Neural Network === The following code shows how to start building a simple neural network. This is a very basic neural network with one hidden layer. === Broadcasting in Theano === The following code demonstrates how broadcasting works in Theano. Broadcasting allows operations between arrays of different shapes without needing to explicitly reshape them.

    Read more →
  • How to Choose an AI Code-review Tool

    How to Choose an AI Code-review Tool

    Trying to pick the best AI code-review tool? An AI code-review tool is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI code-review tool slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →