AI For Students Exam Generator

AI For Students Exam Generator — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Eigenface

    Eigenface

    An eigenface (EYE-gən-) is the name given to a set of eigenvectors when used in the computer vision problem of human face recognition. The approach of using eigenfaces for recognition was developed by Sirovich and Kirby and used by Matthew Turk and Alex Pentland in face classification. The eigenvectors are derived from the covariance matrix of the probability distribution over the high-dimensional vector space of face images. The eigenfaces themselves form a basis set of all images used to construct the covariance matrix. This produces dimension reduction by allowing the smaller set of basis images to represent the original training images. Classification can be achieved by comparing how faces are represented by the basis set.

    == History ==

    The eigenface approach began with a search for a low-dimensional representation of face images. Sirovich and Kirby showed that principal component analysis could be used on a collection of face images to form a set of basis features. These basis images, known as eigenpictures, could be linearly combined to reconstruct images in the original training set. If the training set consists of M images, principal component analysis could form a basis set of N images, where N < M. The reconstruction error is reduced by increasing the number of eigenpictures; however, the number needed is always chosen less than M. For example, if N eigenfaces are generated for a training set of M face images, each face image can be described as being made up of "proportions" of all N "features" or eigenfaces: Face image1 = (23% of E1) + (2% of E2) + (51% of E3) + ... + (1% of EN).

    In 1991 M. Turk and A. Pentland expanded these results and presented the eigenface method of face recognition.
    In addition to designing a system for automated face recognition using eigenfaces, they showed a way of calculating the eigenvectors of a covariance matrix such that computers of the time could perform eigen-decomposition on a large number of face images. Face images usually occupy a high-dimensional space and conventional principal component analysis was intractable on such data sets. Turk and Pentland's paper demonstrated ways to extract the eigenvectors based on matrices sized by the number of images rather than the number of pixels.

    Once established, the eigenface method was expanded to include methods of preprocessing to improve accuracy. Multiple manifold approaches were also used to build sets of eigenfaces for different subjects and different features, such as the eyes.

    == Generation ==

    A set of eigenfaces can be generated by performing a mathematical process called principal component analysis (PCA) on a large set of images depicting different human faces. Informally, eigenfaces can be considered a set of "standardized face ingredients", derived from statistical analysis of many pictures of faces. Any human face can be considered to be a combination of these standard faces. For example, one's face might be composed of the average face plus 10% from eigenface 1, 55% from eigenface 2, and even −3% from eigenface 3. Remarkably, it does not take many eigenfaces combined together to achieve a fair approximation of most faces. Also, because a person's face is recorded not as a digital photograph but as just a list of values (one value for each eigenface in the database used), much less space is taken for each person's face.

    The eigenfaces that are created will appear as light and dark areas that are arranged in a specific pattern. This pattern is how different features of a face are singled out to be evaluated and scored.
    There will be a pattern to evaluate symmetry, whether there is any style of facial hair, where the hairline is, or an evaluation of the size of the nose or mouth. Other eigenfaces have patterns that are less simple to identify, and the image of the eigenface may look very little like a face.

    The technique used in creating eigenfaces and using them for recognition is also used outside of face recognition: handwriting recognition, lip reading, voice recognition, sign language/hand gestures interpretation and medical imaging analysis. Therefore, some do not use the term eigenface, but prefer to use 'eigenimage'.

    === Practical implementation ===

    To create a set of eigenfaces, one must:

    1. Prepare a training set of face images. The pictures constituting the training set should have been taken under the same lighting conditions, and must be normalized to have the eyes and mouths aligned across all images. They must also all be resampled to a common pixel resolution (r × c). Each image is treated as one vector, simply by concatenating the rows of pixels in the original image, resulting in a single column with r × c elements. For this implementation, it is assumed that all images of the training set are stored in a single matrix T, where each column of the matrix is an image.

    2. Subtract the mean. The average image a has to be calculated and then subtracted from each original image in T.

    3. Calculate the eigenvectors and eigenvalues of the covariance matrix S. Each eigenvector has the same dimensionality (number of components) as the original images, and thus can itself be seen as an image. The eigenvectors of this covariance matrix are therefore called eigenfaces. They are the directions in which the images differ from the mean image. Usually this will be a computationally expensive step (if at all possible), but the practical applicability of eigenfaces stems from the possibility to compute the eigenvectors of S efficiently, without ever computing S explicitly, as detailed below.

    4. Choose the principal components. Sort the eigenvalues in descending order and arrange eigenvectors accordingly. The number of principal components k is determined arbitrarily by setting a threshold ε on the total variance. The total variance is $v = \lambda_1 + \lambda_2 + \dots + \lambda_n$, where n is the number of components and $\lambda_i$ is the eigenvalue of the i-th component. k is the smallest number that satisfies

       $$\frac{\lambda_1 + \lambda_2 + \dots + \lambda_k}{v} > \epsilon$$

    These eigenfaces can now be used to represent both existing and new faces: we can project a new (mean-subtracted) image on the eigenfaces and thereby record how that new face differs from the mean face. The eigenvalues associated with each eigenface represent how much the images in the training set vary from the mean image in that direction. Information is lost by projecting the image on a subset of the eigenvectors, but losses are minimized by keeping those eigenfaces with the largest eigenvalues. For instance, working with a 100 × 100 image will produce 10,000 eigenvectors. In practical applications, most faces can typically be identified using a projection on between 100 and 150 eigenfaces, so that most of the 10,000 eigenvectors can be discarded.

    === Matlab example code ===

    Here is an example of calculating eigenfaces with Extended Yale Face Database B. To avoid computational and storage bottlenecks, the face images are downsampled by a factor of 4 × 4 = 16. Note that although the covariance matrix S generates many eigenfaces, only a fraction of those are needed to represent the majority of the faces. For example, to represent 95% of the total variation of all face images, only the first 43 eigenfaces are needed.
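The component-selection rule and the projection step described above can be sketched in a few lines of Python. This is a minimal illustration and not the Matlab listing the section refers to: the function names `choose_num_components` and `project`, and the toy eigenvalues, are assumptions made for the example.

```python
def choose_num_components(eigenvalues, epsilon):
    """Smallest k whose k largest eigenvalues explain more than epsilon of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("total variance must be positive")
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total > epsilon:
            return k
    return len(ordered)  # reached only when epsilon >= 1


def project(image, mean_image, eigenfaces):
    """Weights of a mean-subtracted image along each eigenface (plain dot products)."""
    centered = [p - m for p, m in zip(image, mean_image)]
    return [sum(e * c for e, c in zip(face, centered)) for face in eigenfaces]


# Hypothetical eigenvalues, chosen only to exercise the threshold rule.
toy_eigenvalues = [5.0, 3.0, 1.0, 0.5, 0.5]
print(choose_num_components(toy_eigenvalues, 0.75))
print(project([3.0, 5.0], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]))
```

The list of weights returned by `project` is the compact per-face representation mentioned earlier: one value per retained eigenface instead of one value per pixel.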
    === Computing the eigenvectors ===

    Performing PCA directly on the covariance matrix of the images is often computationally infeasible. If small images are used, say 100 × 100 pixels, each image is a point in a 10,000-dimensional space and the covariance matrix S is a matrix of 10,000 × 10,000 = 10^8 elements. However, the rank of the covariance matrix is limited by the number of training examples: if there are N training examples, there will be at most N − 1 eigenvectors with non-zero eigenvalues. If the number of training examples is smaller than the dimensionality of the images, the principal components can be computed more easily as follows.

    Let T be the matrix of preprocessed training examples, where each column contains one mean-subtracted image. The covariance matrix can then be computed as $S = TT^T$ and the eigenvector decomposition of S is given by

    $$S v_i = T T^T v_i = \lambda_i v_i$$

    However, $TT^T$ is a large matrix, and if instead we take the eigenvalue decomposition of

    $$T^T T u_i = \lambda_i u_i$$

    then we notice that by pre-multiplying both sides of the equation with T, we obtain

    $$T T^T T u_i = \lambda_i T u_i$$

    meaning that, if $u_i$ is an eigenvector of $T^T T$, then $v_i = T u_i$ is an eigenvector of S. If we have
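The identity derived in this section, that an eigenvector u of the small matrix T^T T yields an eigenvector v = Tu of S = TT^T with the same eigenvalue, can be checked numerically. The sketch below is an illustration under stated assumptions: the three 4-pixel "images" are made-up toy data, the helper names are hypothetical, and power iteration is used only as a dependency-free way to get the dominant eigenpair (a real implementation would normally call a linear algebra library).

```python
import math


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def matvec(matrix, vec):
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def dominant_eigenpair(sym, iterations=200):
    """Power iteration on a symmetric matrix; returns (eigenvalue, unit eigenvector)."""
    # Non-constant start: the all-ones vector lies in the null space of T^T T
    # because the mean-subtracted images sum to zero.
    vec = [float(i + 1) for i in range(len(sym))]
    for _ in range(iterations):
        nxt = matvec(sym, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(a * b for a, b in zip(matvec(sym, vec), vec))  # Rayleigh quotient
    return eigenvalue, vec


# Toy data (assumption): 3 training images of 4 pixels each, one image per row.
images = [[1, 2, 3, 4], [2, 4, 1, 3], [3, 0, 2, 5]]
mean = [sum(pixels) / len(images) for pixels in zip(*images)]
centered = [[p - m for p, m in zip(img, mean)] for img in images]  # this is T^T (3 x 4)
T = transpose(centered)                                            # 4 x 3, one image per column

gram = matmul(centered, T)            # T^T T: 3 x 3 instead of 4 x 4
lam, u = dominant_eigenpair(gram)     # eigenpair of the small matrix

v = matvec(T, u)                      # v = T u
length = math.sqrt(sum(x * x for x in v))
v = [x / length for x in v]           # unit length is a convenience, not required

# Check S v = lam * v without ever forming S, using S v = T (T^T v).
Sv = matvec(T, matvec(centered, v))
residual = max(abs(a - lam * b) for a, b in zip(Sv, v))
print(lam, residual)
```

For this toy data the small matrix is 3 × 3 while S would be 4 × 4; with real images the gap is the one the text describes, with the small matrix sized by the number of images rather than the number of pixels.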

  • Volker Markl

    Volker Markl

    Volker Markl (born 1971) is a German computer scientist and database systems researcher.

    == Career ==

    In 1999, Markl received his PhD in computer science under the direction of Rudolf Bayer at the Technical University of Munich. His doctoral research led to the development of the UB-Tree. From 1997 to 2000, he was research group leader at FORWISS, the Bavarian research center for knowledge-based systems. From 2001 to 2008, he was project leader at the IBM Almaden Research Center, Silicon Valley. Since 2008, he has been full professor and Chair of the Database Systems and Information Management Group at Technische Universität Berlin. Since 2014, he has been head of the Intelligent Analytics for Massive Data Research Department at the German Research Centre for Artificial Intelligence (DFKI), Berlin. From 2014 to 2020, he was director of the Berlin Big Data Center (BBDC). From 2018 to 2020, he was co-director of the Berlin Machine Learning Center (BZML). Together with Klaus-Robert Müller, he became director of the new Berlin Institute for the Foundations of Learning and Data (BIFOLD) after BBDC and BZML merged into BIFOLD in 2020. From 2010 through 2019, he led the DFG-funded Stratosphere project, which led to the establishment of Apache Flink. In 2018, he was elected president of the VLDB Endowment for a six-year term that ended in 2024.

    == Research ==

    Markl's research interests lie at the intersection of distributed systems, scalable data processing, and machine learning.

    == Awards and honors ==

    Markl was elected a member of the Berlin-Brandenburg Academy of Sciences and Humanities in 2021. Since 2026, he has been a member of the German National Academy of Sciences Leopoldina.
    His work was honoured with several awards, including:

      • 2025 ICDE Best Paper Award
      • 2021 ICDE Best Paper Award
      • 2021 BTW Best Paper Award
      • 2020 ACM SIGMOD Best Paper Award
      • 2020 ACM Fellow
      • 2019 EDBT Best Paper Award
      • 2017 BTW Best Paper Award
      • 2017 EDBT Best Demonstration Award
      • 2016 ACM SIGMOD Research Highlight Award
      • 2014 VLDB Best Paper Award
      • 2012 IBM Faculty Award
      • 2012 IBM Shared University Research Grant
      • 2010 Hewlett Packard Open Innovation Award
      • 2005 IBM Outstanding Technological Achievement Award
      • 2005 IBM Pat Goldberg Best Paper Award

  • Babel Fish (website)

    Babel Fish (website)

    Yahoo! Babel Fish was a free Web-based machine translation service by Yahoo!. In May 2012 it was replaced by Bing Translator (now Microsoft Translator), to which queries were redirected. Although Yahoo! transitioned its Babel Fish translation services to Bing Translator, it did not sell its translation application to Microsoft outright. As the oldest free online language translator, the service translated text or Web pages in 36 pairs between 13 languages, including English, Simplified Chinese, Traditional Chinese, Dutch, French, German, Greek, Italian, Japanese, Korean, Portuguese, Russian, and Spanish.

    The internet service derived its name from the Babel fish, a fictional species in Douglas Adams's book and radio series The Hitchhiker's Guide to the Galaxy that could instantly translate languages. In turn, the name of the fictional creature refers to the biblical account of the confusion of languages that arose in the city of Babel.

    == History ==

    On December 9, 1997, Digital Equipment Corporation (DEC) and SYSTRAN S.A. launched AltaVista Translation Service at babelfish.altavista.com, which was developed by a team of researchers at DEC. In February 2003, AltaVista was bought by Overture Services, Inc. In July 2003, Overture, in turn, was taken over by Yahoo!.

    The web address for Babel Fish remained at babelfish.altavista.com until May 9, 2008, when the address changed to babelfish.yahoo.com. In 2012, the Web address changed again, this time redirecting babelfish.yahoo.com to www.microsofttranslator.com when Microsoft's Bing Translator replaced Yahoo Babel Fish. As of June 2013, babelfish.yahoo.com no longer redirects to the Microsoft Bing Translator. Instead, it refers directly back to the main Yahoo.com page.

  • Jaime Carbonell

    Jaime Carbonell

    Jaime Guillermo Carbonell (July 29, 1953 – February 28, 2020) was a computer scientist who made seminal contributions to the development of natural language processing tools and technologies. His research in machine translation resulted in the development of several state-of-the-art language translation and artificial intelligence systems. He earned B.S. degrees in Physics and in Mathematics from MIT in 1975 and completed his Ph.D. under Dr. Roger Schank at Yale University in 1979. He joined Carnegie Mellon University as an assistant professor of computer science in 1979 and moved to Pittsburgh. He was affiliated with the Language Technologies Institute, Computer Science Department, Machine Learning Department, and Computational Biology Department at Carnegie Mellon.

    His interests spanned several areas of artificial intelligence, language technologies and machine learning. In particular, his research focused on areas such as text mining (extraction, categorization, novelty detection) and on new theoretical frameworks such as a unified utility-based theory bridging information retrieval, summarization, free-text question-answering and related tasks. He also worked on machine translation, both high-accuracy knowledge-based MT and machine learning for corpus-based MT (such as generalized example-based MT).

    == Career ==

    Carbonell was the Allen Newell Professor of Computer Science and head of the Language Technologies Institute at Carnegie Mellon University. He joined Carnegie Mellon in 1979, and became a key faculty member in the artificial intelligence area. He was appointed full professor in 1987, Newell Chair in 1995, and University Professor in 2012. He completed his undergraduate studies at MIT, where he received dual degrees in Mathematics and Physics. He received his Ph.D. in computer science from Yale University in 1979. At the time of his appointment, Carbonell was the youngest chaired professor in the School of Computer Science at CMU.
    His research spanned several areas of computer science, mostly in artificial intelligence, including machine learning, data and text mining, natural language processing, very-large-scale knowledge bases, translingual information retrieval and automated summarization. He wrote more than 300 technical papers and gave over 500 invited or refereed-paper presentations (colloquia, seminars, panels, conferences, keynotes, etc.).

    He died following a long illness on February 28, 2020. Mona Talat Diab became the director of CMU's Language Technologies Institute in 2023.

    == Research ==

    Carbonell created MMR (maximal marginal relevance) technology for text summarization and informational novelty detection in search engines; invented transformational analogy, a generalized method for case-based reasoning (CBR) to re-use, modify and compose past successful plans for increasingly complex problems; and worked on knowledge-based interlingual machine translation. He was instrumental in setting up the Computational Biolinguistics Program, a joint venture between Carnegie Mellon and the University of Pittsburgh, which combines Language Technologies and Machine Learning to model and predict genomic, proteomic and glycomic 3D structures.

    Carbonell also did work in machine learning. He organized the first four machine learning conferences, starting with CMU in 1981.

    The Language Technologies Institute (LTI), founded and directed by Carbonell, achieved top honors in multiple areas. These areas include machine translation, search engines (including the founding of Lycos by Michael Mauldin, one of Carbonell's PhD students), speech synthesis, and education. LTI remains the original, largest and best-known institute for language technologies, with over $12M in annual funding and 200 researchers (faculty, staff, PhD students, MS students, visiting scholars, etc.).
    Carbonell made major technical contributions in several fields, including:

      (1) Creation of MMR (maximal marginal relevance) technology for text summarization and informational novelty detection in search engines
      (2) Proactive machine learning for multi-source cost-sensitive active learning
      (3) Linked conditional random fields for predicting tertiary and quaternary protein folds
      (4) Symmetric optimal phrasal alignment method for trainable example-based and statistical machine translation
      (5) Series-anomaly modeling for financial fraud detection and syndromic surveillance
      (6) Knowledge-based interlingual machine translation
      (7) Robust case-frame parsing
      (8) Seeded version-space learning
      (9) Invention of transformational and derivational analogy, generalized methods for case-based reasoning (CBR) to re-use, modify and compose past successful plans for increasingly complex problems

    The teams led by Carbonell achieved top honors in many areas such as first scalable high-accuracy interlingual machine translation (1991), first speech-to-speech machine translation (1992), first large-scale spider and search engine (1994), and first trainable, large-scale protein-structure topology predictor (2005).

    Modern machine learning, co-founded by Carbonell, Michalski and Mitchell, is a fundamental enabling technology in search engines, data mining and social networking. Starting in 1980, he co-edited the first three books on ML, launched the ML conferences and was a co-founder and editor-in-chief of ML Journal.

    Carbonell's innovations have led to several successful start-ups: Carnegie Group (AI expert systems), Lycos (web search), Wisdom (financial optimization & ML), Carnegie Speech (spoken-language tutoring), Dynamix (data mining and pattern discovery), and Meaningful Machines (context-based machine translation).
    Carbonell was the founding director of the Language Technologies Institute, described as the preeminent global institution in language studies and unparalleled in size and scope; its model has since been adopted or imitated in Germany (DFKI), Japan (Tokyo Univ.), and the US (Johns Hopkins).

    == Awards and honors ==

      • Okawa Prize, 2015
      • Best paper award, "Translingual Search" (with Yang), International Joint Conference on AI, 1997
      • Allen Newell endowed chair, Carnegie Mellon University, 1995
      • Elected fellow of AAAI, 1991
      • Computer Science teaching award, Carnegie Mellon University, 1987
      • Sperry Fellowship for excellence in AI research, 1986
      • Herbert Simon teaching award, 1986
      • "Recognition of Service" award from the ACM for the SIGART presidency, 1983–1985
      • Provided congressional testimony on machine translation, 1990

    == Selected works ==

    === Books ===

      • 1983. (with Ryszard S. Michalski & Tom M. Mitchell, Eds.) Machine learning: An artificial intelligence approach. Los Altos, CA: Morgan Kaufmann.
      • 1986. (with Ryszard S. Michalski & Tom M. Mitchell, Eds.) Machine learning: An artificial intelligence approach. Vol. II. Los Altos, CA: Morgan Kaufmann.
      • 1986. (with Ryszard S. Michalski & Tom M. Mitchell, Eds.) Machine Learning: A Guide to Current Research. Kluwer Academic Publishers.

    == Contributions ==

      • "Protein Quaternary Fold Recognition Using Conditional Graphical Models," IJCAI 2007 (with Liu et al.)
      • "Context-Based Machine Translation," AMTA 2006 (with Klein et al.)
      • "SCRFs: A New Approach for Protein Fold Recognition," Journal of Computational Biology, 13, 2, 2006 (with Liu et al.)
      • "MT for Resource-Poor Languages Using Elicitation-Based Learning," Machine Translation, 2004
      • "Learning Approaches for Detecting and Tracking News Events," IEEE Trans I.S., 14, 4, 2000 (with Yang)

  • GeneRIF

    GeneRIF

    A GeneRIF or Gene Reference Into Function is a short (255 characters or fewer) statement about the function of a gene. GeneRIFs provide a simple mechanism for allowing scientists to add to the functional annotation of genes described in the Entrez Gene database. In practice, function is construed quite broadly. For example, there are GeneRIFs that discuss the role of a gene in a disease, GeneRIFs that point the viewer towards a review article about the gene, and GeneRIFs that discuss the structure of a gene. However, the stated intent is for GeneRIFs to be about gene function. Currently over half a million GeneRIFs have been created for genes from almost 1000 different species.

    GeneRIFs are always associated with specific entries in the Entrez Gene database. Each GeneRIF has a pointer to the PubMed ID (a type of document identifier) of a scientific publication that provides evidence for the statement made by the GeneRIF. GeneRIFs are often extracted directly from the document that is identified by the PubMed ID, very frequently from its title or from its final sentence.

    GeneRIFs are usually produced by NCBI indexers, but anyone may submit a GeneRIF. To be processed, a valid Gene ID must exist for the specific gene, or the Gene staff must have assigned an overall Gene ID to the species. The latter case is implemented via records in Gene with the symbol NEWENTRY. Once the Gene ID is identified, only three types of information are required to complete a submission:

      • a concise phrase describing a function or functions (less than 255 characters in length, preferably more than a restatement of the title of the paper);
      • a published paper describing that function, implemented by supplying the PubMed ID of a citation in PubMed;
      • a valid e-mail address (which will remain confidential).

    == Example ==

    Here are some GeneRIFs taken from Entrez Gene for GeneID 7157, the human gene TP53. The PubMed document identifiers have been omitted from the examples.
    Note the wide variability with respect to the presence or absence of punctuation and of sentence-initial capital letters.

      • p53 and c-erbB-2 may have independent role in carcinogenesis of gall bladder cancer
      • Degradation of endogenous HIPK2 depends on the presence of a functional p53 protein.
      • p53 codon 72 alleles influence the response to anticancer drugs in cells from aged people by regulating the cell cycle inhibitor p21WAF1
      • Logistic regression analysis showed p53 and COX-2 as dependent predictors in pancreatic carcinogenesis, and a reciprocal relationship to neoplastic progression between p53 and COX-2.

    GeneRIFs are an unusual type of textual genre, and they have recently been the subject of a number of articles from the natural language processing community.
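The three pieces of information required for a submission, together with the Gene ID, lend themselves to a simple pre-submission check. The sketch below is purely illustrative and is not NCBI's submission interface: the `GeneRIFSubmission` class, the `validation_errors` function, the email pattern, and the PubMed ID value are all hypothetical. It takes the 255-character limit as inclusive, following the "255 characters or fewer" wording.

```python
import re
from dataclasses import dataclass

MAX_GENERIF_LENGTH = 255  # assumption: limit treated as inclusive
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose check


@dataclass
class GeneRIFSubmission:
    gene_id: int     # Gene ID of the entry the statement is attached to
    text: str        # concise phrase describing the function
    pubmed_id: int   # PubMed ID of the supporting publication
    email: str       # submitter's contact address


def validation_errors(submission):
    """Return a list of human-readable problems; an empty list means the draft looks complete."""
    errors = []
    if submission.gene_id <= 0:
        errors.append("gene_id must be a positive integer")
    if not submission.text.strip():
        errors.append("text must not be empty")
    elif len(submission.text) > MAX_GENERIF_LENGTH:
        errors.append(f"text exceeds {MAX_GENERIF_LENGTH} characters")
    if submission.pubmed_id <= 0:
        errors.append("pubmed_id must be a positive integer")
    if not EMAIL_PATTERN.match(submission.email):
        errors.append("email does not look like an address")
    return errors


draft = GeneRIFSubmission(
    gene_id=7157,  # TP53, as in the example above
    text="Degradation of endogenous HIPK2 depends on the presence of a functional p53 protein.",
    pubmed_id=12345678,  # placeholder: the real identifiers are omitted in the examples
    email="researcher@example.org",
)
print(validation_errors(draft))
```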

  • Apache cTAKES

    Apache cTAKES

    Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) is an open-source Natural Language Processing (NLP) system that extracts clinical information from the unstructured text of electronic health records. It processes clinical notes, identifying types of clinical named entities: drugs, diseases/disorders, signs/symptoms, anatomical sites and procedures. Each named entity has attributes for the text span, the ontology mapping code, context (family history of, current, unrelated to patient), and negated/not negated.

    cTAKES was built using the UIMA (Unstructured Information Management Architecture) framework and the OpenNLP natural language processing toolkit.

    == Components ==

    Components of cTAKES are specifically trained for the clinical domain, and create rich linguistic and semantic annotations that can be utilized by clinical decision support systems and clinical research. These components include:

      • Named section identifier
      • Sentence boundary detector
      • Rule-based tokenizer
      • Formatted list identifier
      • Normalizer
      • Context dependent tokenizer
      • Part-of-speech tagger
      • Phrasal chunker
      • Dictionary lookup annotator
      • Context annotator
      • Negation detector
      • Uncertainty detector
      • Subject detector
      • Dependency parser
      • Patient smoking status identifier
      • Drug mention annotator

    == History ==

    Development of cTAKES began at the Mayo Clinic in 2006. The development team, led by Dr. Guergana Savova and Dr. Christopher Chute, included physicians, computer scientists and software engineers. After its deployment, cTAKES became an integral part of Mayo's clinical data management infrastructure, processing more than 80 million clinical notes.

    When Dr. Savova moved to Boston Children's Hospital in early 2010, the core development team grew to include members there.
    Further external collaborations include:

      • University of Colorado
      • Brandeis University
      • University of Pittsburgh
      • University of California at San Diego

    Such collaborations have extended cTAKES' capabilities into other areas such as temporal reasoning, clinical question answering, and coreference resolution for the clinical domain.

    In 2010, cTAKES was adopted by the i2b2 program and is a central component of the SHARP Area 4.

    In 2013, cTAKES had its first release as an Apache Software Foundation incubator project: cTAKES 3.0.

    In March 2013, cTAKES became an Apache Software Foundation Top Level Project (TLP).

  • The Best Free AI Resume Builder for Beginners

    The Best Free AI Resume Builder for Beginners

    Curious about the best AI resume builder? An AI resume builder is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI resume builder slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

  • Is an AI Customer-support Bot Worth It in 2026?

    Is an AI Customer-support Bot Worth It in 2026?

    In search of the best AI customer-support bot? An AI customer-support bot is software that uses machine learning to help you get more done — it turns a rough idea into a polished result in seconds. When choosing one, weigh output quality, pricing, export formats, and how well it fits the tools you already use. Whether you are a beginner or a pro, the right AI customer-support bot slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.

  • PANGU (software)

    PANGU (software)

    PANGU (Planet and Asteroid Natural scene Generation Utility) is a computer graphics utility whose development was funded by ESA and carried out by the University of Dundee. It generates scenes of planets, moons, asteroids, spacecraft and rovers. The main purpose of the tool is to test and validate navigation techniques based on the processing of images coming from on-board sensors, such as a camera or imaging LIDAR on a planetary lander.

  • Optical Character Recognition (Unicode block)

    Optical Character Recognition (Unicode block)

    Optical Character Recognition is a Unicode block containing signal characters for OCR and MICR standards.

    == Block ==

    == Subheadings ==

    The Optical Character Recognition block has three informal subheadings (groupings) within its character collection: OCR-A, MICR, and OCR.

    === OCR-A ===

    The OCR-A subheading contains six characters taken from the OCR-A font described in the ISO 1073-1:1976 standard: U+2440 ⑀ OCR HOOK, U+2441 ⑁ OCR CHAIR, U+2442 ⑂ OCR FORK, U+2443 ⑃ OCR INVERTED FORK, U+2444 ⑄ OCR BELT BUCKLE, and U+2445 ⑅ OCR BOW TIE. The OCR bow tie is given the informative alias "unique asterisk". The hook, chair and fork, in addition to a long vertical bar, are included in the most basic "numeric" implementation level of OCR-A, which includes digits but excludes letters and conventional punctuation. By contrast, the most basic implementation level of OCR-B instead includes the digits, plus sign, less-than sign, greater-than sign, long vertical bar and seven of the capital letters; as such, there are no characters specific to OCR-B in the Optical Character Recognition block.

    === MICR ===

    The MICR subheading contains four punctuation characters for bank cheque identifiers, taken from the magnetic ink character recognition E-13B font (codified in the ISO 1004:1995 standard): U+2446 ⑆ OCR BRANCH BANK IDENTIFICATION, U+2447 ⑇ OCR AMOUNT OF CHECK, U+2448 ⑈ OCR DASH, and U+2449 ⑉ OCR CUSTOMER ACCOUNT NUMBER. The latter two characters are misnamed: their names were inadvertently switched when they were named in the 1993 (first) edition of ISO/IEC 10646, a mistake which had been present since Unicode 1.0.0. Although their formal names remain unchanged due to the Unicode stability policy, they both have corrected normative aliases: U+2448 ⑈ is MICR ON US SYMBOL, and U+2449 ⑉ is MICR DASH SYMBOL (the standard notes that "the Unicode character names include several misnomers").
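The formal character names quoted above, including the two switched MICR names, can be inspected with Python's standard `unicodedata` module. This is a small sketch: the block range U+2440 to U+245F and the helper name `assigned_characters` are not taken from the text above, and `unicodedata.name` reports only the formal names, not the corrected aliases.

```python
import unicodedata

# Optical Character Recognition block: U+2440..U+245F (range stated here as an
# assumption of the example; only part of it is assigned).
OCR_BLOCK = range(0x2440, 0x2460)


def assigned_characters():
    """Return (code point label, formal name) pairs for assigned characters in the block."""
    rows = []
    for code_point in OCR_BLOCK:
        name = unicodedata.name(chr(code_point), None)  # None for unassigned code points
        if name is not None:
            rows.append((f"U+{code_point:04X}", name))
    return rows


for label, name in assigned_characters():
    print(label, name)
```

Because of the stability policy mentioned above, the listing still shows OCR DASH for U+2448 and OCR CUSTOMER ACCOUNT NUMBER for U+2449.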
    These symbols had previously been encoded in the ISO-IR-98 encoding defined by ISO 2033:1983, in which they were simply named SYMBOL ONE through SYMBOL FOUR. All four characters have informative aliases in the Unicode charts: "transit", "amount", "on us", and "dash" respectively.

    === OCR ===

    The OCR subheading consists of a single character: U+244A ⑊ OCR DOUBLE BACKSLASH.

    == History ==

    The following Unicode-related documents record the purpose and process of defining specific characters in the Optical Character Recognition block:

  • Jean Véronis

    Jean Véronis

    Jean Véronis (3 June 1955 – 8 September 2013) was a French linguist, computer scientist and blogger, and a research professor at Aix-Marseille University. His research interests included natural language processing, text mining and standardisation. He was a founder of the field that is now called digital humanities. In 2006, his blog was listed among the 15 most influential by Le Monde.

  • Katz's back-off model

    Katz's back-off model

    Katz back-off is a generative n-gram language model that estimates the conditional probability of a word given its history in the n-gram. It accomplishes this estimation by backing off through progressively shorter history models under certain conditions. By doing so, the model with the most reliable information about a given history is used to provide better results.

    The model was introduced in 1987 by Slava M. Katz. Prior to that, n-gram language models were constructed by training individual models for different n-gram orders using maximum likelihood estimation and then interpolating them together.

    == Method ==

    The equation for Katz's back-off model is:

    $$P_{bo}(w_i \mid w_{i-n+1} \cdots w_{i-1}) = \begin{cases} d_{w_{i-n+1} \cdots w_i} \dfrac{C(w_{i-n+1} \cdots w_{i-1} w_i)}{C(w_{i-n+1} \cdots w_{i-1})} & \text{if } C(w_{i-n+1} \cdots w_i) > k \\[10pt] \alpha_{w_{i-n+1} \cdots w_{i-1}} P_{bo}(w_i \mid w_{i-n+2} \cdots w_{i-1}) & \text{otherwise} \end{cases}$$

    where

      • C(x) = number of times x appears in training
      • $w_i$ = ith word in the given context

    Essentially, this means that if the n-gram has been seen more than k times in training, the conditional probability of a word given its history is proportional to the maximum likelihood estimate of that n-gram. Otherwise, the conditional probability is equal to the back-off conditional probability of the (n − 1)-gram.

    The more difficult part is determining the values for k, d and α.

    $k$ is the least important of the parameters. It is usually chosen to be 0. However, empirical testing may find better values for k.

    $d$ is typically the amount of discounting found by Good–Turing estimation.
In other words, if Good–Turing estimates C {\displaystyle C} as C ∗ {\displaystyle C^{}} , then d = C ∗ C {\displaystyle d={\frac {C^{}}{C}}} To compute α {\displaystyle \alpha } , it is useful to first define a quantity β, which is the left-over probability mass for the (n − 1)-gram: β w i − n + 1 ⋯ w i − 1 = 1 − ∑ { w i : C ( w i − n + 1 ⋯ w i ) > k } d w i − n + 1 ⋯ w i C ( w i − n + 1 ⋯ w i − 1 w i ) C ( w i − n + 1 ⋯ w i − 1 ) {\displaystyle \beta _{w_{i-n+1}\cdots w_{i-1}}=1-\sum _{\{w_{i}:C(w_{i-n+1}\cdots w_{i})>k\}}d_{w_{i-n+1}\cdots w_{i}}{\frac {C(w_{i-n+1}\cdots w_{i-1}w_{i})}{C(w_{i-n+1}\cdots w_{i-1})}}} Then the back-off weight, α, is computed as follows: α w i − n + 1 ⋯ w i − 1 = β w i − n + 1 ⋯ w i − 1 ∑ { w i : C ( w i − n + 1 ⋯ w i ) ≤ k } P b o ( w i ∣ w i − n + 2 ⋯ w i − 1 ) {\displaystyle \alpha _{w_{i-n+1}\cdots w_{i-1}}={\frac {\beta _{w_{i-n+1}\cdots w_{i-1}}}{\sum _{\{w_{i}:C(w_{i-n+1}\cdots w_{i})\leq k\}}P_{bo}(w_{i}\mid w_{i-n+2}\cdots w_{i-1})}}} The above formula only applies if there is data for the "(n − 1)-gram". If not, the algorithm skips n-1 entirely and uses the Katz estimate for n-2. (and so on until an n-gram with data is found) == Discussion == This model generally works well in practice, but fails in some circumstances. For example, suppose that the bigram "a b" and the unigram "c" are very common, but the trigram "a b c" is never seen. Since "a b" and "c" are very common, it may be significant (that is, not due to chance) that "a b c" is never seen. Perhaps it's not allowed by the rules of the grammar. Instead of assigning a more appropriate value of 0, the method will back off to the bigram and estimate P(c | b), which may be too high.
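
    The back-off rule from the Method section can be sketched for the bigram case (n = 2) as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the discount d is a fixed constant (0.75) standing in for the Good–Turing ratio C*/C, the lowest-order model is a plain maximum likelihood unigram, and the function names `train` and `katz_bigram` are hypothetical.

```python
from collections import Counter


def train(tokens):
    """Count unigrams and adjacent bigrams in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def katz_bigram(w, prev, unigrams, bigrams, d=0.75, k=0):
    """Return P_bo(w | prev) for a bigram Katz back-off model.

    ASSUMPTION: d is a constant discount, used here in place of the
    Good-Turing ratio C*/C described in the text.
    """
    total = sum(unigrams.values())

    def p_unigram(x):
        # Maximum likelihood unigram estimate (the lowest-order model).
        return unigrams[x] / total

    c_history = unigrams[prev]
    if c_history == 0:
        # No data for this history: skip straight to the shorter model.
        return p_unigram(w)

    if bigrams[(prev, w)] > k:
        # Seen more than k times: discounted maximum likelihood estimate.
        return d * bigrams[(prev, w)] / c_history

    # beta: probability mass left over after discounting the seen bigrams.
    beta = 1.0 - sum(
        d * bigrams[(prev, x)] / c_history
        for x in unigrams
        if bigrams[(prev, x)] > k
    )
    # alpha: spread beta over the words NOT seen more than k times after prev.
    unseen_mass = sum(p_unigram(x) for x in unigrams if bigrams[(prev, x)] <= k)
    alpha = beta / unseen_mass if unseen_mass > 0 else 0.0
    return alpha * p_unigram(w)


if __name__ == "__main__":
    unigrams, bigrams = train("a b c a b d a c".split())
    for word in sorted(unigrams):
        print(word, katz_bigram(word, "a", unigrams, bigrams))
```

    Because α rescales the unigram estimates so that they fill exactly the mass β, the probabilities over the whole vocabulary sum to 1 for any history that appears in training.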

    Read more →
  • Zesta

    Zesta

    Zesta is an online food ordering and delivery platform operating across the African region. Formerly known as Square Eats, the company rebranded to Zesta in 2025. Zesta connects customers with restaurants and stores, offering delivery of food, groceries, parcels and other essentials. == History == Zesta was originally founded as Square Eats in 2020 by twin brothers Henry Newman and Randall Newman when they were 21 years old. It was launched in Gaborone, Botswana, and quickly gained traction as a leading food delivery service in the country. The company halted operations and took a strategic decision to reinvent the business in 2022. In 2025, the company announced its rebranding to Zesta, highlighting its commitment to evolving beyond food delivery to become a super app. === COVID-19 initiative === During the COVID-19 pandemic, Zesta (then Square Eats) implemented measures to ensure safety and hygiene, including providing free gloves and hand sanitizer to drivers and introducing contactless delivery options. These efforts positioned the platform as a trusted service during the pandemic. == Service == Zesta facilitates delivery from a wide range of merchant partners via a smartphone app, available on iOS and Android platforms, or through its website. Customers can browse their favorite restaurants, place orders, and have meals delivered to their doorstep efficiently.

    Read more →
  • Is an AI Sales Assistant Worth It in 2026?

    Is an AI Sales Assistant Worth It in 2026?

    Shopping for the best AI sales assistant? An AI sales assistant is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI sales assistant slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →
  • OCR-A

    OCR-A

    OCR-A is a font issued in 1966 and first implemented in 1968. In the early days of computer optical character recognition, a special font was needed that could be recognized not only by the computers of that day, but also by humans. OCR-A uses simple, thick strokes to form recognizable characters. The font is monospaced (fixed-width), with the printer required to place glyphs 0.254 cm (0.10 inch) apart, and the reader required to accept any spacing between 0.2286 cm (0.09 inch) and 0.4572 cm (0.18 inch). == Standardization == The OCR-A font was standardized by the American National Standards Institute (ANSI) as ANSI X3.17-1981. X3.4 has since become the INCITS and the OCR-A standard is now called ISO 1073-1:1976. == Implementations == In 1968, American Type Founders produced OCR-A, one of the first optical character recognition typefaces to meet the criteria set by the U.S. Bureau of Standards. The design is simple so that it can be easily read by a machine, but it is more difficult for the human eye to read. As metal type gave way to computer-based typesetting, Tor Lillqvist used Metafont to describe the OCR-A font. That definition was subsequently improved by Richard B. Wales. Their work is available from CTAN. To make the free version of the font more accessible to users of Microsoft Windows, John Sauter converted the Metafont definitions to TrueType using potrace and FontForge in 2004. In 2007, Gürkan Sengün created a Debian package from this implementation. In 2008, Luc Devroye corrected the vertical positioning in John Sauter's implementation, and fixed the name of lower case z. Independently, Matthew Skala used mftrace to convert the Metafont definitions to TrueType format in 2006. In 2011 he released a new version created by rewriting the Metafont definitions to work with METATYPE1, generating outlines directly without an intermediate tracing step. On September 27, 2012, he updated his implementation to version 0.2. 
    In addition to these free implementations of OCR-A, there are also implementations sold by several vendors. As a joke, Tobias Frere-Jones in 1995 created Estupido-Espezial, a redesign with swashes and a long s. It was used in a "technology"-themed section of Rolling Stone. Maxitype designed the OCR-X typeface, which is based on OCR-A and adds OpenType features and alien/technology-themed dingbats; it is available in six weights (Thin, Light, Regular, Medium, Bold, Black). Japanese typeface foundry Visual Design Laboratory (VDL) designed two typefaces based on the OCR-A typeface: one for Simplified Chinese characters named Jieyouti and one for Japanese characters named Yota G (ヨタG), both available in five weights (Light, Regular, Medium, Semi Bold, Bold). == Use == Although optical character recognition technology has advanced to the point where such simple fonts are no longer necessary, the OCR-A font has remained in use. Its usage remains widespread in the encoding of checks around the world. Some lock box companies still insist that the account number and amount owed on a bill return form be printed in OCR-A. Also, because of its unusual look, it is sometimes used in advertising and display graphics. Notably, it is used for the subtitles in films and television series such as Blacklist and for the main titles in The Pretender. Additionally, OCR-A is used in the titles and subtitles for the films 13 Hours: The Secret Soldiers of Benghazi and Hoppers. It was also used for the logo, branding, and marketing material of the children's toy line Hexbug. == Code points == A font is a set of character shapes, or glyphs. For a computer to use a font, each glyph must be assigned a code point in a character set. When OCR-A was being standardized, the usual character coding was the American Standard Code for Information Interchange or ASCII. 
    Not all of the glyphs of OCR-A fit into ASCII, and for five of the characters there were alternate glyphs, which might have suggested the need for a second font. However, for convenience and efficiency all of the glyphs were expected to be accessible in a single font using ASCII coding, with the additional characters placed at code points that would otherwise have been unused. The modern descendant of ASCII is Unicode, also known as ISO 10646. Unicode contains ASCII and has special provisions for OCR characters, so some implementations of OCR-A have looked to Unicode for guidance on character code assignments. === Pre-Unicode standard representation === The ISO standard ISO 2033:1983, and the corresponding Japanese Industrial Standard JIS X 9010:1984 (originally JIS C 6229–1984), define character encodings for OCR-A, OCR-B and E-13B. For OCR-A, they define a modified 7-bit ASCII set (also known by its ISO-IR number ISO-IR-91) including only uppercase letters, digits, a subset of the punctuation and symbols, and some additional symbols. Codes which are redefined relative to ASCII, as opposed to simply omitted, are listed below: Additionally, the long vertical mark is encoded at 0x7C, corresponding to the ASCII vertical bar (|). === Dedicated OCR-A characters in Unicode === The following characters have been defined for control purposes and are now in the "Optical Character Recognition" Unicode range 2440–245F: === Space, digits, and unaccented letters === All implementations of OCR-A use U+0020 for space, U+0030 through U+0039 for the decimal digits, U+0041 through U+005A for the unaccented upper case letters, and U+0061 through U+007A for the unaccented lower case letters. === Regular characters === In addition to the digits and unaccented letters, many of the characters of OCR-A have obvious code points in ASCII. Of those that do not, most, including all of OCR-A's accented letters, have obvious code points in Unicode. 
    === Remaining characters === Linotype coded the remaining characters of OCR-A as follows: === Additional characters === The fonts that descend from the work of Tor Lillqvist and Richard B. Wales define four characters not in OCR-A to fill out the ASCII character set. These shapes use the same style as the OCR-A character shapes. They are: Linotype also defines additional characters. === Exceptions === Some implementations do not use the above code point assignments for some characters. ==== PrecisionID ==== The PrecisionID implementation of OCR-A has the following non-standard code points: OCR Hook at U+007E; OCR Chair at U+00C1; OCR Fork at U+00C2; Euro Sign at U+0080. ==== Barcodesoft ==== The Barcodesoft implementation of OCR-A has the following non-standard code points: OCR Hook at U+0060; OCR Chair at U+007E; OCR Fork at U+005F; Long Vertical Mark at U+007C (agrees with Linotype); Character Erase at U+0008. ==== Morovia ==== The Morovia implementation of OCR-A has the following non-standard code points: OCR Hook at U+007E (agrees with PrecisionID); OCR Chair at U+00F0; OCR Fork at U+005F (agrees with Barcodesoft); Long Vertical Mark at U+007C (agrees with Linotype). ==== IDAutomation ==== The IDAutomation implementation of OCR-A has the following non-standard code points: OCR Hook at U+007E (agrees with PrecisionID); OCR Chair at U+00C1 (agrees with PrecisionID); OCR Fork at U+00C2 (agrees with PrecisionID); OCR Belt Buckle at U+00C3. == Sellers of font standards == A hardcopy of ISO 1073-1:1976, distributed through ANSI, is available from Amazon.com. ISO 1073-1 is also available from Techstreet, which distributes standards for ANSI and ISO.
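
    The vendor exceptions listed above can be handled by remapping text to the dedicated Unicode OCR characters. The sketch below is illustrative only: the table `VENDOR_CODE_POINTS` and the function `to_unicode_ocr` are hypothetical names, the vendor code points are copied from the lists above, and the standard characters are looked up by their Unicode names (OCR HOOK, OCR CHAIR, OCR FORK, OCR BELT BUCKLE) with Python's `unicodedata` module. Entries such as the Long Vertical Mark, Character Erase, and Euro Sign are left out.

```python
import unicodedata

# Vendor-specific code point -> Unicode character name, per the lists above.
VENDOR_CODE_POINTS = {
    "PrecisionID": {0x007E: "OCR HOOK", 0x00C1: "OCR CHAIR", 0x00C2: "OCR FORK"},
    "Barcodesoft": {0x0060: "OCR HOOK", 0x007E: "OCR CHAIR", 0x005F: "OCR FORK"},
    "Morovia": {0x007E: "OCR HOOK", 0x00F0: "OCR CHAIR", 0x005F: "OCR FORK"},
    "IDAutomation": {
        0x007E: "OCR HOOK",
        0x00C1: "OCR CHAIR",
        0x00C2: "OCR FORK",
        0x00C3: "OCR BELT BUCKLE",
    },
}


def to_unicode_ocr(text, vendor):
    """Replace a vendor's non-standard code points with the Unicode OCR characters.

    Hypothetical helper: it assumes `text` was written for that vendor's font,
    so characters such as '~' are meant as OCR symbols, not ASCII punctuation.
    """
    table = {
        code_point: unicodedata.lookup(name)
        for code_point, name in VENDOR_CODE_POINTS[vendor].items()
    }
    # str.translate accepts a dict mapping code points to replacement strings.
    return text.translate(table)


if __name__ == "__main__":
    converted = to_unicode_ocr("12345~", "PrecisionID")
    print([f"U+{ord(ch):04X}" for ch in converted])
```

    Note that the same input character maps differently per vendor: `~` (U+007E) is an OCR Hook in the PrecisionID, Morovia, and IDAutomation fonts but an OCR Chair in the Barcodesoft font, so the vendor must be known before converting.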

    Read more →