AI Chatbot Interface

AI Chatbot Interface — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Immediate mode (computer graphics)

    Immediate mode is an API design pattern in computer graphics libraries, in which the client calls directly cause rendering of graphics objects to the display, or in which the data to describe rendering primitives is inserted frame by frame directly from the client into a command list (in the case of immediate mode primitive rendering), without the use of extensive indirection – thus immediate – to retained resources. It does not preclude the use of double-buffering. Retained mode is an alternative approach. Historically, retained mode has been the dominant style in GUI libraries; however, both can coexist in the same library and are not necessarily exclusive in practice. == Overview == In immediate mode, the scene (complete object model of the rendering primitives) is retained in the memory space of the client, instead of the graphics library. This implies that in an immediate mode application, the lists of graphical objects to be rendered are kept by the client and are not saved by the graphics library API. The application must re-issue all drawing commands required to describe the entire scene each time a new frame is required, regardless of actual changes. This method provides on the one hand a maximum of control and flexibility to the application program, but on the other hand it also generates continuous work load on the CPU. Examples of immediate mode rendering systems include Direct2D, OpenGL and Quartz. There are some immediate mode GUIs that are particularly suitable when used in conjunction with immediate mode rendering systems. == Immediate mode primitive rendering == Primitive vertex attribute data may be inserted frame by frame into a command buffer by a rendering API. This involves significant bandwidth and processor time (especially if the graphics processing unit is on a separate bus), but may be advantageous for data generated dynamically by the CPU. 
It is less common since the advent of increasingly versatile shaders, with which a graphics processing unit may generate increasingly complex effects without the need for CPU intervention. == Immediate mode rendering with vertex buffers == Although drawing commands have to be re-issued for each new frame, modern systems using this method are generally able to avoid the unnecessary duplication of more memory-intensive display data by referring to that unchanging data (via indirection) (e.g. textures and vertex buffers) in the drawing commands. == Immediate mode GUI == Graphical user interfaces traditionally use retained mode-style API design, but immediate mode GUIs instead use an immediate mode-style API design, in which user code directly specifies the GUI elements to draw in the user input loop. For example, rather than having a CreateButton() function that a user would call once to instantiate a button, an immediate-mode GUI API may have a DoButton() function which should be called whenever the button should be on screen. The technique was developed by Casey Muratori in 2002. Prominent implementations include Omar Cornut's Dear ImGui in C++, Nic Barker's Clay in C and Micha Mettke's Nuklear in C.
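The contrast between a retained-mode CreateButton() and an immediate-mode DoButton() can be sketched in a few lines. This is a minimal illustration, not the API of Dear ImGui, Clay, or Nuklear: the `Frame` type, `do_button`, `run_frame`, and the button labels are all hypothetical names chosen for the example, and "drawing" is modeled as appending to a per-frame command list.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Input for one frame plus the draw commands issued during it (hypothetical type)."""
    mouse_x: int
    mouse_y: int
    mouse_clicked: bool
    draw_commands: list = field(default_factory=list)


def do_button(frame, label, x, y, w, h):
    """Issue a button for this frame only and report whether it was clicked."""
    hovered = x <= frame.mouse_x < x + w and y <= frame.mouse_y < y + h
    # Nothing is retained by the "library": the command exists for this frame only.
    frame.draw_commands.append(("button", label, x, y, w, h, hovered))
    return hovered and frame.mouse_clicked


def run_frame(frame, state):
    """Application code: re-describe the whole UI every frame from its own state."""
    if do_button(frame, "Increment", 10, 10, 100, 30):
        state["count"] += 1
    # The Reset button is on screen only while the call is made.
    if state["count"] > 0 and do_button(frame, "Reset", 10, 50, 100, 30):
        state["count"] = 0
    return frame.draw_commands


# Simulated input loop: click Increment, idle, then click Reset.
state = {"count": 0}
for x, y, clicked in [(20, 20, True), (0, 0, False), (20, 60, True), (0, 0, False)]:
    commands = run_frame(Frame(x, y, clicked), state)
    print(state["count"], [c[1] for c in commands])
```

The application keeps the only persistent state (`state`), and a widget disappears as soon as its call is skipped, which is the property the DoButton() description relies on.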

  • Is an AI Voice Assistant Worth It in 2026?

    Trying to pick the best AI voice assistant? An AI voice assistant is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI voice assistant slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

  • DALL-E

    DALL-E, DALL-E 2, and DALL-E 3 (stylised DALL·E) are text-to-image models developed by OpenAI using deep learning methodologies to generate digital images from natural language descriptions known as prompts. The first version of DALL-E was announced in January 2021. In the following year, its successor DALL-E 2 was released. DALL-E 3 was released natively into ChatGPT for ChatGPT Plus and ChatGPT Enterprise customers in October 2023, with availability via OpenAI's API and "Labs" platform provided in early November. Microsoft implemented the model in Bing's Image Creator tool and plans to implement it into their Designer app. With Bing's Image Creator tool, Microsoft Copilot runs on DALL-E 3. In March 2025, DALL-E-3 was replaced in ChatGPT by GPT Image's native image-generation capabilities. == History and background == DALL-E was revealed by OpenAI in a blog post on 5 January 2021, and uses a version of GPT-3 modified to generate images. On 6 April 2022, OpenAI announced DALL-E 2, a successor designed to generate more realistic images at higher resolutions that "can combine concepts, attributes, and styles". On 20 July 2022, DALL-E 2 entered into a beta phase with invitations sent to 1 million waitlisted individuals; users could generate a certain number of images for free every month and may purchase more. Access had previously been restricted to pre-selected users for a research preview due to concerns about ethics and safety. On 28 September 2022, DALL-E 2 was opened to everyone and the waitlist requirement was removed. In September 2023, OpenAI announced their latest image model, DALL-E 3, capable of understanding "significantly more nuance and detail" than previous iterations. In early November 2022, OpenAI released DALL-E 2 as an API, allowing developers to integrate the model into their own applications. Microsoft unveiled their implementation of DALL-E 2 in their Designer app and Image Creator tool included in Bing and Microsoft Edge. 
The API operates on a cost-per-image basis, with prices varying depending on image resolution. Volume discounts are available to companies working with OpenAI's enterprise team. The software's name is a portmanteau of the names of the animated Pixar robot character WALL-E and the Spanish surrealist artist Salvador Dalí. In February 2024, OpenAI began adding watermarks to DALL-E generated images, containing metadata in the C2PA (Coalition for Content Provenance and Authenticity) standard promoted by the Content Authenticity Initiative. == Technology == The first generative pre-trained transformer (GPT) model was initially developed by OpenAI in 2018, using a Transformer architecture. The first iteration, GPT-1, was scaled up to produce GPT-2 in 2019; in 2020, it was scaled up again to produce GPT-3, with 175 billion parameters. === DALL-E === DALL-E has three components: a discrete VAE, an autoregressive decoder-only Transformer model (12 billion parameters) similar to GPT-3, and a CLIP pair of image encoder and text encoder. The discrete VAE can convert an image to a sequence of tokens, and conversely, convert a sequence of tokens back to an image. This is necessary as the Transformer model does not directly process image data. The input to the Transformer model is a tokenised image caption followed by tokenised image patches. The image caption is in English, tokenised by byte pair encoding (vocabulary size 16384), and can be up to 256 tokens long. Each image is a 256×256 RGB image, divided into a 32×32 grid of patches of 8×8 pixels each. Each patch is then converted by a discrete variational autoencoder to a token (vocabulary size 8192). DALL-E was developed and announced to the public in conjunction with CLIP (Contrastive Language-Image Pre-training). CLIP is a separate model based on contrastive learning that was trained on 400 million pairs of images with text captions scraped from the Internet. 
Its role is to "understand and rank" DALL-E's output by predicting which caption from a list of 32,768 captions randomly selected from the dataset (of which one was the correct answer) is most appropriate for an image. A trained CLIP pair is used to filter a larger initial list of images generated by DALL-E to select the image that is closest to the text prompt. === DALL-E 2 === DALL-E 2 uses 3.5 billion parameters, a smaller number than its predecessor. Instead of an autoregressive Transformer, DALL-E 2 uses a diffusion model conditioned on CLIP image embeddings, which, during inference, are generated from CLIP text embeddings by a prior model. This is the same architecture as that of Stable Diffusion, released a few months later. === DALL-E 3 === While a technical report was written for DALL-E 3, it does not include training or implementation details of the model, instead focusing on the improved prompt following capabilities developed for DALL-E 3. == Capabilities == DALL-E can generate imagery in multiple styles, including photorealistic imagery, paintings, and emoji. It can "manipulate and rearrange" objects in its images, and can correctly place design elements in novel compositions without explicit instruction. Thom Dunn writing for BoingBoing remarked that "For example, when asked to draw a daikon radish blowing its nose, sipping a latte, or riding a unicycle, DALL-E often draws the handkerchief, hands, and feet in plausible locations." DALL-E showed the ability to "fill in the blanks" to infer appropriate details without specific prompts, such as adding Christmas imagery to prompts commonly associated with the celebration, and appropriately placed shadows to images that did not mention them. Furthermore, DALL-E exhibits a broad understanding of visual and design trends. DALL-E can produce images for a wide variety of arbitrary descriptions from various viewpoints with only rare failures. 
Mark Riedl, an associate professor at the Georgia Tech School of Interactive Computing, found that DALL-E could blend concepts (described as a key element of human creativity). Its visual reasoning ability is sufficient to solve Raven's Matrices (visual tests often administered to humans to measure intelligence). DALL-E 3 follows complex prompts with more accuracy and detail than its predecessors, and is able to generate more coherent and accurate text. DALL-E 3 is integrated into ChatGPT Plus. === Image modification === Given an existing image, DALL-E 2 and DALL-E 3 can produce "variations" of the image as individual outputs based on the original, as well as edit the image to modify or expand upon it. The "inpainting" and "outpainting" abilities of these models use context from an image to fill in missing areas using a medium consistent with the original, following a given prompt. For example, this can be used to insert a new subject into an image, or expand an image beyond its original borders. According to OpenAI, "Outpainting takes into account the image’s existing visual elements — including shadows, reflections, and textures — to maintain the context of the original image." === Technical limitations === DALL-E 2's language understanding has limits. It is sometimes unable to distinguish "A yellow book and a red vase" from "A red book and a yellow vase" or "A panda making latte art" from "Latte art of a panda". It generates images of an astronaut riding a horse when presented with the prompt "a horse riding an astronaut". It also fails to generate the correct images in a variety of circumstances. Requesting more than three objects, negation, numbers, and connected sentences may result in mistakes, and object features may appear on the wrong object. Additional limitations include generating text, ambigrams and other forms of typography, which often results in dream-like gibberish. 
The model also has a limited capacity to address scientific information, such as astronomy or medical imagery. == Ethical concerns == DALL-E 2's reliance on public datasets influences its results and leads to algorithmic bias in some cases, such as generating higher numbers of men than women for requests that do not mention gender. DALL-E 2's training data was filtered to remove violent and sexual imagery, but this was found to increase bias in some cases such as reducing the frequency of women being generated. OpenAI hypothesise that this may be because women were more likely to be sexualised in training data which caused the filter to influence results. In September 2022, OpenAI confirmed to The Verge that DALL-E invisibly inserts phrases into user prompts to address bias in results; for instance, "black man" and "Asian woman" are inserted into prompts that do not specify gender or race. OpenAI claims to address concerns for potential "racy content" – containing nudity or sexual content generation, with DALL-E 3 through input/output filters, blocklists, ChatGPT refusals, and model level interventions. However, DALL-E 3 continues to disproportionally represent people as White, female, and youthful. Users are able to somewhat remedy

  • The Best Free AI Customer-support Bot for Beginners

    Shopping for the best AI customer-support bot? An AI customer-support bot is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI customer-support bot slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

  • Logical Machine Corporation

    Logical Machine Corporation (LOMAC) was an American computer company active from the mid-1970s to the 1980s and based in the San Francisco Bay Area. It was founded as John Peers and Company by the British entrepreneur John Peers in 1974. LOMAC developed the ADAM, a minicomputer which ran a specialized compiler for the company's natural English programming language. Throughout the late 1970s, the company acquired several technology firms, including Byte, Inc., the owner of the Byte Shop retail chain. Despite its unique approach to computing and earning $5 million in revenue in 1977, LOMAC struggled as the industry began to standardize around the IBM Personal Computer (IBM PC). Following Peers's departure in 1980, the company rebranded as Logical Business Machines, Inc. (LBM, or simply Logical), and attempted to pivot toward IBM PC–compatible hardware. However, financial difficulties led to the company filing for Chapter 11 bankruptcy in 1984. After emerging from bankruptcy in 1985 with new investment, Logical ceased hardware manufacturing to focus exclusively on software development and value-added reselling. == History == John Peers (born 1942) founded Logical Machine Corporation as John Peers and Company in September 1974. The company originally occupied a 4,500-square-foot office in Burlingame, California. The company was Peers' fourth; he had recently sold off Allied Business Systems of London to Trafalgar House in 1974. Peers sought to set up manufacturing in an agricultural zone in Ukiah, California. Following a delay, caused in part by concerned residents, a 30,000-square-foot plant was raised in Burke Hill, three miles south of Ukiah. The Ukiah plant was built to mass manufacture the company's ADAM minicomputer. The ADAM computer ran a specialized compiler for the company's natural English programming language; that is to say, the programming language attempted to closely emulate English syntax. 
Prototypes of the ADAM were built in May 1974, based on specifications devised in October 1973. Peers had yet to patent the technology as of June 1975. The ADAM's central processing unit was bolted onto a 7-by-6-foot L-shaped desk, on which rested its terminal. Twenty units of the ADAM were installed between April 1975 and February 1976, out of a backlog of orders for 3,500 from 500 clients, manufactured out of the company's Burlingame headquarters. The ADAM cost US$40,000. A controversial print advertisement featuring a naked woman seated at an ADAM terminal (as a pastiche of Adam and Eve) was recalled in early 1976 as a result of outcry from the National Organization for Women. The company changed its name to Logical Machine Corporation (LOMAC) in October 1976 and moved its headquarters to a 26,000-square-foot building in Sunnyvale, California, in anticipation of a ramping up of orders for the ADAM. The company originally occupied half of the building; it later purchased the other half from the tenant in July 1977 to double its manufacturing output. For fiscal year 1977, the company earned $5 million in revenue. In December 1977, LOMAC acquired Byte, Inc. (the proprietor of The Byte Shop, the first computer retail chain) from Paul Terrell and Boyd Wilson for an unspecified amount. The Byte Shop had 65 locations in the San Francisco Bay Area in 1978; it catered mainly to hobbyists with low-cost microcomputer kits, in contrast to the high cost of LOMAC's ADAM. By July 1978, however, LOMAC was able to reduce the price of the ADAM to $15,000. The company by that point had shipped its 50th ADAM and expanded to 14 countries. Also in 1978, LOMAC acquired Mass Memory (a high-tech optical storage company based in Phoenix, Arizona, whose products had storage capacities on the order of gigabytes and terabytes) and Centigram, makers of the Mike, a computer with speech recognition. Later that year, the company introduced Tina, a low-cost version of the ADAM. 
LOMAC suffered losses that year and appointed Jerry Brandt to the board of directors, naming him chief operating officer, in August 1978. Brandt had Logical absorb Mass Memory and Centigram into the parent operations, shutting down their respective plants in the process, converted 10 Byte Shops to franchises and opened 25 more franchised Byte locations, and stopped direct sales of LOMAC's business computer products. By the beginning of 1979, LOMAC was profitable once more, and Brandt was let go from LOMAC. Peers left LOMAC in 1980, following a slump in the company's sales. He became an executive director of the United States Robotics Society, a consortium for industrial automation companies, that year. Following Peers' departure, LOMAC changed its name to Logical Business Machines, adopting the name of its European subsidiary. In 1983, the company announced a 16-bit clone of the IBM PC, called the Logical L-XT, which featured a 10-MB hard drive, a 320-KB floppy drive, 192 KB of RAM, and a real-time clock, and shipped with various software (including MS-DOS, a word processor, and a spreadsheet application) and an amber CRT monitor. The following year, the company introduced L-NET, a local area network system based on the L-XT that could link up to 64 computers. L-NET shipped with a natural programming language, Diplomat, a descendant of the programming language used on the ADAM. In June 1983, Logical sued Coleco Industries over trademark infringement with the latter's to-be-released Adam microcomputer. Logical cited confusion from their existing ADAM customer base caused by the announcement of the Coleco Adam as the basis for the suit. Coleco challenged Logical in the press, writing that Logical's rights to the Adam trademark for use in computers had lapsed earlier in the year. The two settled out of court, with Coleco agreeing to license the Adam name from Logical in exchange for unlimited rights to the Adam trademark. 
Logical halted development of the L-XT when they filed for Chapter 11 bankruptcy in July 1984. The company had been $4 million in debt. They emerged from bankruptcy in September 1985, after being infused with $2 million from Carat Ltd. The latter immediately received a little less than 50 percent ownership in Logical—this stake set to grow to over 50 percent over the next six months. As part of the terms of exiting bankruptcy, Logical stopped manufacturing hardware and strictly became a software development company and value-added reseller of computer systems.

  • Tf–idf

    In information retrieval, tf–idf (term frequency–inverse document frequency, TF*IDF, TFIDF, TF–IDF, or Tf–idf) is a measure of importance of a word to a document in a collection or corpus, adjusted for the fact that some words appear more frequently in general. Like the bag-of-words model, it models a document as a multiset of words, without word order. It is a refinement over the simple bag-of-words model, by allowing the weight of words to depend on the rest of the corpus. It was often used as a weighting factor in searches of information retrieval, text mining, and user modeling. A survey conducted in 2015 showed that 83% of text-based recommender systems in digital libraries used tf–idf. Variations of the tf–idf weighting scheme were often used by search engines as a central tool in scoring and ranking a document's relevance given a user query. One of the simplest ranking functions is computed by summing the tf–idf for each query term; many more sophisticated ranking functions are variants of this simple model. == Motivations == Karen Spärck Jones (1972) conceived a statistical interpretation of term-specificity called Inverse Document Frequency (idf), which became a cornerstone of term weighting: The specificity of a term can be quantified as an inverse function of the number of documents in which it occurs. For example, consider the df (document frequency) and idf of some words in Shakespeare's 37 plays: "Romeo", "Falstaff", and "salad" appear in very few plays, so seeing these words, one could get a good idea as to which play it might be. In contrast, "good" and "sweet" appear in every play and are completely uninformative as to which play it is. == Definition == The tf–idf is the product of two statistics, term frequency and inverse document frequency. There are various ways for determining the exact values of both statistics. 
Tf–idf is a formula that aims to define the importance of a keyword or phrase within a document or a web page. === Term frequency === Term frequency, $\mathrm{tf}(t,d)$, is the relative frequency of term $t$ within document $d$,
$$\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}},$$
where $f_{t,d}$ is the raw count of a term in a document, i.e., the number of times that term $t$ occurs in document $d$. Note the denominator is simply the total number of terms in document $d$ (counting each occurrence of the same term separately). There are various other ways to define term frequency: the raw count itself, $\mathrm{tf}(t,d) = f_{t,d}$; Boolean "frequencies", $\mathrm{tf}(t,d) = 1$ if $t$ occurs in $d$ and $0$ otherwise; logarithmically scaled frequency, $\mathrm{tf}(t,d) = \log(1 + f_{t,d})$; and augmented frequency, to prevent a bias towards longer documents, e.g. raw frequency divided by the raw frequency of the most frequently occurring term in the document:
$$\mathrm{tf}(t,d) = 0.5 + 0.5 \cdot \frac{f_{t,d}}{\max\{f_{t',d} : t' \in d\}}$$
=== Inverse document frequency === The inverse document frequency is a measure of how much information the word provides, i.e., how common or rare it is across all documents. It is the logarithmically scaled inverse fraction of the documents that contain the word (obtained by dividing the total number of documents by the number of documents containing the term, and then taking the logarithm of that quotient):
$$\mathrm{idf}(t,D) = \log \frac{N}{n_t}$$
with $D$: the set of all documents in the corpus; $N = |D|$: the total number of documents in the corpus; $n_t = |\{d \in D : t \in d\}|$: the number of documents in which the term $t$ appears (i.e., $\mathrm{tf}(t,d) \neq 0$).
If the term is not in the corpus, this will lead to a division-by-zero. It is therefore common to adjust the numerator to $1 + N$ and the denominator to $1 + |\{d \in D : t \in d\}|$. === Term frequency–inverse document frequency === Then tf–idf is calculated as
$$\mathrm{tfidf}(t,d,D) = \mathrm{tf}(t,d) \cdot \mathrm{idf}(t,D)$$
A high weight in tf–idf is reached by a high term frequency (in the given document) and a low document frequency of the term in the whole collection of documents; the weights hence tend to filter out common terms. Since the ratio inside the idf's log function is always greater than or equal to 1, the value of idf (and tf–idf) is greater than or equal to 0. As a term appears in more documents, the ratio inside the logarithm approaches 1, bringing the idf and tf–idf closer to 0. == Justification of idf == Idf was introduced as "term specificity" by Karen Spärck Jones in a 1972 paper. Although it has worked well as a heuristic, its theoretical foundations have been troublesome for at least three decades afterward, with many researchers trying to find information theoretic justifications for it. Spärck Jones's own explanation did not propose much theory, aside from a connection to Zipf's law. Attempts have been made to put idf on a probabilistic footing, by estimating the probability that a given document $d$ contains a term $t$ as the relative document frequency,
$$P(t|D) = \frac{|\{d \in D : t \in d\}|}{N},$$
so that we can define idf as
$$\begin{aligned}\mathrm{idf} &= -\log P(t|D)\\ &= \log \frac{1}{P(t|D)}\\ &= \log \frac{N}{|\{d \in D : t \in d\}|}\end{aligned}$$
Namely, the inverse document frequency is the logarithm of "inverse" relative document frequency. 
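The definitions above (relative term frequency, idf as the log of N over the document count, and their product) can be sketched directly in code. This is a minimal illustration under stated assumptions: documents are pre-tokenized, non-empty lists of words, the natural logarithm is used, and the function names, the `smooth` flag, and the two-document toy corpus are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def tf(term, doc):
    """Relative frequency of `term` in `doc` (a non-empty list of tokens)."""
    return Counter(doc)[term] / len(doc)


def idf(term, corpus, smooth=False):
    """log(N / n_t); with smooth=True, log((1 + N) / (1 + n_t))."""
    n_docs = len(corpus)
    n_t = sum(1 for doc in corpus if term in doc)  # documents containing the term
    if smooth:
        return math.log((1 + n_docs) / (1 + n_t))
    return math.log(n_docs / n_t)  # raises ZeroDivisionError for an unseen term


def tfidf(term, doc, corpus, smooth=False):
    """Product of term frequency and inverse document frequency."""
    return tf(term, doc) * idf(term, corpus, smooth)


# Toy corpus (assumed for illustration only).
corpus = [
    "this is a a sample".split(),
    "this is another another example example example".split(),
]
for term in ("this", "example"):
    print(term, [round(tfidf(term, doc, corpus), 4) for doc in corpus])
```

A term that occurs in every document ("this") gets idf 0 and therefore tf–idf 0 everywhere, while a term confined to one document ("example") gets a positive weight only there. The smoothed variant avoids the division by zero for terms absent from the corpus.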
This probabilistic interpretation in turn takes the same form as that of self-information. However, applying such information-theoretic notions to problems in information retrieval leads to problems when trying to define the appropriate event spaces for the required probability distributions: not only documents need to be taken into account, but also queries and terms. == Link with information theory == Both term frequency and inverse document frequency can be formulated in terms of information theory; it helps to understand why their product has a meaning in terms of joint informational content of a document. A characteristic assumption about the distribution $p(d,t)$ is that:
$$p(d|t) = \frac{1}{|\{d \in D : t \in d\}|}$$
This assumption and its implications, according to Aizawa: "represent the heuristic that tf–idf employs." The conditional entropy of a "randomly chosen" document in the corpus $D$, conditional on the fact that it contains a specific term $t$ (and assuming that all documents have equal probability to be chosen) is:
$$H(\mathcal{D}|\mathcal{T}=t) = -\sum_{d} p_{d|t} \log p_{d|t} = -\log \frac{1}{|\{d \in D : t \in d\}|} = \log \frac{|\{d \in D : t \in d\}|}{|D|} + \log |D| = -\mathrm{idf}(t) + \log |D|$$
In terms of notation, $\mathcal{D}$ and $\mathcal{T}$ are "random variables" corresponding respectively to drawing a document or a term. 
The mutual information can be expressed as
$$M(\mathcal{T};\mathcal{D}) = H(\mathcal{D}) - H(\mathcal{D}|\mathcal{T}) = \sum_{t} p_t \cdot \bigl(H(\mathcal{D}) - H(\mathcal{D}|\mathcal{T}=t)\bigr) = \sum_{t} p_t \cdot \mathrm{idf}(t)$$
The last step is to expand $p_t$, the unconditional probability to draw a term, with respect to the (random) choice of a document, to obtain:
$$M(\mathcal{T};\mathcal{D}) = \sum_{t,d} p_{t|d} \cdot p_d \cdot \mathrm{idf}(t) = \sum_{t,d} \mathrm{tf}(t,d) \cdot \frac{1}{|D|} \cdot \mathrm{idf}(t) = \frac{1}{|D|} \sum_{t,d} \mathrm{tf}(t,d) \cdot \mathrm{idf}(t).$$
This expression shows that summing the tf–idf of all possible terms and documents recovers the mutual information between documents and terms, taking into account all the specificities of their joint distribution. Each tf–idf hence carries the "bit of information" attached to a term × document pair. == Link with statistical theory == Tf–idf is closely related to the negative logarithmically transformed p-value from a one-tailed formulation of Fisher's exact test when the underlying corpus documents satisfy certain idealized assumptions. More recently, tf–idf variants were shown to arise as components in the test st

  • AI Content Generators Reviews: What Actually Works in 2026

    In search of the best AI content generator? An AI content generator is software that uses machine learning to help you get more done — it turns a rough idea into a polished result in seconds. When choosing one, weigh output quality, pricing, export formats, and how well it fits the tools you already use. Whether you are a beginner or a pro, the right AI content generator slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

  • Timo Honkela

    Timo Untamo Honkela (August 4, 1962 – May 9, 2020) was a computer scientist at the University of Helsinki, Aalto University School of Science and Aalto University School of Art, Design and Architecture. He held a PhD from Helsinki University of Technology. From 2014 until 2018 he held a fixed-term professorship at the University of Helsinki. Before joining the University of Helsinki he worked as a non-tenured professor in two schools of Aalto University, the School of Art, Design and Architecture and the School of Science. He presented his thoughts on his studies and work in the joint blog 375 Humanists. Timo Honkela conducted research in several areas related to knowledge engineering, cognitive modeling and natural language processing. Honkela was born in Kalajoki. From 1998 to 2000 he worked as a professor in the Aalto Media Lab. Honkela brought his expertise in the Kohonen self-organising map (SOM) to the Media Lab and worked closely with artists and designers on the topic. In 2001 Honkela collaborated with George Legrady to produce an interactive museum installation, Pockets Full of Memories, for the Centre Georges Pompidou, National Museum of Modern Art in Paris. The concept, created by Legrady, allowed visitors to scan their own objects into a database and then organise them using the Kohonen self-organising map algorithm. In 2017 Honkela published a book in Finnish. The book, Rauhankone (English: Peace Machine), presents his idea of designing artificial intelligence and machine learning to serve humanity, in practice to help people live in peace with each other. He died in Helsinki. == Publications == Timo Honkela, Wlodzislaw Duch, Mark Girolami and Samuel Kaski (editors): Artificial Neural Networks and Machine Learning, Springer, 2011. Jorma Laaksonen and Timo Honkela (editors): Advances in Self-Organizing Maps, Springer, 2011. Timo Honkela: Rauhankone. Gaudeamus, 2017.

  • PNGOUT

    PNGOUT is a freeware command line optimizer for PNG images written by Ken Silverman. The transformation is lossless, meaning that the resulting image is visually identical to the source image. According to its author, this program can often get higher compression than other optimizers by 5–10%. It is possible to compress some inflated PNGs to a size below 1% of the original file. PNGOUT was also available as a plug-in for the freeware image viewer IrfanView and can be enabled as an option when saving files. It allows editing of various PNGOUT settings via a dialog box. PNGOUT integration was removed in IrfanView version 4.58 in favour of OptiPNG. In 2006, a commercial version of PNGOUT with a graphical user interface, known as PNGOUTWin, was released by Ardfry Imaging, a small company Silverman co-founded in 2005. There is also a freeware GUI frontend to PNGOUT available, known as PNGGauntlet. == Main operation == The main function of PNGOUT is to reduce the size of image data contained in the IDAT chunk. This chunk is compressed using the deflate algorithm. Deflate algorithms can vary in speed and compression ratio, with higher compression ratios generally implying lower speed. Ken Silverman wrote a deflate compressor for PNGOUT that is slower than the ones used in most graphics software, but produces smaller files. PNGOUT also performs automatic bit depth, color, and palette reduction where appropriate.
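The speed versus size trade-off of deflate compressors can be observed with any implementation that exposes a compression level. The sketch below uses Python's standard `zlib` module as a stand-in; it is not PNGOUT's own compressor, and the synthetic `sample` buffer and the `compress_at_levels` helper are assumptions made for this illustration. It also checks the lossless property: every level decompresses to the identical input.

```python
import time
import zlib


def compress_at_levels(data, levels=(1, 6, 9)):
    """Return {level: compressed bytes} for each requested zlib level."""
    return {level: zlib.compress(data, level) for level in levels}


# Hypothetical stand-in for raw image data: 64 rows of 256 patterned bytes.
sample = bytes((x * 7 + y) % 256 for y in range(64) for x in range(256))

results = compress_at_levels(sample)
for level, blob in results.items():
    # Lossless: decompressing always restores the exact input bytes.
    assert zlib.decompress(blob) == sample
    start = time.perf_counter()
    zlib.compress(sample, level)
    elapsed = time.perf_counter() - start
    print(f"level {level}: {len(sample)} -> {len(blob)} bytes in {elapsed:.6f} s")
```

On real image data, higher levels generally spend more time searching for matches and produce smaller output, though the gap depends heavily on the input.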

    Read more →
  • Topic model

    Topic model

    In natural language processing, a topic model is a type of probabilistic, neural, or algebraic model for discovering the abstract topics that occur in a collection of documents. Topic modeling is a frequently used text mining tool for discovering hidden semantic features and structures in a text. The topics produced by topic models are generated through a variety of mathematical frameworks, including probabilistic generative models, matrix factorization methods based on word co-occurrence, and clustering algorithms applied to semantic embeddings. Topic models are commonly used to organize and discover latent features in large collections of unstructured text and other forms of big data. Beyond text mining, topic models have also been used to uncover latent structures in fields such as genetic information, bioinformatics, computer vision, and social networks. == History == An early topic model was described by Papadimitriou, Raghavan, Tamaki and Vempala in 1998. Another one, called probabilistic latent semantic analysis (PLSA), was created by Thomas Hofmann in 1999. Latent Dirichlet allocation (LDA), perhaps the most common topic model currently in use, is a generalization of PLSA. Developed by David Blei, Andrew Ng, and Michael I. Jordan in 2002, LDA introduces sparse Dirichlet prior distributions over document-topic and topic-word distributions, encoding the intuition that documents cover a small number of topics and that topics often use a small number of words. Other topic models are generally extensions on LDA, such as Pachinko allocation, which improves on LDA by modeling correlations between topics in addition to the word correlations which constitute topics. Hierarchical latent tree analysis (HLTA) is an alternative to LDA, which models word co-occurrence using a tree of latent variables and the states of the latent variables, which correspond to soft clusters of documents, are interpreted as topics. 
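As a rough illustration of the generative story behind LDA described above, the sketch below draws sparse document-topic and topic-word distributions from symmetric Dirichlet priors and then samples words. It only generates synthetic documents and does not fit a model; the function names, the toy vocabulary and the concentration values `alpha` and `beta` are assumptions chosen for the example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalised gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(vocab, n_topics, n_docs, doc_len, alpha=0.1, beta=0.1, seed=0):
    """Generate documents following the LDA generative process (illustrative only)."""
    rng = random.Random(seed)
    # One word distribution per topic; a small beta concentrates mass on few words.
    topic_word = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    docs, doc_topic = [], []
    for _ in range(n_docs):
        # One topic distribution per document; a small alpha favours few topics.
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_len):
            z = rng.choices(range(n_topics), weights=theta)[0]  # pick a topic
            words.append(rng.choices(vocab, weights=topic_word[z])[0])  # pick a word
        docs.append(words)
        doc_topic.append(theta)
    return docs, doc_topic, topic_word


vocab = ["gene", "protein", "cell", "guitar", "melody", "chord"]
docs, doc_topic, topic_word = generate_corpus(vocab, n_topics=2, n_docs=3, doc_len=8)
for words, theta in zip(docs, doc_topic):
    print([round(p, 2) for p in theta], " ".join(words))
```

Fitting LDA means inverting this process: given only the documents, infer the topic-word and document-topic distributions.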
== Topic models for context information == Approaches for temporal information include Block and Newman's determination of the temporal dynamics of topics in the Pennsylvania Gazette during 1728–1800. Griffiths & Steyvers used topic modeling on abstracts from the journal PNAS to identify topics that rose or fell in popularity from 1991 to 2001, whereas Lamba & Madhusudhan used topic modeling on full-text research articles retrieved from the DJLIT journal from 1981 to 2018. In the field of library and information science, Lamba & Madhusudhan applied topic modeling to different Indian resources such as journal articles and electronic theses and dissertations (ETDs). Nelson has been analyzing change in topics over time in the Richmond Times-Dispatch to understand social and political changes and continuities in Richmond during the American Civil War. Yang, Torget and Mihalcea applied topic modeling methods to newspapers from 1829 to 2008. Mimno used topic modelling with 24 journals on classical philology and archaeology spanning 150 years to look at how topics in the journals change over time and how the journals become more different or similar over time. Yin et al. introduced a topic model for geographically distributed documents, where document positions are explained by latent regions which are detected during inference. Chang and Blei included network information between linked documents in the relational topic model, to model the links between websites. The author-topic model by Rosen-Zvi et al. models the topics associated with authors of documents to improve the topic detection for documents with authorship information. HLTA was applied to a collection of recent research papers published at major AI and Machine Learning venues. The resulting model is called The AI Tree. 
The resulting topics are used to index the papers at aipano.cse.ust.hk to help researchers track research trends and identify papers to read, and help conference organizers and journal editors identify reviewers for submissions. To improve the qualitative aspects and coherency of generated topics, some researchers have explored the efficacy of "coherence scores", which measure how well computer-extracted clusters (i.e. topics) align with a human benchmark. Coherence scores are metrics for optimising the number of topics to extract from a document corpus. == Algorithms == In practice, researchers attempt to fit appropriate model parameters to the data corpus using one of several heuristics for maximum likelihood fit. A survey by D. Blei describes this suite of algorithms. Several groups of researchers, starting with Papadimitriou et al., have attempted to design algorithms with provable guarantees. Assuming that the data were actually generated by the model in question, they try to design algorithms that provably find the model that was used to create the data. Techniques used here include singular value decomposition (SVD) and the method of moments. In 2012 an algorithm based upon non-negative matrix factorization (NMF) was introduced that also generalizes to topic models with correlations among topics. Since 2017, neural networks have been leveraged in topic modeling to improve the speed of inference, leading to further advancements such as vONTSS, which allows humans to incorporate domain knowledge via weakly supervised learning. In 2018, a new approach to topic models was proposed based on the stochastic block model. Topic modeling has leveraged LLMs through contextual embedding and fine tuning. == Applications of topic models == === To quantitative biomedicine === Topic models are also being used in other contexts. For example, uses of topic models in biology and bioinformatics research have emerged. 
Recently, topic models have been used to extract information from datasets of cancer genomic samples. In this case topics are biological latent variables to be inferred. === To analysis of music and creativity === Topic models can be used for analysis of continuous signals like music. For instance, they were used to quantify how musical styles change in time, and identify the influence of specific artists on later music creation.
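The "coherence scores" mentioned earlier in this article can be computed in several ways, and the text does not say which one is meant. As one hedged illustration, the sketch below implements a UMass-style score based on document co-occurrence counts; the function name, the toy documents and the `+ 1` smoothing term are assumptions of this example.

```python
import math
from itertools import combinations


def umass_coherence(topic_words, documents):
    """UMass-style coherence: sum of log((D(w_later, w_earlier) + 1) / D(w_earlier)).

    `topic_words` is a topic's top words in rank order; `documents` is a list of
    token lists. D(...) counts the documents that contain all the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i, j in combinations(range(len(topic_words)), 2):
        w_earlier, w_later = topic_words[i], topic_words[j]
        denom = doc_freq(w_earlier)
        if denom == 0:
            continue  # skip words that never occur in the corpus
        score += math.log((doc_freq(w_later, w_earlier) + 1) / denom)
    return score


documents = [
    ["gene", "protein", "cell"],
    ["gene", "cell", "dna"],
    ["protein", "gene", "cell"],
    ["guitar", "melody", "chord"],
    ["melody", "guitar", "song"],
]
coherent = umass_coherence(["gene", "protein", "cell"], documents)
mixed = umass_coherence(["gene", "guitar", "cell"], documents)
print(f"coherent topic: {coherent:.3f}, mixed topic: {mixed:.3f}")
```

A topic whose top words tend to appear in the same documents scores higher than one that mixes unrelated words, which is why such scores are used to compare models with different numbers of topics.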

    Read more →
  • Cognitive computer

    Cognitive computer

    A cognitive computer is a computer that hardwires artificial intelligence and machine learning algorithms into an integrated circuit that closely reproduces the behavior of the human brain. It generally adopts a neuromorphic engineering approach. Synonyms include neuromorphic chip and cognitive chip. In 2023, IBM's proof-of-concept NorthPole chip (optimized for 2-, 4- and 8-bit precision) achieved remarkable performance in image recognition. In 2013, IBM developed Watson, a cognitive computer that uses neural networks and deep learning techniques. The following year, it developed the TrueNorth microchip architecture, which is designed to be closer in structure to the human brain than the von Neumann architecture used in conventional computers. In 2017, Intel also announced its version of a cognitive chip, Loihi, which it intended to be available to university and research labs in 2018. Intel (most notably with its Pohoiki Beach and Springs systems), Qualcomm, and others are improving neuromorphic processors steadily. == IBM TrueNorth chip == TrueNorth was a neuromorphic CMOS integrated circuit produced by IBM in 2014. It is a manycore processor network on a chip design, with 4096 cores, each one having 256 programmable simulated neurons for a total of just over a million neurons. In turn, each neuron has 256 programmable "synapses" that convey the signals between them. Hence, the total number of programmable synapses is just over 268 million (2^28). Its basic transistor count is 5.4 billion. In 2023 Zhejiang University and Alibaba developed Darwin, a neuromorphic chip. The Darwin3 chip was designed around 2023, so it is fairly modern compared to IBM's TrueNorth or Intel's Loihi. 
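The TrueNorth neuron and synapse totals quoted above follow directly from the per-core figures. A short arithmetic check, using only the numbers given in the text (the variable names are illustrative):

```python
# Figures quoted in the text for IBM TrueNorth.
cores = 4096
neurons_per_core = 256
synapses_per_neuron = 256

total_neurons = cores * neurons_per_core
total_synapses = total_neurons * synapses_per_neuron

print(f"neurons:  {total_neurons:,}")   # just over a million
print(f"synapses: {total_synapses:,}")  # just over 268 million

# The synapse total is an exact power of two: 2^12 * 2^8 * 2^8 = 2^28.
assert total_synapses == 2 ** 28
```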
=== Details === Because memory, computation, and communication are handled in each of the 4096 neurosynaptic cores, TrueNorth circumvents the von Neumann-architecture bottleneck and is very energy-efficient, with IBM claiming a power consumption of 70 milliwatts and a power density that is 1/10,000th of conventional microprocessors. The SyNAPSE chip operates at lower temperatures and power because it only draws power necessary for computation. Skyrmions have been proposed as models of the synapse on a chip. The neurons are emulated using a Linear-Leak Integrate-and-Fire (LLIF) model, a simplification of the leaky integrate-and-fire model. According to IBM, it does not have a clock, operates on unary numbers, and computes by counting to a maximum of 19 bits. The cores are event-driven, using both synchronous and asynchronous logic, and are interconnected through an asynchronous packet-switched mesh network on chip (NOC). IBM developed a new network to program and use TrueNorth. It included a simulator, a new programming language, an integrated programming environment, and libraries. This lack of backward compatibility with any previous technology (e.g., C++ compilers) poses serious vendor lock-in risks and other adverse consequences that may prevent it from commercialization in the future. === Research === In 2018, a cluster of TrueNorth chips network-linked to a master computer was used in stereo vision research that attempted to extract the depth of rapidly moving objects in a scene. == IBM NorthPole chip == In 2023, IBM released its NorthPole chip, which is a proof-of-concept for dramatically improving performance by intertwining compute with memory on-chip, thus eliminating the Von Neumann bottleneck. It blends approaches from IBM's 2014 TrueNorth system with modern hardware designs to achieve speeds about 4,000 times faster than TrueNorth. 
It can run ResNet-50 or Yolo-v4 image recognition tasks about 22 times faster, with 25 times less energy and 5 times less space, when compared to GPUs which use the same 12-nm node process that it was fabricated with. It includes 224 MB of RAM and 256 processor cores and can perform 2,048 operations per core per cycle at 8-bit precision, and 8,192 operations at 2-bit precision. It runs at between 25 and 425 MHz. This is an inferencing chip, but it cannot yet handle GPT-4 because of memory and accuracy limitations. == Intel Loihi chip == === Pohoiki Springs === Pohoiki Springs is a system that incorporates Intel's self-learning neuromorphic chip, named Loihi, introduced in 2017, perhaps named after the Hawaiian seamount Lōʻihi. Intel claims Loihi is about 1000 times more energy efficient than general-purpose computing systems used to train neural networks. In theory, Loihi supports both machine learning training and inference on the same silicon independently of a cloud connection, and more efficiently than convolutional neural networks or deep learning neural networks. Intel points to a system for monitoring a person's heartbeat, taking readings after events such as exercise or eating, and using the chip to normalize the data and work out the ‘normal’ heartbeat. It can then spot abnormalities and deal with new events or conditions. The first iteration of the chip was made using Intel's 14 nm fabrication process and houses 128 clusters of 1,024 artificial neurons each for a total of 131,072 simulated neurons. This offers around 130 million synapses, far fewer than the human brain's 800 trillion synapses, and behind IBM's TrueNorth. Loihi is available for research purposes among more than 40 academic research groups as a USB form factor. In October 2019, researchers from Rutgers University published a research paper to demonstrate the energy efficiency of Intel's Loihi in solving simultaneous localization and mapping. 
In March 2020, Intel and Cornell University published a research paper to demonstrate the ability of Intel's Loihi to recognize different hazardous materials, which could eventually help to "diagnose diseases, detect weapons and explosives, find narcotics, and spot signs of smoke and carbon monoxide". === Pohoiki Beach === Intel's Loihi 2, named Pohoiki Beach, was released in September 2021 with 64 cores. It boasts faster speeds, higher-bandwidth inter-chip communications for enhanced scalability, increased capacity per chip, a more compact size due to process scaling, and improved programmability. === Hala Point === Hala Point packages 1,152 Loihi 2 processors produced on Intel 3 process node in a six-rack-unit chassis. The system supports up to 1.15 billion neurons and 128 billion synapses distributed over 140,544 neuromorphic processing cores, consuming 2,600 watts of power. It includes over 2,300 embedded x86 processors for ancillary computations. Intel claimed in 2024 that Hala Point was the world’s largest neuromorphic system. It is claimed to offer 10x more neuron capacity and up to 12x higher performance. The Darwin3 chip exceeds these specs. Hala Point provides up to 20 quadrillion operations per second (20 petaops), with efficiency exceeding 15 trillion (8-bit) operations s−1 W−1 on conventional deep neural networks. Hala Point integrates processing, memory and communication channels in a massively parallelized fabric, providing 16 PB s−1 of memory bandwidth, 3.5 PB s−1 of inter-core communication bandwidth, and 5 TB s−1 of inter-chip bandwidth. The system can process its 1.15 billion neurons 20 times faster than a human brain. Its neuron capacity is roughly equivalent to that of an owl brain or the cortex of a capuchin monkey. Loihi-based systems can perform inference and optimization using 100 times less energy at speeds as much as 50 times faster than CPU/GPU architectures. Intel claims that Hala Point can create LLMs. 
Much further research is needed. == SpiNNaker == SpiNNaker (Spiking Neural Network Architecture) is a massively parallel, manycore supercomputer architecture designed by the Advanced Processor Technologies Research Group at the Department of Computer Science, University of Manchester. == Criticism == Critics argue that a room-sized computer – as in the case of IBM's Watson – is not a viable alternative to a three-pound human brain. Some also cite the difficulty for a single system to bring so many elements together, such as the disparate sources of information as well as computing resources. In 2021, The New York Times released Steve Lohr's article "What Ever Happened to IBM’s Watson?". He wrote about some costly failures of IBM Watson. One of them, a cancer-related project called the Oncology Expert Advisor, was abandoned in 2016 as a costly failure. During the collaboration, Watson could not use patient data. Watson struggled to decipher doctors’ notes and patient histories. The development of LLMs has placed a new emphasis on cognitive computers, because the Transformer technology that underpins LLMs demands huge energy for GPUs and PCs. Cognitive computers use significantly less energy, but spike-timing-dependent plasticity (STDP) and current neuron models cannot yet match the accuracy of backpropagation, so ANN-to-SNN weight translations such as quantization-aware training (QAT), post-training quantization (PTQ) or progressive quantization are becoming popular, each with its own limitations.
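The Linear-Leak Integrate-and-Fire (LLIF) neuron named in the TrueNorth details above can be illustrated with a toy simulation. This is a generic sketch of a linear-leak neuron under assumed parameters, not IBM's actual neuron implementation: the function name, the threshold, the leak amount and the reset rule are all hypothetical.

```python
def simulate_llif(inputs, threshold=10, leak=1, reset=0):
    """Simulate a toy linear-leak integrate-and-fire neuron.

    Each tick the membrane potential integrates the input, then loses a
    constant amount `leak` (linear leak, unlike the proportional decay of the
    classic leaky model). When the potential reaches `threshold`, the neuron
    emits a spike and returns to `reset`.
    """
    v = reset
    spikes = []
    for current in inputs:
        v += current                  # integrate the incoming input
        v = max(reset, v - leak)      # constant leak, floored at the reset value
        if v >= threshold:
            spikes.append(1)          # fire
            v = reset
        else:
            spikes.append(0)
    return spikes


inputs = [4, 4, 4, 0, 0, 4, 4, 4, 4]
spikes = simulate_llif(inputs)
print(spikes)
```

Because the neuron only changes state when input events arrive or the threshold is crossed, models of this kind suit event-driven hardware.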

    Read more →
  • Top 10 AI Virtual Assistants Compared (2026)

    Top 10 AI Virtual Assistants Compared (2026)

    Looking for the best AI virtual assistant? An AI virtual assistant is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI virtual assistant slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • Contextual image classification

    Contextual image classification

    Contextual image classification, a topic of pattern recognition in computer vision, is an approach to classification based on contextual information in images. "Contextual" means that the approach focuses on the relationships between nearby pixels, also called the neighbourhood. The goal of this approach is to classify images by using the contextual information. == Introduction == As in language processing, where a single word may have multiple meanings unless context is provided, the patterns within the data are the only informative segments we care about. For images, the principle is the same: find the patterns and associate proper meanings with them. If only a small portion of an image is shown, it is very difficult to tell what the image is about. Even with another portion of the image, it is still difficult to classify. However, if more of the image's context is shown, it becomes easier to recognise; given the full image, almost everyone can classify it easily. During segmentation, methods that do not use contextual information are sensitive to noise and variations, so the segmentation result will contain a great deal of misclassified regions, and these regions are often small (e.g., one pixel). Compared to other techniques, this approach is robust to noise and substantial variations because it takes the continuity of the segments into account. Several methods of this approach are described below. == Applications == === Functioning as a post-processing filter to a labelled image === This approach is very effective against small regions caused by noise. These small regions are usually formed by a few pixels or a single pixel. The most probable label is assigned to these regions. However, there is a drawback of this method. 
The small regions can also be formed by correct regions rather than noise, and in this case the method actually makes the classification worse. This approach is widely used in remote sensing applications. === Improving the post-processing classification === This is a two-stage classification process: For each pixel, label the pixel and form a new feature vector for it. Use the new feature vector and combine the contextual information to assign the final label to the pixel. === Merging the pixels in earlier stages === Instead of using single pixels, neighbouring pixels can be merged into homogeneous regions, benefiting from contextual information, and these regions are then provided to the classifier. === Acquiring pixel feature from neighbourhood === The original spectral data can be enriched by adding the contextual information carried by the neighbouring pixels, or even replaced on some occasions. This kind of pre-processing method is widely used in textured image recognition. The typical approaches include mean values, variances, texture description, etc. === Combining spectral and spatial information === The classifier uses the grey level and pixel neighbourhood (contextual information) to assign labels to pixels. In such a case the information is a combination of spectral and spatial information. === Powered by the Bayes minimum error classifier === Contextual classification of image data is based on the Bayes minimum error classifier (also known as a naive Bayes classifier). Present the pixel: A pixel is denoted as $x_{0}$. The neighbourhood of each pixel $x_{0}$ is a vector denoted as $N(x_{0})$. The values in the neighbourhood vector are denoted as $f(x_{i})$. 
Each pixel is represented by the vector $\xi = \left(f(x_{0}), f(x_{1}), \ldots, f(x_{k})\right)$, where $x_{i} \in N(x_{0})$ for $i = 1, \ldots, k$. The labels (classification) of pixels in the neighbourhood $N(x_{0})$ are presented as a vector $\eta = \left(\theta_{0}, \theta_{1}, \ldots, \theta_{k}\right)$, where $\theta_{i} \in \left\{\omega_{0}, \omega_{1}, \ldots, \omega_{k}\right\}$ and $\omega_{s}$ denotes the assigned class. A second vector presents the labels in the neighbourhood $N(x_{0})$ without the pixel $x_{0}$: $\hat{\eta} = \left(\theta_{1}, \theta_{2}, \ldots, \theta_{k}\right)$. The neighbourhood: There is no limitation on the size of the neighbourhood, but it is considered to be relatively small for each pixel $x_{0}$. A reasonable neighbourhood would be $3 \times 3$ with 4-connectivity or 8-connectivity, with $x_{0}$ placed in the centre. The calculation: Apply the minimum error classification to a pixel $x_{0}$: if the probability of a class $\omega_{r}$ representing the pixel $x_{0}$ is the highest among all classes, then assign $\omega_{r}$ as its class: $\theta_{0} = \omega_{r} \quad \text{if} \quad P(\omega_{r} \mid f(x_{0})) = \max_{s=1,2,\ldots,R} P(\omega_{s} \mid f(x_{0}))$. The contextual classification rule is described below; it uses the feature vector $\xi$ rather than the single value $f(x_{0})$. 
$\theta_{0} = \omega_{r} \quad \text{if} \quad P(\omega_{r} \mid \xi) = \max_{s=1,2,\ldots,R} P(\omega_{s} \mid \xi)$. Use the Bayes formula to calculate the posterior probability $P(\omega_{s} \mid \xi)$: $P(\omega_{s} \mid \xi) = \frac{p(\xi \mid \omega_{s}) P(\omega_{s})}{p(\xi)}$. The number of vectors is the same as the number of pixels in the image, because the classifier uses a vector corresponding to each pixel $x_{i}$, and the vector is generated from the pixel's neighbourhood. The basic steps of contextual image classification: (1) Calculate the feature vector $\xi$ for each pixel. (2) Calculate the parameters of the probability distributions $p(\xi \mid \omega_{s})$ and $P(\omega_{s})$. (3) Calculate the posterior probabilities $P(\omega_{r} \mid \xi)$ and all labels $\theta_{0}$. (4) Get the image classification result. == Algorithms == === Template matching === Template matching is a "brute force" implementation of this approach. The concept is to first create a set of templates and then look for small parts of the image that match a template. This method is computationally expensive and inefficient. It keeps an entire list of templates during the whole process, and the number of combinations is extremely high. For an $m \times n$ pixel image, there could be a maximum of $2^{m \times n}$ combinations, which leads to high computation. This method is a top-down method and is often called table look-up or dictionary look-up. === Lower-order Markov chain === The Markov chain can also be applied in pattern recognition. 
The pixels in an image can be treated as a set of random variables, and a lower-order Markov chain can then be used to find the relationships among the pixels. The image is treated as a virtual line, and the method uses conditional probability. === Hilbert space-filling curves === The Hilbert curve runs in a unique pattern through the whole image: it traverses every pixel without visiting any of them twice and keeps a continuous curve. It is fast and efficient. === Markov meshes === The lower-order Markov chain and Hilbert space-filling curves mentioned above treat the image as a line structure. Markov meshes, however, take the two-dimensional information into account. === Dependency tree === The dependency tree is a method that uses tree dependency to approximate probability distributions.
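The Hilbert space-filling curve traversal described above can be sketched with the standard index-to-coordinate conversion. This is an illustration of the curve itself, not of any particular classifier built on it; the function names are assumptions, and the sketch assumes a square image whose side length `n` is a power of two.

```python
def hilbert_d2xy(n, d):
    """Map position d along the Hilbert curve to (x, y) in an n-by-n grid.

    Assumes n is a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                 # rotate/flip the quadrant as needed
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(n):
    """Return all pixel coordinates of an n-by-n image in Hilbert-curve order."""
    return [hilbert_d2xy(n, d) for d in range(n * n)]


# Flatten a small hypothetical 4x4 image into a 1-D sequence along the curve.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
sequence = [image[y][x] for x, y in hilbert_order(4)]
print(sequence)
```

Consecutive positions in the resulting sequence are always neighbouring pixels, so a one-dimensional model such as a Markov chain sees more local context than it would with a row-by-row scan.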

    Read more →
  • The Best Free AI Code-review Tool for Beginners

    The Best Free AI Code-review Tool for Beginners

    Curious about the best AI code-review tool? An AI code-review tool is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI code-review tool slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • Transfer-based machine translation

    Transfer-based machine translation

    Transfer-based machine translation is a type of machine translation (MT). It is currently one of the most widely used methods of machine translation. In contrast to the simpler direct model of MT, transfer MT breaks translation into three steps: analysis of the source language text to determine its grammatical structure, transfer of the resulting structure to a structure suitable for generating text in the target language, and finally generation of this text. Transfer-based MT systems are thus capable of using knowledge of the source and target languages. == Design == Both transfer-based and interlingua-based machine translation have the same idea: to make a translation it is necessary to have an intermediate representation that captures the "meaning" of the original sentence in order to generate the correct translation. In interlingua-based MT this intermediate representation must be independent of the languages in question, whereas in transfer-based MT, it has some dependence on the language pair involved. The way in which transfer-based machine translation systems work varies substantially, but in general they follow the same pattern: they apply sets of linguistic rules which are defined as correspondences between the structure of the source language and that of the target language. The first stage involves analysing the input text for morphology and syntax (and sometimes semantics) to create an internal representation. The translation is generated from this representation using both bilingual dictionaries and grammatical rules. It is possible with this translation strategy to obtain fairly high quality translations, with accuracy in the region of 90% (although this is highly dependent on the language pair in question, for example the distance between the two). == Operation == In a rule-based machine translation system the original text is first analysed morphologically and syntactically in order to obtain a syntactic representation. 
This representation can then be refined to a more abstract level putting emphasis on the parts relevant for translation and ignoring other types of information. The transfer process then converts this final representation (still in the original language) to a representation of the same level of abstraction in the target language. These two representations are referred to as "intermediate" representations. From the target language representation, the stages are then applied in reverse. == Analysis and transformation == Various methods of analysis and transformation can be used before obtaining the final result. These may be augmented with statistical approaches, producing hybrid systems. The methods chosen and their emphasis depend largely on the design of the system; however, most systems include at least the following stages: Morphological analysis. Surface forms of the input text are classified as to part-of-speech (e.g. noun, verb, etc.) and sub-category (number, gender, tense, etc.). All of the possible "analyses" for each surface form are typically output at this stage, along with the lemma of the word. Lexical categorisation. In any given text some of the words may have more than one meaning, causing ambiguity in analysis. Lexical categorisation looks at the context of a word to try to determine the correct meaning in the context of the input. This can involve part-of-speech tagging and word sense disambiguation. Lexical transfer. This is basically dictionary translation; the source language lemma (perhaps with sense information) is looked up in a bilingual dictionary and the translation is chosen. Structural transfer. While the previous stages deal with words, this stage deals with larger constituents, for example phrases and chunks. Typical features of this stage include concordance of gender and number, and re-ordering of words or phrases. Morphological generation. 
From the output of the structural transfer stage, the target language surface forms are generated. == Transfer types == One of the main features of transfer-based machine translation systems is a phase that "transfers" an intermediate representation of the text in the original language to an intermediate representation of text in the target language. This can work at one of two levels of linguistic analysis, or somewhere in between. The levels are: Superficial transfer (or syntactic). This level is characterised by transferring "syntactic structures" between the source and target languages. It is suitable for languages in the same family or of the same type, for example in the Romance languages between Spanish, Catalan, French, Italian, etc. Deep transfer (or semantic). This level constructs a semantic representation that is dependent on the source language. This representation can consist of a series of structures which represent the meaning. In these transfer systems predicates are typically produced. The translation also typically requires structural transfer. This level is used to translate between more distantly related languages (e.g. Spanish-English or Spanish-Basque, etc.).
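The stages listed under "Analysis and transformation" can be sketched as a toy Spanish-to-English pipeline. Everything here is hypothetical and drastically simplified: the tiny `ANALYSIS` lexicon, the `BILINGUAL` dictionary and the single noun-adjective reordering rule are assumptions made for illustration, and the lexical categorisation stage is skipped because the toy lexicon has no ambiguous words.

```python
# Hypothetical toy lexicon: surface form -> (lemma, part of speech, features).
ANALYSIS = {
    "el": ("el", "DET", {}),
    "gato": ("gato", "NOUN", {"num": "sg"}),
    "negro": ("negro", "ADJ", {}),
    "duerme": ("dormir", "VERB", {"person": 3, "num": "sg"}),
}
# Hypothetical bilingual dictionary: Spanish lemma -> English lemma.
BILINGUAL = {"el": "the", "gato": "cat", "negro": "black", "dormir": "sleep"}


def morphological_analysis(sentence):
    """Classify each surface form by lemma, part of speech and sub-category."""
    return [ANALYSIS[word] for word in sentence.lower().split()]


def lexical_transfer(tokens):
    """Replace each source lemma with its dictionary translation."""
    return [(BILINGUAL[lemma], pos, feats) for lemma, pos, feats in tokens]


def structural_transfer(tokens):
    """Reorder constituents: Spanish NOUN ADJ becomes English ADJ NOUN."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "NOUN" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def morphological_generation(tokens):
    """Produce target surface forms (only third-person singular -s is handled)."""
    words = []
    for lemma, pos, feats in tokens:
        if pos == "VERB" and feats.get("person") == 3 and feats.get("num") == "sg":
            words.append(lemma + "s")
        else:
            words.append(lemma)
    return " ".join(words)


def translate(sentence):
    tokens = morphological_analysis(sentence)
    tokens = lexical_transfer(tokens)
    tokens = structural_transfer(tokens)
    return morphological_generation(tokens)


print(translate("el gato negro duerme"))
```

The reordering happens on an intermediate representation (lemmas with tags) rather than on raw words, which is what distinguishes this design from direct word-for-word translation.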

    Read more →