AI Chat Character Talkie

AI Chat Character Talkie — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Teaspiller

    Teaspiller

    Teaspiller was a US-based web application that let customers find accountants and hire them to do their taxes and accounting online. In 2013 the company was acquired by Intuit, Inc. and added to its TurboTax product line. The Teaspiller employees and code were all acquired, and the product was renamed "TurboTax CPA select". It enabled accountants to work remotely with clients (share files, send secure messages, schedule appointments) and to find new clients looking for their specific skills through a complex search algorithm. This was done through extended profiles containing licensing information, professional histories, user ratings, peer endorsements, association memberships, and practice areas. Business Insider called the service an "H&R Block killer" because it helped customers find accountants to prepare tax returns online. As of 2011 it had 20,000 US accountants listed on the site. The application was built using the Django framework.

    == History ==

    Teaspiller was built by Vemdara, LLC, a web company based in New York and founded in 2009 by Amit Vemuri (a former VP at Travelocity). The web application was launched in 2010. In 2013 the company was acquired by Intuit as part of its TurboTax product line and renamed "TurboTax CPA select".

    Read more →
  • International Computer Archive of Modern and Medieval English

    International Computer Archive of Modern and Medieval English

    The International Computer Archive of Modern and Medieval English (ICAME) is an international group of linguists and data scientists working in corpus linguistics to digitise English texts. The organisation was founded in Oslo, Norway in 1977 as the International Computer Archive of Modern English, before being renamed to its current title. Its primary objectives were collecting and distributing information on English language material available for computer processing and on linguistic research completed or in progress on this material, and compiling an archive of corpora to be located at the University of Bergen, from where copies of the material can be obtained at cost. The portal to their materials is hosted at the University of Bergen, where they have set out the aim of the organization to "collect and distribute information on English language material available for computer processing and on linguistic research to compile an archive of English text corpora in machine-readable form, and to make material available to research institutions."

    Creating computer corpora, i.e. collections of texts in machine-readable form, is the most accessible way to study both transcribed spoken language and various genres of written texts for modern scholars, including both "descriptive and more theoretically-minded linguists". The ICAME group hosts academic conferences that focus on corpus linguistic studies of historical changes and contemporary grammatical descriptions of English, and makes corpora of different varieties of English available to scholars, starting with editions of the 1960s Brown Corpus. Their first academic conference was held in Bergen, Norway in 1979, and scholars who were interested in corpus linguistics continued to meet each spring in different European and English-speaking countries.

    At these meetings, the compilation and distribution of corpora they enabled played a key role in the creation of the field of corpus linguistics in the 20th century, a precursor to current big data analytics. In summarizing the field, Kennedy's Introduction to Corpus Linguistics notes that "for corpus linguists with an interest in the description of English, the International Computer Archive of Modern and Medieval English has been the major resource". The influence of ICAME on the field has also been laid out in Facchinetti's history, Corpus Linguistics Twenty-five Years On.

    One influential resource that ICAME made available was a CD of 20 different corpora, including those covering different regional Englishes (such as the Australian Corpus of English, the Wellington Corpus of Spoken New Zealand English, the Kolhapur Corpus of Indian English, the Bergen Corpus of London Teenage Language (COLT), the Helsinki Corpus of Older Scots, and the East African component of the International Corpus of English), as well as versions of the Brown Corpus and the Lancaster-Bergen-Oslo (LOB) corpus tagged for part of speech. ICAME also published an annual journal, the ICAME Journal, formerly ICAME News, that contains articles, conference reports, reviews and notices related to corpus linguistics. The current editors of the ICAME Journal are Merja Kytö and Anna-Brita Stenström.

    "I am wearing a tie clip in the shape of a monkey wrench... The story behind this peculiar piece of jewelry goes back to the early 60s when I was assembling the notorious Brown Corpus and others were using computers to make concordances of William Butler Yeats and other poets. One of my colleagues, a specialist in modern Irish literature, was heard to remark that anyone who would use a computer on good literature was nothing but a plumber. Some of my students responded by forming a linguistic plumber's union, the symbol of which was, of course, a monkey wrench."

    Read more →
  • Top 10 AI Presentation Makers Compared (2026)

    Top 10 AI Presentation Makers Compared (2026)

    Trying to pick the best AI presentation maker? An AI presentation maker is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI presentation maker slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • Is an AI Humanizer Worth It in 2026?

    Is an AI Humanizer Worth It in 2026?

    Shopping for the best AI humanizer? An AI humanizer is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI humanizer slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.

    Read more →
  • Morphological antialiasing

    Morphological antialiasing

    Morphological antialiasing (MLAA) is a spatial anti-aliasing technique used in real-time computer graphics. It reduces artifacts, such as jaggies, when representing a high-resolution image at a lower resolution. MLAA is a post-process filter that detects borders in the rendered image and then finds specific patterns in them. Anti-aliasing is achieved by blending the pixels along these borders according to the pattern they belong to and their position within the pattern. Introduced in 2009, MLAA was an early and influential example of anti-aliasing done in post-processing, which makes such techniques suitable for deferred shading. A similar method in this class is fast approximate anti-aliasing (FXAA). Temporal anti-aliasing, also a post-process, has become the most common anti-aliasing method for real-time rendering and video games. Enhanced subpixel morphological antialiasing, or SMAA, is an image-based, GPU-based implementation of MLAA developed by Universidad de Zaragoza and Crytek.

    Read more →
  • ROUGE (metric)

    ROUGE (metric)

    ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing. The metrics compare an automatically produced summary or translation against one or more human-produced reference summaries or translations. ROUGE metrics range between 0 and 1, with higher scores indicating higher similarity between the automatically produced summary and the reference.

    == Metrics ==

    The following five evaluation metrics are available.

      • ROUGE-N: Overlap of n-grams between the system and reference summaries. ROUGE-1 refers to the overlap of unigrams (each word) between the system and reference summaries. ROUGE-2 refers to the overlap of bigrams between the system and reference summaries.
      • ROUGE-L: Longest Common Subsequence (LCS) based statistics. The longest common subsequence naturally takes sentence-level structure similarity into account and automatically identifies the longest n-grams that co-occur in sequence.
      • ROUGE-W: Weighted LCS-based statistics that favor consecutive LCSes.
      • ROUGE-S: Skip-bigram based co-occurrence statistics. A skip-bigram is any pair of words in their sentence order.
      • ROUGE-SU: Skip-bigram plus unigram-based co-occurrence statistics.
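    As a rough illustration of the n-gram and LCS overlap ideas above, the following minimal sketch computes recall-only versions of ROUGE-N and ROUGE-L. It assumes whitespace tokenization, a single reference, and no stemming or stopword handling; the function names are hypothetical and this is not the official ROUGE package, which reports more than recall.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also appear in the candidate."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match by the candidate count so repeats are not over-credited.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """LCS length divided by the reference length."""
    ref_tokens = reference.split()
    if not ref_tokens:
        return 0.0
    return lcs_length(candidate.split(), ref_tokens) / len(ref_tokens)


system = "the cat sat on the mat"
reference = "the cat lay on the mat"
print(rouge_n_recall(system, reference, n=1))
print(rouge_n_recall(system, reference, n=2))
print(rouge_l_recall(system, reference))
```

    In this toy pair, five of the six reference unigrams and three of the five reference bigrams are matched, and the longest common subsequence ("the cat on the mat") has five tokens.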

    Read more →
  • How to Choose an AI Clip Maker

    How to Choose an AI Clip Maker

    Curious about the best AI clip maker? An AI clip maker is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI clip maker slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • Collocation extraction

    Collocation extraction

    Collocation extraction is the task of using a computer to extract collocations automatically from a corpus. The traditional method of performing collocation extraction is to find a formula, based on the statistical quantities of the words involved, that assigns a score to every word pair. Proposed formulas include mutual information, the t-test, the z-test, the chi-squared test and the likelihood ratio. Within the area of corpus linguistics, a collocation is defined as a sequence of words or terms which co-occur more often than would be expected by chance. 'Crystal clear', 'middle management', 'nuclear family', and 'cosmetic surgery' are examples of collocated pairs of words. Some words are often found together because they make up a compound noun, for example 'riding boots', 'motor cyclist', or 'collocation extraction' itself.
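    The scoring idea above can be sketched with pointwise mutual information, one of the formulas mentioned. This is a minimal sketch under stated assumptions: adjacent word pairs only, probabilities estimated from raw counts, and a hypothetical `min_count` cutoff to suppress pairs seen too rarely to score reliably.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (log base 2)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # too rare to score reliably (assumed cutoff)
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        # PMI > 0 means the pair co-occurs more often than chance would predict.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


corpus = "crystal clear water and crystal clear sky and clear water".split()
for pair, score in sorted(pmi_scores(corpus).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

    On a real corpus the raw-count estimates would be far less noisy than in this ten-word toy example, and a significance test such as the likelihood ratio is often preferred for low-frequency pairs.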

    Read more →
  • Two-phase locking

    Two-phase locking

    In databases and transaction processing, two-phase locking (2PL) is a pessimistic concurrency control method that guarantees conflict-serializability. It is also the name of the resulting set of database transaction schedules (histories). The protocol uses locks, applied by a transaction to data, which may block (interpreted as signals to stop) other transactions from accessing the same data during the transaction's life.

    By the 2PL protocol, locks are applied and removed in two phases:

      • Expanding phase: locks are acquired and no locks are released.
      • Shrinking phase: locks are released and no locks are acquired.

    Two types of locks are used by the basic protocol: shared and exclusive locks. Refinements of the basic protocol may use more lock types. Because they use locks that block processes, 2PL, S2PL, and SS2PL may be subject to deadlocks that result from the mutual blocking of two or more transactions.

    == Read and write locks ==

    Locks are used to guarantee serializability. A transaction is holding a lock on an object if that transaction has acquired a lock on that object which has not yet been released. For 2PL, the only data-access locks used are read-locks (shared locks) and write-locks (exclusive locks). Below are the rules for read-locks and write-locks:

      • A transaction is allowed to read an object if and only if it is holding a read-lock or write-lock on that object.
      • A transaction is allowed to write an object if and only if it is holding a write-lock on that object.
      • A schedule (i.e., a set of transactions) is allowed to hold multiple locks on the same object simultaneously if and only if none of those locks are write-locks. If a transaction attempts to acquire a lock that may not be held simultaneously with the existing ones, it is blocked.

    == Variants ==

    Note that all conflict-serializable schedules are also view-serializable (but not vice versa).

    === Two-phase locking ===

    According to the two-phase locking protocol, each transaction handles its locks in two distinct, consecutive phases during the transaction's execution:

      • Expanding phase (aka Growing phase): locks are acquired and no locks are released (the number of locks can only increase).
      • Shrinking phase (aka Contracting phase): locks are released and no locks are acquired.

    The two-phase locking rules can be summarized as: a transaction must never acquire a lock after it has released a lock. The serializability property is guaranteed for a schedule with transactions that obey this rule. Typically, without explicit knowledge in a transaction of the end of phase 1, the rule is safely determined only when a transaction has completed processing and requested commit. In this case, all the locks can be released at once (phase 2).

    === Conservative two-phase locking ===

    Conservative two-phase locking (C2PL) differs from 2PL in that transactions obtain all the locks they need before the actual execution begins. This is to ensure that a transaction that already holds some locks will not block waiting for other locks. C2PL prevents deadlocks. In cases of heavy lock contention, C2PL reduces the time locks are held on average, relative to 2PL and Strict 2PL, because transactions that hold locks are never blocked. In light lock contention, C2PL holds more locks than is necessary, because it is difficult to predict which locks will be needed in the future, thus leading to higher overhead. A C2PL transaction will not obtain any locks if it cannot obtain all the locks it needs in its initial request. Furthermore, each transaction needs to declare its read and write set (the data items that will be read/written), which is not always possible. Because of these limitations, C2PL is not used very frequently.

    === Strict two-phase locking ===

    To comply with the strict two-phase locking (S2PL) protocol, a transaction needs to comply with 2PL and release its write (exclusive) locks only after the transaction has ended (i.e., either committed or aborted). Read (shared) locks, on the other hand, are released regularly during the shrinking phase. Unlike 2PL, S2PL provides strictness (a special case of cascade-less recoverability). This protocol is not appropriate for B-trees because it causes a bottleneck (B-tree searches always start from the root).

    === Strong strict two-phase locking ===

    Also known as rigorousness, rigorous scheduling, or rigorous two-phase locking. To comply with strong strict two-phase locking (SS2PL), a transaction's read and write locks are released only after that transaction has ended (i.e., either committed or aborted). A transaction obeying SS2PL has only a phase 1 and lacks a phase 2 until the transaction has completed. Every SS2PL schedule is also an S2PL schedule, but not vice versa.
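    The read/write lock rules and the basic two-phase rule described above can be sketched as a small in-memory lock table. This is an illustrative sketch only: the `LockManager` class, its method names, and the choice to signal "would block" by returning `False` are assumptions, and there is no waiting queue, deadlock detection, or commit handling.

```python
class TwoPhaseViolation(Exception):
    """Raised when a transaction tries to lock after it has started unlocking."""


class LockManager:
    def __init__(self):
        self.holders = {}       # object -> {transaction: "S" or "X"}
        self.shrinking = set()  # transactions that have released at least one lock

    def acquire(self, txn, obj, mode):
        """Try to lock obj in mode "S" (shared) or "X" (exclusive).

        Returns False where a real system would block the transaction.
        """
        if mode not in ("S", "X"):
            raise ValueError("mode must be 'S' or 'X'")
        if txn in self.shrinking:
            # Two-phase rule: no acquisitions once the shrinking phase has begun.
            raise TwoPhaseViolation(f"{txn} cannot acquire after releasing")
        held = self.holders.setdefault(obj, {})
        others = [m for t, m in held.items() if t != txn]
        if mode == "S":
            compatible = all(m == "S" for m in others)  # shared locks coexist
        else:
            compatible = not others                     # exclusive needs sole access
        if not compatible:
            return False
        if held.get(txn) != "X":  # allow S -> X upgrade, never downgrade
            held[txn] = mode
        return True

    def release(self, txn, obj):
        """Release txn's lock on obj and move txn into its shrinking phase."""
        self.shrinking.add(txn)
        self.holders.get(obj, {}).pop(txn, None)


locks = LockManager()
print(locks.acquire("T1", "a", "S"))  # T1 reads a
print(locks.acquire("T2", "a", "S"))  # shared locks are compatible
print(locks.acquire("T2", "a", "X"))  # would block while T1 holds a read-lock
locks.release("T1", "a")              # T1 enters its shrinking phase
print(locks.acquire("T2", "a", "X"))  # now T2 can upgrade
```

    Under this sketch, S2PL or SS2PL would correspond to a caller that defers `release` calls for write locks, or for all locks, until the transaction has ended.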

    Read more →
  • Radford M. Neal

    Radford M. Neal

    Radford M. Neal (born September 12, 1956) is a professor emeritus at the Department of Statistics and Department of Computer Science at the University of Toronto, where he held a Canada Research Chair in statistics and machine learning.

    == Education and career ==

    Neal studied computer science at the University of Calgary, where he received his B.Sc. in 1977 and M.Sc. in 1980, with thesis work supervised by David Hill. He worked for several years as a sessional instructor at the University of Calgary and as a statistical consultant in industry before returning to academia. Neal continued his studies at the University of Toronto, where he received his Ph.D. in 1995 under the supervision of Geoffrey Hinton. Neal became an assistant professor at the University of Toronto in 1995, an associate professor in 1999 and a full professor in 2001. He was the Canada Research Chair in Statistics and Machine Learning from 2003 to 2016 and retired in 2017. Neal has made major contributions to machine learning and statistics, where he is particularly well known for his work on Markov chain Monte Carlo, error correcting codes and Bayesian learning for neural networks. He is also known for his blog and as the developer of pqR, a new version of the R interpreter.

    Read more →
  • Moore machine

    Moore machine

    In the theory of computation, a Moore machine is a finite-state machine whose current output values are determined only by its current state. This is in contrast to a Mealy machine, whose output values are determined both by its current state and by the values of its inputs. Like other finite-state machines, in Moore machines, the input typically influences the next state. Thus the input may indirectly influence subsequent outputs, but not the current or immediate output. The Moore machine is named after Edward F. Moore, who presented the concept in a 1956 paper, "Gedanken-experiments on Sequential Machines."

    == Formal definition ==

    A Moore machine can be defined as a 6-tuple $(S, s_0, \Sigma, \Lambda, \delta, G)$ consisting of the following:

      • A finite set of states $S$
      • A start state (also called initial state) $s_0$, which is an element of $S$
      • A finite set called the input alphabet $\Sigma$
      • A finite set called the output alphabet $\Lambda$
      • A transition function $\delta : S \times \Sigma \rightarrow S$ mapping a state and an input symbol to the next state
      • An output function $G : S \rightarrow \Lambda$ mapping each state to an output symbol

    "Evolution across time" is realized in this abstraction by having the state machine consult the time-changing input symbol at discrete "timer ticks" $t_0, t_1, t_2, \ldots$ and react according to its internal configuration at those idealized instants, or else having the state machine wait for a next input symbol (as on a FIFO) and react whenever it arrives.

    A Moore machine can be regarded as a restricted type of finite-state transducer.
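    The 6-tuple above maps fairly directly onto a small data structure. The following is a minimal sketch, not a canonical implementation: the `MooreMachine` class, its field names, and the parity example are all hypothetical, and `run` simply reports the output of every state visited, beginning with the start state.

```python
from dataclasses import dataclass


@dataclass
class MooreMachine:
    states: set           # S
    start: str            # s_0, an element of S
    input_alphabet: set   # Sigma
    output_alphabet: set  # Lambda
    delta: dict           # transition function: (state, symbol) -> next state
    G: dict               # output function: state -> output symbol

    def run(self, inputs):
        """Return the outputs of all states visited, starting with the start state."""
        state = self.start
        outputs = [self.G[state]]          # output depends on the state alone
        for symbol in inputs:
            if symbol not in self.input_alphabet:
                raise ValueError(f"unknown input symbol: {symbol!r}")
            state = self.delta[(state, symbol)]
            outputs.append(self.G[state])
        return outputs


# Hypothetical example: output 1 while an odd number of 1s has been read.
parity = MooreMachine(
    states={"even", "odd"},
    start="even",
    input_alphabet={"0", "1"},
    output_alphabet={0, 1},
    delta={
        ("even", "0"): "even", ("even", "1"): "odd",
        ("odd", "0"): "odd", ("odd", "1"): "even",
    },
    G={"even": 0, "odd": 1},
)

print(parity.run("1101"))  # [0, 1, 0, 0, 1]
```

    Because `G` takes only a state, an input symbol can change the output only by first changing the state, which is the defining restriction of a Moore machine.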

    == Visual representation ==

    === Table ===

    A state transition table is a table listing all the triples in the transition relation $\delta : S \times \Sigma \rightarrow S$.

    === Diagram ===

    The state diagram for a Moore machine, or Moore diagram, is a state diagram that associates an output value with each state.

    == Relationship with Mealy machines ==

    As Moore and Mealy machines are both types of finite-state machines, they are equally expressive: either type can be used to parse a regular language.

    The difference between Moore machines and Mealy machines is that in the latter, the output of a transition is determined by the combination of current state and current input ($S \times \Sigma$ as the domain of $G$), as opposed to just the current state ($S$ as the domain of $G$). When represented as a state diagram, for a Moore machine, each node (state) is labeled with an output value; for a Mealy machine, each arc (transition) is labeled with an output value.

    Every Moore machine $M$ is equivalent to the Mealy machine with the same states and transitions and the output function $G(s, \sigma) = G_M(\delta_M(s, \sigma))$, which takes each state-input pair $(s, \sigma)$ and yields $G_M(\delta_M(s, \sigma))$, where $G_M$ is $M$'s output function and $\delta_M$ is $M$'s transition function.

    However, not every Mealy machine can be converted to an equivalent Moore machine. Some can be converted only to an almost equivalent Moore machine, with outputs shifted in time. This is due to the way that state labels are paired with transition labels to form the input/output pairs.
    Consider a transition $s_i \rightarrow s_j$ from state $s_i$ to state $s_j$. The input causing the transition $s_i \rightarrow s_j$ labels the edge $(s_i, s_j)$. The output corresponding to that input is the label of state $s_i$. Notice that this is the source state of the transition. So for each input, the output is already fixed before the input is received, and depends solely on the present state. This is the original definition by E. Moore. It is a common mistake to use the label of state $s_j$ as output for the transition $s_i \rightarrow s_j$.

    == Examples ==

    Types according to number of inputs/outputs.

    === Simple ===

    Simple Moore machines have one input and one output:

      • edge detector using XOR
      • binary adding machine
      • clocked sequential systems (a restricted form of Moore machine where the state changes only when the global clock signal changes)

    Most digital electronic systems are designed as clocked sequential systems. Typically the current state is stored in flip-flops, and a global clock signal is connected to the "clock" input of the flip-flops. Clocked sequential systems are one way to solve metastability problems. A typical electronic Moore machine includes a combinational logic chain to decode the current state into the outputs (lambda). The instant the current state changes, those changes ripple through that chain, and almost instantaneously the output gets updated. There are design techniques to ensure that no glitches occur on the outputs during that brief period while those changes are rippling through the chain, but most systems are designed so that glitches during that brief transition time are ignored or are irrelevant.
    The outputs then stay the same indefinitely (LEDs stay bright, power stays connected to the motors, solenoids stay energized, etc.), until the Moore machine changes state again.

    ==== Worked example ====

    A sequential network has one input and one output. The output becomes 1 and remains 1 thereafter when at least two 0's and two 1's have occurred as inputs. A Moore machine with nine states realizes the above description [figure: state diagram of the nine-state machine]. The initial state is state A, and the final state is state I.

    === Complex ===

    More complex Moore machines can have multiple inputs as well as multiple outputs.

    == Gedanken-experiments ==

    In Moore's 1956 paper "Gedanken-experiments on Sequential Machines", the $(n; m; p)$ automata (or machines) $S$ are defined as having $n$ states, $m$ input symbols and $p$ output symbols. Nine theorems are proved about the structure of $S$, and experiments with $S$. Later, "$S$ machines" became known as "Moore machines".

    At the end of the paper, in the section "Further problems", the following task is stated: Another directly following problem is the improvement of the bounds given at the theorems 8 and 9.

    Moore's Theorem 8 is formulated as: Given an arbitrary $(n; m; p)$ machine $S$, such that every two of its states are distinguishable from one another, then there exists an experiment of length $\tfrac{n(n-1)}{2}$ which determines the state of $S$ at the end of the experiment.

    In 1957, A. A. Karatsuba proved the following two theorems, which completely solved Moore's problem on the improvement of the bounds of the experiment length of his "Theorem 8".

    Theorem A.
    If $S$ is an $(n; m; p)$ machine, such that every two of its states are distinguishable from one another, then there exists a branched experiment of length at most $\tfrac{(n-1)(n-2)}{2} + 1$ through which one may determine the state of $S$ at the end of the experiment.

    Theorem B. There exists an $(n; m; p)$ machine, every two states of which are distinguishable from one another, such that the length of the shortest experiments establishing the state of the machine at the end of the experiment is equal to $\tfrac{(n-1)(n-2)}{2} + 1$.

    Theorems A and B formed the basis of the fourth-year course work of A. A. Karatsuba, "On a problem from the automata theory", which was distinguished by testimonial reference at the competition of student works of the faculty of mechanics and mathematics of Moscow State University in 1958. The paper by Karatsuba was submitted to the journal Uspekhi Mat. Nauk on 17 December 1958 and was published there in June 1960.

    Until the present day (2011), Karatsuba's result on the length of experiments is the only exact nonlinear result, both in automata theory and in similar problems of computational complexity theory.
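    To get a feel for how much Theorem A tightens Moore's Theorem 8, the two bounds can be tabulated for a few values of n. This is plain arithmetic on the formulas quoted above; the function names are hypothetical.

```python
def moore_theorem8_bound(n):
    """Experiment length n(n-1)/2 from Moore's Theorem 8."""
    return n * (n - 1) // 2


def karatsuba_bound(n):
    """Experiment length (n-1)(n-2)/2 + 1 from Karatsuba's Theorems A and B."""
    return (n - 1) * (n - 2) // 2 + 1


for n in (2, 3, 5, 9):
    print(n, moore_theorem8_bound(n), karatsuba_bound(n))
```

    Expanding both expressions shows that the difference between the two bounds is exactly n - 2, so the improvement grows linearly while both bounds remain quadratic in the number of states.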

    Read more →
  • Smoothing

    Smoothing

    In statistics and image processing, to smooth a data set is to create an approximating function that attempts to capture important patterns in the data, while leaving out noise or other fine-scale structures and rapid phenomena. In smoothing, the data points of a signal are modified so that individual points higher than the adjacent points (presumably because of noise) are reduced, and points that are lower than the adjacent points are increased, leading to a smoother signal. Reducing noise by smoothing may aid data analysis in two notable ways:

      • It helps uncover more meaningful information from the underlying data, such as trends.
      • It provides analyses that are both flexible and robust.

    Many different algorithms are used in smoothing, most commonly binning, kernels, and local weighted regression.

    == Compared to curve fitting ==

    Smoothing may be distinguished from the related and partially overlapping concept of curve fitting in the following ways:

      • Curve fitting often involves the use of an explicit function form for the result, whereas the immediate results from smoothing are the "smoothed" values with no later use made of a functional form if there is one.
      • The aim of smoothing is to give a general idea of relatively slow changes of value with little attention paid to the close matching of data values, while curve fitting concentrates on achieving as close a match as possible.
      • Smoothing methods often have an associated tuning parameter which is used to control the extent of smoothing. Curve fitting will adjust any number of parameters of the function to obtain the 'best' fit.

    == Linear smoothers ==

    In the case that the smoothed values can be written as a linear transformation of the observed values, the smoothing operation is known as a linear smoother; the matrix representing the transformation is known as a smoother matrix or hat matrix. The operation of applying such a matrix transformation is called convolution.
    Thus the matrix is also called a convolution matrix or a convolution kernel. In the case of a simple series of data points (rather than a multi-dimensional image), the convolution kernel is a one-dimensional vector.

    == Algorithms ==

    One of the most common algorithms is the "moving average", often used to try to capture important trends in repeated statistical surveys. In image processing and computer vision, smoothing ideas are used in scale space representations. The simplest smoothing algorithm is the "rectangular" or "unweighted sliding-average smooth". This method replaces each point in the signal with the average of m adjacent points, where m is a positive integer called the "smooth width". Usually m is an odd number. The triangular smooth is like the rectangular smooth except that it implements a weighted smoothing function.
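    The rectangular and triangular smooths described above can both be sketched as one weighted sliding average with a one-dimensional kernel. Two choices here are assumptions rather than anything stated in the text: points within half a window of either end are dropped instead of padded, and the triangular weights rise linearly to the centre (1, 2, 1 for m = 3). The function names are hypothetical.

```python
def weighted_smooth(signal, weights):
    """Slide a 1-D kernel over the signal, normalising by the sum of the weights."""
    m = len(weights)
    if m < 1 or m % 2 == 0:
        raise ValueError("smooth width must be a positive odd integer")
    half = m // 2
    total = sum(weights)
    smoothed = []
    # Edge points without a full window are skipped (assumed edge handling).
    for i in range(half, len(signal) - half):
        window = signal[i - half:i + half + 1]
        smoothed.append(sum(w * x for w, x in zip(weights, window)) / total)
    return smoothed


def rectangular_smooth(signal, m=3):
    """Unweighted sliding average of m adjacent points."""
    return weighted_smooth(signal, [1] * m)


def triangular_smooth(signal, m=3):
    """Sliding average whose weights peak at the centre of the window."""
    half = m // 2
    return weighted_smooth(signal, [half + 1 - abs(k) for k in range(-half, half + 1)])


spike = [0, 0, 4, 0, 0]
print(rectangular_smooth([1, 2, 3, 4, 5]))  # [2.0, 3.0, 4.0]
print(triangular_smooth(spike))             # [1.0, 2.0, 1.0]
```

    Writing both smooths in terms of a kernel keeps them in line with the linear-smoother view above: changing the smoothing method amounts to changing the weight vector.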

    Read more →
  • Postediting

    Postediting

    Post-editing (or postediting) is the process whereby humans amend machine-generated translation to achieve an acceptable final product. A person who post-edits is called a post-editor. The concept of post-editing is linked to that of pre-editing. In the process of translating a text via machine translation, best results may be gained by pre-editing the source text (for example by applying the principles of controlled language) and then post-editing the machine output. It is distinct from editing, which refers to the process of improving human-generated text (a process which is often known as revision in the field of translation). Post-edited text may afterwards be revised to ensure the quality of the language choices, or proofread to correct simple mistakes. Post-editing involves the correction of machine translation output to ensure that it meets a level of quality negotiated in advance between the client and the post-editor. Light post-editing aims at making the output simply understandable; full post-editing at making it also stylistically appropriate. With advances in machine translation, full post-editing is becoming an alternative to manual translation. Practically all computer-assisted translation (CAT) tools now support post-editing of machine-translated output. == Post-editing and machine translation == Machine translation left the labs and began to be used for its actual purpose in the late seventies at some big institutions such as the European Commission and the Pan-American Health Organization, and then, later, at some corporations such as Caterpillar and General Motors. The first studies on post-editing appeared in the eighties, linked to those implementations. To develop appropriate guidelines and training, members of the Association for Machine Translation in the Americas (AMTA) and the European Association for Machine Translation (EAMT) set up a Post-editing Special Interest Group in 1999.
After the nineties, advances in computer power and connectivity sped up machine translation development and allowed for its deployment through the web browser, including as a free, useful adjunct to the main search engines (Google Translate, Bing Translator, Yahoo! Babel Fish). A wider acceptance of less-than-perfect machine translation was accompanied by a wider acceptance of post-editing. With the demand for localisation of goods and services growing at a pace that could not be met by human translation, not even when assisted by translation memory and other translation management technologies, industry bodies such as the Translation Automation Users Society (TAUS) expect machine translation and post-editing to play a much bigger role within the next few years. The use of machine translation sometimes calls for pre-editing. Human translators possess significantly more sophisticated cognitive abilities than machine translation (MT) systems. They leverage a wealth of life experience, common sense, and multi-sensory input to understand context, identify semantic intent, and add cultural nuances to translations. This remains true even as MT capabilities continue to improve. Unlike MT systems, which primarily focus on literal word-for-word conversion, human translators grasp the underlying meaning and intent, even when information is implicit. They "read between the lines," guided by their understanding of the world. Essentially, MT models excel at text string prediction, not true comprehension. Their success often stems from framing problems as prediction tasks, such as in self-driving cars or fraud detection. Studies have demonstrated that integrating adaptive MT with post-editing interfaces can lead to reductions in technical effort and time, improving overall translation efficiency. These systems are also supported by research that highlights the benefits of adaptive MT in real-world translation scenarios. 
For example, incremental adaptation in Neural Machine Translation (NMT) for professional post-editors has been shown to improve translation quality and reduce time spent on edits, showcasing how human expertise and machine assistance can complement each other effectively. == Light and full post-editing == For many years, no widely accepted, standardized post-editing guidelines existed; however, in 2017, ISO standard 18587:2017: Translation services — Post-editing of machine translation output — Requirements was published. Studies in the eighties distinguished between degrees of post-editing which, in the context of the European Commission Translation Service, were first defined as conventional and rapid or full and rapid. Light and full post-editing seems the wording most used today. Light post-editing implies minimal intervention by the post-editor, with the aim of ensuring quality is "good enough" or "understandable"; the expectation is that the client will use it for inbound purposes only, often when the text is needed urgently, or has a short time span. Full post-editing involves a greater level of intervention to achieve a degree of quality to be negotiated between client and post-editor; the expectation is that the outcome will be a text that is not only understandable but presented in some stylistically appropriate way, so it can be used for assimilation and even for dissemination, for inbound and for outbound purposes. The quality is expected to be publishable and equivalent to that of a human translation. The assumption, however, has been that it takes less effort for translators to work directly from the source text than to post-edit the machine generated version. With advances in machine translation, this may be changing. 
For some language pairs and for some tasks, and with engines that have been customised with good-quality domain-specific data, some clients are already asking translators to post-edit instead of translating from scratch, in the belief that they will attain similar quality at a lower cost. The light/full classification, developed in the nineties when machine translation still came on a CD-ROM, may not suit advances in machine translation at the light post-editing end either. For some language pairs and some tasks, particularly if the source has been pre-edited, raw machine output may be good enough for gisting purposes without requiring subsequent human intervention. == Post-editing efficiency == Post-editing is used when raw machine translation is not good enough and human translation is not required. Industry advice is to use post-editing when it can at least double the productivity of manual translation, or even quadruple it in the case of light post-editing (1000 words per hour vs. 250 wph). However, post-editing efficiency is difficult to predict. Various studies from both academia and industry have claimed that post-editing is generally faster than translating from scratch, regardless of language pairs or translators' experience. There is, however, no agreement about how much time can be saved through post-editing in practice (if any at all): while the industry reports time savings of around 40%, some academic studies suggest that time savings under actual working conditions are more likely to be between 0 and 20%, or that they may depend on the terminological proximity between the source and target languages. Professionals have also reported negative productivity gains, where corrections take more time than translating from scratch. == Post-editing and the language industry == After some thirty years, post-editing is still "a nascent profession". The right profile of the post-editor has not yet been fully studied. 
Post-editing overlaps with translating and editing, but only partially. Most think the ideal post-editor will be a translator keen to be trained in the specific skills required, but some think a bilingual without a background in translation may be easier to train. Nor is much known about who the actual post-editors are, whether they tend to be professional translators, whether they work mostly as in-house employees or are self-employed, and under what conditions. Many professional translators dislike post-editing, among other reasons because it tends to be paid at lower rates than conventional translation; the International Association of Professional Translators and Interpreters (IAPTI) has been particularly vocal about it.

    Read more →
  • Theano (software)

    Theano (software)

    Theano is a Python library and optimizing compiler for manipulating and evaluating mathematical expressions, especially matrix-valued ones. In Theano, computations are expressed using a NumPy-esque syntax and compiled to run efficiently on either CPU or GPU architectures. == History == Theano is an open source project primarily developed by the Montreal Institute for Learning Algorithms (MILA) at the Université de Montréal. The name of the software references the ancient philosopher Theano, long associated with the development of the golden mean. On 28 September 2017, Pascal Lamblin posted a message from Yoshua Bengio, Head of MILA: major development would cease after the 1.0 release due to competing offerings by strong industrial players. Theano 1.0.0 was then released on 15 November 2017. On 17 May 2018, Chris Fonnesbeck wrote on behalf of the PyMC development team that the PyMC developers would officially assume control of Theano maintenance once the MILA development team stepped down. On 29 January 2021, they started using the name Aesara for their fork of Theano. On 29 November 2022, the PyMC development team announced that the PyMC developers would fork the Aesara project under the name PyTensor. == Sample code == Theano's original example defines a computational graph with two scalars a and b of type double and an operation between them (addition), and then creates a Python function f that does the actual computation. == Examples == === Matrix Multiplication (Dot Product) === Theano can perform matrix multiplication, which is essential for linear algebra operations in many machine learning tasks. === Gradient Calculation === Theano can compute the gradient of a simple operation (like a neuron) with respect to its input. This is useful in training machine learning models (backpropagation). 
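
To make the idea of a computational graph concrete without depending on Theano itself, here is a toy sketch in plain Python. It is explicitly not Theano's API: `Node`, `scalar`, `evaluate`, `grad` and `function` are hypothetical names chosen for illustration, the graph supports only addition and multiplication, and the derivative is obtained by a simple recursive application of the sum and product rules, not by an optimizing compiler.

```python
class Node:
    """One node of a toy symbolic expression graph (not Theano's API)."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op          # "var", "add" or "mul"
        self.inputs = inputs  # the nodes this node is computed from
        self.name = name      # only used by "var" nodes

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def scalar(name):
    """Create a symbolic scalar placeholder."""
    return Node("var", name=name)


def evaluate(node, env):
    """Compute the numeric value of a node given placeholder values in env."""
    if node.op == "var":
        return env[node.name]
    left, right = (evaluate(n, env) for n in node.inputs)
    return left + right if node.op == "add" else left * right


def grad(node, wrt, env):
    """Derivative of node with respect to the placeholder wrt, evaluated at env."""
    if node.op == "var":
        return 1.0 if node is wrt else 0.0
    a, b = node.inputs
    da, db = grad(a, wrt, env), grad(b, wrt, env)
    if node.op == "add":
        return da + db  # sum rule
    return da * evaluate(b, env) + evaluate(a, env) * db  # product rule


def function(inputs, output):
    """Wrap a graph as an ordinary Python callable (a stand-in for compilation)."""
    def f(*values):
        env = {node.name: value for node, value in zip(inputs, values)}
        return evaluate(output, env)
    return f


# Two symbolic scalars and an addition, then a callable built from the graph.
a = scalar("a")
b = scalar("b")
c = a + b
f = function([a, b], c)
print(f(1.5, 2.5))  # 4.0

# Gradient of y = a * b + a with respect to a is b + 1.
y = a * b + a
print(grad(y, a, {"a": 3.0, "b": 2.0}))  # 3.0
```

The point of the sketch is the separation of steps: the expression `a + b` only builds a graph, and numbers are supplied later when `f` is called. A real library adds graph optimization and compilation on top of this structure.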
=== Building a Simple Neural Network === Theano can be used to start building a simple neural network, such as a very basic network with one hidden layer. === Broadcasting in Theano === Theano supports broadcasting, which allows operations between arrays of different shapes without needing to explicitly reshape them.
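
The shape rule behind broadcasting can be sketched in plain Python. This illustrates the NumPy-style rule that Theano's syntax is modeled on, under the assumption that dimensions are compared from the trailing end and that a dimension of size 1 (or a missing dimension) stretches to match. The helper names `broadcast_shape` and `add_row_to_matrix` are hypothetical, and the sketch does not model how Theano decides which dimensions are broadcastable when a graph is built.

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Return the result shape of a NumPy-style broadcast, or raise ValueError."""
    result = []
    # Walk both shapes from the last dimension; a missing dimension counts as 1.
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for da, db in pairs:
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} cannot be broadcast")
    return tuple(reversed(result))


def add_row_to_matrix(matrix, row):
    """Add a length-n row to every row of an m x n matrix: the (m, n) + (n,) case."""
    return [[x + r for x, r in zip(matrix_row, row)] for matrix_row in matrix]


print(broadcast_shape((3, 4), (4,)))    # (3, 4)
print(broadcast_shape((3, 1), (1, 5)))  # (3, 5)
print(add_row_to_matrix([[1, 2], [3, 4]], [10, 20]))  # [[11, 22], [13, 24]]
```

The second function shows what the rule means for values: the row is reused for each matrix row, so no explicit reshape or copy of the smaller operand is written by the caller.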

    Read more →