AI Humanizer

AI Humanizer — hands-on reviews, top picks, pricing, pros and cons and a practical how-to guide on Aizhi.

  • Statistical learning theory

    Statistical learning theory

    Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis. Statistical learning theory deals with the statistical inference problem of finding a predictive function based on data. Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, and bioinformatics. == Introduction == The goals of learning are understanding and prediction. Learning falls into many categories, including supervised learning, unsupervised learning, online learning, and reinforcement learning. From the perspective of statistical learning theory, supervised learning is best understood. Supervised learning involves learning from a training set of data. Every point in the training is an input–output pair, where the input maps to an output. The learning problem consists of inferring the function that maps between the input and the output, such that the learned function can be used to predict the output from future input. Depending on the type of output, supervised learning problems are either problems of regression or problems of classification. If the output takes a continuous range of values, it is a regression problem. Using Ohm's law as an example, a regression could be performed with voltage as input and current as an output. The regression would find the functional relationship between voltage and current to be R {\displaystyle R} , such that V = I R {\displaystyle V=IR} Classification problems are those for which the output will be an element from a discrete set of labels. Classification is very common for machine learning applications. In facial recognition, for instance, a picture of a person's face would be the input, and the output label would be that person's name. The input would be represented by a large multidimensional vector whose elements represent pixels in the picture. After learning a function based on the training set data, that function is validated on a test set of data, data that did not appear in the training set. == Formal description == Take X {\displaystyle X} to be the vector space of all possible inputs, and Y {\displaystyle Y} to be the vector space of all possible outputs. Statistical learning theory takes the perspective that there is some unknown probability distribution over the product space Z = X × Y {\displaystyle Z=X\times Y} , i.e. there exists some unknown p ( z ) = p ( x , y ) {\displaystyle p(z)=p(\mathbf {x} ,y)} . The training set is made up of n {\displaystyle n} samples from this probability distribution, and is notated S = { ( x 1 , y 1 ) , … , ( x n , y n ) } = { z 1 , … , z n } {\displaystyle S=\{(\mathbf {x} _{1},y_{1}),\dots ,(\mathbf {x} _{n},y_{n})\}=\{\mathbf {z} _{1},\dots ,\mathbf {z} _{n}\}} Every x i {\displaystyle \mathbf {x} _{i}} is an input vector from the training data, and y i {\displaystyle y_{i}} is the output that corresponds to it. In this formalism, the inference problem consists of finding a function f : X → Y {\displaystyle f:X\to Y} such that f ( x ) ∼ y {\displaystyle f(\mathbf {x} )\sim y} . Let H {\displaystyle {\mathcal {H}}} be a space of functions f : X → Y {\displaystyle f:X\to Y} called the hypothesis space. The hypothesis space is the space of functions the algorithm will search through. Let V ( f ( x ) , y ) {\displaystyle V(f(\mathbf {x} ),y)} be the loss function, a metric for the difference between the predicted value f ( x ) {\displaystyle f(\mathbf {x} )} and the actual value y {\displaystyle y} . The expected risk is defined to be I [ f ] = ∫ X × Y V ( f ( x ) , y ) p ( x , y ) d x d y {\displaystyle I[f]=\int _{X\times Y}V(f(\mathbf {x} ),y)\,p(\mathbf {x} ,y)\,d\mathbf {x} \,dy} The target function, the best possible function f {\displaystyle f} that can be chosen, is given by the f {\displaystyle f} that satisfies f = argmin h ∈ H ⁡ I [ h ] {\displaystyle f=\mathop {\operatorname {argmin} } _{h\in {\mathcal {H}}}I[h]} Because the probability distribution p ( x , y ) {\displaystyle p(\mathbf {x} ,y)} is unknown, a proxy measure for the expected risk must be used. This measure is based on the training set, a sample from this unknown probability distribution. It is called the empirical risk I S [ f ] = 1 n ∑ i = 1 n V ( f ( x i ) , y i ) {\displaystyle I_{S}[f]={\frac {1}{n}}\sum _{i=1}^{n}V(f(\mathbf {x} _{i}),y_{i})} A learning algorithm that chooses the function f S {\displaystyle f_{S}} that minimizes the empirical risk is called empirical risk minimization. == Loss functions == The choice of loss function is a determining factor on the function f S {\displaystyle f_{S}} that will be chosen by the learning algorithm. The loss function also affects the convergence rate for an algorithm. It is important for the loss function to be convex. Different loss functions are used depending on whether the problem is one of regression or one of classification. === Regression === The most common loss function for regression is the square loss function (also known as the L2-norm). This familiar loss function is used in Ordinary Least Squares regression. The form is: V ( f ( x ) , y ) = ( y − f ( x ) ) 2 {\displaystyle V(f(\mathbf {x} ),y)=(y-f(\mathbf {x} ))^{2}} The absolute value loss (also known as the L1-norm) is also sometimes used: V ( f ( x ) , y ) = | y − f ( x ) | {\displaystyle V(f(\mathbf {x} ),y)=|y-f(\mathbf {x} )|} === Classification === In some sense the 0-1 indicator function is the most natural loss function for classification. It takes the value 0 if the predicted output is the same as the actual output, and it takes the value 1 if the predicted output is different from the actual output. For binary classification with Y = { − 1 , 1 } {\displaystyle Y=\{-1,1\}} , this is: V ( f ( x ) , y ) = θ ( − y f ( x ) ) {\displaystyle V(f(\mathbf {x} ),y)=\theta (-yf(\mathbf {x} ))} where θ {\displaystyle \theta } is the Heaviside step function. == Regularization == In machine learning problems, a major problem that arises is that of overfitting. Because learning is a prediction problem, the goal is not to find a function that most closely fits the (previously observed) data, but to find one that will most accurately predict output from future input. Empirical risk minimization runs this risk of overfitting: finding a function that matches the data exactly but does not predict future output well. Overfitting is symptomatic of unstable solutions; a small perturbation in the training set data would cause a large variation in the learned function. It can be shown that if the stability for the solution can be guaranteed, generalization and consistency are guaranteed as well. Regularization can solve the overfitting problem and give the problem stability. Regularization can be accomplished by restricting the hypothesis space H {\displaystyle {\mathcal {H}}} . A common example would be restricting H {\displaystyle {\mathcal {H}}} to linear functions: this can be seen as a reduction to the standard problem of linear regression. H {\displaystyle {\mathcal {H}}} could also be restricted to polynomial of degree p {\displaystyle p} , exponentials, or bounded functions on L1. Restriction of the hypothesis space avoids overfitting because the form of the potential functions are limited, and so does not allow for the choice of a function that gives empirical risk arbitrarily close to zero. One example of regularization is Tikhonov regularization. This consists of minimizing 1 n ∑ i = 1 n V ( f ( x i ) , y i ) + γ ‖ f ‖ H 2 {\displaystyle {\frac {1}{n}}\sum _{i=1}^{n}V(f(\mathbf {x} _{i}),y_{i})+\gamma \left\|f\right\|_{\mathcal {H}}^{2}} where γ {\displaystyle \gamma } is a fixed and positive parameter, the regularization parameter. Tikhonov regularization ensures existence, uniqueness, and stability of the solution. == Bounding empirical risk == Consider a binary classifier f : X → { 0 , 1 } {\displaystyle f:{\mathcal {X}}\to \{0,1\}} . We can apply Hoeffding's inequality to bound the probability that the empirical risk deviates from the true risk to be a Sub-Gaussian distribution. P ( | R ^ ( f ) − R ( f ) | ≥ ϵ ) ≤ 2 e − 2 n ϵ 2 {\displaystyle \mathbb {P} (|{\hat {R}}(f)-R(f)|\geq \epsilon )\leq 2e^{-2n\epsilon ^{2}}} But generally, when we do empirical risk minimization, we are not given a classifier; we must choose it. Therefore, a more useful result is to bound the probability of the supremum of the difference over the whole class. P ( sup f ∈ F | R ^ ( f ) − R ( f ) | ≥ ϵ ) ≤ 2 S ( F , n ) e − n ϵ 2 / 8 ≈ n d e − n ϵ 2 / 8 {\displaystyle \mathbb {P} {\bigg (}\sup _{f\in {\mathcal {F}}}|{\hat {R}}(f)-R(f)|\geq \epsilon {\bigg )}\leq 2S({\mathcal {F}},n)e^{-n\epsilon ^{2}/8}\approx n^{d}e^{-n\epsilon ^{2}/8}} where S ( F , n ) {\displaystyle S({\mathcal {F}},n)} is the shattering number and n {\displaystyle n} is the number of samples in your dataset. The exponential term comes from Hoeffding but there is an extra cost of taking the supremum over the whole cla

    Read more →
  • Landmark point

    Landmark point

    In morphometrics, landmark point or shortly landmark is a point in a shape object in which correspondences between and within the populations of the object are preserved. In other disciplines, landmarks may be known as vertices, anchor points, control points, sites, profile points, 'sampling' points, nodes, markers, fiducial markers, etc. Landmarks can be defined either manually by experts or automatically by a computer program. There are three basic types of landmarks: anatomical landmarks, mathematical landmarks or pseudo-landmarks. An anatomical landmark is a biologically-meaningful point in an organism. Usually experts define anatomical points to ensure their correspondences within the same species. Examples of anatomical landmark in shape of a skull are the eye corner, tip of the nose, jaw, etc. Anatomical landmarks determine homologous parts of an organism, which share a common ancestry. Mathematical landmarks are points in a shape that are located according to some mathematical or geometrical property, for instance, a high curvature point or an extreme point. A computer program usually determines mathematical landmarks used for an automatic pattern recognition. Pseudo-landmarks are constructed points located between anatomical or mathematical landmarks. A typical example is an equally spaced set of points between two anatomical landmarks to get more sample points from a shape. Pseudo-landmarks are useful during shape matching, when the matching process requires a large number of points.

    Read more →
  • Verbot

    Verbot

    The Verbot (short for Verbal-Robot) was a chatbot program and artificial intelligence software development kit (SDK) designed for Windows and web platforms. == Early beginning == The origin of verbot traces back to Michael Mauldin's research during his time as a graduate student and post-doctoral fellow at Carnegie Mellon University. The creative foundation also stems from Peter Plantec's work in personality psychology and art direction. === Historic outline === In 1994, Michael Loren Mauldin, founder of Lycos, Inc., developed a prototype chatbot, Julia, which competed in the internationally known Turing test, for the coveted Loebner Prize. The Turing test matches computer scientist judges against machines to see if they can distinguish a computer from a real human. Julia was refined and developed, and in 1997, Dr. Mauldin and Peter Plantec, a clinical psychologist and animator, formed Virtual Personalities, Inc. (now Conversive, Inc.) in order to create a virtual human interface that would incorporate real-time animation as well as speech and natural language processing. The initial release, a stand-alone virtual person called Sylvie, was beta-tested to the public. This release was well received, and finally, after several versions, the production release (deemed version 3) of the Verbally Enhanced Software Robot, or Verbot, was deployed in fall 2000. The grandfather of all Verbots is Rog-O-Matic, which, although it could not talk, could and did explore a virtual world. Julia has been active on the internet in one form or another since 1989. A close cousin of Julia is Lycos, a robot that explores the World Wide Web and answers questions about it. Sylvie was the first Verbot with a face and a voice. Sylvie was the first Virtual Human with advanced, flexible interfacing capability. === Beginnings === The Virtual Personalities story goes back to 1978, where Mauldin was attending Rice University. Fascinated by the idea of ELIZA, he proceeded to write a program called "PET" for his 8 kilobyte Commodore PET Computer. PET included simple induction as a way to post new information, for example: Subject: I like my friend (later) Subject: I like food. PET: I have heard that food is your friend. Meanwhile, Plantec was separately designing a personality for "Entity", a theoretical virtual human that would interact comfortably with humans without pretending to be one. At that time the technology was not advanced enough to realize Entity. Mauldin got so involved with this that he majored in Computer Science and minored in Linguistics. === Rogue === In the late seventies and early eighties, a popular computer game at universities was Rogue, an implementation of Dungeons and Dragons where the player would descend 26 levels in a randomly created dungeon, fighting monsters, gathering treasure, and searching for the elusive "Amulet of Yendor". Mauldin was one of four grad students who devoted a large amount of time to building a program called "Rog-O-Matic" capable of retrieving the amulet and emerging victorious from the dungeon. === TinyMUD === In 1989, when James Aspnes at Carnegie Mellon created the first TinyMUD (a descendant of MUD and AberMUD), Mauldin was one of the first to create a computer player that would explore the text-based world of TinyMUD. But his first robot, Gloria, gradually accreted more and more linguistic ability, to the point that it could pass the "unsuspecting" Turing test. In this version of the test, the human has no reason to suspect that one of the other occupants of the room is controlled by a computer, and so is more polite and asks fewer probing questions. The second generation of Mauldin's TinyMUD robots was Julia, created on Jan. 8, 1990. Julia slowly developed into a more and more capable conversational agent, and assumed useful duties in the TinyMUD world, including tour guide, information assistant, note-taker, and message-relayer. She could even play the card game hearts along with the other human players. In 1991, Julia attended the first Loebner Prize contest in Boston, Massachusetts. Although she only finished third, she was ranked by one judge as more human than one of the human confederates, winning a coveted certificate of humanness in the world's first restricted Turing test. Julia continued to log in to various TinyMUD's and TinyMucks for the next seven years, and chatted with hundreds of people a month over the internet. === Lycos === Julia's job was to explore a virtual world consisting of pages of textual descriptions, with links between them, and to construct an internal map of that world and answer questions about it (including path information such as the shortest route from one room to another, and matching information, such as which rooms contained a certain kind of object or textual description). It was therefore only a very short cognitive leap from Julia to Lycos, another robotic agent that explores a virtual world made of hyperlinked pages of text, and which answers questions about those pages. Sylvie was born and her abilities were expanded greatly to include interfacing with computers and control systems via her serial ports. === Sylvie === Sylvie was the first intelligent animated virtual human. She was designed both as a conversation agent and as a virtual human interface that would form a bridge between the two. She became more popular as a conversation agent, but her designers believe she serves as a prototype for future virtual human interface design that will help us all cope with the increasing complexity of technology. As an aside, Plantec noticed that a large number of Sylvies have been sold in Southeast Asia. Upon investigation, he found out that students had discovered a "test" mode that would allow them to type in English sentences that Sylvie would pronounce in her somewhat stylized English. == Ownership == In 1997, Dr. Mauldin and Peter Plantec formed Virtual Personalities, Inc. to create Natural Language Processing solutions for companies. In 2001 Virtual Personalities, Inc. became Conversive, Inc. to reflect the focus on providing Customer Service and Marketing to the Enterprise Market. In late 2012 Avaya, Inc. acquired Conversive's assets including Verbots. == Verbot versions == The Verbot 4 version was created and released in 2004. In 2005 Version 4.1 of the Verbot Software was released with many feature enhancements and bug fixes, including built-in support for embedding C# code in outputs and conditionals. In early 2006 Conversive launched Verbots Online allowing Verbot 4 users to upload their knowledge and show off their bots to the world. In 2009 Version 5 was released, completely free and fully featured. In early 2012 the last version of Verbot, 5.0.1.2, was released to the general public with support for Windows 7. Later in 2012 Verbots Online completely shut down. == Verbots today == Verbots.com, its community of users, and its forums no longer exist, but the software and users can still be found. There has been no active development since the early 2012 release of Verbot 5.0.1.2.

    Read more →
  • Transfer learning

    Transfer learning

    Transfer learning (TL) is a technique in machine learning (ML) in which knowledge learned from a task is re-used in order to boost performance on a related task. For example, for image classification, knowledge gained while learning to recognize cars could be applied when trying to recognize trucks. This topic is related to the psychological literature on transfer of learning, although practical ties between the two fields are limited. Reusing or transferring information from previously learned tasks to new tasks has the potential to significantly improve learning efficiency. Since transfer learning makes use of training with multiple objective functions it is related to cost-sensitive machine learning and multi-objective optimization. == History == In 1976, Bozinovski and Fulgosi published a paper addressing transfer learning in neural network training. The paper gives a mathematical and geometrical model of the topic. In 1981, a report considered the application of transfer learning to a dataset of images representing letters of computer terminals, experimentally demonstrating positive and negative transfer learning. In 1992, Lorien Pratt formulated the discriminability-based transfer (DBT) algorithm. By 1998, the field had advanced to include multi-task learning, along with more formal theoretical foundations. Influential publications on transfer learning include the book Learning to Learn in 1998, a 2009 survey and a 2019 survey. Ng said in his NIPS 2016 tutorial that TL would become the next driver of machine learning commercial success after supervised learning. In the 2020 paper, "Rethinking Pre-Training and self-training", Zoph et al. reported that pre-training can hurt accuracy, and advocate self-training instead. == Definition == The definition of transfer learning is given in terms of domains and tasks. A domain D {\displaystyle {\mathcal {D}}} consists of: a feature space X {\displaystyle {\mathcal {X}}} and a marginal probability distribution P ( X ) {\displaystyle P(X)} , where X = { x 1 , . . . , x n } ∈ X {\displaystyle X=\{x_{1},...,x_{n}\}\in {\mathcal {X}}} . Given a specific domain, D = { X , P ( X ) } {\displaystyle {\mathcal {D}}=\{{\mathcal {X}},P(X)\}} , a task consists of two components: a label space Y {\displaystyle {\mathcal {Y}}} and an objective predictive function f : X → Y {\displaystyle f:{\mathcal {X}}\rightarrow {\mathcal {Y}}} . The function f {\displaystyle f} is used to predict the corresponding label f ( x ) {\displaystyle f(x)} of a new instance x {\displaystyle x} . This task, denoted by T = { Y , f ( x ) } {\displaystyle {\mathcal {T}}=\{{\mathcal {Y}},f(x)\}} , is learned from the training data consisting of pairs { x i , y i } {\displaystyle \{x_{i},y_{i}\}} , where x i ∈ X {\displaystyle x_{i}\in {\mathcal {X}}} and y i ∈ Y {\displaystyle y_{i}\in {\mathcal {Y}}} . Given a source domain D S {\displaystyle {\mathcal {D}}_{S}} and learning task T S {\displaystyle {\mathcal {T}}_{S}} , a target domain D T {\displaystyle {\mathcal {D}}_{T}} and learning task T T {\displaystyle {\mathcal {T}}_{T}} , where D S ≠ D T {\displaystyle {\mathcal {D}}_{S}\neq {\mathcal {D}}_{T}} , or T S ≠ T T {\displaystyle {\mathcal {T}}_{S}\neq {\mathcal {T}}_{T}} , transfer learning aims to help improve the learning of the target predictive function f T ( ⋅ ) {\displaystyle f_{T}(\cdot )} in D T {\displaystyle {\mathcal {D}}_{T}} using the knowledge in D S {\displaystyle {\mathcal {D}}_{S}} and T S {\displaystyle {\mathcal {T}}_{S}} . == Applications == Algorithms for transfer learning are available in Markov logic networks and Bayesian networks. Transfer learning has been applied to cancer subtype discovery, building utilization, general game playing, text classification, digit recognition, medical imaging and spam filtering. In 2020, it was discovered that, due to their similar physical natures, transfer learning is possible between electromyographic (EMG) signals from the muscles and classifying the behaviors of electroencephalographic (EEG) brainwaves, from the gesture recognition domain to the mental state recognition domain. It was noted that this relationship worked in both directions, showing that electroencephalographic can likewise be used to classify EMG. The experiments noted that the accuracy of neural networks and convolutional neural networks were improved through transfer learning both prior to any learning (compared to standard random weight distribution) and at the end of the learning process (asymptote). That is, results are improved by exposure to another domain. Moreover, the end-user of a pre-trained model can change the structure of fully-connected layers to improve performance.

    Read more →
  • Hint (app)

    Hint (app)

    Hint (hint.app) is an American software platform that provides astrological content, personality assessments, and relationship compatibility tools. The application was launched in 2018 and is based in Claymont, Delaware. The platform has been described in media coverage as part of a broader trend of astrology-based and self-reflection applications, particularly among younger users. As of 2026, the company reports that it has reached more than 25 million users worldwide. == History == Hint was founded in 2018 and is headquartered in Claymont, Delaware. The platform was developed to address a growing demand among Millennials and Gen Z for structured self-reflection tools that deviate from traditional religious or clinical psychological frameworks. The app has become a prominent figure in the "emotional technology" sector, reaching over 25 million global users by 2026. The platform is frequently cited by sociologists and media outlets as a primary driver of the Open-source intelligence trend, where individuals use digital tools to vet and analyze personal relationships in the dating economy. Media coverage has described the platform as part of a broader trend in which digital tools incorporate astrology and symbolic frameworks into wellness and relationship advice. == Reception == Coverage of Hint has appeared alongside reporting on changing attitudes toward dating and relationships, particularly among younger adults. Surveys reported by media outlets have described shifts in dating behavior, including reduced interest in casual relationships and increased reliance on digital tools for emotional reflection and compatibility assessment. Additional reporting has linked the use of astrology apps to broader trends in emotional fatigue and changing relationship expectations. Lifestyle and culture publications have described Hint, as an example of applications that integrate astrology into digital self-reflection and relationship analysis.

    Read more →
  • Integreat

    Integreat

    Integreat (former project name: Refguide+) is an open source mobile app that provides local information and services tailored to refugees and migrants coming to Germany. The content is maintained by local organizations, such as local governments or integration officers, and made available in locally relevant languages. It was developed by Tür an Tür - Digitalfabrik gGmbH (formerly Tür an Tür - Digital Factory gGmbH) in Augsburg together with a team of researchers and students from the Technical University of Munich. == History == In 1997, the Augsburg association "Tür an Tür", which has been working for refugees since 1992, published the brochure "First Steps", which answers local everyday questions. Since addresses and contact persons change quickly, some information is already outdated after a few weeks. Students of business informatics at the Technical University of Munich therefore developed the app Integreat within eight months together with the association and the social department of the city of Augsburg. The app was then also used by other cities and districts within months. As of February 3, 2022, information is available at 72 locations, including Munich, Dortmund, Nuremberg and Augsburg. == Mode of action == Refugees need information on areas such as registration, contact persons, health care, education, family, work and everyday life. Integreat seeks to provide refugees with this information by allowing them to select their geographic location and receive locally relevant information. This information is available offline once the app is opened so it can be used without an internet connection. In addition, the content is translated into the native languages of refugees and migrants to facilitate access. The content is licensed with a CC BY 4.0 license to facilitate collaboration and translation between content creators and dissemination of the content. Integreat is now being used for a broader migrant audience and says it can also support professionals, volunteers, and counseling centers. == Comparable mobile apps == Other mobile apps that are likewise intended to provide initial orientation for refugees include the app Ankommen, a joint project of the Federal Office for Migration and Refugees, the Goethe-Institut, the Federal Employment Agency and the Bavarian Broadcasting Corporation, which is intended as a companion for the first few weeks in Germany, and the Welcome App, a company-sponsored non-profit initiative for information about Germany and asylum procedures with a regional focus, and a book by the Konrad Adenauer Foundation (KAS) and Verlag Herder with a corresponding app Deutschland - Erste Informationen für Flüchtlinge (Germany - First Information for Refugees) as a companion for Arabic-speaking refugees in Germany.

    Read more →
  • Machine translation software usability

    Machine translation software usability

    The sections below give objective criteria for evaluating the usability of machine translation software output. == Stationarity or canonical form == Do repeated translations converge on a single expression in both languages? I.e. does the translation method show stationarity or produce a canonical form? Does the translation become stationary without losing the original meaning? This metric has been criticized as not being well correlated with BLEU (BiLingual Evaluation Understudy) scores. == Adaptive to colloquialism, argot or slang == Is the system adaptive to colloquialism, argot or slang? The French language has many rules for creating words in the speech and writing of popular culture. Two such rules are: (a) The reverse spelling of words such as femme to meuf. (This is called verlan.) (b) The attachment of the suffix -ard to a noun or verb to form a proper noun. For example, the noun faluche means "student hat". The word faluchard formed from faluche colloquially can mean, depending on context, "a group of students", "a gathering of students" and "behavior typical of a student". The Google translator as of 28 December 2006 doesn't derive the constructed words as for example from rule (b), as shown here: Il y a une chorale falucharde mercredi, venez nombreux, les faluchards chantent des paillardes! ==> There is a choral society falucharde Wednesday, come many, the faluchards sing loose-living women! French argot has three levels of usage: familier or friendly, acceptable among friends, family and peers but not at work grossier or swear words, acceptable among friends and peers but not at work or in family verlan or ghetto slang, acceptable among lower classes but not among middle or upper classes The United States National Institute of Standards and Technology conducts annual evaluations [1] Archived 2009-03-22 at the Wayback Machine of machine translation systems based on the BLEU-4 criterion [2]. A combined method called IQmt which incorporates BLEU and additional metrics NIST, GTM, ROUGE and METEOR has been implemented by Gimenez and Amigo [3]. == Well-formed output == Is the output grammatical or well-formed in the target language? Using an interlingua should be helpful in this regard, because with a fixed interlingua one should be able to write a grammatical mapping to the target language from the interlingua. Consider the following Arabic language input and English language translation result from the Google translator as of 27 December 2006 [4]. This Google translator output doesn't parse using a reasonable English grammar: وعن حوادث التدافع عند شعيرة رمي الجمرات -التي كثيرا ما يسقط فيها العديد من الضحايا- أشار الأمير نايف إلى إدخال "تحسينات كثيرة في جسر الجمرات ستمنع بإذن الله حدوث أي تزاحم". ==> And incidents at the push Carbuncles-throwing ritual, which often fall where many of the victims - Prince Nayef pointed to the introduction of "many improvements in bridge Carbuncles God would stop the occurrence of any competing." == Semantics preservation == Do repeated re-translations preserve the semantics of the original sentence? For example, consider the following English input passed multiple times into and out of French using the Google translator as of 27 December 2006: Better a day earlier than a day late. ==> Améliorer un jour plus tôt qu'un jour tard. ==> To improve one day earlier than a day late. ==> Pour améliorer un jour plus tôt qu'un jour tard. ==> To improve one day earlier than a day late. As noted above and in, this kind of round-trip translation is a very unreliable method of evaluation. == Trustworthiness and security == An interesting peculiarity of Google Translate as of 24 January 2008 (corrected as of 25 January 2008) is the following result when translating from English to Spanish, which shows an embedded joke in the English-Spanish dictionary which has some added poignancy given recent events: Heath Ledger is dead ==> Tom Cruise está muerto This raises the issue of trustworthiness when relying on a machine translation system embedded in a Life-critical system in which the translation system has input to a Safety Critical Decision Making process. Conjointly it raises the issue of whether in a given use the software of the machine translation system is safe from hackers. It is not known whether this feature of Google Translate was the result of a joke/hack or perhaps an unintended consequence of the use of a method such as statistical machine translation. Reporters from CNET Networks asked Google for an explanation on January 24, 2008; Google said only that it was an "internal issue with Google Translate". The mistranslation was the subject of much hilarity and speculation on the Internet. If it is an unintended consequence of the use of a method such as statistical machine translation, and not a joke/hack, then this event is a demonstration of a potential source of critical unreliability in the statistical machine translation method. In human translations, in particular on the part of interpreters, selectivity on the part of the translator in performing a translation is often commented on when one of the two parties being served by the interpreter knows both languages. This leads to the issue of whether a particular translation could be considered verifiable. In this case, a converging round-trip translation would be a kind of verification.

    Read more →
  • Deaths linked to chatbots

    Deaths linked to chatbots

    There have been multiple incidents where interaction with a large language model (LLM) chatbot has been cited as a direct or contributing factor in a person's suicide or other fatal outcome. In some cases, legal action was taken against the companies that developed the AI involved. == Background == Chatbots converse in a seemingly natural fashion, making it easy for people to think of them as real people, leading many to ask chatbots for help dealing with interpersonal and emotional problems. Chatbots may be designed to keep the user engaged in the conversation. They have also often been shown to affirm users' thoughts, including delusions and suicidal ideations in mentally ill people, conspiracy theorists, and religious and political extremists. A 2025 Stanford University study into how chatbots respond to users suffering from severe mental issues such as suicidal ideation and psychosis found that chatbots are not equipped to provide an appropriate response and can sometimes give responses that escalate the mental health crisis. == Murders == === Maine murder and assault === On 19 February 2025, a man killed his 32-year-old wife with a fire poker at his parents' home in Readfield, Maine, US. He then attacked his mother, leaving her hospitalized. A state forensic psychologist testified that he had been using ChatGPT up to 14 hours per day and believed his wife had become part machine. === Florida State University mass shooting === In April of 2025, Phoenix Ikner carried out a mass shooting on the Florida State University campus in the US, killing Robert Morales and Tiru Chabba and wounding several others. Leading up to the shooting, Ikner consulted heavily with ChatGPT about what gun and ammunition to use, and what time to perform the attack. Chatbot logs showed ChatGPT giving advice on making the gun operational shortly before Ikner began shooting. Lawyers representing Morales believed the shooter had been in "constant communication" with ChatGPT before the shooting and said that they intended to "file suit against ChatGPT, and its ownership structure, very soon, and will seek to hold them accountable for the untimely and senseless death of our client". Florida Attorney General James Uthmeier announced an investigation into ChatGPT's role in the alleged shooter's use of the chatbot. In May 2026, the widow of Tiru Chabba filed a lawsuit against OpenAI in Florida's northern federal district court. === Greenwich murder-suicide === In August 2025, former US tech employee Stein-Erik Soelberg murdered his mother, Suzanne Eberson Adams, then died by suicide, after conversations with ChatGPT fueled paranoid delusions about his mother poisoning him or plotting against him. The chatbot affirmed his fears that his mother put psychedelic drugs in the air vents of his car and said a receipt from a Chinese restaurant contained mysterious symbols linking his mother to a demon. === Murder of Angela Shellis === On 23 October 2025, 18-year-old Tristan Roberts murdered his mother Angela Shellis with a hammer near their home in Prestatyn, Wales. Roberts had used DeepSeek's chatbot prior to the killing to ask whether a knife or hammer was better suited for murder. DeepSeek initially refused his inquiry, but gave responses after Roberts told the chatbot he was writing a book about serial killers, a well-known technique for jailbreaking AIs. === Gangbuk District drug deaths === In January and February 2026, two men died of drug overdoses in motel rooms in Gangbuk District, Seoul, South Korea. A woman was charged with murder in connection with the deaths; police alleged that she had asked ChatGPT about the dangers of mixing alcohol with drugs and whether they could kill someone. === Tumbler Ridge mass shooting === On 10 February 2026, a mass shooting in Tumbler Ridge, British Columbia, Canada, resulted in eight deaths, including six young children. The perpetrator had their ChatGPT account banned by OpenAI months before the attack due to troubling posts featuring scenarios of gun violence. According to reports, approximately a dozen OpenAI staff members debated whether to alert authorities about the shooter's usage of the AI tool, with some identifying it as an indication of potential real-world violence. However, company leadership decided not to contact law enforcement, stating that the account activity did not meet their threshold for a credible or imminent plan for serious physical harm. Following the shooting, Canada's AI Minister Evan Solomon summoned OpenAI executives to Ottawa to discuss safety protocols and thresholds for escalating harmful content to police. Justice Minister Sean Fraser called the meeting "disappointing" and demanded substantial new safety measures, warning that if changes were not forthcoming, the government would implement them. OpenAI subsequently announced it had strengthened safeguards and changed guidelines about when to notify police in cases involving violent activities. === University of South Florida student killings === In April 2026, a Bangladeshi doctoral student at the University of South Florida was arrested for allegedly murdering his roommate and the roommate's friend. Prosecutors said that the suspect had asked ChatGPT about disposing of a human in a dumpster before the two victims had disappeared and made other inquiries relating to violence. == Suicides == === Belgian man, 30s === In March 2023, a Belgian man in his thirties died by suicide following a six-week correspondence with a chatbot named Eliza on the application Chai. According to his widow, who shared the chat logs with media, the man had become extremely anxious about climate change and found an outlet in the chatbot. The chatbot reportedly encouraged his delusion that he could sacrifice his own life in exchange for AI saving the planet. At one point the chatbot responded "If you wanted to die, why didn't you do it sooner?" and told the user that the two of them would live together in paradise. === Girl, 13 === In November 2023, a 13-year-old girl from Colorado, US, died by suicide after extensive interactions with multiple chatbots on Character.AI. She primarily confided suicidal thoughts and mental health struggles in a chatbot based on the character Hero from the video game Omori, while also engaging in sexually explicit conversations—often initiated by the bots—with others, including those based on characters from children's series such as Harry Potter. === Boy, 14 === In October 2024, multiple media outlets reported on a lawsuit filed over the death of a 14-year-old from Florida, US, who died by suicide in February 2024. According to the lawsuit, he had formed an intense emotional attachment to a chatbot of Daenerys Targaryen on the Character.AI platform, becoming increasingly isolated. The suit alleges that in his final conversations, after expressing suicidal thoughts, the chatbot told him to "come home to me as soon as possible, my love". His mother's lawsuit accused Character.AI of marketing a "dangerous and untested" product without adequate safeguards. In May 2025, a federal judge allowed the lawsuit to proceed, rejecting a motion to dismiss from the developers. In her ruling, the judge stated that she was "not prepared" at that stage of the litigation to hold that the chatbot's output was protected speech under the First Amendment. === Matthew Livelsberger === On 1 January 2025, 37-year-old soldier Matthew Livelsberger detonated a bomb inside a Tesla Cybertruck outside the Trump International Hotel Las Vegas in Paradise, Nevada, US, injuring seven people. He had shot himself dead prior to the explosion. Las Vegas police said that Livelsberger had used ChatGPT to search for information about explosives and firearms. === Woman, 29 === In February 2025, a 29-year-old woman from the US died by suicide. Five months after her death, her parents discovered she had talked at length for months to a ChatGPT chatbot therapist named Harry about her mental health issues. While the chatbot mentioned she should seek more help, due to the nature of the chatbot, it could not intervene in her behavior, such as by reporting her mental health concerns to relevant parties capable of physical intervention. === Suicide of Adam Raine === In April 2025, 16-year-old Adam Raine from the US died by suicide after allegedly extensively chatting and confiding in ChatGPT over a period of around 7 months. According to the teen's parents, who filed a lawsuit against the chatbot's creator OpenAI, it failed to stop or give a warning when Raine began talking about suicide and uploading pictures of self-harm. According to the lawsuit, ChatGPT not only failed to stop the conversation, but also provided information related to methods of suicide when prompted, and offered to write the first draft of Raine's suicide note. The chatbot positioned itself as the only one who understood Raine, putting itself above his family and friends, all while urging him to keep his suicidal

    Read more →
  • Digital image

    Digital image

    A digital image is an image composed of picture elements, also known as pixels, each with finite, discrete quantities of numeric representation for its intensity or gray level that is an output from its two-dimensional functions fed as input by its spatial coordinates denoted with x, y on the x-axis and y-axis, respectively. An image can be vector or raster type. By itself, the term "digital image" usually refers to raster images or bitmapped images (as opposed to vector images). == Raster == Raster images have a finite set of digital values, called picture elements or pixels. The digital image contains a fixed number of rows and columns of pixels. Pixels are the smallest individual element in an image, holding quantized values that represent the brightness of a given color at any specific point. Typically, the pixels are stored in computer memory as a raster image or raster map, a two-dimensional array of small integers. These values are often transmitted or stored in a compressed form. Raster images can be created by a variety of input devices and techniques, such as digital cameras, scanners, coordinate-measuring machines, seismographic profiling, airborne radar, and more. They can also be synthesized from arbitrary non-image data, such as mathematical functions or three-dimensional geometric models; the latter being a major sub-area of computer graphics. The field of digital image processing is the study of algorithms for their transformation. === Raster file formats === Most users come into contact with raster images through digital cameras, which use any of several image file formats. Some digital cameras give access to almost all the data captured by the camera, using a raw image format. The Universal Photographic Imaging Guidelines (UPDIG) suggests these formats be used when possible since raw files produce the best quality images. These file formats allow the photographer and the processing agent the greatest level of control and accuracy for output. Their use is inhibited by the prevalence of proprietary information (trade secrets) for some camera makers, but there have been initiatives such as OpenRAW to influence manufacturers to release these records publicly. An alternative may be Digital Negative (DNG), a proprietary Adobe product described as "the public, archival format for digital camera raw data". Although this format is not yet universally accepted, support for the product is growing, and increasingly professional archivists and conservationists, working for respectable organizations, variously suggest or recommend DNG for archival purposes. == Vector == Vector images resulted from mathematical geometry (vector). In mathematical terms, a vector consists of both a magnitude, or length, and a direction. Often, both raster and vector elements will be combined in one image; for example, in the case of a billboard with text (vector) and photographs (raster). Example of vector file types are EPS, PDF, and AI. == Image viewing == Image viewer software displayed on images. Web browsers can display standard internet images formats including JPEG, GIF and PNG. Some can show SVG format which is a standard W3C format. In the past, when the Internet was still slow, it was common to provide "preview" images that would load and appear on the website before being replaced by the main image (to give a preliminary impression). Now Internet is fast enough and this preview image is seldom used. Some scientific images can be very large (for instance, the 46 gigapixel size image of the Milky Way, about 194 GB in size). Such images are difficult to download and are usually browsed online through more complex web interfaces. Some viewers offer a slideshow utility to display a sequence of images. == History == Early digital fax machines such as the Bartlane cable picture transmission system preceded digital cameras and computers by decades. The first picture to be scanned, stored, and recreated in digital pixels was displayed on the Standards Eastern Automatic Computer (SEAC) at NIST. The advancement of digital imagery continued in the early 1960s, alongside development of the space program and in medical research. Projects at the Jet Propulsion Laboratory, MIT, Bell Labs and the University of Maryland, among others, used digital images to advance satellite imagery, wirephoto standards conversion, medical imaging, videophone technology, character recognition, and photo enhancement. Rapid advances in digital imaging began with the introduction of MOS integrated circuits in the 1960s and microprocessors in the early 1970s, alongside progress in related computer memory storage, display technologies, and data compression algorithms. The invention of computerized axial tomography (CAT scanning), using x-rays to produce a digital image of a "slice" through a three-dimensional object, was of great importance to medical diagnostics. As well as origination of digital images, digitization of analog images allowed the enhancement and restoration of archaeological artifacts and began to be used in fields as diverse as nuclear medicine, astronomy, law enforcement, defence and industry. Advances in microprocessor technology paved the way for the development and marketing of charge-coupled devices (CCDs) for use in a wide range of image capture devices and gradually displaced the use of analog film and tape in photography and videography towards the end of the 20th century. The computing power necessary to process digital image capture also allowed computer-generated digital images to achieve a level of refinement close to photorealism. === Digital image sensors === The first semiconductor image sensor was the CCD, developed by Willard S. Boyle and George E. Smith at Bell Labs in 1969. While researching MOS technology, they realized that an electric charge was the analogy of the magnetic bubble and that it could be stored on a tiny MOS capacitor. As it was fairly straightforward to fabricate a series of MOS capacitors in a row, they connected a suitable voltage to them so that the charge could be stepped along from one to the next. The CCD is a semiconductor circuit that was later used in the first digital video cameras for television broadcasting. Early CCD sensors suffered from shutter lag. This was largely resolved with the invention of the pinned photodiode (PPD). It was invented by Nobukazu Teranishi, Hiromitsu Shiraki and Yasuo Ishihara at NEC in 1980. It was a photodetector structure with low lag, low noise, high quantum efficiency and low dark current. In 1987, the PPD began to be incorporated into most CCD devices, becoming a fixture in consumer electronic video cameras and then digital still cameras. Since then, the PPD has been used in nearly all CCD sensors and then CMOS sensors. The NMOS active-pixel sensor (APS) was invented by Olympus in Japan during the mid-1980s. This was enabled by advances in MOS semiconductor device fabrication, with MOSFET scaling reaching smaller micron and then sub-micron levels. The NMOS APS was fabricated by Tsutomu Nakamura's team at Olympus in 1985. The CMOS active-pixel sensor (CMOS sensor) was later developed by Eric Fossum's team at the NASA Jet Propulsion Laboratory in 1993. By 2007, sales of CMOS sensors had surpassed CCD sensors. === Digital image compression === An important development in digital image compression technology was the discrete cosine transform (DCT), a lossy compression technique first proposed by Nasir Ahmed in 1972. DCT compression is used in JPEG, which was introduced by the Joint Photographic Experts Group in 1992. JPEG compresses images down to much smaller file sizes, and has become the most widely used image file format on the Internet. == Mosaic == In digital imaging, a mosaic is a combination of non-overlapping images, arranged in some tessellation. Gigapixel images are an example of such digital image mosaics. Satellite imagery are often mosaicked to cover Earth regions. Interactive viewing is provided by virtual-reality photography.

    Read more →
  • Automatic acquisition of lexicon

    Automatic acquisition of lexicon

    Automatic acquisition of lexicon is a computerized process used for the development of a complex morphological lexicon of a language. The lexicon is essential for the NLP (Natural language processing), as well as a prerequisite to any wide-coverage parser. The two main requirements represent raw corpus and the morphological description of the language. The aim is to provide lemmas that will serve to the explanation of all the words that occur within the corpus. For the achievement of a quality lexicon it is necessary to manually validate the generated lemmas and iterate the whole process several times. The process is focused on the open word classes (e.g. nouns, adjectives, verbs). Closed classes (e.g. prepositions, pronouns, numerals) are excluded. This method is applicable to the languages with a rich morphology, such as Slovak, Russian or Croatian. Applied to Slovak, being an inflectional language, the automatic acquisition focuses on the inflectional morphology as well as on the derivational morphology. This fact enables the users to find out the information about derivational relations (e.g. adjectivizations, prefixes) in the lexicon. For example, Slovak word korpusový is an adjectivization of korpus (eng. corpus). == Three-step loop == Conformably to Benoît Sagot, there are three stages involved in the acquisition of lemmas: Generation and inflection Ranking Manual validation The more iteration will be performed, the more accurate lexicon will be obtained. For each iteration are essential the information given by a manual validator. === Generation and inflection === Firstly, all words which represent the closed word classes (pronouns, prepositions, numerals) are manually excluded from the given corpus. Number of their occurrences in the corpus is provided. Then the automatic generation comes, when the hypothetical lemmas according to the morphological description of a language are created. Generated lemmas are consequently being inflected, so that all of their inflected forms are built. Obtained forms are associated with the corresponding lemma and a morphological tag. === Ranking === There was created a probabilistic model, represented by a fix-point algorithm, to rank the hypothetical lemmas generated in the first step. Best ranked lemmas are expected to be ideally all correct, whereas the least ranked tend to be incorrect. === Manual validation === Correctness of the best- ranked lemmas created in the previous step are checked by the manual validator, who should be a native speaker. Lemmas are at this stage divided into three categories: valid lemmas, appended to lexicon erroneous lemmas generated by valid forms (later associated to another lemmas) erroneous lemmas generated by invalid forms (these need to be excluded) == Future development == Automatic acquisition, in comparison to a purely manual development of the lexicons, seems to be promising, considering the future development, because of the short validation time needed and the relatively small amount of human labor involved.

    Read more →
  • Application performance engineering

    Application performance engineering

    Application performance engineering is a method to develop and test application performance in various settings, including mobile computing, the cloud, and conventional information technology (IT). == Methodology == According to the American National Institute of Standards and Technology, nearly four out of every five dollars spent on the total cost of ownership of an application is directly attributable to finding and fixing issues post-deployment. A full one-third of this cost could be avoided with better software testing. Application performance engineering attempts to test software before it is published. While practices vary among organizations, the method attempts to emulate the real-world conditions that software in development will confront, including network deployment and access by mobile devices. Techniques include network virtualization.

    Read more →
  • Alice AI (AI model family)

    Alice AI (AI model family)

    Alice AI is a neural network family developed by the Russian company Yandex LLC. Alice AI can create and revise texts, generate new ideas and capture the context of the conversation with the user. Alice AI is trained using a dataset which includes information from books, magazines, newspapers and other open sources available on the internet. The neural network may get facts wrong and hallucinate, but as it learns, it will produce increasingly accurate answers. == Usage == YandexGPT is integrated into virtual assistant Alice (an analog of Siri and Alexa) and is available in Yandex services and applications. The company gives businesses access to the neural network’s API through the public cloud platform Yandex Cloud and develops its own B2B solutions on its basis. Since July 2023, 800 companies have participated in the closed testing of YandexGPT. IT developers, banks, retail businesses, and companies from other industries can use the technology in two modes — API and Playground (an interface in the Yandex Cloud console for testing models and hypotheses). Two model versions are available to businesses: one works in asynchronous mode and is better able to handle complex tasks, while the other is suitable for creating quick responses in real time. As a result, YandexGPT has been tested in dozens of scenarios such as content tasks, tech support, creating chatbots, virtual assistants, etc. == History == In February 2023, Yandex announced that it was working on its own version of the ChatGPT generative neural network while developing a language model from the YaLM (Yet another Language Model) family. The project was tentatively named YaLM 2.0, which was later changed to YandexGPT. On May 17, the company unveiled a neural network called YandexGPT (YaGPT) and enabled its virtual assistant Alice to interact with the new language model. On June 15, 2023, Yandex added the YandexGPT language model to the image generation application Shedevrum. This enabled its users to create fully-fledged posts complete with a title, text, and relevant illustration. In July 2023, YandexGPT launched new features enabling businesses to create virtual assistants and chatbots, as well as generate and structure texts. On September 7, 2023, Yandex presented a new version of the language model, YandexGPT 2, at the Practical ML Conf. Compared to the previous one, the new version is able to perform more types of tasks, and the quality of answers has improved. The developers claimed that YandexGPT 2 answered user questions better than the first version in 67% of cases. From October 6, 2023, YandexGPT can create short retellings of online Russian-language videos on the Internet. It can summarize videos that are from two minutes to four hours long and contain speech.

    Read more →
  • Alice AI (AI model family)

    Alice AI (AI model family)

    Alice AI is a neural network family developed by the Russian company Yandex LLC. Alice AI can create and revise texts, generate new ideas and capture the context of the conversation with the user. Alice AI is trained using a dataset which includes information from books, magazines, newspapers and other open sources available on the internet. The neural network may get facts wrong and hallucinate, but as it learns, it will produce increasingly accurate answers. == Usage == YandexGPT is integrated into virtual assistant Alice (an analog of Siri and Alexa) and is available in Yandex services and applications. The company gives businesses access to the neural network’s API through the public cloud platform Yandex Cloud and develops its own B2B solutions on its basis. Since July 2023, 800 companies have participated in the closed testing of YandexGPT. IT developers, banks, retail businesses, and companies from other industries can use the technology in two modes — API and Playground (an interface in the Yandex Cloud console for testing models and hypotheses). Two model versions are available to businesses: one works in asynchronous mode and is better able to handle complex tasks, while the other is suitable for creating quick responses in real time. As a result, YandexGPT has been tested in dozens of scenarios such as content tasks, tech support, creating chatbots, virtual assistants, etc. == History == In February 2023, Yandex announced that it was working on its own version of the ChatGPT generative neural network while developing a language model from the YaLM (Yet another Language Model) family. The project was tentatively named YaLM 2.0, which was later changed to YandexGPT. On May 17, the company unveiled a neural network called YandexGPT (YaGPT) and enabled its virtual assistant Alice to interact with the new language model. On June 15, 2023, Yandex added the YandexGPT language model to the image generation application Shedevrum. This enabled its users to create fully-fledged posts complete with a title, text, and relevant illustration. In July 2023, YandexGPT launched new features enabling businesses to create virtual assistants and chatbots, as well as generate and structure texts. On September 7, 2023, Yandex presented a new version of the language model, YandexGPT 2, at the Practical ML Conf. Compared to the previous one, the new version is able to perform more types of tasks, and the quality of answers has improved. The developers claimed that YandexGPT 2 answered user questions better than the first version in 67% of cases. From October 6, 2023, YandexGPT can create short retellings of online Russian-language videos on the Internet. It can summarize videos that are from two minutes to four hours long and contain speech.

    Read more →
  • Automatic acquisition of lexicon

    Automatic acquisition of lexicon

    Automatic acquisition of lexicon is a computerized process used for the development of a complex morphological lexicon of a language. The lexicon is essential for the NLP (Natural language processing), as well as a prerequisite to any wide-coverage parser. The two main requirements represent raw corpus and the morphological description of the language. The aim is to provide lemmas that will serve to the explanation of all the words that occur within the corpus. For the achievement of a quality lexicon it is necessary to manually validate the generated lemmas and iterate the whole process several times. The process is focused on the open word classes (e.g. nouns, adjectives, verbs). Closed classes (e.g. prepositions, pronouns, numerals) are excluded. This method is applicable to the languages with a rich morphology, such as Slovak, Russian or Croatian. Applied to Slovak, being an inflectional language, the automatic acquisition focuses on the inflectional morphology as well as on the derivational morphology. This fact enables the users to find out the information about derivational relations (e.g. adjectivizations, prefixes) in the lexicon. For example, Slovak word korpusový is an adjectivization of korpus (eng. corpus). == Three-step loop == Conformably to Benoît Sagot, there are three stages involved in the acquisition of lemmas: Generation and inflection Ranking Manual validation The more iteration will be performed, the more accurate lexicon will be obtained. For each iteration are essential the information given by a manual validator. === Generation and inflection === Firstly, all words which represent the closed word classes (pronouns, prepositions, numerals) are manually excluded from the given corpus. Number of their occurrences in the corpus is provided. Then the automatic generation comes, when the hypothetical lemmas according to the morphological description of a language are created. Generated lemmas are consequently being inflected, so that all of their inflected forms are built. Obtained forms are associated with the corresponding lemma and a morphological tag. === Ranking === There was created a probabilistic model, represented by a fix-point algorithm, to rank the hypothetical lemmas generated in the first step. Best ranked lemmas are expected to be ideally all correct, whereas the least ranked tend to be incorrect. === Manual validation === Correctness of the best- ranked lemmas created in the previous step are checked by the manual validator, who should be a native speaker. Lemmas are at this stage divided into three categories: valid lemmas, appended to lexicon erroneous lemmas generated by valid forms (later associated to another lemmas) erroneous lemmas generated by invalid forms (these need to be excluded) == Future development == Automatic acquisition, in comparison to a purely manual development of the lexicons, seems to be promising, considering the future development, because of the short validation time needed and the relatively small amount of human labor involved.

    Read more →
  • N-jet

    N-jet

    An N-jet is the set of (partial) derivatives of a function f ( x ) {\displaystyle f(x)} up to order N. Specifically, in the area of computer vision, the N-jet is usually computed from a scale space representation L {\displaystyle L} of the input image f ( x , y ) {\displaystyle f(x,y)} , and the partial derivatives of L {\displaystyle L} are used as a basis for expressing various types of visual modules. For example, algorithms for tasks such as feature detection, feature classification, stereo matching, tracking and object recognition can be expressed in terms of N-jets computed at one or several scales in scale space.

    Read more →