Linguistic value

In artificial intelligence, fuzzy logic, operations research, and related fields, a linguistic value is a natural language term which is derived using quantitative or qualitative reasoning, such as with probability and statistics or fuzzy sets and systems. Variables that take linguistic values are called linguistic variables. == Examples of linguistic variables and values == For example, "age" may be a linguistic variable if its values are not numerical, e.g. very young, quite young, not young, old, not very old, etc. These values could be derived from the numeric values for age. As another example, if a shuttle heat shield is deemed, based upon knowledge from experts in the field, to have a linguistic value of a "very low" percentage of damage in re-entry, that percentage would be given a value of, say, 5%. From then on, if it were used in an equation, the variable for percentage of damage would be set to 5% whenever the damage is deemed very low.
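
The derivation of linguistic values from numeric ages can be illustrated with fuzzy membership functions. The following is a minimal sketch, not a standard definition: the breakpoints (25, 40, 50, 70) and the function names are assumptions chosen for illustration, and the hedges "very" (squaring) and "not" (complement) follow one common convention from fuzzy set theory. The final dictionary maps a linguistic value back to a representative number, using the 5% figure from the heat shield example.

```python
def falling(x, lo, hi):
    """Membership that is 1 below lo, 0 above hi, and linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def rising(x, lo, hi):
    """Mirror image of falling(): 0 below lo, 1 above hi."""
    return 1.0 - falling(x, lo, hi)


# Base terms for the linguistic variable "age" (breakpoints are assumptions).
def young(age):
    return falling(age, 25, 40)


def old(age):
    return rising(age, 50, 70)


# Linguistic hedges: one common convention, assumed here.
def very(membership):
    return membership ** 2


def not_(membership):
    return 1.0 - membership


def describe_age(age):
    """Return the degree to which each linguistic value applies to a numeric age."""
    return {
        "very young": very(young(age)),
        "young": young(age),
        "not young": not_(young(age)),
        "old": old(age),
        "not very old": not_(very(old(age))),
    }


# Reverse direction: a linguistic value replaced by a representative number.
DAMAGE_LEVEL = {"very low": 0.05}

if __name__ == "__main__":
    for age in (20, 30, 60, 80):
        print(age, describe_age(age))
```

Returning all membership degrees, rather than a single best label, reflects that several linguistic values can apply to the same number at once.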

1 Second Everyday

1 Second Everyday (1SE) is an application developed by Cesar Kuriyama. The application allows the user to record one second of video every day and then chronologically edits (mashes) the clips together into a single film. It is compatible with iOS and Android. The idea for the application grew out of Kuriyama's 1 Second Everyday – Age 30 video. The application was launched in January 2013. 1 Second Everyday played a part in the plot of Chef and also became the inspiration for the 2014 short animated clip Feast. == Background == === Kuriyama's video === In February 2011, when Cesar Kuriyama turned 30, he quit his job at an advertising firm and, having saved money, took a year off to travel. During this time, he started working on a project he called 1 Second Everyday. As part of the project, every day he recorded one second of video – something that was supposed to help him remember that day. He started the project because he was frustrated with his memory. He planned to combine the 365 one-second clips into one film to serve as a memento of his year. While working on the project, Kuriyama realized that recording one second every day had a positive effect on the decisions he made. After a year he made a 365-second clip out of his recordings. The video, called 1 Second Everyday – Age 30, went viral. According to Kuriyama, he was initially inspired to take a year off from work by a TED talk given by Stefan Sagmeister called "The Power of Time Off." Kuriyama also delivered a TED talk about 1 Second Everyday at TED 2012 in Long Beach, California. === Kickstarter campaign === After completing his own video, Kuriyama decided to develop an application that would allow users to record one second every day and compile their own videos. He developed a prototype of the application and then, in 2012, launched a Kickstarter campaign to raise funds for completing it. The campaign became one of the most backed app campaigns in the history of Kickstarter. 
It was backed by 11,281 backers who pledged a total of $56,959 on an initial goal of $20,000. Following the completion of the Kickstarter campaign, he partnered with an application design studio in Brooklyn to develop the application. 1 Second Everyday was released two weeks after the completion of its Kickstarter campaign. == Application == The application was released for iOS on 10 January 2013. An Android-compatible version of the application was developed later. Using it, the user can record the videos in the application or they can select one second portions from their libraries. 1 Second Everyday dates every snippet. The user can also set alarms to remember to record their daily video. In order to compile a video, the user selects the seconds they want and the application creates a compilation video. The user can keep multiple timelines. It also allows users to post directly on social networks. The main interface in 1 Second Everyday is a calendar, which shows the user which days have snippets and which they can still fill in. In the beginning, 1 Second Everyday restricted the recording to one second. However, the developers later released Super Seconds, which allowed users to record an additional half a second video. In 2014, 1 Second Everyday Crowds was launched, which is an area in the application featuring compilations of second clips from different users. == In the media == The Kickstarter campaign of 1 Second Everyday was featured in Entrepreneur's 3 Innovative Tech Startups on Kickstarter Right Now in 2012. The application was featured in The New York Times, The Washington Post, Gawker and other media outlets. By the end of the launch day, it was in Top 10 Free Apps on App Store. It was also selected as the App of the Week on GeekWire in 2013. Several other one-second compilation videos were also posted on the Internet after Kuriyama's video gained media attention. 
Sam Cornwell, an English photographer, documented his son Indigo's growth using a montage of one-second iPhone clips. He shot these clips every single day from the moment of birth right up to the baby's first birthday. According to Cornwell, he was inspired by Kuriyama's project. The video of Cornwell's son gained considerable media attention after it was posted on YouTube. Save the Children also made a video commercial in a similar format that showed a British girl, oblivious to the Syrian war, ending up as a refugee. 1SE was a finalist for the Fast Company Innovation by Design Award in 2015, but lost to Google Maps. In 2015, Google Android created a gallery, Leap Second 2015, with the help of Droga5 and Kuriyama. The gallery showcased how people around the world enjoyed the one extra second of their lives. Through the 1 Second Everyday app available on Google Play, people were able to submit their extra seconds, which were then vetted and added to the gallery. Viewers were able to watch other celebratory seconds from around the world as well as search for them using different hashtags.

Corpus-assisted discourse studies

Corpus-assisted discourse studies (abbr.: CADS) is related historically and methodologically to the discipline of corpus linguistics. The principal endeavor of corpus-assisted discourse studies is the investigation, and comparison of features of particular discourse types, integrating into the analysis the techniques and tools developed within corpus linguistics. These include the compilation of specialised corpora and analyses of word and word-cluster frequency lists, comparative keyword lists and, above all, concordances. A broader conceptualisation of corpus-assisted discourse studies would include any study that aims to bring together corpus linguistics and discourse analysis. Such research is often labelled as corpus-based or corpus-assisted discourse analysis, with the term CADS coined by a research group in Italy (Partington 2004) for a specific type of corpus-assisted discourse analysis (see the section 'in different countries' below). == Aims == Corpus-assisted discourse studies aim to uncover non-obvious meaning, that is, meaning which might not be readily available to naked-eye perusal. Much of what carries meaning in texts is not open to direct observation: “you cannot understand the world just by looking at it” (Stubbs [after Gellner 1959] 1996: 92). We use language “semi-automatically”, in the sense that speakers and writers make semi-conscious choices within the various complex overlapping systems of which language is composed, including those of transitivity, modality (Michael Halliday 1994), lexical sets (e.g. freedom, liberty, deliverance), modification, and so on. Authors themselves are, famously, generally unaware of all the meanings their texts convey. 
By combining the quantitative research approach, that is, statistical analysis of large amounts of the discourse in question - more precisely, large numbers of tokens of the discourse type under study contained in a corpus - with the more qualitative research approach typical of discourse analysis, that is, the close, detailed examination of particular stretches of discourse it may be possible to better understand the processes at play in the discourse type and to gain access to non-obvious meanings. Aims can differ in other types of corpus-based or corpus-assisted discourse analysis; but in general such studies combine quantitative and qualitative research and aim to shed light on discourses, registers, discourse patterns, etc., with the help of a corpus linguistic approach. Specific aims and techniques depend on the relevant project. == In different countries == In German-speaking countries: Pioneering work in corpus-based discourse analysis was conducted in Europe, in particular by Hardt-Mautner/Mautner (1995, 2000) and Stubbs (1996, 2001). CADS and other types of corpus-based discourse analysis are inspired by this important early work. In Italy: A considerable body of research has been conducted in Italy either by individual researchers or under the aegis of combined inter-university projects such as Newspool (Partington et al. 2004) and CorDis (Morley and Bayley eds, 2009). It has concentrated on political and media language, mainly because a nucleus of linguists in Italian universities work in Political Science faculties and are increasingly interested in the use of corpus techniques to conduct a particular type of sociopolitical discourse analysis, including the unearthing of noteworthy ideological metaphors and motifs in the language of political figures and institutions. Italian researchers also developed Modern diachronic corpus-assisted discourse studies (MD-CADS). 
This approach contrasts the language contained in comparable corpora from different but recent points in time in order to track changes in modern language usage but also social, cultural and political changes over modern times, as reflected - and shared among people - in language. It is this Italian body of research that makes most use of the label CADS. In the UK: Linguists in the UK tend to undertake corpus-based critical discourse analysis (CDA). CDA generally adopts a leftist political stance, focusing on the ways that social and political domination is reproduced by text and talk. This type of corpus-based research was originally associated with Lancaster University (Baker et al. 2008), but has spread more widely since. Such work typically studies the discourses around particular groups of people (e.g. Muslims, people with disabilities) or concepts/events (e.g. feminism, same-sex marriage). In Australia: Corpus-based discourse analysis is undertaken by a growing number of Australian researchers, most often on media texts. Some of this work aims to elucidate specific features of discourse types (news, social media, television series, etc.), while other work is rooted in the tradition of corpus-based critical discourse analysis. == Comparison with traditional corpus linguistics == Traditional corpus linguistics has, quite naturally, tended to privilege the quantitative approach. In the drive to produce more authentic dictionaries and grammars of a language, it has been characterised by the compilation of some very large corpora of heterogeneric discourse types in the desire to obtain an overview of the greatest quantity and variety of discourse types possible, in other words, of the chimerical but useful fiction called the “general language” (“general English”, “general Italian”, and so on). This has led to the construction of immensely valuable research tools such as the Bank of English and the British National Corpus. 
Some branches of corpus linguistics have also promoted an approach that is "corpus-driven", in which we need, grammatically speaking, a mental tabula rasa to free ourselves of the baleful prejudice exerted by traditional models and allow the data to speak entirely for itself. The aim of corpus-assisted discourse studies and related approaches is radically different. Here the aim of the exercise is to acquaint oneself as much as possible with the discourse type(s) in hand. Researchers typically engage with their corpus in a variety of ways. As well as via wordlists and concordancing, intuitions for further research can also arise from reading or watching or listening to parts of the data-set, a process which can help provide a feel for how things are done linguistically in the discourse-type being studied. Corpus-assisted discourse analysis is also typically characterised by the compilation of ad hoc specialised corpora, since very frequently there exists no previously available collection of the discourse type in question. Often, other corpora are utilized in the course of a study for purposes of comparison. These may include pre-existing corpora or may themselves need to be compiled by the researcher. In some sense, all work with corpora – just as all work with discourse - is properly comparative. Even when a single corpus is employed, it is used to test the data it contains against another body of data. This may consist of the researcher's intuitions, or the data found in reference works such as dictionaries and grammars, or it may be statements made by previous authors in the field. 
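
The wordlist, keyword comparison, and concordancing techniques mentioned above can be sketched in a few lines. This is a hedged illustration rather than the procedure of any particular CADS project: the two toy "corpora" are invented, the function names are hypothetical, and keyness is scored with the log-likelihood statistic, which is one commonly used measure among several.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, c, d):
    """Keyness of a word seen a times in c tokens versus b times in d tokens."""
    expected_a = c * (a + b) / (c + d)
    expected_b = d * (a + b) / (c + d)
    score = 0.0
    if a:
        score += a * math.log(a / expected_a)
    if b:
        score += b * math.log(b / expected_b)
    return 2 * score


def keywords(study_tokens, reference_tokens, top=5):
    """Words over-represented in the study corpus, ranked by log-likelihood."""
    study, reference = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = reference[word]
        if a / c > b / d:  # keep only positively key items
            scored.append((log_likelihood(a, b, c, d), word))
    scored.sort(reverse=True)
    return [(word, round(score, 2)) for score, word in scored[:top]]


def concordance(tokens, node, width=3):
    """Key-word-in-context lines: each hit of `node` with `width` words of co-text."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}")
    return lines


if __name__ == "__main__":
    # Invented toy data standing in for a specialised and a reference corpus.
    study = tokenize("The minister defended the reform and the reform passed")
    reference = tokenize("The cat sat on the mat and the dog slept")
    print(Counter(study).most_common(3))       # frequency list
    print(keywords(study, reference))          # keyword comparison
    print(concordance(study, "reform", 2))     # concordance lines
```

In practice the quantitative output (frequency and keyword lists) only points to candidate items; the concordance lines are then read closely, which is where the qualitative side of the analysis takes over.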
== CADS as a specific type of corpus-based discourse analysis == Researchers in Italy have developed CADS as a specific type of corpus-based discourse analysis, creating a standard set of methods: 'A basic, standard methodology in CADS may resemble the following:' Step 1: Decide upon the research question; Step 2: Choose, compile or edit an appropriate corpus; Step 3: Choose, compile or edit an appropriate reference corpus / corpora; Step 4: Make frequency lists and run a keywords comparison of the corpora; Step 5: Determine the existence of sets of key items; Step 6: Concordance interesting key items (with differing quantities of co-text); Step 7: (Possibly) refine the research question and return to Step 2. This basic procedure can of course vary according to individual research circumstances and requirements. A particular way of conceptualising research questions has also been proposed in such CADS projects: Given that P is a discourse participant (or possibly an institution) and G is a goal, often a political goal: How does P achieve G with language? What does this tell us about P? Comparative studies: how do P1 and P2 differ in their use of language? Does this tell us anything about their different principles and objectives? A second general type of CADS research question, which might be asked of interactive discourse data, has been conceptualised as follows: Given that P(x) is a particular participant or set of participants, DT is the discourse type, and R is an observed relationship between or among participants: How do {P(a), P(b)...P(n)} achieve / maintain R in DT [using language]? Another common type of research question has been conceptualised thus: Given that A is an author, Ph(x) is a phenomenon or practice or behaviour, and DT(x) is a particular discourse type. A has said P

Stochastic grammar

A stochastic grammar (statistical grammar) is a grammar framework with a probabilistic notion of grammaticality. Related frameworks and techniques include stochastic context-free grammars, statistical parsing, data-oriented parsing, hidden Markov models (or stochastic regular grammars), and estimation theory. The grammar is realized as a language model. Allowed sentences are stored in a database together with a frequency indicating how common each sentence is. Statistical natural language processing uses stochastic, probabilistic and statistical methods, especially to resolve difficulties that arise because longer sentences are highly ambiguous when processed with realistic grammars, yielding thousands or millions of possible analyses. Methods for disambiguation often involve the use of corpora and Markov models. "A probabilistic model consists of a non-probabilistic model plus some numerical quantities; it is not true that probabilistic models are inherently simpler or less structural than non-probabilistic models." == Examples == A probabilistic method for rhyme detection was implemented by Hirjee & Brown in a 2013 study to find internal and imperfect rhyme pairs in rap lyrics. The concept is adapted from a sequence alignment technique using BLOSUM (BLOcks SUbstitution Matrix). They were able to detect rhymes undetectable by non-probabilistic models.
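
To make the idea of probabilistic disambiguation concrete, the sketch below scores the competing analyses of an ambiguous sentence with a toy stochastic context-free grammar and keeps the most probable one. Everything in it is an assumption for illustration: the grammar, its rule probabilities, and the example sentence are invented, and the search is a plain CKY-style dynamic programme over a grammar in Chomsky normal form.

```python
from collections import defaultdict

# Toy grammar (invented). Probabilities of rules sharing a left-hand side sum to 1.
BINARY_RULES = [
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 0.7),
    ("VP", "VP", "PP", 0.3),
    ("NP", "NP", "PP", 0.2),
    ("PP", "P", "NP", 1.0),
]
LEXICAL_RULES = [
    ("NP", "she", 0.4),
    ("NP", "stars", 0.2),
    ("NP", "telescopes", 0.2),
    ("V", "saw", 1.0),
    ("P", "with", 1.0),
]


def best_parse(words):
    """Return (probability, tree) of the most probable S spanning all words, or None."""
    n = len(words)
    # chart[(i, j)][symbol] holds the best (probability, tree) for words[i:j].
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        for symbol, terminal, prob in LEXICAL_RULES:
            if terminal == word:
                chart[(i, i + 1)][symbol] = (prob, (symbol, word))
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right, rule_prob in BINARY_RULES:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        left_prob, left_tree = chart[(i, k)][left]
                        right_prob, right_tree = chart[(k, j)][right]
                        prob = rule_prob * left_prob * right_prob
                        current = chart[(i, j)].get(parent, (0.0, None))[0]
                        if prob > current:
                            chart[(i, j)][parent] = (prob, (parent, left_tree, right_tree))
    return chart[(0, n)].get("S")


if __name__ == "__main__":
    probability, tree = best_parse("she saw stars with telescopes".split())
    print(probability)
    print(tree)
```

With these invented probabilities, the analysis that attaches "with telescopes" to the verb phrase scores higher than the one that attaches it to "stars"; different rule probabilities, for instance ones estimated from a corpus, could reverse that preference.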

Feature engineering

Feature engineering is a preprocessing step in supervised machine learning and statistical modeling which transforms raw data into a more effective set of inputs. Each input comprises several attributes, known as features. By providing models with relevant information, feature engineering can significantly enhance their predictive accuracy and decision-making capability. Beyond machine learning, the principles of feature engineering are applied in various scientific fields, including physics. For example, physicists construct dimensionless numbers such as the Reynolds number in fluid dynamics, the Nusselt number in heat transfer, and the Archimedes number in sedimentation. They also develop first approximations of solutions, such as analytical solutions for the strength of materials in mechanics. == Clustering == One of the applications of feature engineering has been clustering of feature-objects or sample-objects in a dataset. In particular, feature engineering based on matrix decomposition has been extensively used for data clustering under non-negativity constraints on the feature coefficients. These methods include Non-Negative Matrix Factorization (NMF), Non-Negative Matrix-Tri Factorization (NMTF), Non-Negative Tensor Decomposition/Factorization (NTF/NTD), etc. The non-negativity constraints on coefficients of the feature vectors mined by the above-stated algorithms yield a part-based representation, and different factor matrices exhibit natural clustering properties. Several extensions of the above-stated feature engineering methods have been reported in the literature, including orthogonality-constrained factorization for hard clustering, and manifold learning to overcome inherent issues with these algorithms. Other classes of feature engineering algorithms leverage a common hidden structure across multiple inter-related datasets to obtain a consensus (common) clustering scheme. 
An example is Multi-view Classification based on Consensus Matrix Decomposition (MCMD), which mines a common clustering scheme across multiple datasets. MCMD is designed to output two types of class labels (scale-variant and scale-invariant clustering). It is computationally robust to missing information, can obtain shape- and scale-based outliers, and can handle high-dimensional data effectively. Coupled matrix and tensor decompositions are popular in multi-view feature engineering. == Predictive modelling == Feature engineering in machine learning and statistical modeling involves selecting, creating, transforming, and extracting data features. Key components include creating features from existing data; transforming and imputing missing or invalid features; reducing data dimensionality through methods like Principal Components Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA); and selecting the most relevant features for model training based on importance scores and correlation matrices. Features vary in significance. Even relatively insignificant features may contribute to a model. Feature selection can reduce the number of features to prevent a model from becoming too specific to the training data set (overfitting). Feature explosion occurs when the number of identified features is too large for effective model estimation or optimization. Common causes include feature templates (implementing feature templates instead of coding new features) and feature combinations (combinations that cannot be represented by a linear system). Feature explosion can be limited via techniques such as regularization, kernel methods, and feature selection. == Automation == Automation of feature engineering is a research topic that dates back to the 1990s. Machine learning software that incorporates automated feature engineering has been commercially available since 2016. 
Related academic literature can be roughly separated into two types: Multi-relational Decision Tree Learning (MRDTL) uses a supervised algorithm that is similar to a decision tree. Deep Feature Synthesis uses simpler methods. === Multi-relational Decision Tree Learning (MRDTL) === Multi-relational Decision Tree Learning (MRDTL) extends traditional decision tree methods to relational databases, handling complex data relationships across tables. It uses selection graphs as decision nodes, which are refined systematically until a specific termination criterion is reached. Most MRDTL studies base implementations on relational databases, which results in many redundant operations. These redundancies can be reduced by using techniques such as tuple id propagation. === Open-source implementations === There are a number of open-source libraries and tools that automate feature engineering on relational data and time series: featuretools is a Python library for transforming time series and relational data into feature matrices for machine learning. MCMD is an open-source feature engineering algorithm for joint clustering of multiple datasets. OneBM, or One-Button Machine, combines feature transformations on relational data with feature selection techniques. OneBM helps data scientists reduce data exploration time, allowing them to try many ideas by trial and error in a short time. On the other hand, it enables non-experts, who are not familiar with data science, to quickly extract value from their data with little effort, time, and cost. getML community is an open source tool for automated feature engineering on time series and relational data. It is implemented in C/C++ with a Python interface. It has been shown to be at least 60 times faster than tsflex, tsfresh, tsfel, featuretools or kats. tsfresh is a Python library for feature extraction on time series data. It evaluates the quality of the features using hypothesis testing. 
tsflex is an open source Python library for extracting features from time series data. Despite being 100% written in Python, it has been shown to be faster and more memory efficient than tsfresh, seglearn or tsfel. seglearn is an extension for multivariate, sequential time series data to the scikit-learn Python library. tsfel is a Python package for feature extraction on time series data. kats is a Python toolkit for analyzing time series data. === Deep feature synthesis === The deep feature synthesis (DFS) algorithm beat 615 of 906 human teams in a competition. == Feature stores == The feature store is where the features are stored and organized for the explicit purpose of being used to either train models (by data scientists) or make predictions (by applications that have a trained model). It is a central location where you can either create or update groups of features created from multiple different data sources, or create and update new datasets from those feature groups for training models or for use in applications that do not want to compute the features but just retrieve them when it needs them to make predictions. A feature store includes the ability to store code used to generate features, apply the code to raw data, and serve those features to models upon request. Useful capabilities include feature versioning and policies governing the circumstances under which features can be used. Feature stores can be standalone software tools or built into machine learning platforms. == Alternatives == Feature engineering can be a time-consuming and error-prone process, as it requires domain expertise and often involves trial and error. Deep learning algorithms may be used to process a large raw dataset without having to resort to feature engineering. However, deep learning algorithms still require careful preprocessing and cleaning of the input data. 
In addition, choosing the right architecture, hyperparameters, and optimization algorithm for a deep neural network can be a challenging and iterative process.
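
Three of the components listed under predictive modelling (imputing missing values, creating a feature from existing data, and selecting features by their correlation with the target) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the column names, the toy housing data, the helper names, and the 0.5 correlation threshold are all invented, and real projects would normally rely on a dedicated library rather than hand-written helpers.

```python
from statistics import mean


def impute_mean(column):
    """Replace missing entries (None) with the mean of the known values."""
    known = [value for value in column if value is not None]
    fill = mean(known)
    return [fill if value is None else value for value in column]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return covariance / (var_x * var_y) ** 0.5


def select_features(features, target, threshold=0.5):
    """Keep features whose absolute correlation with the target meets the threshold."""
    scores = {name: abs(pearson(column, target)) for name, column in features.items()}
    return {name: score for name, score in scores.items() if score >= threshold}


# Invented toy data: one missing area value and one uninformative constant column.
raw = {
    "area_m2": [50.0, 80.0, None, 120.0],
    "rooms": [2, 3, 3, 5],
    "constant": [1, 1, 1, 1],
}
price = [100.0, 160.0, 170.0, 250.0]

features = {
    "area_m2": impute_mean(raw["area_m2"]),  # imputation
    "rooms": raw["rooms"],
    "constant": raw["constant"],
}
# Feature creation: a ratio derived from two existing columns.
features["area_per_room"] = [
    area / rooms for area, rooms in zip(features["area_m2"], features["rooms"])
]

selected = select_features(features, price)  # feature selection

if __name__ == "__main__":
    print(selected)
```

A simple correlation filter like this only captures linear, one-feature-at-a-time relationships; it is shown here because it is easy to follow, not because it is the preferred selection method.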

Frederick Jelinek

Frederick Jelinek (18 November 1932 – 14 September 2010) was a Czech-American researcher in information theory, automatic speech recognition, and natural language processing. He is well known for his oft-quoted statement, "Every time I fire a linguist, the performance of the speech recognizer goes up". Jelinek was born in Czechoslovakia before World War II and emigrated with his family to the United States in the early years of the communist regime. He studied engineering at the Massachusetts Institute of Technology and taught for 10 years at Cornell University before accepting a job at IBM Research. In 1961, he married Czech screenwriter Milena Jelinek. At IBM, his team advanced approaches to computer speech recognition and machine translation. After IBM, he went to head the Center for Language and Speech Processing at Johns Hopkins University for 17 years, where he was still working on the day he died. == Personal life == Jelinek was born on November 18, 1932, as Bedřich Jelínek in Kladno to Vilém and Trude Jelínek. His father was Jewish; his mother was born in Switzerland to Czech Catholic parents and had converted to Judaism. Jelínek senior, a dentist, had planned early to escape Nazi occupation and flee to England; he arranged for a passport, visa, and the shipping of his dentistry materials. The couple planned to send their son to an English private school. However, Vilém decided to stay at the last minute and was eventually sent to the Theresienstadt concentration camp, where he died in 1945. The family was forced to move to Prague in 1941, but Frederick, his sister and mother—thanks to the latter's background—escaped the concentration camps. After the war, Jelinek entered in the gymnasium, despite having missed several years of schooling because education of Jewish children had been forbidden since 1942. 
His mother, anxious that her son should get a good education, made great efforts for their emigration, especially when it became clear he would not be allowed to even attempt the graduation examination. His mother hoped her son would become a physician, but Jelinek dreamed of being a lawyer. He studied engineering in evening classes at the City College of New York and received stipends from the National Committee for a Free Europe that allowed him to study at the Massachusetts Institute of Technology. About his choice of specialty, he said: "Fortunately, to electrical engineering there belonged a discipline whose aim was not the construction of physical systems: the theory of information". He obtained his Ph.D. in 1962, with Robert Fano as his adviser. In 1957, Jelinek paid an unexpected visit to Prague. He had been in Vienna and applied for a visa, hoping to see his former acquaintances again. He met with his old friend Miloš Forman, who introduced him to film student Milena Tobolová, whose screenplay had been the basis for the movie Easy Life (Snadný život). His flight back to the U.S. had a stopover in Munich, during which he called her to propose. Tobolová was considered a dissident and the authorities were not happy with her film. Jelinek asked for help from Jerome Wiesner and Cyrus Eaton, the latter of whom lobbied Nikita Khrushchev. Following the inauguration of John F. Kennedy, a group of Czech dissidents was allowed to emigrate in January 1961. Thanks to the lobbying, the future Milena Jelinek was one of them. After completing his graduate studies, Jelinek, who had developed an interest in linguistics, had plans to work with Charles F. Hockett at Cornell University. However, these fell through, and during the next ten years he continued to study information theory. Having previously worked at IBM during a sabbatical, he began full-time work there in 1972, at first on leave from Cornell, but permanently from 1974. He remained there for over twenty years. 
Although at first he had been offered a regular research job, upon his arrival he learned that Josef Raviv had recently been promoted to head of the newly opened IBM Haifa Research Laboratory, and became head of the Continuous Speech Recognition group at the Thomas J. Watson Research Center. Despite his team's successes in this area, Jelinek's work remained little known in his home country because Czech scientists were not allowed to participate in key conferences. After the 1989 fall of communism, Jelinek helped establish scientific relationships, regularly visiting to lecture and helping to persuade IBM to establish a computing centre at Charles University. In 1993, he retired from IBM and went to Johns Hopkins University's Center for Language and Speech Processing, where he was director and Julian Sinclair Smith Professor of Electrical and Computer Engineering. He was still working there at the time of his death; Jelinek died of a heart attack at the close of an otherwise normal workday in mid-September 2010. He was survived by his wife, daughter and son, sister, stepsister, and three grandchildren, including Sophie Gold Jelinek. == Research and legacy == Information theory was a fashionable scientific approach in the mid '50s. However, pioneer Claude Shannon wrote in 1956 that this trendiness was dangerous. He said, "Our fellow scientists in many different fields, attracted by the fanfare and by the new avenues opened to scientific analysis, are using these ideas in their own problems ... It will be all too easy for our somewhat artificial prosperity to collapse overnight when it is realized that the use of a few exciting words like information, entropy, redundancy, do not solve all our problems." During the next decade, a combination of factors shut down the application of information theory to natural language processing (NLP) problems—in particular machine translation. 
One factor was the 1957 publication of Noam Chomsky's Syntactic Structures, which stated, "probabilistic models give no insight into the basic problems of syntactic structure". This accorded well with the philosophy of the artificial intelligence research of the time, which promoted rule-based approaches. The other factor was the 1966 ALPAC report, which recommended that the government should stop funding research into machine translation. ALPAC chairman John Pierce later said that the field was filled with "mad inventors or untrustworthy engineers". He said that the underlying linguistic problems must be solved before attempts at NLP could be reasonably made. These elements essentially halted research in the field. Jelinek had begun to develop an interest in linguistics after the immigration of his wife, who initially enrolled in the MIT linguistics program with the help of Roman Jakobson. Jelinek often accompanied her to Chomsky's lectures, and even discussed the possibility of changing orientation with his adviser. Fano was "really upset", and after the failure of his project with Hockett at Cornell, he did not return to this field of research until starting work at IBM. The scope of research at IBM was considerably different from that of most other teams. According to Mark Liberman, "While [Jelinek] was leading IBM's effort to solve the general dictation problem during the decade or so following 1972, most other U.S. companies and academic researchers were working on very limited problems ... or were staying out of the field entirely". Jelinek regarded speech recognition as an information theory problem—a noisy channel, in this case the acoustic signal—which some observers considered a daring approach. The concept of perplexity was introduced in their first model, New Raleigh Grammar, which was published in 1976 as the paper "Continuous Speech Recognition by Statistical Methods" in the journal Proceedings of the IEEE. 
According to Young, the basic noisy channel approach "reduced the speech recognition problem to one of producing two statistical models". Whereas New Raleigh Grammar was a hidden Markov model, their next model, called Tangora, was broader and involved n-grams, specifically trigrams. Even though "it was obvious to everyone that this model was hopelessly impoverished", it was not improved upon until Jelinek presented another paper in 1999. The same trigram approach was applied to phones in single words. Although the identification of parts of speech turned out not to be very useful for speech recognition, tagging methods developed during these projects are now used in various NLP applications. The incremental research techniques developed at IBM eventually became dominant in the field after DARPA, in the mid-80s, returned to NLP research and imposed that methodology to participating teams, shared common goals, data, and precise evaluation metrics. The Continuous Speech Recognition Group's research, which required large amounts of data to train the algorithms, eventually led to the creation of the Linguistic Data Consortium. In the 1980s, although the broader problem of speech recognition remained unsolved, they sought to apply the methods developed to other problems; machine translat