Hindsight optimisation (HOP) is a computer science technique used in artificial intelligence for analysing actions that have stochastic results. HOP is used in combination with a deterministic planner. By creating sample results for each of the possible actions from the given state (i.e. determinising the actions), and using the deterministic planner to analyse those sample results, HOP produces an estimate of the value of each action.
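The sampling-and-planning loop described above can be illustrated with a minimal Python sketch. The toy domain is an assumption made purely for illustration: a three-step problem with a hypothetical "safe" action (reward 1) and a hypothetical "risky" action (reward 4 or -1 on a fair coin flip). The function names `sample_future`, `plan_value` and `hindsight_values` are likewise invented here. The "deterministic planner" is trivial in this domain because, once the coin flips are fixed, the best action at each later step can be read off directly.

```python
import random
from statistics import mean

ACTIONS = ("safe", "risky")   # hypothetical toy actions
HORIZON = 3                   # number of decision steps (assumed)


def reward(action, coin):
    """Reward of an action once the coin flip for that step is fixed."""
    if action == "safe":
        return 1.0
    return 4.0 if coin else -1.0


def sample_future(rng):
    """Determinise the problem by fixing every coin flip in advance."""
    return [rng.random() < 0.5 for _ in range(HORIZON)]


def plan_value(future, first_action):
    """Deterministic planner: take first_action now, then act optimally
    on the remaining steps with full knowledge of the sampled outcomes."""
    total = reward(first_action, future[0])
    for coin in future[1:]:
        total += max(reward(a, coin) for a in ACTIONS)
    return total


def hindsight_values(num_samples=2000, seed=0):
    """Average the planner's value for each first action over the samples."""
    rng = random.Random(seed)
    futures = [sample_future(rng) for _ in range(num_samples)]
    return {a: mean(plan_value(f, a) for f in futures) for a in ACTIONS}


values = hindsight_values()
best_action = max(values, key=values.get)
```

Because the planner sees each sampled future in full, the averaged values tend to be optimistic; in this sketch they are used only to rank the candidate first actions against each other.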
Spreading activation
Spreading activation is a method for searching associative networks, biological and artificial neural networks, or semantic networks. The search process is initiated by labeling a set of source nodes (e.g. concepts in a semantic network) with weights or "activation" and then iteratively propagating or "spreading" that activation out to other nodes linked to the source nodes. Most often these "weights" are real values that decay as activation propagates through the network. When the weights are discrete, this process is often referred to as marker passing. Activation may originate from alternate paths, identified by distinct markers, and terminate when two alternate paths reach the same node. However, brain studies show that several different brain areas play an important role in semantic processing. Spreading activation in semantic networks was invented in cognitive psychology as a model of the fan-out effect. Spreading activation can also be applied in information retrieval, by means of a network of nodes representing documents and terms contained in those documents.

== Cognitive psychology ==

As it relates to cognitive psychology, spreading activation is the theory of how the brain iterates through a network of associated ideas to retrieve specific information. The spreading activation theory presents the array of concepts within our memory as cognitive units, each consisting of a node and its associated elements or characteristics, all connected together by edges. A spreading activation network can be represented schematically as a kind of web diagram, in which shorter lines between two nodes mean that the ideas are more closely related and will typically be associated more quickly with the original concept. In memory psychology, the spreading activation model holds that people organize their knowledge of the world based on their personal experiences, which in turn form the network of ideas that is the person's knowledge of the world.
When a word (the target) is preceded by an associated word (the prime) in word recognition tasks, participants tend to respond more quickly. For instance, subjects respond faster to the word "doctor" when it is preceded by "nurse" than when it is preceded by an unrelated word like "carrot". This semantic priming effect with words that are close in meaning within the cognitive network has been seen in a wide range of tasks given by experimenters, ranging from sentence verification to lexical decision and naming. As another example, if the original concept is "red" and the concept "vehicles" is primed, subjects are much more likely to say "fire engine" instead of something unrelated to vehicles, such as "cherries". If instead "fruits" was primed, they would likely name "cherries" and continue on from there. The activation of pathways in the network depends on how closely linked two concepts are by meaning, as well as on how a subject is primed.

== Algorithm ==

A directed graph is populated by Nodes[1...N], each having an associated activation value A[i], which is a real number in the range [0.0 ... 1.0]. A Link[i, j] connects source node[i] with target node[j]. Each edge has an associated weight W[i, j], usually a real number in the range [0.0 ... 1.0].

Parameters:
* Firing threshold F, a real number in the range [0.0 ... 1.0]
* Decay factor D, a real number in the range [0.0 ... 1.0]

Steps:
1. Initialize the graph, setting all activation values A[i] to zero. Set one or more origin nodes to an initial activation value greater than the firing threshold F. A typical initial value is 1.0.
2. For each unfired node[i] in the graph having an activation value A[i] greater than the node firing threshold F:
3. For each Link[i, j] connecting the source node[i] with target node[j], adjust A[j] = A[j] + (A[i] * W[i, j] * D), where D is the decay factor.
4. If a target node receives an adjustment to its activation value so that it would exceed 1.0, then set its new activation value to 1.0. Likewise, maintain 0.0 as a lower bound on the target node's activation value should it receive an adjustment to below 0.0.
5. Once a node has fired it may not fire again, although variations of the basic algorithm permit repeated firings and loops through the graph.
6. Nodes receiving a new activation value that exceeds the firing threshold F are marked for firing on the next spreading activation cycle.
7. If activation originates from more than one node, a variation of the algorithm permits marker passing to distinguish the paths by which activation is spread over the graph.
8. The procedure terminates when either there are no more nodes to fire or, in the case of marker passing from multiple origins, when a node is reached from more than one path. Variations of the algorithm that permit repeated node firings and activation loops in the graph terminate after a steady activation state, with respect to some delta, is reached, or when a maximum number of iterations is exceeded.

== Examples ==
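As one illustration, the basic procedure in the Algorithm section (single firing per node, no marker passing) can be sketched in Python. The function name `spread_activation`, the dictionary-based graph representation, and the default parameter values are assumptions of this sketch. The source also does not say whether nodes firing in the same cycle see each other's updates; here each cycle spreads from a snapshot of the activations taken at the start of that cycle.

```python
from collections import defaultdict


def spread_activation(weights, origins, threshold=0.3, decay=0.8, initial=1.0):
    """Basic spreading activation over a directed, weighted graph.

    weights maps (i, j) -> W[i, j]; origins lists the source nodes.
    Returns a dict mapping each node to its final activation A[i].
    """
    outgoing = defaultdict(list)
    nodes = set(origins)
    for (i, j), w in weights.items():
        outgoing[i].append((j, w))
        nodes.update((i, j))

    # All activations start at zero, then the origins are set.
    activation = {n: 0.0 for n in nodes}
    for n in origins:
        activation[n] = initial

    fired = set()
    while True:
        # Unfired nodes whose activation exceeds the firing threshold F.
        firing = [n for n in sorted(nodes)
                  if n not in fired and activation[n] > threshold]
        if not firing:
            return activation  # no more nodes to fire
        snapshot = dict(activation)  # assumption: synchronous cycles
        fired.update(firing)         # a node fires at most once
        for i in firing:
            for j, w in outgoing[i]:
                # A[j] = A[j] + A[i] * W[i, j] * D, clamped to [0.0, 1.0]
                new_value = activation[j] + snapshot[i] * w * decay
                activation[j] = min(1.0, max(0.0, new_value))


graph = {("a", "b"): 0.9, ("a", "c"): 0.5, ("b", "d"): 0.9, ("c", "d"): 0.2}
result = spread_activation(graph, origins=["a"])
```

In this small graph, activation from "a" reaches "d" along two paths, and the contribution through the strongly weighted "b" path dominates the one through "c".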
Aggregation (linguistics)
In linguistics, aggregation is a subtask of natural language generation, which involves merging syntactic constituents (such as sentences and phrases) together. Sometimes aggregation can be done at a conceptual level.

== Examples ==

A simple example of syntactic aggregation is merging the two sentences "John went to the shop" and "John bought an apple" into the single sentence "John went to the shop and bought an apple". Syntactic aggregation can be much more complex than this. For example, aggregation can embed one of the constituents in the other; e.g., we can aggregate "John went to the shop" and "The shop was closed" into the sentence "John went to the shop, which was closed". From a pragmatic perspective, aggregating sentences together often suggests to the reader that these sentences are related to each other. If this is not the case, the reader may be confused. For example, someone who reads "John went to the shop and bought an apple" may infer that the apple was bought in the shop; if this is not the case, then these sentences should not be aggregated.

== Algorithms and issues ==

Aggregation algorithms must do two things:
* Decide when two constituents should be aggregated
* Decide how two constituents should be aggregated, and create the aggregated structure

The first issue, deciding when to aggregate, is poorly understood. Aggregation decisions certainly depend on the semantic relations between the constituents, as mentioned above; they also depend on the genre (e.g., bureaucratic texts tend to be more aggregated than instruction manuals). They probably should depend on rhetorical and discourse structure. The literacy level of the reader is also probably important (poor readers need shorter sentences). But there is no integrated model which brings all these factors together into a single algorithm. With regard to the second issue, there have been some studies of different types of aggregation, and how they should be carried out. 
Harbusch and Kempen describe several syntactic aggregation strategies. In their terminology, "John went to the shop and bought an apple" is an example of forward conjunction reduction. Much less is known about conceptual aggregation. Di Eugenio et al. show how conceptual aggregation can be done in an intelligent tutoring system, and demonstrate that performing such aggregation makes the system more effective (and that conceptual aggregation makes a bigger impact than syntactic aggregation).

== Software ==

Little software is available for performing aggregation. However, the SimpleNLG system does include limited support for basic aggregation. For example, SimpleNLG can be used to combine two clauses and print out "The man is hungry and buys an apple".
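Independently of any particular library, the shared-subject case discussed above can be sketched in a few lines of Python. This is not the SimpleNLG API: the function `aggregate` and the (subject, predicate) tuple representation are assumptions made for illustration. The sketch only addresses how to merge two clauses, not the harder question of when merging is appropriate.

```python
def aggregate(first, second):
    """Merge two (subject, predicate) clauses that share a subject.

    If the subjects match, the second subject is dropped and the two
    predicates are conjoined; otherwise the clauses stay separate.
    """
    subject1, predicate1 = first
    subject2, predicate2 = second
    if subject1.lower() == subject2.lower():
        return f"{subject1} {predicate1} and {predicate2}."
    return f"{subject1} {predicate1}. {subject2} {predicate2}."


merged = aggregate(("The man", "is hungry"), ("The man", "buys an apple"))
print(merged)
```

A caller would still need to check that the two clauses are actually related before merging them, for the pragmatic reasons given in the Examples section.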
Natural language processing
Natural language processing (NLP) is the processing of natural language information by a computer. NLP is a subfield of computer science and is closely associated with artificial intelligence. NLP is also related to information retrieval, knowledge representation, computational linguistics, and linguistics more broadly. Major processing tasks in an NLP system include: speech recognition, text classification, natural language understanding, and natural language generation. == History == Natural language processing has its roots in the 1950s. Already in 1950, Alan Turing published an article titled "Computing Machinery and Intelligence," which proposed what is now called the Turing test as a criterion of intelligence, though at the time that was not articulated as a problem separate from artificial intelligence. The proposed test includes a task that involves the automated interpretation and generation of natural language. === Symbolic NLP (1950s – early 1990s) === The premise of symbolic NLP is often illustrated using John Searle's Chinese room thought experiment: Given a collection of rules (e.g., a Chinese phrasebook, with questions and matching answers), the computer emulates natural language understanding (or other NLP tasks) by applying those rules to the data it confronts. 1950s: The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. The authors claimed that within three or five years, machine translation would be a solved problem. However, real progress was much slower, and after the ALPAC report in 1966, which found that ten years of research had failed to fulfill the expectations, funding for machine translation was dramatically reduced. Little further research in machine translation was conducted in America (though some research continued elsewhere, such as Japan and Europe) until the late 1980s when the first statistical machine translation systems were developed. 
1960s: Some notably successful natural language processing systems developed in the 1960s were SHRDLU, a natural language system working in restricted "blocks worlds" with restricted vocabularies, and ELIZA, a simulation of Rogerian psychotherapy, written by Joseph Weizenbaum between 1964 and 1966. Despite using minimal information about human thought or emotion, ELIZA was able to produce interactions that appeared human-like. When the "patient" exceeded the very small knowledge base, ELIZA might provide a generic response, for example, responding to "My head hurts" with "Why do you say your head hurts?". Ross Quillian's successful work on natural language was demonstrated with a vocabulary of only twenty words, because that was all that would fit in a computer memory at the time. 1970s: During the 1970s, many programmers began to write "conceptual ontologies", which structured real-world information into computer-understandable data. Examples are MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert 1981). During this time, the first chatterbots were written (e.g., PARRY). 1980s: The 1980s and early 1990s mark the heyday of symbolic methods in NLP. Focus areas of the time included research on rule-based parsing (e.g., the development of HPSG as a computational operationalization of generative grammar), morphology (e.g., two-level morphology), semantics (e.g., Lesk algorithm), reference (e.g., within Centering Theory) and other areas of natural language understanding (e.g., in the Rhetorical Structure Theory). Other lines of research were continued, e.g., the development of chatterbots with Racter and Jabberwacky. An important development (that eventually led to the statistical turn in the 1990s) was the rising importance of quantitative evaluation in this period. 
=== Statistical NLP (1990s–present) === Up until the 1980s, most natural language processing systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in natural language processing with the introduction of machine learning algorithms for language processing. This shift was influenced by increasing computational power (see Moore's law) and a decline in the dominance of Chomskyan linguistic theories (e.g. transformational grammar), whose theoretical underpinnings discouraged the sort of corpus linguistics that underlies the machine-learning approach to language processing. 1990s: Many of the notable early successes in statistical methods in NLP occurred in the field of machine translation, due especially to work at IBM Research, such as IBM alignment models. These systems were able to take advantage of existing multilingual textual corpora that had been produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. However, many systems relied on corpora that were specifically developed for the tasks they were designed to perform. This reliance has been a major limitation to their broader effectiveness and continues to affect similar systems. Consequently, significant research has focused on methods for learning effectively from limited amounts of data. 2000s: With the growth of the web, increasing amounts of raw (unannotated) language data have become available since the mid-1990s. Research has thus increasingly focused on unsupervised and semi-supervised learning algorithms. Such algorithms can learn from data that has not been hand-annotated with the desired answers or using a combination of annotated and non-annotated data. 
Generally, this task is much more difficult than supervised learning, and typically produces less accurate results for a given amount of input data. However, large quantities of non-annotated data are available (including, among other things, the entire content of the World Wide Web), which can often make up for the worse efficiency if the algorithm used has a low enough time complexity to be practical. 2003: word n-gram model, at the time the best statistical algorithm, is outperformed by a multi-layer perceptron (with a single hidden layer and context length of several words, trained on up to 14 million words, by Bengio et al.) 2010: Tomáš Mikolov (then a PhD student at Brno University of Technology) with co-authors applied a simple recurrent neural network with a single hidden layer to language modeling, and in the following years he went on to develop Word2vec. In the 2010s, representation learning and deep neural network-style (featuring many hidden layers) machine learning methods became widespread in natural language processing. This shift gained momentum due to results showing that such techniques can achieve state-of-the-art results in many natural language tasks, e.g., in language modeling and parsing. This is increasingly important in medicine and healthcare, where NLP helps analyze notes and text in electronic health records that would otherwise be inaccessible for study when seeking to improve care or protect patient privacy. == Approaches: Symbolic, statistical, neural networks == Symbolic approach, i.e., the hand-coding of a set of rules for manipulating symbols, coupled with a dictionary lookup, was historically the first approach used both by AI in general and by NLP in particular: such as by writing grammars or devising heuristic rules for stemming. 
Machine learning approaches, which include both statistical and neural networks, on the other hand, have many advantages over the symbolic approach:
* Both statistical and neural network methods tend to focus more on the most common cases extracted from a corpus of texts, whereas the rule-based approach needs to provide rules for both rare and common cases equally.
* Language models, produced by either statistical or neural network methods, are more robust to both unfamiliar (e.g. containing words or structures that have not been seen before) and erroneous input (e.g. with misspelled words or words accidentally omitted) in comparison to the rule-based systems, which are also more costly to produce.
* The larger such a (probabilistic) language model is, the more accurate it becomes, in contrast to rule-based systems that can gain accuracy only by increasing the amount and complexity of the rules, leading to intractability problems.

Rule-based systems are commonly used:
* when the amount of training data is insufficient to successfully apply machine learning methods, e.g., for the machine translation of low-resource languages such as provided by the Apertium system,
* for preprocessing in NLP pipelines, e.g., tokenization, or
* for post-processing and transforming the output of NLP pipelines, e.g., for knowledge extraction from syntactic parses.

=== Statistical approach ===

In the late 1980s and mid-1990s, the statistical approach ended a peri
Ghana Post GPS
GhanaPostGPS is a web and smartphone application, sponsored by the government of Ghana and developed by Vokacom, to provide digital addresses and postal codes for every 5 squared meter location in Ghana. The digital address is a composite of the postcode (region, district and area code) plus a unique address. GhanaPostGPS is the first digital addressing system created by the government of Ghana. A GhanaPostGPS address is a mandatory requirement for obtaining the National Identification Card and other services.
Netomi
Netomi, formerly msg.ai, is an American artificial intelligence company and developer of chatbot technologies.

== History ==

msg.ai was founded in May 2015 by Puneet Mehta. msg.ai worked with Sony Pictures to launch a chatbot on Facebook Messenger for a $100M film, Goosebumps, and subsequently joined Y Combinator as a member of the Winter 2016 class. Later that year and in 2017, msg.ai completed two rounds of seed funding, led by Y Combinator and Index Ventures. In 2018, the company changed its name to Netomi. In 2019, the company raised $14.7 million in a Series A funding round, also led by Index Ventures. In 2021, the company raised $30 million in a Series B funding round led by WndrCo LLC.
Automated essay scoring
Automated essay scoring (AES) is the use of specialized computer programs to assign grades to essays written in an educational setting. It is a form of educational assessment and an application of natural language processing. Its objective is to classify a large set of textual entities into a small number of discrete categories, corresponding to the possible grades, for example, the numbers 1 to 6. Therefore, it can be considered a problem of statistical classification. Several factors have contributed to a growing interest in AES. Among them are cost, accountability, standards, and technology. Rising education costs have led to pressure to hold the educational system accountable for results by imposing standards. The advance of information technology promises to measure educational achievement at reduced cost. The use of AES for high-stakes testing in education has generated significant backlash, with opponents pointing to research that computers cannot yet grade writing accurately and arguing that their use for such purposes promotes teaching writing in reductive ways (i.e. teaching to the test). == History == Most historical summaries of AES trace the origins of the field to the work of Ellis Batten Page. In 1966, he argued for the possibility of scoring essays by computer, and in 1968 he published his successful work with a program called Project Essay Grade (PEG). Using the technology of that time, computerized essay scoring would not have been cost-effective, so Page abated his efforts for about two decades. Eventually, Page sold PEG to Measurement Incorporated. By 1990, desktop computers had become so powerful and so widespread that AES was a practical possibility. As early as 1982, a UNIX program called Writer's Workbench was able to offer punctuation, spelling and grammar advice. In collaboration with several companies (notably Educational Testing Service), Page updated PEG and ran some successful trials in the early 1990s. 
Peter Foltz and Thomas Landauer developed a system using a scoring engine called the Intelligent Essay Assessor (IEA). IEA was first used to score essays in 1997 for their undergraduate courses. It is now a product from Pearson Educational Technologies and used for scoring within a number of commercial products and state and national exams. IntelliMetric is Vantage Learning's AES engine. Its development began in 1996. It was first used commercially to score essays in 1998. Educational Testing Service offers "e-rater", an automated essay scoring program. It was first used commercially in February 1999. Jill Burstein was the team leader in its development. ETS's Criterion Online Writing Evaluation Service uses the e-rater engine to provide both scores and targeted feedback. Lawrence Rudner has done some work with Bayesian scoring, and developed a system called BETSY (Bayesian Essay Test Scoring sYstem). Some of his results have been published in print or online, but no commercial system incorporates BETSY as yet. Under the leadership of Howard Mitzel and Sue Lottridge, Pacific Metrics developed a constructed response automated scoring engine, CRASE. Currently utilized by several state departments of education and in a U.S. Department of Education-funded Enhanced Assessment Grant, Pacific Metrics’ technology has been used in large-scale formative and summative assessment environments since 2007. Measurement Inc. acquired the rights to PEG in 2002 and has continued to develop it. In 2012, the Hewlett Foundation sponsored a competition on Kaggle called the Automated Student Assessment Prize (ASAP). 201 challenge participants attempted to predict, using AES, the scores that human raters would give to thousands of essays written to eight different prompts. The intent was to demonstrate that AES can be as reliable as human raters, or more so. The competition also hosted a separate demonstration among nine AES vendors on a subset of the ASAP data. 
Although the investigators reported that the automated essay scoring was as reliable as human scoring, this claim was not substantiated by any statistical tests because some of the vendors required that no such tests be performed as a precondition for their participation. Moreover, the claim that the Hewlett Study demonstrated that AES can be as reliable as human raters has since been strongly contested, including by Randy E. Bennett, the Norman O. Frederiksen Chair in Assessment Innovation at the Educational Testing Service. Some of the major criticisms of the study have been that five of the eight datasets consisted of paragraphs rather than essays, that four of the eight datasets were graded by human readers for content only rather than for writing ability, and that rather than measuring human readers and the AES machines against the "true score", the average of the two readers' scores, the study employed an artificial construct, the "resolved score", which in four datasets consisted of the higher of the two human scores if there was a disagreement. This last practice, in particular, gave the machines an unfair advantage by allowing them to round up for these datasets. In 1966, Page hypothesized that, in the future, the computer-based judge would be better correlated with each human judge than the other human judges are. Although the applicability of this approach to essay marking in general has been criticized, the hypothesis was supported for marking free-text answers to short questions, such as those typical of the British GCSE system. Results of supervised learning demonstrate that the automatic systems perform well when marking by different human teachers is in good agreement. 
Unsupervised clustering of answers showed that excellent papers and weak papers formed well-defined clusters, and the automated marking rule for these clusters worked well, whereas marks given by human teachers for the third cluster ('mixed') can be controversial, and the reliability of any assessment of works from the 'mixed' cluster can often be questioned (both human and computer-based).

== Different dimensions of essay quality ==

According to a recent survey, modern AES systems try to score different dimensions of an essay's quality in order to provide feedback to users. These dimensions include the following items:
* Grammaticality: following grammar rules
* Usage: use of prepositions and word usage
* Mechanics: following rules for spelling, punctuation, capitalization
* Style: word choice, sentence structure variety
* Relevance: how relevant the content is to the prompt
* Organization: how well the essay is structured
* Development: development of ideas with examples
* Cohesion: appropriate use of transition phrases
* Coherence: appropriate transitions between ideas
* Thesis Clarity: clarity of the thesis
* Persuasiveness: convincingness of the major argument

== Procedure ==

From the beginning, the basic procedure for AES has been to start with a training set of essays that have been carefully hand-scored. The program evaluates surface features of the text of each essay, such as the total number of words, the number of subordinate clauses, or the ratio of uppercase to lowercase letters, which are quantities that can be measured without any human insight. It then constructs a mathematical model that relates these quantities to the scores that the essays received. The same model is then applied to calculate scores of new essays. Recently, one such mathematical model was created by Isaac Persing and Vincent Ng, which evaluates essays not only on the above features but also on their argument strength. 
It evaluates various features of the essay, such as the agreement level of the author and reasons for the same, adherence to the prompt's topic, locations of argument components (major claim, claim, premise), errors in the arguments, and cohesion in the arguments, among various other features. In contrast to the other models mentioned above, this model comes closer to duplicating human insight while grading essays. Due to the growing popularity of deep neural networks, deep learning approaches have been adopted for automated essay scoring, generally obtaining superior results, often surpassing inter-human agreement levels. The various AES programs differ in what specific surface features they measure, how many essays are required in the training set, and most significantly in the mathematical modeling technique. Early attempts used linear regression. Modern systems may use linear regression or other machine learning techniques, often in combination with other statistical techniques such as latent semantic analysis and Bayesian inference. The automated essay scoring task has also been studied in the cross-domain setting using machine learning models, where the models are trained on essays written for one prompt (topic) and tested on essays written for another prompt. Successful approaches in the cross-domain scenario are based on deep neural networks or models that combine deep and shallow features.

== Criteria for success ==

Any method of a