ASR-complete

ASR-complete

ASR-complete is, by analogy to "NP-completeness" in complexity theory, a term to indicate that the difficulty of a computational problem is equivalent to solving the central automatic speech recognition problem, i.e. recognize and understanding spoken language. Unlike "NP-completeness", this term is typically used informally. Such problems are hypothesised to include: Spoken natural language understanding Understanding speech from far-field microphones, i.e. handling the reverbation and background noise These problems are easy for humans to do (in fact, they are described directly in terms of imitating humans). Some systems can solve very simple restricted versions of these problems, but none can solve them in their full generality.

Stevens Award

The Stevens Award is a software engineering lecture award given by the Reengineering Forum, an industry association. The international Stevens Award was created to recognize outstanding contributions to the literature or practice of methods for software and systems development. The first award was given in 1995. The presentations focus on the current state of software methods and their direction for the future. This award lecture is named in memory of Wayne Stevens (1944-1993), a consultant, author, pioneer, and advocate of the practical application of software methods and tools. The Stevens Award and lecture is managed by the Reengineering Forum. The award was founded by International Workshop on Computer Aided Software Engineering (IWCASE), an international workshop association of users and developers of computer-aided software engineering (CASE) technology, which merged into The Reengineering Forum. Wayne Stevens was a charter member of the IWCASE executive board. == Recipients == 1995: Tony Wasserman 1996: David Harel 1997: Michael Jackson 1998: Thomas McCabe 1999: Tom DeMarco 2000: Gerald Weinberg 2001: Peter Chen 2002: Cordell Green 2003: Manny Lehman 2004: François Bodart 2005: Mary Shaw, Jim Highsmith 2006: Grady Booch 2007: Nicholas Zvegintzov 2008: Harry Sneed 2009: Larry Constantine 2010: Peter Aiken 2011: Jared Spool, Barry Boehm 2012: Philip Newcomb 2013: Jean-Luc Hainaut 2014: François Coallier 2015: Pierre Bourque

Transaction logic

Transaction Logic is an extension of predicate logic that accounts in a clean and declarative way for the phenomenon of state changes in logic programs and databases. This extension adds connectives specifically designed for combining simple actions into complex transactions and for providing control over their execution. The logic has a natural model theory and a sound and complete proof theory. Transaction Logic has a Horn clause subset, which has a procedural as well as a declarative semantics. The important features of the logic include hypothetical and committed updates, dynamic constraints on transaction execution, non-determinism, and bulk updates. In this way, Transaction Logic is able to declaratively capture a number of non-logical phenomena, including procedural knowledge in artificial intelligence, active databases, and methods with side effects in object databases. Transaction Logic was originally proposed in 1993 by Anthony Bonner and Michael Kifer and later described in more detail in An Overview of Transaction Logic and Logic Programming for Database Transactions. The most comprehensive description appears in Bonner & Kifer's technical report from 1995. In later years, Transaction Logic was extended in various ways, including concurrency, defeasible reasoning, partially defined actions, and other features. In 2013, the original paper on Transaction Logic has won the 20-year Test of Time Award of the Association for Logic Programming as the most influential paper from the proceedings of ICLP 1993 conference in the preceding 20 years. == Examples == === Graph coloring === Here tinsert denotes the elementary update operation of transactional insert. The connective ⊗ is called serial conjunction. === Pyramid stacking === The elementary update tdelete represents the transactional delete operation. === Hypothetical execution === Here <> is the modal operator of possibility: If both action1 and action2 are possible, execute action1. Otherwise, if only action2 is possible, then execute it. === Dining philosophers === Here | is the logical connective of parallel conjunction of Concurrent Transaction Logic. == Implementations == A number of implementations of Transaction Logic exist: The original implementation. An implementation of Concurrent Transaction Logic. Transaction Logic enhanced with tabling. An implementation of Transaction Logic has also been incorporated as part of the Flora-2 knowledge representation and reasoning system. All these implementations are open source.

Minion (solver)

Minion is a solver for satisfaction problems. Unlike constraint programming toolkits, which expect users to write programs in a traditional programming language like C++, Java or Prolog, Minion takes a text file which specifies the problem, and solves using only this. This makes using Minion much simpler, at the cost of much less customization. Minion has been shown to be faster than major commercial constraint solvers including CPLEX (formerly IBM ILOG). == Overview == Minion was introduced in 2006 by researchers at the University of St Andrews as a “fast, scalable” solver for large and hard CSP instances. The project provides a compact input language and a low-overhead C++ implementation aimed at throughput and memory efficiency. == Design and features == Minion implements a range of variable and constraint types commonly used in CSP modelling, plus search heuristics and optimisation support. The solver architecture prioritises cache-friendly data structures and specialised propagators. Notably, the developers adapted watched literal techniques from SAT solving to speed up constraint propagation for, among others, Boolean sums, the element global constraint, and table constraints. The modelling approach relies on a plain-text format (parsed by Minion) rather than embedding models into a host programming language. This reduces overhead and supports rapid “model-and-run” experimentation for large benchmark sets. == Performance == In the original evaluation on standard benchmarks, the authors reported that Minion often ran between one and two orders of magnitude faster than state-of-the-art toolkits of the time (including ILOG Solver and Gecode) on large, hard instances, with smaller gains—or slowdowns—on easier problems. Subsequent research has used Minion as a baseline solver in empirical studies and test generation tasks, reflecting its adoption within parts of the constraint programming community. == Applications == Minion has been applied in academic work on combinatorial search, scheduling and test generation, and is available to other environments via wrappers (for example, from the R language).

Vivid knowledge

Vivid knowledge refers to a specific kind of knowledge representation. The idea of a vivid knowledge base is to get an interpretation mostly straightforward out of it – it implies the interpretation. Thus, any query to such a knowledge base can be reduced to a database-like query. == Propositional knowledge base == A propositional knowledge base KB is vivid iff KB is a complete and consistent set of literals (over some vocabulary). Such a knowledge base has the property that it as exactly one interpretation, i.e. the interpretation is unique. A check for entailment of a sentence can simply be broken down into its literals and those can be answered by a simple database-like check of KB. == First-order knowledge base == A first-order knowledge base KB is vivid iff for some finite set of positive function-free ground literals KB+, KB = KB+ ∪ Negations ∪ DomainClosure ∪ UniqueNames, whereby Negations ≔ { ¬p | p is atomic and KB ⊭ p }, DomainClosure ≔ { (ci ≠ cj) | ci, cj are distinct constants }, UniqueNames ≔ { ∀x: (x = c1) ∨ (x = c2) ∨ ..., where the ci are all the constants in KB+ }. All interpretations of a vivid first-order knowledge base are isomorphic.

ImageNet

The ImageNet project is a large visual database designed for use in visual object recognition software research. More than 14 million images have been hand-annotated by the project to indicate what objects are pictured and in at least one million of the images, bounding boxes are also provided. ImageNet contains more than 20,000 categories, with a typical category, such as "balloon" or "strawberry", consisting of several hundred images. The database of annotations of third-party image URLs is freely available directly from ImageNet, though the actual images are not owned by ImageNet. Since 2010, the ImageNet project runs an annual software contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where software programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of one thousand non-overlapping classes. == History == AI researcher Fei-Fei Li began working on the idea for ImageNet in 2006. At a time when most AI research focused on models and algorithms, Li wanted to expand and improve the data available to train AI algorithms. In 2007, Li met with Princeton professor Christiane Fellbaum, one of the creators of WordNet, to discuss the project. As a result of this meeting, Li went on to build ImageNet starting from the roughly 22,000 nouns of WordNet and using many of its features. She was also inspired by a 1987 estimate that the average person recognizes roughly 30,000 different kinds of objects. As an assistant professor at Princeton, Li assembled a team of researchers to work on the ImageNet project. They used Amazon Mechanical Turk to help with the classification of images. Labeling started in July 2008 and ended in April 2010. It took 49K workers from 167 countries filtering and labeling over 160M candidate images. They had enough budget to have each of the 14 million images labelled three times. The original plan called for 10,000 images per category, for 40,000 categories at 400 million images, each verified 3 times. They found that humans can classify at most 2 images/sec. At this rate, it was estimated to take 19 human-years of labor (without rest). They presented their database for the first time as a poster at the 2009 Conference on Computer Vision and Pattern Recognition (CVPR) in Florida, titled "ImageNet: A Preview of a Large-scale Hierarchical Dataset". The poster was reused at Vision Sciences Society 2009. In 2009, Alex Berg suggested adding object localization as a task. Li approached PASCAL Visual Object Classes contest in 2009 for a collaboration. It resulted in the subsequent ImageNet Large Scale Visual Recognition Challenge starting in 2010, which has 1000 classes and object localization, as compared to PASCAL VOC which had just 20 classes and 19,737 images (in 2010). === Significance for deep learning === On 30 September 2012, a convolutional neural network (CNN) called AlexNet achieved a top-5 error of 15.3% in the ImageNet 2012 Challenge, more than 10.8 percentage points lower than that of the runner-up. Using convolutional neural networks was feasible due to the use of graphics processing units (GPUs) during training, an essential ingredient of the deep learning revolution. According to The Economist, "Suddenly people started to pay attention, not just within the AI community but across the technology industry as a whole." In 2015, AlexNet was outperformed by Microsoft's very deep CNN with over 100 layers, which won the ImageNet 2015 contest, having 3.57% error on the test set. Andrej Karpathy estimated in 2014 that with concentrated effort, he could reach 5.1% error rate, and ~10 people from his lab reached ~12-13% with less effort. It was estimated that with maximal effort, a human could reach 2.4%. == Dataset == ImageNet crowdsources its annotation process. Image-level annotations indicate the presence or absence of an object class in an image, such as "there are tigers in this image" or "there are no tigers in this image". Object-level annotations provide a bounding box around the (visible part of the) indicated object. ImageNet uses a variant of the broad WordNet schema to categorize objects, augmented with 120 categories of dog breeds to showcase fine-grained classification. In 2012, ImageNet was the world's largest academic user of Mechanical Turk. The average worker identified 50 images per minute. The original plan of the full ImageNet would have roughly 50M clean, diverse and full resolution images spread over approximately 50K synsets. This was not achieved. The summary statistics given on April 30, 2010: Total number of non-empty synsets: 21841 Total number of images: 14,197,122 Number of images with bounding box annotations: 1,034,908 Number of synsets with SIFT features: 1000 Number of images with SIFT features: 1.2 million === Categories === The categories of ImageNet were filtered from the WordNet concepts. Each concept, since it can contain multiple synonyms (for example, "kitty" and "young cat"), so each concept is called a "synonym set" or "synset". There were more than 100,000 synsets in WordNet 3.0, majority of them are nouns (80,000+). The ImageNet dataset filtered these to 21,841 synsets that are countable nouns that can be visually illustrated. Each synset in WordNet 3.0 has a "WordNet ID" (wnid), which is a concatenation of part of speech and an "offset" (a unique identifying number). Every wnid starts with "n" because ImageNet only includes nouns. For example, the wnid of synset "dog, domestic dog, Canis familiaris" is "n02084071". The categories in ImageNet fall into 9 levels, from level 1 (such as "mammal") to level 9 (such as "German shepherd"). === Image format === The images were scraped from online image search (Google, Picsearch, MSN, Yahoo, Flickr, etc) using synonyms in multiple languages. For example: German shepherd, German police dog, German shepherd dog, Alsatian, ovejero alemán, pastore tedesco, 德国牧羊犬. ImageNet consists of images in RGB format with varying resolutions. For example, in ImageNet 2012, "fish" category, the resolution ranges from 4288 x 2848 to 75 x 56. In machine learning, these are typically preprocessed into a standard constant resolution, and whitened, before further processing by neural networks. For example, in PyTorch, ImageNet images are by default normalized by dividing the pixel values so that they fall between 0 and 1, then subtracting by [0.485, 0.456, 0.406], then dividing by [0.229, 0.224, 0.225]. These are the mean and standard deviations for ImageNet, so this whitens the input data. === Labels and annotations === Each image is labelled with exactly one wnid. Dense SIFT features (raw SIFT descriptors, quantized codewords, and coordinates of each descriptor/codeword) for ImageNet-1K were available for download, designed for bag of visual words. The bounding boxes of objects were available for about 3000 popular synsets with on average 150 images in each synset. Furthermore, some images have attributes. They released 25 attributes for ~400 popular synsets: Color: black, blue, brown, gray, green, orange, pink, red, violet, white, yellow Pattern: spotted, striped Shape: long, round, rectangular, square Texture: furry, smooth, rough, shiny, metallic, vegetation, wooden, wet === ImageNet-21K === The full original dataset is referred to as ImageNet-21K. ImageNet-21k contains 14,197,122 images divided into 21,841 classes. Some papers round this up and name it ImageNet-22k. The full ImageNet-21k was released in Fall of 2011, as fall11_whole.tar. There is no official train-validation-test split for ImageNet-21k. Some classes contain only 1-10 samples, while others contain thousands. === ImageNet-1K === There are various subsets of the ImageNet dataset used in various context, sometimes referred to as "versions". One of the most highly used subsets of ImageNet is the "ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012–2017 image classification and localization dataset". This is also referred to in the research literature as ImageNet-1K or ILSVRC2017, reflecting the original ILSVRC challenge that involved 1,000 classes. ImageNet-1K contains 1,281,167 training images, 50,000 validation images and 100,000 test images. Each category in ImageNet-1K is a leaf category, meaning that there are no child nodes below it, unlike ImageNet-21K. For example, in ImageNet-21K, there are some images categorized as simply "mammal", whereas in ImageNet-1K, there are only images categorized as things like "German shepherd", since there are no child-words below "German shepherd". === Later developments === In the WordNet they built ImageNet on, there were 2832 synsets in the "person" subtree. During 2018--2020 period, they removed the download of the ImageNet-21k as they went through extensive filtering in these person synsets. Out of these 2832 synsets, 1593 were deemed "potentially offensive". Out of the remaining 1239, 1081 were deemed not really "visual". The result was that only 158 syn

Allen's interval algebra

Allen's interval algebra is a calculus for temporal reasoning that was introduced by James F. Allen in 1983. The calculus defines possible relations between time intervals and provides a composition table that can be used as a basis for reasoning about temporal descriptions of events. == Formal description == === Relations === The following 13 base relations capture the possible relations between two intervals. To see that the 13 relations are exhaustive, note that each point of X {\displaystyle X} can be at 5 possible locations relative to Y {\displaystyle Y} : before, at the start, within, at the end, after. These give 5 + 4 + 3 + 2 + 1 = 15 {\displaystyle 5+4+3+2+1=15} possible relative positions for the start and the end of X {\displaystyle X} . Of these, we cannot have X 0 = X 1 = Y 0 {\displaystyle X_{0}=X_{1}=Y_{0}} since X 0 < X 1 {\displaystyle X_{0}