AI Chat Free No Limit

AI Chat Free No Limit — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • ELIZA

    ELIZA

    ELIZA is an early natural language processing computer program developed from 1964 to 1967 at MIT by Joseph Weizenbaum. Created to explore communication between humans and machines, ELIZA simulated conversation by using a pattern matching and substitution methodology that gave users an illusion of understanding on the part of the program, but gave no response that could be considered really understanding what was being said by either party. Whereas the ELIZA program itself was written (originally) in MAD-SLIP, the pattern matching directives that contained most of its language capability were provided in separate "scripts", represented in a Lisp-like expression. The most famous script, DOCTOR, simulated a psychotherapist of the Rogerian school (in which the therapist often reflects back the patient's words to the patient), and used rules, dictated in the script, to respond with non-directional questions to user inputs. As such, ELIZA was one of the first chatbots (originally "chatterbots") and one of the first programs capable of attempting the Turing test. Weizenbaum intended the program as a method to explore communication between humans and machines. He was surprised that some people, including his secretary, attributed human-like feelings to the computer program, a phenomenon that came to be called the ELIZA effect. Many academics believed that the program would be able to positively influence the lives of many people, particularly those with psychological issues, and that it could aid doctors working on such patients' treatment. While ELIZA was capable of engaging in discourse, it could not converse with true understanding. However, many early users were convinced of ELIZA's intelligence and understanding, despite Weizenbaum's insistence to the contrary. The original ELIZA source code had been missing since its creation in the 1960s, as it was not common to publish articles that included source code at that time. However, more recently the MAD-SLIP source code was discovered in the MIT archives and published on various platforms, such as the Internet Archive. The source code is of high historical interest since it demonstrates not only the specificity of programming languages and techniques at that time, but also the beginning of software layering and abstraction as a means of achieving sophisticated software programming. == Overview == Joseph Weizenbaum's ELIZA, running the DOCTOR script, created a conversational interaction somewhat similar to what might take place in the office of "a [non-directive] psychotherapist in an initial psychiatric interview" and to "demonstrate that the communication between man and machine was superficial". While ELIZA is best known for acting in the manner of a psychotherapist, the speech patterns are due to the data and instructions supplied by the DOCTOR script. ELIZA itself examined the text for keywords, applied values to said keywords, and transformed the input into an output; the script that ELIZA ran determined the keywords, set the values of keywords, and set the rules of transformation for the output. Weizenbaum chose to make the DOCTOR script in the context of psychotherapy to "sidestep the problem of giving the program a data base of real-world knowledge", allowing it to reflect back the patient's statements to carry the conversation forward. The result was a somewhat intelligent-seeming response that reportedly deceived some early users of the program. Weizenbaum named his program ELIZA after Eliza Doolittle, a working-class character in George Bernard Shaw's Pygmalion (also appearing in the musical My Fair Lady, which was based on the play and was hugely popular at the time). According to Weizenbaum, ELIZA's ability to be "incrementally improved" by various users made it similar to Eliza Doolittle, since Eliza Doolittle was taught to speak with an upper-class accent in Shaw's play. However, unlike the human character in Shaw's play, ELIZA is incapable of learning new patterns of speech or new words through interaction alone. Edits must be made directly to ELIZA's active script in order to change the manner by which the program operates. Weizenbaum first implemented ELIZA in his own SLIP list-processing language, where, depending upon the initial entries by the user, the illusion of human intelligence could appear, or be dispelled through several interchanges. Some of ELIZA's responses were so convincing that Weizenbaum and several others have anecdotes of users becoming emotionally attached to the program, occasionally forgetting that they were conversing with a computer. Weizenbaum's own secretary reportedly asked Weizenbaum to leave the room so that she and ELIZA could have a real conversation. Weizenbaum was surprised by this, later writing: "I had not realized ... that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people." In 1966, interactive computing (via a teletype) was new. It was 11 years before the personal computer became familiar to the general public, and three decades before most people encountered attempts at natural language processing in Internet services like Ask.com or PC help systems such as Microsoft Office Clippit. Although those programs included years of research and work, ELIZA remains a milestone because it was the first time a programmer had attempted such a human-machine interaction with the goal of creating the illusion (however brief) of human–human interaction. At the ICCC 1972, ELIZA was brought together with another early artificial-intelligence program named PARRY for a computer-only conversation. While ELIZA was built to speak as a doctor, PARRY was intended to simulate a patient with schizophrenia. == Design and implementation == Weizenbaum originally wrote ELIZA in MAD-SLIP for CTSS on an IBM 7094 as a program to make natural-language conversation possible with a computer. To accomplish this, Weizenbaum identified five "fundamental technical problems" for ELIZA to overcome: the identification of key words, the discovery of a minimal context, the choice of appropriate transformations, the generation of responses in the absence of key words, and the provision of an editing capability for ELIZA scripts. Weizenbaum solved these problems and made ELIZA such that it had no built-in contextual framework or universe of discourse. However, this required ELIZA to have a script of instructions on how to respond to inputs from users. ELIZA starts its process of responding to an input by a user by first examining the text input for a "keyword". A "keyword" is a word designated as important by the acting ELIZA script, which assigns to each keyword a precedence number, or a RANK, designed by the programmer. If such words are found, they are put into a "keystack", with the keyword of the highest RANK at the top. The input sentence is then manipulated and transformed as the rule associated with the keyword of the highest RANK directs. For example, when the DOCTOR script encounters words such as "alike" or "same", it would output a message pertaining to similarity, in this case "In what way?", as these words had high precedence number. This also demonstrates how certain words, as dictated by the script, can be manipulated regardless of contextual considerations, such as switching first-person pronouns and second-person pronouns and vice versa, as these too had high precedence numbers. Such words with high precedence numbers are deemed superior to conversational patterns and are treated independently of contextual patterns. Following the first examination, the next step of the process is to apply an appropriate transformation rule, which includes two parts: the "decomposition rule" and the "reassembly rule". First, the input is reviewed for syntactical patterns in order to establish the minimal context necessary to respond. Using the keywords and other nearby words from the input, different disassembly rules are tested until an appropriate pattern is found. Using the script's rules, the sentence is then "dismantled" and arranged into sections of the component parts as the "decomposition rule for the highest-ranking keyword" dictates. The example that Weizenbaum gives is the input "You are very helpful", which is transformed to "I are very helpful". This is then broken into (1) empty (2) "I" (3) "are" (4) "very helpful". The decomposition rule has broken the phrase into four small segments that contain both the keywords and the information in the sentence. The decomposition rule then designates a particular reassembly rule, or set of reassembly rules, to follow when reconstructing the sentence. The reassembly rule takes the fragments of the input that the decomposition rule had created, rearranges them, and adds in programmed words to create a response. Using Weizenbaum's example previously stated, such a reassembly rule would take the fragments and apply them to the phrase "What makes

    Read more →
  • Fred (chatbot)

    Fred (chatbot)

    Fred, or FRED, was an early chatbot written by Robby Garner. == History == The name Fred was initially suggested by Karen Lindsey, and then Robby jokingly came up with an acronym, "Functional Response Emulation Device." Fred has also been implemented as a Java application by Paco Nathan called JFRED Archived 2008-08-24 at the Wayback Machine. Fred Chatterbot is designed to explore Natural Language communications between people and computer programs. In particular, this is a study of conversation between people and ways that a computer program can learn from other people's conversations to make its own conversations. Fred used a minimalistic "stimulus-response" approach. It worked by storing a database of statements and their responses, and made its own reply by looking up the input statements made by a user and then rendering the corresponding response from the database. This approach simplified the complexity of the rule base, but required expert coding and editing for modifications. Fred was a predecessor to Albert One, which Garner used in 1998 and 1999 to win the Loebner Prize.

    Read more →
  • IruSoft

    IruSoft

    IruSoft (Arabic: آيروسوفت) is an insurance regulatory platform designated for licensing, supervision and inspection of the insurance sector within a country. The platform introduced unique supervision-technology (suptech), insurance-technology (insurtech) and regulatory-technology (regtech) automated modules by which a regulator requires less resources to ensure fairness, transparency and competition and to prevent conflicts of interest in the sector. IruSoft was founded by Abdullah Al-Salloum and owned by the Insurance Regulatory Unit in Kuwait. The Insurance Regulatory Unit optimized processing insurance-sector's customer complaints by issuing Resolution No. (1) of 2022 that introduced IruSoft's complaints public module; an automated resolution center, by which the process of receiving submitted complaints, passing them on to the platforms of licensed insurance companies, tracking matter-related discussions and updates and getting them escalated if unresolved to be discussed by a committee assigned by the unit is integrally automated and analyzed for better key performance indicators.

    Read more →
  • Noisy text analytics

    Noisy text analytics

    Noisy text analytics is a process of information extraction whose goal is to automatically extract structured or semistructured information from noisy unstructured text data. While Text analytics is a growing and mature field that has great value because of the huge amounts of data being produced, processing of noisy text is gaining in importance because a lot of common applications produce noisy text data. Noisy unstructured text data is found in informal settings such as online chat, text messages, e-mails, message boards, newsgroups, blogs, wikis and web pages. Also, text produced by processing spontaneous speech using automatic speech recognition and printed or handwritten text using optical character recognition contains processing noise. Text produced under such circumstances is typically highly noisy containing spelling errors, abbreviations, non-standard words, false starts, repetitions, missing punctuations, missing letter case information, pause filling words such as “um” and “uh” and other texting and speech disfluencies. Such text can be seen in large amounts in contact centers, chat rooms, optical character recognition (OCR) of text documents, short message service (SMS) text, etc. Documents with historical language can also be considered noisy with respect to today's knowledge about the language. Such text contains important historical, religious, ancient medical knowledge that is useful. The nature of the noisy text produced in all these contexts warrants moving beyond traditional text analysis techniques. == Techniques for noisy text analysis == Missing punctuation and the use of non-standard words can often hinder standard natural language processing tools such as part-of-speech tagging and parsing. Techniques to both learn from the noisy data and then to be able to process the noisy data are only now being developed. == Possible source of noisy text == World Wide Web: Poorly written text is found in web pages, online chat, blogs, wikis, discussion forums, newsgroups. Most of these data are unstructured and the style of writing is very different from, say, well-written news articles. Analysis for the web data is important because they are sources for market buzz analysis, market review, trend estimation, etc. Also, because of the large amount of data, it is necessary to find efficient methods of information extraction, classification, automatic summarization and analysis of these data. Contact centers: This is a general term for help desks, information lines and customer service centers operating in domains ranging from computer sales and support to mobile phones to apparels. On an average a person in the developed world interacts at least once a week with a contact center agent. A typical contact center agent handles over a hundred calls per day. They operate in various modes such as voice, online chat and E-mail. The contact center industry produces gigabytes of data in the form of E-mails, chat logs, voice conversation transcriptions, customer feedback, etc. A bulk of the contact center data is voice conversations. Transcription of these using state of the art automatic speech recognition results in text with 30-40% word error rate. Further, even written modes of communication like online chat between customers and agents and even the interactions over email tend to be noisy. Analysis of contact center data is essential for customer relationship management, customer satisfaction analysis, call modeling, customer profiling, agent profiling, etc., and it requires sophisticated techniques to handle poorly written text. Printed Documents: Many libraries, government organizations and national defence organizations have vast repositories of hard copy documents. To retrieve and process the content from such documents, they need to be processed using Optical Character Recognition. In addition to printed text, these documents may also contain handwritten annotations. OCRed text can be highly noisy depending on the font size, quality of the print etc. It can range from 2-3% word error rates to as high as 50-60% word error rates. Handwritten annotations can be particularly hard to decipher, and error rates can be quite high in their presence. Short Messaging Service (SMS): Language usage over computer mediated discourses, like chats, emails and SMS texts, significantly differs from the standard form of the language. An urge towards shorter message length facilitating faster typing and the need for semantic clarity, shape the structure of this non-standard form known as the texting language.

    Read more →
  • Tiki Wiki CMS Groupware

    Tiki Wiki CMS Groupware

    Tiki Wiki CMS Groupware or simply Tiki, originally known as TikiWiki, is a free and open source Wiki-based content management system and online office suite written primarily in PHP and distributed under the GNU Lesser General Public License (LGPL-2.1-only) license. In addition to enabling websites and portals on the internet and on intranets and extranets, Tiki contains a number of collaboration features allowing it to operate as a Geospatial Content Management System (GeoCMS) and Groupware web application. Tiki includes all the basic features common to most CMSs such as the ability to register and maintain individual user accounts within a flexible and rich permission / privilege system, create and manage menus, RSS-feeds, customize page layout, perform logging, and administer the system. All administration tasks are accomplished through a browser-based user interface. Tiki features an all-in-one design, as opposed to a core+extensions model followed by other CMSs. This allows for future-proof upgrades (since all features are released together), but has the drawback of an extremely large codebase (more than 1,000,000 lines). Tiki can run on any computing platform that supports both a web server capable of running PHP 5 (including Apache HTTP Server, IIS, Lighttpd, Hiawatha, Cherokee, and nginx) and a MySQL/MariaDB database to store content and settings. == Major components == Tiki has four major categories of components: content creation and management tools, content organization tools and navigation aids, communication tools, and configuration and administration tools. These components enable administrators and users to create and manage content, as well as letting them communicate to others and configure sites. In addition, Tiki allows each user to choose from various visual themes. These themes are implemented using CSS and the open source Smarty template engine. Additional themes can be created by a Tiki administrator for branding or customization as well. == Internationalization == Tiki is an international project, supporting many languages. The default interface language in Tiki is English, but any language that can be encoded and displayed using the UTF-8 encoding can be supported. Translated strings can be included via an external language file, or by translating interface strings directly, through the database. As of 29 September 2005, Tiki had been fully translated into eight languages and reportedly 90% or more translated into another five languages, as well as partial translations for nine additional languages. Tiki also supports interactive translation of actual wiki pages and was the initial wiki engine used in the Cross Lingual Wiki Engine Project. This allows Tiki-based web sites to have translated content — not just the user interface. == Implementation == Tiki is developed primarily in PHP with some JavaScript code. It uses MySQL/MariaDB as a database. It will run on any server that provides PHP 5, including Apache and Microsoft's IIS. Tiki components make extensive use of other open source projects, including Zend Framework, Smarty, jQuery, HTML Purifier, FCKeditor, Raphaël, phpCAS, and Morcego. When used with Mapserver Tiki can become a Geospatial Content Management System. == Project team == Tiki is under active development by a large international community of over 300 developers and translators, and is one of the largest open-source teams in the world. Project members have donated the resources and bandwidth required to host the tiki.org website and various subdomains. The project members refer to this dependence on their own product as "eating their own dogfood", which they have been doing since the early days of the project. Tiki community members also participate in various related events such as WikiSym and the Libre Software Meeting. == History == Tiki has been hosted on SourceForge.net since its initial release (Release 0.9, named Spica) in October 2002. It was primarily the development of Luis Argerich (Buenos Aires, Argentina), Eduardo Polidor (São Paulo, Brazil), and Garland Foster (Green Bay, WI, United States). In July 2003, Tiki was named the SourceForge.net July 2003 Project of the Month. In late 2003, a fork of Tiki was used to create Bitweaver. In 2006, Tiki was named to CMS Report's Top 30 Web Applications. In 2008, Tiki was named to EContent magazine's Top 100 In 2009, Tiki adopted a six-month release cycle and announced the selection of a Long Term Support (LTS) version and the Tiki Software Community Association was formed as the legal steward for Tiki. The Tiki Software Association is a not-for-profit entity established in Canada. Previously, the entire project was run entirely by volunteers. In 2010, Tiki received Best of Open Source Software Applications Award (BOSSIE) from InfoWorld, in the Applications category. In 2011, Tiki was named to CMS Report's Top 30 Web Applications. In 2012, Tiki was named "Best Web Tool" by WebHostingSearch.com, and "People's Choice: Best Free CMS" by CMS Critic. In 2016, Tiki was named as one of the "10 Best Open Source Collaboration Software Tools" by Small Business Computing. == Name == The name TikiWiki is written in CamelCase, a common Wiki syntax indicating a hyperlink within the Wiki. It is most likely a compound word combining two Polynesian terms, Tiki and Wiki, to create a self-rhyming name similar to wikiwiki, a common variant of wiki. A backronym has also been formed for Tiki: Tightly Integrated Knowledge Infrastructure. == Release Information and History == In general, the Tiki Software Community Association releases a new major version of Tiki Wiki every 8 months where prior, non-LTS, major versions are supported until the first minor version release of the next major version (i.e., 16.0 ⇒ 17.1). Starting with version 12.x, Tiki Wiki LTS is supported for 5 years where it enters a security/maintenance release cycle upon the release of the next LTS version. Tiki Wiki's release history is outlined below.

    Read more →
  • Noisy text analytics

    Noisy text analytics

    Noisy text analytics is a process of information extraction whose goal is to automatically extract structured or semistructured information from noisy unstructured text data. While Text analytics is a growing and mature field that has great value because of the huge amounts of data being produced, processing of noisy text is gaining in importance because a lot of common applications produce noisy text data. Noisy unstructured text data is found in informal settings such as online chat, text messages, e-mails, message boards, newsgroups, blogs, wikis and web pages. Also, text produced by processing spontaneous speech using automatic speech recognition and printed or handwritten text using optical character recognition contains processing noise. Text produced under such circumstances is typically highly noisy containing spelling errors, abbreviations, non-standard words, false starts, repetitions, missing punctuations, missing letter case information, pause filling words such as “um” and “uh” and other texting and speech disfluencies. Such text can be seen in large amounts in contact centers, chat rooms, optical character recognition (OCR) of text documents, short message service (SMS) text, etc. Documents with historical language can also be considered noisy with respect to today's knowledge about the language. Such text contains important historical, religious, ancient medical knowledge that is useful. The nature of the noisy text produced in all these contexts warrants moving beyond traditional text analysis techniques. == Techniques for noisy text analysis == Missing punctuation and the use of non-standard words can often hinder standard natural language processing tools such as part-of-speech tagging and parsing. Techniques to both learn from the noisy data and then to be able to process the noisy data are only now being developed. == Possible source of noisy text == World Wide Web: Poorly written text is found in web pages, online chat, blogs, wikis, discussion forums, newsgroups. Most of these data are unstructured and the style of writing is very different from, say, well-written news articles. Analysis for the web data is important because they are sources for market buzz analysis, market review, trend estimation, etc. Also, because of the large amount of data, it is necessary to find efficient methods of information extraction, classification, automatic summarization and analysis of these data. Contact centers: This is a general term for help desks, information lines and customer service centers operating in domains ranging from computer sales and support to mobile phones to apparels. On an average a person in the developed world interacts at least once a week with a contact center agent. A typical contact center agent handles over a hundred calls per day. They operate in various modes such as voice, online chat and E-mail. The contact center industry produces gigabytes of data in the form of E-mails, chat logs, voice conversation transcriptions, customer feedback, etc. A bulk of the contact center data is voice conversations. Transcription of these using state of the art automatic speech recognition results in text with 30-40% word error rate. Further, even written modes of communication like online chat between customers and agents and even the interactions over email tend to be noisy. Analysis of contact center data is essential for customer relationship management, customer satisfaction analysis, call modeling, customer profiling, agent profiling, etc., and it requires sophisticated techniques to handle poorly written text. Printed Documents: Many libraries, government organizations and national defence organizations have vast repositories of hard copy documents. To retrieve and process the content from such documents, they need to be processed using Optical Character Recognition. In addition to printed text, these documents may also contain handwritten annotations. OCRed text can be highly noisy depending on the font size, quality of the print etc. It can range from 2-3% word error rates to as high as 50-60% word error rates. Handwritten annotations can be particularly hard to decipher, and error rates can be quite high in their presence. Short Messaging Service (SMS): Language usage over computer mediated discourses, like chats, emails and SMS texts, significantly differs from the standard form of the language. An urge towards shorter message length facilitating faster typing and the need for semantic clarity, shape the structure of this non-standard form known as the texting language.

    Read more →
  • Visual Turing Test

    Visual Turing Test

    The Visual Turing Test is “an operator-assisted device that produces a stochastic sequence of binary questions from a given test image”. The query engine produces a sequence of questions that have unpredictable answers given the history of questions. The test is only about vision and does not require any natural language processing. The job of the human operator is to provide the correct answer to the question or reject it as ambiguous. The query generator produces questions such that they follow a “natural story line”, similar to what humans do when they look at a picture. == History == Research in computer vision dates back to the 1960s when Seymour Papert first attempted to solve the problem. This unsuccessful attempt was referred to as the Summer Vision Project. The reason why it was not successful was because computer vision is more complicated than what people think. The complexity is in alignment with the human visual system. Roughly 50% of the human brain is devoted in processing vision, which indicates that it is a difficult problem. Later there were attempts to solve the problems with models inspired by the human brain. Perceptrons by Frank Rosenblatt, which is a form of the neural networks, was one of the first such approaches. These simple neural networks could not live up to their expectations and had certain limitations due to which they were not considered in future research. Later with the availability of the hardware and some processing power the research shifted to image processing which involves pixel-level operations, like finding edges, de-noising images or applying filters to name a few. There was some great progress in this field but the problem of vision which was to make the machines understand the images was still not being addressed. During this time the neural networks also resurfaced as it was shown that the limitations of the perceptrons can be overcome by Multi-layer perceptrons. Also in the early 1990s convolutional neural networks were born which showed great results on digit recognition but did not scale up well on harder problems. The late 1990s and early 2000s saw the birth of modern computer vision. One of the reasons this happened was due to the availability of key, feature extraction and representation algorithms. Features along with the already present machine learning algorithms were used to detect, localise and segment objects in Images. While all these advancements were being made, the community felt the need to have standardised datasets and evaluation metrics so the performances can be compared. This led to the emergence of challenges like the Pascal VOC challenge and the ImageNet challenge. The availability of standard evaluation metrics and the open challenges gave directions to the research. Better algorithms were introduced for specific tasks like object detection and classification. Visual Turing Test aims to give a new direction to the computer vision research which would lead to the introduction of systems that will be one step closer to understanding images the way humans do. == Current evaluation practices == A large number of datasets have been annotated and generalised to benchmark performances of difference classes of algorithms to assess different vision tasks (e.g., object detection/recognition) on some image domain (e.g., scene images). One of the most famous datasets in computer vision is ImageNet which is used to assess the problem of object level Image classification. ImageNet is one of the largest annotated datasets available and has over one million images. The other important vision task is object detection and localisation which refers to detecting the object instance in the image and providing the bounding box coordinates around the object instance or segmenting the object. The most popular dataset for this task is the Pascal dataset. Similarly there are other datasets for specific tasks like the H3D dataset for human pose detection, Core dataset to evaluate the quality of detected object attributes such as colour, orientation, and activity. Having these standard datasets has helped the vision community to come up with well performing algorithms for all these tasks. The next logical step is to create a larger task encompassing of these smaller subtasks. Having such a task would lead to building systems that would understand images, as understanding images would inherently involve detecting objects, localising them and segmenting them. == Details == The Visual Turing Test (VTT) unlike the Turing test has a query engine system which interrogates a computer vision system in the presence of a human co-ordinator. It is a system that generates a random sequence of binary questions specific to the test image, such that the answer to any question k is unpredictable given the true answers to the previous k − 1 questions (also known as history of questions). The test happens in the presence of a human operator who serves two main purposes: removing the ambiguous questions and providing the correct answers to the unambiguous questions. Given an Image infinite possible binary questions can be asked and a lot of them are bound to be ambiguous. These questions if generated by the query engine are removed by the human moderator and instead the query engine generates another question such that the answer to it is unpredictable given the history of the questions. The aim of the Visual Turing Test is to evaluate the Image understanding of a computer system, and an important part of image understanding is the story line of the image. When humans look at an image, they do not think that there is a car at ‘x’ pixels from the left and ‘y’ pixels from the top, but instead they look at it as a story, for e.g. they might think that there is a car parked on the road, a person is exiting the car and heading towards a building. The most important elements of the story line are the objects and so to extract any story line from an image the first and the most important task is to instantiate the objects in it, and that is what the query engine does. === Query engine === The query engine is the core of the Visual Turing Test and it comprises two main parts : Vocabulary and Questions ==== Vocabulary ==== Vocabulary is a set of words that represent the elements of the images. This vocabulary when used with appropriate grammar leads to a set of questions. The grammar is defined in the next section in a way that it leads to a space of binary questions. The vocabulary V {\displaystyle {\mathcal {V}}} consist of three components: Types of Objects T {\displaystyle {\mathcal {T}}} Type-dependent attributes of objects A ( t ) {\displaystyle {\mathcal {A}}(t)} Type-dependent relationships between two objects R ( t , t ′ ) {\displaystyle {\mathcal {R}}(t,t')} For Images of urban street scenes the types of objects include people, vehicle and buildings. Attributes refer to the properties of these objects, for e.g. female, child, wearing a hat or carrying something, for people and moving, parked, stopped, one tire visible or two tires visible for vehicles. Relationships between each pair of object classes can be either “ordered” or “unordered”. The unordered relationships may include talking, walking together and the ordered relationships include taller, closer to the camera, occluding, being occluded etc. Additionally all of this vocabulary is used in context of rectangular image regions w \in W which allow for the localisation of objects in the image. An extremely large number of such regions are possible and this complicates the problem, so for this test, regions at specific scales are only used which include 1/16 the size of image, 1/4 the size of image, 1/2 the size of image or larger. ==== Questions ==== The question space is composed of four types of questions: Existence questions: The aim of the existence questions is to find new objects in the image that have not been uniquely identified previously. They are of the form : Qexist = 'Is there an instance of an object of type t with attributes A partially visible in region w that was not previously instantiated?' Uniqueness questions: A uniqueness question tries to uniquely identify an object to instantiate it. Quniq = 'Is there a unique instance of an object of type t with attributes A partially visible in region w that was not previously instantiated?' The uniqueness questions along with the existence questions form the instantiation questions. As mentioned earlier instantiating objects leads to other interesting questions and eventually a story line. Uniqueness questions follow the existence questions and a positive answer to it leads to instantiation of an object. Attribute questions: An attribute question tries to find more about the object once it has been instantiated. Such questions can query about a single attribute, conjunction of two attributes or disjunction of two attributes. Qatt(ot) = {'Does object ot have attribute a?' , 'Does object

    Read more →
  • Shape context

    Shape context

    Shape context is a feature descriptor used in object recognition. Serge Belongie and Jitendra Malik proposed the term in their paper "Matching with Shape Contexts" in 2000. == Theory == The shape context is intended to be a way of describing shapes that allows for measuring shape similarity and the recovering of point correspondences. The basic idea is to pick n points on the contours of a shape. For each point pi on the shape, consider the n − 1 vectors obtained by connecting pi to all other points. The set of all these vectors is a rich description of the shape localized at that point but is far too detailed. The key idea is that the distribution over relative positions is a robust, compact, and highly discriminative descriptor. So, for the point pi, the coarse histogram of the relative coordinates of the remaining n − 1 points, h i ( k ) = # { q ≠ p i : ( q − p i ) ∈ bin ( k ) } {\displaystyle h_{i}(k)=\#\{q\neq p_{i}:(q-p_{i})\in {\mbox{bin}}(k)\}} is defined to be the shape context of p i {\displaystyle p_{i}} . The bins are normally taken to be uniform in log-polar space. The fact that the shape context is a rich and discriminative descriptor can be seen in the figure below, in which the shape contexts of two different versions of the letter "A" are shown. (a) and (b) are the sampled edge points of the two shapes. (c) is the diagram of the log-polar bins used to compute the shape context. (d) is the shape context for the point marked with a circle in (a), (e) is that for the point marked as a diamond in (b), and (f) is that for the triangle. As can be seen, since (d) and (e) are the shape contexts for two closely related points, they are quite similar, while the shape context in (f) is very different. For a feature descriptor to be useful, it needs to have certain invariances. In particular it needs to be invariant to translation, scaling, small perturbations, and, depending on the application, rotation. Translational invariance comes naturally to shape context. Scale invariance is obtained by normalizing all radial distances by the mean distance α {\displaystyle \alpha } between all the point pairs in the shape although the median distance can also be used. Shape contexts are empirically demonstrated to be robust to deformations, noise, and outliers using synthetic point set matching experiments. One can provide complete rotational invariance in shape contexts. One way is to measure angles at each point relative to the direction of the tangent at that point (since the points are chosen on edges). This results in a completely rotationally invariant descriptor. But of course this is not always desired since some local features lose their discriminative power if not measured relative to the same frame. Many applications in fact forbid rotational invariance e.g. distinguishing a "6" from a "9". == Use in shape matching == A complete system that uses shape contexts for shape matching consists of the following steps (which will be covered in more detail in the Details of Implementation section): Randomly select a set of points that lie on the edges of a known shape and another set of points on an unknown shape. Compute the shape context of each point found in step 1. Match each point from the known shape to a point on an unknown shape. To minimize the cost of matching, first choose a transformation (e.g. affine, thin plate spline, etc.) that warps the edges of the known shape to the unknown (essentially aligning the two shapes). Then select the point on the unknown shape that most closely corresponds to each warped point on the known shape. Calculate the "shape distance" between each pair of points on the two shapes. Use a weighted sum of the shape context distance, the image appearance distance, and the bending energy (a measure of how much transformation is required to bring the two shapes into alignment). To identify the unknown shape, use a nearest-neighbor classifier to compare its shape distance to shape distances of known objects. == Details of implementation == === Step 1: Finding a list of points on shape edges === The approach assumes that the shape of an object is essentially captured by a finite subset of the points on the internal or external contours on the object. These can be simply obtained using the Canny edge detector and picking a random set of points from the edges. Note that these points need not and in general do not correspond to key-points such as maxima of curvature or inflection points. It is preferable to sample the shape with roughly uniform spacing, though it is not critical. === Step 2: Computing the shape context === This step is described in detail in the Theory section. === Step 3: Computing the cost matrix === Consider two points p and q that have normalized K-bin histograms (i.e. shape contexts) g(k) and h(k). As shape contexts are distributions represented as histograms, it is natural to use the χ2 test statistic as the "shape context cost" of matching the two points: C S = 1 2 ∑ k = 1 K [ g ( k ) − h ( k ) ] 2 g ( k ) + h ( k ) {\displaystyle C_{S}={\frac {1}{2}}\sum _{k=1}^{K}{\frac {[g(k)-h(k)]^{2}}{g(k)+h(k)}}} The values of this range from 0 to 1. In addition to the shape context cost, an extra cost based on the appearance can be added. For instance, it could be a measure of tangent angle dissimilarity (particularly useful in digit recognition): C A = 1 2 ‖ ( cos ⁡ ( θ 1 ) sin ⁡ ( θ 1 ) ) − ( cos ⁡ ( θ 2 ) sin ⁡ ( θ 2 ) ) ‖ {\displaystyle C_{A}={\frac {1}{2}}{\begin{Vmatrix}{\dbinom {\cos(\theta _{1})}{\sin(\theta _{1})}}-{\dbinom {\cos(\theta _{2})}{\sin(\theta _{2})}}\end{Vmatrix}}} This is half the length of the chord in unit circle between the unit vectors with angles θ 1 {\displaystyle \theta _{1}} and θ 2 {\displaystyle \theta _{2}} . Its values also range from 0 to 1. Now the total cost of matching the two points could be a weighted-sum of the two costs: C = ( 1 − β ) C S + β C A {\displaystyle C=(1-\beta )C_{S}+\beta C_{A}\!\,} Now for each point pi on the first shape and a point qj on the second shape, calculate the cost as described and call it Ci,j. This is the cost matrix. === Step 4: Finding the matching that minimizes total cost === Now, a one-to-one matching π ( i ) {\displaystyle \pi (i)} that matches each point pi on shape 1 and qj on shape 2 that minimizes the total cost of matching, H ( π ) = ∑ i C ( p i , q π ( i ) ) {\displaystyle H(\pi )=\sum _{i}C\left(p_{i},q_{\pi (i)}\right)} is needed. This can be done in O ( N 3 ) {\displaystyle O(N^{3})} time using the Hungarian method, although there are more efficient algorithms. To have robust handling of outliers, one can add "dummy" nodes that have a constant but reasonably large cost of matching to the cost matrix. This would cause the matching algorithm to match outliers to a "dummy" if there is no real match. === Step 5: Modeling transformation === Given the set of correspondences between a finite set of points on the two shapes, a transformation T : R 2 → R 2 {\displaystyle T:\mathbb {R} ^{2}\to \mathbb {R} ^{2}} can be estimated to map any point from one shape to the other. There are several choices for this transformation, described below. ==== Affine ==== The affine model is a standard choice: T ( p ) = A p + o {\displaystyle T(p)=Ap+o\!} . The least squares solution for the matrix A {\displaystyle A} and the translational offset vector o is obtained by: o = 1 n ∑ i = 1 n ( p i − q π ( i ) ) , A = ( Q + P ) t {\displaystyle o={\frac {1}{n}}\sum _{i=1}^{n}\left(p_{i}-q_{\pi (i)}\right),A=(Q^{+}P)^{t}} Where P = ( 1 p 11 p 12 ⋮ ⋮ ⋮ 1 p n 1 p n 2 ) {\displaystyle P={\begin{pmatrix}1&p_{11}&p_{12}\\\vdots &\vdots &\vdots \\1&p_{n1}&p_{n2}\end{pmatrix}}} with a similar expression for Q {\displaystyle Q\!} . Q + {\displaystyle Q^{+}\!} is the pseudoinverse of Q {\displaystyle Q\!} . ==== Thin plate spline ==== The thin plate spline (TPS) model is the most widely used model for transformations when working with shape contexts. A 2D transformation can be separated into two TPS function to model a coordinate transform: T ( x , y ) = ( f x ( x , y ) , f y ( x , y ) ) {\displaystyle T(x,y)=\left(f_{x}(x,y),f_{y}(x,y)\right)} where each of the ƒx and ƒy have the form: f ( x , y ) = a 1 + a x x + a y y + ∑ i = 1 n ω i U ( ‖ ( x i , y i ) − ( x , y ) ‖ ) , {\displaystyle f(x,y)=a_{1}+a_{x}x+a_{y}y+\sum _{i=1}^{n}\omega _{i}U\left({\begin{Vmatrix}(x_{i},y_{i})-(x,y)\end{Vmatrix}}\right),} and the kernel function U ( r ) {\displaystyle U(r)\!} is defined by U ( r ) = r 2 log ⁡ r 2 {\displaystyle U(r)=r^{2}\log r^{2}\!} . The exact details of how to solve for the parameters can be found elsewhere but it essentially involves solving a linear system of equations. The bending energy (a measure of how much transformation is needed to align the points) will also be easily obtained. ==== Regularized TPS ==== The TPS formulation above has exact matching requirement for the pairs of points on the two shapes. For noisy data, it is best to

    Read more →
  • Cache language model

    Cache language model

    A cache language model is a type of statistical language model. These occur in the natural language processing subfield of computer science and assign probabilities to given sequences of words by means of a probability distribution. Statistical language models are key components of speech recognition systems and of many machine translation systems: they tell such systems which possible output word sequences are probable and which are improbable. The particular characteristic of a cache language model is that it contains a cache component and assigns relatively high probabilities to words or word sequences that occur elsewhere in a given text. The primary, but by no means sole, use of cache language models is in speech recognition systems. To understand why it is a good idea for a statistical language model to contain a cache component one might consider someone who is dictating a letter about elephants to a speech recognition system. Standard (non-cache) N-gram language models will assign a very low probability to the word "elephant" because it is a very rare word in English. If the speech recognition system does not contain a cache component, the person dictating the letter may be annoyed: each time the word "elephant" is spoken another sequence of words with a higher probability according to the N-gram language model may be recognized (e.g., "tell a plan"). These erroneous sequences will have to be deleted manually and replaced in the text by "elephant" each time "elephant" is spoken. If the system has a cache language model, "elephant" will still probably be misrecognized the first time it is spoken and will have to be entered into the text manually; however, from this point on the system is aware that "elephant" is likely to occur again – the estimated probability of occurrence of "elephant" has been increased, making it more likely that if it is spoken it will be recognized correctly. Once "elephant" has occurred several times, the system is likely to recognize it correctly every time it is spoken until the letter has been completely dictated. This increase in the probability assigned to the occurrence of "elephant" is an example of a consequence of machine learning and more specifically of pattern recognition. There exist variants of the cache language model in which not only single words but also multi-word sequences that have occurred previously are assigned higher probabilities (e.g., if "San Francisco" occurred near the beginning of the text subsequent instances of it would be assigned a higher probability). The cache language model was first proposed in a paper published in 1990, after which the IBM speech-recognition group experimented with the concept. The group found that implementation of a form of cache language model yielded a 24% drop in word-error rates once the first few hundred words of a document had been dictated. A detailed survey of language modeling techniques concluded that the cache language model was one of the few new language modeling techniques that yielded improvements over the standard N-gram approach: "Our caching results show that caching is by far the most useful technique for perplexity reduction at small and medium training data sizes". The development of the cache language model has generated considerable interest among those concerned with computational linguistics in general and statistical natural language processing in particular: recently, there has been interest in applying the cache language model in the field of statistical machine translation. The success of the cache language model in improving word prediction rests on the human tendency to use words in a "bursty" fashion: when one is discussing a certain topic in a certain context, the frequency with which one uses certain words will be quite different from their frequencies when one is discussing other topics in other contexts. The traditional N-gram language models, which rely entirely on information from a very small number (four, three, or two) of words preceding the word to which a probability is to be assigned, do not adequately model this "burstiness". Recently, the cache language model concept – originally conceived for the N-gram statistical language model paradigm – has been adapted for use in the neural paradigm. For instance, recent work on continuous cache language models in the recurrent neural network (RNN) setting has applied the cache concept to much larger contexts than before, yielding significant reductions in perplexity. Another recent line of research involves incorporating a cache component in a feed-forward neural language model (FN-LM) to achieve rapid domain adaptation.

    Read more →
  • DreamLab

    DreamLab

    DreamLab was a volunteer computing Android and iOS app launched in 2015 by Imperial College London and the Vodafone Foundation. It was discontinued on 2nd April 2025. == Description == The app helped to research cancer, COVID-19, new drugs and tropical cyclones. To do this, DreamLab accessed part of the device's processing power, with the user's consent, while the owner charged their smartphone, to speed up the calculations of the algorithms from Imperial College London. The aim of the tropical cyclone project was to prepare for climate change risks. Other projects aimed to find existing drugs and food molecules that could help people with COVID-19 and other diseases. The performance of 100,000 smartphones would reach the annual output of all research computers at Imperial College in just three months, with a nightly runtime of six hours. The app was developed in 2015 by the Garvan Institute of Medical Research in Sydney and the Vodafone Foundation. In May 2020, the project had over 490,000 registered users.

    Read more →
  • Superquadrics

    Superquadrics

    In mathematics, the superquadrics or super-quadrics (also superquadratics) are a family of geometric shapes defined by formulas that resemble those of ellipsoids and other quadrics, except that the squaring operations are replaced by arbitrary powers. They can be seen as the three-dimensional relatives of the superellipses. The term may refer to the solid object or to its surface, depending on the context. The equations below specify the surface; the solid is specified by replacing the equality signs by less-than-or-equal signs. The superquadrics include many shapes that resemble cubes, octahedra, cylinders, lozenges and spindles, with rounded or sharp corners. Because of their flexibility and relative simplicity, they are popular geometric modeling tools, especially in computer graphics. It becomes an important geometric primitive widely used in computer vision, robotics, and physical simulation. Some authors, such as Alan Barr, define "superquadrics" as including both the superellipsoids and the supertoroids. In modern computer vision literatures, superquadrics and superellipsoids are used interchangeably, since superellipsoids are the most representative and widely utilized shape among all the superquadrics. Comprehensive coverage of geometrical properties of superquadrics and methods of their recovery from range images and point clouds are covered in several computer vision literatures. == Formulas == === Implicit equation === The surface of the basic superquadric is given by | x | r + | y | s + | z | t = 1 {\displaystyle \left|x\right|^{r}+\left|y\right|^{s}+\left|z\right|^{t}=1} where r, s, and t are positive real numbers that determine the main features of the superquadric. Namely: less than 1: a pointy octahedron modified to have concave faces and sharp edges. exactly 1: a regular octahedron. between 1 and 2: an octahedron modified to have convex faces, blunt edges and blunt corners. exactly 2: a sphere greater than 2: a cube modified to have rounded edges and corners. infinite (in the limit): a cube Each exponent can be varied independently to obtain combined shapes. For example, if r=s=2, and t=4, one obtains a solid of revolution which resembles an ellipsoid with round cross-section but flattened ends. This formula is a special case of the superellipsoid's formula if (and only if) r = s. If any exponent is allowed to be negative, the shape extends to infinity. Such shapes are sometimes called super-hyperboloids. The basic shape above spans from -1 to +1 along each coordinate axis. The general superquadric is the result of scaling this basic shape by different amounts A, B, C along each axis. Its general equation is | x A | r + | y B | s + | z C | t = 1. {\displaystyle \left|{\frac {x}{A}}\right|^{r}+\left|{\frac {y}{B}}\right|^{s}+\left|{\frac {z}{C}}\right|^{t}=1.} === Parametric description === Parametric equations in terms of surface parameters u and v (equivalent to longitude and latitude if m equals 2) are x ( u , v ) = A g ( v , 2 r ) g ( u , 2 r ) y ( u , v ) = B g ( v , 2 s ) f ( u , 2 s ) z ( u , v ) = C f ( v , 2 t ) − π 2 ≤ v ≤ π 2 , − π ≤ u < π , {\displaystyle {\begin{aligned}x(u,v)&{}=Ag\left(v,{\frac {2}{r}}\right)g\left(u,{\frac {2}{r}}\right)\\y(u,v)&{}=Bg\left(v,{\frac {2}{s}}\right)f\left(u,{\frac {2}{s}}\right)\\z(u,v)&{}=Cf\left(v,{\frac {2}{t}}\right)\\&-{\frac {\pi }{2}}\leq v\leq {\frac {\pi }{2}},\quad -\pi \leq u<\pi ,\end{aligned}}} where the auxiliary functions are f ( ω , m ) = sgn ⁡ ( sin ⁡ ω ) | sin ⁡ ω | m g ( ω , m ) = sgn ⁡ ( cos ⁡ ω ) | cos ⁡ ω | m {\displaystyle {\begin{aligned}f(\omega ,m)&{}=\operatorname {sgn}(\sin \omega )\left|\sin \omega \right|^{m}\\g(\omega ,m)&{}=\operatorname {sgn}(\cos \omega )\left|\cos \omega \right|^{m}\end{aligned}}} and the sign function sgn(x) is sgn ⁡ ( x ) = { − 1 , x < 0 0 , x = 0 + 1 , x > 0. {\displaystyle \operatorname {sgn}(x)={\begin{cases}-1,&x<0\\0,&x=0\\+1,&x>0.\end{cases}}} === Spherical product === Barr introduces the spherical product which given two plane curves produces a 3D surface. If f ( μ ) = ( f 1 ( μ ) f 2 ( μ ) ) , g ( ν ) = ( g 1 ( ν ) g 2 ( ν ) ) {\displaystyle f(\mu )={\begin{pmatrix}f_{1}(\mu )\\f_{2}(\mu )\end{pmatrix}},\quad g(\nu )={\begin{pmatrix}g_{1}(\nu )\\g_{2}(\nu )\end{pmatrix}}} are two plane curves then the spherical product is h ( μ , ν ) = f ( μ ) ⊗ g ( ν ) = ( f 1 ( μ ) g 1 ( ν ) f 1 ( μ ) g 2 ( ν ) f 2 ( μ ) ) {\displaystyle h(\mu ,\nu )=f(\mu )\otimes g(\nu )={\begin{pmatrix}f_{1}(\mu )\ g_{1}(\nu )\\f_{1}(\mu )\ g_{2}(\nu )\\f_{2}(\mu )\end{pmatrix}}} This is similar to the typical parametric equation of a sphere: x = x 0 + r sin ⁡ θ cos ⁡ φ y = y 0 + r sin ⁡ θ sin ⁡ φ ( 0 ≤ θ ≤ π , 0 ≤ φ < 2 π ) z = z 0 + r cos ⁡ θ {\displaystyle {\begin{aligned}x&=x_{0}+r\sin \theta \;\cos \varphi \\y&=y_{0}+r\sin \theta \;\sin \varphi \qquad (0\leq \theta \leq \pi ,\;0\leq \varphi <2\pi )\\z&=z_{0}+r\cos \theta \end{aligned}}} which give rise to the name spherical product. Barr uses the spherical product to define quadric surfaces, like ellipsoids, and hyperboloids as well as the torus, superellipsoid, superquadric hyperboloids of one and two sheets, and supertoroids. == Plotting code == The following GNU Octave code generates a mesh approximation of a superquadric:

    Read more →
  • Attempto Controlled English

    Attempto Controlled English

    Attempto Controlled English (ACE) is a controlled natural language, i.e. a subset of standard English with a restricted syntax and restricted semantics described by a small set of construction and interpretation rules. It has been under development at the University of Zurich since 1995. In 2013, ACE version 6.7 was announced. ACE can serve as knowledge representation, specification, and query language, and is intended for professionals who want to use formal notations and formal methods, but may not be familiar with them. Though ACE appears perfectly natural—it can be read and understood by any speaker of English—it is in fact a formal language. ACE and its related tools have been used in the fields of software specifications, theorem proving, proof assistants, text summaries, ontologies, rules, querying, medical documentation and planning. Here are some simple examples: Every woman is a human. A woman is a human. A man tries-on a new tie. If the tie pleases his wife then the man buys it. ACE construction rules require that each noun be introduced by a determiner (a, every, no, some, at least 5, ...). Regarding the list of examples above, ACE interpretation rules decide that (1) is interpreted as universally quantified, while (2) is interpreted as existentially quantified. Sentences like "Women are human" do not follow ACE syntax and are consequently not valid. Interpretation rules resolve the anaphoric references in (3): the tie and it of the second sentence refer to a new tie of the first sentence, while his and the man of the second sentence refer to a man of the first sentence. Thus an ACE text is a coherent entity of anaphorically linked sentences. The Attempto Parsing Engine (APE) translates ACE texts unambiguously into discourse representation structures (DRS) that use a variant of the language of first-order logic. A DRS can be further translated into other formal languages, for instance AceRules with various semantics, OWL, and SWRL. Translating an ACE text into (a fragment of) first-order logic allows users to reason about the text, for instance to verify, to validate, and to query it. == Overview == As an overview of the current version 6.6 of ACE this section: Briefly describes the vocabulary Gives an account of the syntax Summarises the handling of ambiguity Explains the processing of anaphoric references. === Vocabulary === The vocabulary of ACE comprises: Predefined function words (e.g. determiners, conjunctions) Predefined phrases (e.g. "it is false that ...", "it is possible that ...") Content words (e.g. nouns, verbs, adjectives, adverbs). === Grammar === The grammar of ACE defines and constrains the form and the meaning of ACE sentences and texts. ACE's grammar is expressed as a set of construction rules. The meaning of sentences is described as a small set of interpretation rules. A Troubleshooting Guide describes how to use ACE and how to avoid pitfalls. ==== ACE texts ==== An ACE text is a sequence of declarative sentences that can be anaphorically interrelated. Furthermore, ACE supports questions and commands. ==== Simple sentences ==== A simple sentence asserts that something is the case—a fact, an event, a state. The temperature is −2 °C. A customer inserts 2 cards. A card and a code are valid. Simple ACE sentences have the following general structure: subject + verb + complements + adjuncts Every sentence has a subject and a verb. Complements (direct and indirect objects) are necessary for transitive verbs (insert something) and ditransitive verbs (give something to somebody), whereas adjuncts (adverbs, prepositional phrases) are optional. All elements of a simple sentence can be elaborated upon to describe the situation in more detail. To further specify the nouns customer and card, we could add adjectives: A trusted customer inserts two valid cards. possessive nouns and of-prepositional phrases: John's customer inserts a card of Mary. or variables as appositions: John inserts a card A. Other modifications of nouns are possible through relative sentences: A customer who is trusted inserts a card that he owns. which are described below since they make a sentence composite. We can also detail the insertion event, e.g. by adding an adverb: A customer inserts some cards manually. or, equivalently: A customer manually inserts some cards. or, by adding prepositional phrases: A customer inserts some cards into a slot. We can combine all of these elaborations to arrive at: John's customer who is trusted inserts a valid card of Mary manually into a slot A. ==== Composite sentences ==== Composite sentences are recursively built from simpler sentences through coordination, subordination, quantification, and negation. Note that ACE composite sentences overlap with what linguists call compound sentences and complex sentences. ===== Coordination ===== Coordination by and is possible between sentences and between phrases of the same syntactic type. A customer inserts a card and the machine checks the code. There is a customer who inserts a card and who enters a code. A customer inserts a card and enters a code. An old and trusted customer enters a card and a code. Note that the coordination of the noun phrases a card and a code represents a plural object. Coordination by or is possible between sentences, verb phrases, and relative clauses. A customer inserts a card or the machine checks the code. A customer inserts a card or enters a code. A customer owns a card that is invalid or that is damaged. Coordination by and and or is governed by the standard binding order of logic, i.e. and binds stronger than or. Commas can be used to override the standard binding order. Thus the sentence: A customer inserts a VisaCard or inserts a MasterCard, and inserts a code. means that the customer inserts a VisaCard and a code, or alternatively a MasterCard and a code. ===== Subordination ===== There are four constructs of subordination: relative sentences, if-then sentences, modality, and sentence subordination. Relative sentences starting with who, which, and that allow to add detail to nouns: A customer who is trusted inserts a card that he owns. With the help of if-then sentences we can specify conditional or hypothetical situations: If a card is valid then a customer inserts it. Note the anaphoric reference via the pronoun it in the then-part to the noun phrase a card in the if-part. Modality allows us to express possibility and necessity: A trusted customer can/must insert a card. It is possible/necessary that a trusted customer inserts a card. Sentence subordination comes in various forms: It is true/false that a customer inserts a card. It is not provable that a customer inserts a card. A clerk believes that a customer inserts a card. ===== Quantification ===== Quantification allows us to speak about all objects of a certain class (universal quantification), or to denote explicitly the existence of at least one object of this class (existential quantification). The textual occurrence of a universal or existential quantifier opens its scope that extends to the end of the sentence, or in coordinations to the end of the respective coordinated sentence. To express that all involved customers insert cards we can write Every customer inserts a card. This sentence means that each customer inserts a card that may, or may not, be the same as the one inserted by another customer. To specify that all customers insert the same card—however unrealistic that situation seems—we can write: A card is inserted by every customer. or, equivalently: There is a card that every customer inserts. To state that every card is inserted by a customer we write: Every card is inserted by a customer. or, somewhat indirectly: For every card there is a customer who inserts it. ===== Negation ===== Negation allows us to express that something is not the case: A customer does not insert a card. A card is not valid. To negate something for all objects of a certain class one uses no: No customer inserts more than 2 cards. or, there is no: There is no customer who inserts a card. To negate a complete statement one uses sentence negation: It is false that a customer inserts a card. These forms of negation are logical negations, i.e. they state that something is provably not the case. Negation as failure states that a state of affairs cannot be proved, i.e. there is no information whether the state of affairs is the case or not. It is not provable that a customer inserts a card. ==== Queries ==== ACE supports two forms of queries: yes/no-queries and wh-queries. Yes/no-queries ask for the existence or non-existence of a specified situation. If we specified: A customer inserts a card. then we can ask: Does a customer insert a card? to get a positive answer. Note that interrogative sentences always end with a question mark. With the help of wh-queries, i.e. queries with query words, we can interrogate a text for details of the specified situation. If we specified: A

    Read more →
  • Universal IR Evaluation

    Universal IR Evaluation

    In computer science, Universal IR Evaluation (information retrieval evaluation) aims to develop measures of database retrieval performance that shall be comparable across all information retrieval tasks. == Measures of "relevance" == IR (information retrieval) evaluation begins whenever a user submits a query (search term) to a database. If the user is able to determine the relevance of each document in the database (relevant or not relevant), then for each query, the complete set of documents is naturally divided into four distinct (mutually exclusive) subsets: relevant documents that are retrieved, not relevant documents that are retrieved, relevant documents that are not retrieved, and not relevant documents that are not retrieved. These four subsets (of documents) are denoted by the letters a, b, c, d respectively and are called Swets variables, named after their inventor. In addition to the Swets definitions, four relevance metrics have also been defined: Recall refers to the fraction of relevant documents that are retrieved (a/(a+b)), and Precision refers to the fraction of retrieved documents that are relevant (a/(a+c)). These are the most commonly used and well-known relevance metrics found in the IR evaluation literature. Two less commonly used metrics include the Fallout, i.e., the fraction of not relevant documents that are retrieved (b/(b+d)), and the Miss, which refers to the fraction of relevant documents that are not retrieved (c/(c+d)) during any given search. == Universal IR evaluation techniques == Universal IR evaluation addresses the mathematical possibilities and relationships among the four relevance metrics Precision, Recall, Fallout and Miss, denoted by P, R, F and M, respectively. One aspect of the problem involves finding a mathematical derivation of a complete set of universal IR evaluation points. The complete set of 16 points, each one a quadruple of the form (P, R, F, M), describes all the possible universal IR outcomes. For example, many of us have had the experience of querying a database and not retrieving any documents at all. In this case, the Precision would take on the undetermined form 0/0, the Recall and Fallout would both be zero, and the Miss would be any value greater than zero and less than one (assuming a mix of relevant and not relevant documents were in the database, none of which were retrieved). This universal IR evaluation point would thus be denoted by (0/0, 0, 0, M), which represents only one of the 16 possible universal IR outcomes. The mathematics of universal IR evaluation is a fairly new subject since the relevance metrics P, R, F, M were not analyzed collectively until recently (within the past decade). A lot of the theoretical groundwork has already been formulated, but new insights in this area await discovery.

    Read more →
  • Statistical shape analysis

    Statistical shape analysis

    Statistical shape analysis is an analysis of the geometrical properties of some given set of shapes by statistical methods. For instance, it could be used to quantify differences between male and female gorilla skull shapes, normal and pathological bone shapes, leaf outlines with and without herbivory by insects, etc. Important aspects of shape analysis are to obtain a measure of distance between shapes, to estimate mean shapes from (possibly random) samples, to estimate shape variability within samples, to perform clustering and to test for differences between shapes. One of the main methods used is principal component analysis (PCA). Statistical shape analysis has applications in various fields, including medical imaging, computer vision, computational anatomy, sensor measurement, and geographical profiling. == Landmark-based techniques == In the point distribution model, a shape is determined by a finite set of coordinate points, known as landmark points. These landmark points often correspond to important identifiable features such as the corners of the eyes. Once the points are collected some form of registration is undertaken. This can be a baseline methods used by Fred Bookstein for geometric morphometrics in anthropology. Or an approach like Procrustes analysis which finds an average shape. David George Kendall investigated the statistical distribution of the shape of triangles, and represented each triangle by a point on a sphere. He used this distribution on the sphere to investigate ley lines and whether three stones were more likely to be co-linear than might be expected. Statistical distribution like the Kent distribution can be used to analyse the distribution of such spaces. Alternatively, shapes can be represented by curves or surfaces representing their contours, by the spatial region they occupy. == Shape deformations == Differences between shapes can be quantified by investigating deformations transforming one shape into another. In particular a diffeomorphism preserves smoothness in the deformation. This was pioneered in D'Arcy Thompson's On Growth and Form before the advent of computers. Deformations can be interpreted as resulting from a force applied to the shape. Mathematically, a deformation is defined as a mapping from a shape x to a shape y by a transformation function Φ {\displaystyle \Phi } , i.e., y = Φ ( x ) {\displaystyle y=\Phi (x)} . Given a notion of size of deformations, the distance between two shapes can be defined as the size of the smallest deformation between these shapes. Diffeomorphometry is the focus on comparison of shapes and forms with a metric structure based on diffeomorphisms, and is central to the field of Computational anatomy. Diffeomorphic registration, introduced in the 90's, is now an important player with existing codes bases organized around ANTS, DARTEL, DEMONS, LDDMM, StationaryLDDMM, and FastLDDMM are examples of actively used computational codes for constructing correspondences between coordinate systems based on sparse features and dense images. Voxel-based morphometry (VBM) is an important technology built on many of these principles. Methods based on diffeomorphic flows are also used. For example, deformations could be diffeomorphisms of the ambient space, resulting in the LDDMM (Large Deformation Diffeomorphic Metric Mapping) framework for shape comparison.

    Read more →
  • Region Based Convolutional Neural Networks

    Region Based Convolutional Neural Networks

    Region-based Convolutional Neural Networks (R-CNN) are a family of machine learning models for computer vision, and specifically object detection and localization. The original goal of R-CNN was to take an input image and produce a set of bounding boxes as output, where each bounding box contains an object and also the category (e.g. car or pedestrian) of the object. In general, R-CNN architectures perform selective search over feature maps outputted by a CNN. R-CNN has been extended to perform other computer vision tasks, such as: tracking objects from a drone-mounted camera, locating text in an image, and enabling object detection in Google Lens. Mask R-CNN is also one of seven tasks in the MLPerf Training Benchmark, which is a competition to speed up the training of neural networks. == History == The following covers some of the versions of R-CNN that have been developed. November 2013: R-CNN. April 2015: Fast R-CNN. June 2015: Faster R-CNN. March 2017: Mask R-CNN. December 2017: Cascade R-CNN is trained with increasing Intersection over Union (IoU, also known as the Jaccard index) thresholds, making each stage more selective against nearby false positives. June 2019: Mesh R-CNN adds the ability to generate a 3D mesh from a 2D image. == Architecture == For review articles see. === Selective search === Given an image (or an image-like feature map), selective search (also called Hierarchical Grouping) first segments the image by the algorithm in (Felzenszwalb and Huttenlocher, 2004), then performs the following: Input: (colour) image Output: Set of object location hypotheses L Segment image into initial regions R = {r1, ..., rn} using Felzenszwalb and Huttenlocher (2004) Initialise similarity set S = ∅ foreach Neighbouring region pair (ri, rj) do Calculate similarity s(ri, rj) S = S ∪ s(ri, rj) while S ≠ ∅ do Get highest similarity s(ri, rj) = max(S) Merge corresponding regions rt = ri ∪ rj Remove similarities regarding ri: S = S \ s(ri, r∗) Remove similarities regarding rj: S = S \ s(r∗, rj) Calculate similarity set St between rt and its neighbours S = S ∪ St R = R ∪ rt Extract object location boxes L from all regions in R === R-CNN === With R-CNN, prediction follows a two-step process. A preprocessing selective search step generates a large set of candidate objects (typically as many as 2000), known as regions of interest (ROI). These are forwarded to a CNN, which predicts an object class score and bounding box estimate, independently for each ROI. Importantly, the ROIs are heavily filtered to remove excess candidates. This is achieved using two mechanism. Filtering begins by removing ROIs assigned to the background category. This is a specialized category, which is scored by the CNN alongside other categories. An unfortunate reality is that remaining ROIs typically suffer from heavy duplication. Namely, multiple ROIs that cover same objects in the image are all assigned non-background categories. This is resolved by a heuristic non-maximum suppression (NMS) step. === Fast R-CNN === While the original R-CNN independently computed the neural network features on each of as many as two thousand regions of interest, Fast R-CNN runs the neural network once on the whole image. At the end of the network is a ROIPooling module, which slices out each ROI from the network's output tensor, reshapes it, and classifies it. As in the original R-CNN, the Fast R-CNN uses selective search to generate its region proposals. === Faster R-CNN === While Fast R-CNN used selective search to generate ROIs, Faster R-CNN integrates the ROI generation into the neural network itself. === Mask R-CNN === While previous versions of R-CNN focused on object detections, Mask R-CNN adds instance segmentation. Mask R-CNN also replaced ROIPooling with a new method called ROIAlign, which can represent fractions of a pixel.

    Read more →