AI Coding Github

AI Coding Github — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Ogle app

    Ogle app

    Ogle is a free smartphone based social media application. It is available for iOS and Android. Ogle acts like a school wide forum that lets users and users' classmates share and interact. Users can share photos, videos, questions, even thoughts and watch submissions grow in popularity as other users vote and comment on them. == App Features == Campus Feed: Interact by watching and posting videos or pictures to your campus story. Photos and Videos: share what you want with many different timing options. Interact: Chat with friends and groups, or share a moment for all to see. Real-name system: choose to register an account with username and profile picture. Custom Stickers: Create stickers to add creativity and zest to your pictures. Flash Interaction: All private chat and group chat history will be deleted after 24 hours on Ogle Chat. == Controversies == Users can post anything on Ogle using text, photos, and videos. As a result, some Ogle user's sense of anonymity, posts have targeted specific schools and students with abusive and hurtful content. The Ogle app's user anonymity makes it difficult for school officials to quickly investigate issues that occur within the Ogle app. On March 28, 2016, three people were arrested after violent threats were made against an Anaheim high school. 18-year-old Miguel Meza was arrested Sunday afternoon during a traffic stop, along with his passenger, 23-year-old Johnny Aguilar. Police said both men had loaded handguns. Aguilar was also accused of violating his probation. "It is concerning the fact that they did have firearms, but we don't have a crystal ball. We can't determine if they possessed those firearms to engage in some kind of school violence or if they had it for another reason," Sgt. Daron Wyatt with the Anaheim Police Department said. Officials said Meza and Aguilar have known gang ties and detectives began investigating Meza after threats were made against the school on Ogle. On February 29, 2016, Santa Cruz County sheriff's deputies arrested a 16-year-old Aptos High School student Friday, accused of making an online threat of gun violence at Aptos High and Monte Vista Christian."He basically told detectives that it was all a joke. It's not a joke. You have multiple resources being spent to investigate these cases," said Santa Cruz County Sheriff's Sgt. Roy Morales. The schools remained open throughout the week, with a huge police presence on campus. In an anonymous emailed statement to the Daily Pilot on Thursday, the "Ogle team" said: "We are aware of the concern, and cyberbullying is absolutely NOT our intention for the app. Our goal for this app is to create a free and safe community space for students, for a better communication. We are currently working around the clock to improve the app. As a matter of fact, we are also in contact with local police departments, anti-bullying organizations and local high schools to try to help the students." In response to these incidents, Ogle expressed that they takes the safety of its users seriously and does not condone any type of behavior that is illegal or in violation of its content policies. The company also said it has instituted a content moderation team to increase review and identify and remove inappropriate content, and take action against “those who violate our community guidelines.”

    Read more →
  • Online analytical processing

    Online analytical processing

    In computing, online analytical processing (OLAP) (), is an approach to quickly answer multi-dimensional analytical (MDA) queries. The term OLAP was created as a slight modification of the traditional database term online transaction processing (OLTP). OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture. OLAP tools enable users to analyse multidimensional data interactively from multiple perspectives. OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, and slicing and dicing. Consolidation involves the aggregation of data that can be accumulated and computed in one or more dimensions. For example, all sales offices are rolled up to the sales department or sales division to anticipate sales trends. By contrast, the drill-down is a technique that allows users to navigate through the details. For instance, users can view the sales by individual products that make up a region's sales. Slicing and dicing is a feature whereby users can take out (slicing) a specific set of data of the OLAP cube and view (dicing) the slices from different viewpoints. These viewpoints are sometimes called dimensions (such as looking at the same sales by salesperson, or by date, or by customer, or by product, or by region, etc.). Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad hoc queries with a rapid execution time. They borrow aspects of navigational databases, hierarchical databases and relational databases. OLAP is typically contrasted to OLTP (online transaction processing), which is generally characterized by much less complex queries, in a larger volume, to process transactions rather than for the purpose of business intelligence or reporting. Whereas OLAP systems are mostly optimized for read, OLTP has to process all kinds of queries (read, insert, update and delete). == Overview of OLAP systems == At the core of any OLAP system is an OLAP cube (also called a 'multidimensional cube' or a hypercube). It consists of numeric facts called measures that are categorized by dimensions. The measures are placed at the intersections of the hypercube, which is spanned by the dimensions as a vector space. The usual interface to manipulate an OLAP cube is a matrix interface, like Pivot tables in a spreadsheet program, which performs projection operations along the dimensions, such as aggregation or averaging. The cube metadata is typically created from a star schema or snowflake schema or fact constellation of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables. Each measure can be thought of as having a set of labels, or meta-data associated with it. A dimension is what describes these labels; it provides information about the measure. A simple example would be a cube that contains a store's sales as a measure, and Date/Time as a dimension. Each Sale has a Date/Time label that describes more about that sale. For example: Sales Fact Table +-------------+----------+ | sale_amount | time_id | +-------------+----------+ Time Dimension | 930.10| 1234 |----+ +---------+-------------------+ +-------------+----------+ | | time_id | timestamp | | +---------+-------------------+ +---->| 1234 | 20080902 12:35:43 | +---------+-------------------+ === Multidimensional databases === Multidimensional structure is defined as "a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data". The structure is broken into cubes and the cubes are able to store and access data within the confines of each cube. "Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions". Even when data is manipulated it remains easy to access and continues to constitute a compact database format. The data still remains interrelated. Multidimensional structure is quite popular for analytical databases that use online analytical processing (OLAP) applications. Analytical databases use these databases because of their ability to deliver answers to complex business queries swiftly. Data can be viewed from different angles, which gives a broader perspective of a problem unlike other models. === Aggregations === It has been claimed that for complex queries OLAP cubes can produce an answer in around 0.1% of the time required for the same query on OLTP relational data. The most important mechanism in OLAP which allows it to achieve such performance is the use of aggregations. Aggregations are built from the fact table by changing the granularity on specific dimensions and aggregating up data along these dimensions, using an aggregate function (or aggregation function). The number of possible aggregations is determined by every possible combination of dimension granularities. The combination of all possible aggregations and the base data contains the answers to every query which can be answered from the data. Because usually there are many aggregations that can be calculated, often only a predetermined number are fully calculated; the remainder are solved on demand. The problem of deciding which aggregations (views) to calculate is known as the view selection problem. View selection can be constrained by the total size of the selected set of aggregations, the time to update them from changes in the base data, or both. The objective of view selection is typically to minimize the average time to answer OLAP queries, although some studies also minimize the update time. View selection is NP-complete. Many approaches to the problem have been explored, including greedy algorithms, randomized search, genetic algorithms and A search algorithm. Some aggregation functions can be computed for the entire OLAP cube by precomputing values for each cell, and then computing the aggregation for a roll-up of cells by aggregating these aggregates, applying a divide and conquer algorithm to the multidimensional problem to compute them efficiently. For example, the overall sum of a roll-up is just the sum of the sub-sums in each cell. Functions that can be decomposed in this way are called decomposable aggregation functions, and include COUNT, MAX, MIN, and SUM, which can be computed for each cell and then directly aggregated; these are known as self-decomposable aggregation functions. In other cases, the aggregate function can be computed by computing auxiliary numbers for cells, aggregating these auxiliary numbers, and finally computing the overall number at the end; examples include AVERAGE (tracking sum and count, dividing at the end) and RANGE (tracking max and min, subtracting at the end). In other cases, the aggregate function cannot be computed without analyzing the entire set at once, though in some cases approximations can be computed; examples include DISTINCT COUNT, MEDIAN, and MODE; for example, the median of a set is not the median of medians of subsets. These latter are difficult to implement efficiently in OLAP, as they require computing the aggregate function on the base data, either computing them online (slow) or precomputing them for possible rollouts (large space). == Types == OLAP systems have been traditionally categorized using the following taxonomy. === Multidimensional OLAP (MOLAP) === MOLAP (multi-dimensional online analytical processing) is the classic form of OLAP and is sometimes referred to as just OLAP. MOLAP stores this data in an optimized multi-dimensional array storage, rather than in a relational database. Some MOLAP tools require the pre-computation and storage of derived data, such as consolidations – the operation known as processing. Such MOLAP tools generally utilize a pre-calculated data set referred to as a data cube. The data cube contains all the possible answers to a given range of questions. As a result, they have a very fast response to queries. On the other hand, updating can take a long time depending on the degree of pre-computation. Pre-computation can also lead to what is known as data explosion. Other MOLAP tools, particularly those that implement the functional database model do not pre-compute derived data but make all calculations on demand other than those that were previously requested and stored in a cache. Advantages of MOLAP Fast query performance due to optimized storage, multidimensional indexing and caching. Smaller on-disk size of data compared to data stored in relational database due to compression techniques. Automated computation of higher-level aggregates of the data. It is very compact for low dimension data se

    Read more →
  • Materials informatics

    Materials informatics

    Materials informatics is a field of study that applies the principles of informatics and data science to materials science and engineering to improve the understanding, use, selection, development, and discovery of materials. The term "materials informatics" is frequently used interchangeably with "data science", "machine learning", and "artificial intelligence" by the community. This is an emerging field, with a goal to achieve high-speed and robust acquisition, management, analysis, and dissemination of diverse materials data with the goal of greatly reducing the time and risk required to develop, produce, and deploy new materials, which generally takes longer than 20 years. This field of endeavor is not limited to some traditional understandings of the relationship between materials and information. Some more narrow interpretations include combinatorial chemistry, process modeling, materials databases, materials data management, and product life cycle management. Materials informatics is at the convergence of these concepts, but also transcends them and has the potential to achieve greater insights and deeper understanding by applying lessons learned from data gathered on one type of material to others. By gathering appropriate meta data, the value of each individual data point can be greatly expanded. == Databases == Databases are essential for any informatics research and applications. In material informatics many databases exist containing both empirical data obtained experimentally, and theoretical data obtained computationally. Big data that can be used for machine learning is particularly difficult to obtain for experimental data due to the lack of a standard for reporting data and the variability in the experimental environment. This lack of big data has led to growing effort in developing machine learning techniques that utilize data extremely data sets. On the other hand, large uniform database of theoretical density functional theory (DFT) calculations exists. These databases have proven their utility in high-throughput material screening and discovery. Some common DFT databases and high throughput tools are listed below: Databases: MaterialsProject.org, MaterialsWeb.org (University of Florida) HT software: Pymatgen, MPInterfaces, Matminer == Beyond computational methods? == The concept of materials informatics is addressed by the Materials Research Society. For example, materials informatics was the theme of the December 2006 issue of the MRS Bulletin. The issue was guest-edited by John Rodgers of Innovative Materials, Inc., and David Cebon of Cambridge University, who described the "high payoff for developing methodologies that will accelerate the insertion of materials, thereby saving millions of investment dollars." The editors focused on the limited definition of materials informatics as primarily focused on computational methods to process and interpret data. They stated that "specialized informatics tools for data capture, management, analysis, and dissemination" and "advances in computing power, coupled with computational modeling and simulation and materials properties databases" will enable such accelerated insertion of materials. A broader definition of materials informatics goes beyond the use of computational methods to carry out the same experimentation, viewing materials informatics as a framework in which a measurement or computation is one step in an information-based learning process that uses the power of a collective to achieve greater efficiency in exploration. When properly organized, this framework crosses materials boundaries to uncover fundamental knowledge of the basis of physical, mechanical, and engineering properties. == Challenges == While there are many who believe in the future of informatics in the materials development and scaling process, many challenges remain. Hill, et al., write that "Today, the materials community faces serious challenges to bringing about this data-accelerated research paradigm, including diversity of research areas within materials, lack of data standards, and missing incentives for sharing, among others. Nonetheless, the landscape is rapidly changing in ways that should benefit the entire materials research enterprise." This remaining tension between traditional materials development methodologies and the use of more computationally, machine learning, and analytics approaches will likely exist for some time as the materials industry overcomes some of the cultural barriers necessary to fully embrace such new ways of thinking. == Analogy from Biology == The overarching goals of bioinformatics and systems biology may provide a useful analogy. Andrew Murray of Harvard University expresses the hope that such an approach "will save us from the era of "one graduate student, one gene, one PhD". Similarly, the goal of materials informatics is to save us from one graduate student, one alloy, one PhD. Such goals will require more sophisticated strategies and research paradigms than applying data-science methods to the same tasks set currently undertaken by students.

    Read more →
  • NCSA Brown Dog

    NCSA Brown Dog

    NCSA Brown Dog is a research project to develop a method for easily accessing historic research data stored in order to maintain the long-term viability of large bodies of scientific research. It is supported by the National Center for Supercomputing Applications (NCSA) that is funded by the National Science Foundation (NSF). == History == Brown Dog is part of the DataNet partners program funded by NSF in 2008. DataNet was conceived to address the increasingly digital and data-intensive nature of science, engineering and education. Brown Dog is part of a follow-on effort called Data Infrastructure Building Blocks (DIBBs), focused on building software to support DataNet. The project was proposed by researchers at NCSA and the University of Illinois Urbana-Champaign as well as researchers from Boston University and the University of North Carolina at Chapel Hill. == Unstructured, uncurated, long tail data == Much scientific data is smaller, unstructured and uncurated and thus not easily shared. Such data is sometimes referred to as "long tail" data. This borrows a term from statistics and refers to the tail of the distribution of project sizes. The majority of smaller projects lack the resources to properly steward the data they produce. This so-called "long tail" data, both past and present, has the potential to inform future research in many study areas. Much of this data has become inaccessible due to obsolete software and file formats. The resulting impossibility of reviewing data from older research disrupts the overall scientific research project. == Approach == Brown Dog describes itself as the "super mutt" of software (thus the name "Brown Dog"), serving as a low-level data infrastructure to interface digital data content across the internet. Its approach is to use every possible source of automated help (i.e., software) in existence in a robust and provenance-preserving manner to create a service that can deal with as much of this data as possible. The project sees the broader impact of its work in its potential to serve the general public as a sort of "DNS for data", with the goal of making all data and all file formats as accessible as webpages are today. == Technology == Brown Dog seeks to address problems involving the use of uncurated and unstructured data collections through the development of two services: the Data Access Proxy (DAP) to aid in the conversion of file formats and the Data Tilling Services (DTS) for the automatic extraction of metadata from file contents. Once developed, researchers and general public users will be able to download browser plugins and other tools from the Brown Dog tool catalog. === Data Tilling Service === Data Tilling Service (DTS) will allow users to search data collections using an existing file to discover other similar files in a collection. A DTS search field will be appended to configured browsers where example files can be dropped. This tells DTS to search all the files under a given URL for files similar to the dropped file. For example, while browsing an online image collection, a user could drop an image of three people into the search field, and the DTS would return all images in the collection that also contain three people. If DTS encounters a foreign file format, it will utilize DAP to make the file accessible. DTS also indexes the data and extract and appends metadata to files and collections enabling users to gain some sense of the type of data they are encountering. This service runs on port 9443. === Data Access Proxy === Data Access Proxy (DAP) allows users to access data files that would otherwise be unreadable. Similar to an internet gateway or Domain Name Service, the DAP configuration would be entered into a user's machine and browser settings. Data requests over HTTP would first be examined by DAP to determine if the native file format is readable on the client device. If not, DAP converts the file into the best available format readable by the client machine. Alternatively, the user could specify the desired format themselves. This service runs on port 8184. == Use cases == Brown Dog targets three use cases proposed by groups within the EarthCube research communities. Developers and researchers from these communities will work together on use cases that span geoscience, engineering, biology and social science. === Long tail vegetation data in ecology and global change biology === This use case is led by Michael Dietze, Boston University Data on the abundance, species composition, and size structure of vegetation is critically important for a wide array of sub-disciplines in ecology, conservation, natural resource management, and global change biology. However, addressing many of the pressing questions in these disciplines will require that terrestrial biosphere and hydrologic models are able to assimilate the large amount of long-tail data that exists but is largely inaccessible. The Brown Dog team in cooperation with researches from Dietze's lab will facilitate the capture of a huge body of smaller research-oriented vegetation data sets collected over many decades and historical vegetation data embedded in Public Land Survey data dating back to 1785. This data will be used as initial conditions for models, to make sense of other large data sets and for model calibration and validation. === Designing green infrastructure considering storm water and human requirements === This use case is led by Barbara Minsker], University of Illinois at Urbana-Champaign]; William Sullivan, University of Illinois at Urbana-Champaign; Arthur Schmidt, University of Illinois at Urbana-Champaign. This case study involves developing novel green infrastructure design criteria and models that integrate requirements for storm water management and ecosystem and human health and well being. To address the scientific and social problems associated with the design of green spaces, data accessibility and availability is a major challenge. This study will focus on identified areas of the Green Healthy Neighborhood Planning region within the City of Chicago where existing local sewer performance is most deficient and where changes in impervious area through green infrastructure would be beneficial to under served neighborhoods. Brown Dog will be used to extract long-tail experimental data on human landscape preferences and health impacts. This data will be used to develop a human health impacts model that will then be linked together with a terrestrial biosphere model and a storm water model using Brown Dog technology. === Development and application for critical zone studies === This use case is led by Praveen Kumar, University of Illinois at Urbana-Champaign Critical Zone (CZ) is the "skin" of the earth that extends from the treetops to the bedrock that is created by life processes working at scales from microbes to biomes. The Critical Zone supports all terrestrial living systems. Its upper part is the bio-mantle. This is where terrestrial biota live, reproduce, use and expend energy, and where their wastes and remains accumulate and decompose. It encompasses the soil, which acts as a geomembrane through which water and solutes, energy, gases, solids, and organisms interact with the atmosphere, biosphere, hydrosphere, and lithosphere. A variety of drivers affect this bio-dynamic zone, ranging from climate and deforestation to agriculture, grazing and human development. Understanding and predicting these effects is central to managing and sustaining vital ecosystem services such as soil fertility, water purification, and production of food resources, and, at larger scales, global carbon cycling and carbon sequestration. The CZ provides a unifying framework for integrating terrestrial surface and near-surface environments, and reflects an intricate web of biological and chemical processes and human impacts occurring at vastly different temporal and spatial scales. The nature of these data create significant challenges for inter-disciplinary studies of the CZ because integration of the variety and number of data products and models has been a barrier. On the other hand, CZ data provides an excellent opportunity for defining, testing and implementing Brown Dog technologies. In this context "unstructured" data is viewed broadly as consisting of a collection of heterogeneous data with formats that reflect temporal and disciplinary legacies, data from emerging low cost open hardware based sensors and embedded sensor networks that lack well defined metadata and sensor characteristics, as well as data that are available as maps, images and text. == NSF Award == CIF21 DIBBs: Brown Dog was awarded in the winter of 2013 with a start date of October 1, 2013. Estimated expiration date is September 30, 2018. The award amount was $10,519,716.00, the largest DIBB award. The principal investigator is Kenton McHenry of NCSA at the University of Illinois at Urbana-Champaign. Coleaders are Jong Lee NCSA/UIU

    Read more →
  • ELMo

    ELMo

    ELMo (embeddings from language model) is a word embedding method for representing a sequence of words as a corresponding sequence of vectors. It was created by researchers at the Allen Institute for Artificial Intelligence, and University of Washington and first released in February 2018. It is a bidirectional LSTM which takes character-level as inputs and produces word-level embeddings, trained on a corpus of about 30 million sentences and 1 billion words. The architecture of ELMo accomplishes a contextual understanding of tokens. Deep contextualized word representation is useful for many natural language processing tasks, such as coreference resolution and polysemy resolution. ELMo was historically important as a pioneer of self-supervised generative pretraining followed by fine-tuning, where a large model is trained to reproduce a large corpus, then the large model is augmented with additional task-specific weights and fine-tuned on supervised task data. It was an instrumental step in the evolution towards transformer-based language modelling. == Architecture == ELMo is a multilayered bidirectional LSTM on top of a token embedding layer. The output of all LSTMs concatenated together consists of the token embedding. The input text sequence is first mapped by an embedding layer into a sequence of vectors. Then two parts are run in parallel over it. The forward part is a 2-layered LSTM with 4096 units and 512 dimension projections, and a residual connection from the first to second layer. The backward part has the same architecture, but processes the sequence back-to-front. The outputs from all 5 components (embedding layer, two forward LSTM layers, and two backward LSTM layers) are concatenated and multiplied by a linear matrix ("projection matrix") to produce a 512-dimensional representation per input token. ELMo was pretrained on a text corpus of 1 billion words. The forward part is trained by repeatedly predicting the next token, and the backward part is trained by repeatedly predicting the previous token. After the ELMo model is pretrained, its parameters are frozen, except for the projection matrix, which can be fine-tuned to minimize loss on specific language tasks. This is an early example of the pretraining-fine-tune paradigm. The original paper demonstrated this by improving state of the art on six benchmark NLP tasks. === Contextual word representation === The architecture of ELMo accomplishes a contextual understanding of tokens. For example, the first forward LSTM of ELMo would process each input token in the context of all previous tokens, and the first backward LSTM would process each token in the context of all subsequent tokens. The second forward LSTM would then incorporate those to further contextualize each token. Deep contextualized word representation is useful for many natural language processing tasks, such as coreference resolution and polysemy resolution. For example, consider the sentenceShe went to the bank to withdraw money.In order to represent the token "bank", the model must resolve its polysemy in context. The first forward LSTM would process "bank" in the context of "She went to the", which would allow it to represent the word to be a location that the subject is going towards. The first backward LSTM would process "bank" in the context of "to withdraw money", which would allow it to disambiguate the word as referring to a financial institution. The second forward LSTM can then process "bank" using the representation vector provided by the first backward LSTM, thus allowing it to represent it to be a financial institution that the subject is going towards. == Historical context == ELMo is one link in a historical evolution of language modelling. Consider a simple problem of document classification, where we want to assign a label (e.g., "spam", "not spam", "politics", "sports") to a given piece of text. The simplest approach is the "bag of words" approach, where each word in the document is treated independently, and its frequency is used as a feature for classification. This was computationally cheap but ignored the order of words and their context within the sentence. GloVe and Word2Vec built upon this by learning fixed vector representations (embeddings) for words based on their co-occurrence patterns in large text corpora. Like BERT (but unlike "bag of words" such as Word2Vec and GloVe), ELMo word embeddings are context-sensitive, producing different representations for words that share the same spelling. It was trained on a corpus of about 30 million sentences and 1 billion words. Previously, bidirectional LSTM was used for contextualized word representation. ELMo applied the idea to a large scale, achieving state of the art performance. After the 2017 publication of Transformer architecture, the architecture of ELMo was changed from a multilayered bidirectional LSTM to a Transformer encoder, giving rise to BERT. BERT has a similar pretrain-fine-tune workflow, but uses a Transformer with implications for more parallelizable training.

    Read more →
  • List of artificial intelligence projects

    List of artificial intelligence projects

    The following is a list of current and past, non-classified notable artificial intelligence projects. == Specialized projects == === Brain-inspired === Blue Brain Project, an attempt to create a synthetic brain by reverse-engineering the mammalian brain down to the molecular level. Google Brain, a deep learning project part of Google X attempting to have intelligence similar or equal to human-level. Human Brain Project, ten-year scientific research project, based on exascale supercomputers. === Cognitive architectures === 4CAPS, developed at Carnegie Mellon University under Marcel A. Just ACT-R, developed at Carnegie Mellon University under John R. Anderson. AIXI, Universal Artificial Intelligence developed by Marcus Hutter at IDSIA and ANU. CALO, a DARPA-funded, 25-institution effort to integrate many artificial intelligence approaches (natural language processing, speech recognition, machine vision, probabilistic logic, planning, reasoning, many forms of machine learning) into an AI assistant that learns to help manage your office environment. CHREST, developed under Fernand Gobet at Brunel University and Peter C. Lane at the University of Hertfordshire. CLARION, developed under Ron Sun at Rensselaer Polytechnic Institute and University of Missouri. CoJACK, an ACT-R inspired extension to the JACK multi-agent system that adds a cognitive architecture to the agents for eliciting more realistic (human-like) behaviors in virtual environments. Copycat, by Douglas Hofstadter and Melanie Mitchell at the Indiana University. DUAL, developed at the New Bulgarian University under Boicho Kokinov. FORR developed by Susan L. Epstein at The City University of New York. IDA and LIDA, implementing Global Workspace Theory, developed under Stan Franklin at the University of Memphis. OpenCog Prime, developed using the OpenCog Framework. Procedural Reasoning System (PRS), developed by Michael Georgeff and Amy L. Lansky at SRI International. Psi-Theory developed under Dietrich Dörner at the Otto-Friedrich University in Bamberg, Germany. Soar, developed under Allen Newell and John Laird at Carnegie Mellon University and the University of Michigan. Society of Mind and its successor The Emotion Machine proposed by Marvin Minsky. Subsumption architectures, developed e.g. by Rodney Brooks (though it could be argued whether they are cognitive). === Games === AlphaGo, software developed by Google that plays the Chinese board game Go. Chinook, a computer program that plays English draughts; the first to win the world champion title in the competition against humans. Deep Blue, a chess-playing computer developed by IBM which beat Garry Kasparov in 1997. Halite, an artificial intelligence programming competition created by Two Sigma in 2016. Libratus, a poker AI that beat world-class poker players in 2017, intended to be generalisable to other applications. The Matchbox Educable Noughts and Crosses Engine (sometimes called the Machine Educable Noughts and Crosses Engine or MENACE) was a mechanical computer made from 304 matchboxes designed and built by artificial intelligence researcher Donald Michie in 1961. Quick, Draw!, an online game developed by Google that challenges players to draw a picture of an object or idea and then uses a neural network to guess what the drawing is. The Samuel Checkers-playing Program (1959) was among the world's first successful self-learning programs, and as such a very early demonstration of the fundamental concept of artificial intelligence (AI). Stockfish AI, an open source chess engine currently ranked the highest in many computer chess rankings. TD-Gammon, a program that learned to play world-class backgammon partly by playing against itself (temporal difference learning with neural networks). === Internet activism === Serenata de Amor, project for the analysis of public expenditures and detect discrepancies. === Knowledge and reasoning === Alice (Microsoft), a project from Microsoft Research Lab aimed at improving decision-making in Economics Braina, an intelligent personal assistant application with a voice interface for Windows OS. Cyc, an attempt to assemble an ontology and database of everyday knowledge, enabling human-like reasoning. Eurisko, a language by Douglas Lenat for solving problems which consists of heuristics, including some for how to use and change its heuristics. Google Now, an intelligent personal assistant with a voice interface in Google's Android and Apple Inc.'s iOS, as well as Google Chrome web browser on personal computers. Holmes a new AI created by Wipro. Microsoft Cortana, an intelligent personal assistant with a voice interface in Microsoft's various Windows 10 editions. MindsDB, is an AI automation platform for building AI/ML powered features and applications. Mycin, an early medical expert system. Open Mind Common Sense, a project based at the MIT Media Lab to build a large common sense knowledge base from online contributions. Siri, an intelligent personal assistant and knowledge navigator with a voice-interface in Apple Inc.'s iOS and macOS. SNePS, simultaneously a logic-based, frame-based, and network-based knowledge representation, reasoning, and acting system. Viv (software), a new AI by the creators of Siri. Wolfram Alpha, an online service that answers queries by computing the answer from structured data. === Motion and manipulation === AIBO, the robot pet for the home, grew out of Sony's Computer Science Laboratory (CSL). Cog, a robot developed by MIT to study theories of cognitive science and artificial intelligence, now discontinued. === Music === Melomics, a bioinspired technology for music composition and synthesization of music, where computers develop their own style, rather than mimic musicians. === Natural language processing === AIML, an XML dialect for creating natural language software agents. Apache Lucene, a high-performance, full-featured text search engine library written entirely in Java. Apache OpenNLP, a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking and parsing. Artificial Linguistic Internet Computer Entity (A.L.I.C.E.), a natural language processing chatterbot. ChatGPT, a chatbot built on top of OpenAI's GPT-3.5 and GPT-4 family of large language models. Claude, a family of large language models developed by Anthropic and launched in 2023. Claude LLMs achieved high coding scores in several recognized LLM benchmarks. Cleverbot, successor to Jabberwacky, now with 170m lines of conversation, Deep Context, fuzziness and parallel processing. Cleverbot learns from around 2 million user interactions per month. DeepSeek: Chinese chatbot funded by hedge fund High-Flyer. DBRX, 136 billion parameter open sourced large language model developed by Mosaic ML and Databricks. ELIZA, a famous 1966 computer program by Joseph Weizenbaum, which parodied person-centered therapy. FreeHAL, a self-learning conversation simulator (chatterbot) which uses semantic nets to organize its knowledge to imitate a very close human behavior within conversations. Gemini, a family of multimodal large language model developed by Google's DeepMind. Drives the Gemini chatbot, formerly known as Bard. GigaChat, a chatbot by Russian Sberbank. GPT-3, a 2020 language model developed by OpenAI that can produce text difficult to distinguish from that written by a human. Jabberwacky, a chatbot by Rollo Carpenter, aiming to simulate natural human chat. LaMDA, a family of conversational neural language models developed by Google. LLaMA, a 2023 language model family developed by Meta that includes 7, 13, 33 and 65 billion parameter models.[1] Mycroft, a free and open-source intelligent personal assistant that uses a natural language user interface. PARRY, another early chatterbot, written in 1972 by Kenneth Colby, attempting to simulate a paranoid schizophrenic. SHRDLU, an early natural language processing computer program developed by Terry Winograd at MIT from 1968 to 1970. SYSTRAN, a machine translation technology by the company of the same name, used by Yahoo!, AltaVista and Google, among others. === Speech recognition === CMU Sphinx, a group of speech recognition systems developed at Carnegie Mellon University. DeepSpeech, an open-source Speech-To-Text engine based on Baidu's deep speech research paper. Whisper, an open-source speech recognition system developed at OpenAI. === Speech synthesis === 15.ai, a real-time artificial intelligence text-to-speech tool developed by an anonymous researcher from MIT. Amazon Polly, a speech synthesis software by Amazon. Festival Speech Synthesis System, a general multi-lingual speech synthesis system developed at the Centre for Speech Technology Research (CSTR) at the University of Edinburgh. WaveNet, a deep neural network for generating raw audio. === Video === CapCut is a video editor tool, developed

    Read more →
  • Documentalist

    Documentalist

    A documentalist is a professional, trained in documentation science and specializing in assisting researchers in their search for scientific and technical documentation. With the development of bibliographical databases such as MEDLINE, documentalists were professionals who searched such databases on the behalf of users. When the field of documentation changed its name to information science, the terms information specialist or information professional often replaced the term documentalist.

    Read more →
  • Artificial empathy

    Artificial empathy

    Artificial empathy or computational empathy is the development of AI systems—such as companion robots or virtual agents—that can detect emotions and respond to them in an empathic way. Although such technology can be perceived as scary or threatening, it could also have a significant advantage over humans for roles in which emotional expression can be important, such as in the health care sector. An October 2025 review and meta-analysis in the British Medical Bulletin found that AI chatbots were rated as showing more empathy than human healthcare professionals in 13 of 15 studies that compared them. Care-givers who perform emotional labor above and beyond the requirements of paid labor can experience chronic stress or burnout, and can become desensitized to patients. Artificial empathy could also help the socialization of care-givers, or serve as role model for emotional detachment. A broader definition of artificial empathy is "the ability of nonhuman models to predict a person's internal state (e.g., cognitive, affective, physical) given the signals (s)he emits (e.g., facial expression, voice, gesture) or to predict a person's reaction (including, but not limited to internal states) when he or she is exposed to a given set of stimuli (e.g., facial expression, voice, gesture, graphics, music, etc.)". A 2025 study reported that some multimodal large language models can recognize basic facial expressions with human-level accuracy on a commonly used research dataset of posed facial expressions. == Areas of research == There are a variety of philosophical, theoretical, and applicative questions related to artificial empathy. For example: Which conditions would have to be met for a robot to respond competently to a human emotion? What models of empathy can or should be applied to Social and Assistive Robotics? Must the interaction of humans with robots imitate affective interaction between humans? Can a robot help science learn about affective development of humans? Would robots create unforeseen categories of inauthentic relations? What relations with robots can be considered authentic? How can we assess artificial empathy in AI systems? == Examples of artificial empathy research and practice == People often communicate and make decisions based on inferences about each other's internal states (e.g., emotional, cognitive, and physical states) that are in turn based on signals emitted by the person such as facial expression, body gesture, voice, and words. Broadly speaking, artificial empathy focuses on developing non-human models that achieve similar objectives using similar data. === Streams of artificial empathy research === Artificial empathy has been applied in various research disciplines, including artificial intelligence and business. Two main streams of research in this domain are: the use of nonhuman models to predict a person's internal state (e.g., cognitive, affective, physical) given the signals he or she emits (e.g., facial expression, voice, gesture) the use of nonhuman models to predict a person's reaction when he or she is exposed to a given set of stimuli (e.g., facial expression, voice, gesture, graphics, music, etc.). Research on affective computing, such as emotional speech recognition and facial expression detection, falls within the first stream of artificial empathy. Contexts that have been studied include oral interviews, call centers, human-computer interaction, sales pitches, and financial reporting. The second stream of artificial empathy has been researched more in marketing contexts, such as advertising, branding, customer reviews, in-store recommendation systems, movies, and online dating. === Artificial empathy applications in practice === With the increasing volume of visual, audio, and text data in commerce, many business applications for artificial empathy have followed. For example, Affectiva analyses viewers' facial expressions from video recordings while they are watching video advertisements in order to optimize the content design of video ads. Software like HireVue, BarRaiser, a hiring intelligence firm, helps firms make recruitment decisions by analyzing audio and video information from candidates' video interviews. Lapetus Solutions develops a model to estimate an individual's longevity, health status, and disease susceptibility from a face photo. Their technology has been applied in the insurance industry. == Artificial empathy and human services == Although artificial intelligence cannot yet replace social workers themselves, the technology has been deployed in that field. Florida State University published a study about Artificial Intelligence being used in the human services field. The research used computer algorithms to analyze health records for combinations of risk factors that could predict a future suicide attempt. The article reports, "machine learning—a future frontier for artificial intelligence—can predict with 80% to 90% accuracy whether someone will attempt suicide as far off as two years into the future. The algorithms become even more accurate as a person's suicide attempt gets closer. For example, the accuracy climbs to 92% one week before a suicide attempt when artificial intelligence focuses on general hospital patients". Such algorithmic machines can help social workers. Social work operates on a cycle of engagement, assessment, intervention, and evaluation with clients. Earlier assessment for risk of suicide can lead to earlier interventions and prevention, therefore saving lives. The system would learn, analyze, and detect risk factors, alerting the clinician of a patient's suicide risk score (analogous to a patient's cardiovascular risk score). Then, social workers could step in for further assessment and preventive intervention.

    Read more →
  • Naked Objects for .NET

    Naked Objects for .NET

    Naked Objects for .NET or Naked Objects MVC is a software framework that builds upon the ASP.NET MVC framework. As the name suggests, the framework synthesizes two architectural patterns: naked objects and model–view–controller (MVC). These two patterns have been considered as antithetical. However, Trygve Reenskaug (the inventor of the MVC pattern) has made it clear that he does not see it that way, in his foreword to Richard Pawson's PhD thesis on the Naked Objects pattern. The Naked Objects MVC framework will take a domain model (written as Plain Old CLR Objects) and render it as a complete HTML application without the need for writing any user interface code - by means of a small set of generic View and Controller classes. The framework uses reflection rather than code generation. The developer may then choose to create customised Views and/or Controllers, using standard ASP.NET MVC patterns, for use where the generic user interface is not suitable.

    Read more →
  • Enumeration algorithm

    Enumeration algorithm

    In computer science, an enumeration algorithm is an algorithm that enumerates the answers to a computational problem. Formally, such an algorithm applies to problems that take an input and produce a list of solutions, similarly to function problems. For each input, the enumeration algorithm must produce the list of all solutions, without duplicates, and then halt. The performance of an enumeration algorithm is measured in terms of the time required to produce the solutions, either in terms of the total time required to produce all solutions, or in terms of the maximal delay between two consecutive solutions and in terms of a preprocessing time, counted as the time before outputting the first solution. This complexity can be expressed in terms of the size of the input, the size of each individual output, or the total size of the set of all outputs, similarly to what is done with output-sensitive algorithms. == Formal definitions == An enumeration problem P {\displaystyle P} is defined as a relation R {\displaystyle R} over strings of an arbitrary alphabet Σ {\displaystyle \Sigma } : R ⊆ Σ ∗ × Σ ∗ {\displaystyle R\subseteq \Sigma ^{}\times \Sigma ^{}} An algorithm solves P {\displaystyle P} if for every input x {\displaystyle x} the algorithm produces the (possibly infinite) sequence y {\displaystyle y} such that y {\displaystyle y} has no duplicate and z ∈ y {\displaystyle z\in y} if and only if ( x , z ) ∈ R {\displaystyle (x,z)\in R} . The algorithm should halt if the sequence y {\displaystyle y} is finite. == Common complexity classes == Enumeration problems have been studied in the context of computational complexity theory, and several complexity classes have been introduced for such problems. A very general such class is EnumP, the class of problems for which the correctness of a possible output can be checked in polynomial time in the input and output. Formally, for such a problem, there must exist an algorithm A which takes as input the problem input x, the candidate output y, and solves the decision problem of whether y is a correct output for the input x, in polynomial time in x and y. For instance, this class contains all problems that amount to enumerating the witnesses of a problem in the class NP. Other classes that have been defined include the following. In the case of problems that are also in EnumP, these problems are ordered from least to most specific: Output polynomial, the class of problems whose complete output can be computed in polynomial time. Incremental polynomial time, the class of problems where, for all i, the i-th output can be produced in polynomial time in the input size and in the number i. Polynomial delay, the class of problems where the delay between two consecutive outputs is polynomial in the input (and independent from the output). Strongly polynomial delay, the class of problems where the delay before each output is polynomial in the size of this specific output (and independent from the input or from the other outputs). The preprocessing is generally assumed to be polynomial. Constant delay, the class of problems where the delay before each output is constant, i.e., independent from the input and output. The preprocessing phase is generally assumed to be polynomial in the input. == Common techniques == Backtracking: The simplest way to enumerate all solutions is by systematically exploring the space of possible results (partitioning it at each successive step). However, performing this may not give good guarantees on the delay, i.e., a backtracking algorithm may spend a long time exploring parts of the space of possible results that do not give rise to a full solution. Flashlight search: This technique improves on backtracking by exploring the space of all possible solutions but solving at each step the problem of whether the current partial solution can be extended to a partial solution. If the answer is no, then the algorithm can immediately backtrack and avoid wasting time, which makes it easier to show guarantees on the delay between any two complete solutions. In particular, this technique applies well to self-reducible problems. Closure under set operations: If we wish to enumerate the disjoint union of two sets, then we can solve the problem by enumerating the first set and then the second set. If the union is non disjoint but the sets can be enumerated in sorted order, then the enumeration can be performed in parallel on both sets while eliminating duplicates on the fly. If the union is not disjoint and both sets are not sorted then duplicates can be eliminated at the expense of a higher memory usage, e.g., using a hash table. Likewise, the cartesian product of two sets can be enumerated efficiently by enumerating one set and joining each result with all results obtained when enumerating the second step. == Examples of enumeration problems == The vertex enumeration problem, where we are given a polytope described as a system of linear inequalities and we must enumerate the vertices of the polytope. Enumerating the minimal transversals of a hypergraph. This problem is related to monotone dualization and is connected to many applications in database theory and graph theory. Enumerating the answers to a database query, for instance a conjunctive query or a query expressed in monadic second-order. There have been characterizations in database theory of which conjunctive queries could be enumerated with linear preprocessing and constant delay. The problem of enumerating maximal cliques in an input graph, e.g., with the Bron–Kerbosch algorithm Listing all elements of structures such as matroids and greedoids Several problems on graphs, e.g., enumerating independent sets, paths, cuts, etc. Enumerating the satisfying assignments of representations of Boolean functions, e.g., a Boolean formula written in conjunctive normal form or disjunctive normal form, a binary decision diagram such as an OBDD, or a Boolean circuit in restricted classes studied in knowledge compilation, e.g., NNF. == Connection to computability theory == The notion of enumeration algorithms is also used in the field of computability theory to define some high complexity classes such as RE, the class of all recursively enumerable problems. This is the class of sets for which there exist an enumeration algorithm that will produce all elements of the set: the algorithm may run forever if the set is infinite, but each solution must be produced by the algorithm after a finite time.

    Read more →
  • AI: When a Robot Writes a Play

    AI: When a Robot Writes a Play

    AI: When a Robot Writes a Play (in Czech: AI: Když robot píše hru) is a 2021 experimental theatre play, where 90% of its script was automatically generated by artificial intelligence (the GPT-2 language model). The play is in Czech language, but an English version of the script also exists. == Creation == The play is the first result of the THEaiTRE research project, aiming to commemorate the centenary of the R.U.R. play by Karel Čapek by investigating to what extent artificial intelligence could be used to create theatre play scripts. The script of the play was created using the THEaiTRobot tool, based on the GPT-2 language model. First, the play dramaturge, David Košťák, described the initial setting of each scene in a few sentences, and wrote the first line for each character. Next, THEaiTRobot suggested a continuation of the script, which the dramaturge could use, reject, or use part of it and let the tool generate a new continuation. Another option was to manually insert another line or a scenic remark. The script was generated in English and was automatically translated to Czech by the state-of-the-art CUBBITT machine translation tool. The resulting script was then further post-edited by the dramaturge. The resulting script was made freely available for non-commercial use both in English and in Czech, with marked manually inserted texts and manual edits. The analysis shows that 90% of the English script is automatically generated, with 10% manually written or manually post-edited. In the Czech script, a larger amount of edits were made, but the analysis claims that these additional edits are corrections of errors of the automated translation and stylistic corrections which do not change the meaning of the lines as represented by the English script, but rather bring the Czech script closer to the English one. == Characters == The play contains 9 characters. The Robot appears in all the scenes, while each of the other characters appears in only one scene. Robot – The lead character, a male humanoid robot. Master – An old man, the creator of the Robot. Boy – A schoolboy. Masseuse – A sex worker in a brothel. Stranger – An engineer. Man. Psychologist. Administrator – A female clerk at an employment agency. Actress – A film actress and a model in a robot-like costume. == Plot == The play is composed of 8 scenes. It tells the story of a humanoid robot, who encounters 8 other characters and engages into various typically human situations and activities, related to death, love, sex, violence, etc. The individual scenes are not tightly linked, but there are some linking points, such as the central character of the robot or some repeated and developing themes, such as the robot's search for love. The scenes often contain some absurd turns and it is often hard to find sense in them. It is therefore a very complicated piece interpretationally, requiring the director and the actors to invest a lot of effort and creativity in finding a meaningful interpretation which would not deviate from the script. In the interpretation by Švanda theatre, who premiered the play and who also participated on the creation of the script, the scenes typically contain non-verbally expressed content which can add a lot to the meaning of the scene compared to what is contained in the actual script (as the script only contains the lines said by the characters). === Scene 1: Death === The play opens by the Robot parting with his dying Master. The Master gives the Robot several last lessons and talks with him about death, soul, and love. === Scene 2: Sense of Humour === In the second scene, the Robot meets a sad and angry Boy, who complains that he wants to go to school, that his girlfriend is crazy, that he wants to buy a car, etc. The Robot tries to help the Boy by giving him advice, but the Boy's reactions are quite negative and irritated. The Boy then repeatedly asks the Robot to tell him a joke; the Robot keeps refusing, but ultimately tells the following joke: When you are dead. When your children are dead. When your grandchildren are dead, I will be still alive. === Scene 3: Nightclub === The Robot wants to feel pleasure, so he goes to a "night club" (a brothel), where he meets a "Masseuse" (a prostitute). The Robot is initially "a bit cold", but eventually manages to enjoy the experience and falls in love with the Masseuse. In the Švanda theatre performance, the Robot and the Masseuse seem to have a sort of virtual sex without touching each other, reminiscent of the sex scene in Demolition Man. === Scene 4: Fear of the Dark === It is the night. The Robot is standing under a lamp, unable to move away from the light as he finds that he is afraid of the dark. He meets a Stranger, an engineer who tells him that robots don't have feelings and that people cannot be trusted, and keeps hurting him. In the Švanda theatre performance, the Man repeatedly zaps the Robot with some kind of electric pulse. === Scene 5: Killer Robot === A Man approaches the Robot and repeatedly asks him to kill him. Instead, the Robot sticks a finger into the Man's anus, which leads to an argument between the Man and the Robot. === Scene 6: Burn Out === The Robot meets a Psychologist, who keeps asking him lots of questions regarding his life, burnout feeling, love, relationships, and emotions. They also talk about the Robot using a device called emotion machine which helps him to get rid of stress. === Scene 7: Search for Job === The Robot comes to an employment agency. He meets an Administrator and asks her to help him find a job. He expresses the wish to become an actor, and talks about his experience as a clown. He reveals his name to be Troy McClure, which is a character from The Simpsons who is an actor. In the Švanda theatre performance, the Administrator starts to seduce the Robot once his name is revealed, which he keeps ignoring; the Administrator then becomes irritated. === Scene 8: Love at First Sight === The Robot meets a human Actress in a robotic costume and falls in love with her immediately. The Actress is first reluctant, but the Robot manages to seduce her and she also falls in love with him. The Robot tells her about a binary world, in which he lives and where he will also take her. Ultimately, the Actress agrees, and the whole play concludes by the Robot and the Actress promising each to other to always be together. In the Švanda theatre performance, the Robot does not have a physical body in this scene, we can only hear his voice and see a pulsating light (based on the line in the script where the Robot says: "I have no body. So I don't need to wear clothes. You can't see me, you only hear me."), and the Actress eventually also agrees to lose her physical body so that she can be with the Robot forever. == Theatrical performances == The play premiered on 26 February 2021 in Švanda Theatre in Prague, Czech Republic, directed by Daniel Hrbek. Due to the COVID-19 pandemic, the play was not played in front of a live audience, but it was broadcast online, in Czech language with English subtitles. The play was followed by a panel discussion by the project members and experts on artificial intelligence. The premiere was viewed by 13,498 spectators worldwide. A short trailer of the premiere is available on YouTube. In 2021, after the opening of the theatres in the Czech Republic to spectators, the play can be viewed at Švanda Theatre. The performance takes approximately 60 minutes, and is followed by a discussion of the creators with the audience. The derniere is planned for 4 February 2023. == Reception == The play received a number of reviews, both in its country of origin as well as internationally. It is praised as first of its kind, although some reviewers note the similarity to previous works, such as the musical Beyond the Fence, the play Lifestyle of the Richard and Family, or the short movie Sunspring; however, these works used less advanced technology, and either were very short (Sunspring) or necessitated a larger amount of human interventions. The reviewers note that the script is far from perfect, with many inconsistencies and nonsensical parts, and conclude that the technology is definitely not yet ready to replace human authors; however, some find some parts of the script frighteningly human-like. The amount of human intervention is a somewhat controversial topic, with some reviewers finding the human influence too large (especially in interpreting the script and putting the play on scene), while others feel that a greater amount of human intervention would have been favorable as this could greatly improve the quality of the play. The reviews also frequently comment on the amount of sex, violence and strong language in the play; this can be attributed to the method used for creating the script, where the GPT-2 language model reflects topics and language common in the human-written articles on the internet that were used to train the model. Furthermore, some r

    Read more →
  • Sparse identification of non-linear dynamics

    Sparse identification of non-linear dynamics

    Sparse identification of nonlinear dynamics (SINDy) is a data-driven algorithm for obtaining dynamical systems from data. Given a series of snapshots of a dynamical system and its corresponding time derivatives, SINDy performs a sparsity-promoting regression (such as LASSO and sparse Bayesian inference) on a library of nonlinear candidate functions of the snapshots against the derivatives to find the governing equations. This procedure relies on the assumption that most physical systems only have a few dominant terms which dictate the dynamics, given an appropriately selected coordinate system and quality training data. It has been applied to identify the dynamics of fluids, based on proper orthogonal decomposition, as well as other complex dynamical systems, such as biological networks. == Mathematical Overview == First, consider a dynamical system of the form x ˙ = d d t x ( t ) = f ( x ( t ) ) , {\displaystyle {\dot {\textbf {x}}}={\frac {d}{dt}}{\textbf {x}}(t)={\textbf {f}}({\textbf {x}}(t)),} where x ( t ) ∈ R n {\displaystyle {\textbf {x}}(t)\in \mathbb {R} ^{n}} is a state vector (snapshot) of the system at time t {\displaystyle t} and the function f ( x ( t ) ) {\displaystyle {\textbf {f}}({\textbf {x}}(t))} defines the equations of motion and constraints of the system. The time derivative may be either prescribed or numerically approximated from the snapshots. With x {\displaystyle {\textbf {x}}} and x ˙ {\displaystyle {\dot {\textbf {x}}}} sampled at m {\displaystyle m} equidistant points in time ( t 1 , t 2 , ⋯ , t m {\displaystyle t_{1},t_{2},\cdots ,t_{m}} ), these can be arranged into matrices of the form X = [ x T ( t 1 ) x T ( t 2 ) ⋮ x T ( t m ) ] = [ x 1 ( t 1 ) x 2 ( t 1 ) ⋯ x n ( t 1 ) x 1 ( t 2 ) x 2 ( t 2 ) ⋯ x n ( t 2 ) ⋮ ⋮ ⋱ ⋮ x 1 ( t m ) x 2 ( t m ) ⋯ x n ( t m ) ] , {\displaystyle {\bf {{X}={\begin{bmatrix}\mathbf {x} ^{\mathsf {T}}(t_{1})\\\mathbf {x} ^{\mathsf {T}}(t_{2})\\\vdots \\\mathbf {x} ^{\mathsf {T}}(t_{m})\end{bmatrix}}={\begin{bmatrix}x_{1}(t_{1})&x_{2}(t_{1})&\cdots &x_{n}(t_{1})\\x_{1}(t_{2})&x_{2}(t_{2})&\cdots &x_{n}(t_{2})\\\vdots &\vdots &\ddots &\vdots \\x_{1}(t_{m})&x_{2}(t_{m})&\cdots &x_{n}(t_{m})\end{bmatrix}},}}} and similarly for X ˙ {\displaystyle {\dot {\mathbf {X} }}} . Next, a library Θ ( X ) {\displaystyle \mathbf {\Theta } (\mathbf {X} )} of nonlinear candidate functions of the columns of X {\displaystyle {\textbf {X}}} is constructed, which may be constant, polynomial, or more exotic functions (like trigonometric and rational terms, and so on): Θ ( X ) = [ | | | | | | 1 X X 2 X 3 ⋯ sin ⁡ ( X ) cos ⁡ ( X ) ⋯ | | | | | | ] {\displaystyle \ \ \ {\bf {{\Theta }({\bf {{X})={\begin{bmatrix}\vline &\vline &\vline &\vline &&\vline &\vline &\\1&{\bf {X}}&{\bf {{X}^{2}}}&{\bf {{X}^{3}}}&\cdots &\sin({\bf {{X})}}&\cos({\bf {{X})}}&\cdots \\\vline &\vline &\vline &\vline &&\vline &\vline &\end{bmatrix}}}}}}} The number of possible model structures from this library is combinatorially high. f ( x ( t ) ) {\displaystyle {\textbf {f}}({\textbf {x}}(t))} is then substituted by Θ ( X ) {\displaystyle {\bf {{\Theta }({\textbf {X}})}}} and a vector of coefficients Ξ = [ ξ 1 ξ 2 ⋯ ξ n ] {\displaystyle {\bf {{\Xi }=\left[{\bf {{\xi }_{1}{\bf {{\xi }_{2}\cdots {\bf {{\xi }_{n}}}}}}}\right]}}} determining the active terms in f ( x ( t ) ) {\displaystyle {\textbf {f}}({\textbf {x}}(t))} : X ˙ = Θ ( X ) Ξ {\displaystyle {\dot {\bf {X}}}={\bf {{\Theta }({\bf {{X}){\bf {\Xi }}}}}}} Because only a few terms are expected to be active at each point in time, an assumption is made that f ( x ( t ) ) {\displaystyle {\textbf {f}}({\textbf {x}}(t))} admits a sparse representation in Θ ( X ) {\displaystyle {\bf {{\Theta }({\textbf {X}})}}} . This then becomes an optimization problem in finding a sparse Ξ {\displaystyle {\bf {\Xi }}} which optimally embeds X ˙ {\displaystyle {\dot {\textbf {X}}}} . In other words, a parsimonious model is obtained by performing least squares regression on the system (4) with sparsity-promoting ( L 1 {\displaystyle L_{1}} ) regularization ξ k = arg ⁡ min ξ k ′ | | X ˙ k − Θ ( X ) ξ k ′ | | 2 + λ | | ξ k ′ | | 1 , {\displaystyle {\bf {{\xi }_{k}={\underset {\bf {{\xi }'_{k}}}{\arg \min }}\left|\left|{\dot {\bf {X}}}_{k}-{\bf {{\Theta }({\bf {{X}){\bf {{\xi }'_{k}}}}}}}\right|\right|_{2}+\lambda \left|\left|{\bf {{\xi }'_{k}}}\right|\right|_{1},}}} where λ {\displaystyle \lambda } is a regularization parameter. Finally, the sparse set of ξ k {\displaystyle {\bf {{\xi }_{k}}}} can be used to reconstruct the dynamical system: x ˙ k = Θ ( x ) ξ k {\displaystyle {\dot {x}}_{k}={\bf {{\Theta }({\bf {{x}){\bf {{\xi }_{k}}}}}}}}

    Read more →
  • Cleverbot

    Cleverbot

    Cleverbot is a chatterbot web application. It was created by British AI scientist Rollo Carpenter and launched in October 2008. It was preceded by Jabberwacky, a chatbot project that began in 1988 and went online in 1997. In its first decade, Cleverbot held several thousand conversations with Carpenter and his associates. Since launching on the web, the number of conversations held has exceeded 150 million. Besides the web application, Cleverbot is also available as an iOS, Android, and Windows Phone app. == Operation == Cleverbot's responses are not pre-programmed because it learns from human input: Humans type into the box below the Cleverbot logo and the system finds all keywords or an exact phrase matching the input. After searching through its saved conversations, it responds to the input by finding how a human responded to that input when it was asked, in part or in full, by Cleverbot. Cleverbot participated in a formal Turing test at the 2011 Techniche festival at the Indian Institute of Technology Guwahati on 3 September 2011. Out of the 1334 votes cast, Cleverbot was judged to be 59.3% human, compared to the rating of 63.3% human achieved by human participants. A score of 50.05% or higher is often considered to be a passing grade. The software running for the event had to handle just 1 or 2 simultaneous requests, whereas online Cleverbot is usually talking to around 10,000 to 50,000 people at once. == Developments == Cleverbot is constantly growing in data size at the rate of 4 to 7 million interactions per day. Updates to the software have been mostly behind the scenes. In 2014, Cleverbot was upgraded to use GPU serving techniques. Unlike Eliza, the program does not respond in a fixed way, instead choosing its responses heuristically using fuzzy logic, the whole of the conversation being compared to the millions that have taken place before. Cleverbot now uses over 279 million interactions, about 3-4% of the data it has already accumulated. The developers of Cleverbot are attempting to build a new version using machine learning techniques. An app that uses the Cleverscript engine to play a game of 20 Questions has been launched under the name Clevernator. Unlike other such games, the player asks the questions and it is the role of the AI to understand, and answer factually. An app that allows owners to create and talk to their own small Cleverbot-like AI has been launched, called Cleverme! for Apple products. == In popular culture == Cleverbot received media attention after being featured in the popular 2010 creepypasta ARG web serial Ben Drowned by Alexander D. Hall. In early 2017, a Twitch stream of two Google Home devices modified to talk to each other using Cleverbot garnered over 700,000 visitors and over 30,000 peak concurrent viewers.

    Read more →
  • Online analytical processing

    Online analytical processing

    In computing, online analytical processing (OLAP) (), is an approach to quickly answer multi-dimensional analytical (MDA) queries. The term OLAP was created as a slight modification of the traditional database term online transaction processing (OLTP). OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture. OLAP tools enable users to analyse multidimensional data interactively from multiple perspectives. OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, and slicing and dicing. Consolidation involves the aggregation of data that can be accumulated and computed in one or more dimensions. For example, all sales offices are rolled up to the sales department or sales division to anticipate sales trends. By contrast, the drill-down is a technique that allows users to navigate through the details. For instance, users can view the sales by individual products that make up a region's sales. Slicing and dicing is a feature whereby users can take out (slicing) a specific set of data of the OLAP cube and view (dicing) the slices from different viewpoints. These viewpoints are sometimes called dimensions (such as looking at the same sales by salesperson, or by date, or by customer, or by product, or by region, etc.). Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad hoc queries with a rapid execution time. They borrow aspects of navigational databases, hierarchical databases and relational databases. OLAP is typically contrasted to OLTP (online transaction processing), which is generally characterized by much less complex queries, in a larger volume, to process transactions rather than for the purpose of business intelligence or reporting. Whereas OLAP systems are mostly optimized for read, OLTP has to process all kinds of queries (read, insert, update and delete). == Overview of OLAP systems == At the core of any OLAP system is an OLAP cube (also called a 'multidimensional cube' or a hypercube). It consists of numeric facts called measures that are categorized by dimensions. The measures are placed at the intersections of the hypercube, which is spanned by the dimensions as a vector space. The usual interface to manipulate an OLAP cube is a matrix interface, like Pivot tables in a spreadsheet program, which performs projection operations along the dimensions, such as aggregation or averaging. The cube metadata is typically created from a star schema or snowflake schema or fact constellation of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables. Each measure can be thought of as having a set of labels, or meta-data associated with it. A dimension is what describes these labels; it provides information about the measure. A simple example would be a cube that contains a store's sales as a measure, and Date/Time as a dimension. Each Sale has a Date/Time label that describes more about that sale. For example: Sales Fact Table +-------------+----------+ | sale_amount | time_id | +-------------+----------+ Time Dimension | 930.10| 1234 |----+ +---------+-------------------+ +-------------+----------+ | | time_id | timestamp | | +---------+-------------------+ +---->| 1234 | 20080902 12:35:43 | +---------+-------------------+ === Multidimensional databases === Multidimensional structure is defined as "a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data". The structure is broken into cubes and the cubes are able to store and access data within the confines of each cube. "Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions". Even when data is manipulated it remains easy to access and continues to constitute a compact database format. The data still remains interrelated. Multidimensional structure is quite popular for analytical databases that use online analytical processing (OLAP) applications. Analytical databases use these databases because of their ability to deliver answers to complex business queries swiftly. Data can be viewed from different angles, which gives a broader perspective of a problem unlike other models. === Aggregations === It has been claimed that for complex queries OLAP cubes can produce an answer in around 0.1% of the time required for the same query on OLTP relational data. The most important mechanism in OLAP which allows it to achieve such performance is the use of aggregations. Aggregations are built from the fact table by changing the granularity on specific dimensions and aggregating up data along these dimensions, using an aggregate function (or aggregation function). The number of possible aggregations is determined by every possible combination of dimension granularities. The combination of all possible aggregations and the base data contains the answers to every query which can be answered from the data. Because usually there are many aggregations that can be calculated, often only a predetermined number are fully calculated; the remainder are solved on demand. The problem of deciding which aggregations (views) to calculate is known as the view selection problem. View selection can be constrained by the total size of the selected set of aggregations, the time to update them from changes in the base data, or both. The objective of view selection is typically to minimize the average time to answer OLAP queries, although some studies also minimize the update time. View selection is NP-complete. Many approaches to the problem have been explored, including greedy algorithms, randomized search, genetic algorithms and A search algorithm. Some aggregation functions can be computed for the entire OLAP cube by precomputing values for each cell, and then computing the aggregation for a roll-up of cells by aggregating these aggregates, applying a divide and conquer algorithm to the multidimensional problem to compute them efficiently. For example, the overall sum of a roll-up is just the sum of the sub-sums in each cell. Functions that can be decomposed in this way are called decomposable aggregation functions, and include COUNT, MAX, MIN, and SUM, which can be computed for each cell and then directly aggregated; these are known as self-decomposable aggregation functions. In other cases, the aggregate function can be computed by computing auxiliary numbers for cells, aggregating these auxiliary numbers, and finally computing the overall number at the end; examples include AVERAGE (tracking sum and count, dividing at the end) and RANGE (tracking max and min, subtracting at the end). In other cases, the aggregate function cannot be computed without analyzing the entire set at once, though in some cases approximations can be computed; examples include DISTINCT COUNT, MEDIAN, and MODE; for example, the median of a set is not the median of medians of subsets. These latter are difficult to implement efficiently in OLAP, as they require computing the aggregate function on the base data, either computing them online (slow) or precomputing them for possible rollouts (large space). == Types == OLAP systems have been traditionally categorized using the following taxonomy. === Multidimensional OLAP (MOLAP) === MOLAP (multi-dimensional online analytical processing) is the classic form of OLAP and is sometimes referred to as just OLAP. MOLAP stores this data in an optimized multi-dimensional array storage, rather than in a relational database. Some MOLAP tools require the pre-computation and storage of derived data, such as consolidations – the operation known as processing. Such MOLAP tools generally utilize a pre-calculated data set referred to as a data cube. The data cube contains all the possible answers to a given range of questions. As a result, they have a very fast response to queries. On the other hand, updating can take a long time depending on the degree of pre-computation. Pre-computation can also lead to what is known as data explosion. Other MOLAP tools, particularly those that implement the functional database model do not pre-compute derived data but make all calculations on demand other than those that were previously requested and stored in a cache. Advantages of MOLAP Fast query performance due to optimized storage, multidimensional indexing and caching. Smaller on-disk size of data compared to data stored in relational database due to compression techniques. Automated computation of higher-level aggregates of the data. It is very compact for low dimension data se

    Read more →
  • Known-item search

    Known-item search

    Known-item search is a specialization of information exploration which represents the activities carried out by searchers who have a particular item in mind. In the context of library catalogs, known‐item search means a search for an item for which the author or title is known. Although the concept of known-item search originated in library science, it is now applied in the context of web search and other online search activities. Known-item search is distinguished from exploratory search, in which a searcher is unfamiliar with the domain of their search goal, unsure about the ways to achieve their goal, and/or unsure about what their goal is.

    Read more →