AI for Business

Explore the best AI for Business — independent reviews, comparisons, pricing and step-by-step how-to guides, curated by Aizhi.

  • Grammatik

    Grammatik

    Grammatik was the first grammar-checking program for home computers. Aspen Software of Albuquerque, NM, released the earliest version of this diction and style checker for personal computers. It was first released no later than 1981, and was inspired by the Writer's Workbench. Grammatik was first available for the TRS-80, and soon had versions for CP/M and the IBM PC. Reference Software International of San Francisco, California, acquired Grammatik in 1985. Development of Grammatik continued, and it became an actual grammar checker that could detect writing errors beyond simple style checking. Subsequent versions were released for MS-DOS, Windows, Macintosh, and Unix. Grammatik was ultimately acquired by WordPerfect Corporation and is integrated into the WordPerfect word processor.

    Read more →
  • Ecoinformatics

    Ecoinformatics

    Ecoinformatics, or ecological informatics, is the science of information in ecology and environmental science. It integrates environmental and information sciences to define entities and natural processes with language common to both humans and computers. However, this is a rapidly developing area in ecology and there are alternative perspectives on what constitutes ecoinformatics. A few definitions have been circulating, mostly centered on the creation of tools to access and analyze natural system data. However, the scope and aims of ecoinformatics are certainly broader than the development of metadata standards to be used in documenting datasets. Ecoinformatics aims to facilitate environmental research and management by developing ways to access, integrate databases of environmental information, and develop new algorithms enabling different environmental datasets to be combined to test ecological hypotheses. Ecoinformatics is related to the concept of ecosystem services. Ecoinformatics characterize the semantics of natural system knowledge. For this reason, much of today's ecoinformatics research relates to the branch of computer science known as knowledge representation, and active ecoinformatics projects are developing links to activities such as the Semantic Web. Current initiatives to effectively manage, share, and reuse ecological data are indicative of the increasing importance of fields like ecoinformatics to develop the foundations for effectively managing ecological information. Examples of these initiatives are National Science Foundation Datanet projects, DataONE, Data Conservancy, and Artificial Intelligence for Environment & Sustainability. == Software Development Lifecycle == Central to the concept of ecoinformatics is the Software Development Lifecycle (SDLC), a systematic framework for writing, implementing, and maintaining software products. Typically in Ecoinformatics projects, the development pipeline includes data collection, usually from several different environmental data sources, then integrating these data sources together, and then analyzing the data. Here, each step of the SDLC is described in the context of ecoinformatics, per Michener et al. It is important to note that the plan, collect, assure, describes and preserve steps refer to the data collection entity, which can be individual researchers or large data-collection networks, while the discover, integrate, and analyze steps typically refer to the individual researcher. Plan: Ecoinformatics projects require data from several databases. Each database holds different data, and therefore researchers should identify what types of environmental or ecological data they will need to answer their research question. Collect: Data is collected in several different ways. In ecoinformatics, this is usually restricted to manually entering data into a spreadsheet, and parsing data from an existing database. The growth of relational databases has made it easier for ecologists to download relevant data and integrate datasets together Assure: Data entries should be checked thoroughly to validate their accuracy and usability, such as to check for outliers and erroneous points. The same principle applies to data downloaded from datasets. This responsibility falls on both the ecologist downloading the data, and the entity that sets up the data collection system. Describe: An accurate description of the metadata of a dataset that is used in a study should include enough information to deduce the data collection and processing methodology, when the data were collected, why the data were collected, and how the data were stored. This is important for reproducibility, especially for projects that build on each other and may recycle data Preserve: After data is collected by an institutional entity, it should be archived such that it is easily accessible. Ideally, this is in databases that are maintained and not at risk of deprecation Discover: While there are good practices for discovering data to start a research project, this process is often marred by a lack of usable, published data, as researchers may collect data specific to their study, but may not publish this data for wider use. On the data collection end, this can be addressed by better data-sharing practices, such as by linking datasets when publishing papers or studies. On the data procurement end, this can be addressed by more precise data searching, such as using key words to find relevant datasets. Integrate: Synthesizing datasets together can be difficult and labor-intensive, largely due to the methodological differences in data collection. There are several approaches to this, but the best practices typically involve computational approaches, namely using R or Python, to automate the processes and prevent errors Analyze: Data analysis can take several forms, and should be tailored to the specific ecological project. However, all data analysis methods should be well-documented, including the procedure for analysis, justification for analysis methods, and any shortcomings in a specific approach. == Applications of Ecoinformatics Across Ecology == === Ecosystem Ecology === Source: Ecosystem studies, by definition, encompass interactions across the entire life sciences spectrum, from microscopic biochemical reactions to large-scale geological phenomena. As a result, big databases may not be designed specifically for any particular research question, but should be inclusive enough to support most studies. Since ecosystem-level questions require a broad perspective, data-related ecosystem projects would likely incorporate data from several databases. A common framework for incorporating data into ecosystem-level studies is the network science model, in which data collection mechanisms and resources are treated like a large, interconnected network instead of individual entities. The network may include several data collection stations within one databases, or may span across multiple databases. Currently there are several large-scale networks, but they do not generate data on the scale to consider ecology as a big data science. A current challenge for ecoinformatics in ecosystem ecology is that most funding is prioritized for generating new data rather than maintaining existing data infrastructures. Integrating data across the different spatial scales can also be difficult, since each dataset may hold different types of data. === Urban Ecology === Source: The current push for smart cities, and sensor network integration into infrastructure, has positioned as a major source of data for ecological studies. Typical urban ecology questions address the effects of urbanization on the local ecosystem, and how to drive future development to promote urban biodiversity. While sensor networks in cities typically collect environmental data to optimize city processes, they may also be used for ecological initiatives, especially with respect to understanding the complex, multi-layered relationship between cities and their local ecosystem. It can also be used to better understand the current landscape of cities, and identify avenues for rewinding of cities. For example, analyzing mobility patterns can identify areas that may lend themselves well to building parks and green spaces. Bird watching data can also be used to identify the types of bird species in a local area. === Infectious Disease === Source: Like other disciplines of ecology, emerging infectious disease and epidemiology span multiple scales, from understanding the genetics that drive disease trends to large-scale spatiotemporal analyses. As a result, infectious disease studies can incorporate everything from bioinformatics, genetic sequences, amino acid sequences, and environmental observation data. On the micro-scale, these data can then be used to predict infectivity/transmissibility, drug resistance, drug candidates, and mutation sites. On the macro-scale, it can be used to identify societal trends or environmental factors that lend themselves to spillover, locations of infection, and practices that cause disease transmission. == Databases == Source: USGS National Streamflow sensor network GBIF Neotoma Paleobiology database European Vegetation Archive USDA Forest Inventory Analysis TRY BIEN AmeriFlux TEAM iNaturalist NEON GLEON LTER CZO TERN SAEON

    Read more →
  • Artificial intelligence in Indonesia

    Artificial intelligence in Indonesia

    Artificial intelligence in Indonesia refers to development, use and governance of artificial intelligence in Indonesia. Indonesia has treated AI as a national policy area through the Strategi Nasional Kecerdasan Artifisial or National Artificial Intelligence Strategy for 2020–2045. Public discussion has focused on the role of AI in sectors such as health, agriculture, education, mobile technology and e-commerce. Recent developments include AI ethics guidance issued by the communications ministry. Proposals for a national AI roadmap and sovereign AI fund, investment in cloud and AI infrastructure, and local-language AI initiatives for Bahasa Indonesia and regional Indonesian languages. == National strategy == Indonesia's National Artificial Intelligence Strategy is known in Indonesian as Strategi Nasional Kecerdasan Artifisial or Stranas KA. The strategy was published as a long-term framework for the development and use of AI between 2020 and 2045. It is intended to guide ministries, government agencies, regional governments and other stakeholders. The strategy identifies five priority sectors: health services, bureaucratic reform, education and research, food security, and mobility and smart cities. OECD lists the Ministry of Research and Technology and the National Research and Innovation Agency as organisations associated with the strategy. The strategy was developed through consultation with public and private stakeholders. == Institutions == The Indonesian Artificial Intelligence Industry Research and Innovation Collaboration, known as KORIKA is the nodal agency for the national AI strategy. KORIKA describes its vision as creating a collaborative ecosystem to accelerate implementation of the national AI strategy towards Vision Indonesia 2045. The Ministry of Communication and Digital Affairs has also been involved in AI governance, digital policy and public communication. In 2025, Reuters reported that the ministry was preparing a national AI roadmap to give investors and developers a clearer view of Indonesia's market, infrastructure and computing capacity. == AI Governance == Indonesia has introduced policy guidance on the ethical use of artificial intelligence. The policy sets out ethical values for the development and use of AI. These include humanity, security, transparency, credibility and accountability, personal data protection, sustainable development and intellectual property protection. A UNESCO country profile on Indonesia noted that Indonesia had adopted a national AI strategy and had policy frameworks. It also identified gaps in internet access, gender inclusion, language datasets, digital talent and cybersecurity. UNESCO recommended that Indonesia update its AI standards, invest in ethical AI, strengthen research coordination and consider establishing a national agency for artificial intelligence. In May 2026, Antara News reported comments by Deputy Minister of Communication and Digital Affairs Nezar Patria. Who said that AI safety requires partnerships, shared standards and continuing dialogue. == Sectors == AI policy discussions in Indonesia have identified health, agriculture, education, government services, mobility and smart cities as areas where AI could be applied. Mobile technology and e-commerce have been discussed as important areas of AI adoption in Indonesia. Research on AI adoption in Indonesia by Siddhartha Paul Tiwari and Adi Fahrudin has also examined mobile and e-commerce sectors. UNESCO has also noted that Indonesia's large digital economy and startup ecosystem have supported AI adoption, while also pointing to challenges in talent, research capacity and cybersecurity. Indonesia is one of the developing-country markets attracting AI infrastructure investment, including data centres. == Challenges == Indonesia faces several challenges in developing and governing AI. These include gaps in computing infrastructure, uneven connectivity outside major cities, shortages of skilled workers, limited research funding, cybersecurity risks, misinformation, data leaks and the underrepresentation of Indonesian and indigenous languages in AI datasets. UNESCO noted that Bahasa is spoken by around 200 million people but remains underrepresented in AI. It also noted that Indonesia has more than 700 indigenous languages, many of which face the risk of extinction. UNESCO recommended stronger coordination in AI research and a more unified strategy for using AI in language preservation.

    Read more →
  • Brian Deer Classification System

    Brian Deer Classification System

    The Brian Deer Classification System (BDC) is a library classification system used to organize materials in libraries with specialized Indigenous collections. The system was created in the mid-1970s by Canadian librarian A. Brian Deer, a Kahnawake Mohawk. It has been adapted for use in a British Columbia version, and also by a small number of First Nations libraries in Canada. == History and usage == Deer designed his classification system while working in the library of the National Indian Brotherhood (now the Assembly of First Nations) from 1974 to 1976. Instead of using a standard library classification scheme, such as that of the Library of Congress, he created a new system to organize the library's historic indigenous research materials and papers. He later worked at the library of the Union of British Columbia Indian Chiefs, where he developed a system for its holdings. He returned to Kahnawake, working at its Cultural Centre at Kahnawake and the Kahnawake Branch branch of the Mohawk Nation Office. His system was flexible, and he created new forms for their collections. The new systems Deer created were designed specifically for the materials in each collection according to the concerns of local Indigenous people at the time (for example, categories included land claims, treaty rights, resource management, and Elders' stories). Between 1978 and 1980, the system was adapted for use in British Columbia by Gene Joseph and Keltie McCall while they were working at the Union of British Columbia Indian Chiefs, becoming known as BDC-BC. Joseph later adapted it further for use in the Xwi7xwa Library at University of British Columbia, Vancouver. Though the Brian Deer Classification was not created as a universal classification solution for Indigenous resources, the system has provided a foundation for specialized libraries to create their own localized classification schemes. Variations of the Brian Deer Classification System are used in a small number of Canadian libraries. One prominent library using BDC is the X̱wi7x̱wa Library at the University of British Columbia, which uses a British Columbia-focused version of BDC along with First Nations House of Learning subject headings. The Union of British Columbia Indian Chiefs Resource Centre issued a revised BDC-BC in 2014, with the goal of providing users with a more flexible and culturally appropriate approach to organizing their resources. The Aanischaaukamikw Cree Cultural Institute in Oujé-Bougoumou, Quebec, implemented a local adaptation of BDC when they opened in 2012. In 2020 the Carrier Sekani Tribal Council in Prince George, British Columbia, shifted from organizing its library with the Dewey Decimal Classification to using a version of the BDC. They added new subject heading categories for topics of local interest such as the crisis of Missing and murdered Indigenous women. Simon Fraser University Library began developing the Indigenous Curriculum Resource Centre (ICRC) in 2020, with the physical space opening in 2023. The ICRC is Call to Action 21 of SFU's Aboriginal Reconciliation Council's final report, Walk This Path With Us. Through its collection, the ICRC supports those interested in learning about how and why decolonizing pedagogy and teaching practices are important. The physical items in the collection are catalogued using a modified Brian Deer Classification system. In 2022 Kwantlen Polytechnic University’s χʷəχʷéy̓əm Indigenous Collection released a revised BDC-BC System. This BDC contains works exclusively with Indigenous authored materials and expands the cuttering systems of previous BDC, with the result that much of the collection reflects a spatial relationality. The implementation of this BDC was possible due to the tireless work at Xwi7xwa Library, Union of British Columbia Indian Chiefs Resource Centre, and Simon Fraser University Library's Indigenous Curriculum Resource Centre. == Structure == The high-level organizational structure of BDC reflects a First Nations worldview, with an emphasis on relationships between and among people, animals, and the land. Subcategories demonstrate the relationships among First Nations by grouping them geographically as opposed to alphabetically; the latter is a practice frequently used for specific topics in the Library of Congress Classification. The top-level hierarchy of the X̱wi7x̱wa Library adaptation of BDC-BC demonstrates the emphasis on access to subjects prioritized by a First Nation collection: Reference Materials Local History History International Education Economic Development Housing and Community Development Criminal Justice System Constitution (Canada) and First Nations Self Government Rights and Title Natural Resources Community Resources Health World View Fine Arts Languages Literature The system is not designed to provide a comprehensive description of all topics of interest to North American Indigenous peoples; in addition, its use is limited in scope, being intended for small and specialized libraries. While English is used in the classification scheme as a common language among First Nations peoples and non-Indigenous library users, Indigenous spellings and terminology that local library users would expect to find are used to provide access. Short and easily remembered call numbers are used to facilitate use by both library workers and patrons, with the recognition that Indigenous libraries often have a small staff and limited resources to devote to cataloging. Beyond its simplicity, one potential drawback of the system is its shortage of clear guidelines for application, which provides flexibility but can also result in inconsistencies within and between library catalogs. Because few libraries use the BDC and there are limited examples for use as case studies, implementing the system and keeping it up-to-date can prove a challenge for libraries with limited resources. However, X̱wi7x̱wa Library head librarian Ann Doyle describes the system as "an important part of the body of Indigenous scholarship" that should be retained as a reflection of Indigenous worldviews, as well as for ease of access for Indigenous library users.

    Read more →
  • Cloud-computing comparison

    Cloud-computing comparison

    The following is a comparison of cloud-computing software and providers. == IaaS (Infrastructure as a service) == === Providers === ==== General ==== == SaaS (Software as a Service) == === General === === Supported hosts === === Supported guests === == PaaS (Platform as a service) == === Providers === === Providers on IaaS === PaaS providers which can run on IaaS providers ("itself" means the provider is both PaaS and IaaS):

    Read more →
  • Archetype (information science)

    Archetype (information science)

    In the field of informatics, an archetype is a formal re-usable model of a domain concept. Traditionally, the term archetype is used in psychology to mean an idealized model of a person, personality or behaviour (see Archetype). The usage of the term in informatics is derived from this traditional meaning, but applied to domain modelling instead. An archetype is defined by the OpenEHR Foundation (for health informatics) as follows: An archetype is a computable expression of a domain content model in the form of structured constraint statements, based on some reference model. openEHR archetypes are based on the openEHR reference model. Archetypes are all expressed in the same formalism. In general, they are defined for wide re-use, however, they can be specialized to include local particularities. They can accommodate any number of natural languages and terminologies. == Formal specifications == The modern archetype formalism is specified and maintained by the openEHR Foundation, and although originally developed for the health IT domain, is completely domain-independent, and has been used in geospatial modelling, telecommunications, and defence. The archetype formalism consists of a number of specifications including: 'ADL 1.4': original release of Archetype Definition Language (ADL) and Archetype Object Model (AOM); widely implemented in health IT domain; 'ADL 2': modern release of Archetype Definition Language (ADL), Archetype Object Model (AOM), Archetype Identification specification and Archetype Technology Overview. The Archetype Technology Overview provides a short technical overview of the archetype formalism useful for new users. The ADL/AOM 1.4 specifications were provided to ISO TC 215 in 2008 by the openEHR Foundation and became the ISO 13606-2 standard, extant until 2019. ISO TC 215 accepted the AOM 2 specification as the basis for a revision of this standard, which was issued in 2019. In late 2015, the Object Management Group (OMG) accepted an RfP entitled 'Archetype Modeling Language (AML)' as a new candidate standard. This specification is a form of ADL re-engineered as a UML profile so as to enable archetype modelling to be supported within UML tools. == Tools == A number of tools area available for working with archetypes. Most are listed on the openEHR modelling tools page. They include: ADL Designer, a modern AOM2-based web editing application Archetype Editor, an original desktop clinical modelling tool Template Designer, an original desktop clinical templating tool LinkEHR, an archetype and data integration tool ADL Workbench, reference compiler and visualiser tool == Example ==

    Read more →
  • ISO 15926

    ISO 15926

    ISO 15926 is a standard for data integration, sharing, exchange, and hand-over between computer systems. The title, "Industrial automation systems and integration—Integration of life-cycle data for process plants including oil and gas production facilities", is regarded too narrow by the present ISO 15926 developers. Having developed a generic data model and reference data library for process plants, it turned out that this subject is already so wide, that actually any state information may be modelled with it. == History == In 1991 a European Union ESPRIT-, named ProcessBase, started. The focus of this research project was to develop a data model for lifecycle information of a facility that would suit the requirements of the process industries. At the time that the project duration had elapsed, a consortium of companies involved in the process industries had been established: EPISTLE (European Process Industries STEP Technical Liaison Executive). Initially individual companies were members, but later this changed into a situation where three national consortia were the only members: PISTEP (UK), POSC/Caesar (Norway), and USPI-NL (Netherlands). (later PISTEP merged into POSC/Caesar, and USPI-NL was renamed to USPI). EPISTLE took over the work of the ProcessBase project. Initially this work involved a standard called ISO 10303-221 (referred to as "STEP AP221"). In that AP221 we saw, for the first time, an Annex M with a list of standard instances of the AP221 data model, including types of objects. These standard instances would be for reference and would act as a knowledge base with knowledge about the types of objects. In the early nineties EPISTLE started an activity to extend Annex M to become a library of such object classes and their relationships: STEPlib. In the STEPlib activities a group of approx. 100 domain experts from all three member consortia, spread over the various expertises (e.g. Electrical, Piping, Rotating equipment, etc.), worked together to define the "core classes". The development of STEPlib was extended with many additional classes and relationships between classes and published as Open source data. Furthermore, the concepts and relation types from the AP221 and ISO 15926-2 data models were also added to the STEPlib dictionary. This resulted in the development of Gellish English, whereas STEPlib became the Gellish English dictionary. Gellish English is a structured subset of natural English and is a modeling language suitable for knowledge modeling, product modeling and data exchange. It differs from conventional modeling languages (meta languages) as used in information technology as it not only defines generic concepts, but also includes an English dictionary. The semantic expression capability of Gellish English was significantly increased by extending the number of relation types that can be used to express knowledge and information. For modelling-technical reasons POSC/Caesar proposed another standard than ISO 10303, called ISO 15926. EPISTLE (and ISO) supported that proposal, and continued the modelling work, thereby writing Part 2 of ISO 15926. This Part 2 has official ISO IS (International Standard) status since 2003. POSC/Caesar started to put together their own RDL (Reference Data Library). They added many specialized classes, for example for ANSI (American National Standards Institute) pipe and pipe fittings. Meanwhile, STEPlib continued its existence, mainly driven by some members of USPI. Since it was clear that it was not in the interest of the industry to have two libraries for, in essence, the same set of classes, the Management Board of EPISTLE decided that the core classes of the two libraries shall be merged into Part 4 of ISO 15926. This merging process has been finished. Part 4 should act as reference data for part 2 of ISO 15926 as well as for ISO 10303-221 and replaced its Annex M. On June 5, 2007 ISO 15926-4 was signed off as a TS (Technical Specification). In 1999 the work on an earlier version of Part 7 started. Initially this was based on XML Schema (the only useful W3C Recommendation available then), but when Web Ontology Language (OWL) became available it was clear that provided a far more suitable environment for Part 7. Part 7 passed the first ISO ballot by the end of 2005, and an implementation project started. A formal ballot for TS (Technical Specification) was planned for December 2007. However, it was decided then to split Part 7 into more than one part, because the scope was too wide. == Need for ISO15926 == In 2004, the National Institute of Standards and Technology (NIST) released a report on the impact of the lack of digital interoperability in the capital projects industry. The report estimated the cost of inadequate interoperability in the U.S. capital facilities industry to be $15.8 billion per year. This was considered likely to be a conservative figure. == The standard == ISO 15926 has thirteen parts (as of February 2022): Part 1 - Overview and fundamental principles Part 2 - Data model Part 3 - Reference data for geometry and topology Part 4 - Reference Data, the terms used within facilities for the process industry Part 6 - Methodology for the development and validation of reference data (under development) Part 7 - Template methodology Part 8 - OWL/RDF implementation Part 9 - Implementation standards, with the focus on standard web servers, web services, and security (under development) Part 10 - Conformance testing Part 11 - Methodology for simplified industrial usage of reference data (under development) Part 12 - Life cycle integration ontology in Web Ontology Language (OWL2) Part 13 - Integrated lifecycle asset planning === Description === The model and the library are suitable for representing lifecycle information about technical installations and their components. They can also be used for defining the terms used in product catalogs in e-commerce. Another, more limited, use of the standard is as a reference classification for harmonization purposes between shared databases and product catalogues that are not based on ISO 15926. The purpose of ISO 15926 is to provide a Lingua Franca for computer systems, thereby integrating the information produced by them. Although set up for the process industries with large projects involving many parties, and involving plant operations and maintenance lasting decades, the technology can be used by anyone willing to set up a proper vocabulary of reference data in line with Part 4. In Part 7 the concept of Templates is introduced. These are semantic constructs, using Part 2 entities, that represent a small piece of information. These constructs then are mapped to more efficient classes of n-ary relations that interlink the Nodes that are involved in the represented information. In Part 8 the Part 7 Templates are defined in OWL and instantiated in RDF. For validation and reasoning purposes all are represented in First-Order Logic as well. In Part 9 these Node and Template instances are stored in an RDF triple store, set up to a standard schema and an API. Each participating computer system maps its data from its internal format to such ISO-standard Node and Template instances. Data can be "handed over" from one triple store to another in cases where data custodianship is handed over (e.g. from a contractor to a plant owner, or from a manufacturer to the owners of the manufactured goods). Hand-over can be for a part of all data, whilst maintaining full referential integrity. Documents are user-definable. They are defined in XML Schema and they are, in essence, only a structure containing cells that make reference to instances of Templates. This represents a view on all lifecycle data: since the data model is a 4D (space-time) model, it is possible to present the data that was valid at any given point in time, thus providing a true historical record. It is expected that this will be used for Knowledge Mining. Data can be queried by means of SPARQL. In any implementation a restricted number of triple stores can be involved, with different access rights. This is done by means of creating a CPF Server (= Confederation of Participating Façades). An Ontology Browser allows for access to one or more triple stores in a given CPF, depending on the access rights. == Projects and applications == There are a number of projects working on the extension of the ISO 15926 standard in different application areas. === Capital-intensive projects === Within the application of Capital Intensive projects, some cooperating implementation projects are running: The DEXPI project: The objective of DEXPI is to develop and promote a general standard for the process industry covering all phases of the lifecycle of a (petro-)chemical plant, ranging from specification of functional requirements to assets in operation. Finalised projects include: The EDRC Project of FIATECH Capturing Equipment Data Requirements Using ISO 15926 and Assessing Conforma

    Read more →
  • Algorithm engineering

    Algorithm engineering

    Algorithm engineering focuses on the design, analysis, implementation, optimization, profiling and experimental evaluation of computer algorithms, bridging the gap between algorithmics theory and practical applications of algorithms in software engineering. It is a general methodology for algorithmic research. == Origins == In 1995, a report from an NSF-sponsored workshop "with the purpose of assessing the current goals and directions of the Theory of Computing (TOC) community" identified the slow speed of adoption of theoretical insights by practitioners as an important issue and suggested measures to reduce the uncertainty by practitioners whether a certain theoretical breakthrough will translate into practical gains in their field of work, and tackle the lack of ready-to-use algorithm libraries, which provide stable, bug-free and well-tested implementations for algorithmic problems and expose an easy-to-use interface for library consumers. But also, promising algorithmic approaches have been neglected due to difficulties in mathematical analysis. The term "algorithm engineering" was first used with specificity in 1997, with the first Workshop on Algorithm Engineering (WAE97), organized by Giuseppe F. Italiano. == Difference from algorithm theory == Algorithm engineering does not intend to replace or compete with algorithm theory, but tries to enrich, refine and reinforce its formal approaches with experimental algorithmics (also called empirical algorithmics). This way it can provide new insights into the efficiency and performance of algorithms in cases where the algorithm at hand is less amenable to algorithm theoretic analysis, formal analysis pessimistically suggests bounds which are unlikely to appear on inputs of practical interest, the algorithm relies on the intricacies of modern hardware architectures like data locality, branch prediction, instruction stalls, instruction latencies which the machine model used in Algorithm Theory is unable to capture in the required detail, the crossover between competing algorithms with different constant costs and asymptotic behaviors needs to be determined. == Methodology == Some researchers describe algorithm engineering's methodology as a cycle consisting of algorithm design, analysis, implementation and experimental evaluation, joined by further aspects like machine models or realistic inputs. They argue that equating algorithm engineering with experimental algorithmics is too limited, because viewing design and analysis, implementation and experimentation as separate activities ignores the crucial feedback loop between those elements of algorithm engineering. === Realistic models and real inputs === While specific applications are outside the methodology of algorithm engineering, they play an important role in shaping realistic models of the problem and the underlying machine, and supply real inputs and other design parameters for experiments. === Design === Compared to algorithm theory, which usually focuses on the asymptotic behavior of algorithms, algorithm engineers need to keep further requirements in mind: Simplicity of the algorithm, implementability in programming languages on real hardware, and allowing code reuse. Additionally, constant factors of algorithms have such a considerable impact on real-world inputs that sometimes an algorithm with worse asymptotic behavior performs better in practice due to lower constant factors. === Analysis === Some problems can be solved with heuristics and randomized algorithms in a simpler and more efficient fashion than with deterministic algorithms. Unfortunately, this makes even simple randomized algorithms difficult to analyze because there are subtle dependencies to be taken into account. === Implementation === Huge semantic gaps between theoretical insights, formulated algorithms, programming languages and hardware pose a challenge to efficient implementations of even simple algorithms, because small implementation details can have rippling effects on execution behavior. The only reliable way to compare several implementations of an algorithm is to spend an considerable amount of time on tuning and profiling, running those algorithms on multiple architectures, and looking at the generated machine code. === Experiments === See: Experimental algorithmics === Application engineering === Implementations of algorithms used for experiments differ in significant ways from code usable in applications. While the former prioritizes fast prototyping, performance and instrumentation for measurements during experiments, the latter requires thorough testing, maintainability, simplicity, and tuning for particular classes of inputs. === Algorithm libraries === Stable, well-tested algorithm libraries like LEDA play an important role in technology transfer by speeding up the adoption of new algorithms in applications. Such libraries reduce the required investment and risk for practitioners, because it removes the burden of understanding and implementing the results of academic research. == Conferences == Two main conferences on Algorithm Engineering are organized annually, namely: Symposium on Experimental Algorithms (SEA), established in 1997 (formerly known as WEA). SIAM Meeting on Algorithm Engineering and Experiments (ALENEX), established in 1999. The 1997 Workshop on Algorithm Engineering (WAE'97) was held in Venice (Italy) on September 11–13, 1997. The Third International Workshop on Algorithm Engineering (WAE'99) was held in London, UK in July 1999. The first Workshop on Algorithm Engineering and Experimentation (ALENEX99) was held in Baltimore, Maryland on January 15–16, 1999. It was sponsored by DIMACS, the Center for Discrete Mathematics and Theoretical Computer Science (at Rutgers University), with additional support from SIGACT, the ACM Special Interest Group on Algorithms and Computation Theory, and SIAM, the Society for Industrial and Applied Mathematics.

    Read more →
  • Convolutional neural network

    Convolutional neural network

    A convolutional neural network (CNN) is a type of feedforward neural network that learns features via filter (or kernel) optimization. This type of deep learning network has been applied to process and make predictions from many different types of data including text, images and audio. CNNs are the de-facto standard in deep learning-based approaches to computer vision and image processing, and have only recently been replaced—in some cases—by newer architectures such as the transformer. Vanishing gradients and exploding gradients, seen during backpropagation in earlier neural networks, are prevented by the regularization that comes from using shared weights over fewer connections. For example, for each neuron in the fully-connected layer, 10,000 weights would be required for processing an image sized 100 × 100 pixels. However, applying cascaded convolution (or cross-correlation) kernels, only 25 weights for each convolutional layer are required to process 5x5-sized tiles. Higher-layer features are extracted from wider context windows, compared to lower-layer features. Some applications of CNNs include: image and video recognition, recommender systems, image classification, image segmentation, medical image analysis, natural language processing, brain–computer interfaces, and financial time series. CNNs are also known as shift invariant or space invariant artificial neural networks, based on the shared-weight architecture of the convolution kernels or filters that slide along input features and provide translation-equivariant responses known as feature maps. Counter-intuitively, most convolutional neural networks are not invariant to translation, due to the downsampling operation they apply to the input. Feedforward neural networks are usually fully connected networks, that is, each neuron in one layer is connected to all neurons in the next layer. The "full connectivity" of these networks makes them prone to overfitting data. Typical ways of regularization, or preventing overfitting, include: penalizing parameters during training (such as weight decay) or trimming connectivity (skipped connections, dropout, etc.) Robust datasets also increase the probability that CNNs will learn the generalized principles that characterize a given dataset rather than the biases of a poorly-populated set. Convolutional networks were inspired by biological processes in that the connectivity pattern between neurons resembles the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field. CNNs use relatively little pre-processing compared to other image classification algorithms. This means that the network learns to optimize the filters (or kernels) through automated learning, whereas in traditional algorithms these filters are hand-engineered. This simplifies and automates the process, enhancing efficiency and scalability overcoming human-intervention bottlenecks. == Architecture == A convolutional neural network consists of an input layer, hidden layers and an output layer. In a convolutional neural network, the hidden layers include one or more layers that perform convolutions. Typically this includes a layer that performs a dot product of the convolution kernel with the layer's input matrix. This product is usually the Frobenius inner product, and its activation function is commonly ReLU. As the convolution kernel slides along the input matrix for the layer, the convolution operation generates a feature map, which in turn contributes to the input of the next layer. This is followed by other layers such as pooling layers, fully connected layers, and normalization layers. Here it should be noted how close a convolutional neural network is to a matched filter. === Convolutional layers === In a CNN, the input is a tensor with shape: (number of inputs) × (input height) × (input width) × (input channels) After passing through a convolutional layer, the image becomes abstracted to a feature map, also called an activation map, with shape: (number of inputs) × (feature map height) × (feature map width) × (feature map channels). Convolutional layers convolve the input and pass its result to the next layer. This is similar to the response of a neuron in the visual cortex to a specific stimulus. Each convolutional neuron processes data only for its receptive field. Although fully connected feedforward neural networks can be used to learn features and classify data, this architecture is generally impractical for larger inputs (e.g., high-resolution images), which would require massive numbers of neurons because each pixel is a relevant input feature. A fully connected layer for an image of size 100 × 100 has 10,000 weights for each neuron in the second layer. Convolution reduces the number of free parameters, allowing the network to be deeper. For example, using a 5 × 5 tiling region, each with the same shared weights, requires only 25 neurons. Using shared weights means there are many fewer parameters, which helps avoid the vanishing gradients and exploding gradients problems seen during backpropagation in earlier neural networks. To speed processing, standard convolutional layers can be replaced by depthwise separable convolutional layers, which are based on a depthwise convolution followed by a pointwise convolution. The depthwise convolution is a spatial convolution applied independently over each channel of the input tensor, while the pointwise convolution is a standard convolution restricted to the use of 1 × 1 {\displaystyle 1\times 1} kernels. === Pooling layers === Convolutional networks may include local and/or global pooling layers along with traditional convolutional layers. Pooling layers reduce the dimensions of data by combining the outputs of neuron clusters at one layer into a single neuron in the next layer. Local pooling combines small clusters, tiling sizes such as 2 × 2 are commonly used. Global pooling acts on all the neurons of the feature map. There are two common types of pooling in popular use: max and average. Max pooling uses the maximum value of each local cluster of neurons in the feature map, while average pooling takes the average value. === Fully connected layers === Fully connected layers connect every neuron in one layer to every neuron in another layer. It is the same as a traditional multilayer perceptron neural network (MLP). Each neuron in the fully connected layer receives input from all the neurons in the previous layer. These inputs are weighted and summed with the corresponding biases, and then passed through an activation function to perform a nonlinear transformation, generating the output. The flattened matrix goes through a fully connected layer to classify the images. === Receptive field === In neural networks, each neuron receives input from some number of locations in the previous layer. In a convolutional layer, each neuron receives input from only a restricted area of the previous layer called the neuron's receptive field. Typically the area is a square (e.g. 5 by 5 neurons). Whereas, in a fully connected layer, the receptive field is the entire previous layer. Thus, in each convolutional layer, each neuron takes input from a larger area in the input than previous layers. This is due to applying the convolution over and over, which takes the value of a pixel into account, as well as its surrounding pixels. When using dilated layers, the number of pixels in the receptive field remains constant, but the field is more sparsely populated as its dimensions grow when combining the effect of several layers. To manipulate the receptive field size as desired, there are some alternatives to the standard convolutional layer. For example, atrous or dilated convolution expands the receptive field size without increasing the number of parameters by interleaving visible and blind regions. Moreover, a single dilated convolutional layer can comprise filters with multiple dilation ratios, thus having a variable receptive field size. === Weights === Each neuron in a neural network computes an output value by applying a specific function to the input values received from the receptive field in the previous layer. The function that is applied to the input values is determined by a vector of weights and a bias (typically real numbers). Learning consists of iteratively adjusting these biases and weights. The vectors of weights and biases are called filters and represent particular features of the input (e.g., a particular shape). A distinguishing feature of CNNs is that many neurons can share the same filter. This reduces the memory footprint because a single bias and a single vector of weights are used across all receptive fields that share that filter, as opposed to each receptive field having its own bias and vector

    Read more →
  • Small Data

    Small Data

    Small Data: the Tiny Clues that Uncover Huge Trends is Martin Lindstrom's seventh book. It chronicles his work as a branding expert, working with consumers across the world to better understand their behavior. The theory behind the book is that businesses can better create products and services based on observing consumer behavior in their homes, as opposed to relying solely on big data. == Content == The book is based on a several year period of consumer studies for major corporations across the globe. It features case studies of the author's work interviewing consumers in their homes and using his observations to create hypotheses as to why they use products the way that they do. == Public reception == The book was a New York Times Bestseller upon release and was positively reviewed on several websites, Including Entrepreneur and Forbes. In 2016, it was named a Best Business Book by strategy+business and one of Inc. Magazine's Best Sales and Marketing books.

    Read more →
  • EdgeRank

    EdgeRank

    EdgeRank is the name commonly given to the algorithm that Facebook uses to determine what articles should be displayed in a user's News Feed. As of 2011, Facebook has stopped using the EdgeRank system and uses a machine learning algorithm that, as of 2013, takes more than 100,000 factors into account. EdgeRank was developed and implemented by Serkan Piantino. == Formula and factors == In 2010, a simplified version of the EdgeRank algorithm was presented as: ∑ e d g e s e u e w e d e {\displaystyle \sum _{\mathrm {edges\,} e}u_{e}w_{e}d_{e}} where: u e {\displaystyle u_{e}} is user affinity. w e {\displaystyle w_{e}} is how the content is weighted. d e {\displaystyle d_{e}} is a time-based decay parameter. User Affinity: The User Affinity part of the algorithm in Facebook's EdgeRank looks at the relationship and proximity of the user and the content (post/status update). Content Weight: What action was taken by the user on the content. Time-Based Decay Parameter: New or old. Newer posts tend to hold a higher place than older posts. Some of the methods that Facebook uses to adjust the parameters are proprietary and not available to the public. A study has shown that it is possible to hypothesize a disadvantage of the "like" reaction and advantages of other interactions (e.g., the "haha" reaction or "comments") in content algorithmic ranking on Facebook. The "like" button can decrease the organic reach as a "brake effect of viral reach". The "haha" reaction, "comments" and the "love" reaction could achieve the highest increase in total organic reach. == Impact == EdgeRank and its successors have a broad impact on what users actually see out of what they ostensibly follow: for instance, the selection can produce a filter bubble (if users are exposed to updates which confirm their opinions etc.) or alter people's mood (if users are shown a disproportionate amount of positive or negative updates). As a result, for Facebook pages, the typical engagement rate is less than 1% (or less than 0.1% for the bigger ones), and organic reach 10% or less for most non-profits. As a consequence, for pages, it may be nearly impossible to reach any significant audience without paying to promote their content.

    Read more →
  • DPVweb

    DPVweb

    DPVweb is a database for virologists working on plant viruses combining taxonomic, bioinformatic and symptom data. == Description == DPVweb is a central web-based source of information about viruses, viroids and satellites of plants, fungi and protozoa. It provides comprehensive taxonomic information, including brief descriptions of each family and genus, and classified lists of virus sequences. It makes use of a large database that also holds detailed, curated, information for all sequences of viruses, viroids and satellites of plants, fungi and protozoa that are complete or that contain at least one complete gene. There are currently about 10,000 such sequences. For comparative purposes, DPVweb also contains a representative sequence of all other fully sequenced virus species with an RNA or single-stranded DNA genome. For each curated sequence the database contains the start and end positions of each feature (gene, non-translated region, etc.), and these have been checked for accuracy. As far as possible, the nomenclature for genes and proteins are standardized within genera and families. Sequences of features (either as DNA or amino acid sequences) can be directly downloaded from the website in FASTA format. The sequence information can also be accessed via client software for personal computers. == History == The Descriptions of Plant Viruses (DPVs) were first published by the Association of Applied Biologists in 1970 as a series of leaflets, each one written by an expert describing a particular plant virus. In 1998 all of the 354 DPVs published in paper were scanned, and converted into an electronic format in a database and distributed on CDROM. In 2001 the descriptions were made available on the new DPVweb site, providing open access to the now 400+ DPVs (currently 415) as well as taxonomic and sequence data on all plant viruses. == Uses == DPVweb is an aid to researchers in the field of plant virology as well as an educational resource for students of virology and molecular biology. The site provides a single point of access for all known plant virus genome sequences making it easy to collect these sequences together for further analysis and comparison. Sequence data from the DPVweb database have proved valuable for a number of projects: survey of codon usage bias amongst all plant viruses, two-way comparisons between comprehensive sets of sequences from the families Flexiviridae and Potyviridae that have helped inform taxonomy and clarify genus and species discrimination criteria, a survey and verification of the polyprotein cleavage sites within the family Potyviridae.

    Read more →
  • Coupled pattern learner

    Coupled pattern learner

    Coupled Pattern Learner (CPL) is a machine learning algorithm which couples the semi-supervised learning of categories and relations to forestall the problem of semantic drift associated with boot-strap learning methods. == Coupled Pattern Learner == Semi-supervised learning approaches using a small number of labeled examples with many unlabeled examples are usually unreliable as they produce an internally consistent, but incorrect set of extractions. CPL solves this problem by simultaneously learning classifiers for many different categories and relations in the presence of an ontology defining constraints that couple the training of these classifiers. It was introduced by Andrew Carlson, Justin Betteridge, Estevam R. Hruschka Jr. and Tom M. Mitchell in 2009. == CPL overview == CPL is an approach to semi-supervised learning that yields more accurate results by coupling the training of many information extractors. Basic idea behind CPL is that semi-supervised training of a single type of extractor such as ‘coach’ is much more difficult than simultaneously training many extractors that cover a variety of inter-related entity and relation types. Using prior knowledge about the relationships between these different entities and relations CPL makes unlabeled data as a useful constraint during training. For e.g., ‘coach(x)’ implies ‘person(x)’ and ‘not sport(x)’. == CPL description == === Coupling of predicates === CPL primarily relies on the notion of coupling the learning of multiple functions so as to constrain the semi-supervised learning problem. CPL constrains the learned function in two ways. Sharing among same-arity predicates according to logical relations Relation argument type-checking === Sharing among same-arity predicates === Each predicate P in the ontology has a list of other same-arity predicates with which P is mutually exclusive. If A is mutually exclusive with predicate B, A’s positive instances and patterns become negative instances and negative patterns for B. For example, if ‘city’, having an instance ‘Boston’ and a pattern ‘mayor of arg1’, is mutually exclusive with ‘scientist’, then ‘Boston’ and ‘mayor of arg1’ will become a negative instance and a negative pattern respectively for ‘scientist.’ Further, Some categories are declared to be a subset of another category. For e.g., ‘athlete’ is a subset of ‘person’. === Relation argument type-checking === This is a type checking information used to couple the learning of relations and categories. For example, the arguments of the ‘ceoOf’ relation are declared to be of the categories ‘person’ and ‘company’. CPL does not promote a pair of noun phrases as an instance of a relation unless the two noun phrases are classified as belonging to the correct argument types. === Algorithm description === Following is a quick summary of the CPL algorithm. Input: An ontology O, and a text corpus C Output: Trusted instances/patterns for each predicate for i=1,2,...,∞ do foreach predicate p in O do EXTRACT candidate instances/contextual patterns using recently promoted patterns/instances; FILTER candidates that violate coupling; RANK candidate instances/patterns; PROMOTE top candidates; end end ==== Inputs ==== A large corpus of Part-Of-Speech tagged sentences and an initial ontology with predefined categories, relations, mutually exclusive relationships between same-arity predicates, subset relationships between some categories, seed instances for all predicates, and seed patterns for the categories. ==== Candidate extraction ==== CPL finds new candidate instances by using newly promoted patterns to extract the noun phrases that co-occur with those patterns in the text corpus. CPL extracts, Category Instances Category Patterns Relation Instances Relation Patterns ==== Candidate filtering ==== Candidate instances and patterns are filtered to maintain high precision, and to avoid extremely specific patterns. An instance is only considered for assessment if it co-occurs with at least two promoted patterns in the text corpus, and if its co-occurrence count with all promoted patterns is at least three times greater than its co-occurrence count with negative patterns. ==== Candidate ranking ==== CPL ranks candidate instances using the number of promoted patterns that they co-occur with so that candidates that occur with more patterns are ranked higher. Patterns are ranked using an estimate of the precision of each pattern. ==== Candidate promotion ==== CPL ranks the candidates according to their assessment scores and promotes at most 100 instances and 5 patterns for each predicate. Instances and patterns are only promoted if they co-occur with at least two promoted patterns or instances, respectively. == Meta-Bootstrap Learner == Meta-Bootstrap Learner (MBL) was also proposed by the authors of CPL. Meta-Bootstrap learner couples the training of multiple extraction techniques with a multi-view constraint, which requires the extractors to agree. It makes addition of coupling constraints on top of existing extraction algorithms, while treating them as black boxes, feasible. MBL assumes that the errors made by different extraction techniques are independent. Following is a quick summary of MBL. Input: An ontology O, a set of extractors ε Output: Trusted instances for each predicate for i=1,2,...,∞ do foreach predicate p in O do foreach extractor e in ε do Extract new candidates for p using e with recently promoted instances; end FILTER candidates that violate mutual-exclusion or type-checking constraints; PROMOTE candidates that were extracted by all extractors; end end Subordinate algorithms used with MBL do not promote any instance on their own, they report the evidence about each candidate to MBL and MBL is responsible for promoting instances. == Applications == In their paper authors have presented results showing the potential of CPL to contribute new facts to existing repository of semantic knowledge, Freebase

    Read more →
  • List of artificial intelligence projects

    List of artificial intelligence projects

    The following is a list of current and past, non-classified notable artificial intelligence projects. == Specialized projects == === Brain-inspired === Blue Brain Project, an attempt to create a synthetic brain by reverse-engineering the mammalian brain down to the molecular level. Google Brain, a deep learning project part of Google X attempting to have intelligence similar or equal to human-level. Human Brain Project, ten-year scientific research project, based on exascale supercomputers. === Cognitive architectures === 4CAPS, developed at Carnegie Mellon University under Marcel A. Just ACT-R, developed at Carnegie Mellon University under John R. Anderson. AIXI, Universal Artificial Intelligence developed by Marcus Hutter at IDSIA and ANU. CALO, a DARPA-funded, 25-institution effort to integrate many artificial intelligence approaches (natural language processing, speech recognition, machine vision, probabilistic logic, planning, reasoning, many forms of machine learning) into an AI assistant that learns to help manage your office environment. CHREST, developed under Fernand Gobet at Brunel University and Peter C. Lane at the University of Hertfordshire. CLARION, developed under Ron Sun at Rensselaer Polytechnic Institute and University of Missouri. CoJACK, an ACT-R inspired extension to the JACK multi-agent system that adds a cognitive architecture to the agents for eliciting more realistic (human-like) behaviors in virtual environments. Copycat, by Douglas Hofstadter and Melanie Mitchell at the Indiana University. DUAL, developed at the New Bulgarian University under Boicho Kokinov. FORR developed by Susan L. Epstein at The City University of New York. IDA and LIDA, implementing Global Workspace Theory, developed under Stan Franklin at the University of Memphis. OpenCog Prime, developed using the OpenCog Framework. Procedural Reasoning System (PRS), developed by Michael Georgeff and Amy L. Lansky at SRI International. Psi-Theory developed under Dietrich Dörner at the Otto-Friedrich University in Bamberg, Germany. Soar, developed under Allen Newell and John Laird at Carnegie Mellon University and the University of Michigan. Society of Mind and its successor The Emotion Machine proposed by Marvin Minsky. Subsumption architectures, developed e.g. by Rodney Brooks (though it could be argued whether they are cognitive). === Games === AlphaGo, software developed by Google that plays the Chinese board game Go. Chinook, a computer program that plays English draughts; the first to win the world champion title in the competition against humans. Deep Blue, a chess-playing computer developed by IBM which beat Garry Kasparov in 1997. Halite, an artificial intelligence programming competition created by Two Sigma in 2016. Libratus, a poker AI that beat world-class poker players in 2017, intended to be generalisable to other applications. The Matchbox Educable Noughts and Crosses Engine (sometimes called the Machine Educable Noughts and Crosses Engine or MENACE) was a mechanical computer made from 304 matchboxes designed and built by artificial intelligence researcher Donald Michie in 1961. Quick, Draw!, an online game developed by Google that challenges players to draw a picture of an object or idea and then uses a neural network to guess what the drawing is. The Samuel Checkers-playing Program (1959) was among the world's first successful self-learning programs, and as such a very early demonstration of the fundamental concept of artificial intelligence (AI). Stockfish AI, an open source chess engine currently ranked the highest in many computer chess rankings. TD-Gammon, a program that learned to play world-class backgammon partly by playing against itself (temporal difference learning with neural networks). === Internet activism === Serenata de Amor, project for the analysis of public expenditures and detect discrepancies. === Knowledge and reasoning === Alice (Microsoft), a project from Microsoft Research Lab aimed at improving decision-making in Economics Braina, an intelligent personal assistant application with a voice interface for Windows OS. Cyc, an attempt to assemble an ontology and database of everyday knowledge, enabling human-like reasoning. Eurisko, a language by Douglas Lenat for solving problems which consists of heuristics, including some for how to use and change its heuristics. Google Now, an intelligent personal assistant with a voice interface in Google's Android and Apple Inc.'s iOS, as well as Google Chrome web browser on personal computers. Holmes a new AI created by Wipro. Microsoft Cortana, an intelligent personal assistant with a voice interface in Microsoft's various Windows 10 editions. MindsDB, is an AI automation platform for building AI/ML powered features and applications. Mycin, an early medical expert system. Open Mind Common Sense, a project based at the MIT Media Lab to build a large common sense knowledge base from online contributions. Siri, an intelligent personal assistant and knowledge navigator with a voice-interface in Apple Inc.'s iOS and macOS. SNePS, simultaneously a logic-based, frame-based, and network-based knowledge representation, reasoning, and acting system. Viv (software), a new AI by the creators of Siri. Wolfram Alpha, an online service that answers queries by computing the answer from structured data. === Motion and manipulation === AIBO, the robot pet for the home, grew out of Sony's Computer Science Laboratory (CSL). Cog, a robot developed by MIT to study theories of cognitive science and artificial intelligence, now discontinued. === Music === Melomics, a bioinspired technology for music composition and synthesization of music, where computers develop their own style, rather than mimic musicians. === Natural language processing === AIML, an XML dialect for creating natural language software agents. Apache Lucene, a high-performance, full-featured text search engine library written entirely in Java. Apache OpenNLP, a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking and parsing. Artificial Linguistic Internet Computer Entity (A.L.I.C.E.), a natural language processing chatterbot. ChatGPT, a chatbot built on top of OpenAI's GPT-3.5 and GPT-4 family of large language models. Claude, a family of large language models developed by Anthropic and launched in 2023. Claude LLMs achieved high coding scores in several recognized LLM benchmarks. Cleverbot, successor to Jabberwacky, now with 170m lines of conversation, Deep Context, fuzziness and parallel processing. Cleverbot learns from around 2 million user interactions per month. DeepSeek: Chinese chatbot funded by hedge fund High-Flyer. DBRX, 136 billion parameter open sourced large language model developed by Mosaic ML and Databricks. ELIZA, a famous 1966 computer program by Joseph Weizenbaum, which parodied person-centered therapy. FreeHAL, a self-learning conversation simulator (chatterbot) which uses semantic nets to organize its knowledge to imitate a very close human behavior within conversations. Gemini, a family of multimodal large language model developed by Google's DeepMind. Drives the Gemini chatbot, formerly known as Bard. GigaChat, a chatbot by Russian Sberbank. GPT-3, a 2020 language model developed by OpenAI that can produce text difficult to distinguish from that written by a human. Jabberwacky, a chatbot by Rollo Carpenter, aiming to simulate natural human chat. LaMDA, a family of conversational neural language models developed by Google. LLaMA, a 2023 language model family developed by Meta that includes 7, 13, 33 and 65 billion parameter models.[1] Mycroft, a free and open-source intelligent personal assistant that uses a natural language user interface. PARRY, another early chatterbot, written in 1972 by Kenneth Colby, attempting to simulate a paranoid schizophrenic. SHRDLU, an early natural language processing computer program developed by Terry Winograd at MIT from 1968 to 1970. SYSTRAN, a machine translation technology by the company of the same name, used by Yahoo!, AltaVista and Google, among others. === Speech recognition === CMU Sphinx, a group of speech recognition systems developed at Carnegie Mellon University. DeepSpeech, an open-source Speech-To-Text engine based on Baidu's deep speech research paper. Whisper, an open-source speech recognition system developed at OpenAI. === Speech synthesis === 15.ai, a real-time artificial intelligence text-to-speech tool developed by an anonymous researcher from MIT. Amazon Polly, a speech synthesis software by Amazon. Festival Speech Synthesis System, a general multi-lingual speech synthesis system developed at the Centre for Speech Technology Research (CSTR) at the University of Edinburgh. WaveNet, a deep neural network for generating raw audio. === Video === CapCut is a video editor tool, developed

    Read more →
  • ISO 15926

    ISO 15926

    ISO 15926 is a standard for data integration, sharing, exchange, and hand-over between computer systems. The title, "Industrial automation systems and integration—Integration of life-cycle data for process plants including oil and gas production facilities", is regarded too narrow by the present ISO 15926 developers. Having developed a generic data model and reference data library for process plants, it turned out that this subject is already so wide, that actually any state information may be modelled with it. == History == In 1991 a European Union ESPRIT-, named ProcessBase, started. The focus of this research project was to develop a data model for lifecycle information of a facility that would suit the requirements of the process industries. At the time that the project duration had elapsed, a consortium of companies involved in the process industries had been established: EPISTLE (European Process Industries STEP Technical Liaison Executive). Initially individual companies were members, but later this changed into a situation where three national consortia were the only members: PISTEP (UK), POSC/Caesar (Norway), and USPI-NL (Netherlands). (later PISTEP merged into POSC/Caesar, and USPI-NL was renamed to USPI). EPISTLE took over the work of the ProcessBase project. Initially this work involved a standard called ISO 10303-221 (referred to as "STEP AP221"). In that AP221 we saw, for the first time, an Annex M with a list of standard instances of the AP221 data model, including types of objects. These standard instances would be for reference and would act as a knowledge base with knowledge about the types of objects. In the early nineties EPISTLE started an activity to extend Annex M to become a library of such object classes and their relationships: STEPlib. In the STEPlib activities a group of approx. 100 domain experts from all three member consortia, spread over the various expertises (e.g. Electrical, Piping, Rotating equipment, etc.), worked together to define the "core classes". The development of STEPlib was extended with many additional classes and relationships between classes and published as Open source data. Furthermore, the concepts and relation types from the AP221 and ISO 15926-2 data models were also added to the STEPlib dictionary. This resulted in the development of Gellish English, whereas STEPlib became the Gellish English dictionary. Gellish English is a structured subset of natural English and is a modeling language suitable for knowledge modeling, product modeling and data exchange. It differs from conventional modeling languages (meta languages) as used in information technology as it not only defines generic concepts, but also includes an English dictionary. The semantic expression capability of Gellish English was significantly increased by extending the number of relation types that can be used to express knowledge and information. For modelling-technical reasons POSC/Caesar proposed another standard than ISO 10303, called ISO 15926. EPISTLE (and ISO) supported that proposal, and continued the modelling work, thereby writing Part 2 of ISO 15926. This Part 2 has official ISO IS (International Standard) status since 2003. POSC/Caesar started to put together their own RDL (Reference Data Library). They added many specialized classes, for example for ANSI (American National Standards Institute) pipe and pipe fittings. Meanwhile, STEPlib continued its existence, mainly driven by some members of USPI. Since it was clear that it was not in the interest of the industry to have two libraries for, in essence, the same set of classes, the Management Board of EPISTLE decided that the core classes of the two libraries shall be merged into Part 4 of ISO 15926. This merging process has been finished. Part 4 should act as reference data for part 2 of ISO 15926 as well as for ISO 10303-221 and replaced its Annex M. On June 5, 2007 ISO 15926-4 was signed off as a TS (Technical Specification). In 1999 the work on an earlier version of Part 7 started. Initially this was based on XML Schema (the only useful W3C Recommendation available then), but when Web Ontology Language (OWL) became available it was clear that provided a far more suitable environment for Part 7. Part 7 passed the first ISO ballot by the end of 2005, and an implementation project started. A formal ballot for TS (Technical Specification) was planned for December 2007. However, it was decided then to split Part 7 into more than one part, because the scope was too wide. == Need for ISO15926 == In 2004, the National Institute of Standards and Technology (NIST) released a report on the impact of the lack of digital interoperability in the capital projects industry. The report estimated the cost of inadequate interoperability in the U.S. capital facilities industry to be $15.8 billion per year. This was considered likely to be a conservative figure. == The standard == ISO 15926 has thirteen parts (as of February 2022): Part 1 - Overview and fundamental principles Part 2 - Data model Part 3 - Reference data for geometry and topology Part 4 - Reference Data, the terms used within facilities for the process industry Part 6 - Methodology for the development and validation of reference data (under development) Part 7 - Template methodology Part 8 - OWL/RDF implementation Part 9 - Implementation standards, with the focus on standard web servers, web services, and security (under development) Part 10 - Conformance testing Part 11 - Methodology for simplified industrial usage of reference data (under development) Part 12 - Life cycle integration ontology in Web Ontology Language (OWL2) Part 13 - Integrated lifecycle asset planning === Description === The model and the library are suitable for representing lifecycle information about technical installations and their components. They can also be used for defining the terms used in product catalogs in e-commerce. Another, more limited, use of the standard is as a reference classification for harmonization purposes between shared databases and product catalogues that are not based on ISO 15926. The purpose of ISO 15926 is to provide a Lingua Franca for computer systems, thereby integrating the information produced by them. Although set up for the process industries with large projects involving many parties, and involving plant operations and maintenance lasting decades, the technology can be used by anyone willing to set up a proper vocabulary of reference data in line with Part 4. In Part 7 the concept of Templates is introduced. These are semantic constructs, using Part 2 entities, that represent a small piece of information. These constructs then are mapped to more efficient classes of n-ary relations that interlink the Nodes that are involved in the represented information. In Part 8 the Part 7 Templates are defined in OWL and instantiated in RDF. For validation and reasoning purposes all are represented in First-Order Logic as well. In Part 9 these Node and Template instances are stored in an RDF triple store, set up to a standard schema and an API. Each participating computer system maps its data from its internal format to such ISO-standard Node and Template instances. Data can be "handed over" from one triple store to another in cases where data custodianship is handed over (e.g. from a contractor to a plant owner, or from a manufacturer to the owners of the manufactured goods). Hand-over can be for a part of all data, whilst maintaining full referential integrity. Documents are user-definable. They are defined in XML Schema and they are, in essence, only a structure containing cells that make reference to instances of Templates. This represents a view on all lifecycle data: since the data model is a 4D (space-time) model, it is possible to present the data that was valid at any given point in time, thus providing a true historical record. It is expected that this will be used for Knowledge Mining. Data can be queried by means of SPARQL. In any implementation a restricted number of triple stores can be involved, with different access rights. This is done by means of creating a CPF Server (= Confederation of Participating Façades). An Ontology Browser allows for access to one or more triple stores in a given CPF, depending on the access rights. == Projects and applications == There are a number of projects working on the extension of the ISO 15926 standard in different application areas. === Capital-intensive projects === Within the application of Capital Intensive projects, some cooperating implementation projects are running: The DEXPI project: The objective of DEXPI is to develop and promote a general standard for the process industry covering all phases of the lifecycle of a (petro-)chemical plant, ranging from specification of functional requirements to assets in operation. Finalised projects include: The EDRC Project of FIATECH Capturing Equipment Data Requirements Using ISO 15926 and Assessing Conforma

    Read more →