AI Analytics Dashboard

AI Analytics Dashboard — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • ELMo


    ELMo (embeddings from language model) is a word embedding method for representing a sequence of words as a corresponding sequence of vectors. It was created by researchers at the Allen Institute for Artificial Intelligence and the University of Washington and first released in February 2018. It is a bidirectional LSTM that takes character-level inputs and produces word-level embeddings, trained on a corpus of about 30 million sentences and 1 billion words. The architecture of ELMo accomplishes a contextual understanding of tokens. Deep contextualized word representation is useful for many natural language processing tasks, such as coreference resolution and polysemy resolution. ELMo was historically important as a pioneer of self-supervised generative pretraining followed by fine-tuning, where a large model is trained to reproduce a large corpus, then augmented with additional task-specific weights and fine-tuned on supervised task data. It was an instrumental step in the evolution towards transformer-based language modelling.

    == Architecture ==

    ELMo is a multilayered bidirectional LSTM on top of a token embedding layer. The outputs of all the LSTMs, concatenated together, make up the embedding of each token. The input text sequence is first mapped by an embedding layer into a sequence of vectors. Then two parts are run in parallel over it. The forward part is a 2-layer LSTM with 4096 units and 512-dimensional projections, and a residual connection from the first to the second layer. The backward part has the same architecture, but processes the sequence back-to-front. The outputs from all 5 components (embedding layer, two forward LSTM layers, and two backward LSTM layers) are concatenated and multiplied by a linear matrix ("projection matrix") to produce a 512-dimensional representation per input token. ELMo was pretrained on a text corpus of 1 billion words.
The forward part is trained by repeatedly predicting the next token, and the backward part is trained by repeatedly predicting the previous token. After the ELMo model is pretrained, its parameters are frozen, except for the projection matrix, which can be fine-tuned to minimize loss on specific language tasks. This is an early example of the pretrain-fine-tune paradigm. The original paper demonstrated this by improving the state of the art on six benchmark NLP tasks.

=== Contextual word representation ===

The architecture of ELMo accomplishes a contextual understanding of tokens. For example, the first forward LSTM of ELMo would process each input token in the context of all previous tokens, and the first backward LSTM would process each token in the context of all subsequent tokens. The second forward LSTM would then incorporate those to further contextualize each token. Deep contextualized word representation is useful for many natural language processing tasks, such as coreference resolution and polysemy resolution. For example, consider the sentence "She went to the bank to withdraw money." In order to represent the token "bank", the model must resolve its polysemy in context. The first forward LSTM would process "bank" in the context of "She went to the", which would allow it to represent the word as a location that the subject is going towards. The first backward LSTM would process "bank" in the context of "to withdraw money", which would allow it to disambiguate the word as referring to a financial institution. The second forward LSTM can then process "bank" using the representation vector provided by the first backward LSTM, thus allowing the model to represent the word as a financial institution that the subject is going towards.

== Historical context ==

ELMo is one link in a historical evolution of language modelling.
Consider a simple problem of document classification, where we want to assign a label (e.g., "spam", "not spam", "politics", "sports") to a given piece of text. The simplest approach is the "bag of words" approach, where each word in the document is treated independently, and its frequency is used as a feature for classification. This is computationally cheap but ignores the order of words and their context within the sentence. GloVe and Word2Vec built upon this by learning fixed vector representations (embeddings) for words based on their co-occurrence patterns in large text corpora. Like BERT (but unlike context-independent approaches such as "bag of words", Word2Vec and GloVe), ELMo word embeddings are context-sensitive, producing different representations for words that share the same spelling. ELMo was trained on a corpus of about 30 million sentences and 1 billion words. Bidirectional LSTMs had previously been used for contextualized word representation; ELMo applied the idea at a large scale, achieving state-of-the-art performance. After the 2017 publication of the Transformer architecture, the architecture of ELMo was changed from a multilayered bidirectional LSTM to a Transformer encoder, giving rise to BERT. BERT has a similar pretrain-fine-tune workflow, but uses a Transformer, which allows more parallelizable training.
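
The contrast drawn above between order-insensitive bag-of-words features and ELMo's directional context can be illustrated with a toy sketch. It does not implement an LSTM or any part of ELMo; the helper names `bag_of_words` and `directional_contexts` are hypothetical, and the code only shows which tokens each direction could see for a given position.

```python
from collections import Counter


def bag_of_words(tokens):
    """Order-insensitive word counts: the 'bag of words' feature set."""
    return Counter(tokens)


def directional_contexts(tokens, index):
    """Return (left, right) context for tokens[index].

    A forward language model sees only the tokens before the position;
    a backward one sees only the tokens after it.
    """
    return tokens[:index], tokens[index + 1:]


sentence = "she went to the bank to withdraw money".split()
shuffled = "money to withdraw the bank went she to".split()

# Bag of words cannot tell the two orderings apart.
same_bag = bag_of_words(sentence) == bag_of_words(shuffled)

# Directional contexts for "bank" in the original sentence.
left, right = directional_contexts(sentence, sentence.index("bank"))
print(same_bag)
print(left)
print(right)
```

In this sketch the left context corresponds to what the first forward LSTM would condition on ("she went to the") and the right context to what the first backward LSTM would condition on ("to withdraw money").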

  • Knowledge graph


    In knowledge representation and reasoning, a knowledge graph is a knowledge base that uses a graph-structured data model or topology to represent and operate on data. Knowledge graphs are often used to store interlinked descriptions of entities (objects, events, situations or abstract concepts) while also encoding the free-form semantics or relationships underlying these entities. Since the development of the Semantic Web, knowledge graphs have often been associated with linked open data projects, focusing on the connections between concepts and entities. They are also historically associated with and used by search engines such as Google, Bing, and Yahoo; knowledge engines and question-answering services such as WolframAlpha, Apple's Siri, and Amazon Alexa; and social networks such as LinkedIn and Facebook. Recent developments in data science and machine learning, particularly in graph neural networks and representation learning, have broadened the scope of knowledge graphs beyond their traditional use in search engines and recommender systems. They are increasingly used in scientific research, with notable applications in fields such as genomics, proteomics, and systems biology.

    == History ==

    The term was coined as early as 1972 by the Austrian linguist Edgar W. Schneider, in a discussion of how to build modular instructional systems for courses. In the late 1980s, the University of Groningen and University of Twente jointly began a project called Knowledge Graphs, focusing on the design of semantic networks with edges restricted to a limited set of relations, to facilitate algebras on the graph. In subsequent decades, the distinction between semantic networks and knowledge graphs was blurred. Some early knowledge graphs were topic-specific. In 1985, WordNet was founded, capturing semantic relationships between words and meanings, an application of this idea to language itself.
In 1998, Andrew Edmonds of Science in Finance Ltd in the UK created a system called ThinkBase that offered fuzzy-logic based reasoning in a graphical context. In 2005, Marc Wick founded GeoNames to capture relationships between different geographic names and locales and associated entities. In 2007, both DBpedia and Freebase were founded as graph-based knowledge repositories for general-purpose knowledge. DBpedia focused exclusively on data extracted from Wikipedia, while Freebase also included a range of public datasets. Neither described itself as a 'knowledge graph', but both developed and described related concepts. In 2012, Google introduced its Knowledge Graph, building on DBpedia and Freebase among other sources. Google later incorporated RDFa, Microdata, and JSON-LD content extracted from indexed web pages, including the CIA World Factbook, Wikidata, and Wikipedia. Entity and relationship types associated with this knowledge graph have been further organized using terms from the schema.org vocabulary. The Google Knowledge Graph became a complement to string-based search within Google, and its popularity online brought the term into more common use. Since then, several large multinationals have advertised their use of knowledge graphs, further popularising the term. These include Facebook, LinkedIn, Airbnb, Microsoft, Amazon, Uber and eBay. In 2019, IEEE combined its annual international conferences on "Big Knowledge" and "Data Mining and Intelligent Computing" into the International Conference on Knowledge Graph. The development of large language models expanded interest in knowledge graphs as a way to structure information from unstructured text, with advances in language processing enabling their automatic or semi-automatic generation and expansion. The term knowledge graph has since broadened to include dynamically constructed and adaptive graph structures, which support retrieval, reasoning, and summarization in generative systems.
Microsoft Research's GraphRAG (2024) exemplified this development by integrating LLM-generated graphs into retrieval-augmented generation.

== Definitions ==

There is no single commonly accepted definition of a knowledge graph. Most definitions view the topic through a Semantic Web lens and include these features: Flexible relations among knowledge in topical domains: A knowledge graph (i) defines abstract classes and relations of entities in a schema, (ii) mainly describes real world entities and their interrelations, organized in a graph, (iii) allows for potentially interrelating arbitrary entities with each other, and (iv) covers various topical domains. General structure: A network of entities, their semantic types, properties, and relationships. To represent properties, categorical or numerical values are often used. Supporting reasoning over inferred ontologies: A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge. There are, however, many knowledge graph representations for which some of these features are not relevant. For those knowledge graphs, this simpler definition may be more useful: A digital structure that represents knowledge as concepts and the relationships between them (facts). A knowledge graph can include an ontology that allows both humans and machines to understand and reason about its contents.

=== Implementations ===

In addition to the above examples, the term has been used to describe open knowledge projects such as YAGO and Wikidata; federations like the Linked Open Data cloud; a range of commercial search tools, including Yahoo's semantic search assistant Spark, Google's Knowledge Graph, and Microsoft's Satori; and the LinkedIn and Facebook entity graphs. The term is also used in the context of note-taking software applications that allow a user to build a personal knowledge graph.
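
The simpler definition given under Definitions (facts as concepts linked by relationships, optionally with an ontology that supports reasoning) can be sketched with plain triples. This is a toy illustration rather than any particular system's data model: the entity and relation names (`is_a`, `subclass_of`, `located_in`) and the single inference rule are assumptions chosen for the example.

```python
# Facts stored as (subject, predicate, object) triples.
facts = {
    ("Berlin", "is_a", "City"),
    ("Berlin", "located_in", "Germany"),
    ("City", "subclass_of", "Place"),
    ("Place", "subclass_of", "Thing"),
}


def query(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    }


def infer_types(triples):
    """Add implicit `is_a` facts by following `subclass_of` edges.

    Rule: (x is_a C) and (C subclass_of D) implies (x is_a D).
    The rule is re-applied until no new fact is derived.
    """
    derived = set(triples)
    changed = True
    while changed:
        changed = False
        for (x, _, c) in query(derived, predicate="is_a"):
            for (_, _, d) in query(derived, subject=c, predicate="subclass_of"):
                fact = (x, "is_a", d)
                if fact not in derived:
                    derived.add(fact)
                    changed = True
    return derived


explicit = query(facts, subject="Berlin", predicate="is_a")
implicit = query(infer_types(facts), subject="Berlin", predicate="is_a") - explicit
print(sorted(explicit))
print(sorted(implicit))
```

The explicit query returns only what was stored, while the inferred set also contains the types reachable through the schema layer, which is the difference between querying explicit knowledge and retrieving implicit knowledge.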
The popularization of knowledge graphs and their accompanying methods has led to the development of graph databases such as Neo4j, GraphDB and AgensGraph. These graph databases allow users to easily store data as entities and their interrelationships, and facilitate operations such as data reasoning, node embedding, and ontology development on knowledge bases. In contrast, virtual knowledge graphs do not store information in specialized databases. They rely on an underlying relational database or data lake to answer queries on the graph. Such a virtual knowledge graph system must be properly configured in order to answer the queries correctly. This configuration is done through a set of mappings that define the relationship between the elements of the data source and the structure and ontology of the virtual knowledge graph.

== Using a knowledge graph for reasoning over data ==

A knowledge graph formally represents semantics by describing entities and their relationships. Knowledge graphs may make use of ontologies as a schema layer. By doing this, they allow logical inference for retrieving implicit knowledge rather than only allowing queries requesting explicit knowledge. In order to allow the use of knowledge graphs in various machine learning tasks, several methods for deriving latent feature representations of entities and relations have been devised. These knowledge graph embeddings allow knowledge graphs to be connected to machine learning methods that require feature vectors, much like word embeddings. This can complement other estimates of conceptual similarity. Models for generating useful knowledge graph embeddings are commonly the domain of graph neural networks (GNNs). GNNs are deep learning architectures that comprise edges and nodes, which correspond well to the relationships and entities of knowledge graphs.
The topology and data structures afforded by GNNs provide a convenient domain for semi-supervised learning, wherein the network is trained to predict the value of a node embedding (given a group of adjacent nodes and their edges) or an edge (given a pair of nodes). These tasks serve as fundamental abstractions for more complex tasks such as knowledge graph reasoning and alignment.

=== Entity alignment ===

As new knowledge graphs are produced across a variety of fields and contexts, the same entity will inevitably be represented in multiple graphs. However, because no single standard for the construction or representation of knowledge graphs exists, resolving which entities from disparate graphs correspond to the same real world subject is a non-trivial task. This task is known as knowledge graph entity alignment, and is an active area of research. Strategies for entity alignment generally seek to identify similar substructures, semantic relationships, shared attributes, or combinations of all three between two distinct knowledge graphs. Entity alignment methods use these structural similarities between generally non-isomorphic graphs to predict which nodes correspond to the same entity. In 2023, researchers found success in using large language models (LLMs) in the task of entity alignment. This was in particul

  • Ordered key–value store


    An ordered key–value store (OKVS) is a type of data storage paradigm that can support multi-model databases. An OKVS is an ordered mapping of bytes to bytes. An OKVS keeps the key–value pairs sorted by the lexicographic order of the keys. OKVS systems provide different sets of features and performance trade-offs. Most of them are shipped as a library without network interfaces, in order to be embedded in another process. Most OKVS support ACID guarantees. Some OKVS are distributed databases. Ordered key–value stores have found their way into many modern database systems, including NewSQL database systems.

    == History ==

    The origin of the ordered key–value store stems from the work of Ken Thompson on dbm in 1979. Later, in 1991, Berkeley DB was released, featuring a B-tree backend that allowed the keys to stay sorted. Berkeley DB was said to be very fast and made its way into various commercial products. It was included in the Python standard library until 2.7. In 2009, Tokyo Cabinet was released; it was superseded by Kyoto Cabinet, which supports both transactions and ordered keys. In 2011, LMDB was created to replace Berkeley DB in OpenLDAP. There is also Google's LevelDB, which was forked by Facebook in 2012 as RocksDB. In 2014, WiredTiger, a successor of Berkeley DB, was acquired by MongoDB and has been the primary backend of the MongoDB database since 2019. Other notable implementations of the OKVS paradigm are Sophia and the SQLite3 LSM extension. Another notable use of the OKVS paradigm is the multi-model database system ArangoDB, which is based on RocksDB. Some NewSQL databases are backed by ordered key–value stores. JanusGraph, a property graph database, has both a Berkeley DB backend and a FoundationDB backend.

    == Key concepts ==

    === Lexicographic encoding ===

    There are algorithms that encode basic data types (boolean, string, number), and compositions of those data types inside sorted containers (tuple, list, vector), in a way that preserves their natural ordering.
It is therefore possible to work with an ordered key–value store without having to work directly with bytes. In FoundationDB, this is called the tuple layer.

=== Range query ===

Inside an OKVS, keys are ordered, which makes range queries possible. A range query retrieves all keys between two specified keys and returns them in sorted order.

=== Subspaces ===

=== Key composition ===

One can construct key spaces to build higher-level abstractions. The idea is to construct keys that take advantage of the ordered nature of the top-level key space. By taking advantage of this ordering, one can query ranges of keys that share a particular pattern.

=== Denormalization ===

Denormalization, that is, repeating the same piece of data in multiple subspaces, is common practice. It makes it possible to create secondary representations, also called indices, that speed up queries.

== Higher level abstractions ==

The following abstractions or databases have been built on top of ordered key–value stores: time-series databases; record databases, also known as row store databases, which behave similarly to what is dubbed an RDBMS; tuple stores, also known as triple stores or quad stores, as well as generic tuple stores; document databases that mimic the MongoDB API; full-text search; geographic information systems; property graphs; versioned data; and vector space databases for approximate nearest neighbor search. All these abstractions can co-exist within the same OKVS database and, when ACID is supported, the operations happen with the guarantees offered by the transaction system.

== Feature matrix ==

== Use-cases ==

OKVS are useful for two strategies. The first is to optimize a small feature, e.g. to gain a 10% improvement in read or write latency. The second is to take advantage of the distributed nature of FoundationDB and TiKV, for which there is no equivalent in resilience at very large scale.
In both cases, users need to re-implement the high-level abstractions they need, because there are no portable, ready-to-use libraries of high-level abstractions. There is still a complex balance of complexity, maintainability, fine-tuning, and readily available features, which keeps OKVS a choice for experts. Sometimes more specialized data structures can be faster than a high-level abstraction on top of an OKVS. Another point of interest in the OKVS paradigm is its simple and versatile interface, which makes it an interesting target for experimental storage algorithms and data structures.
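
The key concepts described earlier (order-preserving encoding, range queries, key composition and denormalized indices) can be sketched with an in-memory stand-in for an OKVS. This is a minimal illustration, not the API of any store named above: the `ToyOKVS` class, the `pack` helper and the key layout are assumptions, and the encoding only handles strings without NUL bytes and unsigned 64-bit integers.

```python
import bisect
import struct


def pack(*items):
    """Encode str / non-negative int items so byte order matches tuple order."""
    out = b""
    for item in items:
        if isinstance(item, str):
            # Type tag 0x01, UTF-8 bytes, NUL terminator.
            out += b"\x01" + item.encode("utf-8") + b"\x00"
        else:
            # Type tag 0x02, big-endian so numeric and byte order agree.
            out += b"\x02" + struct.pack(">Q", item)
    return out


class ToyOKVS:
    """Sorted in-memory mapping of bytes to bytes (illustration only)."""

    def __init__(self):
        self._keys = []    # kept in lexicographic order
        self._values = {}

    def set(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range(self, start, end):
        """Yield (key, value) for start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]

    def prefix(self, prefix):
        """Range query over keys starting with a packed prefix.

        Works here because the byte following a packed item is always a
        type tag (0x01 or 0x02), which sorts below 0xff.
        """
        return self.range(prefix, prefix + b"\xff")


db = ToyOKVS()
for uid, name in [(10, "carol"), (2, "alice"), (300, "bob")]:
    # Primary subspace: ("user", id) -> name.
    db.set(pack("user", uid), name.encode())
    # Denormalized secondary index: ("user_by_name", name, id) -> empty value.
    db.set(pack("user_by_name", name, uid), b"")

# Keys come back sorted by numeric id because of the big-endian encoding.
ids_in_order = [struct.unpack(">Q", k[-8:])[0] for k, _ in db.prefix(pack("user"))]
# Range query over ids 0 <= id < 100 in the primary subspace.
names_in_range = [v.decode() for _, v in db.range(pack("user", 0), pack("user", 100))]
# Lookup by name through the secondary index.
bob_keys = [k for k, _ in db.prefix(pack("user_by_name", "bob"))]
print(ids_in_order, names_in_range, len(bob_keys))
```

The same data appears under two key prefixes, which is the denormalization trade-off described above: writes touch more keys so that a lookup by name becomes a single prefix scan.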

  • ArchiMate


    ArchiMate (AR-ki-mayt) is an open and independent enterprise architecture modeling language to support the description, analysis and visualization of architecture within and across business domains in an unambiguous way. ArchiMate is a technical standard from The Open Group and is based on concepts from the now superseded IEEE 1471 standard. It is supported by various tool vendors and consulting firms. ArchiMate is also a registered trademark of The Open Group. The Open Group has a certification program for ArchiMate users, software tools and courses. ArchiMate distinguishes itself from other languages such as Unified Modeling Language (UML) and Business Process Model and Notation (BPMN) by its enterprise modelling scope. Also, UML and BPMN are meant for a specific use and they are quite heavy, containing about 150 (UML) and 250 (BPMN) modeling concepts, whereas ArchiMate works with just about 50 (in version 2.0). The goal of ArchiMate is to be "as small as possible", not to cover every edge scenario imaginable. To be easy to learn and apply, ArchiMate was intentionally restricted "to the concepts that suffice for modeling the proverbial 80% of practical cases".

    == Overview ==

    ArchiMate offers a common language for describing the construction and operation of business processes, organizational structures, information flows, IT systems, and technical infrastructure. This insight helps the different stakeholders to design, assess, and communicate the consequences of decisions and changes within and between these business domains. The main concepts and relationships of the ArchiMate language can be seen as a framework, the so-called ArchiMate Framework. It divides the enterprise architecture into a business, application and technology layer. In each layer, three aspects are considered: active elements, an internal structure and elements that define use or communicate information.
One of the objectives of the ArchiMate language is to define the relationships between concepts in different architecture domains. The concepts of this language therefore hold the middle between the detailed concepts, which are used for modeling individual domains (for example, the Unified Modeling Language (UML) for modeling software products), and Business Process Model and Notation (BPMN), which is used for business process modeling.

== History ==

ArchiMate is partly based on the now superseded IEEE 1471 standard. It was developed in the Netherlands by a project team from the Telematica Instituut in cooperation with several Dutch partners from government, industry and academia. Among the partners were Ordina NV, Radboud Universiteit Nijmegen, the Leiden Institute for Advanced Computer Science (LIACS) and the Centrum Wiskunde & Informatica (CWI). Later, tests were performed in organizations such as ABN AMRO, the Dutch Tax and Customs Administration and the ABP. The development process lasted from July 2002 to December 2004, and took about 35 person years and approximately 4 million euros. The development was funded by the Dutch government (Dutch Tax and Customs Administration) and business partners, including ABN AMRO and the ABP Pension Fund. In 2008 the ownership and stewardship of ArchiMate was transferred to The Open Group. It is now managed by the ArchiMate Forum within The Open Group. In February 2009 The Open Group published the ArchiMate 1.0 standard as a formal technical standard. The ArchiMate 2.0 standard was released in January 2012, followed by the ArchiMate 2.1 standard in 2013. In June 2016, The Open Group released version 3.0 of the ArchiMate Specification. An update, ArchiMate 3.0.1, came out in August 2017. ArchiMate 3.1 was published on 5 November 2019. The latest version of the ArchiMate Specification is version 3.2, released in October 2022.
Version 3.0 adds enhanced support for capability-oriented strategic modelling, new entities representing physical resources (for modelling the ingredients, equipment and transport resources used in the physical world) and a generic metamodel showing the entity types and the relationships between them.

== ArchiMate framework ==

=== Core framework ===

The main concepts and elements of the ArchiMate language are presented as the ArchiMate core framework. It consists of three layers and three aspects, which creates a matrix of combinations. Every layer has its passive structure, behavior and active structure aspects.

==== Layers ====

ArchiMate takes a layered and service-oriented view of architectural models. The higher layers make use of services that are provided by the lower layers. Although, at an abstract level, the concepts that are used within each layer are similar, we define more concrete concepts that are specific for a certain layer. In this context, we distinguish three main layers: The business layer is about business processes, services, functions and events of business units. This layer "offers products and services to external customers, which are realized in the organization by business processes performed by business actors and roles". The application layer is about software applications that "support the components in the business with application services". The technology layer deals "with the hardware and communication infrastructure to support the application layer. This layer offers infrastructural services needed to run applications, realized by computer and communication hardware and system software". Each of these main layers can be further divided into sub-layers.
For example, in the business layer, the primary business processes realising the products of a company may make use of a layer of secondary (supporting) business processes; in the application layer, the end-user applications may make use of generic services offered by supporting applications. On top of the business layer, a separate environment layer may be added, modelling the external customers that make use of the services of the organisation (although these may also be considered part of the business layer). In line with service orientation, the most important relation between layers is formed by use relations, which show how the higher layers make use of the services of lower layers. However, a second type of link is formed by realisation relations: elements in lower layers may realise comparable elements in higher layers; e.g., a ‘data object’ (application layer) may realise a ‘business object’ (business layer); or an ‘artifact’ (technology layer) may realise either a ‘data object’ or an ‘application component’ (application layer).

==== Aspects ====

Passive structure is the set of entities on which actions are conducted. In the business layer, examples would be information objects; in the application layer, data objects; and in the technology layer, they could include physical objects. Behavior refers to the processes and functions performed by the actors. "Structural elements are assigned to behavioral elements, to show who or what displays the behavior". Active structure is the set of entities that display some behavior, e.g. business actors, devices, or application components.

=== Full framework ===

The full ArchiMate framework is enriched by the physical layer, which was added to allow modeling of “physical equipment, materials, and distribution networks” and was not present in the previous version.
The implementation and migration layer adds elements that allow architects to model a state of transition and to mark parts of the architecture that are temporary for the purpose, as the name says, of implementation and migration. The strategy layer adds three elements: resource, capability and course of action. These elements help to incorporate a strategic dimension into the ArchiMate language by allowing it to depict the usage of resources and capabilities in order to achieve strategic goals. Finally, there is a motivation aspect that allows different stakeholders to describe the motivation of specific actors or domains, which can be quite important when looking at one thing from several different angles. It adds several elements such as stakeholder, value, driver, goal and meaning.

== ArchiMate language ==

The ArchiMate language is hierarchical. At the top level there is a model. A model is a collection of concepts. A concept can be either an element or a relationship. An element can be a behavior element, a structure element, a motivation element or a so-called composite element (which means that it does not fit just one aspect of the framework, but two or more). The functionality of all concepts without a dependency on a specific layer is described by the generic metamodel. This layer-independent description of concepts is useful when trying to understand the mechanics of the ArchiMate language.

=== Concepts ===

==== Elements ====

The generic elements are distributed into the same categories as the layers: Active structure elements Behavior elements Passive structure elements Motivation elements Active structure e

  • Diia


    Diia (Ukrainian: Дія [ˈd⁽ʲ⁾ijɐ], lit. 'Action'; also an acronym for Держава і Я, Derzhava i Ya, IPA: [derˈʒɑwɐ i ˈjɑ], lit. 'State and Me') is a mobile app, a web portal and a brand of e-governance in Ukraine. Launched in 2020, the Diia app allows Ukrainian citizens to use digital documents on their smartphones instead of physical ones for identification and sharing purposes. The Diia portal allows access to over 130 government services. Eventually, the government plans to make all kinds of state-person interactions available through Diia. Diia was built in partnership with the United States and is poised to be shared with other countries. On the sidelines of the 2023 World Economic Forum in Davos, USAID Administrator Samantha Power said the US hopes to replicate the success of Diia in other countries.

    == History ==

    Diia was first presented on September 27, 2019, by the Ministry of Digital Transformation of Ukraine as a brand of the State in a Smartphone project. Vice Prime Minister and Minister of Digital Transformation Mykhailo Fedorov announced the creation of a mobile app and a web portal that would unite in a single place all the services provided by the state to citizens and businesses. On February 6, 2020, the mobile app Diia was officially launched. During the presentation, Ukrainian President Volodymyr Zelensky said that 9 million Ukrainians now had access to their driver's license and car registration documents on their phones, while Prime Minister Oleksiy Honcharuk called the implementation of the State in a Smartphone project a priority for the government. In April 2020, the Ukrainian government approved a resolution for experimental usage of digital ID-cards and passports, which would be issued to all Ukrainians via Diia. On October 5, 2020, during the Diia Summit, the government presented a first major update of the app and web portal branded "Diia 2.0".
More types of documents were added to the app, as well as the ability to share documents with others via a single tap on a push message. The web portal in turn expanded the number of available services to 27, including the ability to register a private limited company in half an hour. President Zelensky, who opened the summit, announced that in 2021 Ukraine would enter "paperless" mode by prohibiting civil servants from demanding paper documents. By the end of 2020, the app had more than six million users, while the portal had 50 available services. In March 2021, the Ukrainian parliament adopted a bill equating digital identity documents with their physical analogues. Since August 23, 2021, Ukrainian citizens have been able to use digital ID-cards and passports for all purposes while in Ukraine. According to Minister of Digital Transformation Mykhailo Fedorov, Ukraine would become the first country in the world where digital identity documents are considered legally equivalent to ordinary ones. In September 2024, Diia launched an online marriage registration service, which can be especially beneficial for military personnel who spend much time on the frontline separated from their partners. In October 2024, Diia's online marriage service appeared in Time's Best Inventions of 2024 list. In the first month of its operation, over 1.1 million Ukrainians tried to make proposals using the technology, and 435 couples got married.

== Benefits and challenges ==

The first and most obvious benefit is the convenience of such a platform. Citizens can have many documents on their smartphones at once, without concern about losing or damaging them. Whenever needed, they can just open an app on their smartphones and show or check the document they need. The idea is that Diia will help cut the bureaucracy associated with public services, which in turn will help fight corruption and increase government savings.
Fewer people need to be employed in the public sector, and fewer human-to-human interactions are supposed to happen. With the start of the program, the number of government employees was already reduced by 10%, which contributes to hundreds of millions of dollars in savings; beyond this, the initiative also improves the speed, efficiency, and transparency of government services. In addition, the digitalization of the government sector helps to develop the whole IT industry in the country: people become more digitally aware and educated, which affects other sectors as well, increasing the spread of digital infrastructure and speeding up overall digitalization. The UN E-Government Development Index, which assesses the capabilities of governments to integrate their functions electronically, such as through the use of the internet and mobile devices, ranked Ukraine 69th out of 193 countries surveyed in 2020. Despite its low ranking in the E-Government Development Index, Ukraine made a big jump in the e-participation index, in which it ranked 43rd out of 193 countries, with its score rising from 0.66 in 2018 to 0.81 in 2020 (un.org, 2020), suggesting that the government and its citizens are adapting to IT-based government functions. The main goal of e-government according to Perez-Morote et al. (2020) is to have accountability and transparency among the countries involved. But to do so, there are several challenges that a country should assess before implementing e-government.
In research by Heeks (2001), the author identified two main challenges that countries face in the development of e-government. The first is the strategic challenge, which involves the preparedness (e-readiness) of the entire government system for electronic transformation. The second is the tactical challenge, in which the government must design a system (e-governance design) that every user can understand; it is important that the information communicated to consumers is received clearly. For the first challenge (e-readiness), Ukraine had an internet penetration rate of 76% in 2020, which is expected to grow to 82%; consumers need internet access in order to use the service. Another factor is the readiness of the institutional infrastructure, meaning that the government has its own organization focused solely on implementing the e-government project. In the case of Ukraine, the e-governance team is led by Oleksandr Ryzhenko, and the country's e-governance initiative is further strengthened by ensuring that the data and legal infrastructure are already prepared. Ukraine has done this by modernizing its legislation to make it more appropriate for digital services, and the data exchange solution used by Ukraine is called Trembita. The human infrastructure is also being updated, as the work must be done by competent individuals; hence EGOV4UKRAINE was launched, which aims to recruit IT developers to build a system for administrative services. These efforts by the Ukrainian government did not go unnoticed, and it received an award from the e-Governance Academy as "partner of the year 2017". For the second challenge, which deals with system design, Ukraine's success can be seen in the latest UNDP data, which show a large increase in the e-participation index: Ukraine ranked 75th in 2018 and 46th in 2020 (un.org, 2020). 
Despite visible success, the implementation of e-government was accompanied by problems. Data leakage became the main one. In May 2020, the data of 26 million driver's licenses appeared in the public domain on the Internet. The Ukrainian government said the Diia app was not linked to the data breach, but it is impossible to say for certain. Any storage of official documents in electronic format is associated with the risk of their leakage. In addition, the Diia application still has data protection issues, as the required protection system has not been implemented. This is compounded by the country's weak data protection legal regime. Since 2023, Ukrainians have also been able to register their cars with the app. License plates issued this way do not use regional codes; instead, they use special codes starting with DI or PD. == Diia City == In May 2020, the government presented Diia City, a large-scale project headed by Oleksandr Borniakov, which would establish a virtual model of a free economic zone for representatives of the creative economy. It would provide for special digital residency with a particular taxation regime, intellectual property protection and simplified regulations. Diia City concurrently imposes certain constraints on contracts involving individual entrepreneurs (FOPs). It also offers the benefit of tax rebates. The Ukrainian government endorses Diia City, believing it will support the country's position in the IT market. As of July 30, 2023, the program had more than 600 residents, including companies like iGama, Avenga, SBRobotiks, and Intellectsoft.

  • Information Rules

    Information Rules

    Information Rules is a 1999 book by Carl Shapiro and Hal Varian applying traditional economic theories to modern information-based technologies. The book examines commercial strategies appropriate to companies that deal in information, given the high "first copy" and low "subsequent copy" costs of information commodities, such as music CDs or original texts. == Content == The book examines competing standards, and how a company might influence widespread consumer acceptance of one over another, such as VHS versus Betamax, or HD DVD versus Blu-ray. The book mentions possible business strategies of such publishers as Encyclopædia Britannica who have to confront how to stay viable as technology changes the value and availability of information.

  • Distributed transaction

    Distributed transaction

    A distributed transaction operates within a distributed environment, typically involving multiple nodes across a network depending on the location of the data. A key aspect of distributed transactions is atomicity, which ensures that the transaction is completed in its entirety or not executed at all. Distributed transactions are not limited to databases. The Open Group, a vendor consortium, proposed the X/Open Distributed Transaction Processing Model (X/Open XA), which became a de facto standard for the behavior of transaction model components. Databases are common transactional resources, and transactions often span several such databases. In this case, a distributed transaction can be seen as a database transaction that must be synchronized (or provide ACID properties) among multiple participating databases that are distributed among different physical locations. The isolation property (the I of ACID) poses a special challenge for multi-database transactions, since the (global) serializability property could be violated even if each database provides it (see also global serializability). In practice, most commercial database systems use strong strict two-phase locking (SS2PL) for concurrency control, which ensures global serializability if all the participating databases employ it. A common algorithm for ensuring correct completion of a distributed transaction is the two-phase commit (2PC). This algorithm is usually applied to updates able to commit in a short period of time, ranging from a couple of milliseconds to a couple of minutes. There are also long-lived distributed transactions, for example a transaction to book a trip, which consists of booking a flight, a rental car and a hotel. Since booking the flight might take up to a day to get a confirmation, two-phase commit is not applicable here, because it would lock the resources for that long. In this case, more sophisticated techniques that involve multiple undo levels are used. 
Just as you can undo a hotel booking by calling the desk and cancelling the reservation, a system can be designed to undo certain operations (unless they are irreversibly finished). In practice, long-lived distributed transactions are implemented in systems based on web services. Usually these transactions utilize principles of compensating transactions, Optimism and Isolation Without Locking. The X/Open standard does not cover long-lived distributed transactions. Several technologies, including Jakarta Enterprise Beans and Microsoft Transaction Server, fully support distributed transaction standards. == Synchronization == In event-driven architectures, distributed transactions can be synchronized using the request–response paradigm, which can be implemented in two ways: (1) creating two separate queues, one for requests and the other for replies, where the event producer must wait until it receives the response; or (2) creating one dedicated ephemeral queue for each request.
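
The two-phase commit (2PC) algorithm mentioned in this entry can be illustrated with a minimal in-memory sketch. It is not taken from the X/Open XA specification or any particular product: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names used only for illustration, and real implementations also need durable logging, timeouts, and recovery.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    """Hypothetical resource manager (for example, one database)."""
    name: str
    can_commit: bool = True   # simulated vote for the prepare phase
    state: str = "idle"       # idle -> prepared -> committed, or aborted

    def prepare(self) -> bool:
        # Phase 1: vote. A "yes" vote is a promise to commit if asked.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Return True if all participants committed, False if all rolled back."""
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (completion): commit only on a unanimous "yes" vote.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


dbs = [Participant("orders"), Participant("billing", can_commit=False)]
print(two_phase_commit(dbs))       # False
print([p.state for p in dbs])      # ['aborted', 'aborted']
```

Because a single "no" vote rolls back every participant, the sketch shows the atomicity property: the transaction either completes everywhere or takes effect nowhere.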

  • Artificial intelligence in Brazilian industry

    Artificial intelligence in Brazilian industry

    In 2022, 16.9% (1,620) of the 9,586 Brazilian industrial companies with 100 or more employees used artificial intelligence in their operations. Among the companies that used AI, the areas of administration (73.8%), product project development (65.9%), and processes, services and marketing (65.1%) were those that used it the most, followed by the areas of production (56.4%) and logistics (48.4%). == Current scenario == === Adoption in Brazilian industrial sectors === In senior management, the majority (56%) of executives have a long-term vision for the use of AI. The study also shows that IT, Innovation, and Marketing are the areas where AI use is most widespread, and that 43% of companies are developing or adapting the algorithms they use. The majority of large institutions that reported some type of AI use purchased these solutions from other companies (76%). Factors in the adoption of artificial intelligence in companies include the establishment of an autonomous strategy by the company (87.0%) and the influence of suppliers and/or customers (63.0%). The main difficulties in using the technologies were high costs (80.8%), lack of qualified personnel in the company (54.6%) and excessive economic risks (49.5%). Three variables are considered the most relevant to explain the option to use AI: the implementation of a digital security policy, company size (250 or more employees), and the characteristics of the company related to information and communication. When analyzing AI use by company size in Brazil, large companies have the highest proportion of AI use, mainly due to their capacity for investment and technology experimentation. However, when comparing Brazil and Europe, indicators show an acceleration in AI use among large European companies, while in Brazil the situation remains stable. 
In 2023, 30% of large companies in the European bloc used some type of AI, a figure that rose to 41% in 2024, while in Brazil these proportions were 41% in 2023 and 38% in 2024. === Workforce === The challenge of upskilling begins with employees who are capable of understanding recent technological changes. Similarly, companies must create an environment and conditions for workforce development that are conducive to innovation, and universities must be prepared to provide knowledge aligned with the transition process, which in turn must be supported by public policies. The concern with training a specialized workforce in AI can be seen in the low number of graduates and PhDs in computer science and computer engineering in Brazil, compared with the numbers in other countries. As recorded in the document Recommendations for the Advancement of Artificial Intelligence in Brazil, 2019 data from the Coordination for the Improvement of Higher Education Personnel (CAPES) indicate that "the number of PhDs graduated annually in computing remained below 400 in 2016, and is not expected to have increased during the Covid-19 pandemic" (ABC, 2023). In the United States, by contrast, the number of PhDs graduated in these two areas has remained around 1,800 for the past 11 years, and during this period, the share of PhDs specializing in AI jumped from 10% to 19%. Data from the CNPq Lattes Platform (October 2019) show that there are 4,429 specialists in the AI field in Brazil. This is still a small number compared to the 415,166 IT jobs in the country's business sector alone. === R&D, scientific production and integration with industry === China and the United States lead in the number of publications. These two countries are followed by the G7 members and by India, Austria, South Korea, and Spain. Brazil appears in the next group, alongside the Netherlands, Russia, Indonesia, and Ireland. 
Regarding the promotion of research and technologies related to AI, public entities such as the Coordination for the Improvement of Higher Education Personnel (Capes) and the National Council for Scientific and Technological Development (CNPq) stood out as the main funders. Currently, different countries and territories have been promoting the development of Artificial Intelligence (AI). In the Brazilian case, one of the main initiatives is the creation of Engineering Research Centers/Applied Research Centers (CPE/CPA) in AI by the São Paulo Research Foundation (FAPESP), in collaboration with the Ministry of Science, Technology and Innovation (MCTI), the Ministry of Communications (MC) and the Brazilian Internet Steering Committee (CGI.br). In terms of the number of patents filed and the volume of investments, the leading nations in AI are the United States, China, France, Germany, the United Kingdom, Russia, India, Switzerland, Japan, South Korea, the Netherlands, Sweden, Finland, Ireland, Singapore, Canada, Israel, and Italy. Brazil appears among the top twenty countries in some rankings, mainly due to its good number of publications (approximately 10% of the number of articles published by the United States). The US is home to approximately 60% of the world's top AI researchers, followed by China (11%), Europe (10%), and Canada (6%). To change this scenario, in August 2024, the Brazilian government announced an investment of R$23 billion until 2028 in artificial intelligence, seeking to “transform the country into a global reference in innovation”. == Future challenges == The Organization for Economic Cooperation and Development (2020) report highlighted three factors that hinder the digital transformation journey and application of AI in Brazil: insufficient infrastructure, high costs due to the tax system, and financial limitations, such as limited access to financing. 
The costs of adopting technology, its incompatibility with the business, and the lack of training also represent obstacles that Brazilian industry must overcome. There are also inherent obstacles for companies. A McKinsey review emphasizes that once a company chooses one or more sectors to focus on, it must select specific applications. Buyers aren't interested in artificial intelligence simply because it's a breakthrough technology; they want AI to generate a good return on investment, whether by solving specific problems, saving money, or increasing sales. If an AI vendor tried to offer a horizontal solution, the value proposition might not be as compelling. Part of the solution to Brazil's technological backwardness involves building an ecosystem fueled by private institutions, universities, and governments.

  • Lexxe

    Lexxe

    Lexxe is an internet search engine that applies Natural Language Processing in its semantic search technology. Founded in 2005 by Dr. Hong Liang Qiao, Lexxe is based in Sydney, Australia. Today, Lexxe's key focus is on sentiment search, with the launch of a news sentiment search site at News & Moods (www.newsandmoods.com). Lexxe's focus in search technology has changed in several stages. Lexxe launched its Alpha version in 2005, featuring Natural Language question answering (i.e. users could ask the search engine questions in English in addition to keyword searches; this feature has been suspended for redevelopment since 2010). It used only algorithms to extract answers from web pages, with no question-answer pair databases prepared in advance. In 2011, Lexxe launched a beta version with a new search technology called Semantic Key. Semantic Keys enable users to query with a conceptual keyword (or a keyword with a special meaning, hence the term Semantic Key) in order to find instances under the concept, e.g. price → $5.95 or €200, color → red, yellow, white. Example queries are “price: a pound of apples” and “color: ferrari”. With an initial 500 Semantic Keys at the Beta launch, Lexxe became the first search engine in the world to offer this search technology to users. However, the cost of building Semantic Keys proved too high. In 2017, Lexxe launched News & Moods (www.newsandmoods.com), an open platform for news sentiment search, as a first step towards a sentiment search feature covering the entire Internet in the Lexxe search engine. News & Moods also comes with smartphone apps for Android and iOS.

  • Reference data

    Reference data

    Reference data is data used to classify or categorize other data. Typically, it is static or changes slowly over time. Examples of reference data include: units of measurement; country codes; corporate codes; fixed conversion rates (e.g., for weight, temperature, and length); and calendar structure and constraints. Reference data sets are sometimes alternatively referred to as a "controlled vocabulary" or "lookup" data. Reference data differs from master data. While both provide context for business transactions, reference data is concerned with classification and categorisation, while master data is concerned with business entities. A further difference between reference data and master data is that a change to the reference data values may require an associated change in business process to support the change, while a change in master data will always be managed as part of existing business processes. For example, adding a new customer or sales product is part of the standard business process. However, adding a new product classification (e.g. "restricted sales item") or a new customer type (e.g. "gold level customer") will result in a modification to the business processes to manage those items. == Externally-defined reference data == For most organisations, most or all reference data is defined and managed within that organisation. Some reference data, however, may be externally defined and managed, for example by standards organizations. An example of externally defined reference data is the set of country codes as defined in ISO 3166-1. == Reference data management == Curating and managing reference data is key to ensuring its quality and thus fitness for purpose. All aspects of an organisation, operational and analytical, are greatly dependent on the quality of its reference data. Without consistency across business processes or applications, for example, similar things may be described in quite different ways. 
Reference data gains in value when it is widely re-used and widely referenced. Examples of good practice in reference data management include: formalizing reference data management; using external reference data as much as possible; governing the reference data specific to your enterprise; managing reference data at enterprise level; and version-controlling your reference data.
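
The distinction between reference data and master data described in this entry can be made concrete with a small sketch. The customer records and the `invalid_country_rows` helper are hypothetical and exist only for illustration; the country codes are a three-entry subset of ISO 3166-1 alpha-2, not the full list.

```python
# Reference data: a small, slowly changing code list
# (subset of ISO 3166-1 alpha-2, chosen for illustration).
COUNTRY_CODES = {"AU": "Australia", "BR": "Brazil", "UA": "Ukraine"}

# Master data: business entities (hypothetical customers), each
# classified by a code drawn from the reference data above.
customers = [
    {"id": 1, "name": "Acme Pty", "country": "AU"},
    {"id": 2, "name": "Exemplo SA", "country": "BR"},
    {"id": 3, "name": "Typo Ltd", "country": "UK"},  # not in the subset
]


def invalid_country_rows(rows, codes):
    """Return the rows whose country code is missing from the reference set."""
    return [row for row in rows if row["country"] not in codes]


bad = invalid_country_rows(customers, COUNTRY_CODES)
print([row["id"] for row in bad])  # [3]
```

Adding a customer only appends a row to the master data, whereas adding or retiring a country code changes what every consuming application will accept, which is why the reference set is the part that benefits from governance and version control.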

  • Ontology for Biomedical Investigations

    Ontology for Biomedical Investigations

    The Ontology for Biomedical Investigations (OBI) is an open-access, integrated ontology for the description of biological and clinical investigations. OBI provides a model for the design of an investigation, the protocols and instrumentation used, the materials used, the data generated and the type of analysis performed on it. The project is being developed as part of the OBO Foundry and as such adheres to all the principles therein, such as orthogonal coverage (i.e. clear delineation from other foundry member ontologies) and the use of a common formal language. In OBI the common formal language used is the Web Ontology Language (OWL). As of March 2008, a pre-release version of the ontology was made available at the project's SVN repository. == Scope == The Ontology for Biomedical Investigations (OBI) addresses the need for controlled vocabularies to support integration and joint ("cross-omics") analysis of experimental data, a need originally identified in the transcriptomics domain by the FGED Society, which developed the MGED Ontology as an annotation resource for microarray data (Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, et al., "The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration", Nature Biotechnology 25 (11): 1251–5, November 2007, doi:10.1038/nbt1346, PMC 2814061, PMID 17989687). OBI uses the Basic Formal Ontology (BFO) upper-level ontology as a means of describing general entities that do not belong to a specific problem domain. As such, every OBI class is a subclass of some BFO class. 
The ontology has the scope of modeling all biomedical investigations and as such contains ontology terms for aspects such as: biological material (for example, blood plasma); instrument, and parts of an instrument therein (for example, DNA microarray, centrifuge); information content (such as an image or a digital information entity such as an electronic medical record); design and execution of an investigation, and individual experiments therein (for example, study design, electrophoresis material separation); and data transformation, incorporating aspects such as data normalization and data analysis (for example, principal components analysis dimensionality reduction, mean calculation). Less 'concrete' aspects, such as the role a given entity may play in a particular scenario (for example, the role of a chemical compound in an experiment) and the function of an entity (for example, the digestive function of the stomach to nourish the body), are also covered in the ontology. == OBI consortium == The MGED Ontology was originally identified in the transcriptomics domain by the FGED Society and was developed to address the needs of data integration. Following a mutual decision to collaborate, this effort later became a wider collaboration between groups such as FGED, PSI and MSI in response to the needs of areas such as transcriptomics, proteomics and metabolomics, and the FuGO (Functional Genomics Investigation Ontology) was created. This later became the OBI, covering the wider scope of all biomedical investigations. As an international, cross-domain initiative, the OBI consortium draws upon a pool of experts from a variety of fields, not limited to biology. The current list of OBI consortium members is available at the OBI consortium website. 
The consortium is made up of a coordinating committee, which combines two subgroups: the Community Representative (those representing a particular biomedical community) and the Core Developers (ontology developers who may or may not be members of any single community). Separate from the coordinating committee is the Developers Working Group, which consists of developers within the communities collaborating in the development of OBI at the discretion of current OBI Consortium members.

  • Retention period

    Retention period

    A retention period (associated with a retention schedule or retention program) is an aspect of records and information management (RIM) and the records life cycle that identifies the duration of time for which the information should be maintained or "retained", irrespective of format (paper, electronic, or other). Retention periods vary with different types of information, based on content and a variety of other factors, including internal organizational need, regulatory requirements for inspection or audit, legal statutes of limitation, involvement in litigation, and taxation and financial reporting needs, as well as other factors as defined by local, regional, state, national, and/or international governing entities. Once an applicable retention period has elapsed for a given type or series of information, and all holds/moratoriums have been released, the information is typically destroyed using an approved and effective destruction method, which renders the information completely and irreversibly unusable via any means. Alternatively, it may be converted from one form to another (e.g. from paper to electronic), depending on the defined retention period per format. Information with historical value beyond its "usable value" may be accessioned to the custody of an archive organization for permanent or extended long-term preservation. == Defensible disposition == Defensible disposition refers to the ability of an identified and applied retention period to effectively provide for the defense of the record, and its eventual destruction or accessioning when scrutinized within a court of law or by other review. 
It is commonly advised by records and information management (RIM) professionals that any and all retention periods applied to organizational information should be reviewed and approved for use by competent legal counsel who represents the organization and is familiar with its specific business needs and legal and regulatory requirements. Additionally, a practical approach to information assessment and classification, proper documentation of the disposition program, and strategic review of disposition policy over time for efficacy are required for proper defensible disposition. == Guidance and education organizations == ARMA International; Information and Records Management Society; filerskeepers records retention FAQ
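
The disposition rule described in this entry (a record may be destroyed only once its retention period has elapsed and all holds have been released) can be sketched as a small date check. This is an illustrative sketch only: the function name, the seven-year period, and the hold label are hypothetical, and counting the period from a creation date is an assumption, since real retention schedules may start the clock at a different event.

```python
from datetime import date


def is_eligible_for_disposition(created, retention_years, active_holds, today):
    """True when the retention period has elapsed and no holds remain."""
    try:
        expires = created.replace(year=created.year + retention_years)
    except ValueError:
        # Created on 29 February and the expiry year is not a leap year.
        expires = created.replace(year=created.year + retention_years, day=28)
    return today >= expires and not active_holds


today = date(2023, 1, 1)
# Period elapsed, no holds: eligible.
print(is_eligible_for_disposition(date(2015, 3, 1), 7, [], today))  # True
# Period elapsed, but a hold is still active: not eligible.
print(is_eligible_for_disposition(date(2015, 3, 1), 7, ["litigation hold"], today))  # False
# Period not yet elapsed: not eligible.
print(is_eligible_for_disposition(date(2020, 1, 1), 7, [], today))  # False
```

Keeping holds as a separate input from the retention period mirrors the text: a hold suspends destruction without changing the period defined in the schedule.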

  • NCAA transfer portal

    NCAA transfer portal

    The NCAA transfer portal is a National Collegiate Athletic Association (NCAA) application, database, and compliance tool that facilitates student athletes' transfers between member institutions. It is intended to bring greater transparency to the transfer process and to enable student athletes to publicize their desire to transfer. The transfer portal is an NCAA-wide database covering all three NCAA divisions, although most media coverage of the transfer portal involves its use in the top-level Division I (D-I). The portal launched on October 15, 2018. Regulations adopted in 2021 allowed student-athletes in D-I football, men's and women's basketball, men's ice hockey, and baseball to transfer schools using the portal once without sitting out a year. In 2024, the NCAA authorized unlimited transfers for athletes. == Process == For Divisions I and II, once an athlete desiring to transfer informs their school, the school must enter the athlete's name in the database within two business days. Then coaches and staff from other universities may contact the athlete about potentially transferring. Before the January 2026 NCAA convention, Division III schools were allowed, but not required, to enter such a student into the portal. A proposal to require use of the portal in that division was approved at the convention. The timeline for D-III members to enter athletes into the portal differs from that of the other divisions. Athletes wishing to enter the portal must first complete an educational module. Once completed, the school has seven calendar days to enter the athlete's transfer request into the portal. == Transfer windows == On August 31, 2022, the D-I board adopted a series of changes to transfer rules, introducing the concept of transfer windows, similar to those used in professional soccer worldwide. Student-athletes who wish to take advantage of the one-time transfer rule must, under normal circumstances, enter the portal within a designated window for their sport. 
These windows are slightly different for each NCAA sport, but are broadly grouped by the NCAA's three athletic "seasons". At that time, the windows were as follows: Fall sports – A 45-day winter window opening the day after championship selections are made in that sport, and a spring window from May 1–15. According to the NCAA, "reasonable accommodations" would be made for participants in football's FBS and FCS championship games (respectively the College Football Playoff National Championship and Division I Football Championship Game), both of which take place in early January. Participants in those games had a 14-day window opening on the day after the championship game, as well as the spring window. Winter sports – A 60-day window opening the day after championship selections are made in that sport. Spring sports – A winter window from December 1–15, and a 45-day spring window opening the day after championship selections are made in that sport. For sports included in the NCAA Emerging Sports for Women program, transfer windows are the same as those for fully recognized NCAA sports. As with fully recognized NCAA sports, transfer windows linked to championship events open on the day after selections are made for the generally recognized championship events in emerging sports. Student-athletes whose athletic aid is reduced, canceled, or not renewed by their school, as well as those affected by a university's elimination of a sports team, may enter the transfer portal at any time without penalty. A slightly different exception applies to those undergoing a head coaching change; student-athletes so affected in sports other than Division I football can enter the portal within 30 days of the change, starting on the day after the coach's departure is announced. The coaching change window also applied to Division I football before October 2025. 
Less than a month after transfer windows were adopted, the Division I Council adopted a change that affected only graduate transfers. Student-athletes who are set to graduate with remaining athletic eligibility, and plan to continue competition as postgraduate students, were exempt from transfer windows. They could enter the portal at any time during the academic year, and were not subject to the standard deadlines of May 1 for fall and winter sports and July 1 for spring sports. In April 2024, graduate transfers became subject to the same deadlines as all other transfer students. This change did not affect windows for student-athletes affected by a head coaching change, a loss of athletic aid, or the discontinuation of a team. Because the Ivy League allows neither redshirting nor athletic participation by graduate students, athletes at its member schools who are set to complete four years of attendance but still have remaining athletic eligibility may enter the portal at any time during their fourth academic year of attendance. In October 2024, the Division I Council reduced transfer windows in football and basketball to a total of 30 days. For FBS and FCS football, the fall window opened for 20 days, starting on the Monday after FBS conference championship games. Participants in postseason play had a 5-day window that opened on the day after each team's final game. A 10-day spring window opened in mid-April. In men's and women's basketball, a single 30-day window opens on the day after the second round of each Division I tournament concludes. The existing exceptions regarding head coaching changes, a loss of athletic aid, or the discontinuation of a team remained in place. Almost exactly a year later, Division I adopted more significant changes to the football transfer portal for both FBS and FCS. The previous two windows were abolished and replaced by a single window that opens from January 2–16. 
Participants in the College Football Playoff National Championship—the only game in FBS or FCS played after the closure of the new window—receive a 5-day window that opens on the day after that game. The window for players undergoing a head coaching change was also reduced. A new window of 15 days opens five calendar days after the hiring or public announcement of a new head coach. Should a school fail to hire or publicly announce a new head coach within 30 days after the previous coach's departure, the window will open on the 31st day after departure, provided that the 31st day is no earlier than January 3. This particular window, also open for 15 days, may open at any time before June 30. No change was announced to the exceptions for those affected by a loss of athletic aid or the discontinuation of a team. == Impact on high school recruiting == Effective July 1, 2025, the NCAA Division I Board of Directors implemented new DI roster limits following the court-approved House settlement. Additionally, according to the NCAA, "NCAA rules for Division I programs will no longer include sport-specific scholarship limits." As a result, many top Division I programs, especially those in power conferences, are relying heavily on the transfer portal to bring in conference- and national-level student-athletes. This shift in recruiting focus has already been exemplified across Division I men's and women's track and field especially, beginning in the recruitment cycle for 2025 college entries. Track and field coaches formerly managing rosters of 120-plus (60-plus men and 60-plus women) are now limited to 45 per side for a total of 90 roster spots across men's and women's track and field, meaning they are recruiting fewer student-athletes out of high school and more immediately impactful scholarship-worthy student-athletes via the transfer portal. Roster limits for track and field teams are even more stringent in the Southeastern Conference (SEC): 35 men and 35 women. 
For high school track and field athletes seeking opportunities with top DI programs, displaying the potential to become point-scorers is no longer enough; they must demonstrate the ability to contribute immediately, often by competing at a level aligned with conference scoring standards.

  • Paper data storage

    Paper data storage

    Paper data storage refers to the use of paper as a data storage device. This includes writing, illustrating, and the use of data that can be interpreted by a machine or is the result of the functioning of a machine. A defining feature of paper data storage is the ability of humans to produce it with only simple tools and interpret it visually. Though now mostly obsolete, paper was once an important form of computer data storage, as both paper tape and punched cards were staples of working with computers before the 1980s. == History == Before paper was used for storing data, it had been used in several applications for storing instructions to specify a machine's operation. The earliest use of paper to store instructions for a machine was the work of Basile Bouchon who, in 1725, used punched paper rolls to control textile looms. This technology was later developed into the wildly successful Jacquard loom. The 19th century saw several other uses of paper for controlling machines. In 1846, telegrams could be prerecorded on punched tape and rapidly transmitted using Alexander Bain's automatic telegraph. Several inventors took the concept of a mechanical organ and used paper to represent the music. In the late 1880s Herman Hollerith invented the recording of data on a medium that could then be read by a machine. The earlier uses of machine-readable media described above had been for control (automatons, piano rolls, looms, ...), not data. "After some initial trials with paper tape, he settled on punched cards..." Hollerith's method was used in the 1890 census. Hollerith's company eventually became the core of IBM. Other technologies were also developed that allowed machines to work with marks on paper instead of punched holes. These technologies were widely used for tabulating votes and grading standardized tests. Banks used magnetic ink on checks, supporting MICR scanning. 
In an early electronic computing device, the Atanasoff–Berry Computer, electric sparks were used to singe small holes in paper cards to represent binary data. The altered dielectric constant of the paper at the location of the holes could then be used to read the binary data back into the machine by means of electric sparks of lower voltage than the sparks used to create the holes. This form of paper data storage was never made reliable and was not used in any subsequent machine. == Modern techniques == === 1D barcodes === Barcodes make it possible to securely attach computer-readable information to any object that is to be sold or transported. Universal Product Code barcodes, first used in 1974, are ubiquitous today. For 1D barcodes, some recommend a width of at least 3 pixels for each minimum-width bar and each minimum-width gap. The density is about 50 bits per linear inch (about 2 bit/mm). === 2D barcodes === 2D barcodes can store much more data on paper, up to 2.9 kbyte per barcode. It is recommended that each module be at least 4 pixels wide, e.g., a 4 × 4 pixel (16-pixel) module. == Limits == The limits of data storage depend on the technology used to write and read such data. The theoretical limits assume a scanner that can perfectly reproduce the printed image at its printing resolution, and a program which can accurately interpret such an image. For example, an 8 in × 10 in (200 mm × 250 mm) 600 dpi black-and-white image contains 3.43 MiB of data, as does a 300 dpi CMYK printed image. A 2,400 ppi true-color (24-bit) image contains about 1.29 GiB of information; printing an image that preserves this data would require a printing resolution of about 12,000 dpi in black and white, or about 6,000 dpi with CMYK dots (each 24-bit pixel needs 24 one-bit dots, or six 4-bit CMYK dots).
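The capacity figures in the Limits discussion follow from simple arithmetic. As a minimal sketch, assuming one bit per black-and-white dot, four on/off ink bits per CMYK dot, and 24 bits per true-color pixel (the helper name `page_bits` is hypothetical and used only for illustration), the calculation could look like this:

```python
# Illustrative sketch: raw capacity of a printed page, assuming a perfect scanner.
MIB = 1024 ** 2  # bytes per mebibyte
GIB = 1024 ** 3  # bytes per gibibyte


def page_bits(width_in, height_in, dots_per_inch, bits_per_dot):
    """Return the raw number of bits on a page of the given size and resolution."""
    dots = (width_in * dots_per_inch) * (height_in * dots_per_inch)
    return dots * bits_per_dot


# 8 in x 10 in page in the three configurations mentioned in the text.
bw_600 = page_bits(8, 10, 600, 1)         # black and white: 1 bit per dot (assumption)
cmyk_300 = page_bits(8, 10, 300, 4)       # CMYK: 4 on/off inks per dot (assumption)
true_color = page_bits(8, 10, 2400, 24)   # true color: 24 bits per pixel

print(f"600 dpi black and white: {bw_600 / 8 / MIB:.2f} MiB")
print(f"300 dpi CMYK:            {cmyk_300 / 8 / MIB:.2f} MiB")
print(f"2400 ppi true color:     {true_color / 8 / GIB:.2f} GiB")
```

Under these assumptions, the 600 dpi black-and-white page and the 300 dpi CMYK page carry the same number of bits, because halving the linear resolution divides the dot count by four while each dot carries four times as much information.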

  • Enterprise information integration

    Enterprise information integration

    Enterprise information integration (EII) is the ability to support a unified view of data and information for an entire organization. The goal of EII is to get a large set of heterogeneous data sources to appear to a user or system as a single, homogeneous data source. In a data virtualization application of EII, there is a process of information integration, using data abstraction to provide a unified interface (known as uniform data access) for viewing all the data within an organization, and a single set of structures and naming conventions (known as uniform information representation) to represent this data. == Overview == Data within an enterprise can be stored in heterogeneous formats, including relational databases (which themselves come in a large number of varieties), text files, XML files, spreadsheets and a variety of proprietary storage methods, each with their own indexing and data access methods. Standardized data access APIs have emerged that offer a specific set of commands to retrieve and modify data from a generic data source. Many applications exist that implement these APIs' commands across various data sources, most notably relational databases. Such APIs include ODBC, JDBC, XQJ, OLE DB, and more recently ADO.NET. There are also standard formats for representing data within a file that are very important to information integration. The best-known of these is XML, which has emerged as a standard universal representation format. There are also more specific XML "grammars" defined for specific types of data such as Geography Markup Language for expressing geographical features and Directory Service Markup Language for holding directory-style information. In addition, non-XML standard formats exist such as iCalendar for representing calendar information and vCard for business card information. Enterprise Information Integration (EII) applies data integration commercially. 
Despite the theoretical problems of data integration, the private sector is more concerned with data integration as a viable product. EII emphasizes neither correctness nor tractability, but speed and simplicity. === Uniform data access === Uniform data access means connectivity and controllability across numerous target data sources. Necessary to fields such as EII and Electronic Data Interchange (EDI), it is most often used in the analysis of disparate data types and data sources, which must be rendered into a uniform information representation and generally must appear homogeneous to the analysis tools, even though the data being analyzed is typically heterogeneous and widely varying in size, type, and original representation. === Uniform information representation === Uniform information representation allows information from several realms or disciplines to be displayed and worked with as if it came from the same realm or discipline. It takes information from a number of sources, which may have used different methodologies and metrics in their data collection, and builds a single large collection of information, where some records may be more complete than others across all fields of data. Uniform information representation is particularly important in EII and Electronic Data Interchange (EDI), where different departments of a large organization may have collected information for different purposes, with different labels and units, until one department realized that data already collected by those other departments could be re-purposed for its own needs, saving the enterprise the effort and cost of re-collecting the same information. === Combining disparate data sets === Each data source is disparate and as such is not designed to support EII. Therefore, data virtualization as well as data federation depends upon accidental data commonality to support combining data and information from disparate data sets. 
Because of this lack of data value commonality across data sources, the return set may be inaccurate, incomplete, and impossible to validate. One solution is to recast the disparate databases so that they can be integrated without the need for ETL. The recast databases support commonality constraints through which referential integrity may be enforced between databases. The recast databases provide designed data access paths with data value commonality across databases. === Simplicity of deployment === Even where it was recognized as a solution to a problem, EII as of 2009 took time to apply and was complex to deploy. Proposed schema-less solutions include "Lean Middleware". === Handling higher-order information === Analysts experience difficulty, even with a functioning information integration system, in determining whether the sources in the database will satisfy a given application. Answering these kinds of questions about a set of repositories requires semantic information such as metadata and/or ontologies. == Applications == EII products enable loose coupling between client applications and services that consume homogeneous data and the heterogeneous data stores behind them. Such client applications and services include desktop productivity tools (spreadsheets, word processors, presentation software, etc.), development environments and frameworks (Java EE, .NET, Mono, SOAP or RESTful Web services, etc.), business intelligence (BI), business activity monitoring (BAM) software, enterprise resource planning (ERP), customer relationship management (CRM), business process management (BPM and/or BPEL) software, and web content management (CMS). == Data access technologies == Service Data Objects (SDO) for Java, C++ and .NET clients and any type of data source; XQuery and the XQuery API for Java.
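The ideas of uniform data access and uniform information representation can be illustrated with a small sketch. This is not how any particular EII product works; it only shows two heterogeneous sources (an in-memory SQLite table and a CSV text) being presented under one set of field names. All table names, column names, sample records, and function names here are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a relational table with its own column names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER, full_name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada Lovelace", "UK"), (2, "Grace Hopper", "US")],
)

# Hypothetical source 2: a CSV export that uses different labels.
CSV_TEXT = "id,name,nation\n3,Alan Turing,UK\n"


def from_sqlite(connection):
    """Map rows of the relational source onto the uniform field names."""
    query = "SELECT cust_id, full_name, country FROM customers ORDER BY cust_id"
    for cust_id, full_name, country in connection.execute(query):
        yield {"id": cust_id, "name": full_name, "country": country}


def from_csv(text):
    """Map rows of the CSV source onto the same uniform field names."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"id": int(row["id"]), "name": row["name"], "country": row["nation"]}


def unified_customers():
    """Single access point: callers never see which source a record came from."""
    yield from from_sqlite(conn)
    yield from from_csv(CSV_TEXT)


def customers_in(country):
    """Example query against the unified view."""
    return [r["name"] for r in unified_customers() if r["country"] == country]


print(customers_in("UK"))
```

The per-source adapter functions carry the naming conventions, so adding another source would mean adding one more adapter rather than changing the code that queries the unified view. Note that the sketch relies on the two sources happening to share comparable values (for example, the same country codes), which is the kind of accidental data commonality the text warns about.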
