Datacap

Datacap, formerly a privately owned company and now an IBM company, manufactures and sells computer software and services. Datacap's first product, Paper Keyboard, was a "forms processing" product and shipped in 1989. In August 2010, IBM announced that it had acquired Datacap for an undisclosed amount.

== Overview ==

Datacap sells products through a value-added distribution network worldwide. The software is classified as "enterprise software", meaning that it requires trained professionals to install and configure. Although the company has focused on providing solutions for scanning paper documents, more recent company materials have emphasized customer requirements to handle electronic documents ("eDocs"), that is, documents received by an organization electronically (usually by email). Datacap claims that its software is unique because of its rules engine ("Rulerunner"), which is used for processing inbound documents, including performing image processing (deskew, noise removal, etc.), optical character recognition (OCR), intelligent character recognition (ICR), validations, and export-release formatting of extracted data for target ERP and line-of-business applications.

Data preprocessing

Data preprocessing can refer to manipulation, filtration or augmentation of data before it is analyzed, and is often an important step in the data mining process. Data collection methods are often loosely controlled, resulting in out-of-range values, impossible data combinations, and missing values, amongst other issues. Preprocessing is the process by which unstructured data is transformed into intelligible representations suitable for machine-learning models. This phase deals with noise in order to obtain better results than the original, noisy data set would allow; such data sets may also contain missing values. The preprocessing pipeline used can often have large effects on the conclusions drawn from the downstream analysis. Thus, the representation and quality of the data must be addressed before running any analysis. If there is a high proportion of irrelevant and redundant information, or of noisy and unreliable data, then knowledge discovery during the training phase may be more difficult. Data preparation and filtering steps can take a considerable amount of processing time. Examples of methods used in data preprocessing include cleaning, instance selection, normalization, one-hot encoding, data transformation, feature extraction and feature selection.

== Applications ==

=== Data mining ===

Data preprocessing allows unwanted data to be removed through data cleaning, so that after the preprocessing stage the user has a dataset containing more valuable information for later data manipulation in the data mining process. Editing such a dataset to correct either data corruption or human error is a crucial step in obtaining accurate quantities such as the true positives, true negatives, false positives and false negatives found in a confusion matrix, which are commonly used in medical diagnosis.
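Several of the methods listed above (cleaning, handling missing values, normalization and one-hot encoding) can be seen working together in the minimal, dependency-free Python sketch below. The records, the field names, the valid age range [0, 120] and the choice of mean imputation and min-max scaling are all hypothetical assumptions for illustration; real pipelines typically use a library such as pandas instead.

```python
from statistics import mean

# Hypothetical raw records: "age" should lie in [0, 120]; None marks a missing value.
RAW = [
    {"age": 34, "smoker": "no"},
    {"age": None, "smoker": "yes"},
    {"age": 250, "smoker": "no"},  # out-of-range value, e.g. a data-entry error
    {"age": 58, "smoker": "yes"},
    {"age": 46, "smoker": "no"},
]


def clean(records, field, lo, hi):
    """Cleaning: drop records whose value is present but outside [lo, hi]."""
    return [r for r in records if r[field] is None or lo <= r[field] <= hi]


def impute_mean(records, field):
    """Imputation: replace missing values with the mean of the observed ones."""
    fill = mean(r[field] for r in records if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


def min_max_normalize(records, field):
    """Normalization: rescale a numeric field linearly onto [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero if all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in records]


def one_hot(records, field):
    """One-hot encoding: replace a categorical field with one 0/1 column per category."""
    categories = sorted({r[field] for r in records})
    encoded = []
    for r in records:
        row = {k: v for k, v in r.items() if k != field}
        for c in categories:
            row[f"{field}={c}"] = int(r[field] == c)
        encoded.append(row)
    return encoded


# Clean first so the outlier cannot distort the imputed mean or the scaling range.
rows = one_hot(min_max_normalize(impute_mean(clean(RAW, "age", 0, 120), "age"), "age"), "smoker")
for row in rows:
    print(row)
```

The order of the steps is a design choice: removing the out-of-range value before imputing and scaling keeps a single bad entry from shifting every other record.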
Users can join data files together and use preprocessing to filter unnecessary noise from the data, which can allow for higher accuracy. Many users write Python scripts with the pandas library, which can import data from a comma-separated values (CSV) file as a DataFrame. The DataFrame is then used to manipulate data in ways that can be challenging in Excel. pandas is a powerful tool for data analysis and manipulation that makes data visualization, statistical operations and much more considerably easier. Many also use the R programming language for such tasks. Users transform existing files into new ones for many reasons. Aspects of data preprocessing may include imputing missing values, aggregating numerical quantities and transforming continuous data into categories (data binning). More advanced techniques, such as principal component analysis and feature selection, rely on statistical formulas and are applied to complex datasets, such as those recorded by GPS trackers and motion capture devices.

=== Semantic data preprocessing ===

Semantic data mining is a subset of data mining that specifically seeks to incorporate domain knowledge, such as formal semantics, into the data mining process. Domain knowledge is knowledge of the environment in which the data was processed. Domain knowledge can have a positive influence on many aspects of data mining, such as filtering out redundant or inconsistent data during the preprocessing phase. Domain knowledge also works as a constraint: it acts as a set of prior knowledge that reduces the search space and guides the analysis of the data. Simply put, semantic preprocessing seeks to filter data more correctly and efficiently by using knowledge of the environment the data came from.
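Full semantic preprocessing relies on formal ontologies, but the basic idea of domain knowledge acting as a constraint that filters out inconsistent data can be sketched with plain rules. In this hypothetical Python example, three named constraints stand in for domain knowledge about clinical records; the rule names, fields and thresholds are illustrative assumptions, not taken from any particular system.

```python
from typing import Callable

# Hypothetical domain knowledge for a clinical data set, encoded as named constraints.
DOMAIN_RULES: list[tuple[str, Callable[[dict], bool]]] = [
    ("age is within the human range", lambda r: 0 <= r["age"] <= 120),
    ("systolic pressure exceeds diastolic", lambda r: r["systolic"] > r["diastolic"]),
    ("discharge is not before admission", lambda r: r["discharge_day"] >= r["admit_day"]),
]


def filter_with_domain_knowledge(records, rules):
    """Split records into those consistent with every rule and those that violate some."""
    kept, rejected = [], []
    for record in records:
        violated = [name for name, holds in rules if not holds(record)]
        if violated:
            rejected.append((record, violated))  # keep the reasons for later review
        else:
            kept.append(record)
    return kept, rejected


records = [
    {"id": 1, "age": 52, "systolic": 130, "diastolic": 85, "admit_day": 3, "discharge_day": 7},
    {"id": 2, "age": 47, "systolic": 80, "diastolic": 120, "admit_day": 1, "discharge_day": 2},
    {"id": 3, "age": 130, "systolic": 140, "diastolic": 90, "admit_day": 9, "discharge_day": 4},
]
kept, rejected = filter_with_domain_knowledge(records, DOMAIN_RULES)
print([r["id"] for r in kept])
for record, violated in rejected:
    print(record["id"], violated)
```

Returning the violated rule names alongside each rejected record keeps the filtering explainable, which matters when the constraints come from domain experts rather than from the data.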
Increasingly complex problems call for more elaborate techniques to analyze existing information. Instead of creating a simple script for aggregating different numerical values into a single value, it makes sense to focus on semantic-based data preprocessing. The idea is to build a dedicated ontology, which explains at a higher level what the problem is about. In semantic data mining and semantic preprocessing, ontologies are a way to conceptualize and formally define semantic knowledge and data. Protégé is a widely used tool for constructing ontologies. In general, the use of ontologies bridges the gaps between data, applications, algorithms, and results that arise from semantic mismatches. As a result, semantic data mining combined with ontologies has many applications where semantic ambiguity can affect the usefulness and efficiency of data systems, including the medical field, language processing, banking, and even tutoring, among many others. There are various strengths to a semantic data mining and ontology-based approach. As previously mentioned, these tools can help during the preprocessing phase by filtering undesirable data out of the data set. Additionally, well-structured formal semantics integrated into well-designed ontologies can return powerful data that can be easily read and processed by machines. A particularly useful example is the medical use of semantic data processing. Suppose a patient is having a medical emergency and is being rushed to the hospital, and the emergency responders are trying to determine the best medicine to administer. Under normal data processing, scouring all of the patient's medical data to ensure they receive the best treatment could take too long and risk the patient's health or even life. Using semantically processed ontologies, however, the first responders could reach a decision quickly enough to save the patient's life.
Tools such as a semantic reasoner can use the ontology to infer which medicine is best to administer to the patient based on their medical history, such as whether they have a certain cancer or other conditions, simply by examining the natural language used in the patient's medical records. This would allow the first responders to search for medicine quickly and efficiently without having to review the patient's medical history themselves, as the semantic reasoner would already have analyzed this data and found solutions. In general, this illustrates the strength of semantic data mining and ontologies: they allow quicker and more efficient data extraction on the user side, because the user has fewer variables to account for, since the semantically preprocessed data and the ontology built for it have already accounted for many of them. However, this approach has drawbacks. Namely, it demands considerable computational power and complexity, even with relatively small data sets. This could result in higher costs and increased difficulty in building and maintaining semantic data processing systems. The problem can be mitigated somewhat if the data set is already well organized and formatted, but even then the complexity remains higher than that of standard data processing. These processes, in particular semantic data mining and its use in ontology, can be summarized as follows. A data set is broken up into two parts: the characteristics of its domain, or domain knowledge, and the actual acquired data. The domain characteristics are then processed to become user-understood domain knowledge that can be applied to the data. Meanwhile, the data set is processed and stored so that the domain knowledge can be applied to it and the process can continue. This application forms the ontology. From there, the ontology can be used to analyze data and process results.
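The reasoning step in the emergency example, which combines domain knowledge with a patient's data, can be illustrated with a deliberately tiny Python sketch. It assumes the patient's history has already been mapped to ontology terms (the natural-language step is omitted), and every class, drug and contraindication below is hypothetical and not clinical guidance. A real system would express the ontology in a language such as OWL and use a dedicated reasoner rather than this hand-written transitive lookup.

```python
# Toy ontology (hypothetical names, not clinical guidance).
# Each term points to its direct parent class; single inheritance keeps the sketch short.
SUBCLASS_OF = {
    "drug_a": "Anticoagulant",
    "drug_b": "NSAID",
    "drug_c": "Analgesic",
    "NSAID": "Analgesic",
    "Anticoagulant": "Drug",
    "Analgesic": "Drug",
    "condition_x": "BleedingDisorder",
    "condition_y": "KidneyDisease",
}
# Domain axioms: (condition class, drug class) pairs that must not be combined.
CONTRAINDICATED = {("BleedingDisorder", "Anticoagulant"), ("KidneyDisease", "NSAID")}


def classes_of(term):
    """Infer every class a term belongs to by following subclass links transitively."""
    classes = [term]
    while term in SUBCLASS_OF:
        term = SUBCLASS_OF[term]
        classes.append(term)
    return classes


def admissible(candidates, history):
    """Keep the candidate drugs that no inferred contraindication rules out."""
    conditions = {c for h in history for c in classes_of(h)}
    result = []
    for drug in candidates:
        drug_classes = classes_of(drug)
        if not any((cond, cls) in CONTRAINDICATED for cond in conditions for cls in drug_classes):
            result.append(drug)
    return result


print(admissible(["drug_a", "drug_b", "drug_c"], ["condition_x"]))
print(admissible(["drug_a", "drug_b", "drug_c"], ["condition_x", "condition_y"]))
```

The contraindications are stated once, at the class level, and the inference step applies them to specific drugs and conditions; this is the sense in which the ontology has "already accounted for" variables the user would otherwise check by hand.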
Fuzzy preprocessing is another, more advanced technique for solving complex problems. Fuzzy preprocessing and fuzzy data mining make use of fuzzy sets. A fuzzy set is composed of two elements: a set and a membership function that assigns each element of the set a degree of membership between 0 and 1. Fuzzy preprocessing uses fuzzy sets to ground numerical values in linguistic information, so that raw data is transformed into natural-language terms. Ultimately, the goal of fuzzy data mining is to help deal with inexact information, such as an incomplete database. Fuzzy preprocessing, as well as other fuzzy-based data mining techniques, currently sees frequent use with neural networks and artificial intelligence.
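The way fuzzy sets ground numbers in linguistic terms can be shown with triangular membership functions, one common (but not the only) shape. In this Python sketch the terms "cold", "mild" and "hot" and their breakpoints in degrees Celsius are hypothetical choices; each function returns a degree of membership in [0, 1].

```python
def triangular(a, b, c):
    """Triangular membership function: 0 outside (a, c), rising to 1 at the peak b."""
    def membership(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return membership


# Hypothetical linguistic terms for an air temperature in degrees Celsius.
TERMS = {
    "cold": triangular(-10.0, 0.0, 15.0),
    "mild": triangular(5.0, 15.0, 25.0),
    "hot": triangular(15.0, 30.0, 45.0),
}


def fuzzify(x):
    """Map a crisp number to its degree of membership in each linguistic term."""
    return {name: mu(x) for name, mu in TERMS.items()}


def describe(x):
    """Pick the linguistic term with the highest membership degree."""
    degrees = fuzzify(x)
    return max(degrees, key=degrees.get)


print(fuzzify(20.0))
print(describe(20.0), describe(30.0), describe(0.0))
```

A value such as 20 belongs partly to "mild" and partly to "hot" at the same time, which is what lets fuzzy methods handle inexact boundaries. Values outside the covered range (below -10 or above 45 here) get zero membership everywhere, so a practical design would usually use open-ended "shoulder" functions for the outermost terms.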

Sparse identification of non-linear dynamics

Sparse identification of nonlinear dynamics (SINDy) is a data-driven algorithm for obtaining dynamical systems from data. Given a series of snapshots of a dynamical system and its corresponding time derivatives, SINDy performs a sparsity-promoting regression (such as LASSO and sparse Bayesian inference) on a library of nonlinear candidate functions of the snapshots against the derivatives to find the governing equations. This procedure relies on the assumption that most physical systems only have a few dominant terms which dictate the dynamics, given an appropriately selected coordinate system and quality training data. It has been applied to identify the dynamics of fluids, based on proper orthogonal decomposition, as well as other complex dynamical systems, such as biological networks.

== Mathematical Overview ==

First, consider a dynamical system of the form

$$\dot{\mathbf{x}} = \frac{d}{dt}\mathbf{x}(t) = \mathbf{f}(\mathbf{x}(t)),$$

where $\mathbf{x}(t) \in \mathbb{R}^n$ is a state vector (snapshot) of the system at time $t$ and the function $\mathbf{f}(\mathbf{x}(t))$ defines the equations of motion and constraints of the system. The time derivative may be either prescribed or numerically approximated from the snapshots.
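When the derivatives are not prescribed, a simple way to approximate them from equally spaced snapshots is finite differencing. The Python sketch below is one assumed choice (central differences at interior points, one-sided differences at the two ends) applied to an illustrative two-state trajectory. Finite differences amplify measurement noise, so practical work with noisy data often smooths the snapshots or uses a more robust differentiation scheme first.

```python
def time_derivative(snapshots, dt):
    """Approximate dx/dt for equally spaced snapshots (a list of state vectors).

    Interior points use the second-order central difference
    (x[i+1] - x[i-1]) / (2*dt); the first and last points fall back to
    first-order forward and backward differences.
    """
    m = len(snapshots)
    if m < 2:
        raise ValueError("need at least two snapshots")
    n = len(snapshots[0])
    derivs = []
    for i in range(m):
        if i == 0:
            prev, nxt, span = snapshots[0], snapshots[1], dt
        elif i == m - 1:
            prev, nxt, span = snapshots[m - 2], snapshots[m - 1], dt
        else:
            prev, nxt, span = snapshots[i - 1], snapshots[i + 1], 2 * dt
        derivs.append([(nxt[j] - prev[j]) / span for j in range(n)])
    return derivs


# Illustrative trajectory x1 = 3t + 1, x2 = t^2, sampled every 0.1 time units.
dt = 0.1
ts = [i * dt for i in range(11)]
X = [[3 * t + 1, t * t] for t in ts]
X_dot = time_derivative(X, dt)
print(X_dot[5])
```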
With $\mathbf{x}$ and $\dot{\mathbf{x}}$ sampled at $m$ equidistant points in time ($t_1, t_2, \cdots, t_m$), these can be arranged into matrices of the form

$$\mathbf{X} = \begin{bmatrix} \mathbf{x}^{\mathsf{T}}(t_1) \\ \mathbf{x}^{\mathsf{T}}(t_2) \\ \vdots \\ \mathbf{x}^{\mathsf{T}}(t_m) \end{bmatrix} = \begin{bmatrix} x_1(t_1) & x_2(t_1) & \cdots & x_n(t_1) \\ x_1(t_2) & x_2(t_2) & \cdots & x_n(t_2) \\ \vdots & \vdots & \ddots & \vdots \\ x_1(t_m) & x_2(t_m) & \cdots & x_n(t_m) \end{bmatrix},$$

and similarly for $\dot{\mathbf{X}}$. Next, a library $\mathbf{\Theta}(\mathbf{X})$ of nonlinear candidate functions of the columns of $\mathbf{X}$ is constructed, which may be constant, polynomial, or more exotic functions (like trigonometric and rational terms, and so on):

$$\mathbf{\Theta}(\mathbf{X}) = \begin{bmatrix} \vert & \vert & \vert & \vert & & \vert & \vert & \\ 1 & \mathbf{X} & \mathbf{X}^2 & \mathbf{X}^3 & \cdots & \sin(\mathbf{X}) & \cos(\mathbf{X}) & \cdots \\ \vert & \vert & \vert & \vert & & \vert & \vert & \end{bmatrix}$$

The number of possible model structures from this library is combinatorially high.
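A polynomial version of this library can be built mechanically, which also shows how quickly the number of model structures grows. The Python sketch below covers only monomials up to a chosen degree (the trigonometric and other exotic terms are omitted), and the two snapshots are arbitrary illustrative values.

```python
from itertools import combinations_with_replacement
from math import comb, prod


def polynomial_library(state, max_degree):
    """Evaluate every monomial of the state variables up to max_degree, constant included."""
    names, values = ["1"], [1.0]
    for degree in range(1, max_degree + 1):
        for idx in combinations_with_replacement(range(len(state)), degree):
            names.append("*".join(f"x{i + 1}" for i in idx))
            values.append(prod(state[i] for i in idx))
    return names, values


def library_matrix(snapshots, max_degree):
    """Stack one library row per snapshot, giving the m-row matrix Theta(X)."""
    names, rows = [], []
    for state in snapshots:
        names, row = polynomial_library(state, max_degree)
        rows.append(row)
    return names, rows


snapshots = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0]]
names, theta = library_matrix(snapshots, max_degree=3)
# n = 3 states and degree <= 3 give comb(3 + 3, 3) = 20 candidate columns,
# hence 2**20 = 1,048,576 possible on/off patterns for each state equation.
print(len(names), comb(3 + 3, 3), 2 ** len(names))
```

In general, $n$ state variables and polynomials up to degree $d$ give $\binom{n+d}{d}$ columns, and each column can independently be active or inactive in each equation, which is why SINDy replaces an exhaustive search with sparse regression.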
$\mathbf{f}(\mathbf{x}(t))$ is then substituted by $\mathbf{\Theta}(\mathbf{X})$ and a matrix of coefficients $\mathbf{\Xi} = [\boldsymbol{\xi}_1\ \boldsymbol{\xi}_2\ \cdots\ \boldsymbol{\xi}_n]$, whose columns determine the active terms in $\mathbf{f}(\mathbf{x}(t))$:

$$\dot{\mathbf{X}} = \mathbf{\Theta}(\mathbf{X})\,\mathbf{\Xi}$$

Because only a few terms are expected to be active at each point in time, $\mathbf{f}(\mathbf{x}(t))$ is assumed to admit a sparse representation in $\mathbf{\Theta}(\mathbf{X})$. This then becomes an optimization problem of finding a sparse $\mathbf{\Xi}$ that optimally embeds $\dot{\mathbf{X}}$. In other words, a parsimonious model is obtained by performing least squares regression on the system $\dot{\mathbf{X}} = \mathbf{\Theta}(\mathbf{X})\,\mathbf{\Xi}$ with sparsity-promoting ($L_1$) regularization, one column $k$ at a time:

$$\boldsymbol{\xi}_k = \underset{\boldsymbol{\xi}'_k}{\arg\min} \left\| \dot{\mathbf{X}}_k - \mathbf{\Theta}(\mathbf{X})\,\boldsymbol{\xi}'_k \right\|_2 + \lambda \left\| \boldsymbol{\xi}'_k \right\|_1,$$

where $\dot{\mathbf{X}}_k$ is the $k$-th column of $\dot{\mathbf{X}}$ and $\lambda$ is a regularization parameter. Finally, the sparse set of $\boldsymbol{\xi}_k$ can be used to reconstruct the dynamical system:

$$\dot{x}_k = \mathbf{\Theta}(\mathbf{x})\,\boldsymbol{\xi}_k,$$

where $\mathbf{\Theta}(\mathbf{x})$ denotes the library evaluated at a single state $\mathbf{x}$, that is, a row vector of candidate functions.
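The regression step can be sketched without external libraries. This Python example rests on several assumptions: a scalar system $\dot{x} = x - x^3$ chosen only for demonstration, a four-term polynomial library, states sampled on a grid with derivatives prescribed exactly, and the common squared-error form of the LASSO objective ($\tfrac{1}{2}\|\cdot\|_2^2 + \lambda\|\cdot\|_1$) solved by cyclic coordinate descent with soft-thresholding. The value $\lambda = 0.1$ is arbitrary. Other sparse regression schemes, such as the sequentially thresholded least squares commonly used with SINDy, would fit in the same place.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso(theta, y, lam, max_sweeps=10_000, tol=1e-10):
    """Minimise 0.5*||y - theta @ xi||_2^2 + lam*||xi||_1 by cyclic coordinate descent.

    theta is a list of m rows (one per snapshot) of p candidate-function values.
    """
    m, p = len(theta), len(theta[0])
    xi = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in theta) for j in range(p)]
    residual = list(y)  # y - theta @ xi, kept up to date incrementally
    for _ in range(max_sweeps):
        largest_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that ignores term j.
            rho = sum(theta[i][j] * residual[i] for i in range(m)) + col_sq[j] * xi[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            step = new - xi[j]
            if step != 0.0:
                for i in range(m):
                    residual[i] -= theta[i][j] * step
                xi[j] = new
                largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return xi


# Illustrative system dx/dt = x - x^3 with a hypothetical library [1, x, x^2, x^3].
NAMES = ["1", "x", "x^2", "x^3"]
xs = [(i - 20) / 10 for i in range(41)]         # states sampled in [-2, 2]
theta = [[1.0, x, x ** 2, x ** 3] for x in xs]  # Theta(X), one row per snapshot
x_dot = [x - x ** 3 for x in xs]                # derivatives prescribed exactly

xi = lasso(theta, x_dot, lam=0.1)
model = {name: round(c, 3) for name, c in zip(NAMES, xi) if c != 0.0}
print(model)
```

The inactive terms come out exactly zero, while the two active coefficients land near, but not exactly on, 1 and -1, because the $L_1$ penalty shrinks surviving coefficients toward zero. A common remedy is to refit ordinary least squares on the selected terms.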

Semantic translation

Semantic translation is the process of using semantic information to aid in the translation of data in one representation or data model to another representation or data model. Semantic translation takes advantage of semantics that associate meaning with individual data elements in one dictionary to create an equivalent meaning in a second system. An example of semantic translation is the conversion of XML data from one data model to a second data model using formal ontologies for each system, expressed in a language such as the Web Ontology Language (OWL). This is frequently required by intelligent agents that wish to perform searches on remote computer systems that use different data models to store their data elements. The process of allowing a single user to search multiple systems with a single search request is also known as federated search. Semantic translation should be differentiated from data mapping tools that do simple one-to-one translation of data from one system to another without actually associating meaning with each data element. Semantic translation requires that data elements in the source and destination systems have "semantic mappings" to a central registry or registries of data elements. The simplest mapping is one of equivalence. There are three types of semantic equivalence:

* Class equivalence: indicating that classes or "concepts" are equivalent. For example, "Person" is the same as "Individual".
* Property equivalence: indicating that two properties are equivalent. For example, "PersonGivenName" is the same as "FirstName".
* Instance equivalence: indicating that two individual instances of objects are equivalent. For example, "Dan Smith" is the same person as "Daniel Smith".

Semantic translation is very difficult if the terms in a particular data model do not have direct one-to-one mappings to data elements in a foreign data model.
In that situation, an alternative approach must be used to find mappings from the original data to the foreign data elements. This problem can be alleviated by centralized metadata registries that use the ISO/IEC 11179 standards, such as the National Information Exchange Model (NIEM).
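The role of a central registry can be sketched in a few lines of Python. In this example the two data models, the shared concept identifiers (prefixed `c:`) and the instance link are all hypothetical, reusing the three equivalence examples above; a real deployment would express such mappings in an ontology language, for example with OWL's `owl:equivalentClass`, `owl:equivalentProperty` and `owl:sameAs`, rather than in dictionaries.

```python
# Hypothetical central registry: each system's local element names map to shared concept ids.
REGISTRY = {
    "system_a": {"Person": "c:Person", "PersonGivenName": "c:GivenName", "PersonBirthDate": "c:BirthDate"},
    "system_b": {"Individual": "c:Person", "FirstName": "c:GivenName", "DateOfBirth": "c:BirthDate"},
}
# Hypothetical instance-equivalence links between identifiers used in the two systems.
SAME_AS = {("system_a", "system_b"): {"dan-smith": "daniel-smith"}}


def translate(record, source, target):
    """Rewrite a record from the source data model into the target one via shared concepts."""
    to_concept = REGISTRY[source]
    from_concept = {concept: name for name, concept in REGISTRY[target].items()}
    instance_links = SAME_AS.get((source, target), {})
    out = {}
    for key, value in record.items():
        if key == "@type":  # class equivalence
            out[key] = from_concept[to_concept[value]]
        elif key == "@id":  # instance equivalence (unlinked ids pass through unchanged)
            out[key] = instance_links.get(value, value)
        else:  # property equivalence
            concept = to_concept.get(key)
            if concept not in from_concept:
                raise KeyError(f"no semantic mapping for {key!r} from {source} to {target}")
            out[from_concept[concept]] = value
    return out


record_a = {"@type": "Person", "@id": "dan-smith", "PersonGivenName": "Dan", "PersonBirthDate": "1970-01-01"}
print(translate(record_a, "system_a", "system_b"))
```

Raising an error for an element with no registered concept mirrors the difficulty described above: when there is no one-to-one mapping, simple translation fails and another approach is needed.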

Algorithmic transparency

Algorithmic transparency is the principle that the factors that influence the decisions made by algorithms should be visible, or transparent, to the people who use, regulate, and are affected by systems that employ those algorithms. Although the phrase was coined in 2016 by Nicholas Diakopoulos and Michael Koliska in discussing the role of algorithms in deciding the content of digital journalism services, the underlying principle dates back to the 1970s and the rise of automated systems for scoring consumer credit. The phrases "algorithmic transparency" and "algorithmic accountability" are sometimes used interchangeably, especially since they were coined by the same people, but they have subtly different meanings. Specifically, "algorithmic transparency" states that the inputs to the algorithm and the algorithm's use itself must be known, but they need not be fair. "Algorithmic accountability" implies that the organizations that use algorithms must be accountable for the decisions made by those algorithms, even though the decisions are being made by a machine and not by a human being. Current research on algorithmic transparency is interested both in the societal effects of accessing remote services running algorithms and in mathematical and computer science approaches that can be used to achieve algorithmic transparency. In the United States, the Federal Trade Commission's Bureau of Consumer Protection studies how algorithms are used by consumers by conducting its own research on algorithmic transparency and by funding external research. In the European Union, the data protection laws that came into effect in May 2018 include a "right to explanation" of decisions made by algorithms, though it is unclear what this means. Furthermore, the European Union founded the European Centre for Algorithmic Transparency (ECAT).

Automated restaurant

An automated restaurant or robotic restaurant is a restaurant that uses robots to do tasks such as delivering food and drink to the tables or cooking the food. Restaurant automation means the use of a restaurant management system to automate some or occasionally all of the major operations of a restaurant establishment. In recent years, restaurants have opened that completely or partially automate their services. These services may include taking orders, preparing food, serving, and billing. A few fully automated restaurants operate without any human intervention whatsoever. Robots are designed to help and sometimes replace human labour (such as waiters and chefs). The automation of restaurants may also allow for greater customization of an order.

== History ==

=== Vending machines ===

In the late 19th and early 20th centuries, a number of restaurants served food solely through vending machines. These restaurants were called automats or, in Japan, shokkenki. Customers ordered their food directly through the machines.

=== Sushi conveyors ===

Yoshiaki Shiraishi was a Japanese innovator known for the creation of conveyor belt sushi. He came up with the idea after having difficulty staffing his small sushi restaurant and managing it on his own, and was inspired by seeing beer bottles on a conveyor belt in an Asahi brewery. Shiraishi's restaurants are an early example of restaurant automation: they used a conveyor belt to distribute dishes around the restaurant, eliminating the need for waiters. This example of automation dates back to the Japanese economic miracle; the first of Shiraishi's conveyor belt sushi restaurants was opened under the name Mawaru Genroku Sushi in 1958, in Osaka.

=== Partial automation ===

As of 2011, McDonald's had already begun installing 7,000 touch screen kiosks across Europe that could handle cashiering duties. From 2015 to 2020, Zume had an automated pizza parlor.
Later companies would try to produce smaller, less ambitious devices; one robotics company produced a machine that could automate the slowest and most repetitive parts of assembling a pizza, such as spreading pizza sauce or placing slices of pepperoni, while leaving other customizations to employees. In 2020, a restaurant in the Netherlands began trialling the use of a robot to serve guests. In September 2021, Karakuri's 'Semblr' food service robot served personalised lunches to the 4,000 employees at the head offices of grocery technology solutions provider Ocado Group in Hatfield, UK. There were 2,700 different combinations of dishes on offer, and customers could specify in grams which hot and cold items, proteins, sauces and fresh toppings they wanted. In 2021, engineers at Columbia University's School of Engineering and Applied Science developed a method of cooking 3D-printed chicken with software-controlled robotic lasers. The "Digital Food" team exposed raw 3D-printed chicken structures to both blue and infrared light, and then assessed the cooking depth, colour development, moisture retention and flavour differences of the laser-cooked samples in comparison to stove-cooked meat. In June 2022, Front Porch, a California nonprofit chain of residential communities, experimented with robots in the dining rooms of two locations to supplement wait staff by carrying plated food and drink to tables and removing dishes. Of the residents, 65% found the robots helpful, with 51% saying the robots let the staff spend more quality time with diners; among staff, 51% were "excited" and 58% said the robots enabled more quality time with diners. The chain has 19 senior living communities (and 35 affordable housing communities), so it has the potential to expand robots to more dining rooms, although its shift toward memory care may affect these plans.
== Rationales ==

=== Advantages ===

* Efficiency: Automated restaurants can significantly enhance operational efficiency by minimizing human error and reducing service time. With automated ordering, payment, and food preparation systems, customers can enjoy faster service and reduced waiting times.
* Cost savings: By reducing the need for human staff, automated restaurants can potentially lower labor costs. This can be particularly beneficial in areas with high labor expenses, as it allows for better resource allocation and cost management.
* Consistency: Automation ensures consistency in food quality and presentation. With precise portion control and standardized cooking methods, customers can expect the same quality and taste in their meals every time they visit.
* Enhanced customer experience: Self-service kiosks and automated systems provide customers with control and convenience. They can customize their orders, browse through menu options, and pay seamlessly, creating a more interactive and satisfying dining experience.

=== Disadvantages ===

* Lack of personal touch: Automated restaurants may lack the personal interaction and warmth that traditional restaurants provide. Some customers prefer the human touch, personalized recommendations, and the social aspect of dining out.
* Technical issues: Reliance on technology means that technical glitches and malfunctions can occur, resulting in service disruptions or delays. Maintenance and technical support become critical in ensuring smooth operations.
* Limited menu complexity: The automation process may be better suited for standardized menu items rather than complex or customized dishes. The ability to cater to unique dietary preferences or accommodate special requests may be limited.
* Employment implications: Automated restaurants may result in job losses for traditional restaurant staff, potentially impacting the local workforce. It is important to consider the social and economic implications of adopting such technology.
== Locations ==

Automated restaurants have been opening in many countries. Examples include:

* Nala Restaurant in Naperville, Illinois
* Fritz's Railroad Restaurant in Kansas City, Kansas
* Výtopna, a railway restaurant using model trains: a franchise of various restaurants and coffeehouses in the Czech Republic
* Bagger's Restaurant in Nuremberg, Germany
* FuA-Men Restaurant, a ramen restaurant in Nagoya, Japan
* Fōster Nutrition in Buenos Aires, Argentina
* Dalu Robot Restaurant in Jinan, China
* Haohai Robot Restaurant in Harbin, China
* Robot Kitchen Restaurant in Hong Kong
* Robo-Chef restaurant in Tehran, Iran, which started in 2017 and is the first robotic, "waiterless" restaurant in the Middle East
* Spyce Kitchens, opened by MIT graduates in downtown Boston, Massachusetts, in 2018
* Foodom, under Country Garden Holdings, which opened on January 12, 2020, in Guangzhou, China
* Robot Chacha, the first robot restaurant in India, which is planning to open in the capital city of New Delhi
* Kura Revolving Sushi Bar, with a number of locations in the United States, which uses tablets at tables for ordering, a conveyor belt to deliver food, and robots to deliver drinks and condiments
* Chipotle Mexican Grill, which is beginning to deploy the Hyphen Makeline (which automatically assembles up to 350 bowls and salads per hour) and Chippy, an automatic tortilla chip fryer made by Miso Robotics
* Serious Dumplings in Boca Raton, Florida

Super column

A super column is a tuple (a pair) with a binary super column name and a value that maps it to many columns. Its value consists of key–value pairs, where the values are columns. Conceptually, a super column is a (sorted) associative array of columns. Similar to a regular column family, where a row is a sorted map of column names to column values, a row in a super column family is a sorted map of super column names, each of which maps to column names and column values. A super column is part of a keyspace together with other super columns, column families, and columns.

== Code example ==

Written in a JSON-like syntax, a super column definition can look like this:

Where: "databases" is the keyspace; "Cassandra" and "HBase" are row keys; "name" and "address" are super column names; "firstName", "city", "age", etc. are column names.
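The nesting described above (keyspace, then row key, then sorted super columns, then sorted columns) can be modelled with ordinary Python dictionaries that are sorted on read. This is only a conceptual sketch: the column values and the extra column `lastName` are hypothetical, and it does not use any Cassandra or HBase client API.

```python
# Hypothetical contents for the example names in the text:
# keyspace -> row key -> super column name -> column name -> value.
databases = {}


def put(keyspace, row_key, super_column, column, value):
    """Insert one column value under a row key and super column."""
    keyspace.setdefault(row_key, {}).setdefault(super_column, {})[column] = value


def get_super_column(keyspace, row_key, super_column):
    """Return one super column as (column name, value) pairs in sorted column order."""
    return sorted(keyspace.get(row_key, {}).get(super_column, {}).items())


def get_row(keyspace, row_key):
    """Return a row as a map of super column names (sorted) to sorted column lists."""
    row = keyspace.get(row_key, {})
    return {name: sorted(columns.items()) for name, columns in sorted(row.items())}


put(databases, "Cassandra", "name", "firstName", "Apache")
put(databases, "Cassandra", "name", "lastName", "Cassandra")
put(databases, "Cassandra", "address", "city", "example-city")
put(databases, "HBase", "name", "firstName", "Apache")
print(get_row(databases, "Cassandra"))
```

Sorting on read keeps the sketch short; a storage engine would instead keep super column and column names ordered on write so that range reads are cheap.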