Schema-agnostic databases

Schema-agnostic databases

Schema-agnostic databases or vocabulary-independent databases aim at supporting users to be abstracted from the representation of the data, supporting the automatic semantic matching between queries and databases. Schema-agnosticism is the property of a database of mapping a query issued with the user terminology and structure, automatically mapping it to the dataset vocabulary. The increase in the size and in the semantic heterogeneity of database schemas bring new requirements for users querying and searching structured data. At this scale it can become unfeasible for data consumers to be familiar with the representation of the data in order to query it. At the center of this discussion is the semantic gap between users and databases, which becomes more central as the scale and complexity of the data grows. == Description == The evolution of data environments towards the consumption of data from multiple data sources and the growth in the schema size, complexity, dynamicity and decentralisation (SCoDD) of schemas increases the complexity of contemporary data management. The SCoDD trend emerges as a central data management concern in Big Data scenarios, where users and applications have a demand for more complete data, produced by independent data sources, under different semantic assumptions and contexts of use, which is the typical scenario for Semantic Web Data applications. The evolution of databases in the direction of heterogeneous data environments strongly impacts the usability, semiotics and semantic assumptions behind existing data accessibility methods such as structured queries, keyword-based search and visual query systems. With schema-less databases containing potentially millions of dynamically changing attributes, it becomes unfeasible for some users to become aware of the 'schema' or vocabulary in order to query the database. At this scale, the effort in understanding the schema in order to build a structured query can become prohibitive. == Schema-agnostic queries == Schema-agnostic queries can be defined as query approaches over structured databases which allow users satisfying complex information needs without the understanding of the representation (schema) of the database. Similarly, Tran et al. defines it as "search approaches, which do not require users to know the schema underlying the data". Approaches such as keyword-based search over databases allow users to query databases without employing structured queries. However, as discussed by Tran et al.: "From these points, users however have to do further navigation and exploration to address complex information needs. Unlike keyword search used on the Web, which focuses on simple needs, the keyword search elaborated here is used to obtain more complex results. Instead of a single set of resources, the goal is to compute complex sets of resources and their relations." The development of approaches to support natural language interfaces (NLI) over databases have aimed towards the goal of schema-agnostic queries. Complementarily, some approaches based on keyword search have targeted keyword-based queries which express more complex information needs. Other approaches have explored the construction of structured queries over databases where schema constraints can be relaxed. All these approaches (natural language, keyword-based search and structured queries) have targeted different degrees of sophistication in addressing the problem of supporting a flexible semantic matching between queries and data, which vary from the completely absence of the semantic concern to more principled semantic models. While the demand for schema-agnosticism has been an implicit requirement across semantic search and natural language query systems over structured data, it is not sufficiently individuated as a concept and as a necessary requirement for contemporary database management systems. Recent works have started to define and model the semantic aspects involved on schema-agnostic queries. === Schema-agnostic structured queries === Consist of schema-agnostic queries following the syntax of a structured standard (for example SQL, SPARQL). The syntax and semantics of operators are maintained, while different terminologies are used. ==== Example 1 ==== SELECT ?y { BillClinton hasDaughter ?x . ?x marriedTo ?y . } which maps to the following SPARQL query in the dataset vocabulary: ==== Example 2 ==== which maps to the following SPARQL query in the dataset vocabulary: === Schema-agnostic keyword queries === Consist of schema-agnostic queries using keyword queries. In this case the syntax and semantics of operators are different from the structured query syntax. ==== Example ==== "Bill Clinton daughter married to" "Books by William Goldman with more than 300 pages" == Semantic complexity == As of 2016 the concept of schema-agnostic queries has been developed primarily in academia. Most of schema-agnostic query systems have been investigated in the context of Natural Language Interfaces over databases or over the Semantic Web. These works explore the application of semantic parsing techniques over large, heterogeneous and schema-less databases. More recently, the individuation of the concept of schema-agnostic query systems and databases have appeared more explicitly within the literature. Freitas et al. provide a probabilistic model on the semantic complexity of mapping schema-agnostic queries.

Eigenface

An eigenface ( EYE-gən-) is the name given to a set of eigenvectors when used in the computer vision problem of human face recognition. The approach of using eigenfaces for recognition was developed by Sirovich and Kirby and used by Matthew Turk and Alex Pentland in face classification. The eigenvectors are derived from the covariance matrix of the probability distribution over the high-dimensional vector space of face images. The eigenfaces themselves form a basis set of all images used to construct the covariance matrix. This produces dimension reduction by allowing the smaller set of basis images to represent the original training images. Classification can be achieved by comparing how faces are represented by the basis set. == History == The eigenface approach began with a search for a low-dimensional representation of face images. Sirovich and Kirby showed that principal component analysis could be used on a collection of face images to form a set of basis features. These basis images, known as eigenpictures, could be linearly combined to reconstruct images in the original training set. If the training set consists of M images, principal component analysis could form a basis set of N images, where N < M. The reconstruction error is reduced by increasing the number of eigenpictures; however, the number needed is always chosen less than M. For example, if you need to generate a number of N eigenfaces for a training set of M face images, you can say that each face image can be made up of "proportions" of all the K "features" or eigenfaces: Face image1 = (23% of E1) + (2% of E2) + (51% of E3) + ... + (1% En). In 1991 M. Turk and A. Pentland expanded these results and presented the eigenface method of face recognition. In addition to designing a system for automated face recognition using eigenfaces, they showed a way of calculating the eigenvectors of a covariance matrix such that computers of the time could perform eigen-decomposition on a large number of face images. Face images usually occupy a high-dimensional space and conventional principal component analysis was intractable on such data sets. Turk and Pentland's paper demonstrated ways to extract the eigenvectors based on matrices sized by the number of images rather than the number of pixels. Once established, the eigenface method was expanded to include methods of preprocessing to improve accuracy. Multiple manifold approaches were also used to build sets of eigenfaces for different subjects and different features, such as the eyes. == Generation == A set of eigenfaces can be generated by performing a mathematical process called principal component analysis (PCA) on a large set of images depicting different human faces. Informally, eigenfaces can be considered a set of "standardized face ingredients", derived from statistical analysis of many pictures of faces. Any human face can be considered to be a combination of these standard faces. For example, one's face might be composed of the average face plus 10% from eigenface 1, 55% from eigenface 2, and even −3% from eigenface 3. Remarkably, it does not take many eigenfaces combined together to achieve a fair approximation of most faces. Also, because a person's face is not recorded by a digital photograph, but instead as just a list of values (one value for each eigenface in the database used), much less space is taken for each person's face. The eigenfaces that are created will appear as light and dark areas that are arranged in a specific pattern. This pattern is how different features of a face are singled out to be evaluated and scored. There will be a pattern to evaluate symmetry, whether there is any style of facial hair, where the hairline is, or an evaluation of the size of the nose or mouth. Other eigenfaces have patterns that are less simple to identify, and the image of the eigenface may look very little like a face. The technique used in creating eigenfaces and using them for recognition is also used outside of face recognition: handwriting recognition, lip reading, voice recognition, sign language/hand gestures interpretation and medical imaging analysis. Therefore, some do not use the term eigenface, but prefer to use 'eigenimage'. === Practical implementation === To create a set of eigenfaces, one must: Prepare a training set of face images. The pictures constituting the training set should have been taken under the same lighting conditions, and must be normalized to have the eyes and mouths aligned across all images. They must also be all resampled to a common pixel resolution (r × c). Each image is treated as one vector, simply by concatenating the rows of pixels in the original image, resulting in a single column with r × c elements. For this implementation, it is assumed that all images of the training set are stored in a single matrix T, where each column of the matrix is an image. Subtract the mean. The average image a has to be calculated and then subtracted from each original image in T. Calculate the eigenvectors and eigenvalues of the covariance matrix S. Each eigenvector has the same dimensionality (number of components) as the original images, and thus can itself be seen as an image. The eigenvectors of this covariance matrix are therefore called eigenfaces. They are the directions in which the images differ from the mean image. Usually this will be a computationally expensive step (if at all possible), but the practical applicability of eigenfaces stems from the possibility to compute the eigenvectors of S efficiently, without ever computing S explicitly, as detailed below. Choose the principal components. Sort the eigenvalues in descending order and arrange eigenvectors accordingly. The number of principal components k is determined arbitrarily by setting a threshold ε on the total variance. Total variance ⁠ v = ( λ 1 + λ 2 + . . . + λ n ) {\displaystyle v=(\lambda _{1}+\lambda _{2}+...+\lambda _{n})} ⁠, n = number of components, and λ {\displaystyle \lambda } represents component eigenvalue. k is the smallest number that satisfies ( λ 1 + λ 2 + . . . + λ k ) v > ϵ {\displaystyle {\frac {(\lambda _{1}+\lambda _{2}+...+\lambda _{k})}{v}}>\epsilon } These eigenfaces can now be used to represent both existing and new faces: we can project a new (mean-subtracted) image on the eigenfaces and thereby record how that new face differs from the mean face. The eigenvalues associated with each eigenface represent how much the images in the training set vary from the mean image in that direction. Information is lost by projecting the image on a subset of the eigenvectors, but losses are minimized by keeping those eigenfaces with the largest eigenvalues. For instance, working with a 100 × 100 image will produce 10,000 eigenvectors. In practical applications, most faces can typically be identified using a projection on between 100 and 150 eigenfaces, so that most of the 10,000 eigenvectors can be discarded. === Matlab example code === Here is an example of calculating eigenfaces with Extended Yale Face Database B. To evade computational and storage bottleneck, the face images are sampled down by a factor 4×4=16. Note that although the covariance matrix S generates many eigenfaces, only a fraction of those are needed to represent the majority of the faces. For example, to represent 95% of the total variation of all face images, only the first 43 eigenfaces are needed. To calculate this result, implement the following code: === Computing the eigenvectors === Performing PCA directly on the covariance matrix of the images is often computationally infeasible. If small images are used, say 100 × 100 pixels, each image is a point in a 10,000-dimensional space and the covariance matrix S is a matrix of 10,000 × 10,000 = 108 elements. However the rank of the covariance matrix is limited by the number of training examples: if there are N training examples, there will be at most N − 1 eigenvectors with non-zero eigenvalues. If the number of training examples is smaller than the dimensionality of the images, the principal components can be computed more easily as follows. Let T be the matrix of preprocessed training examples, where each column contains one mean-subtracted image. The covariance matrix can then be computed as S = TTT and the eigenvector decomposition of S is given by S v i = T T T v i = λ i v i {\displaystyle \mathbf {Sv} _{i}=\mathbf {T} \mathbf {T} ^{T}\mathbf {v} _{i}=\lambda _{i}\mathbf {v} _{i}} However TTT is a large matrix, and if instead we take the eigenvalue decomposition of T T T u i = λ i u i {\displaystyle \mathbf {T} ^{T}\mathbf {T} \mathbf {u} _{i}=\lambda _{i}\mathbf {u} _{i}} then we notice that by pre-multiplying both sides of the equation with T, we obtain T T T T u i = λ i T u i {\displaystyle \mathbf {T} \mathbf {T} ^{T}\mathbf {T} \mathbf {u} _{i}=\lambda _{i}\mathbf {T} \mathbf {u} _{i}} Meaning that, if ui is an eigenvector of TTT, then vi = Tui is an eigenvector of S. If we have

Digital zombie

A digital zombie is a person so engaged with digital technology or social media they are unable to separate themselves from a persistent online presence. Writing in 2017, University of Sydney researcher Andrew Campbell expressed concerns over whether or not the individual can truly live a full and healthy life while they are preoccupied with the digital world. Other individuals have also begun referencing certain types of behaviour with being a digital zombie. Stefanie Valentic, managing editor of EHS Today, refers to it as people hunting digital creatures through their smartphones in public spaces, always fixed on their phones. The University of Warwick has used the term to argue that further research needs to be done with people who exist in digital form after death to help people grieve their loss. == Modern applications == === Distracted walking === The term digital zombie can refer to a person performing distracted walking, which has been labelled dangerous by the American Academy of Orthopaedic Surgeons. They created the "Digital Deadwalkers" campaign after physicians became aware of the risks associated with walking across intersections and sidewalks while paying attention only to smartphones and not one's surroundings. Also stating that the name is derived from the fact that "they're oblivious to everyone else, so it's like they're dead-walking, sleepwalking." === Living through media === The Department of Sociology, University of Warwick has also identified the term, digital zombie, to refer to an individual who has died but is digitally resurrected, reanimated and socially active. These digital zombies do things in death they did not do when they were alive as they "live" again through a digital self on a digital medium. Dead celebrities sometimes become digital zombies when they are reanimated to appear in commercial advertisements (such as Audrey Hepburn and Bob Monkhouse). Other accidental digital zombies include Tupac Shakur and Michael Jackson who were both digitally resurrected and recreated to perform "live" on stage years after their death. Researchers at the University of Warwick have carried out research into the area of human-computer interaction. in an effort to understand the affect these digital zombies have on grief and bereavement. === Mobile gaming === Writer for EHS Today, Stefanie Valentic, has made observations with the mobile phone video game Pokémon Go, which offers players the experience to hunt and collect digital creatures called Pokémon through their smartphone in real world. Players can be observed simultaneously gazing at their phone while also obliviously walking around their environments looking for Pokémon. Stefanie references these individuals as "digital zombies" since they walk around with no cognition of their surroundings while engaged with their phone. == Health risks == === Heavy use of technology === Research by the University of Sydney has begun looking at how new technology such as digital media and smartphones impact our lives and questioning whether they can create new compulsions and obsessions. The research demonstrates that increased heavy technological use can have negative health consequences similar to drugs, smoking, and alcohol. Marcel O'Gorman, an associate professor of English at the University of Waterloo, has commented on the body of research examining how technology impacts cognition, stating currently that there is no empirical evidence to support any theories that suggest that technology can damage memory and attention span. === Heightened risk to children === Manfred Spitzer, a German psychiatrist, has raised concerns with providing digital devices to children. During the early childhood stage while their brains are rapidly growing, increased exposure to digital devices may deprive them of necessary development required to facilitate brain growth. These concerns are also shared by Korean doctors who believe giving digital devices, like smartphones to children, limits their cognitive development.

Web API

A web API is an application programming interface (API) for either a web server or a web browser. As a web development concept, it can be related to a web application's client side (including any web frameworks being used). A server-side web API consists of one or more publicly exposed endpoints to a defined request–response message system, typically expressed in JSON or XML by means of an HTTP-based web server. A server API (SAPI) is not considered a server-side web API, unless it is publicly accessible by a remote web application. == Client side == A client-side web API is a programmatic interface to extend functionality within a web browser or other HTTP client. Originally these were most commonly in the form of native plug-in browser extensions however most newer ones target standardized JavaScript bindings. The Mozilla Foundation created their WebAPI specification which is designed to help replace native mobile applications with HTML5 applications. Google created their Native Client architecture which is designed to help replace insecure native plug-ins with secure native sandboxed extensions and applications. They have also made this portable by employing a modified LLVM AOT compiler. == Server side == A server-side web API consists of one or more publicly exposed endpoints to a defined request–response message system, typically expressed in JSON or XML. The web API is exposed most commonly by means of an HTTP-based web server. Mashups are web applications which combine the use of multiple server-side web APIs. Webhooks are server-side web APIs that take input as a Uniform Resource Identifier (URI) that is designed to be used like a remote named pipe or a type of callback such that the server acts as a client to dereference the provided URI and trigger an event on another server which handles this event thus providing a type of peer-to-peer IPC. === Endpoints === Endpoints are important aspects of interacting with server-side web APIs, as they specify where resources can be accessed by third-party software. Usually the access is via a URI to which HTTP requests are posted, and from which the response is thus expected. Web APIs may be public or private, the latter of which requires an access token. Endpoints need to be static, otherwise the correct functioning of software that interacts with them cannot be guaranteed. If the location of a resource changes (and with it the endpoint) then previously written software will break, as the required resource can no longer be found at the same place. As API providers still want to update their web APIs, many have introduced a versioning system in the URI that points to an endpoint. === Resources versus services === Web 2.0 Web APIs often use machine-based interactions such as REST and SOAP. RESTful web APIs use HTTP methods to access resources via URL-encoded parameters, and use JSON or XML to transmit data. By contrast, SOAP protocols are standardized by the W3C and mandate the use of XML as the payload format, typically over HTTP. Furthermore, SOAP-based Web APIs use XML validation to ensure structural message integrity, by leveraging the XML schemas provisioned with WSDL documents. A WSDL document accurately defines the XML messages and transport bindings of a Web service. === Documentation === Server-side web APIs are interfaces for the outside world to interact with the business logic. For many companies this internal business logic and the intellectual property associated with it are what distinguishes them from other companies, and potentially what gives them a competitive edge. They do not want this information to be exposed. However, in order to provide a web API of high quality, there needs to be a sufficient level of documentation. One API provider that not only provides documentation, but also links to it in its error messages is Twilio. However, there are now directories of popular documented server-side web APIs. === Growth and impact === The number of available web APIs has grown consistently over the past years, as businesses realize the growth opportunities associated with running an open platform, that any developer can interact with. ProgrammableWeb tracks over 24000 Web APIs that were available in 2022, up from 105 in 2005. Web APIs have become ubiquitous. There are few major software applications/services that do not offer some form of web API. One of the most common forms of interacting with these web APIs is via embedding external resources, such as tweets, Facebook comments, YouTube videos, etc. In fact there are very successful companies, such as Disqus, whose main service is to provide embeddable tools, such as a feature-rich comment system. Any website of the TOP 100 Alexa Internet ranked websites uses APIs and/or provides its own APIs, which is a very distinct indicator for the prodigious scale and impact of web APIs as a whole. As the number of available web APIs has grown, open source tools have been developed to provide more sophisticated search and discovery. APIs.json provides a machine-readable description of an API and its operations, and the related project APIs.io offers a searchable public listing of APIs based on the APIs.json metadata format. === Business === ==== Commercial ==== Many companies and organizations rely heavily on their Web API infrastructure to serve their core business clients. In 2014 Netflix received around 5 billion API requests, most of them within their private API. ==== Governmental ==== Many governments collect a lot of data, and some governments are now opening up access to this data. The interfaces through which this data is typically made accessible are web APIs. Web APIs allow for data, such as "budget, public works, crime, legal, and other agency data" to be accessed by any developer in a convenient manner. == Example == An example of a popular web API is the Astronomy Picture of the Day API operated by the American space agency NASA. It is a server-side API used to retrieve photographs of space or other images of interest to astronomers, and metadata about the images. According to the API documentation, the API has one endpoint: https://api.nasa.gov/planetary/apod The documentation states that this endpoint accepts GET requests. It requires one piece of information from the user, an API key, and accepts several other optional pieces of information. Such pieces of information are known as parameters. The parameters for this API are written in a format known as a query string, which is separated by a question mark character (?) from the endpoint. An ampersand (&) separates the parameters in the query string from each other. Together, the endpoint and the query string form a URL that determines how the API will respond. This URL is also known as a query or an API call. In the below example, two parameters are transmitted (or passed) to the API via the query string. The first is the required API key and the second is an optional parameter — the date of the photograph requested. https://api.nasa.gov/planetary/apod?api_key=DEMO_KEY&date=1996-12-03 Visiting the above URL in a web browser will initiate a GET request, calling the API and showing the user a result, known as a return value or as a return. This API returns JSON, a type of data format intended to be understood by computers, but which is somewhat easy for a human to read as well. In this case, the JSON contains information about a photograph of a white dwarf star: The above API return has been reformatted so that names of JSON data items, known as keys, appear at the start of each line. The last of these keys, named url, indicates a URL which points to a photograph: https://apod.nasa.gov/apod/image/9612/ngc2440_hst2.jpg Following the above URL, a web browser user would see this photo: Although this API can be called by an end user with a web browser (as in this example) it is intended to be called automatically by software or by computer programmers while writing software. JSON is intended to be parsed by a computer program, which would extract the URL of the photograph and the other metadata. The resulting photo could be embedded in a website, automatically sent via text message, or used for any other purpose envisioned by a software developer.

Coupling (electronics)

In electronics, electric power and telecommunication, coupling is the transfer of electrical energy from one circuit to another, or between parts of a circuit. Coupling can be deliberate as part of the function of the circuit, or it may be undesirable, for instance due to coupling to stray fields. For example, energy is transferred from a power source to an electrical load by means of conductive coupling, which may be either resistive or direct coupling. An AC potential may be transferred from one circuit segment to another having a DC potential by use of a capacitor. Electrical energy may be transferred from one circuit segment to another segment with different impedance by use of a transformer; this is known as impedance matching. These are examples of electrostatic and electrodynamic inductive coupling. == Types == Electrical conduction: Direct coupling, also called conductive coupling and galvanic coupling Resistive conduction Atmospheric plasma channel coupling Electromagnetic induction: Electrodynamic induction — commonly called inductive coupling, also magnetic coupling Capacitive coupling Evanescent wave coupling Electromagnetic radiation: Radio waves — Wireless telecommunications. Electromagnetic interference (EMI) — Sometimes called radio frequency interference (RFI), is unwanted coupling. Electromagnetic compatibility (EMC) requires techniques to avoid such unwanted coupling, such as electromagnetic shielding. Microwave power transmission Other kinds of energy coupling: Acoustic coupler

Physicalization

Physicalization of computer hardware (the opposite of virtualization), is a way to place multiple physical machines in a rack unit. It can be a way to reduce hardware costs, since in some cases, server processors cost more per core than energy efficient laptop processors, which may make up for added cost of board level integration. While Moore's law makes increasing integration less expensive, some jobs require much I/O bandwidth, which may be less expensive to provide using many less-integrated processors. Applications and services that are I/O bound are likely to benefit from such physicalized environments. This ensures that each operating system instance is running on a processor that has its own network interface card, host bus and I/O sub-system unlike in the case of a multi-core servers where a single I/O sub-system is shared between all the cores / VMs.

Single address space operating system

In computer science, a single address space operating system (or SASOS) is an operating system that provides only one globally shared address space for all processes. In a single address space operating system, numerically identical (virtual memory) logical addresses in different processes all refer to exactly the same byte of data. In a traditional OS with private per-process address space, memory protection is based on address space boundaries ("address space isolation"). Single address-space operating systems make translation and protection orthogonal, which in no way weakens protection. The core advantage is that pointers (i.e. memory references) have global validity, meaning their meaning is independent of the process using it. This allows sharing pointer-connected data structures across processes, and making them persistent, i.e. storing them on backup store. Some processor architectures have direct support for protection independent of translation. On such architectures, a SASOS may be able to perform context switches faster than a traditional OS. Such architectures include Itanium, and Version 5 of the Arm architecture, as well as capability architectures such as CHERI. A SASOS should not be confused with a flat memory model, which provides no address translation and generally no memory protection. In contrast, a SASOS makes protection orthogonal to translation: it may be possible to name a data item (i.e. know its virtual address) while not being able to access it. SASOS projects using hardware-based protection include the following: Angel IBM i (formerly called OS/400) Iguana at NICTA, Australia Mungi at NICTA, Australia Nemesis Opal Scout Sombrero Related are OSes that provide protection through language-level type safety: Br1X Genera JX a research Java OS Phantom OS Singularity Theseus OS Torsion