The r-to-1 collision problem is an important theoretical problem in complexity theory, quantum computing, and computational mathematics. The collision problem most often refers to the 2-to-1 version: given n {\displaystyle n} even and a function f : { 1 , … , n } → { 1 , … , n } {\displaystyle f:\,\{1,\ldots ,n\}\rightarrow \{1,\ldots ,n\}} , we are promised that f is either 1-to-1 or 2-to-1. We are only allowed to make queries about the value of f ( i ) {\displaystyle f(i)} for any i ∈ { 1 , … , n } {\displaystyle i\in \{1,\ldots ,n\}} . The problem then asks how many such queries we need to make to determine with certainty whether f is 1-to-1 or 2-to-1. == Classical solutions == === Deterministic === Solving the 2-to-1 version deterministically requires n 2 + 1 {\textstyle {\frac {n}{2}}+1} queries, and in general distinguishing r-to-1 functions from 1-to-1 functions requires n r + 1 {\textstyle {\frac {n}{r}}+1} queries. This is a straightforward application of the pigeonhole principle: if a function is r-to-1, then after n r + 1 {\textstyle {\frac {n}{r}}+1} queries we are guaranteed to have found a collision. If a function is 1-to-1, then no collision exists. Thus, n r + 1 {\textstyle {\frac {n}{r}}+1} queries suffice. If we are unlucky, then the first n / r {\displaystyle n/r} queries could return distinct answers, so n r + 1 {\textstyle {\frac {n}{r}}+1} queries is also necessary. === Randomized === If we allow randomness, the problem is easier. By the birthday paradox, if we choose (distinct) queries at random, then with high probability we find a collision in any fixed 2-to-1 function after Θ ( n ) {\displaystyle \Theta ({\sqrt {n}})} queries. == Quantum solution == The BHT algorithm, which uses Grover's algorithm, solves this problem optimally by only making O ( n 1 / 3 ) {\displaystyle O(n^{1/3})} queries to f. The matching lower bound of Ω ( n 1 / 3 ) {\displaystyle \Omega (n^{1/3})} was proved by Aaronson and Shi using the polynomial method.
Teleradiology
Teleradiology is the transmission of radiological patient images from procedures such as x-rays, Computed tomography (CT), and MRI imaging, from one location to another for the purposes of sharing studies with other radiologists and physicians. Teleradiology allows radiologists to provide services without actually having to be at the location of the patient. This is particularly important when a sub-specialist such as an MRI radiologist, neuroradiologist, pediatric radiologist, or musculoskeletal radiologist is needed, since these professionals are generally only located in large metropolitan areas working during daytime hours. Teleradiology allows for specialists to be available at all times. Teleradiology utilizes standard network technologies such as the Internet, telephone lines, wide area networks, local area networks (LAN) and the latest advanced technologies such as medical cloud computing. Specialized software is used to transmit the images and enable the radiologist to effectively analyze potentially hundreds of images of a given study. Technologies such as advanced graphics processing, voice recognition, artificial intelligence, and image compression are often used in teleradiology. Through teleradiology and mobile DICOM viewers, images can be sent to another part of the hospital or to other locations around the world with equal effort. Teleradiology is a growth technology given that imaging procedures are growing approximately 15% annually against an increase of only 2% in the radiologist population. == Reports == Teleradiology services commonly provide either preliminary or final interpretations of medical imaging studies. Preliminary reads are frequently used in emergency settings to support immediate clinical decisions and may include direct communication of critical findings to the referring physician. Some providers report turnaround times of approximately 30 minutes for emergency cases, with faster processing for time-sensitive conditions such as stroke. Final reads are definitive and used in official patient records and billing. These reports typically include all relevant findings and may require access to prior imaging and clinical data. Teleradiology is also employed to provide off-hour or overflow coverage for healthcare institutions lacking continuous on-site radiology staffing. == Subspecialties == Some teleradiologists are fellowship trained and have a wide variety of subspecialty expertise including such difficult-to-find areas as neuroradiology, pediatric neuroradiology, thoracic imaging, musculoskeletal radiology, mammography, and nuclear cardiology. There are also various medical practitioners who are not radiologists that take on studies in radiology to become sub specialists in their respected fields, an example of this is dentistry where oral and maxillofacial radiology allows those in dentistry to specialize in the acquisition and interpretation of radiographic imaging studies performed for diagnosis of treatment guidance for conditions affecting the maxillofacial region. == Teleultrasound == Teleradiology infrastructure has also been adapted to support point-of-care ultrasound (POCUS) in remote and austere environments. In teleultrasound—also known as telementored ultrasound—a remote expert guides a non-specialist in real time during image acquisition. This technique has been successfully demonstrated in extreme settings, including aboard the International Space Station, on Mount Everest, and during helicopter flight. == Regulations == In the United States, Medicare and Medicaid laws require the teleradiologist to be on U.S. soil in order to qualify for reimbursement of the Final Read. In addition, advanced teleradiology systems must also be HIPAA compliant, which helps to ensure patients' privacy. HIPAA (Health Insurance Portability and Accountability Act of 1996) is a uniform, federal floor of privacy protections for consumers. It limits the ways that entities can use patients' personal information and protects the privacy of all medical information no matter what form it is in. Quality teleradiology must abide by important HIPAA rules to ensure patients' privacy is protected. Also State laws governing the licensing requirements and medical malpractice insurance coverage required for physicians vary from state to state. Ensuring compliance with these laws is a significant overhead expense for larger multi-state teleradiology groups. Medicare (Australia) has identical requirements to that of the United States, where the guidelines are provided by the Department of Health and Ageing, and government based payments fall under the Health Insurance Act. The regulations in Australia are also conducted at both federal and state levels, ensuring that strict guidelines are adhered to at all times, with regular yearly updates and amendments are introduced (usually around March and November of every year), ensuring that the legislation is kept up to date with changes in the industry. One of the most recent changes to Medicare and radiology / teleradiology in Australia was the introduction of the Diagnostic Imaging Accreditation Scheme (DIAS) on 1 July 2008. DIAS was introduced to further improve the quality of Diagnostic Imaging and to amend the Health Insurance Act. == Industry growth == Until the late 1990s teleradiology was primarily used by individual radiologists to interpret occasional emergency studies from offsite locations, often in the radiologists home. The connections were made through standard analog phone lines. Teleradiology expanded rapidly as the growth of the internet and broad band combined with new CT scanner technology to become an essential tool in trauma cases in emergency rooms throughout the country. The occasional 2–3 x-ray studies a week soon became 3–10 CT scans, or more, a night. Because ER physicians are not trained to read CT scans or MRIs, radiologists went from working 8–10 hours a day, five and half days a week to a schedule of 24 hours a day, 7 days a week coverage. This became a particularly acute challenge in smaller rural facilities that only had one solo radiologist with no other to share call. These circumstances spawned a post-dot.com boom of firms and groups that provided medical outsourcing, off-site teleradiology on-call services to hospitals and Radiology Groups around the country. As an example, a teleradiology firm might cover trauma at a hospital in Indiana with doctors based in Texas. Some firms even used overseas doctors in locations like Australia and India. Nighthawk, founded by Paul Berger, was the first to station U.S. licensed radiologists overseas (initially Australia and later Switzerland) to maximize the time zone difference to provide nightcall in U.S. hospitals. Currently, teleradiology firms are facing pricing pressures. Industry consolidation is likely as there are more than 500 of these firms, large and small, throughout the United States.
Quantum neural network
Quantum neural networks are computational neural network models which are based on the principles of quantum mechanics. The first ideas on quantum neural computation were published independently in 1995 by Subhash Kak and Ron Chrisley, engaging with the theory of quantum mind, which posits that quantum effects play a role in cognitive function. However, typical research in quantum neural networks involves combining classical artificial neural network models (which are widely used in machine learning for the important task of pattern recognition) with the advantages of quantum information in order to develop more efficient algorithms. One important motivation for these investigations is the difficulty to train classical neural networks, especially in big data applications. The hope is that features of quantum computing such as quantum parallelism or the effects of interference and entanglement can be used as resources. Since the technological implementation of a quantum computer is still in a premature stage, such quantum neural network models are mostly theoretical proposals that await their full implementation in physical experiments. Most Quantum neural networks are developed as feed-forward networks. Similar to their classical counterparts, this structure intakes input from one layer of qubits, and passes that input onto another layer of qubits. This layer of qubits evaluates this information and passes on the output to the next layer. Eventually the path leads to the final layer of qubits. The layers do not have to be of the same width, meaning they don't have to have the same number of qubits as the layer before or after it. This structure is trained on which path to take similar to classical artificial neural networks. This is discussed in a lower section. Quantum neural networks refer to three different categories: Quantum computer with classical data, classical computer with quantum data, and quantum computer with quantum data. == Examples == Quantum neural network research is still in its infancy, and a conglomeration of proposals and ideas of varying scope and mathematical rigor have been put forward. Most of them are based on the idea of replacing classical binary or McCulloch-Pitts neurons with a qubit (which can be called a "quron"), resulting in neural units that can be in a superposition of the state 'firing' and 'resting'. === Quantum perceptrons === A lot of proposals attempt to find a quantum equivalent for the perceptron unit from which neural nets are constructed. A problem is that nonlinear activation functions do not immediately correspond to the mathematical structure of quantum theory, since a quantum evolution is described by linear operations and leads to probabilistic observation. Ideas to imitate the perceptron activation function with a quantum mechanical formalism reach from special measurements to postulating non-linear quantum operators (a mathematical framework that is disputed). A direct implementation of the activation function using the circuit-based model of quantum computation has recently been proposed by Schuld, Sinayskiy and Petruccione based on the quantum phase estimation algorithm. === Quantum networks === At a larger scale, researchers have attempted to generalize neural networks to the quantum setting. One way of constructing a quantum neuron is to first generalise classical neurons and then generalising them further to make unitary gates. Interactions between neurons can be controlled quantumly, with unitary gates, or classically, via measurement of the network states. This high-level theoretical technique can be applied broadly, by taking different types of networks and different implementations of quantum neurons, such as photonically implemented neurons and quantum reservoir processor (quantum version of reservoir computing). Most learning algorithms follow the classical model of training an artificial neural network to learn the input-output function of a given training set and use classical feedback loops to update parameters of the quantum system until they converge to an optimal configuration. Learning as a parameter optimisation problem has also been approached by adiabatic models of quantum computing. Quantum neural networks can be applied to algorithmic design: given qubits with tunable mutual interactions, one can attempt to learn interactions following the classical backpropagation rule from a training set of desired input-output relations, taken to be the desired output algorithm's behavior. The quantum network thus 'learns' an algorithm. === Quantum associative memory === The first quantum associative memory algorithm was introduced by Dan Ventura and Tony Martinez in 1999. The authors do not attempt to translate the structure of artificial neural network models into quantum theory, but propose an algorithm for a circuit-based quantum computer that simulates associative memory. The memory states (in Hopfield neural networks saved in the weights of the neural connections) are written into a superposition, and a Grover-like quantum search algorithm retrieves the memory state closest to a given input. As such, this is not a fully content-addressable memory, since only incomplete patterns can be retrieved. The first truly content-addressable quantum memory, which can retrieve patterns also from corrupted inputs, was proposed by Carlo A. Trugenberger. Both memories can store an exponential (in terms of n qubits) number of patterns but can be used only once due to the no-cloning theorem and their destruction upon measurement. Trugenberger, however, has shown that his probabilistic model of quantum associative memory can be efficiently implemented and re-used multiples times for any polynomial number of stored patterns, a large advantage with respect to classical associative memories. === Classical neural networks inspired by quantum theory === A substantial amount of interest has been given to a "quantum-inspired" model that uses ideas from quantum theory to implement a neural network based on fuzzy logic. == Training == Quantum Neural Networks can be theoretically trained similarly to training classical/artificial neural networks. A key difference lies in communication between the layers of a neural networks. For classical neural networks, at the end of a given operation, the current perceptron copies its output to the next layer of perceptron(s) in the network. However, in a quantum neural network, where each perceptron is a qubit, this would violate the no-cloning theorem. A proposed generalized solution to this is to replace the classical fan-out method with an arbitrary unitary that spreads out, but does not copy, the output of one qubit to the next layer of qubits. Using this fan-out Unitary ( U f {\displaystyle U_{f}} ) with a dummy state qubit in a known state (Ex. | 0 ⟩ {\displaystyle |0\rangle } in the computational basis), also known as an Ancilla bit, the information from the qubit can be transferred to the next layer of qubits. This process adheres to the quantum operation requirement of reversibility. Using this quantum feed-forward network, deep neural networks can be executed and trained efficiently. A deep neural network is essentially a network with many hidden-layers, as seen in the sample model neural network above. Since the Quantum neural network being discussed uses fan-out Unitary operators, and each operator only acts on its respective input, only two layers are used at any given time. In other words, no Unitary operator is acting on the entire network at any given time, meaning the number of qubits required for a given step depends on the number of inputs in a given layer. Since Quantum Computers are notorious for their ability to run multiple iterations in a short period of time, the efficiency of a quantum neural network is solely dependent on the number of qubits in any given layer, and not on the depth of the network. === Cost functions === To determine the effectiveness of a neural network, a cost function is used, which essentially measures the proximity of the network's output to the expected or desired output. In a Classical Neural Network, the weights ( w {\displaystyle w} ) and biases ( b {\displaystyle b} ) at each step determine the outcome of the cost function C ( w , b ) {\displaystyle C(w,b)} . When training a Classical Neural network, the weights and biases are adjusted after each iteration, and given equation 1 below, where y ( x ) {\displaystyle y(x)} is the desired output and a out ( x ) {\displaystyle a^{\text{out}}(x)} is the actual output, the cost function is optimized when C ( w , b ) {\displaystyle C(w,b)} = 0. For a quantum neural network, the cost function is determined by measuring the fidelity of the outcome state ( ρ out {\displaystyle \rho ^{\text{out}}} ) with the desired outcome state ( ϕ out {\displaystyle \phi ^{\text{out}}} ), seen in Equation 2 below. In this case, the Unitary operators are adjusted after each it
Large margin nearest neighbor
Large margin nearest neighbor (LMNN) classification is a statistical machine learning algorithm for metric learning. It learns a pseudometric designed for k-nearest neighbor classification. The algorithm is based on semidefinite programming, a sub-class of convex optimization. The goal of supervised learning (more specifically classification) is to learn a decision rule that can categorize data instances into pre-defined classes. The k-nearest neighbor rule assumes a training data set of labeled instances (i.e. the classes are known). It classifies a new data instance with the class obtained from the majority vote of the k closest (labeled) training instances. Closeness is measured with a pre-defined metric. Large margin nearest neighbors is an algorithm that learns this global (pseudo-)metric in a supervised fashion to improve the classification accuracy of the k-nearest neighbor rule. == Setup == The main intuition behind LMNN is to learn a pseudometric under which all data instances in the training set are surrounded by at least k instances that share the same class label. If this is achieved, the leave-one-out error (a special case of cross validation) is minimized. Let the training data consist of a data set D = { ( x → 1 , y 1 ) , … , ( x → n , y n ) } ⊂ R d × C {\displaystyle D=\{({\vec {x}}_{1},y_{1}),\dots ,({\vec {x}}_{n},y_{n})\}\subset R^{d}\times C} , where the set of possible class categories is C = { 1 , … , c } {\displaystyle C=\{1,\dots ,c\}} . The algorithm learns a pseudometric of the type d ( x → i , x → j ) = ( x → i − x → j ) ⊤ M ( x → i − x → j ) {\displaystyle d({\vec {x}}_{i},{\vec {x}}_{j})=({\vec {x}}_{i}-{\vec {x}}_{j})^{\top }\mathbf {M} ({\vec {x}}_{i}-{\vec {x}}_{j})} . For d ( ⋅ , ⋅ ) {\displaystyle d(\cdot ,\cdot )} to be well defined, the matrix M {\displaystyle \mathbf {M} } needs to be positive semi-definite. The Euclidean metric is a special case, where M {\displaystyle \mathbf {M} } is the identity matrix. This generalization is often (falsely) referred to as Mahalanobis metric. Figure 1 illustrates the effect of the metric under varying M {\displaystyle \mathbf {M} } . The two circles show the set of points with equal distance to the center x → i {\displaystyle {\vec {x}}_{i}} . In the Euclidean case this set is a circle, whereas under the modified (Mahalanobis) metric it becomes an ellipsoid. The algorithm distinguishes between two types of special data points: target neighbors and impostors. === Target neighbors === Target neighbors are selected before learning. Each instance x → i {\displaystyle {\vec {x}}_{i}} has exactly k {\displaystyle k} different target neighbors within D {\displaystyle D} , which all share the same class label y i {\displaystyle y_{i}} . The target neighbors are the data points that should become nearest neighbors under the learned metric. Let us denote the set of target neighbors for a data point x → i {\displaystyle {\vec {x}}_{i}} as N i {\displaystyle N_{i}} . === Impostors === An impostor of a data point x → i {\displaystyle {\vec {x}}_{i}} is another data point x → j {\displaystyle {\vec {x}}_{j}} with a different class label (i.e. y i ≠ y j {\displaystyle y_{i}\neq y_{j}} ) which is one of the nearest neighbors of x → i {\displaystyle {\vec {x}}_{i}} . During learning the algorithm tries to minimize the number of impostors for all data instances in the training set. == Algorithm == Large margin nearest neighbors optimizes the matrix M {\displaystyle \mathbf {M} } with the help of semidefinite programming. The objective is twofold: For every data point x → i {\displaystyle {\vec {x}}_{i}} , the target neighbors should be close and the impostors should be far away. Figure 1 shows the effect of such an optimization on an illustrative example. The learned metric causes the input vector x → i {\displaystyle {\vec {x}}_{i}} to be surrounded by training instances of the same class. If it was a test point, it would be classified correctly under the k = 3 {\displaystyle k=3} nearest neighbor rule. The first optimization goal is achieved by minimizing the average distance between instances and their target neighbors ∑ i , j ∈ N i d ( x → i , x → j ) {\displaystyle \sum _{i,j\in N_{i}}d({\vec {x}}_{i},{\vec {x}}_{j})} . The second goal is achieved by penalizing distances to impostors x → l {\displaystyle {\vec {x}}_{l}} that are less than one unit further away than target neighbors x → j {\displaystyle {\vec {x}}_{j}} (and therefore pushing them out of the local neighborhood of x → i {\displaystyle {\vec {x}}_{i}} ). The resulting value to be minimized can be stated as: ∑ i , j ∈ N i , l , y l ≠ y i [ d ( x → i , x → j ) + 1 − d ( x → i , x → l ) ] + {\displaystyle \sum _{i,j\in N_{i},l,y_{l}\neq y_{i}}[d({\vec {x}}_{i},{\vec {x}}_{j})+1-d({\vec {x}}_{i},{\vec {x}}_{l})]_{+}} With a hinge loss function [ ⋅ ] + = max ( ⋅ , 0 ) {\textstyle [\cdot ]_{+}=\max(\cdot ,0)} , which ensures that impostor proximity is not penalized when outside the margin. The margin of exactly one unit fixes the scale of the matrix M {\displaystyle M} . Any alternative choice c > 0 {\displaystyle c>0} would result in a rescaling of M {\displaystyle M} by a factor of 1 / c {\displaystyle 1/c} . The final optimization problem becomes: min M ∑ i , j ∈ N i d ( x → i , x → j ) + λ ∑ i , j , l ξ i j l {\displaystyle \min _{\mathbf {M} }\sum _{i,j\in N_{i}}d({\vec {x}}_{i},{\vec {x}}_{j})+\lambda \sum _{i,j,l}\xi _{ijl}} ∀ i , j ∈ N i , l , y l ≠ y i {\displaystyle \forall _{i,j\in N_{i},l,y_{l}\neq y_{i}}} d ( x → i , x → j ) + 1 − d ( x → i , x → l ) ≤ ξ i j l {\displaystyle d({\vec {x}}_{i},{\vec {x}}_{j})+1-d({\vec {x}}_{i},{\vec {x}}_{l})\leq \xi _{ijl}} ξ i j l ≥ 0 {\displaystyle \xi _{ijl}\geq 0} M ⪰ 0 {\displaystyle \mathbf {M} \succeq 0} The hyperparameter λ > 0 {\textstyle \lambda >0} is some positive constant (typically set through cross-validation). Here the variables ξ i j l {\displaystyle \xi _{ijl}} (together with two types of constraints) replace the term in the cost function. They play a role similar to slack variables to absorb the extent of violations of the impostor constraints. The last constraint ensures that M {\displaystyle \mathbf {M} } is positive semi-definite. The optimization problem is an instance of semidefinite programming (SDP). Although SDPs tend to suffer from high computational complexity, this particular SDP instance can be solved very efficiently due to the underlying geometric properties of the problem. In particular, most impostor constraints are naturally satisfied and do not need to be enforced during runtime (i.e. the set of variables ξ i j l {\displaystyle \xi _{ijl}} is sparse). A particularly well suited solver technique is the working set method, which keeps a small set of constraints that are actively enforced and monitors the remaining (likely satisfied) constraints only occasionally to ensure correctness. == Extensions and efficient solvers == LMNN was extended to multiple local metrics in the 2008 paper. This extension significantly improves the classification error, but involves a more expensive optimization problem. In their 2009 publication in the Journal of Machine Learning Research, Weinberger and Saul derive an efficient solver for the semi-definite program. It can learn a metric for the MNIST handwritten digit data set in several hours, involving billions of pairwise constraints. An open source Matlab implementation is freely available at the authors web page. Kumal et al. extended the algorithm to incorporate local invariances to multivariate polynomial transformations and improved regularization.
Causal Markov condition
The Causal Markov (CM) condition states that, conditional on the set of all its direct causes, a node is independent of all variables which are not effects or direct causes of that node. In the event that the structure of a Bayesian network accurately depicts causality, the two conditions are equivalent. This is related to the Markov condition, an assumption made in Bayesian probability theory, that every node in a Bayesian network is conditionally independent of its nondescendants, given its parents. Stated loosely, it is assumed that a node has no bearing on nodes which do not descend from it. In a DAG, this local Markov condition is equivalent to the global Markov condition, which states that d-separations in the graph also correspond to conditional independence relations. This also means that a node is conditionally independent of the entire network, given its Markov blanket. A network may accurately embody the Markov condition without depicting causality, in which case it should not be assumed to embody the causal Markov condition. == Motivation == Statisticians are enormously interested in the ways in which certain events and variables are connected. The precise notion of what constitutes a cause and effect is necessary to understand the connections between them. The central idea behind the philosophical study of probabilistic causation is that causes raise the probabilities of their effects, all else being equal. A deterministic interpretation of causation means that if A causes B, then A must always be followed by B. In this sense, smoking does not cause cancer because some smokers never develop cancer. On the other hand, a probabilistic interpretation simply means that causes raise the probability of their effects. In this sense, changes in meteorological readings associated with a storm do cause that storm, since they raise its probability. (However, simply looking at a barometer does not change the probability of the storm, for a more detailed analysis, see:). == Examples == In a simple view, releasing one's hand from a hammer causes the hammer to fall. However, doing so in outer space does not produce the same outcome, calling into question if releasing one's fingers from a hammer always causes it to fall. A causal graph could be created to acknowledge that both the presence of gravity and the release of the hammer contribute to its falling. However, it would be very surprising if the surface underneath the hammer affected its falling. This essentially states the Causal Markov Condition, that given the existence of gravity the release of the hammer, it will fall regardless of what is beneath it. == Implications == === Dependence and Causation === It follows from the definition that if X and Y are in V and are probabilistically dependent, then either X causes Y, Y causes X, or X and Y are both effects of some common cause Z in V. This definition was seminally introduced by Hans Reichenbach as the Common Cause Principle (CCP). === Screening === It once again follows from the definition that the parents of X screen X from other "indirect causes" of X (parents of Parents(X)) and other effects of Parents(X) which are not also effects of X.
Document-oriented database
A document-oriented database, or document store, is a computer program and data storage system designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Document-oriented databases are one of the main categories of NoSQL databases, and the popularity of the term "document-oriented database" has grown alongside the adoption of NoSQL itself. XML databases are a subclass of document-oriented databases optimized for XML documents. Graph databases are similar, but add another layer, the relationship, which allows them to link documents for rapid traversal. Document-oriented databases are conceptually an extension of the key–value store, another type of NoSQL database. In key-value stores, data is treated as opaque by the database, whereas document-oriented systems exploit the internal structure of documents to extract metadata and optimize storage and queries. Although in practice the distinction can be minimal due to modern tooling, document stores are designed to provide a richer programming experience with modern programming techniques. Document databases differ significantly from traditional relational databases (RDBs). Relational databases store data in predefined tables, often requiring an object to be split across multiple tables. In contrast, document databases store all information for a given object in a single document, with each document potentially having a unique structure. This design eliminates the need for object-relational mapping when loading data into the database. == Documents == The central concept of a document-oriented database is the notion of a document. Although implementations vary in their specific definitions, document-oriented databases generally treat documents as self-contained units that encapsulate and encode data in a standardized format. Common encoding formats include XML, YAML, JSON, as well as binary representations such as BSON. Documents in a document store are equivalent to the programming concept of an object. They are not required to adhere to a fixed schema, and documents within the same collection may contain different fields or structures. Fields may be optional, and documents of the same logical type may differ in composition. For example, the following illustrates a document encoded in JSON: A second document might be encoded in XML as: The two example documents share some structural elements but also contain unique fields. The structure, text, and other data within each document are collectively referred to as the document's content and can be accessed or modified using retrieval or editing operations. Unlike relational databases, in which each record contains the same fields and unused fields are left empty, document-oriented databases do not require uniform fields across documents. This design allows new information to be added to some documents without affecting the structure of others. Document databases often support the storage of additional metadata alongside the document content. Such metadata may relate to organizational features, security, indexing, or other implementation-specific features. === CRUD operations === The core operations supported by a document-oriented database for manipulating documents are similar to those in other databases. Although terminology is not perfectly standardized, these operations are generally recognized as Create, Read, Update, and Delete (CRUD). Creation (C): Adds a new document to the database. Retrieval (R): Retrieves documents or fields based on queries. Update (U): Modifies the contents of existing documents. Deletion (D): Removes documents from the database. === Keys === Documents in a document-oriented database are addressed via a unique identifier. This identifier, often a string, URI, or path, can be used to retrieve the document from the database. Most document stores maintain an index on the key to optimize retrieval, and in some implementations the key is required when creating or inserting a new document. === Retrieval === In addition to key-based access, document-oriented databases typically provide an API or query language that enables retrieval based on document content or associated metadata. For example, a query may return all documents with a specific field matching a given value. The available query features, indexing options, and performance characteristics vary across implementations. Document stores differ from key-value stores in that they exploit the internal structure and metadata of stored documents. In many key-value stores, values are treated as opaque or "black-box" data, meaning the database system does not interpret their internal structure. By contrast, document-oriented databases can classify and interpret document content. This enables queries that distinguish between types of data––for example, retrieving all phone numbers containing "555" without also matching a postal code such as "55555." === Editing === Document databases typically provide mechanisms for updating or editing the content or metadata of a document. Updates may involve replacing the entire document or modifying individual elements or fields within the document. === Organization === Document database implementations support a variety of methods for organizing documents, including: Collections: Groups of documents. Depending on the implementation, a document may be required to belong to a single collection or may be allowed in multiple collections. Tags and non-visible metadata: Additional data stored outside the main document content. Directory hierarchies: Documents organized in a tree-like structure, often based on path or URI. These organizational structures may differ between logical and physical representations (e.g. on disk or in memory). == Relationship to other databases == === Relationship to key-value stores === A document-oriented database can be viewed as a specialized form of key-value store, which is itself a category of NoSQL database. In a basic key-value store, the stored value is typically treated as opaque by the database system. By contrast, a document-oriented database provides APIs or a query and update language that allows queries and modifications based on the internal structure of the document. For users who do not require advanced query, retrieval, or update capabilities, the distinction between document-oriented databases and key-value stores may be minimal. === Relationship to search engines === Some search engine and information retrieval systems, such as Apache Solr and Elasticsearch, provide document storage and support core document operations. As a result, they may meet certain functional definitions of a document-oriented database, although their primary design goals differ. === Relationship to relational databases === In a relational database, data is organized into predefined types represented as tables. Each table contains rows (records) with a fixed set of columns (fields), so all records in a table share the same structure. Administrators typically define indexes on selected fields to improve query performance. A central principle of relational database design is database normalization, in which data that might otherwise be repeated is stored in separate tables and linked using keys. When records in different tables are related, a foreign key is used to associate them. For example, an address book application may store a contact's name, image, phone numbers, mailing addresses, and email addresses. In a normalized relational design, separate tables might be created for contacts, phone numbers, and email addresses. The phone number table would include a foreign key referencing the associated contact. To reconstruct a complete contact record, the database retrieves related information from each table using the foreign keys and combines it into a single record. In contrast, a document-oriented database stores all data related to an object within a single document, and stored in the database as a single entry. In the address book example,the contact's name, image, and contact information may be stored together in one document. The document is retrieved using a unique key, and all related information is returned together, without needing to look up multiple tables. A key difference between the document-oriented and relational models is that the data formats are not predefined in the document case. In most cases, any sort of document can be stored in a database, and documents can change in type and form over time. For example, a new field such as COUNTRY_FLAG can be added to new documents as they are inserted without affecting existing documents. To aid retrieval, document-oriented systems generally allow the administrator to provide hints to the database for locating certain types of information. These hints work in a similar fashion to indexes in relational databases. Many systems also allow additional metadata outside the content of the document itself
Universal approximation theorem
In the field of machine learning, the universal approximation theorems (UATs) state that neural networks with a certain structure can, in principle, approximate any continuous function to any desired degree of accuracy. These theorems provide a mathematical justification for using neural networks, assuring researchers that a sufficiently large or deep network can model the complex, non-linear relationships often found in real-world data. The best-known version of the theorem applies to feedforward networks with a single hidden layer. It states that if the layer's activation function is non-polynomial (which is true for common choices like the sigmoid function or ReLU), then the network can act as a "universal approximator." Universality is achieved by increasing the number of neurons in the hidden layer, making the network "wider." Other versions of the theorem show that universality can also be achieved by keeping the network's width fixed but increasing its number of layers, making it "deeper." These are existence theorems. They guarantee that a network with the right structure exists, but they do not provide a method for finding the network's parameters (training it), nor do they specify exactly how large the network must be for a given function. Finding a suitable network remains a practical challenge that is typically addressed with optimization algorithms like backpropagation. == Setup == Artificial neural networks are combinations of multiple simple mathematical functions that implement more complicated functions from (typically) real-valued vectors to real-valued vectors. The spaces of multivariate functions that can be implemented by a network are determined by the structure of the network, the set of simple functions, and its multiplicative parameters. A great deal of theoretical work has gone into characterizing these function spaces. Most universal approximation theorems are in one of two classes. The first quantifies the approximation capabilities of neural networks with an arbitrary number of artificial neurons ("arbitrary width" case) and the second focuses on the case with an arbitrary number of hidden layers, each containing a limited number of artificial neurons ("arbitrary depth" case). In addition to these two classes, there are also universal approximation theorems for neural networks with bounded number of hidden layers and a limited number of neurons in each layer ("bounded depth and bounded width" case). == History == === Arbitrary width === The first results concerned the arbitrary width case. Ken-ichi Funahashi (May 1989) showed that Rumelhart–Hinton–Williams type backpropagation networks possess universal approximation capability with a class of sigmoidal activation functions, extending the result to multi-output mappings as well. Kurt Hornik, Maxwell Stinchcombe, and Halbert White (July 1989) showed that multilayer feed-forward networks with as few as one hidden layer are universal approximators, provided that the activation function satisfies certain conditions. George Cybenko (December 1989) independently established a related result for sigmoid activation functions using functional-analytic methods. Hornik also showed in 1991 that it is not the specific choice of the activation function but rather the multilayer feed-forward architecture itself that gives neural networks the potential of being universal approximators. Moshe Leshno et al in 1993 and later Allan Pinkus in 1999 showed that the universal approximation property is equivalent to having a nonpolynomial activation function. === Arbitrary depth === The arbitrary depth case was also studied by a number of authors such as Gustaf Gripenberg in 2003, Dmitry Yarotsky, Zhou Lu et al in 2017, Boris Hanin and Mark Sellke in 2018 who focused on neural networks with ReLU activation function. In 2020, Patrick Kidger and Terry Lyons extended those results to neural networks with general activation functions such, e.g. tanh or GeLU. One special case of arbitrary depth is that each composition component comes from a finite set of mappings. In 2024, Cai constructed a finite set of mappings, named a vocabulary, such that any continuous function can be approximated by compositing a sequence from the vocabulary. This is similar to the concept of compositionality in linguistics, which is the idea that a finite vocabulary of basic elements can be combined via grammar to express an infinite range of meanings. === Bounded depth and bounded width === The bounded depth and bounded width case was first studied by Maiorov and Pinkus in 1999. They showed that there exists an analytic sigmoidal activation function such that two hidden layer neural networks with bounded number of units in hidden layers are universal approximators. In 2018, Guliyev and Ismailov constructed a smooth sigmoidal activation function providing universal approximation property for two hidden layer feedforward neural networks with fewer units in hidden layers. In 2018, they also constructed single hidden layer networks with bounded width that are still universal approximators for univariate functions. However, this does not apply for multivariable functions. In 2022, Shen et al. obtained precise quantitative information on the depth and width required to approximate a target function by deep and wide ReLU neural networks. === Quantitative bounds === The question of minimal possible width for universality was first studied in 2021, Park et al obtained the minimum width required for the universal approximation of Lp functions using feed-forward neural networks with ReLU as activation functions. Similar results that can be directly applied to residual neural networks were also obtained in the same year by Paulo Tabuada and Bahman Gharesifard using control-theoretic arguments. In 2023, Cai obtained the optimal minimum width bound for the universal approximation. For the arbitrary depth case, Leonie Papon and Anastasis Kratsios derived explicit depth estimates depending on the regularity of the target function and of the activation function. === Kolmogorov network === The Kolmogorov–Arnold representation theorem is similar in spirit. Indeed, certain neural network families can directly apply the Kolmogorov–Arnold theorem to yield a universal approximation theorem. Robert Hecht-Nielsen showed that a three-layer neural network can approximate any continuous multivariate function. This was extended to the discontinuous case by Vugar Ismailov. In 2024, Ziming Liu and co-authors showed a practical application. === Reservoir computing and quantum reservoir computing === In reservoir computing a sparse recurrent neural network with fixed weights equipped of fading memory and echo state property is followed by a trainable output layer. Its universality has been demonstrated separately for what concerns networks of rate neurons and spiking neurons, respectively. In 2024, the framework has been generalized and extended to quantum reservoirs where the reservoir is based on qubits defined over Hilbert spaces. === Variants === Variants include discontinuous activation functions, noncompact domains, certifiable networks, random neural networks, and alternative network architectures and topologies. The universal approximation property of width-bounded networks has been studied as a dual of classical universal approximation results on depth-bounded networks. For input dimension d x {\displaystyle d_{x}} and output dimension d y {\displaystyle d_{y}} the minimum width required for the universal approximation of the Lp functions is exactly m a x { d x + 1 , d y } {\displaystyle max\{d_{x}+1,d_{y}\}} (for a ReLU network). More generally this also holds if both ReLU and a threshold activation function are used. Universal function approximation on graphs (or rather on graph isomorphism classes) by popular graph convolutional neural networks (GCNs or GNNs) can be made as discriminative as the Weisfeiler–Leman graph isomorphism test. In 2020, a universal approximation theorem result was established by Brüel-Gabrielsson, showing that graph representation with certain injective properties is sufficient for universal function approximation on bounded graphs and restricted universal function approximation on unbounded graphs, with an accompanying O ( | V | ⋅ | E | ) {\displaystyle {\mathcal {O}}(\left|V\right|\cdot \left|E\right|)} -runtime method that performed at state of the art on a collection of benchmarks (where V {\displaystyle V} and E {\displaystyle E} are the sets of nodes and edges of the graph respectively). There are also a variety of results between non-Euclidean spaces and other commonly used architectures and, more generally, algorithmically generated sets of functions, such as the convolutional neural network (CNN) architecture, radial basis functions, or neural networks with specific properties. == Arbitrary-width case == A universal approximation theorem formally states that a family of neural network funct