Conditional disclosure of secrets (CDS) is a primitive, studied in information-theoretic cryptography, that allows distributed, non-communicating parties to coordinate the release of information to a third party. CDS was initially introduced for use in the context of private information retrieval, and has been related to communication complexity and non-local quantum computation. == Definition of conditional disclosure of secrets == The conditional disclosure of secrets setting involves three players; Alice, Bob and the referee. Alice receives an input x ∈ { 0 , 1 } n {\displaystyle x\in \{0,1\}^{n}} and a secret z ∈ { 0 , 1 } {\displaystyle z\in \{0,1\}} , and Bob receives a string y ∈ { 0 , 1 } n {\displaystyle y\in \{0,1\}^{n}} . A choice of Boolean function f : { 0 , 1 } 2 n → { 0 , 1 } {\displaystyle f:\{0,1\}^{2n}\rightarrow \{0,1\}} is fixed in advance and known to all players. Alice and Bob cannot communicate with one another, but share a string of random bits which we label r {\displaystyle r} . Alice and Bob compute messages m A = m A ( x , z , r ) {\displaystyle m_{A}=m_{A}(x,z,r)} and m B = m B ( y , r ) {\displaystyle m_{B}=m_{B}(y,r)} , which they send to the referee. The referee knows ( x , y ) {\displaystyle (x,y)} . A CDS protocol consists of the encoding maps applied by Alice and Bob. A protocol is said to be ϵ {\displaystyle \epsilon } -correct if, for all ( x , y ) ∈ f − 1 ( 1 ) {\displaystyle (x,y)\in f^{-1}(1)} , the referee can determine z {\displaystyle z} with probability 1 − ϵ {\displaystyle 1-\epsilon } . A protocol is said to be δ {\displaystyle \delta } -secure if, for all ( x , y ) ∈ f − 1 ( 0 ) {\displaystyle (x,y)\in f^{-1}(0)} the distribution of the messages is δ {\displaystyle \delta } close to a simulator distribution (in total variation distance), where the simulator distribution is independent of z {\displaystyle z} . The communication complexity of a CDS protocol P is the total number of bits of message sent by Alice and Bob. The CDS communication cost of a function, C D S ϵ , δ ( f ) {\displaystyle CDS_{\epsilon ,\delta }(f)} is the minimal communication cost of an ϵ {\displaystyle \epsilon } -correct, δ {\displaystyle \delta } secure protocol that implements f {\displaystyle f} . The randomness complexity and randomness cost of implementing a function in the CDS model are defined similarly, but consider the number of bits of shared random bits held by Alice and Bob. == Basic properties of the primitive == === Amplification === Supposing we have an ϵ {\displaystyle \epsilon } -correct and δ {\displaystyle \delta } -secure CDS protocol, it is known that we can find a new protocol which reduces the correctness and privacy errors at the expense of an increased communication and randomness cost. More specifically, the following theorem has been proven Theorem (Amplification). A CDS protocol for f which supports a single-bit secret with privacy and correctness error of 1/3 can be transformed into a new CDS protocol with privacy and correctness error of 2 − Ω ( k ) {\displaystyle 2^{-\Omega (k)}} and communication/randomness complexity which are larger than those of the original protocol by a multiplicative factor of O(k). In fact, somewhat more than the above theorem is true in that the size of the secret can also be made to be of length k {\displaystyle k} , while simultaneously reducing the correctness and privacy errors as above. The proof involves first encoding the secret z {\displaystyle z} into a secret sharing scheme, and then running the original CDS protocol on each share of the resulting scheme. === Closure === If a CDS protocol for a function f {\displaystyle f} is known, then certain simple modifications of f {\displaystyle f} have CDS protocols with similar efficiency. The simplest case is to consider a CDS protocol for function f {\displaystyle f} and ask for a similarly efficient protocol for the negation of f {\displaystyle f} , labelled ¬ f {\displaystyle \neg f} . This is addressed by the following theorem Theorem (CDS is closed under complement). Suppose that f has a CDS protocol with randomness cost of ρ {\displaystyle \rho } bits, communication complexity of t {\displaystyle t} bits, and privacy and correctness errors δ = ϵ = 2 − k {\displaystyle \delta =\epsilon =2^{-k}} . Then ¬ f {\displaystyle \neg f} has a CDS scheme with similar privacy and correctness errors, and randomness and communication complexity of O ( k 3 ρ 2 t + k 3 ρ 3 ) {\displaystyle O(k^{3}\rho ^{2}t+k^{3}\rho ^{3})} . The cost of a CDS protocol is also closed under formula's, in the following sense. Consider two functions f 1 {\displaystyle f_{1}} and f 2 {\displaystyle f_{2}} . Then, the communication and randomness costs of f 1 ∧ f 2 {\displaystyle f_{1}\wedge f_{2}} as well as f 1 ∨ f 2 {\displaystyle f_{1}\vee f_{2}} are not much larger than the sum of the costs for f 1 {\displaystyle f_{1}} and f 2 {\displaystyle f_{2}} . See Applebaum et al. for a precise statement. == Upper and lower bounds on communication cost == Given a function f {\displaystyle f} we would like to understand the communication and randomness costs to implement f {\displaystyle f} in the CDS setting. Towards understanding this, protocols for implementing CDS have been developed (which give an upper bound on the cost) and lower bound strategies have been developed. For most functions, there is a large gap between the known upper and lower bound, so understanding the cost of CDS remains largely an open problem. This section presents some of what is known so far about the cost of CDS. === Secret sharing based upper bound === A subject with a close relationship to CDS is secret sharing. Secret sharing constructions provide an upper bound on the cost of CDS protocols. A secret sharing scheme encodes a secret, s {\displaystyle s} into a set of shares S 1 , . . . , S n {\displaystyle S_{1},...,S_{n}} . Associated to any secret sharing scheme is an access structure, which consists of a set of authorized sets A = A 1 , . . . , A k {\displaystyle {\mathcal {A}}={A_{1},...,A_{k}}} with A i ⊆ { S 1 , . . . , S n } {\displaystyle A_{i}\subseteq \{S_{1},...,S_{n}\}} . The authorized sets are those subsets of the A i {\displaystyle A_{i}} from which it is possible to recover the secret recorded into the scheme. A succinct way to describe an access structure is in terms of a function f A : { 0 , 1 } n → { 0 , 1 } {\displaystyle f_{\mathcal {A}}:\{0,1\}^{n}\rightarrow \{0,1\}} . Each subset of the shares K [ x ] ⊂ { S 1 , . . . , S n } {\displaystyle K[x]\subset \{S_{1},...,S_{n}\}} is labelled by a string x ∈ { 0 , 1 } n {\displaystyle x\in \{0,1\}^{n}} such that x i = 1 {\displaystyle x_{i}=1} if and only if S i ∈ K {\displaystyle S_{i}\in K} . Then we define f A {\displaystyle f_{\mathcal {A}}} to be such that f A ( x ) = 1 {\displaystyle f_{\mathcal {A}}(x)=1} if and only if K [ x ] ∈ A {\displaystyle K[x]\in {\mathcal {A}}} . In words, the function f A {\displaystyle f_{\mathcal {A}}} is 1 when given an authorized subset as input, and 0 otherwise. A basic result in the theory of secret sharing is that an access structure A {\displaystyle {\mathcal {A}}} can be realized in a secret sharing scheme if and only if f A {\displaystyle f_{\mathcal {A}}} is monotone. The size of a secret sharing scheme is defined as the total number of bits in the shares S i {\displaystyle S_{i}} . For monotone functions, there is an upper bound on the communication cost in CDS for any monotone function f {\displaystyle f} in terms of the size of any secret sharing scheme with access structure given by f {\displaystyle f} , C D S ϵ = 0 , δ = 0 ( f ) ≤ S h a r i n g S i z e ( f ) {\displaystyle CDS_{\epsilon =0,\delta =0}(f)\leq SharingSize(f)} For some concrete classes of secret sharing schemes, this relationship can be extended to general (non-monotone) Boolean functions. This leads to an upper bound on CDS cost in terms of the size of any span program that computes f {\displaystyle f} , C D S ϵ = 0 , δ = 0 ( f ) ≤ S P k ( f ) {\displaystyle CDS_{\epsilon =0,\delta =0}(f)\leq SP_{k}(f)} The class of problems with efficient (polynomial size) span program is the complexity class M o d k L {\displaystyle Mod_{k}L} , so problems in this class have efficient CDS protocols. === Sub-exponential upper bounds for all functions === Using a matching vector family based construction, it has been proven that ∀ f , C D S ϵ = 0 , δ = 0 ( f ) ≤ 2 O ( n log n ) {\displaystyle \forall f,\,\,\,\,\,\,CDS_{\epsilon =0,\delta =0}(f)\leq 2^{O({\sqrt {n\log n}})}} . The technique for this proof is similar to one used to prove upper bounds on private information retrieval. This upper bound on CDS also leads to sub-exponential upper bounds on the size of a large class of secret sharing schemes. === Lower bounds from communication complexity === In a CDS protocol, the referee is given the inputs ( x , y ) {\displaystyle (x,y)} . This means it is not clear if the messages sent by Alice a
Data cube
In computer programming, a data cube (or datacube) is a multi-dimensional array of values. Typically, the term "data cube" is applied in contexts where these arrays are massively larger than the hosting computer's main memory; examples include multi-terabyte/petabyte data warehouses and time series of image data. Even though it is called a cube, a data cube generally is a multi-dimensional concept which can be 1-dimensional, 2-dimensional, 3-dimensional, or higher-dimensional. The data cube is used to represent data (sometimes called facts) along some dimensions of interest. In satellite image timeseries, dimensions would be latitude and longitude coordinates and time; a fact (sometimes called measure) would be a pixel at a given space and time as taken by the satellite. For example, in online analytical processing, an OLAP cube about a company would have dimensions that could be the company subsidiaries, the company products, and time; in this setup, a fact would be a sales event where a particular product has been sold in a particular subsidiary at a particular time. In any case, every dimension divides data into groups of cells whereas each cell in the cube represents a single measure of interest. Sometimes cubes hold only a few values with the rest being empty, i.e. undefined, while sometimes most or all cube coordinates hold a cell value. In the first case such data are called sparse, and in the second case they are called dense, although there is no hard delineation between the two. Data cubes may be stored in database management systems (DBMS) as part of array DBMS. Spatio-temporal databases and geospatial databases may also be represented as coverage data. == History == Multi-dimensional arrays have long been familiar in programming languages. Fortran offers arbitrarily-indexed 1-D arrays and arrays of arrays, which allows the construction of higher-dimensional arrays, up to 15 dimensions. APL supports n-D arrays with a rich set of operations. All these have in common that arrays must fit into the main memory and are available only while the particular program maintaining them (such as image processing software) is running. A series of data exchange formats support storage and transmission of data cube-like data, often tailored towards particular application domains. Examples include MDX for statistical (in particular, business) data, Zarr and Hierarchical Data Format for general scientific data, and TIFF for imagery. In 1992, Peter Baumann introduced management of massive data cubes with high-level user functionality combined with an efficient software architecture. Datacube operations include subset extraction, processing, fusion, and in general queries in the spirit of data manipulation languages like SQL. Some years after, the data cube concept was applied to describe time-varying business data as data cubes by Jim Gray, et al., and by Venky Harinarayan, Anand Rajaraman and Jeff Ullman. Around that time, a working group on Multi-Dimensional Databases ("Arbeitskreis Multi-Dimensionale Datenbanken") was established at German Gesellschaft für Informatik. Datacube Inc. was an image processing company selling hardware and software applications for the PC market in 1996, however without addressing data cubes as such. The EarthServer initiative has established geo data cube service requirements. == Standardization == In 2018, the ISO SQL database language was extended with data cube functionality as "SQL – Part 15: Multi-dimensional arrays (SQL/MDA)". Web Coverage Processing Service is a geo data cube analytics language issued by the Open Geospatial Consortium in 2008. In addition to the common data cube operations, the language knows about the semantics of space and time and supports both regular and irregular grid data cubes, based on the concept of coverage data. An industry standard for querying business data cubes, originally developed by Microsoft, is MultiDimensional eXpressions. == Implementation == Many high-level computer languages treat data cubes and other large arrays as single entities distinct from their contents. These languages, of which Fortran, APL, IDL, NumPy, PDL, and S-Lang are examples, allow the programmer to manipulate complete film clips and other data en masse with simple expressions derived from linear algebra and vector mathematics. Some languages (such as PDL) distinguish between a list of images and a data cube, while many (such as IDL) do not. Array DBMSs (Database Management Systems) offer a data model which generically supports definition, management, retrieval, and manipulation of n-dimensional data cubes. This database category has been pioneered by the rasdaman system since 1994. == Applications == Multi-dimensional arrays can meaningfully represent spatio-temporal sensor, image, and simulation data, but also statistics data where the semantics of dimensions is not necessarily of spatial or temporal nature. Generally, any kind of axis can be combined with any other into a data cube. === Mathematics === In mathematics, a one-dimensional array corresponds to a vector, a two-dimensional array resembles a matrix; more generally, a tensor may be represented as an n-dimensional data cube. === Science and engineering === For a time sequence of color images, the array is generally four-dimensional, with the dimensions representing image X and Y coordinates, time, and RGB (or other color space) color plane. For example, the EarthServer initiative unites data centers from different continents offering 3-D x/y/t satellite image timeseries and 4-D x/y/z/t weather data for retrieval and server-side processing through the Open Geospatial Consortium WCPS geo data cube query language standard. A data cube is also used in the field of imaging spectroscopy, since a spectrally-resolved image is represented as a three-dimensional volume. Earth observation data cubes combine satellite imagery such as Landsat 8 and Sentinel-2 with Geographic information system analytics. === Business intelligence === In online analytical processing (OLAP), data cubes are a common arrangement of business data suitable for analysis from different perspectives through operations like slicing, dicing, pivoting, and aggregation.
Deep Learning Anti-Aliasing
Deep Learning Anti-Aliasing (DLAA) is a form of spatial anti-aliasing developed by Nvidia. DLAA depends on and requires Tensor Cores available in Nvidia RTX cards. DLAA is similar to Deep Learning Super Sampling (DLSS) in its anti-aliasing method, with one important differentiation being that the goal of DLSS is to increase performance at the cost of image quality, whereas the main priority of DLAA is improving image quality at the cost of performance (irrelevant of resolution upscaling or downscaling). DLAA is similar to temporal anti-aliasing (TAA) in that they are both spatial anti-aliasing solutions relying on past frame data. Compared to TAA, DLAA is substantially better when it comes to shimmering, flickering, and handling small meshes like wires. == Technical overview == DLAA collects game rendering data including raw low-resolution input, motion vectors, depth buffers, and exposure information. This information feeds into a convolutional neural network that processes the image to reduce aliasing while preserving fine detail. The neural network architecture employs an auto-encoder design trained on high-quality reference images. The training dataset includes diverse scenarios focusing on challenging cases like sub-pixel details, high-contrast edges, and transparent surfaces. The network then processes frames in real-time. Unlike traditional anti-aliasing solutions that rely on manually written heuristics, such as TAA, DLAA uses its neural network to preserve fine details while eliminating unwanted visual artifacts. == History == DLAA was initially called and marketed by Nvidia as DLSS 2x. The first game that added support for DLAA was The Elder Scrolls Online, which implemented the feature in 2021. By June 2022, DLAA was only available in six games. This number rose to 17 by February 2023. In June 2023, TechPowerUp reported that "DLAA is seeing sluggish adoption among game developers", and that Nvidia was working on adding DLAA to the quality presets of DLSS to boost adoption. By December 2023, DLAA was supported in 41 games. In early 2025, an update for the Nvidia App added a driver-based DLSS override feature that enables users to activate DLAA even in games that do not support it natively. == Differences between TAA and DLAA == TAA is used in many modern video games and game engines; however, all previous implementations have used some form of manually written heuristics to prevent temporal artifacts such as ghosting and flickering. One example of this is neighborhood clamping which forcefully prevents samples collected in previous frames from deviating too much compared to nearby pixels in newer frames. This helps to identify and fix many temporal artifacts, but deliberately removing fine details in this way is analogous to applying a blur filter, and thus the final image can appear blurry when using this method. DLAA uses an auto-encoder convolutional neural network trained to identify and fix temporal artifacts, instead of manually programmed heuristics as mentioned above. Because of this, DLAA can generally resolve detail better than other TAA and TAAU implementations, while also removing most temporal artifacts. == Differences between DLSS and DLAA == While DLSS handles upscaling with a focus on performance, DLAA handles anti-aliasing with a focus on visual quality. DLAA runs at the given screen resolution with no upscaling or downscaling functionality provided by DLAA. DLSS and DLAA share the same AI-driven anti-aliasing method. As such, DLAA functions like DLSS without the upscaling part. Both are made by Nvidia and require Tensor Cores. However, DLSS and DLAA cannot be enabled at the same time, only one can be selected depending on whether performance or image quality is prioritized. == Reception == TechPowerUp found that "[c]ompared to TAA and DLSS, DLAA is clearly producing the best image quality, especially at lower resolutions", arguing that, while "DLSS was already doing a better job than TAA at reconstructing small objects", "DLAA does an even better job". In a Cyberpunk 2077 performance test, IGN stated that "DLAA provided somewhat similar results [FPS wise] to the normal raster mode in most cases but got significant performance boost with the help of frame generation", a feature not available when using native resolution. Rock Paper Shotgun noted that, while DLAA is "not a completely perfect form of anti-aliasing, as the occasional jaggies are present", it "looks a lot sharper overall [than TAA], and especially in motion." According to PC World, "DLAA offers very good anti-aliasing without losing visual information — alternatives like TAA tend to struggle during motion-filled scenes, where DLAA doesn’t. Furthermore, DLAA’s loss of performance is lower than with conventional anti-aliasing methods."
Inductive probability
Inductive probability attempts to give the probability of future events based on past events. It is the basis for inductive reasoning, and gives the mathematical basis for learning and the perception of patterns. It is a source of knowledge about the world. There are three sources of knowledge: inference, communication, and deduction. Communication relays information found using other methods. Deduction establishes new facts based on existing facts. Inference establishes new facts from data. Its basis is Bayes' theorem. Information describing the world is written in a language. For example, a simple mathematical language of propositions may be chosen. Sentences may be written down in this language as strings of characters. But in the computer it is possible to encode these sentences as strings of bits (1s and 0s). Then the language may be encoded so that the most commonly used sentences are the shortest. This internal language implicitly represents probabilities of statements. Occam's razor says the "simplest theory, consistent with the data is most likely to be correct". The "simplest theory" is interpreted as the representation of the theory written in this internal language. The theory with the shortest encoding in this internal language is most likely to be correct. == History == Probability and statistics was focused on probability distributions and tests of significance. Probability was formal, well defined, but limited in scope. In particular its application was limited to situations that could be defined as an experiment or trial, with a well defined population. Bayes's theorem is named after Rev. Thomas Bayes 1701–1761. Bayesian inference broadened the application of probability to many situations where a population was not well defined. But Bayes' theorem always depended on prior probabilities, to generate new probabilities. It was unclear where these prior probabilities should come from. Ray Solomonoff developed algorithmic probability which gave an explanation for what randomness is and how patterns in the data may be represented by computer programs, that give shorter representations of the data circa 1964. Chris Wallace and D. M. Boulton developed minimum message length circa 1968. Later Jorma Rissanen developed the minimum description length circa 1978. These methods allow information theory to be related to probability, in a way that can be compared to the application of Bayes' theorem, but which give a source and explanation for the role of prior probabilities. Marcus Hutter combined decision theory with the work of Ray Solomonoff and Andrey Kolmogorov to give a theory for the Pareto optimal behavior for an Intelligent agent, circa 1998. === Minimum description/message length === The program with the shortest length that matches the data is the most likely to predict future data. This is the thesis behind the minimum message length and minimum description length methods. At first sight Bayes' theorem appears different from the minimimum message/description length principle. At closer inspection it turns out to be the same. Bayes' theorem is about conditional probabilities, and states the probability that event B happens if firstly event A happens: P ( A ∧ B ) = P ( B ) ⋅ P ( A | B ) = P ( A ) ⋅ P ( B | A ) {\displaystyle P(A\land B)=P(B)\cdot P(A|B)=P(A)\cdot P(B|A)} becomes in terms of message length L, L ( A ∧ B ) = L ( B ) + L ( A | B ) = L ( A ) + L ( B | A ) . {\displaystyle L(A\land B)=L(B)+L(A|B)=L(A)+L(B|A).} This means that if all the information is given describing an event then the length of the information may be used to give the raw probability of the event. So if the information describing the occurrence of A is given, along with the information describing B given A, then all the information describing A and B has been given. ==== Overfitting ==== Overfitting occurs when the model matches the random noise and not the pattern in the data. For example, take the situation where a curve is fitted to a set of points. If a polynomial with many terms is fitted then it can more closely represent the data. Then the fit will be better, and the information needed to describe the deviations from the fitted curve will be smaller. Smaller information length means higher probability. However, the information needed to describe the curve must also be considered. The total information for a curve with many terms may be greater than for a curve with fewer terms, that has not as good a fit, but needs less information to describe the polynomial. === Inference based on program complexity === Solomonoff's theory of inductive inference is also inductive inference. A bit string x is observed. Then consider all programs that generate strings starting with x. Cast in the form of inductive inference, the programs are theories that imply the observation of the bit string x. The method used here to give probabilities for inductive inference is based on Solomonoff's theory of inductive inference. ==== Detecting patterns in the data ==== If all the bits are 1, then people infer that there is a bias in the coin and that it is more likely also that the next bit is 1 also. This is described as learning from, or detecting a pattern in the data. Such a pattern may be represented by a computer program. A short computer program may be written that produces a series of bits which are all 1. If the length of the program K is L ( K ) {\displaystyle L(K)} bits then its prior probability is, P ( K ) = 2 − L ( K ) {\displaystyle P(K)=2^{-L(K)}} The length of the shortest program that represents the string of bits is called the Kolmogorov complexity. Kolmogorov complexity is not computable. This is related to the halting problem. When searching for the shortest program some programs may go into an infinite loop. ==== Considering all theories ==== The Greek philosopher Epicurus is quoted as saying "If more than one theory is consistent with the observations, keep all theories". As in a crime novel all theories must be considered in determining the likely murderer, so with inductive probability all programs must be considered in determining the likely future bits arising from the stream of bits. Programs that are already longer than n have no predictive power. The raw (or prior) probability that the pattern of bits is random (has no pattern) is 2 − n {\displaystyle 2^{-n}} . Each program that produces the sequence of bits, but is shorter than the n is a theory/pattern about the bits with a probability of 2 − k {\displaystyle 2^{-k}} where k is the length of the program. The probability of receiving a sequence of bits y after receiving a series of bits x is then the conditional probability of receiving y given x, which is the probability of x with y appended, divided by the probability of x. ==== Universal priors ==== The programming language affects the predictions of the next bit in the string. The language acts as a prior probability. This is particularly a problem where the programming language codes for numbers and other data types. Intuitively we think that 0 and 1 are simple numbers, and that prime numbers are somehow more complex than numbers that may be composite. Using the Kolmogorov complexity gives an unbiased estimate (a universal prior) of the prior probability of a number. As a thought experiment an intelligent agent may be fitted with a data input device giving a series of numbers, after applying some transformation function to the raw numbers. Another agent might have the same input device with a different transformation function. The agents do not see or know about these transformation functions. Then there appears no rational basis for preferring one function over another. A universal prior insures that although two agents may have different initial probability distributions for the data input, the difference will be bounded by a constant. So universal priors do not eliminate an initial bias, but they reduce and limit it. Whenever we describe an event in a language, either using a natural language or other, the language has encoded in it our prior expectations. So some reliance on prior probabilities are inevitable. A problem arises where an intelligent agent's prior expectations interact with the environment to form a self reinforcing feed back loop. This is the problem of bias or prejudice. Universal priors reduce but do not eliminate this problem. === Universal artificial intelligence === The theory of universal artificial intelligence applies decision theory to inductive probabilities. The theory shows how the best actions to optimize a reward function may be chosen. The result is a theoretical model of intelligence. It is a fundamental theory of intelligence, which optimizes the agents behavior in, Exploring the environment; performing actions to get responses that broaden the agents knowledge. Competing or co-operating with another agent; games. Balancing short and long term rewards. In general no agent will always provi
Machine-learned interatomic potential
Machine-learned interatomic potentials (MLIPs), or simply machine learning potentials (MLPs), are interatomic potentials constructed using machine learning. Beginning in the 1990s, researchers have employed such programs to construct interatomic potentials by mapping atomic structures to their potential energies. These potentials are referred to as MLIPs or MLPs. Such machine learning potentials promised to fill the gap between density functional theory, a highly accurate but computationally intensive modelling method, and empirically derived or intuitively-approximated potentials, which were far lighter computationally but substantially less accurate. Improvements in artificial intelligence technology heightened the accuracy of MLPs while lowering their computational cost, increasing the role of machine learning in fitting potentials. Machine learning potentials began by using neural networks to tackle low-dimensional systems. While promising, these models could not systematically account for interatomic energy interactions; they could be applied to small molecules in a vacuum, or molecules interacting with frozen surfaces, but not much else – and even in these applications, the models often relied on force fields or potentials derived empirically or with simulations. These models thus remained confined to academia. Modern neural networks construct highly accurate and computationally light potentials, as theoretical understanding of materials science was increasingly built into their architectures and preprocessing. Almost all are local, accounting for all interactions between an atom and its neighbor up to some cutoff radius. There exist some nonlocal models, but these have been experimental for almost a decade. For most systems, reasonable cutoff radii enable highly accurate results. Almost all neural networks intake atomic coordinates and output potential energies. For some, these atomic coordinates are converted into atom-centered symmetry functions. From this data, a separate atomic neural network is trained for each element; each atomic network is evaluated whenever that element occurs in the given structure, and then the results are pooled together at the end. This process – in particular, the atom-centered symmetry functions which convey translational, rotational, and permutational invariances – has greatly improved machine learning potentials by significantly constraining the neural network search space. Other models use a similar process but emphasize bonds over atoms, using pair symmetry functions and training one network per atom pair. Other models to learn their own descriptors rather than using predetermined symmetry-dictating functions. These models, called message-passing neural networks (MPNNs), are graph neural networks. Treating molecules as three-dimensional graphs (where atoms are nodes and bonds are edges), the model takes feature vectors describing the atoms as input, and iteratively updates these vectors as information about neighboring atoms is processed through message functions and convolutions. These feature vectors are then used to predict the final potentials. The flexibility of this method often results in stronger, more generalizable models. In 2017, the first-ever MPNN model (a deep tensor neural network) was used to calculate the properties of small organic molecules. == Gaussian Approximation Potential (GAP) == One popular class of machine-learned interatomic potential is the Gaussian Approximation Potential (GAP), which combines compact descriptors of local atomic environments with Gaussian process regression to machine learn the potential energy surface of a given system. To date, the GAP framework has been used to successfully develop a number of MLIPs for various systems, including for elemental systems such as carbon, silicon, phosphorus, and tungsten, as well as for multicomponent systems such as Ge2Sb2Te5 and austenitic stainless steel, Fe7Cr2Ni. == Equivariant graph neural networks == A significant limitation of early MPNNs was that they were not inherently equivariant to rotations and reflections of atomic structures — meaning predictions could change depending on how a molecule was oriented in space. Beginning around 2021, a new class of models addressed this by incorporating equivariance directly into the message-passing layers using spherical harmonics and irreducible representations. Notable examples include NequIP (2021), MACE (2022), and GemNet-OC (2022). These equivariant architectures proved substantially more data-efficient and accurate than their predecessors, and became the dominant paradigm for high-accuracy MLIPs. == Universal MLIPs and large-scale datasets == Early MLIPs were system-specific, trained on a few thousand structures of a single material. A major shift occurred with the creation of large, chemically diverse datasets enabling models that generalize across many elements, bonding environments, and application domains — so-called universal MLIPs. A key driver was the Open Catalyst Project (OC20, OC22), a collaboration between Meta AI (FAIR) and Carnegie Mellon University launched in 2020. OC20 comprises approximately 1.3 million DFT relaxations across 82 elements, designed to accelerate the discovery of catalysts for renewable energy applications. It was among the first datasets large enough to train GNNs that generalize across diverse chemical systems, and established a widely-used benchmark for the field. A subsequent dataset, Open Direct Air Capture (OpenDAC 2023 and OpenDAC 2025), applied the same approach to carbon capture, providing a large computational database of metal-organic frameworks and sorbent candidates evaluated for CO₂ capture, generated using nearly 400 million CPU hours of quantum chemistry calculations in collaboration with Georgia Tech. These datasets revealed a new challenge: the GNN architectures most effective for atomic simulations were memory-intensive, as they model higher-order interactions between triplets or quadruplets of atoms, making it difficult to scale model size. Graph Parallelism, introduced by Sriram et al. (ICLR 2022), addressed this by distributing a single input graph across multiple GPUs — a distinct strategy from data parallelism (which distributes training examples) or model parallelism (which distributes layers). This enabled training GNNs with hundreds of millions to billions of parameters for the first time. Building on these foundations, Meta FAIR released the Universal Model for Atoms (UMA) in 2025, trained on approximately 500 million unique 3D atomic structures spanning molecules, materials, and catalysts — the largest training run to date for an MLIP. UMA introduced a Mixture of Linear Experts (MoLE) architecture, enabling one model to learn from datasets generated by different DFT codes and settings without significant inference overhead. It matches or surpasses specialized models across catalysis, materials, and molecular benchmarks without task-specific fine-tuning, and has been described as marking a "pre/post-UMA" divide in the field. == Applications == Catalyst discovery: MLIPs have significantly accelerated the computational screening of heterogeneous catalysts by replacing expensive DFT relaxations with fast neural network surrogates. The Open Catalyst Project explicitly targets this application, aiming to identify new catalysts for green hydrogen production and other renewable energy reactions. Carbon capture: The OpenDAC project applies universal MLIPs to screening sorbent materials for direct air capture of CO₂, a key technology for climate change mitigation. AI-accelerated screening allows evaluation of orders of magnitude more candidate materials than traditional DFT workflows. Drug discovery and molecular design: MLIPs are increasingly used in pharmaceutical research to model molecular conformations and binding energies. The Open Molecules 2025 (OMol25) dataset, released by Meta FAIR in 2025, provides high-accuracy calculations for a large set of molecular systems to support this use case. Materials discovery: Universal MLIPs enable high-throughput screening of novel inorganic materials, including battery electrolytes, semiconductors, and superconductors, by rapidly estimating stability and properties across large chemical spaces.
Computational photography
Computational photography refers to digital image capture and processing techniques that use digital computation instead of optical processes. Computational photography can improve the capabilities of a camera, or introduce features that were not possible at all with film-based photography, or reduce the cost or size of camera elements. Examples of computational photography include in-camera computation of digital panoramas, high-dynamic-range images, and light field cameras. Light field cameras use novel optical elements to capture three-dimensional scene information, which can then be used to produce 3D images, enhanced depth-of-field, and selective de-focusing (or "post focus"). Enhanced depth-of-field reduces the need for mechanical focusing systems. All of these features use computational imaging techniques. The definition of computational photography has evolved to cover a number of subject areas in computer graphics, computer vision, and applied optics. These areas are given below, organized according to a taxonomy proposed by Shree K. Nayar. Within each area is a list of techniques, and for each technique, one or two representative papers or books are cited. Deliberately omitted from the taxonomy are image processing (see also digital image processing) techniques applied to traditionally captured images to produce better images. Examples of such techniques are image scaling, dynamic range compression (i.e. tone mapping), color management, image completion (a.k.a. inpainting or hole filling), image compression, digital watermarking, and artistic image effects. Also omitted are techniques that produce range data, volume data, 3D models, 4D light fields, 4D, 6D, or 8D BRDFs, or other high-dimensional image-based representations. Epsilon photography is a sub-field of computational photography. == Effect on photography == Photos taken using computational photography can allow amateurs to produce photographs rivalling the quality of professional photographers, but as of 2019 do not outperform the use of professional-level equipment. == Computational illumination == This is controlling photographic illumination in a structured fashion, then processing the captured images, to create new images. The applications include image-based relighting, image enhancement, image deblurring, geometry/material recovery and so forth. High-dynamic-range imaging uses differently exposed pictures of the same scene to extend dynamic range. Other examples include processing and merging differently illuminated images of the same subject matter ("lightspace"). == Computational optics == This is a capture of optically coded images, followed by computational decoding to produce new images. Coded aperture imaging was mainly applied in astronomy and X-ray imaging to boost the image quality. Instead of a single pin-hole, a pinhole pattern is applied in imaging, and deconvolution is performed to recover the image. In coded exposure imaging, the on/off state of the shutter is coded to modify the kernel of motion blur. In this way, motion deblurring becomes a well-conditioned problem. Similarly, in a lens based coded aperture, the aperture can be modified by inserting a broadband mask. Thus, out of focus deblurring becomes a well-conditioned problem. The coded aperture can also improve the quality in light field acquisition using Hadamard transform optics. Coded aperture patterns can also be designed using color filters, in order to apply different codes at different wavelengths. This allows for increase the amount of light that reaches the camera sensor, compared to binary masks. == Computational imaging == Computational imaging is a set of imaging techniques that combine data acquisition and data processing to create the image of an object through indirect means to yield enhanced resolution, additional information such as optical phase or 3D reconstruction. The information is often recorded without using a conventional optical microscope configuration or with limited datasets. Computational imaging allows going beyond physical limitations of optical systems, such as numerical aperture, or even obliterates the need for optical elements. For parts of the optical spectrum where imaging elements such as objectives are difficult to manufacture or image sensors cannot be miniaturized, computational imaging provides useful alternatives, in fields such as X-ray and THz radiations. === Common techniques === Among common computational imaging techniques are lensless imaging, computational speckle imaging , ptychography and Fourier ptychography. Computational imaging technique often draws on compressive sensing or phase retrieval techniques, where the angular spectrum of the object is reconstructed. Other techniques are related to the field of computational imaging, such as digital holography, computer vision and inverse problems such as tomography. == Computational processing == This is the processing of non-optically-coded images to produce new images. == Computational sensors == These are detectors that combine sensing and processing, typically in hardware, like the oversampled binary image sensor. == Early work in computer vision == Although computational photography is a currently popular buzzword in computer graphics, many of its techniques first appeared in the computer vision literature, either under other names or within papers aimed at 3D shape analysis. == Art history == Computational photography, as an art form, has been practiced by capturing differently exposed pictures of the same subject matter and combining them. This was the inspiration for the development of the wearable computer in the 1970s and early 1980s. Computational photography was inspired by the work of Charles Wyckoff, and thus computational photography datasets (e.g. differently exposed pictures of the same subject matter that are taken in order to make a single composite image) are sometimes referred to as Wyckoff Sets, in his honor. Early work in this area (joint estimation of image projection and exposure value) was undertaken by Mann and Candoccia. Charles Wyckoff devoted much of his life to creating special kinds of 3-layer photographic films that captured different exposures of the same subject matter. A picture of a nuclear explosion, taken on Wyckoff's film, appeared on the cover of Life Magazine and showed the dynamic range from the dark outer areas to the inner core.
Solomonoff's theory of inductive inference
Solomonoff's theory of inductive inference proves that, under its common sense assumptions (axioms), the best possible scientific model is the shortest algorithm that generates the empirical data under consideration. In addition to the choice of data, other assumptions are that, to avoid the post-hoc fallacy, the programming language must be chosen prior to the data and that the environment being observed is generated by an unknown algorithm. This is also called a theory of induction. Due to its basis in the dynamical (state-space model) character of Algorithmic Information Theory, it encompasses statistical as well as dynamical information criteria for model selection. It was introduced by Ray Solomonoff, based on probability theory and theoretical computer science. In essence, Solomonoff's induction derives the posterior probability of any computable theory, given a sequence of observed data. This posterior probability is derived from Bayes' rule and some universal prior, that is, a prior that assigns a positive probability to any computable theory. Solomonoff proved that this induction is incomputable (or more precisely, lower semi-computable), but noted that "this incomputability is of a very benign kind", and that it "in no way inhibits its use for practical prediction" (as it can be approximated from below more accurately with more computational resources). It is only "incomputable" in the benign sense that no scientific consensus is able to prove that the best current scientific theory is the best of all possible theories. However, Solomonoff's theory does provide an objective criterion for deciding among the current scientific theories explaining a given set of observations. Solomonoff's induction naturally formalizes Occam's razor by assigning larger prior credences to theories that require a shorter algorithmic description. == Origin == === Philosophical === The theory is based in philosophical foundations, and was founded by Ray Solomonoff around 1960. It is a mathematically formalized combination of Occam's razor and the Principle of Multiple Explanations. All computable theories which perfectly describe previous observations are used to calculate the probability of the next observation, with more weight put on the shorter computable theories. Marcus Hutter's universal artificial intelligence builds upon this to calculate the expected value of an action. === Principle === Solomonoff's induction has been argued to be the computational formalization of pure Bayesianism. To understand, recall that Bayesianism derives the posterior probability P [ T | D ] {\displaystyle \mathbb {P} [T|D]} of a theory T {\displaystyle T} given data D {\displaystyle D} by applying Bayes rule, which yields P [ T | D ] = P [ D | T ] P [ T ] P [ D | T ] P [ T ] + ∑ A ≠ T P [ D | A ] P [ A ] {\displaystyle \mathbb {P} [T|D]={\frac {\mathbb {P} [D|T]\mathbb {P} [T]}{\mathbb {P} [D|T]\mathbb {P} [T]+\sum _{A\neq T}\mathbb {P} [D|A]\mathbb {P} [A]}}} where theories A {\displaystyle A} are alternatives to theory T {\displaystyle T} . For this equation to make sense, the quantities P [ D | T ] {\displaystyle \mathbb {P} [D|T]} and P [ D | A ] {\displaystyle \mathbb {P} [D|A]} must be well-defined for all theories T {\displaystyle T} and A {\displaystyle A} . In other words, any theory must define a probability distribution over observable data D {\displaystyle D} . Solomonoff's induction essentially boils down to demanding that all such probability distributions be computable. Interestingly, the set of computable probability distributions is a subset of the set of all programs, which is countable. Similarly, the sets of observable data considered by Solomonoff were finite. Without loss of generality, we can thus consider that any observable data is a finite bit string. As a result, Solomonoff's induction can be defined by only invoking discrete probability distributions. Solomonoff's induction then allows to make probabilistic predictions of future data F {\displaystyle F} , by simply obeying the laws of probability. Namely, we have P [ F | D ] = E T [ P [ F | T , D ] ] = ∑ T P [ F | T , D ] P [ T | D ] {\displaystyle \mathbb {P} [F|D]=\mathbb {E} _{T}[\mathbb {P} [F|T,D]]=\sum _{T}\mathbb {P} [F|T,D]\mathbb {P} [T|D]} . This quantity can be interpreted as the average predictions P [ F | T , D ] {\displaystyle \mathbb {P} [F|T,D]} of all theories T {\displaystyle T} given past data D {\displaystyle D} , weighted by their posterior credences P [ T | D ] {\displaystyle \mathbb {P} [T|D]} . === Mathematical === The proof of the "razor" is based on the known mathematical properties of a probability distribution over a countable set. These properties are relevant because the infinite set of all programs is a denumerable set. The sum S of the probabilities of all programs must be exactly equal to one (as per the definition of probability) thus the probabilities must roughly decrease as we enumerate the infinite set of all programs, otherwise S will be strictly greater than one. To be more precise, for every ϵ {\displaystyle \epsilon } > 0, there is some length l such that the probability of all programs longer than l is at most ϵ {\displaystyle \epsilon } . This does not, however, preclude very long programs from having very high probability. Fundamental ingredients of the theory are the concepts of algorithmic probability and Kolmogorov complexity. The universal prior probability of any prefix p of a computable sequence x is the sum of the probabilities of all programs (for a universal computer) that compute something starting with p. Given some p and any computable but unknown probability distribution from which x is sampled, the universal prior and Bayes' theorem can be used to predict the yet unseen parts of x in optimal fashion. == Mathematical guarantees == === Solomonoff's completeness === The remarkable property of Solomonoff's induction is its completeness. In essence, the completeness theorem guarantees that the expected cumulative errors made by the predictions based on Solomonoff's induction are upper-bounded by the Kolmogorov complexity of the (stochastic) data generating process. The errors can be measured using the Kullback–Leibler divergence or the square of the difference between the induction's prediction and the probability assigned by the (stochastic) data generating process. === Solomonoff's uncomputability === Unfortunately, Solomonoff also proved that Solomonoff's induction is uncomputable. In fact, he showed that computability and completeness are mutually exclusive: any complete theory must be uncomputable. The proof of this is derived from a game between the induction and the environment. Essentially, any computable induction can be tricked by a computable environment, by choosing the computable environment that negates the computable induction's prediction. This fact can be regarded as an instance of the no free lunch theorem. == Modern applications == === Artificial intelligence === Though Solomonoff's inductive inference is not computable, several AIXI-derived algorithms approximate it in order to make it run on a modern computer. The more computing power they are given, the closer their predictions are to the predictions of inductive inference (their mathematical limit is Solomonoff's inductive inference). Another direction of inductive inference is based on E. Mark Gold's model of learning in the limit from 1967 and has developed since then more and more models of learning. The general scenario is the following: Given a class S of computable functions, is there a learner (that is, recursive functional) which for any input of the form (f(0),f(1),...,f(n)) outputs a hypothesis (an index e with respect to a previously agreed on acceptable numbering of all computable functions; the indexed function may be required consistent with the given values of f). A learner M learns a function f if almost all its hypotheses are the same index e, which generates the function f; M learns S if M learns every f in S. Basic results are that all recursively enumerable classes of functions are learnable while the class REC of all computable functions is not learnable. Many related models have been considered and also the learning of classes of recursively enumerable sets from positive data is a topic studied from Gold's pioneering paper in 1967 onwards. A far reaching extension of the Gold’s approach is developed by Schmidhuber's theory of generalized Kolmogorov complexities, which are kinds of super-recursive algorithms.