Variational autoencoder

Variational autoencoder

In machine learning, a variational autoencoder (VAE) is an artificial neural network architecture introduced by Diederik P. Kingma and Max Welling in 2013. It is part of the families of probabilistic graphical models and variational Bayesian methods. In addition to being seen as an autoencoder neural network architecture, variational autoencoders can also be studied within the mathematical formulation of variational Bayesian methods, connecting a neural encoder network to its decoder through a probabilistic latent space (for example, as a multivariate Gaussian distribution) that corresponds to the parameters of a variational distribution. Thus, the encoder maps each point (such as an image) from a large complex dataset into a distribution within the latent space, rather than to a single point in that space. The decoder has the opposite function, which is to map from the latent space to the input space, again according to a distribution (although in practice, noise is rarely added during the decoding stage). By mapping a point to a distribution instead of a single point, the network can avoid overfitting the training data. Both networks are typically trained together with the usage of the reparameterization trick, although the variance of the noise model can be learned separately. Although this type of model was initially designed for unsupervised learning, its effectiveness has been proven for semi-supervised learning and supervised learning. == Overview of architecture and operation == A variational autoencoder is a generative model with a prior and noise distribution respectively. Usually such models are trained using the expectation-maximization meta-algorithm (e.g. probabilistic PCA, (spike & slab) sparse coding). Such a scheme optimizes a lower bound of the data likelihood, which is usually computationally intractable, and in doing so requires the discovery of q-distributions, or variational posteriors. These q-distributions are normally parameterized for each individual data point in a separate optimization process. However, variational autoencoders use a neural network as an amortized approach to jointly optimize across data points. In that way, the same parameters are reused for multiple data points, which can result in massive memory savings. The first neural network takes as input the data points themselves, and outputs parameters for the variational distribution. As it maps from a known input space to the low-dimensional latent space, it is called the encoder. The decoder is the second neural network of this model. It is a function that maps from the latent space to the input space, e.g. as the means of the noise distribution. It is possible to use another neural network that maps to the variance, however this can be omitted for simplicity. In such a case, the variance can be optimized with gradient descent. To optimize this model, one needs to know two terms: the "reconstruction error", and the Kullback–Leibler divergence (KL-D). Both terms are derived from the free energy expression of the probabilistic model, and therefore differ depending on the noise distribution and the assumed prior of the data, here referred to as p-distribution. For example, a standard VAE task such as IMAGENET is typically assumed to have a gaussianly distributed noise; however, tasks such as binarized MNIST require a Bernoulli noise. The KL-D from the free energy expression maximizes the probability mass of the q-distribution that overlaps with the p-distribution, which unfortunately can result in mode-seeking behaviour. The "reconstruction" term is the remainder of the free energy expression, and requires a sampling approximation to compute its expectation value. More recent approaches replace Kullback–Leibler divergence (KL-D) with various statistical distances, see "Statistical distance VAE variants" below. == Formulation == From the point of view of probabilistic modeling, one wants to maximize the likelihood of the data x {\displaystyle x} by their chosen parameterized probability distribution p θ ( x ) = p ( x | θ ) {\displaystyle p_{\theta }(x)=p(x|\theta )} . This distribution is usually chosen to be a Gaussian N ( x | μ , σ ) {\displaystyle N(x|\mu ,\sigma )} which is parameterized by μ {\displaystyle \mu } and σ {\displaystyle \sigma } respectively, and as a member of the exponential family it is easy to work with as a noise distribution. Simple distributions are easy enough to maximize, however distributions where a prior is assumed over the latents z {\displaystyle z} results in intractable integrals. Let us find p θ ( x ) {\displaystyle p_{\theta }(x)} via marginalizing over z {\displaystyle z} . p θ ( x ) = ∫ z p θ ( x , z ) d z , {\displaystyle p_{\theta }(x)=\int _{z}p_{\theta }({x,z})\,dz,} where p θ ( x , z ) {\displaystyle p_{\theta }({x,z})} represents the joint distribution under p θ {\displaystyle p_{\theta }} of the observable data x {\displaystyle x} and its latent representation or encoding z {\displaystyle z} . According to the chain rule, the equation can be rewritten as p θ ( x ) = ∫ z p θ ( x | z ) p θ ( z ) d z {\displaystyle p_{\theta }(x)=\int _{z}p_{\theta }({x|z})p_{\theta }(z)\,dz} In the vanilla variational autoencoder, z {\displaystyle z} is usually taken to be a finite-dimensional vector of real numbers, and p θ ( x | z ) {\displaystyle p_{\theta }({x|z})} to be a Gaussian distribution. Then p θ ( x ) {\displaystyle p_{\theta }(x)} is a mixture of Gaussian distributions. It is now possible to define the set of the relationships between the input data and its latent representation as Prior p θ ( z ) {\displaystyle p_{\theta }(z)} Likelihood p θ ( x | z ) {\displaystyle p_{\theta }(x|z)} Posterior p θ ( z | x ) {\displaystyle p_{\theta }(z|x)} Unfortunately, the computation of p θ ( z | x ) {\displaystyle p_{\theta }(z|x)} is expensive and in most cases intractable. To speed up the calculus to make it feasible, it is necessary to introduce a further function to approximate the posterior distribution as q ϕ ( z | x ) ≈ p θ ( z | x ) {\displaystyle q_{\phi }({z|x})\approx p_{\theta }({z|x})} with ϕ {\displaystyle \phi } defined as the set of real values that parametrize q {\displaystyle q} . This is sometimes called amortized inference, since by "investing" in finding a good q ϕ {\displaystyle q_{\phi }} , one can later infer z {\displaystyle z} from x {\displaystyle x} quickly without doing any integrals. In this way, the problem is to find a good probabilistic autoencoder, in which the conditional likelihood distribution p θ ( x | z ) {\displaystyle p_{\theta }(x|z)} is computed by the probabilistic decoder, and the approximated posterior distribution q ϕ ( z | x ) {\displaystyle q_{\phi }(z|x)} is computed by the probabilistic encoder. Parametrize the encoder as E ϕ {\displaystyle E_{\phi }} , and the decoder as D θ {\displaystyle D_{\theta }} . == Evidence lower bound (ELBO) == Like many deep learning approaches that use gradient-based optimization, VAEs require a differentiable loss function to update the network weights through backpropagation. For variational autoencoders, the idea is to jointly optimize the generative model parameters θ {\displaystyle \theta } to reduce the reconstruction error between the input and the output, and ϕ {\displaystyle \phi } to make q ϕ ( z | x ) {\displaystyle q_{\phi }({z|x})} as close as possible to p θ ( z | x ) {\displaystyle p_{\theta }(z|x)} . As reconstruction loss, mean squared error and cross entropy are often used. The Kullback–Leibler divergence D K L ( q ϕ ( z | x ) ∥ p θ ( z | x ) ) {\displaystyle D_{KL}(q_{\phi }({z|x})\parallel p_{\theta }({z|x}))} can be used as a loss function to squeeze q ϕ ( z | x ) {\displaystyle q_{\phi }({z|x})} under p θ ( z | x ) {\displaystyle p_{\theta }(z|x)} . This divergence loss expands to D K L ( q ϕ ( z | x ) ∥ p θ ( z | x ) ) = E z ∼ q ϕ ( ⋅ | x ) [ ln ⁡ q ϕ ( z | x ) p θ ( z | x ) ] = E z ∼ q ϕ ( ⋅ | x ) [ ln ⁡ q ϕ ( z | x ) p θ ( x ) p θ ( x , z ) ] = ln ⁡ p θ ( x ) + E z ∼ q ϕ ( ⋅ | x ) [ ln ⁡ q ϕ ( z | x ) p θ ( x , z ) ] . {\displaystyle {\begin{aligned}D_{KL}(q_{\phi }({z|x})\parallel p_{\theta }({z|x}))&=\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {q_{\phi }(z|x)}{p_{\theta }(z|x)}}\right]\\&=\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {q_{\phi }({z|x})p_{\theta }(x)}{p_{\theta }(x,z)}}\right]\\&=\ln p_{\theta }(x)+\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {q_{\phi }({z|x})}{p_{\theta }(x,z)}}\right].\end{aligned}}} Now, define the evidence lower bound (ELBO): L θ , ϕ ( x ) := E z ∼ q ϕ ( ⋅ | x ) [ ln ⁡ p θ ( x , z ) q ϕ ( z | x ) ] = ln ⁡ p θ ( x ) − D K L ( q ϕ ( ⋅ | x ) ∥ p θ ( ⋅ | x ) ) {\displaystyle L_{\theta ,\phi }(x):=\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {p_{\theta }(x,z)}{q_{\phi }({z|x})}}\right]=\ln p_{\theta }(x)-D_{KL}(q_{\phi }({\cdot |x})\parallel p_{\theta }({\cdot |x}))} Maximizing the ELBO θ ∗ , ϕ ∗ = argmax θ , ϕ L θ , ϕ ( x ) {\dis

Artificial reproduction

Artificial reproduction is the re-creation of life brought about by means other than natural ones. It is new life built by human plans and projects. Examples include artificial selection, artificial insemination, in vitro fertilization, artificial womb, artificial cloning, and kinematic replication. Artificial reproduction is one aspect of artificial life. Artificial reproduction can be categorized into one of two classes according to its capacity to be self-sufficient: non-assisted reproductive technology and assisted reproductive technology. Cutting plants' stems and placing them in compost is a form of assisted artificial reproduction, xenobots are an example of a more autonomous type of reproduction, while the artificial womb presented in the movie the Matrix illustrates a non assisted hypothetical technology. The idea of artificial reproduction has led to various technologies. == Theology == Humans have aspired to create life since immemorial times. Most theologies and religions have conceived this possibility as exclusive of deities. Christian religions consider the possibility of artificial reproduction, in most cases, as heretical and sinful. == Philosophy == Although ancient Greek philosophy raised the concept that man could imitate the creative capacity of nature, classic Greeks thought that if possible, human beings would reproduce things as nature does, and vice versa, nature would do the things that man does in the same way. Aristotle, for example, wrote that if nature made tables, it would make them just as men do. In other words, Aristotle said that if nature were to create a table, such table will look like a human-made table. Correspondingly, Descartes envisioned the human body, and nature, as a machine. Cartesian philosophy does not stop seeing a perfect mirror between nature and the artificial. However, Kant revolutionized this old idea by criticizing such naturalism. Kant pedagogically wrote: "Reason, in order to be taught by nature, must approach nature with its principles in one hand, according to which the agreement among appearances can count as laws, and, in the other hand, the experiment thought out in accord with these principles—in order to be instructed by nature not like a pupil, who has recited to him whatever the teacher wants to say, but like an appointed judge who compels witnesses to answer the questions he puts to them.". Humans are not instructed by nature but rather use nature as raw material to invent. Humans find alternatives to the natural restrictions imposed by natural laws thus, nature is not necessarily mirrored. In accordance with Kant (and contrary to what Aristotle thought) Karl Marx, Alfred Whitehead, Jaques Derrida and Juan David García Bacca noticed that nature is incapable of reproducing tables; or airplanes, or submarines, or computers. If nature tried to create airplanes, it would produce birds. If nature tried to create submarines, it would get fishes. If nature tried to create computers, brains would grow. And if nature tried to create man, modern man, monkeys will be evolved. According to Whitehead, if we look for something natural in artificial life, in the most elaborate cases, if anything, only atoms remain natural. Juan David Garcia Bacca summarized, “It will not come out from wood, it will not be born, a galley; from clay, a vessel; from linen, a dress; from iron, a lever,...From natural, artificial. In the artificial, the natural is reduced to a simple raw material, even though it is perfectly specified with natural specification. The artificial is the real, positive, and original negation of the natural: of species, of genus and of essence. Thus, its ontology is superior to natural ontology. And for this very reason Marx did not attach any importance to Darwin, whose evolutionism is confined to the natural order: to changes, at most, from variety to variety, from species to species... natural. For the same reason, nature has no dialectics, even though continuous evolution and selection can occur. The dialectic cannot emerge from the natural, for deeper reasons than, using today's terms, from a bird, an airplane cannot emerge; from fish, a submarine; from ears, a telephone; from eyes, a television; from a brain, a digital computer; from feet, a car; from hands, an engine; from Euclid, Descartes; from Aristotle, Newton; from Plato, Marx.” According to García Bacca, the major difference between natural causes and artificial causes is that nature does not have plans and projects, while humans design things following plans and projects. In contrast, other influential authors such as Michael Behe have depicted the concept and promoted the idea of intelligent design, a notion that has aroused several doubts and heated controversies, as it reframe natural causes in accordance with a natural plan. Previous ideas that have also provided a positive 'sense' to natural reproduction, are orthogenesis, syntropy, orgone and morphic resonance, among others. Although, these ideas have been historically marginalized and often called pseudoscience, recently Bio-semioticians are reconsidering some of them under symbolic approaches. Current metaphysics of science actually recognizes that the artificial ways of reproduction are diverse from nature, i.e., unnatural, anti-natural or supernatural. Because Biosemiotics does not focus on the function of life but on its meaning, it has a better understanding of the artificial than classic biology. == Science == Biology, being the study of cellular life, addresses reproduction in terms of growth and cellular division (i.e., binary fission, mitosis and meiosis); however, the science of artificial reproduction is not restricted by the mirroring of these natural processes.The science of artificial reproduction is actually transcending the natural forms, and natural rules, of reproduction. For example, xenobots have redefined the classical conception of reproduction. Although xenobots are made of eukariotic cells they do not reproduce by mitosis, but rather by kinematic replication. Such constructive replication does not involve growing but rather building. == Assisted reproductive technologies == Assisted reproductive technology (ART)'s purpose is to assist the development of a human embryo, commonly because of medical concerns due to fertility limitations. == Non-assisted reproductive technologies == Non-assisted reproductive technologies (NART) could have medical motivations but are mostly driven by a wider heterotopic ambition. Although, NARTs are initially designed by humans, they are programed to become independent of humans to a relative or absolute extent. James Lovelock proposed that such novelties could overcome humans. === Artificial cloning === Cloning is the cellular reproductive processes where two or more genetically identical organisms are created, either by natural or artificial means. Artificial cloning normally involves editing the genetic code, somatic cell nuclear transfer and 3D bioprinting. === Non-assisted artificial womb === A non-assisted artificial womb or artificial uterus is a device that allow for ectogenesis or extracorporeal pregnancy by growing an embryonic form outside the body of an organism (that would normally carry the embryo to term) without any human assistance. The aspect of non-assistance is the key distinction between the current artificial womb technology (AWT) in modern medical research, which still relies on human assistance. With this non-assisted hypothetical technology, a zygote or stem cells are used to create an embryo that is then incubated and monitored by artificial intelligence (AI) within a chamber composed of biocompatible material. The AI maintains the necessary conditions for the embryo to develop and thrive, proceeding to mimic organic labor and childbirth in order to best help the embryo adjust to the outside world. Ectogenesis—gestation, depicted in the science fiction movie The Matrix, is a fast approaching reality. This type of innovation presupposes that vertebrate wombs are not the only way for bearing humans or other similar forms of life. === Kinematic replication === Self-replication without binary fission, meiosis, mitosis (or any other form of cellular reproduction that involves division and growing) can be achieved. Xenobots are an example of kinematic replication. They are biobots, named after the African clawed frog (Xenopus laevis). Xenobots are cellular life forms designed by using artificial intelligence to build more of themselves by combining frog cells in a liquid medium. The term kinematic replication is usually reserved for biomolecules (e.g. DNA, RNA, prions, etc.) and artificially designed cellular forms (e.g. xenobots). === Machine constructive replication === Machine constructive replication mimics human traditional manufacturing but is entirely self-automated. Such constructive replication is a more general form of kinematic replication, which does not necessarily

AI Clip Makers: Free vs Paid (2026)

Shopping for the best AI clip maker? An AI clip maker is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI clip maker slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

AI Blog Writers: Free vs Paid (2026)

Shopping for the best AI blog writer? An AI blog writer is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI blog writer slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

AI Paragraph Rewriters: Free vs Paid (2026)

Curious about the best AI paragraph rewriter? An AI paragraph rewriter is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI paragraph rewriter slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

Foveated rendering

Foveated rendering is a rendering technique which uses an eye tracker integrated with a virtual reality headset to reduce the rendering workload by greatly reducing the image quality in the peripheral vision (outside of the zone gazed by the fovea). A less sophisticated variant called fixed foveated rendering doesn't utilise eye tracking and instead assumes a fixed focal point. == History == Research into foveated rendering dates back at least to 1991. At Tech Crunch Disrupt SF 2014, Fove unveiled a headset featuring foveated rendering. This was followed by a successful kickstarter in May 2015. At CES 2016, SensoMotoric Instruments (SMI) demoed a new 250 Hz eye tracking system and a working foveated rendering solution. It resulted from a partnership with camera sensor manufacturer Omnivision who provided the camera hardware for the new system. In July 2016, Nvidia demonstrated during SIGGRAPH a new method of foveated rendering claimed to be invisible to users. In February 2017, Qualcomm announced their Snapdragon 835 Virtual Reality Development Kit (VRDK) which includes foveated rendering support called Adreno Foveation. == Use == According to chief scientist Michael Abrash at Oculus, utilising foveated rendering in conjunction with sparse rendering and deep learning image reconstruction has the potential to require an order of magnitude fewer pixels to be rendered in comparison to a full image. Later, these results have been demonstrated and published. In December 2019, fixed foveated rendering support was added to the Oculus Quest SDK. A number of VR headsets have included on-board eye tracking to provide support for foveated rendering, including HTC's Vive Pro Eye (2019), Meta Quest Pro (2022), PlayStation VR2 (2023), and Apple Vision Pro (2024). In 2025, Valve announced the upcoming Steam Frame headset, which applies a variation of the technique known as "foveated streaming" for wireless streaming from a PC to the headset; the method similarly uses variance in bit rate, and is performed at the encoder level rather than the software level.

Léon Bottou

Léon-Yves Bottou (French pronunciation: [leɔ̃ bɔtu]; born 1965) is a researcher best known for his work in machine learning and data compression. His work presents stochastic gradient descent as a fundamental learning algorithm. He is also one of the main creators of the DjVu image compression technology (together with Yann LeCun and Patrick Haffner), and the maintainer of DjVuLibre, the open source implementation of DjVu. He is the original developer of the Lush programming language. == Life == Léon Bottou was born in France in 1965. He obtained the Diplôme d'Ingénieur from École Polytechnique in 1987, a Magistère de Mathématiques Fondamentales et Appliquées et d’Informatique from École Normale Supérieure in 1988, a Diplôme d'Études Approndies in Computer Science in 1988, in 1988, and a PhD from Université Paris-Sud in 1991. In 1988, in collaboration with Yann LeCun, he published SN, a software package for simulating artificial neural networks. His master's thesis concerned using Time Delay Neural Networks for speech recognition. He then joined the Adaptive Systems Research Department at AT&T Bell Laboratories in Holmdel, New Jersey, where he collaborated with Vladimir Vapnik on local learning algorithms. in 1992, he returned to France and founded Neuristique S.A., a company that produced machine learning tools and one of the first data mining software packages, including Lush, an object-oriented programming language based on C and Lisp designed for training and using large-scale neural networks. In 1995, he returned to Bell Laboratories, where he developed a number of new machine learning methods, such as Graph Transformer Networks (similar to conditional random field), and applied them to handwriting recognition and OCR. The bank check recognition system that he helped develop was widely deployed by NCR and other companies, reading over 10% of all the checks in the US in the late 1990s and early 2000s. In 1996, he joined AT&T Labs and worked primarily on the DjVu image compression technology, that is used by some websites, notably the Internet Archive, to distribute scanned documents. Between 2002 and 2010, he was a research scientist at NEC Laboratories in Princeton, New Jersey, where he focused on the theory and practice of machine learning with large-scale datasets, on-line learning, and stochastic optimization methods. He developed the open source software LaSVM for fast large-scale support vector machine, and stochastic gradient descent software for training linear SVM and Conditional Random Fields. In 2010 he joined the Microsoft adCenter in Redmond, Washington, and in 2012 became a Principal Researcher at Microsoft Research in New York City. In March 2015 he joined Facebook Artificial Intelligence Research, also in New York City, as a research lead. His work in gradient descent argued that both stochastic gradient descent and batch gradient descent reach similar levels of loss with the same number of training samples, but SGD is faster when running on large datasets. He also argued that second-order gradient descent methods, such as quasi-Newton methods, can be beneficial compared to plain SGD. See (Bottou et al 2018) for a review. He was program chair of the 2013 Conference on Neural Information Processing Systems and the 2009 International Conference on Machine Learning. In 2007, he was received one of the first Blavatnik Awards for Young Scientists from the Blavatnik Family Foundation and the New York Academy of Sciences.