Residual neural network

A residual neural network (also referred to as a residual network or ResNet) is a deep learning architecture in which the layers learn residual functions with reference to the layer inputs. It was developed in 2015 for image recognition, and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) of that year. As a point of terminology, "residual connection" refers to the specific architectural motif of x ↦ f ( x ) + x {\displaystyle x\mapsto f(x)+x} , where f {\displaystyle f} is an arbitrary neural network module. The motif had been used previously (see §History for details). However, the publication of ResNet made it widely popular for feedforward networks, appearing in neural networks that are seemingly unrelated to ResNet. The residual connection stabilizes the training and convergence of deep neural networks with hundreds of layers, and is a common motif in deep neural networks, such as transformer models (e.g., BERT, and GPT models such as ChatGPT), the AlphaGo Zero system, the AlphaStar system, and the AlphaFold system. == Mathematics == === Residual connection === In a multilayer neural network model, consider a (non-residual) subnetwork with a certain number of stacked layers (e.g., 2 or 3). Let H ( x ; α ) {\displaystyle H(x;\alpha )} denote the subnetwork. Suppose H ∗ {\displaystyle H^{}} is the desired optimal output of this subnetwork. Residual learning simply adds x {\displaystyle x} directly to the output, such that the optimal learned output now becomes be H ∗ − x {\displaystyle H^{}-x} , which is interpreted as a "residual" with respect to x {\displaystyle x} . The operation of "adding x {\displaystyle x} " is implemented via a "skip connection" that performs an identity mapping to connect the input of the subnetwork with its output. This connection is referred to as a "residual connection" in later work. Let F ( x ; α ) = H ( x ; a ) + x {\displaystyle F(x;\alpha )=H(x;a)+x} . The function F {\displaystyle F} is often represented by matrix multiplication interlaced with activation functions and normalization operations (e.g., batch normalization or layer normalization). As a whole, one of these subnetworks is referred to as a "residual block". A deep residual network is constructed by simply stacking these blocks. Long short-term memory (LSTM) has a memory mechanism that serves as a residual connection. In an LSTM without a forget gate, an input x t {\displaystyle x_{t}} is processed by a function F {\displaystyle F} and added to a memory cell c t {\displaystyle c_{t}} , resulting in c t + 1 = c t + F ( x t ) {\displaystyle c_{t+1}=c_{t}+F(x_{t})} . An LSTM with a forget gate essentially functions as a highway network. To stabilize the variance of the layers' inputs, it is recommended to replace the residual connections x + f ( x ) {\displaystyle x+f(x)} with x / L + f ( x ) {\displaystyle x/L+f(x)} , where L {\displaystyle L} is the total number of residual layers. === Projection connection === If the function F {\displaystyle F} is of type F : R n → R m {\displaystyle F:\mathbb {R} ^{n}\to \mathbb {R} ^{m}} where n ≠ m {\displaystyle n\neq m} , then F ( x ) + x {\displaystyle F(x)+x} is undefined. To handle this special case, a projection connection is used: y = F ( x ) + P ( x ) {\displaystyle y=F(x)+P(x)} where P {\displaystyle P} is typically a linear projection, defined by P ( x ) = M x {\displaystyle P(x)=Mx} where M {\displaystyle M} is a m × n {\displaystyle m\times n} matrix. The matrix is trained via backpropagation, as is any other parameter of the model. === Signal propagation === The introduction of identity mappings facilitates signal propagation in both forward and backward paths. ==== Forward propagation ==== If the output of the ℓ {\displaystyle \ell } -th residual block is the input to the ( ℓ + 1 ) {\displaystyle (\ell +1)} -th residual block (assuming no activation function between blocks), then the ( ℓ + 1 ) {\displaystyle (\ell +1)} -th input is: x ℓ + 1 = F ( x ℓ ) + x ℓ {\displaystyle x_{\ell +1}=F(x_{\ell })+x_{\ell }} Applying this formulation recursively, e.g.: x ℓ + 2 = F ( x ℓ + 1 ) + x ℓ + 1 = F ( x ℓ + 1 ) + F ( x ℓ ) + x ℓ {\displaystyle {\begin{aligned}x_{\ell +2}&=F(x_{\ell +1})+x_{\ell +1}\\&=F(x_{\ell +1})+F(x_{\ell })+x_{\ell }\end{aligned}}} yields the general relationship: x L = x ℓ + ∑ i = ℓ L − 1 F ( x i ) {\displaystyle x_{L}=x_{\ell }+\sum _{i=\ell }^{L-1}F(x_{i})} where L {\textstyle L} is the index of a residual block and ℓ {\textstyle \ell } is the index of some earlier block. This formulation suggests that there is always a signal that is directly sent from a shallower block ℓ {\textstyle \ell } to a deeper block L {\textstyle L} . ==== Backward propagation ==== The residual learning formulation provides the added benefit of mitigating the vanishing gradient problem to some extent. However, it is crucial to acknowledge that the vanishing gradient issue is not the root cause of the degradation problem, which is tackled through the use of normalization. To observe the effect of residual blocks on backpropagation, consider the partial derivative of a loss function E {\displaystyle {\mathcal {E}}} with respect to some residual block input x ℓ {\displaystyle x_{\ell }} . Using the equation above from forward propagation for a later residual block L > ℓ {\displaystyle L>\ell } : ∂ E ∂ x ℓ = ∂ E ∂ x L ∂ x L ∂ x ℓ = ∂ E ∂ x L ( 1 + ∂ ∂ x ℓ ∑ i = ℓ L − 1 F ( x i ) ) = ∂ E ∂ x L + ∂ E ∂ x L ∂ ∂ x ℓ ∑ i = ℓ L − 1 F ( x i ) {\displaystyle {\begin{aligned}{\frac {\partial {\mathcal {E}}}{\partial x_{\ell }}}&={\frac {\partial {\mathcal {E}}}{\partial x_{L}}}{\frac {\partial x_{L}}{\partial x_{\ell }}}\\&={\frac {\partial {\mathcal {E}}}{\partial x_{L}}}\left(1+{\frac {\partial }{\partial x_{\ell }}}\sum _{i=\ell }^{L-1}F(x_{i})\right)\\&={\frac {\partial {\mathcal {E}}}{\partial x_{L}}}+{\frac {\partial {\mathcal {E}}}{\partial x_{L}}}{\frac {\partial }{\partial x_{\ell }}}\sum _{i=\ell }^{L-1}F(x_{i})\end{aligned}}} This formulation suggests that the gradient computation of a shallower layer, ∂ E ∂ x ℓ {\textstyle {\frac {\partial {\mathcal {E}}}{\partial x_{\ell }}}} , always has a later term ∂ E ∂ x L {\textstyle {\frac {\partial {\mathcal {E}}}{\partial x_{L}}}} that is directly added. Even if the gradients of the F ( x i ) {\displaystyle F(x_{i})} terms are small, the total gradient ∂ E ∂ x ℓ {\textstyle {\frac {\partial {\mathcal {E}}}{\partial x_{\ell }}}} resists vanishing due to the added term ∂ E ∂ x L {\textstyle {\frac {\partial {\mathcal {E}}}{\partial x_{L}}}} . == Variants of residual blocks == === Basic block === A basic block is the simplest building block studied in the original ResNet. This block consists of two sequential 3x3 convolutional layers and a residual connection. The input and output dimensions of both layers are equal. === Bottleneck block === A bottleneck block consists of three sequential convolutional layers and a residual connection. The first layer in this block is a 1×1 convolution for dimension reduction (e.g., to 1/2 of the input dimension); the second layer performs a 3×3 convolution; the last layer is another 1×1 convolution for dimension restoration. The models of ResNet-50, ResNet-101, and ResNet-152 are all based on bottleneck blocks. === Pre-activation block === The pre-activation residual block applies activation functions before applying the residual function F {\displaystyle F} . Formally, the computation of a pre-activation residual block can be written as: x ℓ + 1 = F ( ϕ ( x ℓ ) ) + x ℓ {\displaystyle x_{\ell +1}=F(\phi (x_{\ell }))+x_{\ell }} where ϕ {\displaystyle \phi } can be any activation (e.g. ReLU) or normalization (e.g. LayerNorm) operation. This design reduces the number of non-identity mappings between residual blocks, and allows an identity mapping directly from the input to the output. This design was used to train models with 200 to over 1000 layers, and was found to consistently outperform variants where the residual path is not an identity function. The pre-activation ResNet with 200 layers took 3 weeks to train for ImageNet on 8 GPUs in 2016. Since GPT-2, transformer blocks have been mostly implemented as pre-activation blocks. This is often referred to as "pre-normalization" in the literature of transformer models. == Applications == Originally, ResNet was designed for computer vision. All transformer architectures include residual connections. Indeed, very deep transformers cannot be trained without them. The original ResNet paper made no claim on being inspired by biological systems. However, later research has related ResNet to biologically-plausible algorithms. A study published in Science in 2023 disclosed the complete connectome of an insect brain (specifically that of a fruit fly larva). This study discovered "multilayer shortcuts" that resemble the skip connections in artificial neural networks, including ResNets. == History == === Previous work === Residual connections were noticed in neu

Point-set registration

In computer vision, pattern recognition, and robotics, point-set registration, also known as point-cloud registration or scan matching, is the process of finding a spatial transformation (e.g., scaling, rotation and translation) that aligns two point clouds. The purpose of finding such a transformation includes merging multiple data sets into a globally consistent model (or coordinate frame), and mapping a new measurement to a known data set to identify features or to estimate its pose. Raw 3D point cloud data are typically obtained from Lidars and RGB-D cameras. 3D point clouds can also be generated from computer vision algorithms such as triangulation, bundle adjustment, and more recently, monocular image depth estimation using deep learning. For 2D point set registration used in image processing and feature-based image registration, a point set may be 2D pixel coordinates obtained by feature extraction from an image, for example corner detection. Point cloud registration has extensive applications in autonomous driving, motion estimation and 3D reconstruction, object detection and pose estimation, robotic manipulation, simultaneous localization and mapping (SLAM), panorama stitching, virtual and augmented reality, and medical imaging. As a special case, registration of two point sets that only differ by a 3D rotation (i.e., there is no scaling and translation), is called the Wahba Problem and also related to the orthogonal procrustes problem. == Formulation == The problem may be summarized as follows: Let { M , S } {\displaystyle \lbrace {\mathcal {M}},{\mathcal {S}}\rbrace } be two finite size point sets in a finite-dimensional real vector space R d {\displaystyle \mathbb {R} ^{d}} , which contain M {\displaystyle M} and N {\displaystyle N} points respectively (e.g., d = 3 {\displaystyle d=3} recovers the typical case of when M {\displaystyle {\mathcal {M}}} and S {\displaystyle {\mathcal {S}}} are 3D point sets). The problem is to find a transformation to be applied to the moving "model" point set M {\displaystyle {\mathcal {M}}} such that the difference (typically defined in the sense of point-wise Euclidean distance) between M {\displaystyle {\mathcal {M}}} and the static "scene" set S {\displaystyle {\mathcal {S}}} is minimized. In other words, a mapping from R d {\displaystyle \mathbb {R} ^{d}} to R d {\displaystyle \mathbb {R} ^{d}} is desired which yields the best alignment between the transformed "model" set and the "scene" set. The mapping may consist of a rigid or non-rigid transformation. The transformation model may be written as T {\displaystyle T} , using which the transformed, registered model point set is: The output of a point set registration algorithm is therefore the optimal transformation T ⋆ {\displaystyle T^{\star }} such that M {\displaystyle {\mathcal {M}}} is best aligned to S {\displaystyle {\mathcal {S}}} , according to some defined notion of distance function dist ⁡ ( ⋅ , ⋅ ) {\displaystyle \operatorname {dist} (\cdot ,\cdot )} : where T {\displaystyle {\mathcal {T}}} is used to denote the set of all possible transformations that the optimization tries to search for. The most popular choice of the distance function is to take the square of the Euclidean distance for every pair of points: where ‖ ⋅ ‖ 2 {\displaystyle \|\cdot \|_{2}} denotes the vector 2-norm, s m {\displaystyle s_{m}} is the corresponding point in set S {\displaystyle {\mathcal {S}}} that attains the shortest distance to a given point m {\displaystyle m} in set M {\displaystyle {\mathcal {M}}} after transformation. Minimizing such a function in rigid registration is equivalent to solving a least squares problem. == Types of algorithms == When the correspondences (i.e., s m ↔ m {\displaystyle s_{m}\leftrightarrow m} ) are given before the optimization, for example, using feature matching techniques, then the optimization only needs to estimate the transformation. This type of registration is called correspondence-based registration. On the other hand, if the correspondences are unknown, then the optimization is required to jointly find out the correspondences and transformation together. This type of registration is called simultaneous pose and correspondence registration. === Rigid registration === Given two point sets, rigid registration yields a rigid transformation which maps one point set to the other. A rigid transformation is defined as a transformation that does not change the distance between any two points. Typically such a transformation consists of translation and rotation. In rare cases, the point set may also be mirrored. In robotics and computer vision, rigid registration has the most applications. === Non-rigid registration === Given two point sets, non-rigid registration yields a non-rigid transformation which maps one point set to the other. Non-rigid transformations include affine transformations such as scaling and shear mapping. However, in the context of point set registration, non-rigid registration typically involves nonlinear transformation. If the eigenmodes of variation of the point set are known, the nonlinear transformation may be parametrized by the eigenvalues. A nonlinear transformation may also be parametrized as a thin plate spline. === Other types === Some approaches to point set registration use algorithms that solve the more general graph matching problem. However, the computational complexity of such methods tend to be high and they are limited to rigid registrations. In this article, we will only consider algorithms for rigid registration, where the transformation is assumed to contain 3D rotations and translations (possibly also including a uniform scaling). The PCL (Point Cloud Library) is an open-source framework for n-dimensional point cloud and 3D geometry processing. It includes several point registration algorithms. == Correspondence-based registration == Correspondence-based methods assume the putative correspondences m ↔ s m {\displaystyle m\leftrightarrow s_{m}} are given for every point m ∈ M {\displaystyle m\in {\mathcal {M}}} . Therefore, we arrive at a setting where both point sets M {\displaystyle {\mathcal {M}}} and S {\displaystyle {\mathcal {S}}} have N {\displaystyle N} points and the correspondences m i ↔ s i , i = 1 , … , N {\displaystyle m_{i}\leftrightarrow s_{i},i=1,\dots ,N} are given. === Outlier-free registration === In the simplest case, one can assume that all the correspondences are correct, meaning that the points m i , s i ∈ R 3 {\displaystyle m_{i},s_{i}\in \mathbb {R} ^{3}} are generated as follows:where l > 0 {\displaystyle l>0} is a uniform scaling factor (in many cases l = 1 {\displaystyle l=1} is assumed), R ∈ SO ( 3 ) {\displaystyle R\in {\text{SO}}(3)} is a proper 3D rotation matrix ( SO ( d ) {\displaystyle {\text{SO}}(d)} is the special orthogonal group of degree d {\displaystyle d} ), t ∈ R 3 {\displaystyle t\in \mathbb {R} ^{3}} is a 3D translation vector and ϵ i ∈ R 3 {\displaystyle \epsilon _{i}\in \mathbb {R} ^{3}} models the unknown additive noise (e.g., Gaussian noise). Specifically, if the noise ϵ i {\displaystyle \epsilon _{i}} is assumed to follow a zero-mean isotropic Gaussian distribution with standard deviation σ i {\displaystyle \sigma _{i}} , i.e., ϵ i ∼ N ( 0 , σ i 2 I 3 ) {\displaystyle \epsilon _{i}\sim {\mathcal {N}}(0,\sigma _{i}^{2}I_{3})} , then the following optimization can be shown to yield the maximum likelihood estimate for the unknown scale, rotation and translation:Note that when the scaling factor is 1 and the translation vector is zero, then the optimization recovers the formulation of the Wahba problem. Despite the non-convexity of the optimization (cb.2) due to non-convexity of the set SO ( 3 ) {\displaystyle {\text{SO}}(3)} , seminal work by Berthold K.P. Horn showed that (cb.2) actually admits a closed-form solution, by decoupling the estimation of scale, rotation and translation. Similar results were discovered by Arun et al. In addition, in order to find a unique transformation ( l , R , t ) {\displaystyle (l,R,t)} , at least N = 3 {\displaystyle N=3} non-collinear points in each point set are required. More recently, Briales and Gonzalez-Jimenez have developed a semidefinite relaxation using Lagrangian duality, for the case where the model set M {\displaystyle {\mathcal {M}}} contains different 3D primitives such as points, lines and planes (which is the case when the model M {\displaystyle {\mathcal {M}}} is a 3D mesh). Interestingly, the semidefinite relaxation is empirically tight, i.e., a certifiably globally optimal solution can be extracted from the solution of the semidefinite relaxation. === Robust registration === The least squares formulation (cb.2) is known to perform arbitrarily badly in the presence of outliers. An outlier correspondence is a pair of measurements s i ↔ m i {\displaystyle s_{i}\leftrightarrow m_{i}} that departs from the generative model (cb.1). In this case, one can consider a differen

Use of artificial intelligence by the United States Department of Defense

The United States Department of Defense has been analyzing and employing military applications of artificial intelligence since at least 2014. The program initially focused on drones and other robots, but has also been using large language models for military research and analysis. The current US policy on lethal autonomous weapons is Department of Defense Directive 3000.09, updated in January 2023. == Background == The United States Department of Defense began developing lethal autonomous weapons as early as the Reagan administration. An early version of the Tomahawk missile could have been used to destroy Soviet ships without direct human control; the initiative was abandoned after the United States and the Soviet Union signed START I. By 2014, the United Kingdom, Israel, and Norway had already begun using missiles equipped with artificial intelligence systems. The Department of Defense established a policy on the use of artificial intelligence in 2012. == History == === 2016–2017: Carter secretaryship === In May 2016, secretary of defense Ash Carter stated that his Third Offset strategy would include utilizing artificial intelligence as a military advantage. The New York Times reported that year that the Department of Defense had tested an autonomous drone at an approximation of a Middle Eastern village at Camp Edwards. Deputy secretary of defense Robert O. Work, who advocated for developing artificial intelligence, told the Times that the United States needed to compete with China and Russia by having a tactical advantage they could not easily replicate. The initiative was developed by DARPA beginning in 2015. The use of artificial intelligence in the U.S. military was controversial within the department; in February, Paul Scharre, who worked for the Office of the Secretary of Defense in the secretaryships of Robert Gates and Leon Panetta, published a report about the risks of artificial intelligence for broad military applications. === 2017–2019: Mattis secretaryship === By 2017, the United States Air Force had already begun using artificial intelligence in military robots. The Air Force's use of Neurala, an artificial intelligence company, concerned officials in the Department of Defense after an investigation found that Neurala had accepted money from an investment firm with funding from a state-run Chinese company. The Department of Defense began heavily investing in artificial intelligence after Work established Project Maven, an initiative to encourage the development and integration of artificial intelligence in the military, in April 2017. In May 2018, secretary of defense Jim Mattis privately expressed to president Donald Trump that he needed to establish a national strategy on artificial intelligence, quoting an article from former secretary of state Henry Kissinger that called for a presidential commission on the technology. The Department of Defense established the Joint Artificial Intelligence Center the following month. Google began working with the Department of Defense on analyzing drone footage as early as March. Google's involvement in the initiative led to protests from employees and mass resignations. Seeking to quell internal unrest, Google stated it would not renew its contract with the Department of Defense in June. The Department of Defense announced an artificial intelligence contract with Microsoft in October. === 2025–present: Hegseth secretaryship === In December 2025, secretary of defense Pete Hegseth announced GenAI.mil, an artificial intelligence platform for the Department of Defense. In a video announcing the platform, Hegseth stated that Department of Defense workers would be able to "conduct deep research, format documents and even analyze video or imagery." The Department of Defense contracted first Gemini by Google, then ChatGPT by OpenAI, and finally Grok by xAI for the platform. Claude by Anthropic was also contracted by the Department of Defense and was in use on secure servers until it was revealed that Claude had been used in the 2026 operation to capture Nicolás Maduro, who was at the time the leader of Venezuela. This revelation sparked a high-profile dispute over Anthropic's ability to constrain Claude's useage, resulting in the termination of Anthropic's $200 million defense contract. The Department of Defense also moved to label Anthropic a supply chain risk, which was later blocked by a federal judge.

It's the Most Terrible Time of the Year

It's the Most Terrible Time of the Year is an AI-generated television commercial created for McDonald's Netherlands by TBWA\Neboko and The Sweetshop. It was released on 6 December 2025 before being pulled four days later due to negative reception over its use of generative artificial intelligence and its cynical, negative depiction of the holiday season. == Plot == On a bleak, snowy day, various people in the city experience different kinds of mishaps during the Christmas season. Among other incidents, families struggle with their huge loads of presents; Santa Claus gets stuck in traffic; a Christmas tree "redecorates" a man's home, sending him through the window; another family puts up with annoying relatives and a burnt Christmas dinner. Because of all this chaos, a man decides to find refuge in a McDonald's outlet. A Christmas choir finishes singing the jingle "It's the Most Terrible Time of the Year" with the call to action to "hide out in McDonald's till January's here". == Campaign == It's the Most Terrible Time of the Year is a 45-second television commercial made by Dutch agency TBWA\Neboko with involvement of United States-based film production studio The Sweetshop. The advertisement was produced heavily with generative artificial intelligence (AI) following the trend set by other brands such as Coca-Cola and Toys "R" Us. McDonald's Netherlands, the client, released a statement that the commercial was meant to depict "the stressful moments during the holidays in the Netherlands". The commercial also used Andy Williams's "It's the Most Wonderful Time of the Year" with lyrics changed to fit with the concept of the advertisement. According to The Sweetshop, the production of the advertisement took "seven weeks". It also added that much effort was put into the commercial compared to the traditional process. Ten people of its in-house AI engine The Gardening Club worked on the project. Los Angeles-based directors Mark Potoka and Matt Spicer were initially credited to be involved in the film but they resigned due to being sidelined from the production process. == Reception == The advertisement was released on McDonald's Netherlands' YouTube channel on 6 December 2025. It had a negative reception over the use of generative AI and the "cynical" concept of the work's story. The video was made private on 9 December 2025. The Sweetshop stated that the production of the advertisement took human effort. McDonald's Netherlands, while stating the original intent of the commercial, released a statement after its pullout that, for many of its customers, the holiday season is the "most wonderful time of the year".

Mars Plus

Mars Plus is a 1994 science fiction novel by American writer Frederik Pohl and Thomas T. Thomas. It is the sequel to Pohl's 1976 novel Man Plus, which is about a cyborg, Roger Torraway, who is designed to operate in the harsh Martian environment, so that humans can start to colonize Mars. Mars Plus is set fifty years after the first novel. Young Demeter Coghlan travels to Mars, now settled by humans and cyborgs, and finds herself amidst a rebellion by the colonists. == Plot == In Man Plus, set in the not-too-distant future, with threat of the Cold War becoming a fighting war, people plan for the colonization of Mars to escape the seemingly-inevitable Armageddon. The American government begins a cyborg program to create a being capable of surviving the harsh Martian environment: a "Man Plus" called Roger Torraway who is converted from man to cyborg. While his cyborg body is adapted to Mars, he feels strange at first. As more nations develop cyborgs, the computer networks of Earth become sentient. Mars Plus is set fifty years after the first novel, when Mars is settled by humans and cyborgs. The cyborg Torroway is in the novel, but he is not the main character. The protagonist is Demeter Coghlan, a young woman from Earth who travels to Mars. Demeter is seeking information about a canyon that she believes may be significant if the colonists begin to convert Mars to an Earth-like planet. Amidst a backdrop of spies and newly dispatched Earth diplomats, the inexperienced Demeter senses that tensions are rising on the planet. She is further disoriented due to recovering from an accident. Despite the risks in the region, Demeter has intense sexual encounters with some of the local colonists. When the locals rebel against the surveillance set up by the computer network, Demeter is kidnapped by the computer network. == Reception == The reviewer from SFBook Reviews criticizes the book, saying "nothing really happens" and stating that there is no linkage to Man Plus apart from the presence of the cyborg Torraway; moreover, the reviewer states that the questions posed in the first novel are not answered. SF Reviews calls Mars Plus "...not as good as Man Plus but...not bad", and it is praised for "...some nice touches: Demeter continuously forgetting to think about geology; her careless dictation to the computer and her irresistible urges for wild sex." SF Reviews criticizes the writing in Mars Plus for being "...a little careless in places" and in need of more "...more crafting and pruning."

AI content watermarking

AI content watermarking is the process of embedding imperceptible yet detectable signals into content generated by artificial intelligence systems, such as text, images, audio, or video. The technique allows the content to be traced and identified as machine-generated without compromising its quality for the end user. AI watermarking has emerged as a key approach to address growing concerns about misinformation, deepfakes, copyright infringement, and the traceability of synthetic content in the context of the rapid development of generative artificial intelligence. Unlike traditional visible watermarks used in photography, AI content watermarks are typically invisible to humans and can only be detected and deciphered algorithmically. The concept is distinct from the watermarking of AI models themselves (to prevent model theft) and from the watermarking of training data (to combat unauthorized data use). Modern AI watermarking schemes are typically formalized as a pair of algorithms, an embedding (or generation) algorithm and a detection algorithm, sharing a secret key, whose performance is evaluated along three competing axes: quality (the watermark must not noticeably degrade outputs), detectability (the watermark must be statistically distinguishable from unwatermarked content), and robustness (the watermark must persist under adversarial or incidental modifications). == Background == Digital watermarking has been used for decades to protect physical and digital media, from paper currency to photographs. Classical schemes typically embedded a fixed bit-string into a fixed cover signal, with robustness criteria defined against a small fixed set of distortions such as JPEG compression or additive Gaussian noise. The rapid advancement of generative AI in the early 2020s, however, created a new and qualitatively different demand: rather than protecting a single artifact, watermarks for AI content must be embedded automatically across an open-ended distribution of generated outputs while remaining robust to a much wider class of adversarial transformations, including paraphrasing, image regeneration via diffusion models, and re-recording. Large image generation models such as DALL-E, Stable Diffusion, and Midjourney, along with large language models like ChatGPT, made it possible to produce highly realistic synthetic text, images, audio, and video at scale, raising significant ethical and security concerns. In July 2023, the Biden administration secured voluntary commitments from leading AI companies, including OpenAI, Alphabet, Meta, and Amazon, to develop watermarking and other provenance technologies to help users identify AI-generated content. == Formal definitions and design goals == Most modern AI watermarking schemes can be formalized as a pair of algorithms ( W m , D e t e c t ) {\displaystyle ({\mathsf {Wm}},{\mathsf {Detect}})} parameterized by a secret key k {\displaystyle k} . The embedding algorithm W m {\displaystyle {\mathsf {Wm}}} takes a generative model M {\displaystyle M} (and optionally a prompt) and returns a watermarked output x {\displaystyle x} ; the detection algorithm D e t e c t ( x , k ) {\displaystyle {\mathsf {Detect}}(x,k)} outputs a real-valued score (typically a p-value or log-likelihood ratio) used to decide whether x {\displaystyle x} was produced by the watermarked generator. The literature evaluates such schemes along several largely conflicting criteria: Criteria for evaluation include imperceptibility or quality preservation, measured for text via perplexity and human preference judgments, and for images and audio via metrics such as PSNR, SSIM, LPIPS, or PESQ. Detectability is typically expressed as the true positive rate at a fixed false positive rate (e.g. 1% or 10^-6), or as the number of tokens or pixels needed to reach a given confidence level. Robustness refers to the requirement that the watermark should survive expected modifications like JPEG or MP3 compression, cropping, noise, paraphrasing, or machine translation. Distortion-freeness is a stronger property requiring that the marginal distribution of any single watermarked output be statistically identical to the unwatermarked model's distribution. Schemes due to Aaronson, Christ et al., and Kuditipudi et al. are distortion-free in this sense, while the original Kirchenbauer et al. scheme is not. Forgery resistance or unforgeability means an adversary without the secret key should be unable to produce content that passes detection. == Techniques == AI watermarking techniques vary significantly depending on the type of content being watermarked. At its core, the process involves two main stages: embedding (or encoding) the watermark, and detection. There are two primary methods for embedding: watermarking during content generation, which requires access to the AI model itself but is generally more robust, and post-generation watermarking, which can be applied to content from any source, including closed-source models. Watermarks can be broadly classified as visible, including overt marks such as logos or text overlays, or imperceptible, which are detectable only by algorithms. They can also be classified by durability: robust watermarks are designed to withstand common transformations such as compression, cropping, and re-encoding, while fragile watermarks are easily destroyed by any alteration, making them useful for tamper detection. A further axis distinguishes zero-bit watermarks, which only signal "this content was generated by model M," from multi-bit watermarks, which embed an arbitrary payload (such as a user identifier) that can be recovered at detection time. === Text === Text watermarking is considered one of the most challenging modalities because natural language offers relatively limited redundancy compared to images or audio. Modern approaches for large language models alter the autoregressive sampling process so that some statistical signature is left in the choice of tokens, while leaving the surface form of the text unchanged. The literature distinguishes three main families of generation-time text watermarks. Logit-biasing schemes (e.g. KGW) add a fixed bias δ {\displaystyle \delta } to a pseudorandomly selected subset of vocabulary logits before softmax sampling. Reweighting or sampling-based schemes (e.g. SynthID-Text) compose multiple pseudorandom tournaments over the model's full distribution. Distortion-free schemes based on the Gumbel-max trick or inverse transform sampling (Aaronson 2022; Kuditipudi et al. 2023; Christ et al. 2024) preserve the marginal output distribution of the model. ==== KGW: token-probability shifting ==== The pioneering "green list / red list" scheme of Kirchenbauer et al. (KGW), introduced at ICML 2023, is the foundation for most subsequent text watermarks. At each decoding step t {\displaystyle t} , a pseudorandom function (PRF) keyed by a secret k {\displaystyle k} is applied to a context window of h {\displaystyle h} previous tokens to deterministically partition the vocabulary V {\displaystyle V} of size N {\displaystyle N} into a "green list" G ⊂ V {\displaystyle G\subset V} of size γ N {\displaystyle \gamma N} and its complement, the "red list" R = V ∖ G {\displaystyle R=V\setminus G} , where γ ∈ ( 0 , 1 ) {\displaystyle \gamma \in (0,1)} (typically γ = 1 / 2 {\displaystyle \gamma =1/2} ) is the green fraction. A logits processor then increments every green-list logit by a fixed bias δ > 0 {\displaystyle \delta >0} before softmax: ℓ v ′ = ℓ v + δ ⋅ 1 [ v ∈ G ] {\displaystyle \ell '_{v}=\ell _{v}+\delta \cdot \mathbf {1} [v\in G]} so that, after sampling, green tokens are over-represented but generation is not constrained to green tokens alone; high-entropy positions tolerate the bias gracefully, while low-entropy positions (where one token dominates the logits) override the watermark and preserve correctness on factual content. Detection requires only the secret key and the candidate text, not the language model itself. The detector recomputes the partition g ( ⋅ ) {\displaystyle g(\cdot )} for each token, counts the number of green hits | G | hits {\displaystyle |G|_{\text{hits}}} in a sequence of length T {\displaystyle T} , and computes a one-proportion z-test statistic: z = | G | hits − γ T T γ ( 1 − γ ) {\displaystyle z={\frac {|G|_{\text{hits}}-\gamma T}{\sqrt {T\gamma (1-\gamma )}}}} Under the null hypothesis that the text was written by an unwatermarked source (human or another model), the green-hit count is approximately binomially distributed with mean γ T {\displaystyle \gamma T} ; a large positive z {\displaystyle z} rejects the null hypothesis. The original paper reports that fewer than 25 watermarked tokens are sufficient to detect a watermark with a false positive rate below 10^-5 on the OPT-1.3B model. A follow-up study by the same group documented robustness under temperature sampling, top-p (nucleus) sampling, and human paraphrasing, and proposed sliding-window

Fuzzy concept

A fuzzy concept is an idea of which the boundaries of application can vary considerably according to context or conditions, instead of being fixed once and for all. That means the idea is somewhat vague or imprecise. Yet it is not unclear or meaningless. It has a definite meaning, which can often be made more exact with further elaboration and specification — including a closer definition of the context in which the concept is used. The inverse of a "fuzzy concept" is a "crisp concept" (i.e. a precise concept). Fuzzy concepts are often used to navigate imprecision in the real world, when precise information is not available and an approximate indication is sufficient to be helpful. Although the linguist George Philip Lakoff already defined the semantics of a fuzzy concept in 1973 (inspired by an unpublished 1971 paper by Eleanor Rosch,) the term "fuzzy concept" rarely received a standalone entry in dictionaries, handbooks and encyclopedias. Sometimes it was defined in encyclopedia articles on fuzzy logic, or it was simply equated with a mathematical “fuzzy set”. A fuzzy concept can be "fuzzy" for many different reasons in different contexts. This makes it harder to provide a precise definition that covers all cases. Paradoxically, the definition of fuzzy concepts may itself be somewhat "fuzzy". Lotfi A. Zadeh, known as "the father of fuzzy logic", claimed that "vagueness connotes insufficient specificity, whereas fuzziness connotes unsharpness of class boundaries". Not all scholars agree. With increasing academic literature on the subject, the term "fuzzy concept" is now more widely recognized as a philosophical, linguistic or scientific category, and the study of the characteristics of fuzzy concepts and fuzzy language is known as fuzzy semantics. “Fuzzy logic” has become a generic term for many different kinds of many-valued logics, and is applied in many different areas of research, computer programming and industrial design. For engineers, "Fuzziness is imprecision or vagueness of definition." For computer scientists, a fuzzy concept is an idea which is "to an extent applicable" in a situation. It means that the concept can have gradations of significance or unsharp (variable) boundaries of application — a "fuzzy statement" is a statement which is true "to some extent", and that extent can often be represented by a scaled value (a score). For mathematicians, a "fuzzy concept" is usually a fuzzy set or a combination of such sets (see fuzzy mathematics and fuzzy set theory). In cognitive linguistics, the things that belong to a "fuzzy category" exhibit gradations of family resemblance, and the borders of the category are not clearly defined. Through most of the 20th century, the idea of reasoning with fuzzy concepts faced considerable resistance from Western academic elites. They did not want to endorse the use of imprecise concepts in research or argumentation, and they often regarded fuzzy logic with suspicion, derision or even hostility. That may partly explain why the idea of a "fuzzy concept" did not get a separate entry in encyclopedias, handbooks and dictionaries. Yet although people might not be aware of it, the use of fuzzy concepts has risen gigantically in all walks of life from the 1970s onward. That is mainly due to advances in electronic engineering, fuzzy mathematics and digital computer programming. The new technology allows very complex inferences about "variations on a theme" to be anticipated and fixed in a program. The Perseverance Mars rover, a driverless NASA vehicle used to explore the Jezero crater on the planet Mars, features fuzzy logic programming that steers it through rough terrain. Similarly, to the North, the Chinese Mars rover Zhurong used fuzzy logic algorithms to calculate its travel route in Utopia Planitia from sensor data. New neuro-fuzzy computational methods make it possible for machines to identify, measure, adjust and respond to fine gradations of significance with great precision. It means that practically useful concepts can be coded, sharply defined, and applied to all kinds of tasks, even if ordinarily these concepts are never exactly defined. Nowadays engineers, statisticians and programmers often represent fuzzy concepts mathematically, using fuzzy logic, fuzzy values, fuzzy variables and fuzzy sets (see also fuzzy set theory). Fuzzy logic is not "woolly thinking", but a "precise logic of imprecision" which reasons with graded concepts and gradations of truth. Fuzzy concepts and fuzzy logic often play a significant role in artificial intelligence programming, for example because they can model human cognitive processes more easily than other methods. == Origins == Vagueness and fuzziness have probably always been a part of human experience. In the West, ancient texts show that philosophers and scientists were already thinking critically about this in classical antiquity. Most often, they regarded vagueness as a problem: as an obstacle to clear thinking, as a source of confusion, or as an evasive tactic. It got in the way of providing clear orientation, guidance, direction and leadership. Therefore, vagueness became associated with a hermeneutic of suspicion — it was considered as something to avoid, as something undesirable. By contrast, in the ancient Chinese tradition of Daoist thought of Laozi and Zhuang Zhou, "vagueness is not regarded with suspicion, but is simply an acknowledged characteristic of the world around us" — a subject for meditation and a source of insight. === Sorites paradox === The ancient Sorites paradox raised the logical problem, of how we could exactly define the threshold at which a change in quantitative gradation turns into a qualitative or categorical difference. With some physical processes, this threshold seems relatively easy to identify. For example, water turns into steam at 100 °C or 212 °F. Of course, the boiling point depends partly on atmospheric pressure, which decreases at higher altitudes; it is also affected by the level of humidity — in that sense, the boiling point is "somewhat fuzzy", because it can vary under different conditions. Nevertheless, for every altitude, level of air pressure and degree of humidity, we can predict accurately what the boiling point will be, if we know the relevant conditions. With many other processes and gradations, however, the point of change is much more difficult to locate, and remains somewhat vague. Thus, the boundaries between qualitatively different things may be unsharp: we know that there are boundaries, but we cannot define them exactly. For example, to identify "the oldest city in the world", we have to define what counts as a city, and at what point a growing human settlement becomes a city. === The continuum fallacy and Loki's wager === According to the modern idea of the continuum fallacy, the fact that a statement is to an extent vague, does not automatically mean that it has no validity. The question then arises, of how (by what method or approach) we could ascertain and define the validity that the fuzzy statement does have. The Nordic myth of Loki's wager suggested that concepts that lack precise meanings or lack precise boundaries of application cannot be operated with, because they evade any clear definition. However, the 20th-century idea of "fuzzy concepts" proposes that "somewhat vague terms" can be operated with, because we can explicate and define the variability of their application — by assigning numbers to gradations of applicability. This idea sounds simple enough, but it had large implications. === Precursors and pioneers === In Western civilization, the intellectual recognition of fuzzy concepts has been traced back to a diversity of famous and less well-known thinkers, including (among many others) Eubulides, Epicurus, Plato, Cicero, William Ockham and John Buridan, Georg Wilhelm Friedrich Hegel, Karl Marx and Friedrich Engels, Friedrich Nietzsche, William James, Hugh MacColl, Charles S. Peirce, Hans Reichenbach, Carl Gustav Hempel, Max Black, Arto Salomaa, Ludwig Wittgenstein, Jan Łukasiewicz, Emil Leon Post, Alfred Tarski, Georg Cantor, Nicolai A. Vasiliev, Kurt Gödel, Stanisław Jaśkowski, Willard Van Orman Quine, George J. Klir, Petr Hájek, Joseph Goguen, Ronald R. Yager, Enrique Héctor Ruspini, Jan Pavelka, Didier Dubois, Bernadette Bouchon-Meunier, and Donald Knuth. Across at least two and a half millennia, all of them had something to say about graded concepts with unsharp boundaries. This suggests at least that the awareness of the existence of concepts with "fuzzy" characteristics, in one form or another, has a very long history in human thought. Quite a few 20th century logicians, mathematicians and philosophers also tried to analyze the characteristics of fuzzy concepts as a recognized species, sometimes with the aid of some kind of many-valued logic or substructural logic. An early attempt in the post-WW2 era to create a mathematical theory of sets with gradations of