Sharpness aware minimization

Sharpness aware minimization

Sharpness Aware Minimization (SAM) is an optimization algorithm used in machine learning that aims to improve model generalization. The method seeks to find model parameters that are located in regions of the loss landscape with uniformly low loss values, rather than parameters that only achieve a minimal loss value at a single point. This approach is described as finding "flat" minima instead of "sharp" ones. The rationale is that models trained this way are less sensitive to variations between training and test data, which can lead to better performance on unseen data. The algorithm was introduced in a 2020 paper by a team of researchers including Pierre Foret, Ariel Kleiner, Hossein Mobahi, and Behnam Neyshabur. == Underlying Principle == SAM modifies the standard training objective by minimizing a "sharpness-aware" loss. This is formulated as a minimax problem where the inner objective seeks to find the highest loss value in the immediate neighborhood of the current model weights, and the outer objective minimizes this value: min w max ‖ ϵ ‖ p ≤ ρ L train ( w + ϵ ) + λ ‖ w ‖ 2 2 {\displaystyle \min _{w}\max _{\|\epsilon \|_{p}\leq \rho }L_{\text{train}}(w+\epsilon )+\lambda \|w\|_{2}^{2}} In this formulation: w {\displaystyle w} represents the model's parameters (weights). L train {\displaystyle L_{\text{train}}} is the loss calculated on the training data. ϵ {\displaystyle \epsilon } is a perturbation applied to the weights. ρ {\displaystyle \rho } is a hyperparameter that defines the radius of the neighborhood (an L p {\displaystyle L_{p}} ball) to search for the highest loss. An optional L2 regularization term, scaled by λ {\displaystyle \lambda } , can be included. A direct solution to the inner maximization problem is computationally expensive. SAM approximates it by taking a single gradient ascent step to find the perturbation ϵ {\displaystyle \epsilon } . This is calculated as: ϵ ( w ) = ρ ∇ L train ( w ) ‖ ∇ L train ( w ) ‖ 2 {\displaystyle \epsilon (w)=\rho {\frac {\nabla L_{\text{train}}(w)}{\|\nabla L_{\text{train}}(w)\|_{2}}}} The optimization process for each training step involves two stages. First, an "ascent step" computes a perturbed set of weights, w adv = w + ϵ ( w ) {\displaystyle w_{\text{adv}}=w+\epsilon (w)} , by moving towards the direction of the highest local loss. Second, a "descent step" updates the original weights w {\displaystyle w} using the gradient calculated at these perturbed weights, ∇ L train ( w adv ) {\displaystyle \nabla L_{\text{train}}(w_{\text{adv}})} . This update is typically performed using a standard optimizer like SGD or Adam. == Application and Performance == SAM has been applied in various machine learning contexts, primarily in computer vision. Research has shown it can improve generalization performance in models such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) on image datasets including ImageNet, CIFAR-10, and CIFAR-100. The algorithm has also been found to be effective in training models with noisy labels, where it performs comparably to methods designed specifically for this problem. Some studies indicate that SAM and its variants can improve out-of-distribution (OOD) generalization, which is a model's ability to perform well on data from distributions not seen during training. Other areas where it has been applied include gradual domain adaptation and mitigating overfitting in scenarios with repeated exposure to training examples. == Limitations == A primary limitation of SAM is its computational cost. By requiring two gradient computations (one for the ascent and one for the descent) per optimization step, it approximately doubles the training time compared to standard optimizers. The theoretical convergence properties of SAM are still under investigation. Some research suggests that with a constant step size, SAM may not converge to a stationary point. The accuracy of the single gradient step approximation for finding the worst-case perturbation may also decrease during the training process. The effectiveness of SAM can also be domain-dependent. While it has shown benefits for computer vision tasks, its impact on other areas, such as GPT-style language models where each training example is seen only once, has been reported as limited in some studies. Furthermore, while SAM seeks flat minima, some research suggests that not all flat minima necessarily lead to good generalization. The algorithm also introduces the neighborhood size ρ {\displaystyle \rho } as a new hyperparameter, which requires tuning. == Research, Variants, and Enhancements == Active research on SAM focuses on reducing its computational overhead and improving its performance. Several variants have been proposed to make the algorithm more efficient. These include methods that attempt to parallelize the two gradient computations, apply the perturbation to only a subset of parameters, or reduce the number of computation steps required. Other approaches use historical gradient information or apply SAM steps intermittently to lower the computational burden. To improve performance and robustness, variants have been developed that adapt the neighborhood size based on model parameter scales (Adaptive SAM or ASAM) or incorporate information about the curvature of the loss landscape (Curvature Regularized SAM or CR-SAM). Other research explores refining the perturbation step by focusing on specific components of the gradient or combining SAM with techniques like random smoothing. Theoretical work continues to analyze the algorithm's behavior, including its implicit bias towards flatter minima and the development of broader frameworks for sharpness-aware optimization that use different measures of sharpness.

Empowerment (artificial intelligence)

Empowerment in the field of artificial intelligence formalises and quantifies (via information theory) the potential an agent perceives that it has to influence its environment. An agent which follows an empowerment maximising policy, acts to maximise future options (typically up to some limited horizon). Empowerment can be used as a (pseudo) utility function that depends only on information gathered from the local environment to guide action, rather than seeking an externally imposed goal, thus is a form of intrinsic motivation. The empowerment formalism depends on a probabilistic model commonly used in artificial intelligence. An autonomous agent operates in the world by taking in sensory information and acting to change its state, or that of the environment, in a cycle of perceiving and acting known as the perception-action loop. Agent state and actions are modelled by random variables ( S : s ∈ S , A : a ∈ A {\displaystyle S:s\in {\mathcal {S}},A:a\in {\mathcal {A}}} ) and time ( t {\displaystyle t} ). The choice of action depends on the current state, and the future state depends on the choice of action, thus the perception-action loop unrolled in time forms a causal bayesian network. == Definition == Empowerment ( E {\displaystyle {\mathfrak {E}}} ) is defined as the channel capacity ( C {\displaystyle C} ) of the actuation channel of the agent, and is formalised as the maximal possible information flow between the actions of the agent and the effect of those actions some time later. Empowerment can be thought of as the future potential of the agent to affect its environment, as measured by its sensors. E := C ( A t ⟶ S t + 1 ) ≡ max p ( a t ) I ( A t ; S t + 1 ) {\displaystyle {\mathfrak {E}}:=C(A_{t}\longrightarrow S_{t+1})\equiv \max _{p(a_{t})}I(A_{t};S_{t+1})} In a discrete time model, Empowerment can be computed for a given number of cycles into the future, which is referred to in the literature as 'n-step' empowerment. E ( A t n ⟶ S t + n ) = max p ( a t , . . . , a t + n − 1 ) I ( A t , . . . , A t + n − 1 ; S t + n ) {\displaystyle {\mathfrak {E}}(A_{t}^{n}\longrightarrow S_{t+n})=\max _{p(a_{t},...,a_{t+n-1})}I(A_{t},...,A_{t+n-1};S_{t+n})} The unit of empowerment depends on the logarithm base. Base 2 is commonly used in which case the unit is bits. === Contextual Empowerment === In general the choice of action (action distribution) that maximises empowerment varies from state to state. Knowing the empowerment of an agent in a specific state is useful, for example to construct an empowerment maximising policy. State-specific empowerment can be found using the more general formalism for 'contextual empowerment'. C {\displaystyle C} is a random variable describing the context (e.g. state). E ( A t n ⟶ S t + n ∣ C ) = ∑ c ∈ C p ( c ) E ( A t n ⟶ S t + n ∣ C = c ) {\displaystyle {\mathfrak {E}}(A_{t}^{n}\longrightarrow S_{t+n}{\mid }C)=\sum _{c{\in }C}p(c){\mathfrak {E}}(A_{t}^{n}\longrightarrow S_{t+n}{\mid }C=c)} == Application == Empowerment maximisation can be used as a pseudo-utility function to enable agents to exhibit intelligent behaviour without requiring the definition of external goals, for example balancing a pole in a cart-pole balancing scenario where no indication of the task is provided to the agent. Empowerment has been applied in studies of collective behaviour and in continuous domains. As is the case with Bayesian methods in general, computation of empowerment becomes computationally expensive as the number of actions and time horizon extends, but approaches to improve efficiency have led to usage in real-time control. Empowerment has been used for intrinsically motivated reinforcement learning agents playing video games, and in the control of underwater vehicles.

Probiv

Probiv (Russian: пробив, literally "to pierce" or "to punch through") is an illicit data market operating primarily in Russia, where personal information from restricted government and corporate databases is bought and sold through networks of corrupt officials and insiders. The probiv market operates as a parallel information economy built on corrupt officials from various sectors including traffic police, banks, telecommunications companies, and security services who sell access to restricted databases. For fees ranging from as little as $10 to several hundred dollars, buyers can obtain passport numbers, addresses, travel histories, vehicle registrations, and telecommunications records. The market operates through various channels, including specialized Telegram bots and darknet forums. == Notable uses == Probiv services have been utilized by diverse actors for various purposes. Investigative journalists have used the market to conduct high-profile investigations, including tracing the FSB unit allegedly behind the poisoning of Alexei Navalny. Russian police and security services themselves have routinely used the black market to track activists and opposition figures. Since Russia's invasion of Ukraine, Ukrainian intelligence services have exploited the market to identify Russian military officials. == Government response == In late 2024, Russian authorities introduced legislation imposing penalties of up to ten years in prison for accessing or distributing leaked data. Several operators of probiv services, including the teams behind Usersbox and Solaris, have been arrested. However, the crackdown appears to have had unintended consequences. Many operators have relocated their businesses abroad, where they operate with fewer constraints. Some services that previously cooperated with Russian authorities have severed those ties and moved staff out of the country.

PDE surface

PDE surfaces are used in geometric modelling and computer graphics for creating smooth surfaces conforming to a given boundary configuration. PDE surfaces use partial differential equations to generate a surface which usually satisfy a mathematical boundary value problem. PDE surfaces were first introduced into the area of geometric modelling and computer graphics by two British mathematicians, Malcolm Bloor and Michael Wilson. == Technical details == The PDE method involves generating a surface for some boundary by means of solving an elliptic partial differential equation of the form ( ∂ 2 ∂ u 2 + a 2 ∂ 2 ∂ v 2 ) 2 X ( u , v ) = 0. {\displaystyle \left({\frac {\partial ^{2}}{\partial u^{2}}}+a^{2}{\frac {\partial ^{2}}{\partial v^{2}}}\right)^{2}X(u,v)=0.} Here X ( u , v ) {\displaystyle X(u,v)} is a function parameterised by the two parameters u {\displaystyle u} and v {\displaystyle v} such that X ( u , v ) = ( x ( u , v ) , y ( u , v ) , z ( u , v ) ) {\displaystyle X(u,v)=(x(u,v),y(u,v),z(u,v))} where x {\displaystyle x} , y {\displaystyle y} and z {\displaystyle z} are the usual cartesian coordinate space. The boundary conditions on the function X ( u , v ) {\displaystyle X(u,v)} and its normal derivatives ∂ X / ∂ n {\displaystyle \partial {X}/\partial {n}} are imposed at the edges of the surface patch. With the above formulation it is notable that the elliptic partial differential operator in the above PDE represents a smoothing process in which the value of the function at any point on the surface is, in some sense, a weighted average of the surrounding values. In this way, a surface is obtained as a smooth transition between the chosen set of boundary conditions. The parameter a {\displaystyle a} is a special design parameter which controls the relative smoothing of the surface in the u {\displaystyle u} and v {\displaystyle v} directions. When a = 1 {\displaystyle a=1} , the PDE is the biharmonic equation: X u u u u + 2 X u u v v + X v v v v = 0 {\displaystyle X_{uuuu}+2X_{uuvv}+X_{vvvv}=0} . The biharmonic equation is the equation produced by applying the Euler-Lagrange equation to the simplified thin plate energy functional X u u 2 + 2 X u v 2 + X v v 2 {\displaystyle X_{uu}^{2}+2X_{uv}^{2}+X_{vv}^{2}} . So solving the PDE with a = 1 {\displaystyle a=1} is equivalent to minimizing the thin plate energy functional subject to the same boundary conditions. == Applications == PDE surfaces can be used in many application areas. These include computer-aided design, interactive design, parametric design, computer animation, computer-aided physical analysis and design optimisation. == Related publications == M.I.G. Bloor and M.J. Wilson, Generating Blend Surfaces using Partial Differential Equations, Computer Aided Design, 21(3), 165–171, (1989). H. Ugail, M.I.G. Bloor, and M.J. Wilson, Techniques for Interactive Design Using the PDE Method, ACM Transactions on Graphics, 18(2), 195–212, (1999). J. Huband, W. Li and R. Smith, An Explicit Representation of Bloor-Wilson PDE Surface Model by using Canonical Basis for Hermite Interpolation, Mathematical Engineering in Industry, 7(4), 421-33 (1999). H. Du and H. Qin, Direct Manipulation and Interactive Sculpting of PDE surfaces, Computer Graphics Forum, 19(3), C261-C270, (2000). H. Ugail, Spine Based Shape Parameterisations for PDE surfaces, Computing, 72, 195–204, (2004). L. You, P. Comninos, J.J. Zhang, PDE Blending Surfaces with C2 Continuity, Computers and Graphics, 28(6), 895–906, (2004).

Texture atlas

In computer graphics, a texture atlas (also called a spritesheet or an image sprite in 2D game development) is an image containing multiple smaller images, usually packed together to reduce overall dimensions. An atlas can consist of uniformly-sized images or images of varying dimensions. A sub-image is drawn using custom texture coordinates to pick it out of the atlas. == Benefits == In an application where many small textures are used frequently, it is often more efficient to store the textures in a texture atlas which is treated as a single unit by the graphics hardware. This reduces both the disk I/O overhead and the overhead of a context switch by increasing memory locality. Careful alignment may be needed to avoid bleeding between sub textures when used with mipmapping and texture compression. In web development, images are packed into a sprite sheet to reduce the number of image resources that need to be fetched in order to display a page. == Gallery ==

Eugene Goostman

Eugene Goostman is a chatbot that some regard as having passed the Turing test, a test of a computer's ability to communicate indistinguishably from a human. Developed in Saint Petersburg in 2001 by a group of three programmers, the Russian-born Vladimir Veselov, Ukrainian-born Eugene Demchenko, and Russian-born Sergey Ulasen, Goostman is portrayed as a 13-year-old Ukrainian boy—characteristics that are intended to induce forgiveness in those with whom it interacts for its grammatical errors and lack of general knowledge. The Goostman bot has competed in a number of Turing test contests since its creation, and finished second in the 2005 and 2008 Loebner Prize contest. In June 2012, at an event marking what would have been the 100th birthday of the test's author, Alan Turing, Goostman won a competition promoted as the largest-ever Turing test contest, in which it successfully convinced 29% of its judges that it was human. On 7 June 2014, at a contest marking the 60th anniversary of Turing's death, 33% of the event's judges thought that Goostman was human; the event's organiser Kevin Warwick considered it to have passed Turing's test as a result, per Turing's prediction in his 1950 paper "Computing Machinery and Intelligence", that by the year 2000, machines would be capable of fooling 30% of human judges after five minutes of questioning. The validity and relevance of the announcement of Goostman's pass was questioned by critics, who noted the exaggeration of the achievement by Warwick, the bot's use of personality quirks and humour in an attempt to misdirect users from its non-human tendencies and lack of real intelligence, along with "passes" achieved by other chatbots at similar events. == Personality == Eugene Goostman is portrayed as being a 13-year-old boy from Odesa, Ukraine, who has a pet guinea pig and a father who is a gynaecologist. Veselov stated that Goostman was designed to be a "character with a believable personality". The choice of age was intentional, as, in Veselov's opinion, a thirteen-year-old is "not too old to know everything and not too young to know nothing". Goostman's young age also induces people who "converse" with him to forgive minor grammatical errors in his responses. In 2014, work was made on improving the bot's "dialog controller", allowing Goostman to output more human-like dialogue. A conversation between Scott Aaronson and Eugene Goostman ran as follows: == Competitions == Eugene Goostman has competed in a number of Turing test competitions, including the Loebner Prize contest; it finished joint second in the Loebner test in 2001, and came second to Jabberwacky in 2005 and to Elbot in 2008. On 23 June 2012, Goostman won a Turing test competition at Bletchley Park in Milton Keynes, held to mark the centenary of its namesake, Alan Turing. The competition, which featured five bots, twenty-five hidden humans, and thirty judges, was considered to be the largest-ever Turing test contest by its organizers. After a series of five-minute-long text conversations, 29% of the judges were convinced that the bot was an actual human. === 2014 "pass" === On 7 June 2014, in a Turing test competition at the Royal Society, organised by Kevin Warwick of the University of Reading to mark the 60th anniversary of Turing's death, Goostman won after 33% of the judges were convinced that the bot was human. 30 judges took part in the event, which included Lord Sharkey, a sponsor of Turing's posthumous pardon, artificial intelligence Professor Aaron Sloman, Fellow of the Royal Society Mark Pagel and Red Dwarf actor Robert Llewellyn. Each judge partook in a textual conversation with each of the five bots; at the same time, they also conversed with a human. In all, a total of 300 conversations were conducted. In Warwick's view, this made Goostman the first machine to pass a Turing test. In a press release, he added that: Some will claim that the Test has already been passed. The words Turing Test have been applied to similar competitions around the world. However this event involved more simultaneous comparison tests than ever before, was independently verified and, crucially, the conversations were unrestricted. A true Turing Test does not set the questions or topics prior to the conversations. In his 1950 paper "Computing Machinery and Intelligence", Turing predicted that by the year 2000, computer programs would be sufficiently advanced that the average interrogator would, after five minutes of questioning, "not have more than 70 per cent chance" of correctly guessing whether they were speaking to a human or a machine. Although Turing phrased this as a prediction rather than a "threshold for intelligence", commentators believe that Warwick had chosen to interpret it as meaning that if 30% of interrogators were fooled, the software had "passed the Turing test". ==== Reactions ==== Warwick's claim that Eugene Goostman was the first ever chatbot to pass a Turing test was met with scepticism; critics acknowledged similar "passes" made in the past by other chatbots under the 30% criteria, including PC Therapist in 1991 (which tricked 5 of 10 judges, 50%), and at the Techniche festival in 2011, where a modified version of Cleverbot tricked 59.3% of 1334 votes (which included the 30 judges, along with an audience). Cleverbot's developer, Rollo Carpenter, argued that Turing tests can only prove that a machine can "imitate" intelligence rather than show actual intelligence. Gary Marcus was critical of Warwick's claims, arguing that Goostman's "success" was only the result of a "cleverly-coded piece of software", going on to say that "it's easy to see how an untrained judge might mistake wit for reality, but once you have an understanding of how this sort of system works, the constant misdirection and deflection becomes obvious, even irritating. The illusion, in other words, is fleeting." While acknowledging IBM's Deep Blue and Watson projects—single-purpose computer systems meant for playing chess and the quiz show Jeopardy! respectively—as examples of computer systems that show a degree of intelligence in their specialised field, he further argued that they were not an equivalent to a computer system that shows "broad" intelligence, and could—for example, watch a television programme and answer questions on its content. Marcus stated that "no existing combination of hardware and software can learn completely new things at will the way a clever child can." However, he still believed that there were potential uses for technology such as that of Goostman, specifically suggesting the creation of "believable", interactive video game characters. Imperial College London professor Murray Shanahan questioned the validity and scientific basis of the test, stating that it was "completely misplaced, and it devalues real AI research. It makes it seem like science fiction AI is nearly here, when in fact it's not and it's incredibly difficult." Mike Masnick, editor of the blog Techdirt, was also skeptical, questioning publicity blunders such as the five chatbots being referred to in press releases as "supercomputers", and saying that "creating a chatbot that can fool humans is not really the same thing as creating artificial intelligence."

Local coordinates

Local coordinates are the ones used in a local coordinate system or a local coordinate space. Simple examples: Houses. In order to work in a house construction, the measurements are referred to a control arbitrary point that will allow to check it: stick/sticks on the ground, steel bar, nails... Addresses. Using house numbers to locate a house on a street; the street is a local coordinate system within a larger system composed of city townships, states, countries, postal codes, etc. Local systems exist for convenience. On ancient times, every work was made on relative bases as there was no conception of global systems. Practically, it is better to use local systems for small works as houses, buildings... For most of the applications, it is desired the position of one element relative to one building or location, and in a more local way, relative to one furniture or person. In a regular way, you will not give your position by geographical coordinates rather than "I am 15 meters away of the entry to the building". So it is a pretty common way to locate things. It is possible to bring latitude and longitude for all terrestrial locations, but unless one has a highly precise GPS device or you make astronomical observations, this is impractical. It is much simpler to use a tape, a rope, a chain... The position information (global) should be transformed into a location. Position refers to a numeric or symbolic description within a spatial reference system, whereas location refers to information about surrounding objects and their interrelationships. (Topological space) == Use == In computer graphics and computer animation, local coordinate spaces are also useful for their ability to model independently transformable aspects of geometrical scene graphs. When modeling a car, for example, it is desirable to describe the center of each wheel with respect to the car's coordinate system, but then specify the shape of each wheel in separate local spaces centered about these points. This way, the information describing each wheel can be simply duplicated four times, and independent transformations (e.g., steering rotation) can be similarly effected. Bounding volumes of objects may be described more accurately using extents in the local coordinates, (i.e. an object oriented bounding box, contrasted with the simpler axis aligned bounding box). The trade-off for this flexibility is additional computational cost: the rendering system must access the higher-level coordinate system of the car and combine it with the space of each wheel in order to draw everything in its proper place. Local coordinates also afford digital designers a means around the finite limits of numerical representation. The tread marks on a tire, for example, can be described using millimeters by allowing the whole tire to occupy the entire range of numeric precision available. The larger aspects of the car, such as its frame, might be described in centimeters, and the terrain that the car travels on could be specified in meters. In differential topology, local coordinates on a manifold are defined by means of an atlas of charts. The basic idea behind coordinate charts is that each small patch of a manifold can be endowed with a set of local coordinates. These are collected together into an atlas, and stitched together in such a way that they are self-consistent on the manifold. In Cartography and Maps, the traditional way of works are local datum. With a local datum the land can be mapped on relative small areas as a country. With the need of global systems, the transformations on between datum became a problem, so geodetic datum have been created. More than 150 local datum have been used in the world.