Sharpness aware minimization

Sharpness aware minimization

Sharpness Aware Minimization (SAM) is an optimization algorithm used in machine learning that aims to improve model generalization. The method seeks to find model parameters that are located in regions of the loss landscape with uniformly low loss values, rather than parameters that only achieve a minimal loss value at a single point. This approach is described as finding "flat" minima instead of "sharp" ones. The rationale is that models trained this way are less sensitive to variations between training and test data, which can lead to better performance on unseen data. The algorithm was introduced in a 2020 paper by a team of researchers including Pierre Foret, Ariel Kleiner, Hossein Mobahi, and Behnam Neyshabur. == Underlying Principle == SAM modifies the standard training objective by minimizing a "sharpness-aware" loss. This is formulated as a minimax problem where the inner objective seeks to find the highest loss value in the immediate neighborhood of the current model weights, and the outer objective minimizes this value: min w max ‖ ϵ ‖ p ≤ ρ L train ( w + ϵ ) + λ ‖ w ‖ 2 2 {\displaystyle \min _{w}\max _{\|\epsilon \|_{p}\leq \rho }L_{\text{train}}(w+\epsilon )+\lambda \|w\|_{2}^{2}} In this formulation: w {\displaystyle w} represents the model's parameters (weights). L train {\displaystyle L_{\text{train}}} is the loss calculated on the training data. ϵ {\displaystyle \epsilon } is a perturbation applied to the weights. ρ {\displaystyle \rho } is a hyperparameter that defines the radius of the neighborhood (an L p {\displaystyle L_{p}} ball) to search for the highest loss. An optional L2 regularization term, scaled by λ {\displaystyle \lambda } , can be included. A direct solution to the inner maximization problem is computationally expensive. SAM approximates it by taking a single gradient ascent step to find the perturbation ϵ {\displaystyle \epsilon } . This is calculated as: ϵ ( w ) = ρ ∇ L train ( w ) ‖ ∇ L train ( w ) ‖ 2 {\displaystyle \epsilon (w)=\rho {\frac {\nabla L_{\text{train}}(w)}{\|\nabla L_{\text{train}}(w)\|_{2}}}} The optimization process for each training step involves two stages. First, an "ascent step" computes a perturbed set of weights, w adv = w + ϵ ( w ) {\displaystyle w_{\text{adv}}=w+\epsilon (w)} , by moving towards the direction of the highest local loss. Second, a "descent step" updates the original weights w {\displaystyle w} using the gradient calculated at these perturbed weights, ∇ L train ( w adv ) {\displaystyle \nabla L_{\text{train}}(w_{\text{adv}})} . This update is typically performed using a standard optimizer like SGD or Adam. == Application and Performance == SAM has been applied in various machine learning contexts, primarily in computer vision. Research has shown it can improve generalization performance in models such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) on image datasets including ImageNet, CIFAR-10, and CIFAR-100. The algorithm has also been found to be effective in training models with noisy labels, where it performs comparably to methods designed specifically for this problem. Some studies indicate that SAM and its variants can improve out-of-distribution (OOD) generalization, which is a model's ability to perform well on data from distributions not seen during training. Other areas where it has been applied include gradual domain adaptation and mitigating overfitting in scenarios with repeated exposure to training examples. == Limitations == A primary limitation of SAM is its computational cost. By requiring two gradient computations (one for the ascent and one for the descent) per optimization step, it approximately doubles the training time compared to standard optimizers. The theoretical convergence properties of SAM are still under investigation. Some research suggests that with a constant step size, SAM may not converge to a stationary point. The accuracy of the single gradient step approximation for finding the worst-case perturbation may also decrease during the training process. The effectiveness of SAM can also be domain-dependent. While it has shown benefits for computer vision tasks, its impact on other areas, such as GPT-style language models where each training example is seen only once, has been reported as limited in some studies. Furthermore, while SAM seeks flat minima, some research suggests that not all flat minima necessarily lead to good generalization. The algorithm also introduces the neighborhood size ρ {\displaystyle \rho } as a new hyperparameter, which requires tuning. == Research, Variants, and Enhancements == Active research on SAM focuses on reducing its computational overhead and improving its performance. Several variants have been proposed to make the algorithm more efficient. These include methods that attempt to parallelize the two gradient computations, apply the perturbation to only a subset of parameters, or reduce the number of computation steps required. Other approaches use historical gradient information or apply SAM steps intermittently to lower the computational burden. To improve performance and robustness, variants have been developed that adapt the neighborhood size based on model parameter scales (Adaptive SAM or ASAM) or incorporate information about the curvature of the loss landscape (Curvature Regularized SAM or CR-SAM). Other research explores refining the perturbation step by focusing on specific components of the gradient or combining SAM with techniques like random smoothing. Theoretical work continues to analyze the algorithm's behavior, including its implicit bias towards flatter minima and the development of broader frameworks for sharpness-aware optimization that use different measures of sharpness.

BeeSafe

BeeSafe is a personal safety mobile app launched in 2015 as a Slovak startup. It is a location-based security service that notifies family members and friends in case the user of the app gets in danger. The app has received numerous awards. The app has more than 700 downloads and 250 active logins from more than 60 countries worldwide. == History == BeeSafe was founded on March 20, 2015 by Peter Stražovec and Michal Kačerík. The project was a winner of Žilina’s Startup Weekend 2013 and a StartupAwards.SK 2015 finalist. Later on, the app was released in the Android and iOS marketplace. The whole BeeSafe project was in The Spot booster and incubator in Bratislava for three months. BeeSafe entered into an agreement with the city of Piešťany in November 2015 to increase the security of its citizen by connecting the mobile app with the police platform. It is the first city that started using the BeeSafe platform. Further on, the application tries to help people in other Slovak cities. The cities can see the users only if they are in danger. == Awards == BeeSafe app received the Via Bona award, it is a winner of a Slovak startup and has other nominations too.

Sarvam AI

Sarvam AI is an Indian artificial intelligence company headquartered in Bengaluru, Karnataka. Founded in 2023, the company develops large language models (LLMs) and multimodal AI systems with a focus on Indian languages and region-specific use cases. The company has received venture capital backing and has participated in government-supported AI initiatives, including India's sovereign large language model programme under the IndiaAI Mission. == History == Sarvam AI was founded in August 2023 by Vivek Raghavan and Pratyush Kumar, who were previously associated with AI4Bharat at the Indian Institute of Technology Madras. In December 2023, the company announced a combined seed and Series A funding round of approximately US$41 million. The round was led by Lightspeed Venture Partners, with participation from Peak XV Partners and Khosla Ventures. In April 2025, the Ministry of Electronics and Information Technology (MeitY) selected Sarvam AI as one of the companies to develop an indigenous foundational model under the IndiaAI Mission. As part of the initiative, the company received access to government-supported computing infrastructure, including GPUs allocated for model training over a specified period. In February 2026, Sarvam AI introduced two large language models at the AI Impact Summit held at Bharat Mandapam, New Delhi. == Products and technology == Sarvam AI develops language models trained on datasets that include multiple Indian languages and code-mixed text. The company uses mixture-of-experts (MoE) architectures in some of its models. === Foundational language models === On 18 February 2026, the company announced the release of two foundational models: Sarvam-30B – A 30-billion parameter model based on a mixture-of-experts design. According to company disclosures reported by the media, the model activates approximately 1 billion parameters per token and supports a 32,000-token context window. Sarvam-105B – A 105-billion parameter model activating approximately 9 billion parameters per token, with a 128,000-token context window. The model is positioned for complex reasoning and enterprise applications. On 20th February 2026, the company released a beta version of the Sarvam-105B model which is named Indus. It is available on the Apple App Store, Google Play Store and the web. === Speech and vision systems === Sarvam AI has also developed multimodal systems including speech-to-text and vision-language models. Its speech model, referred to as Saaras V3 in company materials, supports multiple Indian languages. The company has also introduced a vision-language model known as Sarvam Vision, intended for document understanding and optical character recognition (OCR) in Indian scripts. === Devices === 'Sarvam Kaze' is an indigenous AI-powered wearable glass that listens, understands, and captures what users see the world through their eyes in real time. The device supports more than 10 Indian languages, enabling voice-based interaction and potentially real-time translation. The company plans to launch the device in May 2026. == Startup support == In March 2026, Sarvam AI launched the Sarvam Startup Program, an initiative providing selected early-stage companies with 6–12 months of API credits scaled to their needs, priority engineering support, and access to production infrastructure for developing multilingual AI applications in areas such as speech, translation, and large language models. == Open-source release == In February 2026, Sarvam AI announced and open-sourced two large language models: Sarvam 30B (30 billion parameters) and Sarvam 105B (105 billion parameters, using a Mixture-of-Experts architecture with 10.3 billion active parameters). Both models were trained from scratch on datasets focused on Indian languages and support advanced reasoning, multilingual tasks, mathematics, and coding. The models are hosted on Hugging Face under the Apache License and are intended for enterprise and developer applications in Indian languages. The models were subsequently released as open source under the Apache License 2.0, with model weights made available on Hugging Face (sarvamai/sarvam-30b and sarvamai/sarvam-105b) and AIKosh in early March 2026. == Government and institutional collaborations == In 2025, Sarvam AI was selected to contribute to India's sovereign AI model initiative under the IndiaAI Mission. The initiative aims to support domestic AI infrastructure and model development. In March 2025, the Unique Identification Authority of India (UIDAI) announced a collaboration with Sarvam AI to integrate AI-based voice interactions and multilingual support into Aadhaar-related services. Sarvam AI has also worked with AI4Bharat and academic institutions on language datasets and speech research projects. == Industry participation == Sarvam AI presented its foundational models at the India AI Impact Summit 2026 in New Delhi. The company has also been listed among Indian members of the AI Alliance, a consortium focused on open-source artificial intelligence initiatives. == List of models ==

Reason maintenance

Reason maintenance is a knowledge representation approach to efficient handling of inferred information that is explicitly stored. Reason maintenance distinguishes between base facts, which can be defeated, and derived facts. As such it differs from belief revision which, in its basic form, assumes that all facts are equally important. Reason maintenance was originally developed as a technique for implementing problem solvers. It encompasses a variety of techniques that share a common architecture: two components—a reasoner and a reason maintenance system—communicate with each other via an interface. The reasoner uses the reason maintenance system to record its inferences and justifications of ("reasons" for) the inferences. The reasoner also informs the reason maintenance system which are the currently valid base facts (assumptions). The reason maintenance system uses the information to compute the truth value of the stored derived facts and to restore consistency if an inconsistency is derived. == Truth maintenance system == A truth maintenance system, or TMS, is a knowledge representation method for representing both beliefs and their dependencies and an algorithm called the "truth maintenance algorithm" that manipulates and maintains the dependencies. The name truth maintenance is due to the ability of these systems to restore consistency. A truth maintenance system maintains consistency between old believed knowledge and current believed knowledge in the knowledge base (KB) through revision. If the current believed statements contradict the knowledge in the KB, then the KB is updated with the new knowledge. It may happen that the same data will again be believed, and the previous knowledge will be required in the KB. If the previous data are not present, but may be required for new inference. But if the previous knowledge was in the KB, then no retracing of the same knowledge is needed. The use of TMS avoids such retracing; it keeps track of the contradictory data with the help of a dependency record. This record reflects the retractions and additions which makes the inference engine (IE) aware of its current belief set. == Algorithm == Each statement having at least one valid justification is made a part of the current belief set. When a contradiction is found, the statement(s) responsible for the contradiction are identified and the records are appropriately updated. This process is called dependency-directed backtracking. The TMS algorithm maintains the records in the form of a dependency network. Each node in the network is an entry in the KB (a premise, antecedent, or inference rule etc.) Each arc of the network represent the inference steps through which the node was derived. A premise is a fundamental belief which is assumed to be true. They do not need justifications. The set of premises are the basis from which justifications for all other nodes will be derived. == Justification == There are two types of justification for a node. They are: Support list [SL] Conditional proof (CP) == Examples == Many kinds of truth maintenance systems exist. Two major types are single-context and multi-context truth maintenance. In single context systems, consistency is maintained among all facts in memory (KB) and relates to the notion of consistency found in classical logic. Multi-context systems support paraconsistency by allowing consistency to be relevant to a subset of facts in memory, a context, according to the history of logical inference. This is achieved by tagging each fact or deduction with its logical history. Multi-agent truth maintenance systems perform truth maintenance across multiple memories, often located on different machines. de Kleer's assumption-based truth maintenance system (ATMS, 1986) was utilized in systems based upon KEE on the Lisp Machine. The first multi-agent TMS was created by Mason and Johnson. It was a multi-context system. Bridgeland and Huhns created the first single-context multi-agent system.

ICAD (software)

ICAD (Corporate history: ICAD, Inc., Concentra (name change at IPO in 1995), KTI (name change in 1998), Dassault Systèmes (purchase in 2001) () is a knowledge-based engineering (KBE) system that enables users to encode design knowledge using a semantic representation that can be evaluated for Parasolid output. ICAD has an open architecture that can utilize all the power and flexibility of the underlying language. KBE, as implemented via ICAD, received a lot of attention due to the remarkable results that appeared to take little effort. ICAD allowed one example of end-user computing that in a sense is unparalleled. Most ICAD developers were degreed engineers. Systems developed by ICAD users were non-trivial and consisted of highly complicated code. In the sense of end-user computing, ICAD was the first to allow the power of a domain tool to be in the hands of the user, at the same time being open to allow extensions as identified and defined by the domain expert or subject-matter expert (SME). A COE article looked at the resulting explosion of expectations (see AI winter), which were not sustainable. However, such a bubble burst does not diminish the existence of ability that would exist were expectations and use reasonable or properly managed. == History == The original implementation of ICAD was on a Lisp machine (Symbolics). Some of the principals involved with the development were Larry Rosenfeld, Avrum Belzer, Patrick M. O'Keefe, Philip Greenspun, and David F. Place. The time frame was 1984–85. ICAD started on special-purpose Symbolics Lisp hardware and was then ported to Unix when Common Lisp became portable to general-purpose workstations. The original domain for ICAD was mechanical design with many application successes. However, ICAD has found use in other domains, such as electrical design, shape modeling, etc. An example project could be wind tunnel design or the development of a support tool for aircraft multidisciplinary design. Further examples can be found in the presentations at the annual IIUG (International ICAD Users Group) that have been published in the KTI Vault (1999 through 2002). Boeing and Airbus used ICAD extensively to develop various components in the 1990s and early 21st century. As of 2003, ICAD was featured strongly in several areas as evidenced by the Vision & Strategy Product Vision and Strategy presentation. After 2003, ICAD use diminished. At the end of 2001, the KTI Company faced financial difficulties and laid off most of its best staff. They were eventually bought out by Dassault who effectively scuppered the ICAD product. See IIUG at COE, 2003 (first meeting due to Dassault by KTI) The ICAD system was very expensive, relatively, and was in the price range of high-end systems. Market dynamics couldn't support this as there may not have been sufficient differentiating factors between ICAD and the lower-end systems (or the promises from Dassault). KTI was absorbed by Dassault Systèmes and ICAD is no longer considered the go-forward tool for knowledge-based engineering (KBE) applications by that company. Dassault Systèmes is promoting a suite of tools oriented around version 5 of their popular CATIA CAD application, with Knowledgeware the replacement for ICAD. As of 2005, things were still a bit unclear. ICAD 8.3 was delivered. The recent COE Aerospace Conference had a discussion about the futures of KBE. One issue involves the stacking of 'meta' issues within a computer model. How this is resolved, whether by more icons or the availability of an external language, remains to be seen. The Genworks GDL product (including kernel technology from the Gendl Project) is the nearest functional equivalent to ICAD currently available. == Particulars == ICAD provided a declarative language (IDL) using New Flavors (never converted to Common Lisp Object System (CLOS)) that supported a mechanism for relating parts (defpart) via a hierarchical set of relationships. Technically, the ICAD Defpart was a Lisp macro; the ICAD defpart list was a set of generic classes that can be instantiated with specific properties depending upon what was represented. This defpart list was extendible via composited parts that represented domain entities. Along with the part-subpart relations, ICAD supported generic relations via the object modeling abilities of Lisp. Example applications of ICAD range from a small collection of defparts that represents a part or component to a larger collection that represents an assembly. In terms of power, an ICAD system, when fully specified, can generate thousands of instances of parts on a major assembly design. One example of an application driving thousands of instances of parts is that of an aircraft wing – where fastener type and placement may number in the thousands, each instance requiring evaluation of several factors driving the design parameters. == Futures (KBE, etc.) == One role for ICAD may be serving as the defining prototype for KBE which would require that we know more about what occurred the past 15 years (much information is tied up behind corporate firewalls and under proprietary walls). With the rise of functional programming languages (an example is Haskell) in the markets, perhaps some of the power attributable to Lisp may be replicated.

Dataset shift

Dataset shift is a phenomenon in machine learning and statistics in which the joint distribution of input variables and target labels is different in the training phase and the deployment or test phase (i.e., P t r a i n ( X , Y ) ≠ P t e s t ( X , Y ) {\displaystyle P_{train}(X,Y)\neq P_{test}(X,Y)} ). This happens when the statistical properties of data used to train a model are no longer representative of the data encountered in real-world use, often resulting in degraded predictive performance and diminished generalization ability. Dataset shift is a generic term for a number of particular types of distributional change. Covariate shift is when the distribution of the input features changes, but the conditional relationship between inputs and outputs remains constant . Prior probability shift (or label shift) happens when the distribution of target labels changes, but the conditional distribution of inputs given labels stays the same. Concept shift (also known as concept drift) is the change of the conditional relationship between inputs and outputs that renders previously learned patterns invalid over time. A key challenge for deploying machine learning systems is dataset shift, in particular in dynamic environments where the data distributions change over time. Detecting and mitigating such shifts is an active area of research, e.g., drift detection, domain adaptation, continual learning.

Public First Action

Public First Action is a 501(c)(4) nonprofit organization focused on United States public policy related to artificial intelligence. Public First Action is a bipartisan group that advocates for AI transparency, safeguards, and export controls on advanced AI chips. The organization is aligned with the political action committees Jobs and Democracy, Defending Our Values and Public First. == History == Public First Action was formed in 2025 by former Congressmen Brad Carson, a Democrat, and Chris Stewart, a Republican, to advocate for federal, state, and local regulations related to AI. The group's formation followed the founding of a super PAC network, Leading the Future, which advocates for deregulation of the AI industry and faster development of the new technology. Public First Action supports measures that would increase transparency at frontier AI companies and impose export controls on advanced AI chips, in addition to opposing the preemption of state-level AI laws. In February 2026, Public First Action received $20 million from the AI company Anthropic. That same month, the group announced plans to support 30 to 50 Democrats and Republicans in state and federal races, with Public First Action and aligned super PACs launching advertisements in Nebraska, Tennessee, and other states. In one ad, Public First Action touted Senator Marsha Blackburn for her work on child online safety. As of 2026, the group plans to raise between $50 and $75 million for public oversight of AI and related reforms. == Organization == === Leadership and funding === Public First Action is led by Carson and Stewart. The group has raised nearly $50 million in funding with a goal of raising $75 million during the 2026 midterms. Anthropic has contributed $20 million to the group. === Structure === Public First Action is aligned with three political action committees: "Jobs and Democracy", which supports Democratic candidates; "Defending Our Values", which supports Republican candidates; and "Public First", which supports both Republicans and Democrats.