Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., doing business as DeepSeek, is a Chinese artificial intelligence (AI) company that develops large language models (LLMs). Based in Hangzhou, Zhejiang, DeepSeek is owned and funded by High-Flyer, a Chinese hedge fund. DeepSeek was founded in July 2023 by Liang Wenfeng, the co-founder of High-Flyer, who also serves as the CEO for both of the companies. The company launched an eponymous chatbot alongside its DeepSeek-R1 model in January 2025. DeepSeek-R1 provided responses comparable to other contemporary large language models, such as OpenAI's GPT-4 and o1. Its training cost was reported to be significantly lower than other LLMs. The company claims that it trained its V3 model for US$6 million—far less than the US$100 million cost for OpenAI's GPT-4 in 2023—and using approximately one-tenth the computing power consumed by Meta's comparable model, Llama 3.1. DeepSeek's success against larger and more established rivals has been described as "upending AI". DeepSeek's models are described as "open-weight", meaning the exact parameters are openly shared, but the training data is not openly licensed. Since the January 2025 debut of DeepSeek-R1, the company has made its new models available under free and open-source software licenses, primarily the MIT License. The company reportedly recruits AI researchers from top Chinese universities and also hires from outside traditional computer science fields to broaden its models' knowledge and capabilities. DeepSeek significantly reduced training expenses for their R1 model by incorporating techniques such as mixture of experts (MoE) layers. The company also trained its models during ongoing trade restrictions on AI chip exports to China, using weaker AI chips intended for export and employing fewer units overall. Observers say this breakthrough sent "shock waves" through the industry which were described as triggering a "Sputnik moment" for the US in the field of artificial intelligence, particularly due to its open-source, cost-effective, and high-performing AI models. This threatened established AI hardware leaders such as Nvidia; Nvidia's share price dropped sharply, losing US$600 billion in market value, the largest single-company decline in U.S. stock market history. == History == === Founding and early years (2016–2023) === In February 2016, High-Flyer was co-founded by AI enthusiast Liang Wenfeng, who had been trading since the 2008 financial crisis while attending Zhejiang University. The company began stock trading using a GPU-dependent deep learning model on 21 October 2016; before then, it had used CPU-based linear models. By the end of 2017, most of its trading was driven by AI. Liang established High-Flyer as a hedge fund focused on developing and using AI trading algorithms, and by 2021 the firm was using AI exclusively, often using Nvidia chips. In 2019, the company began constructing its first computing cluster, Fire-Flyer, at a cost of 200 million yuan; it contained 1,100 GPUs interconnected at 200 Gbit/s and was retired after 1.5 years in operation. By 2021, Liang had started buying large quantities of Nvidia GPUs for an AI project, reportedly obtaining 10,000 Nvidia A100 GPUs before the United States restricted chip sales to China. Computing cluster Fire-Flyer 2 began construction in 2021 with a budget of 1 billion yuan. It was reported that in 2022, Fire-Flyer 2's capacity had been used at over 96%, totaling 56.74 million GPU hours. 27% was used to support scientific computing outside the company. During 2022, Fire-Flyer 2 had 5,000 PCIe A100 GPUs in 625 nodes, each containing 8 GPUs. At the time, it exclusively used PCIe instead of the DGX version of A100, since at the time the models it trained could fit within a single 40 GB GPU VRAM and so there was no need for the higher bandwidth of DGX (i.e., it required only data parallelism but not model parallelism). Later, it incorporated NVLinks and NCCL (Nvidia Collective Communications Library) to train larger models that required model parallelism. On 14 April 2023, High-Flyer announced the launch of an artificial general intelligence (AGI) research lab, stating that the new lab would focus on developing AI tools unrelated to the firm's financial business. Two months later, on 17 July 2023, that lab was spun off into an independent company, DeepSeek, with High-Flyer as its principal investor and backer. Venture capital investors were reluctant to provide funding, as they considered it unlikely that the venture would be able to quickly generate an "exit". === Model releases since 2023 === DeepSeek released its first model, DeepSeek Coder, on 2 November 2023, followed by the DeepSeek-LLM series on 29 November 2023. In January 2024, it released two DeepSeek-MoE models (Base and Chat), and in April 3 DeepSeek-Math models (Base, Instruct, and RL). DeepSeek-V2 was released in May 2024, followed a month later by the DeepSeek-Coder V2 series. In September 2024, DeepSeek V2.5 was introduced and revised in December. On 20 November 2024, the preview of DeepSeek-R1-Lite became available via chat. In December, DeepSeek-V3-Base and DeepSeek-V3 (chat) were released. On 20 January 2025, DeepSeek launched the DeepSeek chatbot—based on the DeepSeek-R1 model—free for iOS and Android. By 27 January, DeepSeek surpassed ChatGPT as the most downloaded freeware app on the iOS App Store in the United States, triggering an 18% drop in Nvidia's share price. On 24 March 2025, DeepSeek released DeepSeek-V3-0324 under the MIT License. On 28 May 2025, DeepSeek released DeepSeek-R1-0528 under the MIT License. The model has been noted for more tightly following official Chinese Communist Party ideology and censorship in its answers to questions than prior models. On 21 August 2025, DeepSeek released DeepSeek V3.1 under the MIT License. This model features a hybrid architecture with thinking and non-thinking modes. It also surpasses prior models like V3 and R1, by over 40% on certain benchmarks like SWE-bench and Terminal-bench. It was updated to V3.1-Terminus on 22 September 2025. V3.2-Exp was released on 29 September 2025. It uses DeepSeek Sparse Attention, a more efficient attention mechanism based on previous research published in February. DeepSeek-V3.2 was released on 1 December 2025, alongside a DeepSeek-V3.2-Speciale variant that focused on reasoning. In February 2026, Anthropic accused DeepSeek of using thousands of fraudulent accounts to generate millions of conversations with Claude to train its own large language models. In April 2026, investors began speaking with DeepSeek for a $300 million funding round, which would bring DeepSeek to a total valuation of $10 billion. On April 24, 2026, DeepSeek released a preview of its V4 series, including the 1.6-trillion parameter DeepSeek-V4-Pro and the 284-billion parameter DeepSeek-V4-Flash, both featuring a 1-million token context window, under the MIT License. DeepSeek's V4 LLM has been adopted by key semiconductor manufacturers and artificial intelligence chipmakers such as Huawei and Cambricon. == Company operation == DeepSeek is headquartered in Hangzhou, Zhejiang, and is owned and funded by High-Flyer. Its co-founder, Liang Wenfeng, serves as CEO. As of May 2024, Liang personally held an 84% stake in DeepSeek through two shell corporations. === Strategy === DeepSeek has stated that it focuses on research and does not have immediate plans for commercialization. This posture also means it can skirt certain provisions of China's AI regulations aimed at consumer-facing technologies. DeepSeek's hiring approach emphasizes skills over lengthy work experience, resulting in many hires fresh out of university. The company likewise recruits individuals without computer science backgrounds to expand the range of expertise incorporated into the models, for instance in poetry or advanced mathematics. According to The New York Times, dozens of DeepSeek researchers have or have previously had affiliations with People's Liberation Army laboratories and the Seven Sons of National Defence. Due to the impact of United States restrictions on chips, DeepSeek refined its algorithms to maximise computational efficiency and thereby leveraged older hardware and reduced energy consumption. DeepSeek also expanded on the African continent as it offers more affordable and less power-hungry AI solutions. The company has bolstered African language models and generated a number of startups, for example in Nairobi. Along with Huawei's storage and cloud computing services, the impact on the tech scene in sub-saharan Africa is considerable. DeepSeek offers local data sovereignty and more flexibility compared to Western AI platforms. == Training framework == High-Flyer/DeepSeek had operated at least two primary computing clusters: Fire-Flyer (萤火一号) and Fire-Flyer 2 (萤火二号). Fire-Flyer 1 was constructed in 2019 and was retired after 1.5 years of operation. Fi
Client-side persistent data
Client-side persistent data or CSPD is a term used in computing for storing data required by web applications to complete internet tasks on the client-side as needed rather than exclusively on the server. As a framework it is one solution to the needs of Occasionally connected computing or OCC. A major challenge for HTTP as a stateless protocol has been asynchronous tasks. The AJAX pattern using XMLHttpRequest was first introduced by Microsoft in the context of the Outlook e-mail product. The first CSPD were the 'cookies' introduced by the Netscape Navigator. ActiveX components which have entries in the Windows registry can also be viewed as a form of client-side persistence.
Materials informatics
Materials informatics is a field of study that applies the principles of informatics and data science to materials science and engineering to improve the understanding, use, selection, development, and discovery of materials. The term "materials informatics" is frequently used interchangeably with "data science", "machine learning", and "artificial intelligence" by the community. This is an emerging field, with a goal to achieve high-speed and robust acquisition, management, analysis, and dissemination of diverse materials data with the goal of greatly reducing the time and risk required to develop, produce, and deploy new materials, which generally takes longer than 20 years. This field of endeavor is not limited to some traditional understandings of the relationship between materials and information. Some more narrow interpretations include combinatorial chemistry, process modeling, materials databases, materials data management, and product life cycle management. Materials informatics is at the convergence of these concepts, but also transcends them and has the potential to achieve greater insights and deeper understanding by applying lessons learned from data gathered on one type of material to others. By gathering appropriate meta data, the value of each individual data point can be greatly expanded. == Databases == Databases are essential for any informatics research and applications. In material informatics many databases exist containing both empirical data obtained experimentally, and theoretical data obtained computationally. Big data that can be used for machine learning is particularly difficult to obtain for experimental data due to the lack of a standard for reporting data and the variability in the experimental environment. This lack of big data has led to growing effort in developing machine learning techniques that utilize data extremely data sets. On the other hand, large uniform database of theoretical density functional theory (DFT) calculations exists. These databases have proven their utility in high-throughput material screening and discovery. Some common DFT databases and high throughput tools are listed below: Databases: MaterialsProject.org, MaterialsWeb.org (University of Florida) HT software: Pymatgen, MPInterfaces, Matminer == Beyond computational methods? == The concept of materials informatics is addressed by the Materials Research Society. For example, materials informatics was the theme of the December 2006 issue of the MRS Bulletin. The issue was guest-edited by John Rodgers of Innovative Materials, Inc., and David Cebon of Cambridge University, who described the "high payoff for developing methodologies that will accelerate the insertion of materials, thereby saving millions of investment dollars." The editors focused on the limited definition of materials informatics as primarily focused on computational methods to process and interpret data. They stated that "specialized informatics tools for data capture, management, analysis, and dissemination" and "advances in computing power, coupled with computational modeling and simulation and materials properties databases" will enable such accelerated insertion of materials. A broader definition of materials informatics goes beyond the use of computational methods to carry out the same experimentation, viewing materials informatics as a framework in which a measurement or computation is one step in an information-based learning process that uses the power of a collective to achieve greater efficiency in exploration. When properly organized, this framework crosses materials boundaries to uncover fundamental knowledge of the basis of physical, mechanical, and engineering properties. == Challenges == While there are many who believe in the future of informatics in the materials development and scaling process, many challenges remain. Hill, et al., write that "Today, the materials community faces serious challenges to bringing about this data-accelerated research paradigm, including diversity of research areas within materials, lack of data standards, and missing incentives for sharing, among others. Nonetheless, the landscape is rapidly changing in ways that should benefit the entire materials research enterprise." This remaining tension between traditional materials development methodologies and the use of more computationally, machine learning, and analytics approaches will likely exist for some time as the materials industry overcomes some of the cultural barriers necessary to fully embrace such new ways of thinking. == Analogy from Biology == The overarching goals of bioinformatics and systems biology may provide a useful analogy. Andrew Murray of Harvard University expresses the hope that such an approach "will save us from the era of "one graduate student, one gene, one PhD". Similarly, the goal of materials informatics is to save us from one graduate student, one alloy, one PhD. Such goals will require more sophisticated strategies and research paradigms than applying data-science methods to the same tasks set currently undertaken by students.
Knowledge graph
In knowledge representation and reasoning, a knowledge graph is a knowledge base that uses a graph-structured data model or topology to represent and operate on data. Knowledge graphs are often used to store interlinked descriptions of entities – objects, events, situations or abstract concepts – while also encoding the free-form semantics or relationships underlying these entities. Since the development of the Semantic Web, knowledge graphs have often been associated with linked open data projects, focusing on the connections between concepts and entities. They are also historically associated with and used by search engines such as Google, Bing, and Yahoo; knowledge engines and question-answering services such as WolframAlpha, Apple's Siri, and Amazon Alexa; and social networks such as LinkedIn and Facebook. Recent developments in data science and machine learning, particularly in graph neural networks, representation learning, and machine learning, have broadened the scope of knowledge graphs beyond their traditional use in search engines and recommender systems. They are increasingly used in scientific research, with notable applications in fields such as genomics, proteomics, and systems biology. == History == The term was coined as early as 1972 by the Austrian linguist Edgar W. Schneider, in a discussion of how to build modular instructional systems for courses. In the late 1980s, the University of Groningen and University of Twente jointly began a project called Knowledge Graphs, focusing on the design of semantic networks with edges restricted to a limited set of relations, to facilitate algebras on the graph. In subsequent decades, the distinction between semantic networks and knowledge graphs was blurred. Some early knowledge graphs were topic-specific. In 1985, Wordnet was founded, capturing semantic relationships between words and meanings – an application of this idea to language itself. In 2005, Marc Wirk founded Geonames to capture relationships between different geographic names and locales and associated entities. In 1998, Andrew Edmonds of Science in Finance Ltd in the UK created a system called ThinkBase that offered fuzzy-logic based reasoning in a graphical context. In 2007, both DBpedia and Freebase were founded as graph-based knowledge repositories for general-purpose knowledge. DBpedia focused exclusively on data extracted from Wikipedia, while Freebase also included a range of public datasets. Neither described themselves as a 'knowledge graph' but developed and described related concepts. In 2012, Google introduced their Knowledge Graph, building on DBpedia and Freebase among other sources. They later incorporated RDFa, Microdata, JSON-LD content extracted from indexed web pages, including the CIA World Factbook, Wikidata, and Wikipedia. Entity and relationship types associated with this knowledge graph have been further organized using terms from the schema.org vocabulary. The Google Knowledge Graph became a complement to string-based search within Google, and its popularity online brought the term into more common use. Since then, several large multinationals have advertised their use of knowledge graphs, further popularising the term. These include Facebook, LinkedIn, Airbnb, Microsoft, Amazon, Uber and eBay. In 2019, IEEE combined its annual international conferences on "Big Knowledge" and "Data Mining and Intelligent Computing" into the International Conference on Knowledge Graph. The development of large language models expanded interest in knowledge graphs as a way to structure information from unstructured text, with advances in language processing enabling their automatic or semi-automatic generation and expansion. The term knowledge graph has since broadened to include the dynamically constructed and adaptive graph structures, which support retrieval, reasoning, and summarization in generative systems. Microsoft Research's GraphRAG (2024) exemplified this development by integrating LLM-generated graphs into retrieval-augmented generation. == Definitions == There is no single commonly accepted definition of a knowledge graph. Most definitions view the topic through a Semantic Web lens and include these features: Flexible relations among knowledge in topical domains: A knowledge graph (i) defines abstract classes and relations of entities in a schema, (ii) mainly describes real world entities and their interrelations, organized in a graph, (iii) allows for potentially interrelating arbitrary entities with each other, and (iv) covers various topical domains. General structure: A network of entities, their semantic types, properties, and relationships. To represent properties, categorical or numerical values are often used. Supporting reasoning over inferred ontologies: A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge. There are, however, many knowledge graph representations for which some of these features are not relevant. For those knowledge graphs, this simpler definition may be more useful: A digital structure that represents knowledge as concepts and the relationships between them (facts). A knowledge graph can include an ontology that allows both humans and machines to understand and reason about its contents. === Implementations === In addition to the above examples, the term has been used to describe open knowledge projects such as YAGO and Wikidata; federations like the Linked Open Data cloud; a range of commercial search tools, including Yahoo's semantic search assistant Spark, Google's Knowledge Graph, and Microsoft's Satori; and the LinkedIn and Facebook entity graphs. The term is also used in the context of note-taking software applications that allow a user to build a personal knowledge graph. The popularization of knowledge graphs and their accompanying methods have led to the development of graph databases such as Neo4j, GraphDB and AgensGraph. These graph databases allow users to easily store data as entities and their interrelationships, and facilitate operations such as data reasoning, node embedding, and ontology development on knowledge bases. In contrast, virtual knowledge graphs do not store information in specialized databases. They rely on an underlying relational database or data lake to answer queries on the graph. Such a virtual knowledge graph system must be properly configured in order to answer the queries correctly. This specific configuration is done through a set of mappings that define the relationship between the elements of the data source and the structure and ontology of the virtual knowledge graph. == Using a knowledge graph for reasoning over data == A knowledge graph formally represents semantics by describing entities and their relationships. Knowledge graphs may make use of ontologies as a schema layer. By doing this, they allow logical inference for retrieving implicit knowledge rather than only allowing queries requesting explicit knowledge. In order to allow the use of knowledge graphs in various machine learning tasks, several methods for deriving latent feature representations of entities and relations have been devised. These knowledge graph embeddings allow them to be connected to machine learning methods that require feature vectors like word embeddings. This can complement other estimates of conceptual similarity. Models for generating useful knowledge graph embeddings are commonly the domain of graph neural networks (GNNs). GNNs are deep learning architectures that comprise edges and nodes, which correspond well to the entities and relationships of knowledge graphs. The topology and data structures afforded by GNNs provide a convenient domain for semi-supervised learning, wherein the network is trained to predict the value of a node embedding (provided a group of adjacent nodes and their edges) or edge (provided a pair of nodes). These tasks serve as fundamental abstractions for more complex tasks such as knowledge graph reasoning and alignment. === Entity alignment === As new knowledge graphs are produced across a variety of fields and contexts, the same entity will inevitably be represented in multiple graphs. However, because no single standard for the construction or representation of knowledge graph exists, resolving which entities from disparate graphs correspond to the same real world subject is a non-trivial task. This task is known as knowledge graph entity alignment, and is an active area of research. Strategies for entity alignment generally seek to identify similar substructures, semantic relationships, shared attributes, or combinations of all three between two distinct knowledge graphs. Entity alignment methods use these structural similarities between generally non-isomorphic graphs to predict which nodes correspond to the same entity. In 2023, researchers found success in using large language models (LLMs) in the task of entity alignment. This was in particul
HAKMEM
HAKMEM, alternatively known as AI Memo 239, is a February 1972 "memo" (technical report) of the MIT AI Lab containing a wide variety of hacks, including useful and clever algorithms for mathematical computation, some number theory and schematic diagrams for hardware – in Guy L. Steele's words, "a bizarre and eclectic potpourri of technical trivia". Contributors included about two dozen members and associates of the AI Lab. The title of the report is short for "hacks memo", abbreviated to six upper case characters that would fit in a single PDP-10 machine word (using a six-bit character set). == History == HAKMEM is notable as an early compendium of algorithmic technique, particularly for its practical bent, and as an illustration of the wide-ranging interests of AI Lab people of the time, which included almost anything other than AI research. HAKMEM contains original work in some fields, notably continued fractions. == Introduction == Compiled with the hope that a record of the random things people do around here can save some duplication of effort -- except for fun. Here is some little known data which may be of interest to computer hackers. The items and examples are so sketchy that to decipher them may require more sincerity and curiosity than a non-hacker can muster. Doubtless, little of this is new, but nowadays it's hard to tell. So we must be content to give you an insight, or save you some cycles, and to welcome further contributions of items, new or used.
Manifold regularization
In machine learning, manifold regularization is a technique for using the shape of a dataset to constrain the functions that should be learned on that dataset. In many machine learning problems, the data to be learned do not cover the entire input space. For example, a facial recognition system may not need to classify any possible image, but only the subset of images that contain faces. The technique of manifold learning assumes that the relevant subset of data comes from a manifold, a mathematical structure with useful properties. The technique also assumes that the function to be learned is smooth: data with different labels are not likely to be close together, and so the labeling function should not change quickly in areas where there are likely to be many data points. Because of this assumption, a manifold regularization algorithm can use unlabeled data to inform where the learned function is allowed to change quickly and where it is not, using an extension of the technique of Tikhonov regularization. Manifold regularization algorithms can extend supervised learning algorithms in semi-supervised learning and transductive learning settings, where unlabeled data are available. The technique has been used for applications including medical imaging, geographical imaging, and object recognition. == Manifold regularizer == === Motivation === Manifold regularization is a type of regularization, a family of techniques that reduces overfitting and ensures that a problem is well-posed by penalizing complex solutions. In particular, manifold regularization extends the technique of Tikhonov regularization as applied to Reproducing kernel Hilbert spaces (RKHSs). Under standard Tikhonov regularization on RKHSs, a learning algorithm attempts to learn a function f {\displaystyle f} from among a hypothesis space of functions H {\displaystyle {\mathcal {H}}} . The hypothesis space is an RKHS, meaning that it is associated with a kernel K {\displaystyle K} , and so every candidate function f {\displaystyle f} has a norm ‖ f ‖ K {\displaystyle \left\|f\right\|_{K}} , which represents the complexity of the candidate function in the hypothesis space. When the algorithm considers a candidate function, it takes its norm into account in order to penalize complex functions. Formally, given a set of labeled training data ( x 1 , y 1 ) , … , ( x ℓ , y ℓ ) {\displaystyle (x_{1},y_{1}),\ldots ,(x_{\ell },y_{\ell })} with x i ∈ X , y i ∈ Y {\displaystyle x_{i}\in X,y_{i}\in Y} and a loss function V {\displaystyle V} , a learning algorithm using Tikhonov regularization will attempt to solve the expression arg min f ∈ H 1 ℓ ∑ i = 1 ℓ V ( f ( x i ) , y i ) + γ ‖ f ‖ K 2 {\displaystyle {\underset {f\in {\mathcal {H}}}{\arg \!\min }}{\frac {1}{\ell }}\sum _{i=1}^{\ell }V(f(x_{i}),y_{i})+\gamma \left\|f\right\|_{K}^{2}} where γ {\displaystyle \gamma } is a hyperparameter that controls how much the algorithm will prefer simpler functions over functions that fit the data better. Manifold regularization adds a second regularization term, the intrinsic regularizer, to the ambient regularizer used in standard Tikhonov regularization. Under the manifold assumption in machine learning, the data in question do not come from the entire input space X {\displaystyle X} , but instead from a nonlinear manifold M ⊂ X {\displaystyle M\subset X} . The geometry of this manifold, the intrinsic space, is used to determine the regularization norm. === Laplacian norm === There are many possible choices for the intrinsic regularizer ‖ f ‖ I {\displaystyle \left\|f\right\|_{I}} . Many natural choices involve the gradient on the manifold ∇ M {\displaystyle \nabla _{M}} , which can provide a measure of how smooth a target function is. A smooth function should change slowly where the input data are dense; that is, the gradient ∇ M f ( x ) {\displaystyle \nabla _{M}f(x)} should be small where the marginal probability density P X ( x ) {\displaystyle {\mathcal {P}}_{X}(x)} , the probability density of a randomly drawn data point appearing at x {\displaystyle x} , is large. This gives one appropriate choice for the intrinsic regularizer: ‖ f ‖ I 2 = ∫ x ∈ M ‖ ∇ M f ( x ) ‖ 2 d P X ( x ) {\displaystyle \left\|f\right\|_{I}^{2}=\int _{x\in M}\left\|\nabla _{M}f(x)\right\|^{2}\,d{\mathcal {P}}_{X}(x)} In practice, this norm cannot be computed directly because the marginal distribution P X {\displaystyle {\mathcal {P}}_{X}} is unknown, but it can be estimated from the provided data. === Graph-based approach of the Laplacian norm === When the distances between input points are interpreted as a graph, then the Laplacian matrix of the graph can help to estimate the marginal distribution. Suppose that the input data include ℓ {\displaystyle \ell } labeled examples (pairs of an input x {\displaystyle x} and a label y {\displaystyle y} ) and u {\displaystyle u} unlabeled examples (inputs without associated labels). Define W {\displaystyle W} to be a matrix of edge weights for a graph, where W i j {\displaystyle W_{ij}} is a similarity built from distance measure between the data points x i {\displaystyle x_{i}} and x j {\displaystyle x_{j}} (so that more close implies higher W i j {\displaystyle W_{ij}} ). Define D {\displaystyle D} to be a diagonal matrix with D i i = ∑ j = 1 ℓ + u W i j {\displaystyle D_{ii}=\sum _{j=1}^{\ell +u}W_{ij}} and L {\displaystyle L} to be the Laplacian matrix D − W {\displaystyle D-W} . Then, as the number of data points ℓ + u {\displaystyle \ell +u} increases, L {\displaystyle L} converges to the Laplace–Beltrami operator Δ M {\displaystyle \Delta _{M}} , which is the divergence of the gradient ∇ M {\displaystyle \nabla _{M}} . Then, if f {\displaystyle \mathbf {f} } is a vector of the values of f {\displaystyle f} at the data, f = [ f ( x 1 ) , … , f ( x l + u ) ] T {\displaystyle \mathbf {f} =[f(x_{1}),\ldots ,f(x_{l+u})]^{\mathrm {T} }} , the intrinsic norm can be estimated: ‖ f ‖ I 2 = 1 ( ℓ + u ) 2 f T L f {\displaystyle \left\|f\right\|_{I}^{2}={\frac {1}{(\ell +u)^{2}}}\mathbf {f} ^{\mathrm {T} }L\mathbf {f} } As the number of data points ℓ + u {\displaystyle \ell +u} increases, this empirical definition of ‖ f ‖ I 2 {\displaystyle \left\|f\right\|_{I}^{2}} converges to the definition when P X {\displaystyle {\mathcal {P}}_{X}} is known. === Solving the regularization problem with graph-based approach === Using the weights γ A {\displaystyle \gamma _{A}} and γ I {\displaystyle \gamma _{I}} for the ambient and intrinsic regularizers, the final expression to be solved becomes: arg min f ∈ H 1 ℓ ∑ i = 1 ℓ V ( f ( x i ) , y i ) + γ A ‖ f ‖ K 2 + γ I ( ℓ + u ) 2 f T L f {\displaystyle {\underset {f\in {\mathcal {H}}}{\arg \!\min }}{\frac {1}{\ell }}\sum _{i=1}^{\ell }V(f(x_{i}),y_{i})+\gamma _{A}\left\|f\right\|_{K}^{2}+{\frac {\gamma _{I}}{(\ell +u)^{2}}}\mathbf {f} ^{\mathrm {T} }L\mathbf {f} } As with other kernel methods, H {\displaystyle {\mathcal {H}}} may be an infinite-dimensional space, so if the regularization expression cannot be solved explicitly, it is impossible to search the entire space for a solution. Instead, a representer theorem shows that under certain conditions on the choice of the norm ‖ f ‖ I {\displaystyle \left\|f\right\|_{I}} , the optimal solution f ∗ {\displaystyle f^{}} must be a linear combination of the kernel centered at each of the input points: for some weights α i {\displaystyle \alpha _{i}} , f ∗ ( x ) = ∑ i = 1 ℓ + u α i K ( x i , x ) {\displaystyle f^{}(x)=\sum _{i=1}^{\ell +u}\alpha _{i}K(x_{i},x)} Using this result, it is possible to search for the optimal solution f ∗ {\displaystyle f^{}} by searching the finite-dimensional space defined by the possible choices of α i {\displaystyle \alpha _{i}} . === Functional approach of the Laplacian norm === The idea beyond the graph-Laplacian is to use neighbors to estimate the Laplacian. This method is akin to local averaging methods, that are known to scale poorly in high-dimensional problems. Indeed, the graph Laplacian is known to suffer from the curse of dimensionality. Luckily, it is possible to leverage expected smoothness of the function to estimate thanks to more advanced functional analysis. This method consists of estimating the Laplacian operator using derivatives of the kernel reading ∂ 1 , j K ( x i , x ) {\displaystyle \partial _{1,j}K(x_{i},x)} where ∂ 1 , j {\displaystyle \partial _{1,j}} denotes the partial derivatives according to the j-th coordinate of the first variable. This second approach to the Laplacian norm is to put in relation with meshfree methods, that contrast with the finite difference method in PDE. == Applications == Manifold regularization can extend a variety of algorithms that can be expressed using Tikhonov regularization, by choosing an appropriate loss function V {\displaystyle V} and hypothesis space H {\displaystyle {\mathcal {H}}} . Two commonly used examples are the families of support vector machines and regularized least squares algorithm
XOR swap algorithm
In computer programming, the exclusive or swap (sometimes shortened to XOR swap) is an algorithm that uses the exclusive or bitwise operation to swap the values of two variables without using the temporary variable which is normally required. The algorithm is primarily a novelty and a way of demonstrating properties of the exclusive or operation. It is sometimes discussed as a program optimization, but there are almost no cases where swapping via exclusive or provides benefit over the standard, obvious technique. == The algorithm == Conventional swapping requires the use of a temporary storage variable. Using the XOR swap algorithm, however, no temporary storage is needed. The algorithm is as follows: Since XOR is a commutative operation, either X XOR Y or Y XOR X can be used interchangeably in any of the foregoing three lines. Note that on some architectures the first operand of the XOR instruction specifies the target location at which the result of the operation is stored, preventing this interchangeability. The algorithm typically corresponds to three machine-code instructions, represented by corresponding pseudocode and assembly instructions in the three rows of the following table: In the above System/370 assembly code sample, R1 and R2 are distinct registers, and each XR operation leaves its result in the register named in the first argument. Using x86 assembly, values X and Y are in registers eax and ebx (respectively), and xor places the result of the operation in the first register (Note: x86 supports XCHG instruction so using triple XOR do not make sense on this architecture). In RISC-V assembly, value X and Y are in registers x10 and x11, and xor places the result of the operation in the first operand. However, in the pseudocode or high-level language version or implementation, the algorithm fails if x and y use the same storage location, since the value stored in that location will be zeroed out by the first XOR instruction, and then remain zero; it will not be "swapped with itself". This is not the same as if x and y have the same values. The trouble only comes when x and y use the same storage location, in which case their values must already be equal. That is, if x and y use the same storage location, then the line: sets x to zero (because x = y so X XOR Y is zero) and sets y to zero (since it uses the same storage location), causing x and y to lose their original values. == Proof of correctness == The binary operation XOR over bit strings of length N {\displaystyle N} exhibits the following properties (where ⊕ {\displaystyle \oplus } denotes XOR): L1. Commutativity: A ⊕ B = B ⊕ A {\displaystyle A\oplus B=B\oplus A} L2. Associativity: ( A ⊕ B ) ⊕ C = A ⊕ ( B ⊕ C ) {\displaystyle (A\oplus B)\oplus C=A\oplus (B\oplus C)} L3. Identity exists: there is a bit string, 0, (of length N) such that A ⊕ 0 = A {\displaystyle A\oplus 0=A} for any A {\displaystyle A} L4. Each element is its own inverse: for each A {\displaystyle A} , A ⊕ A = 0 {\displaystyle A\oplus A=0} . Suppose that we have two distinct registers R1 and R2 as in the table below, with initial values A and B respectively. We perform the operations below in sequence, and reduce our results using the properties listed above. === Linear algebra interpretation === As XOR can be interpreted as binary addition and a pair of bits can be interpreted as a vector in a two-dimensional vector space over the field with two elements, the steps in the algorithm can be interpreted as multiplication by 2×2 matrices over the field with two elements. For simplicity, assume initially that x and y are each single bits, not bit vectors. For example, the step: which also has the implicit: corresponds to the matrix ( 1 1 0 1 ) {\displaystyle \left({\begin{smallmatrix}1&1\\0&1\end{smallmatrix}}\right)} as ( 1 1 0 1 ) ( x y ) = ( x + y y ) . {\displaystyle {\begin{pmatrix}1&1\\0&1\end{pmatrix}}{\begin{pmatrix}x\\y\end{pmatrix}}={\begin{pmatrix}x+y\\y\end{pmatrix}}.} The sequence of operations is then expressed as: ( 1 1 0 1 ) ( 1 0 1 1 ) ( 1 1 0 1 ) = ( 0 1 1 0 ) {\displaystyle {\begin{pmatrix}1&1\\0&1\end{pmatrix}}{\begin{pmatrix}1&0\\1&1\end{pmatrix}}{\begin{pmatrix}1&1\\0&1\end{pmatrix}}={\begin{pmatrix}0&1\\1&0\end{pmatrix}}} (working with binary values, so 1 + 1 = 0 {\displaystyle 1+1=0} ), which expresses the elementary matrix of switching two rows (or columns) in terms of the transvections (shears) of adding one element to the other. To generalize to where X and Y are not single bits, but instead bit vectors of length n, these 2×2 matrices are replaced by 2n×2n block matrices such as ( I n I n 0 I n ) . {\displaystyle \left({\begin{smallmatrix}I_{n}&I_{n}\\0&I_{n}\end{smallmatrix}}\right).} These matrices are operating on values, not on variables (with storage locations), hence this interpretation abstracts away from issues of storage location and the problem of both variables sharing the same storage location. == Code example == A C function that implements the XOR swap algorithm: The code first checks if the addresses are distinct and uses a guard clause to exit the function early if they are equal. Without that check, if they were equal, the algorithm would fold to a triple x ^= x resulting in zero. == Reasons for avoidance in practice == On modern CPU architectures, the XOR technique can be slower than using a temporary variable to do swapping. At least on recent x86 CPUs, both by AMD and Intel, moving between registers regularly incurs zero latency. (This is called MOV-elimination.) Even if there is not any architectural register available to use, the XCHG instruction will be at least as fast as the three XORs taken together. Another reason is that modern CPUs strive to execute instructions in parallel via instruction pipelines. In the XOR technique, the inputs to each operation depend on the results of the previous operation, so they must be executed in strictly sequential order, negating any benefits of instruction-level parallelism. === Aliasing === The XOR swap is also complicated in practice by aliasing. If an attempt is made to XOR-swap the contents of some location with itself, the result is that the location is zeroed out and its value lost. Therefore, XOR swapping must not be used blindly in a high-level language if aliasing is possible. This issue does not apply if the technique is used in assembly to swap the contents of two registers. Similar problems occur with call by name, as in Jensen's Device, where swapping i and A[i] via a temporary variable yields incorrect results due to the arguments being related: swapping via temp = i; i = A[i]; A[i] = temp changes the value for i in the second statement, which then results in the incorrect i value for A[i] in the third statement. == Variations == The underlying principle of the XOR swap algorithm can be applied to any operation meeting criteria L1 through L4 above. Replacing XOR by addition and subtraction gives various slightly different, but largely equivalent, formulations. For example: Unlike the XOR swap, this variation requires that the underlying processor or programming language uses a method such as modular arithmetic or bignums to guarantee that the computation of X + Y cannot cause an error due to integer overflow. Therefore, it is seen even more rarely in practice than the XOR swap. However, the implementation of AddSwap above in the C programming language always works even in case of integer overflow, since, according to the C standard, addition and subtraction of unsigned integers follow the rules of modular arithmetic, i. e. are done in the cyclic group Z / 2 s Z {\displaystyle \mathbb {Z} /2^{s}\mathbb {Z} } where s {\displaystyle s} is the number of bits of unsigned int. Indeed, the correctness of the algorithm follows from the fact that the formulas ( x + y ) − y = x {\displaystyle (x+y)-y=x} and ( x + y ) − ( ( x + y ) − y ) = y {\displaystyle (x+y)-((x+y)-y)=y} hold in any abelian group. This generalizes the proof for the XOR swap algorithm: XOR is both the addition and subtraction in the abelian group ( Z / 2 Z ) s {\displaystyle (\mathbb {Z} /2\mathbb {Z} )^{s}} (which is the direct sum of s copies of Z / 2 Z {\displaystyle \mathbb {Z} /2\mathbb {Z} } ). This doesn't hold when dealing with the signed int type (the default for int). Signed integer overflow is an undefined behavior in C and thus modular arithmetic is not guaranteed by the standard, which may lead to incorrect results. The sequence of operations in AddSwap can be expressed via matrix multiplication as: ( 1 − 1 0 1 ) ( 1 0 1 − 1 ) ( 1 1 0 1 ) = ( 0 1 1 0 ) {\displaystyle {\begin{pmatrix}1&-1\\0&1\end{pmatrix}}{\begin{pmatrix}1&0\\1&-1\end{pmatrix}}{\begin{pmatrix}1&1\\0&1\end{pmatrix}}={\begin{pmatrix}0&1\\1&0\end{pmatrix}}} == Application to register allocation == On architectures lacking a dedicated swap instruction, because it avoids the extra temporary register, the XOR swap algorithm is required for optimal register allocatio