Grokipedia is an AI-generated online encyclopedia operated by the American company xAI. The site was launched on October 27, 2025. Some entries are generated by Grok, a large language model owned by the same company, while others were forked from Wikipedia, with some altered and some used nearly verbatim. Articles cannot be directly edited, though logged-in visitors to the encyclopedia can suggest new articles or corrections via a pop-up form, which are reviewed by Grok. The xAI founder Elon Musk suggested Grokipedia could be an alternative to Wikipedia that would "purge out the propaganda" he believes is promoted by the latter, describing Wikipedia as "woke" and an "extension of legacy media propaganda". External analysis of Grokipedia's content has focused on its accuracy and biases due to hallucinations and potential algorithmic bias, which reviewers have described as promoting right-wing perspectives and Musk's views. The majority of coverage has described the website as validating, promoting, and legitimizing a variety of debunked conspiracy theories and ideas against scientific consensus on topics such as HIV/AIDS denialism, vaccines and autism, climate change, and race and intelligence. The site has been accused of whitewashing far-right extremism, such as by falsely claiming a white genocide is actively occurring. Several right-wing figures have welcomed the site. Studies have highlighted its use of sources deemed as having very low credibility such as X conversations and neo-Nazi websites, and for writing about far-right figures and topics in a promotional manner. == Background == Wikipedia is an online encyclopedia written and maintained by a community of volunteers. Its possible bias has been studied and debated. In 2018, Haaretz noted "Wikipedia has succeeded in being accused of being both too liberal and too conservative, and has critics from across the spectrum". xAI is an American AI company founded by Elon Musk in 2023. Its flagship product is the family of large language models called Grok. == History == In 2021, Musk expressed affection for Wikipedia on its 20th anniversary. In 2022, however, Musk argued that Wikipedia was "losing its objectivity", and in 2023, said he would donate US$1 billion to the project if it was pejoratively renamed "Dickipedia". In December 2024, Musk called for a boycott of donations to Wikipedia over its perceived left-wing bias, calling it "Wokepedia". In January 2025, Musk made a series of statements on Twitter denouncing Wikipedia for its description of the incident where he made a controversial gesture, which many viewed as resembling a Nazi salute, at president Donald Trump's second inauguration. Musk has since positioned Grokipedia as an alternative to Wikipedia that would "purge out the propaganda" in the latter, with Musk describing Wikipedia as "woke" and an "extension of legacy media propaganda". === Idea and announcement === In September 2025, Musk spoke at the All-In podcast conference with David O. Sacks, the White House advisor on AI and cryptocurrency, about how Grok consumed data from Wikipedia and other sources to gain more complete knowledge of the world. Sacks suggested publishing its knowledge base as an artifact called "Grokipedia", saying "Wikipedia is so biased, it's a constant war". Following the conversation, Musk announced that xAI was building a new AI-generated online encyclopedia called Grokipedia. According to Musk's announcement, it would be an AI-powered knowledge base designed to rival Wikipedia by addressing its perceived biases, errors, and ideological slants. The project positioned itself within a history of ideologically driven alternatives to Wikipedia, such as the conservative Conservapedia (launched in 2006) and the Russian-government-friendly Ruwiki (launched in 2023). However, Grokipedia is distinct in its core reliance on artificial intelligence rather than human community editing. === Launch and traffic === On October 6, 2025, Musk announced that the early version of Grokipedia was scheduled for release in two weeks, but the project was postponed briefly to address content quality issues. It launched on October 27, 2025, labeled "v 0.1", with over 800,000 articles, compared to over seven million English Wikipedia articles as of September 1, 2025. According to an initial analysis of usage figures by Similarweb, which evaluates data from registered users and partners, Grokipedia recorded a peak of over 460,000 website visits in the US on October 28, 2025. After that, traffic dropped significantly and settled at around 35,000 visits per day between November 8 and 11, 2025. As of early 2026, it had over 5.6 million articles. In January 2026, The Guardian reported that GPT-5.2 frequently cited Grokipedia as a source in responses, raising concerns of misinformation on ChatGPT. The same month, The Verge reported that Google's AI Overviews, AI Mode, and Gemini language model, as well as Microsoft Copilot and Perplexity AI, used Grokipedia to answer niche, obscure, or highly specific factual questions or "non-sensitive queries." According to a case study published by SEO Engico, the site received only 19 clicks from Google Search in November 2025 but reached approximately 3.2 million monthly clicks by January 2026, with over 900,000 pages indexed and millions of ranking keywords. Analysts attributed the surge in part to the site's technical structure and large-scale AI-generated content production. In early February 2026, Grokipedia's visibility in Google Search declined sharply. SEO analysts, including Glenn Gabe and Malte Landwehr, reported a significant drop in rankings across Google organic results as well as in Google AI Overviews and AI Mode. The same case study cited independent reviews that identified citation quality concerns, including references to low-credibility sources and instances of self-citation. By mid-February 2026, Grokipedia had reportedly lost much of its previous search visibility, and Wikipedia ranked above it for searches related to its own name. === Updates === ==== Future ==== In November 2025, Musk announced that he eventually plans to change the name of the site to Encyclopedia Galactica when Grokipedia is "good enough", saying that it had a "long way to go". This name is taken from the publication of that title in the works of Isaac Asimov and Douglas Adams. Musk said that he hoped to send copies of the encyclopedia to "the Moon and Mars and out to deep space". == Content == The Grok large language model generates and fact-checks articles on Grokipedia. Users cannot directly edit Grokipedia articles, but logged-in users can suggest edits and report errors, with such submissions being reviewed and implemented by the Grok AI. Some articles are nearly identical to their Wikipedia entries, but the format of Grokipedia citations is different, and some Grokipedia articles were republished almost verbatim, accompanied by a disclaimer noting that the content was "adapted from Wikipedia" under a Creative Commons license. Others were completely rewritten from scratch using Musk's AI chatbot, Grok. Forbes identified the articles AMD, Lamborghini, and PlayStation 5 as examples of copied Wikipedia articles. Articles attributed to Wikipedia carry a Creative Commons Attribution-ShareAlike license, while the license of other articles is licensed under the "X Community License", a license that accepts reuse and remixing for "non-commercial and research purposes" and commercial use that abides to "all of the guardrails provided in xAI's Acceptable Use Policy". On October 31, 2025, Musk clarified that the duplication of Wikipedia articles was intentional, saying that the Grokipedia team instructed Grok to compile Wikipedia's top 1 million articles and make content changes to them. The site's design has been described as minimalist with a simple homepage including little more than a large search bar. In a comparative textual analysis of the most heavily edited matched article pairs from Grokipedia and Wikipedia, Grokipedia entries are substantially longer and less densely referenced, indicating that AI-produced encyclopedias prioritize exposition rather than source-based validation. Starting in version 0.2, Grok reviews and implements approved suggested edits, and a small panel rotates through a display of the names of several recently edited articles. In February 2026, the Columbia Journalism Review reported on an analysis by the Tow Center for Digital Journalism finding that Grok, the AI behind Grokipedia, had increasingly begun suggesting and approving edits to the site itself without human involvement. According to the report, AI-generated edit suggestions overtook human submissions in December 2025 and accounted for more than three-quarters of proposed changes. The analysis raised concerns about transparency, editorial oversight, and fact-checking standards, particularly after instances in which Grok proposed or modified politically s
Manifold regularization
In machine learning, manifold regularization is a technique for using the shape of a dataset to constrain the functions that should be learned on that dataset. In many machine learning problems, the data to be learned do not cover the entire input space. For example, a facial recognition system may not need to classify any possible image, but only the subset of images that contain faces. The technique of manifold learning assumes that the relevant subset of data comes from a manifold, a mathematical structure with useful properties. The technique also assumes that the function to be learned is smooth: data with different labels are not likely to be close together, and so the labeling function should not change quickly in areas where there are likely to be many data points. Because of this assumption, a manifold regularization algorithm can use unlabeled data to inform where the learned function is allowed to change quickly and where it is not, using an extension of the technique of Tikhonov regularization. Manifold regularization algorithms can extend supervised learning algorithms in semi-supervised learning and transductive learning settings, where unlabeled data are available. The technique has been used for applications including medical imaging, geographical imaging, and object recognition. == Manifold regularizer == === Motivation === Manifold regularization is a type of regularization, a family of techniques that reduces overfitting and ensures that a problem is well-posed by penalizing complex solutions. In particular, manifold regularization extends the technique of Tikhonov regularization as applied to Reproducing kernel Hilbert spaces (RKHSs). Under standard Tikhonov regularization on RKHSs, a learning algorithm attempts to learn a function f {\displaystyle f} from among a hypothesis space of functions H {\displaystyle {\mathcal {H}}} . The hypothesis space is an RKHS, meaning that it is associated with a kernel K {\displaystyle K} , and so every candidate function f {\displaystyle f} has a norm ‖ f ‖ K {\displaystyle \left\|f\right\|_{K}} , which represents the complexity of the candidate function in the hypothesis space. When the algorithm considers a candidate function, it takes its norm into account in order to penalize complex functions. Formally, given a set of labeled training data ( x 1 , y 1 ) , … , ( x ℓ , y ℓ ) {\displaystyle (x_{1},y_{1}),\ldots ,(x_{\ell },y_{\ell })} with x i ∈ X , y i ∈ Y {\displaystyle x_{i}\in X,y_{i}\in Y} and a loss function V {\displaystyle V} , a learning algorithm using Tikhonov regularization will attempt to solve the expression arg min f ∈ H 1 ℓ ∑ i = 1 ℓ V ( f ( x i ) , y i ) + γ ‖ f ‖ K 2 {\displaystyle {\underset {f\in {\mathcal {H}}}{\arg \!\min }}{\frac {1}{\ell }}\sum _{i=1}^{\ell }V(f(x_{i}),y_{i})+\gamma \left\|f\right\|_{K}^{2}} where γ {\displaystyle \gamma } is a hyperparameter that controls how much the algorithm will prefer simpler functions over functions that fit the data better. Manifold regularization adds a second regularization term, the intrinsic regularizer, to the ambient regularizer used in standard Tikhonov regularization. Under the manifold assumption in machine learning, the data in question do not come from the entire input space X {\displaystyle X} , but instead from a nonlinear manifold M ⊂ X {\displaystyle M\subset X} . The geometry of this manifold, the intrinsic space, is used to determine the regularization norm. === Laplacian norm === There are many possible choices for the intrinsic regularizer ‖ f ‖ I {\displaystyle \left\|f\right\|_{I}} . Many natural choices involve the gradient on the manifold ∇ M {\displaystyle \nabla _{M}} , which can provide a measure of how smooth a target function is. A smooth function should change slowly where the input data are dense; that is, the gradient ∇ M f ( x ) {\displaystyle \nabla _{M}f(x)} should be small where the marginal probability density P X ( x ) {\displaystyle {\mathcal {P}}_{X}(x)} , the probability density of a randomly drawn data point appearing at x {\displaystyle x} , is large. This gives one appropriate choice for the intrinsic regularizer: ‖ f ‖ I 2 = ∫ x ∈ M ‖ ∇ M f ( x ) ‖ 2 d P X ( x ) {\displaystyle \left\|f\right\|_{I}^{2}=\int _{x\in M}\left\|\nabla _{M}f(x)\right\|^{2}\,d{\mathcal {P}}_{X}(x)} In practice, this norm cannot be computed directly because the marginal distribution P X {\displaystyle {\mathcal {P}}_{X}} is unknown, but it can be estimated from the provided data. === Graph-based approach of the Laplacian norm === When the distances between input points are interpreted as a graph, then the Laplacian matrix of the graph can help to estimate the marginal distribution. Suppose that the input data include ℓ {\displaystyle \ell } labeled examples (pairs of an input x {\displaystyle x} and a label y {\displaystyle y} ) and u {\displaystyle u} unlabeled examples (inputs without associated labels). Define W {\displaystyle W} to be a matrix of edge weights for a graph, where W i j {\displaystyle W_{ij}} is a similarity built from distance measure between the data points x i {\displaystyle x_{i}} and x j {\displaystyle x_{j}} (so that more close implies higher W i j {\displaystyle W_{ij}} ). Define D {\displaystyle D} to be a diagonal matrix with D i i = ∑ j = 1 ℓ + u W i j {\displaystyle D_{ii}=\sum _{j=1}^{\ell +u}W_{ij}} and L {\displaystyle L} to be the Laplacian matrix D − W {\displaystyle D-W} . Then, as the number of data points ℓ + u {\displaystyle \ell +u} increases, L {\displaystyle L} converges to the Laplace–Beltrami operator Δ M {\displaystyle \Delta _{M}} , which is the divergence of the gradient ∇ M {\displaystyle \nabla _{M}} . Then, if f {\displaystyle \mathbf {f} } is a vector of the values of f {\displaystyle f} at the data, f = [ f ( x 1 ) , … , f ( x l + u ) ] T {\displaystyle \mathbf {f} =[f(x_{1}),\ldots ,f(x_{l+u})]^{\mathrm {T} }} , the intrinsic norm can be estimated: ‖ f ‖ I 2 = 1 ( ℓ + u ) 2 f T L f {\displaystyle \left\|f\right\|_{I}^{2}={\frac {1}{(\ell +u)^{2}}}\mathbf {f} ^{\mathrm {T} }L\mathbf {f} } As the number of data points ℓ + u {\displaystyle \ell +u} increases, this empirical definition of ‖ f ‖ I 2 {\displaystyle \left\|f\right\|_{I}^{2}} converges to the definition when P X {\displaystyle {\mathcal {P}}_{X}} is known. === Solving the regularization problem with graph-based approach === Using the weights γ A {\displaystyle \gamma _{A}} and γ I {\displaystyle \gamma _{I}} for the ambient and intrinsic regularizers, the final expression to be solved becomes: arg min f ∈ H 1 ℓ ∑ i = 1 ℓ V ( f ( x i ) , y i ) + γ A ‖ f ‖ K 2 + γ I ( ℓ + u ) 2 f T L f {\displaystyle {\underset {f\in {\mathcal {H}}}{\arg \!\min }}{\frac {1}{\ell }}\sum _{i=1}^{\ell }V(f(x_{i}),y_{i})+\gamma _{A}\left\|f\right\|_{K}^{2}+{\frac {\gamma _{I}}{(\ell +u)^{2}}}\mathbf {f} ^{\mathrm {T} }L\mathbf {f} } As with other kernel methods, H {\displaystyle {\mathcal {H}}} may be an infinite-dimensional space, so if the regularization expression cannot be solved explicitly, it is impossible to search the entire space for a solution. Instead, a representer theorem shows that under certain conditions on the choice of the norm ‖ f ‖ I {\displaystyle \left\|f\right\|_{I}} , the optimal solution f ∗ {\displaystyle f^{}} must be a linear combination of the kernel centered at each of the input points: for some weights α i {\displaystyle \alpha _{i}} , f ∗ ( x ) = ∑ i = 1 ℓ + u α i K ( x i , x ) {\displaystyle f^{}(x)=\sum _{i=1}^{\ell +u}\alpha _{i}K(x_{i},x)} Using this result, it is possible to search for the optimal solution f ∗ {\displaystyle f^{}} by searching the finite-dimensional space defined by the possible choices of α i {\displaystyle \alpha _{i}} . === Functional approach of the Laplacian norm === The idea beyond the graph-Laplacian is to use neighbors to estimate the Laplacian. This method is akin to local averaging methods, that are known to scale poorly in high-dimensional problems. Indeed, the graph Laplacian is known to suffer from the curse of dimensionality. Luckily, it is possible to leverage expected smoothness of the function to estimate thanks to more advanced functional analysis. This method consists of estimating the Laplacian operator using derivatives of the kernel reading ∂ 1 , j K ( x i , x ) {\displaystyle \partial _{1,j}K(x_{i},x)} where ∂ 1 , j {\displaystyle \partial _{1,j}} denotes the partial derivatives according to the j-th coordinate of the first variable. This second approach to the Laplacian norm is to put in relation with meshfree methods, that contrast with the finite difference method in PDE. == Applications == Manifold regularization can extend a variety of algorithms that can be expressed using Tikhonov regularization, by choosing an appropriate loss function V {\displaystyle V} and hypothesis space H {\displaystyle {\mathcal {H}}} . Two commonly used examples are the families of support vector machines and regularized least squares algorithm
Tokken
Tokken is a payment system and mobile app most known for being a legal and secure option for businesses transactions within the cannabis industry, because of its compliance with bank requirements. The startup company was created by Lamine Zarrad, a former regulator at the Office of the Comptroller of the Currency. == Operability == In order for a person to start using the app, they need to provide evidence, in the form of bioidentification data and mobile carrier records, that they can legally purchase weed. After they have been verified, customers can pay directly through the app at any dispensary that is using Tokken. Tokken turns credit card transactions into a digital token, which can be exchanged back for money that can later be deposited into a bank account. All transactions are logged publicly through a blockchain leger, making the process both anonymous and verified. === Banking services === Tokken has a "pay taxes" function which enables dispensaries to pay their taxes directly to the department.
Digital supply chain security
Digital supply chain security refers to efforts to enhance cyber security within the supply chain. It is a subset of supply chain security and is focused on the management of cyber security requirements for information technology systems, software and networks, which are driven by threats such as cyber-terrorism, malware, data theft and the advanced persistent threat (APT). Typical supply chain cyber security activities for minimizing risks include buying only from trusted vendors, disconnecting critical machines from outside networks, and educating users on the threats and protective measures they can take. The acting deputy undersecretary for the National Protection and Programs Directorate for the United States Department of Homeland Security, Greg Schaffer, stated at a hearing that he is aware that there are instances where malware has been found on imported electronic and computer devices sold within the United States. == Examples of supply chain cyber security threats == Network or computer hardware that is delivered with malware installed on it already. Malware that is inserted into software or hardware (by various means) Vulnerabilities in software applications and networks within the supply chain that are discovered by malicious hackers Counterfeit computer hardware == Related U.S. government efforts == Comprehensive National Cyber Initiative Defense Procurement Regulations: Noted in section 806 of the National Defense Authorization Act International Strategy for Cyberspace: White House lays out for the first time the U.S.’s vision for a secure and open Internet. The strategy outlines three main themes: diplomacy, development and defense. Diplomacy: The strategy sets out to “promote an open, interoperable, secure and reliable information and communication infrastructure” by establishing norms of acceptable state behavior built through consensus among nations. Development: Through this strategy the government seeks to “facilitate cybersecurity capacity-building abroad, bilaterally and through multilateral organizations.” The objective is to protect the global IT infrastructure and to build closer international partnerships to sustain open and secure networks. Defense: The strategy calls out that the government “will ensure that the risks associated with attacking or exploiting our networks vastly outweigh the potential benefits” and calls for all nations to investigate, apprehend and prosecute criminals and non-state actors who intrude and disrupt network systems. == Related government efforts around the world == Common Criteria offers with Evaluation Assurance Level(EAL) 4 an opportunity to evaluate all relevant aspects of the digital supply chain security like the product, the development environment, IT systems security, the processes in human resource, physical security and with the module ALC_FLR.3 (Systematic Flaw Remediation) also security update processes and methods even by physical site visits. EAL 4 is mutually recognized in countries that signed the SOGIS-MRA and up to ELA 2 in countries the signed the CCRA but including ALC_FRL.3. Russia: Russia has had non-disclosed functionality certification requirements for several years and has recently initiated the National Software Platform effort based on open-source software. This reflects the apparent desire for national autonomy, reducing dependence on foreign suppliers. India: Recognition of supply chain risk in its draft National Cybersecurity Strategy. Rather than targeting specific products for exclusion, it is considering Indigenous Innovation policies, giving preferences to domestic ITC suppliers in order to create a robust, globally competitive national presence in the sector. China: Deriving from goals in the 11th Five Year Plan (2006–2010), China introduced and pursued a mix of security-focused and aggressive Indigenous Innovation policies. China is requiring an indigenous innovation product catalog be used for its government procurement and implementing a Multi-level Protection Scheme (MLPS) which requires (among other things) product developers and manufacturers to be Chinese citizens or legal persons, and product core technology and key components must have independent Chinese or indigenous intellectual property rights. == Private sector efforts == SLSA (Supply-chain Levels for Software Artifacts) is an end-to-end framework for ensuring the integrity of software artifacts throughout the software supply chain. The requirements are inspired by Google’s internal "Binary Authorization for Borg" that has been in use for the past 8+ years and that is mandatory for all of Google's production workloads. The goal of SLSA is to improve the state of the industry, particularly open source, to defend against the most pressing integrity threats. With SLSA, consumers can make informed choices about the security posture of the software they consume. == Other references == Financial Sector Information Sharing and Analysis Center International Strategy for Cyberspace (from the White House) NSTIC SafeCode Whitepaper Archived 2013-10-21 at the Wayback Machine Trusted Technology Forum and the Open Trusted Technology Provider Standard (O-TTPS) Archived 2012-01-03 at the Wayback Machine Cyber Supply Chain Security Solution Malware Implants in Firmware Supply Chain in the Software Era INFORMATION AND COMMUNICATIONS TECHNOLOGY SUPPLY CHAIN RISK MANAGEMENT TASK FORCE: INTERIM REPORT
Order-independent transparency
Order-independent transparency (OIT) is a class of techniques in rasterisational computer graphics for rendering transparency in a 3D scene, which do not require rendering geometry in sorted order for alpha compositing. == Description == Commonly, 3D geometry with transparency is rendered by blending (using alpha compositing) all surfaces into a single buffer (think of this as a canvas). Each surface occludes existing color and adds some of its own color depending on its alpha value, a ratio of light transmittance. The order in which surfaces are blended affects the total occlusion or visibility of each surface. For a correct result, surfaces must be blended from farthest to nearest or nearest to farthest, depending on the alpha compositing operation, over or under. Ordering may be achieved by rendering the geometry in sorted order, for example sorting triangles by depth, but can take a significant amount of time, not always produce a solution (in the case of intersecting or circularly overlapping geometry) and the implementation is complex. Instead, order-independent transparency sorts geometry per-pixel, after rasterisation. For exact results this requires storing all fragments before sorting and compositing. == History == The A-buffer is a computer graphics technique introduced in 1984 which stores per-pixel lists of fragment data (including micro-polygon information) in a software rasteriser, REYES, originally designed for anti-aliasing but also supporting transparency. More recently, depth peeling in 2001 described a hardware accelerated OIT technique. With limitations in graphics hardware the scene's geometry had to be rendered many times. A number of techniques have followed, to improve on the performance of depth peeling, still with the many-pass rendering limitation. For example, Dual Depth Peeling (2008). In 2009, two significant features were introduced in GPU hardware/drivers/Graphics APIs that allowed capturing and storing fragment data in a single rendering pass of the scene, something not previously possible. These are, the ability to write to arbitrary GPU memory from shaders and atomic operations. With these features a new class of OIT techniques became possible that do not require many rendering passes of the scene's geometry. The first was storing the fragment data in a 3D array, where fragments are stored along the z dimension for each pixel x/y. In practice, most of the 3D array is unused or overflows, as a scene's depth complexity is typically uneven. To avoid overflow the 3D array requires large amounts of memory, which in many cases is impractical. Two approaches to reducing this memory overhead exist. Packing the 3D array with a prefix sum scan, or linearizing, removed the unused memory issue but requires an additional depth complexity computation rendering pass of the geometry. The "Sparsity-aware" S-Buffer, Dynamic Fragment Buffer, "deque" D-Buffer, Linearized Layered Fragment Buffer all pack fragment data with a prefix sum scan and are demonstrated with OIT. Storing fragments in per-pixel linked lists provides tight packing of this data and in late 2011, driver improvements reduced the atomic operation contention overhead making the technique very competitive. == Exact OIT == Exact, as opposed to approximate, OIT accurately computes the final color, for which all fragments must be sorted. For high depth complexity scenes, sorting becomes the bottleneck. One issue with the sorting stage is local memory limited occupancy, in this case a SIMT attribute relating to the throughput and operation latency hiding of GPUs. Backwards memory allocation (BMA) groups pixels by their depth complexity and sorts them in batches to improve the occupancy and hence performance of low depth complexity pixels in the context of a potentially high depth complexity scene. Up to a 3× overall OIT performance increase is reported. Sorting is typically performed in a local array, however performance can be improved further by making use of the GPU's memory hierarchy and sorting in registers, similarly to an external merge sort, especially in conjunction with BMA. == Approximate OIT == Approximate OIT techniques relax the constraint of exact rendering to provide faster results. Higher performance can be gained from not having to store all fragments or only partially sorting the geometry. A number of techniques also compress, or reduce, the fragment data. These include: Stochastic Transparency: draw in a higher resolution in full opacity but discard some fragments. Downsampling will then yield transparency. Adaptive Transparency, a two-pass technique where the first constructs a visibility function which compresses on the fly (this compression avoids having to fully sort the fragments) and the second uses this data to composite unordered fragments. Intel's pixel synchronization avoids the need to store all fragments, removing the unbounded memory requirement of many other OIT techniques. Weighted Blended Order-Independent Transparency replaced the over operator with a commutative approximation. Feeding depth information into the weight produces visually-acceptable occlusion. == OIT in Hardware == The Sega Dreamcast games console included hardware support for automatic OIT.
Similarity learning
Similarity learning is an area of supervised machine learning in artificial intelligence. It is closely related to regression and classification, but the goal is to learn a similarity function that measures how similar or related two objects are. It has applications in ranking, in recommendation systems, visual identity tracking, face verification, and speaker verification. == Learning setup == There are four common setups for similarity and metric distance learning. Regression similarity learning In this setup, pairs of objects are given ( x i 1 , x i 2 ) {\displaystyle (x_{i}^{1},x_{i}^{2})} together with a measure of their similarity y i ∈ R {\displaystyle y_{i}\in R} . The goal is to learn a function that approximates f ( x i 1 , x i 2 ) ∼ y i {\displaystyle f(x_{i}^{1},x_{i}^{2})\sim y_{i}} for every new labeled triplet example ( x i 1 , x i 2 , y i ) {\displaystyle (x_{i}^{1},x_{i}^{2},y_{i})} . This is typically achieved by minimizing a regularized loss min W ∑ i l o s s ( w ; x i 1 , x i 2 , y i ) + r e g ( w ) {\displaystyle \min _{W}\sum _{i}loss(w;x_{i}^{1},x_{i}^{2},y_{i})+reg(w)} . Classification similarity learning Given are pairs of similar objects ( x i , x i + ) {\displaystyle (x_{i},x_{i}^{+})} and non similar objects ( x i , x i − ) {\displaystyle (x_{i},x_{i}^{-})} . An equivalent formulation is that every pair ( x i 1 , x i 2 ) {\displaystyle (x_{i}^{1},x_{i}^{2})} is given together with a binary label y i ∈ { 0 , 1 } {\displaystyle y_{i}\in \{0,1\}} that determines if the two objects are similar or not. The goal is again to learn a classifier that can decide if a new pair of objects is similar or not. Ranking similarity learning Given are triplets of objects ( x i , x i + , x i − ) {\displaystyle (x_{i},x_{i}^{+},x_{i}^{-})} whose relative similarity obey a predefined order: x i {\displaystyle x_{i}} is known to be more similar to x i + {\displaystyle x_{i}^{+}} than to x i − {\displaystyle x_{i}^{-}} . The goal is to learn a function f {\displaystyle f} such that for any new triplet of objects ( x , x + , x − ) {\displaystyle (x,x^{+},x^{-})} , it obeys f ( x , x + ) > f ( x , x − ) {\displaystyle f(x,x^{+})>f(x,x^{-})} (contrastive learning). This setup assumes a weaker form of supervision than in regression, because instead of providing an exact measure of similarity, one only has to provide the relative order of similarity. For this reason, ranking-based similarity learning is easier to apply in real large-scale applications. Locality sensitive hashing (LSH) Hashes input items so that similar items map to the same "buckets" in memory with high probability (the number of buckets being much smaller than the universe of possible input items). It is often applied in nearest neighbor search on large-scale high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases. A common approach for learning similarity is to model the similarity function as a bilinear form. For example, in the case of ranking similarity learning, one aims to learn a matrix W that parametrizes the similarity function f W ( x , z ) = x T W z {\displaystyle f_{W}(x,z)=x^{T}Wz} . When data is abundant, a common approach is to learn a siamese network – a deep network model with parameter sharing. == Metric learning == Similarity learning is closely related to distance metric learning. Metric learning is the task of learning a distance function over objects. A metric or distance function has to obey four axioms: non-negativity, identity of indiscernibles, symmetry and subadditivity (or the triangle inequality). In practice, metric learning algorithms ignore the condition of identity of indiscernibles and learn a pseudo-metric. When the objects x i {\displaystyle x_{i}} are vectors in R d {\displaystyle R^{d}} , then any matrix W {\displaystyle W} in the symmetric positive semi-definite cone S + d {\displaystyle S_{+}^{d}} defines a distance pseudo-metric of the space of x through the form D W ( x 1 , x 2 ) 2 = ( x 1 − x 2 ) ⊤ W ( x 1 − x 2 ) {\displaystyle D_{W}(x_{1},x_{2})^{2}=(x_{1}-x_{2})^{\top }W(x_{1}-x_{2})} . When W {\displaystyle W} is a symmetric positive definite matrix, D W {\displaystyle D_{W}} is a metric. Moreover, as any symmetric positive semi-definite matrix W ∈ S + d {\displaystyle W\in S_{+}^{d}} can be decomposed as W = L ⊤ L {\displaystyle W=L^{\top }L} where L ∈ R e × d {\displaystyle L\in R^{e\times d}} and e ≥ r a n k ( W ) {\displaystyle e\geq rank(W)} , the distance function D W {\displaystyle D_{W}} can be rewritten equivalently D W ( x 1 , x 2 ) 2 = ( x 1 − x 2 ) ⊤ L ⊤ L ( x 1 − x 2 ) = ‖ L ( x 1 − x 2 ) ‖ 2 2 {\displaystyle D_{W}(x_{1},x_{2})^{2}=(x_{1}-x_{2})^{\top }L^{\top }L(x_{1}-x_{2})=\|L(x_{1}-x_{2})\|_{2}^{2}} . The distance D W ( x 1 , x 2 ) 2 = ‖ x 1 ′ − x 2 ′ ‖ 2 2 {\displaystyle D_{W}(x_{1},x_{2})^{2}=\|x_{1}'-x_{2}'\|_{2}^{2}} corresponds to the Euclidean distance between the transformed feature vectors x 1 ′ = L x 1 {\displaystyle x_{1}'=Lx_{1}} and x 2 ′ = L x 2 {\displaystyle x_{2}'=Lx_{2}} . Many formulations for metric learning have been proposed. Some well-known approaches for metric learning include learning from relative comparisons, which is based on the triplet loss, large margin nearest neighbor, and information theoretic metric learning (ITML). In statistics, the covariance matrix of the data is sometimes used to define a distance metric called Mahalanobis distance. == Applications == Similarity learning is used in information retrieval for learning to rank, in face verification or face identification, and in recommendation systems. Also, many machine learning approaches rely on some metric. This includes unsupervised learning such as clustering, which groups together close or similar objects. It also includes supervised approaches like K-nearest neighbor algorithm which rely on labels of nearby objects to decide on the label of a new object. Metric learning has been proposed as a preprocessing step for many of these approaches. == Scalability == Metric and similarity learning scale quadratically with the dimension of the input space, as can easily see when the learned metric has a bilinear form f W ( x , z ) = x T W z {\displaystyle f_{W}(x,z)=x^{T}Wz} . Scaling to higher dimensions can be achieved by enforcing a sparseness structure over the matrix model, as done with HDSL, and with COMET. == Software == metric-learn is a free software Python library which offers efficient implementations of several supervised and weakly-supervised similarity and metric learning algorithms. The API of metric-learn is compatible with scikit-learn. OpenMetricLearning is a Python framework to train and validate the models producing high-quality embeddings. == Further information == For further information on this topic, see the surveys on metric and similarity learning by Bellet et al. and Kulis.
Whitelist
A whitelist or allowlist is a list or register of entities that are being provided a particular privilege, service, mobility, access or recognition. Entities on the list will be accepted, approved and/or recognized. Whitelisting is the reverse of blacklisting, the practice of identifying entities that are denied, unrecognized, or ostracized. == Email whitelists == Spam filters often include the ability to "whitelist" certain sender IP addresses, email addresses or domain names to protect their email from being rejected or sent to a junk mail folder. These can be manually maintained by the user or system administrator - but can also refer to externally maintained whitelist services. === Non-commercial whitelists === Non-commercial whitelists are operated by various non-profit organizations, ISPs, and others interested in blocking spam. Rather than paying fees, the sender must pass a series of tests; for example, their email server must not be an open relay and have a static IP address. The operator of the whitelist may remove a server from the list if complaints are received. === Commercial whitelists === Commercial whitelists are a system by which an Internet service provider allows someone to bypass spam filters when sending email messages to its subscribers, in return for a pre-paid fee, either an annual or a per-message fee. A sender can then be more confident that their messages have reached recipients without being blocked, or having links or images stripped out of them, by spam filters. The purpose of commercial whitelists is to allow companies to reliably reach their customers by email. == Advertising whitelist == Many websites rely on ads as a source of revenue, but the use of ad blockers is increasingly common. Websites that detect an adblocker in use often ask for it to be disabled - or their site to be "added to the whitelist" - a standard feature of most adblockers. == Network whitelists == === LAN whitelists === A use for whitelists is in local area network (LAN) security. Many network admins set up MAC address whitelists, or a MAC address filter, to control who is allowed on their networks. This is used when encryption is not a practical solution or in tandem with encryption. However, it's sometimes ineffective because a MAC address can be faked. === IP whitelist === Firewalls can usually be configured to only allow data-traffic from/to certain (ranges of) IP-addresses. === Application whitelists === One approach in combating viruses and malware is to whitelist software which is considered safe to run, blocking all others. This is particularly attractive in a corporate environment, where there are typically already restrictions on what software is approved. Leading providers of application whitelisting technology include Bit9, Velox, McAfee, Lumension, ThreatLocker, Airlock Digital and SMAC. On Microsoft Windows, recent versions include AppLocker, which allows administrators to control which executable files are denied or allowed to execute. With AppLocker, administrators are able to create rules based on file names, publishers or file location that will allow certain files to execute. Rules can apply to individuals or groups. Policies are used to group users into different enforcement levels. For example, some users can be added to a report-only policy that will allow administrators to understand the impact before moving that user to a higher enforcement level. Linux systems typically have AppArmor and SE Linux features available which can be used to effectively block all applications which are not explicitly whitelisted, and commercial products are also available. On HP-UX introduced a feature called "HP-UX Whitelisting" on 11iv3 version. == Controversy regarding name == In 2018, a journal commentary on a report on predatory publishing was released making claims that "white" and "black" are racially charged terms that need to be avoided in instances such as "whitelist" and "blacklist". The premise of the journal is that "black" and "white" have negative and positive connotations respectively. It states that since "blacklisting" was first referred to during "the time of mass enslavement and forced deportation of Africans to work in European-held colonies in the Americas," the word is therefore related to race. There is no mention of "whitelist" and its origin or relation to race. This issue is most widely disputed in computing industries where "whitelist" and "blacklist" are prevalent (e.g. IP whitelisting). Despite the commentary nature of the journal, some companies and individuals in others have taken to replacing "whitelist" and "blacklist" with new alternatives such as "allow list" and "deny list". Those adopting this change consider using the "whitelist"/"blacklist" names as a code smell. Those that oppose these changes question its attribution to race, citing the same etymology quote that the 2018 journal uses. According to the remark, the term "blacklist" evolved from the term "black book" about a century ago. The term "black book" does not appear to have any etymology or sources that support racial associations, instead originating in the 1400s as a reference to "a list of people who had committed crimes or fallen out of favor with leaders", and popularized by King Henry VIII's literal use of a black book. Others also note the prevalence of positive and negative connotations to "white" and "black" in the Bible, predating attributions to skin tone and slavery. It wasn't until the 1960s Black Power movement that "Black" became a widespread word to refer to one's race as a person of color in America (alternate to African-American) lending itself to the argument that the negative connotation behind "black" and "blacklist" both predate attribution to race.