Two-phase commit protocol

Two-phase commit protocol

In transaction processing, databases, and computer networking, the two-phase commit protocol (2PC, tupac) is a type of atomic commitment protocol (ACP). It is a distributed algorithm that coordinates all the processes that participate in a distributed atomic transaction on whether to commit or abort (roll back) the transaction. This protocol (a specialised type of consensus protocol) achieves its goal even in many cases of temporary system failure (involving either process, network node, communication, etc. failures), and is thus widely used. However, it is not resilient to all possible failure configurations, and in rare cases, manual intervention is needed to remedy an outcome. To accommodate recovery from failure (automatic in most cases) the protocol's participants use logging of the protocol's states. Log records, which are typically slow to generate but survive failures, are used by the protocol's recovery procedures. Many protocol variants exist that primarily differ in logging strategies and recovery mechanisms. Though usually intended to be used infrequently, recovery procedures compose a substantial portion of the protocol, due to many possible failure scenarios to be considered and supported by the protocol. In a "normal execution" of any single distributed transaction (i.e., when no failure occurs, which is typically the most frequent situation), the protocol consists of two phases: The commit-request phase (or voting phase), in which a coordinator process attempts to prepare all the transaction's participating processes (named participants, cohorts, or workers) to take the necessary steps for either committing or aborting the transaction and to vote, either "Yes": commit (if the transaction participant's local portion execution has ended properly), or "No": abort (if a problem has been detected with the local portion), and The commit phase, in which, based on voting of the participants, the coordinator decides whether to commit (only if all have voted "Yes") or abort the transaction (otherwise), and notifies the result to all the participants. The participants then follow with the needed actions (commit or abort) with their local transactional resources (also called recoverable resources; e.g., database data) and their respective portions in the transaction's other output (if applicable). The two-phase commit (2PC) protocol should not be confused with the two-phase locking (2PL) protocol, a concurrency control protocol. == Assumptions == The protocol works in the following manner: one node is a designated coordinator, which is the master site, and the rest of the nodes in the network are designated the participants. The protocol assumes that: there is stable storage at each node with a write-ahead log, no node crashes forever, the data in the write-ahead log is never lost or corrupted in a crash, and any two nodes can communicate with each other. The last assumption is not too restrictive, as network communication can typically be rerouted. The first two assumptions are much stronger; if a node is totally destroyed then data can be lost. The protocol is initiated by the coordinator after the last step of the transaction has been reached. The participants then respond with an agreement message or an abort message depending on whether the transaction has been processed successfully at the participant. == Basic algorithm == === Commit request (or voting) phase === The coordinator sends a query to commit message to all participants and waits until it has received a reply from all participants. The participants execute the transaction up to the point where they will be asked to commit. They each write an entry to their undo log and an entry to their redo log. Each participant replies with: either an agreement message (participant votes Yes to commit), if the participant's actions succeeded; or an abort message (participant votes No to commit), if the participant experiences a failure that will make it impossible to commit. === Commit (or completion) phase === ==== Success ==== If the coordinator received an agreement message from all participants during the commit-request phase: The coordinator sends a commit message to all the participants. Each participant completes the operation, and releases all the locks and resources held during the transaction. Each participant sends an acknowledgement to the coordinator. The coordinator completes the transaction when all acknowledgements have been received. ==== Failure ==== If any participant votes No during the commit-request phase (or the coordinator's timeout expires): The coordinator sends a rollback message to all the participants. Each participant undoes the transaction using the undo log, and releases the resources and locks held during the transaction. Each participant sends an acknowledgement to the coordinator. The coordinator undoes the transaction when all acknowledgements have been received. ==== Message flow ==== Coordinator Participant QUERY TO COMMIT --------------------------------> VOTE YES/NO prepare/abort <------------------------------- commit/abort COMMIT/ROLLBACK --------------------------------> ACKNOWLEDGEMENT commit/abort <-------------------------------- end An next to the record type means that the record is forced to stable storage. == Disadvantages == The greatest disadvantage of the two-phase commit protocol is that it is a blocking protocol. If the coordinator fails permanently, some participants will never resolve their transactions: After a participant has sent an agreement message as a response to the commit-request message from the coordinator, it will block until a commit or rollback is received. A two-phase commit protocol cannot dependably recover from a failure of both the coordinator and a cohort member during the commit phase. If only the coordinator had failed, and no cohort members had received a commit message, it could safely be inferred that no commit had happened. If, however, both the coordinator and a cohort member failed, it is possible that the failed cohort member was the first to be notified, and had actually done the commit. Even if a new coordinator is selected, it cannot confidently proceed with the operation until it has received an agreement from all cohort members, and hence must block until all cohort members respond. == Implementing the two-phase commit protocol == === Common architecture === In many cases the 2PC protocol is distributed in a computer network. It is easily distributed by implementing multiple dedicated 2PC components similar to each other, typically named transaction managers (TMs; also referred to as 2PC agents or Transaction Processing Monitors), that carry out the protocol's execution for each transaction (e.g., The Open Group's X/Open XA). The databases involved with a distributed transaction, the participants, both the coordinator and participants, register to close TMs (typically residing on respective same network nodes as the participants) for terminating that transaction using 2PC. Each distributed transaction has an ad hoc set of TMs, the TMs to which the transaction participants register. A leader, the coordinator TM, exists for each transaction to coordinate 2PC for it, typically the TM of the coordinator database. However, the coordinator role can be transferred to another TM for performance or reliability reasons. Rather than exchanging 2PC messages among themselves, the participants exchange the messages with their respective TMs. The relevant TMs communicate among themselves to execute the 2PC protocol schema above, "representing" the respective participants, for terminating that transaction. With this architecture the protocol is fully distributed (does not need any central processing component or data structure), and scales up with number of network nodes (network size) effectively. This common architecture is also effective for the distribution of other atomic commitment protocols besides 2PC, since all such protocols use the same voting mechanism and outcome propagation to protocol participants. === Protocol optimizations === Database research has been done on ways to get most of the benefits of the two-phase commit protocol while reducing costs by protocol optimizations and protocol operations saving under certain system's behavior assumptions. ==== Presumed abort and presumed commit ==== Presumed abort or Presumed commit are common such optimizations. An assumption about the outcome of transactions, either commit, or abort, can save both messages and logging operations by the participants during the 2PC protocol's execution. For example, when presumed abort, if during system recovery from failure no logged evidence for commit of some transaction is found by the recovery procedure, then it assumes that the transaction has been aborted, and acts accordingly. This means that it does not matter if aborts are logged at all, and such logging can be saved under this assumption. Typical

Vote Compass

Vote Compass is an interactive, online voting advice application developed by political scientists and run during election campaigns. It surveys users about their political views and, based on their responses, calculates the individual alignment of each user with the parties or candidates running in a given election contest. It is operated by a social enterprise called Vox Pop Labs in partnership with locale-specific news organizations, including the Wall Street Journal, Vox Media, the Canadian and Australian Broadcasting Corporations, Television New Zealand, France24, RTL Group, and Grupo Globo. Vote Compass also operates under the trademarks Boussole électorale and Wahl-Navi for French- and German-language iterations, respectively. == Background == Vote Compass was developed by Clifton van der Linden, a professor in the Department of Political Science at McMaster University. It is run by van der Linden along with a team of social and statistical scientists from Vox Pop Labs. Although inspired by European Voting Advice Applications, van der Linden explicitly rejects this terminology, arguing that Vote Compass was "never intended to account for every variable that influences voter choice and its results should not be interpreted as voting advice." == Methodology == Using a Likert scale, users indicate their responses to a series of policy propositions designed to discriminate between candidates' policies on prominent issues relevant to the election. Propositions are crafted in collaboration with political scientists local to each jurisdiction in which Vote Compass is run. Based on a candidate or political party's public disclosures (i.e. party manifestos, policy proposals, official websites, speeches, media releases, statements made in the legislature, etc.) they are calibrated on the same propositions and scales as are users. A series of aggregation algorithms calculate the overall distance between the user and the candidates or parties. There have been claims that Vote Compass surveys have the potential to become push polling, if the survey questions posed are poorly designed.

Emotion-sensitive software

Emotion-sensitive software (ESS) is software specifically designed to target and monitor emotional response in a human being. Some software measures anger by comparing the pitch of a voice to a regular, or calm, pitch. Another approach is the measurement of physical appearance. If a camera or similar recording device picks up a certain amount of red pigmentation in the skin the system can be alerted that this person is angered. The competitive landscape in the Electronic Surveillance Software (ESS) industry is marked by a high level of secrecy regarding the operational details of these software systems. Many producers deliberately withhold information about the inner workings of their ESS products, a strategy that serves dual purposes: firstly, it intensifies competition among companies in the sector, as each strives to maintain a unique edge without revealing trade secrets that could be leveraged by competitors; secondly, this secrecy acts as a deterrent against individuals or entities who might try to circumvent the surveillance mechanisms. One application of ESS was developed by University of Notre Dame Assistant Professor of Psychology Sidney D'Mello, Art Graesser from the University of Memphis and a colleague from Massachusetts Institute of Technology. They used the technology to create an electronic tutor that could assess a student's level of boredom and frustration based on facial expression and body language, and react accordingly.

Paper data storage

Paper data storage refers to the use of paper as a data storage device. This includes writing, illustrating, and the use of data that can be interpreted by a machine or is the result of the functioning of a machine. A defining feature of paper data storage is the ability of humans to produce it with only simple tools and interpret it visually. Though now mostly obsolete, paper was once an important form of computer data storage as both paper tape and punch cards were a common staple of working with computers before the 1980s. == History == Before paper was used for storing data, it had been used in several applications for storing instructions to specify a machine's operation. The earliest use of paper to store instructions for a machine was the work of Basile Bouchon who, in 1725, used punched paper rolls to control textile looms. This technology was later developed into the wildly successful Jacquard loom. The 19th century saw several other uses of paper for controlling machines. In 1846, telegrams could be prerecorded on punched tape and rapidly transmitted using Alexander Bain's automatic telegraph. Several inventors took the concept of a mechanical organ and used paper to represent the music. In the late 1880s Herman Hollerith invented the recording of data on a medium that could then be read by a machine. Prior uses of machine readable media, above, had been for control (automatons, piano rolls, looms, ...), not data. "After some initial trials with paper tape, he settled on punched cards..." Hollerith's method was used in the 1890 census. Hollerith's company eventually became the core of IBM. Other technologies were also developed that allowed machines to work with marks on paper instead of punched holes. This technology was widely used for tabulating votes and grading standardized tests. Banks used magnetic ink on checks, supporting MICR scanning. In an early electronic computing device, the Atanasoff–Berry Computer, electric sparks were used to singe small holes in paper cards to represent binary data. The altered dielectric constant of the paper at the location of the holes could then be used to read the binary data back into the machine by means of electric sparks of lower voltage than the sparks used to create the holes. This form of paper data storage was never made reliable and was not used in any subsequent machine. == Modern techniques == === 1D barcodes === Barcodes make it possible for any object that was to be sold or transported to have some computer readable information securely attached to it. Universal Product Code barcodes, first used in 1974, are ubiquitous today. Some people recommend a width of at least 3 pixels for each minimum-width gap and each minimum-width bar for 1D barcodes. The density is about 50 bits per linear inch (about 2 bit/mm). === 2D barcodes === 2D barcodes allow to store much more data on paper, up to 2.9 kbyte per barcode. It is recommended to have a width of at least 4 pixels—e.g., a 4 × 4 pixel = 16 pixel module. == Limits == The limits of data storage depend on the technology to write and read such data. The theoretical limits assume a scanner that can perfectly reproduce the printed image at its printing resolution, and a program which can accurately interpret such an image. For example, an 8 in × 10 in (200 mm × 250 mm) 600 dpi black-and-white image contains 3.43 MiB of data, as does a 300 dpi CMYK printed image. A 2,400 ppi True color (24-bit) image contains about 1.29 GiB of information; printing an image maintaining this data would require a printing resolution of about 120,000 dpi in black and white, or 60,000 dpi with CMYK dots.

Dependency network (graphical model)

Dependency networks (DNs) are graphical models, similar to Markov networks, wherein each vertex (node) corresponds to a random variable and each edge captures dependencies among variables. Unlike Bayesian networks, DNs may contain cycles. Each node is associated to a conditional probability table, which determines the realization of the random variable given its parents. == Markov blanket == In a Bayesian network, the Markov blanket of a node is the set of parents and children of that node, together with the children's parents. The values of the parents and children of a node evidently give information about that node. However, its children's parents also have to be included in the Markov blanket, because they can be used to explain away the node in question. In a Markov random field, the Markov blanket for a node is simply its adjacent (or neighboring) nodes. In a dependency network, the Markov blanket for a node is simply the set of its parents. == Dependency network versus Bayesian networks == Dependency networks have advantages and disadvantages with respect to Bayesian networks. In particular, they are easier to parameterize from data, as there are efficient algorithms for learning both the structure and probabilities of a dependency network from data. Such algorithms are not available for Bayesian networks, for which the problem of determining the optimal structure is NP-hard. Nonetheless, a dependency network may be more difficult to construct using a knowledge-based approach driven by expert-knowledge. == Dependency networks versus Markov networks == Consistent dependency networks and Markov networks have the same representational power. Nonetheless, it is possible to construct non-consistent dependency networks, i.e., dependency networks for which there is no compatible valid joint probability distribution. Markov networks, in contrast, are always consistent. == Definition == A consistent dependency network for a set of random variables X = ( X 1 , … , X n ) {\textstyle \mathbf {X} =(X_{1},\ldots ,X_{n})} with joint distribution p ( x ) {\displaystyle p(\mathbf {x} )} is a pair ( G , P ) {\displaystyle (G,P)} where G {\displaystyle G} is a cyclic directed graph, where each of its nodes corresponds to a variable in X {\displaystyle \mathbf {X} } , and P {\displaystyle P} is a set of conditional probability distributions. The parents of node X i {\displaystyle X_{i}} , denoted P a i {\displaystyle \mathbf {Pa_{i}} } , correspond to those variables P a i ⊆ ( X 1 , … , X i − 1 , X i + 1 , … , X n ) {\displaystyle \mathbf {Pa_{i}} \subseteq (X_{1},\ldots ,X_{i-1},X_{i+1},\ldots ,X_{n})} that satisfy the following independence relationships p ( x i ∣ p a i ) = p ( x i ∣ x 1 , … , x i − 1 , x i + 1 , … , x n ) = p ( x i ∣ x − x i ) . {\displaystyle p(x_{i}\mid \mathbf {pa_{i}} )=p(x_{i}\mid x_{1},\ldots ,x_{i-1},x_{i+1},\ldots ,x_{n})=p(x_{i}\mid \mathbf {x} -{x_{i}}).} The dependency network is consistent in the sense that each local distribution can be obtained from the joint distribution p ( x ) {\displaystyle p(\mathbf {x} )} . Dependency networks learned using large data sets with large sample sizes will almost always be consistent. A non-consistent network is a network for which there is no joint probability distribution compatible with the pair ( G , P ) {\displaystyle (G,P)} . In that case, there is no joint probability distribution that satisfies the independence relationships subsumed by that pair. == Structure and parameters learning == Two important tasks in a dependency network are to learn its structure and probabilities from data. Essentially, the learning algorithm consists of independently performing a probabilistic regression or classification for each variable in the domain. It comes from observation that the local distribution for variable X i {\displaystyle X_{i}} in a dependency network is the conditional distribution p ( x i | x − x i ) {\displaystyle p(x_{i}|\mathbf {x} -{x_{i}})} , which can be estimated by any number of classification or regression techniques, such as methods using a probabilistic decision tree, a neural network or a probabilistic support-vector machine. Hence, for each variable X i {\displaystyle X_{i}} in domain X {\displaystyle X} , we independently estimate its local distribution from data using a classification algorithm, even though it is a distinct method for each variable. Here, we will briefly show how probabilistic decision trees are used to estimate the local distributions. For each variable X i {\displaystyle X_{i}} in X {\displaystyle \mathbf {X} } , a probabilistic decision tree is learned where X i {\displaystyle X_{i}} is the target variable and X − X i {\displaystyle \mathbf {X} -X_{i}} are the input variables. To learn a decision tree structure for X i {\displaystyle X_{i}} , the search algorithm begins with a singleton root node without children. Then, each leaf node in the tree is replaced with a binary split on some variable X j {\displaystyle X_{j}} in X − X i {\displaystyle \mathbf {X} -X_{i}} , until no more replacements increase the score of the tree. == Probabilistic Inference == A probabilistic inference is the task in which we wish to answer probabilistic queries of the form p ( y ∣ z ) {\displaystyle p(\mathbf {y\mid z} )} , given a graphical model for X {\displaystyle \mathbf {X} } , where Y {\displaystyle \mathbf {Y} } (the 'target' variables) Z {\displaystyle \mathbf {Z} } (the 'input' variables) are disjoint subsets of X {\displaystyle \mathbf {X} } . One of the alternatives for performing probabilistic inference is using Gibbs sampling. A naive approach for this uses an ordered Gibbs sampler, an important difficulty of which is that if either p ( y ∣ z ) {\displaystyle p(\mathbf {y\mid z} )} or p ( z ) {\displaystyle p(\mathbf {z} )} is small, then many iterations are required for an accurate probability estimate. Another approach for estimating p ( y ∣ z ) {\displaystyle p(\mathbf {y\mid z} )} when p ( z ) {\displaystyle p(\mathbf {z} )} is small is to use modified ordered Gibbs sampler, where Z = z {\displaystyle \mathbf {Z=z} } is fixed during Gibbs sampling. It may also happen that y {\displaystyle \mathbf {y} } is rare, e.g. when Y {\displaystyle \mathbf {Y} } has many variables. So, the law of total probability along with the independencies encoded in a dependency network can be used to decompose the inference task into a set of inference tasks on single variables. This approach comes with the advantage that some terms may be obtained by direct lookup, thereby avoiding some Gibbs sampling. You can see below an algorithm that can be used for obtain p ( y | z ) {\displaystyle p(\mathbf {y|z} )} for a particular instance of y ∈ Y {\displaystyle \mathbf {y} \in \mathbf {Y} } and z ∈ Z {\displaystyle \mathbf {z} \in \mathbf {Z} } , where Y {\displaystyle \mathbf {Y} } and Z {\displaystyle \mathbf {Z} } are disjoint subsets. Algorithm 1: U := Y {\displaystyle \mathbf {U:=Y} } ( the unprocessed variables ) P := Z {\displaystyle \mathbf {P:=Z} } ( the processed and conditioning variables ) p := z {\displaystyle \mathbf {p:=z} } ( the values for P {\displaystyle \mathbf {P} } ) While U ≠ ∅ {\displaystyle \mathbf {U} \neq \emptyset } : Choose X i ∈ U {\displaystyle X_{i}\in \mathbf {U} } such that X i {\displaystyle X_{i}} has no more parents in U {\displaystyle U} than any variable in U {\displaystyle U} If all the parents of X {\displaystyle X} are in P {\displaystyle \mathbf {P} } p ( x i | p ) := p ( x i | p a i ) {\displaystyle p(x_{i}|\mathbf {p} ):=p(x_{i}|\mathbf {pa_{i}} )} Else Use a modified ordered Gibbs sampler to determine p ( x i | p ) {\displaystyle p(x_{i}|\mathbf {p} )} U := U − X i {\displaystyle \mathbf {U:=U} -X_{i}} P := P + X i {\displaystyle \mathbf {P:=P} +X_{i}} p := p + x i {\displaystyle \mathbf {p:=p} +x_{i}} Returns the product of the conditionals p ( x i | p ) {\displaystyle p(x_{i}|\mathbf {p} )} == Applications == In addition to the applications to probabilistic inference, the following applications are in the category of Collaborative Filtering (CF), which is the task of predicting preferences. Dependency networks are a natural model class on which to base CF predictions, once an algorithm for this task only needs estimation of p ( x i = 1 | x − x i = 0 ) {\displaystyle p(x_{i}=1|\mathbf {x} -{x_{i}}=0)} to produce recommendations. In particular, these estimates may be obtained by a direct lookup in a dependency network. Predicting what movies a person will like based on his or her ratings of movies seen; Predicting what web pages a person will access based on his or her history on the site; Predicting what news stories a person is interested in based on other stories he or she read; Predicting what product a person will buy based on products he or she has already purchased and/or dropped into his or her shopping basket. Another class of useful applications for dependency networks is related to data visualization, that is

Machine unlearning

Machine unlearning is a branch of machine learning focused on removing specific undesired element, such as private data, wrong or manipulated training data, outdated information, copyrighted material, harmful content, dangerous abilities, or misinformation, without needing to rebuild models from the ground up. Large language models, like the ones powering ChatGPT, may be asked not just to remove specific elements but also to unlearn a "concept," "fact," or "knowledge," which aren't easily linked to specific examples. New terms such as "model editing," "concept editing," and "knowledge unlearning" have emerged to describe this process. == History == Early research efforts were largely motivated by Article 17 of the GDPR, the European Union's privacy regulation commonly known as the "right to be forgotten" (RTBF), introduced in 2014. The GDPR did not anticipate that the development of large language models would make data erasure a complex task. This issue has since led to research on "machine unlearning," with a growing focus on removing copyrighted material, harmful content, dangerous capabilities, and misinformation. Just as early experiences in humans shape later ones, some concepts are more fundamental and harder to unlearn. A piece of knowledge may be so deeply embedded in the model's knowledge graph that unlearning it could cause internal contradictions, requiring adjustments to other parts of the graph to resolve them. Researchers have now also started studying unlearning in the context of removing incorrect or adversarially manipulated training data such as systematically biased labels or poisoning attacks. == Motivations == At present, machine unlearning is motivated by a growing range of concerns that extend well beyond the field's original focus on data privacy. A widely used taxonomy in the literature distinguishes two high-level categories of motivation. Access revocation covers cases where a data subject or rights holder requests the removal of data they own or control. This is most commonly associated with RTBF established by the European Union's General Data Protection Regulation (GDPR) and analogous legislation such as the California Consumer Privacy Act (CCPA). These regulations grant individuals the legal right to request erasure of their personal data from any system that has processed it, including models that were trained on it. Access revocation also encompasses the removal of copyrighted or pay-walled content that was incorporated into training corpora without the necessary licenses, a concern that has become prominent with the widespread use of largely web-scraped pre-training datasets. Model correction covers cases where the model exhibits undesirable behavior arising from the training data, regardless of any individual's request. This includes: Removal of toxic, biased, or unsafe outputs introduced by harmful content in the training set Correction of stale or factually incorrect associations, such as outdated knowledge encoded in a deployed model Removal of dangerous capabilities, such as detailed knowledge of the synthesis of chemical or biological agents Correction of the influence of data poisoning or adversarial attacks that have corrupted model behavior This second category has been formalized as corrective machine unlearning, which frames unlearning as a post-training mechanism for repairing the effects of bad or harmful training data. It is closely related to the AI safety literature, where data filtering alone has been found insufficient to prevent hazardous knowledge from being encoded in model weights, motivating unlearning as a complementary risk mitigation strategy. A further distinction has been drawn in the literature between removal {eliminating the influence of specific training data on model parameters) and suppression (preventing the model from generating specific outputs regardless of how that knowledge is encoded). These two goals are not equivalent: removing training data does not guarantee meaningful output suppression, and suppressing outputs does not constitute removal of the underlying training data's influence. == SISA Training == SISA is a training strategy consisting of four mechanisms designed to make machine unlearning more efficient by structuring how models are trained and updated. Its goal is to allow a system to remove the influence of specific data points without retraining an entire model from scratch. By reorganizing training data and workflows, SISA reduces the computational burden of unlearning requests. Sharding divides the training dataset into multiple disjoint subsets, or shards. Each shard is used to train a separate model instance. This ensures that a single data point affects only one shard, so unlearning it requires updating only the corresponding shard rather than the full model. Isolation refers to training each shard independently, with nothing shared across shards during the training process. This separation prevents cross-contamination between shards, ensuring that forgetting data in one shard does not require adjustments to any others. Slicing breaks the data within each shard into sequential slices and stores model states after each slice is trained on. When an unlearning request targets a piece of data, the system can roll back to the checkpoint before the point was seen and retrain only from that slice forward. This reduces retraining time even within a shard. Aggregation occurs at inference, when the model is queried. It combines the outputs of each shard to determine the output of the overall model. This is often through majority voting or averaging. This allows SISA-trained systems to behave like a single model despite being composed of multiple shard-level models. Together, these mechanisms enable machine learning systems to forget specific data points with far lower computational cost than full retraining. The trade-off is that sharding and slicing can lead to reduced model accuracy, worse generalization, and increased storage requirements for the intermediate checkpoints. This can be tolerable based on the needs of the individual or organization to comply with "right to be forgotten" or efficiently recover from backdoor attacks. == Algorithms == Machine unlearning algorithms are broadly categorized into exact and approximate methods, reflecting a fundamental trade-off between formal guarantees and computational tractability. === Exact Unlearning === Exact unlearning methods produce a model that is statistically indistinguishable from one retrained from scratch on the dataset with the forget data removed. The canonical framework for exact unlearning is SISA Training (Sharded, Isolated, Sliced, and Aggregated), introduced by Bourtoule et al. (2021). SISA partitions the training dataset into disjoint shards and trains a separate sub-model on each. At inference time, predictions are aggregated across sub-models. When an unlearning request is received, only the sub-model corresponding to the shard containing the target data requires retraining, reducing computational overhead proportionally to the number of shards. Exact methods provide the strongest guarantees but become prohibitively expensive for large pre-trained neural networks and are generally limited to settings where training can be structured in advance. === Approximate Unlearning === Approximate unlearning methods seek to produce a model whose behavior is sufficiently close to an exactly unlearned model without the cost of full retraining. These methods dominate practical applications. Common approaches include: Gradient Ascent: The model is fine-tuned by maximizing the loss on the forget set, directly degrading its performance on targeted data. This is the most direct approach but risks destabilizing performance on retained data. Random Labelling: The model is fine-tuned on the forget set using randomly shuffled labels, confusing its associations with the targeted data while producing a less aggressive weight shift than pure gradient ascent. Gradient Difference: Combines gradient ascent on the forget set with simultaneous gradient descent on the retain set, using the retain objective as a regularizer to preserve general model utility. KL Divergence Regularization: Minimizes the KL divergence between the outputs of the unlearned model and the original model on the retain set, anchoring behavior on data the model should remember. Weight Pruning and Fine-tuning: Parameters with the smallest L1-norm are pruned — targeting weights most weakly associated with general knowledge and potentially most associated with the forget set — followed by fine-tuning on the retain set to restore utility. Layer Reset and Fine-tuning: The first or last k layers are re-initialized to random weights and the model is subsequently fine-tuned on the retain set. This is a coarse but computationally simple approach. Selective Synaptic Dampening: Uses influence functions to estimate the effect of individual trainin

StoredIQ

StoredIQ was a company founded for information lifecycle management (ILM) of unstructured data. Founded in 2001 as Deepfile in Austin, Texas by Jeff Erramouspe, Jeff Bone, Russell Turpin, Rudy Rouhana, Laura Arbilla and Brett Funderburg, the company changed its name in 2005 to StoredIQ. It continued to operate successfully for over a decade until it was acquired in 2012 by IBM. It now serves as a platform for IBM's information life cycle governance, big data governance and enterprise content management technologies. StoredIQ was awarded five patents by the USPTO. The first, originally filed in 2003, enabled unstructured data in file systems to be manipulated in a similar way to information stored in databases. Subsequent patents built upon the patented actionable file system with further enhancements specific to Enterprise Policy Management and expanding the reach of StoredIQ's management capability all the way to individual desktops. In 2008 StoredIQ was recognized as "Best in Compliance" by Network Products Guide. At the same time, StoredIQ was being recognized as a "Top 5 Provider" by the prestigious Socha-Gelbmann eDiscovery survey. There were takeover negotiations with EMC Corporation, initially a strategic investor in StoredIQ, however, the company rejected the approach, leaving EMC to acquire a competitor. The company published a whitepaper titled The Truth About Big Data. This promotion combined with StoredIQ's patented technology led to IBM selecting StoredIQ as the basis for some products.