Pooling layer

Pooling layer

In neural networks, a pooling layer is a kind of network layer that downsamples and aggregates information that is dispersed among many vectors into fewer vectors. It has several uses. It removes redundant information, thus reducing the amount of computation and memory required, which makes the model more robust to small variations in the input; and it increases the receptive field of neurons in later layers in the network. == Convolutional neural network pooling == Pooling is most commonly used in convolutional neural networks (CNN). Below is a description of pooling in 2-dimensional CNNs. The generalization to n-dimensions is immediate. As notation, we consider a tensor x ∈ R H × W × C {\displaystyle x\in \mathbb {R} ^{H\times W\times C}} , where H {\displaystyle H} is height, W {\displaystyle W} is width, and C {\displaystyle C} is the number of channels. A pooling layer outputs a tensor y ∈ R H ′ × W ′ × C ′ {\displaystyle y\in \mathbb {R} ^{H'\times W'\times C'}} . We define two variables f , s {\displaystyle f,s} called "filter size" (aka "kernel size") and "stride". Sometimes, it is necessary to use a different filter size and stride for horizontal and vertical directions. In such cases, we define 4 variables: f H , f W , s H , s W {\displaystyle f_{H},f_{W},s_{H},s_{W}} . The receptive field of an entry in the output tensor, y {\displaystyle y} , are all the entries in x {\displaystyle x} that can affect that entry. === Max pooling === Max Pooling (MaxPool) is commonly used in CNNs to reduce the spatial dimensions of feature maps. Define M a x P o o l ( x | f , s ) 0 , 0 , 0 = max ( x 0 : f − 1 , 0 : f − 1 , 0 ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{0,0,0}=\max(x_{0:f-1,0:f-1,0})} where 0 : f − 1 {\displaystyle 0:f-1} means the range 0 , 1 , … , f − 1 {\displaystyle 0,1,\dots ,f-1} . Note that we need to avoid the off-by-one error. The next input is M a x P o o l ( x | f , s ) 1 , 0 , 0 = max ( x s : s + f − 1 , 0 : f − 1 , 0 ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{1,0,0}=\max(x_{s:s+f-1,0:f-1,0})} and so on. The receptive field of y i , j , c {\displaystyle y_{i,j,c}} is x i s + f − 1 , j s + f − 1 , c {\displaystyle x_{is+f-1,js+f-1,c}} , so in general, M a x P o o l ( x | f , s ) i , j , c = m a x ( x i s : i s + f − 1 , j s : j s + f − 1 , c ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{i,j,c}=\mathrm {max} (x_{is:is+f-1,js:js+f-1,c})} If the horizontal and vertical filter size and strides differ, then in general, M a x P o o l ( x | f , s ) i , j , c = m a x ( x i s H : i s H + f H − 1 , j s W : j s W + f W − 1 , c ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{i,j,c}=\mathrm {max} (x_{is_{H}:is_{H}+f_{H}-1,js_{W}:js_{W}+f_{W}-1,c})} More succinctly, we can write y k = max ( { x k ′ | k ′ in the receptive field of k } ) {\displaystyle y_{k}=\max(\{x_{k'}|k'{\text{ in the receptive field of }}k\})} . If H {\displaystyle H} is not expressible as k s + f {\displaystyle ks+f} where k {\displaystyle k} is an integer, then for computing the entries of the output tensor on the boundaries, max pooling would attempt to take as inputs variables off the tensor. In this case, how those non-existent variables are handled depends on the padding conditions, illustrated on the right. Global Max Pooling (GMP) is a specific kind of max pooling where the output tensor has shape R C {\displaystyle \mathbb {R} ^{C}} and the receptive field of y c {\displaystyle y_{c}} is all of x 0 : H , 0 : W , c {\displaystyle x_{0:H,0:W,c}} . That is, it takes the maximum over each entire channel. It is often used just before the final fully connected layers in a CNN classification head. === Average pooling === Average pooling (AvgPool) is similarly defined A v g P o o l ( x | f , s ) i , j , c = a v e r a g e ( x i s : i s + f − 1 , j s : j s + f − 1 , c ) = 1 f 2 ∑ k ∈ i s : i s + f − 1 ∑ l ∈ j s : j s + f − 1 x k , l , c {\displaystyle \mathrm {AvgPool} (x|f,s)_{i,j,c}=\mathrm {average} (x_{is:is+f-1,js:js+f-1,c})={\frac {1}{f^{2}}}\sum _{k\in is:is+f-1}\sum _{l\in js:js+f-1}x_{k,l,c}} Global Average Pooling (GAP) is defined similarly to GMP. It was first proposed in Network-in-Network. Similarly to GMP, it is often used just before the final fully connected layers in a CNN classification head. === Interpolations === There are some interpolations of max pooling and average pooling. Mixed Pooling is a linear sum of max pooling and average pooling. That is, M i x e d P o o l ( x | f , s , w ) = w M a x P o o l ( x | f , s ) + ( 1 − w ) A v g P o o l ( x | f , s ) {\displaystyle \mathrm {MixedPool} (x|f,s,w)=w\mathrm {MaxPool} (x|f,s)+(1-w)\mathrm {AvgPool} (x|f,s)} where w ∈ [ 0 , 1 ] {\displaystyle w\in [0,1]} is either a hyperparameter, a learnable parameter, or randomly sampled anew every time. Lp Pooling is similar to average pooling, but uses Lp norm average instead of average: y k = ( 1 N ∑ k ′ in the receptive field of k | x k ′ | p ) 1 / p {\displaystyle y_{k}=\left({\frac {1}{N}}\sum _{k'{\text{ in the receptive field of }}k}|x_{k'}|^{p}\right)^{1/p}} where N {\displaystyle N} is the size of receptive field, and p ≥ 1 {\displaystyle p\geq 1} is a hyperparameter. If all activations are non-negative, then average pooling is the case of p = 1 {\displaystyle p=1} , and max pooling is the case of p → ∞ {\displaystyle p\to \infty } . Square-root pooling is the case of p = 2 {\displaystyle p=2} . Stochastic pooling samples a random activation x k ′ {\displaystyle x_{k'}} from the receptive field with probability x k ′ ∑ k ″ x k ″ {\displaystyle {\frac {x_{k'}}{\sum _{k''}x_{k''}}}} . It is the same as average pooling in expectation. Softmax pooling is like max pooling, but uses softmax, i.e. ∑ k ′ e β x k ′ x k ′ ∑ k ″ e β x k ″ {\displaystyle {\frac {\sum _{k'}e^{\beta x_{k'}}x_{k'}}{\sum _{k''}e^{\beta x_{k''}}}}} where β > 0 {\displaystyle \beta >0} . Average pooling is the case of β ↓ 0 {\displaystyle \beta \downarrow 0} , and max pooling is the case of β ↑ ∞ {\displaystyle \beta \uparrow \infty } Local Importance-based Pooling generalizes softmax pooling by ∑ k ′ e g ( x k ′ ) x k ′ ∑ k ″ e g ( x k ″ ) {\displaystyle {\frac {\sum _{k'}e^{g(x_{k'})}x_{k'}}{\sum _{k''}e^{g(x_{k''})}}}} where g {\displaystyle g} is a learnable function. === Other poolings === Spatial pyramidal pooling applies max pooling (or any other form of pooling) in a pyramid structure. That is, it applies global max pooling, then applies max pooling to the image divided into 4 equal parts, then 16, etc. The results are then concatenated. It is a hierarchical form of global pooling, and similar to global pooling, it is often used just before a classification head. Region of Interest Pooling (also known as RoI pooling) is a variant of max pooling used in R-CNNs for object detection. It is designed to take an arbitrarily-sized input matrix, and output a fixed-sized output matrix. Covariance pooling computes the covariance matrix of the vectors { x k , l , 0 : C − 1 } k ∈ i s : i s + f − 1 , l ∈ j s : j s + f − 1 {\displaystyle \{x_{k,l,0:C-1}\}_{k\in is:is+f-1,l\in js:js+f-1}} which is then flattened to a C 2 {\displaystyle C^{2}} -dimensional vector y i , j , 0 : C 2 − 1 {\displaystyle y_{i,j,0:C^{2}-1}} . Global covariance pooling is used similarly to global max pooling. As average pooling computes the average, which is a first-degree statistic, and covariance is a second-degree statistic, covariance pooling is also called "second-order pooling". It can be generalized to higher-order poolings. Blur Pooling means applying a blurring method before downsampling. For example, the Rect-2 blur pooling means taking an average pooling at f = 2 , s = 1 {\displaystyle f=2,s=1} , then taking every second pixel (identity with s = 2 {\displaystyle s=2} ). == Vision Transformer pooling == In Vision Transformers (ViT), there are the following common kinds of poolings. BERT-like pooling uses a dummy [CLS] token, "classification". For classification, the output at [CLS] is the classification token, which is then processed by a LayerNorm-feedforward-softmax module into a probability distribution, which is the network's prediction of class probability distribution. This is the one used by the original ViT and Masked Autoencoder. Global average pooling (GAP) does not use the dummy token, but simply takes the average of all output tokens as the classification token. It was mentioned in the original ViT as being equally good. Multihead attention pooling (MAP) applies a multi headed attention block to pooling. Specifically, it takes as input a list of vectors x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\dots ,x_{n}} , which might be thought of as the output vectors of a layer of a ViT. It then applies a feedforward layer F F N {\displaystyle \mathrm {FFN} } on each vector, resulting in a matrix V = [ F F N ( v 1 ) , … , F F N ( v n ) ] {\displaystyle V=[\mathrm {FFN} (v_{1}),\dots ,\mathrm {FFN} (v_{n})]} . This is then sent to a multi-headed attention, resulting in M u l t i h e a d e d A

Digital curation

Digital curation is the selection, preservation, maintenance, collection, and archiving of digital assets. It is a process that establishes, maintains, and adds value to repositories of digital data for present and future use. The implementation of digital curation is often carried out by archivists, librarians, scientists, historians, and scholars to ensure users have access to reliable, high-quality resources. Enterprises are also starting to adopt digital curation as a means to improve the quality of information and data within their operational and strategic processes. A successful digital curation initiative will help to mitigate digital obsolescence, keeping the information accessible to users indefinitely. Digital curation includes various aspects, including digital asset management, data curation, digital preservation, and electronic records management. == Word History == Much like the word archive has layered meanings and uses, the word curation is both a noun and a verb, used originally in the field of museology to represent a wide range of activities, most often associated with collection care, long-term preservation, and exhibition design. Curation can be a reference to physical repositories that store cultural heritage or natural resource collections (e.g., a curatorial repository) or a representation of varied policies and processes involved with the long-term care and management of heritage collections, digital archives, and research data (e.g, curatorial/collections management plans, curation life-cycle, and data curation). Yet curation is also associated with short-term objectives and processes of selection and interpretation for the purposes of presentation, such as for gallery exhibitions and websites, which contribute to knowledge creation. It has also been applied to interaction with social media including compiling digital images, web links, and movie files. The term curation entered the legal framework through federal historic preservation laws, starting with the National Historic Preservation Act of 1966, and was further defined and coded into federal regulations through 36 CFR Part 79: Curation of Federally-owned and Administered Archaeological Collections. Curation has since permeated into an array of disciplines but remains closely tied to heritage and information management. == Core Principles and Activities == The term "digital curation" was first used in the e-science and biological science fields as a means of differentiating the additional suite of activities ordinarily employed by library and museum curators to add value to their collections and enable its reuse from the smaller subtask of simply preserving the data, a significantly more concise archival task. Additionally, the historical understanding of the term "curator" demands more than simple care of the collection. A curator is expected to command academic mastery of the subject matter as a requisite part of appraisal and selection of assets and any subsequent adding of value to the collection through application of metadata. === Principles === There are five commonly accepted principles that govern the occupation of digital curation: Manage the complete birth-to-retirement life cycle of the digital asset. Evaluate and cull assets for inclusion in the collection. Apply preservation methods to strengthen the asset’s integrity and reusability for future users. Act proactively throughout the asset life cycle to add value to both the digital asset and the collection. Facilitate the appropriate degree of access to users. === Methodology === The Digital Curation Center offers the following step-by-step life cycle procedures for putting the above principles into practice: Sequential Actions: Conceptualize: Consider what digital material you will be creating and develop storage options. Take into account websites, publications, email, among other types of digital output. Create: Produce digital material and attach all relevant metadata, typically the more metadata the more accessible the information. Appraise and select: Consult the mission statement of the institution or private collection and determine what digital data is relevant. There may also be legal guidelines in place that will guide the decision process for a particular collection. Ingest: Send digital material to the predetermined storage solution. This may be an archive, repository or other facility. Preservation action: Employ measures to maintain the integrity of the digital material. Store: Secure data within the predetermined storage facility. Access, use, and reuse: Determine the level of accessibility for the range of digital material created. Some material may be accessible only by password and other material may be freely accessible to the public. Routinely check that material is still accessible for the intended audience and that the material has not been compromised through multiple uses. Transform: If desirable or necessary the material may be transferred into a different digital format. Occasional Actions: Dispose: Discard any digital material that is not deemed necessary to the institution. Reappraise: Reevaluate material to ensure that is it still relevant and is true to its original form. Migrate: Migrate data to another format in order to protect data for using better in the future. == Related terms == The term "digital curation" is sometimes used interchangeably with terms such as "digital preservation" and "digital archiving." While digital preservation does focus a significant degree of energy on optimizing reusability, preservation remains a subtask to the concept of digital archiving, which is in turn a subtask of digital curation. For example, archiving is a part of curation, but so are subsequent tasks such as themed collection-building, which is not considered an archival task. Similarly, preservation is a part of archiving, as are the tasks of selection and appraisal that are not necessarily part of preservation. Data curation is another term that is often used interchangeably with digital curation, however common usage of the two terms differs. While "data" is a more all-encompassing term that can be used generally to indicate anything recorded in binary form, the term "data curation" is most common in scientific parlance and usually refers to accumulating and managing information relative to the process of research. Data-driven research of education request the role of information professional gradually develop tradition of digital service to data curation particularly at the management of digital research data. So, while documents and other discrete digital assets are technically a subset of the broader concept of data, in the context of scientific vernacular digital curation represents a broader purview of responsibilities than data curation due to its interest in preserving and adding value to digital assets of any kind. == Challenges == === Rate of creation of new data and data sets === The ever lowering cost and increasing prevalence of entirely new categories of technology has led to a quickly growing flow of new data sets. These come from well established sources such as business and government, but the trend is also driven by new styles of sensors becoming embedded in more areas of modern life. This is particularly true of consumers, whose production of digital assets is no longer relegated strictly to work. Consumers now create wider ranges of digital assets, including videos, photos, location data, purchases, and fitness tracking data, just to name a few, and share them in wider ranges of social platforms. Additionally, the advance of technology has introduced new ways of working with data. Some examples of this are international partnerships that leverage astronomical data to create "virtual observatories," and similar partnerships have also leveraged data resulting from research at the Large Hadron Collider at CERN and the database of protein structures at the Protein Data Bank. === Storage format evolution and obsolescence === By comparison, archiving of analog assets is notably passive in nature, often limited to simply ensuring a suitable storage environment. Digital preservation requires a more proactive approach. Today’s artifacts of cultural significance are notably transient in nature and prone to obsolescence when social trends or dependent technologies change. This rapid progression of technology occasionally makes it necessary to migrate digital asset holdings from one file format to another in order to mitigate the dangers of hardware and software obsolescence which would render the asset unusable. === Underestimation of human labor costs === Modern tools for program planning often underestimate the amount of human labor costs required for adequate digital curation of large collections. As a result cost-benefit assessments often paint an inaccurate picture of both the amount of work involved and the true cost to the institution for bot

Top 10 AI Bug Finders Compared (2026)

Trying to pick the best AI bug finder? An AI bug finder is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI bug finder slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

RAMnets

RAMnets is one of the oldest practical neurally inspired classification algorithms. The RAMnets is also known as a type of "n-tuple recognition method" or "weightless neural network". == Algorithm == Consider (let us say N) sets of n distinct bit locations are selected randomly. These are the n-tuples. The restriction of a pattern to an n-tuple can be regarded as an n-bit number which, together with the identity of the n-tuple, constitutes a `feature' of the pattern. The standard n-tuple recognizer operates simply as follows: A pattern is classified as belonging to the class for which it has the most features in common with at least one training pattern of that class. This is the Θ {\displaystyle \Theta } = 0 case of a more general rule whereby the class assigned to unclassified pattern u is a c r g m a x ( ∑ i = 1 N Θ ( ∑ v ∈ D c δ ( α i ( u ) , α i ( v ) ) ) ) {\displaystyle {\begin{aligned}{\underset {c}{a}}rgmax(\sum _{i=1}^{N}\Theta (\sum _{v\in D_{c}}\delta (\alpha _{i}(u),\alpha _{i}(v))))\end{aligned}}} where Dc is the set of training patterns in class c, Θ ( x ) {\displaystyle \Theta (x)} = x for 0 ≤ x ≤ θ {\displaystyle 0\leq x\leq \theta } , Θ ( x ) = θ {\displaystyle \Theta (x)=\theta } for x ≥ θ {\displaystyle x\geq \theta } , δ i , j {\displaystyle \delta _{i,j}} is the Kronecker delta( δ i , j {\displaystyle \delta _{i,j}} =1 if i=j and 0 otherwise.)and ( α i ( u ) ) {\displaystyle (\alpha _{i}(u))} is the ith feature of the pattern u: ∑ j = 0 n − 1 u η i ( j ) 2 j {\displaystyle \sum _{j=0}^{n-1}u_{\eta }i(j)2^{j}} Here uk is the kth bit of u and u η i ( j ) {\displaystyle u_{\eta }i(j)} is the jth bit location of the ith n-tuple. With C classes to distinguish, the system can be implemented as a network of NC nodes, each of which is a random access memory (RAM); hence the term RAMnet. The memory content m c i α {\displaystyle m_{ci\alpha }} at address α {\displaystyle \alpha } of the ith node allocated to class c is set to m c i α {\displaystyle m_{ci\alpha }} = Θ ( ∑ v ∈ D c δ ( α , α i ( v ) ) ) {\displaystyle \Theta (\sum _{v\in D_{c}}\delta (\alpha ,\alpha _{i}(v)))} In the usual θ {\displaystyle \theta } = 1 case, the 1-bit content of m c i α {\displaystyle m_{ci\alpha }} is set if any pattern of Dc has feature α {\displaystyle \alpha } and unset otherwise. Recognition is accomplished by summing the contents of the nodes of each class at the addresses given by the features of the unclassified pattern. That is, pattern u is assigned to class a c r g m a x ( ∑ i = 1 N m c i α ( u ) ) {\displaystyle {\begin{aligned}{\underset {c}{a}}rgmax(\sum _{i=1}^{N}m_{ci\alpha }(u))\end{aligned}}} == RAM-discriminators and WiSARD == The RAMnets formed the basis of a commercial product known as WiSARD (Wilkie, Stonham and Aleksander Recognition Device) was the first artificial neural network machine to be patented. A RAM-discriminator consists of a set of X one-bit word RAMs with n inputs and a summing device (Σ). Any such RAM-discriminator can receive a binary pattern of X⋅n bits as input. The RAM input lines are connected to the input pattern by means of a biunivocal pseudo-random mapping. The summing device enables this network of RAMs to exhibit – just like other ANN models based on synaptic weights – generalization and noise tolerance. In order to train the discriminator one has to set all RAM memory locations to 0 and choose a training set formed by binary patterns of X⋅n bits. For each training pattern, a 1 is stored in the memory location of each RAM addressed by this input pattern. Once the training of patterns is completed, RAM memory contents will be set to a certain number of 0's and 1's. The information stored by the RAM during the training phase is used to deal with previous unseen patterns. When one of these is given as input, the RAM memory contents addressed by the input pattern are read and summed by Σ. The number r thus obtained, which is called the discriminator response, is equal to the number of RAMs that output 1. r reaches the maximum X if the input belongs to the training set. r is equal to 0 if no n-bit component of the input pattern appears in the training set (not a single RAM outputs 1). Intermediate values of r express a kind of “similarity measure” of the input pattern with respect to the patterns in the training set. A system formed by various RAM-discriminators is called WiSARD. Each RAM-discriminator is trained on a particular class of patterns, and classification by the multi-discriminator system is performed in the following way. When a pattern is given as input, each RAM-discriminator gives a response to that input. The various responses are evaluated by an algorithm which compares them and computes the relative confidence c of the highest response (e.g., the difference d between the highest response and the second highest response, divided by the highest response). A schematic representation of a RAM-discriminator and a 10 RAM-discriminator WiSARD is shown in Figure 1.

Deborah Raji

Inioluwa Deborah Raji (born 1995/1996) is a Nigerian-Canadian computer scientist and socio-tech leader who works on algorithmic bias, AI accountability, and algorithmic auditing. A current Mozilla fellow, she has been recognized by MIT Technology Review and Forbes as one of the world's top young innovators. Raji started her work with racial bias in technology during her internship with Clarifai when she recognized that people of color were more often tagged for NSFW compared to white people. Raji has previously worked with Joy Buolamwini, Timnit Gebru, and the Algorithmic Justice League on researching gender and racial bias in facial recognition technology. Her work on racial bias in facial recognition has forced companies to ultimately change their practices. She has also worked with Google’s Ethical AI team and been a research fellow at the Partnership on AI and AI Now Institute at New York University working on how to operationalize ethical considerations in machine learning engineering practice. She was working on a computer vision model that would help clients flag inappropriate images as NSFW. == Early life and education == Raji was born in Port Harcourt, Nigeria, and moved to Mississauga, Ontario, Canada, when she was four years old. Eventually her family moved to Ottawa. She attended Colonel By Secondary School and completed the International Baccalaureate programme. She studied Engineering Science at the University of Toronto, graduating in 2019. In 2015, she founded Project Include, a nonprofit providing increased student access to engineering education, mentorship, and resources in low income and immigrant communities in the Greater Toronto Area. She started a Doctor of Philosophy - PhD, in Computer Science from the University of California, Berkeley in Aug 2021. == Career and research == Raji worked with Joy Buolamwini at the MIT Media Lab and Algorithmic Justice League, where she audited commercial facial recognition technologies from Microsoft, Amazon, IBM, Face++, and Kairos. They found that these technologies were significantly less accurate for darker-skinned women than for white men. With support from other top AI researchers and increased public pressure and campaigning, their work led IBM and Amazon to agree to support facial recognition regulation and later halt the sale of their product to police for at least a year. Raji also interned at machine learning startup Clarifai, where she worked on a computer vision model for flagging images. She participated in a research mentorship program at Google and worked with their Ethical AI team on creating model cards, a documentation framework for more transparent machine learning model reporting. She also co-led the development of internal auditing practices at Google. Her contributions at Google were separately presented and published at the AAAI conference and ACM Conference on Fairness, Accountability, and Transparency. In 2019, Raji was a summer research fellow at The Partnership on AI working on setting industry machine learning transparency standards and benchmarking norms. Raji was a Tech Fellow at the AI Now Institute worked on algorithmic and AI auditing. Currently, she is a fellow at the Mozilla Foundation researching algorithmic auditing and evaluation. Raji's work on bias in facial recognition systems has been highlighted in the 2020 documentary Coded Bias directed by Shalini Kantayya. She also took part in the 2026 documentary The AI Doc: Or How I Became an Apocaloptimist directed by Daniel Roher. == Awards == 2019 Venture Beat AI Innovations Award in category AI for Good (received with Joy Buolamwini and Timnit Gebru) 2020 MIT Technology Review 35 Under 35 Innovator Award 2020 EFF Pioneer Award (received with Buolamwini and Gebru) 2021 Forbes 30 Under 30 Award in Enterprise Technology 2021 100 Brilliant Women in AI Ethics Hall of Fame Honoree 2023 Time magazine 100 Most Influential People in AI

Equalized odds

Equalized odds, also referred to as conditional procedure accuracy equality and disparate mistreatment, is a measure of fairness in machine learning. A classifier satisfies this definition if the subjects in the protected and unprotected groups have equal true positive rate and equal false positive rate, satisfying the formula: P ( R = + | Y = y , A = a ) = P ( R = + | Y = y , A = b ) y ∈ { + , − } ∀ a , b ∈ A {\displaystyle P(R=+|Y=y,A=a)=P(R=+|Y=y,A=b)\quad y\in \{+,-\}\quad \forall a,b\in A} For example, A {\displaystyle A} could be gender, race, or any other characteristics that we want to be free of bias, while Y {\displaystyle Y} would be whether the person is qualified for the degree, and the output R {\displaystyle R} would be the school's decision whether to offer the person to study for the degree. In this context, higher university enrollment rates of African Americans compared to whites with similar test scores might be necessary to fulfill the condition of equalized odds, if the "base rate" of Y {\displaystyle Y} differs between the groups. The concept was originally defined for binary-valued Y {\displaystyle Y} . In 2017, Woodworth et al. generalized the concept further for multiple classes.

Ω-automaton

In automata theory, a branch of theoretical computer science, an ω-automaton (or stream automaton) is a variation of a finite automaton that runs on infinite, rather than finite, strings as input. Since ω-automata do not stop, they have a variety of acceptance conditions rather than simply a set of accepting states. ω-automata are useful for specifying behavior of systems that are not expected to terminate, such as hardware, operating systems and control systems. For such systems, one may want to specify a property such as "for every request, an acknowledge eventually follows", or its negation "there is a request that is not followed by an acknowledge". The former is a property of infinite words: one cannot say of a finite sequence that it satisfies this property. Classes of ω-automata include the Büchi automata, Rabin automata, Streett automata, parity automata and Muller automata, each deterministic or non-deterministic. These classes of ω-automata differ only in terms of acceptance condition. They all recognize precisely the regular ω-languages except for the deterministic Büchi automata, which is strictly weaker than all the others. Although all these types of automata recognize the same set of ω-languages, they nonetheless differ in succinctness of representation for a given ω-language. == Deterministic ω-automata == Formally, a deterministic ω-automaton is a tuple A = ( Q , Σ , δ , q 0 , A a c c ) {\textstyle A=(Q,\Sigma ,\delta ,q_{0},A_{acc})} , that consists of the following components: Q {\textstyle Q} , is a finite set. The elements of Q {\textstyle Q} are called the states of A {\textstyle A} . Σ {\textstyle \Sigma } , is a finite set called the alphabet of A {\textstyle A} . δ : Q × Σ → Q {\textstyle \delta \colon Q\times \Sigma \rightarrow Q} is a function, called the transition function of A {\textstyle A} . Q 0 {\textstyle Q_{0}} is an element of Q {\textstyle Q} , called the initial state. A a c c {\textstyle A_{acc}} is a set of accepting states of A {\textstyle A} , formally a subset of Q ω {\textstyle Q^{\omega }} . An input for A {\textstyle A} is an infinite string over the alphabet Σ {\textstyle \Sigma } , i.e. it is an infinite sequence α = ( a 1 , a 2 , a 3 , … ) {\textstyle \alpha =(a_{1},a_{2},a_{3},\ldots )} . The run of A {\textstyle A} on such an input is an infinite sequence ρ = ( r 0 , r 1 , r 2 , … ) {\textstyle \rho =(r_{0},r_{1},r_{2},\ldots )} of states, defined as follows: r 0 = q 0 {\textstyle r_{0}=q_{0}} . r 1 = δ ( r 0 , a 1 ) {\textstyle r_{1}=\delta (r_{0},a_{1})} . r 2 = δ ( r 1 , a 2 ) {\textstyle r_{2}=\delta (r_{1},a_{2})} . ... that is, for every i {\textstyle i} : r i = δ ( r i − 1 , a i ) {\textstyle r_{i}=\delta (r_{i-1},a_{i})} . The main purpose of an ω-automaton is to define a subset of the set of all inputs: The set of accepted inputs. Whereas in the case of an ordinary finite automaton every run ends with a state r n {\textstyle r_{n}} and the input is accepted if and only if r n {\textstyle r_{n}} is an accepting state, the definition of the set of accepted inputs is more complicated for ω-automata. Here we must look at the entire run ρ {\textstyle \rho } . The input is accepted if the corresponding run is in Acc {\textstyle {\text{Acc}}} . The set of accepted input ω-words is called the recognized ω-language by the automaton, which is denoted as L ( A ) {\textstyle L(A)} . The definition of Acc {\textstyle {\text{Acc}}} as a subset of Q ω {\textstyle Q^{\omega }} is purely formal and not suitable for practice because normally such sets are infinite. The difference between various types of ω-automata (Büchi, Rabin etc.) consists in how they encode certain subsets Acc {\textstyle {\text{Acc}}} of Q ω {\textstyle Q^{\omega }} as finite sets, and therefore in which such subsets they can encode. == Nondeterministic ω-automata == Formally, a nondeterministic ω-automaton is a tuple A = ( Q , Σ , Δ , Q 0 , Acc ) {\textstyle A=(Q,\Sigma ,\Delta ,Q_{0},{\text{Acc}})} that consists of the following components: Q {\textstyle Q} is a finite set. The elements of Q {\textstyle Q} are called the states of A {\textstyle A} . Σ {\textstyle \Sigma } is a finite set called the alphabet of A {\textstyle A} . Δ {\textstyle \Delta } is a subset of Q × Σ × Q {\textstyle Q\times \Sigma \times Q} and is called the transition relation of A {\textstyle A} . Q 0 {\textstyle Q_{0}} is a subset of Q {\textstyle Q} , called the initial set of states. Acc {\textstyle {\text{Acc}}} is the acceptance condition, a subset of Q ω {\textstyle Q^{\omega }} . Unlike a deterministic ω-automaton, which has a transition function δ {\textstyle \delta } , the non-deterministic version has a transition relation Δ {\textstyle \Delta } . Note that Δ {\textstyle \Delta } can be regarded as a function Q × Σ → P ( Q ) {\textstyle Q\times \Sigma \rightarrow {\mathcal {P}}(Q)} from Q × Σ {\textstyle Q\times \Sigma } to the power set P ( Q ) {\textstyle {\mathcal {P}}(Q)} . Thus, given a state q n {\textstyle q_{n}} and a symbol a n {\textstyle a_{n}} , the next state q n + 1 {\textstyle q_{n+1}} is not necessarily determined uniquely, rather there is a set of possible next states. A run of A {\textstyle A} on the input α = ( a 1 , a 2 , a 3 , … ) {\textstyle \alpha =(a_{1},a_{2},a_{3},\ldots )} is any infinite sequence ρ = ( r 0 , r 1 , r 2 , … ) {\textstyle \rho =(r_{0},r_{1},r_{2},\ldots )} of states that satisfies the following conditions: r 0 {\textstyle r_{0}} is an element of Q 0 {\textstyle Q_{0}} . r 1 {\textstyle r_{1}} is an element of Δ ( r 0 , a 1 ) {\textstyle \Delta (r_{0},a_{1})} . r 2 {\textstyle r_{2}} is an element of Δ ( r 1 , a 2 ) {\textstyle \Delta (r_{1},a_{2})} . ... that is, for every i {\textstyle i} : r i {\textstyle r_{i}} is an element of Δ ( r i − 1 , a i ) {\textstyle \Delta (r_{i-1},a_{i})} . A nondeterministic ω-automaton may admit many different runs on any given input, or none at all. The input is accepted if at least one of the possible runs is accepting. Whether a run is accepting depends only on Acc {\textstyle {\text{Acc}}} , as for deterministic ω-automata. Every deterministic ω-automaton can be regarded as a nondeterministic ω-automaton by taking Δ {\textstyle \Delta } to be the graph of δ {\textstyle \delta } . The definitions of runs and acceptance for deterministic ω-automata are then special cases of the nondeterministic cases. == Acceptance conditions == Acceptance conditions may be infinite sets of ω-words. However, people mostly study acceptance conditions that are finitely representable. The following lists a variety of popular acceptance conditions. Before discussing the list, let's make the following observation. In the case of infinitely running systems, one is often interested in whether certain behavior is repeated infinitely often. For example, if a network card receives infinitely many ping requests, then it may fail to respond to some of the requests but should respond to an infinite subset of received ping requests. This motivates the following definition: For any run ρ {\textstyle \rho } , let Inf ( ρ ) {\textstyle {\text{Inf}}(\rho )} be the set of states that occur infinitely often in ρ {\textstyle \rho } . This notion of certain states being visited infinitely often will be helpful in defining the following acceptance conditions. A Büchi automaton is an ω-automaton A {\textstyle A} that uses the following acceptance condition, for some subset F {\textstyle F} of Q {\textstyle Q} : Büchi condition A {\textstyle A} accepts exactly those runs ρ {\textstyle \rho } for which Inf ( ρ ) ∩ F ≠ ∅ {\textstyle {\text{Inf}}(\rho )\cap F\neq \emptyset } , i.e. there is an accepting state that occurs infinitely often in ρ {\textstyle \rho } . A Rabin automaton is an ω-automaton A {\textstyle A} that uses the following acceptance condition, for some set Ω {\textstyle \Omega } of pairs ( B i , G i ) {\textstyle (B_{i},G_{i})} of sets of states: Rabin condition A {\textstyle A} accepts exactly those runs ρ {\textstyle \rho } for which there exists a pair ( B i , G i ) {\textstyle (B_{i},G_{i})} in Ω {\textstyle \Omega } such that B i ∩ Inf ( ρ ) = ∅ {\textstyle B_{i}\cap {\text{Inf}}(\rho )=\emptyset } and G i ∩ Inf ( ρ ) ≠ ∅ {\textstyle G_{i}\cap {\text{Inf}}(\rho )\neq \emptyset } . A Streett automaton is an ω-automaton A {\textstyle A} that uses the following acceptance condition, for some set Ω {\textstyle \Omega } of pairs ( B i , G i ) {\textstyle (B_{i},G_{i})} of sets of states: Streett condition A {\textstyle A} accepts exactly those runs ρ {\textstyle \rho } such that for all pairs ( B i , G i ) {\textstyle (B_{i},G_{i})} in Ω {\textstyle \Omega } , B i ∩ Inf ( ρ ) ≠ ∅ {\textstyle B_{i}\cap {\text{Inf}}(\rho )\neq \emptyset } or G i ∩ Inf ( ρ ) = ∅ {\textstyle G_{i}\cap {\text{Inf}}(\rho )=\emptyset } . A parity automaton is an automaton A {\textstyle A} whose set of states is Q = { 0 , 1 , 2 , … , k } {\textstyle Q=\{0,1,2,\ldots ,k\}} for some natural number k {\textst