Bulletin was an online newsletter platform launched by Facebook on July 6, 2021, that allows notable writers to make announcements directly to their subscribers. Its competitors included Substack, of which Bulletin was called a "near-clone." Writers participating in the platform's launch included Malcolm Gladwell, Mitch Albom, Tan France, Jessica Yellin, Jane Wells, Erin Andrews and Dorie Greenspan. Facebook CEO Mark Zuckerberg stated that Bulletin represented the first time that the company had "built a project that is directly for journalists and individual writers." In October 2022 Meta announced the shutdown of Bulletin. The platform went into read only mode in January 2023 and became unavailable in April 2023. == History == Facebook announced Bulletin as its online newsletter platform on June 29, 2021. and launched by the company on July 6, 2021. Facebook CEO Mark Zuckerberg touted the service by saying that Bulletin represented the first time that the company had "built a project that is directly for journalists and individual writers." Writers participating in the platform's launch included Malcolm Gladwell, Mitch Albom, Tan France, Jessica Yellin, Jane Wells, Erin Andrews and Dorie Greenspan. == Reception == Unlike competitor such as Substack, Facebook indicated upon service's launch that it would not take a cut of subscription fees of writers using that platform. According to Washington Post technology writer Will Oremus, the move was criticized by those who viewed it as a form of predatory pricing intended by Facebook to force those competitors out of business. Sandeep Vaheesan, legal director of the think tank Open Markets, called for the government to reexamine predatory pricing as a violation of antitrust law, saying, "We want companies to compete by making better products, investing in new equipment and tech — not purely relying on their financial advantages to capture market share."
Direct Graphics Access
Direct Graphics Access is a plug-in for the X display servers that allows client programs direct access to the frame buffer. Graphics hardware communicates via a chunk of memory called a frame buffer. This is an array of values that represent pixel color values on the screen. Writing the appropriate values into the frame buffer therefore allows a program to paint areas of the screen. However, as with any shared resource, problems occur when multiple programs attempt to access the same resource, as they tend to write over each other's work. In the X Window System, this is solved by having a central display server that mediates between programs that want to draw on the screen. The display server also used to perform a lot of the drawing work, allowing programs to say Draw me a circle of this radius filled with this pattern or draw this text in this font. The X server does all this work, freeing programmers from having to write their own drawing code. Another advantage of the X architecture is that it works over a network, allowing programs on one machine to display output on the screen of another. Direct Graphics Access allows direct access to the frame buffer and the X-server hands over control of the frame buffer to the client program and waits for the client to hand it back. This means that the client program has control of the whole screen, and so it is mostly used for full-screen video/games.
Absorbing Markov chain
In the mathematical theory of probability, an absorbing Markov chain is a Markov chain in which every state can reach an absorbing state. An absorbing state is a state that, once entered, cannot be left. Like general Markov chains, there can be continuous-time absorbing Markov chains with an infinite state space. However, this article concentrates on the discrete-time discrete-state-space case. == Formal definition == A Markov chain is an absorbing chain if there is at least one absorbing state and it is possible to go from any state to at least one absorbing state in a finite number of steps. In an absorbing Markov chain, a state that is not absorbing is called transient. === Canonical form === Let an absorbing Markov chain with transition matrix P have t transient states and r absorbing states. The rows of P represent sources, while columns represent destinations. By ordering the transient states before the absorbing states, it can be assumed that P has the form P = [ Q R 0 I r ] , {\displaystyle P={\begin{bmatrix}Q&R\\\mathbf {0} &I_{r}\end{bmatrix}},} where Q is a t-by-t matrix, R is a nonzero t-by-r matrix, 0 is an r-by-t zero matrix, and Ir is the r-by-r identity matrix. Thus, Q describes the probability of transitioning from some transient state to another while R describes the probability of transitioning from some transient state to some absorbing state. The probability of transitioning from i to j in exactly k steps is the (i,j)-entry of Pk, further computed below. When considering only transient states, the probability is found in the upper left of Pk, the (i,j)-entry of Qk. == Fundamental matrix == === Expected number of visits to a transient state === A basic property about an absorbing Markov chain is the expected number of visits to a transient state j starting from a transient state i (before being absorbed). This can be established to be given by the (i, j) entry of so-called fundamental matrix N, obtained by summing Qk for all k (from 0 to ∞). It can be proven that N := ∑ k = 0 ∞ Q k = ( I t − Q ) − 1 , {\displaystyle N:=\sum _{k=0}^{\infty }Q^{k}=(I_{t}-Q)^{-1},} where It is the t-by-t identity matrix. The computation of this formula is the matrix equivalent of the geometric series of scalars, ∑ k = 0 ∞ q k = 1 1 − q {\displaystyle {\textstyle \sum }_{k=0}^{\infty }q^{k}={\tfrac {1}{1-q}}} . With the matrix N in hand, also other properties of the Markov chain are easy to obtain. === Expected number of steps before being absorbed === The expected number of steps before being absorbed in any absorbing state, when starting in transient state i can be computed via a sum over transient states. The value is given by the ith entry of the vector t := N 1 , {\displaystyle \mathbf {t} :=N\mathbf {1} ,} where 1 is a length-t column vector whose entries are all 1. === Absorbing probabilities === By induction, P k = [ Q k ( I t − Q k ) N R 0 I r ] . {\displaystyle P^{k}={\begin{bmatrix}Q^{k}&(I_{t}-Q^{k})NR\\\mathbf {0} &I_{r}\end{bmatrix}}.} The probability of eventually being absorbed in the absorbing state j when starting from transient state i is given by the (i,j)-entry of the matrix B := N R {\displaystyle B:=NR} . The number of columns of this matrix equals the number of absorbing states r. An approximation of those probabilities can also be obtained directly from the (i,j)-entry of P k {\displaystyle P^{k}} for a large enough value of k, when i is the index of a transient, and j the index of an absorbing state. This is because ( lim k → ∞ P k ) i , t + j = B i , j {\displaystyle \left(\lim _{k\to \infty }P^{k}\right)_{i,t+j}=B_{i,j}} . === Transient visiting probabilities === The probability of visiting transient state j when starting at a transient state i is the (i,j)-entry of the matrix H := ( N − I t ) ( N dg ) − 1 , {\displaystyle H:=(N-I_{t})(N_{\operatorname {dg} })^{-1},} where Ndg is the diagonal matrix with the same diagonal as N. === Variance on number of transient visits === The variance on the number of visits to a transient state j with starting at a transient state i (before being absorbed) is the (i,j)-entry of the matrix N 2 := N ( 2 N dg − I t ) − N sq , {\displaystyle N_{2}:=N(2N_{\operatorname {dg} }-I_{t})-N_{\operatorname {sq} },} where Nsq is the Hadamard product of N with itself (i.e. each entry of N is squared). === Variance on number of steps === The variance on the number of steps before being absorbed when starting in transient state i is the ith entry of the vector ( 2 N − I t ) t − t sq , {\displaystyle (2N-I_{t})\mathbf {t} -\mathbf {t} _{\operatorname {sq} },} where tsq is the Hadamard product of t with itself (i.e., as with Nsq, each entry of t is squared). == Examples == === String generation === Consider the process of repeatedly flipping a fair coin until the sequence (heads, tails, heads) appears. This process is modeled by an absorbing Markov chain with transition matrix P = [ 1 / 2 1 / 2 0 0 0 1 / 2 1 / 2 0 1 / 2 0 0 1 / 2 0 0 0 1 ] . {\displaystyle P={\begin{bmatrix}1/2&1/2&0&0\\0&1/2&1/2&0\\1/2&0&0&1/2\\0&0&0&1\end{bmatrix}}.} The first state represents the empty string, the second state the string "H", the third state the string "HT", and the fourth state the string "HTH". Although in reality, the coin flips cease after the string "HTH" is generated, the perspective of the absorbing Markov chain is that the process has transitioned into the absorbing state representing the string "HTH" and, therefore, cannot leave. For this absorbing Markov chain, the fundamental matrix is N = ( I − Q ) − 1 = ( [ 1 0 0 0 1 0 0 0 1 ] − [ 1 / 2 1 / 2 0 0 1 / 2 1 / 2 1 / 2 0 0 ] ) − 1 = [ 1 / 2 − 1 / 2 0 0 1 / 2 − 1 / 2 − 1 / 2 0 1 ] − 1 = [ 4 4 2 2 4 2 2 2 2 ] . {\displaystyle {\begin{aligned}N&=(I-Q)^{-1}=\left({\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}}-{\begin{bmatrix}1/2&1/2&0\\0&1/2&1/2\\1/2&0&0\end{bmatrix}}\right)^{-1}\\[4pt]&={\begin{bmatrix}1/2&-1/2&0\\0&1/2&-1/2\\-1/2&0&1\end{bmatrix}}^{-1}={\begin{bmatrix}4&4&2\\2&4&2\\2&2&2\end{bmatrix}}.\end{aligned}}} The expected number of steps starting from each of the transient states is t = N 1 = [ 4 4 2 2 4 2 2 2 2 ] [ 1 1 1 ] = [ 10 8 6 ] . {\displaystyle \mathbf {t} =N\mathbf {1} ={\begin{bmatrix}4&4&2\\2&4&2\\2&2&2\end{bmatrix}}{\begin{bmatrix}1\\1\\1\end{bmatrix}}={\begin{bmatrix}10\\8\\6\end{bmatrix}}.} Therefore, the expected number of coin flips before observing the sequence (heads, tails, heads) is 10, the entry for the state representing the empty string. === Games of chance === Games based entirely on chance can be modeled by an absorbing Markov chain. A classic example of this is the ancient Indian board game Snakes and Ladders. The graph on the left plots the probability mass in the lone absorbing state that represents the final square as the transition matrix is raised to larger and larger powers. To determine the expected number of turns to complete the game, compute the vector t as described above and examine tstart, which is approximately 39.2. === Infectious disease testing === Infectious disease testing, either of blood products or in medical clinics, is often taught as an example of an absorbing Markov chain. The public U.S. Centers for Disease Control and Prevention (CDC) model for HIV and for hepatitis B, for example, illustrates the property that absorbing Markov chains can lead to the detection of disease, versus the loss of detection through other means. In the standard CDC model, the Markov chain has five states, a state in which the individual is uninfected, then a state with infected but undetectable virus, a state with detectable virus, and absorbing states of having quit/been lost from the clinic, or of having been detected (the goal). The typical rates of transition between the Markov states are the probability p per unit time of being infected with the virus, w for the rate of window period removal (time until virus is detectable), q for quit/loss rate from the system, and d for detection, assuming a typical rate λ {\displaystyle \lambda } at which the health system administers tests of the blood product or patients in question. It follows that we can "walk along" the Markov model to identify the overall probability of detection for a person starting as undetected, by multiplying the probabilities of transition to each next state of the model as: p ( p + q ) w ( w + q ) d ( d + q ) {\displaystyle {\frac {p}{(p+q)}}{\frac {w}{(w+q)}}{\frac {d}{(d+q)}}} . The subsequent total absolute number of false negative tests—the primary CDC concern—would then be the rate of tests, multiplied by the probability of reaching the infected but undetectable state, times the duration of staying in the infected undetectable state: p ( p + q ) 1 ( w + q ) λ {\displaystyle {\frac {p}{(p+q)}}{\frac {1}{(w+q)}}\lambda } .
Analogical modeling
Analogical modeling (AM) is a formal theory of exemplar based analogical reasoning, proposed by Royal Skousen, professor of Linguistics and English language at Brigham Young University in Provo, Utah. It is applicable to language modeling and other categorization tasks. Analogical modeling is related to connectionism and nearest neighbor approaches, in that it is data-based rather than abstraction-based; but it is distinguished by its ability to cope with imperfect datasets (such as caused by simulated short term memory limits) and to base predictions on all relevant segments of the dataset, whether near or far. In language modeling, AM has successfully predicted empirically valid forms for which no theoretical explanation was known (see the discussion of Finnish morphology in Skousen et al. 2002). == Implementation == === Overview === An exemplar-based model consists of a general-purpose modeling engine and a problem-specific dataset. Within the dataset, each exemplar (a case to be reasoned from, or an informative past experience) appears as a feature vector: a row of values for the set of parameters that define the problem. For example, in a spelling-to-sound task, the feature vector might consist of the letters of a word. Each exemplar in the dataset is stored with an outcome, such as a phoneme or phone to be generated. When the model is presented with a novel situation (in the form of an outcome-less feature vector), the engine algorithmically sorts the dataset to find exemplars that helpfully resemble it, and selects one, whose outcome is the model's prediction. The particulars of the algorithm distinguish one exemplar-based modeling system from another. In AM, we think of the feature values as characterizing a context, and the outcome as a behavior that occurs within that context. Accordingly, the novel situation is known as the given context. Given the known features of the context, the AM engine systematically generates all contexts that include it (all of its supracontexts), and extracts from the dataset the exemplars that belong to each. The engine then discards those supracontexts whose outcomes are inconsistent (this measure of consistency will be discussed further below), leaving an analogical set of supracontexts, and probabilistically selects an exemplar from the analogical set with a bias toward those in large supracontexts. This multilevel search exponentially magnifies the likelihood of a behavior's being predicted as it occurs reliably in settings that specifically resemble the given context. === Analogical modeling in detail === AM performs the same process for each case it is asked to evaluate. The given context, consisting of n variables, is used as a template to generate 2 n {\displaystyle 2^{n}} supracontexts. Each supracontext is a set of exemplars in which one or more variables have the same values that they do in the given context, and the other variables are ignored. In effect, each is a view of the data, created by filtering for some criteria of similarity to the given context, and the total set of supracontexts exhausts all such views. Alternatively, each supracontext is a theory of the task or a proposed rule whose predictive power needs to be evaluated. It is important to note that the supracontexts are not equal peers one with another; they are arranged by their distance from the given context, forming a hierarchy. If a supracontext specifies all of the variables that another one does and more, it is a subcontext of that other one, and it lies closer to the given context. (The hierarchy is not strictly branching; each supracontext can itself be a subcontext of several others, and can have several subcontexts.) This hierarchy becomes significant in the next step of the algorithm. The engine now chooses the analogical set from among the supracontexts. A supracontext may contain exemplars that only exhibit one behavior; it is deterministically homogeneous and is included. It is a view of the data that displays regularity, or a relevant theory that has never yet been disproven. A supracontext may exhibit several behaviors, but contain no exemplars that occur in any more specific supracontext (that is, in any of its subcontexts); in this case it is non-deterministically homogeneous and is included. Here there is no great evidence that a systematic behavior occurs, but also no counterargument. Finally, a supracontext may be heterogeneous, meaning that it exhibits behaviors that are found in a subcontext (closer to the given context), and also behaviors that are not. Where the ambiguous behavior of the nondeterministically homogeneous supracontext was accepted, this is rejected because the intervening subcontext demonstrates that there is a better theory to be found. The heterogeneous supracontext is therefore excluded. This guarantees that we see an increase in meaningfully consistent behavior in the analogical set as we approach the given context. With the analogical set chosen, each appearance of an exemplar (for a given exemplar may appear in several of the analogical supracontexts) is given a pointer to every other appearance of an exemplar within its supracontexts. One of these pointers is then selected at random and followed, and the exemplar to which it points provides the outcome. This gives each supracontext an importance proportional to the square of its size, and makes each exemplar likely to be selected in direct proportion to the sum of the sizes of all analogically consistent supracontexts in which it appears. Then, of course, the probability of predicting a particular outcome is proportional to the summed probabilities of all the exemplars that support it. (Skousen 2002, in Skousen et al. 2002, pp. 11–25, and Skousen 2003, both passim) === Formulas === Given a context with n {\displaystyle n} elements: total number of pairings: n 2 {\displaystyle n^{2}} number of agreements for outcome i: n i 2 {\displaystyle n_{i}^{2}} number of disagreements for outcome i: n i ( n − n i ) {\displaystyle n_{i}(n-n_{i})} total number of agreements: ∑ n i 2 {\displaystyle \sum {n_{i}^{2}}} total number of disagreements: ∑ n i ( n − n i ) = n 2 − ∑ n i 2 {\displaystyle \sum {n_{i}(n-n_{i})}=n^{2}-\sum {n_{i}^{2}}} === Example === This terminology is best understood through an example. In the example used in the second chapter of Skousen (1989), each context consists of three variables with potential values 0-3 Variable 1: 0,1,2,3 Variable 2: 0,1,2,3 Variable 3: 0,1,2,3 The two outcomes for the dataset are e and r, and the exemplars are: 3 1 0 e 0 3 2 r 2 1 0 r 2 1 2 r 3 1 1 r We define a network of pointers like so: The solid lines represent pointers between exemplars with matching outcomes; the dotted lines represent pointers between exemplars with non-matching outcomes. The statistics for this example are as follows: n = 5 {\displaystyle n=5} n r = 4 {\displaystyle n_{r}=4} n e = 1 {\displaystyle n_{e}=1} total number of pairings: n 2 = 25 {\displaystyle n^{2}=25} number of agreements for outcome r: n r 2 = 16 {\displaystyle n_{r}^{2}=16} number of agreements for outcome e: n e 2 = 1 {\displaystyle n_{e}^{2}=1} number of disagreements for outcome r: n r ( n − n r ) = 4 {\displaystyle n_{r}(n-n_{r})=4} number of disagreements for outcome e: n e ( n − n e ) = 4 {\displaystyle n_{e}(n-n_{e})=4} total number of agreements: n r 2 + n e 2 = 17 {\displaystyle n_{r}^{2}+n_{e}^{2}=17} total number of disagreements: n r ( n − n r ) + n e ( n − n e ) = n 2 − ( n r 2 + n e 2 ) = 8 {\displaystyle n_{r}(n-n_{r})+n_{e}(n-n_{e})=n^{2}-(n_{r}^{2}+n_{e}^{2})=8} uncertainty or fraction of disagreement: 8 / 25 = .32 {\displaystyle 8/25=.32} Behavior can only be predicted for a given context; in this example, let us predict the outcome for the context "3 1 2". To do this, we first find all of the contexts containing the given context; these contexts are called supracontexts. We find the supracontexts by systematically eliminating the variables in the given context; with m variables, there will generally be 2 m {\displaystyle 2^{m}} supracontexts. The following table lists each of the sub- and supracontexts; x means "not x", and - means "anything". These contexts are shown in the venn diagram below: The next step is to determine which exemplars belong to which contexts in order to determine which of the contexts are homogeneous. The table below shows each of the subcontexts, their behavior in terms of the given exemplars, and the number of disagreements within the behavior: Analyzing the subcontexts in the table above, we see that there is only 1 subcontext with any disagreements: "3 1 2", which in the dataset consists of "3 1 0 e" and "3 1 1 r". There are 2 disagreements in this subcontext; 1 pointing from each of the exemplars to the other (see the pointer network pictured above). Therefore, only supracontexts containing this subcontext will contain any disagreements. We use a simple rule to identify the homogeneous supraco
ID3 algorithm
In decision tree learning, ID3 (Iterative Dichotomiser 3) is a greedy algorithm invented by Ross Quinlan used to generate a decision tree from a dataset. ID3 is the precursor to the C4.5 algorithm. The 3 in the name is meant to signify that this was Quinlan's third attempt at a model based on entropy-based splitting, and the term dichotimser is a misnomer as it implies a binary split, but the ID3 algorithm can split on multi-valued attributes. == Algorithm == The ID3 algorithm begins with the original set S {\displaystyle S} as the root node. On each iteration of the algorithm, it iterates through every unused attribute of the set S {\displaystyle S} and calculates the entropy H ( S ) {\displaystyle \mathrm {H} {(S)}} or the information gain I G ( S ) {\displaystyle IG(S)} of that attribute. It then selects the attribute which has the smallest entropy (or largest information gain) value. The set S {\displaystyle S} is then split or partitioned by the selected attribute to produce subsets of the data. (For example, a node can be split into child nodes based upon the subsets of the population whose ages are less than 50, between 50 and 100, and greater than 100.) The algorithm continues to recurse on each subset, considering only attributes never selected before. Recursion on a subset may stop in one of these cases: every element in the subset belongs to the same class; in which case the node is turned into a leaf node and labelled with the class of the examples. there are no more attributes to be selected, but the examples still do not belong to the same class. In this case, the node is made a leaf node and labelled with the most common class of the examples in the subset. there are no examples in the subset, which happens when no example in the parent set was found to match a specific value of the selected attribute. An example could be the absence of a person among the population with age over 100 years. Then a leaf node is created and labelled with the most common class of the examples in the parent node's set. Throughout the algorithm, the decision tree is constructed with each non-terminal node (internal node) representing the selected attribute on which the data was split, and terminal nodes (leaf nodes) representing the class label of the final subset of this branch. === Summary === Calculate the entropy of every attribute a {\displaystyle a} of the data set S {\displaystyle S} . Partition ("split") the set S {\displaystyle S} into subsets using the attribute for which the resulting entropy after splitting is minimized; or, equivalently, information gain is maximum. Make a decision tree node containing that attribute. Recurse on subsets using the remaining attributes. === Properties === ID3 does not guarantee an optimal solution. It can converge upon local optima. It uses a greedy strategy by selecting the locally best attribute to split the dataset on each iteration. The algorithm's optimality can be improved by using backtracking during the search for the optimal decision tree at the cost of possibly taking longer. ID3 can overfit the training data. To avoid overfitting, smaller decision trees should be preferred over larger ones. This algorithm usually produces small trees, but it does not always produce the smallest possible decision tree. ID3 is harder to use on continuous data than on factored data (factored data has a discrete number of possible values, thus reducing the possible branch points). If the values of any given attribute are continuous, then there are many more places to split the data on this attribute, and searching for the best value to split by can be time-consuming. === Usage === The ID3 algorithm is used by training on a data set S {\displaystyle S} to produce a decision tree which is stored in memory. At runtime, this decision tree is used to classify new test cases (feature vectors) by traversing the decision tree using the features of the datum to arrive at a leaf node. == The ID3 metrics == === Entropy === Entropy H ( S ) {\displaystyle \mathrm {H} {(S)}} is a measure of the amount of uncertainty in the (data) set S {\displaystyle S} (i.e. entropy characterizes the (data) set S {\displaystyle S} ). H ( S ) = ∑ x ∈ X − p ( x ) log 2 p ( x ) {\displaystyle \mathrm {H} {(S)}=\sum _{x\in X}{-p(x)\log _{2}p(x)}} Where, S {\displaystyle S} – The current dataset for which entropy is being calculated This changes at each step of the ID3 algorithm, either to a subset of the previous set in the case of splitting on an attribute or to a "sibling" partition of the parent in case the recursion terminated previously. X {\displaystyle X} – The set of classes in S {\displaystyle S} p ( x ) {\displaystyle p(x)} – The proportion of the number of elements in class x {\displaystyle x} to the number of elements in set S {\displaystyle S} When H ( S ) = 0 {\displaystyle \mathrm {H} {(S)}=0} , the set S {\displaystyle S} is perfectly classified (i.e. all elements in S {\displaystyle S} are of the same class). In ID3, entropy is calculated for each remaining attribute. The attribute with the smallest entropy is used to split the set S {\displaystyle S} on this iteration. Entropy in information theory measures how much information is expected to be gained upon measuring a random variable; as such, it can also be used to quantify the amount to which the distribution of the quantity's values is unknown. A constant quantity has zero entropy, as its distribution is perfectly known. In contrast, a uniformly distributed random variable (discretely or continuously uniform) maximizes entropy. Therefore, the greater the entropy at a node, the less information is known about the classification of data at this stage of the tree; and therefore, the greater the potential to improve the classification here. As such, ID3 is a greedy heuristic performing a best-first search for locally optimal entropy values. Its accuracy can be improved by preprocessing the data. === Information gain === Information gain I G ( A ) {\displaystyle IG(A)} is the measure of the difference in entropy from before to after the set S {\displaystyle S} is split on an attribute A {\displaystyle A} . In other words, how much uncertainty in S {\displaystyle S} was reduced after splitting set S {\displaystyle S} on attribute A {\displaystyle A} . I G ( S , A ) = H ( S ) − ∑ t ∈ T p ( t ) H ( t ) = H ( S ) − H ( S | A ) . {\displaystyle IG(S,A)=\mathrm {H} {(S)}-\sum _{t\in T}p(t)\mathrm {H} {(t)}=\mathrm {H} {(S)}-\mathrm {H} {(S|A)}.} Where, H ( S ) {\displaystyle \mathrm {H} (S)} – Entropy of set S {\displaystyle S} T {\displaystyle T} – The subsets created from splitting set S {\displaystyle S} by attribute A {\displaystyle A} such that S = ⋃ t ∈ T t {\displaystyle S=\bigcup _{t\in T}t} p ( t ) {\displaystyle p(t)} – The proportion of the number of elements in t {\displaystyle t} to the number of elements in set S {\displaystyle S} H ( t ) {\displaystyle \mathrm {H} (t)} – Entropy of subset t {\displaystyle t} In ID3, information gain can be calculated (instead of entropy) for each remaining attribute. The attribute with the largest information gain is used to split the set S {\displaystyle S} on this iteration.
Algorithmic bias
Algorithmic bias describes systematic and repeatable harmful tendency in a computerized sociotechnical system to create "unfair" outcomes, such as "privileging" one category over another in ways that may or may not be different from the intended function of the algorithm. Bias can emerge from many factors, including intentionally biased design decisions or the unintended or unanticipated use or decisions relating to the way data is coded, collected, selected or used to train the algorithm. For example, algorithmic bias has been observed in search engine results and social media platforms. This bias can have impacts ranging from privacy violations to reinforcing social biases of race, gender, sexuality, and ethnicity. The study of algorithmic bias is most concerned with algorithms that reflect "systematic and unfair" discrimination. This bias has only recently been addressed in legal frameworks, such as the European Union's General Data Protection Regulation (enforced in 2018) and the Artificial Intelligence Act (proposed in 2021 and adopted in 2024). As algorithms expand their ability to organize society, politics, institutions, and behavior, sociologists have become concerned with the ways in which unanticipated output and manipulation of data can impact the physical world. Because algorithms are often considered to be neutral and unbiased, they can inaccurately project greater authority than human expertise (in part due to the psychological phenomenon of automation bias), and in some cases, reliance on algorithms can displace human responsibility for their outcomes, without last mile thinking. Bias can enter into algorithmic systems as a result of pre-existing cultural, social, or institutional expectations; by how features and labels are chosen; because of technical limitations of their design; or by being used in unanticipated contexts or by audiences who are not considered in the software's initial design. Algorithmic bias has been cited in cases ranging from election outcomes to the spread of online hate speech. It has also arisen in criminal justice, healthcare, and hiring, compounding existing racial, socioeconomic, and gender biases. The relative inability of facial recognition technology to accurately identify darker-skinned faces has been linked to multiple wrongful arrests of black men, an issue stemming from imbalanced datasets. Problems in understanding, researching, and discovering algorithmic bias persist due to the proprietary nature of algorithms, which are typically treated as trade secrets. Even when full transparency is provided, the complexity of certain algorithms poses a barrier to understanding their functioning. Furthermore, algorithms may change, or respond to input or output in ways that cannot be anticipated or easily reproduced for analysis. In many cases, even within a single website or application, there is no single "algorithm" to examine, but a network of many interrelated programs and data inputs, even between users of the same service. A 2021 survey identified multiple forms of algorithmic bias, including historical, representation, and measurement biases, each of which can contribute to unfair outcomes. == Definitions == Algorithms are difficult to define, but may be generally understood as lists of instructions that determine how programs read, collect, process, and analyze data to generate a usable output. For a rigorous technical introduction, see Algorithms. Advances in computer hardware and software have led to an increased capability to process, store and transmit data. This has in turn made the design and adoption of technologies such as machine learning and artificial intelligence technically and commercially feasible. By analyzing and processing data, algorithms are the backbone of search engines, social media websites, recommendation engines, online retail, online advertising, and more. Contemporary social scientists are concerned with algorithmic processes embedded into hardware and software applications because of their political and social impact, and question the underlying assumptions of an algorithm's neutrality. The term algorithmic bias describes systematic and repeatable errors that create unfair outcomes, such as privileging one arbitrary group of users over others. For example, a credit score algorithm may deny a loan without being unfair, if it is consistently weighing relevant financial criteria. If the algorithm recommends loans to one group of users, but denies loans to another set of nearly identical users based on unrelated criteria, and if this behavior can be repeated across multiple occurrences, an algorithm can be described as biased. This bias may be intentional or unintentional (for example, it can come from biased data obtained from a worker that previously did the job the algorithm is going to do from now on). == Methods == Bias can be introduced to an algorithm in several ways. During the assemblage of a dataset, data may be collected, digitized, adapted, and entered into a database according to human-designed cataloging criteria. Next, programmers assign priorities, or hierarchies, for how a program assesses and sorts that data. This requires human decisions about how data is categorized, and which data is included or discarded. Some algorithms collect their own data based on human-selected criteria, which can also reflect the bias of human designers. Other algorithms may reinforce stereotypes and preferences as they process and display "relevant" data for human users, for example, by selecting information based on previous choices of a similar user or group of users. Beyond assembling and processing data, bias can emerge as a result of design. For example, algorithms that determine the allocation of resources or scrutiny (such as determining school placements) may inadvertently discriminate against a category when determining risk based on similar users (as in credit scores). Meanwhile, recommendation engines that work by associating users with similar users, or that make use of inferred marketing traits, might rely on inaccurate associations that reflect broad ethnic, gender, socio-economic, or racial stereotypes. Another example comes from determining criteria for what is included and excluded from results. These criteria could present unanticipated outcomes for search results, such as with flight-recommendation software that omits flights that do not follow the sponsoring airline's flight paths. Algorithms may also display an uncertainty bias, offering more confident assessments when larger data sets are available. This can skew algorithmic processes toward results that more closely correspond with larger samples, which may disregard data from underrepresented populations. == History == === Early critiques === The earliest computer programs were designed to mimic human reasoning and deductions, and were deemed to be functioning when they successfully and consistently reproduced that human logic. In his 1976 book Computer Power and Human Reason, artificial intelligence pioneer Joseph Weizenbaum suggested that bias could arise both from the data used in a program, but also from the way a program is coded. Weizenbaum wrote that programs are a sequence of rules created by humans for a computer to follow. By following those rules consistently, such programs "embody law", that is, enforce a specific way to solve problems. The rules a computer follows are based on the assumptions of a computer programmer for how these problems might be solved. That means the code could incorporate the programmer's imagination of how the world works, including their biases and expectations. While a computer program can incorporate bias in this way, Weizenbaum also noted that any data fed to a machine additionally reflects "human decision making processes" as data is being selected. Finally, he noted that machines might also transfer good information with unintended consequences if users are unclear about how to interpret the results. Weizenbaum warned against trusting decisions made by computer programs that a user doesn't understand, comparing such faith to a tourist who can find his way to a hotel room exclusively by turning left or right on a coin toss. Crucially, the tourist has no basis of understanding how or why he arrived at his destination, and a successful arrival does not mean the process is accurate or reliable. An early example of algorithmic bias resulted in as many as 60 women and ethnic minorities denied entry to St. George's Hospital Medical School per year from 1982 to 1986, based on implementation of a new computer-guidance assessment system that denied entry to women and men with "foreign-sounding names" based on historical trends in admissions. While many schools at the time employed similar biases in their selection process, St. George was most notable for automating said bias through the use of an algorithm, thus gaining the attention of people on a much
Implicit blockmodeling
Implicit blockmodeling is an approach in blockmodeling, similar to a valued and homogeneity blockmodeling, where initially an additional normalization is used and then while specifying the parameter of the relevant link is replaced by the block maximum. This approach was first proposed by Batagelj and Ferligoj in 2000, and developed by Aleš Žiberna in 2007/08. Comparing with homogeneity, the implicit blockmodeling will perform similarly with max-regular equivalence, but slightly worse in other settings. It will perform worse than valued and homogeneity blockmodeling with a pre-specified blockmodel.