Optimal discriminant analysis and classification tree analysis

Optimal discriminant analysis and classification tree analysis

Optimal Discriminant Analysis (ODA) and the related classification tree analysis (CTA) are exact statistical methods that maximize predictive accuracy. For any specific sample and exploratory or confirmatory hypothesis, optimal discriminant analysis (ODA) identifies the statistical model that yields maximum predictive accuracy, assesses the exact Type I error rate, and evaluates potential cross-generalizability. Optimal discriminant analysis may be applied to > 0 dimensions, with the one-dimensional case being referred to as UniODA and the multidimensional case being referred to as MultiODA. Optimal discriminant analysis is an alternative to ANOVA (analysis of variance) and regression analysis.

IEEE Transactions on Visualization and Computer Graphics

IEEE Transactions on Visualization and Computer Graphics is a peer-reviewed scientific journal published by the IEEE Computer Society. It covers subjects related to computer graphics and visualization techniques, systems, software, hardware, and user interface issues. TVCG has been considered the top journal in the field of visualization. Since 2011, TVCG has allowed authors to present recently accepted papers at partner conferences. These include: IEEE Visualization (VIS), including VAST, InfoVis, and SciVis. IEEE Virtual Reality Conference (IEEE VR) IEEE International Symposium on Mixed and Augmented Reality (ISMAR) ACM Symposium on Interactive 3D Graphics and Games (I3D) IEEE Pacific Visualization Conference (IEEE PacificVis) ACM SIGGRAPH/Eurographics Symposium on Computer Animation (SCA) Eurographics Symposium on Geometry Processing (SGP) Pacific Graphics Conference (PG) Eurovis - The EG and VGTC Conference on Visualization Graphics Interfaces (GI)

Thompson's construction

In computer science, Thompson's construction algorithm, also called the McNaughton–Yamada–Thompson algorithm, is a method of transforming a regular expression into an equivalent nondeterministic finite automaton (NFA). This NFA can be used to match strings against the regular expression. This algorithm is credited to Ken Thompson. Regular expressions and nondeterministic finite automata are two representations of formal languages. For instance, text processing utilities use regular expressions to describe advanced search patterns, but NFAs are better suited for execution on a computer. Hence, this algorithm is of practical interest, since it can compile regular expressions into NFAs. From a theoretical point of view, this algorithm is a part of the proof that they both accept exactly the same languages, that is, the regular languages. An NFA can be made deterministic by the powerset construction and then be minimized to get an optimal automaton corresponding to the given regular expression. However, an NFA may also be interpreted directly. To decide whether two given regular expressions describe the same language, each can be converted into an equivalent minimal deterministic finite automaton via Thompson's construction, powerset construction, and DFA minimization. If, and only if, the resulting automata agree up to renaming of states, the regular expressions' languages agree. == The algorithm == The algorithm works recursively by splitting an expression into its constituent subexpressions, from which the NFA will be constructed using a set of rules. More precisely, from a regular expression E, the obtained automaton A with the transition function Δ respects the following properties: A has exactly one initial state q0, which is not accessible from any other state. That is, for any state q and any letter a, Δ ( q , a ) {\displaystyle \Delta (q,a)} does not contain q0. A has exactly one final state qf, which is not co-accessible from any other state. That is, for any letter a, Δ ( q f , a ) = ∅ {\displaystyle \Delta (q_{f},a)=\emptyset } . Let c be the number of concatenation of the regular expression E and let s be the number of symbols apart from parentheses — that is, |, , a and ε. Then, the number of states of A is 2s − c (linear in the size of E). The number of transitions leaving any state is at most two. Since an NFA of m states and at most e transitions from each state can match a string of length n in time O(emn), a Thompson NFA can do pattern matching in linear time, assuming a fixed-size alphabet. === Rules === The following rules are depicted according to Aho et al. (2007), p. 122. In what follows, N(s) and N(t) are the NFA of the subexpressions s and t, respectively. The empty-expression ε is converted to A symbol a of the input alphabet is converted to The union expression s|t is converted to State q goes via ε either to the initial state of N(s) or N(t). Their final states become intermediate states of the whole NFA and merge via two ε-transitions into the final state of the NFA. The concatenation expression st is converted to The initial state of N(s) is the initial state of the whole NFA. The final state of N(s) becomes the initial state of N(t). The final state of N(t) is the final state of the whole NFA. The Kleene star expression s is converted to An ε-transition connects initial and final state of the NFA with the sub-NFA N(s) in between. Another ε-transition from the inner final to the inner initial state of N(s) allows for repetition of expression s according to the star operator. The parenthesized expression (s) is converted to N(s) itself. With these rules, using the empty expression and symbol rules as base cases, it is possible to prove with structural induction that any regular expression may be converted into an equivalent NFA. == Example == Two examples are now given, a small informal one with the result, and a bigger with a step by step application of the algorithm. === Small Example === The picture below shows the result of Thompson's construction on (ε|ab). The purple oval corresponds to a, the teal oval corresponds to a, the green oval corresponds to b, the orange oval corresponds to ab, and the blue oval corresponds to ε. === Application of the algorithm === As an example, the picture shows the result of Thompson's construction algorithm on the regular expression (0|(1(01(00)0)1)) that denotes the set of binary numbers that are multiples of 3: { ε, "0", "00", "11", "000", "011", "110", "0000", "0011", "0110", "1001", "1100", "1111", "00000", ... }. The upper right part shows the logical structure (syntax tree) of the expression, with "." denoting concatenation (assumed to have variable arity); subexpressions are named a-q for reference purposes. The left part shows the nondeterministic finite automaton resulting from Thompson's algorithm, with the entry and exit state of each subexpression colored in magenta and cyan, respectively. An ε as transition label is omitted for clarity — unlabelled transitions are in fact ε transitions. The entry and exit state corresponding to the root expression q is the start and accept state of the automaton, respectively. The algorithm's steps are as follows: An equivalent minimal deterministic automaton is shown below. == Relation to other algorithms == Thompson's is one of several algorithms for constructing NFAs from regular expressions; an earlier algorithm was given by McNaughton and Yamada. Converse to Thompson's construction, Kleene's algorithm transforms a finite automaton into a regular expression. Glushkov's construction algorithm is similar to Thompson's construction, once the ε-transitions are removed. == Use in string pattern matching == Regular expressions are often used to specify patterns that software is then asked to match. Generating an NFA by Thompson's construction, and using an appropriate algorithm to simulate it, it is possible to create pattern-matching software with performance that is ⁠ O ( m n ) {\displaystyle O(mn)} ⁠, where m is the length of the regular expression and n is the length of the string being matched. This is much better than is achieved by many popular programming-language implementations; however, it is restricted to purely regular expressions and does not support patterns for non-regular languages like backreferences.

GLIMMER

In bioinformatics, GLIMMER (Gene Locator and Interpolated Markov ModelER) is used to find genes in prokaryotic DNA. "It is effective at finding genes in bacteria, archea, viruses, typically finding 98-99% of all relatively long protein coding genes". GLIMMER was the first system that used the interpolated Markov model to identify coding regions. The GLIMMER software is open source and is maintained by Steven Salzberg, Art Delcher, and their colleagues at the Center for Computational Biology at Johns Hopkins University. The original GLIMMER algorithms and software were designed by Art Delcher, Simon Kasif and Steven Salzberg and applied to bacterial genome annotation in collaboration with Owen White. == Versions == === GLIMMER 1.0 === First Version of GLIMMER "i.e., GLIMMER 1.0" was released in 1998 and it was published in the paper Microbial gene identification using interpolated Markov model. Markov models were used to identify microbial genes in GLIMMER 1.0. GLIMMER considers the local composition sequence dependencies which makes GLIMMER more flexible and more powerful when compared to fixed-order Markov model. There was a comparison made between interpolated Markov model used by GLIMMER and fifth order Markov model in the paper Microbial gene identification using interpolated Markov models. "GLIMMER algorithm found 1680 genes out of 1717 annotated genes in Haemophilus influenzae where fifth order Markov model found 1574 genes. GLIMMER found 209 additional genes which were not included in 1717 annotated genes where fifth order Markov model found 104 genes."' === GLIMMER 2.0 === Second Version of GLIMMER i.e., GLIMMER 2.0 was released in 1999 and it was published in the paper Improved microbial identification with GLIMMER. This paper provides significant technical improvements such as using interpolated context model instead of interpolated Markov model and resolving overlapping genes which improves the accuracy of GLIMMER. Interpolated context models are used instead of interpolated Markov model which gives the flexibility to select any base. In interpolated Markov model probability distribution of a base is determined from the immediate preceding bases. If the immediate preceding base is irrelevant amino acid translation, interpolated Markov model still considers the preceding base to determine the probability of given base where as interpolated context model which was used in GLIMMER 2.0 can ignore irrelevant bases. False positive predictions were increased in GLIMMER 2.0 to reduce the number of false negative predictions. Overlapped genes are also resolved in GLIMMER 2.0. Various comparisons between GLIMMER 1.0 and GLIMMER 2.0 were made in the paper Improved microbial identification with GLIMMER which shows improvement in the later version. "Sensitivity of GLIMMER 1.0 ranges from 98.4 to 99.7% with an average of 99.1% where as GLIMMER 2.0 has a sensitivity range from 98.6 to 99.8% with an average of 99.3%. GLIMMER 2.0 is very effective in finding genes of high density. The parasite Trypanosoma brucei, responsible for causing African sleeping sickness is being identified by GLIMMER 2.0" === GLIMMER 3.0 === Third version of GLIMMER, "GLIMMER 3.0" was released in 2007 and it was published in the paper Identifying bacterial genes and endosymbiont DNA with Glimmer. This paper describes several major changes made to the GLIMMER system including improved methods to identify coding regions and start codon. Scoring of ORF in GLIMMER 3.0 is done in reverse order i.e., starting from stop codon and moves back towards the start codon. Reverse scanning helps in identifying the coding portion of the gene more accurately which is contained in the context window of IMM. GLIMMER 3.0 also improves the generated training set data by comparing the long-ORF with universal amino acid distribution of widely disparate bacterial genomes."GLIMMER 3.0 has an average long-ORF output of 57% for various organisms where as GLIMMER 2.0 has an average long-ORF output of 39%." GLIMMER 3.0 reduces the rate of false positive predictions which were increased in GLIMMER 2.0 to reduce the number of false negative predictions. "GLIMMER 3.0 has a start-site prediction accuracy of 99.5% for 3'5' matches where as GLIMMER 2.0 has 99.1% for 3'5' matches. GLIMMER 3.0 uses a new algorithm for scanning coding regions, a new start site detection module, and architecture which integrates all gene predictions across an entire genome." Minimum description length === Theoretical and Biological Foundation === The GLIMMER project helped introduce and popularize the use of variable length models in Computational Biology and Bioinformatics that subsequently have been applied to numerous problems such as protein classification and others. Variable length modeling was originally pioneered by information theorists and subsequently ingeniously applied and popularized in data compression (e.g. Ziv-Lempel compression). Prediction and compression are intimately linked using Minimum Description Length Principles. The basic idea is to create a dictionary of frequent words (motifs in biological sequences). The intuition is that the frequently occurring motifs are likely to be most predictive and informative. In GLIMMER the interpolated model is a mixture model of the probabilities of these relatively common motifs. Similarly to the development of HMMs in Computational Biology, the authors of GLIMMER were conceptually influenced by the previous application of another variant of interpolated Markov models to speech recognition by researchers such as Fred Jelinek (IBM) and Eric Ristad (Princeton). The learning algorithm in GLIMMER is different from these earlier approaches. == Access == GLIMMER can be downloaded from The Glimmer home page (requires a C++ compiler). Alternatively, an online version is hosted by NCBI [1]. == How it works == GLIMMER primarily searches for long-ORFS. An open reading frame might overlap with any other open reading frame which will be resolved using the technique described in the sub section. Using these long-ORFS and following certain amino acid distribution GLIMMER generates training set data. Using these training data, GLIMMER trains all the six Markov models of coding DNA from zero to eight order and also train the model for noncoding DNA GLIMMER tries to calculate the probabilities from the data. Based on the number of observations, GLIMMER determines whether to use fixed order Markov model or interpolated Markov model. If the number of observations are greater than 400, GLIMMER uses fixed order Markov model to obtain there probabilities. If the number of observations are less than 400, GLIMMER uses interpolated Markov model which is briefly explained in the next sub section. GLIMMER obtains score for every long-ORF generated using all the six coding DNA models and also using non-coding DNA model. If the score obtained in the previous step is greater than a certain threshold then GLIMMER predicts it to be a gene. The steps explained above describes the basic functionality of GLIMMER. There are various improvements made to GLIMMER and some of them are described in the following sub-sections. === The GLIMMER system === GLIMMER system consists of two programs. First program called build-imm, which takes an input set of sequences and outputs the interpolated Markov model as follows. The probability for each base i.e., A,C,G,T for all k-mers for 0 ≤ k ≤ 8 is computed. Then, for each k-mer, GLIMMER computes weight. New sequence probability is computed as follows. where n is the length of the sequence S x {\displaystyle S_{x}} is the oligomer at position x. I M M 8 ( S x ) {\displaystyle IMM_{8}(S_{x})} , the 8 t h {\displaystyle 8^{th}} -order interpolated Markov model score is computed as "where Y k ( S x − 1 ) {\displaystyle Y_{k}(S_{x-1})} is the weight of the k-mer at position x-1 in the sequence S and P k ( S x ) {\displaystyle P_{k}(S_{x})} is the estimate obtained from the training data of the probability of the base located at position x in the k t h {\displaystyle k^{th}} -order model." The probability of base S x {\displaystyle S_{x}} given the i previous bases is computed as follows. "The value of Y i ( S x ) {\displaystyle Y_{i}(S_{x})} associated with P i ( S x ) {\displaystyle P_{i}(S_{x})} can be regarded as a measure of confidence in the accuracy of this value as an estimate of the true probability. GLIMMER uses two criteria to determine Y i ( S x ) {\displaystyle Y_{i}(S_{x})} . The first of these is simple frequency occurrence in which the number of occurrences of context string S x , i {\displaystyle S_{x,i}} in the training data exceeds a specific threshold value, then Y i ( S x ) {\displaystyle Y_{i}(S_{x})} is set to 1.0. The current default value for threshold is 400, which gives 95% confidence. When there are insufficient sample occurrences of a context string, build-imm employ additional criteria to determine Y {\displaystyle Y} value. For a

Yasuo Matsuyama

Yasuo Matsuyama (born March 23, 1947) is a Japanese researcher in machine learning and human-aware information processing. Matsuyama is a Professor Emeritus and an Honorary Researcher of the Research Institute of Science and Engineering of Waseda University. == Early life and education == Matsuyama received his bachelor’s, master’s and doctoral degrees in electrical engineering from Waseda University in 1969, 1971, and 1974 respectively. The dissertation title for the Doctor of Engineering is Studies on Stochastic Modeling of Neurons. There, he contributed to the spiking neurons with stochastic pulse-frequency modulation. Advisors were Jun’ichi Takagi, Kageo, Akizuki, and Katsuhiko Shirai. Upon the completion of the doctoral work at Waseda University, he was dispatched to the United States as a Japan-U.S. exchange fellow by the joint program of the Japan Society for the Promotion of Science, Fulbright Program, and the Institute of International Education. Through this exchange program, he completed his Ph.D. program at Stanford University in 1978. The dissertation title is Process Distortion Measures and Signal Processing. There, he contributed to the theory of probabilistic distortion measures and its applications to speech encoding with spectral clustering or vector quantization. His advisor was Robert. M. Gray. == Career == From 1977 to 1078, Matsuyama was a research assistant at the Information Systems Laboratory of Stanford University Archived 2018-03-16 at the Wayback Machine. From 1979 to 1996, he was a faculty of Ibaraki University, Japan (the final position was a professor and chairperson of the Information and System Sciences Major). Since 1996, he was a Professor of Waseda University, Department of Computer Science and Engineering. From 2011 to 2013, he was the director of the Media Network Center of Waseda University. At the 2011 Tōhoku earthquake and tsunami of March 11, 2011, he was in charge of the safety inquiry of 65,000 students, staffs and faculties. Since 2017, Matsuyama is a Professor Emeritus and an Honorary Researcher of the Research Institute of Science and Engineering of Waseda University. Since 2018, he serves as an acting president of the Waseda Electrical Engineering Society. == Work == Matsuyama’s works on machine learning and human-aware information processing have dual foundations. Studies on the competitive learning (vector quantization) for his Ph.D. at Stanford University brought about his succeeding works on machine learning contributions. Studies on stochastic spiking neurons for his Dr. Engineering at Waseda University set off applications of biological signals to the machine learning. Thus, his works can be grouped reflecting these dual foundations. Statistical machine learning algorithms: The use of the alpha-logarithmic likelihood ratio in learning cycles generated the alpha-EM algorithm (alpha-Expectation maximization algorithm). Because the alpha-logarithm includes the usual logarithm, the alpha-EM algorithm contains the EM-algorithm (more precisely, the log-EM algorithm). The merit of the speedup by the alpha-EM over the log-EM is due to the ability to utilize the past information. Such a usage of the messages from the past brought about the alpha-HMM estimation algorithm (alpha-hidden Markov model estimation algorithm) that is a generalized and faster version of the hidden Markov model estimation algorithm (HMM estimation algorithm). Competitive learning on empirical data: Starting from the speech compression studies at Stanford, Matsuyama developed generalized competitive learning algorithms; the harmonic competition and the multiple descent cost competition. The former realizes the multiple-object optimization. The latter admits deformable centroids. Both algorithms generalize the batch-mode vector quantization (simply called, vector quantization) and the successive-mode vector quantization (or, called learning vector quantization). A hierarchy from the alpha-EM to the vector quantization: Matsuyama contributed to generate and identify the hierarchy of the above algorithms. Alpha-EM ⊃ log-EM ⊃ basic competitive learning (vector quantization, VQ; or clustering). On the class of the vector quantization and competitive learning, he contributed to generate and identify the hierarchy of VQs. VQ ⇔ {batch mode VQ, and learning VQ} ⊂ {harmonic competition} ⊂ {multiple descent cost competition}. Applications to Human-aware information processing: The dual foundations of his led to the applications to huma-aware information processing. Retrieval systems for similar images and videos. Bipedal humanoid operations via invasive and noninvasive brain signals as well as gestures. Continuous authentication of uses by brain signals. Self-organization and emotional feature injection based on the competitive learning. Decomposition of DNA sequences by the independent component analysis (US Patent: US 8,244,474 B2). Data compression of speech signals by the competitive learning. The above theories and applications work as contributions to IoCT (Internet of Collaborative Things) and IoXT (http://www.asc-events.org/ASC17/Workshop.php Archived 2018-02-06 at the Wayback Machine). == Awards and honors == 2016: e-Teaching Award of Waseda University 2015: Best Textbook Award by the Japanese Society of Information Processing 2014: Fellow of the Japanese Society of Information Processing 2013: IEEE Life Fellow 2008: Y. Dote Memorial Best Paper Award of CSTST 2008 from ACM and IEEE 2006: LSI Intellectual Property Design Award from the LSI IP Committee 2004: Best Paper Award for Application Oriented Research from Asia Pacific Neural Network Assembly 2002: Fellow Award from the Institute of Electronics, Information and Communication Engineers. 2001: Telecommunication System Major Award of the Telecommunications Advancement Foundation 2001: Outstanding Paper Award of IEEE Transactions on Neural Networks Archived 2013-01-17 at the Wayback Machine 1998: Fellow Award from IEEE for contributions to learning algorithms with competition. 1992: Best Paper Award from the Institute of Electronics, Information and Communication Engineers 1989: Telecommunication System Promotion Award of the Telecommunications Advancement Foundation

Glow (app)

Glow is a fertility awareness and period-tracking app. It is part of a suite of mobile apps focused on women's reproductive health and childcare, which includes Eve by Glow (a dedicated period tracker), Glow Nurture (a pregnancy tracker), and Glow Baby (a baby development tracker). The Glow company also operates an online shop that sells several fertility-related products, including ovulation test strips, pregnancy tests, and wearable breast pumps. In 2024, Glow was reported to have approximately 25 million users across its various apps and community message boards. == History == Glow debuted in August 2013 as an iOS app. It was founded by Michael Huang and Max Levchin and launched with $6 million in Series A funding from venture capital firms Founders Fund and Andreesen Horowitz. In 2014, Glow raised an additional $17 million in Series B funding, with Formation 8 joining existing investors. In 2015, Glow launched Ruby, an app dedicated to sexual health. That year, Wired reported that the company had added features to their apps allowing men to monitor their fertility. Glow subsequently released an additional set of apps focused on pregnancy tracking and infant development. In 2016, Glow reported that it had a total of approximately 3 million users; by 2018, this had grown to 15 million. Vox described it as one of the “big two” period and fertility tracking apps and the one that had started the “boom” in the femtech space. == Application and features == Glow was initially described as a fertility application that applied data-driven methods to menstrual and ovulation tracking. Core features include cycle logging, ovulation prediction, and symptom tracking. The app also provides educational content related to reproductive health and childcare, as well as a set of online message boards that allow individuals to share experiences and seek peer support. == Privacy and legal issues == Glow has received significant media attention for its privacy and security practices. In 2016, Consumer Reports identified potential exploits in the Glow app that they claimed could have exposed private user data to hackers. Glow subsequently reported that it had fixed the vulnerabilities and told The Washington Post they had no evidence that user data had been compromised. In September 2020, the California Attorney General announced a settlement with Glow related to Consumer Reports’ findings, which included a $250,000 civil penalty. Following the US Supreme Court's 2022 Dobbs v. Jackson ruling, which legalized state-level bans on abortion, Glow (and other fertility trackers, such as Clue and Flo) came under additional scrutiny over concerns that user data on abortions could be reported to law enforcement. After this surge of media interest, a research team affiliated with the University of New South Wales conducted an investigation into the privacy practices of several popular fertility apps, including Glow. Their review of Glow was mixed, noting that they provided several privacy settings and de-identified sensitive data, but that user information could still be disclosed in the future if the app was sold. Glow rejected that claim, telling the Australian Associated Press that it "did not share" personal data. The company also cited several internal security measures it had implemented and its apps' offline data protection setting, which allows users to permanently delete their health-related data. == Reception == In 2014, Fast Company reported that 20,000 women had used Glow to conceive. Later that year, The Guardian included Glow Nurture on its list of the best iPhone apps of 2014. Media coverage often praised Glow's array of menstrual tracking options, although some reviews also noted that fertility apps are not birth control tools and cautioned against relying on them for that purpose. In 2019, Cosmopolitan singled Glow's community of users as one of its standout features.

Timnit Gebru

Timnit W. Gebru (Amharic and Tigrinya: ትምኒት ገብሩ; 1982/1983) is an Eritrean Ethiopian-born computer scientist who works in the fields of artificial intelligence (AI), algorithmic bias and data mining. She is a co-founder of Black in AI, an advocacy group that has pushed for more Black roles in AI development and research. She is the founder of the Distributed Artificial Intelligence Research Institute (DAIR). In December 2020, public controversy erupted over the circumstances surrounding Gebru's departure from Google, where she was technical co-lead of the Ethical Artificial Intelligence Team. Gebru had coauthored a paper on the risks of large language models (LLMs) acting as stochastic parrots, and submitted it for publication. According to Jeff Dean, head of Google AI, the paper was submitted without waiting for Google's internal review, which then asserted that it ignored too much relevant research. Google management requested that Gebru either withdraw the paper or remove the names of all the authors employed by Google. Gebru requested the identity and feedback of every reviewer, and stated that if Google refused, she would talk to her manager about "a last date". Google terminated her employment immediately, stating that they were accepting her resignation. Gebru maintained that she had not formally offered to resign, and only threatened to. Gebru has been widely recognized for her expertise in the ethics of artificial intelligence. She was named one of the World's 50 Greatest Leaders by Fortune and one of Nature's ten people who shaped science in 2021, and in 2022, one of Time's most influential people. == Early life and education == Gebru was raised in Addis Ababa, Ethiopia. Her father, an electrical engineer with a Doctor of Philosophy (PhD), died when she was five years old, and she was raised by her mother, an economist. Both her parents are from Eritrea. When Gebru was 15, during the Eritrean–Ethiopian War, she fled Ethiopia after some of her family were deported to Eritrea and compelled to fight in the war. She was initially denied a U.S. visa and briefly lived in Ireland, but she eventually received political asylum in the U.S., an experience she said was "miserable". Gebru settled in Somerville, Massachusetts to attend high school, where she says she immediately started to experience racial discrimination, with some teachers refusing to allow her to take certain Advanced Placement courses, despite being a high-achiever. After she completed high school, an encounter with the police set Gebru on a course toward a focus on ethics in technology. A friend of hers, a Black woman, was assaulted in a bar, and Gebru called the police to report it. She says that instead of filing the assault report, her friend was arrested and remanded to a cell. Gebru called it a pivotal moment and a "blatant example of systemic racism." In 2001, Gebru was accepted at Stanford University. There, she earned her Bachelor of Science and Master of Science degrees in electrical engineering and her PhD in computer vision in 2017. Gebru was advised during her PhD program by Fei-Fei Li. During the 2008 United States presidential election, Gebru canvassed in support of Barack Obama. Gebru presented her doctoral research at the 2017 LDV Capital Vision Summit competition, where computer vision scientists present their work to members of industry and venture capitalists. Gebru won the competition, starting a series of collaborations with other entrepreneurs and investors. Both during her PhD program in 2016 and in 2018, Gebru returned to Ethiopia with Jelani Nelson's programming campaign, AddisCoder. While working on her PhD, Gebru authored a paper that was never published about her concern over the future of AI. She wrote of the dangers of the lack of diversity in the field, centered on her experiences with the police and on a ProPublica investigation into predictive policing, which revealed a projection of human biases in machine learning. In the paper, she scathed the "boy's club culture", reflecting on her experiences at conference gatherings of drunken male attendees sexually harassing her, and criticized the hero worship of the field's celebrities. == Career == === 2004–2013: Software development at Apple === Gebru joined Apple as an intern while at Stanford, working in their hardware division making circuitry for audio components, and was offered a full-time position the following year. Of her work as an audio engineer, her manager told Wired she was "fearless", and well-liked by her colleagues. During her tenure at Apple, Gebru became more interested in building software, namely computer vision that could detect human figures. She went on to develop signal processing algorithms for the first iPad. At the time, she said she did not consider the potential use for surveillance, saying "I just found it technically interesting." Long after leaving the company, during the #AppleToo movement in the summer of 2021, which was led by Apple engineer Cher Scarlett, who consulted with Gebru, Gebru revealed she experienced "so many egregious things" and "always wondered how they manage[d] to get out of the spotlight." She said that accountability at Apple was long overdue, and warned they could not continue to fly under the radar for much longer. Gebru also criticized the way the media covers Apple and other tech giants, saying that the press helps shield such companies from public scrutiny. === 2013–2017: Research at Stanford and Microsoft === In 2013, Gebru joined Fei-Fei Li's lab at Stanford, where she combined deep learning with Google Street View to estimate the demographics of United States neighbourhoods, showing that socioeconomic attributes such as voting patterns, income, race, and education can be inferred from observations of cars. In 2015, Gebru attended the field's top conference, Neural Information Processing Systems (NIPS), in Montreal, Canada. Out of 3,700 attendees, she noted she was one of only a few Black researchers. When she attended again the following year, she kept a tally and noted that there were only five Black men and that she was the only Black woman out of 8,500 delegates. Together with her colleague Rediet Abebe, Gebru founded Black in AI, a community of Black researchers working in artificial intelligence that aims to increase the presence, visibility, and well-being of Black professionals and leaders within the field. In the summer of 2017, Gebru joined Microsoft as a postdoctoral researcher in the Fairness, Accountability, Transparency, and Ethics in AI (FATE) lab. In 2017, Gebru spoke at the Fairness and Transparency conference, where MIT Technology Review interviewed her about biases that exist in AI systems and how adding diversity in AI teams can fix that issue. In her interview with Jackie Snow, Snow asked Gebru, "How does the lack of diversity distort artificial intelligence and specifically computer vision?" and Gebru pointed out that there are biases that exist in the software developers. While at Microsoft, Gebru co-authored a research paper called Gender Shades, which became the namesake of a project of a broader Massachusetts Institute of Technology project led by co-author Joy Buolamwini. The pair investigated facial recognition software, finding that in one particular implementation Black women were 35% less likely to be recognized than White men. === 2018–2020: Artificial intelligence ethics at Google === Gebru joined Google in 2018, where she co-led a team on the ethics of artificial intelligence with Margaret Mitchell. She studied the implications of artificial intelligence, looking to improve the ability of technology to do social good. In 2019, Gebru and other artificial intelligence researchers "signed a letter calling on Amazon to stop selling its facial-recognition technology to law enforcement agencies because it is biased against women and people of color", citing a study that was conducted by MIT researchers showing that Amazon's facial recognition system had more trouble identifying darker-skinned females than any other technology company's facial recognition software. In a New York Times interview, Gebru has further expressed that she believes facial recognition is too dangerous to be used for law enforcement and security purposes at present. === Exit from Google === In 2020 Gebru and five co-authors wrote a paper titled "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜". The paper examined risks of very large language models, including their environmental footprint, financial costs, the inscrutability of large models, the potential for LLMs to display prejudice against certain groups, the inability of LLMs to understand the language they process, and the use of LLMs to spread disinformation. In December 2020, her employment with Google ended after Google management asked her to either withdraw the paper before publication, or remove the names of all the Google employees from