AI Avatar Arbitrage

AI Avatar Arbitrage — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • VACUUM

    VACUUM

    VACUUM is a set of normative guidance principles for achieving training and test dataset quality for structured datasets in data science and machine learning. The garbage-in, garbage out principle motivates a solution to the problem of data quality but does not offer a specific solution. Unlike the majority of the ad-hoc data quality assessment metrics often used by practitioners VACUUM specifies qualitative principles for data quality management and serves as a basis for defining more detailed quantitative metrics of data quality. VACUUM is an acronym that stands for: valid accurate consistent uniform unified model

    Read more →
  • Top 10 AI Text-to-image Tools Compared (2026)

    Top 10 AI Text-to-image Tools Compared (2026)

    Comparing the best AI text-to-image tool? An AI text-to-image tool is software that uses machine learning to help you get more done — it lowers the barrier so anyone can produce professional output. Privacy matters too: check whether your data trains the model and whether a no-log or enterprise tier is available. Whether you are a beginner or a pro, the right AI text-to-image tool slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →
  • AI Virtual Assistants: Free vs Paid (2026)

    AI Virtual Assistants: Free vs Paid (2026)

    Trying to pick the best AI virtual assistant? An AI virtual assistant is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI virtual assistant slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • Is an AI Paraphrasing Tool Worth It in 2026?

    Is an AI Paraphrasing Tool Worth It in 2026?

    Curious about the best AI paraphrasing tool? An AI paraphrasing tool is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI paraphrasing tool slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • Accelerated Linear Algebra

    Accelerated Linear Algebra

    XLA (Accelerated Linear Algebra) is an open-source compiler for machine learning developed by the OpenXLA project. XLA is designed to improve the performance of machine learning models by optimizing the computation graphs at a lower level, making it particularly useful for large-scale computations and high-performance machine learning models. Key features of XLA include: Compilation of Computation Graphs: Compiles computation graphs into efficient machine code. Optimization Techniques: Applies operation fusion, memory optimization, and other techniques. Hardware Support: Optimizes models for various hardware, including CPUs, GPUs, and NPUs. Improved Model Execution Time: Aims to reduce machine learning models' execution time for both training and inference. Seamless Integration: Can be used with existing machine learning code with minimal changes. XLA represents a significant step in optimizing machine learning models, providing developers with tools to enhance computational efficiency and performance. == OpenXLA Project == OpenXLA Project is an open-source machine learning compiler and infrastructure initiative intended to provide a common set of tools for compiling and deploying machine learning models across different frameworks and hardware platforms. It provides a modular compilation stack that can be used by major deep learning frameworks like JAX, PyTorch, and TensorFlow. The project focuses on supplying shared components for optimization, portability, and execution across CPUs, GPUs, and specialized accelerators. Its design emphasizes interoperability between frameworks and a standardized set of representations for model computation. == Components == The OpenXLA ecosystem includes several core components: XLA – A deep learning compiler that optimizes computational graphs for multiple hardware targets. PJRT – A runtime interface that allows different back-ends to connect to XLA through a consistent API. StableHLO – A high-level operator set intended to serve as a stable, portable representation for ML models across compilers and frameworks. Shardy – An MLIR-based system for describing and transforming models that run in distributed or multi-device environments. Additional profiling, testing, and integration tools maintained under the OpenXLA organization. == Users and adopters == Several machine learning frameworks can use or interoperate with OpenXLA components, including JAX, TensorFlow, and parts of the PyTorch ecosystem. The project is developed with participation from multiple hardware and software organizations that contribute back-end integrations, testing, or specifications for their devices. This includes Alibaba, Amazon Web Services, AMD, Anyscale, Apple, Arm, Cerebras, Google, Graphcore, Hugging Face, Intel, Meta, NVIDIA and SiFive. == Supported target devices == x86-64 ARM64 NVIDIA GPU AMD GPU Intel GPU Apple GPU Google TPU AWS Trainium, Inferentia Cerebras Graphcore IPU == Governance == OpenXLA is developed as a community project with its work carried out in public repositories, discussion forums, and design meetings. Some components, such as StableHLO, began with stewardship from specific organizations and have outlined plans for more formal and distributed governance models as the project matures. == History == The project was announced in 2022 as an effort to coordinate development of ML compiler technologies across major AI companies, notably: Alibaba, Amazon Web Services, AMD, Anyscale, Apple, Arm, Cerebras, Google, Graphcore, Hugging Face, Intel, Meta, NVIDIA and SiFive.. It consolidated the XLA compiler, introduced StableHLO as a portable operator set, and created a unified structure for additional tools. Development continues within multiple repositories under the OpenXLA umbrella. It was founded by Eugene Burmako, James Rubin, Magnus Hyttsten, Mehdi Amini, Navid Khajouei, and Thea Lamkin from Google's Machine Learning organization.

    Read more →
  • Tagged Deterministic Finite Automaton

    Tagged Deterministic Finite Automaton

    In the automata theory, a tagged deterministic finite automaton (TDFA) is an extension of deterministic finite automaton (DFA). In addition to solving the recognition problem for regular languages, TDFA is also capable of submatch extraction and parsing. While canonical DFA can find out if a string belongs to the language defined by a regular expression, TDFA can also extract substrings that match specific subexpressions. More generally, TDFA can identify positions in the input string that match tagged positions in a regular expression (tags are meta-symbols similar to capturing parentheses, but without the pairing requirement). == History == TDFA were first described by Ville Laurikari in 2000. Prior to that it was unknown whether it is possible to perform submatch extraction in one pass on a deterministic finite-state automaton, so this paper was an important advancement. Laurikari described TDFA construction and gave a proof that the determinization process terminates, however the algorithm did not handle disambiguation correctly. In 2007 Chris Kuklewicz implemented TDFA in a Haskell library Regex-TDFA with POSIX longest-match semantics. Kuklewicz gave an informal description of the algorithm and answered the principal question whether TDFA are capable of POSIX longest-match disambiguation, which was doubted by other researchers. In 2017 Ulya Trafimovich described TDFA with one-symbol lookahead. The use of a lookahead symbol reduces the number of registers and register operations in a TDFA, which makes it faster and often smaller than Laurikari TDFA. Trafimovich called TDFA variants with and without lookahead TDFA(1) and TDFA(0) by analogy with LR parsers LR(1) and LR(0). The algorithm was implemented in the open-source lexer generator RE2C. Trafimovich formalized Kuklewicz disambiguation algorithm. In 2018 Angelo Borsotti worked on an experimental Java implementation of TDFA; it was published later in 2021. In 2019 Borsotti and Trafimovich adapted POSIX disambiguation algorithm by Okui and Suzuki to TDFA. They gave a formal proof of correctness of the new algorithm and showed that it is faster than Kuklewicz algorithm in practice. In 2020 Trafimovich published an article about TDFA implementation in RE2C. In 2022 Borsotti and Trafimovich published a paper with a detailed description of TDFA construction. The paper incorporated their past research and presented multi-pass TDFA that are better suited to just-in-time determinization. They also compared TDFA against other algorithms and provided benchmarks. == Formal definition == TDFA have the same basic structure as ordinary DFA: a finite set of states linked by transitions. In addition to that, TDFA have a fixed set of registers that hold tag values, and register operations on transitions that set or copy register values. The values may be scalar offsets, or offset lists for tags that match repeatedly (the latter can be represented efficiently using a trie structure). There is no one-to-one mapping between tags in a regular expression and registers in a TDFA: a single tag may need many registers, and the same register may hold values of different tags. The following definition is according to Trafimovich and Borsotti. The original definition by Laurikari is slightly different. A tagged deterministic finite automaton F {\displaystyle F} is a tuple ( Σ , T , S , S f , s 0 , R , R f , δ , φ ) {\displaystyle (\Sigma ,T,S,S_{f},s_{0},R,R_{f},\delta ,\varphi )} , where: Σ {\displaystyle \Sigma } is a finite set of symbols (alphabet) T {\displaystyle T} is a finite set of tags S {\displaystyle S} is a finite set of states with initial state s 0 {\displaystyle s_{0}} and a subset of final states S f ⊆ S {\displaystyle S_{f}\subseteq S} R {\displaystyle R} is a finite set of registers with a subset of final registers R f {\displaystyle R_{f}} (one per tag) δ : S × Σ → S × O ∗ {\displaystyle \delta :S\times \Sigma \rightarrow S\times O^{}} is a transition function φ : S f → O ∗ {\displaystyle \varphi :S_{f}\rightarrow O^{}} is a final function, where O {\displaystyle O} is a set of register operations of the following types: set register i {\displaystyle i} to nil or to the current position: i ← v {\displaystyle i\leftarrow v} , where v ∈ { n , p } {\displaystyle v\in \{\mathbf {n} ,\mathbf {p} \}} copy register j {\displaystyle j} to register i {\displaystyle i} : i ← j {\displaystyle i\leftarrow j} copy register j {\displaystyle j} to register i {\displaystyle i} and append history: i ← j ⋅ h {\displaystyle i\leftarrow j\cdot h} , where h {\displaystyle h} is a string over { n , p } {\displaystyle \{\mathbf {n} ,\mathbf {p} \}} === Example === Figure 0 shows an example TDFA for regular expression ( 1 a 2 ) ∗ 3 ( a | 4 b ) 5 b ∗ {\displaystyle (1a2)^{}3(a|4b)5b^{}} with alphabet Σ = { a , b } {\displaystyle \Sigma =\{a,b\}} and a set of tags T = { 1 , 2 , 3 , 4 , 5 } {\displaystyle T=\{1,2,3,4,5\}} that matches strings of the form a … a b … b {\displaystyle a\dots ab\dots b} with at least one symbol. TDFA has four states S = { 0 , 1 , 2 , 3 } {\displaystyle S=\{0,1,2,3\}} three of which are final S f = { 1 , 2 , 3 } {\displaystyle S_{f}=\{1,2,3\}} . The set of registers is R = { r 1 , r 2 , r 3 , r 4 , r 5 } {\displaystyle R=\{r_{1},r_{2},r_{3},r_{4},r_{5}\}} with a subset of final registers R f = { r 1 , r 2 , r 3 , r 4 , r 5 } {\displaystyle R_{f}=\{r_{1},r_{2},r_{3},r_{4},r_{5}\}} where register r i {\displaystyle r_{i}} corresponds to i {\displaystyle i} -th tag. Transitions have operations defined by the δ {\displaystyle \delta } function, and final states have operations defined by the φ {\displaystyle \varphi } function (marked with wide-tipped arrow). For example, to match string a a b {\displaystyle aab} , one starts in state 0, matches the first a {\displaystyle a} and moves to state 1 (setting registers r 1 , r 2 {\displaystyle r_{1},r_{2}} to undefined and r 3 {\displaystyle r_{3}} to the current position 0), matches the second a {\displaystyle a} and loops to state 1 (register values are now r 1 = 0 , r 2 = r 3 = 1 {\displaystyle r_{1}=0,r_{2}=r_{3}=1} ), matches b {\displaystyle b} and moves to state 2 (register values are now r 1 = 1 , r 2 = r 3 = r 4 = 2 {\displaystyle r_{1}=1,r_{2}=r_{3}=r_{4}=2} ), executes the final operations in state 2 (register values are now r 1 = 1 , r 2 = r 3 = r 4 = 2 , r 5 = 3 {\displaystyle r_{1}=1,r_{2}=r_{3}=r_{4}=2,r_{5}=3} ) and finally exits TDFA. == Complexity == Canonical DFA solve the recognition problem in linear time. The same holds for TDFA, since the number of registers and register operations is fixed and depends only on the regular expression, but not on the length of input. The overhead on submatch extraction depends on tag density in a regular expression and nondeterminism degree of each tag (the maximum number of registers needed to track all possible values of the tag in a single TDFA state). On one extreme, if there are no tags, a TDFA is identical to a canonical DFA. On the other extreme, if every subexpression is tagged, a TDFA effectively performs full parsing and has many operations on every transition. In practice for real-world regular expressions with a few submatch groups the overhead is negligible compared to matching with canonical DFA. == TDFA construction == TDFA construction is performed in a few steps. First, a regular expression is converted to a tagged nondeterministic finite automaton (TNFA). Second, a TNFA is converted to a TDFA using a determinization procedure; this step also includes disambiguation that resolves conflicts between ambiguous TNFA paths. After that, a TDFA can optionally go through a number of optimizations that reduce the number of registers and operations, including minimization that reduces the number of states. Algorithms for all steps of TDFA construction with pseudocode are given in the paper by Borsotti and Trafimovich. This section explains TDFA construction on the example of a regular expression a ∗ t b ∗ | a b {\displaystyle a^{}tb^{}|ab} , where t {\displaystyle t} is a tag and { a , b } {\displaystyle \{a,b\}} are alphabet symbols. === Tagged NFA === TNFA is a nondeterministic finite automaton with tagged ε-transitions. It was first described by Laurikari, although similar constructions were known much earlier as Mealy machines and nondeterministic finite-state transducers. TNFA construction is very similar to Thompson's construction: it mirrors the structure of a regular expression. Importantly, TNFA preserves ambiguity in a regular expression: if it is possible to match a string in two different ways, then TNFA for this regular expression has two different accepting paths for this string. TNFA definition by Borsotti and Trafimovich differs from the original one by Laurikari in that TNFA can have negative tags on transitions: they are needed to make the absence of match explicit in cases when there is a bypass for a tagged transition. Figure 1 shows TNFA for the example regu

    Read more →
  • AI Text-to-image Tools: Free vs Paid (2026)

    AI Text-to-image Tools: Free vs Paid (2026)

    Shopping for the best AI text-to-image tool? An AI text-to-image tool is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI text-to-image tool slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.

    Read more →
  • Paul Christiano

    Paul Christiano

    Paul Christiano is an American researcher in the field of artificial intelligence (AI), with a specific focus on AI alignment, which is the subfield of AI safety research that aims to steer AI systems toward human interests. He serves as the Head of Safety for the Center for AI Standards and Innovation inside NIST. He formerly led the language model alignment team at OpenAI and became founder and head of the non-profit Alignment Research Center (ARC), which works on theoretical AI alignment and evaluations of machine learning models. In 2023, Christiano was named as one of the TIME 100 Most Influential People in AI (TIME100 AI). In September 2023, Christiano was appointed to the UK government's Frontier AI Taskforce advisory board. Before working at the Center for AI Standards and Innovation, he was an initial trustee on Anthropic's Long-Term Benefit Trust. == Education == Christiano attended the Harker School in San Jose, California. He competed on the U.S. team and won a silver medal at the 49th International Mathematics Olympiad (IMO) in 2008. In 2012, Christiano graduated from the Massachusetts Institute of Technology (MIT) with a degree in mathematics. At MIT, he researched data structures, quantum cryptography, and combinatorial optimization. He then went on to complete a PhD at the University of California, Berkeley. While at Berkeley, Christiano collaborated with researcher Katja Grace on AI Impacts, co-developing a preliminary methodology for comparing supercomputers to brains, using traversed edges per second (TEPS). He also experimented with putting Carl Shulman's donor lottery theory into practice, raising nearly $50,000 in a pool to be donated to a single charity. == Career == At OpenAI, Christiano co-authored the paper "Deep Reinforcement Learning from Human Preferences" (2017) and other works developing reinforcement learning from human feedback (RLHF). He is considered one of the principal architects of RLHF, which in 2017 was "considered a notable step forward in AI safety research", according to The New York Times. Other works such as "AI safety via debate" (2018) focus on the problem of scalable oversight – supervising AIs in domains where humans would have difficulty judging output quality. Christiano left OpenAI in 2021 to work on more conceptual and theoretical issues in AI alignment and subsequently founded the Alignment Research Center to focus on this area. One subject of study is the problem of eliciting latent knowledge from advanced machine learning models. ARC also develops techniques to identify and test whether an AI model is potentially dangerous. In April 2023, Christiano told The Economist that ARC was considering developing an industry standard for AI safety. As of April 2024, Christiano was listed as the head of AI safety for the US AI Safety Institute at NIST. One month earlier in March 2024, staff members and scientists at the institute threatened to resign upon being informed of Christiano's pending appointment to the role, stating that his ties to the effective altruism movement may jeopardize the AI Safety Institute's objectivity and integrity. === Views on AI risks === He is known for his views on the potential risks of advanced AI. In 2017, Wired magazine stated that Christiano and his colleagues at OpenAI weren't worried about the destruction of the human race by "evil robots", explaining that "[t]hey’re more concerned that, as AI progresses beyond human comprehension, the technology’s behavior may diverge from our intended goals." However, in a widely quoted interview with Business Insider in 2023, Christiano said that there is a “10–20% chance of AI takeover, [with] many [or] most humans dead.” He also conjectured a “50/50 chance of doom shortly after you have AI systems that are human level.” == Personal life == Christiano is married to Ajeya Cotra, a member of METR's technical staff.

    Read more →
  • DreamLab

    DreamLab

    DreamLab was a volunteer computing Android and iOS app launched in 2015 by Imperial College London and the Vodafone Foundation. It was discontinued on 2nd April 2025. == Description == The app helped to research cancer, COVID-19, new drugs and tropical cyclones. To do this, DreamLab accessed part of the device's processing power, with the user's consent, while the owner charged their smartphone, to speed up the calculations of the algorithms from Imperial College London. The aim of the tropical cyclone project was to prepare for climate change risks. Other projects aimed to find existing drugs and food molecules that could help people with COVID-19 and other diseases. The performance of 100,000 smartphones would reach the annual output of all research computers at Imperial College in just three months, with a nightly runtime of six hours. The app was developed in 2015 by the Garvan Institute of Medical Research in Sydney and the Vodafone Foundation. In May 2020, the project had over 490,000 registered users.

    Read more →
  • Tamara Broderick

    Tamara Broderick

    Tamara Ann Broderick is an American computer scientist at the Massachusetts Institute of Technology. She works on machine learning and Bayesian inference. == Education and early career == Broderick is from Parma Heights, Ohio. She attended Laurel School and graduated in 2003. Whilst at high school she took part in the inaugural Massachusetts Institute of Technology Women's Technology Program. She studied mathematics at Princeton University, earning a bachelor's degree in 2007. She was a Marshall scholar, allowing her to pursue graduate research at the University of Cambridge. She was a runner-up in the Association for Women in Mathematics Alice T. Shafer Prize for Excellence in Mathematics. She was co-president of the Princeton Math Club and organised a competition for high school maths teams. She won the Phi Beta Kappa Prize for the highest academic average at Princeton University. During her undergraduate degree, Broderick worked on dark matter haloes with Rachel Mandelbaum. Broderick moved to the United Kingdom for her graduate studies, earning a Master of Advanced Studies for completing Part III of the Mathematical Tripos at the University of Cambridge in 2009. Her Master's thesis looked at the Nomon selection method, improving the efficiency of communications. She returned to America in 2009, joining University of California, Berkeley for her Master's and PhD. Her graduate research was supported by the Berkeley Fellowship and a National Science Foundation Fellowship. Her PhD thesis Clusters and features from combinatorial stochastic processes looked at clustering and speeding up the analysis of large, streaming data sets. In 2013 she was selected for the Berkeley EECS Rising Stars conference. == Research and career == Broderick joined Massachusetts Institute of Technology as an assistant professor in 2015. She is interested in Bayesian statistics and graphical models. She was the recipient of a Google Faculty Research Grant and International Society for Bayesian Analysis Lifetime Members Junior Researcher Award. She was awarded an Army Research Office young investigator program award to investigate machine-learning to quantify uncertainty in data analysis. Broderick is also Alfred P. Sloan Foundation scholar. === Academic service === In 2018, Broderick spoke at the Harvard University Institute for Applied Computational Science Women in Data Science conference. She spoke about Bayesian inference at the 2018 International Conference on Machine Learning. She led a three-day Masterclass on machine learning at University College London in June 2018. Broderick is a scientific advisor for AI.Reverie and WiML (Women in Machine Learning). She has developed a high-school level introduction to machine learning with the Women's Technology Program (WTP). Software she has developed is available on her website. === Awards and honors === Broderick was awarded the Evelyn Fix Memorial Medal and Citation and the International Society for Bayesian Analysis Savage Award for her doctoral thesis. She was awarded a National Science Foundation CAREER Award to scale her machine learning techniques. She was a 2021 Leadership Academy winner of the Committee of Presidents of Statistical Societies.

    Read more →
  • AI Text-to-video Tools Reviews: What Actually Works in 2026

    AI Text-to-video Tools Reviews: What Actually Works in 2026

    Looking for the best AI text-to-video tool? An AI text-to-video tool is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI text-to-video tool slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • Timnit Gebru

    Timnit Gebru

    Timnit W. Gebru (Amharic and Tigrinya: ትምኒት ገብሩ; 1982/1983) is an Eritrean Ethiopian-born computer scientist who works in the fields of artificial intelligence (AI), algorithmic bias and data mining. She is a co-founder of Black in AI, an advocacy group that has pushed for more Black roles in AI development and research. She is the founder of the Distributed Artificial Intelligence Research Institute (DAIR). In December 2020, public controversy erupted over the circumstances surrounding Gebru's departure from Google, where she was technical co-lead of the Ethical Artificial Intelligence Team. Gebru had coauthored a paper on the risks of large language models (LLMs) acting as stochastic parrots, and submitted it for publication. According to Jeff Dean, head of Google AI, the paper was submitted without waiting for Google's internal review, which then asserted that it ignored too much relevant research. Google management requested that Gebru either withdraw the paper or remove the names of all the authors employed by Google. Gebru requested the identity and feedback of every reviewer, and stated that if Google refused, she would talk to her manager about "a last date". Google terminated her employment immediately, stating that they were accepting her resignation. Gebru maintained that she had not formally offered to resign, and only threatened to. Gebru has been widely recognized for her expertise in the ethics of artificial intelligence. She was named one of the World's 50 Greatest Leaders by Fortune and one of Nature's ten people who shaped science in 2021, and in 2022, one of Time's most influential people. == Early life and education == Gebru was raised in Addis Ababa, Ethiopia. Her father, an electrical engineer with a Doctor of Philosophy (PhD), died when she was five years old, and she was raised by her mother, an economist. Both her parents are from Eritrea. When Gebru was 15, during the Eritrean–Ethiopian War, she fled Ethiopia after some of her family were deported to Eritrea and compelled to fight in the war. She was initially denied a U.S. visa and briefly lived in Ireland, but she eventually received political asylum in the U.S., an experience she said was "miserable". Gebru settled in Somerville, Massachusetts to attend high school, where she says she immediately started to experience racial discrimination, with some teachers refusing to allow her to take certain Advanced Placement courses, despite being a high-achiever. After she completed high school, an encounter with the police set Gebru on a course toward a focus on ethics in technology. A friend of hers, a Black woman, was assaulted in a bar, and Gebru called the police to report it. She says that instead of filing the assault report, her friend was arrested and remanded to a cell. Gebru called it a pivotal moment and a "blatant example of systemic racism." In 2001, Gebru was accepted at Stanford University. There, she earned her Bachelor of Science and Master of Science degrees in electrical engineering and her PhD in computer vision in 2017. Gebru was advised during her PhD program by Fei-Fei Li. During the 2008 United States presidential election, Gebru canvassed in support of Barack Obama. Gebru presented her doctoral research at the 2017 LDV Capital Vision Summit competition, where computer vision scientists present their work to members of industry and venture capitalists. Gebru won the competition, starting a series of collaborations with other entrepreneurs and investors. Both during her PhD program in 2016 and in 2018, Gebru returned to Ethiopia with Jelani Nelson's programming campaign, AddisCoder. While working on her PhD, Gebru authored a paper that was never published about her concern over the future of AI. She wrote of the dangers of the lack of diversity in the field, centered on her experiences with the police and on a ProPublica investigation into predictive policing, which revealed a projection of human biases in machine learning. In the paper, she scathed the "boy's club culture", reflecting on her experiences at conference gatherings of drunken male attendees sexually harassing her, and criticized the hero worship of the field's celebrities. == Career == === 2004–2013: Software development at Apple === Gebru joined Apple as an intern while at Stanford, working in their hardware division making circuitry for audio components, and was offered a full-time position the following year. Of her work as an audio engineer, her manager told Wired she was "fearless", and well-liked by her colleagues. During her tenure at Apple, Gebru became more interested in building software, namely computer vision that could detect human figures. She went on to develop signal processing algorithms for the first iPad. At the time, she said she did not consider the potential use for surveillance, saying "I just found it technically interesting." Long after leaving the company, during the #AppleToo movement in the summer of 2021, which was led by Apple engineer Cher Scarlett, who consulted with Gebru, Gebru revealed she experienced "so many egregious things" and "always wondered how they manage[d] to get out of the spotlight." She said that accountability at Apple was long overdue, and warned they could not continue to fly under the radar for much longer. Gebru also criticized the way the media covers Apple and other tech giants, saying that the press helps shield such companies from public scrutiny. === 2013–2017: Research at Stanford and Microsoft === In 2013, Gebru joined Fei-Fei Li's lab at Stanford, where she combined deep learning with Google Street View to estimate the demographics of United States neighbourhoods, showing that socioeconomic attributes such as voting patterns, income, race, and education can be inferred from observations of cars. In 2015, Gebru attended the field's top conference, Neural Information Processing Systems (NIPS), in Montreal, Canada. Out of 3,700 attendees, she noted she was one of only a few Black researchers. When she attended again the following year, she kept a tally and noted that there were only five Black men and that she was the only Black woman out of 8,500 delegates. Together with her colleague Rediet Abebe, Gebru founded Black in AI, a community of Black researchers working in artificial intelligence that aims to increase the presence, visibility, and well-being of Black professionals and leaders within the field. In the summer of 2017, Gebru joined Microsoft as a postdoctoral researcher in the Fairness, Accountability, Transparency, and Ethics in AI (FATE) lab. In 2017, Gebru spoke at the Fairness and Transparency conference, where MIT Technology Review interviewed her about biases that exist in AI systems and how adding diversity in AI teams can fix that issue. In her interview with Jackie Snow, Snow asked Gebru, "How does the lack of diversity distort artificial intelligence and specifically computer vision?" and Gebru pointed out that there are biases that exist in the software developers. While at Microsoft, Gebru co-authored a research paper called Gender Shades, which became the namesake of a project of a broader Massachusetts Institute of Technology project led by co-author Joy Buolamwini. The pair investigated facial recognition software, finding that in one particular implementation Black women were 35% less likely to be recognized than White men. === 2018–2020: Artificial intelligence ethics at Google === Gebru joined Google in 2018, where she co-led a team on the ethics of artificial intelligence with Margaret Mitchell. She studied the implications of artificial intelligence, looking to improve the ability of technology to do social good. In 2019, Gebru and other artificial intelligence researchers "signed a letter calling on Amazon to stop selling its facial-recognition technology to law enforcement agencies because it is biased against women and people of color", citing a study that was conducted by MIT researchers showing that Amazon's facial recognition system had more trouble identifying darker-skinned females than any other technology company's facial recognition software. In a New York Times interview, Gebru has further expressed that she believes facial recognition is too dangerous to be used for law enforcement and security purposes at present. === Exit from Google === In 2020 Gebru and five co-authors wrote a paper titled "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜". The paper examined risks of very large language models, including their environmental footprint, financial costs, the inscrutability of large models, the potential for LLMs to display prejudice against certain groups, the inability of LLMs to understand the language they process, and the use of LLMs to spread disinformation. In December 2020, her employment with Google ended after Google management asked her to either withdraw the paper before publication, or remove the names of all the Google employees from

    Read more →
  • Text normalization

    Text normalization

    Text normalization is the process of transforming text into a single canonical form that it might not have had before. Normalizing text before storing or processing it allows for separation of concerns, since input is guaranteed to be consistent before operations are performed on it. Text normalization requires being aware of what type of text is to be normalized and how it is to be processed afterwards; there is no all-purpose normalization procedure. == Applications == Text normalization is frequently used when converting text to speech. Numbers, dates, acronyms, and abbreviations are non-standard "words" that need to be pronounced differently depending on context. For example: "$200" would be pronounced as "two hundred dollars" in English, but as "lua selau tālā" in Samoan. "vi" could be pronounced as "vie," "vee," or "the sixth" depending on the surrounding words. Text can also be normalized for storing and searching in a database. For instance, if a search for "resume" is to match the word "résumé," then the text would be normalized by removing diacritical marks; and if "john" is to match "John", the text would be converted to a single case. To prepare text for searching, it might also be stemmed (e.g. converting "flew" and "flying" both into "fly"), canonicalized (e.g. consistently using American or British English spelling), or have stop words removed. == Techniques == For simple, context-independent normalization, such as removing non-alphanumeric characters or diacritical marks, regular expressions would suffice. For example, the sed script sed ‑e "s/\s+/ /g" inputfile would normalize runs of whitespace characters into a single space. More complex normalization requires correspondingly complicated algorithms, including domain knowledge of the language and vocabulary being normalized. Among other approaches, text normalization has been modeled as a problem of tokenizing and tagging streams of text and as a special case of machine translation. == Textual scholarship == In the field of textual scholarship and the editing of historic texts, the term "normalization" implies a degree of modernization and standardization – for example in the extension of scribal abbreviations and the transliteration of the archaic glyphs typically found in manuscript and early printed sources. A normalized edition is therefore distinguished from a diplomatic edition (or semi-diplomatic edition), in which some attempt is made to preserve these features. The aim is to strike an appropriate balance between, on the one hand, rigorous fidelity to the source text (including, for example, the preservation of enigmatic and ambiguous elements); and, on the other, producing a new text that will be comprehensible and accessible to the modern reader. The extent of normalization is therefore at the discretion of the editor, and will vary. Some editors, for example, choose to modernize archaic spellings and punctuation, but others do not. An edition of a text might be normalized based on internal criteria, where orthography is standardized according to the language of the original, or external criteria, where the norms of a different time period are applied. For an example of the latter, a published edition of a medieval Icelandic manuscript might be normalized to the conventions of modern Icelandic, or it might be normalized to Classical Old Icelandic. Standards of normalization vary based on language of the edition as well as the specific conventions of the publisher.

    Read more →
  • Best AI Blog Writers in 2026

    Best AI Blog Writers in 2026

    Trying to pick the best AI blog writer? An AI blog writer is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI blog writer slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • Top 10 AI Customer-support Bots Compared (2026)

    Top 10 AI Customer-support Bots Compared (2026)

    Trying to pick the best AI customer-support bot? An AI customer-support bot is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI customer-support bot slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →