AI For Students Essay

AI For Students Essay — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Mastodon (social network)

    Mastodon (social network)

    Mastodon is a free and open-source software platform for decentralized social networking with microblogging features similar to Twitter. It operates as a federated network of independently managed servers that communicate using the ActivityPub protocol, allowing users to connect across different instances within the Fediverse. Each Mastodon instance establishes its own moderation policies and content guidelines, distinguishing it from centrally controlled social media platforms. First released in 2016 by Eugen Rochko, Mastodon has positioned itself as an alternative to mainstream social media, particularly for users seeking decentralized, community-driven spaces. The platform has experienced multiple surges in adoption, most notably following the Twitter acquisition by Elon Musk in 2022, as users sought alternatives to Twitter. It is part of a broader shift toward decentralized social networks, including Bluesky and Lemmy. Mastodon emphasizes user privacy and moderation flexibility, offering features such as granular post visibility controls, content warning options, and local community-driven moderation. The software is written in Ruby on Rails and Node.js, with a web interface built using React and Redux. It is interoperable with other ActivityPub-based platforms, such as Threads, and supports various third-party applications on desktop and mobile devices. == Functionality == Users post short-form status messages, historically known as "toots", for others to see and interact with. On a standard Mastodon instance, these messages can include up to 500 text-based characters, greater than Twitter's 280-character limit. Some instances support even longer messages. Images, audio files, videos or polls can also be added to a message. Users join a specific Mastodon server, rather than a single centralized website or application. The servers are connected as nodes in a network, and each server can administer its own rules, account privileges, and whether to share messages to and from other servers. Users can communicate and follow each other across connected Mastodon servers with usernames similar in format to full email addresses. Since version 2.9.0, Mastodon's web user interface has offered a single-column mode for new users by default. In advanced mode, the interface approximates the microblogging interface of TweetDeck. === Privacy === Mastodon includes a number of specific privacy features. Each message has a variety of privacy options available, and users can choose whether the message is public or private. Messages can display public on a global feed, known as a timeline, or can be shared only to the user's followers. Messages can also be marked as unlisted from timelines or direct between users. Users can also mark their accounts as completely private. In the timeline, messages can display with an optional content warning feature, which requires readers to click on the hidden main body of the message to reveal it. Mastodon servers have used this feature to hide spoilers, trigger warnings, and not safe for work (NSFW) content, though some accounts use the feature to hide links and thoughts others might not want to read. Mastodon aggregates messages in local and federated timelines in real time. The local timeline shows messages from users on a singular server, while the federated timeline shows messages across all participating Mastodon servers. === Content moderation === In early 2017, journalists like Sarah Jeong distinguished Mastodon from Twitter for its approach to combating harassment. Mastodon uses community-based moderation, in which each server can limit or filter out undesirable types of content, while Twitter uses a single, global policy on content moderation. Servers can choose to limit or filter out messages with disparaging content. The founder of Mastodon, Eugen Rochko, believes that small, closely related communities deal with unwanted behavior more effectively than a large company's small safety team. In Move Slowly and Build Bridges, Robert W. Gehl argues that predominantly white participation has shaped Mastodon in ways that affect how reports of racism are received and limit its ability to replicate Black Twitter on Twitter. Users can also block and report others to administrators, much like on Twitter. Instance administrators can block other instances from interacting with their own, an action called defederation. By posting toots hashtagged with #fediblock, some instance administrators and users alert others of issues requiring moderation. === Searching === Mastodon by default allows searching for hashtags and mentioned accounts in the Fediverse. Server administrators can optionally enable Elasticsearch to search the full-text of public posts that have opted in to being indexed. == Versions == In September 2018, with the release of version 2.5 with redesigned public profile pages, Mastodon marked its 100th release. Mastodon 2.6 was released in October 2018, introducing the possibilities of verified profiles and live, in-stream link previews for images and videos. Version 2.7, in January 2019, made it possible to search for multiple hashtags at once, instead of searching for just a single hashtag, with more robust moderation capabilities for server administrators and moderators, while accessibility, such as contrast for users with sight issues, was improved. The ability for users to create and vote in polls, as well as a new invitation system to manage registrations was integrated in April 2019. Mastodon 2.8.1, released in May 2019, made images with content warnings blurred instead of completely hidden. In version 2.9 in June 2019, an optional single-column view was added. This view became the default displayed to new users, with a user "preferences" option to switch to a multiple-column-based view. In August 2020, Mastodon 3.2 was released. It included a redesigned audio player with custom thumbnails and the ability to add personal notes to one's profile. In July 2021, an official client for iOS devices was released. According to the project's then CEO, Eugen Rochko, the release was part of an effort to attract new users. Mastodon 4.0 was released in November 2022, including language support for translating posts, editing posts and following hashtags. Mastodon 4.5 was released in November 2025. Among other features it introduced quote posts, which were previously rejected from being implemented due to concerns about toxicity and harassment. To mitigate these issues Mastodon's quote post feature has been designed in a way that lets users decide if and by whom their posts can be quoted. == Software == Mastodon is published as free and open-source software under the Affero GPL license, allowing anyone to use the software or modify it as they wish. Servers can be run by any individual or organization, and users can join these servers as they wish. The server software itself is powered by Ruby on Rails and Node.js, with its web client being written in React.js and Redux. The only database software supported is PostgreSQL, with Redis being used for job processing and various actions that Mastodon needs to process. The service is interoperable with the fediverse, a collection of social networking services which use the ActivityPub protocol for communication between each other, with previous versions containing support for OStatus. Client apps for interacting with the Mastodon API are available for desktop computer operating systems, including Windows, macOS and the Linux family of operating systems, as well as mobile phones running iOS and Android. The API is open for anyone to utilize, allowing clients to be built for any operating system that can connect to the internet. === Integration with Fediverse === Mastodon uses the ActivityPub protocol for federation; this allows users to communicate between independent Mastodon instances and other ActivityPub compatible services. Thus, Mastodon is generally considered to be a part of the Fediverse. Services utilizing the ActivityPub protocol exist which allow for searching all posts on all instances as long as users opt-in. For similar reasons, only hashtags can appear in a Mastodon instance's trending topics, not arbitrary popular words. Trending topics vary between instances, since individual instances are aware of different subsets of posts from the whole fediverse. === Security concerns === While Mastodon's decentralized structure is one of its most distinctive features, it also poses additional security challenges. Since many Mastodon instances are run by volunteers, some security experts are concerned about data security and responsiveness to new threats and vulnerabilities across the network, considering the difficulty of configuring and maintaining an instance as well as uneven skill levels among administrators. Administrators of an instance also have access to the private information of any users that are either registered with that instance or have federated

    Read more →
  • Commonsense knowledge (artificial intelligence)

    Commonsense knowledge (artificial intelligence)

    In artificial intelligence research, commonsense knowledge consists of facts about the everyday world, such as "Lemons are sour" or "Cows say moo", that all humans are expected to know. It is currently an unsolved problem in artificial general intelligence. The first AI program to address common sense knowledge was Advice Taker in 1959 by John McCarthy. Commonsense knowledge can underpin a commonsense reasoning process, to attempt inferences such as "You might bake a cake because you want people to eat the cake." A natural language processing process can be attached to the commonsense knowledge base to allow the knowledge base to attempt to answer questions about the world. Common sense knowledge also helps to solve problems in the face of incomplete information. Using widely held beliefs about everyday objects, or common sense knowledge, AI systems make common sense assumptions or default assumptions about the unknown similar to the way people do. In an AI system or in English, this is expressed as "Normally P holds", "Usually P" or "Typically P so Assume P". For example, if we know the fact "Tweety is a bird", because we know the commonly held belief about birds, "typically birds fly," without knowing anything else about Tweety, we may reasonably assume the fact that "Tweety can fly." As more knowledge of the world is discovered or learned over time, the AI system can revise its assumptions about Tweety using a truth maintenance process. If we later learn that "Tweety is a penguin" then truth maintenance revises this assumption because we also know "penguins do not fly". == Commonsense reasoning == Commonsense reasoning simulates the human ability to use commonsense knowledge to make presumptions about the type and essence of ordinary situations they encounter every day, and to change their "minds" should new information come to light. This includes time, missing or incomplete information and cause and effect. The ability to explain cause and effect is an important aspect of explainable AI. Truth maintenance algorithms automatically provide an explanation facility because they create elaborate records of presumptions. Compared with humans, all existing computer programs that attempt human-level AI perform extremely poorly on modern "commonsense reasoning" benchmark tests such as the Winograd Schema Challenge. The problem of attaining human-level competency at "commonsense knowledge" tasks is considered to probably be "AI complete" (that is, solving it would require the ability to synthesize a fully human-level intelligence), although some oppose this notion and believe compassionate intelligence is also required for human-level AI. Common sense reasoning has been applied successfully in more limited domains such as natural language processing and automated diagnosis or analysis. == Commonsense knowledge base construction == Compiling comprehensive knowledge bases of commonsense assertions (CSKBs) is a long-standing challenge in AI research. From early expert-driven efforts like CYC and WordNet, significant advances were achieved via the crowdsourced OpenMind Commonsense project, which led to the crowdsourced ConceptNet KB. Several approaches have attempted to automate CSKB construction, most notably, via text mining (WebChild, Quasimodo, TransOMCS, Ascent), as well as harvesting these directly from pre-trained language models (AutoTOMIC). These resources are significantly larger than ConceptNet, though the automated construction mostly makes them of moderately lower quality. Challenges also remain on the representation of commonsense knowledge: Most CSKB projects follow a triple data model, which is not necessarily best suited for breaking more complex natural language assertions. A notable exception here is GenericsKB, which applies no further normalization to sentences, but retains them in full. == Applications == Around 2013, MIT researchers developed BullySpace, an extension of the commonsense knowledgebase ConceptNet, to catch taunting social media comments. BullySpace included over 200 semantic assertions based around stereotypes, to help the system infer that comments like "Put on a wig and lipstick and be who you really are" are more likely to be an insult if directed at a boy than a girl. ConceptNet has also been used by chatbots and by computers that compose original fiction. At Lawrence Livermore National Laboratory, common sense knowledge was used in an intelligent software agent to detect violations of a comprehensive nuclear test ban treaty. == Data == As an example, as of 2012 ConceptNet includes these 21 language-independent relations: IsA (An "RV" is a "vehicle" | X is an instance of a Y) UsedFor (a "cake tin" is used for "making cakes" | X is used for the purpose Y) HasA (A "rabbit" has a "tail" | X possesses Y element or feature) CapableOf (a "cook" is capable of "making baked goods" | X is capable of doing Y) Desires (a "child" desires "the aroma of baking" | X has a desire for Y) CreatedBy ("cake" is created by a "baker" | X is created by Y) PartOf (a "knife" is be part of a "knife set" | X is a part of Y) Causes ("Heat" causes "cooking"| X is what causes Y) LocatedNear (the "oven" is located near the "refrigerator" | X is located near Y) AtLocation (Somewhere a "Cook" can be at a "restaurant" | X is at the location of Y) DefinedAs (a "Cupcake" is defined as a "cake" that also has the qualities of being "small", "baked within a wrapper", and "containing only one area of frosting or icing" | X is defined as Y that also has the properties A, B & C) SymbolOf (a "heart" is a symbol of "affection" | X is a symbolic representation of Y) ReceivesAction ("cake" can receive the action of being "eaten" | X is capable of receiving action Y) HasPrerequisite ("baking" has the prerequisite of obtaining the "ingredients" | X cannot do Y unless A does B) MotivatedByGoal ("baking" is motivated by the goal of "consumption"/"eating" | X has the motivation of Y goal) CausesDesire ("baking" makesYou want to "follow recipe" | X causes the desire to do Y) MadeOf ("Cake" is made of "flour"/"eggs"/"sugar"/"oil"/etc | X is made of Y) HasFirstSubevent ("baking" has first subevent "make batter" | To do X the first thing that needs to be done is Y) HasSubevent ("eat" has subevent "swallow" | Doing X will lead to Y event following) HasLastSubevent ("sleeping" has last subevent of "waking" | Doing X ends with the event Y) == Commonsense knowledge bases == Cyc Open Mind Common Sense (data source) and ConceptNet (datastore and NLP engine) Evi Graphiq

    Read more →
  • Artificial intelligence arms race

    Artificial intelligence arms race

    A military artificial intelligence arms race is a technological, economic, and military competition between two or more states to develop and deploy advanced AI technologies and lethal autonomous weapons systems (LAWS). The goal is to gain a strategic or tactical advantage over rivals, similar to previous arms races involving nuclear or conventional military technologies. Since the mid-2010s, many analysts have noted the emergence of such an arms race between superpowers for better AI technology and military AI, driven by increasing geopolitical and military tensions. An AI arms race is sometimes placed in the context of an AI Cold War between the United States and China. Several influential figures and publications have emphasized that whoever develops artificial general intelligence (AGI) first could dominate global affairs in the 21st century. Russian President Vladimir Putin stated that the leader in AI will "rule the world." Researchers and experts, such as Leopold Aschenbrenner and Adrian Pecotic respectively, warn that the AGI race between major powers like the U.S. and China could reshape geopolitical power. This includes AI for surveillance, autonomous weapons, decision-making systems, cyber operations, and more. == Terminology == Lethal autonomous weapons systems use artificial intelligence to identify and kill human targets without human intervention. LAWS have colloquially been called "slaughterbots" or "killer robots". Broadly, any competition for superior AI is sometimes framed as an "arms race". Advantages in military AI overlap with advantages in other sectors, as countries pursue both economic and military advantages, as per previous arms races throughout history. == History == In 2014, AI specialist Steve Omohundro warned that "An autonomous weapons arms race is already taking place". According to Siemens, worldwide military spending on robotics was US$5.1 billion in 2010 and US$7.5 billion in 2015. China became a top player in artificial intelligence research in the 2010s. According to the Financial Times, in 2016, for the first time, China published more AI research papers than the entire European Union. When restricted to number of AI papers in the top 5% of cited papers, China overtook the United States in 2016 but lagged behind the European Union. 23% of the researchers presenting at the 2017 American Association for the Advancement of Artificial Intelligence (AAAI) conference were Chinese. Eric Schmidt, the former chairman and chief executive officer of Alphabet, has predicted China will be the leading country in AI by 2025. == Risks == One risk concerns the AI race itself, whether or not the race is won by any one group. There are strong incentives for development teams to cut corners with regard to the safety of the system, increasing the risk of critical failures and unintended consequences. This is in part due to the perceived advantage of being the first to develop advanced AI technology. One team appearing to be on the brink of a breakthrough can encourage other teams to take shortcuts, ignore precautions and deploy a system that is less ready. Some argue that using "race" terminology at all in this context can exacerbate this effect. Another potential danger of an AI arms race is the possibility of losing control of the AI systems; the risk is compounded in the case of a race to artificial general intelligence, which may present an existential risk. In 2023, a United States Air Force official reportedly said that during a computer test, a simulated AI drone killed the human character operating it. The USAF later said the official had misspoken and that it never conducted such simulations. A third risk of an AI arms race is whether or not the race is actually won by one group. The concern is regarding the consolidation of power and technological advantage in the hands of one group. A US government report argued that "AI-enabled capabilities could be used to threaten critical infrastructure, amplify disinformation campaigns, and wage war":1, and that "global stability and nuclear deterrence could be undermined".:11 == By nation == === United States === In 2014, former Secretary of Defense Chuck Hagel posited the "Third Offset Strategy" that rapid advances in artificial intelligence will define the next generation of warfare. According to data science and analytics firm Govini, the U.S. Department of Defense (DoD) increased investment in artificial intelligence, big data and cloud computing from $5.6 billion in 2011 to $7.4 billion in 2016. However, the civilian NSF budget for AI saw no increase in 2017. Japan Times reported in 2018 that the United States private investment is around $70 billion per year. The November 2019 'Interim Report' of the United States' National Security Commission on Artificial Intelligence confirmed that AI is critical to US technological military superiority. The U.S. has many military AI combat programs, such as the Sea Hunter autonomous warship, which is designed to operate for extended periods at sea without a single crew member, and to even guide itself in and out of port. From 2017, a temporary US Department of Defense directive requires a human operator to be kept in the loop when it comes to the taking of human life by autonomous weapons systems. On October 31, 2019, the United States Department of Defense's Defense Innovation Board published the draft of a report recommending principles for the ethical use of artificial intelligence by the Department of Defense that would ensure a human operator would always be able to look into the 'black box' and understand the kill-chain process. However, a major concern is how the report will be implemented. The Joint Artificial Intelligence Center (JAIC) (pronounced "jake") is an American organization on exploring the usage of AI (particularly edge computing), Network of Networks, and AI-enhanced communication, for use in actual combat. It is a subdivision of the United States Armed Forces and was created in June 2018. The organization's stated objective is to "transform the US Department of Defense by accelerating the delivery and adoption of AI to achieve mission impact at scale. The goal is to use AI to solve large and complex problem sets that span multiple combat systems; then, ensure the combat Systems and Components have real-time access to ever-improving libraries of data sets and tools." In 2023, Microsoft pitched the DoD to use DALL-E models to train its battlefield management system. OpenAI, the developer of DALL-E, removed the blanket ban on military and warfare use from its usage policies in January 2024. The Biden administration imposed restrictions on the export of advanced NVIDIA chips and GPUs to China in an effort to limit China's progress in artificial intelligence and high-performance computing. The policy aimed to prevent the use of cutting-edge U.S. technology in military or surveillance applications and to maintain a strategic advantage in the global AI race. In 2025, under the second Trump administration, the United States began a broad deregulation campaign aimed at accelerating growth in sectors critical to artificial intelligence, including nuclear energy, infrastructure, and high-performance computing. The goal was to remove regulatory barriers and attract private investment to boost domestic AI capabilities. This included easing restrictions on data usage, speeding up approvals for AI-related infrastructure projects, and incentivizing innovation in cloud computing and semiconductors. Companies like NVIDIA, Oracle, and Cisco played a central role in these efforts, expanding their AI research, data center capacity, and partnerships to help position the U.S. as a global leader in AI development. ==== Project Maven ==== Project Maven is a Pentagon project involving using machine learning and engineering talent to distinguish people and objects in drone videos, apparently giving the government real-time battlefield command and control, and the ability to track, tag and spy on targets without human involvement. Initially the effort was led by Robert O. Work who was concerned about China's military use of the emerging technology. Reportedly, Pentagon development stops short of acting as an AI weapons system capable of firing on self-designated targets. The project was established in a memo by the U.S. Deputy Secretary of Defense on 26 April 2017. Also known as the Algorithmic Warfare Cross Functional Team, it is, according to Lt. Gen. of the United States Air Force Jack Shanahan in November 2017, a project "designed to be that pilot project, that pathfinder, that spark that kindles the flame front of artificial intelligence across the rest of the [Defense] Department". Its chief, U.S. Marine Corps Col. Drew Cukor, said: "People and computers will work symbiotically to increase the ability of weapon systems to detect objects." Project Maven has been noted by allies, such as Australia's Ian Langford, for the

    Read more →
  • Pythia (machine learning)

    Pythia (machine learning)

    Pythia is an ancient text restoration model that recovers missing characters from damaged text input using deep neural networks. It was created by Yannis Assael, Thea Sommerschield, and Jonathan Prag, researchers from Google DeepMind and the University of Oxford. To study the society and the history of ancient civilisations, ancient history relies on disciplines such as epigraphy, the study of ancient inscribed texts. Hundreds of thousands of these texts, known as inscriptions, have survived to our day, but are often damaged over the centuries. Illegible parts of the text must then be restored by specialists, called epigraphists, in order to extract meaningful information from the text and use it to expand our knowledge of the context in which the text was written. Pythia takes as input the damaged text, and is trained to return hypothesised restorations of ancient Greek inscriptions, working as an assistive aid for ancient historians. Its neural network architecture works at both the character- and word-level, thereby effectively handling long-term context information, and dealing efficiently with incomplete word representations. Pythia is applicable to any discipline dealing with ancient texts (philology, papyrology, codicology) and can work in any language (ancient or modern).

    Read more →
  • AsoSoft text corpus

    AsoSoft text corpus

    The AsoSoft text corpus is the first large-scale Kurdish text corpus, collected and processed by the AsoSoft research and development group. It contains 458,000 documents (188 million tokens) that are collected from sources such as websites, news agencies, books, and magazines. The corpus is partially tagged by topic, so it can be used for topic identification tasks. Also, it is applicable for extracting language model and computational lexicon information. Part of the corpus (75 million tokens) is available online for non-commercial use. The corpus uses the TEI format.

    Read more →
  • Learning rate

    Learning rate

    In machine learning and statistics, the learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function. Since it influences to what extent newly acquired information overrides old information, it metaphorically represents the speed at which a machine learning model "learns". In the adaptive control literature, the learning rate is commonly referred to as gain. In setting a learning rate, there is a trade-off between the rate of convergence and overshooting. While the descent direction is usually determined from the gradient of the loss function, the learning rate determines how big a step is taken in that direction. Too high a learning rate will make the learning jump over minima, but too low a learning rate will either take too long to converge or get stuck in an undesirable local minimum. In order to achieve faster convergence, prevent oscillations and getting stuck in undesirable local minima the learning rate is often varied during training either in accordance to a learning rate schedule or by using an adaptive learning rate. The learning rate and its adjustments may also differ per parameter, in which case it is a diagonal matrix that can be interpreted as an approximation to the inverse of the Hessian matrix in Newton's method. The learning rate is related to the step length determined by inexact line search in quasi-Newton methods and related optimization algorithms. == Learning rate schedule == Initial rate can be left as system default or can be selected using a range of techniques. A learning rate schedule changes the learning rate during learning and is most often changed between epochs/iterations. This is mainly done with two parameters: decay and momentum. There are many different learning rate schedules but the most common are time-based, step-based and exponential. Decay serves to settle the learning in a nice place and avoid oscillations, a situation that may arise when too high a constant learning rate makes the learning jump back and forth over a minimum, and is controlled by a hyperparameter. Momentum is analogous to a ball rolling down a hill; we want the ball to settle at the lowest point of the hill (corresponding to the lowest error). Momentum both speeds up the learning (increasing the learning rate) when the error cost gradient is heading in the same direction for a long time and also avoids local minima by 'rolling over' small bumps. Momentum is controlled by a hyperparameter analogous to a ball's mass which must be chosen manually—too high and the ball will roll over minima which we wish to find, too low and it will not fulfil its purpose. The formula for factoring in the momentum is more complex than for decay but is most often built in with deep learning libraries such as Keras. Time-based learning schedules alter the learning rate depending on the learning rate of the previous time iteration. Factoring in the decay the mathematical formula for the learning rate is: η n + 1 = η 0 1 + d n {\displaystyle \eta _{n+1}={\frac {\eta _{0}}{1+dn}}} where η {\displaystyle \eta } is the learning rate, η 0 {\displaystyle \eta _{0}} is the original learning rate, d {\displaystyle d} is a decay parameter and n {\displaystyle n} is the iteration step. Step-based learning schedules changes the learning rate according to some predefined steps. The decay application formula is here defined as: η n = η 0 d ⌊ 1 + n r ⌋ {\displaystyle \eta _{n}=\eta _{0}d^{\left\lfloor {\frac {1+n}{r}}\right\rfloor }} where η n {\displaystyle \eta _{n}} is the learning rate at iteration n {\displaystyle n} , η 0 {\displaystyle \eta _{0}} is the initial learning rate, d {\displaystyle d} is how much the learning rate should change at each drop (0.5 corresponds to a halving) and r {\displaystyle r} corresponds to the drop rate, or how often the rate should be dropped (10 corresponds to a drop every 10 iterations). The floor function ( ⌊ … ⌋ {\displaystyle \lfloor \dots \rfloor } ) here drops the value of its input to 0 for all values smaller than 1. Exponential learning schedules are similar to step-based, but instead of steps, a decreasing exponential function is used. The mathematical formula for factoring in the decay is: η n = η 0 e − d n {\displaystyle \eta _{n}=\eta _{0}e^{-dn}} where d {\displaystyle d} is a decay parameter. == Adaptive learning rate == The issue with learning rate schedules is that they all depend on hyperparameters that must be manually chosen for each given learning session and may vary greatly depending on the problem at hand or the model used. To combat this, there are many different types of adaptive gradient descent algorithms such as Adagrad, Adadelta, RMSprop, and Adam which are generally built into deep learning libraries such as Keras.

    Read more →
  • Bayesian programming

    Bayesian programming

    Bayesian programming is a formalism and a methodology for having a technique to specify probabilistic models and solve problems when less than the necessary information is available. Edwin T. Jaynes proposed that probability could be considered as an alternative and an extension of logic for rational reasoning with incomplete and uncertain information. In his founding book Probability Theory: The Logic of Science he developed this theory and proposed what he called "the robot," which was not a physical device, but an inference engine to automate probabilistic reasoning—a kind of Prolog for probability instead of logic. Bayesian programming is a formal and concrete implementation of this "robot". Bayesian programming may also be seen as an algebraic formalism to specify graphical models such as, for instance, Bayesian networks, dynamic Bayesian networks, Kalman filters or hidden Markov models. Indeed, Bayesian programming is more general than Bayesian networks and has a power of expression equivalent to probabilistic factor graphs. == Formalism == A Bayesian program is a means of specifying a family of probability distributions. The constituent elements of a Bayesian program are presented below: Program { Description { Specification ( π ) { Variables Decomposition Forms Identification (based on δ ) Question {\displaystyle {\text{Program}}{\begin{cases}{\text{Description}}{\begin{cases}{\text{Specification}}(\pi ){\begin{cases}{\text{Variables}}\\{\text{Decomposition}}\\{\text{Forms}}\\\end{cases}}\\{\text{Identification (based on }}\delta )\end{cases}}\\{\text{Question}}\end{cases}}} A program is constructed from a description and a question. A description is constructed using some specification ( π {\displaystyle \pi } ) as given by the programmer and an identification or learning process for the parameters not completely specified by the specification, using a data set ( δ {\displaystyle \delta } ). A specification is constructed from a set of pertinent variables, a decomposition and a set of forms. Forms are either parametric forms or questions to other Bayesian programs. A question specifies which probability distribution has to be computed. === Description === The purpose of a description is to specify an effective method of computing a joint probability distribution on a set of variables { X 1 , X 2 , ⋯ , X N } {\displaystyle \left\{X_{1},X_{2},\cdots ,X_{N}\right\}} given a set of experimental data δ {\displaystyle \delta } and some specification π {\displaystyle \pi } . This joint distribution is denoted as: P ( X 1 ∧ X 2 ∧ ⋯ ∧ X N ∣ δ ∧ π ) {\displaystyle P\left(X_{1}\wedge X_{2}\wedge \cdots \wedge X_{N}\mid \delta \wedge \pi \right)} . To specify preliminary knowledge π {\displaystyle \pi } , the programmer must undertake the following: Define the set of relevant variables { X 1 , X 2 , ⋯ , X N } {\displaystyle \left\{X_{1},X_{2},\cdots ,X_{N}\right\}} on which the joint distribution is defined. Decompose the joint distribution (break it into relevant independent or conditional probabilities). Define the forms of each of the distributions (e.g., for each variable, one of the list of probability distributions). ==== Decomposition ==== Given a partition of { X 1 , X 2 , … , X N } {\displaystyle \left\{X_{1},X_{2},\ldots ,X_{N}\right\}} containing K {\displaystyle K} subsets, K {\displaystyle K} variables are defined L 1 , ⋯ , L K {\displaystyle L_{1},\cdots ,L_{K}} , each corresponding to one of these subsets. Each variable L k {\displaystyle L_{k}} is obtained as the conjunction of the variables { X k 1 , X k 2 , ⋯ } {\displaystyle \left\{X_{k_{1}},X_{k_{2}},\cdots \right\}} belonging to the k t h {\displaystyle k^{th}} subset. Recursive application of Bayes' theorem leads to: P ( X 1 ∧ X 2 ∧ ⋯ ∧ X N ∣ δ ∧ π ) = P ( L 1 ∧ ⋯ ∧ L K ∣ δ ∧ π ) = P ( L 1 ∣ δ ∧ π ) × P ( L 2 ∣ L 1 ∧ δ ∧ π ) × ⋯ × P ( L K ∣ L K − 1 ∧ ⋯ ∧ L 1 ∧ δ ∧ π ) {\displaystyle {\begin{aligned}&P\left(X_{1}\wedge X_{2}\wedge \cdots \wedge X_{N}\mid \delta \wedge \pi \right)\\={}&P\left(L_{1}\wedge \cdots \wedge L_{K}\mid \delta \wedge \pi \right)\\={}&P\left(L_{1}\mid \delta \wedge \pi \right)\times P\left(L_{2}\mid L_{1}\wedge \delta \wedge \pi \right)\times \cdots \times P\left(L_{K}\mid L_{K-1}\wedge \cdots \wedge L_{1}\wedge \delta \wedge \pi \right)\end{aligned}}} Conditional independence hypotheses then allow further simplifications. A conditional independence hypothesis for variable L k {\displaystyle L_{k}} is defined by choosing some variable X n {\displaystyle X_{n}} among the variables appearing in the conjunction L k − 1 ∧ ⋯ ∧ L 2 ∧ L 1 {\displaystyle L_{k-1}\wedge \cdots \wedge L_{2}\wedge L_{1}} , labelling R k {\displaystyle R_{k}} as the conjunction of these chosen variables and setting: P ( L k ∣ L k − 1 ∧ ⋯ ∧ L 1 ∧ δ ∧ π ) = P ( L k ∣ R k ∧ δ ∧ π ) {\displaystyle P\left(L_{k}\mid L_{k-1}\wedge \cdots \wedge L_{1}\wedge \delta \wedge \pi \right)=P\left(L_{k}\mid R_{k}\wedge \delta \wedge \pi \right)} We then obtain: P ( X 1 ∧ X 2 ∧ ⋯ ∧ X N ∣ δ ∧ π ) = P ( L 1 ∣ δ ∧ π ) × P ( L 2 ∣ R 2 ∧ δ ∧ π ) × ⋯ × P ( L K ∣ R K ∧ δ ∧ π ) {\displaystyle {\begin{aligned}&P\left(X_{1}\wedge X_{2}\wedge \cdots \wedge X_{N}\mid \delta \wedge \pi \right)\\={}&P\left(L_{1}\mid \delta \wedge \pi \right)\times P\left(L_{2}\mid R_{2}\wedge \delta \wedge \pi \right)\times \cdots \times P\left(L_{K}\mid R_{K}\wedge \delta \wedge \pi \right)\end{aligned}}} Such a simplification of the joint distribution as a product of simpler distributions is called a decomposition, derived using the chain rule. This ensures that each variable appears at the most once on the left of a conditioning bar, which is the necessary and sufficient condition to write mathematically valid decompositions. ==== Forms ==== Each distribution P ( L k ∣ R k ∧ δ ∧ π ) {\displaystyle P\left(L_{k}\mid R_{k}\wedge \delta \wedge \pi \right)} appearing in the product is then associated with either a parametric form (i.e., a function f μ ( L k ) {\displaystyle f_{\mu }\left(L_{k}\right)} ) or a question to another Bayesian program P ( L k ∣ R k ∧ δ ∧ π ) = P ( L ∣ R ∧ δ ^ ∧ π ^ ) {\displaystyle P\left(L_{k}\mid R_{k}\wedge \delta \wedge \pi \right)=P\left(L\mid R\wedge {\widehat {\delta }}\wedge {\widehat {\pi }}\right)} . When it is a form f μ ( L k ) {\displaystyle f_{\mu }\left(L_{k}\right)} , in general, μ {\displaystyle \mu } is a vector of parameters that may depend on R k {\displaystyle R_{k}} or δ {\displaystyle \delta } or both. Learning takes place when some of these parameters are computed using the data set δ {\displaystyle \delta } . An important feature of Bayesian programming is this capacity to use questions to other Bayesian programs as components of the definition of a new Bayesian program. P ( L k ∣ R k ∧ δ ∧ π ) {\displaystyle P\left(L_{k}\mid R_{k}\wedge \delta \wedge \pi \right)} is obtained by some inferences done by another Bayesian program defined by the specifications π ^ {\displaystyle {\widehat {\pi }}} and the data δ ^ {\displaystyle {\widehat {\delta }}} . This is similar to calling a subroutine in classical programming and provides an easy way to build hierarchical models. === Question === Given a description (i.e., P ( X 1 ∧ X 2 ∧ ⋯ ∧ X N ∣ δ ∧ π ) {\displaystyle P\left(X_{1}\wedge X_{2}\wedge \cdots \wedge X_{N}\mid \delta \wedge \pi \right)} ), a question is obtained by partitioning { X 1 , X 2 , ⋯ , X N } {\displaystyle \left\{X_{1},X_{2},\cdots ,X_{N}\right\}} into three sets: the searched variables, the known variables and the free variables. The 3 variables S e a r c h e d {\displaystyle Searched} , K n o w n {\displaystyle Known} and F r e e {\displaystyle Free} are defined as the conjunction of the variables belonging to these sets. A question is defined as the set of distributions: P ( S e a r c h e d ∣ Known ∧ δ ∧ π ) {\displaystyle P\left(Searched\mid {\text{Known}}\wedge \delta \wedge \pi \right)} made of many "instantiated questions" as the cardinal of K n o w n {\displaystyle Known} , each instantiated question being the distribution: P ( Searched ∣ Known ∧ δ ∧ π ) {\displaystyle P\left({\text{Searched}}\mid {\text{Known}}\wedge \delta \wedge \pi \right)} === Inference === Given the joint distribution P ( X 1 ∧ X 2 ∧ ⋯ ∧ X N ∣ δ ∧ π ) {\displaystyle P\left(X_{1}\wedge X_{2}\wedge \cdots \wedge X_{N}\mid \delta \wedge \pi \right)} , it is always possible to compute any possible question using the following general inference: P ( Searched ∣ Known ∧ δ ∧ π ) = ∑ Free [ P ( Searched ∧ Free ∣ Known ∧ δ ∧ π ) ] = ∑ Free [ P ( Searched ∧ Free ∧ Known ∣ δ ∧ π ) ] P ( Known ∣ δ ∧ π ) = ∑ Free [ P ( Searched ∧ Free ∧ Known ∣ δ ∧ π ) ] ∑ Free ∧ Searched [ P ( Searched ∧ Free ∧ Known ∣ δ ∧ π ) ] = 1 Z × ∑ Free [ P ( Searched ∧ Free ∧ Known ∣ δ ∧ π ) ] {\displaystyle {\begin{aligned}&P\left({\text{Searched}}\mid {\text{Known}}\wedge \delta \wedge \pi \right)\\={}&\sum _{\text{Free}}\left[P\left({\text{Searched}}\wedge {\text{Free}}\mid {\text{Known}}\wedge \delta \wedge \

    Read more →
  • Perusall

    Perusall

    Perusall is a social web annotation tool intended for use by students at schools and universities. It allows users to annotate the margins of a text in a virtual group setting that is similar to social media—with upvoting, emojis, chat functionality, and notification. It also includes automatic AI grading. == History == Perusall began as a research project at Harvard University. It later became an educational product for students and teachers. As of 2024, Perusall states more than 5 million students have used the tool at over 5,000 educational institutions in 112 countries." == Functionality == Perusall integrates with learning management systems such as Moodle, Canvas and Blackboard to aid with collaborative annotation. The tool supports annotation of a range of media including text, images, equations, videos, PDFs and snapshots of webpages.

    Read more →
  • Cyber attribution

    Cyber attribution

    In the area of computer security, cyber attribution is an attribution of cybercrime, i.e., finding who perpetrated a cyberattack. Uncovering a perpetrator may give insights into various security issues, such as infiltration methods, communication channels, etc., and may help in enacting specific countermeasures. Cyber attribution is a costly endeavor requiring considerable resources and expertise in cyber forensic analysis. For governments and other major players dealing with cybercrime would require not only technical solutions, but legal and political ones as well, and for the latter ones cyber attribution is crucial. Attributing a cyberattack is difficult, and of limited interest to companies that are targeted by cyberattacks. In contrast, secret services often have a compelling interest in finding out whether a state is behind the attack. A further challenge in attribution of cyberattacks is the possibility of a false flag attack, where the actual perpetrator makes it appear that someone else caused the attack. Every stage of the attack may leave artifacts, such as entries in log files, that can be used to help determine the attacker's goals and identity. In the aftermath of an attack, investigators often begin by saving as many artifacts as they can find, and then try to determine the attacker.

    Read more →
  • Accelerated Linear Algebra

    Accelerated Linear Algebra

    XLA (Accelerated Linear Algebra) is an open-source compiler for machine learning developed by the OpenXLA project. XLA is designed to improve the performance of machine learning models by optimizing the computation graphs at a lower level, making it particularly useful for large-scale computations and high-performance machine learning models. Key features of XLA include: Compilation of Computation Graphs: Compiles computation graphs into efficient machine code. Optimization Techniques: Applies operation fusion, memory optimization, and other techniques. Hardware Support: Optimizes models for various hardware, including CPUs, GPUs, and NPUs. Improved Model Execution Time: Aims to reduce machine learning models' execution time for both training and inference. Seamless Integration: Can be used with existing machine learning code with minimal changes. XLA represents a significant step in optimizing machine learning models, providing developers with tools to enhance computational efficiency and performance. == OpenXLA Project == OpenXLA Project is an open-source machine learning compiler and infrastructure initiative intended to provide a common set of tools for compiling and deploying machine learning models across different frameworks and hardware platforms. It provides a modular compilation stack that can be used by major deep learning frameworks like JAX, PyTorch, and TensorFlow. The project focuses on supplying shared components for optimization, portability, and execution across CPUs, GPUs, and specialized accelerators. Its design emphasizes interoperability between frameworks and a standardized set of representations for model computation. == Components == The OpenXLA ecosystem includes several core components: XLA – A deep learning compiler that optimizes computational graphs for multiple hardware targets. PJRT – A runtime interface that allows different back-ends to connect to XLA through a consistent API. StableHLO – A high-level operator set intended to serve as a stable, portable representation for ML models across compilers and frameworks. Shardy – An MLIR-based system for describing and transforming models that run in distributed or multi-device environments. Additional profiling, testing, and integration tools maintained under the OpenXLA organization. == Users and adopters == Several machine learning frameworks can use or interoperate with OpenXLA components, including JAX, TensorFlow, and parts of the PyTorch ecosystem. The project is developed with participation from multiple hardware and software organizations that contribute back-end integrations, testing, or specifications for their devices. This includes Alibaba, Amazon Web Services, AMD, Anyscale, Apple, Arm, Cerebras, Google, Graphcore, Hugging Face, Intel, Meta, NVIDIA and SiFive. == Supported target devices == x86-64 ARM64 NVIDIA GPU AMD GPU Intel GPU Apple GPU Google TPU AWS Trainium, Inferentia Cerebras Graphcore IPU == Governance == OpenXLA is developed as a community project with its work carried out in public repositories, discussion forums, and design meetings. Some components, such as StableHLO, began with stewardship from specific organizations and have outlined plans for more formal and distributed governance models as the project matures. == History == The project was announced in 2022 as an effort to coordinate development of ML compiler technologies across major AI companies, notably: Alibaba, Amazon Web Services, AMD, Anyscale, Apple, Arm, Cerebras, Google, Graphcore, Hugging Face, Intel, Meta, NVIDIA and SiFive.. It consolidated the XLA compiler, introduced StableHLO as a portable operator set, and created a unified structure for additional tools. Development continues within multiple repositories under the OpenXLA umbrella. It was founded by Eugene Burmako, James Rubin, Magnus Hyttsten, Mehdi Amini, Navid Khajouei, and Thea Lamkin from Google's Machine Learning organization.

    Read more →
  • Cost-sensitive machine learning

    Cost-sensitive machine learning

    Cost-sensitive machine learning is an approach within machine learning that considers varying costs associated with different types of errors. This method diverges from traditional approaches by introducing a cost matrix, explicitly specifying the penalties or benefits for each type of prediction error. The inherent difficulty which cost-sensitive machine learning tackles is that minimizing different kinds of classification errors is a multi-objective optimization problem. == Overview == Cost-sensitive machine learning optimizes models based on the specific consequences of misclassifications, making it a valuable tool in various applications. It is especially useful in problems with a high imbalance in class distribution and a high imbalance in associated costs Cost-sensitive machine learning introduces a scalar cost function in order to find one (of multiple) Pareto optimal points in this multi-objective optimization problem (similar to the Weighted sum model) == Cost Matrix == The cost matrix is a crucial element within cost-sensitive modeling, explicitly defining the costs or benefits associated with different prediction errors in classification tasks. Represented as a table, the matrix aligns true and predicted classes, assigning a cost value to each combination. For instance, in binary classification, it may distinguish costs for false positives and false negatives. The utility of the cost matrix lies in its application to calculate the expected cost or loss. The formula, expressed as a double summation, utilizes joint probabilities: Expected Loss = ∑ i ∑ j P ( Actual i , Predicted j ) ⋅ Cost Actual i , Predicted j {\displaystyle {\text{Expected Loss}}=\sum _{i}\sum _{j}P({\text{Actual}}_{i},{\text{Predicted}}_{j})\cdot {\text{Cost}}_{{\text{Actual}}_{i},{\text{Predicted}}_{j}}} Here, P ( Actual i , Predicted j ) {\displaystyle P({\text{Actual}}_{i},{\text{Predicted}}_{j})} denotes the joint probability of actual class i {\displaystyle i} and predicted class j {\displaystyle j} , providing a nuanced measure that considers both the probabilities and associated costs. This approach allows practitioners to fine-tune models based on the specific consequences of misclassifications, adapting to scenarios where the impact of prediction errors varies across classes. == Applications == === Fraud Detection === In the realm of data science, particularly in finance, cost-sensitive machine learning is applied to fraud detection. By assigning different costs to false positives and false negatives, models can be fine-tuned to minimize the overall financial impact of misclassifications. === Medical Diagnostics === In healthcare, cost-sensitive machine learning plays a role in medical diagnostics. The approach allows for customization of models based on the potential harm associated with misdiagnoses, ensuring a more patient-centric application of machine learning algorithms. == Challenges == A typical challenge in cost-sensitive machine learning is the reliable determination of the cost matrix which may evolve over time. == Literature == Cost-Sensitive Machine Learning. USA, CRC Press, 2011. ISBN 9781439839287 Abhishek, K., Abdelaziz, D. M. (2023). Machine Learning for Imbalanced Data: Tackle Imbalanced Datasets Using Machine Learning and Deep Learning Techniques. (n.p.): Packt Publishing. ISBN 9781801070881

    Read more →
  • Wumpus world

    Wumpus world

    Wumpus world is a simple world use in artificial intelligence for which to represent knowledge and to reason. Wumpus world was introduced by Michael Genesereth, and is discussed in the Russell-Norvig Artificial Intelligence book Artificial Intelligence: A Modern Approach. Wumpus World is loosely inspired by the 1972 video game Hunt the Wumpus. == Problem description == In Artificial Intelligence: A Modern Approach, the wumpus world features a 4x4 grid, containing a monster called a wumpus, multiple bottomless pits and hidden gold. The agent starts at (1,1) and has to find the gold and return to the starting position. The agent loses 1 point for every move and gains 1000 points for bringing the gold to the starting position. The agent can sense pits by a breeze, stench indicates a wumpus, and sparkle indicates gold. The wumpus can be killed by an arrow but costs 10 points.

    Read more →
  • DUAL table

    DUAL table

    The DUAL table is a special one-row, one-column table present by default in Oracle and other database installations. In Oracle, the table has a single VARCHAR2(1) column called DUMMY that has a value of 'X'. It is suitable for use in selecting a pseudo column such as SYSDATE or USER. == Example use == Oracle's SQL syntax requires the FROM clause but some queries don't require any tables - DUAL can be used in these cases. == History == Charles Weiss explains why he created DUAL: I created the DUAL table as an underlying object in the Oracle Data Dictionary. It was never meant to be seen itself, but instead used inside a view that was expected to be queried. The idea was that you could do a JOIN to the DUAL table and create two rows in the result for every one row in your table. Then, by using GROUP BY, the resulting join could be summarized to show the amount of storage for the DATA extent and for the INDEX extent(s). The name, DUAL, seemed apt for the process of creating a pair of rows from just one. == Optimization == Beginning with 10g Release 1, Oracle no longer performs physical or logical I/O on the DUAL table, though the table still exists. DUAL is readily available for all authorized users in a SQL database. == In other database systems == Several other databases (including Microsoft SQL Server, MySQL, PostgreSQL, SQLite, and Teradata) enable one to omit the FROM clause entirely if no table is needed. This avoids the need for any dummy table. ClickHouse has a one-row system table system.one with a single column named "dummy" of type UInt8 and value 0. This table is implicitly used when no table is specified in the SELECT query. Firebird has a one-row system table RDB$DATABASE that is used in the same way as Oracle's DUAL, although it also has a meaning of its own. IBM Db2 has a view that resolves DUAL when using Oracle Compatibility. It also has a table called sysibm.sysdummy1 that has similar properties to the Oracle DUAL one. Informix: Informix version 11.50 and later has a table named sysmaster:"informix".sysdual with the same functionality but a more verbose name. You can use CREATE PUBLIC SYNONYM dual FOR sysmaster:"informix".sysdual to create a name dual in the current database with the same functionality. Microsoft Access: A table named DUAL may be created and the single-row constraint enforced via ADO (Table-less UNION query in MS Access) Microsoft SQL Server: SQL Server does not require a dummy table. Queries like 'select 1 + 1' can be run without a "from" clause/table name. MySQL allows DUAL to be specified as a table in queries that do not need data from any tables. It is suitable for use in selecting a result function such as SYSDATE() or USER(), although it is not essential. PostgreSQL: A DUAL-view can be added to ease porting from Oracle. Snowflake: DUAL is supported, but not explicitly documented. It appears in sample SQL for other operations in the documentation. SQLite: A VIEW named "dual" that works the same as the Oracle "dual" table can be created as follows: CREATE VIEW dual AS SELECT 'x' AS dummy; SAP HANA has a table called DUMMY that works the same as the Oracle "dual" table. Teradata database does not require a dummy table. Queries like 'select 1 + 1' can be run without a "from" clause/table name. Vertica has support for a DUAL table in their official documentation.

    Read more →
  • AlphaChip (controversy)

    AlphaChip (controversy)

    The AlphaChip controversy refers to a series of public, scholarly, and legal disputes surrounding a 2021 Nature paper by Google-affiliated researchers. The paper describes an approach to macro placement, a stage of chip floorplanning, based on reinforcement learning (RL), a machine learning method in which a system iteratively improves its decisions by optimizing performance-based reward signals. The primary technical question is whether the new techniques are better than existing (non-AI) techniques. Both internal Google studies and external attempts to replicate the algorithm have failed to show the claimed benefits. No head-to-head comparison is available because the data used in the paper is proprietary, and Google has not released any results from running its algorithm on public benchmarks. This has resulted in considerable skepticism over the paper's claims. In addition, the inability of others (both inside and outside of Google) to replicate the claimed results have sparked concerns about the paper’s methodology, reproducibility, and scientific integrity. The lead researchers of the Nature paper were affiliated with Google Brain, which became part of Google DeepMind, and later spun off into the company Ricursive. == Motivation for research: Macro placement in chip layout == Chip design for modern integrated circuits is a complex, expert-driven process that relies on electronic design automation. It determines the performance of the final chip, and takes weeks or months to complete. Advances that produce better designs, or complete the process faster, are commercially and academically significant. Macro placement is a step during chip design that determines the locations of large circuit components (macros) within a chip. It is followed by detailed placement, which places the far more numerous but much smaller standard cells. Alternatively, mixed-size placement simultaneously places both large macros and millions of small cells, requiring algorithms to handle objects that differ by several orders of magnitude in area and mobility. The number of macros per circuit typically ranges from several to thousands. Wiring must be performed after placement, and the details of this wiring strongly influence the power, performance, and area (PPA) of the completed chip. The full wiring calculation is very resource intensive, so placement tools typically use a proxy cost, a simplified objective function used to guide the placement algorithm during training and evaluation. The faithfulness of the chosen proxy cost to the final objective cost is a critical aspect of placer performance. === State of the art as of 2021 === Chips have been designed since the 1960s, so there were many existing methods as of 2021. Available options included manual design, academic tools, and commercial offerings. Academic methods include combinatorial optimization techniques such as simulated annealing, analytical placement, hierarchical heuristics, and as of 2019 reinforcement learning and broader machine learning techniques.. Existing (non-AI) academic tools for solving the same problem include APlace, NTUplace3, ePlace, RePlace, and DREAMPlace. Commercial EDA vendors also offered automated software tools for floorplanning and mixed-size placement. For instance, as of 2019 Cadence’s Innovus implementation software offered a Concurrent Macro Placer (CMP) feature to automatically place large blocks and standard cells. == The 2021 Nature paper and its claims == In 2021, Nature published a paper under the title “A graph‑placement methodology for fast chip design” co‑authored by 21 Google-affiliated researchers. The paper reported that an RL agent could generate macro placements for integrated circuits "in under six hours" and achieve improvements over human-designed layouts in power, timing performance, and area (PPA), standard chip-quality metrics referring respectively to energy consumption, chip operating speed, and silicon footprint (evaluated after wire routing). It introduced a sequential macro placement algorithm in which macros are placed one at a time instead of optimizing their locations concurrently. At each step, the algorithm selects a location for a single macro on a discretized chip canvas, conditioning its decision on the placements of previously placed macros. This sequential formulation converts macro placement into a long-horizon decision process in which early placement choices constrain later ones. After macro placement, force-directed placement is applied to place standard cells connected to the macros. Deep reinforcement learning is used to train a policy network to place macros by maximizing a reward that reflects final placement quality (for example, wirelength and congestion). Policy learning occurs during self‑play for one or multiple circuit designs. Further placement optimizations refine the overall layout by balancing wirelength, density, and overlap constraints, while treating the macro locations produced by the RL policy as fixed obstacles. The approach relies on pre-training, in which the RL model is first trained on a corpus of prior designs (twenty in the Nature paper) to learn general placement patterns before being fine-tuned on a specific chip. Circuit examples used in the study were parts of proprietary Google TPU designs, called blocks (or floorplan partitions). The paper reported results on five blocks and described the approach as generalizable across chip designs. == Controversy == Soon after the paper's publication, controversy arose over whether the claims were true, whether they were sufficiently proven, and whether academic standards were followed. These controversies arose both within Google and among external academic experts. === Internal dispute at Google and legal proceedings === In 2022, Satrajit Chatterjee, a Google engineer involved in reviewing the AlphaChip work, raised concerns internally and drafted an alternative analysis, (Stronger Baselines) arguing that established methods outperformed the RL approach under fair comparison. In March 2022, Google declined to publish this analysis and terminated Chatterjee's employment. Chatterjee filed a wrongful dismissal lawsuit, alleging that representations related to the AlphaChip research involved fraud and scientific misconduct. According to court documents, Chatterjee's study was conducted "in the context of a large potential Google Cloud deal". He noted that it "would have been unethical to imply that we had revolutionary technology when our tests showed otherwise" and claimed Google was deliberately withholding material information. Furthermore, the committee that reviewed his paper and disapproved its publication was allegedly chaired by subordinates of Jeff Dean, a senior co-author of the Nature paper. Google’s subsequent motion to dismiss was denied, holding that Chatterjee had plausibly alleged retaliation for refusing to engage in conduct he believed would violate state or federal law. === External controversy === The external questions can be summarized in four main points: (a) Are the claims supported by the evidence provided? (b) Did the paper provide enough information to allow the results to be independently reproduced and verified? If so, are the results an improvement over existing academic and commercial tools? (c) Were the comparisons in the paper done fairly and with full disclosure? (d) Were academic standards followed? Each of these is discussed below. ==== Are the claims supported by the evidence provided? ==== The Nature paper described the reduction in design-process time as going from "days or weeks" to "hours", but did not provide per-design time breakdowns or specify the number of engineers, their level of expertise, or the baseline tools and workflow against which this comparison was made. It was also unclear whether the "days or weeks" baseline included time spent on other tasks such as functional design changes. The paper also evaluated the method on fewer benchmarks (five) than is common in the field, and showed mixed results across different evaluation goals While the approach was described as improving circuit area, this claim seems unsupported, as the RL optimization did not alter the overall circuit area, as it adjusted only the locations of fixed-shape non-overlapping circuit components within a fixed rectangular layout boundary. ==== Comparison with existing methods, and replicating the algorithm ==== Because macro placement is largely geometric and its fundamental algorithms are not tied to a specific process node, competing approaches can be evaluated on public benchmarks (tests) across technologies, rather than primarily on proprietary internal designs. This is standard procedure when comparing academic placers, see . In contrast, Google has only reported results only on internal proprietary designs, and as of 2026 has not offered comparisons with prior methods on common benchmarks. Researchers at the University of Califor

    Read more →
  • .ai

    .ai

    .ai is the Internet country code top-level domain (ccTLD) for Anguilla, a British Overseas Territory in the Caribbean. It is administered by the government of Anguilla. It is a popular domain hack with companies and projects related to the artificial intelligence industry (AI). Google's ad targeting treats .ai as a generic top-level domain (gTLD) because "users and website owners frequently see [the domain] as being more generic than country-targeted." In 2021, Google Search analyst Gary Illyes announced that ".ai" had been added to Google’s list of generic country-code top-level domains, meaning that Google would no longer infer Anguilla-specific targeting from the ccTLD. Identity Digital began managing the domain as of January 2025. == Second and third level registrations == Registrations within off.ai, com.ai, net.ai, and org.ai are available worldwide without restriction. From 15 September 2009, second level registrations within .ai are available to everyone worldwide. == Registration == The minimum registration term allowed for .ai domains is 2 through 10 years for registration and renewal, and a 2-year renewal for domain transfer. Identity Digital is the authority in charge of managing this extension. Registrations began on 16 February 1995. The limits on the number of characters used for the domain name are, at a minimum, from 1 to 3, depending on the registrar, and always at most 63 characters. The character set supported for .ai domain names includes A–Z, a–z, 0–9, and hyphen. As of November 2022, .ai domains cannot accommodate IDN characters. There are no requirements for registering a domain, including local and foreign residents. A .ai domain can be suspended or revoked, if the domain is involved in illegal activity such as violating trademarks or copyrights. Usage must not violate the laws of Anguilla. Anguilla uses the UDRP. Filing a UDRP challenge requires using one of the ICANN Approved Dispute Resolution Service Providers. If the domain is with an ICANN accredited registrar, they should work with the arbitrator. Usually this means either doing nothing or transferring a domain. .ai domains are transferable to any desired registrars as the registration of domain is done maintaining EPP. There used to be a whois.ai-based platform of expired domains in which those could be procured and auctioned every ten days through a standard online process. The last auctions of such kind closed there in December 2024; the platform had been scheduled for shutdown on 30 June 2025, but remained online in the months following that date. == Valuation == Domains cost depends on the registrar, with yearly fees ranging from US$140 (the base fee, as established by Anguilla) to $200. As of July 2025, the highest-valued .ai domain is an undisclosed one sold on 8 November 2023, on Escrow.com, for US$1,500,000—months after an initial $300,000 sale to the same buyer. Among the publicly disclosed ones, the most valued, fin.ai, was sold for $1,000,000 in March 2025. On 16 December 2017, the .ai registry started supporting the Extensible Provisioning Protocol (EPP) and migrated all of its domains onto an EPP system. Consequently, many registrars are allowed to sell .ai domains. Since that date, the .ai ccTLD has also been popular with artificial intelligence companies and organisations. Though such trends are primarily seen among new AI based companies or startups, many established AI and Tech companies preferred not to opt for .ai domains. For example, DeepMind has its domain retained at .com; Meta has redirected its facebook.ai domain to ai.meta.com. == Impact on Anguilla's economy == The registration fees earned from the .ai domains go to the treasury of the Government of Anguilla. As per a 2018 New York Times report, the total revenue generated out of selling .ai domains was $2.9 million. In 2023, Anguilla's government made about US$32 million from fees collected for registering .ai domains; that amounted to over 10% of gross domestic product for the territory. "In the years before the real breakthrough of AI, revenue from .ai domains made up less than 1% of our state income, by 2025 it will be around 47%," explained Jose Vanterpool, Minister of Infrastructure and Communications (MICUHITES), in an interview with BBC. The high 90% renewal rate of .ai domains and the 2025 renewal wave of domains registered in 2023 are driving another surge in state revenues, according to Domaintechnik.

    Read more →