AI Coding Kiro

AI Coding Kiro — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Argumentation framework

    Argumentation framework

    In artificial intelligence and related fields, an argumentation framework is a way to deal with contentious information and draw conclusions from it using formalized arguments. In an abstract argumentation framework, entry-level information is a set of abstract arguments that, for instance, represent data or a proposition. Conflicts between arguments are represented by a binary relation on the set of arguments. In concrete terms, an argumentation framework is represented with a directed graph such that the nodes are the arguments, and the arrows represent the attack relation. There exist some extensions of the Dung's framework, like the logic-based argumentation frameworks or the value-based argumentation frameworks. == Abstract argumentation frameworks == === Formal framework === Abstract argumentation frameworks, also called argumentation frameworks à la Dung, are defined formally as a pair: A set of abstract elements called arguments, denoted A {\displaystyle A} A binary relation on A {\displaystyle A} , called attack relation, denoted R {\displaystyle R} For instance, the argumentation system S = ⟨ A , R ⟩ {\displaystyle S=\langle A,R\rangle } with A = { a , b , c , d } {\displaystyle A=\{a,b,c,d\}} and R = { ( a , b ) , ( b , c ) , ( d , c ) } {\displaystyle R=\{(a,b),(b,c),(d,c)\}} contains four arguments ( a , b , c {\displaystyle a,b,c} and d {\displaystyle d} ) and three attacks ( a {\displaystyle a} attacks b {\displaystyle b} , b {\displaystyle b} attacks c {\displaystyle c} and d {\displaystyle d} attacks c {\displaystyle c} ). Dung defines some notions : an argument a ∈ A {\displaystyle a\in A} is acceptable with respect to E ⊆ A {\displaystyle E\subseteq A} if and only if E {\displaystyle E} defends a {\displaystyle a} , that is ∀ b ∈ A {\displaystyle \forall b\in A} such that ( b , a ) ∈ R , ∃ c ∈ E {\displaystyle (b,a)\in R,\exists c\in E} such that ( c , b ) ∈ R {\displaystyle (c,b)\in R} , a set of arguments E {\displaystyle E} is conflict-free if there is no attack between its arguments, formally : ∀ a , b ∈ E , ( a , b ) ∉ R {\displaystyle \forall a,b\in E,(a,b)\not \in R} , a set of arguments E {\displaystyle E} is admissible if and only if it is conflict-free and all its arguments are acceptable with respect to E {\displaystyle E} . === Different semantics of acceptance === ==== Extensions ==== To decide if an argument can be accepted or not, or if several arguments can be accepted together, Dung defines several semantics of acceptance that allows, given an argumentation system, sets of arguments (called extensions) to be computed. For instance, given S = ⟨ A , R ⟩ {\displaystyle S=\langle A,R\rangle } , E {\displaystyle E} is a complete extension of S {\displaystyle S} only if it is an admissible set and every acceptable argument with respect to E {\displaystyle E} belongs to E {\displaystyle E} , E {\displaystyle E} is a preferred extension of S {\displaystyle S} only if it is a maximal element (with respect to the set-theoretical inclusion) among the admissible sets with respect to S {\displaystyle S} , E {\displaystyle E} is a stable extension of S {\displaystyle S} only if it is a conflict-free set that attacks every argument that does not belong in E {\displaystyle E} (formally, ∀ a ∈ A ∖ E , ∃ b ∈ E {\displaystyle \forall a\in A\backslash E,\exists b\in E} such that ( b , a ) ∈ R {\displaystyle (b,a)\in R} , E {\displaystyle E} is the (unique) grounded extension of S {\displaystyle S} only if it is the smallest element (with respect to set inclusion) among the complete extensions of S {\displaystyle S} . There exists some inclusions between the sets of extensions built with these semantics : Every stable extension is preferred, Every preferred extension is complete, The grounded extension is complete, If the system is well-founded (there exists no infinite sequence a 0 , a 1 , … , a n , … {\displaystyle a_{0},a_{1},\dots ,a_{n},\dots } such that ∀ i > 0 , ( a i + 1 , a i ) ∈ R {\displaystyle \forall i>0,(a_{i+1},a_{i})\in R} ), all these semantics coincide—only one extension is grounded, stable, preferred, and complete. Some other semantics have been defined. One introduce the notation E x t σ ( S ) {\displaystyle Ext_{\sigma }(S)} to note the set of σ {\displaystyle \sigma } -extensions of the system S {\displaystyle S} . In the case of the system S {\displaystyle S} in the figure above, E x t σ ( S ) = { { a , d } } {\displaystyle Ext_{\sigma }(S)=\{\{a,d\}\}} for every Dung's semantic—the system is well-founded. That explains why the semantics coincide, and the accepted arguments are: a {\displaystyle a} and d {\displaystyle d} . ==== Labellings ==== Labellings are a more expressive way than extensions to express the acceptance of the arguments. Concretely, a labelling is a mapping that associates every argument with a label in (the argument is accepted), out (the argument is rejected), or undec (the argument is undefined—not accepted or refused). One can also note a labelling as a set of pairs ( a r g u m e n t , l a b e l ) {\displaystyle ({\mathit {argument}},{\mathit {label}})} . Such a mapping does not make sense without additional constraint. The notion of reinstatement labelling guarantees the sense of the mapping. L {\displaystyle L} is a reinstatement labelling on the system S = ⟨ A , R ⟩ {\displaystyle S=\langle A,R\rangle } if and only if : ∀ a ∈ A , L ( a ) = i n {\displaystyle \forall a\in A,L(a)={\mathit {in}}} if and only if ∀ b ∈ A {\displaystyle \forall b\in A} such that ( b , a ) ∈ R , L ( b ) = o u t {\displaystyle (b,a)\in R,L(b)={\mathit {out}}} ∀ a ∈ A , L ( a ) = o u t {\displaystyle \forall a\in A,L(a)={\mathit {out}}} if and only if ∃ b ∈ A {\displaystyle \exists b\in A} such that ( b , a ) ∈ R {\displaystyle (b,a)\in R} and L ( b ) = i n {\displaystyle L(b)={\mathit {in}}} ∀ a ∈ A , L ( a ) = u n d e c {\displaystyle \forall a\in A,L(a)={\mathit {undec}}} if and only if L ( a ) ≠ i n {\displaystyle L(a)\neq {\mathit {in}}} and L ( a ) ≠ o u t {\displaystyle L(a)\neq {\mathit {out}}} One can convert every extension into a reinstatement labelling: the arguments of the extension are in, those attacked by an argument of the extension are out, and the others are undec. Conversely, one can build an extension from a reinstatement labelling just by keeping the arguments in. Indeed, Caminada proved that the reinstatement labellings and the complete extensions can be mapped in a bijective way. Moreover, the other Datung's semantics can be associated to some particular sets of reinstatement labellings. Reinstatement labellings distinguish arguments not accepted because they are attacked by accepted arguments from undefined arguments—that is, those that are not defended cannot defend themselves. An argument is undec if it is attacked by at least another undec. If it is attacked only by arguments out, it must be in, and if it is attacked some argument in, then it is out. The unique reinstatement labelling that corresponds to the system S {\displaystyle S} above is L = { ( a , i n ) , ( b , o u t ) , ( c , o u t ) , ( d , i n ) } {\displaystyle L=\{(a,{\mathit {in}}),(b,{\mathit {out}}),(c,{\mathit {out}}),(d,{\mathit {in}})\}} . === Inference from an argumentation system === In the general case when several extensions are computed for a given semantic σ {\displaystyle \sigma } , the agent that reasons from the system can use several mechanisms to infer information: Credulous inference: the agent accepts an argument if it belongs to at least one of the σ {\displaystyle \sigma } -extensions—in which case, the agent risks accepting some arguments that are not acceptable together ( a {\displaystyle a} attacks b {\displaystyle b} , and a {\displaystyle a} and b {\displaystyle b} each belongs to an extension) Skeptical inference: the agent accepts an argument only if it belongs to every σ {\displaystyle \sigma } -extension. In this case, the agent risks deducing too little information (if the intersection of the extensions is empty or has a very small cardinal). For these two methods to infer information, one can identify the set of accepted arguments, respectively C r σ ( S ) {\displaystyle Cr_{\sigma }(S)} the set of the arguments credulously accepted under the semantic σ {\displaystyle \sigma } , and S c σ ( S ) {\displaystyle Sc_{\sigma }(S)} the set of arguments accepted skeptically under the semantic σ {\displaystyle \sigma } (the σ {\displaystyle \sigma } can be missed if there is no possible ambiguity about the semantic). Of course, when there is only one extension (for instance, when the system is well-founded), this problem is very simple: the agent accepts arguments of the unique extension and rejects others. The same reasoning can be done with labellings that correspond to the chosen semantic : an argument can be accepted if it is in for each labelling and refused if it is out for each labelling, the others being in an undecided state (the status of the arguments can remind the

    Read more →
  • Action model learning

    Action model learning

    Action model learning (sometimes abbreviated action learning) is an area of machine learning concerned with the creation and modification of a software agent's knowledge about the effects and preconditions of the actions that can be executed within its environment. This knowledge is usually represented in a logic-based action description language and used as input for automated planners. Learning action models is important when goals change. When an agent acted for a while, it can use its accumulated knowledge about actions in the domain to make better decisions. Thus, learning action models differs from reinforcement learning. It enables reasoning about actions instead of expensive trials in the world. Action model learning is a form of inductive reasoning, where new knowledge is generated based on the agent's observations. The usual motivation for action model learning is the fact that manual specification of action models for planners is often a difficult, time-consuming, and error-prone task (especially in complex environments). == Action models == Given a training set E {\displaystyle E} consisting of examples e = ( s , a , s ′ ) {\displaystyle e=(s,a,s')} , where s , s ′ {\displaystyle s,s'} are observations of a world state from two consecutive time steps t , t ′ {\displaystyle t,t'} and a {\displaystyle a} is an action instance observed in time step t {\displaystyle t} , the goal of action model learning in general is to construct an action model ⟨ D , P ⟩ {\displaystyle \langle D,P\rangle } , where D {\displaystyle D} is a description of domain dynamics in action description formalism like STRIPS, ADL or PDDL and P {\displaystyle P} is a probability function defined over the elements of D {\displaystyle D} . However, many state of the art action learning methods assume determinism and do not induce P {\displaystyle P} . In addition to determinism, individual methods differ in how they deal with other attributes of domain (e.g. partial observability or sensoric noise). == Action learning methods == === State of the art === Recent action learning methods take various approaches and employ a wide variety of tools from different areas of artificial intelligence and computational logic. As an example of a method based on propositional logic, we can mention SLAF (Simultaneous Learning and Filtering) algorithm, which uses agent's observations to construct a long propositional formula over time and subsequently interprets it using a satisfiability (SAT) solver. Another technique, in which learning is converted into a satisfiability problem (weighted MAX-SAT in this case) and SAT solvers are used, is implemented in ARMS (Action-Relation Modeling System). Two mutually similar, fully declarative approaches to action learning were based on logic programming paradigm Answer Set Programming (ASP) and its extension, Reactive ASP. In another example, bottom-up inductive logic programming approach was employed. Several different solutions are not directly logic-based. For example, the action model learning using a perceptron algorithm or the multi level greedy search over the space of possible action models. In the older paper from 1992, the action model learning was studied as an extension of reinforcement learning. Nonetheless, further algorithms can be found that operate under different assumptions: FAMA can work even when some observations are missing, and it produces a general (lifted) planning model. It treats learning an action model like a planning problem, making sure the learned model matches the observations given. NOLAM can learn general action models even from noisy or imperfect data. LOCM focuses only on the order of actions in the data, ignoring any details about the states between those actions. The family of safe action model (SAM) learning methods create models that guarantee any plans made with them will actually work in the real world. There's also an extension called N-SAM that can learn action models with numeric conditions and effects. Additionally, numeric action models like N-SAM can be used to improve reinforcement learning (RL) performance through the RAMP algorithm. === Literature === Most action learning research papers are published in journals and conferences focused on artificial intelligence in general (e.g. Journal of Artificial Intelligence Research (JAIR), Artificial Intelligence, Applied Artificial Intelligence (AAI) or AAAI conferences). Despite mutual relevance of the topics, action model learning is usually not addressed in planning conferences like the International Conference on Automated Planning and Scheduling (ICAPS).

    Read more →
  • Sparkles emoji

    Sparkles emoji

    The Sparkles emoji (U+2728 ✨ SPARKLES) is an emoji that has one large star surrounded by smaller stars. Originating from Japan to represent sparkles used in anime and manga, the sparkles are often used as emphasis in text by surrounding words or phrases with it. It is the third most-used emoji in the world on Twitter as of 2021. Since the early 2020s it has been used by major software companies to represent artificial intelligence, marketing the technology as "like magic". == Development == According to Emojipedia, the Sparkles emoji was first used by Japanese mobile operators SoftBank, Docomo and au in the late 1990s. The emoji was added to Unicode 6.0 in 2010 and Emoji 1.0 in 2015. On some platforms the Sparkles emoji has been multicoloured whilst on other platforms it has been one colour. Twitter and Microsoft's Sparkles have changed from being multicoloured to being a single colour. Samsung's version of the emoji previously had a night sky in the background. == Usage == === Interpersonal communication === The Sparkles emoji was originally meant to represent the usage of sparkles in Japanese anime and manga, where the sparkles are used to represent beauty, happiness or awe. The emoji has several meanings and depends upon context. Starting in the late 2010s, the emoji started being used to surround words or phrases to be used as emphasis, an example from the book Because Internet being "I would simply ✨pass away✨". It can also be used as sarcasm, irony or as a way to mock people. Without emoji this could be represented with tildes or asterisks, for example, "~tildes~" or "~asterisk plus tilde~" or "~~true sparkle exuberance~~". The sparkles emoji can be used to represent stars in text, be used to represent cleanliness or can be used to mean "orgasm" whilst sexting. In September 2021 the Sparkles emoji overtook the Pleading Face (🥺) emoji to become the third most-used emoji in the world according to Emojipedia, with approximately 1 per cent of all tweets containing the Sparkles emoji. === Artificial intelligence === In the early 2020s, the Sparkles emoji started being used as an icon to represent artificial intelligence (AI). Companies who use the emoji this way include Google, OpenAI, Samsung, Microsoft, Adobe, Spotify and Zoom. As of August 2024, seven of the top 10 software companies by market capitalisation use the Sparkles emojis with AI. OpenAI has different versions of the Sparkles for different versions of the models that ChatGPT uses. One explanation is that Sparkles is being used by these companies as a way to market AI as "magic". Marketing technology as "magic" has been used before AI, particularly by Apple. Another explanation given by designers and marketers choosing to use Sparkles to signify AI is simply that other platforms are doing it, making it familiar to users. Around 2024, some of these companies started removing two of the smaller stars from the emoji in their AI services and have kept the one large star, an example being Google's Gemini chatbot. In early 2024, the Nielsen Norman Group provided test subjects with the star in isolation and found that people did not associate the symbol with AI, but instead mostly with "optimisation" or "favourite or save an item".

    Read more →
  • Semantic folding

    Semantic folding

    Semantic folding theory describes a procedure for encoding the semantics of natural language text in a semantically grounded binary representation. This approach provides a framework for modelling how language data is processed by the neocortex. == Theory == Semantic folding theory draws inspiration from Douglas R. Hofstadter's Analogy as the Core of Cognition which suggests that the brain makes sense of the world by identifying and applying analogies. The theory hypothesises that semantic data must therefore be introduced to the neocortex in such a form as to allow the application of a similarity measure and offers, as a solution, the sparse binary vector employing a two-dimensional topographic semantic space as a distributional reference frame. The theory builds on the computational theory of the human cortex known as hierarchical temporal memory (HTM), and positions itself as a complementary theory for the representation of language semantics. A particular strength claimed by this approach is that the resulting binary representation enables complex semantic operations to be performed simply and efficiently at the most basic computational level. == Two-dimensional semantic space == Analogous to the structure of the neocortex, Semantic Folding theory posits the implementation of a semantic space as a two-dimensional grid. This grid is populated by context-vectors in such a way as to place similar context-vectors closer to each other, for instance, by using competitive learning principles. This vector space model is presented in the theory as an equivalence to the well known word space model described in the information retrieval literature. Given a semantic space (implemented as described above) a word-vector can be obtained for any given word Y by employing the following algorithm: For each position X in the semantic map (where X represents cartesian coordinates) if the word Y is contained in the context-vector at position X then add 1 to the corresponding position in the word-vector for Y else add 0 to the corresponding position in the word-vector for Y The result of this process will be a word-vector containing all the contexts in which the word Y appears and will therefore be representative of the semantics of that word in the semantic space. It can be seen that the resulting word-vector is also in a sparse distributed representation (SDR) format [Schütze, 1993] & [Sahlgreen, 2006]. Some properties of word-SDRs that are of particular interest with respect to computational semantics are: high noise resistance: As a result of similar contexts being placed closer together in the underlying map, word-SDRs are highly tolerant of false or shifted "bits". boolean logic: It is possible to manipulate word-SDRs in a meaningful way using boolean (OR, AND, exclusive-OR) and/or arithmetical (SUBtract) functions . sub-sampling: Word-SDRs can be sub-sampled to a high degree without any appreciable loss of semantic information. topological two-dimensional representation: The SDR representation maintains the topological distribution of the underlying map therefore words with similar meanings will have similar word-vectors. This suggests that a variety of measures can be applied to the calculation of semantic similarity, from a simple overlap of vector elements, to a range of distance measures such as: Euclidean distance, Hamming distance, Jaccard distance, cosine similarity, Levenshtein distance, Sørensen-Dice index, etc. == Semantic spaces == Semantic spaces in the natural language domain aim to create representations of natural language that are capable of capturing meaning. The original motivation for semantic spaces stems from two core challenges of natural language: Vocabulary mismatch (the fact that the same meaning can be expressed in many ways) and ambiguity of natural language (the fact that the same term can have several meanings). The application of semantic spaces in natural language processing (NLP) aims at overcoming limitations of rule-based or model-based approaches operating on the keyword level. The main drawback with these approaches is their brittleness, and the large manual effort required to create either rule-based NLP systems or training corpora for model learning. Rule-based and machine learning-based models are fixed on the keyword level and break down if the vocabulary differs from that defined in the rules or from the training material used for the statistical models. Research in semantic spaces dates back more than 20 years. In 1996, two papers were published that raised a lot of attention around the general idea of creating semantic spaces: latent semantic analysis from Microsoft and Hyperspace Analogue to Language from the University of California. However, their adoption was limited by the large computational effort required to construct and use those semantic spaces. A breakthrough with regard to the accuracy of modelling associative relations between words (e.g. "spider-web", "lighter-cigarette", as opposed to synonymous relations such as "whale-dolphin", "astronaut-driver") was achieved by explicit semantic analysis (ESA) in 2007. ESA was a novel (non-machine learning) based approach that represented words in the form of vectors with 100,000 dimensions (where each dimension represents an Article in Wikipedia). However practical applications of the approach are limited due to the large number of required dimensions in the vectors. More recently, advances in neural networking techniques in combination with other new approaches (tensors) led to a host of new recent developments: Word2vec from Google and GloVe from Stanford University. Semantic folding represents a novel, biologically inspired approach to semantic spaces where each word is represented as a sparse binary vector with 16,000 dimensions (a semantic fingerprint) in a 2D semantic map (the semantic universe). Sparse binary representation are advantageous in terms of computational efficiency, and allow for the storage of very large numbers of possible patterns. == Visualization == The topological distribution over a two-dimensional grid (outlined above) lends itself to a bitmap type visualization of the semantics of any word or text, where each active semantic feature can be displayed as e.g. a pixel. As can be seen in the images shown here, this representation allows for a direct visual comparison of the semantics of two (or more) linguistic items. Image 1 clearly demonstrates that the two disparate terms "dog" and "car" have, as expected, very obviously different semantics. Image 2 shows that only one of the meaning contexts of "jaguar", that of "Jaguar" the car, overlaps with the meaning of Porsche (indicating partial similarity). Other meaning contexts of "jaguar" e.g. "jaguar" the animal clearly have different non-overlapping contexts. The visualization of semantic similarity using Semantic Folding bears a strong resemblance to the fMRI images produced in a research study conducted by A.G. Huth et al., where it is claimed that words are grouped in the brain by meaning. voxels, little volume segments of the brain, were found to follow a pattern were semantic information is represented along the boundary of the visual cortex with visual and linguistic categories represented on posterior and anterior side respectively.

    Read more →
  • Color gradient

    Color gradient

    In color science, a color gradient (also known as a color ramp or a color progression) specifies a range of position-dependent colors, usually used to fill a region. In assigning colors to a set of values, a gradient is a continuous colormap, a type of color scheme. In computer graphics, the term swatch has come to mean a palette of active colors. == Definitions == Color gradient is a set of colors arranged in a linear order (ordered) A continuous colormap is a curve through a colorspace === Strict definition === A colormap is a function which associate a real value r with point c in color space C {\displaystyle C} f : [ r m i n , r m a x ] ⊂ R → C {\displaystyle f:[r_{min},r_{max}]\subset \mathbf {R} \to C} which is defined by: a colorspace C an increasing sequence of sampling points r 0 < . . . < r m ∈ [ r m i n , r m a x ] {\displaystyle r_{0}<... Read more →

  • Symbol level

    Symbol level

    In knowledge-based systems, agents choose actions based on the principle of rationality to move closer to a desired goal. The agent is able to make decisions based on knowledge it has about the world (see knowledge level). But for the agent to actually change its state, it must use whatever means it has available. This level of description for the agent's behavior is the symbol level. The term was coined by Allen Newell in 1982. For example, in a computer program, the knowledge level consists of the information contained in its data structures that it uses to perform certain actions. The symbol level consists of the program's algorithms, the data structures themselves, and so on.

    Read more →
  • JAX (software)

    JAX (software)

    JAX is a Python library for accelerator-oriented array computation and program transformation, designed for high-performance numerical computing and large-scale machine learning. It is developed by Google with contributions from Nvidia and other community contributors. It is described as bringing together a modified version of the automatic differentiation system autograd and OpenXLA's XLA (Accelerated Linear Algebra). It is designed to follow the structure and workflow of NumPy as closely as possible and works with various existing frameworks such as TensorFlow and PyTorch. The primary features of JAX are: Providing a unified NumPy-like interface to computations that run on CPU, GPU, or TPU, in local or distributed settings. Built-in Just-In-Time (JIT) compilation via OpenXLA, an open-source machine learning compiler ecosystem. Efficient evaluation of gradients via its automatic differentiation transformations. Automatic vectorization to efficiently map functions over arrays representing batches of inputs. == Libraries using Jax == Flax Equinox Optax

    Read more →
  • AI data center

    AI data center

    An AI data center is a specialized data center facility designed for the computationally intensive tasks of training and running inference for artificial intelligence (AI) and machine learning models. Unlike general-purpose data centers, they are optimized for the parallel processing demands of AI workloads, typically using hardware such as AI accelerators (e.g., GPUs, TPUs) and high-speed interconnects. The global push to construct these specialized facilities accelerated dramatically during the AI boom of the 2020s. Memory manufacturers prioritized production of High Bandwidth Memory (HBM) essential for AI servers, which led to a global memory supply shortage amid a broader competition for advanced chips, power, and infrastructure. Major tech companies are estimated to spend $650 billion on AI data centers in 2026. == Architecture == Data centers for building and running large machine learning models contain specialized computer chips, GPUs, that use 2 to 4 times as much energy as their regular CPU counterparts (250-500 watts). AI data centers use 60 or more kilowatts per server rack, whereas more standard data centers typically use 5 to 10 kilowatts per rack. == Operators == As of August 2025, The Information tracked 18 planned or existing AI data centers in the United States, operated by Amazon Web Services, CoreWeave, Crusoe, Meta, Microsoft/OpenAI, Oracle, Tesla, and xAI. Other AI data center operators include Digital Realty and Alibaba. Data centers are also being built in China, India, Europe, Saudi Arabia, and Canada. The New Yorker described CoreWeave as the most prominent AI data center operator in the United States. Two types of data center providers for machine learning have been noted: hyperscalers and neoclouds. The Verge listed large technology companies such as Google, Meta, Microsoft, Oracle and Amazon as hyperscalers. The New York Times described neoclouds as "a new generation of data center providers". CoreWeave, Nebius, Nscale, and Lambda have been described as examples of neoclouds. In January 2025, OpenAI, in partnership with Oracle and Softbank, announced the Stargate project, which as of September 2025 is composed of six built or proposed AI data centers in the United States. In response to the Stargate project, Amazon launched in October 2025 an AI data center on 1,200 acres of farmland in Indiana. This data center, known as Project Rainier, is one of the largest AI data centers in the world, with Amazon spending $11 billion on the project. Rainier is specifically intended for training and running machine learning models from Anthropic. As of that time, this facility contains seven data centers (out of an estimated 30 planned) and will use 2.2 gigawatts of electricity (equivalent to 1 million households) and millions of gallons of water per year. Computer chips from Annapurna Labs and Anthropic, Trainium 2, were designed for use in such facilities. Amazon pumped millions of gallons of water out of the ground to construct the data center, and as of June 2025, Indiana state officials are investigating whether this dewatering process led to dry wells for local residents. In November 2025, Anthropic announced a plan in partnership with Fluidstack to develop artificial intelligence infrastructure in the United States, including data centers in New York and Texas, worth $50 billion. Other AI data center projects include the Colossus supercomputer from xAI, a Louisiana-based project from Meta, Hyperion, expected to use 5 GW of power, and a second Ohio-based Meta project, Prometheus, with a capacity of 1 GW. A 3,200-acre AI data center, capable of 4.4-4.5 GW of power and located on the decommissioned Homer City Generating Station, is under construction as of 2025, and will use seven 30-acre gas generating stations supplied by EQT. As of December 2025, CRH is working on over 100 data centers in the United States. In 2025, ExxonMobil and NextEra announced plans to build a data center powered by natural gas and using carbon capture technology, with 1.2 GW of power capacity. They previously purchased 2,500 acres of land in the Southeastern United States and plan to market the data center to an artificial intelligence company. The increased interest in AI data centers has led to several executives from companies in that space becoming billionaires, including CoreWeave, QTS, Nebius, Astera Labs, Groq, Fermi (which is connected to former United States Secretary of Energy Rick Perry), Snowflake and Cipher Mining. Several companies involved in cryptocurrency mining, such as Bitdeer, CoreWeave, Cipher Mining, TeraWulf, IREN, Core Scientific, and CleanSpark have also been involved with AI data centers. == Finances == Between January and August 2024, Microsoft, Meta, Google and Amazon collectively spent $125 billion on AI data centers. Citigroup forecasted that $2.8 trillion would be spent on AI data centers by 2030, while McKinsey and Company estimated that almost $7 trillion would be spent globally by that time. According to S&P Global, $61 billion has been spent on the data center market as a whole in 2025, while debt issuance for data centers was $182 billion during the same year. Large technology companies have offloaded the financial risks of building AI data centers by setting up special purpose vehicles or by contracting with neoclouds. For example, Meta's Hyperion was mostly funded by Blue Owl Capital, which did so using a bond offering from PIMCO. Those bonds were sold to a number of clients, including BlackRock. Meta did not borrow money itself and instead established a special purpose vehicle from which it would rent the data center. This deal was structured by Morgan Stanley for $30 billion, the largest known private capital transaction as of 2025. Neoclouds such as CoreWeave have gone into debt to buy computer chips from Nvidia for their data centers, and the chips themselves have been used for loan collateral. As of December 2025, CoreWeave took out three GPU-backed loans, collectively worth $12.4 billion, from private credit firms (Blackstone, Coatue, BlackRock, PIMCO) and from banks (Goldman Sachs, JPMorgan Chase, Wells Fargo). Thus, these companies provide an indirect connection between private credit and established banks. Data centers have also established asset-backed securities, and debt for data centers has its own derivative financial products. The real estate industry, including asset managers, public companies and private investors, has also invested in data centers. == Energy sourcing == == Environmental footprint == Average AI data centers have an electricity footprint equivalent to 100,000 households, and use billions of gallons of water for cooling their hardware. In 2025, the International Energy Agency estimated that the larger AI data centers currently under construction could consume as much electricity as 2 million households. A 2024 report from the United States Department of Energy stated that data centers overall used 17 billion gallons of water per year in the United States, primarily due to "rapid proliferation of AI servers", and that this usage was forecasted to grow to nearly 80 billion gallons by 2028. Researchers estimated that AI data centers in the United States would emit 24-44 million metric tons of carbon dioxide and use 731–1,125 million cubic meters of water per year between 2024 and 2030. Peaking power plants, which have been proposed as a power source for AI data centers, emit sulfur dioxide and have historically been located disproportionately near communities of color in the United States. Reciprocating internal combustion engines, proposed as another power source for a data center, emit PM 2.5, nitrogen oxides, and volatile organic compounds. == AI data centers in the United States == In the United States, both the Biden administration and second Trump administration supported the construction of AI data centers. In January 2025, then-president Joe Biden signed an executive order for federal government agencies to support AI data centers on federal sites built by private companies, study their effect on energy prices, and encourage their use of renewable energy. In April 2025, the United States Department of Energy suggested 16 possible sites, including Los Alamos National Laboratory, Sandia National Laboratories and Oak Ridge National Laboratory. In its July 2025 AI Action Plan, the second Trump administration supported increased production of AI data centers. Several US states have incentivized local data center construction. For example, in 2024, lawmakers in Michigan approved tax breaks for data center equipment and construction material. Some data center companies have also invested or promised to invest in the infrastructure of local communities. In December 2025, Democratic senators Elizabeth Warren, Chris Van Hollen, and Richard Blumenthal wrote to seven technology companies (Google, Microsoft, Amazon, Meta, CoreWeave, Digital Realty, and Equinix) that they w

    Read more →
  • OpenFog Consortium

    OpenFog Consortium

    The OpenFog Consortium (sometimes stylized as Open Fog Consortium) was a consortium of high tech industry companies and academic institutions across the world aimed at the standardization and promotion of fog computing in various capacities and fields. The consortium was founded by Cisco Systems, Intel, Microsoft, Princeton University, Dell, and ARM Holdings in 2015 and now has 57 members across the North America, Asia, and Europe, including Forbes 500 companies and noteworthy academic institutions. The OpenFog consortium merged with the Industrial Internet Consortium, now the Industry IoT Consortium, on January 31, 2019. == History == OpenFog was created on November 19, 2015, by ARM Holdings, Cisco Systems, Dell, Intel, Microsoft, and Princeton University. The idea for a consortium centered on the advancement and dissemination of fog computing was thought up by Helder Antunes, a Cisco executive with a history in IoT, Mung Chiang, then a Princeton University professor and now President of Purdue University, and Dr. Tao Zhang, a Cisco Distinguished Engineer and CIO for the IEEE Communications Society then and now a manager at the National Institute of Standards and Technologies (NIST). The project was executed from concept to launch by Armando Pereira at PVentures Consulting, a Silicon Valley–based high-tech consulting firm. OpenFog released its reference architecture for fog computing on February 13, 2017. The Fog World Congress 2017, with Dr. Tao Zhang as its General Chair, was hosted in October 2017 by OpenFog, in conjunction with the IEEE Communications Society, as the first congress devoted to fog computing. == Administration == The OpenFog Consortium was governed by its board of directors, which is chaired by Cisco Senior Director Helder Antunes. The board of directors is made up of 11 seats, each representing one of the following companies and institutions: ARM, AT&T, Cisco, Dell, Intel, Microsoft, Princeton University, IEEE, GE, ZTE and Shanghai Tech University. The consortium's general membership comprised 13 academic members: Aalto University, Arizona State University, California Institute of Technology, Georgia State University, National Chiao Tung University, National Taiwan University, Shanghai Research Centre for Wireless Communication, Chinese University of Hong Kong, University of Colorado Boulder, University of Southern California, University of Pisa, Vanderbilt University, Wayne State University, and 20 additional members: Hitachi, Internet Initiative Japan, Itochu, Kii, Nebbiolo, PrismTech, NEC, NGD Systems, NTT Communications, OSIsoft, Real-time Innovations, relayr, Sakura Internet, Stichting imec Nederland, Toshiba, TTT Tech, Fujitsu, FogHorn Systems, TTTech and MARSEC. == Published work == The OpenFog Consortium published the white paper, "OpenFog Reference Architecture". This document outlines the eight pillars of an OpenFog architecture:Security; Scalability; Open; Autonomy; Programmability; RAS (reliability, availability and serviceability); Agility; and Hierarchy. It also incorporates a glossary for fog computing terms. In July 2018, the IEEE Standards Association announced it had adopted the OpenFog Reference Architecture as the first standard for fog computing.

    Read more →
  • Spike-and-slab regression

    Spike-and-slab regression

    Spike-and-slab regression is a type of Bayesian linear regression in which a particular hierarchical prior distribution for the regression coefficients is chosen such that only a subset of the possible regressors is retained. The technique is particularly useful when the number of possible predictors is larger than the number of observations. The idea of the spike-and-slab model was originally proposed by Mitchell & Beauchamp (1988). The approach was further significantly developed by Madigan & Raftery (1994) and George & McCulloch (1997). A recent and important contribution to this literature is Ishwaran & Rao (2005). == Model description == Suppose we have P possible predictors in some model. Vector γ has a length equal to P and consists of zeros and ones. This vector indicates whether a particular variable is included in the regression or not. If no specific prior information on initial inclusion probabilities of particular variables is available, a Bernoulli prior distribution is a common default choice. Conditional on a predictor being in the regression, we identify a prior distribution for the model coefficient, which corresponds to that variable (β). A common choice on that step is to use a normal prior with a mean equal to zero and a large variance calculated based on ( X T X ) − 1 {\displaystyle (X^{T}X)^{-1}} (where X {\displaystyle X} is a design matrix of explanatory variables of the model). A draw of γ from its prior distribution is a list of the variables included in the regression. Conditional on this set of selected variables, we take a draw from the prior distribution of the regression coefficients (if γi = 1 then βi ≠ 0 and if γi = 0 then βi = 0). βγ denotes the subset of β for which γi = 1. In the next step, we calculate a posterior probability for both inclusion and coefficients by applying a standard statistical procedure. All steps of the described algorithm are repeated thousands of times using the Markov chain Monte Carlo (MCMC) technique. As a result, we obtain a posterior distribution of γ (variable inclusion in the model), β (regression coefficient values) and the corresponding prediction of y. The model got its name (spike-and-slab) due to the shape of the two prior distributions. The "spike" is the probability of a particular coefficient in the model to be zero. The "slab" is the prior distribution for the regression coefficient values. An advantage of Bayesian variable selection techniques is that they are able to make use of prior knowledge about the model. In the absence of such knowledge, some reasonable default values can be used; to quote Scott and Varian (2013): "For the analyst who prefers simplicity at the cost of some reasonable assumptions, useful prior information can be reduced to an expected model size, an expected R2, and a sample size ν determining the weight given to the guess at R2." Some researchers suggest the following default values: R2 = 0.5, ν = 0.01, and π = 0.5 (parameter of a prior Bernoulli distribution).

    Read more →
  • List of artificial intelligence journals

    List of artificial intelligence journals

    This is a list of notable peer-reviewed academic journals that publish research in the field of artificial intelligence (AI), including areas such as machine learning, computer vision, natural language processing, robotics, and intelligent systems. == General artificial intelligence == Artificial Intelligence (journal) – Elsevier Journal of Artificial Intelligence Research (JAIR) – AI Access Foundation Knowledge-Based Systems – Elsevier == Machine learning == Data Mining and Knowledge Discovery – Springer Machine Learning (journal) – Springer Journal of Machine Learning Research – Microtome Pattern Recognition (journal) – Elsevier Neural Networks (journal) – Elsevier Neural Computation (journal) – MIT Press Neurocomputing (journal) - Elsevier == Deep learning and neural computation == IEEE Transactions on Evolutionary Computation – IEEE IEEE Transactions on Neural Networks and Learning Systems – IEEE Nature Machine Intelligence – Springer Nature == Computer vision == International Journal of Computer Vision – Springer IEEE Transactions on Pattern Analysis and Machine Intelligence – IEEE Machine Vision and Applications – Springer == Natural language processing == Computational Linguistics (journal) – MIT Press Natural Language Processing Transactions of the Association for Computational Linguistics – ACL == Robotics and intelligent systems == IEEE Transactions on Robotics – IEEE Autonomous Robots – Springer Journal of Intelligent & Robotic Systems – Springer == Interdisciplinary and ethics in AI == AI & Society – Springer Artificial Life – MIT Press Philosophy & Technology – Springer Minds and Machines – Springer

    Read more →
  • Something Big Is Happening

    Something Big Is Happening

    "Something Big Is Happening" is an essay by Matt Shumer, an AI entrepreneur, about the impact of artificial intelligence, published in February 2026, that has since been reportedly viewed more than 80 million times and widely discussed. Shumer noted that the technology has crossed an important threshold, where AI has become capable of creating self-improving systems. Referring to one the most recent AI models, he wrote: "It was making intelligent decisions. It had something that felt, for the first time, like judgment. Like taste." Speaking to CNBC's Power Lunch, Shumer said that his "core message" is "people in the workforce should start to use and experiment with AI tools so they can understand what’s coming". Even as the essay was widely shared and discussed, the essay also elicited criticism. Paulo Carvao, in an essay published by the Forbes Magazine stated that some of his advice is sound, but added: "It reads at times like a sales pitch. He urges readers to subscribe to the most advanced AI tools. He implies that those with access to premium models will outpace those without. He frames paid AI subscriptions as a form of insurance against obsolescence." Writing in The Guardian, Dan Milmo and Aisha Down mentioned Shumer as having a history of AI hype and stated, "He previously excited the internet by announcing the release of the world's "top open-source model", which it was not". Many workers in the technology sector criticized the article in blog posts shared on Hacker News; Edward Zitron commented that "while coding LLMs can test products, or scan/fix some bugs, this suggests they A) do this autonomously without human input, B) they do this correctly every time (or ever!)." In an article alluding to Shumer's original post, Ari Colaprete wrote "the LLM is fundamentally a writing machine, it does everything via text, and if you make it produce writing that exists purely to serve some sort of mechanical function, and you train it to succeed in that task, then it will tend to do so, even with vast intricacy."

    Read more →
  • CloudMinds

    CloudMinds

    CloudMinds is an operator of cloud-based systems for cognitive robotics. == History == CloudMinds was founded in 2015 and is backed by SoftBank, Foxconn, Walden Venture Investments, and Keytone Ventures. CloudMinds has developed research in smart devices, robot control, high-speed security networks, and cloud intelligence integration. CloudMinds developed the Mobile Intranet Cloud Services (MCS) based on these technologies in order to increase the information security of the cloud robot remote control. The technology has been applied in the fields of finance, medicine, the military, public safety, and large-scale manufacturing. == U.S. sanctions == In May 2020, CloudMinds was added to the Bureau of Industry and Security's Entity List due to U.S. national security concerns.

    Read more →
  • Learning rate

    Learning rate

    In machine learning and statistics, the learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function. Since it influences to what extent newly acquired information overrides old information, it metaphorically represents the speed at which a machine learning model "learns". In the adaptive control literature, the learning rate is commonly referred to as gain. In setting a learning rate, there is a trade-off between the rate of convergence and overshooting. While the descent direction is usually determined from the gradient of the loss function, the learning rate determines how big a step is taken in that direction. Too high a learning rate will make the learning jump over minima, but too low a learning rate will either take too long to converge or get stuck in an undesirable local minimum. In order to achieve faster convergence, prevent oscillations and getting stuck in undesirable local minima the learning rate is often varied during training either in accordance to a learning rate schedule or by using an adaptive learning rate. The learning rate and its adjustments may also differ per parameter, in which case it is a diagonal matrix that can be interpreted as an approximation to the inverse of the Hessian matrix in Newton's method. The learning rate is related to the step length determined by inexact line search in quasi-Newton methods and related optimization algorithms. == Learning rate schedule == Initial rate can be left as system default or can be selected using a range of techniques. A learning rate schedule changes the learning rate during learning and is most often changed between epochs/iterations. This is mainly done with two parameters: decay and momentum. There are many different learning rate schedules but the most common are time-based, step-based and exponential. Decay serves to settle the learning in a nice place and avoid oscillations, a situation that may arise when too high a constant learning rate makes the learning jump back and forth over a minimum, and is controlled by a hyperparameter. Momentum is analogous to a ball rolling down a hill; we want the ball to settle at the lowest point of the hill (corresponding to the lowest error). Momentum both speeds up the learning (increasing the learning rate) when the error cost gradient is heading in the same direction for a long time and also avoids local minima by 'rolling over' small bumps. Momentum is controlled by a hyperparameter analogous to a ball's mass which must be chosen manually—too high and the ball will roll over minima which we wish to find, too low and it will not fulfil its purpose. The formula for factoring in the momentum is more complex than for decay but is most often built in with deep learning libraries such as Keras. Time-based learning schedules alter the learning rate depending on the learning rate of the previous time iteration. Factoring in the decay the mathematical formula for the learning rate is: η n + 1 = η 0 1 + d n {\displaystyle \eta _{n+1}={\frac {\eta _{0}}{1+dn}}} where η {\displaystyle \eta } is the learning rate, η 0 {\displaystyle \eta _{0}} is the original learning rate, d {\displaystyle d} is a decay parameter and n {\displaystyle n} is the iteration step. Step-based learning schedules changes the learning rate according to some predefined steps. The decay application formula is here defined as: η n = η 0 d ⌊ 1 + n r ⌋ {\displaystyle \eta _{n}=\eta _{0}d^{\left\lfloor {\frac {1+n}{r}}\right\rfloor }} where η n {\displaystyle \eta _{n}} is the learning rate at iteration n {\displaystyle n} , η 0 {\displaystyle \eta _{0}} is the initial learning rate, d {\displaystyle d} is how much the learning rate should change at each drop (0.5 corresponds to a halving) and r {\displaystyle r} corresponds to the drop rate, or how often the rate should be dropped (10 corresponds to a drop every 10 iterations). The floor function ( ⌊ … ⌋ {\displaystyle \lfloor \dots \rfloor } ) here drops the value of its input to 0 for all values smaller than 1. Exponential learning schedules are similar to step-based, but instead of steps, a decreasing exponential function is used. The mathematical formula for factoring in the decay is: η n = η 0 e − d n {\displaystyle \eta _{n}=\eta _{0}e^{-dn}} where d {\displaystyle d} is a decay parameter. == Adaptive learning rate == The issue with learning rate schedules is that they all depend on hyperparameters that must be manually chosen for each given learning session and may vary greatly depending on the problem at hand or the model used. To combat this, there are many different types of adaptive gradient descent algorithms such as Adagrad, Adadelta, RMSprop, and Adam which are generally built into deep learning libraries such as Keras.

    Read more →
  • Feature engineering

    Feature engineering

    Feature engineering is a preprocessing step in supervised machine learning and statistical modeling which transforms raw data into a more effective set of inputs. Each input comprises several attributes, known as features. By providing models with relevant information, feature engineering significantly enhances their predictive accuracy and decision-making capability. Beyond machine learning, the principles of feature engineering are applied in various scientific fields, including physics. For example, physicists construct dimensionless numbers such as the Reynolds number in fluid dynamics, the Nusselt number in heat transfer, and the Archimedes number in sedimentation. They also develop first approximations of solutions, such as analytical solutions for the strength of materials in mechanics. == Clustering == One of the applications of feature engineering has been clustering of feature-objects or sample-objects in a dataset. Especially, feature engineering based on matrix decomposition has been extensively used for data clustering under non-negativity constraints on the feature coefficients. These include Non-Negative Matrix Factorization (NMF), Non-Negative Matrix-Tri Factorization (NMTF), Non-Negative Tensor Decomposition/Factorization (NTF/NTD), etc. The non-negativity constraints on coefficients of the feature vectors mined by the above-stated algorithms yields a part-based representation, and different factor matrices exhibit natural clustering properties. Several extensions of the above-stated feature engineering methods have been reported in literature, including orthogonality-constrained factorization for hard clustering, and manifold learning to overcome inherent issues with these algorithms. Other classes of feature engineering algorithms include leveraging a common hidden structure across multiple inter-related datasets to obtain a consensus (common) clustering scheme. An example is Multi-view Classification based on Consensus Matrix Decomposition (MCMD), which mines a common clustering scheme across multiple datasets. MCMD is designed to output two types of class labels (scale-variant and scale-invariant clustering), and: is computationally robust to missing information, can obtain shape- and scale-based outliers, and can handle high-dimensional data effectively. Coupled matrix and tensor decompositions are popular in multi-view feature engineering. == Predictive modelling == Feature engineering in machine learning and statistical modeling involves selecting, creating, transforming, and extracting data features. Key components include feature creation from existing data, transforming and imputing missing or invalid features, reducing data dimensionality through methods like Principal Components Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA), and selecting the most relevant features for model training based on importance scores and correlation matrices. Features vary in significance. Even relatively insignificant features may contribute to a model. Feature selection can reduce the number of features to prevent a model from becoming too specific to the training data set (overfitting). Feature explosion occurs when the number of identified features is too large for effective model estimation or optimization. Common causes include: Feature templates - implementing feature templates instead of coding new features Feature combinations - combinations that cannot be represented by a linear system Feature explosion can be limited via techniques such as regularization, kernel methods, and feature selection. == Automation == Automation of feature engineering is a research topic that dates back to the 1990s. Machine learning software that incorporates automated feature engineering has been commercially available since 2016. Related academic literature can be roughly separated into two types: Multi-relational Decision Tree Learning (MRDTL) uses a supervised algorithm that is similar to a decision tree. Deep Feature Synthesis uses simpler methods. === Multi-relational Decision Tree Learning (MRDTL) === Multi-relational Decision Tree Learning (MRDTL) extends traditional decision tree methods to relational databases, handling complex data relationships across tables. It innovatively uses selection graphs as decision nodes, refined systematically until a specific termination criterion is reached. Most MRDTL studies base implementations on relational databases, which results in many redundant operations. These redundancies can be reduced by using techniques such as tuple id propagation. === Open-source implementations === There are a number of open-source libraries and tools that automate feature engineering on relational data and time series: featuretools is a Python library for transforming time series and relational data into feature matrices for machine learning. MCMD: An open-source feature engineering algorithm for joint clustering of multiple datasets. OneBM or One-Button Machine combines feature transformations and feature selection on relational data with feature selection techniques. OneBM helps data scientists reduce data exploration time allowing them to try and error many ideas in short time. On the other hand, it enables non-experts, who are not familiar with data science, to quickly extract value from their data with a little effort, time, and cost. getML community is an open source tool for automated feature engineering on time series and relational data. It is implemented in C/C++ with a Python interface. It has been shown to be at least 60 times faster than tsflex, tsfresh, tsfel, featuretools or kats. tsfresh is a Python library for feature extraction on time series data. It evaluates the quality of the features using hypothesis testing. tsflex is an open source Python library for extracting features from time series data. Despite being 100% written in Python, it has been shown to be faster and more memory efficient than tsfresh, seglearn or tsfel. seglearn is an extension for multivariate, sequential time series data to the scikit-learn Python library. tsfel is a Python package for feature extraction on time series data. kats is a Python toolkit for analyzing time series data. === Deep feature synthesis === The deep feature synthesis (DFS) algorithm beat 615 of 906 human teams in a competition. == Feature stores == The feature store is where the features are stored and organized for the explicit purpose of being used to either train models (by data scientists) or make predictions (by applications that have a trained model). It is a central location where you can either create or update groups of features created from multiple different data sources, or create and update new datasets from those feature groups for training models or for use in applications that do not want to compute the features but just retrieve them when it needs them to make predictions. A feature store includes the ability to store code used to generate features, apply the code to raw data, and serve those features to models upon request. Useful capabilities include feature versioning and policies governing the circumstances under which features can be used. Feature stores can be standalone software tools or built into machine learning platforms. == Alternatives == Feature engineering can be a time-consuming and error-prone process, as it requires domain expertise and often involves trial and error. Deep learning algorithms may be used to process a large raw dataset without having to resort to feature engineering. However, deep learning algorithms still require careful preprocessing and cleaning of the input data. In addition, choosing the right architecture, hyperparameters, and optimization algorithm for a deep neural network can be a challenging and iterative process.

    Read more →