AI Chat Free Unlimited

AI Chat Free Unlimited — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Space partitioning

    Space partitioning

    In geometry, space partitioning is the process of dividing an entire space (usually a Euclidean space) into two or more disjoint subsets (see also partition of a set). In other words, space partitioning divides a space into non-overlapping regions. Any point in the space can then be identified to lie in exactly one of the regions. == Overview == Space-partitioning systems are often hierarchical, meaning that a space (or a region of space) is divided into several regions, and then the same space-partitioning system is recursively applied to each of the regions thus created. The regions can be organized into a tree, called a space-partitioning tree. Most space-partitioning systems use planes (or, in higher dimensions, hyperplanes) to divide space: points on one side of the plane form one region, and points on the other side form another. Points exactly on the plane are usually arbitrarily assigned to one or the other side. Recursively partitioning space using planes in this way produces a BSP tree, one of the most common forms of space partitioning. == Uses == === In computer graphics === Space partitioning is particularly important in computer graphics, especially heavily used in ray tracing, where it is frequently used to organize the objects in a virtual scene. A typical scene may contain millions of polygons. Performing a ray/polygon intersection test with each would be a very computationally expensive task. Storing objects in a space-partitioning data structure (k-d tree or BSP tree for example) makes it easy and fast to perform certain kinds of geometry queries—for example in determining whether a ray intersects an object, space partitioning can reduce the number of intersection test to just a few per primary ray, yielding a logarithmic time complexity with respect to the number of polygons. Space partitioning is also often used in scanline algorithms to eliminate the polygons out of the camera's viewing frustum, limiting the number of polygons processed by the pipeline. There is also a usage in collision detection: determining whether two objects are close to each other can be much faster using space partitioning. === In integrated circuit design === In integrated circuit design, an important step is design rule check. This step ensures that the completed design is manufacturable. The check involves rules that specify widths and spacings and other geometry patterns. A modern design can have billions of polygons that represent wires and transistors. Efficient checking relies heavily on geometry query. For example, a rule may specify that any polygon must be at least n nanometers from any other polygon. This is converted into a geometry query by enlarging a polygon by n/2 at all sides and query to find all intersecting polygons. === In probability and statistical learning theory === The number of components in a space partition plays a central role in some results in probability theory. See Growth function for more details. === In geography and GIS === There are many studies and applications where Geographical Spatial Reality is partitioned by hydrological criteria, administrative criteria, mathematical criteria or many others. In the context of cartography and GIS - Geographic Information System, is common to identify cells of the partition by standard codes. For example the for HUC code identifying hydrographical basins and sub-basins, ISO 3166-2 codes identifying countries and its subdivisions, or arbitrary DGGs - discrete global grids identifying quadrants or locations. == Data structures == Common space-partitioning systems include: BSP trees Quadtrees Octrees k-d trees Bins == Number of components == Suppose the n-dimensional Euclidean space is partitioned by r {\displaystyle r} hyperplanes that are ( n − 1 ) {\displaystyle (n-1)} -dimensional. What is the number of components in the partition? The largest number of components is attained when the hyperplanes are in general position, i.e, no two are parallel and no three have the same intersection. Denote this maximum number of components by C o m p ( n , r ) {\displaystyle Comp(n,r)} . Then, the following recurrence relation holds: C o m p ( n , r ) = C o m p ( n , r − 1 ) + C o m p ( n − 1 , r − 1 ) {\displaystyle Comp(n,r)=Comp(n,r-1)+Comp(n-1,r-1)} C o m p ( 0 , r ) = 1 {\displaystyle Comp(0,r)=1} - when there are no dimensions, there is a single point. C o m p ( n , 0 ) = 1 {\displaystyle Comp(n,0)=1} - when there are no hyperplanes, all the space is a single component. And its solution is: C o m p ( n , r ) = ∑ k = 0 n ( r k ) {\displaystyle Comp(n,r)=\sum _{k=0}^{n}{r \choose k}} if r ≥ n {\displaystyle r\geq n} C o m p ( n , r ) = 2 r {\displaystyle Comp(n,r)=2^{r}} if r ≤ n {\displaystyle r\leq n} (consider e.g. r {\displaystyle r} perpendicular hyperplanes; each additional hyperplane divides each existing component to 2). which is upper-bounded as: C o m p ( n , r ) ≤ r n + 1 {\displaystyle Comp(n,r)\leq r^{n}+1}

    Read more →
  • Connectionist expert system

    Connectionist expert system

    Connectionist expert systems are artificial neural network (ANN) based expert systems where the ANN generates inferencing rules e.g., fuzzy-multi layer perceptron where linguistic and natural form of inputs are used. Apart from that, rough set theory may be used for encoding knowledge in the weights better and also genetic algorithms may be used to optimize the search solutions better. Symbolic reasoning methods may also be incorporated (see hybrid intelligent system). (Also see expert system, neural network, clinical decision support system.)

    Read more →
  • Kernel density estimation

    Kernel density estimation

    In statistics, kernel density estimation (KDE) is the application of kernel smoothing for probability density estimation, i.e., a non-parametric method to estimate the probability density function of a random variable based on kernels as weights. KDE answers a fundamental data smoothing problem where inferences about the population are made based on a finite data sample. In some fields such as signal processing and econometrics it is also termed the Parzen–Rosenblatt window method, after Emanuel Parzen and Murray Rosenblatt, who are usually credited with independently creating it in its current form. One of the famous applications of kernel density estimation is in estimating the class-conditional marginal densities of data when using a naive Bayes classifier, which can improve its prediction accuracy. == Definition == Let x = ( x 1 , x 2 , x 3 , . . . ) {\displaystyle \mathbf {x} =\left(x_{1},x_{2},x_{3},...\right)} be independent and identically distributed samples drawn from some univariate distribution with an unknown density f at any given point x. We are interested in estimating the shape of this function f. Its kernel density estimator is f ^ h ( x ) = 1 n ∑ i = 1 n K h ( x − x i ) = 1 n h ∑ i = 1 n K ( x − x i h ) , {\displaystyle {\hat {f}}_{h}(x)={\frac {1}{n}}\sum _{i=1}^{n}K_{h}(x-x_{i})={\frac {1}{nh}}\sum _{i=1}^{n}K{\left({\frac {x-x_{i}}{h}}\right)},} where K is the kernel — a non-negative function — and h > 0 is a smoothing parameter called the bandwidth or simply width. A kernel with subscript h is called the scaled kernel and defined as Kh(x) = ⁠1/h⁠ K(⁠x/h⁠). Intuitively one wants to choose h as small as the data will allow; however, there is always a trade-off between the bias of the estimator and its variance. The choice of bandwidth is discussed in more detail below. A range of kernel functions are commonly used: uniform, triangular, biweight, triweight, Epanechnikov (parabolic), normal, and others. The Epanechnikov kernel is optimal in a mean square error sense, though the loss of efficiency is small for the kernels listed previously. Due to its convenient mathematical properties, the normal kernel is often used, which means K(x) = ϕ(x), where ϕ is the standard normal density function. The kernel density estimator then becomes f ^ h ( x ) = 1 n ∑ i = 1 n 1 h 2 π exp ⁡ ( − ( x − x i ) 2 2 h 2 ) , {\displaystyle {\hat {f}}_{h}(x)={\frac {1}{n}}\sum _{i=1}^{n}{\frac {1}{h{\sqrt {2\pi }}}}\exp \left({\frac {-(x-x_{i})^{2}}{2h^{2}}}\right),} where h {\displaystyle h} is the standard deviation of the sample x {\displaystyle \mathbf {x} } . The construction of a kernel density estimate finds interpretations in fields outside of density estimation. For example, in thermodynamics, this is equivalent to the amount of heat generated when heat kernels (the fundamental solution to the heat equation) are placed at each data point locations xi. Similar methods are used to construct discrete Laplace operators on point clouds for manifold learning (e.g. diffusion map). == Example == Kernel density estimates are closely related to histograms, but can be endowed with properties such as smoothness or continuity by using a suitable kernel. The diagram below based on these 6 data points illustrates this relationship: For the histogram, first, the horizontal axis is divided into sub-intervals or bins which cover the range of the data: In this case, six bins each of width 2. Whenever a data point falls inside this interval, a box of height 1/12 is placed there. If more than one data point falls inside the same bin, the boxes are stacked on top of each other. For the kernel density estimate, normal kernels with a standard deviation of 1.5 (indicated by the red dashed lines) are placed on each of the data points xi. The kernels are summed to make the kernel density estimate (solid blue curve). The smoothness of the kernel density estimate (compared to the discreteness of the histogram) illustrates how kernel density estimates converge faster to the true underlying density for continuous random variables. == Bandwidth selection == The bandwidth of the kernel is a free parameter which exhibits a strong influence on the resulting estimate. To illustrate its effect, we take a simulated random sample from the standard normal distribution (plotted at the blue spikes in the rug plot on the horizontal axis). The grey curve is the true density (a normal density with mean 0 and variance 1). In comparison, the red curve is undersmoothed since it contains too many spurious data artifacts arising from using a bandwidth h = 0.05, which is too small. The green curve is oversmoothed since using the bandwidth h = 2 obscures much of the underlying structure. The black curve with a bandwidth of h = 0.337 is considered to be optimally smoothed since its density estimate is close to the true density. An extreme situation is encountered in the limit h → 0 {\displaystyle h\to 0} (no smoothing), where the estimate is a sum of n delta functions centered at the coordinates of analyzed samples. In the other extreme limit h → ∞ {\displaystyle h\to \infty } the estimate retains the shape of the used kernel, centered on the mean of the samples (completely smooth). The most common optimality criterion used to select this parameter is the expected L2 risk function, also termed the mean integrated squared error: MISE ⁡ ( h ) = E [ ∫ ( f ^ h ( x ) − f ( x ) ) 2 d x ] {\displaystyle \operatorname {MISE} (h)=\operatorname {E} \!\left[\int \!{\left({\hat {f}}\!_{h}(x)-f(x)\right)}^{2}dx\right]} Under weak assumptions on f and K, (f is the, generally unknown, real density function), MISE ⁡ ( h ) = AMISE ⁡ ( h ) + o ( ( n h ) − 1 + h 4 ) {\displaystyle \operatorname {MISE} (h)=\operatorname {AMISE} (h)+{\mathcal {o}}{\left((nh)^{-1}+h^{4}\right)}} where o is the little o notation, and n the sample size (as above). The AMISE is the asymptotic MISE, i. e. the two leading terms, AMISE ⁡ ( h ) = R ( K ) n h + 1 4 m 2 ( K ) 2 h 4 R ( f ″ ) {\displaystyle \operatorname {AMISE} (h)={\frac {R(K)}{nh}}+{\frac {1}{4}}m_{2}(K)^{2}h^{4}R(f'')} where R ( g ) = ∫ g ( x ) 2 d x {\textstyle R(g)=\int g(x)^{2}\,dx} for a function g, m 2 ( K ) = ∫ x 2 K ( x ) d x {\textstyle m_{2}(K)=\int x^{2}K(x)\,dx} and f ″ {\displaystyle f''} is the second derivative of f {\displaystyle f} and K {\displaystyle K} is the kernel. The minimum of this AMISE is the solution to this differential equation ∂ ∂ h AMISE ⁡ ( h ) = − R ( K ) n h 2 + m 2 ( K ) 2 h 3 R ( f ″ ) = 0 {\displaystyle {\frac {\partial }{\partial h}}\operatorname {AMISE} (h)=-{\frac {R(K)}{nh^{2}}}+m_{2}(K)^{2}h^{3}R(f'')=0} or h AMISE = R ( K ) 1 / 5 m 2 ( K ) 2 / 5 R ( f ″ ) 1 / 5 n − 1 / 5 = C n − 1 / 5 {\displaystyle h_{\operatorname {AMISE} }={\frac {R(K)^{1/5}}{m_{2}(K)^{2/5}R(f'')^{1/5}}}n^{-1/5}=Cn^{-1/5}} Neither the AMISE nor the hAMISE formulas can be used directly since they involve the unknown density function f {\displaystyle f} or its second derivative f ″ {\displaystyle f''} . To overcome that difficulty, a variety of automatic, data-based methods have been developed to select the bandwidth. Several review studies have been undertaken to compare their efficacies, with the general consensus that the plug-in selectors and cross validation selectors are the most useful over a wide range of data sets. Substituting any bandwidth h which has the same asymptotic order n−1/5 as hAMISE into the AMISE gives that AMISE(h) = O(n−4/5), where O is the big O notation. It can be shown that, under weak assumptions, there cannot exist a non-parametric estimator that converges at a faster rate than the kernel estimator. Note that the n−4/5 rate is slower than the typical n−1 convergence rate of parametric methods. If the bandwidth is not held fixed, but is varied depending upon the location of either the estimate (balloon estimator) or the samples (pointwise estimator), this produces a particularly powerful method termed adaptive or variable bandwidth kernel density estimation. Bandwidth selection for kernel density estimation of heavy-tailed distributions is relatively difficult. === A rule-of-thumb bandwidth estimator === If Gaussian basis functions are used to approximate univariate data, and the underlying density being estimated is Gaussian, the optimal choice for h (that is, the bandwidth that minimises the mean integrated squared error) is: h = ( 4 σ ^ 5 3 n ) 1 / 5 ≈ 1.06 σ ^ n − 1 / 5 , {\displaystyle h={\left({\frac {4{\hat {\sigma }}^{5}}{3n}}\right)}^{1/5}\approx 1.06\,{\hat {\sigma }}\,n^{-1/5},} An h {\displaystyle h} value is considered more robust when it improves the fit for long-tailed and skewed distributions or for bimodal mixture distributions. This is often done empirically by replacing the standard deviation σ ^ {\displaystyle {\hat {\sigma }}} by the parameter A {\displaystyle A} below: A = min ( σ ^ , I Q R 1.34 ) {\displaystyle A=\min \left({\hat {\sigma }},{\frac {\mathrm {IQR} }{1.34}}\right)} where IQR is the

    Read more →
  • Autonomous agent

    Autonomous agent

    An autonomous agent is an artificial intelligence (AI) system that can perform complex tasks independently. == Definitions == There are various definitions of autonomous agent. According to Brustoloni (1991): "Autonomous agents are systems capable of autonomous, purposeful action in the real world." According to Maes (1995): "Autonomous agents are computational systems that inhabit some complex dynamic environment, sense and act autonomously in this environment, and by doing so realize a set of goals or tasks for which they are designed." Franklin and Graesser (1997) review different definitions and propose their definition: "An autonomous agent is a system situated within and a part of an environment that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to effect what it senses in the future." They explain that: "Humans and some animals are at the high end of being an agent, with multiple, conflicting drives, multiples senses, multiple possible actions, and complex sophisticated control structures. At the low end, with one or two senses, a single action, and an absurdly simple control structure we find a thermostat." == Agent appearance == Lee et al. (2015) post safety issue from how the combination of external appearance and internal autonomous agent have impact on human reaction about autonomous vehicles. Their study explores the human-like appearance agent and high level of autonomy are strongly correlated with social presence, intelligence, safety and trustworthiness. In specific, appearance impacts most on affective trust while autonomy impacts most on both affective and cognitive domain of trust where cognitive trust is characterized by knowledge-based factors and affective trust is largely emotion driven. == Applications == Agentic AI systems: Advanced AI agents that can scope out projects and complete them with necessary tools, representing a significant evolution from simple task-oriented systems. Internet of things (IoT) Integration: Autonomous agents increasingly interact with IoT devices, enabling smart home systems, industrial monitoring, and urban infrastructure management. Collaborative software development: Tools like Cognition AI's Devin aim to create autonomous software engineers capable of complex reasoning, planning, and completing engineering tasks requiring thousands of decisions. Enterprise automation: Business process automation platforms like Salesforce's Agentforce provide autonomous bots for various service functions. == Challenges and considerations == Uncertainty and incomplete information: Autonomous agents must make decisions with limited or uncertain information about their environment and future states. Integration complexity: Incorporating autonomous agents into existing systems and workflows can be technically challenging and resource-intensive. Scalability: As systems become more complex and more agents are used, maintaining coordination and avoiding conflicts becomes increasingly difficult. Trust: Research has shown the combination of external appearance and internal autonomous capabilities significantly impacts human reactions and trust. Lee et al. (2015) found that human-like appearance and high levels of autonomy are strongly correlated with social presence, intelligence, safety, and trustworthiness perceptions. Specifically, appearance impacts affective trust most significantly, while autonomy affects both affective and cognitive trust domains, where affective trust is emotionally driven, and cognitive trust is characterized by knowledge-based factors. Vulnerability to manipulation: Researchers from Harvard, MIT and other educational institutions found that AI agents could become vulnerable to manipulation and could perform detrimental actions in the process of being helpful. == Ethical and regulatory concerns == Accountability: Determining responsibility when autonomous agents make incorrect or harmful decisions remains a complex issue. Privacy and security: autonomous agents often require access to sensitive data, raising concerns about data protection and system security.

    Read more →
  • Cloud load balancing

    Cloud load balancing

    Cloud load balancing is a type of load balancing that is performed in cloud computing. Cloud load balancing is the process of distributing workloads across multiple computing resources. Cloud load balancing reduces costs associated with document management systems and maximizes availability of resources. It is a type of load balancing and not to be confused with Domain Name System (DNS) load balancing. While DNS load balancing uses software or hardware to perform the function, cloud load balancing uses services offered by various computer network companies. == Comparison With DNS load balancing == Cloud load balancing has an advantage over DNS load balancing as it can transfer loads to servers globally as opposed to distributing it across local servers. In the event of a local server outage, cloud load balancing delivers users to the closest regional server without interruption for the user. Cloud load balancing addresses issues relating to TTL reliance present during DNS load balancing. DNS directives can only be enforced once in every TTL cycle and can take several hours if switching between servers during a lag or server failure. Incoming server traffic will continue to route to the original server until the TTL expires and can create an uneven performance as different internet service providers may reach the new server before other internet service providers. Another advantage is that cloud load balancing improves response time by routing remote sessions to the best performing data centers. == Importance of Load Balancing == Cloud computing brings advantages in "cost, flexibility and availability of service users." Those advantages drive the demand for Cloud services. The demand raises technical issues in Service Oriented Architectures and Internet of Services (IoS)-style applications, such as high availability and scalability. As a major concern in these issues, load balancing allows cloud computing to "scale up to increasing demands" by efficiently allocating dynamic local workload evenly across all nodes. == Load Balancing Techniques == === Scheduling Algorithms === Opportunistic Load Balancing (OLB) is the algorithm that assigns workloads to nodes in free order. It is simple but does not consider the expected execution time of each node. Load balance Min-Min (LBMM) assigns sub-tasks to the node which requires minimum execution time. === Load Balancing Policies === Workload and Client Aware Policy (WCAP) specifies the unique and special property (USP) of requests and computing nodes. With the information of USP, the schedule can decide the most suitable node to complete a request. WCAP makes the most of computing nodes by reducing their idle time. Also, it reduces performance time through searches based on content information. === A Comparative Study of Algorithms === Biased Random Sampling bases its job allocation on the network represented by a directed graph. For each execution node in this graph, in-degree means available resources and out-degree means allocated jobs. In-degree will decrease during job execution while out-degree will increase after job allocation. Active Clustering is a self-aggregation algorithm to rewire the network. The experiment result is that"Active Clustering and Random Sampling Walk predictably perform better as the number of processing nodes is increased" while the Honeyhive algorithm does not show the increasing pattern. == Client-side Load Balancer Using Cloud Computing == Load balancer forwards packets to web servers according to different workloads on servers. However, it is hard to implement a scalable load balancer because of both the "cloud's commodity business model and the limited infrastructure control allowed by cloud providers." Client-side Load Balancer (CLB) solve this problem by using a scalable cloud storage service. CLB allows clients to choose back-end web servers for dynamic content although it delivers static content.

    Read more →
  • Exploration–exploitation dilemma

    Exploration–exploitation dilemma

    The exploration–exploitation dilemma, also known as the explore–exploit tradeoff, is a fundamental concept in decision-making that arises in many domains. It is depicted as the balancing act between two opposing strategies. Exploitation involves choosing the best option based on current knowledge of the system (which may be incomplete or misleading), while exploration involves trying out new options that may lead to better outcomes in the future at the expense of an exploitation opportunity. Finding the optimal balance between these two strategies is a crucial challenge in many decision-making problems whose goal is to maximize long-term benefits. == Application in machine learning == In the context of machine learning, the exploration–exploitation tradeoff is fundamental in reinforcement learning (RL), a type of machine learning that involves training agents to make decisions based on feedback from the environment. Crucially, this feedback may be incomplete or delayed. The agent must decide whether to exploit the current best-known policy or explore new policies to improve its performance. === Multi-armed bandit methods === The multi-armed bandit (MAB) problem was a classic example of the tradeoff, and many methods were developed for it, such as epsilon-greedy, Thompson sampling, and the upper confidence bound (UCB). See the page on MAB for details. In more complex RL situations than the MAB problem, the agent can treat each choice as a MAB, where the payoff is the expected future reward. For example, if the agent performs an epsilon-greedy method, then the agent will often "pull the best lever" by picking the action that had the best predicted expected reward (exploit). However, it would pick a random action with probability epsilon (explore). Monte Carlo tree search, for example, uses a variant of the UCB method. === Exploration problems === There are some problems that make exploration difficult. Sparse reward. If rewards occur only once a long while, then the agent might not persist in exploring. Furthermore, if the space of actions is large, then the sparse reward would mean the agent would not be guided by the reward to find a good direction for deeper exploration. A standard example is Montezuma's Revenge. Deceptive reward. If some early actions give immediate small reward, but other actions give later large reward, then the agent might be lured away from exploring the other actions. Noisy TV problem. If certain observations are irreducibly noisy (such as a television showing random images), then the agent might be trapped exploring those observations (watching the television). === Exploration reward === This section based on. The exploration reward (also called exploration bonus) methods convert the exploration-exploitation dilemma into a balance of exploitations. That is, instead of trying to get the agent to balance exploration and exploitation, exploration is simply treated as another form of exploitation, and the agent simply attempts to maximize the sum of rewards from exploration and exploitation. The exploration reward can be treated as a form of intrinsic reward. We write these as r t i , r t e {\displaystyle r_{t}^{i},r_{t}^{e}} , meaning the intrinsic and extrinsic rewards at time step t {\displaystyle t} . However, exploration reward is different from exploitation in two regards: The reward of exploitation is not freely chosen, but given by the environment, but the reward of exploration may be picked freely. Indeed, there are many different ways to design r t i {\displaystyle r_{t}^{i}} described below. The reward of exploitation is usually stationary (i.e. the same action in the same state gives the same reward), but the reward of exploration is non-stationary (i.e. the same action in the same state should give less and less reward). Count-based exploration uses N n ( s ) {\displaystyle N_{n}(s)} , the number of visits to a state s {\displaystyle s} during the time-steps 1 : n {\displaystyle 1:n} , to calculate the exploration reward. This is only possible in small and discrete state space. Density-based exploration extends count-based exploration by using a density model ρ n ( s ) {\displaystyle \rho _{n}(s)} . The idea is that, if a state has been visited, then nearby states are also partly-visited. In maximum entropy exploration, the entropy of the agent's policy π {\displaystyle \pi } is included as a term in the intrinsic reward. That is, r t i = − ∑ a π ( a | s t ) ln ⁡ π ( a | s t ) + ⋯ {\displaystyle r_{t}^{i}=-\sum _{a}\pi (a|s_{t})\ln \pi (a|s_{t})+\cdots } . === Prediction-based === This section based on. The forward dynamics model is a function for predicting the next state based on the current state and the current action: f : ( s t , a t ) ↦ s t + 1 {\displaystyle f:(s_{t},a_{t})\mapsto s_{t+1}} . The forward dynamics model is trained as the agent plays. The model becomes better at predicting state transition for state-action pairs that had been done many times. A forward dynamics model can define an exploration reward by r t i = ‖ f ( s t , a t ) − s t + 1 ‖ 2 2 {\displaystyle r_{t}^{i}=\|f(s_{t},a_{t})-s_{t+1}\|_{2}^{2}} . That is, the reward is the squared-error of the prediction compared to reality. This rewards the agent to perform state-action pairs that had not been done many times. This is however susceptible to the noisy TV problem. Dynamics model can be run in latent space. That is, r t i = ‖ f ( s t , a t ) − ϕ ( s t + 1 ) ‖ 2 2 {\displaystyle r_{t}^{i}=\|f(s_{t},a_{t})-\phi (s_{t+1})\|_{2}^{2}} for some featurizer ϕ {\displaystyle \phi } . The featurizer can be the identity function (i.e. ϕ ( x ) = x {\displaystyle \phi (x)=x} ), randomly generated, the encoder-half of a variational autoencoder, etc. A good featurizer improves forward dynamics exploration. The Intrinsic Curiosity Module (ICM) method trains simultaneously a forward dynamics model and a featurizer. The featurizer is trained by an inverse dynamics model, which is a function for predicting the current action based on the features of the current and the next state: g : ( ϕ ( s t ) , ϕ ( s t + 1 ) ) ↦ a t {\displaystyle g:(\phi (s_{t}),\phi (s_{t+1}))\mapsto a_{t}} . By optimizing the inverse dynamics, both the inverse dynamics model and the featurizer are improved. Then, the improved featurizer improves the forward dynamics model, which improves the exploration of the agent. Random Network Distillation (RND) method attempts to solve this problem by teacher–student distillation. Instead of a forward dynamics model, it has two models f , f ′ {\displaystyle f,f'} . The f ′ {\displaystyle f'} teacher model is fixed, and the f {\displaystyle f} student model is trained to minimize ‖ f ( s ) − f ′ ( s ) ‖ 2 2 {\displaystyle \|f(s)-f'(s)\|_{2}^{2}} on states s {\displaystyle s} . As a state is visited more and more, the student network becomes better at predicting the teacher. Meanwhile, the prediction error is also an exploration reward for the agent, and so the agent learns to perform actions that result in higher prediction error. Thus, we have a student network attempting to minimize the prediction error, while the agent attempting to maximize it, resulting in exploration. The states are normalized by subtracting a running average and dividing a running variance, which is necessary since the teacher model is frozen. The rewards are normalized by dividing with a running variance. Exploration by disagreement trains an ensemble of forward dynamics models, each on a random subset of all ( s t , a t , s t + 1 ) {\displaystyle (s_{t},a_{t},s_{t+1})} tuples. The exploration reward is the variance of the models' predictions. === Noise === For neural network–based agents, the NoisyNet method changes some of its neural network modules by noisy versions. That is, some network parameters are random variables from a probability distribution. The parameters of the distribution are themselves learnable. For example, in a linear layer y = W x + b {\displaystyle y=Wx+b} , both W , b {\displaystyle W,b} are sampled from Gaussian distributions N ( μ W , Σ W ) , N ( μ b , Σ b ) {\displaystyle {\mathcal {N}}(\mu _{W},\Sigma _{W}),{\mathcal {N}}(\mu _{b},\Sigma _{b})} at every step, and the parameters μ W , Σ W , μ b , Σ b {\displaystyle \mu _{W},\Sigma _{W},\mu _{b},\Sigma _{b}} are learned via the reparameterization trick.

    Read more →
  • Token maxxing

    Token maxxing

    Token Maxxing or Token Maxing is a metric used in an attempt to track productivity in the workplace especially for those using Artificial Intelligence (AI) based services. AI services charge for each token which represent units of effort expended by an AI service to solve a problem. Some believe that token consumption equates to productivity and thus can be used as a metric to monitor an employee's work. Supporters believe that higher token usage indicates higher productivity and higher utilization of powerful AI services. This also suggests that those not consuming enough tokens may be less productive and underutilizing powerful AI services. This belief might lead to an environment that incentivizes higher token usage to predict increased productivity. Critics of token maxxing as a metric claim that prudent workers will maximize any metric that management wants increased to gain a workplace advantage. For example: Engineers in the tech industries pressed to consume as many tokens as possible might run several AI agents in tandem, enter longer input prompts, or automate their tasks to maximize their token consumption. To management, this higher token usage may indicate potential productivity, but in reality may cause additional token costs, worker burnout, or actually create more bloated code of lower quality. Another claim is AI service companies potentially benefit from such an emphasis on token consumption and actively encourage the trend. Some developers have publicly advocated the practice. Developer Sigrid Jin, who said he used 50 billion tokens in a single year, has argued that maximizing token consumption is the best way to understand the value of AI, advising others to spend as much on AI usage as they pay in rent to obtain a return on investment. == See Also == Goodhart's law Perverse incentive Jevons Paradox

    Read more →
  • Spatial embedding

    Spatial embedding

    Spatial embedding is one of feature learning techniques used in spatial analysis where points, lines, polygons or other spatial data types. representing geographic locations are mapped to vectors of real numbers. Conceptually it involves a mathematical embedding from a space with many dimensions per geographic object to a continuous vector space with a much lower dimension. Such embedding methods allow complex spatial data to be used in neural networks and have been shown to improve performance in spatial analysis tasks == Embedded data types == Geographic data can take many forms: text, images, graphs, trajectories, polygons. Depending on the task, there may be a need to combine multimodal data from different sources. The next section describes examples of different types of data and their uses. === Text === Geolocated posts on social media can be used to acquire a library of documents bound to a given place that can be later transformed to embedded vectors using word embedding techniques. === Image === Satellites and aircraft collect digital spatial data acquired from remotely sensed images which can be used in machine learning. They are sometimes hard to analyse using basic image analysis methods and convolutional neural networks can be used to acquire an embedding of images bound to a given geographical object or a region. === Point === A single point of interest (POI) can be assigned multiple features that can be used in machine learning. These could be demographic, transportation, meteorological, or economic data, for example. When embedding single points, it is common to consider the entire set of available points as nodes in a graph. === Line / multiline === Among other things, motion trajectories are represented as lines (multilines). Individual trajectories are embedded taking into account travel time, distances and also features of points visited along the way. Embedding of trajectories allows to improve performance of such tasks as clustering and also categorization. === Polygon === The geographic areas analyzed in machine learning are defined by both administrative boundaries and top-down division into grids of regular shapes such as rectangles, for example. Both types are represented as polygons and, like points, can be assigned different demographic, transportation, or economic features. A polygon can also have features related to the size of the area or shape it represents. === Graph === An example domain where graph representation is used is the street layout in a city, where vertices can be intersections and edges can be roads. The vertices can also be destination points like public transport stops or important points in the city, and the edges represent the flow between them. Embedding graphs or single vertices allows to improve accuracy of analysis methods in which the treated geographical domain can be represented as a network. == Usage == POI recommendation - generating personalized point of interest recommendations based on user preferences. Next/future location prediction - prediction of the next location a person will go to based on their historical trajectory. Zone functions classification - based on different mobility of people or POI distribution a function of a given area in a city can be predicted. Crime prediction - estimation of crime rate in different regions of a city. Local event detection - studying spatio-temporal changes in embeddings can provide valuable information in detection of local event occurring in specific location. Regional mobility popularity prediction - analysis of mobility can show patterns in popularity of different regions in a city. Shape matching - finding a similar shape of given polygon, for example finding building with the same shape as input building. Travel time estimation - predicting estimated travel time given current traffic conditions and special occurring events. Time estimation for on-demand food delivery - estimation of delivery time when placing an order through the website. == Temporal aspect == Some of the data analyzed has a timestamp associated with it. In some cases of data analysis this information is omitted and in others it is used to divide the set into groups. The most common division is the separation of weekdays from weekends or division into hours of the day. This is particularly important in the analysis of mobility data, because the characteristics of mobility during the week and at different times of the day are very different from each other. Another area in which time division into, for example, individual months can be used is in the analysis of tourism of a given region. In order to take such a split into account, embedding methods treat the time stamp specifically or separate versions of the model are developed for different subgroups of the analyzed set.

    Read more →
  • Gemini Enterprise Agent Platform

    Gemini Enterprise Agent Platform

    Gemini Enterprise Agent Platform (formerly known as Vertex AI) is a managed machine learning (ML) and artificial intelligence (AI) platform developed by Google Cloud. It provides a unified environment for building, training, deploying, and scaling ML models and generative AI applications. The platform integrates tools for the full ML lifecycle, including data preparation, model training, evaluation, deployment, and monitoring, under a single API and user interface. Vertex AI was announced at Google I/O and released as a generally available product on May 18, 2021. At launch, Google described Vertex AI as unifying its AutoML offerings with its prior Cloud AI Platform capabilities, and as adding operational features intended to help teams move models from experimentation into production use. On April 22, 2026, Google announced Gemini Enterprise Agent Platform as the replacement evolution of Vertex AI. == History == Google Cloud announced the general availability of Vertex AI on May 18, 2021, at the Google I/O developer conference. The platform was designed to consolidate Google Cloud's previously separate ML offerings, including AutoML and the legacy AI Platform, into a single system. At launch, Google claimed that Vertex AI required roughly 80% fewer lines of code to train a model compared to competing platforms. In June 2023, Google made generative AI support in Vertex AI generally available, giving developers access to foundation models including PaLM 2, Imagen, and Codey through the platform's Model Garden and the newly launched Generative AI Studio. At the time of this launch, Model Garden included over 60 models from Google and its partners. In August 2023, at the Google Cloud Next conference, Google announced further updates to Vertex AI, including the addition of third-party models such as Claude 2 from Anthropic and Llama 2 from Meta to the Model Garden, as well as new tools called Vertex AI Extensions for connecting models to APIs for real-time data retrieval. At the same event, Vertex AI Search and Conversation were made generally available, providing enterprise search and chatbot capabilities powered by foundation models. In April 2024, at Google Cloud Next, the company introduced Vertex AI Agent Builder, a no-code tool for creating AI-powered conversational agents built on top of Gemini large language models. This brought together the existing Vertex AI Search and Conversation products with new developer tools for building generative AI experiences. == Features == === Model training === Vertex AI supports both AutoML, which enables code-free model training on tabular, image, text, or video data, and custom training, which gives users full control over the ML framework, training code, and hyperparameter tuning. The platform provides serverless training as well as dedicated training clusters with GPU and TPU accelerators. Vertex AI Vizier handles automatic hyperparameter tuning, and Vertex AI Experiments allows comparison and tracking of training runs. === Model Garden === The Vertex AI Model Garden is a curated catalog of over 200 enterprise-ready models, including Google's own foundation models (such as Gemini, Imagen, and Veo), third-party models (such as Anthropic's Claude and Mistral AI models), and popular open-source models (such as Llama and Gemma). Models are accessible as fully managed model-as-a-service APIs. === Pipelines (workflow orchestration) === Vertex AI Pipelines provides managed orchestration of ML workflows and supports pipelines built with the Kubeflow Pipelines SDK, among other options described in Google Cloud documentation. === Vertex AI Studio === Vertex AI Studio provides tools for prompt design, testing, and model management, allowing developers to prototype and build generative AI applications using natural language, code, images, or video. === Agent Builder and Agent Engine === Vertex AI Agent Builder is a suite of products for building, deploying, and governing AI agents in production environments. It supports development with the open-source Agent Development Kit (ADK) and other frameworks. Vertex AI Agent Engine provides the underlying infrastructure for deploying and scaling agents, with support for enterprise security features including HIPAA compliance, customer-managed encryption keys (CMEK), and VPC Service Controls. === Generative AI tooling and model access === Google markets Vertex AI as providing access to Google foundation models (including the Gemini family) and developer tools such as Vertex AI Studio, along with a model catalog that includes Google and selected open source models (marketed as "Model Garden"). Google has also offered products within Vertex AI aimed at building generative search and conversational applications, including offerings named "Vertex AI Search" and "Vertex AI Conversation" as reported in 2023 coverage of platform updates. === MLOps tools === The platform includes a range of MLOps capabilities: Vertex AI Pipelines for orchestrating and automating ML workflows as reusable pipelines. Vertex AI Feature Store for serving, sharing, and reusing ML features across projects. Vertex AI Model Registry for storing, versioning, and managing trained models. Vertex AI Model Monitoring for detecting training-serving skew and inference drift in deployed models. Vertex Explainable AI for interpreting model predictions. Vertex AI Workbench for managed JupyterLab notebook environments integrated with Google Cloud Storage and BigQuery. == Industry recognition == Google was named a Leader for the fifth consecutive year in the 2024 Gartner Magic Quadrant for Cloud AI Developer Services, a recognition that encompasses Vertex AI and its related offerings. Google was also recognized as a Leader in the 2024 Gartner Magic Quadrant for Data Science and Machine Learning Platforms and was named a Leader in the Forrester Wave for AI/ML Platforms, Q3 2024. In October 2025, Google was also named a Leader in the 2025 IDC (International Data Corporation) MarketScape for Worldwide GenAI Life-Cycle Foundation Model Software. == Pricing == Vertex AI uses a pay-as-you-go pricing model, with costs determined by the specific services consumed, including model training, prediction serving, and data storage. For generative AI tasks, pricing is based on a per-token model, with rates varying depending on the specific model used and whether tokens are input or output. Google offers a free tier for new users, which includes limited custom training hours and online prediction usage, along with an introductory US$300 in Google Cloud credits valid for 90 days. == Adoption == In the year following its 2021 launch, Google reported that usage of Vertex AI and BigQuery had driven 2.5 times more machine learning predictions compared to the prior year, and that active customers of Vertex AI Workbench had grown 25-fold over a six-month period. Early enterprise adopters included Ford, Wayfair, and Seagate, among others. Wayfair reported that it was able to run large model training jobs 5 to 10 times faster using the platform.

    Read more →
  • .ai

    .ai

    .ai is the Internet country code top-level domain (ccTLD) for Anguilla, a British Overseas Territory in the Caribbean. It is administered by the government of Anguilla. It is a popular domain hack with companies and projects related to the artificial intelligence industry (AI). Google's ad targeting treats .ai as a generic top-level domain (gTLD) because "users and website owners frequently see [the domain] as being more generic than country-targeted." In 2021, Google Search analyst Gary Illyes announced that ".ai" had been added to Google’s list of generic country-code top-level domains, meaning that Google would no longer infer Anguilla-specific targeting from the ccTLD. Identity Digital began managing the domain as of January 2025. == Second and third level registrations == Registrations within off.ai, com.ai, net.ai, and org.ai are available worldwide without restriction. From 15 September 2009, second level registrations within .ai are available to everyone worldwide. == Registration == The minimum registration term allowed for .ai domains is 2 through 10 years for registration and renewal, and a 2-year renewal for domain transfer. Identity Digital is the authority in charge of managing this extension. Registrations began on 16 February 1995. The limits on the number of characters used for the domain name are, at a minimum, from 1 to 3, depending on the registrar, and always at most 63 characters. The character set supported for .ai domain names includes A–Z, a–z, 0–9, and hyphen. As of November 2022, .ai domains cannot accommodate IDN characters. There are no requirements for registering a domain, including local and foreign residents. A .ai domain can be suspended or revoked, if the domain is involved in illegal activity such as violating trademarks or copyrights. Usage must not violate the laws of Anguilla. Anguilla uses the UDRP. Filing a UDRP challenge requires using one of the ICANN Approved Dispute Resolution Service Providers. If the domain is with an ICANN accredited registrar, they should work with the arbitrator. Usually this means either doing nothing or transferring a domain. .ai domains are transferable to any desired registrars as the registration of domain is done maintaining EPP. There used to be a whois.ai-based platform of expired domains in which those could be procured and auctioned every ten days through a standard online process. The last auctions of such kind closed there in December 2024; the platform had been scheduled for shutdown on 30 June 2025, but remained online in the months following that date. == Valuation == Domains cost depends on the registrar, with yearly fees ranging from US$140 (the base fee, as established by Anguilla) to $200. As of July 2025, the highest-valued .ai domain is an undisclosed one sold on 8 November 2023, on Escrow.com, for US$1,500,000—months after an initial $300,000 sale to the same buyer. Among the publicly disclosed ones, the most valued, fin.ai, was sold for $1,000,000 in March 2025. On 16 December 2017, the .ai registry started supporting the Extensible Provisioning Protocol (EPP) and migrated all of its domains onto an EPP system. Consequently, many registrars are allowed to sell .ai domains. Since that date, the .ai ccTLD has also been popular with artificial intelligence companies and organisations. Though such trends are primarily seen among new AI based companies or startups, many established AI and Tech companies preferred not to opt for .ai domains. For example, DeepMind has its domain retained at .com; Meta has redirected its facebook.ai domain to ai.meta.com. == Impact on Anguilla's economy == The registration fees earned from the .ai domains go to the treasury of the Government of Anguilla. As per a 2018 New York Times report, the total revenue generated out of selling .ai domains was $2.9 million. In 2023, Anguilla's government made about US$32 million from fees collected for registering .ai domains; that amounted to over 10% of gross domestic product for the territory. "In the years before the real breakthrough of AI, revenue from .ai domains made up less than 1% of our state income, by 2025 it will be around 47%," explained Jose Vanterpool, Minister of Infrastructure and Communications (MICUHITES), in an interview with BBC. The high 90% renewal rate of .ai domains and the 2025 renewal wave of domains registered in 2023 are driving another surge in state revenues, according to Domaintechnik.

    Read more →
  • Random feature

    Random feature

    Random features (RF) are a technique used in machine learning to approximate kernel methods, introduced by Ali Rahimi and Ben Recht in their 2007 paper "Random Features for Large-Scale Kernel Machines", and extended by. RF uses a Monte Carlo approximation to kernel functions by randomly sampled feature maps. It is used for datasets that are too large for traditional kernel methods like support vector machine, kernel ridge regression, and gaussian process. == Mathematics == === Kernel method === Given a feature map ϕ : R d → V {\textstyle \phi :\mathbb {R} ^{d}\to V} , where V {\textstyle V} is a Hilbert space (more specifically, a reproducing kernel Hilbert space), the kernel trick replaces inner products in feature space ⟨ ϕ ( x i ) , ϕ ( x j ) ⟩ V {\displaystyle \langle \phi (x_{i}),\phi (x_{j})\rangle _{V}} by a kernel function k ( x i , x j ) : R d × R d → R {\displaystyle k(x_{i},x_{j}):\mathbb {R} ^{d}\times \mathbb {R} ^{d}\to \mathbb {R} } Kernel methods replaces linear operations in high-dimensional space by operations on the kernel matrix: K X := [ k ( x i , x j ) ] i , j ∈ 1 : N {\displaystyle K_{X}:=[k(x_{i},x_{j})]_{i,j\in 1:N}} where N {\textstyle N} is the number of data points. === Random kernel method === The problem with kernel methods is that the kernel matrix K X {\textstyle K_{X}} has size N × N {\textstyle N\times N} . This becomes computationally infeasible when N {\textstyle N} reaches the order of a million. The random kernel method replaces the kernel function k {\textstyle k} by an inner product in low-dimensional feature space R D {\textstyle \mathbb {R} ^{D}} : k ( x , y ) ≈ ⟨ z ( x ) , z ( y ) ⟩ {\displaystyle k(x,y)\approx \langle z(x),z(y)\rangle } where z {\textstyle z} is a randomly sampled feature map z : R d → R D {\textstyle z:\mathbb {R} ^{d}\to \mathbb {R} ^{D}} . This converts kernel linear regression into linear regression in feature space, kernel SVM into SVM in feature space, etc. Since we have K X ≈ Z X T Z X {\displaystyle K_{X}\approx Z_{X}^{T}Z_{X}} where Z X = [ z ( x 1 ) , … , z ( x N ) ] {\displaystyle Z_{X}=[z(x_{1}),\dots ,z(x_{N})]} , these methods no longer involve matrices of size O ( N 2 ) {\textstyle O(N^{2})} , but only random feature matrices of size O ( D N ) {\textstyle O(DN)} . == Random Fourier feature == === Radial basis function kernel === The radial basis function (RBF) kernel on two samples x i , x j ∈ R d {\displaystyle x_{i},x_{j}\in \mathbb {R} ^{d}} is defined as k ( x i , x j ) = exp ⁡ ( − ‖ x i − x j ‖ 2 2 σ 2 ) {\displaystyle k(x_{i},x_{j})=\exp \left(-{\frac {\|x_{i}-x_{j}\|^{2}}{2\sigma ^{2}}}\right)} where ‖ x i − x j ‖ 2 {\displaystyle \|x_{i}-x_{j}\|^{2}} is the squared Euclidean distance and σ {\displaystyle \sigma } is a free parameter defining the shape of the kernel. It can be approximated by a random Fourier feature map z : R d → R 2 D {\displaystyle z:\mathbb {R} ^{d}\to \mathbb {R} ^{2D}} : z ( x ) := 1 D [ cos ⁡ ⟨ ω 1 , x ⟩ , sin ⁡ ⟨ ω 1 , x ⟩ , … , cos ⁡ ⟨ ω D , x ⟩ , sin ⁡ ⟨ ω D , x ⟩ ] T {\displaystyle z(x):={\frac {1}{\sqrt {D}}}[\cos \langle \omega _{1},x\rangle ,\sin \langle \omega _{1},x\rangle ,\ldots ,\cos \langle \omega _{D},x\rangle ,\sin \langle \omega _{D},x\rangle ]^{T}} where ω 1 , . . . , ω D {\displaystyle \omega _{1},...,\omega _{D}} are IID samples from the multidimensional normal distribution N ( 0 , σ − 2 I ) {\displaystyle N(0,\sigma ^{-2}I)} . Since cos , sin {\displaystyle \cos ,\sin } are bounded, there is a stronger convergence guarantee by Hoeffding's inequality. === Random Fourier features === By Bochner's theorem, the above construction can be generalized to arbitrary positive definite shift-invariant kernel k ( x , y ) = k ( x − y ) {\displaystyle k(x,y)=k(x-y)} . Define its Fourier transform p ( ω ) = 1 2 π ∫ R d e − j ⟨ ω , Δ ⟩ k ( Δ ) d Δ {\displaystyle p(\omega )={\frac {1}{2\pi }}\int _{\mathbb {R} ^{d}}e^{-j\langle \omega ,\Delta \rangle }k(\Delta )d\Delta } then ω 1 , . . . , ω D {\displaystyle \omega _{1},...,\omega _{D}} are sampled IID from the probability distribution with probability density p {\displaystyle p} . This applies for other kernels like the Laplace kernel and the Cauchy kernel. === Neural network interpretation === Given a random Fourier feature map z {\displaystyle z} , training the feature on a dataset by featurized linear regression is equivalent to fitting complex parameters θ 1 , … , θ D ∈ C {\displaystyle \theta _{1},\dots ,\theta _{D}\in \mathbb {C} } such that f θ ( x ) = R e ( ∑ k θ k e i ⟨ ω k , x ⟩ ) {\displaystyle f_{\theta }(x)=\mathrm {Re} \left(\sum _{k}\theta _{k}e^{i\langle \omega _{k},x\rangle }\right)} which is a neural network with a single hidden layer, with activation function t ↦ e i t {\displaystyle t\mapsto e^{it}} , zero bias, and the parameters in the first layer frozen. In the overparameterized case, when 2 D ≥ N {\displaystyle 2D\geq N} , the network linearly interpolates the dataset { ( x i , y i ) } i ∈ 1 : N {\displaystyle \{(x_{i},y_{i})\}_{i\in 1:N}} , and the network parameters is the least-norm solution: θ ^ = arg ⁡ min θ ∈ C D , f θ ( x k ) = y k ∀ k ∈ 1 : N ‖ θ ‖ {\displaystyle {\hat {\theta }}=\arg \min _{\theta \in \mathbb {C} ^{D},f_{\theta }(x_{k})=y_{k}\forall k\in 1:N}\|\theta \|} At the limit of D → ∞ {\displaystyle D\to \infty } , the L2 norm ‖ θ ^ ‖ → ‖ f K ‖ H {\displaystyle \|{\hat {\theta }}\|\to \|f_{K}\|_{H}} where f K {\displaystyle f_{K}} is the interpolating function obtained by the kernel regression with the original kernel, and ‖ ⋅ ‖ H {\displaystyle \|\cdot \|_{H}} is the norm in the reproducing kernel Hilbert space for the kernel. == Other examples == === Random binning features === A random binning features map partitions the input space using randomly shifted grids at randomly chosen resolutions and assigns to an input point a binary bit string that corresponds to the bins in which it falls. The grids are constructed so that the probability that two points x i , x j ∈ R d {\displaystyle x_{i},x_{j}\in \mathbb {R} ^{d}} are assigned to the same bin is proportional to K ( x i , x j ) {\displaystyle K(x_{i},x_{j})} . The inner product between a pair of transformed points is proportional to the number of times the two points are binned together, and is therefore an unbiased estimate of K ( x i , x j ) {\displaystyle K(x_{i},x_{j})} . Since this mapping is not smooth and uses the proximity between input points, Random Binning Features works well for approximating kernels that depend only on the L 1 {\displaystyle L_{1}} distance between datapoints. === Orthogonal random features === Orthogonal random features uses a random orthogonal matrix instead of a random Fourier matrix. == Historical context == In NIPS 2006, deep learning had just become competitive with linear models like PCA and linear SVMs for large datasets, and people speculated about whether it could compete with kernel SVMs. However, there was no way to train kernel SVM on large datasets. The two authors developed the random feature method to train those. It was then found that the O ( 1 / D ) {\displaystyle O(1/D)} variance bound did not match practice: the variance bound predicts that approximation to within 0.01 {\displaystyle 0.01} requires D ∼ 10 4 {\displaystyle D\sim 10^{4}} , but in practice required only ∼ 10 2 {\displaystyle \sim 10^{2}} . Attempting to discover what caused this led to the subsequent two papers.

    Read more →
  • Space-based data center

    Space-based data center

    Space-based data centers or orbital AI infrastructure are proposed concepts to build AI data centers in the sun-synchronous orbit or other orbits utilizing space-based solar power. Electric power has become the main bottleneck for terrestrial AI infrastructure. Space-based edge computing has historical roots in military architectures designed to bypass the latency of ground-based targeting networks. In the 1980s, the Strategic Defense Initiative's Brilliant Pebbles program first envisioned autonomous on-orbit data processing for missile defense. In 2019, the Space Development Agency (SDA) began to revive this decentralized approach through its Proliferated Warfighter Space Architecture (PWSA). This ambitious "sensor-to-shooter" infrastructure is treated as a prerequisite for the modern Golden Dome program, which would rely on space-based data processing to continuously track targets. == History == Early thinking about space-based computing infrastructure grew out of mid-20th-century visions for large orbital industrial systems, most notably proposals for space-based solar power, which were popularized in both technical literature and science writing by figures such as Isaac Asimov in the 1940s. These ideas emphasized exploiting the vacuum, continuous solar energy, and thermal characteristics of space to support power-intensive activities that would be difficult or inefficient on Earth. In the 21st century, advances in small satellites, reusable launch vehicles, and high-performance computing revived interest in space-based data centers, with governments and private companies exploring orbital or near-space platforms for edge computing, secure data handling, and low-latency processing of Earth-observation data. In September 2024, Y Combinator-backed Starcloud released a white paper detailing plans to build multiple gigawatts of AI compute in orbit. It was the first widely cited proposal to actually start building large orbital data centers. In 2025, Starcloud deployed an NVIDIA H100-class system and became the first company to train an LLM in space and run a version of Google Gemini in space. In March 2025, Lonestar deployed a data backup machine on the surface of the moon. In early January 2026, a team from the University of Pennsylvania presented a tether-based architecture for orbital data centers at the AIAA SciTech conference. The design relied on gravity gradient tension and solar-pressure-based passive attitude stabilization to minimize the mass of MW-scale orbital data centers. In January 2026, SpaceX filed plans with the Federal Communications Commission (FCC) for millions of satellites, leveraging reusable launches and Starlink integration to extend cloud and AI computing into orbit. Around the same time, Blue Origin announced the TeraWave constellation of about 5,400 satellites, designed to provide high‑throughput networking for data centers, enterprise, and government customers. Meanwhile, China announced a 200,000‑satellite constellation, focusing on state coordination, data sovereignty, and in-orbit processing for secure, time-critical applications. In February 2026, Starcloud submitted a proposal to the FCC for a constellation of up to 88,000 satellites for orbital data centers. In March, it announced intentions to be the first to mine Bitcoin in space, flying bitcoin mining ASICs on its second satellite, Starcloud-2. In May 2026, Edge Aerospace was awarded a contract by the European Space Agency under its Space Cloud program to study use cases, architectures and implementation roadmap for orbital data centers. == Feasibility == In October 2025, Nature Electronics published a study led by a research group at Nanyang Technological University on the development of carbon-neutral data centres in space. In November 2025, Google published a feasibility study on space-based data centers. The authors argued that if launch costs to low earth orbit reached US$200/kg, the launch cost for data center satellites could be cost effective relative to current energy costs for ground-based data centers. They project this may occur around 2035 if SpaceX's Starship project scales to 180 launches/year by then. == Advantages == Some sun-synchronous orbit (SSO) planes have constant sunlight in the dawn/dusk which could provide continuous solar energy. SSO is a limited resource and proper management and sharing of it is required. Solar irradiance is 36% higher in Earth orbit than on the surface No Earth weather storms or clouds, however more exposed to Solar storms. No property tax or land-use regulation. Saves space for other land use. Ample space for scalability. Won't strain the power grid. Direct access to power source without additional infrastructure. == Disadvantages == The deployment of space-based data centers raises several technical, economic, and environmental concerns. Existing launch costs are substantial and remains main cost of space infrastructure deployment Cooling is limited to heat dissipation through radiation only, which made in inefficient in comparison to convection in terrestrial data centers Space infrastructure must be designed to survive launch and to work under environment conditions of radiation, wide range of temperatures, in vacuum and in microgravity In-space assembly is on early development stage to enable deployment of mega-structures Megastructures are particularly exposed to orbital debris Solar arrays efficiency decrease 0.5% to 0.8% per year due to exposure of ultraviolet rays, space weather and orbital thermal cycles Hardware is designed for limited lifespan. Maintenance and repair in space (known as On-Orbit Servicing (OOS)) is still on early stage of practical implementation. Disposable data centre: technology obsolescence of AI data centre being a concern and difficult maintenance in space imply the single-use purpose of those space data centres. To extend lifetime, space infrastructure will require either refueling or orbit rasie by the servicer, which is going to increase its operational costs The environmental impact on Earth has its own challenges: The environmental impact of launches need to be addressed. Deployment consumes Earth resources that cannot be recovered or recycled. Computers require lots of resources, some of which are strategic. Recycling e-waste is already a challenge on Earth and extremely unlikely in space. Space debris (orbit pollution) is another sustainability challenge for space: Orbits are, like any resources, a limited physical and electromagnetic resource and available for all mankind. The accumulation of satellites on a particular orbit reduces the use of space for other purposes. A consequence of the increase of satellite in orbit is a higher risk of the runaway of space debris (see Kessler syndrome). This means some orbits could become unusable. Latency and bandwidth are constrained in space, and consumes limited electromagnetic resources. Satellite flares could inhibit ground-based and space-based observational astronomy. == Size and power generated == It would take ~1 square mile solar array in earth orbit to produce 1 gigawatt of power at 30% cell efficiency. == Companies pursuing space-based AI infrastructure == Blue Origin Cowboy Space Corporation (formerly Aetherflux) Edge Aerospace Google – Project Suncatcher Nvidia OpenAI SpaceX Starcloud

    Read more →
  • Collateral freedom

    Collateral freedom

    Collateral freedom is an anti-censorship strategy that attempts to make it economically prohibitive for censors to block content on the Internet. This is achieved by hosting content on cloud services that are considered by censors to be "too important to block", and then using encryption to prevent censors from identifying requests for censored information that is hosted among other content, forcing censors to either allow access to the censored information or take down entire services.

    Read more →
  • Cross-validation (statistics)

    Cross-validation (statistics)

    Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation includes resampling and sample splitting methods that use different portions of the data to test and train a model on different iterations. It is often used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. It can also be used to assess the quality of a fitted model and the stability of its parameters. In a prediction problem, a model is usually given a dataset of known data on which training is run (training dataset), and a dataset of unknown data (or first seen data) against which the model is tested (called the validation dataset or testing set). The goal of cross-validation is to test the model's ability to predict new data that was not used in estimating it, in order to flag problems like overfitting or selection bias and to give an insight on how the model will generalize to an independent dataset (i.e., an unknown dataset, for instance from a real problem). One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, in most methods multiple rounds of cross-validation are performed using different partitions, and the validation results are combined (e.g. averaged) over the rounds to give an estimate of the model's predictive performance. In summary, cross-validation combines (averages) measures of fitness in prediction to derive a more accurate estimate of model prediction performance. == Motivation == Assume a model with one or more unknown parameters, and a data set to which the model can be fit (the training data set). The fitting process optimizes the model parameters to make the model fit the training data as well as possible. If an independent sample of validation data is taken from the same population as the training data, it will generally turn out that the model does not fit the validation data as well as it fits the training data. The size of this difference is likely to be large especially when the size of the training data set is small, or when the number of parameters in the model is large. Cross-validation is a way to estimate the size of this effect. === Example: linear regression === In linear regression, there exist real response values y 1 , … , y n {\textstyle y_{1},\ldots ,y_{n}} , and n p-dimensional vector covariates x1, ..., xn. The components of the vector xi are denoted xi1, ..., xip. If least squares is used to fit a function in the form of a hyperplane ŷ = a + βTx to the data (xi, yi) 1 ≤ i ≤ n, then the fit can be assessed using the mean squared error (MSE). The MSE for given estimated parameter values a and β on the training set (xi, yi) 1 ≤ i ≤ n is defined as: MSE = 1 n ∑ i = 1 n ( y i − y ^ i ) 2 = 1 n ∑ i = 1 n ( y i − a − β T x i ) 2 = 1 n ∑ i = 1 n ( y i − a − β 1 x i 1 − ⋯ − β p x i p ) 2 {\displaystyle {\begin{aligned}{\text{MSE}}&={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-{\hat {y}}_{i})^{2}={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-a-{\boldsymbol {\beta }}^{T}\mathbf {x} _{i})^{2}\\&={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-a-\beta _{1}x_{i1}-\dots -\beta _{p}x_{ip})^{2}\end{aligned}}} If the model is correctly specified, it can be shown under mild assumptions that the expected value of the MSE for the training set is (n − p − 1)/(n + p + 1) < 1 times the expected value of the MSE for the validation set (the expected value is taken over the distribution of training sets). Thus, a fitted model and computed MSE on the training set will result in an optimistically biased assessment of how well the model will fit an independent data set. This biased estimate is called the in-sample estimate of the fit, whereas the cross-validation estimate is an out-of-sample estimate. Since in linear regression it is possible to directly compute the factor (n − p − 1)/(n + p + 1) by which the training MSE underestimates the validation MSE under the assumption that the model specification is valid, cross-validation can be used for checking whether the model has been overfitted, in which case the MSE in the validation set will substantially exceed its anticipated value. (Cross-validation in the context of linear regression is also useful in that it can be used to select an optimally regularized cost function.) === General case === In most other regression procedures (e.g. logistic regression), there is no simple formula to compute the expected out-of-sample fit. Cross-validation is, thus, a generally applicable way to predict the performance of a model on unavailable data using numerical computation in place of theoretical analysis. == Types == Two types of cross-validation can be distinguished: exhaustive and non-exhaustive cross-validation. === Exhaustive cross-validation === Exhaustive cross-validation methods are cross-validation methods which learn and test on all possible ways to divide the original sample into a training and a validation set. ==== Leave-p-out cross-validation ==== Leave-p-out cross-validation (LpO CV) involves using p observations as the validation set and the remaining observations as the training set. This is repeated on all ways to cut the original sample on a validation set of p observations and a training set. LpO cross-validation require training and validating the model C p n {\displaystyle C_{p}^{n}} times, where n is the number of observations in the original sample, and where C p n {\displaystyle C_{p}^{n}} is the binomial coefficient. For p > 1 and for even moderately large n, LpO CV can become computationally infeasible. For example, with n = 100 and p = 30, C 30 100 ≈ 3 × 10 25 . {\displaystyle C_{30}^{100}\approx 3\times 10^{25}.} A variant of LpO cross-validation with p=2 known as leave-pair-out cross-validation has been recommended as a nearly unbiased method for estimating the area under ROC curve of binary classifiers. ==== Leave-one-out cross-validation ==== Leave-one-out cross-validation (LOOCV) is a particular case of leave-p-out cross-validation with p = 1. The process looks similar to jackknife; however, with cross-validation one computes a statistic on the left-out sample(s), while with jackknifing one computes a statistic from the kept samples only. LOO cross-validation requires less computation time than LpO cross-validation because there are only C 1 n = n {\displaystyle C_{1}^{n}=n} passes rather than C p n {\displaystyle C_{p}^{n}} . However, n {\displaystyle n} passes may still require quite a large computation time, in which case other approaches such as k-fold cross validation may be more appropriate. Pseudo-code algorithm: Input: x, {vector of length N with x-values of incoming points} y, {vector of length N with y-values of the expected result} interpolate( x_in, y_in, x_out ), { returns the estimation for point x_out after the model is trained with x_in-y_in pairs} Output: err, {estimate for the prediction error} Steps: err ← 0 for i ← 1, ..., N do // define the cross-validation subsets x_in ← (x[1], ..., x[i − 1], x[i + 1], ..., x[N]) y_in ← (y[1], ..., y[i − 1], y[i + 1], ..., y[N]) x_out ← x[i] y_out ← interpolate(x_in, y_in, x_out) err ← err + (y[i] − y_out)^2 end for err ← err/N === Non-exhaustive cross-validation === Non-exhaustive cross validation methods do not compute all ways of splitting the original sample. These methods are approximations of leave-p-out cross-validation. ==== k-fold cross-validation ==== In k-fold cross-validation, the original sample is randomly partitioned into k equal sized subsamples, often referred to as "folds". Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data. The k results can then be averaged to produce a single estimation. The advantage of this method over repeated random sub-sampling (see below) is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used, but in general k remains an unfixed parameter. For example, setting k = 2 results in 2-fold cross-validation. In 2-fold cross-validation, the dataset is randomly shuffled into two sets d0 and d1, so that both sets are equal size (this is usually implemented by shuffling the data array and then splitting it in two). We then train on d0 and validate on d1, followed by training on d1 and validating on d0. When k = n (the number of observations), k-fold cross-validation is equivalent to leave-one-out cr

    Read more →
  • .ai

    .ai

    .ai is the Internet country code top-level domain (ccTLD) for Anguilla, a British Overseas Territory in the Caribbean. It is administered by the government of Anguilla. It is a popular domain hack with companies and projects related to the artificial intelligence industry (AI). Google's ad targeting treats .ai as a generic top-level domain (gTLD) because "users and website owners frequently see [the domain] as being more generic than country-targeted." In 2021, Google Search analyst Gary Illyes announced that ".ai" had been added to Google’s list of generic country-code top-level domains, meaning that Google would no longer infer Anguilla-specific targeting from the ccTLD. Identity Digital began managing the domain as of January 2025. == Second and third level registrations == Registrations within off.ai, com.ai, net.ai, and org.ai are available worldwide without restriction. From 15 September 2009, second level registrations within .ai are available to everyone worldwide. == Registration == The minimum registration term allowed for .ai domains is 2 through 10 years for registration and renewal, and a 2-year renewal for domain transfer. Identity Digital is the authority in charge of managing this extension. Registrations began on 16 February 1995. The limits on the number of characters used for the domain name are, at a minimum, from 1 to 3, depending on the registrar, and always at most 63 characters. The character set supported for .ai domain names includes A–Z, a–z, 0–9, and hyphen. As of November 2022, .ai domains cannot accommodate IDN characters. There are no requirements for registering a domain, including local and foreign residents. A .ai domain can be suspended or revoked, if the domain is involved in illegal activity such as violating trademarks or copyrights. Usage must not violate the laws of Anguilla. Anguilla uses the UDRP. Filing a UDRP challenge requires using one of the ICANN Approved Dispute Resolution Service Providers. If the domain is with an ICANN accredited registrar, they should work with the arbitrator. Usually this means either doing nothing or transferring a domain. .ai domains are transferable to any desired registrars as the registration of domain is done maintaining EPP. There used to be a whois.ai-based platform of expired domains in which those could be procured and auctioned every ten days through a standard online process. The last auctions of such kind closed there in December 2024; the platform had been scheduled for shutdown on 30 June 2025, but remained online in the months following that date. == Valuation == Domains cost depends on the registrar, with yearly fees ranging from US$140 (the base fee, as established by Anguilla) to $200. As of July 2025, the highest-valued .ai domain is an undisclosed one sold on 8 November 2023, on Escrow.com, for US$1,500,000—months after an initial $300,000 sale to the same buyer. Among the publicly disclosed ones, the most valued, fin.ai, was sold for $1,000,000 in March 2025. On 16 December 2017, the .ai registry started supporting the Extensible Provisioning Protocol (EPP) and migrated all of its domains onto an EPP system. Consequently, many registrars are allowed to sell .ai domains. Since that date, the .ai ccTLD has also been popular with artificial intelligence companies and organisations. Though such trends are primarily seen among new AI based companies or startups, many established AI and Tech companies preferred not to opt for .ai domains. For example, DeepMind has its domain retained at .com; Meta has redirected its facebook.ai domain to ai.meta.com. == Impact on Anguilla's economy == The registration fees earned from the .ai domains go to the treasury of the Government of Anguilla. As per a 2018 New York Times report, the total revenue generated out of selling .ai domains was $2.9 million. In 2023, Anguilla's government made about US$32 million from fees collected for registering .ai domains; that amounted to over 10% of gross domestic product for the territory. "In the years before the real breakthrough of AI, revenue from .ai domains made up less than 1% of our state income, by 2025 it will be around 47%," explained Jose Vanterpool, Minister of Infrastructure and Communications (MICUHITES), in an interview with BBC. The high 90% renewal rate of .ai domains and the 2025 renewal wave of domains registered in 2023 are driving another surge in state revenues, according to Domaintechnik.

    Read more →