Digital goods or e-goods are intangible goods that exist in digital form. Examples are Wikipedia articles; digital media, such as e-books, downloadable music, internet radio, internet television and streaming media; fonts, logos, photos and graphics; digital subscriptions; online ads (as purchased by the advertiser); internet coupons; electronic tickets; electronically treated documentation in many different fields; downloadable software (Digital Distribution) and mobile apps; cloud-based applications and online games; virtual goods used within the virtual economies of online games and communities; community access; workbooks; worksheets; planners; e-learning (online courses); webinars, video tutorials, blog posts; cards; patterns; website themes and templates. == Legal concerns about digital goods == Special legal concerns regarding digital goods include copyright infringement and taxation. Also the question of the ownership (versus licensed use or service only) of purely digital goods is not finally resolved. For instance, the software installers of the digital software distributor gog.com are technically independent to the account but are still subject to the EULA, where a "licensed, not sold" formulation is used. Therefore, it is not clear if the software can be legally used after a hypothetical loss of the account; a question which was also raised before in practice for the similar service Steam. In July 2012, the European Court of Justice ruled in the case UsedSoft GMbH v. Oracle International Corp. that the sale of a software product, either through a physical support or download, constituted a transfer of ownership in EU law, thus the first sale doctrine applies; the ruling thereby breaks the "licensed, not sold" legal theory, but leaves open numerous questions. Therefore, it is also permissible to resell software licenses even if the digital good has been downloaded directly from the Internet, as the first-sale doctrine applied whenever software was originally sold to a customer for an unlimited amount of time, thus prohibiting any software maker from preventing the resale of their software by any of their legitimate owners. The court requires that the previous owner must no longer be able to use the licensed software after the resale, but finds that the practical difficulties in enforcing this clause should not be an obstacle to authorizing resale, as they are also present for software which can be installed from physical supports, where the first-sale doctrine is in force. In several cases, content providers have faced criticism for revoking access to digital goods due to expired licenses or the discontinuation of a product, such as ebooks (which resulted in a lawsuit against Amazon.com, Inc.), digital video (with Sony Interactive Entertainment revoking access to purchased StudioCanal content from its now-defunct PlayStation video store; a similar move involving Warner Bros. Discovery content was averted by an updated license agreement), and video games (such as Ubisoft discontinuing and revoking access to its game The Crew without providing refunds or the ability to redownload the game) In September 2024, the U.S. state of California implemented a consumer protection law that prohibits the use of terms such as "buy" or "purchase" during transactions involving digital goods if there is no way to obtain the purchases in a manner that cannot be revoked by the seller (such as allowing it to be downloaded for permanent, offline access), and requires a disclaimer to be displayed to the customer at the time of purchase.
Open-source robotics
Open-source robotics is a branch of robotics where robots are developed with open-source hardware and free and open-source software, publicly sharing blueprints, schematics, and source code. It is thus closely related to the open design movement, the maker movement and open science. == Requirements == Open source robotics means that information about the hardware is easily discerned, so that others can easily rebuild it. In turn, this requires design to use only easily available standard subcomponents and tools, and for the build process to be documented in detail including a bill of materials and detailed ('Ikea style') step-by-step building and testing instructions. (A CAD file alone is not sufficient, as it does not show the steps for performing or testing the build). These requirements are standard to open source hardware in general, and are formalised by various licences, certifications, especially those defined by the peer-reviewed journals Journal of Open Hardware and HardwareX. Licensing requirements for software are the same as for any open source software. But in addition, for software components to be of practical use in real robot systems, they need to be compatible with other software, usually as defined by some robotics middleware community standard. == Hardware systems == Applications to date include: Robot arms, e.g. PARA or Thor Wheeled mobile robots. e.g. OpenScout Four-legged robots such as the Open Dynamic Robot Initiative UAV quadcopters (drones) such as Agilicious Humanoid robots, e.g. iCub, Berkeley Humanoid Lite Self-driving cars, e.g. OpenPodcar (→ Personal rapid transit) Submersible robots, eg. OpenFish Laboratory robotics such as chemical liquid handling Vertical farming Swarm robots, e.g. HeRoSwarm Domestic tasks: vacuum cleaning, floor washing and grass mowing Robot sports including robot combat and autonomous racing Education == Hardware subcomponents == Most open source hardware definitions allow non-open subcomponents to be used in modular design, as long as they are easily available. However many designs try to push openness down into as many subcomponents as possible, with the aim of ultimately reaching fully open designs. Open hardware manual-drive vehicles and their subcomponents, such as from Open Source Ecology, are often used as starting points and extended with automation systems. Open subcomponents can include open-source computing hardware as subcomponents, such as Arduino and RISC-V, as well as open source motors and drivers such as the Open Source Motor Controller and ODrive. Open hardware robotics interface boards can simplify interfacing between middleware software and physical hardware. == Software subcomponents == === Middleware === Robotics middleware is software which links multiple other software components together. In robotics, this specifically means real-time communication systems with standardized message passing protocols. The predominant open source middleware is ROS2, the robot operating system, now as version 2. Other alternatives include ROS1, YARP — used in the iCub, URBI, and Orca. Open source middleware is usually run on an open source operating system, especially the Ubuntu distribution of Linux. === Driver software === Most robot sensors and actuators require software drivers. There is little standardization of open source software at this level, because each hardware device is different. Creating open drivers for closed hardware is difficult as it requires both low level programming and reverse engineering. === Simulation software === Open source robotics simulators include Gazebo, MuJoCo and Webots. Open source 3D game engines such as Godot are also sometimes used as simulators, when equipped with suitable middleware interfaces. === Automation software === At the level of AI, many standard algorithms have open source software implementations, mostly in ROS2. Major components include: Machine vision systems such as the YOLO object detector. 3D photogrammetry Navigation including SLAM and planning such as nav2 Arm inverse kinematics such as moveIt2 == Community == The first signs of the increasing popularity of building and sharing robot designs were found with the maker culture community. What began with small competitions for remote operated vehicles (e.g. Robot combat), soon developed to the building of autonomous telepresence robots such as Sparky and then true robots (being able to take decisions themselves) as the Open Automaton Project. Several commercial companies now also produce kits for making simple robots. The community has adopted open source hardware licenses, certifications, and peer-reviewed publications, which check that source has been made correctly and permanently available under community definitions, and which validate that this has been done. These processes have become critically important due to many historical projects claiming to be open source but them reverting on the promise due to commercialisation or other pressures. As with other forms of open source hardware, the community continues to debate precise criteria for 'ease of build'. A common standard is that designs should be buildable by a technical university student, in a few days, using typical fablab tools, but definitions of all of these subterms can also be debated. Compared to other forms of open source hardware, open source robotics typically includes a large software element, so involves software as well as hardware engineers. Open source concepts are more established in open source software than hardware, so robotics is a field in which those concepts can be shared and transferred from software to hardware. While the community in open source robotics is multi-faceted with a wide range of backgrounds, a sizable sub-community uses the ROS middleware and meets at the ROSCon conferences to discuss development of ROS itself and automation components built on it.
Data augmentation
Data augmentation is a statistical technique which allows maximum likelihood estimation from incomplete data. Data augmentation has important applications in Bayesian analysis, and the technique is widely used in machine learning to reduce overfitting when training machine learning models, achieved by training models on several slightly-modified copies of existing data. == Synthetic oversampling techniques for traditional machine learning == Synthetic Minority Over-sampling Technique (SMOTE) is a method used to address imbalanced datasets in machine learning. In such datasets, the number of samples in different classes varies significantly, leading to biased model performance. For example, in a medical diagnosis dataset with 90 samples representing healthy individuals and only 10 samples representing individuals with a particular disease, traditional algorithms may struggle to accurately classify the minority class. SMOTE rebalances the dataset by generating synthetic samples for the minority class. For instance, if there are 100 samples in the majority class and 10 in the minority class, SMOTE can create synthetic samples by randomly selecting a minority class sample and its nearest neighbors, then generating new samples along the line segments joining these neighbors. This process helps increase the representation of the minority class, improving model performance. == Data augmentation for image classification == When convolutional neural networks grew larger in mid-1990s, there was a lack of data to use, especially considering that some part of the overall dataset should be spared for later testing. It was proposed to perturb existing data with affine transformations to create new examples with the same labels, which were complemented by so-called elastic distortions in 2003, and the technique was widely used as of 2010s. Data augmentation can enhance CNN performance and acts as a countermeasure against CNN profiling attacks. Data augmentation has become fundamental in image classification, enriching training dataset diversity to improve model generalization and performance. The evolution of this practice has introduced a broad spectrum of techniques, including geometric transformations, color space adjustments, and noise injection. === Geometric Transformations === Geometric transformations alter the spatial properties of images to simulate different perspectives, orientations, and scales. Common techniques include: Affine Transformation Rotation: Rotating images by a specified degree to help models recognize objects at various angles. Reflection: Reflecting images horizontally or vertically to introduce variability in orientation. Translation: Shifting images in different directions to teach models positional invariance. Scaling Shear Mapping Cropping: Removing sections of the image to focus on particular features or simulate closer views. Elastic Distortion Morphing within the same class: Generating new samples by applying morphing techniques between two images belonging to the same class, thereby increasing intra-class diversity. === Color Space Transformations === Color space transformations modify the color properties of images, addressing variations in lighting, color saturation, and contrast. Techniques include: Brightness Adjustment: Varying the image's brightness to simulate different lighting conditions. Contrast Adjustment: Changing the contrast to help models recognize objects under various clarity levels. Saturation Adjustment: Altering saturation to prepare models for images with diverse color intensities. Color Jittering: Randomly adjusting brightness, contrast, saturation, and hue to introduce color variability. === Noise Injection === Injecting noise into images simulates real-world imperfections, teaching models to ignore irrelevant variations. Techniques involve: Gaussian Noise: Adding Gaussian noise mimics sensor noise or graininess. Salt and Pepper Noise: Introducing black or white pixels at random simulates sensor dust or dead pixels. == Data augmentation for signal processing == Residual or block bootstrap can be used for time series augmentation. === Biological signals === Synthetic data augmentation is of paramount importance for machine learning classification, particularly for biological data, which tend to be high dimensional and scarce. The applications of robotic control and augmentation in disabled and able-bodied subjects still rely mainly on subject-specific analyses. Data scarcity is notable in signal processing problems such as for Parkinson's Disease Electromyography signals, which are difficult to source - Zanini, et al. noted that it is possible to use a generative adversarial network (in particular, a DCGAN) to perform style transfer in order to generate synthetic electromyographic signals that corresponded to those exhibited by sufferers of Parkinson's Disease. The approaches are also important in electroencephalography (brainwaves). Wang, et al. explored the idea of using deep convolutional neural networks for EEG-Based Emotion Recognition, results show that emotion recognition was improved when data augmentation was used. A common approach is to generate synthetic signals by re-arranging components of real data. Lotte proposed a method of "Artificial Trial Generation Based on Analogy" where three data examples x 1 , x 2 , x 3 {\displaystyle x_{1},x_{2},x_{3}} provide examples and an artificial x s y n t h e t i c {\displaystyle x_{synthetic}} is formed which is to x 3 {\displaystyle x_{3}} what x 2 {\displaystyle x_{2}} is to x 1 {\displaystyle x_{1}} . A transformation is applied to x 1 {\displaystyle x_{1}} to make it more similar to x 2 {\displaystyle x_{2}} , the same transformation is then applied to x 3 {\displaystyle x_{3}} which generates x s y n t h e t i c {\displaystyle x_{synthetic}} . This approach was shown to improve performance of a Linear Discriminant Analysis classifier on three different datasets. Current research shows great impact can be derived from relatively simple techniques. For example, Freer observed that introducing noise into gathered data to form additional data points improved the learning ability of several models which otherwise performed relatively poorly. Tsinganos et al. studied the approaches of magnitude warping, wavelet decomposition, and synthetic surface EMG models (generative approaches) for hand gesture recognition, finding classification performance increases of up to +16% when augmented data was introduced during training. More recently, data augmentation studies have begun to focus on the field of deep learning, more specifically on the ability of generative models to create artificial data which is then introduced during the classification model training process. In 2018, Luo et al. observed that useful EEG signal data could be generated by Conditional Wasserstein Generative Adversarial Networks (GANs) which was then introduced to the training set in a classical train-test learning framework. The authors found classification performance was improved when such techniques were introduced. === Mechanical signals === The prediction of mechanical signals based on data augmentation brings a new generation of technological innovations, such as new energy dispatch, 5G communication field, and robotics control engineering. In 2022, Yang et al. integrate constraints, optimization and control into a deep network framework based on data augmentation and data pruning with spatio-temporal data correlation, and improve the interpretability, safety and controllability of deep learning in real industrial projects through explicit mathematical programming equations and analytical solutions.
Data-centric AI
Data-centric AI is an approach within artificial intelligence that emphasizes on improving the quality, consistency and representativeness of the data used to train machine learning models, rather than focusing primarily on optimizing model architectures or algorithms. This idea has gained traction as researchers and practitioners have come to believe that many performance limitations of machine learning systems stem from issues such as noisy labels, biased datasets, and lack of coverage in the data. Data-centric AI involves disciplined approach to data cleaning, augmentation, labeling, and governance that improves model performance and reliability in applications such as computer vision, natural language processing, and further.
Hidden layer
In artificial neural networks, a hidden layer is a layer of artificial neurons that is neither an input layer nor an output layer. The simplest examples appear in multilayer perceptrons (MLP), as illustrated in the diagram. An MLP without any hidden layer is essentially just a linear model. With hidden layers and activation functions, however, nonlinearity is introduced into the model. In typical machine learning practice, the weights and biases are initialized, then iteratively updated during training via backpropagation.
You Only Look Once
You Only Look Once (YOLO) is a series of real-time object detection systems based on convolutional neural networks. First introduced by Joseph Redmon et al. in 2015, YOLO has undergone several iterations and improvements, becoming one of the most popular object detection frameworks. The name "You Only Look Once" refers to the fact that the algorithm requires only one forward propagation pass through the neural network to make predictions, unlike previous region proposal-based techniques like R-CNN that require thousands for a single image. == Overview == Compared to previous methods like R-CNN and OverFeat, instead of applying the model to an image at multiple locations and scales, YOLO applies a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities. === OverFeat === OverFeat was an early influential model for simultaneous object classification and localization. Its architecture is as follows: Train a neural network for image classification only ("classification-trained network"). This could be one like the AlexNet. The last layer of the trained network is removed, and for every possible object class, initialize a network module at the last layer ("regression network"). The base network has its parameters frozen. The regression network is trained to predict the ( x , y ) {\displaystyle (x,y)} coordinates of two corners of the object's bounding box. During inference time, the classification-trained network is run over the same image over many different zoom levels and croppings. For each, it outputs a class label and a probability for that class label. Each output is then processed by the regression network of the corresponding class. This results in thousands of bounding boxes with class labels and probability. These boxes are merged until only one single box with a single class label remains. == Versions == There are two parts to the YOLO series. The original part contained YOLOv1, v2, and v3, all released on a website maintained by Joseph Redmon. === YOLOv1 === The original YOLO algorithm, introduced in 2015, divides the image into an S × S {\displaystyle S\times S} grid of cells. If the center of an object's bounding box falls into a grid cell, that cell is said to "contain" that object. Each grid cell predicts B bounding boxes and confidence scores for those boxes. These confidence scores reflect how confident the model is that the box contains an object and how accurate it thinks the box is that it predicts. In more detail, the network performs the same convolutional operation over each of the S 2 {\displaystyle S^{2}} patches. The output of the network on each patch is a tuple as follows: ( p 1 , … , p C , c 1 , x 1 , y 1 , w 1 , h 1 , … , c B , x B , y B , w B , h B ) {\displaystyle (p_{1},\dots ,p_{C},c_{1},x_{1},y_{1},w_{1},h_{1},\dots ,c_{B},x_{B},y_{B},w_{B},h_{B})} where p i {\displaystyle p_{i}} is the conditional probability that the cell contains an object of class i {\displaystyle i} , conditional on the cell containing at least one object. x j , y j , w j , h j {\displaystyle x_{j},y_{j},w_{j},h_{j}} are the center coordinates, width, and height of the j {\displaystyle j} -th predicted bounding box that is centered in the cell. Multiple bounding boxes are predicted to allow each prediction to specialize in one kind of bounding box. For example, slender objects might be predicted by j = 2 {\displaystyle j=2} while stout objects might be predicted by j = 1 {\displaystyle j=1} . c j {\displaystyle c_{j}} is the predicted intersection over union (IoU) of each bounding box with its corresponding ground truth. The network architecture has 24 convolutional layers followed by 2 fully connected layers. During training, for each cell, if it contains a ground truth bounding box, then only the predicted bounding boxes with the highest IoU with the ground truth bounding boxes is used for gradient descent. Concretely, let j {\displaystyle j} be that predicted bounding box, and let i {\displaystyle i} be the ground truth class label, then x j , y j , w j , h j {\displaystyle x_{j},y_{j},w_{j},h_{j}} are trained by gradient descent to approach the ground truth, p i {\displaystyle p_{i}} is trained towards 1 {\displaystyle 1} , other p i ′ {\displaystyle p_{i'}} are trained towards zero. If a cell contains no ground truth, then only c 1 , c 2 , … , c B {\displaystyle c_{1},c_{2},\dots ,c_{B}} are trained by gradient descent to approach zero. === YOLOv2 === Released in 2016, YOLOv2 (also known as YOLO9000) improved upon the original model by incorporating batch normalization, a higher resolution classifier, and using anchor boxes to predict bounding boxes. It could detect over 9000 object categories. It was also released on GitHub under the Apache 2.0 license. === YOLOv3 === YOLOv3, introduced in 2018, contained only "incremental" improvements, including the use of a more complex backbone network, multiple scales for detection, and a more sophisticated loss function. === YOLOv4 and beyond === Subsequent versions of YOLO (v4, v5, etc.) have been developed by different researchers, further improving performance and introducing new features. These versions are not officially associated with the original YOLO authors but build upon their work. As of 2026, versions up to YOLO26 have been released..
Sample complexity
The sample complexity of a machine learning algorithm represents the number of training-samples that it needs in order to successfully learn a target function. More precisely, the sample complexity is the number of training-samples that we need to supply to the algorithm, so that the function returned by the algorithm is within an arbitrarily small error of the best possible function, with probability arbitrarily close to 1. There are two variants of sample complexity: The weak variant fixes a particular input-output distribution; The strong variant takes the worst-case sample complexity over all input-output distributions. The No free lunch theorem, discussed below, proves that, in general, the strong sample complexity is infinite, i.e. that there is no algorithm that can learn the globally-optimal target function using a finite number of training samples. However, if we are only interested in a particular class of target functions (e.g., only linear functions) then the sample complexity is finite, and it depends linearly on the VC dimension on the class of target functions. == Definition == Let X {\displaystyle X} be a space which we call the input space, and Y {\displaystyle Y} be a space which we call the output space, and let Z {\displaystyle Z} denote the product X × Y {\displaystyle X\times Y} . For example, in the setting of binary classification, X {\displaystyle X} is typically a finite-dimensional vector space and Y {\displaystyle Y} is the set { − 1 , 1 } {\displaystyle \{-1,1\}} . Fix a hypothesis space H {\displaystyle {\mathcal {H}}} of functions h : X → Y {\displaystyle h\colon X\to Y} . A learning algorithm over H {\displaystyle {\mathcal {H}}} is a computable map from Z {\displaystyle Z} to H {\displaystyle {\mathcal {H}}} . In other words, it is an algorithm that takes as input a finite sequence of training samples and outputs a function from X {\displaystyle X} to Y {\displaystyle Y} . Typical learning algorithms include empirical risk minimization, without or with Tikhonov regularization. Fix a loss function L : Y × Y → R ≥ 0 {\displaystyle {\mathcal {L}}\colon Y\times Y\to \mathbb {R} _{\geq 0}} , for example, the square loss L ( y , y ′ ) = ( y − y ′ ) 2 {\displaystyle {\mathcal {L}}(y,y')=(y-y')^{2}} , where h ( x ) = y ′ {\displaystyle h(x)=y'} . For a given distribution ρ {\displaystyle \rho } on X × Y {\displaystyle X\times Y} , the expected risk of a hypothesis (a function) h ∈ H {\displaystyle h\in {\mathcal {H}}} is E ( h ) := E ρ [ L ( h ( x ) , y ) ] = ∫ X × Y L ( h ( x ) , y ) d ρ ( x , y ) {\displaystyle {\mathcal {E}}(h):=\mathbb {E} _{\rho }[{\mathcal {L}}(h(x),y)]=\int _{X\times Y}{\mathcal {L}}(h(x),y)\,d\rho (x,y)} In our setting, we have h = A ( S n ) {\displaystyle h={\mathcal {A}}(S_{n})} , where A {\displaystyle {\mathcal {A}}} is a learning algorithm and S n = ( ( x 1 , y 1 ) , … , ( x n , y n ) ) ∼ ρ n {\displaystyle S_{n}=((x_{1},y_{1}),\ldots ,(x_{n},y_{n}))\sim \rho ^{n}} is a sequence of vectors which are all drawn independently from ρ {\displaystyle \rho } . Define the optimal risk E H ∗ = inf h ∈ H E ( h ) . {\displaystyle {\mathcal {E}}_{\mathcal {H}}^{}={\underset {h\in {\mathcal {H}}}{\inf }}{\mathcal {E}}(h).} Set h n = A ( S n ) {\displaystyle h_{n}={\mathcal {A}}(S_{n})} , for each sample size n {\displaystyle n} . h n {\displaystyle h_{n}} is a random variable and depends on the random variable S n {\displaystyle S_{n}} , which is drawn from the distribution ρ n {\displaystyle \rho ^{n}} . The algorithm A {\displaystyle {\mathcal {A}}} is called consistent if E ( h n ) {\displaystyle {\mathcal {E}}(h_{n})} probabilistically converges to E H ∗ {\displaystyle {\mathcal {E}}_{\mathcal {H}}^{}} . In other words, for all ϵ , δ > 0 {\displaystyle \epsilon ,\delta >0} , there exists a positive integer N {\displaystyle N} , such that, for all sample sizes n ≥ N {\displaystyle n\geq N} , we have Pr ρ n [ E ( h n ) − E H ∗ ≥ ε ] < δ . {\displaystyle \Pr _{\rho ^{n}}[{\mathcal {E}}(h_{n})-{\mathcal {E}}_{\mathcal {H}}^{}\geq \varepsilon ]<\delta .} The sample complexity of A {\displaystyle {\mathcal {A}}} is then the minimum N {\displaystyle N} for which this holds, as a function of ρ , ϵ {\displaystyle \rho ,\epsilon } , and δ {\displaystyle \delta } . We write the sample complexity as N ( ρ , ϵ , δ ) {\displaystyle N(\rho ,\epsilon ,\delta )} to emphasize that this value of N {\displaystyle N} depends on ρ , ϵ {\displaystyle \rho ,\epsilon } , and δ {\displaystyle \delta } . If A {\displaystyle {\mathcal {A}}} is not consistent, then we set N ( ρ , ϵ , δ ) = ∞ {\displaystyle N(\rho ,\epsilon ,\delta )=\infty } . If there exists an algorithm for which N ( ρ , ϵ , δ ) {\displaystyle N(\rho ,\epsilon ,\delta )} is finite, then we say that the hypothesis space H {\displaystyle {\mathcal {H}}} is learnable. In others words, the sample complexity N ( ρ , ϵ , δ ) {\displaystyle N(\rho ,\epsilon ,\delta )} defines the rate of consistency of the algorithm: given a desired accuracy ϵ {\displaystyle \epsilon } and confidence δ {\displaystyle \delta } , one needs to sample N ( ρ , ϵ , δ ) {\displaystyle N(\rho ,\epsilon ,\delta )} data points to guarantee that the risk of the output function is within ϵ {\displaystyle \epsilon } of the best possible, with probability at least 1 − δ {\displaystyle 1-\delta } . In probably approximately correct (PAC) learning, one is concerned with whether the sample complexity is polynomial, that is, whether N ( ρ , ϵ , δ ) {\displaystyle N(\rho ,\epsilon ,\delta )} is bounded by a polynomial in 1 / ϵ {\displaystyle 1/\epsilon } and 1 / δ {\displaystyle 1/\delta } . If N ( ρ , ϵ , δ ) {\displaystyle N(\rho ,\epsilon ,\delta )} is polynomial for some learning algorithm, then one says that the hypothesis space H {\displaystyle {\mathcal {H}}} is PAC-learnable. This is a stronger notion than being learnable. == Unrestricted hypothesis space: infinite sample complexity == One can ask whether there exists a learning algorithm so that the sample complexity is finite in the strong sense, that is, there is a bound on the number of samples needed so that the algorithm can learn any distribution over the input-output space with a specified target error. More formally, one asks whether there exists a learning algorithm A {\displaystyle {\mathcal {A}}} , such that, for all ϵ , δ > 0 {\displaystyle \epsilon ,\delta >0} , there exists a positive integer N {\displaystyle N} such that for all n ≥ N {\displaystyle n\geq N} , we have sup ρ ( Pr ρ n [ E ( h n ) − E H ∗ ≥ ε ] ) < δ , {\displaystyle \sup _{\rho }\left(\Pr _{\rho ^{n}}[{\mathcal {E}}(h_{n})-{\mathcal {E}}_{\mathcal {H}}^{}\geq \varepsilon ]\right)<\delta ,} where h n = A ( S n ) {\displaystyle h_{n}={\mathcal {A}}(S_{n})} , with S n = ( ( x 1 , y 1 ) , … , ( x n , y n ) ) ∼ ρ n {\displaystyle S_{n}=((x_{1},y_{1}),\ldots ,(x_{n},y_{n}))\sim \rho ^{n}} as above. The No Free Lunch Theorem says that without restrictions on the hypothesis space H {\displaystyle {\mathcal {H}}} , this is not the case, i.e., there always exist "bad" distributions for which the sample complexity is arbitrarily large. Thus, in order to make statements about the rate of convergence of the quantity sup ρ ( Pr ρ n [ E ( h n ) − E H ∗ ≥ ε ] ) , {\displaystyle \sup _{\rho }\left(\Pr _{\rho ^{n}}[{\mathcal {E}}(h_{n})-{\mathcal {E}}_{\mathcal {H}}^{}\geq \varepsilon ]\right),} one must either constrain the space of probability distributions ρ {\displaystyle \rho } , e.g. via a parametric approach, or constrain the space of hypotheses H {\displaystyle {\mathcal {H}}} , as in distribution-free approaches. == Restricted hypothesis space: finite sample-complexity == The latter approach leads to concepts such as VC dimension and Rademacher complexity which control the complexity of the space H {\displaystyle {\mathcal {H}}} . A smaller hypothesis space introduces more bias into the inference process, meaning that E H ∗ {\displaystyle {\mathcal {E}}_{\mathcal {H}}^{}} may be greater than the best possible risk in a larger space. However, by restricting the complexity of the hypothesis space it becomes possible for an algorithm to produce more uniformly consistent functions. This trade-off leads to the concept of regularization. It is a theorem from VC theory that the following three statements are equivalent for a hypothesis space H {\displaystyle {\mathcal {H}}} : H {\displaystyle {\mathcal {H}}} is PAC-learnable. The VC dimension of H {\displaystyle {\mathcal {H}}} is finite. H {\displaystyle {\mathcal {H}}} is a uniform Glivenko-Cantelli class. This gives a way to prove that certain hypothesis spaces are PAC learnable, and by extension, learnable. === An example of a PAC-learnable hypothesis space === X = R d , Y = { − 1 , 1 } {\displaystyle X=\mathbb {R} ^{d},Y=\{-1,1\}} , and let H {\displaystyle {\mathcal {H}}} be the space of affine functions on X {\displaystyle X} , that is, functions of the form x ↦ ⟨ w , x ⟩ + b {\displaystyle x\mapsto \langl