The W3C Device Description Working Group (DDWG), operating as part of the World Wide Web Consortium (W3C) Mobile Web Initiative (MWI), was chartered to "foster the provision and access to device descriptions that can be used in support of Web-enabled applications that provide an appropriate user experience on mobile devices." Mobile devices exhibit the greatest diversity of capabilities, and therefore present the greatest challenge to content adaptation technologies. The group published several documents, including a list of requirements for an interface to a Device Description Repository (DDR) and a standard interface meeting those requirements. The group was rechartered in 2006 to work in public towards the development of the Application Programming Interface (API) for a DDR. Early in 2007, the group launched a wiki and a blog to add to the public mailing list. The group subsequently published a formal vocabulary of core device properties, and an API called the DDR Simple API, which became a W3C Recommendation in December 2008. The group closed at the end of 2008, but with the intention of maintaining the Web pages, blog and wiki through W3C volunteer effort. == Publications == The DDWG published several W3C Working Group Notes and one W3C Recommendation. A W3C WG Note that articulates what the W3C and other organizations are doing or have already done with regard to device information. This document suggests an environment in which these technologies work together to meet the goals of content adaptation. The completed document was published on 31 October 2007. A W3C WG Note describing the ecosystem surrounding creation, maintenance and use of device descriptions. The completed document was published on 31 October 2007. A W3C WG Note describing a set of requirements for a reference repository of device descriptions. The completed document was published on 17 December 2007. A W3C WG Note describing a process to manage contributions to an initial core vocabulary, identification of key device properties, a formal initial core vocabulary and the identification of a maintainer for the core vocabulary. The details were contained in the Working Group Note describing the DDWG Core Vocabulary published on 14 April 2008. A W3C WG Note defining useful grouping and structure patterns in device descriptions. The Device Description Structures document was published as a Working Draft on 5 December 2008. The intention is that this document will be future input to other W3C groups. A W3C Recommendation defining a language-neutral programming interface to a Device Description Repository. The DDR Simple API was published on 5 December 2008. There is the possibility of future publications on the DDWG wiki describing implementations of the API in various languages, including Java, IDL, WSDL, C# etc. Much of the DDWG's material was developed in public via the DDWG Wiki and through their public mailing lists.
Colloquis
Colloquis, previously known as ActiveBuddy and Conversagent, was a company that created conversation-based interactive agents originally distributed via instant messaging platforms. The company had offices in New York, New York, and Sunnyvale, California. == History == Founded in 2000, the company was the brainchild of Robert Hoffer, Timothy Kay, and Peter Levitan. The idea for interactive agents (also known as Internet bots) came from the team's vision to add functionality to increasingly popular instant messaging services. The original implementation took shape as a word-based adventure game but quickly grew to include a wide range of database applications, including access to news, weather, stock information, movie times, Yellow Pages listings, and detailed sports data, as well as a variety of tools (calculators, translator, etc.). These various applications were bundled into one entity and launched as SmarterChild in 2001. SmarterChild acted as a showcase for the quick data access and possibilities for fun conversation that the company planned to turn into customized, niche-specific products. The rapid success of SmarterChild led to targeted promotional products for Radiohead, Austin Powers, The Sporting News, and others. ActiveBuddy sought to strengthen its hold on the interactive agent market for the future by filing for, and receiving, a controversial patent on their creation in 2002. The company also released the BuddyScript SDK, a free developer kit that allow programmers to design and launch their own interactive agents using ActiveBuddy's proprietary scripting language, in 2002. Ultimately, however, the decline in ad spending in 2001 and 2002 led to a shift in corporate strategy towards business focused Automated Service Agents, building products for clients including Cingular, Comcast and Cox Communications. The company subsequently changed its name from ActiveBuddy to Conversagent in 2003, and then again to Colloquis in 2006. Colloquis was purchased by Microsoft in October 2006.
Confirmatory blockmodeling
Confirmatory blockmodeling is a deductive approach in blockmodeling, where a blockmodel (or part of it) is prespecify before the analysis, and then the analysis is fit to this model. When only a part of analysis is prespecify (like individual cluster(s) or location of the block types), it is called partially confirmatory blockmodeling. This is so-called indirect approach, where the blockmodeling is done on the blockmodel fitting (e.g., a priori hypothesized blockmodel). Opposite approach to the confirmatory blockmodeling is an inductive exploratory blockmodeling.
LogitBoost
In machine learning and computational learning theory, LogitBoost is a boosting algorithm formulated by Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The original paper casts the AdaBoost algorithm into a statistical framework. Specifically, if one considers AdaBoost as a generalized additive model and then applies the cost function of logistic regression, one can derive the LogitBoost algorithm. == Minimizing the LogitBoost cost function == LogitBoost can be seen as a convex optimization. Specifically, given that we seek an additive model of the form f = ∑ t α t h t {\displaystyle f=\sum _{t}\alpha _{t}h_{t}} the LogitBoost algorithm minimizes the logistic loss: ∑ i log ( 1 + e − y i f ( x i ) ) {\displaystyle \sum _{i}\log \left(1+e^{-y_{i}f(x_{i})}\right)}
Vapnik–Chervonenkis theory
Vapnik–Chervonenkis theory (also known as VC theory) was developed during 1960–1990 by Vladimir Vapnik and Alexey Chervonenkis. The theory is a form of computational learning theory, which attempts to explain the learning process from a statistical point of view. == Introduction == VC theory covers at least four parts (as explained in The Nature of Statistical Learning Theory): Theory of consistency of learning processes What are (necessary and sufficient) conditions for consistency of a learning process based on the empirical risk minimization principle? Nonasymptotic theory of the rate of convergence of learning processes How fast is the rate of convergence of the learning process? Theory of controlling the generalization ability of learning processes How can one control the rate of convergence (the generalization ability) of the learning process? Theory of constructing learning machines How can one construct algorithms that can control the generalization ability? VC Theory is a major subbranch of statistical learning theory. One of its main applications in statistical learning theory is to provide generalization conditions for learning algorithms. From this point of view, VC theory is related to stability, which is an alternative approach for characterizing generalization. In addition, VC theory and VC dimension are instrumental in the theory of empirical processes, in the case of processes indexed by VC classes. Arguably these are the most important applications of the VC theory, and are employed in proving generalization. Several techniques will be introduced that are widely used in the empirical process and VC theory. The discussion is mainly based on the book Weak Convergence and Empirical Processes: With Applications to Statistics. == Overview of VC theory in empirical processes == === Background on empirical processes === Let ( X , A ) {\displaystyle ({\mathcal {X}},{\mathcal {A}})} be a measurable space. For any measure Q {\displaystyle Q} on ( X , A ) {\displaystyle ({\mathcal {X}},{\mathcal {A}})} , and any measurable functions f : X → R {\displaystyle f:{\mathcal {X}}\to \mathbf {R} } , define Q f = ∫ f d Q {\displaystyle Qf=\int fdQ} Measurability issues will be ignored here, for more technical detail see. Let F {\displaystyle {\mathcal {F}}} be a class of measurable functions f : X → R {\displaystyle f:{\mathcal {X}}\to \mathbf {R} } and define: ‖ Q ‖ F = sup { | Q f | : f ∈ F } . {\displaystyle \|Q\|_{\mathcal {F}}=\sup\{\vert Qf\vert \ :\ f\in {\mathcal {F}}\}.} Let X 1 , … , X n {\displaystyle X_{1},\ldots ,X_{n}} be independent, identically distributed random elements of ( X , A ) {\displaystyle ({\mathcal {X}},{\mathcal {A}})} . Then define the empirical measure P n = n − 1 ∑ i = 1 n δ X i , {\displaystyle \mathbb {P} _{n}=n^{-1}\sum _{i=1}^{n}\delta _{X_{i}},} where δ here stands for the Dirac measure. The empirical measure induces a map F → R {\displaystyle {\mathcal {F}}\to \mathbf {R} } given by: f ↦ P n f = 1 n ( f ( X 1 ) + . . . + f ( X n ) ) {\displaystyle f\mapsto \mathbb {P} _{n}f={\frac {1}{n}}(f(X_{1})+...+f(X_{n}))} Now suppose P is the underlying true distribution of the data, which is unknown. Empirical Processes theory aims at identifying classes F {\displaystyle {\mathcal {F}}} for which statements such as the following hold: uniform law of large numbers: ‖ P n − P ‖ F → n 0 , {\displaystyle \|\mathbb {P} _{n}-P\|_{\mathcal {F}}{\underset {n}{\to }}0,} That is, as n → ∞ {\displaystyle n\to \infty } , | 1 n ( f ( X 1 ) + . . . + f ( X n ) ) − ∫ f d P | → 0 {\displaystyle \left|{\frac {1}{n}}(f(X_{1})+...+f(X_{n}))-\int fdP\right|\to 0} uniformly for all f ∈ F {\displaystyle f\in {\mathcal {F}}} . uniform central limit theorem: G n = n ( P n − P ) ⇝ G , in ℓ ∞ ( F ) {\displaystyle \mathbb {G} _{n}={\sqrt {n}}(\mathbb {P} _{n}-P)\rightsquigarrow \mathbb {G} ,\quad {\text{in }}\ell ^{\infty }({\mathcal {F}})} In the former case F {\displaystyle {\mathcal {F}}} is called Glivenko–Cantelli class, and in the latter case (under the assumption ∀ x , sup f ∈ F | f ( x ) − P f | < ∞ {\displaystyle \forall x,\sup \nolimits _{f\in {\mathcal {F}}}\vert f(x)-Pf\vert <\infty } ) the class F {\displaystyle {\mathcal {F}}} is called Donsker or P-Donsker. A Donsker class is Glivenko–Cantelli in probability by an application of Slutsky's theorem. These statements are true for a single f {\displaystyle f} , by standard LLN, CLT arguments under regularity conditions, and the difficulty in the Empirical Processes comes in because joint statements are being made for all f ∈ F {\displaystyle f\in {\mathcal {F}}} . Intuitively then, the set F {\displaystyle {\mathcal {F}}} cannot be too large, and as it turns out that the geometry of F {\displaystyle {\mathcal {F}}} plays a very important role. One way of measuring how big the function set F {\displaystyle {\mathcal {F}}} is to use the so-called covering numbers. The covering number N ( ε , F , ‖ ⋅ ‖ ) {\displaystyle N(\varepsilon ,{\mathcal {F}},\|\cdot \|)} is the minimal number of balls { g : ‖ g − f ‖ < ε } {\displaystyle \{g:\|g-f\|<\varepsilon \}} needed to cover the set F {\displaystyle {\mathcal {F}}} (here it is obviously assumed that there is an underlying norm on F {\displaystyle {\mathcal {F}}} ). The entropy is the logarithm of the covering number. Two sufficient conditions are provided below, under which it can be proved that the set F {\displaystyle {\mathcal {F}}} is Glivenko–Cantelli or Donsker. A class F {\displaystyle {\mathcal {F}}} is P-Glivenko–Cantelli if it is P-measurable with envelope F such that P ∗ F < ∞ {\displaystyle P^{\ast }F<\infty } and satisfies: ∀ ε > 0 sup Q N ( ε ‖ F ‖ Q , F , L 1 ( Q ) ) < ∞ . {\displaystyle \forall \varepsilon >0\quad \sup \nolimits _{Q}N(\varepsilon \|F\|_{Q},{\mathcal {F}},L_{1}(Q))<\infty .} The next condition is a version of Dudley's theorem. If F {\displaystyle {\mathcal {F}}} is a class of functions such that ∫ 0 ∞ sup Q log N ( ε ‖ F ‖ Q , 2 , F , L 2 ( Q ) ) d ε < ∞ {\displaystyle \int _{0}^{\infty }\sup \nolimits _{Q}{\sqrt {\log N\left(\varepsilon \|F\|_{Q,2},{\mathcal {F}},L_{2}(Q)\right)}}d\varepsilon <\infty } then F {\displaystyle {\mathcal {F}}} is P-Donsker for every probability measure P such that P ∗ F 2 < ∞ {\displaystyle P^{\ast }F^{2}<\infty } . In the last integral, the notation means ‖ f ‖ Q , 2 = ( ∫ | f | 2 d Q ) 1 2 {\displaystyle \|f\|_{Q,2}=\left(\int |f|^{2}dQ\right)^{\frac {1}{2}}} . === Symmetrization === The majority of the arguments about how to bound the empirical process rely on symmetrization, maximal and concentration inequalities, and chaining. Symmetrization is usually the first step of the proofs, and since it is used in many machine learning proofs on bounding empirical loss functions (including the proof of the VC inequality which is discussed in the next section). It is presented here: Consider the empirical process: f ↦ ( P n − P ) f = 1 n ∑ i = 1 n ( f ( X i ) − P f ) {\displaystyle f\mapsto (\mathbb {P} _{n}-P)f={\dfrac {1}{n}}\sum _{i=1}^{n}(f(X_{i})-Pf)} Turns out that there is a connection between the empirical and the following symmetrized process: f ↦ P n 0 f = 1 n ∑ i = 1 n ε i f ( X i ) {\displaystyle f\mapsto \mathbb {P} _{n}^{0}f={\dfrac {1}{n}}\sum _{i=1}^{n}\varepsilon _{i}f(X_{i})} The symmetrized process is a Rademacher process, conditionally on the data X i {\displaystyle X_{i}} . Therefore, it is a sub-Gaussian process by Hoeffding's inequality. Lemma (Symmetrization). For every nondecreasing, convex Φ: R → R and class of measurable functions F {\displaystyle {\mathcal {F}}} , E Φ ( ‖ P n − P ‖ F ) ≤ E Φ ( 2 ‖ P n 0 ‖ F ) {\displaystyle \mathbb {E} \Phi (\|\mathbb {P} _{n}-P\|_{\mathcal {F}})\leq \mathbb {E} \Phi \left(2\left\|\mathbb {P} _{n}^{0}\right\|_{\mathcal {F}}\right)} The proof of the Symmetrization lemma relies on introducing independent copies of the original variables X i {\displaystyle X_{i}} (sometimes referred to as a ghost sample) and replacing the inner expectation of the LHS by these copies. After an application of Jensen's inequality different signs could be introduced (hence the name symmetrization) without changing the expectation. The proof can be found below because of its instructive nature. The same proof method can be used to prove the Glivenko–Cantelli theorem. A typical way of proving empirical CLTs, first uses symmetrization to pass the empirical process to P n 0 {\displaystyle \mathbb {P} _{n}^{0}} and then argue conditionally on the data, using the fact that Rademacher processes are simple processes with nice properties. === VC Connection === It turns out that there is a fascinating connection between certain combinatorial properties of the set F {\displaystyle {\mathcal {F}}} and the entropy numbers. Uniform covering numbers can be controlled by the notion of Vapnik–Chervonenkis classes of sets – or shortly VC sets. Consider a collection C {\displaystyle {\mathcal {C}}} of subsets of the sample space X {\displaystyle
Pixel aspect ratio
A pixel aspect ratio (PAR) is a mathematical ratio that describes how the width of a pixel in a digital image compares to the height of that pixel. Most digital imaging systems display an image as a grid of tiny, square pixels. However, some imaging systems, especially those that must be compatible with standard-definition television motion pictures, display an image as a grid of rectangular pixels, in which the pixel width and height are different. Pixel aspect ratio describes this difference. Use of pixel aspect ratio mostly involves pictures pertaining to standard-definition television and some other exceptional cases. Most other imaging systems, including those that comply with SMPTE standards and practices, use square pixels. PAR is also known as sample aspect ratio and abbreviated SAR, though it can be confused with storage aspect ratio. == Introduction == The ratio of the width to the height of an image is known as the aspect ratio, or more precisely the display aspect ratio (DAR) – the aspect ratio of the image as displayed; for TV, DAR was traditionally 4:3 (a.k.a. fullscreen), with 16:9 (a.k.a. widescreen) now the standard for HDTV. In digital images, there is a distinction with the storage aspect ratio (SAR), which is the ratio of pixel dimensions. If an image is displayed with square pixels, then these ratios agree; if not, then non-square, "rectangular" pixels are used, and these ratios disagree. The aspect ratio of the pixels themselves is known as the pixel aspect ratio (PAR) – for square pixels this is 1:1 – and these are related by the identity: Rearranging (solving for PAR) yields: For example: A 640 × 480 VGA image has a SAR of 640/480 = 4:3, and if displayed on a 4:3 display (DAR = 4:3) has square pixels, hence a PAR of 1:1. By contrast, a 720 × 576 D-1 PAL image has a SAR of 720/576 = 5:4, but if displayed on a 4:3 display (DAR = 4:3) the PAR is 4/3 : 5/4 = 16:15 ≈ 1.066. This means that the pixels of the PAL picture must be "stretched" by this amount to fit in the 4:3 display. In analog images such as film there is no notion of pixel, nor notion of SAR or PAR, but in the digitization of analog images the resulting digital image has pixels, hence SAR (and accordingly PAR, if displayed at the same aspect ratio as the original). Non-square pixels arise often in early digital TV standards, related to digitalization of analog TV signals – whose vertical and "effective" horizontal resolutions differ and are thus best described by non-square pixels – and also in some digital video cameras and computer display modes, such as Color Graphics Adapter (CGA). Today they arise also in transcoding between resolutions with different SARs. Actual displays do not generally have non-square pixels, though digital sensors might; they are rather a mathematical abstraction used in resampling images to convert between resolutions. There are several complicating factors in understanding PAR, particularly as it pertains to digitization of analog video: First, analog video does not have pixels, but rather a raster scan, and thus has a well-defined vertical resolution (the lines of the raster), but not a well-defined horizontal resolution, since each line is an analog signal. However, by a standardized sampling rate, the effective horizontal resolution can be determined by the sampling theorem, as is done below. Second, due to overscan, some of the lines at the top and bottom of the raster are not visible, as are some of the possible image on the left and right – see Overscan: Analog to digital resolution issues. Also, the resolution may be rounded (DV NTSC uses 480 lines, rather than the 486 that are possible). Third, analog video signals are interlaced – each image (frame) is sent as two "fields", each with half the lines. Thus either the pixels are twice as tall as they would be without interlacing, or the image is deinterlaced. == Background == Video is presented as a sequential series of images called video frames. Historically, video frames were created and recorded in analog form. As digital display technology, digital broadcast technology, and digital video compression evolved separately, it resulted in video frame differences that must be addressed using pixel aspect ratio. Digital video frames are generally defined as a grid of pixels used to present each sequential image. The horizontal component is defined by pixels (or samples), and is known as a video line. The vertical component is defined by the number of lines, as in 480 lines. Standard-definition television standards and practices were developed as broadcast technologies and intended for terrestrial broadcasting, and were therefore not designed for digital video presentation. Such standards define an image as an array of well-defined horizontal "Lines", well-defined vertical "Line Duration" and a well-defined picture center. However, there is not a standard-definition television standard that properly defines image edges or explicitly demands a certain number of picture elements per line. Furthermore, analog video systems such as NTSC 480i and PAL 576i, instead of employing progressively displayed frames, employ fields or interlaced half-frames displayed in an interwoven manner to reduce flicker and double the image rate for smoother motion. === Analog-to-digital conversion === As a result of computers becoming powerful enough to serve as video editing tools, video digital-to-analog converters and analog-to-digital converters were made to overcome this incompatibility. To convert analog video lines into a series of square pixels, the industry adopted a default sampling rate at which luma values were extracted into pixels. The luma sampling rate for 480i pictures was 12+3⁄11 MHz and for 576i pictures was 14+3⁄4 MHz. The term pixel aspect ratio was first coined when ITU-R BT.601 (commonly known as Rec. 601) specified that standard-definition television pictures are made of lines of exactly 720 non-square pixels. ITU-R BT.601 did not define the exact pixel aspect ratio but did provide enough information to calculate the exact pixel aspect ratio based on industry practices: The standard luma sampling rate of precisely 13+1⁄2 MHz. Based on this information: The pixel aspect ratio for 480i would be 10:11 as: 12 3 11 ÷ 13 1 2 = 10 11 {\displaystyle 12{\tfrac {3}{11}}\div 13{\tfrac {1}{2}}={\tfrac {10}{11}}} The pixel aspect ratio for 576i would be 59:54 as: 14 3 4 ÷ 13 1 2 = 59 54 {\displaystyle 14{\tfrac {3}{4}}\div 13{\tfrac {1}{2}}={\tfrac {59}{54}}} SMPTE RP 187 further attempted to standardize the pixel aspect ratio values for 480i and 576i. It designated 177:160 for 480i or 1035:1132 for 576i. However, due to significant difference with practices in effect by industry and the computational load that they imposed upon the involved hardware, SMPTE RP 187 was simply ignored. SMPTE RP 187 information annex A.4 further suggested the use of 10:11 for 480i. As of this writing, ITU-R BT.601-6, which is the latest edition of ITU-R BT.601, still implies that the pixel aspect ratios mentioned above are correct. === Digital video processing === As stated above, ITU-R BT.601 specified that standard-definition television pictures are made of lines of 720 non-square pixels, sampled with a precisely specified sampling rate. A simple mathematical calculation reveals that a 704 pixel width would be enough to contain a 480i or 576i standard 4:3 picture: A 4:3 480-line picture, digitized with the Rec. 601-recommended sampling rate, would be 704 non-square pixels wide. x 480 × 10 11 = 4 3 ⇒ x = 480 × 11 × 4 10 × 3 = 704 {\displaystyle {\frac {x}{480}}\times {\frac {10}{11}}={\frac {4}{3}}\Rightarrow x={\frac {480\times 11\times 4}{10\times 3}}=704} A 4:3 576-line picture, digitized with the Rec. 601-recommended sampling rate, would be 702+54⁄59 non-square pixels wide. x 576 × 59 54 = 4 3 ⇒ x = 576 × 54 × 4 59 × 3 = 702 54 59 {\displaystyle {\frac {x}{576}}\times {\frac {59}{54}}={\frac {4}{3}}\Rightarrow x={\frac {576\times 54\times 4}{59\times 3}}=702{\tfrac {54}{59}}} Unfortunately, not all standard TV pictures are exactly 4:3: As mentioned earlier, in analog video, the center of a picture is well-defined but the edges of the picture are not standardized. As a result, some analog devices (mostly PAL devices but also some NTSC devices) generated motion pictures that were horizontally (slightly) wider. This also proportionately applies to anamorphic widescreen (16:9) pictures. Therefore, to maintain a safe margin of error, ITU-R BT.601 required sampling 16 more non-square pixels per line (8 more at each edge) to ensure saving all video data near the margins. This requirement, however, had implications for PAL motion pictures. PAL pixel aspect ratios for standard (4:3) and anamorphic wide screen (16:9), respectively 59:54 and 118:81, were awkward for digital image processing, especially for mixing PAL and NTSC video clips. Therefore, video editing products chose the almost equivalent value
Probably approximately correct learning
In computational learning theory, probably approximately correct (PAC) learning is a framework for mathematical analysis of machine learning. It was proposed in 1984 by Leslie Valiant. In this framework, the learner receives samples and must select a generalization function (called the hypothesis) from a certain class of possible functions. The goal is that, with high probability (the "probably" part), the selected function will have low generalization error (the "approximately correct" part). The learner must be able to learn the concept given any arbitrary approximation ratio, probability of success, or distribution of the samples. The model was later extended to treat noise (misclassified samples). An important innovation of the PAC framework is the introduction of computational complexity theory concepts to machine learning. In particular, the learner is expected to find efficient functions (time and space requirements bounded to a polynomial of the example size), and the learner itself must implement an efficient procedure (requiring an example count bounded to a polynomial of the concept size, modified by the approximation and likelihood bounds). == Definitions and terminology == In order to give the definition for something that is PAC-learnable, we first have to introduce some terminology. For the following definitions, two examples will be used. The first is the problem of character recognition given an array of n {\displaystyle n} bits encoding a binary-valued image. The other example is the problem of finding an interval that will correctly classify points within the interval as positive and the points outside of the range as negative. Let X {\displaystyle X} be a set called the instance space or the encoding of all the samples. In the character recognition problem, the instance space is X = { 0 , 1 } n {\displaystyle X=\{0,1\}^{n}} . In the interval problem the instance space, X {\displaystyle X} , is the set of all bounded intervals in R {\displaystyle \mathbb {R} } , where R {\displaystyle \mathbb {R} } denotes the set of all real numbers. A concept is a subset c ⊂ X {\displaystyle c\subset X} . One concept is the set of all patterns of bits in X = { 0 , 1 } n {\displaystyle X=\{0,1\}^{n}} that encode a picture of the letter "P". An example concept from the second example is the set of open intervals, { ( a , b ) ∣ 0 ≤ a ≤ π / 2 , π ≤ b ≤ 13 } {\displaystyle \{(a,b)\mid 0\leq a\leq \pi /2,\pi \leq b\leq {\sqrt {13}}\}} , each of which contains only the positive points. A concept class C {\displaystyle C} is a collection of concepts over X {\displaystyle X} . This could be the set of all subsets of the array of bits that are skeletonized 4-connected (width of the font is 1). Let EX ( c , D ) {\displaystyle \operatorname {EX} (c,D)} be a procedure that draws an example, x {\displaystyle x} , using a probability distribution D {\displaystyle D} and gives the correct label c ( x ) {\displaystyle c(x)} , that is 1 if x ∈ c {\displaystyle x\in c} and 0 otherwise. Now, given 0 < ϵ , δ < 1 {\displaystyle 0<\epsilon ,\delta <1} , assume there is an algorithm A {\displaystyle A} and a polynomial p {\displaystyle p} in 1 / ϵ , 1 / δ {\displaystyle 1/\epsilon ,1/\delta } (and other relevant parameters of the class C {\displaystyle C} ) such that, given a sample of size p {\displaystyle p} drawn according to EX ( c , D ) {\displaystyle \operatorname {EX} (c,D)} , then, with probability of at least 1 − δ {\displaystyle 1-\delta } , A {\displaystyle A} outputs a hypothesis h ∈ C {\displaystyle h\in C} that has an average error less than or equal to ϵ {\displaystyle \epsilon } on X {\displaystyle X} with the same distribution D {\displaystyle D} . Further if the above statement for algorithm A {\displaystyle A} is true for every concept c ∈ C {\displaystyle c\in C} and for every distribution D {\displaystyle D} over X {\displaystyle X} , and for all 0 < ϵ , δ < 1 {\displaystyle 0<\epsilon ,\delta <1} then C {\displaystyle C} is (efficiently) PAC learnable (or distribution-free PAC learnable). We can also say that A {\displaystyle A} is a PAC learning algorithm for C {\displaystyle C} . == Equivalence == Under some regularity conditions these conditions are equivalent: The concept class C is PAC learnable. The VC dimension of C is finite. C is a uniformly Glivenko-Cantelli class. C is compressible in the sense of Littlestone and Warmuth