Web data integration (WDI) is the process of aggregating and managing data from different websites into a single, homogeneous workflow. This process includes data access, transformation, mapping, quality assurance and fusion of data. Data that is sourced and structured from websites is referred to as "web data". WDI is an extension and specialization of data integration that views the web as a collection of heterogeneous databases. Data integration techniques in the context of the web, forms the foundation for businesses taking advantage of data available on the ever-increasing number of publicly-accessible websites. Corporate spending on this area amounted to about USD 2.5bn in 2017, and it is expected that by 2020 the market will reach almost USD 7bn.
Blackmagic Design
Blackmagic Design Pty Ltd is an Australian company that develops digital cinema technology and manufactures professional video production hardware and software. Headquartered in South Melbourne, it is known for producing high-end digital movie cameras and a range of broadcast and post-production equipment. The company also develops software applications, including the DaVinci Resolve application for non-linear video editing, color correction, color grading, visual effects, and audio post-production. == History == Blackmagic Design Pty Ltd was founded on 7 September 2001 by Grant Petty. Its first product, DeckLink, introduced in 2002, was a video capture card for macOS that supported uncompressed 10-bit video, marking a shift toward professional-grade yet affordable video workflows. Subsequent versions—including the DeckLink 2, Pro SDI, HD Plus, and Multibridge—added capabilities such as color correction, Windows support, and compatibility with major editing software like Adobe Premiere Pro, to broaden the product's appeal. At the 2012 NAB Show, Blackmagic announced its first Cinema Camera, a digital movie camera. Blackmagic made several acquisitions over the next decade. In 2009, it acquired da Vinci Systems, known for its color-grading tools. In 2010, it acquired Echolab's ATEM switcher line, in 2014, it added eyeon Software (developer of the Blackmagic Fusion compositing software) and London's Cintel (film scanning and restoration), and in 2016, it acquired Fairlight, an audio technology company known for its CMI synthesizers as well as mixing consoles. == Products == List of all products developed by the company. Editing, Color Correction and Audio Post Production DaVinci Resolve (free version) and DaVinci Resolve Studio (paid version), computer software for non-linear video editing, color correction, color grading, visual effects, and audio post-production. Audio/Video Controller Consoles: Editor Keyboard, Speed Editor, DaVinci Resolve Replay Editor, Micro Panel, Mini Panel, DaVinci Resolve Micro Color Panel, Advanced Panel, Fairlight Console Channel Fader, Fairlight Console Channel Control, Fairlight Console LCD Monitor, Fairlight Console Audio Editor, Fairlight Desktop Audio Editor, Fairlight Desktop Console, Fairlight Audio Interface Cintel Film Scanner (Generations 1-3) Live Production Home Streaming: ATEM Mini, ATEM Mini Pro/ISO, ATEM Mini Extreme, ATEM Mini Extreme ISO (The ATEM Mini series has both HDMI and SDI variants) Production Switchers: ATEM 1,2 & 4 M/E Constellation HD, ATEM 1,2 & 4 M/E Constellation 4K, ATEM Constellation 8K, ATEM 1,2 & 4 M/E Production Studio 4K, ATEM Television Studio HD8 & HD8 ISO Switcher & Camera Controllers: ATEM Camera Control Panel, ATEM 1 M/E Advanced Panel, ATEM 2 M/E Advanced Panel, ATEM 4 M/E Advanced Panel Chroma Keyers: Ultimatte 12 HD Mini, Ultimatte 12 HD, Ultimatte 12 4K, Ultimatte 12 8K Recording and Storage: HyperDeck Studio HD Mini, HyperDeck Studio HD Plus, HyperDeck Studio HD Plus, HyperDeck Studio 4K Pro, HyperDeck Extreme 8K HDR, HyperDeck Extreme 4K HDR, HyperDeck Extreme Control, HyperDeck Shuttle HD, Duplicator 4K, MultiDock 10G, Video Assist 7" 12G HDR, Video Assist 5" 12G HDR Capture and Playback UltraStudio: 3G, HD Mini, 4K Mini, 4K Extreme 3 DeckLink (PCIe cards): Mini Recorder, Mini Monitor, Mini Monitor 4K, Mini Recorder 4K, Duo 2 Mini, Duo 2, Quad 2, SDI 4K, Studio 4K, 4K Extreme 12G, 8K Pro, Quad HDMI Recorder Network Storage Cloud Store Cloud Pod Broadcast Converters Micro Converter: BiDirectional SDI/HDMI 3G wPSU, HDMI to SDI 3G wPSU, SDI to HDMI 3G wPSU, BiDirectional SDI/HDMI 3G, HDMI to SDI 3G, SDI to HDMI 3G Mini Converters: Audio to SDI, Optical Fiber 12G, SDI Multiplex 4K, Quad SDI to HDMI 4K, SDI Distribution 4K, SDI to Analog 4K, Audio to SDI 4K, SDI to Audio 4K, HDMI to SDI 6G, SDI to HDMI 6G Teranex Mini: SDI Distribution 12G, SDI to HDMI 12G, Audio to SDI 12G, SDI to Analog 12G, SDI to HDMI 8K HDR, SDI to DisplayPort 8K HDR 2110 IP Converters Routing and Distribution Videohub
Stochastic gradient descent
Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties (e.g. differentiable or subdifferentiable). It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient (calculated from the entire data set) by an estimate thereof (calculated from a randomly selected subset of the data). Especially in high-dimensional optimization problems this reduces the very high computational burden, achieving faster iterations in exchange for a lower convergence rate. The basic idea behind stochastic approximation can be traced back to the Robbins–Monro algorithm of the 1950s. Today, stochastic gradient descent has become an important optimization method in machine learning. == Background == Both statistical estimation and machine learning consider the problem of minimizing an objective function that has the form of a sum: Q ( w ) = 1 n ∑ i = 1 n Q i ( w ) , {\displaystyle Q(w)={\frac {1}{n}}\sum _{i=1}^{n}Q_{i}(w),} where the parameter w {\displaystyle w} that minimizes Q ( w ) {\displaystyle Q(w)} is to be estimated. Each summand function Q i {\displaystyle Q_{i}} is typically associated with the i {\displaystyle i} -th observation in the data set (used for training). In classical statistics, sum-minimization problems arise in least squares and in maximum-likelihood estimation (for independent observations). The general class of estimators that arise as minimizers of sums are called M-estimators. However, in statistics, it has been long recognized that requiring even local minimization is too restrictive for some problems of maximum-likelihood estimation. Therefore, contemporary statistical theorists often consider stationary points of the likelihood function (or zeros of its derivative, the score function, and other estimating equations). The sum-minimization problem also arises for empirical risk minimization. There, Q i ( w ) {\displaystyle Q_{i}(w)} is the value of the loss function at i {\displaystyle i} -th example, and Q ( w ) {\displaystyle Q(w)} is the empirical risk. When used to minimize the above function, a standard (or "batch") gradient descent method would perform the following iterations: w := w − η ∇ Q ( w ) = w − η n ∑ i = 1 n ∇ Q i ( w ) . {\displaystyle w:=w-\eta \,\nabla Q(w)=w-{\frac {\eta }{n}}\sum _{i=1}^{n}\nabla Q_{i}(w).} The step size is denoted by η {\displaystyle \eta } (sometimes called the learning rate in machine learning) and here " := {\displaystyle :=} " denotes the update of a variable in the algorithm. In many cases, the summand functions have a simple form that enables inexpensive evaluations of the sum-function and the sum gradient. For example, in statistics, one-parameter exponential families allow economical function-evaluations and gradient-evaluations. However, in other cases, evaluating the sum-gradient may require expensive evaluations of the gradients from all summand functions. When the training set is enormous and no simple formulas exist, evaluating the sums of gradients becomes very expensive, because evaluating the gradient requires evaluating all the summand functions' gradients. To economize on the computational cost at every iteration, stochastic gradient descent samples a subset of summand functions at every step. This is very effective in the case of large-scale machine learning problems. == Iterative method == In stochastic (or "on-line") gradient descent, the true gradient of Q ( w ) {\displaystyle Q(w)} is approximated by a gradient at a single sample: w := w − η ∇ Q i ( w ) . {\displaystyle w:=w-\eta \,\nabla Q_{i}(w).} As the algorithm sweeps through the training set, it performs the above update for each training sample. Several passes can be made over the training set until the algorithm converges. If this is done, the data can be shuffled for each pass to prevent cycles. Typical implementations may use an adaptive learning rate so that the algorithm converges. In pseudocode, stochastic gradient descent can be presented as : A compromise between computing the true gradient and the gradient at a single sample is to compute the gradient against more than one training sample (called a "mini-batch") at each step. This can perform significantly better than "true" stochastic gradient descent described, because the code can make use of vectorization libraries rather than computing each step separately as was first shown in where it was called "the bunch-mode back-propagation algorithm". It may also result in smoother convergence, as the gradient computed at each step is averaged over more training samples. The convergence of stochastic gradient descent has been analyzed using the theories of convex minimization and of stochastic approximation. Briefly, when the learning rates η {\displaystyle \eta } decrease with an appropriate rate, and subject to relatively mild assumptions, stochastic gradient descent converges almost surely to a global minimum when the objective function is convex or pseudoconvex, and otherwise converges almost surely to a local minimum. This is in fact a consequence of the Robbins–Siegmund theorem. == Linear regression == Suppose we want to fit a straight line y ^ = w 1 + w 2 x {\displaystyle {\hat {y}}=w_{1}+w_{2}x} to a training set with observations ( ( x 1 , y 1 ) , ( x 2 , y 2 ) … , ( x n , y n ) ) {\displaystyle ((x_{1},y_{1}),(x_{2},y_{2})\ldots ,(x_{n},y_{n}))} and corresponding estimated responses ( y ^ 1 , y ^ 2 , … , y ^ n ) {\displaystyle ({\hat {y}}_{1},{\hat {y}}_{2},\ldots ,{\hat {y}}_{n})} using least squares. The objective function to be minimized is Q ( w ) = ∑ i = 1 n Q i ( w ) = ∑ i = 1 n ( y ^ i − y i ) 2 = ∑ i = 1 n ( w 1 + w 2 x i − y i ) 2 . {\displaystyle Q(w)=\sum _{i=1}^{n}Q_{i}(w)=\sum _{i=1}^{n}\left({\hat {y}}_{i}-y_{i}\right)^{2}=\sum _{i=1}^{n}\left(w_{1}+w_{2}x_{i}-y_{i}\right)^{2}.} The last line in the above pseudocode for this specific problem will become: [ w 1 w 2 ] ← [ w 1 w 2 ] − η [ ∂ ∂ w 1 ( w 1 + w 2 x i − y i ) 2 ∂ ∂ w 2 ( w 1 + w 2 x i − y i ) 2 ] = [ w 1 w 2 ] − η [ 2 ( w 1 + w 2 x i − y i ) 2 x i ( w 1 + w 2 x i − y i ) ] . {\displaystyle {\begin{bmatrix}w_{1}\\w_{2}\end{bmatrix}}\leftarrow {\begin{bmatrix}w_{1}\\w_{2}\end{bmatrix}}-\eta {\begin{bmatrix}{\frac {\partial }{\partial w_{1}}}(w_{1}+w_{2}x_{i}-y_{i})^{2}\\{\frac {\partial }{\partial w_{2}}}(w_{1}+w_{2}x_{i}-y_{i})^{2}\end{bmatrix}}={\begin{bmatrix}w_{1}\\w_{2}\end{bmatrix}}-\eta {\begin{bmatrix}2(w_{1}+w_{2}x_{i}-y_{i})\\2x_{i}(w_{1}+w_{2}x_{i}-y_{i})\end{bmatrix}}.} Note that in each iteration or update step, the gradient is only evaluated at a single x i {\displaystyle x_{i}} . This is the key difference between stochastic gradient descent and batched gradient descent. In general, given a linear regression y ^ = ∑ k ∈ 1 : m w k x k {\displaystyle {\hat {y}}=\sum _{k\in 1:m}w_{k}x_{k}} problem, stochastic gradient descent behaves differently when m < n {\displaystyle m Silhouette is a method of interpretation and validation of consistency within clusters of data. The technique provides a succinct graphical representation of how well each object has been classified. It was proposed by Belgian statistician Peter Rousseeuw in 1987. The silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation). The silhouette value ranges from −1 to +1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters. If most objects have a high value, then the clustering configuration is appropriate. If many points have a low or negative value, then the clustering configuration may have too many or too few clusters. A clustering with an average silhouette width of over 0.7 is considered to be "strong", a value over 0.5 "reasonable", and over 0.25 "weak". However, with an increasing dimensionality of the data, it becomes difficult to achieve such high values because of the curse of dimensionality, as the distances become more similar. The silhouette score is specialized for measuring cluster quality when the clusters are convex-shaped, and may not perform well if the data clusters have irregular shapes or are of varying sizes. The silhouette value can be calculated with any distance metric, such as Euclidean distance or Manhattan distance. == Definition == Assume the data have been clustered via any technique, such as k-medoids or k-means, into k {\displaystyle k} clusters. For data point i ∈ C i {\displaystyle i\in C_{i}} (data point i {\displaystyle i} in the cluster C i {\displaystyle C_{i}} ), calculate a ( i ) {\displaystyle a(i)} , the average distance that i {\displaystyle i} is from all other points in that cluster: a ( i ) = 1 | C i | − 1 ∑ j ∈ C i , i ≠ j d ( i , j ) {\displaystyle a(i)={\frac {1}{|C_{i}|-1}}\sum _{j\in C_{i},i\neq j}d(i,j)} where | C i | {\displaystyle |C_{i}|} is the number of points belonging to cluster C i {\displaystyle C_{i}} , and d ( i , j ) {\displaystyle d(i,j)} is the distance between data points i {\displaystyle i} and j {\displaystyle j} in the cluster C i {\displaystyle C_{i}} (we divide by | C i | − 1 {\displaystyle |C_{i}|-1} because the distance d ( i , i ) {\displaystyle d(i,i)} is not included in the sum). a ( i ) {\displaystyle a(i)} can be interpreted as a measure of how well i {\displaystyle i} is assigned to its cluster (the smaller the value, the better the assignment). We then define the mean dissimilarity of point i {\displaystyle i} to some cluster C j {\displaystyle C_{j}} as the mean of the distance from i {\displaystyle i} to all points in C j {\displaystyle C_{j}} (where C j ≠ C i {\displaystyle C_{j}\neq C_{i}} ). For each data point i ∈ C i {\displaystyle i\in C_{i}} , we now define b ( i ) {\displaystyle b(i)} as the average distance between i {\displaystyle i} and the points in the closest cluster (hence: "min") that i {\displaystyle i} does not belong to: b ( i ) = min j ≠ i 1 | C j | ∑ l ∈ C j d ( i , l ) {\displaystyle b(i)=\min _{j\neq i}{\frac {1}{|C_{j}|}}\sum _{l\in C_{j}}d(i,l)} The cluster with the smallest mean dissimilarity is said to be the "neighboring cluster" of i {\displaystyle i} because it is the next best fit cluster for point i {\displaystyle i} . We now define a silhouette (value) of one data point i {\displaystyle i} s ( i ) = b ( i ) − a ( i ) max { a ( i ) , b ( i ) } {\displaystyle s(i)={\frac {b(i)-a(i)}{\max\{a(i),b(i)\}}}} , if | C i | > 1 {\displaystyle |C_{i}|>1} and s ( i ) = 0 {\displaystyle s(i)=0} , if | C i | = 1 {\displaystyle |C_{i}|=1} , which can also be written as s ( i ) = { 1 − a ( i ) b ( i ) , if a ( i ) < b ( i ) 0 , if a ( i ) = b ( i ) b ( i ) a ( i ) − 1 , if a ( i ) > b ( i ) {\displaystyle s(i)={\begin{cases}1-{\frac {a(i)}{b(i)}},&{\mbox{ if }}a(i)b(i)\\\end{cases}}} From the above definition, s ( i ) {\displaystyle s(i)} is bounded to the interval [ − 1 , 1 ] {\displaystyle [-1,1]} , i.e. − 1 ≤ s ( i ) ≤ 1. {\displaystyle -1\leq s(i)\leq 1.} Note that a ( i ) {\displaystyle a(i)} is not clearly defined for clusters with size = 1, in which case we set s ( i ) = 0 {\displaystyle s(i)=0} . This choice is arbitrary, but neutral in the sense that it is at the midpoint of the bounds, -1 and 1. For s ( i ) {\displaystyle s(i)} to be close to 1 we require a ( i ) ≪ b ( i ) {\displaystyle a(i)\ll b(i)} . As a ( i ) {\displaystyle a(i)} is a measure of how dissimilar i {\displaystyle i} is to its own cluster, a small value means it is well matched. Furthermore, a large b ( i ) {\displaystyle b(i)} implies that i {\displaystyle i} is badly matched to its neighbouring cluster. Thus an s ( i ) {\displaystyle s(i)} close to 1 means that the data is appropriately clustered. If s ( i ) {\displaystyle s(i)} is close to -1, then by the same logic we see that i {\displaystyle i} would be more appropriate if it was clustered in its neighbouring cluster. An s ( i ) {\displaystyle s(i)} near zero means that the datum is on the border of two natural clusters. The mean s ( i ) {\displaystyle s(i)} over all points of a cluster is a measure of how tightly grouped all the points in the cluster are. Thus the mean s ( i ) {\displaystyle s(i)} over all data of the entire dataset is a measure of how appropriately the data have been clustered. If there are too many or too few clusters, as may occur when a poor choice of k {\displaystyle k} is used in the clustering algorithm (e.g., k-means), some of the clusters will typically display much narrower silhouettes than the rest. Thus silhouette plots and means may be used to determine the natural number of clusters within a dataset. One can also increase the likelihood of the silhouette being maximized at the correct number of clusters by re-scaling the data using feature weights that are cluster specific. Kaufman et al. introduced the term silhouette coefficient for the maximum value of the mean s ( i ) {\displaystyle s(i)} over all data of the entire dataset, i.e., S C = max k s ~ ( k ) , {\displaystyle SC=\max _{k}{\tilde {s}}\left(k\right),} where s ~ ( k ) {\displaystyle {\tilde {s}}\left(k\right)} represents the mean s ( i ) {\displaystyle s(i)} over all data of the entire dataset for a specific number of clusters k {\displaystyle k} . The silhouette coefficient describes the best possible clustering possible for a given number of clusters, as measured by the highest average silhouette score for all points in the dataset. == Simplified and medoid silhouette == Computing the silhouette coefficient needs all O ( N 2 ) {\displaystyle {\mathcal {O}}(N^{2})} pairwise distances, making this evaluation much more costly than clustering with k-means. For a clustering with centers μ C I {\displaystyle \mu _{C_{I}}} for each cluster C I {\displaystyle C_{I}} , we can use the following simplified Silhouette for each point i ∈ C I {\displaystyle i\in C_{I}} instead, which can be computed using only O ( N k ) {\displaystyle {\mathcal {O}}(Nk)} distances: a ′ ( i ) = d ( i , μ C I ) {\displaystyle a'(i)=d(i,\mu _{C_{I}})} and b ′ ( i ) = min C J ≠ C I d ( i , μ C J ) {\displaystyle b'(i)=\min _{C_{J}\neq C_{I}}d(i,\mu _{C_{J}})} , which has the additional benefit that a ′ ( i ) {\displaystyle a'(i)} is always defined, then define accordingly the simplified silhouette and simplified silhouette coefficient s ′ ( i ) = b ′ ( i ) − a ′ ( i ) max { a ′ ( i ) , b ′ ( i ) } {\displaystyle s'(i)={\frac {b'(i)-a'(i)}{\max\{a'(i),b'(i)\}}}} S C ′ = max k 1 N ∑ i s ′ ( i ) {\displaystyle SC'=\max _{k}{\frac {1}{N}}\sum _{i}s'\left(i\right)} . If the cluster centers are medoids (as in k-medoids clustering) instead of arithmetic means (as in k-means clustering), this is also called the medoid-based silhouette or medoid silhouette. If every object is assigned to the nearest medoid (as in k-medoids clustering), we know that a ′ ( i ) ≤ b ′ ( i ) {\displaystyle a'(i)\leq b'(i)} , and hence s ′ ( i ) = b ′ ( i ) − a ′ ( i ) b ′ ( i ) = 1 − a ′ ( i ) b ′ ( i ) {\displaystyle s'(i)={\frac {b'(i)-a'(i)}{b'(i)}}=1-{\frac {a'(i)}{b'(i)}}} . == Silhouette clustering == Instead of using the average silhouette to evaluate a clustering obtained from, e.g., k-medoids or k-means, we can try to directly find a solution that maximizes the Silhouette. We do not have a closed form solution to maximize this, but it will usually be best to assign points to the nearest cluster as done by these methods. Van der Laan et al. proposed to adapt the standard algorithm for k-medoids, PAM, for this purpose and call this algorithm PAMSIL: Choose initial medoids by using PAM Compute the average silhouette of this initial solution For each pair of a medoid m and a non-medoid x swap m and x compute the average silhouette of the resulting solution remember the best swap un-swap m and x for the next iteration Perform the best swap and return to Vowpal Wabbit (VW) is an open-source fast online interactive machine learning system library and program developed originally at Yahoo! Research, and currently at Microsoft Research. It was started and is led by John Langford. Vowpal Wabbit's interactive learning support is particularly notable including Contextual Bandits, Active Learning, and forms of guided Reinforcement Learning. Vowpal Wabbit provides an efficient scalable out-of-core implementation with support for a number of machine learning reductions, importance weighting, and a selection of different loss functions and optimization algorithms. == Notable features == The VW program supports: Multiple supervised (and semi-supervised) learning problems: Classification (both binary and multi-class) Regression Active learning (partially labeled data) for both regression and classification Multiple learning algorithms (model-types / representations) OLS regression Matrix factorization (sparse matrix SVD) Single layer neural net (with user specified hidden layer node count) Searn (Search and Learn) Latent Dirichlet Allocation (LDA) Stagewise polynomial approximation Recommend top-K out of N One-against-all (OAA) and cost-sensitive OAA reduction for multi-class Weighted all pairs Contextual-bandit (with multiple exploration/exploitation strategies) Multiple loss functions: squared error quantile hinge logistic poisson Multiple optimization algorithms Stochastic gradient descent (SGD) BFGS Conjugate gradient Regularization (L1 norm, L2 norm, & elastic net regularization) Flexible input - input features may be: Binary Numerical Categorical (via flexible feature-naming and the hash trick) Can deal with missing values/sparse-features Other features On the fly generation of feature interactions (quadratic and cubic) On the fly generation of N-grams with optional skips (useful for word/language data-sets) Automatic test-set holdout and early termination on multiple passes bootstrapping User settable online learning progress report + auditing of the model Hyperparameter optimization == Scalability == Vowpal wabbit has been used to learn a tera-feature (1012) data-set on 1000 nodes in one hour. Its scalability is aided by several factors: Out-of-core online learning: no need to load all data into memory The hashing trick: feature identities are converted to a weight index via a hash (uses 32-bit MurmurHash3) Exploiting multi-core CPUs: parsing of input and learning are done in separate threads. Compiled C++ code igHome is a customizable start page introduced in 2012 as an alternative to iGoogle, the personal web portal launched by Google in May 2005. Just like iGoogle, igHome offers users the possibility to build a start page containing a central search box and a number of gadgets. igHome mimics the user interface of iGoogle. Registered igHome users can create multiple tabs and import RSS feeds. In computational geometry, the farthest-first traversal of a compact metric space is a sequence of points in the space, where the first point is selected arbitrarily and each successive point is as far as possible from the set of previously-selected points. The same concept can also be applied to a finite set of geometric points, by restricting the selected points to belong to the set or equivalently by considering the finite metric space generated by these points. For a finite metric space or finite set of geometric points, the resulting sequence forms a permutation of the points, also known as the greedy permutation. Every prefix of a farthest-first traversal provides a set of points that is widely spaced and close to all remaining points. More precisely, no other set of equally many points can be spaced more than twice as widely, and no other set of equally many points can be less than half as far to its farthest remaining point. In part because of these properties, farthest-point traversals have many applications, including the approximation of the traveling salesman problem and the metric k-center problem. They may be constructed in polynomial time, or (for low-dimensional Euclidean spaces) approximated in near-linear time. == Definition and properties == A farthest-first traversal is a sequence of points in a compact metric space, with each point appearing at most once. If the space is finite, each point appears exactly once, and the traversal is a permutation of all of the points in the space. The first point of the sequence may be any point in the space. Each point p after the first must have the maximum possible distance to the set of points earlier than p in the sequence, where the distance from a point to a set is defined as the minimum of the pairwise distances to points in the set. A given space may have many different farthest-first traversals, depending both on the choice of the first point in the sequence (which may be any point in the space) and on ties for the maximum distance among later choices. Farthest-point traversals may be characterized by the following properties. Fix a number k, and consider the prefix formed by the first k points of the farthest-first traversal of any metric space. Let r be the distance between the final point of the prefix and the other points in the prefix. Then this subset has the following two properties: All pairs of the selected points are at distance at least r from each other, and All points of the metric space are at distance at most r from the subset. Conversely any sequence having these properties, for all choices of k, must be a farthest-first traversal. These are the two defining properties of a Delone set, so each prefix of the farthest-first traversal forms a Delone set. == Applications == Rosenkrantz, Stearns & Lewis (1977) used the farthest-first traversal to define the farthest-insertion heuristic for the travelling salesman problem. This heuristic finds approximate solutions to the travelling salesman problem by building up a tour on a subset of points, adding one point at a time to the tour in the ordering given by a farthest-first traversal. To add each point to the tour, one edge of the previous tour is broken and replaced by a pair of edges through the added point, in the cheapest possible way. Although Rosenkrantz et al. prove only a logarithmic approximation ratio for this method, they show that in practice it often works better than other insertion methods with better provable approximation ratios. Later, the same sequence of points was popularized by Gonzalez (1985), who used it as part of greedy approximation algorithms for two problems in clustering, in which the goal is to partition a set of points into k clusters. One of the two problems that Gonzalez solve in this way seeks to minimize the maximum diameter of a cluster, while the other, known as the metric k-center problem, seeks to minimize the maximum radius, the distance from a chosen central point of a cluster to the farthest point from it in the same cluster. For instance, the k-center problem can be used to model the placement of fire stations within a city, in order to ensure that every address within the city can be reached quickly by a fire truck. For both clustering problems, Gonzalez chooses a set of k cluster centers by selecting the first k points of a farthest-first traversal, and then creates clusters by assigning each input point to the nearest cluster center. If r is the distance from the set of k selected centers to the next point at position k + 1 in the traversal, then with this clustering every point is within distance r of its center and every cluster has diameter at most 2r. However, the subset of k centers together with the next point are all at distance at least r from each other, and any k-clustering would put some two of these points into a single cluster, with one of them at distance at least r/2 from its center and with diameter at least r. Thus, Gonzalez's heuristic gives an approximation ratio of 2 for both clustering problems. Gonzalez's heuristic was independently rediscovered for the metric k-center problem by Dyer & Frieze (1985), who applied it more generally to weighted k-center problems. Another paper on the k-center problem from the same time, Hochbaum & Shmoys (1985), achieves the same approximation ratio of 2, but its techniques are different. Nevertheless, Gonzalez's heuristic, and the name "farthest-first traversal", are often incorrectly attributed to Hochbaum and Shmoys. For both the min-max diameter clustering problem and the metric k-center problem, these approximations are optimal: the existence of a polynomial-time heuristic with any constant approximation ratio less than 2 would imply that P = NP. As well as for clustering, the farthest-first traversal can also be used in another type of facility location problem, the max-min facility dispersion problem, in which the goal is to choose the locations of k different facilities so that they are as far apart from each other as possible. More precisely, the goal in this problem is to choose k points from a given metric space or a given set of candidate points, in such a way as to maximize the minimum pairwise distance between the selected points. Again, this can be approximated by choosing the first k points of a farthest-first traversal. If r denotes the distance of the kth point from all previous points, then every point of the metric space or the candidate set is within distance r of the first k − 1 points. By the pigeonhole principle, some two points of the optimal solution (whatever it is) must both be within distance r of the same point among these first k − 1 chosen points, and (by the triangle inequality) within distance 2r of each other. Therefore, the heuristic solution given by the farthest-first traversal is within a factor of two of optimal. Other applications of the farthest-first traversal include color quantization (clustering the colors in an image to a smaller set of representative colors), progressive scanning of images (choosing an order to display the pixels of an image so that prefixes of the ordering produce good lower-resolution versions of the whole image rather than filling in the image from top to bottom), point selection in the probabilistic roadmap method for motion planning, simplification of point clouds, generating masks for halftone images, hierarchical clustering, finding the similarities between polygon meshes of similar surfaces, choosing diverse and high-value observation targets for underwater robot exploration, fault detection in sensor networks, modeling phylogenetic diversity, matching vehicles in a heterogenous fleet to customer delivery requests, uniform distribution of geodetic observatories on the Earth's surface or of other types of sensor network, generation of virtual point lights in the instant radiosity computer graphics rendering method, and geometric range searching data structures. == Algorithms == === Greedy exact algorithm === The farthest-first traversal of a finite point set may be computed by a greedy algorithm that maintains the distance of each point from the previously selected points, performing the following steps: Initialize the sequence of selected points to the empty sequence, and the distances of each point to the selected points to infinity. While not all points have been selected, repeat the following steps: Scan the list of not-yet-selected points to find a point p that has the maximum distance from the selected points. Remove p from the not-yet-selected points and add it to the end of the sequence of selected points. For each remaining not-yet-selected point q, replace the distance stored for q by the minimum of its old value and the distance from p to q. For a set of n points, this algorithm takes O(n2) steps and O(n2) distance computations. === Approximations === A faster approximation algorithm, given by Har-Peled & Mendel (2006), applieSilhouette (clustering)
Vowpal Wabbit
IgHome
Farthest-first traversal