Quantification (machine learning)

In machine learning, quantification (variously called learning to quantify, supervised prevalence estimation, or class prior estimation) is the task of using supervised learning to train models (quantifiers) that estimate the relative frequencies (also known as prevalence values) of the classes of interest in a sample of unlabelled data items. For instance, in a sample of 100,000 unlabelled tweets known to express opinions about a certain political candidate, a quantifier may be used to estimate the percentage of these tweets which belong to class 'Positive' (i.e., which manifest a positive stance towards this candidate), and to do the same for classes 'Neutral' and 'Negative'. Quantification may also be viewed as the task of training predictors that estimate a (discrete) probability distribution, i.e., that generate a predicted distribution that approximates the unknown true distribution of the items across the classes of interest. Quantification is different from classification, since the goal of classification is to predict the class labels of individual data items, while the goal of quantification is to predict the class prevalence values of sets of data items. Quantification is also different from regression, since in regression the training data items have real-valued labels, while in quantification the training data items have class labels. It has been shown in multiple research works that performing quantification by classifying all unlabelled instances and then counting the instances that have been attributed to each class (the 'classify and count' method) usually leads to suboptimal quantification accuracy. This suboptimality may be seen as a direct consequence of 'Vapnik's principle', which states: If you possess a restricted amount of information for solving some problem, try to solve the problem directly and never solve a more general problem as an intermediate step. 
It is possible that the available information is sufficient for a direct solution but is insufficient for solving a more general intermediate problem. In our case, the problem to be solved directly is quantification, while the more general intermediate problem is classification. As a result of the suboptimality of the 'classify and count' method, quantification has evolved as a task in its own right, different (in goals, methods, techniques, and evaluation measures) from classification. == Quantification tasks == === Quantification tasks according to the set of classes === The main variants of quantification, according to the characteristics of the set of classes used, are: Binary quantification, corresponding to the case in which there are only $n=2$ classes and each data item belongs to exactly one of them; Single-label multiclass quantification, corresponding to the case in which there are $n>2$ classes and each data item belongs to exactly one of them; Multi-label multiclass quantification, corresponding to the case in which there are $n\geq 2$ classes and each data item can belong to zero, one, or several classes at the same time; Ordinal quantification, corresponding to the single-label multiclass case in which a total order is defined on the set of classes; Regression quantification, a task which stands to 'standard' quantification as regression stands to classification. Strictly speaking, this last task is not a quantification task as defined above (since the individual items do not have class labels but are labelled by real values), but it has enough commonalities with other quantification tasks to be considered one of them. Most known quantification methods address the binary case or the single-label multiclass case, and only a few of them address the multi-label, ordinal, and regression cases. Binary-only methods include the Mixture Model (MM) method, the HDy method, SVM(KLD), and SVM(Q). 
Methods that can deal with both the binary case and the single-label multiclass case include probabilistic classify and count (PCC), adjusted classify and count (ACC), probabilistic adjusted classify and count (PACC), the Saerens-Latinne-Decaestecker EM-based method (SLD), and KDEy. Methods for multi-label quantification include regression-based quantification (RQ) and label powerset-based quantification (LPQ). Methods for the ordinal case include ordinal versions of the above-mentioned ACC, PACC, SLD, and HDy methods. Methods for the regression case include 'Regress and splice' and 'Adjusted regress and sum'. === Quantification tasks according to the type of data === Several subtasks of quantification may be identified according to the type of data involved. Examples of such tasks are: Quantification of networked data. This task consists of performing quantification when the datapoints are members of a relation, i.e., are interlinked. As such, this task is a close relative of collective classification. Quantification over time. This task consists of performing quantification on sets that become available in a temporal sequence, i.e., as a data stream, and finds application in contexts in which class prevalence values must be monitored over time. == Evaluation measures for quantification == Several evaluation measures can be used for evaluating the error of a quantification method. Since quantification consists of generating a predicted probability distribution that estimates a true probability distribution, these evaluation measures are ones that compare two probability distributions. Most evaluation measures for quantification belong to the class of divergences. 
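As an informal illustration of what it means to compare a predicted class distribution with a true one, the sketch below computes three such measures in plain Python. The function names, the averaging over classes, and the small smoothing constant `eps` used to avoid taking the logarithm of zero are assumptions made for this example and are not taken from any particular library.

```python
import math


def absolute_error(true_prev, pred_prev):
    """Mean absolute difference between true and predicted prevalence values."""
    return sum(abs(p - q) for p, q in zip(true_prev, pred_prev)) / len(true_prev)


def squared_error(true_prev, pred_prev):
    """Mean squared difference between true and predicted prevalence values."""
    return sum((p - q) ** 2 for p, q in zip(true_prev, pred_prev)) / len(true_prev)


def kl_divergence(true_prev, pred_prev, eps=1e-9):
    """Kullback-Leibler divergence of the prediction from the true distribution.

    Both distributions are smoothed with a small eps (an assumption of this
    sketch) so that zero prevalence values do not cause log(0) or division by 0.
    """
    n = len(true_prev)
    total = 0.0
    for p, q in zip(true_prev, pred_prev):
        p_s = (p + eps) / (1.0 + eps * n)
        q_s = (q + eps) / (1.0 + eps * n)
        total += p_s * math.log(p_s / q_s)
    return total


# Hypothetical prevalence values for classes Positive, Neutral, Negative.
true_prevalence = [0.5, 0.3, 0.2]
predicted_prevalence = [0.4, 0.4, 0.2]

print(absolute_error(true_prevalence, predicted_prevalence))
print(squared_error(true_prevalence, predicted_prevalence))
print(kl_divergence(true_prevalence, predicted_prevalence))
```

All three functions return 0 when the predicted distribution equals the true one and grow as the two distributions move apart.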
Evaluation measures for binary quantification, single-label multiclass quantification, and multi-label quantification are: Absolute Error; Squared Error; Relative Absolute Error; Kullback–Leibler divergence; Pearson Divergence. Evaluation measures for ordinal quantification are: Normalized Match Distance (a particular case of the Earth Mover's Distance); Root Normalized Order-Aware Distance. == Applications == Quantification is of special interest in fields such as the social sciences, epidemiology, market research, allocating resources, and ecological modelling, since these fields are inherently concerned with aggregate data. However, quantification is also useful as a building block for solving other downstream tasks, such as improving the accuracy of classifiers on out-of-distribution data, measuring classifier bias and ranker bias, and estimating the accuracy of classifiers on out-of-distribution data. == Resources == LQ 2021: the 1st International Workshop on Learning to Quantify; LQ 2022: the 2nd International Workshop on Learning to Quantify; LQ 2023: the 3rd International Workshop on Learning to Quantify; LQ 2024: the 4th International Workshop on Learning to Quantify; LQ 2025: the 5th International Workshop on Learning to Quantify; LeQua 2022: the 1st Data Challenge on Learning to Quantify; LeQua 2024: the 2nd Data Challenge on Learning to Quantify; QuaPy: an open-source Python-based software library for quantification; QuantificationLib: a Python library for quantification and prevalence estimation.

Adobe PhotoDeluxe

PhotoDeluxe was a consumer-oriented image editing software line published by Adobe Systems from 1996 until July 8, 2002. At that time it was replaced by Adobe's newly launched consumer-oriented image editing software, Photoshop Elements. Adobe no longer provides technical support for the PhotoDeluxe software line. PhotoDeluxe had a range of image processing capabilities for the home photographer and image handler. These included removing red-eye, cropping, and adjusting brightness, contrast, and sharpness. It also included software to extract pictures from an image scanner. Its functionality included the ability to dynamically resize photos and export them in a wide range of formats. It also had a range of printing options, including printing multiple copies of an image on the same page. It was often bundled free with Epson scanners or as free software with new computers. == Features == Despite critical concerns regarding the quality of the setup, PhotoDeluxe supported layering, blurs, sharpening, cloning, gradient fills, color and background switches, color variations, resizing options, and many other features. Another drawback of PhotoDeluxe was that it was designed for Mac computers, so using it on a Windows PC was a problem for those who were unable to customize their preferences. == Versions == === Adobe PhotoDeluxe 1.0 === The first version was released in 1996 for Windows and Macintosh computers. In one year, it sold over one million copies. === Adobe PhotoDeluxe 2.0 === The new version was released in 1997 and added features such as a Clone Tool, red-eye removal, and sample templates for making posters, cards, and calendars. It also had new special effect features. === Adobe PhotoDeluxe 3.0 === The third version was released in 1998. The new features included customizable clipart settings, the ability to import photos on the web, enhanced repair activities following Guided Activities, and Adobe Connectables to add new activities. 
=== Adobe PhotoDeluxe Home Edition (4.0) === Version 4.0 was created by the makers of Photoshop. It had advanced abilities such as tools to add animation, voice, and music to a picture. It also had features to restore photos to their original position. == History == Adobe PhotoDeluxe 1.0 was released in 1996 for Macintosh computers, initially retailing for an MSRP of $49. The software did quite well, reportedly selling over a million copies by February of the next year, primarily due to bundles with companies like Apple and Hewlett-Packard. PhotoDeluxe was primarily advertised to consumers as a way to do basic photo manipulation, such as cropping and rotating images, or creating simple cards and calendars. PhotoDeluxe 2.0 was released in 1997, and was the last version of PhotoDeluxe that Adobe made that worked on Macs. PhotoDeluxe 2.0 became the "number one selling consumer photo-editing software product in the world." PhotoDeluxe 3.0 was released in 1998, where it was rebranded as "3.0 Home Edition", as Adobe released PhotoDeluxe Business Edition later that year for a higher price. PhotoDeluxe Home Edition, unofficially called PhotoDeluxe 4.0, was released in 1999 and was the last version of PhotoDeluxe to be released. Adobe officially cancelled PhotoDeluxe on July 8, 2002, citing the presence of Photoshop and Photoshop Elements, with support being officially cancelled in mid-2003. No version of PhotoDeluxe is compatible with Windows 10, rendering the program obsolete. == Pricing == All home versions of PhotoDeluxe retailed for an MSRP of $49. PhotoDeluxe 2.0 and onwards allowed users to upgrade from a previous version of PhotoDeluxe or a competing piece of graphics software for $39. Additionally PhotoDeluxe Business Edition allowed a similar deal, allowing users to upgrade from other versions of PhotoDeluxe or a competing software for $59, instead of its normal price of $99. 
Adobe also offered a bundle allowing users of 1.0 or 2.0 to get 3.0 and Business Edition for $79.

CU-RTC-WEB

Customizable, Ubiquitous Real Time Communication over the Web (CU-RTC-WEB) is an API definition being drafted by Bernard Aboba at Microsoft. It is a competing standard to WebRTC, which has been drafted by a World Wide Web Consortium working group since May 2011. As of 2024, CU-RTC-WEB is still in the drafting phase, with ongoing discussions and contributions from various stakeholders in the tech community. Bernard Aboba, who serves as a co-chair of the W3C WebRTC Working Group, is actively involved in both CU-RTC-WEB and WebRTC, indicating a commitment to advancing real-time communication standards across platforms.

Influence-for-hire

Influence-for-hire, or collective influence, refers to the economy that has emerged around buying and selling influence on social media platforms. == Overview == Companies that engage in the influence-for-hire industry range from content farms to high-end public relations agencies. Traditionally, influence operations have largely been confined to public sector actors such as intelligence agencies; in the influence-for-hire industry, the groups conducting the operations are private, with commerce being their primary consideration. However, many of the clients in the influence-for-hire industry are countries or countries acting through proxies. Influence-for-hire companies are often located in countries with less expensive digital labor. == History == In May 2021, Facebook took a Ukrainian influence-for-hire network offline. Facebook attributed the network to organizations and consultants linked to Ukrainian politicians including Andriy Derkach. During the COVID-19 pandemic, state-sponsored misinformation was spread through influence-for-hire networks. In August 2021, a report published by the Australian Strategic Policy Institute implicated the Chinese government and the ruling Chinese Communist Party in campaigns of online manipulation conducted against Australia and Taiwan using influence-for-hire.

X2 transceiver

The X2 transceiver format is a 10 gigabit per second modular fiber optic interface intended for use in routers, switches, and optical transport platforms. It is an early-generation 10 gigabit interface related to the similar XENPAK and XPAK formats. X2 may be used with 10 Gigabit Ethernet or OC-192/STM-64 speed SDH/SONET equipment. X2 modules are smaller and consume less power than first-generation XENPAK modules, but are larger and consume more power than modules of the newer XFP and SFP+ standards. As of 2016, this format is relatively uncommon and has been replaced by 10 Gbit/s SFP+ in most new equipment.

Language identification

In natural language processing, language identification or language guessing is the problem of determining which natural language a given content is in. Computational approaches to this problem view it as a special case of text categorization, solved with various statistical methods. == Overview == === Logical approach === A common non-statistical intuitive approach (though highly uncertain) is to look for common letter combinations, or distinctive diacritics or punctuation. === Statistical approach === There are several statistical approaches to language identification. An older statistical method by Grefenstette was based on the frequency of short n-grams, which are often function morphemes. For example, "ing" is more common in English than in French, while the sequence "que" is more common in French. Given a new page found on the Web, one counts the number of occurrences of each such short sequence and picks the language whose frequency table it matches the most. One technique is to compare the compressibility of the text to the compressibility of texts in a set of known languages. This approach is known as mutual information based distance measure. The same technique can also be used to empirically construct family trees of languages which closely correspond to the trees constructed using historical methods. Mutual information based distance measure is essentially equivalent to more conventional model-based methods and is not generally considered to be either novel or better than simpler techniques. Another technique, as described by Cavnar and Trenkle (1994) and Dunning (1994) is to create a language n-gram model from a "training text" for each of the languages. These models can be based on characters (Cavnar and Trenkle) or encoded bytes (Dunning); in the latter, language identification and character encoding detection are integrated. 
Then, for any piece of text needing to be identified, a similar model is made, and that model is compared to each stored language model. The most likely language is the one with the model that is most similar to the model from the text needing to be identified. This approach can be problematic when the input text is in a language for which there is no model. In that case, the method may return another, "most similar" language as its result. Also problematic for any approach are pieces of input text that are composed of several languages, as is common on the Web. As of 2025, a commonly used baseline method is the fastText library, whose classification accuracy is comparable to that of deep learning techniques while being much faster. == Identifying similar languages == One of the main bottlenecks of language identification systems is distinguishing between closely related languages. Similar languages such as Bulgarian and Macedonian, or Indonesian and Malay, present significant lexical and structural overlap, making it challenging for systems to discriminate between them. In 2014, the DSL shared task was organized, providing a dataset (Tan et al., 2014) containing 13 different languages (and language varieties) in six language groups: Group A (Bosnian, Croatian, Serbian), Group B (Indonesian, Malaysian), Group C (Czech, Slovak), Group D (Brazilian Portuguese, European Portuguese), Group E (Peninsular Spanish, Argentine Spanish), Group F (American English, British English). The best system reached a performance of over 95% (Goutte et al., 2014). Results of the DSL shared task are described in Zampieri et al. 2014. == Software == Apache OpenNLP includes a character n-gram-based statistical detector and comes with a model that can distinguish 103 languages; Apache Tika contains a language detector for 18 languages.
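The n-gram model comparison described under the statistical approach can be sketched roughly as follows. This is a minimal illustration, not the exact procedure of Cavnar and Trenkle or of Dunning: it builds character trigram count profiles and compares them with cosine similarity, which is a similarity measure chosen here for simplicity. The tiny training texts and all function names are hypothetical.

```python
import math
from collections import Counter


def ngram_profile(text, n=3):
    """Count the character n-grams of a text (lower-cased)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_language(text, models, n=3):
    """Return the language whose stored profile is most similar to the text."""
    profile = ngram_profile(text, n)
    return max(models, key=lambda lang: cosine_similarity(profile, models[lang]))


# Hypothetical, deliberately tiny training texts; real systems use large corpora.
TRAINING_TEXTS = {
    "en": "the king is singing and the ring is shining in the morning",
    "fr": "la reine que je vois est celle que tu cherches et que nous aimons",
}
MODELS = {lang: ngram_profile(text) for lang, text in TRAINING_TEXTS.items()}

print(identify_language("the king is singing", MODELS))
print(identify_language("que tu cherches", MODELS))
```

As noted above, a function like `identify_language` always returns one of the known languages, so text in a language without a stored model is silently assigned to whichever model happens to be most similar.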

Foreground detection

Foreground detection is one of the major tasks in the field of computer vision and image processing whose aim is to detect changes in image sequences. Background subtraction is any technique which allows an image's foreground to be extracted for further processing (object recognition etc.). Many applications do not need to know everything about the evolution of movement in a video sequence, but only require the information of changes in the scene, because an image's regions of interest are objects (humans, cars, text etc.) in its foreground. After the stage of image preprocessing (which may include image denoising, post processing like morphology etc.) object localisation is required which may make use of this technique. Foreground detection separates foreground from background based on these changes taking place in the foreground. It is a set of techniques that typically analyze video sequences recorded in real time with a stationary camera. == Description == All detection techniques are based on modelling the background of the image, i.e., setting the background and detecting which changes occur. Defining the background can be difficult when it contains shapes, shadows, and moving objects. In defining the background, it is assumed that stationary objects may vary in color and intensity over time. Scenarios in which these techniques apply tend to be very diverse. There can be highly variable sequences, such as images with different lighting, interiors, exteriors, quality, and noise. In addition to real-time processing, systems need to adapt to these changes. A foreground detection system should be able to: Develop a background model (estimate). Be robust to lighting changes, repetitive movements (leaves, waves, shadows), and long-term changes. == Background subtraction == Background subtraction is a widely used approach for detecting moving objects in videos from static cameras. 
The rationale of the approach is to detect the moving objects from the difference between the current frame and a reference frame, often called the "background image" or "background model". Background subtraction is mostly done when the image in question is part of a video stream. Background subtraction provides important cues for numerous applications in computer vision, for example surveillance tracking or human pose estimation. Background subtraction is generally based on a static background hypothesis, which is often not applicable in real environments. With indoor scenes, reflections or animated images on screens lead to background changes. Similarly, due to wind, rain, or illumination changes brought by weather, static background methods have difficulties with outdoor scenes. == Temporal average filter == The temporal average filter is a method proposed by Velastin. This system estimates the background model from the median of all pixels of a number of previous images. The system uses a buffer with the pixel values of the last frames to update the median for each image. To model the background, the system examines all images in a given time period called the training time. During this time, images are only displayed, and the median of all the frames in the training period is computed pixel by pixel. After the training period, for each new frame, each pixel value is compared with the background value previously calculated. If the input pixel is within a threshold of that value, the pixel is considered to match the background model and its value is included in the buffer. Otherwise, if the value is outside this threshold, the pixel is classified as foreground and is not included in the buffer. This method cannot be considered very efficient, because it does not have a rigorous statistical basis and requires a buffer that has a high computational cost. 
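The median-based procedure described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: frames are small grayscale images given as lists of rows, and the class name, the `threshold` value, and the `buffer_size` value are hypothetical choices for this example rather than part of the original method description.

```python
from collections import deque
from statistics import median


class TemporalMedianBackground:
    """Per-pixel median background model with a bounded buffer of past values."""

    def __init__(self, training_frames, threshold=20, buffer_size=5):
        height = len(training_frames[0])
        width = len(training_frames[0][0])
        self.threshold = threshold
        # One buffer per pixel, filled from the training frames.
        # If there are more training frames than buffer_size, only the
        # most recent values are kept.
        self.buffers = [
            [
                deque((frame[y][x] for frame in training_frames), maxlen=buffer_size)
                for x in range(width)
            ]
            for y in range(height)
        ]

    def background(self):
        """Current background estimate: the median of each pixel's buffer."""
        return [[median(buf) for buf in row] for row in self.buffers]

    def apply(self, frame):
        """Return a mask (1 = foreground, 0 = background) and update buffers."""
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, value in enumerate(row):
                buf = self.buffers[y][x]
                if abs(value - median(buf)) <= self.threshold:
                    buf.append(value)  # matches the background: keep the value
                    mask_row.append(0)
                else:
                    mask_row.append(1)  # foreground: not added to the buffer
            mask.append(mask_row)
        return mask


# Hypothetical 2x2 grayscale training frames and one new frame.
training = [
    [[10, 10], [10, 10]],
    [[12, 12], [12, 12]],
    [[11, 11], [11, 11]],
]
model = TemporalMedianBackground(training)
new_frame = [[11, 200], [12, 10]]
print(model.apply(new_frame))
```

Recomputing the median of every pixel's buffer for every frame is what makes the buffer costly, which is consistent with the efficiency concern raised above.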
== Conventional approaches == A robust background subtraction algorithm should be able to handle lighting changes, repetitive motions from clutter, and long-term scene changes. The following analyses use the function V(x,y,t) to denote a video sequence, where t is the time dimension and x and y are the pixel location variables. For example, V(1,2,3) is the pixel intensity at pixel location (1,2) of the image at t = 3 in the video sequence. === Using frame differencing === A motion detection algorithm begins with the segmentation part, where foreground or moving objects are segmented from the background. The simplest way to implement this is to take an image as background and compare the frame obtained at time t, denoted by I(t), with the background image, denoted by B. Using simple arithmetic, the objects can be segmented by image subtraction: for each pixel in I(t), take the pixel value, denoted by P[I(t)], and subtract from it the value of the corresponding pixel at the same position in the background image, denoted by P[B]. As an equation, this is written as: $P[F(t)] = P[I(t)] - P[B]$ The background is assumed to be the frame at time t. This difference image shows some intensity only at the pixel locations that have changed between the two frames. Though the background has seemingly been removed, this approach only works in cases where all foreground pixels are moving and all background pixels are static. A threshold, "Threshold", is applied to this difference image to improve the subtraction (see Image thresholding): $|P[F(t)] - P[F(t+1)]| > \mathrm{Threshold}$ This means that the intensities of the difference image's pixels are 'thresholded', or filtered, on the basis of the value of Threshold. The accuracy of this approach depends on the speed of movement in the scene. 
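The two frame differencing formulas above can be sketched as follows, assuming small grayscale images stored as lists of rows. The function names and the sample pixel values are hypothetical. Note that when the background B is the same for both frames, the quantity P[F(t)] − P[F(t+1)] reduces to P[I(t)] − P[I(t+1)], i.e., the difference between consecutive frames.

```python
def difference_image(frame, background):
    """P[F(t)] = P[I(t)] - P[B], computed pixel by pixel."""
    return [
        [pixel - bg for pixel, bg in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def threshold_mask(diff_t, diff_t1, threshold):
    """Mark pixels where |P[F(t)] - P[F(t+1)]| > threshold (1 = changed)."""
    return [
        [1 if abs(a - b) > threshold else 0 for a, b in zip(row_t, row_t1)]
        for row_t, row_t1 in zip(diff_t, diff_t1)
    ]


# Hypothetical 1x3 grayscale images: a bright object moves one pixel to the right.
background = [[10, 10, 10]]
frame_t = [[200, 10, 10]]
frame_t1 = [[10, 200, 10]]

f_t = difference_image(frame_t, background)
f_t1 = difference_image(frame_t1, background)
print(threshold_mask(f_t, f_t1, threshold=50))
```

In this toy example both the pixel the object left and the pixel it entered are marked as changed, while the static third pixel is not.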
Faster movements may require higher thresholds. === Mean filter === To calculate the image containing only the background, a series of preceding images is averaged. The background image at the instant t is calculated as: $B(x,y,t) = \frac{1}{N} \sum_{i=1}^{N} V(x,y,t-i)$ where N is the number of preceding images taken for averaging. This averaging refers to averaging corresponding pixels in the given images. N depends on the video speed (number of images per second in the video) and the amount of movement in the video. After calculating the background B(x,y,t), we can subtract it from the image V(x,y,t) at time t and threshold the result. Thus the foreground is: $|V(x,y,t) - B(x,y,t)| > \mathrm{Th}$ where Th is a threshold value. Similarly, the median can be used instead of the mean in the above calculation of B(x,y,t). The use of global and time-independent thresholds (the same Th value for all pixels in the image) may limit the accuracy of the above two approaches. === Running Gaussian average === For this method, Wren et al. propose fitting a Gaussian probability density function (pdf) on the most recent $n$ frames. In order to avoid fitting the pdf from scratch at each new frame time $t$, a running (or on-line cumulative) average is computed. The pdf of every pixel is characterized by mean $\mu_t$ and variance $\sigma_t^2$. The following is a possible initial condition (assuming that initially every pixel is background): $\mu_0 = I_0$, $\sigma_0^2 = \langle \text{some default value} \rangle$ where $I_t$ is the value of the pixel's intensity at time $t$. 
In order to initialize the variance, we can, for example, use the variance in x and y from a small window around each pixel. Note that the background may change over time (e.g. due to illumination changes or non-static background objects). To accommodate that change, at every frame $t$, every pixel's mean and variance must be updated, as follows: $\mu_t = \rho I_t + (1-\rho)\mu_{t-1}$, $\sigma_t^2 = d^2\rho + (1-\rho)\sigma_{t-1}^2$, $d = |(I_t - \mu_t)|$ where $\rho$ determines the size of the temporal window that is used to fit the pdf (usually $\rho = 0.01$) and $d$ is the Euclidean distance between the mean and the value of the pixel. We can now classify a pixel as background if its current intensity lies within some confidence interval of its distribution's mean: $\frac{|(I_t - \mu_t)|}{\sigma_t} > k \longrightarrow \text{foreground}$, $\frac{|(I_t - \mu_t)|}{\sigma_t} \leq k \longrightarrow \text{background}$ where the parameter $k$ is a free threshold (usuall