Electronic kit

An electronic kit is a package of electrical components used to build an electronic device. Generally, kits are composed of electronic components, a circuit diagram (schematic), assembly instructions, and often a printed circuit board (PCB) or another type of prototyping board. There are two types of kits: those that build a single device or system, and those used in education to demonstrate a range of circuits. Educational kits include a solderless construction board of some type, such as: components mounted in plastic blocks with side contacts that are held together in a base (e.g. Denshi blocks); springs on a cardboard base that trap wire leads or component leads, as in the Philips EE electronic experiment kits (a cheap and flexible option); or professional-type prototyping boards (breadboards), into which component leads are inserted following the kit's documentation. Kits of the first type, for constructing a single device, normally use a PCB onto which components are soldered. They normally come with extended documentation describing which component goes where on the PCB. For advanced hobby projects, the kit may consist only of a printed circuit board and assembly instructions, and the purchaser may have to source all the parts independently; alternatively, the vendor may provide hard-to-get or pre-programmed parts while expecting the purchaser to obtain the rest of the components. People primarily purchase electronic kits to have fun and learn how things work. Kits were once popular as a means to reduce the cost of buying goods, but there is usually no cost saving in buying a kit today. Some electronic kits were assembled to make complete complex devices such as color television sets, oscilloscopes, high-end audio amplifiers, amateur radio equipment, electric organs, and even computers such as the Heathkit H-8 and the LNW-80. Many of the early microprocessor computers were sold either as electronic kits or assembled and tested.
Heathkit sold millions of electronic kits during its 45-year history. Home assembly of common consumer electronics items no longer provides a cost advantage over commercially manufactured and distributed devices. People still build kits for custom devices and special-purpose electronics for professional and educational use and as a hobby. An emerging trend is to reduce complexity through preprogrammed or modular kits, which many suppliers offer online. The fun and thrill of making your own electronics have shifted, in many cases, from easy-to-comprehend applications and analog devices to more sophisticated digital devices.

== Examples ==

The Altair 8800 (the first home computer) was also sold as a kit, as were the MK14, Sinclair ZX80, Sinclair ZX81 and Acorn Atom computers. Many S-100 bus system cards were sold only as kits. Building robot kits, most often with a microcontroller inside, is now popular.

Convolution

In mathematics (in particular, functional analysis), convolution is a mathematical operation on two functions $f$ and $g$ that produces a third function $f*g$, as the integral of the product of the two functions after one is reflected about the y-axis and shifted. The term convolution refers to both the resulting function and to the process of computing it. The integral is evaluated for all values of shift, producing the convolution function. The choice of which function is reflected and shifted before the integral does not change the integral result (see commutativity). Graphically, it expresses how the 'shape' of one function is modified by the other. Some features of convolution are similar to cross-correlation: for real-valued functions, of a continuous or discrete variable, convolution $f*g$ differs from cross-correlation $f\star g$ only in that either $f(x)$ or $g(x)$ is reflected about the y-axis in convolution; thus it is a cross-correlation of $g(-x)$ and $f(x)$, or $f(-x)$ and $g(x)$. For complex-valued functions, the cross-correlation operator is the adjoint of the convolution operator. Convolution has applications that include probability, statistics, acoustics, spectroscopy, signal processing and image processing, computer vision and human vision, geophysics, engineering, physics, and differential equations. The convolution can be defined for functions on Euclidean space and other groups (as algebraic structures). For example, periodic functions, such as the discrete-time Fourier transform, can be defined on a circle and convolved by periodic convolution. (See row 18 at DTFT § Properties.) A discrete convolution can be defined for functions on the set of integers. 
Generalizations of convolution have applications in the field of numerical analysis and numerical linear algebra, and in the design and implementation of finite impulse response filters in signal processing. Computing the inverse of the convolution operation is known as deconvolution.

== Definition ==

The convolution of $f$ and $g$ is written $f*g$, denoting the operator with the symbol $*$. It is defined as the integral of the product of the two functions after one is reflected about the y-axis and shifted. As such, it is a particular kind of integral transform:

$$(f*g)(t):=\int_{-\infty}^{\infty}f(\tau)\,g(t-\tau)\,d\tau.$$

An equivalent definition is (see commutativity):

$$(f*g)(t):=\int_{-\infty}^{\infty}f(t-\tau)\,g(\tau)\,d\tau.$$

While the symbol $t$ is used above, it need not represent the time domain. At each $t$, the convolution formula can be described as the area under the function $f(\tau)$ weighted by the function $g(-\tau)$ shifted by the amount $t$. 
As $t$ changes, the weighting function $g(t-\tau)$ emphasizes different parts of the input function $f(\tau)$. If $t$ is a positive value, then $g(t-\tau)$ is equal to $g(-\tau)$ that slides or is shifted along the $\tau$-axis toward the right (toward $+\infty$) by the amount of $t$, while if $t$ is a negative value, then $g(t-\tau)$ is equal to $g(-\tau)$ that slides or is shifted toward the left (toward $-\infty$) by the amount of $|t|$. For functions $f$, $g$ supported on only $[0,\infty)$ (i.e., zero for negative arguments), the integration limits can be truncated, resulting in:

$$(f*g)(t)=\int_{0}^{t}f(\tau)\,g(t-\tau)\,d\tau\quad\text{for }f,g:[0,\infty)\to\mathbb{R}.$$

For the multi-dimensional formulation of convolution, see domain of definition (below).

=== Notation ===

A common engineering notational convention is:

$$f(t)*g(t)\mathrel{:=}\underbrace{\int_{-\infty}^{\infty}f(\tau)\,g(t-\tau)\,d\tau}_{(f*g)(t)},$$

which has to be interpreted carefully to avoid confusion. For instance, $f(t)*g(t-t_{0})$ is equivalent to $(f*g)(t-t_{0})$, but $f(t-t_{0})*g(t-t_{0})$ is in fact equivalent to $(f*g)(t-2t_{0})$.
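The shift behaviour described in the notation remark can be checked numerically on finite sequences. The following is a minimal sketch, assuming a discrete analogue of the integral (a sum over sample indices) and zero-padding as the model of a delay; the helper names `convolve` and `delay` are illustrative and not taken from any library.

```python
def convolve(f, g):
    """Full discrete convolution of two finite sequences of numbers."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            # The term f[i] * g[j] contributes to output index i + j.
            out[i + j] += fi * gj
    return out


def delay(x, k):
    """Shift a sequence right by k samples by prepending k zeros."""
    return [0.0] * k + list(x)


f = [1.0, 2.0, 3.0]
g = [0.5, -1.0]
t0 = 2

fg = convolve(f, g)
one_shift = convolve(f, delay(g, t0))              # f(t) * g(t - t0)
both_shift = convolve(delay(f, t0), delay(g, t0))  # f(t - t0) * g(t - t0)

print(one_shift == delay(fg, t0))       # delaying one factor delays the result by t0
print(both_shift == delay(fg, 2 * t0))  # delaying both factors delays it by 2 * t0
print(convolve(g, f) == fg)             # commutativity
```

The sample values are chosen to be exactly representable in floating point so that the comparisons can use equality; with arbitrary data a tolerance-based comparison would be more appropriate.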
=== Relations with other transforms ===

Given two functions $f(t)$ and $g(t)$ with bilateral Laplace transforms (two-sided Laplace transform)

$$F(s)=\int_{-\infty}^{\infty}e^{-su}\,f(u)\,\mathrm{d}u$$

and

$$G(s)=\int_{-\infty}^{\infty}e^{-sv}\,g(v)\,\mathrm{d}v$$

respectively, the convolution operation $(f*g)(t)$ can be defined as the inverse Laplace transform of the product of $F(s)$ and $G(s)$. More precisely,

$$\begin{aligned}F(s)\cdot G(s)&=\int_{-\infty}^{\infty}e^{-su}\,f(u)\,\mathrm{d}u\cdot\int_{-\infty}^{\infty}e^{-sv}\,g(v)\,\mathrm{d}v\\&=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-s(u+v)}\,f(u)\,g(v)\,\mathrm{d}u\,\mathrm{d}v\end{aligned}$$

Let $t=u+v$, then

$$\begin{aligned}F(s)\cdot G(s)&=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-st}\,f(u)\,g(t-u)\,\mathrm{d}u\,\mathrm{d}t\\&=\int_{-\infty}^{\infty}e^{-st}\underbrace{\int_{-\infty}^{\infty}f(u)\,g(t-u)\,\mathrm{d}u}_{(f*g)(t)}\,\mathrm{d}t\\&=\int_{-\infty}^{\infty}e^{-st}\,(f*g)(t)\,\mathrm{d}t.\end{aligned}$$

Note that $F(s)\cdot G(s)$ is the bilateral Laplace transform of $(f*g)(t)$. A similar derivation can be done using the unilateral Laplace transform (one-sided Laplace transform).
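The product property derived above has a discrete counterpart that is easy to verify numerically. The sketch below assumes finite sequences indexed from zero and replaces the Laplace integral by the sum $\sum_n x[n]\,e^{-sn}$; this substitution, the helper names `convolve` and `transform`, and the sample value of `s` are assumptions made for illustration only.

```python
import math


def convolve(f, g):
    """Full discrete convolution of two finite sequences of numbers."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def transform(x, s):
    """Discrete analogue of the Laplace integral: sum of x[n] * exp(-s * n)."""
    return sum(xn * math.exp(-s * n) for n, xn in enumerate(x))


f = [1.0, 2.0, 3.0]
g = [0.5, -1.0, 4.0]
s = 0.3  # arbitrary real evaluation point

product_of_transforms = transform(f, s) * transform(g, s)
transform_of_convolution = transform(convolve(f, g), s)

# The two quantities agree up to floating-point rounding.
print(math.isclose(product_of_transforms, transform_of_convolution, rel_tol=1e-9))
```

The agreement follows from the same regrouping as in the derivation: each product $f[u]\,e^{-su}\cdot g[v]\,e^{-sv}$ is collected under the index $t=u+v$.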
The convolution operation also describes the output (in terms of the input) of an important class of operations known as linear time-invariant (LTI). See LTI system theory for a derivation of convolution as the result of LTI constraints. In terms of the Fourier transforms of the input and output of an LTI operation, no new frequency components are created. The existing ones are only modified (amplitude and/or phase). In other words, the output transform is the pointwise product of the input transform with a third transform (known as a transfer function). See Convolution theorem for a derivation of that property of convolution. Conversely, convolution can be derived as the inverse Fourier transform of the pointwise product of two Fourier transforms.

== Visual explanation ==

== Historical developments ==

One of the earliest uses of the convolution integral appeared in D'Alembert's derivation of Taylor's theorem in Recherches sur différents points importants du système du monde, published in 1754. Also, an expression of the type

$$\int f(u)\cdot g(x-u)\,du$$

is used by Sylvestre François Lacroix on page 505 of his book entitled Treatise on differences and series, which is the last of 3 volumes of the encyclopedic series: Traité du calcul différentiel et du calcul intégral, Chez Courcier, Paris, 1797–1800. Soon thereafter, convolution operations appear in the works of Pierre Simon Laplace, Jean-Baptiste Joseph Fourier, Siméon Denis Poisson, and others. The term itself did not come into wide use until the 1950s or 1960s. Prior to that it was sometimes known as Faltung (which means folding in German), composition product, superposition integral, and Carson's integral. Yet it appears as early as 1903, though the definition is rather unfamiliar in older uses. 
The operation

$$\int_{0}^{t}\varphi(s)\,\psi(t-s)\,ds,\quad 0\leq t<\infty,$$

is a particular case of composition products considered by the Italian mathematician Vito Volterra in 1913.

== Circular c

Journal of Machine Learning Research

The Journal of Machine Learning Research is a peer-reviewed open access scientific journal covering machine learning. It was established in 2000 and the first editor-in-chief was Leslie Kaelbling. The current editors-in-chief are Francis Bach (Inria) and David Blei (Columbia University).

== History ==

The journal was established as an open-access alternative to the journal Machine Learning. In 2001, forty editorial board members of Machine Learning resigned, saying that in the era of the Internet, it was detrimental for researchers to continue publishing their papers in expensive journals with pay-access archives. The open access model employed by the Journal of Machine Learning Research allows authors to publish articles for free and retain copyright, while archives are freely available online. Print editions of the journal were published by MIT Press until 2004 and by Microtome Publishing thereafter. From its inception, the journal received no revenue from the print edition and paid no subvention to MIT Press or Microtome Publishing. In response to the prohibitive costs of arranging workshop and conference proceedings publication with traditional academic publishing companies, the journal launched a proceedings publication arm in 2007 and now publishes proceedings for several leading machine learning conferences, including the International Conference on Machine Learning, COLT, AISTATS, and workshops held at the Conference on Neural Information Processing Systems.

Croissant (metadata format)

Croissant is a metadata format designed to support sharing of datasets for machine learning applications. It is a platform-agnostic schema used to standardize metadata in data repositories like Hugging Face, Kaggle, Dataverse and OpenML.

== Structure ==

Croissant builds upon schema.org, primarily uses JSON-LD, and divides metadata into four "layers": Dataset Metadata, Resource, Structure and Semantic. The Dataset Metadata layer constrains which schema.org properties should be used, including additional properties, linking together the resources (files) of the dataset with general metadata, like licensing and citation information. The Resource layer describes the individual files, and sets of files, using two new classes, FileObject and FileSet. A FileSet may be, for example, a collection of related images. The Structure layer specifies how the files are organized in the dataset. A RecordSet class describes how the resources are structured, which may vary considerably between modalities. This specification facilitates interoperability of the datasets. Finally, the Semantic layer adds information for practical reuse of the dataset, such as splits for train, test and validation subsets. It also provides a default extension for metadata related to responsible AI. The use of a standard machine-readable structure increases, for example, the discoverability of datasets in search engines such as Google Dataset Search.

== History ==

Croissant was shared on arXiv in March 2024 and published in the proceedings of NeurIPS 2024. It began as a community-driven effort of the MLCommons Croissant Working Group, with stakeholder organizations from academia and industry, including Google, the Open Data Institute, Sage Bionetworks and King's College London. Variations of Croissant are developed to support datasets in different areas of research, such as Geo-Croissant for geospatial datasets. Other technical extensions, such as support for RDF, soon followed.

Incremental heuristic search

Incremental heuristic search algorithms combine both incremental and heuristic search to speed up searches of sequences of similar search problems, which is important in domains that are only incompletely known or change dynamically. Incremental search has been studied at least since the late 1960s. Incremental search algorithms reuse information from previous searches to speed up the current search and solve search problems potentially much faster than solving them repeatedly from scratch. Similarly, heuristic search has also been studied at least since the late 1960s. Heuristic search algorithms, often based on A*, use heuristic knowledge in the form of approximations of the goal distances to focus the search and solve search problems potentially much faster than uninformed search algorithms. The resulting search problems, sometimes called dynamic path planning problems, are graph search problems where paths have to be found repeatedly because the topology of the graph, its edge costs, the start vertex or the goal vertices change over time. So far, three main classes of incremental heuristic search algorithms have been developed:
The first class restarts A* at the point where its current search deviates from the previous one (example: Fringe Saving A*).
The second class updates the h-values (heuristic, i.e. approximate distance to goal) from the previous search during the current search to make them more informed (example: Generalized Adaptive A*).
The third class updates the g-values (distance from start) from the previous search during the current search to correct them when necessary, which can be interpreted as transforming the A* search tree from the previous search into the A* search tree for the current search (examples: Lifelong Planning A*, D*, D* Lite).
All three classes of incremental heuristic search algorithms are different from other replanning algorithms, such as planning by analogy, in that their plan quality does not deteriorate with the number of replanning episodes.

== Applications ==

Incremental heuristic search has been extensively used in robotics, where a large number of path planning systems are based on either D* (typically earlier systems) or D* Lite (current systems), two different incremental heuristic search algorithms.
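The second class of algorithms (updating h-values between searches) can be illustrated with a small sketch. It assumes a 4-connected grid with unit edge costs, a fixed goal, and the update rule h(s) = g(goal) - g(s) for every state expanded by the previous search, which is the rule usually associated with Adaptive A*; the grid, the function names `astar` and `adaptive_update`, and the start positions are hypothetical and chosen only for illustration.

```python
import heapq


def astar(blocked, size, start, goal, h):
    """A* on a size x size grid; returns (cost, g, closed), cost None if no path.

    h maps cells to learned heuristic values; cells without an entry fall
    back to the Manhattan distance to the goal.
    """
    def heuristic(cell):
        manhattan = abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])
        return h.get(cell, manhattan)

    g = {start: 0}
    closed = set()                      # expanded cells
    frontier = [(heuristic(start), start)]
    while frontier:
        _, cell = heapq.heappop(frontier)
        if cell in closed:
            continue                    # stale duplicate entry
        if cell == goal:
            return g[goal], g, closed
        closed.add(cell)
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if not inside or nxt in blocked:
                continue
            new_g = g[cell] + 1
            if new_g < g.get(nxt, float("inf")):
                g[nxt] = new_g
                heapq.heappush(frontier, (new_g + heuristic(nxt), nxt))
    return None, g, closed


def adaptive_update(h, g, closed, cost):
    """Make h more informed: h(s) = g(goal) - g(s) for every expanded s."""
    for cell in closed:
        h[cell] = cost - g[cell]


size = 5
blocked = {(2, 0), (2, 1), (2, 2), (2, 3)}   # wall with a single gap at (2, 4)
goal = (4, 0)
h = {}                                        # learned heuristic values

cost1, g1, closed1 = astar(blocked, size, (0, 0), goal, h)
adaptive_update(h, g1, closed1, cost1)

# Second, similar problem: same goal, the agent has moved to (1, 0).
cost2, g2, closed2 = astar(blocked, size, (1, 0), goal, h)
print(cost1, len(closed1), cost2, len(closed2))
```

In this toy instance the second search benefits from the learned values and expands no more cells than the first. The sketch covers only a changed start cell; handling changed edge costs or a moving goal needs the additional machinery of the published algorithms.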

Automatic taxonomy construction

Automatic taxonomy construction (ATC) is the use of software programs to generate taxonomical classifications from a body of texts called a corpus. ATC is a branch of natural language processing, which in turn is a branch of artificial intelligence. A taxonomy (or taxonomical classification) is a scheme of classification, especially a hierarchical classification, in which things are organized into groups or types. Among other things, a taxonomy can be used to organize and index knowledge (stored as documents, articles, videos, etc.), such as in the form of a library classification system or a search engine taxonomy, so that users can more easily find the information they are searching for. Many taxonomies are hierarchies (and thus have an intrinsic tree structure), but not all are. Manually developing and maintaining a taxonomy is a labor-intensive task requiring significant time and resources, including familiarity with or expertise in the taxonomy's domain (scope, subject, or field), which drives up the costs and limits the scope of such projects. Also, domain modelers have their own points of view which inevitably, even if unintentionally, work their way into the taxonomy. ATC uses artificial intelligence techniques to quickly and automatically generate a taxonomy for a domain in order to avoid these problems and remove limitations.

== Approaches ==

There are several approaches to ATC. One approach is to use rules to detect patterns in the corpus and use those patterns to infer relations such as hyponymy. Other approaches use machine learning techniques such as Bayesian inferencing and artificial neural networks.

=== Keyword extraction ===

One approach to building a taxonomy is to automatically gather the keywords from a domain using keyword extraction, then analyze the relationships between them (see Hyponymy, below), and then arrange them as a taxonomy based on those relationships.
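The rule-based approach mentioned under Approaches can be sketched with two surface patterns, "X such as A, B and C" and "A is a X". This is a minimal illustration that assumes single-word terms and plain English text; the regular expressions, the function name `extract_is_a`, and the sample corpus are hypothetical, and real systems use far richer pattern sets and filtering.

```python
import re

# "mammals such as dogs, cats and horses" -> hypernym "mammals", list of hyponyms
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+, )*\w+(?: and \w+)?)")
# "Fido is a dog" / "Rex is an animal" -> hyponym, hypernym
IS_A = re.compile(r"(\w+) is an? (\w+)")


def extract_is_a(text):
    """Return a set of (hyponym, hypernym) pairs found by the two patterns."""
    pairs = set()
    for hypernym, items in SUCH_AS.findall(text):
        # Split the enumerated examples on ", " and " and ".
        for hyponym in re.split(r", | and ", items):
            pairs.add((hyponym, hypernym))
    for hyponym, hypernym in IS_A.findall(text):
        pairs.add((hyponym, hypernym))
    return pairs


corpus = "Fido is a dog. The zoo keeps mammals such as dogs, cats and horses."
pairs = extract_is_a(corpus)
for hyponym, hypernym in sorted(pairs):
    print(f"{hyponym} is-a {hypernym}")
```

Patterns of this kind are noisy (for example, "This is a test" would yield a spurious pair), so their output is usually treated as candidate relations to be filtered or scored before a taxonomy is assembled.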
=== Hyponymy and "is-a" relations ===

In ATC programs, one of the most important tasks is the discovery of hypernym and hyponym relations among words. One way to do that from a body of text is to search for certain phrases like "is a" and "such as". In linguistics, is-a relations are called hyponymy. Words that describe categories are called hypernyms and words that are examples of categories are hyponyms. For example, dog is a hypernym and Fido is one of its hyponyms. A word can be both a hyponym and a hypernym. So, dog is a hyponym of mammal and also a hypernym of Fido. Taxonomies are often represented as is-a hierarchies where each level is more specific than (in mathematical language "a subset of") the level above it. For example, a basic biology taxonomy would have concepts such as mammal, which is a subset of animal, and dogs and cats, which are subsets of mammal. This kind of taxonomy is called an is-a model because the specific objects are considered instances of a concept. For example, Fido is an instance of the concept dog, and Fluffy is-a cat.

== Applications ==

ATC can be used to build taxonomies for search engines, to improve search results. ATC systems are a key component of ontology learning (also known as automatic ontology construction), and have been used to automatically generate large ontologies for domains such as insurance and finance. They have also been used to enhance existing large networks such as WordNet to make them more complete and consistent.
== ATC software ==

== Other names ==

Other names for automatic taxonomy construction include:
Automated outline building
Automated outline construction
Automated outline creation
Automated outline extraction
Automated outline generation
Automated outline induction
Automated outline learning
Automated outlining
Automated taxonomy building
Automated taxonomy construction
Automated taxonomy creation
Automated taxonomy extraction
Automated taxonomy generation
Automated taxonomy induction
Automated taxonomy learning
Automatic outline building
Automatic outline construction
Automatic outline creation
Automatic outline extraction
Automatic outline generation
Automatic outline induction
Automatic outline learning
Automatic taxonomy building
Automatic taxonomy creation
Automatic taxonomy extraction
Automatic taxonomy generation
Automatic taxonomy induction
Automatic taxonomy learning
Outline automation
Outline building
Outline construction
Outline creation
Outline extraction
Outline generation
Outline induction
Outline learning
Semantic taxonomy building
Semantic taxonomy construction
Semantic taxonomy creation
Semantic taxonomy extraction
Semantic taxonomy generation
Semantic taxonomy induction
Semantic taxonomy learning
Taxonomy automation
Taxonomy building
Taxonomy construction
Taxonomy creation
Taxonomy extraction
Taxonomy generation
Taxonomy induction
Taxonomy learning

Active learning (machine learning)

Active learning is a special case of machine learning in which a learning algorithm can interactively query a human user (or some other information source) to label new data points with the desired outputs. The human user must possess expertise in the problem domain, including the ability to consult authoritative sources when necessary. In statistics literature, it is sometimes also called optimal experimental design. The information source is also called teacher or oracle. There are situations in which unlabeled data is abundant but manual labeling is expensive. In such a scenario, learning algorithms can actively query the teacher for labels. Since the learner chooses the examples, the number of examples to learn a concept can often be much lower than the number required in normal supervised learning. However, there is a risk that the algorithm is overwhelmed by uninformative examples. Recent developments are dedicated to multi-label active learning, hybrid active learning and active learning in a single-pass (on-line) context, combining concepts from the field of machine learning (e.g. conflict and ignorance) with adaptive, incremental learning policies in the field of online machine learning. Using active learning allows for faster development of a machine learning algorithm, when comparative updates would require a quantum or super computer. Large-scale active learning projects may benefit from crowdsourcing frameworks such as Amazon Mechanical Turk that include many humans in the active learning loop.

== Definitions ==

Let T be the total set of all data under consideration. For example, in a protein engineering problem, T would include all proteins that are known to have a certain interesting activity and all additional proteins that one might want to test for that activity. During each iteration, i, T is broken up into three subsets:
$\mathbf{T}_{K,i}$: Data points where the label is known.
$\mathbf{T}_{U,i}$: Data points where the label is unknown.
$\mathbf{T}_{C,i}$: A subset of $\mathbf{T}_{U,i}$ that is chosen to be labeled.
Most of the current research in active learning involves the best method to choose the data points for $\mathbf{T}_{C,i}$.

== Scenarios ==

Pool-based sampling: In this approach, which is the best-known scenario, the learning algorithm attempts to evaluate the entire dataset before selecting data points (instances) for labeling. It is often initially trained on a fully labeled subset of the data using a machine-learning method such as logistic regression or SVM that yields class-membership probabilities for individual data instances. The candidate instances are those for which the prediction is most ambiguous. Instances are drawn from the entire data pool and assigned a confidence score, a measurement of how well the learner "understands" the data. The system then selects the instances for which it is the least confident and queries the teacher for the labels. The theoretical drawback of pool-based sampling is that it is memory-intensive and is therefore limited in its capacity to handle enormous datasets, but in practice, the rate-limiting factor is that the teacher is typically a (fatiguable) human expert who must be paid for their effort, rather than computer memory.
Stream-based selective sampling: Here, each consecutive unlabeled instance is examined one at a time, with the machine evaluating the informativeness of each item against its query parameters. The learner decides for itself whether to assign a label or query the teacher for each data point. In contrast to pool-based sampling, the obvious drawback of stream-based methods is that the learning algorithm does not have sufficient information, early in the process, to make a sound assign-label-vs-ask-teacher decision, and it does not capitalize as efficiently on the presence of already labeled data. 
Therefore, the teacher is likely to spend more effort in supplying labels than with the pool-based approach.
Membership query synthesis: This is where the learner generates synthetic data from an underlying natural distribution. For example, if the dataset consists of pictures of humans and animals, the learner could send a clipped image of a leg to the teacher and query whether this appendage belongs to an animal or a human. This is particularly useful if the dataset is small. The challenge here, as with all synthetic-data-generation efforts, is in ensuring that the synthetic data is consistent in terms of meeting the constraints on real data. As the number of variables/features in the input data increases, and strong dependencies between variables exist, it becomes increasingly difficult to generate synthetic data with sufficient fidelity. For example, to create a synthetic data set for human laboratory-test values, the sum of the various white blood cell (WBC) components in a white blood cell differential must equal 100, since the component numbers are really percentages. Similarly, the enzymes alanine transaminase (ALT) and aspartate transaminase (AST) measure liver function (though AST is also produced by other tissues, e.g., lung, pancreas). A synthetic data point with AST at the lower limit of normal range (8–33 units/L) with an ALT several times above normal range (4–35 units/L) in a simulated chronically ill patient would be physiologically impossible.

== Query strategies ==

Algorithms for determining which data points should be labeled can be organized into a number of different categories, based upon their purpose:
Balance exploration and exploitation: the choice of examples to label is seen as a dilemma between the exploration and the exploitation over the data space representation. This strategy manages this compromise by modelling the active learning problem as a contextual bandit problem. For example, Bouneffouf et al. 
propose a sequential algorithm named Active Thompson Sampling (ATS), which, in each round, assigns a sampling distribution on the pool, samples one point from this distribution, and queries the oracle for this sample point label.
Expected model change: label those points that would most change the current model.
Expected error reduction: label those points that would most reduce the model's generalization error.
Exponentiated gradient exploration for active learning: a sequential algorithm named exponentiated gradient (EG)-active that can improve any active learning algorithm by an optimal random exploration.
Uncertainty sampling: label those points for which the current model is least certain as to what the correct output should be.
Query by committee: a variety of models are trained on the current labeled data, and vote on the output for unlabeled data; label those points for which the "committee" disagrees the most.
Querying from diverse subspaces or partitions: When the underlying model is a forest of trees, the leaf nodes might represent (overlapping) partitions of the original feature space. This offers the possibility of selecting instances from non-overlapping or minimally overlapping partitions for labeling.
Variance reduction: label those points that would minimize output variance, which is one of the components of error.
Conformal prediction: predicts that a new data point will have a label similar to old data points in some specified way, and the degree of similarity within the old examples is used to estimate the confidence in the prediction.
Mismatch-first farthest-traversal: The primary selection criterion is the prediction mismatch between the current model and nearest-neighbour prediction. It targets wrongly predicted data points. The second selection criterion is the distance to previously selected data, the farthest first. It aims at optimizing the diversity of selected data.
User-centered labeling strategies: Learning is accomplished by applying dimensionality reduction to graphs and figures like scatter plots. Then the user is asked to label the compiled data (categorical, numerical, relevance scores, relation between two instances).
A wide variety of algorithms have been studied that fall into these categories. While the traditional active learning (AL) strategies can achieve remarkable performance, it is often challenging to predict in advance which strategy is the most suitable in a particular situation. In recent years, meta-learning algorithms have been gaining in popularity. Some of them have been proposed to tackle the problem of learning AL strategies instead of relying on manually designed strategies. A benchmark which compares 'meta-learning approaches to active learning' to 'traditional heuristic-based Active Learning' may give intuitions if 'Learning active learning' is at the crossroads.

== Minimum marginal hyperplane ==

Some active learning algorithms are built upon support-vector machines (SVMs) and exploit the structure of the SVM to determine which data points to label. Such methods usually calculate the margin, W, of each u