Nearest neighbor search

Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest (or most similar) to a given point. Closeness is typically expressed in terms of a dissimilarity function: the less similar the objects, the larger the function values. Formally, the nearest neighbor (NN) search problem is defined as follows: given a set S of points in a space M and a query point q ∈ M {\displaystyle q\in M} , find the closest point in S to q. Donald Knuth in volume 3 of The Art of Computer Programming (1973) called it the post-office problem, referring to an application of assigning to a residence the nearest post office. A direct generalization of this problem is a k-NN search, where we need to find the k closest points. Most commonly M is a metric space and dissimilarity is expressed as a distance metric, which is symmetric and satisfies the triangle inequality. Even more common, M is taken to be the d-dimensional vector space where dissimilarity is measured using the Euclidean distance, Manhattan distance or other distance metric. However, the dissimilarity function can be arbitrary. One example is asymmetric Bregman divergence, for which the triangle inequality does not hold. == Applications == The nearest neighbor search problem arises in numerous fields of application, including: Pattern recognition – in particular for optical character recognition Statistical classification – see k-nearest neighbor algorithm Computer vision – for point cloud registration Computational geometry – see Closest pair of points problem Cryptanalysis – for lattice problem Databases – e.g. content-based image retrieval Coding theory – see maximum likelihood decoding Semantic search Vector databases, where nearest-neighbor lookup over embeddings is used to retrieve semantically similar records Retrieval-augmented generation systems, where nearest-neighbor retrieval over embeddings is used to fetch candidate passages or documents before generation Data compression – see MPEG-2 standard Robotic sensing Recommendation systems, e.g. see Collaborative filtering Internet marketing – see contextual advertising and behavioral targeting DNA sequencing Spell checking – suggesting correct spelling Plagiarism detection Similarity scores for predicting career paths of professional athletes. Cluster analysis – assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense, usually based on Euclidean distance Chemical similarity Sampling-based motion planning == Methods == Various solutions to the NNS problem have been proposed. The quality and usefulness of the algorithms are determined by the time complexity of queries as well as the space complexity of any search data structures that must be maintained. The informal observation usually referred to as the curse of dimensionality states that there is no general-purpose exact solution for NNS in high-dimensional Euclidean space using polynomial preprocessing and polylogarithmic search time. === Exact methods === ==== Linear search ==== The simplest solution to the NNS problem is to compute the distance from the query point to every other point in the database, keeping track of the "best so far". This algorithm, sometimes referred to as the naive approach, has a running time of O(dN), where N is the cardinality of S and d is the dimensionality of S. There are no search data structures to maintain, so the linear search has no space complexity beyond the storage of the database. Naive search can, on average, outperform space partitioning approaches on higher dimensional spaces. The absolute distance is not required for distance comparison, only the relative distance. In geometric coordinate systems the distance calculation can be sped up considerably by omitting the square root calculation from the distance calculation between two coordinates. The distance comparison will still yield identical results. ==== Space partitioning ==== Since the 1970s, the branch and bound methodology has been applied to the problem. In the case of Euclidean space, this approach encompasses spatial index or spatial access methods. Several space-partitioning methods have been developed for solving the NNS problem. Perhaps the simplest is the k-d tree, which iteratively bisects the search space into two regions containing half of the points of the parent region. Queries are performed via traversal of the tree from the root to a leaf by evaluating the query point at each split. Depending on the distance specified in the query, neighboring branches that might contain hits may also need to be evaluated. For constant dimension query time, average complexity is O(log N) in the case of randomly distributed points, worst case complexity is O(kN^(1-1/k)) Alternatively the R-tree data structure was designed to support nearest neighbor search in dynamic context, as it has efficient algorithms for insertions and deletions such as the R tree. R-trees can yield nearest neighbors not only for Euclidean distance, but can also be used with other distances. In the case of general metric space, the branch-and-bound approach is known as the metric tree approach. Particular examples include vp-tree and BK-tree methods. Using a set of points taken from a 3-dimensional space and put into a BSP tree, and given a query point taken from the same space, a possible solution to the problem of finding the nearest point-cloud point to the query point is given in the following description of an algorithm. (Strictly speaking, no such point may exist, because it may not be unique. But in practice, usually we only care about finding any one of the subset of all point-cloud points that exist at the shortest distance to a given query point.) The idea is, for each branching of the tree, guess that the closest point in the cloud resides in the half-space containing the query point. This may not be the case, but it is a good heuristic. After having recursively gone through all the trouble of solving the problem for the guessed half-space, now compare the distance returned by this result with the shortest distance from the query point to the partitioning plane. This latter distance is that between the query point and the closest possible point that could exist in the half-space not searched. If this distance is greater than that returned in the earlier result, then clearly there is no need to search the other half-space. If there is such a need, then you must go through the trouble of solving the problem for the other half space, and then compare its result to the former result, and then return the proper result. The performance of this algorithm is nearer to logarithmic time than linear time when the query point is near the cloud, because as the distance between the query point and the closest point-cloud point nears zero, the algorithm needs only perform a look-up using the query point as a key to get the correct result. === Approximation methods === An approximate nearest neighbor search algorithm is allowed to return points whose distance from the query is at most c {\displaystyle c} times the distance from the query to its nearest points. The appeal of this approach is that, in many cases, an approximate nearest neighbor is almost as good as the exact one. In particular, if the distance measure accurately captures the notion of user quality, then small differences in the distance should not matter. ==== Greedy search in proximity neighborhood graphs ==== Proximity graph methods (such as navigable small world graphs and HNSW) are considered the current state-of-the-art for the approximate nearest neighbors search. The methods are based on greedy traversing in proximity neighborhood graphs G ( V , E ) {\displaystyle G(V,E)} in which every point x i ∈ S {\displaystyle x_{i}\in S} is uniquely associated with vertex v i ∈ V {\displaystyle v_{i}\in V} . The search for the nearest neighbors to a query q in the set S takes the form of searching for the vertex in the graph G ( V , E ) {\displaystyle G(V,E)} . The basic algorithm – greedy search – works as follows: search starts from an enter-point vertex v i ∈ V {\displaystyle v_{i}\in V} by computing the distances from the query q to each vertex of its neighborhood { v j : ( v i , v j ) ∈ E } {\displaystyle \{v_{j}:(v_{i},v_{j})\in E\}} , and then finds a vertex with the minimal distance value. If the distance value between the query and the selected vertex is smaller than the one between the query and the current element, then the algorithm moves to the selected vertex, and it becomes new enter-point. The algorithm stops when it reaches a local minimum: a vertex whose neighborhood does not contain a vertex that is closer to the query than the vertex itself. The idea of proximity neighborhood graphs was exploited in multiple publications, including the seminal paper by Arya and Mount, in the VoroNet syst

Uniform convergence in probability

Uniform convergence in probability is a form of convergence in probability in statistical asymptotic theory and probability theory. It means that, under certain conditions, the empirical frequencies of all events in a certain event-family uniformly converge to their theoretical probabilities. Uniform convergence in probability has applications to statistics as well as machine learning as part of statistical learning theory. Specifically, the Glivenko-Cantelli theorem and the homonymous classes of functions are fundamentally related to uniform convergence. The law of large numbers says that, for each single event A {\displaystyle A} , its empirical frequency in a sequence of independent trials converges (with high probability) to its theoretical probability. In many application however, the need arises to judge simultaneously the probabilities of events of an entire class S {\displaystyle S} from one and the same sample. Moreover, it, is required that the relative frequency of the events converge to the probability uniformly over the entire class of events S {\displaystyle S} . The Uniform Convergence Theorem gives a sufficient condition for this convergence to hold. Roughly, if the event-family is sufficiently simple (its VC dimension is sufficiently small) then uniform convergence holds. == Definitions == For a class of predicates H {\displaystyle H} defined on a set X {\displaystyle X} and a set of samples x = ( x 1 , x 2 , … , x m ) {\displaystyle x=(x_{1},x_{2},\dots ,x_{m})} , where x i ∈ X {\displaystyle x_{i}\in X} , the empirical frequency of h ∈ H {\displaystyle h\in H} on x {\displaystyle x} is Q ^ x ( h ) = 1 m | { i : 1 ≤ i ≤ m , h ( x i ) = 1 } | . {\displaystyle {\widehat {Q}}_{x}(h)={\frac {1}{m}}|\{i:1\leq i\leq m,h(x_{i})=1\}|.} The theoretical probability of h ∈ H {\displaystyle h\in H} is defined as Q P ( h ) = P { y ∈ X : h ( y ) = 1 } . {\displaystyle Q_{P}(h)=P\{y\in X:h(y)=1\}.} The Uniform Convergence Theorem states, roughly, that if H {\displaystyle H} is "simple" and we draw samples independently (with replacement) from X {\displaystyle X} according to any distribution P {\displaystyle P} , then with high probability, the empirical frequency will be close to its expected value, which is the theoretical probability. Here "simple" means that the Vapnik–Chervonenkis dimension of the class H {\displaystyle H} is small relative to the size of the sample. In other words, a sufficiently simple collection of functions behaves roughly the same on a small random sample as it does on the distribution as a whole. The Uniform Convergence Theorem was first proved by Vapnik and Chervonenkis using the concept of growth function. == Uniform Convergence Theorem == The statement of the Uniform Convergence Theorem is as follows: If H {\displaystyle H} is a set of { 0 , 1 } {\displaystyle \{0,1\}} -valued functions defined on a set X {\displaystyle X} and P {\displaystyle P} is a probability distribution on X {\displaystyle X} then for ε > 0 {\displaystyle \varepsilon >0} and m {\displaystyle m} a positive integer, we have: P m { | Q P ( h ) − Q x ^ ( h ) | ≥ ε for some h ∈ H } ≤ 4 Π H ( 2 m ) e − ε 2 m / 8 . {\displaystyle P^{m}\{|Q_{P}(h)-{\widehat {Q_{x}}}(h)|\geq \varepsilon {\text{ for some }}h\in H\}\leq 4\Pi _{H}(2m)e^{-\varepsilon ^{2}m/8}.} In the above, for any x ∈ X m , {\displaystyle x\in X^{m},} Q P ( h ) = P { ( y ∈ X : h ( y ) = 1 } , {\displaystyle Q_{P}(h)=P\{(y\in X:h(y)=1\},} Q ^ x ( h ) = 1 m | { i : 1 ≤ i ≤ m , h ( x i ) = 1 } | {\displaystyle {\widehat {Q}}_{x}(h)={\frac {1}{m}}|\{i:1\leq i\leq m,h(x_{i})=1\}|} and | x | = m . {\displaystyle |x|=m.} P m {\displaystyle P^{m}} indicates that the probability is taken over x {\displaystyle x} consisting of m {\displaystyle m} i.i.d. draws from the distribution P . {\displaystyle P.} Finally, the growth function Π H {\displaystyle \Pi _{H}} is defined in the following way, for any { 0 , 1 } {\displaystyle \{0,1\}} -valued functions H {\displaystyle H} over X {\displaystyle X} and for any natural number m {\displaystyle m} : Π H ( m ) = max | { h ∩ D : D ⊆ X , | D | = m , h ∈ H } | . {\displaystyle \Pi _{H}(m)=\max |\{h\cap D:D\subseteq X,|D|=m,h\in H\}|.} From the point of view of Learning Theory one can consider H {\displaystyle H} to be the Concept/Hypothesis class defined over the instance set X {\displaystyle X} . Crucially, the Sauer–Shelah lemma implies that Π H ( m ) ≤ m d {\displaystyle \Pi _{H}(m)\leq m^{d}} , where d {\displaystyle d} is the VC dimension of H {\displaystyle H} . == Proof of the Uniform Convergence Theorem == and are the sources of the proof below. Before we get into the details of the proof of the Uniform Convergence Theorem we will present a high level overview of the proof. Symmetrization: We transform the problem of analyzing | Q P ( h ) − Q ^ x ( h ) | ≥ ε {\displaystyle |Q_{P}(h)-{\widehat {Q}}_{x}(h)|\geq \varepsilon } into the problem of analyzing | Q ^ r ( h ) − Q ^ s ( h ) | ≥ ε / 2 {\displaystyle |{\widehat {Q}}_{r}(h)-{\widehat {Q}}_{s}(h)|\geq \varepsilon /2} , where r {\displaystyle r} and s {\displaystyle s} are i.i.d samples of size m {\displaystyle m} drawn according to the distribution P {\displaystyle P} . One can view r {\displaystyle r} as the original randomly drawn sample of length m {\displaystyle m} , while s {\displaystyle s} may be thought as the testing sample which is used to estimate Q P ( h ) {\displaystyle Q_{P}(h)} . Permutation: Since r {\displaystyle r} and s {\displaystyle s} are picked identically and independently, so swapping elements between them will not change the probability distribution on r {\displaystyle r} and s {\displaystyle s} . So, we will try to bound the probability of | Q ^ r ( h ) − Q ^ s ( h ) | ≥ ε / 2 {\displaystyle |{\widehat {Q}}_{r}(h)-{\widehat {Q}}_{s}(h)|\geq \varepsilon /2} for some h ∈ H {\displaystyle h\in H} by considering the effect of a specific collection of permutations of the joint sample x = r | | s {\displaystyle x=r||s} . Specifically, we consider permutations σ ( x ) {\displaystyle \sigma (x)} which swap x i {\displaystyle x_{i}} and x m + i {\displaystyle x_{m+i}} in some subset of 1 , 2 , . . . , m {\displaystyle {1,2,...,m}} . The symbol r | | s {\displaystyle r||s} means the concatenation of r {\displaystyle r} and s {\displaystyle s} . Reduction to a finite class: We can now restrict the function class H {\displaystyle H} to a fixed joint sample and hence, if H {\displaystyle H} has finite VC Dimension, it reduces to the problem to one involving a finite function class. We present the technical details of the proof. It should be stressed that this proof glosses over details like the measurability of the events V {\displaystyle V} and R {\displaystyle R} ; measurability is granted in the case of H {\displaystyle H} being finite or countable, but this is not normally the case in standard applications of the theorem (e.g. for statistical learning theory or to prove the Glivenko-Cantelli theorem). To get measurability, one needs to use a notion of separability of the underlying space, possibly related to H {\displaystyle H} . === Symmetrization === Lemma: Let V = { x ∈ X m : | Q P ( h ) − Q ^ x ( h ) | ≥ ε for some h ∈ H } {\displaystyle V=\{x\in X^{m}:|Q_{P}(h)-{\widehat {Q}}_{x}(h)|\geq \varepsilon {\text{ for some }}h\in H\}} and R = { ( r , s ) ∈ X m × X m : | Q r ^ ( h ) − Q ^ s ( h ) | ≥ ε / 2 for some h ∈ H } . {\displaystyle R=\{(r,s)\in X^{m}\times X^{m}:|{\widehat {Q_{r}}}(h)-{\widehat {Q}}_{s}(h)|\geq \varepsilon /2{\text{ for some }}h\in H\}.} Then for m ≥ 2 ε 2 {\displaystyle m\geq {\frac {2}{\varepsilon ^{2}}}} , P m ( V ) ≤ 2 P 2 m ( R ) {\displaystyle P^{m}(V)\leq 2P^{2m}(R)} . Proof: By the triangle inequality, if | Q P ( h ) − Q ^ r ( h ) | ≥ ε {\displaystyle |Q_{P}(h)-{\widehat {Q}}_{r}(h)|\geq \varepsilon } and | Q P ( h ) − Q ^ s ( h ) | ≤ ε / 2 {\displaystyle |Q_{P}(h)-{\widehat {Q}}_{s}(h)|\leq \varepsilon /2} then | Q ^ r ( h ) − Q ^ s ( h ) | ≥ ε / 2 {\displaystyle |{\widehat {Q}}_{r}(h)-{\widehat {Q}}_{s}(h)|\geq \varepsilon /2} . Therefore, P 2 m ( R ) ≥ P 2 m { ∃ h ∈ H , | Q P ( h ) − Q ^ r ( h ) | ≥ ε and | Q P ( h ) − Q ^ s ( h ) | ≤ ε / 2 } = ∫ V P m { s : ∃ h ∈ H , | Q P ( h ) − Q ^ r ( h ) | ≥ ε and | Q P ( h ) − Q ^ s ( h ) | ≤ ε / 2 } d P m ( r ) = A {\displaystyle {\begin{aligned}&P^{2m}(R)\\[5pt]\geq {}&P^{2m}\{\exists h\in H,|Q_{P}(h)-{\widehat {Q}}_{r}(h)|\geq \varepsilon {\text{ and }}|Q_{P}(h)-{\widehat {Q}}_{s}(h)|\leq \varepsilon /2\}\\[5pt]={}&\int _{V}P^{m}\{s:\exists h\in H,|Q_{P}(h)-{\widehat {Q}}_{r}(h)|\geq \varepsilon {\text{ and }}|Q_{P}(h)-{\widehat {Q}}_{s}(h)|\leq \varepsilon /2\}\,dP^{m}(r)\\[5pt]={}&A\end{aligned}}} since r {\displaystyle r} and s {\displaystyle s} are independent. Now for r ∈ V {\displaystyle r\in V} fix an h ∈ H {\displaystyle h\in H} such that | Q P ( h ) − Q ^ r ( h ) | ≥ ε {\displaystyle |Q_{P}(h)-{\widehat {Q}}_{r}(h)|\geq \varepsilon } . For this h {\displaystyle h} , we shall

Bernhard Schölkopf

Bernhard Schölkopf (born 20 February 1968) is a German computer scientist known for his work in machine learning, especially on kernel methods and causality. He is a director at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he heads the Department of Empirical Inference. He is also an affiliated professor at ETH Zürich, honorary professor at the University of Tübingen and Technische Universität Berlin, and chairman of the European Laboratory for Learning and Intelligent Systems (ELLIS). == Research == === Kernel methods === Schölkopf developed SVM methods achieving world record performance on the MNIST pattern recognition benchmark at the time. With the introduction of kernel PCA, Schölkopf and coauthors argued that SVMs are a special case of a much larger class of methods, and all algorithms that can be expressed in terms of dot products can be generalized to a nonlinear setting by means of what is known as reproducing kernels. Another significant observation was that the data on which the kernel is defined need not be vectorial, as long as the kernel Gram matrix is positive definite. Both insights together led to the foundation of the field of kernel methods, encompassing SVMs and many other algorithms. Kernel methods are now textbook knowledge and one of the major machine learning paradigms in research and applications. Developing kernel PCA, Schölkopf extended it to extract invariant features and to design invariant kernels and showed how to view other major dimensionality reduction methods such as LLE and Isomap as special cases. In further work with Alex Smola and others, he extended the SVM method to regression and classification with pre-specified sparsity and quantile/support estimation. He proved a representer theorem implying that SVMs, kernel PCA, and most other kernel algorithms, regularized by a norm in a reproducing kernel Hilbert space, have solutions taking the form of kernel expansions on the training data, thus reducing an infinite dimensional optimization problem to a finite dimensional one. He co-developed kernel embeddings of distributions methods to represent probability distributions in Hilbert Spaces, with links to Fraunhofer diffraction as well as applications to independence testing. === Causality === Starting in 2005, Schölkopf turned his attention to causal inference. Causal mechanisms in the world give rise to statistical dependencies as epiphenomena, but only the latter are exploited by popular machine learning algorithms. Knowledge about causal structures and mechanisms is useful by letting us predict not only future data coming from the same source, but also the effect of interventions in a system, and by facilitating transfer of detected regularities to new situations. Schölkopf and co-workers addressed (and in certain settings solved) the problem of causal discovery for the two-variable setting and connected causality to Kolmogorov complexity. Around 2010, Schölkopf began to explore how to use causality for machine learning, exploiting assumptions of independence of mechanisms and invariance. His early work on causal learning was exposed to a wider machine learning audience during his Posner lecture at NeurIPS 2011, as well as in a keynote talk at ICML 2017. He assayed how to exploit underlying causal structures in order to make machine learning methods more robust with respect to distribution shifts and systematic errors, the latter leading to the discovery of a number of new exoplanets including K2-18b, which was subsequently found to contain water vapour in its atmosphere, a first for an exoplanet in the habitable zone. == Education and employment == Schölkopf studied mathematics, physics, and philosophy in Tübingen and London. He was supported by the Studienstiftung and won the Lionel Cooper Memorial Prize for the best M.Sc. in Mathematics at the University of London. He completed a Diplom in Physics, and then moved to Bell Labs in New Jersey, where he worked with Vladimir Vapnik, who became co-adviser of his PhD thesis at TU Berlin (with Stefan Jähnichen). His thesis, defended in 1997, won the annual award of the German Informatics Association. In 2001, following positions in Berlin, Cambridge and New York, he founded the Department for Empirical Inference at the Max Planck Institute for Biological Cybernetics, which grew into a leading center for research in machine learning. In 2011, he became founding director at the Max Planck Institute for Intelligent Systems. With Alex Smola, Schölkopf co-founded the series of Machine Learning Summer Schools. He also co-founded a Cambridge-Tübingen PhD Programme and the Max Planck-ETH Center for Learning Systems. In 2016, he co-founded the Cyber Valley research consortium. He participated in the IEEE Global Initiative on "Ethically Aligned Design". Schölkopf is co-editor-in-Chief of the Journal of Machine Learning Research, a journal he helped found, being part of a mass resignation of the editorial board of Machine Learning (journal). He is among the world’s most cited computer scientists. Alumni of his lab include Ulrike von Luxburg, Carl Rasmussen, Matthias Hein, Arthur Gretton, Gunnar Rätsch, Matthias Bethge, Stefanie Jegelka, Jason Weston, Olivier Bousquet, Olivier Chapelle, Joaquin Quinonero-Candela, and Sebastian Nowozin. As of late 2023, Schölkopf is also a scientific advisor to French research group Kyutai which is being funded by Xavier Niel, Rodolphe Saadé, Eric Schmidt, and others. == Awards and recognition == Schölkopf’s awards include the Royal Society Milner Award and, shared with Isabelle Guyon and Vladimir Vapnik, the BBVA Foundation Frontiers of Knowledge Award in the Information and Communication Technologies category. He was the first scientist working in Europe to receive this award. He was elected a Fellow of the Royal Society in 2026.

Michael Kearns (computer scientist)

Michael Justin Kearns is an American computer scientist, professor and National Center Chair at the University of Pennsylvania, the founding director of Penn's Singh Program in Networked & Social Systems Engineering (NETS), the founding director of Warren Center for Network and Data Sciences, and also holds secondary appointments in Penn's Wharton School and department of Economics. He is a leading researcher in computational learning theory and algorithmic game theory, and interested in machine learning, artificial intelligence, computational finance, algorithmic trading, computational social science and social networks. He previously led the Advisory and Research function in Morgan Stanley's Artificial Intelligence Center of Excellence team, and is currently an Amazon Scholar within Amazon Web Services. == Biography == Kearns was born into an academic family, where his father David R Kearns is Professor Emeritus at University of California, San Diego in chemistry, who won Guggenheim Fellowship in 1969, and his uncle Thomas R. Kearns is Professor Emeritus at Amherst College in Philosophy and Law, Jurisprudence, and Social Thought. His paternal grandfather Clyde W. Kearns was a pioneer in insecticide toxicology and was a professor at University of Illinois at Urbana–Champaign in Entomology, and his maternal grandfather Chen Shou-Yi (1899–1978) was a professor at Pomona College in history and literature, who was born in Canton (Guangzhou, China) into a family noted for their scholarship and educational leadership. Kearns received his B.S. degree at the University of California at Berkeley in math and computer science in 1985, and Ph.D. in computer science from Harvard University in 1989, under the supervision of Turing Award winner Leslie Valiant. His doctoral dissertation was The Computational Complexity of Machine Learning, later published by MIT press as part of the ACM Doctoral Dissertation Award Series in 1990. Before joining AT&T Bell Labs in 1991, he continued with postdoctoral positions at the Laboratory for Computer Science at MIT hosted by Ronald Rivest, and at the International Computer Science Institute (ICSI) in UC Berkeley hosted by Richard M. Karp, both of whom are Turing Award winners. Kearns is currently a full professor and National Center Chair at the University of Pennsylvania, where his appointment is split across the Department of Computer and Information Science, and Statistics and Operations and Information Management in the Wharton School. Prior to joining the Penn faculty in 2002, he spent a decade (1991–2001) in AT&T Labs and Bell Labs, including as head of the AI department with colleagues including Michael L. Littman, David A. McAllester, and Richard S. Sutton; Secure Systems Research department; and Machine Learning department with members such as Michael Collins and the leader Fernando Pereira. Other AT&T Labs colleagues in Algorithms and Theoretical Computer Science included Yoav Freund, Ronald Graham, Mehryar Mohri, Robert Schapire, and Peter Shor, as well as Sebastian Seung, Yann LeCun, Corinna Cortes, and Vladimir Vapnik (the V in VC dimension). Kearns was named Fellow of the Association for Computing Machinery (2014) for contributions to machine learning, and a fellow of the American Academy of Arts and Sciences (2012). His former graduate students and postdoctoral visitors include Ryan W. Porter, John Langford, and Jennifer Wortman Vaughan. Kearns' work has been reported by media, such as MIT Technology Review (2014) Can a Website Help You Decide to Have a Kid?, Bloomberg News (2014) Schneiderman (and Einstein) Pressure High-Speed Trading and NPR audio (2012) Online Education Grows Up, And For Now, It's Free. == Academic life == === Computational learning theory === Kearns and Umesh Vazirani published An introduction to computational learning theory, which has been a standard text on computational learning theory since it was published in 1994. === Weak learnability and the origin of Boosting algorithms === The question "is weakly learnability equivalent to strong learnability?" posed by Kearns and Valiant (Unpublished manuscript 1988, ACM Symposium on Theory of Computing 1989) is the origin of boosting machine learning algorithms, which got a positive answer by Robert Schapire (1990, proof by construction, not practical) and Yoav Freund (1993, by voting, not practical) and then they developed the practical AdaBoost (European Conference on Computational Learning Theory 1995, Journal of Computer and System Sciences 1997), an adaptive boosting algorithm that won the prestigious Gödel Prize (2003). == Honors and awards == 2021. Member of the U. S. National Academy of Sciences. 2014. ACM Fellow. For contributions to machine learning, artificial intelligence, and algorithmic game theory and computational social science. 2012. American Academy of Arts and Sciences Fellow. == Selected works == 2019. The Ethical Algorithm: The Science of Socially Aware Algorithm Design. (with Aaron Roth). Oxford University Press. 1994. An introduction to computational learning theory. (with Umesh Vazirani). MIT press. Widely used as a text book in computational learning theory courses. 1990. The computational complexity of machine learning. MIT press. Based on his 1989 doctoral dissertation; ACM Doctoral Dissertation Award Series in 1990 Archived 2014-11-03 at the Wayback Machine 1989. Cryptographic limitations on learning Boolean formulae and finite automata. (with Leslie Valiant) Proceedings of the twenty-first annual ACM symposium on Theory of computing (STOC'89). The open question: is weakly learnability equivalent to strong learnability?; The origin of boosting algorithms; Important publication in machine learning.

Sasha Luccioni

Alexandra Sasha Luccioni (née Vorobyova; born 1990) is a computer scientist specializing in the intersection of artificial intelligence (AI) and climate change. Her work focuses on quantifying the environmental impact of AI technologies and promoting sustainable practices in machine learning development. == Early life and education == Alexandra Sasha Vorobyova was born in the Ukrainian Soviet Socialist Republic in 1990. When she was four years old, her family relocated to Ontario, Canada. Her interest in science is influenced by her family's history; her mother, grandmother, and great-grandmother all pursued careers in scientific fields. Luccioni earned a B.A. in language science from University of Paris III: Sorbonne Nouvelle in 2010. Subsequently, she completed a M.S. in cognitive science, with a minor in natural language processing, at École normale supérieure in Paris in 2012. Luccioni obtained her PhD in cognitive computing from Université du Québec à Montréal (UQAM) in 2018. == Career == Luccioni began her professional career at Nuance Communications in 2017, where she focused on natural language processing (NLP) and machine learning (ML) techniques to enhance conversational agents. She then joined Morgan Stanley’s AI/ML Center of Excellence in 2018, working on explainable artificial intelligence (AI) and decision-making systems. In 2019, she became a postdoctoral researcher at Université de Montréal and Mila, collaborating with computer scientist Yoshua Bengio on a project titled This Climate Does Not Exist. This initiative used generative adversarial networks to visualize the effects of climate change. During this time, she also contributed to integrating fairness and accountability into machine learning education at Mila. Luccioni briefly worked with the United Nations Global Pulse in 2021, developing tools to monitor COVID-19 misinformation. Later that year, she joined Hugging Face as a research scientist. Her role includes quantifying the carbon footprint of AI systems, co-chairing the carbon working group in the Big Science project, and advancing responsible machine learning practices. She helped create "CodeCarbon," an open-source software tool that estimates the carbon emissions produced during the training and operation of machine learning models. In addition to her research, she has developed tools to measure the environmental impact of AI models, communicated findings through media engagements, and presented at international conferences, including a TED Talk. In 2024, she was listed on BBC 100 Women and Time 100 AI.

RFPolicy

The RFPolicy outlines a method for contacting vendors about security vulnerabilities found in their products. It was initially written in 2000 by hacker and security consultant Rain Forest Puppy. It was perhaps the second disclosure policy, following Simple Nomad's. The policy gives the vendor five working days to respond to the reporter of the bug. If the vendor fails to contact the reporter within those five days, the issue is recommended to be disclosed to the general community. The reporter should help the vendor reproduce the bug and work out a fix. The reporter should delay notifying the general community about the bug if the vendor provides feasible reasons for requiring so. If the vendor fails to respond or shuts down communication with the reporter of the problem within five working days, the reporter should disclose the issue to the general community. When issuing an alert or fix, the vendor should give the reporter proper credit for reporting the bug. Context for the history of vulnerability disclosure is available in a history article.

The Best Free AI Resume Builder for Beginners

Curious about the best AI resume builder? An AI resume builder is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI resume builder slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.