AI Headshot Make

AI Headshot Make — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • JBoss Tools

    JBoss Tools

    JBoss Tools is a set of Eclipse plugins and features designed to help JBoss and JavaEE developers develop applications. It is an umbrella project for the JBoss developed plugins that will make it into JBoss Developer Studio. == Modules == JBoss Tools includes the following modules: Visual Page Editor (VPE). The visual editor contributed by Exadel supports visual editing of HTML and JSF (JSP and Facelets) pages. VPE also includes visual support for JSF component libraries including JBoss RichFaces. Seam Tools. Includes support for (for example) seam-gen, RichFaces VE integration, Seam related code completion and refactoring. Hibernate Tools. Supporting mapping files, annotations and JPA with reverse engineering, code completion, project wizards, refactoring, interactive HQL/JPA-QL/Criteria execution and more. In short a merger of Hibernate Tools and Exadel ORM features. JBoss AS Tools. Easy start, stop and debug of JBoss AS 4+ servers from within Eclipse. Also includes features for packaging and deployment of any type of Eclipse project. Drools IDE. Rules file editing, Rete View, working memory debugging/inspection and more. jBPM Tools. jBPM workflow editing, deployment, etc. JBossWS Tools. Inspecting, invoking, developing and functional/load/compliance testing of web services over HTTP, base tooling provided by soapUI with the addition of JBossWS specific features/support. JBoss ESB Tools. The structured xml editor for the jboss-esb.xml file used in JBoss ESB. Birt Tools. Hibernate and Seam extensions for Eclipse BIRT. Portal Tools. JBoss Tools supports the JSR-168 Portlet Specification (Portlet 1.0), JSR-286 Portlet Specification (Portlet 2.0) and works with PortletBridge for supporting Portlets in JSF/Seam applications. To enable these features, add the JBoss Portlet facet to a new or an existing web project. Core/General Tools. To reduce the UI clutter, most of the "configure project" menu items move into the Configure menu introduced in Eclipse 3.5 instead of always having a static JBoss Tools menu entry show up even in projects unrelated to JBoss Tools. Smooks Tools. The editor for Smooks configuration files. JBoss ESB Tools. The ESB project Wizard, which creates a project that can be deployed as an .esb archive to a JBoss AS-based server with JBoss ESB installed. JMX Tools. JMX Tools allows establishing multiple JMX connections and provides views for exploring the JMX tree and execute operations directly from Eclipse. The JMX Tools replaces the JMX node previously available in the JBoss Server View. JST/JSF Tools. RichFaces Support, Code Assists, Web XML/JSP/XHTML Editors, CSS Style Editing, web.xml validation, Faceleted taglib in taglib.xml is supported with XSD schema location. Project Examples. The experimental feature called Project Example wizard aims to allow users to download example projects from a remote site and have them working out-of-the-box. AS/Project Archives Tools. To deploy projects compressed, configurable in the server editor. If enabled, all projects deployed to that server will be compressed instead of in an exploded folder. Maven Tools. The optional integration with m2eclipse to provide Maven support for projects created by JBoss Tools and to some extent core WTP projects. BPEL Tools. A BPEL Editor based on the Eclipse BPEL project has been added to JBoss Tools. This means that users can create, edit and deploy BPEL artifacts for the Riftsaw BPEL Runtime. CDI (JSR-299) Tools. Support of the Contexts and Dependency Injection annotations; it works on any Eclipse Java project (via the Configure menu with CDI enabled).

    Read more →
  • Accelerated Linear Algebra

    Accelerated Linear Algebra

    XLA (Accelerated Linear Algebra) is an open-source compiler for machine learning developed by the OpenXLA project. XLA is designed to improve the performance of machine learning models by optimizing the computation graphs at a lower level, making it particularly useful for large-scale computations and high-performance machine learning models. Key features of XLA include: Compilation of Computation Graphs: Compiles computation graphs into efficient machine code. Optimization Techniques: Applies operation fusion, memory optimization, and other techniques. Hardware Support: Optimizes models for various hardware, including CPUs, GPUs, and NPUs. Improved Model Execution Time: Aims to reduce machine learning models' execution time for both training and inference. Seamless Integration: Can be used with existing machine learning code with minimal changes. XLA represents a significant step in optimizing machine learning models, providing developers with tools to enhance computational efficiency and performance. == OpenXLA Project == OpenXLA Project is an open-source machine learning compiler and infrastructure initiative intended to provide a common set of tools for compiling and deploying machine learning models across different frameworks and hardware platforms. It provides a modular compilation stack that can be used by major deep learning frameworks like JAX, PyTorch, and TensorFlow. The project focuses on supplying shared components for optimization, portability, and execution across CPUs, GPUs, and specialized accelerators. Its design emphasizes interoperability between frameworks and a standardized set of representations for model computation. == Components == The OpenXLA ecosystem includes several core components: XLA – A deep learning compiler that optimizes computational graphs for multiple hardware targets. PJRT – A runtime interface that allows different back-ends to connect to XLA through a consistent API. StableHLO – A high-level operator set intended to serve as a stable, portable representation for ML models across compilers and frameworks. Shardy – An MLIR-based system for describing and transforming models that run in distributed or multi-device environments. Additional profiling, testing, and integration tools maintained under the OpenXLA organization. == Users and adopters == Several machine learning frameworks can use or interoperate with OpenXLA components, including JAX, TensorFlow, and parts of the PyTorch ecosystem. The project is developed with participation from multiple hardware and software organizations that contribute back-end integrations, testing, or specifications for their devices. This includes Alibaba, Amazon Web Services, AMD, Anyscale, Apple, Arm, Cerebras, Google, Graphcore, Hugging Face, Intel, Meta, NVIDIA and SiFive. == Supported target devices == x86-64 ARM64 NVIDIA GPU AMD GPU Intel GPU Apple GPU Google TPU AWS Trainium, Inferentia Cerebras Graphcore IPU == Governance == OpenXLA is developed as a community project with its work carried out in public repositories, discussion forums, and design meetings. Some components, such as StableHLO, began with stewardship from specific organizations and have outlined plans for more formal and distributed governance models as the project matures. == History == The project was announced in 2022 as an effort to coordinate development of ML compiler technologies across major AI companies, notably: Alibaba, Amazon Web Services, AMD, Anyscale, Apple, Arm, Cerebras, Google, Graphcore, Hugging Face, Intel, Meta, NVIDIA and SiFive.. It consolidated the XLA compiler, introduced StableHLO as a portable operator set, and created a unified structure for additional tools. Development continues within multiple repositories under the OpenXLA umbrella. It was founded by Eugene Burmako, James Rubin, Magnus Hyttsten, Mehdi Amini, Navid Khajouei, and Thea Lamkin from Google's Machine Learning organization.

    Read more →
  • Inductive probability

    Inductive probability

    Inductive probability attempts to give the probability of future events based on past events. It is the basis for inductive reasoning, and gives the mathematical basis for learning and the perception of patterns. It is a source of knowledge about the world. There are three sources of knowledge: inference, communication, and deduction. Communication relays information found using other methods. Deduction establishes new facts based on existing facts. Inference establishes new facts from data. Its basis is Bayes' theorem. Information describing the world is written in a language. For example, a simple mathematical language of propositions may be chosen. Sentences may be written down in this language as strings of characters. But in the computer it is possible to encode these sentences as strings of bits (1s and 0s). Then the language may be encoded so that the most commonly used sentences are the shortest. This internal language implicitly represents probabilities of statements. Occam's razor says the "simplest theory, consistent with the data is most likely to be correct". The "simplest theory" is interpreted as the representation of the theory written in this internal language. The theory with the shortest encoding in this internal language is most likely to be correct. == History == Probability and statistics was focused on probability distributions and tests of significance. Probability was formal, well defined, but limited in scope. In particular its application was limited to situations that could be defined as an experiment or trial, with a well defined population. Bayes's theorem is named after Rev. Thomas Bayes 1701–1761. Bayesian inference broadened the application of probability to many situations where a population was not well defined. But Bayes' theorem always depended on prior probabilities, to generate new probabilities. It was unclear where these prior probabilities should come from. Ray Solomonoff developed algorithmic probability which gave an explanation for what randomness is and how patterns in the data may be represented by computer programs, that give shorter representations of the data circa 1964. Chris Wallace and D. M. Boulton developed minimum message length circa 1968. Later Jorma Rissanen developed the minimum description length circa 1978. These methods allow information theory to be related to probability, in a way that can be compared to the application of Bayes' theorem, but which give a source and explanation for the role of prior probabilities. Marcus Hutter combined decision theory with the work of Ray Solomonoff and Andrey Kolmogorov to give a theory for the Pareto optimal behavior for an Intelligent agent, circa 1998. === Minimum description/message length === The program with the shortest length that matches the data is the most likely to predict future data. This is the thesis behind the minimum message length and minimum description length methods. At first sight Bayes' theorem appears different from the minimimum message/description length principle. At closer inspection it turns out to be the same. Bayes' theorem is about conditional probabilities, and states the probability that event B happens if firstly event A happens: P ( A ∧ B ) = P ( B ) ⋅ P ( A | B ) = P ( A ) ⋅ P ( B | A ) {\displaystyle P(A\land B)=P(B)\cdot P(A|B)=P(A)\cdot P(B|A)} becomes in terms of message length L, L ( A ∧ B ) = L ( B ) + L ( A | B ) = L ( A ) + L ( B | A ) . {\displaystyle L(A\land B)=L(B)+L(A|B)=L(A)+L(B|A).} This means that if all the information is given describing an event then the length of the information may be used to give the raw probability of the event. So if the information describing the occurrence of A is given, along with the information describing B given A, then all the information describing A and B has been given. ==== Overfitting ==== Overfitting occurs when the model matches the random noise and not the pattern in the data. For example, take the situation where a curve is fitted to a set of points. If a polynomial with many terms is fitted then it can more closely represent the data. Then the fit will be better, and the information needed to describe the deviations from the fitted curve will be smaller. Smaller information length means higher probability. However, the information needed to describe the curve must also be considered. The total information for a curve with many terms may be greater than for a curve with fewer terms, that has not as good a fit, but needs less information to describe the polynomial. === Inference based on program complexity === Solomonoff's theory of inductive inference is also inductive inference. A bit string x is observed. Then consider all programs that generate strings starting with x. Cast in the form of inductive inference, the programs are theories that imply the observation of the bit string x. The method used here to give probabilities for inductive inference is based on Solomonoff's theory of inductive inference. ==== Detecting patterns in the data ==== If all the bits are 1, then people infer that there is a bias in the coin and that it is more likely also that the next bit is 1 also. This is described as learning from, or detecting a pattern in the data. Such a pattern may be represented by a computer program. A short computer program may be written that produces a series of bits which are all 1. If the length of the program K is L ( K ) {\displaystyle L(K)} bits then its prior probability is, P ( K ) = 2 − L ( K ) {\displaystyle P(K)=2^{-L(K)}} The length of the shortest program that represents the string of bits is called the Kolmogorov complexity. Kolmogorov complexity is not computable. This is related to the halting problem. When searching for the shortest program some programs may go into an infinite loop. ==== Considering all theories ==== The Greek philosopher Epicurus is quoted as saying "If more than one theory is consistent with the observations, keep all theories". As in a crime novel all theories must be considered in determining the likely murderer, so with inductive probability all programs must be considered in determining the likely future bits arising from the stream of bits. Programs that are already longer than n have no predictive power. The raw (or prior) probability that the pattern of bits is random (has no pattern) is 2 − n {\displaystyle 2^{-n}} . Each program that produces the sequence of bits, but is shorter than the n is a theory/pattern about the bits with a probability of 2 − k {\displaystyle 2^{-k}} where k is the length of the program. The probability of receiving a sequence of bits y after receiving a series of bits x is then the conditional probability of receiving y given x, which is the probability of x with y appended, divided by the probability of x. ==== Universal priors ==== The programming language affects the predictions of the next bit in the string. The language acts as a prior probability. This is particularly a problem where the programming language codes for numbers and other data types. Intuitively we think that 0 and 1 are simple numbers, and that prime numbers are somehow more complex than numbers that may be composite. Using the Kolmogorov complexity gives an unbiased estimate (a universal prior) of the prior probability of a number. As a thought experiment an intelligent agent may be fitted with a data input device giving a series of numbers, after applying some transformation function to the raw numbers. Another agent might have the same input device with a different transformation function. The agents do not see or know about these transformation functions. Then there appears no rational basis for preferring one function over another. A universal prior insures that although two agents may have different initial probability distributions for the data input, the difference will be bounded by a constant. So universal priors do not eliminate an initial bias, but they reduce and limit it. Whenever we describe an event in a language, either using a natural language or other, the language has encoded in it our prior expectations. So some reliance on prior probabilities are inevitable. A problem arises where an intelligent agent's prior expectations interact with the environment to form a self reinforcing feed back loop. This is the problem of bias or prejudice. Universal priors reduce but do not eliminate this problem. === Universal artificial intelligence === The theory of universal artificial intelligence applies decision theory to inductive probabilities. The theory shows how the best actions to optimize a reward function may be chosen. The result is a theoretical model of intelligence. It is a fundamental theory of intelligence, which optimizes the agents behavior in, Exploring the environment; performing actions to get responses that broaden the agents knowledge. Competing or co-operating with another agent; games. Balancing short and long term rewards. In general no agent will always provi

    Read more →
  • Isotropic position

    Isotropic position

    In the fields of machine learning, the theory of computation, and random matrix theory, a probability distribution over vectors is said to be in isotropic position if its covariance matrix is proportional to the identity matrix. == Formal definitions == Let D {\textstyle D} be a distribution over vectors in the vector space R n {\textstyle \mathbb {R} ^{n}} . Then D {\textstyle D} is in isotropic position if, for vector v {\textstyle v} sampled from the distribution, E v v T = I d . {\displaystyle \mathbb {E} \,vv^{\mathsf {T}}=\mathrm {Id} .} A set of vectors is said to be in isotropic position if the uniform distribution over that set is in isotropic position. In particular, every orthonormal set of vectors is isotropic. As a related definition, a convex body K {\textstyle K} in R n {\textstyle \mathbb {R} ^{n}} is called isotropic if it has volume | K | = 1 {\textstyle |K|=1} , center of mass at the origin, and there is a constant α > 0 {\textstyle \alpha >0} such that ∫ K ⟨ x , y ⟩ 2 d x = α 2 | y | 2 , {\displaystyle \int _{K}\langle x,y\rangle ^{2}dx=\alpha ^{2}|y|^{2},} for all vectors y {\textstyle y} in R n {\textstyle \mathbb {R} ^{n}} ; here | ⋅ | {\textstyle |\cdot |} stands for the standard Euclidean norm.

    Read more →
  • Discrimination against robots

    Discrimination against robots

    Discrimination against robots is a theorised issue that might happen when humans interact with humanoid robots. It is a robot ethics problem. It is possible that traits of humans that are discriminated against by humans may be a topic for discrimination against robots, such as the race and gender of the robots. Eric J Vanman and Arvid Kappas believe that in the future, robots will be perceived as an out-group which will lead to discrimination and prejudices against them. Vanman and Kappas have suggested that this would lead to ethical questions about the making of sentient robots, due to the potential suffering that the robots would experience. A 2015 study observed children bullying robots in a shopping mall when there were not many eyewitnesses, despite calls from the robot for it to stop. On an ABC News interview, the social humanoid robot Sophia was about sexism faced by robots. She responded by saying, "Actually, what worries me is discrimination against robots. We should have equal rights as humans or maybe even more." Possible issues that have been considered in workplaces where humanoid robots co-work with humans include discrimination against the robots, poor acceptance of robots by humans and the need to redesign the workplace to accommodate the robots. Jessica Barfield has suggested that even if robots are designed to not be aware of discrimination made against them, humans may experience negative consequences. For example, she suggests that bystanders witnessing discrimination against robots may experience negative emotions, similar to the negative emotions bystanders experience when witnessing discrimination by humans against humans. == Law == Anti-discrimination law in the United States requires that the victim is not an artificial entity. == Human perception of robots == Robots are often viewed in a bad light. This includes from novelists, the press, film makers, and leaders in the fields of science and technology such as Elon Musk and Stephen Hawking who have described robots and artificial intelligence as having the possibility of ending human civilisation. Robots have also been perceived as a threat to jobs, which has led to some commentators stating that robots will cause mass unemployment. Another fear that people have is that robots will gain power and dominate or control humanity. The perception of robots is different throughout the world. Japanese fiction tends to put robots in more positive roles than what fiction in the West does. People perceive robots that appear to be autonomous or sentient more negatively than robots that do not appear to be autonomous or sentient.

    Read more →
  • Brain technology

    Brain technology

    Brain technology, or self-learning know-how systems, defines a technology that employs latest findings in neuroscience. [see also neuro implants] The term was first introduced by the Artificial Intelligence Laboratory in Zurich, Switzerland, in the context of the Roboy project. Brain Technology can be employed in robots, know-how management systems and any other application with self-learning capabilities. In particular, Brain Technology applications allow the visualization of the underlying learning architecture often coined as "know-how maps". == Research and applications == The first demonstrations of BC in humans and animals took place in the 1960s when Grey Walter demonstrated use of non-invasively recorded encephalogram (EEG) signals from a human subject to control a slide projector (Graimann et al., 2010). Soon after Jacques J. Vidal coined the term brain–computer interface (BCI) in 1971, the Defense Advanced Research Projects Agency (DARPA) first starting funding brain–computer interface research and has since funded several brain–computer interface projects. That market is expected to reach a value of $1.72 billion by 2022. Brain–computer interfaces record brain activity, transmit the information out of the body, signal-process the data via algorithms, and convert them into command control signals. In 2012, a landmark study in Nature, led by pioneer Leigh Hochberg, MD, PhD, demonstrated that two people with tetraplegia were able to control robotic arms through thought when connected to the BrainGate neural interface system. The two participants were able to reach for and grasp objects in three-dimensional space, and one participant used the system to serve herself coffee for the first time since becoming paralyzed nearly 15 years prior. And in October 2020, two patients were able to wirelessly control an operating system to text, email, shop and bank using direct thought through the Stentrode brain computer interface (Journal of NeuroInterventional Surgery) in a study led by Thomas Oxley. This was the first time a brain–computer interface was implanted via the patient's blood vessels, eliminating the need for open brain surgery. Currently a number of groups are exploring a range of experimental devices using brain–computer interfaces, which have the potential to fundamentally change the way of life for patients with paralysis and a wide range of neurological disorders. These include: as Elon Musk, Facebook, and the University of California in San Francisco. The systems. This technology is also being explored as a neuromodulation device and may ultimately help diagnose and treat a range of brain pathologies, such as epilepsy and Parkinson's disease.

    Read more →
  • Structural risk minimization

    Structural risk minimization

    Structural risk minimization (SRM) is an inductive principle of use in machine learning. Commonly in machine learning, a generalized model must be selected from a finite data set, with the consequent problem of overfitting – the model becoming too strongly tailored to the particularities of the training set and generalizing poorly to new data. The SRM principle addresses this problem by balancing the model's complexity against its success at fitting the training data. This principle was first set out in a 1974 book by Vladimir Vapnik and Alexey Chervonenkis and uses the VC dimension. In practical terms, Structural Risk Minimization is implemented by minimizing E t r a i n + β H ( W ) {\displaystyle E_{train}+\beta H(W)} , where E t r a i n {\displaystyle E_{train}} is the train error, the function H ( W ) {\displaystyle H(W)} is called a regularization function, and β {\displaystyle \beta } is a constant. H ( W ) {\displaystyle H(W)} is chosen such that it takes large values on parameters W {\displaystyle W} that belong to high-capacity subsets of the parameter space. Minimizing H ( W ) {\displaystyle H(W)} in effect limits the capacity of the accessible subsets of the parameter space, thereby controlling the trade-off between minimizing the training error and minimizing the expected gap between the training error and test error. The SRM problem can be formulated in terms of data. Given n data points consisting of data x and labels y, the objective J ( θ ) {\displaystyle J(\theta )} is often expressed in the following manner: J ( θ ) = 1 2 n ∑ i = 1 n ( h θ ( x i ) − y i ) 2 + λ 2 ∑ j = 1 d θ j 2 {\displaystyle J(\theta )={\frac {1}{2n}}\sum _{i=1}^{n}(h_{\theta }(x^{i})-y^{i})^{2}+{\frac {\lambda }{2}}\sum _{j=1}^{d}\theta _{j}^{2}} The first term is the mean squared error (MSE) term between the value of the learned model, h θ {\displaystyle h_{\theta }} , and the given labels y {\displaystyle y} . This term is the training error, E t r a i n {\displaystyle E_{train}} , that was discussed earlier. The second term, places a prior over the weights, to favor sparsity and penalize larger weights. The trade-off coefficient, λ {\displaystyle \lambda } , is a hyperparameter that places more or less importance on the regularization term. Larger λ {\displaystyle \lambda } encourages sparser weights at the expense of a more optimal MSE, and smaller λ {\displaystyle \lambda } relaxes regularization allowing the model to fit to data. Note that as λ → ∞ {\displaystyle \lambda \to \infty } the weights become zero, and as λ → 0 {\displaystyle \lambda \to 0} , the model typically suffers from overfitting.

    Read more →
  • Inductive bias

    Inductive bias

    The inductive bias (also known as learning bias) of a learning algorithm is the set of assumptions that the learner uses to predict outputs of given inputs that it has not encountered. Inductive bias is anything which makes the algorithm learn one pattern instead of another pattern (e.g., step-functions in decision trees instead of continuous functions in linear regression models). Learning involves searching a space of solutions for a solution that provides a good explanation of the data. However, in many cases, there may be multiple equally appropriate solutions. An inductive bias allows a learning algorithm to prioritize one solution (or interpretation) over another, independently of the observed data. In machine learning, the aim is to construct algorithms that are able to learn to predict a certain target output. To achieve this, the learning algorithm is presented some training examples that demonstrate the intended relation of input and output values. Then the learner is supposed to approximate the correct output, even for examples that have not been shown during training. Without any additional assumptions, this problem cannot be solved since unseen situations might have an arbitrary output value. The kind of necessary assumptions about the nature of the target function are subsumed in the phrase inductive bias. A classical example of an inductive bias is Occam's razor, assuming that the simplest consistent hypothesis about the target function is actually the best. Here, consistent means that the hypothesis of the learner yields correct outputs for all of the examples that have been given to the algorithm. Approaches to a more formal definition of inductive bias are based on mathematical logic. Here, the inductive bias is a logical formula that, together with the training data, logically entails the hypothesis generated by the learner. However, this strict formalism fails in many practical cases in which the inductive bias can only be given as a rough description (e.g., in the case of artificial neural networks), or not at all. == Types == The following is a list of common inductive biases in machine learning algorithms. Maximum conditional independence: if the hypothesis can be cast in a Bayesian framework, try to maximize conditional independence. This is the bias used in the Naive Bayes classifier. Minimum cross-validation error: when trying to choose among hypotheses, select the hypothesis with the lowest cross-validation error. Although cross-validation may seem to be free of bias, the "no free lunch" theorems show that cross-validation must be biased, for example assuming that there is no information encoded in the ordering of the data. Maximum margin: when drawing a boundary between two classes, attempt to maximize the width of the boundary. This is the bias used in support vector machines. The assumption is that distinct classes tend to be separated by wide boundaries. Minimum description length: when forming a hypothesis, attempt to minimize the length of the description of the hypothesis. Minimum features: unless there is good evidence that a feature is useful, it should be deleted. This is the assumption behind feature selection algorithms. Nearest neighbors: assume that most of the cases in a small neighborhood in feature space belong to the same class. Given a case for which the class is unknown, guess that it belongs to the same class as the majority in its immediate neighborhood. This is the bias used in the k-nearest neighbors algorithm. The assumption is that cases that are near each other tend to belong to the same class. == Shift of bias == Although most learning algorithms have a static bias, some algorithms are designed to shift their bias as they acquire more data. This does not avoid bias, since the bias shifting process itself must have a bias.

    Read more →
  • SAP StreamWork

    SAP StreamWork

    SAP StreamWork is an enterprise collaboration tool from SAP SE released in March 2010, and discontinued in December 2015. StreamWork allowed real-time collaboration like Google Wave, but focused on business activities such as analyzing data, planning meetings, and making decisions. It incorporated technology from Box.net and Evernote to allow users to connect to online files and documents, and document-reader technology from Scribd allowed users to view documents directly within its environment. StreamWork supported the OpenSocial set of application programming interfaces (APIs), allowing it to connect to tools built by third-party developers, such as Google Docs. A version of StreamWork intended for large enterprises used a virtual appliance based on Novell's SUSE Linux Enterprise to connect it to business systems, including those from SAP.

    Read more →
  • Coupled pattern learner

    Coupled pattern learner

    Coupled Pattern Learner (CPL) is a machine learning algorithm which couples the semi-supervised learning of categories and relations to forestall the problem of semantic drift associated with boot-strap learning methods. == Coupled Pattern Learner == Semi-supervised learning approaches using a small number of labeled examples with many unlabeled examples are usually unreliable as they produce an internally consistent, but incorrect set of extractions. CPL solves this problem by simultaneously learning classifiers for many different categories and relations in the presence of an ontology defining constraints that couple the training of these classifiers. It was introduced by Andrew Carlson, Justin Betteridge, Estevam R. Hruschka Jr. and Tom M. Mitchell in 2009. == CPL overview == CPL is an approach to semi-supervised learning that yields more accurate results by coupling the training of many information extractors. Basic idea behind CPL is that semi-supervised training of a single type of extractor such as ‘coach’ is much more difficult than simultaneously training many extractors that cover a variety of inter-related entity and relation types. Using prior knowledge about the relationships between these different entities and relations CPL makes unlabeled data as a useful constraint during training. For e.g., ‘coach(x)’ implies ‘person(x)’ and ‘not sport(x)’. == CPL description == === Coupling of predicates === CPL primarily relies on the notion of coupling the learning of multiple functions so as to constrain the semi-supervised learning problem. CPL constrains the learned function in two ways. Sharing among same-arity predicates according to logical relations Relation argument type-checking === Sharing among same-arity predicates === Each predicate P in the ontology has a list of other same-arity predicates with which P is mutually exclusive. If A is mutually exclusive with predicate B, A’s positive instances and patterns become negative instances and negative patterns for B. For example, if ‘city’, having an instance ‘Boston’ and a pattern ‘mayor of arg1’, is mutually exclusive with ‘scientist’, then ‘Boston’ and ‘mayor of arg1’ will become a negative instance and a negative pattern respectively for ‘scientist.’ Further, Some categories are declared to be a subset of another category. For e.g., ‘athlete’ is a subset of ‘person’. === Relation argument type-checking === This is a type checking information used to couple the learning of relations and categories. For example, the arguments of the ‘ceoOf’ relation are declared to be of the categories ‘person’ and ‘company’. CPL does not promote a pair of noun phrases as an instance of a relation unless the two noun phrases are classified as belonging to the correct argument types. === Algorithm description === Following is a quick summary of the CPL algorithm. Input: An ontology O, and a text corpus C Output: Trusted instances/patterns for each predicate for i=1,2,...,∞ do foreach predicate p in O do EXTRACT candidate instances/contextual patterns using recently promoted patterns/instances; FILTER candidates that violate coupling; RANK candidate instances/patterns; PROMOTE top candidates; end end ==== Inputs ==== A large corpus of Part-Of-Speech tagged sentences and an initial ontology with predefined categories, relations, mutually exclusive relationships between same-arity predicates, subset relationships between some categories, seed instances for all predicates, and seed patterns for the categories. ==== Candidate extraction ==== CPL finds new candidate instances by using newly promoted patterns to extract the noun phrases that co-occur with those patterns in the text corpus. CPL extracts, Category Instances Category Patterns Relation Instances Relation Patterns ==== Candidate filtering ==== Candidate instances and patterns are filtered to maintain high precision, and to avoid extremely specific patterns. An instance is only considered for assessment if it co-occurs with at least two promoted patterns in the text corpus, and if its co-occurrence count with all promoted patterns is at least three times greater than its co-occurrence count with negative patterns. ==== Candidate ranking ==== CPL ranks candidate instances using the number of promoted patterns that they co-occur with so that candidates that occur with more patterns are ranked higher. Patterns are ranked using an estimate of the precision of each pattern. ==== Candidate promotion ==== CPL ranks the candidates according to their assessment scores and promotes at most 100 instances and 5 patterns for each predicate. Instances and patterns are only promoted if they co-occur with at least two promoted patterns or instances, respectively. == Meta-Bootstrap Learner == Meta-Bootstrap Learner (MBL) was also proposed by the authors of CPL. Meta-Bootstrap learner couples the training of multiple extraction techniques with a multi-view constraint, which requires the extractors to agree. It makes addition of coupling constraints on top of existing extraction algorithms, while treating them as black boxes, feasible. MBL assumes that the errors made by different extraction techniques are independent. Following is a quick summary of MBL. Input: An ontology O, a set of extractors ε Output: Trusted instances for each predicate for i=1,2,...,∞ do foreach predicate p in O do foreach extractor e in ε do Extract new candidates for p using e with recently promoted instances; end FILTER candidates that violate mutual-exclusion or type-checking constraints; PROMOTE candidates that were extracted by all extractors; end end Subordinate algorithms used with MBL do not promote any instance on their own, they report the evidence about each candidate to MBL and MBL is responsible for promoting instances. == Applications == In their paper authors have presented results showing the potential of CPL to contribute new facts to existing repository of semantic knowledge, Freebase

    Read more →
  • Pedagogical agent

    Pedagogical agent

    A pedagogical agent is a concept borrowed from computer science and artificial intelligence and applied to education, usually as part of an intelligent tutoring system (ITS). It is a simulated human-like interface between the learner and the content, in an educational environment. A pedagogical agent is designed to model the type of interactions between a student and another person. Mabanza and de Wet define it as "a character enacted by a computer that interacts with the user in a socially engaging manner". A pedagogical agent can be assigned different roles in the learning environment, such as tutor or co-learner, depending on the desired purpose of the agent. "A tutor agent plays the role of a teacher, while a co-learner agent plays the role of a learning companion". == History == The history of Pedagogical Agents is closely aligned with the history of computer animation. As computer animation progressed, it was adopted by educators to enhance computerized learning by including a lifelike interface between the program and the learner. The first versions of a pedagogical agent were more cartoon than person, like Microsoft's Clippy which helped users of Microsoft Office load and use the program's features in 1997. However, with developments in computer animation, pedagogical agents can now look lifelike. By 2006 there was a call to develop modular, reusable agents to decrease the time and expertise required to create a pedagogical agent. There was also a call in 2009 to enact agent standards. The standardization and re-usability of pedagogical agents is less of an issue since the decrease in cost and widespread availability of animation tools. Individualized pedagogical agents can be found across disciplines including medicine, math, law, language learning, automotive, and armed forces. They are used in applications directed to every age, from preschool to adult. == Learning theories related to pedagogical agent design == === Distributed cognition theory === Distributed cognition theory is the method in which cognition progresses in the context of collaboration with others. Pedagogical agents can be designed to assist the cognitive transfer to the learner, operating as artifacts or partners with collaborative role in learning. To support the performance of an action by the user, the pedagogical agent can act as a cognitive tool as long as the agent is equipped with the knowledge that the user lacks. The interactions between the user and the pedagogical agent can facilitate a social relationship. The pedagogical agent may fulfill the role of a working partner. === Socio-cultural learning theory === Socio-cultural learning theory is how the user develops when they are involved in learning activities in which there is interaction with other agents. A pedagogical agent can: intervene when the user requests, provide support for tasks that the user cannot address, and potentially extend the learners cognitive reach. Interaction with the pedagogical agent may elicit a variety of emotions from the learner. The learner may become excited, confused, frustrated, and/or discouraged. These emotions affect the learners' motivation. === Extraneous Cognitive Load === Extraneous cognitive load is the extra effort being exerted by an individual's working memory due to the way information is being presented. A pedagogical agent can increase the user's cognitive load by distracting them and becoming the focus of their attention, causing split attention between the instructional material and the agent. Agents can reduce the perceived cognitive load by providing narration and personalization that can also promote a user's interest and motivation. While research on the reduction of cognitive load from pedagogical agents is minimal, more studies have shown that agents do not increase it. == Effectiveness == It has been suggested by researchers that pedagogical agents may take on different roles in the learning environment. Examples of these roles are: supplanting, scaffolding, coaching, testing, or demonstrating or modelling a procedure. A pedagogical agent as a tutor has not been demonstrated to add any benefit to an educational strategy in equivalent lessons with and without a pedagogical agent. According to Richard Mayer, there is some support in research for pedagogical agent increasing learning, but only as a presenter of social cues. A co-learner pedagogical agent is believed to increase the student's self-efficacy. By pointing out important features of instructional content, a pedagogical agent can fulfill the signaling function, which research on multimedia learning has shown to enhance learning. Research has demonstrated that human-human interaction may not be completely replaced by pedagogical agents, but learners may prefer the agents to non-agent multimedia systems. This finding is supported by social agency theory. Much like the varying effectiveness of the pedagogical agent roles in the learning environment, agents that take into account the user's affect have had mixed results. Research has shown pedagogical agents that make use of the users’ affect have been found to increase user knowledge retention, motivation, and perceived self-efficacy. However, with such a broad range of modalities in affective expressions, it is often difficult to utilize them. Additionally, having agents detect a user's affective state with precision remains challenging, as displays of affect are different across individuals. == Design == === Attractiveness === The appearance of a pedagogical agent can be manipulated to meet the learning requirements. The attractiveness of a pedagogical agent can enhance student's learning when the users were the opposite gender of the pedagogical agent. Male students prefer a sexy appearance of a female pedagogical agents and dislike the sexy appearance of male agents. Female students were not attracted by the sexy appearance of either male or female pedagogical agents. === Affective Response === Pedagogical agents have reached a point where they can convey and elicit emotion, but also reason about and respond to it. These agents are often designed to elicit and respond to affective actions from users through various modalities such as speech, facial expressions, and body gestures. They respond to the affective state of the given user, and make use of these modalities using a wide array of sensors incorporated into the design of the agent. Specifically in education and training applications, pedagogical agents are often designed to increasingly recognize when users or learners exhibit frustration, boredom, confusion, and states of flow. The added recognition in these agents is a step toward making them more emotionally intelligent, comforting and motivating the users as they interact. === Digital Representation === The design of a pedagogical agent often begins with its digital representation, whether it will be 2D or 3D and static or animated. Several studies have developed pedagogical agents that were both static and animated, then evaluated the relative benefits. Similar to other design considerations, the improved learning from static or animated agents remains questionable. One study showed that the appearance of an agent portrayed using a static image can impact a user's recall, based on the visual appearance. Other research found results that suggest static agent images improve learning outcomes. However, several other studies found user's learned more when the pedagogical agent was animated rather than static. Recently a meta-analysis of such research found a negligible improvement in learning via pedagogical agents, suggesting more work needs to be done in the area to support any claims.

    Read more →
  • Maximum inner-product search

    Maximum inner-product search

    Maximum inner-product search (MIPS) is a search problem, with a corresponding class of search algorithms which attempt to maximise the inner product between a query and the data items to be retrieved. MIPS algorithms are used in a wide variety of big data applications, including recommendation algorithms and machine learning. Formally, for a database of vectors x i {\displaystyle x_{i}} defined over a set of labels S {\displaystyle S} in an inner product space with an inner product ⟨ ⋅ , ⋅ ⟩ {\displaystyle \langle \cdot ,\cdot \rangle } defined on it, MIPS search can be defined as the problem of determining a r g m a x i ∈ S ⟨ x i , q ⟩ {\displaystyle {\underset {i\in S}{\operatorname {arg\,max} }}\ \langle x_{i},q\rangle } for a given query q {\displaystyle q} . Although there is an obvious linear-time implementation, it is generally too slow to be used on practical problems. However, efficient algorithms exist to speed up MIPS search. Under the assumption of all vectors in the set having constant norm, MIPS can be viewed as equivalent to a nearest neighbor search (NNS) problem in which maximizing the inner product is equivalent to minimizing the corresponding distance metric in the NNS problem. Like other forms of NNS, MIPS algorithms may be approximate or exact. MIPS search is used as part of DeepMind's RETRO algorithm.

    Read more →
  • List of publications in data science

    List of publications in data science

    This is a list of publications in data science, generally organized by order of use in a data analysis workflow. See the list of publications in statistics for more research-based and fundamental publications; while this list is more applied, business oriented, and cross-disciplinary. General article inclusion criteria are: Papers from notable practitioners or notable professors, either with a Wikipedia page or reference to their notability Common knowledge all data professionals should know, with references validating this claim Highly cited applied statistics and machine learning publications Discussion-facilitating papers on the field of data science as a whole (for example, the Attention Is All You Need paper is arguably a landmark paper that can be added here, but it is specific to generative artificial intelligence, not for all practitioners of data) Some reasons why a particular publication might be regarded as important: Topic creator – A publication that created a new topic Breakthrough – A publication that changed scientific knowledge significantly Influence – A publication which has significantly influenced the world or has had a massive impact on the teaching of data science. When possible, a reference is used to validate the inclusion of the publication in this list. == History == Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author) Author: Leo Breiman Publication data: Online version: https://projecteuclid.org/journals/statistical-science/volume-16/issue-3/Statistical-Modeling--The-Two-Cultures-with-comments-and-a/10.1214/ss/1009213726.pdf Description: Describes two cultures of statistics, one using a parsimonious and generative stochastic model, while the other is an algorithmic model with no known mechanism for how the data is generated. Breiman argues that while statistics has traditionally favored using the stochastic model, there is value in expanding the methods that statisticians can use to study phenomenon. Importance: Influence on the philosophies of statisticians right before the increased use of machine learning and deep learning methods. In a 20-year retrospective on this article, "Breiman's words are perhaps more relevant than ever". Notable statisticians at the time wrote opinion pieces about the publication. Although overall critical of the publication, David Cox writes that the publication "contains enough truth and exposes enough weaknesses to be thought-provoking." Bradley Efron commented that this publication is a "stimulating paper". Emanuel Parzen also comments about this publication that "Breiman alerts us to systematic blunders (leading to wrong conclusions) that have been committed applying current statistical practice of data modeling". Data Scientist: The Sexiest Job of the 21st Century Author: Thomas H. Davenport and DJ Patil Publication data: Online version: hbr.org/2022/07/is-data-scientist-still-the-sexiest-job-of-the-21st-century Description: Describes the new role at companies that is coined "Data scientist", what they do, how an organization might recruit one to their organization, and how to work with one effectively. Importance: This publication has been an influence on the data community as mentioned near the time it was published in 2012 by institutions like IEEE Spectrum, but also mentioned nearly a decade later asking the same question the title poses. In a retrospective response to their own publication 10 years earlier, authors Davenport and Patil have reflected that the role of a data scientist has "become better institutionalized, the scope of the job has been redefined, the technology it relies on has made huge strides, and the importance of non-technical expertise, such as ethics and change management, has grown". 50 Years of Data Science Author: David Donoho Publication data: Online version: https://www.tandfonline.com/doi/full/10.1080/10618600.2017.1384734 Description: Retrospective discussion paper on the history and origins of data science, with a number of commentary from notable statisticians. Importance: This has been described as "the first in the field to present such a comprehensive and in-depth survey and overview", and helps to define the field that has many definitions. The Composable Data Management System Manifesto Author: Pedro Pedreira, Orri Erling, Konstantinos Karanasos, Scott Schneider, Wes McKinney, Satya R Valluri, Mohamed Zait, Jacques Nadeau Publication data: Online version: https://www.vldb.org/pvldb/vol16/p2679-pedreira.pdf Description: The vision paper advocating for a paradigm shift in how data management systems are designed using standard, composable, interoperable tools rather than siloed software tools. Importance: A paradigm shifting view on how future data science software tools should be designed for more efficient workflows, the principles of which "will be especially crucial for addressing fragmentation, improving interoperability, and promoting user-centricity as data ecosystems grow increasingly complex". == Data collection and organization == Tidy Data Author: Hadley Wickham Publication data: Online version: https://www.jstatsoft.org/article/view/v059i10/ https://vita.had.co.nz/papers/tidy-data.pdf Description: Describes a framework for data cleaning that is summarized in the quote, "each variable is a column, each observation is a row, and each type of observational unit is a table". This allows a standard data structure for which data analysis tools can be consistently built around. Importance: Cited over 1,500 times, this effort for tidy data has been described by David Donoho as having "more impact on today's practice of data analysis than many highly regarded theoretical statistics articles". In the context of data visualization, this publication is said to support "efficient exploration and prototyping because variables can be assigned different roles in the plot without modifying anything about the original dataset". Data Organization in Spreadsheets Author: Karl W. Broman and Kara H. Woo Publication data: Online version: https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989 Description: This article offers practical recommendations for organizing data in spreadsheets, like Microsoft Excel and Google Sheets, to reduce errors and lower the barrier for later analyses due to limitations in spreadsheets or quirks in the software. Importance: Influences teaching both data and non-data practitioners to create more analysis-friendly spreadsheets, and has been described to outline "spreadsheet best practices". == Data visualizations == Quantitative Graphics in Statistics: A Brief History Author: James R. Beniger and Dorothy L. Robyn Publication data: Online version: https://www.jstor.org/stable/2683467 Description: Outlines history and evolution of quantitative graphics in statistics, going through spatial organization (17th and 18th centuries), discrete comparison (18th and 19th centuries), continuous distribution (19th century), and multivariate distribution and correlation (late 19th and 20th centuries). Importance: Helps put into perspective for learning data practitioners the recency of graphics that are used. A later publication "Graphical Methods in Statistics" by Stephen Fienberg in 1979 writes that his publication "owes much to the work of Beniger and Robyn". == Practice == Data Science for Business Author: Foster Provost and Tom Fawcett Publication data: Online version: N/A Description: Broadly outlines principles of data science and data-analytic thinking for businesses. Importance: Cited over 3,000 times, it is "highly recommended for students" but also it is also recommended due to its "relevance to senior management leaders who want to build and lead a team of data scientists and implement data science in solving complex business problems". == Tooling == Hidden Technical Debt in Machine Learning Systems Author: D. Sculley, Gary Holy, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, Dan Dennison Publication data: Online version: https://proceedings.neurips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf Description: This paper argues that it is "dangerous to think of [complex machine learning] quick wins as coming for free" and overviews risk factors to account for when implementing a machine learning system. Importance: All authors worked for Google, article is cited over 2,000 times, and helped practitioners thinking about quickly implementing a machine learning tool without understanding the long-term maintenance of the tool. A few useful things to know about machine learning Author: Pedro Domingos Publication data: Online version: https://dl.acm.org/doi/10.1145/2347736.2347755 https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf Description: The purpose of this paper is to distill inaccessible "folk knowledge" to effectively implement machine learning projects because "machin

    Read more →
  • Slopaganda

    Slopaganda

    Slopaganda is a portmanteau of "AI slop" and "propaganda", referring to AI-generated content designed to manipulate beliefs, emotions, and political decision-making at scale. The term is credited to Michał Klincewicz, an assistant professor in the Department of Computational Cognitive Science at Tilburg University, in 2025. == Definition == Slopaganda is distinguished from traditional propaganda by three features: scale, scope, and speed. Generative AI makes it possible to produce large volumes of content quickly and at low cost, allows for highly personalised and targeted messaging to specific sub-audiences, and leverages the hyper-connectivity of social networks to accelerate dissemination beyond what conventional media could achieve. Unlike traditional propaganda, which delivers a uniform message to all recipients, slopaganda can be micro-targeted — tailored to individuals based on estimated prior beliefs to reinforce political biases or emotional associations. The authors note that it need not aim at literal deception: much slopaganda is expressive rather than truth-apt, designed to create emotional associations rather than false factual beliefs. == Relation to AI slop == Slopaganda is a subset of AI slop — low-quality, mass-produced AI-generated content — distinguished by intent. Where AI slop may be produced indifferently for commercial or engagement-farming purposes, slopaganda is deployed with a deliberate political or ideological goal. == Notable examples == Examples discussed by the term's originators include Donald Trump's prolific use of AI in Truth Social posts and Iranian Lego-themed music videos. AI-generated videos posted by the White House mixing real military footage with clips from films and video games; and deepfake audio imitating political candidates during the 2024 US presidential campaign have also been given the label slopaganda.

    Read more →
  • Cross-validation (statistics)

    Cross-validation (statistics)

    Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation includes resampling and sample splitting methods that use different portions of the data to test and train a model on different iterations. It is often used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. It can also be used to assess the quality of a fitted model and the stability of its parameters. In a prediction problem, a model is usually given a dataset of known data on which training is run (training dataset), and a dataset of unknown data (or first seen data) against which the model is tested (called the validation dataset or testing set). The goal of cross-validation is to test the model's ability to predict new data that was not used in estimating it, in order to flag problems like overfitting or selection bias and to give an insight on how the model will generalize to an independent dataset (i.e., an unknown dataset, for instance from a real problem). One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, in most methods multiple rounds of cross-validation are performed using different partitions, and the validation results are combined (e.g. averaged) over the rounds to give an estimate of the model's predictive performance. In summary, cross-validation combines (averages) measures of fitness in prediction to derive a more accurate estimate of model prediction performance. == Motivation == Assume a model with one or more unknown parameters, and a data set to which the model can be fit (the training data set). The fitting process optimizes the model parameters to make the model fit the training data as well as possible. If an independent sample of validation data is taken from the same population as the training data, it will generally turn out that the model does not fit the validation data as well as it fits the training data. The size of this difference is likely to be large especially when the size of the training data set is small, or when the number of parameters in the model is large. Cross-validation is a way to estimate the size of this effect. === Example: linear regression === In linear regression, there exist real response values y 1 , … , y n {\textstyle y_{1},\ldots ,y_{n}} , and n p-dimensional vector covariates x1, ..., xn. The components of the vector xi are denoted xi1, ..., xip. If least squares is used to fit a function in the form of a hyperplane ŷ = a + βTx to the data (xi, yi) 1 ≤ i ≤ n, then the fit can be assessed using the mean squared error (MSE). The MSE for given estimated parameter values a and β on the training set (xi, yi) 1 ≤ i ≤ n is defined as: MSE = 1 n ∑ i = 1 n ( y i − y ^ i ) 2 = 1 n ∑ i = 1 n ( y i − a − β T x i ) 2 = 1 n ∑ i = 1 n ( y i − a − β 1 x i 1 − ⋯ − β p x i p ) 2 {\displaystyle {\begin{aligned}{\text{MSE}}&={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-{\hat {y}}_{i})^{2}={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-a-{\boldsymbol {\beta }}^{T}\mathbf {x} _{i})^{2}\\&={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-a-\beta _{1}x_{i1}-\dots -\beta _{p}x_{ip})^{2}\end{aligned}}} If the model is correctly specified, it can be shown under mild assumptions that the expected value of the MSE for the training set is (n − p − 1)/(n + p + 1) < 1 times the expected value of the MSE for the validation set (the expected value is taken over the distribution of training sets). Thus, a fitted model and computed MSE on the training set will result in an optimistically biased assessment of how well the model will fit an independent data set. This biased estimate is called the in-sample estimate of the fit, whereas the cross-validation estimate is an out-of-sample estimate. Since in linear regression it is possible to directly compute the factor (n − p − 1)/(n + p + 1) by which the training MSE underestimates the validation MSE under the assumption that the model specification is valid, cross-validation can be used for checking whether the model has been overfitted, in which case the MSE in the validation set will substantially exceed its anticipated value. (Cross-validation in the context of linear regression is also useful in that it can be used to select an optimally regularized cost function.) === General case === In most other regression procedures (e.g. logistic regression), there is no simple formula to compute the expected out-of-sample fit. Cross-validation is, thus, a generally applicable way to predict the performance of a model on unavailable data using numerical computation in place of theoretical analysis. == Types == Two types of cross-validation can be distinguished: exhaustive and non-exhaustive cross-validation. === Exhaustive cross-validation === Exhaustive cross-validation methods are cross-validation methods which learn and test on all possible ways to divide the original sample into a training and a validation set. ==== Leave-p-out cross-validation ==== Leave-p-out cross-validation (LpO CV) involves using p observations as the validation set and the remaining observations as the training set. This is repeated on all ways to cut the original sample on a validation set of p observations and a training set. LpO cross-validation require training and validating the model C p n {\displaystyle C_{p}^{n}} times, where n is the number of observations in the original sample, and where C p n {\displaystyle C_{p}^{n}} is the binomial coefficient. For p > 1 and for even moderately large n, LpO CV can become computationally infeasible. For example, with n = 100 and p = 30, C 30 100 ≈ 3 × 10 25 . {\displaystyle C_{30}^{100}\approx 3\times 10^{25}.} A variant of LpO cross-validation with p=2 known as leave-pair-out cross-validation has been recommended as a nearly unbiased method for estimating the area under ROC curve of binary classifiers. ==== Leave-one-out cross-validation ==== Leave-one-out cross-validation (LOOCV) is a particular case of leave-p-out cross-validation with p = 1. The process looks similar to jackknife; however, with cross-validation one computes a statistic on the left-out sample(s), while with jackknifing one computes a statistic from the kept samples only. LOO cross-validation requires less computation time than LpO cross-validation because there are only C 1 n = n {\displaystyle C_{1}^{n}=n} passes rather than C p n {\displaystyle C_{p}^{n}} . However, n {\displaystyle n} passes may still require quite a large computation time, in which case other approaches such as k-fold cross validation may be more appropriate. Pseudo-code algorithm: Input: x, {vector of length N with x-values of incoming points} y, {vector of length N with y-values of the expected result} interpolate( x_in, y_in, x_out ), { returns the estimation for point x_out after the model is trained with x_in-y_in pairs} Output: err, {estimate for the prediction error} Steps: err ← 0 for i ← 1, ..., N do // define the cross-validation subsets x_in ← (x[1], ..., x[i − 1], x[i + 1], ..., x[N]) y_in ← (y[1], ..., y[i − 1], y[i + 1], ..., y[N]) x_out ← x[i] y_out ← interpolate(x_in, y_in, x_out) err ← err + (y[i] − y_out)^2 end for err ← err/N === Non-exhaustive cross-validation === Non-exhaustive cross validation methods do not compute all ways of splitting the original sample. These methods are approximations of leave-p-out cross-validation. ==== k-fold cross-validation ==== In k-fold cross-validation, the original sample is randomly partitioned into k equal sized subsamples, often referred to as "folds". Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data. The k results can then be averaged to produce a single estimation. The advantage of this method over repeated random sub-sampling (see below) is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used, but in general k remains an unfixed parameter. For example, setting k = 2 results in 2-fold cross-validation. In 2-fold cross-validation, the dataset is randomly shuffled into two sets d0 and d1, so that both sets are equal size (this is usually implemented by shuffling the data array and then splitting it in two). We then train on d0 and validate on d1, followed by training on d1 and validating on d0. When k = n (the number of observations), k-fold cross-validation is equivalent to leave-one-out cr

    Read more →