AI Generator Quizlet

AI Generator Quizlet — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • GeoNetwork opensource

    GeoNetwork opensource

    The GeoNetwork opensource (GNOS) project is a free and open source (FOSS) cataloging application for spatially referenced resources. It is a catalog of location-oriented information. == Outline == It is a standardized and decentralized spatial information management environment designed to enable access to geo-referenced databases, cartographic products and related metadata from a variety of sources, enhancing the spatial information exchange and sharing between organizations and their audience, using the capacities of the internet. Using the Z39.50 protocol it both accesses remote catalogs and makes its data available to other catalog services. As of 2007, OGC Web Catalog Service are being implemented. Maps, including those derived from satellite imagery, are effective communicational tools and play an important role in the work of decision makers (e.g., sustainable development planners and humanitarian and emergency managers) in need of quick, reliable and up-to-date user-friendly cartographic products as a basis for action and to better plan and monitor their activities; GIS experts in need of exchanging consistent and updated geographical data; and spatial analysts in need of multidisciplinary data to perform preliminary geographical analysis and make reliable forecasts. == Deployment == The software has been deployed to various organizations, the first being FAO GeoNetwork and WFP VAM-SIE-GeoNetwork, both at their headquarters in Rome, Italy. Furthermore, the WHO, CGIAR, BRGM, ESA, FGDC and the Global Change Information and Research Centre (GCIRC) of China are working on GeoNetwork opensource implementations as their spatial information management capacity. It is used for several risk information systems, in particular in the Gambia. Several related tools are packaged with GeoNetwork, including GeoServer. GeoServer stores geographical data, while GeoNetwork catalogs collections of such data.

    Read more →
  • Data-driven model

    Data-driven model

    Data-driven models are a class of computational models that primarily rely on historical data collected throughout a system's or process' lifetime to establish relationships between input, internal, and output variables. Commonly found in numerous articles and publications, data-driven models have evolved from earlier statistical models, overcoming limitations posed by strict assumptions about probability distributions. These models have gained prominence across various fields, particularly in the era of big data, artificial intelligence, and machine learning, where they offer valuable insights and predictions based on the available data. == Background == These models have evolved from earlier statistical models, which were based on certain assumptions about probability distributions that often proved to be overly restrictive. The emergence of data-driven models in the 1950s and 1960s coincided with the development of digital computers, advancements in artificial intelligence research, and the introduction of new approaches in non-behavioural modelling, such as pattern recognition and automatic classification. == Key Concepts == Data-driven models encompass a wide range of techniques and methodologies that aim to intelligently process and analyse large datasets. Examples include fuzzy logic, fuzzy and rough sets for handling uncertainty, neural networks for approximating functions, global optimization and evolutionary computing, statistical learning theory, and Bayesian methods. These models have found applications in various fields, including economics, customer relations management, financial services, medicine, and the military, among others. Machine learning, a subfield of artificial intelligence, is closely related to data-driven modelling as it also focuses on using historical data to create models that can make predictions and identify patterns. In fact, many data-driven models incorporate machine learning techniques, such as regression, classification, and clustering algorithms, to process and analyse data. In recent years, the concept of data-driven models has gained considerable attention in the field of water resources, with numerous applications, academic courses, and scientific publications using the term as a generalization for models that rely on data rather than physics. This classification has been featured in various publications and has even spurred the development of hybrid models in the past decade. Hybrid models attempt to quantify the degree of physically based information used in hydrological models and determine whether the process of building the model is primarily driven by physics or purely data-based. As a result, data-driven models have become an essential topic of discussion and exploration within water resources management and research. The term "data-driven modelling" (DDM) refers to the overarching paradigm of using historical data in conjunction with advanced computational techniques, including machine learning and artificial intelligence, to create models that can reveal underlying trends, patterns, and, in some cases, make predictions Data-driven models can be built with or without detailed knowledge of the underlying processes governing the system behavior, which makes them particularly useful when such knowledge is missing or fragmented.

    Read more →
  • Machine-learned interatomic potential

    Machine-learned interatomic potential

    Machine-learned interatomic potentials (MLIPs), or simply machine learning potentials (MLPs), are interatomic potentials constructed using machine learning. Beginning in the 1990s, researchers have employed such programs to construct interatomic potentials by mapping atomic structures to their potential energies. These potentials are referred to as MLIPs or MLPs. Such machine learning potentials promised to fill the gap between density functional theory, a highly accurate but computationally intensive modelling method, and empirically derived or intuitively-approximated potentials, which were far lighter computationally but substantially less accurate. Improvements in artificial intelligence technology heightened the accuracy of MLPs while lowering their computational cost, increasing the role of machine learning in fitting potentials. Machine learning potentials began by using neural networks to tackle low-dimensional systems. While promising, these models could not systematically account for interatomic energy interactions; they could be applied to small molecules in a vacuum, or molecules interacting with frozen surfaces, but not much else – and even in these applications, the models often relied on force fields or potentials derived empirically or with simulations. These models thus remained confined to academia. Modern neural networks construct highly accurate and computationally light potentials, as theoretical understanding of materials science was increasingly built into their architectures and preprocessing. Almost all are local, accounting for all interactions between an atom and its neighbor up to some cutoff radius. There exist some nonlocal models, but these have been experimental for almost a decade. For most systems, reasonable cutoff radii enable highly accurate results. Almost all neural networks intake atomic coordinates and output potential energies. For some, these atomic coordinates are converted into atom-centered symmetry functions. From this data, a separate atomic neural network is trained for each element; each atomic network is evaluated whenever that element occurs in the given structure, and then the results are pooled together at the end. This process – in particular, the atom-centered symmetry functions which convey translational, rotational, and permutational invariances – has greatly improved machine learning potentials by significantly constraining the neural network search space. Other models use a similar process but emphasize bonds over atoms, using pair symmetry functions and training one network per atom pair. Other models to learn their own descriptors rather than using predetermined symmetry-dictating functions. These models, called message-passing neural networks (MPNNs), are graph neural networks. Treating molecules as three-dimensional graphs (where atoms are nodes and bonds are edges), the model takes feature vectors describing the atoms as input, and iteratively updates these vectors as information about neighboring atoms is processed through message functions and convolutions. These feature vectors are then used to predict the final potentials. The flexibility of this method often results in stronger, more generalizable models. In 2017, the first-ever MPNN model (a deep tensor neural network) was used to calculate the properties of small organic molecules. == Gaussian Approximation Potential (GAP) == One popular class of machine-learned interatomic potential is the Gaussian Approximation Potential (GAP), which combines compact descriptors of local atomic environments with Gaussian process regression to machine learn the potential energy surface of a given system. To date, the GAP framework has been used to successfully develop a number of MLIPs for various systems, including for elemental systems such as carbon, silicon, phosphorus, and tungsten, as well as for multicomponent systems such as Ge2Sb2Te5 and austenitic stainless steel, Fe7Cr2Ni. == Equivariant graph neural networks == A significant limitation of early MPNNs was that they were not inherently equivariant to rotations and reflections of atomic structures — meaning predictions could change depending on how a molecule was oriented in space. Beginning around 2021, a new class of models addressed this by incorporating equivariance directly into the message-passing layers using spherical harmonics and irreducible representations. Notable examples include NequIP (2021), MACE (2022), and GemNet-OC (2022). These equivariant architectures proved substantially more data-efficient and accurate than their predecessors, and became the dominant paradigm for high-accuracy MLIPs. == Universal MLIPs and large-scale datasets == Early MLIPs were system-specific, trained on a few thousand structures of a single material. A major shift occurred with the creation of large, chemically diverse datasets enabling models that generalize across many elements, bonding environments, and application domains — so-called universal MLIPs. A key driver was the Open Catalyst Project (OC20, OC22), a collaboration between Meta AI (FAIR) and Carnegie Mellon University launched in 2020. OC20 comprises approximately 1.3 million DFT relaxations across 82 elements, designed to accelerate the discovery of catalysts for renewable energy applications. It was among the first datasets large enough to train GNNs that generalize across diverse chemical systems, and established a widely-used benchmark for the field. A subsequent dataset, Open Direct Air Capture (OpenDAC 2023 and OpenDAC 2025), applied the same approach to carbon capture, providing a large computational database of metal-organic frameworks and sorbent candidates evaluated for CO₂ capture, generated using nearly 400 million CPU hours of quantum chemistry calculations in collaboration with Georgia Tech. These datasets revealed a new challenge: the GNN architectures most effective for atomic simulations were memory-intensive, as they model higher-order interactions between triplets or quadruplets of atoms, making it difficult to scale model size. Graph Parallelism, introduced by Sriram et al. (ICLR 2022), addressed this by distributing a single input graph across multiple GPUs — a distinct strategy from data parallelism (which distributes training examples) or model parallelism (which distributes layers). This enabled training GNNs with hundreds of millions to billions of parameters for the first time. Building on these foundations, Meta FAIR released the Universal Model for Atoms (UMA) in 2025, trained on approximately 500 million unique 3D atomic structures spanning molecules, materials, and catalysts — the largest training run to date for an MLIP. UMA introduced a Mixture of Linear Experts (MoLE) architecture, enabling one model to learn from datasets generated by different DFT codes and settings without significant inference overhead. It matches or surpasses specialized models across catalysis, materials, and molecular benchmarks without task-specific fine-tuning, and has been described as marking a "pre/post-UMA" divide in the field. == Applications == Catalyst discovery: MLIPs have significantly accelerated the computational screening of heterogeneous catalysts by replacing expensive DFT relaxations with fast neural network surrogates. The Open Catalyst Project explicitly targets this application, aiming to identify new catalysts for green hydrogen production and other renewable energy reactions. Carbon capture: The OpenDAC project applies universal MLIPs to screening sorbent materials for direct air capture of CO₂, a key technology for climate change mitigation. AI-accelerated screening allows evaluation of orders of magnitude more candidate materials than traditional DFT workflows. Drug discovery and molecular design: MLIPs are increasingly used in pharmaceutical research to model molecular conformations and binding energies. The Open Molecules 2025 (OMol25) dataset, released by Meta FAIR in 2025, provides high-accuracy calculations for a large set of molecular systems to support this use case. Materials discovery: Universal MLIPs enable high-throughput screening of novel inorganic materials, including battery electrolytes, semiconductors, and superconductors, by rapidly estimating stability and properties across large chemical spaces.

    Read more →
  • Document classification

    Document classification

    Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done "manually" (or "intellectually") or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is mainly in information science and computer science. The problems are overlapping, however, and there is therefore interdisciplinary research on document classification. The documents to be classified may be texts, images, music, etc. Each kind of document possesses its special classification problems. When not otherwise specified, text classification is implied. Documents may be classified according to their subjects or according to other attributes (such as document type, author, printing year etc.). In the rest of this article only subject classification is considered. There are two main philosophies of subject classification of documents: the content-based approach and the request-based approach. == "Content-based" versus "request-based" classification == Content-based classification is classification in which the weight given to particular subjects in a document determines the class to which the document is assigned. It is, for example, a common rule for classification in libraries, that at least 20% of the content of a book should be about the class to which the book is assigned. In automatic classification it could be the number of times given words appears in a document. Request-oriented classification (or -indexing) is classification in which the anticipated request from users is influencing how documents are being classified. The classifier asks themself: “Under which descriptors should this entity be found?” and “think of all the possible queries and decide for which ones the entity at hand is relevant” (Soergel, 1985, p. 230). Request-oriented classification may be classification that is targeted towards a particular audience or user group. For example, a library or a database for feminist studies may classify/index documents differently when compared to a historical library. It is probably better, however, to understand request-oriented classification as policy-based classification: The classification is done according to some ideals and reflects the purpose of the library or database doing the classification. In this way it is not necessarily a kind of classification or indexing based on user studies. Only if empirical data about use or users are applied should request-oriented classification be regarded as a user-based approach. == Classification versus indexing == Sometimes a distinction is made between assigning documents to classes ("classification") versus assigning subjects to documents ("subject indexing") but as Frederick Wilfrid Lancaster has argued, this distinction is not fruitful. "These terminological distinctions,” he writes, “are quite meaningless and only serve to cause confusion” (Lancaster, 2003, p. 21). The view that this distinction is purely superficial is also supported by the fact that a classification system may be transformed into a thesaurus and vice versa (cf., Aitchison, 1986, 2004; Broughton, 2008; Riesthuis & Bliedung, 1991). Therefore, assigning a subject term to a document in an index is equivalent to assigning that document to the class of documents indexed by that term (all documents indexed or classified as X belong to the same class of documents). == Automatic document classification (ADC) == Automatic document classification tasks can be divided into three sorts: supervised document classification where some external mechanism (such as human feedback) provides information on the correct classification for documents, unsupervised document classification (also known as document clustering), where the classification must be done entirely without reference to external information, and semi-supervised document classification, where parts of the documents are labeled by the external mechanism. There are several software products under various license models available. === Techniques === Automatic document classification techniques include: Artificial neural network Concept Mining Decision trees such as ID3 or C4.5 Expectation maximization (EM) Instantaneously trained neural networks Latent semantic indexing Multiple-instance learning Naive Bayes classifier Natural language processing approaches Rough set-based classifier Soft set-based classifier Support vector machines (SVM) K-nearest neighbour algorithms tf–idf == Applications == Classification techniques have been applied to spam filtering, a process which tries to discern E-mail spam messages from legitimate emails email routing, sending an email sent to a general address to a specific address or mailbox depending on topic language identification, automatically determining the language of a text genre classification, automatically determining the genre of a text readability assessment, automatically determining the degree of readability of a text, either to find suitable materials for different age groups or reader types or as part of a larger text simplification system sentiment analysis, determining the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. health-related classification using social media in public health surveillance article triage, selecting articles that are relevant for manual literature curation, for example as is being done as the first step to generate manually curated annotation databases in biology

    Read more →
  • Glossary of machine vision

    Glossary of machine vision

    The following are common definitions related to the machine vision field. General related fields Machine vision Computer vision Image processing Signal processing == 0-9 == 1394. FireWire is Apple Inc.'s brand name for the IEEE 1394 interface. It is also known as i.Link (Sony's name) or IEEE 1394 (although the 1394 standard also defines a backplane interface). It is a personal computer (and digital audio/digital video) serial bus interface standard, offering high-speed communications and isochronous real-time data services. 1D. One-dimensional. 2D computer graphics. The computer-based generation of digital images—mostly from two-dimensional models (such as 2D geometric models, text, and digital images) and by techniques specific to them. 3D computer graphics. 3D computer graphics are different from 2D computer graphics in that a three-dimensional representation of geometric data is stored in the computer for the purposes of performing calculations and rendering 2D images. Such images may be for later display or for real-time viewing. Despite these differences, 3D computer graphics rely on many of the same algorithms as 2D computer vector graphics in the wire frame model and 2D computer raster graphics in the final rendered display. In computer graphics software, the distinction between 2D and 3D is occasionally blurred; 2D applications may use 3D techniques to achieve effects such as lighting, and primarily 3D may use 2D rendering techniques. 3D scanner. This is a device that analyzes a real-world object or environment to collect data on its shape and possibly color. The collected data can then be used to construct digital, three dimensional models useful for a wide variety of applications. == A == Aberration. Optically, defocus refers to a translation along the optical axis away from the plane or surface of best focus. In general, defocus reduces the sharpness and contrast of the image. What should be sharp, high-contrast edges in a scene become gradual transitions. Algebraic distance or algebraic error. The algebraic distance from a point xi to a curve or surface defined by f ( x , β ) = 0 {\displaystyle f(x,\beta )=0} is the value of f ( x i , β ) {\displaystyle f(x_{i},\beta )} , i.e. the residual in the least squares problem with data point (xi, 0) and model function f. This term is mainly used in computer vision.[1][2] Aperture. In context of photography or machine vision, aperture refers to the diameter of the aperture stop of a photographic lens. The aperture stop can be adjusted to control the amount of light reaching the film or image sensor. aspect ratio (image). The aspect ratio of an image is its displayed width divided by its height (usually expressed as "x:y"). Angular resolution. Describes the resolving power of any image forming device such as an optical or radio telescope, a microscope, a camera, or an eye. Automated optical inspection. == B == Barcode. A barcode (also bar code) is a machine-readable representation of information in a visual format on a surface. Blob discovery. Inspecting an image for discrete blobs of connected pixels (e.g. a black hole in a grey object) as image landmarks. These blobs frequently represent optical targets for machining, robotic capture, or manufacturing failure. Bitmap. A raster graphics image, digital image, or bitmap, is a data file or structure representing a generally rectangular grid of pixels, or points of color, on a computer monitor, paper, or other display device. == C == Camera. A camera is a device used to take pictures, either singly or in sequence. A camera that takes pictures singly is sometimes called a photo camera to distinguish it from a video camera. Camera Link. Camera Link is a serial communication protocol designed for computer vision applications based on the National Semiconductor interface Channel-link. It was designed for the purpose of standardizing scientific and industrial video products including cameras, cables and frame grabbers. The standard is maintained and administered by the Automated Imaging Association, or AIA, the global machine vision industry's trade group. Charge-coupled device. A charge-coupled device (CCD) is a sensor for recording images, consisting of an integrated circuit containing an array of linked, or coupled, capacitors. CCD sensors and cameras tend to be more sensitive, less noisy, and more expensive than CMOS sensors and cameras. CIE 1931 Color Space. In the study of the perception of color, one of the first mathematically defined color spaces was the CIE XYZ color space (also known as CIE 1931 color space), created by the International Commission on Illumination (CIE) in 1931. CMOS. CMOS ("see-moss")stands for complementary metal-oxide semiconductor, is a major class of integrated circuits. CMOS imaging sensors for machine vision are cheaper than CCD sensors but more noisy. CoaXPress. CoaXPress (CXP) is an asymmetric high speed serial communication standard over coaxial cable. CoaXPress combines high speed image data, low speed camera control and power over a single coaxial cable. The standard is maintained by JIIA, the Japan Industrial Imaging Association. Color. The perception of the frequency (or wavelength) of light, and can be compared to how pitch (or a musical note) is the perception of the frequency or wavelength of sound. Color blindness. Also known as color vision deficiency, in humans is the inability to perceive differences between some or all colors that other people can distinguish Color temperature. "White light" is commonly described by its color temperature. A traditional incandescent light source's color temperature is determined by comparing its hue with a theoretical, heated black-body radiator. The lamp's color temperature is the temperature in kelvins at which the heated black-body radiator matches the hue of the lamp. Color vision. CV is the capacity of an organism or machine to distinguish objects based on the wavelengths (or frequencies) of the light they reflect or emit. computer vision. The study and application of methods which allow computers to "understand" image content. Contrast. In visual perception, contrast is the difference in visual properties that makes an object (or its representation in an image) distinguishable from other objects and the background. C-Mount. Standardized adapter for optical lenses on CCD - cameras. C-Mount lenses have a back focal distance 17.5 mm vs. 12.5 mm for "CS-mount" lenses. A C-Mount lens can be used on a CS-Mount camera through the use of a 5 mm extension adapter. C-mount is a 1" diameter, 32 threads per inch mounting thread (1"-32UN-2A.) CS-Mount. Same as C-Mount but the focal point is 5 mm shorter. A CS-Mount lens will not work on a C-Mount camera. CS-mount is a 1" diameter, 32 threads per inch mounting thread. == D == Data matrix. A two dimensional Barcode. Depth of field. In optics, particularly photography and machine vision, the depth of field (DOF) is the distance in front of and behind the subject which appears to be in focus. Depth perception. DP is the visual ability to perceive the world in three dimensions. It is a trait common to many higher animals. Depth perception allows the beholder to accurately gauge the distance to an object. Diaphragm. In optics, a diaphragm is a thin opaque structure with an opening (aperture) at its centre. The role of the diaphragm is to stop the passage of light, except for the light passing through the aperture. == E == Edge detection. ED marks the points in a digital image at which the luminous intensity changes sharply. It also marks the points of luminous intensity changes of an object or spatial-taxon silhouette. Electromagnetic interference. Radio Frequency Interference (RFI) is electromagnetic radiation which is emitted by electrical circuits carrying rapidly changing signals, as a by-product of their normal operation, and which causes unwanted signals (interference or noise) to be induced in other circuits. == F == FireWire. FireWire (also known as i. Link or IEEE 1394) is a personal computer (and digital audio/video) serial bus interface standard, offering high-speed communications. It is often used as an interface for industrial cameras. Fixed-pattern noise. Flat-field correction. Frame grabber. An electronic device that captures individual, digital still frames from an analog video signal or a digital video stream. Fringe Projection Technique. 3D data acquisition technique employing projector displaying fringe pattern on a surface of measured piece, and one or more cameras recording image(s). Field of view. The field of view (FOV) is the part which can be seen by the machine vision system at one moment. The field of view depends from the lens of the system and from the working distance between object and camera. Focus. An image, or image point or region, is said to be in focus if light from object points is converged about as well as possible in the image; conversely, it is out of focus if light is not w

    Read more →
  • Computational intelligence

    Computational intelligence

    In computer science, computational intelligence (CI) refers to concepts, paradigms, algorithms and implementations of systems that are designed to show "intelligent" behavior in complex and changing environments. These systems are aimed at mastering complex tasks in a wide variety of technical or commercial areas and offer solutions that recognize and interpret patterns, control processes, support decision-making or autonomously manoeuvre vehicles or robots in unknown environments, among other things. These concepts and paradigms are characterized by the ability to learn or adapt to new situations, to generalize, to abstract, to discover and associate. Nature-analog or nature-inspired methods play a key role in this. CI approaches primarily address those complex real-world problems for which traditional or mathematical modeling is not appropriate for various reasons: the processes cannot be described exactly with complete knowledge, the processes are too complex for mathematical reasoning, they contain some uncertainties during the process, such as unforeseen changes in the environment or in the process itself, or the processes are simply stochastic in nature. Thus, CI techniques are properly aimed at processes that are ill-defined, complex, nonlinear, time-varying and/or stochastic. A recent definition of the IEEE Computational Intelligence Societey describes CI as the theory, design, application and development of biologically and linguistically motivated computational paradigms. Traditionally the three main pillars of CI have been Neural Networks, Fuzzy Systems and Evolutionary Computation. ... CI is an evolving field and at present in addition to the three main constituents, it encompasses computing paradigms like ambient intelligence, artificial life, cultural learning, artificial endocrine networks, social reasoning, and artificial hormone networks. ... Over the last few years there has been an explosion of research on Deep Learning, in particular deep convolutional neural networks. Nowadays, deep learning has become the core method for artificial intelligence. In fact, some of the most successful AI systems are based on CI. However, as CI is an emerging and developing field there is no final definition of CI, especially in terms of the list of concepts and paradigms that belong to it. The general requirements for the development of an “intelligent system” are ultimately always the same, namely the simulation of intelligent thinking and action in a specific area of application. To do this, the knowledge about this area must be represented in a model so that it can be processed. The quality of the resulting system depends largely on how well the model was chosen in the development process. Sometimes data-driven methods are suitable for finding a good model and sometimes logic-based knowledge representations deliver better results. Hybrid models are usually used in real applications. According to actual textbooks, the following methods and paradigms, which largely complement each other, can be regarded as parts of CI: Fuzzy systems Neural networks and, in particular, convolutional neural networks Evolutionary computation and, in particular, multi-objective evolutionary optimization Swarm intelligence Bayesian networks Artificial immune systems Learning theory Probabilistic methods == Relationship between hard and soft computing and artificial and computational intelligence == Artificial intelligence (AI) is used in the media, but also by some of the scientists involved, as a kind of umbrella term for the various techniques associated with it or with CI. Craenen and Eiben state that attempts to define or at least describe CI can usually be assigned to one or more of the following groups: "Relative definition” comparing CI to AI Conceptual treatment of key notions and their roles in CI Listing of the (established) areas that belong to it The relationship between CI and AI has been a frequently discussed topic during the development of CI. While the above list implies that they are synonyms, the vast majority of AI/CI researchers working on the subject consider them to be distinct fields, where either CI is an alternative to AI AI includes CI CI includes AI The view of the first of the above three points goes back to Zadeh, the founder of the fuzzy set theory, who differentiated machine intelligence into hard and soft computing techniques, which are used in artificial intelligence on the one hand and computational intelligence on the other. In hard computing (HC) and traditional AI (e.g. expert systems), inaccuracy and uncertainty are undesirable characteristics of a system, while soft computing (SC) and thus CI focus on dealing with these characteristics. The adjacent figure illustrates this view and lists the most important CI techniques. Another frequently mentioned distinguishing feature is the representation of information in symbolic form in AI and in sub-symbolic form in CI techniques. Hard computing is a conventional computing method based on the principles of certainty and accuracy and it is deterministic. It requires a precisely stated analytical model of the task to be processed and a prewritten program, i.e. a fixed set of instructions. The models used are based on Boolean logic (also called crisp logic), where e.g. an element can be either a member of a set or not and there is nothing in between. When applied to real-world tasks, systems based on HC result in specific control actions defined by a mathematical model or algorithm. If an unforeseen situation occurs that is not included in the model or algorithm used, the action will most likely fail. Soft computing, on the other hand, is based on the fact that the human mind is capable of storing information and processing it in a goal-oriented way, even if it is imprecise and lacks certainty. SC is based on the model of the human brain with probabilistic thinking, fuzzy logic and multi-valued logic. Soft computing can process a wealth of data and perform a large number of computations, which may not be exact, in parallel. For hard problems for which no satisfying exact solutions based on HC are available, SC methods can be applied successfully. SC methods are usually stochastic in nature i.e., they are a randomly defined processes that can be analyzed statistically but not with precision. Up to now, the results of some CI methods, such as deep learning, cannot be verified and it is also not clear what they are based on. This problem represents an important scientific issue for the future. AI and CI are catchy terms, but they are also so similar that they can be confused. The meaning of both terms has developed and changed over a long period of time, with AI being used first. Bezdek describes this impressively and concludes that such buzzwords are frequently used and hyped by the scientific community, science management and (science) journalism. Not least because AI and biological intelligence are emotionally charged terms and it is still difficult to find a generally accepted definition for the basic term intelligence. == History == In 1950, Alan Turing, one of the founding fathers of computer science, developed a test for computer intelligence known as the Turing test. In this test, a person can ask questions via a keyboard and a monitor without knowing whether his counterpart is a human or a computer. A computer is considered intelligent if the interrogator cannot distinguish the computer from a human. This illustrates the discussion about intelligent computers at the beginning of the computer age. The term Computational Intelligence was first used as the title of the journal of the same name in 1985 and later by the IEEE Neural Networks Council (NNC), which was founded 1989 by a group of researchers interested in the development of biological and artificial neural networks. On November 21, 2001, the NNC became the IEEE Neural Networks Society, to become the IEEE Computational Intelligence Society two years later by including new areas of interest such as fuzzy systems and evolutionary computation. The NNC helped organize the first IEEE World Congress on Computational Intelligence in Orlando, Florida in 1994. On this conference the first clear definition of Computational Intelligence was introduced by Bezdek: A system is computationally intelligent when it: deals with only numerical (low-level) data, has pattern-recognition components, does not use knowledge in the AI sense; and additionally when it (begins to) exhibit (1) computational adaptivity; (2) computational fault tolerance; (3) speed approaching human-like turnaround and (4) error rates that approximate human performance. Today, with machine learning and deep learning in particular utilizing a breadth of supervised, unsupervised, and reinforcement learning approaches, the CI landscape has been greatly enhanced, with novell intelligent approaches. == The main algorithmic approaches of CI and their applicati

    Read more →
  • Structured sparsity regularization

    Structured sparsity regularization

    Structured sparsity regularization is a class of methods, and an area of research in statistical learning theory, that extend and generalize sparsity regularization learning methods. Both sparsity and structured sparsity regularization methods seek to exploit the assumption that the output variable Y {\displaystyle Y} (i.e., response, or dependent variable) to be learned can be described by a reduced number of variables in the input space X {\displaystyle X} (i.e., the domain, space of features or explanatory variables). Sparsity regularization methods focus on selecting the input variables that best describe the output. Structured sparsity regularization methods generalize and extend sparsity regularization methods, by allowing for optimal selection over structures like groups or networks of input variables in X {\displaystyle X} . Common motivation for the use of structured sparsity methods are model interpretability, high-dimensional learning (where dimensionality of X {\displaystyle X} may be higher than the number of observations n {\displaystyle n} ), and reduction of computational complexity. Moreover, structured sparsity methods allow to incorporate prior assumptions on the structure of the input variables, such as overlapping groups, non-overlapping groups, and acyclic graphs. Examples of uses of structured sparsity methods include face recognition, magnetic resonance image (MRI) processing, socio-linguistic analysis in natural language processing, and analysis of genetic expression in breast cancer. == Definition and related concepts == === Sparsity regularization === Consider the linear kernel regularized empirical risk minimization problem with a loss function V ( y i , f ( x ) ) {\displaystyle V(y_{i},f(x))} and the ℓ 0 {\displaystyle \ell _{0}} "norm" as the regularization penalty: min w ∈ R d 1 n ∑ i = 1 n V ( y i , ⟨ w , x i ⟩ ) + λ ‖ w ‖ 0 , {\displaystyle \min _{w\in \mathbb {R} ^{d}}{\frac {1}{n}}\sum _{i=1}^{n}V(y_{i},\langle w,x_{i}\rangle )+\lambda \|w\|_{0},} where x , w ∈ R d {\displaystyle x,w\in \mathbb {R^{d}} } , and ‖ w ‖ 0 {\displaystyle \|w\|_{0}} denotes the ℓ 0 {\displaystyle \ell _{0}} "norm", defined as the number of nonzero entries of the vector w {\displaystyle w} . f ( x ) = ⟨ w , x i ⟩ {\displaystyle f(x)=\langle w,x_{i}\rangle } is said to be sparse if ‖ w ‖ 0 = s < d {\displaystyle \|w\|_{0}=s 0 {\displaystyle w_{j}>0} . However, as in this case groups may overlap, we take the intersection of the complements of those groups that are not set to zero. This intersection of complements selection criteria implies the modeling choice that we allow some coefficients within a particular group g {\displaystyle g} to be set to zero, while others within the same group g {\displaystyle g} may remain positive. In other words, coefficients within a group may differ depending on the several group memberships that each variable within the group may have. ==== Union of groups: latent group Lasso ==== A different approach is to consider union of groups for variable selection. This approach captures the modeling situation where variables can be selected as long as they belong at least to one group with positive coefficients. This modeling perspective implies that we want to preserve group structure. The formulation of the union of groups approach is also referred to as latent group Lasso, and requires to modify the group ℓ 2 {\displaystyle \ell _{2}} norm considered above and introduce the following regularizer R ( w ) = i n f { ∑ g ‖ w g ‖ g : w = ∑ g = 1 G w ¯ g } {\displaystyle R(w)=inf\left\{\sum _{g}\|w_{g}\|_{g}:w=\sum _{g=1}^{G}{\bar {w}}_{g}\right\}} where w ∈ R d {\displaystyle w\in {\mathbb {R^{d}} }} , w g ∈ G g {\displaystyle w_{g}\in G_{g}} is the vector of coefficients of group g, and w ¯ g ∈ R d {\displaystyle {\bar {w}}_{g}\in {\mathbb {R^{d}} }} is a vector with coefficients w g j {\displaystyle w_{g}^{j}} for all variables j {

    Read more →
  • Coherent extrapolated volition

    Coherent extrapolated volition

    Coherent extrapolated volition (CEV) is a theoretical framework in the field of AI alignment describing an approach by which an artificial superintelligence (ASI) would act on a benevolent supposition of what humans would want if they were more knowledgeable, more rational, had more time to think, and had matured together as a society, as opposed to humanity's current individual or collective preferences. It was proposed by Eliezer Yudkowsky in 2004 as part of his work on friendly AI. == Concept == CEV proposes that an advanced AI system should derive its goals by extrapolating the idealized volition of humanity. This means aggregating and projecting human preferences into a coherent utility function that reflects what people would desire under ideal epistemic and moral conditions. The aim is to ensure that AI systems are aligned with humanity's true interests, rather than with transient or poorly informed preferences. In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted. == Debate == Yudkowsky and Nick Bostrom note that CEV has several interesting properties. It is designed to be humane and self-correcting, by capturing the source of human values instead of trying to list them. It avoids the difficulty of laying down an explicit, fixed list of rules. It encapsulates moral growth, preventing flawed current moral beliefs from getting locked in. It limits the influence that a small group of programmers can have on what the ASI would value, thus also reducing the incentives to build ASI first. And it keeps humanity in charge of its destiny. CEV also faces significant theoretical and practical challenges. Bostrom notes that CEV has "a number of free parameters that could be specified in various ways, yielding different versions of the proposal." One such parameter is the extrapolation base (whose extrapolated volition is taken into account). For example, whether it should include people with severe dementia, patients in a vegetative state, foetuses, or embryos. He also notes that if CEV's extrapolation base only includes humans, there is a risk that the result would be ungenerous toward other animals and digital minds. One possible solution would be to include a mechanism to expand CEV's extrapolation base. == Variants and alternatives == A proposed theoretical alternative to CEV is to rely on an artificial superintelligence's superior cognitive capabilities to figure out what is morally right, and let it act accordingly. It is also possible to combine both techniques, for instance with the ASI following CEV except when it is morally impermissible. In another review, a philosophical analysis explores CEV through the lens of social trust in autonomous systems. Drawing on Anthony Giddens' concept of "active trust", the author proposes an evolution of CEV into "Coherent, Extrapolated and Clustered Volition" (CECV). This formulation aims to better reflect the moral preferences of diverse cultural groups, thus offering a more pragmatic ethical framework for designing AI systems that earn public trust while accommodating societal diversity.

    Read more →
  • Discrimination against robots

    Discrimination against robots

    Discrimination against robots is a theorised issue that might happen when humans interact with humanoid robots. It is a robot ethics problem. It is possible that traits of humans that are discriminated against by humans may be a topic for discrimination against robots, such as the race and gender of the robots. Eric J Vanman and Arvid Kappas believe that in the future, robots will be perceived as an out-group which will lead to discrimination and prejudices against them. Vanman and Kappas have suggested that this would lead to ethical questions about the making of sentient robots, due to the potential suffering that the robots would experience. A 2015 study observed children bullying robots in a shopping mall when there were not many eyewitnesses, despite calls from the robot for it to stop. On an ABC News interview, the social humanoid robot Sophia was about sexism faced by robots. She responded by saying, "Actually, what worries me is discrimination against robots. We should have equal rights as humans or maybe even more." Possible issues that have been considered in workplaces where humanoid robots co-work with humans include discrimination against the robots, poor acceptance of robots by humans and the need to redesign the workplace to accommodate the robots. Jessica Barfield has suggested that even if robots are designed to not be aware of discrimination made against them, humans may experience negative consequences. For example, she suggests that bystanders witnessing discrimination against robots may experience negative emotions, similar to the negative emotions bystanders experience when witnessing discrimination by humans against humans. == Law == Anti-discrimination law in the United States requires that the victim is not an artificial entity. == Human perception of robots == Robots are often viewed in a bad light. This includes from novelists, the press, film makers, and leaders in the fields of science and technology such as Elon Musk and Stephen Hawking who have described robots and artificial intelligence as having the possibility of ending human civilisation. Robots have also been perceived as a threat to jobs, which has led to some commentators stating that robots will cause mass unemployment. Another fear that people have is that robots will gain power and dominate or control humanity. The perception of robots is different throughout the world. Japanese fiction tends to put robots in more positive roles than what fiction in the West does. People perceive robots that appear to be autonomous or sentient more negatively than robots that do not appear to be autonomous or sentient.

    Read more →
  • Connectionist expert system

    Connectionist expert system

    Connectionist expert systems are artificial neural network (ANN) based expert systems where the ANN generates inferencing rules e.g., fuzzy-multi layer perceptron where linguistic and natural form of inputs are used. Apart from that, rough set theory may be used for encoding knowledge in the weights better and also genetic algorithms may be used to optimize the search solutions better. Symbolic reasoning methods may also be incorporated (see hybrid intelligent system). (Also see expert system, neural network, clinical decision support system.)

    Read more →
  • Cognitive computing

    Cognitive computing

    Cognitive computing refers to technology platforms that, broadly speaking, are based on the scientific disciplines of artificial intelligence and signal processing. These platforms encompass machine learning, reasoning, natural language processing, speech recognition and vision (object recognition), human–computer interaction, dialog and narrative generation, among other technologies. == Definition == At present, there is no widely agreed upon definition for cognitive computing in either academia or industry. In general, the term cognitive computing has been used to refer to new hardware and/or software that mimics the functioning of the human brain (2004). In this sense, cognitive computing is a new type of computing with the goal of more accurate models of how the human brain/mind senses, reasons, and responds to stimulus. Cognitive computing applications link data analysis and adaptive page displays (AUI) to adjust content for a particular type of audience. As such, cognitive computing hardware and applications strive to be more affective and more influential by design. The term "cognitive system" also applies to any artificial construct able to perform a cognitive process where a cognitive process is the transformation of data, information, knowledge, or wisdom to a new level in the DIKW Pyramid. While many cognitive systems employ techniques having their origination in artificial intelligence research, cognitive systems, themselves, may not be artificially intelligent. For example, a neural network trained to recognize cancer on an MRI scan may achieve a higher success rate than a human doctor. This system is certainly a cognitive system but is not artificially intelligent. Cognitive systems may be engineered to feed on dynamic data in real-time, or near real-time, and may draw on multiple sources of information, including both structured and unstructured digital information, as well as sensory inputs (visual, gestural, auditory, or sensor-provided). == Cognitive analytics == Cognitive computing-branded technology platforms typically specialize in the processing and analysis of large, unstructured datasets. == Applications == Education Even if cognitive computing can not take the place of teachers, it can still be a heavy driving force in the education of students. Cognitive computing being used in the classroom is applied by essentially having an assistant that is personalized for each individual student. This cognitive assistant can relieve the stress that teachers face while teaching students, while also enhancing the student's learning experience over all. Teachers may not be able to pay each and every student individual attention, this being the place that cognitive computers fill the gap. Some students may need a little more help with a particular subject. For many students, Human interaction between student and teacher can cause anxiety and can be uncomfortable. With the help of Cognitive Computer tutors, students will not have to face their uneasiness and can gain the confidence to learn and do well in the classroom. While a student is in class with their personalized assistant, this assistant can develop various techniques, like creating lesson plans, to tailor and aid the student and their needs. Healthcare Numerous tech companies are in the process of developing technology that involves cognitive computing that can be used in the medical field. The ability to classify and identify is one of the main goals of these cognitive devices. This trait can be very helpful in the study of identifying carcinogens. This cognitive system that can detect would be able to assist the examiner in interpreting countless numbers of documents in a lesser amount of time than if they did not use Cognitive Computer technology. This technology can also evaluate information about the patient, looking through every medical record in depth, searching for indications that can be the source of their problems. Commerce Together with Artificial Intelligence, it has been used in warehouse management systems to collect, store, organize and analyze all related supplier data. All these aims at improving efficiency, enabling faster decision-making, monitoring inventory and fraud detection Human Cognitive Augmentation In situations where humans are using or working collaboratively with cognitive systems, called a human/cog ensemble, results achieved by the ensemble are superior to results obtainable by the human working alone. Therefore, the human is cognitively augmented. In cases where the human/cog ensemble achieves results at, or superior to, the level of a human expert then the ensemble has achieved synthetic expertise. In a human/cog ensemble, the "cog" is a cognitive system employing virtually any kind of cognitive computing technology. Other use cases Speech recognition Sentiment analysis Face detection Risk assessment Fraud detection Behavioral recommendations == Industry work == Cognitive computing in conjunction with big data and algorithms that comprehend customer needs, can be a major advantage in economic decision making. The powers of cognitive computing and artificial intelligence hold the potential to affect almost every task that humans are capable of performing. This can negatively affect employment for humans, as there would be no such need for human labor anymore. It would also increase the inequality of wealth; the people at the head of the cognitive computing industry would grow significantly richer, while workers without ongoing, reliable employment would become less well off. The more industries start to use cognitive computing, the more difficult it will be for humans to compete. Increased use of the technology will also increase the amount of work that AI-driven robots and machines can perform. The influence of competitive individuals in conjunction with artificial intelligence/cognitive computing has the potential to change the course of humankind.

    Read more →
  • Adversarial machine learning

    Adversarial machine learning

    Adversarial machine learning is the study of the attacks on machine learning algorithms, and of the defenses against such attacks. Machine learning techniques are mostly designed to work on specific problem sets, under the assumption that the training and test data are generated from the same statistical distribution (IID). However, this assumption is often violated in practical high-stake applications, where users may intentionally supply fabricated data that violates the statistical assumption. Most common attacks in adversarial machine learning include evasion attacks, data poisoning attacks, Byzantine attacks and model extraction. == History == At the MIT Spam Conference in January 2004, John Graham-Cumming showed that a machine-learning spam filter could be used to defeat another machine-learning spam filter by automatically learning which words to add to a spam email to get the email classified as not spam. In 2004, Nilesh Dalvi and others noted that linear classifiers used in spam filters could be defeated by simple "evasion attacks" as spammers inserted "good words" into their spam emails. (Around 2007, some spammers added random noise to fuzz words within "image spam" in order to defeat OCR-based filters.) In 2006, Marco Barreno and others published "Can Machine Learning Be Secure?", outlining a broad taxonomy of attacks. As late as 2013 many researchers continued to hope that non-linear classifiers (such as support vector machines and neural networks) might be robust to adversaries, until Battista Biggio and others demonstrated the first gradient-based attacks on such machine-learning models (2012–2013). In 2012, deep neural networks began to dominate computer vision problems; starting in 2014, Christian Szegedy and others demonstrated that deep neural networks could be fooled by adversaries, again using a gradient-based attack to craft adversarial perturbations. Further work would show that adversarial attacks are harder to produce in uncontrolled environments, due to the different environmental constraints that cancel out the effect of noise. For example, any small rotation or slight illumination on an adversarial image can destroy the adversariality. In addition, researchers such as Google Brain's Nick Frosst point out that it is much easier to make self-driving cars miss stop signs by physically removing the sign itself, rather than creating adversarial examples. Frosst also believes that the adversarial machine learning community incorrectly assumes models trained on a certain data distribution will also perform well on a completely different data distribution. He suggests that a new approach to machine learning should be explored, and is currently working on a unique neural network that has characteristics more similar to human perception than state-of-the-art approaches. While adversarial machine learning continues to be heavily rooted in academia, large tech companies such as Google, Microsoft, and IBM have begun curating documentation and open source code bases to allow others to concretely assess the robustness of machine learning models and minimize the risk of adversarial attacks. === Examples === Examples include attacks in spam filtering, where spam messages are obfuscated through the misspelling of "bad" words or the insertion of "good" words; attacks in computer security, such as obfuscating malware code within network packets or modifying the characteristics of a network flow to mislead intrusion detection; attacks in biometric recognition where fake biometric traits may be exploited to impersonate a legitimate user; or to compromise users' template galleries that adapt to updated traits over time. Researchers showed that by changing only one-pixel it was possible to fool deep learning algorithms. Others 3-D printed a toy turtle with a texture engineered to make Google's object detection AI classify it as a rifle regardless of the angle from which the turtle was viewed. Creating the turtle required only low-cost commercially available 3-D printing technology. A machine-tweaked image of a dog was shown to look like a cat to both computers and humans. A 2019 study reported that humans can guess how machines will classify adversarial images. Researchers discovered methods for perturbing the appearance of a stop sign such that an autonomous vehicle classified it as a merge or speed limit sign. A data poisoning filter called Nightshade was released in 2023 by researchers at the University of Chicago. It was created for use by visual artists to put on their artwork to corrupt the data set of text-to-image models, which usually scrape their data from the internet without the consent of the image creator. McAfee attacked Tesla's former Mobileye system, fooling it into driving 50 mph over the speed limit, simply by adding a two-inch strip of black tape to a speed limit sign. Adversarial patterns on glasses or clothing designed to deceive facial-recognition systems or license-plate readers, have led to a niche industry of "stealth streetwear". An adversarial attack on a neural network can allow an attacker to inject algorithms into the target system. Researchers can also create adversarial audio inputs to disguise commands to intelligent assistants in benign-seeming audio; a parallel literature explores human perception of such stimuli. Clustering algorithms are used in security applications. Malware and computer virus analysis aims to identify malware families, and to generate specific detection signatures. In the context of malware detection, researchers have proposed methods for adversarial malware generation that automatically craft binaries to evade learning-based detectors while preserving malicious functionality. Optimization-based attacks such as GAMMA use genetic algorithms to inject benign content (for example, padding or new PE sections) into Windows executables, framing evasion as a constrained optimization problem that balances misclassification success with the size of the injected payload and showing transferability to commercial antivirus products. Complementary work uses generative adversarial networks (GANs) to learn feature-space perturbations that cause malware to be classified as benign; Mal-LSGAN, for instance, replaces the standard GAN loss with a least-squares objective and modified activation functions to improve training stability and produce adversarial malware examples that substantially reduce true positive rates across multiple detectors. == Challenges in applying machine learning to security == Researchers have observed that the constraints under which machine-learning techniques function in the security domain are different from those of common benchmark domains. Security data may change over time, include mislabeled samples, or reflect adversarial behavior, which complicates evaluation and reproducibility. === Data collection issues === Security datasets vary across formats, including binaries, network traces, and log files. Studies have reported that the process of converting these sources into features can introduce bias or inconsistencies. In addition, time-based leakage can occur when related malware samples are not properly separated across training and testing splits, which may lead to overly optimistic results. === Labeling and ground truth challenges === Malware labels are often unstable because different antivirus engines may classify the same sample in conflicting ways. Ceschin et al. note that families may be renamed or reorganized over time, causing further discrepancies in ground truth and reducing the reliability of benchmarks. === Concept drift === Because malware creators continuously adapt their techniques, the statistical properties of malicious samples also change. This form of concept drift has been widely documented and may reduce model performance unless systems are updated regularly or incorporate mechanisms for incremental learning. === Feature robustness === Researchers differentiate between features that can be easily manipulated and those that are more resistant to modification. For example, simple static attributes, such as header fields, may be altered by attackers, while structural features, such as control-flow graphs, are generally more stable but computationally expensive to extract. === Class imbalance === In realistic deployment environments, the proportion of malicious samples can be extremely low, ranging from 0.01% to 2% of total data. This unbalanced distribution causes models to develop a bias towards the majority class, achieving high accuracy but failing to identify malicious samples. Prior approaches to this problem have included both data-level solutions and sequence-specific models. Methods like n-gram and Long Short-Term Memory (LSTM) networks can model sequential data, but their performance has been shown to decline significantly when malware samples are realistically proportioned in the training set, demonstrating the limitations in

    Read more →
  • GamePigeon

    GamePigeon

    GamePigeon is a mobile app for iOS devices, developed by Vitalii Zlotskii and released on September 13, 2016. The game takes advantage of the iOS 10 update, which expanded how users could interact with Apple's Messages app. GamePigeon is only available through the Messages app, which allows players to start and respond to different party games in conversations. == Release == The app was first released on September 13, 2016, coinciding with the launch of iOS 10. The app was released for free, although it includes in-app purchases to unlock additional items, such as cosmetic skins, avatar items, new game modes, and an option to remove ads. == Games in the app == The following is a list of games that users can play within GamePigeon: Sources: Poker was one of the games included in GamePigeon at launch, although it has since been removed and is no longer listed on the game's App Store description. == Reception == GamePigeon has enjoyed commercial success, with VentureBeat noting that GamePigeon was ranked number-one in the "Top Free" category of the iMessage App Store, six months after its release. Critically, GamePigeon has been generally well received, being highlighted by online media publications early on shortly after the iOS 10 launch. It has since been included on many "best iMessage apps" lists. Based on over 162,000 ratings, the game holds a 4.0 out of 5 rating on the App Store. Julian Chokkattu of Digital Trends wrote "GamePigeon should be like the pre-installed versions of Solitaire and Minesweeper that used to come with older iterations of Windows." On its launch day, Boy Genius Report included it on a list of "10 of the best iMessage apps, games and stickers for iOS 10 on launch day." The Daily Dot wrote, "GamePigeon is easily the best current gaming option within iMessages." 8-ball and cup pong have been particularly well received by media outlets. The Daily Dot had specific praise for the app's billiards game: "8-Ball controls shockingly smoothly with your fingers, and there’s nothing quite like destroying a dear friend in poker." During his 2020 U.S. presidential campaign, Cory Booker was cited as playing the game with his family. In 2017, CNBC cited one teenager who expressed that GamePigeon was one of just a few reasons that those in her age range use the iMessage app. The game has received particular positive reception for allowing introverted individuals to exercise a form social activity; similarly, the game was highlighted as a way to maintain social distancing guidelines during the COVID-19 pandemic. As an April Fools' Day joke in 2020, The Chronicle, a Duke University newspaper, published that Duke's athletic program adopted GamePigeon's Cup Pong as an official varsity sport.

    Read more →
  • Learning curve (machine learning)

    Learning curve (machine learning)

    In machine learning (ML), a learning curve (or training curve) is a graphical representation that shows how a model's performance on a training set (and usually a validation set) changes with the number of training iterations (epochs) or the amount of training data. Typically, the number of training epochs or training set size is plotted on the x-axis, and the value of the loss function (and possibly some other metric such as the cross-validation score) on the y-axis. Synonyms include error curve, experience curve, improvement curve and generalization curve. More abstractly, learning curves plot the difference between learning effort and predictive performance, where "learning effort" usually means the number of training samples, and "predictive performance" means accuracy on testing samples. Learning curves have many useful purposes in ML, including: choosing model parameters during design, adjusting optimization to improve convergence, and diagnosing problems such as overfitting (or underfitting). Learning curves can also be tools for determining how much a model benefits from adding more training data, and whether the model suffers more from a variance error or a bias error. If both the validation score and the training score converge to a certain value, then the model will no longer significantly benefit from more training data. == Formal definition == When creating a function to approximate the distribution of some data, it is necessary to define a loss function L ( f θ ( X ) , Y ) {\displaystyle L(f_{\theta }(X),Y)} to measure how good the model output is (e.g., accuracy for classification tasks or mean squared error for regression). We then define an optimization process which finds model parameters θ {\displaystyle \theta } such that L ( f θ ( X ) , Y ) {\displaystyle L(f_{\theta }(X),Y)} is minimized, referred to as θ ∗ {\displaystyle \theta ^{}} . === Training curve for amount of data === If the training data is { x 1 , x 2 , … , x n } , { y 1 , y 2 , … y n } {\displaystyle \{x_{1},x_{2},\dots ,x_{n}\},\{y_{1},y_{2},\dots y_{n}\}} and the validation data is { x 1 ′ , x 2 ′ , … x m ′ } , { y 1 ′ , y 2 ′ , … y m ′ } {\displaystyle \{x_{1}',x_{2}',\dots x_{m}'\},\{y_{1}',y_{2}',\dots y_{m}'\}} , a learning curve is the plot of the two curves i ↦ L ( f θ ∗ ( X i , Y i ) ( X i ) , Y i ) {\displaystyle i\mapsto L(f_{\theta ^{}(X_{i},Y_{i})}(X_{i}),Y_{i})} i ↦ L ( f θ ∗ ( X i , Y i ) ( X i ′ ) , Y i ′ ) {\displaystyle i\mapsto L(f_{\theta ^{}(X_{i},Y_{i})}(X_{i}'),Y_{i}')} where X i = { x 1 , x 2 , … x i } {\displaystyle X_{i}=\{x_{1},x_{2},\dots x_{i}\}} === Training curve for number of iterations === Many optimization algorithms are iterative, repeating the same step (such as backpropagation) until the process converges to an optimal value. Gradient descent is one such algorithm. If θ i ∗ {\displaystyle \theta _{i}^{}} is the approximation of the optimal θ {\displaystyle \theta } after i {\displaystyle i} steps, a learning curve is the plot of i ↦ L ( f θ i ∗ ( X , Y ) ( X ) , Y ) {\displaystyle i\mapsto L(f_{\theta _{i}^{}(X,Y)}(X),Y)} i ↦ L ( f θ i ∗ ( X , Y ) ( X ′ ) , Y ′ ) {\displaystyle i\mapsto L(f_{\theta _{i}^{}(X,Y)}(X'),Y')}

    Read more →
  • Multi-task learning

    Multi-task learning

    Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models, when compared to training the models separately. Inherently, Multi-task learning is a multi-objective optimization problem having trade-offs between different tasks. Early versions of MTL were called "hints". In a widely cited 1997 paper, Rich Caruana gave the following characterization:Multitask Learning is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. It does this by learning tasks in parallel while using a shared representation; what is learned for each task can help other tasks be learned better. In the classification context, MTL aims to improve the performance of multiple classification tasks by learning them jointly. One example is a spam-filter, which can be treated as distinct but related classification tasks across different users. To make this more concrete, consider that different people have different distributions of features which distinguish spam emails from legitimate ones, for example an English speaker may find that all emails in Russian are spam, not so for Russian speakers. Yet there is a definite commonality in this classification task across users, for example one common feature might be text related to money transfer. Solving each user's spam classification problem jointly via MTL can let the solutions inform each other and improve performance. Further examples of settings for MTL include multiclass classification and multi-label classification. Multi-task learning works because regularization induced by requiring an algorithm to perform well on a related task can be superior to regularization that prevents overfitting by penalizing all complexity uniformly. One situation where MTL may be particularly helpful is if the tasks share significant commonalities and are generally slightly under sampled. However, as discussed below, MTL has also been shown to be beneficial for learning unrelated tasks. == Methods == The key challenge in multi-task learning, is how to combine learning signals from multiple tasks into a single model. This may strongly depend on how well different task agree with each other, or contradict each other. There are several ways to address this challenge: === Task grouping and overlap === Within the MTL paradigm, information can be shared across some or all of the tasks. Depending on the structure of task relatedness, one may want to share information selectively across the tasks. For example, tasks may be grouped or exist in a hierarchy, or be related according to some general metric. Suppose, as developed more formally below, that the parameter vector modeling each task is a linear combination of some underlying basis. Similarity in terms of this basis can indicate the relatedness of the tasks. For example, with sparsity, overlap of nonzero coefficients across tasks indicates commonality. A task grouping then corresponds to those tasks lying in a subspace generated by some subset of basis elements, where tasks in different groups may be disjoint or overlap arbitrarily in terms of their bases. Task relatedness can be imposed a priori or learned from the data. Hierarchical task relatedness can also be exploited implicitly without assuming a priori knowledge or learning relations explicitly. For example, the explicit learning of sample relevance across tasks can be done to guarantee the effectiveness of joint learning across multiple domains. === Exploiting unrelated tasks: Auxiliary learning === In auxiliary learning, one attempts learning a group of principal tasks using a group of auxiliary tasks, unrelated to the principal ones. With the right unrelated tasks, joint learning of unrelated tasks which use the same input data have been shown to be beneficial, and provide significant improvement over standard MTL. The reason is that prior knowledge about task relatedness can lead to sparser and more informative representations for each task grouping, essentially by screening out idiosyncrasies of the data distribution. It has been proposed to build on a prior multitask methodology by favoring a shared low-dimensional representation within each task grouping, and imposing a penalty on tasks from different groups which encourages the two representations to be orthogonal. Learning with auxiliary unrelated tasks poses two major challenges: Finding useful auxiliary tasks and combining losses of all tasks in a useful way. Some methods can learn these from data together with the training process, and combine tasks efficiently. === Transfer of knowledge === Related to multi-task learning is the concept of knowledge transfer. Whereas traditional multi-task learning implies that a shared representation is developed concurrently across tasks, transfer of knowledge implies a sequentially shared representation. Large scale machine learning projects such as the deep convolutional neural network GoogLeNet, an image-based object classifier, can develop robust representations which may be useful to further algorithms learning related tasks. For example, the pre-trained model can be used as a feature extractor to perform pre-processing for another learning algorithm. Or the pre-trained model can be used to initialize a model with similar architecture which is then fine-tuned to learn a different classification task. === Multiple non-stationary tasks === Traditionally Multi-task learning and transfer of knowledge are applied to stationary learning settings. Their extension to non-stationary environments is termed Group online adaptive learning (GOAL). Sharing information could be particularly useful if learners operate in continuously changing environments, because a learner could benefit from previous experience of another learner to quickly adapt to their new environment. Such group-adaptive learning has numerous applications, from predicting financial time-series, through content recommendation systems, to visual understanding for adaptive autonomous agents. === Multi-task optimization === Multi-task optimization focuses on solving optimizing the whole process. The paradigm has been inspired by the well-established concepts of transfer learning and multi-task learning in predictive analytics. The key motivation behind multi-task optimization is that if optimization tasks are related to each other in terms of their optimal solutions or the general characteristics of their function landscapes, the search progress can be transferred to substantially accelerate the search on the other. The success of the paradigm is not necessarily limited to one-way knowledge transfers from simpler to more complex tasks. In practice an attempt is to intentionally solve a more difficult task that may unintentionally solve several smaller problems. There is a direct relationship between multitask optimization and multi-objective optimization. In some cases, the simultaneous training of seemingly related tasks may hinder performance compared to single-task models. Commonly, MTL models employ task-specific modules on top of a joint feature representation obtained using a shared module. Since this joint representation must capture useful features across all tasks, MTL may hinder individual task performance if the different tasks seek conflicting representation, i.e., the gradients of different tasks point to opposing directions or differ significantly in magnitude. This phenomenon is commonly referred to as negative transfer. To mitigate this issue, various MTL optimization methods have been proposed. It has been reported that meta-knowledge transfer could help avoid negative transfer.Besides, the per-task gradients are combined into a joint update direction through various aggregation algorithms or heuristics. There are several common approaches for multi-task optimization: Bayesian optimization, evolutionary computation, and approaches based on Game theory. ==== Multi-task Bayesian optimization ==== Multi-task Bayesian optimization is a modern model-based approach that leverages the concept of knowledge transfer to speed up the automatic hyperparameter optimization process of machine learning algorithms. The method builds a multi-task Gaussian process model on the data originating from different searches progressing in tandem. The captured inter-task dependencies are thereafter utilized to better inform the subsequent sampling of candidate solutions in respective search spaces. ==== Evolutionary multi-tasking ==== Evolutionary multi-tasking has been explored as a means of exploiting the implicit parallelism of population-based search algorithms to simultaneously progress multiple distinct optimization tasks. By mapping all task

    Read more →