AI Code Meme

AI Code Meme — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Deconvolution

    Deconvolution

    In mathematics, deconvolution is the inverse of convolution. Both operations are used in signal processing and image processing. For example, it may be possible to recover the original signal after a filter (convolution) by using a deconvolution method with a certain degree of accuracy. Due to the measurement error of the recorded signal or image, it can be demonstrated that the worse the signal-to-noise ratio (SNR), the worse the reversing of a filter will be; hence, inverting a filter is not always a good solution as the error amplifies. Deconvolution offers a solution to this problem. The foundations for deconvolution and time-series analysis were largely laid by Norbert Wiener of the Massachusetts Institute of Technology in his book Extrapolation, Interpolation, and Smoothing of Stationary Time Series (1949). The book was based on work Wiener had done during World War II but that had been classified at the time. Some of the early attempts to apply these theories were in the fields of weather forecasting and economics. == Description == In general, the objective of deconvolution is to find the solution f of a convolution equation of the form: f ∗ g = h {\displaystyle fg=h\,} Usually, h is some recorded signal, and f is some signal that we wish to recover, but has been convolved with a filter or distortion function g, before we recorded it. Usually, h is a distorted version of f and the shape of f can't be easily recognized by the eye or simpler time-domain operations. The function g represents the impulse response of an instrument or a driving force that was applied to a physical system. If we know g, or at least know the form of g, then we can perform deterministic deconvolution. However, if we do not know g in advance, then we need to estimate it. This can be done using methods of statistical estimation or building the physical principles of the underlying system, such as the electrical circuit equations or diffusion equations. There are several deconvolution techniques, depending on the choice of the measurement error and deconvolution parameters: === Raw deconvolution === When the measurement error is very low (ideal case), deconvolution collapses into a filter reversing. This kind of deconvolution can be performed in the Laplace domain. By computing the Fourier transform of the recorded signal h and the system response function g, you get H and G, with G as the transfer function. Using the convolution theorem, F = H / G {\displaystyle F=H/G\,} where F is the estimated Fourier transform of f. Finally, the inverse Fourier transform of the function F is taken to find the estimated deconvolved signal f. Note that G is at the denominator and could amplify elements of the error model if present. === Deconvolution with noise === In physical measurements, the situation is usually closer to ( f ∗ g ) + ε = h {\displaystyle (fg)+\varepsilon =h\,} In this case ε is noise that has entered our recorded signal. If a noisy signal or image is assumed to be noiseless, the statistical estimate of g will be incorrect. In turn, the estimate of ƒ will also be incorrect. The lower the signal-to-noise ratio, the worse the estimate of the deconvolved signal will be. That is the reason why inverse filtering the signal (as in the "raw deconvolution" above) is usually not a good solution. However, if at least some knowledge exists of the type of noise in the data (for example, white noise), the estimate of ƒ can be improved through techniques such as Wiener deconvolution. == Applications == === Seismology === The concept of deconvolution had an early application in reflection seismology. In 1950, Enders Robinson was a graduate student at MIT. He worked with others at MIT, such as Norbert Wiener, Norman Levinson, and economist Paul Samuelson, to develop the "convolutional model" of a reflection seismogram. This model assumes that the recorded seismogram s(t) is the convolution of an Earth-reflectivity function e(t) and a seismic wavelet w(t) from a point source, where t represents recording time. Thus, our convolution equation is s ( t ) = ( e ∗ w ) ( t ) . {\displaystyle s(t)=(ew)(t).\,} The seismologist is interested in e, which contains information about the Earth's structure. By the convolution theorem, this equation may be Fourier transformed to S ( ω ) = E ( ω ) W ( ω ) {\displaystyle S(\omega )=E(\omega )W(\omega )\,} in the frequency domain, where ω {\displaystyle \omega } is the frequency variable. By assuming that the reflectivity is white, we can assume that the power spectrum of the reflectivity is constant, and that the power spectrum of the seismogram is the spectrum of the wavelet multiplied by that constant. Thus, | S ( ω ) | ≈ k | W ( ω ) | . {\displaystyle |S(\omega )|\approx k|W(\omega )|.\,} If we assume that the wavelet is minimum phase, we can recover it by calculating the minimum phase equivalent of the power spectrum we just found. The reflectivity may be recovered by designing and applying a Wiener filter that shapes the estimated wavelet to a Dirac delta function (i.e., a spike). The result may be seen as a series of scaled, shifted delta functions (although this is not mathematically rigorous): e ( t ) = ∑ i = 1 N r i δ ( t − τ i ) , {\displaystyle e(t)=\sum _{i=1}^{N}r_{i}\delta (t-\tau _{i}),} where N is the number of reflection events, r i {\displaystyle r_{i}} are the reflection coefficients, t − τ i {\displaystyle t-\tau _{i}} are the reflection times of each event, and δ {\displaystyle \delta } is the Dirac delta function. In practice, since we are dealing with noisy, finite bandwidth, finite length, discretely sampled datasets, the above procedure only yields an approximation of the filter required to deconvolve the data. However, by formulating the problem as the solution of a Toeplitz matrix and using Levinson recursion, we can relatively quickly estimate a filter with the smallest mean squared error possible. We can also do deconvolution directly in the frequency domain and get similar results. The technique is closely related to linear prediction. === Optics and other imaging === In optics and imaging, the term "deconvolution" is specifically used to refer to the process of reversing the optical distortion that takes place in an optical microscope, electron microscope, telescope, or other imaging instrument, thus creating clearer images. It is usually done in the digital domain by a software algorithm, as part of a suite of microscope image processing techniques. Deconvolution is also practical to sharpen images that suffer from fast motion or jiggles during capturing. Early Hubble Space Telescope images were distorted by a flawed mirror and were sharpened by deconvolution. The usual method is to assume that the optical path through the instrument is optically perfect, convolved with a point spread function (PSF), that is, a mathematical function that describes the distortion in terms of the pathway a theoretical point source of light (or other waves) takes through the instrument. Usually, such a point source contributes a small area of fuzziness to the final image. If this function can be determined, it is then a matter of computing its inverse or complementary function, and convolving the acquired image with that. The result is the original, undistorted image. In practice, finding the true PSF is impossible, and usually an approximation of it is used, theoretically calculated or based on some experimental estimation by using known probes. Real optics may also have different PSFs at different focal and spatial locations, and the PSF may be non-linear. The accuracy of the approximation of the PSF will dictate the final result. Different algorithms can be employed to give better results, at the price of being more computationally intensive. Since the original convolution discards data, some algorithms use additional data acquired at nearby focal points to make up some of the lost information. Regularization in iterative algorithms (as in expectation-maximization algorithms) can be applied to avoid unrealistic solutions. When the PSF is unknown, it may be possible to deduce it by systematically trying different possible PSFs and assessing whether the image has improved. This procedure is called blind deconvolution. Blind deconvolution is a well-established image restoration technique in astronomy, where the point nature of the objects photographed exposes the PSF thus making it more feasible. It is also used in fluorescence microscopy for image restoration, and in fluorescence spectral imaging for spectral separation of multiple unknown fluorophores. The most common iterative algorithm for the purpose is the Richardson–Lucy deconvolution algorithm; the Wiener deconvolution (and approximations) are the most common non-iterative algorithms. For some specific imaging systems such as laser pulsed terahertz systems, PSF can be modeled mathematically. As a result, as shown in the figure, deconvolution of the modeled PS

    Read more →
  • Kubeflow

    Kubeflow

    Kubeflow is an open-source platform for machine learning and MLOps on Kubernetes introduced by Google. The different stages in a typical machine learning lifecycle are represented with different software components in Kubeflow, including model development (Kubeflow Notebooks), model training (Kubeflow Pipelines, Kubeflow Training Operator), model serving (KServe), and automated machine learning (Katib). Each component of Kubeflow can be deployed separately, and it is not a requirement to deploy every component. == History == The Kubeflow project was first announced at KubeCon + CloudNativeCon North America 2017 by Google engineers David Aronchick, Jeremy Lewi, and Vishnu Kannan to address a perceived lack of flexible options for building production-ready machine learning systems. The project has also stated it began as a way for Google to open-source how they ran TensorFlow internally. The first release of Kubeflow (Kubeflow 0.1) was announced at KubeCon + CloudNativeCon Europe 2018. Kubeflow 1.0 was released in March 2020 via a public blog post announcing that many Kubeflow components were graduating to a "stable status", indicating they were now ready for production usage. In October 2022, Google announced that the Kubeflow project had applied to join the Cloud Native Computing Foundation. In July 2023, the foundation voted to accept Kubeflow as an incubating stage project. == Components == === Kubeflow Notebooks for model development === Machine learning models are developed in the notebooks component called Kubeflow Notebooks. The component runs web-based development environments inside a Kubernetes cluster, with native support for Jupyter Notebook, Visual Studio Code, and RStudio. === Kubeflow Pipelines for model training === Once developed, models are trained in the Kubeflow Pipelines component. The component acts as a platform for building and deploying portable, scalable machine learning workflows based on Docker containers. Google Cloud Platform has adopted the Kubeflow Pipelines DSL within its Vertex AI Pipelines product. === Kubeflow Training Operator for model training === For certain machine learning models and libraries, the Kubeflow Training Operator component provides Kubernetes custom resources support. The component runs distributed or non-distributed TensorFlow, PyTorch, Apache MXNet, XGBoost, and MPI training jobs on Kubernetes. === KServe for model serving === The KServe component (previously named KFServing) provides Kubernetes custom resources for serving machine learning models on arbitrary frameworks including TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX. KServe was developed collaboratively by Google, IBM, Bloomberg, NVIDIA, and Seldon. Publicly disclosed adopters of KServe include Bloomberg, Gojek, the Wikimedia Foundation, and others. === Katib for automated machine learning === Lastly, Kubeflow includes a component for automated training and development of machine learning models, the Katib component. It is described as a Kubernetes-native project and features hyperparameter tuning, early stopping, and neural architecture search. == Release timeline ==

    Read more →
  • Witness set

    Witness set

    In combinatorics and computational learning theory, a witness set is a set of elements that distinguishes a given Boolean function from a given class of other Boolean functions. Let C {\displaystyle C} be a concept class over a domain X {\displaystyle X} (that is, a family of Boolean functions over X {\displaystyle X} ) and c {\displaystyle c} be a concept in X {\displaystyle X} (a single Boolean function). A subset S {\displaystyle S} of X {\displaystyle X} is a witness set for c {\displaystyle c} in X {\displaystyle X} if S {\displaystyle S} distinguishes c {\displaystyle c} from all the other functions in C {\displaystyle C} , in the sense that no other function in C {\displaystyle C} has the same values on S {\displaystyle S} . For a concept class with | C | {\displaystyle |C|} concepts, there exists a concept that has a witness of size at most log 2 ⁡ | C | {\displaystyle \log _{2}|C|} ; this bound is tight when C {\displaystyle C} consists of all Boolean functions over X {\displaystyle X} . By a result of Bondy (1972) there exists a single witness set of size at most | C | − 1 {\displaystyle |C|-1} that is valid for all concepts in C {\displaystyle C} ; this bound is tight when C {\displaystyle C} consists of the indicator functions of the empty set and some singleton sets. One way to construct this set is to interpret the concepts as bitstrings, and the domain elements as positions in these bitstrings. Then the set of positions at which a trie of the bitstrings branches forms the desired witness set. This construction is central to the operation of the fusion tree data structure. The minimum size of a witness set for c {\displaystyle c} is called the witness size or specification number and is denoted by w C ( c ) {\displaystyle w_{C}(c)} . The value max { w C ( c ) : c ∈ C } {\displaystyle \max\{w_{C}(c):c\in C\}} is called the teaching dimension of C {\displaystyle C} . It represents the number of examples of a concept that need to be presented by a teacher to a learner, in the worst case, to enable the learner to determine which concept is being presented. Witness sets have also been called teaching sets, keys, specifying sets, or discriminants. The "witness set" terminology is from Kushilevitz et al. (1996), who trace the concept of witness sets to work by Cover (1965).

    Read more →
  • Rprop

    Rprop

    Rprop, short for resilient backpropagation, is a learning heuristic for supervised learning in feedforward artificial neural networks. This is a first-order optimization algorithm. This algorithm was created by Martin Riedmiller and Heinrich Braun in 1992. Similarly to the Manhattan update rule, Rprop takes into account only the sign of the partial derivative over all patterns (not the magnitude), and acts independently on each "weight". For each weight, if there was a sign change of the partial derivative of the total error function compared to the last iteration, the update value for that weight is multiplied by a factor η−, where η− < 1. If the last iteration produced the same sign, the update value is multiplied by a factor of η+, where η+ > 1. The update values are calculated for each weight in the above manner, and finally each weight is changed by its own update value, in the opposite direction of that weight's partial derivative, so as to minimise the total error function. η+ is empirically set to 1.2 and η− to 0.5. Rprop can result in very large weight increments or decrements if the gradients are large, which is a problem when using mini-batches as opposed to full batches. RMSprop addresses this problem by keeping the moving average of the squared gradients for each weight and dividing the gradient by the square root of the mean square. RPROP is a batch update algorithm. Next to the cascade correlation algorithm and the Levenberg–Marquardt algorithm, Rprop is one of the fastest weight update mechanisms. == Variations == Martin Riedmiller developed three algorithms, all named RPROP. Igel and Hüsken assigned names to them and added a new variant: RPROP+ is defined at A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm. RPROP− is defined at Advanced Supervised Learning in Multi-layer Perceptrons – From Backpropagation to Adaptive Learning Algorithms. Backtracking is removed from RPROP+. iRPROP− is defined in Rprop – Description and Implementation Details and was reinvented by Igel and Hüsken. This variant is very popular and most simple. iRPROP+ is defined at Improving the Rprop Learning Algorithm and is very robust and typically faster than the other three variants.

    Read more →
  • RemObjects Software

    RemObjects Software

    RemObjects Software is an American software company founded in 2002 by Alessandro Federici and Marc Hoffman. It develops and offers tools and libraries for software developers on a variety of development platforms, including Embarcadero Delphi, Microsoft .NET, Mono, and Apple's Xcode. == History == RemObjects Software was founded in the summer of 2002. Its first product was RemObjects SDK 1.0 for Delphi, the company's remoting solution which is now in its 6th version. In late 2003 RemObjects expanded its product portfolio to add Data Abstract for Delphi, a multi-tier database framework built on top of the SDK. In 2004, Carlo Kok, who would eventually become Chief Compiler Architect for Oxygene, joined the company, adding the open source Pascal Script library for Delphi to the company's portfolio. Initial development began on Oxygene (which was then named Chrome) based on Carlo's experience from writing the widely used Pascal Script scripting engine. Towards the end of 2004, RemObjects SDK for .NET was released, expanding the remoting framework to its second platform. Chrome 1.0 was released in mid-2005, providing support for .NET 1.1 and .NET 2.0, which was still in beta at the time - making Chrome the first shipping language for .NET that supported features such as generics. It was followed by Chrome 1.5 when .NET 2.0 shipped in November of the same year. 2005 also saw the expansion of Data Abstract to .NET as a second platform. Data Abstract for .NET was the first RemObjects product (besides Oxygene itself) to be written in Oxygene. Hydra 3.0, was released for .NET in December 2006, bringing a paradigm shift to the product, away from a regular plugin framework, and focusing on interoperability between plugins and host applications written in either .NET or Delphi/Win32, essentially enabling the use of both managed and unmanaged code in the same project. In Summer 2007, RemObjects released Chrome 'Joyride' which added official support for .NET 3.0 and 3.5. Chrome once again was the first language to ship release level support for new .NET framework features supported by that runtime - most importantly Sequences and Queries (aka LINQ). Development continued and in May 2008 Oxygene 3.0 was released, dropping the "Chrome" moniker. Oxygene once again brought major language enhancements, including extensive support for concurrency and parallel programming as part of the language syntax. In October 2008, RemObjects Software and Embarcadero Technologies announced plans to collaborate and ship future versions of Oxygene under the Delphi Prism moniker, later changed to Embarcadero Prism. The first of these releases of Prism became available in December 2008. Over the course of 2009, RemObjects software completed the expansion of its Data Abstract and RemObjects SDK product combo to a third development platform - Xcode and Cocoa, for both Mac OS X and iPhone SDK client development. RemObjects SDK for OS X shipped in the spring of 2009, followed by Data Abstract for OS X in the fall. In 2011, Oxygene was expanded to add support for the Java platform, in addition to NET. In 2014, RemObjects introduced a C# compiler which runs as a Visual Studio 2013 plugin, that can output code for iOS, MacOS (Cocoa) and Android, in addition to .NET compatible code. In addition, an IDE called Fire was introduced for macOS which works with their C# and Oxygene compilers. Together, the compiler supporting both Oxygene and C# was rebranded as the Elements Compiler, with CE# having the Code name "Hydrogene". In February 2015, RemObjects introduced a beta version of a Swift compiler called Silver as part of its Elements effort. Silver, too, could create code that will execute on Android, the JVM, .NET platform and also create native Cocoa code. Silver added new features to the Swift language, such as exceptions and has a few differences and limitations compared to Apple's Swift. In February 2020, support for the Go programming language was introduced with RemObjects Gold, including the ability to compile Go language code for all Elements platforms, and a port of the extensive Go Base Library available to all Elements languages. In 2021, Mercury was added to the Elements compiler as the sixth language, providing a future for the Visual Basic .NET language recently deprecated by Microsoft. Mercury supports building and maintaining existing VB.NET projects, as well as using the language for new projects both on .NET and the other platforms. == Commercial products == Elements is a development toolchain that targets .NET runtime, Java/Android virtual machines, the Apple ecosystem (macOS, iOS, tvOS), WebAssembly and native and Windows/Linux/Android NDK processor-native machine code in conjunction with a runtime library that does automatic garbage collection on non-ARC environments and ARC on ARC-based environments, such as iOS and MacOS. Because Java, C#, Swift, and Oxygene all can import each other's APIs, Elements effectively functions as Java bonded together with C# bonded together with Swift bonded together with Oxygene as a confederation of languages cooperating together quite intimately. Oxygene, a unique programming language based on Object Pascal, which can import Java, C#, and Swift APIs from the runtime of the target operating system; RemObjects C#, an implementation of C# programming language, which can import Java, Swift, and Oxygene APIs from the runtime of the target operating system and which is intended as a competitor of Xamarin, but Hydrogene's C# targets JVM bytecode instead of Xamarin's C# compiling to only Common Language Infrastructure byte code and needing the accompanying Mono Common Language Runtime to be present in such JVM-centric environments as Android; Silver, a free implementation of the Swift programming language, which can import Java, C#, and Oxygene APIs from the runtime of the target operating system; Iodine, an implementation of the Java programming language. Gold, an implementation of the Go programming language. Mercury, an implementation of the Visual Basic .NET programming language. Fire an integrated development environment for macOS. Water an integrated development environment for Windows. Data Abstract Remoting SDK, a.k.a. RemObjects SDK Hydra Oxfuscator Oxidizer, an automatic translator from Java, C#, Objective-C, and Delphi to Oxygene, from Java, Objective-C, and C# to Swift, and from Java and Objective-C to C#. == Open source projects == Train is an open-source JavaScript-based tool for building and running build scripts and automation. Internet Pack for .NET is a free, open source library for building network clients and servers using TCP and higher level protocols such as HTTP or FTP, using the .NET or Mono platforms. It includes a range of ready to use protocol implementations, as well as base classes that allow the creation of custom implementations. RemObjects Script for .NET is a fully managed ECMAScript implementation for .NET and Mono. Pascal Script for Delphi is a widely used implementation of Pascal as scripting language. == Involvement of other projects == The Oxygene Compiler Oxygene is a language based on Object Pascal and designed to efficiently target the Microsoft .NET and Mono managed runtimes; it expands Object Pascal with a range of additional language features, such as Aspect Oriented Programming, Class Contracts and support for Parallelism. It integrates with the Microsoft Visual Studio and MonoDevelop IDEs.

    Read more →
  • Random indexing

    Random indexing

    Random indexing is a dimensionality reduction method and computational framework for distributional semantics, based on the insight that very-high-dimensional vector space model implementations are impractical, that models need not grow in dimensionality when new items (e.g. new terminology) are encountered, and that a high-dimensional model can be projected into a space of lower dimensionality without compromising L2 distance metrics if the resulting dimensions are chosen appropriately. This is the original point of the random projection approach to dimension reduction first formulated as the Johnson–Lindenstrauss lemma, and locality-sensitive hashing has some of the same starting points. Random indexing, as used in representation of language, originates from the work of Pentti Kanerva on sparse distributed memory, and can be described as an incremental formulation of a random projection. It can be also verified that random indexing is a random projection technique for the construction of Euclidean spaces—i.e. L2 normed vector spaces. In Euclidean spaces, random projections are elucidated using the Johnson–Lindenstrauss lemma. The TopSig technique extends the random indexing model to produce bit vectors for comparison with the Hamming distance similarity function. It is used for improving the performance of information retrieval and document clustering. In a similar line of research, Random Manhattan Integer Indexing (RMII) is proposed for improving the performance of the methods that employ the Manhattan distance between text units. Many random indexing methods primarily generate similarity from co-occurrence of items in a corpus. Reflexive Random Indexing (RRI) generates similarity from co-occurrence and from shared occurrence with other items.

    Read more →
  • Premature convergence

    Premature convergence

    Premature convergence is an unwanted effect in evolutionary algorithms (EA), a metaheuristic that mimics the basic principles of biological evolution as a computer algorithm for solving an optimization problem. The effect means that the population of an EA has converged too early, resulting in being suboptimal. In this context, the parental solutions, through the aid of genetic operators, are not able to generate offspring that are superior to, or outperform, their parents. Premature convergence is a common problem found in evolutionary algorithms, as it leads to a loss, or convergence of, a large number of alleles, subsequently making it very difficult to search for a specific gene in which the alleles were present. An allele is considered lost if, in a population, a gene is present, where all individuals are sharing the same value for that particular gene. An allele is, as defined by De Jong, considered to be a converged allele, when 95% of a population share the same value for a certain gene. == Strategies for preventing premature convergence == Strategies to regain genetic variation can be: a mating strategy called incest prevention, uniform crossover, mimicking sexual selection, favored replacement of similar individuals (preselection or crowding), segmentation of individuals of similar fitness (fitness sharing), increasing population size niche and specie The genetic variation can also be regained by mutation though this process is highly random. A general strategy to reduce the risk of premature convergence is to use structured populations instead of the commonly used panmictic ones. == Identification of the occurrence of premature convergence == It is hard to determine when premature convergence has occurred, and it is equally hard to predict its presence in the future. One measure is to use the difference between the average and maximum fitness values, as used by Patnaik & Srinivas, to then vary the crossover and mutation probabilities. Population diversity is another measure which has been extensively used in studies to measure premature convergence. However, although it has been widely accepted that a decrease in the population diversity directly leads to premature convergence, there have been little studies done on the analysis of population diversity. In other words, by using the term population diversity, the argument for a study in preventing premature convergence lacks robustness, unless specified what their definition of population diversity is. There are models to counter the effect and risk of premature convergence that do not compromise core GA parameters like population size, mutation rate, and other core mechanisms. These models were inspired by biological ecology, where genetic interactions are limited by external mechanisms such as spatial topologies or speciation. These ecological models, such as the Eco-GA, adopt diffusion-based strategies to improve the robustness of GA runs and increase the likelihood of reaching near-global optima. == Causes for premature convergence == There are a number of presumed or hypothesized causes for the occurrence of premature convergence. === Self-adaptive mutations === Rechenberg introduced the idea of self-adaptation of mutation distributions in evolution strategies. According to Rechenberg, the control parameters for these mutation distributions evolved internally through self-adaptation, rather than predetermination. He called it the 1/5-success rule of evolution strategies (1 + 1)-ES: The step size control parameter would be increased by some factor if the relative frequency of positive mutations through a determined period of time is larger than 1/5, vice versa if it is smaller than 1/5. Self-adaptive mutations may very well be one of the causes for premature convergence. Accurately locating of optima can be enhanced by self-adaptive mutation, as well as accelerating the search for this optima. This has been widely recognized, though the mechanism's underpinnings of this have been poorly studied, as it is often unclear whether the optima is found locally or globally. Self-adaptive methods can cause global convergence to global optimum, provided that the selection methods used are using elitism, as well as that the rule of self-adaptation doesn't interfere with the mutation distribution, which has the property of ensuring a positive minimum probability when hitting a random subset. This is for non-convex objective functions with sets that include bounded lower levels of non-zero measurements. A study by Rudolph suggests that self-adaption mechanisms among elitist evolution strategies do resemble the 1/5-success rule, and could very well get caught by a local optimum that include a positive probability. === Panmictic populations === Most EAs use unstructured or panmictic populations where basically every individual in the population is eligible for mate selection based on fitness. Thus, The genetic information of an only slightly better individual can spread in a population within a few generations, provided that no better other offspring is produced during this time. Especially in comparatively small populations, this can quickly lead to a loss of genotypic diversity and thus to premature convergence. A well-known countermeasure is to switch to alternative population models which introduce substructures into the population that preserve genotypic diversity over a longer period of time and thus counteract the tendency towards premature convergence. This has been shown for various EAs such as genetic algorithms, the evolution strategy, other EAs or memetic algorithms.

    Read more →
  • Triplet loss

    Triplet loss

    Triplet loss is a machine learning loss function widely used in one-shot learning, a setting where models are trained to generalize effectively from limited examples. It was conceived by Google researchers for their prominent FaceNet algorithm for face detection. Triplet loss is designed to support metric learning. Namely, to assist training models to learn an embedding (mapping to a feature space) where similar data points are closer together and dissimilar ones are farther apart, enabling robust discrimination across varied conditions. In the context of face detection, data points correspond to images. == Definition == The loss function is defined using triplets of training points of the form ( A , P , N ) {\displaystyle (A,P,N)} . In each triplet, A {\displaystyle A} (called an "anchor point") denotes a reference point of a particular identity, P {\displaystyle P} (called a "positive point") denotes another point of the same identity in point A {\displaystyle A} , and N {\displaystyle N} (called a "negative point") denotes a point of an identity different from the identity in point A {\displaystyle A} and P {\displaystyle P} . Let x {\displaystyle x} be some point and let f ( x ) {\displaystyle f(x)} be the embedding of x {\displaystyle x} in the finite-dimensional Euclidean space. It shall be assumed that the L2-norm of f ( x ) {\displaystyle f(x)} is unity (the L2 norm of a vector X {\displaystyle X} in a finite dimensional Euclidean space is denoted by ‖ X ‖ {\displaystyle \Vert X\Vert } .) We assemble m {\displaystyle m} triplets of points from the training dataset. The goal of training here is to ensure that, after learning, the following condition (called the "triplet constraint") is satisfied by all triplets ( A ( i ) , P ( i ) , N ( i ) ) {\displaystyle (A^{(i)},P^{(i)},N^{(i)})} in the training data set: ‖ f ( A ( i ) ) − f ( P ( i ) ) ‖ 2 2 + α < ‖ f ( A ( i ) ) − f ( N ( i ) ) ‖ 2 2 {\displaystyle \Vert f(A^{(i)})-f(P^{(i)})\Vert _{2}^{2}+\alpha <\Vert f(A^{(i)})-f(N^{(i)})\Vert _{2}^{2}} The variable α {\displaystyle \alpha } is a hyperparameter called the margin, and its value must be set manually. In the FaceNet system, its value was set as 0.2. Thus, the full form of the function to be minimized is the following: L = ∑ i = 1 m max ( ‖ f ( A ( i ) ) − f ( P ( i ) ) ‖ 2 2 − ‖ f ( A ( i ) ) − f ( N ( i ) ) ‖ 2 2 + α , 0 ) {\displaystyle L=\sum _{i=1}^{m}\max {\Big (}\Vert f(A^{(i)})-f(P^{(i)})\Vert _{2}^{2}-\Vert f(A^{(i)})-f(N^{(i)})\Vert _{2}^{2}+\alpha ,0{\Big )}} == Intuition == A baseline for understanding the effectiveness of triplet loss is the contrastive loss, which operates on pairs of samples (rather than triplets). Training with the contrastive loss pulls embeddings of similar pairs closer together, and pushes dissimilar pairs apart. Its pairwise approach is greedy, as it considers each pair in isolation. Triplet loss innovates by considering relative distances. Its goal is that the embedding of an anchor (query) point be closer to positive points than to negative points (also accounting for the margin). It does not try to further optimize the distances once this requirement is met. This is approximated by simultaneously considering two pairs (anchor-positive and anchor-negative), rather than each pair in isolation. == Triplet "mining" == One crucial implementation detail when training with triplet loss is triplet "mining", which focuses on the smart selection of triplets for optimization. This process adds an additional layer of complexity compared to contrastive loss. A naive approach to preparing training data for the triplet loss involves randomly selecting triplets from the dataset. In general, the set of valid triplets of the form ( A ( i ) , P ( i ) , N ( i ) ) {\displaystyle (A^{(i)},P^{(i)},N^{(i)})} is very large. To speed-up training convergence, it is essential to focus on challenging triplets. In the FaceNet paper, several options were explored, eventually arriving at the following. For each anchor-positive pair, the algorithm considers only semi-hard negatives. These are negatives that violate the triplet requirement (i.e, are "hard"), but lie farther from the anchor than the positive (not too hard). Restated, for each A ( i ) {\displaystyle A^{(i)}} and P ( i ) {\displaystyle P^{(i)}} , they seek N ( i ) {\displaystyle N^{(i)}} such that: The rationale for this design choice is heuristic. It may appear puzzling that the mining process neglects "very hard" negatives (i.e., closer to the anchor than the positive). Experiments conducted by the FaceNet designers found that this often leads to a convergence to degenerate local minima. Triplet mining is performed at each training step, from within the sample points contained in the training batch (this is known as online mining), after embeddings were computed for all points in the batch. While ideally the entire dataset could be used, this is impractical in general. To support a large search space for triplets, the FaceNet authors used very large batches (1800 samples). Batches are constructed by selecting a large number of same-category sample points (40), and randomly selected negatives for them. == Extensions == Triplet loss has been extended to simultaneously maintain a series of distance orders by optimizing a continuous relevance degree with a chain (i.e., ladder) of distance inequalities. This leads to the Ladder Loss, which has been demonstrated to offer performance enhancements of visual-semantic embedding in learning to rank tasks. In Natural Language Processing, triplet loss is one of the loss functions considered for BERT fine-tuning in the SBERT architecture. Other extensions involve specifying multiple negatives (multiple negatives ranking loss).

    Read more →
  • The 2028 Global Intelligence Crisis

    The 2028 Global Intelligence Crisis

    The 2028 Global Intelligence Crisis is a report authored by James van Geelen and Alap Shah and published by Citrini Research in February 2026, on the impact of artificial intelligence on humanity's future. Written in the form of a scenario analysis, it was viewed millions of times online and reportedly caused a fall in the stock market prices of major tech and financial firms. It also received criticism among others, for its allegedly flawed economic logic. The 'thought exercise', as the authors called it, painted a gloomy picture for the near future, where outputs keep growing while consumer's ability to spend collapses. "...driven by ai agents that don’t sleep, take sick days or require health insurance”, "outputs that are shown in national accounts increases, "but never circulates through the real economy"(which the report calls 'Ghost GDP'), the authors argued. In other words, the authors predict a scenario where the owners of the AI firms will accumulate a vast fortune but there will be scant demand from consumers as AI would cause massive unemployment. The authors caution the reader that what they make is a scenario and not a prediction. In the scenario they visualise, any service whose value proposition is “I will navigate complexity that you find tedious” is getting disrupted. The reports argues that the unique ability of human beings to analyse, decide, create, persuade, and coordinate was “the thing that could not be replicated at scale,” and call the historical scarcity of this precious entity 'friction'. When this friction becomes zero, a gamut of changes occur which then triggers a cascading of changes across the economy. ”Travel booking platforms are an early casualty; Financial advice. tax prep., and routine legal work follow suit. National unemployment rate go as high 10.2% and the S&P 500 goes for a massive 38% peak-to-trough crash. In contrast to the previous technological revolutions the high-earning professionals suffers more and get forced to take up roles in the gig economy. Labour supply becomes abundant and this cuts wages all across the economy. The dent in income for the employees then affects other sectors of the economy such as the residential mortgage market. The losses for the software companies triggers loan defaults and heralds peril for the private credit sector.

    Read more →
  • Farthest-first traversal

    Farthest-first traversal

    In computational geometry, the farthest-first traversal of a compact metric space is a sequence of points in the space, where the first point is selected arbitrarily and each successive point is as far as possible from the set of previously-selected points. The same concept can also be applied to a finite set of geometric points, by restricting the selected points to belong to the set or equivalently by considering the finite metric space generated by these points. For a finite metric space or finite set of geometric points, the resulting sequence forms a permutation of the points, also known as the greedy permutation. Every prefix of a farthest-first traversal provides a set of points that is widely spaced and close to all remaining points. More precisely, no other set of equally many points can be spaced more than twice as widely, and no other set of equally many points can be less than half as far to its farthest remaining point. In part because of these properties, farthest-point traversals have many applications, including the approximation of the traveling salesman problem and the metric k-center problem. They may be constructed in polynomial time, or (for low-dimensional Euclidean spaces) approximated in near-linear time. == Definition and properties == A farthest-first traversal is a sequence of points in a compact metric space, with each point appearing at most once. If the space is finite, each point appears exactly once, and the traversal is a permutation of all of the points in the space. The first point of the sequence may be any point in the space. Each point p after the first must have the maximum possible distance to the set of points earlier than p in the sequence, where the distance from a point to a set is defined as the minimum of the pairwise distances to points in the set. A given space may have many different farthest-first traversals, depending both on the choice of the first point in the sequence (which may be any point in the space) and on ties for the maximum distance among later choices. Farthest-point traversals may be characterized by the following properties. Fix a number k, and consider the prefix formed by the first k points of the farthest-first traversal of any metric space. Let r be the distance between the final point of the prefix and the other points in the prefix. Then this subset has the following two properties: All pairs of the selected points are at distance at least r from each other, and All points of the metric space are at distance at most r from the subset. Conversely any sequence having these properties, for all choices of k, must be a farthest-first traversal. These are the two defining properties of a Delone set, so each prefix of the farthest-first traversal forms a Delone set. == Applications == Rosenkrantz, Stearns & Lewis (1977) used the farthest-first traversal to define the farthest-insertion heuristic for the travelling salesman problem. This heuristic finds approximate solutions to the travelling salesman problem by building up a tour on a subset of points, adding one point at a time to the tour in the ordering given by a farthest-first traversal. To add each point to the tour, one edge of the previous tour is broken and replaced by a pair of edges through the added point, in the cheapest possible way. Although Rosenkrantz et al. prove only a logarithmic approximation ratio for this method, they show that in practice it often works better than other insertion methods with better provable approximation ratios. Later, the same sequence of points was popularized by Gonzalez (1985), who used it as part of greedy approximation algorithms for two problems in clustering, in which the goal is to partition a set of points into k clusters. One of the two problems that Gonzalez solve in this way seeks to minimize the maximum diameter of a cluster, while the other, known as the metric k-center problem, seeks to minimize the maximum radius, the distance from a chosen central point of a cluster to the farthest point from it in the same cluster. For instance, the k-center problem can be used to model the placement of fire stations within a city, in order to ensure that every address within the city can be reached quickly by a fire truck. For both clustering problems, Gonzalez chooses a set of k cluster centers by selecting the first k points of a farthest-first traversal, and then creates clusters by assigning each input point to the nearest cluster center. If r is the distance from the set of k selected centers to the next point at position k + 1 in the traversal, then with this clustering every point is within distance r of its center and every cluster has diameter at most 2r. However, the subset of k centers together with the next point are all at distance at least r from each other, and any k-clustering would put some two of these points into a single cluster, with one of them at distance at least r/2 from its center and with diameter at least r. Thus, Gonzalez's heuristic gives an approximation ratio of 2 for both clustering problems. Gonzalez's heuristic was independently rediscovered for the metric k-center problem by Dyer & Frieze (1985), who applied it more generally to weighted k-center problems. Another paper on the k-center problem from the same time, Hochbaum & Shmoys (1985), achieves the same approximation ratio of 2, but its techniques are different. Nevertheless, Gonzalez's heuristic, and the name "farthest-first traversal", are often incorrectly attributed to Hochbaum and Shmoys. For both the min-max diameter clustering problem and the metric k-center problem, these approximations are optimal: the existence of a polynomial-time heuristic with any constant approximation ratio less than 2 would imply that P = NP. As well as for clustering, the farthest-first traversal can also be used in another type of facility location problem, the max-min facility dispersion problem, in which the goal is to choose the locations of k different facilities so that they are as far apart from each other as possible. More precisely, the goal in this problem is to choose k points from a given metric space or a given set of candidate points, in such a way as to maximize the minimum pairwise distance between the selected points. Again, this can be approximated by choosing the first k points of a farthest-first traversal. If r denotes the distance of the kth point from all previous points, then every point of the metric space or the candidate set is within distance r of the first k − 1 points. By the pigeonhole principle, some two points of the optimal solution (whatever it is) must both be within distance r of the same point among these first k − 1 chosen points, and (by the triangle inequality) within distance 2r of each other. Therefore, the heuristic solution given by the farthest-first traversal is within a factor of two of optimal. Other applications of the farthest-first traversal include color quantization (clustering the colors in an image to a smaller set of representative colors), progressive scanning of images (choosing an order to display the pixels of an image so that prefixes of the ordering produce good lower-resolution versions of the whole image rather than filling in the image from top to bottom), point selection in the probabilistic roadmap method for motion planning, simplification of point clouds, generating masks for halftone images, hierarchical clustering, finding the similarities between polygon meshes of similar surfaces, choosing diverse and high-value observation targets for underwater robot exploration, fault detection in sensor networks, modeling phylogenetic diversity, matching vehicles in a heterogenous fleet to customer delivery requests, uniform distribution of geodetic observatories on the Earth's surface or of other types of sensor network, generation of virtual point lights in the instant radiosity computer graphics rendering method, and geometric range searching data structures. == Algorithms == === Greedy exact algorithm === The farthest-first traversal of a finite point set may be computed by a greedy algorithm that maintains the distance of each point from the previously selected points, performing the following steps: Initialize the sequence of selected points to the empty sequence, and the distances of each point to the selected points to infinity. While not all points have been selected, repeat the following steps: Scan the list of not-yet-selected points to find a point p that has the maximum distance from the selected points. Remove p from the not-yet-selected points and add it to the end of the sequence of selected points. For each remaining not-yet-selected point q, replace the distance stored for q by the minimum of its old value and the distance from p to q. For a set of n points, this algorithm takes O(n2) steps and O(n2) distance computations. === Approximations === A faster approximation algorithm, given by Har-Peled & Mendel (2006), applie

    Read more →
  • Multiple correspondence analysis

    Multiple correspondence analysis

    In statistics, multiple correspondence analysis (MCA) is a data analysis technique for nominal categorical data, used to detect and represent underlying structures in a data set. It does this by representing data as points in a low-dimensional Euclidean space. The procedure thus appears to be the counterpart of principal component analysis for categorical data. MCA can be viewed as an extension of simple correspondence analysis (CA) in that it is applicable to a large set of categorical variables. == As an extension of correspondence analysis == MCA is performed by applying the CA algorithm to either an indicator matrix (also called complete disjunctive table – CDT) or a Burt table formed from these variables. An indicator matrix is an individuals × variables matrix, where the rows represent individuals and the columns are dummy variables representing categories of the variables. Analyzing the indicator matrix allows the direct representation of individuals as points in geometric space. The Burt table is the symmetric matrix of all two-way cross-tabulations between the categorical variables, and has an analogy to the covariance matrix of continuous variables. Analyzing the Burt table is a more natural generalization of simple correspondence analysis, and individuals or the means of groups of individuals can be added as supplementary points to the graphical display. In the indicator matrix approach, associations between variables are uncovered by calculating the chi-square distance between different categories of the variables and between the individuals (or respondents). These associations are then represented graphically as "maps", which eases the interpretation of the structures in the data. Oppositions between rows and columns are then maximized, in order to uncover the underlying dimensions best able to describe the central oppositions in the data. As in factor analysis or principal component analysis, the first axis is the most important dimension, the second axis the second most important, and so on, in terms of the amount of variance accounted for. The number of axes to be retained for analysis is determined by calculating modified eigenvalues. == Details == Since MCA is adapted to draw statistical conclusions from categorical variables (such as multiple choice questions), the first thing one needs to do is to transform quantitative data (such as age, size, weight, day time, etc) into categories (using for instance statistical quantiles). When the dataset is completely represented as categorical variables, one is able to build the corresponding so-called complete disjunctive table. We denote this table X {\displaystyle X} . If I {\displaystyle I} persons answered a survey with J {\displaystyle J} multiple choices questions with 4 answers each, X {\displaystyle X} will have I {\displaystyle I} rows and 4 J {\displaystyle 4J} columns. More theoretically, assume X {\displaystyle X} is the completely disjunctive table of I {\displaystyle I} observations of K {\displaystyle K} categorical variables. Assume also that the k {\displaystyle k} -th variable have J k {\displaystyle J_{k}} different levels (categories) and set J = ∑ k = 1 K J k {\displaystyle J=\sum _{k=1}^{K}J_{k}} . The table X {\displaystyle X} is then a I × J {\displaystyle I\times J} matrix with all coefficient being 0 {\displaystyle 0} or 1 {\displaystyle 1} . Set the sum of all entries of X {\displaystyle X} to be N {\displaystyle N} and introduce Z = X / N {\displaystyle Z=X/N} . In an MCA, there are also two special vectors: first r {\displaystyle r} , that contains the sums along the rows of Z {\displaystyle Z} , and c {\displaystyle c} , that contains the sums along the columns of Z {\displaystyle Z} . Note D r = diag ( r ) {\displaystyle D_{r}={\text{diag}}(r)} and D c = diag ( c ) {\displaystyle D_{c}={\text{diag}}(c)} , the diagonal matrices containing r {\displaystyle r} and c {\displaystyle c} respectively as diagonal. With these notations, computing an MCA consists essentially in the singular value decomposition of the matrix: M = D r − 1 / 2 ( Z − r c T ) D c − 1 / 2 {\displaystyle M=D_{r}^{-1/2}(Z-rc^{T})D_{c}^{-1/2}} The decomposition of M {\displaystyle M} gives you P {\displaystyle P} , Δ {\displaystyle \Delta } and Q {\displaystyle Q} such that M = P Δ Q T {\displaystyle M=P\Delta Q^{T}} with P, Q two unitary matrices and Δ {\displaystyle \Delta } is the generalized diagonal matrix of the singular values (with the same shape as Z {\displaystyle Z} ). The positive coefficients of Δ 2 {\displaystyle \Delta ^{2}} are the eigenvalues of Z {\displaystyle Z} . The interest of MCA comes from the way observations (rows) and variables (columns) in Z {\displaystyle Z} can be decomposed. This decomposition is called a factor decomposition. The coordinates of the observations in the factor space are given by F = D r − 1 / 2 P Δ {\displaystyle F=D_{r}^{-1/2}P\Delta } The i {\displaystyle i} -th rows of F {\displaystyle F} represent the i {\displaystyle i} -th observation in the factor space. And similarly, the coordinates of the variables (in the same factor space as observations!) are given by G = D c − 1 / 2 Q Δ {\displaystyle G=D_{c}^{-1/2}Q\Delta } == Recent works and extensions == In recent years, several students of Jean-Paul Benzécri have refined MCA and incorporated it into a more general framework of data analysis known as geometric data analysis. This involves the development of direct connections between simple correspondence analysis, principal component analysis and MCA with a form of cluster analysis known as Euclidean classification. Two extensions have great practical use. It is possible to include, as active elements in the MCA, several quantitative variables. This extension is called factor analysis of mixed data (see below). Very often, in questionnaires, the questions are structured in several issues. In the statistical analysis it is necessary to take into account this structure. This is the aim of multiple factor analysis which balances the different issues (i.e. the different groups of variables) within a global analysis and provides, beyond the classical results of factorial analysis (mainly graphics of individuals and of categories), several results (indicators and graphics) specific of the group structure. == Application fields == In the social sciences, MCA is arguably best known for its application by Pierre Bourdieu, notably in his books La Distinction, Homo Academicus and The State Nobility. Bourdieu argued that there was an internal link between his vision of the social as spatial and relational --– captured by the notion of field, and the geometric properties of MCA. Sociologists following Bourdieu's work most often opt for the analysis of the indicator matrix, rather than the Burt table, largely because of the central importance accorded to the analysis of the 'cloud of individuals'. == Multiple correspondence analysis and principal component analysis == MCA can also be viewed as a PCA applied to the complete disjunctive table. To do this, the CDT must be transformed as follows. Let y i k {\displaystyle y_{ik}} denote the general term of the CDT. y i k {\displaystyle y_{ik}} is equal to 1 if individual i {\displaystyle i} possesses the category k {\displaystyle k} and 0 if not. Let denote p k {\displaystyle p_{k}} , the proportion of individuals possessing the category k {\displaystyle k} . The transformed CDT (TCDT) has as general term: x i k = y i k / p k − 1 {\displaystyle x_{ik}=y_{ik}/p_{k}-1} The unstandardized PCA applied to TCDT, the column k {\displaystyle k} having the weight p k {\displaystyle p_{k}} , leads to the results of MCA. This equivalence is fully explained in a book by Jérôme Pagès. It plays an important theoretical role because it opens the way to the simultaneous treatment of quantitative and qualitative variables. Two methods simultaneously analyze these two types of variables: factor analysis of mixed data and, when the active variables are partitioned in several groups: multiple factor analysis. This equivalence does not mean that MCA is a particular case of PCA as it is not a particular case of CA. It only means that these methods are closely linked to one another, as they belong to the same family: the factorial methods. == Software == There are numerous software of data analysis that include MCA, such as STATA and SPSS. The R package FactoMineR also features MCA. This software is related to a book describing the basic methods for performing MCA . There is also a Python package for [1] which works with numpy array matrices; the package has not been implemented yet for Spark dataframes.

    Read more →
  • Multiclass classification

    Multiclass classification

    In machine learning and statistical classification, multiclass classification or multinomial classification is the problem of classifying instances into one of three or more classes (classifying instances into one of two classes is called binary classification). For example, deciding on whether an image is showing a banana, peach, orange, or an apple is a multiclass classification problem, with four possible classes (banana, peach, orange, apple), while deciding on whether an image contains an apple or not is a binary classification problem (with the two possible classes being: apple, no apple). While many classification algorithms (e.g., decision trees, k-NN, neural networks and multinomial logistic regression) naturally permit the use of more than two classes, some are by nature binary algorithms (e.g., classical binary support vector machine) and require decomposition strategies such as one-vs-all, one-vs-one, or ECOC to solve multiclass problems. Multiclass classification should not be confused with multi-label classification, where multiple labels are to be predicted for each instance (e.g., predicting that an image contains both an apple and an orange, in the previous example). == Better-than-random multiclass models == From the confusion matrix of a multiclass model, we can determine whether a model does better than chance. Let K ≥ 3 {\displaystyle K\geq 3} be the number of classes, O {\displaystyle {\mathcal {O}}} a set of observations, y ^ : O → { 1 , . . . , K } {\displaystyle {\hat {y}}:{\mathcal {O}}\to \{1,...,K\}} a model of the target variable y : O → { 1 , . . . , K } {\displaystyle y:{\mathcal {O}}\to \{1,...,K\}} and n i , j {\displaystyle n_{i,j}} be the number of observations in the set { y = i } ∩ { y ^ = j } {\displaystyle \{y=i\}\cap \{{\hat {y}}=j\}} . We note n i . = ∑ j n i , j {\displaystyle n_{i.}=\sum _{j}n_{i,j}} , n . j = ∑ i n i , j {\displaystyle n_{.j}=\sum _{i}n_{i,j}} , n = ∑ j n . j = ∑ i n i . {\displaystyle n=\sum _{j}n_{.j}=\sum _{i}n_{i.}} , λ i = n i . n {\displaystyle \lambda _{i}={\frac {n_{i.}}{n}}} and μ j = n . j n {\displaystyle \mu _{j}={\frac {n_{.j}}{n}}} . It is assumed that the confusion matrix ( n i , j ) i , j {\displaystyle (n_{i,j})_{i,j}} contains at least one non-zero entry in each row, that is λ i > 0 {\displaystyle \lambda _{i}>0} for any i {\displaystyle i} . Finally we call "normalized confusion matrix" the matrix of conditional probabilities ( P ( y ^ = j ∣ y = i ) ) i , j = ( n i , j n i . ) i , j {\displaystyle (\mathbb {P} ({\hat {y}}=j\mid y=i))_{i,j}=\left({\frac {n_{i,j}}{n_{i.}}}\right)_{i,j}} . === Intuitive explanation === The lift is a way of measuring the deviation from independence of two events A {\displaystyle A} and B {\displaystyle B} : L i f t ( A , B ) = P ( A ∩ B ) P ( A ) P ( B ) = P ( A ∣ B ) P ( A ) = P ( B ∣ A ) P ( B ) {\displaystyle \mathrm {Lift} (A,B)={\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (A)\mathbb {P} (B)}}={\frac {\mathbb {P} (A\mid B)}{\mathbb {P} (A)}}={\frac {\mathbb {P} (B\mid A)}{\mathbb {P} (B)}}} We have L i f t ( A , B ) > 1 {\displaystyle \mathrm {Lift} (A,B)>1} if and only if events A {\displaystyle A} and B {\displaystyle B} occur simultaneously with a greater probability than if they were independent. In other words, if one of the two events occurs, the probability of observing the other event increases. A first condition to satisfy is to have L i f t ( y = i , y ^ = i ) ≥ 1 {\displaystyle \mathrm {Lift} (y=i,{\hat {y}}=i)\geq 1} for any i {\displaystyle i} . And the quality of a model (better or worse than chance) does not change if we over- or undersample the dataset, that is if we multiply each row R i {\displaystyle R_{i}} of the confusion matrix by a constant c i {\displaystyle c_{i}} . Thus the second condition is that the necessary and sufficient conditions for doing better than chance need only depend on the normalized confusion matrix. The condition on lifts can be reformulated with One versus Rest binary models : for any i {\displaystyle i} , we define the binary target variable y i {\displaystyle y_{i}} which is the indicator of event { y = i } {\displaystyle \{y=i\}} , and the binary model y ^ i {\displaystyle {\hat {y}}_{i}} of y i {\displaystyle y_{i}} which is the indicator of event { y ^ = i } {\displaystyle \{{\hat {y}}=i\}} . Each of the y ^ i {\displaystyle {\hat {y}}_{i}} models is a "One versus Rest" model. L i f t ( y = i , y ^ = i ) {\displaystyle \mathrm {Lift} (y=i,{\hat {y}}=i)} only depends on the events { y = i } {\displaystyle \{y=i\}} and { y ^ = i } {\displaystyle \{{\hat {y}}=i\}} , so merging or not merging the other classes doesn't change its value. We therefore have L i f t ( y = i , y ^ = i ) = L i f t ( y i = 1 , y ^ i = 1 ) {\displaystyle \mathrm {Lift} (y=i,{\hat {y}}=i)=\mathrm {Lift} (y_{i}=1,{\hat {y}}_{i}=1)} and the first condition is that all binary One versus Rest models are better than chance. ==== Example ==== If K = 2 {\displaystyle K=2} and 2 is the class of interest , the normalized confusion matrix is ( s p e c i f i c i t y 1 − s p e c i f i c i t y 1 − s e n s i t i v i t y s e n s i t i v i t y ) {\displaystyle {\begin{pmatrix}\mathrm {specificity} &1-\mathrm {specificity} \\1-\mathrm {sensitivity} &\mathrm {sensitivity} \end{pmatrix}}} and we have L i f t ( y = 1 , y ^ = 1 ) − 1 = P ( y = y ^ = 1 ) λ 1 μ 1 − 1 = n 1 , 1 n n 1. n .1 − 1 {\displaystyle \mathrm {Lift} (y=1,{\hat {y}}=1)-1={\frac {\mathbb {P} (y={\hat {y}}=1)}{\lambda _{1}\mu _{1}}}-1={\frac {n_{1,1}n}{n_{1.}n_{.1}}}-1} = n 1 , 1 ( n 1 , 1 + n 1 , 2 + n 2 , 1 + n 2 , 2 ) − ( n 1 , 1 + n 1 , 2 ) ( n 1 , 1 + n 2 , 1 ) n 1. n .1 = n 1 , 1 n 2 , 2 − n 1 , 2 n 2 , 1 n 1. n .1 {\displaystyle ={\frac {n_{1,1}(n_{1,1}+n_{1,2}+n_{2,1}+n_{2,2})-(n_{1,1}+n_{1,2})(n_{1,1}+n_{2,1})}{n_{1.}n_{.1}}}={\frac {n_{1,1}n_{2,2}-n_{1,2}n_{2,1}}{n_{1.}n_{.1}}}} . Thus L i f t ( y = 1 , y ^ = 1 ) ≥ 1 ⟺ n 1 , 1 n 2 , 2 − n 1 , 2 n 2 , 1 ≥ 0 {\displaystyle \mathrm {Lift} (y=1,{\hat {y}}=1)\geq 1\iff n_{1,1}n_{2,2}-n_{1,2}n_{2,1}\geq 0} . Similarly, by swapping the roles of 1 and 2, we find that L i f t ( y = 2 , y ^ = 2 ) ≥ 1 ⟺ n 1 , 1 n 2 , 2 − n 1 , 2 n 2 , 1 ≥ 0 {\displaystyle \mathrm {Lift} (y=2,{\hat {y}}=2)\geq 1\iff n_{1,1}n_{2,2}-n_{1,2}n_{2,1}\geq 0} . Dividing by n 1. n 2. {\displaystyle n_{1.}n_{2.}} we find that the necessary and sufficient condition on the normalized confusion matrix is s e n s i t i v i t y s p e c i f i c i t y − ( 1 − s e n s i t i v i t y ) ( 1 − s p e c i f i c i t y ) ≥ 0 ⟺ s e n s i t i v i t y + s p e c i f i c i t y − 1 ≥ 0 ⟺ J ≥ 0 {\displaystyle \mathrm {sensitivity} \ \mathrm {specificity} -(1-\mathrm {sensitivity} )(1-\mathrm {specificity} )\geq 0\iff \mathrm {sensitivity} +\mathrm {specificity} -1\geq 0\iff J\geq 0} . This brings us back to the classical binary condition: Youden's J must be positive (or zero for random models). === Random models === A random model is a model that is independent of the target variable. This property is easily reformulated with the confusion matrix. This proposition shows that the model y ^ {\displaystyle {\hat {y}}} of y {\displaystyle y} is uninformative if and only if there are two families of numbers ( α i ) i {\displaystyle (\alpha _{i})_{i}} and ( β j ) j {\displaystyle (\beta _{j})_{j}} such that P ( { y = i } ∩ { y ^ = j } ) = α i β j {\displaystyle \mathbb {P} (\{y=i\}\cap \{{\hat {y}}=j\})=\alpha _{i}\beta _{j}} for any i {\displaystyle i} and j {\displaystyle j} . === Multiclass likelihood ratios and diagnostic odds ratios === We define generalized likelihood ratios calculated from the normalized confusion matrix: for any i {\displaystyle i} and j ≠ i {\displaystyle j\not =i} , let L R i , j = P ( y ^ = j ∣ y = j ) P ( y ^ = j ∣ y = i ) {\displaystyle \mathrm {LR} _{i,j}={\frac {\mathbb {P} ({\hat {y}}=j\mid y=j)}{\mathbb {P} ({\hat {y}}=j\mid y=i)}}} . When K = 2 {\displaystyle K=2} , if 2 is the class of interest,, we find the classical likelihood ratios L R 1 , 2 = L R + {\displaystyle \mathrm {LR} _{1,2}=\mathrm {LR} _{+}} and L R 2 , 1 = 1 L R − {\displaystyle \mathrm {LR} _{2,1}={\frac {1}{\mathrm {LR} _{-}}}} . Multiclass diagnostic odds ratios can also be defined using the formula D O R i , j = D O R j , i = L R i , j L R j , i = n i , i n j , j n i , j n j , i = P ( y ^ = j ∣ y = j ) / P ( y ^ = i ∣ y = j ) P ( y ^ = j ∣ y = i ) / P ( y ^ = i ∣ y = i ) {\displaystyle \mathrm {DOR} _{i,j}=\mathrm {DOR} _{j,i}=\mathrm {LR} _{i,j}\mathrm {LR} _{j,i}={\frac {n_{i,i}n_{j,j}}{n_{i,j}n_{j,i}}}={\frac {\mathbb {P} ({\hat {y}}=j\mid y=j)/\mathbb {P} ({\hat {y}}=i\mid y=j)}{\mathbb {P} ({\hat {y}}=j\mid y=i)/\mathbb {P} ({\hat {y}}=i\mid y=i)}}} We saw above that a better-than-chance model (or a random model) must verify L i f t ( y = i , y ^ = i ) ≥ 1 {\displaystyle \mathrm {Lift} (y=i,{\hat {y}}=i)\geq 1} for any i {\displaystyle i} and λ i {\displaystyle \lambda _{i}} . According to the previous corollary, likelihood ratios are thus greater

    Read more →
  • Virtual intelligence

    Virtual intelligence

    Virtual intelligence (VI) is the term given to artificial intelligence that exists within a virtual world. Many virtual worlds have options for persistent avatars that provide information, training, role-playing, and social interactions. The immersion in virtual worlds provides a platform for VI beyond the traditional paradigm of past user interfaces (UIs). What Alan Turing established as a benchmark for telling the difference between human and computerized intelligence was devoid of visual influences. With today's VI bots, virtual intelligence has evolved past the constraints of past testing into a new level of the machine's ability to demonstrate intelligence. The immersive features of these environments provide nonverbal elements that affect the realism provided by virtually intelligent agents. Virtual intelligence is the intersection of these two technologies: Virtual environments: Immersive 3D spaces provide for collaboration, simulations, and role-playing interactions for training. Many of these virtual environments are currently being used for government and academic projects, including Second Life, VastPark, Olive, OpenSim, Outerra, Oracle's Open Wonderland, Duke University's Open Cobalt, and many others. Some of the commercial virtual worlds are also taking this technology into new directions, including the high-definition virtual world Blue Mars. Artificial intelligence (AI): AI is a branch of computer science that aims to create intelligent machines capable of performing tasks that typically require human intelligence. VI is a type of AI that operates within virtual environments to simulate human-like interactions and responses. == Applications == Cutlass Bomb Disposal Robot: Northrop Grumman developed a virtual training opportunity because of the prohibitive real-world cost and dangers associated with bomb disposal. By replicating a complicated system without having to learn advanced code, the virtual robot has no risk of damage, trainee safety hazards, or accessibility constraints. MyCyberTwin: NASA is among the companies that have used the MyCyberTwin AI technologies. They used it for the Phoenix rover in the virtual world Second Life. Their MyCyberTwin used a programmed profile to relay information about what the Phoenix rover was doing and its purpose. Second China: The University of Florida developed the "Second China" project as an immersive training experience for learning how to interact with the culture and language in a foreign country. Students are immersed in an environment that provides role-playing challenges coupled with language and cultural sensitivities magnified during country-level diplomatic missions or during times of potential conflict or regional destabilization. The virtual training provides participants with opportunities to access information, take part in guided learning scenarios, communicate, collaborate, and role-play. While China was the country for the prototype, this model can be modified for use with any culture to help better understand social and cultural interactions and see how other people think and what their actions imply. Duke School of Nursing Training Simulation: Extreme Reality developed virtual training to test critical thinking with a nurse performing trained procedures to identify critical data to make decisions and performing the correct steps for intervention. Bots are programmed to respond to the nurse's actions as the patient with their conditions improving if the nurse performs the correct actions.

    Read more →
  • Types of artificial neural networks

    Types of artificial neural networks

    Types of neural networks (NN) include a family of techniques. The simplest types have static components, including number of units, number of layers, unit weights and topology. Dynamic NNs evolve via learning. Some types allow/require learning to be "supervised" by the operator, while others operate independently. Some types operate purely in hardware, while others are purely software and run on general purpose computers. The main types are: Transformers: these use attention to analyze every token in the input stream against every other token in the stream. That technique has enabled neural networks to reach the general public via chatbots, code generators and many other forms. Convolutional neural networks (CNN): a FNN that uses kernels and regularization to evade problems in prior generations of NNs. They are typically used to analyze visual and other two-dimensional data. Generative adversarial networks set networks (of varying structure) against each other, each trying to push the other(s) to produce better results such as winning a game or to deceive the opponent about the authenticity of an input. == Feedforward == In feedforward neural networks the information moves from the input to output directly in every layer. There can be hidden layers with or without cycles/loops to sequence inputs. Feedforward networks can be constructed with various types of units, such as binary McCulloch–Pitts neurons, the simplest of which is the perceptron. Continuous neurons, frequently with sigmoidal activation, are used in the context of backpropagation. == Group method of data handling == The Group Method of Data Handling (GMDH) features fully automatic structural and parametric model optimization. The node activation functions are Kolmogorov–Gabor polynomials that permit additions and multiplications. It uses a deep multilayer perceptron with eight layers. It is a supervised learning network that grows layer by layer, where each layer is trained by regression analysis. Useless items are detected using a validation set, and pruned through regularization. The size and depth of the resulting network depends on the task. == Autoencoder == An autoencoder, autoassociator or Diabolo network is similar to the multilayer perceptron (MLP) – with an input layer, an output layer and one or more hidden layers connecting them. However, the output layer has the same number of units as the input layer. Its purpose is to reconstruct its own inputs (instead of emitting a target value). Therefore, autoencoders are unsupervised learning models. An autoencoder is used for unsupervised learning of efficient codings, typically for the purpose of dimensionality reduction and for learning generative models of data. == Probabilistic == A probabilistic neural network (PNN) is a four-layer feedforward neural network. The layers are Input, hidden pattern, hidden summation, and output. In the PNN algorithm, the parent probability distribution function (PDF) of each class is approximated by a Parzen window and a non-parametric function. Then, using PDF of each class, the class probability of a new input is estimated and Bayes’ rule is employed to allocate it to the class with the highest posterior probability. It was derived from the Bayesian network and a statistical algorithm called Kernel Fisher discriminant analysis. It is used for classification and pattern recognition. == Time delay == A time delay neural network (TDNN) is a feedforward architecture for sequential data that recognizes features independent of sequence position. In order to achieve time-shift invariance, delays are added to the input so that multiple data points (points in time) are analyzed together. It usually forms part of a larger pattern recognition system. It has been implemented using a perceptron network whose connection weights were trained with back propagation (supervised learning). == Convolutional == A convolutional neural network (CNN, or ConvNet or shift invariant or space invariant) is a class of deep network, composed of one or more convolutional layers with fully connected layers (matching those in typical ANNs) on top. It uses tied weights and pooling layers. In particular, max-pooling. It is often structured via Fukushima's convolutional architecture. They are variations of multilayer perceptrons that use minimal preprocessing. This architecture allows CNNs to take advantage of the 2D structure of input data. Its unit connectivity pattern is inspired by the organization of the visual cortex. Units respond to stimuli in a restricted region of space known as the receptive field. Receptive fields partially overlap, over-covering the entire visual field. Unit response can be approximated mathematically by a convolution operation. CNNs are suitable for processing visual and other two-dimensional data. They have shown superior results in both image and speech applications. They can be trained with standard backpropagation. CNNs are easier to train than other regular, deep, feed-forward neural networks and have many fewer parameters to estimate. Capsule Neural Networks (CapsNet) add structures called capsules to a CNN and reuse output from several capsules to form more stable (with respect to various perturbations) representations. Examples of applications in computer vision include DeepDream and robot navigation. They have wide applications in image and video recognition, recommender systems and natural language processing. == Deep stacking network == A deep stacking network (DSN) (deep convex network) is based on a hierarchy of blocks of simplified neural network modules. It was introduced in 2011 by Deng and Yu. It formulates the learning as a convex optimization problem with a closed-form solution, emphasizing the mechanism's similarity to stacked generalization. Each DSN block is a simple module that is easy to train by itself in a supervised fashion without backpropagation for the entire blocks. Each block consists of a simplified multi-layer perceptron (MLP) with a single hidden layer. The hidden layer h has logistic sigmoidal units, and the output layer has linear units. Connections between these layers are represented by weight matrix U; input-to-hidden-layer connections have weight matrix W. Target vectors t form the columns of matrix T, and the input data vectors x form the columns of matrix X. The matrix of hidden units is H = σ ( W T X ) {\displaystyle {\boldsymbol {H}}=\sigma ({\boldsymbol {W}}^{T}{\boldsymbol {X}})} . Modules are trained in order, so lower-layer weights W are known at each stage. The function performs the element-wise logistic sigmoid operation. Each block estimates the same final label class y, and its estimate is concatenated with original input X to form the expanded input for the next block. Thus, the input to the first block contains the original data only, while downstream blocks' input adds the output of preceding blocks. Then learning the upper-layer weight matrix U given other weights in the network can be formulated as a convex optimization problem: min U T f = ‖ U T H − T ‖ F 2 , {\displaystyle \min _{U^{T}}f=\|{\boldsymbol {U}}^{T}{\boldsymbol {H}}-{\boldsymbol {T}}\|_{F}^{2},} which has a closed-form solution. Unlike other deep architectures, such as DBNs, the goal is not to discover the transformed feature representation. The structure of the hierarchy of this kind of architecture makes parallel learning straightforward, as a batch-mode optimization problem. In purely discriminative tasks, DSNs outperform conventional DBNs. === Tensor deep stacking networks === This architecture is a DSN extension. It offers two important improvements: it uses higher-order information from covariance statistics, and it transforms the non-convex problem of a lower-layer to a convex sub-problem of an upper-layer. TDSNs use covariance statistics in a bilinear mapping from each of two distinct sets of hidden units in the same layer to predictions, via a third-order tensor. While parallelization and scalability are not considered seriously in conventional DNNs, all learning for DSNs and TDSNs is done in batch mode, to allow parallelization. Parallelization allows scaling the design to larger (deeper) architectures and data sets. The basic architecture is suitable for diverse tasks such as classification and regression. == Physics-informed == Such a neural network is designed for the numerical solution of mathematical equations, such as differential, integral, delay, fractional and others. As input parameters, PINN accepts variables (spatial, temporal, and others), transmits them through the network block. At the output, it produces an approximate solution and substitutes it into the mathematical model, considering the initial and boundary conditions. If the solution does not satisfy the required accuracy, one uses the backpropagation and rectify the solution. Besides PINN, other architectures have been developed to produce surrogate models for scientific comput

    Read more →
  • Dominance-based rough set approach

    Dominance-based rough set approach

    The dominance-based rough set approach (DRSA) is an extension of rough set theory for multi-criteria decision analysis (MCDA), introduced by Greco, Matarazzo and Słowiński. The main change compared to the classical rough sets is the substitution for the indiscernibility relation by a dominance relation, which permits one to deal with inconsistencies typical to consideration of criteria and preference-ordered decision classes. == Multicriteria classification (sorting) == Multicriteria classification (sorting) is one of the problems considered within MCDA and can be stated as follows: given a set of objects evaluated by a set of criteria (attributes with preference-order domains), assign these objects to some pre-defined and preference-ordered decision classes, such that each object is assigned to exactly one class. Due to the preference ordering, improvement of evaluations of an object on the criteria should not worsen its class assignment. The sorting problem is very similar to the problem of classification, however, in the latter, the objects are evaluated by regular attributes and the decision classes are not necessarily preference ordered. The problem of multicriteria classification is also referred to as ordinal classification problem with monotonicity constraints and often appears in real-life application when ordinal and monotone properties follow from the domain knowledge about the problem. As an illustrative example, consider the problem of evaluation in a high school. The director of the school wants to assign students (objects) to three classes: bad, medium and good (notice that class good is preferred to medium and medium is preferred to bad). Each student is described by three criteria: level in Physics, Mathematics and Literature, each taking one of three possible values bad, medium and good. Criteria are preference-ordered and improving the level from one of the subjects should not result in worse global evaluation (class). As a more serious example, consider classification of bank clients, from the viewpoint of bankruptcy risk, into classes safe and risky. This may involve such characteristics as "return on equity (ROE)", "return on investment (ROI)" and "return on sales (ROS)". The domains of these attributes are not simply ordered but involve a preference order since, from the viewpoint of bank managers, greater values of ROE, ROI or ROS are better for clients being analysed for bankruptcy risk . Thus, these attributes are criteria. Neglecting this information in knowledge discovery may lead to wrong conclusions. == Data representation == === Decision table === In DRSA, data are often presented using a particular form of decision table. Formally, a DRSA decision table is a 4-tuple S = ⟨ U , Q , V , f ⟩ {\displaystyle S=\langle U,Q,V,f\rangle } , where U {\displaystyle U\,\!} is a finite set of objects, Q {\displaystyle Q\,\!} is a finite set of criteria, V = ⋃ q ∈ Q V q {\displaystyle V=\bigcup {}_{q\in Q}V_{q}} where V q {\displaystyle V_{q}\,\!} is the domain of the criterion q {\displaystyle q\,\!} and f : U × Q → V {\displaystyle f\colon U\times Q\to V} is an information function such that f ( x , q ) ∈ V q {\displaystyle f(x,q)\in V_{q}} for every ( x , q ) ∈ U × Q {\displaystyle (x,q)\in U\times Q} . The set Q {\displaystyle Q\,\!} is divided into condition criteria (set C ≠ ∅ {\displaystyle C\neq \emptyset } ) and the decision criterion (class) d {\displaystyle d\,\!} . Notice, that f ( x , q ) {\displaystyle f(x,q)\,\!} is an evaluation of object x {\displaystyle x\,\!} on criterion q ∈ C {\displaystyle q\in C} , while f ( x , d ) {\displaystyle f(x,d)\,\!} is the class assignment (decision value) of the object. An example of decision table is shown in Table 1 below. === Outranking relation === It is assumed that the domain of a criterion q ∈ Q {\displaystyle q\in Q} is completely preordered by an outranking relation ⪰ q {\displaystyle \succeq _{q}} ; x ⪰ q y {\displaystyle x\succeq _{q}y} means that x {\displaystyle x\,\!} is at least as good as (outranks) y {\displaystyle y\,\!} with respect to the criterion q {\displaystyle q\,\!} . Without loss of generality, we assume that the domain of q {\displaystyle q\,\!} is a subset of reals, V q ⊆ R {\displaystyle V_{q}\subseteq \mathbb {R} } , and that the outranking relation is a simple order between real numbers ≥ {\displaystyle \geq \,\!} such that the following relation holds: x ⪰ q y ⟺ f ( x , q ) ≥ f ( y , q ) {\displaystyle x\succeq _{q}y\iff f(x,q)\geq f(y,q)} . This relation is straightforward for gain-type ("the more, the better") criterion, e.g. company profit. For cost-type ("the less, the better") criterion, e.g. product price, this relation can be satisfied by negating the values from V q {\displaystyle V_{q}\,\!} . === Decision classes and class unions === Let T = { 1 , … , n } {\displaystyle T=\{1,\ldots ,n\}\,\!} . The domain of decision criterion, V d {\displaystyle V_{d}\,\!} consist of n {\displaystyle n\,\!} elements (without loss of generality we assume V d = T {\displaystyle V_{d}=T\,\!} ) and induces a partition of U {\displaystyle U\,\!} into n {\displaystyle n\,\!} classes Cl = { C l t , t ∈ T } {\displaystyle {\textbf {Cl}}=\{Cl_{t},t\in T\}} , where C l t = { x ∈ U : f ( x , d ) = t } {\displaystyle Cl_{t}=\{x\in U\colon f(x,d)=t\}} . Each object x ∈ U {\displaystyle x\in U} is assigned to one and only one class C l t , t ∈ T {\displaystyle Cl_{t},t\in T} . The classes are preference-ordered according to an increasing order of class indices, i.e. for all r , s ∈ T {\displaystyle r,s\in T} such that r ≥ s {\displaystyle r\geq s\,\!} , the objects from C l r {\displaystyle Cl_{r}\,\!} are strictly preferred to the objects from C l s {\displaystyle Cl_{s}\,\!} . For this reason, we can consider the upward and downward unions of classes, defined respectively, as: C l t ≥ = ⋃ s ≥ t C l s C l t ≤ = ⋃ s ≤ t C l s t ∈ T {\displaystyle Cl_{t}^{\geq }=\bigcup _{s\geq t}Cl_{s}\qquad Cl_{t}^{\leq }=\bigcup _{s\leq t}Cl_{s}\qquad t\in T} == Main concepts == === Dominance === We say that x {\displaystyle x\,\!} dominates y {\displaystyle y\,\!} with respect to P ⊆ C {\displaystyle P\subseteq C} , denoted by x D p y {\displaystyle xD_{p}y\,\!} , if x {\displaystyle x\,\!} is better than y {\displaystyle y\,\!} on every criterion from P {\displaystyle P\,\!} , x ⪰ q y , ∀ q ∈ P {\displaystyle x\succeq _{q}y,\,\forall q\in P} . For each P ⊆ C {\displaystyle P\subseteq C} , the dominance relation D P {\displaystyle D_{P}\,\!} is reflexive and transitive, i.e. it is a partial pre-order. Given P ⊆ C {\displaystyle P\subseteq C} and x ∈ U {\displaystyle x\in U} , let D P + ( x ) = { y ∈ U : y D p x } {\displaystyle D_{P}^{+}(x)=\{y\in U\colon yD_{p}x\}} D P − ( x ) = { y ∈ U : x D p y } {\displaystyle D_{P}^{-}(x)=\{y\in U\colon xD_{p}y\}} represent P-dominating set and P-dominated set with respect to x ∈ U {\displaystyle x\in U} , respectively. === Rough approximations === The key idea of the rough set philosophy is approximation of one knowledge by another knowledge. In DRSA, the knowledge being approximated is a collection of upward and downward unions of decision classes and the "granules of knowledge" used for approximation are P-dominating and P-dominated sets. The P-lower and the P-upper approximation of C l t ≥ , t ∈ T {\displaystyle Cl_{t}^{\geq },t\in T} with respect to P ⊆ C {\displaystyle P\subseteq C} , denoted as P _ ( C l t ≥ ) {\displaystyle {\underline {P}}(Cl_{t}^{\geq })} and P ¯ ( C l t ≥ ) {\displaystyle {\overline {P}}(Cl_{t}^{\geq })} , respectively, are defined as: P _ ( C l t ≥ ) = { x ∈ U : D P + ( x ) ⊆ C l t ≥ } {\displaystyle {\underline {P}}(Cl_{t}^{\geq })=\{x\in U\colon D_{P}^{+}(x)\subseteq Cl_{t}^{\geq }\}} P ¯ ( C l t ≥ ) = { x ∈ U : D P − ( x ) ∩ C l t ≥ ≠ ∅ } {\displaystyle {\overline {P}}(Cl_{t}^{\geq })=\{x\in U\colon D_{P}^{-}(x)\cap Cl_{t}^{\geq }\neq \emptyset \}} Analogously, the P-lower and the P-upper approximation of C l t ≤ , t ∈ T {\displaystyle Cl_{t}^{\leq },t\in T} with respect to P ⊆ C {\displaystyle P\subseteq C} , denoted as P _ ( C l t ≤ ) {\displaystyle {\underline {P}}(Cl_{t}^{\leq })} and P ¯ ( C l t ≤ ) {\displaystyle {\overline {P}}(Cl_{t}^{\leq })} , respectively, are defined as: P _ ( C l t ≤ ) = { x ∈ U : D P − ( x ) ⊆ C l t ≤ } {\displaystyle {\underline {P}}(Cl_{t}^{\leq })=\{x\in U\colon D_{P}^{-}(x)\subseteq Cl_{t}^{\leq }\}} P ¯ ( C l t ≤ ) = { x ∈ U : D P + ( x ) ∩ C l t ≤ ≠ ∅ } {\displaystyle {\overline {P}}(Cl_{t}^{\leq })=\{x\in U\colon D_{P}^{+}(x)\cap Cl_{t}^{\leq }\neq \emptyset \}} Lower approximations group the objects which certainly belong to class union C l t ≥ {\displaystyle Cl_{t}^{\geq }} (respectively C l t ≤ {\displaystyle Cl_{t}^{\leq }} ). This certainty comes from the fact, that object x ∈ U {\displaystyle x\in U} belongs to the lower approximation P _ ( C l t ≥ ) {\displaystyle {\underline {P}}(Cl_{t}^{\geq })} (respectively P _ ( C l t ≤ ) {\displaystyle {\underl

    Read more →