AI Generator Jokes

AI Generator Jokes — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Natural language processing

    Natural language processing

    Natural language processing (NLP) is the processing of natural language information by a computer. NLP is a subfield of computer science and is closely associated with artificial intelligence. NLP is also related to information retrieval, knowledge representation, computational linguistics, and linguistics more broadly. Major processing tasks in an NLP system include: speech recognition, text classification, natural language understanding, and natural language generation. == History == Natural language processing has its roots in the 1950s. Already in 1950, Alan Turing published an article titled "Computing Machinery and Intelligence," which proposed what is now called the Turing test as a criterion of intelligence, though at the time that was not articulated as a problem separate from artificial intelligence. The proposed test includes a task that involves the automated interpretation and generation of natural language. === Symbolic NLP (1950s – early 1990s) === The premise of symbolic NLP is often illustrated using John Searle's Chinese room thought experiment: Given a collection of rules (e.g., a Chinese phrasebook, with questions and matching answers), the computer emulates natural language understanding (or other NLP tasks) by applying those rules to the data it confronts. 1950s: The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. The authors claimed that within three or five years, machine translation would be a solved problem. However, real progress was much slower, and after the ALPAC report in 1966, which found that ten years of research had failed to fulfill the expectations, funding for machine translation was dramatically reduced. Little further research in machine translation was conducted in America (though some research continued elsewhere, such as Japan and Europe) until the late 1980s when the first statistical machine translation systems were developed. 1960s: Some notably successful natural language processing systems developed in the 1960s were SHRDLU, a natural language system working in restricted "blocks worlds" with restricted vocabularies, and ELIZA, a simulation of Rogerian psychotherapy, written by Joseph Weizenbaum between 1964 and 1966. Despite using minimal information about human thought or emotion, ELIZA was able to produce interactions that appeared human-like. When the "patient" exceeded the very small knowledge base, ELIZA might provide a generic response, for example, responding to "My head hurts" with "Why do you say your head hurts?". Ross Quillian's successful work on natural language was demonstrated with a vocabulary of only twenty words, because that was all that would fit in a computer memory at the time. 1970s: During the 1970s, many programmers began to write "conceptual ontologies", which structured real-world information into computer-understandable data. Examples are MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert 1981). During this time, the first chatterbots were written (e.g., PARRY). 1980s: The 1980s and early 1990s mark the heyday of symbolic methods in NLP. Focus areas of the time included research on rule-based parsing (e.g., the development of HPSG as a computational operationalization of generative grammar), morphology (e.g., two-level morphology), semantics (e.g., Lesk algorithm), reference (e.g., within Centering Theory) and other areas of natural language understanding (e.g., in the Rhetorical Structure Theory). Other lines of research were continued, e.g., the development of chatterbots with Racter and Jabberwacky. An important development (that eventually led to the statistical turn in the 1990s) was the rising importance of quantitative evaluation in this period. === Statistical NLP (1990s–present) === Up until the 1980s, most natural language processing systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in natural language processing with the introduction of machine learning algorithms for language processing. This shift was influenced by increasing computational power (see Moore's law) and a decline in the dominance of Chomskyan linguistic theories (e.g. transformational grammar), whose theoretical underpinnings discouraged the sort of corpus linguistics that underlies the machine-learning approach to language processing. 1990s: Many of the notable early successes in statistical methods in NLP occurred in the field of machine translation, due especially to work at IBM Research, such as IBM alignment models. These systems were able to take advantage of existing multilingual textual corpora that had been produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. However, many systems relied on corpora that were specifically developed for the tasks they were designed to perform. This reliance has been a major limitation to their broader effectiveness and continues to affect similar systems. Consequently, significant research has focused on methods for learning effectively from limited amounts of data. 2000s: With the growth of the web, increasing amounts of raw (unannotated) language data have become available since the mid-1990s. Research has thus increasingly focused on unsupervised and semi-supervised learning algorithms. Such algorithms can learn from data that has not been hand-annotated with the desired answers or using a combination of annotated and non-annotated data. Generally, this task is much more difficult than supervised learning, and typically produces less accurate results for a given amount of input data. However, large quantities of non-annotated data are available (including, among other things, the entire content of the World Wide Web), which can often make up for the worse efficiency if the algorithm used has a low enough time complexity to be practical. 2003: word n-gram model, at the time the best statistical algorithm, is outperformed by a multi-layer perceptron (with a single hidden layer and context length of several words, trained on up to 14 million words, by Bengio et al.) 2010: Tomáš Mikolov (then a PhD student at Brno University of Technology) with co-authors applied a simple recurrent neural network with a single hidden layer to language modeling, and in the following years he went on to develop Word2vec. In the 2010s, representation learning and deep neural network-style (featuring many hidden layers) machine learning methods became widespread in natural language processing. This shift gained momentum due to results showing that such techniques can achieve state-of-the-art results in many natural language tasks, e.g., in language modeling and parsing. This is increasingly important in medicine and healthcare, where NLP helps analyze notes and text in electronic health records that would otherwise be inaccessible for study when seeking to improve care or protect patient privacy. == Approaches: Symbolic, statistical, neural networks == Symbolic approach, i.e., the hand-coding of a set of rules for manipulating symbols, coupled with a dictionary lookup, was historically the first approach used both by AI in general and by NLP in particular: such as by writing grammars or devising heuristic rules for stemming. Machine learning approaches, which include both statistical and neural networks, on the other hand, have many advantages over the symbolic approach: both statistical and neural network methods tend to focus more on the most common cases extracted from a corpus of texts, whereas the rule-based approach needs to provide rules for both rare and common cases equally. language models, produced by either statistical or neural networks methods, are more robust to both unfamiliar (e.g. containing words or structures that have not been seen before) and erroneous input (e.g. with misspelled words or words accidentally omitted) in comparison to the rule-based systems, which are also more costly to produce. the larger such a (probabilistic) language model is, the more accurate it becomes, in contrast to rule-based systems that can gain accuracy only by increasing the amount and complexity of the rules leading to intractability problems. Rule-based systems are commonly used: when the amount of training data is insufficient to successfully apply machine learning methods, e.g., for the machine translation of low-resource languages such as provided by the Apertium system, for preprocessing in NLP pipelines, e.g., tokenization, or for post-processing and transforming the output of NLP pipelines, e.g., for knowledge extraction from syntactic parses. === Statistical approach === In the late 1980s and mid-1990s, the statistical approach ended a peri

    Read more →
  • Latent class model

    Latent class model

    In statistics, a latent class model (LCM) is a model for clustering multivariate discrete data. It assumes that the data arise from a mixture of discrete distributions, within each of which the variables are independent. It is called a latent class model because the class to which each data point belongs is unobserved (or latent). Latent class analysis (LCA) is a subset of structural equation modeling used to find groups or subtypes of cases in multivariate categorical data. These groups or subtypes of cases are called "latent classes". When faced with the following situation, a researcher might opt to use LCA to better understand the data: Symptoms a, b, c, and d have been recorded in a variety of patients diagnosed with diseases X, Y, and Z. Disease X is associated with symptoms a, b, and c; disease Y is linked to symptoms b, c, and d; and disease Z is connected to symptoms a, c, and d. In this context, the LCA would attempt to detect the presence of latent classes (i.e., the disease entities), thus creating patterns of association in the symptoms. As in factor analysis, LCA can also be used to classify cases according to their maximum likelihood class membership probability. The key criterion for resolving the LCA is identifying latent classes in which the observed symptom associations are effectively rendered null. This is because within each class, the diseases responsible for the symptoms create a structure of dependencies. As a result, the symptoms become conditionally independent, meaning that, given the class a case belongs to, the symptoms are no longer related to one another. == Model == Within each latent class, the observed variables are statistically independent—an essential aspect of latent class modeling. Usually, the observed variables are statistically dependent. By introducing the latent variable, independence is restored in the sense that within classes, variables are independent (local independence). Therefore, the association between the observed variables is explained by the classes of the latent variable (McCutcheon, 1987). In one form, the LCM is written as p i 1 , i 2 , … , i N ≈ ∑ t T p t ∏ n N p i n , t n , {\displaystyle p_{i_{1},i_{2},\ldots ,i_{N}}\approx \sum _{t}^{T}p_{t}\,\prod _{n}^{N}p_{i_{n},t}^{n},} where T {\displaystyle T} is the number of latent classes and p t {\displaystyle p_{t}} are the so-called recruitment or unconditional probabilities that should sum to one. p i n , t n {\displaystyle p_{i_{n},t}^{n}} are the marginal or conditional probabilities. For a two-way latent class model, the form is p i j ≈ ∑ t T p t p i t p j t . {\displaystyle p_{ij}\approx \sum _{t}^{T}p_{t}\,p_{it}\,p_{jt}.} This two-way model is related to probabilistic latent semantic analysis and non-negative matrix factorization. The probability model used in LCA is closely related to the Naive Bayes classifier. The main difference is that in LCA, the class membership of an individual is a latent variable, whereas in Naive Bayes classifiers, the class membership is an observed label. == Related methods == There are a number of methods with distinct names and uses that share a common relationship. Cluster analysis is, like LCA, used to discover taxon-like groups of cases in data. Multivariate mixture estimation (MME) is applicable to continuous data and assumes that such data arise from a mixture of distributions, such as a set of heights arising from a mixture of men and women. If a multivariate mixture estimation is constrained so that measures must be uncorrelated within each distribution, it is termed latent profile analysis. Modified to handle discrete data, this constrained analysis is known as LCA. Discrete latent trait models further constrain the classes to form from segments of a single dimension, allocating members to classes based on that dimension. An example would be assigning cases to social classes based on ability or merit. In a practical instance, the variables could be multiple choice items of a political questionnaire. In this case, the data consists of an N-way contingency table with answers to the items for a number of respondents. In this example, the latent variable refers to political opinion, and the latent classes to political groups. Given group membership, the conditional probabilities specify the chance that certain answers are chosen. == Application == LCA may be used in many fields, such as: collaborative filtering, Behavior Genetics and Evaluation of diagnostic tests.

    Read more →
  • Out-of-bag error

    Out-of-bag error

    Out-of-bag (OOB) error, also called out-of-bag estimate, is a method of measuring the prediction error of random forests, boosted decision trees, and other machine learning models utilizing bootstrap aggregating (bagging). Bagging uses subsampling with replacement to create training samples for the model to learn from. OOB error is the mean prediction error on each training sample xi, using only the trees that did not have xi in their bootstrap sample. Bootstrap aggregating allows one to define an out-of-bag estimate of the prediction performance improvement by evaluating predictions on those observations that were not used in the building of the next base learner. == Out-of-bag dataset == When bootstrap aggregating is performed, two independent sets are created. One set, the bootstrap sample, is the data chosen to be "in-the-bag" by sampling with replacement. The out-of-bag set is all data not chosen in the sampling process. When this process is repeated, such as when building a random forest, many bootstrap samples and OOB sets are created. The OOB sets can be aggregated into one dataset, but each sample is only considered out-of-bag for the trees that do not include it in their bootstrap sample. The picture below shows that for each bag sampled, the data is separated into two groups. This example shows how bagging could be used in the context of diagnosing disease. A set of patients are the original dataset, but each model is trained only by the patients in its bag. The patients in each out-of-bag set can be used to test their respective models. The test would consider whether the model can accurately determine if the patient has the disease. == Calculating out-of-bag error == Since each out-of-bag set is not used to train the model, it is a good test for the performance of the model. The specific calculation of OOB error depends on the implementation of the model, but a general calculation is as follows. Find all models (or trees, in the case of a random forest) that are not trained by the OOB instance. Take the majority vote of these models' result for the OOB instance, compared to the true value of the OOB instance. Compile the OOB error for all instances in the OOB dataset. The bagging process can be customized to fit the needs of a model. To ensure an accurate model, the bootstrap training sample size should be close to that of the original set. Also, the number of iterations (trees) of the model (forest) should be considered to find the true OOB error. The OOB error will stabilize over many iterations so starting with a high number of iterations is a good idea. Shown in the example to the right, the OOB error can be found using the method above once the forest is set up. == Comparison to cross-validation == Out-of-bag error and cross-validation (CV) are different methods of measuring the error estimate of a machine learning model. Over many iterations, the two methods should produce a very similar error estimate. That is, once the OOB error stabilizes, it will converge to the cross-validation (specifically leave-one-out cross-validation) error. The advantage of the OOB method is that it requires less computation and allows one to test the model as it is being trained. == Accuracy and Consistency == Out-of-bag error is used frequently for error estimation within random forests but with the conclusion of a study done by Silke Janitza and Roman Hornung, out-of-bag error has shown to overestimate in settings that include an equal number of observations from all response classes (balanced samples), small sample sizes, a large number of predictor variables, small correlation between predictors, and weak effects.

    Read more →
  • Tensor sketch

    Tensor sketch

    In statistics, machine learning and algorithms, a tensor sketch is a type of dimensionality reduction that is particularly efficient when applied to vectors that have tensor structure. Such a sketch can be used to speed up explicit kernel methods, bilinear pooling in neural networks and is a cornerstone in many numerical linear algebra algorithms. == Mathematical definition == Mathematically, a dimensionality reduction or sketching matrix is a matrix M ∈ R k × d {\displaystyle M\in \mathbb {R} ^{k\times d}} , where k < d {\displaystyle k Read more →

  • SwissCovid

    SwissCovid

    SwissCovid is a COVID-19 contact tracing app used for digital contact tracing in Switzerland. Use of the app is voluntary and based on a decentralized approach using Bluetooth Low Energy and Decentralized Privacy-Preserving Proximity Tracing (dp3t). == Development == The app was developed in collaboration with the FOPH by Federal Office for Information Technology, Systems and Communications FOITT, École polytechnique fédérale de Lausanne (EPFL) and the Swiss Federal Institute of Technology in Zurich (ETH) as well as other experts. == Non-interoperability with applications in European countries == There is an agreement between EU countries to make applications compatible. However, there is no legal basis for the SwissCovid application to be part of this portal even though technically speaking it is ready, according to Sang-Ill Kim, head of the digital transformation department of the Federal Office of Public Health. == Criticism == === Not full open source and dependence on Google and Apple === In June 2020, researchers Serge Vaudenay and Martin Vuagnoux published a critical analysis of the application, noting that it relies heavily on Google and Apple's exposure notification system, which is integrated into their respective Android and iOS operating systems. Since Google and Apple have not released the full source code of this system, this would call into question the truly open source nature of the application. The researchers note that the dp3t collective, which includes the developers of the application, has asked Google and Apple to release their code. Moreover, they criticize the official description of the application and its functionalities, as well as the adequacy of the legal basis for its effective operation. === Cyber attacks === Professor Serge Vaudenay and Martin Vuagnoux identify also various security vulnerabilities in the application. The system would thus allow a third party to trace the movements of a phone using the application by means of Bluetooth sensors scattered along its path, for example in a building. Another possible attack would be to copy identifiers from the phones of people who may be ill (for example, in a hospital), and to reproduce those identifiers in order to receive notification of exposure to COVID-19 and illegitimately benefit from quarantine (thus entitling them to paid leave, a postponed examination, or other benefits). The system would also allow a third party to use a phone using the application by means of Bluetooth sensors scattered along the way. Paul-Olivier Dehaye of Personaldata.io and professor Joel Reardon of the University of Calgary published in June 2020 several examples of AEM (Associated Encrypted Metadata) replay and manipulation attacks via software development kits (SDKs) found in benign third-party mobile applications downloaded by the general public and having the phone's Bluetooth access permissions and in September 2020 a paper indicating that "Bluetooth-based proximity tracing apps are fundamentally insecure with respect to an attacker leveraging a malevolent app or SDK". === Costs === According to a publication by the federal administration, "the costs of developing the software for the mobile phone application, the GR back-end and the code management system as well as the costs for access management for the cantonal doctors' services are estimated at a one-off amount of 1.65 million francs. However, the Zurich-based company Ubique, responsible for the development of the application, was finally awarded the mandate to develop the application for an amount of 1.8 million francs. Through the Botnar Foundation based in Basel, École polytechnique fédérale de Lausanne received 3.5 million Swiss francs for the development of the application

    Read more →
  • Generalized multidimensional scaling

    Generalized multidimensional scaling

    Generalized multidimensional scaling (GMDS) is an extension of metric multidimensional scaling, in which the target space is non-Euclidean. When the dissimilarities are distances on a surface and the target space is another surface, GMDS allows finding the minimum-distortion embedding of one surface into another. GMDS is an emerging research direction. Currently, main applications are recognition of deformable objects (e.g. for three-dimensional face recognition) and texture mapping.

    Read more →
  • Arabic Speech Corpus

    Arabic Speech Corpus

    The Arabic Speech Corpus is a Modern Standard Arabic (MSA) speech corpus for speech synthesis. The corpus contains phonetic and orthographic transcriptions of more than 3.7 hours of MSA speech aligned with recorded speech on the phoneme level. The annotations include word stress marks on the individual phonemes. The Arabic Speech Corpus was built as part of a doctoral project by Nawar Halabi at the University of Southampton funded by MicroLinkPC who own an exclusive license to commercialise the corpus, but the corpus is available for strictly non-commercial purposes through the official Arabic Speech Corpus website. It is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. == Purpose == The corpus was mainly built for speech synthesis purposes, specifically Speech Synthesis, but the corpus has been used for building HMM based voices in Arabic. It was also used to automatically align other speech corpora with their phonetic transcript and could be used as part of a larger corpus for training speech recognition systems. == Contents == The package contains the following: 1813 .wav files containing spoken utterances. 1813 .lab files containing text utterances. 1813 .TextGrid files containing the phoneme labels with time stamps of the boundaries where these occur in the .wav files. phonetic-transcript.txt which has the form "[wav_filename]" "[Phoneme Sequence]" in every line. orthographic-transcript.txt which has the form "[wav_filename]" "[Orthographic Transcript]" in every line. Orthography is in Buckwalter Format which is friendlier where there is software that does not read Arabic script. It can be easily converted back to Arabic. There is an extra 18 minutes of fully annotated corpus (separate from above but with the same structure as above) which was used to evaluated the corpus (see PhD thesis). The corpus was also used to prove that using automatically extracted, orthography-based stress marks improve the quality of speech synthesis in MSA.

    Read more →
  • IBM Watsonx

    IBM Watsonx

    Watsonx is a platform by IBM for building and managing artificial intelligence (AI) applications for business use. Released on May 9, 2023, the platform provides software tools and infrastructure for companies to work with both IBM's own AI models and models from third-party sources. The platform consists of three main components: watsonx.ai, a studio for training, validating, and deploying AI models; watsonx.data, a system for storing and managing data used by the models; and watsonx.governance, a toolkit to ensure AI applications are compliant with company policies and regulations. A key feature of the platform is that it can be trained on a company's private data to perform specialized tasks, a process known as fine-tuning. IBM states that this client-specific data is not used to train its own models. == History == Watsonx was introduced on May 9, 2023, at the annual IBM Think conference, as a platform that includes multiple services. Just like Watson AI computer with the similar name, Watsonx was named after Thomas J. Watson, IBM's founder and first CEO. On February 13, 2024, Anaconda partnered with IBM to embed its open-source Python packages into Watsonx. Watsonx is used at ESPN's Fantasy Football App for managing players' performance, and by Italian telecommunications company Wind Tre. It was employed to generate editorial content around nominees during the 66th Annual Grammy Awards. In 2025, Wimbledon integrated IBM watsonx generative AI into its app and website. Integrated with IBM Safer Payments, IBM watsonx has been used in banking sector fraud detection and anti-money laundering (AML) systems. == Services == === watsonx.ai === Watsonx.ai is a platform that allows AI developers to leverage a wide range of LLMs under IBM's own Granite series and others such as Facebook's LLaMA-2, free and open-source model Mistral, and many others present in the Hugging Face community. These models come pre-trained and optimized for various natural language processing (NLP) applications.The platform also allows fine-tuning with its Tuning Studio. === watsonx.data === Watsonx.data is a platform designed to assist clients in addressing issues related to data volume, complexity, cost, and governance.. The platform facilitates seamless data access, whether stored in the cloud or on-premises, through a single entry point. === watsonx.governance === Watsonx.governance is a platform that utilizes IBM's AI capabilities to implement AI lifecycle governance. This helps them manage risks and maintain compliance with evolving AI and industry regulations, while reducing AI bias through automated oversight.

    Read more →
  • Shader lamps

    Shader lamps

    Shader lamps is a computer graphic technique used to change the appearance of physical objects. The still or moving objects are illuminated, using one or more video projectors, by static or animated texture or video stream. The method was invented at University of North Carolina at Chapel Hill by Ramesh Raskar, Greg Welch, Kok-lim Low and Deepak Bandyopadhyay in 1999 [1] as a follow on to Spatial Augmented Reality [2] also invented at University of North Carolina at Chapel Hill in 1998 by Ramesh Raskar, Greg Welch and Henry Fuchs. A 3D graphic rendering software is typically used to compute the deformation caused by the non perpendicular, non-planar or even complex projection surface. Complex objects (or aggregation of multiple simple objects) create self shadows that must be compensated by using several projectors. The objects are typically replaced by neutral color ones, the projection giving all its visual properties, thus the name shader lamps. The technique can be used to create a sense of invisibility, by rendering transparency. The object is illuminated not by a replacement of its own visual properties, but by the corresponding visual surface placed behind the object as seen from an arbitrary viewing point.

    Read more →
  • European Conference on Computer Vision

    European Conference on Computer Vision

    The European Conference on Computer Vision (ECCV) is a biennial research conference with the proceedings published by Springer Science+Business Media. Similar to ICCV in scope and quality, it is held those years which ICCV is not. It is considered to be one of the top conferences in computer vision, alongside CVPR and ICCV, with an 'A' rating from the Australian Ranking of ICT Conferences and an 'A1' rating from the Brazilian ministry of education. The acceptance rate for ECCV 2010 was 24.4% for posters and 3.3% for oral presentations. Like other top computer vision conferences, ECCV has tutorial talks, technical sessions, and poster sessions. The conference is usually spread over five to six days with the main technical program occupying three days in the middle, and tutorial and workshops, focused on specific topics, being held in the beginning and at the end. The ECCV presents the Koenderink Prize annually to recognize fundamental contributions in computer vision. == Location == The conference is usually held in autumn in Europe.

    Read more →
  • Evolutionary programming

    Evolutionary programming

    Evolutionary programming is an evolutionary algorithm, where a share of new population is created by mutation of previous population without crossover. Evolutionary programming differs from evolution strategy ES( μ + λ {\displaystyle \mu +\lambda } ) in one detail. All individuals are selected for the new population, while in ES( μ + λ {\displaystyle \mu +\lambda } ), every individual has the same probability to be selected. It is one of the four major evolutionary algorithm paradigms. == History == It was first used by Lawrence J. Fogel in the US in 1960 in order to use simulated evolution as a learning process aiming to generate artificial intelligence. It was used to evolve finite-state machines as predictors.

    Read more →
  • Multiple kernel learning

    Multiple kernel learning

    Multiple kernel learning refers to a set of machine learning methods that use a predefined set of kernels and learn an optimal linear or non-linear combination of kernels as part of the algorithm. Reasons to use multiple kernel learning include a) the ability to select for an optimal kernel and parameters from a larger set of kernels, reducing bias due to kernel selection while allowing for more automated machine learning methods, and b) combining data from different sources (e.g. sound and images from a video) that have different notions of similarity and thus require different kernels. Instead of creating a new kernel, multiple kernel algorithms can be used to combine kernels already established for each individual data source. Multiple kernel learning approaches have been used in many applications, such as event recognition in video, object recognition in images, and biomedical data fusion. == Algorithms == Multiple kernel learning algorithms have been developed for supervised, semi-supervised, as well as unsupervised learning. Most work has been done on the supervised learning case with linear combinations of kernels, however, many algorithms have been developed. The basic idea behind multiple kernel learning algorithms is to add an extra parameter to the minimization problem of the learning algorithm. As an example, consider the case of supervised learning of a linear combination of a set of n {\displaystyle n} kernels K {\displaystyle K} . We introduce a new kernel K ′ = ∑ i = 1 n β i K i {\displaystyle K'=\sum _{i=1}^{n}\beta _{i}K_{i}} , where β {\displaystyle \beta } is a vector of coefficients for each kernel. Because the kernels are additive (due to properties of reproducing kernel Hilbert spaces), this new function is still a kernel. For a set of data X {\displaystyle X} with labels Y {\displaystyle Y} , the minimization problem can then be written as min β , c E ( Y , K ′ c ) + R ( K , c ) {\displaystyle \min _{\beta ,c}\mathrm {E} (Y,K'c)+R(K,c)} where E {\displaystyle \mathrm {E} } is an error function and R {\displaystyle R} is a regularization term. E {\displaystyle \mathrm {E} } is typically the square loss function (Tikhonov regularization) or the hinge loss function (for SVM algorithms), and R {\displaystyle R} is usually an ℓ n {\displaystyle \ell _{n}} norm or some combination of the norms (i.e. elastic net regularization). This optimization problem can then be solved by standard optimization methods. Adaptations of existing techniques such as the Sequential Minimal Optimization have also been developed for multiple kernel SVM-based methods. === Supervised learning === For supervised learning, there are many other algorithms that use different methods to learn the form of the kernel. The following categorization has been proposed by Gonen and Alpaydın (2011) ==== Fixed rules approaches ==== Fixed rules approaches such as the linear combination algorithm described above use rules to set the combination of the kernels. These do not require parameterization and use rules like summation and multiplication to combine the kernels. The weighting is learned in the algorithm. Other examples of fixed rules include pairwise kernels, which are of the form k ( ( x 1 i , x 1 j ) , ( x 2 i , x 2 j ) ) = k ( x 1 i , x 2 i ) k ( x 1 j , x 2 j ) + k ( x 1 i , x 2 j ) k ( x 1 j , x 2 i ) {\displaystyle k((x_{1i},x_{1j}),(x_{2i},x_{2j}))=k(x_{1i},x_{2i})k(x_{1j},x_{2j})+k(x_{1i},x_{2j})k(x_{1j},x_{2i})} . These pairwise approaches have been used in predicting protein-protein interactions. ==== Heuristic approaches ==== These algorithms use a combination function that is parameterized. The parameters are generally defined for each individual kernel based on single-kernel performance or some computation from the kernel matrix. Examples of these include the kernel from Tenabe et al. (2008). Letting π m {\displaystyle \pi _{m}} be the accuracy obtained using only K m {\displaystyle K_{m}} , and letting δ {\displaystyle \delta } be a threshold less than the minimum of the single-kernel accuracies, we can define β m = π m − δ ∑ h = 1 n ( π h − δ ) {\displaystyle \beta _{m}={\frac {\pi _{m}-\delta }{\sum _{h=1}^{n}(\pi _{h}-\delta )}}} Other approaches use a definition of kernel similarity, such as A ( K 1 , K 2 ) = ⟨ K 1 , K 2 ⟩ ⟨ K 1 , K 1 ⟩ ⟨ K 2 , K 2 ⟩ {\displaystyle A(K_{1},K_{2})={\frac {\langle K_{1},K_{2}\rangle }{\sqrt {\langle K_{1},K_{1}\rangle \langle K_{2},K_{2}\rangle }}}} Using this measure, Qui and Lane (2009) used the following heuristic to define β m = A ( K m , Y Y T ) ∑ h = 1 n A ( K h , Y Y T ) {\displaystyle \beta _{m}={\frac {A(K_{m},YY^{T})}{\sum _{h=1}^{n}A(K_{h},YY^{T})}}} ==== Optimization approaches ==== These approaches solve an optimization problem to determine parameters for the kernel combination function. This has been done with similarity measures and structural risk minimization approaches. For similarity measures such as the one defined above, the problem can be formulated as follows: max β , tr ⁡ ( K t r a ′ ) = 1 , K ′ ≥ 0 A ( K t r a ′ , Y Y T ) . {\displaystyle \max _{\beta ,\operatorname {tr} (K'_{tra})=1,K'\geq 0}A(K'_{tra},YY^{T}).} where K t r a ′ {\displaystyle K'_{tra}} is the kernel of the training set. Structural risk minimization approaches that have been used include linear approaches, such as that used by Lanckriet et al. (2002). We can define the implausibility of a kernel ω ( K ) {\displaystyle \omega (K)} to be the value of the objective function after solving a canonical SVM problem. We can then solve the following minimization problem: min tr ⁡ ( K t r a ′ ) = c ω ( K t r a ′ ) {\displaystyle \min _{\operatorname {tr} (K'_{tra})=c}\omega (K'_{tra})} where c {\displaystyle c} is a positive constant. Many other variations exist on the same idea, with different methods of refining and solving the problem, e.g. with nonnegative weights for individual kernels and using non-linear combinations of kernels. ==== Bayesian approaches ==== Bayesian approaches put priors on the kernel parameters and learn the parameter values from the priors and the base algorithm. For example, the decision function can be written as f ( x ) = ∑ i = 0 n α i ∑ m = 1 p η m K m ( x i m , x m ) {\displaystyle f(x)=\sum _{i=0}^{n}\alpha _{i}\sum _{m=1}^{p}\eta _{m}K_{m}(x_{i}^{m},x^{m})} η {\displaystyle \eta } can be modeled with a Dirichlet prior and α {\displaystyle \alpha } can be modeled with a zero-mean Gaussian and an inverse gamma variance prior. This model is then optimized using a customized multinomial probit approach with a Gibbs sampler. These methods have been used successfully in applications such as protein fold recognition and protein homology problems ==== Boosting approaches ==== Boosting approaches add new kernels iteratively until some stopping criteria that is a function of performance is reached. An example of this is the MARK model developed by Bennett et al. (2002) f ( x ) = ∑ i = 1 N ∑ m = 1 P α i m K m ( x i m , x m ) + b {\displaystyle f(x)=\sum _{i=1}^{N}\sum _{m=1}^{P}\alpha _{i}^{m}K_{m}(x_{i}^{m},x^{m})+b} The parameters α i m {\displaystyle \alpha _{i}^{m}} and b {\displaystyle b} are learned by gradient descent on a coordinate basis. In this way, each iteration of the descent algorithm identifies the best kernel column to choose at each particular iteration and adds that to the combined kernel. The model is then rerun to generate the optimal weights α i {\displaystyle \alpha _{i}} and b {\displaystyle b} . === Semisupervised learning === Semisupervised learning approaches to multiple kernel learning are similar to other extensions of supervised learning approaches. An inductive procedure has been developed that uses a log-likelihood empirical loss and group LASSO regularization with conditional expectation consensus on unlabeled data for image categorization. We can define the problem as follows. Let L = ( x i , y i ) {\displaystyle L={(x_{i},y_{i})}} be the labeled data, and let U = x i {\displaystyle U={x_{i}}} be the set of unlabeled data. Then, we can write the decision function as follows. f ( x ) = α 0 + ∑ i = 1 | L | α i K i ( x ) {\displaystyle f(x)=\alpha _{0}+\sum _{i=1}^{|L|}\alpha _{i}K_{i}(x)} The problem can be written as min f L ( f ) + λ R ( f ) + γ Θ ( f ) {\displaystyle \min _{f}L(f)+\lambda R(f)+\gamma \Theta (f)} where L {\displaystyle L} is the loss function (weighted negative log-likelihood in this case), R {\displaystyle R} is the regularization parameter (Group LASSO in this case), and Θ {\displaystyle \Theta } is the conditional expectation consensus (CEC) penalty on unlabeled data. The CEC penalty is defined as follows. Let the marginal kernel density for all the data be g m π ( x ) = ⟨ ϕ m π , ψ m ( x ) ⟩ {\displaystyle g_{m}^{\pi }(x)=\langle \phi _{m}^{\pi },\psi _{m}(x)\rangle } where ψ m ( x ) = [ K m ( x 1 , x ) , … , K m ( x L , x ) ] T {\displaystyle \psi _{m}(x)=[K_{m}(x_{1},x),\ldots ,K_{m}(x_{L},x)]^{T}} (the kernel distance between the labe

    Read more →
  • Comparison of operating systems

    Comparison of operating systems

    These tables provide a comparison of operating systems, of computer devices, as listing general and technical information for a number of widely used and currently available PC or handheld (including smartphone and tablet computer) operating systems. The article "Usage share of operating systems" provides a broader, and more general, comparison of operating systems that includes servers, mainframes and supercomputers. Because of the large number and variety of available Linux distributions, they are all grouped under a single entry; see comparison of Linux distributions for a detailed comparison. There is also a variety of BSD and DOS operating systems, covered in comparison of BSD operating systems and comparison of DOS operating systems. == Nomenclature == The nomenclature for operating systems varies among providers and sometimes within providers. For purposes of this article the terms used are; kernel In some operating systems, the OS is split into a low level region called the kernel and higher level code that relies on the kernel. Typically the kernel implements processes but its code does not run as part of a process. hybrid kernel monolithic kernel Nucleus In some operating systems there is OS code permanently present in a contiguous region of memory addressable by unprivileged code; in IBM systems this is typically referred to as the nucleus. The nucleus typically contains both code that requires special privileges and code that can run in an unprivileged state. Typically some code in the nucleus runs in the context of a dispatching unit, e.g., address space, process, task, thread, while other code runs independent of any dispatching unit. In contemporary operating systems unprivileged applications cannot alter the nucleus. License and pricing policies vary widely among different systems. Among others, the tables below use the following terms: BSD BSD licenses are a family of permissive free software licenses, imposing minimal restrictions on the use and distribution of covered software. bundled The fee is included in the price of the hardware == General information == == Technical information == == Security == == Commands == For POSIX compliant (or partly compliant) systems like FreeBSD, Linux, macOS or Solaris, the basic commands are the same because they are standardized. NOTE: Linux systems may vary by distribution which specific program, or even 'command' is called, via the POSIX alias function. For example, if you wanted to use the DOS dir to give you a directory listing with one detailed file listing per line you could use alias dir='ls -lahF' (e.g. in a session configuration file).

    Read more →
  • Unique negative dimension

    Unique negative dimension

    Unique negative dimension (UND) is a complexity measure for the model of learning from positive examples. The unique negative dimension of a class C {\displaystyle C} of concepts is the size of the maximum subclass D ⊆ C {\displaystyle D\subseteq C} such that for every concept c ∈ D {\displaystyle c\in D} , we have ∩ ( D ∖ { c } ) ∖ c {\displaystyle \cap (D\setminus \{c\})\setminus c} is nonempty. This concept was originally proposed by M. Gereb-Graus in "Complexity of learning from one-side examples", Technical Report TR-20-89, Harvard University Division of Engineering and Applied Science, 1989.

    Read more →
  • Artificial development

    Artificial development

    Artificial development, also known as artificial embryogeny or machine intelligence or computational development, is an area of computer science and engineering concerned with computational models motivated by genotype–phenotype mappings in biological systems. Artificial development is often considered a sub-field of evolutionary computation, although the principles of artificial development have also been used within stand-alone computational models. Within evolutionary computation, the need for artificial development techniques was motivated by the perceived lack of scalability and evolvability of direct solution encodings (Tufte, 2008). Artificial development entails indirect solution encoding. Rather than describing a solution directly, an indirect encoding describes (either explicitly or implicitly) the process by which a solution is constructed. Often, but not always, these indirect encodings are based upon biological principles of development such as morphogen gradients, cell division and cellular differentiation (e.g. Doursat 2008), gene regulatory networks (e.g. Guo et al., 2009), degeneracy (Whitacre et al., 2010), grammatical evolution (de Salabert et al., 2006), or analogous computational processes such as re-writing, iteration, and time. The influences of interaction with the environment, spatiality and physical constraints on differentiated multi-cellular development have been investigated more recently (e.g. Knabe et al. 2008). Artificial development approaches have been applied to a number of computational and design problems, including electronic circuit design (Miller and Banzhaf 2003), robotic controllers (e.g. Taylor 2004), and the design of physical structures (e.g. Hornby 2004).

    Read more →