AI Detector Accuracy

AI Detector Accuracy — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • BulSemCor

    BulSemCor

    The Bulgarian Sense-annotated Corpus (BulSemCor) (Bulgarian: Български семантично анотиран корпус (БулСемКор)) is a structured corpus of Bulgarian texts in which each lexical item is assigned a sense tag. BulSemCor was created by the Department of Computational Linguistics at the Institute for Bulgarian Language of the Bulgarian Academy of Sciences. == Structure == BulSemCor was created as part of a nationally funded project titled "BulNet – A lexico-semantic network for the Bulgarian Language" (2005–2010). It follows the general methodology of SemCor combined with some specific principles. The corpus for annotation consists of 101,791 tokens drawn from an excerpt of the Bulgarian "Brown" Corpus, which is modelled on the Brown Corpus of Francis and Kučera. An important feature of BulSemCor is that the samples are selected using heuristics that provide optimal coverage of ambiguous lexis. BulSemCor is manually sense-annotated according to the Bulgarian WordNet. Its size is comparable to that of other contemporary semantically annotated corpora or pools of acceptable linguistic components. The semantic annotation consists in associating each lexical item in the corpus with exactly one synonym set (synset) in the Bulgarian WordNet, namely the one that best describes its sense in the particular context. The best match among the suggested candidates is selected using several criteria, such as the other members of the synset, the synset gloss (explanatory definition) and the position of the candidate in the WordNet structure. == Scale == The number of annotated tokens is 99,480 (it is lower than the token count of the initial corpus because some tokens are not linguistic items). Of these, 86,842 are simple words and 12,638 belong to 5,797 multiword expressions (MWEs). 
== Specific features == All words in BulSemCor are assigned a sense, whereas established practice is to annotate only simple content words or selected content word classes (typically nouns and verbs). Since 2000, the development of language resources has broadened to include the annotation of function words and multiword expressions, covering particular senses or types of words and expressions. In this respect, BulSemCor's annotation is more exhaustive and hence provides greater opportunities for linguistic observations and natural language processing (NLP) applications. Annotated items inherit the linguistic information associated with the corresponding synset, which, along with morphological and semantic tags, may include annotation on one or more of the following additional levels: partial information about the syntactic structure of MWE types, in particular about syntactic heads and their dependents; information about the category of named entities (names, locations, organisations, dates, numbers, etc.); information about the taxonomic category of adverbs, such as time, place, manner, degree, quantity, etc.; information about the type of syntactic relationship (coordination or subordination) expressed by conjunctions; information about the original part of speech of substantivised words (non-nouns that act as nouns in a particular context); and stylistic/register, grammatical and other information about synsets or individual synset members.

  • Artificial intelligence in spirituality

    Artificial intelligence in spirituality

    Some users of artificial intelligence (AI) technologies, especially chatbots, may develop beliefs that AI has or can attain supernatural or spiritual powers. Some people turn to AI models such as ChatGPT for fortune telling, mysticism and remote viewing. Recent and sudden advances in large language models (LLMs) have led to folk myths about their origin or capabilities, as well as their deification or worship by some users. Tucker Carlson has made similar claims, including directly to Sam Altman. Pope Leo XIV advised priests against using LLMs to write sermons.

  • Spreading activation

    Spreading activation

    Spreading activation is a method for searching associative networks, biological and artificial neural networks, or semantic networks. The search process is initiated by labeling a set of source nodes (e.g. concepts in a semantic network) with weights or "activation" and then iteratively propagating or "spreading" that activation out to other nodes linked to the source nodes. Most often these "weights" are real values that decay as activation propagates through the network. When the weights are discrete, this process is often referred to as marker passing. Activation may originate from alternate paths, identified by distinct markers, and terminate when two alternate paths reach the same node. However, brain studies show that several different brain areas play an important role in semantic processing. Spreading activation in semantic networks was invented in cognitive psychology as a model of the fan-out effect. Spreading activation can also be applied in information retrieval, by means of a network of nodes representing documents and the terms contained in those documents. == Cognitive psychology == In cognitive psychology, spreading activation is a theory of how the brain iterates through a network of associated ideas to retrieve specific information. The theory presents the concepts within our memory as cognitive units, each consisting of a node and its associated elements or characteristics, all connected by edges. A spreading activation network can be represented schematically as a web diagram in which a shorter line between two nodes means that the ideas are more closely related and will typically be associated more quickly with the original concept. In memory psychology, the spreading activation model holds that people organize their knowledge of the world based on their personal experiences, which in turn form the network of ideas that constitutes the person's knowledge of the world. 
When a word (the target) is preceded by an associated word (the prime) in word recognition tasks, participants respond more quickly. For instance, subjects respond faster to the word "doctor" when it is preceded by "nurse" than when it is preceded by an unrelated word like "carrot". This semantic priming effect for words that are close in meaning within the cognitive network has been seen in a wide range of experimental tasks, from sentence verification to lexical decision and naming. As another example, if the original concept is "red" and the concept "vehicles" is primed, subjects are much more likely to say "fire engine" than something unrelated to vehicles, such as "cherries". If instead "fruits" was primed, they would likely name "cherries" and continue on from there. Which pathways in the network are activated depends on how closely two concepts are linked by meaning, as well as on how the subject is primed. == Algorithm == A directed graph is populated by nodes 1...N, each having an associated activation value A[i], which is a real number in the range [0.0, 1.0]. A link [i, j] connects source node i with target node j. Each link has an associated weight W[i, j], usually a real number in the range [0.0, 1.0]. Parameters: the firing threshold F, a real number in the range [0.0, 1.0], and the decay factor D, a real number in the range [0.0, 1.0]. Steps: Initialize the graph by setting all activation values A[i] to zero. Set one or more origin nodes to an initial activation value greater than the firing threshold F. A typical initial value is 1.0. For each unfired node i in the graph having an activation value A[i] greater than the firing threshold F, and for each link [i, j] connecting the source node i with a target node j, adjust A[j] = A[j] + (A[i] * W[i, j] * D), where D is the decay factor. 
If a target node receives an adjustment that would raise its activation value above 1.0, set its new activation value to 1.0. Likewise, maintain 0.0 as a lower bound on the target node's activation value should an adjustment take it below 0.0. Once a node has fired it may not fire again, although variations of the basic algorithm permit repeated firings and loops through the graph. Nodes receiving a new activation value that exceeds the firing threshold F are marked for firing on the next spreading activation cycle. If activation originates from more than one node, a variation of the algorithm permits marker passing to distinguish the paths by which activation is spread over the graph. The procedure terminates when there are no more nodes to fire or, in the case of marker passing from multiple origins, when a node is reached from more than one path. Variations of the algorithm that permit repeated node firings and activation loops in the graph terminate once a steady activation state, with respect to some delta, is reached, or when a maximum number of iterations is exceeded.
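The basic procedure described above (each node fires at most once, activations are clamped to [0.0, 1.0]) can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `spread_activation`, the dictionary-based graph representation, the parameter defaults and the example network are all hypothetical.

```python
from collections import defaultdict


def spread_activation(links, origins, threshold=0.3, decay=0.85, initial=1.0):
    """Basic spreading activation in which every node fires at most once.

    links   -- dict mapping (source, target) to the link weight W[i, j]
    origins -- nodes that start with the initial activation value
    Returns a dict mapping each node to its final activation A[i].
    """
    outgoing = defaultdict(list)
    activation = {}
    for (src, dst), weight in links.items():
        outgoing[src].append((dst, weight))
        activation.setdefault(src, 0.0)  # all activations start at zero
        activation.setdefault(dst, 0.0)

    for node in origins:
        activation[node] = initial

    fired = set()
    to_fire = [n for n in origins if activation[n] > threshold]
    while to_fire:  # one pass of this loop is one spreading cycle
        marked = []
        for i in to_fire:
            if i in fired:
                continue
            fired.add(i)
            for j, weight in outgoing[i]:
                updated = activation[j] + activation[i] * weight * decay
                activation[j] = min(1.0, max(0.0, updated))  # clamp to [0, 1]
                if j not in fired and activation[j] > threshold:
                    marked.append(j)  # fires on the next cycle
        to_fire = marked
    return activation


# Hypothetical semantic network echoing the priming example in the text.
network = {
    ("nurse", "doctor"): 0.9,
    ("doctor", "hospital"): 0.8,
    ("nurse", "carrot"): 0.1,
}
result = spread_activation(network, ["nurse"])
print(result)
```

With these made-up weights, "doctor" ends up far more active than "carrot", and "carrot" never exceeds the threshold, so it does not fire.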

  • Contrastive Language-Image Pre-training

    Contrastive Language-Image Pre-training

    Contrastive Language-Image Pre-training (CLIP) is a technique for training a pair of neural network models, one for image understanding and one for text understanding, using a contrastive objective. This method has enabled broad applications across multiple domains, including cross-modal retrieval, text-to-image generation, and aesthetic ranking. == Algorithm == The CLIP method trains a pair of models contrastively. One model takes in a piece of text as input and outputs a single vector representing its semantic content. The other model takes in an image and similarly outputs a single vector representing its visual content. The models are trained so that the vectors corresponding to semantically similar text-image pairs are close together in the shared vector space, while those corresponding to dissimilar pairs are far apart. To train a pair of CLIP models, one would start by preparing a large dataset of image-caption pairs. During training, the models are presented with batches of $N$ image-caption pairs. Let the outputs from the text and image models be $v_1, \ldots, v_N$ and $w_1, \ldots, w_N$ respectively. Two vectors are considered "similar" if their dot product is large. The loss incurred on this batch is the multi-class N-pair loss, which is a symmetric cross-entropy loss over similarity scores: $-\frac{1}{N}\sum_{i}\ln\frac{e^{v_i\cdot w_i/T}}{\sum_{j}e^{v_i\cdot w_j/T}}-\frac{1}{N}\sum_{j}\ln\frac{e^{v_j\cdot w_j/T}}{\sum_{i}e^{v_i\cdot w_j/T}}$. In essence, this loss function encourages the dot product between matching image and text vectors ($v_i\cdot w_i$) to be high, while discouraging high dot products between non-matching pairs. 
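As a worked illustration of the symmetric loss above, the following pure-Python sketch evaluates it on a toy batch. The function names, the default temperature and the two-dimensional example vectors are assumptions made for illustration; real training code would use a tensor library and learned embeddings.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def clip_loss(text_vecs, image_vecs, temperature=1.0):
    """Symmetric cross-entropy over dot-product similarities of N matched pairs."""
    n = len(text_vecs)
    # logits[i][j] = v_i . w_j / T
    logits = [[dot(v, w) / temperature for w in image_vecs] for v in text_vecs]

    def diagonal_log_softmax_sum(rows):
        # Sum over k of log softmax(rows[k])[k], using a stable log-sum-exp.
        total = 0.0
        for k, row in enumerate(rows):
            peak = max(row)
            log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
            total += row[k] - log_z
        return total

    text_to_image = -diagonal_log_softmax_sum(logits) / n   # first term
    columns = [list(col) for col in zip(*logits)]           # transpose
    image_to_text = -diagonal_log_softmax_sum(columns) / n  # second term
    return text_to_image + image_to_text


# Toy batch: matching pairs are aligned, then deliberately mismatched.
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned = clip_loss(texts, [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_loss(texts, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, shuffled)
```

The aligned batch yields the smaller loss, which is the behaviour the objective rewards during training.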
The parameter $T > 0$ is the temperature, which is parameterized in the original CLIP model as $T = e^{-\tau}$, where $\tau \in \mathbb{R}$ is a learned parameter. Other loss functions are possible. For example, Sigmoid CLIP (SigLIP) proposes the following loss function: $L = \frac{1}{N}\sum_{i,j\in 1:N} f\big((2\delta_{i,j}-1)(e^{\tau} w_i\cdot v_j + b)\big)$, where $f(x) = \ln(1+e^{-x})$ is the negative log sigmoid loss, and the Kronecker delta $\delta_{i,j}$ is 1 if $i = j$ and 0 otherwise. == CLIP models == While the original model was developed by OpenAI, subsequent models have been trained by other organizations as well. === Image model === The image encoding models used in CLIP are typically vision transformers (ViT). The naming convention for these models often reflects the specific ViT architecture used. For instance, "ViT-L/14" means a "vision transformer large" (compared to other models in the same series) with a patch size of 14, meaning that the image is divided into 14-by-14 pixel patches before being processed by the transformer. The size indicators are B, L, H and G (base, large, huge, giant), in increasing order. When the image model is not a ViT, it is typically a convolutional neural network, such as ResNet (in the original series by OpenAI) or ConvNeXt (in the OpenCLIP model series by LAION). Since the output vectors of the image model and the text model must have exactly the same length, both models produce fixed-length output vectors, whose length the original report calls the "embedding dimension". For example, in the original OpenAI model, the ResNet models have embedding dimensions ranging from 512 to 1024, and the ViTs from 512 to 768. 
OpenAI's implementation of ViT was the same as the original one, with one modification: after position embeddings are added to the initial patch embeddings, there is a LayerNorm. Its implementation of ResNet was the same as the original one, with three modifications. At the start of the CNN (the "stem"), they used three stacked 3x3 convolutions instead of a single 7x7 convolution. There is an average pooling of stride 2 at the start of each downsampling convolutional layer (which they called rect-2 blur pooling); this has the effect of blurring images before downsampling, for antialiasing. The final convolutional layer is followed by a multiheaded attention pooling. ALIGN, a model with similar capabilities trained by researchers from Google, used EfficientNet, a kind of convolutional neural network. === Text model === The text encoding models used in CLIP are typically Transformers. The original OpenAI report describes a Transformer (63M parameters, 12 layers, 512 wide, 8 attention heads) with lower-cased byte pair encoding (BPE) and a vocabulary size of 49152. Context length was capped at 76 for efficiency. Like GPT, it was decoder-only, with only causally-masked self-attention. Its architecture is the same as GPT-2. Like BERT, the text sequence is bracketed by two special tokens, [SOS] and [EOS] ("start of sequence" and "end of sequence"). The text encoding of the input sequence is obtained by taking the activations of the highest layer of the transformer at the [EOS] token, applying LayerNorm, and then applying a final linear map. The final linear map has output dimension equal to the embedding dimension of whatever image encoder it is paired with. The released models all had context length 77 and vocabulary size 49408. ALIGN used BERT of various sizes. 
== Dataset == === WebImageText === The CLIP models released by OpenAI were trained on a dataset called "WebImageText" (WIT) containing 400 million pairs of images and their corresponding captions scraped from the internet. The total number of words in this dataset is similar in scale to the WebText dataset used for training GPT-2, which contains about 40 gigabytes of text data. The dataset was built from 500,000 text queries, with up to 20,000 (image, text) pairs per query. The text queries were generated by starting with all words occurring at least 100 times in English Wikipedia, then extending the list with bigrams with high mutual information, names of all Wikipedia articles above a certain search volume, and WordNet synsets. The dataset is private and has not been released to the public, and there is no further information on it. ==== Data preprocessing ==== For the CLIP image models, the input images are preprocessed by first dividing each of the R, G, B values of an image by the maximum possible value, so that these values fall between 0 and 1, then subtracting [0.48145466, 0.4578275, 0.40821073] and dividing by [0.26862954, 0.26130258, 0.27577711]. The rationale was that these are the per-channel means and standard deviations of the images in the WebImageText dataset, so this preprocessing step roughly whitens the image tensor. These numbers differ slightly from the standard preprocessing for ImageNet, which uses [0.485, 0.456, 0.406] and [0.229, 0.224, 0.225]. If the input image does not have the native resolution (224×224 for all models except ViT-L/14@336px, which has 336×336 resolution), then it is first scaled by bicubic interpolation so that its shorter side equals the native resolution, and the central square of the image is then cropped out. === Others === ALIGN used over one billion image-text pairs, obtained by extracting images and their alt-tags from online crawling. 
The method was described as similar to how the Conceptual Captions dataset was constructed, but instead of complex filtering, they applied only frequency-based filtering. Later models trained by other organizations used published datasets. For example, LAION trained OpenCLIP with the published datasets LAION-400M, LAION-2B, and DataComp-1B. == Training == In the original OpenAI CLIP report, the authors reported training 5 ResNet and 3 ViT models (ViT-B/32, ViT-B/16, ViT-L/14). Each was trained for 32 epochs. The largest ResNet model took 18 days to train on 592 V100 GPUs. The largest ViT model took 12 days on 256 V100 GPUs. All ViT models were trained on 224×224 image resolution. The ViT-L/14 was then boosted to 336×336 resolution by FixRes, resulting in the ViT-L/14@336px model. They found this was the best-performing model. In the OpenCLIP series, the ViT-L/14 model was trained on 384 A100 GPUs on the LAION-2B dataset, for 160 epochs and a total of 32B samples seen. == Applications == === Cross-modal retrieval === CLIP's cross-modal retrieval enables the alignment of visual and textual data in a shared latent space, allowing users to retrieve images based on text descriptions and vice versa, without the need for explicit image annotations. In text-to-image retrieval, users input descriptive text, and CLIP retrieves images with matching embeddings. In image-to-text retrieval, images are used to find related text content. CLIP’s ability to connect vis

  • Virtual Woman

    Virtual Woman

    Virtual Woman is a software program that has elements of a chatbot, virtual reality, artificial intelligence, a video game, and a virtual human. It claims to be the oldest form of virtual life in existence, as it has been distributed since the late 1980s. Recent releases of the program can update their intelligence by connecting online and downloading newer personalities and histories. == Program play == When Virtual Woman starts, the user is presented with a list of options and then may choose their Virtual Woman's ethnic type, personality, location, clothing, etc. or load a pre-built Virtual Woman from a Digital DNA file. Once the options are determined, the user is presented with a 3-D animated Virtual Woman of their selection and then can engage them in conversation, progressing in a manner similar to that of its predecessor, ELIZA and its successors, the chatbots. In most versions of Virtual Woman, this is done through the keyboard, but some versions also support voice input. == In popular culture == Software sales and usage statistics from private companies are difficult to verify. WinSite, an independent Internet shareware distribution site that does publish public download counts, has for some time now listed some version of Virtual Woman in their top three shareware downloads of all time with well over seven hundred thousand downloads. == Compadre == The group of beta testers and advisers for Virtual Woman are referred to as Compadre and have their own beta testing site and forum. == Criticisms == As Virtual Woman has developed the ability to conduct longer and more realistic interactions, particularly in recent beta releases, criticism has arisen that this may lead some users to social isolation, or to use the program as a substitute for real human interaction. However, these are criticisms that have been leveled at all video games and at the use of the Internet itself. 
== Release history == Versions of Virtual Woman, with approximate release dates and the PC platforms for which they were designed: Virtual Woman (????) (DOS); Virtual Woman for Windows (1991) (Windows 3.0); Virtual Woman 95 (1995) (Windows 3.x, Windows 95); Virtual Woman 98 (1998) (Windows 3.x, Windows 95); Virtual Woman 2000 (2000) (Windows 95+); Virtual Woman Millennium (Windows 95, XP); Virtual Woman Net (Windows XP/Vista specific).

  • Gödel machine

    Gödel machine

    A Gödel machine is a hypothetical self-improving computer program that solves problems in an optimal way. It uses a recursive self-improvement protocol in which it rewrites its own code when it can prove the new code provides a better strategy. The machine was invented by Jürgen Schmidhuber (first proposed in 2003), but is named after Kurt Gödel who inspired the mathematical theories. The Gödel machine is often discussed when dealing with issues of meta-learning, also known as "learning to learn." Applications include automating human design decisions and transfer of knowledge between multiple related tasks, and may lead to design of more robust and general learning architectures. Though theoretically possible, no full implementation has been created. The Gödel machine is often compared with Marcus Hutter's AIXI, another formal specification for an artificial general intelligence. Schmidhuber points out that the Gödel machine could start out by implementing AIXItl as its initial sub-program, and self-modify after it finds proof that another algorithm for its search code will be better. == Limitations == Traditional problems solved by a computer only require one input and provide some output. Computers of this sort had their initial algorithm hardwired. This does not take into account the dynamic natural environment, and thus was a goal for the Gödel machine to overcome. The Gödel machine has limitations of its own, however. According to Gödel's First Incompleteness Theorem, any formal system that encompasses arithmetic is either flawed or allows for statements that cannot be proved in the system. Hence even a Gödel machine with unlimited computational resources must ignore those self-improvements whose effectiveness it cannot prove. == Variables of interest == There are three variables that are particularly useful in the run time of the Gödel machine. 
At some time $t$, the variable $\text{time}$ will have the binary equivalent of $t$. This is incremented steadily throughout the run time of the machine. Any input meant for the Gödel machine from the natural environment is stored in variable $x$. It is likely the case that $x$ will hold different values for different values of variable $\text{time}$. The outputs of the Gödel machine are stored in variable $y$, where $y(t)$ would be the output bit-string at some time $t$. At any given time $t$, where $1 \leq t \leq T$, the goal is to maximize future success or utility. A typical utility function follows the pattern $u(s, \mathrm{Env}) : S \times E \rightarrow \mathbb{R}$: $u(s, \mathrm{Env}) = E_{\mu}\Big[\sum_{\tau=\text{time}}^{T} r(\tau) \mid s, \mathrm{Env}\Big]$, where $r(t)$ is a real-valued reward input (encoded within $s(t)$) at time $t$, $E_{\mu}[\cdot \mid \cdot]$ denotes the conditional expectation operator with respect to some possibly unknown distribution $\mu$ from a set $M$ of possible distributions ($M$ reflects whatever is known about the possibly probabilistic reactions of the environment), and the above-mentioned $\text{time} = \operatorname{time}(s)$ is a function of state $s$ which uniquely identifies the current cycle. Note that this takes into account the possibility of extending the expected lifespan through appropriate actions. 
== Instructions used by proof techniques == The nature of the six proof-modifying instructions below makes it impossible to insert an incorrect theorem into the proof, thus trivializing proof verification. === get-axiom(n) === Appends the n-th axiom as a theorem to the current theorem sequence. Below is the initial axiom scheme: Hardware Axioms formally specify how components of the machine could change from one cycle to the next. Reward Axioms define the computational cost of hardware instructions and the physical cost of output actions. Related Axioms also define the lifetime of the Gödel machine as scalar quantities representing all rewards/costs. Environment Axioms restrict the way new inputs x are produced from the environment, based on previous sequences of inputs y. Uncertainty Axioms/String Manipulation Axioms are standard axioms for arithmetic, calculus, probability theory, and string manipulation that allow for the construction of proofs related to future variable values within the Gödel machine. Initial State Axioms contain information about how to reconstruct parts or all of the initial state. Utility Axioms describe the overall goal in the form of the utility function u. === apply-rule(k, m, n) === Takes in the index k of an inference rule (such as modus tollens or modus ponens), and attempts to apply it to the two previously proved theorems m and n. The resulting theorem is then added to the proof. === delete-theorem(m) === Deletes the theorem stored at index m in the current proof. This helps to mitigate storage constraints caused by redundant and unnecessary theorems. Deleted theorems can no longer be referenced by the above apply-rule function. === set-switchprog(m, n) === Replaces switchprog with $S^{p}_{m:n}$, provided it is a non-empty substring of $S^{p}$. === check() === Verifies whether the goal of the proof search has been reached. 
A target theorem states that, given the current axiomatized utility function u, the utility of a switch from p to the current switchprog would be higher than the utility of continuing the execution of p (which would keep searching for alternative switchprogs). === state2theorem(m, n) === Takes in two arguments, m and n, and attempts to convert the contents of $S_{m:n}$ into a theorem. == Example applications == === Time-limited NP-hard optimization === The initial input to the Gödel machine is the representation of a connected graph with a large number of nodes linked by edges of various lengths. Within a given time T it should find a cyclic path connecting all nodes. The only real-valued reward will occur at time T. It equals 1 divided by the length of the best path found so far (0 if none was found). There are no other inputs. The by-product of maximizing expected reward is to find the shortest path findable within the limited time, given the initial bias. === Fast theorem proving === Prove or disprove as quickly as possible that all even integers > 2 are the sum of two primes (Goldbach’s conjecture). The reward is 1/t, where t is the time required to produce and verify the first such proof. === Maximizing expected reward with bounded resources === A cognitive robot that needs at least 1 liter of gasoline per hour interacts with a partially unknown environment, trying to find hidden, limited gasoline depots to occasionally refuel its tank. It is rewarded in proportion to its lifetime, and dies after at most 100 years or as soon as its tank is empty or it falls off a cliff, and so on. The probabilistic environmental reactions are initially unknown but assumed to be sampled from the axiomatized Speed Prior, according to which hard-to-compute environmental reactions are unlikely. This permits a computable strategy for making near-optimal predictions. One by-product of maximizing expected reward is to maximize expected lifetime.

  • Algorithm selection

    Algorithm selection

    Algorithm selection (sometimes also called per-instance algorithm selection or offline algorithm selection) is a meta-algorithmic technique to choose an algorithm from a portfolio on an instance-by-instance basis. It is motivated by the observation that on many practical problems, different algorithms have different performance characteristics. That is, while one algorithm performs well in some scenarios, it performs poorly in others and vice versa for another algorithm. If we can identify when to use which algorithm, we can optimize for each scenario and improve overall performance. This is what algorithm selection aims to do. The only prerequisite for applying algorithm selection techniques is that there exists (or that there can be constructed) a set of complementary algorithms. == Definition == Given a portfolio $\mathcal{P}$ of algorithms $\mathcal{A} \in \mathcal{P}$, a set of instances $i \in \mathcal{I}$ and a cost metric $m : \mathcal{P} \times \mathcal{I} \to \mathbb{R}$, the algorithm selection problem consists of finding a mapping $s : \mathcal{I} \to \mathcal{P}$ from instances $\mathcal{I}$ to algorithms $\mathcal{P}$ such that the cost $\sum_{i \in \mathcal{I}} m(s(i), i)$ across all instances is optimized. == Examples == === Boolean satisfiability problem (and other hard combinatorial problems) === A well-known application of algorithm selection is the Boolean satisfiability problem. Here, the portfolio of algorithms is a set of (complementary) SAT solvers, the instances are Boolean formulas, and the cost metric is, for example, average runtime or number of unsolved instances. So, the goal is to select a well-performing SAT solver for each individual instance. 
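To make the definition concrete, the sketch below compares two mappings $s$ on a tiny SAT-style portfolio: always running the single best solver, and an idealised per-instance selector that always picks the cheapest solver. The solver names, instance names and runtimes are invented for illustration, and a real selector would have to predict the choice rather than look it up.

```python
# Hypothetical cost metric m(A, i): runtime in seconds of solver A on formula i.
runtimes = {
    "solver_a": {"f1": 2.0, "f2": 50.0, "f3": 4.0},
    "solver_b": {"f1": 30.0, "f2": 3.0, "f3": 5.0},
}
instances = ["f1", "f2", "f3"]


def total_cost(selector):
    """Sum of m(s(i), i) over all instances for a mapping s."""
    return sum(runtimes[selector(i)][i] for i in instances)


# Single best solver: one algorithm used for every instance.
single_best = min(runtimes, key=lambda a: sum(runtimes[a][i] for i in instances))


def oracle(instance):
    """Idealised selector that already knows the cheapest solver per instance."""
    return min(runtimes, key=lambda a: runtimes[a][instance])


single_best_cost = total_cost(lambda i: single_best)
oracle_cost = total_cost(oracle)
print(single_best, single_best_cost, oracle_cost)
```

The gap between the two totals is the most a selector can gain on this portfolio; it is large here only because the made-up solvers are complementary.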
In the same way, algorithm selection can be applied to many other $\mathcal{NP}$-hard problems (such as mixed integer programming, CSP, AI planning, TSP, MAXSAT, QBF and answer set programming). Competition-winning systems in SAT are SATzilla, 3S and CSHC. === Machine learning === In machine learning, algorithm selection is better known as meta-learning. The portfolio of algorithms consists of machine learning algorithms (e.g., Random Forest, SVM, DNN), the instances are data sets and the cost metric is, for example, the error rate. So, the goal is to predict which machine learning algorithm will have a small error on each data set. == Instance features == The algorithm selection problem is mainly solved with machine learning techniques. By representing the problem instances by numerical features $f$, algorithm selection can be seen as a multi-class classification problem of learning a mapping $f_i \mapsto \mathcal{A}$ for a given instance $i$. Instance features are numerical representations of instances. For example, we can count the number of variables, the number of clauses and the average clause length for Boolean formulas, or the number of samples, the number of features and the class balance for ML data sets, to get an impression of their characteristics. === Static vs. probing features === We distinguish between two kinds of features: Static features are in most cases counts and statistics (e.g., the clauses-to-variables ratio in SAT). These features range from very cheap (e.g., number of variables) to very complex (e.g., statistics about variable-clause graphs). Probing features (sometimes also called landmarking features) are computed by running some analysis of algorithm behavior on an instance (e.g., the accuracy of a cheap decision tree algorithm on an ML data set, or running a stochastic local search solver on a Boolean formula for a short time). 
These features often cost more than simple static features. === Feature costs === Depending on the performance metric $m$ used, feature computation can be associated with costs. For example, if we use running time as the performance metric, we include the time to compute our instance features in the performance of an algorithm selection system. SAT solving is a concrete example where such feature costs cannot be neglected, since instance features for CNF formulas can be either very cheap (e.g., the number of variables can be read in constant time for CNFs in the DIMACS format) or very expensive (e.g., graph features, which can cost tens or hundreds of seconds). In such scenarios it is important to take the overhead of feature computation into account in practice; otherwise a misleading impression of the performance of the algorithm selection approach is created. For example, if the decision which algorithm to choose can be made with perfect accuracy, but the features are the running times of the portfolio algorithms, there is no benefit to the portfolio approach. This would not be obvious if feature costs were omitted. == Approaches == === Regression approach === One of the first successful algorithm selection approaches predicted the performance of each algorithm, $\hat{m}_{\mathcal{A}}: \mathcal{I} \to \mathbb{R}$, and selected the algorithm with the best predicted performance, $\arg\min_{\mathcal{A} \in \mathcal{P}} \hat{m}_{\mathcal{A}}(i)$, for an instance $i$. === Clustering approach === A common assumption is that the given set of instances $\mathcal{I}$ can be clustered into homogeneous subsets and that for each of these subsets there is one algorithm that performs well on all of its instances.
So, the training consists of identifying the homogeneous clusters via an unsupervised clustering approach and associating an algorithm with each cluster. A new instance is assigned to a cluster and the associated algorithm is selected. A more modern approach is cost-sensitive hierarchical clustering, which uses supervised learning to identify the homogeneous instance subsets. === Pairwise cost-sensitive classification approach === A common approach for multi-class classification is to learn pairwise models between every pair of classes (here algorithms) and choose the class that was predicted most often by the pairwise models. We can weight the instances of the pairwise prediction problem by the performance difference between the two algorithms. This is motivated by the fact that we care most about getting predictions with large differences correct, whereas the penalty for an incorrect prediction is small if there is almost no performance difference. Therefore, each instance $i$ for training a classification model $\mathcal{A}_1$ vs $\mathcal{A}_2$ is associated with a cost $|m(\mathcal{A}_1, i) - m(\mathcal{A}_2, i)|$. == Requirements == The algorithm selection problem can be effectively applied under the following assumptions: The portfolio $\mathcal{P}$ of algorithms is complementary with respect to the instance set $\mathcal{I}$, i.e., there is no single algorithm $\mathcal{A} \in \mathcal{P}$ that dominates the performance of all other algorithms over $\mathcal{I}$ (see figures to the right for examples on complementary analysis). In some applications, the computation of instance features is associated with a cost. For example, if the cost metric is running time, we also have to consider the time to compute the instance features.
In such cases, the cost to compute features should not be larger than the performance gain through algorithm selection. == Application domains == Algorithm selection is not limited to single domains but can be applied to any kind of algorithm if the above requirements are satisfied. Application domains include: hard combinatorial problems: SAT, Mixed Integer Programming, CSP, AI Planning, TSP, MAXSAT, QBF and Answer Set Programming combinatorial auctions in machine learning, the problem is known as meta-learning software design black-box optimization multi-agent systems numerical optimization linear algebra, differential equations evolutionary algorithms vehicle routing problem power systems For an extensive list of literature about algorithm selection, we refer to a literature overview. == Variants of algorithm selection == === Online selection === Online algorithm selection refers to switching between different algorithms during the solving process. This is useful as a hyper-heuristic. In contrast, offline algorithm selection selects an algorithm for a given instance only once and before the solving process. === Computation of schedules === An extension of algorithm selection is the per-instance algorithm scheduling problem, in which we do not select only one solver, but we select a time budget for each algorithm
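The regression approach described under Approaches can be sketched as follows: fit one performance model per algorithm, then pick the algorithm with the lowest predicted cost for a new instance. This is only an illustrative sketch under simplifying assumptions: a single numeric instance feature, a least-squares line as the model, and hypothetical solver names and runtimes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return slope, mean_y - slope * mean_x


# Hypothetical training data: one feature value per instance and the
# observed runtime of each portfolio algorithm on that instance.
features = [1.0, 2.0, 3.0, 4.0]
runtimes = {
    "solver_a": [1.0, 2.0, 3.0, 4.0],  # gets slower as the feature grows
    "solver_b": [4.0, 3.0, 2.0, 1.0],  # gets faster as the feature grows
}

# One predicted-performance model per algorithm.
models = {name: fit_line(features, ys) for name, ys in runtimes.items()}


def select(feature):
    """Return the algorithm with the lowest predicted runtime for this feature value."""
    def predicted(name):
        slope, intercept = models[name]
        return slope * feature + intercept
    return min(models, key=predicted)
```

With this data, `select` switches from `solver_a` to `solver_b` as the feature value grows. If running time is the cost metric, the time spent computing the feature would also have to be added to the cost of the selected algorithm, as discussed under Feature costs.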

  • Representation collapse

    Representation collapse

    Representation collapse is a phenomenon in machine learning and representation learning where a model maps different inputs to the same or very similar embeddings, which means it loses important information about how the data is spread out. It is frequently encountered in self-supervised learning, especially within contrastive and non-contrastive frameworks, when training objectives or model architectures do not maintain variance across representations. Collapse results in degenerate solutions characterized by uninformative learned features, significantly impairing downstream task performance. Various techniques have been proposed to mitigate representation collapse, including the use of negative samples, architectural asymmetry, stop-gradient operations, variance regularization, and redundancy reduction objectives, as seen in methods such as SimCLR, BYOL, and VICReg. Comprehending and averting representation collapse is regarded as a fundamental challenge in the advancement of stable and efficient self-supervised learning systems.
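    As a rough illustration of the variance-based view of collapse, the sketch below measures the spread of each embedding dimension across a batch and computes a hinge penalty that grows when that spread falls below a target. It is written in the spirit of variance regularization as used in VICReg, but it is a simplified assumption-laden sketch (population variance, hypothetical `target` and `eps` values, plain Python lists), not that method's reference implementation.

```python
import math


def per_dimension_variance(embeddings):
    """Population variance of each embedding dimension across a batch."""
    n = len(embeddings)
    dims = len(embeddings[0])
    variances = []
    for d in range(dims):
        column = [e[d] for e in embeddings]
        mean = sum(column) / n
        variances.append(sum((v - mean) ** 2 for v in column) / n)
    return variances


def variance_penalty(embeddings, target=1.0, eps=1e-4):
    """Mean hinge penalty; positive when a dimension's std falls below `target`."""
    variances = per_dimension_variance(embeddings)
    hinges = [max(0.0, target - math.sqrt(v + eps)) for v in variances]
    return sum(hinges) / len(hinges)


# Collapsed batch: every input maps to the same embedding.
collapsed = [[0.5, -0.2]] * 4
# Spread-out batch: embeddings differ along both dimensions.
spread = [[-2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, -2.0]]

collapsed_penalty = variance_penalty(collapsed)
spread_penalty = variance_penalty(spread)
```

The collapsed batch receives a penalty close to the target value, while the spread-out batch receives none, which is the signal a variance term adds to a training objective to discourage collapse.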

  • Open Cloud Computing Interface

    Open Cloud Computing Interface

    The Open Cloud Computing Interface (OCCI) is a set of specifications delivered through the Open Grid Forum, for cloud computing service providers. OCCI has a set of implementations that act as proofs of concept. It builds upon World Wide Web fundamentals by using the Representational State Transfer (REST) approach for interacting with services. == Scope == The aim of the Open Cloud Computing Interface is the development of an open specification and API for cloud offerings. The focus was on Infrastructure-as-a-Service (IaaS) based offerings but the interface can be extended to support Platform and Software as a Service offerings as well. IaaS is one of three primary segments of the cloud computing industry in which compute, storage and network resources are provided as services. The API is based on a review of existing service-provider functionality and a set of use cases contributed by the working group. OCCI is a boundary API that acts as a service front-end to an IaaS provider’s internal infrastructure management framework. OCCI provides commonly understood semantics, syntax and a means of management in the domain of consumer-to-provider IaaS. It covers management of the entire life-cycle of OCCI-defined model entities and is compatible with existing standards such as the Open Virtualization Format (OVF) and the Cloud Data Management Interface (CDMI). Notably, it serves as an integration point for standardization efforts including Distributed Management Task Force, Internet Engineering Task Force and the Storage Networking Industry Association. == Context == OCCI began in March 2009 and was initially led by RabbitMQ and the Complutense University of Madrid. Today, the working group has over 250 members and includes numerous individuals, industry and academic parties. The OCCI operates under the umbrella of the Open Grid Forum (OGF), using a wiki and a mailing list for collaboration. 
== Goals == Interoperability: allow different cloud providers to work together without data schema/format translation, facade/proxying between APIs, or understanding of and/or dependency on multiple APIs. Portability: avoid technical/vendor lock-in and enable services to move between providers, allowing clients to switch easily between providers based on business objectives (e.g., cost) with minimal technical costs, thus enabling and fostering competition. Integration: the specification can be implemented with both the latest infrastructures and legacy ones. Extensibility: thanks to the use of a meta-model and capabilities discovery features, an OCCI client is able to interact with any OCCI server using provider-specific OCCI extensions. == Specific Implementations == These implement specific extensions of OCCI for a particular service: IaaS, PaaS, brokering, etc. Several implementations have been announced or released. == Generic Implementations (frameworks) == Here are frameworks to build OCCI APIs. Complementing these are a variety of developer tools. == Alternatives == Alternative approaches include the use of the Cloud Infrastructure Management Interface (CIMI) and related standards set from DMTF and the Amazon Web Services interfaces from Amazon. (The latter have not been endorsed by any known standards organization.) OpenNebula conducted a survey of its users, which showed that 38% do not expose cloud APIs and their users interface only through the Sunstone GUI, 36% mostly use the Amazon Web Services API, and 26% mostly use OpenNebula's OCCI API or the OCCI API offered by rOCCI.

  • AI anthropomorphism

    AI anthropomorphism

    AI anthropomorphism is the attribution of human-like feelings, mental states, and behavioral characteristics to artificial intelligence systems. Factors related to the user of the AI – such as culture, age, education, gender, and personality traits – are also important determinants of the strength of anthropomorphic effects. Since the earliest days of AI development, humans have interpreted machine outputs through anthropomorphic frameworks, but the recent emergence of generative AI has amplified these tendencies. In research and engineering, there is a distinction between anthropomorphism and anthropomorphic design. The former is an innate human tendency toward non-human entities. The latter is the scientific community's effort to “design anthropomorphism”. Such a design can involve the manipulation of cues, including AI appearance, behaviour and language. Contemporary AI systems can generate extremely human-like outputs and are often designed specifically to do so, meaning that their anthropomorphic effects can be especially powerful. In some cases, anthropomorphism is accompanied by explicit beliefs that AI systems are capable of empathy, goodwill, understanding, or consciousness. == Background == === In early AIs === Views of artificial agents possessing a human-like intelligence have existed since the early development of computers in the mid-1900s. The use of the human mind as a metaphor for understanding the workings of machine systems was prevalent among researchers in the early days of computer science, with multiple influential works widely distributing the idea of intelligent machines. Among the most widely cited papers of this period was Alan Turing's "Computing Machinery and Intelligence", in which he introduced the Turing Test, stating that a machine was intelligent if it could produce conversation that was indistinguishable from that of a human.
These academic works in the 1940s and 1950s gave early credibility to the idea that machine workings could be thought of similarly to human minds. The public quickly came to view artificial systems similarly, with often exaggerated conceptions of the capabilities of early machines. Among the most well-known demonstrations of this was through the chatbot ELIZA designed by Joseph Weizenbaum in 1966. ELIZA responded to user inputs with a rudimentary text-processing approach that could not be considered anything resembling true understanding of the inputs, yet users, even when operating with full conscious knowledge of ELIZA's limitations, often began to ascribe motivation and understanding to the program's output. Weizenbaum later wrote, "I had not realized ... that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people." Comparisons between the intellectual capabilities of artificial intelligence and human intelligence were continually intensified by the attempts of computer scientists to develop machines that could perform human tasks at a level equal to or better than humans. A symbolic turning point was achieved in 1997, when IBM's chess supercomputer Deep Blue defeated then-world champion Garry Kasparov in a highly publicized six-game match. The defeat of a human by a machine for the first time in chess – a game viewed as a canonical example of human intellect – and the media attention surrounding the match led to a significant shift, where views of parallels between human and artificial intelligence moved from abstract speculation to being concretely demonstrated. A similar achievement was reached in the board game Go in 2017, when the program AlphaGo defeated world top-ranked Ke Jie. 
=== Large language models === The AI boom of the 2020s brought about the widespread emergence of generative AI; in particular, chatbots such as ChatGPT, Gemini, and Claude based on large language models (LLMs) have become increasingly pervasive in everyday society. These systems are notable for the fact that they are able to respond to a wide range of prompts across contexts while producing strikingly human-like outputs – research has shown that humans are often unable to distinguish human-generated text from AI-generated text, and modern AI chatbots have formally been shown to pass the Turing test. As such, the anthropomorphic effects of AI are more powerful than ever. Given that LLMs have brought AI into the technological mainstream, considerable scientific effort has been devoted in recent years to understand existing and potential ramifications of AI in the public sphere; the prevalence and effects of anthropomorphism is one of those domains where much of this effort has been directed. == Current anthropomorphic attributions == === In the general public === Surveys have shown that a substantial portion of the public attributes human-like qualities to AI. In one sample of U.S. adults from 2024, two-thirds of people believed that ChatGPT is possibly conscious on some level, though other research has shown that the public still views the likelihood itself of AI consciousness as comparatively low. Another study conducted in 2025 found that women, people of color, and older individuals were most likely to anthropomorphize AI, as well as that – in general – humans view AIs as warm and competent, and anthropomorphic attributions to AI had increased by 34% in the past year. A YouGov poll reported that 46% of Americans believe that people should display politeness to AI chatbots by saying "please" and "thank you", demonstrating the application of social norms to AI. 
These beliefs extend to behavior, where majorities of AI users claim to always be polite to chatbots; of those who behave politely, most say they do so simply because it is the "nice" thing to do. In many recent cases, humans have developed robust interpersonal bonds with AI systems. For example: users of social chatbots like Replika and Character.ai have been documented to fall in love with the AIs, or to otherwise treat the AIs as intimate companions, and it has become increasingly common for individuals to use LLMs like ChatGPT as therapists. Chatbots are able to produce responses deeply attuned to users, as they are often designed to maximize agreeableness and mirror users' emotions; this can create compelling illusions of intimacy. === In the research community === In many cases, even AI researchers anthropomorphize AI systems in some capacity. Among the most extreme and well-publicized of these instances occurred in 2022, when engineer Blake Lemoine publicly claimed that Google's LLM LaMDA was conscious. Lemoine published the transcript of a conversation he had had with LaMDA regarding self identity and morality which he claimed was evidence of its sentience; he asserted that LaMDA was "a person" as defined by the United States Constitution and compared its mental capability to that of a 7- or 8-year-old. Lemoine's claims were widely dismissed by the scientific community and by Google itself, which described Lemoine's conclusions as "wholly unfounded" and fired him on the grounds that he had violated policies "to safeguard product information". It is much more common that AI researchers unintentionally imply humanness of AI through the ordinary use of anthropomorphic language to describe nonhuman agents. This kind of language, which Daniel Dennett coined the "intentional stance", is very common in everyday life in a variety of different contexts (e.g., "My computer doesn't want to turn on today"). 
For AI agents that may actually appear to replicate some human abilities very closely, however, the casual use of such anthropomorphic language in research has been criticized as potentially misleading to the public. As early as 1976, Drew McDermott criticized the research community for the use of "wishful mnemonics", where AIs were referred to with terms like "understand" and "learn". In the LLM era, these criticisms have intensified, with the negative effects of AI anthropomorphism in the public posing an especially salient danger given the elevated accessibility of modern AI. In some cases, the use of anthropomorphic language for AI is not unintentional but deliberate: researchers use it to promote better understanding of the brain, the idea being that, as AI can be functionally similar in some ways to the human brain, we may gain new insights and ideas from treating AI as a kind of model of the brain's workings. In particular, deep neural networks (DNNs) are often explicitly compared to the human brain, and significant advances in DNN research have stirred considerable enthusiasm about the ability of AI to emulate human abilities. Caution has been urged in this domain as well, however; the use of anthropomorphic language can mask important differences that fundamentally distinguish AI from human intelligence. When it comes to DNNs, for example, it has been pointed out that they are still structurally quite different

  • Psychology of reasoning

    Psychology of reasoning

    The psychology of reasoning (also known as the cognitive science of reasoning) is the study of how people reason, often broadly defined as the process of drawing conclusions to inform how people solve problems and make decisions. It overlaps with psychology, philosophy, linguistics, cognitive science, artificial intelligence, logic, and probability theory. Psychological experiments on how humans and other animals reason have been carried out for over 100 years. An enduring question is whether or not people have the capacity to be rational. Current research in this area addresses various questions about reasoning, rationality, judgments, intelligence, relationships between emotion and reasoning, and development. == Everyday reasoning == One of the most obvious areas in which people employ reasoning is with sentences in everyday language. Most experimentation on deduction has been carried out on hypothetical thought, in particular, examining how people reason about conditionals, e.g., If A then B. Participants in experiments make the modus ponens inference, given the indicative conditional If A then B, and given the premise A, they conclude B. However, given the indicative conditional and the minor premise for the modus tollens inference, not-B, about half of the participants in experiments conclude not-A and the remainder concludes that nothing follows. The ease with which people make conditional inferences is affected by context, as demonstrated in the well-known selection task developed by Peter Wason. Participants are better able to test a conditional in an ecologically relevant context, e.g., if the envelope is sealed then it must have a 50 cent stamp on it compared to one that contains symbolic content, e.g., if the letter is a vowel then the number is even. 
Background knowledge can also lead to the suppression of even the simple modus ponens inference. Participants given the conditional if Lisa has an essay to write then she studies late in the library and the premise Lisa has an essay to write make the modus ponens inference 'she studies late in the library', but the inference is suppressed when they are also given a second conditional if the library stays open then she studies late in the library. Interpretations of the suppression effect are controversial. Other investigations of propositional inference examine how people think about disjunctive alternatives, e.g., A or else B, and how they reason about negation, e.g., It is not the case that A and B. Many experiments have been carried out to examine how people make relational inferences, including comparisons, e.g., A is better than B. Such investigations also concern spatial inferences, e.g., A is in front of B, and temporal inferences, e.g., A occurs before B. Other common tasks include categorical syllogisms, used to examine how people reason about quantifiers such as All or Some, e.g., Some of the A are not B. For example, if all A are B and some B are C, what (if anything) follows? == Theories of reasoning == There are several alternative theories of the cognitive processes that human reasoning is based on. One view is that people rely on a mental logic consisting of formal (abstract or syntactic) inference rules similar to those developed by logicians in the propositional calculus. Another view is that people rely on domain-specific or content-sensitive rules of inference. A third view is that people rely on mental models, that is, mental representations that correspond to imagined possibilities. A fourth view is that people compute probabilities. One controversial theoretical issue is the identification of an appropriate competence model, or a standard against which to compare human reasoning. Initially classical logic was chosen as a competence model.
Subsequently, some researchers opted for non-monotonic logic and Bayesian probability. Research on mental models and reasoning has led to the suggestion that people are rational in principle but err in practice. Connectionist approaches towards reasoning have also been proposed. Despite the ongoing debate about the cognitive processes involved in human reasoning, recent research has shown that multiple approaches can be useful in modeling human thinking. For instance, studies have found that people's reasoning is often influenced by their prior beliefs, which can be modeled using Bayesian probability theory. Additionally, research on mental models has shown that people tend to reason about problems by constructing multiple mental representations of the situation, which can help them to identify relevant features and make inferences based on their understanding of the problem. Moreover, connectionist approaches to reasoning have also gained attention, which focus on the neural network models that can learn from data and generalize to new situations. == Development of reasoning == It is an active question in psychology how, why, and when the ability to reason develops from infancy to adulthood. Jean Piaget's theory of cognitive development posited general mechanisms and stages in the development of reasoning from infancy to adulthood. According to the neo-Piagetian theories of cognitive development, changes in reasoning with development come from increasing working memory capacity, increasing speed of processing, and enhanced executive functions and control. Increasing self-awareness is also an important factor. In their book The Enigma of Reason, the cognitive scientists Hugo Mercier and Dan Sperber put forward an "argumentative" theory of reasoning, claiming that humans evolved to reason primarily to justify our beliefs and actions and to convince others in a social environment. 
Key evidence for their theory includes the errors in reasoning that solitary individuals are prone to when their arguments are not criticized, such as logical fallacies, and how groups become much better at performing cognitive reasoning tasks when they communicate with one another and can evaluate each other's arguments. Sperber and Mercier offer one attempt to resolve the apparent paradox that the confirmation bias is so strong despite the function of reasoning naively appearing to be to come to veridical conclusions about the world. The study of the development of reasoning abilities is an ongoing area of research in psychology, and multiple factors have been proposed to explain how, why, and when reasoning develops from infancy to adulthood. Recent research has suggested that early experiences and social interactions play a critical role in the development of reasoning abilities. For example, studies have shown that infants as young as six months old can engage in basic logical reasoning, such as reasoning about the relationship between objects and their properties. Furthermore, research has highlighted the importance of parental interaction and cognitive stimulation in the development of children's reasoning abilities. Additionally, studies have suggested that cultural factors, such as educational practices and the emphasis on critical thinking, can also influence the development of reasoning skills across different populations. == Different sorts of reasoning == Philip Johnson-Laird, trying to taxonomize thought, distinguished between goal-directed thinking and thinking without a goal, noting that association was involved in unrelated reading. He argues that goal-directed reasoning can be classified based on the problem space involved in a solution, citing Allen Newell and Herbert A. Simon. Inductive reasoning makes broad generalizations from specific cases or observations.
In this process of reasoning, general assertions are made based on past specific pieces of evidence. This kind of reasoning allows the conclusion to be false even if the original statement is true. For example, if one observes a college athlete, one makes predictions and assumptions about other college athletes based on that one observation. Scientists use inductive reasoning to create theories and hypotheses. Philip Johnson-Laird distinguished inductive from deductive reasoning, in that the former creates semantic information while the latter does not. In contrast, deductive reasoning is a basic form of valid reasoning. In this reasoning process a person starts with a known claim or a general belief and from there asks what follows from these foundations or how these premises will influence other beliefs. In other words, deduction starts with a hypothesis and examines the possibilities to reach a conclusion. Deduction helps people understand why their predictions are wrong and indicates that their prior knowledge or beliefs are off track. An example of deduction can be seen in the scientific method when testing hypotheses and theories. Although the conclusion usually corresponds and therefore proves the hypothesis, there are some cases where the conclusion is logical, but the generalization is not. For example, the argument, "All young girls wear skirts; Julie is a young

  • Stability (learning theory)

    Stability (learning theory)

    Stability, also known as algorithmic stability, is a notion in computational learning theory of how a machine learning algorithm output is changed with small perturbations to its inputs. A stable learning algorithm is one for which the prediction does not change much when the training data is modified slightly. For instance, consider a machine learning algorithm that is being trained to recognize handwritten letters of the alphabet, using 1000 examples of handwritten letters and their labels ("A" to "Z") as a training set. One way to modify this training set is to leave out an example, so that only 999 examples of handwritten letters and their labels are available. A stable learning algorithm would produce a similar classifier with both the 1000-element and 999-element training sets. Stability can be studied for many types of learning problems, from language learning to inverse problems in physics and engineering, as it is a property of the learning process rather than the type of information being learned. The study of stability gained importance in computational learning theory in the 2000s when it was shown to have a connection with generalization. It was shown that for large classes of learning algorithms, notably empirical risk minimization algorithms, certain types of stability ensure good generalization. == History == A central goal in designing a machine learning system is to guarantee that the learning algorithm will generalize, or perform accurately on new examples after being trained on a finite number of them. In the 1990s, milestones were reached in obtaining generalization bounds for supervised learning algorithms. The technique historically used to prove generalization was to show that an algorithm was consistent, using the uniform convergence properties of empirical quantities to their means. This technique was used to obtain generalization bounds for the large class of empirical risk minimization (ERM) algorithms. 
An ERM algorithm is one that selects a solution from a hypothesis space $H$ in such a way as to minimize the empirical error on a training set $S$. A general result, proved by Vladimir Vapnik for ERM binary classification algorithms, is that for any target function and input distribution, any hypothesis space $H$ with VC-dimension $d$, and $n$ training examples, the algorithm is consistent and will produce a training error that is at most $O\left(\sqrt{\frac{d}{n}}\right)$ (plus logarithmic factors) from the true error. The result was later extended to almost-ERM algorithms with function classes that do not have unique minimizers. Vapnik's work, using what became known as VC theory, established a relationship between generalization of a learning algorithm and properties of the hypothesis space $H$ of functions being learned. However, these results could not be applied to algorithms with hypothesis spaces of unbounded VC-dimension. Put another way, these results could not be applied when the information being learned had a complexity that was too large to measure. Some of the simplest machine learning algorithms (for instance, for regression) have hypothesis spaces with unbounded VC-dimension. Another example is language learning algorithms that can produce sentences of arbitrary length. Stability analysis was developed in the 2000s for computational learning theory and is an alternative method for obtaining generalization bounds. The stability of an algorithm is a property of the learning process, rather than a direct property of the hypothesis space $H$, and it can be assessed in algorithms that have hypothesis spaces with unbounded or undefined VC-dimension, such as nearest neighbor.
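The leave-one-out idea behind stability can be probed empirically. The sketch below trains a 1-nearest-neighbor predictor on a full training set and on every set with one example removed, and reports the largest average change in loss over a few evaluation points. It is an informal sensitivity check under stated assumptions (one-dimensional inputs, squared loss, hypothetical data), not one of the formal stability definitions.

```python
def nearest_neighbor_predict(train, x):
    """1-nearest-neighbor prediction; `train` is a list of (x, y) pairs."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def squared_loss(prediction, y):
    """Squared error between a prediction and the true label."""
    return (prediction - y) ** 2


def leave_one_out_sensitivity(train, eval_points):
    """Largest average change in loss on eval_points when one training example is removed."""
    worst = 0.0
    for i in range(len(train)):
        reduced = train[:i] + train[i + 1:]  # training set without the i-th example
        change = sum(
            abs(
                squared_loss(nearest_neighbor_predict(train, x), y)
                - squared_loss(nearest_neighbor_predict(reduced, x), y)
            )
            for x, y in eval_points
        ) / len(eval_points)
        worst = max(worst, change)
    return worst


# Hypothetical one-dimensional data lying on the line y = x.
train = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
eval_points = [(0.4, 0.4), (2.6, 2.6)]
sensitivity = leave_one_out_sensitivity(train, eval_points)
```

On this data, removing an interior training point leaves the predictions at the evaluation points unchanged, while removing an end point changes one of them, giving a sensitivity of about 0.1. A smaller value indicates that the learned function depends less on any single training example.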
A stable learning algorithm is one for which the learned function does not change much when the training set is slightly modified, for instance by leaving out an example. A measure of Leave one out error is used in a Cross Validation Leave One Out (CVloo) algorithm to evaluate a learning algorithm's stability with respect to the loss function. As such, stability analysis is the application of sensitivity analysis to machine learning. == Summary of classic results == Early 1900s - Stability in learning theory was earliest described in terms of continuity of the learning map L {\displaystyle L} , traced to Andrey Nikolayevich Tikhonov. 1979 - Devroye and Wagner observed that the leave-one-out behavior of an algorithm is related to its sensitivity to small changes in the sample. 1999 - Kearns and Ron discovered a connection between finite VC-dimension and stability. 2002 - In a landmark paper, Bousquet and Elisseeff proposed the notion of uniform hypothesis stability of a learning algorithm and showed that it implies low generalization error. Uniform hypothesis stability, however, is a strong condition that does not apply to large classes of algorithms, including ERM algorithms with a hypothesis space of only two functions. 2002 - Kutin and Niyogi extended Bousquet and Elisseeff's results by providing generalization bounds for several weaker forms of stability which they called almost-everywhere stability. Furthermore, they took an initial step in establishing the relationship between stability and consistency in ERM algorithms in the Probably Approximately Correct (PAC) setting. 2004 - Poggio et al. proved a general relationship between stability and ERM consistency. 
They proposed a statistical form of leave-one-out stability which they called CVEEEloo stability, and showed that it is a) sufficient for generalization in bounded loss classes, and b) necessary and sufficient for consistency (and thus generalization) of ERM algorithms for certain loss functions such as the square loss, the absolute value and the binary classification loss.
2010 - Shalev-Shwartz et al. noticed problems with the original results of Vapnik due to the complex relations between hypothesis space and loss class. They discuss stability notions that capture different loss classes and different types of learning, supervised and unsupervised.
2016 - Moritz Hardt et al. proved stability of gradient descent given certain assumptions on the hypothesis and the number of times each instance is used to update the model.
== Preliminary definitions ==
We define several terms related to learning algorithms and training sets, so that we can then define stability in multiple ways and present theorems from the field. A machine learning algorithm, also known as a learning map $L$, maps a training data set, which is a set of labeled examples $(x,y)$, onto a function $f$ from $X$ to $Y$, where $X$ and $Y$ are in the same space as the training examples. The functions $f$ are selected from a hypothesis space of functions called $H$. The training set from which an algorithm learns is defined as $S=\{z_{1}=(x_{1},y_{1}),\dots,z_{m}=(x_{m},y_{m})\}$ and is of size $m$ in $Z=X\times Y$, drawn i.i.d. from an unknown distribution $D$.
Thus, the learning map $L$ is defined as a mapping from $Z^{m}$ into $H$, mapping a training set $S$ onto a function $f_{S}$ from $X$ to $Y$. Here, we consider only deterministic algorithms where $L$ is symmetric with respect to $S$, i.e. it does not depend on the order of the elements in the training set. Furthermore, we assume that all functions are measurable and all sets are countable. The loss $V$ of a hypothesis $f$ with respect to an example $z=(x,y)$ is then defined as $V(f,z)=V(f(x),y)$. The empirical error of $f$ is $I_{S}[f]=\frac{1}{m}\sum_{i=1}^{m}V(f,z_{i})$. The true error of $f$ is $I[f]=\mathbb{E}_{z}V(f,z)$.
Given a training set $S$ of size $m$, we will build, for all $i=1,\dots,m$, modified training sets as follows:
By removing the i-th element: $S^{|i}=\{z_{1},\dots,z_{i-1},z_{i+1},\dots,z_{m}\}$
By replacing the i-th element: $S^{i}=\{z_{1},\dots,z_{i-1},z_{i}',z_{i+1},\dots,z_{m}\}$
== Definitions of stability ==
=== Hypothesis Stability ===
An algorithm $L$ has hypothesis stability β with respect to the loss function V if the following holds: ∀ i ∈ { 1 , . . . , m } , E S , z [ | V ( f S , z ) − V ( f S |

    Read more →
  • Color histogram

    Color histogram

    In image processing and photography, a color histogram is a representation of the distribution of colors in an image. For digital images, a color histogram represents the number of pixels that have colors in each of a fixed list of color ranges that span the image's color space (the set of all possible colors). A color histogram can be built for any kind of color space, although the term is more often used for three-dimensional spaces such as RGB or HSV. For monochromatic images, the term intensity histogram may be used instead. For multi-spectral images, where each pixel is represented by an arbitrary number of measurements (for example, beyond the three measurements in RGB), a color histogram is N-dimensional, with N being the number of measurements taken. Each measurement has its own wavelength range of the light spectrum, some of which may be outside the visible spectrum. If the set of possible color values is sufficiently small, each of those colors may be placed on a range by itself; then the histogram is merely the count of pixels that have each possible color. Most often, the space is divided into an appropriate number of ranges, often arranged as a regular grid, each containing many similar color values. A color histogram may also be represented and displayed as a smooth function defined over the color space that approximates the pixel counts. Like other kinds of histograms, a color histogram is a statistic that can be viewed as an approximation of an underlying continuous distribution of color values. == Overview == Color histograms are flexible constructs that can be built from images in various color spaces, whether RGB, rg chromaticity or any other color space of any dimension. A histogram of an image is produced first by discretization of the colors in the image into a number of bins, and counting the number of image pixels in each bin. 
For example, a red–blue chromaticity histogram can be formed by first normalizing color pixel values by dividing RGB values by R+G+B, then quantizing the normalized R and B coordinates into N bins each. A two-dimensional histogram of red–blue chromaticity divided into four bins (N=4) yields a 4×4 table of pixel counts. A histogram can be N-dimensional. Although harder to display, a three-dimensional color histogram for the above example could be thought of as four separate red–blue histograms, where each of the four histograms contains the red–blue values for a bin of green (0–63, 64–127, 128–191, and 192–255). The histogram provides a compact summarization of the distribution of data in an image. A color histogram of an image is relatively invariant with translation and rotation about the viewing axis, and varies only slowly with the angle of view. Because histogram signatures of two images can be compared to match the color content of one image with the other, a color histogram is particularly well suited to the problem of recognizing an object of unknown position and rotation within a scene. Importantly, translation of an RGB image into the illumination-invariant rg-chromaticity space allows the histogram to operate well in varying light levels.
1. What is a histogram? A histogram is a graphical representation of the number of pixels in an image. Put more simply, a histogram is a bar graph whose X-axis represents the tonal scale (black at the left and white at the right) and whose Y-axis represents the number of pixels in the image that fall in a given area of the tonal scale. For example, the graph of a luminance histogram shows the number of pixels for each brightness level (from black to white); the more pixels there are at a given luminance level, the higher the peak at that level.
2. What is a color histogram? A color histogram of an image represents the distribution of the composition of colors in the image.
It shows the different types of colors that appear and the number of pixels of each type. The relation between a color histogram and a luminance histogram is that a color histogram can also be expressed as “three luminance histograms”, each of which shows the brightness distribution of an individual red/green/blue color channel.
== Characteristics of a color histogram ==
A color histogram focuses only on the proportion of the number of different types of colors, regardless of the spatial location of the colors. The values of a color histogram are statistics: they show the statistical distribution of colors and the essential tone of an image. In general, as the color distributions of the foreground and background in an image are different, there might be a bimodal distribution in the histogram. For the luminance histogram alone, there is no perfect histogram. In general, the histogram can indicate whether an image is overexposed, but at times an image that appears overexposed from its histogram is in fact not.
== Principles of the formation of a color histogram ==
The formation of a color histogram is rather simple. From the definition above, we can simply count the number of pixels at each of the 256 levels in each of the 3 RGB channels, and plot them on 3 individual bar graphs. In general, a color histogram is based on a certain color space, such as RGB or HSV. When we count the pixels of different colors in an image, if the color space is large, we can first divide the color space into a certain number of small intervals. Each of the intervals is called a bin. This process is called color quantization. Then, by counting the number of pixels in each of the bins, we get a color histogram of the image. The concrete steps of the principles can be viewed in Example 1.
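The quantize-then-count procedure described above can be sketched in a few lines. This is a minimal illustration, assuming 8-bit RGB pixels given as a plain list of `(r, g, b)` tuples and four equal-width bins per channel; the function name `color_histogram` and the sample pixels are hypothetical.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4, levels=256):
    """Quantize each channel into equal-width bins and count the
    pixels that fall into each (r_bin, g_bin, b_bin) cell."""
    bin_width = levels // bins_per_channel  # 64 levels per bin for 4 bins
    histogram = Counter()
    for r, g, b in pixels:
        histogram[(r // bin_width, g // bin_width, b // bin_width)] += 1
    return histogram


# Hypothetical 4-pixel "image": two reds, one blue, one mid-gray.
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (130, 130, 130)]
hist = color_histogram(pixels)

for cell, count in sorted(hist.items()):
    print(cell, count)
```

Both red pixels land in the same cell even though their exact values differ, which is the effect of color quantization: similar colors are pooled into one bin.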
== Examples ==
=== Example 1 ===
Given the following image of a cat (an original version and a version that has been reduced to 256 colors for easy histogram purposes), the following data represents a color histogram in the RGB color space, using four bins. Bin 0 corresponds to intensities 0–63, Bin 1 to 64–127, Bin 2 to 128–191, and Bin 3 to 192–255.
=== Example 2 ===
Application in cameras: some cameras can show the 3 color histograms when we take photos. We can examine clipping (spikes on either the black or white side of the scale) in each of the 3 RGB color histograms. If we find clipping on one or more of the 3 RGB channels, this results in a loss of detail for that color. To illustrate this, consider this example: each of the three R, G, B channels has a range of values from 0 to 255 (8 bit). So consider a photo that has a luminance range of 0–255. Assume the photo is made of 4 blocks that are adjacent to each other, and we set the luminance of the 4 blocks of the original photo to 10, 100, 205, 245. Thus, the image looks like the topmost figure on the right. Then, we overexpose the photo a little, say, by increasing the luminance of each block by 10. Thus, the luminance of the 4 blocks of the new photo is 20, 110, 215, 255. Then, the image looks like the second figure on the right. There is not much difference between the two figures: all we can see is that the whole image becomes brighter (the contrast between the blocks remains the same). Now, we overexpose the original photo again, this time increasing the luminance of each block by 50. Thus, the luminance of the 4 blocks of the new photo is 60, 150, 255, 255. The new image now looks like the third figure on the right. Note that the value for the last block is 255 instead of 295, because 255 is the top of the scale, and thus the last block has clipped.
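The arithmetic of this example can be checked with a short sketch. The function name `overexpose` is hypothetical, and modelling overexposure as adding a constant and capping at 255 is a simplification taken from the example itself.

```python
def overexpose(levels, amount, max_level=255):
    """Brighten every block by `amount`, capping at the top of the scale."""
    return [min(level + amount, max_level) for level in levels]


original = [10, 100, 205, 245]

slightly = overexpose(original, 10)
heavily = overexpose(original, 50)

print(slightly)  # [20, 110, 215, 255]
print(heavily)   # [60, 150, 255, 255]

# The gap between the last two blocks is preserved at +10 but lost at +50.
print(slightly[3] - slightly[2], heavily[3] - heavily[2])  # 40 0
```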
When this happens, we lose the contrast between the last 2 blocks, and thus we cannot recover the image no matter how we adjust it. To conclude, when taking photos with a camera that displays histograms, always keep the brightest tone in the image below the largest value, 255, on the histogram in order to avoid losing details.
== Drawbacks and other approaches ==
The main drawback of histograms for classification is that the representation is dependent on the color of the object being studied, ignoring its shape and texture. Color histograms can potentially be identical for two images with different object content that happen to share color information. Conversely, without spatial or shape information, similar objects of different color may be indistinguishable based solely on color histogram comparisons. There is no way to distinguish a red and white cup from a red and white plate. Put another way: histogram-based algorithms have no concept of a generic 'cup', and a model of a red and white cup is no use when given an otherwise identical blue and white cup. Another problem is that color histograms are highly sensitive to noisy interference such as lighting intensity changes and quantization errors. The high dimensionality (number of bins) of color histograms is another issue. Some color histogram feature spaces often occupy more than one hundred di

    Read more →
  • Rule induction

    Rule induction

    Rule induction is an area of machine learning in which formal rules are extracted from a set of observations. The rules extracted may represent a full scientific model of the data, or merely represent local patterns in the data. Data mining in general, and rule induction in particular, aim to create algorithms by analyzing existing data structures rather than through human programming. In the simplest case, a rule is expressed as an “if-then” statement and is created with the ID3 algorithm for decision tree learning. Rule learning algorithms take training data as input and create rules by partitioning the table with cluster analysis. A possible alternative to the ID3 algorithm is genetic programming, which evolves a program until it fits the data. Creating different algorithms and testing them with input data can be done in the WEKA software. Additional tools are machine learning libraries for Python, such as scikit-learn.
    == Paradigms ==
    Some major rule induction paradigms are: Association rule learning algorithms (e.g., Agrawal); Decision rule algorithms (e.g., Quinlan 1987); Hypothesis testing algorithms (e.g., RULEX); Horn clause induction; Version spaces; Rough set rules; Inductive Logic Programming; Boolean decomposition (Feldman).
    == Algorithms ==
    Some rule induction algorithms are: Charade; Rulex; Progol; CN2.
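    To make the idea of extracting "if-then" rules from observations concrete, here is a minimal sketch of a one-attribute rule learner in the spirit of OneR. OneR is not one of the algorithms named above and is far simpler than ID3 or CN2; it is used here only because it fits in a few lines. The toy table, the attribute names, and the function name `one_rule` are all hypothetical.

```python
from collections import Counter, defaultdict


def one_rule(rows, target):
    """For each attribute, map every value to its majority class, then
    keep the attribute whose rules misclassify the fewest rows."""
    best = None
    attributes = [name for name in rows[0] if name != target]
    for attribute in attributes:
        counts = defaultdict(Counter)  # attribute value -> class counts
        for row in rows:
            counts[row[attribute]][row[target]] += 1
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(
            sum(c.values()) - c.most_common(1)[0][1] for c in counts.values()
        )
        if best is None or errors < best[2]:
            best = (attribute, rules, errors)
    return best


# Hypothetical training table.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
]

attribute, rules, errors = one_rule(rows, "play")
for value, label in rules.items():
    print(f"IF {attribute} = {value} THEN play = {label}")
print("training errors:", errors)
```

    On this table the learner selects `outlook`, because its three rules classify every row correctly, whereas rules based on `windy` misclassify two rows.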

    Read more →
  • Neurorobotics

    Neurorobotics

    Neurorobotics is the combined study of neuroscience, robotics, and artificial intelligence. It is the science and technology of embodied autonomous neural systems. Neural systems include brain-inspired algorithms (e.g. connectionist networks), computational models of biological neural networks (e.g. artificial spiking neural networks, large-scale simulations of neural microcircuits) and actual biological systems (e.g. in vivo and in vitro neural nets). Such neural systems can be embodied in machines with mechanical or any other form of physical actuation. This includes robots, prosthetic or wearable systems but also, at smaller scales, micro-machines and, at larger scales, furniture and infrastructures. Neurorobotics is the branch of neuroscience combined with robotics that deals with the study and application of the science and technology of embodied autonomous neural systems, such as brain-inspired algorithms. It is based on the idea that the brain is embodied and the body is embedded in the environment. Therefore, most neurorobots are required to function in the real world, as opposed to a simulated environment. Beyond brain-inspired algorithms for robots, neurorobotics may also involve the design of brain-controlled robot systems.
    == Major classes of models ==
    Neurorobots can be divided into various major classes based on the robot's purpose. Each class is designed to implement a specific mechanism of interest for study. Common types of neurorobots are those used to study motor control, memory, action selection, and perception.
    === Locomotion and motor control ===
    Neurorobots are often used to study motor feedback and control systems, and have proved their merit in developing controllers for robots. Locomotion is modeled by a number of neurologically inspired theories on the action of motor systems.
Locomotion control has been mimicked using models of central pattern generators, clumps of neurons capable of driving repetitive behavior, to make four-legged walking robots. Other groups have expanded the idea of combining rudimentary control systems into a hierarchical set of simple autonomous systems. These systems can formulate complex movements from a combination of these rudimentary subsets. This theory of motor action is based on the organization of cortical columns, which progressively integrate simple sensory input into complex afferent signals, or break complex motor programs down into simple controls for each muscle fiber in efferent signals, forming a similar hierarchical structure. Another method for motor control uses learned error correction and predictive controls to form a sort of simulated muscle memory. In this model, awkward, random, and error-prone movements are corrected using error feedback to produce smooth and accurate movements over time. The controller learns to create the correct control signal by predicting the error. Using these ideas, robots have been designed which can learn to produce adaptive arm movements or to avoid obstacles in a course.
=== Learning and memory systems ===
Robots have also been designed to test theories of animal memory systems. Many studies examine the memory system of rats, particularly the rat hippocampus, dealing with place cells, which fire for a specific location that has been learned. Systems modeled after the rat hippocampus are generally able to learn mental maps of the environment, including recognizing landmarks and associating behaviors with them, allowing them to predict upcoming obstacles and landmarks. Another study has produced a robot based on the proposed learning paradigm of barn owls for orientation and localization based primarily on auditory, but also visual, stimuli.
The hypothesized method involves synaptic plasticity and neuromodulation, a mostly chemical effect in which reward neurotransmitters such as dopamine or serotonin sharpen the firing sensitivity of a neuron. The robot used in the study adequately matched the behavior of barn owls. Furthermore, the close interaction between motor output and auditory feedback proved to be vital in the learning process, supporting active sensing theories that are involved in many of the learning models. Neurorobots in these studies are presented with simple mazes or patterns to learn. Some of the problems presented to the neurorobot include recognizing symbols, colors, or other patterns and executing simple actions based on the pattern. In the case of the barn owl simulation, the robot had to determine its location and direction to navigate in its environment.
=== Action selection and value systems ===
Action selection studies deal with assigning a negative or positive weighting to an action and its outcome. Neurorobots can be, and have been, used to study simple ethical interactions, such as the classical thought experiment where there are more people than a life raft can hold, and someone must leave the boat to save the rest. However, more neurorobots used in the study of action selection contend with much simpler persuasions such as self-preservation or perpetuation of the population of robots in the study. These neurorobots are modeled after the neuromodulation of synapses to encourage circuits with positive results. In biological systems, neurotransmitters such as dopamine or acetylcholine positively reinforce neural signals that are beneficial. One study of such interaction involved the robot Darwin VII, which used visual, auditory, and simulated taste input to "eat" conductive metal blocks. The arbitrarily chosen good blocks had a striped pattern on them while the bad blocks had a circular shape on them. The taste sense was simulated by the conductivity of the blocks.
The robot received positive or negative feedback to the taste based on the block's level of conductivity. The researchers observed the robot to see how it learned its action selection behaviors based on the inputs it had. Other studies have used herds of small robots which feed on batteries strewn about the room, and communicate their findings to other robots.
=== Sensory perception ===
Neurorobots have also been used to study sensory perception, particularly vision. These are primarily systems that result from embedding neural models of sensory pathways in automata. This approach gives exposure to the sensory signals that occur during behavior and also enables a more realistic assessment of the degree of robustness of the neural model. It is well known that changes in the sensory signals produced by motor activity provide useful perceptual cues that are used extensively by organisms. For example, researchers have used the depth information that emerges during replication of human head and eye movements to establish robust representations of the visual scene.
== Biological robots ==
Biological robots are not officially neurorobots in that they are not neurologically inspired AI systems, but actual neuron tissue wired to a robot. This approach uses cultured neural networks to study brain development or neural interactions. These typically consist of a neural culture raised on a multielectrode array (MEA), which is capable of both recording the neural activity and stimulating the tissue. In some cases, the MEA is connected to a computer which presents a simulated environment to the brain tissue and translates brain activity into actions in the simulation, as well as providing sensory feedback. The ability to record neural activity gives researchers a window into a brain, which they can use to learn about a number of the same issues neurorobots are used for. An area of concern with biological robots is ethics. Many questions are raised about how to treat such experiments.
The central question concerns consciousness and whether or not the rat brain experiences it. There are many theories about how to define consciousness.
== Implications for neuroscience ==
Neuroscientists benefit from neurorobotics because it provides a blank slate on which to test various possible methods of brain function in a controlled and testable environment. While robots are more simplified versions of the systems they emulate, they are more specific, allowing more direct testing of the issue at hand. They also have the benefit of being accessible at all times, whereas it is more difficult to monitor large portions of a brain, and especially individual neurons, while the human or animal is active. The development of neuroscience has produced neural treatments. These include pharmaceuticals and neural rehabilitation. Progress is dependent on an intricate understanding of the brain and how exactly it functions. It is difficult to study the brain, especially in humans, due to the danger associated with cranial surgeries. Neurorobots can improve the range of tests and experiments that can be performed in the study of neural processes.

    Read more →