CuckooChess

CuckooChess

CuckooChess is an advanced free and open-source chess engine under the GNU General Public License written in Java by Peter Österlund. CuckooChess provides an own GUI, and optionally supports the Universal Chess Interface protocol for the use with external GUIs such as Arena. An Android port is available, where its GUI is also based on Peter Österlund's Stockfish port dubbed DroidFish. The program uses the Chess Cases chess font, created by Matthieu Leschemelle. The name CuckooChess comes due that the transposition table is based on Cuckoo hashing. Android app based chess gaming app Droidfish employs both CuckooChess and Stockfish chess engines. Similarly, Kickstarter funded AI based virtual reality chess game Square Off also uses CuckooChess engine. It has an ELO rating of 2583 (as of July 2018) and a rank of 135‑137 in the Computer Chess Rating List.

Collaboration-oriented architecture

Collaboration Oriented Architecture (COA) is a computer system that is designed to collaborate, or use services, from systems that are outside of the operators control. Collaboration Oriented Architecture will often use Service Oriented Architecture to deliver the technical framework. Collaboration Oriented Architecture is the ability to collaborate between systems that are based on the Jericho Forum principles or "Commandments". Bill Gates and Craig Mundie (Microsoft) clearly articulated the need for people to work outside of their organizations in a secure and collaborative manner in their opening keynote to the RSA Security Conference in February 2007. Successful implementation of a Collaboration Oriented Architecture implies the ability to successfully inter-work securely over the Internet and will typically mean the resolution of the problems that come with de-perimeterisation. == Etymology == The term Collaboration Oriented Architectures was defined and developed in a meeting of the Jericho Forum at a meeting held at HSBC on 6 July 2007. == Definition == The key elements that qualify a security architecture as a Collaboration Oriented Architecture are as follows; Protocol: Systems use appropriately secure protocols to communicate. Authentication: The protocol is authenticated with user and/or system credentials. Federation: User and/or systems credentials are accepted and validated by systems that are not under your (locus of) control. Network Agnostic: The design does not rely on a secure network, thus it will operate securely from an Intranet to raw-Internet Trust: The collaborating system have the capacity to be able to confirm to a specified degree of confidence that the components in a transaction chain have. Risk: The collaborating systems can make a risk assessment on any transaction based on the communicated levels of required trust, based on the required degree of identity, confidentiality, integrity, availability. == Authentication == Working in a collaborative multi-sourced environment implies the need for authentication, authorization and accountability which must interoperate / exchange outside of your locus / area of control. People/systems must be able to manage permissions of resources and rights of users they don't control There must be capability of trusting an organization, which can authenticate individuals or groups, thus eliminating the need to create separate identities In principle, only one instance of person / system / identity may exist, but privacy necessitates the support for multiple instances, or one instance with multiple facets, often referred to as personas Systems must be able to pass on security credentials /assertions Multiple loci (areas) of control must be supported

Curriculum learning

Curriculum learning is a technique in machine learning in which a model is trained on examples of increasing difficulty, where the definition of "difficulty" may be provided externally or discovered as part of the training process. This is intended to attain good performance more quickly, or to converge to a better local optimum if the global optimum is not found. == Approach == Most generally, curriculum learning is the technique of successively increasing the difficulty of examples in the training set that is presented to a model over multiple training iterations. This can produce better results than exposing the model to the full training set immediately under some circumstances; most typically, when the model is able to learn general principles from easier examples, and then gradually incorporate more complex and nuanced information as harder examples are introduced, such as edge cases. This has been shown to work in many domains, most likely as a form of regularization. There are several major variations in how the technique is applied: A concept of "difficulty" must be defined. This may come from human annotation or an external heuristic; for example in language modeling, shorter sentences might be classified as easier than longer ones. Another approach is to use the performance of another model, with examples accurately predicted by that model being classified as easier (providing a connection to boosting). Difficulty can be increased steadily or in distinct epochs, and in a deterministic schedule or according to a probability distribution. This may also be moderated by a requirement for diversity at each stage, in cases where easier examples are likely to be disproportionately similar to each other. Applications must also decide the schedule for increasing the difficulty. Simple approaches may use a fixed schedule, such as training on easy examples for half of the available iterations and then all examples for the second half. Other approaches use self-paced learning to increase the difficulty in proportion to the performance of the model on the current set. Since curriculum learning only concerns the selection and ordering of training data, it can be combined with many other techniques in machine learning. The success of the method assumes that a model trained for an easier version of the problem can generalize to harder versions, so it can be seen as a form of transfer learning. Some authors also consider curriculum learning to include other forms of progressively increasing complexity, such as increasing the number of model parameters. It is frequently combined with reinforcement learning, such as learning a simplified version of a game first. Some domains have shown success with anti-curriculum learning: training on the most difficult examples first. One example is the ACCAN method for speech recognition, which trains on the examples with the lowest signal-to-noise ratio first. == History == The term "curriculum learning" was introduced by Yoshua Bengio et al in 2009, with reference to the psychological technique of shaping in animals and structured education for humans: beginning with the simplest concepts and then building on them. The authors also note that the application of this technique in machine learning has its roots in the early study of neural networks such as Jeffrey Elman's 1993 paper Learning and development in neural networks: the importance of starting small. Bengio et al showed good results for problems in image classification, such as identifying geometric shapes with progressively more complex forms, and language modeling, such as training with a gradually expanding vocabulary. They conclude that, for curriculum strategies, "their beneficial effect is most pronounced on the test set", suggesting good generalization. The technique has since been applied to many other domains: Natural language processing: Part-of-speech tagging Intent detection Sentiment analysis Machine translation Speech recognition Language model pre-training Image recognition: Facial recognition Object detection Reinforcement learning: Game-playing Graph learning Matrix factorization

Sammon mapping

Sammon mapping or Sammon projection is an algorithm that maps a high-dimensional space to a space of lower dimensionality (see multidimensional scaling) by trying to preserve the structure of inter-point distances in high-dimensional space in the lower-dimension projection. It is particularly suited for use in exploratory data analysis. The method was proposed by John W. Sammon in 1969. It is considered a non-linear approach as the mapping cannot be represented as a linear combination of the original variables as possible in techniques such as principal component analysis, which also makes it more difficult to use for classification applications. Denote the distance between ith and jth objects in the original space by d i j ∗ {\displaystyle \scriptstyle d_{ij}^{}} , and the distance between their projections by d i j {\displaystyle \scriptstyle d_{ij}^{}} . Sammon's mapping aims to minimize the following error function, which is often referred to as Sammon's stress or Sammon's error: E = 1 ∑ i < j d i j ∗ ∑ i < j ( d i j ∗ − d i j ) 2 d i j ∗ . {\displaystyle E={\frac {1}{\sum \limits _{i

Autoencoder

An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data (unsupervised learning). An autoencoder learns two functions: an encoding function that transforms the input data, and a decoding function that recreates the input data from the encoded representation. The autoencoder learns an efficient representation (encoding) for a set of data, typically for dimensionality reduction, to generate lower-dimensional embeddings for subsequent use by other machine learning algorithms. Variants exist which aim to make the learned representations assume useful properties. Examples are regularized autoencoders (sparse, denoising and contractive autoencoders), which are effective in learning representations for subsequent classification tasks, and variational autoencoders, which can be used as generative models. Autoencoders are applied to many problems, including facial recognition, feature detection, anomaly detection, and learning the meaning of words. In terms of data synthesis, autoencoders can also be used to randomly generate new data that is similar to the input (training) data. == Mathematical principles == === Definition === An autoencoder is defined by the following components: Two sets: the space of encoded messages Z {\displaystyle {\mathcal {Z}}} ; the space of decoded messages X {\displaystyle {\mathcal {X}}} . Typically X {\displaystyle {\mathcal {X}}} and Z {\displaystyle {\mathcal {Z}}} are Euclidean spaces, that is, X = R m , Z = R n {\displaystyle {\mathcal {X}}=\mathbb {R} ^{m},{\mathcal {Z}}=\mathbb {R} ^{n}} with m > n . {\displaystyle m>n.} Two parametrized families of functions: the encoder family E ϕ : X → Z {\displaystyle E_{\phi }:{\mathcal {X}}\rightarrow {\mathcal {Z}}} , parametrized by ϕ {\displaystyle \phi } ; the decoder family D θ : Z → X {\displaystyle D_{\theta }:{\mathcal {Z}}\rightarrow {\mathcal {X}}} , parametrized by θ {\displaystyle \theta } .For any x ∈ X {\displaystyle x\in {\mathcal {X}}} , we usually write z = E ϕ ( x ) {\displaystyle z=E_{\phi }(x)} , and refer to it as the code, the latent variable, latent representation, latent vector, etc. Conversely, for any z ∈ Z {\displaystyle z\in {\mathcal {Z}}} , we usually write x ′ = D θ ( z ) {\displaystyle x'=D_{\theta }(z)} , and refer to it as the (decoded) message. Usually, both the encoder and the decoder are defined as multilayer perceptrons (MLPs). For example, a one-layer-MLP encoder E ϕ {\displaystyle E_{\phi }} is: E ϕ ( x ) = σ ( W x + b ) {\displaystyle E_{\phi }(\mathbf {x} )=\sigma (Wx+b)} where σ {\displaystyle \sigma } is an element-wise activation function, W {\displaystyle W} is a "weight" matrix, and b {\displaystyle b} is a "bias" vector. === Training an autoencoder === An autoencoder, by itself, is simply a tuple of two functions. To judge its quality, we need a task. A task is defined by a reference probability distribution μ r e f {\displaystyle \mu _{ref}} over X {\displaystyle {\mathcal {X}}} , and a "reconstruction quality" function d : X × X → [ 0 , ∞ ] {\displaystyle d:{\mathcal {X}}\times {\mathcal {X}}\to [0,\infty ]} , such that d ( x , x ′ ) {\displaystyle d(x,x')} measures how much x ′ {\displaystyle x'} differs from x {\displaystyle x} . With those, we can define the loss function for the autoencoder as L ( θ , ϕ ) := E x ∼ μ r e f [ d ( x , D θ ( E ϕ ( x ) ) ) ] {\displaystyle L(\theta ,\phi ):=\mathbb {\mathbb {E} } _{x\sim \mu _{ref}}[d(x,D_{\theta }(E_{\phi }(x)))]} The optimal autoencoder for the given task ( μ r e f , d ) {\displaystyle (\mu _{ref},d)} is then arg ⁡ min θ , ϕ L ( θ , ϕ ) {\displaystyle \arg \min _{\theta ,\phi }L(\theta ,\phi )} . The search for the optimal autoencoder can be accomplished by any mathematical optimization technique, but usually by gradient descent. This search process is referred to as "training the autoencoder". In most situations, the reference distribution is just the empirical distribution given by a dataset { x 1 , . . . , x N } ⊂ X {\displaystyle \{x_{1},...,x_{N}\}\subset {\mathcal {X}}} , so that μ r e f = 1 N ∑ i = 1 N δ x i {\displaystyle \mu _{ref}={\frac {1}{N}}\sum _{i=1}^{N}\delta _{x_{i}}} where δ x i {\displaystyle \delta _{x_{i}}} is the Dirac measure, the quality function is just L 2 {\displaystyle L^{2}} loss: d ( x , x ′ ) = ‖ x − x ′ ‖ 2 2 {\displaystyle d(x,x')=\|x-x'\|_{2}^{2}} , and ‖ ⋅ ‖ 2 {\displaystyle \|\cdot \|_{2}} is the Euclidean norm. Then the problem of searching for the optimal autoencoder is just a least-squares optimization: min θ , ϕ L ( θ , ϕ ) , where L ( θ , ϕ ) = 1 N ∑ i = 1 N ‖ x i − D θ ( E ϕ ( x i ) ) ‖ 2 2 {\displaystyle \min _{\theta ,\phi }L(\theta ,\phi ),\qquad {\text{where }}L(\theta ,\phi )={\frac {1}{N}}\sum _{i=1}^{N}\|x_{i}-D_{\theta }(E_{\phi }(x_{i}))\|_{2}^{2}} === Interpretation === An autoencoder has two main parts: an encoder that maps the message to a code, and a decoder that reconstructs the message from the code. An optimal autoencoder would perform as close to perfect reconstruction as possible, with "close to perfect" defined by the reconstruction quality function d {\displaystyle d} . The simplest way to perform the copying task perfectly would be to duplicate the signal. To suppress this behavior, the code space Z {\displaystyle {\mathcal {Z}}} usually has fewer dimensions than the message space X {\displaystyle {\mathcal {X}}} . Such an autoencoder is called undercomplete. It can be interpreted as compressing the message, or reducing its dimensionality. At the limit of an ideal undercomplete autoencoder, every possible code z {\displaystyle z} in the code space is used to encode a message x {\displaystyle x} that really appears in the distribution μ r e f {\displaystyle \mu _{ref}} , and the decoder is also perfect: D θ ( E ϕ ( x ) ) = x {\displaystyle D_{\theta }(E_{\phi }(x))=x} . This ideal autoencoder can then be used to generate messages indistinguishable from real messages, by feeding its decoder arbitrary code z {\displaystyle z} and obtaining D θ ( z ) {\displaystyle D_{\theta }(z)} , which is a message that really appears in the distribution μ r e f {\displaystyle \mu _{ref}} . If the code space Z {\displaystyle {\mathcal {Z}}} has dimension larger than (overcomplete), or equal to, the message space X {\displaystyle {\mathcal {X}}} , or the hidden units are given enough capacity, an autoencoder can learn the identity function and become useless. However, experimental results found that overcomplete autoencoders might still learn useful features. In the ideal setting, the code dimension and the model capacity could be set on the basis of the complexity of the data distribution to be modeled. A standard way to do so is to add modifications to the basic autoencoder, to be detailed below. == Variations == === Variational autoencoder (VAE) === Variational autoencoders (VAEs) belong to the families of variational Bayesian methods. Despite the architectural similarities with basic autoencoders, VAEs are architected with different goals and have a different mathematical formulation. The latent space is, in this case, composed of a mixture of distributions instead of fixed vectors. Given an input dataset x {\displaystyle x} characterized by an unknown probability function P ( x ) {\displaystyle P(x)} and a multivariate latent encoding vector z {\displaystyle z} , the objective is to model the data as a distribution p θ ( x ) {\displaystyle p_{\theta }(x)} , with θ {\displaystyle \theta } defined as the set of the network parameters so that p θ ( x ) = ∫ z p θ ( x , z ) d z {\displaystyle p_{\theta }(x)=\int _{z}p_{\theta }(x,z)dz} . === Sparse autoencoder (SAE) === Inspired by the sparse coding hypothesis in neuroscience, sparse autoencoders (SAE) are variants of autoencoders, such that the codes E ϕ ( x ) {\displaystyle E_{\phi }(x)} for messages tend to be sparse codes, that is, E ϕ ( x ) {\displaystyle E_{\phi }(x)} is close to zero in most entries. Sparse autoencoders may include more (rather than fewer) hidden units than inputs, but only a small number of the hidden units are allowed to be active at the same time. Encouraging sparsity improves performance on classification tasks. There are two main ways to enforce sparsity. One way is to simply clamp all but the highest-k activations of the latent code to zero. This is the k-sparse autoencoder. The k-sparse autoencoder inserts the following "k-sparse function" in the latent layer of a standard autoencoder: f k ( x 1 , . . . , x n ) = ( x 1 b 1 , . . . , x n b n ) {\displaystyle f_{k}(x_{1},...,x_{n})=(x_{1}b_{1},...,x_{n}b_{n})} where b i = 1 {\displaystyle b_{i}=1} if | x i | {\displaystyle |x_{i}|} ranks in the top k, and 0 otherwise. Backpropagating through f k {\displaystyle f_{k}} is simple: set gradient to 0 for b i = 0 {\displaystyle b_{i}=0} entries, and keep gradient for b i = 1 {\displaystyle b_{i}=1} entries. This is essentially a generalized ReLU function. The other way is a relaxed version of the k-

Autonomous things

Autonomous things, abbreviated AuT, or the Internet of autonomous things, abbreviated as IoAT, is an emerging term for the technological developments that are expected to bring computers into the physical environment as autonomous entities without human direction, freely moving and interacting with humans and other objects. Self-navigating drones are the first AuT technology in (limited) deployment. It is expected that the first mass-deployment of AuT technologies will be the autonomous car, generally expected to be available around 2020. Other currently expected AuT technologies include home robotics (e.g., machines that provide care for the elderly, infirm or young), and military robots (air, land or sea autonomous machines with information-collection or target-attack capabilities). AuT technologies share many common traits, which justify the common notation. They are all based on recent breakthroughs in the domains of (deep) machine learning and artificial intelligence. They all require extensive and prompt regulatory developments to specify the requirements from them and to license and manage their deployment (see the further reading below). And they all require unprecedented levels of safety (e.g., automobile safety) and security, to overcome concerns about the potential negative impact of the new technology. As an example, the autonomous car both addresses the main existing safety issues and creates new issues. It is expected to be much safer than existing vehicles, by eliminating the single most dangerous element – the driver. The US's National Highway Traffic Safety Administration estimates 94 percent of US accidents were the result of human error and poor decision-making, including speeding and impaired driving, and the Center for Internet and Society at Stanford Law School claims that "Some ninety percent of motor vehicle crashes are caused at least in part by human error". So while safety standards like the ISO 26262 specify the required safety, there is still a burden on the industry to demonstrate acceptable safety. While car accidents claim every year 35,000 lives in the US, and 1.25 million worldwide, some believe that even "a car that's 10 times as safe, which means 3,500 people die on the roads each year [in the US alone]" would not be accepted by the public. The acceptable level may be closer to the current figures on aviation accidents and incidents, with under a thousand worldwide deaths in most years – three orders of magnitude lower than cars. This underscores the unprecedented nature of the safety requirements that will need to be met for cars, with similar levels of safety expected for other Autonomous Things.

Sammon mapping

Sammon mapping or Sammon projection is an algorithm that maps a high-dimensional space to a space of lower dimensionality (see multidimensional scaling) by trying to preserve the structure of inter-point distances in high-dimensional space in the lower-dimension projection. It is particularly suited for use in exploratory data analysis. The method was proposed by John W. Sammon in 1969. It is considered a non-linear approach as the mapping cannot be represented as a linear combination of the original variables as possible in techniques such as principal component analysis, which also makes it more difficult to use for classification applications. Denote the distance between ith and jth objects in the original space by d i j ∗ {\displaystyle \scriptstyle d_{ij}^{}} , and the distance between their projections by d i j {\displaystyle \scriptstyle d_{ij}^{}} . Sammon's mapping aims to minimize the following error function, which is often referred to as Sammon's stress or Sammon's error: E = 1 ∑ i < j d i j ∗ ∑ i < j ( d i j ∗ − d i j ) 2 d i j ∗ . {\displaystyle E={\frac {1}{\sum \limits _{i