Kernel method

In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best-known member is the support-vector machine (SVM). These methods use linear classifiers to solve nonlinear problems. The general task of pattern analysis is to find and study general types of relations (for example clusters, rankings, principal components, correlations, classifications) in datasets. For many algorithms that solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a user-specified feature map; in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over all pairs of data points computed using inner products. The feature map in kernel machines may be infinite-dimensional, but by the representer theorem the user only has to supply a finite-dimensional matrix of kernel values. Without parallel processing, kernel machines are slow to compute for datasets larger than a couple of thousand examples.

Kernel methods owe their name to the use of kernel functions, which enable them to operate in a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space, but rather by simply computing the inner products between the images of all pairs of data in the feature space. This operation is often computationally cheaper than the explicit computation of the coordinates. This approach is called the "kernel trick". Kernel functions have been introduced for sequence data, graphs, text, and images, as well as vectors.

Algorithms capable of operating with kernels include the kernel perceptron, support-vector machines (SVM), Gaussian processes, principal components analysis (PCA), canonical correlation analysis, ridge regression, spectral clustering, linear adaptive filters and many others. Most kernel algorithms are based on convex optimization or eigenproblems and are statistically well-founded.
Typically, their statistical properties are analyzed using statistical learning theory (for example, using Rademacher complexity).

== Motivation and informal explanation ==

Kernel methods can be thought of as instance-based learners: rather than learning some fixed set of parameters corresponding to the features of their inputs, they instead "remember" the $i$-th training example $(\mathbf{x}_i, y_i)$ and learn for it a corresponding weight $w_i$. Predictions for unlabeled inputs, i.e., those not in the training set, are made by applying a similarity function $k$, called a kernel, between the unlabeled input $\mathbf{x'}$ and each of the training inputs $\mathbf{x}_i$. For instance, a kernelized binary classifier typically computes a weighted sum of similarities

$$\hat{y} = \operatorname{sgn} \sum_{i=1}^{n} w_i y_i k(\mathbf{x}_i, \mathbf{x'}),$$

where $\hat{y} \in \{-1,+1\}$ is the kernelized binary classifier's predicted label for the unlabeled input $\mathbf{x'}$ whose hidden true label $y$ is of interest; $k\colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is the kernel function that measures similarity between any pair of inputs $\mathbf{x}, \mathbf{x'} \in \mathcal{X}$; the sum ranges over the $n$ labeled examples $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ in the classifier's training set, with $y_i \in \{-1,+1\}$; the $w_i \in \mathbb{R}$ are the weights for the training examples, as determined by the learning algorithm; the sign function $\operatorname{sgn}$ determines whether the predicted
classification $\hat{y}$ comes out positive or negative.

Kernel classifiers were described as early as the 1960s, with the invention of the kernel perceptron. They rose to great prominence with the popularity of the support-vector machine (SVM) in the 1990s, when the SVM was found to be competitive with neural networks on tasks such as handwriting recognition.

== Mathematics: the kernel trick ==

The kernel trick avoids the explicit mapping that is needed to get linear learning algorithms to learn a nonlinear function or decision boundary. For all $\mathbf{x}$ and $\mathbf{x'}$ in the input space $\mathcal{X}$, certain functions $k(\mathbf{x}, \mathbf{x'})$ can be expressed as an inner product in another space $\mathcal{V}$. The function $k\colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is often referred to as a kernel or a kernel function. The word "kernel" is used in mathematics to denote a weighting function for a weighted sum or integral.

Certain problems in machine learning have more structure than an arbitrary weighting function $k$. The computation is made much simpler if the kernel can be written in the form of a "feature map" $\varphi\colon \mathcal{X} \to \mathcal{V}$ which satisfies

$$k(\mathbf{x}, \mathbf{x'}) = \langle \varphi(\mathbf{x}), \varphi(\mathbf{x'}) \rangle_{\mathcal{V}}.$$

The key restriction is that $\langle \cdot, \cdot \rangle_{\mathcal{V}}$ must be a proper inner product. On the other hand, an explicit representation for $\varphi$ is not necessary, as long as $\mathcal{V}$ is an inner product space.
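As a small illustration of the identity $k(\mathbf{x}, \mathbf{x'}) = \langle \varphi(\mathbf{x}), \varphi(\mathbf{x'}) \rangle_{\mathcal{V}}$, the sketch below uses the homogeneous degree-2 polynomial kernel on the plane, whose feature map can be written down by hand. The function names (`phi`, `poly2_kernel`) and the sample points are illustrative choices, not part of any library.

```python
import math


def dot(u, v):
    """Ordinary Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map R^2 -> R^3 for the kernel k(x, z) = <x, z>^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """Same quantity computed in the input space, without forming phi."""
    return dot(x, z) ** 2


x = (1.0, 2.0)
z = (3.0, -1.0)

explicit = dot(phi(x), phi(z))   # inner product in the feature space V
implicit = poly2_kernel(x, z)    # kernel evaluated in the input space X

print(explicit, implicit)
```

Both numbers agree up to floating-point rounding. For kernels such as the Gaussian radial basis function, the corresponding feature space is infinite-dimensional, so only the kernel-side computation is practical.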
The alternative follows from Mercer's theorem: an implicitly defined function $\varphi$ exists whenever the space $\mathcal{X}$ can be equipped with a suitable measure ensuring the function $k$ satisfies Mercer's condition.

Mercer's theorem is similar to a generalization of the result from linear algebra that associates an inner product to any positive-definite matrix. In fact, Mercer's condition can be reduced to this simpler case. If we choose as our measure the counting measure $\mu(T) = |T|$ for all $T \subset \mathcal{X}$, which counts the number of points inside the set $T$, then the integral in Mercer's theorem reduces to a summation

$$\sum_{i=1}^{n} \sum_{j=1}^{n} k(\mathbf{x}_i, \mathbf{x}_j) c_i c_j \geq 0.$$

If this inequality holds for all finite sequences of points $(\mathbf{x}_1, \dotsc, \mathbf{x}_n)$ in $\mathcal{X}$ and all choices of $n$ real-valued coefficients $(c_1, \dots, c_n)$ (cf. positive definite kernel), then the function $k$ satisfies Mercer's condition.

Some algorithms that depend on arbitrary relationships in the native space $\mathcal{X}$ would, in fact, have a linear interpretation in a different setting: the range space of $\varphi$. The linear interpretation gives us insight about the algorithm. Furthermore, there is often no need to compute $\varphi$ directly during computation, as is the case with support-vector machines. Some cite this running time shortcut as the primary benefit. Researchers also use it to justify the meanings and properties of existing algorithms.
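The finite double sum in Mercer's condition can be evaluated directly for a given set of points and coefficients. The sketch below does so for a Gaussian (RBF) kernel and for a negated Euclidean distance, which serves here as an example of a similarity-like function that violates the condition. All names, the `gamma` value and the sample points are illustrative assumptions. Random sampling of coefficients can only expose violations; it does not prove that a kernel satisfies the condition.

```python
import math
import random


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian kernel exp(-gamma * ||x - z||^2); gamma is an arbitrary choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def neg_distance(x, z):
    """Negated Euclidean distance: similarity-like, but not a Mercer kernel."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))


def quadratic_form(kernel, points, coeffs):
    """Evaluate sum_i sum_j k(x_i, x_j) * c_i * c_j."""
    return sum(
        ci * cj * kernel(xi, xj)
        for xi, ci in zip(points, coeffs)
        for xj, cj in zip(points, coeffs)
    )


def min_sampled_form(kernel, points, trials=200, seed=0):
    """Smallest value of the quadratic form over randomly drawn coefficients."""
    rng = random.Random(seed)
    return min(
        quadratic_form(kernel, points, [rng.uniform(-1.0, 1.0) for _ in points])
        for _ in range(trials)
    )


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.5, 1.5)]

rbf_min = min_sampled_form(rbf_kernel, points)
violation = quadratic_form(neg_distance, points, [1.0, 1.0, 0.0, 0.0])

print(rbf_min, violation)
```

For the negated distance, the coefficients `[1, 1, 0, 0]` pick out two points at distance 1, so the form evaluates to a negative number and the condition fails.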
Theoretically, a Gram matrix $\mathbf{K} \in \mathbb{R}^{n \times n}$ with respect to $\{\mathbf{x}_1, \dotsc, \mathbf{x}_n\}$ (sometimes also called a "kernel matrix"), where $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$, must be positive semi-definite (PSD). Empirically, for machine learning heuristics, choices of a function $k$ that do not satisfy Mercer's condition may still perform reasonably if $k$ at least approximates the intuitive idea of similarity. Regardless of whether $k$ is a Mercer kernel, $k$ may still be referred to as a "kernel".

If the kernel function $k$ is also a covariance function as used in Gaussian processes, then the Gram matrix $\mathbf{K}$ can also be called a covariance matrix.

== Applications ==

Application areas of kernel methods are diverse and include geostatistics, kriging, inverse distance weighting, 3D reconstruction, bioinformatics, cheminformatics, information extraction and handwriting recognition.

== Popular kernels ==

Fisher kernel
Graph kernels
Kernel smoother
Polynomial kernel
Radial basis function kern

Visible (mobile app)

Visible is a health tracking mobile app for people with long COVID and myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). The company was founded by Harry Leeming, an engineer from London who has been living with long COVID since 2020, and Luke Martin-Fuller. In November 2022, Visible released an open beta of an app that aims to help people pace their activities to avoid post-exertional malaise. The app gathers data on exertion levels, symptom severity, and heart-rate variability (HRV). HRV is approximated using a smartphone's camera via a technique called photoplethysmography and, according to the app's developers, can indicate how much rest someone needs. The app is currently free, but is expected to be freemium in the future. Users can also opt to allow their data to be used for research purposes. In July 2023, Visible and Imperial College London announced the start of the first two studies. One is on the effects of the menstrual cycle on long COVID symptoms, and the other is on the condition's epidemiology and economic impact. Visible has announced plans to couple the app with activity trackers for continuous monitoring of heart-rate and actimetry data, which the developers claim will be more effective. As of 2022, no clinical trials on Visible's effectiveness have been conducted.

Empowerment (artificial intelligence)

Empowerment in the field of artificial intelligence formalises and quantifies (via information theory) the potential an agent perceives that it has to influence its environment. An agent that follows an empowerment-maximising policy acts to maximise future options (typically up to some limited horizon). Empowerment can be used as a (pseudo) utility function that depends only on information gathered from the local environment to guide action, rather than seeking an externally imposed goal, and is thus a form of intrinsic motivation.

The empowerment formalism depends on a probabilistic model commonly used in artificial intelligence. An autonomous agent operates in the world by taking in sensory information and acting to change its state, or that of the environment, in a cycle of perceiving and acting known as the perception-action loop. Agent state and actions are modelled by random variables ($S: s \in \mathcal{S}$, $A: a \in \mathcal{A}$) and time ($t$). The choice of action depends on the current state, and the future state depends on the choice of action, so the perception-action loop unrolled in time forms a causal Bayesian network.

== Definition ==

Empowerment ($\mathfrak{E}$) is defined as the channel capacity ($C$) of the actuation channel of the agent, and is formalised as the maximal possible information flow between the actions of the agent and the effect of those actions some time later. Empowerment can be thought of as the future potential of the agent to affect its environment, as measured by its sensors.

$$\mathfrak{E} := C(A_t \longrightarrow S_{t+1}) \equiv \max_{p(a_t)} I(A_t; S_{t+1})$$

In a discrete time model, empowerment can be computed for a given number of cycles into the future, which is referred to in the literature as 'n-step' empowerment.
$$\mathfrak{E}(A_t^n \longrightarrow S_{t+n}) = \max_{p(a_t, \dots, a_{t+n-1})} I(A_t, \dots, A_{t+n-1}; S_{t+n})$$

The unit of empowerment depends on the logarithm base. Base 2 is commonly used, in which case the unit is bits.

=== Contextual Empowerment ===

In general, the choice of action (action distribution) that maximises empowerment varies from state to state. Knowing the empowerment of an agent in a specific state is useful, for example to construct an empowerment-maximising policy. State-specific empowerment can be found using the more general formalism for 'contextual empowerment'. $C$ is a random variable describing the context (e.g. state).

$$\mathfrak{E}(A_t^n \longrightarrow S_{t+n} \mid C) = \sum_{c \in C} p(c)\, \mathfrak{E}(A_t^n \longrightarrow S_{t+n} \mid C = c)$$

== Application ==

Empowerment maximisation can be used as a pseudo-utility function to enable agents to exhibit intelligent behaviour without requiring the definition of external goals, for example balancing a pole in a cart-pole balancing scenario where no indication of the task is provided to the agent. Empowerment has been applied in studies of collective behaviour and in continuous domains. As is the case with Bayesian methods in general, computing empowerment becomes expensive as the number of actions and the time horizon grow, but approaches to improve efficiency have led to usage in real-time control. Empowerment has been used for intrinsically motivated reinforcement learning agents playing video games, and in the control of underwater vehicles.
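Since empowerment is defined as a channel capacity, it can be estimated numerically for a small discrete actuation channel. The sketch below assumes the channel is given as a table `channel[a][s]` holding $p(s_{t+1} \mid a_t)$ and uses the Blahut-Arimoto iteration, a standard capacity algorithm. The article does not say which method the empowerment literature uses, so the algorithm, the function names and the three toy channels are assumptions made for illustration.

```python
import math


def divergences(channel, p):
    """For each action a, KL(p(s'|a) || q(s')) in nats, where q is the
    distribution over next states induced by the action distribution p."""
    n_states = len(channel[0])
    q = [sum(p[a] * row[s] for a, row in enumerate(channel)) for s in range(n_states)]
    return [
        sum(row[s] * math.log(row[s] / q[s]) for s in range(n_states) if row[s] > 0.0)
        for row in channel
    ]


def empowerment_bits(channel, iterations=200):
    """Blahut-Arimoto estimate of max over p(a) of I(A; S'), in bits.
    Returns the estimate and the action distribution that attains it."""
    n_actions = len(channel)
    p = [1.0 / n_actions] * n_actions          # start from uniform actions
    for _ in range(iterations):
        d = divergences(channel, p)
        weights = [pa * math.exp(da) for pa, da in zip(p, d)]
        total = sum(weights)
        p = [w / total for w in weights]       # reweight towards informative actions
    nats = sum(pa * da for pa, da in zip(p, divergences(channel, p)))
    return nats / math.log(2.0), p


# Four actions, each leading deterministically to a distinct state.
open_room = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
# Four actions, two of which lead to the same state (e.g. blocked by a wall).
corner = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Two actions whose effect is flipped with probability 0.1.
noisy = [[0.9, 0.1], [0.1, 0.9]]

for name, ch in [("open_room", open_room), ("corner", corner), ("noisy", noisy)]:
    bits, _ = empowerment_bits(ch)
    print(name, round(bits, 4))
```

In the deterministic cases the estimate equals the base-2 logarithm of the number of distinct reachable states, which matches the reading of empowerment as a count of future options. Noise in the actuation channel lowers the value.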

Web intelligence

Web intelligence is the area of scientific research and development that explores the roles of, and makes use of, artificial intelligence and information technology for new products, services and frameworks that are empowered by the World Wide Web. The term was coined in a paper written by Ning Zhong, Jiming Liu, Y. Y. Yao and S. Ohsuga, presented at the Computer Software and Applications Conference in 2000.

== Research ==

Research on web intelligence covers many fields, including data mining (in particular web mining), information retrieval, pattern recognition, predictive analytics, the semantic web and web data warehousing, typically with a focus on web personalization and adaptive websites.

Outline of machine learning

The following outline is provided as an overview of, and topical guide to, machine learning:

Machine learning (ML) is a subfield of artificial intelligence within computer science that evolved from the study of pattern recognition and computational learning theory. In 1959, Arthur Samuel defined machine learning as a "field of study that gives computers the ability to learn without being explicitly programmed". ML involves the study and construction of algorithms that can learn from and make predictions on data. These algorithms operate by building a model from a training set of example observations to make data-driven predictions or decisions expressed as outputs, rather than following strictly static program instructions.

== How can machine learning be categorized? ==

An academic discipline
A branch of science
An applied science
A subfield of computer science
A branch of artificial intelligence
A subfield of soft computing
Application of statistics

=== Paradigms of machine learning ===

Supervised learning, where the model is trained on labeled data
Unsupervised learning, where the model tries to identify patterns in unlabeled data
Reinforcement learning, where the model learns to make decisions by receiving rewards or penalties
== Applications of machine learning ==

Applications of machine learning
Bioinformatics
Biomedical informatics
Computer vision
Customer relationship management
Data mining
Earth sciences
Email filtering
Inverted pendulum (balance and equilibrium system)
Natural language processing
Named Entity Recognition
Automatic summarization
Automatic taxonomy construction
Dialog system
Grammar checker
Language recognition
Handwriting recognition
Optical character recognition
Speech recognition
Text to Speech Synthesis
Speech Emotion Recognition
Machine translation
Question answering
Speech synthesis
Text mining
Term frequency–inverse document frequency
Text simplification
Pattern recognition
Facial recognition system
Handwriting recognition
Image recognition
Optical character recognition
Speech recognition
Recommendation system
Collaborative filtering
Content-based filtering
Hybrid recommender systems
Search engine
Search engine optimization
Social engineering

== Machine learning hardware ==

Graphics processing unit
Tensor processing unit
Vision processing unit

== Machine learning tools ==

Comparison of machine learning software
Comparison of deep learning software

=== Machine learning frameworks ===

==== Proprietary machine learning frameworks ====

Amazon Machine Learning
Microsoft Azure Machine Learning Studio
DistBelief (replaced by TensorFlow)

==== Open source machine learning frameworks ====

Apache Singa
Apache MXNet
Caffe
PyTorch
mlpack
TensorFlow
Torch
CNTK
Accord.Net
Jax
MLJ.jl – A machine learning framework for Julia

=== Machine learning libraries ===

Deeplearning4j
Theano
scikit-learn
Keras

=== Machine learning algorithms ===

== Machine learning methods ==

=== Instance-based algorithm ===

K-nearest neighbors algorithm (KNN)
Learning vector quantization (LVQ)
Self-organizing map (SOM)

=== Regression analysis ===

Logistic regression
Ordinary least squares regression (OLSR)
Linear regression
Stepwise regression
Multivariate adaptive regression splines (MARS)
Regularization algorithm
Ridge regression
Least Absolute Shrinkage and Selection Operator (LASSO)
Elastic net
Least-angle regression (LARS)
Classifiers
Probabilistic classifier
Naive Bayes classifier
Binary classifier
Linear classifier
Hierarchical classifier

=== Dimensionality reduction ===

Dimensionality reduction
Canonical correlation analysis (CCA)
Factor analysis
Feature extraction
Feature selection
Independent component analysis (ICA)
Linear discriminant analysis (LDA)
Multidimensional scaling (MDS)
Non-negative matrix factorization (NMF)
Partial least squares regression (PLSR)
Principal component analysis (PCA)
Principal component regression (PCR)
Projection pursuit
Sammon mapping
t-distributed stochastic neighbor embedding (t-SNE)

=== Ensemble learning ===

Ensemble learning
AdaBoost
Boosting
Bootstrap aggregating (also "bagging" or "bootstrapping")
Ensemble averaging
Gradient boosted decision tree (GBDT)
Gradient boosting
Random Forest
Stacked Generalization

=== Meta-learning ===

Meta-learning
Inductive bias
Metadata

=== Reinforcement learning ===

Reinforcement learning
Q-learning
State–action–reward–state–action (SARSA)
Temporal difference learning (TD)
Learning Automata

=== Supervised learning ===

Supervised learning
Averaged one-dependence estimators (AODE)
Artificial neural network
Case-based reasoning
Gaussian process regression
Gene expression programming
Group method of data handling (GMDH)
Inductive logic programming
Instance-based learning
Lazy learning
Learning Automata
Learning Vector Quantization
Logistic Model Tree
Minimum message length (decision trees, decision graphs, etc.)
Nearest Neighbor Algorithm
Analogical modeling
Probably approximately correct learning (PAC) learning
Ripple down rules, a knowledge acquisition methodology
Symbolic machine learning algorithms
Support vector machines
Random Forests
Ensembles of classifiers
Bootstrap aggregating (bagging)
Boosting (meta-algorithm)
Ordinal classification
Conditional Random Field
ANOVA
Quadratic classifiers
k-nearest neighbor
Boosting
SPRINT
Bayesian networks
Naive Bayes
Hidden Markov models
Hierarchical hidden Markov model

==== Bayesian ====

Bayesian statistics
Bayesian knowledge base
Naive Bayes
Gaussian Naive Bayes
Multinomial Naive Bayes
Averaged One-Dependence Estimators (AODE)
Bayesian Belief Network (BBN)
Bayesian Network (BN)

==== Decision tree algorithms ====

Decision tree algorithm
Decision tree
Classification and regression tree (CART)
Iterative Dichotomiser 3 (ID3)
C4.5 algorithm
C5.0 algorithm
Chi-squared Automatic Interaction Detection (CHAID)
Decision stump
Conditional decision tree
ID3 algorithm
Random forest
SLIQ

==== Linear classifier ====

Linear classifier
Fisher's linear discriminant
Linear regression
Logistic regression
Multinomial logistic regression
Naive Bayes classifier
Perceptron
Support vector machine

=== Unsupervised learning ===

Unsupervised learning
Expectation-maximization algorithm
Vector Quantization
Generative topographic map
Information bottleneck method
Association rule learning algorithms
Apriori algorithm
Eclat algorithm

==== Artificial neural networks ====

Artificial neural network
Feedforward neural network
Extreme learning machine
Convolutional neural network
Recurrent neural network
Long short-term memory (LSTM)
Logic learning machine
Self-organizing map

==== Association rule learning ====

Association rule learning
Apriori algorithm
Eclat algorithm
FP-growth algorithm

==== Hierarchical clustering ====

Hierarchical clustering
Single-linkage clustering
Conceptual clustering

==== Cluster analysis ====

Cluster analysis
BIRCH
DBSCAN
Expectation–maximization (EM)
Fuzzy clustering
Hierarchical clustering
k-means clustering
k-medians
Mean-shift
OPTICS algorithm

==== Anomaly detection ====

Anomaly detection
k-nearest neighbors algorithm (k-NN)
Local outlier factor

=== Semi-supervised learning ===

Semi-supervised learning
Active learning
Generative models
Low-density separation
Graph-based methods
Co-training
Transduction

=== Deep learning ===

Deep learning
Deep belief networks
Deep Boltzmann machines
Deep Convolutional neural networks
Deep Recurrent neural networks
Hierarchical temporal memory
Generative Adversarial Network
Style transfer
Transformer
Stacked Auto-Encoders

=== Other machine learning methods and problems ===

Anomaly detection
Association rules
Bias-variance dilemma
Classification
Multi-label classification
Clustering
Data Pre-processing
Empirical risk minimization
Feature engineering
Feature learning
Learning to rank
Occam learning
Online machine learning
PAC learning
Regression
Reinforcement Learning
Semi-supervised learning
Statistical learning
Structured prediction
Graphical models
Bayesian network
Conditional random field (CRF)
Hidden Markov model (HMM)
Unsupervised learning
VC theory

== Machine learning research ==

List of artificial intelligence projects
List of datasets for machine learning research

== History of machine learning ==

History of machine learning
Timeline of machine learning

== Machine learning projects ==

Machine learning projects:
DeepMind
Google Brain
OpenAI
Meta AI
Hugging Face

== Machine learning organizations ==

=== Machine learning conferences and workshops ===

Artificial Intelligence and Security (AISec) (co-located workshop with CCS)
Conference on Neural Information Processing Systems (NIPS)
ECML PKDD
International Conference on Machine Learning (ICML)
ML4ALL (Machine Learning For All)

== Machine learning publications ==

=== Books on machine learning ===

Mathematics for Machine Learning
Hands-On Machine Learning Scikit-Learn, Keras, and TensorFlow
The Hundred-Page Machine Learning Book

=== Machine learning journals ===

Machine Learning
Journal of Machine Learning Research (JMLR)
Neural Computation

== Pe

Process map

A process map is a global-system process model that is used to outline the processes that make up the business system and how they interact with each other. A process map shows the processes as objects, which means it is a static and non-algorithmic view of the processes. It should be differentiated from a detailed process model, which shows a dynamic and algorithmic view of the processes, usually known as a process flow diagram. There are different notation standards that can be used for modelling process maps, but the most notable ones are the TOGAF Event Diagram, Eriksson-Penker notation, and the ARIS Value Added Chain.

== Global process models ==

Global characteristics of the business system are captured by global or system models. Global process models are presented using different methodologies and sometimes under different names. Most notably, they are named process map in Visual Paradigm and MMABP, value-added chain in ARIS, and process diagram in Eriksson-Penker notation, which can easily lead to confusion with the process flow (detailed process model).

Global models are mainly object-oriented and present a static view of the business system; they do not describe dynamic aspects of processes. A process map shows the presence of processes and their mutual relationships. The need for a global perspective of the system, as a supplement to the description of the internal process logic, results from the necessity of taking into consideration not only the internal process logic but also its significant surroundings. The algorithmic process model cannot take the place of this perspective since it represents the system model of the process. The detailed process model and the global process model represent different perspectives on the same business system, so these models must be mutually consistent.

A macro process map represents the major processes required to deliver a product or service to the customer. These macro process maps can be further detailed in sub-diagrams.
Process maps often cross different functional areas of the organization. Many companies use process maps to obtain a holistic view of all processes and the connections between them. Maps help in navigating the sub-processes and make the organization's operations easier to understand. The process map shows relationships and dependencies between processes, and its focus should be on the core business processes of the organization.

A process map can be seen as the most abstract level of the process architecture, and it acts as the introduction to the more detailed levels. A correctly designed process map provides a general understanding of a company's operations. Designing the process map is an important and strategic step for the organization, and it is followed by further business process modelling implementation.

== Context ==

Methodology for Modelling and Analysis of Business Process (MMABP) is a business process modelling methodology developed at the Department of Information Technology, Faculty of Informatics and Statistics of the Prague University of Economics and Business. The methodology is defined as a “general methodology for modelling business systems using informatics methods and approaches”.

The methodology is used to analyse business processes and to develop a comprehensive model of the system. The model is developed so that it can be used for process optimization. It should be created following the characteristics and specifics of the organization in question and following external influences that can affect the organization. The model should be optimal from an economic perspective, but it should also be optimal from a factual perspective, meaning that it should be as simple as possible while maintaining complete functionality.
Business system modelling is based on a two-dimensional approach:

Real World structure (substance) – set of objects and their relationships
Real World behaviour – set of mutually connected business processes

Additionally, there are also two views of the systems:

Global view of the system
Detailed view of the system's parts

This results in the need to model the system from four different perspectives in order to achieve a complete and comprehensive view of the business system. MMABP also proposes which notation languages can be used for modelling each perspective, and it suggests some improvements to the notation languages in order to fit the purpose.

Global view of the objects – Conceptual model (Class diagram)
Detailed view of the objects – Object life cycle (State Chart)
Global view of the processes – Process map (Eriksson-Penker Diagram/TOGAF Event Diagram/ARIS VAC)
Detailed view of the processes – Model of the process flow (BPMN Diagram)

The Data Flow Diagram (DFD) is an additional diagram used for describing the required functionalities of the information system.

== Notation standards ==

=== Eriksson-Penker Diagram ===

The Eriksson-Penker diagram is a tool used in business model analysis and design. It is named after Hans-Erik Eriksson and Magnus Penker, who developed the concept in their book "Business modelling with UML: Business Patterns at Work". Eriksson-Penker diagrams are used to map out the key components of a business model and how they interact with one another.

The diagrams typically consist of a series of boxes and lines that represent the different elements of the business model, such as the value proposition, customer segments, channels, revenue streams, and key resources. The lines between the boxes represent the relationships and dependencies between the different elements of the business model.
These diagrams are useful for visualizing and understanding the various components of a business model, and can help organizations identify potential areas for improvement or areas of risk. They can also be used as a communication tool to help stakeholders understand the business model and its underlying assumptions.

It is possible to use Eriksson-Penker diagrams to create a global process view of a business. In this case, a diagram would be used to map out the key processes and activities that are involved in the business, as well as the relationships and dependencies between these processes.

For example, an Eriksson-Penker diagram could be used to depict the various steps involved in the product development process, from concept development to market launch. It could also be used to show how different functions within the organization, such as marketing, sales, and production, interact and depend on one another to support the overall business.

The Eriksson-Penker diagram is one of the most popular de facto standards that can be used for an object-oriented global view of business processes. It was developed as an extension of UML, and it is often used together with BPMN to compensate for the fact that this widely accepted standard cannot model the global view.

=== TOGAF Event Diagram ===

TOGAF (The Open Group Architecture Framework) is a framework for enterprise architecture that provides a common language and set of standards for designing, planning, implementing, and governing an enterprise's IT architecture. TOGAF event diagrams are diagrams used in the TOGAF framework to represent the flow of events within a system or process.
The TOGAF Event Diagram is a visual representation of the events within an organization or system. It can be used to show the sequence of events that occur in a particular process, as well as the relationships between the events and the stakeholders involved.

TOGAF Event Diagrams can be useful in creating a global process view because they provide a visual representation of the events, which can be helpful in understanding how the process fits into the larger context of the organization.

The TOGAF Event Diagram is the most promising standard for the system view of processes today. It is used to represent the system of processes as well as their connections to the functional organizational structure.

=== ARIS Value Added Chain ===

ARIS (Architecture of Integrated Information Systems) is a methodology and a set of tools for designing and managing business processes. It is based on the idea that business processes are the core of an organization and that they can be modelled and optimized to improve efficiency and effectiveness.

The ARIS methodology provides a framework for understanding and analysing business processes, as well as for designing and implementing improvements to those processes. It includes a set of graphical modelling languages and tools for creating process models, as well as a database for storing and managing pr

Learning automaton

A learning automaton is a type of machine learning algorithm studied since the 1970s. Learning automata select their current action based on past experiences from the environment. They fall within the range of reinforcement learning if the environment is stochastic and a Markov decision process (MDP) is used. == History == Research in learning automata can be traced back to the work of Michael Lvovitch Tsetlin in the early 1960s in the Soviet Union. Together with some colleagues, he published a collection of papers on how to use matrices to describe automata functions. Additionally, Tsetlin worked on reasonable and collective automata behaviour, and on automata games. Learning automata were also investigated by researchers in the United States in the 1960s. However, the term learning automaton was not used until Narendra and Thathachar introduced it in a survey paper in 1974. == Definition == A learning automaton is an adaptive decision-making unit situated in a random environment that learns the optimal action through repeated interactions with its environment. The actions are chosen according to a specific probability distribution, which is updated based on the response the automaton obtains from the environment after performing a particular action. Within the field of reinforcement learning, learning automata are characterized as policy iterators. In contrast to other reinforcement learners, policy iterators directly manipulate the policy π. Evolutionary algorithms are another example of policy iterators. 
Formally, Narendra and Thathachar define a stochastic automaton to consist of: a set X of possible inputs, a set Φ = { Φ1, ..., Φs } of possible internal states, a set α = { α1, ..., αr } of possible outputs, or actions, with r ≤ s, an initial state probability vector p(0) = ≪ p1(0), ..., ps(0) ≫, a computable function A which after each time step t generates p(t+1) from p(t), the current input, and the current state, and a function G: Φ → α which generates the output at each time step. In their paper, they investigate only stochastic automata with r = s and G being bijective, which allows them to identify actions with states. The states of such an automaton correspond to the states of a "discrete-state discrete-parameter Markov process". At each time step t = 0, 1, 2, 3, ..., the automaton reads an input from its environment, updates p(t) to p(t+1) by A, randomly chooses a successor state according to the probabilities p(t+1), and outputs the corresponding action. The automaton's environment, in turn, reads the action and sends the next input to the automaton. Frequently, the input set X = { 0,1 } is used, with 0 and 1 corresponding to a nonpenalty and a penalty response of the environment, respectively; in this case, the automaton should learn to minimize the number of penalty responses, and the feedback loop of automaton and environment is called a "P-model". More generally, a "Q-model" allows an arbitrary finite input set X, and an "S-model" uses the interval [0,1] of real numbers as X. A visualised demo of a single learning automaton has been developed by the μSystems (microSystems) Research Group at Newcastle University. == Finite action-set learning automata == Finite action-set learning automata (FALA) are a class of learning automata for which the number of possible actions is finite or, in more mathematical terms, for which the size of the action-set is finite.
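
The feedback loop of a P-model described in the definition above can be illustrated with a minimal Python sketch. The text does not fix a particular update function A, so the sketch assumes a linear reward-inaction rule, one commonly used scheme: a nonpenalty response (0) shifts probability toward the chosen action, and a penalty response (1) leaves the probability vector unchanged. The function names, the learning rate `rate`, and the environment's penalty probabilities are hypothetical and chosen only for illustration. As in the restricted case with r = s and G bijective, states and actions are identified.

```python
import random


def choose_action(p, rng):
    """Sample an action index according to the probability vector p."""
    return rng.choices(range(len(p)), weights=p, k=1)[0]


def update_reward_inaction(p, action, response, rate=0.1):
    """Assumed update function A: linear reward-inaction.

    response follows the P-model: 0 = nonpenalty, 1 = penalty.
    On a penalty the vector is returned unchanged; on a nonpenalty
    the chosen action gains probability and all others shrink.
    """
    if response == 1:
        return list(p)
    return [
        pj + rate * (1.0 - pj) if j == action else pj * (1.0 - rate)
        for j, pj in enumerate(p)
    ]


def environment(action, penalty_probs, rng):
    """Hypothetical stationary random environment.

    Returns 1 (penalty) with probability penalty_probs[action], else 0.
    """
    return 1 if rng.random() < penalty_probs[action] else 0


def run(penalty_probs, steps=5000, rate=0.05, seed=0):
    """Run the automaton-environment loop and return the final p."""
    rng = random.Random(seed)
    r = len(penalty_probs)
    p = [1.0 / r] * r  # uniform initial probability vector p(0)
    for _ in range(steps):
        action = choose_action(p, rng)
        response = environment(action, penalty_probs, rng)
        p = update_reward_inaction(p, action, response, rate)
    return p


if __name__ == "__main__":
    # Action 1 has the lowest penalty probability in this made-up environment.
    final_p = run([0.7, 0.2, 0.5])
    print([round(x, 3) for x in final_p])
```

The update preserves the sum of the probabilities, so p remains a valid distribution at every step. With a small learning rate the vector tends to concentrate on an action with a low penalty probability, although reward-inaction schemes are not guaranteed to settle on the best action in every run.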