AI Chatbot Zendesk

AI Chatbot Zendesk — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Insider threat

    Insider threat

    An insider threat is a perceived threat to an organization that comes from people within the organization, such as employees, former employees, contractors or business associates, who have inside information concerning the organization's security practices, data and computer systems. The threat may involve fraud, the theft of confidential or commercially valuable information, the theft of intellectual property, or the sabotage of computer systems. == Overview == Insiders may have accounts giving them legitimate access to computer systems, with this access originally having been given to them to serve in the performance of their duties; these permissions could be abused to harm the organization. Insiders are often familiar with the organization's data and intellectual property as well as the methods that are in place to protect them. This makes it easier for the insider to circumvent any security controls of which they are aware. Physical proximity to data means that the insider does not need to hack into the organizational network through the outer perimeter by traversing firewalls; rather they are in the building already, often with direct access to the organization's internal network. Insider threats are harder to defend against than attacks from outsiders, since the insider already has legitimate access to the organization's information and assets. An insider may attempt to steal property or information for personal gain or to benefit another organization or country. The threat to the organization could also be through malicious software left running on its computer systems by former employees, a so-called logic bomb. == Research == Insider threat is an active area of research in academia and government. The CERT Coordination Center at Carnegie-Mellon University maintains the CERT Insider Threat Center, which includes a database of more than 850 cases of insider threats, including instances of fraud, theft and sabotage; the database is used for research and analysis. CERT's Insider Threat Team also maintains an informational blog to help organizations and businesses defend themselves against insider crime. The Threat Lab and Defense Personnel and Security Research Center (DOD PERSEREC) has also recently emerged as a national resource within the United States of America. The Threat Lab hosts an annual conference, the SBS Summit. They also maintain a website that contains resources from this conference. Complimenting these efforts, a companion podcast was created, Voices from the SBS Summit. In 2022, the Threat Lab created an interdisciplinary journal, Counter Insider Threat Research and Practice (CITRAP) which publishes research on insider threat detection. === Findings === In the 2022 Data Breach Investigations Report (DBIR), Verizon found that 82% of breaches involved the human element, noting that employees continue to play a leading role in cybersecurity incidents and breaches. According to the UK Information Commissioners Office, 90% of all breaches reported to them in 2019 were the result of mistakes made by end users. This was up from 61% and 87% over the previous two years. A 2018 whitepaper reported that 53% of companies surveyed had confirmed insider attacks against their organization in the previous 12 months, with 27% saying insider attacks have become more frequent. A report published in July 2012 on the insider threat in the U.S. financial sector gives some statistics on insider threat incidents: 80% of the malicious acts were committed at work during working hours; 81% of the perpetrators planned their actions beforehand; 33% of the perpetrators were described as "difficult" and 17% as being "disgruntled". The insider was identified in 74% of cases. Financial gain was a motive in 81% of cases, revenge in 23% of cases, and 27% of the people carrying out malicious acts were in financial difficulties at the time. The US Department of Defense Personnel Security Research Center published a report that describes approaches for detecting insider threats. Earlier it published ten case studies of insider attacks by information technology professionals. Cybersecurity experts believe that 38% of negligent insiders are victims of a phishing attack, whereby they receive an email that appears to come from a legitimate source such as a company. These emails normally contain malware in the form of hyperlinks. == Typologies and ontologies == Multiple classification systems and ontologies have been proposed to classify insider threats. Traditional models of insider threat identify three broad categories: Malicious insiders, which are people who take advantage of their access to inflict harm on an organization; Negligent insiders, which are people who make errors and disregard policies, which place their organizations at risk; and Infiltrators, who are external actors that obtain legitimate access credentials without authorization. == Criticisms == Insider threat research has been criticized. Critics have argued that insider threat is a poorly defined concept. Forensically investigating insider data theft is notoriously difficult, and requires novel techniques such as stochastic forensics. Data supporting insider threat is generally proprietary (i.e., encrypted data). Theoretical/conceptual models of insider threat are often based on loose interpretations of research in the behavioral and social sciences, using "deductive principles and intuitions of subject matter expert." Adopting sociotechnical approaches, researchers have also argued for the need to consider insider threat from the perspective of social systems. Jordan Schoenherr said that "surveillance requires an understanding of how sanctioning systems are framed, how employees will respond to surveillance, what workplace norms are deemed relevant, and what ‘deviance’ means, e.g., deviation for a justified organization norm or failure to conform to an organizational norm that conflicts with general social values." By treating all employees as potential insider threats, organizations might create conditions that lead to insider threats. == Sector-specific concerns == === Healthcare === The healthcare industry faces particularly acute insider threat risks due to the large number of workforce members who require access to sensitive patient records for legitimate clinical purposes. The U.S. Department of Health and Human Services has identified unauthorized access by insiders, including workforce snooping on patient records and theft of protected health information for identity fraud, as a persistent enforcement concern. The Health Insurance Portability and Accountability Act (HIPAA) Security Rule addresses insider threats through several administrative safeguards, including workforce security procedures requiring covered entities to implement policies for authorizing and supervising workforce members who work with electronic protected health information, as well as termination procedures to revoke access when employment ends (45 CFR 164.308(a)(3)). The rule also requires audit controls to record and examine information system activity (45 CFR 164.312(b)), enabling detection of unauthorized access by insiders. The December 2024 Notice of proposed rulemaking (NPRM) to overhaul the HIPAA Security Rule would strengthen insider threat defenses by mandating role-based access controls, requiring notification of relevant workforce members within 24 hours of any changes to access privileges, and requiring regular review of audit logs to detect anomalous access patterns.

    Read more →
  • Multinomial logistic regression

    Multinomial logistic regression

    In statistics, multinomial logistic regression is a classification method that generalizes logistic regression to multiclass problems, i.e. with more than two possible discrete outcomes. That is, it is a model that is used to predict the probabilities of the different possible outcomes of a categorically distributed dependent variable, given a set of independent variables (which may be real-valued, binary-valued, categorical-valued, etc.). Multinomial logistic regression is known by a variety of other names, including polytomous LR, multiclass LR, softmax regression, multinomial logit (mlogit), the maximum entropy (MaxEnt) classifier, and the conditional maximum entropy model. == Background == Multinomial logistic regression is used when the dependent variable in question is nominal (equivalently categorical, meaning that it falls into any one of a set of categories that cannot be ordered in any meaningful way) and for which there are more than two categories. Some examples would be: Which major will a college student choose, given their grades, stated likes and dislikes, etc.? Which blood type does a person have, given the results of various diagnostic tests? In a hands-free mobile phone dialing application, which person's name was spoken, given various properties of the speech signal? Which candidate will a person vote for, given particular demographic characteristics? Which country will a firm locate an office in, given the characteristics of the firm and of the various candidate countries? These are all statistical classification problems. They all have in common a dependent variable to be predicted that comes from one of a limited set of items that cannot be meaningfully ordered, as well as a set of independent variables (also known as features, explanators, etc.), which are used to predict the dependent variable. Multinomial logistic regression is a particular solution to classification problems that use a linear combination of the observed features and some problem-specific parameters to estimate the probability of each particular value of the dependent variable. The best values of the parameters for a given problem are usually determined from some training data (e.g. some people for whom both the diagnostic test results and blood types are known, or some examples of known words being spoken). == Assumptions == The multinomial logistic model assumes that data are case-specific; that is, each independent variable has a single value for each case. As with other types of regression, there is no need for the independent variables to be statistically independent from each other (unlike, for example, in a naive Bayes classifier); however, collinearity is assumed to be relatively low, as it becomes difficult to differentiate between the impact of several variables if this is not the case. If the multinomial logit is used to model choices, it relies on the assumption of independence of irrelevant alternatives (IIA), which is not always desirable. This assumption states that the odds of preferring one class over another do not depend on the presence or absence of other "irrelevant" alternatives. For example, the relative probabilities of taking a car or bus to work do not change if a bicycle is added as an additional possibility. This allows the choice of K alternatives to be modeled as a set of K − 1 independent binary choices, in which one alternative is chosen as a "pivot" and the other K − 1 compared against it, one at a time. The IIA hypothesis is a core hypothesis in rational choice theory; however numerous studies in psychology show that individuals often violate this assumption when making choices. An example of a problem case arises if choices include a car and a blue bus. Suppose the odds ratio between the two is 1 : 1. Now if the option of a red bus is introduced, a person may be indifferent between a red and a blue bus, and hence may exhibit a car : blue bus : red bus odds ratio of 1 : 0.5 : 0.5, thus maintaining a 1 : 1 ratio of car : any bus while adopting a changed car : blue bus ratio of 1 : 0.5. Here the red bus option was not in fact irrelevant, because a red bus was a perfect substitute for a blue bus. If the multinomial logit is used to model choices, it may in some situations impose too much constraint on the relative preferences between the different alternatives. It is especially important to take into account if the analysis aims to predict how choices would change if one alternative were to disappear (for instance if one political candidate withdraws from a three candidate race). Other models like the nested logit or the multinomial probit may be used in such cases as they allow for violation of the IIA. == Model == === Introduction === There are multiple equivalent ways to describe the mathematical model underlying multinomial logistic regression. This can make it difficult to compare different treatments of the subject in different texts. The article on logistic regression presents a number of equivalent formulations of simple logistic regression, and many of these have analogues in the multinomial logit model. The idea behind all of them, as in many other statistical classification techniques, is to construct a linear predictor function that constructs a score from a set of weights that are linearly combined with the explanatory variables (features) of a given observation using a dot product: score ⁡ ( X i , k ) = β k ⋅ X i , {\displaystyle \operatorname {score} (\mathbf {X} _{i},k)={\boldsymbol {\beta }}_{k}\cdot \mathbf {X} _{i},} where Xi is the vector of explanatory variables describing observation i, βk is a vector of weights (or regression coefficients) corresponding to outcome k, and score(Xi, k) is the score associated with assigning observation i to category k. In discrete choice theory, where observations represent people and outcomes represent choices, the score is considered the utility associated with person i choosing outcome k. The predicted outcome is the one with the highest score. The difference between the multinomial logit model and numerous other methods, models, algorithms, etc. with the same basic setup (the perceptron algorithm, support vector machines, linear discriminant analysis, etc.) is the procedure for determining (training) the optimal weights/coefficients and the way that the score is interpreted. In particular, in the multinomial logit model, the score can directly be converted to a probability value, indicating the probability of observation i choosing outcome k given the measured characteristics of the observation. This provides a principled way of incorporating the prediction of a particular multinomial logit model into a larger procedure that may involve multiple such predictions, each with a possibility of error. Without such means of combining predictions, errors tend to multiply. For example, imagine a large predictive model that is broken down into a series of submodels where the prediction of a given submodel is used as the input of another submodel, and that prediction is in turn used as the input into a third submodel, etc. If each submodel has 90% accuracy in its predictions, and there are five submodels in series, then the overall model has only 0.95 = 59% accuracy. If each submodel has 80% accuracy, then overall accuracy drops to 0.85 = 33% accuracy. This issue is known as error propagation and is a serious problem in real-world predictive models, which are usually composed of numerous parts. Predicting probabilities of each possible outcome, rather than simply making a single optimal prediction, is one means of alleviating this issue. === Setup === The basic setup is the same as in logistic regression, the only difference being that the dependent variables are categorical rather than binary, i.e. there are K possible outcomes rather than just two. The following description is somewhat shortened; for more details, consult the logistic regression article. ==== Data points ==== Specifically, it is assumed that we have a series of N observed data points. Each data point i (ranging from 1 to N) consists of a set of M explanatory variables x1,i ... xM,i (also known as independent variables, predictor variables, features, etc.), and an associated categorical outcome Yi (also known as dependent variable, response variable), which can take on one of K possible values. These possible values represent logically separate categories (e.g. different political parties, blood types, etc.), and are often described mathematically by arbitrarily assigning each a number from 1 to K. The explanatory variables and outcome represent observed properties of the data points, and are often thought of as originating in the observations of N "experiments" — although an "experiment" may consist of nothing more than gathering data. The goal of multinomial logistic regression is to construct a model that explains the relationship between the explanatory variables and the outcome, so tha

    Read more →
  • Accumulated local effects

    Accumulated local effects

    Accumulated local effects (ALE) is a machine learning interpretability method. == Concepts == ALE uses a conditional feature distribution as an input and generates augmented data, creating more realistic data than a marginal distribution. It ignores far out-of-distribution (outlier) values. Unlike partial dependence plots and marginal plots, ALE is not defeated in the presence of correlated predictors. It analyzes differences in predictions instead of averaging them by calculating the average of the differences in model predictions over the augmented data, instead of the average of the predictions themselves. == Example == Given a model that predicts house prices based on its distance from city center and size of the building area, ALE compares the differences of predictions of houses of different sizes. The result separates the impact of the size from otherwise correlated features. == Limitations == Defining evaluation windows is subjective. High correlations between features can defeat the technique. ALE requires more and more uniformly distributed observations than PDP so that the conditional distribution can be reliably determined. The technique may produce inadequate results if the data is highly sparse, which is more common with high-dimensional data (curse of dimensionality).

    Read more →
  • Large margin nearest neighbor

    Large margin nearest neighbor

    Large margin nearest neighbor (LMNN) classification is a statistical machine learning algorithm for metric learning. It learns a pseudometric designed for k-nearest neighbor classification. The algorithm is based on semidefinite programming, a sub-class of convex optimization. The goal of supervised learning (more specifically classification) is to learn a decision rule that can categorize data instances into pre-defined classes. The k-nearest neighbor rule assumes a training data set of labeled instances (i.e. the classes are known). It classifies a new data instance with the class obtained from the majority vote of the k closest (labeled) training instances. Closeness is measured with a pre-defined metric. Large margin nearest neighbors is an algorithm that learns this global (pseudo-)metric in a supervised fashion to improve the classification accuracy of the k-nearest neighbor rule. == Setup == The main intuition behind LMNN is to learn a pseudometric under which all data instances in the training set are surrounded by at least k instances that share the same class label. If this is achieved, the leave-one-out error (a special case of cross validation) is minimized. Let the training data consist of a data set D = { ( x → 1 , y 1 ) , … , ( x → n , y n ) } ⊂ R d × C {\displaystyle D=\{({\vec {x}}_{1},y_{1}),\dots ,({\vec {x}}_{n},y_{n})\}\subset R^{d}\times C} , where the set of possible class categories is C = { 1 , … , c } {\displaystyle C=\{1,\dots ,c\}} . The algorithm learns a pseudometric of the type d ( x → i , x → j ) = ( x → i − x → j ) ⊤ M ( x → i − x → j ) {\displaystyle d({\vec {x}}_{i},{\vec {x}}_{j})=({\vec {x}}_{i}-{\vec {x}}_{j})^{\top }\mathbf {M} ({\vec {x}}_{i}-{\vec {x}}_{j})} . For d ( ⋅ , ⋅ ) {\displaystyle d(\cdot ,\cdot )} to be well defined, the matrix M {\displaystyle \mathbf {M} } needs to be positive semi-definite. The Euclidean metric is a special case, where M {\displaystyle \mathbf {M} } is the identity matrix. This generalization is often (falsely) referred to as Mahalanobis metric. Figure 1 illustrates the effect of the metric under varying M {\displaystyle \mathbf {M} } . The two circles show the set of points with equal distance to the center x → i {\displaystyle {\vec {x}}_{i}} . In the Euclidean case this set is a circle, whereas under the modified (Mahalanobis) metric it becomes an ellipsoid. The algorithm distinguishes between two types of special data points: target neighbors and impostors. === Target neighbors === Target neighbors are selected before learning. Each instance x → i {\displaystyle {\vec {x}}_{i}} has exactly k {\displaystyle k} different target neighbors within D {\displaystyle D} , which all share the same class label y i {\displaystyle y_{i}} . The target neighbors are the data points that should become nearest neighbors under the learned metric. Let us denote the set of target neighbors for a data point x → i {\displaystyle {\vec {x}}_{i}} as N i {\displaystyle N_{i}} . === Impostors === An impostor of a data point x → i {\displaystyle {\vec {x}}_{i}} is another data point x → j {\displaystyle {\vec {x}}_{j}} with a different class label (i.e. y i ≠ y j {\displaystyle y_{i}\neq y_{j}} ) which is one of the nearest neighbors of x → i {\displaystyle {\vec {x}}_{i}} . During learning the algorithm tries to minimize the number of impostors for all data instances in the training set. == Algorithm == Large margin nearest neighbors optimizes the matrix M {\displaystyle \mathbf {M} } with the help of semidefinite programming. The objective is twofold: For every data point x → i {\displaystyle {\vec {x}}_{i}} , the target neighbors should be close and the impostors should be far away. Figure 1 shows the effect of such an optimization on an illustrative example. The learned metric causes the input vector x → i {\displaystyle {\vec {x}}_{i}} to be surrounded by training instances of the same class. If it was a test point, it would be classified correctly under the k = 3 {\displaystyle k=3} nearest neighbor rule. The first optimization goal is achieved by minimizing the average distance between instances and their target neighbors ∑ i , j ∈ N i d ( x → i , x → j ) {\displaystyle \sum _{i,j\in N_{i}}d({\vec {x}}_{i},{\vec {x}}_{j})} . The second goal is achieved by penalizing distances to impostors x → l {\displaystyle {\vec {x}}_{l}} that are less than one unit further away than target neighbors x → j {\displaystyle {\vec {x}}_{j}} (and therefore pushing them out of the local neighborhood of x → i {\displaystyle {\vec {x}}_{i}} ). The resulting value to be minimized can be stated as: ∑ i , j ∈ N i , l , y l ≠ y i [ d ( x → i , x → j ) + 1 − d ( x → i , x → l ) ] + {\displaystyle \sum _{i,j\in N_{i},l,y_{l}\neq y_{i}}[d({\vec {x}}_{i},{\vec {x}}_{j})+1-d({\vec {x}}_{i},{\vec {x}}_{l})]_{+}} With a hinge loss function [ ⋅ ] + = max ( ⋅ , 0 ) {\textstyle [\cdot ]_{+}=\max(\cdot ,0)} , which ensures that impostor proximity is not penalized when outside the margin. The margin of exactly one unit fixes the scale of the matrix M {\displaystyle M} . Any alternative choice c > 0 {\displaystyle c>0} would result in a rescaling of M {\displaystyle M} by a factor of 1 / c {\displaystyle 1/c} . The final optimization problem becomes: min M ∑ i , j ∈ N i d ( x → i , x → j ) + λ ∑ i , j , l ξ i j l {\displaystyle \min _{\mathbf {M} }\sum _{i,j\in N_{i}}d({\vec {x}}_{i},{\vec {x}}_{j})+\lambda \sum _{i,j,l}\xi _{ijl}} ∀ i , j ∈ N i , l , y l ≠ y i {\displaystyle \forall _{i,j\in N_{i},l,y_{l}\neq y_{i}}} d ( x → i , x → j ) + 1 − d ( x → i , x → l ) ≤ ξ i j l {\displaystyle d({\vec {x}}_{i},{\vec {x}}_{j})+1-d({\vec {x}}_{i},{\vec {x}}_{l})\leq \xi _{ijl}} ξ i j l ≥ 0 {\displaystyle \xi _{ijl}\geq 0} M ⪰ 0 {\displaystyle \mathbf {M} \succeq 0} The hyperparameter λ > 0 {\textstyle \lambda >0} is some positive constant (typically set through cross-validation). Here the variables ξ i j l {\displaystyle \xi _{ijl}} (together with two types of constraints) replace the term in the cost function. They play a role similar to slack variables to absorb the extent of violations of the impostor constraints. The last constraint ensures that M {\displaystyle \mathbf {M} } is positive semi-definite. The optimization problem is an instance of semidefinite programming (SDP). Although SDPs tend to suffer from high computational complexity, this particular SDP instance can be solved very efficiently due to the underlying geometric properties of the problem. In particular, most impostor constraints are naturally satisfied and do not need to be enforced during runtime (i.e. the set of variables ξ i j l {\displaystyle \xi _{ijl}} is sparse). A particularly well suited solver technique is the working set method, which keeps a small set of constraints that are actively enforced and monitors the remaining (likely satisfied) constraints only occasionally to ensure correctness. == Extensions and efficient solvers == LMNN was extended to multiple local metrics in the 2008 paper. This extension significantly improves the classification error, but involves a more expensive optimization problem. In their 2009 publication in the Journal of Machine Learning Research, Weinberger and Saul derive an efficient solver for the semi-definite program. It can learn a metric for the MNIST handwritten digit data set in several hours, involving billions of pairwise constraints. An open source Matlab implementation is freely available at the authors web page. Kumal et al. extended the algorithm to incorporate local invariances to multivariate polynomial transformations and improved regularization.

    Read more →
  • Tribute (website)

    Tribute (website)

    Tribute is an American video-sharing website headquartered in Brooklyn. Created in 2014 by Andrew Horn and Rory Petty, the platform lets customers create video montages (called "tributes") for occasions including weddings, birthdays, anniversaries, get well soon, and memorials. Tribute.co allows users to record video messages, request submissions from friends and family, insert photos, add music, and send the resulting video tribute montage to a recipient. == Overview == Tribute's collaborative technology starts with inviting people to contribute via email, SMS or social media. Participants receive a prompt to record a short video via their phone, computer or tablet. The site's video editing software allows users to drag and drop the clips in their desired order without prior video editing experience. == History == When Andrew Horn turned twenty-seven, his girlfriend, Miki Agrawal surprised him with a video montage containing clips of his family and closest friends explaining why they loved him. This resulted in Andrew's idea to create Tribute–a "living eulogy" video-compilation service that he co-founded with software engineer Rory Petty. Founded in 2014, Tribute's activity accelerated in 2020 due to the COVID-19 pandemic, and it had sent over 5 million videos as of December 2021. While social distance restrictions were in effect, the site provided a way for people to connect while in-person celebrations were put on hold. For each video sold, Tribute makes one available to hospitals for free and has partnered with Cleveland Clinic Cancer Center in Ohio, Lurie Children's Hospital in Illinois and CarePoint Health in New Jersey.

    Read more →
  • Variational message passing

    Variational message passing

    Variational message passing (VMP) is an approximate inference technique for continuous- or discrete-valued Bayesian networks, with conjugate-exponential parents, developed by John Winn. VMP was developed as a means of generalizing the approximate variational methods used by such techniques as latent Dirichlet allocation, and works by updating an approximate distribution at each node through messages in the node's Markov blanket. == Likelihood lower bound == Given some set of hidden variables H {\displaystyle H} and observed variables V {\displaystyle V} , the goal of approximate inference is to maximize a lower-bound on the probability that a graphical model is in the configuration V {\displaystyle V} . Over some probability distribution Q {\displaystyle Q} (to be defined later), ln ⁡ P ( V ) = ∑ H Q ( H ) ln ⁡ P ( H , V ) P ( H | V ) = ∑ H Q ( H ) [ ln ⁡ P ( H , V ) Q ( H ) − ln ⁡ P ( H | V ) Q ( H ) ] {\displaystyle \ln P(V)=\sum _{H}Q(H)\ln {\frac {P(H,V)}{P(H|V)}}=\sum _{H}Q(H){\Bigg [}\ln {\frac {P(H,V)}{Q(H)}}-\ln {\frac {P(H|V)}{Q(H)}}{\Bigg ]}} . So, if we define our lower bound to be L ( Q ) = ∑ H Q ( H ) ln ⁡ P ( H , V ) Q ( H ) {\displaystyle L(Q)=\sum _{H}Q(H)\ln {\frac {P(H,V)}{Q(H)}}} , then the likelihood is simply this bound plus the relative entropy between P {\displaystyle P} and Q {\displaystyle Q} . Because the relative entropy is non-negative, the function L {\displaystyle L} defined above is indeed a lower bound of the log likelihood of our observation V {\displaystyle V} . The distribution Q {\displaystyle Q} will have a simpler character than that of P {\displaystyle P} because marginalizing over P {\displaystyle P} is intractable for all but the simplest of graphical models. In particular, VMP uses a factorized distribution Q ( H ) = ∏ i Q i ( H i ) , {\displaystyle Q(H)=\prod _{i}Q_{i}(H_{i}),} where H i {\displaystyle H_{i}} is a disjoint part of the graphical model. == Determining the update rule == The likelihood estimate needs to be as large as possible; because it's a lower bound, getting closer log ⁡ P {\displaystyle \log P} improves the approximation of the log likelihood. By substituting in the factorized version of Q {\displaystyle Q} , L ( Q ) {\displaystyle L(Q)} , parameterized over the hidden nodes H i {\displaystyle H_{i}} as above, is simply the negative relative entropy between Q j {\displaystyle Q_{j}} and Q j ∗ {\displaystyle Q_{j}^{}} plus other terms independent of Q j {\displaystyle Q_{j}} if Q j ∗ {\displaystyle Q_{j}^{}} is defined as Q j ∗ ( H j ) = 1 Z e E − j { ln ⁡ P ( H , V ) } {\displaystyle Q_{j}^{}(H_{j})={\frac {1}{Z}}e^{\mathbb {E} _{-j}\{\ln P(H,V)\}}} , where E − j { ln ⁡ P ( H , V ) } {\displaystyle \mathbb {E} _{-j}\{\ln P(H,V)\}} is the expectation over all distributions Q i {\displaystyle Q_{i}} except Q j {\displaystyle Q_{j}} . Thus, if we set Q j {\displaystyle Q_{j}} to be Q j ∗ {\displaystyle Q_{j}^{}} , the bound L {\displaystyle L} is maximized. == Messages in variational message passing == Parents send their children the expectation of their sufficient statistic while children send their parents their natural parameter, which also requires messages to be sent from the co-parents of the node. == Relationship to exponential families == Because all nodes in VMP come from exponential families and all parents of nodes are conjugate to their children nodes, the expectation of the sufficient statistic can be computed from the normalization factor. == VMP algorithm == The algorithm begins by computing the expected value of the sufficient statistics for that vector. Then, until the likelihood converges to a stable value (this is usually accomplished by setting a small threshold value and running the algorithm until it increases by less than that threshold value), do the following at each node: Get all messages from parents. Get all messages from children (this might require the children to get messages from the co-parents). Compute the expected value of the nodes sufficient statistics. == Constraints == Because every child must be conjugate to its parent, this has limited the types of distributions that can be used in the model. For example, the parents of a Gaussian distribution must be a Gaussian distribution (corresponding to the Mean) and a gamma distribution (corresponding to the precision, or one over σ {\displaystyle \sigma } in more common parameterizations). Discrete variables can have Dirichlet parents, and Poisson and exponential nodes must have gamma parents. More recently, VMP has been extended to handle models that violate this conditional conjugacy constraint. == Literature == John Winn; Christopher M. Bishop (2005). "Variational Message Passing" (PDF). Journal of Machine Learning Research. 6: 661–694. ISSN 1533-7928. Wikidata Q139488859. Beal, M.J. (2003). Variational Algorithms for Approximate Bayesian Inference (PDF) (PhD). Gatsby Computational Neuroscience Unit, University College London. Archived from the original (PDF) on 2005-04-28. Retrieved 2007-02-15.

    Read more →
  • Concept class

    Concept class

    In computational learning theory in mathematics, a concept over a domain X is a total Boolean function over X. A concept class is a class of concepts. Concept classes are a subject of computational learning theory. Concept class terminology frequently appears in model theory associated with probably approximately correct (PAC) learning. In this setting, if one takes a set Y as a set of (classifier output) labels, and X is a set of examples, the map c : X → Y {\displaystyle c:X\to Y} , i.e. from examples to classifier labels (where Y = { 0 , 1 } {\displaystyle Y=\{0,1\}} and where c is a subset of X), c is then said to be a concept. A concept class C {\displaystyle C} is then a collection of such concepts. Given a class of concepts C, a subclass D is reachable if there exists a sample s such that D contains exactly those concepts in C that are extensions to s. Not every subclass is reachable. == Background == A sample s {\displaystyle s} is a partial function from X {\displaystyle X} to { 0 , 1 } {\displaystyle \{0,1\}} . Identifying a concept with its characteristic function mapping X {\displaystyle X} to { 0 , 1 } {\displaystyle \{0,1\}} , it is a special case of a sample. Two samples are consistent if they agree on the intersection of their domains. A sample s ′ {\displaystyle s'} extends another sample s {\displaystyle s} if the two are consistent and the domain of s {\displaystyle s} is contained in the domain of s ′ {\displaystyle s'} . == Examples == Suppose that C = S + ( X ) {\displaystyle C=S^{+}(X)} . Then: the subclass { { x } } {\displaystyle \{\{x\}\}} is reachable with the sample s = { ( x , 1 ) } {\displaystyle s=\{(x,1)\}} ; the subclass S + ( Y ) {\displaystyle S^{+}(Y)} for Y ⊆ X {\displaystyle Y\subseteq X} are reachable with a sample that maps the elements of X − Y {\displaystyle X-Y} to zero; the subclass S ( X ) {\displaystyle S(X)} , which consists of the singleton sets, is not reachable. == Applications == Let C {\displaystyle C} be some concept class. For any concept c ∈ C {\displaystyle c\in C} , we call this concept 1 / d {\displaystyle 1/d} -good for a positive integer d {\displaystyle d} if, for all x ∈ X {\displaystyle x\in X} , at least 1 / d {\displaystyle 1/d} of the concepts in C {\displaystyle C} agree with c {\displaystyle c} on the classification of x {\displaystyle x} . The fingerprint dimension F D ( C ) {\displaystyle FD(C)} of the entire concept class C {\displaystyle C} is the least positive integer d {\displaystyle d} such that every reachable subclass C ′ ⊆ C {\displaystyle C'\subseteq C} contains a concept that is 1 / d {\displaystyle 1/d} -good for it. This quantity can be used to bound the minimum number of equivalence queries needed to learn a class of concepts according to the following inequality: F D ( C ) − 1 ≤ # E Q ( C ) ≤ ⌈ F D ( C ) ln ⁡ ( | C | ) ⌉ {\textstyle FD(C)-1\leq \#EQ(C)\leq \lceil FD(C)\ln(|C|)\rceil } .

    Read more →
  • Frequent pattern discovery

    Frequent pattern discovery

    Frequent pattern discovery (or FP discovery, FP mining, or Frequent itemset mining) is part of knowledge discovery in databases, Massive Online Analysis, and data mining; it describes the task of finding the most frequent and relevant patterns in large datasets. The concept was first introduced for mining transaction databases. Frequent patterns are defined as subsets (itemsets, subsequences, or substructures) that appear in a data set with frequency no less than a user-specified or auto-determined threshold. == Techniques == Techniques for FP mining include: market basket analysis cross-marketing catalog design clustering classification recommendation systems For the most part, FP discovery can be done using association rule learning with particular algorithms Eclat, FP-growth and the Apriori algorithm. Other strategies include: Frequent subtree mining Structure mining Sequential pattern mining and respective specific techniques. Implementations exist for various machine learning systems or modules like MLlib for Apache Spark.

    Read more →
  • AI Mode

    AI Mode

    AI Mode is a search feature used within Google Search. In March 2025, Google introduced an experimental "AI Mode" within its search platform, enabling users to input complex, multi-part queries and receive comprehensive, AI-generated responses. This feature uses Google's Gemini model, which enhances the system's reasoning capabilities and supports multimodal inputs, including text, images, and voice. Users need to be signed in to be able to use the image generation features. Initially, AI Mode was available to Google One AI Premium subscribers in the United States, who could access it through the Search Labs platform. This phased rollout allowed Google to gather user feedback and refine the feature before a broader release.

    Read more →
  • Nonlinear dimensionality reduction

    Nonlinear dimensionality reduction

    Nonlinear dimensionality reduction (NLDR), also known as manifold learning, is any of various related techniques that aim to project high-dimensional data, potentially existing across non-linear manifolds which cannot be adequately captured by linear decomposition methods, onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa) itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis. == Applications of NLDR == High dimensional data can be hard for machines to work with, requiring significant time and space for analysis. It also presents a challenge for humans, since it's hard to visualize or understand data in more than three dimensions. Reducing the dimensionality of a data set, while keeping its essential features relatively intact, can make algorithms more efficient and allow analysts to visualize trends and patterns. The reduced-dimensional representations of data are often referred to as "intrinsic variables". This description implies that these are the values from which the data was produced. For example, consider a dataset that contains images of a letter 'A', which has been scaled and rotated by varying amounts. Each image has 32×32 pixels. Each image can be represented as a vector of 1024 pixel values. Each row is a sample on a two-dimensional manifold in 1024-dimensional space (a Hamming space). The intrinsic dimensionality is two, because two variables (rotation and scale) were varied in order to produce the data. Information about the shape or look of a letter 'A' is not part of the intrinsic variables because it is the same in every instance. Nonlinear dimensionality reduction will discard the correlated information (the letter 'A') and recover only the varying information (rotation and scale). By comparison, if principal component analysis, which is a linear dimensionality reduction algorithm, is used to reduce this same dataset into two dimensions, the resulting values are not so well organized. This demonstrates that the high-dimensional vectors (each representing a letter 'A') that sample this manifold vary in a non-linear manner. It should be apparent, therefore, that NLDR has several applications in the field of computer-vision. For example, consider a robot that uses a camera to navigate in a closed static environment. The images obtained by that camera can be considered to be samples on a manifold in high-dimensional space, and the intrinsic variables of that manifold will represent the robot's position and orientation. Invariant manifolds are of general interest for model order reduction in dynamical systems. In particular, if there is an attracting invariant manifold in the phase space, nearby trajectories will converge onto it and stay on it indefinitely, rendering it a candidate for dimensionality reduction of the dynamical system. While such manifolds are not guaranteed to exist in general, the theory of spectral submanifolds (SSM) gives conditions for the existence of unique attracting invariant objects in a broad class of dynamical systems. Active research in NLDR seeks to unfold the observation manifolds associated with dynamical systems to develop modeling techniques. Some of the more prominent nonlinear dimensionality reduction techniques are listed below. == Important concepts == === Sammon's mapping === Sammon's mapping is one of the first and most popular NLDR techniques. === Self-organizing map === The self-organizing map (SOM, also called Kohonen map) and its probabilistic variant generative topographic mapping (GTM) use a point representation in the embedded space to form a latent variable model based on a non-linear mapping from the embedded space to the high-dimensional space. These techniques are related to work on density networks, which also are based around the same probabilistic model. === Kernel principal component analysis === Perhaps the most widely used algorithm for dimensional reduction is kernel PCA. PCA begins by computing the covariance matrix of the m × n {\displaystyle m\times n} matrix X {\displaystyle \mathbf {X} } C = 1 m ∑ i = 1 m x i x i T . {\displaystyle C={\frac {1}{m}}\sum _{i=1}^{m}{\mathbf {x} _{i}\mathbf {x} _{i}^{\mathsf {T}}}.} It then projects the data onto the first k eigenvectors of that matrix. By comparison, KPCA begins by computing the covariance matrix of the data after being transformed into a higher-dimensional space, C = 1 m ∑ i = 1 m Φ ( x i ) Φ ( x i ) T . {\displaystyle C={\frac {1}{m}}\sum _{i=1}^{m}{\Phi (\mathbf {x} _{i})\Phi (\mathbf {x} _{i})^{\mathsf {T}}}.} It then projects the transformed data onto the first k eigenvectors of that matrix, just like PCA. It uses the kernel trick to factor away much of the computation, such that the entire process can be performed without actually computing Φ ( x ) {\displaystyle \Phi (\mathbf {x} )} . Of course Φ {\displaystyle \Phi } must be chosen such that it has a known corresponding kernel. Unfortunately, it is not trivial to find a good kernel for a given problem, so KPCA does not yield good results with some problems when using standard kernels. For example, it is known to perform poorly with these kernels on the Swiss roll manifold. However, one can view certain other methods that perform well in such settings (e.g., Laplacian Eigenmaps, LLE) as special cases of kernel PCA by constructing a data-dependent kernel matrix. KPCA has an internal model, so it can be used to map points onto its embedding that were not available at training time. === Principal curves and manifolds === Principal curves and manifolds give the natural geometric framework for nonlinear dimensionality reduction and extend the geometric interpretation of PCA by explicitly constructing an embedded manifold, and by encoding using standard geometric projection onto the manifold. This approach was originally proposed by Trevor Hastie in his 1984 thesis, which he formally introduced in 1989. This idea has been explored further by many authors. How to define the "simplicity" of the manifold is problem-dependent, however, it is commonly measured by the intrinsic dimensionality and/or the smoothness of the manifold. Usually, the principal manifold is defined as a solution to an optimization problem. The objective function includes a quality of data approximation and some penalty terms for the bending of the manifold. The popular initial approximations are generated by linear PCA and Kohonen's SOM. === Laplacian eigenmaps === Laplacian eigenmaps uses spectral techniques to perform dimensionality reduction. This technique relies on the basic assumption that the data lies in a low-dimensional manifold in a high-dimensional space. This algorithm cannot embed out-of-sample points, but techniques based on Reproducing kernel Hilbert space regularization exist for adding this capability. Such techniques can be applied to other nonlinear dimensionality reduction algorithms as well. Traditional techniques like principal component analysis do not consider the intrinsic geometry of the data. Laplacian eigenmaps builds a graph from neighborhood information of the data set. Each data point serves as a node on the graph and connectivity between nodes is governed by the proximity of neighboring points (using e.g. the k-nearest neighbor algorithm). The graph thus generated can be considered as a discrete approximation of the low-dimensional manifold in the high-dimensional space. Minimization of a cost function based on the graph ensures that points close to each other on the manifold are mapped close to each other in the low-dimensional space, preserving local distances. The eigenfunctions of the Laplace–Beltrami operator on the manifold serve as the embedding dimensions, since under mild conditions this operator has a countable spectrum that is a basis for square integrable functions on the manifold (compare to Fourier series on the unit circle manifold). Attempts to place Laplacian eigenmaps on solid theoretical ground have met with some success, as under certain nonrestrictive assumptions, the graph Laplacian matrix has been shown to converge to the Laplace–Beltrami operator as the number of points goes to infinity. === Isomap === Isomap is a combination of the Floyd–Warshall algorithm with classic Multidimensional Scaling (MDS). Classic MDS takes a matrix of pair-wise distances between all points and computes a position for each point. Isomap assumes that the pair-wise distances are only known between neighboring points, and uses the Floyd–Warshall algorithm to compute the pair-wise distances between all other points. This effectively estimates the full matrix of pair-wise geodesic distances between all of the points. Isomap th

    Read more →
  • Ni1000

    Ni1000

    The Ni1000 is an artificial neural network chip developed by Nestor Corporation and Intel, developed in the 1990s. It is Intel's second-generation neural network chip, but the first all-digital chip. The chip is aimed at image analysis applications– containing more than 3 million transistors – and can analyze 40,000 patterns per second. Prototypes running Nestor's OCR software in 1994 were capable of recognizing around 100 handwritten characters per second. The development was funded with money from DARPA and Office of Naval Research.

    Read more →
  • Artificial development

    Artificial development

    Artificial development, also known as artificial embryogeny or machine intelligence or computational development, is an area of computer science and engineering concerned with computational models motivated by genotype–phenotype mappings in biological systems. Artificial development is often considered a sub-field of evolutionary computation, although the principles of artificial development have also been used within stand-alone computational models. Within evolutionary computation, the need for artificial development techniques was motivated by the perceived lack of scalability and evolvability of direct solution encodings (Tufte, 2008). Artificial development entails indirect solution encoding. Rather than describing a solution directly, an indirect encoding describes (either explicitly or implicitly) the process by which a solution is constructed. Often, but not always, these indirect encodings are based upon biological principles of development such as morphogen gradients, cell division and cellular differentiation (e.g. Doursat 2008), gene regulatory networks (e.g. Guo et al., 2009), degeneracy (Whitacre et al., 2010), grammatical evolution (de Salabert et al., 2006), or analogous computational processes such as re-writing, iteration, and time. The influences of interaction with the environment, spatiality and physical constraints on differentiated multi-cellular development have been investigated more recently (e.g. Knabe et al. 2008). Artificial development approaches have been applied to a number of computational and design problems, including electronic circuit design (Miller and Banzhaf 2003), robotic controllers (e.g. Taylor 2004), and the design of physical structures (e.g. Hornby 2004).

    Read more →
  • Seed (programming)

    Seed (programming)

    Seed is a JavaScript interpreter and a library of the GNOME project to create standalone applications in JavaScript. It uses the JavaScript engine JavaScriptCore of the WebKit project. It is possible to easily create modules in C. Seed is integrated in GNOME since the 2.28 version and is used by two games in the GNOME Games package. It is also used by the Web web browser for the design of its extensions. The module is also officially supported by the GTK+ project. == Hello world in Seed == This example uses the standard output to output the string "Hello, World". == A program using GTK+ == This code shows an empty window named "Example". == Modules == To use a module, just instantiate a class having for name imports. followed by the name of the module respecting the case sensitivity. The modules using GObject Introspection, who starts by imports.gi. : Gtk Gst GObject Gio Clutter GLib Gdk WebKit GdkPixbuf, GdkPixbuf Libxml Cairo DBus MPFR Os (system library) Canvas (using Cairo) multiprocessing readline Archived 2009-11-09 at the Wayback Machine ffi sqlite sandbox Archived 2009-11-09 at the Wayback Machine == List of the Seed versions == The names of the versions of Seed are albums of famous rock bands.

    Read more →
  • Blockmodeling linked networks

    Blockmodeling linked networks

    Blockmodeling linked networks is an approach in blockmodeling in analysing the linked networks. Such approach is based on the generalized multilevel blockmodeling approach. The main objective of this approach is to achieve clustering of the nodes from all involved sets, while at the same time using all available information. At the same time, all one-mode and two-node networks, that are connected, are blockmodeled, which results in obtaining only one clustering, using nodes from each sets. Each cluster ideally contains only nodes from one set, which also allows the modeling of the links among clusters from different sets (through two-mode networks). This approach was introduced by Aleš Žiberna in 2014. Blockmodeling linked networks can be done using: separate analysis: blockmodeling each level separately; conversion approach: converting all one-mode networks to the same level and joining with two-mode networks; a true multilevel approach: one-mode and two-mode networks are blockmodeled at the same time, resulting in one clustering for nodes from each level.

    Read more →
  • Fitness approximation

    Fitness approximation

    Fitness approximation aims to approximate the objective or fitness functions in evolutionary optimization by building up machine learning models based on data collected from numerical simulations or physical experiments. The machine learning models for fitness approximation are also known as meta-models or surrogates, and evolutionary optimization based on approximated fitness evaluations are also known as surrogate-assisted evolutionary approximation. Fitness approximation in evolutionary optimization can be seen as a sub-area of data-driven evolutionary optimization. == Approximate models in function optimization == === Motivation === In many real-world optimization problems including engineering problems, the number of fitness function evaluations needed to obtain a good solution dominates the optimization cost. In order to obtain efficient optimization algorithms, it is crucial to use prior information gained during the optimization process. Conceptually, a natural approach to utilizing the known prior information is building a model of the fitness function to assist in the selection of candidate solutions for evaluation. A variety of techniques for constructing such a model, often also referred to as surrogates, metamodels or approximation models – for computationally expensive optimization problems have been considered. === Approaches === Common approaches to constructing approximate models based on learning and interpolation from known fitness values of a small population include: Low-degree polynomials and regression models Fourier surrogate modeling Artificial neural networks including Multilayer perceptrons Radial basis function network Support vector machines Due to the limited number of training samples and high dimensionality encountered in engineering design optimization, constructing a globally valid approximate model remains difficult. As a result, evolutionary algorithms using such approximate fitness functions may converge to local optima. Therefore, it can be beneficial to selectively use the original fitness function together with the approximate model.

    Read more →