AI Avatar Heygen

AI Avatar Heygen — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Agent-assisted automation

    Agent-assisted automation

    Agent-assisted automation is a type of call center technology that automates elements of what the call center agent 1) does with his/her desktop tools and/or 2) says to customers during the call using pre-recorded audio. It is a relatively new category of call center technology that shows promise in improving call center productivity and compliance. == Types of agent-assisted automation == === Pre-recorded audio === Pre-recorded audio (sometimes referred to as soundboard (computer program) or as soundboard technology) is another form of agent-assisted automation. The purpose of using pre-recorded messages is to increase the probability (and in some cases error-proof the process so) that the right information is provided to customers at the right time. The required disclosures are pre-recorded to ensure accuracy and understandability. By integrating the recordings with the customer relationship management software, the right combination of disclosures can be played based on the combination of goods and services the customer purchased. The integration with the customer relationship management software also ensures that the order cannot be submitted until the disclosures are played, essentially error-proofing (poka-yoke) the process of ensuring the customer gets all the required consumer protection information. Phone surveys are ideal applications of this technology. Whether surveying market preferences or political views, the pre-recorded audio with an agent listening allows the questions to be asked in the same way every time, uninfluenced by the agents' fatigue levels, accents, or their own views. === Fraud prevention === Fraud prevention is a specialized type of agent-assisted automation focused on reducing ID theft and credit card fraud. ID theft and credit card fraud are huge threats for call centers and their customers and few good solutions exist, but new agent-assisted automation solutions are producing promising results. The technology allows the agents to remain on the phone while the customers use their phone key pads to enter the information. The tones are masked and the information passes directly into the customer relationship management system or payment gateway in the case of credit card transactions. The automation essentially makes it impossible for call center agents and also call center personnel that might be monitoring the calls to steal the credit card number, social security number, or other personally identifiable information. === Outbound telemarketing === Another specialized application space of agent-assisted automation is in outbound telemarketing, which goes under numerous headings including outbound prospecting, cold calling, solicitation, fund-raising, etc. Turnover is high among agents engaged in this kind of work because the task is tedious and emotionally difficult. It is tedious because the agent spends the bulk of their day, not talking to qualified leads, but in getting wrong numbers and answering machines. == Benefits == Just as automation has benefited manufacturing by reducing the mental and physical effort required of workers while simultaneously improving throughput, quality, and safety, agent-assisted automation is improving call center results while reducing the tiring aspects of the job for agents. In some cases, the agent-assisted automation streamlines the process and allows calls to be handled more quickly. By eliminating cutting and pasting from one application to another, by auto-navigating applications, and by providing a single view of the customer, agent-assisted automation can reduce call handle time and increase agent productivity. Second, in theory, the more steps that can be automated and the more logic that can be built into the call flow (e.g., if the customer buys items 2 and 9, then disclosures a, c, and f are read by the pre-recorded audio), then companies may be able to reduce the amount of training that is required of the agents while at the same time ensuring more consistency and accuracy. However, no published studies have reported this result yet. But an even larger problem in call centers is between-agent variation in behavior and results. Agents differ in the amount of training and coaching they receive, they differ in the amount of experience they have, their jobs are repetitious and tiring, and the process and procedures the agents are supposed to follow constantly change. Moreover, there are significant individual differences between agents in their intelligence, personality, motivations, etc. which all affect performance. Despite the large amount of money call centers have spent over decades trying to reduce between-agent variation, the problem is still so prevalent that one large study of customer interactions with call centers found that a customer's experience was completely a function of the quality of the agent who happened to answer the phone. Therefore, the most significant benefit of agent-assisted automation may prove to be in how the automation error-proofs or poka-yoke the process and ensures that something that needs to be done or said happens every time. Properly implemented, the between-agent variation for whatever step of the process the automation is applied to may be able to be reduced to near zero. This is especially important in a collection agency whose processes and procedures are closely regulated by the Fair Debt Collection Practices Act.

    Read more →
  • Homogeneity blockmodeling

    Homogeneity blockmodeling

    In mathematics applied to analysis of social structures, homogeneity blockmodeling is an approach in blockmodeling, which is best suited for a preliminary or main approach to valued networks, when a prior knowledge about these networks is not available. This is because homogeneity blockmodeling emphasizes the similarity of link (tie) strengths within the blocks over the pattern of links. In this approach, tie (link) values (or statistical data computed on them) are assumed to be equal (homogenous) within blocks. This approach to the generalized blockmodeling of valued networks was first proposed by Aleš Žiberna in 2007 with the basic idea, "that the inconsistency of an empirical block with its ideal block can be measured by within block variability of appropriate values". The newly–formed ideal blocks, which are appropriate for blockmodeling of valued networks, are then presented together with the definitions of their block inconsistencies. Similar approach to the homogeneity blockmodeling, dealing with direct approach for structural equivalence, was previously suggested by Stephen P. Borgatti and Martin G. Everett (1992).

    Read more →
  • Sammon mapping

    Sammon mapping

    Sammon mapping or Sammon projection is an algorithm that maps a high-dimensional space to a space of lower dimensionality (see multidimensional scaling) by trying to preserve the structure of inter-point distances in high-dimensional space in the lower-dimension projection. It is particularly suited for use in exploratory data analysis. The method was proposed by John W. Sammon in 1969. It is considered a non-linear approach as the mapping cannot be represented as a linear combination of the original variables as possible in techniques such as principal component analysis, which also makes it more difficult to use for classification applications. Denote the distance between ith and jth objects in the original space by d i j ∗ {\displaystyle \scriptstyle d_{ij}^{}} , and the distance between their projections by d i j {\displaystyle \scriptstyle d_{ij}^{}} . Sammon's mapping aims to minimize the following error function, which is often referred to as Sammon's stress or Sammon's error: E = 1 ∑ i < j d i j ∗ ∑ i < j ( d i j ∗ − d i j ) 2 d i j ∗ . {\displaystyle E={\frac {1}{\sum \limits _{i Read more →

  • C4.5 algorithm

    C4.5 algorithm

    C4.5 is an algorithm used to generate a decision tree developed by Ross Quinlan. C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason, C4.5 is often referred to as a statistical classifier. In 2011, authors of the Weka machine learning software described the C4.5 algorithm as "a landmark decision tree program that is probably the machine learning workhorse most widely used in practice to date". It became quite popular after ranking #1 in the Top 10 Algorithms in Data Mining pre-eminent paper published by Springer LNCS in 2008. == Algorithm == C4.5 builds decision trees from a set of training data in the same way as ID3, using the concept of information entropy. The training data is a set S = s 1 , s 2 , . . . {\displaystyle S={s_{1},s_{2},...}} of already classified samples. Each sample s i {\displaystyle s_{i}} consists of a p-dimensional vector ( x 1 , i , x 2 , i , . . . , x p , i ) {\displaystyle (x_{1,i},x_{2,i},...,x_{p,i})} , where the x j {\displaystyle x_{j}} represent attribute values or features of the sample, as well as the class in which s i {\displaystyle s_{i}} falls. At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. The splitting criterion is the normalized information gain (difference in entropy). The attribute with the highest normalized information gain is chosen to make the decision. The C4.5 algorithm then recurses on the partitioned sublists. This algorithm has a few base cases. All the samples in the list belong to the same class. When this happens, it simply creates a leaf node for the decision tree saying to choose that class. None of the features provide any information gain. In this case, C4.5 creates a decision node higher up the tree using the expected value of the class. Instance of previously unseen class encountered. Again, C4.5 creates a decision node higher up the tree using the expected value. === Pseudocode === In pseudocode, the general algorithm for building decision trees is: Check for the above base cases. For each attribute a, find the normalized information gain ratio from splitting on a. Let a_best be the attribute with the highest normalized information gain. Create a decision node that splits on a_best. Recurse on the sublists obtained by splitting on a_best, and add those nodes as children of node. == Improvements from ID3 algorithm == C4.5 made a number of improvements to ID3. Some of these are: Handling both continuous and discrete attributes: In order to handle continuous attributes, C4.5 creates a threshold and then splits the list into those whose attribute value is above the threshold and those that are less than or equal to it. Handling training data with missing attribute values: C4.5 allows attribute values to be marked as missing. Missing attribute values are simply not used in gain and entropy calculations. Handling attributes with differing costs. Pruning trees after creation: C4.5 goes back through the tree once it's been created and attempts to remove branches that do not help by replacing them with leaf nodes. == Improvements in C5.0/See5 algorithm == Quinlan went on to create C5.0 and See5 (C5.0 for Unix/Linux, See5 for Windows) which he markets commercially. C5.0 offers a number of improvements on C4.5. Some of these are: Speed - C5.0 is significantly faster than C4.5 (several orders of magnitude) Memory usage - C5.0 is more memory efficient than C4.5 Smaller decision trees - C5.0 gets similar results to C4.5 with considerably smaller decision trees. Support for boosting - Boosting improves the trees and gives them more accuracy. Weighting - C5.0 allows you to weight different cases and misclassification types. Winnowing - a C5.0 option automatically winnows the attributes to remove those that may be unhelpful. Source for a single-threaded Linux version of C5.0 is available under the GNU General Public License (GPL).

    Read more →
  • Autocommit

    Autocommit

    In the context of data management, autocommit is a mode of operation of a database connection. Each individual database interaction (i.e., each SQL statement) submitted through the database connection in autocommit mode will be executed in its own transaction that is implicitly committed. A SQL statement executed in autocommit mode cannot be rolled back. Autocommit mode incurs per-statement transaction overhead and can often lead to undesirable performance or resource utilization impact on the database. Nonetheless, in systems such as Microsoft SQL Server, as well as connection technologies such as ODBC and Microsoft OLE DB, autocommit mode is the default for all statements that change data, in order to ensure that individual statements will conform to the ACID (atomicity-consistency-isolation-durability) properties of transactions. The alternative to autocommit mode (non-autocommit) means that the SQL client application itself is responsible for ending transactions explicitly via the commit or rollback SQL commands. Non-autocommit mode enables grouping of multiple data manipulation SQL commands into a single atomic transaction. Some DBMS (e.g. MariaDB) force autocommit for every DDL statement, even in non-autocommit mode. In this case, before each DDL statement, previous DML statements in transaction are autocommitted. Each DDL statement is executed in its own new autocommit transaction.

    Read more →
  • ID3 algorithm

    ID3 algorithm

    In decision tree learning, ID3 (Iterative Dichotomiser 3) is a greedy algorithm invented by Ross Quinlan used to generate a decision tree from a dataset. ID3 is the precursor to the C4.5 algorithm. The 3 in the name is meant to signify that this was Quinlan's third attempt at a model based on entropy-based splitting, and the term dichotimser is a misnomer as it implies a binary split, but the ID3 algorithm can split on multi-valued attributes. == Algorithm == The ID3 algorithm begins with the original set S {\displaystyle S} as the root node. On each iteration of the algorithm, it iterates through every unused attribute of the set S {\displaystyle S} and calculates the entropy H ( S ) {\displaystyle \mathrm {H} {(S)}} or the information gain I G ( S ) {\displaystyle IG(S)} of that attribute. It then selects the attribute which has the smallest entropy (or largest information gain) value. The set S {\displaystyle S} is then split or partitioned by the selected attribute to produce subsets of the data. (For example, a node can be split into child nodes based upon the subsets of the population whose ages are less than 50, between 50 and 100, and greater than 100.) The algorithm continues to recurse on each subset, considering only attributes never selected before. Recursion on a subset may stop in one of these cases: every element in the subset belongs to the same class; in which case the node is turned into a leaf node and labelled with the class of the examples. there are no more attributes to be selected, but the examples still do not belong to the same class. In this case, the node is made a leaf node and labelled with the most common class of the examples in the subset. there are no examples in the subset, which happens when no example in the parent set was found to match a specific value of the selected attribute. An example could be the absence of a person among the population with age over 100 years. Then a leaf node is created and labelled with the most common class of the examples in the parent node's set. Throughout the algorithm, the decision tree is constructed with each non-terminal node (internal node) representing the selected attribute on which the data was split, and terminal nodes (leaf nodes) representing the class label of the final subset of this branch. === Summary === Calculate the entropy of every attribute a {\displaystyle a} of the data set S {\displaystyle S} . Partition ("split") the set S {\displaystyle S} into subsets using the attribute for which the resulting entropy after splitting is minimized; or, equivalently, information gain is maximum. Make a decision tree node containing that attribute. Recurse on subsets using the remaining attributes. === Properties === ID3 does not guarantee an optimal solution. It can converge upon local optima. It uses a greedy strategy by selecting the locally best attribute to split the dataset on each iteration. The algorithm's optimality can be improved by using backtracking during the search for the optimal decision tree at the cost of possibly taking longer. ID3 can overfit the training data. To avoid overfitting, smaller decision trees should be preferred over larger ones. This algorithm usually produces small trees, but it does not always produce the smallest possible decision tree. ID3 is harder to use on continuous data than on factored data (factored data has a discrete number of possible values, thus reducing the possible branch points). If the values of any given attribute are continuous, then there are many more places to split the data on this attribute, and searching for the best value to split by can be time-consuming. === Usage === The ID3 algorithm is used by training on a data set S {\displaystyle S} to produce a decision tree which is stored in memory. At runtime, this decision tree is used to classify new test cases (feature vectors) by traversing the decision tree using the features of the datum to arrive at a leaf node. == The ID3 metrics == === Entropy === Entropy H ( S ) {\displaystyle \mathrm {H} {(S)}} is a measure of the amount of uncertainty in the (data) set S {\displaystyle S} (i.e. entropy characterizes the (data) set S {\displaystyle S} ). H ( S ) = ∑ x ∈ X − p ( x ) log 2 ⁡ p ( x ) {\displaystyle \mathrm {H} {(S)}=\sum _{x\in X}{-p(x)\log _{2}p(x)}} Where, S {\displaystyle S} – The current dataset for which entropy is being calculated This changes at each step of the ID3 algorithm, either to a subset of the previous set in the case of splitting on an attribute or to a "sibling" partition of the parent in case the recursion terminated previously. X {\displaystyle X} – The set of classes in S {\displaystyle S} p ( x ) {\displaystyle p(x)} – The proportion of the number of elements in class x {\displaystyle x} to the number of elements in set S {\displaystyle S} When H ( S ) = 0 {\displaystyle \mathrm {H} {(S)}=0} , the set S {\displaystyle S} is perfectly classified (i.e. all elements in S {\displaystyle S} are of the same class). In ID3, entropy is calculated for each remaining attribute. The attribute with the smallest entropy is used to split the set S {\displaystyle S} on this iteration. Entropy in information theory measures how much information is expected to be gained upon measuring a random variable; as such, it can also be used to quantify the amount to which the distribution of the quantity's values is unknown. A constant quantity has zero entropy, as its distribution is perfectly known. In contrast, a uniformly distributed random variable (discretely or continuously uniform) maximizes entropy. Therefore, the greater the entropy at a node, the less information is known about the classification of data at this stage of the tree; and therefore, the greater the potential to improve the classification here. As such, ID3 is a greedy heuristic performing a best-first search for locally optimal entropy values. Its accuracy can be improved by preprocessing the data. === Information gain === Information gain I G ( A ) {\displaystyle IG(A)} is the measure of the difference in entropy from before to after the set S {\displaystyle S} is split on an attribute A {\displaystyle A} . In other words, how much uncertainty in S {\displaystyle S} was reduced after splitting set S {\displaystyle S} on attribute A {\displaystyle A} . I G ( S , A ) = H ( S ) − ∑ t ∈ T p ( t ) H ( t ) = H ( S ) − H ( S | A ) . {\displaystyle IG(S,A)=\mathrm {H} {(S)}-\sum _{t\in T}p(t)\mathrm {H} {(t)}=\mathrm {H} {(S)}-\mathrm {H} {(S|A)}.} Where, H ( S ) {\displaystyle \mathrm {H} (S)} – Entropy of set S {\displaystyle S} T {\displaystyle T} – The subsets created from splitting set S {\displaystyle S} by attribute A {\displaystyle A} such that S = ⋃ t ∈ T t {\displaystyle S=\bigcup _{t\in T}t} p ( t ) {\displaystyle p(t)} – The proportion of the number of elements in t {\displaystyle t} to the number of elements in set S {\displaystyle S} H ( t ) {\displaystyle \mathrm {H} (t)} – Entropy of subset t {\displaystyle t} In ID3, information gain can be calculated (instead of entropy) for each remaining attribute. The attribute with the largest information gain is used to split the set S {\displaystyle S} on this iteration.

    Read more →
  • Andrej Mrvar

    Andrej Mrvar

    Andrej Mrvar is a Slovenian computer scientist and a professor at the University of Ljubljana's Faculty of Social Sciences. He is known for his work in network analysis, graph drawing, decision making, virtual reality, timing and data processing of sports competitions. == Education and career == He is well known for his work on Pajek, a free software for analysis and visualization of large networks. Mrvar began work on Pajek in 1996 with Vladimir Batagelj. His book Exploratory Social Network Analysis with Pajek, coauthored with Wouter de Nooy and Vladimir Batagelj, is his most cited work. It was published by Cambridge University Press in three editions (first 2005, second 2011, and third 2018). The book was translated into Japanese (2009) and Chinese (first edition 2012, second 2014). With Anuška Ferligoj, he was a founding co-editor-in-chief of the Metodološki zvezki - Advances in Methodology and Statistics journal. == Awards and honors == Vidmar Award (Faculty of Electrical and Computer Engineering, University of Ljubljana): 1988, 1990 First prizes for contributions (with Vladimir Batagelj) to Graph Drawing Contests in years: 1995, 1996, 1997, 1998, 1999, 2000 and 2005 / Graph Drawing Hall of Fame. Award of University of Ljubljana for contributions in education and research (Svečana listina Univerze v Ljubljani za pomembne dosežke na področju vzgojnoizobraževalnega in znanstvenoraziskovalega dela): 2001 The INSNA's William D. Richards Software award for work on Pajek (with Vladimir Batagelj): 2013 Award of Faculty of Social Sciences, University of Ljubljana for scientific excellence (Priznanje za znanstveno odličnost): 2013 == Selected publications == Wouter de Nooy, Andrej Mrvar, Vladimir Batagelj, Mark Granovetter (Series Editor), Exploratory Social Network Analysis with Pajek (Structural Analysis in the Social Sciences), Cambridge University Press (First Edition: 2005, Second Edition: 2011, Third Edition: 2018 ). Japanese Translation (2010). Chinese Translation (First Edition: 2012, Second Edition: 2014) Andrej Mrvar and Vladimir Batagelj, Analysis and visualization of large networks with program package Pajek. Complex Adaptive Systems Modeling, 4:6. SpringerOpen, 2016 Vladimir Batagelj and Andrej Mrvar, Some Analyses of Erdős Collaboration Graph, Social Networks, 22, 173–186, 2000 Vladimir Batagelj and Andrej Mrvar, A Subquadratic Triad Census Algorithm for Large Sparse Networks with Small Maximum Degree. Social Networks, 23, 237–243, 2001 Patrick Doreian and Andrej Mrvar, A Partitioning Approach to Structural Balance, Social Networks, 18, 149–168, 1996 Patrick Doreian and Andrej Mrvar, Partitioning Signed Social Networks, Social Networks, 31, 1–11, 2009 Andrej Mrvar and Patrick Doreian, Partitioning Signed Two-Mode Networks, Journal of Mathematical Sociology, 33, 196–221, 2009 Patrick Doreian and Andrej Mrvar, The international reach of the Koch brothers network. In: Antonyuk, A. and Basov, N. (Eds.): Networks in the Global World V. NetGloW 2020. Lecture Notes in Networks and Systems, 181, 225–235. Springer, 2021 Patrick Doreian and Andrej Mrvar, Delineating Changes in the Fundamental Structure of Signed Networks, Frontiers in Physics, 294, 1–11, 2021 Patrick Doreian and Andrej Mrvar, Hubs and Authorities in the Koch Brothers Network. Social Networks, Social Networks, 64, 148–157, 2021 Patrick Doreian and Andrej Mrvar, Public issues, policy proposals, social movements, and the interests of the Koch Brothers network of allies, Quality and Quantity, 56, 305–322, 2022 Douglas R. White, Vladimir Batagelj, Andrej Mrvar, Analyzing Large Kinship and Marriage Networks with Pgraph and Pajek. Social Science Computer Review, 17, 245–274, 1999 Ion Georgiou, Ronald Concer, Andrej Mrvar, A Systemic Approach to Sociometric Group Research: Advancing The Work of Leslie Day Zeleny, 1939–1947, Social Networks, 63, 174–200, 2020

    Read more →
  • Generalized blockmodeling of binary networks

    Generalized blockmodeling of binary networks

    Generalized blockmodeling of binary networks (also relational blockmodeling) is an approach of generalized blockmodeling, analysing the binary network(s). As most network analyses deal with binary networks, this approach is also considered as the fundamental approach of blockmodeling. This is especially noted, as the set of ideal blocks, when used for interpretation of blockmodels, have binary link patterns, which precludes them to be compared with valued empirical blocks. When analysing the binary networks, the criterion function is measuring block inconsistencies, while also reporting the possible errors. The ideal block in binary blockmodeling has only three types of conditions: "a certain cell must be (at least) 1, a certain cell must be 0 and the f {\displaystyle f} over each row (or column) must be at least 1". It is also used as a basis for developing the generalized blockmodeling of valued networks.

    Read more →
  • Cross-entropy method

    Cross-entropy method

    The cross-entropy (CE) method is a Monte Carlo method for importance sampling and optimization. It is applicable to both combinatorial and continuous problems, with either a static or noisy objective. The method approximates the optimal importance sampling estimator by repeating two phases: Draw a sample from a probability distribution. Minimize the cross-entropy between this distribution and a target distribution to produce a better sample in the next iteration. Reuven Rubinstein developed the method in the context of rare-event simulation, where tiny probabilities must be estimated, for example in network reliability analysis, queueing models, or performance analysis of telecommunication systems. The method has also been applied to the traveling salesman, quadratic assignment, DNA sequence alignment, max-cut and buffer allocation problems. == Estimation via importance sampling == Consider the general problem of estimating the quantity ℓ = E u [ H ( X ) ] = ∫ H ( x ) f ( x ; u ) d x {\displaystyle \ell =\mathbb {E} _{\mathbf {u} }[H(\mathbf {X} )]=\int H(\mathbf {x} )\,f(\mathbf {x} ;\mathbf {u} )\,{\textrm {d}}\mathbf {x} } , where H {\displaystyle H} is some performance function and f ( x ; u ) {\displaystyle f(\mathbf {x} ;\mathbf {u} )} is a member of some parametric family of distributions. Using importance sampling this quantity can be estimated as ℓ ^ = 1 N ∑ i = 1 N H ( X i ) f ( X i ; u ) g ( X i ) {\displaystyle {\hat {\ell }}={\frac {1}{N}}\sum _{i=1}^{N}H(\mathbf {X} _{i}){\frac {f(\mathbf {X} _{i};\mathbf {u} )}{g(\mathbf {X} _{i})}}} , where X 1 , … , X N {\displaystyle \mathbf {X} _{1},\dots ,\mathbf {X} _{N}} is a random sample from g {\displaystyle g\,} . For positive H {\displaystyle H} , the theoretically optimal importance sampling density (PDF) is given by g ∗ ( x ) = H ( x ) f ( x ; u ) / ℓ {\displaystyle g^{}(\mathbf {x} )=H(\mathbf {x} )f(\mathbf {x} ;\mathbf {u} )/\ell } . This, however, depends on the unknown ℓ {\displaystyle \ell } . The CE method aims to approximate the optimal PDF by adaptively selecting members of the parametric family that are closest (in the Kullback–Leibler sense) to the optimal PDF g ∗ {\displaystyle g^{}} . == Generic CE algorithm == Choose initial parameter vector v ( 0 ) {\displaystyle \mathbf {v} ^{(0)}} ; set t = 1. Generate a random sample X 1 , … , X N {\displaystyle \mathbf {X} _{1},\dots ,\mathbf {X} _{N}} from f ( ⋅ ; v ( t − 1 ) ) {\displaystyle f(\cdot ;\mathbf {v} ^{(t-1)})} Solve for v ( t ) {\displaystyle \mathbf {v} ^{(t)}} , where v ( t ) = argmax v ⁡ 1 N ∑ i = 1 N H ( X i ) f ( X i ; u ) f ( X i ; v ( t − 1 ) ) log ⁡ f ( X i ; v ) {\displaystyle \mathbf {v} ^{(t)}=\mathop {\textrm {argmax}} _{\mathbf {v} }{\frac {1}{N}}\sum _{i=1}^{N}H(\mathbf {X} _{i}){\frac {f(\mathbf {X} _{i};\mathbf {u} )}{f(\mathbf {X} _{i};\mathbf {v} ^{(t-1)})}}\log f(\mathbf {X} _{i};\mathbf {v} )} If convergence is reached then stop; otherwise, increase t by 1 and reiterate from step 2. In several cases, the solution to step 3 can be found analytically. Situations in which this occurs are When f {\displaystyle f\,} belongs to the natural exponential family When f {\displaystyle f\,} is discrete with finite support When H ( X ) = I { x ∈ A } {\displaystyle H(\mathbf {X} )=\mathrm {I} _{\{\mathbf {x} \in A\}}} and f ( X i ; u ) = f ( X i ; v ( t − 1 ) ) {\displaystyle f(\mathbf {X} _{i};\mathbf {u} )=f(\mathbf {X} _{i};\mathbf {v} ^{(t-1)})} , then v ( t ) {\displaystyle \mathbf {v} ^{(t)}} corresponds to the maximum likelihood estimator based on those X k ∈ A {\displaystyle \mathbf {X} _{k}\in A} . == Continuous optimization—example == The same CE algorithm can be used for optimization, rather than estimation. Suppose the problem is to maximize some function S {\displaystyle S} , for example, S ( x ) = e − ( x − 2 ) 2 + 0.8 e − ( x + 2 ) 2 {\displaystyle S(x)={\textrm {e}}^{-(x-2)^{2}}+0.8\,{\textrm {e}}^{-(x+2)^{2}}} . To apply CE, one considers first the associated stochastic problem of estimating P θ ( S ( X ) ≥ γ ) {\displaystyle \mathbb {P} _{\boldsymbol {\theta }}(S(X)\geq \gamma )} for a given level γ {\displaystyle \gamma \,} , and parametric family { f ( ⋅ ; θ ) } {\displaystyle \left\{f(\cdot ;{\boldsymbol {\theta }})\right\}} , for example the 1-dimensional Gaussian distribution, parameterized by its mean μ t {\displaystyle \mu _{t}\,} and variance σ t 2 {\displaystyle \sigma _{t}^{2}} (so θ = ( μ , σ 2 ) {\displaystyle {\boldsymbol {\theta }}=(\mu ,\sigma ^{2})} here). Hence, for a given γ {\displaystyle \gamma \,} , the goal is to find θ {\displaystyle {\boldsymbol {\theta }}} so that D K L ( I { S ( x ) ≥ γ } ‖ f θ ) {\displaystyle D_{\mathrm {KL} }({\textrm {I}}_{\{S(x)\geq \gamma \}}\|f_{\boldsymbol {\theta }})} is minimized. This is done by solving the sample version (stochastic counterpart) of the KL divergence minimization problem, as in step 3 above. It turns out that parameters that minimize the stochastic counterpart for this choice of target distribution and parametric family are the sample mean and sample variance corresponding to the elite samples, which are those samples that have objective function value ≥ γ {\displaystyle \geq \gamma } . The worst of the elite samples is then used as the level parameter for the next iteration. This yields the following randomized algorithm that happens to coincide with the so-called Estimation of Multivariate Normal Algorithm (EMNA), an estimation of distribution algorithm. === Pseudocode === // Initialize parameters μ := −6 σ2 := 100 t := 0 maxits := 100 N := 100 Ne := 10 // While maxits not exceeded and not converged while t < maxits and σ2 > ε do // Obtain N samples from current sampling distribution X := SampleGaussian(μ, σ2, N) // Evaluate objective function at sampled points S := exp(−(X − 2) ^ 2) + 0.8 exp(−(X + 2) ^ 2) // Sort X by objective function values in descending order X := sort(X, S) // Update parameters of sampling distribution via elite samples μ := mean(X(1:Ne)) σ2 := var(X(1:Ne)) t := t + 1 // Return mean of final sampling distribution as solution return μ == Related methods == Simulated annealing Genetic algorithms Harmony search Estimation of distribution algorithm Tabu search Natural Evolution Strategy Ant colony optimization algorithms

    Read more →
  • Multiple discriminant analysis

    Multiple discriminant analysis

    Multiple Discriminant Analysis (MDA) is a multivariate dimensionality reduction technique. It has been used to predict signals as diverse as neural memory traces and corporate failure. MDA is not directly used to perform classification. It merely supports classification by yielding a compressed signal amenable to classification. The method described in Duda et al. (2001) §3.8.3 projects the multivariate signal down to an M−1 dimensional space where M is the number of categories. MDA is useful because most classifiers are strongly affected by the curse of dimensionality. In other words, when signals are represented in very-high-dimensional spaces, the classifier's performance is catastrophically impaired by the overfitting problem. This problem is reduced by compressing the signal down to a lower-dimensional space as MDA does. MDA has been used to reveal neural codes.

    Read more →
  • Multispectral pattern recognition

    Multispectral pattern recognition

    Multispectral remote sensing is the collection and analysis of reflected, emitted, or back-scattered energy from an object or an area of interest in multiple bands of regions of the electromagnetic spectrum (Jensen, 2005). Subcategories of multispectral remote sensing include hyperspectral, in which hundreds of bands are collected and analyzed, and ultraspectral remote sensing where many hundreds of bands are used (Logicon, 1997). The main purpose of multispectral imaging is the potential to classify the image using multispectral classification. This is a much faster method of image analysis than is possible by human interpretation. == Multispectral remote sensing systems == Remote sensing systems gather data via instruments typically carried on satellites in orbit around the Earth. The remote sensing scanner detects the energy that radiates from the object or area of interest. This energy is recorded as an analog electrical signal and converted into a digital value though an A-to-D conversion. There are several multispectral remote sensing systems that can be categorized in the following way: === Multispectral imaging using discrete detectors and scanning mirrors === Landsat Multispectral Scanner (MSS) Landsat Thematic Mapper (TM) NOAA Geostationary Operational Environmental Satellite (GOES) NOAA Advanced Very High Resolution Radiometer (AVHRR) NASA and ORBIMAGE, Inc., Sea-viewing Wide field-of-view Sensor (SeaWiFS) Daedalus, Inc., Aircraft Multispectral Scanner (AMS) NASA Airborne Terrestrial Applications Sensor (ATLAS) === Multispectral imaging using linear arrays === SPOT 1, 2, and 3 High Resolution Visible (HRV) sensors and Spot 4 and 5 High Resolution Visible Infrared (HRVIR) and vegetation sensor Indian Remote Sensing System (IRS) Linear Imaging Self-scanning Sensor (LISS) Space Imaging, Inc. (IKONOS) Digital Globe, Inc. (QuickBird) ORBIMAGE, Inc. (OrbView-3) ImageSat International, Inc. (EROS A1) NASA Terra Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) NASA Terra Multiangle Imaging Spectroradiometer (MISR) === Imaging spectrometry using linear and area arrays === NASA Jet Propulsion Laboratory Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) Compact Airborne Spectrographic Imager 3 (CASI 3) NASA Terra Moderate Resolution Imaging Spectrometer (MODIS) NASA Earth Observer (EO-1) Advanced Land Imager (ALI), Hyperion, and LEISA Atmospheric Corrector (LAC) === Satellite analog and digital photographic systems === Russian SPIN-2 TK-350, and KVR-1000 NASA Space Shuttle and International Space Station Imagery == Multispectral classification methods == A variety of methods can be used for the multispectral classification of images: Algorithms based on parametric and nonparametric statistics that use ratio-and interval-scaled data and nonmetric methods that can also incorporate nominal scale data (Duda et al., 2001), Supervised or unsupervised classification logic, Hard or soft (fuzzy) set classification logic to create hard or fuzzy thematic output products, Per-pixel or object-oriented classification logic, and Hybrid approaches == Supervised classification == In this classification method, the identity and location of some of the land-cover types are obtained beforehand from a combination of fieldwork, interpretation of aerial photography, map analysis, and personal experience. The analyst would locate sites that have similar characteristics to the known land-cover types. These areas are known as training sites because the known characteristics of these sites are used to train the classification algorithm for eventual land-cover mapping of the remainder of the image. Multivariate statistical parameters (means, standard deviations, covariance matrices, correlation matrices, etc.) are calculated for each training site. All pixels inside and outside of the training sites are evaluated and allocated to the class with the more similar characteristics. === Classification scheme === The first step in the supervised classification method is to identify the land-cover and land-use classes to be used. Land-cover refers to the type of material present on the site (e.g. water, crops, forest, wet land, asphalt, and concrete). Land-use refers to the modifications made by people to the land cover (e.g. agriculture, commerce, settlement). All classes should be selected and defined carefully to properly classify remotely sensed data into the correct land-use and/or land-cover information. To achieve this purpose, it is necessary to use a classification system that contains taxonomically correct definitions of classes. If a hard classification is desired, the following classes should be used: Mutually exclusive: there is not any taxonomic overlap of any classes (i.e., rain forest and evergreen forest are distinct classes). Exhaustive: all land-covers in the area have been included. Hierarchical: sub-level classes (e.g., single-family residential, multiple-family residential) are created, allowing that these classes can be included in a higher category (e.g., residential). Some examples of hard classification schemes are: American Planning Association Land-Based Classification System United States Geological Survey Land-use/Land-cover Classification System for Use with Remote Sensor Data U.S. Department of the Interior Fish and Wildlife Service U.S. National Vegetation and Classification System International Geosphere-Biosphere Program IGBP Land Cover Classification System === Training sites === Once the classification scheme is adopted, the image analyst may select training sites in the image that are representative of the land-cover or land-use of interest. If the environment where the data was collected is relatively homogeneous, the training data can be used. If different conditions are found in the site, it would not be possible to extend the remote sensing training data to the site. To solve this problem, a geographical stratification should be done during the preliminary stages of the project. All differences should be recorded (e.g. soil type, water turbidity, crop species, etc.). These differences should be recorded on the imagery and the selection training sites made based on the geographical stratification of this data. The final classification map would be a composite of the individual stratum classifications. After the data are organized in different training sites, a measurement vector is created. This vector would contain the brightness values for each pixel in each band in each training class. The mean, standard deviation, variance-covariance matrix, and correlation matrix are calculated from the measurement vectors. Once the statistics from each training site are determined, the most effective bands for each class should be selected. The objective of this discrimination is to eliminate the bands that can provide redundant information. Graphical and statistical methods can be used to achieve this objective. Some of the graphic methods are: Bar graph spectral plots Cospectral mean vector plots Feature space plots Cospectral parallelepiped or ellipse plots === Classification algorithm === The last step in supervised classification is selecting an appropriate algorithm. The choice of a specific algorithm depends on the input data and the desired output. Parametric algorithms are based on the fact that the data is normally distributed. If the data is not normally distributed, nonparametric algorithms should be used. The more common nonparametric algorithms are: One-dimensional density slicing Parallelipiped Minimum distance Nearest-neighbor Expert system analysis Convolutional neural network == Unsupervised classification == Unsupervised classification (also known as clustering) is a method of partitioning remote sensor image data in multispectral feature space and extracting land-cover information. Unsupervised classification require less input information from the analyst compared to supervised classification because clustering does not require training data. This process consists in a series of numerical operations to search for the spectral properties of pixels. From this process, a map with m spectral classes is obtained. Using the map, the analyst tries to assign or transform the spectral classes into thematic information of interest (i.e. forest, agriculture, urban). This process may not be easy because some spectral clusters represent mixed classes of surface materials and may not be useful. The analyst has to understand the spectral characteristics of the terrain to be able to label clusters as a specific information class. There are hundreds of clustering algorithms. Two of the most conceptually simple algorithms are the chain method and the ISODATA method. === Chain method === The algorithm used in this method operates in a two-pass mode (it passes through the multispectral dataset two times. In the first pass, the program reads through the dataset and sequentially builds clusters (groups of p

    Read more →
  • Characteristic samples

    Characteristic samples

    Characteristic samples is a concept in the field of grammatical inference, related to passive learning. In passive learning, an inference algorithm I {\displaystyle I} is given a set of pairs of strings and labels S {\displaystyle S} , and returns a representation R {\displaystyle R} that is consistent with S {\displaystyle S} . Characteristic samples consider the scenario when the goal is not only finding a representation consistent with S {\displaystyle S} , but finding a representation that recognizes a specific target language. A characteristic sample of language L {\displaystyle L} is a set of pairs of the form ( s , l ( s ) ) {\displaystyle (s,l(s))} where: l ( s ) = 1 {\displaystyle l(s)=1} if and only if s ∈ L {\displaystyle s\in L} l ( s ) = − 1 {\displaystyle l(s)=-1} if and only if s ∉ L {\displaystyle s\notin L} Given the characteristic sample S {\displaystyle S} , I {\displaystyle I} 's output on it is a representation R {\displaystyle R} , e.g. an automaton, that recognizes L {\displaystyle L} . == Formal Definition == === The Learning Paradigm associated with Characteristic Samples === There are three entities in the learning paradigm connected to characteristic samples, the adversary, the teacher and the inference algorithm. Given a class of languages C {\displaystyle \mathbb {C} } and a class of representations for the languages R {\displaystyle \mathbb {R} } , the paradigm goes as follows: The adversary A {\displaystyle A} selects a language L ∈ C {\displaystyle L\in \mathbb {C} } and reports it to the teacher The teacher T {\displaystyle T} then computes a set of strings and label them correctly according to L {\displaystyle L} , trying to make sure that the inference algorithm will compute L {\displaystyle L} The adversary can add correctly labeled words to the set in order to confuse the inference algorithm The inference algorithm I {\displaystyle I} gets the sample and computes a representation R ∈ R {\displaystyle R\in \mathbb {R} } consistent with the sample. The goal is that when the inference algorithm receives a characteristic sample for a language L {\displaystyle L} , or a sample that subsumes a characteristic sample for L {\displaystyle L} , it will return a representation that recognizes exactly the language L {\displaystyle L} . === Sample === Sample S {\displaystyle S} is a set of pairs of the form ( s , l ( s ) ) {\displaystyle (s,l(s))} such that l ( s ) ∈ { − 1 , 1 } {\displaystyle l(s)\in \{-1,1\}} ==== Sample consistent with a language ==== We say that a sample S {\displaystyle S} is consistent with language L {\displaystyle L} if for every pair ( s , l ( s ) ) {\displaystyle (s,l(s))} in S {\displaystyle S} : l ( s ) = 1 if and only if s ∈ L {\displaystyle l(s)=1{\text{ if and only if }}s\in L} l ( s ) = − 1 if and only if s ∉ L {\displaystyle l(s)=-1{\text{ if and only if }}s\notin L} === Characteristic sample === Given an inference algorithm I {\displaystyle I} and a language L {\displaystyle L} , a sample S {\displaystyle S} that is consistent with L {\displaystyle L} is called a characteristic sample of L {\displaystyle L} for I {\displaystyle I} if: I {\displaystyle I} 's output on S {\displaystyle S} is a representation R {\displaystyle R} that recognizes L {\displaystyle L} . For every sample D {\displaystyle D} that is consistent with L {\displaystyle L} and also fulfils S ⊆ D {\displaystyle S\subseteq D} , I {\displaystyle I} 's output on D {\displaystyle D} is a representation R {\displaystyle R} that recognizes L {\displaystyle L} . A Class of languages C {\displaystyle \mathbb {C} } is said to have charistaristic samples if every L ∈ C {\displaystyle L\in \mathbb {C} } has a characteristic sample. == Related Theorems == === Theorem === If equivalence is undecidable for a class C {\textstyle \mathbb {C} } over Σ {\textstyle \Sigma } of cardinality bigger than 1, then C {\textstyle \mathbb {C} } doesn't have characteristic samples. ==== Proof ==== Given a class of representations C {\textstyle \mathbb {C} } such that equivalence is undecidable, for every polynomial p ( x ) {\displaystyle p(x)} and every n ∈ N {\displaystyle n\in \mathbb {N} } , there exist two representations r 1 {\displaystyle r_{1}} and r 2 {\displaystyle r_{2}} of sizes bounded by n {\displaystyle n} , that recognize different languages but are inseparable by any string of size bounded by p ( n ) {\displaystyle p(n)} . Assuming this is not the case, we can decide if r 1 {\displaystyle r_{1}} and r 2 {\displaystyle r_{2}} are equivalent by simulating their run on all strings of size smaller than p ( n ) {\displaystyle p(n)} , contradicting the assumption that equivalence is undecidable. === Theorem === If S 1 {\displaystyle S_{1}} is a characteristic sample for L 1 {\displaystyle L_{1}} and is also consistent with L 2 {\displaystyle L_{2}} , then every characteristic sample of L 2 {\displaystyle L_{2}} , is inconsistent with L 1 {\displaystyle L_{1}} . ==== Proof ==== Given a class C {\textstyle \mathbb {C} } that has characteristic samples, let R 1 {\displaystyle R_{1}} and R 2 {\displaystyle R_{2}} be representations that recognize L 1 {\displaystyle L_{1}} and L 2 {\displaystyle L_{2}} respectively. Under the assumption that there is a characteristic sample for L 1 {\displaystyle L_{1}} , S 1 {\displaystyle S_{1}} that is also consistent with L 2 {\displaystyle L_{2}} , we'll assume falsely that there exist a characteristic sample for L 2 {\displaystyle L_{2}} , S 2 {\displaystyle S_{2}} that is consistent with L 1 {\displaystyle L_{1}} . By the definition of characteristic sample, the inference algorithm I {\displaystyle I} must return a representation which recognizes the language if given a sample that subsumes the characteristic sample itself. But for the sample S 1 ∪ S 2 {\displaystyle S_{1}\cup S_{2}} , the answer of the inferring algorithm needs to recognize both L 1 {\displaystyle L_{1}} and L 2 {\displaystyle L_{2}} , in contradiction. === Theorem === If a class is polynomially learnable by example based queries, it is learnable with characteristic samples. == Polynomialy characterizable classes == === Regular languages === The proof that DFA's are learnable using characteristic samples, relies on the fact that every regular language has a finite number of equivalence classes with respect to the right congruence relation, ∼ L {\displaystyle \sim _{L}} (where x ∼ L y {\displaystyle x\sim _{L}y} for x , y ∈ Σ ∗ {\displaystyle x,y\in \Sigma ^{}} if and only if ∀ z ∈ Σ ∗ : x z ∈ L ↔ y z ∈ L {\displaystyle \forall z\in \Sigma ^{}:xz\in L\leftrightarrow yz\in L} ). Note that if x {\displaystyle x} , y {\displaystyle y} are not congruent with respect to ∼ L {\displaystyle \sim _{L}} , there exists a string z {\displaystyle z} such that x z ∈ L {\displaystyle xz\in L} but y z ∉ L {\displaystyle yz\notin L} or vice versa, this string is called a separating suffix. ==== Constructing a characteristic sample ==== The construction of a characteristic sample for a language L {\displaystyle L} by the teacher goes as follows. Firstly, by running a depth first search on a deterministic automaton A {\displaystyle A} recognizing L {\displaystyle L} , starting from its initial state, we get a suffix closed set of words, W {\displaystyle W} , ordered in shortlex order. From the fact above, we know that for every two states in the automaton, there exists a separating suffix that separates between every two strings that the run of A {\displaystyle A} on them ends in the respective states. We refer to the set of separating suffixes as S {\displaystyle S} . The labeled set (sample) of words the teacher gives the adversary is { ( w , l ( w ) ) | w ∈ W ⋅ S ∪ W ⋅ Σ ⋅ S } {\displaystyle \{(w,l(w))|w\in W\cdot S\cup W\cdot \Sigma \cdot S\}} where l ( w ) {\displaystyle l(w)} is the correct label of w {\displaystyle w} (whether it is in L {\displaystyle L} or not). We may assume that ϵ ∈ S {\displaystyle \epsilon \in S} . ==== Constructing a deterministic automata ==== Given the sample from the adversary W {\displaystyle W} , the construction of the automaton by the inference algorithm I {\displaystyle I} starts with defining P = prefix ( W ) {\displaystyle P={\text{prefix}}(W)} and S = suffix ( W ) {\displaystyle S={\text{suffix}}(W)} , which are the set of prefixes and suffixes of W {\displaystyle W} respectively. Now the algorithm constructs a matrix M {\displaystyle M} where the elements of P {\displaystyle P} function as the rows, ordered by the shortlex order, and the elements of S {\displaystyle S} function as the columns, ordered by the shortlex order. Next, the cells in the matrix are filled in the following manner for prefix p i {\displaystyle p_{i}} and suffix s j {\displaystyle s_{j}} : If p i s j ∈ W → M i j = l ( p i s j ) {\displaystyle p_{i}s_{j}\in W\rightarrow M_{ij}=l(p_{i}s_{j})} else, M i j = 0 {\displaystyle M_{ij}=0} Now, we say row i {\displaystyle i} and t {\displaystyle t} are distinguishable if there exi

    Read more →
  • Deep Learning Super Sampling

    Deep Learning Super Sampling

    Deep Learning Super Sampling (DLSS) is a suite of real-time deep learning image enhancement and upscaling technologies developed by Nvidia that are available in a number of video games. The goal of these technologies is to allow the majority of the graphics pipeline to run at a lower resolution for increased performance, and then infer a higher resolution image from this that approximates the same level of detail as if the image had been rendered at this higher resolution. This allows for higher graphical settings or frame rates for a given output resolution, depending on user preference. All generations of DLSS are available on all RTX-branded cards from Nvidia in supported titles. However, the Frame Generation feature is only supported on RTX 40 series GPUs or newer and Multi Frame Generation is only available on 50 series GPUs. == History == Nvidia advertised DLSS as a key feature of GeForce RTX 20 series GPUs when they launched in September 2018. At that time, the results were limited to a few video games, namely Battlefield V, or Metro Exodus, because the algorithm had to be trained specifically on each game on which it was applied and the results were usually not as good as simple resolution upscaling. In 2019, Control shipped with ray tracing and an image processing algorithm that approximated DLSS, which did not use the Tensor Cores. In April 2020, Nvidia advertised and shipped an improved version of DLSS named DLSS 2 with driver version 445.75. DLSS 2.0 was available for a few existing games including Control and Wolfenstein: Youngblood, and would later be added to many newly released games and game engines such as Unreal Engine and Unity. This time Nvidia said that it used the Tensor Cores again, and that the AI did not need to be trained specifically on each game. Despite sharing the DLSS branding, the two iterations of DLSS differ significantly and are not backwards-compatible. In January 2025, Nvidia stated that there are over 540 games and apps supporting DLSS, and that over 80% of Nvidia RTX users activate DLSS. In March 2025, there were more than 100 games that support DLSS 4, according to Nvidia. By May 2025, over 125 games supported DLSS 4. The first video game console to use DLSS, the Nintendo Switch 2, was released on June 5, 2025. Nvidia announced DLSS 4.5 at CES 2026. In January 2026, Nvidia stated that over 250 games and applications support Multi Frame Generation. On March 16, 2026, at GTC 2026, Nvidia CEO Jensen Huang presented DLSS 5, a real-time AI model based on neural rendering that realistically enhances lighting and material surfaces at up to 4K resolution while retaining the developer's intended art style. It is planned to release in fall of 2026. In a blog post on its website, Nvidia has announced that DLSS 5 will be available in such games as Assassin's Creed Shadows, Delta Force, Hogwarts Legacy, Naraka: Bladepoint, Phantom Blade Zero, Resident Evil Requiem, Starfield, The Elder Scrolls IV: Oblivion Remastered, and more. On May 31, 2026, Nvidia announced an updated version of Ray Reconstruction for DLSS 4.5 in a blog post, scheduled for release on all RTX GPUs in August of the same year. They said it is designed to better embed spatial awareness into scenes and analyze engine data on movements and lighting conditions, resulting in a sharper, more stable, and less noisy image. === Release timeline === == Technology == === DLSS 1 === The first iteration of DLSS is a predominantly spatial image upscaler with two stages, both relying on convolutional auto-encoder neural networks. The first step is an image enhancement network which uses the current frame and motion vectors to perform edge enhancement, and spatial anti-aliasing. The second stage is an image upscaling step which uses the single raw, low-resolution frame to upscale the image to the desired output resolution. Using just a single frame for upscaling means the neural network itself must generate a large amount of new information to produce the high-resolution output, which can result in slight hallucinations such as leaves that differ in style to the source content. The neural networks are trained on a per-game basis by generating a "perfect frame" using traditional supersampling to 64 samples per pixel, as well as the motion vectors for each frame. The data collected must be as comprehensive as possible, including as many levels, times of day, graphical settings, resolutions, etc. as possible. This data is also augmented using common augmentations such as rotations, colour changes, and random noise to help generalize the test data. Training is performed on Nvidia's Saturn V supercomputer. This first iteration received a mixed response, with many criticizing the often soft appearance and artifacts along with glitches in certain situations; likely a side effect of the limited data from only using a single frame input to the neural networks which could not be trained to perform optimally in all scenarios and edge-cases. Nvidia also demonstrated the ability for the auto-encoder networks to learn the ability to recreate depth-of-field and motion blur, although this functionality has never been included in a publicly released product. === DLSS 2 === DLSS 2 is a temporal anti-aliasing upsampling (TAAU) implementation, using data from previous frames extensively through sub-pixel jittering to resolve fine detail and reduce aliasing. The data DLSS 2 collects includes: the raw low-resolution input, motion vectors, depth buffers, and exposure / brightness information. It can also be used as a simpler TAA implementation where the image is rendered at 100% resolution, rather than being upsampled by DLSS, Nvidia brands this as DLAA (Deep Learning Anti-Aliasing). TAA(U) is used in many modern video games and game engines; however, all previous implementations have used some form of manually written heuristics to prevent temporal artifacts such as ghosting and flickering. One example of this is neighborhood clamping which forcefully prevents samples collected in previous frames from deviating too much compared to nearby pixels in newer frames. This helps to identify and fix many temporal artifacts, but deliberately removing fine details in this way is analogous to applying a blur filter, and thus the final image can appear blurry when using this method. DLSS 2 uses a convolutional auto-encoder neural network trained to identify and fix temporal artifacts, instead of manually programmed heuristics as mentioned above. Because of this, DLSS 2 can generally resolve detail better than other TAA and TAAU implementations, while also removing most temporal artifacts. This is why DLSS 2 can sometimes produce a sharper image than rendering at higher, or even native resolutions using traditional TAA. However, no temporal solution is perfect, and artifacts (ghosting in particular) are still visible in some scenarios when using DLSS 2. Because temporal artifacts occur in most art styles and environments in broadly the same way, the neural network that powers DLSS 2 does not need to be retrained when being used in different games. Despite this, Nvidia does frequently ship new minor revisions of DLSS 2 with new titles, so this could suggest some minor training optimizations may be performed as games are released, although Nvidia does not provide changelogs for these minor revisions to confirm this. The main advancements compared to DLSS 1 include: Significantly improved detail retention, a generalized neural network that does not need to be re-trained per-game, and ~2x less overhead (~1–2 ms vs ~2–4 ms). It should also be noted that forms of TAAU such as DLSS 2 are not upscalers in the same sense as techniques such as ESRGAN or DLSS 1, which attempt to create new information from a low-resolution source; instead, TAAU works to recover data from previous frames, rather than creating new data. In practice, this means low resolution textures in games will still appear low-resolution when using current TAAU techniques. This is why Nvidia recommends game developers use higher resolution textures than they would normally for a given rendering resolution by applying a mip-map bias when DLSS 2 is enabled. === DLSS 3 === Augments DLSS 2 with improved image quality and the introduction of a new motion interpolation feature, called Frame Generation. The DLSS Frame Generation algorithm takes two rendered frames from the rendering pipeline and generates a new frame that smoothly transitions between them. For every frame rendered, one additional frame is generated. DLSS 3.0 makes use of a new generation Optical Flow Accelerator (OFA) included in the Ada Lovelace architecture of GeForce RTX 40 series GPUs and with that is exclusive to them. The new OFA is said to be faster and more accurate than the one already available in previous Turing and Ampere RTX GPUs. === DLSS 3.5 === DLSS 3.5 adds Ray Reconstruction, replacing multiple denoising algorithms with a single AI model trained o

    Read more →
  • Generalized blockmodeling

    Generalized blockmodeling

    In generalized blockmodeling, the blockmodeling is done by "the translation of an equivalence type into a set of permitted block types", which differs from the conventional blockmodeling, which is using the indirect approach. It's a special instance of the direct blockmodeling approach. Generalized blockmodeling was introduced in 1994 by Patrick Doreian, Vladimir Batagelj and Anuška Ferligoj. == Definition == Generalized blockmodeling approach is a direct one, "where the optimal partition(s) is (are) identified based on minimal values of a compatible criterion function defined by the difference between empirical blocks and corresponding ideal blocks". At the same time, the much broader set of block types is introduced (while in conventional blockmodeling only certain types are used). The conventional blockmodeling is inductive due to nonspecification of neither the clusters or the location of block types, while in generalized blockmodeling the blockmodel is specified with more detail than just the permition of certain block types (e.g., prespecification). Further, it's possible to define departures from the permitted (ideal) blocktype, using criterion function. Using local optimization procedure, firstly the initial clustering (with specified number of clusters is done, based on random creation. How the clusters are neighboring to each other, is based on two transformations: 1) a vertex is moved from one to another cluster or 2) a pair of vertices is interchanged between two different clusters. This process of transformation steps is repeated many times, until only the best fitting partitions (with the minimized value of the criterion function) are kept as blockmodels for the future exploration of the network. Different types of generalized blockmodeling are: generalized binary blockmodeling, generalized valued blockmodeling and generalized homogeneity blockmodeling. == Benefits == According to Patrick Doreian, the benefits of generalized blockmodeling, are as follows: usage of explicit criterion function, compatible with a given type of equivalence, results to in-built measure of fit, which is integral to the establishment of the blockmodels (in conventional blockmodeling, there is no compelling and coherent measures of fit); partitions, based on generalized blockmodeling, regularly outperform and never perform less well than the partitions, based on conventional approach; with generalized blockmodeling it's possible to specify new types of blockmodels; this potentially unlimited set of new block types also results in permittion of inclusion of substantively driven blockmodels; in generalized blockmodeling, the specification of the block types and the location of some of them in the blockmodel is possible; researcher can speficy which (pair of) vertices must be (not) clustered together; this approach also allows the imposition of penalties, resulting into identification of empirical null blocks without inconsistencies with a corresponding ideal null block. == Problems == According to Doreian, the problems of generalized blockmodeling, are as follows: unknown sensitivity to particular data features, examination of boundary problems, computationally burdensome, which results in a constraint regarding practical network size (generalized blockmodeling is thus primarily used to analyse smaller networks (below 100 units)), identifying structure from incomplete network information, most of generalized blockmodeling is based on binary networks, but there is also development in the field of valued networks, criterion function is minimized for a specified blockmodel, with results in issues of evaluating statistically, based on the structural data alone, problems regarding three dimensional network data, problems regarding the evolution of fundamental network structure. == Book == The book with the same title, Generalized blockmodeling, written by Patrick Doreian, Vladimir Batagelj and Anuška Ferligoj, was in 2007 awarded the Harrison White Outstanding Book Award by the Mathematical Sociology Section of American Sociological Association.

    Read more →
  • Count sketch

    Count sketch

    Count sketch is a type of dimensionality reduction that is particularly efficient in statistics, machine learning and algorithms. It was invented by Moses Charikar, Kevin Chen and Martin Farach-Colton in an effort to speed up the AMS Sketch by Alon, Matias and Szegedy for approximating the frequency moments of streams (these calculations require counting of the number of occurrences for the distinct elements of the stream). The sketch is nearly identical to the Feature hashing algorithm by John Moody, but differs in its use of hash functions with low dependence, which makes it more practical. In order to still have a high probability of success, the median trick is used to aggregate multiple count sketches, rather than the mean. These properties allow use for explicit kernel methods, bilinear pooling in neural networks and is a cornerstone in many numerical linear algebra algorithms. == Intuitive explanation == The inventors of this data structure offer the following iterative explanation of its operation: at the simplest level, the output of a single hash function s mapping stream elements q into {+1, -1} is feeding a single up/down counter C. After a single pass over the data, the frequency n ( q ) {\displaystyle n(q)} of a stream element q can be approximated, although extremely poorly, by the expected value E [ C ⋅ s ( q ) ] {\displaystyle {\mathbf {E}}[C\cdot s(q)]} ; a straightforward way to improve the variance of the previous estimate is to use an array of different hash functions s i {\displaystyle s_{i}} , each connected to its own counter C i {\displaystyle C_{i}} . For each i, the E [ C i ⋅ s i ( q ) ] = n ( q ) {\displaystyle {\mathbf {E}}[C_{i}\cdot s_{i}(q)]=n(q)} still holds, so averaging across the i range will tighten the approximation; the previous construct still has a major deficiency: if a lower-frequency-but-still-important output element a exhibits a hash collision with a high-frequency element even for one of the s i {\displaystyle s_{i}} hashes, n ( a ) {\displaystyle n(a)} estimate can be significantly affected. Avoiding this requires reducing the frequency of collision counter updates between any two distinct elements. This is achieved by replacing each C i {\displaystyle C_{i}} in the previous construct with an array of m counters (making the counter set into a two-dimensional matrix C i , j {\displaystyle C_{i,j}} ), with index j of a particular counter to be incremented/decremented selected via another set of hash functions h i {\displaystyle h_{i}} that map element q into the range {1..m}. Since E [ C i , h i ( q ) ⋅ s i ( q ) ] = n ( q ) {\displaystyle {\mathbf {E}}[C_{i,h_{i}(q)}\cdot s_{i}(q)]=n(q)} , averaging across all values of i will work. == Mathematical definition == 1. For constants w {\displaystyle w} and t {\displaystyle t} (to be defined later) independently choose d = 2 t + 1 {\displaystyle d=2t+1} random hash functions h 1 , … , h d {\displaystyle h_{1},\dots ,h_{d}} and s 1 , … , s d {\displaystyle s_{1},\dots ,s_{d}} such that h i : [ n ] → [ w ] {\displaystyle h_{i}:[n]\to [w]} and s i : [ n ] → { ± 1 } {\displaystyle s_{i}:[n]\to \{\pm 1\}} . It is necessary that the hash families from which h i {\displaystyle h_{i}} and s i {\displaystyle s_{i}} are chosen be pairwise independent. 2. For each item q i {\displaystyle q_{i}} in the stream, add s j ( q i ) {\displaystyle s_{j}(q_{i})} to the h j ( q i ) {\displaystyle h_{j}(q_{i})} th bucket of the j {\displaystyle j} th hash. At the end of this process, one has w d {\displaystyle wd} sums ( C i j ) {\displaystyle (C_{ij})} where C i , j = ∑ h i ( k ) = j s i ( k ) . {\displaystyle C_{i,j}=\sum _{h_{i}(k)=j}s_{i}(k).} To estimate the count of q {\displaystyle q} s one computes the following value: r q = median i = 1 d s i ( q ) ⋅ C i , h i ( q ) . {\displaystyle r_{q}={\text{median}}_{i=1}^{d}\,s_{i}(q)\cdot C_{i,h_{i}(q)}.} The values s i ( q ) ⋅ C i , h i ( q ) {\displaystyle s_{i}(q)\cdot C_{i,h_{i}(q)}} are unbiased estimates of how many times q {\displaystyle q} has appeared in the stream. The estimate r q {\displaystyle r_{q}} has variance O ( m i n { m 1 2 / w 2 , m 2 2 / w } ) {\displaystyle O(\mathrm {min} \{m_{1}^{2}/w^{2},m_{2}^{2}/w\})} , where m 1 {\displaystyle m_{1}} is the length of the stream and m 2 2 {\displaystyle m_{2}^{2}} is ∑ q ( ∑ i [ q i = q ] ) 2 {\displaystyle \sum _{q}(\sum _{i}[q_{i}=q])^{2}} . Furthermore, r q {\displaystyle r_{q}} is guaranteed to never be more than 2 m 2 / w {\displaystyle 2m_{2}/{\sqrt {w}}} off from the true value, with probability 1 − e − O ( t ) {\displaystyle 1-e^{-O(t)}} . === Vector formulation === Alternatively Count-Sketch can be seen as a linear mapping with a non-linear reconstruction function. Let M ( i ∈ [ d ] ) ∈ { − 1 , 0 , 1 } w × n {\displaystyle M^{(i\in [d])}\in \{-1,0,1\}^{w\times n}} , be a collection of d = 2 t + 1 {\displaystyle d=2t+1} matrices, defined by M h i ( j ) , j ( i ) = s i ( j ) {\displaystyle M_{h_{i}(j),j}^{(i)}=s_{i}(j)} for j ∈ [ w ] {\displaystyle j\in [w]} and 0 everywhere else. Then a vector v ∈ R n {\displaystyle v\in \mathbb {R} ^{n}} is sketched by C ( i ) = M ( i ) v ∈ R w {\displaystyle C^{(i)}=M^{(i)}v\in \mathbb {R} ^{w}} . To reconstruct v {\displaystyle v} we take v j ∗ = median i C j ( i ) s i ( j ) {\displaystyle v_{j}^{}={\text{median}}_{i}C_{j}^{(i)}s_{i}(j)} . This gives the same guarantees as stated above, if we take m 1 = ‖ v ‖ 1 {\displaystyle m_{1}=\|v\|_{1}} and m 2 = ‖ v ‖ 2 {\displaystyle m_{2}=\|v\|_{2}} . == Relation to Tensor sketch == The count sketch projection of the outer product of two vectors is equivalent to the convolution of two component count sketches. The count sketch computes a vector convolution C ( 1 ) x ∗ C ( 2 ) x T {\displaystyle C^{(1)}x\ast C^{(2)}x^{T}} , where C ( 1 ) {\displaystyle C^{(1)}} and C ( 2 ) {\displaystyle C^{(2)}} are independent count sketch matrices. Pham and Pagh show that this equals C ( x ⊗ x T ) {\displaystyle C(x\otimes x^{T})} – a count sketch C {\displaystyle C} of the outer product of vectors, where ⊗ {\displaystyle \otimes } denotes Kronecker product. The fast Fourier transform can be used to do fast convolution of count sketches. By using the face-splitting product such structures can be computed much faster than normal matrices.

    Read more →