Information gain (decision tree)

In the context of decision trees in information theory and machine learning, information gain refers to the conditional expected value of the Kullback–Leibler divergence of the univariate probability distribution of one variable from the conditional distribution of this variable given the other one. (In broader contexts, information gain can also be used as a synonym for either Kullback–Leibler divergence or mutual information, but the focus of this article is on the more narrow meaning below.) Explicitly, the information gain of a random variable X {\displaystyle X} obtained from an observation of a random variable A {\displaystyle A} taking value a {\displaystyle a} is defined as: I G ( X , a ) = D KL ( P X ∣ a ∥ P X ) {\displaystyle {\mathit {IG}}(X,a)=D_{\text{KL}}{\bigl (}P_{X\mid a}\parallel P_{X}{\bigr )}} In other words, it is the Kullback–Leibler divergence of P X ( x ) {\displaystyle P_{X}(x)} (the prior distribution for X {\displaystyle X} ) from P X ∣ a ( x ) {\displaystyle P_{X\mid a}(x)} (the posterior distribution for X {\displaystyle X} given A = a {\displaystyle A=a} ). The expected value of the information gain is the mutual information I ( X ; A ) {\displaystyle I(X;A)} : E A ⁡ [ I G ( X , A ) ] = I ( X ; A ) {\displaystyle \operatorname {E} _{A}[{\mathit {IG}}(X,A)]=I(X;A)} i.e. the reduction in the entropy of X {\displaystyle X} achieved by learning the state of the random variable A {\displaystyle A} . In machine learning, this concept can be used to define a preferred sequence of attributes to investigate to most rapidly narrow down the state of X. Such a sequence (which depends on the outcome of the investigation of previous attributes at each stage) is called a decision tree, and when applied in the area of machine learning is known as decision tree learning. Usually an attribute with high mutual information should be preferred to other attributes. == General definition == In general terms, the expected information gain is the reduction in information entropy Η from a prior state to a state that takes some information as given: I G ( T , a ) = H ( T ) − H ( T | a ) , {\displaystyle IG(T,a)=\mathrm {H} {(T)}-\mathrm {H} {(T|a)},} where H ( T | a ) {\displaystyle \mathrm {H} {(T|a)}} is the conditional entropy of T {\displaystyle T} given the value of attribute a {\displaystyle a} . This is intuitively plausible when interpreting entropy Η as a measure of uncertainty of a random variable T {\displaystyle T} : by learning (or assuming) a {\displaystyle a} about T {\displaystyle T} , our uncertainty about T {\displaystyle T} is reduced (i.e. I G ( T , a ) {\displaystyle IG(T,a)} is positive), unless of course T {\displaystyle T} is independent of a {\displaystyle a} , in which case H ( T | a ) = H ( T ) {\displaystyle \mathrm {H} (T|a)=\mathrm {H} (T)} , meaning I G ( T , a ) = 0 {\displaystyle IG(T,a)=0} . == Formal definition == Let T denote a set of training examples, each of the form ( x , y ) = ( x 1 , x 2 , x 3 , . . . , x k , y ) {\displaystyle ({\textbf {x}},y)=(x_{1},x_{2},x_{3},...,x_{k},y)} where x a ∈ v a l s ( a ) {\displaystyle x_{a}\in \mathrm {vals} (a)} is the value of the a th {\displaystyle a^{\text{th}}} attribute or feature of example x {\displaystyle {\textbf {x}}} and y is the corresponding class label. The information gain for an attribute a is defined in terms of Shannon entropy H ( − ) {\displaystyle \mathrm {H} (-)} as follows. For a value v taken by attribute a, let S a ( v ) = { x ∈ T | x a = v } {\displaystyle S_{a}{(v)}=\{{\textbf {x}}\in T|x_{a}=v\}} be defined as the set of training inputs of T for which attribute a is equal to v. Then the information gain of T for attribute a is the difference between the a priori Shannon entropy H ( T ) {\displaystyle \mathrm {H} (T)} of the training set and the conditional entropy H ( T | a ) {\displaystyle \mathrm {H} {(T|a)}} . H ( T | a ) = ∑ v ∈ v a l s ( a ) | S a ( v ) | | T | ⋅ H ( S a ( v ) ) . {\displaystyle \mathrm {H} (T|a)=\sum _{v\in \mathrm {vals} (a)}{{\frac {|S_{a}{(v)}|}{|T|}}\cdot \mathrm {H} \left(S_{a}{\left(v\right)}\right)}.} I G ( T , a ) = H ( T ) − H ( T | a ) {\displaystyle IG(T,a)=\mathrm {H} (T)-\mathrm {H} (T|a)} The mutual information is equal to the total entropy for an attribute if for each of the attribute values a unique classification can be made for the result attribute. In this case, the relative entropies subtracted from the total entropy are 0. In particular, the values v ∈ v a l s ( a ) {\displaystyle v\in vals(a)} defines a partition of the training set data T into mutually exclusive and all-inclusive subsets, inducing a categorical probability distribution P a ( v ) {\textstyle P_{a}{(v)}} on the values v ∈ v a l s ( a ) {\textstyle v\in vals(a)} of attribute a. The distribution is given P a ( v ) := | S a ( v ) | | T | {\textstyle P_{a}{(v)}:={\frac {|S_{a}{(v)}|}{|T|}}} . In this representation, the information gain of T given a can be defined as the difference between the unconditional Shannon entropy of T and the expected entropy of T conditioned on a, where the expectation value is taken with respect to the induced distribution on the values of a. I G ( T , a ) = H ( T ) − ∑ v ∈ v a l s ( a ) P a ( v ) H ( S a ( v ) ) = H ( T ) − E P a [ H ( S a ( v ) ) ] = H ( T ) − H ( T | a ) . {\displaystyle {\begin{alignedat}{2}IG(T,a)&=\mathrm {H} (T)-\sum _{v\in \mathrm {vals} (a)}{P_{a}{(v)}\mathrm {H} \left(S_{a}{(v)}\right)}\\&=\mathrm {H} (T)-\mathbb {E} _{P_{a}}{\left[\mathrm {H} {(S_{a}{(v)})}\right]}\\&=\mathrm {H} (T)-\mathrm {H} {(T|a)}.\end{alignedat}}} == Example == In engineering applications, information is analogous to signal, and entropy is analogous to noise. It determines how a decision tree chooses to split data. The leftmost figure below is very impure and has high entropy corresponding to higher disorder and lower information value. As we go to the right, the entropy decreases, and the information value increases. Now, it is clear that information gain is the measure of how much information a feature provides about a class. Let's visualize information gain in a decision tree as shown in the right: The node t is the parent node, and the sub-nodes tL and tR are child nodes. In this case, the parent node t has a collection of cancer and non-cancer samples denoted as C and NC respectively. We can use information gain to determine how good the splitting of nodes is in a decision tree. In terms of entropy, information gain is defined as: To understand this idea, let's start by an example in which we create a simple dataset and want to see if gene mutations could be related to patients with cancer. Given four different gene mutations, as well as seven samples, the training set for a decision can be created as follows: In this dataset, a 1 means the sample has the mutation (True), while a 0 means the sample does not (False). A sample with C denotes that it has been confirmed to be cancerous, while NC means it is non-cancerous. Using this data, a decision tree can be created with information gain used to determine the candidate splits for each node. For the next step, the entropy at parent node t of the above simple decision tree is computed as:H(t) = −[pC,t log2(pC,t) + pNC,t log2(pNC,t)] where, probability of selecting a class ‘C’ sample at node t, pC,t = n(t, C) / n(t), probability of selecting a class ‘NC’ sample at node t, pNC,t = n(t, NC) / n(t), n(t), n(t, C), and n(t, NC) are the number of total samples, ‘C’ samples and ‘NC’ samples at node t respectively.Using this with the example training set, the process for finding information gain beginning with H ( t ) {\displaystyle \mathrm {H} {(t)}} for Mutation 1 is as follows: pC, t = 4/7 pNC, t = 3/7 H ( t ) {\displaystyle \mathrm {H} {(t)}} = −(4/7 × log2(4/7) + 3/7 × log2(3/7)) = 0.985 Note: H ( t ) {\displaystyle \mathrm {H} {(t)}} will be the same for all mutations at the root. The relatively high value of entropy H ( t ) = 0.985 {\displaystyle \mathrm {H} {(t)}=0.985} (1 is the optimal value) suggests that the root node is highly impure and the constituents of the input at the root node would look like the leftmost figure in the above Entropy Diagram. However, such a set of data is good for learning the attributes of the mutations used to split the node. At a certain node, when the homogeneity of the constituents of the input occurs (as shown in the rightmost figure in the above Entropy Diagram), the dataset would no longer be good for learning. Moving on, the entropy at left and right child nodes of the above decision tree is computed using the formulae:H(tL) = −[pC,L log2(pC,L) + pNC,L log2(pNC,L)]H(tR) = −[pC,R log2(pC,R) + pNC,R log2(pNC,R)]where, probability of selecting a class ‘C’ sample at the left child node, pC,L = n(tL, C) / n(tL), probability of selecting a class ‘NC’ sample at the left child node, pNC,L = n(tL, NC) / n(tL), probability of selecting a class ‘C’ sample at the right child node, pC,R = n(tR, C) / n(tR), prob

Plum Voice

The Plum Group, Inc. (DBA Plum Voice) is a company. Plum is headquartered in New York City with offices in Boston and Denver. == History == Plum Voice, founded in 2000 as The Plum Group, Inc., was incorporated to create technologies for personalized audio communication. By 2001, Plum had commercialized the open-standard Plum VoiceXML IVR platform which facilitated the creation of dynamic telecom applications. 2001 - Commercial launch of Plum VoiceXML IVR platform for customer-premises deployment 2002 - Launch of Plum Voice Hosting Centers for 24x7x365 managed IVR hosting 2004 - Plum Voice application suite receives a "Product of the Year" award from Customer Interactions magazine 2008 - Plum Survey builder launched, a do-it-yourself IVR survey tool. 2010 - Plum launched QuickFuse, a web-based rapid development platform used to create voice applications. 2013 - Plum launched VoiceTrends, an analytics and reporting toolkit designed specifically for voice applications. Plum achieves PCI-DSS Level 1. 2015 - Plum launched Plum Insight, a multi-channel (voice, web, mobile) survey platform. Plum achieves HIPAA compliance. 2016 - Plum launched a new version of QuickFuse called Fuse+. 2020 - Plum sunsets QuickFuse, rebrands Fuse+ as Plum Fuse.

BL (logic)

In mathematical logic, basic fuzzy logic (or shortly BL), the logic of the continuous t-norms, is one of the t-norm fuzzy logics. It belongs to the broader class of substructural logics, or logics of residuated lattices; it extends the logic MTL of all left-continuous t-norms. == Syntax == === Language === The language of the propositional logic BL consists of countably many propositional variables and the following primitive logical connectives: Implication → {\displaystyle \rightarrow } (binary) Strong conjunction ⊗ {\displaystyle \otimes } (binary). The sign & is a more traditional notation for strong conjunction in the literature on fuzzy logic, while the notation ⊗ {\displaystyle \otimes } follows the tradition of substructural logics. Bottom ⊥ {\displaystyle \bot } (nullary — a propositional constant); 0 {\displaystyle 0} or 0 ¯ {\displaystyle {\overline {0}}} are common alternative signs and zero a common alternative name for the propositional constant (as the constants bottom and zero of substructural logics coincide in MTL). The following are the most common defined logical connectives: Weak conjunction ∧ {\displaystyle \wedge } (binary), also called lattice conjunction (as it is always realized by the lattice operation of meet in algebraic semantics). Unlike MTL and weaker substructural logics, weak conjunction is definable in BL as A ∧ B ≡ A ⊗ ( A → B ) {\displaystyle A\wedge B\equiv A\otimes (A\rightarrow B)} Negation ¬ {\displaystyle \neg } (unary), defined as ¬ A ≡ A → ⊥ {\displaystyle \neg A\equiv A\rightarrow \bot } Equivalence ↔ {\displaystyle \leftrightarrow } (binary), defined as A ↔ B ≡ ( A → B ) ∧ ( B → A ) {\displaystyle A\leftrightarrow B\equiv (A\rightarrow B)\wedge (B\rightarrow A)} As in MTL, the definition is equivalent to ( A → B ) ⊗ ( B → A ) . {\displaystyle (A\rightarrow B)\otimes (B\rightarrow A).} (Weak) disjunction ∨ {\displaystyle \vee } (binary), also called lattice disjunction (as it is always realized by the lattice operation of join in algebraic semantics), defined as A ∨ B ≡ ( ( A → B ) → B ) ∧ ( ( B → A ) → A ) {\displaystyle A\vee B\equiv ((A\rightarrow B)\rightarrow B)\wedge ((B\rightarrow A)\rightarrow A)} Top ⊤ {\displaystyle \top } (nullary), also called one and denoted by 1 {\displaystyle 1} or 1 ¯ {\displaystyle {\overline {1}}} (as the constants top and zero of substructural logics coincide in MTL), defined as ⊤ ≡ ⊥ → ⊥ {\displaystyle \top \equiv \bot \rightarrow \bot } Well-formed formulae of BL are defined as usual in propositional logics. In order to save parentheses, it is common to use the following order of precedence: Unary connectives (bind most closely) Binary connectives other than implication and equivalence Implication and equivalence (bind most loosely) === Axioms === A Hilbert-style deduction system for BL has been introduced by Petr Hájek (1998). Its single derivation rule is modus ponens: from A {\displaystyle A} and A → B {\displaystyle A\rightarrow B} derive B . {\displaystyle B.} The following are its axiom schemata: ( B L 1 ) : ( A → B ) → ( ( B → C ) → ( A → C ) ) ( B L 2 ) : A ⊗ B → A ( B L 3 ) : A ⊗ B → B ⊗ A ( B L 4 ) : A ⊗ ( A → B ) → B ⊗ ( B → A ) ( B L 5 a ) : ( A → ( B → C ) ) → ( A ⊗ B → C ) ( B L 5 b ) : ( A ⊗ B → C ) → ( A → ( B → C ) ) ( B L 6 ) : ( ( A → B ) → C ) → ( ( ( B → A ) → C ) → C ) ( B L 7 ) : ⊥ → A {\displaystyle {\begin{array}{ll}{\rm {(BL1)}}\colon &(A\rightarrow B)\rightarrow ((B\rightarrow C)\rightarrow (A\rightarrow C))\\{\rm {(BL2)}}\colon &A\otimes B\rightarrow A\\{\rm {(BL3)}}\colon &A\otimes B\rightarrow B\otimes A\\{\rm {(BL4)}}\colon &A\otimes (A\rightarrow B)\rightarrow B\otimes (B\rightarrow A)\\{\rm {(BL5a)}}\colon &(A\rightarrow (B\rightarrow C))\rightarrow (A\otimes B\rightarrow C)\\{\rm {(BL5b)}}\colon &(A\otimes B\rightarrow C)\rightarrow (A\rightarrow (B\rightarrow C))\\{\rm {(BL6)}}\colon &((A\rightarrow B)\rightarrow C)\rightarrow (((B\rightarrow A)\rightarrow C)\rightarrow C)\\{\rm {(BL7)}}\colon &\bot \rightarrow A\end{array}}} The axioms (BL2) and (BL3) of the original axiomatic system were shown to be redundant (Chvalovský, 2012) and (Cintula, 2005). All the other axioms were shown to be independent (Chvalovský, 2012). == Semantics == Like in other propositional t-norm fuzzy logics, algebraic semantics is predominantly used for BL, with three main classes of algebras with respect to which the logic is complete: General semantics, formed of all BL-algebras — that is, all algebras for which the logic is sound Linear semantics, formed of all linear BL-algebras — that is, all BL-algebras whose lattice order is linear Standard semantics, formed of all standard BL-algebras — that is, all BL-algebras whose lattice reduct is the real unit interval [0, 1] with the usual order; they are uniquely determined by the function that interprets strong conjunction, which can be any continuous t-norm.

Generative literature

Generative literature is poetry or fiction that is automatically generated, often using computers. It is a genre of electronic literature, and also related to generative art. John Clark's Latin Verse Machine (1830–1843) is probably the first example of mechanised generative literature, while Christopher Strachey's love letter generator (1952) is the first digital example. With the large language models (LLMs) of the 2020s, generative literature is becoming increasingly common. == Definitions == Hannes Bajohr defines generative literature as literature involving "the automatic production of text according to predetermined parameters, usually following a combinatory, sometimes aleatory logic, and it emphasizes the production rather than the reception of the work (unlike, say, hypertext)." In his book Electronic Literature, Scott Rettberg connects generative literature to avant-garde literary movements like Dada, Surrealism, Oulipo and Fluxus. Bajohr argues that conceptual art is also an important reference. == Paradigms of generative literature == Bajohr describes two main paradigms of generative literature: the sequential paradigm, where the text generation is "executed as a sequence of rule-steps" and employs linear algorithms, and the connectionist paradigm, which is based on neural nets. The latter leads to what Bajohr calls a algorithmic empathy: "a non-anthropocentric empathy aimed not at the psychological states of the artists but at understanding the process of the work’s material production." == Poetry generation == The first examples of automated generative literature are poetry: John Clark's mechanical Latin Verse Machine (1830–1843) produced lines of hexameter verse in Latin, and Christopher Strachey's love letter generator (1952), programmed on the Manchester Mark 1 computer, generated short, satirical love letters. Examples of generative poetry using artificial neural networks include David Jhave Johnston's ReRites. == Narrative generation == Story generators have often followed specific narratological theories of how stories are constructed. An early example is Grimes' Fairy Tales, the "first to take a grammar-based approach and the first to operationalize Propp's famous model." Mike Sharples and Rafael Peréz y Peréz's book Story Machines gives a detailed history of story generation. Storyland by Nanette Wylde is an example of generative narrative. Jonathan Baillehache compares Storyland to Surrealist writing. Baillehache states, "When compared to earlier uses of chance operation in literature, a piece like this one resembles some of the automatic writings produced by André Breton and Philippe Soupault in their collective work The Magnetic Fields. . . The difference between Nanette Wylde’s Storyland and Breton and Soupault’s Magnetic Fields is that the former is produced according to a computational algorithm involving randomizers and user interaction, and the latter by two free-wheeling human subjects."

Computer-automated design

Design Automation usually refers to electronic design automation, or Design Automation which is a Product Configurator. Extending Computer-Aided Design (CAD), automated design and Computer-Automated Design (CAutoD) are more concerned with a broader range of applications, such as automotive engineering, civil engineering, composite material design, control engineering, dynamic system identification and optimization, financial systems, industrial equipment, mechatronic systems, steel construction, structural optimisation, and the invention of novel systems. The concept of CAutoD perhaps first appeared in 1963, in the IBM Journal of Research and Development, where a computer program was written. to search for logic circuits having certain constraints on hardware design to evaluate these logics in terms of their discriminating ability over samples of the character set they are expected to recognize. More recently, traditional CAD simulation is seen to be transformed to CAutoD by biologically-inspired machine learning, including heuristic search techniques such as evolutionary computation, and swarm intelligence algorithms. == Guiding designs by performance improvements == To meet the ever-growing demand of quality and competitiveness, iterative physical prototyping is now often replaced by 'digital prototyping' of a 'good design', which aims to meet multiple objectives such as maximised output, energy efficiency, highest speed and cost-effectiveness. The design problem concerns both finding the best design within a known range (i.e., through 'learning' or 'optimisation') and finding a new and better design beyond the existing ones (i.e., through creation and invention). This is equivalent to a search problem in an almost certainly, multidimensional (multivariate), multi-modal space with a single (or weighted) objective or multiple objectives. == Normalized objective function: cost vs. fitness == Using single-objective CAutoD as an example, if the objective function, either as a cost function J ∈ [ 0 , ∞ ) {\displaystyle J\in [0,\infty )} , or inversely, as a fitness function f ∈ ( 0 , 1 ] {\displaystyle f\in (0,1]} , where f = J 1 + J {\displaystyle f={\tfrac {J}{1+J}}} , is differentiable under practical constraints in the multidimensional space, the design problem may be solved analytically. Finding the parameter sets that result in a zero first-order derivative and that satisfy the second-order derivative conditions would reveal all local optima. Then comparing the values of the performance index of all the local optima, together with those of all boundary parameter sets, would lead to the global optimum, whose corresponding 'parameter' set will thus represent the best design. However, in practice, the optimization usually involves multiple objectives and the matters involving derivatives are a lot more complex. == Dealing with practical objectives == In practice, the objective value may be noisy or even non-numerical, and hence its gradient information may be unreliable or unavailable. This is particularly true when the problem is multi-objective. At present, many designs and refinements are mainly made through a manual trial-and-error process with the help of a CAD simulation package. Usually, such a posteriori learning or adjustments need to be repeated many times until a ‘satisfactory’ or ‘optimal’ design emerges. == Exhaustive search == In theory, this adjustment process can be automated by computerised search, such as exhaustive search. As this is an exponential algorithm, it may not deliver solutions in practice within a limited period of time. == Search in polynomial time == One approach to virtual engineering and automated design is evolutionary computation such as evolutionary algorithms. === Evolutionary algorithms === To reduce the search time, the biologically-inspired evolutionary algorithm (EA) can be used instead, which is a (non-deterministic) polynomial algorithm. The EA based multi-objective "search team" can be interfaced with an existing CAD simulation package in a batch mode. The EA encodes the design parameters (encoding being necessary if some parameters are non-numerical) to refine multiple candidates through parallel and interactive search. In the search process, 'selection' is performed using 'survival of the fittest' a posteriori learning. To obtain the next 'generation' of possible solutions, some parameter values are exchanged between two candidates (by an operation called 'crossover') and new values introduced (by an operation called 'mutation'). This way, the evolutionary technique makes use of past trial information in a similarly intelligent manner to the human designer. The EA based optimal designs can start from the designer's existing design database, or from an initial generation of candidate designs obtained randomly. A number of finely evolved top-performing candidates will represent several automatically optimized digital prototypes. There are websites that demonstrate interactive evolutionary algorithms for design. allows you to evolve 3D objects online and have them 3D printed. allows you to do the same for 2D images.

TipTop Technologies

TipTop Technologies is a real-time web and social search engine with a platform for semantic analysis of natural language. Tip-Top Search provides results capturing individual and group sentiment, opinions, and experiences there from the content of various sorts such as real-time messages from Twitter or consumer product reviews on Amazon.com. TipTop Technologies and ITC Infotech collaborated to create a search interface suitable for both enterprise and consumer applications. Tip-Top's products are part of the "emerging Web 3.0 applications which use semantic technologies to augment the underlying Web system's functionalities." Their main product is 360, an AI tool that incorporates multiple AI applications under one wing. Jonathan AlBright professor at Elon University, found videos generated by TipTop Technologies software on YouTube in his research into artificial intelligence, described it as AI-generated "fake news". Through semantic analysis of large data sets, TipTop gleaned behavioral insights from Tweets around events like Halloween, Thanksgiving, Holiday Gifting, the Super Bowl, and the Oscar Nominees for the Academy Awards coverage. Sentiment analysis, concept trend tracking, and real-time market research are other applications included in the TipTop Search product. TipTop's insight engine solves the problem of real-time data noise, and its ability to "sort the 'good tweets' from the 'bad tweets' when it comes to a product, service, or a region..." In addition, products like TipTop Shopping with customizable search widgets bring together consumer reviews, social search, and sentiment analysis enabling product comparisons across attributes like the overall value and aiding purchasing decisions through user-driven product tips and pits. TipTop Finance adds another complexity to real-time search results by incorporating corporate sentiment, company stock tickers, and social media into TipTop's existing social search platform. Additional success applying semantic technologies has been with polling, "if you compare these Gallup results with TipTop, a sentiment engine based on Twitter, the results are not way off. It does surprise you but it tells me that sentiment analysis in case of public opinion about a burning social issue or a famous personality is relatively easier." With the increasing amount of unstructured, opinion-oriented, and user-generated content available on the Web, TipTop's technology aims to make sense of all this data, and deliver it in a useful way for consumer and enterprise users alike. TipTop Technologies is a privately held company with its headquarters in the San Francisco Bay Area, and team members are located globally.

Sinewave synthesis

Sinewave synthesis, or sine wave speech, is a technique for synthesizing speech by replacing the formants (main bands of energy) with pure tone whistles. The first sinewave synthesis program (SWS) for the automatic creation of stimuli for perceptual experiments was developed by Philip Rubin at Haskins Laboratories in the 1970s. This program was subsequently used by Robert Remez, Philip Rubin, David Pisoni, and other colleagues to show that listeners can perceive continuous speech without traditional speech cues, i.e., pitch, stress, and intonation. This work paved the way for a view of speech as a dynamic pattern of trajectories through articulatory-acoustic space.