AI Chat Youtube

AI Chat Youtube — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Normalization (image processing)

    Normalization (image processing)

    In image processing, normalization is a process that changes the range of pixel intensity values, a kind of intensity mapping. Applications include photographs with poor contrast due to glare, for example. A typical case is contrast stretching. In more general fields of data processing, such as digital signal processing, it is referred to as dynamic range expansion. The purpose of dynamic range expansion in the various applications is usually to bring the image, or other type of signal, into a range that is more familiar or normal to the senses, hence the term normalization. Often, the motivation is to achieve consistency in dynamic range for a set of data, signals, or images to avoid mental distraction or fatigue. For example, a newspaper will strive to make all of the images in an issue share a similar range of grayscale. Auto-normalization in image processing software typically normalizes to the full dynamic range of the number system specified in the image file format. == Definition == Normalization transforms an n-dimensional grayscale image I : { X ⊆ R n } → { Min , . . , Max } {\displaystyle I:\{\mathbb {X} \subseteq \mathbb {R} ^{n}\}\rightarrow \{{\text{Min}},..,{\text{Max}}\}} with intensity values in the range ( Min , Max ) {\displaystyle ({\text{Min}},{\text{Max}})} , into a new image I N : { X ⊆ R n } → { newMin , . . , newMax } {\displaystyle I_{N}:\{\mathbb {X} \subseteq \mathbb {R} ^{n}\}\rightarrow \{{\text{newMin}},..,{\text{newMax}}\}} with intensity values in the range ( newMin , newMax ) {\displaystyle ({\text{newMin}},{\text{newMax}})} . The linear normalization of a grayscale digital image is performed according to the formula I N = ( I − Min ) newMax − newMin Max − Min + newMin {\displaystyle I_{N}=(I-{\text{Min}}){\frac {{\text{newMax}}-{\text{newMin}}}{{\text{Max}}-{\text{Min}}}}+{\text{newMin}}} For example, if the intensity range of the image is 50 to 180 and the desired range is 0 to 255 the process entails subtracting 50 from each of pixel intensity, making the range 0 to 130. Then each pixel intensity is multiplied by 255/130, making the range 0 to 255. Normalization might also be non-linear, as the relationship between I {\displaystyle I} and I N {\displaystyle I_{N}} may not be linear. An example of non-linear normalization is when the normalization follows a sigmoid function, in which case the normalized image is computed according to the formula I N = ( newMax − newMin ) 1 1 + e − I − β α + newMin {\displaystyle I_{N}=({\text{newMax}}-{\text{newMin}}){\frac {1}{1+e^{-{\frac {I-\beta }{\alpha }}}}}+{\text{newMin}}} Where α {\displaystyle \alpha } defines the width of the input intensity range, and β {\displaystyle \beta } defines the intensity around which the range is centered. Gamma correction (log/inverse log) is also a common transformation function. === Colorspace === Intensity operations generally operate on a colorspace that maps to the human perception of lightness without intentionally changing the other properties. This can be done, for example, by operating on the L component of the CIELAB color space, or approximately by operating on the Y component of YCbCr. It is also possible to operate on each of the RGB color channels, though the result will not always make sense. == Contrast stretching == This is the most significant and essential technique of spatial-based image enhancement. The basic intent of this contrast enhancement technique is to adjust the local contrast in the image so as to bring out the clear regions or objects in the image. Low-contrast images often result from poor or non-uniform lighting conditions, a limited dynamic range of the imaging sensor, or improper settings of the lens aperture. This operation tries to change the intensity of the pixel in the image, particularly in the input image, to obtain an enhanced image. It is based on the number of techniques, namely local, global, dark and bright levels of contrast. The contrast enhancement is considered as the amount of color or gray differentiation that lies among the different features in an image. The contrast enhancement improves the quality of image by increasing the luminance difference between the foreground and background. A contrast stretching transformation can be achieved by: Stretching the dark range of input values into a wider range of output values: This involves increasing the brightness of the darker areas in the image to enhance details and improve visibility. Shifting the mid-range of input values: This involves adjusting the brightness levels of the mid-tones in the image to improve overall contrast and clarity. Compressing the bright range of input values: This process involves reducing the brightness of the brighter areas in the image to prevent overexposure resulting in a more balanced and visually appealing image. It can be described as the following piecewise funciton: I N = { s 1 r 1 I if I < r 1 s 2 − s 1 r 1 − r 2 ( I − r 1 ) if r 1 ≤ I ≤ r 2 1 − s 2 1 − r 2 ( I − r 2 ) if I > r 2 {\displaystyle I_{N}={\begin{cases}{\frac {s_{1}}{r_{1}}}I&{\text{if }}Ir_{2}\end{cases}}} Where: ( r 1 , s 1 ) {\displaystyle (r_{1},s_{1})} defines the transition point between the "dark" range to the "main" range. ( r 2 , s 2 ) {\displaystyle (r_{2},s_{2})} defines the transition point between the "main" range to the "bright" range. A typical linear stretch is obtained when ( r 1 , s 1 ) = ( r min , 0 ) {\displaystyle (r_{1},s_{1})=(r_{\text{min}},0)} and ( r 2 , s 2 ) = ( r max , 1 ) {\displaystyle (r_{2},s_{2})=(r_{\text{max}},1)} , where r min {\displaystyle r_{\text{min}}} and r max {\displaystyle r_{\text{max}}} denote the minimum and maximum levels in the source image. === Global contrast stretching === Global Contrast Stretching considers all color palate ranges at once to determine the maximum and minimum values for the entire RGB color image. This approach utilizes the combination of RGB colors to derive a single maximum and minimum value for contrast stretching across the entire image. === Local contrast stretching === Local contrast stretching (LCS) is an image enhancement method that focuses on locally adjusting each pixel's value to improve the visualization of structures within an image, particularly in both the darkest and lightest portions. It operates by utilizing sliding windows, known as kernels, which traverse the image. The central pixel within each kernel is adjusted using the following formula: I p ( x , y ) = 255 × [ I 0 ( x , y ) − m i n ] ( m a x − m i n ) {\displaystyle I_{p}(x,y)=255\times {\frac {[I_{0}(x,y)-min]}{(max-min)}}} Where: Ip(x,y) is the color level for the output pixel (x,y) after the contrast stretching process. I0(x,y) is the color level input for data pixel (x, y). max is the maximum value for color level in the input image within the selected kernel. min is the minimum value for color level in the input image within the selected kernel. A piecewise form (see above) may also be used. LCS can be applied to the three color channels of an image separately.

    Read more →
  • AI Video Generators: Free vs Paid (2026)

    AI Video Generators: Free vs Paid (2026)

    In search of the best AI video generator? An AI video generator is software that uses machine learning to help you get more done — it turns a rough idea into a polished result in seconds. When choosing one, weigh output quality, pricing, export formats, and how well it fits the tools you already use. Whether you are a beginner or a pro, the right AI video generator slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →
  • Regular language

    Regular language

    In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science (as opposed to many modern regular expression engines, which are augmented with features that allow the recognition of non-regular languages). Alternatively, a regular language can be defined as a language recognised by a finite automaton. The equivalence of regular expressions and finite automata is known as Kleene's theorem (after American mathematician Stephen Cole Kleene). In the Chomsky hierarchy, regular languages are the languages generated by Type-3 grammars. == Formal definition == The collection of regular languages over an alphabet Σ is defined recursively as follows: The empty language ∅ is a regular language. For each a ∈ Σ (a belongs to Σ), the singleton language {a} is a regular language. If A is a regular language, A (Kleene star) is a regular language. Due to this, the empty string language {ε} is also regular. If A and B are regular languages, then A ∪ B (union) and A • B (concatenation) are regular languages. No other languages over Σ are regular. See Regular expression § Formal language theory for syntax and semantics of regular expressions. == Examples == All finite languages are regular; in particular the empty string language {ε} = ∅ is regular. Other typical examples include the language consisting of all strings over the alphabet {a, b} which contain an even number of as, or the language consisting of all strings of the form: several as followed by several bs. A simple example of a language that is not regular is the set of strings {anbn | n ≥ 0}. Intuitively, it cannot be recognized with a finite automaton, since a finite automaton has finite memory and it cannot remember the exact number of a's. Techniques to prove this fact rigorously are given below. == Equivalent formalisms == A regular language satisfies the following equivalent properties: it is the language of a regular expression (by the above definition) it is the language accepted by a nondeterministic finite automaton (NFA) it is the language accepted by a deterministic finite automaton (DFA) it can be generated by a regular grammar it is the language accepted by an alternating finite automaton it is the language accepted by a two-way finite automaton it can be generated by a prefix grammar it can be accepted by a read-only Turing machine it can be defined in monadic second-order logic (Büchi–Elgot–Trakhtenbrot theorem) it is recognized by some finite syntactic monoid M, meaning it is the preimage {w ∈ Σ | f(w) ∈ S} of a subset S of a finite monoid M under a monoid homomorphism f : Σ → M from the free monoid on its alphabet the number of equivalence classes of its syntactic congruence is finite. (This number equals the number of states of the minimal deterministic finite automaton accepting L.) Properties 10. and 11. are purely algebraic approaches to define regular languages; a similar set of statements can be formulated for a monoid M ⊆ Σ. In this case, equivalence over M leads to the concept of a recognizable language. Some authors use one of the above properties different from "1." as an alternative definition of regular languages. Some of the equivalences above, particularly those among the first four formalisms, are called Kleene's theorem in textbooks. Precisely which one (or which subset) is called such varies between authors. One textbook calls the equivalence of regular expressions and NFAs ("1." and "2." above) "Kleene's theorem". Another textbook calls the equivalence of regular expressions and DFAs ("1." and "3." above) "Kleene's theorem". Two other textbooks first prove the expressive equivalence of NFAs and DFAs ("2." and "3.") and then state "Kleene's theorem" as the equivalence between regular expressions and finite automata (the latter said to describe "recognizable languages"). A linguistically oriented text first equates regular grammars ("4." above) with DFAs and NFAs, calls the languages generated by (any of) these "regular", after which it introduces regular expressions which it terms to describe "rational languages", and finally states "Kleene's theorem" as the coincidence of regular and rational languages. Other authors simply define "rational expression" and "regular expressions" as synonymous and do the same with "rational languages" and "regular languages". Apparently, the term regular originates from a 1951 technical report where Kleene introduced regular events and explicitly welcomed "any suggestions as to a more descriptive term". Noam Chomsky, in his 1959 seminal article, used the term regular in a different meaning at first (referring to what is called Chomsky normal form today), but noticed that his finite state languages were equivalent to Kleene's regular events. == Closure properties == The regular languages are closed under various operations, that is, if the languages K and L are regular, so is the result of the following operations: the set-theoretic Boolean operations: union K ∪ L, intersection K ∩ L, and complement L, hence also relative complement K − L. the regular operations: K ∪ L, concatenation ⁠ K ∘ L {\displaystyle K\circ L} ⁠, and Kleene star L. the trio operations: string homomorphism, inverse string homomorphism, and intersection with regular languages. As a consequence they are closed under arbitrary finite state transductions, like quotient K / L with a regular language. Even more, regular languages are closed under quotients with arbitrary languages: If L is regular then L / K is regular for any K. the reverse (or mirror image) LR. Given a nondeterministic finite automaton to recognize L, an automaton for LR can be obtained by reversing all transitions and interchanging starting and finishing states. This may result in multiple starting states; ε-transitions can be used to join them. == Decidability properties == Given two deterministic finite automata A and B, it is decidable whether they accept the same language. As a consequence, using the above closure properties, the following problems are also decidable for arbitrarily given deterministic finite automata A and B, with accepted languages LA and LB, respectively: Containment: is LA ⊆ LB ? Disjointness: is LA ∩ LB = {} ? Emptiness: is LA = {} ? Universality: is LA = Σ ? Membership: given a ∈ Σ, is a ∈ LB ? For regular expressions, the universality problem is NP-complete already for a singleton alphabet. For larger alphabets, that problem is PSPACE-complete. If regular expressions are extended to allow also a squaring operator, with "A2" denoting the same as "AA", still just regular languages can be described, but the universality problem has an exponential space lower bound, and is in fact complete for exponential space with respect to polynomial-time reduction. For a fixed finite alphabet, the theory of the set of all languages – together with strings, membership of a string in a language, and for each character, a function to append the character to a string (and no other operations) – is decidable, and its minimal elementary substructure consists precisely of regular languages. For a binary alphabet, the theory is called S2S. == Complexity results == In computational complexity theory, the complexity class of all regular languages is sometimes referred to as REGULAR or REG and equals DSPACE(O(1)), the decision problems that can be solved in constant space (the space used is independent of the input size). REGULAR ≠ AC0, since it (trivially) contains the parity problem of determining whether the number of 1 bits in the input is even or odd and this problem is not in AC0. On the other hand, REGULAR does not contain AC0, because the nonregular language of palindromes, or the nonregular language { 0 n 1 n : n ∈ N } {\displaystyle \{0^{n}1^{n}:n\in \mathbb {N} \}} can both be recognized in AC0. If a language is not regular, it requires a machine with at least Ω(log log n) space to recognize (where n is the input size). In other words, DSPACE(o(log log n)) equals the class of regular languages. In practice, most nonregular problems are studied in a setting with at least logarithmic space, as this is the amount of space required to store a pointer into the input tape. == Location in the Chomsky hierarchy == To locate the regular languages in the Chomsky hierarchy, one notices that every regular language is context-free. The converse is not true: for example, the language consisting of all strings having the same number of as as bs is context-free but not regular. To prove that a language is not regular, one often uses the Myhill–Nerode theorem and the pumping lemma. Other approaches include using the closure properties of regular languages or quantifying Kolmogorov complexity. Important subclasses of regular languages include: Finite languages, those containing only a finite number of words. These are regular la

    Read more →
  • Best AI Code-review Tools in 2026

    Best AI Code-review Tools in 2026

    Looking for the best AI code-review tool? An AI code-review tool is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI code-review tool slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • Jive (software)

    Jive (software)

    Jive (formerly known as Clearspace, then Jive SBS, then Jive Engage) is a commercial Java EE-based Enterprise 2.0 collaboration and knowledge management tool produced by Jive Software. It was first released as "Clearspace" in 2006, then renamed SBS (for "Social Business Software") in March 2009, then renamed "Jive Engage" in 2011, and renamed simply to "Jive" in 2012. Jive integrates the functionality of online communities, microblogging, social networking, discussion forums, blogs, wikis, and IM under one unified user interface. Content placed into any of the systems (blog, wiki, documentation, etc.) can be found through a common search interface. Other features include RSS capability, email integration, a reputation and reward system for participation, personal user profiles, JAX-WS web service interoperability, and integration with the Spring Framework. The product is a pure-Java server-side web application and will run on any platform where Java (JDK 1.5 or higher) is installed. It does not require a dedicated server - users have reported successful deployment in both shared environments and multiple machine clusters. As of Jive 8, released March 30, 2015, there is a Jive-n version which is for internal use (hosted by the consumer or hosted by Jive as a service) and a Jive-x version which is an external version hosted as a service. Jive no longer supports wiki markup language. == Server requirements for Jive 8-n == The following are the server requirements for Jive 8-n Operating systems: RHEL version 6 or 7 for x86_64, CentOS version 6 or 7 for x86_64 or SuSE Enterprise Linux Server (SLES) 11 and 12 for x86_64 Application Servers: Jive ships with its own embedded Apache HTTPD and Tomcat servers as part of the install package. It is not possible to deploy the application onto other appservers. Databases: MySQL (5.1, 5.5, 5.6) Oracle (11gR2, 12c) Postgres (9.0, 9.1, 9.2, 9.3, 9.4 - 9.2 or higher recommended) Microsoft SQL Server (2008R2, 2012, 2014) Environment: Jive recommends a server with at least 4GB of RAM and a dual-core 2 GHz processor with x86_64 architecture The product integrates with an LDAP repository or Active Directory For optimal deployment with a large community Jive Software recommends: using dedicated cache and document-conversion servers hosting the application and database servers separately == Releases == Jive 8, released on March 30, 2015 Jive 7, released in October 2013 Jive 9.0.x, released in November 2016 Jive 9, released in November 2016, supported now

    Read more →
  • Corpus linguistics

    Corpus linguistics

    Corpus linguistics is an empirical method for the study of language by text corpus (plural corpora). Corpora are balanced, often stratified collections of authentic, "real world", text of speech or writing that aim to represent a given linguistic variety. Today, corpora are generally machine-readable data collections. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field—the natural context ("realia") of that language—with minimal experimental interference. Large collections of text, though corpora may also be small in terms of running words, allow linguists to run quantitative analyses on linguistic concepts that may be difficult to test in a qualitative manner. The text-corpus method uses the body of texts in any natural language to derive the set of abstract rules which govern that language. Those results can be used to explore the relationships between that subject language and other languages which have undergone a similar analysis. The first such corpora were manually derived from source texts, but now that work is automated. Corpora have not only been used for linguistics research, they have been increasingly used to compile dictionaries (starting with The American Heritage Dictionary of the English Language in 1969) and reference grammars, with A Comprehensive Grammar of the English Language, published in 1985, as a first. Experts in the field have differing views about the annotation of a corpus. These views range from John McHardy Sinclair, who advocates minimal annotation so texts speak for themselves, to the Survey of English Usage team (University College, London), who advocate annotation as allowing greater linguistic understanding through rigorous recording. == History == Some of the earliest efforts at grammatical description were based at least in part on corpora of particular religious or cultural significance. For example, Prātiśākhya literature described the sound patterns of Sanskrit as found in the Vedas, and Pāṇini's grammar of classical Sanskrit was based at least in part on analysis of that same corpus. Similarly, the early Arabic grammarians paid particular attention to the language of the Quran. In the Western European tradition, scholars prepared concordances to allow detailed study of the language of the Bible and other canonical texts. === English corpora === A landmark in modern corpus linguistics was the publication of Computational Analysis of Present-Day American English in 1967. Written by Henry Kučera and W. Nelson Francis, the work was based on an analysis of the Brown Corpus, which is a structured and balanced corpus of one million words of American English from the year 1961. The corpus comprises 2000 text samples, from a variety of genres. The Brown Corpus was the first computerized corpus designed for linguistic research. Kučera and Francis subjected the Brown Corpus to a variety of computational analyses and then combined elements of linguistics, language teaching, psychology, statistics, and sociology to create a rich and variegated opus. A further key publication was Randolph Quirk's "Towards a description of English Usage" in 1960 in which he introduced the Survey of English Usage. Quirk's corpus was the first modern corpus to be built with the purpose of representing the whole language. Shortly thereafter, Boston publisher Houghton-Mifflin approached Kučera to supply a million-word, three-line citation base for its new American Heritage Dictionary, the first dictionary compiled using corpus linguistics. The AHD took the innovative step of combining prescriptive elements (how language should be used) with descriptive information (how it actually is used). Other publishers followed suit. The British publisher Collins' COBUILD monolingual learner's dictionary, designed for users learning English as a foreign language, was compiled using the Bank of English. The Survey of English Usage Corpus was used in the development of one of the most important Corpus-based Grammars, which was written by Quirk et al. and published in 1985 as A Comprehensive Grammar of the English Language. The Brown Corpus has also spawned a number of similarly structured corpora: the LOB Corpus (1960s British English), Kolhapur (Indian English), Wellington (New Zealand English), Australian Corpus of English (Australian English), the Frown Corpus (early 1990s American English), and the FLOB Corpus (1990s British English). Other corpora represent many languages, varieties and modes, and include the International Corpus of English, and the British National Corpus, a 100 million word collection of a range of spoken and written texts, created in the 1990s by a consortium of publishers, universities (Oxford and Lancaster) and the British Library. For contemporary American English, work has stalled on the American National Corpus, but the 400+ million word Corpus of Contemporary American English (1990–present) is now available through a web interface. The first computerized corpus of transcribed spoken language was constructed in 1971 by the Montreal French Project, containing one million words, which inspired Shana Poplack's much larger corpus of spoken French in the Ottawa-Hull area. === Multilingual corpora === In the 1990s, many of the notable early successes on statistical methods in natural-language programming (NLP) occurred in the field of machine translation, due especially to work at IBM Research. These systems were able to take advantage of existing multilingual textual corpora that had been produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. There are corpora in non-European languages as well. For example, the National Institute for Japanese Language and Linguistics in Japan has built a number of corpora of spoken and written Japanese. Sign language corpora have also been created using video data. === Ancient languages corpora === Besides these corpora of living languages, computerized corpora have also been made of collections of texts in ancient languages. An example is the Andersen-Forbes database of the Hebrew Bible, developed since the 1970s, in which every clause is parsed using graphs representing up to seven levels of syntax, and every segment tagged with seven fields of information. The Quranic Arabic Corpus is an annotated corpus for the Classical Arabic language of the Quran. This is a recent project with multiple layers of annotation including morphological segmentation, part-of-speech tagging, and syntactic analysis using dependency grammar. The Digital Corpus of Sanskrit (DCS) is a "Sandhi-split corpus of Sanskrit texts with full morphological and lexical analysis... designed for text-historical research in Sanskrit linguistics and philology." === Corpora from specific fields === Besides pure linguistic inquiry, researchers had begun to apply corpus linguistics to other academic and professional fields, such as the emerging sub-discipline of Law and Corpus Linguistics, which seeks to understand legal texts using corpus data and tools. The DBLP Discovery Dataset concentrates on computer science, containing relevant computer science publications with sentient metadata such as author affiliations, citations, or study fields. A more focused dataset was introduced by NLP Scholar, a combination of papers of the ACL Anthology and Google Scholar metadata. Corpora can also aid in translation efforts or in teaching foreign languages. == Methods == Corpus linguistics has generated a number of research methods, which attempt to trace a path from data to theory. Wallis and Nelson (2001) first introduced what they called the 3A perspective: Annotation, Abstraction and Analysis. Annotation consists of the application of a scheme to texts. Annotations may include structural markup, part-of-speech tagging, parsing, and numerous other representations. Abstraction consists of the translation (mapping) of terms in the scheme to terms in a theoretically motivated model or dataset. Abstraction typically includes linguist-directed search but may include e.g., rule-learning for parsers. Analysis consists of statistically probing, manipulating and generalising from the dataset. Analysis might include statistical evaluations, optimisation of rule-bases or knowledge discovery methods. Most lexical corpora today are part-of-speech-tagged (POS-tagged). However even corpus linguists who work with 'unannotated plain text' inevitably apply some method to isolate salient terms. In such situations annotation and abstraction are combined in a lexical search. The advantage of publishing an annotated corpus is that other users can then perform experiments on the corpus (through corpus managers). Linguists with other interests and differing perspectives than the originators' can exploit this work. By sharing data

    Read more →
  • Finite-state transducer

    Finite-state transducer

    A finite-state transducer (FST) is a finite-state machine with two memory tapes, following the terminology for Turing machines: an input tape and an output tape. This contrasts with an ordinary finite-state automaton, which has a single tape. An FST is a type of finite-state automaton (FSA) that maps between two sets of symbols. An FST is more general than an FSA. An FSA defines a formal language by defining a set of accepted strings, while an FST defines a relation between sets of strings. An FST will read a set of strings on the input tape and generate a set of relations on the output tape. An FST can be thought of as a translator or relater between strings in a set. In morphological parsing, an example would be inputting a string of letters into the FST, the FST would then output a string of morphemes. == Overview == An automaton can be said to recognize a string if we view the content of its tape as input. In other words, the automaton computes a function that maps strings into the set {0,1}. Alternatively, we can say that an automaton generates strings, which means viewing its tape as an output tape. On this view, the automaton generates a formal language, which is a set of strings. The two views of automata are equivalent: the function that the automaton computes is precisely the indicator function of the set of strings it generates. The class of languages generated by finite automata is known as the class of regular languages. The two tapes of a transducer are typically viewed as an input tape and an output tape. On this view, a transducer is said to transduce (i.e., translate) the contents of its input tape to its output tape, by accepting a string on its input tape and generating another string on its output tape. It may do so nondeterministically and it may produce more than one output for each input string. A transducer may also produce no output for a given input string, in which case it is said to reject the input. In general, a transducer computes a relation between two formal languages. Each string-to-string finite-state transducer relates the input alphabet Σ to the output alphabet Γ. Relations R on Σ×Γ that can be implemented as finite-state transducers are called rational relations. Rational relations that are partial functions, i.e. that relate every input string from Σ to at most one Γ, are called rational functions. Finite-state transducers are often used for phonological and morphological analysis in natural language processing research and applications. Pioneers in this field include Ronald Kaplan, Lauri Karttunen, Martin Kay and Kimmo Koskenniemi. A common way of using transducers is in a so-called "cascade", where transducers for various operations are combined into a single transducer by repeated application of the composition operator (defined below). == Formal construction == Formally, a finite transducer T is a 6-tuple (Q, Σ, Γ, I, F, δ) such that: Q is a finite set, the set of states; Σ is a finite set, called the input alphabet; Γ is a finite set, called the output alphabet; I is a subset of Q, the set of initial states; F is a subset of Q, the set of final states; and δ ⊆ Q × ( Σ ∪ { ϵ } ) × ( Γ ∪ { ϵ } ) × Q {\displaystyle \delta \subseteq Q\times (\Sigma \cup \{\epsilon \})\times (\Gamma \cup \{\epsilon \})\times Q} (where ε is the empty string) is the transition relation. We can view (Q, δ) as a labeled directed graph, known as the transition graph of T: the set of vertices is Q, and ( q , a , b , r ) ∈ δ {\displaystyle (q,a,b,r)\in \delta } means that there is a labeled edge going from vertex q to vertex r. We also say that a is the input label and b the output label of that edge. NOTE: This definition of finite transducer is also called letter transducer (Roche and Schabes 1997); alternative definitions are possible, but can all be converted into transducers following this one. Define the extended transition relation δ ∗ {\displaystyle \delta ^{}} as the smallest set such that: δ ⊆ δ ∗ {\displaystyle \delta \subseteq \delta ^{}} ; ( q , ϵ , ϵ , q ) ∈ δ ∗ {\displaystyle (q,\epsilon ,\epsilon ,q)\in \delta ^{}} for all q ∈ Q {\displaystyle q\in Q} ; and whenever ( q , x , y , r ) ∈ δ ∗ {\displaystyle (q,x,y,r)\in \delta ^{}} and ( r , a , b , s ) ∈ δ {\displaystyle (r,a,b,s)\in \delta } then ( q , x a , y b , s ) ∈ δ ∗ {\displaystyle (q,xa,yb,s)\in \delta ^{}} . The extended transition relation is essentially the reflexive transitive closure of the transition graph that has been augmented to take edge labels into account. The elements of δ ∗ {\displaystyle \delta ^{}} are known as paths. The edge labels of a path are obtained by concatenating the edge labels of its constituent transitions in order. The behavior of the transducer T is the rational relation [T] defined as follows: x [ T ] y {\displaystyle x[T]y} if and only if there exists i ∈ I {\displaystyle i\in I} and f ∈ F {\displaystyle f\in F} such that ( i , x , y , f ) ∈ δ ∗ {\displaystyle (i,x,y,f)\in \delta ^{}} . This is to say that T transduces a string x ∈ Σ ∗ {\displaystyle x\in \Sigma ^{}} into a string y ∈ Γ ∗ {\displaystyle y\in \Gamma ^{}} if there exists a path from an initial state to a final state whose input label is x and whose output label is y. === Weighted automata === Finite State Transducers can be weighted, where each transition is labelled with a weight in addition to the input and output labels. A Weighted Finite State Transducer (WFST) over a set K of weights can be defined similarly to an unweighted one as an 8-tuple T=(Q, Σ, Γ, I, F, E, λ, ρ), where: Q, Σ, Γ, I, F are defined as above; E ⊆ Q × ( Σ ∪ { ϵ } ) × ( Γ ∪ { ϵ } ) × Q × K {\displaystyle E\subseteq Q\times (\Sigma \cup \{\epsilon \})\times (\Gamma \cup \{\epsilon \})\times Q\times K} (where ε is the empty string) is the finite set of transitions; λ : I → K {\displaystyle \lambda :I\rightarrow K} maps initial states to weights; ρ : F → K {\displaystyle \rho :F\rightarrow K} maps final states to weights. In order to make certain operations on WFSTs well-defined, it is convenient to require the set of weights to form a semiring. Two typical semirings used in practice are the log semiring and tropical semiring: nondeterministic automata may be regarded as having weights in the Boolean semiring. Two weighted FST can be composed. == Operations on finite-state transducers == The following operations defined on finite automata also apply to finite transducers: Union. Given transducers T and S, there exists a transducer T ∪ S {\displaystyle T\cup S} such that x [ T ∪ S ] y {\displaystyle x[T\cup S]y} if and only if x [ T ] y {\displaystyle x[T]y} or x [ S ] y {\displaystyle x[S]y} . Concatenation. Given transducers T and S, there exists a transducer T ⋅ S {\displaystyle T\cdot S} such that x [ T ⋅ S ] y {\displaystyle x[T\cdot S]y} if and only if there exist x 1 , x 2 , y 1 , y 2 {\displaystyle x_{1},x_{2},y_{1},y_{2}} with x = x 1 x 2 , y = y 1 y 2 , x 1 [ T ] y 1 {\displaystyle x=x_{1}x_{2},y=y_{1}y_{2},x_{1}[T]y_{1}} and x 2 [ S ] y 2 . {\displaystyle x_{2}[S]y_{2}.} Kleene closure. Given a transducer T, there might exist a transducer T ∗ {\displaystyle T^{}} with the following properties: and x [ T ∗ ] y {\displaystyle x[T^{}]y} does not hold unless mandated by (k1) or (k2). Composition. Given a transducer T on alphabets Σ and Γ and a transducer S on alphabets Γ and Δ, there exists a transducer T ∘ S {\displaystyle T\circ S} on Σ and Δ such that x [ T ∘ S ] z {\displaystyle x[T\circ S]z} if and only if there exists a string y ∈ Γ ∗ {\displaystyle y\in \Gamma ^{}} such that x [ T ] y {\displaystyle x[T]y} and y [ S ] z {\displaystyle y[S]z} . This operation extends to the weighted case. This definition uses the same notation used in mathematics for relation composition. However, the conventional reading for relation composition is the other way around: given two relations T and S, ( x , z ) ∈ T ∘ S {\displaystyle (x,z)\in T\circ S} when there exist some y such that ( x , y ) ∈ S {\displaystyle (x,y)\in S} and ( y , z ) ∈ T . {\displaystyle (y,z)\in T.} Projection to an automaton. There are two projection functions: π 1 {\displaystyle \pi _{1}} preserves the input tape, and π 2 {\displaystyle \pi _{2}} preserves the output tape. The first projection, π 1 {\displaystyle \pi _{1}} is defined as follows: Given a transducer T, there exists a finite automaton π 1 T {\displaystyle \pi _{1}T} such that π 1 T {\displaystyle \pi _{1}T} accepts x if and only if there exists a string y for which x [ T ] y . {\displaystyle x[T]y.} :The second projection, π 2 {\displaystyle \pi _{2}} is defined similarly. Determinization. Given a transducer T, we want to build an equivalent transducer that has a unique initial state and such that no two transitions leaving any state share the same input label. The powerset construction can be extended to transducers, or even weighted transducers, but sometimes fails to halt; indeed, some non-deterministic transducers do not admit equivalent

    Read more →
  • Yorick Wilks

    Yorick Wilks

    Yorick Alexander Wilks FBCS (27 October 1939 – 14 April 2023) was a British computer scientist. He was an emeritus professor of artificial intelligence at the University of Sheffield, visiting professor of artificial intelligence at Gresham College (a post created especially for him), senior research fellow at the Oxford Internet Institute, senior scientist at the Florida Institute for Human and Machine Cognition, and a member of the Epiphany Philosophers. In February 2023, Wilks joined WiredVibe as Director of AI and a Board Member, with the goal of commercializing his previous research and ideas. He remained in this role until his death, which occurred shortly before WiredVibe was acquired by AKY X, a company that continues to build on his legacy and contributions. == Biography == Wilks was born in Gerrards Cross, Buckinghamshire in England. He was educated at Torquay Boys' Grammar School, followed by Pembroke College, Cambridge, where he read Philosophy, joined the Epiphany Philosophers and obtained his Doctor of Philosophy degree (1968) under Professor R. B. Braithwaite for the thesis 'Argument and Proof'; he was an early pioneer in meaning-based approaches to the understanding of natural language content by computers. His main early contribution in the 1970s was called "Preference Semantics" (Wilks, 1973; Wilks and Fass, 1992), an algorithmic method for assigning the "most coherent" interpretation to a sentence in terms of having the maximum number of internal preferences of its parts (normally verbs or adjectives) satisfied. That early work was hand-coded with semantic entries (of the order of some hundreds) as was normal at the time, but since then has led to the empirical determinations of preferences (chiefly of English verbs) in the 1980s and 1990s. A key component of the notion of preference in semantics was that the interpretation of an utterance is not a well- or ill-formed notion, as was argued in Chomskyan approaches, such as those of Jerry Fodor and Jerrold Katz. It was rather that a semantic interpretation was the best available, even though some preferences might not be satisfied. So, in "The machine answered the question with a low whine" the agent of "answer" does not satisfy that verb's preference for a human answerer—which would cause it to be deemed ill-formed by Fodor and Katz—but is accepted as sub-optimal or metaphorical, and, now, conventional. The function of the algorithm is not to determine well-formedness at all but to make the optimal selection of word-senses to participate in the overall interpretation. Thus, in "The Pole answered..." the system will always select the human sense of the agent and not the inanimate one if it gives a more coherent interpretation overall. Preference Semantics is thus some of the earliest computational work—with programs run at Systems Development Corporation in Santa Monica in 1967 in LISP on an IBM360—in the now established field of word sense disambiguation. This approach was used in the first operational machine translation system based principally on meaning structures and built by Wilks at Stanford Artificial Intelligence Laboratory in the early 1970s (Wilks, 1973) at the same time and place as Roger Schank was applying his "Conceptual Dependency" approach to machine translation. The LISP code of Wilks' system was in The Computer Museum, Boston. Wilks was elected a fellow of the American and European Associations for Artificial Intelligence, of the British Computer Society, a member of the UK Computing Research Committee, and a permanent member of ICCL, the International Committee on Computational Linguistics. He was professor of artificial intelligence at the University of Sheffield and a senior research fellow at the Oxford Internet Institute. In 1991 he received a Defense Advanced Projects Agency grant on interlingual pragmatics-based machine translation and in 1994 he received a grant by the Engineering and Physical Sciences Research Council to investigate in the field of large-scale information extraction (LaSIE); in the following years he would obtain more grants to carry on exploring the field of information extraction (AVENTINUS, ECRAN, PASTA...). In the 1990s Wilks also became interested in modelling human-computer dialogue and the team led by David Levy and him as chief researcher won the Loebner Prize in 1997. He was the founding director of the EU funded Companions Project on creating long-term computer companions for people. At his Festschrift in 2007 at the British Computer Society in London a volume of his own papers was presented along with a volume of essays in his honour. He was awarded the Antonio Zampolli prize in honour of his lifetime work at the LREC 2008 conference on 28 May 2008, and the Lifetime Achievement Award at the ACL 2008 conference on 18 June 2008. In 2009, he was awarded the British Computer Society's Lovelace Medal, its annual award for research achievement, and was awarded the Fellowship of the Association for Computing Machinery. In 1998, Wilks became head of the Department of Computer Science of the University of Sheffield, where he had started working in the year 1993 as professor of artificial intelligence, a post he still held. In 1993 he became the founding director of the Institute of Language, Speech and Hearing (ILASH). Wilks also set up the Natural Language Processing Group of the University of Sheffield. In 1994 he (along with Rob Gaizauskas and Hamish Cunningham) designed GATE, an advanced NLP architecture that has been widely distributed. National Life Stories conducted an oral history interview (C1672/24) with Yorick Wilks in 2016 for its Science and Religion collection held by the British Library. Wilks died on 14 April 2023, at the age of 83. == Awards == Wilks received many awards: (2009) Elected Fellow of the Association for Computing Machinery (2009) Lovelace Medal by the British Computer Society (2008) Zampolli Prize (ELRA, awarded at LREC in Marrakech, Morocco) (2008) Lifetime Achievement Award (Association for Computational Linguistics, in Columbus) (2006) Visiting Professor, University of Oxford (2004) Elected to UK Computing Research Committee (2004) Elected Fellow, British Computer Society (2003) Visiting Fellow, Oxford Internet Institute (1998) Elected Fellow of European Association for Artificial Intelligence (1997) Elected Fellow, EPSRC College of Computing (1991) Visiting Fellow, Trinity Hall, Cambridge (1991) Elected Fellow of the American Association for Artificial Intelligence (1983) Royal Society Travel Fellowship (1983) Commonwealth of Australia Visiting Professor (1981) Visiting Sloan Fellow, University of California, Berkeley (1980) Invited Participant in the Nobel Symposium on Language, Stockholm (1979) NATO Senior Scientist Fellowship (1979) Visiting Sloan Fellow, Yale University (1975) SRC Senior Visiting Fellowship, University of Edinburgh == Membership == Wilks was an active member of the following associations: Association for Computational Linguistics Society for the Study of AI and Simulation of Behaviour Association for Computing Machinery Cognitive Science Society British Society for the Philosophy of Science American Association for Artificial Intelligence Aristotelian Society == Selected works == === Books === Wilks, Y. (2019) Artificial Intelligence: Modern Magic or Dangerous Future?.Icon Books. New illustrated edition, 2023, MIT Press. Wilks, Y. (2015) Machine Translation: its scope and limits. Springer Wilks, Y (ed.) (2010) Close Engagements with Artificial Companions: Key Social, Psychological and Design issues. John Benjamins; Amsterdam Wilks, Y., Brewster, C. (2009) Natural Language Processing as a Foundation of the Semantic Web. Now Press: London. Wilks, Y. (2007) Words and Intelligence I, Selected papers by Yorick Wilks. In K. Ahmad, C. Brewster & M. Stevenson (eds.), Springer: Dordrecht. Wilks, Y. (ed. and with introduction and commentaries). (2006) Language, cohesion and form: selected papers of Margaret Masterman. Cambridge: Cambridge University Press. Wilks, Y., Nirenburg, S., Somers, H. (eds.) (2003) Readings in Machine Translation. Cambridge, MA: MIT Press. Wilks, Y.(ed.). (1999) Machine Conversations. Kluwer: New York. Wilks, Y., Slator, B., Guthrie, L. (1996) Electric Words: dictionaries, computers and meanings. Cambridge, MA: MIT Press. Ballim, A., Wilks, Y. (1991) Artificial Believers. Norwood, NJ: Erlbaum. Wilks, Y.(ed.). (1990) Theoretical Issues in Natural Language Processing. Norwood, NJ: Erlbaum. Wilks, Y., Partridge, D. (eds. plus three YW chapters and an introduction). (1990) The Foundations of Artificial Intelligence: a sourcebook. Cambridge: Cambridge University Press. Wilks, Y., Sparck-Jones, K.(eds.). (1984) Automatic Natural Language Processing, paperback edition. New York: Wiley. Originally published by Ellis Horwood. Wilks, Y., Charniak, E. (eds and principal authors). (1976) Computational Semantics—an Introduction to Artificial Intelligence and

    Read more →
  • Language engineering

    Language engineering

    Language engineering involves the creation of natural language processing systems, whose cost and outputs are measurable and predictable. It is a distinct field contrasted to natural language processing and computational linguistics. A recent trend of language engineering is the use of Semantic Web technologies for the creation, archiving, processing, and retrieval of machine processable language data. Meta-Language Engineering is a proposed extension of Language Engineering first recorded in 2025, associated with the work of Delyone de Paula Canedo Filho. The term is used to designate an approach that, in addition to natural language processing, encompasses the symbolic, cognitive, and epistemological structuring of language systems.

    Read more →
  • François Chollet

    François Chollet

    François Chollet (French: [fʁɑ̃swa ʃoˈlɛ]; born 20 October 1989) is a French software engineer, artificial intelligence (AI) researcher, and former Senior Staff Engineer at Google. Chollet is the creator of the Keras deep-learning library released in 2015. His research focuses on computer vision, the application of machine learning to formal reasoning, abstraction, and how to achieve greater generality in artificial intelligence (AGI). == Education and career == In 2012, Chollet graduated with a Diplôme d'Ingénieur (Master of Engineering) from ENSTA Paris, a school of the Polytechnic Institute of Paris. In 2015, Chollet started working at Google shortly after releasing Keras. In 2019, he published the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) benchmark, which measures the ability of AI systems to solve novel reasoning problems. In 2024, Chollet launched ARC Prize, a US$1 million competition to solve the ARC-AGI benchmark. He left Google in November 2024 after more than 9 years with the company to found with Zapier co-founder Mike Knoop a new startup focused on developing AGI with program synthesis. In early 2025, Chollet announced the expansion of ARC Prize into a full-fledged non-profit foundation, to further the mission of guiding and accelerating research progress towards artificial general intelligence. == Books and publications == Chollet's research papers in artificial intelligence have been published at major conferences in the field, including the Conference on Computer Vision and Pattern Recognition (CVPR), the Conference on Neural Information Processing Systems (NeurIPS), and the International Conference on Learning Representations (ICLR). Chollet is the author of Xception: Deep Learning with Depthwise Separable Convolutions, which is among the top ten most cited papers in CVPR proceedings at more than 18,000 citations. Chollet is the author of the book Deep Learning with Python, which sold over 100,000 copies, and the co-author with Tomasz Kalinowski of Deep Learning With R. == Awards == On December 1, 2021, Chollet won the Global Swiss AI Award for breakthroughs in AI. In September 2024, Chollet was named by TIME as one of the 100 most influential people in AI.

    Read more →
  • Kurt Keutzer

    Kurt Keutzer

    Kurt Keutzer (born November 9, 1955) is an American computer scientist. == Early life and education == Kurt Keutzer grew up in Indianapolis, Indiana. He earned a bachelor's degree in mathematics from Maharishi University of Management (formerly Mararishi International University) in 1978, and a PhD in computer science from Indiana University Bloomington in 1984. == Career == Keutzer joined Bell Labs in 1984, where he worked on logic synthesis. In 1991, he joined the electronic design automation company Synopsys, where he was promoted to chief technology officer. He subsequently joined the University of California, Berkeley as a professor in 1998. His research at Berkeley has focused on the intersection of high performance computing and machine learning. Working with a number of graduate students at Berkeley, Keutzer developed FireCaffe, which scaled the training of deep neural networks to over 100 GPUs. Later, with LARS and LAMB optimizers, they scaled it to over 1000 servers. Keutzer and his students also developed deep neural networks such as SqueezeNet, SqueezeDet, and SqueezeSeg, which can run efficiently on mobile devices. Keutzer co-founded DeepScale with his PhD student Forrest Iandola in 2015, and Keutzer served as the company's chief strategy officer. The firm was focused on developing deep neural networks for advanced driver assistance systems in passenger cars. On October 1, 2019, electric vehicle manufacturer Tesla, Inc. purchased DeepScale to augment and accelerate its self-driving vehicle work. == Honors and awards == Keutzer was named a Fellow of the IEEE in 1996. Recipient of DAC Most Influential Paper (MIP) award (24th DAC, 1987) for his "Dagon: technology binding and local optimization by DAG matching” publication. == Books by Keutzer == 1988. Dwight Hill, Don Shugard, John Fishburn, and Kurt Keutzer. Algorithms and Techniques for VLSI Layout Synthesis. Springer. 1994. Srinivas Devadas, Abhijit Ghosh, and Kurt Keutzer. Logic Synthesis. McGraw-Hill. 2002. David Chinnery and Kurt Keutzer. Closing the Gap Between ASIC & Custom: Tools and Techniques for High-Performance ASIC Design. Springer. (2nd edition appeared in 2007.) 2004. Pinhong Chen, Desmond A. Kirkpatrick, and Kurt Keutzer. Static Crosstalk-Noise Analysis: For Deep Sub-Micron Digital Designs. Springer. 2005. Matthias Gries and Kurt Keutzer. Building ASIPs: The Mescal Methodology. Springer.

    Read more →
  • Restricted Boltzmann machine

    Restricted Boltzmann machine

    A restricted Boltzmann machine (RBM) (also called a restricted Sherrington–Kirkpatrick model with external field or restricted stochastic Ising–Lenz–Little model) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. RBMs were initially proposed under the name Harmonium by Paul Smolensky in 1986, and rose to prominence after Geoffrey Hinton and collaborators used fast learning algorithms for them in the mid-2000s. RBMs have found applications in dimensionality reduction, classification, collaborative filtering, feature learning, topic modelling, immunology, and even many‑body quantum mechanics. They can be trained in either supervised or unsupervised ways, depending on the task. As their name implies, RBMs are a variant of Boltzmann machines, with the restriction that their neurons must form a bipartite graph: a pair of nodes from each of the two groups of units (commonly referred to as the "visible" and "hidden" units respectively) may have a symmetric connection between them; and there are no connections between nodes within a group. By contrast, "unrestricted" Boltzmann machines may have connections between hidden units. This restriction allows for more efficient training algorithms than are available for the general class of Boltzmann machines, in particular the gradient-based contrastive divergence algorithm. Restricted Boltzmann machines can also be used in deep learning networks. In particular, deep belief networks can be formed by "stacking" RBMs and optionally fine-tuning the resulting deep network with gradient descent and backpropagation. == Structure == The standard type of RBM has binary-valued (Boolean) hidden and visible units, and consists of a matrix of weights W {\displaystyle W} of size m × n {\displaystyle m\times n} . Each weight element ( w i , j ) {\displaystyle (w_{i,j})} of the matrix is associated with the connection between the visible (input) unit v i {\displaystyle v_{i}} and the hidden unit h j {\displaystyle h_{j}} . In addition, there are bias weights (offsets) a i {\displaystyle a_{i}} for v i {\displaystyle v_{i}} and b j {\displaystyle b_{j}} for h j {\displaystyle h_{j}} . Given the weights and biases, the energy of a configuration (pair of Boolean vectors) (v,h) is defined as E ( v , h ) = − ∑ i a i v i − ∑ j b j h j − ∑ i ∑ j v i w i , j h j {\displaystyle E(v,h)=-\sum _{i}a_{i}v_{i}-\sum _{j}b_{j}h_{j}-\sum _{i}\sum _{j}v_{i}w_{i,j}h_{j}} or, in matrix notation, E ( v , h ) = − a T v − b T h − v T W h . {\displaystyle E(v,h)=-a^{\mathrm {T} }v-b^{\mathrm {T} }h-v^{\mathrm {T} }Wh.} This energy function is analogous to that of a Hopfield network. As with general Boltzmann machines, the joint probability distribution for the visible and hidden vectors is defined in terms of the energy function as follows, P ( v , h ) = 1 Z e − E ( v , h ) {\displaystyle P(v,h)={\frac {1}{Z}}e^{-E(v,h)}} where Z {\displaystyle Z} is a partition function defined as the sum of e − E ( v , h ) {\displaystyle e^{-E(v,h)}} over all possible configurations, which can be interpreted as a normalizing constant to ensure that the probabilities sum to 1. The marginal probability of a visible vector is the sum of P ( v , h ) {\displaystyle P(v,h)} over all possible hidden layer configurations, P ( v ) = 1 Z ∑ { h } e − E ( v , h ) {\displaystyle P(v)={\frac {1}{Z}}\sum _{\{h\}}e^{-E(v,h)}} , and vice versa. Since the underlying graph structure of the RBM is bipartite (meaning there are no intra-layer connections), the hidden unit activations are mutually independent given the visible unit activations. Conversely, the visible unit activations are mutually independent given the hidden unit activations. That is, for m visible units and n hidden units, the conditional probability of a configuration of the visible units v, given a configuration of the hidden units h, is P ( v | h ) = ∏ i = 1 m P ( v i | h ) {\displaystyle P(v|h)=\prod _{i=1}^{m}P(v_{i}|h)} . Conversely, the conditional probability of h given v is P ( h | v ) = ∏ j = 1 n P ( h j | v ) {\displaystyle P(h|v)=\prod _{j=1}^{n}P(h_{j}|v)} . The individual activation probabilities are given by P ( h j = 1 | v ) = σ ( b j + ∑ i = 1 m w i , j v i ) {\displaystyle P(h_{j}=1|v)=\sigma \left(b_{j}+\sum _{i=1}^{m}w_{i,j}v_{i}\right)} and P ( v i = 1 | h ) = σ ( a i + ∑ j = 1 n w i , j h j ) {\displaystyle \,P(v_{i}=1|h)=\sigma \left(a_{i}+\sum _{j=1}^{n}w_{i,j}h_{j}\right)} where σ {\displaystyle \sigma } denotes the logistic sigmoid. The visible units of Restricted Boltzmann Machine can be multinomial, although the hidden units are Bernoulli. In this case, the logistic function for visible units is replaced by the softmax function P ( v i k = 1 | h ) = exp ⁡ ( a i k + Σ j W i j k h j ) Σ k ′ = 1 K exp ⁡ ( a i k ′ + Σ j W i j k ′ h j ) {\displaystyle P(v_{i}^{k}=1|h)={\frac {\exp(a_{i}^{k}+\Sigma _{j}W_{ij}^{k}h_{j})}{\Sigma _{k'=1}^{K}\exp(a_{i}^{k'}+\Sigma _{j}W_{ij}^{k'}h_{j})}}} where K is the number of discrete values that the visible values have. They are applied in topic modeling, and recommender systems. === Relation to other models === Restricted Boltzmann machines are a special case of Boltzmann machines and Markov random fields. The graphical model of RBMs corresponds to that of factor analysis. == Training algorithm == Restricted Boltzmann machines are trained to maximize the product of probabilities assigned to some training set V {\displaystyle V} (a matrix, each row of which is treated as a visible vector v {\displaystyle v} ), arg ⁡ max W ∏ v ∈ V P ( v ) {\displaystyle \arg \max _{W}\prod _{v\in V}P(v)} or equivalently, to maximize the expected log probability of a training sample v {\displaystyle v} selected randomly from V {\displaystyle V} : arg ⁡ max W E [ log ⁡ P ( v ) ] {\displaystyle \arg \max _{W}\mathbb {E} \left[\log P(v)\right]} The algorithm most often used to train RBMs, that is, to optimize the weight matrix W {\displaystyle W} , is the contrastive divergence (CD) algorithm due to Hinton, originally developed to train PoE (product of experts) models. The algorithm performs Gibbs sampling and is used inside a gradient descent procedure (similar to the way backpropagation is used inside such a procedure when training feedforward neural nets) to compute weight update. The basic, single-step contrastive divergence (CD-1) procedure for a single sample can be summarized as follows: Take a training sample v, compute the probabilities of the hidden units and sample a hidden activation vector h from this probability distribution. Compute the outer product of v and h and call this the positive gradient. From h, sample a reconstruction v' of the visible units, then resample the hidden activations h' from this. (Gibbs sampling step) Compute the outer product of v' and h' and call this the negative gradient. Let the update to the weight matrix W {\displaystyle W} be the positive gradient minus the negative gradient, times some learning rate: Δ W = ϵ ( v h T − v ′ h ′ T ) {\displaystyle \Delta W=\epsilon (vh^{\mathsf {T}}-v'h'^{\mathsf {T}})} . Update the biases a and b analogously: Δ a = ϵ ( v − v ′ ) {\displaystyle \Delta a=\epsilon (v-v')} , Δ b = ϵ ( h − h ′ ) {\displaystyle \Delta b=\epsilon (h-h')} . A Practical Guide to Training RBMs written by Hinton can be found on his homepage. == Stacked Restricted Boltzmann Machine == The difference between the Stacked Restricted Boltzmann Machines and RBM is that RBM has lateral connections within a layer that are prohibited to make analysis tractable. On the other hand, the Stacked Boltzmann consists of a combination of an unsupervised three-layer network with symmetric weights and a supervised fine-tuned top layer for recognizing three classes. The usage of Stacked Boltzmann is to understand Natural languages, retrieve documents, image generation, and classification. These functions are trained with unsupervised pre-training and/or supervised fine-tuning. Unlike the undirected symmetric top layer, with a two-way unsymmetric layer for connection for RBM. The restricted Boltzmann's connection is three-layers with asymmetric weights, and two networks are combined into one. Stacked Boltzmann does share similarities with RBM, the neuron for Stacked Boltzmann is a stochastic binary Hopfield neuron, which is the same as the Restricted Boltzmann Machine. The energy from both Restricted Boltzmann and RBM is given by Gibb's probability measure: E = − 1 2 ∑ i , j w i j s i s j + ∑ i θ i s i {\displaystyle E=-{\frac {1}{2}}\sum _{i,j}{w_{ij}{s_{i}}{s_{j}}}+\sum _{i}{\theta _{i}}{s_{i}}} . The training process of Restricted Boltzmann is similar to RBM. Restricted Boltzmann train one layer at a time and approximate equilibrium state with a 3-segment pass, not performing back propagation. Restricted Boltzmann uses both supervised and unsupervised on different RBM for pre-training for classification and recognition. The training uses contrastive divergence with

    Read more →
  • Image tracing

    Image tracing

    In computer graphics, image tracing, raster-to-vector conversion or raster vectorization is the conversion of raster graphics into vector graphics. == Background == An image does not have any structure: it is just a collection of marks on paper, grains in film, or pixels in a bitmap. While such an image is useful, it has some limits. If the image is magnified enough, its artifacts appear. The halftone dots, film grains, and pixels become apparent. Images of sharp edges become fuzzy or jagged. See, for example, pixelation. Ideally, a vector image does not have the same problem. Edges and filled areas are represented as mathematical curves or gradients, and they can be magnified arbitrarily (though of course the final image must also be rasterized in to be rendered, and its quality depends on the quality of the rasterization algorithm for the given inputs). The task in vectorization is to convert a two-dimensional image into a two-dimensional vector representation of the image. It is not examining the image and attempting to recognize or extract a three-dimensional model that may be depicted; i.e. it is not a vision system. For most applications, vectorization also does not involve optical character recognition; characters are treated as lines, curves, or filled objects without attaching any significance to them. In vectorization, the shape of the character is preserved, so artistic embellishments remain. Vectorization is the inverse operation corresponding to rasterization, as integration is to differentiation. And, just as with these other operations, while rasterization is fairly straightforward and algorithmic, vectorization involves the reconstruction of lost information and therefore requires heuristic methods. Synthetic images such as maps, cartoons, logos, clip art, and technical drawings are suitable for vectorization. Those images could have been originally made as vector images because they are based on geometric shapes or drawn with simple curves. Continuous tone photographs (such as live portraits) are not good candidates for vectorization. The input to vectorization is an image, but an image may come in many forms such as a photograph, a drawing on paper, or one of several raster file formats. Programs that do raster-to-vector conversion may accept bitmap formats such as TIFF, BMP and PNG. The output is a vector file format. Common vector formats are SVG, DXF, EPS, EMF and AI. Vectorization can be used to update images or recover work. Personal computers often come with a simple paint program that produces a bitmap output file. These programs allow users to make simple illustrations by adding text, drawing outlines, and filling outlines with a specific color. Only the results of these operations (the pixels) are saved in the resulting bitmap; the drawing and filling operations are discarded. Vectorization can be used to recapture some of the information that was lost. Vectorization is also used to recover information that was originally in a vector format but has been lost or has become unavailable. A company may have commissioned a logo from a graphic arts firm. Although the graphics firm used a vector format, the client company may not have received a copy of that format. The company may then acquire a vector format by scanning and vectorizing a paper copy of the logo. == Process == Vectorization starts with an image. === Manual === The image can be vectorized manually. A person could look at the image, make some measurements, and then write the output file by hand. That was the case for the vectorization of a technical illustration about neutrinos. The illustration has a few geometric shapes and a lot of text; it was relatively easy to convert the shapes, and the SVG vector format allows the text (even subscripts and superscripts) to be entered easily. The original image did not have any curves (except for the text), so the conversion is straightforward. Curves make the conversion more complicated. Manual vectorization of complicated shapes can be facilitated by the tracing function built into some vector graphics editing programs. If the image is not yet in machine readable form, then it has to be scanned into a usable file format. Once there is a machine-readable bitmap, the image can be imported into a graphics editing program (such as Adobe Illustrator, CorelDRAW, or Inkscape). Then a person can manually trace the elements of the image using the program's editing features. Curves in the original image can be approximated with lines, arcs, and Bézier curves. An illustration program allows spline knots to be adjusted for a close fit. Manual vectorization is possible, but it can be tedious. Although graphics drawing programs have been around for a long time, artists may find the freehand drawing facilities awkward even when a drawing tablet is used. Instead of using a program, Pepper recommends making an initial sketch on paper. Instead of scanning the sketch and tracing it freehand in the computer, Pepper states: "Those proficient with a graphic tablet and stylus could make the following changes directly in CorelDRAW by using a scan of the sketch as an underlay and drawing over it. I prefer to use pen and ink, and a light table"; most of the final image was traced by hand in ink. Later the line-drawing image was scanned at 600 dpi, cleaned up in a paint program, and then automatically traced with a program. Once the black and white image was in the graphics program, some other elements were added and the figure was colored. Similarly, Ploch recreated a design from a digital photograph. The JPEG was imported and some "basic shapes" were traced by hand and colored in the graphics drawing program; more complex shapes were handled differently. Ploch used a bitmap editor to remove the background and crop the more complex image components. He then printed the image and traced it by hand onto tracing paper to get a clean black and white line drawing. That drawing was scanned and then vectorized with a program. === Automatic === Some programs automate the vectorization process. Example programs are Adobe Illustrator, Inkscape, Corel's PowerTRACE, and Potrace. Some of these programs have a command line interface while others are interactive that allow the user to adjust the conversion settings and view the result. Adobe Streamline is not only an interactive program, but it also allows a user to manually edit the input bitmap and the output curves. Corel's PowerTRACE is accessed through CorelDRAW; CorelDRAW can be used to modify the input bitmap and edit the output curves. Adobe Illustrator has a facility to trace individual curves. Automated programs can have mixed results. A program (PowerTRACE) was used to convert a PNG map to SVG. The program did a good job on the map boundaries (the most tedious task in the tracing) and the settings dropped out all the text (small objects). The text was manually re-inserted. Other conversions may not go as well. The results depend on having high-quality scans, reasonable settings, and good algorithms. Scanned images often have a lot of noise, which can require additional work to clean up. == Options == There are many different image styles and possibilities, and no single vectorization method works well on all images. Consequently, vectorization programs have many options that influence the result. One issue is what the predominant shapes are. If the image is of a fill-in form, then it will probably have just vertical and horizontal lines of a constant width. The program's vectorization should take that into account. On the other hand, a CAD drawing may have lines at any angle, there may be curved lines, and there may be several line weights (thick for objects and thin for dimension lines). Instead of (or in addition to) curves, the image may contain outlines filled with the same color. Adobe Streamline allows users to select a combination of line recognition (horizontal and vertical lines), centerline recognition, or outline recognition. Streamline also allows small outline shapes to be thrown out; the notion is such small shapes are noise. The user may set the noise level between 0 and 1000; an outline that has fewer pixels than that setting is discarded. Another issue is the number of colors in the image. Even images that were created as black on white drawings may end up with many shades of gray. Some line-drawing routines employ anti-aliasing; a pixel completely covered by the line will be black, but a pixel that is only partially covered will be gray. If the original image is on paper and is scanned, there is a similar result: edge pixels will be gray. Sometimes images are compressed (e.g., JPEG images), and the compression will introduce gray levels. Many of the vectorization programs will group same-color pixels into lines, curves, or outlined shapes. If each possible color is grouped into its object, there can be an enormous number of objects. Instead, the user is asked to s

    Read more →
  • Is an AI Sales Assistant Worth It in 2026?

    Is an AI Sales Assistant Worth It in 2026?

    Shopping for the best AI sales assistant? An AI sales assistant is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI sales assistant slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →
  • Emma Brunskill

    Emma Brunskill

    Emma Patricia Brunskill is an American computer scientist. Her research combines machine learning with human–computer interaction by studying the effects of AI systems in human-centered applications including educational software and healthcare, and the theory of reinforcement learning in situations where mistakes impose high risks or costs. She is an associate professor of computer science at Stanford University, where she also holds a courtesy appointment in the Stanford Graduate School of Education and is an affiliate of the King Center on Global Development. == Education and career == Brunskill grew up in Seattle and Edmonds, Washington, and entered the University of Washington at age 15. She graduated magna cum laude in 2000, with a bachelor's degree in computer engineering and physics. A Rhodes Scholarship took her to Magdalen College, Oxford in England, where she received a master's degree in neuroscience in 2002. After a summer working in Rwanda, she became a graduate student of computer science at the Massachusetts Institute of Technology, where she completed her Ph.D. in 2009. Her doctoral dissertation, Compact parametric models for efficient sequential decision making in high-dimensional, uncertain domains, was supervised by Nicholas Roy. After working as an NSF Postdoctoral Research Fellow at the University of California, Berkeley, she joined Carnegie Mellon University (CMU) in 2011 as an assistant professor of computer science. She moved from CMU to Stanford University in 2017. == Recognition == Brunskill was a 2014 recipient of the National Science Foundation CAREER Award and a 2015 recipient of the Office of Naval Research Young Investigator Award. She was one of two alumni of the University of Washington's Paul G. Allen School of Computer Science and Engineering to be honored in 2020 by the school's Alumni Impact Awards. She was elected as a Fellow of the Association for the Advancement of Artificial Intelligence in 2025, "for significant contributions to the field of reinforcement learning, and applications for societal benefit, in particular AI for education".

    Read more →