AI For Business Edinburgh

AI For Business Edinburgh — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Intelligent control


    Intelligent control is a class of control techniques that use artificial intelligence computing approaches such as neural networks, Bayesian probability, fuzzy logic, machine learning, reinforcement learning, evolutionary computation and genetic algorithms.

    == Overview ==

    Intelligent control can be divided into the following major sub-domains: neural network control, machine learning control, reinforcement learning, Bayesian control, fuzzy control, neuro-fuzzy control, expert systems and genetic control. New control techniques are created continuously as new models of intelligent behavior are created and computational methods are developed to support them.

    === Neural network controller ===

    Neural networks have been used to solve problems in almost all spheres of science and technology. Neural network control involves two steps: system identification and control. It has been shown that a feedforward network with nonlinear, continuous and differentiable activation functions has universal approximation capability. Recurrent networks have also been used for system identification. Given a set of input-output data pairs, system identification aims to form a mapping among these data pairs. Such a network is supposed to capture the dynamics of a system. For the control part, deep reinforcement learning has shown its ability to control complex systems.

    === Bayesian controllers ===

    Bayesian probability has produced a number of algorithms that are in common use in many advanced control systems, serving as state space estimators of some variables that are used in the controller. The Kalman filter and the particle filter are two examples of popular Bayesian control components. The Bayesian approach to controller design often requires considerable effort in deriving the so-called system model and measurement model, which are the mathematical relationships linking the state variables to the sensor measurements available in the controlled system. In this respect, it is very closely linked to the system-theoretic approach to control design.
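    As a rough illustration of how a system model and a measurement model feed a Bayesian state estimator, here is a minimal scalar Kalman filter sketch. The linear models x_k = a·x_{k-1} + w and z_k = h·x_k + v, the class name `ScalarKalman`, and all parameter values are assumptions chosen for illustration; they are not taken from the text above.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """One-dimensional Kalman filter (illustrative sketch).

    Assumed system model:      x_k = a * x_{k-1} + w,  Var(w) = q
    Assumed measurement model: z_k = h * x_k + v,      Var(v) = r
    """
    a: float          # state transition coefficient (hypothetical value)
    h: float          # state-to-sensor gain (hypothetical value)
    q: float          # process noise variance
    r: float          # measurement noise variance
    x: float = 0.0    # current state estimate
    p: float = 1.0    # variance of the current estimate

    def predict(self) -> None:
        # Propagate the estimate and its variance through the system model.
        self.x = self.a * self.x
        self.p = self.a * self.p * self.a + self.q

    def update(self, z: float) -> None:
        # Blend the prediction with measurement z using the Kalman gain.
        gain = self.p * self.h / (self.h * self.p * self.h + self.r)
        self.x = self.x + gain * (z - self.h * self.x)
        self.p = (1.0 - gain * self.h) * self.p


def estimate(measurements, kf):
    """Run predict/update over a sequence and return the state estimates."""
    estimates = []
    for z in measurements:
        kf.predict()
        kf.update(z)
        estimates.append(kf.x)
    return estimates
```

    With a constant signal and no process noise, the estimate moves steadily toward the measured value while its variance shrinks, which is the behaviour a controller would rely on when using the filter as a state estimator.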

  • Top 10 AI Text-to-video Tools Compared (2026)


    Trying to pick the best AI text-to-video tool? An AI text-to-video tool is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI text-to-video tool slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

  • Top 10 AI Copywriting Tools Compared (2026)


    In search of the best AI copywriting tool? An AI copywriting tool is software that uses machine learning to help you get more done — it turns a rough idea into a polished result in seconds. When choosing one, weigh output quality, pricing, export formats, and how well it fits the tools you already use. Whether you are a beginner or a pro, the right AI copywriting tool slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

  • Project Bergamot


    Project Bergamot is a joint project between several European universities and Mozilla for the development of machine translation software based on artificial neural networks, which is intended for local execution on end-user devices. The software library that was created and the associated language models were made available to the general public as Free Software. Execution requires an x86 CPU with SSE4.1 instruction set extensions. In 2022, Devin Coldewey of TechCrunch judged the translation quality to be "more than adequate", but considered Firefox Translations to be not yet fully mature.

    == Usage ==

    Mozilla used the Bergamot Translator to add a web page translation feature to its Firefox web browser; the lack of such a feature had previously been considered an important gap in Firefox's feature set. It is often compared to the much older corresponding feature in Google Chrome, which utilizes a cloud-based background service. In contrast, Firefox Translations does not require any data to leave the user's computer, resulting in advantages in terms of data protection, availability and possibly response times. Only a new language model needs to be installed the first time a new language is encountered. Greater independence from large technology companies and their interests is also mentioned as an important advantage. Mozilla thus strengthened its position as an alternative software vendor with a particular focus on data protection and security. Mozilla followed up with a similar feature, speech recognition for spoken user input, based on whisperfile.

    On the other hand, slow translation times have been observed, especially on older devices. Also, Firefox Translations initially supported far fewer language pairs than other major translation services and is only gradually adding new models. To address this, the training pipeline has also been made available to interested parties to enable the creation of missing language models.
    TranslateLocally is a Firefox-independent translation software based on the Bergamot Translator. It is available as an (Electron-based) standalone application or as an extension for Chromium-based web browsers.

    == History ==

    Mozilla had already tried to get a (cloud-based) web content translation feature into Firefox a few years before Project Bergamot, but had failed because of the financial challenge. Microsoft had already delivered offline capabilities for its translation software in 2018. Google soon followed suit, Apple two years later.

    The software is based on the free translation framework Marian, which the University of Edinburgh had previously developed in cooperation with Microsoft, and which is itself based on the Nematus toolkit that was presented in 2017. Under the leadership of the University of Edinburgh, a development consortium was formed with the Mozilla Corporation and the additional European universities of Prague, Sheffield and Tartu. In 2018, it obtained 3 million euros of funding from the EU's Horizon 2020 programme.

    Firefox Translations was initially provided as an add-on. A first functional demonstration prototype was presented in October 2019. Beta version 117 had the feature integrated directly into the browser; the official release followed in version 118 in September 2023. Both as an add-on and as part of Firefox, the code and the models are subject to version 2 of the Mozilla Public License.

    Since 2022, the EU-funded HPLT project has been creating new language models. It involves additional partners, including the universities of Helsinki, Turku, Oslo and other partners from Spain, Norway and the Czech Republic.

  • STIT logic


    STIT logic (from seeing to it that) is a family of modal and branching-time logics for reasoning about agency and choice. A typical STIT operator has the form $[i\ \mathsf{stit}:\varphi]$, usually read as "agent $i$ sees to it that $\varphi$", and is interpreted in models where agents choose between alternative possible futures. STIT logics are used in action theory, deontic logic, epistemic logic, and the theory of intelligent agents to formalise notions such as "could have done otherwise", responsibility, joint action, and strategic ability in an indeterministic world.

    == Etymology ==

    The acronym STIT comes from the English phrase "seeing to it that", introduced in influential work by Nuel Belnap and Michael Perloff on the logical analysis of agentive expressions. In this tradition, "to see to it that $\varphi$" is treated as a primitive agency operator, rather than being reduced to ordinary modal necessity.

    == History ==

    Modern STIT logic arose in the 1980s in the context of branching-time semantics and formal theories of agency. Belnap and Perloff's article "Seeing to it that: A canonical form for agentives" introduced the idea of treating expressions of the form "agent $i$ sees to it that $\varphi$" as a primitive modal operator, and analysed such sentences using a branching tree of moments and histories. This approach was further developed in a series of papers on indeterminism and agency and provided the conceptual core for later STIT formalisms.

    In the 1990s the basic formal systems of STIT logic were worked out. Horty and Belnap's influential paper on the deliberative STIT operator distinguished between a "Chellas" STIT that merely records the result of an agent's present choice and a "deliberative" STIT that requires the agent's choice to make a difference, and connected STIT with issues of action, omission, ability and obligation.
Around the same time, Ming Xu proved completeness and decidability results for basic STIT systems, including a single-agent logic with Kripke-style semantics and axiomatizations for multi-agent deliberative STIT, thereby establishing STIT as a well-behaved normal modal framework. This early work was systematised in Belnap, Perloff and Xu's monograph Facing the Future: Agents and Choices in Our Indeterminist World, which presents a general branching-time semantics for individual and group STIT operators, discusses independence-of-agents conditions and articulates the metaphysical picture of an indeterministic "tree" of moments. At roughly the same time, Horty's book Agency and Deontic Logic developed deontic STIT logics in which obligations are tied to agents' available choices rather than to static states of affairs, and used the resulting systems to analyse "ought implies can", contrary-to-duty obligations and deontic paradoxes. These works helped to position STIT at the intersection of action theory, temporal logic and deontic logic. From the late 1990s and 2000s onward, STIT logics were combined with epistemic, temporal and strategic modalities. Broersen introduced complete STIT logics for knowledge and action and deontic-epistemic STIT systems that distinguish different modes of mens rea, with applications to responsibility and the specification of multi-agent systems. Work on group and coalitional agency investigated axiomatisations and complexity results for group STIT logics, and related STIT-based analyses of agency to coalition logic and alternating-time temporal logic (ATL) by exhibiting formal embeddings between the frameworks. Explicit temporal operators were added to STIT in so-called temporal STIT logics. Lorini proposed a temporal STIT with "next" and "until" operators along histories and showed how it can be applied to normative reasoning about ongoing behaviour and commitments. 
Ciuni and Lorini compared different semantics for temporal STIT, clarifying the relationships between branching-time, game-based and epistemic approaches, while Boudou and Lorini gave a semantics for temporal STIT based on concurrent game structures, thus strengthening links with standard models of multi-agent interaction used for ATL and strategy logic. In parallel, complexity-theoretic work by Balbiani, Herzig and Troquard and by Schwarzentruber and co-authors investigated the satisfiability and model-checking problems for various STIT fragments, showing for instance that many expressive group STIT logics are undecidable or of high computational complexity. In the 2010s, STIT ideas were combined with justification logic, imagination operators and refined deontic notions. Justification STIT logics, developed by Olkhovikov and others, merge explicit justifications with STIT-style agency so that producing a proof can itself be treated as an action that brings about knowledge, and they come with completeness and decidability results. Olkhovikov and Wansing introduced STIT imagination logics, together with axiomatic systems and tableau calculi, to model acts of voluntary imagining and their role in doxastic control. Other authors have proposed STIT-based logics of responsibility, blameworthiness and intentionality for use in philosophical and AI settings. Xu's survey article "Combinations of STIT with Ought and Know" (2015) reviews many of these developments and emphasises the interplay between deontic and epistemic STIT logics. Current research on STIT focuses on proof theory, automated reasoning and richer expressive resources. Lyon and van Berkel, building on earlier work on labelled calculi for STIT, have developed cut-free sequent systems and proof-search algorithms that yield syntactic decision procedures for a range of deontic and non-deontic multi-agent STIT logics and support applications such as duty checking and compliance checking in autonomous systems. 
    Sawasaki has proposed first-order cstit-based STIT logics that can distinguish de re and de dicto readings of agency statements and has proved strong completeness results for Hilbert systems over finite models, moving the STIT programme beyond the purely propositional level. Further work investigates interpreted-system and computationally grounded semantics for STIT and its extensions in order to model the behaviour of autonomous agents in multi-agent settings, and proposes STIT-based semantics for epistemic notions based on patterns of information disclosure in interactive systems.

    == Branching-time semantics ==

    STIT logics are usually interpreted over branching-time models. A standard STIT frame consists of:

      • a non-empty set of moments $T$, partially ordered by $<$ so that $(T,<)$ forms a tree (every pair of moments with a common predecessor has a greatest lower bound);
      • a set of histories, each history being a maximal linearly ordered subset of $T$;
      • a non-empty set of agents $Ag$;
      • for each agent $i\in Ag$ and moment $m$, a choice function $\mathsf{choice}_{i}^{m}$ that partitions the set of histories passing through $m$ into choice cells.

    The idea is that a moment represents a time at which choices are made, and histories represent complete possible future courses of events. At each moment, each agent's choice corresponds to selecting one of the available cells of histories determined by their choice function.

    Formulas are evaluated at pairs $(m,h)$ of a moment and a history through that moment (sometimes written $m/h$). A valuation assigns truth-values to atomic propositions at such indices; Boolean connectives are interpreted pointwise as in Kripke-style modal logic.
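    The frame components described above can be sketched as plain data structures. The following is a minimal illustration only: the four-history toy frame, the agent names `a` and `b`, and the helper names `histories_through`, `is_partition` and `choice_cell` are all hypothetical and not drawn from the STIT literature.

```python
from itertools import chain

# Toy frame (assumed for illustration): root moment m0 with four successors.
# Each history is a maximal chain of moments, stored as a set.
HISTORIES = {
    "h1": {"m0", "m1"},
    "h2": {"m0", "m2"},
    "h3": {"m0", "m3"},
    "h4": {"m0", "m4"},
}

# Choice partitions at m0 for two hypothetical agents.
CHOICE = {
    ("a", "m0"): [{"h1", "h2"}, {"h3", "h4"}],
    ("b", "m0"): [{"h1", "h3"}, {"h2", "h4"}],
}


def histories_through(moment, histories):
    """Names of the histories that pass through `moment`."""
    return {name for name, moments in histories.items() if moment in moments}


def is_partition(cells, universe):
    """True if the cells are non-empty, pairwise disjoint and cover `universe`."""
    if any(len(cell) == 0 for cell in cells):
        return False
    members = list(chain.from_iterable(cells))
    return len(members) == len(set(members)) and set(members) == set(universe)


def choice_cell(choice, agent, moment, history):
    """The cell of the agent's choice partition at `moment` containing `history`."""
    for cell in choice[(agent, moment)]:
        if history in cell:
            return frozenset(cell)
    raise KeyError(f"{history} does not pass through {moment}")
```

    An evaluation index $(m,h)$ then corresponds to a pair such as `("m0", "h3")`, and `choice_cell(CHOICE, "a", "m0", "h3")` returns the set of histories that agent `a`'s choice at that index leaves open.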
    == Chellas and deliberative STIT operators ==

    Several STIT operators have been distinguished in the literature. A common approach uses two closely related operators, often called Chellas STIT and deliberative STIT. Let $H_{m}$ be the set of histories passing through a moment $m$, and write

    $$\llbracket\varphi\rrbracket_{m}=\{h\in H_{m}\mid M,m/h\models \varphi \}$$

    for the set of histories at $m$ where $\varphi$ holds. The Chellas STIT operator, often written $[i\ \mathsf{cstit}:\varphi]$, is given by

    $$M,m/h\models [i\ \mathsf{cstit}:\varphi]\quad \text{iff}\quad \mathsf{choice}_{i}^{m}(h)\subseteq \llbracket\varphi\rrbracket_{m}.$$

    Intuitively, agent $i$ sees to it that $\varphi$ if $\varphi$ holds at all histories compatible with their present choice. The deliberative STIT operator, $[i\ \mathsf{dstit}:\varphi]$, adds

  • Simon Godsill


    Simon John Godsill (born 2 December 1965) is professor of statistical signal processing at the University of Cambridge, and a professorial fellow at Corpus Christi College. He is also a member of the Centre for Science and Policy. His main area of research is Bayesian statistics and stochastic sampling methodologies, particularly particle filtering.

    == Education ==

    Godsill obtained both undergraduate and Ph.D. degrees from the Department of Engineering at Cambridge University, whilst a member of Selwyn College. He obtained a first class degree in the Electrical and Information Sciences Tripos. The title of his 1993 Ph.D. thesis was "The Restoration of Degraded Audio Signals" and his Ph.D. supervisor was Peter Rayner, whom he shared with Michael Richard Lynch.

    == Career ==

    Godsill has published over 250 articles in peer-reviewed journals, along with the books Digital audio restoration: a statistical model based approach and Compressed sensing & sparse filtering.

    == Business interests ==

    Godsill is currently a director of CEDAR Audio Ltd, a Cambridge-based company that applies Bayesian mathematics to noise reduction in audio data. In February 2005, the company received a Sci-Tech Academy Award (a 'Technical Oscar') for its services to the movie industry, and a stream of innovations appeared over the following years with corresponding recognition, including induction into the Audio Technology Hall of Fame (2008) and a Cinema Audio Society Award (2009). Godsill is also a director at Input Dynamics Ltd, a Cambridge-based company that applies Bayesian techniques to touch screen technology. Godsill is involved with the research effort at BMLL Technologies, a Cambridge spin-off that applies machine learning in the financial sector.

  • Probabilistic automaton


    In mathematics and computer science, the probabilistic automaton (PA) is a generalization of the nondeterministic finite automaton; it includes the probability of a given transition into the transition function, turning it into a transition matrix. Thus, the probabilistic automaton also generalizes the concepts of a Markov chain and of a subshift of finite type. The languages recognized by probabilistic automata are called stochastic languages; these include the regular languages as a subset. The number of stochastic languages is uncountable. The concept was introduced by Michael O. Rabin in 1963; a certain special case is sometimes known as the Rabin automaton (not to be confused with the subclass of ω-automata also referred to as Rabin automata). In recent years, a variant has been formulated in terms of quantum probabilities, the quantum finite automaton.

    == Informal Description ==

    For a given initial state and input character, a deterministic finite automaton (DFA) has exactly one next state, and a nondeterministic finite automaton (NFA) has a set of next states. A probabilistic automaton (PA) instead has a weighted set (or vector) of next states, where the weights must sum to 1 and therefore can be interpreted as probabilities (making it a stochastic vector). The notions of state and acceptance must also be modified to reflect the introduction of these weights. The state of the machine at a given step must now also be represented by a stochastic vector of states, and an input is accepted if the total probability of being in an accepting state exceeds some cut-off.

    A PA is in some sense a half-way step from deterministic to non-deterministic, as it allows a set of next states but with restrictions on their weights. However, this is somewhat misleading, as the PA uses real numbers to define the weights, and real numbers are absent from the definitions of both DFAs and NFAs.
    This additional freedom enables them to decide languages that are not regular, such as the p-adic languages with irrational parameters. As such, PAs are more powerful than both DFAs and NFAs (which are famously equally powerful).

    == Formal Definition ==

    The probabilistic automaton may be defined as an extension of a nondeterministic finite automaton $(Q,\Sigma ,\delta ,q_{0},F)$, together with two probabilities: the probability $P$ of a particular state transition taking place, and with the initial state $q_{0}$ replaced by a stochastic vector giving the probability of the automaton being in a given initial state.

    For the ordinary non-deterministic finite automaton, one has

      • a finite set of states $Q$
      • a finite set of input symbols $\Sigma$
      • a transition function $\delta :Q\times \Sigma \to \wp (Q)$
      • a set of states $F$ distinguished as accepting (or final) states $F\subseteq Q$.

    Here, $\wp (Q)$ denotes the power set of $Q$.

    By use of currying, the transition function $\delta :Q\times \Sigma \to \wp (Q)$ of a non-deterministic finite automaton can be written as a membership function

    $$\delta :Q\times \Sigma \times Q\to \{0,1\}$$

    so that $\delta (q,a,q^{\prime })=1$ if $q^{\prime }\in \delta (q,a)$ and $0$ otherwise.
    The curried transition function can be understood to be a matrix with matrix entries

    $$\left[\theta _{a}\right]_{qq^{\prime }}=\delta (q,a,q^{\prime })$$

    The matrix $\theta _{a}$ is then a square matrix, whose entries are zero or one, indicating whether a transition $q{\stackrel {a}{\rightarrow }}q^{\prime }$ is allowed by the NFA. Such a transition matrix is always defined for a non-deterministic finite automaton.

    The probabilistic automaton replaces these matrices by a family of right stochastic matrices $P_{a}$, one for each symbol $a$ in the alphabet $\Sigma$, so that the probability of a transition is given by

    $$\left[P_{a}\right]_{qq^{\prime }}$$

    A state change from some state to any state must occur with probability one, and so one must have

    $$\sum _{q^{\prime }}\left[P_{a}\right]_{qq^{\prime }}=1$$

    for all input letters $a$ and internal states $q$. The initial state of a probabilistic automaton is given by a row vector $v$, whose components are the probabilities of the individual initial states $q$ and add to 1:

    $$\sum _{q}\left[v\right]_{q}=1$$

    The transition matrix acts on the right, so that the state of the probabilistic automaton, after consuming the input string $abc$, would be

    $$vP_{a}P_{b}P_{c}$$

    In particular, the state of a probabilistic automaton is always a stochastic vector, since the product of any two stochastic matrices is a stochastic matrix, and the product of a stochastic vector and a stochastic matrix is again a stochastic vector. This vector is sometimes called the distribution of states, emphasizing that it is a discrete probability distribution.
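    The matrix formulation above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the two-state automaton, its matrices `P["a"]` and `P["b"]`, and the helper names `is_row_stochastic`, `step` and `run` are made up for the example.

```python
def is_row_stochastic(matrix, tol=1e-9):
    """True if every entry is non-negative and every row sums to 1."""
    return all(
        all(x >= 0 for x in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def step(v, matrix):
    """Row vector times matrix: (v P)[q'] = sum over q of v[q] * P[q][q']."""
    return [
        sum(v[q] * matrix[q][j] for q in range(len(v)))
        for j in range(len(matrix[0]))
    ]


def run(v, matrices, word):
    """State distribution after consuming `word`, i.e. v P_a P_b ... applied left to right."""
    for symbol in word:
        v = step(v, matrices[symbol])
    return v


# Hypothetical two-state automaton over the alphabet {a, b}.
P = {
    "a": [[0.5, 0.5], [0.0, 1.0]],
    "b": [[1.0, 0.0], [0.5, 0.5]],
}
v0 = [1.0, 0.0]  # start in state 0 with probability one

distribution = run(v0, P, "ab")
```

    Because each matrix is right stochastic, the entries of `distribution` still sum to 1 after any input word, matching the observation that the automaton's state is always a stochastic vector.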
    Strictly speaking, the definition of a probabilistic automaton does not require the mechanics of the non-deterministic automaton, which may be dispensed with. Formally, a probabilistic automaton PA is defined as the tuple $(Q,\Sigma ,P,v,F)$. A Rabin automaton is one for which the initial distribution $v$ is a coordinate vector; that is, all of its entries are zero except for a single entry equal to one.

    == Stochastic languages ==

    The languages recognized by probabilistic automata are called stochastic languages. They include the regular languages as a subset.

    Let $F=Q_{\text{accept}}\subseteq Q$ be the set of "accepting" or "final" states of the automaton. By abuse of notation, $Q_{\text{accept}}$ can also be understood to be the column vector that is the membership function for $Q_{\text{accept}}$; that is, it has a 1 at the places corresponding to elements in $Q_{\text{accept}}$, and a zero otherwise. This vector may be contracted with the internal state probability to form a scalar. The language recognized by a specific automaton is then defined as

    $$L_{\eta }=\{s\in \Sigma ^{*}\mid vP_{s}Q_{\text{accept}}>\eta \}$$

    where $\Sigma ^{*}$ is the set of all strings in the alphabet $\Sigma$ (so that $*$ is the Kleene star) and $P_{s}$ is the product of the transition matrices for the letters of $s$, taken in order. The language depends on the value of the cut-point $\eta$, normally taken to be in the range $0\leq \eta <1$.

    A language is called η-stochastic if and only if there exists some PA that recognizes the language, for fixed $\eta$. A language is called stochastic if and only if there is some $0\leq \eta <1$ for which $L_{\eta }$ is η-stochastic.
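    The cut-point acceptance rule can be illustrated with a short sketch that contracts the state distribution with the set of accepting states and compares the result against $\eta$. The automaton below, the function names `acceptance_probability` and `accepts`, and the chosen cut-point are hypothetical and serve only as an example.

```python
def acceptance_probability(v, matrices, accepting, word):
    """Compute v * P_s * Q_accept: the probability of ending in an accepting state."""
    dist = list(v)
    for symbol in word:
        m = matrices[symbol]
        # Row vector times the right stochastic matrix for this symbol.
        dist = [
            sum(dist[q] * m[q][j] for q in range(len(dist)))
            for j in range(len(m[0]))
        ]
    return sum(dist[q] for q in accepting)


def accepts(v, matrices, accepting, word, eta):
    """A word is in L_eta when its acceptance probability strictly exceeds eta."""
    return acceptance_probability(v, matrices, accepting, word) > eta


# Hypothetical two-state automaton: state 1 is accepting, start in state 0.
MATRICES = {
    "a": [[0.5, 0.5], [0.0, 1.0]],
    "b": [[1.0, 0.0], [0.5, 0.5]],
}
START = [1.0, 0.0]
ACCEPTING = {1}
ETA = 0.5
```

    Note that the comparison is strict, so a word whose acceptance probability equals the cut-point exactly is not in the language; in this toy automaton that is the case for the single-letter word `a`.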
    A cut-point is said to be an isolated cut-point if and only if there exists a $\delta >0$ such that

    $$\vert vP_{s}Q_{\text{accept}}-\eta \vert \geq \delta$$

    for all $s\in \Sigma ^{*}$.

    == Properties ==

    Every regular language is stochastic, and more strongly, every regular language is η-stochastic. A weak converse is that every 0-stochastic language is regular; however, the general converse does not hold: there are stochastic languages that are not regular. Every η-stochastic language is stochastic, for some $0<\eta <1$. Every stochastic language is representable by a Rabin automaton. If $\eta$ is an isolated cut-point, then $L_{\eta }$ is a regular language.

    == p-adic languages ==

    The p-adic languages provide an example of a stochastic language that is not regular, and also show that the number of stochastic languages is uncountable. A p-adic language is defined as the set of strings

    $$L_{\eta }(p)=\{n_{1}n_{2}n_{3}\ldots \mid 0\leq n_{k}<p{\text{ and }}0.n_{1}n_{2}n_{3}\ldots >\eta \}$$

    in the letters $0,1,2,\ldots ,(p-1)$. That is, a p-adic language is merely the set of real numbers in [0, 1], written in base-p, such that they are greater than $\eta$. It is straightforward to show that all p-adic languages are stochastic. In particular, this implies that the number of stochastic languages is uncountable. A p-adic

  • Cortana (virtual assistant)


    Cortana is a discontinued virtual assistant developed by Microsoft that used the Bing search engine to perform tasks such as setting reminders and answering questions for users. Cortana was available in English, Portuguese, French, German, Italian, Spanish, Chinese, and Japanese language editions, depending on the software platform and region in which it was used. In 2019, Microsoft began reducing the prevalence of Cortana and converting it from an assistant into different software integrations. It was split from the Windows 10 search bar in April 2019. In January 2020, the Cortana mobile app was removed from certain markets, and on March 31, 2021, the Cortana mobile app was shut down globally. On June 2, 2023, Microsoft announced that support for the Cortana standalone app on Microsoft Windows would end in late 2023 and would be replaced by Microsoft Copilot, an AI chatbot. Support for Cortana in the Microsoft Outlook and Microsoft 365 mobile apps was discontinued in fall of 2023. == History == === Beginnings (2009–2014) === The development of Cortana started in 2009 in the Microsoft Speech products team with general manager Zig Serafin and Chief Scientist Larry Heck. Heck and Serafin established the vision, mission, and long-range plan for Microsoft's digital personal assistant and they built a team with the expertise to create the initial prototypes for Cortana. Some of the key researchers in these early efforts included Microsoft Research researchers Dilek Hakkani-Tür, Gokhan Tur, Andreas Stolcke, and Malcolm Slaney, research software developer Madhu Chinthakunta, and user experience designer Lisa Stifelman. To develop the Cortana digital assistant, the team interviewed human personal assistants. The interviews inspired a number of unique features in Cortana, including the assistant's "notebook" feature. Originally, Cortana was meant to be only a codename, but a petition on Windows Phone's UserVoice site proved to be popular and made the codename official. 
Cortana was demonstrated for the first time at the Microsoft Build developer conference in San Francisco in April 2014. It was launched as a key ingredient of Microsoft's planned "makeover" of future operating systems for Windows Phone and Windows. It was named after Cortana, a synthetic intelligence character in Microsoft's Halo video game franchise originating in Bungie folklore, with Jen Taylor, the character's voice actress, returning to voice the personal assistant's US-specific version. === Expansion (2015–2018) === In January 2015, Microsoft announced the availability of Cortana for Windows 10 desktops and mobile devices as part of merging Windows Phone into the operating system at large. On May 26, 2015, Microsoft announced that Cortana would also be available on other mobile platforms. An Android release was set for July 2015, but the Android APK file containing Cortana was leaked ahead of its release. It was officially released, along with an iOS version, in December 2015. During E3 2015, Microsoft announced that Cortana would come to the Xbox One as part of a universally designed Windows 10 update for the console. Microsoft integrated Cortana into numerous products such as Microsoft Edge. Microsoft's Cortana assistant was deeply integrated into the browser. Cortana was able to find opening hours when on restaurant sites, show retail coupons for websites, or show weather information in the address bar. At the Worldwide Partners Conference 2015 Microsoft demonstrated Cortana integration with products such as GigJam. Conversely, Microsoft announced in late April 2016 that it would block anything other than Bing and Edge from being used to complete Cortana searches, again raising questions of anti-competitive practices by the company. Microsoft's "Windows in the car" concept included Cortana. The concept makes it possible for drivers to make restaurant reservations and see places before they go there. 
At Microsoft Build 2016, Microsoft announced plans to integrate Cortana into Skype (Microsoft's video-conferencing and instant messaging service) as a bot to allow users to order food, book trips, transcribe video messages and make calendar appointments through Cortana in addition to other bots. As of 2016, Cortana was able to underline certain words and phrases in Skype conversations that relate to contacts and corporations. A writer from Engadget has criticised the Cortana integration in Skype for responding only to very specific keywords, feeling as if she was "chatting with a search engine" due to the impersonal way the bots replied to certain words such as "Hello" causing the Bing Music bot to bring up Adele's song of that name. Microsoft also announced at Microsoft Build 2016 that Cortana would be able to cloud-synchronise notifications between Windows 10 Mobile's and Windows 10's Action Center, as well as notifications from Android devices. In December 2016, Microsoft announced the preview of Calendar.help, a service that enabled people to delegate the scheduling of meetings to Cortana. Users interact with Cortana by including her in email conversations. Cortana would then check people's availability in Outlook Calendar or Google Calendar, and work with others Cc'd on the email to schedule the meeting. The service relied on automation and human-based computation. In May 2017, Microsoft announced INVOKE, a voice-activated speaker featuring Cortana, in collaboration with Harman Kardon. The premium speaker has a cylindrical design and offers 360-degree sound, the ability to make and receive calls with Skype, and all of the other features currently available with Cortana. In 2017, Microsoft partnered with Amazon to integrate Echo and Cortana with each other, allowing users of each smart assistant to summon the other via a command. This feature preview was released in August 2018. 
Windows 10 users could say "Hey Cortana, open Alexa" and Echo users could say "Alexa, open Cortana" to summon the other assistant.

=== Decreasing focus and discontinuation (2019–2024) ===

In January 2019, Microsoft CEO Satya Nadella stated that he no longer saw Cortana as a direct competitor against Alexa and Siri. Shortly thereafter, Microsoft began reducing the prevalence of Cortana and converting it from an assistant into different software integrations. It was split from the Windows 10 search bar in April 2019. In January 2020, the Cortana mobile app was removed from certain markets, and then, on July 24, 2020, Cortana was removed from the Xbox dashboard as part of a redesign. On January 31, 2021, Microsoft removed the Cortana mobile application in many markets, including the UK, Australia, Germany, Mexico, China, Spain, Canada, and India. On March 31, 2021, Microsoft shut down the Cortana apps globally for iOS and Android and removed the apps entirely from their corresponding app stores. To access previously recorded content, users had to use Cortana on Windows 10 or other specialized Microsoft applications. Microsoft also reduced emphasis on Cortana in Windows with the 2021 release of Windows 11. Cortana was not used during the device setup process or pinned to the taskbar by default. On June 2, 2023, Microsoft announced that the Cortana standalone app on Windows 10 and Windows 11 would be shut down later in the year. In its support article, Microsoft listed several alternatives, most of which have since been rebranded as Microsoft Copilot. The company added that the change would not impact Cortana in Office 365 and Teams environments. On August 11, 2023, Microsoft updated the Cortana standalone app in Windows to inform users that it was deprecated and could no longer be used. Microsoft's support article announcing the deprecation of Cortana was updated to reflect this change.
Along with the deprecation of the standalone app, it was announced that Cortana support in Teams mobile, Microsoft Teams displays, and Teams rooms would end in late 2023. The support article stated that Cortana in the "Play my emails" feature of the Microsoft Outlook mobile app would continue to be available. On May 22, 2024, Microsoft announced the Windows 11 24H2 update, which removed Cortana, Tips, and WordPad from systems. In June 2024, the support article was updated to state that Cortana had been removed from voice search and the "Play my emails" feature of the Microsoft Outlook mobile app, officially marking the discontinuation of Cortana across all Microsoft products.

== Functionality ==

Cortana was able to set reminders, recognize natural voice without the requirement for keyboard input, and answer questions using information from the Bing search engine. Searches in Windows 10 were made only with the Microsoft Bing search engine, and all links opened in Microsoft Edge, except when a screen reader such as Narrator was in use, in which case links opened in Internet Explorer. Windows Phone 8.1's universal Bing SmartSearch features were incorporated into Cortana, which replaced the

  • Cross-entropy method


    The cross-entropy (CE) method is a Monte Carlo method for importance sampling and optimization. It is applicable to both combinatorial and continuous problems, with either a static or noisy objective. The method approximates the optimal importance sampling estimator by repeating two phases:

    1. Draw a sample from a probability distribution.
    2. Minimize the cross-entropy between this distribution and a target distribution to produce a better sample in the next iteration.

    Reuven Rubinstein developed the method in the context of rare-event simulation, where tiny probabilities must be estimated, for example in network reliability analysis, queueing models, or performance analysis of telecommunication systems. The method has also been applied to the traveling salesman, quadratic assignment, DNA sequence alignment, max-cut and buffer allocation problems.

    == Estimation via importance sampling ==

    Consider the general problem of estimating the quantity $\ell = \mathbb{E}_{\mathbf{u}}[H(\mathbf{X})] = \int H(\mathbf{x})\, f(\mathbf{x};\mathbf{u})\, \mathrm{d}\mathbf{x}$, where $H$ is some performance function and $f(\mathbf{x};\mathbf{u})$ is a member of some parametric family of distributions. Using importance sampling, this quantity can be estimated as $\hat{\ell} = \frac{1}{N}\sum_{i=1}^{N} H(\mathbf{X}_i)\,\frac{f(\mathbf{X}_i;\mathbf{u})}{g(\mathbf{X}_i)}$, where $\mathbf{X}_1,\dots,\mathbf{X}_N$ is a random sample from $g$. For positive $H$, the theoretically optimal importance sampling density (PDF) is given by $g^{*}(\mathbf{x}) = H(\mathbf{x})\, f(\mathbf{x};\mathbf{u})/\ell$. This, however, depends on the unknown $\ell$.
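As a small illustration of the importance sampling estimator above, the following Python sketch estimates a rare-event probability. The specific setup is an assumption chosen for illustration and does not come from the source: $f$ is the standard normal density, $H$ is the indicator of $x > 4$, and the sampling density $g$ is a hand-picked normal centred on the threshold. The helper names `importance_sampling_estimate` and `normal_pdf` are likewise hypothetical.

```python
import math
import random


def normal_pdf(x, mean=0.0, std=1.0):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_estimate(h, f_pdf, g_pdf, g_sampler, n_samples, rng):
    """Average of h(X_i) * f(X_i) / g(X_i) over draws X_i from g."""
    total = 0.0
    for _ in range(n_samples):
        x = g_sampler(rng)
        total += h(x) * f_pdf(x) / g_pdf(x)
    return total / n_samples


# Assumed example: estimate P(X > 4) for X ~ N(0, 1) by sampling from N(4, 1).
GAMMA = 4.0
rng = random.Random(0)
estimate = importance_sampling_estimate(
    h=lambda x: 1.0 if x > GAMMA else 0.0,
    f_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
    g_pdf=lambda x: normal_pdf(x, GAMMA, 1.0),
    g_sampler=lambda r: r.gauss(GAMMA, 1.0),
    n_samples=100_000,
    rng=rng,
)

# Closed-form tail probability of the standard normal, for comparison.
exact = 0.5 * math.erfc(GAMMA / math.sqrt(2.0))
print(estimate, exact)
```

Here $g$ is chosen by hand. Selecting a good sampling density automatically from a parametric family is the part that the CE method addresses.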
The CE method aims to approximate the optimal PDF by adaptively selecting members of the parametric family that are closest (in the Kullback–Leibler sense) to the optimal PDF $g^{*}$.

== Generic CE algorithm ==

1. Choose initial parameter vector $\mathbf{v}^{(0)}$; set $t = 1$.
2. Generate a random sample $\mathbf{X}_1,\dots,\mathbf{X}_N$ from $f(\cdot;\mathbf{v}^{(t-1)})$.
3. Solve for $\mathbf{v}^{(t)}$, where $\mathbf{v}^{(t)} = \operatorname{argmax}_{\mathbf{v}} \frac{1}{N}\sum_{i=1}^{N} H(\mathbf{X}_i)\,\frac{f(\mathbf{X}_i;\mathbf{u})}{f(\mathbf{X}_i;\mathbf{v}^{(t-1)})}\,\log f(\mathbf{X}_i;\mathbf{v})$.
4. If convergence is reached then stop; otherwise, increase $t$ by 1 and reiterate from step 2.

In several cases, the solution to step 3 can be found analytically. Situations in which this occurs are:

    • When $f$ belongs to the natural exponential family.
    • When $f$ is discrete with finite support.
    • When $H(\mathbf{X}) = \mathrm{I}_{\{\mathbf{x}\in A\}}$ and $f(\mathbf{X}_i;\mathbf{u}) = f(\mathbf{X}_i;\mathbf{v}^{(t-1)})$, then $\mathbf{v}^{(t)}$ corresponds to the maximum likelihood estimator based on those $\mathbf{X}_k \in A$.

== Continuous optimization: example ==

The same CE algorithm can be used for optimization, rather than estimation. Suppose the problem is to maximize some function $S$, for example, $S(x) = \mathrm{e}^{-(x-2)^{2}} + 0.8\,\mathrm{e}^{-(x+2)^{2}}$.
To apply CE, one considers first the associated stochastic problem of estimating $\mathbb{P}_{\boldsymbol{\theta}}(S(X)\geq\gamma)$ for a given level $\gamma$, and parametric family $\{f(\cdot;\boldsymbol{\theta})\}$, for example the 1-dimensional Gaussian distribution, parameterized by its mean $\mu_t$ and variance $\sigma_t^{2}$ (so $\boldsymbol{\theta} = (\mu,\sigma^{2})$ here). Hence, for a given $\gamma$, the goal is to find $\boldsymbol{\theta}$ so that $D_{\mathrm{KL}}(\mathrm{I}_{\{S(x)\geq\gamma\}} \,\|\, f_{\boldsymbol{\theta}})$ is minimized. This is done by solving the sample version (stochastic counterpart) of the KL divergence minimization problem, as in step 3 above. It turns out that the parameters that minimize the stochastic counterpart for this choice of target distribution and parametric family are the sample mean and sample variance corresponding to the elite samples, which are those samples that have objective function value $\geq\gamma$. The worst of the elite samples is then used as the level parameter for the next iteration. This yields the following randomized algorithm that happens to coincide with the so-called Estimation of Multivariate Normal Algorithm (EMNA), an estimation of distribution algorithm.
=== Pseudocode ===

    // Initialize parameters
    μ := −6
    σ2 := 100
    t := 0
    maxits := 100
    N := 100
    Ne := 10
    // While maxits not exceeded and not converged
    while t < maxits and σ2 > ε do
        // Obtain N samples from current sampling distribution
        X := SampleGaussian(μ, σ2, N)
        // Evaluate objective function at sampled points
        S := exp(−(X − 2) ^ 2) + 0.8 exp(−(X + 2) ^ 2)
        // Sort X by objective function values in descending order
        X := sort(X, S)
        // Update parameters of sampling distribution via elite samples
        μ := mean(X(1:Ne))
        σ2 := var(X(1:Ne))
        t := t + 1
    // Return mean of final sampling distribution as solution
    return μ

== Related methods ==

    • Simulated annealing
    • Genetic algorithms
    • Harmony search
    • Estimation of distribution algorithm
    • Tabu search
    • Natural Evolution Strategy
    • Ant colony optimization algorithms
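The pseudocode for the continuous optimization example can be sketched in runnable Python as follows. This is a minimal sketch using only the standard library. The function name `cross_entropy_maximize`, the tolerance `eps` (the pseudocode leaves ε unspecified) and the `seed` argument are assumptions added for illustration, and the sample variance is used for `var`.

```python
import math
import random
import statistics


def objective(x):
    """S(x) from the example: two Gaussian bumps, the taller one at x = 2."""
    return math.exp(-(x - 2) ** 2) + 0.8 * math.exp(-(x + 2) ** 2)


def cross_entropy_maximize(mu=-6.0, sigma2=100.0, n_samples=100, n_elite=10,
                           max_iters=100, eps=1e-8, seed=0):
    """Gaussian CE search; eps and seed are illustrative assumptions."""
    rng = random.Random(seed)
    for _ in range(max_iters):
        if sigma2 <= eps:
            break  # sampling distribution has collapsed: treat as converged
        sigma = math.sqrt(sigma2)
        # Obtain N samples from the current sampling distribution
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # Sort samples by objective value, best first
        samples.sort(key=objective, reverse=True)
        # Update the sampling distribution from the elite samples
        elite = samples[:n_elite]
        mu = statistics.fmean(elite)
        sigma2 = statistics.variance(elite)
    # Mean of the final sampling distribution is returned as the solution
    return mu


result = cross_entropy_maximize()
print(result)
```

Because the method is randomized, a run is expected to end near one of the two peaks of S. The global maximum is near x = 2, but a given seed may settle on the lower peak near x = −2, so repeated runs with different seeds are a reasonable sanity check.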

  • Model inversion attack


    A model inversion attack is a type of adversarial machine learning attack in which an attacker tries to reconstruct or infer sensitive information about a model's training data by analyzing the outputs of a trained machine learning model. Instead of directly querying the underlying dataset, attackers query the model (usually via APIs or prediction interfaces) and use patterns in its responses to infer properties of the original inputs. These attacks exploit the fact that machine learning models encode statistical information about their training data in their parameters and outputs, which can unintentionally leak private or proprietary information. Depending on the level of access to the target model, model inversion attacks can be performed in both black-box and white-box settings. In a generic attack, an adversary makes several queries to a model and uses the responses (e.g. confidence scores, predictions) to train a surrogate or inversion model that learns to approximate the inverse mapping from outputs to inputs. This process may enable the reconstruction of sensitive attributes, such as facial features, medical data, or user behavior patterns, from models trained on such data. The technique has been demonstrated against various models, such as deep neural networks and classification systems, and it poses significant privacy risks in areas such as healthcare, finance, and biometric identification. Mitigation strategies include restricting model access, reducing output granularity, using differential privacy, and monitoring for anomalous query patterns.
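One of the mitigations listed above, reducing output granularity, can be sketched as a thin wrapper around a model's raw output. The following Python is a minimal illustration under stated assumptions: the function names `coarsen_prediction` and `label_only` are hypothetical, and the class labels and probabilities are made-up example values, not output from any real model.

```python
def coarsen_prediction(labels, probabilities, top_k=1, decimals=1):
    """Return only the top_k classes, with confidences rounded to `decimals` places."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return [(label, round(prob, decimals)) for label, prob in ranked[:top_k]]


def label_only(labels, probabilities):
    """Strictest variant: expose the predicted label and no confidence at all."""
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best]


# Made-up full-precision output of a hypothetical three-class classifier.
labels = ["cat", "dog", "fox"]
probabilities = [0.07, 0.81, 0.12]

print(coarsen_prediction(labels, probabilities))
print(label_only(labels, probabilities))
```

Coarser outputs give an attacker less signal per query. This is one layer among the mitigations listed and should not be read as a complete defence on its own.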

  • Corpus linguistics


    Corpus linguistics is an empirical method for the study of language by way of a text corpus (plural corpora). Corpora are balanced, often stratified collections of authentic, "real world" texts of speech or writing that aim to represent a given linguistic variety. Today, corpora are generally machine-readable data collections. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field (the natural context, or "realia", of that language) with minimal experimental interference. Large collections of text (though corpora may also be small in terms of running words) allow linguists to run quantitative analyses on linguistic concepts that may be difficult to test in a qualitative manner. The text-corpus method uses the body of texts in any natural language to derive the set of abstract rules which govern that language. Those results can be used to explore the relationships between that subject language and other languages which have undergone a similar analysis. The first such corpora were manually derived from source texts, but now that work is automated. Corpora have been used not only for linguistics research but also, increasingly, to compile dictionaries (starting with The American Heritage Dictionary of the English Language in 1969) and reference grammars, with A Comprehensive Grammar of the English Language, published in 1985, as a first. Experts in the field have differing views about the annotation of a corpus. These views range from those of John McHardy Sinclair, who advocates minimal annotation so that texts speak for themselves, to those of the Survey of English Usage team (University College, London), who advocate annotation as allowing greater linguistic understanding through rigorous recording.

    == History ==

    Some of the earliest efforts at grammatical description were based at least in part on corpora of particular religious or cultural significance.
For example, Prātiśākhya literature described the sound patterns of Sanskrit as found in the Vedas, and Pāṇini's grammar of classical Sanskrit was based at least in part on analysis of that same corpus. Similarly, the early Arabic grammarians paid particular attention to the language of the Quran. In the Western European tradition, scholars prepared concordances to allow detailed study of the language of the Bible and other canonical texts. === English corpora === A landmark in modern corpus linguistics was the publication of Computational Analysis of Present-Day American English in 1967. Written by Henry Kučera and W. Nelson Francis, the work was based on an analysis of the Brown Corpus, which is a structured and balanced corpus of one million words of American English from the year 1961. The corpus comprises 2000 text samples, from a variety of genres. The Brown Corpus was the first computerized corpus designed for linguistic research. Kučera and Francis subjected the Brown Corpus to a variety of computational analyses and then combined elements of linguistics, language teaching, psychology, statistics, and sociology to create a rich and variegated opus. A further key publication was Randolph Quirk's "Towards a description of English Usage" in 1960 in which he introduced the Survey of English Usage. Quirk's corpus was the first modern corpus to be built with the purpose of representing the whole language. Shortly thereafter, Boston publisher Houghton-Mifflin approached Kučera to supply a million-word, three-line citation base for its new American Heritage Dictionary, the first dictionary compiled using corpus linguistics. The AHD took the innovative step of combining prescriptive elements (how language should be used) with descriptive information (how it actually is used). Other publishers followed suit. 
The British publisher Collins' COBUILD monolingual learner's dictionary, designed for users learning English as a foreign language, was compiled using the Bank of English. The Survey of English Usage Corpus was used in the development of one of the most important corpus-based grammars, which was written by Quirk et al. and published in 1985 as A Comprehensive Grammar of the English Language. The Brown Corpus has also spawned a number of similarly structured corpora: the LOB Corpus (1960s British English), Kolhapur (Indian English), Wellington (New Zealand English), Australian Corpus of English (Australian English), the Frown Corpus (early 1990s American English), and the FLOB Corpus (1990s British English). Other corpora represent many languages, varieties and modes, and include the International Corpus of English and the British National Corpus, a 100-million-word collection of a range of spoken and written texts, created in the 1990s by a consortium of publishers, universities (Oxford and Lancaster) and the British Library. For contemporary American English, work has stalled on the American National Corpus, but the 400+ million word Corpus of Contemporary American English (1990–present) is now available through a web interface. The first computerized corpus of transcribed spoken language was constructed in 1971 by the Montreal French Project, containing one million words, which inspired Shana Poplack's much larger corpus of spoken French in the Ottawa-Hull area.

=== Multilingual corpora ===

In the 1990s, many of the notable early successes of statistical methods in natural language processing (NLP) occurred in the field of machine translation, due especially to work at IBM Research.
These systems were able to take advantage of existing multilingual textual corpora that had been produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. There are corpora in non-European languages as well. For example, the National Institute for Japanese Language and Linguistics in Japan has built a number of corpora of spoken and written Japanese. Sign language corpora have also been created using video data.

=== Ancient languages corpora ===

Besides these corpora of living languages, computerized corpora have also been made of collections of texts in ancient languages. An example is the Andersen-Forbes database of the Hebrew Bible, developed since the 1970s, in which every clause is parsed using graphs representing up to seven levels of syntax, and every segment is tagged with seven fields of information. The Quranic Arabic Corpus is an annotated corpus for the Classical Arabic language of the Quran. This is a recent project with multiple layers of annotation, including morphological segmentation, part-of-speech tagging, and syntactic analysis using dependency grammar. The Digital Corpus of Sanskrit (DCS) is a "Sandhi-split corpus of Sanskrit texts with full morphological and lexical analysis... designed for text-historical research in Sanskrit linguistics and philology."

=== Corpora from specific fields ===

Besides pure linguistic inquiry, researchers have begun to apply corpus linguistics to other academic and professional fields, such as the emerging sub-discipline of Law and Corpus Linguistics, which seeks to understand legal texts using corpus data and tools. The DBLP Discovery Dataset concentrates on computer science, containing relevant computer science publications with metadata such as author affiliations, citations, or study fields.
A more focused dataset was introduced by NLP Scholar, a combination of papers of the ACL Anthology and Google Scholar metadata. Corpora can also aid in translation efforts or in teaching foreign languages. == Methods == Corpus linguistics has generated a number of research methods, which attempt to trace a path from data to theory. Wallis and Nelson (2001) first introduced what they called the 3A perspective: Annotation, Abstraction and Analysis. Annotation consists of the application of a scheme to texts. Annotations may include structural markup, part-of-speech tagging, parsing, and numerous other representations. Abstraction consists of the translation (mapping) of terms in the scheme to terms in a theoretically motivated model or dataset. Abstraction typically includes linguist-directed search but may include e.g., rule-learning for parsers. Analysis consists of statistically probing, manipulating and generalising from the dataset. Analysis might include statistical evaluations, optimisation of rule-bases or knowledge discovery methods. Most lexical corpora today are part-of-speech-tagged (POS-tagged). However even corpus linguists who work with 'unannotated plain text' inevitably apply some method to isolate salient terms. In such situations annotation and abstraction are combined in a lexical search. The advantage of publishing an annotated corpus is that other users can then perform experiments on the corpus (through corpus managers). Linguists with other interests and differing perspectives than the originators' can exploit this work. By sharing data

  • Mona Diab


    Mona Talat Diab (Arabic: منى طلعت دياب) is a computer science professor and director of Carnegie Mellon University's Language Technologies Institute. Previously, she was a professor at George Washington University and a research scientist with Facebook AI. Her research focuses on natural language processing, computational linguistics, cross lingual/multilingual processing, computational socio-pragmatics, Arabic language processing, and applied machine learning. == Education == Diab completed her M.Sc. in computer science with a major in machine learning and artificial intelligence at The George Washington University (1997) and her Ph.D. in computational linguistics at the University of Maryland, Linguistics Department and University of Maryland Institute for Advanced Computer Studies (UMIACS) in 2003, under the supervision of Philip Resnik. She was also a postdoctoral research scientist at Stanford University (2003–2005) under the mentorship of Dan Jurafsky, where she was a part of the Stanford NLP Group. == Career == After her postdoc at Stanford, Diab took a position as research scientist (principal investigator) at the Center for Computational Learning Systems (CCLS) in Columbia University, where she was also adjunct professor in the computer science department. In 2013 she joined the George Washington University as an associate professor, where she was promoted to full professor in 2017. Diab is the founder and director of the GW NLP lab CARE4Lang. Diab served as an elected faculty senator at Columbia University for 6 years (2007–2012) and an elected faculty senator at GW (2013–2014). She served the computational linguistics community as elected member, secretary and president of ACL SIGLEX (2005–2016) and elected president of ACL SIGSemitic. She currently serves as the elected VP-elect for ACL SIGDAT. 
In 2017, Diab joined the Amazon AWS AI Deep Learning Group for Human Language Technologies, where she led the AWS Lex project for task-oriented dialogue systems for enterprises. A couple of years later, she moved to Facebook AI as a research scientist. In the fall of 2023, she became the director of CMU's Language Technologies Institute, the first full-time director since the passing of its founder Jaime Carbonell.

== Research ==

Diab's research interests include several areas in computational linguistics and natural language processing, such as conversational AI, computational lexical semantics, multilingual and cross-lingual processing, social media processing with an emphasis on computational socio-pragmatics, information extraction and text analytics, and machine translation. She also has special interests in Arabic NLP and low-resource scenarios. Diab co-established two research trends in the computational linguistics field: computational approaches to linguistic code switching in 2007 and semantic textual similarity in 2010. Together with Nizar Habash and Owen Rambow, Diab co-founded CADIM in 2005, a global reference point in Arabic dialect processing. In 2012, together with Eneko Agirre and Johan Bos, Diab brought together two ACL communities, SIGLEX and SIGSEM, and established the first-tier conference *SEM.

== Awards and recognition ==

    • Selected as one of the top 150 leaders and visionaries in AI nationwide to participate in the White House AI Summit in Government, Washington, D.C., US, September 2019.
    • March 2017: 3 Muslim Women in STEM You Should Know About, Teen Vogue, March 2017.
    • May 2017: Behind Every Strong Woman Is...Another Strong Woman: Ten women give thanks to the women who supported them on the way up. Elle, May 2017.
    • Google Faculty Research Award – Tharwa++: Building a multidialectal Arabic Lexical Repository, (PI), 09.2015–12.2016.
    • Google Faculty Research Award – Nuanced Sentiment and Perspective Analysis for Arabic Social Media Text, (PI), 12.2014–12.2015.
    • QNRF Best Poster Award – Ossama Obeid, Houda Bouamor, Wajdi Zaghouani, Mahmoud Ghoneim, Abdelati Hawwari, Mona Diab, Kemal Oflazer. (2016) MANDIAC: A Web-based Annotation System For Manual Arabic Diacritization. Proceedings of the 2nd Workshop on Arabic Corpora and Processing Tools, LREC 2016.
    • Best Paper Award – Aminian, Maryam, Mahmoud Ghoneim, Mona Diab. (2015) Unsupervised False Friend Disambiguation Using Contextual Word Clusters and Parallel Word Alignments. In Proceedings of Workshop 9th Semantics Syntax Statistical Translation, NAACL 2015, Denver CO, US.

== Publications ==

Diab has over 250 publications, and she is an acting editor for several scientific journals.

=== Selected publications ===

    • Semeval-2012 task 6: A pilot on semantic textual similarity. E. Agirre, D. Cer, M. Diab, A. Gonzalez-Agirre. *SEM 2012: The First Joint Conference on Lexical and Computational Semantics–Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012).
    • Predictive linguistic features of schizophrenia. ES Kayi, M Diab, L Pauselli, M Compton, G Coppersmith. arXiv preprint arXiv:1810.09377.
    • Ideological perspective detection using semantic features. H Elfardy, M Diab, C Callison-Burch – Proceedings of *SEM 2015.
    • DeSePtion: Dual sequence prediction and adversarial examples for improved fact-checking. Christopher Hidey, Tuhin Chakrabarty, Tariq Alhindi, Siddharth Varia, Kriste Krstovski, Mona Diab, Smaranda Muresan, 2020.
    • Does Causal Coherence Predict Online Spread of Social Media? Pedram Hosseini, Mona Diab, David A Broniatowski. Proceedings of International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation, 2019.
    • Diversity, Density, and Homogeneity: Quantitative Characteristic Metrics for Text Collections. YA Lai, X Zhu, Y Zhang, M Diab, arXiv preprint arXiv:2003.08529, 2020.
    • Readability of written medicine information materials in Arabic language: expert and consumer evaluation. S Al Aqeel, N Abanmy, A Aldayel, H Al-Khalifa, M Al-Yahya, M Diab. BMC health services research 18 (1), 1–7, 2019.
    • Unsupervised word mapping using structural similarities in monolingual embeddings. H Aldarmaki, M Mohan, M Diab – Transactions of the Association for Computational Linguistics, 2018.
    • An unsupervised method for word sense tagging using parallel corpora. M Diab, P Resnik. Proceedings of ACL 2002.
    • Overview for the first shared task on language identification in code-switched data. Thamar Solorio, Elizabeth Blair, Suraj Maharjan, Steven Bethard, Mona Diab, Mahmoud Ghoneim, Abdelati Hawwari, Fahad AlGhamdi, Julia Hirschberg, Alison Chang, Pascale Fung. Proceedings of the First Workshop on Computational Approaches to Code Switching, 2014.
    • Modeling sentences in the latent space. W Guo, M Diab – ACL 2012.
    • Task-based evaluation of multiword expressions: a pilot study in statistical machine translation. M Carpuat, M Diab – NAACL-HLT 2010.
    • Rumor detection and classification for twitter data. S Hamidian, MT Diab – arXiv preprint arXiv:1912.08926, 2019.
    • Subgroup detection in ideological discussions. A Abu-Jbara, P Dasigi, M Diab, D Radev – ACL 2012.
    • Madamira: A fast, comprehensive tool for morphological analysis and disambiguation of arabic. A. Pasha, M. Al-Badrashiny, M. Diab, A. El Kholy, R. Eskander, N. Habash, M. Pooleery, O. Rambow, R. Roth. LREC 14, 1094–1101. 2014.
    • Context-Aware Self-Attentive Natural Language Understanding for Task-Oriented Chatbots. A. Gupta, P. Zhang, G. Lalwani, M. Diab. EMNLP 2019.
    • A multitask learning approach for diacritic restoration. S. Alqahtani, A. Mishra, M. Diab. ACL 2020.

  • Vatican News App


    The Vatican News App is an official mobile application issued by the Vatican's Dicastery for Communication. Formerly titled The Pope App, the app was launched on January 23, 2013, under the auspices of the Pontifical Council for Social Communications, a now-defunct dicastery that was merged into the Secretariat (now Dicastery) for Communication in March 2016. Initially, The Pope App was available only on iOS devices, but it became available for Android phones at the end of February 2013. The app is available for download on iOS and Android in five languages: English, French, Italian, Portuguese and Spanish. It was originally promoted as an application focused on the figure of the Pope, which made it possible to follow the Pope's events while they were taking place. Alerts notified followers and offered access to "official papal-related content in a variety of formats". The app also enabled its users to see areas of the Vatican through webcams located throughout St. Peter's Square in Rome that broadcast images. In early 2018, The Pope App was relaunched as the Vatican News App, accompanied by a redesign that eliminated many of the previous version's features, reducing the app to a more conventional news service, with increased emphasis on news from the Vatican and the worldwide Catholic Church and less focus on the day-to-day activities of the Pope.

  • Synchronous context-free grammar


    Synchronous context-free grammars (SynCFG or SCFG; not to be confused with stochastic CFGs) are a type of formal grammar designed for use in transfer-based machine translation. Rules in these grammars apply to two languages at the same time, capturing grammatical structures that are each other's translations. The theory of SynCFGs borrows from syntax-directed transduction and syntax-based machine translation, modeling the reordering of clauses that occurs when translating a sentence by correspondences between phrase-structure rules in the source and target languages. Performance of SCFG-based MT systems has been found comparable with, or even better than, state-of-the-art phrase-based machine translation systems. Several algorithms exist to perform translation using SynCFGs.

    == Formalism ==

    Rules in a SynCFG are superficially similar to CFG rules, except that they specify the structure of two phrases at the same time: one in the source language (the language being translated) and one in the target language. Numeric indices indicate correspondences between non-terminals in both constituent trees. Chiang gives the Chinese/English example:

        X → (yu X1 you X2, have X2 with X1)

    This rule indicates that an X phrase can be formed in Chinese with the structure "yu X1 you X2", where X1 and X2 are variables standing in for subphrases, and that the corresponding structure in English is "have X2 with X1", where X1 and X2 are independently translated to English.

    == Software ==

    • cdec, an MT decoding package that supports SynCFGs
    • Joshua, a machine translation decoding system written in Java
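The rule from the Formalism section can be made concrete with a small Python sketch. It shows how one synchronous rule yields a source phrase and a reordered target phrase once its linked non-terminals are filled. The `SyncRule` class, the `apply_rule` helper and the sub-phrase pairs used as fillers are assumptions for illustration only; this is not how cdec or Joshua represent grammars.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncRule:
    """One synchronous rule: strings are terminals, ints are linked non-terminal indices."""
    source: tuple
    target: tuple


def apply_rule(rule, fillers):
    """Fill each linked non-terminal with its (source_phrase, target_phrase) pair."""
    def realise(side, which):
        words = []
        for symbol in side:
            if isinstance(symbol, int):
                words.append(fillers[symbol][which])  # 0 = source, 1 = target
            else:
                words.append(symbol)
        return " ".join(words)

    return realise(rule.source, 0), realise(rule.target, 1)


# X -> (yu X1 you X2, have X2 with X1)
rule = SyncRule(source=("yu", 1, "you", 2), target=("have", 2, "with", 1))

# Assumed sub-phrase translations standing in for X1 and X2.
fillers = {
    1: ("Beihan", "North Korea"),
    2: ("bangjiao", "diplomatic relations"),
}

source_phrase, target_phrase = apply_rule(rule, fillers)
print(source_phrase)
print(target_phrase)
```

The shared integer indices are what make the rule synchronous: the same filler pair is used on both sides, while the order in which the indices appear differs between source and target.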

  • SNNS


    SNNS (Stuttgart Neural Network Simulator) is a neural network simulator originally developed at the University of Stuttgart. While it was originally built for X11 under Unix, there are Windows ports. Its successor JavaNNS never reached the same popularity.

    == Features ==

    SNNS is written around a simulation kernel to which user-written activation functions, learning procedures and output functions can be added. It supports arbitrary network topologies, and the standard release includes a number of standard neural network architectures and training algorithms.

    == Status ==

    SNNS is no longer under active development. In July 2008 the license was changed to the GNU LGPL.
