Word error rate

Word error rate

Word error rate (WER) is a common metric of the performance of a speech recognition or machine translation system. The WER metric typically ranges from 0 to 1, where 0 indicates that the compared pieces of text are exactly identical, and 1 (or larger) indicates that they are completely different with no similarity. This way, a WER of 0.8 means that there is an 80% error rate for compared sentences. The general difficulty of measuring performance lies in the fact that the recognized word sequence can have a different length from the reference word sequence (supposedly the correct one). The WER is derived from the Levenshtein distance, working at the word level instead of the phoneme level. The WER is a valuable tool for comparing different systems as well as for evaluating improvements within one system. This kind of measurement, however, provides no details on the nature of translation errors and further work is therefore required to identify the main source(s) of error and to focus any research effort. This problem is solved by first aligning the recognized word sequence with the reference (spoken) word sequence using dynamic string alignment. Examination of this issue is seen through a theory called the power law that states the correlation between perplexity and word error rate. Word error rate can then be computed as: W E R = S + D + I N = S + D + I S + D + C {\displaystyle {\mathit {WER}}={\frac {S+D+I}{N}}={\frac {S+D+I}{S+D+C}}} where S is the number of substitutions, D is the number of deletions, I is the number of insertions, C is the number of correct words, N is the number of words in the reference (N=S+D+C) The intuition behind 'deletion' and 'insertion' is how to get from the reference to the hypothesis. So if we have the reference "This is wikipedia" and hypothesis "This _ wikipedia", we call it a deletion. Note that since N is the number of words in the reference, the word error rate can be larger than 1.0, namely if the number of insertions I is larger than the number of correct words C. When reporting the performance of a speech recognition system, sometimes word accuracy (WAcc) is used instead: W A c c = 1 − W E R = N − S − D − I N = C − I N {\displaystyle {\mathit {WAcc}}=1-{\mathit {WER}}={\frac {N-S-D-I}{N}}={\frac {C-I}{N}}} Since the WER can be larger than 1.0, the word accuracy can be smaller than 0.0. == Experiments == It is commonly believed that a lower word error rate shows superior accuracy in recognition of speech, compared with a higher word error rate. However, at least one study has shown that this may not be true. In a Microsoft Research experiment, it was shown that, if people were trained under "that matches the optimization objective for understanding", (Wang, Acero and Chelba, 2003) they would show a higher accuracy in understanding of language than other people who demonstrated a lower word error rate, showing that true understanding of spoken language relies on more than just high word recognition accuracy. == Other metrics == One problem with using a generic formula such as the one above, however, is that no account is taken of the effect that different types of error may have on the likelihood of successful outcome, e.g. some errors may be more disruptive than others and some may be corrected more easily than others. These factors are likely to be specific to the syntax being tested. A further problem is that, even with the best alignment, the formula cannot distinguish a substitution error from a combined deletion plus insertion error. Hunt (1990) has proposed the use of a weighted measure of performance accuracy where errors of substitution are weighted at unity but errors of deletion and insertion are both weighted only at 0.5, thus: W E R = S + 0.5 D + 0.5 I N {\displaystyle {\mathit {WER}}={\frac {S+0.5D+0.5I}{N}}} There is some debate, however, as to whether Hunt's formula may properly be used to assess the performance of a single system, as it was developed as a means of comparing more fairly competing candidate systems. A further complication is added by whether a given syntax allows for error correction and, if it does, how easy that process is for the user. There is thus some merit to the argument that performance metrics should be developed to suit the particular system being measured. Whichever metric is used, however, one major theoretical problem in assessing the performance of a system is deciding whether a word has been “mis-pronounced,” i.e. does the fault lie with the user or with the recogniser. This may be particularly relevant in a system which is designed to cope with non-native speakers of a given language or with strong regional accents. The pace at which words should be spoken during the measurement process is also a source of variability between subjects, as is the need for subjects to rest or take a breath. All such factors may need to be controlled in some way. For text dictation it is generally agreed that performance accuracy at a rate below 95% is not acceptable, but this again may be syntax and/or domain specific, e.g. whether there is time pressure on users to complete the task, whether there are alternative methods of completion, and so on. The term "Single Word Error Rate" is sometimes referred to as the percentage of incorrect recognitions for each different word in the system vocabulary. == Edit distance == The word error rate may also be referred to as the length normalized edit distance. The normalized edit distance between X and Y, d( X, Y ) is defined as the minimum of W( P ) / L ( P ), where P is an editing path between X and Y, W ( P ) is the sum of the weights of the elementary edit operations of P, and L(P) is the number of these operations (length of P).

AZFinText

Arizona Financial Text System (AZFinText) is a textual-based quantitative financial prediction system written by Robert P. Schumaker of University of Texas at Tyler and Hsinchun Chen of the University of Arizona. == System == This system differs from other systems in that it uses financial text as one of its key means of predicting stock price movement. This reduces the information lag-time problem evident in many similar systems where new information must be transcribed (e.g., such as losing a costly court battle or having a product recall), before the quant can react appropriately. AZFinText overcomes these limitations by utilizing the terms used in financial news articles to predict future stock prices twenty minutes after the news article has been released. It is believed that certain article terms can move stocks more than others. Terms such as factory exploded or workers strike will have a depressing effect on stock prices whereas terms such as earnings rose will tend to increase stock prices. The AZFinText system analyzes financial news to identify the patterns in how investors react to such specific information. It uses methods like sentiment analysis and term weighting to examine the text of news articles. This system is designed to find price differences that occur when the market responds to news stories. This approach provides an alternative and easier method for predicting stock market movements. == Overview of research == The foundation of AZFinText can be found in the ACM TOIS article. Within this paper, the authors tested several different prediction models and linguistic textual representations. From this work, it was found that using the article terms and the price of the stock at the time the article was released was the most effective model and using proper nouns was the most effective textual representation technique. Combining the two, AZFinText netted a 2.84% trading return over the five-week study period. AZFinText was then extended to study what combination of peer organizations help to best train the system. Using the premise that IBM has more in common with Microsoft than GM, AZFinText studied the effect of varying peer-based training sets. To do this, AZFinText trained on the various levels of GICS and evaluated the results. It was found that sector-based training was most effective, netting an 8.50% trading return, outperforming Jim Cramer, Jim Jubak and DayTraders.com during the study period. AZFinText was also compared against the top 10 quantitative systems and outperformed 6 of them. A third study investigated the role of portfolio building in a textual financial prediction system. From this study, Momentum and Contrarian stock portfolios were created and tested. Using the premise that past winning stocks will continue to win and past losing stocks will continue to lose, AZFinText netted a 20.79% return during the study period. It was also noted that traders were generally overreacting to news events, creating the opportunity of abnormal returns. A fourth study looked into using author sentiment as an added predictive variable. Using the premise that an author can unwittingly influence market trades simply by the terms they use, AZFinText was tested using tone and polarity features. It was found that Contrarian activity was occurring within the market, where articles of a positive tone would decrease in price and articles of a negative tone would increase in price. A further study investigated what article verbs have the most influence on stock price movement. From this work, it was found that planted, announcing, front, smaller and crude had the highest positive impact on stock price. == Notable publicity == AZFinText has been the topic of discussion by numerous media outlets. Some of the more notable ones include The Wall Street Journal, MIT's Technology Review, Dow Jones Newswire, WBIR in Knoxville, TN, Slashdot and other media outlets.

Nitro Zeus

Nitro Zeus is the project name for a well funded comprehensive cyber attack plan created as a mitigation strategy after the Stuxnet malware campaign and its aftermath. Unlike Stuxnet, that was loaded onto a system after the design phase to affect its proper operation, Nitro Zeus's objectives are built into a system during the design phase unbeknownst to the system users. This built-in feature allows a more assured and effective cyber attack against the system's users. The information about its existence was raised during research and interviews carried out by Alex Gibney for his Zero Days documentary film. The proposed long term widespread infiltration of major Iranian systems would disrupt and degrade communications, power grid, and other vital systems as desired by the cyber attackers. This was to be achieved by electronic implants in Iranian computer networks. The project was seen as one pathway in alternatives to full-scale war.

1tik

1tik, pronounced Antik (Arabic: أنتيك; lit. "Everything is going well") is a fully Algerian instant messaging, social media and mobile payment app. designed, developed and built locally by the Algerian start-up, INTAJ Digital, with backing from the state-owned company ATM Mobilis (who's the company's main sponsor). It is described as Algeria's first super-app that is entirely designed and built by local developers. == Etymology == The name "1tik" (Arabic: أنتيك) is drawn from the popular Algerian vernacular (Antik), the neologism, which appeared several years ago, means "everything is going well" or "it's all good". == History == 1tik was officially launched and announced the 20th December 2025 by INTAJ Digital's founder Youcef Toulaib and a team of 50 employees, making it the first ever Algerian instant messaging, social media and mobile payment app, rivaling with the growing influence of Yassir in Algeria. it grew in popularity after the presidency of Algeria and several other state-owned companies, medias, and ministries opened official accounts on the app.

Full30

Full30 was an American online video-sharing platform primarily dedicated to firearms and shooting sports-related content. The service was established in 2014 by Tim Harmsen and Mark Hammonds as a result of YouTube's increasing restrictions on gun-related videos. == History == After the 2018 Parkland high school shooting, many companies attempted to distance themselves from any association with the firearms industry. As a result, YouTube began demonetizing and sometimes outright deleting firearms-related videos, and in one case, popular YouTube poster Hickok45's channel was completely deleted but later restored. In response, Harmsen, who operates the Military Arms Channel on YouTube, decided to create his own video-hosting website to allow himself and other firearms content creators a platform free from such restrictions; he named the website Full30 — a reference to the popular 30-round STANAG magazine. In July 2020, site representatives announced the site had new ownership. By the end of 2022, the site began to be redirected to a series of other websites. By 2025, it was largely deactivated with the front page replaced by a form to be filled out to receive "updates", with no other explanation. == Contributors == Hickok45 Military Arms Channel Forgotten Weapons Bavarian Shooter Liberty Doll CloverTac

Office automation

Office automation refers to the varied computer machinery and software used to digitally create, collect, store, manipulate, and relay office information needed for accomplishing basic tasks. Raw data storage, electronic transfer, and the management of electronic business information comprise the basic activities of an office automation system. Office automation helps in optimizing or automating existing office procedures. The backbone of office automation is a local area network, which allows users to transfer data, mail and voice across the network. All office functions, including dictation, typing, filing, copying, fax, telex, microfilm and records management, telephone and telephone switchboard operations, fall into this category. Office automation was a popular term in the 1970s and 1980s as the desktop computer exploded onto the scene. Advantages of office automation include that it can get many tasks accomplished faster, it eliminates the need for a large staff, less storage is required to store data, and multiple people can update data simultaneously in the event of changes in schedule. == Outline == Businesses can easily purchase and stock their wares with the aid of technology. Many of the manual tasks that used to be done by hand can now be done through hand held devices and UPC and SKU coding. In the retail setting, automation also increases choice. Customers can easily process their payments through automated credit card machines and no longer have to wait in line for an employee to process and manually type in the credit card numbers. Office payrolls have been automated, which means no one has to manually cut checks, and those checks that are cut can be printed through computer programs. Direct deposit can be automatically set up and this further reduces the manual process, and most employees who participate in direct deposit often find their paychecks come earlier than if they'd have to wait for their checks to be written and then cleared by the bank. Other ways automation has reduced employee manpower on tasks is automated voice direction. Through the use of prompts, automated phone menus and directed calls, the need for employees to be dedicated to answer the phones has been reduced, and in some cases, eliminated.

CSS box model

In web development, the CSS box model refers to how HTML elements are modeled in browser engines and how the dimensions of those HTML elements are derived from CSS properties. It is a fundamental concept for the composition of HTML webpages. The guidelines of the box model are described by web standards World Wide Web Consortium (W3C) specifically the CSS Working Group. For much of the late-1990s and early 2000s there had been non-standard compliant implementations of the box model in mainstream browsers. With the advent of CSS2 in 1998, which introduced the box-sizing property, the problem had mostly been resolved. == Specifics == The Cascading Style Sheets (CSS) specification describes how elements of web pages are displayed by graphical browsers. Section 4 of the CSS1 specification defines a "formatting model" that gives block-level elements—such as p and blockquote—a width and height, and three levels of boxes surrounding it: padding, borders, and margins. While the specification never uses the term "box model" explicitly, the term has become widely used by web developers and web browser vendors. All HTML elements can be considered "boxes", this includes div tag, p tag, or a tag. Each of those boxes has five modifiable dimensions: the height and width describe dimensions of the actual content of the box (text, images, ...) the padding describes the space between this content and the border of the box the border is any kind of line (solid, dotted, dashed...) surrounding the box, if present the margin is the space around the border According to the CSS1 specification, released by W3C in 1996 and revised in 1999, when a width or height is explicitly specified for any block-level element, it should determine only the width or height of the visible element, with the padding, borders, and margins applied afterward. Before CSS3, this box model was known as W3C box model, in CSS3, it is known as the content-box. The total width of a box is therefore margin-left + border-left + padding-left + width + padding-right + border-right + margin-right. Similarly, the total height of a box equals margin-top + border-top + padding-top + height + padding-bottom + border-bottom + margin-bottom. For example, the following CSS code would specify the box dimensions of each block belonging to 'my-class'. Moreover, each such box will have total height 140px and width 240px. CSS3 introduced the Internet Explorer box model to the standard, known referred to as border-box. == History == Before HTML 4 and CSS, very few HTML elements supported both border and padding, so the definition of the width and height of an element was not very contentious. However, it varied depending on the element. The HTML width attribute of a table defined the width of the table including its border. On the other hand, the HTML width attribute of an image defined the width of the image itself (inside any border). The only element to support padding in those early days was the table cell. Width for the cell was defined as "the suggested width for a cell content in pixels excluding the cell padding." In 1996, CSS introduced margin, border and padding for many more elements. It adopted a definition width in relation to content, border, margin and padding similar to that for a table cell. This has since become known as the W3C box model. At the time, very few browser vendors implemented the W3C box model to the letter. The two major browsers at the time, Netscape 4.0 and Internet Explorer 4.0 both defined width and height as the distance from border to border. This has been referred to as the traditional or the Internet Explorer box model. Internet Explorer in "quirks mode" includes the content, padding and borders within a specified width or height; this results in a narrower or shorter rendering of a box than would result following the standard behavior. The Internet Explorer box model behavior was often considered a bug, because of the way in which earlier versions of Internet Explorer handle the box model or sizing of elements in a web page, which differs from the standard way recommended by the W3C for the Cascading Style Sheets language. As of Internet Explorer 6, the browser supports an alternative rendering mode (called the "standards-compliant mode") which solves this discrepancy. However, for backward compatibility reasons, all versions still behave in the usual, non-standard way by default (see quirks mode). Internet Explorer for Mac is not affected by this non-standard behavior. === Workarounds === Internet Explorer versions 6 and onward are not affected by the bug if the page contains certain HTML document type declarations. These versions maintain the buggy behavior when in quirks mode for reasons of backward compatibility. For example, quirks mode is triggered: When the document type declaration is absent or incomplete; When an HTML 3 or earlier document is encountered; When an HTML 4.0 Transitional or Frameset document type declaration is used and a system identifier (URI) is not present; When an SGML comment or other unrecognized content appears before the document type declaration Internet Explorer 6 also uses quirks mode if there is an XML declaration prior to the document type declaration. Various workarounds have been devised to force Internet Explorer versions 5 and earlier to display Web pages using the W3C box model. These workarounds generally exploit unrelated bugs in Internet Explorer's CSS selector processing in order to hide certain rules from the browser. The best known of these workarounds is the "box model hack" developed by Tantek Çelik, a former Microsoft employee who developed this idea while working on Internet Explorer for the Macintosh. It involves specifying a width declaration for Internet Explorer for Windows, and then overriding it with another width declaration for CSS-compliant browsers. This second declaration is hidden from Internet Explorer for Windows by exploiting other bugs in the way that it parses CSS rules. The implementation of these CSS “hacks” has been further complicated by the public release of Internet Explorer 7, which has had some issues fixed, but not others, causing undesired results in pages using these hacks. Box model hacks have proven unreliable because they rely on bugs in browsers' CSS support that may be fixed in later versions. For this reason, some Web developers have instead recommended either avoiding specifying both width and padding for the same element or using conditional comment and/or CSS filters to work around the box model bug in older versions of Internet Explorer. == Support for Internet Explorer's box model == Web designer Doug Bowman has said that the original Internet Explorer box model represents a better, more logical approach. Peter-Paul Koch gives the example of a physical box, whose dimensions always refer to the box itself, including potential padding, but never its content. He says that this box model is more useful for graphic designers, who create designs based on the visible width of boxes rather than the width of their content. Bernie Zimmermann says that the Internet Explorer box model is closer to the definition of cell dimensions and padding used in the HTML table model. The W3C has included a "box-sizing" property in CSS3. When box-sizing: border-box; is specified for an element, any padding or border of the element is drawn inside the specified width and height, "as commonly implemented by legacy HTML user agents". Internet Explorer 8, WebKit browsers such as Apple Safari 5.1+ and Google Chrome, Gecko-based browsers such as Mozilla Firefox 29.0 and later, Opera 7.0 and later, and Konqueror 3.3.2 and later support the CSS3 box-sizing property. Gecko browsers previous than 29.0 support the same functionality using the browser-specific -moz-box-sizing property. border-box is the default box model used in Bootstrap framework.