Image translation

Image translation

Image translation is the machine translation of images of printed text (posters, banners, menus, screenshots etc.). This is done by applying optical character recognition (OCR) technology to an image to extract any text contained in the image, and then have this text translated into a language of their choice, and the applying digital image processing on the original image to get the translated image with a new language. == General == Machine translation made available on the internet (web and mobile) is a notable advance in multilingual communication eliminating the need for an intermediary translator/interpreter, translating foreign texts still poses a problem to the user as they cannot be expected to be able to type the foreign text they wish to translate and understand. Manually entering the foreign text may prove to be a difficulty especially in cases where an unfamiliar alphabet is used from a script which user can't read, e.g. Cyrillic, Chinese, Japanese etc. for an English speaker or any speaker of a Latin-based language or vice versa. The technical advancements in OCR made it possible to recognize text from images. The possibility to use one's mobile device's camera to capture and extract printed text is also known as mobile OCR and was first introduced in Japanese manufactured mobile telephones in 2004. Using the handheld's camera one could take a picture of (a line of) text and have it extracted (digitalized) for further manipulation such as storing the information in their contacts list, as a web page address (URL) or text to use in an SMS/email message etc. Presently, mobile devices having a camera resolution of 2 megapixels or above with an auto-focus ability, often feature the text scanner service. Taking the text scanning facility one step further, image translation emerged, giving users the ability to capture text with their mobile phone's camera, extract the text, and have it translated in their own language. More and more applications emerged on this technology including Word Lens. After getting acquired by Google, it was made a part of Google Translate mobile app. Another simultaneous advancement in Image Processing, has also made it possible now to replace the text on the image with the translated text and create a new image altogether. == History == The development of the image translation service springs from the advances in OCR technology (miniaturization and reduction of memory resources consumed) enabling text scanning on mobile telephones. Among the first to announce mobile software capable of “reading” text using the mobile device's camera is International Wireless Inc. who in February 2003 released their “CheckPoint” and “WebPoint” applications. “CheckPoint” reads critical symbolic information on checks and is aimed at reducing losses that mobile merchants suffer from “bounced” checks by scanning the MICR number on the bottom of a check, while “WebPoint” enables the visual recognition and decoding of printed URL's, which are then opened by the device's web browser. The first commercial release of a mobile text scanner, however, took place in December 2004 when Vodafone and Sharp began selling the 902SH mobile which was the first to feature a 2 megapixel digital camera with optical zoom. Among the device's various multimedia features was the built-in text/bar code/QR code scanner. The text scanner function could handle up to 60 alphabetical characters simultaneously. The scanned text could be then sent as an email or SMS message, added as a dictionary entry or, in the case of scanned URLs, opened via the device's web browser. All subsequent Sharp mobiles feature the text scanner functionality. In September 2005, NEC Corporation and the Nara Institute of Science and Technology in Japan (NAIST) announced new software capable of transforming cameraphones into text scanners. The application differs substantially from similarly equipped mobile telephones in Japan (able to scan businesscards and small bits of text and use OCR to convert that to editable text or to URL addresses) by it ability to scan a whole page. The two companies, however, said they would not release the software commercially before the end of 2008. Combining the text scanner function with machine translation technology was first made by US company RantNetwork who in July 2007 started selling the Communilator, a machine translation application for mobile devices featuring the Image Translation functionality. Using the built-in camera, the mobile user could take a picture of some printed text, apply OCR to recognize the text and then translate it into any one of over 25 language available. In April 2008 Nokia showcased their Shoot-to-Translate application for the N73 model which is capable of taking a picture using the device's camera, extracting the text and then translating it. The application only offers Chinese to English translation, and does not handle large segments of text. Nokia said they are in the process of developing their Multiscanner product which, besides scanning text and business cards, would be able to translate between 52 languages. Again in April 2008, Korean company Unichal Inc. released their handheld Dixau text scanner capable of scanning and recognizing English text and then translating it into Korean using online translation tools such as Wikipedia or Google Translate. The device is connected to a PC or a laptop via the USB port. In February 2009, Bulgarian company Interlecta presented at the Mobile World Congress in Barcelona their mobile translator including image recognition and speech synthesis. The application handles all European languages along with Chinese, Japanese and Korean. The software connects to a server over the Internet to accomplish the image recognition and the translation. In May 2014, Google acquired Word Lens to improve the quality of visual and voice translation. It is able to scan text or picture with one's device and have it translated instantly. Since the OCR has been improving many companies or website started combining OCR and translation, to read the text from an image and show the translated text. In August 2018, an Indian company created ImageTranslate. It is able to read, translate and re-create the image in another language. As of late 2018, the tool added 13 new languages, including Arabic, Thai, Vietnamese, Hindi, and Bengali, significantly increasing its utility in Asia and the Middle East. This helps users translate photos already stored in their phone's gallery, not just live, real-time views. Currently, image translation is offered by the following companies: Google Translate app with camera ImageTranslate Yandex

PagedAttention

PagedAttention is an attention algorithm for efficient serving of large language models (LLMs). It was introduced in 2023 by Woosuk Kwon and colleagues in the paper Efficient Memory Management for Large Language Model Serving with PagedAttention, alongside the vLLM serving engine. The method stores the key–value cache used during autoregressive decoding in fixed-size blocks that can be mapped to non-contiguous physical memory, borrowing ideas from virtual memory, paging, and operating system design. == Background == In transformer inference, the key–value cache grows with sequence length and the number of concurrent requests. Kwon et al. argued that earlier serving systems typically reserved contiguous cache regions in advance, which caused reserved space, internal fragmentation, and external fragmentation. In their experiments, the paper reported that the effective memory utilization of previous systems could fall as low as 20.4%. == Description == PagedAttention partitions the cache of each sequence into fixed-size KV blocks. A request's cache is represented as a sequence of logical blocks, while a block table maps those logical blocks to physical GPU-memory blocks. As a result, neighboring logical blocks do not need to be contiguous in physical memory, and new blocks can be allocated on demand as generation proceeds. The design also makes it easier to share cache state across related decoding paths. In vLLM, physical blocks can be reference-counted and shared among requests or branches, with block-granularity copy-on-write used when a shared block must be modified. The original paper applied this design to parallel sampling, beam search, and prompts with shared prefixes. == Mathematical formulation == For a query token i {\displaystyle i} in causal self-attention, the standard attention output can be written as a i j = exp ⁡ ( q i ⊤ k j / d ) ∑ t = 1 i exp ⁡ ( q i ⊤ k t / d ) , o i = ∑ j = 1 i a i j v j {\displaystyle a_{ij}={\frac {\exp(\mathbf {q} _{i}^{\top }\mathbf {k} _{j}/{\sqrt {d}})}{\sum _{t=1}^{i}\exp(\mathbf {q} _{i}^{\top }\mathbf {k} _{t}/{\sqrt {d}})}},\;\mathbf {o} _{i}=\sum _{j=1}^{i}a_{ij}\mathbf {v} _{j}} where q i {\displaystyle \mathbf {q} _{i}} , k j {\displaystyle \mathbf {k} _{j}} , and v j {\displaystyle \mathbf {v} _{j}} are the query, key, and value vectors, and d {\displaystyle d} is the attention dimension. If the cache is partitioned into blocks of size B {\displaystyle B} , the key and value blocks may be written as K j = ( k ( j − 1 ) B + 1 , … , k j B ) , V j = ( v ( j − 1 ) B + 1 , … , v j B ) {\displaystyle \mathbf {K} _{j}=(\mathbf {k} _{(j-1)B+1},\ldots ,\mathbf {k} _{jB}),\;\mathbf {V} _{j}=(\mathbf {v} _{(j-1)B+1},\ldots ,\mathbf {v} _{jB})} PagedAttention then performs the computation blockwise: A i j = exp ⁡ ( q i ⊤ K j / d ) ∑ t = 1 ⌈ i / B ⌉ exp ⁡ ( q i ⊤ K t / d ) , o i = ∑ j = 1 ⌈ i / B ⌉ V j A i j ⊤ {\displaystyle \mathbf {A} _{ij}={\frac {\exp(\mathbf {q} _{i}^{\top }\mathbf {K} _{j}/{\sqrt {d}})}{\sum _{t=1}^{\lceil i/B\rceil }\exp(\mathbf {q} _{i}^{\top }\mathbf {K} _{t}/{\sqrt {d}})}},\;\mathbf {o} _{i}=\sum _{j=1}^{\lceil i/B\rceil }\mathbf {V} _{j}\mathbf {A} _{ij}^{\top }} where A i j {\displaystyle \mathbf {A} _{ij}} is the vector of attention scores for the j {\displaystyle j} -th KV block. In the formulation given by Kwon et al., this preserves the causal attention calculation while allowing the key and value blocks to reside in non-contiguous physical memory. == Performance and use == The vLLM paper reported that, on its evaluated workloads, the use of PagedAttention and the associated memory-management design improved serving throughput by 2–4× over the compared baselines, including FasterTransformer and Orca, while preserving model outputs. In experiments on OPT-13B with the Alpaca trace, the paper also reported memory savings of 6.1–9.8% for parallel sampling and 37.6–55.2% for beam search through KV-block sharing. A 2024 survey of LLM serving systems described PagedAttention as having become an industry norm in LLM serving frameworks, citing support in TGI, vLLM, and TensorRT-LLM. == Limitations and alternatives == Subsequent work has described trade-offs in the approach. The 2025 vAttention paper argued that PagedAttention requires attention kernels to be rewritten to support paging and increases software complexity, portability issues, redundancy, and execution overhead, proposing instead a memory manager that keeps the cache contiguous in virtual memory while relying on demand paging for physical allocation. === vAttention === Unlike PagedAttention, vAttention does not introduce a different attention rule; it retains the standard attention computation Attention ⁡ ( q i , K , V ) = softmax ⁡ ( q i K ⊤ s c a l e ) V . {\displaystyle \operatorname {Attention} (q_{i},K,V)=\operatorname {softmax} \left({\frac {q_{i}K^{\top }}{\mathrm {scale} }}\right)V.} In the notation of Prabhu et al., the key and value tensors for a request seen so far are K , V ∈ R L ′ × ( H × D ) {\displaystyle K,V\in \mathbb {R} ^{L'\times (H\times D)}} , where L ′ {\displaystyle L'} is the context length seen so far, H {\displaystyle H} is the number of KV heads on a worker, and D {\displaystyle D} is the dimension of each KV head. In systems prior to PagedAttention, the K cache (or V cache) at each layer of a worker is typically allocated as a 4D tensor of shape [ B , L , H , D ] , {\displaystyle [B,L,H,D],} where B {\displaystyle B} is batch size and L {\displaystyle L} is the maximum context length supported by the model. vAttention preserves this contiguous virtual-memory view while deferring physical-memory allocation to runtime. A serving framework maintains separate K and V tensors for each layer, so vAttention reserves 2 N {\displaystyle 2N} virtual-memory buffers on a worker, where N {\displaystyle N} is the number of layers managed by that worker. The maximum size of one virtual-memory buffer is B S = B × S , {\displaystyle BS=B\times S,} where S {\displaystyle S} is the maximum size of a single request's per-layer K cache (or V cache) on a worker. The paper defines S = L × H × D × P , {\displaystyle S=L\times H\times D\times P,} where P {\displaystyle P} is the number of bytes needed to store one element. In this formulation, vAttention keeps the KV cache contiguous in virtual memory and relies on demand paging for physical allocation, rather than modifying the attention kernel to operate over non-contiguous KV-cache blocks.

Higuchi dimension

In fractal geometry, the Higuchi dimension (or Higuchi fractal dimension (HFD)) is an approximate value for the box-counting dimension of the graph of a real-valued function or time series. This value is obtained via an algorithmic approximation so one also talks about the Higuchi method. It has many applications in science and engineering and has been applied to subjects like characterizing primary waves in seismograms, clinical neurophysiology and analyzing changes in the electroencephalogram in Alzheimer's disease. == Formulation of the method == The original formulation of the method is due to T. Higuchi. Given a time series X : { 1 , … , N } → R {\displaystyle X:\{1,\dots ,N\}\to \mathbb {R} } consisting of N {\displaystyle N} data points and a parameter k m a x ≥ 2 {\displaystyle k_{\mathrm {max} }\geq 2} the Higuchi Fractal dimension (HFD) of X {\displaystyle X} is calculated in the following way: For each k ∈ { 1 , … , k m a x } {\displaystyle k\in \{1,\dots ,k_{\mathrm {max} }}\} and m ∈ { 1 , … , k } {\displaystyle m\in \{1,\dots ,k}\} define the length L m ( k ) {\displaystyle L_{m}(k)} by L m ( k ) = N − 1 ⌊ N − m k ⌋ k 2 ∑ i = 1 ⌊ N − m k ⌋ | X N ( m + i k ) − X N ( m + ( i − 1 ) k ) | . {\displaystyle L_{m}(k)={\frac {N-1}{\lfloor {\frac {N-m}{k}}\rfloor k^{2}}}\sum _{i=1}^{\lfloor {\frac {N-m}{k}}\rfloor }|X_{N}(m+ik)-X_{N}(m+(i-1)k)|.} The length L ( k ) {\displaystyle L(k)} is defined by the average value of the k {\displaystyle k} lengths L 1 ( k ) , … , L k ( k ) {\displaystyle L_{1}(k),\dots ,L_{k}(k)} , L ( k ) = 1 k ∑ m = 1 k L m ( k ) . {\displaystyle L(k)={\frac {1}{k}}\sum _{m=1}^{k}L_{m}(k).} The slope of the best-fitting linear function through the data points { ( log ⁡ 1 k , log ⁡ L ( k ) ) } {\displaystyle \left\{\left(\log {\frac {1}{k}},\log L(k)\right)\right\}} is defined to be the Higuchi fractal dimension of the time-series X {\displaystyle X} . == Application to functions == For a real-valued function f : [ 0 , 1 ] → R {\displaystyle f:[0,1]\to \mathbb {R} } one can partition the unit interval [ 0 , 1 ] {\displaystyle [0,1]} into N {\displaystyle N} equidistantly intervals [ t j , t j + 1 ) {\displaystyle [t_{j},t_{j+1})} and apply the Higuchi algorithm to the times series X ( j ) = f ( t j ) {\displaystyle X(j)=f(t_{j})} . This results into the Higuchi fractal dimension of the function f {\displaystyle f} . It was shown that in this case the Higuchi method yields an approximation for the box-counting dimension of the graph of f {\displaystyle f} as it follows a geometrical approach (see Liehr & Massopust 2020). == Robustness and stability == Applications to fractional Brownian functions and the Weierstrass function reveal that the Higuchi fractal dimension can be close to the box-dimension. On the other hand, the method can be unstable in the case where the data X ( 1 ) , … , X ( N ) {\displaystyle X(1),\dots ,X(N)} are periodic or if subsets of it lie on a horizontal line (see Liehr & Massopust 2020).

Algorithm characterizations

Algorithm characterizations are attempts to formalize the word algorithm. Algorithm does not have a generally accepted formal definition. Researchers are actively working on this problem. This article will present some of the "characterizations" of the notion of "algorithm" in more detail. == The problem of definition == Over the last 200 years, the definition of the algorithm has become more complicated and detailed as researchers have tried to pin down the term. Indeed, there may be more than one type of "algorithm". But most agree that algorithm has something to do with defining generalized processes for the creation of "output" integers from other "input" integers – "input parameters" arbitrary and infinite in extent, or limited in extent but still variable—by the manipulation of distinguishable symbols (counting numbers) with finite collections of rules that a person can perform with paper and pencil. The most common number-manipulation schemes—both in formal mathematics and in routine life—are: (1) the recursive functions calculated by a person with paper and pencil, and (2) the Turing machine or its Turing equivalents—the primitive register-machine or "counter-machine" model, the random-access machine model (RAM), the random-access stored-program machine model (RASP) and its functional equivalent "the computer". When we are doing "arithmetic" we are really calculating by the use of "recursive functions" in the shorthand algorithms we learned in grade school, for example, adding and subtracting. The proofs that every "recursive function" we can calculate by hand we can compute by machine and vice versa—note the usage of the words calculate versus compute—is remarkable. But this equivalence together with the thesis (unproven assertion) that this includes every calculation/computation indicates why so much emphasis has been placed upon the use of Turing-equivalent machines in the definition of specific algorithms, and why the definition of "algorithm" itself often refers back to "the Turing machine". This is discussed in more detail under Stephen Kleene's characterization. The following are summaries of the more famous characterizations (Kleene, Markov, Knuth) together with those that introduce novel elements—elements that further expand the definition or contribute to a more precise definition. [ A mathematical problem and its result can be considered as two points in a space, and the solution consists of a sequence of steps or a path linking them. Quality of the solution is a function of the path. There might be more than one attribute defined for the path, e.g. length, complexity of shape, an ease of generalizing, difficulty, and so on. ] == Chomsky hierarchy == There is more consensus on the "characterization" of the notion of "simple algorithm". All algorithms need to be specified in a formal language, and the "simplicity notion" arises from the simplicity of the language. The Chomsky (1956) hierarchy is a containment hierarchy of classes of formal grammars that generate formal languages. It is used for classifying of programming languages and abstract machines. From the Chomsky hierarchy perspective, if the algorithm can be specified on a simpler language (than unrestricted), it can be characterized by this kind of language, else it is a typical "unrestricted algorithm". Examples: a "general purpose" macro language, like M4 is unrestricted (Turing complete), but the C preprocessor macro language is not, so any algorithm expressed in C preprocessor is a "simple algorithm". See also Relationships between complexity classes. == Features of a good algorithm == The following are desirable features of a well-defined algorithm, as discussed in Scheider and Gersting (1995): Unambiguous Operations: an algorithm must have specific, outlined steps. The steps should be exact enough to precisely specify what to do at each step. Well-Ordered: The exact order of operations performed in an algorithm should be concretely defined. Feasibility: All steps of an algorithm should be possible (also known as effectively computable). Input: an algorithm should be able to accept a well-defined set of inputs. Output: an algorithm should produce some result as an output, so that its correctness can be reasoned about. Finiteness: an algorithm should terminate after a finite number of instructions. Properties of specific algorithms that may be desirable include space and time efficiency, generality (i.e. being able to handle many inputs), or determinism. == 1881 John Venn's negative reaction to W. Stanley Jevons's Logical Machine of 1870 == In early 1870 W. Stanley Jevons presented a "Logical Machine" (Jevons 1880:200) for analyzing a syllogism or other logical form e.g. an argument reduced to a Boolean equation. By means of what Couturat (1914) called a "sort of logical piano [,] ... the equalities which represent the premises ... are "played" on a keyboard like that of a typewriter. ... When all the premises have been "played", the panel shows only those constituents whose sum is equal to 1, that is, ... its logical whole. This mechanical method has the advantage over VENN's geometrical method..." (Couturat 1914:75). For his part John Venn, a logician contemporary to Jevons, was less than thrilled, opining that "it does not seem to me that any contrivances at present known or likely to be discovered really deserve the name of logical machines" (italics added, Venn 1881:120). But of historical use to the developing notion of "algorithm" is his explanation for his negative reaction with respect to a machine that "may subserve a really valuable purpose by enabling us to avoid otherwise inevitable labor": (1) "There is, first, the statement of our data in accurate logical language", (2) "Then secondly, we have to throw these statements into a form fit for the engine to work with – in this case the reduction of each proposition to its elementary denials", (3) "Thirdly, there is the combination or further treatment of our premises after such reduction," (4) "Finally, the results have to be interpreted or read off. This last generally gives rise to much opening for skill and sagacity." He concludes that "I cannot see that any machine can hope to help us except in the third of these steps; so that it seems very doubtful whether any thing of this sort really deserves the name of a logical engine."(Venn 1881:119–121). == 1943, 1952 Stephen Kleene's characterization == This section is longer and more detailed than the others because of its importance to the topic: Kleene was the first to propose that all calculations/computations—of every sort, the totality of—can equivalently be (i) calculated by use of five "primitive recursive operators" plus one special operator called the mu-operator, or be (ii) computed by the actions of a Turing machine or an equivalent model. Furthermore, he opined that either of these would stand as a definition of algorithm. A reader first confronting the words that follow may well be confused, so a brief explanation is in order. Calculation means done by hand, computation means done by Turing machine (or equivalent). (Sometimes an author slips and interchanges the words). A "function" can be thought of as an "input-output box" into which a person puts natural numbers called "arguments" or "parameters" (but only the counting numbers including 0—the nonnegative integers) and gets out a single nonnegative integer (conventionally called "the answer"). Think of the "function-box" as a little man either calculating by hand using "general recursion" or computing by Turing machine (or an equivalent machine). "Effectively calculable/computable" is more generic and means "calculable/computable by some procedure, method, technique ... whatever...". "General recursive" was Kleene's way of writing what today is called just "recursion"; however, "primitive recursion"—calculation by use of the five recursive operators—is a lesser form of recursion that lacks access to the sixth, additional, mu-operator that is needed only in rare instances. Thus most of life goes on requiring only the "primitive recursive functions." === 1943 "Thesis I", 1952 "Church's Thesis" === In 1943 Kleene proposed what has come to be known as Church's thesis: "Thesis I. Every effectively calculable function (effectively decidable predicate) is general recursive" (First stated by Kleene in 1943 (reprinted page 274 in Davis, ed. The Undecidable; appears also verbatim in Kleene (1952) p.300) In a nutshell: to calculate any function the only operations a person needs (technically, formally) are the 6 primitive operators of "general" recursion (nowadays called the operators of the mu recursive functions). Kleene's first statement of this was under the section title "12. Algorithmic theories". He would later amplify it in his text (1952) as follows: "Thesis I and its converse provide the exact definition of the notion of a calculation (decision) procedure or algorithm, for the

Affective computing

Affective computing is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects. It is an interdisciplinary field spanning computer science, psychology, and cognitive science. While some core ideas in the field may be traced as far back as to early philosophical inquiries into emotion, the modern idea originated with Rosalind Picard's 1995 paper entitled "Affective Computing" and her 1997 book of the same name published by MIT Press. One motivation for researching affective computing is the ability to give machines emotional intelligence, including simulating empathy. The goal is that a machine should interpret the emotional state of humans and adapt its behavior to those emotions, responding appropriately. Recent experimental research has shown that subtle affective haptic feedback can shape human reward learning and mobile interaction behavior, suggesting that affective computing systems may not only interpret emotional states but also actively modulate user actions through emotion-laden outputs. == Areas == === Detecting and recognizing emotional information === Detecting emotional information usually begins with passive sensors that capture data about the user's physical state or behavior without interpreting the input. The data gathered is analogous to the cues humans use to perceive emotions in others. For example, a video camera might capture facial expressions, body posture, and gestures, while a microphone might capture speech. Other sensors detect emotional cues by directly measuring physiological data, such as skin temperature and galvanic resistance. Recognizing emotional information requires the extraction of meaningful patterns from the gathered data. This is done using machine learning techniques that process different modalities, such as speech recognition, natural language processing, or facial expression detection. The goal of most of these techniques is to produce labels that would match the labels a human would give in the same situation. For example, if a person makes a facial expression furrowing their brow, then the computer vision system might be trained to label their face as appearing "confused" or as "concentrating" or "slightly negative" (as opposed to positive, which it might say if they were smiling in a happy-appearing way). This response is based on the data used to train the system. These labels may or may not correspond to what the person is actually feeling. === Emotion in machines === Another area within affective computing is the design of computational devices proposed to exhibit either innate emotional capabilities or that are capable of convincingly simulating emotions. A more practical approach, based on current technological capabilities, is the simulation of emotions in conversational agents in order to enrich and facilitate interactivity between human and machine. Marvin Minsky, one of the pioneering computer scientists in artificial intelligence, relates emotions to the broader issues of machine intelligence stating in The Emotion Machine that emotion is "not especially different from the processes that we call 'thinking.'" The innovative approach "digital humans" or virtual humans includes an attempt to give these programs, which simulate humans, an emotional dimension as well, including reactions, facial expressions, and gestures in accordance with the reaction that a real person would have in certain emotionally stimulating situations. Emotion in machines often refers to emotion in computational, often AI-based, systems. As a result, the terms 'emotional AI' is being used. Some modern large language models simulate emotions in their chats with humans. ChatGPT's simulated emotion leans more positive than that of most human responses. == Technologies == In psychology, cognitive science, and in neuroscience, there have been two main approaches for describing how humans perceive and classify emotion: continuous or categorical. The continuous approach tends to use dimensions such as negative vs. positive, calm vs. aroused. The categorical approach tends to use discrete classes such as happy, sad, angry, fearful, surprise, and disgust. Different kinds of machine learning regression and classification models are used for machines to produce continuous or discrete labels. Sometimes, models are also built that allow combinations across the categories (e.g. a happy-surprised face or a fearful-surprised face). The following sections consider many of the kinds of input data used for the task of emotion recognition. === Emotional speech === Various changes in the autonomic nervous system can indirectly alter a person's speech, and affective technologies can leverage this information to recognize emotion. For example, speech produced in a state of fear, anger, or joy becomes fast, loud, and precisely enunciated, with a higher and wider range in pitch, whereas emotions such as tiredness, boredom, or sadness tend to generate slow, low-pitched, and slurred speech. Some emotions have been found to be more easily computationally identified, such as anger or approval. Emotional speech processing technologies recognize the user's emotional state using computational analysis of speech features. Vocal parameters and prosodic features such as pitch variables and speech rate can be analyzed through pattern recognition techniques. Speech analysis is an effective method of identifying affective state, having an average reported accuracy of 70-80% in research from 2003 and 2006. These systems tend to outperform average human accuracy (approximately 60%) but are less accurate than systems which employ other modalities for emotion detection, such as physiological states or facial expressions. However, since many speech characteristics are independent of semantics or culture, this technique is considered to be a promising route for further research. ==== Algorithms ==== The process of speech/text affect detection requires the creation of a reliable database, knowledge base, or vector space model, broad enough to fit every need for its application, as well as the selection of a successful classifier which will allow for quick and accurate emotion identification. As of 2010, the most frequently used classifiers were linear discriminant classifiers (LDC), k-nearest neighbor (k-NN), Gaussian mixture model (GMM), support vector machines (SVM), artificial neural networks (ANN), decision tree algorithms, and hidden Markov models (HMMs). Various studies showed that choosing the appropriate classifier can significantly enhance the overall performance of the system. The list below gives a brief description of each algorithm: LDC – Classification happens based on the value obtained from the linear combination of the feature values, which are usually provided in the form of vector features. k-NN – Classification happens by locating the object in the feature space, and comparing it with the k nearest neighbors (training examples). The majority vote decides on the classification. GMM – A probabilistic model used for representing the existence of subpopulations within the overall population. Each sub-population is described using the mixture distribution, which allows for classification of observations into the sub-populations. SVM – A type of (usually binary) linear classifier which decides in which of the two (or more) possible classes, each input may fall into. ANN – is a mathematical model, inspired by biological neural networks, that can better grasp possible non-linearities of the feature space. Decision tree algorithms – work based on following a decision tree in which leaves represent the classification outcome, and branches represent the conjunction of subsequent features that lead to the classification. HMMs – a statistical Markov model in which the states and state transitions are not directly available to observation. Instead, the series of outputs dependent on the states are visible. In the case of affect recognition, the outputs represent the sequence of speech feature vectors, which allow the deduction of states' sequences through which the model progressed. The states can consist of various intermediate steps in the expression of an emotion, and each of them has a probability distribution over the possible output vectors. The states' sequences allow us to predict the affective state which we are trying to classify, and this is one of the most commonly used techniques within the area of speech affect detection. It has been proven that having enough acoustic evidence available the emotional state of a person can be classified by a set of majority voting classifiers. The proposed set of classifiers is based on three main classifiers: kNN, C4.5 and SVM-RBF Kernel. This set achieves better performance than each basic classifier taken separately. It is compared with two other sets of classifiers: one-against-all (OAA) multiclass SVM with Hybrid kernels and th

Image destriping

Image destriping is the process of removing stripes or streaks from images and videos without disrupting the original image/video. These artifacts plague a range of fields in scientific imaging including atomic force microscopy, light sheet fluorescence microscopy, and planetary satellite imaging. The most common image processing techniques to reduce stripe artifacts is with Fourier filtering. Unfortunately, filtering methods risk altering or suppressing useful image data. Methods developed for multiple-sensor imaging systems in planetary satellites use statistical-based methods to match signal distribution across multiple sensors. More recently, a new class of approaches leverage compressed sensing, to regularize an optimization problem, and recover stripe free images. In many cases, these destriped images have little to no artifacts, even at low signal to noise ratios.

Microsoft Office PerformancePoint Server

Microsoft Office PerformancePoint Server is a business intelligence software product released in 2007 by Microsoft. The product was generally an integration of the acquisitions from ProClarity - the Planning Server and Monitoring Server - into Microsoft's SharePoint server product line. Although discontinued in 2009, the dashboard, scorecard, and analytics capabilities of PerformancePoint Server were incorporated into SharePoint 2010 and later versions. PerformancePoint Server also provided a planning and budgeting component directly integrated with Excel. == History == Microsoft offered preview releases of PerformancePoint Server starting in mid-2006. Previews of the product were formed from Business Scorecard Manager 2005 and the Planning Server component. Acquisitions ProClarity and Great Plains brought additional analytics and planning/reporting capabilities, as well as companion products ProClarity 6.3 and FRx. PerformancePoint Server was officially released in November 2007. Microsoft discontinued PerformancePoint Server as an independent product in 2009 and folded its dashboard, scorecard and analytics capabilities into PerformancePoint Services in SharePoint Server 2010. == Monitoring Server Component == Business monitoring capabilities, including dashboards, scorecards & key performance indicators, navigable reports for deeper analysis, strategy maps, and linked filtering, are provided by PerformancePoint's Monitoring Server component. A Dashboard Designer application that is distributed from Monitoring Server enables business analysts or IT Administrators to: create & test data source connections create views that use those data connections assemble the views into a dashboard deploy the dashboard as a SharePoint page Dashboard Designer saved content and security information back to the Monitoring Server. Data source connections, such as OLAP cubes or relational tables, were also made through Monitoring Server. After a dashboard has been published to the Monitoring Server database, it would be deployed as a SharePoint page and shared with other users as such. When the pages were opened in a web browser, Monitoring Server updated the data in the views by connecting back to the original data sources. == Planning Server Component == PerformancePoint's Planning Server component supported maintenance of logical business models, budget & approval workflows, enterprise data sources, and it followed Generally Accepted Accounting Principles. Planning Server made use of Excel for input and line-of-business reporting, as well as SQL Server for storing and processing business models. == Management Reporter Component == The Management Reporter component was designed to perform financial reporting and can read PerformancePoint Planning models directly. A development kit was also available to allow this component to read other models.