Color normalization

Color normalization is a topic in computer vision concerned with artificial color vision and object recognition. In general, the distribution of color values in an image depends on the illumination, which may vary with lighting conditions, cameras, and other factors. Color normalization allows object recognition techniques based on color to compensate for these variations.

== Main concepts ==

=== Color constancy ===

Color constancy is a feature of the human internal model of perception, which gives humans the ability to assign a relatively constant color to objects even under different illumination conditions. This is helpful for object recognition as well as for identifying light sources in an environment. For example, humans see an object as approximately the same color whether the sun is bright or dim.

=== Applications ===

Color normalization has been used for object recognition on color images in the fields of robotics, bioinformatics and general artificial intelligence, when it is important to remove all intensity values from the image while preserving color values. One example is a scene shot by a surveillance camera over the course of a day, where it is important to remove shadows or lighting changes on same-color pixels and recognize the people who passed. Another example is automated screening tools used for the detection of diabetic retinopathy as well as the molecular diagnosis of cancer states, where it is important to include color information during classification.

== Known issues ==

The main issue with certain applications of color normalization is that the result looks unnatural or too distant from the original colors. Where the variation between important aspects is subtle, this can be problematic. More specifically, a side effect can be that pixel values diverge and no longer reflect the actual color values of the image.
One way to combat this issue is to use color normalization in combination with thresholding to segment a colored image correctly and consistently.

== Transformations and algorithms ==

There is a vast array of transformations and algorithms for achieving color normalization; a limited list is presented here. The performance of an algorithm depends on the task: an algorithm that performs better than another in one task might perform worse in a different one (no free lunch theorem). Additionally, the choice of algorithm depends on the user's preferences for the end result, e.g. they may want a more natural-looking color image.

=== Grey world ===

Grey world normalization assumes that changes in the lighting spectrum can be modelled by three constant factors applied to the red, green and blue channels. More specifically, a change in illumination color can be modelled as scalings α, β and γ of the R, G and B color channels, and the grey world algorithm is invariant to such illumination color variations. A constancy solution can therefore be achieved by dividing each color channel by its average value, as shown in the following formula:

$$\left(\alpha R, \beta G, \gamma B\right) \rightarrow \left(\frac{\alpha R}{\frac{\alpha}{n}\sum_{i} R}, \frac{\beta G}{\frac{\beta}{n}\sum_{i} G}, \frac{\gamma B}{\frac{\gamma}{n}\sum_{i} B}\right)$$

where n is the number of pixels and each sum runs over all pixels i, so that each denominator is the channel average. As mentioned above, grey world color normalization is invariant to the illumination color variations α, β and γ; however, it has one important problem: it does not account for all variations of illumination intensity and it is not dynamic, so it fails when new objects appear in the scene. Several variants of the grey world algorithm address this problem. There is also an iterative variation of grey world normalization, but it was not found to perform significantly better.
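The grey world formula can be illustrated with a minimal sketch. The function names `grey_world` and `scale_channels`, the list-of-tuples image representation, and the sample values are assumptions made for this illustration only; the sketch also assumes every channel has a non-zero mean.

```python
def grey_world(pixels):
    """Divide each channel by its mean over the whole image.

    `pixels` is a list of (R, G, B) tuples; every channel mean is
    assumed to be non-zero.
    """
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    return [tuple(p[c] / means[c] for c in range(3)) for p in pixels]


def scale_channels(pixels, alpha, beta, gamma):
    """Hypothetical helper: model an illuminant change as per-channel scaling."""
    return [(alpha * r, beta * g, gamma * b) for r, g, b in pixels]


image = [(10.0, 20.0, 30.0), (30.0, 60.0, 90.0), (20.0, 40.0, 60.0)]
relit = scale_channels(image, 2.0, 0.5, 1.5)

normalized = grey_world(image)
normalized_relit = grey_world(relit)
```

Because the factors α, β and γ appear in both the numerator and the channel average, they cancel, so `normalized` and `normalized_relit` agree up to floating-point rounding.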
=== Histogram equalization ===

Histogram equalization is a non-linear transform which maintains pixel rank and is capable of normalizing for any monotonically increasing color transform function. It is considered to be a more powerful normalization transformation than the grey world method. The results of histogram equalization tend to have an exaggerated blue channel and look unnatural, because in most images the distribution of pixel values is closer to a Gaussian distribution than to a uniform one.

=== Histogram specification ===

Histogram specification transforms the red, green and blue histograms to match the shapes of three specific histograms, rather than simply equalizing them. It refers to a class of image transforms which aim to obtain images whose histograms have a desired shape. The first step is to convert the image so that it has a particular histogram. Assume an image x. The equalization transform of this image is:

$$y = f(x) = \int_{0}^{x} p_{x}(u)\,du$$

Then assume the wanted image z. The equalization transform of this image is:

$$y' = g(z) = \int_{0}^{z} p_{z}(u)\,du$$

Here $p_{z}(u)$ is the histogram of the output image. The inverse of the above transform is:

$$z = g^{-1}(y')$$

Since the images y and y' have the same equalized histogram, they are actually the same image, meaning y = y', and the transform from the given image x to the wanted image z is:

$$z = g^{-1}(y') = g^{-1}(y) = g^{-1}(f(x))$$

Histogram specification has the advantage of producing more realistic-looking images, as it does not exaggerate the blue channel the way histogram equalization does.
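A discrete version of the mapping $z = g^{-1}(f(x))$ can be sketched for a single channel as follows. The names `cdf` and `match_histogram`, the integer levels in the range 0 to 255, and the choice of the smallest level z with g(z) ≥ f(x) as a stand-in for the inverse are all assumptions of this illustration, not a definitive implementation. For a color image the same procedure would be applied to the red, green and blue channels separately.

```python
def cdf(values, levels=256):
    """Cumulative histogram of integer values in [0, levels), scaled to [0, 1]."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    out, running = [], 0
    for count in hist:
        running += count
        out.append(running / total)
    return out


def match_histogram(source, reference, levels=256):
    """Map `source` so that its histogram approximates that of `reference`."""
    f = cdf(source, levels)      # y  = f(x)
    g = cdf(reference, levels)   # y' = g(z)
    # Discrete stand-in for g^{-1}: the smallest level z with g(z) >= f(x).
    # f is non-decreasing, so z only ever needs to move forward.
    lookup, z = [], 0
    for x in range(levels):
        while z < levels - 1 and g[z] < f[x]:
            z += 1
        lookup.append(z)
    return [lookup[v] for v in source]


source = [0, 0, 1, 1, 2, 3]
reference = [10, 10, 20, 20, 30, 40]
matched = match_histogram(source, reference)
```

In this small example the source and reference have histograms of the same shape, so `matched` takes exactly the reference levels while keeping the rank order of the source pixels.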
=== Comprehensive Color Normalization ===

Comprehensive color normalization has been shown to improve localization and object classification results in combination with color indexing. It is an iterative algorithm which works in two stages. The first stage normalizes each pixel, using the red, green and blue color space with the intensity normalized. The second stage normalizes each color channel separately, so that the sum of each channel's values is equal to one third of the number of pixels. The iterations continue until convergence, meaning that no additional changes occur.

Formally, normalize the color image

$$f^{(t)} = \left[f_{ij}^{(t)}\right]_{i=1 \ldots N,\ j=1 \ldots M}$$

which consists of color vectors

$$f_{ij}^{(t)} = \left(r_{ij}^{(t)}, g_{ij}^{(t)}, b_{ij}^{(t)}\right)^{T}.$$

For the first step explained above, compute:

$$S_{ij} := r_{ij}^{(t)} + g_{ij}^{(t)} + b_{ij}^{(t)}$$

which leads to

$$r_{ij}^{(t+1)} = \frac{r_{ij}^{(t)}}{S_{ij}}, \quad g_{ij}^{(t+1)} = \frac{g_{ij}^{(t)}}{S_{ij}}, \quad b_{ij}^{(t+1)} = \frac{b_{ij}^{(t)}}{S_{ij}}.$$

For the second step explained above, compute:

$$r' = \frac{3}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} r_{ij}^{(t+1)}$$

and normalize

$$r_{ij}^{(t+2)} = \frac{r_{ij}^{(t+1)}}{r'}.$$

The same process is applied to the green and blue channels using g' and b'. These two steps are then repeated until the changes between iterations t and t+2 are less than some set threshold.
Comprehensive color normalization, just like the histogram equalization method previously mentioned, produces results that may look less natural due to the reduction in the number of color values.
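The two alternating steps of comprehensive color normalization can be sketched as below. The function name `comprehensive_normalize`, the flat list-of-tuples image layout (the N×M grid is flattened, which does not affect the sums), the tolerance `tol`, the iteration cap `max_iter` and the sample pixels are assumptions of this illustration; every pixel is assumed to have a positive channel sum.

```python
def comprehensive_normalize(pixels, tol=1e-9, max_iter=100):
    """Alternate pixel-wise and channel-wise normalization until stable.

    `pixels` is a flat list of (r, g, b) tuples with positive sums.
    """
    current = [tuple(float(c) for c in p) for p in pixels]
    n = len(current)
    for _ in range(max_iter):
        # Step 1: divide each pixel by S = r + g + b.
        step1 = [tuple(c / sum(p) for c in p) for p in current]
        # Step 2: divide each channel by (3 / n) times the channel sum,
        # so that every channel sums to n / 3 afterwards.
        scale = [3.0 / n * sum(p[c] for p in step1) for c in range(3)]
        step2 = [tuple(p[c] / scale[c] for c in range(3)) for p in step1]
        # Largest change between iteration t and t + 2.
        change = max(abs(a - b)
                     for p, q in zip(current, step2)
                     for a, b in zip(p, q))
        current = step2
        if change < tol:
            break
    return current


image = [(200, 100, 50), (50, 100, 200), (120, 130, 90), (30, 60, 10)]
result = comprehensive_normalize(image)
```

After the final channel step each channel of `result` sums to one third of the number of pixels, and once the iteration has settled each pixel's components also sum to approximately one.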

Text simplification

Text simplification is an aspect of natural language processing that involves modifying, organizing, or categorizing existing text to make it easier to understand while retaining its original meaning. This process is essential in today's world, where communication is increasingly complex due to advancements in science, technology, and media. Human languages are inherently intricate, with extensive vocabularies and complex structures that can be challenging for machines to handle efficiently. Researchers have found that semantic compression techniques can help streamline and simplify text by reducing linguistic diversity and simplifying the vocabulary used in a given context.

== Example ==

Text simplification involves modifying complex sentences into simpler ones to enhance readability and comprehension. Siddharthan (2006) provides an example to illustrate this process. The original sentence contains multiple clauses and phrases, which can be broken down into simpler sentences for better understanding.

Original sentence:

Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents, which precedes the full purchasing agents report that is due out today and gives an indication of what the full report might hold.

Simplified version:

Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents. The Chicago report precedes the full purchasing agents report. The Chicago report gives an indication of what the full report might hold. The full report is due out today.

One approach to text simplification is lexical simplification via lexical substitution, a process that replaces complex words with simpler synonyms. Identifying complex words is a challenge addressed by machine learning classifiers trained on labeled data. Researchers have found that asking labelers to sort words by complexity level yields more consistent results than the traditional method of categorizing words as simple or complex.

Structural similarity index measure

The structural similarity index measure (SSIM) is a method for predicting the perceived quality of digital television and cinematic pictures, as well as other kinds of digital images and videos. It is also used for measuring the similarity between two images. The SSIM index is a full reference metric; in other words, the measurement or prediction of image quality is based on an initial uncompressed or distortion-free image as reference.

SSIM is a perception-based model that considers image degradation as perceived change in structural information, while also incorporating important perceptual phenomena, including both luminance masking and contrast masking terms. This distinguishes it from other techniques such as mean squared error (MSE) or peak signal-to-noise ratio (PSNR), which instead estimate absolute errors. Structural information is the idea that pixels have strong inter-dependencies, especially when they are spatially close. These dependencies carry important information about the structure of the objects in the visual scene. Luminance masking is a phenomenon whereby image distortions (in this context) tend to be less visible in bright regions, while contrast masking is a phenomenon whereby distortions become less visible where there is significant activity or "texture" in the image.

== History ==

The predecessor of SSIM was called the Universal Quality Index (UQI), or Wang–Bovik index, which was developed by Zhou Wang and Alan Bovik in 2001. This evolved, through their collaboration with Hamid Sheikh and Eero Simoncelli, into the current version of SSIM, which was published in April 2004 in the IEEE Transactions on Image Processing. In addition to defining the SSIM quality index, the paper provides a general context for developing and evaluating perceptual quality measures, including connections to human visual neurobiology and perception, and direct validation of the index against human subject ratings.
The basic model was developed in the Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin and further developed jointly with the Laboratory for Computational Vision (LCV) at New York University. Further variants of the model have been developed in the Image and Visual Computing Laboratory at University of Waterloo and have been commercially marketed. SSIM subsequently found strong adoption in the image processing community and in the television and social media industries. The 2004 SSIM paper has been cited over 50,000 times according to Google Scholar, making it one of the highest cited papers in the image processing and video engineering fields. It was recognized with the IEEE Signal Processing Society Best Paper Award for 2009. It also received the IEEE Signal Processing Society Sustained Impact Award for 2016, indicative of a paper having an unusually high impact for at least 10 years following its publication. Because of its high adoption by the television industry, the authors of the original SSIM paper were each accorded a Primetime Engineering Emmy Award in 2015 by the Television Academy.

== Algorithm ==

The SSIM index is calculated between two windows of pixel values $x$ and $y$ of common size, from corresponding locations in two images to be compared. These SSIM values can be aggregated across the full images by averaging or other variations.
=== Special-case formula ===

In one simple special case, further explained in the next section, the SSIM measure between $x$ and $y$ is:

$$\text{SSIM}(x,y) = \frac{(2\mu_{x}\mu_{y} + c_{1})(2\sigma_{xy} + c_{2})}{(\mu_{x}^{2} + \mu_{y}^{2} + c_{1})(\sigma_{x}^{2} + \sigma_{y}^{2} + c_{2})}$$

with:

$\mu_{x}$ the pixel sample mean of $x$;
$\mu_{y}$ the pixel sample mean of $y$;
$\sigma_{x}^{2}$ the sample variance of $x$;
$\sigma_{y}^{2}$ the sample variance of $y$;
$\sigma_{xy}$ the sample covariance of $x$ and $y$;
$c_{1} = (k_{1}L)^{2}$, $c_{2} = (k_{2}L)^{2}$ two variables to stabilize the division with weak denominator;
$L$ the dynamic range of the pixel-values (typically this is $2^{\#\text{bits per pixel}} - 1$);
$k_{1} = 0.01$ and $k_{2} = 0.03$ by default.

=== General formula and components ===

The SSIM formula is based on three comparison measurements between the samples of $x$ and $y$: luminance ($l$), contrast ($c$), and structure ($s$).
The individual comparison functions are:

$$l(x,y) = \frac{2\mu_{x}\mu_{y} + c_{1}}{\mu_{x}^{2} + \mu_{y}^{2} + c_{1}}$$

$$c(x,y) = \frac{2\sigma_{x}\sigma_{y} + c_{2}}{\sigma_{x}^{2} + \sigma_{y}^{2} + c_{2}}$$

$$s(x,y) = \frac{\sigma_{xy} + c_{3}}{\sigma_{x}\sigma_{y} + c_{3}}$$

The SSIM for each block is then a weighted combination of those comparative measures:

$$\text{SSIM}(x,y) = l(x,y)^{\alpha} \cdot c(x,y)^{\beta} \cdot s(x,y)^{\gamma}$$

Choosing the third denominator-stabilizing constant as

$$c_{3} = c_{2}/2$$

leads to a simplification when combining the c and s components with equal exponents ($\beta = \gamma$), as the numerator of c is then twice the denominator of s, leading to a cancellation that leaves just a factor of 2:

$$c(x,y)\,s(x,y) = \frac{2(\sigma_{x}\sigma_{y} + c_{3})}{\sigma_{x}^{2} + \sigma_{y}^{2} + c_{2}} \cdot \frac{\sigma_{xy} + c_{3}}{\sigma_{x}\sigma_{y} + c_{3}} = \frac{2\sigma_{xy} + c_{2}}{\sigma_{x}^{2} + \sigma_{y}^{2} + c_{2}}$$

Setting the weights (exponents) $\alpha, \beta, \gamma$ to 1, the formula can then be reduced to the special case shown above.

=== Mathematical properties ===

SSIM satisfies the identity of indiscernibles and symmetry properties, but not the triangle inequality or non-negativity, and thus is not a distance function. However, under certain conditions, SSIM may be converted to a normalized root MSE measure, which is a distance function. The square of such a function is not convex, but is locally convex and quasiconvex, making SSIM a feasible target for optimization.

=== Application of the formula ===

In order to evaluate the image quality, this formula is usually applied only on luma, although it may also be applied on color (e.g., RGB) values or chromatic (e.g. YCbCr) values.
The resultant SSIM index is a decimal value between -1 and 1, where 1 indicates perfect similarity, 0 indicates no similarity, and -1 indicates perfect anti-correlation. For an image, it is typically calculated using a sliding Gaussian window of size 11×11 or a block window of size 8×8. The window can be displaced pixel-by-pixel on the image to create an SSIM quality map of the image. In the case of video quality assessment, the authors propose to use only a subgroup of the possible windows to reduce the complexity of the calculation.

=== Variants ===

==== Multi-scale SSIM ====

A more advanced form of SSIM, called Multiscale SSIM (MS-SSIM), is conducted over multiple scales through a process of multiple stages of sub-sampling, reminiscent of multiscale processing in the early vision system. It has been shown to perform as well as or better than SSIM on different subjective image and video databases.

==== Multi-component SSIM ====

Three-component SSIM (3-SSIM) is a form of SSIM that takes into account the fact that the human eye can see differences more precisely on textured or edge regions than on smooth regions. The resulting metric is calculated as a weighted average of SSIM for three categories of regions: edges, textures, and smooth regions. The proposed weighting is 0.5 for edges and 0.25 each for the textured and smooth regions. The authors mention that a 1/0/0 weighting (ignoring anything but edge distortions) leads to results that are closer to subjective ratings. This suggests that edge regions play a dominant role in image quality perception.

The authors of 3-SSIM have also extended the model into four-component SSIM (4-SSIM). The edge types are further subdivided into preserved and changed edges by their distortion status. The proposed weighting is 0.25 for all four components.
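Returning to the special-case formula given earlier, a single-window SSIM computation can be sketched as follows. The function name `ssim_window`, the flat-list window representation, the use of unweighted unbiased (n − 1) sample statistics rather than a Gaussian-weighted window, and the sample pixel values are assumptions of this illustration, not a reference implementation.

```python
def ssim_window(x, y, bits_per_pixel=8, k1=0.01, k2=0.03):
    """SSIM between two equally sized windows given as flat lists of pixel values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("windows must have the same size (at least 2 pixels)")
    n = len(x)
    dynamic_range = 2 ** bits_per_pixel - 1   # L in the formula
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2

    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Unbiased (n - 1) estimates; this choice is an assumption of the sketch.
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


window = [52, 55, 61, 59, 79, 61, 76, 61]
inverted = [255 - v for v in window]

same = ssim_window(window, window)       # identical windows
flipped = ssim_window(window, inverted)  # negatively correlated windows
```

Comparing a window with itself gives a value of 1, while the inverted window, whose covariance with the original is negative, gives a negative score. A full-image score would aggregate such values over sliding windows, which this sketch leaves out.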
==== Structural dissimilarity ====

Structural dissimilarity (DSSIM) may be derived from SSIM, though it does not constitute a distance function as the triangle inequality is not necessarily satisfied.

$$\text{DSSIM}(x,y) = \frac{1 - \text{SSIM}(x,y)}{2}$$

==== Video quality metrics and temporal variants ====

It is worth noting that the original vers

Apptek

Applications Technology (AppTek) is a U.S. company headquartered in McLean, Virginia that specializes in artificial intelligence and machine learning for human language technologies. The company provides both managed and professional services for natural language processing (NLP) technologies including automatic speech recognition (ASR), neural machine translation (MT), natural-language understanding (NLU) and neural speech synthesis. AppTek's Head of Science, Prof. Dr.-Ing. Hermann Ney, was awarded the IEEE James L. Flanagan Speech and Audio Processing Award in 2019 and the ISCA Medal for Scientific Achievement in 2021 for his work in natural language processing.

== History ==

AppTek was acquired in 1998 by Lernout & Hauspie (at the time a publicly traded company on NASDAQ). AppTek then organized a management buy-out and went private again in 2001. In 2014, the company sold its hybrid machine translation technology to eBay and has since rebuilt the platform on modern neural approaches to machine translation. In 2020, SOSi acquired a non-controlling interest in AppTek and became an exclusive reseller of AppTek products for U.S. federal, state, and local government entities.

Beauty.AI

Beauty.AI is a mobile beauty pageant for humans and a contest for programmers developing algorithms for evaluating human appearance. The mobile app and website, created by Youth Laboratories, use artificial intelligence technology to evaluate people's external appearance using criteria such as symmetry, facial blemishes, wrinkles, estimated age and age appearance, and comparisons to actors and models. The Beauty.AI 2.0 contest caused great concern over important ethical issues with deep neural networks, such as age, race and gender bias, and led to the creation of the Diversity.AI think tank, dedicated to developing new methods for uncovering and managing bias in artificially intelligent systems. Beauty.AI was also an attempt to find out how machines can perceive the human face by evaluating particular features commonly associated with health and beauty.

== Concept ==

The Beauty.AI app was created by Youth Laboratories, a company based in Russia and Hong Kong that focuses on facial skin analytics. The bioinformatics company Insilico Medicine assists with the Beauty.AI app by testing its deep learning techniques on it. One goal of the app is to reduce the need for human and animal testing as well as to improve people's overall health. Its first contest was started in December 2015, and the results were announced in August 2016. More than 60,000 people submitted entries into the contest. The mobile app uses artificial intelligence technology to inspect photographs for certain facial features in order to determine a person's beauty by artificial means, using multiple robots as judges. Part of the Beauty.AI app's purpose is to collect visual and anecdotal data to improve the skin analysis capabilities of its creator, Youth Laboratories.

== Accusations of racism ==

A total of 44 individuals from different age groups and genders were judged the most attractive: 37 white entrants, six Asian entrants, and one dark-skinned entrant.
The app has received criticism from social justice advocates and computer science professionals. However, Alex Zhavoronkov, PhD, chief science officer of Youth Laboratories, and Konstantin Kiselev, its chief technology officer, noted that a lack of data may have contributed to these results. Kiselev added that another issue was that approximately 75% of entrants were white Europeans, whereas only 7% and 1% were from India and Africa, respectively. Kiselev stated that they would work on more and better outreach to these regions to improve representation. Dr. Zhavoronkov also said that the AI would discard photos of dark-skinned people if the lighting was too poor. He vowed to weed out the issues for the next beauty pageant and to try to avoid a similar controversy in the future.

Web development tools

Web development tools (often abbreviated to dev tools) allow web developers to test, modify and debug their websites. They are different from website builders and integrated development environments (IDEs) in that they do not assist in the direct creation of a webpage; rather, they are tools used for testing the user interface of a website or web application. Web development tools come as browser add-ons or built-in features in modern web browsers. Browsers such as Google Chrome, Firefox, Safari, Microsoft Edge, and Opera have built-in tools to help web developers, and many additional add-ons can be found in their respective plugin download centers. Web development tools allow developers to work with a variety of web technologies, including HTML, CSS, the DOM, JavaScript, and other components that are handled by the web browser.

== History and support ==

Early web developers manually debugged their websites by commenting out code and using JavaScript functions. One of the first browser debugging tools to exist was Mozilla's Firebug extension, which possessed many of the core features of today's developer tools, leading to Firefox becoming popular with developers at the time. Safari's WebKit engine also introduced its integrated developer tools around that period, which eventually became the basis for both Safari's and Chrome's current tooling. Microsoft released a developer toolbar for Internet Explorer 6 and 7, and then integrated the tools into the browser from version 8 onwards. In 2017, Mozilla discontinued Firebug in favour of integrated developer tools.

Nowadays, all modern web browsers support web developer tools that allow web designers and developers to look at the make-up of their pages. These tools are built into the browser and do not require additional modules or configuration.

Firefox: F12 opens the Firefox DevTools.
Google Chrome and Opera: Developer Tools (DevTools).
Microsoft Edge: F12 opens Web Developer Tools. Microsoft incorporates additional features that are not included in mainline Chromium.
Safari: the Safari Web Inspector has to be enabled from its settings pane.

== Features ==

The built-in web developer tools in the browser are commonly accessed by right-clicking an item on a webpage and selecting "Inspect Element" or a similar option from the context menu. The F12 key is another common shortcut.

=== HTML and the DOM ===

An HTML and DOM viewer and editor is commonly included in the built-in web development tools. Unlike the view source feature in web browsers, the HTML and DOM viewer shows the DOM as it was rendered, and also lets the developer make changes to the HTML and DOM and see them reflected in the page immediately. In addition to selecting and editing, the HTML elements panel will usually also display properties of the DOM object, such as display dimensions and CSS properties. Firefox, Safari, Chrome, and Edge all allow users to simulate the document on a mobile device by modifying the viewport dimensions and pixel density. Additionally, Firefox and Chrome both have the option to simulate colour blindness for the page.

=== Web page assets, resources and network information ===

Web pages typically load and require additional content in the form of images, scripts, fonts and other external files. Web development tools allow developers to inspect the resources that are loaded and available on the web page in a tree-structure listing, and the appearance of style sheets can be tested in real time. Web development tools also allow developers to view information about network usage, such as the loading time, the bandwidth usage, and which HTTP headers are being sent and received. Developers can manipulate and resend network requests.
=== Profiling and auditing ===

Profiling allows developers to capture information about the performance of a web page or web application. With this information developers can improve the performance of their scripts. Auditing features may, after analyzing a page, suggest optimizations to decrease page load time and increase responsiveness. Web development tools typically also provide a record of the time it takes to render the page, memory usage, and the types of events which are taking place. These features allow developers to optimize their web page or web application.

==== JavaScript debugging ====

JavaScript is commonly used in web browsers. Web development tools commonly include a debugger panel for scripts that allows developers to add watch expressions and breakpoints, view the call stack, and pause, continue, and step while debugging JavaScript. A console is also often included, which allows developers to type in JavaScript commands and call functions, or view errors that may have been encountered during the execution of a script.

=== Extensions ===

The devtools API allows browser extensions to add their own features to developer tools.

RIPAC (microprocessor)

RIPAC was a VLSI single-chip microprocessor designed for automatic recognition of connected speech, one of the first designed for this purpose. The RIPAC project started in 1984. RIPAC was intended to provide efficient real-time speech recognition services for the Italian telephone system operated by SIP. The microprocessor was presented in September 1986 at the EUSIPCO conference in The Hague (Netherlands). It was composed of 70,000 transistors and structured as a Harvard architecture. The name RIPAC is an acronym for "Riconoscimento del PArlato Connesso", which means "Recognition of connected speech" in Italian. The microprocessor was designed by the Italian companies CSELT and ELSAG and was produced by SGS. A combination of Hidden Markov Model and Dynamic Time Warping algorithms was used for processing speech signals. It was able to perform real-time speech recognition of Italian and many other languages at an affordable cost. The chip, covered by U.S. Patent No. 4,907,278, worked on its first run.