AI Face Year

AI Face Year — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Color histogram

    Color histogram

    In image processing and photography, a color histogram is a representation of the distribution of colors in an image. For digital images, a color histogram represents the number of pixels that have colors in each of a fixed list of color ranges that span the image's color space (the set of all possible colors). A color histogram can be built for any kind of color space, although the term is more often used for three-dimensional spaces such as RGB or HSV. For monochromatic images, the term intensity histogram may be used instead. For multi-spectral images, where each pixel is represented by an arbitrary number of measurements (for example, beyond the three measurements in RGB), a color histogram is N-dimensional, with N being the number of measurements taken. Each measurement has its own wavelength range of the light spectrum, some of which may be outside the visible spectrum. If the set of possible color values is sufficiently small, each of those colors may be placed on a range by itself; then the histogram is merely the count of pixels that have each possible color. Most often, the space is divided into an appropriate number of ranges, often arranged as a regular grid, each containing many similar color values. A color histogram may also be represented and displayed as a smooth function defined over the color space that approximates the pixel counts. Like other kinds of histograms, a color histogram is a statistic that can be viewed as an approximation of an underlying continuous distribution of color values. == Overview == Color histograms are flexible constructs that can be built from images in various color spaces, whether RGB, rg chromaticity or any other color space of any dimension. A histogram of an image is produced first by discretization of the colors in the image into a number of bins, and counting the number of image pixels in each bin. For example, a red–blue chromaticity histogram can be formed by first normalizing color pixel values by dividing RGB values by R+G+B, then quantizing the normalized R and B coordinates into N bins each. A two-dimensional histogram of red–blue chromaticity divided into four bins (N=4) may yield a histogram similar to this table: A histogram can be N-dimensional. Although harder to display, a three-dimensional color histogram for the above example could be thought of as four separate red–blue histograms, where each of the four histograms contains the red–blue values for a bin of green (0–63, 64–127, 128–191, and 192–255). The histogram provides a compact summarization of the distribution of data in an image. A color histogram of an image is relatively invariant with translation and rotation about the viewing axis, and varies only slowly with the angle of view. By comparing histogram signatures of two images and matching the color content of one image with the other, a color histogram is particularly well suited for the problem of recognizing an object of unknown position and rotation within a scene. Importantly, translation of an RGB image into the illumination invariant rg-chromaticity space allows the histogram to operate well in varying light levels. 1. What is a histogram? A histogram is a graphical representation of the number of pixels in an image. In a more simple way to explain, a histogram is a bar graph, whose X-axis represents the tonal scale (black at the left and white at the right), and Y-axis represents the number of pixels in an image in a certain area of the tonal scale. For example, the graph of a luminance histogram shows the number of pixels for each brightness level (from black to white), and when there are more pixels, the peak at the certain luminance level is higher. 2. What is a color histogram? A color histogram of an image represents the distribution of the composition of colors in the image. It shows different types of colors appeared and the number of pixels in each type of the colors appeared. The relation between a color histogram and a luminance histogram is that a color histogram can be also expressed as “three luminance histograms”, each of which shows the brightness distribution of each individual red/green/blue color channel. == Characteristics of a color histogram == A color histogram focuses only on the proportion of the number of different types of colors, regardless of the spatial location of the colors. The values of a color histogram are from statistics. They show the statistical distribution of colors and the essential tone of an image. In general, as the color distributions of the foreground and background in an image are different, there might be a bimodal distribution in the histogram. For the luminance histogram alone, there is no perfect histogram and in general, the histogram can tell whether it is over-exposure or not, but there are times when you might think the image is over exposed by viewing the histogram; however, in reality it is not. == Principles of the formation of a color histogram == The formation of a color histogram is rather simple. From the definition above, we can simply count the number of pixels for each 256 scales in each of the 3 RGB channel, and plot them on 3 individual bar graphs. In general, a color histogram is based on a certain color space, such as RGB or HSV. When we compute the pixels of different colors in an image, if the color space is large, then we can first divide the color space into certain numbers of small intervals. Each of the intervals is called a bin. This process is called color quantization. Then, by counting the number of pixels in each of the bins, we get a color histogram of the image. The concrete steps of the principles can be viewed in Example 1. == Examples == === Example 1 === Given the following image of a cat (an original version and a version that has been reduced to 256 colors for easy histogram purposes), the following data represents a color histogram in the RGB color space, using four bins. Bin 0 corresponds to intensities 0–63 Bin 1 is 64–127 Bin 2 is 128–191 and Bin 3 is 192–255. === Example 2 === Application in camera: Nowadays, some cameras have the ability to show the 3 color histograms when we take photos. We can examine clips (spikes on either the black or white side of the scale) in each of the 3 RGB color histograms. If we find one or more clipping on a channel of the 3 RGB channels, then this would result in a loss of detail for that color. To illustrate this, consider this example: We know that each of the three R, G, B channels has a range of values from 0 to 255 (8 bit). So consider a photo that has a luminance range of 0–255. Assume the photo we take is made of 4 blocks that are adjacent to each other and we set the luminance scale for each of the 4 blocks of original photo to be 10, 100, 205, 245. Thus, the image looks like the topmost figure on the right. Then, we overexpose the photo a little, say, the luminance scale of each block is increased by 10. Thus, the luminance scale for each of the 4 blocks of new photo is 20, 110, 215, 255. Then, the image looks like the second figure on the right. There is not much difference between both figures, all we can see is that the whole image becomes brighter (the contrast for each of the blocks remain the same). Now, we overexpose the original photo again, this time the luminance scale of each block is increased by 50. Thus, the luminance scale for each of the 4 blocks of the new photo is 60, 150, 255, 255. The new image now looks like the third figure on the right. Note that the scale for the last block is 255 instead of 295, for 255 is the top scale and thus the last block has clipped. When this happens, we lose the contrast of the last 2 blocks, and thus we cannot recover the image no matter how we adjust it. To conclude, when taking photos with a camera that displays histograms, always keep the brightest tone in the image below the largest scale 255 on the histogram in order to avoid losing details. == Drawbacks and other approaches == The main drawback of histograms for classification is that the representation is dependent on the color of the object being studied, ignoring its shape and texture. Color histograms can potentially be identical for two images with different object content which happens to share color information. Conversely, without spatial or shape information, similar objects of different color may be indistinguishable based solely on color histogram comparisons. There is no way to distinguish a red and white cup from a red and white plate. Put it another way: histogram-based algorithms have no concept of a generic 'cup', and a model of a red and white cup is no use when given an otherwise identical blue and white cup. Another problem is that color histograms have high sensitivity to noisy interference such as lighting intensity changes and quantization errors. High dimensionality (bins) color histograms are also another issue. Some color histogram feature spaces often occupy more than one hundred di

    Read more →
  • Adobe Presenter Video Express

    Adobe Presenter Video Express

    Adobe Presenter Video Express is screencasting and video editing software developed by Adobe Systems. == Description == Adobe Presenter Video Express is primarily used as a software by video creators, to record and mix webcam and screen video feeds. It allows users to simultaneously record video from their webcam and the screen, and easily mix the 2 tracks with a simple user interface. Users can change the background in their recorded video without needing equipment like a green screen. This is unlike other video tools which rely on chroma keying technology, and only work with green or blue screens. They can also add annotations and quizzes to their content and publish the video to MP4 or HTML5 formats. == List of notable features == === Record and mix, screen and webcam === Support for simultaneous recording of screen and webcam video feeds, with a simple editing interface to mix the two video streams. This lets the author rapidly create screencasts, software demos, etc. === Make my background awesome === This feature allows authors to change the background of their webcam recording without needing a green screen, provided they use a solid-colored backdrop which contrasts well against them. Authors can select images, videos or even the screen recording as their background. === In-video quizzing === Authors can insert quizzes within their video content. On success/failure attempts, the author can decide what message to display, and can also configure the video to jump to a certain point and play. Quizzes are published as part of the interactive HTML 5 player, which cannot be hosted on YouTube and Vimeo. === LMS Reporting === Authors can publish to any SCORM compliant LMS (Learning Management System) for quiz reporting, or to Adobe Captivate Prime. === In-app assets and branding === Adobe Presenter Video Express ships with a large number of branding videos, backgrounds and video filters to help authors create studio quality videos. === MP4 and HTML5 Output === The tool publishes a single MP4 video file containing all the video content, within an HTML 5 wrapper that contains the interactive player. The interactive HTML 5 player can be hosted on any website. == Common uses == === Screencasting === Screencasting is the process of recording one's computer screen as a video, usually with an audio voice over, to create a software demonstration, tutorial, presentation, etc. Adobe Presenter Video Express supports simultaneous recording of full screen video and microphone audio for creating screencasts. === Product marketing and demos === The ability to record the webcam video in addition to everything that is visible on the screen in Adobe Presenter Video Express, allows the author to add their personality to their screencasts. Features like video mixing and 'make my background awesome' further enhance the presentation, allowing effortless creation of marketing and demo videos. === Education === Adobe Presenter Video Express supports in-video quizzes and LMS reporting, along with screencasting and webcam recording. These features make it a powerful tool for creating educational content. == Differences from Adobe Presenter and Adobe Captivate == Adobe Presenter is a Microsoft PowerPoint plug-in for converting PowerPoint slides into interactive eLearning content, available only on Windows. Starting with Adobe Presenter 8, the video creation tool Adobe Presenter Video Express was bundled with every purchase of Adobe Presenter. From September 2015, Adobe Presenter Video Express 11 was also made available as a stand-alone product on Windows and Mac. A subscription license for Adobe Presenter Video Express, valid on Windows and Mac, is available for $9.99/month. Adobe Presenter Video Express continues to be bundled with purchases of Adobe Presenter on Windows as well. Adobe Captivate is an authoring tool for creating numerous forms of interactive eLearning content. Unlike Adobe Presenter, it uses a proprietary editing interface instead of Microsoft PowerPoint. While it is possible to create screen captures with Adobe Captivate, you cannot record the webcam feed. Adobe Captivate does not bundle Adobe Presenter or Adobe Presenter Video Express.

    Read more →
  • Facebook Messenger

    Facebook Messenger

    Messenger (formerly known as Facebook Messenger) is an American proprietary instant messaging service developed by Meta Platforms, the company that operates Facebook. Originally developed as Facebook Chat in 2008, the client application of Messenger is currently available on iOS and Android mobile platforms, Windows and macOS desktop platforms, through the Messenger.com web application, and on the standalone Meta Portal hardware. Messenger is used to send messages and exchange photos, videos, stickers, audio, and files, and also react to other users' messages and interact with bots. The service also supports voice and video calling. The standalone apps support using multiple accounts, conversations with end-to-end encryption, and playing games. There are also group chats where you can connect with multiple people at once in a private space such as Panama Chat. With a monthly userbase of over 1 billion people, it is among the largest social media platforms. == History == Following tests of a new instant messaging platform on Facebook in March 2008, the feature, then-titled "Facebook Chat", was gradually released to users in April 2008. Facebook revamped its messaging platform in November 2010, and subsequently acquired group messaging service Beluga in March 2011, which the company used to launch its standalone iOS and Android mobile apps on August 9, 2011. Facebook later launched a BlackBerry version in October 2011. An app for Windows Phone, though lacking features including voice messaging and chat heads, was released in March 2014. In April 2014, Facebook announced that the messaging feature would be removed from the main Facebook app and users will be required to download the separate Messenger app. An iPad-optimized version of the iOS app was released in July 2014. On April 8, 2015, Facebook launched a website interface for Messenger. A Tizen app was released on July 13, 2015. Facebook launched Messenger for Windows 10 in April 2016. In October 2016, Facebook released Messenger Lite, a stripped-down version of Messenger with a reduced feature set. The app is aimed primarily at old Android phones and regions where high-speed Internet is not widely available. In April 2017, Messenger Lite was expanded to 132 more countries. In May 2017, Facebook revamped the design for Messenger on Android and iOS, bringing a new home screen with tabs and categorization of content and interactive media, red dots indicating new activity, and relocated sections. Facebook announced a Messenger program for Windows 7 in a limited beta test in November 2011. The following month, Israeli blog TechIT leaked a download link for the program, with Facebook subsequently confirming and officially releasing the program. The program was eventually discontinued in March 2014. A Firefox web browser add-on was released in December 2012, but was also discontinued in March 2014. In December 2017, Facebook announced Messenger Kids, a new app aimed for persons under 13 years of age. The app comes with some differences compared to the standard version. In 2019, Messenger announced to be the 2nd most downloaded mobile app of the decade, from 2011 to 2019. In December 2019, Messenger dropped support for users to sign in using only a mobile number, meaning that users must sign in to a Facebook account in order to use the service. In March 2020, Facebook started to ship its dedicated Messenger for macOS app through the Mac App Store. The app is currently live in regions including France, Australia, Mexico, Poland, and many others. In April 2020, Facebook began rolling out a new feature called Messenger Rooms, a video chat feature that allows users to chat with up to 50 people at a time. The feature rivals Zoom, an application that gained a lot of popularity during the COVID-19 pandemic. Privacy concerns arose since the feature uses the same data collection policies as mainstream Facebook. In July 2020, Facebook added a new feature in Messenger that lets iOS users to use Apple's Face ID or Touch ID to lock their chats. The feature is called App Lock and is a part of several changes in Messenger regarding privacy and security. The option to view only "Unread Threads" was removed from the inbox, requiring the account holder to scroll through the entire inbox to be certain every unread message has been seen. On October 13, 2020, the Messenger application introduced cross-app messaging with Instagram, which was launched in September 2021. In addition to the integrated messaging, the application announced the introduction of a new logo, which should be an amalgamation of the Messenger and Instagram logo. The desktop app of Messenger was shut down on December 15, 2025. Messaging services were moved to the Facebook website or Messenger's site for those without an account on the former. The Messenger site was discontinued on April 16, 2026. Messaging services were moved to the Facebook website on the morning of April 17, 2026 without an Messenger account on the former to use Facebook account. == Features == The following is a table of features available in Messenger, as well as their geographical coverage and what devices they are available on. In addition there is a vanishing message feature. In addition there is an audio recording feature which allows audio recordings of up to one minute which may or may not be vanishing: === Messenger Rooms === It is a video conferencing feature of Messenger. It allows users to add up to 50 people at a time. Messenger Rooms does not require a Facebook account. Messenger Rooms competes with other services such as Zoom. Back in 2014, Facebook introduced an unrelated, stand-alone application named Rooms, letting users create places for users with similar interests, with users being anonymous to others. This was shut down in December 2015. In April 2020, during the COVID-19 pandemic, Facebook revealed video conferencing features for Messenger called Messenger Rooms. This was seen as a response to the popularity of other video conferencing platforms such as Zoom and Skype in the midst of the COVID-19 pandemic. Messenger Rooms allows users to add up to 50 people per room, without restrictions on time. It does not require a Facebook account or a separate app from Messenger. When used, it only prompts the user for basic information. Users can add 360° virtual backgrounds, mood lighting, and other AR effects as well as share screens. To prevent unwanted participants from joining, users can lock rooms and remove participants. Some have voiced concerns in regards to Messenger Room's privacy and how its parent, Facebook, handles data. Messenger Rooms, unlike some of its competitors, does not use end-to-end encryption. In addition, there have been concerns over how Messenger Rooms collects user data. == Monetization == In January 2017, Facebook announced that it was testing showing advertisements in Messenger's home feed. At the time, the testing was limited to a "small number of users in Australia and Thailand", with the ad format being swipe-based carousel ads. In July, the company announced that they were expanding the testing to a global audience. Stan Chudnovsky, head of Messenger, told VentureBeat that "We'll start slow ... When the average user can be sure to see them we truly don't know because we're just going to be very data-driven and user feedback-driven on making that decision". Facebook told TechCrunch that the advertisements' placement in the inbox depends on factors such as thread count, phone screen size, and pixel density. In a TechCrunch editorial by Devin Coldewey, he described the ads as "huge" in the space they occupy, "intolerable" in the way they appear in the user interface, and "irrelevant" due to the lack of context. Coldewey finished by writing "Advertising is how things get paid for on the internet, including TechCrunch, so I'm not an advocate of eliminating it or blocking it altogether. But bad advertising experiences can spoil a perfectly good app like (for the purposes of argument) Messenger. Messaging is a personal, purposeful use case and these ads are a bad way to monetize it." == Reception == In November 2014, the Electronic Frontier Foundation (EFF) listed Messenger (Facebook chat) on its Secure Messaging Scorecard. It received a score of 2 out of 7 points on the scorecard. It received points for having communications encrypted in transit and for having recently completed an independent security audit. It missed points because the communications were not encrypted with keys the provider didn't have access to, users could not verify contacts' identities, past messages were not secure if the encryption keys were stolen, the source code was not open to independent review, and the security design was not properly documented. As stated by Facebook in its Help Center, there is no way to log out of the Messenger application. Instead, users can choose between different availability statuses, including "Appear as inactive", "S

    Read more →
  • Edits (app)

    Edits (app)

    Edits is an American photo and short form video editing software service owned by Meta Platforms. It allows users to create videos and edit them by using features like green screens, and AI animation, and also provides real-time statistics to Instagram creators to track their accounts. Accounts directly from Instagram can be imported, and videos can be exported vice-versa. It is available solely on iOS and Android. On Apple, it supports over 32 different languages, including French, Spanish, and Chinese. It has been noted by critics as a direct competitor for apps like CapCut, owned by Chinese brand ByteDance. The Instagram head, Adam Mosseri, also acknowledged these similarities. Launched on April 22 for both iOS and Android. It received over 5M+ users on Apple and Android combined in its first 4 days since its launch. == History == On January 19, 2025, following the ban of all ByteDance Apps from the Google Play Store, and App Store, Instagram head Adam Mosseri announced on Threads that they would be launching the app in February for iOS, followed by an Android counterpart. He said the app is working with select people to test its features. In a separate post, he emphasized that the app is "more for creators than casual video makers". == Features == Edits contains many similar features to other competition of video editors like KineMaster, Inshot, and CapCut. When creating a video, users have the option to export in resolution of HD, 4K, and 2K, along with having HDR and SDR support. Like many traditional video editing software, it includes a timeline, and basic undo-redo buttons. On the bottom bar, 7 tabs for editing exist, namely the Split, Volume, Adjust, Speed, Delete, Filters, Green Screen, Voice FX, Extract Audio, Mirror, Slip, Replace and Duplicate bars. Basic features, like splitting, and adjusting speed and volume of clips are present, along with more advanced Green Screens, and AI features. Being a mobile video editor app, Edits also has drag-and-drop features to ease customer usage. Users have the ability to record videos directly within the app. This feature allows users to create content without needing extra software or devices. They can choose from several focal lengths, which affect how close or wide the shot appears. The app also supports different frame rates. Users have the ability to record videos directly within the app. This feature allows users to create content without needing extra software or devices. Once users are done filming your clips, they can simply transfer them into a project to start editing immediately. Upcoming features for the app include Keyframes, AI-powered modification, Collaboration, and Enhanced creativity. == Reception == Since its release, it received over 5 million downloads in 4 days. Critically, the app received great rankings from many. From users, the app received an average of 4.45 stars over Google Play Store and App Store in the first few days, with Google Play Store receiving the least stars. As in reviews, it was received mixed by the public. Many people praised the smoothness and intuivity of the app. "The app is more than just a basic editor, offering a full suite of creative tools, including a dedicated tab for inspiration and trending audio, as well as a tab for managing drafts," said a blogger. Some users were disappointed with the range of editing tools, some users have noted that it could benefit from more transition options between clips. Some even reported crashing between clips.

    Read more →
  • Tandem Money

    Tandem Money

    Tandem is one of the UK's original challenger banks. Tandem is a digital bank with a mobile app, and no branches. The acquisition of Harrods Bank in 2017 allowed the company to provide services using the former's banking licence. Tandem Bank Limited is authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority. Tandem has offices across the UK in Blackpool, Cardiff, Durham and London, employing over 500 people. == History == The company was founded by Ricky Knox, Matt Cooper and Michael Kent in 2014. In December 2016, Tandem announced that it had secured a £35 million investment from The Sanpower Group, the Chinese company that also owned the department store House of Fraser; however, £29 million of this investment was later revoked by Sanpower over concerns that the Chinese Government would object to the investment following increased restrictions on outbound investment in China. This resulted in a delay in the launch of Tandem's savings products, which, at the time of the revocation, was expected imminently and, more importantly, meant that Tandem volunteered the return of their banking license but retained all other permissions. In April 2018, Tandem launched fixed-term savings accounts, offering one-, two- and three-year terms through its app. === Acquisitions === In August 2017, it was announced that Tandem would fully acquire Harrods Bank, founded in 1893, in a deal that would bring a near-£200m loan book, over £300m of deposits and nearly £80 million of capital. Prior to its sale to Tandem Money, Harrods Bank catered for high-net-worth (HNW) individuals and operated from the Harrods store in Knightsbridge, London. It offered a variety of personal and business current and savings accounts, mortgages, foreign currency and gold bullion trading services. On 7 August 2017, Tandem Money Limited announced a deal to acquire 100% of Harrods Bank Limited shares. The purchase deal closed successfully on 11 January 2018. In March 2018, Tandem agreed to acquire Pariti Technologies Limited, developers of the Pariti money management application. In August 2020 Tandem acquired green home improvement loan specialists Allium Lending Group. It was announced on 8 February 2021 that Tandem had agreed to purchase the mortgage book from private bank Bank and Clients, consisting of 300 B&C customers for an undisclosed amount. In January 2022 Tandem Bank acquired consumer lender Oplo, creating a combined business with £1.2 billion of total assets. In April 2023, it was announced that Tandem had acquired money-sharing app Loop Money. At the time of the purchase, one of Loop's founders – Paul Pester – was also chairman at Tandem. == Features == Tandem Bank offers customers savings, mortgages, personal and secured loans, green home improvement loans and motor finance. In November 2022, the bank launched its new Tandem Marketplace, providing information and resources to help promote greener living.

    Read more →
  • BiP (software)

    BiP (software)

    BiP is a freeware instant messaging application developed by Lifecell Ventures Cooperatief U.A., a subsidiary of Turkcell incorporated in the Netherlands. It allows users to send text messages, voice messages and video calling, and it can be downloaded from the App Store, Google Play, and Huawei AppGallery. BiP has over 53 million users worldwide, and was first released in 2013. == Functions == BiP is a secure, and free communication platform. BiP allows making video and audio calls, allows sharing images, videos and location. BiP includes instant translations to 106 languages and exchange rates. President Erdoğan's Communications Office opposed WhatsApp's enforcement of its updated privacy policy and announced that Erdoğan left WhatsApp and opened an account in Telegram and BiP. The Turkish Ministry of National Defense has announced that it will move information groups to BiP for the same reason. == Others == Banglalink announced a BiP messenger partnership in Bangladesh The Communications Office of President Erdoğan opposed WhatsApp's enforcement of its updated privacy policy and announced that Erdoğan left WhatsApp and opened an account in Telegram and BiP. The Turkish Ministry of National Defense has announced that it will move information groups to BiP for the same reason. The CEO of BiP is Burak Akinci. The number of downloads of the app is 80 million globally.

    Read more →
  • Amira (software)

    Amira (software)

    Amira (ah-MEER-ah) is a software platform for visualization, processing, and analysis of 3D and 4D data. It is being actively developed by Thermo Fisher Scientific in collaboration with the Zuse Institute Berlin (ZIB), and commercially distributed by Thermo Fisher Scientific — together with its sister software Avizo. == Overview == Amira is an extendable software system for scientific visualization, data analysis, and presentation of 3D and 4D data. It is used by researchers and engineers in academia and industry. It is a tool for processing, analysis and visualization of data from various modalities; e.g. micro-CT, PET, Ultrasound. It is used in many fields, such as microscopy in biology and materials science, molecular biology, quantum physics, astrophysics, computational fluid dynamics (CFD), finite element modeling (FEM), non-destructive testing (NDT), and many more. One of the key features, besides data visualization, is Amira's set of tools for image segmentation and geometry reconstruction. This allows the user to mark (or segment) structures and regions of interest in 3D image volumes using automatic, semi-automatic, and manual tools. The segmentation can then be used for a variety of subsequent tasks, such as volumetric analysis, density analysis, shape analysis, or the generation of 3D computer models for visualization, numerical simulations, or rapid prototyping or 3D printing. Other key Amira features are multi-planar and volume visualization, image registration, filament tracing, cell separation and analysis, tetrahedral mesh generation, fiber-tracking from diffusion tensor imaging (DTI) data, skeletonization, spatial graph analysis, and stereoscopic rendering of 3D data over multiple displays and immersive virtual reality environments, including CAVEs. As a commercial product Amira requires the purchase of a license or an academic subscription. A time-limited, but full-featured evaluation version is available for download free of charge. == History == === 1993–1998: Research software === Amira's roots go back to 1993 and the Department for Scientific Visualization, headed by Hans-Christian Hege at the Zuse Institute Berlin (ZIB). The ZIB is a research institute for mathematics and informatics. The Scientific Visualization department's mission is to help solve computationally and scientifically challenging tasks in medicine, biology, engineering and materials science. For this purpose, it develops algorithms and software for 2D, 3D, and 4D data visualization and visually supported exploration and analysis. At that time, the young visualization group at the ZIB had experience with the extendable, data flow-oriented visualization environments apE, IRIS Explorer, and Advanced Visualization Studio (AVS), but was not satisfied with these products' interactivity, flexibility, and ease-of-use for non-computer scientists. Therefore, the development of a new software system was started in a research project within a medically oriented, multi-disciplinary collaborative research center. Based on experiences that Tobias Höllerer had gained in late 1993 with the new graphics library IRIS Inventor, it was decided to utilize that library. The development of the medical planning system was performed by Detlev Stalling, who later became the chief software architect of Amira. The new software was called "HyperPlan", highlighting its initial target application – a planning system for hyperthermia cancer treatment. The system was being developed on Silicon Graphics (SGI) computers, which at the time were the standard workstations used for high-end graphics computing. The software was based on libraries such as OpenGL (originally IRIS GL), Open Inventor (originally IRIS Inventor), and the graphical user interface libraries X11, Motif (software), and ViewKit. In 1998, X11/Motif/Viewkit were replaced by the Qt toolkit. The HyperPlan framework served as the base for more and more projects at the ZIB and was used by a growing number of researchers in collaborating institutions. The projects included applications in medical image computing, medical visualization, neurobiology, confocal microscopy, flow visualization, molecular analytics and computational astrophysics. === 1998–today: Commercially supported product === The growing number of users of the system started to exceed the capacities that ZIB could spare for software distribution and support, as ZIB's primary mission was algorithmic research. Therefore, the spin-off company Indeed – Visual Concepts GmbH was founded by Hans-Christian Hege, Detlev Stalling, and Malte Westerhoff. In Feb 1998 the HyperPlan software was given the new, application-neutral name "Amira". This name is not an acronym, but was chosen for being pronounceable in different languages and providing a suitable connotation, namely "to look at" or "to wonder at", from the Latin verb "admirare" (to admire), which reflects a basic situation in data visualization. A major re-design of the software was undertaken by Detlev Stalling and Malte Westerhoff in order to make it a commercially supportable product and to make it available on non-SGI computers as well. In March 1999, the first version of the commercial Amira was exhibited at the CeBIT tradeshow in Hannover, Germany on SGI IRIX and Hewlett-Packard UniX (HP-UX) booths. Versions for Linux and Microsoft Windows followed within the following twelve months. Later Mac OS X support was added. Indeed – Visual Concepts GmbH selected the Bordeaux, France and San Diego, United States based company TGS, Inc. as the worldwide distributor for Amira and completed five major releases (up to version 3.1) in the subsequent four years. In 2003 both Indeed – Visual Concepts GmbH, as well as TGS, Inc. were acquired by Massachusetts-based Mercury Computer Systems, Inc. (NASDAQ:MRCY) and became part of Mercury's newly formed life sciences business unit, later branded Visage Imaging. In 2009, Mercury Computer Systems, Inc. spun off Visage Imaging again and sold it to Melbourne, Australia based Promedicus Ltd (ASX:PME), a leading provider of radiology information systems and medical IT solutions. During this time, Amira continued to be developed in Berlin, Germany and in close collaboration with the ZIB, still headed by the original creators of Amira. TGS, located in Bordeaux, France was sold by Mercury Computer systems to a French investor and renamed to Visualization Sciences Group (VSG). VSG continued the work on a complementary product named Avizo, based on the same source code but customized for material sciences. In August 2012, FEI, to that date the largest OEM reseller of Amira, purchased VSG and the Amira business from Promedicus. This brought the two software sisters Amira and Avizo back into one hand. In August 2013, Visualization Sciences Group (VSG) became a business unit of FEI. In 2016 FEI has been bought by Thermo Fisher Scientific and became part of its Materials & Structural Analysis division in early 2017. Amira and Avizo are still being marketed as two different products; Amira for life sciences and Avizo for materials science, but the development efforts are now joined once again. In the meantime, the number of scientific articles using the Amira / Avizo software, is in the order of 10 thousands. == Amira options == === Microscopy option === Specific readers for microscopy data Image deconvolution Exploration of 3D imagery obtained from virtually any microscope Extraction and editing of filament networks from microscopy images === DICOM reader === Import of clinical and preclinical data in DICOM format === Mesh option === Generation of 3D finite element (FE) meshes from segmented image data Support for many state-of-the-art FE solver formats High-quality visualization of simulation mesh-based results, using scalar, vector, and tensor field display modules === Skeletonization option === Reconstruction and analysis of neural and vascular networks Visualization of skeletonized networks Length and diameter quantification of network segments Ordering of segments in a tree graph Skeletonization of very large image stacks === Molecular option === Advanced tools for the visualization of molecule models Hardware-accelerated volume rendering Powerful molecule editor Specific tools for complex molecular visualization === Developer option === Creation of new custom components for visualizing or data processing Implementation of new file readers or writers C++ programming language Development wizard for getting started quickly === Neuro option === Medical image analysis for DTI and brain perfusion Fiber tracking supporting several stream-line based algorithms Fiber separation into fiber bundles based on user defined source and destination regions Computation of tensor fields, diffusion weighted maps Eigenvalue decomposition of tensor fields Computation of mean transit time, cerebral blood flow, and cerebral blood volume === VR option === Visualization of data on large tiled displays

    Read more →
  • MoltenVK

    MoltenVK

    MoltenVK is a software library which allows Vulkan applications to run on top of Metal on Apple's macOS, iOS, and tvOS operating systems. It is the first software component to be released for the Vulkan Portability Initiative, a project to have a subset of Vulkan run on platforms lacking native Vulkan drivers. There are some limitations compared with a native Vulkan implementation. == History == MoltenVK was first released as a proprietary and commercially licensed product by The Brenwill Workshop on July 27, 2016. On July 31, 2017, Khronos announced the formation of the Vulkan Portability Technical Subgroup. === Open source === On February 26, 2018, Khronos announced that Vulkan became available on macOS and iOS products through the MoltenVK library. Valve announced that Dota 2 will run on macOS using the Vulkan API with the aid of MoltenVK, and that they had made an arrangement with developer The Brenwill Workshop Ltd to release MoltenVK as open-source software under the Apache License version 2.0. On May 30, 2018, Qt was updated with Vulkan for Qt on macOS using MoltenVK. On May 31, 2018, optional Vulkan support for Dota 2 on macOS was released. Benchmarks for the game were available the following day, showing better performance using Vulkan and MoltenVK compared to OpenGL. On July 20, 2018, Wine was updated with Vulkan support on macOS using MoltenVK. On 29 July 2018, the first app using MoltenVK was accepted onto the App Store, after initially being rejected. On 6 August 2018, Google open-sourced Filament, a crossplatform real-time physically based rendering engine with MoltenVK for macOS/iOS. On November 28, 2018, Valve released Artifact, their first Vulkan-only game on macOS using MoltenVK. === Version 1.0 === On 29 January 2019, MoltenVK 1.0.32 was released with early prototype of Vulkan Portability Extensions. RPCS3 and Dolphin emulators were updated with Vulkan support on macOS using MoltenVK. On 13 April 2019, MoltenVK 1.0.34 was released with support for tessellation. On July 30, 2019, MoltenVK 1.0.36 was released targeting Metal 3.0. On July 31, 2020, MoltenVK 1.0.44 was released, adding support for the tvOS platform. On January 23, 2020, MoltenVK was updated to support for some of the new features of Vulkan 1.2, as of Vulkan SDK 1.2.121. === Version 1.1 === On October 1, 2020, MoltenVK 1.1.0 was released, adding full support for Vulkan 1.1, as of Vulkan SDK 1.2.154. On 9 December 2020, MoltenVK 1.1.1 was released, providing support for Vulkan on Apple silicon GPUs and support for the Mac Catalyst platform for porting iOS/iPadOS apps to macOS. === Version 1.2 === On October 18, 2022, MoltenVK 1.2.0 was released, adding full support for Vulkan 1.2 as of Vulkan SDK 1.3.231. In January 2023, MoltenVK 1.2.2 added support for Vulkan as of SDK 1.3.239, while this version of Vulkan SDK fixed some issues with the interconnectivity with Metal API, while version 1.2.3 supported some additional extensions. === Version 1.3 === On May 1, 2025, MoltenVK 1.3 was released with support for Vulkan 1.3. === Version 1.4 === On August 20, 2025, MoltenVK 1.4 was released with support for Vulkan 1.4.

    Read more →
  • Facebook Messenger

    Facebook Messenger

    Messenger (formerly known as Facebook Messenger) is an American proprietary instant messaging service developed by Meta Platforms, the company that operates Facebook. Originally developed as Facebook Chat in 2008, the client application of Messenger is currently available on iOS and Android mobile platforms, Windows and macOS desktop platforms, through the Messenger.com web application, and on the standalone Meta Portal hardware. Messenger is used to send messages and exchange photos, videos, stickers, audio, and files, and also react to other users' messages and interact with bots. The service also supports voice and video calling. The standalone apps support using multiple accounts, conversations with end-to-end encryption, and playing games. There are also group chats where you can connect with multiple people at once in a private space such as Panama Chat. With a monthly userbase of over 1 billion people, it is among the largest social media platforms. == History == Following tests of a new instant messaging platform on Facebook in March 2008, the feature, then-titled "Facebook Chat", was gradually released to users in April 2008. Facebook revamped its messaging platform in November 2010, and subsequently acquired group messaging service Beluga in March 2011, which the company used to launch its standalone iOS and Android mobile apps on August 9, 2011. Facebook later launched a BlackBerry version in October 2011. An app for Windows Phone, though lacking features including voice messaging and chat heads, was released in March 2014. In April 2014, Facebook announced that the messaging feature would be removed from the main Facebook app and users will be required to download the separate Messenger app. An iPad-optimized version of the iOS app was released in July 2014. On April 8, 2015, Facebook launched a website interface for Messenger. A Tizen app was released on July 13, 2015. Facebook launched Messenger for Windows 10 in April 2016. In October 2016, Facebook released Messenger Lite, a stripped-down version of Messenger with a reduced feature set. The app is aimed primarily at old Android phones and regions where high-speed Internet is not widely available. In April 2017, Messenger Lite was expanded to 132 more countries. In May 2017, Facebook revamped the design for Messenger on Android and iOS, bringing a new home screen with tabs and categorization of content and interactive media, red dots indicating new activity, and relocated sections. Facebook announced a Messenger program for Windows 7 in a limited beta test in November 2011. The following month, Israeli blog TechIT leaked a download link for the program, with Facebook subsequently confirming and officially releasing the program. The program was eventually discontinued in March 2014. A Firefox web browser add-on was released in December 2012, but was also discontinued in March 2014. In December 2017, Facebook announced Messenger Kids, a new app aimed for persons under 13 years of age. The app comes with some differences compared to the standard version. In 2019, Messenger announced to be the 2nd most downloaded mobile app of the decade, from 2011 to 2019. In December 2019, Messenger dropped support for users to sign in using only a mobile number, meaning that users must sign in to a Facebook account in order to use the service. In March 2020, Facebook started to ship its dedicated Messenger for macOS app through the Mac App Store. The app is currently live in regions including France, Australia, Mexico, Poland, and many others. In April 2020, Facebook began rolling out a new feature called Messenger Rooms, a video chat feature that allows users to chat with up to 50 people at a time. The feature rivals Zoom, an application that gained a lot of popularity during the COVID-19 pandemic. Privacy concerns arose since the feature uses the same data collection policies as mainstream Facebook. In July 2020, Facebook added a new feature in Messenger that lets iOS users to use Apple's Face ID or Touch ID to lock their chats. The feature is called App Lock and is a part of several changes in Messenger regarding privacy and security. The option to view only "Unread Threads" was removed from the inbox, requiring the account holder to scroll through the entire inbox to be certain every unread message has been seen. On October 13, 2020, the Messenger application introduced cross-app messaging with Instagram, which was launched in September 2021. In addition to the integrated messaging, the application announced the introduction of a new logo, which should be an amalgamation of the Messenger and Instagram logo. The desktop app of Messenger was shut down on December 15, 2025. Messaging services were moved to the Facebook website or Messenger's site for those without an account on the former. The Messenger site was discontinued on April 16, 2026. Messaging services were moved to the Facebook website on the morning of April 17, 2026 without an Messenger account on the former to use Facebook account. == Features == The following is a table of features available in Messenger, as well as their geographical coverage and what devices they are available on. In addition there is a vanishing message feature. In addition there is an audio recording feature which allows audio recordings of up to one minute which may or may not be vanishing: === Messenger Rooms === It is a video conferencing feature of Messenger. It allows users to add up to 50 people at a time. Messenger Rooms does not require a Facebook account. Messenger Rooms competes with other services such as Zoom. Back in 2014, Facebook introduced an unrelated, stand-alone application named Rooms, letting users create places for users with similar interests, with users being anonymous to others. This was shut down in December 2015. In April 2020, during the COVID-19 pandemic, Facebook revealed video conferencing features for Messenger called Messenger Rooms. This was seen as a response to the popularity of other video conferencing platforms such as Zoom and Skype in the midst of the COVID-19 pandemic. Messenger Rooms allows users to add up to 50 people per room, without restrictions on time. It does not require a Facebook account or a separate app from Messenger. When used, it only prompts the user for basic information. Users can add 360° virtual backgrounds, mood lighting, and other AR effects as well as share screens. To prevent unwanted participants from joining, users can lock rooms and remove participants. Some have voiced concerns in regards to Messenger Room's privacy and how its parent, Facebook, handles data. Messenger Rooms, unlike some of its competitors, does not use end-to-end encryption. In addition, there have been concerns over how Messenger Rooms collects user data. == Monetization == In January 2017, Facebook announced that it was testing showing advertisements in Messenger's home feed. At the time, the testing was limited to a "small number of users in Australia and Thailand", with the ad format being swipe-based carousel ads. In July, the company announced that they were expanding the testing to a global audience. Stan Chudnovsky, head of Messenger, told VentureBeat that "We'll start slow ... When the average user can be sure to see them we truly don't know because we're just going to be very data-driven and user feedback-driven on making that decision". Facebook told TechCrunch that the advertisements' placement in the inbox depends on factors such as thread count, phone screen size, and pixel density. In a TechCrunch editorial by Devin Coldewey, he described the ads as "huge" in the space they occupy, "intolerable" in the way they appear in the user interface, and "irrelevant" due to the lack of context. Coldewey finished by writing "Advertising is how things get paid for on the internet, including TechCrunch, so I'm not an advocate of eliminating it or blocking it altogether. But bad advertising experiences can spoil a perfectly good app like (for the purposes of argument) Messenger. Messaging is a personal, purposeful use case and these ads are a bad way to monetize it." == Reception == In November 2014, the Electronic Frontier Foundation (EFF) listed Messenger (Facebook chat) on its Secure Messaging Scorecard. It received a score of 2 out of 7 points on the scorecard. It received points for having communications encrypted in transit and for having recently completed an independent security audit. It missed points because the communications were not encrypted with keys the provider didn't have access to, users could not verify contacts' identities, past messages were not secure if the encryption keys were stolen, the source code was not open to independent review, and the security design was not properly documented. As stated by Facebook in its Help Center, there is no way to log out of the Messenger application. Instead, users can choose between different availability statuses, including "Appear as inactive", "S

    Read more →
  • ArcSoft ShowBiz

    ArcSoft ShowBiz

    ShowBiz is a video editor by ArcSoft for the Windows operating system. It can create VCD and DVDs and can also export to the formats AVI, MPEG, WMV, and MOV. ShowBiz also contains a DVD burning and menu building feature. As of 2003, it was one of the three most dominant bundled titles. == Reception == PC Magazine reviewer Jan Ozer states: "ArcSoft's ShowBiz has evolved into a competent editor that's generally more usable than Dazzle's MovieStar program, providing more configuration controls, better preview features, and a much greater range of fun effects." John Virata, senior editor of Digital Media Online, says in his three page review of ShowBiz DVD 2, "It is an easy editor to work with and has a logically laid out interface that takes you step by step through the video creation and DVD creation process"

    Read more →
  • Template matching

    Template matching

    Template matching is a technique in digital image processing for finding small parts of an image which match a template image. It can be used for quality control in manufacturing, navigation of mobile robots, or edge detection in images. The main challenges in a template matching task are detection of occlusion, when a sought-after object is partly hidden in an image; detection of non-rigid transformations, when an object is distorted or imaged from different angles; sensitivity to illumination and background changes; background clutter; and scale changes. == Feature-based approach == The feature-based approach to template matching relies on the extraction of image features, such as shapes, textures, and colors, that match the target image or frame. This approach is usually achieved using neural networks and deep-learning classifiers such as VGG, AlexNet, and ResNet.Convolutional neural networks (CNNs), which many modern classifiers are based on, process an image by passing it through different hidden layers, producing a vector at each layer with classification information about the image. These vectors are extracted from the network and used as the features of the image. Feature extraction using deep neural networks, like CNNs, has proven extremely effective has become the standard in state-of-the-art template matching algorithms. This feature-based approach is often more robust than the template-based approach described below. As such, it has become the state-of-the-art method for template matching, as it can match templates with non-rigid and out-of-plane transformations, as well as high background clutter and illumination changes. == Template-based approach == For templates without strong features, or for when the bulk of a template image constitutes the matching image as a whole, a template-based approach may be effective. Since template-based matching may require sampling of a large number of data points, it is often desirable to reduce the number of sampling points by reducing the resolution of search and template images by the same factor before performing the operation on the resultant downsized images. This pre-processing method creates a multi-scale, or pyramid, representation of images, providing a reduced search window of data points within a search image so that the template does not have to be compared with every viable data point. Pyramid representations are a method of dimensionality reduction, a common aim of machine learning on data sets that suffer the curse of dimensionality. == Common challenges == In instances where the template may not provide a direct match, it may be useful to implement eigenspaces to create templates that detail the matching object under a number of different conditions, such as varying perspectives, illuminations, color contrasts, or object poses. For example, if an algorithm is looking for a face, its template eigenspaces may consist of images (i.e., templates) of faces in different positions to the camera, in different lighting conditions, or with different expressions (i.e., poses). It is also possible for a matching image to be obscured or occluded by an object. In these cases, it is unreasonable to provide a multitude of templates to cover each possible occlusion. For example, the search object may be a playing card, and in some of the search images, the card is obscured by the fingers of someone holding the card, or by another card on top of it, or by some other object in front of the camera. In cases where the object is malleable or poseable, motion becomes an additional problem, and problems involving both motion and occlusion become ambiguous. In these cases, one possible solution is to divide the template image into multiple sub-images and perform matching on each subdivision. == Deformable templates in computational anatomy == Template matching is a central tool in computational anatomy (CA). In this field, a deformable template model is used to model the space of human anatomies and their orbits under the group of diffeomorphisms, functions which smoothly deform an object. Template matching arises as an approach to finding the unknown diffeomorphism that acts on a template image to match the target image. Template matching algorithms in CA have come to be called large deformation diffeomorphic metric mappings (LDDMMs). Currently, there are LDDMM template matching algorithms for matching anatomical landmark points, curves, surfaces, volumes. == Template-based matching explained using cross correlation or sum of absolute differences == A basic method of template matching sometimes called "Linear Spatial Filtering" uses an image patch (i.e., the "template image" or "filter mask") tailored to a specific feature of search images to detect. This technique can be easily performed on grey images or edge images, where the additional variable of color is either not present or not relevant. Cross correlation techniques compare the similarities of the search and template images. Their outputs should be highest at places where the image structure matches the template structure, i.e., where large search image values get multiplied by large template image values. This method is normally implemented by first picking out a part of a search image to use as a template. Let S ( x , y ) {\displaystyle S(x,y)} represent the value of a search image pixel, where ( x , y ) {\displaystyle (x,y)} represents the coordinates of the pixel in the search image. For simplicity, assume pixel values are scalar, as in a greyscale image. Similarly, let T ( x t , y t ) {\textstyle T(x_{t},y_{t})} represent the value of a template pixel, where ( x t , y t ) {\textstyle (x_{t},y_{t})} represents the coordinates of the pixel in the template image. To apply the filter, simply move the center (or origin) of the template image over each point in the search image and calculate the sum of products, similar to a dot product, between the pixel values in the search and template images over the whole area spanned by the template. More formally, if ( 0 , 0 ) {\displaystyle (0,0)} is the center (or origin) of the template image, then the cross correlation T ⋆ S {\displaystyle T\star S} at each point ( x , y ) {\displaystyle (x,y)} in the search image can be computed as: ( T ⋆ S ) ( x , y ) = ∑ ( x t , y t ) ∈ T T ( x t , y t ) ⋅ S ( x t + x , y t + y ) {\displaystyle (T\star S)(x,y)=\sum _{(x_{t},y_{t})\in T}T(x_{t},y_{t})\cdot S(x_{t}+x,y_{t}+y)} For convenience, T {\displaystyle T} denotes both the pixel values of the template image as well as its domain, the bounds of the template. Note that all possible positions of the template with respect to the search image are considered. Since cross correlation values are greatest when the values of the search and template pixels align, the best matching position ( x m , y m ) {\displaystyle (x_{m},y_{m})} corresponds to the maximum value of T ⋆ S {\displaystyle T\star S} over S {\displaystyle S} . Another way to handle translation problems on images using template matching is to compare the intensities of the pixels, using the sum of absolute differences (SAD) measure. To formulate this, let I S ( x s , y s ) {\displaystyle I_{S}(x_{s},y_{s})} and I T ( x t , y t ) {\displaystyle I_{T}(x_{t},y_{t})} denote the light intensity of pixels in the search and template images with coordinates ( x s , y s ) {\displaystyle (x_{s},y_{s})} and ( x t , y t ) {\displaystyle (x_{t},y_{t})} , respectively. Then by moving the center (or origin) of the template to a point ( x , y ) {\displaystyle (x,y)} in the search image, as before, the sum of absolute differences between the template and search pixel intensities at that point is: S A D ( x , y ) = ∑ ( x t , y t ) ∈ T | I T ( x t , y t ) − I S ( x t + x , y t + y ) | {\displaystyle SAD(x,y)=\sum _{(x_{t},y_{t})\in T}\left\vert I_{T}(x_{t},y_{t})-I_{S}(x_{t}+x,y_{t}+y)\right\vert } With this measure, the lowest SAD gives the best position for the template, rather than the greatest as with cross correlation. SAD tends to be relatively simple to implement and understand, but it also tends to be relatively slow to execute. A simple C++ implementation of SAD template matching is given below. == Implementation == In this simple implementation, it is assumed that the above described method is applied on grey images: This is why Grey is used as pixel intensity. The final position in this implementation gives the top left location for where the template image best matches the search image. One way to perform template matching on color images is to decompose the pixels into their color components and measure the quality of match between the color template and search image using the sum of the SAD computed for each color separately. == Speeding up the process == In the past, this type of spatial filtering was normally only used in dedicated hardware solutions because of the computational complexity of the operation, however we can lessen this complexity b

    Read more →
  • Image warping

    Image warping

    Image warping is the process of digitally manipulating an image such that any shapes portrayed in the image have been significantly distorted. Warping may be used for correcting image distortion as well as for creative purposes (e.g., morphing). The same techniques are equally applicable to video. While an image can be transformed in various ways, pure warping means that points are mapped to points without changing the colors. This can be based mathematically on any function from (part of) the plane to the plane. If the function is injective the original can be reconstructed. If the function is a bijection any image can be inversely transformed. Some methods are: Images may be distorted through simulation of optical aberrations. Images may be viewed as if they had been projected onto a curved or mirrored surface. (This is often seen in ray traced images.) Images can be partitioned into image polygons and each polygon distorted. Images can be distorted using morphing. The most obvious approach to transforming a digital image is the forward mapping. This applies the transform directly to the source image, typically generating unevenly-spaced points that will then be interpolated to generate the required regularly-spaced pixels. However, for injective transforms reverse mapping is also available. This applies the inverse transform to the target pixels to find the unevenly-spaced locations in the source image that contribute to them. Estimating them from source image pixels will require interpolation of the source image. To work out what kind of warping has taken place between consecutive images, one can use optical flow estimation techniques. == Image warping toolbox == ImWIP is an open-source, image warping tool for modeling deformation and motion in digital images, which contains differentiable image warping operators, together with their exact adjoints and derivatives.

    Read more →
  • Cross-entropy method

    Cross-entropy method

    The cross-entropy (CE) method is a Monte Carlo method for importance sampling and optimization. It is applicable to both combinatorial and continuous problems, with either a static or noisy objective. The method approximates the optimal importance sampling estimator by repeating two phases: Draw a sample from a probability distribution. Minimize the cross-entropy between this distribution and a target distribution to produce a better sample in the next iteration. Reuven Rubinstein developed the method in the context of rare-event simulation, where tiny probabilities must be estimated, for example in network reliability analysis, queueing models, or performance analysis of telecommunication systems. The method has also been applied to the traveling salesman, quadratic assignment, DNA sequence alignment, max-cut and buffer allocation problems. == Estimation via importance sampling == Consider the general problem of estimating the quantity ℓ = E u [ H ( X ) ] = ∫ H ( x ) f ( x ; u ) d x {\displaystyle \ell =\mathbb {E} _{\mathbf {u} }[H(\mathbf {X} )]=\int H(\mathbf {x} )\,f(\mathbf {x} ;\mathbf {u} )\,{\textrm {d}}\mathbf {x} } , where H {\displaystyle H} is some performance function and f ( x ; u ) {\displaystyle f(\mathbf {x} ;\mathbf {u} )} is a member of some parametric family of distributions. Using importance sampling this quantity can be estimated as ℓ ^ = 1 N ∑ i = 1 N H ( X i ) f ( X i ; u ) g ( X i ) {\displaystyle {\hat {\ell }}={\frac {1}{N}}\sum _{i=1}^{N}H(\mathbf {X} _{i}){\frac {f(\mathbf {X} _{i};\mathbf {u} )}{g(\mathbf {X} _{i})}}} , where X 1 , … , X N {\displaystyle \mathbf {X} _{1},\dots ,\mathbf {X} _{N}} is a random sample from g {\displaystyle g\,} . For positive H {\displaystyle H} , the theoretically optimal importance sampling density (PDF) is given by g ∗ ( x ) = H ( x ) f ( x ; u ) / ℓ {\displaystyle g^{}(\mathbf {x} )=H(\mathbf {x} )f(\mathbf {x} ;\mathbf {u} )/\ell } . This, however, depends on the unknown ℓ {\displaystyle \ell } . The CE method aims to approximate the optimal PDF by adaptively selecting members of the parametric family that are closest (in the Kullback–Leibler sense) to the optimal PDF g ∗ {\displaystyle g^{}} . == Generic CE algorithm == Choose initial parameter vector v ( 0 ) {\displaystyle \mathbf {v} ^{(0)}} ; set t = 1. Generate a random sample X 1 , … , X N {\displaystyle \mathbf {X} _{1},\dots ,\mathbf {X} _{N}} from f ( ⋅ ; v ( t − 1 ) ) {\displaystyle f(\cdot ;\mathbf {v} ^{(t-1)})} Solve for v ( t ) {\displaystyle \mathbf {v} ^{(t)}} , where v ( t ) = argmax v ⁡ 1 N ∑ i = 1 N H ( X i ) f ( X i ; u ) f ( X i ; v ( t − 1 ) ) log ⁡ f ( X i ; v ) {\displaystyle \mathbf {v} ^{(t)}=\mathop {\textrm {argmax}} _{\mathbf {v} }{\frac {1}{N}}\sum _{i=1}^{N}H(\mathbf {X} _{i}){\frac {f(\mathbf {X} _{i};\mathbf {u} )}{f(\mathbf {X} _{i};\mathbf {v} ^{(t-1)})}}\log f(\mathbf {X} _{i};\mathbf {v} )} If convergence is reached then stop; otherwise, increase t by 1 and reiterate from step 2. In several cases, the solution to step 3 can be found analytically. Situations in which this occurs are When f {\displaystyle f\,} belongs to the natural exponential family When f {\displaystyle f\,} is discrete with finite support When H ( X ) = I { x ∈ A } {\displaystyle H(\mathbf {X} )=\mathrm {I} _{\{\mathbf {x} \in A\}}} and f ( X i ; u ) = f ( X i ; v ( t − 1 ) ) {\displaystyle f(\mathbf {X} _{i};\mathbf {u} )=f(\mathbf {X} _{i};\mathbf {v} ^{(t-1)})} , then v ( t ) {\displaystyle \mathbf {v} ^{(t)}} corresponds to the maximum likelihood estimator based on those X k ∈ A {\displaystyle \mathbf {X} _{k}\in A} . == Continuous optimization—example == The same CE algorithm can be used for optimization, rather than estimation. Suppose the problem is to maximize some function S {\displaystyle S} , for example, S ( x ) = e − ( x − 2 ) 2 + 0.8 e − ( x + 2 ) 2 {\displaystyle S(x)={\textrm {e}}^{-(x-2)^{2}}+0.8\,{\textrm {e}}^{-(x+2)^{2}}} . To apply CE, one considers first the associated stochastic problem of estimating P θ ( S ( X ) ≥ γ ) {\displaystyle \mathbb {P} _{\boldsymbol {\theta }}(S(X)\geq \gamma )} for a given level γ {\displaystyle \gamma \,} , and parametric family { f ( ⋅ ; θ ) } {\displaystyle \left\{f(\cdot ;{\boldsymbol {\theta }})\right\}} , for example the 1-dimensional Gaussian distribution, parameterized by its mean μ t {\displaystyle \mu _{t}\,} and variance σ t 2 {\displaystyle \sigma _{t}^{2}} (so θ = ( μ , σ 2 ) {\displaystyle {\boldsymbol {\theta }}=(\mu ,\sigma ^{2})} here). Hence, for a given γ {\displaystyle \gamma \,} , the goal is to find θ {\displaystyle {\boldsymbol {\theta }}} so that D K L ( I { S ( x ) ≥ γ } ‖ f θ ) {\displaystyle D_{\mathrm {KL} }({\textrm {I}}_{\{S(x)\geq \gamma \}}\|f_{\boldsymbol {\theta }})} is minimized. This is done by solving the sample version (stochastic counterpart) of the KL divergence minimization problem, as in step 3 above. It turns out that parameters that minimize the stochastic counterpart for this choice of target distribution and parametric family are the sample mean and sample variance corresponding to the elite samples, which are those samples that have objective function value ≥ γ {\displaystyle \geq \gamma } . The worst of the elite samples is then used as the level parameter for the next iteration. This yields the following randomized algorithm that happens to coincide with the so-called Estimation of Multivariate Normal Algorithm (EMNA), an estimation of distribution algorithm. === Pseudocode === // Initialize parameters μ := −6 σ2 := 100 t := 0 maxits := 100 N := 100 Ne := 10 // While maxits not exceeded and not converged while t < maxits and σ2 > ε do // Obtain N samples from current sampling distribution X := SampleGaussian(μ, σ2, N) // Evaluate objective function at sampled points S := exp(−(X − 2) ^ 2) + 0.8 exp(−(X + 2) ^ 2) // Sort X by objective function values in descending order X := sort(X, S) // Update parameters of sampling distribution via elite samples μ := mean(X(1:Ne)) σ2 := var(X(1:Ne)) t := t + 1 // Return mean of final sampling distribution as solution return μ == Related methods == Simulated annealing Genetic algorithms Harmony search Estimation of distribution algorithm Tabu search Natural Evolution Strategy Ant colony optimization algorithms

    Read more →
  • Speech recognition

    Speech recognition

    Speech recognition (automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT)) is a sub-field of computational linguistics concerned with methods and technologies that translate spoken language into text or other interpretable forms. Speech recognition applications include voice user interfaces, where the user speaks to a device, which "listens" and processes the audio. Common voice applications include interpreting commands for calling, call routing, home automation, and aircraft control. These applications are called direct voice input. Productivity applications include searching audio recordings, creating transcripts, and dictation. Speech recognition can be used to analyse speaker characteristics, such as identifying native language using pronunciation assessment. Voice recognition (speaker identification) refers to identifying the speaker, rather than speech contents. Recognizing the speaker can simplify the task of translating speech in systems trained on a specific person's voice. It can also be used to authenticate the speaker as part of a security process. == History == Applications for speech recognition developed over many decades, with progress accelerated due to advances in deep learning and the use of big data. These advances are reflected in an increase in academic papers, and greater system adoption. Key areas of growth include vocabulary size, more accurate recognition for unfamiliar speakers (speaker independence), and faster processing speed. === Pre-1970 === 1952 – Bell Labs researchers, Stephen Balashek, R. Biddulph, and K. H. Davis, built Audrey for single-speaker digit recognition. Their system located the formants in the power spectrum of each utterance. 1960 – Gunnar Fant developed and published the source–filter model of speech production. 1962 – IBM's 16-word "Shoebox" machine's speech recognition debuted at the 1962 World's Fair. 1966 – Linear predictive coding, a speech coding method, was proposed by Fumitada Itakura of Nagoya University and Shuzo Saito of Nippon Telegraph and Telephone. 1969 – Funding at Bell Labs came to a halt for several years after the company's head engineer, John R. Pierce, wrote an open letter criticizing speech recognition research. This defunding lasted until Pierce retired and James L. Flanagan took over. Raj Reddy was the first person to work on continuous speech recognition, as a graduate student at Stanford University in the late 1960s. Previous systems required users to pause after each word. Reddy's system issued spoken commands for playing chess. Around this time, Soviet researchers invented the dynamic time warping (DTW) algorithm and used it to create a recognizer capable of operating on a 200-word vocabulary. DTW processed speech by dividing it into short frames (e.g. 10 ms segments) and treating each frame as a unit. Speaker independence, however, remained unsolved. === 1970–1990 === 1971 – DARPA funded a five-year speech recognition research project, Speech Understanding Research, seeking a minimum vocabulary size of 1,000 words. The project considered speech understanding a key to achieving progress in speech recognition, which was later disproved. BBN, IBM, Carnegie Mellon (CMU), and Stanford Research Institute participated. 1972 – The IEEE Acoustics, Speech, and Signal Processing group held a conference in Newton, Massachusetts. 1976 – The first ICASSP was held in Philadelphia, which became a major venue for publishing on speech recognition. During the late 1960s, Leonard Baum developed the mathematics of Markov chains at the Institute for Defense Analysis. A decade later, at CMU, Raj Reddy's students James Baker and Janet M. Baker began using the hidden Markov model (HMM) for speech recognition. James Baker had learned about HMMs while at the Institute for Defense Analysis. HMMs enabled researchers to combine sources of knowledge, such as acoustics, language, and syntax, in a unified probabilistic model. By the mid-1980s, Fred Jelinek's team at IBM created a voice-activated typewriter called Tangora, which could handle a 20,000-word vocabulary. Jelinek's statistical approach placed less emphasis on emulating human brain processes in favor of statistical modelling. (Jelinek's group independently discovered the application of HMMs to speech.) This was controversial among linguists since HMMs are too simplistic to account for many features of human languages. However, the HMM proved to be a highly useful way for modelling speech and replaced dynamic time warping as the dominant speech recognition algorithm in the 1980s. 1982 – Dragon Systems, founded by James and Janet M. Baker, was one of IBM's few competitors. === Practical speech recognition === The 1980s also saw the introduction of the n-gram language model. 1987 – The back-off model enabled language models to use multiple-length n-grams, and CSELT used HMM to recognize languages (in software and hardware, e.g. RIPAC). At the end of the DARPA program in 1976, the best computer available to researchers was the PDP-10 with 4 MB of RAM. It could take up to 100 minutes to decode 30 seconds of speech. Practical products included: 1984 – the Apricot Portable was released with up to 4096 words support, of which only 64 could be held in RAM at a time. 1987 – a recognizer from Kurzweil Applied Intelligence 1990 – Dragon Dictate, a consumer product released in 1990. AT&T deployed the Voice Recognition Call Processing service in 1992 to route telephone calls without a human operator. The technology was developed by Lawrence Rabiner and others at Bell Labs. By the early 1990s, the vocabulary of the typical commercial speech recognition system had exceeded the average human vocabulary. Reddy's former student, Xuedong Huang, developed the Sphinx-II system at CMU. Sphinx-II was the first to do speaker-independent, large vocabulary, continuous speech recognition, and it won DARPA's 1992 evaluation. Handling continuous speech with a large vocabulary was a major milestone. Huang later founded the speech recognition group at Microsoft in 1993. Reddy's student Kai-Fu Lee joined Apple, where, in 1992, he helped develop the Casper speech interface prototype. Lernout & Hauspie, a Belgium-based speech recognition company, acquired other companies, including Kurzweil Applied Intelligence in 1997 and Dragon Systems in 2000. L&H was used in Windows XP. L&H was an industry leader until an accounting scandal destroyed it in 2001. L&H speech technology was bought by ScanSoft, which became Nuance in 2005. Apple licensed Nuance software for its digital assistant Siri. ==== 2000s ==== In the 2000s, DARPA sponsored two speech recognition programs: Effective Affordable Reusable Speech-to-Text (EARS) in 2002, followed by Global Autonomous Language Exploitation (GALE) in 2005. Four teams participated in EARS: IBM; a team led by BBN with LIMSI and the University of Pittsburgh; Cambridge University; and a team composed of ICSI, SRI, and the University of Washington. EARS funded the collection of the Switchboard telephone speech corpus, which contained 260 hours of recorded conversations from over 500 speakers. The GALE program focused on Arabic and Mandarin broadcast news. Google's first effort at speech recognition came in 2007 after recruiting Nuance researchers. Its first product, GOOG-411, was a telephone-based directory service. Since at least 2006, the U.S. National Security Agency has employed keyword spotting, allowing analysts to index large volumes of recorded conversations and identify speech containing "interesting" keywords. Other government research programs focused on intelligence applications, such as DARPA's EARS program and IARPA's Babel program. In the early 2000s, speech recognition was dominated by hidden Markov models combined with feed-forward artificial neural networks (ANN). Later, speech recognition was taken over by long short-term memory (LSTM), a recurrent neural network (RNN) published by Sepp Hochreiter & Jürgen Schmidhuber in 1997. LSTM RNNs avoid the vanishing gradient problem and can learn "Very Deep Learning" tasks that require memories of events that happened thousands of discrete time steps earlier, which is important for speech. Around 2007, LSTMs trained with Connectionist Temporal Classification (CTC) began to outperform. In 2015, Google reported a 49 percent error‑rate reduction in its speech recognition via CTC‑trained LSTM. Transformers, a type of neural network based solely on attention, were adopted in computer vision and language modelling, and then to speech recognition. Deep feed-forward (non-recurrent) networks for acoustic modelling were introduced in 2009 by Geoffrey Hinton and his students at the University of Toronto, and by Li Deng and colleagues at Microsoft Research. In contrast to the prioer incremental improvements, deep learning decreased error rates by 30%. Both shallow and deep forms (e.g., recurrent nets) of ANNs had been explored since the 1980s. Howev

    Read more →
  • Reconstruction from projections

    Reconstruction from projections

    The problem of reconstructing a multidimensional signal from its projection is uniquely multidimensional, having no 1-D counterpart. It has applications that range from computer-aided tomography to geophysical signal processing. It is a problem which can be explored from several points of view—as a deconvolution problem, a modeling problem, an estimation problem, or an interpolation problem. == Motivation and applications == Many fields in science and engineering use reconstruction from projections, especially in imaging. It is widely applied geophysical tomography, medical imaging and industrial radiography. For example, in a CT scanner, the 3D structure of the patient’s body being scanned is measured with beams going through the tissue and hitting a detector, giving a flat projection of the body from that angle. Multiple projections are put together to get an image of the position and shape of structures inside in 3D. == Problem statement and basics == A projection is a linear mapping of an M {\displaystyle M} dimensional signal into an N {\displaystyle N} dimensional one, where N ≤ M {\displaystyle N\leq M} . And the objective of reconstruction is to restore the M {\displaystyle M} dimensional signal based on the N {\displaystyle N} dimensional signal. The following case is a 2-D signal projected into 1D signal. The signal in the original coordinate is denoted as d ( u , v ) {\displaystyle d(u,v)} . Now consider a collimated beam of radiation coming from the opposite orientation of v ^ {\displaystyle {\hat {v}}} , producing a projection along u ^ {\displaystyle {\hat {u}}} . v ^ {\displaystyle {\hat {v}}} and u ^ {\displaystyle {\hat {u}}} are normal to each other, and the angle between u {\displaystyle u} and u ^ {\displaystyle {\hat {u}}} is theta. The signal obtained along u ^ {\displaystyle {\hat {u}}} axis is defined to be p θ ( u ^ ) {\displaystyle p_{\theta }({\hat {u}})} . The relationship between the original coordinate and the rotated coordinate is given by [ u ^ v ^ ] = [ cos ⁡ θ sin ⁡ θ − sin ⁡ θ cos ⁡ θ ] [ u v ] {\displaystyle {\begin{bmatrix}{\hat {u}}\\{\hat {v}}\end{bmatrix}}={\begin{bmatrix}\cos \theta &\sin \theta \\-\sin \theta &\cos \theta \end{bmatrix}}{\begin{bmatrix}u\\v\end{bmatrix}}} or inversely, [ u v ] = [ cos ⁡ θ − sin ⁡ θ sin ⁡ θ cos ⁡ θ ] [ u ^ v ^ ] {\displaystyle {\begin{bmatrix}u\\v\end{bmatrix}}={\begin{bmatrix}\cos \theta &-\sin \theta \\\sin \theta &\cos \theta \end{bmatrix}}{\begin{bmatrix}{\hat {u}}\\{\hat {v}}\end{bmatrix}}} Then we have p θ ( u ^ ) = ∫ − ∞ ∞ d ( u , v ) d v ^ = ∫ − ∞ ∞ d ( u ^ cos ⁡ ( θ ) − v ^ sin ⁡ ( θ ) , u ^ sin ⁡ ( θ ) + v ^ cos ⁡ ( θ ) ) d v ^ {\displaystyle p_{\theta }({\hat {u}})=\int _{-\infty }^{\infty }d(u,v)\,\mathrm {d} {\hat {v}}=\int _{-\infty }^{\infty }d({\hat {u}}\cos(\theta )-{\hat {v}}\sin(\theta ),{\hat {u}}\sin(\theta )+{\hat {v}}\cos(\theta ))\,\mathrm {d} {\hat {v}}} By varying theta, a large number of projections can be obtained. Given the projection-slice theorem, D ( Ω , θ ) {\displaystyle D(\Omega ,\theta )} ,the slice of the Fourier transform of d ( u , v ) {\displaystyle d(u,v)} at angle theta, is equivalent to P θ ( Ω ) {\displaystyle P_{\theta }(\Omega )} , the Fourier Transform of the projection p θ ( u ^ ) {\displaystyle p_{\theta }({\hat {u}})} . Therefore, the unknown d ( u , v ) {\displaystyle d(u,v)} can be obtained from its Fourier transform by means of the Fourier transform inversion integral d ( u , v ) = 1 4 π 2 ∫ − ∞ ∞ ∫ − ∞ ∞ D ( Ω 1 , Ω 2 ) e j Ω 1 u e j Ω 2 v d Ω 1 , Ω 2 {\displaystyle \mathrm {d} (u,v)={\frac {1}{4\pi ^{2}}}\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }D(\Omega _{1},\Omega _{2})e^{j\Omega _{1}u}e^{j\Omega _{2}v}\,\mathrm {d} \Omega _{1},\Omega _{2}} = 1 4 π 2 ∫ 0 ∞ ∫ − π π D ( Ω , θ ) e j Ω u cos ⁡ ( θ ) e j Ω v s i n θ | Ω | d Ω d θ {\displaystyle ={\frac {1}{4\pi ^{2}}}\int _{0}^{\infty }\int _{-\pi }^{\pi }D(\Omega ,\theta )e^{j\Omega u\cos(\theta )}e^{j\Omega vsin\theta }{\begin{vmatrix}\Omega \end{vmatrix}}\,\mathrm {d} \Omega \mathrm {d} \theta } = 1 4 π 2 ∫ − π π ∫ 0 ∞ P θ ( Ω ) e j Ω ( u cos ⁡ θ + v sin ⁡ θ ) | Ω | d Ω d θ {\displaystyle ={\frac {1}{4\pi ^{2}}}\int _{-\pi }^{\pi }\int _{0}^{\infty }P_{\theta }(\Omega )e^{j}\Omega (u\cos \theta +v\sin \theta ){\begin{vmatrix}\Omega \end{vmatrix}}\,\mathrm {d} \Omega \mathrm {d} \theta } = 1 4 π 2 ∫ 0 π ( ∫ − ∞ ∞ P θ ( Ω ) | Ω | {\displaystyle ={\frac {1}{4\pi ^{2}}}\int _{0}^{\pi }(\int _{-\infty }^{\infty }P_{\theta }(\Omega ){\begin{vmatrix}\Omega \end{vmatrix}}} e j Ω u ^ d Ω ) d θ {\displaystyle e^{j\Omega {\hat {u}}}\mathrm {d} \Omega )\mathrm {d} \theta } By taking the inverse Fourier Transform and assuming g ( u ^ ) = F − 1 ( | Ω | 2 ) {\displaystyle g({\hat {u}})={\mathcal {F}}^{-1}({{\begin{vmatrix}\Omega \end{vmatrix}}^{2}})} , we get d ( u , v ) = ∑ i △ θ i [ p θ ( u ^ ) ∗ g θ i ( u ^ ) ] {\displaystyle d(u,v)=\sum _{i}\vartriangle \theta _{i}[p_{\theta }({\hat {u}})g_{\theta i}({\hat {u}})]} == Approaches == In practice, there are a wide variety of methods that are utilized, most of which are reconstruct 3-D information (volume) from 2-D signals (image). Typically used methods are CT, MRI, PET and SPECT. And the filtered back projection based on the principles introduced above are commonly applied. === Computed Tomography (CT) === In CT, a volume is formed by stacking the axial slices. The software cuts the volume in a different plane (usually orthogonal). Commonly, slice data is generated using an X-ray source that rotates around the object. X-ray sensors are positioned on the opposite side of the circle from the X-ray source. === Magnetic resonance imaging (MRI) === In MRI, energy from an oscillating magnetic field is temporarily applied to the patient at the appropriate resonance frequency. The protons (hydrogen atoms) emit a radio frequency signal which is measured by a receiving coil. The radio signal can be made to encode position information by varying the main magnetic field using gradient coils. === Positron emission tomography (PET) === The system detects pairs of gamma rays emitted indirectly by a positron-emitting radionuclide (tracer), which is introduced into the body on a biologically active molecule. Three-dimensional images of tracer concentration within the body are then constructed by computer analysis. In modern PET-CT scanners, three dimensional imaging is often accomplished with the aid of a CT X-ray scan performed on the patient during the same session, in the same machine. === Single-photon emission computed tomography (SPECT) === SPECT imaging is performed by using a gamma camera to acquire multiple 2-D images (projections) from multiple angles. Multiple projections are used to yield a 3-D data set. This data set may then be manipulated to show thin slices along any chosen axis of the body. SPECT is similar to PET in its use of radioactive tracer material and detection of gamma rays, while the tracers used in SPECT emit gamma radiation that is measured more directly.

    Read more →