AI Face Over

AI Face Over — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Art Recognition

    Art Recognition

    Art Recognition is a Swiss technology company headquartered in Adliswil, within the Zurich metropolitan area, Switzerland. Art Recognition specializes in the application of artificial intelligence (AI) for art authentication and the detection of art forgeries. == Overview == Art Recognition was established in 2019 by Dr. Carina Popovici and Christiane Hoppe-Oehl. Art Recognition employs a combination of machine learning techniques, computer vision algorithms, and deep neural networks to assess the authenticity of artworks. The company's technology undergoes a process of data collection, dataset preparation, and training. === Academic partnerships and grants === Art Recognition has established a relationship with Innosuisse, a Swiss innovation agency, to expand its research and development initiatives. It has also formed a strategic collaboration with Nils Büttner, an art historian and professor at the State Academy of Fine Arts Stuttgart (ABK Stuttgart). === Notable developments === In May 2024, Art Recognition played a key role in identifying counterfeit artworks, including alleged Monets and Renoirs, being sold on eBay. Germann Auction in November 2024 became the first auction house to successfully conduct a sale of artwork authenticated entirely by artificial intelligence. As of January 2025, Art Recognition has appointed art crime expert and Pulitzer Prize finalist Noah Charney as an advisor. === Recognition and debates === The company was featured on the front page of The Wall Street Journal for its involvement in the authentication case of the Flaget Madonna, believed to have been partly painted by Raphael. A broadcast by the Swiss public television SRF covered how the algorithm can be used to detect art forgeries with high accuracy. The technology developed by Art Recognition has been recognized for its role in providing a technology-based art authentication solution, compared to traditional methods. == Controversial cases == Art Recognition's AI algorithm has been applied to several high-profile and controversial artworks, sparking significant interest and debate in the art world. Samson and Delilah at the National Gallery in London: The National Gallery's "Samson and Delilah", traditionally attributed to the artist Rubens, has also been examined using Art Recognition's AI, which has assessed the painting as non-authentic. De Brecy Tondo Madonna. A research team from Bradford University and the University of Nottingham initially attributed the painting to Raphael, employing an AI face recognition software, while the AI developed at Art Recognition returned a negative result. The Bradford group's AI was trained on 49 images, whereas Art Recognition employed a larger dataset of over 100 images. Lucian Freud Painting Controversy: Featured in The New Yorker, a painting attributed to Lucian Freud became a subject of dispute. Art Recognition's AI analysis played a big role in examining the painting's authenticity. Titian at Kunsthaus Zürich: A painting attributed to Titian, housed at Kunsthaus Zürich, has been a topic of debate among art experts. The application of Art Recognition's technology offered a new perspective. Following this debate, Kunsthaus Zürich has announced plans to initiate a comprehensive project aimed at resolving the authenticity questions surrounding the painting. Art Recognition has contributed to the authentication debate surrounding The Polish Rider, a painting traditionally attributed to Rembrandt but subject to scholarly debate.

    Read more →
  • Moving object detection

    Moving object detection

    Moving object detection is a technique used in computer vision and image processing. Multiple consecutive frames from a video are compared by various methods to determine if any moving object is detected. Moving objects detection has been used for wide range of applications like video surveillance, activity recognition, road condition monitoring, airport safety, monitoring of protection along marine border, etc. == Definition == Moving object detection is to recognize the physical movement of an object in a given place or region. By acting segmentation among moving objects and stationary area or region, the moving objects' motion can be tracked and thus analyzed later. To achieve this, consider a video is a structure built upon single frames, moving object detection is to find the foreground moving target(s), either in each video frame or only when the moving target shows the first appearance in the video. == Traditional methods == Among all the traditional moving object detection methods, we could categorize them into four major approaches: Background subtraction, Frame differencing, Temporal Differencing, and Optical Flow. === Frame differencing === Instead of using traditional approach, to use image subtraction operator by subtracting second and images afterwards, the frame differencing method makes comparisons between two successive frames to detect moving targets. === Temporal differencing === The temporal differencing method identifies the moving object by applying pixel-wise difference method with two or three consecutive frames.

    Read more →
  • Objective vision

    Objective vision

    Objective Vision (Object Oriented Visionary) is a project mainly aimed at real-time computer vision and simulation vision of living creatures. it has three sections containing an open-source library of programming functions for using inside the projects, Virtual laboratory for scholars to check the application of functions directly and by command-line code for external and instant access, and the research section consists of paperwork and libraries to expand the scientific prove of works. == Background == The process has been used in the OVC libraries is as same as what's happening when living see a picture, and it's designed to give the researchers to experience the brain's visual cortex most close simulation for picture perception. The OVC was designed to work as a simulated visual cortex that has a critical job in processing and classify the objects to make it easier to work with pictures and graphical perception and processing. The human brain is much more aware of how it solves complex problems such as playing chess or solving algebra equations, which is why computer programmers have had so much success building machines that emulate this type of activity. but when the whole process is still a riddle that how the entities visionary system works. The project was simulated the visionary system by how it starts to convert the signals to image(actually the edges and colors) and then recognizing the shapes to find a relation between brain's information and image. The Objective Visionary system actually is concentrating on the separable sections, this separation gives the application visionary system the excellence processing result, because with this method the system do not waste much time on processing non significant sections and signals. this operation in the Objective Vision project called objective processing and because the O.V. mission is focused on human visionary simulation, so the developer refers with Objective Vision. == History == Objective-Vision is a Human (Natural) Visionary simulation Project developed by Michael Bidollahkhany. Following an explosion of interest during the 21st century were characterized by the maturing of the field and the significant growth of active applications; simulation of visionary systems, visionary based autonomous vehicle guidance, medical imaging (2D and 3D) and automatic surveillance are the most rapidly developing areas. This progress can be seen in an increasing number of software and hardware products on the market, as well as in a number of digital image processing software and APIs and also machine vision courses offered at universities worldwide. Therefore, the OVC project has been released as a research software project in 2016. One of important parts of this project was O.V.C. (Objective Vision Class library), that was designed to able companies and scientists to use the brain's most likely functionalities as visionary libraries to simplify and accelerate the image processing algorithms developments. The project started under MIT copyright license, but since 2018 the project continued as classified based on sponsors opinion. == The Algorithm == As developers claimed the algorithm used in the class library and developer's kit of project has been developed based on natural visionary system, and the functionalities containing image processing, optimization and labeling etc. are mostly upgraded and near techniques. Suppose that we've a picture of a jungle, or somewhere else, with this library developer will be able to manipulate not only the pixel of images for data extraction, but automatically based on which algorithm is used and image quality, he can manipulate directly a list of objects, same pixels and every data project needs to have, said the developer in his lecture answering how the algorithm works. === Viewpoint === For long times digital image processing and storing, was actually by processing just pixels; this Project tries to present a new kind of image processing and even storing, "objective vision" or "object-oriented visionary" is called. This project officially launched in May 2016, with the aim of making more adaptation between Computer Vision (Include Visionary, Digital image processing, discernment and even Perception) and Human Visual System; about development of the project: "...so we decided to research on Human Vision System, besides we worked on Artificial Retinal image processing and new visionary optimization unit(Presented at Istanbul Technical University Conference(Turkey 2015-2016)) and grew our research to Visionary CORTEX of Brain", Michael Bidollahkhany said. == Applications == The OVC application areas include: 2D and 3D feature toolkits Egomotion estimation Human–computer interaction (HCI) Mobile robotics Motion understanding Object identification Segmentation and recognition Stereopsis stereo vision: depth perception from two cameras Structure from motion (SFM) Motion tracking == Programming language == In first initial release of Objective Visionary Project the algorithm has been written in C++ and C#, and the virtual laboratory has been developed in C# and Delphi. Based on developers last lecture since the second release the complete algorithm has been re-written in C# based on .Net Core 1.0 to make it easier to work on different operating systems.

    Read more →
  • Find It, Fix It

    Find It, Fix It

    Find It, Fix It is a mobile app developed by the city of Seattle to report non-emergency issues. == History == The City of Seattle launched Find It, Fix It in 2013 for Android and iOS phones to let citizens report potholes, graffiti, and other problems they observe to the city. The app did not support Windows Phone, making it inaccessible to Microsoft employees in the city who used the company's then-supported mobile operating system. In 2015, Mayor Ed Murray led a Find It, Fix It walk with about 100 other people, including police officers, in the University District. Participants were encouraged to use the app to report problems they observed in the neighborhood. Later Find It, Fix It walks have taken place in neighborhoods including Crown Hill, First Hill, Belltown, Wallingford, and Highland Park. In 2020, Find It, Fix It added support for reporting issues with the dockless bicycle sharing systems in the city. Citing the success of Seattle’s app, the nearby city of Kent, Washington, announced that it would create a similar customer service app. == Usage == Users of Find It, Fix It can submit reports about graffiti, potholes, parking violations, broken street signs, and other issues. The app is designed to use a smartphone’s camera and GPS features to make it easier for users to file reports. The Atlantic reported in 2018 that Find It, Fix It was being used by neighborhood groups to report homeless encampments with the intention of having authorities remove them, citing examples of campaigns in Ravenna and Ballard. The executive director of Ballard Alliance, a local chamber of commerce for businesses in the neighborhood, used a private Facebook group to encourage business owners to use the app to report homeless encampments. In response to a poster campaign in the summer of 2019 with the slogan “See a tent? Report a tent”, a representative for the mayor’s office and two Seattle City Council members said that it was inappropriate to encourage use of Find It, Fix It to displace homeless people. As a backlash to these campaigns, people living far from Seattle filed hoax complaints using the app, such as by using photos of tents on display at REI stores. According to the Seattle Times, between January 1, 2020, and November 15, 2021, the city had received over 230,000 service requests, of which 77% were submitted via Find It, Fix It. The largest category of these, numbering over 55,000, concerned illegal dumping. Of complaints categorized as "parking", 3,000 had comments explicitly mentioning issues around homelessness. The ZIP code 98134, covering an industrial area south of Pioneer Square and north of Georgetown, had 5,559 service requests per 1,000 residents, by far the highest in the city.

    Read more →
  • Hyperparameter (machine learning)

    Hyperparameter (machine learning)

    In machine learning, a hyperparameter is a parameter that can be set in order to define any configurable part of a model's learning process. Hyperparameters can be classified as either model hyperparameters (such as the topology and size of a neural network) or algorithm hyperparameters (such as the learning rate and the batch size of an optimizer). These are named hyperparameters in contrast to parameters, which are characteristics that the model learns from the data. Hyperparameters are not required by every model or algorithm. Some simple algorithms such as ordinary least squares regression require none. However, the LASSO algorithm, for example, adds a regularization hyperparameter to ordinary least squares which must be set before training. Even models and algorithms without a strict requirement to define hyperparameters may not produce meaningful results if these are not carefully chosen. However, optimal values for hyperparameters are not always easy to predict. Some hyperparameters may have no meaningful effect, or one important variable may be conditional upon the value of another. Often a separate process of hyperparameter tuning is needed to find a suitable combination for the data and task. As well as improving model performance, hyperparameters can be used by researchers to introduce robustness and reproducibility into their work, especially if it uses models that incorporate random number generation. == Considerations == The time required to train and test a model can depend upon the choice of its hyperparameters. A hyperparameter is usually of continuous or integer type, leading to mixed-type optimization problems. The existence of some hyperparameters is conditional upon the value of others, e.g. the size of each hidden layer in a neural network can be conditional upon the number of layers. === Difficulty-learnable parameters === The objective function is typically non-differentiable with respect to hyperparameters. As a result, in most instances, hyperparameters cannot be learned using gradient-based optimization methods (such as gradient descent), which are commonly employed to learn model parameters. These hyperparameters are those parameters describing a model representation that cannot be learned by common optimization methods, but nonetheless affect the loss function. An example would be the tolerance hyperparameter for errors in support vector machines. === Untrainable parameters === Sometimes, hyperparameters cannot be learned from the training data because they aggressively increase the capacity of a model and can push the loss function to an undesired minimum (overfitting to the data), as opposed to correctly mapping the richness of the structure in the data. For example, if we treat the degree of a polynomial equation fitting a regression model as a trainable parameter, the degree would increase until the model perfectly fit the data, yielding low training error, but poor generalization performance. === Tunability === Most performance variation can be attributed to just a few hyperparameters. The tunability of an algorithm, hyperparameter, or interacting hyperparameters is a measure of how much performance can be gained by tuning it. For an LSTM, while the learning rate followed by the network size are its most crucial hyperparameters, batching and momentum have no significant effect on its performance. Although some research has advocated the use of mini-batch sizes in the thousands, other work has found the best performance with mini-batch sizes between 2 and 32. === Robustness === An inherent stochasticity in learning directly implies that the empirical hyperparameter performance is not necessarily its true performance. Methods that are not robust to simple changes in hyperparameters, random seeds, or even different implementations of the same algorithm cannot be integrated into mission critical control systems without significant simplification and robustification. Reinforcement learning algorithms, in particular, require measuring their performance over a large number of random seeds, and also measuring their sensitivity to choices of hyperparameters. Their evaluation with a small number of random seeds does not capture performance adequately due to high variance. Some reinforcement learning methods, e.g. DDPG (Deep Deterministic Policy Gradient), are more sensitive to hyperparameter choices than others. == Optimization == Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model which minimizes a predefined loss function on given test data. The objective function takes a tuple of hyperparameters and returns the associated loss. Typically these methods are not gradient based, and instead apply concepts from derivative-free optimization or black box optimization. == Reproducibility == Apart from tuning hyperparameters, machine learning involves storing and organizing the parameters and results, and making sure they are reproducible. In the absence of a robust infrastructure for this purpose, research code often evolves quickly and compromises essential aspects like bookkeeping and reproducibility. Online collaboration platforms for machine learning go further by allowing scientists to automatically share, organize and discuss experiments, data, and algorithms. Reproducibility can be particularly difficult for deep learning models. For example, research has shown that deep learning models depend very heavily even on the random seed selection of the random number generator.

    Read more →
  • Imix video cube

    Imix video cube

    The Imix (also known as ImMix) Video Cube is one of the first computer non-linear editing systems that was a full broadcast quality online video finishing machine. After its release in 1994, Imix released a more advanced version, the Imix Turbo Cube, which boasted 4 channels of real time layered visual effects. It was a hardware computer system controlled by an Apple Macintosh computer.

    Read more →
  • AppBlock

    AppBlock

    AppBlock is a software tool for managing screen time that limits access to selected mobile applications and websites. Developed by the Czech studio MobileSoft, it is distributed for Android and iOS devices as well as through browser extensions for Google Chrome, Microsoft Edge and Brave, and as desktop solutions. The application is used primarily to restrict time spent on social media and similar distracting services while working and studying. By 2025, the application reported 700,000 monthly active users, with the domestic Czech market accounting for less than one percent of its total user base and revenue. == History == === Origins === AppBlock was created by the Czech software studio MobileSoft, based in Hradec Králové. The studio was founded in 2012 by Miroslav Novosvětský, who remains the sole owner. The idea for the application arose from the use of browser-based website blockers on desktop computers. AppBlock was conceived as a way to reduce the time spent on mobile devices. === Early releases === In its early phase, AppBlock was available only for phones running on Android. Early versions allowed users to limit access to selected applications and websites during specified periods. From the outset, the application was distributed internationally rather than only within the Czech market, and early coverage reported a multi-million number of downloads worldwide. === Expansion of functionality === Over time, AppBlock has expanded beyond basic application blocking to include additional functions related to limiting procrastination and managing attention. The development of AppBlock accelerated during the COVID-19 pandemic. Following a reduction in external client orders, the studio reallocated resources from contract development to the application. Increased digital content consumption during lockdowns contributed to a rise in the application's usage and revenue. As the application developed, it became the company's product with the largest user base. Novosvětský described an increase in downloads over a twelve-month period, which he linked in part to the company's activities abroad, including participation in events focused on mobile marketing in the United States. These activities were an important factor in the further development of AppBlock. === Internationalization and market expansion === Within roughly the first eight years of the company's existence, MobileSoft became active both in the domestic Czech market and in the United States, supported among other things by participation in the CzechAccelerator program, which is intended to help Czech firms enter foreign markets. In mid-August 2021 the developers launched a version for iOS, which soon began to attract paying users. The expansion to iOS was accompanied by plans for cooperation with the Procrastination.com platform, intended to complement the blocking functions with educational content related to digital media use, sleep and work habits. By 2025, AppBlock was localised into 15 languages, with the largest share of users in the United States, the United Kingdom, Germany, and France, with recent growth in Brazil, and usage extending across several continents. AppBlock has reached more than 10 million installations. In the same period its creators announced plans to refine existing functions and to expand support beyond mobile phones to desktop use, including through support for additional web browsers. == Features == === Supported platforms === AppBlock is distributed as a mobile application for Android and iOS users through Google Play and the Apple App Store. Browser extensions for desktop systems are available for Google Chrome, Microsoft Edge and Brave. === Functionality === AppBlock's core function is to restrict access to selected applications and websites. The mobile application shows a list of installed apps and lets the user select which ones to block. It also includes tools to block specific websites and, on iOS, to block certain phrases entered in the Safari browser. AppBlock can mute notifications from selected applications, so alerts from those apps do not appear while blocking is active. In addition to choosing which apps or content to block, the software also offers an allowlist mode, where only selected applications remain accessible and all others are blocked. Blocking rules are organized into configurable schedules, called profiles. Users can create profiles that define time periods when selected apps and websites are unavailable. Newer versions also allow profiles to be activated automatically based on the time of day, days of the week, the device's location, or connection to specific Wi-Fi networks. The iOS version lets users set limits on how often or how long certain apps can be used before they are blocked, and it can track and restrict screen time for individual apps. In addition to these recurring rules, AppBlock includes a Quick Block feature that temporarily blocks selected apps and websites with a single action, without requiring a separate long-term schedule. Strict Mode is an optional setting that limits the ability to change blocking once it is active. For a specified period, it prevents editing AppBlock's rules and can be configured to stop the app from being uninstalled during that time. While Strict Mode is enabled, users cannot modify or disable the restrictions they have set. Deactivation requires specific verification steps, such as connecting the device to a charger or obtaining approval from a designated contact person. The mobile application also includes statistical and reporting features. In addition to blocking, AppBlock lets users view statistics and data about their use of applications and websites, including screen-time summaries and focus sessions that silence notifications and enforce blocking during defined work or study periods. Browser extensions for desktop environments apply AppBlock's website-blocking functions on Windows and macOS systems through supported web browsers. == Business model == AppBlock uses a freemium revenue model. The basic version of the application is available free of charge and allows blocking of up to three applications at the same time. The premium version removes this limit and adds further configuration options. In 2020, the application shifted from a one-time payment structure to a subscription model. By 2021, AppBlock had more than seven thousand paying users and annual revenue of about four million Czech crowns. By 2025, annual revenue reached approximately 4 million US dollars (80 million CZK) before taxes and platform fees, with roughly 20 percent of active users subscribing to the paid version. == Usage == AppBlock limits access to selected applications and websites in order to reduce smartphone overuse and digital distraction. It is used to block social media, games and other services considered addictive, with the aim of reducing frequent checking of mobile devices and creating time intervals in which these services are unavailable. Reported use cases of AppBlock cover work, students, parents, ADHD, mental health, well-being and business. The application is used both by individual users and within workplace initiatives in which employees install it to reduce digital distractions during working hours.

    Read more →
  • Adobe PhotoDeluxe

    Adobe PhotoDeluxe

    PhotoDeluxe was a consumer-oriented image editing software line published by Adobe Systems from 1996 until July 8, 2002. At that time it was replaced by Adobe's newly launched consumer-oriented image editing software Photoshop Elements. Adobe no longer provides technical support for the PhotoDeluxe software line. PhotoDeluxe had a range of image processing capabilities for the home photographer and image handler. These included removing red-eye, cropping, and adjusting brightness, contrast, and sharpness. It also included software to extract pictures from an image scanner. Among the functionality included was the ability to dynamically resize photos and export them in a wide range of formats. It also had a range of printing options including printing multiple copies of an image on the same page. It was often bundled free with Epson scanners or as free software with new computers. == Features == Despite the critical concerns regarding the quality of the setup, Photo Deluxe supports layering, blurs, sharpening, cloning, gradient fills, color and background switches, color variations, resizing options, and many other features. Another drawback of PhotoDeluxe was that it was designed for Mac computers, so working on Windows PC was a problem for those who were unable to customize their preferences. == Versions == === Adobe PhotoDeluxe 1.0 === The first version was released in 1996 for Windows and Macintosh computers. In one year, it sold over one million copies. === Adobe PhotoDeluxe 2.0 === The new version was released in 1997 and had added features such as a Clone Tool, red-eye removal, and sample templates for making posters, cards, and calendars. It also had new special effect features. === Adobe PhotoDeluxe 3.0 === The 3rd version was released in 1998. The new features included customizable clipart settings, the ability to import photos on the web, enhanced repair activities following Guided Activities, and Adobe Connectables to add new activities. === Adobe PhotoDeluxe Home Edition (4.0) === Version 4.0 was created by the makers of Photoshop. It had advanced abilities such as tools to add animation, voice, and music to a picture. It also had features to restore photos to their original position. == History == Adobe PhotoDeluxe 1.0 was released in 1996 for Macintosh computers, initially retailing for an MSRP of $49. The software did quite well, reportedly selling over a million copies by February of the next year, primarily due to bundles with companies like Apple and Hewlett-Packard. PhotoDeluxe was primarily advertised to consumers as a way to do basic photo manipulation, such as cropping and rotating images, or creating simple cards and calendars. PhotoDeluxe 2.0 was released in 1997, and was the last version of PhotoDeluxe that Adobe made that worked on Macs. PhotoDeluxe 2.0 became the "number one selling consumer photo-editing software product in the world." PhotoDeluxe 3.0 was released in 1998, where it was rebranded as "3.0 Home Edition", as Adobe released PhotoDeluxe Business Edition later that year for a higher price. PhotoDeluxe Home Edition, unofficially called PhotoDeluxe 4.0, was released in 1999 and was the last version of PhotoDeluxe to be released. Adobe officially cancelled PhotoDeluxe on July 8, 2002, citing the presence of Photoshop and Photoshop Elements, with support being officially cancelled in mid-2003. No version of PhotoDeluxe is compatible with Windows 10, rendering the program obsolete. == Pricing == All home versions of PhotoDeluxe retailed for an MSRP of $49. PhotoDeluxe 2.0 and onwards allowed users to upgrade from a previous version of PhotoDeluxe or a competing piece of graphics software for $39. Additionally PhotoDeluxe Business Edition allowed a similar deal, allowing users to upgrade from other versions of PhotoDeluxe or a competing software for $59, instead of its normal price of $99. Adobe also offered a bundle allowing users of 1.0 or 2.0 to get 3.0 and Business Edition for $79.

    Read more →
  • Class activation mapping

    Class activation mapping

    Class activation mapping methods are explainable AI (XAI) techniques used to visualize the regions of an input image that are the most relevant for a particular task, especially image classification, in convolutional neural networks (CNNs). These methods generate heatmaps by weighting the feature maps from a convolutional layer according to their relevance to the target class. In the field of artificial intelligence, generically defined as "the effort to automate intellectual tasks normally performed by humans", machine learning and deep learning were created. They both use statistical and computational methods to learn patterns from data, reducing the need for manually coded rules. Machine learning models are trained on input data and the known respective answers, learning the underlying patterns or structures present in the data. Traditional Machine learning algorithms employ manually designed feature sets, posing a direct link between machine learning designers and employed features. Deep learning is a subfield of machine learning, based on the concept of successive layers of representation, in which the data is progressively unfolded in different ways, to extract relevant and informative patterns in data analysis. Deep learning algorithms are defined as feature learning algorithms automatically learning hierarchical feature representations from raw data, extracting increasingly abstract features through multiple layers. CNNs are a specific architecture of deep learning models, designed to process spatially structured data, such as images, exploiting a series of convolution, non-linear activation and pooling operations to extract relevant features, contained in the so-called feature maps from input data. CNNs have demonstrated to be highly effective in a variety of computer vision and image processing tasks. CNNs (and deep learning models more broadly) are described as black boxes due to their complex and non-transparent internal layers of representation. The need for clearer indications on its internal working and decision-making process gave birth to XAI techniques. Among the proposed XAI techniques for computer vision tasks, Class activation mapping methods can show which pixels in an input image are important to the predicted logit for a class of interest, in a classification task. Class activation mapping methods were originally developed for class-discriminative scenarios to visualize which parts of the input image influenced the classification decision, namely to visually highlight the regions of those feature maps that contribute most strongly to the prediction of a given class. More advanced versions of these methods are not limited to image classification tasks, but have been extended also to several vision-related tasks, such as object detection, image captioning, visual question answering and image segmentation. == Background == The following methods laid the groundwork for the class activation maps approaches, forming the conceptual basis of using gradients to highlight class-discriminative regions. === Class model visualization and saliency maps for convolutional neural networks === The class model visualization and image-specific saliency maps approaches have been presented in the foundational work "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps" by Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman and it generalizes the deconvnet method by Zeiler and Fergus. Class model visualization synthesizes an artificial input image that strongly activates the output neurons associated with a target class. Given a trained, fixed model, this method starts with a zero-initialized image, backpropagates the gradients from the class score to the image pixels, updates the image pixels increasing the specific class scores and it repeats the pixel updating process, showing an encoded (idealized version) prototype of the class of interest. Image-specific class saliency visualization method provides a visual explanation by highlighting the most relevant pixels in an image for predicting a certain class C of interest. This is done by computing the gradient of the class score with respect to the input image, I 0 , {\displaystyle I_{0},} w = ∂ S C ∂ I | I 0 {\displaystyle w=\left.{\frac {\partial S_{C}}{\partial I}}\right|_{I_{0}}} approximating the model locally (around I 0 {\displaystyle I_{0}} ) as linear, using a first-order Taylor expansion: S C ( I ) ≈ w C T I + b {\displaystyle S_{C}(I)\approx w_{C}^{T}I+b} . The magnitude of w C {\displaystyle w_{C}} , the gradient, indicates the importancy of the pixels: larger gradients suggest greater influence on the prediction. Once the gradient is known, the saliency map is defined as the maximum absolute gradient across the color channels: M i j = m a x C | ∂ S C ∂ I i j C | {\displaystyle M_{ij}=max_{C}\left|{\frac {\partial S_{C}}{\partial I_{ij}^{C}}}\right|} resulting in an saliency map (i.e. heatmap). === Guided backpropagation === The concept of guided backpropagation can be traced for the first time in the paper by Springenberg et al. "Striving For Simplicity: The All Convolutional Net" and also this method builds upon the work by Zeiler and Fergus "Visualizing and Understanding Convolutional Networks". Guided backpropagation core is to understand what a CNN is learning, by visualizing the patterns that activate more strongly individual neurons (or filters), in architectures which do not rely on max-pooling layer. When propagating gradients back through a rectified linear unit (ReLU), guided backpropagation passes the gradient if and only if the input to the ReLU was positive (forward pass) and the output gradient is positive (backward signal), tackling both inactive neurons, negative gradients and suppressing the noise. The result displays sharper, high-resolution visualizations of what each neuron is responding to. Guided backpropagation represents a simple and practical method for model interpretability, helping understand how and where neural networks detect semantic concepts across layers. Moreover, it can be applied to any network architecture, due to its working principle. == Base versions == Class activation mapping and gradient-weighted class activation mapping are the original and most widely used methods for visual explanations in convolutional neural networks. These methods serve as the foundation for many later developments in explainable AI. Notation: In this article, the symbols i and j represent integer indices that disappear inside sums or averages, while x and y are the continuous (or up-sampled integer) coordinates of the final heat-map that is plotted. === Class activation mapping (CAM) === Class activation mapping (CAM) was the first, and the original, version of CAM methods, and it gave the name to the whole category. The approach was firstly introduced by Zhou et al. in their seminal work "Learning Deep Features for Discriminative Localization". This approach achieves class-specific heatmaps by modifying image classification CNN architectures, replacing fully-connected layers with convolutional layers and a final global average pooling layer. Its main scope is to localize and highlight discriminative regions of an input image that a CNN uses to identify a particular class, without needing explicit bounding box annotations. ==== Global average pooling (GAP) ==== Global average pooling (GAP) represents the key element in the original CAM approach. It is a dimensionality reduction technique and, similarly to other pooling layers, it allows the downsampling of the feature maps, calculating representative values for a specific region of the feature map. The particularity of GAP is that it calculates a single value for an entire feature map, significantly reducing the model dimensions. ==== Mathematical description ==== The mathematical description considers as its key the combination of convolutional and GAP layers. In CAM, it is mandatory to have the GAP layer after the last convolutional layer and before the final linear classifier layer. This last element of the architecture connects the output logits (the network predictions) y C {\displaystyle y^{C}} , to the GAP values, with its respective fine-tuned weights, w k C {\displaystyle w_{k}^{C}} . Considering A k {\displaystyle A^{k}} as the last feature maps of the last convolutional layer, GAP produces one value for each feature map, by averaging all the matrix elements (i, j) of the feature map: F k = 1 m n ∑ i = 1 m ∑ j = 1 n A i j k {\displaystyle F^{k}={\frac {1}{mn}}\sum _{i=1}^{m}\sum _{j=1}^{n}A_{ij}^{k}} with A k = [ A 11 k A 12 k ⋯ A 1 n k A 21 k A 22 k ⋯ A 2 n k ⋮ ⋮ ⋱ ⋮ A m 1 k A m 2 k ⋯ A m n k ] = { A i j k ∣ 1 ≤ i ≤ m , 1 ≤ j ≤ n } {\displaystyle A^{k}={\begin{bmatrix}A_{11}^{k}&A_{12}^{k}&\cdots &A_{1n}^{k}\\A_{21}^{k}&A_{22}^{k}&\cdots &A_{2n}^{k}\\\vdots &\vdots &\ddots &\vdots \\A_{m1}^{k}&A_{m2}^{k}&\cdots &A_{mn}^{k}\end{bmatrix}}=\left\{A_{

    Read more →
  • Microsoft To Do

    Microsoft To Do

    Microsoft To Do (previously styled as Microsoft To-Do) is a cloud-based task management application. It allows users to manage their tasks from a smartphone, tablet and computer. The technology is produced by the team behind Wunderlist, which was acquired by Microsoft, and the stand-alone apps feed into the existing Tasks feature of the Outlook product range. == History == Microsoft To Do was first launched as a preview with basic features in April 2017. Later more features were added including Task list sharing in June 2018. In September 2019, a major update to the app was unveiled, adopting a new user interface with a closer resemblance to Wunderlist. The name was also slightly updated by removing the hyphen from To-Do. In May 2020, Microsoft officially closed the doors on Wunderlist, ending its active service in favor of improving and expanding Microsoft To Do.

    Read more →
  • MoltenVK

    MoltenVK

    MoltenVK is a software library which allows Vulkan applications to run on top of Metal on Apple's macOS, iOS, and tvOS operating systems. It is the first software component to be released for the Vulkan Portability Initiative, a project to have a subset of Vulkan run on platforms lacking native Vulkan drivers. There are some limitations compared with a native Vulkan implementation. == History == MoltenVK was first released as a proprietary and commercially licensed product by The Brenwill Workshop on July 27, 2016. On July 31, 2017, Khronos announced the formation of the Vulkan Portability Technical Subgroup. === Open source === On February 26, 2018, Khronos announced that Vulkan became available on macOS and iOS products through the MoltenVK library. Valve announced that Dota 2 will run on macOS using the Vulkan API with the aid of MoltenVK, and that they had made an arrangement with developer The Brenwill Workshop Ltd to release MoltenVK as open-source software under the Apache License version 2.0. On May 30, 2018, Qt was updated with Vulkan for Qt on macOS using MoltenVK. On May 31, 2018, optional Vulkan support for Dota 2 on macOS was released. Benchmarks for the game were available the following day, showing better performance using Vulkan and MoltenVK compared to OpenGL. On July 20, 2018, Wine was updated with Vulkan support on macOS using MoltenVK. On 29 July 2018, the first app using MoltenVK was accepted onto the App Store, after initially being rejected. On 6 August 2018, Google open-sourced Filament, a crossplatform real-time physically based rendering engine with MoltenVK for macOS/iOS. On November 28, 2018, Valve released Artifact, their first Vulkan-only game on macOS using MoltenVK. === Version 1.0 === On 29 January 2019, MoltenVK 1.0.32 was released with early prototype of Vulkan Portability Extensions. RPCS3 and Dolphin emulators were updated with Vulkan support on macOS using MoltenVK. On 13 April 2019, MoltenVK 1.0.34 was released with support for tessellation. On July 30, 2019, MoltenVK 1.0.36 was released targeting Metal 3.0. On July 31, 2020, MoltenVK 1.0.44 was released, adding support for the tvOS platform. On January 23, 2020, MoltenVK was updated to support for some of the new features of Vulkan 1.2, as of Vulkan SDK 1.2.121. === Version 1.1 === On October 1, 2020, MoltenVK 1.1.0 was released, adding full support for Vulkan 1.1, as of Vulkan SDK 1.2.154. On 9 December 2020, MoltenVK 1.1.1 was released, providing support for Vulkan on Apple silicon GPUs and support for the Mac Catalyst platform for porting iOS/iPadOS apps to macOS. === Version 1.2 === On October 18, 2022, MoltenVK 1.2.0 was released, adding full support for Vulkan 1.2 as of Vulkan SDK 1.3.231. In January 2023, MoltenVK 1.2.2 added support for Vulkan as of SDK 1.3.239, while this version of Vulkan SDK fixed some issues with the interconnectivity with Metal API, while version 1.2.3 supported some additional extensions. === Version 1.3 === On May 1, 2025, MoltenVK 1.3 was released with support for Vulkan 1.3. === Version 1.4 === On August 20, 2025, MoltenVK 1.4 was released with support for Vulkan 1.4.

    Read more →
  • Macromedia FreeHand

    Macromedia FreeHand

    Macromedia FreeHand (formerly Aldus FreeHand) is a discontinued computer application for creating two-dimensional vector graphics oriented primarily to professional illustration, desktop publishing and content creation for the Web. FreeHand was similar in scope, intended market, and functionality to Adobe Illustrator, CorelDRAW and Xara Designer Pro. Because of FreeHand's dedicated page layout and text control features, it also compares to Adobe InDesign and QuarkXPress. Professions using FreeHand include graphic design, illustration, cartography, fashion and textile design, product design, architects, scientific research, and multimedia production. FreeHand was created by Altsys Corporation in 1988 and licensed to Aldus Corporation, which released versions 1 through 4. In 1994, Aldus merged with Adobe Systems and because of the overlapping market with Adobe Illustrator, FreeHand was returned to Altsys by order of the Federal Trade Commission. Altsys was later bought by Macromedia, which released FreeHand versions 5 through 11 (FreeHand MX). In 2005, Adobe Systems acquired Macromedia and its product line which included FreeHand MX, under whose ownership it presently resides. Since 2003, FreeHand development has been discontinued; in the Adobe Systems catalog, FreeHand has been replaced by Adobe Illustrator. FreeHand MX continues to run under Windows 11 and under Mac OS X 10.6 (Snow Leopard) within Rosetta, a PowerPC code emulator, and requires a registration patch supplied by Adobe. FreeHand 10 runs without problems on Mac OS X Snow Leopard with Rosetta enabled, and does not require a registration patch. Later versions of macOS can use a Mac OS X Snow Leopard Server virtual machine to emulate the required PowerPC support. == History == === Altsys and Aldus FreeHand === In 1984, James R. Von Ehr founded Altsys Corporation to develop graphics applications for personal computers. Based in Plano, Texas, the company initially produced font editing and conversion software; Fontastic Plus, Metamorphosis, and the Art Importer. Their premier PostScript font-design package, Fontographer, was released in 1986 and was the first such program on the market. With the PostScript background having been established by Fontographer, Altsys also developed FreeHand (originally called Masterpiece) as a Macintosh Postscript-based illustration program that used Bézier curves for drawing and was similar to Adobe Illustrator. FreeHand was announced as "... a Macintosh graphics program described as having all the features of Adobe's Illustrator plus drawing tools such as those in Mac Paint and Mac Draft and special effects similar to those in Cricket Draw." Seattle's Aldus Corporation acquired a licensing agreement with Altsys Corporation to release FreeHand along with their flagship product, Pagemaker, and Aldus FreeHand 1.0 was released in 1988. FreeHand's product name used intercaps; the F and H were capitalized. The partnership between the two companies continued with Altsys developing FreeHand and with Aldus controlling marketing and sales. After 1988, a competitive exchange between Aldus FreeHand and Adobe Illustrator ensued on the Macintosh platform with each software advancing new tools, achieving better speed, and matching significant features. Windows PC development also allowed Illustrator 2 (aka, Illustrator 88 on the Mac) and FreeHand 3 to release Windows versions to the graphics market. FreeHand 1.0 sold for $495 in 1988. It included the standard drawing tools and features as other draw programs including special effects in fills and screens, text manipulation tools, and full support for CMYK color printing. It was also possible to create and insert PostScript routines anywhere within the program. FreeHand performed in preview mode instead of keyline mode but performance was slower. FreeHand 2.0 sold for $495 in 1989. Besides improving on the features of FreeHand 1.0, FreeHand 2 added faster operation, Pantone colors, stroked text, flexible fill patterns and automatically import graphic assets from other programs. It added accurate control over a color monitor screen display, limited only by its resolution. FreeHand 3.0 sold for $595 in 1991. New features included resizable color, style, and layer panels including an Attributes menu. Also tighter precision of both the existing tools and aligning of objects. FH3 created compound Paths. Text could be converted to paths, applied to an ellipse, or made vertical. Carried over from version 1.0, FreeHand 3 suffered by having text entered into a dialog box instead of directly to the page. In October 1991, a 3.1 upgrade made FreeHand work with System 7 but additionally, it supported pressure-sensitive drawing which offered varying line widths with a users stroke. It improved element manipulation and added more import/export options. FreeHand 4.0 sold for $595 in 1994. Altsys ported FreeHand 3.0 to the NeXT system creating a new program named Virtuoso. Virtuoso continued its development at Altsys and version 2.0 of Virtuoso was feature-equivalent to FreeHand 4 (with the addition of NeXT-specific features such as Services and Display PostScript) and file compatible, with Virtuoso 2 able to open FreeHand 4 files and vice versa. A prominent feature of this version was the ability to type directly into the page and wrap inside or outside any shape. It also included drag-and-drop color imaging, a larger pasteboard, and a user interface that featured floating, rollup panels. The colors palette included a color mixer for adding new colors to the swatch list. Speed increases were made. In the same year of FreeHand 4 release, Adobe Systems announced merger plans with Aldus Corporation for $525 million. Fear about the end of competition between these two leading applications was reported in the media and expressed by customers (Illustrator versus FreeHand and Adobe Photoshop versus Aldus PhotoStyler.) Because of this overlapping of the market, Altsys stepped in by suing Aldus, saying that the merger deal was "a prima facie violation of a non-compete clause within the FreeHand licensing agreement." Altsys CEO Jim Von Ehr explained, "No one loves FreeHand more than we do. We will do whatever it takes to see it survive." The Federal Trade Commission issued a complaint against Adobe Systems on October 18, 1994, ordering a divestiture of FreeHand to "remedy the lessening of competition resulting from the acquisition as alleged in the Commission's complaint," and further, the FTC ordering, "That for a period of ten (10) years from the date on which this order becomes final, respondents shall not, without the prior approval of the Commission, directly or indirectly, through subsidiaries, partnerships, or otherwise .. Acquire any Professional Illustration Software or acquire or enter into any exclusive license to Professional Illustration Software;" (referring to FreeHand.) FreeHand was returned to Altsys with all licensing and marketing rights as well as Aldus FreeHand's customer list. === Macromedia Freehand === By late 1994, Altsys still retained all rights to FreeHand. Despite brief plans to keep it in-house to sell it along with Fontographer and Virtuoso, Altsys reached an agreement with the multimedia software company, Macromedia, to be acquired. This mutual agreement provided FreeHand and Fontographer a new home with ample resources for marketing, sales, and competition against the newly merged Adobe-Aldus company. Altsys would remain in Richardson, Texas, but would be renamed as the Digital Arts Group of Macromedia and was responsible for the continued development of FreeHand. Macromedia received FreeHand's 200,000 customers and expanded its traditional product line of multimedia graphics software to illustration and design graphics software. CEO James Von Ehr became a Macromedia vice-president until 1997 when he left to start another venture. FreeHand 5.0 sold for $595 in 1995. This version featured a more customizable and expanded workspace, multiple views, stronger design and editing tools, a report generator, spell check, paragraph styles, multicolor gradient fills up to 64 colors, speed improvements, and it accepted Illustrator plugins. In September 1995, a 5.5 upgrade added Photoshop plug-in support, PDF import capabilities, the Extract feature, inline graphics to text, improved auto-expanding text containers, the Crop feature, and the Create PICT Image feature. A FreeHand 5.5 upgrade was part of the FreeHand Graphics Studio (a suite that included Fontographer, Macromedia xRes image editing application, and Extreme 3D animation and modeling application). FreeHand 6.0 in 1996. This version only existed in beta. Some Freehand 7 prerelease versions were released under the Freehand 6 tag. FreeHand 7.0 sold for $399 in 1996, or $449 as part of the FreeHand Graphics Studio (see above.) Features included a redesigned user interface that allowed recombining Inspectors, Panel Tabs, Dockable Panels, Smart Cursors,

    Read more →
  • Semantic folding

    Semantic folding

    Semantic folding theory describes a procedure for encoding the semantics of natural language text in a semantically grounded binary representation. This approach provides a framework for modelling how language data is processed by the neocortex. == Theory == Semantic folding theory draws inspiration from Douglas R. Hofstadter's Analogy as the Core of Cognition which suggests that the brain makes sense of the world by identifying and applying analogies. The theory hypothesises that semantic data must therefore be introduced to the neocortex in such a form as to allow the application of a similarity measure and offers, as a solution, the sparse binary vector employing a two-dimensional topographic semantic space as a distributional reference frame. The theory builds on the computational theory of the human cortex known as hierarchical temporal memory (HTM), and positions itself as a complementary theory for the representation of language semantics. A particular strength claimed by this approach is that the resulting binary representation enables complex semantic operations to be performed simply and efficiently at the most basic computational level. == Two-dimensional semantic space == Analogous to the structure of the neocortex, Semantic Folding theory posits the implementation of a semantic space as a two-dimensional grid. This grid is populated by context-vectors in such a way as to place similar context-vectors closer to each other, for instance, by using competitive learning principles. This vector space model is presented in the theory as an equivalence to the well known word space model described in the information retrieval literature. Given a semantic space (implemented as described above) a word-vector can be obtained for any given word Y by employing the following algorithm: For each position X in the semantic map (where X represents cartesian coordinates) if the word Y is contained in the context-vector at position X then add 1 to the corresponding position in the word-vector for Y else add 0 to the corresponding position in the word-vector for Y The result of this process will be a word-vector containing all the contexts in which the word Y appears and will therefore be representative of the semantics of that word in the semantic space. It can be seen that the resulting word-vector is also in a sparse distributed representation (SDR) format [Schütze, 1993] & [Sahlgreen, 2006]. Some properties of word-SDRs that are of particular interest with respect to computational semantics are: high noise resistance: As a result of similar contexts being placed closer together in the underlying map, word-SDRs are highly tolerant of false or shifted "bits". boolean logic: It is possible to manipulate word-SDRs in a meaningful way using boolean (OR, AND, exclusive-OR) and/or arithmetical (SUBtract) functions . sub-sampling: Word-SDRs can be sub-sampled to a high degree without any appreciable loss of semantic information. topological two-dimensional representation: The SDR representation maintains the topological distribution of the underlying map therefore words with similar meanings will have similar word-vectors. This suggests that a variety of measures can be applied to the calculation of semantic similarity, from a simple overlap of vector elements, to a range of distance measures such as: Euclidean distance, Hamming distance, Jaccard distance, cosine similarity, Levenshtein distance, Sørensen-Dice index, etc. == Semantic spaces == Semantic spaces in the natural language domain aim to create representations of natural language that are capable of capturing meaning. The original motivation for semantic spaces stems from two core challenges of natural language: Vocabulary mismatch (the fact that the same meaning can be expressed in many ways) and ambiguity of natural language (the fact that the same term can have several meanings). The application of semantic spaces in natural language processing (NLP) aims at overcoming limitations of rule-based or model-based approaches operating on the keyword level. The main drawback with these approaches is their brittleness, and the large manual effort required to create either rule-based NLP systems or training corpora for model learning. Rule-based and machine learning-based models are fixed on the keyword level and break down if the vocabulary differs from that defined in the rules or from the training material used for the statistical models. Research in semantic spaces dates back more than 20 years. In 1996, two papers were published that raised a lot of attention around the general idea of creating semantic spaces: latent semantic analysis from Microsoft and Hyperspace Analogue to Language from the University of California. However, their adoption was limited by the large computational effort required to construct and use those semantic spaces. A breakthrough with regard to the accuracy of modelling associative relations between words (e.g. "spider-web", "lighter-cigarette", as opposed to synonymous relations such as "whale-dolphin", "astronaut-driver") was achieved by explicit semantic analysis (ESA) in 2007. ESA was a novel (non-machine learning) based approach that represented words in the form of vectors with 100,000 dimensions (where each dimension represents an Article in Wikipedia). However practical applications of the approach are limited due to the large number of required dimensions in the vectors. More recently, advances in neural networking techniques in combination with other new approaches (tensors) led to a host of new recent developments: Word2vec from Google and GloVe from Stanford University. Semantic folding represents a novel, biologically inspired approach to semantic spaces where each word is represented as a sparse binary vector with 16,000 dimensions (a semantic fingerprint) in a 2D semantic map (the semantic universe). Sparse binary representation are advantageous in terms of computational efficiency, and allow for the storage of very large numbers of possible patterns. == Visualization == The topological distribution over a two-dimensional grid (outlined above) lends itself to a bitmap type visualization of the semantics of any word or text, where each active semantic feature can be displayed as e.g. a pixel. As can be seen in the images shown here, this representation allows for a direct visual comparison of the semantics of two (or more) linguistic items. Image 1 clearly demonstrates that the two disparate terms "dog" and "car" have, as expected, very obviously different semantics. Image 2 shows that only one of the meaning contexts of "jaguar", that of "Jaguar" the car, overlaps with the meaning of Porsche (indicating partial similarity). Other meaning contexts of "jaguar" e.g. "jaguar" the animal clearly have different non-overlapping contexts. The visualization of semantic similarity using Semantic Folding bears a strong resemblance to the fMRI images produced in a research study conducted by A.G. Huth et al., where it is claimed that words are grouped in the brain by meaning. voxels, little volume segments of the brain, were found to follow a pattern were semantic information is represented along the boundary of the visual cortex with visual and linguistic categories represented on posterior and anterior side respectively.

    Read more →
  • Iterative reconstruction

    Iterative reconstruction

    Iterative reconstruction refers to iterative algorithms used to reconstruct 2D and 3D images in certain imaging techniques. For example, in computed tomography an image must be reconstructed from projections of an object. Here, iterative reconstruction techniques are usually a better, but computationally more expensive alternative to the common filtered back projection (FBP) method, which directly calculates the image in a single reconstruction step. In recent research works, scientists have shown that extremely fast computations and massive parallelism is possible for iterative reconstruction, which makes iterative reconstruction practical for commercialization. == Basic concepts == The reconstruction of an image from the acquired data is an inverse problem. Often, it is not possible to exactly solve the inverse problem directly. In this case, a direct algorithm has to approximate the solution, which might cause visible reconstruction artifacts in the image. Iterative algorithms approach the correct solution using multiple iteration steps, which allows to obtain a better reconstruction at the cost of a higher computation time. There are a large variety of algorithms, but each starts with an assumed image, computes projections from the image, compares the original projection data and updates the image based upon the difference between the calculated and the actual projections. === Algebraic reconstruction === The Algebraic Reconstruction Technique (ART) was the first iterative reconstruction technique used for computed tomography by Hounsfield. === Iterative Sparse Asymptotic Minimum Variance === The iterative sparse asymptotic minimum variance algorithm is an iterative, parameter-free superresolution tomographic reconstruction method inspired by compressed sensing, with applications in synthetic-aperture radar, computed tomography scan, and magnetic resonance imaging (MRI). === Statistical reconstruction === There are typically five components to statistical iterative image reconstruction algorithms, e.g. An object model that expresses the unknown continuous-space function f ( r ) {\displaystyle f(r)} that is to be reconstructed in terms of a finite series with unknown coefficients that must be estimated from the data. A system model that relates the unknown object to the "ideal" measurements that would be recorded in the absence of measurement noise. Often this is a linear model of the form A x + ϵ {\displaystyle \mathbf {A} x+\epsilon } , where ϵ {\displaystyle \epsilon } represents the noise. A statistical model that describes how the noisy measurements vary around their ideal values. Often Gaussian noise or Poisson statistics are assumed. Because Poisson statistics are closer to reality, it is more widely used. A cost function that is to be minimized to estimate the image coefficient vector. Often this cost function includes some form of regularization. Sometimes the regularization is based on Markov random fields. An algorithm, usually iterative, for minimizing the cost function, including some initial estimate of the image and some stopping criterion for terminating the iterations. === Learned Iterative Reconstruction === In learned iterative reconstruction, the updating algorithm is learned from training data using techniques from machine learning such as convolutional neural networks, while still incorporating the image formation model. This typically gives faster and higher quality reconstructions and has been applied to CT and MRI reconstruction. == Advantages == The advantages of the iterative approach include improved insensitivity to noise and capability of reconstructing an optimal image in the case of incomplete data. The method has been applied in emission tomography modalities like SPECT and PET, where there is significant attenuation along ray paths and noise statistics are relatively poor. Statistical, likelihood-based approaches: Statistical, likelihood-based iterative expectation-maximization algorithms are now the preferred method of reconstruction. Such algorithms compute estimates of the likely distribution of annihilation events that led to the measured data, based on statistical principle, often providing better noise profiles and resistance to the streak artifacts common with FBP. Since the density of radioactive tracer is a function in a function space, therefore of extremely high-dimensions, methods which regularize the maximum-likelihood solution turning it towards penalized or maximum a-posteriori methods can have significant advantages for low counts. Examples such as Ulf Grenander's Sieve estimator or Bayes penalty methods, or via I.J. Good's roughness method may yield superior performance to expectation-maximization-based methods which involve a Poisson likelihood function only. As another example, it is considered superior when one does not have a large set of projections available, when the projections are not distributed uniformly in angle, or when the projections are sparse or missing at certain orientations. These scenarios may occur in intraoperative CT, in cardiac CT, or when metal artifacts require the exclusion of some portions of the projection data. In Magnetic Resonance Imaging it can be used to reconstruct images from data acquired with multiple receive coils and with sampling patterns different from the conventional Cartesian grid and allows the use of improved regularization techniques (e.g. total variation) or an extended modeling of physical processes to improve the reconstruction. For example, with iterative algorithms it is possible to reconstruct images from data acquired in a very short time as required for real-time MRI (rt-MRI). In Cryo Electron Tomography, where the limited number of projections are acquired due to the hardware limitations and to avoid the biological specimen damage, it can be used along with compressive sensing techniques or regularization functions (e.g. Huber function) to improve the reconstruction for better interpretation. Here is an example that illustrates the benefits of iterative image reconstruction for cardiac MRI.

    Read more →
  • Haskins Laboratories

    Haskins Laboratories

    Haskins Laboratories, Inc. is an independent research laboratory, founded in 1935 and located in New Haven, Connecticut since 1970. Many current Haskins researchers are affiliated with Yale University's Child Study Center and/or the University of Connecticut. Haskins is a multidisciplinary and international community of researchers who conduct basic research on spoken and written language and global literacy. A guiding perspective of their research has been to view speech and language as emerging from biological processes, including those of adaptation, response to stimuli, and conspecific interaction. Haskins Laboratories has a long history of technological and theoretical innovation, from creating systems of rules for speech synthesis and development of an early working prototype of a reading machine for the blind to developing the landmark concept of phonemic awareness as the critical preparation for learning to read an alphabetic writing system. == Research tools and facilities == Haskins Laboratories is equipped, in-house, with a comprehensive suite of tools and capabilities to advance its mission of research into language and literacy. As of 2014, these included: Anechoic chamber Electroencephalography BioSemi 264 electrode, 24 bit Active Two System EGI 128 electrode, Geodesic EEG System 300 Electromagnetic articulography (EMMA) Carstens AG501 NDI WAVE Eye tracking: HL is equipped with 3 SR Research eye-trackers. 2 Model Eyelink 1000 systems. 1 Model Eyelink 1000plus system. Magnetic resonance imaging: Haskins has access to MRI scanners through agreements with the University of Connecticut and the Yale School of Medicine. On-site, HL has a Linux computer cluster dedicated to analysis of MRI data. Motion capture: HL is equipped with a Vicon motion capture system with one Basler high-speed digital camera, six Vicon MX T-20 cameras and a Vicon MX Giganet for synching camera data and connecting cameras to the data capture computer. Near infrared spectroscopy: HL has a TechEn CW6 8x8 system (four emitters; eight detectors). Ultrasound sonogram == History == Many researchers have contributed to scientific breakthroughs at Haskins Laboratories since its founding. All of them are indebted to the pioneering work and leadership of Caryl Parker Haskins, Franklin S. Cooper, Alvin Liberman, Seymour Hutner and Luigi Provasoli. The history presented here focuses on the research program of the division of Haskins Laboratories that, since the 1940s, has been most well known for its work in the areas of speech, language, and reading. === 1930s === Caryl Haskins and Franklin S. Cooper established Haskins Laboratories in 1935. It was originally affiliated with Harvard University, MIT, and Union College in Schenectady, NY. Caryl Haskins conducted research in microbiology, radiation physics, and other fields in Cambridge, MA and Schenectady. In 1939 Haskins Laboratories moved its center to New York City. Seymour Hutner joined the staff to set up a research program in microbiology, genetics, and nutrition. The descendant of the division led by Hutner program eventually became a department of Pace University in New York. The two identically named organizations are no longer formally affiliated. === 1940s === The U. S. Office of Scientific Research and Development, under Vannevar Bush asked Haskins Laboratories to evaluate and develop technologies for assisting blinded World War II veterans. Experimental psychologist Alvin Liberman joined Haskins Laboratories to assist in developing a "sound alphabet" to represent the letters in a text for use in a reading machine for the blind. Luigi Provasoli joined Haskins Laboratories to set up a research program in marine biology. The program in marine biology moved to Yale University in 1970 and disbanded with Provasoli's retirement in 1978. === 1950s === Franklin S. Cooper invented the pattern playback, a machine that converts pictures of the acoustic patterns of speech back into sound. With this device, Alvin Liberman, Cooper, and Pierre Delattre (and later joined by Katherine Safford Harris, Leigh Lisker, Arthur Abramson, and others), discovered the acoustic cues for the perception of phonetic segments (consonants and vowels). Liberman and colleagues proposed a motor theory of speech perception to resolve the acoustic complexity: they hypothesized that we perceive speech by tapping into a biological specialization, a speech module, that contains knowledge of the acoustic consequences of articulation. Liberman, aided by Frances Ingemann and others, organized the results of the work on speech cues into a groundbreaking set of rules for speech synthesis by the Pattern Playback. === 1960s === Franklin S. Cooper and Katherine Safford Harris, working with Peter MacNeilage, were the first researchers in the U.S. to use electromyographic techniques, pioneered at the University of Tokyo, to study the neuromuscular organization of speech. Leigh Lisker and Arthur Abramson looked for simplification at the level of articulatory action in the voicing of certain contrasting consonants. They showed that many acoustic properties of voicing contrasts arise from variations in voice onset time, the relative phasing of the onset of vocal cord vibration and the end of a consonant. Their work has been widely replicated and elaborated, here and abroad, over the following decades. Donald Shankweiler and Michael Studdert-Kennedy used a dichotic listening technique (presenting different nonsense syllables simultaneously to opposite ears) to demonstrate the dissociation of phonetic (speech) and auditory (nonspeech) perception by finding that phonetic structure devoid of meaning is an integral part of language, typically processed in the left cerebral hemisphere. Liberman, Cooper, Shankweiler, and Studdert-Kennedy summarized and interpreted fifteen years of research in "Perception of the Speech Code", still among the most cited papers in the speech literature. It set the agenda for many years of research at Haskins and elsewhere by describing speech as a code in which speakers overlap (or coarticulate) segments to form syllables. Researchers at Haskins connected their first computer to a speech synthesizer designed by Haskins Laboratories' engineers. Ignatius Mattingly, with British collaborators, John N. Holmes and J.N. Shearme, adapted the Pattern playback rules to write the first computer program for synthesizing continuous speech from a phonetically spelled input. A further step toward a reading machine for the blind combined Mattingly's program with an automatic look-up procedure for converting alphabetic text into strings of phonetic symbols. === 1970s === In 1970, Haskins Laboratories moved to New Haven, Connecticut, and entered into affiliation agreements with Yale University and the University of Connecticut; Haskins remains fully independent of both Yale and UConn, administratively and financially. The lab's original location in New Haven, at 270 Crown Street (from 1970 to 2005), was leased from Yale University. Isabelle Liberman, Donald Shankweiler, and Alvin Liberman teamed up with Ignatius Mattingly to study the relationship between speech perception and reading, a topic implicit in Haskins Laboratories' research program since its inception. They developed the concept of phonemic awareness, the knowledge that would-be readers must be aware of the phonemic structure of their language in order to be able to read. Leonard Katz related the work to contemporary cognitive theory and provided expertise in experimental design and data analysis. Under the broad rubric of the "alphabetic principle", this is the core of the lab's present program of reading pedagogy. Patrick Nye joined Haskins Laboratories to lead a team working on the reading machine for the blind. The project culminated when the addition of an optical character recognizer allowed investigators to assemble the first automatic text-to-speech reading machine. By the end of the decade this technology had advanced to the point where commercial concerns assumed the task of designing and manufacturing reading machines for the blind. In 1973, Franklin S. Cooper was selected to form a panel of six experts charged with investigating the famous 18-minute gap in the White House office tapes of President Richard Nixon related to the Watergate scandal. Building on earlier work, Philip Rubin developed the sinewave synthesis program, which was then used by Robert Remez, Rubin, and colleagues to show that listeners can perceive continuous speech without traditional speech cues from a pattern of sinewaves that track the changing resonances of the vocal tract. This paved the way for a view of speech as a dynamic pattern of trajectories through articulatory-acoustic space. Philip Rubin and colleagues developed Paul Mermelstein's anatomically simplified vocal tract model, originally worked on at Bell Laboratories, into the first articulatory synthesizer that can be controlled in a phy

    Read more →