AI Art Modifier

AI Art Modifier — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • AI safety

    AI safety

    AI safety is an interdisciplinary field focused on preventing accidents, misuse, or other harmful consequences arising from artificial intelligence systems. It encompasses AI alignment (which aims to ensure AI systems behave as intended), monitoring AI systems for risks, and enhancing their robustness. The field is particularly concerned with existential risks posed by advanced AI models. Beyond technical research, AI safety involves developing norms and policies that promote safety, including advocacy for regulations at different levels of government. The field gained significant popularity in 2023, with rapid progress in generative AI and public concerns voiced by researchers and CEOs about potential dangers. During the 2023 AI Safety Summit, the United States and the United Kingdom both established their own AI Safety Institute. However, researchers have expressed concern that AI safety measures are not keeping pace with the rapid development of AI capabilities. == Motivations == Scholars discuss current risks from critical systems failures, bias, and AI-enabled surveillance, as well as emerging risks like technological unemployment, digital manipulation, weaponization, AI-enabled cyberattacks and bioterrorism. They also discuss speculative risks from losing control of future artificial general intelligence (AGI) agents, or from AI enabling perpetually stable dictatorships. === Existential safety === Some have criticized concerns about AGI, such as Andrew Ng who compared them in 2015 to "worrying about overpopulation on Mars when we have not even set foot on the planet yet". Stuart J. Russell on the other side urges caution, arguing that "it is better to anticipate human ingenuity than to underestimate it". AI researchers have widely differing opinions about the severity and primary sources of risk posed by AI technology – though surveys suggest that experts take high consequence risks seriously. In two surveys of AI researchers, the median respondent was optimistic about AI overall, but placed a 5% probability on an "extremely bad (e.g. human extinction)" outcome of advanced AI. In a 2022 survey of the natural language processing community, 37% agreed or weakly agreed that it is plausible that AI decisions could lead to a catastrophe that is "at least as bad as an all-out nuclear war". == History == Risks from AI began to be seriously discussed at the start of the computer age: Moreover, if we move in the direction of making machines which learn and whose behavior is modified by experience, we must face the fact that every degree of independence we give the machine is a degree of possible defiance of our wishes. In 1988 Blay Whitby published a book outlining the need for AI to be developed along ethical and socially responsible lines. From 2008 to 2009, the Association for the Advancement of Artificial Intelligence (AAAI) commissioned a study to explore and address potential long-term societal influences of AI research and development. The panel was generally skeptical of the radical views expressed by science-fiction authors but agreed that "additional research would be valuable on methods for understanding and verifying the range of behaviors of complex computational systems to minimize unexpected outcomes". In 2011, Roman Yampolskiy introduced the term "AI safety engineering" at the Philosophy and Theory of Artificial Intelligence conference, listing prior failures of AI systems and arguing that "the frequency and seriousness of such events will steadily increase as AIs become more capable". In 2014, philosopher Nick Bostrom published the book Superintelligence: Paths, Dangers, Strategies. He has the opinion that the rise of AGI has the potential to create various societal issues, ranging from the displacement of the workforce by AI, manipulation of political and military structures, to even the possibility of human extinction. His argument that future advanced systems may pose a threat to human existence prompted Elon Musk, Bill Gates, and Stephen Hawking to voice similar concerns. In 2015, dozens of artificial intelligence experts signed an open letter on artificial intelligence calling for research on the societal impacts of AI and outlining concrete directions. To date, the letter has been signed by over 8000 people including Yann LeCun, Shane Legg, Yoshua Bengio, and Stuart Russell. In the same year, a group of academics led by professor Stuart J. Russell founded the Center for Human-Compatible AI at the University of California Berkeley and the Future of Life Institute awarded $6.5 million in grants for research aimed at "ensuring artificial intelligence (AI) remains safe, ethical and beneficial". In 2016, the White House Office of Science and Technology Policy and Carnegie Mellon University announced The Public Workshop on Safety and Control for Artificial Intelligence, which was one of a sequence of four White House workshops aimed at investigating "the advantages and drawbacks" of AI. In the same year, Concrete Problems in AI Safety – one of the first and most influential technical AI Safety agendas – was published. In 2017, the Future of Life Institute sponsored the Asilomar Conference on Beneficial AI, where more than 100 thought leaders formulated principles for beneficial AI including "Race Avoidance: Teams developing AI systems should actively cooperate to avoid corner-cutting on safety standards". In 2018, the DeepMind Safety team outlined AI safety problems in specification, robustness, and assurance. The following year, researchers organized a workshop at ICLR that focused on these problem areas. In 2021, Unsolved Problems in ML Safety was published, outlining research directions in robustness, monitoring, alignment, and systemic safety. In 2023, Rishi Sunak said he wants the United Kingdom to be the "geographical home of global AI safety regulation" and to host the first global summit on AI safety. The AI safety summit took place in November 2023, and focused on the risks of misuse and loss of control associated with frontier AI models. During the summit the intention to create the International Scientific Report on the Safety of Advanced AI was announced. In 2024, The US and UK forged a new partnership on the science of AI safety. The MoU was signed on 1 April 2024 by US commerce secretary Gina Raimondo and UK technology secretary Michelle Donelan to jointly develop advanced AI model testing, following commitments announced at an AI Safety Summit in Bletchley Park in November. In 2025, an international team of 96 experts chaired by Yoshua Bengio published the first International AI Safety Report. The report, commissioned by 30 nations and the United Nations, represents the first global scientific review of potential risks associated with advanced artificial intelligence. It details potential threats stemming from misuse, malfunction, and societal disruption, with the objective of informing policy through evidence-based findings, without providing specific recommendations. == Research focus == AI safety research areas include robustness, monitoring, and alignment. === Robustness === ==== Adversarial robustness ==== AI systems are often vulnerable to adversarial examples or "inputs to machine learning (ML) models that an attacker has intentionally designed to cause the model to make a mistake". For example, in 2013, Szegedy et al. discovered that adding specific imperceptible perturbations to an image could cause it to be misclassified with high confidence. This continues to be an issue with neural networks, though in recent work the perturbations are generally large enough to be perceptible. The image on the right is predicted to be an ostrich after the perturbation is applied. (Left) is a correctly predicted sample, (center) perturbation applied magnified by 10x, (right) adversarial example. Adversarial robustness is often associated with security. Researchers demonstrated that an audio signal could be imperceptibly modified so that speech-to-text systems transcribe it to any message the attacker chooses. Network intrusion and malware detection systems also must be adversarially robust since attackers may design their attacks to fool detectors. Models that represent objectives (reward models) must also be adversarially robust. For example, a reward model might estimate how helpful a text response is and a language model might be trained to maximize this score. Researchers have shown that if a language model is trained for long enough, it will leverage the vulnerabilities of the reward model to achieve a better score and perform worse on the intended task. This issue can be addressed by improving the adversarial robustness of the reward model. More generally, any AI system used to evaluate another AI system must be adversarially robust. This could include monitoring tools, since they could also potentially be tampered with to produce a higher reward. Large language models (LLMs) can be vulnerable to prom

    Read more →
  • Motor theory of speech perception

    Motor theory of speech perception

    The motor theory of speech perception is the hypothesis that people perceive spoken words by identifying the vocal tract gestures with which they are pronounced rather than by identifying the sound patterns that speech generates. It originally claimed that speech perception is done through a specialized module that is innate and human-specific. Though the idea of a module has been qualified in more recent versions of the theory, the idea remains that the role of the speech motor system is not only to produce speech articulations but also to detect them. The hypothesis has gained more interest outside the field of speech perception than inside. This has increased particularly since the discovery of mirror neurons that link the production and perception of motor movements, including those made by the vocal tract. The theory was initially proposed in the Haskins Laboratories in the 1950s by Alvin Liberman and Franklin S. Cooper, and developed further by Donald Shankweiler, Michael Studdert-Kennedy, Ignatius Mattingly, Carol Fowler and Douglas Whalen. == Origins and development == The hypothesis has its origins in research using pattern playback to create reading machines for the blind that would substitute sounds for orthographic letters. This led to a close examination of how spoken sounds correspond to the acoustic spectrogram of them as a sequence of auditory sounds. This found that successive consonants and vowels overlap in time with one another (a phenomenon known as coarticulation). This suggested that speech is not heard like an acoustic "alphabet" or "cipher," but as a "code" of overlapping speech gestures. === Associationist approach === Initially, the theory was associationist: infants mimic the speech they hear and that this leads to behavioristic associations between articulation and its sensory consequences. Later, this overt mimicry would be short-circuited and become speech perception. This aspect of the theory was dropped, however, with the discovery that prelinguistic infants could already detect most of the phonetic contrasts used to separate different speech sounds. === Cognitivist approach === The behavioristic approach was replaced by a cognitivist one in which there was a speech module. The module detected speech in terms of hidden distal objects rather than at the proximal or immediate level of their input. The evidence for this was the research finding that speech processing was special such as duplex perception. === Changing distal objects === Initially, speech perception was assumed to link to speech objects that were both the invariant movements of speech articulators the invariant motor commands sent to muscles to move the vocal tract articulators This was later revised to include the phonetic gestures rather than motor commands, and then the gestures intended by the speaker at a prevocal, linguistic level, rather than actual movements. === Modern revision === The "speech is special" claim has been dropped, as it was found that speech perception could occur for nonspeech sounds (for example, slamming doors for duplex perception). === Mirror neurons === The discovery of mirror neurons has led to renewed interest in the motor theory of speech perception, and the theory still has its advocates, although there are also critics. == Support == === Nonauditory gesture information === If speech is identified in terms of how it is physically made, then nonauditory information should be incorporated into speech percepts even if it is still subjectively heard as "sounds". This is, in fact, the case. The McGurk effect shows that seeing the production of a spoken syllable that differs from an auditory cue synchronized with it affects the perception of the auditory one. In other words, if someone hears "ba" but sees a video of someone pronouncing "ga", what they hear is different—some people believe they hear "da". People find it easier to hear speech in noise if they can see the speaker. People can hear syllables better when their production can be felt haptically. === Categorical perception === Using a speech synthesizer, speech sounds can be varied in place of articulation along a continuum from /bɑ/ to /dɑ/ to /ɡɑ/, or in voice onset time on a continuum from /dɑ/ to /tɑ/ (for example). When listeners are asked to discriminate between two different sounds, they perceive sounds as belonging to discrete categories, even though the sounds vary continuously. In other words, 10 sounds (with the sound on one extreme being /dɑ/ and the sound on the other extreme being /tɑ/, and the ones in the middle varying on a scale) may all be acoustically different from one another, but the listener will hear all of them as either /dɑ/ or /tɑ/. Likewise, the English consonant /d/ may vary in its acoustic details across different phonetic contexts (the /d/ in /du/ does not technically sound the same as the one in /di/, for example), but all /d/'s as perceived by a listener fall within one category (voiced alveolar plosive) and that is because "linguistic representations are abstract, canonical, phonetic segments or the gestures that underlie these segments." This suggests that humans identify speech using categorical perception, and thus that a specialized module, such as that proposed by the motor theory of speech perception, may be on the right track. === Speech imitation === If people can hear the gestures in speech, then the imitation of speech should be very fast, as in when words are repeated that are heard in headphones as in speech shadowing. People can repeat heard syllables more quickly than they would be able to produce them normally. === Speech production === Hearing speech activates vocal tract muscles, and the motor cortex and premotor cortex. The integration of auditory and visual input in speech perception also involves such areas. Disrupting the premotor cortex disrupts the perception of speech units such as plosives. The activation of the motor areas occurs in terms of the phonemic features which link with the vocal track articulators that create speech gestures. The perception of a speech sound is aided by pre-emptively stimulating the motor representation of the articulators responsible for its pronunciation . Auditory and motor cortical coupling is restricted to a specific range of neuronal firing frequency. === Perception-action meshing === Evidence exists that perception and production are generally coupled in the motor system. This is supported by the existence of mirror neurons that are activated both by seeing (or hearing) an action and when that action is carried out. Another source of evidence is that for common coding theory between the representations used for perception and action. == Criticisms == The motor theory of speech perception is not widely held in the field of speech perception, though it is more popular in other fields, such as theoretical linguistics. As three of its advocates have noted, "it has few proponents within the field of speech perception, and many authors cite it primarily to offer critical commentary".p. 361 Several critiques of it exist. === Multiple sources === Speech perception is affected by nonproduction sources of information, such as context. Individual words are hard to understand in isolation but easy when heard in sentence context. It therefore seems that speech perception uses multiple sources that are integrated together in an optimal way. === Production === The motor theory of speech perception would predict that speech motor abilities in infants predict their speech perception abilities, but in actuality it is the other way around. It would also predict that defects in speech production would impair speech perception, but they do not. However, this only affects the first and already superseded behaviorist version of the theory, where infants were supposed to learn all production-perception patterns by imitation early in childhood. This is no longer the mainstream view of motor-speech theorists. === Speech module === Several sources of evidence for a specialized speech module have failed to be supported. Duplex perception can be observed with door slams. The McGurk effect can also be achieved with nonlinguistic stimuli, such as showing someone a video of a basketball bouncing but playing the sound of a ping-pong ball bouncing. As for categorical perception, listeners can be sensitive to acoustic differences within single phonetic categories. As a result, this part of the theory has been dropped by some researchers. === Sublexical tasks === The evidence provided for the motor theory of speech perception is limited to tasks such as syllable discrimination that use speech units not full spoken words or spoken sentences. As a result, "speech perception is sometimes interpreted as referring to the perception of speech at the sublexical level. However, the ultimate goal of these studies is presumably to understand the neural processes supporting the ability to process spee

    Read more →
  • Minimum resolvable contrast

    Minimum resolvable contrast

    Minimum resolvable contrast (MRC) is a subjective measure of a visible spectrum sensor’s or camera's sensitivity and ability to resolve data. A snapshot image of a series of three bar targets of selected spatial frequencies and various contrast coatings captured by the unit under test (UUT) is used to determine the MRC of the UUT, i.e., the visible spectrum camera or sensor. A trained observer selects the smallest target resolvable at each contrast level. Typically, specialized computer software collects the inputted data of the observer and provides a graph of contrast vs. spatial frequency at a given luminance level. A first order polynomial is fitted to the data and an MRC curve of spatial frequency versus contrast is generated.

    Read more →
  • Template matching

    Template matching

    Template matching is a technique in digital image processing for finding small parts of an image which match a template image. It can be used for quality control in manufacturing, navigation of mobile robots, or edge detection in images. The main challenges in a template matching task are detection of occlusion, when a sought-after object is partly hidden in an image; detection of non-rigid transformations, when an object is distorted or imaged from different angles; sensitivity to illumination and background changes; background clutter; and scale changes. == Feature-based approach == The feature-based approach to template matching relies on the extraction of image features, such as shapes, textures, and colors, that match the target image or frame. This approach is usually achieved using neural networks and deep-learning classifiers such as VGG, AlexNet, and ResNet.Convolutional neural networks (CNNs), which many modern classifiers are based on, process an image by passing it through different hidden layers, producing a vector at each layer with classification information about the image. These vectors are extracted from the network and used as the features of the image. Feature extraction using deep neural networks, like CNNs, has proven extremely effective has become the standard in state-of-the-art template matching algorithms. This feature-based approach is often more robust than the template-based approach described below. As such, it has become the state-of-the-art method for template matching, as it can match templates with non-rigid and out-of-plane transformations, as well as high background clutter and illumination changes. == Template-based approach == For templates without strong features, or for when the bulk of a template image constitutes the matching image as a whole, a template-based approach may be effective. Since template-based matching may require sampling of a large number of data points, it is often desirable to reduce the number of sampling points by reducing the resolution of search and template images by the same factor before performing the operation on the resultant downsized images. This pre-processing method creates a multi-scale, or pyramid, representation of images, providing a reduced search window of data points within a search image so that the template does not have to be compared with every viable data point. Pyramid representations are a method of dimensionality reduction, a common aim of machine learning on data sets that suffer the curse of dimensionality. == Common challenges == In instances where the template may not provide a direct match, it may be useful to implement eigenspaces to create templates that detail the matching object under a number of different conditions, such as varying perspectives, illuminations, color contrasts, or object poses. For example, if an algorithm is looking for a face, its template eigenspaces may consist of images (i.e., templates) of faces in different positions to the camera, in different lighting conditions, or with different expressions (i.e., poses). It is also possible for a matching image to be obscured or occluded by an object. In these cases, it is unreasonable to provide a multitude of templates to cover each possible occlusion. For example, the search object may be a playing card, and in some of the search images, the card is obscured by the fingers of someone holding the card, or by another card on top of it, or by some other object in front of the camera. In cases where the object is malleable or poseable, motion becomes an additional problem, and problems involving both motion and occlusion become ambiguous. In these cases, one possible solution is to divide the template image into multiple sub-images and perform matching on each subdivision. == Deformable templates in computational anatomy == Template matching is a central tool in computational anatomy (CA). In this field, a deformable template model is used to model the space of human anatomies and their orbits under the group of diffeomorphisms, functions which smoothly deform an object. Template matching arises as an approach to finding the unknown diffeomorphism that acts on a template image to match the target image. Template matching algorithms in CA have come to be called large deformation diffeomorphic metric mappings (LDDMMs). Currently, there are LDDMM template matching algorithms for matching anatomical landmark points, curves, surfaces, volumes. == Template-based matching explained using cross correlation or sum of absolute differences == A basic method of template matching sometimes called "Linear Spatial Filtering" uses an image patch (i.e., the "template image" or "filter mask") tailored to a specific feature of search images to detect. This technique can be easily performed on grey images or edge images, where the additional variable of color is either not present or not relevant. Cross correlation techniques compare the similarities of the search and template images. Their outputs should be highest at places where the image structure matches the template structure, i.e., where large search image values get multiplied by large template image values. This method is normally implemented by first picking out a part of a search image to use as a template. Let S ( x , y ) {\displaystyle S(x,y)} represent the value of a search image pixel, where ( x , y ) {\displaystyle (x,y)} represents the coordinates of the pixel in the search image. For simplicity, assume pixel values are scalar, as in a greyscale image. Similarly, let T ( x t , y t ) {\textstyle T(x_{t},y_{t})} represent the value of a template pixel, where ( x t , y t ) {\textstyle (x_{t},y_{t})} represents the coordinates of the pixel in the template image. To apply the filter, simply move the center (or origin) of the template image over each point in the search image and calculate the sum of products, similar to a dot product, between the pixel values in the search and template images over the whole area spanned by the template. More formally, if ( 0 , 0 ) {\displaystyle (0,0)} is the center (or origin) of the template image, then the cross correlation T ⋆ S {\displaystyle T\star S} at each point ( x , y ) {\displaystyle (x,y)} in the search image can be computed as: ( T ⋆ S ) ( x , y ) = ∑ ( x t , y t ) ∈ T T ( x t , y t ) ⋅ S ( x t + x , y t + y ) {\displaystyle (T\star S)(x,y)=\sum _{(x_{t},y_{t})\in T}T(x_{t},y_{t})\cdot S(x_{t}+x,y_{t}+y)} For convenience, T {\displaystyle T} denotes both the pixel values of the template image as well as its domain, the bounds of the template. Note that all possible positions of the template with respect to the search image are considered. Since cross correlation values are greatest when the values of the search and template pixels align, the best matching position ( x m , y m ) {\displaystyle (x_{m},y_{m})} corresponds to the maximum value of T ⋆ S {\displaystyle T\star S} over S {\displaystyle S} . Another way to handle translation problems on images using template matching is to compare the intensities of the pixels, using the sum of absolute differences (SAD) measure. To formulate this, let I S ( x s , y s ) {\displaystyle I_{S}(x_{s},y_{s})} and I T ( x t , y t ) {\displaystyle I_{T}(x_{t},y_{t})} denote the light intensity of pixels in the search and template images with coordinates ( x s , y s ) {\displaystyle (x_{s},y_{s})} and ( x t , y t ) {\displaystyle (x_{t},y_{t})} , respectively. Then by moving the center (or origin) of the template to a point ( x , y ) {\displaystyle (x,y)} in the search image, as before, the sum of absolute differences between the template and search pixel intensities at that point is: S A D ( x , y ) = ∑ ( x t , y t ) ∈ T | I T ( x t , y t ) − I S ( x t + x , y t + y ) | {\displaystyle SAD(x,y)=\sum _{(x_{t},y_{t})\in T}\left\vert I_{T}(x_{t},y_{t})-I_{S}(x_{t}+x,y_{t}+y)\right\vert } With this measure, the lowest SAD gives the best position for the template, rather than the greatest as with cross correlation. SAD tends to be relatively simple to implement and understand, but it also tends to be relatively slow to execute. A simple C++ implementation of SAD template matching is given below. == Implementation == In this simple implementation, it is assumed that the above described method is applied on grey images: This is why Grey is used as pixel intensity. The final position in this implementation gives the top left location for where the template image best matches the search image. One way to perform template matching on color images is to decompose the pixels into their color components and measure the quality of match between the color template and search image using the sum of the SAD computed for each color separately. == Speeding up the process == In the past, this type of spatial filtering was normally only used in dedicated hardware solutions because of the computational complexity of the operation, however we can lessen this complexity b

    Read more →
  • Neuro-symbolic AI

    Neuro-symbolic AI

    Neuro-symbolic AI is a subfield of artificial intelligence that integrates neural methods (e.g., neural networks and deep learning) with symbolic methods (e.g., formal logic, knowledge representation, and automated reasoning). The goal is to combine the strengths of both approaches, resulting in AI systems that can be trained from raw data and demonstrate robustness against outliers or errors in the base data, while preserving explainability, explicit use of expert knowledge, and explicit cognitive reasoning. As argued by Leslie Valiant and others, the effective construction of rich computational cognitive models demands the combination of symbolic reasoning and efficient machine learning. Gary Marcus argued, "We cannot construct rich cognitive models in an adequate, automated way without the triumvirate of hybrid architecture, rich prior knowledge, and sophisticated techniques for reasoning." Further, "To build a robust, knowledge-driven approach to AI we must have the machinery of symbol manipulation in our toolkit. Too much of useful knowledge is abstract to make do without tools that represent and manipulate abstraction, and to date, the only known machinery that can manipulate such abstract knowledge reliably is the apparatus of symbol manipulation." Angelo Dalli, Henry Kautz, Francesca Rossi, and Bart Selman also argued for such a synthesis. Their arguments attempt to address the two kinds of thinking, as discussed in Daniel Kahneman's book Thinking, Fast and Slow. It describes cognition as encompassing two components: System 1 is fast, reflexive, intuitive, and unconscious. System 2 is slower, step-by-step, and explicit. System 1 is used for pattern recognition. System 2 handles planning, deduction, and deliberative thinking. In this view, deep learning best handles the first kind of cognition, while symbolic reasoning best handles the second kind. Both are necessary for the development of a robust and reliable AI system capable of learning, reasoning, and interacting with humans to accept advice and answer questions. Since the 1990s, dual-process models with explicit references to the two contrasting systems have been the focus of research in both the fields of AI and cognitive science by numerous researchers. In 2025, the adoption of neurosymbolic AI, an approach that integrates neural networks with symbolic reasoning, increased in response to the need to address hallucination issues in large language models. For example, Amazon implemented Neurosymbolic AI in its Vulcan warehouse robots and Rufus shopping assistant to enhance accuracy and decision-making. == Approaches == Approaches for integration are diverse. Henry Kautz's taxonomy of neuro-symbolic architectures follows, along with some examples: Symbolic Neural symbolic is the current approach of many neural models in natural language processing, where words or subword tokens are the ultimate input and output of large language models. Examples include BERT, RoBERTa, and GPT-3. Symbolic[Neural] is exemplified by AlphaGo, where symbolic techniques are used to invoke neural techniques. In this case, the symbolic approach is Monte Carlo tree search and the neural techniques learn how to evaluate game positions. Neural | Symbolic uses a neural architecture to interpret perceptual data as symbols and relationships that are reasoned about symbolically. Neural-Concept Learner is an example. Neural: Symbolic → Neural relies on symbolic reasoning to generate or label training data that is subsequently learned by a deep learning model, e.g., to train a neural model for symbolic computation by using a Macsyma-like symbolic mathematics system to create or label examples. NeuralSymbolic uses a neural net that is generated from symbolic rules. An example is the Neural Theorem Prover, which constructs a neural network from an AND-OR proof tree generated from knowledge base rules and terms. Logic Tensor Networks also fall into this category. Neural[Symbolic] according to Kautz, this approach embeds true symbolic reasoning inside a neural network. These are tightly-coupled neural-symbolic systems, in which the logical inference rules are internal to the neural network. This way, the neural network internally computes the inference from the premises and learns to reason based on logical inference systems. Early work on connectionist modal and temporal logics by Garcez, Lamb, and Gabbay is aligned with this approach. These categories are not exhaustive, as they do not consider multi-agent systems. In 2005, Bader and Hitzler presented a more fine-grained categorization that took into account, e.g., whether the use of symbols included logic and, if so, whether the logic was propositional or first-order logic. The 2005 categorization and Kautz's taxonomy above are compared and contrasted in a 2021 article. Sepp Hochreiter argued that Graph Neural Networks "...are the predominant models of neural-symbolic computing" since "[t]hey describe the properties of molecules, simulate social networks, or predict future states in physical and engineering applications with particle-particle interactions." == Artificial general intelligence == Gary Marcus argues that "...hybrid architectures that combine learning and symbol manipulation are necessary for robust intelligence, but not sufficient", and that there are ...four cognitive prerequisites for building robust artificial intelligence: hybrid architectures that combine large-scale learning with the representational and computational powers of symbol manipulation, large-scale knowledge bases—likely leveraging innate frameworks—that incorporate symbolic knowledge along with other forms of knowledge, reasoning mechanisms capable of leveraging those knowledge bases in tractable ways, and rich cognitive models that work together with those mechanisms and knowledge bases. This echoes earlier calls for hybrid models as early as the 1990s. == History == Garcez and Lamb described research in this area as ongoing, at least since the 1990s. During that period, the terms symbolic and sub-symbolic AI were popular. A series of workshops on neuro-symbolic AI has been held annually since 2005 Neuro-Symbolic Artificial Intelligence. In the early 1990s, an initial set of workshops on this topic were organized. == Research == Key research questions remain, such as: What is the best way to integrate neural and symbolic architectures? How should symbolic structures be represented within neural networks and extracted from them? How should common-sense knowledge be learned and reasoned about? How can abstract knowledge that is hard to encode logically be handled? == Implementations == Implementations of neuro-symbolic approaches include: AllegroGraph: an integrated Knowledge Graph based platform for neuro-symbolic application development. Scallop: a language based on Datalog that supports differentiable logical and relational reasoning. Scallop can be integrated in Python and with a PyTorch learning module. Logic Tensor Networks: encode logical formulas as neural networks and simultaneously learn term encodings, term weights, and formula weights. DeepProbLog: combines neural networks with the probabilistic reasoning of ProbLog. Abductive Learning: integrates machine learning and logical reasoning in a balanced-loop via abductive reasoning, enabling them to work together in a mutually beneficial way. SymbolicAI: a compositional differentiable programming library.

    Read more →
  • Uniphore

    Uniphore

    Uniphore is an American software company that develops artificial intelligence platforms for business use. The company is headquartered in Palo Alto, California, with offices in the United States, United Kingdom, Spain, Israel, United Arab Emirates, and India. Uniphore is known for its "Business AI Cloud," an enterprise AI platform that combines data, knowledge, models, and software agents for use in sales, marketing, and service. The company has also acquired firms in video emotion AI, AI agents, low-code automation, knowledge automation, voice and screen capture, customer data platforms, and data engineering. == History == Uniphore Software Systems was founded by Umesh Sachdev and Ravi Saraogi in 2008 and was incubated at IIT Madras. The company received an initial grant of $100,000 from the National Research Development Corporation. Early work focused on speech technologies for emerging markets. Uniphore partnered with companies that specialized in English and European languages, and adapting the technology for Indian languages and dialects. In 2014, Uniphore released its first flagship products, auMina, along with two other products, Akeira and amVoice. Uniphore raised series A funding, led by Kris Gopalakrishnan (cofounder of Infosys), in April 2015. The next month, Uniphore received additional investment from IDG Ventures. With input from its investors, Uniphore changed its business model from license fee-based income to a software as a service-based subscription fee model in 2015. By June 2016, it had added more than 70 global languages and expanded its services to Southeast Asia, the Middle East, and the United States. The company opened operations in Singapore in October 2016. The company raised Series B funding in October 2017, led by John Chambers and existing investors. Series C funding of $51 million was announced in August 2019 and led by March Capital. Uniphore acquired an exclusive third-party license for robotic process automation technology from NTT DATA in October 2020. In January 2021, Uniphore acquired Emotion Research Lab, a startup based in Spain that uses artificial intelligence and machine learning to analyze video and interpret emotions. The company received $140 million in Series D funding, led by Sorenson Capital Partners, in March 2021, bringing total funding to $210 million. In January 2021, Uniphore acquired Emotion Research Lab. In July 2021, it agreed to acquire Jacada, a provider of low-code/no-code automation; the transaction closed in October 2021. On February 16, 2022, Uniphore announced a $400 million Series E financing led by NEA, which valued the company at $2.5 billion. Hilarie Koplow-McAdams, an NEA venture partner and former Salesforce/New Relic executive, joined Uniphore's board in 2022. Uniphore's board has also included former Cisco CEO John Chambers, former Convergys CEO Andrea J. Ayers, and CrowdStrike CFO Burt Podbere (appointed January 2021). In February 2023, Uniphore acquired UK-based Red Box, a platform for capturing voice and screen recordings used in regulated and large-scale environments. It also acquired France-based Hexagone, a behavioral analytics firm combining computer vision and natural-language techniques. On December 5, 2024, Uniphore announced agreements to acquire ActionIQ, a customer data platform (CDP) vendor, and Infoworks, an enterprise data engineering platform. Uniphore launched the Business AI Cloud on June 9, 2025. The Business AI Cloud consists of a single, unified platform that includes data, knowledge, AI models, and AI agents. Uniphore announced in August 2025 that it had acquired Orby AI and intended to acquire Autonom8 to extend multi-agent and workflow automation capabilities. As of September 2025, Uniphore's customers included the United States Coast Guard, Singapore Police Force, London Underground, DirecTV, JPMorgan Chase, LG, DHL, UPS, Vodafone, Verizon, NTT Data, and as of May 2021, Firstsource. In October 2025, Uniphore raised $260 million in a Series F round at a reported valuation of $2.5 billion. Investors included March Capital, NEA, Nvidia, AMD, Snowflake, and Databricks. In January 2026, KPMG and Uniphore announced a collaboration focused on deploying AI agents powered by specialized small language models. The announcement was made at the World Economic Forum held in Davos. Cognizant and Uniphore announced a partnership in February 2026 to develop industry-specific AI tools for regulated sectors, which would initially focus on life sciences and finance. Uniphore and Rackspace also announced a partnership in March 2026. This partnership was announced in order to create an "Infrastructure-to-Agents" architecture, focusing on Business AI as a private cloud service. == Products == As of 2025, Uniphore's core offering is the Business AI Cloud and Business AI Suite of agentic AI applications. === Business AI Cloud === Uniphore’s Business AI Cloud is a full-stack platform that organizes enterprise data and knowledge for agentic AI applications. The platform enables deployment across clouds and existing data sources. Key layers and capabilities include the following. Agentic layer: Includes prebuilt agents, a natural-language agent builder, and orchestration based on Business Process Model and Notation (BPMN) to run AI workflows across business units. Model layer: Supports an open, interoperable mix of closed and open-source large language models (LLMs). Models can be orchestrated, governed, and replaced as needed. Knowledge layer: Organizes raw data into structured knowledge used for retrieval, explainability, and fine-tuning of small language models (SLMs). Data layer: Connects to data across multiple platforms and clouds through a zero-copy, composable fabric, enabling in-place preparation and supporting data residency and sovereignty requirements. === Business AI Suite === The Uniphore Business AI Suite has various prebuilt AI agents that can be used in customer service, sales, marketing, and human resources. The Uniphore Business AI Suite includes several LOBs (Lines of Business) for business functions with intelligent agents that are prebuilt, but composable. Built on the Uniphore Business AI Cloud, each application combines agentic automation and fine-tuned models. Marketing AI, Customer Service AI, Sales AI, and People AI (for human resources) are included. Competitors include Palantir, Microsoft Azure, Amazon Bedrock, Google's Vertex AI, Databricks, and Snowflake. == Recognition == Deloitte Technology Fast 50 India identified Uniphore as the 17th fastest-growing technology company in India in 2012 and one of the top 500 fastest growing companies in the Asia-Pacific region in 2014. In 2016, Time included Sachdev on its list of "10 millennials who are changing the world" for “building a phone that can understand almost any language”. NASSCOM named Uniphore to its "League of 10" emerging Indian technology companies in 2017. In 2020, the San Francisco Business Times ranked Uniphore as No. 7 among small companies in its list of the best places to work in the San Francisco Bay Area. In 2022, the company was featured on the Forbes AI 50 list. Uniphore was mentioned in the Deloitte Technology Fast 500 list in 2023, 2024, and 2025. In 2025, Inc. included Uniphore in its Best in Business program.

    Read more →
  • Pixel binning

    Pixel binning

    Pixel binning, also known as binning, is a process image sensors of digital cameras use to combine adjacent pixels throughout an image, by summing or averaging their values, during or after readout. It improves low-light performance while still allowing for highly detailed photographs in good light. Charge from adjacent pixels in CCD or charge-coupled device image sensors and some other image sensors can be combined during readout, increasing the line rate or frame rate. In the context of image processing, binning is the procedure of combining clusters of adjacent pixels, throughout an image, into single pixels. For example, in 2×2 binning, an array of 4 pixels becomes a single larger pixel, reducing the number of pixels to 1/4 and halving the image resolution in each dimension. The result can be the sum, average, median, minimum, or maximum value of the cluster. Some systems use more advanced algorithms such as considering the values of nearby pixels, edge detection, self-claimed "AI", etc. to increase the perceived visual quality of the final downsized image. This aggregation, although associated with loss of information, reduces the amount of data to be processed, facilitating analysis. The binned image has lower resolution, but the relative noise level in each pixel is generally reduced. == History == Normally, an increase in megapixel count on a constant image sensor size would lead to a sacrifice of the surface size of the individual pixels, which would result in each pixel being able to catch less light in the same time, thus leading to a darker and/or noisier image in low light (given the same exposure time). In the past, camera manufacturers had to compromise between low-light performance and the amount of detail in good light, by dropping the megapixel count like HTC did in 2013 with their four-megapixel "UltraPixel" camera. However, this results in less detailed images in daylight where enough light is available. With pixel binning, the camera has "the best of both worlds", meaning both the benefit of high detail in good light and the benefit of high brightness in low light. In low light, the surfaces of four or more pixels can act as one large pixel that catches far more light. For example, some smartphones such as the Samsung Galaxy A15 are able to capture photographs with up to fifty megapixels in daylight. However, in low light, the individual pixels would be too small to capture the light needed for a bright image with the short exposure time available for handheld shooting. Therefore, with pixel binning activated, the 50-megapixel image sensor acts as a 12.5-megapixel image sensor, a quarter of its original resolution, with an accordingly larger surface area per pixel.

    Read more →
  • Phase stretch transform

    Phase stretch transform

    Phase stretch transform (PST) is a computational approach to signal and image processing. One of its utilities is for feature detection and classification. PST is related to time stretch dispersive Fourier transform. It transforms the image by emulating propagation through a diffractive medium with engineered 3D dispersive property (refractive index). The operation relies on symmetry of the dispersion profile and can be understood in terms of dispersive eigenfunctions or stretch modes. PST performs similar functionality as phase-contrast microscopy, but on digital images. PST can be applied to digital images and temporal (time series) data. It is a physics-based feature engineering algorithm. == Operation principle == Here the principle is described in the context of feature enhancement in digital images. The image is first filtered with a spatial kernel followed by application of a nonlinear frequency-dependent phase. The output of the transform is the phase in the spatial domain. The main step is the 2-D phase function which is typically applied in the frequency domain. The amount of phase applied to the image is frequency dependent, with higher amount of phase applied to higher frequency features of the image. Since sharp transitions, such as edges and corners, contain higher frequencies, PST emphasizes the edge information. Features can be further enhanced by applying thresholding and morphological operations. PST is a pure phase operation whereas conventional edge detection algorithms operate on amplitude. == Physical and mathematical foundations of phase stretch transform == Photonic time stretch technique can be understood by considering the propagation of an optical pulse through a dispersive fiber. By disregarding the loss and non-linearity in fiber, the non-linear Schrödinger equation governing the optical pulse propagation in fiber upon integration reduces to: E o ( z , t ) = 1 2 π ∫ − ∞ ∞ E ~ i ( 0 , ω ) ⋅ e − i β 2 z ω 2 2 ⋅ e i ω t d ω {\displaystyle E_{o}(z,t)={\frac {1}{2\pi }}\int _{-\infty }^{\infty }{\tilde {E}}_{i}(0,\omega )\cdot e^{\frac {-i\beta _{2}z\omega ^{2}}{2}}\cdot e^{i\omega {t}}\,d\omega } (1) where β 2 {\displaystyle \beta _{2}} = GVD parameter, z is propagation distance, E o ( z , t ) {\displaystyle E_{o}(z,t)} is the reshaped output pulse at distance z and time t. The response of this dispersive element in the time-stretch system can be approximated as a phase propagator as presented in H ( ω ) = e i φ ( ω ) = e i ∑ m = 0 ∞ φ m ( ω ) = ∏ m = 0 ∞ H m ( ω ) {\displaystyle H(\omega )=e^{i\varphi (\omega )}=e^{i\sum _{m=0}^{\infty }\varphi _{m}(\omega )}=\prod _{m=0}^{\infty }H_{m}(\omega )} (2) Therefore, Eq. 1 can be written as following for a pulse that propagates through the time-stretch system and is reshaped into a temporal signal with a complex envelope given by E o ( t ) = 1 2 π ∫ − ∞ ∞ E ~ i ( ω ) ⋅ H ( ω ) ⋅ e i ω t d ω {\displaystyle E_{o}(t)={\frac {1}{2\pi }}\int _{-\infty }^{\infty }{\tilde {E}}_{i}(\omega )\cdot H(\omega )\cdot e^{i\omega t}\,d\omega } (3) The time stretch operation is formulated as generalized phase and amplitude operations, S { E i ( t ) } = ∫ − ∞ + ∞ F { E i ( t ) } ⋅ e i φ ( ω ) ⋅ L ~ ( ω ) ⋅ e i ω t d ω {\displaystyle \mathbb {S} \{E_{i}(t)\}=\int _{-\infty }^{+\infty }{\mathcal {F}}\{E_{i}(t)\}\cdot e^{i\varphi (\omega )}\cdot {\tilde {L}}(\omega )\cdot e^{i\omega {t}}d\omega } (4) where e i φ ( ω ) {\displaystyle e^{i\varphi (\omega )}} is the phase filter and L ~ ( ω ) {\displaystyle {\tilde {L}}(\omega )} is the amplitude filter. Next the operator is converted to discrete domain, S { E i [ n ] } = 1 N ∑ u = 0 N − 1 F F T { E i ( n ) } ⋅ K ~ ( u ) ⋅ L ~ ( u ) ⋅ e i 2 π N u n {\displaystyle \mathbb {S} \{E_{i}[n]\}={\frac {1}{N}}\sum _{u=0}^{N-1}FFT\{E_{i}(n)\}\cdot {\tilde {K}}(u)\cdot {\tilde {L}}(u)\cdot e^{i{\frac {2\pi }{N}}un}} (5) where u {\displaystyle u} is the discrete frequency, K ~ ( u ) {\displaystyle {\tilde {K}}(u)} is the phase filter, L ~ ( u ) {\displaystyle {\tilde {L}}(u)} is the amplitude filter and FFT is fast Fourier transform. The stretch operator S { } {\displaystyle \mathbb {S} \{\}} for a digital image is then S { E i [ n , m ] } = 1 M N ∑ v = 0 N − 1 ∑ u = 0 M − 1 F F T 2 { E i ( n , m ) } ⋅ K ~ ( u , v ) ⋅ L ~ ( u , v ) ⋅ e i 2 π M u m ⋅ e i 2 π N v n {\displaystyle \mathbb {S} \{E_{i}[n,m]\}={\frac {1}{MN}}\sum _{v=0}^{N-1}\sum _{u=0}^{M-1}FFT^{2}\{E_{i}(n,m)\}\cdot {\tilde {K}}(u,v)\cdot {\tilde {L}}(u,v)\cdot e^{i{\frac {2\pi }{M}}um}\cdot e^{i{\frac {2\pi }{N}}vn}} (6) In the above equations, E i [ n , m ] {\displaystyle E_{i}[n,m]} is the input image, n {\displaystyle n} and m {\displaystyle m} are the spatial variables, F F T 2 {\displaystyle FFT^{2}} is the two-dimensional fast Fourier transform, and u {\displaystyle u} and v {\displaystyle v} are spatial frequency variables. The function K ~ ( u , v ) {\displaystyle {\tilde {K}}(u,v)} is the warped phase kernel and the function L ~ ( u , v ) {\displaystyle {\tilde {L}}(u,v)} is a localization kernel implemented in frequency domain. PST operator is defined as the phase of the Warped Stretch Transform output as follows P S T { E i [ n , m ] } ≜ ∡ { S { E i [ x , y ] } } {\displaystyle PST\{E_{i}[n,m]\}\triangleq \measuredangle \{\mathbb {S} \{E_{i}[x,y]\}\}} (7) where ∡ { } {\displaystyle \measuredangle \{\}} is the angle operator. == PST kernel implementation == The warped phase kernel K ~ ( u , v ) {\displaystyle {\tilde {K}}(u,v)} can be described by a nonlinear frequency dependent phase K ~ ( u , v ) = e i φ ( u , v ) {\displaystyle {\tilde {K}}(u,v)=e^{i\varphi (u,v)}} While arbitrary phase kernels can be considered for PST operation, here we study the phase kernels for which the kernel phase derivative is a linear or sublinear function with respect to frequency variables. A simple example for such phase derivative profiles is the inverse tangent function. Consider the phase profile in the polar coordinate system φ ( u , v ) = φ polar ( r , θ ) = φ polar ( r ) {\displaystyle \varphi (u,v)=\varphi _{\text{polar}}(r,\theta )=\varphi _{\text{polar}}(r)} From d φ ( r ) d r = tan − 1 ⁡ ( r ) {\displaystyle {\frac {d\varphi (r)}{dr}}=\tan ^{-1}(r)} we have φ ( r ) = r tan − 1 ⁡ ( r ) − 1 2 log ⁡ ( r 2 + 1 ) {\displaystyle \varphi (r)=r\tan ^{-1}(r)-{\frac {1}{2}}\log(r^{2}+1)} Therefore, the PST kernel is implemented as φ ( r ) = S ⋅ ( W r ) ⋅ tan − 1 ⁡ ( W r ) − 1 2 log ⁡ ( 1 + ( W r ) 2 ) ( W r max ) ⋅ tan − 1 ⁡ ( W r max ) − 1 2 log ⁡ ( 1 + ( W r max ) 2 ) {\displaystyle \varphi (r)=S\cdot {\frac {(Wr)\cdot \tan ^{-1}(Wr)-{\frac {1}{2}}\log(1+(Wr)^{2})}{(Wr_{\max })\cdot \tan ^{-1}(Wr_{\max })-{\frac {1}{2}}\log(1+(Wr_{\max })^{2})}}} where S {\displaystyle S} and W {\displaystyle W} are real-valued numbers related to the strength and warp of the phase profile == Applications == PST has been used for edge detection in biological and biomedical images as well as synthetic-aperture radar (SAR) image processing, as well as detail and feature enhancement for digital images. PST has also been applied to improve the point spread function for single molecule imaging in order to achieve super-resolution. The transform exhibits intrinsic superior properties compared to conventional edge detectors for feature detection in low contrast visually impaired images. The PST function can also be performed on 1-D temporal waveforms in the analog domain to reveal transitions and anomalies in real time. == Open source code release == On February 9, 2016, a UCLA Engineering research group has made public the computer code for PST algorithm that helps computers process images at high speeds and "see" them in ways that human eyes cannot. The researchers say the code could eventually be used in face, fingerprint, and iris recognition systems for high-tech security, as well as in self-driving cars' navigation systems or for inspecting industrial products. The Matlab implementation for PST can also be downloaded from Matlab Files Exchange. However, it is provided for research purposes only, and a license must be obtained for any commercial applications. The software is protected under a US patent. The code was then significantly refactored and improved to support GPU acceleration. In May 2022, it became one algorithm in PhyCV: the first physics-inspired computer vision library.

    Read more →
  • Israeli cybersecurity industry

    Israeli cybersecurity industry

    The Israeli cybersecurity industry is a rapidly growing sector within Israel's technology and innovation ecosystem. Israel is internationally recognized as a powerhouse in the cybersecurity domain, with numerous cybersecurity startups, established companies, research institutions, and government initiatives. Tel Aviv itself is being ranked 7th in annual list of best global tech ecosystems, as reported by the Jerusalem Post. == History == The roots of Israel's cybersecurity industry can be traced back to the country's strong focus on national security and intelligence. The establishment of elite military units such as Unit 8200, the Israeli Intelligence Corps unit responsible for signals intelligence and code decryption, played a significant role in the development of cybersecurity expertise in the country. Many former members of Unit 8200 have gone on to establish successful cybersecurity companies or join existing organizations, bringing their unique skill sets and experience to the private sector. == Market overview == As of 2024, Israel housed more than 450 cybersecurity startups and companies. In 2023, the value of exits by Israeli tech companies reached $7.5 billion. Israel's cybersecurity industry is characterized by a high concentration of startups develop new technologies in areas such as network security, endpoint protection, data security, cloud security, and threat intelligence. In recent years, the sector has attracted significant investment from both local and international venture capital firms, as well as major technology companies such as Microsoft, Google, and IBM. Several Israeli cybersecurity companies have gained global recognition and success, with some being acquired by major corporations or conducting successful initial public offerings (IPOs). === Key Israeli cybersecurity companies === Some key Israeli cybersecurity companies include: Check Point Software Technologies CyberArk Cato Networks Radware Wiz === Financial activity === Israel’s cybersecurity sector has seen significant financial activity. As of 2023, mergers and acquisitions in the cybersecurity sector totaled $2.8 billion. In the first quarter of 2024, the sector secured $846 million in private funding. == Background == The military experience helped much. Israel's mandatory military service, combined with the expertise developed within elite units such as Unit 8200, has fostered a strong talent pool with practical experience in cybersecurity. Israel's thriving startup ecosystem, often referred to as the "Startup Nation," has fostered an environment of innovation and collaboration that has contributed to the growth of the cybersecurity industry. Israeli cybersecurity companies often collaborate with international partners, both in the private and public sectors, to share knowledge and develop joint solutions. === Government Initiatives and Support === The government also supported well through various initiatives, such as the Israel National Cyber Directorate (INCD), which works to strengthen cybersecurity defenses and promote the development of the sector. === Academic institutions === Israeli universities and research centers are involved in cybersecurity research and education, contributing to the development of new technologies and training the next generation of cybersecurity professionals. Academic Tech transfer offices in Israel also facilitate the commercialization of cybersecurity technologies. Some academic institutions with cybersecurity laboratories include: Tel Aviv University Technion Ben-Gurion University

    Read more →
  • PANGU (software)

    PANGU (software)

    The PANGU (Planet and Asteroid Natural scene Generation Utility) is a computer graphics utility of which the development was funded by ESA and performed by University of Dundee. It generates scenes of planets, moons, asteroids, spacecraft and rovers. The main purpose of the tool is to test and validate navigation techniques based on the processing of images coming from on-board sensors, such as a camera or imaging LIDAR on a planetary lander.

    Read more →
  • Verbal overshadowing

    Verbal overshadowing

    Verbal overshadowing is a phenomenon where giving a verbal description of sensory input impairs formation of memories of that input. This was first reported by Schooler and Engstler-Schooler (1990) where it was shown that the effects can be observed across multiple domains of cognition which are known to rely on non-verbal knowledge and perceptual expertise. One example of this is memory, which has been known to be influenced by language. Seminal work by Carmichael and collaborators (1932) demonstrated that when verbal labels are connected to non-verbal forms during an individual's encoding process, it could potentially bias the way those forms are reproduced. Because of this, memory performance relying on reportable aspects of memory that encode visual forms should be vulnerable to the effects of verbalization. == Initial findings == Schooler and Engstler-Schooler (1990) were the first to report findings of verbal overshadowing. In their study, participants watched a video of a simulated robbery and were instructed to either verbally describe the robber or engage in a control task. Those who engaged in giving a verbal description were less likely to correctly identify the robber from a test lineup, compared to those who engaged in the control task. A larger effect was detected when the verbal description was provided 20, rather than 5, minutes after the video, and immediately before the test lineup. A meta-analysis by Meissner and Brigham (2001) supported the effects of verbal overshadowing, showing a small but reliably negative effect. == General effects of verbal overshadowing == The effects of verbal overshadowing have been generalized across multiple domains of cognition that are known to rely on non-verbal knowledge and perceptual expertise, such as memory. Memory has been known to be influenced by language. Seminal work by Carmichael and collaborators (1932) demonstrated that labels attached to, or associated with, non-verbal forms during memory encoding can affect the way the forms were subsequently reproduced. Because of this, memory performance that relies on reportable aspects of memory that encode visual forms should be vulnerable to the effects of verbalization. Pelizzon, Brandimonte, and Luccio (2002) found that visual memory representations appear to incorporate visual, spatial, and temporal characteristics. It is explained as follows: With the temporal code (where the only information available is the sequence of the stimuli), performance levels remain high, unless participants are required to retrieve the stimuli in a different order from that used at encoding (visual cue). In this case, performance is significantly impaired, even in the presence of a visual cue. The study showed that order information acts as a link between the two separate representations of figure and background, hence preventing verbal overshadowing at encoding (temporal component) or attenuating its influence at retrieval (spatial component).(p. 960) Hatano, Ueno, Kitagami, and Kawaguchi found that verbal overshadowing is likely to occur when participants verbally described targets in detail. Detailed verbal descriptions resulted in more frequently inaccurate descriptions that in turn created inaccurate representations in the memories of participants. Inaccuracies are also likely to occur when face recognition comes immediately after verbalization. Other forms of non-verbal knowledge affected by verbal overshadowing include the following: [Verbal overshadowing] has also been observed when participants attempt to generate descriptions of other 'difficult-to-describe' stimuli such as colors (Schooler and Engstler-Schooler, 1990) or abstract figures (Brandimonte et al., 1997), or other non-visual tasks such as wine tasting (Melcher and Schooler, 1996), decision making (Wilson and Schooler, 1991), and insight problem-solving. (p. 871) (Schooler et al., 1993) Verbalization of stimuli leads to the disruption of non-reportable processes that are necessary for achieving insight solutions, which are distinct from language processes. Schooler, Ohlsson, and Brooks (1993) found that face recognition requires information that cannot be adequately verbalized, giving rise to difficulty in describing factors in recognition judgments. Subjects were less effective in solving insight problems when compelled to put their thoughts in words, which suggests that language may interfere with thought. The verbal overshadowing effect was not seen when participants engaged in articulatory suppression. Performance was reduced in both the verbal and non-verbal description conditions. This is evidence that verbal encoding plays a role in face recognition. By testing with distracting faces presented between study and test, Lloyd-Jones and Brown (2008) suggested a dual-process approach to recognition memory took place, that verbalization influenced familiarity-based processes at first, but its effects were later seen on recollection, when discrimination between items became more difficult. == Verbal overshadowing in facial recognition == The verbal overshadowing effect can be found for facial recognition because faces are predominately processed in a holistic or configurable manner. (Tanaka & Farah, 1993; Tanaka & Sengco, 1997) Verbalizing one's memory for a face is done using a featural or analytic strategy, leading to a drift from the configurable information about the face and to impaired recognition performance. However, Fallshore & Schooler (1995) found that the verbal overshadowing effect was not found when participants described faces of races different from their own. A study by Brown and Lloyd-Jones (2003) found that there was no verbal overshadowing effect found in car descriptions; it was only seen in facial descriptions. The authors noted that descriptions were no different on any measure including accuracy. It is suggested that less expertise in verbalizing faces rather than cars invokes a stronger shift in verbal and featural processing. This supports the concept of a transfer inappropriate retrieval framework and addresses some limitations of the effect. Wickham and Swift (2006) suggested that the verbal overshadowing effect is not seen in describing all faces, and one aspect that determines this is distinctiveness. Results showed that typical faces produce verbal overshadowing, while distinctive faces did not. In studies of eyewitness reports, variation in response criteria given by participants influenced the quality of the descriptions generated and accuracy on identification task, known as the retrieval-based effect. Face recognition was also impaired when subjects described a familiar face, such as a parent, or when describing a previously seen but novel face. Dodson, Johnson, and Schooler (1997) found that recognition was also impaired when participants were provided with a description of a previously seen face, and they were able to ignore provided versus self-generated descriptions more easily. This finding of verbal overshadowing suggested that eyewitness recognition is not only affected by their own descriptions, but of descriptions heard from others, such other eyewitness testimonies. == Voice recognition == The verbal overshadowing effect has also been found to affect voice identification. Research shows that describing a non-verbal stimuli leads to a decrease in recognition accuracy. In an unpublished study by Schooler, Fiore, Melcher, and Ambadar (1996), participants listened to a tape-recorded voice, after which they were asked either to verbally describe it or to not do so, and then asked to distinguish the voice from 3 similar distractor voices. The results showed that verbal overshadowing impaired accuracy of recognition based on gut feeling, suggesting an overall verbal overshadowing for voice recognition. Due to the forensic relevance of voices heard over the telephone and harassing phone calls that are often a problem for police, Perfect, Hunt, and Harris (2002) examined the influence of three factors on accuracy and confidence in voice recognition from a line-up. They expected to find an effect, because voice represents a class of stimuli that is difficult to describe verbally. This meets Schooler et al.'s (1997) modality mismatch criterion, meaning that describing the speakers age, gender, or accent is difficult, making voice recognition susceptible to the verbal overshadowing phenomenon. It was found that the method of memory encoding had no impact on performance, and that hearing a telephone voice reduced confidence but did not affect accuracy. They also found that providing a verbal description impaired accuracy but had no effect on confidence. The data showed an effect of verbal overshadowing in voice recognition and provided yet another disassociation between confidence and performance. Although there was a difference in confidence level, witnesses were able to identify voices over the telephone as accurately as voices heard direc

    Read more →
  • Baby Bundle (app)

    Baby Bundle (app)

    Baby Bundle is a parenting mobile app for iPhone and iPad. It was designed to help new parents through pregnancy and the first two years of parenthood. Developed in collaboration with medical experts, it helps track and record the child's development and growth, offers parental advice, manages vaccinations and health check-ups, stores photos and provides baby monitoring services. == History == Baby Bundle was founded in the United Kingdom by brothers, Nick and Anthony von Christierson. Each worked in investment banking prior to developing Baby Bundle, Nick at Greenhill & Co., and Anthony at Goldman Sachs. The idea for the app came when a friend's wife voiced her frustration over having multiple parenting apps on her smartphone. Nick and Anthony left their jobs to create a single app that would include all those features. They conducted market research by interviewing more than 500 parents in the UK and US. It took them a year to build the app, which was named by their mother. Looking for endorsement, they first went to the US in 2013 and partnered with parenting expert and pediatrician Dr. Jennifer Trachtenberg. Baby Bundle was launched in the US and Canadian App Stores in April 2014. In the same month, it became the #1 parenting app in iTunes and was featured by Apple as the #1 Editor's pick across all categories. Mashable called it one of the "Top 5 Can’t Miss Apps." Baby Bundle raised $1.8m seed round in March 2015 to fund development. The money came from a range of angel investors from across the US, UK and Asia. The von Christierson brothers have signed a deal to co-brand the app in the Middle East and expect to launch in Europe and Africa. == Features == Baby Bundle is an app for both the iPhone or iPad and provides smart monitoring tools and trackers for pregnancy and child development. It acts as a growth and daily activity tracker and offers parental advice, manages vaccinations and health check-ups. It has a parenting guide with tips and advice on what to expect when the baby arrives. An interactive forum also lets parents ask questions from others in the community. The app is free and also include paid premium features like the ability to turn two iPhones running into a baby monitor, a cloud service to share the child's data with a spouse and the ability to store data on more than one baby.

    Read more →
  • GNU toolchain

    GNU toolchain

    The GNU toolchain is a broad collection of programming tools produced by the GNU Project. These tools form a toolchain (a suite of tools used in a serial manner) used for developing software applications and operating systems. The GNU toolchain plays a vital role in development of Linux, some BSD systems, and software for embedded systems. Parts of the GNU toolchain are also directly used with or ported to other platforms such as Solaris, macOS, Microsoft Windows (via Cygwin and MinGW/MSYS/WSL2), Sony PlayStation Portable (used by PSP modding scene) and Sony PlayStation 3. == Components == Projects in the GNU toolchain are: GNU Autotools (build system) – Software build toolset from GNU GNU Binutils – GNU software development tools for executable code GNU Bison – Yacc-compatible parser generator program GNU C Library – GNU implementation of the standard C libraryPages displaying short descriptions of redirect targets GNU Compiler Collection – Free and open-source compiler for various programming languages GNU Debugger – Source-level debugger GNU m4 – General-purpose macro processor GNU make – Software build automation tool

    Read more →
  • Adobe InDesign

    Adobe InDesign

    Adobe InDesign is a desktop publishing and page layout designing software application produced by Adobe and first released in 1999. It can be used to create works such as posters, flyers, brochures, magazines, newspapers, presentations, books and ebooks. InDesign can also publish content suitable for tablet devices in conjunction with Adobe Digital Publishing Suite. Graphic designers and production artists are the principal users. InDesign is the successor to PageMaker, which Adobe acquired by buying Aldus Corporation in late 1994. (Freehand, Aldus's competitor to Adobe Illustrator, was licensed from Altsys, the maker of Fontographer.) By 1998, PageMaker had lost much of the professional market to the comparatively feature-rich QuarkXPress version 3.3, released in 1992, and version 4.0, released in 1996. In 1999, Quark announced its offer to buy Adobe and to divest the combined company of PageMaker to avoid problems under United States antitrust law. Adobe declined Quark's offer and continued to develop a new desktop publishing application. Aldus had begun developing a successor to PageMaker, code-named "Shuksan". Later, Adobe code-named the project "K2", and Adobe released InDesign 1.0 in 1999. InDesign exports documents in Adobe's Portable Document Format (PDF) and supports multiple languages. It was the first DTP application to support Unicode character sets, advanced typography with OpenType fonts, advanced transparency features, layout styles, optical margin alignment, and cross-platform scripting with JavaScript. Later versions of the software introduced new file formats. To support the new features, especially typography, introduced with InDesign CS, the program and its document format are not backward-compatible. Instead, InDesign CS2 introduced the INX (.inx) format, an XML-based document representation, to allow backward compatibility with future versions. InDesign CS versions updated with the 3.1 April 2005 update can read InDesign CS2-saved files exported to the .inx format. The InDesign Interchange format does not support versions earlier than InDesign CS. With InDesign CS4, Adobe replaced INX with InDesign Markup Language (IDML), another XML-based document representation. InDesign was the first native Mac OS X publishing software. With the third major version, InDesign CS, Adobe increased InDesign's distribution by bundling it with Adobe Photoshop, Adobe Illustrator, and Adobe Acrobat in Adobe Creative Suite. Adobe developed InDesign CS3 (and Creative Suite 3) as universal binary software compatible with native Intel and PowerPC Macs in 2007, two years after the announced 2005 schedule, inconveniencing early adopters of Intel-based Macs. Adobe CEO Bruce Chizen said, "Adobe will be first with a complete line of universal applications." == File format == The MIME type is not official File Open formats: indd, indl, indt, indb, inx, idml, pmd, xqx New File formats: indd, indl, indb File Save As formats: indd, indt Save file format for InCopy: icma (Assignment file) icml (Content file, Exported file) icap (Package for InCopy) idap (Package for InDesign) File Export formats: pdf, idml, icml, eps, jpg, txt, XML, rtf == Versions == Newer versions can, as a rule, open files created by older versions, but the reverse is not true. Current versions can export the InDesign file as an IDML file (InDesign Markup Language), which can be opened by InDesign versions from CS4 upwards; older versions from CS4 down can export to an INX file (InDesign Interchange format). === Server version === In October 2005, Adobe released InDesign Server CS2, a modified version of InDesign (without a user interface) for Windows and Macintosh server platforms. It does not provide any editing client; rather, it is for use by developers in creating client-server solutions with the InDesign plug-in technology. In March 2007 Adobe officially announced Adobe InDesign CS3 Server as part of the Adobe InDesign family. == Features == Paragraph styles are an essential tool for designers when working with text in Adobe InDesign. Despite their menacing appearance, they are straightforward to operate. Other features that make InDesign a good tool for working with text and paragraphs include: Creating frames and shapes Aligning objects with grids and guides Manipulating objects Organizing objects Importing text Formatting text Spell checking Importing images Parent pages (formerly master pages) Paragraph styles == Internationalization and localization == InDesign Middle Eastern editions have unique settings for laying out Arabic or Hebrew text. They feature: Text settings: Special settings for laying out Arabic or Hebrew text, such as: Ability to use Arabic, Persian or Hindi digits; Use kashidas for letter spacing and full justification; Ligature option; Adjust the position of diacritics, such as vowels of the Arabic script; Justify text in three possible ways: Standard, Arabic, Naskh; Option to insert special characters, including Geresh, Gershayim, Maqaf for Hebrew and Kashida for Arabic texts; Apply standard, Arabic, or Hebrew styles for page, paragraph, and footnote numbering. Bi-directional text flow: Right-to-left behavior applies to several objects: Story, paragraph, character, and table. It allows mixing right-to-left and left-to-right words, paragraphs, and stories in a document. Changing the direction of neutral characters (e.g., / or ?) is possible according to the user's keyboard language. Table of contents: Provides a table of contents titles, one for each supported language. This table is sorted according to the chosen language. InDesign CS4 Middle Eastern versions allow users to select the language of the index title and cross-references. Indices: This allows the creation of a simple keyword index or a somewhat more detailed index of the information in the text using embedded indexing codes. Unlike more sophisticated programs, InDesign cannot insert character style information as part of an index entry (e.g., when indexing book, journal, or movie titles). Indices are limited to four levels (the top level and three sub-levels). Like tables of contents, indices can be sorted according to the selected language. Importing and exporting: Can import QuarkXPress files up to version 4.1 (1999), even using Arabic XT, Arabic Phonyx, or Hebrew XPressWay fonts, retaining the layout and content. Includes 50 import/export filters, including a Microsoft Word 97-98-2000 import filter and a plain text import filter. Exports IDML files can be read by QuarkXPress 2017. Reverse layout: Include a reverse layout feature to reverse the layout of a document when converting a left-to-right document to a right-to-left one or vice versa. Complex script rendering: InDesign supports Unicode character encoding, and Middle Eastern editions support complex text layouts for Arabic and Hebrew complex scripts. The underlying Arabic and Hebrew support is present in the Western editions of InDesign CS4, CS5, CS5.5, and CS6, but the user interface is not exposed, making it difficult to access.

    Read more →
  • Iterative reconstruction

    Iterative reconstruction

    Iterative reconstruction refers to iterative algorithms used to reconstruct 2D and 3D images in certain imaging techniques. For example, in computed tomography an image must be reconstructed from projections of an object. Here, iterative reconstruction techniques are usually a better, but computationally more expensive alternative to the common filtered back projection (FBP) method, which directly calculates the image in a single reconstruction step. In recent research works, scientists have shown that extremely fast computations and massive parallelism is possible for iterative reconstruction, which makes iterative reconstruction practical for commercialization. == Basic concepts == The reconstruction of an image from the acquired data is an inverse problem. Often, it is not possible to exactly solve the inverse problem directly. In this case, a direct algorithm has to approximate the solution, which might cause visible reconstruction artifacts in the image. Iterative algorithms approach the correct solution using multiple iteration steps, which allows to obtain a better reconstruction at the cost of a higher computation time. There are a large variety of algorithms, but each starts with an assumed image, computes projections from the image, compares the original projection data and updates the image based upon the difference between the calculated and the actual projections. === Algebraic reconstruction === The Algebraic Reconstruction Technique (ART) was the first iterative reconstruction technique used for computed tomography by Hounsfield. === Iterative Sparse Asymptotic Minimum Variance === The iterative sparse asymptotic minimum variance algorithm is an iterative, parameter-free superresolution tomographic reconstruction method inspired by compressed sensing, with applications in synthetic-aperture radar, computed tomography scan, and magnetic resonance imaging (MRI). === Statistical reconstruction === There are typically five components to statistical iterative image reconstruction algorithms, e.g. An object model that expresses the unknown continuous-space function f ( r ) {\displaystyle f(r)} that is to be reconstructed in terms of a finite series with unknown coefficients that must be estimated from the data. A system model that relates the unknown object to the "ideal" measurements that would be recorded in the absence of measurement noise. Often this is a linear model of the form A x + ϵ {\displaystyle \mathbf {A} x+\epsilon } , where ϵ {\displaystyle \epsilon } represents the noise. A statistical model that describes how the noisy measurements vary around their ideal values. Often Gaussian noise or Poisson statistics are assumed. Because Poisson statistics are closer to reality, it is more widely used. A cost function that is to be minimized to estimate the image coefficient vector. Often this cost function includes some form of regularization. Sometimes the regularization is based on Markov random fields. An algorithm, usually iterative, for minimizing the cost function, including some initial estimate of the image and some stopping criterion for terminating the iterations. === Learned Iterative Reconstruction === In learned iterative reconstruction, the updating algorithm is learned from training data using techniques from machine learning such as convolutional neural networks, while still incorporating the image formation model. This typically gives faster and higher quality reconstructions and has been applied to CT and MRI reconstruction. == Advantages == The advantages of the iterative approach include improved insensitivity to noise and capability of reconstructing an optimal image in the case of incomplete data. The method has been applied in emission tomography modalities like SPECT and PET, where there is significant attenuation along ray paths and noise statistics are relatively poor. Statistical, likelihood-based approaches: Statistical, likelihood-based iterative expectation-maximization algorithms are now the preferred method of reconstruction. Such algorithms compute estimates of the likely distribution of annihilation events that led to the measured data, based on statistical principle, often providing better noise profiles and resistance to the streak artifacts common with FBP. Since the density of radioactive tracer is a function in a function space, therefore of extremely high-dimensions, methods which regularize the maximum-likelihood solution turning it towards penalized or maximum a-posteriori methods can have significant advantages for low counts. Examples such as Ulf Grenander's Sieve estimator or Bayes penalty methods, or via I.J. Good's roughness method may yield superior performance to expectation-maximization-based methods which involve a Poisson likelihood function only. As another example, it is considered superior when one does not have a large set of projections available, when the projections are not distributed uniformly in angle, or when the projections are sparse or missing at certain orientations. These scenarios may occur in intraoperative CT, in cardiac CT, or when metal artifacts require the exclusion of some portions of the projection data. In Magnetic Resonance Imaging it can be used to reconstruct images from data acquired with multiple receive coils and with sampling patterns different from the conventional Cartesian grid and allows the use of improved regularization techniques (e.g. total variation) or an extended modeling of physical processes to improve the reconstruction. For example, with iterative algorithms it is possible to reconstruct images from data acquired in a very short time as required for real-time MRI (rt-MRI). In Cryo Electron Tomography, where the limited number of projections are acquired due to the hardware limitations and to avoid the biological specimen damage, it can be used along with compressive sensing techniques or regularization functions (e.g. Huber function) to improve the reconstruction for better interpretation. Here is an example that illustrates the benefits of iterative image reconstruction for cardiac MRI.

    Read more →