CarPlay

CarPlay

CarPlay is an Apple standard that enables a car radio or automotive head unit to be a display and controller for an iOS device. It is available on iPhone 5 and later models running iOS 7.1 or later. More than 800 car and motorcycle models support CarPlay, according to Apple. Vehicle owners can add support by installing certain aftermarket vehicle audio products. Most CarPlay systems connect to iOS through USB, some are wireless, and wireless support can be added through aftermarket dongles. CarPlay Ultra, a more integrated version of CarPlay, was first announced on Aston Martin DBX707 in May 2025. == Software == Apple's CarPlay-enabled apps include: Phone Apple Music Apple Maps Calendar Messages Audiobooks (part of Apple Books) Podcasts Settings News Developers must obtain permission from Apple to develop CarPlay-enabled apps. Such apps fall into five categories: Audio: primarily provide audio content, such as music or podcasts. Examples: Amazon Music, Audible, Google Play Music, iHeartRadio, QQ Music, Spotify, and Overcast. Navigation: turn-by-turn guidance, including searching for points of interests and navigating to a destination. Examples: AutoNavi, Baidu Maps, Google Maps, ChargeFinder and Waze. Automaker-made apps allow a user to control vehicle-specific features such as climate controls, gas levels, or radio via CarPlay. Messaging/Voice over IP (VoIP): listen to new messages and reply using dictation in an audio-only interface. Messaging apps on CarPlay integrate with third-party Siri support (known as SiriKit), while VoIP apps integrate with the iOS calling interface using CallKit. Examples: Telegram, WhatsApp, and Zoom. Food-ordering and parking-services apps. To discourage distracted driving, Siri is used extensively, providing voice turn-by-turn navigation guidance and voice-input for text messages. Newscast-style weather and stock results are announced instead of displayed. Requests that bring up visual information may be blocked when the car is in gear, and most native CarPlay apps deliver audio content with minimal interaction. CarPlay-enabled apps installed on the device appear on the CarPlay home screen unless disabled by the user. The inclusion or exclusion and order of app appearance can be changed on a per-vehicle basis. == Hardware == Most of the CarPlay software runs on the connected iPhone. The CarPlay interface provides audio output and a visual display to the vehicle's infotainment system, while adapting to the vehicle's available control methods, including touch screens, rotary dials, physical buttons, steering-wheel controls, and hands-free microphones. Aftermarket head units may support CarPlay or Android Auto, and many support both platforms. === Wired CarPlay === In a wired CarPlay configuration, the iPhone connects to the vehicle or head unit via a USB cable. The USB connection supplies power to the iPhone and provides a stable data link for audio, video, and control input. Wired CarPlay is supported by a wide range of factory-installed infotainment systems and aftermarket head units. Some third-party devices marketed as wireless CarPlay adapters operate by emulating a wired CarPlay connection to the vehicle. These devices plug into the vehicle's USB port and present themselves as a wired CarPlay interface, while separately establishing a wireless connection to the iPhone. Such devices still require the vehicle or head unit to support standard (wired) CarPlay. === Wireless CarPlay === Wireless CarPlay allows the iPhone to connect to a compatible vehicle or head unit without a physical cable. During the initial pairing process, the iPhone exchanges network credentials with the CarPlay receiver over Bluetooth. Once paired, CarPlay data is transmitted over a two-way Wi-Fi connection between the phone and the vehicle. Wireless CarPlay support depends on both the vehicle or head unit hardware and the iPhone model, and is generally limited to newer factory systems and select aftermarket receivers. == History == === Predecessor === In 2008, one year after the release of the iPhone, Mercedes vehicles were first to sell an audio system incorporating both the iPod and iPhone, equipped with 30-pin iOS input jacks. The new 2008 Harman Kardon NTG 2.5 featured full audio streaming, syncing, charging and control integrated into the steering wheel controls, instrument panel, and head unit. Apple was working with Mercedes to develop iOS compatible audio systems into their cars first only a year after iPhone launch. With an Apple Lightning-to-30-pin adapter, iPhones/iPods remain backwards-compatible with the Harman Kardon 2.5 and later models. This is the earliest audio system specifically engineered for iPod/iPhone integration, which predated CarPlay and every other manufacturer incorporating iOS into vehicles. The concept of CarPlay was based on the iOS 4 feature called "iPod Out" which was produced through several years of joint development by Apple and the BMW Group's Technology Office USA. iPod Out enabled vehicles with the necessary infrastructure to "host" the analog video and audio from a supporting iOS device while receiving inputs, such as button presses and knob rotations, from a car's infotainment system, to drive the "hosted" user interface in the vehicle's built-in display. It was announced at WWDC 2010 and first shipped in BMW Group vehicles in early 2011. The BMW and Mini option was called "PlugIn" and paved the way for the first cross-OEM platforms, introducing the concept of requiring a car-specific interface for apps (as opposed to MirrorLink's simple and insufficient mirroring of what was shown on the smartphone's screen). === Development === CarPlay's codename was Stark. Apple's Eddy Cue announced it as iOS in the Car at WWDC 2013. In January 2014, it was reported that Apple's hardware-oriented corporate culture had led to release delays. iOS in the Car was then rebranded and launched as CarPlay with significant design changes at the Geneva Motor Show in March 2014 with Ferrari, Kia, Mercedes-Benz, and Volvo among the first car manufacturers. At WWDC 2022, Apple announced plans to release an all-new version of CarPlay, informally dubbed CarPlay 2. The new version was said to be able to control vehicle functions, access vehicle stats, and take over multiple vehicle screens. Officials said they planned to release it in late 2024 and that manufacturers that are planning to adopt the new CarPlay include: Audi, Acura, Ford, Honda, Infiniti, Jaguar, Land Rover, Lincoln, Mercedes-Benz, Nissan, Polestar, Porsche, Renault, and Volvo. In January 2025, amidst delays, Apple removed the planned released date from its website. On May 15, 2025, Apple announced that next-generation CarPlay, now called CarPlay Ultra, would be included with all new vehicles from Aston Martin. Existing vehicles will also be receiving CarPlay Ultra through a future software update. It is only available in the US and Canada. == Timeline == June 2013: Apple introduced iOS in the Car; an early version of CarPlay that was never publicly released, at WWDC 2013. June 2013: BMW officials announced their cars would not support iOS in the Car; they later changed their minds. November 2013: Siri Eyes Free mode was offered as a dealer-installed accessory in the US to some Honda Accord and Acura RDX & ILX models. In December, Honda offered additional integration, featuring new HondaLink services, on some US and Canada models of the Civic and the Fit. March 2014: Apple introduced CarPlay, which was renamed from iOS in the Car with significant design changes, at the 2014 Geneva Motor Show with automakers Ferrari, Mercedes-Benz and Volvo. September 2014: A Ferrari FF was the first car with a full version of CarPlay. November 2014: Hyundai announced the Sonata sedan would be their first model with available CarPlay by the end of the first quarter of 2015. January 2015: Volkswagen announced CarPlay support would be coming later in 2015 and would be either standard or available on the majority of their 2016 model year lineup. May 2015: General Motors announced CarPlay would be available starting with 14 different 2016 model year Chevrolet vehicles. July 2015: Honda announced CarPlay would be available in their vehicles starting with the 2016 Honda Accord. December 2015: Volvo implemented CarPlay in the 2016 Volvo XC90 as their first vehicle with CarPlay support. December 2015: Mercedes-Benz confirmed that CarPlay would be available starting with select 2016 model year vehicles. January 2016: Apple released a list detailing the car models which support CarPlay. January 2016: Ford announced CarPlay would be available on all 2017 Ford/Lincoln model year vehicles equipped with the Sync 3 infotainment system. January 2016: FCA (now a part of Stellantis) announced CarPlay would be available on their UConnect infotainment system starting with select 2016 model year vehicles. March 2016: Subaru announced the beginning of CarPlay and Android Auto support, st

Artificial intelligence in hiring

Artificial intelligence can be used to automate aspects of the job recruitment process. Advances in artificial intelligence, such as the advent of machine learning and the growth of big data, enable AI to be utilized to recruit, screen, and predict the success of applicants. Proponents of artificial intelligence in hiring claim it reduces bias, assists with finding qualified candidates, and frees up human resource workers' time for other tasks, while opponents worry that AI perpetuates inequalities in the workplace and will eliminate jobs. Despite the potential benefits, the ethical implications of AI in hiring remain a subject of debate, with concerns about algorithmic transparency, accountability, and the need for ongoing oversight to ensure fair and unbiased decision-making throughout the recruitment process. == Background == It is common for companies to use AI to automate aspects of their hiring process, especially the hospitality, finance, and tech industries. == Uses == === Screeners === Screeners are tests that allow companies to sift through a large applicant pool and extract applicants that have desirable features. What factors are used to screen applicants is a concern to ethicists and civil rights activists. A screener that favors people who have similar characteristics to those already employed at a company may perpetuate inequalities. For example, if a company that is predominantly white and male uses its employees' data to train its screener it may accidentally create a screening process that favors white, male applicants. The automation of screeners also has the potential to reduce biases. Biases against applicants with African American sounding names have been shown in multiple studies. An AI screener has the potential to limit human bias and error in the hiring process, allowing more minority applicants to be successful. === Recruitment === Recruitment involves the identification of potential applicants and the marketing of positions. AI is commonly utilized in the recruitment process because it can help boost the number of qualified applicants for positions. Companies are able to use AI to target their marketing to applicants who are likely to be good fits for a position. This often involves the use of social media sites advertising tools, which rely on AI. Facebook allows advertisers to target ads based on demographics, location, interests, behavior, and connections. Facebook also allows companies to target a "look-a-like" audience, that is the company supplies Facebook with a data set, typically the company's current employees, and Facebook will target the ad to profiles that are similar to the profiles in the data set. Additionally, job sites like Indeed, Glassdoor, and ZipRecruiter target job listings to applicants that have certain characteristics employers are looking for. Targeted advertising has many advantages for companies trying to recruit such being a more efficient use of resources, reaching a desired audience, and boosting qualified applicants. This has helped make it a mainstay in modern hiring. Who receives a targeted ad can be controversial. In hiring, the implications of targeted ads have to do with who is able to find out about and then apply to a position. Most targeted ad algorithms are proprietary information. Some platforms, like Facebook and Google, allow users to see why they were shown a specific ad, but users who do not receive the ad likely never know of its existence and also have no way of knowing why they were not shown the ad. === Interviews === Chatbots were one of the first applications of AI and are commonly used in the hiring process. Interviewees interact with chatbots to answer interview questions, and an analysis of their responses can be generated by AI. HireVue has created technology that analyzes interviewees' responses and gestures during recorded video interviews. Over 12 million interviewees have been screened by the more than 700 companies that utilize the service. == Controversies == Artificial intelligence in hiring confers many benefits, but it also has some challenges that have concerned experts. AI is only as good as the data it is using. Biases can inadvertently be baked into the data used in AI. Often companies will use data from their employees to decide what people to recruit or hire. This can perpetuate bias and lead to more homogenous workforces. Facebook Ads was an example of a platform that created such controversy for allowing business owners to specify what type of employee they are looking for. For example, job advertisements for nursing and teach could be set such that only women of a specific age group would see the advertisements. Facebook Ads has since then removed this function from its platform, citing the potential problems with the function in perpetuating biases and stereotypes against minorities. The growing use of Artificial Intelligence-enabled hiring systems has become an important component of modern talent hiring, particularly through social networks such as LinkedIn and Facebook. However, data overflow embedded in the hiring systems, based on Natural Language Processing (NLP) methods, may result in unconscious gender bias. Utilizing data driven methods may mitigate some bias generated from these systems It can also be hard to quantify what makes a good employee. This poses a challenge for training AI to predict which employees will be best. Commonly used metrics like performance reviews can be subjective and have been shown to favor white employees over black employees and men over women. Another challenge is the limited amount of available data. Employers only collect certain details about candidates during the initial stages of the hiring process. This requires AI to make determinations about candidates with very limited information to go off of. Additionally, many employers do not hire employees frequently and so have limited firm specific data to go off. To combat this, many firms will use algorithms and data from other firms in their industry. AI's reliance on applicant and current employees personal data raises privacy issues. These issues effect both the applicants and current employees, but also may have implications for third parties who are linked through social media to applicants or current employees. For example, a sweep of someone's social media will also show their friends and people they have tagged in photos or posts. == AI and the future of hiring == Artificial intelligence along with other technological advances such as improvements in robotics have placed 47% of jobs at risk of being eliminated in the near future. In 2016 the founder of the World Economic Forum, Klaus Schwab, called AI and related technology the "Fourth Industrial Revolution". According to some scholars, however, the transformative impact of AI on labor has been overstated. The "no-real-change" theory holds that an IT revolution has already occurred, but that the benefits of implementing new technologies does not outweigh the costs associated with adopting them. This theory claims that the result of the IT revolution is thus much less impactful than had originally been forecasted. Other scholars refute this theory claiming that AI has already led to significant job loss for unskilled labor and that it will eliminate middle skill and high skill jobs in the future. This position is based around the idea that AI is not yet a technology of general use and that any potential 4th industrial revolution has not fully occurred. A third theory holds that the effect of AI and other technological advances is too complicated to yet be understood. This theory is centered around the idea that while AI will likely eliminate jobs in the short term it will also likely increase the demand for other jobs. The question then becomes will the new jobs be accessible to people and will they emerge near when jobs are eliminated. == AI use in hiring for candidates == Job seekers now commonly encounter AI-driven tools at multiple stages, including automated resume parsing, video interview analysis, chatbots for frequently asked questions, and real‑time application updates. Some candidates also employ AI career agents, designed to optimize job searches, tailor applications, and interface with hiring teams. A 2025 Australian study found that AI-driven video interviews exhibited transcription error rates of up to 22% for non‑native speakers and those with speech-related disabilities, raising concerns of discrimination. A 2017 study in the Journal of Sociology found persistent gender and racial disparities in AI screening tools, even when fairness interventions are applied. Industry observers describe a growing “AI arms race” in recruitment, where both employers and candidates increasingly rely on automated agents. Employers use recruiting systems to source and filter applicants, while candidates deploy AI agents to prepare and submit applications. == Regulations == The Artifici

Semantic folding

Semantic folding theory describes a procedure for encoding the semantics of natural language text in a semantically grounded binary representation. This approach provides a framework for modelling how language data is processed by the neocortex. == Theory == Semantic folding theory draws inspiration from Douglas R. Hofstadter's Analogy as the Core of Cognition which suggests that the brain makes sense of the world by identifying and applying analogies. The theory hypothesises that semantic data must therefore be introduced to the neocortex in such a form as to allow the application of a similarity measure and offers, as a solution, the sparse binary vector employing a two-dimensional topographic semantic space as a distributional reference frame. The theory builds on the computational theory of the human cortex known as hierarchical temporal memory (HTM), and positions itself as a complementary theory for the representation of language semantics. A particular strength claimed by this approach is that the resulting binary representation enables complex semantic operations to be performed simply and efficiently at the most basic computational level. == Two-dimensional semantic space == Analogous to the structure of the neocortex, Semantic Folding theory posits the implementation of a semantic space as a two-dimensional grid. This grid is populated by context-vectors in such a way as to place similar context-vectors closer to each other, for instance, by using competitive learning principles. This vector space model is presented in the theory as an equivalence to the well known word space model described in the information retrieval literature. Given a semantic space (implemented as described above) a word-vector can be obtained for any given word Y by employing the following algorithm: For each position X in the semantic map (where X represents cartesian coordinates) if the word Y is contained in the context-vector at position X then add 1 to the corresponding position in the word-vector for Y else add 0 to the corresponding position in the word-vector for Y The result of this process will be a word-vector containing all the contexts in which the word Y appears and will therefore be representative of the semantics of that word in the semantic space. It can be seen that the resulting word-vector is also in a sparse distributed representation (SDR) format [Schütze, 1993] & [Sahlgreen, 2006]. Some properties of word-SDRs that are of particular interest with respect to computational semantics are: high noise resistance: As a result of similar contexts being placed closer together in the underlying map, word-SDRs are highly tolerant of false or shifted "bits". boolean logic: It is possible to manipulate word-SDRs in a meaningful way using boolean (OR, AND, exclusive-OR) and/or arithmetical (SUBtract) functions . sub-sampling: Word-SDRs can be sub-sampled to a high degree without any appreciable loss of semantic information. topological two-dimensional representation: The SDR representation maintains the topological distribution of the underlying map therefore words with similar meanings will have similar word-vectors. This suggests that a variety of measures can be applied to the calculation of semantic similarity, from a simple overlap of vector elements, to a range of distance measures such as: Euclidean distance, Hamming distance, Jaccard distance, cosine similarity, Levenshtein distance, Sørensen-Dice index, etc. == Semantic spaces == Semantic spaces in the natural language domain aim to create representations of natural language that are capable of capturing meaning. The original motivation for semantic spaces stems from two core challenges of natural language: Vocabulary mismatch (the fact that the same meaning can be expressed in many ways) and ambiguity of natural language (the fact that the same term can have several meanings). The application of semantic spaces in natural language processing (NLP) aims at overcoming limitations of rule-based or model-based approaches operating on the keyword level. The main drawback with these approaches is their brittleness, and the large manual effort required to create either rule-based NLP systems or training corpora for model learning. Rule-based and machine learning-based models are fixed on the keyword level and break down if the vocabulary differs from that defined in the rules or from the training material used for the statistical models. Research in semantic spaces dates back more than 20 years. In 1996, two papers were published that raised a lot of attention around the general idea of creating semantic spaces: latent semantic analysis from Microsoft and Hyperspace Analogue to Language from the University of California. However, their adoption was limited by the large computational effort required to construct and use those semantic spaces. A breakthrough with regard to the accuracy of modelling associative relations between words (e.g. "spider-web", "lighter-cigarette", as opposed to synonymous relations such as "whale-dolphin", "astronaut-driver") was achieved by explicit semantic analysis (ESA) in 2007. ESA was a novel (non-machine learning) based approach that represented words in the form of vectors with 100,000 dimensions (where each dimension represents an Article in Wikipedia). However practical applications of the approach are limited due to the large number of required dimensions in the vectors. More recently, advances in neural networking techniques in combination with other new approaches (tensors) led to a host of new recent developments: Word2vec from Google and GloVe from Stanford University. Semantic folding represents a novel, biologically inspired approach to semantic spaces where each word is represented as a sparse binary vector with 16,000 dimensions (a semantic fingerprint) in a 2D semantic map (the semantic universe). Sparse binary representation are advantageous in terms of computational efficiency, and allow for the storage of very large numbers of possible patterns. == Visualization == The topological distribution over a two-dimensional grid (outlined above) lends itself to a bitmap type visualization of the semantics of any word or text, where each active semantic feature can be displayed as e.g. a pixel. As can be seen in the images shown here, this representation allows for a direct visual comparison of the semantics of two (or more) linguistic items. Image 1 clearly demonstrates that the two disparate terms "dog" and "car" have, as expected, very obviously different semantics. Image 2 shows that only one of the meaning contexts of "jaguar", that of "Jaguar" the car, overlaps with the meaning of Porsche (indicating partial similarity). Other meaning contexts of "jaguar" e.g. "jaguar" the animal clearly have different non-overlapping contexts. The visualization of semantic similarity using Semantic Folding bears a strong resemblance to the fMRI images produced in a research study conducted by A.G. Huth et al., where it is claimed that words are grouped in the brain by meaning. voxels, little volume segments of the brain, were found to follow a pattern were semantic information is represented along the boundary of the visual cortex with visual and linguistic categories represented on posterior and anterior side respectively.

CrewAI

CrewAI is an open-source software framework and platform for building AI agents and multi-agent systems. Written primarily in Python, it is used to define artificial-intelligence agents, assign tasks to them, and coordinate their work through agent teams and workflows. The framework is associated with CrewAI Inc., a startup developing enterprise tools for automating business workflows with large language model-based agents. == History == CrewAI was first released on the Python Package Index in December 2023. The project was created by João Moura and later developed by CrewAI Inc. and open-source contributors. In October 2024, TechCrunch reported that CrewAI had raised $18 million across seed and Series A funding rounds from investors including Boldstart Ventures, Craft Ventures, Earl Grey Capital, and Insight Partners. The report also stated that Andrew Ng and HubSpot co-founder Dharmesh Shah had invested in the company. SiliconANGLE described the company as the developer of an open-source framework for building artificial-intelligence agents and reported that the funding consisted of a seed round led by Boldstart Ventures and a Series A led by Insight Partners. By late 2024, CrewAI had introduced commercial enterprise products built on top of its open-source components. TechCrunch reported that the company's enterprise offering added access controls, analytics, support, and templates for workflow automation. == Features == CrewAI is designed around groups of agents, sometimes called "crews", that can be assigned roles, goals, and tasks. The framework supports agent collaboration, task delegation, tool use, memory, and knowledge sources for retrieval-augmented generation workflows. The project describes two main building blocks: "Crews", which are used for autonomous agent collaboration, and "Flows", which are used for more controlled event-driven workflows. The framework is independent of LangChain and is released under the MIT License. It can be installed as a Python package and is commonly used with external large language model APIs or local models, depending on the developer's configuration. == Business model == CrewAI combines an open-source framework with commercial enterprise products. Its enterprise products are intended for organizations that need to build, monitor, and manage agent-based automations with additional security, observability, and administrative controls.

Computer audition

Computer audition (CA) or machine listening is the general field of study of algorithms and systems for audio interpretation by machines. Since the notion of what it means for a machine to "hear" is very broad and somewhat vague, computer audition attempts to bring together several disciplines that originally dealt with specific problems or had a concrete application in mind. The engineer Paris Smaragdis, interviewed in Technology Review, talks about these systems — "software that uses sound to locate people moving through rooms, monitor machinery for impending breakdowns, or activate traffic cameras to record accidents." Inspired by models of human audition, CA deals with questions of representation, transduction, grouping, use of musical knowledge and general sound semantics for the purpose of performing intelligent operations on audio and music signals by the computer. Technically this requires a combination of methods from the fields of signal processing, auditory modelling, music perception and cognition, pattern recognition, and machine learning, as well as more traditional methods of artificial intelligence for musical knowledge representation. == Applications == Like computer vision versus image processing, computer audition versus audio engineering deals with understanding of audio rather than processing. It also differs from problems of speech understanding by machine since it deals with general audio signals, such as natural sounds and musical recordings. Applications of computer audition are widely varying, and include search for sounds, genre recognition, acoustic monitoring, music transcription, score following, audio texture, music improvisation, emotion in audio and so on. == Related disciplines == Computer Audition overlaps with the following disciplines: Music information retrieval: methods for search and analysis of similarity between music signals. Auditory scene analysis: understanding and description of audio sources and events. Computational musicology and mathematical music theory: use of algorithms that employ musical knowledge for analysis of music data. Computer music: use of computers in creative musical applications. Machine musicianship: audition driven interactive music systems. == Areas of study == Since audio signals are interpreted by the human ear–brain system, that complex perceptual mechanism should be simulated somehow in software for "machine listening". In other words, to perform on par with humans, the computer should hear and understand audio content much as humans do. Analyzing audio accurately involves several fields: electrical engineering (spectrum analysis, filtering, and audio transforms); artificial intelligence (machine learning and sound classification); psychoacoustics (sound perception); cognitive sciences (neuroscience and artificial intelligence); acoustics (physics of sound production); and music (harmony, rhythm, and timbre). Furthermore, audio transformations such as pitch shifting, time stretching, and sound object filtering, should be perceptually and musically meaningful. For best results, these transformations require perceptual understanding of spectral models, high-level feature extraction, and sound analysis/synthesis. Finally, structuring and coding the content of an audio file (sound and metadata) could benefit from efficient compression schemes, which discard inaudible information in the sound. Computational models of music and sound perception and cognition can lead to a more meaningful representation, a more intuitive digital manipulation and generation of sound and music in musical human-machine interfaces. The study of CA could be roughly divided into the following sub-problems: Representation: signal and symbolic. This aspect deals with time-frequency representations, both in terms of notes and spectral models, including pattern playback and audio texture. Feature extraction: sound descriptors, segmentation, onset, pitch and envelope detection, chroma, and auditory representations. Musical knowledge structures: analysis of tonality, rhythm, and harmonies. Sound similarity: methods for comparison between sounds, sound identification, novelty detection, segmentation, and clustering. Sequence modeling: matching and alignment between signals and note sequences. Source separation: methods of grouping of simultaneous sounds, such as multiple pitch detection and time-frequency clustering methods. Auditory cognition: modeling of emotions, anticipation and familiarity, auditory surprise, and analysis of musical structure. Multi-modal analysis: finding correspondences between textual, visual, and audio signals. === Representation issues === Computer audition deals with audio signals that can be represented in a variety of fashions, from direct encoding of digital audio in two or more channels to symbolically represented synthesis instructions. Audio signals are usually represented in terms of analogue or digital recordings. Digital recordings are samples of acoustic waveform or parameters of audio compression algorithms. One of the unique properties of musical signals is that they often combine different types of representations, such as graphical scores and sequences of performance actions that are encoded as MIDI files. Since audio signals usually comprise multiple sound sources, then unlike speech signals that can be efficiently described in terms of specific models (such as source-filter model), it is hard to devise a parametric representation for general audio. Parametric audio representations usually use filter banks or sinusoidal models to capture multiple sound parameters, sometimes increasing the representation size in order to capture internal structure in the signal. Additional types of data that are relevant for computer audition are textual descriptions of audio contents, such as annotations, reviews, and visual information in the case of audio-visual recordings. === Features === Description of contents of general audio signals usually requires extraction of features that capture specific aspects of the audio signal. Generally speaking, one could divide the features into signal or mathematical descriptors such as energy, description of spectral shape etc., statistical characterization such as change or novelty detection, special representations that are better adapted to the nature of musical signals or the auditory system, such as logarithmic growth of sensitivity (bandwidth) in frequency or octave invariance (chroma). Since parametric models in audio usually require very many parameters, the features are used to summarize properties of multiple parameters in a more compact or salient representation. === Musical knowledge === Finding specific musical structures is possible by using musical knowledge as well as supervised and unsupervised machine learning methods. Examples of this include detection of tonality according to distribution of frequencies that correspond to patterns of occurrence of notes in musical scales, distribution of note onset times for detection of beat structure, distribution of energies in different frequencies to detect musical chords and so on. === Sound similarity and sequence modeling === Comparison of sounds can be done by comparison of features with or without reference to time. In some cases an overall similarity can be assessed by close values of features between two sounds. In other cases when temporal structure is important, methods of dynamic time warping need to be applied to "correct" for different temporal scales of acoustic events. Finding repetitions and similar sub-sequences of sonic events is important for tasks such as texture synthesis and machine improvisation. === Source separation === Since one of the basic characteristics of general audio is that it comprises multiple simultaneously sounding sources, such as multiple musical instruments, people talking, machine noises or animal vocalization, the ability to identify and separate individual sources is very desirable. Unfortunately, there are no methods that can solve this problem in a robust fashion. Existing methods of source separation rely sometimes on correlation between different audio channels in multi-channel recordings. The ability to separate sources from stereo signals requires different techniques than those usually applied in communications where multiple sensors are available. Other source separation methods rely on training or clustering of features in mono recording, such as tracking harmonically related partials for multiple pitch detection. Some methods, before explicit recognition, rely on revealing structures in data without knowing the structures (like recognizing objects in abstract pictures without attributing them meaningful labels) by finding the least complex data representations, for instance describing audio scenes as generated by a few tone patterns and their trajectories (polyphonic voices) and acoustical contours drawn by a tone (c

Quantum image processing

Quantum image processing (QIMP) is using quantum computing or quantum information processing to create and work with quantum images. Due to some of the properties inherent to quantum computation, notably entanglement and parallelism, it is hoped that QIMP technologies will offer capabilities and performances that surpass their traditional equivalents, in terms of computing speed, security, and minimum storage requirements. == Background == A. Y. Vlasov's work in 1997 focused on using a quantum system to recognize orthogonal images. This was followed by efforts using quantum algorithms to search specific patterns in binary images and detect the posture of certain targets. Notably, more optics-based interpretations for quantum imaging were initially experimentally demonstrated in and formalized in after seven years. In 2003, Salvador Venegas-Andraca and S. Bose presented Qubit Lattice, the first published general model for storing, processing and retrieving images using quantum systems. Later on, in 2005, Latorre proposed another kind of representation, called the Real Ket, whose purpose was to encode quantum images as a basis for further applications in QIMP. Furthermore, in 2010 Venegas-Andraca and Ball presented a method for storing and retrieving binary geometrical shapes in quantum mechanical systems in which it is shown that maximally entangled qubits can be used to reconstruct images without using any additional information. Technically, these pioneering efforts with the subsequent studies related to them can be classified into three main groups: Quantum-assisted digital image processing (QDIP): These applications aim at improving digital or classical image processing tasks and applications. Optics-based quantum imaging (OQI) Classically inspired quantum image processing (QIMP) A survey of quantum image representation has been published in. Furthermore, the recently published book Quantum Image Processing provides a comprehensive introduction to quantum image processing, which focuses on extending conventional image processing tasks to the quantum computing frameworks. It summarizes the available quantum image representations and their operations, reviews the possible quantum image applications and their implementation, and discusses the open questions and future development trends. == Quantum image representations == There are various approaches for quantum image representation, that are usually based on the encoding of color information. A common representation is FRQI (Flexible Representation for Quantum Images), that captures the color and position at every pixel of the image, and defined as: | I ⟩ = 1 2 n ∑ i = 0 2 2 n − 1 | c i ⟩ ⊗ | i ⟩ {\displaystyle \vert I\rangle ={\frac {1}{2^{n}}}\sum _{i=0}^{2^{2n-1}}\vert c_{i}\rangle \otimes \vert i\rangle } where | i ⟩ {\textstyle |i\rangle } is the position and | c i ⟩ = c o s θ i | 0 ⟩ + s i n θ i | 1 ⟩ {\textstyle \vert c_{i}\rangle =cos\theta _{i}\vert 0\rangle +sin\theta _{i}\vert 1\rangle } the color with a vector of angles θ i ∈ [ 0 , π / 2 ] {\textstyle \theta _{i}\in \left[0,\pi /2\right]} . As it can be seen, | c i ⟩ {\textstyle \vert c_{i}\rangle } is a regular qubit state of the form | ψ ⟩ = α | 0 ⟩ + β | 1 ⟩ {\displaystyle \vert \psi \rangle =\alpha \vert 0\rangle +\beta \vert 1\rangle } , with basis states | 0 ⟩ = ( 1 0 ) {\textstyle \vert 0\rangle ={\begin{pmatrix}1\\0\end{pmatrix}}} and | 1 ⟩ = ( 0 1 ) {\textstyle \vert 1\rangle ={\begin{pmatrix}0\\1\end{pmatrix}}} , as well as amplitudes α {\textstyle \alpha } and β {\textstyle \beta } that satisfy | α | 2 + | β | 2 = 1 {\textstyle \left|\alpha \right|^{2}+\left|\beta \right|^{2}=1} . Another common representation is MCQI (Multi-Channel Representation for Quantum Images), that uses the RGB channels with quantum states and following FRQI definition: | I ⟩ = 1 2 n + 1 ∑ i = 0 2 2 n − 1 | C R G B i ⟩ ⊗ | i ⟩ {\displaystyle \vert I\rangle ={\frac {1}{2^{n+1}}}\sum _{i=0}^{2^{2n-1}}\vert C_{RGB}^{i}\rangle \otimes \vert i\rangle } | C R G B i ⟩ = cos ⁡ θ R i | 000 ⟩ + cos ⁡ θ G i | 001 ⟩ + cos ⁡ θ B i | 010 ⟩ + sin ⁡ θ R i | 100 ⟩ + sin ⁡ θ G i | 101 ⟩ + sin ⁡ θ B i | 110 ⟩ + cos ⁡ θ α | 011 ⟩ + sin ⁡ θ α | 111 ⟩ {\displaystyle {\begin{aligned}{\begin{aligned}\vert C_{RGB}^{i}\rangle &={\cos \theta _{R}^{i}\vert 000\rangle }+{\cos \theta _{G}^{i}\vert 001\rangle }+{\cos \theta _{B}^{i}\vert 010\rangle }\\&\quad +{\sin \theta _{R}^{i}\vert 100\rangle }+{\sin \theta _{G}^{i}\vert 101\rangle }+{\sin \theta _{B}^{i}\vert 110\rangle }\\&\quad +{\cos {\theta _{\alpha }}\vert 011\rangle }+{\sin \theta _{\alpha }\vert 111\rangle }\end{aligned}}\end{aligned}}} Departing from the angle-based approach of FRQI and MCQI, and using a qubit sequence, NEQR (Novel Enhanced Representation for Quantum Images) is another representation approach, that uses a function f ( y , x ) = C y x q − 1 C y x q − 2 … C y x 1 C y x 0 {\textstyle f\left(y,x\right)=C_{yx}^{q-1}C_{yx}^{q-2}\ldots C_{yx}^{1}C_{yx}^{0}} to encode color values for a 2 n × 2 n {\displaystyle 2^{n}\times 2^{n}} image: | I ⟩ = 1 2 n ∑ y = 0 2 n − 1 ∑ x = 0 2 n − 1 | f ( y , x ) ⟩ | y x ⟩ {\displaystyle \vert I\rangle ={\frac {1}{2^{n}}}\sum _{y=0}^{2^{n}-1}\sum _{x=0}^{2^{n}-1}\vert f\left(y,x\right)\rangle \vert yx\rangle } == Quantum image manipulations == A lot of the effort in QIMP has been focused on designing algorithms to manipulate the position and color information encoded using flexible representation of quantum images (FRQI) and its many variants. For instance, FRQI-based fast geometric transformations including (two-point) swapping, flip, (orthogonal) rotations and restricted geometric transformations to constrain these operations to a specified area of an image were initially proposed. Recently, NEQR-based quantum image translation to map the position of each picture element in an input image into a new position in an output image and quantum image scaling to resize a quantum image were discussed. While FRQI-based general form of color transformations were first proposed by means of the single qubit gates such as X, Z, and H gates. Later, Multi-Channel Quantum Image-based channel of interest (CoI) operator to entail shifting the grayscale value of the preselected color channel and the channel swapping (CS) operator to swap the grayscale values between two channels have been fully discussed. To illustrate the feasibility and capability of QIMP algorithms and application, researchers always prefer to simulate the digital image processing tasks on the basis of the QIRs that we already have. By using the basic quantum gates and the aforementioned operations, so far, researchers have contributed to quantum image feature extraction, quantum image segmentation, quantum image morphology, quantum image comparison, quantum image filtering, quantum image classification, quantum image stabilization, among others. In particular, QIMP-based security technologies have attracted extensive interest of researchers as presented in the ensuing discussions. Similarly, these advancements have led to many applications in the areas of watermarking, encryption, and steganography etc., which form the core security technologies highlighted in this area. In general, the work pursued by the researchers in this area are focused on expanding the applicability of QIMP to realize more classical-like digital image processing algorithms; propose technologies to physically realize the QIMP hardware; or simply to note the likely challenges that could impede the realization of some QIMP protocols. == Quantum image transform == By encoding and processing the image information in quantum-mechanical systems, a framework of quantum image processing is presented, where a pure quantum state encodes the image information: to encode the pixel values in the probability amplitudes and the pixel positions in the computational basis states. Given an image F = ( F i , j ) M × L {\displaystyle F=(F_{i,j})_{M\times L}} , where F i , j {\displaystyle F_{i,j}} represents the pixel value at position ( i , j ) {\displaystyle (i,j)} with i = 1 , … , M {\displaystyle i=1,\dots ,M} and j = 1 , … , L {\displaystyle j=1,\dots ,L} , a vector f → {\displaystyle {\vec {f}}} with M L {\displaystyle ML} elements can be formed by letting the first M {\displaystyle M} elements of f → {\displaystyle {\vec {f}}} be the first column of F {\displaystyle F} , the next M {\displaystyle M} elements the second column, etc. A large class of image operations is linear, e.g., unitary transformations, convolutions, and linear filtering. In the quantum computing, the linear transformation can be represented as | g ⟩ = U ^ | f ⟩ {\displaystyle |g\rangle ={\hat {U}}|f\rangle } with the input image state | f ⟩ {\displaystyle |f\rangle } and the output image state | g ⟩ {\displaystyle |g\rangle } . A unitary transformation can be implemented as a unitary evolution. Some basic and commonly used image transforms (e.g., the Fourier, Hadamard, an

Inception score

The Inception Score (IS) is an algorithm used to assess the quality of images created by a generative image model such as a generative adversarial network (GAN). The score is calculated based on the output of a separate, pretrained Inception v3 image classification model applied to a sample of (typically around 30,000) images generated by the generative model. The Inception Score is maximized when the following conditions are true: The entropy of the distribution of labels predicted by the Inceptionv3 model for the generated images is minimized. In other words, the classification model confidently predicts a single label for each image. Intuitively, this corresponds to the desideratum of generated images being "sharp" or "distinct". The predictions of the classification model are evenly distributed across all possible labels. This corresponds to the desideratum that the output of the generative model is "diverse". It has been somewhat superseded by the related Fréchet inception distance. While the Inception Score only evaluates the distribution of generated images, the FID compares the distribution of generated images with the distribution of a set of real images ("ground truth"). == Definition == Let there be two spaces, the space of images Ω X {\displaystyle \Omega _{X}} and the space of labels Ω Y {\displaystyle \Omega _{Y}} . The space of labels is finite. Let p g e n {\displaystyle p_{gen}} be a probability distribution over Ω X {\displaystyle \Omega _{X}} that we wish to judge. Let a discriminator be a function of type p d i s : Ω X → M ( Ω Y ) {\displaystyle p_{dis}:\Omega _{X}\to M(\Omega _{Y})} where M ( Ω Y ) {\displaystyle M(\Omega _{Y})} is the set of all probability distributions on Ω Y {\displaystyle \Omega _{Y}} . For any image x {\displaystyle x} , and any label y {\displaystyle y} , let p d i s ( y | x ) {\displaystyle p_{dis}(y|x)} be the probability that image x {\displaystyle x} has label y {\displaystyle y} , according to the discriminator. It is usually implemented as an Inception-v3 network trained on ImageNet. The Inception Score of p g e n {\displaystyle p_{gen}} relative to p d i s {\displaystyle p_{dis}} is I S ( p g e n , p d i s ) := exp ⁡ ( E x ∼ p g e n [ D K L ( p d i s ( ⋅ | x ) ‖ ∫ p d i s ( ⋅ | x ) p g e n ( x ) d x ) ] ) {\displaystyle IS(p_{gen},p_{dis}):=\exp \left(\mathbb {E} _{x\sim p_{gen}}\left[D_{KL}\left(p_{dis}(\cdot |x)\|\int p_{dis}(\cdot |x)p_{gen}(x)dx\right)\right]\right)} Equivalent rewrites include ln ⁡ I S ( p g e n , p d i s ) := E x ∼ p g e n [ D K L ( p d i s ( ⋅ | x ) ‖ E x ∼ p g e n [ p d i s ( ⋅ | x ) ] ) ] {\displaystyle \ln IS(p_{gen},p_{dis}):=\mathbb {E} _{x\sim p_{gen}}\left[D_{KL}\left(p_{dis}(\cdot |x)\|\mathbb {E} _{x\sim p_{gen}}[p_{dis}(\cdot |x)]\right)\right]} ln ⁡ I S ( p g e n , p d i s ) := H [ E x ∼ p g e n [ p d i s ( ⋅ | x ) ] ] − E x ∼ p g e n [ H [ p d i s ( ⋅ | x ) ] ] {\displaystyle \ln IS(p_{gen},p_{dis}):=H[\mathbb {E} _{x\sim p_{gen}}[p_{dis}(\cdot |x)]]-\mathbb {E} _{x\sim p_{gen}}[H[p_{dis}(\cdot |x)]]} ln ⁡ I S {\displaystyle \ln IS} is nonnegative by Jensen's inequality. Pseudocode:INPUT discriminator p d i s {\displaystyle p_{dis}} . INPUT generator g {\displaystyle g} . Sample images x i {\displaystyle x_{i}} from generator. Compute p d i s ( ⋅ | x i ) {\displaystyle p_{dis}(\cdot |x_{i})} , the probability distribution over labels conditional on image x i {\displaystyle x_{i}} . Sum up the results to obtain p ^ {\displaystyle {\hat {p}}} , an empirical estimate of ∫ p d i s ( ⋅ | x ) p g e n ( x ) d x {\displaystyle \int p_{dis}(\cdot |x)p_{gen}(x)dx} . Sample more images x i {\displaystyle x_{i}} from generator, and for each, compute D K L ( p d i s ( ⋅ | x i ) ‖ p ^ ) {\displaystyle D_{KL}\left(p_{dis}(\cdot |x_{i})\|{\hat {p}}\right)} . Average the results, and take its exponential. RETURN the result. === Interpretation === A higher inception score is interpreted as "better", as it means that p g e n {\displaystyle p_{gen}} is a "sharp and distinct" collection of pictures. ln ⁡ I S ( p g e n , p d i s ) ∈ [ 0 , ln ⁡ N ] {\displaystyle \ln IS(p_{gen},p_{dis})\in [0,\ln N]} , where N {\displaystyle N} is the total number of possible labels. ln ⁡ I S ( p g e n , p d i s ) = 0 {\displaystyle \ln IS(p_{gen},p_{dis})=0} iff for almost all x ∼ p g e n {\displaystyle x\sim p_{gen}} p d i s ( ⋅ | x ) = ∫ p d i s ( ⋅ | x ) p g e n ( x ) d x {\displaystyle p_{dis}(\cdot |x)=\int p_{dis}(\cdot |x)p_{gen}(x)dx} That means p g e n {\displaystyle p_{gen}} is completely "indistinct". That is, for any image x {\displaystyle x} sampled from p g e n {\displaystyle p_{gen}} , discriminator returns exactly the same label predictions p d i s ( ⋅ | x ) {\displaystyle p_{dis}(\cdot |x)} . The highest inception score N {\displaystyle N} is achieved if and only if the two conditions are both true: For almost all x ∼ p g e n {\displaystyle x\sim p_{gen}} , the distribution p d i s ( y | x ) {\displaystyle p_{dis}(y|x)} is concentrated on one label. That is, H y [ p d i s ( y | x ) ] = 0 {\displaystyle H_{y}[p_{dis}(y|x)]=0} . That is, every image sampled from p g e n {\displaystyle p_{gen}} is exactly classified by the discriminator. For every label y {\displaystyle y} , the proportion of generated images labelled as y {\displaystyle y} is exactly E x ∼ p g e n [ p d i s ( y | x ) ] = 1 N {\displaystyle \mathbb {E} _{x\sim p_{gen}}[p_{dis}(y|x)]={\frac {1}{N}}} . That is, the generated images are equally distributed over all labels.