AI Chat Image

AI Chat Image — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Vegas Pro


    Vegas Pro (formerly known as Sony Vegas) is a professional video editing software package for non-linear editing (NLE), designed to run on the Microsoft Windows operating system. The first release of Vegas Beta was on June 11, 1999. Vegas was originally developed as a non-linear audio editing application. Version 2.0 would split the program into audio and video editing variants, with the former being dropped by version 4.0, making the video offering the only variant available to consumers. Vegas Pro features real-time multi-track video and audio editing on unlimited tracks, resolution-independent video sequencing, complex effects, compositing tools, 24-bit/192 kHz audio support, VST and DirectX plug-in effect support, and Dolby Digital surround sound mixing. The software was originally published by Sonic Foundry until May 2003, when Sony purchased Sonic Foundry and formed Sony Creative Software. On May 24, 2016, Sony announced that Vegas was sold to MAGIX, which formed VEGAS Creative Software, to continue support and development of the software. As of the end of March 2026, it was publicly announced that Boris FX had taken ownership of Vegas Pro. Each release of Vegas is sold standalone; however, upgrade discounts are sometimes provided. == Features == Vegas does not require any specialized hardware to run properly, allowing it to operate on any Windows computer that meets the system requirements. == History == Vegas 1.0 was released after a brief public beta by Sonic Foundry on July 23, 1999 at the NAMM Show in Nashville, Tennessee as an audio-only tool with a particular focus on re-scaling and resampling audio. It supported formats like DivX and Real Networks RealSystem G2 file formats. Martin Walker from Sound on Sound described working in Vegas 1.0 as a "very pleasurable experience, especially since so many functions are highly intuitive" though also criticizing some features as hard to figure out due to the lack of a central help file. 
Later, on June 12, 2000, Vegas Video and Audio 2.0 (also referred to as just Vegas 2.0) was released, with its beta releasing earlier that year on April 10. This was the first version of Vegas to include video-editing tools and was also the first to have a low-cost "LE" version alongside the regular release. The LE releases would continue through version 3.0 of Vegas but would be discontinued by the release of Vegas 4.0. Vegas 3.0 was released the next year on December 3, and added new video effects, features for ease-of-use with DV, and support for editing Windows Media files. Vegas 4.0 was released on 6 February 2003 and added application scripting, advanced color correction, 5.1 surround sound mixing, and Steinberg ASIO support. This was the last release under the Sonic Foundry name after it sold much of its software suite, including Sound Forge and Acid Pro, to Sony Pictures Digital for $18 million later in 2003. Under Sony's ownership, Vegas 5.0 was released on April 19, 2004, bringing 3D track motion, compositing, reversing, envelope automation, etc. 7.0 also added an improved video preview, enhanced layout management, improved snapping, and more customization. With the release of 8.0, Sony opted to go back to the original "Vegas Pro" branding that the first version released with. It added the ability to burn Blu-ray and DVD optical media, support for 32-bit floating point audio, support for tempo-based audio effects, and more. It also moved the timeline to the bottom of the window by default with the option of moving it back to the top if the user wished to. Sony was also experimenting with 64-bit at this time and ported Vegas Pro 8.0 to 64-bit systems under the name "Vegas Pro 8.1". Vegas Pro 9.0 added support for 4K resolution and pro camcorder formats like Red and XDCAM EX. In 2009, Sony Creative Software purchased the Velvetmatter Radiance suite of video FX plug-ins which were included in Sony Vegas Pro 9.0. 
As a result, they were no longer available as a separate product from Velvetmatter. Vegas Pro 10 was released in 2010 with stereoscopic 3D editing, image stabilization, OpenFX plugin support, real-time audio event effects, and a few UI changes. This was the last release to include support for Windows XP. Vegas Pro 11 was released the next year on 17 October, with GPGPU video acceleration, enhanced text tools, enhanced stereoscopic/3D features, RAW photo support, and new event synchronization mechanisms. In addition, Vegas Pro 11 comes pre-loaded with "NewBlue" Titler Pro, a 2D and 3D titling plug-in. Vegas Pro 12 would add two new configurations: Vegas Pro 12 Edit, for "Professional Video and Audio Production"; and Vegas Pro 12 Suite, for "Professional Editing, Disc Authoring, and Visual Effects Design". Vegas Pro 13 would be the last version released with Sony branding after the acquisition of much of Sony Creative Software's library by Magix. After they acquired Vegas, Magix released version 14 on September 20, 2016. It featured advanced 4K upscaling as well as many bug fixes, a higher video velocity limit, RED camera support, and a variety of other features. This was also the last version to have the light theme enabled by default. Released on August 28, 2017, Vegas Pro 15 features major UI changes that claim to bring usability improvements and customization. It was the first version of VEGAS Pro to have a dark theme; it also allows more efficient editing speeds, including adding new shortcuts to speed the video editing process. Vegas Pro 15 includes support for Intel Quick Sync Video (QSV) and other technologies, as well as various other features. It introduced a new VEGAS Pro icon as a V. Vegas Pro 16 has some new features including file backup, motion tracking, improved video stabilization, 360° editing and HDR support. 
Magix has continued to improve Vegas through version 21, adding support for reading Matroska files, a more detailed render dialogue, live streaming, VST3 support, a VST 32-bit bridge, and a selective Paste Event Attributes menu. On January 17, 2018, Magix also released a subscription model for Vegas named "Vegas Pro 365", although the perpetual licence is still an option for customers. This version includes cloud-based speech synthesis among other features not included in the mainline Vegas release.

== Version history ==
Each release of Vegas is sold standalone; however, upgrade discounts are sometimes provided.

=== Vegas Beta ===
Sonic Foundry introduced a sneak preview version of Vegas Pro on June 11, 1999. It was called a "Multitrack Media Editing System".

=== Vegas 1.0 ===
Released on July 23, 1999 at the NAMM Show in Nashville, Tennessee, Vegas was an audio-only tool with a particular focus on rescaling and resampling audio. It supported file formats such as DivX and Real Networks RealSystem G2. Version 1.0 is the final Vegas release to include Windows 95 support.

=== Vegas Video beta (Vegas 2.0 beta) ===
Released on April 10, 2000, this was the first version of Vegas to include video-editing tools.

=== Vegas Video (Vegas 2.0) ===
Released on June 12, 2000. Version 2.0 is the final Vegas Video release to include Windows NT 4.0 support.

=== Vegas Video 3.0 ===
Released on December 3, 2001. This release added:
- New Video Effects – Lens Flare, Light Rays, Film FX, Color Curves, Mirror, Remap, Deform, Convolution, Linear Blur, Black Restore, Levels, Unsharp Mask, Color Grading, and Timecode Burn filter.
- Batch Capture with Automatic Scene Detection – Captures DV with automatic scene detection, batch capture, tape logging, still image capture and thumbnail previews.
- Red Book Audio CD Mastering with CD Architect™ Technology – Burns Red Book audio CD masters directly from the Vegas timeline with ISRC, UPC, and PQ list support.
- New Sonic Foundry DV Codec – Introduces a DV codec developed by Sonic Foundry that offers artifact-free compositing and DV chromakeying.
- DV Print-to-Tape from the Timeline – Prints projects to DV cameras and decks from the Vegas timeline.
- Windows Media™ File Editing – Creates and edits Windows Media™ files.
- New MPEG Encoding Tools – Produces MPEG-2 files for DVD productions.
- Dynamic RAM Previewing – Provides temporary RAM-based previews for analyzing and tweaking complex video FX without rendering.
- VideoCD and Data CD Burning – Burns projects directly to VideoCD for playback on most DVD players, or to data CDs for playback in computers' CD-ROM drives.

=== Vegas 4.0 ===
Released on February 6, 2003. This release added:
- Advanced Color Correction Tools
- Searchable Media Pool Bins
- Vectorscope, Histogram, Parade and Waveform Monitoring
- Application Scripting
- Improved Ripple Editing
- Motion Blur and Super-Sampling Envelopes
- 5.1 Surround Mixing
- Dolby® Digital AC-3 Encoding certified and tested by Dolby Laboratories
- DirectX® Audio Plug-In Effects Automation
- ASIO Driver Support
- Windows Media™ 9 Support, including Surround Encoding
- DVD Authoring with AC-3 File Import Capabilities
- Integration with DVD Architect via Chap

  • Double descent


    Double descent in statistics and machine learning is the phenomenon in which a model's error rate on the test set initially decreases as the number of parameters grows, then peaks, then decreases again. The phenomenon has been considered surprising because it contradicts assumptions about overfitting in classical machine learning. The increase usually occurs near the interpolation threshold, where the number of parameters equals the number of training data points (the model is just large enough to fit the training data). More precisely, the threshold is the maximum number of samples on which the model and training procedure achieve, on average, approximately zero training error.

== History ==
Early observations of what would later be called double descent in specific models date back to 1989. The term "double descent" was coined by Belkin et al. in 2019, when the phenomenon gained popularity as a broader concept exhibited by many models. The latter development was prompted by a perceived contradiction between the conventional wisdom that too many parameters in a model result in significant overfitting error (an extrapolation of the bias–variance tradeoff) and the empirical observation in the 2010s that some modern machine learning techniques tend to perform better with larger models.

== Theoretical models ==
Double descent occurs in linear regression with isotropic Gaussian covariates and isotropic Gaussian noise. A model of double descent at the thermodynamic limit has been analyzed using the replica trick, and the result has been confirmed numerically. A number of works have suggested that double descent can be explained using the concept of effective dimension: while a network may have a large number of parameters, in practice only a subset of those parameters is relevant for generalization performance, as measured by the local Hessian curvature. This explanation is formalized through PAC-Bayes compression-based generalization bounds, which show that less complex models are expected to generalize better under a Solomonoff prior.
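The linear-regression setting mentioned above can be explored numerically. The following is a minimal sketch, not a reference implementation: the function name `mean_test_error`, the sample sizes, the noise level, and the choice to fit only the first p of 60 true features are all illustrative assumptions. It fits minimum-norm least squares with isotropic Gaussian covariates and reports the average test error as the number of fitted parameters p passes the interpolation threshold p = n.

```python
import numpy as np


def mean_test_error(n_train, n_features_used, n_features_total=60,
                    noise_std=0.5, n_test=500, n_trials=30, seed=0):
    """Average test MSE of minimum-norm least squares on the first p features.

    All sizes and the noise level are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    p = n_features_used
    errors = []
    for _ in range(n_trials):
        # The true signal depends on every feature; the model only sees the first p.
        beta = rng.normal(size=n_features_total) / np.sqrt(n_features_total)
        x_train = rng.normal(size=(n_train, n_features_total))
        x_test = rng.normal(size=(n_test, n_features_total))
        y_train = x_train @ beta + noise_std * rng.normal(size=n_train)
        y_test = x_test @ beta + noise_std * rng.normal(size=n_test)
        # The pseudoinverse gives ordinary least squares when p <= n and the
        # minimum-norm interpolating solution when p > n.
        w = np.linalg.pinv(x_train[:, :p]) @ y_train
        errors.append(np.mean((x_test[:, :p] @ w - y_test) ** 2))
    return float(np.mean(errors))


if __name__ == "__main__":
    n = 20  # number of training points; the interpolation threshold is p = n
    for p in (2, 5, 10, 15, 20, 25, 40, 60):
        print(p, round(mean_test_error(n, p), 3))
```

Under these assumptions the printed error is expected to be largest near p = 20 and lower on both sides, though the exact values depend on the seed and on the sizes chosen.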

  • JAX (software)


    JAX is a Python library for accelerator-oriented array computation and program transformation, designed for high-performance numerical computing and large-scale machine learning. It is developed by Google with contributions from Nvidia and other community contributors. It is described as bringing together a modified version of the automatic differentiation system autograd and OpenXLA's XLA (Accelerated Linear Algebra). It is designed to follow the structure and workflow of NumPy as closely as possible and works with various existing frameworks such as TensorFlow and PyTorch.

The primary features of JAX are:
- A unified NumPy-like interface to computations that run on CPU, GPU, or TPU, in local or distributed settings.
- Built-in just-in-time (JIT) compilation via OpenXLA, an open-source machine learning compiler ecosystem.
- Efficient evaluation of gradients via its automatic differentiation transformations.
- Automatic vectorization to efficiently map functions over arrays representing batches of inputs.

== Libraries using JAX ==
- Flax
- Equinox
- Optax
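The three transformations named among JAX's primary features (automatic differentiation, automatic vectorization, and JIT compilation) compose with one another. The sketch below illustrates this with `jax.grad`, `jax.vmap`, and `jax.jit`; the `loss` function and the sample arrays are made up for illustration and do not come from the JAX documentation.

```python
import jax
import jax.numpy as jnp


def loss(w, x, y):
    """Squared error of a linear prediction for a single example."""
    return (jnp.dot(w, x) - y) ** 2


# Automatic differentiation: gradient with respect to the first argument, w.
grad_loss = jax.grad(loss)

# Automatic vectorization over a batch of (x, y) pairs while sharing w,
# followed by JIT compilation of the batched function.
batched_grad = jax.jit(jax.vmap(grad_loss, in_axes=(None, 0, 0)))

w = jnp.array([1.0, -2.0])
xs = jnp.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ys = jnp.array([1.0, 0.0, 0.0])

# One gradient per example, so the result has shape (3, 2).
grads = batched_grad(w, xs, ys)
print(grads)
```

Here `in_axes=(None, 0, 0)` tells `jax.vmap` to map over the leading axis of `xs` and `ys` while passing the same `w` to every call.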

  • Manifold regularization


    In machine learning, manifold regularization is a technique for using the shape of a dataset to constrain the functions that should be learned on that dataset. In many machine learning problems, the data to be learned do not cover the entire input space. For example, a facial recognition system may not need to classify any possible image, but only the subset of images that contain faces. The technique of manifold learning assumes that the relevant subset of data comes from a manifold, a mathematical structure with useful properties. The technique also assumes that the function to be learned is smooth: data with different labels are not likely to be close together, and so the labeling function should not change quickly in areas where there are likely to be many data points. Because of this assumption, a manifold regularization algorithm can use unlabeled data to inform where the learned function is allowed to change quickly and where it is not, using an extension of the technique of Tikhonov regularization. Manifold regularization algorithms can extend supervised learning algorithms in semi-supervised learning and transductive learning settings, where unlabeled data are available. The technique has been used for applications including medical imaging, geographical imaging, and object recognition.

== Manifold regularizer ==

=== Motivation ===
Manifold regularization is a type of regularization, a family of techniques that reduces overfitting and ensures that a problem is well-posed by penalizing complex solutions. In particular, manifold regularization extends the technique of Tikhonov regularization as applied to reproducing kernel Hilbert spaces (RKHSs). Under standard Tikhonov regularization on RKHSs, a learning algorithm attempts to learn a function $f$ from among a hypothesis space of functions $\mathcal{H}$. The hypothesis space is an RKHS, meaning that it is associated with a kernel $K$, and so every candidate function $f$ has a norm $\left\|f\right\|_{K}$, which represents the complexity of the candidate function in the hypothesis space. When the algorithm considers a candidate function, it takes its norm into account in order to penalize complex functions.

Formally, given a set of labeled training data $(x_{1},y_{1}),\ldots,(x_{\ell},y_{\ell})$ with $x_{i}\in X$, $y_{i}\in Y$ and a loss function $V$, a learning algorithm using Tikhonov regularization will attempt to solve the expression

$$\underset{f\in\mathcal{H}}{\arg\min}\;\frac{1}{\ell}\sum_{i=1}^{\ell}V(f(x_{i}),y_{i})+\gamma\left\|f\right\|_{K}^{2}$$

where $\gamma$ is a hyperparameter that controls how much the algorithm will prefer simpler functions over functions that fit the data better.

Manifold regularization adds a second regularization term, the intrinsic regularizer, to the ambient regularizer used in standard Tikhonov regularization. Under the manifold assumption in machine learning, the data in question do not come from the entire input space $X$, but instead from a nonlinear manifold $M\subset X$. The geometry of this manifold, the intrinsic space, is used to determine the regularization norm.

=== Laplacian norm ===
There are many possible choices for the intrinsic regularizer $\left\|f\right\|_{I}$. Many natural choices involve the gradient on the manifold $\nabla_{M}$, which can provide a measure of how smooth a target function is. A smooth function should change slowly where the input data are dense; that is, the gradient $\nabla_{M}f(x)$ should be small where the marginal probability density $\mathcal{P}_{X}(x)$, the probability density of a randomly drawn data point appearing at $x$, is large. This gives one appropriate choice for the intrinsic regularizer:

$$\left\|f\right\|_{I}^{2}=\int_{x\in M}\left\|\nabla_{M}f(x)\right\|^{2}\,d\mathcal{P}_{X}(x)$$

In practice, this norm cannot be computed directly because the marginal distribution $\mathcal{P}_{X}$ is unknown, but it can be estimated from the provided data.

=== Graph-based approach of the Laplacian norm ===
When the distances between input points are interpreted as a graph, the Laplacian matrix of the graph can help to estimate the marginal distribution. Suppose that the input data include $\ell$ labeled examples (pairs of an input $x$ and a label $y$) and $u$ unlabeled examples (inputs without associated labels). Define $W$ to be a matrix of edge weights for a graph, where $W_{ij}$ is a similarity derived from a distance measure between the data points $x_{i}$ and $x_{j}$ (so that closer points have a higher $W_{ij}$). Define $D$ to be a diagonal matrix with $D_{ii}=\sum_{j=1}^{\ell+u}W_{ij}$ and $L$ to be the Laplacian matrix $D-W$. Then, as the number of data points $\ell+u$ increases, $L$ converges to the Laplace–Beltrami operator $\Delta_{M}$, which is the divergence of the gradient $\nabla_{M}$. Then, if $\mathbf{f}$ is a vector of the values of $f$ at the data, $\mathbf{f}=[f(x_{1}),\ldots,f(x_{\ell+u})]^{\mathrm{T}}$, the intrinsic norm can be estimated:

$$\left\|f\right\|_{I}^{2}=\frac{1}{(\ell+u)^{2}}\mathbf{f}^{\mathrm{T}}L\mathbf{f}$$

As the number of data points $\ell+u$ increases, this empirical definition of $\left\|f\right\|_{I}^{2}$ converges to the definition when $\mathcal{P}_{X}$ is known.

=== Solving the regularization problem with graph-based approach ===
Using the weights $\gamma_{A}$ and $\gamma_{I}$ for the ambient and intrinsic regularizers, the final expression to be solved becomes:

$$\underset{f\in\mathcal{H}}{\arg\min}\;\frac{1}{\ell}\sum_{i=1}^{\ell}V(f(x_{i}),y_{i})+\gamma_{A}\left\|f\right\|_{K}^{2}+\frac{\gamma_{I}}{(\ell+u)^{2}}\mathbf{f}^{\mathrm{T}}L\mathbf{f}$$

As with other kernel methods, $\mathcal{H}$ may be an infinite-dimensional space, so if the regularization expression cannot be solved explicitly, it is impossible to search the entire space for a solution. Instead, a representer theorem shows that under certain conditions on the choice of the norm $\left\|f\right\|_{I}$, the optimal solution $f^{*}$ must be a linear combination of the kernel centered at each of the input points: for some weights $\alpha_{i}$,

$$f^{*}(x)=\sum_{i=1}^{\ell+u}\alpha_{i}K(x_{i},x)$$

Using this result, it is possible to search for the optimal solution $f^{*}$ by searching the finite-dimensional space defined by the possible choices of $\alpha_{i}$.

=== Functional approach of the Laplacian norm ===
The idea behind the graph Laplacian is to use neighbors to estimate the Laplacian. This method is akin to local averaging methods, which are known to scale poorly in high-dimensional problems. Indeed, the graph Laplacian is known to suffer from the curse of dimensionality. Fortunately, more advanced functional analysis makes it possible to exploit the expected smoothness of the function to be estimated. This method consists of estimating the Laplacian operator using derivatives of the kernel of the form $\partial_{1,j}K(x_{i},x)$, where $\partial_{1,j}$ denotes the partial derivative with respect to the $j$-th coordinate of the first variable. This second approach to the Laplacian norm is related to meshfree methods, which contrast with the finite difference method for PDEs.

== Applications ==
Manifold regularization can extend a variety of algorithms that can be expressed using Tikhonov regularization, by choosing an appropriate loss function $V$ and hypothesis space $\mathcal{H}$. Two commonly used examples are the families of support vector machines and regularized least squares algorithm
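The graph-based estimate of the intrinsic norm described above can be sketched in a few lines. This is an illustration under stated assumptions rather than a prescribed implementation: the Gaussian similarity used for W, the `bandwidth` parameter, the function names, and the toy data are all choices made here. The code builds L = D - W and evaluates f^T L f / (l + u)^2 for one labeling that is constant within each cluster and one that flips between close neighbours.

```python
import numpy as np


def graph_laplacian(points, bandwidth=1.0):
    """Return L = D - W using Gaussian similarity weights (an assumed choice)."""
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Closer points get a higher weight.
    weights = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    np.fill_diagonal(weights, 0.0)  # no self-loops
    degree = np.diag(weights.sum(axis=1))
    return degree - weights


def intrinsic_norm_estimate(f_values, laplacian):
    """Estimate the squared intrinsic norm as f^T L f / (l + u)^2."""
    n_points = len(f_values)
    return float(f_values @ laplacian @ f_values) / n_points ** 2


# Toy data: two well-separated clusters on a line (labeled and unlabeled alike).
points = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
laplacian = graph_laplacian(points)

smooth = np.array([1.0, 1.0, 1.0, -1.0, -1.0])  # constant within each cluster
rough = np.array([1.0, -1.0, 1.0, -1.0, 1.0])   # changes between close neighbours

print(intrinsic_norm_estimate(smooth, laplacian))
print(intrinsic_norm_estimate(rough, laplacian))
```

The labeling that changes only between the distant clusters receives a much smaller penalty than the one that changes inside a dense region, which is the behaviour the intrinsic regularizer is meant to encourage.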

  • Group of Governmental Experts on Lethal Autonomous Weapons Systems


    The Group of Governmental Experts on Lethal Autonomous Weapons Systems, commonly known as the GGE on LAWS, refers to a group of governmental experts established under the framework of the Convention on Certain Conventional Weapons (CCW), a United Nations arms control framework. The group examines legal, ethical, societal and moral questions that arise from the increased use of autonomous robots to carry weapons and to be programmed to engage in combat in various situations that might arise, including battles between countries, or in patrolling border areas or sensitive areas, or other similar roles. As of 18 March 2025, the Convention on Certain Conventional Weapons had 128 High Contracting Parties. In the Geneva Conventions, the term "High Contracting Parties" refers to the states that have joined the conventions and are therefore bound to uphold them. Among the countries that have joined are states with tense relations or ongoing armed conflict with one another, including Russia and Ukraine, Israel and the State of Palestine, and Pakistan and Afghanistan. == Background == In 2013, the Meeting of State Parties to the Convention on Certain Conventional Weapons agreed on a mandate on lethal autonomous weapon systems and tasked its chairperson with convening an informal Meeting of Experts to discuss issues related to emerging technologies in the area of LAWS. Those informal Meetings of Experts were then held in 2014, 2015 and 2016, and their reports fed into subsequent meetings of the High Contracting Parties. At the Fifth CCW Review Conference in 2016, the High Contracting Parties decided to establish an open-ended Group of Governmental Experts on emerging technologies in the area of LAWS, building on the earlier expert meetings. Since then, the group has been reconvened annually. In 2023, the Meeting of the High Contracting Parties to the CCW decided that the GGE on LAWS would continue its work in 2024 and 2025. 
The group was tasked with developing, by consensus, elements of a possible instrument, without predetermining its form, as well as other measures addressing lethal autonomous weapon systems, drawing on existing CCW protocols, earlier recommendations, state proposals, and legal, military, and technological expertise. == 2024 == In 2024, the GGE met twice, and the group was chaired by Robert in den Bosch, the Netherlands' disarmament ambassador. The 2024 Meeting of the High Contracting Parties decided that the group would meet for 10 days in 2025, in two five-day sessions, and reaffirmed its mandate to continue work by consensus on possible elements of an instrument and other measures addressing lethal autonomous weapon systems. == 2025 == At its first 2025 session, held in Geneva from 3 to 7 March 2025, the Group of Governmental Experts on Lethal Autonomous Weapon Systems discussed revisions to the chair's rolling text. The text was structured into five sections, or "boxes", though delegates held differing views on whether headings were useful or appropriate. Broadly, the discussions covered the characterization of lethal autonomous weapon systems, the application of international humanitarian law, possible prohibitions and regulations, legal review, and questions of accountability and responsibility. At its second session, held from 1 to 5 September 2025, delegations continued work on the chair's rolling text, which set out elements of a possible instrument and was organized into five thematic "boxes". == 2026 == === Developments before the 2026 session === A few weeks before the meeting, autonomous weapons drew renewed attention when the United States pressured Anthropic to revise the terms of use for its AI model Claude. Anthropic prohibited the model's use for mass domestic surveillance and for fully autonomous weapons operating without human oversight, while reports also emerged that OpenAI had reached an agreement with the U.S. 
Department of War for the use of its AI models, reportedly stipulating that they would not independently direct autonomous weapons where human control was required. The U.S. military nevertheless continued to use Claude during its war on Iran, and there was increasing alarm about the use of AI-assisted semi-autonomous weapons in conflicts including those in Ukraine, Sudan, Gaza, and Iran. Before the start of the sessions, Robert in den Bosch, as chair, warned that progress was urgent because technological developments were moving quickly. At the same time, although states agreed that international humanitarian law applied to LAWS, specific internationally binding standards governing such systems remained largely absent. A key divide before the session was that Russia and the United States opposed new legally binding instruments, while other states argued that new rules were necessary. According to Robert in den Bosch, the talks could lead to new rules, amendments to an existing convention, or a new treaty. === First session === From 2 to 6 March 2026, the group held its penultimate session under the group's three-year mandate. Delegations discussed the chair's rolling draft text, circulated in December 2025, on elements of a possible instrument or other measures concerning lethal autonomous weapon systems. In revised text circulated by the chair on 5 March 2026, a lethal autonomous weapon system was characterized as "a functionally integrated combination of one or more weapons and technological components, that can identify, select, and engage a target, without intervention by a human operator in the execution of these tasks". The text was divided into five boxes to structure discussion. During the session, delegates conducted a first reading of the draft text, and the chair later circulated revised language for several sections. Informal consultations were also held. 
According to campaign groups and participating observers, support grew during the week for moving to negotiations on the basis of the rolling text, with more than 70 states said to support that step by the end of the session, though some participants warned that attempts to bridge differences risked blurring the group's core purpose. The International Committee of the Red Cross argued that the text should not only restate existing international humanitarian law, but also clarify how those rules apply to autonomous weapons and set out additional measures tailored to the specific challenges such systems raise. Stop Killer Robots likewise emphasized the need to preserve meaningful human judgment and control over increasingly autonomous systems. During the discussions, the U.S. delegation opposed the term "human control" and reportedly proposed the alternative phrase "good faith human judgment and care". Other delegations rejected that wording as too weak, while many states continued to insist that meaningful human control over weapon systems remained essential.

  • Data annotation


    Data annotation is the process of labeling or tagging relevant metadata within a dataset to enable machines to interpret the data accurately. The dataset can take various forms, including images, audio files, video footage, or text. == Applications == Data is a fundamental component in the development of artificial intelligence (AI). Training AI models, particularly in computer vision and natural language processing, requires large volumes of annotated data. Proper annotation ensures that machine learning algorithms can recognize patterns and make accurate predictions. Common types of data annotation include classification, bounding boxes, semantic segmentation, and keypoint annotation. Data annotation is used in AI-driven fields, including healthcare, autonomous vehicles, retail, security, and entertainment. By accurately labeling data, machine learning models can perform complex tasks such as object detection, sentiment analysis, and speech recognition with greater precision. This growing demand has led to the emergence of specialized sectors and platforms dedicated to AI training and human-in-the-loop workflows, which often utilize Reinforcement Learning from Human Feedback (RLHF) to refine model behavior. == In computer vision == === Image classification === Image classification, also known as image categorization, involves assigning predefined labels to images. Machine learning algorithms trained on classified images can later recognize objects and differentiate between categories. For instance, an AI model trained to recognize furniture styles can distinguish between Georgian and Rococo armchairs. === Semantic segmentation === Semantic segmentation assigns each pixel in an image to a specific class, such as trees, vehicles, humans, or buildings. This type of annotation enables machine learning models to differentiate objects by grouping similar pixels, allowing for a detailed understanding of an image. 
=== Bounding boxes === Bounding box annotation involves drawing rectangular boxes around objects in an image. This technique is commonly used in autonomous driving, security surveillance, and retail analytics to detect and classify objects such as pedestrians, vehicles, and products on store shelves. === 3D cuboids === 3D cuboid annotation enhances traditional bounding boxes by adding depth, enabling models to predict an object's spatial orientation, movement, and size. This method is particularly useful for autonomous vehicles and robotics, where understanding object dimensions and depth is critical. === Polygonal annotation === For objects with irregular shapes, such as curved or multi-sided items, polygonal annotation provides more precise labeling than bounding boxes. This technique is often used in applications that require detailed object recognition, such as medical imaging or aerial mapping. === Keypoint annotation === Keypoint annotation marks specific points on an object, such as facial landmarks or body joints, to enable tracking and motion analysis. This method is widely used in facial recognition, emotion detection, sports analytics, and augmented reality applications.
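The annotation types described above can be held in a simple record per image. The sketch below is one hypothetical layout, not a standard annotation format: the class names, field names, and sample coordinates are invented for illustration. It also shows why a polygon can label an irregular shape more tightly than its enclosing bounding box.

```python
from dataclasses import dataclass, field


@dataclass
class BoundingBox:
    """Axis-aligned rectangle given by two opposite corners."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


@dataclass
class ImageAnnotation:
    """Hypothetical container for the labels attached to one image."""
    image_id: str
    label: str                                      # image classification
    boxes: list = field(default_factory=list)       # BoundingBox objects
    polygons: list = field(default_factory=list)    # lists of (x, y) vertices
    keypoints: dict = field(default_factory=dict)   # name -> (x, y)


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i, (x0, y0) in enumerate(vertices):
        x1, y1 = vertices[(i + 1) % len(vertices)]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def enclosing_box(vertices) -> BoundingBox:
    """Smallest axis-aligned box containing every polygon vertex."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return BoundingBox(min(xs), min(ys), max(xs), max(ys))


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
annotation = ImageAnnotation(
    image_id="img_0001",
    label="armchair",
    boxes=[enclosing_box(triangle)],
    polygons=[triangle],
    keypoints={"left_arm": (0.5, 1.0)},
)

print(polygon_area(triangle), annotation.boxes[0].area())
```

In this toy case the bounding box covers twice the area of the polygon, so half of the boxed pixels would not belong to the labeled object.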

  • Organoid intelligence


    Organoid intelligence (OI) is an emerging field of study in computer science and biology that develops and studies biological wetware computing using 3D cultures of human brain cells (or brain organoids) and brain-machine interface technologies. Such technologies may be referred to as OIs or the nervous filesystem. Organoid intelligent computer systems can be an example of biohybrid systems.

== Differences with non-organic computing ==
As opposed to traditional non-organic silicon-based approaches, OI seeks to use lab-grown cerebral organoids to serve as "biological hardware". While these structures are still far from being able to think like a regular human brain and do not yet possess strong computing capabilities, OI research offers the potential to improve the understanding of brain development, learning and memory, and may help find treatments for neurological disorders such as dementia. Thomas Hartung, a professor at Johns Hopkins University, argued in 2023 that "while silicon-based computers are certainly better with numbers, brains are better at learning." He noted that transistor density in computer chips may be approaching its limits, whereas brains, being wired differently, are more energy-efficient and can store large amounts of information. Some researchers claim that even though human brains are slower than machines at processing simple information, they are far better at processing complex information: brains can deal with fewer and more uncertain data, perform both sequential and parallel processing, are highly heterogeneous, can use incomplete datasets, and are said to outperform non-organic machines in decision-making. Training OIs involves biological learning (BL), as opposed to the machine learning (ML) used for AIs.

== Bioinformatics in OI ==
OI generates complex biological data, necessitating sophisticated methods for processing and analysis. Bioinformatics provides the tools and techniques to decipher raw data and uncover patterns and insights. Researchers have developed a platform named Neuroplatform for experimenting remotely with brain organoids via an API.

== Intended functions ==
Brain-inspired computing hardware aims to emulate the structure and working principles of the brain and could be used to address current limitations in AI technologies. However, brain-inspired silicon chips are still limited in their ability to fully mimic brain function, as most examples are built on digital electronic principles. One study performed OI computation (which the authors termed Brainoware) by sending and receiving information from the brain organoid using a high-density multielectrode array. By applying spatiotemporal electrical stimulation, nonlinear dynamics, and fading memory properties, as well as unsupervised learning from training data by reshaping the organoid functional connectivity, the study showed the potential of this technology by using it for speech recognition and nonlinear equation prediction in a reservoir computing framework.

== Ethical concerns ==
While researchers hope to use OI and biological computing to complement traditional silicon-based computing, there are also questions about the ethics of such an approach. Concerns include the possibility that an organoid could develop sentience or consciousness, and the question of the relationship between a stem cell donor (for growing the organoid) and the respective OI system.

  • AI literacy


    AI literacy or artificial intelligence literacy is "a set of competencies that enables individuals to critically evaluate AI technologies; communicate and collaborate effectively with AI; and use AI as a tool online, at home, and in the workplace." AI is employed in a variety of applications, including self-driving automobiles, virtual assistants and text generation by generative AI models. Users of these tools should be able to make informed decisions. AI literacy may have an impact on students' future employment prospects. With the rise of generative AI platforms, AI literacy has become a topic of conversation in the field of education. Some think AI literacy is essential for school and college students, while others restrict or prohibit the use of AI in assignments, viewing it as a form of academic dishonesty. However, many researchers and educational institutions promote a more nuanced approach, encouraging critical engagement with AI while developing policies that balance academic integrity with opportunities for learning. == Definitions == Other definitions of AI literacy include the ability to understand, use, monitor, and critically reflect on AI applications. That use of the term usually refers to teaching skills and knowledge to the general public, particularly those who are not adept in AI, together with the ability to understand, use, evaluate, and ethically navigate AI. As research into AI literacy is still emerging and focused on developing context-specific skills, there is not yet a single, broadly agreed-upon definition. AI literacy is linked to other forms of literacy. AI literacy requires digital literacy, whereas scientific and computational literacy may inform it. Data literacy also significantly overlaps with it. 
== Categories == AI literacy encompasses multiple categories, including a theoretical understanding of how artificial intelligence works, the usage of artificial intelligence technologies, and the critical appraisal of artificial intelligence and its ethics. === Know and understand AI === Knowledge and understanding of AI refers to a basic understanding of what artificial intelligence is and how it works. This includes familiarity with machine learning algorithms and the limitations and biases present in AI systems. Users who know and understand AI should be familiar with various technologies that use artificial intelligence, including cognitive systems, robotics and machine learning. This includes recognizing that large language models (LLMs) are machine learning models trained on extensive datasets which generate new text rather than retrieving pre-written responses. === Use and apply AI === Using and applying AI refers to the ability to use AI tools to solve problems and perform tasks such as programming and analyzing big data. Some consider prompt engineering, the practice of designing prompts that guide generative AI platforms effectively, to be another competency within AI literacy. === Evaluate and create AI === Evaluation and creation refers to the ability to critically evaluate the quality and reliability of AI systems. It also refers to designing and building fair and ethical AI systems. To evaluate correctly, users should also learn in which areas AI is strong, and in which areas it is weak. === AI ethics === AI ethics refers to understanding the moral implications of AI and making informed decisions regarding the use of AI tools. This area includes considerations such as: Accountability: Hold AI actors accountable for the operation of AI systems and adherence to ethical ideals. Accuracy: Identify and report sources of error and uncertainty in algorithms and data. 
Auditability: Enable other parties to audit and assess algorithm behavior via transparent information sharing. Explainability: Make sure that algorithmic judgments and the underlying data can be presented in simple language. Fairness: Prevent biases and consider varied viewpoints. To do so, increase the diversity of researchers in the field. Human centricity and well-being: Prioritize human well-being in AI development and deployment. Human rights alignment: Ensure that technology does not infringe on internationally recognized human rights. Inclusivity: Make AI accessible to everyone. Progress: Choose high-value initiatives. Responsibility, accountability, and transparency: Foster trust via responsibility, accountability, and fairness. Robustness and security: Make AI systems safe, secure, and resistant to manipulation or data breach. Sustainability: Choose implementations that generate long-term, useful benefits. Environmental implications: Consider how a tool affects the environment, any applicable restrictions or laws, and whether its benefits justify that impact. === Enabling AI === Support AI by developing associated knowledge and skills such as programming and statistics. == Promoting AI literacy == Several governments have recognized the need to promote AI literacy, including among adults. Such programs have been published in the United States, China, Germany and Finland. Programs intended for the general public usually consist of short and easy-to-understand online study units. Programs intended for children are usually project-based. Programs for students at colleges and universities often address the specific professional needs of the student, depending on their field of study. Beyond the education system, AI literacy can also be developed in the community, for example in museums. === Schools === Schools use diverse pedagogies to promote AI literacy. 
These include performing a Turing test with an intelligent agent; creating chatbots; building apps using Blockly-based programming; project-based learning; building robots; data visualization; and training AI models. Artificial intelligence curricula can improve students' understanding of topics such as machine learning, neural networks, and deep learning. === Higher education === Before the second decade of the 21st century, artificial intelligence was studied mainly in STEM courses. Later, projects emerged to increase artificial intelligence education, specifically to promote AI literacy. Most courses start with one or more study units that deal with basic questions such as what artificial intelligence is, where it comes from, what it can do and what it can't do. Most courses also refer to machine learning and deep learning. Some of the courses deal with moral issues in artificial intelligence. In Ireland, the Higher Education Authority published Generative AI in Higher Education Teaching & Learning: Policy Framework in December 2025, which encouraged higher education institutions to embed AI literacy across programmes as a core graduate attribute. ==== Disciplinary policy ==== As a response to the increase of generative AI use in education, several disciplines formed committees or task forces to examine context-specific approaches toward AI literacy. In spring 2025, the Modern Language Association and Conference on College Composition and Communication Joint Task Force finished development of three working papers, a guide on AI literacy for students, and a collection of resources addressing AI use in writing. The task force emphasized the need for "a culture of critical AI literacy" and included guidelines not only for students but also educators and institutions, highlighting the need for modeling ethical AI use in planning processes. 
Similarly, a committee formed by the American Historical Association Council published "Guiding Principles for Artificial Intelligence in History Education" which encouraged "clear and transparent engagement with generative AI." The guidelines demonstrate the value of criticality when working with generative AI in thinking and research.

  • Color quantization


    In computer graphics, color quantization or color image quantization is quantization applied to color spaces; it is a process that reduces the number of distinct colors used in an image, usually with the intention that the new image should be as visually similar as possible to the original image. Computer algorithms to perform color quantization on bitmaps have been studied since the 1970s. Color quantization is critical for displaying images with many colors on devices that can only display a limited number of colors, usually due to memory limitations, and enables efficient compression of certain types of images. The name "color quantization" is primarily used in computer graphics research literature; in applications, terms such as optimized palette generation, optimal palette generation, or decreasing color depth are used. Some of these are misleading, as the palettes generated by standard algorithms are not necessarily the best possible. == Algorithms == Most standard techniques treat color quantization as a problem of clustering points in three-dimensional space, where the points represent colors found in the original image and the three axes represent the three color channels. Almost any three-dimensional clustering algorithm can be applied to color quantization, and vice versa. After the clusters are located, typically the points in each cluster are averaged to obtain the representative color that all colors in that cluster are mapped to. The three color channels are usually red, green, and blue, but another popular choice is the Lab color space, in which Euclidean distance is more consistent with perceptual difference. The most popular algorithm by far for color quantization, invented by Paul Heckbert in 1979, is the median cut algorithm. Many variations on this scheme are in use. 
Before this time, most color quantization was done using the population algorithm or population method, which essentially constructs a histogram of equal-sized ranges and assigns colors to the ranges containing the most points. A more modern popular method is clustering using octrees, first conceived by Gervautz and Purgathofer and improved by Xerox PARC researcher Dan Bloomberg. If the palette is fixed, as is often the case in real-time color quantization systems such as those used in operating systems, color quantization is usually done using the "straight-line distance" or "nearest color" algorithm, which simply takes each color in the original image and finds the closest palette entry, where distance is determined by the distance between the two corresponding points in three-dimensional space. In other words, if the colors are $(r_1, g_1, b_1)$ and $(r_2, g_2, b_2)$, we want to minimize the Euclidean distance: $\sqrt{(r_1 - r_2)^2 + (g_1 - g_2)^2 + (b_1 - b_2)^2}$. This effectively decomposes the color cube into a Voronoi diagram, where the palette entries are the points and a cell contains all colors mapping to a single palette entry. There are efficient algorithms from computational geometry for computing Voronoi diagrams and determining which region a given point falls in; in practice, indexed palettes are so small that these are usually overkill. Color quantization is frequently combined with dithering, which can eliminate unpleasant artifacts such as banding that appear when quantizing smooth gradients and give the appearance of a larger number of colors. Some modern schemes for color quantization attempt to combine palette selection with dithering in one stage, rather than perform them independently. 
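The "nearest color" mapping onto a fixed palette described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation: the palette, the sample pixels, and the function names are assumptions made for the example. It does a brute-force scan of the palette, which the text notes is usually adequate because indexed palettes are small, and it compares squared distances because omitting the square root does not change which entry is closest.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]  # (r, g, b)


def squared_distance(a: Color, b: Color) -> int:
    """Squared Euclidean distance between two RGB colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_palette_index(color: Color, palette: Sequence[Color]) -> int:
    """Index of the palette entry closest to `color` (brute-force scan)."""
    return min(range(len(palette)),
               key=lambda i: squared_distance(color, palette[i]))


def quantize(pixels: Sequence[Color], palette: Sequence[Color]) -> List[int]:
    """Map every pixel to the index of its nearest palette entry."""
    return [nearest_palette_index(p, palette) for p in pixels]


# Illustrative fixed palette: black, white, red, blue.
palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 0, 255)]
pixels = [(10, 10, 10), (250, 240, 245), (200, 30, 40), (20, 10, 180)]
indices = quantize(pixels, palette)
print(indices)  # [0, 1, 2, 3]
```

Each returned index identifies the Voronoi cell of the color cube that the pixel falls in.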
A number of other much less frequently used methods have been invented that use entirely different approaches. The Local K-means algorithm, conceived by Oleg Verevka in 1995, is designed for use in windowing systems where a core set of "reserved colors" is fixed for use by the system and many images with different color schemes might be displayed simultaneously. It is a post-clustering scheme that makes an initial guess at the palette and then iteratively refines it. In the early days of color quantization, the k-means clustering algorithm was deemed unsuitable because of its high computational requirements and sensitivity to initialization. In 2011, M. Emre Celebi reinvestigated the performance of k-means as a color quantizer. He demonstrated that an efficient implementation of k-means outperforms a large number of color quantization methods. The high-quality but slow NeuQuant algorithm reduces images to 256 colors by training a Kohonen neural network "which self-organises through learning to match the distribution of colours in an input image. Taking the position in RGB-space of each neuron gives a high-quality colour map in which adjacent colours are similar." It is particularly advantageous for images with gradients. Finally, one of the newer methods is spatial color quantization, conceived by Puzicha, Held, Ketterer, Buhmann, and Fellner of the University of Bonn, which combines dithering with palette generation and a simplified model of human perception to produce visually impressive results even for very small numbers of colors. It does not treat palette selection strictly as a clustering problem, in that the colors of nearby pixels in the original image also affect the color of a pixel. 
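As an illustration of the clustering view of palette generation, here is a minimal sketch of plain k-means (Lloyd's algorithm) applied to RGB colors. It is not the efficient implementation studied by Celebi, nor the Local K-means variant. The naive initialization (the first k distinct colors), the iteration cap, and the function names are all assumptions made for the example, and the sensitivity to initialization mentioned above applies to it directly.

```python
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_palette(pixels: Sequence[Color], k: int,
                   iterations: int = 10) -> List[Color]:
    """Return up to k representative colors found by plain k-means."""
    # Naive initialization (an assumption): the first k distinct colors.
    centers: List[Color] = []
    for p in pixels:
        if p not in centers:
            centers.append(p)
        if len(centers) == k:
            break

    for _ in range(iterations):
        # Assignment step: attach each pixel to its nearest center.
        clusters: List[List[Color]] = [[] for _ in centers]
        for p in pixels:
            idx = min(range(len(centers)),
                      key=lambda i: squared_distance(p, centers[i]))
            clusters[idx].append(p)

        # Update step: each center becomes the average of its cluster.
        new_centers: List[Color] = []
        for center, members in zip(centers, clusters):
            if members:
                n = len(members)
                new_centers.append(tuple(sum(ch) / n for ch in zip(*members)))
            else:
                new_centers.append(center)  # keep an empty cluster's center

        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers


pixels = [(0, 0, 0), (10, 10, 10), (250, 250, 250), (240, 240, 240)]
print(kmeans_palette(pixels, 2))
```

On this tiny input the two centers settle on the averages of the dark pair and the light pair, matching the averaging step described for clustering methods in general.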
== History and applications == In the early days of PCs, it was common for video adapters to support only 2, 4, 16, or (eventually) 256 colors due to video memory limitations; they preferred to dedicate the video memory to having more pixels (higher resolution) rather than more colors. Color quantization helped to justify this tradeoff by making it possible to display many high color images in 16- and 256-color modes with limited visual degradation. Many operating systems automatically perform quantization and dithering when viewing high color images in a 256 color video mode, which was important when video devices limited to 256 color modes were dominant. Modern computers can now display millions of colors at once, far more than can be distinguished by the human eye, limiting this application primarily to mobile devices and legacy hardware. Nowadays, color quantization is mainly used in GIF and PNG images. GIF, for a long time the most popular lossless and animated bitmap format on the World Wide Web, only supports up to 256 colors, necessitating quantization for many images. Some early web browsers constrained images to use a specific palette known as the web colors, leading to severe degradation in quality compared to optimized palettes. PNG images support 24-bit color, but can often be made much smaller in filesize without much visual degradation by application of color quantization, since PNG files use fewer bits per pixel for palettized images. The infinite number of colors available through the lens of a camera is impossible to display on a computer screen; thus converting any photograph to a digital representation necessarily involves some quantization. Practically speaking, 24-bit color is sufficiently rich to represent almost all colors perceivable by humans with sufficiently small error as to be visually identical (if presented faithfully), within the available color space. 
However, the digitization of color, either in a camera detector or on a screen, necessarily limits the available color space. Consequently there are many colors that may be impossible to reproduce, regardless of how many bits are used to represent the color. For example, it is impossible in typical RGB color spaces (common on computer monitors) to reproduce the full range of green colors that the human eye is capable of perceiving. With the few colors available on early computers, different quantization algorithms produced very different-looking output images. As a result, a lot of time was spent on writing sophisticated algorithms to be more lifelike. === Quantization for image compression === Many image file formats support indexed color. A whole-image palette typically selects 256 "representative" colors for the entire image, where each pixel references any one of the colors in the palette, as in the GIF and PNG file formats. A block palette typically selects 2 or 4 colors for each block of 4x4 pixels, used in BTC, CCC, S2TC, and S3TC. === Editor support === Many bitmap graphics editors contain built-in support for color quantization, and will automatically perform it when converting an image with many colors to an image format with fewer colors. Most of these implementations allow the user to set exactly the number of desired colors. Examples of such support include: Photoshop's Mode→Indexed Color function supplies a number of quantization algorithms ranging from the fixed Windows system and Web palettes to the proprietary Local and Global algorithms for generating palettes suited to a particu

  • Symbolic regression


    Symbolic regression (SR) is a type of regression analysis that searches the space of mathematical expressions to find the model that best fits a given dataset, both in terms of accuracy and simplicity. No particular model is provided as a starting point for symbolic regression. Instead, initial expressions are formed by randomly combining mathematical building blocks such as mathematical operators, analytic functions, constants, and state variables. Usually, a subset of these primitives will be specified by the person operating it, but that's not a requirement of the technique. The symbolic regression problem for mathematical functions has been tackled with a variety of methods, including recombining equations most commonly using genetic programming, as well as more recent methods utilizing Bayesian methods and neural networks. Another non-classical alternative method to SR is called Universal Functions Originator (UFO), which has a different mechanism, search-space, and building strategy. Further methods such as Exact Learning attempt to transform the fitting problem into a moments problem in a natural function space, usually built around generalizations of the Meijer-G function. By not requiring a priori specification of a model, symbolic regression isn't affected by human bias, or unknown gaps in domain knowledge. It attempts to uncover the intrinsic relationships of the dataset, by letting the patterns in the data itself reveal the appropriate models, rather than imposing a model structure that is deemed mathematically tractable from a human perspective. The fitness function that drives the evolution of the models takes into account not only error metrics (to ensure the models accurately predict the data), but also special complexity measures, thus ensuring that the resulting models reveal the data's underlying structure in a way that's understandable from a human perspective. 
This facilitates reasoning and improves the odds of gaining insight into the data-generating system, as well as improving generalisability and extrapolation behaviour by preventing overfitting. Accuracy and simplicity may be left as two separate objectives of the regression (in which case the optimum solutions form a Pareto front), or they may be combined into a single objective by means of a model selection principle such as minimum description length. It has been proven that symbolic regression is an NP-hard problem. Nevertheless, if the sought-for equation is not too complex, it is possible to solve the symbolic regression problem exactly by generating every possible function (built from some predefined set of operators) and evaluating them on the dataset in question. == Difference from classical regression == While conventional regression techniques seek to optimize the parameters for a pre-specified model structure, symbolic regression avoids imposing prior assumptions, and instead infers the model from the data. In other words, it attempts to discover both model structures and model parameters. This approach has the disadvantage of having a much larger space to search: not only is the search space in symbolic regression infinite, but there are also infinitely many models that will perfectly fit a finite data set (provided that the model complexity isn't artificially limited). This means that a symbolic regression algorithm may take longer than traditional regression techniques to find an appropriate model and parametrization. This can be attenuated by limiting the set of building blocks provided to the algorithm, based on existing knowledge of the system that produced the data; but in the end, using symbolic regression is a decision that has to be balanced with how much is known about the underlying system. 
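The exhaustive approach mentioned above (generate every expression from a predefined set of building blocks and evaluate each on the data) can be sketched as follows. This is a toy illustration under stated assumptions: the building blocks are limited to the variable x, the constants 1 and 2, and the operators + and *; expressions are grown to a fixed nesting depth; and accuracy and simplicity are combined by ranking on squared error first and node count second. None of these choices comes from a specific symbolic regression package.

```python
import itertools

# Assumed building blocks: two binary operators.
OPERATORS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
}


def leaves():
    """Terminal expressions as (text, function, size) triples."""
    return [
        ("x", lambda x: x, 1),
        ("1", lambda x: 1.0, 1),
        ("2", lambda x: 2.0, 1),
    ]


def combine(op, left, right):
    """Build the function for `left op right` (avoids late-binding bugs)."""
    return lambda x: op(left(x), right(x))


def enumerate_expressions(max_depth):
    """Every expression obtainable with at most `max_depth` nesting levels."""
    expressions = leaves()
    for _ in range(max_depth):
        grown = []
        for (ta, fa, sa), (tb, fb, sb) in itertools.product(expressions, repeat=2):
            for symbol, op in OPERATORS.items():
                grown.append((f"({ta} {symbol} {tb})",
                              combine(op, fa, fb),
                              sa + sb + 1))
        expressions = expressions + grown
    return expressions


def squared_error(fn, data):
    return sum((fn(x) - y) ** 2 for x, y in data)


def exhaustive_search(data, max_depth=2):
    """Pick the candidate with the lowest error, breaking ties by size."""
    candidates = enumerate_expressions(max_depth)
    return min(candidates, key=lambda c: (squared_error(c[1], data), c[2]))


# Data generated by the hidden relationship y = x^2 + 1.
data = [(x, x * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
text, fn, size = exhaustive_search(data)
print(text, size)
```

Even with three terminals and two operators, depth 2 already yields 903 candidates, so the number of expressions grows very quickly with depth; this is the larger search space discussed above.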
Nevertheless, this characteristic of symbolic regression also has advantages: because the evolutionary algorithm requires diversity in order to effectively explore the search space, the result is likely to be a selection of high-scoring models (and their corresponding set of parameters). Examining this collection could provide better insight into the underlying process, and allows the user to identify an approximation that better fits their needs in terms of accuracy and simplicity. == Benchmarking == === SRBench === In 2021, SRBench was proposed as a large benchmark for symbolic regression. At its inception, SRBench featured 14 symbolic regression methods, 7 other ML methods, and 252 datasets from PMLB. The benchmark intends to be a living project: it encourages the submission of improvements, new datasets, and new methods, to keep track of the state of the art in SR. === SRBench Competition 2022 === In 2022, SRBench announced the competition Interpretable Symbolic Regression for Data Science, which was held at the GECCO conference in Boston, MA. The competition pitted nine leading symbolic regression algorithms against each other on a novel set of data problems and considered different evaluation criteria. The competition was organized in two tracks, a synthetic track and a real-world data track. ==== Synthetic Track ==== In the synthetic track, methods were compared according to five properties: re-discovery of exact expressions; feature selection; resistance to local optima; extrapolation; and sensitivity to noise. Rankings of the methods were: (1) QLattice; (2) PySR (Python Symbolic Regression); (3) uDSR (Deep Symbolic Optimization). ==== Real-world Track ==== In the real-world track, methods were trained to build interpretable predictive models for 14-day forecast counts of COVID-19 cases, hospitalizations, and deaths in New York State. These models were reviewed by a subject expert, assigned trust ratings, and evaluated for accuracy and simplicity. 
The ranking of the methods was: (1) uDSR (Deep Symbolic Optimization); (2) QLattice; (3) geneticengine (Genetic Engine). == Non-standard methods == Most symbolic regression algorithms prevent combinatorial explosion by implementing evolutionary algorithms that iteratively improve the best-fit expression over many generations. Recently, researchers have proposed algorithms utilizing other tactics in AI. Silviu-Marian Udrescu and Max Tegmark developed the "AI Feynman" algorithm, which attempts symbolic regression by training a neural network to represent the mystery function, then runs tests against the neural network to attempt to break up the problem into smaller parts. For example, if $f(x_1, \ldots, x_i, x_{i+1}, \ldots, x_n) = g(x_1, \ldots, x_i) + h(x_{i+1}, \ldots, x_n)$, tests against the neural network can recognize the separation and proceed to solve for $g$ and $h$ separately and with different variables as inputs. This is an example of divide and conquer, which reduces the size of the problem to be more manageable. AI Feynman also transforms the inputs and outputs of the mystery function in order to produce a new function which can be solved with other techniques, and performs dimensional analysis to reduce the number of independent variables involved. The algorithm was able to "discover" 100 equations from The Feynman Lectures on Physics, while Eureqa, a leading software package using evolutionary algorithms, solved only 71. AI Feynman, in contrast to classic symbolic regression methods, requires a very large dataset in order to first train the neural network and is naturally biased towards equations that are common in elementary physics.
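The idea of testing a function for additive separability can be illustrated with a short sketch. It relies on a standard identity: if $f(x, y) = g(x) + h(y)$, then $f(a, b) - f(a, d) - f(c, b) + f(c, d) = 0$ for any inputs. The code below checks that identity at random points for a two-variable function. It is a simplified stand-in, not the AI Feynman implementation: it queries a known Python function directly instead of a trained neural network, and the tolerance, sampling range, and names are assumptions.

```python
import math
import random


def is_additively_separable(f, trials=200, tol=1e-9, seed=0):
    """Heuristic test of whether f(x, y) can be written as g(x) + h(y).

    For a separable function the mixed difference
    f(a, b) - f(a, d) - f(c, b) + f(c, d) vanishes for all inputs.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        a, c = rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)
        b, d = rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)
        residual = f(a, b) - f(a, d) - f(c, b) + f(c, d)
        if abs(residual) > tol:
            return False
    return True


def separable(x, y):
    return x * x + math.sin(y)  # g(x) = x^2, h(y) = sin(y)


def coupled(x, y):
    return x * y  # cannot be split into g(x) + h(y)


print(is_additively_separable(separable))  # True
print(is_additively_separable(coupled))    # False
```

When the test succeeds, the two parts can be fitted separately, each with fewer input variables, which is the divide-and-conquer step described above.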

  • Computer audition


    Computer audition (CA) or machine listening is the general field of study of algorithms and systems for audio interpretation by machines. Since the notion of what it means for a machine to "hear" is very broad and somewhat vague, computer audition attempts to bring together several disciplines that originally dealt with specific problems or had a concrete application in mind. The engineer Paris Smaragdis, interviewed in Technology Review, talks about these systems — "software that uses sound to locate people moving through rooms, monitor machinery for impending breakdowns, or activate traffic cameras to record accidents." Inspired by models of human audition, CA deals with questions of representation, transduction, grouping, use of musical knowledge and general sound semantics for the purpose of performing intelligent operations on audio and music signals by the computer. Technically this requires a combination of methods from the fields of signal processing, auditory modelling, music perception and cognition, pattern recognition, and machine learning, as well as more traditional methods of artificial intelligence for musical knowledge representation. == Applications == Like computer vision versus image processing, computer audition versus audio engineering deals with understanding of audio rather than processing. It also differs from problems of speech understanding by machine since it deals with general audio signals, such as natural sounds and musical recordings. Applications of computer audition are widely varying, and include search for sounds, genre recognition, acoustic monitoring, music transcription, score following, audio texture, music improvisation, emotion in audio and so on. == Related disciplines == Computer Audition overlaps with the following disciplines: Music information retrieval: methods for search and analysis of similarity between music signals. Auditory scene analysis: understanding and description of audio sources and events. 
Computational musicology and mathematical music theory: use of algorithms that employ musical knowledge for analysis of music data. Computer music: use of computers in creative musical applications. Machine musicianship: audition driven interactive music systems. == Areas of study == Since audio signals are interpreted by the human ear–brain system, that complex perceptual mechanism should be simulated somehow in software for "machine listening". In other words, to perform on par with humans, the computer should hear and understand audio content much as humans do. Analyzing audio accurately involves several fields: electrical engineering (spectrum analysis, filtering, and audio transforms); artificial intelligence (machine learning and sound classification); psychoacoustics (sound perception); cognitive sciences (neuroscience and artificial intelligence); acoustics (physics of sound production); and music (harmony, rhythm, and timbre). Furthermore, audio transformations such as pitch shifting, time stretching, and sound object filtering, should be perceptually and musically meaningful. For best results, these transformations require perceptual understanding of spectral models, high-level feature extraction, and sound analysis/synthesis. Finally, structuring and coding the content of an audio file (sound and metadata) could benefit from efficient compression schemes, which discard inaudible information in the sound. Computational models of music and sound perception and cognition can lead to a more meaningful representation, a more intuitive digital manipulation and generation of sound and music in musical human-machine interfaces. The study of CA could be roughly divided into the following sub-problems: Representation: signal and symbolic. This aspect deals with time-frequency representations, both in terms of notes and spectral models, including pattern playback and audio texture. 
Feature extraction: sound descriptors, segmentation, onset, pitch and envelope detection, chroma, and auditory representations. Musical knowledge structures: analysis of tonality, rhythm, and harmonies. Sound similarity: methods for comparison between sounds, sound identification, novelty detection, segmentation, and clustering. Sequence modeling: matching and alignment between signals and note sequences. Source separation: methods of grouping of simultaneous sounds, such as multiple pitch detection and time-frequency clustering methods. Auditory cognition: modeling of emotions, anticipation and familiarity, auditory surprise, and analysis of musical structure. Multi-modal analysis: finding correspondences between textual, visual, and audio signals. === Representation issues === Computer audition deals with audio signals that can be represented in a variety of fashions, from direct encoding of digital audio in two or more channels to symbolically represented synthesis instructions. Audio signals are usually represented in terms of analogue or digital recordings. Digital recordings are samples of acoustic waveform or parameters of audio compression algorithms. One of the unique properties of musical signals is that they often combine different types of representations, such as graphical scores and sequences of performance actions that are encoded as MIDI files. Since audio signals usually comprise multiple sound sources, then unlike speech signals that can be efficiently described in terms of specific models (such as source-filter model), it is hard to devise a parametric representation for general audio. Parametric audio representations usually use filter banks or sinusoidal models to capture multiple sound parameters, sometimes increasing the representation size in order to capture internal structure in the signal. 
Additional types of data that are relevant for computer audition are textual descriptions of audio contents, such as annotations, reviews, and visual information in the case of audio-visual recordings. === Features === Description of contents of general audio signals usually requires extraction of features that capture specific aspects of the audio signal. Generally speaking, one could divide the features into signal or mathematical descriptors such as energy, description of spectral shape etc., statistical characterization such as change or novelty detection, special representations that are better adapted to the nature of musical signals or the auditory system, such as logarithmic growth of sensitivity (bandwidth) in frequency or octave invariance (chroma). Since parametric models in audio usually require very many parameters, the features are used to summarize properties of multiple parameters in a more compact or salient representation. === Musical knowledge === Finding specific musical structures is possible by using musical knowledge as well as supervised and unsupervised machine learning methods. Examples of this include detection of tonality according to distribution of frequencies that correspond to patterns of occurrence of notes in musical scales, distribution of note onset times for detection of beat structure, distribution of energies in different frequencies to detect musical chords and so on. === Sound similarity and sequence modeling === Comparison of sounds can be done by comparison of features with or without reference to time. In some cases an overall similarity can be assessed by close values of features between two sounds. In other cases when temporal structure is important, methods of dynamic time warping need to be applied to "correct" for different temporal scales of acoustic events. Finding repetitions and similar sub-sequences of sonic events is important for tasks such as texture synthesis and machine improvisation. 
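Dynamic time warping, mentioned above as the way to compare sounds whose events unfold at different speeds, can be sketched with the classic dynamic-programming recurrence. The example below is a minimal version operating on one-dimensional feature sequences with absolute difference as the local cost; real systems typically compare multi-dimensional feature frames and often add path constraints, so the function name and the cost choice are assumptions made for illustration.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The same contour played at half speed aligns with zero cost.
contour = [0.0, 1.0, 2.0, 1.0, 0.0]
stretched = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0]
print(dtw_distance(contour, stretched))  # 0.0
```

A frame-by-frame comparison would report these two sequences as different, whereas the warped alignment "corrects" for the different temporal scale.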
=== Source separation === Since one of the basic characteristics of general audio is that it comprises multiple simultaneously sounding sources, such as multiple musical instruments, people talking, machine noises or animal vocalization, the ability to identify and separate individual sources is very desirable. Unfortunately, there are no methods that can solve this problem in a robust fashion. Existing methods of source separation rely sometimes on correlation between different audio channels in multi-channel recordings. The ability to separate sources from stereo signals requires different techniques than those usually applied in communications where multiple sensors are available. Other source separation methods rely on training or clustering of features in mono recording, such as tracking harmonically related partials for multiple pitch detection. Some methods, before explicit recognition, rely on revealing structures in data without knowing the structures (like recognizing objects in abstract pictures without attributing them meaningful labels) by finding the least complex data representations, for instance describing audio scenes as generated by a few tone patterns and their trajectories (polyphonic voices) and acoustical contours drawn by a tone (c

  • Cognitive robotics


    Cognitive robotics or cognitive technology is a subfield of robotics concerned with endowing a robot with intelligent behavior by providing it with a processing architecture that will allow it to learn and reason about how to behave in response to complex goals in a complex world. Cognitive robotics may be considered the engineering branch of embodied cognitive science and embodied embedded cognition, consisting of robotic process automation, artificial intelligence, machine learning, deep learning, optical character recognition, image processing, process mining, analytics, software development and system integration. == Core issues == While traditional cognitive modeling approaches have assumed symbolic coding schemes as a means for depicting the world, translating the world into these kinds of symbolic representations has proven to be problematic if not untenable. Perception and action and the notion of symbolic representation are therefore core issues to be addressed in cognitive robotics. == Starting point == Cognitive robotics views human or animal cognition as a starting point for the development of robotic information processing, as opposed to more traditional artificial intelligence techniques. Target robotic cognitive capabilities include perception processing, attention allocation, anticipation, planning, complex motor coordination, reasoning about other agents and perhaps even about their own mental states. Robotic cognition embodies the behavior of intelligent agents in the physical world (or a virtual world, in the case of simulated cognitive robotics). Ultimately, the robot must be able to act in the real world. == Learning techniques == === Motor Babble === A preliminary robot learning technique called motor babbling involves correlating pseudo-random complex motor movements by the robot with resulting visual and/or auditory feedback such that the robot may begin to expect a pattern of sensory feedback given a pattern of motor output. 
Desired sensory feedback may then be used to inform a motor control signal. This is thought to be analogous to how a baby learns to reach for objects or learns to produce speech sounds. For simpler robot systems, where, for instance, inverse kinematics may feasibly be used to transform anticipated feedback (desired motor result) into motor output, this step may be skipped. === Imitation === Once a robot can coordinate its motors to produce a desired result, the technique of learning by imitation may be used. The robot monitors the performance of another agent and then tries to imitate that agent. It is often a challenge to transform imitation information from a complex scene into a desired motor result for the robot. Note that imitation is a high-level form of cognitive behavior and is not necessarily required in a basic model of embodied animal cognition. === Knowledge acquisition === A more complex learning approach is "autonomous knowledge acquisition": the robot is left to explore the environment on its own. A system of goals and beliefs is typically assumed. A somewhat more directed mode of exploration can be achieved by "curiosity" algorithms, such as Intelligent Adaptive Curiosity or Category-Based Intrinsic Motivation. These algorithms generally involve breaking sensory input into a finite number of categories and assigning some sort of prediction system (such as an artificial neural network) to each. The prediction system keeps track of the error in its predictions over time. Reduction in prediction error is considered learning. The robot then preferentially explores categories in which it is learning (or reducing prediction error) the fastest. == Other architectures == Some researchers in cognitive robotics have tried using cognitive architectures such as ACT-R and Soar as a basis of their cognitive robotics programs. 
These highly modular symbol-processing architectures have been used to simulate operator performance and human performance when modeling simplistic and symbolized laboratory data. The idea is to extend these architectures to handle real-world sensory input as that input continuously unfolds through time. What is needed is a way to somehow translate the world into a set of symbols and their relationships. == Questions == Some of the fundamental questions to be answered in cognitive robotics are: How much human programming should or can be involved to support the learning processes? How can one quantify progress? Some of the adopted ways are reward and punishment. But what kind of reward and what kind of punishment? In humans, when teaching a child, for example, the reward would be candy or some encouragement, and the punishment can take many forms. But what is an effective way with robots?
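The curiosity mechanism described under knowledge acquisition (one predictor per sensory category, with exploration directed toward the category whose prediction error is falling fastest) can be sketched in Python as follows. This is a deliberately simplified illustration and not an implementation of Intelligent Adaptive Curiosity or Category-Based Intrinsic Motivation. The `CategoryPredictor` class, the moving-average predictor standing in for a neural network, the window size, and the three toy categories are all assumptions made for the example.

```python
import itertools


class CategoryPredictor:
    """Moving-average predictor for one sensory category (stand-in for a neural network)."""

    def __init__(self, rate=0.2):
        self.rate = rate
        self.estimate = 0.0
        self.errors = []  # history of absolute prediction errors

    def observe(self, value):
        self.errors.append(abs(value - self.estimate))
        self.estimate += self.rate * (value - self.estimate)

    def learning_progress(self, window=5):
        """Drop in mean error from the previous window to the most recent one."""
        if len(self.errors) < 2 * window:
            return float("inf")  # under-sampled categories are tried first
        older = sum(self.errors[-2 * window:-window]) / window
        recent = sum(self.errors[-window:]) / window
        return older - recent


def explore(environment, steps=60):
    """Repeatedly visit the category with the highest learning progress."""
    predictors = {name: CategoryPredictor() for name in environment}
    visits = {name: 0 for name in environment}
    for _ in range(steps):
        name = max(predictors, key=lambda n: predictors[n].learning_progress())
        predictors[name].observe(next(environment[name]))
        visits[name] += 1
    return visits


# Hypothetical sensory categories: one learnable, one unpredictable, one already mastered.
environment = {
    "learnable": itertools.repeat(1.0),
    "noise": itertools.cycle([1.0, -1.0]),
    "known": itertools.repeat(0.0),
}
visits = explore(environment)
```

After an initial sampling phase, the robot in this sketch spends most of its remaining steps on the "learnable" category. The unpredictable category never shows a sustained error reduction, and the already mastered one has no error left to reduce.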

  • Geo-replication


    Geo-replication systems are designed to provide improved availability and disaster tolerance by using geographically distributed data centers. This is intended to improve the response time for applications such as web portals. Geo-replication can be achieved using software, hardware or a combination of the two. == Software == Geo-replication software is a network performance-enhancing technology that is designed to provide improved access to portal or intranet content for users at the most remote parts of large organizations. It is based on the principle of storing complete replicas of portal content on local servers, and then keeping the content on those servers up-to-date using heavily compressed data updates. === Portal acceleration === Geo-replication technologies are used to replicate the content of portals, intranets, web applications, content and data between servers across wide area networks (WANs), allowing users at remote sites to access central content at LAN speeds. Geo-replication software can improve the performance of data networks that suffer from limited bandwidth, latency and periodic disconnection. Terabytes of data can be replicated over a wide area network, giving remote sites faster access to web applications. Geo-replication software uses a combination of data compression and content caching technologies. Differencing technologies can also be employed to reduce the volume of data that has to be transmitted to keep portal content accurate across all servers. This update compression can reduce the load that portal traffic places on networks, and improve the response time of a portal. === Portal replication === Remote users of web portals and collaboration environments will frequently experience network bandwidth and latency problems which will slow down their experience of opening and closing files, and otherwise interacting with the portal. 
Geo-replication technology is deployed to accelerate the remote end user portal performance to be equivalent to that experienced by users locally accessing the portal in the central office. === Differencing engine technologies === To deliver this reduction in the size of the required data updates across a portal, geo-replication systems often use differencing engine technologies. These systems are able to difference the content of each portal server right down to the byte level. This knowledge of the content that is already on each server enables the system to rebuild any changes to the content on one server, across each of the other servers in the deployment from content already hosted on those other servers. This type of differencing system ensures that no content, at the byte level, is ever sent to a server twice. === Offline portal replication on laptops === Geo-replication systems are often extended to deliver local replication beyond the server and down to the laptop used by a single user. Server to laptop replication enables mobile users to have access to a local replica of their business portal on a standard laptop. This technology may be employed to provide in the field access to portal content by, for example, sales forces and combat forces. == Geo-replication systems ==
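The differencing idea described above, in which a change is rebuilt on a remote server from content it already hosts so that only new bytes cross the network, can be illustrated with a small Python sketch. It is a simplified block-matching scheme written for this example, not the algorithm of any particular geo-replication product. The function names `make_delta` and `apply_delta`, the 4-byte block size, and the operation tuples are all assumptions.

```python
def make_delta(old: bytes, new: bytes, block: int = 4):
    """Describe `new` as copies of blocks already present in `old` plus literal bytes."""
    # Index every aligned block of the old content by its bytes.
    index = {}
    for offset in range(0, len(old) - block + 1, block):
        index.setdefault(old[offset:offset + block], offset)

    ops, literal, i = [], bytearray(), 0
    while i < len(new):
        chunk = new[i:i + block]
        offset = index.get(chunk) if len(chunk) == block else None
        if offset is not None:
            if literal:  # flush pending new bytes before the copy instruction
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", offset, block))
            i += block
        else:
            literal.append(new[i])  # byte not available remotely, must be sent
            i += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(old: bytes, ops) -> bytes:
    """Rebuild the new content on a server that already holds `old`."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


old = b"The quick brown fox jumps over the lazy dog."
new = b"The quick red fox jumps over the lazy dog!"
delta = make_delta(old, new)
rebuilt = apply_delta(old, delta)
bytes_sent = sum(len(op[1]) for op in delta if op[0] == "data")
```

In this example the remote server reconstructs the edited sentence while receiving only the literal bytes around the two changes. Production systems typically use rolling checksums and strong hashes instead of comparing raw blocks, so that matching stays cheap on large files.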

  • Data Science and Predictive Analytics


    The first edition of the textbook Data Science and Predictive Analytics: Biomedical and Health Applications using R, authored by Ivo D. Dinov, was published in August 2018 by Springer. The second edition of the book was printed in 2023. This textbook covers some of the core mathematical foundations, computational techniques, and artificial intelligence approaches used in data science research and applications. By using the statistical computing platform R and a broad range of biomedical case-studies, the 23 chapters of the book's first edition provide explicit examples of importing, exporting, processing, modeling, visualizing, and interpreting large, multivariate, incomplete, heterogeneous, and longitudinal datasets (big data). == Structure == === First edition table of contents === The first edition of the Data Science and Predictive Analytics (DSPA) textbook is divided into 23 chapters, each progressively building on the previous content. === Second edition table of contents === The significantly reorganized revised edition of the book (2023) expands and modernizes the presented mathematical principles, computational methods, data science techniques, model-based machine learning and model-free artificial intelligence algorithms. The 14 chapters of the new edition start with an introduction and progressively build foundational skills to naturally reach biomedical applications of deep learning. 
1. Introduction
2. Basic Visualization and Exploratory Data Analytics
3. Linear Algebra, Matrix Computing, and Regression Modeling
4. Linear and Nonlinear Dimensionality Reduction
5. Supervised Classification
6. Black Box Machine Learning Methods
7. Qualitative Learning Methods—Text Mining, Natural Language Processing, and Apriori Association Rules Learning
8. Unsupervised Clustering
9. Model Performance Assessment, Validation, and Improvement
10. Specialized Machine Learning Topics
11. Variable Importance and Feature Selection
12. Big Longitudinal Data Analysis
13. Function Optimization
14. Deep Learning, Neural Networks

== Reception == The materials in the Data Science and Predictive Analytics (DSPA) textbook have been peer-reviewed in the Journal of the American Statistical Association, International Statistical Institute’s ISI Review Journal, and the Journal of the American Library Association. Many scholarly publications reference the DSPA textbook. As of January 17, 2021, the electronic version of the book's first edition (ISBN 978-3-319-72347-1) is freely available on SpringerLink and has been downloaded over 6 million times. The textbook is globally available in print (hardcover and softcover) and electronic formats (PDF and EPub) in many college and university libraries and has been used for data science, computational statistics, and analytics classes at various institutions.

  • AI Overviews


    AI Overviews is an artificial intelligence (AI) feature integrated into Google Search that produces AI-generated summaries of search results. The feature has been criticized for its inaccuracy and for reducing website traffic. == History and development == AI Overviews were first introduced as part of Google's Search Generative Experience (SGE), which was unveiled at the Google I/O conference in May 2023. In May 2024 at Google I/O 2024, the feature was rebranded as AI Overviews and launched in the United States. The introduction of AI Overviews was seen as a strategic move to compete with other generative AI advancements, including OpenAI's ChatGPT. By August 2024, AI Overviews was rolled out to several other countries, including the United Kingdom, India, Japan, Brazil, Mexico, and Indonesia, with support for multiple languages. In October 2024, Google expanded the feature globally, making it available in over 100 countries. In December 2024, Botify x Demandsphere released findings stating that when AI Overviews and featured snippets appear together on the search engine results page, they take up approximately 67.1% of the screen on desktop and 75.7% on mobile. Even if content is ranking in the #1 position, it may not be visible to consumers if other visual elements on the results page are more prominent. In March 2025, Google started testing an "AI Mode", where the search results page is AI-generated. The company was also considering adding advertisements to the AI Mode, as they already exist in AI Overviews. As of May 2025, AI Overviews are available in over 200 countries and territories and in more than 40 languages. As of March 2026, Google AI Overviews appear on more than 48% of total Google Search queries, compared to just 6.49% in the previous year (58% year-over-year growth). == Functionality == The AI Overviews feature uses large language models to generate summaries from web content. 
The overviews are designed to be concise, providing a snapshot of relevant information about the queried topic. Google allows users to adjust the language complexity in summaries, offering both simplified and detailed options. The overviews also include links to sources. According to a June 2025 study by Semrush, the most cited source is Quora, followed by Reddit. == Reception == The feature has faced criticism for inaccuracies, including instances where erroneous or nonsensical content was generated. Depending on what is searched for, the overview may also consist of hallucinated content, such as when searching for idioms that do not exist. In May 2024, Google temporarily restricted the AI tool after it provided suggestions that were seen as nonsensical and harmful, such as telling users to eat rocks or apply glue to pizza. Concerns were also raised by content publishers, who feared a decline in web traffic as users relied on the summaries instead of visiting source websites. A Google patent from 2026 raised concern among webmasters that Google could entirely replace websites' landing pages with an AI-optimized copy of the website in its results. There is also apprehension about the ethical implications of AI-driven content aggregation, including its impact on intellectual property rights and the visibility of smaller content providers. The European Commission announced in December 2025 that it was investigating whether AI Overviews breached European competition law. In response, Google has stated its commitment to improve content validation and refine the algorithms used to filter unreliable information. Google implemented measures to prioritize link placement within AI Overviews, aiming to balance user convenience with the needs of content creators. In January 2026, Google restricted AI Overviews on certain health-related searches following an investigation by The Guardian. 
== Lawsuits == On February 24, 2025, Chegg sued Alphabet over the AI Overviews feature, claiming that it was leading to students preferring "low-quality, unverified AI summaries", thus violating antitrust law. Chegg also said it was considering either a sale or a take-private transaction. In September 2025, Penske Media Corporation, the publisher of Rolling Stone and The Hollywood Reporter, sued Google, claiming that AI Overviews illegally regurgitate content from their websites and drive off potential site visitors by always appearing on top of the search results while leaving little incentive to see the linked sources. The company stated that "the future of digital media and [...] its integrity [...] is threatened by Google's current actions", alleging that 20% of searches that link to Penske-owned websites show AI Overviews and that the figure is expected to rise. Google spokesperson José Castañeda called the claims "meritless" and stated that "AI Overviews send traffic to a greater diversity of sites." In 2026, Canadian musician Ashley MacIsaac filed a lawsuit against Google claiming that the AI Overview feature had wrongly stated that MacIsaac had been convicted of numerous criminal offences and was on the sex offender registry. He claims this incorrect information led to the cancellation of a December 2025 gig organized by the Sipekne'katik First Nation.
