AI App Quora

AI App Quora — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Real-time computer graphics

    Real-time computer graphics

    Real-time computer graphics or real-time rendering is the sub-field of computer graphics focused on producing and analyzing images in real time. The term can refer to anything from rendering an application's graphical user interface (GUI) to real-time image analysis, but is most often used in reference to interactive 3D computer graphics, typically using a graphics processing unit (GPU). One example of this concept is a video game that rapidly renders changing 3D environments to produce an illusion of motion. Computers have been capable of generating 2D images such as simple lines, images and polygons in real time since their invention. However, quickly rendering detailed 3D objects is a daunting task for traditional Von Neumann architecture-based systems. An early workaround to this problem was the use of sprites, 2D images that could imitate 3D graphics. Different techniques for rendering now exist, such as ray-tracing and rasterization. Using these techniques and advanced hardware, computers can now render images quickly enough to create the illusion of motion while simultaneously accepting user input. This means that the user can respond to rendered images in real time, producing an interactive experience. == Principles of real-time 3D computer graphics == The goal of computer graphics is to generate computer-generated images, or frames, using certain desired metrics. One such metric is the number of frames generated in a given second. Real-time computer graphics systems differ from traditional (i.e., non-real-time) rendering systems in that non-real-time graphics typically rely on ray tracing. In this process, millions or billions of rays are traced from the camera to the world for detailed rendering—this expensive operation can take hours or days to render a single frame. Real-time graphics systems must render each image in less than 1/30th of a second. Ray tracing is far too slow for these systems; instead, they employ the technique of z-buffer triangle rasterization. In this technique, every object is decomposed into individual primitives, usually triangles. Each triangle gets positioned, rotated and scaled on the screen, and rasterizer hardware (or a software emulator) generates pixels inside each triangle. These triangles are then decomposed into atomic units called fragments that are suitable for displaying on a display screen. The fragments are drawn on the screen using a color that is computed in several steps. For example, a texture can be used to "paint" a triangle based on a stored image, and then shadow mapping can alter that triangle's colors based on line-of-sight to light sources. === Video game graphics === Real-time graphics optimizes image quality subject to time and hardware constraints. GPUs and other advances increased the image quality that real-time graphics can produce. GPUs are capable of handling millions of triangles per frame, and modern DirectX/OpenGL class hardware is capable of generating complex effects, such as shadow volumes, motion blurring, and triangle generation, in real-time. The advancement of real-time graphics is evidenced in the progressive improvements between actual gameplay graphics and the pre-rendered cutscenes traditionally found in video games. Cutscenes are typically rendered in real-time—and may be interactive. Although the gap in quality between real-time graphics and traditional off-line graphics is narrowing, offline rendering remains much more accurate. === Advantages === Real-time graphics are typically employed when interactivity (e.g., player feedback) is crucial. When real-time graphics are used in films, the director has complete control of what has to be drawn on each frame, which can sometimes involve lengthy decision-making. Teams of people are typically involved in the making of these decisions. In real-time computer graphics, the user typically operates an input device to influence what is about to be drawn on the display. For example, when the user wants to move a character on the screen, the system updates the character's position before drawing the next frame. Usually, the display's response-time is far slower than the input device—this is justified by the immense difference between the (fast) response time of a human being's motion and the (slow) perspective speed of the human visual system. This difference has other effects too: because input devices must be very fast to keep up with human motion response, advancements in input devices (e.g., the current Wii remote) typically take much longer to achieve than comparable advancements in display devices. Another important factor controlling real-time computer graphics is the combination of physics and animation. These techniques largely dictate what is to be drawn on the screen—especially where to draw objects in the scene. These techniques help realistically imitate real world behavior (the temporal dimension, not the spatial dimensions), adding to the computer graphics' degree of realism. Real-time previewing with graphics software, especially when adjusting lighting effects, can increase work speed. Some parameter adjustments in fractal generating software may be made while viewing changes to the image in real time. == Rendering pipeline == The graphics rendering pipeline ("rendering pipeline" or simply "pipeline") is the foundation of real-time graphics. Its main function is to render a two-dimensional image in relation to a virtual camera, three-dimensional objects (an object that has width, length, and depth), light sources, lighting models, textures and more. === Architecture === The architecture of the real-time rendering pipeline can be divided into conceptual stages: application, geometry and rasterization. === Application stage === The application stage is responsible for generating "scenes", or 3D settings that are drawn to a 2D display. This stage is implemented in software that developers optimize for performance. This stage may perform processing such as collision detection, speed-up techniques, animation and force feedback, in addition to handling user input. Collision detection is an example of an operation that would be performed in the application stage. Collision detection uses algorithms to detect and respond to collisions between (virtual) objects. For example, the application may calculate new positions for the colliding objects and provide feedback via a force feedback device such as a vibrating game controller. The application stage also prepares graphics data for the next stage. This includes texture animation, animation of 3D models, animation via transforms, and geometry morphing. Finally, it produces primitives (points, lines, and triangles) based on scene information and feeds those primitives into the geometry stage of the pipeline. === Geometry stage === The geometry stage manipulates polygons and vertices to compute what to draw, how to draw it and where to draw it. Usually, these operations are performed by specialized hardware or GPUs. Variations across graphics hardware mean that the "geometry stage" may actually be implemented as several consecutive stages. ==== Model and view transformation ==== Before the final model is shown on the output device, the model is transformed onto multiple spaces or coordinate systems. Transformations move and manipulate objects by altering their vertices. Transformation is the general term for the four specific ways that manipulate the shape or position of a point, line or shape. ==== Lighting ==== In order to give the model a more realistic appearance, one or more light sources are usually established during transformation. However, this stage cannot be reached without first transforming the 3D scene into view space. In view space, the observer (camera) is typically placed at the origin. If using a right-handed coordinate system (which is considered standard), the observer looks in the direction of the negative z-axis with the y-axis pointing upwards and the x-axis pointing to the right. ==== Projection ==== Projection is a transformation used to represent a 3D model in a 2D space. The two main types of projection are orthographic projection (also called parallel) and perspective projection. The main characteristic of an orthographic projection is that parallel lines remain parallel after the transformation. Perspective projection utilizes the concept that if the distance between the observer and model increases, the model appears smaller than before. Essentially, perspective projection mimics human sight. ==== Clipping ==== Clipping is the process of removing primitives that are outside of the view box in order to facilitate the rasterizer stage. Once those primitives are removed, the primitives that remain will be drawn into new triangles that reach the next stage. ==== Screen mapping ==== The purpose of screen mapping is to find out the coordinates of the primitives during the clipping stage. ==== Rasterizer stage ==== The rasterizer

    Read more →
  • Cognitive computer

    Cognitive computer

    A cognitive computer is a computer that hardwires artificial intelligence and machine learning algorithms into an integrated circuit that closely reproduces the behavior of the human brain. It generally adopts a neuromorphic engineering approach. Synonyms include neuromorphic chip and cognitive chip. In 2023, IBM's proof-of-concept NorthPole chip (optimized for 2-, 4- and 8-bit precision) achieved remarkable performance in image recognition. In 2013, IBM developed Watson, a cognitive computer that uses neural networks and deep learning techniques. The following year, it developed the 2014 TrueNorth microchip architecture which is designed to be closer in structure to the human brain than the von Neumann architecture used in conventional computers. In 2017, Intel also announced its version of a cognitive chip in "Loihi, which it intended to be available to university and research labs in 2018. Intel (most notably with its Pohoiki Beach and Springs systems), Qualcomm, and others are improving neuromorphic processors steadily. == IBM TrueNorth chip == TrueNorth was a neuromorphic CMOS integrated circuit produced by IBM in 2014. It is a manycore processor network on a chip design, with 4096 cores, each one having 256 programmable simulated neurons for a total of just over a million neurons. In turn, each neuron has 256 programmable "synapses" that convey the signals between them. Hence, the total number of programmable synapses is just over 268 million (228). Its basic transistor count is 5.4 billion. In 2023 Zhejiang University and Alibaba developed Darwin a neuromorphic chip The darwin3 chip was designed around 2023 so it is fairly modern compared to IBM's TrueNorth or Intel's LoihI. === Details === Memory, computation, and communication are handled in each of the 4096 neurosynaptic cores, TrueNorth circumvents the von Neumann-architecture bottleneck and is very energy-efficient, with IBM claiming a power consumption of 70 milliwatts and a power density that is 1/10,000th of conventional microprocessors. The SyNAPSE chip operates at lower temperatures and power because it only draws power necessary for computation. Skyrmions have been proposed as models of the synapse on a chip. The neurons are emulated using a Linear-Leak Integrate-and-Fire (LLIF) model, a simplification of the leaky integrate-and-fire model. According to IBM, it does not have a clock, operates on unary numbers, and computes by counting to a maximum of 19 bits. The cores are event-driven by using both synchronous and asynchronous logic, and are interconnected through an asynchronous packet-switched mesh network on chip (NOC). IBM developed a new network to program and use TrueNorth. It included a simulator, a new programming language, an integrated programming environment, and libraries. This lack of backward compatibility with any previous technology (e.g., C++ compilers) poses serious vendor lock-in risks and other adverse consequences that may prevent it from commercialization in the future. === Research === In 2018, a cluster of TrueNorth network-linked to a master computer was used in stereo vision research that attempted to extract the depth of rapidly moving objects in a scene. == IBM NorthPole chip == In 2023, IBM released its NorthPole chip, which is a proof-of-concept for dramatically improving performance by intertwining compute with memory on-chip, thus eliminating the Von Neumann bottleneck. It blends approaches from IBM's 2014 TrueNorth system with modern hardware designs to achieve speeds about 4,000 times faster than TrueNorth. It can run ResNet-50 or Yolo-v4 image recognition tasks about 22 times faster, with 25 times less energy and 5 times less space, when compared to GPUs which use the same 12-nm node process that it was fabricated with. It includes 224 MB of RAM and 256 processor cores and can perform 2,048 operations per core per cycle at 8-bit precision, and 8,192 operations at 2-bit precision. It runs at between 25 and 425 MHz. This is an inferencing chip, but it cannot yet handle GPT-4 because of memory and accuracy limitations == Intel Loihi chip == === Pohoiki Springs === Pohoiki Springs is a system that incorporates Intel's self-learning neuromorphic chip, named Loihi, introduced in 2017, perhaps named after the Hawaiian seamount Lōʻihi. Intel claims Loihi is about 1000 times more energy efficient than general-purpose computing systems used to train neural networks. In theory, Loihi supports both machine learning training and inference on the same silicon independently of a cloud connection, and more efficiently than convolutional neural networks or deep learning neural networks. Intel points to a system for monitoring a person's heartbeat, taking readings after events such as exercise or eating, and using the chip to normalize the data and work out the ‘normal’ heartbeat. It can then spot abnormalities and deal with new events or conditions. The first iteration of the chip was made using Intel's 14 nm fabrication process and houses 128 clusters of 1,024 artificial neurons each for a total of 131,072 simulated neurons. This offers around 130 million synapses, far less than the human brain's 800 trillion synapses, and behind IBM's TrueNorth. Loihi is available for research purposes among more than 40 academic research groups as a USB form factor. In October 2019, researchers from Rutgers University published a research paper to demonstrate the energy efficiency of Intel's Loihi in solving simultaneous localization and mapping. In March 2020, Intel and Cornell University published a research paper to demonstrate the ability of Intel's Loihi to recognize different hazardous materials, which could eventually aid to "diagnose diseases, detect weapons and explosives, find narcotics, and spot signs of smoke and carbon monoxide". === Pohoiki Beach === Intel's Loihi 2, named Pohoiki Beach, was released in September 2021 with 64 cores. It boasts faster speeds, higher-bandwidth inter-chip communications for enhanced scalability, increased capacity per chip, a more compact size due to process scaling, and improved programmability. === Hala Point === Hala Point packages 1,152 Loihi 2 processors produced on Intel 3 process node in a six-rack-unit chassis. The system supports up to 1.15 billion neurons and 128 billion synapses distributed over 140,544 neuromorphic processing cores, consuming 2,600 watts of power. It includes over 2,300 embedded x86 processors for ancillary computations. Intel claimed in 2024 that Hala Point was the world’s largest neuromorphic system. It uses Loihi 2 chips. It is claimed to offer 10x more neuron capacity and up to 12x higher performance. The Darwin3 chip exceeds these specs. Hala Point provides up to 20 quadrillion operations per second, (20 petaops), with efficiency exceeding 15 trillion (8-bit) operations s−1 W−1 on conventional deep neural networks. Hala Point integrates processing, memory and communication channels in a massively parallelized fabric, providing 16 PB s−1 of memory bandwidth, 3.5 PB s−1 of inter-core communication bandwidth, and 5 TB s−1 of inter-chip bandwidth. The system can process its 1.15 billion neurons 20 times faster than a human brain. Its neuron capacity is roughly equivalent to that of an owl brain or the cortex of a capuchin monkey. Loihi-based systems can perform inference and optimization using 100 times less energy at speeds as much as 50 times faster than CPU/GPU architectures. Intel claims that Hala Point can create LLMs. Much further research is needed == SpiNNaker == SpiNNaker (Spiking Neural Network Architecture) is a massively parallel, manycore supercomputer architecture designed by the Advanced Processor Technologies Research Group at the Department of Computer Science, University of Manchester. == Criticism == Critics argue that a room-sized computer – as in the case of IBM's Watson – is not a viable alternative to a three-pound human brain. Some also cite the difficulty for a single system to bring so many elements together, such as the disparate sources of information as well as computing resources. In 2021, The New York Times released Steve Lohr's article "What Ever Happened to IBM’s Watson?". He wrote about some costly failures of IBM Watson. One of them, a cancer-related project called the Oncology Expert Advisor, was abandoned in 2016 as a costly failure. During the collaboration, Watson could not use patient data. Watson struggled to decipher doctors’ notes and patient histories. The development of LLMs has placed a new emphasis on cognitive computers, because the Transformer technology that underpins LLMs demands huge energy for GPUs and PCs. Cognitive computers use significantly less energy, but the details of STDPs and neuron models cannot yet match the accuracy of backprop, and so ANN to SNN weight translations such as QAT and PQT or progressive quantization are becoming popular, with their own limitations.

    Read more →
  • Regularization perspectives on support vector machines

    Regularization perspectives on support vector machines

    Within mathematical analysis, Regularization perspectives on support-vector machines provide a way of interpreting support-vector machines (SVMs) in the context of other regularization-based machine-learning algorithms. SVM algorithms categorize binary data, with the goal of fitting the training set data in a way that minimizes the average of the hinge-loss function and L2 norm of the learned weights. This strategy avoids overfitting via Tikhonov regularization and in the L2 norm sense and also corresponds to minimizing the bias and variance of our estimator of the weights. Estimators with lower Mean squared error predict better or generalize better when given unseen data. Specifically, Tikhonov regularization algorithms produce a decision boundary that minimizes the average training-set error and constrain the Decision boundary not to be excessively complicated or overfit the training data via a L2 norm of the weights term. The training and test-set errors can be measured without bias and in a fair way using accuracy, precision, Auc-Roc, precision-recall, and other metrics. Regularization perspectives on support-vector machines interpret SVM as a special case of Tikhonov regularization, specifically Tikhonov regularization with the hinge loss for a loss function. This provides a theoretical framework with which to analyze SVM algorithms and compare them to other algorithms with the same goals: to generalize without overfitting. SVM was first proposed in 1995 by Corinna Cortes and Vladimir Vapnik, and framed geometrically as a method for finding hyperplanes that can separate multidimensional data into two categories. This traditional geometric interpretation of SVMs provides useful intuition about how SVMs work, but is difficult to relate to other machine-learning techniques for avoiding overfitting, like regularization, early stopping, sparsity and Bayesian inference. However, once it was discovered that SVM is also a special case of Tikhonov regularization, regularization perspectives on SVM provided the theory necessary to fit SVM within a broader class of algorithms. This has enabled detailed comparisons between SVM and other forms of Tikhonov regularization, and theoretical grounding for why it is beneficial to use SVM's loss function, the hinge loss. == Theoretical background == In the statistical learning theory framework, an algorithm is a strategy for choosing a function f : X → Y {\displaystyle f\colon \mathbf {X} \to \mathbf {Y} } given a training set S = { ( x 1 , y 1 ) , … , ( x n , y n ) } {\displaystyle S=\{(x_{1},y_{1}),\ldots ,(x_{n},y_{n})\}} of inputs x i {\displaystyle x_{i}} and their labels y i {\displaystyle y_{i}} (the labels are usually ± 1 {\displaystyle \pm 1} ). Regularization strategies avoid overfitting by choosing a function that fits the data, but is not too complex. Specifically: f = argmin f ∈ H { 1 n ∑ i = 1 n V ( y i , f ( x i ) ) + λ ‖ f ‖ H 2 } , {\displaystyle f={\underset {f\in {\mathcal {H}}}{\operatorname {argmin} }}\left\{{\frac {1}{n}}\sum _{i=1}^{n}V(y_{i},f(x_{i}))+\lambda \|f\|_{\mathcal {H}}^{2}\right\},} where H {\displaystyle {\mathcal {H}}} is a hypothesis space of functions, V : Y × Y → R {\displaystyle V\colon \mathbf {Y} \times \mathbf {Y} \to \mathbb {R} } is the loss function, ‖ ⋅ ‖ H {\displaystyle \|\cdot \|_{\mathcal {H}}} is a norm on the hypothesis space of functions, and λ ∈ R {\displaystyle \lambda \in \mathbb {R} } is the regularization parameter. When H {\displaystyle {\mathcal {H}}} is a reproducing kernel Hilbert space, there exists a kernel function K : X × X → R {\displaystyle K\colon \mathbf {X} \times \mathbf {X} \to \mathbb {R} } that can be written as an n × n {\displaystyle n\times n} symmetric positive-definite matrix K {\displaystyle \mathbf {K} } . By the representer theorem, f ( x i ) = ∑ j = 1 n c j K i j , and ‖ f ‖ H 2 = ⟨ f , f ⟩ H = ∑ i = 1 n ∑ j = 1 n c i c j K ( x i , x j ) = c T K c . {\displaystyle f(x_{i})=\sum _{j=1}^{n}c_{j}\mathbf {K} _{ij},{\text{ and }}\|f\|_{\mathcal {H}}^{2}=\langle f,f\rangle _{\mathcal {H}}=\sum _{i=1}^{n}\sum _{j=1}^{n}c_{i}c_{j}K(x_{i},x_{j})=c^{T}\mathbf {K} c.} == Special properties of the hinge loss == The simplest and most intuitive loss function for categorization is the misclassification loss, or 0–1 loss, which is 0 if f ( x i ) = y i {\displaystyle f(x_{i})=y_{i}} and 1 if f ( x i ) ≠ y i {\displaystyle f(x_{i})\neq y_{i}} , i.e. the Heaviside step function on − y i f ( x i ) {\displaystyle -y_{i}f(x_{i})} . However, this loss function is not convex, which makes the regularization problem very difficult to minimize computationally. Therefore, we look for convex substitutes for the 0–1 loss. The hinge loss, V ( y i , f ( x i ) ) = ( 1 − y f ( x ) ) + {\displaystyle V{\big (}y_{i},f(x_{i}){\big )}={\big (}1-yf(x){\big )}_{+}} , where ( s ) + = max ( s , 0 ) {\displaystyle (s)_{+}=\max(s,0)} , provides such a convex relaxation. In fact, the hinge loss is the tightest convex upper bound to the 0–1 misclassification loss function, and with infinite data returns the Bayes-optimal solution: f b ( x ) = { 1 , p ( 1 ∣ x ) > p ( − 1 ∣ x ) , − 1 , p ( 1 ∣ x ) < p ( − 1 ∣ x ) . {\displaystyle f_{b}(x)={\begin{cases}1,&p(1\mid x)>p(-1\mid x),\\-1,&p(1\mid x) Read more →

  • How to Choose an AI Image Generator

    How to Choose an AI Image Generator

    Shopping for the best AI image generator? An AI image generator is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI image generator slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.

    Read more →
  • Enonic XP

    Enonic XP

    Enonic XP is a free and open-source content platform. Developed by the Norwegian software company Enonic, the platform can be used to build websites, progressive web applications, or web-based APIs. Enonic XP uses an application framework for coding server logic with JavaScript, and has no need for SQL as it ships with an integrated content repository. The CMS is fully decoupled, meaning developers can create traditional websites and landing pages, or use XP in headless mode, that is without the presentation layer, for loading editorial content onto any device or client. Enonic is used by major organizations in Norway, including the national postal service Norway Post, the insurance company Gjensidige, the Norwegian Labour and Welfare Administration, and all the top football clubs in the national football league for men, Eliteserien. == Overview == Enonic XP ships with the content management system (CMS) Content Studio. This includes a visual drag and drop editor, a landing page editor, support for multi-site and multi-language, media and structured content, advanced image editing, responsive user interface, permissions and roles management, revision and version control, and bulk publishing. Integrations and applications can be directly installed via the "Applications" section in XP, where the platform finds apps approved in the official Enonic Market. There are no third-party databases in Enonic XP. Instead, the developers have built a distributed storage repository, avoiding the need to index content. The system brings together capabilities from Filesystem, NoSQL, document stores, and search in the storage technology, which automatically indexes everything put into the storage. Enonic XP supports deployment of server side JavaScript. The open-source framework runs on top of a JVM (Java virtual machine), and allows developers to run the same code in the browser and on the server, thus enabling them to employ JavaScript. While running on the Java virtual machine, Enonic XP can be deployed on most infrastructures. The dependency on a third-party application server to deploy code has been removed, as the platform is an application server by default. A developer can for instance insert his own modules and code straight into the system while it is running. JavaScript unifies all the technical elements, and Enonic XP features a MVC framework where everything on the back-end can be coded with server-side JavaScript. The Enonic platform can use any template engine. === Progressive web apps === Another feature of Enonic XP is the possibility for developers to create progressive web apps (PWA). A PWA is a web application that is a regular web page or website, but can appear to the user like a mobile application. === Headless CMS and integrations === Enonic XP is headless, which means it separates content and presentation. The platform supports GraphQL, provides several default APIs, and allows for building custom APIs through the Guillotine starter kit. Consequently, Enonic supports modern front-end frameworks, and offers integrations with e.g. Next.js and React. == History == Enonic AS was founded in 2000 by Morten Øien Eriksen and Thomas Sigdestad. The software company specialized in building services and solutions, including a content management system known as "Vertical Site", then "Enonic CMS". Being aware that they had application, database, and website teams working on separate silos toward the same goal, Enonic sought to combine the different elements into a single software. The resulting application platform Enonic XP, first released in 2015, includes a CMS as an optional surface layer. In March 2020, Enonic XP was ranked by SoftwareReviews, a division of Info-Tech Research Group, a Canadian IT research and analyst firm, as the "Leader" in Web Experience Management. The ranking is based on user reviews, and is featured in SoftwareReviews‘ Digital Experience Data Quadrant Report, a comprehensive evaluation and ranking of leading Web Experience Management vendors. Enonic was also ranked first in 2021 and 2022. === Release history === Enonic XP assumed the mantle from the previous content management system Enonic CMS, and thus began with "version 5.0.0." The following list only contains major releases. == Development and support == Enonic offers a user and developer community consisting of a forum, support system with tickets, documentation, codex, learning and training center with certifications, and various community groups. Writing about the support system, Mike Johnston of CMS Critic notes that "enterprise customers obviously get access to a higher level of personalized support, where the Enonic support team can respond as fast as two hours." The support system is divided in three levels: silver, gold and platinum—from next day business support to 24/7 support. As Enonic XP is open-source, known vulnerabilities, bugs and issues are listed on GitHub.

    Read more →
  • Ben Goertzel

    Ben Goertzel

    Ben Goertzel is a computer scientist, artificial intelligence (AI) researcher, and businessman. He helped popularize the term artificial general intelligence (AGI). == Early life and education == Three of Goertzel's Jewish great-grandparents immigrated to New York from Lithuania and Poland (in the Russian Empire). Goertzel's father is Ted Goertzel, a former professor of sociology at Rutgers University. Goertzel left high school after the tenth grade to attend Bard College at Simon's Rock, where he graduated with a bachelor's degree in Quantitative Studies. Goertzel graduated with a PhD in mathematics from Temple University under the supervision of Avi Lin in 1990, at age 23. == Career == Goertzel is the founder and CEO of SingularityNET, a project which was founded to distribute artificial intelligence data via blockchains. He is a leading developer of the OpenCog framework for artificial general intelligence. Goertzel was an associate and grant recipient of Jeffrey Epstein. He received a $100,000 grant from the Jeffrey Epstein Foundation for artificial general intelligence research in 2001. When interviewed by The New York Times about Epstein in 2019, Goertzel said, "I have no desire to talk about Epstein right now... The stuff I'm reading about him in the papers is pretty disturbing and goes way beyond what I thought his misdoings and kinks were. Yecch." === Sophia the Robot === Goertzel was the Chief Scientist of Hanson Robotics, the company that created the Sophia robot. As of 2018, Sophia's architecture includes scripting software, a chat system, and OpenCog, an AI system designed for general reasoning. Experts in the field have treated the project mostly as a PR stunt, stating that Hanson's claims that Sophia was "basically alive" are "grossly misleading" because the project does not involve AI technology, while computer scientist Yann LeCun, then Meta's chief AI scientist, made several unflattering remarks including calling the project "complete bullshit". === Views on AI === In May 2007, Goertzel spoke at a Google tech talk about his approach to creating artificial general intelligence. He defines intelligence as the ability to detect patterns in the world and in the agent itself, measurable in terms of emergent behavior of "achieving complex goals in complex environments". A "baby-like" artificial intelligence is initialized, then trained as an agent in a simulated or virtual world such as Second Life to produce a more powerful intelligence. Knowledge is represented in a network whose nodes and links carry probabilistic truth values as well as "attention values", with the attention values resembling the weights in a neural network. Several algorithms operate on this network, the central one being a combination of a probabilistic inference engine and a custom version of evolutionary programming. The 2012 documentary The Singularity by independent filmmaker Doug Wolens discussed Goertzel's views on AGI. In 2023 Goertzel postulated that artificial intelligence could replace up to 80 percent of human jobs in the coming years "without having an AGI, by my guess. Not with ChatGPT exactly as a product. But with systems of that nature". At the Web Summit 2023 in Rio de Janeiro, Goertzel spoke out against efforts to curb AI research and that AGI is only a few years away. Goertzel's belief is that AGI will be a net positive for humanity by assisting with societal problems such as, but not limited to, climate change.

    Read more →
  • Nicholas Carlini

    Nicholas Carlini

    Nicholas Carlini is an American researcher affiliated with Anthropic and previously with Google DeepMind who has published research in the fields of computer security and machine learning. He is known for his work on adversarial machine learning, particularly his work on the Carlini & Wagner attack in 2016. This attack was particularly useful in defeating defensive distillation, a method used to increase model robustness, and has since been effective against other defenses against adversarial input. In 2018, Carlini demonstrated an attack on Mozilla's DeepSpeech model, showing that hidden commands could be embedded in speech inputs, which the model would execute even if they were inaudible to humans. He also led a team at UC Berkeley that successfully broke seven out of nine defenses against adversarial attacks presented at the 2018 International Conference on Learning Representations. In addition to his work on adversarial attacks, Carlini has made significant contributions to understanding the privacy risks of machine learning models. In 2020, he revealed that large language models, like GPT-2, could memorize and output personally identifiable information. His research demonstrated that this issue worsened with larger models, and he later showed similar vulnerabilities in generative image models, such as Stable Diffusion. == Life and career == Nicholas Carlini obtained his Bachelor of Arts in Computer Science and Mathematics from the University of California, Berkeley, in 2013. He then continued his studies at the same university, where he pursued a PhD under the supervision of David Wagner, completing it in 2018. Carlini became known for his work on adversarial machine learning. In 2016, he worked alongside Wagner to develop the Carlini & Wagner attack, a method of generating adversarial examples against machine learning models. The attack was proved to be useful against defensive distillation, a popular mechanism where a student model is trained based on the features of a parent model to increase the robustness and generalizability of student models. The attack gained popularity when it was shown that the methodology was also effective against most other defenses, rendering them ineffective. In 2018, Carlini demonstrated an attack against Mozilla Foundation's DeepSpeech model where he showed that by hiding malicious commands inside normal speech input the speech model would respond to the hidden commands even when the commands were not discernible by humans. In the same year, Carlini and his team at UC Berkeley showed that out of the 11 papers presenting defenses to adversarial attacks accepted in that year's ICLR conference, seven of the defenses could be broken. Since 2021, he and his team have been working on large language models, creating a questionnaire where humans typically scored 35% whereas AI models scored in the 40%, with GPT-3 getting 38% which could be improved to 40% through few shot prompting. The best performer in the test was UnifiedQA, a model developed by Google specifically for answer questions and answer sets. Carlini has also developed methods to cause large language models like ChatGPT to answer harmful questions like how to construct bombs. He is also known for his work studying the privacy of machine learning models. In 2020, he showed for the first time that large language models would memorize some of the text data that they were trained on. For example, he found that GPT-2 could output personally identifiable information. He then led an analysis of larger models and studied how memorization increased with model size. Then, in 2022 he showed the same vulnerability in generative image models, and specifically diffusion models, by showing that Stable Diffusion could output images of people's faces that it was trained on. Following on this, Carlini then showed that ChatGPT would also sometimes output exact copies of webpages it was trained on, including personally identifiable information. Some of these studies have since been referenced by the courts in debating the copyright status of AI models. == Other work == Carlini received the Best of Show award at the 2020 IOCCC for implementing a tic-tac-toe game entirely with calls to printf, expanding on work from a research paper of his from 2015. The judges commented on his submission "This year's Best of Show (carlini) is such a novel way of obfuscation that it would be worth of a special mention in the (future) Best of IOCCC list!". [sic] == Awards == Best Student Paper Award, IEEE S&P 2017 ("Towards Evaluating the Robustness of Neural Networks") Best Paper Award, ICML 2018 ("Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples") Distinguished Paper Award, USENIX 2021 ("Poisoning the Unlabeled Dataset of Semi-Supervised Learning") Distinguished Paper Award, USENIX 2023 ("Tight Auditing of Differentially Private Machine Learning") Best Paper Award, ICML 2024 ("Stealing Part of a Production Language Model") Best Paper Award, ICML 2024 ("Considerations for Differentially Private Learning with Large-Scale Public Pretraining")

    Read more →
  • The Best Free AI Paraphrasing Tool for Beginners

    The Best Free AI Paraphrasing Tool for Beginners

    Trying to pick the best AI paraphrasing tool? An AI paraphrasing tool is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI paraphrasing tool slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • Agent verification

    Agent verification

    Agent verification is activity to gain assurances that purposeful artificial constructs act in accordance with their specifications. While primitive forms of inorganic agents have been used in manufacturing for centuries, the study of artificial agents did not begin until the mid 20th century. Foundational work on such agents was closely bound with the emergence of artificial intelligence as an academic discipline. Early agents deployed for industrial control systems and in computing were often controlled by quite simple logic however, not involving artificial intelligence as such. When deployed as part of a multi-agent system, even such simple agents could require special agent orientated testing methods, as their collective behaviour was challenging to verify with traditional testing techniques. Difficulties in providing assurances that agents will not behave in dangerous ways became more prevalent after the introduction of LLM agents, especially after the rapid acceleration of their deployment in 2025. The verification of agent behaviour can be conducted by formal or informal methods. Informal verification requires less mathematical skill. But when agents are part of systems where errors have significant risks — such as danger to human life, environmental damage or major financial loss — formal verification is preferred. Both regulators and system designers themselves like formal verification as it provides a high degree of mathematical certainty. It is not however always possible to formally test all aspects of an agent based system's behaviour, especially where newer LLM based agents are concerned, due in part to their high degree of autonomy. Accordingly, agent verification for low impact deployments might be carried out only with informal methods, while for high impact deployments, it may be performed with a mix of formal and informal techniques. == Terminology == In academia, the term agent verification is often defined to mean activity concerned with gaining assurance that the agent behaves in accordance with its specification - whether by processes such as testing or simulation. 'Verification' is typically contrasted with 'validation', the latter meaning activity concerned with checking that the specification itself meets user or real world needs. Such definitions are not universally adhered to however - for example, in some workplaces and documents, the words 'verification' and 'validation' can be used synonymously. Efforts to gain confidence in Agents have intensified sharply since 2025 due to the rapid roll out of LLM agents; different terms are sometimes used in the commercial sector. Here the term 'agent verification' can be used in the same sense as it is in academia, but sometimes the same activity can be covered by more ambiguous and wider ranging terms such as 'Agent governance' , 'Agent observability' or 'AI agent policing'. == History == === Classical agents === The theoretical underpinnings for artificial (inorganic) agents emerged in the mid 20th century, with establishment of cybernetics and artificial intelligence. Oliver Selfridge's 1958 Pandemonium - A Paradigm for Learning paper was an important early theoretical contribution in establishing agent oriented architecture. Practical implementations of agents for real world applications began to become widespread in the 1990s, after the introduction of the belief–desire–intention software model (BDI), and agent-oriented programming. Pure digital agents were deployed in computer infrastructure for purposes such as monitoring, while agents connected to real-world sensors and actuators were increasingly used in industrial control systems. While the concept of artificial agents was interwoven with early artificial intelligence studies right from the start, early agents lacked general purpose reasoning capabilities, often only having simple if then logic. Even a device as simple as a thermostat, which has a sensor and a means of acting, can be considered a proto agent in this sense. Verifying the behaviours of a simple single agent system is not generally especially difficult, but it can be a different matter when several simple agents coexist in the same system. Craig Reynolds's work on boids showed that relatively complex, "intelligent" behaviour can emerge from a number of such simple agents working together in a Multi-agent system (MAS). By the 1990s, even the behaviour of a single agent system could sometimes be quite complex; in accordance with the Belief–desire–intention software model, agents could have believes that might evolve over time. Agents were increasingly introduced that were controlled by quite large decision tree models, which had new vulnerabilities to adversarial attack. It was becoming increasingly apparent that traditional software verification methods had limitations for testing such agents, or even for the more primitive type of agents when they were deployed as part of a MAS. It was the use of agents for industrial control systems, sometimes associated with robotics, that lent urgency to the practice of agent verification. Informal testing might be acceptable for digital agents used say to monitor whether each of an organisation's computers are properly licensed. But with an increasing potential for faulty agents to result in a failure that might cause a large fire to break out at a chemical manufacturing plant, a botched medical operation, or even a crashed aircraft, the need to develop reliable means of verifying behaviour of such agents was considered urgent. The Foundation for Intelligent Physical Agents was established in 1996. From the late 90s, a growing number of industry and university based scientists began working on the problem, with researchers publishing papers on the verification of both single and multi agent systems. Much of this work showed how formal verification techniques like model checking could be used to gain a high level of assurance that agent based systems would conform with their specification. A 2018 systematic review covering 231 studies found that model checking was the most common technique for agent verification, with theorem proving the second most commonly used formal verification method. In the first two decades of the 20th century, agents run by AI became more common, with Siri and Alexa being well known examples. But such agents still lacked general reasoning capabilities and did not pose new pressing problems for agent verification. === General purpose reasoning agents === The advent of LLMs created huge potential for further use of artificial agents, as agents based on them could have general purpose cognitive abilities. Agents run by LLMs (and occasionally non-LLM foundation models) have similar vulnerability to adversarial attack as those run by decision tree models. The wider scope of actions for LLM agents has created new challenges for their verification, over and above those present for classical agents. For example, the LLM's neural network endows it with infinite domains, an especial challenge for traditional formal verification techniques. Academics began to study the problems involved in verifying LLM agents from 2018. Deployment of such agents began to accelerate in late 2023 after OpenAI's "function-calling" API was made available, and especially after Anthropic's late 2024 introduction of Model Context Protocol (MCP), a standardised way for LLM agents to gain contextual awareness, and to act on the world by calling various external tools. The rapid rollout of LLM agents following MCP's release has seen the task of agent verification receive increased attention within academia, and also from the private sector. In 2024 and 2025 several startups focusing on LLM agent verification have been founded in both Europe and the US to meet growing demand. == Approaches == === Formal verification === Formal verification involves proving the correctness of some or all aspects of a system using mathematical methods. Such methods can range from manual formal proof, to verification assisted with automated theorem provers like Isabelle. For agent verification, model checking is by far the most frequently used formal verification method; for pre-LLM models it was often complemented with techniques using computation tree logic. Another common method is theorem proving. Formal verification provides a higher degree of confidence than informal methods, but it is not always used, even when it is possible. Sometimes a person or organisation developing software agents won't have the necessary skills, or may not see it as worth the effort if the agent(s) will not have the ability to cause much harm even if they malfunction. When agents are deployed in systems where errors could have serious consequences, the ability of formal verification methods to provide mathematical certainty tends to be strongly preferred by both regulators and designers themselves. But even for high impact systems, formal verificatio

    Read more →
  • Bob Coecke

    Bob Coecke

    Bob Coecke (born 23 July 1968) is a Belgian theoretical physicist and logician. He was Professor of Quantum foundations, Logics, and Structures at Oxford University until 2020. He was Chief Scientist at quantum computing company Quantinuum, until 2025 and founded a startup called Relational Intelligence in 2026. He is also Distinguished Visiting Research Chair at the Perimeter Institute for Theoretical Physics, and Emeritus Fellow at Wolfson College, Oxford. He pioneered categorical quantum mechanics (entry 18M40 in Mathematics Subject Classification 2020), Quantum Picturalism, ZX-calculus, DisCoCat model for natural language,, quantum natural language processing (QNLP) and quantum education through the book Quantum in Pictures. He is a founder of the Quantum Physics and Logic community and the Applied Category Theory communities and conference series, and of the journal Compositionality. Coecke is also a composer and musician, who has been called a pioneer of industrial music, and is also one of the pioneers of employing quantum computers in music. == Education and career == Coecke obtained his doctorate in sciences at the Vrije Universiteit Brussel in 1996, and performed postdoctoral work in the Theoretical Physics Group of Imperial College, London in the Category Theory Group of the Mathematics and Statistics Department at McGill University in Montreal, in the Department of Pure Mathematics and Mathematical Statistics of Cambridge University, and in the Department of Computer Science, University of Oxford. He was an EPSRC Advanced Research Fellow at the Department of Computer Science, University of Oxford, where he became Lecturer in Quantum Computer Science in 2007, and jointly with Samson Abramsky built and headed the Quantum Group. In July 2011, he was nominated professor of Quantum Foundations, Logics and Structures at Oxford University, with retroactive effect as of October 2010. He was a Governing Body Fellow of Wolfson College, Oxford since 2007, where he now is an Emeritus Fellow. In January 2019, Coecke became Senior Scientific Advisor of Cambridge Quantum Computing, and in January 2021 he resigned from his Professorship at Oxford, to become Chief Scientist of Cambridge Quantum Computing. After the merger of Cambridge Quantum Computing with Honeywell Quantum Systems, he stayed on as Chief Scientist of the joint entity Quantinuum until 2025. In January 2023 he also became Distinguished Visiting Research Chair at the Perimeter Institute for Theoretical Physics. == Work == Coecke's research focuses on the foundations of physics, more particularly category theory, logic, and diagrammatic reasoning, with application to quantum informatics, quantum gravity, and NLP. He has pioneered categorical quantum mechanics together with Samson Abramsky, and spearheaded the development of a diagrammatic quantum formalism based on Penrose graphical notation, on which he wrote a textbook entitled Picturing Quantum Processes with Aleks Kissinger. With Ross Duncan he pioneered ZX-calculus. He pioneered the DisCoCat model for natural language, with Stephen Clark and Mehrnoosh Sadrzadeh. He also pioneered quantum natural language processing (QNLP), with Will Zeng, and colleagues at Cambridge Quantum Computing. == Music == Coecke is also a musician, performing and recording since the eighties. He retrospectively has been named a pioneer of industrial music. His band, Black Tish, "used cutting edge sampling techniques for the time, a host of synth and sound loops and metal-style guitars to create a heavy rock/electronica fusion unlike anything heard before", and "bridge the gap between the pure experimental nature of bands like Throbbing Gristle and Einstürzende Neubauten and the (comparatively) more radio accessible Ministry or Nine Inch Nails". Coecke is also one of the pioneers of employing quantum computers in music. == Selected publications == Textbooks Bob Coecke, Aleks Kissinger:Picturing Quantum Processes. A First Course in Quantum Theory and Diagrammatic Reasoning, Cambridge University Press, 2017, ISBN 978-1316219317 Bob Coecke, Stefano Gogioso:Quantum in Pictures, Quantinuum, 2022, ISBN 978-1-7392147-1-5 Books (as editor) Bob Coecke, David Moore, Alexander Wilce (eds.): Current Research in Operational Quantum Logic: Algebras, Categories, Languages, Fundamental Theories of Physics, Kluwer Academic, 2010, ISBN 978-9048154371 Bob Coecke (ed.): New Structures for Physics, Lecture Notes in Physics 813, Springer, 2011, ISBN 978-3642128202 Articles Bob Coecke: Kindergarten quantum mechanics, arXiv:quant-ph/0510032 Samson Abramsky, Bob Coecke: A categorical semantics of quantum protocols, Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science, 2004, pp. 415–425 Bob Coecke, Ross Duncan: Interacting quantum observables, Automata, Languages and Programming, pp. 298–310, 2008 Konstantinos Meichanetzidis, Alexis Toumi, Giovanni de Felice, Bob Coecke: Grammar-Aware Question-Answering on Quantum Computers, arXiv:2012.03756 Bob Coecke: The Mathematics of Text Structure, arXiv:1904.03478 Will Zeng, Bob Coecke: Quantum Algorithms for Compositional Natural Language Processing, arXiv:1608.01406 Bob Coecke, Tobias Fritz, Robert Spekkens: A mathematical theory of resources, arXiv:1409.5531 Bob Coecke: An Alternative Gospel of structure: order, composition, processes, arxiv:1307.4038 Bob Coecke, Mehrnoosh Sadrzadeh, Steven Clark: Mathematical Foundations for a Compositional Distributional Model of Meaning, arXiv:1003.4394 Bob Coecke: Quantum Picturalism, arXiv:0908.1787 Software articles Eduardo Reck Miranda, Richie Yeung, Anna Pearson, Konstantinos Meichanetzidis, Bob Coecke: A quantum natural language processing approach to musical intelligence, arXiv:2111.06741 Dimitri Kartsaklis, Ian Fan, Richie Yeung, Anna Pearson, Robin Lorenz, Alexis Toumi, Giovanni de Felice, Konstantinos Meichanetzidis, Stephen Clark, Bob Coecke: lambeq: An efficient high-level python library for quantum NLP, arXiv:2110.04236 Giovanni de Felice, Alexis Toumi, Bob Coecke: Discopy: monoidal categories in Python, arXiv:2111.06741

    Read more →
  • Linguistics Research Center at UT Austin

    Linguistics Research Center at UT Austin

    The Linguistics Research Center (LRC) at the University of Texas is a center for computational linguistics research & development. It was directed by Prof. Winfred Lehmann until his death in 2007, and subsequently by Dr. Jonathan Slocum. Since its founding, virtually all projects at the LRC have involved processing natural language texts with the aid of computers. The principal activities of the Center at present focus on Indo-European languages and comprise historical study, lexicography, and web-based teaching; staff members engage in several independent but often complementary projects in these fields using a variety of software, almost all of it developed in-house. == History == The LRC was founded by Winfred Lehmann in 1961. In the early days, research efforts at the LRC concentrated on machine translation (MT) -- the translation of texts from one human language to another with the aid of computers, very developed nowadays in the field of language industry—funded by the USAF and other sponsors. The LRC concentrated on German English translation, though a copy of the Russian Master Dictionary was deposited at the LRC after the ALPAC report. After a general hiatus ca. 1975-78, new funding led to the development by Jonathan Slocum and others of a new system with the same name (the METAL MT system), but with new sets of tools for linguists and vastly greater success, resulting in the delivery a production prototype then later a full-fledged commercial MT system. MT R&D continued at the LRC, with funding by various sponsors, until well into the 1990s. From its early years to the present, the LRC has mounted a number of smaller projects resulting in the publication of significant works relating to Indo-European languages and/or their common ancestor, Proto-Indo-European. The hallmark of this work has been the use of computers to transcribe texts and prepare them for publication. A prominent example of the LRC using computers to prepare texts for print publication is the book by Winfred P. Lehmann, A Gothic Etymological Dictionary (Leiden: Brill, 1986). The final print-ready version was produced with the aid of a laser printer (exotic new technology, in those days) using, for the various languages included in the entries, approximately 500 special characters—many of them designed at the Center. This was the first major etymological dictionary for Indo-European languages to be produced with the aid of computers. Current LRC projects have concentrated on transcribing early Indo-European texts, developing language lessons based on them, and publishing on the web these and other materials related to the study of Indo-European languages, of their common ancestor Proto-Indo-European, and of historical linguistics more generally. == Alumni == Winfred Lehmann Rolf A. Stachowitz Jonathan Slocum Winfield S. Bennett John White

    Read more →
  • Vector quantization

    Vector quantization

    Vector quantization (VQ) is a classical quantization technique from signal processing that allows the modeling of probability density functions by the distribution of prototype vectors. Developed in the early 1980s by Robert M. Gray, it was originally used for data compression. It works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them. Each group is represented by its centroid point, as in k-means and some other clustering algorithms. In simpler terms, vector quantization chooses a set of points to represent a larger set of points. The density matching property of vector quantization is powerful, especially for identifying the density of large and high-dimensional data. Since data points are represented by the index of their closest centroid, commonly occurring data have low error, and rare data high error. This is why VQ is suitable for lossy data compression. It can also be used for lossy data correction and density estimation. Vector quantization is based on the competitive learning paradigm, so it is closely related to the self-organizing map model and to sparse coding models used in deep learning algorithms such as autoencoder. == Training == One simple training algorithm for vector quantization is: Pick a sample point at random Move the nearest quantization vector centroid towards this sample point, by a small fraction of the distance Repeat A more sophisticated algorithm reduces the bias in the density matching estimation and ensures that all points are used, by including an extra sensitivity parameter: Increase each centroid's sensitivity s i {\displaystyle s_{i}} by a small amount Pick a sample point P {\displaystyle P} at random For each quantization vector centroid c i {\displaystyle c_{i}} , let d ( P , c i ) {\displaystyle d(P,c_{i})} denote the distance of P {\displaystyle P} and c i {\displaystyle c_{i}} Find the centroid c i {\displaystyle c_{i}} for which d ( P , c i ) − s i {\displaystyle d(P,c_{i})-s_{i}} is the smallest Move c i {\displaystyle c_{i}} towards P {\displaystyle P} by a small fraction of the distance Set s i {\displaystyle s_{i}} to zero Repeat It is desirable to use a cooling schedule to produce convergence: see Simulated annealing. Another simple method is LBG, which is based on k-means. The algorithm can be iteratively updated with "live" data, rather than by picking random points from a data set, but this will introduce some bias if the data are temporally correlated over many samples. == Applications == Vector quantization is used for lossy data compression, lossy data correction, pattern recognition, density estimation and clustering. Lossy data correction, or prediction, is used to recover data missing from some dimensions. It is done by finding the nearest group with the data dimensions available, then predicting the result based on the values for the missing dimensions, assuming that they will have the same value as the group's centroid. For density estimation, the area/volume that is closer to a particular centroid than to any other is inversely proportional to the density (due to the density matching property of the algorithm). === Use in data compression === Vector quantization, also called "block quantization" or "pattern matching quantization" is often used in lossy data compression. It works by encoding values from a multidimensional vector space into a finite set of values from a discrete subspace of lower dimension. A lower-space vector requires less storage space, so the data is compressed. Due to the density matching property of vector quantization, the compressed data has errors that are inversely proportional to density. The transformation is usually done by projection or by using a codebook. In some cases, a codebook can be also used to entropy code the discrete value in the same step, by generating a prefix coded variable-length encoded value as its output. The set of discrete amplitude levels is quantized jointly rather than each sample being quantized separately. Consider a k-dimensional vector [ x 1 , x 2 , . . . , x k ] {\displaystyle [x_{1},x_{2},...,x_{k}]} of amplitude levels. It is compressed by choosing the nearest matching vector from a set of n-dimensional vectors [ y 1 , y 2 , . . . , y n ] {\displaystyle [y_{1},y_{2},...,y_{n}]} , with n < k. All possible combinations of the n-dimensional vector [ y 1 , y 2 , . . . , y n ] {\displaystyle [y_{1},y_{2},...,y_{n}]} form the vector space to which all the quantized vectors belong. Only the index of the codeword in the codebook is sent instead of the quantized values. This conserves space and achieves more compression. Twin vector quantization (VQF) is part of the MPEG-4 standard dealing with time domain weighted interleaved vector quantization. === Video codecs based on vector quantization === Bink video Cinepak Daala is transform-based but uses pyramid vector quantization on transformed coefficients Digital Video Interactive: Production-Level Video and Real-Time Video Indeo Microsoft Video 1 QuickTime: Apple Video (RPZA) and Graphics Codec (SMC) Sorenson SVQ1 and SVQ3 Smacker video VQA format, used in many games The usage of video codecs based on vector quantization has declined significantly in favor of those based on motion compensated prediction combined with transform coding, e.g. those defined in MPEG standards, as the low decoding complexity of vector quantization has become less relevant. === Audio codecs based on vector quantization === AMR-WB+ CELP CELT (now part of Opus) is transform-based but uses pyramid vector quantization on transformed coefficients Codec 2 DTS G.729 iLBC Ogg Vorbis TwinVQ === Use in pattern recognition === VQ was also used in the eighties for speech and speaker recognition. Recently it has also been used for efficient nearest neighbor search and on-line signature recognition. In pattern recognition applications, one codebook is constructed for each class (each class being a user in biometric applications) using acoustic vectors of this user. In the testing phase the quantization distortion of a testing signal is worked out with the whole set of codebooks obtained in the training phase. The codebook that provides the smallest vector quantization distortion indicates the identified user. The main advantage of VQ in pattern recognition is its low computational burden when compared with other techniques such as dynamic time warping (DTW) and hidden Markov model (HMM). The main drawback when compared to DTW and HMM is that it does not take into account the temporal evolution of the signals (speech, signature, etc.) because all the vectors are mixed up. In order to overcome this problem a multi-section codebook approach has been proposed. The multi-section approach consists of modelling the signal with several sections (for instance, one codebook for the initial part, another one for the center and a last codebook for the ending part). === Use as clustering algorithm === As VQ is seeking for centroids as density points of nearby lying samples, it can be also directly used as a prototype-based clustering method: each centroid is then associated with one prototype. By aiming to minimize the expected squared quantization error and introducing a decreasing learning gain fulfilling the Robbins-Monro conditions, multiple iterations over the whole data set with a concrete but fixed number of prototypes converges to the solution of k-means clustering algorithm in an incremental manner. === Generative adversarial networks (GAN) === VQ has been used to quantize a feature representation layer in the discriminator of generative adversarial networks. The feature quantization (FQ) technique performs implicit feature matching. It improves the GAN training, and yields an improved performance on a variety of popular GAN models: BigGAN for image generation, StyleGAN for face synthesis, and U-GAT-IT for unsupervised image-to-image translation.

    Read more →
  • CLAWS (linguistics)

    CLAWS (linguistics)

    The Constituent Likelihood Automatic Word-tagging System (CLAWS) is a program that performs part-of-speech tagging. It was developed in the 1980s at Lancaster University by the University Centre for Computer Corpus Research on Language. It has an overall accuracy rate of 96–97% with the latest version (CLAWS4) tagging around 100 million words of the British National Corpus. == History == A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word (and other token), such as noun, verb, adjective, etc., although generally computational applications use more fine-grained POS tags like 'noun-plural'. Developed in the early 1980s, CLAWS was built to fill the ever-growing gap created by always-changing POS necessities. Originally created to add part-of-speech tags to the LOB corpus of British English, the CLAWS tagset has since been adapted to other languages as well, including Urdu and Arabic. Since its inception, CLAWS has been hailed for its functionality and adaptability. Still, it is not without flaws, and though it boasts an error-rate of only 1.5% when judged in major categories, CLAWS still remains with c.3.3% ambiguities unresolved. Ambiguity arises in cases such as with the word flies, and whether it should be classified as a noun or a verb. It's these ambiguities that will require the various upgrades and tagsets that CLAWS will endure. == Rules and processing == CLAWS uses a Hidden Markov model to determine the likelihood of sequences of words in anticipating each part-of-speech label. === Sample output === This excerpt from Bram Stoker's Dracula (1897) has been tagged using both the CLAWS C5 and C7 tagsets. This is what a CLAWS output will generally look like, with the most likely part-of-speech tag following each word. == Tagsets == === CLAWS1 tagset === The first tagset developed in CLAWS, CLAWS1 tagset, has 132 word tags. In terms of form and application, C1 tagset is similar to Brown Corpus tags. See Table of tags in C1 tagset here. === CLAWS2 tagset === From 1983 to 1986, updated versions leading to CLAWS2 were part of a larger attempt to deal with aspects such as recognizing sentence breaks, in order to avoid the need for manual pre-processing of a text before the tags were applied, moving instead to optional manual post-editing to adjust the output of the automatic annotation, if needed. The CLAWS2 tagset has 166 word tags. See Table of tags in C2 tagset here. === CLAWS4 tagset === The CLAWS4 was used for the 100-million-word British National Corpus (BNC). A general-purpose grammatical tagger, it is a successor of the CLAWS1 tagger. In tagging the BNC, the many rounds of work that went into CLAWS4 focused on making the CLAWS program independent from the tagsets. For example, the BNC project used two tagset versions: "a main tagset (C5) with 62 tags with which the whole of the corpus has been tagged, and a larger (C7) tagset with 152 tags, which has been used to make a selected 'core' sample corpus of two million words." The latest version of CLAWS4 is offered by UCREL, a research center of Lancaster University. === CLAWS5 tagset === The CLAWS5 tagset, which was used for BNC, has over 60 tags. See Table of tags in C5 tagset here. === CLAWS6 tagset === The CLAWS6 tagset was used for the BNC sampler corpus and the COLT corpus. It has over 160 tags, including 13 determiner subtypes. See Table of tags in C6 tagset here. === CLAWS7 tagset === The standard CLAWS7 tagset is used currently. It is only different in the punctuation tags when compared to the CLAWS6 tagset. See Table of tags in C7 tagset here. === CLAWS8 tagset === CLAWS8 tagset was extended from C7 tagset with further distinctions in the determiner and pronoun categories, as well as 37 new auxiliary tags for forms of be, do, and have. See Table of tags in C8 tagset here

    Read more →
  • Frederick J. Damerau

    Frederick J. Damerau

    Frederick J. Damerau (December 25, 1931 – January 27, 2009) was a pioneer of research on natural language processing and data mining. After earning his B.A. from Cornell University in 1953, he spent most of his career at IBM, in the Thomas J. Watson Research Center. He holds a PhD from Yale University. One of his most influential and ground-breaking papers was "A technique for computer detection and correction of spelling errors" published in 1964. He also developed and patented for IBM the first algorithm for placing hyphens automatically in words. In 1971 he published the book "Markov Models and Linguistic Theory : An Experimental Study of a Model for English." After being active in research for over four decades, Fred Damerau died on January 27, 2009.

    Read more →
  • Vera Demberg

    Vera Demberg

    Vera Demberg (born 1981) is a German computational linguist and professor of computer science and computational linguistics at Saarland University. Her research interests include cognitive models of human language comprehension, natural language generation, experimental psycholinguistics, multimodal language processing in a dual-task setting, and experimental and computational discourse research and pragmatics. == Career and research == Vera Demberg studied computational linguistics at the Institute for Machine Language Processing at the University of Stuttgart from 2001 to 2006. She then completed a Master's degree in Artificial Intelligence at the University of Edinburgh from 2004 to 2005. She received her Ph.D. from the Department of Computer Science there from 2006 to 2010. Her dissertation paper, titled “Broad-Coverage Model of Prediction in Human Sentence Processing”, was awarded the Cognitive Science Society's “Glushko Dissertation Prize in Cognitive Science” in 2011. In her work, she designed a model of human sentence processing that can be used to predict difficulties in processing at the syntactic level. From 2010 to 2016, Vera Demberg led an independent research group on cognitive models of human language processing and their application to speech dialog systems in the Cluster of Excellence “Multimodal Computing and Interaction” at the University of Saarland. In 2016, she was appointed there to a professorship in computer science and computational linguistics. Demberg's professorship is in the Department of Computer Science (Faculty of Mathematics and Computer Science). She is also a co-opted professor in the Department of Linguistics and Language Technology (Faculty of Philosophy). Since 2020, she has led the ERC Starting Grant “Individualized Interaction in Discourse”. The project conducts research on how to make linguistic interaction with computer systems more natural. She has authored and co-authored numerous papers on the study of computational linguistics and natural language processing. According to Google Scholar, Vera Demberg has an H-index of 30. == Publications == Vera Demberg has authored more than 200 papers; please refer to her scholar page at https://scholar.google.com/citations?user=l2CFSAMAAAAJ == Awards == 2011: Cognitive Science Society Glushko Dissertation Prize in Cognitive Science 2020: ERC Starting Grant “Individualized Interaction in Discourse” 2024: Member of the Academy of Sciences and Literature

    Read more →