Vector quantization

Vector quantization

Vector quantization (VQ) is a classical quantization technique from signal processing that allows the modeling of probability density functions by the distribution of prototype vectors. Developed in the early 1980s by Robert M. Gray, it was originally used for data compression. It works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them. Each group is represented by its centroid point, as in k-means and some other clustering algorithms. In simpler terms, vector quantization chooses a set of points to represent a larger set of points. The density matching property of vector quantization is powerful, especially for identifying the density of large and high-dimensional data. Since data points are represented by the index of their closest centroid, commonly occurring data have low error, and rare data high error. This is why VQ is suitable for lossy data compression. It can also be used for lossy data correction and density estimation. Vector quantization is based on the competitive learning paradigm, so it is closely related to the self-organizing map model and to sparse coding models used in deep learning algorithms such as autoencoder. == Training == One simple training algorithm for vector quantization is: Pick a sample point at random Move the nearest quantization vector centroid towards this sample point, by a small fraction of the distance Repeat A more sophisticated algorithm reduces the bias in the density matching estimation and ensures that all points are used, by including an extra sensitivity parameter: Increase each centroid's sensitivity s i {\displaystyle s_{i}} by a small amount Pick a sample point P {\displaystyle P} at random For each quantization vector centroid c i {\displaystyle c_{i}} , let d ( P , c i ) {\displaystyle d(P,c_{i})} denote the distance of P {\displaystyle P} and c i {\displaystyle c_{i}} Find the centroid c i {\displaystyle c_{i}} for which d ( P , c i ) − s i {\displaystyle d(P,c_{i})-s_{i}} is the smallest Move c i {\displaystyle c_{i}} towards P {\displaystyle P} by a small fraction of the distance Set s i {\displaystyle s_{i}} to zero Repeat It is desirable to use a cooling schedule to produce convergence: see Simulated annealing. Another simple method is LBG, which is based on k-means. The algorithm can be iteratively updated with "live" data, rather than by picking random points from a data set, but this will introduce some bias if the data are temporally correlated over many samples. == Applications == Vector quantization is used for lossy data compression, lossy data correction, pattern recognition, density estimation and clustering. Lossy data correction, or prediction, is used to recover data missing from some dimensions. It is done by finding the nearest group with the data dimensions available, then predicting the result based on the values for the missing dimensions, assuming that they will have the same value as the group's centroid. For density estimation, the area/volume that is closer to a particular centroid than to any other is inversely proportional to the density (due to the density matching property of the algorithm). === Use in data compression === Vector quantization, also called "block quantization" or "pattern matching quantization" is often used in lossy data compression. It works by encoding values from a multidimensional vector space into a finite set of values from a discrete subspace of lower dimension. A lower-space vector requires less storage space, so the data is compressed. Due to the density matching property of vector quantization, the compressed data has errors that are inversely proportional to density. The transformation is usually done by projection or by using a codebook. In some cases, a codebook can be also used to entropy code the discrete value in the same step, by generating a prefix coded variable-length encoded value as its output. The set of discrete amplitude levels is quantized jointly rather than each sample being quantized separately. Consider a k-dimensional vector [ x 1 , x 2 , . . . , x k ] {\displaystyle [x_{1},x_{2},...,x_{k}]} of amplitude levels. It is compressed by choosing the nearest matching vector from a set of n-dimensional vectors [ y 1 , y 2 , . . . , y n ] {\displaystyle [y_{1},y_{2},...,y_{n}]} , with n < k. All possible combinations of the n-dimensional vector [ y 1 , y 2 , . . . , y n ] {\displaystyle [y_{1},y_{2},...,y_{n}]} form the vector space to which all the quantized vectors belong. Only the index of the codeword in the codebook is sent instead of the quantized values. This conserves space and achieves more compression. Twin vector quantization (VQF) is part of the MPEG-4 standard dealing with time domain weighted interleaved vector quantization. === Video codecs based on vector quantization === Bink video Cinepak Daala is transform-based but uses pyramid vector quantization on transformed coefficients Digital Video Interactive: Production-Level Video and Real-Time Video Indeo Microsoft Video 1 QuickTime: Apple Video (RPZA) and Graphics Codec (SMC) Sorenson SVQ1 and SVQ3 Smacker video VQA format, used in many games The usage of video codecs based on vector quantization has declined significantly in favor of those based on motion compensated prediction combined with transform coding, e.g. those defined in MPEG standards, as the low decoding complexity of vector quantization has become less relevant. === Audio codecs based on vector quantization === AMR-WB+ CELP CELT (now part of Opus) is transform-based but uses pyramid vector quantization on transformed coefficients Codec 2 DTS G.729 iLBC Ogg Vorbis TwinVQ === Use in pattern recognition === VQ was also used in the eighties for speech and speaker recognition. Recently it has also been used for efficient nearest neighbor search and on-line signature recognition. In pattern recognition applications, one codebook is constructed for each class (each class being a user in biometric applications) using acoustic vectors of this user. In the testing phase the quantization distortion of a testing signal is worked out with the whole set of codebooks obtained in the training phase. The codebook that provides the smallest vector quantization distortion indicates the identified user. The main advantage of VQ in pattern recognition is its low computational burden when compared with other techniques such as dynamic time warping (DTW) and hidden Markov model (HMM). The main drawback when compared to DTW and HMM is that it does not take into account the temporal evolution of the signals (speech, signature, etc.) because all the vectors are mixed up. In order to overcome this problem a multi-section codebook approach has been proposed. The multi-section approach consists of modelling the signal with several sections (for instance, one codebook for the initial part, another one for the center and a last codebook for the ending part). === Use as clustering algorithm === As VQ is seeking for centroids as density points of nearby lying samples, it can be also directly used as a prototype-based clustering method: each centroid is then associated with one prototype. By aiming to minimize the expected squared quantization error and introducing a decreasing learning gain fulfilling the Robbins-Monro conditions, multiple iterations over the whole data set with a concrete but fixed number of prototypes converges to the solution of k-means clustering algorithm in an incremental manner. === Generative adversarial networks (GAN) === VQ has been used to quantize a feature representation layer in the discriminator of generative adversarial networks. The feature quantization (FQ) technique performs implicit feature matching. It improves the GAN training, and yields an improved performance on a variety of popular GAN models: BigGAN for image generation, StyleGAN for face synthesis, and U-GAT-IT for unsupervised image-to-image translation.

Moving object detection

Moving object detection is a technique used in computer vision and image processing. Multiple consecutive frames from a video are compared by various methods to determine if any moving object is detected. Moving objects detection has been used for wide range of applications like video surveillance, activity recognition, road condition monitoring, airport safety, monitoring of protection along marine border, etc. == Definition == Moving object detection is to recognize the physical movement of an object in a given place or region. By acting segmentation among moving objects and stationary area or region, the moving objects' motion can be tracked and thus analyzed later. To achieve this, consider a video is a structure built upon single frames, moving object detection is to find the foreground moving target(s), either in each video frame or only when the moving target shows the first appearance in the video. == Traditional methods == Among all the traditional moving object detection methods, we could categorize them into four major approaches: Background subtraction, Frame differencing, Temporal Differencing, and Optical Flow. === Frame differencing === Instead of using traditional approach, to use image subtraction operator by subtracting second and images afterwards, the frame differencing method makes comparisons between two successive frames to detect moving targets. === Temporal differencing === The temporal differencing method identifies the moving object by applying pixel-wise difference method with two or three consecutive frames.

The Culture of Connectivity

The Culture of Connectivity: A Critical History of Social Media is a book by José van Dijck published by Oxford University Press in 2013 on social media platforms and their history. The author considers the histories of five social media platforms: Facebook, Twitter, Flickr, YouTube, and Wikipedia. She focuses on how their technological, social and cultural dimensions contribute to their current status.

Digital goods

Digital goods or e-goods are intangible goods that exist in digital form. Examples are Wikipedia articles; digital media, such as e-books, downloadable music, internet radio, internet television and streaming media; fonts, logos, photos and graphics; digital subscriptions; online ads (as purchased by the advertiser); internet coupons; electronic tickets; electronically treated documentation in many different fields; downloadable software (Digital Distribution) and mobile apps; cloud-based applications and online games; virtual goods used within the virtual economies of online games and communities; community access; workbooks; worksheets; planners; e-learning (online courses); webinars, video tutorials, blog posts; cards; patterns; website themes and templates. == Legal concerns about digital goods == Special legal concerns regarding digital goods include copyright infringement and taxation. Also the question of the ownership (versus licensed use or service only) of purely digital goods is not finally resolved. For instance, the software installers of the digital software distributor gog.com are technically independent to the account but are still subject to the EULA, where a "licensed, not sold" formulation is used. Therefore, it is not clear if the software can be legally used after a hypothetical loss of the account; a question which was also raised before in practice for the similar service Steam. In July 2012, the European Court of Justice ruled in the case UsedSoft GMbH v. Oracle International Corp. that the sale of a software product, either through a physical support or download, constituted a transfer of ownership in EU law, thus the first sale doctrine applies; the ruling thereby breaks the "licensed, not sold" legal theory, but leaves open numerous questions. Therefore, it is also permissible to resell software licenses even if the digital good has been downloaded directly from the Internet, as the first-sale doctrine applied whenever software was originally sold to a customer for an unlimited amount of time, thus prohibiting any software maker from preventing the resale of their software by any of their legitimate owners. The court requires that the previous owner must no longer be able to use the licensed software after the resale, but finds that the practical difficulties in enforcing this clause should not be an obstacle to authorizing resale, as they are also present for software which can be installed from physical supports, where the first-sale doctrine is in force. In several cases, content providers have faced criticism for revoking access to digital goods due to expired licenses or the discontinuation of a product, such as ebooks (which resulted in a lawsuit against Amazon.com, Inc.), digital video (with Sony Interactive Entertainment revoking access to purchased StudioCanal content from its now-defunct PlayStation video store; a similar move involving Warner Bros. Discovery content was averted by an updated license agreement), and video games (such as Ubisoft discontinuing and revoking access to its game The Crew without providing refunds or the ability to redownload the game) In September 2024, the U.S. state of California implemented a consumer protection law that prohibits the use of terms such as "buy" or "purchase" during transactions involving digital goods if there is no way to obtain the purchases in a manner that cannot be revoked by the seller (such as allowing it to be downloaded for permanent, offline access), and requires a disclaimer to be displayed to the customer at the time of purchase.

Glossary of operating systems terms

This page is a glossary of Operating systems terminology. == A == access token: In Microsoft Windows operating systems, an access token contains the security credentials for a login session and identifies the user, the user's groups, the user's privileges, and, in some cases, a particular application. == B == binary semaphore: See semaphore. booting: In computing, booting (also known as booting up) is the initial set of operations that a computer performs after electrical power is switched on or when the computer is reset. This can take tens of seconds and typically involves performing a power-on self-test, locating and initializing peripheral devices, and then finding, loading and starting the operating system. == C == cache: In computer science, a cache is a component that transparently stores data so that future requests for that data can be served faster. The data that is stored within a cache might be values that have been computed earlier or duplicates of original values that are stored elsewhere. cloud: Cloud computing operating systems are recent, and were not mentioned in Gagne's 8th Edition (2009). In contrast, by Gagne's 9th (2012), cloud o/s received 3 pages of coverage (41, 42, 716). Doeppner (2011) mentions them (p. 3), but only to prove that operating systems "are not a solved problem" and that even if the day of the dedicated PC is waning, cloud computing has created an entirely new opportunity for o/s development ala sharing, networks, memory, parallelism, etc. Gagne (2012) adds that in addition to numerous traditional o/s's at cloud warehouses, Virtual machine o/s (VMMs), Eucalyptus, Vware, vCloud Director and others are being developed specifically for cloud management with numerous traditional o/s features (security, threads, file and memory management, guis, etc.) (p. 42). Microsoft's investment in cloud aspects of o/s tend to support that argument. concurrency == D == daemon: Operating systems often start daemons at boot time and serve the function of responding to network requests, hardware activity, or other programs by performing some task. Daemons can also configure hardware (like udevd on some Linux systems), run scheduled tasks (like cron), and perform a variety of other tasks. == E == == F == == G == == H == == I == == J == == K == kernel: In computing, the kernel is a computer program that manages input/output requests from software and translates them into data processing instructions for the central processing unit and other electronic components of a computer. The kernel is a fundamental part of a modern computer's operating system. == L == lock: In computer science, a lock or mutex (from mutual exclusion) is a synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution. A lock is designed to enforce a mutual exclusion concurrency control policy. == M == mutual exclusion: Mutual exclusion is to allow only one process at a time to access the same critical section (a part of code which accesses the critical resource). This helps prevent race conditions. mutex: See lock. == N == == O == == P == paging daemon: See daemon. process == Q == == R == == S == semaphore: In computer science, particularly in operating systems, a semaphore is a variable or abstract data type that is used for controlling access, by multiple processes, to a common resource in a parallel programming or a multi user environment. == T == thread: In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by an operating system scheduler. The scheduler itself is a light-weight process. The implementation of threads and processes differs from one operating system to another, but in most cases, a thread is contained inside a process. templating: In an o/s context, templating refers to creating a single virtual machine image as a guest operating system, then saving it as a tool for multiple running virtual machines (Gagne, 2012, p. 716). The technique is used both in virtualization and cloud computing management, and is common in large server warehouses. == U == == V == == W == == Z ==

Distributed file system for cloud

A distributed file system for cloud is a file system that allows many clients to have access to data and supports operations (create, delete, modify, read, write) on that data. Each data file may be partitioned into several parts called chunks. Each chunk may be stored on different remote machines, facilitating the parallel execution of applications. Typically, data is stored in files in a hierarchical tree, where the nodes represent directories. There are several ways to share files in a distributed architecture: each solution must be suitable for a certain type of application, depending on how complex the application is. Meanwhile, the security of the system must be ensured. Confidentiality, availability and integrity are the main keys for a secure system. Users can share computing resources through the Internet thanks to cloud computing which is typically characterized by scalable and elastic resources – such as physical servers, applications and any services that are virtualized and allocated dynamically. Synchronization is required to make sure that all devices are up-to-date. Distributed file systems enable many big, medium, and small enterprises to store and access their remote data as they do local data, facilitating the use of variable resources. == Overview == === History === Today, there are many implementations of distributed file systems. The first file servers were developed by researchers in the 1970s. Sun Microsystem's Network File System became available in the 1980s. Before that, people who wanted to share files used the sneakernet method, physically transporting files on storage media from place to place. Once computer networks started to proliferate, it became obvious that the existing file systems had many limitations and were unsuitable for multi-user environments. Users initially used FTP to share files. FTP first ran on the PDP-10 at the end of 1973. Even with FTP, files needed to be copied from the source computer onto a server and then from the server onto the destination computer. Users were required to know the physical addresses of all computers involved with the file sharing. === Supporting techniques === Modern data centers must support large, heterogenous environments, consisting of large numbers of computers of varying capacities. Cloud computing coordinates the operation of all such systems, with techniques such as data center networking (DCN), the MapReduce framework, which supports data-intensive computing applications in parallel and distributed systems, and virtualization techniques that provide dynamic resource allocation, allowing multiple operating systems to coexist on the same physical server. === Applications === Cloud computing provides large-scale computing thanks to its ability to provide the needed CPU and storage resources to the user with complete transparency. This makes cloud computing particularly suited to support different types of applications that require large-scale distributed processing. This data-intensive computing needs a high performance file system that can share data between virtual machines (VM). Cloud computing dynamically allocates the needed resources, releasing them once a task is finished, requiring users to pay only for needed services, often via a service-level agreement. Cloud computing and cluster computing paradigms are becoming increasingly important to industrial data processing and scientific applications such as astronomy and physics, which frequently require the availability of large numbers of computers to carry out experiments. == Architectures == Most distributed file systems are built on the client-server architecture, but other, decentralized, solutions exist as well. === Client-server architecture === Network File System (NFS) uses a client-server architecture, which allows sharing of files between a number of machines on a network as if they were located locally, providing a standardized view. The NFS protocol allows heterogeneous clients' processes, probably running on different machines and under different operating systems, to access files on a distant server, ignoring the actual location of files. Relying on a single server results in the NFS protocol suffering from potentially low availability and poor scalability. Using multiple servers does not solve the availability problem since each server is working independently. The model of NFS is a remote file service. This model is also called the remote access model, which is in contrast with the upload/download model: Remote access model: Provides transparency, the client has access to a file. He sends requests to the remote file (while the file remains on the server). Upload/download model: The client can access the file only locally. It means that the client has to download the file, make modifications, and upload it again, to be used by others' clients. The file system used by NFS is almost the same as the one used by Unix systems. Files are hierarchically organized into a naming graph in which directories and files are represented by nodes. === Cluster-based architectures === A cluster-based architecture ameliorates some of the issues in client-server architectures, improving the execution of applications in parallel. The technique used here is file-striping: a file is split into multiple chunks, which are "striped" across several storage servers. The goal is to allow access to different parts of a file in parallel. If the application does not benefit from this technique, then it would be more convenient to store different files on different servers. However, when it comes to organizing a distributed file system for large data centers, such as Amazon and Google, that offer services to web clients allowing multiple operations (reading, updating, deleting,...) to a large number of files distributed among a large number of computers, then cluster-based solutions become more beneficial. Note that having a large number of computers may mean more hardware failures. Two of the most widely used distributed file systems (DFS) of this type are the Google File System (GFS) and the Hadoop Distributed File System (HDFS). The file systems of both are implemented by user level processes running on top of a standard operating system (Linux in the case of GFS). ==== Design principles ==== ===== Goals ===== Google File System (GFS) and Hadoop Distributed File System (HDFS) are specifically built for handling batch processing on very large data sets. For that, the following hypotheses must be taken into account: High availability: the cluster can contain thousands of file servers and some of them can be down at any time A server belongs to a rack, a room, a data center, a country, and a continent, in order to precisely identify its geographical location The size of a file can vary from many gigabytes to many terabytes. The file system should be able to support a massive number of files The need to support append operations and allow file contents to be visible even while a file is being written Communication is reliable among working machines: TCP/IP is used with a remote procedure call RPC communication abstraction. TCP allows the client to know almost immediately when there is a problem and a need to make a new connection. ===== Load balancing ===== Load balancing is essential for efficient operation in distributed environments. It means distributing work among different servers, fairly, in order to get more work done in the same amount of time and to serve clients faster. In a system containing N chunkservers in a cloud (N being 1000, 10000, or more), where a certain number of files are stored, each file is split into several parts or chunks of fixed size (for example, 64 megabytes), the load of each chunkserver being proportional to the number of chunks hosted by the server. In a load-balanced cloud, resources can be efficiently used while maximizing the performance of MapReduce-based applications. ===== Load rebalancing ===== In a cloud computing environment, failure is the norm, and chunkservers may be upgraded, replaced, and added to the system. Files can also be dynamically created, deleted, and appended. That leads to load imbalance in a distributed file system, meaning that the file chunks are not distributed equitably between the servers. Distributed file systems in clouds such as GFS and HDFS rely on central or master servers or nodes (Master for GFS and NameNode for HDFS) to manage the metadata and the load balancing. The master rebalances replicas periodically: data must be moved from one DataNode/chunkserver to another if free space on the first server falls below a certain threshold. However, this centralized approach can become a bottleneck for those master servers, if they become unable to manage a large number of file accesses, as it increases their already heavy loads. The load rebalance problem is NP-hard. In order to get a large number of chunkservers to work in collaboration, and to

The Holocaust and social media

The representation of the Holocaust on social media has been a subject of scholarly inquiry and media attention. == Selfies at Holocaust memorial sites == Some visitors take selfies at Holocaust memorials, which has been the subject of controversy. In 2018, Rhian Sugden, a British model, received criticism after posting a selfie at the Memorial to the Murdered Jews of Europe in Berlin with the caption "ET phone home". She later removed the caption, but defended taking the photograph. Other celebrities have also been criticised for photographs at the Berlin memorial, including Indian actress Priyanka Chopra and US politician Pete Buttigieg, whose husband posted a photograph of him at the memorial on a personal social media account. The Israeli artist and satirist Shahak Shapira set up the website yolocaust.de in 2017 to expose people who take inappropriate selfies at the Holocaust memorial in Berlin. Shapira went through thousands of selfies posted to social media sites such as Facebook, Instagram, Tinder, and Grindr, choosing the twelve that he found most offensive. When the images were moused over, the website replaces the memorial backdrop with black and white images of Nazi victims. "Yolocaust" is a portmanteau of "Holocaust" and YOLO, an acronym for "you only live once". The website went viral, receiving 1.2 million views in the first 24 hours after its launch. Shapira honored requests to take down all of the photographs, which he had used without permission, and the website remains with only a textual documentation of the project. In an analysis of comments by Internet users on the project, Christoph Bareither estimated that 75% were positive. However, the memorial's architect, Peter Eisenman, criticized the website. In his 2018 book Postcards from Auschwitz, Grinnell professor Daniel P. Reynolds defends the practice of selfie-taking at Holocaust sites. In 2019, the Auschwitz-Birkenau State Museum requested that visitors not take inappropriate selfies, although the museum's staff acknowledged that other visitors take selfies in a thoughtful and respectful manner, which they did not criticize. In an academic paper, Gemma Commane and Rebekah Potton analyze the use of Instagram to share tourist photographs at Holocaust sites and conclude that "Instagram encourages conversation and empathy, keeping the Holocaust visible in youth discourses". According to their analysis, most images are tagged with respectful hashtags such as #tragic, #remembrance, and #sadness. The Auschwitz museum has an official Instagram account, auschwitzmemorial, which it uses to share selected appropriate Instagram posts. However, the image feed for the hashtag "Auschwitz" includes potentially offensive images such as an image of "Nazi Vs. Jews #beerpong". This image, according to the authors, expresses "mockery and contempt" for Holocaust victims. They also document offensive memes using images of Holocaust atrocities and shared on Instagram. Some social media users post in order to criticize what they see as inappropriate behavior at Holocaust sites, with one commenting, "Taking photos posing next to razor wire, selfies with victim's hair in the background, and even group shots in front of the crematoria had to be seen to be believed." == Assessment of tourism == Social media posts have been used by researchers to analyze the phenomenon of Holocaust-related tourism. == Social media groups == People have created groups on Facebook to discuss issues related to the Holocaust. One paper analyses two such groups, "The Holocaust and My Family" and "The Descendants of the Victims and Survivors of the Holocaust" in which people engage in collective trauma processing. == Eva.stories == In 2019, Israeli high-tech entrepreneur Mati Kochavi created a fictitious Instagram account for Eva Heyman, a Hungarian-Jewish girl who was murdered in Auschwitz concentration camp. The project met with mixed reception. Israeli prime minister Benjamin Netanyahu praised the project, saying that it "exposes the immense tragedy of our people through the story of one girl". == Holocaust denial == The issue of Holocaust denial on social media has also attracted attention. In October 2020, Facebook reversed its policy and banned Holocaust denial from the platform. Founder Mark Zuckerberg had previously argued that such content should not be banned on freedom of speech grounds.