TurboQuant is an online vector quantization algorithm for compressing high-dimensional Euclidean vectors while preserving their geometric structure. It was proposed in 2025 by Amir Zandieh, Majid Daliri, Majid Hadian, and Vahab Mirrokni in the paper TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate. The paper lists Zandieh and Mirrokni as affiliated with Google Research, Daliri with New York University, and Hadian with Google DeepMind. The method was developed for applications including large language model (LLM) inference, key–value (KV) cache compression, vector databases, and nearest neighbor search. TurboQuant consists of two related algorithms: TurboQuantmse, which is optimized for mean squared error (MSE), and TurboQuantprod, which is optimized for unbiased inner product estimation. The algorithm uses a random rotation of input vectors, applies scalar quantizers to the rotated coordinates, and, for inner-product estimation, applies a one-bit Quantized Johnson–Lindenstrauss (QJL) transform to the residual error. == Background == Vector quantization is a compression method that maps high-dimensional vectors to a finite set of codewords. The problem has roots in Shannon's source coding theory and rate–distortion theory. In machine learning and information retrieval, vector quantization is used to reduce the memory required to store embeddings, activation vectors, and other numerical representations. In Transformer-based large language models, the KV cache stores key and value vectors from previous tokens during autoregressive decoding. The size of this cache grows with context length, the number of attention heads, and the number of concurrent requests, making it a major memory bottleneck in LLM serving. Similar compression problems appear in vector search, where large collections of embedding vectors must be stored and searched efficiently. Earlier approaches to vector quantization include product quantization, scalar quantization, and data-dependent k-means codebook construction. The TurboQuant paper argues that many existing methods either require offline preprocessing and calibration or suffer from suboptimal distortion guarantees in online settings. == Algorithm == === TurboQuantmse === TurboQuantmse is the version of the algorithm optimized for mean-squared error. For a unit vector x ∈ S d − 1 {\displaystyle x\in S^{d-1}} , the algorithm first applies a random rotation matrix Π ∈ R d × d {\displaystyle \Pi \in \mathbb {R} ^{d\times d}} and sets z = Π x {\displaystyle z=\Pi x} . Each coordinate of the rotated vector follows a shifted and scaled beta distribution, which converges to a normal distribution in high dimensions. In high dimensions, distinct coordinates also become nearly independent, allowing the algorithm to apply scalar quantizers independently to each coordinate. The scalar quantizer is constructed by solving a one-dimensional continuous k-means or Lloyd–Max quantization problem. If the centroids are c 1 , c 2 , … , c 2 b {\displaystyle c_{1},c_{2},\ldots ,c_{2^{b}}} , the quantization step stores, for each coordinate, i d x j = a r g m i n k ∈ [ 2 b ] | z j − c k | . {\displaystyle \mathrm {idx} _{j}=\operatorname {} {arg\,min}_{k\in [2^{b}]}|z_{j}-c_{k}|.} During dequantization, the stored index for each coordinate is replaced by the corresponding centroid, giving a reconstructed rotated vector z ~ {\displaystyle {\tilde {z}}} . The algorithm then rotates back: x ~ = Π ⊤ z ~ . {\displaystyle {\tilde {x}}=\Pi ^{\top }{\tilde {z}}.} The paper gives the following bound for TurboQuantmse: D m s e ≤ 3 π 2 ⋅ 1 4 b . {\displaystyle D_{\mathrm {mse} }\leq {\frac {\sqrt {3\pi }}{2}}\cdot {\frac {1}{4^{b}}}.} It also reports finer-grained MSE values of approximately 0.36, 0.117, 0.03, and 0.009 for bit-widths b = 1 , 2 , 3 , 4 {\displaystyle b=1,2,3,4} , respectively. === TurboQuantprod === TurboQuantprod is optimized for unbiased inner-product estimation. The authors note that an MSE-optimized quantizer may introduce bias when used to estimate inner products. To address this, TurboQuantprod first applies TurboQuantmse with bit-width b − 1 {\displaystyle b-1} , then applies a one-bit Quantized Johnson–Lindenstrauss transform to the remaining residual vector. Let r = x − Q m s e − 1 ( Q m s e ( x ) ) {\displaystyle r=x-Q_{\mathrm {mse} }^{-1}(Q_{\mathrm {mse} }(x))} be the residual after MSE quantization, and let γ = ‖ r ‖ 2 {\displaystyle \gamma =\|r\|_{2}} . The QJL step stores a sign vector for the residual. For γ ≠ 0 {\displaystyle \gamma \neq 0} , this can be written using the normalized residual u = r / γ {\displaystyle u=r/\gamma } : q j l = sign ( S u ) , {\displaystyle qjl=\operatorname {sign} (Su),} where S ∈ R d × d {\displaystyle S\in \mathbb {R} ^{d\times d}} is a random projection matrix. Since the sign function is invariant under positive rescaling, this is equivalent to sign ( S r ) {\displaystyle \operatorname {sign} (Sr)} when r ≠ 0 {\displaystyle r\neq 0} . If γ = 0 {\displaystyle \gamma =0} , the residual correction is zero. TurboQuantprod stores the MSE quantization, the QJL sign vector, and the residual norm: Q p r o d ( x ) = [ Q m s e ( x ) , q j l , γ ] . {\displaystyle Q_{\mathrm {prod} }(x)=\left[Q_{\mathrm {mse} }(x),qjl,\gamma \right].} The dequantized vector is reconstructed as x ~ = x ~ m s e + π / 2 d γ S ⊤ q j l . {\displaystyle {\tilde {x}}={\tilde {x}}_{\mathrm {mse} }+{\frac {\sqrt {\pi /2}}{d}}\,\gamma S^{\top }qjl.} The paper proves that TurboQuantprod is unbiased for inner-product estimation: E x ~ [ ⟨ y , x ~ ⟩ ] = ⟨ y , x ⟩ . {\displaystyle \mathbb {E} _{\tilde {x}}\left[\langle y,{\tilde {x}}\rangle \right]=\langle y,x\rangle .} It also gives the distortion bound D p r o d ≤ 3 π 2 ⋅ ‖ y ‖ 2 2 d ⋅ 1 4 b . {\displaystyle D_{\mathrm {prod} }\leq {\frac {\sqrt {3\pi }}{2}}\cdot {\frac {\|y\|_{2}^{2}}{d}}\cdot {\frac {1}{4^{b}}}.} == Performance and applications == The TurboQuant paper reports that the algorithm achieves near-optimal distortion rates within a small constant factor of information-theoretic lower bounds. The authors report that, for KV cache quantization, TurboQuant achieved quality neutrality at 3.5 bits per channel and marginal degradation at 2.5 bits per channel. In long-context LLM experiments using Llama 3.1 8B Instruct, the paper evaluated the method on a "needle-in-a-haystack" retrieval task with document lengths from 4,000 to 104,000 tokens. It reported that TurboQuant matched the uncompressed full-precision baseline while using more than 4× compression, and compared the method against PolarQuant, SnapKV, PyramidKV, and KIVI. Google Research stated that TurboQuant was evaluated on long-context benchmarks including LongBench, Needle in a Haystack, ZeroSCROLLS, RULER, and L-Eval using open-source models including Gemma and Mistral. According to a report in Tom's Hardware, Google described the method as reducing KV-cache memory by at least six times and achieving up to an eightfold improvement in attention-logit computation on Nvidia H100 GPUs compared with unquantized 32-bit keys. TurboQuant has also been applied to nearest-neighbor vector search. The original paper reports experiments on DBpedia entity embeddings and GloVe embeddings, comparing TurboQuant with product quantization and other vector-search quantization baselines. == Relationship to other methods == TurboQuant is related to several methods for efficient large language model inference and high-dimensional search: Product quantization – a vector quantization technique widely used for approximate nearest-neighbor search Quantization (machine learning) – reducing the numerical precision of weights, activations, or cached tensors in machine learning models PagedAttention – a memory-management algorithm for LLM serving that reduces fragmentation in the KV cache Johnson–Lindenstrauss lemma – a result in high-dimensional geometry used in random projection methods Lloyd's algorithm – an algorithm for scalar and vector quantization, including k-means-style codebook construction Unlike PagedAttention, which focuses on memory allocation and cache layout, TurboQuant reduces the numerical storage cost of the vectors themselves. Unlike many product-quantization methods, TurboQuant is designed to be data-oblivious and online, avoiding dataset-specific codebook training. == Limitations == The strongest performance claims for TurboQuant come from the original paper and Google Research's own publication. Coverage in technology media has noted that the broader impact of the method will depend on real-world implementation details, workloads, and hardware architectures.
Apache Pig
Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for relational database management systems. Pig Latin can be extended using user-defined functions (UDFs) which the user can write in Java, Python, JavaScript, Ruby or Groovy and then call directly from the language. == History == Apache Pig was originally developed at Yahoo Research around 2006 for researchers to have an ad hoc way of creating and executing MapReduce jobs on very large data sets. In 2007, it was moved into the Apache Software Foundation. === Naming === Regarding the naming of the Pig programming language, the name was chosen arbitrarily and stuck because it was memorable, easy to spell, and for novelty. The story goes that the researchers working on the project initially referred to it simply as 'the language'. Eventually they needed to call it something. Off the top of his head, one researcher suggested Pig, and the name stuck. It is quirky yet memorable and easy to spell. While some have hinted that the name sounds coy or silly, it has provided us with an entertaining nomenclature, such as Pig Latin for the language, Grunt for the shell, and PiggyBank for the CPAN-like shared repository. == Example == Below is an example of a "Word Count" program in Pig Latin: The above program will generate parallel executable tasks which can be distributed across multiple machines in a Hadoop cluster to count the number of words in a dataset such as all the webpages on the internet. == Pig vs SQL == In comparison to SQL, Pig has a nested relational model, uses lazy evaluation, uses extract, transform, load (ETL), is able to store data at any point during a pipeline, declares execution plans, supports pipeline splits, thus allowing workflows to proceed along DAGs instead of strictly sequential pipelines. On the other hand, it has been argued DBMSs are substantially faster than the MapReduce system once the data is loaded, but that loading the data takes considerably longer in the database systems. It has also been argued RDBMSs offer out of the box support for column-storage, working with compressed data, indexes for efficient random data access, and transaction-level fault tolerance. Pig Latin is procedural and fits very naturally in the pipeline paradigm while SQL is instead declarative. In SQL users can specify that data from two tables must be joined, but not what join implementation to use (You can specify the implementation of JOIN in SQL, thus "... for many SQL applications the query writer may not have enough knowledge of the data or enough expertise to specify an appropriate join algorithm."). Pig Latin allows users to specify an implementation or aspects of an implementation to be used in executing a script in several ways. In effect, Pig Latin programming is similar to specifying a query execution plan, making it easier for programmers to explicitly control the flow of their data processing task. SQL is oriented around queries that produce a single result. SQL handles trees naturally, but has no built in mechanism for splitting a data processing stream and applying different operators to each sub-stream. Pig Latin script describes a directed acyclic graph (DAG) rather than a pipeline. Pig Latin's ability to include user code at any point in the pipeline is useful for pipeline development. If SQL is used, data must first be imported into the database, and then the cleansing and transformation process can begin.
MDS matrix
An MDS matrix (maximum distance separable) is a matrix representing a function with certain diffusion properties that have useful applications in cryptography. Technically, an m × n {\displaystyle m\times n} matrix A {\displaystyle A} over a finite field K {\displaystyle K} is an MDS matrix if it is the transformation matrix of a linear transformation f ( x ) = A x {\displaystyle f(x)=Ax} from K n {\displaystyle K^{n}} to K m {\displaystyle K^{m}} such that no two different ( m + n ) {\displaystyle (m+n)} -tuples of the form ( x , f ( x ) ) {\displaystyle (x,f(x))} coincide in n {\displaystyle n} or more components. Equivalently, the set of all ( m + n ) {\displaystyle (m+n)} -tuples ( x , f ( x ) ) {\displaystyle (x,f(x))} is an MDS code, i.e., a linear code that reaches the Singleton bound. Let A ~ = ( I n A ) {\displaystyle {\tilde {A}}={\begin{pmatrix}\mathrm {I} _{n}\\\hline \mathrm {A} \end{pmatrix}}} be the matrix obtained by joining the identity matrix I n {\displaystyle \mathrm {I} _{n}} to A {\displaystyle A} . Then a necessary and sufficient condition for a matrix A {\displaystyle A} to be MDS is that every possible n × n {\displaystyle n\times n} submatrix obtained by removing m {\displaystyle m} rows from A ~ {\displaystyle {\tilde {A}}} is non-singular. This is also equivalent to the following: all the sub-determinants of the matrix A {\displaystyle A} are non-zero. Then a binary matrix A {\displaystyle A} (namely over the field with two elements) is never MDS unless it has only one row or only one column with all components 1 {\displaystyle 1} . Reed–Solomon codes have the MDS property and are frequently used to obtain the MDS matrices used in cryptographic algorithms. Serge Vaudenay suggested using MDS matrices in cryptographic primitives to produce what he called multipermutations, not-necessarily linear functions with this same property. These functions have what he called perfect diffusion: changing t {\displaystyle t} of the inputs changes at least m − t + 1 {\displaystyle m-t+1} of the outputs. He showed how to exploit imperfect diffusion to cryptanalyze functions that are not multipermutations. MDS matrices are used for diffusion in such block ciphers as AES, SHARK, Square, Twofish, Anubis, KHAZAD, Manta, Hierocrypt, Kalyna, Camellia and HADESMiMC, and in the stream cipher MUGI and the cryptographic hash function Whirlpool, Poseidon.
Secret London
Secret London is a Facebook group started by 21-year-old Bristol University graduate, Tiffany Philippou, on 19 January 2010 in response to a Saatchi & Saatchi competition. The group grew rapidly (180,000 members as of 8 February 2010) and is composed mostly of Londoners who use the site to share suggestions and photos of London. After the group's early success, the founder announced her intention to launch a website of the same name by crowdsourcing the design and development. The website was launched on 16 February 2010. == Other secret cities == Following the initial success of Secret London, a number of other secret groups were independently started around the world, some of which already have over 100,000 users. As of 19 February 2010, the list of other groups includes: Secret Frankfurt, Secret Tel Aviv, Secret Paris, Secret New York, Secret Tokyo, Secret Toronto, Secret Los Angeles, Secret Exeter, Secret Boston, Secret Norwich, Secret Singapore, Secret Brighton, Secret Minneapolis, Secret Sydney, Secret Canberra, Secret Brisbane, Secret Wellington, Secret Christchurch, Secret Madeira, Secret Funchal, Secret Bristol and Secret Cardiff. == Controversy == Some commentators have questioned whether it possible to share secrets without compromising them, and whether sharing tips publicly will lead to over-exposure of the businesses who are recommended.
Data validation and reconciliation
Industrial process data validation and reconciliation, or more briefly, process data reconciliation (PDR), is a technology that uses process information and mathematical methods in order to automatically ensure data validation and reconciliation by correcting measurements in industrial processes. The use of PDR allows for extracting accurate and reliable information about the state of industry processes from raw measurement data and produces a single consistent set of data representing the most likely process operation. == Models, data and measurement errors == Industrial processes, for example chemical or thermodynamic processes in chemical plants, refineries, oil or gas production sites, or power plants, are often represented by two fundamental means: Models that express the general structure of the processes, Data that reflects the state of the processes at a given point in time. Models can have different levels of detail, for example one can incorporate simple mass or compound conservation balances, or more advanced thermodynamic models including energy conservation laws. Mathematically the model can be expressed by a nonlinear system of equations F ( y ) = 0 {\displaystyle F(y)=0\,} in the variables y = ( y 1 , … , y n ) {\displaystyle y=(y_{1},\ldots ,y_{n})} , which incorporates all the above-mentioned system constraints (for example the mass or heat balances around a unit). A variable could be the temperature or the pressure at a certain place in the plant. === Error types === Data originates typically from measurements taken at different places throughout the industrial site, for example temperature, pressure, volumetric flow rate measurements etc. To understand the basic principles of PDR, it is important to first recognize that plant measurements are never 100% correct, i.e. raw measurement y {\displaystyle y\,} is not a solution of the nonlinear system F ( y ) = 0 {\displaystyle F(y)=0\,\!} . When using measurements without correction to generate plant balances, it is common to have incoherencies. Measurement errors can be categorized into two basic types: random errors due to intrinsic sensor accuracy and systematic errors (or gross errors) due to sensor calibration or faulty data transmission. Random errors means that the measurement y {\displaystyle y\,\!} is a random variable with mean y ∗ {\displaystyle y^{}\,\!} , where y ∗ {\displaystyle y^{}\,\!} is the true value that is typically not known. A systematic error on the other hand is characterized by a measurement y {\displaystyle y\,\!} which is a random variable with mean y ¯ {\displaystyle {\bar {y}}\,\!} , which is not equal to the true value y ∗ {\displaystyle y^{}\,} . For ease in deriving and implementing an optimal estimation solution, and based on arguments that errors are the sum of many factors (so that the Central limit theorem has some effect), data reconciliation assumes these errors are normally distributed. Other sources of errors when calculating plant balances include process faults such as leaks, unmodeled heat losses, incorrect physical properties or other physical parameters used in equations, and incorrect structure such as unmodeled bypass lines. Other errors include unmodeled plant dynamics such as holdup changes, and other instabilities in plant operations that violate steady state (algebraic) models. Additional dynamic errors arise when measurements and samples are not taken at the same time, especially lab analyses. The normal practice of using time averages for the data input partly reduces the dynamic problems. However, that does not completely resolve timing inconsistencies for infrequently-sampled data like lab analyses. This use of average values, like a moving average, acts as a low-pass filter, so high frequency noise is mostly eliminated. The result is that, in practice, data reconciliation is mainly making adjustments to correct systematic errors like biases. === Necessity of removing measurement errors === ISA-95 is the international standard for the integration of enterprise and control systems It asserts that: Data reconciliation is a serious issue for enterprise-control integration. The data have to be valid to be useful for the enterprise system. The data must often be determined from physical measurements that have associated error factors. This must usually be converted into exact values for the enterprise system. This conversion may require manual, or intelligent reconciliation of the converted values [...]. Systems must be set up to ensure that accurate data are sent to production and from production. Inadvertent operator or clerical errors may result in too much production, too little production, the wrong production, incorrect inventory, or missing inventory. == History == PDR has become more and more important due to industrial processes that are becoming more and more complex. PDR started in the early 1960s with applications aiming at closing material balances in production processes where raw measurements were available for all variables. At the same time the problem of gross error identification and elimination has been presented. In the late 1960s and 1970s unmeasured variables were taken into account in the data reconciliation process., PDR also became more mature by considering general nonlinear equation systems coming from thermodynamic models., , Quasi steady state dynamics for filtering and simultaneous parameter estimation over time were introduced in 1977 by Stanley and Mah. Dynamic PDR was formulated as a nonlinear optimization problem by Liebman et al. in 1992. == Data reconciliation == Data reconciliation is a technique that targets at correcting measurement errors that are due to measurement noise, i.e. random errors. From a statistical point of view the main assumption is that no systematic errors exist in the set of measurements, since they may bias the reconciliation results and reduce the robustness of the reconciliation. Given n {\displaystyle n} measurements y i {\displaystyle y_{i}} , data reconciliation can mathematically be expressed as an optimization problem of the following form: min x , y ∗ ∑ i = 1 n ( y i ∗ − y i σ i ) 2 subject to F ( x , y ∗ ) = 0 y min ≤ y ∗ ≤ y max x min ≤ x ≤ x max , {\displaystyle {\begin{aligned}\min _{x,y^{}}&\sum _{i=1}^{n}\left({\frac {y_{i}^{}-y_{i}}{\sigma _{i}}}\right)^{2}\\{\text{subject to }}&F(x,y^{})=0\\&y_{\min }\leq y^{}\leq y_{\max }\\&x_{\min }\leq x\leq x_{\max },\end{aligned}}\,\!} where y i ∗ {\displaystyle y_{i}^{}\,\!} is the reconciled value of the i {\displaystyle i} -th measurement ( i = 1 , … , n {\displaystyle i=1,\ldots ,n\,\!} ), y i {\displaystyle y_{i}\,\!} is the measured value of the i {\displaystyle i} -th measurement ( i = 1 , … , n {\displaystyle i=1,\ldots ,n\,\!} ), x j {\displaystyle x_{j}\,\!} is the j {\displaystyle j} -th unmeasured variable ( j = 1 , … , m {\displaystyle j=1,\ldots ,m\,\!} ), and σ i {\displaystyle \sigma _{i}\,\!} is the standard deviation of the i {\displaystyle i} -th measurement ( i = 1 , … , n {\displaystyle i=1,\ldots ,n\,\!} ), F ( x , y ∗ ) = 0 {\displaystyle F(x,y^{})=0\,\!} are the p {\displaystyle p\,\!} process equality constraints and x min , x max , y min , y max {\displaystyle x_{\min },x_{\max },y_{\min },y_{\max }\,\!} are the bounds on the measured and unmeasured variables. The term ( y i ∗ − y i σ i ) 2 {\displaystyle \left({\frac {y_{i}^{}-y_{i}}{\sigma _{i}}}\right)^{2}\,\!} is called the penalty of measurement i. The objective function is the sum of the penalties, which will be denoted in the following by f ( y ∗ ) = ∑ i = 1 n ( y i ∗ − y i σ i ) 2 {\displaystyle f(y^{})=\sum _{i=1}^{n}\left({\frac {y_{i}^{}-y_{i}}{\sigma _{i}}}\right)^{2}} . In other words, one wants to minimize the overall correction (measured in the least squares term) that is needed in order to satisfy the system constraints. Additionally, each least squares term is weighted by the standard deviation of the corresponding measurement. The standard deviation is related to the accuracy of the measurement. For example, at a 95% confidence level, the standard deviation is about half the accuracy. === Redundancy === Data reconciliation relies strongly on the concept of redundancy to correct the measurements as little as possible in order to satisfy the process constraints. Here, redundancy is defined differently from redundancy in information theory. Instead, redundancy arises from combining sensor data with the model (algebraic constraints), sometimes more specifically called "spatial redundancy", "analytical redundancy", or "topological redundancy". Redundancy can be due to sensor redundancy, where sensors are duplicated in order to have more than one measurement of the same quantity. Redundancy also arises when a single variable can be estimated in several independent ways from separate sets of measurements at a given time or time averaging period, using the algebraic constraints. Redundancy is linked to the concept
Toy problem
In scientific disciplines, a toy problem or a puzzlelike problem is a problem that is not of immediate scientific interest, yet is used as an expository device to illustrate a trait that may be shared by other, more complicated, instances of the problem, or as a way to explain a particular, more general, problem solving technique. A toy problem is useful to test and demonstrate methodologies. Researchers can use toy problems to compare the performance of different algorithms. They are also good for game designing. For instance, while engineering a large system, the large problem is often broken down into many smaller toy problems which have been well understood in detail. Often these problems distill a few important aspects of complicated problems so that they can be studied in isolation. Toy problems are thus often very useful in providing intuition about specific phenomena in more complicated problems. As an example, in the field of artificial intelligence, classical puzzles, games and problems are often used as toy problems. These include sliding-block puzzles, N-Queens problem, missionaries and cannibals problem, tic-tac-toe, chess, Tower of Hanoi and others.
Storyful
Storyful (stylized as storyful.) is a social media intelligence company headquartered in Dublin, Ireland that is a subsidiary of News Corp, offering services such as social news monitoring, video licensing, and reputation risk management tools for corporate clients. The startup was launched as the first social media newswire, a content aggregator, verifying news sources and online content in Dublin in 2010 by Mark Little, a former journalist with RTÉ News. Storyful was acquired by News Corp in 2013 for USD$25 million. == Background == Mark Little, who had worked as a television journalist for RTÉ One, founded startup Storyful in Dublin, Ireland, in 2010, as a service that "verified news sources and online content". According to Nieman Lab, Storyful had a reputation for content aggregation as a social news agency—finding, verifying, distributing, licensing, and commercializing user-generated content, social media and online content from social networking services, including videos about stories in the news, such as the Syrian Civil War, Arab Spring protests, as well as "smaller viral moments". Storyful aimed to provide authority through its verification and monitoring tools while providing authenticity through user-generated content. On 20 December 2013 News Corp purchased Storyful for US$25 million and opened a New York office in the same building as Fox News' main studios. Little left Storyful in 2015 and Gavin Sheridan, Storyful's director of innovation left in 2014. News Corp CEO Robert Thomson said that through Storyful, News Corp would "define the opportunities that the digital landscape presents, rather than simply adapt to them." After the acquisition, the company expanded its service to include "commercial and creative work". After Murdoch acquired the company, from 2014 through to February 2018, losses "swelled", requiring a series of cash injections from News Corp. During that time the company expanded aggressively globally with a staff of about 200 worldwide up from about 30 in 2014. According to The Guardian, in 2016, journalists were encouraged by Storyful to use the social media monitoring software called Verify developed by Storyful. By installing Verify's web browser extension on their computers, Verify would inform the journalists when social media content had been "verified and cleared". The Guardian revealed that through the Verify plugin, dozens of staff in four offices had access to the journalists browsing activity without them knowing. This data allowed Storyful to actively monitor its own clients' activities on social media and to "turn it into an internal feed" at Storyful that "updates in real time". In November 2018, when a video circulated by Infowars' Paul Joseph Watson appeared to prove that CNN's Jim Acosta's contact with a White House intern was a physical blow, Storyful was able to prove that the 15-second-long clip had been doctored. According to a 21 January 2019 article in CNN Business, Rob McDonagh, the editor of Storyful's U.S. news team, had proven that one of the viral videos that served as catalysts in the January 2019 Lincoln Memorial confrontation at 18 January 2019 Indigenous Peoples March, was posted by a suspicious account, under the handle @2020fight. McDonagh's team validates videos and posts before adding them to their "digest", distinguishing true stories from those that are not. Storyful attempts to validate each post or video before including it in its digest. McDonagh reviewed previous content from @2020fight's account, and found it suspicious because it had a high follower count, a "highly polarized and yet inconsistent political messaging", an "unusually high rate of tweets", and "the use of someone else's image in the profile photo." reporter Donie O'Sullivan said that the @2020fight video that had been posted on 18 January, which had 2.5 million views by 22 January, was the one that "helped frame the news cycle". Currently the website offers a service by which video can be commercially brokered. == Services == Services include a newswire service—one of their "core pillars"—and social news monitoring. By February 2018, Storyful was developing "risk and reputation monitoring" services through which they would source and verify social news, fact-checking it and contextualising it for corporate clients. They were "developing tech tools" to "explore obscure or closed networks" for their intelligence team. can use to explore obscure or closed networks. They "track deviations in social conversations around brands and organisations and catch potential risks before they blow up. Like an alerts system." The company "released a re-booted version of its Newswire platform in 2018. According to FORA, Storyful was developing new tools to combat fake news online. == Clients == When Storyful was acquired by News Corp in 2013, the company already had the Wall Street Journal, the BBC, New York Times, YouTube, ITN and Channel 4 News as clients. By 2018 their clients included CNN, ABC News and Fox News, The New York Times, the Washington Post, in the United States, the Australian Broadcasting Corporation and all of News Corp’s own publications. Most of their "reputation-conscious corporate customers" clients prefer to not be named.