Least-squares support vector machine

Least-squares support-vector machines (LS-SVM) for statistics and in statistical modeling, are least-squares versions of support-vector machines (SVM), which are a set of related supervised learning methods that analyze data and recognize patterns, and which are used for classification and regression analysis. In this version one finds the solution by solving a set of linear equations instead of a convex quadratic programming (QP) problem for classical SVMs. Least-squares SVM classifiers were proposed by Johan Suykens and Joos Vandewalle. LS-SVMs are a class of kernel-based learning methods. == From support-vector machine to least-squares support-vector machine == Given a training set { x i , y i } i = 1 N {\displaystyle \{x_{i},y_{i}\}_{i=1}^{N}} with input data x i ∈ R n {\displaystyle x_{i}\in \mathbb {R} ^{n}} and corresponding binary class labels y i ∈ { − 1 , + 1 } {\displaystyle y_{i}\in \{-1,+1\}} , the SVM classifier, according to Vapnik's original formulation, satisfies the following conditions: { w T ϕ ( x i ) + b ≥ 1 , if y i = + 1 , w T ϕ ( x i ) + b ≤ − 1 , if y i = − 1 , {\displaystyle {\begin{cases}w^{T}\phi (x_{i})+b\geq 1,&{\text{if }}\quad y_{i}=+1,\\w^{T}\phi (x_{i})+b\leq -1,&{\text{if }}\quad y_{i}=-1,\end{cases}}} which is equivalent to y i [ w T ϕ ( x i ) + b ] ≥ 1 , i = 1 , … , N , {\displaystyle y_{i}\left[{w^{T}\phi (x_{i})+b}\right]\geq 1,\quad i=1,\ldots ,N,} where ϕ ( x ) {\displaystyle \phi (x)} is the nonlinear map from original space to the high- or infinite-dimensional space. === Inseparable data === In case such a separating hyperplane does not exist, we introduce so-called slack variables ξ i {\displaystyle \xi _{i}} such that { y i [ w T ϕ ( x i ) + b ] ≥ 1 − ξ i , i = 1 , … , N , ξ i ≥ 0 , i = 1 , … , N . {\displaystyle {\begin{cases}y_{i}\left[{w^{T}\phi (x_{i})+b}\right]\geq 1-\xi _{i},&i=1,\ldots ,N,\\\xi _{i}\geq 0,&i=1,\ldots ,N.\end{cases}}} According to the structural risk minimization principle, the risk bound is minimized by the following minimization problem: min J 1 ( w , ξ ) = 1 2 w T w + c ∑ i = 1 N ξ i , {\displaystyle \min J_{1}(w,\xi )={\frac {1}{2}}w^{T}w+c\sum \limits _{i=1}^{N}\xi _{i},} Subject to { y i [ w T ϕ ( x i ) + b ] ≥ 1 − ξ i , i = 1 , … , N , ξ i ≥ 0 , i = 1 , … , N , {\displaystyle {\text{Subject to }}{\begin{cases}y_{i}\left[{w^{T}\phi (x_{i})+b}\right]\geq 1-\xi _{i},&i=1,\ldots ,N,\\\xi _{i}\geq 0,&i=1,\ldots ,N,\end{cases}}} To solve this problem, we could construct the Lagrangian function: L 1 ( w , b , ξ , α , β ) = 1 2 w T w + c ∑ i = 1 N ξ i − ∑ i = 1 N α i { y i [ w T ϕ ( x i ) + b ] − 1 + ξ i } − ∑ i = 1 N β i ξ i , {\displaystyle L_{1}(w,b,\xi ,\alpha ,\beta )={\frac {1}{2}}w^{T}w+c\sum \limits _{i=1}^{N}{\xi _{i}}-\sum \limits _{i=1}^{N}\alpha _{i}\left\{y_{i}\left[{w^{T}\phi (x_{i})+b}\right]-1+\xi _{i}\right\}-\sum \limits _{i=1}^{N}\beta _{i}\xi _{i},} where α i ≥ 0 , β i ≥ 0 ( i = 1 , … , N ) {\displaystyle \alpha _{i}\geq 0,\ \beta _{i}\geq 0\ (i=1,\ldots ,N)} are the Lagrangian multipliers. The optimal point will be in the saddle point of the Lagrangian function, and then we obtain By substituting w {\displaystyle w} by its expression in the Lagrangian formed from the appropriate objective and constraints, we will get the following quadratic programming problem: max Q 1 ( α ) = − 1 2 ∑ i , j = 1 N α i α j y i y j K ( x i , x j ) + ∑ i = 1 N α i , {\displaystyle \max Q_{1}(\alpha )=-{\frac {1}{2}}\sum \limits _{i,j=1}^{N}{\alpha _{i}\alpha _{j}y_{i}y_{j}K(x_{i},x_{j})}+\sum \limits _{i=1}^{N}\alpha _{i},} where K ( x i , x j ) = ⟨ ϕ ( x i ) , ϕ ( x j ) ⟩ {\displaystyle K(x_{i},x_{j})=\left\langle \phi (x_{i}),\phi (x_{j})\right\rangle } is called the kernel function. Solving this QP problem subject to constraints in (1), we will get the hyperplane in the high-dimensional space and hence the classifier in the original space. === Least-squares SVM formulation === The least-squares version of the SVM classifier is obtained by reformulating the minimization problem as min J 2 ( w , b , e ) = μ 2 w T w + ζ 2 ∑ i = 1 N e i 2 , {\displaystyle \min J_{2}(w,b,e)={\frac {\mu }{2}}w^{T}w+{\frac {\zeta }{2}}\sum \limits _{i=1}^{N}e_{i}^{2},} subject to the equality constraints y i [ w T ϕ ( x i ) + b ] = 1 − e i , i = 1 , … , N . {\displaystyle y_{i}\left[{w^{T}\phi (x_{i})+b}\right]=1-e_{i},\quad i=1,\ldots ,N.} The least-squares SVM (LS-SVM) classifier formulation above implicitly corresponds to a regression interpretation with binary targets y i = ± 1 {\displaystyle y_{i}=\pm 1} . Using y i 2 = 1 {\displaystyle y_{i}^{2}=1} , we have ∑ i = 1 N e i 2 = ∑ i = 1 N ( y i e i ) 2 = ∑ i = 1 N e i 2 = ∑ i = 1 N ( y i − ( w T ϕ ( x i ) + b ) ) 2 , {\displaystyle \sum \limits _{i=1}^{N}e_{i}^{2}=\sum \limits _{i=1}^{N}(y_{i}e_{i})^{2}=\sum \limits _{i=1}^{N}e_{i}^{2}=\sum \limits _{i=1}^{N}\left(y_{i}-(w^{T}\phi (x_{i})+b)\right)^{2},} with e i = y i − ( w T ϕ ( x i ) + b ) . {\displaystyle e_{i}=y_{i}-(w^{T}\phi (x_{i})+b).} Notice, that this error would also make sense for least-squares data fitting, so that the same end results holds for the regression case. Hence the LS-SVM classifier formulation is equivalent to J 2 ( w , b , e ) = μ E W + ζ E D {\displaystyle J_{2}(w,b,e)=\mu E_{W}+\zeta E_{D}} with E W = 1 2 w T w {\displaystyle E_{W}={\frac {1}{2}}w^{T}w} and E D = 1 2 ∑ i = 1 N e i 2 = 1 2 ∑ i = 1 N ( y i − ( w T ϕ ( x i ) + b ) ) 2 . {\displaystyle E_{D}={\frac {1}{2}}\sum \limits _{i=1}^{N}e_{i}^{2}={\frac {1}{2}}\sum \limits _{i=1}^{N}\left(y_{i}-(w^{T}\phi (x_{i})+b)\right)^{2}.} Both μ {\displaystyle \mu } and ζ {\displaystyle \zeta } should be considered as hyperparameters to tune the amount of regularization versus the sum squared error. The solution does only depend on the ratio γ = ζ / μ {\displaystyle \gamma =\zeta /\mu } , therefore the original formulation uses only γ {\displaystyle \gamma } as tuning parameter. We use both μ {\displaystyle \mu } and ζ {\displaystyle \zeta } as parameters in order to provide a Bayesian interpretation to LS-SVM. The solution of LS-SVM regressor will be obtained after we construct the Lagrangian function: { L 2 ( w , b , e , α ) = J 2 ( w , e ) − ∑ i = 1 N α i { [ w T ϕ ( x i ) + b ] + e i − y i } , = 1 2 w T w + γ 2 ∑ i = 1 N e i 2 − ∑ i = 1 N α i { [ w T ϕ ( x i ) + b ] + e i − y i } , {\displaystyle {\begin{cases}L_{2}(w,b,e,\alpha )\;=J_{2}(w,e)-\sum \limits _{i=1}^{N}\alpha _{i}\left\{{\left[{w^{T}\phi (x_{i})+b}\right]+e_{i}-y_{i}}\right\},\\\quad \quad \quad \quad \quad \;={\frac {1}{2}}w^{T}w+{\frac {\gamma }{2}}\sum \limits _{i=1}^{N}e_{i}^{2}-\sum \limits _{i=1}^{N}\alpha _{i}\left\{\left[w^{T}\phi (x_{i})+b\right]+e_{i}-y_{i}\right\},\end{cases}}} where α i ∈ R {\displaystyle \alpha _{i}\in \mathbb {R} } are the Lagrange multipliers. The conditions for optimality are { ∂ L 2 ∂ w = 0 → w = ∑ i = 1 N α i ϕ ( x i ) , ∂ L 2 ∂ b = 0 → ∑ i = 1 N α i = 0 , ∂ L 2 ∂ e i = 0 → α i = γ e i , i = 1 , … , N , ∂ L 2 ∂ α i = 0 → y i = w T ϕ ( x i ) + b + e i , i = 1 , … , N . {\displaystyle {\begin{cases}{\frac {\partial L_{2}}{\partial w}}=0\quad \to \quad w=\sum \limits _{i=1}^{N}\alpha _{i}\phi (x_{i}),\\{\frac {\partial L_{2}}{\partial b}}=0\quad \to \quad \sum \limits _{i=1}^{N}\alpha _{i}=0,\\{\frac {\partial L_{2}}{\partial e_{i}}}=0\quad \to \quad \alpha _{i}=\gamma e_{i},\;i=1,\ldots ,N,\\{\frac {\partial L_{2}}{\partial \alpha _{i}}}=0\quad \to \quad y_{i}=w^{T}\phi (x_{i})+b+e_{i},\,i=1,\ldots ,N.\end{cases}}} Elimination of w {\displaystyle w} and e {\displaystyle e} will yield a linear system instead of a quadratic programming problem: [ 0 1 N T 1 N Ω + γ − 1 I N ] [ b α ] = [ 0 Y ] , {\displaystyle \left[{\begin{matrix}0&1_{N}^{T}\\1_{N}&\Omega +\gamma ^{-1}I_{N}\end{matrix}}\right]\left[{\begin{matrix}b\\\alpha \end{matrix}}\right]=\left[{\begin{matrix}0\\Y\end{matrix}}\right],} with Y = [ y 1 , … , y N ] T {\displaystyle Y=[y_{1},\ldots ,y_{N}]^{T}} , 1 N = [ 1 , … , 1 ] T {\displaystyle 1_{N}=[1,\ldots ,1]^{T}} and α = [ α 1 , … , α N ] T {\displaystyle \alpha =[\alpha _{1},\ldots ,\alpha _{N}]^{T}} . Here, I N {\displaystyle I_{N}} is an N × N {\displaystyle N\times N} identity matrix, and Ω ∈ R N × N {\displaystyle \Omega \in \mathbb {R} ^{N\times N}} is the kernel matrix defined by Ω i j = ϕ ( x i ) T ϕ ( x j ) = K ( x i , x j ) {\displaystyle \Omega _{ij}=\phi (x_{i})^{T}\phi (x_{j})=K(x_{i},x_{j})} . === Kernel function K === For the kernel function K(•, •) one typically has the following choices: Linear kernel : K ( x , x i ) = x i T x , {\displaystyle K(x,x_{i})=x_{i}^{T}x,} Polynomial kernel of degree d {\displaystyle d} : K ( x , x i ) = ( 1 + x i T x / c ) d , {\displaystyle K(x,x_{i})=\left({1+x_{i}^{T}x/c}\right)^{d},} Radial basis function RBF kernel : K ( x , x i ) = exp ⁡ ( − ‖ x − x i ‖ 2 / σ 2 ) , {\displaystyle K(x,x_{i})=\exp \left({-\left\|{x-x_{i}}\right\|^{2}/\sigma ^{2}}\right),} MLP kernel : K ( x , x i ) = tanh ⁡ ( k x i T x + θ ) , {\displaystyle K(x,x_{i})=\tanh \left({k

Non-local means

Non-local means is an algorithm in image processing for image denoising. Unlike "local mean" filters, which take the mean value of a group of pixels surrounding a target pixel to smooth the image, non-local means filtering takes a mean of all pixels in the image, weighted by how similar these pixels are to the target pixel. This results in much greater post-filtering clarity, and less loss of detail in the image compared with local mean algorithms. If compared with other well-known denoising techniques, non-local means adds "method noise" (i.e. error in the denoising process) which looks more like white noise, which is desirable because it is typically less disturbing in the denoised product. Recently non-local means has been extended to other image processing applications such as deinterlacing, view interpolation, and depth maps regularization. == Definition == Suppose Ω {\displaystyle \Omega } is the area of an image, and p {\displaystyle p} and q {\displaystyle q} are two points within the image. Then, the algorithm is: u ( p ) = 1 C ( p ) ∫ Ω v ( q ) f ( p , q ) d q . {\displaystyle u(p)={1 \over C(p)}\int _{\Omega }v(q)f(p,q)\,\mathrm {d} q.} where u ( p ) {\displaystyle u(p)} is the filtered value of the image at point p {\displaystyle p} , v ( q ) {\displaystyle v(q)} is the unfiltered value of the image at point q {\displaystyle q} , f ( p , q ) {\displaystyle f(p,q)} is the weighting function, and the integral is evaluated ∀ q ∈ Ω {\displaystyle \forall q\in \Omega } . C ( p ) {\displaystyle C(p)} is a normalizing factor, given by C ( p ) = ∫ Ω f ( p , q ) d q . {\displaystyle C(p)=\int _{\Omega }f(p,q)\,\mathrm {d} q.} == Common weighting functions == The purpose of the weighting function, f ( p , q ) {\displaystyle f(p,q)} , is to determine how closely related the image at the point p {\displaystyle p} is to the image at the point q {\displaystyle q} . It can take many forms. === Gaussian === The Gaussian weighting function sets up a normal distribution with a mean, μ = B ( p ) {\displaystyle \mu =B(p)} and a variable standard deviation: f ( p , q ) = e − | B ( q ) − B ( p ) | 2 h 2 {\displaystyle f(p,q)=e^{-{{\left\vert B(q)-B(p)\right\vert ^{2}} \over h^{2}}}} where h {\displaystyle h} is the filtering parameter (i.e., standard deviation) and B ( p ) {\displaystyle B(p)} is the local mean value of the image point values surrounding p {\displaystyle p} . == Discrete algorithm == For an image, Ω {\displaystyle \Omega } , with discrete pixels, a discrete algorithm is required. u ( p ) = 1 C ( p ) ∑ q ∈ Ω v ( q ) f ( p , q ) {\displaystyle u(p)={1 \over C(p)}\sum _{q\in \Omega }v(q)f(p,q)} where, once again, v ( q ) {\displaystyle v(q)} is the unfiltered value of the image at point q {\displaystyle q} . C ( p ) {\displaystyle C(p)} is given by: C ( p ) = ∑ q ∈ Ω f ( p , q ) {\displaystyle C(p)=\sum _{q\in \Omega }f(p,q)} Then, for a Gaussian weighting function, f ( p , q ) = e − | B ( q ) 2 − B ( p ) 2 | h 2 {\displaystyle f(p,q)=e^{-{{\left\vert B(q)^{2}-B(p)^{2}\right\vert } \over h^{2}}}} where B ( p ) {\displaystyle B(p)} is given by: B ( p ) = 1 | R ( p ) | ∑ i ∈ R ( p ) v ( i ) {\displaystyle B(p)={1 \over |R(p)|}\sum _{i\in R(p)}v(i)} where R ( p ) ⊆ Ω {\displaystyle R(p)\subseteq \Omega } and is a square region of pixels surrounding p {\displaystyle p} and | R ( p ) | {\displaystyle |R(p)|} is the number of pixels in the region R {\displaystyle R} . == Efficient implementation == The computational complexity of the non-local means algorithm is quadratic in the number of pixels in the image, making it particularly expensive to apply directly. Several techniques were proposed to speed up execution. One simple variant consists of restricting the computation of the mean for each pixel to a search window centred on the pixel itself, instead of the whole image. Another approximation uses summed-area tables and fast Fourier transform to calculate the similarity window between two pixels, speeding up the algorithm by a factor of 50 while preserving comparable quality of the result.

Digital goods

Digital goods or e-goods are intangible goods that exist in digital form. Examples are Wikipedia articles; digital media, such as e-books, downloadable music, internet radio, internet television and streaming media; fonts, logos, photos and graphics; digital subscriptions; online ads (as purchased by the advertiser); internet coupons; electronic tickets; electronically treated documentation in many different fields; downloadable software (Digital Distribution) and mobile apps; cloud-based applications and online games; virtual goods used within the virtual economies of online games and communities; community access; workbooks; worksheets; planners; e-learning (online courses); webinars, video tutorials, blog posts; cards; patterns; website themes and templates. == Legal concerns about digital goods == Special legal concerns regarding digital goods include copyright infringement and taxation. Also the question of the ownership (versus licensed use or service only) of purely digital goods is not finally resolved. For instance, the software installers of the digital software distributor gog.com are technically independent to the account but are still subject to the EULA, where a "licensed, not sold" formulation is used. Therefore, it is not clear if the software can be legally used after a hypothetical loss of the account; a question which was also raised before in practice for the similar service Steam. In July 2012, the European Court of Justice ruled in the case UsedSoft GMbH v. Oracle International Corp. that the sale of a software product, either through a physical support or download, constituted a transfer of ownership in EU law, thus the first sale doctrine applies; the ruling thereby breaks the "licensed, not sold" legal theory, but leaves open numerous questions. Therefore, it is also permissible to resell software licenses even if the digital good has been downloaded directly from the Internet, as the first-sale doctrine applied whenever software was originally sold to a customer for an unlimited amount of time, thus prohibiting any software maker from preventing the resale of their software by any of their legitimate owners. The court requires that the previous owner must no longer be able to use the licensed software after the resale, but finds that the practical difficulties in enforcing this clause should not be an obstacle to authorizing resale, as they are also present for software which can be installed from physical supports, where the first-sale doctrine is in force. In several cases, content providers have faced criticism for revoking access to digital goods due to expired licenses or the discontinuation of a product, such as ebooks (which resulted in a lawsuit against Amazon.com, Inc.), digital video (with Sony Interactive Entertainment revoking access to purchased StudioCanal content from its now-defunct PlayStation video store; a similar move involving Warner Bros. Discovery content was averted by an updated license agreement), and video games (such as Ubisoft discontinuing and revoking access to its game The Crew without providing refunds or the ability to redownload the game) In September 2024, the U.S. state of California implemented a consumer protection law that prohibits the use of terms such as "buy" or "purchase" during transactions involving digital goods if there is no way to obtain the purchases in a manner that cannot be revoked by the seller (such as allowing it to be downloaded for permanent, offline access), and requires a disclaimer to be displayed to the customer at the time of purchase.

FactorDaily

FactorDaily is an Indian digital media publication founded in 2016 by Pankaj Mishra and Jayadevan PK. Mishra was formerly an Editor at TechCrunch and the Economic Times. The digital publication was launched with an intent to produce stories on the impact of technology on life in India. == History == FactorDaily began publishing in May 2016, with daily reported stories on technology, culture and life in India. Prior to its launch, the company had raised $1 million in seed funding from Accel India, Blume Ventures, Girish Mathrubootham of Freshdesk, Vijay Shekhar Sharma of PayTm, and Jay Vijayan of Tekion. Josey Puliyenthuruthel John, formerly Managing Editor at Business Today and National Corporate Editor at Mint, later joined the company as a Consulting Editor. In January 2017, FactorDaily launched its first Podcast called The Outliers. The inaugural episode featured a conversation with Manish Sharma of Printo on his journey starting up. == Awards == The FactorDaily team won the Bengaluru Editors Lab 2017, a journalism hackathon organised by the Global Editors Network (GEN). The story titled "India has 3,800 psychiatrists for 1.2bn people. Can tech step in to manage mental health?" won the first prize in the online category of the fifth Schizophrenia Research Foundation’s (SCARF) ‘Media for Mental Health’ awards. The story titled 'The dark hand of tech that stokes sex trafficking in India', won the Stop Slavery media Awards by the Thomson Reuters Foundation for the year 2020.

VibeOS

VibeOS is an operating system built from scratch entirely by generative artificial intelligence, using code produced through prompts to Claude (vibe coding). It is capable of running on QEMU and was successfully tested on a Raspberry Pi Zero. It has been released under the MIT license. == Features == === Core === Custom kernel with cooperative multitasking (preemptive backup) FAT32 filesystem with long filename support Memory allocator, process scheduler, interrupt handling GIC-400 (QEMU) and BCM2836/BCM2835 (Pi) interrupt controllers Configurable boot (splash screen, boot target) === GUI === Desktop environment with draggable windows Menu bar, dock, window minimize/maximize/close Mouse and keyboard input Modern macOS-inspired aesthetic === Networking === Full TCP/IP stack (Ethernet, ARP, IP, ICMP, UDP, TCP) DNS resolver HTTP client TLS 1.2 with HTTPS support === Apps === Web browser with HTML/CSS rendering Terminal emulator with readline-style shell Text editor (vim clone) with syntax highlighting File manager with drag-and-drop Music player (MP3/WAV) Calculator, system monitor VibeCode IDE Doom port === Development === TCC (Tiny C Compiler) - compile C programs directly on VibeOS MicroPython interpreter with full kernel API bindings 60+ userspace programs (coreutils, games, GUI apps) === Hardware === Runs on Raspberry Pi Zero 2W USB keyboard and mouse via DWC2 driver SD card via EMMC driver 1920×1080 framebuffer == Further projects == There are other independent projects under the VibeOS name, including an independent development by Ben, also developed using vibe coding, aimed at creating a Unix-like operating system for educational purposes. Another project is Vib-OS, an operating system also built using vibe coding, capable of booting on a Raspberry Pi. It offers a desktop environment with a customizable wallpaper, a file manager, and a web browser currently in an early stage of development, a functional Doom port, among other features that are not very polished given the state of development.

Compute (machine learning)

In machine learning and deep learning, compute is the amount of computing power or computational resources required to train machine learning models and large language models. More broadly, compute is the computational power or resources necessary for a computer or computer program to function. == Definition == Compute is commonly defined as the amount of computing power or computational resources required to train machine learning and large language models. The term "compute" has also been more broadly applied to cloud computing, referencing processing power, memory, networking, storage, and other resources required for the computation of any program. Compute is measured in petaflop/s-days and is used to document AI training. A petaflop/s-day (pfs-day) consists of performing 1015 neural net operations per second for one day, or a total of about 1020 operations. The compute-time product serves as a mental convenience, similar to kilowatt-hour for energy. An amount of compute is meant to give an idea of the number of actual operations performed. == History == In a 2018 analysis titled "AI and compute", artificial intelligence company OpenAI introduced the concept of compute. OpenAI identified two eras of training AI systems in terms of compute-usage. From 1959 to 2012, compute roughly followed Moore’s law. Between 2012 and 2018, the amount of compute used in the largest AI training runs increased exponentially, growing by more than 300,000 times — roughly doubling every 3.4 months. By comparison, Moore’s Law doubled every two years over the same period. One of the largest models, released in 2020, used 600,000 times more computing power than the 2012 model. After 2020, compute growth began to slow down, with the compute needed for the largest AI models continuing to slow down in 2023. The notion of compute has become increasingly used from the mid-2020s onwards. == Compute growth and AI progress == Larger AI models trained on more data and using more computational resources, tend to perform better. This happens even if the algorithms themselves remain unchanged. As early as 2018, OpenAI noted the exponential increase in compute to be have a key role in AI progress. OpenAI considers three factors drive the advance of AI: algorithmic innovation, data, and the amount of compute available for training. AI models with more compute not only improve in the tasks they were trained on but can develop emergent abilities. Incremental improvements can lead to more abrupt leaps in capabilities. AI provider SpaceXAI said in 2026 that their AI progress is driven by compute and used it a key metric in the AI training of its supercomputer Colossus, the which contains 1 million GPUs. Anthropic has a contract of $1.25 billion per month with SpaceXAI to buy all the compute capacity at Colossus 1 data center. === Criticism and policy === Increasing, promoting or constraining progress in artificial intelligence has often be done via controlling the amount of compute. Policymarkers have enacted policies and provided support to make compute resources more accessible to domestic AI researchers. In a January 2022 report, the Center for Security and Emerging Technology (CSET) suggested to institutions that increasingly powerful and generalizable AI (AGI) will likely require other strategies than maximizing compute. Some AI researchers are also concerned that government might exclusively focus on scaling compute instead of other strategies. The CSET has reported on the various bottlenecks which could explain why deep learning needs for compute have slow down: training is expensive and training extremely large models generates traffic jams across many processors that are difficult to manage. there is a limited supply of AI chips (see AI chip memory shortage). CSET advances that the main resource is human capital, specifically talented researchers — according to a 2023 published survey of more than 400 AI researchers, academic and private sector workers. The survey found that AI researchers are not primarily or exclusively constrained by compute access. However, both academic and industry AI researchers equally report concerns that insufficient compute could prevent them from contributing meaningfully to AI research in the future. High compute users are more concerned about compute access. When asked about which resource provided by the government would be the most useful to them, some AI researchers select compute, other prefer grant funding. For this goal, CSET advised policymakers to ensure that even researchers with smaller budgets could effectively contribute to AI research. Other proposed strategies include using contemporary AI algorithms, managing modern AI infrastructure or focusing on interdisciplinary work between the AI field and other fields of computer science. A 2024 study on compute access found that academic-only AI research teams often have less compute intensive research topics, especially foundation models, compared to industry AI labs. As a consequence, academia is likely to play a smaller role in advancing such techniques. The researchers suggest nationally-sponsored computing infrastructure as well as open science initiatives to boost academic compute access. === Data === A 2022 study found that current large language models are significantly under-trained, a consequence of focusing on scaling language models whilst keeping the amount of training data constant. By training over 400 language models of various parameter and token size, they found that "for compute-optimal training", the model size and the number of training tokens should ideally be scaled equally: for every doubling of model size the number of training tokens should also be doubled.

Digital journalism

Digital journalism, also known as netizen journalism or online journalism, is a contemporary form of journalism where editorial content is distributed via the Internet, as opposed to publishing via print or broadcast. What constitutes digital journalism is debated amongst scholars. However, the primary product of journalism, which is news and features on current affairs, is presented solely or in combination as text, audio, video, or some interactive forms like storytelling stories or newsgames and disseminated through digital media technology. Fewer barriers to entry, lowered distribution costs and diverse computer networking technologies have led to the widespread practice of digital journalism. It has democratized the flow of information that was previously controlled by traditional media including newspapers, magazines, radio and television. Most readers expect online journalists to be reliable and competent, but these journalists often fail to meet this standard because they have very short deadlines and do not have enough resources to produce decent work. Some have asserted that a greater degree of creativity can be exercised with digital journalism when compared to traditional journalism and traditional media. The digital aspect may be central to the journalistic message and remains, to some extent, within the creative control of the writer, editor and/or publisher. It has been acknowledged that reports of its growth have tended to be exaggerated. In fact, a 2019 Pew survey showed a 16% decline in the time spent on online news sites since 2016. In the United States, reports issued by the Federal Communications Commission (FCC) in 2011 and by the Government Accountability Office (GAO) and the Congressional Research Service (CRS) in 2023 found that increases in newsroom staffing at digital-native news websites from 2008 to 2020 were not offsetting cuts in newsroom staffing among newspapers (which numbered in the tens of thousands of jobs), and that newspapers and television (which had been seeing declining newsroom staffing alongside newspapers) still employed the majority of payrolled newsroom staff in the United States in 2022 while online-only news websites employed less than 10%. The GAO and CRS reports noted further that the reduction in subscription and advertising revenue for the U.S. newspaper industry from 2000 to 2020 that constituted the overwhelming majority of its inflation-adjusted total revenue was not being offset by digital circulation or online advertising despite almost two-thirds of U.S. advertising spending in total by 2020 being online. Also, while the FCC report noted that local television stations in the United States had become some of the largest providers of local news online, the FCC found in a 2021 working paper that inflation-adjusted advertising revenue for television stations fell nationally from 2010 to 2018. == Overview == Digital journalism flows as journalism flows and is difficult to pinpoint where it is and where it is going. In partnership with digital media, digital journalism uses facets of digital media to perform journalist tasks, for example, using the internet as a tool rather than a singular form of digital media. There is no absolute agreement as to what constitutes digital journalism. Mu Lin argues that, "Web and mobile platforms demand us to adopt a platform-free mindset for an all-inclusive production approach – create the [digital] contents first, then distribute via appropriate platforms." The repurposing of print content for an online audience is sufficient for some, while others require content created with the digital medium's unique features like hypertextuality. Fondevila Gascón adds multimedia and interactivity to complete the digital journalism essence. For Deuze, online journalism can be functionally differentiated from other kinds of journalism by its technological component which journalists have to consider when creating or displaying content. Digital journalistic work may range from purely editorial content like CNN (produced by professional journalists) online to public-connectivity websites like Slashdot (communication lacking formal barriers of entry). The difference of digital journalism from traditional journalism may be in its re-conceptualised role of the reporter in relation to audiences and news organizations. The expectations of society for instant information was important for the evolution of digital journalism. However, it is likely that the exact nature and roles of digital journalism will not be fully known for some time. Some researchers even argue that the free distribution of online content, online advertisement and the new way recipients use news could undermine the traditional business model of mass media distributors that is based on single-copy sales, subscriptions and the selling of advertisement space. == History == The first type of digital journalism, called teletext, was invented in the UK in 1970. Teletext is a system allowing viewers to choose which stories they wish to read and see it immediately. The information provided through teletext is brief and instant, similar to the information seen in digital journalism today. The information was broadcast between the frames of a television signal in what was called the vertical blanking interval or VBI. American journalist Hunter S. Thompson relied on early digital communication technology beginning by using a fax machine to report from the 1971 US presidential campaign trail as documented in his book Fear and Loathing on the Campaign Trail. After the invention of teletext was the invention of videotex, of which Prestel was the world's first system, launching commercially in 1979 with various British newspapers, such as the Financial Times lining up to deliver newspaper stories online through it. Videotex closed down in 1986 due to failing to meet end-user demand. American newspaper companies took notice of the new technology and created their own videotex systems, the largest and most ambitious being Viewtron, a service of Knight-Ridder launched in 1981. Others were Keycom in Chicago and Gateway in Los Angeles. All of them had closed by 1986. Next came computer Bulletin Board Systems. In the late 1980s and early 1990s, several smaller newspapers started online news services using BBS software and telephone modems. The first of these was the Albuquerque Tribune in 1989. Computer Gaming World in September 1992 broke the news of Electronic Arts' acquisition of Origin Systems on Prodigy, before its next issue went to press. Online news websites began to proliferate in the 1990s. An early adopter was The News & Observer in Raleigh, North Carolina which offered online news as Nando. Steve Yelvington wrote on the Poynter Institute website about Nando, owned by The N&O, by saying "Nando evolved into the first serious, professional news site on the World Wide Web". It originated in the early 1990s as "NandO Land". It is believed that a major increase in digital online journalism occurred around this time when the first commercial web browsers, Netscape Navigator (1994) and Internet Explorer (1995). By 1996, most news outlets had an online presence. Although journalistic content was repurposed from original text/video/audio sources without change in substance, it could be consumed in different ways because of its online form through toolbars, topically grouped content, and intertextual links. A twenty-four-hour news cycle and new ways of user-journalist interaction web boards were among the features unique to the digital format. Later, portals such as AOL and Yahoo! and their news aggregators (sites that collect and categorize links from news sources) led to news agencies such as The Associated Press to supplying digitally suited content for aggregation beyond the limit of what client news providers could use in the past. Also, Salon, was founded in 1995. In 2001, the American Journalism Review called Salon the Internet's "preeminent independent venue for journalism." In 2008, for the first time, more Americans reported getting their national and international news from the internet, rather than newspapers. Young people aged 18 to 29 now primarily get their news via the Internet, according to a Pew Research Center report. Audiences to news sites continued to grow due to the launch of new news sites, continued investment in news online by conventional news organizations, and the continued growth in internet audiences overall. Sixty-five percent of youth now primarily access the news online. Mainstream news sites are the most widespread form of online news media production. As of 2000, the vast majority of journalists in the Western world now use the internet regularly in their daily work. In addition to mainstream news sites, digital journalism is found in index and category sites (sites without much original content but multiple links to existing news sites), meta- and comment sites (sites about