Model collapse

Model collapse, also known by other names such as "AI inbreeding", "AI cannibalism", "Habsburg AI", and "model autophagy disorder" or "MAD", is a phenomenon noted in artificial intelligence studies in which machine learning models gradually degrade due to errors coming from uncurated synthetic data, or due to training on the outputs of another model, such as prior versions of itself. It is unclear to what extent the phenomenon threatens the long-term development of such models, and some techniques have been proposed to mitigate the effect.

== Characteristics ==

Shumailov et al. coined the term to describe two specific stages in the degradation of machine learning models, early model collapse and late model collapse:

* In early model collapse, the model begins losing information about the tails of the distribution, mostly affecting minority data. Later work highlighted that early model collapse is hard to notice, since overall performance may appear to improve while the model loses performance on minority data.
* In late model collapse, the model loses a significant proportion of its performance, confusing concepts and losing most of its variance.

== Mechanism ==

Using synthetic data as training data can lead to issues with the quality and reliability of the trained model. Model collapse occurs for three main reasons:

* functional approximation errors
* sampling errors
* learning errors

Importantly, it happens in even the simplest of models, where not all of the error sources are present. In more complex models the errors often compound, leading to faster collapse.

== Disagreement over real-world impact ==

Some researchers and commentators on model collapse warn that the phenomenon could fundamentally threaten future generative AI development: as AI-generated data is shared on the Internet, it will inevitably end up in future training datasets, which are often crawled from the Internet.
If training on "slop" (large quantities of unlabeled synthetic data) inevitably leads to model collapse, this could therefore pose a difficult problem. However, other researchers have more recently disagreed with this argument, showing that if synthetic data accumulates alongside human-generated data, model collapse is avoided. The researchers argue that data accumulating over time is a more realistic description of reality than deleting all existing data every year, and that the real-world impact of model collapse may not be as catastrophic as feared. An alternative branch of the literature investigates the use of machine learning detectors and watermarking to identify model-generated data and filter it out.

== Mathematical models of the phenomenon ==

=== 1D Gaussian model ===

In 2024, a first attempt was made at illustrating collapse for the simplest possible model: a one-dimensional normal distribution fit using unbiased estimators of mean and variance, computed on samples from the previous generation. To make this more precise, we say that original data follows a normal distribution $X^{0} \sim \mathcal{N}(\mu, \sigma^{2})$, and we possess $M_{0}$ samples $X_{j}^{0}$ for $j \in \{1, \dots, M_{0}\}$. Denoting a general sample $X_{j}^{i}$ as sample $j \in \{1, \dots, M_{i}\}$ at generation $i$, the next generation model is estimated using the sample mean and variance:

$$\mu_{i+1} = \frac{1}{M_{i}} \sum_{j} X_{j}^{i}; \quad \sigma_{i+1}^{2} = \frac{1}{M_{i}-1} \sum_{j} \left(X_{j}^{i} - \mu_{i+1}\right)^{2}.$$

This leads to a conditionally normal next generation model $X_{j}^{i+1} \mid \mu_{i+1},\ \sigma_{i+1} \sim \mathcal{N}(\mu_{i+1}, \sigma_{i+1}^{2})$. In theory, this is enough to calculate the full distribution of $X_{j}^{i}$. However, even after the first generation, the full distribution is no longer normal: it follows a variance-gamma distribution. To continue the analysis, instead of writing the probability density function at each generation, it is possible to explicitly construct them in terms of independent random variables using Cochran's theorem. To be precise, $\mu_{1}$ and $\sigma_{1}$ are independent, with $\mu_{1} \sim \mathcal{N}\left(\mu, \frac{\sigma^{2}}{M_{0}}\right)$ and $(M_{0}-1)\,\sigma_{1}^{2} \sim \sigma^{2}\,\Gamma\left(\frac{M_{0}-1}{2}, \frac{1}{2}\right)$, following a Gamma distribution. Denoting with $Z$ Gaussian random variables distributed according to $\mathcal{N}(0,1)$ and with $S^{i}$ random variables distributed with $\frac{1}{M_{i-1}-1}\Gamma\left(\frac{M_{i-1}-1}{2}, \frac{1}{2}\right)$, it turns out to be possible to write samples at each generation as

$$X_{j}^{0} = \mu + \sigma Z_{j}^{0},$$

$$X_{j}^{1} = \mu + \frac{\sigma}{\sqrt{M_{0}}} Z^{1} + \sigma \sqrt{S^{1}}\, Z_{j}^{1},$$

and more generally

$$X_{j}^{n} = \mu + \frac{\sigma}{\sqrt{M_{0}}} Z^{1} + \frac{\sigma}{\sqrt{M_{1}}} \sqrt{S^{1}}\, Z^{2} + \dots + \frac{\sigma}{\sqrt{M_{n-1}}} \sqrt{S^{1} \times \dots \times S^{n-1}}\, Z^{n} + \sigma \sqrt{S^{1} \times \dots \times S^{n}}\, Z_{j}^{n}.$$

Note that these are not joint distributions, as $Z^{n}$ and $S^{n}$ depend directly on $Z_{j}^{n-1}$, but when considering $X_{j}^{n}$ on its own the formula above provides all the information about the full distribution.

To analyse the model collapse, we can first calculate the variance and mean of samples at generation $n$. This would tell us what kind of distributions we expect to arrive at after $n$ generations. It is possible to find the exact value in closed form, but the mean and variance of the square root of a gamma distribution are expressed in terms of gamma functions, making the result quite clunky. Following this, it is possible to expand all results to second order in each of $1/M_{i}$, assuming each sample size to be large. It is then possible to show that

$$\frac{1}{\sigma^{2}} \operatorname{Var}(X_{j}^{n}) = \frac{1}{M_{0}} + \frac{1}{M_{1}} + \dots + \frac{1}{M_{n-1}} + 1 + \mathcal{O}\left(M_{i}^{-2}\right).$$

If all sample sizes $M_{i} = M$ are constant, this diverges linearly as $n \to \infty$:

$$\operatorname{Var}(X_{j}^{n}) = \sigma^{2}\left(1 + \frac{n}{M}\right); \quad \mathbb{E}(X_{j}^{n}) = \mu.$$

This is the same scaling as for a one-dimensional Gaussian random walk.
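The recursive fitting procedure described above can be sketched numerically. The following Python example is a minimal illustration, not code from the cited work: the function names, the constant sample size, and the chosen parameters are assumptions made for this sketch. It repeatedly refits a Gaussian to samples drawn from the previous fit, then compares the spread of the final-generation samples across many independent chains with the expression $\sigma^{2}(1 + n/M)$.

```python
import random


def fit_gaussian(samples):
    """Return the sample mean and the unbiased sample variance (divides by M - 1)."""
    m = len(samples)
    mean = sum(samples) / m
    var = sum((x - mean) ** 2 for x in samples) / (m - 1)
    return mean, var


def run_chain(mu, sigma, sample_size, generations, rng):
    """Refit a Gaussian to its own samples for the given number of generations.

    Returns the fitted (mean, variance) pair after each generation.
    """
    history = []
    mean, var = mu, sigma ** 2
    for _ in range(generations):
        samples = [rng.gauss(mean, var ** 0.5) for _ in range(sample_size)]
        mean, var = fit_gaussian(samples)
        history.append((mean, var))
    return history


def predicted_variance(sigma, sample_size, generations):
    """sigma^2 * (1 + n / M), the constant-sample-size expression from the text."""
    return sigma ** 2 * (1 + generations / sample_size)


def empirical_variance(mu, sigma, sample_size, generations, chains, seed=0):
    """Draw one sample X^n from each of `chains` independent chains and
    return the variance of those draws."""
    rng = random.Random(seed)
    draws = []
    for _ in range(chains):
        mean, var = run_chain(mu, sigma, sample_size, generations, rng)[-1]
        draws.append(rng.gauss(mean, var ** 0.5))
    return fit_gaussian(draws)[1]
```

For example, `empirical_variance(0.0, 1.0, 20, 10, 3000)` can be compared with `predicted_variance(1.0, 20, 10)`. The empirical figure is a Monte Carlo estimate, so it fluctuates with the seed and the number of chains.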
However, divergence of the variance of $X_{j}^{n}$ does not directly provide any information about the corresponding estimates of $\mu_{n+1}$ and $\sigma_{n+1}$, particularly how different they are from the original $\mu$ and $\sigma$. It turns out to be possible to calculate the distance between the true distribution and the approximated distribution at step $n+1$, using the Wasserstein-2 distance (which is also sometimes referred to as risk):

$$\mathbb{E}\left[\mathbb{W}_{2}^{2}\left(\mathcal{N}(\mu, \sigma^{2}), \mathcal{N}(\mu_{n+1}, \sigma_{n+1}^{2})\right)\right] = \frac{3}{2}\sigma^{2}\left(\frac{1}{M_{0}} + \frac{1}{M_{1}} + \dots + \frac{1}{M_{n}}\right) + \mathcal{O}\left(M_{i}^{-2}\right),$$

$$\operatorname{Var}\left[\mathbb{W}_{2}^{2}\left(\mathcal{N}(\mu, \sigma^{2}), \mathcal{N}(\mu_{n+1}, \sigma_{n+1}^{2})\right)\right] = \frac{1}{2}\sigma^{4}\left(\frac{3}{M_{0}^{2}} + \frac{3}{M_{1}^{2}} + \dots + \frac{3}{M_{n}^{2}} + \sum_{i \neq j} \frac{4}{M_{i} M_{j}}\right) + \mathcal{O}\left(M_{i}^{-3}\right).$$

This directly shows why model collapse occurs in this simple model. Due to errors from re-sampling the approximated distribution, each generation ends up corresponding to a

Floyd–Steinberg dithering

Floyd–Steinberg dithering is an image dithering algorithm first published in 1976 by Robert W. Floyd and Louis Steinberg. It is commonly used by image manipulation software, for example, when converting an image from a Truecolor 24-bit PNG format into a GIF format, which is restricted to a maximum of 256 colors.

== Implementation ==

The algorithm achieves dithering using error diffusion, meaning it pushes (adds) the residual quantization error of a pixel onto its neighboring pixels, to be quantized after. It spreads the debt out according to the distribution (shown as a map of the neighboring pixels):

$$\begin{bmatrix} & & * & \frac{7}{16} & \ldots \\ \ldots & \frac{3}{16} & \frac{5}{16} & \frac{1}{16} & \ldots \end{bmatrix}$$

The pixel indicated with a star (*) is the pixel currently being scanned, and the blank pixels are the previously scanned pixels. The specific values (7/16, 3/16, 5/16, 1/16) were originally found by trial-and-error, "guided by the desire to have a region of desired density 0.5 come out as a checkerboard pattern".

The algorithm scans the image from left to right, top to bottom, quantizing pixel values one by one. Each time, the quantization error is transferred to the neighboring pixels, while not affecting the pixels that already have been quantized. Hence, if a number of pixels have been rounded downwards, it becomes more likely that the next pixel is rounded upwards, such that on average, the quantization error is close to zero.

The diffusion coefficients have the property that if the original pixel values are exactly halfway in between the nearest available colors, the dithered result is a checkerboard pattern. For example, 50% grey data could be dithered as a black-and-white checkerboard pattern.
For optimal dithering, the quantization errors should be accumulated with sufficient accuracy to prevent rounding errors from affecting the result. For correct results, all values should be linearized first, rather than operating directly on sRGB values as is common for images stored on computers.

In some implementations, the horizontal direction of scan alternates between lines; this is called "serpentine scanning" or boustrophedon transform dithering.

The algorithm described above is given in the following pseudocode. This works for any approximately linear encoding of pixel values, such as 8-bit integers, 16-bit integers or real numbers in the range [0, 1].

    for each y from top to bottom do
        for each x from left to right do
            oldpixel := pixels[x][y]
            newpixel := find_closest_palette_color(oldpixel)
            pixels[x][y] := newpixel
            quant_error := oldpixel - newpixel
            pixels[x + 1][y    ] := pixels[x + 1][y    ] + quant_error × 7 / 16
            pixels[x - 1][y + 1] := pixels[x - 1][y + 1] + quant_error × 3 / 16
            pixels[x    ][y + 1] := pixels[x    ][y + 1] + quant_error × 5 / 16
            pixels[x + 1][y + 1] := pixels[x + 1][y + 1] + quant_error × 1 / 16

When converting grayscale pixel values from a high to a low bit depth (e.g. 8-bit grayscale to 1-bit black-and-white), find_closest_palette_color() may perform just a simple rounding, for example:

    find_closest_palette_color(oldpixel) = round(oldpixel / 255)

The pseudocode can result in pixel values exceeding the valid values (such as greater than 255 in 8-bit grayscale images). Such values should ideally be handled by the find_closest_palette_color() function, rather than clipping the intermediate values, since a subsequent error may bring the value back into range. However, if fixed-width integers are used, wrapping of intermediate values would cause inversion of black and white, and so should be avoided.
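The pseudocode can be turned into a short runnable sketch for the 8-bit grayscale to 1-bit case. The Python version below is illustrative only. The function name, the row-major `image[y][x]` layout, and the decision to discard any error that would fall outside the image are assumptions of this sketch, since the pseudocode does not specify boundary handling. Intermediate values are kept as floats and are not clipped, in line with the remarks above.

```python
def floyd_steinberg_1bit(image):
    """Dither an 8-bit grayscale image to pure black (0) and white (255).

    `image` is a list of rows of integers in 0..255; the input is not modified.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    # Float working copy so accumulated error is neither rounded nor clipped.
    work = [[float(v) for v in row] for row in image]

    for y in range(height):
        for x in range(width):
            old = work[y][x]
            # Nearest of the two palette entries; out-of-range values are
            # handled here rather than by clipping the working buffer.
            new = 255.0 if old >= 127.5 else 0.0
            work[y][x] = new
            err = old - new
            # Diffuse the error to not-yet-quantized neighbours. Error aimed
            # at pixels outside the image is dropped (assumption of this sketch).
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16

    return [[int(v) for v in row] for row in work]
```

Calling `floyd_steinberg_1bit([[128] * 16 for _ in range(16)])` returns a 16×16 grid containing only 0 and 255, with an average close to the input grey level.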
The find_closest_palette_color() implementation is nontrivial for a palette that is not evenly distributed; however, small inaccuracies in selecting the correct palette color have minimal visual impact, because the error is propagated to future pixels. A nearest neighbor search in 3D is frequently used.
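As a rough illustration of such a search, the Python sketch below picks the palette entry with the smallest squared Euclidean distance in RGB space by scanning the whole palette. The two-argument signature and the example palette are assumptions made here. A linear scan is only one possible approach, and spatial index structures may be preferable for large palettes.

```python
def find_closest_palette_color(pixel, palette):
    """Return the palette entry nearest to `pixel` in RGB space.

    `pixel` and each palette entry are (r, g, b) tuples. The pixel may lie
    outside 0..255 after error diffusion; the search still works.
    """
    def squared_distance(color):
        return sum((p - c) ** 2 for p, c in zip(pixel, color))

    return min(palette, key=squared_distance)


# Hypothetical three-colour palette for demonstration.
PALETTE = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]
```

With this palette, `find_closest_palette_color((200, 30, 30), PALETTE)` selects the red entry.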

List of speech recognition software

Speech recognition software is available for many computing platforms, operating systems, use models, and software licenses. The software is listed here, grouped by platform and type.

== Acoustic models and speech corpus (compilation) ==

The following list presents notable speech recognition software engines with a brief synopsis of characteristics.

== Macintosh ==

== Cross-platform web apps based on Chrome ==

The following list presents notable speech recognition software that operates in a Chrome browser as web apps. These apps make use of the HTML5 Web Speech API.

== Mobile devices and smartphones ==

Many mobile phone handsets, including feature phones and smartphones such as iPhones and BlackBerrys, have basic dial-by-voice features built in. Many third-party apps have implemented natural-language speech recognition support, including:

== Windows ==

=== Windows built-in speech recognition ===

Windows Speech Recognition version 8.0 by Microsoft comes built into Windows Vista, Windows 7, Windows 8 and Windows 10. Speech Recognition is available only in English, French, Spanish, German, Japanese, Simplified Chinese, and Traditional Chinese, and only in the corresponding version of Windows, meaning the speech recognition engine for one language cannot be used with a version of Windows in another language. Windows 7 Ultimate and Windows 8 Pro allow the system language to be changed, and therefore the available speech engine. Windows Speech Recognition evolved into Cortana, a personal assistant included in Windows 10.

=== Windows 7, 8, 10, 11 third-party speech recognition ===

* Braina – Dictate into third-party software and websites, fill web forms and execute vocal commands.
* Dragon NaturallySpeaking from Nuance Communications – Successor to the older DragonDictate product. Focus on dictation. 64-bit Windows support since version 10.1.
* Tazti – Create speech command profiles to play PC games and control applications and programs. Create speech commands to open files, folders, webpages, applications. Windows 7, Windows 8 and Windows 8.1 versions.
* Voice Finger – Software that improves the Windows speech recognition system by adding several extensions to it. The software enables controlling the mouse and the keyboard using only the voice. It is especially useful for helping users to overcome disabilities or to recover from computer-related injuries.

=== Microsoft Speech API ===

The first version of the Microsoft Speech API was released for Windows NT 3.51 and Windows 95 in 1995; it was then part of Windows up to Windows Vista. This initial version already contained Direct Speech Recognition and Direct Text To Speech APIs, which applications could use to directly control engines, as well as simplified "higher-level" Voice Command and Voice Talk APIs.

Speech recognition functionality was included as part of Microsoft Office and on Tablet PCs running Microsoft Windows XP Tablet PC Edition. It can also be downloaded as part of the Speech SDK 5.1 for Windows applications, but since that is aimed at developers building speech applications, the pure SDK form lacks any user interface (numerous applications were available), and thus is unsuitable for end users.

== Built-in software ==

* Microsoft Kinect includes built-in software which allows speech recognition of commands.
* Older generations of Nokia phones like Nokia N Series (before using Windows 7 mobile technology) used speech recognition with family names from the contact list and a few commands.
* Siri, Apple's personal assistant for iOS, originally implemented in the iPhone 4S, which uses technology from Nuance Communications.
* Cortana, Microsoft's personal assistant built into Windows Phone and Windows 10.

== Interactive voice response ==

The following are interactive voice response (IVR) systems:

* CSLU Toolkit
* Genesys
* HTK – copyrighted by Microsoft, but allows altering software for licensee's internal use
* LumenVox ASR
* Tellme Networks; acquired by Microsoft

== Unix-like x86 and x86-64 speech transcription software ==

* Janus Recognition Toolkit (JRTk)
* Mozilla DeepSpeech was developing an open-source speech-to-text engine based on Baidu's Deep Speech research paper.
* Weesper Neon Flow – professional voice-dictation software that provides offline speech-to-text processing on macOS and Windows using local AI models. It is not open source and offers a paid subscription after a 15-day free trial.
* Vocalinux – open-source speech transcription software for Linux.

== Discontinued software ==

* IBM VoiceType (formerly IBM Personal Dictation System)
* IBM ViaVoice – Embedded version still maintained by IBM. No longer supported for versions above Windows Vista. Untested above macOS 10.4 or on Macintoshes with an Intel chipset.
* Quack.com; acquired by AOL; the name has now been reused for an iPad search app.
* SpeechWorks from Nuance Communications.
* Yap Speech Cloud – Speech-to-text platform acquired by Amazon.com.

Adobe GoLive

Adobe GoLive was a WYSIWYG HTML editor and web site management application from Adobe Systems. It replaced Adobe PageMill as Adobe's primary HTML editor and was itself discontinued in favor of Dreamweaver. The last version of GoLive that Adobe released was GoLive 9.

== History ==

GoLive originated in 1996 as the flagship product of a company named GoNet Communication, Inc., then based in Menlo Park, California, and the development company GoNet Communications GmbH in Hamburg, Germany. Later GoNet changed its name to GoLive Systems, Inc., and the name of its product to GoLive CyberStudio. Adobe acquired GoLive in 1999 and re-branded the GoLive CyberStudio product to what became Adobe GoLive. Adobe took over the Hamburg office as an Adobe development site to continue to develop the product. At the time of the acquisition, CyberStudio was a Macintosh-only application. In the spring of 1999 Adobe released Adobe GoLive for both Macintosh and Microsoft Windows.

The first versions of Dreamweaver and CyberStudio were released in a similar timeframe. However, Dreamweaver eventually became the dominant WYSIWYG HTML editor in market share. After the Adobe acquisition of Macromedia (the company that had owned Dreamweaver), GoLive was progressively re-targeted toward Adobe's traditional design market, and the product became better integrated with Adobe's existing suite of design-oriented software products and less focused on the professional web development market.

The Adobe CS2 Premium suite contained GoLive CS2. With the release of Creative Suite 3, Adobe integrated Dreamweaver as a replacement for GoLive and released GoLive 9 as a standalone product. In April 2008, Adobe announced that sales and development of GoLive would cease in favor of Dreamweaver.

== General description and distinctive aspects ==

GoLive incorporated a largely modeless workflow that relied heavily on drag-and-drop. Most user interaction was done via a contextual inspector rather than the modal workflow found in Dreamweaver. Among its features were a separate editor for tables that supported nesting, and a two-dimensional panel for applying CSS styles to elements.

GoLive supported drag-and-drop of native Adobe Photoshop and Adobe Illustrator files via what the company called "Smart Objects", which then automatically guided the user through saving those files in web-supported formats. Updates to the original Photoshop or Illustrator assets were automatically tracked by GoLive. It also implemented a tool called "Components", which allowed interface elements throughout a site to be updated globally by changing a single file.

As a website management tool, GoLive allowed users to transfer and publish content directly from within the application, and allowed individual files to be excluded from uploading.

== Features ==

One of the new features of GoLive version 5 was Dynamic Link, a method of creating dynamic, database-driven web content without the need to know a server-side language and with full WYSIWYG support in the GoLive user interface.

GoLive had a set of extensibility APIs that could be used to add functionality to the product. The GoLive SDK provided interfaces which allowed developers to use a combination of XML, JavaScript and C/C++ to create plugins for the product. The extensibility APIs gave developers access to custom drawing and event handling using JavaScript, as well as a full JavaScript debugger and command line interpreter. This allowed intermediate-level developers using interpreted JavaScript to create sophisticated user interfaces.

== Language and framework structure ==

Adobe GoLive is coded in the C++ programming language. It uses a custom C++ framework called SCL (Simple Class Library), which was initially built from scratch by the engineers at GoLive Systems Inc. The SCL framework was also used in the short-lived Adobe Atmosphere 3D software.

== Release history ==

As the final version, GoLive 9 was discontinued in April 2008.

Gapo

Gapo is a Vietnamese social networking service based in Hanoi, Vietnam. Users are able to create a personal profile and share text, photos and videos with others on the platform. Users can also use Gapo for live streaming, instant messaging, blogging, and online payments. Gapo was launched in July 2019 by Hà Trung Kiên and Duong Vi Khoa.

== History ==

Gapo was founded in response to calls for Vietnam's Communist-led government to produce a domestic alternative to social media giants like Facebook and Google. Gapo officially launched on July 23, 2019 at an event in Hanoi. The company received 500 billion đồng (US$22 million) in funding from technology corporation G-Group to be utilized in the first phase of development. It also partnered with Sony Music Entertainment to provide music content to its services.

== Features ==

Gapo features a news feed for posting content, livestreaming, instant messaging, and blogging. It also allows users to pay online and access public services.

== Reception ==

Within two days of launch, Gapo received about 200,000 registrations. By September 2019, the user base had increased to one million.

Upon launch, Gapo experienced significant technical difficulties. Users complained about the inability to sign up for a new account and said that certain functions were not available for use at launch. This issue caused Gapo to temporarily suspend its services in order to perform upgrades and bug fixes. Gapo relaunched the next day, though many users reported that the access speed had decreased.

The mobile app also received mixed reviews from users in both the App Store and the Google Play Store, with average ratings of 3.1 and 3.5, respectively. Most users found the app to be a knockoff of Facebook, although some users praised the app for being locally developed.

=== Expert opinions on platform viability ===

Le Hong Hiep of the ISEAS - Yusof Ishak Institute was doubtful that a Vietnamese-owned social network service could be as powerful as a foreign-based service, stating that Vietnam might not be able to develop a viable social media network to compete with the likes of Facebook or Google. Others, like blogger Ann Chi, said that, due to local players complying with local censorship policy, there is a chance that locals might not trust Gapo and other local services in light of possible surveillance.

Regarding the targeted user base figure for the end of 2019 and 2021, experts cautioned that the company might need an additional trillion đồng of funding to reach its planned user base targets. In response, the company stated that Gapo was never meant to compete with Facebook, but instead noted that the main difference between Gapo and Facebook is that Gapo provides a personalized user experience through customization.

== Censorship ==

Gapo has the right to censor posts and news that are deemed offensive and inaccurate by users or not approved by the censorship curators.

Confusion network

A confusion network (sometimes called a word confusion network or informally known as a sausage) is a natural language processing method that combines outputs from multiple automatic speech recognition or machine translation systems. Confusion networks are simple linear directed acyclic graphs with the property that each path from the start node to the end node goes through all the other nodes. The set of words represented by edges between two nodes is called a confusion set. In machine translation, the defining characteristic of confusion networks is that they allow multiple ambiguous inputs, deferring committal translation decisions until later stages of processing. This approach is used in the open source machine translation software Moses and the proprietary translation API in IBM Bluemix Watson.
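The linear structure can be illustrated with a small Python sketch. The representation below (a list of confusion sets, each a dictionary from candidate word to a score) and the example scores are hypothetical choices made for illustration. The article does not prescribe a data format or a scoring scheme, and real toolkits may store networks differently.

```python
# One dictionary per pair of consecutive nodes: the confusion set between them.
# The words and scores here are invented for demonstration.
NETWORK = [
    {"I": 0.9, "eye": 0.1},
    {"have": 0.6, "half": 0.4},
    {"it": 0.7, "a": 0.2, "at": 0.1},
]


def best_path(network):
    """Pick the highest-scoring word from every confusion set, in order.

    Because every path visits every node, the sets can be decided independently.
    """
    return [max(confusion_set, key=confusion_set.get) for confusion_set in network]


def count_paths(network):
    """Number of distinct start-to-end paths: the product of the set sizes."""
    total = 1
    for confusion_set in network:
        total *= len(confusion_set)
    return total
```

Here `best_path(NETWORK)` yields `["I", "have", "it"]`, while `count_paths(NETWORK)` shows that three small confusion sets already encode 12 alternative word sequences.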

Spherical basis

In pure and applied mathematics, particularly quantum mechanics and computer graphics and their applications, a spherical basis is the basis used to express spherical tensors. The spherical basis closely relates to the description of angular momentum in quantum mechanics and spherical harmonic functions. While spherical polar coordinates are one orthogonal coordinate system for expressing vectors and tensors using polar and azimuthal angles and radial distance, the spherical basis is constructed from the standard basis and uses complex numbers.

== In three dimensions ==

A vector $\mathbf{A}$ in 3D Euclidean space $\mathbb{R}^{3}$ can be expressed in the familiar Cartesian coordinate system in the standard basis $\mathbf{e}_{x}$, $\mathbf{e}_{y}$, $\mathbf{e}_{z}$, and coordinates $A_{x}$, $A_{y}$, $A_{z}$:

$$\mathbf{A} = A_{x}\mathbf{e}_{x} + A_{y}\mathbf{e}_{y} + A_{z}\mathbf{e}_{z} \qquad (1)$$

or any other coordinate system with associated basis set of vectors. From this, extend the scalars to allow multiplication by complex numbers, so that we are now working in $\mathbb{C}^{3}$ rather than $\mathbb{R}^{3}$.

=== Basis definition ===

In the spherical basis denoted $\mathbf{e}_{+}$, $\mathbf{e}_{-}$, $\mathbf{e}_{0}$, with associated coordinates with respect to this basis denoted $A_{+}$, $A_{-}$, $A_{0}$, the vector $\mathbf{A}$ is:

$$\mathbf{A} = A_{+}\mathbf{e}_{+} + A_{-}\mathbf{e}_{-} + A_{0}\mathbf{e}_{0}$$

where the spherical basis vectors can be defined in terms of the Cartesian basis using complex-valued coefficients in the xy plane:

$$\mathbf{e}_{\pm} = \mp\frac{1}{\sqrt{2}}\left(\mathbf{e}_{x} \pm i\,\mathbf{e}_{y}\right) \qquad (3A)$$

in which $i$ denotes the imaginary unit, and one normal to the plane in the z direction:

$$\mathbf{e}_{0} = \mathbf{e}_{z}$$

The inverse relations are:

$$\mathbf{e}_{x} = -\frac{1}{\sqrt{2}}\left(\mathbf{e}_{+} - \mathbf{e}_{-}\right), \quad \mathbf{e}_{y} = \frac{i}{\sqrt{2}}\left(\mathbf{e}_{+} + \mathbf{e}_{-}\right), \quad \mathbf{e}_{z} = \mathbf{e}_{0} \qquad (3B)$$

=== Commutator definition ===

While giving a basis in a 3-dimensional space is a valid definition for a spherical tensor, it only covers the case when the rank $k$ is 1. For higher ranks, one may use either the commutator or the rotation definition of a spherical tensor.
The commutator definition is as follows: any operator $T_{q}^{(k)}$ that satisfies the following relations is a spherical tensor:

$$[J_{\pm}, T_{q}^{(k)}] = \hbar\sqrt{(k \mp q)(k \pm q + 1)}\; T_{q \pm 1}^{(k)}$$

$$[J_{z}, T_{q}^{(k)}] = \hbar q\, T_{q}^{(k)}$$

=== Rotation definition ===

Analogously to how the spherical harmonics transform under a rotation, a general spherical tensor transforms as follows, when the states transform under the unitary Wigner D-matrix $\mathcal{D}(R)$, where $R$ is a (3×3 rotation) group element in SO(3):

$$\mathcal{D}(R)\, T_{q}^{(k)}\, \mathcal{D}^{\dagger}(R) = \sum_{q'=-k}^{k} T_{q'}^{(k)}\, \mathcal{D}_{q'q}^{(k)}$$

That is, these matrices represent the rotation group elements. With the help of its Lie algebra, one can show these two definitions are equivalent.

=== Coordinate vectors ===

For the spherical basis, the coordinates are complex-valued numbers $A_{+}$, $A_{0}$, $A_{-}$, and can be found by substitution of (3B) into (1), or directly calculated from the inner product ⟨, ⟩ (5):

$$A_{\pm} = \mp\frac{1}{\sqrt{2}}\left(A_{x} \mp i A_{y}\right)$$

$$A_{0} = \left\langle \mathbf{e}_{0}, \mathbf{A} \right\rangle = \left\langle \mathbf{e}_{z}, \mathbf{A} \right\rangle = A_{z}$$

with inverse relations:

$$A_{x} = -\frac{1}{\sqrt{2}}\left(A_{+} - A_{-}\right), \quad A_{y} = -\frac{i}{\sqrt{2}}\left(A_{+} + A_{-}\right), \quad A_{z} = A_{0}$$

In general, for two vectors with complex coefficients in the same real-valued orthonormal basis $\mathbf{e}_{i}$, with the property $\mathbf{e}_{i} \cdot \mathbf{e}_{j} = \delta_{ij}$, the inner product is:

$$\left\langle \mathbf{a}, \mathbf{b} \right\rangle = \mathbf{a} \cdot \mathbf{b}^{\star} = \sum_{j} a_{j} b_{j}^{\star} \qquad (5)$$

where · is the usual dot product and the complex conjugate must be used to keep the magnitude (or "norm") of the vector positive definite.
== Properties (three dimensions) ==

=== Orthonormality ===

The spherical basis is an orthonormal basis, since the inner product ⟨, ⟩ (5) of every pair vanishes, meaning the basis vectors are all mutually orthogonal:

$$\left\langle \mathbf{e}_{+}, \mathbf{e}_{-} \right\rangle = \left\langle \mathbf{e}_{-}, \mathbf{e}_{0} \right\rangle = \left\langle \mathbf{e}_{0}, \mathbf{e}_{+} \right\rangle = 0$$

and each basis vector is a unit vector:

$$\left\langle \mathbf{e}_{+}, \mathbf{e}_{+} \right\rangle = \left\langle \mathbf{e}_{-}, \mathbf{e}_{-} \right\rangle = \left\langle \mathbf{e}_{0}, \mathbf{e}_{0} \right\rangle = 1$$

hence the need for the normalizing factors of $1/\sqrt{2}$.

=== Change of basis matrix ===

The defining relations (3A) can be summarized by a transformation matrix $\mathbf{U}$:

$$\begin{pmatrix} \mathbf{e}_{+} \\ \mathbf{e}_{-} \\ \mathbf{e}_{0} \end{pmatrix} = \mathbf{U} \begin{pmatrix} \mathbf{e}_{x} \\ \mathbf{e}_{y} \\ \mathbf{e}_{z} \end{pmatrix}, \quad \mathbf{U} = \begin{pmatrix} -\frac{1}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\ +\frac{1}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

with inverse:

$$\begin{pmatrix} \mathbf{e}_{x} \\ \mathbf{e}_{y} \\ \mathbf{e}_{z} \end{pmatrix} = \mathbf{U}^{-1} \begin{pmatrix} \mathbf{e}_{+} \\ \mathbf{e}_{-} \\ \mathbf{e}_{0} \end{pmatrix}, \quad \mathbf{U}^{-1} = \begin{pmatrix} -\frac{1}{\sqrt{2}} & +\frac{1}{\sqrt{2}} & 0 \\ +\frac{i}{\sqrt{2}} & +\frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

It can be seen that $\mathbf{U}$ is a unitary matrix, in other words its Hermitian conjugate $\mathbf{U}^{\dagger}$ (complex conjugate and matrix transpose) is also the inverse matrix $\mathbf{U}^{-1}$.

For the coordinates:

$$\begin{pmatrix} A_{+} \\ A_{-} \\ A_{0} \end{pmatrix} = \mathbf{U}^{*} \begin{pmatrix} A_{x} \\ A_{y} \\ A_{z} \end{pmatrix}, \quad \mathbf{U}^{*} = \begin{pmatrix} -\frac{1}{\sqrt{2}} & +\frac{i}{\sqrt{2}} & 0 \\ +\frac{1}{\sqrt{2}} & +\frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

and inverse:

$$\begin{pmatrix} A_{x} \\ A_{y} \\ A_{z} \end{pmatrix} = (\mathbf{U}^{*})^{-1} \begin{pmatrix} A_{+} \\ A_{-} \\ A_{0} \end{pmatrix}, \quad (\mathbf{U}^{*})^{-1} = \begin{pmatrix} -\frac{1}{\sqrt{2}} & +\frac{1}{\sqrt{2}} & 0 \\ -\frac{i}{\sqrt{2}} & -\frac{i}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

=== Cross products ===

Taking cross products of the spherical basis vectors, we find an obvious relation:

$$\mathbf{e}_{q} \times \mathbf{e}_{q} = \boldsymbol{0}$$

where $q$ is a placeholder for +, −, 0, and two less obvious relations:

$$\mathbf{e}_{\pm} \times \mathbf{e}_{\mp} = \pm i\, \mathbf{e}_{0}$$

$$\mathbf{e}_{\pm} \times \mathbf{e}_{0} = \pm i\, \mathbf{e}_{\pm}$$

=== Inner product in the spherical basis ===

The inner product between two vectors $\mathbf{A}$ and $\mathbf{B}$ in the spherical basis follows from the above definition of the inner product:

$$\left\langle \mathbf{A}, \mathbf{B} \right\rangle = A_{+} B_{+}^{\star} + A_{-} B_{-}^{\star} + A_{0} B_{0}^{\star}$$
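The properties in this section can be checked numerically. The Python sketch below is a minimal illustration using only built-in complex numbers. The function and constant names are assumptions of this sketch, and the basis vectors are taken from the rows of the matrix U given above, with the inner product conjugating its second argument, as in the final formula.

```python
import math

S = 1 / math.sqrt(2)

# Cartesian components of the spherical basis vectors (the rows of U).
E_PLUS = (-S, -1j * S, 0)
E_MINUS = (S, -1j * S, 0)
E_ZERO = (0, 0, 1)


def inner(a, b):
    """Inner product <a, b> with the complex conjugate on the second argument."""
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))


def cross(a, b):
    """Ordinary cross product, extended component-wise to complex vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def to_spherical(a):
    """(A_x, A_y, A_z) -> (A_+, A_-, A_0), applying U* row by row."""
    ax, ay, az = a
    return (-S * ax + 1j * S * ay, S * ax + 1j * S * ay, az)


def from_spherical(c):
    """(A_+, A_-, A_0) -> (A_x, A_y, A_z), applying the inverse of U*."""
    a_plus, a_minus, a_zero = c
    return (
        -S * a_plus + S * a_minus,
        -1j * S * a_plus - 1j * S * a_minus,
        a_zero,
    )
```

For instance, `inner(E_PLUS, E_MINUS)` evaluates to zero up to floating-point rounding, `cross(E_PLUS, E_MINUS)` has only a z component equal to the imaginary unit, and `from_spherical(to_spherical((1.0, 2.0, 3.0)))` recovers the original coordinates.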