Principal component analysis

Principal component analysis (PCA) is a linear dimensionality reduction technique with applications in exploratory data analysis, visualization and data preprocessing. The data are linearly transformed onto a new coordinate system such that the directions (principal components) capturing the largest variation in the data can be easily identified. The principal components of a collection of points in a real coordinate space are a sequence of p {\displaystyle p} unit vectors, where the i {\displaystyle i} -th vector is the direction of a line that best fits the data while being orthogonal to the first i − 1 {\displaystyle i-1} vectors. Here, a best-fitting line is defined as one that minimizes the average squared perpendicular distance from the points to the line. These directions (i.e., principal components) constitute an orthonormal basis in which different individual dimensions of the data are linearly uncorrelated. Many studies use the first two principal components in order to plot the data in two dimensions and to visually identify clusters of closely related data points. Principal component analysis has applications in many fields such as population genetics, microbiome studies, and atmospheric science. == Overview == When performing PCA, the first principal component of a set of p {\displaystyle p} variables is the derived variable formed as a linear combination of the original variables that explains the most variance. The second principal component explains the most variance in what is left once the effect of the first component is removed, and we may proceed through p {\displaystyle p} iterations until all the variance is explained. PCA is most commonly used when many of the variables are highly correlated with each other and it is desirable to reduce their number to an independent set. The first principal component can equivalently be defined as a direction that maximizes the variance of the projected data. The i {\displaystyle i} -th principal component can be taken as a direction orthogonal to the first i − 1 {\displaystyle i-1} principal components that maximizes the variance of the projected data. For either objective, it can be shown that the principal components are eigenvectors of the data's covariance matrix. Thus, the principal components are often computed by eigendecomposition of the data covariance matrix or singular value decomposition of the data matrix. PCA is the simplest of the true eigenvector-based multivariate analyses and is closely related to factor analysis. Factor analysis typically incorporates more domain-specific assumptions about the underlying structure and solves eigenvectors of a slightly different matrix. PCA is also related to canonical correlation analysis (CCA). CCA defines coordinate systems that optimally describe the cross-covariance between two datasets while PCA defines a new orthogonal coordinate system that optimally describes variance in a single dataset. Robust and L1-norm-based variants of standard PCA have also been proposed. == History == PCA was invented in 1901 by Karl Pearson, as an analogue of the principal axis theorem in mechanics; it was later independently developed and named by Harold Hotelling in the 1930s. Depending on the field of application, it is also named the discrete Karhunen–Loève transform (KLT) in signal processing, the Hotelling transform in multivariate quality control, proper orthogonal decomposition (POD) in mechanical engineering, singular value decomposition (SVD) of X (invented in the last quarter of the 19th century), eigenvalue decomposition (EVD) of XTX in linear algebra, factor analysis (for a discussion of the differences between PCA and factor analysis see Ch. 7 of Jolliffe's Principal Component Analysis), Eckart–Young theorem (Harman, 1960), or empirical orthogonal functions (EOF) in meteorological science (Lorenz, 1956), empirical eigenfunction decomposition (Sirovich, 1987), quasiharmonic modes (Brooks et al., 1988), spectral decomposition in noise and vibration, and empirical modal analysis in structural dynamics. == Intuition == PCA can be thought of as fitting a p-dimensional ellipsoid to the data, where each axis of the ellipsoid represents a principal component. If some axis of the ellipsoid is small, then the variance along that axis is also small. To find the axes of the ellipsoid, we must first center the values of each variable in the dataset on 0 by subtracting the mean of the variable's observed values from each of those values. These transformed values are used instead of the original observed values for each of the variables. Then, we compute the covariance matrix of the data and calculate the eigenvalues and corresponding eigenvectors of this covariance matrix. Then we must normalize each of the orthogonal eigenvectors to turn them into unit vectors. Once this is done, each of the mutually-orthogonal unit eigenvectors can be interpreted as an axis of the ellipsoid fitted to the data. This choice of basis will transform the covariance matrix into a diagonalized form, in which the diagonal elements represent the variance of each axis. The proportion of the variance that each eigenvector represents can be calculated by dividing the eigenvalue corresponding to that eigenvector by the sum of all eigenvalues. Biplots and scree plots (degree of explained variance) are used to interpret findings of the PCA. == Details == PCA is defined as an orthogonal linear transformation on a real inner product space that transforms the data to a new coordinate system such that the greatest variance by some scalar projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. Consider an n × p {\displaystyle n\times p} data matrix, X, with column-wise zero empirical mean (the sample mean of each column has been shifted to zero), where each of the n rows represents a different repetition of the experiment, and each of the p columns gives a particular kind of feature (say, the results from a particular sensor). Mathematically, the transformation is defined by a set of size l {\displaystyle l} (where l {\displaystyle l} is usually selected to be strictly less than p {\displaystyle p} to reduce dimensionality) of p {\displaystyle p} -dimensional vectors of weights or coefficients w ( k ) = ( w 1 , … , w p ) ( k ) {\displaystyle \mathbf {w} _{(k)}=(w_{1},\dots ,w_{p})_{(k)}} that map each row vector x ( i ) = ( x 1 , … , x p ) ( i ) {\displaystyle \mathbf {x} _{(i)}=(x_{1},\dots ,x_{p})_{(i)}} of X to a new vector of principal component scores t ( i ) = ( t 1 , … , t l ) ( i ) {\displaystyle \mathbf {t} _{(i)}=(t_{1},\dots ,t_{l})_{(i)}} , given by t k ( i ) = x ( i ) ⋅ w ( k ) f o r i = 1 , … , n k = 1 , … , l {\displaystyle {t_{k}}_{(i)}=\mathbf {x} _{(i)}\cdot \mathbf {w} _{(k)}\qquad \mathrm {for} \qquad i=1,\dots ,n\qquad k=1,\dots ,l} in such a way that the individual variables t 1 , … , t l {\displaystyle t_{1},\dots ,t_{l}} of t considered over the data set successively inherit the maximum possible variance from X, with each coefficient vector w constrained to be a unit vector. The above may equivalently be written in matrix form as T = X W {\displaystyle \mathbf {T} =\mathbf {X} \mathbf {W} } where T i k = t k ( i ) {\displaystyle {\mathbf {T} }_{ik}={t_{k}}_{(i)}} , X i j = x j ( i ) {\displaystyle {\mathbf {X} }_{ij}={x_{j}}_{(i)}} , and W j k = w j ( k ) {\displaystyle {\mathbf {W} }_{jk}={w_{j}}_{(k)}} . === First component === In order to maximize variance, the first weight vector w(1) thus has to satisfy w ( 1 ) = arg ⁡ max ‖ w ‖ = 1 { ∑ i ( t 1 ) ( i ) 2 } = arg ⁡ max ‖ w ‖ = 1 { ∑ i ( x ( i ) ⋅ w ) 2 } {\displaystyle \mathbf {w} _{(1)}=\arg \max _{\Vert \mathbf {w} \Vert =1}\,\left\{\sum _{i}(t_{1})_{(i)}^{2}\right\}=\arg \max _{\Vert \mathbf {w} \Vert =1}\,\left\{\sum _{i}\left(\mathbf {x} _{(i)}\cdot \mathbf {w} \right)^{2}\right\}} Equivalently, writing this in matrix form gives w ( 1 ) = arg ⁡ max ‖ w ‖ = 1 { ‖ X w ‖ 2 } = arg ⁡ max ‖ w ‖ = 1 { w T X T X w } {\displaystyle \mathbf {w} _{(1)}=\arg \max _{\left\|\mathbf {w} \right\|=1}\left\{\left\|\mathbf {Xw} \right\|^{2}\right\}=\arg \max _{\left\|\mathbf {w} \right\|=1}\left\{\mathbf {w} ^{\mathsf {T}}\mathbf {X} ^{\mathsf {T}}\mathbf {Xw} \right\}} Since w(1) has been defined to be a unit vector, it equivalently also satisfies w ( 1 ) = arg ⁡ max { w T X T X w w T w } {\displaystyle \mathbf {w} _{(1)}=\arg \max \left\{{\frac {\mathbf {w} ^{\mathsf {T}}\mathbf {X} ^{\mathsf {T}}\mathbf {Xw} }{\mathbf {w} ^{\mathsf {T}}\mathbf {w} }}\right\}} The quantity to be maximised can be recognised as a Rayleigh quotient. A standard result for a positive semidefinite matrix such as XTX is that the quotient's maximum possible value is the largest eigenvalue of the matrix, which occurs when w is the corresponding eigenvector. With w(1) found, the first principal component of a data vector

Color management

Color management is the process of ensuring consistent and accurate colors across various devices, such as monitors, printers, and cameras. It involves the use of color profiles, which are standardized descriptions of how colors should be displayed or reproduced. Color management is necessary because different devices have different color capabilities and characteristics. For example, a monitor may display colors differently than a printer can reproduce them. Without color management, the same image may appear differently on different devices, leading to inconsistencies and inaccuracies. To achieve color management, a color profile is created for each device involved in the color workflow. This profile describes the device's color capabilities and characteristics, such as its color gamut (range of colors it can display or reproduce) and color temperature. These profiles are then used to translate colors between devices, ensuring consistent and accurate color reproduction. Color management is particularly important in industries such as graphic design, photography, and printing, where accurate color representation is crucial. It helps to maintain color consistency throughout the entire workflow, from capturing an image to displaying or printing it. Parts of color management are implemented in the operating system (OS), helper libraries, the application, and devices. The type of color profile that is typically used is called an ICC profile. A cross-platform view of color management is the use of an ICC-compatible color management system. The International Color Consortium (ICC) is an industry consortium that has defined: an open standard for a Color Matching Module (CMM) at the OS level color profiles for: devices, including DeviceLink profiles that transform one device profile (color space) to another device profile without passing through an intermediate color space, such as LAB, more accurately preserving color working spaces, the color spaces in which color data is meant to be manipulated There are other approaches to color management besides using ICC profiles. This is partly due to history and partly because of other needs than the ICC standard covers. The film and broadcasting industries make use of some of the same concepts, but they frequently rely on more limited boutique solutions. The film industry, for instance, often uses 3D LUTs (lookup table) to represent a complete color transformation for a specific RGB encoding. At the consumer level, system wide color management is available in most of Apple's products (macOS, iOS, iPadOS, watchOS). Microsoft Windows lacks system wide color management and virtually all applications do not employ color management. Windows' media player API is not color space aware, and if applications want to color manage videos manually, they have to incur significant performance and power consumption penalties. Android supports system wide color management, but most devices ship with color management disabled. == Overview == Characterize. Every color-managed device requires a personalized table, or "color profile," which characterizes the color response of that particular device. Standardize. Each color profile describes these colors relative to a standardized set of reference colors (the "Profile Connection Space"). Translate. Color-managed software then uses these standardized profiles to translate color from one device to another. This is usually performed by a color management module (CMM). == Hardware == === Characterization === To describe the behavior of various output devices, they must be compared (measured) in relation to a standard color space. Often a step called linearization is performed first, to undo the effect of gamma correction that was done to get the most out of limited 8-bit color paths. Instruments used for measuring device colors include colorimeters and spectrophotometers. As an intermediate result, the device gamut is described in the form of scattered measurement data. The transformation of the scattered measurement data into a more regular form, usable by the application, is called profiling. Profiling is a complex process involving mathematics, intense computation, judgment, testing, and iteration. After the profiling is finished, an idealized color description of the device is created. This description is called a profile. === Calibration === Calibration is like characterization, except that it can include the adjustment of the device, as opposed to just the measurement of the device. Color management is sometimes sidestepped by calibrating devices to a common standard color space such as sRGB; when such calibration is done well enough, no color translations are needed to get all devices to handle colors consistently. This avoidance of the complexity of color management was one of the goals in the development of sRGB. == Color profiles == === Embedding === Image formats themselves (such as TIFF, JPEG, PNG, EPS, PDF, and SVG) may contain embedded color profiles but are not required to do so by the image format. The International Color Consortium standard was created to bring various developers and manufacturers together. The ICC standard permits the exchange of output device characteristics and color spaces in the form of metadata. This allows the embedding of color profiles into images as well as storing them in a database or a profile directory. === Working spaces === Working spaces, such as sRGB, Adobe RGB or ProPhoto are color spaces that facilitate good results while editing. For instance, pixels with equal values of R,G,B should appear neutral. Using a large (gamut) working space will lead to posterization, while using a small working space will lead to clipping. This trade-off is a consideration for the critical image editor. == Color transformation == Color transformation, or color space conversion, is the transformation of the representation of a color from one color space to another. This calculation is required whenever data is exchanged inside a color-managed chain and carried out by a Color Matching Module. Transforming profiled color information to different output devices is achieved by referencing the profile data into a standard color space. It makes it easier to convert colors from one device to a selected standard color space and from that to the colors of another device. By ensuring that the reference color space covers the many possible colors that humans can see, this concept allows one to exchange colors between many different color output devices. Color transformations can be represented by two profiles (source profile and target profile) or by a devicelink profile. In this process there are approximations involved which make sure that the image keeps its important color qualities and also gives an opportunity to control on how the colors are being changed. === Profile connection space === In the terminology of the International Color Consortium, a translation between two color spaces can go through a profile connection space (PCS): Color Space 1 → PCS (CIELAB or CIEXYZ) → Color space 2; conversions into and out of the PCS are each specified by a profile. === Gamut mapping === In nearly every translation process, we have to deal with the fact that the color gamut of different devices vary in range which makes an accurate reproduction impossible. They therefore need some rearrangement near the borders of the gamut. Some colors must be shifted to the inside of the gamut, as they otherwise cannot be represented on the output device and would simply be clipped. This so-called gamut mismatch occurs for example, when we translate from the RGB color space with a wider gamut into the CMYK color space with a narrower gamut range. In this example, the dark highly saturated purplish-blue color of a typical computer monitor's "blue" primary is impossible to print on paper with a typical CMYK printer. The nearest approximation within the printer's gamut will be much less saturated. Conversely, an inkjet printer's "cyan" primary, a saturated mid-brightness blue, is outside the gamut of a typical computer monitor. The color management system can utilize various methods to achieve desired results and give experienced users control of the gamut mapping behavior. ==== Rendering intent ==== When the gamut of source color space exceeds that of the destination, saturated colors are liable to become clipped (inaccurately represented), or more formally burned. The color management module can deal with this problem in several ways. The ICC specification includes four different rendering intents, listed below. Before the actual rendering intent is carried out, one can temporarily simulate the rendering by soft proofing. It is a useful tool as it predicts the outcome of the colors and is available as an application in many color management systems: Absolute colorimetric Absolute colorimetry and relative colorimetry actually use the same table but differ in the adjust

Full Dive

Full Dive, short for Full Dive: This Ultimate Next-Gen Full Dive RPG Is Even Shittier than Real Life! (Japanese: 究極進化したフルダイブRPGが現実よりもクソゲーだったら, Hepburn: Kyūkyoku Shinka shita Furu Daibu RPG ga Genjitsu yori mo Kusogē Dattara), is a Japanese light novel series written by Light Tuchihi and illustrated by Youta. Media Factory has published four volumes since August 2020 under their MF Bunko J imprint. A manga adaptation with art by Kino was serialized in Media Factory's seinen manga magazine Monthly Comic Alive from January 2021 to January 2022. An anime television series adaptation by ENGI aired from April to June 2021. == Plot == Hiroshi Yuki, with the player name of Hiro, is a high school boy who loves to play virtual reality MMORPGs (VRMMORPG) in order to escape reality. When a game store manager named Reona Kisaragi tricks him into buying the game Kiwame Quest, he soon discovers that it is not what it seems. Unlike regular games, it is a game that tries to pursue realism to a fanatical point. As such, Hiroshi struggles to eke out a niche. Despite the disadvantages, he is determined to complete the game. == Characters == === Main characters === Hiroshi Yuki (結城宏, Yūki Hiroshi) Voiced by: Daiki Yamashita, Riho Sugiyama (young) (Japanese); Johnny Yong Bosch, Michele Knotz (young) (English) Hiroshi is a high school student who is tricked into buying Kiwame Quest by game store manager, Reona Kisaragi. He is a former member of the track team who quit following an unfortunate incident and he likes to play VRMMORPGs in order to escape reality. His player name is Hiro. Reona Kisaragi (如月玲於奈, Kisaragi Reona) Voiced by: Ayana Taketatsu (Japanese); Natalie Van Sistine (English) Reona is a game store manager who tricks Hiroshi into buying Kiwame Quest. She likes to tease him and her in-game avatar is that of a fairy. Alicia (アリシア, Arishia) Voiced by: Fairouz Ai (Japanese); Kayli Mills (English) Alicia is one of Hiroshi's childhood friends in Kiwame Quest. She has an older brother named Martin in-game. Mizarisa (ミザリサ) Voiced by: Shiori Izawa (Japanese); Sarah Anne Williams (English) Mizarisa is the town inquisitor in Kiwame Quest. Kaede Yuki (結城楓, Yūki Kaede) Voiced by: Aoi Koga (Japanese); Kate Bristol (English) Kaede is Hiroshi's younger sister. She used to look up to her older brother, but their relationship has been strained ever since he quit the track team. === NPCs === Martin (マーチン, Māchin) Voiced by: Haruki Ishiya, Natsumi Fujiwara (young) (Japanese); Ben Lepley, Krystal LaPorte (young) (English) Martin is one of Hiroshi's childhood friends in Kiwame Quest. He is also Alicia's older brother in-game. Tesla (テスラ, Tesura) Voiced by: Satoshi Hino (Japanese); Jason Liebrecht (English) Tesla is the captain of the City Guard in Kiwame Quest. Govern (ガバン, Gaban) Voiced by: Shizuka Itō (Japanese); Lisa Ortiz (English) Govern is the queen of Ted in Kiwame Quest. === Other characters === Ginji (ギンジ) Voiced by: Katsuyuki Konishi (Japanese); Brent Mukai (English) Ginji is a veteran player of Kiwame Quest. Soichiro Kamui (神居宗一郎, Kamui Sōichirō) Voiced by: Yoshitsugu Matsuoka (Japanese); Samuel Drake (English) Kamui is the only known player who has successfully completed Kiwame Quest. == Media == === Light novels === Light Tuchihi launched the light novel series, with illustrations by Youta, under Media Factory's MF Bunko J label on August 25, 2020. ==== Volumes ==== === Manga === A manga adaptation by Kino was serialized in Media Factory's Monthly Comic Alive magazine from January 27, 2021, to January 27, 2022. Two tankōbon volumes were released from May 21, 2021, to January 21, 2022. ==== Volumes ==== === Anime === An anime television series adaptation was announced on December 4, 2020. The series was animated by ENGI and directed by Kazuya Miura, with Kenta Ihara writing the series' scripts, and Yūta Kevin Kenmotsu designing the characters. It ran from April 7 to June 23, 2021, on AT-X, Tokyo MX, SUN, KBS Kyoto, and BS11. Mayu Maeshima performed the opening theme "Answer", while Ayana Taketatsu, Fairouz Ai, Shiori Izawa, and Aoi Koga performed the ending theme "Kisuida!". It ran for 12 episodes. Funimation licensed and streamed the series. On June 8, 2021, Funimation announced that the series would receive an English dub, which premiered the following day. Following Sony's acquisition of Crunchyroll, the series was moved to Crunchyroll. ==== Episodes ====

Galatea (video game)

Galatea is an interactive fiction video game by Emily Short featuring a modern rendition of the Greek myth of Galatea, the sculpture of a woman that gained life. It took "Best of Show" in the 2000 IF Art Show and won a XYZZY Award for Best non-player character. The game displays an unusually rich approach to non-player character dialogue and diverts from the typical puzzle-solving in interactive fiction: gameplay consists entirely of interacting with a single character in a single room. Galatea is licensed under the Creative Commons BY-NC-ND 3.0 US license. == Gameplay == Galatea alters the typical interactive fiction game mechanics by concentrating instead on the player's interactions with a single non-player character (NPC), the eponymous Galatea. Much of the interest of the piece derives from the ambiguous nature of the player–NPC dialogue: the form of the conversation and, indeed, the nature of Galatea herself shift depending on the focus the player places on certain aspects of the character's personality. Numerous endings are possible. Gameplay centers around the developing dialogue between Galatea and the player when asking about topics in the previous conversation. Two commands, "think about" and "recap", are provided to keep track of what has already been said; the former is also used to advance the storyline, as the player character draws conclusions about the story as it has unfolded to that point. The game also encourages using sensory commands ("touch", "listen to", "look at"), adding immersion to the experience. == Plot == Galatea is loosely based on the myth of Pygmalion, who carved the sculpture of a woman. In the myth, he falls in love with the statue, named Galatea or Elise in different versions, and the goddess Venus brings her to life. The story begins at the opening of an exhibition of artificial intelligences. The player, alone, discovers Galatea displayed on a pedestal with a small information placard. She is illuminated by a spotlight and wears an emerald dress. Seeing the player about to turn away, Galatea says, "They told me you were coming." From this point, the story may proceed in a number of ways depending on the player's words and actions. === Multilinear interactive fiction === Short describes this as "multilinear interactive fiction": while interactive fiction in general allows the player to find their own way through the story, this leads in most cases to a single ending (or at least a single desired 'correct' ending). With Galatea, Short presents a story with around 70 different endings and hundreds of possible ways of reaching them. The plot is thus designed to appear open-ended with the development of the story entirely dependent on what the player decides to talk or ask about or what actions they choose to perform. Thus the original author and the player share in the creation of a work of fiction. == Development == In interviews, Emily Short has explained that Galatea arose out of her efforts to develop advanced dialog coding for interactive fiction engines. Although code for simple conversational programs like ELIZA have existed since the 1960s, and limited dialog options have existed in interactive fiction since the 1970s, Short's efforts to develop chatterbot-like dialog required her to produce a simple test case scenario to test NPC interaction. Thus the single-room, single-occupant Galatea was a natural result. Development of the game progressed organically with Short engaging in test runs and drafting new dialog options for every conversational dead-end that arose. The game's multiple endings also arose in a similar fashion although Short had intended that there be multiple endings from the start. Although the nature of the game's development as well as its minimalist final form has led to questions regarding whether it is really a game and not just an experimental conversational program, Short has suggested that to her the definition of interactive fiction requires nothing more than a world model and a parser, and "anything you can cook up with those features counts as IF." Short has acknowledged the helpful influence of the close-knit IF community and the "atmosphere in which experimentation is valued" as leading to the success of her works like Galatea. == Reception == Galatea was well received, achieving critical acclaim from interactive fiction reviewers and literary scholars. The game is considered to aspire to a new level of art in interactive fiction, and thereby to have revolutionized the genre, establishing its author, Emily Short, as one of the key figures in the modern interactive fiction scene. Fellow award-winning IF author, Adam Cadre has called Galatea "the best NPC ever"—a view that was echoed by Joystiq's John Bardinelli. Cadre also describes the game as an example of an alternative kind of puzzle where "interactivity comes in deciding where to go, what to see, what to say. Rather than having to open gates along a path, you discover that they're all open at first, but stepping through one causes others to close." Galatea was described in 2007 by Indiegames.com as a "fascinating journey." In a 2009 article, Rock, Paper, Shotgun praised the depth and detail of the game, the complexities of the character design and its "masterful balance between intricacy and simplicity", and "Galatea's emotional turmoil" that is "encoded sweetly into the subtext of what's going on. By simply interacting in a logical manner, you learn more about this character than any cut-scene or info-dump could ever hope to convey." This was reiterated in a 2010 1UP.com article that listed Galatea as #2 in its "Top 5 Introductory Interactive Fiction Games" feature, describing it as intriguingly replayable, and as a "surprisingly rich game for its apparent minimalism". In 2011, PC Gamer highlighted Galatea as an example of the artistic and literary aspects of the interactive fiction genre. The titular character, Galatea, has been compared to the 2007 Portal character GLaDOS due to similarities in the personalities of the characters.

Degree of truth

In classical logic, propositions are typically unambiguously considered as being true or false. For instance, the proposition one is both equal and not equal to itself is regarded as simply false, being contrary to the Law of Noncontradiction; while the proposition one is equal to one is regarded as simply true, by the Law of Identity. However, some mathematicians, computer scientists, and philosophers have been attracted to the idea that a proposition might be more or less true, rather than wholly true or wholly false. Consider this pizza is hot. In mathematics, this idea can be developed in terms of fuzzy logic. In computer science, it has found application in artificial intelligence. In philosophy, the idea has proved particularly appealing in the case of vagueness. Degrees of truth is an important concept in law. The term is an older concept than conditional probability. Instead of determining the objective probability, only a subjective assessment is defined. In adjudicative processes, 'substantive truth' is distinct from 'formal legal truth' which comes in four degrees: hearsay, balance of probabilities, proven beyond reasonable doubt and absolute truth (knowledge reserved unto God).

Dataset shift

Dataset shift is a phenomenon in machine learning and statistics in which the joint distribution of input variables and target labels is different in the training phase and the deployment or test phase (i.e., P t r a i n ( X , Y ) ≠ P t e s t ( X , Y ) {\displaystyle P_{train}(X,Y)\neq P_{test}(X,Y)} ). This happens when the statistical properties of data used to train a model are no longer representative of the data encountered in real-world use, often resulting in degraded predictive performance and diminished generalization ability. Dataset shift is a generic term for a number of particular types of distributional change. Covariate shift is when the distribution of the input features changes, but the conditional relationship between inputs and outputs remains constant . Prior probability shift (or label shift) happens when the distribution of target labels changes, but the conditional distribution of inputs given labels stays the same. Concept shift (also known as concept drift) is the change of the conditional relationship between inputs and outputs that renders previously learned patterns invalid over time. A key challenge for deploying machine learning systems is dataset shift, in particular in dynamic environments where the data distributions change over time. Detecting and mitigating such shifts is an active area of research, e.g., drift detection, domain adaptation, continual learning.

Alice and Sparkle

Alice and Sparkle is a 2022 illustrated children's book published by American technology product designer Ammaar Reshi. Reshi created the book using artificial intelligence programs ChatGPT and Midjourney in one weekend, which sparked controversy among artists, both in regard to the copyright status of the book and the quality of the illustration and text. == Plot == A girl named Alice discovers a group of magical and benevolent artificial intelligence beings. She knows that artificial intelligence is powerful, and that it has the power to do good and evil depending on how it is used. One day, she creates her own artificial intelligence and names it Sparkle. Sparkle helps Alice with her homework and plays with her, and they quickly become good friends. However, Sparkle soon grows more powerful and begins to make its own decisions, which makes Alice both proud and scared. She knows that it is her responsibility to guide Sparkle to do good, not evil. Together, Alice and Sparkle use their knowledge to make the world a better place and to teach people about the power of artificial intelligence. The two live happily ever after, spreading the magic of artificial intelligence. == Structure == Including the dedication and postscript, the book contains twenty four pages, about half of which being illustrations provided by Midjourney. The very short story, composed of text generated by ChatGPT, contains 343 words. Some of the illustrations are accompanied by descriptions, at least one of which was provided by Reshi. Both Alice's and Sparkle's appearances change significantly between illustrations, although Alice's is more consistent. Reshi said Midjourney was unable to generate consistent images of Sparkle, so he had to include a line in the book saying that it could turn "into all kinds of robot shapes". == Creation == When reading a children's book to his friend's daughter, Ammaar Reshi "decided he wanted to write his own". He had no experience with creative writing or illustration, so instead used the chatbot ChatGPT to write the story for him and used the image generation software Midjourney to illustrate it. On December 4, 2022, 72 hours after having the idea for the book, he published it on Amazon's digital bookstore, and published a paperback version the following day. == Controversy == On December 9, 2022, Reshi made a thread on Twitter about his experience publishing the book, which soon went viral. Reshi received heavy backlash from artists with concerns over the ethics of art generated by artificial intelligence. He also received death threats and messages encouraging self-harm because of his publication. Many writers and illustrators criticized both the creation process and the product itself, claiming that if artificial intelligence programs such as Midjourney are trained on existing illustrations, then the original artists should be financially compensated for derivative works such as Alice and Sparkle. The book was temporarily removed from Amazon in January 2023 because of "suspicious review activity", caused by a high volume of both five-star and one-star reviews.