Least-squares support-vector machines (LS-SVM) for statistics and in statistical modeling, are least-squares versions of support-vector machines (SVM), which are a set of related supervised learning methods that analyze data and recognize patterns, and which are used for classification and regression analysis. In this version one finds the solution by solving a set of linear equations instead of a convex quadratic programming (QP) problem for classical SVMs. Least-squares SVM classifiers were proposed by Johan Suykens and Joos Vandewalle. LS-SVMs are a class of kernel-based learning methods. == From support-vector machine to least-squares support-vector machine == Given a training set { x i , y i } i = 1 N {\displaystyle \{x_{i},y_{i}\}_{i=1}^{N}} with input data x i ∈ R n {\displaystyle x_{i}\in \mathbb {R} ^{n}} and corresponding binary class labels y i ∈ { − 1 , + 1 } {\displaystyle y_{i}\in \{-1,+1\}} , the SVM classifier, according to Vapnik's original formulation, satisfies the following conditions: { w T ϕ ( x i ) + b ≥ 1 , if y i = + 1 , w T ϕ ( x i ) + b ≤ − 1 , if y i = − 1 , {\displaystyle {\begin{cases}w^{T}\phi (x_{i})+b\geq 1,&{\text{if }}\quad y_{i}=+1,\\w^{T}\phi (x_{i})+b\leq -1,&{\text{if }}\quad y_{i}=-1,\end{cases}}} which is equivalent to y i [ w T ϕ ( x i ) + b ] ≥ 1 , i = 1 , … , N , {\displaystyle y_{i}\left[{w^{T}\phi (x_{i})+b}\right]\geq 1,\quad i=1,\ldots ,N,} where ϕ ( x ) {\displaystyle \phi (x)} is the nonlinear map from original space to the high- or infinite-dimensional space. === Inseparable data === In case such a separating hyperplane does not exist, we introduce so-called slack variables ξ i {\displaystyle \xi _{i}} such that { y i [ w T ϕ ( x i ) + b ] ≥ 1 − ξ i , i = 1 , … , N , ξ i ≥ 0 , i = 1 , … , N . {\displaystyle {\begin{cases}y_{i}\left[{w^{T}\phi (x_{i})+b}\right]\geq 1-\xi _{i},&i=1,\ldots ,N,\\\xi _{i}\geq 0,&i=1,\ldots ,N.\end{cases}}} According to the structural risk minimization principle, the risk bound is minimized by the following minimization problem: min J 1 ( w , ξ ) = 1 2 w T w + c ∑ i = 1 N ξ i , {\displaystyle \min J_{1}(w,\xi )={\frac {1}{2}}w^{T}w+c\sum \limits _{i=1}^{N}\xi _{i},} Subject to { y i [ w T ϕ ( x i ) + b ] ≥ 1 − ξ i , i = 1 , … , N , ξ i ≥ 0 , i = 1 , … , N , {\displaystyle {\text{Subject to }}{\begin{cases}y_{i}\left[{w^{T}\phi (x_{i})+b}\right]\geq 1-\xi _{i},&i=1,\ldots ,N,\\\xi _{i}\geq 0,&i=1,\ldots ,N,\end{cases}}} To solve this problem, we could construct the Lagrangian function: L 1 ( w , b , ξ , α , β ) = 1 2 w T w + c ∑ i = 1 N ξ i − ∑ i = 1 N α i { y i [ w T ϕ ( x i ) + b ] − 1 + ξ i } − ∑ i = 1 N β i ξ i , {\displaystyle L_{1}(w,b,\xi ,\alpha ,\beta )={\frac {1}{2}}w^{T}w+c\sum \limits _{i=1}^{N}{\xi _{i}}-\sum \limits _{i=1}^{N}\alpha _{i}\left\{y_{i}\left[{w^{T}\phi (x_{i})+b}\right]-1+\xi _{i}\right\}-\sum \limits _{i=1}^{N}\beta _{i}\xi _{i},} where α i ≥ 0 , β i ≥ 0 ( i = 1 , … , N ) {\displaystyle \alpha _{i}\geq 0,\ \beta _{i}\geq 0\ (i=1,\ldots ,N)} are the Lagrangian multipliers. The optimal point will be in the saddle point of the Lagrangian function, and then we obtain By substituting w {\displaystyle w} by its expression in the Lagrangian formed from the appropriate objective and constraints, we will get the following quadratic programming problem: max Q 1 ( α ) = − 1 2 ∑ i , j = 1 N α i α j y i y j K ( x i , x j ) + ∑ i = 1 N α i , {\displaystyle \max Q_{1}(\alpha )=-{\frac {1}{2}}\sum \limits _{i,j=1}^{N}{\alpha _{i}\alpha _{j}y_{i}y_{j}K(x_{i},x_{j})}+\sum \limits _{i=1}^{N}\alpha _{i},} where K ( x i , x j ) = ⟨ ϕ ( x i ) , ϕ ( x j ) ⟩ {\displaystyle K(x_{i},x_{j})=\left\langle \phi (x_{i}),\phi (x_{j})\right\rangle } is called the kernel function. Solving this QP problem subject to constraints in (1), we will get the hyperplane in the high-dimensional space and hence the classifier in the original space. === Least-squares SVM formulation === The least-squares version of the SVM classifier is obtained by reformulating the minimization problem as min J 2 ( w , b , e ) = μ 2 w T w + ζ 2 ∑ i = 1 N e i 2 , {\displaystyle \min J_{2}(w,b,e)={\frac {\mu }{2}}w^{T}w+{\frac {\zeta }{2}}\sum \limits _{i=1}^{N}e_{i}^{2},} subject to the equality constraints y i [ w T ϕ ( x i ) + b ] = 1 − e i , i = 1 , … , N . {\displaystyle y_{i}\left[{w^{T}\phi (x_{i})+b}\right]=1-e_{i},\quad i=1,\ldots ,N.} The least-squares SVM (LS-SVM) classifier formulation above implicitly corresponds to a regression interpretation with binary targets y i = ± 1 {\displaystyle y_{i}=\pm 1} . Using y i 2 = 1 {\displaystyle y_{i}^{2}=1} , we have ∑ i = 1 N e i 2 = ∑ i = 1 N ( y i e i ) 2 = ∑ i = 1 N e i 2 = ∑ i = 1 N ( y i − ( w T ϕ ( x i ) + b ) ) 2 , {\displaystyle \sum \limits _{i=1}^{N}e_{i}^{2}=\sum \limits _{i=1}^{N}(y_{i}e_{i})^{2}=\sum \limits _{i=1}^{N}e_{i}^{2}=\sum \limits _{i=1}^{N}\left(y_{i}-(w^{T}\phi (x_{i})+b)\right)^{2},} with e i = y i − ( w T ϕ ( x i ) + b ) . {\displaystyle e_{i}=y_{i}-(w^{T}\phi (x_{i})+b).} Notice, that this error would also make sense for least-squares data fitting, so that the same end results holds for the regression case. Hence the LS-SVM classifier formulation is equivalent to J 2 ( w , b , e ) = μ E W + ζ E D {\displaystyle J_{2}(w,b,e)=\mu E_{W}+\zeta E_{D}} with E W = 1 2 w T w {\displaystyle E_{W}={\frac {1}{2}}w^{T}w} and E D = 1 2 ∑ i = 1 N e i 2 = 1 2 ∑ i = 1 N ( y i − ( w T ϕ ( x i ) + b ) ) 2 . {\displaystyle E_{D}={\frac {1}{2}}\sum \limits _{i=1}^{N}e_{i}^{2}={\frac {1}{2}}\sum \limits _{i=1}^{N}\left(y_{i}-(w^{T}\phi (x_{i})+b)\right)^{2}.} Both μ {\displaystyle \mu } and ζ {\displaystyle \zeta } should be considered as hyperparameters to tune the amount of regularization versus the sum squared error. The solution does only depend on the ratio γ = ζ / μ {\displaystyle \gamma =\zeta /\mu } , therefore the original formulation uses only γ {\displaystyle \gamma } as tuning parameter. We use both μ {\displaystyle \mu } and ζ {\displaystyle \zeta } as parameters in order to provide a Bayesian interpretation to LS-SVM. The solution of LS-SVM regressor will be obtained after we construct the Lagrangian function: { L 2 ( w , b , e , α ) = J 2 ( w , e ) − ∑ i = 1 N α i { [ w T ϕ ( x i ) + b ] + e i − y i } , = 1 2 w T w + γ 2 ∑ i = 1 N e i 2 − ∑ i = 1 N α i { [ w T ϕ ( x i ) + b ] + e i − y i } , {\displaystyle {\begin{cases}L_{2}(w,b,e,\alpha )\;=J_{2}(w,e)-\sum \limits _{i=1}^{N}\alpha _{i}\left\{{\left[{w^{T}\phi (x_{i})+b}\right]+e_{i}-y_{i}}\right\},\\\quad \quad \quad \quad \quad \;={\frac {1}{2}}w^{T}w+{\frac {\gamma }{2}}\sum \limits _{i=1}^{N}e_{i}^{2}-\sum \limits _{i=1}^{N}\alpha _{i}\left\{\left[w^{T}\phi (x_{i})+b\right]+e_{i}-y_{i}\right\},\end{cases}}} where α i ∈ R {\displaystyle \alpha _{i}\in \mathbb {R} } are the Lagrange multipliers. The conditions for optimality are { ∂ L 2 ∂ w = 0 → w = ∑ i = 1 N α i ϕ ( x i ) , ∂ L 2 ∂ b = 0 → ∑ i = 1 N α i = 0 , ∂ L 2 ∂ e i = 0 → α i = γ e i , i = 1 , … , N , ∂ L 2 ∂ α i = 0 → y i = w T ϕ ( x i ) + b + e i , i = 1 , … , N . {\displaystyle {\begin{cases}{\frac {\partial L_{2}}{\partial w}}=0\quad \to \quad w=\sum \limits _{i=1}^{N}\alpha _{i}\phi (x_{i}),\\{\frac {\partial L_{2}}{\partial b}}=0\quad \to \quad \sum \limits _{i=1}^{N}\alpha _{i}=0,\\{\frac {\partial L_{2}}{\partial e_{i}}}=0\quad \to \quad \alpha _{i}=\gamma e_{i},\;i=1,\ldots ,N,\\{\frac {\partial L_{2}}{\partial \alpha _{i}}}=0\quad \to \quad y_{i}=w^{T}\phi (x_{i})+b+e_{i},\,i=1,\ldots ,N.\end{cases}}} Elimination of w {\displaystyle w} and e {\displaystyle e} will yield a linear system instead of a quadratic programming problem: [ 0 1 N T 1 N Ω + γ − 1 I N ] [ b α ] = [ 0 Y ] , {\displaystyle \left[{\begin{matrix}0&1_{N}^{T}\\1_{N}&\Omega +\gamma ^{-1}I_{N}\end{matrix}}\right]\left[{\begin{matrix}b\\\alpha \end{matrix}}\right]=\left[{\begin{matrix}0\\Y\end{matrix}}\right],} with Y = [ y 1 , … , y N ] T {\displaystyle Y=[y_{1},\ldots ,y_{N}]^{T}} , 1 N = [ 1 , … , 1 ] T {\displaystyle 1_{N}=[1,\ldots ,1]^{T}} and α = [ α 1 , … , α N ] T {\displaystyle \alpha =[\alpha _{1},\ldots ,\alpha _{N}]^{T}} . Here, I N {\displaystyle I_{N}} is an N × N {\displaystyle N\times N} identity matrix, and Ω ∈ R N × N {\displaystyle \Omega \in \mathbb {R} ^{N\times N}} is the kernel matrix defined by Ω i j = ϕ ( x i ) T ϕ ( x j ) = K ( x i , x j ) {\displaystyle \Omega _{ij}=\phi (x_{i})^{T}\phi (x_{j})=K(x_{i},x_{j})} . === Kernel function K === For the kernel function K(•, •) one typically has the following choices: Linear kernel : K ( x , x i ) = x i T x , {\displaystyle K(x,x_{i})=x_{i}^{T}x,} Polynomial kernel of degree d {\displaystyle d} : K ( x , x i ) = ( 1 + x i T x / c ) d , {\displaystyle K(x,x_{i})=\left({1+x_{i}^{T}x/c}\right)^{d},} Radial basis function RBF kernel : K ( x , x i ) = exp ( − ‖ x − x i ‖ 2 / σ 2 ) , {\displaystyle K(x,x_{i})=\exp \left({-\left\|{x-x_{i}}\right\|^{2}/\sigma ^{2}}\right),} MLP kernel : K ( x , x i ) = tanh ( k x i T x + θ ) , {\displaystyle K(x,x_{i})=\tanh \left({k
Zardoz (computer security)
In computer security, the Security-Digest list, better known as the Zardoz list, was a semi-private full disclosure mailing list run by Neil Gorsuch from 1989 through 1991. It identified weaknesses in systems and gave directions on where to find them. It was a perennial target for computer hackers, who sought archives of the list for information on undisclosed software vulnerabilities. == Membership restrictions == Access to Zardoz was approved on a case-by-case basis by Gorsuch, principally by reference to the user account used to send subscription requests; requests were approved for root users, valid UUCP owners, or system administrators listed at the NIC. The openness of the list to users other than Unix system administrators was a regular topic of conversation, with participants expressing concern that vulnerabilities and exploitation details disclosed on the list were liable to spread to hackers. The circulation of Zardoz postings was an open secret among computer hackers, and mocked in a Phrack parody of an IRC channel populated by security experts. == Notable participants == Keith Bostic discussed BSD Sendmail vulnerabilities Chip Salzenberg discussed Peter Honeyman's posting of a UUCP worm, and shell script security Gene Spafford discussed VMS and Ultrix bugs, and relayed law enforcement enquiries about the Morris Worm Tom Christiansen discussed SUID shell scripts Chris Torek discussed devising exploits from general descriptions of vulnerabilities Henry Spencer discussed Unix security Brendan Kehoe discussed systems security Alec Muffett announced Crack, the Unix password cracker The majority of Zardoz participants were Unix systems administrators and C software developers. Neil Gorsuch and Gene Spafford were the most prolific contributors to the list.
Taylor Swift deepfake pornography controversy
In late January 2024, sexually explicit AI-generated deepfake images of American musician Taylor Swift were proliferated on social media platforms 4chan and X (formerly Twitter). Several artificial images of Swift of a sexual or violent nature were quickly spread, with one post reported to have been seen over 47 million times before its eventual removal. The images led Microsoft to enhance Microsoft Designer's text-to-image model to prevent future abuse. Moreover, these images prompted responses from anti-sexual assault advocacy groups, US politicians, Swifties, and Microsoft CEO Satya Nadella, among others, and it has been suggested that Swift's influence could result in new legislation regarding the creation of deepfake pornography. A similar controversy emerged in August 2025, when The Verge reported AI image and video tool Grok Imagine generated sexually explicit images and videos of Swift from an otherwise innocuous text prompt. == Background == American musician Taylor Swift has been the target of misogyny and slut-shaming throughout her career. American technology corporation Microsoft offers AI image creators called Microsoft Designer and Bing Image Creator, which employ censorship safeguards to prevent users from generating unsafe or objectionable content. Members of a Telegram group discussed ways to circumvent these censors to create pornographic images of celebrities. Graphika, a disinformation research firm, traced the creation of the images back to a 4chan community. == Reactions == For some, the deepfake images of Swift immediately became a source of controversy and outrage. Other internet users found them humorous and absurd, such as the image making it appear as though Swift was to engage in sexual intercourse with Oscar the Grouch. The images drew condemnations from Rape, Abuse & Incest National Network and SAG-AFTRA. The latter group, who had been following issues regarding AI-generated media prior to Swift's involvement, considered the images "upsetting, harmful and deeply concerning." Microsoft CEO Satya Nadella, whose company's products were believed to be used to make these images, responded to the controversy as "alarming and terrible", further stating his belief that "we all benefit when the online world is a safe world." === Taylor Swift === A source close to Swift told the Daily Mail that she would be considering legal action, saying, "Whether or not legal action will be taken is being decided, but there is one thing that is clear: These fake AI-generated images are abusive, offensive, exploitative, and done without Taylor's consent and/or knowledge." === Politicians === White House press secretary Karine Jean-Pierre expressed concern over the counterfeit images, deeming them "alarming", and emphasized the obligation of social media platforms to curb the dissemination of misinformation. Several members of American politics called for legislation against AI-generated pornography. Later in the month, a bipartisan bill was introduced by US senators Dick Durbin, Lindsey Graham, Amy Klobuchar and Josh Hawley. The bill would allow victims to sue individuals who produced or possessed "digital forgeries" with intent to distribute, or those who received the material knowing it was made without consent. The European Union struck a deal in February 2024 on a similar bill that would criminalize deepfake pornography, as well as online harassment and revenge porn, by mid-2027. === Social media platforms === X responded to the sharing of these images on their own website with claims they would suspend accounts that participated in their spread. Despite this, the photos continued to be reshared among accounts of X, and spread to other platforms including Instagram and Reddit. X enforces a "synthetic and manipulated media policy", which has been criticized for its efficacy. They briefly blocked searches of Swift's name on January 27, 2024, reinstating them two days later. === Swifties === Fans of Taylor Swift, known as Swifties, responded to the circulation of these images by pushing the hashtag #ProtectTaylorSwift to trend on X. They also flooded other hashtags related to the images with more positive images and videos of her live performances. == Cultural significance == Deepfake pornography has remained highly controversial and has affected figures from other celebrities to ordinary people, most of whom are women. Journalists have opined that the involvement of a prominent public figure such as Swift in the dissemination of AI-generated pornography could bring public awareness and political reform to the issue.
Terminator (franchise)
Terminator is an American media franchise created by James Cameron and Gale Anne Hurd. It is considered to be of the cyberpunk subgenre of science fiction. The franchise primarily focuses on the events leading to a future post-apocalyptic war between a synthetic intelligence known as Skynet, and a surviving resistance of humans led by John Connor. In this future, Skynet uses an arsenal of cyborgs known as Terminators, designed to mimic humans and infiltrate the resistance. Much of the franchise takes place in time periods prior to the Skynet takeover, with both humans and Terminators using time travel to attempt to alter the past and change the outcome of the future. A prominent Terminator model throughout the films is the T-800, commonly known as "the Terminator", with instances of this model portrayed by Arnold Schwarzenegger. The franchise began with the 1984 film The Terminator, written and directed by Cameron, with Hurd as producer. They would return for the 1991 sequel Terminator 2: Judgment Day (or T2). Both films were critical and commercial successes. Terminator 3: Rise of the Machines (or T3) was released in 2003 to positive reviews, followed by Terminator Salvation in 2009 to more negative reviews. Salvation was intended as the first in a new trilogy, which was later scrapped after the film rights were sold. Cameron was consulted for the 2015 film Terminator Genisys, a reboot branching off from the timeline of the original film. It was negatively received and performed poorly at the box-office. Cameron had a larger role as a producer of the 2019 film Terminator: Dark Fate, a direct sequel to T2 that ignores the three preceding films. As with Salvation, both Genisys and Dark Fate were planned as first installments of new trilogies, with the plans scrapped each time due to the films' poor box-office performances. Outside of the theatrical films, Cameron co-directed T2-3D: Battle Across Time, a 1996 theme park film-based attraction. It was produced as the original sequel to T2 and reunited its main cast. A television series, Terminator: The Sarah Connor Chronicles, was developed without Cameron's involvement and aired for two seasons in 2008 and 2009. It was also produced as a T2 sequel, taking place in an alternate timeline that ignores the third film and subsequent events. Terminator Zero, an anime series, premiered in August 2024. The franchise has also inspired several lines of comic books since 1988, and numerous video games since 1991. By 2010, the franchise had generated $3 billion in revenue. == Themes and setting == The central theme of the franchise is the battle for survival between the nearly-extinct human race and the world-spanning, synthetic intelligence that is Skynet. Skynet is positioned in the first film, The Terminator (1984), as a U.S. strategic "Global Digital Defense Network" computer system by Cyberdyne Systems which becomes self-aware. Shortly after activation, Skynet seemingly perceives all humans as a threat to its existence and formulates a plan to systematically wipe out humanity itself. The system initiates a nuclear first strike against Russia, thereby ensuring a devastating second strike and a nuclear holocaust which wipes out much of humanity in the resulting nuclear war. In the post-apocalyptic aftermath, Skynet later builds up its own autonomous machine-based military capability which includes the Terminators used against individual human targets and thereafter proceeds to wage a persistent total war against the surviving elements of humanity, some of whom have militarily organized themselves into a Resistance. At some point in this future, Skynet develops the capability of time travel and both it and the Resistance seek to use this technology in order to win the war; either by altering or accelerating past events or by preventing the apocalyptic timeline. === Judgment Day === In the franchise, Judgment Day (a reference to the biblical Day of Judgment) is the date on which Skynet becomes self-aware, in which case its creators panic and attempt to deactivate the network. As a result, Skynet perceives humanity as a threat and attempts to exterminate them. Skynet launches an all-out nuclear attack on Russia in order to provoke a nuclear counter-strike against the United States, knowing this will eliminate its human enemies. Due to time travel and the consequent ability to change the future, several differing dates are given for Judgment Day. In Terminator 2: Judgment Day (1991), Sarah Connor states that Judgment Day will occur on August 29, 1997. However, this date is delayed following the attack on Cyberdyne Systems in the same film. Judgment Day has various different dates in different timelines of the subsequent films, as well as the television series, creating a multiverse of temporal phenomena. In Terminator 3: Rise of the Machines (2003) and Terminator Salvation (2009), Judgment Day was postponed to July 2003. In Terminator: The Sarah Connor Chronicles (2008–2009), the attack on Cyberdyne Systems in the second film delayed Judgment Day to April 21, 2011. In Terminator Genisys (2015), the fifth film in the franchise, Judgment Day was postponed to an unspecified day in October 2017, attributed to altered events in both the future and the past. Sarah and Kyle Reese travel through time to the year 2017 and seemingly defeat Skynet, but the system core, contained inside a subterranean blast shelter, survives unknown to them, thus further delaying, rather than preventing, Judgment Day. In Terminator: Dark Fate (2019), the direct sequel to Terminator 2: Judgment Day, a date is not given for the new Judgment Day though it is named as such by Grace. Since Grace is a ten-year-old in 2020 and shown as a teenager in the post-Judgment Day world in flash-forwards throughout the film, Judgment Day occurs sometime in the early 2020s in this timeline. == Franchise rights == Before the first film was created, director James Cameron sold the rights for $1 to Gale Anne Hurd, his future wife, who produced the film, under the strict provision that he be allowed to direct it. Hemdale Film Corporation also became a 50-percent owner of the franchise rights, until its share was sold in 1990 to Carolco Pictures, a company founded by Andrew G. Vajna and Mario Kassar. Terminator 2: Judgment Day was released a year later. Carolco filed for bankruptcy in 1995 and its library was subsequently acquired by StudioCanal, which continues to own the franchise today. However, the rights to future Terminator films were ultimately put up for auction. By that time, Cameron had become interested in making a Terminator 3 film. The rights were ultimately auctioned to Vajna in 1997, for $8 million. Vajna and Kassar spent another $8 million to purchase Hurd's half of the rights in 1998, becoming the full owners of the franchise. Hurd was initially opposed to the sale of the rights, while Cameron had lost interest in the franchise and a third film. After the 2003 release of Terminator 3: Rise of the Machines, the franchise rights were sold in 2007 for about $25 million to The Halcyon Company, which produced Terminator Salvation in 2009. Later that year, the company faced legal issues and filed for bankruptcy, putting the franchise rights up for sale. The rights were valued at about $70 million. In 2010, the rights were sold for $29.5 million to Pacificor, a hedge fund that was Halcyon's largest creditor. In 2012, the rights were sold to Megan Ellison and her production company Annapurna Pictures for less than $20 million, a lower price than what was previously offered. The low price was because of the possibility of Cameron regaining the rights in 2019, as a result of new North American copyright laws. Megan's brother David Ellison and Skydance Productions produced Terminator Genisys in 2015. Cameron worked together with David Ellison to produce the 2019 film Terminator: Dark Fate. As the film neared its release, Hurd filed to terminate a copyright grant made 35 years earlier. Under this move, Hurd would again become a 50-percent owner of the rights with Cameron and Skydance could lose the rights to make any additional Terminator films beginning in November 2020, unless a new deal is worked out. Skydance responded that it had a deal in place with Cameron and that it "controls the rights to the Terminator franchise for the foreseeable future". == Films == === The Terminator (1984) === The Terminator is a 1984 science fiction action film released by Orion Pictures, co-written and directed by James Cameron and starring Arnold Schwarzenegger, Linda Hamilton and Michael Biehn. It is the first work in the Terminator franchise. In the film, robots take over the world in the near future, directed by the artificial intelligence Skynet. With its sole mission to completely annihilate humanity, it develops android assassins called Terminators that outwardly appear human. A man named John Connor starts the Tech-Com resistance to fight the machi
Constructive cooperative coevolution
The constructive cooperative coevolutionary algorithm (also called C3) is a global optimisation algorithm in artificial intelligence based on the multi-start architecture of the greedy randomized adaptive search procedure (GRASP). It incorporates the existing cooperative coevolutionary algorithm (CC). The considered problem is decomposed into subproblems. These subproblems are optimised separately while exchanging information in order to solve the complete problem. An optimisation algorithm, usually but not necessarily an evolutionary algorithm, is embedded in C3 for optimising those subproblems. The nature of the embedded optimisation algorithm determines whether C3's behaviour is deterministic or stochastic. The C3 optimisation algorithm was originally designed for simulation-based optimisation but it can be used for global optimisation problems in general. Its strength over other optimisation algorithms, specifically cooperative coevolution, is that it is better able to handle non-separable optimisation problems. An improved version was proposed later, called the Improved Constructive Cooperative Coevolutionary Differential Evolution (C3iDE), which removes several limitations with the previous version. A novel element of C3iDE is the advanced initialisation of the subpopulations. C3iDE initially optimises the subpopulations in a partially co-adaptive fashion. During the initial optimisation of a subpopulation, only a subset of the other subcomponents is considered for the co-adaptation. This subset increases stepwise until all subcomponents are considered. This makes C3iDE very effective on large-scale global optimisation problems (up to 1000 dimensions) compared to cooperative coevolutionary algorithm (CC) and Differential evolution. The improved algorithm has then been adapted for multi-objective optimization. == Algorithm == As shown in the pseudo code below, an iteration of C3 exists of two phases. In Phase I, the constructive phase, a feasible solution for the entire problem is constructed in a stepwise manner. Considering a different subproblem in each step. After the final step, all subproblems are considered and a solution for the complete problem has been constructed. This constructed solution is then used as the initial solution in Phase II, the local improvement phase. The CC algorithm is employed to further optimise the constructed solution. A cycle of Phase II includes optimising the subproblems separately while keeping the parameters of the other subproblems fixed to a central blackboard solution. When this is done for each subproblem, the found solution are combined during a "collaboration" step, and the best one among the produced combinations becomes the blackboard solution for the next cycle. In the next cycle, the same is repeated. Phase II, and thereby the current iteration, are terminated when the search of the CC algorithm stagnates and no significantly better solutions are being found. Then, the next iteration is started. At the start of the next iteration, a new feasible solution is constructed, utilising solutions that were found during the Phase I of the previous iteration(s). This constructed solution is then used as the initial solution in Phase II in the same way as in the first iteration. This is repeated until one of the termination criteria for the optimisation is reached, e.g. a maximum number of evaluations. {Sphase1} ← ∅ while termination criteria not satisfied do if {Sphase1} = ∅ then {Sphase1} ← SubOpt(∅, 1) end if while pphase1 not completely constructed do pphase1 ← GetBest({Sphase1}) {Sphase1} ← SubOpt(pphase1, inext subproblem) end while pphase2 ← GetBest({Sphase1}) while not stagnate do {Sphase2} ← ∅ for each subproblem i do {Sphase2} ← SubOpt(pphase2,i) end for {Sphase2} ← Collab({Sphase2}) pphase2 ← GetBest({Sphase2}) end while end while == Multi-objective optimisation == The multi-objective version of the C3 algorithm is a Pareto-based algorithm which uses the same divide-and-conquer strategy as the single-objective C3 optimisation algorithm . The algorithm again starts with the advanced constructive initial optimisations of the subpopulations, considering an increasing subset of subproblems. The subset increases until the entire set of all subproblems is included. During these initial optimisations, the subpopulation of the latest included subproblem is evolved by a multi-objective evolutionary algorithm. For the fitness calculations of the members of the subpopulation, they are combined with a collaborator solution from each of the previously optimised subpopulations. Once all subproblems' subpopulations have been initially optimised, the multi-objective C3 optimisation algorithm continues to optimise each subproblem in a round-robin fashion, but now collaborator solutions from all other subproblems' subspopulations are combined with the member of the subpopulation that is being evaluated. The collaborator solution is selected randomly from the solutions that make up the Pareto-optimal front of the subpopulation. The fitness assignment to the collaborator solutions is done in an optimistic fashion (i.e. an "old" fitness value is replaced when the new one is better). == Applications == The constructive cooperative coevolution algorithm has been applied to different types of problems, e.g. a set of standard benchmark functions, optimisation of sheet metal press lines and interacting production stations. The C3 algorithm has been embedded with, amongst others, the differential evolution algorithm and the particle swarm optimiser for the subproblem optimisations.
Kernel embedding of distributions
In machine learning, the kernel embedding of distributions (also called the kernel mean or mean map) comprises a class of nonparametric methods in which a probability distribution is represented as an element of a reproducing kernel Hilbert space (RKHS). A generalization of the individual data-point feature mapping done in classical kernel methods, the embedding of distributions into infinite-dimensional feature spaces can preserve all of the statistical features of arbitrary distributions, while allowing one to compare and manipulate distributions using Hilbert space operations such as inner products, distances, projections, linear transformations, and spectral analysis. This learning framework is very general and can be applied to distributions over any space Ω {\displaystyle \Omega } on which a sensible kernel function (measuring similarity between elements of Ω {\displaystyle \Omega } ) may be defined. For example, various kernels have been proposed for learning from data which are: vectors in R d {\displaystyle \mathbb {R} ^{d}} , discrete classes/categories, strings, graphs/networks, images, time series, manifolds, dynamical systems, and other structured objects. The theory behind kernel embeddings of distributions has been primarily developed by Alex Smola, Le Song, Arthur Gretton, and Bernhard Schölkopf. A review of recent works on kernel embedding of distributions can be found in. The analysis of distributions is fundamental in machine learning and statistics, and many algorithms in these fields rely on information theoretic approaches such as entropy, mutual information, or Kullback–Leibler divergence. However, to estimate these quantities, one must first either perform density estimation, or employ sophisticated space-partitioning/bias-correction strategies which are typically infeasible for high-dimensional data. Commonly, methods for modeling complex distributions rely on parametric assumptions that may be unfounded or computationally challenging (e.g. Gaussian mixture models), while nonparametric methods like kernel density estimation (Note: the smoothing kernels in this context have a different interpretation than the kernels discussed here) or characteristic function representation (via the Fourier transform of the distribution) break down in high-dimensional settings. Methods based on the kernel embedding of distributions sidestep these problems and also possess the following advantages: Data may be modeled without restrictive assumptions about the form of the distributions and relationships between variables Intermediate density estimation is not needed Practitioners may specify the properties of a distribution most relevant for their problem (incorporating prior knowledge via choice of the kernel) If a characteristic kernel is used, then the embedding can uniquely preserve all information about a distribution, while thanks to the kernel trick, computations on the potentially infinite-dimensional RKHS can be implemented in practice as simple Gram matrix operations Dimensionality-independent rates of convergence for the empirical kernel mean (estimated using samples from the distribution) to the kernel embedding of the true underlying distribution can be proven. Learning algorithms based on this framework exhibit good generalization ability and finite sample convergence, while often being simpler and more effective than information theoretic methods Thus, learning via the kernel embedding of distributions offers a principled drop-in replacement for information theoretic approaches and is a framework which not only subsumes many popular methods in machine learning and statistics as special cases, but also can lead to entirely new learning algorithms. == Definitions == Let X {\displaystyle X} denote a random variable with domain Ω {\displaystyle \Omega } and distribution P {\displaystyle P} . Given a symmetric, positive-definite kernel k : Ω × Ω → R {\displaystyle k:\Omega \times \Omega \rightarrow \mathbb {R} } the Moore–Aronszajn theorem asserts the existence of a unique RKHS H {\displaystyle {\mathcal {H}}} on Ω {\displaystyle \Omega } (a Hilbert space of functions f : Ω → R {\displaystyle f:\Omega \to \mathbb {R} } equipped with an inner product ⟨ ⋅ , ⋅ ⟩ H {\displaystyle \langle \cdot ,\cdot \rangle _{\mathcal {H}}} and a norm ‖ ⋅ ‖ H {\displaystyle \|\cdot \|_{\mathcal {H}}} ) for which k {\displaystyle k} is a reproducing kernel, i.e., in which the element k ( x , ⋅ ) {\displaystyle k(x,\cdot )} satisfies the reproducing property ⟨ f , k ( x , ⋅ ) ⟩ H = f ( x ) ∀ f ∈ H , ∀ x ∈ Ω . {\displaystyle \langle f,k(x,\cdot )\rangle _{\mathcal {H}}=f(x)\qquad \forall f\in {\mathcal {H}},\quad \forall x\in \Omega .} One may alternatively consider x ↦ k ( x , ⋅ ) {\displaystyle x\mapsto k(x,\cdot )} as an implicit feature mapping φ : Ω → H {\displaystyle \varphi :\Omega \rightarrow {\mathcal {H}}} (which is therefore also called the feature space), so that k ( x , x ′ ) = ⟨ φ ( x ) , φ ( x ′ ) ⟩ H {\displaystyle k(x,x')=\langle \varphi (x),\varphi (x')\rangle _{\mathcal {H}}} can be viewed as a measure of similarity between points x , x ′ ∈ Ω . {\displaystyle x,x'\in \Omega .} While the similarity measure is linear in the feature space, it may be highly nonlinear in the original space depending on the choice of kernel. === Kernel embedding === The kernel embedding of the distribution P {\displaystyle P} in H {\displaystyle {\mathcal {H}}} (also called the kernel mean or mean map) is given by: μ X := E [ k ( X , ⋅ ) ] = E [ φ ( X ) ] = ∫ Ω φ ( x ) d P ( x ) {\displaystyle \mu _{X}:=\mathbb {E} [k(X,\cdot )]=\mathbb {E} [\varphi (X)]=\int _{\Omega }\varphi (x)\ \mathrm {d} P(x)} If P {\displaystyle P} allows a square integrable density p {\displaystyle p} , then μ X = E k p {\displaystyle \mu _{X}={\mathcal {E}}_{k}p} , where E k {\displaystyle {\mathcal {E}}_{k}} is the Hilbert–Schmidt integral operator. A kernel is characteristic if the mean embedding μ : { family of distributions over Ω } → H {\displaystyle \mu :\{{\text{family of distributions over }}\Omega \}\to {\mathcal {H}}} is injective. Each distribution can thus be uniquely represented in the RKHS and all statistical features of distributions are preserved by the kernel embedding if a characteristic kernel is used. === Empirical kernel embedding === Given n {\displaystyle n} training examples { x 1 , … , x n } {\displaystyle \{x_{1},\ldots ,x_{n}\}} drawn independently and identically distributed (i.i.d.) from P , {\displaystyle P,} the kernel embedding of P {\displaystyle P} can be empirically estimated as μ ^ X = 1 n ∑ i = 1 n φ ( x i ) {\displaystyle {\widehat {\mu }}_{X}={\frac {1}{n}}\sum _{i=1}^{n}\varphi (x_{i})} === Joint distribution embedding === If Y {\displaystyle Y} denotes another random variable (for simplicity, assume the co-domain of Y {\displaystyle Y} is also Ω {\displaystyle \Omega } with the same kernel k {\displaystyle k} which satisfies ⟨ φ ( x ) ⊗ φ ( y ) , φ ( x ′ ) ⊗ φ ( y ′ ) ⟩ = k ( x , x ′ ) k ( y , y ′ ) {\displaystyle \langle \varphi (x)\otimes \varphi (y),\varphi (x')\otimes \varphi (y')\rangle =k(x,x')k(y,y')} ), then the joint distribution P ( x , y ) ) {\displaystyle P(x,y))} can be mapped into a tensor product feature space H ⊗ H {\displaystyle {\mathcal {H}}\otimes {\mathcal {H}}} via C X Y = E [ φ ( X ) ⊗ φ ( Y ) ] = ∫ Ω × Ω φ ( x ) ⊗ φ ( y ) d P ( x , y ) {\displaystyle {\mathcal {C}}_{XY}=\mathbb {E} [\varphi (X)\otimes \varphi (Y)]=\int _{\Omega \times \Omega }\varphi (x)\otimes \varphi (y)\ \mathrm {d} P(x,y)} By the equivalence between a tensor and a linear map, this joint embedding may be interpreted as an uncentered cross-covariance operator C X Y : H → H {\displaystyle {\mathcal {C}}_{XY}:{\mathcal {H}}\to {\mathcal {H}}} from which the cross-covariance of functions f , g ∈ H {\displaystyle f,g\in {\mathcal {H}}} can be computed as Cov ( f ( X ) , g ( Y ) ) := E [ f ( X ) g ( Y ) ] − E [ f ( X ) ] E [ g ( Y ) ] = ⟨ f , C X Y g ⟩ H = ⟨ f ⊗ g , C X Y ⟩ H ⊗ H {\displaystyle \operatorname {Cov} (f(X),g(Y)):=\mathbb {E} [f(X)g(Y)]-\mathbb {E} [f(X)]\mathbb {E} [g(Y)]=\langle f,{\mathcal {C}}_{XY}g\rangle _{\mathcal {H}}=\langle f\otimes g,{\mathcal {C}}_{XY}\rangle _{{\mathcal {H}}\otimes {\mathcal {H}}}} Given n {\displaystyle n} pairs of training examples { ( x 1 , y 1 ) , … , ( x n , y n ) } {\displaystyle \{(x_{1},y_{1}),\dots ,(x_{n},y_{n})\}} drawn i.i.d. from P {\displaystyle P} , we can also empirically estimate the joint distribution kernel embedding via C ^ X Y = 1 n ∑ i = 1 n φ ( x i ) ⊗ φ ( y i ) {\displaystyle {\widehat {\mathcal {C}}}_{XY}={\frac {1}{n}}\sum _{i=1}^{n}\varphi (x_{i})\otimes \varphi (y_{i})} === Conditional distribution embedding === Given a conditional distribution P ( y ∣ x ) , {\displaystyle P(y\mid x),} one can define the corresponding RKHS embedding as μ Y ∣ x = E [ φ ( Y ) ∣ X ] = ∫ Ω φ ( y ) d P ( y ∣ x ) {\displaystyle \mu _{Y\mid x}=\mathbb {E} [\varphi (Y)\mid X]=\int _{\Omega
Speech-generating device
Speech-generating devices (SGDs), also known as voice output communication aids, are electronic augmentative and alternative communication (AAC) systems used to supplement or replace speech or writing for individuals with severe speech impairments, enabling them to verbally communicate. SGDs are important for people who have limited means of interacting verbally, as they allow individuals to become active participants in communication interactions. They are particularly helpful for patients with amyotrophic lateral sclerosis (ALS) but recently have been used for children with predicted speech deficiencies. There are several input and display methods for users of varying abilities to make use of SGDs. Some SGDs have multiple pages of symbols to accommodate a large number of utterances, and thus only a portion of the symbols available are visible at any one time, with the communicator navigating the various pages. Speech-generating devices can produce electronic voice output by using digitized recordings of natural speech or through speech synthesis—which may carry less emotional information but can permit the user to speak novel messages. The content, organization, and updating of the vocabulary on an SGD is influenced by a number of factors, such as the user's needs and the contexts that the device will be used in. The development of techniques to improve the available vocabulary and rate of speech production is an active research area. Vocabulary items should be of high interest to the user, be frequently applicable, have a range of meanings, and be pragmatic in functionality. There are multiple methods of accessing messages on devices: directly or indirectly, or using specialized access devices—although the specific access method will depend on the skills and abilities of the user. SGD output is typically much slower than speech, although rate enhancement strategies can increase the user's rate of output, resulting in enhanced efficiency of communication. The first known SGD was prototyped in the mid-1970s, and rapid progress in hardware and software development has meant that SGD capabilities can now be integrated into devices like smartphones. Notable users of SGDs include Stephen Hawking, Roger Ebert, Tony Proudfoot, and Pete Frates (founder of the ALS Ice Bucket Challenge). Speech-generating systems may be dedicated devices developed solely for AAC, or non-dedicated devices such as computers running additional software to allow them to function as AAC devices. == History == SGDs have their roots in early electronic communication aids. The first such aid was a sip-and-puff typewriter controller named the patient-operated selector mechanism (Naman) prototyped by Reg Maling in the United Kingdom in 1960. POSSUM scanned through a set of symbols on an illuminated display. Researchers at Delft University in the Netherlands created the lightspot-operated typewriter (LOT) in 1970, which made use of small movements of the head to point a small spot of light at a matrix of characters, each equipped with a photoelectric cell. Although it was commercially unsuccessful, the LOT was well received by its users. In 1966, Barry Romich, a freshman engineering student at Case Western Reserve University, and Ed Prentke, an engineer at Highland View Hospital in Cleveland, Ohio, formed a partnership, creating the Prentke Romich Company. In 1969, the company produced its first communication device, a typing system based on a discarded Teletype machine. In 1979, Mark Dahmke developed software for a vocal communication aid program using the Computalker CT-1 analog speech synthesizer with a microcomputer. The software utilized phonemes to generate speech, assisting individuals with communication impairments in constructing words and sentences. Dahmke's work contributed to the advancement of assistive technology for people with disabilities. Notably, he designed the "Vocabulary Management System" for Bill Rush, a student with cerebral palsy. This early speech synthesis technology facilitated improved communication for Rush and was featured in a 1980 issue of LIFE Magazine. Dahmke's contributions have influenced the development of augmentative and alternative communication (AAC) technologies. During the 1970s and early 1980s, several other companies emerged that have since become prominent manufacturers of SGDs. Toby Churchill founded Toby Churchill Ltd in 1973, after losing his speech following encephalitis. In the US, Dynavox (then known as Sentient Systems Technology) grew out of a student project at Carnegie-Mellon University, created in 1982 to help a young woman with cerebral palsy to communicate. Beginning in the 1980s, improvements in technology led to a greatly increased number, variety, and performance of commercially available communication devices, and a reduction in their size and price. Alternative methods of access such as Target Scanning (also known as eye pointing) calibrate the movement of a user's eyes to direct an SGD to produce the desired speech. Scanning, in which alternatives are presented to the user sequentially, became available on communication devices. Speech output possibilities included both digitized and synthesized speech. Rapid progress in hardware and software development continued, including projects funded by the European Community. The first commercially available dynamic screen speech-generating devices were developed in the 1990s. Software was developed that allowed the computer-based production of communication boards. High-tech devices have continued to become smaller and lighter, while increasing accessibility and capability; communication devices can be accessed using eye-tracking systems, perform as a computer for word-processing and Internet use, and as an environmental control device for independent access to other equipment such as TV, radio and telephones. Stephen Hawking came to be associated with the unique voice of his particular synthesis equipment. Hawking was unable to speak due to a combination of disabilities caused by ALS, and an emergency tracheotomy. In the past 20 or so years SGD have gained popularity amongst young children with speech deficiencies, such as autism, Down syndrome, and predicted brain damage due to surgery. Starting in the early 2000s, specialists saw the benefit of using SGDs not only for adults but for children, as well. Neuro-linguists found that SGDs were just as effective in helping children who were at risk for temporary language deficits after undergoing brain surgery as it is for patients with ALS. In particular, digitized SGDs have been used as communication aids for pediatric patients during the recovery process. == Access methods == There are many methods of accessing messages on devices: directly, indirectly, and with specialized access devices. Direct access methods involve physical contact with the system, by using a keyboard or a touch screen. Users accessing SGDs indirectly and through specialized devices must manipulate an object in order to access the system, such as maneuvering a joystick, head mouse, optical head pointer, light pointer, infrared pointer, or switch access scanner. The specific access method will depend on the skills and abilities of the user. With direct selection a body part, pointer, adapted mouse, joystick, or eye tracking could be used, whereas switch access scanning is often used for indirect selection. Unlike direct selection (e.g., typing on a keyboard, touching a screen), users of Target Scanning can only make selections when the scanning indicator (or cursor) of the electronic device is on the desired choice. Those who are unable to point typically calibrate their eyes to use eye gaze as a way to point and blocking as a way to select desired words and phrases. The speed and pattern of scanning, as well as the way items are selected, are individualized to the physical, visual and cognitive capabilities of the user. == Message construction == Augmentative and alternative communication is typically much slower than speech, with users generally producing 8–10 words per minute. Rate enhancement strategies can increase the user's rate of output to around 12–15 words per minute, and as a result enhance the efficiency of communication. In any given SGD there may be a large number of vocal expressions that facilitate efficient and effective communication, including greetings, expressing desires, and asking questions. Some SGDs have multiple pages of symbols to accommodate a large number of vocal expressions, and thus only a portion of the symbols available are visible at any one time, with the communicator navigating the various pages. Speech-generating devices generally display a set of selections either using a dynamically changing screen, or a fixed display. There are two main options for increasing the rate of communication for an SGD: encoding, and prediction. Encoding permits a user to produce a word, sentence or phrase using only on