Text simplification

Text simplification is an aspect of natural language processing that involves modifying, organizing, or categorizing existing text to make it easier to understand while retaining its original meaning. This process is essential in today's world, where communication is increasingly complex due to advancements in science, technology, and media. Human languages are inherently intricate, with extensive vocabularies and complex structures that can be challenging for machines to handle efficiently. Researchers have found that semantic compression techniques can help streamline and simplify text by reducing linguistic diversity and simplifying the vocabulary used in a given context. == Example == Text simplification involves rewriting complex sentences as simpler ones to enhance readability and comprehension. Siddharthan (2006) provides an example to illustrate this process. The original sentence contains multiple clauses and phrases, which can be broken down into simpler sentences for better understanding.

Original: Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents, which precedes the full purchasing agents report that is due out today and gives an indication of what the full report might hold.

Simplified: Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents. The Chicago report precedes the full purchasing agents report. The Chicago report gives an indication of what the full report might hold. The full report is due out today.

One approach to text simplification is lexical simplification via lexical substitution, a process that replaces complex words with simpler synonyms. Identifying complex words is a challenge addressed by machine learning classifiers trained on labeled data. Researchers have found that asking labelers to sort words by complexity level yields more consistent results than the traditional method of categorizing words as simple or complex.
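
The substitution step of lexical simplification can be illustrated with a minimal sketch. The lookup table `SIMPLER_SYNONYMS` and the function `simplify_lexically` are hypothetical names introduced here, and the synonym pairs are made up for illustration; a real system would rely on a trained complex-word classifier and a thesaurus rather than a hand-written dictionary.

```python
import re

# Hypothetical lookup table mapping complex words to simpler synonyms.
# These pairs are illustrative assumptions, not taken from any dataset.
SIMPLER_SYNONYMS = {
    "precedes": "comes before",
    "indication": "sign",
    "firmness": "strength",
}


def simplify_lexically(text, synonyms=SIMPLER_SYNONYMS):
    """Replace each word found in `synonyms` with its simpler counterpart."""

    def substitute(match):
        word = match.group(0)
        replacement = synonyms.get(word.lower())
        if replacement is None:
            return word  # word is not in the table, keep it unchanged
        if word[0].isupper():
            # Preserve an initial capital letter.
            return replacement[0].upper() + replacement[1:]
        return replacement

    # Visit every alphabetic word; punctuation and spacing stay untouched.
    return re.sub(r"[A-Za-z]+", substitute, text)


simplified = simplify_lexically(
    "The Chicago report precedes the full purchasing agents report."
)
print(simplified)
# The Chicago report comes before the full purchasing agents report.
```

A dictionary lookup keeps the sketch short, but it ignores word sense and inflection, which is one reason practical systems treat complex-word identification as a separate learned step.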

Shopify

Shopify Inc., stylized as shopify, is a Canadian multinational e-commerce company headquartered in Ottawa, Ontario that operates a platform for retail point-of-sale systems. The company has over 5 million customers and processed US$292.3 billion in transactions in 2024, of which 57% was in the United States. Major customers include Tesla, LVMH, Nestlé, PepsiCo, AB InBev, Kraft Heinz, Lindt, Whole Foods Market, Red Bull, and Hyatt. The company's software has been praised for its ease of use and reasonable fee structure. It has been described as the "go-to e-commerce platform for startups". However, the company has faced criticism for allegedly inflating its sales data and for associating with controversial sellers. == History == === 2006: Founding === Shopify was founded in 2006 by friends Tobias Lütke, Daniel Weinand and Scott Lake after launching Snowdevil, an online store for snowboarding equipment, in 2004. Dissatisfied with the existing e-commerce products on the market, Lütke, a computer programmer by trade, instead built his own. Lütke used the open source web application framework Ruby on Rails to build Snowdevil's online store and launched it after two months of development. The Snowdevil founders launched the platform as Shopify in June 2006. Shopify created an open-source template language called Liquid, which is written in Ruby and has been used since 2006. In June 2009, Shopify launched an application programming interface (API) platform and App Store. The API allows developers to create applications for Shopify online stores and then sell them on the Shopify App Store. === 2010s === In January 2010, Shopify started its Build-A-Business competition, in which participants create a business using its commerce platform. The winners of the competition received cash prizes and mentorship from entrepreneurs, such as Richard Branson, Eric Ries and others. In April of that year, Shopify launched a free mobile app on the Apple App Store.
The app allows Shopify store owners to view and manage their stores from iOS mobile devices. In December 2010, Shopify raised $7 million in a Series A round from Bessemer Venture Partners, FirstMark Capital, and Felicis Ventures at a $20 million pre-money valuation. At that time, the company had an annualized transaction value of $132 million. In October 2011, it raised $15 million in a Series B round. In August 2013, Shopify launched Shopify Payments in partnership with Stripe. Shopify Payments allows merchants to accept payments without requiring a third-party payment gateway. The company also announced the launch of a point of sale system to enable in-person sales in addition to online. The company received $100 million in Series C funding in December 2013. Shopify earned $105 million in revenue in 2014, twice as much as it raised the previous year. In February 2014, Shopify released "Shopify Plus" for large e-commerce businesses seeking access to additional features and support. Shopify went public via an initial public offering on May 21, 2015, raising more than $131 million. In September 2015, Amazon.com closed its Amazon Webstore service for merchants and selected Shopify as the preferred migration provider. In April 2016, Shopify announced Shopify Capital, a cash advance product. Shopify Capital was initially piloted to merchants within the US and allowed merchants to receive an advance on future earnings processed through its payment gateway. Since its launch in 2016, Shopify Capital has provided more than $5.1 billion in funding to Shopify merchants, with a maximum advance of $2 million. On June 7, 2016, Shopify launched its Shopify Plus Partners Program to help agencies connect with evolving businesses in the e-commerce space. On October 3, 2016, Shopify acquired Boltmade. In November 2016, Shopify partnered with Paystack, which allowed Nigerian online retailers to accept payments from customers around the world.
On November 22, 2016, Shopify launched Frenzy, a mobile app that improves flash sales. In January 2017, Shopify announced integration with Amazon that would allow merchants to sell on Amazon from their Shopify stores. In April 2017, Shopify introduced its Chip & Swipe Reader, a Bluetooth enabled debit and credit card reader for brick and mortar retail purchases. The company has since released additional technology for brick and mortar retailers, including a point-of-sale system with a Dock and Retail Stand similar to that offered by Square, and a tappable chip card reader. Shopify announced a one-click accelerated checkout feature called Shopify Pay in April 2017 as an exclusive feature for merchants using Shopify Payments as their payment processor. Customers can save their shipping and payment information for future purchases from all participating Shopify stores. In November 2017 Shopify announced Arrive, a mobile application to help customers track packages from both Shopify merchants and other e-commerce websites. In September 2018, Shopify announced plans to expand its office space in Toronto's King West neighborhood in 2022 as part of "The Well" complex, jointly owned by Allied Properties REIT and RioCan REIT. In October 2018, Shopify opened its first flagship, a physical space for business owners in Los Angeles. The space offered educational classes, coworking space, a "genius bar" for companies that use Shopify software, and workshops. Online cannabis sales in Ontario, Canada, used Shopify's software when the drug was legalized in October 2018. Shopify's software is also used for in-person cannabis sales in Ontario since becoming legal in 2019. In January 2019, Shopify announced the launch of Shopify Studios, a full-service television and film content and production house. On March 22, 2019, Shopify and email marketing platform Mailchimp ended an integration agreement over disputes involving customer privacy and data collection. 
In April 2019, Shopify announced an integration with Snapchat to allow Shopify merchants to buy and manage Snapchat Story ads directly on the Shopify platform. The company had previously secured similar integration partnerships with Facebook and Google. On August 14, 2019, Shopify launched Shopify Chat, a new native chat function that allows merchants to have real-time conversations with customers visiting Shopify stores online. === 2020s === In January 2020, the company announced plans to hire in Vancouver, Canada. Additionally, the effects of the COVID-19 pandemic contributed to lifting stock prices. On February 21, 2020, Shopify announced plans to join the Diem Association, known as Libra Association at the time. Also that month, Shopify Pay was rebranded as Shop Pay. In April, Arrive was rebranded as Shop, combining both customer-facing features under a single brand. In May, during the COVID-19 pandemic, Shopify announced it would shift most of its global workforce to permanent remote work. It was reported that Shopify's valuation would likely rise on the back of options it held in Affirm, which was expecting to go public shortly. In November 2020, Shopify announced a partnership with Alipay to support merchants with cross-border payments. Shopify also provided the opportunity for users to connect Alibaba and AliExpress to Shopify through an Alibaba Dropshipping app that could be purchased through the Shopify App Store. Multiple applications launched between 2021 and 2024 allowed customers to connect their Shopify store to their Alibaba account and then import and publish their products. The integration automatically syncs inventory and orders between both platforms so that Alibaba vendors can ship directly to dropshipping customers. As a result of Affirm's January 13, 2021 IPO, Shopify's 8% stake in Affirm was worth $2 billion. About half of Shopify's C-level executives left the company in early 2021.
On June 29, 2021, Shopify removed the 20% revenue share for app developers that make less than US$1 million per year. On January 18, 2022, Shopify announced a partnership with JD.com to let U.S. merchants expand their operations in China, listing their products on JD's cross-border e-commerce platform JD Worldwide. On March 22, 2022, Shopify introduced Linkpop, a product to create a branded, social marketplace through which merchants can advertise and market their products via links to be added on social media channels. The following month, Shopify, Alphabet Inc., Meta Platforms, McKinsey & Company, and Stripe, Inc. announced a $925 million advance market commitment of carbon dioxide removal (CDR) from companies that are developing CDR technology over the next 9 years. In June 2022, Shopify partnered with Twitter. As a part of the deal, Twitter announced that it would launch a sales channel app for all of Shopify's U.S. merchants through its app store. Shopify also partnered with PayPal to offer Shopify Payments to merchants in France. On July 26, 2022, Lütke announced immediate layoffs totalling roughly 10 percent of its workforce. In

Ideonomy

Ideonomy is a combinatorial "science of ideas" developed by American independent scholar Patrick M. Gunkel (1947–2017). Specifically, Ideonomy is concerned with the systematic organization of ideas and the discovery of the rules behind how ideas combine, diverge, and transform. Gunkel defined ideonomy as "the science of the laws of ideas and of the application of such laws to the generation of all possible ideas in connection with any subject, idea, or thing." In his 1992 book A History of Knowledge, Charles Van Doren compared ideonomy to a "mining operation" that excavates meanings and thought to discover treasures hidden deep within language. Sources from the 1980s and 1990s demonstrate that ideonomy was useful to academic researchers in fields including biology, toxicology, and nursing/patient care. Beginning in the 2010s, academics in a wide range of fields including machine learning, marketing, computational modeling, and cybersecurity have relied on materials generated for ideonomy to provide methodological support for their research. == Etymology and definition == The word "ideonomy" combines the Greek roots ideo- (from idea, meaning pattern or form) and -nomy (from nomos, meaning law or custom). The suffix -nomy suggests the laws concerning or the totality of knowledge about a given subject, as in astronomy or taxonomy. In a note posted on the MIT ideonomy website, Gunkel states that the word was supposedly first coined by the French Encyclopedists to refer to a science of ideas. No evidence is provided for this statement, however. The concept bears some relationship to Antoine Destutt de Tracy's "ideology" (1796), which originally meant a systematic science of ideas before acquiring its modern political connotations. 
Gunkel provided several metaphorical descriptions of ideonomy: An "idea bank": a computer network enabling systematic exploration of infinite possible ideas A "kaleidoscope" that can exhibit all possible combinations and transformations of ideas A "prism" capable of diffracting any idea into its cognitive components A "gigantic microscope for magnifying the ideocosm" == History and development == In 1984, Gunkel received a five-year unsolicited grant from the Richard Lounsbery Foundation of New York to develop ideonomy. A June 1, 1987 article on the front page of The Wall Street Journal brought Gunkel and ideonomy to wider public attention. Some academics were interested in using ideonomy's techniques, including biologist Betsey Dyer, who published several contemporaneous peer-reviewed studies citing ideonomy. Academic researchers in the field of toxicology and nursing/patient care also used ideonomy. However, ideonomy's broadest contribution to date came beginning in the 2010s, as a list of personality traits generated for combinatorial matching was used by researchers in artificial intelligence to code human emotions for machine-learning tasks, develop computational models related to personality, develop a measurement framework for influencer-brand recommender systems, and aid information awareness/cybersecurity assessment. == Methodology == The foundational empirical method of ideonomy involves the systematic creation of extensive lists. Gunkel's apartment reportedly contained thousands of lists on every conceivable topic. Gunkel termed each list an "organon," which he described as expanding through "combination, permutation, transformation, generalization, specialization, intersection, interaction, reapplication, recursive use, etc. of existing organons." The ideonomic process follows a progressive structure. The ideonomist begins with a simple list of examples of a particular idea, concept, or thing. The list need not be exhaustive. 
By studying this list, the ideonomist isolates and identifies types. This categorical analysis then reveals missing items, allowing the primary list to be improved and refined. Gunkel emphasized that list items must not only cover genuine categories of nature but also be formulated in ways that yield the largest possible number of syntactically coherent possibilities when combined. The core technique of ideonomy is "ideocombinatorics"—the systematic intersection and combination of items from different lists to generate novel composite concepts. Gunkel developed computer programs to automate this process. For example, combining a list of 230 Universal Elementary Shapes (pits, pyramids, trenches, hemispheres, needles) with a list of 74 Types of Order (recurrence, identity, likeness of parts) yields 17,020 possible "shapes of order." These combinations, when phrased as questions ("Can there be pits of recurrence?"), could suggest new categories of phenomena worthy of investigation. The computer-generated output is typically repetitive and often meaningless. However, with sufficient frequency, the combinations yield results that are unexpectedly interesting and fruitful. In one documented case, Gunkel's programs generated 45,540 questions about toxins for microbiologist David Bermudes. One question—"Can hierarchies of cell process be used as a basis for classifying toxic action?"—prompted Bermudes to develop a novel approach to classifying biological toxins by the type of molecule they attack, rather than by chemical structure or physiological system affected. According to one contemporaneous account of ideonomy, "Gunkel takes for his field all fields and all ideas about anything. He uses a computer to generate lists of words and phrases and by juxtaposition reviews the resultant patterns for novel ideas. The computer is ideal for this task because the mind would rebel at the formidable processing task ideonomy involves. 
What we have here is computer generated originality." == Applications == Gunkel and his supporters identified several practical applications for ideonomic methods: Scientific research: Biologist Betsey Dyer of Wheaton College published research crediting ideonomy for helping to generate ideas. Medical science: When Austin pathologist Michael T. O'Brien was presented with the ideonomically-generated question "Can arteries have rashes?", he initially dismissed it as nonsense. Upon reflection, he realized that large arteries are supplied with blood by tiny vessels that might become inflamed and dilated, analogous to skin vessels in a rash—a phenomenon potentially worth researching. Analogical thinking: Harvard law professor Robert Clark used ideonomic analogies to write a research paper comparing plant structure with human hierarchies. Artificial intelligence: Douglas Lenat, a researcher at Microelectronics and Computer Technology Corporation (MCC) in Austin, suggested that Gunkel's lists enumerating types of human mistakes could help design AI systems capable of recognizing and correcting their own errors. == Reception and criticism == Ideonomy received mixed reactions from the academic and scientific communities. Prominent supporters included: Edward Fredkin, former director of MIT's computer science laboratory, who praised Gunkel's "provocative ideas on artificial intelligence." Marvin Minsky, AI scientist and MIT professor, who described ideonomy as "perhaps the most extensive study of ways to generate ideas." Frederick Seitz, president emeritus of Rockefeller University, who noted Gunkel's "encyclopedic scope" Robert C. Clark, Harvard law professor, who called Gunkel "the most intelligent person I ever met" However, skeptics questioned whether ideonomy constituted a genuine science. 
Fredkin himself noted that Gunkel "pours out about 60 ideas a minute, and 59 of them are bad," though he added that "even with one good idea out of 60, it's still an amazing accomplishment." Douglas Lenat observed that brainstorming with Gunkel was "a bit like being hit over the head by the muse with a sledgehammer" and that "he puts people off." Gunkel himself acknowledged that ideonomy was in its infancy and might seem "absurdly utopian." His planned magnum opus on ideonomy remained incomplete, and was posted on an MIT website thanks to faculty advisor Whitman Richards. Gunkel wrote: "Pioneering in a completely new field, yes in a new science, is almost unreal. It is heartbreaking, it is pitiable, it is almost inhuman. Honestly, it is a hell. There is nothing heroic about it." == Related concepts == Gunkel identified several historical precedents for ideonomic thinking: Gottfried Wilhelm Leibniz (1646–1716): The philosopher's work on a universal characteristic (characteristica universalis) and calculus of reasoning Peter Mark Roget (1779–1869): Creator of Roget's Thesaurus, which organized concepts into a systematic taxonomy Dmitri Mendeleev (1834–1907): Developer of the periodic table, demonstrating how combining lists of element families could reveal previously unseen connections Fritz Zwicky (1898–1974): The Caltech astrophysicist whom Gunkel called the "grandfather of ideonomy" for his development of "morphological research"—systematic exploration of all possible solutions t
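
The ideocombinatorics step described under Methodology, crossing every item of one list with every item of another and phrasing each pair as a question, can be sketched as follows. This is only an illustration and not Gunkel's actual software: the function name `combine_as_questions` is hypothetical, and the two lists hold only the few example items quoted in the text, not the full 230 shapes and 74 types of order.

```python
from itertools import product

# Small excerpts of the two lists named in the text; the full lists
# (230 shapes, 74 types of order) are not reproduced here.
shapes = ["pits", "pyramids", "trenches", "hemispheres", "needles"]
orders = ["recurrence", "identity", "likeness of parts"]


def combine_as_questions(first, second):
    """Pair every item of `first` with every item of `second` as a question."""
    return [f"Can there be {a} of {b}?" for a, b in product(first, second)]


questions = combine_as_questions(shapes, orders)
print(len(questions))  # 15, i.e. 5 shapes times 3 types of order
print(questions[0])    # Can there be pits of recurrence?

# The number of combinations is the product of the list sizes,
# which matches the figure quoted for the full lists.
print(230 * 74)        # 17020
```

Because the output grows multiplicatively with list size, most of the effort falls on a human reader sifting the generated questions for the few that are worth pursuing.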

Statistical learning theory

Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis. Statistical learning theory deals with the statistical inference problem of finding a predictive function based on data. Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, and bioinformatics. == Introduction == The goals of learning are understanding and prediction. Learning falls into many categories, including supervised learning, unsupervised learning, online learning, and reinforcement learning. From the perspective of statistical learning theory, supervised learning is best understood. Supervised learning involves learning from a training set of data. Every point in the training set is an input–output pair, where the input maps to an output. The learning problem consists of inferring the function that maps between the input and the output, such that the learned function can be used to predict the output from future input. Depending on the type of output, supervised learning problems are either problems of regression or problems of classification. If the output takes a continuous range of values, it is a regression problem. Using Ohm's law as an example, a regression could be performed with voltage as input and current as output. The regression would find the functional relationship between voltage and current to be characterized by the resistance $R$, such that $V = IR$. Classification problems are those for which the output will be an element from a discrete set of labels. Classification is very common for machine learning applications. In facial recognition, for instance, a picture of a person's face would be the input, and the output label would be that person's name. The input would be represented by a large multidimensional vector whose elements represent pixels in the picture.
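
The Ohm's law regression mentioned above can be made concrete with a minimal sketch. It assumes a linear model through the origin, current = w × voltage, fitted by least squares, so the fitted slope w estimates 1/R. The function name `fit_through_origin` and the measurement values are illustrative assumptions, not data from any experiment.

```python
def fit_through_origin(inputs, outputs):
    """Least-squares slope w for the model output = w * input."""
    numerator = sum(x * y for x, y in zip(inputs, outputs))
    denominator = sum(x * x for x in inputs)
    return numerator / denominator


# Made-up noisy measurements for a resistor of roughly 10 ohms.
voltages = [1.0, 2.0, 3.0, 4.0]      # volts (input)
currents = [0.11, 0.19, 0.31, 0.39]  # amperes (output)

slope = fit_through_origin(voltages, currents)  # estimates 1 / R
resistance = 1.0 / slope
print(f"estimated R = {resistance:.2f} ohms")
```

With voltage as the input and current as the output, the regression recovers the resistance only indirectly, as the reciprocal of the fitted slope.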
After learning a function based on the training set data, that function is validated on a test set of data, data that did not appear in the training set. == Formal description == Take $X$ to be the vector space of all possible inputs, and $Y$ to be the vector space of all possible outputs. Statistical learning theory takes the perspective that there is some unknown probability distribution over the product space $Z = X \times Y$, i.e. there exists some unknown $p(z) = p(\mathbf{x}, y)$. The training set is made up of $n$ samples from this probability distribution, and is notated $S = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_n, y_n)\} = \{\mathbf{z}_1, \dots, \mathbf{z}_n\}$. Every $\mathbf{x}_i$ is an input vector from the training data, and $y_i$ is the output that corresponds to it. In this formalism, the inference problem consists of finding a function $f : X \to Y$ such that $f(\mathbf{x}) \sim y$. Let $\mathcal{H}$ be a space of functions $f : X \to Y$ called the hypothesis space. The hypothesis space is the space of functions the algorithm will search through. Let $V(f(\mathbf{x}), y)$ be the loss function, a metric for the difference between the predicted value $f(\mathbf{x})$ and the actual value $y$.
The expected risk is defined to be $I[f] = \int_{X \times Y} V(f(\mathbf{x}), y)\, p(\mathbf{x}, y)\, d\mathbf{x}\, dy$. The target function, the best possible function $f$ that can be chosen, is given by the $f$ that satisfies $f = \operatorname{argmin}_{h \in \mathcal{H}} I[h]$. Because the probability distribution $p(\mathbf{x}, y)$ is unknown, a proxy measure for the expected risk must be used. This measure is based on the training set, a sample from this unknown probability distribution. It is called the empirical risk: $I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f(\mathbf{x}_i), y_i)$. A learning algorithm that chooses the function $f_S$ that minimizes the empirical risk is called empirical risk minimization. == Loss functions == The choice of loss function is a determining factor on the function $f_S$ that will be chosen by the learning algorithm. The loss function also affects the convergence rate for an algorithm. It is important for the loss function to be convex. Different loss functions are used depending on whether the problem is one of regression or one of classification. === Regression === The most common loss function for regression is the square loss function (also known as the L2-norm). This familiar loss function is used in Ordinary Least Squares regression. The form is: $V(f(\mathbf{x}), y) = (y - f(\mathbf{x}))^{2}$. The absolute value loss (also known as the L1-norm) is also sometimes used: $V(f(\mathbf{x}), y) = |y - f(\mathbf{x})|$. === Classification === In some sense the 0-1 indicator function is the most natural loss function for classification.
It takes the value 0 if the predicted output is the same as the actual output, and it takes the value 1 if the predicted output is different from the actual output. For binary classification with $Y = \{-1, 1\}$, this is: $V(f(\mathbf{x}), y) = \theta(-y f(\mathbf{x}))$, where $\theta$ is the Heaviside step function. == Regularization == In machine learning, a major problem that arises is overfitting. Because learning is a prediction problem, the goal is not to find a function that most closely fits the (previously observed) data, but to find one that will most accurately predict output from future input. Empirical risk minimization runs this risk of overfitting: finding a function that matches the data exactly but does not predict future output well. Overfitting is symptomatic of unstable solutions; a small perturbation in the training set data would cause a large variation in the learned function. It can be shown that if the stability of the solution can be guaranteed, generalization and consistency are guaranteed as well. Regularization can solve the overfitting problem and give the problem stability. Regularization can be accomplished by restricting the hypothesis space $\mathcal{H}$. A common example would be restricting $\mathcal{H}$ to linear functions: this can be seen as a reduction to the standard problem of linear regression. $\mathcal{H}$ could also be restricted to polynomials of degree $p$, exponentials, or bounded functions on L1. Restriction of the hypothesis space avoids overfitting because the form of the potential functions is limited, and so does not allow for the choice of a function that gives empirical risk arbitrarily close to zero. One example of regularization is Tikhonov regularization.
This consists of minimizing $\frac{1}{n} \sum_{i=1}^{n} V(f(\mathbf{x}_i), y_i) + \gamma \|f\|_{\mathcal{H}}^{2}$, where $\gamma$ is a fixed and positive parameter, the regularization parameter. Tikhonov regularization ensures existence, uniqueness, and stability of the solution. == Bounding empirical risk == Consider a binary classifier $f : \mathcal{X} \to \{0, 1\}$. We can apply Hoeffding's inequality to bound the probability that the empirical risk deviates from the true risk; the resulting tail bound has a sub-Gaussian form: $\mathbb{P}(|\hat{R}(f) - R(f)| \geq \epsilon) \leq 2e^{-2n\epsilon^{2}}$. But generally, when we do empirical risk minimization, we are not given a classifier; we must choose it. Therefore, a more useful result is to bound the probability of the supremum of the difference over the whole class: $\mathbb{P}\bigg(\sup_{f \in \mathcal{F}} |\hat{R}(f) - R(f)| \geq \epsilon\bigg) \leq 2S(\mathcal{F}, n)e^{-n\epsilon^{2}/8} \approx n^{d}e^{-n\epsilon^{2}/8}$, where $S(\mathcal{F}, n)$ is the shattering number and $n$ is the number of samples in the dataset. The exponential term comes from Hoeffding but there is an extra cost of taking the supremum over the whole cla
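
The Hoeffding bound for a single fixed classifier, stated in the section on bounding empirical risk, can be evaluated numerically. The sketch below computes the bound $2e^{-2n\epsilon^{2}}$ and inverts it to find how many samples make the bound at most a chosen failure probability. The function names and the parameter `delta` are assumptions introduced for illustration, and the sketch does not cover the uniform bound over a whole class.

```python
import math


def hoeffding_bound(n, epsilon):
    """Upper bound 2 * exp(-2 * n * epsilon**2) for one fixed classifier."""
    return 2.0 * math.exp(-2.0 * n * epsilon ** 2)


def samples_needed(epsilon, delta):
    """Smallest n for which the Hoeffding bound is at most `delta`.

    Solving 2 * exp(-2 * n * epsilon**2) <= delta for n gives
    n >= ln(2 / delta) / (2 * epsilon**2).
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


n = samples_needed(epsilon=0.05, delta=0.01)
print(n)                         # 1060
print(hoeffding_bound(n, 0.05))  # slightly below 0.01
```

Halving the tolerance `epsilon` quadruples the required sample size, while tightening `delta` costs only a logarithmic factor.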

Matchbox Educable Noughts and Crosses Engine

The Matchbox Educable Noughts and Crosses Engine (sometimes called the Machine Educable Noughts and Crosses Engine or MENACE) was a mechanical computer made from 304 matchboxes designed and built by artificial intelligence researcher Donald Michie and his colleague Roger Chambers, in 1961. It was designed to play human opponents in games of noughts and crosses (tic-tac-toe) by returning a move for any given state of play and to refine its strategy through reinforcement learning. This was one of the first types of artificial intelligence. Michie and Chambers did not have immediate access to a computer; they worked around this by building the engine out of matchboxes. The matchboxes they used each represented a single possible layout of a noughts and crosses grid. When the computer first played, it would randomly choose moves based on the current layout. As it played more games, through a reinforcement loop, it disqualified strategies that led to losing games, and supplemented strategies that led to winning games. Michie held a tournament against MENACE in 1961, wherein he experimented with different openings. Following MENACE's maiden tournament against Michie, it demonstrated successful artificial intelligence in its strategy. Michie's essays on MENACE's weight initialisation and the BOXES algorithm used by MENACE became popular in the field of computer science research. Michie was honoured for his contribution to machine learning research, and was twice commissioned to program a MENACE simulation on an actual computer. == Origin == Donald Michie (1923–2007) had been on the team decrypting the German Tunny Code during World War II. Fifteen years later, he wanted to further display his mathematical and computational prowess with an early convolutional neural network. 
Since computer equipment was not obtainable for such uses, and Michie did not have a computer readily available, he decided to display and demonstrate artificial intelligence in a more esoteric format and constructed a functional mechanical computer out of matchboxes and beads. MENACE was constructed as the result of a bet with a computer science colleague who postulated that such a machine was impossible. Michie undertook the task of collecting and defining each matchbox as a "fun project", later turned into a demonstration tool. Michie completed his essay on MENACE, "Experiments on the mechanization of game-learning", in 1963, as well as his essay on the BOXES algorithm, written with R. A. Chambers, and had built up an AI research unit in Hope Park Square, Edinburgh, Scotland. MENACE learned by playing successive matches of noughts and crosses. After each lost game, the human player eliminated a losing strategy by confiscating the beads that corresponded to each move MENACE had made. Winning strategies were reinforced by supplying extra beads, which made those moves more likely. This was one of the earliest versions of the reinforcement loop, an algorithm that plays repeatedly and drops unsuccessful strategies until only the winning ones remain. The model starts out completely random and gradually learns. == Composition == MENACE was made from 304 matchboxes glued together in an arrangement similar to a chest of drawers. Each box had a code number, which was keyed into a chart. This chart had drawings of tic-tac-toe game grids with various configurations of X, O, and empty squares, corresponding to all possible permutations a game could go through as it progressed. After removing duplicate arrangements (ones that were simply rotations or mirror images of other configurations), MENACE used 304 permutations in its chart and thus that many matchboxes. Each individual matchbox tray contained a collection of coloured beads.
Each colour represented a move on a square on the game grid, and so matchboxes with arrangements where positions on the grid were already taken would not have beads for that position. Additionally, at the front of the tray were two extra pieces of card in a "V" shape, the point of the "V" pointing at the front of the matchbox. Michie and his artificial intelligence team called MENACE's algorithm "Boxes", after the apparatus used for the machine. The first stage "Boxes" operated in five phases, each setting a definition and a precedent for the rules of the algorithm in relation to the game. == Operation == MENACE played first, as O, since all matchboxes represented permutations only relevant to the "X" player. To retrieve MENACE's choice of move, the opponent or operator located the matchbox that matched the current game state, or a rotation or mirror image of it. For example, at the start of a game, this would be the matchbox for an empty grid. The tray would be removed and lightly shaken so as to move the beads around. Then, the bead that had rolled into the point of the "V" shape at the front of the tray was the move MENACE had chosen to make. Its colour was then used as the position to play on, and, after accounting for any rotations or flips needed based on the chosen matchbox configuration's relation to the current grid, the O would be placed on that square. Then the player performed their move, the new state was located, a new move selected, and so on, until the game was finished. When the game had finished, the human player observed the game's outcome. As a game was played, each matchbox that was used for MENACE's turn had its tray returned to it ajar, and the bead used kept aside, so that MENACE's choice of moves and the game states they belonged to were recorded. Michie described his reinforcement system with "reward" and "punishment". Once the game was finished, if MENACE had won, it would then receive a "reward" for its victory. 
The removed beads showed the sequence of the winning moves. These were returned to their respective trays, easily identifiable since they were slightly open, as well as three bonus beads of the same colour. In this way, in future games MENACE would become more likely to repeat those winning moves, reinforcing winning strategies. If it lost, the removed beads were not returned, "punishing" MENACE, and meaning that in future it would be less likely, and eventually incapable if that colour of bead became absent, to repeat the moves that cause a loss. If the game was a draw, one additional bead was added to each box. == Results in practice == === Optimal strategy === Noughts and crosses has a well-known optimal strategy. A player must place their symbol in a way that blocks the other player from achieving any rows while simultaneously making a row themself. However, if both players use this strategy, the game always ends in a draw. If the human player is familiar with the optimal strategy, and MENACE can quickly learn it, then the games will eventually only end in draws. The likelihood of the computer winning increases quickly when the computer plays against a random-playing opponent. When playing against a player using optimal strategy, the odds of a draw grow to 100%. In Donald Michie's official tournament against MENACE in 1961 he used optimal strategy, and he and the computer began to draw consistently after twenty games. Michie's tournament had the following milestones: Michie began by consistently opening with "Variant 0", the middle square. At 15 games, MENACE abandoned all non-corner openings. At just over 20, Michie switched to consistently using "Variant 1", the bottom-right square. At 60, he returned to Variant 0. As he neared 80 games, he moved to "Variant 2", the top-middle. At 110, he switched to "Variant 3", the top right. At 135, he switched to "Variant 4", middle-right. At 190, he returned to Variant 1, and at 210, he returned to Variant 0. 
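The reward and punishment scheme described under Operation can be sketched in a few lines of Python. This is a minimal illustration, not Michie's actual procedure: the names `choose_move` and `reinforce`, the dictionary representation of a tray and the starting bead counts are assumptions. Only the rule itself comes from the description above: three bonus beads after a win, one after a draw, and the drawn bead confiscated after a loss.

```python
import random

# A tray maps a move (board square 0-8) to its number of beads.
# The starting counts used below are illustrative, not MENACE's real values.

def choose_move(tray, rng=random):
    """Draw one bead at random; squares with more beads are more likely."""
    moves = [m for m, n in tray.items() if n > 0]
    if not moves:
        raise ValueError("tray is empty: no move can be drawn")
    weights = [tray[m] for m in moves]
    move = rng.choices(moves, weights=weights, k=1)[0]
    tray[move] -= 1          # the drawn bead is kept aside during the game
    return move

def reinforce(history, outcome):
    """Apply reward or punishment to every (tray, move) pair used in a game."""
    bonus = {"win": 3, "draw": 1, "loss": 0}[outcome]
    for tray, move in history:
        if outcome != "loss":
            tray[move] += 1 + bonus   # return the bead and add bonus beads
        # after a loss the drawn bead simply stays confiscated

tray = {0: 4, 4: 4, 8: 4}
move = choose_move(tray, random.Random(0))
reinforce([(tray, move)], "win")
print(move, tray)   # the chosen square now holds 7 beads, the others still 4
```

Because a bead's colour is drawn in proportion to its count, adding beads raises the probability of a move and removing them lowers it, which is the whole learning mechanism.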
[Table omitted: trend in changes of beads in the "2" boxes.] === Correlation === Depending on the strategy employed by the human player, MENACE produces a different trend on scatter graphs of wins. Using a random turn from the human player results in an almost-perfect positive trend. Playing the optimal strategy returns a slightly slower increase. The reinforcement does not create a perfect standard of wins; the algorithm will draw random uncertain conclusions each time. After the j-th round, the correlation of near-perfect play runs:
$$\frac{1-D}{D-D^{(j+2)}}\sum_{i=0}^{j}D^{(j-i+1)}V_{i}$$
where $V_i$ is the outcome of round $i$ ($+1$ is a win, $0$ is a draw and $-1$ is a loss) and $D$ is the decay factor (average of past values of wins and losses). The prefactor normalises the weights, since $\sum_{i=0}^{j}D^{(j-i+1)} = (D-D^{(j+2)})/(1-D)$. $M_n$ denotes the multiplier for the n-th round of the game. [Table omitted.] == Legacy == Donald Michie's MENACE proved that a computer could learn from failure and success to become good at a task. It used what would become core principles within the field of machine learning before they had been properly theorised. For example, the combination of how MENACE starts with equal numbers of types of beads in each matchbox, and how these are then selected at random, creates a learning behaviour similar to weight initialisation
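As a worked illustration of the formula in the Correlation section, the sketch below computes the decay-weighted average of game outcomes. The function name `decayed_average` and the sample values are hypothetical, and the exponent is assumed to be j − i + 1, the choice for which the weights sum to the inverse of the prefactor.

```python
def decayed_average(outcomes, decay):
    """Weighted mean of outcomes V_0..V_j with weights decay**(j - i + 1).

    outcomes: list of +1 (win), 0 (draw) or -1 (loss), oldest first.
    decay:    the factor D, assumed here to satisfy 0 < D < 1.
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    if not 0 < decay < 1:
        raise ValueError("decay must lie strictly between 0 and 1")
    j = len(outcomes) - 1
    weighted_sum = sum(decay ** (j - i + 1) * v for i, v in enumerate(outcomes))
    normaliser = (1 - decay) / (decay - decay ** (j + 2))
    return normaliser * weighted_sum

print(decayed_average([1, 1, 1], 0.5))   # an unbroken run of wins averages to 1
print(decayed_average([-1, 1], 0.5))     # a loss then a win: the recent win dominates
```

With D = 0.5 and outcomes (−1, +1), the weights are 0.25 and 0.5, so the result is 0.25 / 0.75 = 1/3: the most recent round counts for more than the earlier one.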

Quantum machine learning

Quantum machine learning (QML) is the study of quantum algorithms for machine learning. It often refers to quantum algorithms for machine learning tasks that analyze classical data, sometimes called quantum-enhanced machine learning. QML algorithms use qubits and quantum operations to try to improve the space and time complexity of classical machine learning algorithms. Hybrid QML methods involve both classical and quantum processing, where computationally difficult subroutines are outsourced to a quantum device. These routines can be more complex in nature and executed faster on a quantum computer. Furthermore, quantum algorithms can be used to analyze quantum states instead of classical data. The term "quantum machine learning" is sometimes used to refer to classical machine learning methods applied to data generated from quantum experiments (i.e. machine learning of quantum systems), such as learning the phase transitions of a quantum system or creating new quantum experiments. QML also extends to a branch of research that explores methodological and structural similarities between certain physical systems and learning systems, in particular neural networks. For example, some mathematical and numerical techniques from quantum physics are applicable to classical deep learning and vice versa. Furthermore, researchers investigate more abstract notions of learning theory with respect to quantum information, sometimes referred to as "quantum learning theory". == Machine learning with quantum computers == Quantum-enhanced machine learning refers to quantum algorithms that solve tasks in machine learning, thereby improving and often expediting classical machine learning techniques. Such algorithms typically require one to encode the given classical data set into a quantum computer to make it accessible for quantum information processing. 
Subsequently, quantum information processing routines are applied and the result of the quantum computation is read out by measuring the quantum system. For example, the outcome of the measurement of a qubit reveals the result of a binary classification task. While many proposals of QML algorithms are still purely theoretical and require a full-scale universal quantum computer to be tested, others have been implemented on small-scale or special purpose quantum devices. === Quantum associative memories and quantum pattern recognition === Early work on quantum associative memories was done by Dan Ventura and Tony Martinez and by Carlo A. Trugenberger in the late 1990s and early 2000s. Associative (or content-addressable) memories are able to recognize stored content on the basis of a similarity measure, while random access memories are accessed by the address of stored information and not its content. As such they must be able to retrieve both incomplete and corrupted patterns, the essential machine learning task of pattern recognition. Typical classical associative memories store p patterns in the $O(n^{2})$ interactions (synapses) of a real, symmetric energy matrix over a network of n artificial neurons. The encoding is such that the desired patterns are local minima of the energy functional and retrieval is done by minimizing the total energy, starting from an initial configuration. Unfortunately, classical associative memories are severely limited by the phenomenon of cross-talk. When too many patterns are stored, spurious memories appear which quickly proliferate, so that the energy landscape becomes disordered and retrieval is no longer possible. The number of storable patterns is typically limited by a linear function of the number of neurons, $p \leq O(n)$. Quantum associative memories (in their simplest realization) store patterns in a unitary matrix U acting on the Hilbert space of n qubits. 
Retrieval is realized by the unitary evolution of a fixed initial state to a quantum superposition of the desired patterns, with a probability distribution peaked on the pattern most similar to the input. By its very quantum nature, the retrieval process is thus probabilistic. Because quantum associative memories are free from cross-talk, however, spurious memories are never generated. Correspondingly, they have a higher capacity than classical ones. The number of parameters in the unitary matrix U is $O(pn)$. One can thus have efficient, spurious-memory-free quantum associative memories for any polynomial number of patterns. If the matrix U is encoded as a unique operator (as opposed to a sequence of gates as in the circuit model), e.g. by an optical interferometer, the retrieval becomes efficient even for an exponential number of patterns. === Linear algebra simulation with quantum amplitudes === A number of quantum algorithms for machine learning are based on the idea of amplitude encoding, that is, to associate the amplitudes of a quantum state with the inputs and outputs of computations. Since a state of $n$ qubits is described by $2^{n}$ complex amplitudes, this information encoding can allow for an exponentially compact representation. Intuitively, this corresponds to associating a discrete probability distribution over binary random variables with a classical vector. The goal of algorithms based on amplitude encoding is to formulate quantum algorithms whose resources grow polynomially in the number of qubits $n$, which amounts to a logarithmic time complexity in the number of amplitudes and thereby the dimension of the input. 
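To make the counting argument concrete, the following sketch pads and normalises a classical vector so that it could serve as the amplitudes of an n-qubit state, and reports how many qubits that takes. It is a purely classical illustration of the bookkeeping, not a state-preparation routine, and the function name `amplitude_encode` is hypothetical.

```python
import math

def amplitude_encode(vector):
    """Return (n_qubits, amplitudes) for a real, non-zero input vector.

    The vector is zero-padded to the next power of two and scaled to unit
    Euclidean norm, so the squared amplitudes form a probability distribution.
    """
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot encode an empty or all-zero vector")
    # Smallest n with 2**n >= len(vector), using exact integer arithmetic.
    n_qubits = max(1, (len(vector) - 1).bit_length())
    padded = list(vector) + [0.0] * (2 ** n_qubits - len(vector))
    return n_qubits, [x / norm for x in padded]

n, amps = amplitude_encode([3.0, 4.0, 0.0, 0.0, 0.0])
print(n, len(amps))                # 3 qubits hold 8 amplitudes
print(sum(a * a for a in amps))    # squared amplitudes sum to 1, up to rounding
```

A vector with about a million entries would need only 20 qubits under this scheme, which is the exponential compactness referred to above. The sketch says nothing about how costly it is to prepare such a state on real hardware.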
Many QML algorithms in this category are based on variations of the quantum algorithm for linear systems of equations (colloquially called HHL, after the paper's authors) which, under specific conditions, performs a matrix inversion using an amount of physical resources growing only logarithmically in the dimensions of the matrix. One of these conditions is that a Hamiltonian which entry-wise corresponds to the matrix can be simulated efficiently, which is known to be possible if the matrix is sparse or low rank. For reference, any known classical algorithm for matrix inversion requires a number of operations that grows more than quadratically in the dimension of the matrix (e.g. $O(n^{2.373})$), but they are not restricted to sparse matrices. Quantum matrix inversion can be applied to machine learning methods in which the training reduces to solving a linear system of equations, for example in least-squares linear regression, the least-squares version of support vector machines, and Gaussian processes. A crucial bottleneck of methods that simulate linear algebra computations with the amplitudes of quantum states is state preparation, which often requires one to initialise a quantum system in a state whose amplitudes reflect the features of the entire dataset. Although efficient methods for state preparation are known for specific cases, this step easily hides the complexity of the task. === Variational quantum algorithms (VQAs) === In a variational quantum algorithm, a classical computer optimizes the parameters used to prepare a quantum state, while a quantum computer is used to do the actual state preparation and measurement. VQAs are considered promising candidates for noisy intermediate-scale quantum computers. Variational quantum circuits (or parameterized quantum circuits) are a popular class of VQAs where the parameters are those used in a fixed quantum circuit. 
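The division of labour in a variational quantum algorithm can be illustrated with a toy example in which the quantum device is replaced by a classical simulation of a single qubit. The circuit (one RY rotation applied to |0⟩), the cost (the expectation value of Pauli Z), the learning rate and all function names are assumptions chosen for illustration. The gradient is obtained with the parameter-shift rule, a standard technique that is not itself part of the description above.

```python
import math

def expectation_z(theta):
    """Simulated quantum step: prepare RY(theta)|0> and return <Z>."""
    amp0 = math.cos(theta / 2)   # amplitude of |0>
    amp1 = math.sin(theta / 2)   # amplitude of |1>
    return amp0 ** 2 - amp1 ** 2

def parameter_shift_gradient(theta):
    """Gradient of <Z> with respect to theta from two shifted evaluations."""
    shift = math.pi / 2
    return 0.5 * (expectation_z(theta + shift) - expectation_z(theta - shift))

def minimise(theta=0.3, learning_rate=0.4, steps=100):
    """Classical step: gradient descent on the circuit parameter."""
    for _ in range(steps):
        theta -= learning_rate * parameter_shift_gradient(theta)
    return theta, expectation_z(theta)

theta, energy = minimise()
print(theta, energy)   # theta approaches pi, where <Z> reaches its minimum of -1
```

On real hardware `expectation_z` would be estimated from repeated measurements, so the optimizer would see noisy values; the simulation here is exact.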
Researchers have studied variational quantum circuits (VQCs) as a way to solve optimization problems and to find the ground-state energy of complex quantum systems, tasks that are difficult to solve using a classical computer. === Quantum binary classifier === Pattern recognition is one of the important tasks of machine learning, and binary classification is one of the tools used to find patterns. Binary classification is used in supervised learning and in unsupervised learning. In QML, classical bits are converted to qubits and mapped to a Hilbert space; complex-valued data are used in a quantum binary classifier to take advantage of the Hilbert space. By exploiting quantum mechanical properties such as superposition, entanglement and interference, the quantum binary classifier aims to produce an accurate result in a short period of time. === Quantum machine learning algorithms based on Grover search === Another approach to improving classical machine learning with quantum information processing uses amplitude amplification methods based on Grover's search algorithm, which has been shown to solve unstructured search problems with a quadratic speedup compared to classical algorithms. These quantum routines can be employed for learning algorithms that translate into an unstructured search task, as can be done, for instance, in the case of the k-medians and the k-nearest neighbors algorithms. Other applications include quadratic speedups in the training of perceptrons.
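The effect of amplitude amplification in Grover's search can be seen in a small state-vector simulation. The sketch below is a classical emulation with real amplitudes, so it offers no speedup itself; the function name `grover_search` and the restriction to a single marked item are assumptions made for brevity.

```python
import math

def grover_search(n_items, marked, iterations=None):
    """Return the probability of measuring `marked` after Grover iterations."""
    if iterations is None:
        # About (pi / 4) * sqrt(N) iterations maximise the success probability.
        iterations = math.floor(math.pi / 4 * math.sqrt(n_items))
    amps = [1 / math.sqrt(n_items)] * n_items   # uniform superposition
    for _ in range(iterations):
        amps[marked] = -amps[marked]            # oracle: flip the marked sign
        mean = sum(amps) / n_items
        amps = [2 * mean - a for a in amps]     # diffusion: invert about the mean
    return amps[marked] ** 2

print(grover_search(64, marked=5))                 # close to 1 after 6 iterations
print(grover_search(64, marked=5, iterations=0))   # 1/64, a blind guess
```

For 64 items the default is six iterations, against an average of roughly 32 classical queries, which illustrates the square-root scaling mentioned above.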

Case-based reasoning

Case-based reasoning (CBR), broadly construed, is the process of solving new problems based on the solutions of similar past problems. In everyday life, an auto mechanic who fixes an engine by recalling another car that exhibited similar symptoms is using case-based reasoning. A lawyer who advocates a particular outcome in a trial based on legal precedents or a judge who creates case law is using case-based reasoning. So, too, an engineer copying working elements of nature (practicing biomimicry) is treating nature as a database of solutions to problems. Case-based reasoning is a prominent kind of analogical problem solving. It has been argued that case-based reasoning is not only a powerful method for computer reasoning, but also a pervasive behavior in everyday human problem solving; or, more radically, that all reasoning is based on past cases personally experienced. This view is related to prototype theory, which is most deeply explored in cognitive science. == Process == Case-based reasoning has been formalized for purposes of computer reasoning as a four-step process: Retrieve: Given a target problem, retrieve cases relevant to solving it from memory. A case consists of a problem, its solution, and, typically, annotations about how the solution was derived. For example, suppose Fred wants to prepare blueberry pancakes. Because he is a novice cook, the most relevant experience he can recall is one in which he successfully made plain pancakes. The procedure he followed for making the plain pancakes, together with justifications for decisions made along the way, constitutes Fred's retrieved case. Reuse: Map the solution from the previous case to the target problem. This may involve adapting the solution as needed to fit the new situation. In the pancake example, Fred must adapt his retrieved solution to include the addition of blueberries. 
Revise: Having mapped the previous solution to the target situation, test the new solution in the real world (or a simulation) and, if necessary, revise. Suppose Fred adapted his pancake solution by adding blueberries to the batter. After mixing, he discovers that the batter has turned blue – an undesired effect. This suggests the following revision: delay the addition of blueberries until after the batter has been ladled into the pan. Retain: After the solution has been successfully adapted to the target problem, store the resulting experience as a new case in memory. Fred, accordingly, records his new-found procedure for making blueberry pancakes, thereby enriching his set of stored experiences, and better preparing him for future pancake-making demands. == Comparison to other methods == At first glance, CBR may seem similar to the rule induction algorithms of machine learning. Like a rule-induction algorithm, CBR starts with a set of cases or training examples; it forms generalizations of these examples, albeit implicit ones, by identifying commonalities between a retrieved case and the target problem. If for instance a procedure for plain pancakes is mapped to blueberry pancakes, a decision is made to use the same basic batter and frying method, thus implicitly generalizing the set of situations under which the batter and frying method can be used. The key difference, however, between the implicit generalization in CBR and the generalization in rule induction lies in when the generalization is made. A rule-induction algorithm draws its generalizations from a set of training examples before the target problem is even known; that is, it performs eager generalization. For instance, if a rule-induction algorithm were given recipes for plain pancakes, Dutch apple pancakes, and banana pancakes as its training examples, it would have to derive, at training time, a set of general rules for making all types of pancakes. 
It would not be until testing time that it would be given, say, the task of cooking blueberry pancakes. The difficulty for the rule-induction algorithm is in anticipating the different directions in which it should attempt to generalize its training examples. This is in contrast to CBR, which delays (implicit) generalization of its cases until testing time, a strategy of lazy generalization. In the pancake example, CBR has already been given the target problem of cooking blueberry pancakes; thus it can generalize its cases exactly as needed to cover this situation. CBR therefore tends to be a good approach for rich, complex domains in which there are myriad ways to generalize a case. In law, there is often explicit delegation of CBR to courts, recognizing the limits of rule-based reasoning: limiting delay, limited knowledge of future context, limits of negotiated agreement, etc. While CBR in law and cognitively inspired CBR have long been associated, the former is more clearly an interpolation of rule-based reasoning and judgment, while the latter is more closely tied to recall and process adaptation. The difference is clear in their attitude toward error and appellate review. Another name for case-based reasoning in problem solving is symptomatic strategies. It requires a priori domain knowledge, gleaned from past experience, that has established connections between symptoms and causes. This knowledge is referred to as shallow, compiled, evidential, history-based as well as case-based knowledge. This is the strategy most associated with diagnosis by experts. Diagnosis of a problem transpires as a rapid recognition process in which symptoms evoke appropriate situation categories. An expert knows the cause by virtue of having previously encountered similar cases. Case-based reasoning is the most powerful strategy and the one most commonly used. 
However, the strategy does not work on its own for truly novel problems, or where a deeper understanding of what is taking place is sought. An alternative approach to problem solving is the topographic strategy, which falls into the category of deep reasoning. With deep reasoning, in-depth knowledge of a system is used. Topography in this context means a description or an analysis of a structured entity, showing the relations among its elements. Also known as reasoning from first principles, deep reasoning is applied to novel faults when experience-based approaches aren't viable. The topographic strategy is therefore linked to a priori domain knowledge that is developed from a more fundamental understanding of a system, possibly using first-principles knowledge. Such knowledge is referred to as deep, causal or model-based knowledge. Hoc and Carlier noted that symptomatic approaches may need to be supported by topographic approaches because symptoms can be defined in diverse terms. The converse is also true: shallow reasoning can be used abductively to generate causal hypotheses, and deductively to evaluate those hypotheses, in a topographical search. == Criticism == Critics of CBR argue that it is an approach that accepts anecdotal evidence as its main operating principle. Without statistically relevant data for backing and implicit generalization, there is no guarantee that the generalization is correct. However, all inductive reasoning where data is too scarce for statistical relevance is inherently based on anecdotal evidence. == History == CBR traces its roots to the work of Roger Schank and his students at Yale University in the early 1980s. Schank's model of dynamic memory was the basis for the earliest CBR systems: Janet Kolodner's CYRUS and Michael Lebowitz's IPP. 
Other schools of CBR and closely allied fields emerged in the 1980s, directed at topics such as legal reasoning, memory-based reasoning (a way of reasoning from examples on massively parallel machines), and combinations of CBR with other reasoning methods. In the 1990s, interest in CBR grew internationally, as evidenced by the establishment of an International Conference on Case-Based Reasoning in 1995, as well as European, German, British, Italian, and other CBR workshops. CBR technology has resulted in the deployment of a number of successful systems, the earliest being Lockheed's CLAVIER, a system for laying out composite parts to be baked in an industrial convection oven. CBR has been used extensively in applications such as the Compaq SMART system and has found a major application area in the health sciences, as well as in structural safety management. There is recent work that develops CBR within a statistical framework and formalizes case-based inference as a specific type of probabilistic inference. Thus, it becomes possible to produce case-based predictions equipped with a certain level of confidence. One description of the difference between CBR and induction from instances is that statistical inference aims to find what tends to make cases similar while CBR aims to encode what suffices to claim similarity.
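The retrieve, reuse, revise and retain cycle described under Process can be sketched as a small program built around the pancake example. Everything here is illustrative: the `Case` class, the feature-set similarity (Jaccard overlap) and the `adapt` and `evaluate` callbacks are assumptions, not part of any particular CBR system.

```python
from dataclasses import dataclass

@dataclass
class Case:
    problem: frozenset   # features describing the problem
    solution: list       # ordered steps that solved it

def similarity(a, b):
    """Jaccard overlap between two feature sets (1.0 means identical)."""
    return len(a & b) / len(a | b) if a | b else 1.0

def solve(case_base, problem, adapt, evaluate):
    # Retrieve: pick the stored case most similar to the target problem.
    nearest = max(case_base, key=lambda c: similarity(c.problem, problem))
    # Reuse: map the old solution onto the new problem.
    candidate = adapt(nearest, problem)
    # Revise: test the candidate and repair it if the test suggests a fix.
    candidate = evaluate(problem, candidate)
    # Retain: store the new experience for future problems.
    case_base.append(Case(problem, candidate))
    return candidate

def adapt(case, problem):
    # Naive reuse: add each extra ingredient to the batter at the start.
    extra = sorted(problem - case.problem)
    return [f"add {e} to batter" for e in extra] + case.solution

def evaluate(problem, steps):
    # Revision from the example: blueberries go in after ladling.
    fixed = [s for s in steps if s != "add blueberries to batter"]
    if len(fixed) != len(steps):
        fixed.insert(fixed.index("ladle into pan") + 1, "add blueberries")
    return fixed

cases = [Case(frozenset({"pancake"}), ["mix batter", "ladle into pan", "fry"])]
recipe = solve(cases, frozenset({"pancake", "blueberries"}), adapt, evaluate)
print(recipe)       # ['mix batter', 'ladle into pan', 'add blueberries', 'fry']
print(len(cases))   # 2: the new recipe has been retained
```

Generalization happens only inside `solve`, once the target problem is known, which is the lazy strategy contrasted with rule induction above.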