Clanker

Clanker

Clanker is a derogatory term for robots and artificial intelligence (AI) software. The term has been used in Star Wars media, first appearing in the franchise's 2005 video game Star Wars: Republic Commando. In 2025, the term became widely used to express hatred or distaste for machines ranging from delivery robots to large language models. This trend has been attributed to anxiety around the negative societal effects of AI. == In science fiction == The term has been previously used in science fiction literature, first appearing in a 1958 article by William Tenn in which he uses it to describe robots from science fiction films like Metropolis. The Star Wars franchise began using the term as a slur against droids in the 2005 video game Star Wars: Republic Commando before being prominently used in the animated series Star Wars: The Clone Wars, which follows a galaxy-wide war between the Galactic Republic's clone troopers and the Confederacy of Independent Systems' battle droids. In Star Wars media, robots—more commonly known as droids—are routinely depicted as the subjects of discrimination. For example, in the original Star Wars film, C-3PO and R2-D2 are abducted by Jawas and sold to the family of Luke Skywalker. When visiting a cantina in Mos Eisley, both droids are refused service by the bartender, who remarks that "We don't serve their kind." In Star Wars lore, the term clanker had entered use by the time of the franchise's High Republic Era and became prominent during the Clone Wars, in which clone troopers regularly use the phrase against battle droids. == AI backlash == The growing popularity of the term clanker reflects an increase in direct contact between people and AI systems. On sidewalks, delivery robots impede mobility and cause safety issues. In digital spaces, cybersecurity experts have raised concerns about the rising number of bots online, which now make up a large portion of internet traffic. A 2025 report estimated that about one in five social media accounts are automated. The term is also a reaction to AI advocacy from industrialists like Elon Musk and Sam Altman, who have championed the integration of AI into nearly every aspect of modern life. This includes efforts by major companies and startups alike, such as Amazon's development of humanoid robots to replace human workers in service industries. Such initiatives have further fueled public skepticism, reinforcing the association of clanker with unease over automation and the displacement of human roles. A global survey conducted by the research firm Gartner in December 2023 found that 64% of customers would prefer companies to avoid using AI in customer service, with another 53% stating they would consider switching to a different company if they discovered AI was handling their service interactions. Another report by Ernst & Young, published in July 2025, found that 42% of employees across Europe are worried that the use of AI in the workplace may threaten their employment. Criticism has also been directed at the technology itself. Some of the backlash stems from concerns about the resource consumption of AI systems, their frequent reliance on copyrighted material without consent, and questions about the intentions of the corporations behind them. There are also concerns about the potential cognitive effects of relying heavily on AI. A study, authored by researchers at Microsoft and Carnegie Mellon University, warns that regular dependence on AI may leave users mentally unprepared for real-world problem solving, likening the effect to cognitive atrophy. In June 2025, United States Senator Ruben Gallego tweeted that his "new bill makes sure you don't have to talk to a clanker if you don't want to", referring to proposed legislation that would require call centers to disclose their use of automated customer service agents to callers in the United States and offer the option to switch to a human representative. == Analysis == Linguist Adam Aleksic has described clanker as an evolution of racial slurs that anthropomorphize robotic systems. Internet memes incorporating the term often reference historical discrimination against marginalized groups such as African Americans. Based on the work of linguist Geoffrey Nunberg, American news website Axios has argued that clanker is merely a derogatory word, rather than a slur, because it does not perpetuate social inequities. NPR has noted the irony that the word robot was coined by Karel Čapek for his 1920 science-fiction play R.U.R. as a similar criticism of industrialization forcing workers to become devoid of their humanity. Aleksic has observed that robot can be further traced to the Proto-Slavic noun orbъ, which means 'slave'. While other science fiction media include pejoratives for androids and robots, such as skinjob and toaster from the Blade Runner and Battlestar Galactica franchises, respectively, clanker is believed to have gained popularity because its usage is intuitive and flexible. Whereas AI slop describes low-quality output from artificial intelligence, clanker belittles the underlying computer systems.

NER model

NER is one of several formulas for accessing live subtitles in television broadcasts and events that are produced using speech recognition. The three letters stand for number, edit error and recognition error. It has been promoted as an alternative to Word error rate (Word Error Rate) which is a more objective measure. The overall score is calculated as follows: Firstly, the number of edit and recognition errors is deducted from the total number of words in the live subtitles. This number is then divided by the total number of words in the live subtitles and finally multiplied by one hundred. N E R v a l u e = N − E − R N ∗ 100 {\displaystyle NERvalue={\frac {N-E-R}{N}}100} . The acronyms stand for the following: N (number) = total number of words in the live subtitles E (Edit error) = edit error R (Recognition error) = recognition error This measurement process has been used for public television broadcasts in European countries like Italy and Switzerland. One major drawback with NER is that it requires a human assessor to rate errors as either: 1 Minor edition or recognition errors 2 Normal edition or recognition errors 3 Serious errors which are then weighted in the assessment process. This is both subjective, time consuming and costly. Also, NER fails to account for words left out subtitles which is something that does not take account of the D/deaf audience who want verbatim subtitles. As a result, NER cannot accurately reflect the audience's experience of subtitles. Another problem is the inconsistency of human evaluation of subtitles, particularly with live subtitles, where there are differing opinions of the importance of subtitle errors. By way of contrast, Word error rate is an objective measure of subtitle errors, since it measures the textual discrepancy between the subtitles and the speech.

Fitness approximation

Fitness approximation aims to approximate the objective or fitness functions in evolutionary optimization by building up machine learning models based on data collected from numerical simulations or physical experiments. The machine learning models for fitness approximation are also known as meta-models or surrogates, and evolutionary optimization based on approximated fitness evaluations are also known as surrogate-assisted evolutionary approximation. Fitness approximation in evolutionary optimization can be seen as a sub-area of data-driven evolutionary optimization. == Approximate models in function optimization == === Motivation === In many real-world optimization problems including engineering problems, the number of fitness function evaluations needed to obtain a good solution dominates the optimization cost. In order to obtain efficient optimization algorithms, it is crucial to use prior information gained during the optimization process. Conceptually, a natural approach to utilizing the known prior information is building a model of the fitness function to assist in the selection of candidate solutions for evaluation. A variety of techniques for constructing such a model, often also referred to as surrogates, metamodels or approximation models – for computationally expensive optimization problems have been considered. === Approaches === Common approaches to constructing approximate models based on learning and interpolation from known fitness values of a small population include: Low-degree polynomials and regression models Fourier surrogate modeling Artificial neural networks including Multilayer perceptrons Radial basis function network Support vector machines Due to the limited number of training samples and high dimensionality encountered in engineering design optimization, constructing a globally valid approximate model remains difficult. As a result, evolutionary algorithms using such approximate fitness functions may converge to local optima. Therefore, it can be beneficial to selectively use the original fitness function together with the approximate model.

Vapnik–Chervonenkis dimension

In Vapnik–Chervonenkis theory, the Vapnik–Chervonenkis (VC) dimension is a measure of the size (capacity, complexity, expressive power, richness, or flexibility) of a class of sets. The notion can be extended to classes of binary functions. It is defined as the cardinality of the largest set of points that the function class can shatter—that is, for which all possible binary labelings can be realized by some function in the class. It was originally defined by Vladimir Vapnik and Alexey Chervonenkis. Informally, the capacity of a classification model is related to how complicated it can be. For example, consider the thresholding of a high-degree polynomial: if the polynomial evaluates above zero, that point is classified as positive, otherwise as negative. A high-degree polynomial can be wiggly, so that it can fit a given set of training points well. Such a polynomial has a high capacity. A much simpler alternative is to threshold a linear function. This function may not fit the training set well, because it has a low capacity. This notion of capacity is made rigorous below. == Definitions == === VC dimension of a set-family === Let C = { C } C ∈ C {\displaystyle {\mathcal {C}}=\{C\}_{C\in {\mathcal {C}}}} be a family of sets (also called set family, collection of sets or set of sets) and X {\displaystyle X} a set. Their intersection is defined as the following set family: C ∩ X := { C ∩ X ∣ C ∈ C } . {\displaystyle {\mathcal {C}}\cap X:=\{C\cap X\mid C\in {\mathcal {C}}\}.} Here typically X {\displaystyle X} and each C ∈ C {\displaystyle C\in {\mathcal {C}}} are subsets of a big "universe" of possibilities U {\displaystyle U} where intersection takes place. We say that a set X {\displaystyle X} is shattered by C {\displaystyle {\mathcal {C}}} if P ( X ) = C ∩ X {\displaystyle {\mathcal {P}}(X)={\mathcal {C}}\cap X} i.e. the set of intersections contains (hence is equal to) all the subsets of X {\displaystyle X} . For finite sets X {\displaystyle X} this is equivalent to | C ∩ X | = 2 | X | . {\displaystyle |{\mathcal {C}}\cap X|=2^{|X|}.} The VC dimension D {\displaystyle D} of C {\displaystyle {\mathcal {C}}} is the cardinality of the largest set that is shattered by C {\displaystyle {\mathcal {C}}} . If arbitrarily large sets can be shattered, the VC dimension of C {\displaystyle {\mathcal {C}}} is ∞ {\displaystyle \infty } . === VC dimension of a classification model === A binary classification model f {\displaystyle f} with some parameter vector θ {\displaystyle \theta } is said to shatter a set of generally positioned data points ( x 1 , x 2 , … , x n ) {\displaystyle (x_{1},x_{2},\ldots ,x_{n})} if, for every assignment of labels to those points, there exists a θ {\displaystyle \theta } such that the model f {\displaystyle f} makes no errors when evaluating that set of data points. The VC dimension of a model f {\displaystyle f} is the maximum number of points that can be arranged so that f {\displaystyle f} shatters them. More formally, it is the maximum cardinal D {\displaystyle D} such that there exists a generally positioned data point set of cardinality D {\displaystyle D} that can be shattered by f {\displaystyle f} . == Examples == f {\displaystyle f} is a constant classifier (with no parameters); Its VC dimension is 0 since it cannot shatter even a single point. In general, the VC dimension of a finite classification model, which can return at most 2 d {\displaystyle 2^{d}} different classifiers, is at most d {\displaystyle d} (this is an upper bound on the VC dimension; the Sauer–Shelah lemma gives a lower bound on the dimension). f {\displaystyle f} is a single-parametric threshold classifier on real numbers; i.e., for a certain threshold θ {\displaystyle \theta } , the classifier f θ {\displaystyle f_{\theta }} returns 1 if the input number is larger than θ {\displaystyle \theta } and 0 otherwise. The VC dimension of f {\displaystyle f} is 1 because: (a) It can shatter a single point. For every point x {\displaystyle x} , a classifier f θ {\displaystyle f_{\theta }} labels it as 0 if θ > x {\displaystyle \theta >x} and labels it as 1 if θ < x {\displaystyle \theta x + 2 {\displaystyle \theta >x+2} , as (1,0) if θ ∈ [ x − 4 , x − 2 ) {\displaystyle \theta \in [x-4,x-2)} , as (1,1) if θ ∈ [ x − 2 , x ] {\displaystyle \theta \in [x-2,x]} , and as (0,1) if θ ∈ ( x , x + 2 ] {\displaystyle \theta \in (x,x+2]} . (b) It cannot shatter any set of three points. For every set of three numbers, if the smallest and the largest are labeled 1, then the middle one must also be labeled 1, so not all labelings are possible. f {\displaystyle f} is a straight line as a classification model on points in a two-dimensional plane (this is the model used by a perceptron). The line should separate positive data points from negative data points. There exist sets of 3 points that can indeed be shattered using this model (any 3 points that are not collinear can be shattered). However, no set of 4 points can be shattered: by Radon's theorem, any four points can be partitioned into two subsets with intersecting convex hulls, so it is not possible to separate one of these two subsets from the other. Thus, the VC dimension of this particular classifier is 3. It is important to remember that while one can choose any arrangement of points, the arrangement of those points cannot change when attempting to shatter for some label assignment. Note, only 3 of the 23 = 8 possible label assignments are shown for the three points. f {\displaystyle f} is a single-parametric sine classifier, i.e., for a certain parameter θ {\displaystyle \theta } , the classifier f θ {\displaystyle f_{\theta }} returns 1 if the input number x {\displaystyle x} has sin ⁡ ( θ x ) > 0 {\displaystyle \sin(\theta x)>0} and 0 otherwise. The VC dimension of f {\displaystyle f} is infinite, since it can shatter any finite subset of the set { 2 − m ∣ m ∈ N } {\displaystyle \{2^{-m}\mid m\in \mathbb {N} \}} . == Uses == === In statistical learning theory === The VC dimension can predict a probabilistic upper bound on the test error of a classification model. Vapnik proved that the probability of the test error (i.e., risk with 0–1 loss function) distancing from an upper bound (on data that is drawn i.i.d. from the same distribution as the training set) is given by: Pr ( test error ⩽ training error + 1 N [ D ( log ⁡ ( 2 N D ) + 1 ) − log ⁡ ( η 4 ) ] ) = 1 − η , {\displaystyle \Pr \left({\text{test error}}\leqslant {\text{training error}}+{\sqrt {{\frac {1}{N}}\left[D\left(\log \left({\tfrac {2N}{D}}\right)+1\right)-\log \left({\tfrac {\eta }{4}}\right)\right]}}\,\right)=1-\eta ,} where D {\displaystyle D} is the VC dimension of the classification model, 0 < η ⩽ 1 {\displaystyle 0<\eta \leqslant 1} , and N {\displaystyle N} is the size of the training set (restriction: this formula is valid when D ≪ N {\displaystyle D\ll N} . When D {\displaystyle D} is larger, the test-error may be much higher than the training-error. This is due to overfitting). The VC dimension also appears in sample-complexity bounds. A space of binary functions with VC dimension D {\displaystyle D} can be learned with: N = Θ ( D + ln ⁡ 1 δ ε 2 ) {\displaystyle N=\Theta \left({\frac {D+\ln {1 \over \delta }}{\varepsilon ^{2}}}\right)} samples, where ε {\displaystyle \varepsilon } is the learning error and δ {\displaystyle \delta } is the failure probability. Thus, the sample-complexity is a linear function of the VC dimension of the hypothesis space. === In computational geometry === The VC dimension is one of the critical parameters in the size of ε-nets, which determines the complexity of approximation algorithms based on them; range sets without finite VC dimension may not have finite ε-nets at all. == Bounds == The VC dimension of the dual set-family of C {\displaystyle {\mathcal {C}}} is strictly less than 2 vc ⁡ ( C ) + 1 {\displaystyle 2^{\operatorname {vc} ({\mathcal {C}})+1}} , and this is best possible. The VC dimension of a finite set-family C {\displaystyle {\mathcal {C}}} is at most log 2 ⁡ | C | {\displaystyle \log _{2}|{\mathcal {C}}|} . This is because | C ∩ X | ≤ | X | {\displaystyle |{\mathcal {C}}\cap X|\leq |X|} by definition. Given a set-fa

Algorithmic learning theory

Algorithmic learning theory is a mathematical framework for analyzing machine learning problems and algorithms. Synonyms include formal learning theory and algorithmic inductive inference. Algorithmic learning theory is different from statistical learning theory in that it does not make use of statistical assumptions and analysis. Both algorithmic and statistical learning theory are concerned with machine learning and can thus be viewed as branches of computational learning theory. == Distinguishing characteristics == Unlike statistical learning theory and most statistical theory in general, algorithmic learning theory does not assume that data are random samples, that is, that data points are independent of each other. This makes the theory suitable for domains where observations are (relatively) noise-free but not random, such as language learning and automated scientific discovery. The fundamental concept of algorithmic learning theory is learning in the limit: as the number of data points increases, a learning algorithm should converge to a correct hypothesis on every possible data sequence consistent with the problem space. This is a non-probabilistic version of statistical consistency, which also requires convergence to a correct model in the limit, but allows a learner to fail on data sequences with probability measure 0 . Algorithmic learning theory investigates the learning power of Turing machines. Other frameworks consider a much more restricted class of learning algorithms than Turing machines, for example, learners that compute hypotheses more quickly, for instance in polynomial time. An example of such a framework is probably approximately correct learning . == Learning in the limit == The concept was introduced in E. Mark Gold's seminal paper "Language identification in the limit". The objective of language identification is for a machine running one program to be capable of developing another program by which any given sentence can be tested to determine whether it is "grammatical" or "ungrammatical". The language being learned need not be English or any other natural language - in fact the definition of "grammatical" can be absolutely anything known to the tester. In Gold's learning model, the tester gives the learner an example sentence at each step, and the learner responds with a hypothesis, which is a suggested program to determine grammatical correctness. It is required of the tester that every possible sentence (grammatical or not) appears in the list eventually, but no particular order is required. It is required of the learner that at each step the hypothesis must be correct for all the sentences so far. A particular learner is said to be able to "learn a language in the limit" if there is a certain number of steps beyond which its hypothesis no longer changes. At this point it has indeed learned the language, because every possible sentence appears somewhere in the sequence of inputs (past or future), and the hypothesis is correct for all inputs (past or future), so the hypothesis is correct for every sentence. The learner is not required to be able to tell when it has reached a correct hypothesis, all that is required is that it be true. Gold showed that any language which is defined by a Turing machine program can be learned in the limit by another Turing-complete machine using enumeration. This is done by the learner testing all possible Turing machine programs in turn until one is found which is correct so far - this forms the hypothesis for the current step. Eventually, the correct program will be reached, after which the hypothesis will never change again (but note that the learner does not know that it won't need to change). Gold also showed that if the learner is given only positive examples (that is, only grammatical sentences appear in the input, not ungrammatical sentences), then the language can only be guaranteed to be learned in the limit if there are only a finite number of possible sentences in the language (this is possible if, for example, sentences are known to be of limited length). Language identification in the limit is a highly abstract model. It does not allow for limits of runtime or computer memory which can occur in practice, and the enumeration method may fail if there are errors in the input. However the framework is very powerful, because if these strict conditions are maintained, it allows the learning of any program known to be computable. This is because a Turing machine program can be written to mimic any program in any conventional programming language. See Church-Turing thesis. == Other identification criteria == Learning theorists have investigated other learning criteria, such as the following. Efficiency: minimizing the number of data points required before convergence to a correct hypothesis. Mind Changes: minimizing the number of hypothesis changes that occur before convergence. Mind change bounds are closely related to mistake bounds that are studied in statistical learning theory. Kevin Kelly has suggested that minimizing mind changes is closely related to choosing maximally simple hypotheses in the sense of Occam’s Razor. == Annual conference == Since 1990, there is an International Conference on Algorithmic Learning Theory (ALT), called Workshop in its first years (1990–1997). Between 1992 and 2016, proceedings were published in the LNCS series. Starting from 2017, they are published by the Proceedings of Machine Learning Research. The 34th conference will be held in Singapore in Feb 2023. The topics of the conference cover all of theoretical machine learning, including statistical and computational learning theory, online learning, active learning, reinforcement learning, and deep learning.

Odor source localization

Odor source localization (OSL) is the problem of locating the origin of an airborne or waterborne chemical plume using one or more mobile sensors, typically robots equipped with chemical sensors. The task sits at the intersection of robotics, fluid dynamics and machine olfaction. Chemical plumes in turbulent flows are intermittent and patchy, and most chemical sensors respond slowly and have limited selectivity, so the instantaneous reading available to a moving sensor is a poor proxy for the underlying time-averaged concentration field. Robotic OSL has been studied since the late 1980s and has applications including the detection of gas leaks, search and rescue after industrial accidents, and environmental monitoring of industrial emissions. == History == Robotic odor search emerged in the late 1980s and 1990s, drawing on earlier work in chemical ecology that had described how moths and other insects locate distant pheromone sources. R. A. Russell at Monash University was among the first to build mobile robots that followed chemical trails on the floor and tracked airborne odor plumes. Distributed and multi-robot odor search were investigated by Hayes, Martinoli and Goodman at the California Institute of Technology and EPFL, who studied cooperative plume-tracing on simulated and physical robot swarms. In 2007 Vergassola, Villermaux and Shraiman introduced infotaxis, an information-theoretic search strategy in which a sensor moves so as to maximize the expected information gain about source location, rather than following a chemical concentration gradient; the paper appeared in Nature and prompted substantial follow-up work in the robotics community. From the mid-2010s, multi-rotor unmanned aerial vehicles carrying lightweight chemical sensors became a common experimental platform for OSL research. == Problem formulation == OSL is generally decomposed into three sub-problems: plume detection (deciding whether a chemical signal is present), plume traversal (moving so as to remain in contact with the plume), and source declaration (deciding when the source has been reached). The mathematical difficulty depends strongly on the assumed dispersion model. In laminar or low-Reynolds number flows a Gaussian advection–diffusion model gives a smooth concentration field with a well-defined gradient. In turbulent flows, which dominate most realistic environments, the plume is filamentary: the sensor receives short, randomly spaced bursts of chemical separated by periods of zero signal, and the time-averaged field is not a useful guide on the time scales at which a robot must act. Source-term estimation, surveyed by Hutchinson and colleagues, additionally aims to recover both the position and the release rate of the source from the observed concentrations, often using probabilistic filters. == Biological inspiration == Many OSL strategies are explicitly modeled on the behavior of male moths flying upwind toward a pheromone source. As reviewed by Cardé and Willis, moths combine an upwind surge whenever they detect a filament of pheromone with a wider crosswind cast when contact is lost, producing a characteristic zig-zag trajectory that has been transposed onto mobile robots by several groups. Other biological models draw on the search behavior of dogs and of marine animals such as blue crabs and lobsters, which integrate chemical and bilateral hydrodynamic cues over much shorter ranges. == Algorithms and strategies == === Reactive strategies === Reactive strategies select the next motion as a direct function of the current sensor reading. Chemotaxis steers along the locally estimated concentration gradient, which is effective in laminar plumes but degrades severely in turbulence. Anemotaxis exploits a measured wind direction by surging upwind when chemical contact is made. The bio-inspired cast-and-surge family combines anemotaxis with a deterministic crosswind cast on contact loss, and is the dominant reactive approach for turbulent environments. === Probabilistic and information-theoretic strategies === Probabilistic methods maintain a posterior distribution over possible source locations and choose actions that improve that distribution. The infotaxis strategy of Vergassola, Villermaux and Shraiman selects the move that maximizes the expected reduction in entropy of the source-location posterior, and is effective in regimes where the spatial gradient is unusable. Bayesian source-term estimation extends this idea by inferring both source position and release rate, typically using particle filters or sequential Monte Carlo. === Map-based strategies === Map-based methods build a spatial model of the time-averaged gas distribution from sensor readings collected along the robot's trajectory and search for local maxima in that model. Lilienthal and colleagues describe a family of kernel-based gas distribution mapping techniques in which point measurements are convolved with a Gaussian kernel to produce a spatially extrapolated estimate. Such methods are most useful when the source can be assumed quasi-stationary and the robot is able to revisit locations. === Multi-robot and swarm strategies === Multiple robots searching cooperatively can shorten search times. Cooperative formations spread the sensors across the crosswind axis, making detection of an intermittent plume more likely. Swarm-based approaches, reviewed by Wang and colleagues, deploy larger numbers of simpler agents and rely on collective behavior rather than centralized planning; reported advantages include improved coverage of the search area and the possibility of locating multiple sources in parallel. == Sensors and platforms == Most OSL systems use metal-oxide semiconductor (MOX) sensors, photoionization detectors or electrochemical cells, which trade off sensitivity, selectivity, response time and power consumption. Ishida and colleagues describe how these sensors interact with airflow around the robot body, an effect that motivates careful aerodynamic design and active sampling. Mobile platforms include wheeled ground robots for indoor and structured outdoor environments, multi-rotor unmanned aerial vehicles for open spaces and elevated sources, and autonomous underwater vehicles for chemical plumes in the marine environment. == Notable systems == Among the early demonstrations, R. A. Russell's series of differential-drive robots at Monash University localized volatile sources in still and ventilated rooms during the 1990s. The Smelling Nano Aerial Vehicle reported by Burgués and colleagues used a Crazyflie nano-quadcopter (approximately 27 grams in mass and 10 cm across) carrying a custom MOX gas sensing board, and built three-dimensional gas distribution maps of indoor releases from sweeping flights of less than three minutes. The GADEN simulator, released by Monroy and colleagues, couples three-dimensional dispersion computed from an OpenFOAM CFD solver with models of MOX and photo-ionization gas sensors, and is widely used to test mobile-robot olfaction algorithms in simulation. == Applications == Reported applications include the localization of natural-gas and methane leaks in urban infrastructure, search for chemical contamination after industrial accidents, search and rescue, and environmental monitoring of industrial emissions. Drug- and explosives-detection robots are an adjacent application area, although these typically rely on close-range sniffing rather than long-range plume tracking. == Open challenges == Open challenges identified in recent reviews include the limited speed, selectivity and stability of available chemical sensors; the scarcity of standardized, large-scale benchmarks comparable to those available in computer vision; reliable handling of multi-source environments, where standard single-source assumptions fail; and the integration of OSL with other autonomous-vehicle subsystems such as obstacle avoidance and navigation in three-dimensional turbulent flow.

Quadratic unconstrained binary optimization

Quadratic unconstrained binary optimization (QUBO), also known as unconstrained binary quadratic programming (UBQP), is a combinatorial optimization problem with a wide range of applications from finance and economics to machine learning. QUBO is an NP hard problem, and for many classical problems from theoretical computer science, like maximum cut, graph coloring and the partition problem, embeddings into QUBO have been formulated. Embeddings for machine learning models include support-vector machines, clustering and probabilistic graphical models. Moreover, due to its close connection to Ising models, QUBO constitutes a central problem class for adiabatic quantum computation, where it is solved through a physical process called quantum annealing. == Definition == Let B = { 0 , 1 } {\displaystyle \mathbb {B} =\lbrace 0,1\rbrace } the set of binary digits (or bits), then B n {\displaystyle \mathbb {B} ^{n}} is the set of binary vectors of fixed length n ∈ N {\displaystyle n\in \mathbb {N} } . Given a symmetric or upper triangular matrix Q ∈ R n × n {\displaystyle {\boldsymbol {Q}}\in \mathbb {R} ^{n\times n}} , whose entries Q i j {\displaystyle Q_{ij}} define a weight for each pair of indices i , j ∈ { 1 , … , n } {\displaystyle i,j\in \lbrace 1,\dots ,n\rbrace } , we can define the function f Q : B n → R {\displaystyle f_{\boldsymbol {Q}}:\mathbb {B} ^{n}\rightarrow \mathbb {R} } that assigns a value to each binary vector x {\displaystyle {\boldsymbol {x}}} through f Q ( x ) = x ⊺ Q x = ∑ i = 1 n ∑ j = 1 n Q i j x i x j . {\displaystyle f_{\boldsymbol {Q}}({\boldsymbol {x}})={\boldsymbol {x}}^{\intercal }{\boldsymbol {Qx}}=\sum _{i=1}^{n}\sum _{j=1}^{n}Q_{ij}x_{i}x_{j}.} Alternatively, the linear and quadratic parts can be separated as f Q ′ , q ( x ) = x ⊺ Q ′ x + q ⊺ x , {\displaystyle f_{{\boldsymbol {Q}}',{\boldsymbol {q}}}({\boldsymbol {x}})={\boldsymbol {x}}^{\intercal }{\boldsymbol {Q}}'{\boldsymbol {x}}+{\boldsymbol {q}}^{\intercal }{\boldsymbol {x}},} where Q ′ ∈ R n × n {\displaystyle {\boldsymbol {Q}}'\in \mathbb {R} ^{n\times n}} and q ∈ R n {\displaystyle {\boldsymbol {q}}\in \mathbb {R} ^{n}} . This is equivalent to the previous definition through Q = Q ′ + diag ⁡ [ q ] {\displaystyle {\boldsymbol {Q}}={\boldsymbol {Q}}'+\operatorname {diag} [{\boldsymbol {q}}]} using the diag operator, exploiting that x = x ⋅ x {\displaystyle x=x\cdot x} for all binary values x {\displaystyle x} . Intuitively, the weight Q i j {\displaystyle Q_{ij}} is added if both x i = 1 {\displaystyle x_{i}=1} and x j = 1 {\displaystyle x_{j}=1} . The QUBO problem consists of finding a binary vector x ∗ {\displaystyle {\boldsymbol {x}}^{}} that minimizes f Q {\displaystyle f_{\boldsymbol {Q}}} , i.e., ∀ x ∈ B n : f Q ( x ∗ ) ≤ f Q ( x ) {\displaystyle \forall {\boldsymbol {x}}\in \mathbb {B} ^{n}:~f_{\boldsymbol {Q}}({\boldsymbol {x}}^{})\leq f_{\boldsymbol {Q}}({\boldsymbol {x}})} . In general, x ∗ {\displaystyle {\boldsymbol {x}}^{}} is not unique, meaning there may be a set of minimizing vectors with equal value w.r.t. f Q {\displaystyle f_{\boldsymbol {Q}}} . The complexity of QUBO arises from the number of candidate binary vectors to be evaluated, as | B n | = 2 n {\displaystyle \left|\mathbb {B} ^{n}\right|=2^{n}} grows exponentially in n {\displaystyle n} . Sometimes, QUBO is defined as the problem of maximizing f Q {\displaystyle f_{\boldsymbol {Q}}} , which is equivalent to minimizing f − Q = − f Q {\displaystyle f_{-{\boldsymbol {Q}}}=-f_{\boldsymbol {Q}}} . == Properties == QUBO is scale invariant for positive factors α > 0 {\displaystyle \alpha >0} , which leave the optimum x ∗ {\displaystyle {\boldsymbol {x}}^{}} unchanged: f α Q ( x ) = x ⊺ ( α Q ) x = α ( x ⊺ Q x ) = α f Q ( x ) {\displaystyle f_{\alpha {\boldsymbol {Q}}}({\boldsymbol {x}})={\boldsymbol {x}}^{\intercal }(\alpha {\boldsymbol {Q}}){\boldsymbol {x}}=\alpha ({\boldsymbol {x}}^{\intercal }{\boldsymbol {Qx}})=\alpha f_{\boldsymbol {Q}}({\boldsymbol {x}})} . In its general form, QUBO is NP-hard and cannot be solved efficiently by any known polynomial-time algorithm. However, there are polynomially-solvable special cases, where Q {\displaystyle {\boldsymbol {Q}}} has certain properties, for example: If all coefficients are positive, the optimum is trivially x ∗ = ( 0 , … , 0 ) ⊺ {\displaystyle {\boldsymbol {x}}^{}=(0,\dots ,0)^{\intercal }} . Similarly, if all coefficients are negative, the optimum is x ∗ = ( 1 , … , 1 ) ⊺ {\displaystyle {\boldsymbol {x}}^{}=(1,\dots ,1)^{\intercal }} . If Q {\displaystyle {\boldsymbol {Q}}} is diagonal, the bits can be optimized independently, and the problem is solvable in O ( n ) {\displaystyle {\mathcal {O}}(n)} . The optimal variable assignments are simply x i ∗ = 1 {\displaystyle x_{i}^{}=1} if Q i i < 0 {\displaystyle Q_{ii}<0} , and x i ∗ = 0 {\displaystyle x_{i}^{}=0} otherwise. If all off-diagonal elements of Q {\displaystyle {\boldsymbol {Q}}} are non-positive, the corresponding QUBO problem is solvable in polynomial time. QUBO can be solved using integer linear programming solvers like CPLEX or Gurobi Optimizer. This is possible since QUBO can be reformulated as a linear constrained binary optimization problem. To achieve this, substitute the product x i x j {\displaystyle x_{i}x_{j}} by an additional binary variable z i j ∈ B {\displaystyle z_{ij}\in \mathbb {B} } and add the constraints x i ≥ z i j {\displaystyle x_{i}\geq z_{ij}} , x j ≥ z i j {\displaystyle x_{j}\geq z_{ij}} and x i + x j − 1 ≤ z i j {\displaystyle x_{i}+x_{j}-1\leq z_{ij}} . Note that z i j {\displaystyle z_{ij}} can also be relaxed to continuous variables within the bounds zero and one. == Applications == QUBO is a structurally simple, yet computationally hard optimization problem. It can be used to encode a wide range of optimization problems from various scientific areas. === Maximum Cut === Given a graph G = ( V , E ) {\displaystyle G=(V,E)} with vertex set V = { 1 , … , n } {\displaystyle V=\lbrace 1,\dots ,n\rbrace } and edges E ⊆ V × V {\displaystyle E\subseteq V\times V} , the maximum cut (max-cut) problem consists of finding two subsets S , T ⊆ V {\displaystyle S,T\subseteq V} with T = V ∖ S {\displaystyle T=V\setminus S} , such that the number of edges between S {\displaystyle S} and T {\displaystyle T} is maximized. The more general weighted max-cut problem assumes edge weights w i j ≥ 0 ∀ i , j ∈ V {\displaystyle w_{ij}\geq 0~\forall i,j\in V} , with ( i , j ) ∉ E ⇒ w i j = 0 {\displaystyle (i,j)\notin E\Rightarrow w_{ij}=0} , and asks for a partition S , T ⊆ V {\displaystyle S,T\subseteq V} that maximizes the sum of edge weights between S {\displaystyle S} and T {\displaystyle T} , i.e., max S ⊆ V ∑ i ∈ S , j ∉ S w i j . {\displaystyle \max _{S\subseteq V}\sum _{i\in S,j\notin S}w_{ij}.} By setting w i j = 1 {\displaystyle w_{ij}=1} for all ( i , j ) ∈ E {\displaystyle (i,j)\in E} this becomes equivalent to the original max-cut problem above, which is why we focus on this more general form in the following. For every vertex in i ∈ V {\displaystyle i\in V} we introduce a binary variable x i {\displaystyle x_{i}} with the interpretation x i = 0 {\displaystyle x_{i}=0} if i ∈ S {\displaystyle i\in S} and x i = 1 {\displaystyle x_{i}=1} if i ∈ T {\displaystyle i\in T} . As T = V ∖ S {\displaystyle T=V\setminus S} , every i {\displaystyle i} is in exactly one set, meaning there is a 1:1 correspondence between binary vectors x ∈ B n {\displaystyle {\boldsymbol {x}}\in \mathbb {B} ^{n}} and partitions of V {\displaystyle V} into two subsets. We observe that, for any i , j ∈ V {\displaystyle i,j\in V} , the expression x i ( 1 − x j ) + ( 1 − x i ) x j {\displaystyle x_{i}(1-x_{j})+(1-x_{i})x_{j}} evaluates to 1 if and only if i {\displaystyle i} and j {\displaystyle j} are in different subsets, equivalent to logical XOR. Let W ∈ R + n × n {\displaystyle {\boldsymbol {W}}\in \mathbb {R} _{+}^{n\times n}} with W i j = w i j ∀ i , j ∈ V {\displaystyle W_{ij}=w_{ij}~\forall i,j\in V} . By extending above expression to matrix-vector form we find that x ⊺ W ( 1 − x ) + ( 1 − x ) ⊺ W x = − 2 x ⊺ W x + ( W 1 + W ⊺ 1 ) ⊺ x {\displaystyle {\boldsymbol {x}}^{\intercal }{\boldsymbol {W}}({\boldsymbol {1}}-{\boldsymbol {x}})+({\boldsymbol {1}}-{\boldsymbol {x}})^{\intercal }{\boldsymbol {Wx}}=-2{\boldsymbol {x}}^{\intercal }{\boldsymbol {Wx}}+({\boldsymbol {W1}}+{\boldsymbol {W}}^{\intercal }{\boldsymbol {1}})^{\intercal }{\boldsymbol {x}}} is the sum of weights of all edges between S {\displaystyle S} and T {\displaystyle T} , where 1 = ( 1 , 1 , … , 1 ) ⊺ ∈ R n {\displaystyle {\boldsymbol {1}}=(1,1,\dots ,1)^{\intercal }\in \mathbb {R} ^{n}} . As this is a quadratic function over x {\displaystyle {\boldsymbol {x}}} , it is a QUBO problem whose parameter matrix we can read from above expression as Q = 2 W − diag ⁡ [ W 1 + W ⊺ 1 ] , {\displaystyle {\boldsymbol {Q}}=2{\boldsymbol {W}}-\operatorname {diag} [{\boldsymbol {W1}}+{\boldsymbol {W}}^{\intercal }{\bol