T-norm

In mathematics, a t-norm (also T-norm or, unabbreviated, triangular norm) is a kind of binary operation used in the framework of probabilistic metric spaces and in multi-valued logic, specifically in fuzzy logic. A t-norm generalizes intersection in a lattice and conjunction in logic. The name triangular norm refers to the fact that in the framework of probabilistic metric spaces t-norms are used to generalize the triangle inequality of ordinary metric spaces. == Definition == A t-norm is a function T: [0, 1] × [0, 1] → [0, 1] that satisfies the following properties: Commutativity: T(a, b) = T(b, a); Monotonicity: T(a, b) ≤ T(c, d) if a ≤ c and b ≤ d; Associativity: T(a, T(b, c)) = T(T(a, b), c); Identity: the number 1 acts as identity element, T(a, 1) = a. Since a t-norm is a binary algebraic operation on the interval [0, 1], infix algebraic notation is also common, with the t-norm usually denoted by ∗. The defining conditions of the t-norm are exactly those of a partially ordered abelian monoid on the real unit interval [0, 1]. (Cf. ordered group.) The monoidal operation of any partially ordered abelian monoid L is therefore by some authors called a triangular norm on L. === Classification of t-norms === A t-norm is called continuous if it is continuous as a function, in the usual interval topology on [0, 1]². (Similarly for left- and right-continuity.) A t-norm is called strict if it is continuous and strictly monotone. A t-norm is called nilpotent if it is continuous and each x in the open interval (0, 1) is nilpotent, that is, there is a natural number n such that x ∗ ... ∗ x (n times) equals 0. A t-norm ∗ is called Archimedean if it has the Archimedean property, that is, if for each x, y in the open interval (0, 1) there is a natural number n such that x ∗ ... ∗ x (n times) is less than or equal to y.
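The defining properties can be spot-checked numerically. The following is a minimal sketch, not a proof: it tests the four axioms only on a finite grid of exact fractions, so passing is a necessary condition rather than a sufficient one. The function names (`satisfies_axioms`, `nilpotency_index`) and the grid resolution are assumptions made for illustration; the three sample operations are the minimum, product, and Łukasiewicz operations on [0, 1].

```python
from fractions import Fraction
from itertools import product

# Exact grid 0, 1/10, ..., 1 so that comparisons are free of rounding error.
GRID = [Fraction(k, 10) for k in range(11)]


def t_min(a, b):
    return min(a, b)


def t_prod(a, b):
    return a * b


def t_luk(a, b):
    return max(Fraction(0), a + b - 1)


def satisfies_axioms(t, grid=GRID):
    """Check the four defining properties on every grid point (necessary test only)."""
    for a, b in product(grid, repeat=2):
        if t(a, b) != t(b, a):                      # commutativity
            return False
    for a in grid:
        if t(a, Fraction(1)) != a:                  # 1 is the identity element
            return False
    for a, b, c in product(grid, repeat=3):
        if t(a, t(b, c)) != t(t(a, b), c):          # associativity
            return False
    for a, b, c, d in product(grid, repeat=4):
        if a <= c and b <= d and t(a, b) > t(c, d):  # monotonicity
            return False
    return True


def nilpotency_index(t, x, max_n=1000):
    """Smallest n with x * ... * x (n times) == 0, or None if not reached by max_n."""
    power = x
    for n in range(1, max_n + 1):
        if power == 0:
            return n
        power = t(power, x)
    return None
```

For instance, `nilpotency_index(t_luk, Fraction(1, 2))` returns 2, while the same call with `t_prod` returns `None` because repeated products of a positive number never reach 0.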
The usual partial ordering of t-norms is pointwise, that is, T1 ≤ T2 if T1(a, b) ≤ T2(a, b) for all a, b in [0, 1]. As functions, pointwise larger t-norms are sometimes called stronger than those pointwise smaller. In the semantics of t-norm fuzzy logics, however, the larger a t-norm, the weaker (in terms of logical strength) conjunction it represents. == Prominent examples == Minimum t-norm $\top_{\mathrm{min}}(a,b)=\min\{a,b\}$, also called the Gödel t-norm, as it is the standard semantics for conjunction in Gödel fuzzy logic. Besides that, it occurs in most t-norm based fuzzy logics as the standard semantics for weak conjunction. It is the pointwise largest t-norm (see the properties of t-norms below). Product t-norm $\top_{\mathrm{prod}}(a,b)=a\cdot b$ (the ordinary product of real numbers). Besides other uses, the product t-norm is the standard semantics for strong conjunction in product fuzzy logic. It is a strict Archimedean t-norm. Łukasiewicz t-norm $\top_{\mathrm{Luk}}(a,b)=\max\{0,a+b-1\}$. The name comes from the fact that the t-norm is the standard semantics for strong conjunction in Łukasiewicz fuzzy logic. It is a nilpotent Archimedean t-norm, pointwise smaller than the product t-norm. Drastic t-norm $\top_{\mathrm{D}}(a,b)=\begin{cases}b&\text{if }a=1\\a&\text{if }b=1\\0&\text{otherwise.}\end{cases}$ The name reflects the fact that the drastic t-norm is the pointwise smallest t-norm (see the properties of t-norms below). It is a right-continuous Archimedean t-norm.
Nilpotent minimum $\top_{\mathrm{nM}}(a,b)=\begin{cases}\min(a,b)&\text{if }a+b>1\\0&\text{otherwise}\end{cases}$ is a standard example of a t-norm that is left-continuous, but not continuous. Despite its name, the nilpotent minimum is not a nilpotent t-norm. Hamacher product $\top_{\mathrm{H}_{0}}(a,b)=\begin{cases}0&\text{if }a=b=0\\\frac{ab}{a+b-ab}&\text{otherwise}\end{cases}$ is a strict Archimedean t-norm, and an important representative of the parametric classes of Hamacher t-norms and Schweizer–Sklar t-norms. == Properties of t-norms == The drastic t-norm is the pointwise smallest t-norm and the minimum is the pointwise largest t-norm: $\top_{\mathrm{D}}(a,b)\leq\top(a,b)\leq\top_{\mathrm{min}}(a,b)$ for any t-norm $\top$ and all a, b in [0, 1]. In particular, $\top_{\mathrm{D}}(a,b)\leq\top_{\mathrm{Luk}}(a,b)\leq\top_{\mathrm{prod}}(a,b)\leq\top_{\mathrm{min}}(a,b)$ for all a, b in [0, 1]. For every t-norm T, the number 0 acts as null element: T(a, 0) = 0 for all a in [0, 1]. A t-norm T has zero divisors if and only if it has nilpotent elements; each nilpotent element of T is also a zero divisor of T. The set of all nilpotent elements is an interval [0, a] or [0, a), for some a in [0, 1]. === Properties of continuous t-norms === Although real functions of two variables can be continuous in each variable without being continuous on [0, 1]², this is not the case with t-norms: a t-norm T is continuous if and only if it is continuous in one variable, i.e., if and only if the functions $f_{y}(x)=T(x,y)$ are continuous for each y in [0, 1].
Analogous theorems hold for left- and right-continuity of a t-norm. A continuous t-norm is Archimedean if and only if 0 and 1 are its only idempotents. A continuous Archimedean t-norm is strict if 0 is its only nilpotent element; otherwise it is nilpotent. By definition, moreover, a continuous Archimedean t-norm T is nilpotent if and only if each x < 1 is a nilpotent element of T. Thus with a continuous Archimedean t-norm T, either all or none of the elements of (0, 1) are nilpotent. If all elements in (0, 1) are nilpotent, then the t-norm is isomorphic to the Łukasiewicz t-norm; i.e., there is a strictly increasing function f such that $\top(x,y)=f^{-1}(\top_{\mathrm{Luk}}(f(x),f(y)))$. If on the other hand T has no nilpotent elements, the t-norm is isomorphic to the product t-norm. In other words, all nilpotent t-norms are isomorphic, the Łukasiewicz t-norm being their prototypical representative; and all strict t-norms are isomorphic, with the product t-norm as their prototypical example. The Łukasiewicz t-norm is itself isomorphic to the product t-norm undercut at 0.25, i.e., to the function p(x, y) = max(0.25, x ⋅ y) on [0.25, 1]². For each continuous t-norm, the set of its idempotents is a closed subset of [0, 1]. Its complement (the set of all elements that are not idempotent) is therefore a union of countably many non-overlapping open intervals. The restriction of the t-norm to any of these intervals (including its endpoints) is Archimedean, and thus isomorphic either to the Łukasiewicz t-norm or the product t-norm. For such x, y that do not fall into the same open interval of non-idempotents, the t-norm evaluates to the minimum of x and y.
These conditions actually give a characterization of continuous t-norms, called the Mostert–Shields theorem, since every continuous t-norm can in this way be decomposed, and the described construction always yields a continuous t-norm. The theorem can also be formulated as follows: A t-norm is continuous if and only if it is isomorphic to an ordinal sum of the minimum, Łukasiewicz, and product t-norm. A similar characterization theorem for non-continuous t-norms is not known (not even for left-continuous ones); only some non-exhaustive methods for the construction of t-norms have been found. == Residuum == For any left-continuous t-norm $\top$, there is a unique binary operation $\Rightarrow$ on [0, 1] such that $\top(z,x)\leq y$ if and only if $z\leq(x\Rightarrow y)$ for all x, y, z in [0, 1]. This operation is called the residuum of the t-norm. In prefix notation, the residuum of a t-norm $\top$ is often denoted by $\vec{\top}$ or by the letter R. The interval [0, 1] equipped with a t-norm and its residuum forms a residuated lattice. The relation between a t-norm T and its residuum R is an instance of adjunction (specifically, a Galois connection): the residuum forms a right adjoint R(x, –) to the functor T(–, x) for each x in the lattice [0, 1] taken as a poset category. In the standard semantics of t-norm based fuzzy logics, where conjunction is interpreted by a t-norm, the residuum plays the role of implication (often

Gödel machine

A Gödel machine is a hypothetical self-improving computer program that solves problems in an optimal way. It uses a recursive self-improvement protocol in which it rewrites its own code when it can prove the new code provides a better strategy. The machine was invented by Jürgen Schmidhuber (first proposed in 2003), but is named after Kurt Gödel who inspired the mathematical theories. The Gödel machine is often discussed when dealing with issues of meta-learning, also known as "learning to learn." Applications include automating human design decisions and transfer of knowledge between multiple related tasks, and may lead to design of more robust and general learning architectures. Though theoretically possible, no full implementation has been created. The Gödel machine is often compared with Marcus Hutter's AIXI, another formal specification for an artificial general intelligence. Schmidhuber points out that the Gödel machine could start out by implementing AIXItl as its initial sub-program, and self-modify after it finds proof that another algorithm for its search code will be better. == Limitations == Traditional problems solved by a computer only require one input and provide some output. Computers of this sort had their initial algorithm hardwired. This does not take into account the dynamic natural environment, and thus was a goal for the Gödel machine to overcome. The Gödel machine has limitations of its own, however. According to Gödel's First Incompleteness Theorem, any formal system that encompasses arithmetic is either flawed or allows for statements that cannot be proved in the system. Hence even a Gödel machine with unlimited computational resources must ignore those self-improvements whose effectiveness it cannot prove. == Variables of interest == There are three variables that are particularly useful in the run time of the Gödel machine. 
At some time $t$, the variable time will have the binary equivalent of $t$. This is incremented steadily throughout the run time of the machine. Any input meant for the Gödel machine from the natural environment is stored in variable $x$. It is likely the case that $x$ will hold different values for different values of variable time. The outputs of the Gödel machine are stored in variable $y$, where $y(t)$ would be the output bit-string at some time $t$. At any given time $t$, where $1\leq t\leq T$, the goal is to maximize future success or utility. A typical utility function follows the pattern $u(s,\mathrm{Env}):S\times E\rightarrow\mathbb{R}$: $u(s,\mathrm{Env})=E_{\mu}\left[\sum_{\tau=\text{time}}^{T}r(\tau)\mid s,\mathrm{Env}\right]$ where $r(t)$ is a real-valued reward input (encoded within $s(t)$) at time $t$, $E_{\mu}[\cdot\mid\cdot]$ denotes the conditional expectation operator with respect to some possibly unknown distribution $\mu$ from a set $M$ of possible distributions ($M$ reflects whatever is known about the possibly probabilistic reactions of the environment), and the above-mentioned $\text{time}=\operatorname{time}(s)$ is a function of state $s$ which uniquely identifies the current cycle. Note that we take into account the possibility of extending the expected lifespan through appropriate actions.
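The utility pattern above is an expected sum of future rewards. As a minimal sketch, assume (purely for illustration) that the distribution μ is known and finite, given as explicit (probability, reward sequence) pairs; in the Gödel machine setting μ may be unknown, so this only shows what the formula computes. The function name `expected_utility` and the sample data are hypothetical.

```python
def expected_utility(trajectories, time, horizon):
    """Return E_mu[ sum_{tau=time}^{T} r(tau) ] for a finite, explicitly given mu.

    trajectories: list of (probability, rewards) pairs, where rewards[tau - 1]
    holds r(tau) for tau = 1..T (assumed representation, not from the source).
    """
    total = 0.0
    for probability, rewards in trajectories:
        # Only rewards from the current cycle onward count toward utility.
        future_reward = sum(rewards[tau - 1] for tau in range(time, horizon + 1))
        total += probability * future_reward
    return total


# Two equally likely reward sequences over T = 4 cycles (made-up numbers).
mu = [(0.5, [0, 1, 0, 2]), (0.5, [1, 1, 1, 1])]
```

Evaluating at a later `time` drops the rewards already received, which is why the sum starts at the current cycle rather than at 1.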
== Instructions used by proof techniques == The nature of the six proof-modifying instructions below makes it impossible to insert an incorrect theorem into the proof, thus trivializing proof verification. === get-axiom(n) === Appends the n-th axiom as a theorem to the current theorem sequence. Below is the initial axiom scheme: Hardware Axioms formally specify how components of the machine could change from one cycle to the next. Reward Axioms define the computational cost of hardware instructions and the physical cost of output actions. Related Axioms also define the lifetime of the Gödel machine as scalar quantities representing all rewards/costs. Environment Axioms restrict the way new inputs x are produced from the environment, based on previous sequences of outputs y. Uncertainty Axioms/String Manipulation Axioms are standard axioms for arithmetic, calculus, probability theory, and string manipulation that allow for the construction of proofs related to future variable values within the Gödel machine. Initial State Axioms contain information about how to reconstruct parts or all of the initial state. Utility Axioms describe the overall goal in the form of utility function u. === apply-rule(k, m, n) === Takes in the index k of an inference rule (such as modus tollens or modus ponens), and attempts to apply it to the two previously proved theorems m and n. The resulting theorem is then added to the proof. === delete-theorem(m) === Deletes the theorem stored at index m in the current proof. This helps to mitigate storage constraints caused by redundant and unnecessary theorems. Deleted theorems can no longer be referenced by the above apply-rule function. === set-switchprog(m, n) === Replaces switchprog by $s^{p}_{m:n}$, provided it is a non-empty substring of $s^{p}$. === check() === Verifies whether the goal of the proof search has been reached.
A target theorem states that given the current axiomatized utility function u (Item 1f), the utility of a switch from p to the current switchprog would be higher than the utility of continuing the execution of p (which would keep searching for alternative switchprogs). === state2theorem(m, n) === Takes in two arguments, m and n, and attempts to convert the contents of $s_{m:n}$ into a theorem. == Example applications == === Time-limited NP-hard optimization === The initial input to the Gödel machine is the representation of a connected graph with a large number of nodes linked by edges of various lengths. Within given time T it should find a cyclic path connecting all nodes. The only real-valued reward will occur at time T. It equals 1 divided by the length of the best path found so far (0 if none was found). There are no other inputs. The by-product of maximizing expected reward is to find the shortest path findable within the limited time, given the initial bias. === Fast theorem proving === Prove or disprove as quickly as possible that all even integers > 2 are the sum of two primes (Goldbach's conjecture). The reward is 1/t, where t is the time required to produce and verify the first such proof. === Maximizing expected reward with bounded resources === A cognitive robot that needs at least 1 liter of gasoline per hour interacts with a partially unknown environment, trying to find hidden, limited gasoline depots to occasionally refuel its tank. It is rewarded in proportion to its lifetime, and dies after at most 100 years or as soon as its tank is empty or it falls off a cliff, and so on. The probabilistic environmental reactions are initially unknown but assumed to be sampled from the axiomatized Speed Prior, according to which hard-to-compute environmental reactions are unlikely. This permits a computable strategy for making near-optimal predictions. One by-product of maximizing expected reward is to maximize expected lifetime.
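Returning to the proof-modifying instructions described earlier, the following toy sketch illustrates why they cannot introduce an incorrect theorem: the only ways to extend the theorem list are copying an axiom or applying an inference rule to existing theorems. Everything here is a hypothetical simplification for illustration, not Schmidhuber's specification: formulas are plain strings or `("->", premise, conclusion)` tuples, the class name `ProofState` is invented, and only modus ponens is provided as rule 0.

```python
def modus_ponens(first, second):
    """From `first` and ('->', first, q) derive q; return None if not applicable."""
    if isinstance(second, tuple) and second[0] == "->" and second[1] == first:
        return second[2]
    return None


RULES = [modus_ponens]  # index k selects an inference rule


class ProofState:
    """Toy theorem store mirroring get-axiom, apply-rule and delete-theorem."""

    def __init__(self, axioms):
        self.axioms = list(axioms)
        self.theorems = []  # deleted slots become None so indices stay stable

    def get_axiom(self, n):
        self.theorems.append(self.axioms[n])

    def apply_rule(self, k, m, n):
        first, second = self._lookup(m), self._lookup(n)
        if first is None or second is None:
            return False  # deleted or missing theorems cannot be referenced
        derived = RULES[k](first, second)
        if derived is None:
            return False
        self.theorems.append(derived)
        return True

    def delete_theorem(self, m):
        if 0 <= m < len(self.theorems):
            self.theorems[m] = None

    def _lookup(self, index):
        if 0 <= index < len(self.theorems):
            return self.theorems[index]
        return None
```

A failed `apply_rule` leaves the theorem list unchanged, so the list only ever contains axioms and their consequences.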

Geographical cluster

A geographical cluster is a localized anomaly, usually an excess of something given the distribution or variation of something else. Often it is considered as an incidence rate that is unusual in that there is more of some variable than might be expected. Examples include a local excess disease rate, a crime hot spot, areas of high unemployment, accident blackspots, unusually high positive residuals from a model, high concentrations of flora or fauna, and physical features or events such as earthquake epicenters. Identifying these extreme regions may be useful because there could be implicit geographical associations with other variables that can be identified and would be of interest. Pattern detection via the identification of such geographical clusters is a very simple and generic form of geographical analysis that has many applications in many different contexts. The emphasis is on localized clustering or patterning because this may well contain the most useful information. A geographical cluster is different from a high concentration as it is generally second order, involving the factoring in of the distribution of something else. == Geographical cluster detection == Identifying geographical clusters can be an important stage in a geographical analysis. Mapping the locations of unusual concentrations may help identify their causes. Some techniques include the Geographical Analysis Machine and Besag and Newell's cluster detection method.
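The difference between a high concentration and an excess relative to something else can be made concrete with a small sketch. It is not the Geographical Analysis Machine or Besag and Newell's method; it is a deliberately simple stand-in that assumes counts per region follow a Poisson distribution whose mean is proportional to the population at risk. The function names, the significance level, and the sample data are all hypothetical.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(
        math.exp(-expected) * expected ** i / math.factorial(i)
        for i in range(observed)
    )
    return max(0.0, 1.0 - below)


def flag_excess_regions(regions, alpha=0.01):
    """regions maps a name to (population_at_risk, observed_count)."""
    total_population = sum(pop for pop, _ in regions.values())
    total_cases = sum(cases for _, cases in regions.values())
    overall_rate = total_cases / total_population
    flagged = []
    for name, (population, cases) in regions.items():
        expected = population * overall_rate  # count expected from population alone
        if cases > expected and poisson_upper_tail(cases, expected) < alpha:
            flagged.append(name)
    return flagged


# Made-up data: region D has the most cases, but region C has the excess.
sample_regions = {
    "A": (1000, 10),
    "B": (1000, 12),
    "C": (1000, 40),
    "D": (5000, 50),
}
```

With this data only region C is flagged: region D has the highest raw count, but that count is below what its larger population would lead one to expect.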

Promoter based genetic algorithm

The promoter based genetic algorithm (PBGA) is a genetic algorithm for neuroevolution developed by F. Bellas and R.J. Duro in the Integrated Group for Engineering Research (GII) at the University of Coruña, in Spain. It evolves variable size feedforward artificial neural networks (ANN) that are encoded into sequences of genes for constructing a basic ANN unit. Each of these blocks is preceded by a gene promoter acting as an on/off switch that determines if that particular unit will be expressed or not. == PBGA basics == The basic unit in the PBGA is a neuron with all of its inbound connections. The genotype of a basic unit is a set of real-valued weights followed by the parameters of the neuron and preceded by an integer-valued field that determines the promoter gene value and, consequently, the expression of the unit. By concatenating units of this type we can construct the whole network. With this encoding, information that is not expressed is still carried by the genotype during evolution but is shielded from direct selective pressure. This maintains diversity in the population, which has been a design premise for this algorithm. Therefore, a clear difference is established between the search space and the solution space, permitting information learned and encoded into the genotypic representation to be preserved by disabling promoter genes. == Results == The PBGA was originally presented within the field of autonomous robotics, in particular in the real-time learning of environment models of the robot. It has been used inside the Multilevel Darwinist Brain (MDB) cognitive mechanism developed in the GII for online learning in real robots. Another paper shows that applying the PBGA together with an external memory that stores the successfully obtained world models is an optimal strategy for adaptation in dynamic environments.
Recently, the PBGA has provided results that outperform other neuroevolutionary algorithms in non-stationary problems, where the fitness function varies in time.
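The promoter-gated encoding described under PBGA basics can be sketched as a small data structure. This is only an illustration of the on/off idea: the field names, the single `bias` parameter standing in for "the parameters of the neuron", and the helper functions are assumptions, and the way expressed units are wired into a feedforward network is not modelled here.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Unit:
    promoter: int          # integer-valued promoter gene: 0 = silent, nonzero = expressed
    weights: List[float]   # real-valued weights of the neuron's inbound connections
    bias: float            # assumed stand-in for the neuron's parameters


def express(genotype):
    """Return the units that take part in the phenotype (the network)."""
    return [unit for unit in genotype if unit.promoter != 0]


def toggle_promoter(genotype, index):
    """Mutation that flips one promoter; weights and parameters are left untouched."""
    unit = genotype[index]
    flipped = replace(unit, promoter=0 if unit.promoter else 1)
    return genotype[:index] + [flipped] + genotype[index + 1:]


genotype = [
    Unit(1, [0.5, -1.2], 0.1),
    Unit(0, [0.9, 0.3], -0.4),   # silent: carried along but not expressed
    Unit(1, [-0.7, 0.8], 0.0),
]
```

Because a silenced unit keeps its weights, switching its promoter back on restores previously learned information, which is the separation between search space and solution space that the text describes.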

Proper generalized decomposition

The proper generalized decomposition (PGD) is an iterative numerical method for solving boundary value problems (BVPs), that is, partial differential equations constrained by a set of boundary conditions, such as Poisson's equation or Laplace's equation. The PGD algorithm computes an approximation of the solution of the BVP by successive enrichment. This means that, in each iteration, a new component (or mode) is computed and added to the approximation. In principle, the more modes obtained, the closer the approximation is to its theoretical solution. Unlike POD principal components, PGD modes are not necessarily orthogonal to each other. By selecting only the most relevant PGD modes, a reduced order model of the solution is obtained. Because of this, PGD is considered a dimensionality reduction algorithm. == Description == The proper generalized decomposition is a method characterized by a variational formulation of the problem, a discretization of the domain in the style of the finite element method, the assumption that the solution can be approximated as a separated representation, and a numerical greedy algorithm to find the solution. === Variational formulation === In the proper generalized decomposition method, the variational formulation involves translating the problem into a format where the solution can be approximated by minimizing (or sometimes maximizing) a functional. A functional is a scalar quantity that depends on a function, which in this case represents our problem. The most commonly implemented variational formulation in PGD is the Bubnov-Galerkin method. This method is chosen for its ability to provide an approximate solution to complex problems, such as those described by partial differential equations (PDEs). In the Bubnov-Galerkin approach, the idea is to project the problem onto a space spanned by a finite number of basis functions. These basis functions are chosen to approximate the solution space of the problem.
In the Bubnov-Galerkin method, we seek an approximate solution that satisfies the integral form of the PDEs over the domain of the problem. This is different from directly solving the differential equations. By doing so, the method transforms the problem into finding the coefficients that best fit this integral equation in the chosen function space. While the Bubnov-Galerkin method is prevalent, other variational formulations are also used in PGD, depending on the specific requirements and characteristics of the problem, such as: Petrov-Galerkin Method: This method is similar to the Bubnov-Galerkin approach but differs in the choice of test functions. In the Petrov-Galerkin method, the test functions (used to project the residual of the differential equation) are different from the trial functions (used to approximate the solution). This can lead to improved stability and accuracy for certain types of problems. Collocation Method: In collocation methods, the differential equation is satisfied at a finite number of points in the domain, known as collocation points. This approach can be simpler and more direct than the integral-based methods like Galerkin's, but it may also be less stable for some problems. Least Squares Method: This approach involves minimizing the square of the residual of the differential equation over the domain. It is particularly useful when dealing with problems where traditional methods struggle with stability or convergence. Mixed Finite Element Method: In mixed methods, additional variables (such as fluxes or gradients) are introduced and approximated along with the primary variable of interest. This can lead to more accurate and stable solutions for certain problems, especially those involving incompressibility or conservation laws. Discontinuous Galerkin Method: This is a variant of the Galerkin method where the solution is allowed to be discontinuous across element boundaries. 
This method is particularly useful for problems with sharp gradients or discontinuities. === Domain discretization === The discretization of the domain is a well defined set of procedures that cover (a) the creation of finite element meshes, (b) the definition of basis functions on reference elements (also called shape functions) and (c) the mapping of reference elements onto the elements of the mesh. === Separate representation === PGD assumes that the solution u of a (multidimensional) problem can be approximated as a separated representation of the form $\mathbf{u}\approx\mathbf{u}^{N}(x_{1},x_{2},\ldots,x_{d})=\sum_{i=1}^{N}\mathbf{X_{1}}_{i}(x_{1})\cdot\mathbf{X_{2}}_{i}(x_{2})\cdots\mathbf{X_{d}}_{i}(x_{d}),$ where the number of addends N and the functional products X1(x1), X2(x2), ..., Xd(xd), each depending on a variable (or variables), are unknown beforehand. === Greedy algorithm === The solution is sought by applying a greedy algorithm, usually the fixed point algorithm, to the weak formulation of the problem. For each iteration i of the algorithm, a mode of the solution is computed. Each mode consists of a set of numerical values of the functional products X1(x1), ..., Xd(xd), which enrich the approximation of the solution. Due to the greedy nature of the algorithm, the term 'enrich' is used rather than 'improve', since some modes may actually worsen the approximation. The number of computed modes required to obtain an approximation of the solution below a certain error threshold depends on the stopping criterion of the iterative algorithm. == Features == PGD is suitable for solving high-dimensional problems, since it overcomes the limitations of classical approaches. In particular, PGD avoids the curse of dimensionality, as solving decoupled problems is computationally much less expensive than solving multidimensional problems.
Therefore, PGD makes it possible to recast parametric problems in a multidimensional framework by setting the parameters of the problem as extra coordinates: $\mathbf{u}\approx\mathbf{u}^{N}(x_{1},\ldots,x_{d};k_{1},\ldots,k_{p})=\sum_{i=1}^{N}\mathbf{X_{1}}_{i}(x_{1})\cdots\mathbf{X_{d}}_{i}(x_{d})\cdot\mathbf{K_{1}}_{i}(k_{1})\cdots\mathbf{K_{p}}_{i}(k_{p}),$ where a series of functional products K1(k1), K2(k2), ..., Kp(kp), each depending on a parameter (or parameters), has been incorporated into the equation. In this case, the obtained approximation of the solution is called a computational vademecum: a general meta-model containing all the particular solutions for every possible value of the involved parameters. == Sparse Subspace Learning == The Sparse Subspace Learning (SSL) method uses hierarchical collocation to approximate the numerical solution of parametric models. Compared with traditional projection-based reduced order modeling, the use of collocation enables a non-intrusive approach based on sparse adaptive sampling of the parametric space. This makes it possible to recover the low-dimensional structure of the parametric solution subspace while also learning the functional dependency on the parameters in explicit form. A sparse low-rank approximate tensor representation of the parametric solution can be built through an incremental strategy that only needs to have access to the output of a deterministic solver. Non-intrusiveness makes this approach straightforwardly applicable to challenging problems characterized by nonlinearity or non-affine weak forms.
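The greedy enrichment with a fixed point iteration described in the Description section can be illustrated on a much simpler task than a boundary value problem. The sketch below assumes the two-dimensional field is already known on a grid and builds a separated approximation of it in the least-squares sense, one mode X_i(x)·Y_i(y) at a time; an actual PGD solver would instead apply the same alternating idea to the weak form of the equations. All function names and the sample field are hypothetical.

```python
import math


def frobenius_norm(matrix):
    return math.sqrt(sum(value * value for row in matrix for value in row))


def rank_one_mode(residual, iterations=200, tolerance=1e-12):
    """Alternating fixed-point iteration for one separated mode X(x) * Y(y)."""
    rows, cols = len(residual), len(residual[0])
    x_mode, y_mode = [0.0] * rows, [1.0] * cols
    for _ in range(iterations):
        y_norm = sum(v * v for v in y_mode)
        if y_norm == 0.0:
            break
        # Fix Y and solve for X (least squares), then fix X and solve for Y.
        new_x = [sum(residual[j][k] * y_mode[k] for k in range(cols)) / y_norm
                 for j in range(rows)]
        x_norm = sum(v * v for v in new_x)
        if x_norm == 0.0:
            x_mode = new_x
            break
        new_y = [sum(residual[j][k] * new_x[j] for j in range(rows)) / x_norm
                 for k in range(cols)]
        change = max(abs(a - b) for a, b in zip(new_y, y_mode))
        x_mode, y_mode = new_x, new_y
        if change < tolerance:
            break
    return x_mode, y_mode


def greedy_separated_approximation(field, n_modes):
    """Enrich the approximation mode by mode; return the modes and residual norms."""
    residual = [row[:] for row in field]
    modes, errors = [], []
    for _ in range(n_modes):
        x_mode, y_mode = rank_one_mode(residual)
        modes.append((x_mode, y_mode))
        for j, x_value in enumerate(x_mode):
            for k, y_value in enumerate(y_mode):
                residual[j][k] -= x_value * y_value
        errors.append(frobenius_norm(residual))
    return modes, errors


# Sample field that is exactly a sum of two separable terms (made-up example).
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
field = [[(1 + x) * (1 + y) + 0.5 * math.cos(math.pi * x) * math.cos(math.pi * y)
          for y in grid] for x in grid]
```

Because the sample field is a sum of two separable terms, two modes reproduce it to rounding error; for general fields the residual norm is what a stopping criterion would monitor.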

Coherent extrapolated volition

Coherent extrapolated volition (CEV) is a theoretical framework in the field of AI alignment describing an approach by which an artificial superintelligence (ASI) would act on a benevolent supposition of what humans would want if they were more knowledgeable, more rational, had more time to think, and had matured together as a society, as opposed to humanity's current individual or collective preferences. It was proposed by Eliezer Yudkowsky in 2004 as part of his work on friendly AI. == Concept == CEV proposes that an advanced AI system should derive its goals by extrapolating the idealized volition of humanity. This means aggregating and projecting human preferences into a coherent utility function that reflects what people would desire under ideal epistemic and moral conditions. The aim is to ensure that AI systems are aligned with humanity's true interests, rather than with transient or poorly informed preferences. In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted. == Debate == Yudkowsky and Nick Bostrom note that CEV has several interesting properties. It is designed to be humane and self-correcting, by capturing the source of human values instead of trying to list them. It avoids the difficulty of laying down an explicit, fixed list of rules. It encapsulates moral growth, preventing flawed current moral beliefs from getting locked in. It limits the influence that a small group of programmers can have on what the ASI would value, thus also reducing the incentives to build ASI first. And it keeps humanity in charge of its destiny. CEV also faces significant theoretical and practical challenges. 
Bostrom notes that CEV has "a number of free parameters that could be specified in various ways, yielding different versions of the proposal." One such parameter is the extrapolation base (whose extrapolated volition is taken into account). For example, whether it should include people with severe dementia, patients in a vegetative state, foetuses, or embryos. He also notes that if CEV's extrapolation base only includes humans, there is a risk that the result would be ungenerous toward other animals and digital minds. One possible solution would be to include a mechanism to expand CEV's extrapolation base. == Variants and alternatives == A proposed theoretical alternative to CEV is to rely on an artificial superintelligence's superior cognitive capabilities to figure out what is morally right, and let it act accordingly. It is also possible to combine both techniques, for instance with the ASI following CEV except when it is morally impermissible. In another review, a philosophical analysis explores CEV through the lens of social trust in autonomous systems. Drawing on Anthony Giddens' concept of "active trust", the author proposes an evolution of CEV into "Coherent, Extrapolated and Clustered Volition" (CECV). This formulation aims to better reflect the moral preferences of diverse cultural groups, thus offering a more pragmatic ethical framework for designing AI systems that earn public trust while accommodating societal diversity.

Linear genetic programming

"Linear genetic programming" is unrelated to "linear programming". Linear genetic programming (LGP) is a particular method of genetic programming wherein computer programs in a population are represented as a sequence of register-based instructions from an imperative programming language or machine language. The adjective "linear" stems from the fact that each LGP program is a sequence of instructions and the sequence of instructions is normally executed sequentially. Like in other programs, the data flow in LGP can be modeled as a graph that will visualize the potential multiple usage of register contents and the existence of structurally noneffective code (introns) which are two main differences of this genetic representation from the more common tree-based genetic programming (TGP) variant. Like other Genetic Programming methods, Linear genetic programming requires the input of data to run the program population on. Then, the output of the program (its behaviour) is judged against some target behaviour, using a fitness function. However, LGP is generally more efficient than tree genetic programming due to its two main differences mentioned above: Intermediate results (stored in registers) can be reused and a simple intron removal algorithm exists that can be executed to remove all non-effective code prior to programs being run on the intended data. These two differences often result in compact solutions and substantial computational savings compared to the highly constrained data flow in trees and the common method of executing all tree nodes in TGP. Furthermore, LGP naturally has multiple outputs by defining multiple output registers and easily cooperates with control flow operations. Linear genetic programming has been applied in many domains, including system modeling and system control with considerable success. 
Linear genetic programming should not be confused with linear tree programs in tree genetic programming, that is, programs composed of a variable number of unary functions and a single terminal. Note that linear tree GP differs from bit string genetic algorithms since a population may contain programs of different lengths and there may be more than two types of functions or more than two types of terminals. == Examples of LGP programs == Because LGP programs are basically represented by a linear sequence of instructions, they are simpler to read and to operate on than their tree-based counterparts. For example, consider a simple program written to solve a Boolean function problem with 3 inputs (in R1, R2, R3) and one output (in R0). R1, R2, R3 have to be declared as input (read-only) registers, while R0 and R4 are declared as calculation (read-write) registers. This program is very simple, having just 5 instructions, but mutation and crossover operators could work to increase the length of the program, as well as to change the content of each of its instructions. One of its instructions is non-effective, that is, an intron, since it does not impact the output register R0. Recognition of such instructions is the basis for the intron removal algorithm, which is used to analyze code prior to execution. Technically, this happens by copying an individual and then running the intron removal once on the copy. The copy with removed introns is then executed as many times as dictated by the number of training cases. Notably, the original individual is left intact, so as to continue participating in the evolutionary process. It is only the executed copy that is compressed by removing these "structural" introns. Another simple program, this one written in the LGP language Slash/A, looks like a series of instructions separated by slashes. By representing such code in bytecode format, i.e., as an array of bytes each representing a different instruction, one can perform mutation operations simply by changing an element of such an array.
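The intron removal step can be sketched on a small register program. The program below is a hypothetical stand-in written for this illustration (it is not the listing from the original article), using the register layout described above: R1 to R3 are read-only inputs, R0 and R4 are calculation registers, and R0 is the output. The removal procedure scans the program backwards, keeping an instruction only if its destination register is still needed; this simple form assumes there are no branches.

```python
from itertools import product

OPS = {"AND": lambda a, b: a and b, "OR": lambda a, b: a or b}

# Hypothetical 5-instruction program: (destination, operator, source1, source2).
PROGRAM = [
    (0, "AND", 1, 2),   # R0 = R1 AND R2
    (4, "OR", 2, 3),    # R4 = R2 OR R3
    (0, "OR", 0, 4),    # R0 = R0 OR R4
    (4, "AND", 1, 3),   # R4 = R1 AND R3  (structural intron: R4 is not read again)
    (0, "AND", 0, 1),   # R0 = R0 AND R1
]


def remove_introns(program, output_registers=(0,)):
    """Return a copy without structurally non-effective instructions."""
    effective_registers = set(output_registers)
    kept = []
    for dest, op, src1, src2 in reversed(program):
        if dest in effective_registers:
            kept.append((dest, op, src1, src2))
            # The destination is overwritten here, so earlier values of it matter
            # only if this instruction also reads it as a source.
            effective_registers.discard(dest)
            effective_registers.update((src1, src2))
    kept.reverse()
    return kept


def run(program, r1, r2, r3):
    """Execute the instructions sequentially and return the output register R0."""
    registers = {0: False, 1: r1, 2: r2, 3: r3, 4: False}
    for dest, op, src1, src2 in program:
        registers[dest] = OPS[op](registers[src1], registers[src2])
    return registers[0]


EFFECTIVE = remove_introns(PROGRAM)
```

The original `PROGRAM` list is left unchanged, matching the description above in which only the executed copy is compressed while the full individual stays in the population.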