LightGBM

LightGBM

LightGBM, short for Light Gradient-Boosting Machine, is a free and open-source distributed gradient-boosting framework for machine learning, originally developed by Microsoft. It is based on decision tree algorithms and used for ranking, classification and other machine learning tasks. The development focus is on performance and scalability. == Overview == The LightGBM framework supports different algorithms including GBT, GBDT, GBRT, GBM, MART and RF. LightGBM has many of XGBoost's advantages, including sparse optimization, parallel training, multiple loss functions, regularization, bagging, and early stopping. A major difference between the two lies in the construction of trees. LightGBM does not grow a tree level-wise — row by row — as most other implementations do. Instead it grows trees leaf-wise. It will choose the leaf with max delta loss to grow. Besides, LightGBM does not use the widely used sorted-based decision tree learning algorithm, which searches the best split point on sorted feature values, as XGBoost or other implementations do. Instead, LightGBM implements a highly optimized histogram-based decision tree learning algorithm, which yields great advantages on both efficiency and memory consumption. The LightGBM algorithm utilizes two novel techniques called Gradient-Based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) which allow the algorithm to run faster while maintaining a high level of accuracy. LightGBM works on Linux, Windows, and macOS and supports C++, Python, R, and C#. The source code is licensed under MIT License and available on GitHub. == Gradient-based one-side sampling == When using gradient descent, one thinks about the space of possible configurations of the model as a valley, in which the lowest part of the valley is the model which most closely fits the data. In this metaphor, one walks in different directions to learn how much lower the valley becomes. Typically, in gradient descent, one uses the whole set of data to calculate the valley's slopes. However, this commonly used method assumes that every data point is equally informative. By contrast, Gradient-Based One-Side Sampling (GOSS), a method first developed for gradient-boosted decision trees, does not rely on the assumption that all data are equally informative. Instead, it treats data points with smaller gradients (shallower slopes) as less informative by randomly dropping them. This is intended to filter out data which may have been influenced by noise, allowing the model to more accurately model the underlying relationships in the data. == Exclusive feature bundling == Exclusive feature bundling (EFB) is a near-lossless method to reduce the number of effective features. In a sparse feature space many features are nearly exclusive, implying they rarely take nonzero values simultaneously. One-hot encoded features are a perfect example of exclusive features. EFB bundles these features, reducing dimensionality to improve efficiency while maintaining a high level of accuracy. The bundle of exclusive features into a single feature is called an exclusive feature bundle.

TipTop Technologies

TipTop Technologies is a real-time web and social search engine with a platform for semantic analysis of natural language. Tip-Top Search provides results capturing individual and group sentiment, opinions, and experiences there from the content of various sorts such as real-time messages from Twitter or consumer product reviews on Amazon.com. TipTop Technologies and ITC Infotech collaborated to create a search interface suitable for both enterprise and consumer applications. Tip-Top's products are part of the "emerging Web 3.0 applications which use semantic technologies to augment the underlying Web system's functionalities." Their main product is 360, an AI tool that incorporates multiple AI applications under one wing. Jonathan AlBright professor at Elon University, found videos generated by TipTop Technologies software on YouTube in his research into artificial intelligence, described it as AI-generated "fake news". Through semantic analysis of large data sets, TipTop gleaned behavioral insights from Tweets around events like Halloween, Thanksgiving, Holiday Gifting, the Super Bowl, and the Oscar Nominees for the Academy Awards coverage. Sentiment analysis, concept trend tracking, and real-time market research are other applications included in the TipTop Search product. TipTop's insight engine solves the problem of real-time data noise, and its ability to "sort the 'good tweets' from the 'bad tweets' when it comes to a product, service, or a region..." In addition, products like TipTop Shopping with customizable search widgets bring together consumer reviews, social search, and sentiment analysis enabling product comparisons across attributes like the overall value and aiding purchasing decisions through user-driven product tips and pits. TipTop Finance adds another complexity to real-time search results by incorporating corporate sentiment, company stock tickers, and social media into TipTop's existing social search platform. Additional success applying semantic technologies has been with polling, "if you compare these Gallup results with TipTop, a sentiment engine based on Twitter, the results are not way off. It does surprise you but it tells me that sentiment analysis in case of public opinion about a burning social issue or a famous personality is relatively easier." With the increasing amount of unstructured, opinion-oriented, and user-generated content available on the Web, TipTop's technology aims to make sense of all this data, and deliver it in a useful way for consumer and enterprise users alike. TipTop Technologies is a privately held company with its headquarters in the San Francisco Bay Area, and team members are located globally.

Linguamatics

Linguamatics, headquartered in Cambridge, England, with offices in the United States and UK, is a provider of text mining systems through software licensing and services, primarily for pharmaceutical and healthcare applications. Founded in 2001, the company was purchased by IQVIA in January 2019. == Technology == The company develops enterprise search tools for the life sciences sector. The core natural language processing engine (I2E) uses a federated architecture to incorporate data from 3rd party resources. Initially developed to be used interactively through a graphic user interface, the core software also has an application programming interface that can be used to automate searches. LabKey, Penn Medicine, Atrius Health and Mercy all use Linguamatics software to extract electronic health record data into data warehouses. Linguamatics software is used by 17 of the top 20 global pharmaceutical companies, the US Food and Drug Administration, as well as healthcare providers. == Software community == The core software, "I2E", is used by a number of companies to either extend their own software or to publish their data. Copyright Clearance Center uses I2E to produce searchable indexes of material that would otherwise be unsearchable due to copyright. Thomson Reuters produces Cortellis Informatics Clinical Text Analytics, which depends on I2E to make clinical data accessible and searchable. Pipeline Pilot can integrate I2E as part of a workflow. ChemAxon can be used alongside I2E to allow named entity recognition of chemicals within unstructured data. Data sources include MEDLINE, ClinicalTrials.gov, FDA Drug Labels, PubMed Central, and Patent Abstracts.

Language identification in the limit

Language identification in the limit is a formal model for inductive inference of formal languages, mainly by computers (see machine learning and induction of regular languages). It was introduced by E. Mark Gold in a technical report and a journal article with the same title. In this model, a teacher provides to a learner some presentation (i.e. a sequence of strings) of some formal language. The learning is seen as an infinite process. Each time the learner reads an element of the presentation, it should provide a representation (e.g. a formal grammar) for the language. Gold defines that a learner can identify in the limit a class of languages if, given any presentation of any language in the class, the learner will produce only a finite number of wrong representations, and then stick with the correct representation. However, the learner need not be able to announce its correctness; and the teacher might present a counterexample to any representation arbitrarily long after. Gold defined two types of presentations: Text (positive information): an enumeration of all strings the language consists of. Complete presentation (positive and negative information): an enumeration of all possible strings, each with a label indicating if the string belongs to the language or not. == Learnability == This model is an early attempt to formally capture the notion of learnability. Gold's journal article introduces for contrast the stronger models Finite identification (where the learner has to announce correctness after a finite number of steps), and Fixed-time identification (where correctness has to be reached after an apriori-specified number of steps). A weaker formal model of learnability is the Probably approximately correct learning (PAC) model, introduced by Leslie Valiant in 1984. == Examples == It is instructive to look at concrete examples (in the tables) of learning sessions the definition of identification in the limit speaks about. A fictitious session to learn a regular language L over the alphabet {a,b} from text presentation:In each step, the teacher gives a string belonging to L, and the learner answers a guess for L, encoded as a regular expression. In step 3, the learner's guess is not consistent with the strings seen so far; in step 4, the teacher gives a string repeatedly. After step 6, the learner sticks to the regular expression (ab+ba). If this happens to be a description of the language L the teacher has in mind, it is said that the learner has learned that language.If a computer program for the learner's role would exist that was able to successfully learn each regular language, that class of languages would be identifiable in the limit. Gold has shown that this is not the case. A particular learning algorithm always guessing L to be just the union of all strings seen so far:If L is a finite language, the learner will eventually guess it correctly, however, without being able to tell when. Although the guess didn't change during step 3 to 6, the learner couldn't be sure to be correct.Gold has shown that the class of finite languages is identifiable in the limit, however, this class is neither finitely nor fixed-time identifiable. Learning from complete presentation by telling:In each step, the teacher gives a string and tells whether it belongs to L (green) or not (red, struck-out). Each possible string is eventually classified in this way by the teacher. Learning from complete presentation by request:The learner gives a query string, the teacher tells whether it belongs to L (yes) or not (no); the learner then gives a guess for L, followed by the next query string. In this example, the learner happens to query in each step just the same string as given by the teacher in example 3.In general, Gold has shown that each language class identifiable in the request-presentation setting is also identifiable in the telling-presentation setting, since the learner, instead of querying a string, just needs to wait until it is eventually given by the teacher. == Gold's theorem == More formally, a language L {\displaystyle L} is a nonempty set, and its elements are called sentences. a language family is a set of languages. a language-learning environment E {\displaystyle E} for a language L {\displaystyle L} is a stream of sentences from L {\displaystyle L} , such that each sentence in L {\displaystyle L} appears at least once. a language learner is a function f {\displaystyle f} that sends a list of sentences to a language. This is interpreted as saying that, after seeing sentences a 1 , a 2 . . . , a n {\displaystyle a_{1},a_{2}...,a_{n}} in that order, the language learner guesses that the language that produces the sentences should be f ( a 1 , . . . , a n ) {\displaystyle f(a_{1},...,a_{n})} . Note that the learner is not obliged to be correct — it could very well guess a language that does not even contain a 1 , . . . , a n {\displaystyle a_{1},...,a_{n}} . a language learner f {\displaystyle f} learns a language L {\displaystyle L} in environment E = ( a 1 , a 2 , . . . ) {\displaystyle E=(a_{1},a_{2},...)} if the learner always guesses L {\displaystyle L} after seeing enough examples from the environment. a language learner f {\displaystyle f} learns a language L {\displaystyle L} if it learns L {\displaystyle L} in any environment E {\displaystyle E} for L {\displaystyle L} . a language family is learnable if there exists a language learner that can learn all languages in the family. Notes: In the context of Gold's theorem, sentences need only be distinguishable. They need not be anything in particular, such as finite strings (as usual in formal linguistics). Learnability is not a concept for individual languages. Any individual language L {\displaystyle L} could be learned by a trivial learner that always guesses L {\displaystyle L} . Learnability is not a concept for individual learners. A language family is learnable if, and only if, there exists some learner that can learn the family. It does not matter how well the learner performs for learning languages outside the family. Gold's theorem is easily bypassed if negative examples are allowed. In particular, the language family { L 1 , L 2 , . . . , L ∞ } {\displaystyle \{L_{1},L_{2},...,L_{\infty }\}} can be learned by a learner that always guesses L ∞ {\displaystyle L_{\infty }} until it receives the first negative example ¬ a n {\displaystyle \neg a_{n}} , where a n ∈ L n + 1 ∖ L n {\displaystyle a_{n}\in L_{n+1}\setminus L_{n}} , at which point it always guesses L n {\displaystyle L_{n}} . == Learnability characterization == Dana Angluin gave the characterizations of learnability from text (positive information) in a 1980 paper. If a learner is required to be effective, then an indexed class of recursive languages is learnable in the limit if there is an effective procedure that uniformly enumerates tell-tales for each language in the class (Condition 1). It is not hard to see that if an ideal learner (i.e., an arbitrary function) is allowed, then an indexed class of languages is learnable in the limit if each language in the class has a tell-tale (Condition 2). == Language classes learnable in the limit == The table shows which language classes are identifiable in the limit in which learning model. On the right-hand side, each language class is a superclass of all lower classes. Each learning model (i.e. type of presentation) can identify in the limit all classes below it. In particular, the class of finite languages is identifiable in the limit by text presentation (cf. Example 2 above), while the class of regular languages is not. Pattern Languages, introduced by Dana Angluin in another 1980 paper, are also identifiable by normal text presentation; they are omitted in the table, since they are above the singleton and below the primitive recursive language class, but incomparable to the classes in between. == Sufficient conditions for learnability == Condition 1 in Angluin's paper is not always easy to verify. Therefore, people come up with various sufficient conditions for the learnability of a language class. See also Induction of regular languages for learnable subclasses of regular languages. === Finite thickness === A class of languages has finite thickness if every non-empty set of strings is contained in at most finitely many languages of the class. This is exactly Condition 3 in Angluin's paper. Angluin showed that if a class of recursive languages has finite thickness, then it is learnable in the limit. A class with finite thickness certainly satisfies MEF-condition and MFF-condition; in other words, finite thickness implies M-finite thickness. === Finite elasticity === A class of languages is said to have finite elasticity if for every infinite sequence of strings s 0 , s 1 , . . . {\displaystyle s_{0},s_{1},...} and every infinite sequence of languages in the class L 1 , L 2 , . . . {\displaystyle L_{1},L_{2},...} , there exists a finite number n such

Multidimensional scaling

Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a data set. MDS is used to translate distances between each pair of n {\textstyle n} objects in a set into a configuration of n {\textstyle n} points mapped into an abstract Cartesian space. More technically, MDS refers to a set of related ordination techniques used in information visualization, in particular to display the information contained in a distance matrix. It is a form of non-linear dimensionality reduction. Given a distance matrix with the distances between each pair of objects in a set, and a chosen number of dimensions, N, an MDS algorithm places each object into N-dimensional space (a lower-dimensional representation) such that the between-object distances are preserved as well as possible. For N = 1, 2, and 3, the resulting points can be visualized on a scatter plot. Core theoretical contributions to MDS were made by James O. Ramsay of McGill University, who is also regarded as the founder of functional data analysis. == Types == MDS algorithms fall into a taxonomy, depending on the meaning of the input matrix: === Classical multidimensional scaling === It is also known as Principal Coordinates Analysis (PCoA), Torgerson Scaling or Torgerson–Gower scaling. It takes an input matrix giving dissimilarities between pairs of items and outputs a coordinate matrix whose configuration minimizes a loss function called strain, which is given by Strain D ( x 1 , x 2 , . . . , x n ) = ( ∑ i , j ( b i j − x i T x j ) 2 ∑ i , j b i j 2 ) 1 / 2 , {\displaystyle {\text{Strain}}_{D}(x_{1},x_{2},...,x_{n})={\Biggl (}{\frac {\sum _{i,j}{\bigl (}b_{ij}-x_{i}^{T}x_{j}{\bigr )}^{2}}{\sum _{i,j}b_{ij}^{2}}}{\Biggr )}^{1/2},} where x i {\displaystyle x_{i}} denote vectors in N-dimensional space, x i T x j {\displaystyle x_{i}^{T}x_{j}} denotes the scalar product between x i {\displaystyle x_{i}} and x j {\displaystyle x_{j}} , and b i j {\displaystyle b_{ij}} are the elements of the matrix B {\displaystyle B} defined on step 2 of the following algorithm, which are computed from the distances. Steps of a Classical MDS algorithm: Classical MDS uses the fact that the coordinate matrix X {\displaystyle X} can be derived by eigenvalue decomposition from B = X X ′ {\textstyle B=XX'} . And the matrix B {\textstyle B} can be computed from proximity matrix D {\textstyle D} by using double centering. Set up the squared proximity matrix D ( 2 ) = [ d i j 2 ] {\textstyle D^{(2)}=[d_{ij}^{2}]} Apply double centering: B = − 1 2 C D ( 2 ) C {\textstyle B=-{\frac {1}{2}}CD^{(2)}C} using the centering matrix C = I − 1 n J n {\textstyle C=I-{\frac {1}{n}}J_{n}} , where n {\textstyle n} is the number of objects, I {\textstyle I} is the n × n {\textstyle n\times n} identity matrix, and J n {\textstyle J_{n}} is an n × n {\textstyle n\times n} matrix of all ones. Determine the m {\textstyle m} largest eigenvalues λ 1 , λ 2 , . . . , λ m {\textstyle \lambda _{1},\lambda _{2},...,\lambda _{m}} and corresponding eigenvectors e 1 , e 2 , . . . , e m {\textstyle e_{1},e_{2},...,e_{m}} of B {\textstyle B} (where m {\textstyle m} is the number of dimensions desired for the output). Now, X = E m Λ m 1 / 2 {\textstyle X=E_{m}\Lambda _{m}^{1/2}} , where E m {\textstyle E_{m}} is the matrix of m {\textstyle m} eigenvectors and Λ m {\textstyle \Lambda _{m}} is the diagonal matrix of m {\textstyle m} eigenvalues of B {\textstyle B} . Classical MDS assumes metric distances. So this is not applicable for direct dissimilarity ratings. === Metric multidimensional scaling (mMDS) === It is a superset of classical MDS that generalizes the optimization procedure to a variety of loss functions and input matrices of known distances with weights and so on. A useful loss function in this context is called stress, which is often minimized using a procedure called stress majorization. Metric MDS minimizes the cost function called “stress” which is a residual sum of squares: Stress D ( x 1 , x 2 , . . . , x n ) = ∑ i ≠ j = 1 , . . . , n ( d i j − ‖ x i − x j ‖ ) 2 . {\displaystyle {\text{Stress}}_{D}(x_{1},x_{2},...,x_{n})={\sqrt {\sum _{i\neq j=1,...,n}{\bigl (}d_{ij}-\|x_{i}-x_{j}\|{\bigr )}^{2}}}.} Metric scaling uses a power transformation with a user-controlled exponent p {\textstyle p} : d i j p {\textstyle d_{ij}^{p}} and − d i j 2 p {\textstyle -d_{ij}^{2p}} for distance. In classical scaling p = 1. {\textstyle p=1.} Non-metric scaling is defined by the use of isotonic regression to nonparametrically estimate a transformation of the dissimilarities. === Non-metric multidimensional scaling (NMDS) === In contrast to metric MDS, non-metric MDS finds both a non-parametric monotonic relationship between the dissimilarities in the item-item matrix and the Euclidean distances between items, and the location of each item in the low-dimensional space. Let d i j {\displaystyle d_{ij}} be the dissimilarity between points i , j {\displaystyle i,j} . Let d ^ i j = ‖ x i − x j ‖ {\displaystyle {\hat {d}}_{ij}=\|x_{i}-x_{j}\|} be the Euclidean distance between embedded points x i , x j {\displaystyle x_{i},x_{j}} . Now, for each choice of the embedded points x i {\displaystyle x_{i}} and is a monotonically increasing function f {\displaystyle f} , define the "stress" function: S ( x 1 , . . . , x n ; f ) = ∑ i < j ( f ( d i j ) − d ^ i j ) 2 ∑ i < j d ^ i j 2 . {\displaystyle S(x_{1},...,x_{n};f)={\sqrt {\frac {\sum _{i

Intrapixel and Interpixel processing

Intrapixel and Interpixel processing is used in the processing of computers graphics, as well as sensors and images in equipment such as cameras. For computer graphics, CMOS sensor processing is done in pixel level. This process includes two general categories: intrapixel processing, where the processing is performed on the individual pixel signals, and interpixel processing, where the processing is performed locally or globally on signals from several pixels. The purpose of interpixel processing is to perform early vision processing, not merely to capture images. Intrapixel and Interpixel processing is an integral part of spatial processing within the earth Mixed Spatial Attraction Model. This also includes use within hyperspectral image processing.

Minimum Population Search

In evolutionary computation, Minimum Population Search (MPS) is a computational method that optimizes a problem by iteratively trying to improve a set of candidate solutions with regard to a given measure of quality. It solves a problem by evolving a small population of candidate solutions by means of relatively simple arithmetical operations. MPS is a metaheuristic as it makes few or no assumptions about the problem being optimized and can search very large spaces of candidate solutions. For problems where finding the precise global optimum is less important than finding an acceptable local optimum in a fixed amount of time, using a metaheuristic such as MPS may be preferable to alternatives such as brute-force search or gradient descent. MPS is used for multidimensional real-valued functions but does not use the gradient of the problem being optimized, which means MPS does not require for the optimization problem to be differentiable as is required by classic optimization methods such as gradient descent and quasi-newton methods. MPS can therefore also be used on optimization problems that are not even continuous, are noisy, change over time, etc. == Background == In a similar way to Differential evolution, MPS uses difference vectors between the members of the population in order to generate new solutions. It attempts to provide an efficient use of function evaluations by maintaining a small population size. If the population size is smaller than the dimensionality of the search space, then the solutions generated through difference vectors will be constrained to the n − 1 {\displaystyle n-1} dimensional hyperplane. A smaller population size will lead to a more restricted subspace. With a population size equal to the dimensionality of the problem ( n = d ) {\displaystyle (n=d)} , the “line/hyperplane points” in MPS will be generated within a d − 1 {\displaystyle d-1} dimensional hyperplane. Taking a step orthogonal to this hyperplane will allow the search process to cover all the dimensions of the search space. Population size is a fundamental parameter in the performance of population-based heuristics. Larger populations promote exploration, but they also allow fewer generations, and this can reduce the chance of convergence. Searching with a small population can increase the chances of convergence and the efficient use of function evaluations, but it can also induce the risk of premature convergence. If the risk of premature convergence can be avoided, then a population-based heuristic could benefit from the efficiency and faster convergence rate of a smaller population. To avoid premature convergence, it is important to have a diversified population. By including techniques for explicitly increasing diversity and exploration, it is possible to have smaller populations with less risk of premature convergence. === Thresheld Convergence === Thresheld Convergence (TC) is a diversification technique which attempts to separate the processes of exploration and exploitation. TC uses a “threshold” function to establish a minimum search step, and managing this step makes it possible to influence the transition from exploration to exploitation, convergence is thus “held” back until the last stages of the search process. The goal of a controlled transition is to avoid an early concentration of the population around a few search regions and avoid the loss of diversity which can cause premature convergence. Thresheld Convergence has been successfully applied to several population-based metaheuristics such as Particle Swarm Optimization, Differential evolution, Evolution strategies, Simulated annealing and Estimation of Distribution Algorithms. The ideal case for Thresheld Convergence is to have one sample solution from each attraction basin, and for each sample solution to have the same relative fitness with respect to its local optimum. Enforcing a minimum step aims to achieve this ideal case. In MPS Thresheld Convergence is specifically used to preserve diversity and avoid premature convergence by establishing a minimum search step. By disallowing new solutions which are too close to members of the current population, TC forces a strong exploration during the early stages of the search while preserving the diversity of the (small) population. == Algorithm == A basic variant of the MPS algorithm works by having a population of size equal to the dimension of the problem. New solutions are generated by exploring the hyperplane defined by the current solutions (by means of difference vectors) and performing an additional orthogonal step in order to avoid getting caught in this hyperplane. The step sizes are controlled by the Thresheld Convergence technique, which gradually reduces step sizes as the search process advances. An outline for the algorithm is given below: Generate the first initial population. Allowing these solutions to lie near the bounds of the search space generally gives good results: s k = ( r s 1 ∗ b o u n d 1 / 2 , r s 2 ∗ b o u n d 2 / 2 , . . . , r s n ∗ b o u n d n / 2 ) {\displaystyle s_{k}=(rs_{1}bound_{1}/2,rs_{2}bound_{2}/2,...,rs_{n}bound_{n}/2)} where s k {\displaystyle s_{k}} is the k {\displaystyle k} -th population member, r s i {\displaystyle rs_{i}} are random numbers which can be −1 or 1, and the b o u n d i {\displaystyle bound_{i}} are the lower and upper bounds on each dimension. While a stop condition is not reached: Update threshold convergence values ( m i n _ s t e p {\displaystyle min\_step} and m a x _ s t e p {\displaystyle max\_step} ) Calculate the centroid of the current population ( x c {\displaystyle x_{c}} ) For each member of the population ( x i {\displaystyle x_{i}} ), generate a new offspring as follows: Uniformly generate a scaling factor ( F i {\displaystyle F_{i}} ) between − m a x _ s t e p {\displaystyle -max\_step} and m a x _ s t e p {\displaystyle max\_step} Generate a vector ( x o {\displaystyle x_{o}} ) orthogonal to the difference vector between x i {\displaystyle x_{i}} and x c {\displaystyle x_{c}} Calculate a scaling factor for the orthogonal vector: m i n _ o r t h = s q r t ( m a x ( m i n _ s t e p 2 − F i 2 , 0 ) ) {\displaystyle min\_orth=sqrt(max(min\_step^{2}-F_{i}^{2},0))} m a x _ o r t h = s q r t ( m a x ( m a x _ s t e p 2 − F i 2 , 0 ) ) {\displaystyle max\_orth=sqrt(max(max\_step^{2}-F_{i}^{2},0))} o r t h _ s t e p = u n i f o r m ( m i n _ o r t h , m a x _ o r t h ) {\displaystyle orth\_step=uniform(min\_orth,max\_orth)} Generate the new solution by adding the difference and the orthogonal vectors to the original solution n e w _ s o l u t i o n = x i + F i ∗ ( x i − x c ) ∗ o r t h _ s t e p ∗ x o {\displaystyle new\_solution=x_{i}+F_{i}(x_{i}-x_{c})orth\_stepx_{o}} Pick the best members between the old population and the new one by discarding the least fit members. Return the single best solution or the best population found as the final result.