Best AI Writing Tools

Best AI Writing Tools — hands-on reviews, top picks, pricing, pros and cons and a practical how-to guide on Aizhi.

  • Artificial intelligence arms race

    Artificial intelligence arms race

    A military artificial intelligence arms race is a technological, economic, and military competition between two or more states to develop and deploy advanced AI technologies and lethal autonomous weapons systems (LAWS). The goal is to gain a strategic or tactical advantage over rivals, similar to previous arms races involving nuclear or conventional military technologies. Since the mid-2010s, many analysts have noted the emergence of such an arms race between superpowers for better AI technology and military AI, driven by increasing geopolitical and military tensions. An AI arms race is sometimes placed in the context of an AI Cold War between the United States and China. Several influential figures and publications have emphasized that whoever develops artificial general intelligence (AGI) first could dominate global affairs in the 21st century. Russian President Vladimir Putin stated that the leader in AI will "rule the world." Researchers and experts, such as Leopold Aschenbrenner and Adrian Pecotic respectively, warn that the AGI race between major powers like the U.S. and China could reshape geopolitical power. This includes AI for surveillance, autonomous weapons, decision-making systems, cyber operations, and more. == Terminology == Lethal autonomous weapons systems use artificial intelligence to identify and kill human targets without human intervention. LAWS have colloquially been called "slaughterbots" or "killer robots". Broadly, any competition for superior AI is sometimes framed as an "arms race". Advantages in military AI overlap with advantages in other sectors, as countries pursue both economic and military advantages, as per previous arms races throughout history. == History == In 2014, AI specialist Steve Omohundro warned that "An autonomous weapons arms race is already taking place". According to Siemens, worldwide military spending on robotics was US$5.1 billion in 2010 and US$7.5 billion in 2015. China became a top player in artificial intelligence research in the 2010s. According to the Financial Times, in 2016, for the first time, China published more AI research papers than the entire European Union. When restricted to number of AI papers in the top 5% of cited papers, China overtook the United States in 2016 but lagged behind the European Union. 23% of the researchers presenting at the 2017 American Association for the Advancement of Artificial Intelligence (AAAI) conference were Chinese. Eric Schmidt, the former chairman and chief executive officer of Alphabet, has predicted China will be the leading country in AI by 2025. == Risks == One risk concerns the AI race itself, whether or not the race is won by any one group. There are strong incentives for development teams to cut corners with regard to the safety of the system, increasing the risk of critical failures and unintended consequences. This is in part due to the perceived advantage of being the first to develop advanced AI technology. One team appearing to be on the brink of a breakthrough can encourage other teams to take shortcuts, ignore precautions and deploy a system that is less ready. Some argue that using "race" terminology at all in this context can exacerbate this effect. Another potential danger of an AI arms race is the possibility of losing control of the AI systems; the risk is compounded in the case of a race to artificial general intelligence, which may present an existential risk. In 2023, a United States Air Force official reportedly said that during a computer test, a simulated AI drone killed the human character operating it. The USAF later said the official had misspoken and that it never conducted such simulations. A third risk of an AI arms race is whether or not the race is actually won by one group. The concern is regarding the consolidation of power and technological advantage in the hands of one group. A US government report argued that "AI-enabled capabilities could be used to threaten critical infrastructure, amplify disinformation campaigns, and wage war":1, and that "global stability and nuclear deterrence could be undermined".:11 == By nation == === United States === In 2014, former Secretary of Defense Chuck Hagel posited the "Third Offset Strategy" that rapid advances in artificial intelligence will define the next generation of warfare. According to data science and analytics firm Govini, the U.S. Department of Defense (DoD) increased investment in artificial intelligence, big data and cloud computing from $5.6 billion in 2011 to $7.4 billion in 2016. However, the civilian NSF budget for AI saw no increase in 2017. Japan Times reported in 2018 that the United States private investment is around $70 billion per year. The November 2019 'Interim Report' of the United States' National Security Commission on Artificial Intelligence confirmed that AI is critical to US technological military superiority. The U.S. has many military AI combat programs, such as the Sea Hunter autonomous warship, which is designed to operate for extended periods at sea without a single crew member, and to even guide itself in and out of port. From 2017, a temporary US Department of Defense directive requires a human operator to be kept in the loop when it comes to the taking of human life by autonomous weapons systems. On October 31, 2019, the United States Department of Defense's Defense Innovation Board published the draft of a report recommending principles for the ethical use of artificial intelligence by the Department of Defense that would ensure a human operator would always be able to look into the 'black box' and understand the kill-chain process. However, a major concern is how the report will be implemented. The Joint Artificial Intelligence Center (JAIC) (pronounced "jake") is an American organization on exploring the usage of AI (particularly edge computing), Network of Networks, and AI-enhanced communication, for use in actual combat. It is a subdivision of the United States Armed Forces and was created in June 2018. The organization's stated objective is to "transform the US Department of Defense by accelerating the delivery and adoption of AI to achieve mission impact at scale. The goal is to use AI to solve large and complex problem sets that span multiple combat systems; then, ensure the combat Systems and Components have real-time access to ever-improving libraries of data sets and tools." In 2023, Microsoft pitched the DoD to use DALL-E models to train its battlefield management system. OpenAI, the developer of DALL-E, removed the blanket ban on military and warfare use from its usage policies in January 2024. The Biden administration imposed restrictions on the export of advanced NVIDIA chips and GPUs to China in an effort to limit China's progress in artificial intelligence and high-performance computing. The policy aimed to prevent the use of cutting-edge U.S. technology in military or surveillance applications and to maintain a strategic advantage in the global AI race. In 2025, under the second Trump administration, the United States began a broad deregulation campaign aimed at accelerating growth in sectors critical to artificial intelligence, including nuclear energy, infrastructure, and high-performance computing. The goal was to remove regulatory barriers and attract private investment to boost domestic AI capabilities. This included easing restrictions on data usage, speeding up approvals for AI-related infrastructure projects, and incentivizing innovation in cloud computing and semiconductors. Companies like NVIDIA, Oracle, and Cisco played a central role in these efforts, expanding their AI research, data center capacity, and partnerships to help position the U.S. as a global leader in AI development. ==== Project Maven ==== Project Maven is a Pentagon project involving using machine learning and engineering talent to distinguish people and objects in drone videos, apparently giving the government real-time battlefield command and control, and the ability to track, tag and spy on targets without human involvement. Initially the effort was led by Robert O. Work who was concerned about China's military use of the emerging technology. Reportedly, Pentagon development stops short of acting as an AI weapons system capable of firing on self-designated targets. The project was established in a memo by the U.S. Deputy Secretary of Defense on 26 April 2017. Also known as the Algorithmic Warfare Cross Functional Team, it is, according to Lt. Gen. of the United States Air Force Jack Shanahan in November 2017, a project "designed to be that pilot project, that pathfinder, that spark that kindles the flame front of artificial intelligence across the rest of the [Defense] Department". Its chief, U.S. Marine Corps Col. Drew Cukor, said: "People and computers will work symbiotically to increase the ability of weapon systems to detect objects." Project Maven has been noted by allies, such as Australia's Ian Langford, for the

    Read more →
  • Concept class

    Concept class

    In computational learning theory in mathematics, a concept over a domain X is a total Boolean function over X. A concept class is a class of concepts. Concept classes are a subject of computational learning theory. Concept class terminology frequently appears in model theory associated with probably approximately correct (PAC) learning. In this setting, if one takes a set Y as a set of (classifier output) labels, and X is a set of examples, the map c : X → Y {\displaystyle c:X\to Y} , i.e. from examples to classifier labels (where Y = { 0 , 1 } {\displaystyle Y=\{0,1\}} and where c is a subset of X), c is then said to be a concept. A concept class C {\displaystyle C} is then a collection of such concepts. Given a class of concepts C, a subclass D is reachable if there exists a sample s such that D contains exactly those concepts in C that are extensions to s. Not every subclass is reachable. == Background == A sample s {\displaystyle s} is a partial function from X {\displaystyle X} to { 0 , 1 } {\displaystyle \{0,1\}} . Identifying a concept with its characteristic function mapping X {\displaystyle X} to { 0 , 1 } {\displaystyle \{0,1\}} , it is a special case of a sample. Two samples are consistent if they agree on the intersection of their domains. A sample s ′ {\displaystyle s'} extends another sample s {\displaystyle s} if the two are consistent and the domain of s {\displaystyle s} is contained in the domain of s ′ {\displaystyle s'} . == Examples == Suppose that C = S + ( X ) {\displaystyle C=S^{+}(X)} . Then: the subclass { { x } } {\displaystyle \{\{x\}\}} is reachable with the sample s = { ( x , 1 ) } {\displaystyle s=\{(x,1)\}} ; the subclass S + ( Y ) {\displaystyle S^{+}(Y)} for Y ⊆ X {\displaystyle Y\subseteq X} are reachable with a sample that maps the elements of X − Y {\displaystyle X-Y} to zero; the subclass S ( X ) {\displaystyle S(X)} , which consists of the singleton sets, is not reachable. == Applications == Let C {\displaystyle C} be some concept class. For any concept c ∈ C {\displaystyle c\in C} , we call this concept 1 / d {\displaystyle 1/d} -good for a positive integer d {\displaystyle d} if, for all x ∈ X {\displaystyle x\in X} , at least 1 / d {\displaystyle 1/d} of the concepts in C {\displaystyle C} agree with c {\displaystyle c} on the classification of x {\displaystyle x} . The fingerprint dimension F D ( C ) {\displaystyle FD(C)} of the entire concept class C {\displaystyle C} is the least positive integer d {\displaystyle d} such that every reachable subclass C ′ ⊆ C {\displaystyle C'\subseteq C} contains a concept that is 1 / d {\displaystyle 1/d} -good for it. This quantity can be used to bound the minimum number of equivalence queries needed to learn a class of concepts according to the following inequality: F D ( C ) − 1 ≤ # E Q ( C ) ≤ ⌈ F D ( C ) ln ⁡ ( | C | ) ⌉ {\textstyle FD(C)-1\leq \#EQ(C)\leq \lceil FD(C)\ln(|C|)\rceil } .

    Read more →
  • World Programming System

    World Programming System

    The World Programming System, also known as WPS Analytics or WPS, is a software product developed by a company called World Programming (acquired by Altair Engineering). WPS Analytics supports users of mixed ability to access and process data and to perform data science tasks. It has interactive visual programming tools using data workflows, and it has coding tools supporting the use of the SAS language mixed with Python, R and SQL. == About == WPS can use programs written in the language of SAS without the need for translating them into any other language. In this regard WPS is compatible with the SAS system. WPS has a built-in language interpreter able to process the language of SAS and produce similar results. WPS is available to run on z/OS, Windows, macOS, Linux (x86, Armv8 64-bit, IBM Power LE, IBM Z), and AIX. On all supported platforms, programs written in the language of SAS can be executed from a WPS command line interface, often referred to as running in batch mode. WPS can also be used from a graphical user interface known as the WPS Workbench for managing, editing and running programs written in the language of SAS. The WPS Workbench user interface is based on Eclipse. WPS version 4 (released in March 2018) introduced a drag-and-drop workflow canvas providing interactive blocks for data retrieval, blending and preparation, data discovery and profiling, predictive modelling powered by machine learning algorithms, model performance validation and scorecards. WPS version 3 (released in February 2012) provided a new client/server architecture that allows the WPS Workbench GUI to execute SAS programs on remote server installations of WPS in a network or cloud. The resulting output, data sets, logs, etc., can then all be viewed and manipulated from inside the Workbench as if the workloads had been executed locally. SAS programs do not require any special language statements to use this feature. == Summary of main features == Runs on Windows, macOS, z/OS, Linux (x86, Armv8 64-bit, IBM Power LE, IBM Z), and AIX An integrated development environment based on Eclipse for Linux, macOS and Windows. Support for language of SAS elements. Support for the language of SAS Macros. Matrix Programming support using PROC IML. Support for generating band plots, bar charts, box plots, bubble plots, contour plots, dendrogram plots, ellipse plots, fringe plots, heat maps, high-low plots, histograms, loess plots, needle plots, pie charts, penalised b-spline, radar charts, reference lines, scatter plots, series plots, step plots, regression plots and vector plots. Support for statistical procedures ACECLUS, ASSOCRULES, ANOVA, BIN, BOXPLOT, CANCORR, CANDISC, CLUSTER, CORRESP, DISCRIM, DISTANCE, FACTOR, FASTCLUS, FREQ, GAM, GANNO, GENMOD, GLIMMIX, GLM, GLMMOD, GLMSELECT, ICLIFETEST, KDE, LIFEREG, LIFETEST, LOESS, LOGISTIC, MDS, MEANS, MI, MIANALYSE, MIXED, MODECLUS, NESTED, NLIN, NPAR1WAY, PHREG, PLAN, PLS, POWER, PRINCOMP, PROBIT, QUANTREG, RBF, REG, ROBUSTREG, RSREG, SCORE, SEGMENT, SIMNORMAL, STANDARD, STDSIZE, STDRATE, STEPDISC, SUMMARY, SURVEYMEANS, SURVEYSELECT, TPSPLINE, TRANSREG, TREE, TTEST, UNIVARIATE, VARCLUS, VARCOMP Support for time series procedures ARIMA, AUTOREG, ESM, EXPAND, FORECAST, LOAN, SEVERITY, SPECTRA, TIMESERIES, X12 Support for machine learning procedures DECISIONFOREST, DECISIONTREE, GMM, MLP, OPTIMALBIN, SEGMENT, SVM Support for ODS. Reads and writes SAS datasets (compressed or uncompressed). Access: Actian Matrix (previously known as ParAccel), DASD, DB2, Excel, Greenplum, Hadoop, Informix, Kognitio Archived 2012-08-24 at the Wayback Machine, MariaDB, MySQL, Netezza, ODBC, OLEDB, Oracle, PostgreSQL, SAND, Snowflake, SPSS/PSPP, SQL Server, Sybase, Sybase IQ, Teradata, VSAM, Vertica and XML. Support for SAS Tape Format. Direct output of reports to CSV, PDF and HTML. Support to connect WPS systems programmatically, remote submit parts of a program to execute on connected remote servers, upload and download data between the connected systems. Support for Hadoop Support for R Support for Python == Industry recognition == Gartner recognized World Programming in their Cool Vendors in Data Science, 2014 Report. == Lawsuit == In 2010 World Programming defended its use of the language of SAS in the High Court of England and Wales in SAS Institute Inc. v World Programming Ltd. The software was the subject of a lawsuit by SAS Institute. The EU Court of Justice ruled in favor of World Programming, stating that the copyright protection does not extend to the software functionality, the programming language used and the format of the data files used by the program. It stated that there is no copyright infringement when a company which does not have access to the source code of a program studies, observes and tests that program to create another program with the same functionality.

    Read more →
  • State–action–reward–state–action

    State–action–reward–state–action

    State–action–reward–state–action (SARSA) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning. It was proposed by Rummery and Niranjan in a technical note with the name "Modified Connectionist Q-Learning" (MCQ-L). The alternative name SARSA, proposed by Rich Sutton, was only mentioned as a footnote. This name reflects the fact that the main function for updating the Q-value depends on the current state of the agent "S1", the action the agent chooses "A1", the reward "R2" the agent gets for choosing this action, the state "S2" that the agent enters after taking that action, and finally the next action "A2" the agent chooses in its new state. The acronym for the quintuple (St, At, Rt+1, St+1, At+1) is SARSA. Some authors use a slightly different convention and write the quintuple (St, At, Rt, St+1, At+1), depending on which time step the reward is formally assigned. The rest of the article uses the former convention. == Algorithm == Q new ( S t , A t ) ← ( 1 − α ) Q ( S t , A t ) + α [ R t + 1 + γ Q ( S t + 1 , A t + 1 ) ] {\displaystyle Q^{\textrm {new}}(S_{t},A_{t})\leftarrow (1-\alpha )Q(S_{t},A_{t})+\alpha \,[R_{t+1}+\gamma \,Q(S_{t+1},A_{t+1})]} A SARSA agent interacts with the environment and updates the policy based on actions taken, hence this is known as an on-policy learning algorithm. The Q value for a state-action is updated by an error, adjusted by the learning rate α. Q values represent the possible reward received in the next time step for taking action a in state s, plus the discounted future reward received from the next state-action observation. Watkin's Q-learning updates an estimate of the optimal state-action value function Q ∗ {\displaystyle Q^{}} based on the maximum reward of available actions. While SARSA learns the Q values associated with taking the policy it follows itself, Watkin's Q-learning learns the Q values associated with taking the optimal policy while following an exploration/exploitation policy. Some optimizations of Watkin's Q-learning may be applied to SARSA. == Hyperparameters == === Learning rate (alpha) === The learning rate determines to what extent newly acquired information overrides old information. A factor of 0 will make the agent not learn anything, while a factor of 1 would make the agent consider only the most recent information. === Discount factor (gamma) === The discount factor determines the importance of future rewards. A discount factor of 0 makes the agent "opportunistic", or "myopic", e.g., by only considering current rewards, while a factor approaching 1 will make it strive for a long-term high reward. If the discount factor meets or exceeds 1, the Q {\displaystyle Q} values may diverge. === Initial conditions (Q(S0, A0)) === Since SARSA is an iterative algorithm, it implicitly assumes an initial condition before the first update occurs. A high (infinite) initial value, also known as "optimistic initial conditions", can encourage exploration: no matter what action takes place, the update rule causes it to have higher values than the other alternative, thus increasing their choice probability. In 2013 it was suggested that the first reward r {\displaystyle r} could be used to reset the initial conditions. According to this idea, the first time an action is taken the reward is used to set the value of Q {\displaystyle Q} . This allows immediate learning in case of fixed deterministic rewards. This resetting-of-initial-conditions (RIC) approach seems to be consistent with human behavior in repeated binary choice experiments.

    Read more →
  • YNAB

    YNAB

    You Need a Budget (YNAB) (pronounced ) is an online personal budgeting program based on the envelope system developed by a privately owned American company of the same name. It is available via any web browser or a mobile app. == History == The program was initially developed as standalone software in 2004 by Jesse Mecham, while he was in college pursuing his master's degree in accounting, after he and his wife faced financial difficulty and decided to improve their budgeting. It evolved from a spreadsheet that he created for the budgeting process. The acronym stands for "you need a budget." In 2015 they changed their licensing model to software as a service. In 2020, YNAB had 115 employees, all working remotely. == Overview == The service encourages users to follow four principles or "rules": Give every dollar a job: Each dollar in a budget is allocated to a specific purpose. This concept is also called zero-based budgeting. Embrace true expenses: All expenses are planned for, so that there are no surprises. Roll with the punches: Being flexible when there is overspending. Age your money: Keeping money in your budget without immediately spending it. Users can either import transactions automatically from their financial institutions or input them manually. The software also displays financial reports to keep users informed about their finances at a glance. == Awards and recognition == YNAB has been named one of the best budgeting apps by U.S. News & World Report, Kiplinger's Personal Finance, CNN, HuffPost, CNBC, and hundreds of other financial reporting outlets. The Wall Street Journal – Best budgeting app for hands-on budgeters. Forbes – Best Budgeting Apps Money – Best budgeting app for college students. Lifehacker – Most popular personal finance software. Wirecutter – "Great pick for hard-core budgeters". Investopedia – Best overall budgeting app.

    Read more →
  • Unique negative dimension

    Unique negative dimension

    Unique negative dimension (UND) is a complexity measure for the model of learning from positive examples. The unique negative dimension of a class C {\displaystyle C} of concepts is the size of the maximum subclass D ⊆ C {\displaystyle D\subseteq C} such that for every concept c ∈ D {\displaystyle c\in D} , we have ∩ ( D ∖ { c } ) ∖ c {\displaystyle \cap (D\setminus \{c\})\setminus c} is nonempty. This concept was originally proposed by M. Gereb-Graus in "Complexity of learning from one-side examples", Technical Report TR-20-89, Harvard University Division of Engineering and Applied Science, 1989.

    Read more →
  • Generalized canonical correlation

    Generalized canonical correlation

    In statistics, the generalized canonical correlation analysis (gCCA), is a way of making sense of cross-correlation matrices between the sets of random variables when there are more than two sets. While a conventional CCA generalizes principal component analysis (PCA) to two sets of random variables, a gCCA generalizes PCA to more than two sets of random variables. The canonical variables represent those common factors that can be found by a large PCA of all of the transformed random variables after each set underwent its own PCA. == Applications == The Helmert-Wolf blocking (HWB) method of estimating linear regression parameters can find an optimal solution only if all cross-correlations between the data blocks are zero. They can always be made to vanish by introducing a new regression parameter for each common factor. The gCCA method can be used for finding those harmful common factors that create cross-correlation between the blocks. However, no optimal HWB solution exists if the random variables do not contain enough information on all of the new regression parameters.

    Read more →
  • GeWorkbench

    GeWorkbench

    geWorkbench (genomics Workbench) is an open-source software platform for integrated genomic data analysis. It is a desktop application written in the programming language Java. geWorkbench uses a component architecture. As of 2016, there are more than 70 plug-ins available, providing for the visualization and analysis of gene expression, sequence, and structure data. geWorkbench is the Bioinformatics platform of MAGNet, the National Center for the Multi-scale Analysis of Genomic and Cellular Networks, one of the 8 National Centers for Biomedical Computing funded through the NIH Roadmap (NIH Common Fund). Many systems and structure biology tools developed by MAGNet investigators are available as geWorkbench plugins. == Features == Computational analysis tools such as t-test, hierarchical clustering, self-organizing maps, regulatory network reconstruction, BLAST searches, pattern-motif discovery, protein structure prediction, structure-based protein annotation, etc. Visualization of gene expression (heatmaps, volcano plot), molecular interaction networks (through Cytoscape), protein sequence and protein structure data (e.g., MarkUs). Integration of gene and pathway annotation information from curated sources as well as through Gene Ontology enrichment analysis. Component integration through platform management of inputs and outputs. Among data that can be shared between components are expression datasets, interaction networks, sample and marker (gene) sets and sequences. Dataset history tracking - complete record of data sets used and input settings. Integration with 3rd party tools such as GenePattern, Cytoscape, and Genomespace. Demonstrations of each feature described can be found at GeWorkbench-web Tutorials. == Versions == geWorkbench is open-source software that can be downloaded and installed locally. A zip file of the released version Java source is also available. Prepackaged installer versions also exist for Windows, Macintosh, and Linux.

    Read more →
  • Princh

    Princh

    Princh is a Danish software company, which is headquartered in Aarhus, Denmark. Founded in 2015, Princh develops cloud printing and electronic payment products. The company is headquartered in the city of Aarhus. While utilizing a smartphone or web app, users can locate a nearby printer to their current location, get directions to access said printer, and/or authorize a print and pay for the print job in question. The product is available as a native mobile apps for Android and iOS, as well as on web and desktop products for businesses and libraries. The app connects a network of printer owners and users around the world. Princh supports an array of printable files. == History == The company was founded in 2015. The company is currently based in the southern part of Aarhus. The Princh printing service was officially launched on June 23, 2015. Currently, Princh is available as a service in a multitude of locations such as print shops, libraries, hotels, or universities. Princh is a popular printing and payment product among libraries and can among other places be found in Denmark, Sweden, Norway, Germany, United Kingdom, United States, and Canada. == How it works == With the Princh app, users will be able to locate their nearest printer. Once the user is at the printer, the user chooses the document to be printed out and shares it with the Princh app. The user then selects the desired nearby printer entering the printer ID number or scanning the QR-code located on top of the printer, pays electronically and the print job is processed by the printer. Printer owners get access to a personal control panel where they can set printing prices and monitor all Princh activity for their business. == Notes and references ==

    Read more →
  • Oscillatory neural network

    Oscillatory neural network

    An oscillatory neural network (ONN) is an artificial neural network that uses coupled oscillators as neurons. Oscillatory neural networks are closely linked to the Kuramoto model, and are inspired by the phenomenon of neural oscillations in the brain. Oscillatory neural networks have been trained to recognize images. Complex-Valued Oscillatory network has also been shown to store and retrieve multidimensional aperiodic signals. An oscillatory autoencoder has also been demonstrated, which uses a combination of oscillators and rate-coded neurons. A neuron made of two coupled oscillators, one having a fixed and the other having a tunable natural frequency, has been shown able to run logic gates such as XOR that conventional sigmoid neurons cannot.

    Read more →
  • Mixture model

    Mixture model

    In statistics, a mixture model is a probabilistic model for representing the presence of subpopulations within an overall population, without requiring that an observed data set should identify the sub-population to which an individual observation belongs. Formally a mixture model corresponds to the mixture distribution that represents the probability distribution of observations in the overall population. However, while problems associated with "mixture distributions" relate to deriving the properties of the overall population from those of the sub-populations, "mixture models" are used to make statistical inferences about the properties of the sub-populations given only observations on the pooled population, without sub-population identity information. Mixture models are used for clustering, under the name model-based clustering, and also for density estimation. Mixture models should not be confused with models for compositional data, i.e., data whose components are constrained to sum to a constant value (1, 100%, etc.). However, compositional models can be thought of as mixture models, where members of the population are sampled at random. Conversely, mixture models can be thought of as compositional models, where the total size reading population has been normalized to 1. == Structure == === General mixture model === A typical finite-dimensional mixture model is a hierarchical model consisting of the following components: N random variables that are observed, each distributed according to a mixture of K components, with the components belonging to the same parametric family of distributions (e.g., all normal, all Zipfian, etc.) but with different parameters. However, it is also possible to have a finite mixture model where each component belongs to a different parametric family of distributions, for example, a mixture of a multivariate normal distribution and a generalized hyperbolic distribution. N random latent variables specifying the identity of the mixture component of each observation, each distributed according to a K-dimensional categorical distribution A set of K mixture weights, which are probabilities that sum to 1. A set of K parameters, each specifying the parameter of the corresponding mixture component. In many cases, each "parameter" is actually a set of parameters. For example, if the mixture components are Gaussian distributions, there will be a mean and variance for each component. If the mixture components are categorical distributions (e.g., when each observation is a token from a finite alphabet of size V), there will be a vector of V probabilities summing to 1. In addition, in a Bayesian setting, the mixture weights and parameters will themselves be random variables, and prior distributions will be placed over the variables. In such a case, the weights are typically viewed as a K-dimensional random vector drawn from a Dirichlet distribution (the conjugate prior of the categorical distribution), and the parameters will be distributed according to their respective conjugate priors. Mathematically, a basic parametric mixture model can be described as follows: K = number of mixture components N = number of observations θ i = 1 … K = parameter of distribution of observation associated with component i ϕ i = 1 … K = mixture weight, i.e., prior probability of a particular component i ϕ = K -dimensional vector composed of all the individual ϕ 1 … K ; must sum to 1 z i = 1 … N = component of observation i x i = 1 … N = observation i F ( x | θ ) = probability distribution of an observation, parametrized on θ z i = 1 … N ∼ Categorical ⁡ ( ϕ ) x i = 1 … N | z i = 1 … N ∼ F ( θ z i ) {\displaystyle {\begin{array}{lcl}K&=&{\text{number of mixture components}}\\N&=&{\text{number of observations}}\\\theta _{i=1\dots K}&=&{\text{parameter of distribution of observation associated with component }}i\\\phi _{i=1\dots K}&=&{\text{mixture weight, i.e., prior probability of a particular component }}i\\{\boldsymbol {\phi }}&=&K{\text{-dimensional vector composed of all the individual }}\phi _{1\dots K}{\text{; must sum to 1}}\\z_{i=1\dots N}&=&{\text{component of observation }}i\\x_{i=1\dots N}&=&{\text{observation }}i\\F(x|\theta )&=&{\text{probability distribution of an observation, parametrized on }}\theta \\z_{i=1\dots N}&\sim &\operatorname {Categorical} ({\boldsymbol {\phi }})\\x_{i=1\dots N}|z_{i=1\dots N}&\sim &F(\theta _{z_{i}})\end{array}}} In a Bayesian setting, all parameters are associated with random variables, as follows: K , N = as above θ i = 1 … K , ϕ i = 1 … K , ϕ = as above z i = 1 … N , x i = 1 … N , F ( x | θ ) = as above α = shared hyperparameter for component parameters β = shared hyperparameter for mixture weights H ( θ | α ) = prior probability distribution of component parameters, parametrized on α θ i = 1 … K ∼ H ( θ | α ) ϕ ∼ S y m m e t r i c - D i r i c h l e t K ⁡ ( β ) z i = 1 … N | ϕ ∼ Categorical ⁡ ( ϕ ) x i = 1 … N | z i = 1 … N , θ i = 1 … K ∼ F ( θ z i ) {\displaystyle {\begin{array}{lcl}K,N&=&{\text{as above}}\\\theta _{i=1\dots K},\phi _{i=1\dots K},{\boldsymbol {\phi }}&=&{\text{as above}}\\z_{i=1\dots N},x_{i=1\dots N},F(x|\theta )&=&{\text{as above}}\\\alpha &=&{\text{shared hyperparameter for component parameters}}\\\beta &=&{\text{shared hyperparameter for mixture weights}}\\H(\theta |\alpha )&=&{\text{prior probability distribution of component parameters, parametrized on }}\alpha \\\theta _{i=1\dots K}&\sim &H(\theta |\alpha )\\{\boldsymbol {\phi }}&\sim &\operatorname {Symmetric-Dirichlet} _{K}(\beta )\\z_{i=1\dots N}|{\boldsymbol {\phi }}&\sim &\operatorname {Categorical} ({\boldsymbol {\phi }})\\x_{i=1\dots N}|z_{i=1\dots N},\theta _{i=1\dots K}&\sim &F(\theta _{z_{i}})\end{array}}} This characterization uses F and H to describe arbitrary distributions over observations and parameters, respectively. Typically H will be the conjugate prior of F. The two most common choices of F are Gaussian aka "normal" (for real-valued observations) and categorical (for discrete observations). Other common possibilities for the distribution of the mixture components are: Binomial distribution, for the number of "positive occurrences" (e.g., successes, yes votes, etc.) given a fixed number of total occurrences Multinomial distribution, similar to the binomial distribution, but for counts of multi-way occurrences (e.g., yes/no/maybe in a survey) Negative binomial distribution, for binomial-type observations but where the quantity of interest is the number of failures before a given number of successes occurs Poisson distribution, for the number of occurrences of an event in a given period of time, for an event that is characterized by a fixed rate of occurrence Exponential distribution, for the time before the next event occurs, for an event that is characterized by a fixed rate of occurrence Log-normal distribution, for positive real numbers that are assumed to grow exponentially, such as incomes or prices Multivariate normal distribution (aka multivariate Gaussian distribution), for vectors of correlated outcomes that are individually Gaussian-distributed Multivariate Student's t-distribution, for vectors of heavy-tailed correlated outcomes A vector of Bernoulli-distributed values, corresponding, e.g., to a black-and-white image, with each value representing a pixel; see the handwriting-recognition example below === Specific examples === ==== Gaussian mixture model ==== A typical non-Bayesian Gaussian mixture model looks like this: K , N = as above ϕ i = 1 … K , ϕ = as above z i = 1 … N , x i = 1 … N = as above θ i = 1 … K = { μ i = 1 … K , σ i = 1 … K 2 } μ i = 1 … K = mean of component i σ i = 1 … K 2 = variance of component i z i = 1 … N ∼ Categorical ⁡ ( ϕ ) x i = 1 … N ∼ N ( μ z i , σ z i 2 ) {\displaystyle {\begin{array}{lcl}K,N&=&{\text{as above}}\\\phi _{i=1\dots K},{\boldsymbol {\phi }}&=&{\text{as above}}\\z_{i=1\dots N},x_{i=1\dots N}&=&{\text{as above}}\\\theta _{i=1\dots K}&=&\{\mu _{i=1\dots K},\sigma _{i=1\dots K}^{2}\}\\\mu _{i=1\dots K}&=&{\text{mean of component }}i\\\sigma _{i=1\dots K}^{2}&=&{\text{variance of component }}i\\z_{i=1\dots N}&\sim &\operatorname {Categorical} ({\boldsymbol {\phi }})\\x_{i=1\dots N}&\sim &{\mathcal {N}}(\mu _{z_{i}},\sigma _{z_{i}}^{2})\end{array}}} A Bayesian version of a Gaussian mixture model is as follows: K , N = as above ϕ i = 1 … K , ϕ = as above z i = 1 … N , x i = 1 … N = as above θ i = 1 … K = { μ i = 1 … K , σ i = 1 … K 2 } μ i = 1 … K = mean of component i σ i = 1 … K 2 = variance of component i μ 0 , λ , ν , σ 0 2 = shared hyperparameters μ i = 1 … K ∼ N ( μ 0 , λ σ i 2 ) σ i = 1 … K 2 ∼ I n v e r s e - G a m m a ⁡ ( ν , σ 0 2 ) ϕ ∼ S y m m e t r i c - D i r i c h l e t K ⁡ ( β ) z i = 1 … N ∼ Categorical ⁡ ( ϕ ) x i = 1 … N ∼ N ( μ z i , σ z i 2 ) {\displaystyle {\begin{array}{lcl}K,N&=&{\text{as above}}\\\phi _{i=1\dots K},{\boldsymbol {\phi }}&=&{\text{as above}}\\z_{i=1\dots N},x_{i=1\dots N}&=&{\text{as above}}\\\theta _{i=1\

    Read more →
  • Jpred

    Jpred

    Jpred v.4 is the latest version of the JPred Protein Secondary Structure Prediction Server which provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction, that has existed since 1998 in different versions. In addition to protein secondary structure, JPred also makes predictions of solvent accessibility and coiled-coil regions. The JPred service runs up to 134 000 jobs per month and has carried out over 2 million predictions in total for users in 179 countries. == JPred 2 == The static HTML pages of JPred 2 are still available for reference. == JPred 3 == The JPred v3 followed on from previous versions of JPred developed and maintained by James Cuff and Jonathan Barber (see JPred References). This release added new functionality and fixed many bugs. The highlights are: New, friendlier user interface Retrained and optimised version of Jnet (v2) - mean secondary structure prediction accuracy of >81% Batch submission of jobs Better error checking of input sequences/alignments Predictions now (optionally) returned via e-mail Users may provide their own query names for each submission JPred now makes a prediction even when there are no PSI-BLAST hits to the query PS/PDF output now incorporates all the predictions == JPred 4 == The current version of JPred (v4) has the following improvements and updates incorporated: Retrained on the latest UniRef90 and SCOPe/ASTRAL version of Jnet (v2.3.1) - mean secondary structure prediction accuracy of >82%. Upgraded the Web Server to the latest technologies (Bootstrap framework, JavaScript) and updating the web pages – improving the design and usability through implementing responsive technologies. Added RESTful API and mass-submission and results retrieval scripts - resulting in peak throughput above 20,000 predictions per day. Added prediction jobs monitoring tools. Upgraded the results reporting – both, on the web-site, and through the optional email summary reports: improved batch submission, added results summary preview through Jalview results visualization summary in SVG and adding full multiple sequence alignments into the reports. Improved help-pages, incorporating tool-tips, and adding one-page step-by-step tutorials. Sequence residues are categorised or assigned to one of the secondary structure elements, such as alpha-helix, beta-sheet and coiled-coil. Jnet uses two neural networks for its prediction. The first network is fed with a window of 17 residues over each amino acid in the alignment plus a conservation number. It uses a hidden layer of nine nodes and has three output nodes, one for each secondary structure element. The second network is fed with a window of 19 residues (the result of first network) plus the conservation number. It has a hidden layer with nine nodes and has three output nodes.

    Read more →
  • Cyber attribution

    Cyber attribution

    In the area of computer security, cyber attribution is an attribution of cybercrime, i.e., finding who perpetrated a cyberattack. Uncovering a perpetrator may give insights into various security issues, such as infiltration methods, communication channels, etc., and may help in enacting specific countermeasures. Cyber attribution is a costly endeavor requiring considerable resources and expertise in cyber forensic analysis. For governments and other major players dealing with cybercrime would require not only technical solutions, but legal and political ones as well, and for the latter ones cyber attribution is crucial. Attributing a cyberattack is difficult, and of limited interest to companies that are targeted by cyberattacks. In contrast, secret services often have a compelling interest in finding out whether a state is behind the attack. A further challenge in attribution of cyberattacks is the possibility of a false flag attack, where the actual perpetrator makes it appear that someone else caused the attack. Every stage of the attack may leave artifacts, such as entries in log files, that can be used to help determine the attacker's goals and identity. In the aftermath of an attack, investigators often begin by saving as many artifacts as they can find, and then try to determine the attacker.

    Read more →
  • Proximal policy optimization

    Proximal policy optimization

    Proximal policy optimization (PPO) is a reinforcement learning (RL) algorithm for training an intelligent agent. Specifically, it is a policy gradient method, often used for deep RL when the policy network is very large. == History == The predecessor to PPO, Trust Region Policy Optimization (TRPO), was published in 2015. It addressed the instability issue of another algorithm, the Deep Q-Network (DQN), by using the trust region method to limit the KL divergence between the old and new policies. However, TRPO uses the Hessian matrix (a matrix of second derivatives) to enforce the trust region, but the Hessian is inefficient for large-scale problems. PPO was published in 2017. It was essentially an approximation of TRPO that does not require computing the Hessian. The KL divergence constraint was approximated by simply clipping the policy gradient. Since 2018, PPO was the default RL algorithm at OpenAI. PPO has been applied to many areas, such as controlling a robotic arm, beating professional players at Dota 2 (OpenAI Five), and playing Atari games. == TRPO == TRPO, the predecessor of PPO, is an on-policy algorithm. It can be used for environments with either discrete or continuous action spaces. The pseudocode is as follows: Input: initial policy parameters θ 0 {\textstyle \theta _{0}} , initial value function parameters ϕ 0 {\textstyle \phi _{0}} Hyperparameters: KL-divergence limit δ {\textstyle \delta } , backtracking coefficient α {\textstyle \alpha } , maximum number of backtracking steps K {\textstyle K} for k = 0 , 1 , 2 , … {\textstyle k=0,1,2,\ldots } do Collect set of trajectories D k = { τ i } {\textstyle {\mathcal {D}}_{k}=\left\{\tau _{i}\right\}} by running policy π k = π ( θ k ) {\textstyle \pi _{k}=\pi \left(\theta _{k}\right)} in the environment. Compute rewards-to-go R ^ t {\textstyle {\hat {R}}_{t}} . Compute advantage estimates, A ^ t {\textstyle {\hat {A}}_{t}} (using any method of advantage estimation) based on the current value function V ϕ k {\textstyle V_{\phi _{k}}} . Estimate policy gradient as g ^ k = 1 | D k | ∑ τ ∈ D k ∑ t = 0 T ∇ θ log ⁡ π θ ( a t ∣ s t ) | θ k A ^ t {\displaystyle {\hat {g}}_{k}=\left.{\frac {1}{\left|{\mathcal {D}}_{k}\right|}}\sum _{\tau \in {\mathcal {D}}_{k}}\sum _{t=0}^{T}\nabla _{\theta }\log \pi _{\theta }\left(a_{t}\mid s_{t}\right)\right|_{\theta _{k}}{\hat {A}}_{t}} Use the conjugate gradient algorithm to compute x ^ k ≈ H ^ k − 1 g ^ k {\displaystyle {\hat {x}}_{k}\approx {\hat {H}}_{k}^{-1}{\hat {g}}_{k}} where H ^ k {\textstyle {\hat {H}}_{k}} is the Hessian of the sample average KL-divergence. Update the policy by backtracking line search with θ k + 1 = θ k + α j 2 δ x ^ k T H ^ k x ^ k x ^ k {\displaystyle \theta _{k+1}=\theta _{k}+\alpha ^{j}{\sqrt {\frac {2\delta }{{\hat {x}}_{k}^{T}{\hat {H}}_{k}{\hat {x}}_{k}}}}{\hat {x}}_{k}} where j ∈ { 0 , 1 , 2 , … K } {\textstyle j\in \{0,1,2,\ldots K\}} is the smallest value which improves the sample loss and satisfies the sample KL-divergence constraint. Fit value function by regression on mean-squared error: ϕ k + 1 = arg ⁡ min ϕ 1 | D k | T ∑ τ ∈ D k ∑ t = 0 T ( V ϕ ( s t ) − R ^ t ) 2 {\displaystyle \phi _{k+1}=\arg \min _{\phi }{\frac {1}{\left|{\mathcal {D}}_{k}\right|T}}\sum _{\tau \in {\mathcal {D}}_{k}}\sum _{t=0}^{T}\left(V_{\phi }\left(s_{t}\right)-{\hat {R}}_{t}\right)^{2}} typically via some gradient descent algorithm. == PPO == The pseudocode is as follows: Input: initial policy parameters θ 0 {\textstyle \theta _{0}} , initial value function parameters ϕ 0 {\textstyle \phi _{0}} for k = 0 , 1 , 2 , … {\textstyle k=0,1,2,\ldots } do Collect set of trajectories D k = { τ i } {\textstyle {\mathcal {D}}_{k}=\left\{\tau _{i}\right\}} by running policy π k = π ( θ k ) {\textstyle \pi _{k}=\pi \left(\theta _{k}\right)} in the environment. Compute rewards-to-go R ^ t {\textstyle {\hat {R}}_{t}} . Compute advantage estimates, A ^ t {\textstyle {\hat {A}}_{t}} (using any method of advantage estimation) based on the current value function V ϕ k {\textstyle V_{\phi _{k}}} . Update the policy by maximizing the PPO-Clip objective: θ k + 1 = arg ⁡ max θ 1 | D k | T ∑ τ ∈ D k ∑ t = 0 T min ( π θ ( a t ∣ s t ) π θ k ( a t ∣ s t ) A π θ k ( s t , a t ) , g ( ϵ , A π θ k ( s t , a t ) ) ) {\displaystyle \theta _{k+1}=\arg \max _{\theta }{\frac {1}{\left|{\mathcal {D}}_{k}\right|T}}\sum _{\tau \in {\mathcal {D}}_{k}}\sum _{t=0}^{T}\min \left({\frac {\pi _{\theta }\left(a_{t}\mid s_{t}\right)}{\pi _{\theta _{k}}\left(a_{t}\mid s_{t}\right)}}A^{\pi _{\theta _{k}}}\left(s_{t},a_{t}\right),\quad g\left(\epsilon ,A^{\pi _{\theta _{k}}}\left(s_{t},a_{t}\right)\right)\right)} typically via stochastic gradient ascent with Adam. Fit value function by regression on mean-squared error: ϕ k + 1 = arg ⁡ min ϕ 1 | D k | T ∑ τ ∈ D k ∑ t = 0 T ( V ϕ ( s t ) − R ^ t ) 2 {\displaystyle \phi _{k+1}=\arg \min _{\phi }{\frac {1}{\left|{\mathcal {D}}_{k}\right|T}}\sum _{\tau \in {\mathcal {D}}_{k}}\sum _{t=0}^{T}\left(V_{\phi }\left(s_{t}\right)-{\hat {R}}_{t}\right)^{2}} typically via some gradient descent algorithm. Like all policy gradient methods, PPO is used for training an RL agent whose actions are determined by a differentiable policy function by gradient ascent. Intuitively, a policy gradient method takes small policy update steps, so the agent can reach higher and higher rewards in expectation. Policy gradient methods may be unstable: A step size that is too big may direct the policy in a suboptimal direction, thus having little possibility of recovery; a step size that is too small lowers the overall efficiency. To solve the instability, PPO implements a clip function that constrains the policy update of an agent from being too large, so that larger step sizes may be used without negatively affecting the gradient ascent process. === Basic concepts === To begin the PPO training process, the agent is set in an environment to perform actions based on its current input. In the early phase of training, the agent can freely explore solutions and keep track of the result. Later, with a certain amount of transition samples and policy updates, the agent will select an action to take by randomly sampling from the probability distribution P ( A | S ) {\displaystyle P(A|S)} generated by the policy network. The actions that are most likely to be beneficial will have the highest probability of being selected from the random sample. After an agent arrives at a different scenario (a new state) by acting, it is rewarded with a positive reward or a negative reward. The objective of an agent is to maximize the cumulative reward signal across sequences of states, known as episodes. === Policy gradient laws: the advantage function === The advantage function (denoted as A {\displaystyle A} ) is central to PPO, as it tries to answer the question of whether a specific action of the agent is better or worse than some other possible action in a given state. By definition, the advantage function is an estimate of the relative value for a selected action. If the output of this function is positive, it means that the action in question is better than the average return, so the possibilities of selecting that specific action will increase. The opposite is true for a negative advantage output. The advantage function can be defined as A = Q − V {\displaystyle A=Q-V} , where Q {\displaystyle Q} is the discounted sum of rewards (the total weighted reward for the completion of an episode) and V {\displaystyle V} is the baseline estimate. Since the advantage function is calculated after the completion of an episode, the program records the outcome of the episode. Therefore, calculating advantage is essentially an unsupervised learning problem. The baseline estimate comes from the value function that outputs the expected discounted sum of an episode starting from the current state. In the PPO algorithm, the baseline estimate will be noisy (with some variance), as it also uses a neural network, like the policy function itself. With Q {\displaystyle Q} and V {\displaystyle V} computed, the advantage function is calculated by subtracting the baseline estimate from the actual discounted return. If A > 0 {\displaystyle A>0} , the actual return of the action is better than the expected return from experience; if A < 0 {\displaystyle A<0} , the actual return is worse. === Ratio function === In PPO, the ratio function ( r t {\displaystyle r_{t}} ) calculates the probability of selecting action a {\displaystyle a} in state s {\displaystyle s} given the current policy network, divided by the previous probability under the old policy. In other words: If r t ( θ ) > 1 {\displaystyle r_{t}(\theta )>1} , where θ {\displaystyle \theta } are the policy network parameters, then selecting action a {\displaystyle a} in state s {\displaystyle s} is more likely based on the current policy than the previous policy. If 0 ≤ r t ( θ ) < 1 {\displaystyle 0\leq r_{t}(\theta )<1} , then selecting actio

    Read more →
  • Receptron

    Receptron

    The receptron (short for "reservoir perceptron") is a neuromorphic data processing model — specifically neuromorphic computing — that generalizes the traditional perceptron, by incorporating non-linear interactions between inputs. Unlike classical perceptron, which rely on linearly independent weights, the receptron leverages complexity in physical substrates, such as the electric conduction properties of nanostructured materials or optical speckle fields, to perform classification tasks. The receptron bridges unconventional computing and neural network principles, enabling solutions that do not require the training approaches typical of artificial neural networks based on the perceptron model. == Algorithm == The receptron is an algorithm for supervised learning of binary classifiers, so a classification algorithm that makes its predictions based on a predictor function, combining a set of weights with the feature vector. The mathematical model is based on the sum of inputs with non-linear interactions: S = ∑ k = 1 n x j w ~ j ( x → ) | S ∈ R {\displaystyle S=\sum _{k=1}^{n}x_{j}{\widetilde {w}}_{j}({\vec {x}})|S\in R} (1) where j ∈ [ 1 , n ] {\displaystyle j\in [1,n]} and w ~ j {\displaystyle {\widetilde {w}}_{j}} are non-linear weight functions depending on the inputs, x → {\displaystyle {\vec {x}}} . Nonlinearity will typically make the system extremely complex, and allowing for the solution of problems not solvable through the simpler rules of a linear system, such as the perceptron or McCulloch Pitts neurons, which is based on the sum of linearly independent weights: S = ∑ k = 1 n x j w j p {\displaystyle S=\sum _{k=1}^{n}x_{j}w_{j}^{p}} (2) where w j {\displaystyle w_{j}} are constant real values. A consequence of this simplicity is the limitation to linearly separable functions, which necessitates multi-layer architectures and training algorithms like backpropagation As in the perceptron case, the summation in Eq. 1 origins the activation of the receptron output through the thresholding process, Y ( x 1 , . . . , x n ) = { 1 if S > th 0 if S ≤ th {\displaystyle Y(x_{1},...,x_{n})={\begin{cases}1&{\text{if }}S>{\text{th}}\\0&{\text{if }}S\leq {\text{th}}\end{cases}}} (3) where th is a constant threshold parameter. Equation 3 can be written by using the Heaviside step function. The weight functions w ~ ( x → ) {\displaystyle {\widetilde {w}}({\vec {x}})} can be written with a finite number of parameters w j 1 . . . j n {\displaystyle w_{j_{1}...j_{n}}} , simplifying the model representation. One can Taylor-expand w ~ ( x → ) {\displaystyle {\widetilde {w}}({\vec {x}})} and use the idempotency of Boolean variables ( x j ) q = x j ∀ q ≥ 1 {\displaystyle (x_{j})^{q}=x_{j}\forall q\geq 1} such that S ′ = b + ∑ k = 1 n x j w ~ j ( x → ) {\displaystyle S'=b+\sum _{k=1}^{n}x_{j}{\widetilde {w}}_{j}({\vec {x}})} can be written as S ′ ( x → ) = b + ∑ j w j x j + ∑ j < k w j k x j x k + ∑ j < k < l w j k l x j x k x l + . . . {\displaystyle S'({\vec {x}})=b+\sum _{j}w_{j}x_{j}+\sum _{j Read more →