ZipBooks

ZipBooks is a free online accounting software company based in American Fork, Utah. The cloud-based software is an accounting and bookkeeping tool that helps business owners process credit cards, track finances, and send invoices, among other features. == History == ZipBooks was founded by Tim Chaves in June 2015, backed by venture capital firm Peak Ventures. The company secured an additional $2 million of funding in July 2016, and in 2017 it was awarded a $100,000 economic grant by the Utah Governor's Office of Economic Development Technology Commercialization and Innovation Program. == Products == ZipBooks' core modules are invoicing, transactions, bills, reporting, time tracking, contacts, and payroll. Accrual accounting was added in 2017. The application is available on G Suite, iOS, Slack, and as a web application. == Reception == Computerworld compared ZipBooks favorably with other accounting software. PC Magazine praised its user experience, but stated it lacked "a lot of features that competing sites offer".

Peanut App

Peanut, a product of Peanut App Ltd., is an online community for women who are planning to become pregnant, women who are pregnant, women who have had children, and women who are experiencing menopause. Profiles of potential friends are displayed to users, who can swipe up to show intent to connect. Users can also connect via discussion threads, groups, and live audio conversations. The app allows users to select their stage of life (trying to conceive, pregnancy, motherhood, or menopause), so as to meet women at a similar life stage and to discover relevant content. Peanut was founded by Michelle Kennedy shortly after she left Bumble, a female-first dating app. She has described Peanut as "the app she wishes she had when she first became a mother". == History == Peanut was initially launched in 2017 for mothers and pregnant women. The app focuses on helping users find others with shared interests, such as spoken languages, occupations, and hobbies. It also displays a woman's life stage, such as the age of her children or the stage of pregnancy. In 2018, it launched a community discussion feature intended to give women an "alternative to other social platforms". In 2019, it started to serve women who are trying to conceive. In April 2021, it integrated live audio in response to the COVID-19 pandemic and the restrictions around in-person socializing. In September 2021, it started to include women who are navigating perimenopause, menopause, and postmenopause. Although it had initially catered for younger women starting new families, a large number of users had undergone surgically or chemically induced menopause due to medical conditions. In July 2021, Peanut launched an investment micro fund, Peanut StartHER, focused on investing in women-owned businesses, as well as other historically excluded founders. == Operation == The Peanut app is a social network exclusively for women, focusing on topics of pregnancy, motherhood, fertility, and menopause. 
It is available on iOS and Android devices. Users must prove their identity, in keeping with the app's primary focus on in-app safety, and can then create a profile to interact with other users. For pregnant users, the “Bump Buddies” feature helps connect them with other Peanut users who have a similar due date; it was aimed at helping expecting mothers combat loneliness during the COVID-19 pandemic. Peanut users also have the option to join “Groups” ‒ sub-sections of users focused on specific topics, including (but not limited to) location, life stage, pregnancy due date, and interests or hobbies. The live voice chat feature, “Pods”, enables Peanut users to socialize without the pressure of photos or video chat. It offers features such as a muted audience of listeners who need to virtually raise their hand to speak, emoji reactions, and hosts who can moderate the conversations and invite people to speak.

Multiple sequence alignment

Multiple sequence alignment (MSA) is the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. These alignments are used to infer evolutionary relationships via phylogenetic analysis and can highlight homologous features between sequences. Alignments highlight mutation events such as point mutations (single amino acid or nucleotide changes), insertion mutations and deletion mutations, and alignments are used to assess sequence conservation and infer the presence and activity of protein domains, tertiary structures, secondary structures, and individual amino acids or nucleotides. Multiple sequence alignments require more sophisticated methodologies than pairwise alignments, as they are more computationally complex. Most multiple sequence alignment programs use heuristic methods rather than global optimization because identifying the optimal alignment between more than a few sequences of moderate length is prohibitively computationally expensive. However, heuristic methods generally cannot guarantee high-quality solutions and have been shown to fail to yield near-optimal solutions on benchmark test cases. 
== Problem statement == Given $m$ sequences $S_i$, $i = 1, \ldots, m$, of the form below:

$$S := \begin{cases} S_1 = (S_{11}, S_{12}, \ldots, S_{1n_1}) \\ S_2 = (S_{21}, S_{22}, \ldots, S_{2n_2}) \\ \qquad\vdots \\ S_m = (S_{m1}, S_{m2}, \ldots, S_{mn_m}) \end{cases}$$

A multiple sequence alignment is taken of this set of sequences $S$ by inserting any number of gaps needed into each of the $S_i$ sequences of $S$ until the modified sequences, $S'_i$, all conform to length $L \geq \max\{n_i \mid i = 1, \ldots, m\}$ and no column of the modified sequences consists of only gaps. The mathematical form of an MSA of the above sequence set is shown below:

$$S' := \begin{cases} S'_1 = (S'_{11}, S'_{12}, \ldots, S'_{1L}) \\ S'_2 = (S'_{21}, S'_{22}, \ldots, S'_{2L}) \\ \qquad\vdots \\ S'_m = (S'_{m1}, S'_{m2}, \ldots, S'_{mL}) \end{cases}$$

To return from each particular sequence $S'_i$ to $S_i$, remove all gaps. == Graphing approach == A general approach when calculating multiple sequence alignments is to use graphs to identify all of the different alignments. When finding alignments via graph, a complete alignment is created in a weighted graph that contains a set of vertices and a set of edges. Each of the graph edges has a weight based on a certain heuristic that helps to score each alignment or subset of the original graph. 
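The conditions in the problem statement above can be checked mechanically. The following is a minimal sketch, assuming sequences are Python strings and that the gap symbol is `-` (the text does not fix a gap symbol); the function name `is_valid_msa` is hypothetical.

```python
GAP = "-"  # assumed gap symbol; the problem statement does not name one


def is_valid_msa(original, aligned):
    """Return True if `aligned` satisfies the MSA conditions for `original`."""
    if not aligned or len(original) != len(aligned):
        return False
    length = len(aligned[0])
    # All modified sequences must share one length L.
    if any(len(row) != length for row in aligned):
        return False
    # L must be at least the length of the longest original sequence.
    if length < max(len(seq) for seq in original):
        return False
    # No column may consist only of gaps.
    for column in zip(*aligned):
        if all(symbol == GAP for symbol in column):
            return False
    # Removing all gaps must recover each original sequence.
    return all(row.replace(GAP, "") == seq for seq, row in zip(original, aligned))
```

For example, `is_valid_msa(["ACGT", "AGT", "ACT"], ["ACGT", "A-GT", "AC-T"])` holds, while appending a gap to every row creates an all-gap column and fails the check.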
=== Tracing alignments === When determining the best suited alignments for each MSA, a trace is usually generated. A trace is a set of realized, or corresponding and aligned, vertices that has a specific weight based on the edges that are selected between corresponding vertices. When choosing traces for a set of sequences it is necessary to choose a trace with a maximum weight to get the best alignment of the sequences. == Alignment methods == There are various alignment methods used within multiple sequence alignment to maximize scores and correctness of alignments. Each is usually based on a certain heuristic with an insight into the evolutionary process. Most try to replicate evolution to get the most realistic alignment possible to best predict relations between sequences. === Dynamic programming === A direct method for producing an MSA uses the dynamic programming technique to identify the globally optimal alignment solution. For proteins, this method usually involves two sets of parameters: a gap penalty and a substitution matrix assigning scores or probabilities to the alignment of each possible pair of amino acids based on the similarity of the amino acids' chemical properties and the evolutionary probability of the mutation. For nucleotide sequences, a similar gap penalty is used, but a much simpler substitution matrix, wherein only identical matches and mismatches are considered, is typical. The scores in the substitution matrix may be either all positive or a mix of positive and negative in the case of a global alignment, but must be both positive and negative in the case of a local alignment. For n individual sequences, the naive method requires constructing the n-dimensional equivalent of the matrix formed in standard pairwise sequence alignment. The search space thus increases exponentially with increasing n and is also strongly dependent on sequence length. 
Expressed with the big O notation commonly used to measure computational complexity, a naïve MSA takes $O(\mathrm{Length}^{\mathrm{Nseqs}})$ time to produce. Finding the global optimum for n sequences this way has been shown to be an NP-complete problem. In 1989, based on the Carrillo–Lipman algorithm, Altschul introduced a practical method that uses pairwise alignments to constrain the n-dimensional search space. In this approach pairwise dynamic programming alignments are performed on each pair of sequences in the query set, and only the space near the n-dimensional intersection of these alignments is searched for the n-way alignment. The MSA program optimizes the sum of all of the pairs of characters at each position in the alignment (the so-called sum of pair score) and has been implemented in a software program for constructing multiple sequence alignments. In 2019, Hosseininasab and van Hoeve showed that by using decision diagrams, MSA may be modeled in polynomial space complexity. === Progressive alignment construction === The most widely used approach to multiple sequence alignments uses a heuristic search known as the progressive technique (also known as the hierarchical or tree method) developed by Da-Fei Feng and Doolittle in 1987. Progressive alignment builds up a final MSA by combining pairwise alignments beginning with the most similar pair and progressing to the most distantly related. All progressive alignment methods require two stages: a first stage in which the relationships between the sequences are represented as a phylogenetic tree, called a guide tree, and a second stage in which the MSA is built by adding the sequences sequentially to the growing MSA according to the guide tree. The initial guide tree is determined by an efficient clustering method such as neighbor-joining or unweighted pair group method with arithmetic mean (UPGMA), and may use distances based on the number of identical two-letter sub-sequences (as in FASTA rather than a dynamic programming alignment). 
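The sum of pair score mentioned above for the dynamic programming approach can be illustrated with a short sketch. It assumes a `-` gap symbol, a simple match/mismatch scheme of the kind described for nucleotide sequences, and a flat per-gap penalty; the numeric values and the function names are illustrative assumptions, not parameters taken from any particular program.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned symbols (illustrative values only)."""
    if a == GAP and b == GAP:
        return 0  # assumption: a gap aligned with a gap costs nothing
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def sum_of_pairs(aligned, **scores):
    """Add up pair scores over every column and every pair of rows."""
    total = 0
    for column in zip(*aligned):
        for a, b in combinations(column, 2):
            total += pair_score(a, b, **scores)
    return total
```

With these values, the alignment `["ACGT", "A-GT", "AC-T"]` scores 3 - 3 - 3 + 3 = 0: the two fully conserved columns each contribute three matches, and each column containing a gap contributes one match and two gap penalties.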
Progressive alignments are not guaranteed to be globally optimal. The primary problem is that when errors are made at any stage in growing the MSA, these errors are then propagated through to the final result. Performance is also particularly bad when all of the sequences in the set are rather distantly related. Most modern progressive methods modify their scoring function with a secondary weighting function that assigns scaling factors to individual members of the query set in a nonlinear fashion based on their phylogenetic distance from their nearest neighbors. This corrects for non-random selection of the sequences given to the alignment program. Progressive alignment methods are efficient enough to implement on a large scale for many (100s to 1000s) sequences. A popular progressive alignment method has been the Clustal family. ClustalW is used extensively for phylogenetic tree construction, in spite of the author's explicit warnings that unedited alignments should not be used in such studies, and as input for protein structure prediction by homology modeling. The European Bioinformatics Institute (EMBL-EBI) announced that ClustalW2 would expire in August 2015. It recommends Clustal Omega, which is based on seeded guide trees and HMM profile-profile techniques for protein alignments. An alternative tool for progressive DNA alignments is multiple alignment using fast Fourier transform (MAFFT). Another common progressive alignment method named T-Coffee is slower than Clustal and its derivatives but generally produces more accurate alignments for distantly related sequence sets. T-Coffee calculates pairwise alignments by combining the direct alignment of the pair with indirect alignments that align each sequence of the pair to a third sequence. It uses the output from Clustal as well as another local alignment program LALIGN, which finds multiple regions of local alignment between two sequences. 
The resulting alignment and phylogenetic tree are used as a guide to produce new and more accurate w

Lori Levin

Lorraine Susan (Lori) Levin is an American computer scientist and computational linguist specializing in natural language processing, particularly involving syntax, morphosyntax, and languages with small corpora. She is a research professor in the Language Technologies Institute of the Carnegie Mellon University School of Computer Science, and one of the founders of the North American Computational Linguistics Open Competition. == Education and career == Levin has a 1979 bachelor's degree in linguistics (summa cum laude) from the University of Pennsylvania, and a 1986 Ph.D. in linguistics from the Massachusetts Institute of Technology. Her dissertation, Operations on Lexical Forms: Unaccusative Rules in Germanic Languages, was jointly supervised by Joan Bresnan and Kenneth L. Hale. She worked as an assistant professor of linguistics at the University of Pittsburgh from 1983 until 1988, when she joined the Carnegie Mellon University Language Technologies Institute. == Recognition == Levin was named as a Fellow of the Association for Computational Linguistics in 2025, "for pioneering work on the use of phonetics, syntax, lexical semantics and dialogue modeling in machine translation and in the transfer of NLP technologies to low resource languages, as well as an enduring contribution to the North American Computational Linguistics Olympiad". Levin was awarded the Antonio Zampolli prize of the ELRA Language Resources Association at the LREC 2026 conference.

Yorick Wilks

Yorick Alexander Wilks FBCS (27 October 1939 – 14 April 2023) was a British computer scientist. He was an emeritus professor of artificial intelligence at the University of Sheffield, visiting professor of artificial intelligence at Gresham College (a post created especially for him), senior research fellow at the Oxford Internet Institute, senior scientist at the Florida Institute for Human and Machine Cognition, and a member of the Epiphany Philosophers. In February 2023, Wilks joined WiredVibe as Director of AI and a Board Member, with the goal of commercializing his previous research and ideas. He remained in this role until his death, which occurred shortly before WiredVibe was acquired by AKY X, a company that continues to build on his legacy and contributions. == Biography == Wilks was born in Gerrards Cross, Buckinghamshire in England. He was educated at Torquay Boys' Grammar School, followed by Pembroke College, Cambridge, where he read Philosophy, joined the Epiphany Philosophers and obtained his Doctor of Philosophy degree (1968) under Professor R. B. Braithwaite for the thesis 'Argument and Proof'; he was an early pioneer in meaning-based approaches to the understanding of natural language content by computers. His main early contribution in the 1970s was called "Preference Semantics" (Wilks, 1973; Wilks and Fass, 1992), an algorithmic method for assigning the "most coherent" interpretation to a sentence in terms of having the maximum number of internal preferences of its parts (normally verbs or adjectives) satisfied. That early work was hand-coded with semantic entries (of the order of some hundreds) as was normal at the time, but since then has led to the empirical determinations of preferences (chiefly of English verbs) in the 1980s and 1990s. 
A key component of the notion of preference in semantics was that the interpretation of an utterance is not a well- or ill-formed notion, as was argued in Chomskyan approaches, such as those of Jerry Fodor and Jerrold Katz. It was rather that a semantic interpretation was the best available, even though some preferences might not be satisfied. So, in "The machine answered the question with a low whine" the agent of "answer" does not satisfy that verb's preference for a human answerer—which would cause it to be deemed ill-formed by Fodor and Katz—but is accepted as sub-optimal or metaphorical, and, now, conventional. The function of the algorithm is not to determine well-formedness at all but to make the optimal selection of word-senses to participate in the overall interpretation. Thus, in "The Pole answered..." the system will always select the human sense of the agent and not the inanimate one if it gives a more coherent interpretation overall. Preference Semantics is thus some of the earliest computational work—with programs run at Systems Development Corporation in Santa Monica in 1967 in LISP on an IBM360—in the now established field of word sense disambiguation. This approach was used in the first operational machine translation system based principally on meaning structures and built by Wilks at Stanford Artificial Intelligence Laboratory in the early 1970s (Wilks, 1973) at the same time and place as Roger Schank was applying his "Conceptual Dependency" approach to machine translation. The LISP code of Wilks' system was in The Computer Museum, Boston. Wilks was elected a fellow of the American and European Associations for Artificial Intelligence, of the British Computer Society, a member of the UK Computing Research Committee, and a permanent member of ICCL, the International Committee on Computational Linguistics. He was professor of artificial intelligence at the University of Sheffield and a senior research fellow at the Oxford Internet Institute. 
In 1991 he received a Defense Advanced Research Projects Agency grant on interlingual pragmatics-based machine translation, and in 1994 he received a grant from the Engineering and Physical Sciences Research Council to investigate the field of large-scale information extraction (LaSIE); in the following years he obtained more grants to continue exploring the field of information extraction (AVENTINUS, ECRAN, PASTA...). In the 1990s Wilks also became interested in modelling human-computer dialogue, and the team led by David Levy, with Wilks as chief researcher, won the Loebner Prize in 1997. He was the founding director of the EU-funded Companions Project on creating long-term computer companions for people. At his Festschrift in 2007 at the British Computer Society in London a volume of his own papers was presented along with a volume of essays in his honour. He was awarded the Antonio Zampolli prize in honour of his lifetime work at the LREC 2008 conference on 28 May 2008, and the Lifetime Achievement Award at the ACL 2008 conference on 18 June 2008. In 2009, he was awarded the British Computer Society's Lovelace Medal, its annual award for research achievement, and was awarded the Fellowship of the Association for Computing Machinery. In 1998, Wilks became head of the Department of Computer Science of the University of Sheffield, where he had started working in 1993 as professor of artificial intelligence. In 1993 he became the founding director of the Institute of Language, Speech and Hearing (ILASH). Wilks also set up the Natural Language Processing Group of the University of Sheffield. In 1994 he (along with Rob Gaizauskas and Hamish Cunningham) designed GATE, an advanced NLP architecture that has been widely distributed. National Life Stories conducted an oral history interview (C1672/24) with Yorick Wilks in 2016 for its Science and Religion collection held by the British Library. Wilks died on 14 April 2023, at the age of 83. 
== Awards == Wilks received many awards: (2009) Elected Fellow of the Association for Computing Machinery (2009) Lovelace Medal by the British Computer Society (2008) Zampolli Prize (ELRA, awarded at LREC in Marrakech, Morocco) (2008) Lifetime Achievement Award (Association for Computational Linguistics, in Columbus) (2006) Visiting Professor, University of Oxford (2004) Elected to UK Computing Research Committee (2004) Elected Fellow, British Computer Society (2003) Visiting Fellow, Oxford Internet Institute (1998) Elected Fellow of European Association for Artificial Intelligence (1997) Elected Fellow, EPSRC College of Computing (1991) Visiting Fellow, Trinity Hall, Cambridge (1991) Elected Fellow of the American Association for Artificial Intelligence (1983) Royal Society Travel Fellowship (1983) Commonwealth of Australia Visiting Professor (1981) Visiting Sloan Fellow, University of California, Berkeley (1980) Invited Participant in the Nobel Symposium on Language, Stockholm (1979) NATO Senior Scientist Fellowship (1979) Visiting Sloan Fellow, Yale University (1975) SRC Senior Visiting Fellowship, University of Edinburgh == Membership == Wilks was an active member of the following associations: Association for Computational Linguistics Society for the Study of AI and Simulation of Behaviour Association for Computing Machinery Cognitive Science Society British Society for the Philosophy of Science American Association for Artificial Intelligence Aristotelian Society == Selected works == === Books === Wilks, Y. (2019) Artificial Intelligence: Modern Magic or Dangerous Future?.Icon Books. New illustrated edition, 2023, MIT Press. Wilks, Y. (2015) Machine Translation: its scope and limits. Springer Wilks, Y (ed.) (2010) Close Engagements with Artificial Companions: Key Social, Psychological and Design issues. John Benjamins; Amsterdam Wilks, Y., Brewster, C. (2009) Natural Language Processing as a Foundation of the Semantic Web. Now Press: London. Wilks, Y. 
(2007) Words and Intelligence I, Selected papers by Yorick Wilks. In K. Ahmad, C. Brewster & M. Stevenson (eds.), Springer: Dordrecht. Wilks, Y. (ed. and with introduction and commentaries). (2006) Language, cohesion and form: selected papers of Margaret Masterman. Cambridge: Cambridge University Press. Wilks, Y., Nirenburg, S., Somers, H. (eds.) (2003) Readings in Machine Translation. Cambridge, MA: MIT Press. Wilks, Y.(ed.). (1999) Machine Conversations. Kluwer: New York. Wilks, Y., Slator, B., Guthrie, L. (1996) Electric Words: dictionaries, computers and meanings. Cambridge, MA: MIT Press. Ballim, A., Wilks, Y. (1991) Artificial Believers. Norwood, NJ: Erlbaum. Wilks, Y.(ed.). (1990) Theoretical Issues in Natural Language Processing. Norwood, NJ: Erlbaum. Wilks, Y., Partridge, D. (eds. plus three YW chapters and an introduction). (1990) The Foundations of Artificial Intelligence: a sourcebook. Cambridge: Cambridge University Press. Wilks, Y., Sparck-Jones, K.(eds.). (1984) Automatic Natural Language Processing, paperback edition. New York: Wiley. Originally published by Ellis Horwood. Wilks, Y., Charniak, E. (eds and principal authors). (1976) Computational Semantics—an Introduction to Artificial Intelligence and

Exploration–exploitation dilemma

The exploration–exploitation dilemma, also known as the explore–exploit tradeoff, is a fundamental concept in decision-making that arises in many domains. It is depicted as the balancing act between two opposing strategies. Exploitation involves choosing the best option based on current knowledge of the system (which may be incomplete or misleading), while exploration involves trying out new options that may lead to better outcomes in the future at the expense of an exploitation opportunity. Finding the optimal balance between these two strategies is a crucial challenge in many decision-making problems whose goal is to maximize long-term benefits. == Application in machine learning == In the context of machine learning, the exploration–exploitation tradeoff is fundamental in reinforcement learning (RL), a type of machine learning that involves training agents to make decisions based on feedback from the environment. Crucially, this feedback may be incomplete or delayed. The agent must decide whether to exploit the current best-known policy or explore new policies to improve its performance. === Multi-armed bandit methods === The multi-armed bandit (MAB) problem is a classic example of the tradeoff, and many methods have been developed for it, such as epsilon-greedy, Thompson sampling, and the upper confidence bound (UCB). See the page on MAB for details. In more complex RL situations than the MAB problem, the agent can treat each choice as a MAB, where the payoff is the expected future reward. For example, if the agent uses an epsilon-greedy method, then it will usually "pull the best lever" by picking the action that has the best predicted expected reward (exploit). However, with probability epsilon it picks a random action instead (explore). Monte Carlo tree search, for example, uses a variant of the UCB method. === Exploration problems === There are some problems that make exploration difficult. Sparse reward. 
If rewards occur only once in a long while, then the agent might not persist in exploring. Furthermore, if the space of actions is large, then the sparse reward would mean the agent would not be guided by the reward to find a good direction for deeper exploration. A standard example is Montezuma's Revenge. Deceptive reward. If some early actions give immediate small reward, but other actions give later large reward, then the agent might be lured away from exploring the other actions. Noisy TV problem. If certain observations are irreducibly noisy (such as a television showing random images), then the agent might be trapped exploring those observations (watching the television). === Exploration reward === The exploration reward (also called exploration bonus) methods convert the exploration-exploitation dilemma into a balance of exploitations. That is, instead of trying to get the agent to balance exploration and exploitation, exploration is simply treated as another form of exploitation, and the agent simply attempts to maximize the sum of rewards from exploration and exploitation. The exploration reward can be treated as a form of intrinsic reward. We write these as $r_t^i, r_t^e$, meaning the intrinsic and extrinsic rewards at time step $t$. However, exploration reward is different from exploitation in two regards: The reward of exploitation is not freely chosen, but given by the environment, whereas the reward of exploration may be picked freely. Indeed, there are many different ways to design $r_t^i$, described below. The reward of exploitation is usually stationary (i.e. the same action in the same state gives the same reward), but the reward of exploration is non-stationary (i.e. the same action in the same state should give less and less reward). 
Count-based exploration uses $N_n(s)$, the number of visits to a state $s$ during the time-steps $1:n$, to calculate the exploration reward. This is only possible in small and discrete state space. Density-based exploration extends count-based exploration by using a density model $\rho_n(s)$. The idea is that, if a state has been visited, then nearby states are also partly-visited. In maximum entropy exploration, the entropy of the agent's policy $\pi$ is included as a term in the intrinsic reward. That is, $r_t^i = -\sum_a \pi(a|s_t) \ln \pi(a|s_t) + \cdots$. === Prediction-based === The forward dynamics model is a function for predicting the next state based on the current state and the current action: $f : (s_t, a_t) \mapsto s_{t+1}$. The forward dynamics model is trained as the agent plays. The model becomes better at predicting state transition for state-action pairs that had been done many times. A forward dynamics model can define an exploration reward by $r_t^i = \|f(s_t, a_t) - s_{t+1}\|_2^2$. That is, the reward is the squared-error of the prediction compared to reality. This rewards the agent to perform state-action pairs that had not been done many times. This is however susceptible to the noisy TV problem. Dynamics model can be run in latent space. That is, $r_t^i = \|f(s_t, a_t) - \phi(s_{t+1})\|_2^2$ for some featurizer $\phi$. The featurizer can be the identity function (i.e. $\phi(x) = x$), randomly generated, the encoder-half of a variational autoencoder, etc. A good featurizer improves forward dynamics exploration. 
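The prediction-error reward $r_t^i = \|f(s_t, a_t) - s_{t+1}\|_2^2$ described above can be sketched with a deliberately simple forward model. The example below is an illustration under stated assumptions, not a method from the literature: the class `TabularForwardModel`, its learning rate, and the default zero prediction for unseen pairs are all hypothetical choices, and states are assumed to be tuples of floats.

```python
class TabularForwardModel:
    """Toy forward model f(s, a): one stored prediction per (state, action)."""

    def __init__(self, state_dim, lr=0.5):
        self.state_dim = state_dim
        self.lr = lr          # step size toward the observed next state
        self.table = {}       # (state, action) -> predicted next state

    def predict(self, state, action):
        # Unseen pairs default to a zero vector (an arbitrary assumption).
        return self.table.get((state, action), [0.0] * self.state_dim)

    def update(self, state, action, next_state):
        pred = self.predict(state, action)
        self.table[(state, action)] = [
            p + self.lr * (n - p) for p, n in zip(pred, next_state)
        ]


def exploration_reward(model, state, action, next_state):
    """Squared prediction error, computed before the model is trained on it."""
    pred = model.predict(state, action)
    reward = sum((p - n) ** 2 for p, n in zip(pred, next_state))
    model.update(state, action, next_state)
    return reward
```

Repeating the same transition makes the reward shrink, which matches the non-stationary behaviour expected of an exploration reward: with `state=(0.0, 0.0)`, `action=1`, and `next_state=(1.0, 2.0)`, successive calls return 5.0, then 1.25, then 0.3125.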
The Intrinsic Curiosity Module (ICM) method trains simultaneously a forward dynamics model and a featurizer. The featurizer is trained by an inverse dynamics model, which is a function for predicting the current action based on the features of the current and the next state: $g : (\phi(s_t), \phi(s_{t+1})) \mapsto a_t$. By optimizing the inverse dynamics, both the inverse dynamics model and the featurizer are improved. Then, the improved featurizer improves the forward dynamics model, which improves the exploration of the agent. The Random Network Distillation (RND) method attempts to solve this problem by teacher–student distillation. Instead of a forward dynamics model, it has two models $f, f'$. The $f'$ teacher model is fixed, and the $f$ student model is trained to minimize $\|f(s) - f'(s)\|_2^2$ on states $s$. As a state is visited more and more, the student network becomes better at predicting the teacher. Meanwhile, the prediction error is also an exploration reward for the agent, and so the agent learns to perform actions that result in higher prediction error. Thus, we have a student network attempting to minimize the prediction error, while the agent attempts to maximize it, resulting in exploration. The states are normalized by subtracting a running average and dividing by a running variance, which is necessary since the teacher model is frozen. The rewards are normalized by dividing by a running variance. Exploration by disagreement trains an ensemble of forward dynamics models, each on a random subset of all $(s_t, a_t, s_{t+1})$ tuples. The exploration reward is the variance of the models' predictions. === Noise === For neural network–based agents, the NoisyNet method replaces some of its neural network modules with noisy versions. 
That is, some network parameters are random variables from a probability distribution. The parameters of the distribution are themselves learnable. For example, in a linear layer $y = Wx + b$, both $W, b$ are sampled from Gaussian distributions $\mathcal{N}(\mu_W, \Sigma_W), \mathcal{N}(\mu_b, \Sigma_b)$ at every step, and the parameters $\mu_W, \Sigma_W, \mu_b, \Sigma_b$ are learned via the reparameterization trick.
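The sampling step of such a noisy linear layer can be sketched as follows. This is a minimal illustration, assuming a diagonal covariance so that each parameter has its own standard deviation, and writing each sample as mean plus standard deviation times standard normal noise, which is the form the reparameterization trick relies on. The class name `NoisyLinear` and the initial values are hypothetical, and gradient-based learning of the means and standard deviations is omitted.

```python
import random


class NoisyLinear:
    """Linear layer y = Wx + b whose W and b are resampled on every call."""

    def __init__(self, in_dim, out_dim, sigma_init=0.1, seed=None):
        self.rng = random.Random(seed)
        # Means and (diagonal) standard deviations; these would be learned.
        self.mu_w = [[self.rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                     for _ in range(out_dim)]
        self.sigma_w = [[sigma_init] * in_dim for _ in range(out_dim)]
        self.mu_b = [0.0] * out_dim
        self.sigma_b = [sigma_init] * out_dim

    def forward(self, x):
        y = []
        for i in range(len(self.mu_b)):
            # Sample b_i = mu + sigma * eps with eps ~ N(0, 1).
            total = self.mu_b[i] + self.sigma_b[i] * self.rng.gauss(0.0, 1.0)
            for j, x_j in enumerate(x):
                # Sample W_ij the same way, then accumulate W_ij * x_j.
                w = self.mu_w[i][j] + self.sigma_w[i][j] * self.rng.gauss(0.0, 1.0)
                total += w * x_j
            y.append(total)
        return y
```

Because fresh noise is drawn on each call, two forward passes on the same input generally differ; setting `sigma_init=0.0` recovers an ordinary deterministic linear layer.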

How to Choose an AI Copywriting Tool

Trying to pick the best AI copywriting tool? An AI copywriting tool is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI copywriting tool slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.