In mathematics, the correlation immunity of a Boolean function is a measure of the degree to which its outputs are uncorrelated with some subset of its inputs. Specifically, a Boolean function is said to be correlation-immune of order \(m\) if every subset of \(m\) or fewer variables in \(x_1, x_2, \ldots, x_n\) is statistically independent of the value of \(f(x_1, x_2, \ldots, x_n)\).

== Definition ==

A function \(f:\mathbb{F}_2^n \rightarrow \mathbb{F}_2\) is \(k\)-th order correlation immune if, for any \(n\) independent binary random variables \(X_0 \ldots X_{n-1}\), the random variable \(Z = f(X_0, \ldots, X_{n-1})\) is independent from any random vector \((X_{i_1} \ldots X_{i_k})\) with \(0 \leq i_1 < \ldots < i_k < n\).

AI warfare refers to the use of artificial intelligence technologies to automate military operations and enhance or bypass human decision-making in armed conflicts. AI is used to rapidly analyze large volumes of military intelligence data, including making recommendations or decisions on who and what to target. Abdul-Rahman al-Rawi, a 20-year-old student, was the first acknowledged civilian killed by an AI-assisted airstrike, in a U.S. strike in Iraq in 2024. In 2026, the U.S. declared it would become an 'AI-first' warfighting force. Husain et al. (2018) coined the term hyperwar to refer to warfare which is algorithmic or controlled by artificial intelligence, with little to no human decision-making.

== 2026 Iran war ==

The 2026 Iran war has been described as the "first AI war", although the United States and Israel have previously used AI to identify targets during the Gaza war. The U.S. has used AI tools to attack Iran.
These tools have been used for military intelligence, targeting, and damage assessment in the war in Iran. Using the Maven Smart System, the U.S. attacked 1,000 targets in the first 24 hours of the war and 5,000 targets over the course of 10 days. While the U.S. had used Maven in 2022 to share targeting information with Ukraine, and in 2024 for strikes in Iraq and Syria and against the Houthis, its use against Iran is the largest. Authorities are looking into whether artificial intelligence was involved in the airstrike on an Iranian girls' school that killed 170 civilians, the majority of whom were female students. The United States Central Command emphasized that humans were making final targeting decisions. Per a White House tally released on April 8, the U.S. military hit over 13,000 targets in Iran during the war's first 38 days, including more than 2,000 command-and-control sites, 1,500 air defense targets, and 1,450 industrial infrastructure targets.

== Gaza war ==

As part of the Gaza war, the Israel Defense Forces (IDF) have used artificial intelligence to rapidly and automatically perform much of the process of determining what to bomb. The IDF's Unit 8200 developed AI systems, dubbed the Gospel and Lavender, to find targets for the Israeli Air Force to bomb. The Gospel automatically provides targeting recommendations to human analysts, who decide whether to approve strikes. Lavender identified 37,000 Hamas-linked individuals early in the war, and was used alongside the Gospel, which chooses buildings or structures as targets. According to a report by +972 Magazine and Local Call, strikes assisted by Lavender were routinely permitted to kill 5–20 civilians for each suspected Hamas militant, who were often bombed at home with their families. The IDF denies these claims, maintaining that every strike is assessed to minimize collateral damage, and that there is no policy "to kill tens of thousands of people in their homes."
Israel deployed AI technologies during the Gaza war for audio analysis, facial recognition, and airstrike targeting. One such system was used to help identify the location of Hamas commander Ibrahim Biari through phone call analysis, leading to strikes that killed him as well as more than 125 civilians.

== 2022 Russian Ukraine war ==

Kyiv launched a project with Palantir called Brave1 Dataroom to build AI systems using the extensive combat data Ukraine has gathered since Russia’s full-scale invasion in 2022. The country has also created tools for in-depth airstrike analysis, introduced AI to process large volumes of intelligence, and incorporated these technologies into the planning of long-range strike operations.

== Involved companies ==

Maven Smart System is developed by Palantir. It integrates Anthropic's Claude as its large language model, and uses Amazon's AWS servers as its cloud infrastructure. Following Anthropic's refusal to support autonomous weapons development and domestic surveillance efforts, other AI firms, including OpenAI, have been brought in to take over that role.

== Involved state actors ==

In 2024, the United States Department of Defense had 800-plus active AI-related projects and requested $1.8 billion in AI funding, with Project Maven and Project Artemis (AI-resistant drones developed together with Ukraine) being the main ones. The technology has been used in Iran, Iraq, Syria and Yemen to identify targets. China is pursuing intelligentized warfare, integrating AI across all combat domains (land, sea, air, space, and cyber), with military AI spending exceeding $1.6 billion annually.

== International regulation ==

Since 2014, states meeting within the framework of the Convention on Certain Conventional Weapons have discussed lethal autonomous weapon systems. In 2016, the treaty's states parties established an open-ended Group of Governmental Experts on Lethal Autonomous Weapons Systems to continue those discussions.
The discussions have addressed international humanitarian law, accountability, possible prohibitions and regulations, and the extent of human control required over AI-enabled weapons. Jürgen Schmidhuber (born 17 January 1963) is a German computer scientist noted for his work in the field of artificial intelligence, specifically artificial neural networks. He has been described by media outlets as a leading pioneer of modern artificial intelligence. He is a scientific director of the Dalle Molle Institute for Artificial Intelligence Research in Switzerland. He is also director of the Artificial Intelligence Initiative and professor of the Computer Science program in the Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE) division at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia. He is best known for his work on long short-term memory (LSTM), a type of neural network architecture which was the dominant technique for various natural language processing tasks in research and commercial applications in the 2010s. He also introduced principles of dynamic neural networks, meta-learning, generative adversarial networks and linear transformers, all of which are widespread in modern AI. == Career == Schmidhuber completed his undergraduate (1987) and PhD (1991) studies at the Technical University of Munich in Munich, Germany. His PhD advisors were Wilfried Brauer and Klaus Schulten. He taught there from 2004 until 2009. From 2009 to 2021, he was a professor of artificial intelligence at the Università della Svizzera Italiana in Lugano, Switzerland. He has served as the director of Dalle Molle Institute for Artificial Intelligence Research (IDSIA), a Swiss AI lab, since 1995. Since 2021, he has also been the director of the AI Initiative at the King Abdullah University of Science and Technology (KAUST). 
In 2014, Schmidhuber formed a company, NNAISENSE, to work on commercial applications of artificial intelligence in fields such as finance, heavy industry and self-driving cars. Sepp Hochreiter, Jaan Tallinn, and Marcus Hutter are advisers to the company. Sales were under US$11 million in 2016; however, Schmidhuber states that the current emphasis is on research and not revenue. NNAISENSE raised its first round of capital funding in January 2017. Schmidhuber's overall goal is to create an all-purpose AI by training a single AI in sequence on a variety of narrow tasks, but as of 2026 he has said that the focus of NNAISENSE has shifted from artificial general intelligence to asset management. == Research == In the 1980s, backpropagation did not work well for deep learning with long credit assignment paths in artificial neural networks. To overcome this problem, Schmidhuber (1991) proposed a hierarchy of recurrent neural networks (RNNs) pre-trained one level at a time by self-supervised learning. It uses predictive coding to learn internal representations at multiple self-organizing time scales, facilitating downstream deep learning. The RNN hierarchy can be collapsed into a single RNN, by distilling a higher level chunker network into a lower level automatizer network. In 1993, a chunker solved a deep learning task whose depth exceeded 1000. In 1991, Schmidhuber published adversarial neural networks that contest with each other in the form of a zero-sum game, where one network's gain is the other network's loss. The first network is a generative model that models a probability distribution over output patterns. The second network learns by gradient descent to predict the reactions of the environment to these patterns. This was called "artificial curiosity". 
In 2014, this principle was used in the creation of the generative adversarial network, which Schmidhuber describes as a special case of artificial curiosity where the environmental reaction is 1 or 0 depending on whether the first network's output is in a given set. Schmidhuber supervised the 1991 diploma thesis of his student Sepp Hochreiter which he considered "one of the most important documents in the history of machine learning". It studied the neural history compressor and analyzed and overcame the vanishing gradient problem. This led to the creation of long short-term memory (LSTM), a type of recurrent neural network. The name LSTM was introduced in a tech report in 1995, leading to the most cited LSTM publication, published in 1997 and co-authored by Hochreiter and Schmidhuber. The standard LSTM architecture was introduced in 2000 by Felix Gers, Schmidhuber, and Fred Cummins. Today's "vanilla LSTM" using backpropagation through time was published with his student Alex Graves in 2005, and its connectionist temporal classification (CTC) training algorithm in 2006. CTC was applied to end-to-end speech recognition with LSTM. In 2014, the state of the art was training “very deep neural network” with 20 to 30 layers. Stacking too many layers led to a steep reduction in training accuracy, known as the "degradation" problem. In May 2015, Rupesh Kumar Srivastava, Klaus Greff, and Schmidhuber used LSTM principles to create the highway network, a feedforward neural network with hundreds of layers, much deeper than previous networks. In Dec 2015, the residual neural network (ResNet) was published, which is a variant of the highway network. In 1992, Schmidhuber published fast weights programmer, an alternative to recurrent neural networks. 
It has a slow feedforward neural network that learns by gradient descent to control the fast weights of another neural network through outer products of self-generated activation patterns, and the fast weights network itself operates over inputs. This was later shown to be equivalent to the unnormalized linear transformer. In 2011, Schmidhuber's team at IDSIA with his postdoc Dan Ciresan also achieved dramatic speedups of convolutional neural networks (CNNs) using graphics processing units (GPUs), based on CNN designs introduced much earlier by Kunihiko Fukushima. An earlier CNN on GPU by Chellapilla et al. (2006) was 4 times faster than an equivalent implementation on CPU. The deep CNN of Dan Ciresan et al. (2011) at IDSIA was 60 times faster and achieved the first superhuman performance in a computer vision contest in August 2011. Between 15 May 2011 and 10 September 2012, these CNNs won four more image competitions and improved the state of the art on multiple image benchmarks. The approach has become central to the field of computer vision. == Credit disputes == Schmidhuber has controversially argued that he and other researchers have been denied adequate recognition for their contribution to the field of deep learning, in favour of Geoffrey Hinton, Yoshua Bengio and Yann LeCun, who shared the 2018 Turing Award for their work in deep learning. He wrote a "scathing" 2015 article arguing that Hinton, Bengio and LeCun "heavily cite each other" but "fail to credit the pioneers of the field". In a statement to the New York Times, Yann LeCun wrote that "Jürgen is manically obsessed with recognition and keeps claiming credit he doesn't deserve for many, many things... It causes him to systematically stand up at the end of every talk and claim credit for what was just presented, generally not in a justified manner." 
Schmidhuber replied that LeCun did this "without any justification, without providing a single example", and published details of numerous priority disputes with Hinton, Bengio and LeCun. The term "schmidhubered" has been jokingly used in the AI community to describe Schmidhuber's habit of publicly challenging the originality of other researchers' work, a practice seen by some in the AI community as a "rite of passage" for young researchers. Some suggest that Schmidhuber's significant accomplishments have been underappreciated due to his confrontational personality. == Recognition == Schmidhuber received the Helmholtz Award of the International Neural Network Society in 2013, and the Neural Networks Pioneer Award of the IEEE Computational Intelligence Society in 2016 for "pioneering contributions to deep learning and neural networks." He is a member of the European Academy of Sciences and Arts. He has been referred to as the "father of modern AI", the "father of generative AI", and the "father of deep learning". Schmidhuber himself, however, has called Alexey Grigorevich Ivakhnenko the "father of deep learning", and gives credit to many even earlier AI pioneers. The New York Times ran a profile under the headline "When A.I. Matures, It May Call Jürgen Schmidhuber 'Dad'", highlighting his early work on deep learning and his long‑term vision for self‑improving AI. == Views == Schmidhuber is a proponent of open source AI, and believes that they will become competitive against commercial closed-source AI. Since the 1970s, Schmidhuber wanted to create "intelligent machines that could learn and improve on their own and become smarter than him within his lifetime." He differentiates between two types of AIs: tool AI, such as those for improving healthcare, and autonomous AIs that set their own goals, perform their own research, and explore the universe. 
He has worked on both types for de

Ayanna MacCalla Howard (born January 24, 1972) is an American roboticist, entrepreneur, and educator currently serving as the dean of the College of Engineering at Ohio State University. Assuming this role in March 2021, Howard became the first woman to lead the Ohio State College of Engineering. Howard previously served as the chair of the School of Interactive Computing in the Georgia Tech College of Computing, the Linda J. and Mark C. Smith Endowed Chair in Bioengineering in the School of Electrical and Computer Engineering, and the director of the Human-Automation Systems (Humans) Lab.

== Early life and education ==

As a little girl, Howard was interested in aliens and robots. Her favorite TV show was The Bionic Woman. Howard received her B.S. in engineering from Brown University in 1993 and her M.S. and Ph.D. in electrical engineering from the University of Southern California in 1994 and 1999, respectively. Her thesis, Recursive Learning for Deformable Object Manipulation, was advised by George A. Bekey. The thesis was motivated by the AIDS epidemic and focused on using robots to sort hospital waste. Howard has also received an MBA from Claremont Graduate University.

== Career ==

Howard's early interest in artificial intelligence led her to pursue a senior position at Seattle-based Axcelis Inc, where she helped develop Evolver, the first commercial genetic algorithm, and Brainsheet, a neural network developed in partnership with Microsoft. From 1993 to 2005, she worked at the NASA Jet Propulsion Laboratory, holding multiple roles such as senior robotics researcher and deputy manager in the Office of the Chief Scientist. In 2005, she joined Georgia Tech as an associate professor and founder of the Human-Automation Systems (Humans) lab.
She has also served as the associate director of research for Georgia Tech's Institute for Robotics and Intelligent Machines and as chair of the multidisciplinary robotics Ph.D. program at Georgia Tech. In 2017, she became the chair of the School of Interactive Computing at Georgia Tech. In 2008, Howard received worldwide attention for her SnoMote robots, designed to study the impact of global warming on the Antarctic ice shelves. In 2013, she founded Zyrobotics, which has released their first suite of therapy and educational products for children with special needs. Howard has authored 250 publications in reputable journals and conferences, including serving as co-editor/co-author of more than a dozen books and book chapters. She has also received four patents and given over 140 invited talks and keynotes. She is a fellow of the Association for the Advancement of Artificial Intelligence (AAAI) and the Institute of Electrical and Electronics Engineers (IEEE). Among her many honors, Howard received the Computer Research Association's A. Nico Habermann Award and the Richard A. Tapia Achievement Award. In a 2020 interview on Marketplace, Howard outlined how companion robots could alleviate the effects of social distancing caused by the COVID-19 pandemic in the United States. On November 30, 2020, the Columbus Dispatch reported that Howard would become the next dean of the College of Engineering at Ohio State University on March 1, pending approval by the board of trustees. On March 1, 2021, she assumed this role, becoming the first woman to hold the position. In 2021, Howard received the Athena Lecturer Award from Association for Computing Machinery (ACM) for her Contributions to Robotics, AI and Broadening Participation in Computing. In June 2022, Howard was elected a trustee of Brown University. 
== Research ==

Howard's research interests include human-robot interaction, assistive/rehabilitation robotics, science-driven/field robotics, and perception, learning, and reasoning. Her research and published works span various topics in robotics and AI, including intelligent learning, virtual reality for rehabilitation, and robotics in the role of pediatric therapy. Her research is highlighted by her focus on technology development for intelligent agents that must interact with and in a human-centered world. Her work, which addresses issues of human-robot interaction, learning, and autonomous control, has resulted in more than 200 peer-reviewed publications.

== Honors and awards ==

Howard's numerous accomplishments have been documented in more than a dozen featured articles. In 2003, she was named to the MIT Technology Review TR100 as one of the top 100 innovators in the world under the age of 35. She was featured in Time magazine's "Rise of the Machines" article in 2004. She was also featured in a USA Today Science & Space article.

Some of Howard's notable awards include:

* Lew Allen Award for Excellence (formerly the Director's Research Achievement Award of the Jet Propulsion Laboratory) for significant technical contributions, 2001
* MIT Technology Review Top 100 Young Innovators of the Year, 2003
* NAE Gilbreth Lectureship, 2010
* A. Richard Newton Educator ABIE Award, Anita Borg Institute, 2014
* Computer Research Association's A. Nico Habermann Award, 2016
* Brown Engineering Alumni Medal (BEAM), 2016
* AAAS-Lemelson Invention Ambassador, 2016-2017
* Atlanta magazine's Women Making a Mark, 2017
* Walker's Legacy #WLPower25 Atlanta Award, 2017
* Forbes America's Top 50 Women In Tech, 2018
* ACM Athena Lecturer Award, 2021
* 2021 class of Fellows of the American Association for the Advancement of Science
* IEEE Fellow, 2021, "for contributions to human-robot interaction systems"
* AAAI/EAAI Patrick Henry Winston Outstanding Educator Award, 2023

Andrew McCallum is an American professor in the computer science department at the University of Massachusetts Amherst. His primary specialties are in machine learning, natural language processing, information extraction, information integration, and social network analysis.

== Career ==

McCallum graduated summa cum laude from Dartmouth College in 1989. He completed his Ph.D. at the University of Rochester in 1995 under the supervision of Dana H. Ballard. McCallum was then a postdoctoral fellow, working with Sebastian Thrun and Tom M. Mitchell at Carnegie Mellon University. From 1998 to 2000, he was a Research Scientist and Research Coordinator at Justsystem Pittsburgh Research Center. From 2000 to 2002, he was Vice President of Research and Development at WhizBang Labs, and Director of its Pittsburgh office. Since 2002, he has worked as a professor of computer science at the University of Massachusetts Amherst. In 2020, he also joined Google as a part-time research scientist. He was elected as a fellow of the Association for the Advancement of Artificial Intelligence in 2009, and as a fellow of the Association for Computing Machinery in 2017. From 2014 to 2017, he was the president of the International Machine Learning Society (IMLS), which organizes the International Conference on Machine Learning. He is also the director of the Center for Data Science at UMass, leading a new partnership with the Chan Zuckerberg Initiative. In 2018, the initiative made an initial grant of $5.5 million to the center, supporting research to facilitate new ways for scientists to explore and discover research articles.

== Main contributions ==

In collaboration with John D. Lafferty and Fernando Pereira, McCallum developed conditional random fields, first described in a paper presented at the International Conference on Machine Learning (ICML).
In 2011 this research paper won the ICML "Test of Time" (10-year best paper) award. McCallum has written several widely used open-source software toolkits for machine learning, natural language processing and other text processing, including Rainbow, MALLET, and FACTORIE. In addition, he was instrumental in publishing the Enron Corpus, a large collection of emails that has been used as a basis for a number of academic studies of social networking and language. McCallum instigated and directs the nonprofit project OpenReview.net, an online platform that aims to promote openness in scientific communication, particularly the peer review process, by providing a flexible cloud-based web interface and underlying database API.

Adobe Presenter Video Express is screencasting and video editing software developed by Adobe Systems.

== Description ==

Adobe Presenter Video Express is used primarily by video creators to record and mix webcam and screen video feeds. It allows users to simultaneously record video from their webcam and the screen, and to mix the two tracks with a simple user interface. Users can change the background in their recorded video without needing equipment like a green screen. This is unlike other video tools, which rely on chroma keying technology and only work with green or blue screens. Users can also add annotations and quizzes to their content and publish the video to MP4 or HTML5 formats.

== List of notable features ==

=== Record and mix, screen and webcam ===

Support for simultaneous recording of screen and webcam video feeds, with a simple editing interface to mix the two video streams. This lets the author rapidly create screencasts, software demos, etc.

=== Make my background awesome ===

This feature allows authors to change the background of their webcam recording without needing a green screen, provided they use a solid-colored backdrop which contrasts well against them.
Authors can select images, videos or even the screen recording as their background. === In-video quizzing === Authors can insert quizzes within their video content. On success/failure attempts, the author can decide what message to display, and can also configure the video to jump to a certain point and play. Quizzes are published as part of the interactive HTML 5 player, which cannot be hosted on YouTube and Vimeo. === LMS Reporting === Authors can publish to any SCORM compliant LMS (Learning Management System) for quiz reporting, or to Adobe Captivate Prime. === In-app assets and branding === Adobe Presenter Video Express ships with a large number of branding videos, backgrounds and video filters to help authors create studio quality videos. === MP4 and HTML5 Output === The tool publishes a single MP4 video file containing all the video content, within an HTML 5 wrapper that contains the interactive player. The interactive HTML 5 player can be hosted on any website. == Common uses == === Screencasting === Screencasting is the process of recording one's computer screen as a video, usually with an audio voice over, to create a software demonstration, tutorial, presentation, etc. Adobe Presenter Video Express supports simultaneous recording of full screen video and microphone audio for creating screencasts. === Product marketing and demos === The ability to record the webcam video in addition to everything that is visible on the screen in Adobe Presenter Video Express, allows the author to add their personality to their screencasts. Features like video mixing and 'make my background awesome' further enhance the presentation, allowing effortless creation of marketing and demo videos. === Education === Adobe Presenter Video Express supports in-video quizzes and LMS reporting, along with screencasting and webcam recording. These features make it a powerful tool for creating educational content. 
== Differences from Adobe Presenter and Adobe Captivate ==

Adobe Presenter is a Microsoft PowerPoint plug-in for converting PowerPoint slides into interactive eLearning content, available only on Windows. Starting with Adobe Presenter 8, the video creation tool Adobe Presenter Video Express was bundled with every purchase of Adobe Presenter. From September 2015, Adobe Presenter Video Express 11 was also made available as a stand-alone product on Windows and Mac. A subscription license for Adobe Presenter Video Express, valid on Windows and Mac, is available for US$9.99 per month. Adobe Presenter Video Express continues to be bundled with purchases of Adobe Presenter on Windows as well.

Adobe Captivate is an authoring tool for creating numerous forms of interactive eLearning content. Unlike Adobe Presenter, it uses a proprietary editing interface instead of Microsoft PowerPoint. While it is possible to create screen captures with Adobe Captivate, it cannot record the webcam feed. Adobe Captivate does not bundle Adobe Presenter or Adobe Presenter Video Express.

Imitation learning is a paradigm in reinforcement learning, where an agent learns to perform a task by supervised learning from expert demonstrations. It is also called learning from demonstration and apprenticeship learning. It has been applied to underactuated robotics, self-driving cars, quadcopter navigation, helicopter aerobatics, and locomotion.

== Approaches ==

Expert demonstrations are recordings of an expert performing the desired task, often collected as state-action pairs \((o_t^*, a_t^*)\).

=== Behavior Cloning ===

Behavior Cloning (BC) is the most basic form of imitation learning.
Essentially, it uses supervised learning to train a policy \(\pi_\theta\) such that, given an observation \(o_t\), it outputs an action distribution \(\pi_\theta(\cdot \mid o_t)\) that is approximately the same as the action distribution of the expert. BC is susceptible to distribution shift. Specifically, if the trained policy differs from the expert policy, it might stray from the expert trajectory into observations that would never have occurred in expert trajectories. This was already noted in the ALVINN project, in which a neural network was trained to drive a van using human demonstrations. Its authors noticed that because a human driver never strays far from the path, the network would never be trained on what action to take if it ever found itself far from the path.

=== DAgger ===

DAgger (Dataset Aggregation) improves on behavior cloning by iteratively training on a dataset of expert demonstrations. In each iteration, the algorithm first collects data by rolling out the learned policy \(\pi_\theta\). Then, it queries the expert for the optimal action \(a_t^*\) on each observation \(o_t\) encountered during the rollout. Finally, it aggregates the new data into the dataset \(D \leftarrow D \cup \{(o_1, a_1^*), (o_2, a_2^*), \ldots, (o_T, a_T^*)\}\) and trains a new policy on the aggregated dataset.

=== Decision transformer ===

The Decision Transformer approach models reinforcement learning as a sequence modelling problem.
Similar to Behavior Cloning, it trains a sequence model, such as a Transformer, that models rollout sequences \((R_1, o_1, a_1), (R_2, o_2, a_2), \dots, (R_t, o_t, a_t),\) where \(R_t = r_t + r_{t+1} + \dots + r_T\) is the sum of future reward in the rollout. During training time, the sequence model is trained to predict each action \(a_t\), given the previous rollout as context: \((R_1, o_1, a_1), (R_2, o_2, a_2), \dots, (R_t, o_t)\). During inference time, to use the sequence model as an effective controller, it is simply given a very high reward prediction \(R\), and it would generalize by predicting an action that would result in the high reward. This was shown to scale predictably to a Transformer with 1 billion parameters that is superhuman on 41 Atari games.

=== Other approaches ===

See for more examples.

== Related approaches ==

Inverse Reinforcement Learning (IRL) learns a reward function that explains the expert's behavior and then uses reinforcement learning to find a policy that maximizes this reward. Recent works have also explored multi-agent extensions of IRL in networked systems.

Generative Adversarial Imitation Learning (GAIL) uses generative adversarial networks (GANs) to match the distribution of agent behavior to the distribution of expert demonstrations. It extends a previous approach using game theory.
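The DAgger loop described under Approaches (roll out the learned policy, query the expert on the observations visited, aggregate, retrain) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not part of any published implementation: the one-dimensional integer environment clipped to [-5, 5], the hypothetical expert `expert_action` that steps toward the origin, the lookup-table learner in `fit` with a deliberately poor default action for unseen observations, and the helper names `rollout` and `dagger`.

```python
from collections import Counter


def expert_action(obs: int) -> int:
    """Hypothetical expert: step toward the origin (+1, 0, or -1)."""
    return (obs < 0) - (obs > 0)


def rollout(policy, start: int, horizon: int):
    """Run a policy in a toy 1-D world; return visited observations and final state."""
    obs, visited = start, []
    for _ in range(horizon):
        visited.append(obs)
        obs = max(-5, min(5, obs + policy(obs)))  # state is clipped to [-5, 5]
    return visited, obs


def fit(dataset):
    """'Supervised learning' stand-in: majority action per observation.

    Observations never seen in the dataset get the default action -1,
    which is wrong for negative states and so exposes distribution shift.
    """
    votes = {}
    for obs, act in dataset:
        votes.setdefault(obs, Counter())[act] += 1
    table = {obs: c.most_common(1)[0][0] for obs, c in votes.items()}
    return lambda obs: table.get(obs, -1)


def dagger(iterations: int, start: int = -3, horizon: int = 6):
    # Behavior-cloning step: one expert demonstration, starting from state 3,
    # so the initial dataset covers only observations 3, 2, 1, 0.
    demo_obs, _ = rollout(expert_action, 3, horizon)
    dataset = [(o, expert_action(o)) for o in demo_obs]
    policy = fit(dataset)
    finals = []
    for _ in range(iterations):
        # 1. Roll out the current learned policy.
        visited, final = rollout(policy, start, horizon)
        finals.append(final)
        # 2. Ask the expert for the correct action on every visited observation.
        # 3. Aggregate into the dataset and retrain.
        dataset += [(o, expert_action(o)) for o in visited]
        policy = fit(dataset)
    return policy, finals


if __name__ == "__main__":
    policy, finals = dagger(iterations=5)
    print(finals)  # final state reached by the learner in each iteration
```

In this toy setting the cloned policy initially walks away from the origin because it has never seen negative observations; each DAgger iteration labels exactly the states the learner itself reaches, so coverage grows one state at a time until the rollout ends at the origin.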