List of publications in data science

This is a list of publications in data science, generally organized by order of use in a data analysis workflow. See the list of publications in statistics for more research-based and fundamental publications; while this list is more applied, business oriented, and cross-disciplinary. General article inclusion criteria are: Papers from notable practitioners or notable professors, either with a Wikipedia page or reference to their notability Common knowledge all data professionals should know, with references validating this claim Highly cited applied statistics and machine learning publications Discussion-facilitating papers on the field of data science as a whole (for example, the Attention Is All You Need paper is arguably a landmark paper that can be added here, but it is specific to generative artificial intelligence, not for all practitioners of data) Some reasons why a particular publication might be regarded as important: Topic creator – A publication that created a new topic Breakthrough – A publication that changed scientific knowledge significantly Influence – A publication which has significantly influenced the world or has had a massive impact on the teaching of data science. When possible, a reference is used to validate the inclusion of the publication in this list. == History == Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author) Author: Leo Breiman Publication data: Online version: https://projecteuclid.org/journals/statistical-science/volume-16/issue-3/Statistical-Modeling--The-Two-Cultures-with-comments-and-a/10.1214/ss/1009213726.pdf Description: Describes two cultures of statistics, one using a parsimonious and generative stochastic model, while the other is an algorithmic model with no known mechanism for how the data is generated. Breiman argues that while statistics has traditionally favored using the stochastic model, there is value in expanding the methods that statisticians can use to study phenomenon. Importance: Influence on the philosophies of statisticians right before the increased use of machine learning and deep learning methods. In a 20-year retrospective on this article, "Breiman's words are perhaps more relevant than ever". Notable statisticians at the time wrote opinion pieces about the publication. Although overall critical of the publication, David Cox writes that the publication "contains enough truth and exposes enough weaknesses to be thought-provoking." Bradley Efron commented that this publication is a "stimulating paper". Emanuel Parzen also comments about this publication that "Breiman alerts us to systematic blunders (leading to wrong conclusions) that have been committed applying current statistical practice of data modeling". Data Scientist: The Sexiest Job of the 21st Century Author: Thomas H. Davenport and DJ Patil Publication data: Online version: hbr.org/2022/07/is-data-scientist-still-the-sexiest-job-of-the-21st-century Description: Describes the new role at companies that is coined "Data scientist", what they do, how an organization might recruit one to their organization, and how to work with one effectively. Importance: This publication has been an influence on the data community as mentioned near the time it was published in 2012 by institutions like IEEE Spectrum, but also mentioned nearly a decade later asking the same question the title poses. In a retrospective response to their own publication 10 years earlier, authors Davenport and Patil have reflected that the role of a data scientist has "become better institutionalized, the scope of the job has been redefined, the technology it relies on has made huge strides, and the importance of non-technical expertise, such as ethics and change management, has grown". 50 Years of Data Science Author: David Donoho Publication data: Online version: https://www.tandfonline.com/doi/full/10.1080/10618600.2017.1384734 Description: Retrospective discussion paper on the history and origins of data science, with a number of commentary from notable statisticians. Importance: This has been described as "the first in the field to present such a comprehensive and in-depth survey and overview", and helps to define the field that has many definitions. The Composable Data Management System Manifesto Author: Pedro Pedreira, Orri Erling, Konstantinos Karanasos, Scott Schneider, Wes McKinney, Satya R Valluri, Mohamed Zait, Jacques Nadeau Publication data: Online version: https://www.vldb.org/pvldb/vol16/p2679-pedreira.pdf Description: The vision paper advocating for a paradigm shift in how data management systems are designed using standard, composable, interoperable tools rather than siloed software tools. Importance: A paradigm shifting view on how future data science software tools should be designed for more efficient workflows, the principles of which "will be especially crucial for addressing fragmentation, improving interoperability, and promoting user-centricity as data ecosystems grow increasingly complex". == Data collection and organization == Tidy Data Author: Hadley Wickham Publication data: Online version: https://www.jstatsoft.org/article/view/v059i10/ https://vita.had.co.nz/papers/tidy-data.pdf Description: Describes a framework for data cleaning that is summarized in the quote, "each variable is a column, each observation is a row, and each type of observational unit is a table". This allows a standard data structure for which data analysis tools can be consistently built around. Importance: Cited over 1,500 times, this effort for tidy data has been described by David Donoho as having "more impact on today's practice of data analysis than many highly regarded theoretical statistics articles". In the context of data visualization, this publication is said to support "efficient exploration and prototyping because variables can be assigned different roles in the plot without modifying anything about the original dataset". Data Organization in Spreadsheets Author: Karl W. Broman and Kara H. Woo Publication data: Online version: https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989 Description: This article offers practical recommendations for organizing data in spreadsheets, like Microsoft Excel and Google Sheets, to reduce errors and lower the barrier for later analyses due to limitations in spreadsheets or quirks in the software. Importance: Influences teaching both data and non-data practitioners to create more analysis-friendly spreadsheets, and has been described to outline "spreadsheet best practices". == Data visualizations == Quantitative Graphics in Statistics: A Brief History Author: James R. Beniger and Dorothy L. Robyn Publication data: Online version: https://www.jstor.org/stable/2683467 Description: Outlines history and evolution of quantitative graphics in statistics, going through spatial organization (17th and 18th centuries), discrete comparison (18th and 19th centuries), continuous distribution (19th century), and multivariate distribution and correlation (late 19th and 20th centuries). Importance: Helps put into perspective for learning data practitioners the recency of graphics that are used. A later publication "Graphical Methods in Statistics" by Stephen Fienberg in 1979 writes that his publication "owes much to the work of Beniger and Robyn". == Practice == Data Science for Business Author: Foster Provost and Tom Fawcett Publication data: Online version: N/A Description: Broadly outlines principles of data science and data-analytic thinking for businesses. Importance: Cited over 3,000 times, it is "highly recommended for students" but also it is also recommended due to its "relevance to senior management leaders who want to build and lead a team of data scientists and implement data science in solving complex business problems". == Tooling == Hidden Technical Debt in Machine Learning Systems Author: D. Sculley, Gary Holy, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, Dan Dennison Publication data: Online version: https://proceedings.neurips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf Description: This paper argues that it is "dangerous to think of [complex machine learning] quick wins as coming for free" and overviews risk factors to account for when implementing a machine learning system. Importance: All authors worked for Google, article is cited over 2,000 times, and helped practitioners thinking about quickly implementing a machine learning tool without understanding the long-term maintenance of the tool. A few useful things to know about machine learning Author: Pedro Domingos Publication data: Online version: https://dl.acm.org/doi/10.1145/2347736.2347755 https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf Description: The purpose of this paper is to distill inaccessible "folk knowledge" to effectively implement machine learning projects because "machin

Colloquis

Colloquis, previously known as ActiveBuddy and Conversagent, was a company that created conversation-based interactive agents originally distributed via instant messaging platforms. The company had offices in New York, New York, and Sunnyvale, California. == History == Founded in 2000, the company was the brainchild of Robert Hoffer, Timothy Kay, and Peter Levitan. The idea for interactive agents (also known as Internet bots) came from the team's vision to add functionality to increasingly popular instant messaging services. The original implementation took shape as a word-based adventure game but quickly grew to include a wide range of database applications, including access to news, weather, stock information, movie times, Yellow Pages listings, and detailed sports data, as well as a variety of tools (calculators, translator, etc.). These various applications were bundled into one entity and launched as SmarterChild in 2001. SmarterChild acted as a showcase for the quick data access and possibilities for fun conversation that the company planned to turn into customized, niche-specific products. The rapid success of SmarterChild led to targeted promotional products for Radiohead, Austin Powers, The Sporting News, and others. ActiveBuddy sought to strengthen its hold on the interactive agent market for the future by filing for, and receiving, a controversial patent on their creation in 2002. The company also released the BuddyScript SDK, a free developer kit that allow programmers to design and launch their own interactive agents using ActiveBuddy's proprietary scripting language, in 2002. Ultimately, however, the decline in ad spending in 2001 and 2002 led to a shift in corporate strategy towards business focused Automated Service Agents, building products for clients including Cingular, Comcast and Cox Communications. The company subsequently changed its name from ActiveBuddy to Conversagent in 2003, and then again to Colloquis in 2006. Colloquis was purchased by Microsoft in October 2006.

Structure mapping engine

In artificial intelligence and cognitive science, the structure mapping engine (SME) is an implementation in software of an algorithm for analogical matching based on the psychological theory of Dedre Gentner. The basis of Gentner's structure-mapping idea is that an analogy is a mapping of knowledge from one domain (the base) into another (the target). The structure-mapping engine is a computer simulation of the analogy and similarity comparisons. The theory is useful because it ignores surface features and finds matches between potentially very different things if they have the same representational structure. For example, SME could determine that a pen is like a sponge because both are involved in dispensing liquid, even though they do this very differently. == Structure mapping theory == Structure mapping theory is based on the systematicity principle, which states that connected knowledge is preferred over independent facts. Therefore, the structure mapping engine should ignore isolated source-target mappings unless they are part of a bigger structure. The SME, the theory goes, should map objects that are related to knowledge that has already been mapped. The theory also requires that mappings be done one-to-one, which means that no part of the source description can map to more than one item in the target and no part of the target description can be mapped to more than one part of the source. The theory also requires that if a match maps subject to target, the arguments of subject and target must also be mapped. If both these conditions are met, the mapping is said to be "structurally consistent." == Concepts in SME == SME maps knowledge from a source into a target. SME calls each description a dgroup. Dgroups contain a list of entities and predicates. Entities represent the objects or concepts in a description — such as an input gear or a switch. Predicates are one of three types and are a general way to express knowledge for SME. Relation predicates contain multiple arguments, which can be other predicates or entities. An example relation is: (transmit (what from to)). This relation has a functor transmit and takes three arguments: what, from, and to. Attribute predicates are the properties of an entity. An example of an attribute is (red gear) which means that gear has the attribute red. Function predicates map an entity into another entity or constant. An example of a function is (joules power source) which maps the entity power source onto the numerical quantity joules. Functions and attributes have different meanings, and consequently SME processes them differently. For example, in SME's true analogy rule set, attributes differ from functions because they cannot match unless there is a higher-order match between them. The difference between attributes and functions will be explained further in this section's examples. All predicates have four parameters. They have (1) a functor, which identifies it, and (2) a type, which is either relation, attribute, or function. The other two parameters (3 and 4) are for determining how to process the arguments in the SME algorithm. If the arguments have to be matched in order, commutative is false. If the predicate can take any number of arguments, N-ary is false. An example of a predicate definition is: (sme:defPredicate behavior-set (predicate) relation :n-ary? t :commutative? t) The predicate's functor is “behavior-set,” its type is “relation,” and its n-ary and commutative parameters are both set to true. The “(predicate)” part of the definition specifies that there will be one or more predicates inside an instantiation of behavior-set. == Algorithm details == The algorithm has several steps. The first step of the algorithm is to create a set of match hypotheses between source and target dgroups. A match hypothesis represents a possible mapping between any part of the source and the target. This mapping is controlled by a set of match rules. By changing the match rules, one can change the type of reasoning SME does. For example, one set of match rules may perform a kind of analogy called literal similarity, and another performs a kind of analogy called true-analogy. These rules are not the place where domain-dependent information is added, but rather where the analogy process is tweaked, depending on the type of cognitive function the user is trying to emulate. For a given match rule, there are two types of rules that further define how it will be applied: filter rules and intern rules. Intern rules use only the arguments of the expressions in the match hypotheses that the filter rules identify. This limitation makes the processing more efficient by constraining the number of match hypotheses that are generated. At the same time, it also helps to build the structural consistencies that are needed later on in the algorithm. An example of a filter rule from the true-analogy rule set creates match hypotheses between predicates that have the same functor. The true-analogy rule set has an intern rule that iterates over the arguments of any match hypothesis, creating more match hypotheses if the arguments are entities or functions, or if the arguments are attributes and have the same functor. In order to illustrate how the match rules produce match hypotheses consider these two predicates: transmit torque inputgear secondgear (p1) transmit signal switch div10 (p2) Here we use true analogy for the type of reasoning. The filter match rule generates a match between p1 and p2 because they share the same functor, transmit. The intern rules then produce three more match hypotheses: torque to signal, inputgear to switch, and secondgear to div10. The intern rules created these match hypotheses because all the arguments were entities. If the arguments were functions or attributes instead of entities, the predicates would be expressed as: transmit torque (inputgear gear) (secondgear gear) (p3) transmit signal (switch circuit) (div10 circuit) (p4) These additional predicates make inputgear, secondgear, switch, and div10 functions or attributes depending on the value defined in the language input file. The representation also contains additional entities for gear and circuit. Depending on what type inputgear, secondgear, switch, and div10 are, their meanings change. As attributes, each one is a property of the gear or circuit. For example, the gear has two attributes, inputgear and secondgear. The circuit has two attributes, switch and circuit. As functions inputgear, secondgear, switch, and div10 become quantities of the gear and circuit. In this example, the functions inputgear and secondgear now map to the numerical quantities “torque from inputgear” and “torque from secondgear,” For the circuit the quantities map to logical quantity “switch engaged” and the numerical quantity “current count on the divide by 10 counter.” SME processes these differently. It does not allow attributes to match unless they are part of a higher-order relation, but it does allow functions to match, even if they are not part of such a relation. It allows functions to match because they indirectly refer to entities and thus should be treated like relations that involve no entities. However, as next section shows, the intern rules assign lower weights to matches between functions than to matches between relations. The reason SME does not match attributes is because it is trying to create connected knowledge based on relationships and thus satisfy the systematicity principle. For example, if both a clock and a car have inputgear attributes, SME will not mark them as similar. If it did, it would be making a match between the clock and car based on their appearance — not on the relationships between them. When the additional predicates in p3 and p4 are functions, the results from matching p3 and p4 are similar to the results from p1 and p2 except there is an additional match between gear and circuit and the values for the match hypotheses between (inputgear gear) and (switch circuit), and (secondgear gear) and (div10 circuit), are lower. The next section describes the reason for this in more detail. If the inputgear, secondgear, switch, and div10 are attributes instead of entities, SME does not find matches between any of the attributes. It finds matches only between the transmit predicates and between torque and signal. Additionally, the structural-evaluation scores for the remaining two matches decrease. In order to get the two predicates to match, p3 would need to be replaced by p5, which is demonstrated below. transmit torque (inputgear gear) (div10 gear) (p5) Since the true-analogy rule set identifies that the div10 attributes are the same between p5 and p4 and because the div10 attributes are both part of the higher-relation match between torque and signal, SME makes a match between (div10 gear) and (div10 circuit) — which leads to a match between gear and circuit. Being part of a higher-order match is a requiremen

SmartAction

SmartAction Company LLC is a U.S.-based software company that develops artificial intelligence–driven virtual agents for customer service applications, including voice-based interactive voice response (IVR) systems, chat, and SMS. The company was founded in 2009 by inventor and entrepreneur Peter Voss and is headquartered in Fort Worth, Texas. == History == In 2001, Peter Voss founded Adaptive AI, Inc., a research and development company focused on artificial intelligence concepts. In 2009, Voss founded SmartAction Company, LLC to commercialize customer-service automation software derived from this work. The company’s initial products focused on automating inbound and outbound calls for contact center environments. In November 2022, Kyle Johnson was appointed chief executive officer, succeeding Gary Davis, who had served as CEO since 2020. In 2024, SmartAction was acquired by Capacity, an AI-powered customer support automation company based in St. Louis, Missouri. == Technology == SmartAction develops cloud-based voice automation software that integrates speech recognition and natural language processing to support automated customer interactions in contact center environments. The platform supports automated handling of common customer service tasks and is designed to integrate with enterprise systems.

Noise-based logic

Noise-based logic (NBL) is a class of multivalued deterministic logic schemes, developed in the twenty-first century, where the logic values and bits are represented by different realizations of a stochastic process. The concept of noise-based logic and its name was created by Laszlo B. Kish. In its foundation paper it is noted that the idea was inspired by the stochasticity of brain signals and by the unconventional noise-based communication schemes, such as the Kish cypher. == The noise-based logic space and hyperspace == The logic values are represented by multi-dimensional "vectors" (orthogonal functions) and their superposition, where the orthogonal basis vectors are independent noises. By the proper combination (products or set-theoretical products) of basis-noises, which are called noise-bit, a logic hyperspace can be constructed with D(N) = 2N number of dimensions, where N is the number of noise-bits. Thus N noise-bits in a single wire correspond to a system of 2N classical bits that can express 22N different logic values. Independent realizations of a stochastic process of zero mean have zero cross-correlation with each other and with other stochastic processes of zero mean. Thus the basis noise vectors are orthogonal not only to each other but they and all the noise-based logic states (superpositions) are orthogonal also to any background noises in the hardware. Therefore, the noise-based logic concept is robust against background noises, which is a property that can potentially offer a high energy-efficiency. == The types of signals used in noise-based logic == In the paper, where noise-based logic was first introduced, generic stochastic-processes with zero mean were proposed and a system of orthogonal sinusoidal signals were also proposed as a deterministic-signal version of the logic system. The mathematical analysis about statistical errors and signal energy was limited to the cases of Gaussian noises and superpositions as logic signals in the basic logic space and their products and superpositions of their products in the logic hyperspace (see also. In the subsequent brain logic scheme, the logic signals were (similarly to neural signals) unipolar spike sequences generated by a Poisson process, and set-theoretical unifications (superpositions) and intersections (products) of different spike sequences. Later, in the instantaneous noise-based logic schemes and computation works, random telegraph waves (periodic time, bipolar, with fixed absolute value of amplitude) were also utilized as one of the simplest stochastic processes available for NBL. With choosing unit amplitude and symmetric probabilities, the resulting random-telegraph wave has 0.5 probability to be in the +1 or in the −1 state which is held over the whole clock period. == The noise-based logic gates == Noise-based logic gates can be classified according to the method the input identifies the logic value at the input. The first gates analyzed the statistical correlations between the input signal and the reference noises. The advantage of these is the robustness against background noise. The disadvantage is the slow speed and higher hardware complexity. The instantaneous logic gates are fast, they have low complexity but they are not robust against background noises. With either neural spike type signals or with bipolar random-telegraph waves of unity absolute amplitude, and randomness only in the sign of the amplitude offer very simple instantaneous logic gates. Then linear or analog devices unnecessary and the scheme can operate in the digital domain. However, whenever instantaneous logic must be interfaced with classical logic schemes, the interface must use correlator-based logic gates for an error-free signal. == Universality of noise-based logic == All the noise-based logic schemes listed above have been proven universal. The papers typically produce the NOT and the AND gates to prove universality, because having both of them is a satisfactory condition for the universality of a Boolean logic. == Computation by noise-based logic == The string verification work over a slow communication channel shows a powerful computing application where the methods is inherently based on calculating the hash function. The scheme is based on random telegraph waves and it is mentioned in the paper that the authors intuitively conclude that the intelligence of the brain is using similar operations to make a reasonably good decision based on a limited amount of information. The superposition of the first D(N) = 2N integer numbers can be produced with only 2N operations, which the authors call "Achilles ankle operation" in the paper. == Computer chip realization of noise-based logic == Preliminary schemes have already been published to utilize noise-based logic in practical computers. However, it is obvious from these papers that this young field has yet a long way to go before it will be seen in everyday applications.

Deep tomographic reconstruction

Deep Tomographic Reconstruction is a set of methods for using deep learning methods to perform tomographic reconstruction of medical and industrial images. It uses artificial intelligence and machine learning, especially deep artificial neural networks or deep learning, to overcome challenges such as measurement noise, data sparsity, image artifacts, and computational inefficiency. This approach has been applied across various imaging modalities, including CT, MRI, PET, SPECT, ultrasound, and optical imaging == Historical background == Traditional tomographic reconstruction relies on analytic methods such as filtered back-projection, or iterative methods which incrementally compute inverse transformations from measurement data (e.g., Radon or Fourier transform data). However, these approaches are not sufficient for certain imaging techniques such as low-dose CT and fast MRI, or scenarios involving metal artifacts and patient motion. == Use in imaging modalities == === Computed tomography (CT) === In CT, deep learning models can be particularly effective in reducing radiation exposure while maintaining image quality. Deep neural networks can also be able to reconstruct images of fair quality from sparsely sampled data without sacrificing diagnostic performance. Deep learning-based generative AI models can reduce CT metal artifacts. === Magnetic resonance imaging (MRI) === In magnetic resonance imaging (MRI), deep learning can lead to reduced MRI motion artifacts, and increased acquisition speed, referred to as fast MRI. Despite suffering from disadvantages such as lower signal-to-noise ratio (SNR), deep learning can enhance image quality in low field MRI, making these systems clinically viable. === Positron emission tomography (PET) and single-photon emission CT (SPECT) === For PET imaging, deep learning models can provide substantial improvements in low-dose imaging and motion artifact correction. Also, deep learning can help SPECT for generation of attenuation background. A notable technique for PET denoising involves integrating MR data through multimodal networks, which use anatomical information from MRI to enhance PET image quality. === Ultrasound imaging === Deep learning can enhance ultrasound imaging by reducing speckle noise and motion blur. For ultrasound beamforming, deep neural networks can allow superior image quality with limited data at high speed. === Optical imaging and microscopy === Diffuse optical tomography, optical coherence tomography and microscopy can be improved by deep neural networks beyond traditional methods. Furthermore, deep learning can also enhance Photoacoustic imaging (see Deep learning in photoacoustic imaging), addressing challenges like high noise, low contrast, and limited resolution. Deep learning has also been applied to label-free live-cell imaging, where convolutional neural networks predict fluorescence labels from transmitted light images, a technique known as in silico labeling. This method can enable high-throughput, non-invasive cell analysis and phenotyping without the need for traditional fluorescent dyes.

Roborace

Roborace was a competition with autonomously driving, electrically powered vehicles. Founded in 2015 by Denis Sverdlov, it aimed to be the first global championship for autonomous cars. From 2017 to 2019, the official CEO was 2016–17 Formula E champion, Lucas Di Grassi, who later became a member of Roborace’s supervisory board. The series tested their technology and race formats at FIA Formula E Championship events during 2016–2018. In 2019 Roborace organized Season Alpha, which consisted of 4 trial racing events with several independent teams competing against each other for the first time. In 2020–21 Roborace held Season Beta with 7 competing teams. All teams utilized the same chassis and powertrain, but they had to develop their own real-time computing algorithms and artificial intelligence technologies. In May 2022, Arrival, the owner of Roborace, confirmed that they were no longer continuing the Roborace programme, but that they were hoping to find alternative funding. In February 2024, after getting its stock delisted from the Nasdaq, Arrival's UK division entered administration, with future plans of a sale of Arrival and all of its affiliated assets. == Cars == === Robocar === The world's first purpose-built autonomous racing car, Robocar, was designed by Daniel Simon, who previously worked on vehicles for movies such as Tron: Legacy and Oblivion, as well as designing the livery for the 2011 HRT Formula One car. Michelin is the official tyre supplier, and the internal computing processors (Drive PX 2) are Nvidia. The chassis itself is shaped like a teardrop, improving aerodynamic efficiency. The car weighs around 1350 kg and is 4.8 metres (16 ft) long and 2 metres (6.6 ft) wide. It has four electric motors, each with a power of 135 kW producing over 500 hp combined, and utilizes a 840V battery. For navigation, it relies on a mixture of optical systems, radars, lidars and ultrasonic sensors. The vehicle has been demonstrated at speeds of almost 300 km/h (190 mph). === DevBot === Development of the Robocar started in early 2016, with a first outing of a test vehicle, the so-called DevBot, following in the summer of the same year. The test car consisted of the same internal units (battery, motor, electronics) used in the Robocar, but were placed in the chassis of a Ginetta LMP3 car without an engine cover in order to provide better cooling and access. DevBot saw its first public outing at the Formula E pre-season tests in Donington Park in August 2016. After battery issues in Hong Kong caused the development team to abandon their demonstration run, the DevBot successfully drove twelve laps around the Moulay El Hassan Formula E circuit in Marrakesh. Other test tracks included Michelin's testing ground in Ladoux and the Silverstone Stowe Circuit. During testing ahead of the 2017 Buenos Aires ePrix, two DevBot cars raced against each other autonomously, resulting in one of the vehicles crashing on a corner. During the 2017–18 Formula E season, Roborace pitched pro-drifter Ryan Tuerck against a DevBot at the Rome ePrix. At the Berlin ePrix, Roborace held the Human + Machine Challenge, the first race for combined teams of human drivers and AIs using a pair of Devbots. === DevBot 2.0 === An upgraded version of DevBot was announced in late 2018, and after private testing made its public debut in 2019 at the inaugural Season Alpha event. DevBot 2.0 uses the same technology as both Robocar and DevBot, with the main changes being a conversion to being driven on the rear axle only, a lower position for the driver for safety reasons and a bespoke composite bodywork. == Seasons == === Testing === ==== 2016–17 Formula E season ==== Roborace appeared at a number of Formula E events during the 2016–17 Formula E season. However, in this period only test drives with two different DevBots took place. Within the framework of the 2017 Buenos Aires ePrix both DevBot vehicles drove against each other on a race track for the first time. There were also DevBot demonstrations at the 2016 Marrakesh ePrix, 2017 Berlin ePrix, 2017 New York City ePrix and 2017 Montreal ePrix. At the 2017 Paris ePrix, the developers also let a Robocar onto the track for the first time, even though the vehicle only drove the track at walking speed. ==== 2017–18 Formula E season ==== At the start of the 2017/18 Formula E season, the Roborace developers once again tested the DevBot during a public time trial between the Roborace CI and the TV presenter Nicki Shields at the 2017 Hong Kong ePrix. As part of a similar time trial at the 2018 Rome ePrix, drift professional Ryan Tuerck also tested the DevBot. The Human + Machine Challenge was created for the Formula E race on the Berlin ePrix. A team of doctoral students from the Technical University of Munich (TUM) and the University of Pisa programmed the software for the Devbot to drive autonomously around the circuit in Berlin. Afterwards both teams in combination with a human driver competed in a public time trial. The vehicle of the team of the Technical University of Munich finished the Human + Machine Challenge with an average lap time of 91.59 seconds, almost four seconds faster than that of the University of Pisa with 95.36 seconds and thus won the Challenge. At the Goodwood Festival of Speed, Robocar became the first ever fully autonomous race car to complete the Goodwood Hill Climb. The vehicle completed the first official autonomous run on 13 July 2018 within the framework of the event. === Season Alpha (2019) === Season Alpha took place at various locations in Europe and North America with the aim of testing several competition formats using the new DevBot 2.0. The first event was held at the Circuito Monteblanco in Spain, and featured the first race between two fully autonomous cars. The events were not broadcast live, instead short clips on YouTube were released. Two teams were competing: Arrival and the Technical University of Munich. On 7 July 2019, the Roborace DevBot 2.0 car set the first ever autonomous official timed run at Goodwood Festival of Speed, with a time of 66.96 s and a top speed of 162.8 km/h (101.2 mph). This is currently the record for autonomous vehicles. Roborace also set the Guinness World Record for having the fastest autonomous car in the world. The Robocar reached a speed of 282.42 km/h (175.49 mph). === Season Beta (2020–21) === The second testing season took place at various locations between September 2020 and October 2021, featuring 16 races and involving mixed reality elements dubbed "Roborace Metaverse", which is based on Roborace's patented technology. The program of Season Beta competitions has gradually complicating rules arranged in a progression of so-called missions. Each mission consists of two racing rounds — one round per day. A mission plan issued by Roborace for each mission defines its objectives, rules, and point-scoring system. The key objective of Season Beta is to come to the point when the majority of competing teams have developed sufficient capability for wheel-to-wheel racing in Season 1. There were 7 teams competing in Season Beta: Arrival Racing (UK/Russia), Autonomous Racing Graz (Austria), MIT Driverless (United States), Acronis SIT (Switzerland), University of Pisa (Italy), PoliMOVE (Italy), CMU (United States).