Digital journalism

Digital journalism, also known as netizen journalism or online journalism, is a contemporary form of journalism where editorial content is distributed via the Internet, as opposed to publishing via print or broadcast. What constitutes digital journalism is debated amongst scholars. However, the primary product of journalism, which is news and features on current affairs, is presented solely or in combination as text, audio, video, or some interactive forms like storytelling stories or newsgames and disseminated through digital media technology. Fewer barriers to entry, lowered distribution costs and diverse computer networking technologies have led to the widespread practice of digital journalism. It has democratized the flow of information that was previously controlled by traditional media including newspapers, magazines, radio and television. Most readers expect online journalists to be reliable and competent, but these journalists often fail to meet this standard because they have very short deadlines and do not have enough resources to produce decent work. Some have asserted that a greater degree of creativity can be exercised with digital journalism when compared to traditional journalism and traditional media. The digital aspect may be central to the journalistic message and remains, to some extent, within the creative control of the writer, editor and/or publisher. It has been acknowledged that reports of its growth have tended to be exaggerated. In fact, a 2019 Pew survey showed a 16% decline in the time spent on online news sites since 2016. In the United States, reports issued by the Federal Communications Commission (FCC) in 2011 and by the Government Accountability Office (GAO) and the Congressional Research Service (CRS) in 2023 found that increases in newsroom staffing at digital-native news websites from 2008 to 2020 were not offsetting cuts in newsroom staffing among newspapers (which numbered in the tens of thousands of jobs), and that newspapers and television (which had been seeing declining newsroom staffing alongside newspapers) still employed the majority of payrolled newsroom staff in the United States in 2022 while online-only news websites employed less than 10%. The GAO and CRS reports noted further that the reduction in subscription and advertising revenue for the U.S. newspaper industry from 2000 to 2020 that constituted the overwhelming majority of its inflation-adjusted total revenue was not being offset by digital circulation or online advertising despite almost two-thirds of U.S. advertising spending in total by 2020 being online. Also, while the FCC report noted that local television stations in the United States had become some of the largest providers of local news online, the FCC found in a 2021 working paper that inflation-adjusted advertising revenue for television stations fell nationally from 2010 to 2018. == Overview == Digital journalism flows as journalism flows and is difficult to pinpoint where it is and where it is going. In partnership with digital media, digital journalism uses facets of digital media to perform journalist tasks, for example, using the internet as a tool rather than a singular form of digital media. There is no absolute agreement as to what constitutes digital journalism. Mu Lin argues that, "Web and mobile platforms demand us to adopt a platform-free mindset for an all-inclusive production approach – create the [digital] contents first, then distribute via appropriate platforms." The repurposing of print content for an online audience is sufficient for some, while others require content created with the digital medium's unique features like hypertextuality. Fondevila Gascón adds multimedia and interactivity to complete the digital journalism essence. For Deuze, online journalism can be functionally differentiated from other kinds of journalism by its technological component which journalists have to consider when creating or displaying content. Digital journalistic work may range from purely editorial content like CNN (produced by professional journalists) online to public-connectivity websites like Slashdot (communication lacking formal barriers of entry). The difference of digital journalism from traditional journalism may be in its re-conceptualised role of the reporter in relation to audiences and news organizations. The expectations of society for instant information was important for the evolution of digital journalism. However, it is likely that the exact nature and roles of digital journalism will not be fully known for some time. Some researchers even argue that the free distribution of online content, online advertisement and the new way recipients use news could undermine the traditional business model of mass media distributors that is based on single-copy sales, subscriptions and the selling of advertisement space. == History == The first type of digital journalism, called teletext, was invented in the UK in 1970. Teletext is a system allowing viewers to choose which stories they wish to read and see it immediately. The information provided through teletext is brief and instant, similar to the information seen in digital journalism today. The information was broadcast between the frames of a television signal in what was called the vertical blanking interval or VBI. American journalist Hunter S. Thompson relied on early digital communication technology beginning by using a fax machine to report from the 1971 US presidential campaign trail as documented in his book Fear and Loathing on the Campaign Trail. After the invention of teletext was the invention of videotex, of which Prestel was the world's first system, launching commercially in 1979 with various British newspapers, such as the Financial Times lining up to deliver newspaper stories online through it. Videotex closed down in 1986 due to failing to meet end-user demand. American newspaper companies took notice of the new technology and created their own videotex systems, the largest and most ambitious being Viewtron, a service of Knight-Ridder launched in 1981. Others were Keycom in Chicago and Gateway in Los Angeles. All of them had closed by 1986. Next came computer Bulletin Board Systems. In the late 1980s and early 1990s, several smaller newspapers started online news services using BBS software and telephone modems. The first of these was the Albuquerque Tribune in 1989. Computer Gaming World in September 1992 broke the news of Electronic Arts' acquisition of Origin Systems on Prodigy, before its next issue went to press. Online news websites began to proliferate in the 1990s. An early adopter was The News & Observer in Raleigh, North Carolina which offered online news as Nando. Steve Yelvington wrote on the Poynter Institute website about Nando, owned by The N&O, by saying "Nando evolved into the first serious, professional news site on the World Wide Web". It originated in the early 1990s as "NandO Land". It is believed that a major increase in digital online journalism occurred around this time when the first commercial web browsers, Netscape Navigator (1994) and Internet Explorer (1995). By 1996, most news outlets had an online presence. Although journalistic content was repurposed from original text/video/audio sources without change in substance, it could be consumed in different ways because of its online form through toolbars, topically grouped content, and intertextual links. A twenty-four-hour news cycle and new ways of user-journalist interaction web boards were among the features unique to the digital format. Later, portals such as AOL and Yahoo! and their news aggregators (sites that collect and categorize links from news sources) led to news agencies such as The Associated Press to supplying digitally suited content for aggregation beyond the limit of what client news providers could use in the past. Also, Salon, was founded in 1995. In 2001, the American Journalism Review called Salon the Internet's "preeminent independent venue for journalism." In 2008, for the first time, more Americans reported getting their national and international news from the internet, rather than newspapers. Young people aged 18 to 29 now primarily get their news via the Internet, according to a Pew Research Center report. Audiences to news sites continued to grow due to the launch of new news sites, continued investment in news online by conventional news organizations, and the continued growth in internet audiences overall. Sixty-five percent of youth now primarily access the news online. Mainstream news sites are the most widespread form of online news media production. As of 2000, the vast majority of journalists in the Western world now use the internet regularly in their daily work. In addition to mainstream news sites, digital journalism is found in index and category sites (sites without much original content but multiple links to existing news sites), meta- and comment sites (sites about

Connected-component labeling

Connected-component labeling (CCL), connected-component analysis (CCA), blob extraction, region labeling, blob discovery, or region extraction is an algorithmic application of graph theory, where subsets of connected components are uniquely labeled based on a given heuristic. Connected-component labeling is not to be confused with segmentation. Connected-component labeling is used in computer vision to detect connected regions in binary digital images, although color images and data with higher dimensionality can also be processed. When integrated into an image recognition system or human-computer interaction interface, connected component labeling can operate on a variety of information. Blob extraction is generally performed on the resulting binary image from a thresholding step, but it can be applicable to gray-scale and color images as well. Blobs may be counted, filtered, and tracked. Blob extraction is related to but distinct from blob detection. == Overview == A graph, containing vertices and connecting edges, is constructed from relevant input data. The vertices contain information required by the comparison heuristic, while the edges indicate connected 'neighbors'. An algorithm traverses the graph, labeling the vertices based on the connectivity and relative values of their neighbors. Connectivity is determined by the medium; image graphs, for example, can be 4-connected neighborhood or 8-connected neighborhood. Following the labeling stage, the graph may be partitioned into subsets, after which the original information can be recovered and processed . == Definition == The usage of the term connected-component labeling (CCL) and its definition is quite consistent in the academic literature, whereas connected-component analysis (CCA) varies both in terminology and in its definition of the problem. Rosenfeld et al. define connected components labeling as the “[c]reation of a labeled image in which the positions associated with the same connected component of the binary input image have a unique label.” Shapiro et al. define CCL as an operator whose “input is a binary image and [...] output is a symbolic image in which the label assigned to each pixel is an integer uniquely identifying the connected component to which that pixel belongs.” There is no consensus on the definition of CCA in the academic literature. It is often used interchangeably with CCL. A more extensive definition is given by Shapiro et al.: “Connected component analysis consists of connected component labeling of the black pixels followed by property measurement of the component regions and decision making.” The definition for connected-component analysis presented here is more general, taking the thoughts expressed in into account. == Algorithms == The algorithms discussed can be generalised to arbitrary dimensions, albeit with increased time and space complexity. === One component at a time === This is a fast and very simple method to implement and understand. It is based on graph traversal methods in graph theory. In short, once the first pixel of a connected component is found, all the connected pixels of that connected component are labelled before going onto the next pixel in the image. This algorithm is part of Vincent and Soille's watershed segmentation algorithm, other implementations also exist. In order to do that a linked list is formed that will keep the indexes of the pixels that are connected to each other, steps (2) and (3) below. The method of defining the linked list specifies the use of a depth or a breadth first search. For this particular application, there is no difference which strategy to use. The simplest kind of a last in first out queue implemented as a singly linked list will result in a depth first search strategy. It is assumed that the input image is a binary image, with pixels being either background or foreground and that the connected components in the foreground pixels are desired. The algorithm steps can be written as: Start from the first pixel in the image. Set current label to 1. Go to (2). If this pixel is a foreground pixel and it is not already labelled, give it the current label and add it as the first element in a queue, then go to (3). If it is a background pixel or it was already labelled, then repeat (2) for the next pixel in the image. Pop out an element from the queue, and look at its neighbours (based on any type of connectivity). If a neighbour is a foreground pixel and is not already labelled, give it the current label and add it to the queue. Repeat (3) until there are no more elements in the queue. Go to (2) for the next pixel in the image and increment current label by 1. Note that the pixels are labelled before being put into the queue. The queue will only keep a pixel to check its neighbours and add them to the queue if necessary. This algorithm only needs to check the neighbours of each foreground pixel once and doesn't check the neighbours of background pixels. The pseudocode is: algorithm OneComponentAtATime(data) input : imageData[xDim][yDim] initialization : label = 0, labelArray[xDim][yDim] = 0, statusArray[xDim][yDim] = false, queue1, queue2; for i = 0 to xDim do for j = 0 to yDim do if imageData[i][j] has not been processed do if imageData[i][j] is a foreground pixel do check its four neighbors(north, south, east, west) : if neighbor is not processed do if neighbor is a foreground pixel do add it to queue1 else update its status to processed end if labelArray[i][j] = label (give label) statusArray[i][j] = true (update status) while queue1 is not empty do For each pixel in the queue do : check its four neighbors if neighbor is not processed do if neighbor is a foreground pixel do add it to queue2 else update its status to processed end if give it the current label update its status to processed remove the current element from queue1 copy queue2 into queue1 end While increase the label end if else update its status to processed end if end if end if end for end for === Two-pass === Relatively simple to implement and understand, the two-pass algorithm, (also known as the Hoshen–Kopelman algorithm) iterates through 2-dimensional binary data. The algorithm makes two passes over the image: the first pass to assign temporary labels and record equivalences, and the second pass to replace each temporary label by the smallest label of its equivalence class. The input data can be modified in situ (which carries the risk of data corruption), or labeling information can be maintained in an additional data structure. Connectivity checks are carried out by checking neighbor pixels' labels (neighbor elements whose labels are not assigned yet are ignored), or say, the north-east, the north, the north-west and the west of the current pixel (assuming 8-connectivity). 4-connectivity uses only north and west neighbors of the current pixel. The following conditions are checked to determine the value of the label to be assigned to the current pixel (4-connectivity is assumed) Conditions to check: Does the pixel to the left (west) have the same value as the current pixel? Yes – We are in the same region. Assign the same label to the current pixel No – Check next condition Do both pixels to the north and west of the current pixel have the same value as the current pixel but not the same label? Yes – We know that the north and west pixels belong to the same region and must be merged. Assign the current pixel the minimum of the north and west labels, and record their equivalence relationship No – Check next condition Does the pixel to the left (west) have a different value and the one to the north the same value as the current pixel? Yes – Assign the label of the north pixel to the current pixel No – Check next condition Do the pixel's north and west neighbors have different pixel values than current pixel? Yes – Create a new label id and assign it to the current pixel The algorithm continues this way, and creates new region labels whenever necessary. The key to a fast algorithm, however, is how this merging is done. This algorithm uses the union-find data structure which provides excellent performance for keeping track of equivalence relationships. Union-find essentially stores labels which correspond to the same blob in a disjoint-set data structure, making it easy to remember the equivalence of two labels by the use of an interface method E.g.: findSet(l). findSet(l) returns the minimum label value that is equivalent to the function argument 'l'. Once the initial labeling and equivalence recording is completed, the second pass merely replaces each pixel label with its equivalent disjoint-set representative element. A faster-scanning algorithm for connected-region extraction is presented below. On the first pass: Iterate through each element of the data by column, then by row (Raster Scanning) If the element is not the background Get the neighboring elements of the current element If there are no neighbors, uniquely

Shyster (expert system)

SHYSTER is a legal expert system developed at the Australian National University in Canberra in 1993. It was written as the doctoral dissertation of James Popple under the supervision of Robin Stanton, Roger Clarke, Peter Drahos, and Malcolm Newey. A full technical report of the expert system, and a book further detailing its development and testing have also been published. SHYSTER emphasises its pragmatic approach, and posits that a legal expert system need not be based upon a complex model of legal reasoning in order to produce useful advice. Although SHYSTER attempts to model the way in which lawyers argue with cases, it does not attempt to model the way in which lawyers decide which cases to use in those arguments. SHYSTER is of a general design, permitting its operation in different legal domains. It was designed to provide advice in areas of case law that have been specified by a legal expert using a bespoke specification language. Its knowledge of the law is acquired, and represented, as information about cases. It produces its advice by examining, and arguing about, the similarities and differences between cases. It derives its name from Shyster: a slang word for someone who acts in a disreputable, unethical, or unscrupulous way, especially in the practice of law and politics. == Methods == SHYSTER is a specific example of a general category of legal expert systems, broadly defined as systems that make use of artificial intelligence (AI) techniques to solve legal problems. Legal AI systems can be divided into two categories: legal retrieval systems and legal analysis systems. SHYSTER belongs to the latter category of legal analysis systems. Legal analysis systems can be further subdivided into two categories: judgment machines and legal expert systems. SHYSTER again belongs to the latter category of legal expert systems. A legal expert system, as Popple uses the term, is a system capable of performing at a level expected of a lawyer: "AI systems which merely assist a lawyer in coming to legal conclusions or preparing legal arguments are not here considered to be legal expert systems; a legal expert system must exhibit some legal expertise itself." Designed to operate in more than one legal domain, and be of specific use to the common law of Australia, SHYSTER accounts for statute law, case law, and the doctrine of precedent in areas of private law. Whilst it accommodates statute law, it is primarily a case-based system, in contradistinction to rule-based systems like MYCIN. More specifically, it was designed in a manner enabling it to be linked with a rule-based system to form a hybrid system. Although case-based reasoning possesses an advantage over rule-based systems by the elimination of complex semantic networks, it suffers from intractable theoretical obstacles: without some further theory it cannot be predicted what features of a case will turn out to be relevant. Users of SHYSTER therefore require some legal expertise. Richard Susskind argues that "jurisprudence can and ought to supply the models of law and legal reasoning that are required for computerized [sic] implementation in the process of building all expert systems in law". Popple, however, believes jurisprudence is of limited value to developers of legal expert systems. He posits that a lawyer must have a model of the law (maybe unarticulated) which includes assumptions about the nature of law and legal reasoning, but that model need not rest on basic philosophical foundations. It may be a pragmatic model, developed through experience within the legal system. Many lawyers perform their work with little or no jurisprudential knowledge, and there is no evidence to suggest that they are worse, or better, at their jobs than lawyers well-versed in jurisprudence. The fact that many lawyers have mastered the process of legal reasoning, without having been immersed in jurisprudence, suggests that it may indeed be possible to develop legal expert systems of good quality without jurisprudential insight. As a pragmatic legal expert system SHYSTER is the embodiment of this belief. A further example of SHYSTER’s pragmatism is its simple knowledge representation structure. This structure was designed to facilitate specification of different areas of case law using a specification language. Areas of case law are specified in terms of the cases and attributes of importance in those areas. SHYSTER weights its attributes and checks for dependence between them. In order to choose cases upon which to construct its opinions, SHYSTER calculates distances between cases and uses these distances to determine which of the leading cases are nearest to the instant case. To this end SHYSTER can be seen to adopt and expand upon nearest neighbor search methods used in pattern recognition. These nearest cases are used to produce an argument (based on similarities and differences between the cases) about the likely outcome in the instant case. This argument relies on the doctrine of precedent; it assumes that the instant case will be decided the same way as was the nearest case. SHYSTER then uses information about these nearest cases to construct a report. The report that SHYSTER generates makes a prediction and justifies that prediction by reference only to cases and their similarities and differences: the calculations that SHYSTER performs in coming to its opinion do not appear in that opinion. Safeguards are employed to warn users if SHYSTER doubts the veracity of its advice. == Results == SHYSTER was tested in four different and disparate areas of case law. Four specifications were written, each representing an area of Australian law: an aspect of the law of trover; the meaning of "authorization [sic]" in copyright law of Australia; the categorisation of employment contracts; and the implication of natural justice in administrative decision-making. SHYSTER was evaluated under five headings: its usefulness, its generality, the quality of its advice, its limitations, and possible enhancements that could be made to it. Despite its simple knowledge representation structure, it has shown itself capable of producing good advice, and its simple structure has facilitated the specification of different areas of law. Appreciating the difficulties encountered by legal expert systems developers in adequately representing legal knowledge can assist in appreciating the shortcomings of digital rights management technologies. Some academics believe future digital rights management systems may become sophisticated enough to permit exceptions to copyright law. To this end SHYSTER's attempt to model "authorization [sic]" in the Copyright Act can be viewed as pioneering work in this field. The term "authorization [sic]" is undefined in the Copyright Act. Consequently, a number of cases have been before the courts seeking answers as to what conduct amounts to authorisation. The main contexts in which the issue has arisen are analogous to permitted exceptions to copyright currently prevented by most digital rights management technologies: "home taping of recorded materials, photocopying in educational institutions and performing works in public". When applied to one case concerning compact cassettes, SHYSTER successfully agreed that Amstrad did not authorise the infringement. 'shyster-myci'n Popple highlighted the most obvious avenue of future research using SHYSTER as the development of a rule-based system, and the linking together of that rule-based system with the existing case-based system to form a hybrid system. This intention was eventually realised by Thomas O’Callaghan, the creator of SHYSTER-MYCIN: a hybrid legal expert system first presented at ICAIL '03, 24–28 June 2003 in Edinburgh, Scotland. MYCIN is an existing medical expert system, which was adapted for use with SHYSTER. MYCIN’s controversial "certainty factor" is not used in SHYSTER-MYCIN. The reason for this is the difficulty in scientifically establishing how certain a fact is in a legal domain. The rule-based approach of the MYCIN part is used to reason with the provisions of an Act of Parliament only. This hybrid system enables the case-based system (SHYSTER) to determine open textured concepts when required by the rule-based system (MYCIN). The ultimate conclusion of this joint endeavour is that a hybrid approach is preferred in the creation of legal expert systems where "it is appropriate to use rule-based reasoning when dealing with statutes, and…case-based reasoning when dealing with cases".

Question (short story)

"Question" is a science fiction short story by American writer Isaac Asimov. The story first appeared in the March 1955 issue of Computers and Automation (thought to be the first computer magazine), and was reprinted in the April 30, 1957, issue of Science World. It is the first of a loosely connected series of stories concerning a fictional supercomputer called Multivac. The story concerns two technicians who are servicing Multivac, and their argument over whether or not the machine is truly intelligent and able to think. Multivac, however, supplies the answer on its own. After the reprint, another author, Robert Sherman Townes, noticed the climax in the last sentence was very similar to one of his own stories, "Problem for Emmy" (Startling Stories, June 1952), and wrote to Asimov about it. After searching in his library, Asimov did find the original story and, although he did not recall having read it, admitted that the endings were pretty similar. He then replied to Townes, apologizing and promising the story would never again be published, and it never was. Asimov mentioned "Question" in an editorial called "Plagiarism" which appeared in the August 1985 issue of Asimov's Science Fiction (although he did not mention Townes' name or the title of either story). "Plagiarism" was reprinted in Asimov's collection Gold (1995).

The Last Question

"The Last Question" is a science fiction short story by American writer Isaac Asimov. It first appeared in the November 1956 issue of Science Fiction Quarterly; and in the anthologies in the collections Nine Tomorrows (1959), The Best of Isaac Asimov (1973), Robot Dreams (1986), The Best Science Fiction of Isaac Asimov (1986), the retrospective Opus 100 (1969), and Isaac Asimov: The Complete Stories, Vol. 1 (1990). While he also considered it one of his best works, "The Last Question" was Asimov's favorite short story of his own authorship, and is one of a loosely connected series of stories concerning a fictional computer called Multivac. Through successive generations, humanity questions Multivac on the subject of entropy. The story blends science fiction, theology, and philosophy. It has been recognized as a counterpoint to Fredric Brown's short short story "Answer", published two years earlier. == History == In conceiving Multivac, Asimov was extrapolating the trend towards centralization that characterized computation technology planning in the 1950s to an ultimate centrally managed global computer. After seeing a planetarium adaptation of his work, Asimov "privately" concluded that the story was his best science fiction yet written. He placed it just higher than "The Ugly Little Boy" (September 1958) and "The Bicentennial Man" (1976). The story asks the question of humanity's fate, and human existence as a whole, highlighting Asimov's focus on important aspects of our future like population growth and environmental issues. "The Last Question" ranks with "Nightfall" (1941) as one of Asimov's best-known and most acclaimed short stories. He wrote in 1973 that he appreciated how easy the story was to write after he had the idea. He was so often approached by fans who remembered the story but not the title, that in one instance he gave the answer, correctly, before the fan had even described the story. == Plot summary == By the year 2061, Multivac, a self-adjusting and self-correcting computer, has allowed mankind to reach beyond the planetary confines of Earth and harness solar energy. Two technicians, Adell and Lupov, celebrate Multivac's role in this development. Over drinks, they discuss that the sun will expire due to the second law of thermodynamics, which states that entropy inevitably increases. When Adell asks Multivac whether this can be reversed, the computer responds that it has insufficient data to answer. In several episodes over ten trillion years, increasingly advanced humans pose the same question to the computers of their time. Each time the computer gives the same response. At the heat death of the universe, the last disembodied consciousness of Man asks the question a final time of a computer that resides in hyperspace before merging with it. After collecting the last data from the dead universe, the computer continues to process it alone and finds an answer to the last question. Having no one to tell it to, it proceeds to demonstrate by saying "LET THERE BE LIGHT!" == Themes == === Philosophy === Although science and religion are frequently presented as having an oppositional relationship, "The Last Question" explores some biblical contexts ("Let there be light"). In Asimov's story, aspects like the great meaning of existence are culminated through both technology and human knowledge. The evolution from Multivac to AC also emulates a sort of cycle of existence. === Dystopian happy ending === Multivac's purpose was conceptualized with a desire for knowledge, promoting the idea that more knowledge will lead to a better and more fruitful future for humanity. However, the computer's answers regarding the future suggest an inevitable exhaustion of the Sun, and this thirst for knowledge becomes an obsession with the future. The story's end displays a dichotomy between annihilation and peace. == Dramatic adaptations == === Planetarium shows === "The Last Question" was first adapted for the Abrams Planetarium at Michigan State University (in 1966), featuring the voice of Leonard Nimoy, as Asimov wrote in his autobiography In Joy Still Felt (1980). It was adapted for the Strasenburgh Planetarium in Rochester, New York (in 1969), under the direction of Ian C. McLennan. It was adapted for the Edmonton Space Sciences Centre in Edmonton, Alberta (early 1970s), under the direction of John Hault. It was adapted for the Gates Planetarium at the Denver Museum of Natural History in 1973 under the direction of Mark B. Peterson It subsequently played at the: Fels Planetarium of the Franklin Institute in Philadelphia in 1973 Planetarium of the Reading School District in Reading, Pennsylvania in 1974 Buhl Planetarium, Pittsburgh in 1974 The Space Transit Planetarium of the Museum of Science in Miami during 1977 Vanderbilt Planetarium in Centerport New York, in 1978, read by singer-songwriter and Long Island resident Harry Chapin. Hansen Planetarium in Salt Lake City, Utah (in 1980 and 1989) A reading of the story was played on BBC Radio 7 in 2008 and 2009. Gates Planetarium in Denver, Colorado (in early 2020) In 1989 Asimov updated the star show adaptation to add in quasars and black holes. The story was adapted as a comic book by Don Thompson and drawn by John Estes in the third issue of ORBiT.

Dataset shift

Dataset shift is a phenomenon in machine learning and statistics in which the joint distribution of input variables and target labels is different in the training phase and the deployment or test phase (i.e., P t r a i n ( X , Y ) ≠ P t e s t ( X , Y ) {\displaystyle P_{train}(X,Y)\neq P_{test}(X,Y)} ). This happens when the statistical properties of data used to train a model are no longer representative of the data encountered in real-world use, often resulting in degraded predictive performance and diminished generalization ability. Dataset shift is a generic term for a number of particular types of distributional change. Covariate shift is when the distribution of the input features changes, but the conditional relationship between inputs and outputs remains constant . Prior probability shift (or label shift) happens when the distribution of target labels changes, but the conditional distribution of inputs given labels stays the same. Concept shift (also known as concept drift) is the change of the conditional relationship between inputs and outputs that renders previously learned patterns invalid over time. A key challenge for deploying machine learning systems is dataset shift, in particular in dynamic environments where the data distributions change over time. Detecting and mitigating such shifts is an active area of research, e.g., drift detection, domain adaptation, continual learning.

Willy's Chocolate Experience

Willy's Chocolate Experience was an unlicensed event based on Charlie and the Chocolate Factory that took place in Glasgow, Scotland, in February 2024. The event was promoted as an immersive and interactive family experience, illustrated on a promotional website with "dreamlike" AI-generated images. Once it was discovered that the event was held in a sparsely decorated warehouse, many customers complained, and the police were called to the venue. The event went viral on the Internet and attracted worldwide media attention. The event drew comparisons to the 2008 Lapland New Forest controversy, the 2014 Tumblr fan convention DashCon, and Billy McFarland's 2017 Fyre Festival. == Background and advertising == The event was stated to take place over the weekend of 24–25 February 2024. Promotional material advertised "stunning and intricately designed settings inspired by Roald Dahl's timeless tale" and "an array of delectable treats scattered throughout the experience". Both the website and promotional material used poor-quality AI-generated images, which included several spelling errors such as "cartchy tuns" and "a pasadise of sweet teats" and nonsensical words such as "catgacating" and "exarserdray". Tickets cost up to £35 per person. While the event was being promoted in early February, a Reddit user who saw Facebook advertisements suspected it to be a scam and was surprised that people were apparently buying tickets based solely on AI-generated images. The event was organised by House of Illuminati, a company registered to Billy Coull which claimed to offer "unparalleled immersive experiences". An investigation by Third Force News conducted after the event described Coull's previous "murky involvement in the charity sector." Coull had previously registered several other companies and claimed to work as a "consultant" for the now-defunct brand Empowerity, formerly known as the charity Gowanbank Community Hub. In 2021, Gowanbank was forced to remove claims of a £95-per-ticket fundraising "gala" at DoubleTree Glasgow which had been falsely advertised to feature TV personalities and performers including Gok Wan and Joe Black. Coull had claimed to be a doctor with a fake degree from a false university that provided "metaphysical degrees", and had attempted to use the charity to win the 2022 Glasgow City Council election in the seat of Greater Pollok, though he never registered for the election. In the summer of 2023, he independently published 17 AI-generated books on various topics, including vaccine conspiracy theories. Rolling Stone concluded that House of Illuminati's websites and event descriptions were likely written by an AI chatbot, such as ChatGPT. Three actors were hired to portray "Willy McDuff", a character based on Willy Wonka. One of them, Paul Connell, said that the cast were given one day to learn the script. Another actor playing Willy McDuff was 18-year-old Michael Archibald; the experience was his first ever acting job, and he was given the script at 6 pm on Friday before the event began on Saturday. Kirsty Paterson, an actress who played one of the Oompa-Loompas (called "Wonkidoodles" in the script), said that the job offer had been posted on Indeed.com and offered £500 for two days of work. The day before the event, the actors attended a dress rehearsal at the sparsely decorated venue. They were told that others would be working through the night on the production. When they returned on the day of the event, the venue was in the same condition. Paterson was given her costume an hour before the event opened, saying that "We were just handed an Amazon box that probably arrived that morning." == Script == The script for the event is titled Wonkidoodles at McDuff's Chocolate Factory: A Script, and describes Willy McDuff leading an audience through the Garden of Enchantment and the Twilight Tunnel. Once there, they are confronted by a character called The Unknown, described as "an evil chocolate maker who lives in the walls" who seeks to steal the magical "Anti-Graffiti Gobstopper" from McDuff's Imagination Lab. The gobstopper is "a sweet so powerful, it can make any room sparkle without lifting a finger". McDuff defeats The Unknown by amplifying the power of the gobstopper and causing his enemy to be "gently swept up by a robotic vacuum, humorously ending the confrontation". The script was unusual in that it included stage directions for the audience, and descriptions of their reactions. Connell described it as "15 pages of AI-generated gibberish of me just monologuing these mad things", and compared the vacuum cleaner plot point to that of the Nintendo video game Luigi's Mansion. Interviewed after the event, Coull claimed to have written the script himself, using AI only to "check spelling, grammar, and continuity" as he said he had dyslexia. == Event == The event was held at the Box Hub Warehouse event space in Whiteinch, an industrial area of Glasgow. Customers described the venue as "little more than an abandoned, empty warehouse", with set dressings including a small bouncy castle, AI-generated backdrop images pinned to some of the walls, and props which were "strewn about on bare concrete floors". The venue's windows were dirty and its air conditioning systems were left exposed. Paterson has stated that by the time she saw the venue, she had already signed her contract and "didn't want to disappoint the kids", and thus chose to proceed with the work. The Unknown was played by a 16-year-old actress named Felicia Dawkins, who wore a silver mask and a black cloak. Young children were frightened by the character, who appeared from behind a large rectangular mirror. Despite the script calling for The Unknown to be defeated with a vacuum cleaner, no such prop was provided, and actors were instead asked to improvise. Connell said that he and other employees were told to give each child "two jelly beans and a quarter of a cup of lemonade", although the limited supply of jelly beans quickly ran out. Paterson and another "Wonkidoodle" actress, Jenny Fogarty, said that after the first three 45-minute performances, the cast were told to abandon the script and instead let guests walk through the venue, a process that Paterson said took "about two minutes". The character of The Unknown, previously introduced as the main antagonist, was now "scaring children for no reason". One of the actors playing McDuff improvised the idea that children should pull a "silly face" at The Unknown to scare them away, but Dawkins said that, in other cases, she "just had to awkwardly walk back to my corner". Connell was told he would be given a 15-minute break every 45 minutes, but on the day of the event, he played Willy McDuff for three and a half hours without a break. After returning from a lunch break, Connell encountered a crowd of customers demanding refunds from Coull, and the other actors were unsure what to do next. After being told that the event was now cancelled halfway through its opening day, the actors left and went to a pub. Upon returning to the venue some time later, Connell said that he felt "the threat of violence had become quite high" and that there were two police vans and two squad cars at the scene. == Customer reviews and response == Willy's Chocolate Experience was widely criticised by those who attended it, many of whom demanded refunds. One customer, who had driven with his children for two hours to reach the event, described it as an "absolute con". Other visitors who arrived after the event was closed and were not informed of its cancellation requested compensation for wasted rail fares. Following the event's cancellation, Coull offered to refund 850 people, a statement repeated by the event's Facebook page. Some Facebook users stated that they had received their money back. Paterson and Fogarty stated that they only received half of their paycheque. Box Hub, the organisation that had rented the warehouse to House of Illuminati, issued an apology on House of Illuminati's behalf, stating that they "either have no regards for the families and young children they have disappointed or are too embarrassed to comment", and offered to provide a venue free of charge for those who attended the event. House of Illuminati later stated that they would not host any future events. Coull deleted his LinkedIn profile, his YouTube channel, and his personal website in response to the controversy. A few days after the event, Connell said he felt that Coull was "probably one of the most disliked people in Glasgow right now". In an interview with The Sunday Times, Coull apologised for how the event turned out, saying he would accept responsibility. == Fundraising == In an interview with Wired magazine, Connell stated that he and the other actors were working with parents to provide a free show for the children who attended. Some items from the event were later auctioned for charity. The venue auctioned the leftover hand-written "even