Computer Graphics: Principles and Practice is a textbook written by James D. Foley, Andries van Dam, Steven K. Feiner, John Hughes, Morgan McGuire, David F. Sklar, and Kurt Akeley and published by Addison–Wesley. First published in 1982 as Fundamentals of Interactive Computer Graphics, it is widely considered a classic standard reference book on the topic of computer graphics. It is sometimes known as the bible of computer graphics (due to its size).
== Editions ==
=== First Edition ===
The first edition, published in 1982 and titled Fundamentals of Interactive Computer Graphics, discussed the SGP library, which was based on ACM's SIGGRAPH CORE 1979 graphics standard, and focused on 2D vector graphics.
=== Second Edition ===
The second edition, published in 1990, was completely rewritten and covered 2D and 3D raster and vector graphics, user interfaces, geometric modeling, anti-aliasing, advanced rendering algorithms and an introduction to animation. The SGP library was replaced by SRGP (Simple Raster Graphics Package), a library for 2D raster primitives and interaction handling, and SPHIGS (Simple PHIGS), a library for 3D primitives, both of which were written specifically for the book.
=== Second Edition in C ===
In the second edition in C, published in 1995, all examples were converted from Pascal to C. New implementations of the SRGP and SPHIGS graphics packages in C were also provided.
=== Third Edition ===
A third edition covering modern GPU architecture was released in July 2013. Examples in the third edition are written in C++, C#, WPF, GLSL, OpenGL, G3D, or pseudocode.
== Awards ==
The book won a Front Line Award (Hall of Fame) in 1998.
Active learning (machine learning)
Active learning is a special case of machine learning in which a learning algorithm can interactively query a human user (or some other information source) to label new data points with the desired outputs. The human user must possess expertise in the problem domain, including the ability to consult authoritative sources when necessary. In statistics literature, it is sometimes also called optimal experimental design. The information source is also called teacher or oracle.
There are situations in which unlabeled data is abundant but manual labeling is expensive. In such a scenario, learning algorithms can actively query the teacher for labels. Since the learner chooses the examples, the number of examples needed to learn a concept can often be much lower than the number required in normal supervised learning. However, there is a risk that the algorithm is overwhelmed by uninformative examples. Recent developments are dedicated to multi-label active learning, hybrid active learning and active learning in a single-pass (on-line) context, combining concepts from the field of machine learning (e.g. conflict and ignorance) with adaptive, incremental learning policies in the field of online machine learning. Active learning can speed up the development of a machine learning algorithm in cases where comparable updates would otherwise require a quantum computer or supercomputer.
Large-scale active learning projects may benefit from crowdsourcing frameworks such as Amazon Mechanical Turk that include many humans in the active learning loop.
== Definitions ==
Let T be the total set of all data under consideration. For example, in a protein engineering problem, T would include all proteins that are known to have a certain interesting activity and all additional proteins that one might want to test for that activity.
During each iteration, i, T is broken up into three subsets:
$\mathbf{T}_{K,i}$: Data points where the label is known.
$\mathbf{T}_{U,i}$: Data points where the label is unknown.
$\mathbf{T}_{C,i}$: A subset of $\mathbf{T}_{U,i}$ that is chosen to be labeled.
Most of the current research in active learning involves the best method to choose the data points for $\mathbf{T}_{C,i}$.
== Scenarios ==
Pool-based sampling: In this approach, which is the best-known scenario, the learning algorithm attempts to evaluate the entire dataset before selecting data points (instances) for labeling. It is often initially trained on a fully labeled subset of the data using a machine-learning method such as logistic regression or SVM that yields class-membership probabilities for individual data instances. The candidate instances are those for which the prediction is most ambiguous. Instances are drawn from the entire data pool and assigned a confidence score, a measurement of how well the learner "understands" the data. The system then selects the instances for which it is the least confident and queries the teacher for the labels. The theoretical drawback of pool-based sampling is that it is memory-intensive and is therefore limited in its capacity to handle enormous datasets, but in practice, the rate-limiting factor is that the teacher is typically a (fatiguable) human expert who must be paid for their effort, rather than computer memory.
Stream-based selective sampling: Here, each consecutive unlabeled instance is examined one at a time, with the machine evaluating the informativeness of each item against its query parameters. The learner decides for itself whether to assign a label or query the teacher for each data point. In contrast to pool-based sampling, the obvious drawback of stream-based methods is that the learning algorithm does not have sufficient information, early in the process, to make a sound assign-label-vs-ask-teacher decision, and it does not capitalize as efficiently on the presence of already labeled data. Therefore, the teacher is likely to spend more effort in supplying labels than with the pool-based approach.
Membership query synthesis: This is where the learner generates synthetic data from an underlying natural distribution. For example, if the dataset consists of pictures of humans and animals, the learner could send a clipped image of a leg to the teacher and query whether this appendage belongs to an animal or a human. This is particularly useful if the dataset is small. The challenge here, as with all synthetic-data-generation efforts, is in ensuring that the synthetic data is consistent in terms of meeting the constraints on real data. As the number of variables/features in the input data increases, and strong dependencies between variables exist, it becomes increasingly difficult to generate synthetic data with sufficient fidelity. For example, to create a synthetic data set for human laboratory-test values, the sum of the various white blood cell (WBC) components in a white blood cell differential must equal 100, since the component numbers are really percentages. Similarly, the enzymes alanine transaminase (ALT) and aspartate transaminase (AST) measure liver function (though AST is also produced by other tissues, e.g., lung, pancreas). A synthetic data point with AST at the lower limit of normal range (8–33 units/L) with an ALT several times above normal range (4–35 units/L) in a simulated chronically ill patient would be physiologically impossible.
== Query strategies ==
Algorithms for determining which data points should be labeled can be organized into a number of different categories, based upon their purpose:
Balance exploration and exploitation: the choice of examples to label is seen as a dilemma between the exploration and the exploitation over the data space representation. This strategy manages this compromise by modelling the active learning problem as a contextual bandit problem. For example, Bouneffouf et al. propose a sequential algorithm named Active Thompson Sampling (ATS), which, in each round, assigns a sampling distribution on the pool, samples one point from this distribution, and queries the oracle for this sample point label.
Expected model change: label those points that would most change the current model.
Expected error reduction: label those points that would most reduce the model's generalization error.
Exponentiated gradient exploration for active learning: a sequential algorithm named exponentiated gradient (EG)-active has been proposed that can improve any active learning algorithm by an optimal random exploration.
Uncertainty sampling: label those points for which the current model is least certain as to what the correct output should be.
Query by committee: a variety of models are trained on the current labeled data, and vote on the output for unlabeled data; label those points for which the "committee" disagrees the most.
Querying from diverse subspaces or partitions: When the underlying model is a forest of trees, the leaf nodes might represent (overlapping) partitions of the original feature space. This offers the possibility of selecting instances from non-overlapping or minimally overlapping partitions for labeling.
Variance reduction: label those points that would minimize output variance, which is one of the components of error.
Conformal prediction: predicts that a new data point will have a label similar to old data points in some specified way, and the degree of similarity within the old examples is used to estimate the confidence in the prediction.
Mismatch-first farthest-traversal: The primary selection criterion is the prediction mismatch between the current model and nearest-neighbour prediction. It targets wrongly predicted data points. The second selection criterion is the distance to previously selected data, the farthest first. It aims at optimizing the diversity of selected data.
User-centered labeling strategies: Learning is accomplished by applying dimensionality reduction to graphs and figures like scatter plots. Then the user is asked to label the compiled data (categorical, numerical, relevance scores, relation between two instances).
A wide variety of algorithms have been studied that fall into these categories. While the traditional AL strategies can achieve remarkable performance, it is often challenging to predict in advance which strategy is the most suitable in a particular situation. In recent years, meta-learning algorithms have been gaining in popularity. Some of them have been proposed to tackle the problem of learning AL strategies instead of relying on manually designed strategies. A benchmark which compares "meta-learning approaches to active learning" to "traditional heuristic-based active learning" may give intuitions as to whether "learning active learning" is at the crossroads.
== Minimum marginal hyperplane ==
Some active learning algorithms are built upon support-vector machines (SVMs) and exploit the structure of the SVM to determine which data points to label. Such methods usually calculate the margin, W, of each u
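As a rough illustration of the pool-based scenario combined with uncertainty sampling, the following minimal sketch keeps the three sets from the Definitions section as `known` (T_K), `unknown` (T_U) and `chosen` (T_C). Everything here is an assumption made for illustration and does not come from the article's sources: the one-dimensional data, the small logistic-regression trainer `train_logistic`, and the hypothetical `oracle` that stands in for the human teacher and labels a point 1 when it exceeds 0.3.

```python
import math


def predict_proba(model, x):
    """Class-membership probability p(y=1 | x) under a 1-D logistic model."""
    w, b = model
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def train_logistic(points, labels, epochs=200, lr=0.5):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by plain batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(points, labels):
            err = predict_proba((w, b), x) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def least_confident(model, unlabeled, batch_size):
    """Uncertainty sampling: pick the points whose probability is closest to 0.5."""
    ranked = sorted(unlabeled, key=lambda x: abs(predict_proba(model, x) - 0.5))
    return ranked[:batch_size]


def oracle(x):
    """Hypothetical teacher (an assumption): the true concept is x > 0.3."""
    return 1 if x > 0.3 else 0


def active_learning_loop(pool, seed_points, rounds, batch_size):
    known = {x: oracle(x) for x in seed_points}      # T_K: labeled points
    unknown = [x for x in pool if x not in known]    # T_U: unlabeled points
    for _ in range(rounds):
        model = train_logistic(list(known), list(known.values()))
        chosen = least_confident(model, unknown, batch_size)  # T_C
        for x in chosen:
            known[x] = oracle(x)    # query the teacher for the label
            unknown.remove(x)       # the point moves from T_U to T_K
    model = train_logistic(list(known), list(known.values()))
    return model, known, unknown


# Illustrative pool of 201 evenly spaced points in [-1, 1], seeded with one
# labeled example from each class.
pool = [i / 100 for i in range(-100, 101)]
model, known, unknown = active_learning_loop(
    pool, seed_points=[-1.0, 1.0], rounds=6, batch_size=5
)
print(f"labeled {len(known)} of {len(pool)} points")
```

Only the points the current model is least sure about are sent to the teacher, so the number of labels requested (32 in this run) stays small relative to the pool. A practical system would replace the toy trainer with a real classifier and the `oracle` function with human annotation.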
Deluxe Media
Deluxe Media Inc., also known simply as Deluxe and formerly Deluxe Entertainment Services Group, Inc., is an American multinational multimedia and entertainment services company owned by Platinum Equity, founded in 1915 by Hungarian-born American film producer William Fox and headquartered in Burbank, California. The company services multiple clients in the film, television, digital content and advertising industries across the globe, and has been recognized with 10 Academy Awards for scientific and technical achievements, including developments in CinemaScope pictures (as part of 20th Century Fox) and more recently for a process of creating archival separations from digital image data.
== History ==
Deluxe began as a film processing laboratory established in 1915 by William Fox under the name De Luxe as part of his eponymous film conglomerate corporation in Fort Lee, New Jersey. In 1916, Fox Film Corporation opened its studio in Hollywood on 13 acres at Sunset and Western. The first Deluxe film laboratory on the west coast was built on the south side of the lot (Fernwood and Serrano), and the laboratory was moved to the new Fox studios building on Manhattan's west side in 1919, where it remained for over 40 years. The "business manager" (later president) of the laboratory was Alan E. Freedman, who guided the company into the 1960s.
In 1927, Fox (Deluxe) received a patent for sound-on-film, the Fox Movietone system. In 1927, "Sunrise: A Song of Two Humans," an early Movietone film, opened. Fox Movietone News ran weekly in theaters until 1963.
During the Great Depression, Fox Film Corporation encountered financial difficulties. Among the actions taken to maintain liquidity, Fox sold the laboratories in 1932 to Freedman, who renamed the operation Deluxe. Under Freedman's leadership, Deluxe added two more plants in Chicago and Toronto. In January 1934, Fox was granted an option to rebuy Deluxe before December 31, 1938. On 31 May 1935, under Sidney Kent, Fox merged his film company with Twentieth Century Pictures to form The Twentieth Century-Fox Film Corporation following a bank-infused reorganisation. The merged company then exercised this option in July 1936, with Freedman remaining as president.
In 1953, Deluxe developed the widescreen format CinemaScope. Titles included "There's No Business Like Show Business" (1954) and "The Seven Year Itch" (1955). Other innovations, including the processing and sound striping of CinemaScope, were patented, received Academy Awards, or both.
In 1962, Freedman retired. In the 1960s, Deluxe closed its New York plant, followed by its plants in Chicago and Toronto, as motion picture production declined on the East Coast. In 1972, Deluxe began large-volume videocassette production, reaching a billion by 1996. In 1990, The Rank Organisation acquired Deluxe from Fox. In 2000, Deluxe began large-volume DVD production. In 2006, The Rank Organisation sold Deluxe Film Group to MacAndrews & Forbes, and it was renamed Deluxe Entertainment Services Group.
On 9 February 2012, Deluxe acquired the Hong Kong–based visual effects and post-production company Centro Digital Pictures, with its founder John Chu remaining as president while reporting to Alaric McAusland, managing director for Deluxe in Australia.
In May 2014, Deluxe shut down its Los Angeles plant at the Sunset & Western Studios complex, where the other studio buildings had been demolished in 1971. That same year, Deluxe closed its Hollywood film labs and gave thousands of orphaned film elements to the Academy Film Archive. The Deluxe Laboratories Collection at the Academy Film Archive consists of over 7,500 35mm and 16mm film elements of various motion pictures dating back to the early 1960s.
On 22 April 2015, Deluxe and its longtime competitor, Technicolor S.A., announced that they had entered into a binding agreement to create a new joint venture known as Deluxe Technicolor Digital Cinema, which would specialize in cinema mastering, distribution and management services.
Deluxe was acquired by its creditors on 4 September 2019 in a debt-for-equity swap intended to avoid bankruptcy. On 3 October 2019, Deluxe filed for bankruptcy, with the case pending in the Southern District of New York. On 24 October 2019, the company received court approval to emerge from bankruptcy with a comprehensive restructuring plan.
On 1 July 2020, Platinum Equity agreed to acquire the distribution division of Deluxe and reunite it with former CEO Cyril Drabinsky, who would merge CineVizion, a film distribution company he founded after leaving Deluxe in 2016, into it. Company 3 and Method Studios, which formed the creative divisions of Deluxe, were sold to Framestore in November 2020.
Digital citizen
The term digital citizen is used with different meanings. According to the definition provided by Karen Mossberger, one of the authors of Digital Citizenship: The Internet, Society, and Participation, digital citizens are "those who use the internet regularly and effectively". In this sense, a digital citizen is a person who uses information technology (IT) to engage in society, politics, and government.
More recent elaborations of the concept define digital citizenship as the self-enactment of people's role in society through the use of digital technologies, stressing the empowering and democratizing characteristics of the citizenship idea. These theories aim at taking into account the ever-increasing datafication of contemporary societies (symbolically linked to the Snowden leaks), which has called into question the meaning of "being (digital) citizens in a datafied society". This condition is also referred to as the "algorithmic society", characterised by the increasing datafication of social life and the pervasive presence of surveillance practices (see surveillance and surveillance capitalism), the use of artificial intelligence, and Big Data.
Datafication presents crucial challenges for the very notion of citizenship, and data collection can no longer be seen as an issue of privacy alone, so that:
We cannot simply assume that being a citizen online already means something (whether it is the ability to participate or the ability to stay safe) and then look for those whose conduct conforms to this meaning.
Instead, the idea of digital citizenship should reflect the idea that we are no longer mere "users" of technologies, since they shape our agency both as individuals and as citizens.
Digital citizenship refers to the responsible and respectful use of technology to engage online, evaluate information, and protect human rights. It encompasses skills for communication, collaboration, empathy, privacy protection, and security to prevent data breaches and identity theft.
== Digital citizenship in the "algorithmic society" ==
In the context of the algorithmic society, the question of digital citizenship "becomes one of the extents to which subjects are able to challenge, avoid or mediate their data double in this datafied society". These reflections put the emphasis on the idea of the digital space (or cyberspace) as a political space where respect for the fundamental rights of the individual should be guaranteed (with reference both to the traditional ones and to new rights specific to the internet [see "digital constitutionalism"]) and where the agency and the identity of individuals as citizens are at stake. This idea of digital citizenship is thought to be not only active but also performative, in the sense that "in societies that are increasingly mediated through digital technologies, digital acts become important means through which citizens create, enact and perform their role in society." In particular, for Isin and Ruppert this points towards an active meaning of (digital) citizenship based on the idea that we constitute ourselves as digital citizens by claiming rights on the internet, either by saying or by doing something.
== Types of digital participation ==
People who characterize themselves as digital citizens often use IT extensively, creating blogs, using social networks, and participating in online journalism. Although digital citizenship begins when any child, teen, or adult signs up for an email address, posts pictures online, uses e-commerce to buy merchandise online, and/or participates in any electronic function that is B2B or B2C, the process of becoming a digital citizen goes beyond simple internet activity. According to Thomas Humphrey Marshall, a British sociologist known for his work on social citizenship, a primary framework of citizenship comprises three different traditions: liberalism, republicanism, and ascriptive hierarchy. Within this framework, the digital citizen needs to exist in order to promote equal economic opportunities and increase political participation. In this way, digital technology helps to lower the barriers to entry for participation as a citizen within a society.
Digital citizens also have a comprehensive understanding of digital citizenship, which is appropriate and responsible behavior when using technology. Since digital citizenship evaluates the quality of an individual's response to membership in a digital community, it often requires the participation of all community members, both visible and those who are less visible. A large part of being a responsible digital citizen encompasses digital literacy, etiquette, online safety, and an acknowledgement of private versus public information.
The development of digital citizen participation can be divided into two main stages. The first stage is information dissemination, which includes subcategories of its own:
Static information dissemination, characterized largely by citizens who use read-only websites where they take control of data from credible sources in order to formulate judgments or facts. Many of these websites where credible information may be found are provided by the government.
Dynamic information dissemination, which is more interactive and involves citizens as well as public servants. Both questions and answers can be communicated, and citizens have the opportunity to engage in question-and-answer dialogues through two-way communication platforms.
The second stage of digital citizen participation is citizen deliberation, which concerns the type of participation and the role that citizens play when attempting to bring about some sort of policy change.
Static citizen participants can play a role by engaging in online polls as well as through complaints and recommendations sent up, mainly toward the government, which can create changes in policy decisions.
Dynamic citizen participants can deliberate with others on their thoughts and recommendations in town hall meetings or on various media sites.
One potential advantage of online participation through digital citizenship is increased social inclusion. According to a report on civic engagement, citizen-powered democracy can be initiated through information shared on the web, direct communication signals made by the state toward the public, or social media tactics from both private and public companies. In fact, it was found that the community-based nature of social media platforms allows individuals to feel more socially included and informed about political issues that peers have also been found to engage with, otherwise known as a "second-order effect." Understanding strategic marketing on social media would further explain social media customers' participation. Two types of opportunities arise as a result, the first being the ability to lower barriers, which can make exchanges much easier. In addition, citizens have the chance to participate in transformative disruption, which gives people with historically lower political engagement the chance to mobilize in a much easier and more convenient fashion.
Nonetheless, there are several challenges that face the presence of digital technologies in political participation. Both current and potential challenges can create significant risks for democratic processes. Not only is digital technology still seen as relatively ambiguous, it was also seen to have "less inclusivity in democratic life." Demographic groups differ considerably in their use of technology, and thus one group could potentially be more represented than another as a result of digital participation. Another primary challenge is the "filter bubble" effect. Alongside a tremendous spread of false information, internet users could reinforce existing prejudices and assist in polarizing disagreements in the public sphere. This can lead to misinformed voting and decisions based on exposure rather than on pure knowledge. A communication technology director, Van Dijk, stated, "Computerized information campaigns and mass public information systems have to be designed and supported in such a way that they help to narrow the gap between the 'information rich' and 'information poor' otherwise the spontaneous development of ICT will widen it." Access to digital technology and knowledge of it must be equivalent for a fair system to be put into place.
Alongside a lack of evidenced support for technology that can be proven to be safe for citizens, the OECD has identified five struggles for the online engagement of citizens:
Scale: To what extent can a society allow every individual's voice to be heard, but also not be lost in the mass debate? This can be extremely challenging for the government, which may not effectively know how to listen and respond to each individual contribution.
Capacity: How can digital technology offer citizens more information on public policy-making? The opportunity for citizens to debate with one another is lacking for acti
Watcher Entertainment
Watcher Entertainment is an American digital media and entertainment company, founded by Steven Lim, Shane Madej, and Ryan Bergara. The channel features a variety of comedy, paranormal, gaming, cooking, and educational shows – typically hosted by Madej and Bergara. The Watcher main channel has over 400 million views and 2.9 million subscribers. The company launched their own streaming service, WatcherTV, in 2024. == History == === Buzzfeed and the creation of Watcher Entertainment (2019) === Madej, Bergara, and Lim met while working at the digital media company BuzzFeed. Madej and Bergara were co-hosts of the popular true crime and paranormal series Buzzfeed Unsolved and Lim was the creator and co-host of the popular internet food series Worth It. Both shows generated a combined 2 billion views with 15 billion minutes watched, making them two of the most successful shows on Buzzfeed. In 2019, Madej, Bergara, and Lim quit Buzzfeed as full-time employees. They each stayed on as contracted employees to complete their respective shows. The trio credited their departure to their desire to found a company with more "creative opportunities" and the ability to have "actual ownership of the content" made. The company is majority-owned by the trio. They received funding from Neuro, a caffeinated energy gum company; Boba Guys, a bubble-milk tea chain; and Steve Chen, a YouTube co-founder. Watcher Entertainment gained its name from the infamous true crime case of The Westfield Watcher, which Madej and Bergara had covered in a Buzzfeed Unsolved episode. The trio began the company as co-CEOs; however, Bergara and Madej stepped down from the role in 2023 to focus on content creation. === Watcher Entertainment (2020–present) === Watcher Entertainment was launched in January 2020. 
The company debuted with seven series and a weekly interactive talk show: Homemade, Grocery Run, Weird Wonderful World, Puppet History, Tourist Trapped, Top 5 Beatdown, Spooky Small Talk, and Watcher Weekly. The channel reached over 300,000 subscribers within the first month of launching. They were signed by talent agency CAA in the same year. Puppet History, a comedy educational game show, quickly became a success and gained a significant audience. The show, which stars Madej as a fluffy blue puppet, has spanned seven seasons and led to the creation of a variety of merchandise. It has featured a variety of guest stars on every episode, including other former Buzzfeed employees. The company premiered its first horror series in July 2020 with Are You Scared?. Following the end of Buzzfeed Unsolved: Supernatural in 2021, the studio premiered its highly anticipated successor, Ghost Files, just months after. The show followed a similar format, with Bergara and Madej investigating reportedly haunted locations and attempting to find evidence of the paranormal. The show had significant success, with critics noting the improved production value and design from its predecessor. In 2023, Bergara and Madej went on a tour across the United States to premiere episodes of the second season. The series was renewed for a third season, which they premiered with a United Kingdom tour in 2024. That year, Watcher premiered a light-hearted successor to the graphic Buzzfeed Unsolved: True Crime, with Mystery Files. In this rendition, Bergara or Madej present unusual crime or supernatural mysteries with a collection of theoretical solutions. The show was met with great success by audiences and was quickly renewed for a second season. Watcher launched a second channel, 'WatcherPodcasts,' in October 2023. The channel features podcasts hosted by Lim, Bergara, and Madej. On April 19, 2024, the company launched its Watcher streaming service. 
Going forward, all of their content would be released exclusively on the service and the company planned to transition away from YouTube. This announcement was met with overwhelmingly negative reactions from their fans, with many calling for the company to reverse the decision. Additionally, their YouTube channel lost over 50,000 subscribers in the day following the announcement. On April 22, 2024, the company issued an apology and changed their decision, stating that episodes would instead be released on the streaming service a month before their premiere on YouTube. In May 2025, the channel 'Andrew, Steven, and Adam' was launched as a subsidiary of Watcher with the release of the second season of Travel Season. Travel Season is a spiritual successor to Worth It with the same cast of Lim, Andrew Ilnyckyj, and Adam Bianchi. The channel focuses on food reviews and the behind of the scenes of making it. The main channel is now set to be focused primarily on horror, creepy, and paranormal content. == Channels and shows == === Watcher === ==== Current shows ==== Puppet History (2020–present) A whimsical puppet host walks through history's wildest tales as two guests compete for the title of history wizard. Making Watcher (2020–present) What happens when 3 creators with no business experience decide to make their own company? A multi-series documentary on the journey of creating Watcher Entertainment. Weird Wonderful World (2020–present) Curious pals Madej and Bergara explore lesser-known destinations and the fascinating subcultures within them. Too Many Spirits (2020–present) Bergara and Madej read and rate audience-submitted ghost stories, while getting progressively more tipsy drinking cocktails prepared by Steven and Ricky Wang. Top 5 Beatdown (2020–present) Bergara and Madej compare asinine top 5 lists with a topical expert, inspiring surprisingly heated debate. Are You Scared? 
(2020–2022, 2024–present) Bergara reads the internet's scariest stories (some true, some false) to his pal Madej as they try to figure out if the story is experienced or imagined. Ghost Files (2021–present) Bergara and Madej investigate haunted locations to discover whether something paranormal really lies within. Mystery Files (2023–present) Bergara and Madej present unusual crime or supernatural mysteries with a collection of theoretical solutions. Survival Mode (2023–present) Bergara and Madej play a variety of horror games and give a spooky review. ==== Former shows ==== Grocery Run (2020) Madej interviews a celeb on their typical grocery run, before returning to their home to help prepare their signature dish. Homemade (2020) Lim examines popular food by comparing an elevated restaurant experience vs. a home-cooked experience. Spooky Small Talk (2020) Bergara interviews celebs in a haunted house, exposing their fears and if they can manage it, a little about themselves too. Social Distancing D&D (2020) Socially Distance along with the motley gang of Watchers as they embark on a great quest of Dungeons and Dragons! Tourist Trapped (2020) Begara and Madej battle for tour guide supremacy, highlighting the two sides of a city, tourist attractions and hidden gems. Watcher Weekly (2020–2021) Lim, Bergara, and Madej chat the week's content and answer questions, with the occasional musical guest! Dish Granted (2021–2022) A show where host and amateur home cook Lim attempts to create the most extravagant dishes for his friends. Pretty Historic (2022) Selorm and guests explore beauty and fashion trends from history, try them, and decide whether the trends should remain in the past or come to the present. Worth a Shot (2022–2023) Take a seat at a Master Mixologist's bar as pro Ricky Wang crafts the unbelievable into a digestible drink for his guests. 
=== Watcher Podcast === ==== Current shows ==== Get Scared with Shane, Ryan, and Steven (2023–2025) Previously named 'Pod Watcher'. Madej, Bergara, and Lim host a weekly podcast, exploring a variety of topics and answering viewer questions. Guests occasionally appear to replace one host. Matt Real serves as the producer and a fourth voice for the podcast. For Your Amusement (2023–present) Bergara explores a variety of topics surrounding theme parks. === Andrew, Steven, and Adam === Travel Season (2024–present) Lim reunites with Worth It costars Andrew Ilnyckyj and Adam Bianchi in a new food review show. == Awards and nominations ==
Large language model
A large language model (LLM) is a neural network trained on a vast amount of text for natural language processing tasks, especially language generation. LLMs can typically generate, summarize, translate and analyze text in many contexts, and are a foundational technology behind modern chatbots. Biased or inaccurate training data can make an LLM's output less reliable. As of 2026, the most capable LLMs are based on transformer architectures, which, according to the 2017 paper "Attention Is All You Need", can be more efficient and parallelizable than earlier statistical and recurrent neural network models. Benchmark evaluations for LLMs attempt to measure model reasoning, factual accuracy, alignment, and safety. == History == Before the emergence of transformer-based models in 2017, some language models were considered large relative to the computational and data constraints of their time. In the early 1990s, IBM's statistical models pioneered word alignment techniques for machine translation, laying the groundwork for corpus-based language modeling. In 2001, a smoothed n-gram model, such as those employing Kneser–Ney smoothing, trained on 300 million words, achieved state-of-the-art perplexity on benchmark tests. During the 2000s, with the rise of widespread internet access, researchers began compiling massive text datasets from the web ("web as corpus") to train statistical language models. Moving beyond n-gram models, researchers started in 2000 to use neural networks as language models. Following the breakthrough of deep neural networks in image classification around 2012, similar architectures were adapted for language tasks. This shift was marked by the development of word embeddings (e.g., Word2Vec by Mikolov in 2013) and sequence-to-sequence (seq2seq) models using LSTM. In 2016, Google transitioned its translation service to neural machine translation (NMT), replacing statistical phrase-based models with deep recurrent neural networks. 
These early NMT systems used LSTM-based encoder-decoder architectures, as they preceded the invention of transformers. At the 2017 NeurIPS conference, Google researchers introduced the transformer architecture in their landmark paper "Attention Is All You Need". This paper's goal was to improve upon 2014 seq2seq technology, and was based mainly on the attention mechanism developed by Bahdanau et al. in 2014. The following year in 2018, BERT was introduced and quickly became "ubiquitous". Though the original transformer has both encoder and decoder blocks, BERT is an encoder-only model. Academic and research usage of BERT began to decline in 2023, following rapid improvements in the abilities of decoder-only models (such as GPT) to solve tasks via prompting. Although decoder-only GPT-1 was introduced in 2018, it was GPT-2 in 2019 that caught widespread attention because OpenAI claimed to have initially deemed it too powerful to release publicly, out of fear of malicious use. GPT-3 in 2020 went a step further and as of 2025 is available only via API with no offering of downloading the model to execute locally. But it was the consumer-facing chatbot ChatGPT in late 2022 that received extensive media coverage and public attention by 2023. The 2023 GPT-4 was praised for its increased accuracy and as a "holy grail" for its multimodal capabilities. OpenAI did not reveal the high-level architecture and the number of parameters of GPT-4. The release of ChatGPT led to an uptick in LLM usage across several research subfields of computer science, including robotics, software engineering, and societal impact work. In 2024, OpenAI released the reasoning model OpenAI o1, which generates long chains of thought before returning a final answer. Many LLMs with parameter counts comparable to those of OpenAI's GPT series have been developed. 
Since 2022, weights-available models have been gaining popularity, especially at first with BLOOM and LLaMA, though both have restrictions on usage and deployment. Mistral AI's open-weight models Mistral 7B and Mixtral 8x7B have a more permissive Apache License. In January 2025, DeepSeek released DeepSeek R1, a 671-billion-parameter open-weight model that performs comparably to OpenAI o1 but at a much lower price per token for users. Since 2023, many LLMs have been trained to be multimodal, having the ability to also process or generate other types of data, such as images, audio, or 3D meshes. Open-weight LLMs have become more influential since 2023. Per Vake et al. (2025), community-driven contributions to open-weight models improve their efficiency and performance via collaborative platforms such as Hugging Face. == Dataset preprocessing == === Tokenization === As machine learning algorithms process numbers rather than text, the text must be converted to numbers. In the first step, a vocabulary is decided upon, then integer indices are arbitrarily but uniquely assigned to each vocabulary entry, and finally, an embedding is associated with the integer index. Algorithms include byte-pair encoding (BPE) and WordPiece. There are also special tokens serving as control characters, such as [MASK] for masked-out token (as used in BERT), and [UNK] ("unknown") for characters not appearing in the vocabulary. Also, some special symbols are used to denote special text formatting. For example, "Ġ" denotes a preceding whitespace in RoBERTa and GPT and "##" denotes continuation of a preceding word in BERT. For example, the BPE tokenizer used by the legacy version of GPT-3 would split the text 'tokenizer: texts -> series of numerical "tokens"' into a sequence of subword tokens. Tokenization also compresses the datasets. Because LLMs generally require input to be an array that is not jagged, the shorter texts must be "padded" until they match the length of the longest one. 
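The vocabulary, index assignment, and padding steps described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than how any production tokenizer works: it splits on whitespace instead of using BPE or WordPiece, the token name "[PAD]" and the helper names build_vocab, encode, and pad_batch are hypothetical, and the embedding lookup is omitted.

```python
from typing import Dict, List

# Special tokens. "[UNK]" follows the convention mentioned in the text;
# "[PAD]" is an assumed name for the padding token.
PAD, UNK = "[PAD]", "[UNK]"


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Assign a unique integer index to each whitespace-separated word."""
    vocab = {PAD: 0, UNK: 1}
    for text in texts:
        for word in text.split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Map each word to its index, falling back to [UNK] for unseen words."""
    return [vocab.get(word, vocab[UNK]) for word in text.split()]


def pad_batch(batch: List[List[int]], pad_id: int) -> List[List[int]]:
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]


corpus = ["the cat sat", "the dog sat on the mat"]
vocab = build_vocab(corpus)
# "bird" is not in the vocabulary, so it is encoded as [UNK].
batch = pad_batch([encode(t, vocab) for t in corpus + ["the bird sat"]], vocab[PAD])
```

After padding, every row of batch has the same length, so the batch can be stored as a rectangular (non-jagged) array.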
==== Byte-pair encoding ==== As an example, consider a tokenizer based on byte-pair encoding. In the first step, all unique characters (including blanks and punctuation marks) are treated as an initial set of n-grams (i.e. initial set of uni-grams). Then, successively, the most frequent pair of adjacent characters is merged into a bi-gram and all instances of the pair are replaced by it. All occurrences of adjacent pairs of (previously merged) n-grams that most frequently occur together are then again merged into even lengthier n-grams, until a vocabulary of prescribed size is obtained. After a tokenizer is trained, any text can be tokenized by it, as long as it does not contain characters not appearing in the initial set of uni-grams. === Dataset cleaning === In the context of training LLMs, datasets are typically cleaned by removing low-quality, duplicated, or toxic data. Cleaned datasets can increase training efficiency and lead to improved downstream performance. A trained LLM can be used to clean datasets for training a further LLM. With the increasing proportion of LLM-generated content on the web, data cleaning in the future may include filtering out such content. LLM-generated content can pose a problem if the content is similar to human text (making filtering difficult) but of lower quality (degrading performance of models trained on it). === Synthetic data === Training the largest language models might need more linguistic data than is naturally available, or the naturally occurring data might be of insufficient quality. In these cases, synthetic data might be used. == Training == An LLM is a type of foundation model (large X model) trained on language. LLMs can be trained in different ways. In particular, GPT models are first pretrained to predict the next word on a large amount of data, before being fine-tuned. === Cost === Substantial infrastructure is necessary for training the largest models. 
The tendency towards larger models is visible in the list of large language models. For example, the training of GPT-2 (a 1.5-billion-parameter model) in 2019 cost $50,000, while the training of PaLM (a 540-billion-parameter model) in 2022 cost $8 million, and Megatron-Turing NLG 530B (in 2021) cost around $11 million. The qualifier "large" in "large language model" is inherently vague, as there is no definitive threshold for the number of parameters required to qualify as "large". === Fine-tuning === Before being fine-tuned, most LLMs are next-token predictors. The fine-tuning shapes the LLM's behavior via techniques like reinforcement learning from human feedback (RLHF) or constitutional AI. Instruction fine-tuning is a form of supervised learning used to teach LLMs to follow user instructions. In 2022, OpenAI demonstrated InstructGPT, a version of GPT-3 similarly fine-tuned to follow instructions. Reinforcement learning from human feedback (RLHF) involves training a reward model to predict which text humans prefer. Then, the LLM can be fine-tuned through reinforcement learning to better satisfy this reward model. Since humans typically prefer truthful, helpful and harmless answers, RLHF favors such answers. == Architecture == LLMs are generally based on the tra
Contact center telephony
In marketing, contact center telephony is the communication and collaboration system used by businesses to manage high volumes of inbound queries or outbound telephone calls while keeping their workforce or agents productive and in control as they serve or acquire customers. This business communication system is an extension of computer telephony integration (CTI). == Overview == The interactions between callers and customer service representatives are supported by the collective system of computers, telephones and the Internet. The shift from CTI to contact center telephony reflects a change in customer behavior: customers are no longer confined to voice-based communication (i.e., the phone) to connect with customer service departments. In addition, they make use of email, SMS, chat, social media, and other virtual contact channels. This is also the reason for the shift in nomenclature from "call centers" to "contact centers", "contact" being a wider term than "call". In response to this trend, contact center owners need to adopt a unified communication or multi-channel approach that lets customers get in touch via their preferred communication medium, whether voice or non-voice (data). A cloud-based phone system is a further advancement in this direction, as it allows operators to access the features of call center telephony over the Web under an affordable and flexible pay-as-you-go subscription model. Thus, in-house infrastructure deployment to manage public switched telephone networks, storage, communication applications, and collaboration servers is no longer required, nor is there a need to invest resources in their upgrade, repair, maintenance and security, as the cloud vendor is responsible for these. 
== India == Call centers in India, a popular call center business process outsourcing destination, often use cloud-based phone systems in order to cut operational expenses and downtime, and increase connectivity. == Promotion == Businesses can rely on contact center telephony services to respond to their customers’ queries over phone, email, chat, fax, etc. When these services are integrated with customer relationship management tools, customers’ full contact details and their interaction sessions with different customer service representatives can be found in one place. The combination can support not just sales and marketing but also post-sales customer service or technical support that helps customers derive the most from their products or services. Hence, it is becoming instrumental in increasing customer satisfaction and loyalty, and most call center services in India rely on it. The entire contact center telephony service can be accessed by professionals over a browser. Hence, businesses can leverage the concept of BYOD (bring your own device) and mobility and serve their customers using mobile applications. According to market analysts, BYOD increases satisfaction among the workforce, and hence their individual and collective productivity as well. A BYOD programme significantly reduces the TCO (total cost of ownership), as professionals prefer to work with their own devices rather than company-provisioned devices. They also tend to take better care of such devices and may even pay to update and upgrade them when required. Integration of IM, along with audio and video conferencing services, helps call center or contact center representatives get real-time assistance from their peers or seniors to resolve complex issues. They can internally exchange information and knowledge articles as and when required. 
A real-time call monitoring/barging system can be used by the quality assessment team to give agents guidance on maintaining service standards in line with industry norms. An integrated recording feature is helpful for internal training and quality purposes, improving both productivity and customer satisfaction. It also helps in gaining business insights and improving products or services to achieve deeper market penetration.