BERT (language model)

BERT (language model)

Bidirectional encoder representations from transformers (BERT) is a language model introduced in October 2018 by researchers at Google. It learns to represent text as a sequence of vectors using self-supervised learning. It uses the encoder-only transformer architecture. BERT dramatically improved the state of the art for large language models. As of 2020, BERT is a ubiquitous baseline in natural language processing (NLP) experiments. BERT is trained by masked token prediction and next sentence prediction. With this training, BERT learns contextual, latent representations of tokens in their context, similar to ELMo and GPT-2. It found applications for many natural language processing tasks, such as coreference resolution and polysemy resolution. It improved on ELMo and spawned the study of "BERTology", which attempts to interpret what is learned by BERT. BERT was originally implemented in the English language at two model sizes, BERTBASE (110 million parameters) and BERTLARGE (340 million parameters). Both were trained on the Toronto BookCorpus (800M words) and English Wikipedia (2,500M words). The weights were released on GitHub. On March 11, 2020, 24 smaller models were released, the smallest being BERTTINY with just 4 million parameters. == Architecture == BERT is an "encoder-only" transformer architecture. At a high level, BERT consists of 4 modules: Tokenizer: This module converts a piece of English text into a sequence of integers ("tokens"). Embedding: This module converts the sequence of tokens into an array of real-valued vectors representing the tokens. It represents the conversion of discrete token types into a lower-dimensional Euclidean space. Encoder: a stack of Transformer blocks with self-attention, but without causal masking. Task head: This module converts the final representation vectors into one-shot encoded tokens again by producing a predicted probability distribution over the token types. It can be viewed as a simple decoder, decoding the latent representation into token types, or as an "un-embedding layer". The task head is necessary for pre-training, but it is often unnecessary for so-called "downstream tasks," such as question answering or sentiment classification. Instead, one removes the task head and replaces it with a newly initialized module suited for the task, and finetune the new module. The latent vector representation of the model is directly fed into this new module, allowing for sample-efficient transfer learning. === Embedding === This section describes the embedding used by BERTBASE. The other one, BERTLARGE, is similar, just larger. The tokenizer of BERT is WordPiece, which is a sub-word strategy like byte-pair encoding. Its vocabulary size is 30,000, and any token not appearing in its vocabulary is replaced by [UNK] ("unknown"). The first layer is the embedding layer, which contains three components: token type embeddings, position embeddings, and segment type embeddings. Token type: The token type is a standard embedding layer, translating a one-hot vector into a dense vector based on its token type. Position: The position embeddings are based on a token's position in the sequence. BERT uses absolute position embeddings, where each position in a sequence is mapped to a real-valued vector. Each dimension of the vector consists of a sinusoidal function that takes the position in the sequence as input. Segment type: Using a vocabulary of just 0 or 1, this embedding layer produces a dense vector based on whether the token belongs to the first or second text segment in that input. In other words, type-1 tokens are all tokens that appear after the [SEP] special token. All prior tokens are type-0. The three embedding vectors are added together representing the initial token representation as a function of these three pieces of information. After embedding, the vector representation is normalized using a LayerNorm operation, outputting a 768-dimensional vector for each input token. After this, the representation vectors are passed forward through 12 Transformer encoder blocks, and are decoded back to 30,000-dimensional vocabulary space using a basic affine transformation layer. === Architectural family === The encoder stack of BERT has 2 free parameters: L {\displaystyle L} , the number of layers, and H {\displaystyle H} , the hidden size. There are always H / 64 {\displaystyle H/64} self-attention heads, and the feed-forward/filter size is always 4 H {\displaystyle 4H} . By varying these two numbers, one obtains an entire family of BERT models. For BERT: the feed-forward size and filter size are synonymous. Both of them denote the number of dimensions in the middle layer of the feed-forward network. the hidden size and embedding size are synonymous. Both of them denote the number of real numbers used to represent a token. The notation for encoder stack is written as L/H. For example, BERTBASE is written as 12L/768H, BERTLARGE as 24L/1024H, and BERTTINY as 2L/128H. == Training == === Pre-training === BERT was pre-trained simultaneously on two tasks: Masked language modeling (MLM): In this task, BERT ingests a sequence of words, where one word may be randomly changed ("masked"), and BERT tries to predict the original words that had been changed. For example, in the sentence "The cat sat on the [MASK]," BERT would need to predict "mat." This helps BERT learn bidirectional context, meaning it understands the relationships between words not just from left to right or right to left but from both directions at the same time. Next sentence prediction (NSP): In this task, BERT is trained to predict whether one sentence logically follows another. For example, given two sentences, "The cat sat on the mat" and "It was a sunny day", BERT has to decide if the second sentence is a valid continuation of the first one. This helps BERT understand relationships between sentences, which is important for tasks like question answering or document classification. ==== Masked language modeling ==== In masked language modeling, 15% of tokens would be randomly selected for masked-prediction task, and the training objective was to predict the masked token given its context. In more detail, the selected token is: replaced with a [MASK] token with probability 80%, replaced with a random word token with probability 10%, not replaced with probability 10%. The reason not all selected tokens are masked is to avoid the dataset shift problem. The dataset shift problem arises when the distribution of inputs seen during training differs significantly from the distribution encountered during inference. A trained BERT model might be applied to word representation (like Word2Vec), where it would be run over sentences not containing any [MASK] tokens. It is later found that more diverse training objectives are generally better. As an illustrative example, consider the sentence "my dog is cute". It would first be divided into tokens like "my1 dog2 is3 cute4". Then a random token in the sentence would be picked. Let it be the 4th one "cute4". Next, there would be three possibilities: with probability 80%, the chosen token is masked, resulting in "my1 dog2 is3 [MASK]4"; with probability 10%, the chosen token is replaced by a uniformly sampled random token, such as "happy", resulting in "my1 dog2 is3 happy4"; with probability 10%, nothing is done, resulting in "my1 dog2 is3 cute4". After processing the input text, the model's 4th output vector is passed to its decoder layer, which outputs a probability distribution over its 30,000-dimensional vocabulary space. ==== Next sentence prediction ==== Given two sentences, the model predicts if they appear sequentially in the training corpus, outputting either [IsNext] or [NotNext]. During training, the algorithm sometimes samples two sentences from a single continuous span in the training corpus, while at other times, it samples two sentences from two discontinuous spans. The first sentence starts with a special token, [CLS] (for "classify"). The two sentences are separated by another special token, [SEP] (for "separate"). After processing the two sentences, the final vector for the [CLS] token is passed to a linear layer for binary classification into [IsNext] and [NotNext]. For example: Given "[CLS] my dog is cute [SEP] he likes playing [SEP]", the model should predict [IsNext]. Given "[CLS] my dog is cute [SEP] how do magnets work [SEP]", the model should predict [NotNext]. === Fine-tuning === BERT is meant as a general pretrained model for various applications in natural language processing. That is, after pre-training, BERT can be fine-tuned with fewer resources on smaller datasets to optimize its performance on specific tasks such as natural language inference and text classification, and sequence-to-sequence-based language generation tasks such as question answering and conversational response generation. The original BERT paper published results demonstrating that a small amount of fine

Pixorial

Pixorial was a cloud-based consumer photo sharing, video sharing and video editing platform. The company was formed in 2007 in Centennial, Colorado as a media conversion service. In 2013, Pixorial was chosen as one of two video storage companies to partner with the launch of Google Drive. Pixorial allowed users to edit and share videos on social channels by connecting through their Pixorial account. The company closed on July 18, 2014, and its assets were acquired by LifeLogger Technologies Corp in November 2015. == History == The company was founded in 2007 and launched in 2009 by former Netscape employee Andres Espineira. Changing its focus to video editing software in 2009, Pixorial began developing an app that would be launched for iOS and Android devices in 2011. Later developments in the app in 2012 would also included real time filters, which were later removed. With the launch of Google Drive in 2012, Pixorial was chosen as an integrated video partner. This integration with Google Drive allowed users to access videos stored in Google Drive within the web app of Pixorial. After the Google Drive launch, Pixorial developed a crowdsourced, location-based video sharing app, Krowds. The app was cited in July 2012 by PC Magazine as one of "The 8 Best Apps for Making and Sharing Videos on Your iPhone". In late July, Pixorial replaced its original mobile app with the MyPlayer HD app that optimized HD video viewing for large screen viewing including tablets and smart televisions. Pixorial's services terminated on July 18, 2014. == Products == === Krowds App === Pixorial's app was launched in April 2013 for iOS, and in May for Android, as a tool to aggregate event videos through location based collections. The app was launched to generally positive reviews. === Movie Creator === Launched July 12, 2012 Pixorial's Movie Creator allowed users to edit movies in a simple story-telling platform Movie Creator's features include transitions, text boxes, access to free music tracks, credits, and social media sharing capabilities. The Pixorial platform allowed users to view, share, and edit videos without modifying the original. Movie Creator integrated pictures and video to create user movies. == Awards == 2012 Apex Award from the Colorado Technology Association, for Best Technology Project of the Year 2010 Computerworld Laureate for Media, Arts and Entertainment

Digital media in education

Digital media in education refers to the use of digital technologies to support and enhance teaching and learning processes. This includes the application of multiple digital software applications, devices, and online platforms as tools for learning. Learners interact with these technologies to access, analyze, evaluate, and create media content and communication in various forms. The integration of digital media in education has dramatically increased over time, significantly transforming traditional educational practices. When viewed through a global and inclusive lens, digital education should be guided by principles of equity, inclusion, and public infrastructure to ensure meaningful participation of all learners. == History == === 20th century === Technological advances in the 20th century, particularly the invention of the Internet, laid the foundation for incorporating technology into education. In the early 1900s, the overhead projector and instructional radio broadcasts were among the first technologies used for educational purposes. The introduction of computers in classrooms occurred in 1950, when a flight simulation program was developed to train pilots at the Massachusetts Institute of Technology. However, access to computers remained extremely limited for several decades. In 1964, John Kemeny and Thomas Kurtz developed the BASIC programming language, which simplified computer interaction and introduced time-sharing, enabling multiple users to work on the same system simultaneously. This innovation made computing increasingly accessible for educational settings. By the 1980s, schools began to show more interest in computers as companies released mass-market devices to the public. Networking further enabled the interconnection of computers into unified communication systems, which proved more efficient and cost-effective than previous stand-alone machines. This development prompted wider adoption of computing in educational institutions. The invention of the World Wide Web in 1992 further simplified internet navigation and sparked further interest in educational settings. Initially, computers were integrated into school curricula for tasks such as word processing, spreadsheet creation, and data organization. By the late 1990s, the Internet became a research tool, functioning as a vast library. By 1999, 99% of public school teachers in the United States reported having access to at least one computer in their schools, and 84% had a computer available in their classrooms. The emergence of World Wide Web also contributed to the development of learning management systems (LMS), which allowed educators to create online teaching environments for content storage, student activities, discussions, and assignments. Advances in digital compression and high-speed Internet made video creation and distribution more affordable, fostering the use of the systems designed for recording lectures. These tools were often incorporated into learning management platforms, supporting the expansion of fully online courses. === 21st century === By 2002, the Massachusetts Institute of Technology began offering recorded lectures to the public, marking a significant milestone in the movement toward accessible online education. The launch of YouTube in 2005 further transformed educational content distribution. Educators increasingly uploaded lectures and instructional videos on platforms with initiatives like Khan Academy, which was active in 2006, contributing to You Tube's role as a prominent educational resource. In 2007, Apple launched iTunesU, another platform for sharing educational resources and videos. Meanwhile, learning management systems gained popularity, with Blackboard and Canvas becoming two of the most widely used platforms with Canvas's release in 2008. That same year also marked the introduction of the first Massive Open Online Course (MOOC), which provided open access to webinars and expert-led instructions for global learners. As technology evolved, traditional projectors were gradually replaced by interactive whiteboards, which enabled educators to integrate digital tools more effectively in their classrooms. By 2009, 97% of classrooms in the United States had at least one computer, and 93% had Internet access. The COVID-19 pandemic, which forced schools across the world to close, significantly impacted education with schools shifting to distance education. Students attended classes remotely using devices such as laptops, phones, and tablets, supported by digital platforms that facilitated at-home learning environments. However, adapting assessment methods to the new learning environment posed certain challenges. A study conducted by Eddie M. Mulenga and José M. Marbán on Zambian students during the pandemic revealed difficulties in adapting to digital learning, particularly in subjects like mathematics. Similar issues were reported among students in Romania, where the transition to virtual learning presented significant obstacles in engagement and adaptability. === Post-pandemic developments === In the period following the onset of COVID-19, education systems worldwide rapidly adopted digital solutions to maintain continuity of learning and teaching. By the end of March 2020, all 46 OECD and partners countries closed some or all of their schools nationwide. By June 2020, the length of school closures in these countries ranged from 7 to over 18 weeks. These disruptions in formal education prompted governments and educators to quickly adopt digital learning. This global shift to online education highlighted considerable inequalities in digital access, although many systems struggled with inequitable access, especially in regions lacking devices, stable internet connections, or conducive home learning environments. Stimultaneously, commercial educational technology (ed-tech) companies introduced rapid digital solutions to the disruption caused by the pandemic. This led to what has been described as a "seller's market," where the urgency of implementation may cause the prioritization of availability and scale over pedagogical and equity considerations. In the post-pandemic era, digital media in education continues to evolve. It increasingly intersects with artificial intelligence (AI) technologies such as adaptive learning platforms, AI-enabled content generation, and personalized learning environments. These tools enhance global engagement and access but also raise concerns about infrastructure, inclusivity, ethical implementation as well as critical pedagogies. Scholars recommend that educators and policymakers adopt inclusive practices, prioritize equitable infrastructure, and develop critical digital literacy. Facer and Selwyn also emphasize the need for public digital infrastructure and sustainable and justice-oriented policies that empower all learners. Overall, these perspectives reflect a growing consensus that digital media in education should be implemented critically to promote inclusive, multimodal, and future-oriented learning environments.

GeForce RTX 50 series

The GeForce RTX 50 series of consumer graphics cards is the successor of Nvidia's GeForce 40 series. Announced at CES 2025, it debuted with the release of the RTX 5070, RTX 5080 and RTX 5090 in January 2025. It is based on Nvidia's Blackwell architecture featuring Nvidia RTX's fourth-generation RT cores for hardware-accelerated real-time ray tracing, and fifth-generation deep learning–focused Tensor Cores. The GPUs are manufactured by TSMC on a custom 4N process node. == Background == In March 2024, Nvidia announced the Blackwell architecture for its datacenter products. Like Ampere, the architecture is shared by consumer and datacenter products rather than having distinct architectures released simultaneously like Ada Lovelace for consumers and Hopper for datacenter. At the Game Awards in December 2024, a cinematic trailer for The Witcher IV was shown that had been pre-rendered on an "unannounced Nvidia GeForce RTX GPU". This was assumed to be an upcoming GeForce RTX 50 series GPU. Following the RTX 50 series announcement, Nvidia confirmed that the trailer was "pre-rendered in Unreal Engine 5 on a GeForce RTX 5090". Later in the same month, it was reported that Nvidia had begun stockpiling GeForce RTX 50 series units in U.S. warehouses due to a threatened 10% import tariff and 60% tariff on Chinese imports that Donald Trump promised in his re-election campaign. === Announcement === On January 6, 2025, the GeForce RTX 50 series was officially announced for desktop and mobile devices during Nvidia's CES keynote in Las Vegas. The pricing announcement was met with surprise as the RTX 5080 at $999 was the same price that the RTX 4080 Super released at a year earlier despite the anticipated price increases. Nvidia CEO Jensen Huang falsely claimed that the RTX 5070 could reach "RTX 4090 performance at $549", a figure that relies on the use of DLSS 4 upscaling and Multi Frame generation, and is not an indication of raw performance. == Features == === Blackwell architecture === The GeForce RTX 50 series is powered by the Blackwell microarchitecture, which continues Ada Lovelace's emphasis on high graphics frequencies and large L2 caches. The Blackwell architecture introduces Nvidia RTX's fourth-generation RT cores for hardware-accelerated real-time ray tracing and fifth-generation Tensor Cores for AI compute and performing floating-point calculations. === GDDR7 === RTX 50 series GPUs are the first consumer GPUs to feature GDDR7 video memory for greater memory bandwidth over the same bus width compared to the GDDR6 and GDDR6X memory used in the GeForce 40 series. RTX 50 series desktop GPUs use GDDR7 modules from Samsung due to them being available for validation earlier than modules from SK Hynix and Micron. === 12V-2×6 connector === The GeForce RTX 50 series uses the 16-pin 12V-2×6 connector, which is a revision of the 12VHPWR connector featured on the GeForce 40 series. There were problems with the 12VHPWR connector melting on some RTX 4090 GPUs due to the connector not being fully seated and connector design flaws that did not implement a high enough safety and error tolerance. The 12V-2×6 connector revision, published by PCI-SIG in July 2023, addressed this by shortening the four sense pins so the connector will not push any power if it has not been fully seated. The 12VHPWR design would still draw up to 150W of power even if the sense pins were not making full contact. 12V-2×6 is backwards compatible with existing 12VHPWR cables and adapters. Nvidia has mandated to its AIB partners that the 16-pin 12V-2×6 connector be used on all RTX 50 series designs. With the GeForce 40 series, the 12VHPWR connector was only mandated on higher power cards such as the RTX 4070 Super, RTX 4070 Ti, RTX 4070 Ti Super, RTX 4080, RTX 4080 Super and RTX 4090 while RTX 4060, RTX 4060 Ti and RTX 4070 AIB designs had the option of using 8-pin PCIe connectors. The 600W-capable 12VHPWR connector would not have been necessary on sub-200W cards. === DLSS 4 === The fourth generation of Deep Learning Super Sampling (DLSS) was unveiled alongside the RTX 50 series. DLSS 4 upscaling uses a new vision transformer-based model for enhanced image quality with reduced ghosting and greater image stability in motion compared to the previous convolutional neural network (CNN) model. DLSS 4 also allows a greater number of frames to be generated and interpolated based on a single traditionally rendered frame. This form of frame generation called Multi Frame Generation is exclusive to the RTX 50 series while the GeForce 40 series is limited to one interpolated frame per traditionally rendered frame. Nvidia claims that DLSS 4's frame generation model uses 30% less video memory with the example of Warhammer 40,000: Darktide using 400 MB less memory at 4K resolution with frame generation enabled. Nvidia claims that 75 titles will integrate DLSS 4 Multi Frame Generation at launch, including Alan Wake 2, Cyberpunk 2077, Indiana Jones and the Great Circle, and Star Wars Outlaws. === Media Engine and I/O === The RTX 50 series includes DisplayPort 2.1b UHBR20 (80Gbps) with higher display output data rates to support high resolution and high refresh rate displays. The GeForce 40 series received criticism for only including DisplayPort 1.4a (32Gbps) while the competing Radeon RX 7000 series included DisplayPort 2.1 UHBR13.5 (54Gbps). At CES 2025, VESA announced a collaboration with Nvidia on the new DP80LL ("low loss") UHBR20 active cable standard. DP80LL allows for 80Gbps DisplayPort 2.1 cables up to 3 meters long as passive DP80 cables are limited in length due to signal integrity concerns. The RTX 50 series introduces the ninth-generation NVENC encoder and sixth-generation NVDEC video decoder. For the first time in a consumer GeForce GPU, encoding and decoding video in the 4:2:2 color format for professional-grade higher color depth is supported. == List of GPUs == === Desktop === GeForce RTX 50 series desktop GPUs are the second consumer GPUs to utilize a PCIe 5.0 interface and the first to feature GDDR7 video memory (except for the entry level RTX 5050 that still uses GDDR6). They are fabricated by TSMC using a custom 5 nm process dubbed 4N. === Mobile === Laptops featuring GeForce RTX 50 series laptop GPUs were shown at CES 2025. Laptops with RTX 50 series GPUs were paired with Intel's Arrow Lake-HX and AMD's Strix Point and Fire Range CPUs. Nvidia claims that Blackwell architecture's new Max-Q features can increase battery life by up to 40% over GeForce 40 series laptops. For example, Advanced Power Gating saves power by turning off areas of the GPU that are unused and the paired GDDR7 memory can run in an "ultra" low-voltage state. Initial RTX 50 series laptops will become available in March 2025 starting at $1,299. == Controversies == === 12V-2x6 power connector issue === The 12V-2x6 connector used by multiple 5090 cards faces criticism due to a design flaw that can potentially cause the connector to melt. The flaw primarily affect Nvidia's own RTX 5090 FE and RTX 5080 FE cards and are similar to the failures seen on the RTX 40 series but models by third party OEMs have been affected as well. === Availability and pricing === The releases of the RTX 5090, 5080 and 5070 Ti were marked by severe availability issues and pricing well above MSRP. Pricing became an issue again at the end of 2025 due to an ongoing memory supply shortage. Nvidia has been rumored to cut production of 16GB VRAM cards, affecting the availability of the RTX 5060 Ti 16GB and RTX 5070 Ti SKUs. === 32-bit support removal for CUDA, OpenCL, and GPU PhysX === Support for 32-bit OpenCL, and CUDA applications (and as a result 32-bit GPU-accelerated PhysX), was dropped for the GeForce RTX 50 series, which resulted in several applications encountering performance issues with GPU PhysX options or not being able to run at all, causing negative reactions from numerous gaming communities. On December 4, 2025, with the release of driver version 591.44, 32-bit GPU-accelerated PhysX support was restored for certain games. Support for more games was promised in the future. === Incomplete dies and missing ROPs === The dies of certain RTX 5090/5090D, 5080, and 5070 Ti cards were missing eight render output units (ROPs), resulting in slower graphics while pure compute and AI workloads are unaffected. Nvidia claimed that less than 0.5% of cards are affected and that the "production anomaly" has been rectified. === Black screen issues === Some RTX 5080 and 5090 users reported an issue where the system would boot into a black screen after installing Nvidia drivers. Nvidia confirmed the issue and said that a new driver update would fix it for people who hadn't received a VBIOS update yet. Released on February 27, 2025 Nvidia drivers version 572.60 claim to have fixed the issue. Nvidia has since released multiple hotfix and Game Ready drivers that contain additional fixes for the issue. === Windows driver branch quality and stabilit

Vintage computer

A vintage computer is an older computer system that is largely regarded as obsolete. The personal computer has been around since around 1971, and in that time technological advancement means existing models get replaced every few years. Nevertheless, these otherwise useless computers have spawned a sub-culture of vintage computer collectors who often spend large sums for the rarest examples, not only to display but functionally restore. This involves active software development and adaptation to modern uses. This often includes homebrew developers and hackers who add on, update and create hybrid composites from new and old computers for uses they were otherwise never intended. Ethernet interfaces have been designed for many vintage 8-bit machines to allow limited connectivity to the Internet, where users can access discussion groups, bulletin boards, and software databases. Most of this hobby centers on computers made after 1960, though some collectors also specialize in older computers. The Vintage Computer Festival, an event held by the Vintage Computer Federation for the exhibition and celebration of vintage computers, has been held annually since 1997 and has expanded internationally. == By platform == === MITS Inc. === Micro Instrumentation and Telemetry Systems (MITS) produced the Altair 8800 in 1975. According to Harry Garland, the Altair 8800 was the product that catalyzed the microcomputer revolution of the 1970s. === IMSAI === The IMSAI 8080 is a clone of the Altair 8800. It was introduced in 1975, first as a kit, and later as an assembled system. The list price was $591 (equivalent to $3,584 in 2025) for a kit, and $931 (equivalent to $5,570 in 2025) assembled. === Processor Technology === Processor Technology produced the Sol-20. This was one of the first machines to have a case that included a keyboard; a design feature copied by many of later "home computers". === SWTPC === Southwest Technical Products Corporation (SWTPC) produced the 8-bit SWTPC 6800 and later the 16-bit SWTPC 6809 kits that employed the Motorola 68xx series microprocessors. === Apple Inc. === The earliest Apple Inc. personal computers, using the MOS Technology 6502 processors, are among some of the most collectible. They are relatively easy to maintain in an operational state thanks to Apple's use of readily available off-the-shelf parts. Apple I (1976): The Apple-1 was Apple's first product and has brought some of the highest prices ever paid for a microcomputer at auction. Apple II (1977): The Apple II series of computers are some of the easiest to adapt, thanks to the original expansion architecture designed for them. New peripheral cards are still being designed by an avid thriving community, thanks to the longevity of this platform, manufactured from 1977 through 1993. Numerous websites exist to support not only legacy users but new adopters who weren't even born when the Apple II was discontinued by Apple. Macintosh (1984): The original Macintosh used a 32-bit Motorola 68000 processor running at 7.8336 MHz and came with 128 KB of RAM. The list price was $2495 (equivalent to $7,732 in 2025).Perhaps because of its friendly design and first commercially successful graphical user interface as well as its enduring Finder application that persists on the most current Macs, the Macintosh is one of the most collected and used vintage computers. With dozens of websites around the world, old Macintosh hardware and software are input into daily use. The Macintosh had a strong presence in many early computer labs, creating a nostalgia factor for former students who recall their first computing experiences. === RCA === The COSMAC Elf in 1976 was an inexpensive (about $100) single-board computer that was easily built by hobbyists. Many people who could not afford an Altair could afford an ELF, which was based on the RCA 1802 chip. Because the chips are still available from other sources, modern recreations of the ELF are fairly common and there are several fan websites. === IBM === The IBM 1130 (1965) was a desk-sized small computer. It was the often the first computer used by many college students, still has a following of interested users. Most of the remaining 1130 systems in 2023 are in museums, but an emulator is available for users who don't have access to a physical 1130. The 5100 also has an avid collector and fan base. The PC series (5150 PC, 5155 Portable PC, 5160 PC/XT, 5170 PC/AT) has become very popular in recent years, with the earliest models (PC) being considered the most collectible. === Acorn BBC & Archimedes === The Acorn BBC Micro was a very popular British computer in the 1980s with home and educational users and enjoyed near-universal usage in British schools into the mid-1990s. It was possible to use 100K 5+1⁄4-inch disks, and it had many expansion ports. The Archimedes series – the de facto successor to the BBC Micro – has also enjoyed a following in recent years, thanks to its status as the first computer to be based around ARM's RISC microprocessor. === Tandy/Radio Shack === The Tandy/RadioShack Model 100 is still widely collected and used as one of the earliest examples of a truly portable computer. Other Tandy offerings, such as the TRS-80 line, are also very popular, and early systems, like the Model I, in good condition can command premium prices on the vintage computer market. === Sinclair === The Sinclair ZX81 and ZX Spectrum series were the most popular British home computers of the early 1980s, with a wide choice of emulators available for both platforms. The Spectrum in particular enjoys a cult following due to its popularity as a games platform, with new games titles still being developed even today. Original "rubber key" Spectrums fetch the highest prices on the second-hand market, with the later Amstrad-built models attracting less of a following. The earlier ZX81 is not as popular in original hardware form due to its monochrome display and limited abilities next to the Spectrum, but still unassembled ZX81 kits still appear on eBay occasionally. === MSX === Although nearly nonexistent in the United States, the MSX architecture has strong communities of fans and hobbyists worldwide, particularly in Japan (where the standard was conceived and developed), South Korea (the only country that had an MSX-based game console, Zemmix), Netherlands, Spain, Brazil, Argentina, Russia, Chile, the Middle East, and others. New hardware and software are being actively developed to this day as well. One of the latest fundamental (from hardware and software perspectives) revivals of the MSX is the GR8BIT. === Robotron === The Robotron Z1013 was an East German home computer produced by VEB Robotron. It had a U880 processor, 16 KB RAM, and a membrane keyboard. The KC 85 series of computers was a modular 8-bit computer system used in East German schools. === Commodore === VIC-20 Commodore 64 Commodore PET Amiga === Xerox === The Xerox Alto, designed and manufactured by Xerox PARC and released in 1973, was the first personal computer equipped with a graphic user interface. In 1979, Steve Jobs of Apple Inc. arranged for his engineers to visit Xerox in order to see the Alto. The design concepts of the Alto soon appeared in the Apple Lisa and Macintosh systems. The Xerox Star, also known as the 8010/40, was made available in 1981. It followed on the Alto. Like the Alto, this machine was expensive and was only intended for corporate office usage. Therefore, being out of the price range of the average user, this product had little market penetration. === Silicon Graphics === The SGI Indy, built in 1993 for Silicon Graphics has a history of usage in the development of the Nintendo 64 as well as various CGI projects throughout the 1990s and early 2000s. The Indy and other machines in the SGI lineup have remained cult classics.

Inception (deep learning architecture)

Inception is a family of convolutional neural network (CNN) for computer vision, introduced by researchers at Google in 2014 as GoogLeNet (later renamed Inception v1). The series was historically important as an early CNN that separates the stem (data ingest), body (data processing), and head (prediction), an architectural design that persists in all modern CNN. == Version history == === Inception v1 === In 2014, a team at Google developed the GoogLeNet architecture, an instance of which won the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The name came from the LeNet of 1998, since both LeNet and GoogLeNet are CNNs. They also called it "Inception" after a "we need to go deeper" internet meme, a phrase from Inception (2010) the film. Because later, more versions were released, the original Inception architecture was renamed again as "Inception v1". The models and the code were released under Apache 2.0 license on GitHub. The Inception v1 architecture is a deep CNN composed of 22 layers. Most of these layers were "Inception modules". The original paper stated that Inception modules are a "logical culmination" of Network in Network and (Arora et al, 2014). Since Inception v1 is deep, it suffered from the vanishing gradient problem. The team solved it by using two "auxiliary classifiers", which are linear-softmax classifiers inserted at 1/3-deep and 2/3-deep within the network, and the loss function is a weighted sum of all three: L = 0.3 L a u x , 1 + 0.3 L a u x , 2 + L r e a l {\displaystyle L=0.3L_{aux,1}+0.3L_{aux,2}+L_{real}} These were removed after training was complete. This was later solved by the ResNet architecture. The architecture consists of three parts stacked on top of one another: The stem (data ingestion): The first few convolutional layers perform data preprocessing to downscale images to a smaller size. The body (data processing): The next many Inception modules perform the bulk of data processing. The head (prediction): The final fully-connected layer and softmax produces a probability distribution for image classification. This structure is used in most modern CNN architectures. === Inception v2 === Inception v2 was released in 2015, in a paper that is more famous for proposing batch normalization. It had 13.6 million parameters. It improves on Inception v1 by adding batch normalization, and removing dropout and local response normalization which they found became unnecessary when batch normalization is used. === Inception v3 === Inception v3 was released in 2016. It improves on Inception v2 by using factorized convolutions. As an example, a single 5×5 convolution can be factored into 3×3 stacked on top of another 3×3. Both has a receptive field of size 5×5. The 5×5 convolution kernel has 25 parameters, compared to just 18 in the factorized version. Thus, the 5×5 convolution is strictly more powerful than the factorized version. However, this power is not necessarily needed. Empirically, the research team found that factorized convolutions help. It also uses a form of dimension-reduction by concatenating the output from a convolutional layer and a pooling layer. As an example, a tensor of size 35 × 35 × 320 {\displaystyle 35\times 35\times 320} can be downscaled by a convolution with stride 2 to 17 × 17 × 320 {\displaystyle 17\times 17\times 320} , and by maxpooling with pool size 2 × 2 {\displaystyle 2\times 2} to 17 × 17 × 320 {\displaystyle 17\times 17\times 320} . These are then concatenated to 17 × 17 × 640 {\displaystyle 17\times 17\times 640} . Other than this, it also removed the lowest auxiliary classifier during training. They found that the auxiliary head worked as a form of regularization. They also proposed label-smoothing regularization in classification. For an image with label c {\displaystyle c} , instead of making the model to predict the probability distribution δ c = ( 0 , 0 , … , 0 , 1 ⏟ c -th entry , 0 , … , 0 ) {\displaystyle \delta _{c}=(0,0,\dots ,0,\underbrace {1} _{c{\text{-th entry}}},0,\dots ,0)} , they made the model predict the smoothed distribution ( 1 − ϵ ) δ c + ϵ / K {\displaystyle (1-\epsilon )\delta _{c}+\epsilon /K} where K {\displaystyle K} is the total number of classes. === Inception v4 === In 2017, the team released Inception v4, Inception ResNet v1, and Inception ResNet v2. Inception v4 is an incremental update with even more factorized convolutions, and other complications that were empirically found to improve benchmarks. Inception ResNet v1 and v2 are both modifications of Inception v4, where residual connections are added to each Inception module, inspired by the ResNet architecture. === Xception === Xception ("Extreme Inception") was published in 2017. It is a linear stack of depthwise separable convolution layers with residual connections. The design was proposed on the hypothesis that in a CNN, the cross-channels correlations and spatial correlations in the feature maps can be entirely decoupled. Training each network took 3 days on 60 K80 GPUs, or approximately 0.5 petaFLOP-days.

False answer supervision

False answer supervision (FAS) refers to VoIP fraud where the billed duration for the caller is more than the duration of the actual connection duration. The FAS is usually performed by VoIP wholesalers in their softswitches for randomly selected calls. Adding a small amount of extra billed seconds for many calls results in significant revenue for the VoIP wholesaler. == Implementation of FAS == The FAS fraud can be implemented in a softswitch in many different ways. These include: False billing of party A without calling a party B. Usually a fake ringback tone, loopback audio or voicemail message is played Start of billing before actual answer of party B Extra billing after disconnection of party B == Detection of FAS == The FAS can be detected and blocked in a softswitch. Common methods are: Manual verification of call detail records: listening to voice recordings Identification of FAS types and using algorithms to automatically detect the FAS RTP audio signal processing: detection of voice RTP audio signal processing: detection of silence RTP audio signal processing: detection of ringback tone