Chinchilla (language model)

Chinchilla is a family of large language models (LLMs) developed by the research team at Google DeepMind, presented in March 2022. == Models == It is named "chinchilla" because it is a further development over a previous model family named Gopher. Both model families were trained in order to investigate the scaling laws of large language models. It claimed to outperform GPT-3. It considerably simplifies downstream utilization because it requires much less computer power for inference and fine-tuning. Based on the training of previously employed language models, it has been determined that if one doubles the model size, one must also have twice the number of training tokens. This hypothesis has been used to train Chinchilla by DeepMind. Similar to Gopher in terms of cost, Chinchilla has 70B parameters and four times as much data. Chinchilla has an average accuracy of 67.5% on the Measuring Massive Multitask Language Understanding (MMLU) benchmark, which is 7% higher than Gopher's performance. Chinchilla was still in the testing phase as of January 12, 2023. Chinchilla contributes to developing an effective training paradigm for large autoregressive language models with limited compute resources. The Chinchilla team recommends that the number of training tokens is twice for every model size doubling, meaning that using larger, higher-quality training datasets can lead to better results on downstream tasks. It has been used for the Flamingo vision-language model. == Architecture == Both the Gopher family and Chinchilla family are families of transformer models. In particular, they are essentially the same as GPT-2, with different sizes and minor modifications. Gopher family uses RMSNorm instead of LayerNorm; relative positional encoding rather than absolute positional encoding. The Chinchilla family is the same as the Gopher family, but trained with AdamW instead of Adam optimizer. The Gopher family contains six models of increasing size, from 44 million parameters to 280 billion parameters. They refer to the largest one as "Gopher" by default. Similar naming conventions apply for the Chinchilla family. Table 1 of shows the entire Gopher family: Table 4 of compares the 70-billion-parameter Chinchilla with Gopher 280B.

Scene statistics

Scene statistics is a discipline within the field of perception. It is concerned with the statistical regularities related to scenes. It is based on the premise that a perceptual system is designed to interpret scenes. Biological perceptual systems have evolved in response to physical properties of natural environments. Therefore natural scenes receive a great deal of attention. Natural scene statistics are useful for defining the behavior of an ideal observer in a natural task, typically by incorporating signal detection theory, information theory or estimation theory. == Within-domain versus across-domain == Geisler (2008) distinguishes between four kinds of domains: (1) Physical environments (2) Images/Scenes (3) Neural responses and (4) Behavior. Within the domain of images/scenes one can study the characteristics of information related to redundancy and efficient coding. Across-domain statistics determine how an autonomous system should make inferences about its environment, process information and control its behavior. To study these statistics it is necessary to sample or register information in multiple domains simultaneously. == Applications == === Prediction of picture and video quality === One of the most successful applications of Natural Scenes Statistics Models has been perceptual picture and video quality prediction. For example, the Visual Information Fidelity (VIF) algorithm, which is used to measure the degree of distortion of pictures and videos, is used extensively by the image and video processing communities to assess perceptual quality. This is often after processing, such as compression, which can degrade the appearance of a visual signal. The premise is that the scene statistics are changed by distortion and that the visual system is sensitive to the changes in the scene statistics. VIF is heavily used in the streaming television industry. Other popular picture quality models that use natural scene statistics include BRISQUE and NIQE, both of which are no-reference since they do not require any reference picture to measure quality against.

Mike Little

Mike Little (born 12 May 1962) is an English web developer and writer. He is the co-founder of the free and open source web publishing software WordPress. == Biography == Mike Little was born in Manchester, England in 1962 to a Nigerian father, who was a mathematics lecturer and musician, and an English mother who worked as a primary school teacher. Little was placed into foster care when he was four months of age, and was later adopted by the same family. He grew up on a council estate in Brinnington, Stockport, and was educated at Stockport School. In 2003, Little and Matt Mullenweg started working on a project in which they built on b2/cafelog and later named it WordPress, releasing the first version on 27 May 2003. Little states that, despite not being invited to join his co-founder's for-profit business Automattic, he and Mullenweg remain on good terms. He clarified: "I don’t want it to sound like he cheated me out of something or ripped me off in some way. He didn’t." In June 2013, Little was awarded the SAScon's "Outstanding Contribution to Digital" award for his part in co-founding and developing WordPress. Little has been described as "modest" and living in "virtual anonymity". He has one daughter. He identifies as a follower of Stoicism and a humanist, and in 2021, he became a patron of charity Humanists UK.

DBOS

DBOS (Formerly Database-Oriented Operating System, now just DBOS) is an open source durable workflow execution software library written for the Python, TypeScript, Java, and Go programming languages. DBOS arose from a joint open source project from MIT and Stanford, after a discussion between Michael Stonebraker and Matei Zaharia on how to scale and improve scheduling and performance of millions of Apache Spark tasks. Today it is a commercial company that offers an open source system to add durable computing to any software, built on concepts derived from the joint research project. == History == === 2020: Academic R&D Project === DBOS originated in 2020 as a joint open source project between MIT, Stanford, and Carnegie Mellon. The project explored the idea of operating system services built atop a distributed database - a database-oriented operating system meant to simplify and improve the scalability, security and resilience of large-scale distributed applications. The basic concept was to run a multi-node multi-core, transactional, highly-available distributed database, such as VoltDB, as the only application for a microkernel, and then to implement scheduling, messaging, file systems and other operating system services on top of the database. The architectural philosophy is described by this quote from the abstract of their initial preprint: All operating system state should be represented uniformly as database tables, and operations on this state should be made via queries from otherwise stateless tasks. This design makes it easy to scale and evolve the OS without whole-system refactoring, inspect and debug system state, upgrade components without downtime, manage decisions using machine learning, and implement sophisticated security features. A prototype was built with competitive performance to existing systems. ==

UCSD Pascal

UCSD Pascal is a Pascal programming language system that runs on the UCSD p-System, a portable, highly machine-independent operating system. UCSD Pascal was first released in 1977. It was developed at the University of California, San Diego (UCSD). == The p-System == In 1977, the University of California, San Diego (UCSD) Institute for Information Systems developed UCSD Pascal to provide students with a common environment that could run on any of the then available microcomputers as well as campus DEC PDP-11 minicomputers. The operating system became known as UCSD p-System. There were three operating systems that IBM offered for its original IBM PC: the UCSD p-System, CP/M-86, and IBM PC DOS. Vendor SofTech Microsystems emphasized p-System's application portability, with virtual machines for 20 CPUs as of the IBM PC's release. It predicted that users would be able to use applications they purchased on future computers running p-System; advertisements called it "the Universal Operating System". PC Magazine denounced UCSD p-System on the IBM PC, stating in a review of Context MBA, written in the language, that it "simply does not produce good code". The p-System did not sell very well for the IBM PC, because of a lack of applications and because it was more expensive than the other choices. Previously, IBM had offered the UCSD p-System as an option for IBM Displaywriter, an 8086-based dedicated word processing machine. (The Displaywriter's native operating system had been developed completely internally and was not opened for end-user programming.) Notable extensions to standard Pascal include separately compilable Units and a String type. Some intrinsics were provided to accelerate string processing (e.g. scanning in an array for a particular search pattern); other language extensions were provided to allow the UCSD p-System to be self-compiling and self-hosted. UCSD Pascal was based on a p-code machine architecture. Its contribution to these early virtual machines was to extend p-code away from its roots as a compiler intermediate language into a full execution environment. The UCSD Pascal p-Machine was optimized for the new small microcomputers with addressing restricted to 16-bit (only 64 KB of memory). James Gosling cites UCSD Pascal as a key influence (along with the Smalltalk virtual machine) on the design of the Java virtual machine. UCSD p-System achieved machine independence by defining a virtual machine, called the p-Machine (or pseudo-machine, which many users began to call the "Pascal-machine" like the OS—although UCSD documentation always used "pseudo-machine") with its own instruction set called p-code (or pseudo-code). Urs Ammann, a student of Niklaus Wirth, originally presented a p-code in his PhD thesis, from which the UCSD implementation was derived, the Zurich Pascal-P implementation. The UCSD implementation changed the Zurich implementation to be "byte oriented". The UCSD p-code was optimized for execution of the Pascal programming language. Each hardware platform then only needed a p-code interpreter program written for it to port the entire p-System and all the tools to run on it. Later versions also included additional languages that compiled to the p-code base. For example, Apple Computer offered a Fortran Compiler (written by Silicon Valley Software, Sunnyvale California) producing p-code that ran on the Apple version of the p-system. Later, TeleSoft (also located in San Diego) offered an early Ada development environment that used p-code and was therefore able to run on a number of hardware platforms including the Motorola 68000, the System/370, and the Pascal MicroEngine. UCSD p-System shares some concepts with the later Java platform. Both use a virtual machine to hide operating system and hardware differences, and both use programs written to that virtual machine to provide cross-platform support. Likewise both systems allow the virtual machine to be used either as the complete operating system of the target computer or to run in a "box" under another operating system. The UCSD Pascal compiler was distributed as part of a portable operating system, the p-System. == History == UCSD p-System began around 1974 as the idea of UCSD's Kenneth Bowles, who believed that the number of new computing platforms coming out at the time would make it difficult for new programming languages to gain acceptance. He based UCSD Pascal on the Pascal-P2 release of the portable compiler from Zurich. He was particularly interested in Pascal as a language to teach programming. UCSD introduced two features that were important improvements on the original Pascal: variable length strings, and "units" of independently compiled code (an idea included into the then-evolving Ada (programming language)). Niklaus Wirth credits the p-System, and UCSD Pascal in particular, with popularizing Pascal. It was not until the release of Turbo Pascal that UCSD's version started to slip from first place among Pascal users. The Pascal dialect of UCSD Pascal came from the subset of Pascal implemented in Pascal-P2, which was not designed to be a full implementation of the language, but rather "the minimum subset that would self-compile", to fit its function as a bootstrap kit for Pascal compilers. UCSD added strings from BASIC, and several other implementation dependent features. Although UCSD Pascal later obtained many of the other features of the full Pascal language, the Pascal-P2 subset persisted in other dialects, notably Borland Pascal, which copied much of the UCSD dialect. == Versions == There were four versions of UCSD p-code engine, each with several revisions of the p-System and UCSD Pascal. A revision of the p-code engine (i.e., the p-Machine) meant a change to the p-code language, and therefore compiled code is not portable between different p-Machine versions. Each revision was represented with a leading Roman Numeral, while operating system revisions were enumerated as the "dot" number following the p-code Roman Numeral. For example, II.3 represented the third revision of the p-System running on the second revision of the p-Machine. === Version I === Original version, never officially distributed outside of the University of California, San Diego. However, the Pascal sources for both Versions I.3 and I.5 were freely exchanged between interested users. Specifically, the patch revision I.5a was known to be one of the most stable. === Version II === Widely distributed, available on many early microcomputers. Numerous versions included Apple II ultimately Apple Pascal, DEC PDP-11, Intel 8080, Zilog Z80, and MOS 6502 based machines, Motorola 68000 and the IBM PC (Version II on the PC was restricted to one 64K code segment and one 64K stack/heap data segment; Version IV removed the code segment limit but cost a lot more). Project members from this era include Dr Kenneth L Bowles, Mark Allen, Richard Gleaves, Richard Kaufmann, Pete Lawrence, Joel McCormack, Mark Overgaard, Keith Shillington, Roger Sumner, and John Van Zandt. === Version III === Custom version written for Western Digital to run on their Pascal MicroEngine microcomputer. Included support for parallel processes for the first time. === Version IV === Commercial version, developed and sold by SofTech. Based on Version II; did not include changes from Version III. Did not sell well due to combination of their pricing structure, performance problems due to p-code interpreter, and competition with native operating systems (on top of which it often ran). After SofTech dropped the product, it was picked up by Pecan Systems, a relatively small company formed of p-System users and fans. Sales revived somewhat, due mostly to Pecan's reasonable pricing structure, but the p-System and UCSD Pascal gradually lost the market to native operating systems and compilers. Available for the TI-99/4A equipped with p-code card, Commodore CBM 8096, Sage II/IV, HP 9000, and BBC Micro with 6502 second processor. == Further use == The Corvus Systems computer used UCSD Pascal for all its user software. The "innovative concept" of the Constellation OS was to run Pascal (interpretively or compiled) and include all common software in the manual, so users could modify as needed.

Software component

A software component is a modular unit of software that encapsulates specific functionality. The desired characteristics of a component are reusability and maintainability. == Value == Components allow software developers to assemble software with reliable parts rather than writing code for every aspect. It makes implementation more like factory assembly than custom building. == Attributes == Desirable attributes of a component include but are not limited to: Cohesive – encapsulates related functionality Reusable Robust Substitutable – can be replaced by another component with the same interface Documented Tested == Third-party == Some components are built in-house by the same organization or team building the software system. Some are third-party, developed elsewhere and assembled into the software system. == Component-based software engineering == For large-scale systems, component-based development encourages a disciplined process to manage complexity. == Framework == Some components conform to a framework technology that allows them to be consumed in a well-known way. Examples include: CORBA, COM, Enterprise JavaBeans, and the .NET Framework. == Modeling == Component design is often modeled visually. In Unified Modeling Language (UML) 2.0 a component is shown as a rectangle, and an interface is shown as a lollipop to indicate a provided interface and as a socket to indicate consumption of an interface. == History == The idea of reusable software components was promoted by Douglas McIlroy in his presentation at the NATO Software Engineering Conference of 1968. (One goal of that conference was to resolve the so-called software crisis of the time.) In the 1970s, McIlroy put this idea into practice with the addition of the pipeline feature to the Unix operating system. Brad Cox refined the concept of a software component in the 1980s. He attempted to create an infrastructure and market for reusable third-party components by inventing the Objective-C programming language. IBM introduced System Object Model (SOM) in the early 1990s. Microsoft introduced Component Object Model (COM) in the early 1990s. Microsoft built many domain-specific component technologies on COM, including Distributed Component Object Model (DCOM), Object Linking and Embedding (OLE), and ActiveX.

Content Credentials

Content Credentials (also known as C2PA signatures) are a digital media metadata specification. They aim to provide provenance information about a piece of media (such as an image or a video) and help prove its authenticity. They are described as the equivalent of nutrition labels for digital media. One of the stated goal of this specification is to fight online disinformation. The specification is written and maintained by the Coalition for Content Provenance and Authenticity (C2PA), a group of many media and tech organizations including Adobe, Amazon, the BBC, Google, Meta, Microsoft, OpenAI and Sony. Another organization, the Content Authenticity Initiative (CAI), is responsible for promoting the standard and accelerate its adoption. The standard relies on cryptographic digital signatures. == Adoption == There are two main stakeholders who can implement Content Credentials: Producers (softwares and hardwares that produce or modify digital media) and publishers (softwares that show digital media to users). === Producers === ==== Adobe ==== Adobe is one of the first companies to implement the specification, announcing support in Photoshop in 2021. Content Credentials can be enabled and the complete history of edits is kept. ==== Google ==== Google announced support for Content Credentials on its Pixel 10 phones in August 2025. The Content Credentials are embedded on each picture taken from the Pixel Camera, and modifications done using Google Photos. Information include picture timestamp and a non-identifiable signature that proves it was taken from a Pixel 10. As for Google Photos, a list of AI and non-AI edits are kept. Google is the first company to introduce support for Content Credentials on either phones or consumer-grade devices, and also the first company to make it available for free to all users. ==== Nikon ==== Nikon announced in 2024 that their Z6 III camera would support embedding Content Credentials in its photos. However, in 2025, a vulnerability was discovered in the software of the camera that allowed to combine unauthentic images with authentic photos and still have the resulting image with a valid digital signature. Nikon revoked the certificates. ==== Media organizations ==== CBC/Radio-Canada and the BBC both have started attaching Content Credentials to media they produce or verify. ==== OpenAI ==== OpenAI embeds Content Credentials on the images and videos it generates that includes that the media was created by AI using their platforms. ==== Sony ==== In June 2025, Sony announced the release of its Camera Verify system for press photographers and news editors using C2PA digital signatures. Initially, the system will be limited to still images, high‑end cameras, and selected news agencies. Registration with Sony Creators' Cloud is also required. === Publishers === ==== LinkedIn ==== In 2024, LinkedIn started showing a "CR" icon on images that contain Content Credentials of AI-generated images. In 2025, they announced a partnership with Adobe to allow photographers to prove ownership of images using Content Credentials. ==== TikTok ==== TikTok announced in 2024 that an "AI-generated" label would be applied to videos containing Content Credentials if they were AI-generated. In 2025, they announced that users could control the amount of AI-generated content they see, using self-reported labels, Content Credentials and an invisible, proprietary AI watermark embedded in videos by their AI editor tool. ==== YouTube ==== In 2024, YouTube started showing to users a label that reads "captured with a camera" on videos that show authentic, unedited videos taken by Content Credentials-compatible cameras.