World model (artificial intelligence)

A world model in artificial intelligence is a machine learning system that builds an internal representation of an environment. The model predicts how that environment changes over time in response to actions. Researchers design world models to help agents plan, reason, and act without constant real-world trial and error. World models differ from systems that merely classify or generate outputs: they simulate dynamics such as physics, object interactions, and causality. Early ideas date to the 1990s. Modern versions power robots, autonomous driving, and interactive video generation. == History == Jürgen Schmidhuber introduced the term world model in machine learning in 1990. He proposed recurrent neural networks that predict future states from observations and use those predictions to train agents. David Ha and Schmidhuber revived the concept in a 2018 paper. Their agents learned to drive virtual cars and play video games inside self-generated simulations. Yann LeCun advanced the idea in a 2022 position paper titled "A Path Towards Autonomous Machine Intelligence". He argued that intelligence requires predictive models of the world rather than pure pattern matching. LeCun proposed the joint embedding predictive architecture (JEPA) as a practical foundation. LeCun and collaborators developed several JEPA variants. V-JEPA 2 reached state-of-the-art performance on video understanding and physical reasoning at the time of its release. It supports zero-shot robot control in unfamiliar environments. LeWorldModel, introduced in March 2026, trains stably end-to-end from raw pixels, using two loss terms and no hand-crafted heuristics. LeCun founded Advanced Machine Intelligence Labs in 2026 to further develop world models. Google DeepMind introduced Genie in 2024. The model learned interactive environments from unlabeled internet videos. Genie 2 followed in late 2024 and added three-dimensional generation. The Genie series set benchmarks for general-purpose simulation.
Genie 3 was introduced in August 2025. It produces photorealistic interactive worlds from text prompts; the worlds are displayed at 24 frames per second and can be explored in real time with text or image prompts. The model supports persistent three-dimensional worlds. Waymo adopted Genie 3 in February 2026 and used it to create a specialized world model for autonomous driving simulation, called the Waymo World Model. It produces synchronized camera and lidar outputs and creates edge cases that real robotaxis rarely encounter. PCMag described the edge cases as unusual. General Intuition announced a $133.7 million seed round. World Labs raised $1 billion. AMI raised $1.03 billion. In April 2026, Alibaba announced Happy Oyster, a world model designed to be real-time and “flowy”. It includes a directing mode for world building based on text and image prompts and a wandering mode for exploring the resulting world. It can generate 3-minute in-world video clips. Also in April, World Labs, co-founded by Fei-Fei Li, unveiled Spark 2.0, an open-source 3D Gaussian splatting rendering engine that targets smartphone-class devices. In June 2026, Nvidia released Cosmos 3, a family of open-weight models. It combines previously independent capabilities: physical reasoning, world simulation, and action generation. Cosmos 3 can process and generate text, image, video, audio, and action sequences. The model employs a "Mixture-of-Transformers" (MoT) approach. An autoregressive (AR) transformer handles reasoning and next-token prediction, while a diffusion transformer (DT) does multimodal generation. Encoders (ViT for vision, VAE for visual/audio, and domain-specific encoders for actions) feed a shared representation space that uses 3D multi-dimensional rotary position embedding (mRoPE) for spatial and temporal information. The family includes Cosmos3-Nano (16B parameters) for workstations and Cosmos3-Super (64B parameters) for research.
== Architecture == World models process raw sensory data such as video frames or lidar scans. They compress this input into compact latent representations. The system then predicts future representations rather than pixel-by-pixel reconstructions. Many modern world models use a joint embedding predictive architecture (JEPA). An encoder turns observations into embeddings. A predictor estimates one or more future embeddings from the current embedding and an action. In some cases a critic chooses one of the candidate embeddings as the best result. A regularizer keeps embeddings well-behaved, for example by preventing them from collapsing to a single point. The model trains by minimizing prediction error in embedding space. This approach avoids the high cost of generating every detail. Some architectures add explicit components. A fast reactive path handles immediate responses. A slower deliberative path performs longer-horizon planning. Video prediction accuracy and robot success rates are key metrics, but they do not always predict real-world performance. Generative world models such as Genie 3 combine these components with a simulator. They accept text prompts or layouts and output consistent video, lidar, or three-dimensional scenes. World models often train with self-supervised learning on large unlabeled datasets of video or robot interactions, which can speed up learning. Reinforcement learning can fine-tune a model for specific tasks. == Applications == World models support robot learning. Agents train inside simulations and transfer skills to the physical world. This reduces the need for dangerous or expensive real-world trials. Autonomous vehicles use world models to test rare events. Waymo's system simulates tornadoes or unusual pedestrian behavior. Companies train planners without putting vehicles on public roads. Interactive entertainment benefits from world models. Genie 3 lets users generate playable environments from simple descriptions. Game studios prototype levels faster. Scientific simulation gains from these models.
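The encoder, predictor, and critic roles described in the Architecture section above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the two-dimensional toy environment (`step`), the redundant four-dimensional "sensor" (`observe`), the frozen linear encoder (`encode`), and the linear predictor trained by stochastic gradient descent. It omits the regularizer and does not train the encoder, so it shows only prediction in embedding space, not a full JEPA training recipe.

```python
import random

def step(state, action):
    """Hypothetical 2-D environment dynamics, unknown to the model."""
    s0, s1 = state
    return (0.9 * s0 - 0.2 * s1, 0.2 * s0 + 0.9 * s1 + 0.5 * action)

def observe(state):
    """Redundant 4-D 'sensor' reading of the hidden 2-D state."""
    s0, s1 = state
    return (s0, s1, s0 + s1, s0 - s1)

def encode(obs):
    """Frozen linear encoder: 4-D observation -> 2-D embedding."""
    return (0.5 * obs[0] + 0.5 * obs[3], 0.5 * obs[1] + 0.5 * obs[2])

def predict(weights, z, action):
    """Linear predictor: (current embedding, action) -> next embedding."""
    x = (z[0], z[1], action)
    return tuple(sum(w * xi for w, xi in zip(row, x)) for row in weights)

def embedding_loss(pred, target):
    """Squared prediction error measured in embedding space, not pixel space."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def sample_transition(rng):
    """Draw a random state and action; return (z_t, action, z_{t+1})."""
    state = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    action = rng.uniform(-1, 1)
    return encode(observe(state)), action, encode(observe(step(state, action)))

def train_predictor(steps=2000, lr=0.05, seed=0):
    """Fit the predictor by SGD on the embedding-space error."""
    rng = random.Random(seed)
    weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    for _ in range(steps):
        z, action, z_next = sample_transition(rng)
        pred = predict(weights, z, action)
        x = (z[0], z[1], action)
        for i in range(2):
            err = pred[i] - z_next[i]
            for j in range(3):
                # Gradient of err**2 with respect to weights[i][j] is 2 * err * x[j].
                weights[i][j] -= lr * 2.0 * err * x[j]
    return weights

def mean_loss(weights, n=200, seed=1):
    """Average embedding-space error on freshly sampled transitions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z, action, z_next = sample_transition(rng)
        total += embedding_loss(predict(weights, z, action), z_next)
    return total / n

def choose_action(weights, z, goal, candidates):
    """Critic: pick the action whose predicted embedding is closest to the goal."""
    return min(candidates, key=lambda a: embedding_loss(predict(weights, z, a), goal))
```

Calling `train_predictor()` and then `choose_action(weights, z, goal, [-1.0, 0.0, 1.0])` plans one step ahead entirely in embedding space. The toy dynamics are linear so that a linear predictor can fit them; real systems use deep networks for both the encoder and the predictor.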
Researchers model physical systems or biological processes at scale. Planners in logistics or urban design test strategies inside accurate digital twins. == Comparison with large language models == Both world models and large language models (LLMs) perform inference on their inputs to make predictions. LLMs operate on textual inputs. They predict the next token in text sequences. They excel at language-oriented tasks such as translation or summarization. However, they lack understanding of physics. World models operate on sensor inputs such as pixels. They predict state changes in that data in latent space. This design supports planning and causal reasoning. LLMs generate fluent text but often fail at consistent physical predictions. Their architecture employs transformers with refinements such as mixture of experts. World models divide an inference task into work performed by encoders, predictors, simulators, and other components. They typically handle multimodal inputs such as video, lidar, radar, and audio, guided by textual prompting. LLMs power chatbots and code assistants. World models drive embodied agents that act in dynamic environments, such as autonomous vehicles. The two may be combined in hybrid systems. For example, an LLM handles instructions, while a world model manages low-level control. World model proponents such as LeCun claim that because LLMs are trained only on text, they have no ability to predict anything beyond text, such as real-world events. == Benchmarks == World model benchmarks test physical understanding, long-term consistency, planning, and generalization from sensor data. Meta introduced three benchmarks for V-JEPA 2. IntPhys 2 measures a model's ability to detect physics violations. It presents pairs of videos that diverge when one breaks physical rules. Humans score near 100% accuracy. V-JEPA 2 performs little better than random chance on many conditions.
Minimal Video Pairs (MVPBench) tests physical understanding through multiple-choice questions based on short video clips. It probes object interactions and causality. Something-Something tests action recognition. Epic-Kitchens-100 tests human action anticipation. DeepMind benchmark: interactive evaluation measures consistency over minutes of interaction, memory of off-screen objects, and response to user actions or text prompts. Waymo benchmark: output generation quality, with metrics including realism, controllability (via text prompts), and usefulness for training planners in simulated worlds. However, pixel reconstruction error rate with episodic rewards often fails. Other benchmarks include Epic-Kitchens-100 (often measured with Recall@5), Ego4D, 50 Salads, and Breakfast. Potential benchmarks include zero-shot transfer to robots, long-horizon planning, and implausible prediction rate.
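The Recall@5 measure mentioned for Epic-Kitchens-100 can be sketched as follows. This is a minimal illustration, assuming each prediction is a list of per-class scores and that recall is averaged per class (a common choice for action anticipation, but an assumption here rather than a statement of any benchmark's official protocol). The function name `class_mean_recall_at_k` is hypothetical.

```python
from collections import defaultdict

def class_mean_recall_at_k(scores, labels, k=5):
    """Average, over classes, of the fraction of samples whose true class
    appears among the k highest-scoring predicted classes.

    scores: list of per-sample score lists (one score per class index).
    labels: list of true class indices, one per sample.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; ties keep the lower index first.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        counts[label] += 1
        if label in top_k:
            hits[label] += 1
    # Average per-class recall so that frequent classes do not dominate.
    return sum(hits[c] / counts[c] for c in counts) / len(counts)
```

Averaging per class rather than per sample keeps rare actions from being swamped by common ones, which matters for long-tailed action datasets.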

List of artificial intelligence journals

This is a list of notable peer-reviewed academic journals that publish research in the field of artificial intelligence (AI), including areas such as machine learning, computer vision, natural language processing, robotics, and intelligent systems. == General artificial intelligence == Artificial Intelligence (journal) – Elsevier; Journal of Artificial Intelligence Research (JAIR) – AI Access Foundation; Knowledge-Based Systems – Elsevier == Machine learning == Data Mining and Knowledge Discovery – Springer; Machine Learning (journal) – Springer; Journal of Machine Learning Research – Microtome; Pattern Recognition (journal) – Elsevier; Neural Networks (journal) – Elsevier; Neural Computation (journal) – MIT Press; Neurocomputing (journal) – Elsevier == Deep learning and neural computation == IEEE Transactions on Evolutionary Computation – IEEE; IEEE Transactions on Neural Networks and Learning Systems – IEEE; Nature Machine Intelligence – Springer Nature == Computer vision == International Journal of Computer Vision – Springer; IEEE Transactions on Pattern Analysis and Machine Intelligence – IEEE; Machine Vision and Applications – Springer == Natural language processing == Computational Linguistics (journal) – MIT Press; Natural Language Processing; Transactions of the Association for Computational Linguistics – ACL == Robotics and intelligent systems == IEEE Transactions on Robotics – IEEE; Autonomous Robots – Springer; Journal of Intelligent & Robotic Systems – Springer == Interdisciplinary and ethics in AI == AI & Society – Springer; Artificial Life – MIT Press; Philosophy & Technology – Springer; Minds and Machines – Springer

MADI

Multichannel Audio Digital Interface (MADI), standardized as AES10 by the Audio Engineering Society (AES), defines the data format and electrical characteristics of an interface that carries multiple channels of digital audio. The AES first documented the MADI standard in AES10-1991 and updated it in AES10-2003 and AES10-2008. The MADI standard includes a bit-level description and has features in common with the two-channel AES3 interface. MADI supports serial digital transmission over coaxial cable or fibre-optic lines of 28, 32, 56, or 64 channels, and sampling rates to 96 kHz and beyond with an audio bit depth of up to 24 bits per channel. Like AES3 and ADAT Lightpipe, it is a unidirectional interface from one sender to one receiver. == Development and applications == MADI was developed by AMS Neve, Solid State Logic, Sony and Mitsubishi and is widely used in the audio industry, especially in the professional audio sector. It provides advantages over other audio digital interface protocols and standards such as AES3, ADAT Lightpipe, TDIF (Tascam Digital Interface), and S/PDIF (Sony/Philips Digital Interface). These advantages include support for a greater number of channels per line, and the use of coaxial and optical fiber media that support transmission of audio signals over 100 meters, up to 3000 meters over multi-mode and 40,000 meters over single-mode optical fiber. The original specification (AES10-1991) defined the MADI link as a 56-channel transport for linking large-format mixing consoles to digital multitrack recording devices. Large broadcast studios also adopted it for routing multi-channel audio throughout their facilities. The 2003 revision (AES10-2003) adds a 64-channel capability by removing varispeed operation and supports 96 kHz sampling frequency with reduced channel capacity. The latest AES10-2008 standard includes minor clarifications and updates to correspond to the current AES3 standard.
Audio over Ethernet of various types is the primary alternative to MADI for transport of many channels of professional digital audio. == Transmission format == MADI links use a transmission format similar to Fiber Distributed Data Interface (FDDI) networking. Since MADI is most often transmitted on copper links via 75-ohm coaxial cables, it more closely compares to the FDDI specification for copper-based links, called CDDI. AES10-2003 recommends using BNC connectors with coaxial cables and SC connectors with optical fibers. MADI over fibre can support a range of up to 2 km. The basic data rate is 100 Mbit/s of data using 4B5B encoding to produce a 125 MHz physical baud rate. Unlike AES3, this clock is not synchronized to the audio sample rate, and the audio data payload is padded using JK sync symbols. Sync symbols may be inserted at any subframe boundary, and must occur at least once per frame. Though the standard disassociates the transmission clock from the audio sample rate, and thus requires a separate word clock connection to maintain synchronization, some vendors do give the option of locking to parts of the transmission timing information for purposes of deriving a word clock. The audio data is almost identical to the AES3 payload, though with more channels. Rather than letters, MADI assigns channel numbers from 0 to 63. Frame synchronization is provided by sync symbols outside the data itself, rather than an embedded preamble sequence, and the first four time slots of each sub-channel are encoded as normal data, used for sub-channel identification. Bit 0: set to 1 to mark channel 0, the first channel in each frame. Bit 1: set to 1 to indicate that this channel is active (contains interesting data). Bit 2: notA/B channel marker, used to mark left (0) and right (1) channels; generally, even channels are A and odd channels are B. Bit 3: set to 1 to mark the beginning of a 192-sample data block.
== Sampling frequency == The original AES10-1991 specification allowed 56 channels at sample rates from 32 to 48 kHz with an additional vari-speed range of ± 12.5%. This leads to a total range of 28 to 54 kHz. At the highest frequency, this produces a total of 56 × 32 × 54 = 96768 kbit/s, leaving 3.232% of the channel for synchronization marks and transmit clock error. The 2003 revision specifies different relations between sampling frequency and number of channels: 32 kHz to 48 kHz ± 12.5%, 56 channels; 32 kHz to 48 kHz nominal, 64 channels; 64 kHz to 96 kHz ± 12.5%, 28 channels. With a 48 kHz sampling frequency, 64 channels take 64 × 32 × 48000 bit/s = 98.304 Mbit/s. Adding the minimum 8 × 48 kbit/s of framing produces 98688 kbit/s, leaving 1.312% free for timing variation and other overhead. Both versions of the standard accommodate higher sampling frequencies (for example, 96 kHz or 192 kHz) by using two or more channels per audio sample on the link.
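The link-budget arithmetic in this section can be checked with a short calculation. The sketch below assumes 32 bits per channel per sample period, a 100 Mbit/s data rate before 4B5B coding, and a minimum framing overhead equivalent to 8 data bits per frame (which gives 8 × 48 kbit/s = 384 kbit/s at 48 kHz, consistent with the 98688 kbit/s total). The constant and function names are illustrative and do not come from the AES10 standard.

```python
LINK_DATA_RATE = 100_000_000   # bit/s carried by the link before 4B5B coding
BITS_PER_CHANNEL = 32          # bits per channel per sample period
FRAMING_BITS_PER_FRAME = 8     # assumed minimum sync overhead per frame

def line_rate(data_rate=LINK_DATA_RATE):
    """Physical symbol rate after 4B5B coding (5 line bits per 4 data bits)."""
    return data_rate * 5 // 4

def payload_rate(channels, sample_rate_hz):
    """Audio payload in bit/s for the given channel count and sample rate."""
    return channels * BITS_PER_CHANNEL * sample_rate_hz

def free_fraction(channels, sample_rate_hz, include_framing=True):
    """Fraction of the 100 Mbit/s link left unused by payload (and framing)."""
    used = payload_rate(channels, sample_rate_hz)
    if include_framing:
        used += FRAMING_BITS_PER_FRAME * sample_rate_hz
    return (LINK_DATA_RATE - used) / LINK_DATA_RATE

if __name__ == "__main__":
    # 56 channels at the top of the vari-speed range (54 kHz), payload only.
    print(payload_rate(56, 54_000), free_fraction(56, 54_000, include_framing=False))
    # 64 channels at 48 kHz, including the minimum framing overhead.
    print(payload_rate(64, 48_000), free_fraction(64, 48_000))
```

The 64-channel mode leaves so little headroom that vari-speed operation no longer fits, which matches the 2003 revision's decision to drop it in that mode.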

Haul video

A haul video is a video recording posted to the Internet in which a person discusses items that they recently purchased, sometimes going into detail about their experiences during the purchase and the cost of the items they bought. The posting of haul videos (or hauls) was a growing trend between 2008 and 2016. Often the items bought are books, clothing, groceries, household goods, makeup, or jewellery. == Details == The posting of haul videos grew as a trend between 2008 and 2016. By late 2010, nearly a quarter of a million haul videos had been shared on the website YouTube alone. Certain videos have each received tens of millions of views. Many young adults (mostly women) have displayed their shopping hauls, while including their beauty and design commentary in the narration. The videos are often grouped by store name or by the type of product (cosmetics, accessories, shoes, postage stamps, etc.). Before haul videos became an online trend, millions of people spent time watching technical product videos in which other people unboxed their latest gadgets and technology. The trend of "unboxing videos" had emerged during 2006. Haul videos have led to celebrity status for some people. Other haul video bloggers have entered sponsorship deals and advertising programs with major brands. The videos are rarely negative about the products being reviewed. This aspect of the genre makes sponsorship particularly appealing to brand advertisers. Brands including J.C. Penney contacted haulers as part of their marketing efforts for Back to School 2010. Haul videos also convinced three San Francisco Bay Area natives to launch HaulBlog, a parody site that creates fake haul videos which poke fun at the phenomenon. The site is also home to the original monthly web series "The Haul Monitor", a humorous commentary show that features haul videos from around the community.
== Fashion media == Sarah Sykes and John Zimmerman of Carnegie Mellon University's HCII and School of Design wrote an article, "Making Sense of Haul Videos: Self-created Celebrities Fill a Fashion Media Gap". It presents their analysis and research project examining what makes video bloggers so popular on YouTube, as well as how the production of haul videos affects fashion media. == Federal Trade Commission == The United States Federal Trade Commission recently enacted laws to regulate many types of online publishers and content creators. The regulated content includes blogs and podcasts in text, image, audio, and video form. While publishers (including haul-video creators) are allowed to accept free merchandise and advertising, any gifts or payments must be fully and clearly disclosed, making plain that a brand paid the creator, as a sponsor, to review a product. The Canadian Radio-television and Telecommunications Commission is also closely monitoring such Internet activities.

News ticker

A news ticker (sometimes called a crawler, crawl, slide, zipper, ticker tape, or chyron) is a horizontal or vertical (depending on the language's writing system) text-based display. It takes the form either of a graphic that typically resides in the lower third of the screen space on a television station or network (usually during news programming) or of a long, thin scoreboard-style display seen around the facades of some offices or public buildings, dedicated to presenting headlines or minor pieces of news. It is an evolution of the ticker tape, a continuous paper print-out of stock quotes from a printing telegraph which was mainly used to transmit companies' share price information over telegraph lines before the advance of technology in the 1960s. News tickers have been used in European countries such as the United Kingdom, Germany and Ireland for some years; they are also used in several Asian countries and Australia. In the United States, tickers were long used on a special event basis by broadcast television stations to disseminate weather warnings, school closings, and election results. Sports telecasts occasionally used a ticker to update other contests in progress before the expansion of cable news networks and the internet for news content. In addition, some ticker displays are used to relay continuous business and financial information.
Most tickers are traditionally displayed in the form of scrolling text running from right to left across the screen or building display (or in the opposite direction for right-to-left writing systems such as Arabic script and Hebrew), allowing for headlines of varying degrees of detail; some used by television broadcasters, however, display stories in a static manner (allowing for the seamless switching of each story individually programmed for display) or utilize a "flipping" effect (in which each individual headline is shown for a few seconds before transitioning to the next, instead of scrolling across the screen, usually resulting in a relatively quicker run through of all of the information programmed into the ticker). Since the growth in usage of the World Wide Web, some news tickers have syndicated news stories posted largely on websites of broadcasters or by other independent news agencies. == Current uses == === Television === The presentation of headlines or other information in a news ticker has become a common element of many different news networks. The use of the ticker has differed on a number of channels: News networks and local newscasts commonly use a setup in which news headlines are scrolled across an area near the bottom of the screen, though some variations have formed, such as showing one headline at a time with a scrolling or "flipper" effect. Financial news channels use two or more tickers displaying company shares prices and business headlines. Networks with a focus on sports often use a slightly different system, where scores and statuses of ongoing and finished games are displayed one by one, along with minor sports highlights, statistics and sports news headlines. They are typically divided into categories devoted to specific leagues and events (with college basketball and football usually focusing on the top 25 ranked teams on the AP Poll, occasionally supplemented by sections for specific conferences). 
Some programs, including news-based programs emphasizing viewer interactivity, or special events, may also use tickers to display messages and reactions from viewers and others that relate to the program. These comments are often sourced from social networking services such as Facebook and Twitter, typically curating comments from a specific page or hashtag. Due to their current prevalence, tickers have occasionally been made targets of pranks and vandalism. In one such example, News 14 Carolina allowed viewers to submit relevant information such as school closings or traffic delays via telephone or the Internet that would be incorporated into the ticker; the system was exploited in February 2004 to display humorous and crude messages, including the infamous "All your base are belong to us". Occasionally, messages intended for training accidentally end up on the live ticker, as happened on BBC News in 2022 when "Weather rain everywhere" and "Manchester United are rubbish" appeared on the live news ticker. Some businesses and organizations have utilized tickers intended for relaying weather-related closings as a surreptitious source of free guerrilla marketing, proclaiming they were open rather than closed and giving their phone number if possible, allowing them to 'advertise' on a television station all day for free. Since then, many stations have required pre-registration of businesses or organizations with an authorized representative and a signed affidavit on company letterhead affirming their authenticity, along with filtering out unfamiliar businesses and organizations, before they are able to display their closing announcements. Stations also confirm all closings involving school districts with authorized officials to prevent situations in which students either show up to canceled classes in dangerous conditions, or do not attend school due to an erroneous, prank-submitted, or false listing.
=== On personal computers === Various applications have been developed over time to install news tickers on personal computer desktops using RSS feeds from news organizations. These tickers are displayed in a fashion similar to those used by television channels but enable the user to access the underlying news stories, a feature not offered by traditional television channels. The Bloomberg Terminal and other financial information-tracking programs and devices also utilize tickers. A ticker may also be used by businesses as an unobtrusive method of delivering important information to their staff. The ticker can be set to reappear, stay on screen, or be put into a retractable mode (where a small tab is left visible on-screen). In the United Kingdom, broadcasters have stopped using this technology as other forms of communications have become available and increased in popularity. BBC News and Sky News discontinued their respective desktop tickers in March 2011 and 2012 to focus on other products, such as smartphone applications, to deliver updated information on breaking news and sport stories. === News tickers on buildings === Since the advent of the telegraph, newspapers commonly used their buildings to share the latest headlines. At first simple chalkboard signs were used for bulletins, but limelight illumination, electric lights, magic lantern projections, and other novel techniques were later employed. The method of using electric lights to spell out moving letters was invented by Frank C. Reilly (August 20, 1888 – April 10, 1947) and patented in 1923. Reilly called his invention the Motograph News Bulletin. In 1928, The New York Times installed a Motograph News Bulletin to display news headlines on the sides of Times Tower. The display was 388 feet (118 m) long, 5 feet (1.5 m) high, and employed over 14,800 light bulbs. Popularly known as the "Zipper", the sign remained in use until the building was sold in 1961.
The sign was darkened during World War II to comply with wartime lighting restrictions. The Motograph operated until 1994 and was replaced by an electronic version in 1995, which was in turn removed in 2017 due to the replacement of all individual screens on the front of One Times Square with a 350 foot (110 m)-tall LED billboard in 2018. Ticker displays appear today on the exterior of the News Corp Building, which houses the headquarters for Fox News Channel/News Corp in the west extension of Manhattan's Rockefeller Center, as well as one that displays delayed stock market data that is located in Times Square. NASDAQ itself features a large display screen on the facade of the NASDAQ MarketSite building in Times Square. The Reuters buildings at Canary Wharf and in Toronto have news and stock tickers; the latter type features market data for the New York Stock Exchange, NASDAQ and London Stock Exchange, while the Toronto building's ticker also includes quotes from the Toronto Stock Exchange. A red-LED ticker was added to the perimeter of 10 Rockefeller Center in 1994, as the building was being renovated to accommodate the studios for NBC's Today. Placed at the juncture of the first and second floors, the ticker is visible to spectators in Rockefeller Plaza and passersby on West 49th Street and updates continuously, even at times when Today is not being produced and broadcast. As of 2015, the ticker strip is only a small part of a large two-floor LCD video display that is placed within the window of the studio showing promotional information. The Martin Place Headquarters of Seven News, the news division of Australian television broadcaster Seven Network, also incorporates a ticker that wraps around the building. == In popular culture == The use of new

Avid Free DV

Avid Free DV is a non-linear video editing software application developed by Avid Technology. Avid introduced Free DV in January 2003 at the 2003 MacWorld Expo; the company discontinued it in September 2007. Free DV was intended to give editors a sample of the Avid interface to use in deciding whether or not to purchase Avid software, so when compared with other Avid products its features were relatively minimal. When it was available it was not limited by time or watermarking, so it could be used as a non-linear editor for as long as desired. == Comparisons == When compared with other consumer-end non-linear editors such as iMovie and Windows Movie Maker, it sported more powerful video processing tools, but lacked the ease of use and shallow learning curve emphasized in similar programs because it had the full interface of the professional Avid system. However, Avid did offer a number of Flash-based tutorials to help new users learn how to use the program for capturing, editing, clipping, processing, and outputting audio/video, among other things. == Limitations == Avid Free DV allowed only two video and audio tracks, had fewer editing tools than other Avid products, had few import and export formats, and allowed capture and output of standard-definition DV only, via FireWire. Avid Free DV projects and media were not compatible with other Avid systems. As the name implied, Avid Free DV was available as a free download, although users were required to complete a short survey on the Avid website before they were given a download link and key. In addition to serving as a way to evaluate Avid prior to purchase, Free DV could also act as a stepping stone for people wishing to learn to use Avid's other editing products, such as Xpress Pro, Media Composer and Symphony. While additional skills and techniques are necessary to use these professionally geared systems, the basic operation remains the same.
== Operating systems == Avid Free DV was available for Windows XP and Mac OS X. The officially supported Mac OS X versions were Panther versions up to 10.3.5, and Tiger versions up to 10.4.3 only. == Supported formats == Avid Free DV supported QuickTime (MOV) and DV AVIs. == Reception == John P. Mello Jr. of The Boston Globe gave Free DV a negative review, finding the user interface obfuscatory and the process of ingesting video error-prone. He summarized: "Professional video editors who use an Avid competitor may jump at the chance to take a free look at how Avid does things. But for the merely curious, this software is a nightmare". Video Systems's Steve Mullen opined that its lack of interoperability with Avid's professional editing software contradicted Avid's stated goal of enticing budding video editors into buying into the company's software ecosystem.

Influence-for-hire

Influence-for-hire, or collective influence, refers to the economy that has emerged around buying and selling influence on social media platforms. == Overview == Companies that engage in the influence-for-hire industry range from content farms to high-end public relations agencies. Traditionally, influence operations have largely been confined to public sector actors like intelligence agencies; in the influence-for-hire industry, the groups conducting the operations are private, with commerce being their primary consideration. However, many of the clients in the influence-for-hire industry are countries or countries acting through proxies. Influence-for-hire operators are often located in countries with less expensive digital labor. == History == In May 2021, Facebook took a Ukrainian influence-for-hire network offline. Facebook attributed the network to organizations and consultants linked to Ukrainian politicians including Andriy Derkach. During the COVID-19 pandemic, state-sponsored misinformation was spread through influence-for-hire networks. In August 2021, a report published by the Australian Strategic Policy Institute implicated the Chinese government and the ruling Chinese Communist Party in campaigns of online manipulation conducted against Australia and Taiwan using influence-for-hire.