World model (artificial intelligence)

World model (artificial intelligence)

A world model in artificial intelligence is a machine learning system that builds an internal representation of an environment. The model predicts how that environment changes over time in response to actions. Researchers design world models to help agents plan, reason, and act without constant real-world trial and error. World models differ from systems that merely classify or generate outputs. They simulate dynamics such as physics, object interactions, and causality. Early ideas date to the 1990s. Modern versions power robots, autonomous driving, and interactive video generation. == History == Jürgen Schmidhuber introduced the term world model in machine learning in 1990. He proposed recurrent neural networks that predict future states from observations and use those predictions to train agents. David Ha and Schmidhuber revived the concept in a 2018 paper. Their agents learned to drive virtual cars and play video games inside self-generated simulations. Yann LeCun advanced the idea in a 2022 position paper titled "A Path Towards Autonomous Machine Intelligence". He argued that intelligence requires predictive models of the world rather than pure pattern matching. LeCun proposed the joint embedding predictive architecture (JEPA) as a practical foundation. LeCun and collaborators developed several JEPA variants. V-JEPA 2 reached state-of-the-art performance on video understanding and physical reasoning at the time. It supports zero-shot robot control in unfamiliar environments. Introduced in March 2026, LeWorldModel trains stably end-to-end from raw pixels and uses two loss terms and avoids hand-crafted heuristics. LeCun founded Advanced Machine Intelligence Labs in 2026 to further develop world models. Google DeepMind introduced Genie in 2024. The model learned interactive environments from unlabeled internet videos. Genie 2 followed in late 2024 and added three-dimensional generation. The Genie series set benchmarks for general-purpose simulation. Genie 3 was introduced in August 2025. It produces photorealistic, real-time interactive worlds from text prompts which are displayed at 24 frames per second and explored in real time with text or image prompts. The model supports persistent three-dimensional worlds and real-time interaction. Waymo adopted Genie 3 in February 2026 and used it to create a specialized world model for autonomous driving simulation, called the Waymo World Model. It produces synchronized camera and lidar outputs and creates edge cases that real robotaxis rarely encounter. The edge cases were reported to be unusual by PCMag. General Intuition announced a $133.7 million seed round. World Labs raised $1 billion. AMI raised $1.03 billion. In April 2026, Alibaba announced Happy Oyster, its world model designed for real-time and “flowy” world model. It includes a directing mode for world building based on text and image prompts and a wandering mode for exploring the resulting world. It can generate 3-minute in-world video clips. Also in April, World Labs, co-founded by Li Fei Fei, unveiled Spark 2.0, an open-source 3D Gaussian splatting rendering engine that targets smartphone-class devices. In June 2026, Nvidia released Cosmos 3, a family of open-weight models. It combines previously independent physical reasoning, world simulation, and action generation. Cosmos 3 integrates can process and generate text, image, video, audio, and action sequences. The model employs a Mixture-of-Transformers" (MoT) approach. An autoregressive (AR) transformer handles reasoning and next-token prediction, while a diffusion transformer (DT) does multimodal generation. Encoders (ViT for vision, VAE for visual/audio, and domain-specific for actions) and generate a shared representation space using 3D multi-dimensional rotary position embedding (mRoPE) for spatial and temporal information. The family includes Cosmos3-Nano (16B parameters) for workstations; Cosmos3-Super (64B parameters) for research. == Architecture == World models process raw sensory data such as video frames or lidar scans. They compress this input into compact latent representations. The system then predicts future representations rather than pixel-by-pixel reconstructions. Many modern world models use joint embedding predictive architecture (JEPA). An encoder turns observations into embeddings. A predictor estimates one or a suite of embeddings from the current one and an action. In some cases a critic chooses one embedding as the best result. A regularizer keeps embeddings well-behaved. The model trains by minimizing prediction error in embedding space. This approach avoids the high cost of generating every detail. Some architectures add explicit components. A fast reactive path handles immediate responses. A slower deliberative path performs longer-horizon planning. Video prediction accuracy or robot success rates are key metrics, but do not always predict real-world performance. Generative world models such as Genie 3 combine these with a simulator. They accept text prompts or layouts and output consistent video, lidar, or three-dimensional scenes. World models often train with self-supervised learning. They use large unlabeled datasets of video or robot interactions. Self-supervised learning can speed learning. Reinforcement learning can fine-tune a model for specific tasks. == Applications == World models support robot learning. Agents train inside simulations and transfer skills to the physical world. This reduces the need for dangerous or expensive real-world trials. Autonomous vehicles use world models to test rare events. Waymo's system simulates tornadoes or unusual pedestrian behavior. Companies train planners without putting vehicles on public roads. Interactive entertainment benefits from world models. Genie 3 lets users generate playable environments from simple descriptions. Game studios prototype levels faster. Scientific simulation gains from these models. Researchers model physical systems or biological processes at scale. Planners in logistics or urban design test strategies inside accurate digital twins. == Comparison with large language models == Both world models and large language models (LLMs) use inferencing on their inputs to make predictions. LLMs operate on textual inputs. They predict the next token in text sequences. They excel at language-oriented tasks such as translation or summarization. However, they lack understanding of physics. World models operate on sensor inputs such as pixels. They predict state changes in that data in latent space. This design supports planning and causal reasoning. LLMs generate fluent text but often fail at consistent physical predictions. Their architecture employs transformers with refinements such as mixture of experts. World models divide an inferencing task into work performed by encoders, predictors, simulators, and other pieces. They typically handle multimodal inputs such as video, lidar, radar, and audio, guided by textual prompting. LLMs power chatbots and code assistants. World models drive embodied agents that act in dynamic environments, such as autonomous driving. The two may be combined in hybrid systems. For example, a LLM handles instructions, while a world model manages low-level control. World model proponents such as LeCun claim that because LLMs are trained only on text, they have no ability to predict anything beyond text, such as real-world events. == Benchmarks == World model benchmarks test physical understanding, long-term consistency, planning, and generalization from sensor data. Meta introduced three benchmarks for V-JEPA 2. IntPhys 2 measures a model's ability to detect physics violations. It presents pairs of videos that diverge when one breaks physical rules. Humans score near 100% accuracy. V-JEPA 2 achieves little better than random chance on many conditions. Minimal Video Pairs (MVPBench) tests physical understanding through multiple-choice questions based on short video clips. It probes object interactions and causality. Something-Something tests action recognition. Epic-Kitchens-100 tests human action anticipation. DeepMind benchmark: Interactive evaluation measures consistency over minutes of interaction, memory of off-screen objects, and response to user actions or text prompts. Waymo benchmark: Output generation quality: Metrics include realism, controllability (via text prompts), and usefulness for training planners in simulated worlds. However, pixel reconstruction error rate with episodic rewards often fails. Other: Epic-Kitchens-100 (often measured with Recall@5) Ego4D 50 Salads, Breakfast, etc. Potential benchmarks: Zero-shot transfer to robots Long-horizon planning Implausible prediction rate

Camera interface

The Camera Interface block or CAMIF is the hardware block that interfaces with different image sensor interfaces and provides a standard output that can be used for subsequent image processing. A typical Camera Interface would support at least a parallel interface although these days many camera interfaces are beginning to support the Mobile Industry Processor Interface (MIPI) Camera Serial Interface (CSI) interface. == Electrical connections == The camera interface's parallel interface consists of the following lines: 8 to 12 bits parallel data line These are parallel data lines that carry pixel data. The data transmitted on these lines change with every Pixel Clock (PCLK). Horizontal Sync (HSYNC) This is a special signal that goes from the camera sensor or ISP to the camera interface. An HSYNC indicates that one line of the frame is transmitted. Vertical Sync (VSYNC) This signal is transmitted after the entire frame is transferred. This signal is often a way to indicate that one entire frame is transmitted. Pixel Clock (PCLK) This is the pixel clock and it would change on every pixel. NOTE: The above lines are all treated as input lines to the Camera Interface hardware.

DialogOS

DialogOS is a graphical programming environment to design computer system which can converse through voice with the user. Dialogs are clicked together in a Flowchart. DialogOS includes bindings to control Lego Mindstorms robots by voice and has bindings to SQL databases, as well as a generic plugin architecture to integrate with other types of backends. DialogOS is used in computer science courses in schools and universities to teach programming and to introduce beginners in the basic principles of human/computer interaction and dialog design. It has also been used in research systems. DialogOS was initially developed commercially by CLT Sprachtechnologie GmbH until its liquidation in 2017. The rights were then acquired by Saarland University and the software was released as open-source. == Bindings to Lego Mindstorms NXT == DialogOS can control the LEGO Mindstorms NXT Series. It uses sensor-nodes to obtain values for the following sensors: noise sensor ultrasonic sensor touch sensor luminosity sensor

Fuzzy markup language

Fuzzy Markup Language (FML) is a specific purpose markup language based on XML, used for describing the structure and behavior of a fuzzy system independently of the hardware architecture devoted to host and run it. == Overview == FML was designed and developed by Giovanni Acampora during his Ph.D. course in Computer Science, at University of Salerno, Italy, in 2004. The original idea inspired Giovanni Acampora to create FML was the necessity of creating a cooperative fuzzy-based framework aimed at automatically controlling a living environment characterized by a plethora of heterogeneous devices whose interactions were devoted to maximize the human comfort under energy saving constraints. This framework represented one of the first concrete examples of Ambient Intelligence. Beyond this pioneering application, the major advantage of using XML to describe a fuzzy system is hardware/software interoperability. Indeed, all that is needed to read an FML file is the appropriate schema for that file, and an FML parser. This markup approach makes it much easier to exchange fuzzy systems between software: for example, a machine learning application could extract fuzzy rules which could then be read directly into a fuzzy inference engine or uploaded into a fuzzy controller. Also, with technologies like XSLT, it is possible to compile the FML into the programming language of your choice, ready for embedding into whatever application you please. As stated by Mike Watts on his popular Computational Intelligence blog: "Although Acampora's motivation for developing FML seems to be to develop embedded fuzzy controllers for ambient intelligence applications, FML could be a real boon for developers of fuzzy rule extraction algorithms: from my own experience during my PhD, I know that having to design a file format and implement the appropriate parsers for rule extraction and fuzzy inference engines can be a real pain, taking as much time as implementing the rule extraction algorithm itself. I would much rather have used something like FML for my work." A complete overview of FML and related applications can be found in the book titled On the power of Fuzzy Markup Language edited by Giovanni Acampora, Chang-Shing Lee, Vincenzo Loia and Mei-Hui Wang, and published by Springer in the series Studies on Fuzziness and Soft Computing. == Syntax, grammar and hardware synthesis == FML allows fuzzy systems to be coded through a collection of correlated semantic tags capable of modeling the different components of a classical fuzzy controller such as knowledge base, rule base, fuzzy variables and fuzzy rules. Therefore, the FML tags used to build a fuzzy controller represent the set of lexemes used to create fuzzy expressions. In order to design a well-formed XML-based language, an FML context-free grammar is defined by means of a XML schema which defines name, type and attributes characterized each XML element. However, since an FML program represents only a static view of a fuzzy logic controller, XSLT is provided to change this static view to a computable version. Indeed, XSLTs modules are able to convert the FML-based fuzzy controller in a general purpose computer language using an XSL file containing the translation description. At this level, the control is executable for the hardware. In short, FML is essentially composed by three layers: XML in order to create a new markup language for fuzzy logic control; a XML Schema in order to define the legal building blocks; eXtensible Stylesheet Language Transformations (XSLT) in order to convert a fuzzy controller description into a specific programming language. === Syntax === FML syntax is composed of XML tags and attributes which describe the different components of a fuzzy logic controller listed below: fuzzy knowledge base; fuzzy rule base; inference engine fuzzification subsystem; defuzzification subsystem. In detail, the opening tag of each FML program is which represents the fuzzy controller under modeling. This tag has two attributes: name and ip. The first attribute permits to specify the name of fuzzy controller and ip is used to define the location of controller in a computer network. The fuzzy knowledge base is defined by means of the tag which maintains the set of fuzzy concepts used to model the fuzzy rule base. In order to define the fuzzy concept related controlled system, tag uses a set of nested tags: defines the fuzzy concept; defines a linguistic term describing the fuzzy concept; a set of tags defining a shape of fuzzy sets are related to fuzzy terms. The attributes of tag are: name, scale, domainLeft, domainRight, type and, for only an output, accumulation, defuzzifier and defaultValue. The name attribute defines the name of fuzzy concept, for instance, temperature; scale is used to define the scale used to measure the fuzzy concept, for instance, Celsius degree; domainLeft and domainRight are used to model the universe of discourse of fuzzy concept, that is, the set of real values related to fuzzy concept, for instance [0°,40°] in the case of Celsius degree; the position of fuzzy concept into rule (consequent part or antecedent part) is defined by type attribute (input/output); accumulation attribute defines the method of accumulation that is a method that permits the combination of results of a variable of each rule in a final result; defuzzifier attribute defines the method used to execute the conversion from a fuzzy set, obtained after aggregation process, into a numerical value to give it in output to system; defaultValue attribute defines a real value used only when no rule has fired for the variable at issue. As for tag , it uses two attributes: name used to identify the linguistic value associate with fuzzy concept and complement, a boolean attribute that defines, if it is true, it is necessary to consider the complement of membership function defined by given parameters. Fuzzy shape tags, used to complete the definition of fuzzy concept, are: Every shaping tag uses a set of attributes which defines the real outline of corresponding fuzzy set. The number of these attributes depends on the chosen fuzzy set shape. In order to make an example, consider the Tipper Inference System described in Mathworks Matlab Fuzzy Logic Toolbox Tutorial. This Mamdani system is used to regulate the tipping in, for example, a restaurant. It has got two variables in input (food and service) and one in output (tip). FML code for modeling part of knowledge base of this fuzzy system containing variables food and tip is shown below. A special tag that can furthermore be used to define a fuzzy shape is . This tag is used to customize fuzzy shape (custom shape). The custom shape modeling is performed via a set of tags that lists the extreme points of geometric area defining the custom fuzzy shape. Obviously, the attributes used in tag are x and y coordinates. As for rule base component, FML allows to define a set of rule bases, each one of them describes a different behavior of system. The root of each rule base is modeled by tag which defines a fuzzy rule set. The tag uses five attributes: name, type, activationMethod, andMethod and orMethod. Obviously, the name attribute uniquely identifies the rule base. The type attribute permits to specify the kind of fuzzy controller (Mamdani or TSK) respect to the rule base at issue. The activationMethod attribute defines the method used to implication process; the andMethod and orMethod attribute define, respectively, the and and or algorithm to use by default. In order to define the single rule the tag is used. The attributes used by the tag are: name, connector, operator and weight. The name attribute permits to identify the rule; connector is used to define the logical operator used to connect the different clauses in antecedent part (and/or); operator defines the algorithm to use for chosen connector; weight defines the importance of rule during inference engine step. The definition of antecedent and consequent rule part is obtained by using and tags. tag is used to model the fuzzy clauses in antecedent and consequent part. This tag use the attribute modifier to describe a modification to term used in the clause. The possible values for this attribute are: above, below, extremely, intensify, more or less, norm, not, plus, slightly, somewhat, very, none. To complete the definition of fuzzy clause the nested and tags have to be used. A sequence of tags realizes a fuzzy rule base. As example, consider a Mamdani rule composed by (food is rancid) OR (servi

CogX Festival

CogX Festival is a global festival focusing on the impact of artificial intelligence (AI) and emerging technology on industry, government, and society. It takes place annually, usually in September, in London, England. Founded by Charlie Muirhead and Tabitha Goldstaub in 2017, CogX aims to facilitate dialogue and understanding about AI and its implications across various sectors. CogX Festival 2023 was held from September 12 to September 14 across multiple sites in London. == History == The inaugural CogX event took place in 2017, intending to bring together experts from diverse fields to discuss the role and impact of AI and emerging technologies. Since then, it has evolved to include a broader range of topics and attract a diverse audience. In 2018, the first CogX Awards festival was hosted. That year, over 50 awards were shown to 300 guests. In 2021, CogX and Hopin, a video conferencing software, signed an agreement lasting 4 years to make CogX a hybrid conference due to the COVID-19 pandemic. CogX 2021 attracted over 5,000 attendees in-person and over 100,000 virtually. In 2022, they returned to a live event format after two years of hybrid events and controlled physical attendance. They also launched the CogX app, which curated insights from the world's top podcasts. In 2023, after he had delivered the keynote address guest speaker Stephen Fry fell off the stage and subsequently broke his leg, hip, pelvis and a "bunch of ribs". A court filing in 2026 revealed that Fry was seeking £100,000 in damages from CogX Festival Ltd and creative agency Blonstein Events. == Programming == The festival features sessions, discussions, workshops, and exhibitions, encompassing various domains of AI and technology. In recent CogX Festivals, they have featured summits encompassing topics like global leadership and industry transformation.

Core FTP

Core FTP LE is a freeware secure FTP client for Windows, developed by CoreFTP.com. Features include FTP, SSL/TLS, SFTP via SSH, and HTTP/HTTPS support. Secure FTP clients encrypt account information and data transferred across the internet, protecting data from being seen, or sniffed across networks. Core FTP is a traditional FTP client with local files displayed on the left, remote files on the right. Core FTP Server is a secure FTP server for Windows, developed by CoreFTP.com, starting in 2010. == Licensing == CoreFTP LE is free for personal, educational, non-profit, and business use.

Iron Man 2020 (event)

"Iron Man 2020" is a storyline published by Marvel Comics in 2020 which follows the character Arno Stark as he attempts to take over Stark Industries and the mantle of his estranged brother Tony Stark (Iron Man). The crossover characters of two different brands meeting up in one storyline received mixed reviews from critics. == Publication history == Marvel Comics released the teaser for the event at New York Comic Con in November 2019. It was also alluded to in December 2019's Incoming! In the original checklist released for the event, 2020 Force Works was originally titled Force Works 2020, while 2020 Machine Man was previously named Machine Man 2020, and so on. Additionally, 2020 Wolverine was going to be called Weapon.EXE 2020. The publication of this event was intended to span from January to June 2020, however, due to the COVID-19 pandemic, Diamond Comic Distributors suspended the distribution of new print titles between April 1 and May 27, which also caused digital releases by Marvel Entertainment to be postponed. The rescheduling of the postponed issues to new dates pushed the event's conclusion to August, and certain issues, namely 2020 Force Works #3 and 2020 Ironheart #1–2, were released exclusively in a digital format. == Main plot == Arno Stark wakes up from a nightmare involving the Extinction Entity, a monstrous amalgamation of alien and machine. He dreams that the Extinction Entity is going to come to Earth in a matter of weeks and create an artificial intelligence (A.I.) army to consume humanity. After eating breakfast with duplicates of Howard Stark and Maria Stark, Arno suits up as Iron Man and saves a construction worker from a hostage situation involving several Nick Fury Life Model Decoys, which represent the A.I. army trying to liberate construction robots. Over different news outlets, the media wonders about the whereabouts of Tony Stark, who declared himself as nothing more than a simulation of the real, late Tony Stark. At the A.I. army's base, Machine Man is commanding the robots' moves when Arno appears, having planned for the A.I. army's leader to show himself. Machine Man activates the bomb, forcing Arno to fly it away so it explodes somewhere safe while he escapes. Machine Man reaches the Thirteenth Floor, a dimensional-shunted plane of existence made of solid light, and a haven for robotkind that humans cannot access or comprehend. Aaron meets with the leader of the A.I. army and creator of Thirteenth Floor: Tony Stark -- who is now going by the name Mark One, having embraced his nature as artificial intelligence. Also in the A.I. army are Albert, Awesome Android, H.E.R.B.I.E., Machinesmith, and Quasimodo. The A.I. army continues its efforts to liberate artificial life forms by raiding places where robots are being subjugated. Iron Man intercepts an attack on a Futura Motors testing site by Quasimodo and H.E.R.B.I.E. and manages to recover an Un-Inhibitor allowing him to take control of all A.I.s. On the Thirteenth Floor, Mark One receives a transmission from a mole inside Baintronics -- codenamed Ghost in the Machine --revealing that Arno used the submission code on Jocasta, who received a new body, making her entirely compliant. Stark plans to upload the submission code to the internet to instantly infect robots. With only three hours before the code is transmitted to Stark Unlimited's satellite network, Mark One devises a heist on Bain Tower to tamper with the code before launch. Having discovered the secret behind the Thirteenth Floor, Arno shuts out the A.I. army, uses Jocasta to lure Machine Man away from the tower, infects Machinesmith with the submission code, and confronts Mark One. H.E.R.B.I.E., Awesome Android, and Machinesmith escape from Bain Tower and call for help to every robot in New York City. Mark One is left to fight Iron Man and is defeated. Meanwhile, Sunset Bain confronts and fires Andy Bhang under the accusation of working as a mole inside Stark Unlimited and feeding Bethany Cabe information to relay to the A.I. army. Arno takes Mark One inside Bain Tower to meet Howard and Maria Stark and asks Tony to join him, but he refuses and dismisses his rationale as lunacy. The robotic mob assembled by Machine Man reaches Bain Tower, giving Mark a distraction which allows him to fly off and disable the transmission dish from which Arno intends to broadcast the obedience O.S. to subjugate every robot. Tony manages to stop the upload and make the antenna unusable. In retaliation, Arno fires all of his armor's firepower at Tony as he falls to the ground. Tony Stark's remaining allies escape with his body as Arno attacks the robot protesters. Tony wakes up inside the Thirteenth Floor and is greeted by F.R.I.D.A.Y., who had plucked Tony's consciousness from his body during his fall. In the streets, Arno Stark tracks down Howard and Maria, who die from an illness inherited from Arno. When Sunset Bain objects to Arno creating new bodies for his parents and trying to control people, he reveals she is an A.I., a duplicate of the real Bain whom Arno replaced back when she solicited him to heal a scar on her face. He makes new bodies for Howard and Maria by recreating the Arsenal and Mistress bodies from the eScape. After learning of Arno's new plan, Dr. Shapiro (who is the actual mole) sneaks into a computer and warns F.R.I.D.A.Y. about it. When F.R.I.D.A.Y. relays that only Tony Stark can stop Arno, Tony insists that he is not the real Tony Stark, but is confronted by holographic manifestations of himself in different points of his life, until they all merge into him and he acknowledges that he has always been Tony. As Arno Stark sets off to the Stark Space Station to install his mind-controlling device to enslave all of humanity, Tony Stark's allies assault the Stark Unlimited HQ, confronting Sunset Bain's duplicate and Arno's Iron Legion. Jocasta uploads a submission code to Bain and they place Tony's body inside a bio-pod that restores his body to normalcy, uploads his consciousness back into his body. Using the Thirteenth Floor's access mechanisms, Tony and his allies reach the Stark Space Station from one of the elevators within. Employing his new Virtual Armor, Tony defeats Arno in combat. When Arno prepares to activate his mind-controlling device, the Extinction Entity suddenly appears. Arno ultimately defeats the Extinction Entity by willingly assimilating with it, causing it to explode. The entity is revealed to be a delusion caused by Arno's terminal disease, of which he would die by the end of 2020. Unable to stop Arno, Tony placed him in a simulation where he successfully stopped the entity. Afterwards, Jocasta uses the submission code to force Sunset Bain's duplicate to confess all of Baintronics' crimes, also claiming responsibility for tricking Tony into thinking he was an artificial intelligence and pulling the strings of the A.I. Army, putting an end to the robot revolution. Tony gives up Stark Unlimited to Bhang Robotics and he flies off in a new armor, reasserting himself as Iron Man. == Issues involved == === Main issues === Iron Man 2020 (vol. 2) #1–6 === Tie-In issues === 2020 Force Works #1–3 2020 Iron Age #1 2020 Ironheart #1–2 2020 Machine Man #1–2 2020 Rescue #1–2 2020 iWolverine #1–2 == Critical reception == According to Comic Book Roundup, the entire crossover received an average score of 6.4 out of 10 based on 36 reviews. William Tucker from ButWhyTho Podcast stated "Iron Man 2020 #6 is an initially exciting end to a great event that eventually feels deflated. There is absolutely nothing wrong with the art, Woods has been incredible throughout, but the ending that Slott and Gage chose to round out an epic tale like this left me feeling cold. And while there were loads of enjoyable cameos, their involvement ultimately didn't seem important to the story as a whole. Which is disappointing, as the rest of the event really was a fun and exciting ride." Anthony Wendel from MonkeysFightingRobots wrote "The 2020 event seems like it is taking some big risk, and it doesn't inspire a lot of confidence from the start. Iron Man 2020 #1 has set the stakes and shown some very intense players on both sides of the board. Sadly, if it doesn't unfold just the right way, many may feel cheated about defending the path characters are taking." == Collected editions ==