Illia Polosukhin is a Ukrainian-born computer scientist and entrepreneur known for his work on the transformer architecture in machine learning and for co-founding the NEAR blockchain. == Early life and education == Polosukhin studied at the Kharkiv Polytechnic Institute, later relocating to San Diego and then moving to Silicon Valley. == Career == === Google and transformer research === Polosukhin worked at Google and was part of the team associated with research on self-attention that culminated in the 2017 paper Attention Is All You Need, widely credited with introducing the transformer architecture used in modern large language models. === NEAR Protocol === After his work in machine learning, Polosukhin co-founded NEAR Protocol and later became associated with the NEAR Foundation ecosystem. In 2023, Polosukhin publicly argued that increasingly capable AI systems should be more transparent and user-controlled, and expressed skepticism that conventional regulation alone would solve problems created by closed, corporate models, warning about risks such as regulatory capture. He has promoted “user-owned AI” concepts that combine open approaches with decentralized infrastructure aligned with blockchain technology. In 2024, Polosukhin downplayed scenarios of AI independently causing human extinction, arguing that conflicts are driven by people and that misuse of AI would reflect human intent and incentives. Later that year, Polosukhin said the NEAR Foundation would reduce its workforce by about 40%. == Publications == Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Lukasz Kaiser, Illia Polosukhin; et al. (2017). "Attention Is All You Need". arXiv.
Language technology
Language technology, often called human language technology (HLT), studies how computer programs or electronic devices can analyze, produce, modify, or respond to human text and speech. Working with language technology often requires broad knowledge of both linguistics and computer science. It consists of natural language processing (NLP) and computational linguistics (CL) on the one hand, together with many application-oriented aspects of these, and more low-level aspects such as encoding and speech technology on the other. These elementary aspects are normally not considered to be within the scope of related terms such as natural language processing and (applied) computational linguistics, which are otherwise near-synonyms. For example, for many of the world's lesser-known languages, the foundation of language technology is providing communities with fonts and keyboard setups so that their languages can be written on computers or mobile devices. Modern language technology also includes tools for machine translation, speech recognition, text processing, and natural language processing. Large-scale AI models have recently advanced the field and improved the ability of machines to interpret complex human context.
glTF
glTF (Graphics Library Transmission Format or GL Transmission Format and formerly known as WebGL Transmissions Format or WebGL TF) is a standard file format for three-dimensional scenes and models. A glTF file uses one of two possible file extensions: .gltf (JSON/ASCII) or .glb (binary). Both .gltf and .glb files may reference external binary and texture resources. Alternatively, both formats may be self-contained by directly embedding binary data buffers (as base64-encoded strings in .gltf files or as raw byte arrays in .glb files). An open standard developed and maintained by the Khronos Group, it supports 3D model geometry, appearance, scene graph hierarchy, and animation. It is intended to be a streamlined, interoperable format for the delivery of 3D assets, while minimizing file size and runtime processing by apps. As such, its creators have described it as the "JPEG of 3D". == Overview == The glTF format stores data primarily in JSON. The JSON may also contain blobs of binary data known as buffers, and refer to external files, for storing mesh data, images, etc. The binary .glb format also contains JSON text, but serialized with binary chunk headers to allow blobs to be directly appended to the file. The fundamental building blocks of a glTF scene are nodes. Nodes are organized into a hierarchy, such that a node may have other nodes defined as children. Nodes may have transforms relative to their parent. Nodes may refer to resources, such as meshes, skins, and cameras. Meshes may refer to materials, which refer to textures, which refer to images. Scenes are defined using an array of root nodes. Most of the top-level glTF properties use a flat hierarchy for storage. Nodes are saved in an array and are referred to by index, including by other nodes. A glTF scene refers to its root nodes by index. Furthermore, nodes refer to meshes by index, which refer to materials by index, which refer to textures by index, which refer to images by index. 
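The container layout and index-based references described above can be illustrated with a short Python sketch. It is not a full loader: the sample document, the helper names (pack_glb, read_glb_json, image_uris) and the file name crate.png are assumptions made for illustration, and the accessors and buffers a real asset would need are omitted. The 12-byte header and the chunk type constant follow the GLB layout in the glTF 2.0 specification.

```python
import json
import struct

GLB_MAGIC = 0x46546C67   # the ASCII bytes "glTF" read as a little-endian uint32
CHUNK_JSON = 0x4E4F534A  # the ASCII bytes "JSON" read as a little-endian uint32

# Hypothetical document. Accessor 0 is referenced but not defined,
# so this asset is illustrative and would not pass a validator.
DOC = {
    "asset": {"version": "2.0"},
    "scene": 0,
    "scenes": [{"nodes": [0]}],
    "nodes": [
        {"name": "root", "children": [1]},
        {"name": "crate", "mesh": 0, "translation": [0.0, 1.0, 0.0]},
    ],
    "meshes": [{"primitives": [{"attributes": {"POSITION": 0}, "material": 0}]}],
    "materials": [{"pbrMetallicRoughness": {"baseColorTexture": {"index": 0}}}],
    "textures": [{"source": 0}],
    "images": [{"uri": "crate.png"}],
}


def pack_glb(doc: dict) -> bytes:
    """Wrap a glTF JSON document in a GLB container with a single JSON chunk."""
    payload = json.dumps(doc, separators=(",", ":")).encode("utf-8")
    payload += b" " * (-len(payload) % 4)  # pad the JSON chunk to 4 bytes with spaces
    chunk = struct.pack("<II", len(payload), CHUNK_JSON) + payload
    # Header: magic, container version, total length of the file in bytes.
    return struct.pack("<III", GLB_MAGIC, 2, 12 + len(chunk)) + chunk


def read_glb_json(blob: bytes) -> dict:
    """Read the header and the first chunk of a GLB and return the parsed JSON."""
    magic, _version, total = struct.unpack_from("<III", blob, 0)
    if magic != GLB_MAGIC or total != len(blob):
        raise ValueError("not a well-formed GLB container")
    length, kind = struct.unpack_from("<II", blob, 12)
    if kind != CHUNK_JSON:
        raise ValueError("the first chunk must be the JSON chunk")
    return json.loads(blob[20:20 + length].decode("utf-8"))


def image_uris(doc: dict, scene_index: int = 0) -> list[str]:
    """Follow scene -> node -> mesh -> material -> texture -> image by index."""
    uris = []
    stack = list(doc["scenes"][scene_index]["nodes"])  # root node indices
    while stack:
        node = doc["nodes"][stack.pop()]
        stack.extend(node.get("children", []))  # children are node indices too
        if "mesh" not in node:
            continue
        for primitive in doc["meshes"][node["mesh"]]["primitives"]:
            if "material" not in primitive:
                continue
            material = doc["materials"][primitive["material"]]
            ref = material.get("pbrMetallicRoughness", {}).get("baseColorTexture")
            if ref is None:
                continue
            texture = doc["textures"][ref["index"]]
            image = doc["images"][texture["source"]]
            if "uri" in image:  # images may instead point at a buffer view
                uris.append(image["uri"])
    return uris
```

Because every reference is an integer index into a top-level array, the traversal needs no name lookups; for the sample document, image_uris(read_glb_json(pack_glb(DOC))) walks from the scene down to the single image entry.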
All glTF data structures support being extended using an "extensions" JSON property, allowing arbitrary JSON data to be added. == Releases == === glTF 1.0 === Members of the COLLADA working group conceived the file format in 2012. At SIGGRAPH 2012, Khronos presented a demo of glTF, which was then called WebGL Transmissions Format (WebGL TF). On October 19, 2015, Khronos released the glTF 1.0 specification. ==== Adoption of glTF 1.0 ==== At SIGGRAPH 2016, Oculus announced their adoption of glTF, citing its similarities to their ovrscene format. In October 2016, Microsoft joined the 3D Formats working group at Khronos to collaborate on glTF. === glTF 2.0 === The second version, glTF 2.0, was released in June 2017 and is a complete overhaul of the file format from version 1.0, with most tools adopting the 2.0 version. Based on a proposal by Fraunhofer originally presented at SIGGRAPH 2016, physically based rendering (PBR) was added, replacing the WebGL shaders used in glTF 1.0. glTF 2.0 added the GLB binary format to the base specification. Other upgrades include sparse accessors and morph targets for techniques such as facial animation, and schema tweaks and breaking changes for corner cases or performance, such as replacing top-level glTF object properties with arrays for faster index-based access. There is ongoing work towards import and export in Unity and an integrated multi-engine viewer and validator. ==== Adoption of glTF 2.0 ==== On March 3, 2017, Microsoft announced that they would be using glTF 2.0 as the 3D asset format across their product line, including Paint 3D, 3D Viewer, Remix 3D, Babylon.js, and Microsoft Office. Sketchfab also announced support for glTF 2.0. The glTF and GLB formats are used and supported by companies and products including DGG, UX3D, Sketchfab, Facebook, Microsoft, Meta, Google, Adobe, Box, TurboSquid, Unreal Engine, Unity, and Qt Quick 3D. 
The format has been noted as an important standard for augmented reality, integrating with modeling software such as Autodesk Maya, Autodesk 3ds Max, and Poly. In February 2020, the Smithsonian Institution launched their Open Access Initiative, releasing approximately 2.8 million 2D images and 3D models into the public domain, using glTF for the 3D models. In July 2022, glTF 2.0 was released as the ISO/IEC 12113:2022 International Standard. Khronos stated they would make regular submissions to bring updates and new widely adopted glTF functionality into refreshed versions of ISO/IEC 12113 to ensure that there is no long-term divergence between the ISO/IEC and Khronos specifications. The open-source game engine Godot supports importing glTF 2.0 files since version 3.0 and export since version 4.0. === Extensions === The glTF format can be extended with arbitrary JSON to add new data and functionality. Extensions can be placed on any part of a glTF, including nodes, animations, materials, textures, and on the entire document. Khronos keeps a non-comprehensive registry of glTF extensions on GitHub, including all official Khronos extensions and a few third-party extensions. PBR extensions model the physical appearance of real-world objects, allowing developers to create realistic 3D assets that have the correct appearance. As new PBR extensions are released, they continue to expand PBR capabilities within the glTF framework, allowing a wider range of scenes and objects to be realistically rendered as 3D assets. The KTX 2.0 extension for universal texture compression enables 3D models in the glTF format to be highly compressed and to use natively supported texture formats, reducing file size and boosting rendering speed. Draco is a glTF extension for mesh compression, to compress and decompress 3D meshes, to help reduce the size of 3D files. It compresses vertex attributes, normals, colors, and texture coordinates. 
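As a rough illustration of how extension data sits inside a glTF document, the following Python sketch collects the names of all extensions attached anywhere in an asset and compares them with the top-level "extensionsUsed" list, which the glTF 2.0 specification uses to declare the extensions an asset relies on. The sample document and the helper names (extensions_in_use, undeclared_extensions) are assumptions for illustration; KHR_lights_punctual and KHR_materials_emissive_strength are Khronos extensions, shown here with only a minimal subset of their properties.

```python
# Hypothetical asset fragment: one document-level extension, one node-level
# reference into it, and one material-level extension.
DOC = {
    "asset": {"version": "2.0"},
    "extensionsUsed": ["KHR_lights_punctual", "KHR_materials_emissive_strength"],
    "extensions": {
        "KHR_lights_punctual": {
            "lights": [{"type": "point", "color": [1.0, 1.0, 1.0], "intensity": 1.0}]
        }
    },
    "nodes": [
        {"name": "lamp", "extensions": {"KHR_lights_punctual": {"light": 0}}}
    ],
    "materials": [
        {
            "emissiveFactor": [1.0, 1.0, 1.0],
            "extensions": {
                "KHR_materials_emissive_strength": {"emissiveStrength": 5.0}
            },
        }
    ],
}


def extensions_in_use(doc: dict) -> set[str]:
    """Return the names of all extensions attached anywhere in the document."""
    found: set[str] = set()

    def walk(value) -> None:
        if isinstance(value, dict):
            attached = value.get("extensions")
            if isinstance(attached, dict):
                found.update(attached.keys())
            for key, child in value.items():
                if key != "extras":  # application-specific data, not extensions
                    walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)

    walk(doc)
    return found


def undeclared_extensions(doc: dict) -> set[str]:
    """Extensions that appear in the document but not in extensionsUsed."""
    return extensions_in_use(doc) - set(doc.get("extensionsUsed", []))
```

A check of this kind is one way a tool might flag an asset that uses an extension without declaring it; the official glTF Validator performs far more thorough checks.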
Various glTF extensions for game engine interoperability have been developed by the OMI group. These include extensions for physics shapes, physics bodies, physics joints, audio playback, seats, spawn points, and more. The VRM consortium has developed glTF extensions for advanced humanoid 3D avatars, including dynamic spring bones and toon materials. == Derivative formats == 3D Tiles, an OGC Community Standard, builds on glTF to add a spatial data structure, metadata, and declarative styling for streaming massive heterogeneous 3D geospatial datasets. VRM, a model format for VR, is built on the .glb format. It is a 3D humanoid avatar specification and file format. == Software ecosystem == Khronos maintains the glTF Sample Viewer for viewing glTF assets. Khronos also maintains the glTF Validator for validating whether 3D models conform to the glTF specification. Khronos maintains a glTF Compressor tool to interactively optimize and fine-tune compression settings for glTF assets using KTX 2.0 textures. glTF loaders are included in open-source WebGL engines including PlayCanvas, Three.js, Babylon.js, Cesium, PEX, xeogl, and A-Frame. The Godot game engine supports and recommends the glTF format, with both import and export support. Open-source converters to glTF are available for COLLADA, FBX, and OBJ. Assimp can import and export glTF. glTF files can also be directly exported from a variety of 3D editors, such as Blender, Unity (using the glTFast importer/exporter), FreeCAD, Vectary, Autodesk 3ds Max (natively or using the Verge3D exporter), Autodesk Maya (using the Babylon.js exporter), Autodesk Inventor, Modo, Houdini, Paint 3D, Godot, and Substance Painter. Open-source glTF utility libraries are available for programming languages including JavaScript, Node.js, C++, C#, Python, Haskell, Java, Go, Rust, Haxe, Ada, and TypeScript. Khronos keeps a list of these libraries and other related applications on their ecosystem site. 
The Khronos 3D Commerce Working Group released Asset Creation Guidelines in 2020 outlining best practices for use of the glTF file format in 3D Commerce. In 2025, the Working Group launched Asset Creation Guidelines 2.0, a continuously updated resource with additional guidance for geometry, mesh optimization, UV maps, textures, materials/PBR performance, and web optimization. The Khronos PBR Neutral Tone Mappers specification is a tone mapper designed to faithfully reproduce an object's base color, hue, and saturation when using PBR rendering under grayscale lighting, supporting brand- and product-accurate color representation. Khronos maintains the glTF Asset Auditor to allow retailers and advertising technology platforms to validate 3D assets against either a default Audit Profile modelled on the 2020 3D Commerce Asset Creation Guidelines or a custom profile defined by the target application.
Content strategy
Content strategy guides the planning, development, and management of content. It is a recognized field in user experience design, and it also draws from adjacent disciplines such as information architecture, content management, business analysis, digital marketing, and technical communication. == Definitions == Content strategy has been described as planning for "the creation, publication, and governance of useful, usable content." It has also been called "a repeatable system that defines the entire editorial content development process for a website development project." In a 2007 article titled "Content Strategy: The Philosophy of Data," Rachel Lovinger describes the goal of content strategy as using "words and data to create unambiguous content that supports meaningful, interactive experiences." Here, she also provided the analogy that "content strategy is to copywriting as information architecture is to design." She encourages content strategists and collaborators to engage in early discussions about content meaning, models, and tools, to make sure strategy is integrated from the start rather than as an afterthought. The Content Strategy Alliance combines Kevin Nichols' definition with Kristina Halvorson's and defines content strategy as "getting the right content to the right user at the right time through strategic planning of content creation, delivery, and governance." == Practitioners == Content strategists are often familiar with a wide range of approaches, techniques, and tools. The perspectives that content strategists bring also depend heavily on their professional training and education. For instance, some specialize in "front-end strategy," which includes developing personas, journey mapping the user experience, aligning business strategy and user needs, developing a brand strategy, exploring different channels, and creating style guidelines and search engine optimization (SEO) guidelines. 
Others specialize in "back-end strategy," which includes creating content models, planning taxonomies and metadata, structuring content management systems, and building systems to support content reuse. Both roles involve addressing workflow and governance issues. Many organizations and individuals tend to confuse content strategists with editors. However, content strategy is "about more than just the written word," according to Washington State University associate professor Brett Atwood. For example, Atwood indicates that a practitioner needs to also "consider how content might be re-distributed and/or re-purposed in other channels of delivery." It has also been proposed that the content strategist performs the role of a curator. Just as a museum curator sifts through a collection of content and identifies key pieces that can be juxtaposed against each other to create meaning and spur excitement, a content strategist "must approach a business’s content as a medium that needs to be strategically selected and placed to engage the audience, convey a message, and inspire action."
Web3D
Web3D, also called 3D Web, is a group of technologies to display and navigate websites using 3D computer graphics. These technologies enable applications such as online games, virtual reality experiences, interactive product demonstrations, and 3D data visualization directly within web browsers. The emergence of Web3D dates back to 1994, with the advent of VRML, a file format designed to store and display 3D graphical data on the World Wide Web. Modern Web3D is primarily powered by WebGL, a JavaScript API that enables hardware-accelerated 3D graphics rendering in web browsers without requiring plug-ins. == Pre-WebGL era == The emergence of Web3D dates back to 1994, with the advent of VRML, a file format designed to store and display 3D graphical data on the World Wide Web. In October 1995, at Internet World, Template Graphics Software demonstrated a 3D/VRML plug-in for the beta release of Netscape 2.0 by Netscape Communications. The Web3D Consortium was formed to further the collective development of the format. VRML and its successor, X3D, have been accepted as international standards by the International Organization for Standardization and the International Electrotechnical Commission. The main drawback of the technology was the requirement to use third-party browser plug-ins to perform 3D rendering, which slowed the adoption of the standard. Between 2000 and 2010, one of these plug-ins, Adobe Flash Player, was widely installed on desktop computers and was used to display interactive web pages and online games and to play video and audio content. Several Flash-based frameworks appeared that used software rendering and ActionScript 3 to perform 3D computations such as transformations, lighting, and texturing. Most notable among them were Papervision3D and Away3D. Eventually, Adobe developed Stage3D, an API for rendering interactive 3D graphics with GPU-acceleration for its Flash player and AIR products, which was adopted by software vendors. 
In 2009, an open-source 3D web technology called O3D was introduced by Google. It also required a browser plug-in but, unlike Flash/Stage3D, was based on a JavaScript API. O3D was geared not only toward games but also toward advertisements, 3D model viewers, product demos, simulations, engineering applications, and control and monitoring systems. == WebGL and glTF == WebGL (short for "Web Graphics Library") evolved out of the Canvas 3D experiments started by Vladimir Vukićević at Mozilla Foundation. Vukićević first demonstrated a Canvas 3D prototype in 2006. By the end of 2007, both Mozilla and Opera had made their own separate implementations. In early 2009, the nonprofit technology consortium Khronos Group started the WebGL Working Group, with initial participation from Apple, Google, Mozilla, Opera, and others. Version 1.0 of the WebGL specification was released in March 2011. Major advantages of the new technology include conformity with web standards and near-native 3D performance without the use of any browser plug-ins. Since WebGL is based on OpenGL ES, it works on mobile devices without any additional abstraction layers. On other platforms, WebGL implementations use ANGLE to translate OpenGL ES calls to DirectX, OpenGL, or Vulkan API calls. Notable WebGL frameworks include A-Frame, which uses HTML-based markup for building virtual reality experiences; PlayCanvas, an open-source engine paired with a proprietary cloud-hosted creation platform for building browser games; Three.js, an MIT-licensed framework with roots in the demoscene of the early 2000s; Unity, which obtained a WebGL back-end in version 5; and Verge3D, which integrates with Blender, 3ds Max, and Maya to create 3D web content. With the rapid adoption of WebGL, a new problem arose: the lack of a 3D file format optimized for the Web. This issue was addressed by glTF, a format that was conceived in 2012 by members of the COLLADA working group. 
At SIGGRAPH 2012, Khronos presented a demo of glTF, which was then called WebGL Transmissions Format (WebGL TF). On 19 October 2015, the glTF 1.0 specification was released. glTF 2.0 uses a physically based rendering material model proposed by Fraunhofer. Other upgrades include sparse accessors and morph targets for techniques such as facial animation, and schema tweaks and breaking changes for corner cases or performance, such as replacing top-level glTF object properties with arrays for faster index-based access. == Future == "WebGPU" is the working name for a potential web standard and JavaScript API for accelerated graphics and computing, aiming to provide "modern 3D graphics and computation capabilities". It is developed by the W3C "GPU for the Web" Community Group, with engineers from Apple, Mozilla, Microsoft, and Google, among others. WebGPU will not be based on any existing 3D API and will use Rust-like syntax for shaders.
Robot learning
Robot learning is a research field at the intersection of machine learning and robotics. It studies techniques that allow a robot to acquire novel skills or adapt to its environment through learning algorithms. The embodiment of the robot, situated in a physical embedding, presents both specific difficulties (e.g. high dimensionality, real-time constraints for collecting data and learning) and opportunities for guiding the learning process (e.g. sensorimotor synergies, motor primitives). Examples of skills targeted by learning algorithms include sensorimotor skills such as locomotion, grasping, and active object categorization; interactive skills such as joint manipulation of an object with a human peer; and linguistic skills such as the grounded and situated meaning of human language. Learning can happen either through autonomous self-exploration or through guidance from a human teacher, as in robot learning by imitation. Robot learning can be closely related to adaptive control, reinforcement learning, and developmental robotics, which considers the problem of autonomous lifelong acquisition of repertoires of skills. While machine learning is frequently used by computer vision algorithms employed in the context of robotics, these applications are usually not referred to as "robot learning". == Imitation learning == Many research groups are developing techniques in which robots learn by imitation. These include various techniques for learning from demonstration (sometimes also referred to as "programming by demonstration") and observational learning. == Sharing learned skills and knowledge == In Tellex's "Million Object Challenge", the goal is for robots to learn how to spot and handle simple items and to upload their data to the cloud so that other robots can analyze and use the information. RoboBrain is a knowledge engine for robots which can be freely accessed by any device wishing to carry out a task. 
The database gathers new information about tasks as robots perform them by searching the Internet, by interpreting natural language text, images, and videos, and through object recognition and interaction. The project is led by Ashutosh Saxena at Stanford University. RoboEarth is a project that has been described as a "World Wide Web for robots": it is a network and database repository where robots can share information and learn from each other, and a cloud for outsourcing heavy computation tasks. The project brings together researchers from five major universities in Germany, the Netherlands, and Spain and is backed by the European Union. Google Research, DeepMind, and Google X have decided to allow their robots to share their experiences. == Vision-language-action model == Research groups and companies are developing vision-language-action models, foundation models that allow robotic control through the combination of vision and language. Google DeepMind, Figure AI, and Hugging Face are among those actively working on such models.
Digital Cinema Package
A Digital Cinema Package (DCP) is a collection of digital files used to store and convey digital cinema (DC) audio, image, and data streams. The term was popularized by Digital Cinema Initiatives, LLC in its original recommendation for packaging DC contents. However, the industry tends to apply the term to the structure more formally known as the composition. A DCP is a container format for compositions, a hierarchical file structure that represents a title version. The DCP may carry a partial composition (i.e. not a complete set of files), a single complete composition, or multiple complete compositions. The composition consists of a Composition Playlist (in XML format) that defines the playback sequence of a set of Track Files. Track Files carry the essence (audio, image, subtitles), which is wrapped using Material eXchange Format (MXF). Each Track File must contain only one essence type. At least two track files must be present in every composition (see SMPTE ST 429-2 D-Cinema Packaging – DCP Constraints, or Cinepedia): a track file carrying picture essence and a track file carrying audio essence. The composition, consisting of a Composition Playlist (CPL) and associated track files, is distributed as a Digital Cinema Package (DCP). A composition is a complete representation of a title version, while the DCP need not carry a full composition. However, as already noted, it is commonplace in the industry to discuss the title in terms of a DCP, as that is the deliverable to the cinema. The Picture Track File essence is compressed using JPEG 2000, and the Audio Track File carries a 24-bit linear PCM uncompressed multichannel WAV file. Encryption may optionally be applied to the essence of a track file to protect it from unauthorized use. The encryption used is 128-bit AES in CBC mode. In practice, there are two versions of composition in use. The original version is called Interop DCP. 
In 2009, a specification was published by SMPTE (SMPTE ST 429-2 Digital Cinema Packaging – DCP Constraints) for what is commonly referred to as SMPTE DCP. SMPTE DCP is similar to Interop DCP but not backwards compatible with it, which has made the transition of the industry from Interop DCP to SMPTE DCP an uphill effort. SMPTE DCP requires significant constraints to ensure success in the field, as shown by ISDCF. While legacy support for Interop DCP is necessary for commercial products, new productions are encouraged to be distributed in SMPTE DCP. == Technical specifications == The DCP root folder (in the storage medium) contains a number of files, some used to store the image and audio contents, and others used to organize and manage the whole playlist. === Picture MXF files === Picture contents may be stored in one or more reels corresponding to one or more MXF files. Each reel contains pictures as MPEG-2 or JPEG 2000 essence, depending on the adopted codec. MPEG-2 is no longer compliant with the DCI specification; JPEG 2000 is the only accepted compression format. Supported frame rates are as follows. SMPTE (JPEG 2000): 24, 25, 30, 48, 50, and 60 fps at 2K; 24, 25, and 30 fps at 4K; 24 and 48 fps at 2K stereoscopic. MXF Interop (JPEG 2000), deprecated: 24 and 48 fps at 2K (MXF Interop can be encoded at 25 fps, but support is not guaranteed); 24 fps at 4K; 24 fps at 2K stereoscopic. MXF Interop (MPEG-2), deprecated: 23.976 and 24 fps at 1920 × 1080. Maximum frame sizes are 2048 × 1080 for 2K DC and 4096 × 2160 for 4K DC. Common formats are as follows. SMPTE (JPEG 2000): Flat (1998 × 1080 or 3996 × 2160), 1.85:1 aspect ratio; Scope (2048 × 858 or 4096 × 1716), ~2.39:1 aspect ratio; HDTV (1920 × 1080 or 3840 × 2160), 16:9 aspect ratio (~1.78:1; although not specifically defined in the DCI specification, this resolution is DCI compliant per section 8.4.3.2); Full (2048 × 1080 or 4096 × 2160), ~1.9:1 aspect ratio (the official DCI name is Full Container; not widely accepted in cinemas). MXF Interop (MPEG-2), deprecated: Full Frame (1920 × 1080). Other picture parameters: 12 bits per component precision (36 bits total per pixel); XYZ' colorspace, where the prime mark indicates gamma encoding (gamma = 2.6); maximum bit rate of 250 Mbit/s (1.3 MBytes per frame at 24 frames per second).
=== Sound MXF files === Sound contents are also stored in reels corresponding to picture reels in number and duration. In the case of multilingual features, separate reels are required to convey different languages. Each file contains linear PCM essence with the following properties: a sampling rate of 48,000 or 96,000 samples per second; a sample precision of 24 bits; linear mapping (no companding); up to 16 independent channels. === Asset map file === A list of all files included in the DCP, in XML format. === Composition playlist file === Defines the playback order during presentation. The order is saved in XML format in this file; each picture and sound reel is identified by its UUID. In the following example, a reel is composed of picture and sound: === Packing list file or package key list (PKL) === All files in the composition are hashed and their hashes are stored here, in XML format. This file is generally used during ingestion in a digital cinema server to verify whether data have been corrupted or tampered with in some way. For example, an MXF picture reel is identified by the following