AI News and Guides

Explore the best AI News and Guides — independent reviews, comparisons, pricing and step-by-step how-to guides, curated by Aizhi.

  • Realization (linguistics)

    Realization (linguistics)

    In linguistics, realization is the process by which some kind of surface representation is derived from its underlying representation; that is, the way in which some abstract object of linguistic analysis comes to be produced in actual language. Phonemes are often said to be realized by speech sounds. The different sounds that can realize a particular phoneme are called its allophones. Realization is also a subtask of natural language generation, which involves creating an actual text in a human language (English, French, etc.) from a syntactic representation. There are a number of software packages available for realization, most of which have been developed by academic research groups in NLG. The remainder of this article concerns realization of this kind. == Example == For example, the following Java code causes the simplenlg system [2] to print out the text The women do not smoke.: In this example, the computer program has specified the linguistic constituents of the sentence (verb, subject), and also linguistic features (plural subject, negated), and from this information the realiser has constructed the actual sentence. == Processing == Realisation involves three kinds of processing: Syntactic realisation: Using grammatical knowledge to choose inflections, add function words and also to decide the order of components. For example, in English the subject usually precedes the verb, and the negated form of smoke is do not smoke. Morphological realisation: Computing inflected forms, for example the plural form of woman is women (not womans). Orthographic realisation: Dealing with casing, punctuation, and formatting. For example, capitalising The because it is the first word of the sentence. The above examples are very basic, most realisers are capable of considerably more complex processing. == Systems == A number of realisers have been developed over the past 20 years. These systems differ in terms of complexity and sophistication of their processing, robustness in dealing with unusual cases, and whether they are accessed programmatically via an API or whether they take a textual representation of a syntactic structure as their input. There are also major differences in pragmatic factors such as documentation, support, licensing terms, speed and memory usage, etc. It is not possible to describe all realisers here, but a few of the emerging areas are: Simplenlg [3]: a document realizing engine with an api which intended to be simple to learn and use, focused on limiting scope to only finding the surface area of a document. KPML [4]: this is the oldest realiser, which has been under development under different guises since the 1980s. It comes with grammars for ten different languages. FUF/SURGE [5]: a realiser which was widely used in the 1990s, and is still used in some projects today OpenCCG [6]: an open-source realiser which has a number of nice features, such as the ability to use statistical language models to make realisation decisions.

    Read more →
  • Mike Vernal

    Mike Vernal

    Mike Vernal (born September 7, 1980) is an American business executive who is a venture capitalist at Conviction. He was previously an investor at Sequoia Capital in Silicon Valley and was one of the top executives at Facebook between 2008 and 2016. Prior to joining Sequoia Capital, he was Vice President of Search, Local, and Developer products at Facebook. == Career == Vernal joined Facebook in 2008. From 2009 to 2013, Vernal managed the Facebook Platform team and is credited with managing the Facebook Platform transition from desktop to mobile. During his time at Facebook, he served as vice president and was considered among the “top executives” who ran the company. In 2016, after eight years at Facebook, Vernal announced his plans to leave the company. In May 2016, he joined Sequoia Capital, a venture-capital firm specializing in technology startups. He is an early investor in Rippling, Clay, Notion and Statsig. In July 2023, The Information reported that Vernal was departing Sequoia. At Conviction, he has led investments in Listen Labs, OpenEvidence and Thinking Machines Lab.

    Read more →
  • Hyperion Data Center

    Hyperion Data Center

    The Richland Parish Data Center, nicknamed "Hyperion", is a planned artificial intelligence data center by Meta Platforms under-construction along Highway La. 183 in Richland Parish, Louisiana, just outside of Holly Ridge. It is one of a number of "titan clusters" being built in preparation for the emergence of AI superintelligence. Modern technological researchers disagree as to whether or not superintelligence will ever exist, though Meta CEO Mark Zuckerberg has expressed belief that its creation is inevitable. Current plans allot for the investment of $27 billion, as the structure is built from 2025 to 2030. == History == Meta was considering potential locations for their flagship data center in early 2024. Before being announced later in December, the plan was completely secret; meetings held between involved organisations and even government officials could only refer to it by the codename "Project Sucre" to protect it from potential corporate espionage. The data center was first announced on 04 December 2024, though its full scale was yet to be revealed. At first, Meta would not even claim responsibility for it, channelling all of its investments through the secret shell subsidiary Laidley LLC. We set out looking for a place where we could expand into gigawatts pretty quickly, and really get moving within that community on a large plot of land very quickly. We looked at finding very, very large contiguous plots of land that had access to the infrastructure that we need, the energy that we needed, and could move very, very quickly for us. The Louisiana-based Entergy Corporation, aiming for the facility to be built in its own backyard, negotiated a deal with the government of Louisiana to provide Meta with enormous tax breaks if they agreed to build Hyperion there. The Louisiana legislature responded by passing Act 730, which provides significant tax rebates on the purchase or lease of equipment for building and operating data centers. Meta found the arrangement acceptable, and bought a plot of land from the government. The government also had to further amend its laws to allow Meta to do this, as pre-existing policy forbade purchasing land directly from the government instead of hosting a public auction. The plot of land, originally called Franklin Farms, was purchased from the Franklin family in 2006 by the government, intending for it to be developed into an automotive manufacturing plant. Greater attention was brought to Hyperion it when Zuckerberg posted about the project on 14 July 2025 on Threads. The project subsequently caught media attention for its large size, as Zuckerberg's post portrayed the structure superimposed over Manhattan (pictured). The construction site spans 2,250 acres (9.1 km2) with a planned floor area of 4,000,000 square feet (371612 m2), making it the third largest building in the world by floor area upon completion. Meta initially reported the construction cost to be over $10 billion, but in October 2025, it announced a partnership with Blue Owl Capital providing for at least $27 billion. == Operation == The facility is expected to consume up to 5 gigawatts (GW) of computational power, more electricity than is currently used by the entire State of Louisiana. As part of their deal made with Meta, Entergy plans to be able to produce at least 3.8 GW of electricity for the operation. == Response to the project == Louisiana Governor Jeff Landry thanked Meta for their decision to build Hyperion in Louisiana, stating that it would "create opportunities for Louisiana workers to fill high-paying jobs of the future." and calling it "A New Chapter" for the state. The Louisiana Economic Development (LED) state agency further praised the project, citing Meta's estimate that it would create 1,500 jobs. Additionally, Richland Parish Supervisor Joey Evans stated that he was excited about the project. As part of their agreement with Meta, Energy announced their plan to increase electricity production state-wide. They say that this will result in the cost of energy reducing, though Entergy filings revealed in June 2025 that the cost of electricity would rise and be passed onto consumers. Meta also pledged to match all of Hyperion's power consumption with 100% environmentally friendly electricity production. So far, Entergy has begun building three gas-powered combined-cycle power plants and a substation in response to the project. Delta Community College announced in response to Hyperion's construction that it would expand its construction and trade programs. In January 2025, Business Facilities Magazine selected Hyperion for its annual Deal of the Year Platinum Award for 2024. Much of the initial backlash following Hyperion's announcement centered around the fast-tracked approval of the project by the state government, and scepticism around Meta's various claims (environmental friendliness, 100% renewable energy, local economic stimulation, price reductions). The Sierra Club criticised Meta for gentrifying the surrounding area, and was highly sceptical of their promise to keep it environmentally friendly. Environmental activist group Earthjustice attempted to have a subpoena of Meta approved to determine if they were compliant with environmental protection laws, though they were unsuccessful. Many residents of Holy Ridge have been critical of the construction, complaining about the increased construction vehicle traffic and intense gentrification. Another point of contention is Meta's continued reliance on out-of-state contractors in the facility's construction in spite of their previous commitment to "hire as many local folk as [we] possibly can." In spite of Entergy's continual denial that the facility's construction will not adversely affect the power grid, numerous electrical outages have been reported since construction began.

    Read more →
  • Innovation Center for Artificial Intelligence

    Innovation Center for Artificial Intelligence

    The Innovation Center for Artificial Intelligence (ICAI) is a Dutch national network focused on joint technology development between academia, industry and government in the area of artificial intelligence (AI). The initiative was launched in April 2018 and is based at Amsterdam Science Park. As of 2024, the director of the ICAI is Maarten de Rijke. In November 2018, ICAI announced its contribution to AINED, the first iteration of the Dutch National AI Strategy. In January 2023, Maastricht University announced the ROBUST program, led by the Innovation Center for Artificial Intelligence (ICAI) and supported by the University of Amsterdam and others. This initiative focuses on advancing research in trustworthy AI technology across various sectors, notably healthcare and energy, in the Netherlands. The program's plan includes the creation of 17 new labs and the appointment of PhD candidates, backed by a €25 million funding from the Dutch Research Council (NWO). == Labs == The ICAI network is linked to several collaborative labs: Thira Lab (Imaging): Thirona, Delft Imaging Systems and Radboud UMC, founded March 2019 AIMLab (AI for Medical Imaging): Uva and Inception Institute of Artificial Intelligence from the United Arab Emirates, founded March 2019 AFL (AI for Fintech): ING and Delft University of Technology, founded March 2019 Police Lab AI: Dutch National Police, founded January 2019 Elsevier AI Lab: Uva and Elsevier, founded October 2018 AIRLab Delft (AI for Retail Robotics): TU Delft Robotics and AholdDelhaize, founded November 2018 Quva Lab (Deep Vision): Uva and Qualcomm, founded 2016 (prior to ICAI) AIRLab Amsterdam (AI for Retail): Uva and AholdDelhaize, founded April 2018 DeltaLab (Deep Learning Technologies Amsterdam): Uva and Bosch, founded April 2017 (prior to ICAI) AI4SE (AI for Software Engineering Lab) Delft University of Technology and JetBrains, founded October 2023 Atlas Lab: Uva and TomTom (TOM2)

    Read more →
  • Web development

    Web development

    Web development is the process of designing, developing and maintaining websites and web apps. Web development encompasses several different fields, most commonly referring to the programming of websites. Front-end development is the act of developing the user interface and client-side code, while back-end development focuses on the infrastructure behind a website, mainly server-side code. Since the World Wide Web was released publicly in 1993, web development has evolved greatly, with websites changing from a collection of static HTML pages to complex projects using frameworks, servers, and databases. == Overview == Web development includes many individual tasks, including web design, web content development, networking, and coding. Among web professionals, "web development" usually refers to the main non-design aspects of building websites: writing markup and coding. Web development is generally split into two fields: front-end development and back-end development. Front-end developers create the user interface of websites, turning web designs into HTML, CSS, and JavaScript code. Front-end developers must also make sure that websites work consistently across different browsers and devices. Back-end development, also known as server-side development, focuses on the infrastructure behind a website, including APIs, database management, and security. Some choose to be full-stack developers, meaning they work on both the front-end and back-end. == History == The World Wide Web is often categorised into three generations: Web 1.0, Web 2.0, and Web 3.0 (or Web3). It was invented in 1989, and released to the public in 1993. In the early years of the web, restrospecitvely referred to as Web 1.0, websites were simply a collection of static HTML files, and had limited interactivity. After the introduction of JavaScript in 1995, websites could contain logic, allowing for interactivity. The following year CSS was released, allowing greater control over the styling of web pages. In 1999, the term Web 2.0 was coined by Darcy DiNucci. The term later resurfaced in the early 2000s, as websites started to increase in complexity, requiring server-side services in addition to JavaScript. This led to the emergence of various new programming languages and frameworks designed for backend services, such as PHP, Active Server Pages, and Jakarta Server Pages. This enabled websites to do additional server-side processing, such as accessing databases. Another shift in web development was the release of the iPhone in 2007. This created a new medium for accessing the web, requiring a new approach to web development, and resulting in responsive web design, which allows a single website to appear different depending on the device running it. Later, progressive web apps were introduced, allowing websites to be installed on a device as an independent application. In the 2010s, JavaScript frameworks began to emerge, creating new ways to manipulate web pages, and increasing compatibility between web browsers. JQuery was popular in the early 2010s, but was later surpassed by other frameworks such as React and Vue.js. In the mid 2020s, use of AI became prevalent among web developers, with the 2025 Stack Overflow survey showing over 80% of developers saying the use AI at least monthly in their development process.

    Read more →
  • Leading the Future

    Leading the Future

    Leading the Future is an American super PAC network focused on lobbying for policies friendly to the artificial intelligence industry. It was launched in 2025 with over $100 million from industry stakeholders including Andreessen Horowitz, OpenAI President Greg Brockman and Palantir co-founder Joe Lonsdale. The launch was preceded by talks between Collin McCune, head of government affairs at Andreessen Horowitz, and Chris Lehane, chief global affairs officer at OpenAI. Among the members of the network are the American Mission PAC, which supported Chris Gober, and the Think Big PAC, which targeted Alex Bores. Leading the Future is affiliated with the nonprofit Build American AI, which Axios describes as a dark money advocacy "offshoot" operating alongside the super PAC. NBC News states that the network’s efforts are modeled after the pro-cryptocurrency group Fairshake. Leading the Future is led by Zac Moffatt and Josh Vlasto, the latter of whom previously served as an advisor to Fairshake. In response to the creation of Leading the Future, former members of Congress Brad Carson and Chris Stewart co-founded the super PAC network Public First, aiming to counter the group’s influence. In April 2026, an investigation by Model Republic linked Leading the Future to The Wire By Acutus, an automated news website that allegedly used AI agents posing as human journalists to solicit interviews. The site's content was found to closely mirror the PAC's deregulatory policy goals while targeting researchers and advocates skeptical of rapid AI development. In May 2026, Wired revealed that Build American AI used a "dark money" campaign to pay TikTok and Instagram influencers $5,000 per video to promote scripted narratives framing Chinese AI as a "national security threat." According to internal documents and staff at the marketing agency managing the project, the campaign's explicit goal was to "subtly shift public debate" toward the deregulation of AI industries while intentionally avoiding technical discussions regarding AI quality or safety. During the 2026 primary season Leading the Future went on to endorse several candidates in both Democratic and Republican races with several of them going on to win.

    Read more →
  • Executive Order 14179

    Executive Order 14179

    Executive Order 14179, titled "Removing Barriers to American Leadership in Artificial Intelligence", is an executive order signed by Donald Trump, the 47th President of the United States, on January 23, 2025. The executive order aims to initiate the process of strengthening U.S. leadership in artificial intelligence, promote AI development free from ideological bias or social agendas, establish an action plan to maintain global AI dominance, and to revise or rescind policies that conflict with these goals. == Background == === Joe Biden === This executive order comes in response to the Executive Order 14110 titled Executive Order on Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence (sometimes referred to as "Executive Order on Artificial Intelligence") signed by Joe Biden on October 30, 2023. === Donald Trump === Donald Trump rescinded Executive Order 14110 on his first day in office with the Initial Rescissions of Harmful Executive Orders and Actions executive order. On January 23, 2025, Trump signed the Removing Barriers to American Leadership in Artificial Intelligence executive order as the replacement executive order covering the development of artificial intelligence technologies. == Provisions == It revokes existing AI policies and directives that are seen as barriers to U.S. AI innovation. It mandates the creation of an action plan within 180 days to sustain U.S. AI leadership, focusing on human flourishing, economic competitiveness, and national security. It requires the review of policies, directives, and regulations related to Executive Order 14110 (from October 2023) to identify actions that may conflict with the new policy goals. Agencies are instructed to suspend, revise, or rescind actions from the previous executive order that may be inconsistent with the new policy. The Office of Management and Budget (OMB) must revise certain memoranda (M-24-10 and M-24-18) within 60 days to align with the new policy. The order specifies that it does not create new enforceable rights or benefits and should be implemented within the boundaries of existing law and appropriations. == Implementation == The NITRD program, on behalf of the Office of Science and Technology Policy (OSTP), requested public input on the development of an AI Action Plan by March 15. == Reactions == Over 10,000 public comments were submitted in response to the OSTP request for public input. OpenAI submitted comments proposing a five-point strategy focused on regulatory preemption, export controls, copyright protections, infrastructure investment, and government adoption to ensure AI innovation, promote democratic AI globally, and protect national security. They emphasized the ability to learn from copyrighted material to maintain America's lead against China's state-controlled AI efforts like DeepSeek. Google submitted comments advocating for a three-pronged plan that invests in domestic AI development through energy infrastructure reform, balanced export controls, continued research funding, and coherent federal policies, while modernizing government AI adoption and promoting innovation-friendly approaches internationally. Both OpenAI and Google urged White House opposition to foreign copyright and transparency obligations, for example in the UK Government's preferred option in their Copyright and AI consultation.

    Read more →
  • OntoCAPE

    OntoCAPE

    OntoCAPE is a large-scale ontology for the domain of Computer-Aided Process Engineering (CAPE). It can be downloaded free of charge via the OntoCAPE Homepage. OntoCAPE is partitioned into 62 sub-ontologies, which can be used individually or as an integrated suite. The sub-ontologies are organized across different abstraction layers, which separate general knowledge from knowledge about particular domains and applications. The upper layers have the character of an upper ontology, covering general topics such as mereotopology, systems theory, quantities and units. The lower layers conceptualize the domain of chemical process engineering, covering domain-specific topics such as materials, chemical reactions, or unit operations.

    Read more →
  • Developmental robotics

    Developmental robotics

    Developmental robotics (DevRob), sometimes called epigenetic robotics, is a scientific field which aims at studying the developmental mechanisms, architectures and constraints that allow lifelong and open-ended learning of new skills and new knowledge in embodied machines. As in human children, learning is expected to be cumulative and of progressively increasing complexity, and to result from self-exploration of the world in combination with social interaction. The typical methodological approach consists in starting from theories of human and animal development elaborated in fields such as developmental psychology, neuroscience, developmental and evolutionary biology, and linguistics, then to formalize and implement them in robots, sometimes exploring extensions or variants of them. The experimentation of those models in robots allows researchers to confront them with reality, and as a consequence, developmental robotics also provides feedback and novel hypotheses on theories of human and animal development. Developmental robotics is related to but differs from evolutionary robotics (ER). ER uses populations of robots that evolve over time, whereas DevRob is interested in how the organization of a single robot's control system develops through experience, over time. DevRob is also related to work done in the domains of robotics and artificial life. == Background == Can a robot learn like a child? Can it learn a variety of new skills and new knowledge unspecified at design time and in a partially unknown and changing environment? How can it discover its body and its relationships with the physical and social environment? How can its cognitive capacities continuously develop without the intervention of an engineer once it is "out of the factory"? What can it learn through natural social interactions with humans? These are the questions at the center of developmental robotics. Alan Turing, as well as a number of other pioneers of cybernetics, already formulated those questions and the general approach in 1950, but it is only since the end of the 20th century that they began to be investigated systematically. Because the concept of adaptive intelligent machines is central to developmental robotics, it has relationships with fields such as artificial intelligence, machine learning, cognitive robotics or computational neuroscience. Yet, while it may reuse some of the techniques elaborated in these fields, it differs from them from many perspectives. It differs from classical artificial intelligence because it does not assume the capability of advanced symbolic reasoning and focuses on embodied and situated sensorimotor and social skills rather than on abstract symbolic problems. It differs from cognitive robotics because it focuses on the processes that allow the formation of cognitive capabilities rather than these capabilities themselves. It differs from computational neuroscience because it focuses on functional modeling of integrated architectures of development and learning. More generally, developmental robotics is uniquely characterized by the following three features: It targets task-independent architectures and learning mechanisms, i.e. the machine/robot has to be able to learn new tasks that are unknown by the engineer; It emphasizes open-ended development and lifelong learning, i.e. the capacity of an organism to acquire continuously novel skills. This should not be understood as a capacity for learning "anything" or even “everything”, but just that the set of skills that is acquired can be infinitely extended at least in some (not all) directions; The complexity of acquired knowledge and skills shall increase (and the increase be controlled) progressively. Developmental robotics emerged at the crossroads of several research communities including embodied artificial intelligence, enactive and dynamical systems cognitive science, connectionism. Starting from the essential idea that learning and development happen as the self-organized result of the dynamical interactions among brains, bodies and their physical and social environment, and trying to understand how this self-organization can be harnessed to provide task-independent lifelong learning of skills of increasing complexity, developmental robotics strongly interacts with fields such as developmental psychology, developmental and cognitive neuroscience, developmental biology (embryology), evolutionary biology, and cognitive linguistics. As many of the theories coming from these sciences are verbal and/or descriptive, this implies a crucial formalization and computational modeling activity in developmental robotics. These computational models are then not only used as ways to explore how to build more versatile and adaptive machines but also as a way to evaluate their coherence and possibly explore alternative explanations for understanding biological development. == Research directions == === Skill domains === Due to the general approach and methodology, developmental robotics projects typically focus on having robots develop the same types of skills as human infants. A first category that is important being investigated is the acquisition of sensorimotor skills. These include the discovery of one's own body, including its structure and dynamics such as hand-eye coordination, locomotion, and interaction with objects as well as tool use, with a particular focus on the discovery and learning of affordances. A second category of skills targeted by developmental robots are social and linguistic skills: the acquisition of simple social behavioural games such as turn-taking, coordinated interaction, lexicons, syntax and grammar, and the grounding of these linguistic skills into sensorimotor skills (sometimes referred as symbol grounding). In parallel, the acquisition of associated cognitive skills are being investigated such as the emergence of the self/non-self distinction, the development of attentional capabilities, of categorization systems and higher-level representations of affordances or social constructs, of the emergence of values, empathy, or theories of mind. === Mechanisms and constraints === The sensorimotor and social spaces in which humans and robot live are so large and complex that only a small part of potentially learnable skills can actually be explored and learnt within a life-time. Thus, mechanisms and constraints are necessary to guide developmental organisms in their development and control of the growth of complexity. There are several important families of these guiding mechanisms and constraints which are studied in developmental robotics, all inspired by human development: Motivational systems, generating internal reward signals that drive exploration and learning, which can be of two main types: extrinsic motivations push robots/organisms to maintain basic specific internal properties such as food and water level, physical integrity, or light (e.g. in phototropic systems); intrinsic motivations push robot to search for novelty, challenge, compression or learning progress per se, thus generating what is sometimes called curiosity-driven learning and exploration, or alternatively active learning and exploration; Social guidance: as humans learn a lot by interacting with their peers, developmental robotics investigates mechanisms that can allow robots to participate to human-like social interaction. By perceiving and interpreting social cues, this may allow robots both to learn from humans (through diverse means such as imitation, emulation, stimulus enhancement, demonstration, etc. ...) and to trigger natural human pedagogy. Thus, social acceptance of developmental robots is also investigated; Statistical inference biases and cumulative knowledge/skill reuse: biases characterizing both representations/encodings and inference mechanisms can typically allow considerable improvement of the efficiency of learning and are thus studied. Related to this, mechanisms allowing to infer new knowledge and acquire new skills by reusing previously learnt structures is also an essential field of study; The properties of embodiment, including geometry, materials, or innate motor primitives/synergies often encoded as dynamical systems, can considerably simplify the acquisition of sensorimotor or social skills, and is sometimes referred as morphological computation. The interaction of these constraints with other constraints is an important axis of investigation; Maturational constraints: In human infants, both the body and the neural system grow progressively, rather than being full-fledged already at birth. This implies, for example, that new degrees of freedom, as well as increases of the volume and resolution of available sensorimotor signals, may appear as learning and development unfold. Transposing these mechanisms in developmental robots, and understanding how it may hinder or on the contrary ease the acquisition of novel complex skills is a central questi

    Read more →
  • John Schulman

    John Schulman

    John Schulman (born 1987 or 1988) is an American artificial intelligence researcher and co-founder of OpenAI. In August 2024, he announced he would be joining Anthropic. In February 2025, he announced he was leaving to join Thinking Machines Lab, where he is chief scientist. == Early life and education == Schulman had an interest in science and math from a young age. He enjoyed science fiction, especially the work of Isaac Asimov. When he was in seventh grade, he became deeply interested in the television program BattleBots, which featured combat between remote-controlled robots. In what he said was his first self-directed study, he read extensively in subject areas that would help him design a superior robot, but the robot he and his friends worked on was never built. He attended Great Neck South High School. He was a member of the US Physics olympiad Team in 2005. In 2010, he graduated from Caltech with a degree in physics. He has a PhD in electrical engineering and computer sciences from the University of California, Berkeley, where he was advised by Pieter Abbeel. == Career == In December 2015, shortly before finishing his PhD, Schulman co-founded OpenAI with Sam Altman, Elon Musk, Ilya Sutskever, Greg Brockman, Trevor Blackwell, Vicki Cheung, Andrej Karpathy, Durk Kingma, Pamela Vagata, and Wojciech Zaremba, with Sam Altman and Elon Musk as the co-chairs. There, he led the reinforcement learning team that created ChatGPT. He has been referred to as the "architect" of ChatGPT. In August 2024, Schulman announced he would be joining Anthropic. He stated his move was to allow him to deepen his focus on AI alignment and return to more hands-on technical work. In February 2025, he announced he was leaving to join Thinking Machines Lab, where he is chief scientist. == Awards and honors == In 2025, Schulman received the Mark Bingham Award for Excellence in Achievement by Young Alumni from his alma mater, UC Berkeley.

    Read more →
  • SHRDLU

    SHRDLU

    SHRDLU is an early natural-language understanding computer program that was developed by Terry Winograd at MIT in 1968–1970. In the program, the user carries on a conversation with the computer, moving objects, naming collections and querying the state of a simplified "blocks world", essentially a virtual box filled with different blocks. SHRDLU was written in the Micro Planner and Lisp programming language on the DEC PDP-6 computer and a DEC graphics terminal. Later additions were made at the computer graphics labs at the University of Utah, adding a full 3D rendering of SHRDLU's "world". The name SHRDLU was derived from ETAOIN SHRDLU, the arrangement of the letter keys on a Linotype machine, arranged in descending order of usage frequency in English. == Functionality == SHRDLU is primarily a language parser that allows user interaction using English terms. The user instructs SHRDLU to move various objects around in the "blocks world" containing various basic objects such as blocks, cones and balls. SHRDLU combined four simple ideas to make the simulation of "understanding" more convincing. One was that SHRDLU's world is so simple that the entire set of objects and locations could be described by including as few as 50 words: nouns like "block" and "cone", verbs like "place on" and "move to", and adjectives like "big" and "blue". The possible combinations of these basic language building blocks are quite simple. SHRDLU also includes a basic memory to supply context. One could ask SHRDLU to "put the green cone on the red block" and then "take the cone off"; "the cone" would be taken to mean the green cone one had just talked about. SHRDLU can search back further through the interactions to find the proper context in most cases when additional adjectives were supplied. One could also ask questions about the history; for instance, one could ask "did you pick up anything before the cone?" A side effect of this memory, and the original rules SHRDLU was supplied with, is that the program can answer questions about what was possible in the world and what was not. For instance, SHRDLU can deduce that blocks could be stacked by looking for examples, but also realize that triangles could not be stacked, after having tried it. The "world" contains basic physics to make blocks fall over, independent of the language parser. Finally, SHRDLU can also remember names given to objects, or arrangements of them. For instance one could say "a steeple is a small triangle on top of a tall rectangle"; SHRDLU can then answer questions about steeples in the blocks world, and build new ones. This is an example of a SHRDLU session: Person: Pick up a big red block. Computer: OK. Person: Grasp the pyramid. Computer: I DON'T UNDERSTAND WHICH PYRAMID YOU MEAN. Person (changing their mind): Find a block which is taller than the one you are holding and put it into the box. Computer: BY "IT", I ASSUME YOU MEAN THE BLOCK WHICH IS TALLER THAN THE ONE I AM HOLDING. Computer: OK. Person: What does the box contain? Computer: THE BLUE PYRAMID AND THE BLUE BLOCK. Person: What is the pyramid supported by? Computer: THE BOX. Person: How many blocks are not in the box? Computer: FOUR OF THEM. Person: Is at least one of them narrower than the one which i told you to pick up? Computer: YES, THE RED CUBE. Person: Is it supported? Computer: YES, BY THE TABLE. Person: Can the table pick up blocks? Computer: NO. Person: Can a pyramid be supported by a block? Computer: YES. Person: Can a pyramid support a pyramid? Computer: I DON'T KNOW. Person: Stack up two pyramids. Computer: I CAN'T. Person: Will you please stack up both of the red blocks and either a green cube or a pyramid? Computer: OK. Person: Which cube is sitting on the table? Computer: THE LARGE GREEN ONE WHICH SUPPORTS THE RED PYRAMID. Person: Is there a large block behind a pyramid? Computer: YES, THREE OF THEM: A LARGE RED ONE, A LARGE GREEN CUBE, AND THE BLUE ONE. Person: Put a small one onto the green cube which supports a pyramid. Computer: OK. Person: Put the littlest pyramid on top of it. Computer: OK. Person: Does the shortest thing the tallest pyramid's support supports support anything green? Computer: YES, THE GREEN PYRAMID. Person: What color is it? Computer: BY "IT", I ASSUME YOU MEAN THE SHORTEST THING THE TALLEST PYRAMID'S SUPPORT SUPPORTS. Computer: RED Person: How many things are on top of green cubes? Computer: I'M NOT SURE WHAT YOU MEAN BY "ON TOP OF" IN THE PHRASE "ON TOP OF GREEN CUBES" . DO YOU MEAN: 1 - DIRECTLY ON THE SURFACE 2 - ANWHERE ON TOP OF ? Person: 2 Computer: THREE OF THEM. Person: Had you touched any pyramid before you put the green one on the little cube? Computer: YES, THE GREEN ONE. Person: When did you pick it up? Computer: WHILE I WAS STACKING UP THE RED CUBE, A LARGE RED BLOCK AND A LARGE GREEN CUBE. Person: Why? Computer: TO GET RID OF IT. Person: Why did you do that? Computer: TO CLEAN OFF THE RED CUBE. Person: Why did you clear off that cube? Computer: TO PUT IT ON A LARGE GREEN CUBE. Person: Why did you do that? Computer: TO STACK UP THE RED CUBE , A LARGE RED BLOCK AND A LARGE GREEN CUBE. Person: Why did you do that? Computer: BECAUSE YOU ASKED ME TO. Person: How did you do it? Computer: BY PUTTING A LARGE RED BLOCK ON THE TABLE ; THEN LETTING GO OF IT ; THEN PUTTING A LARGE GREEN CUBE ON IT ; THEN LETTING GO OF THAT CUBE ; THEN PUTTING THE RED CUBE ON THAT CUBE; THEN LETTING GO OF THAT CUBE. Person: How many objects did you touch while you were doing it? Computer: FOUR OF THEM. Person: What did the red cube support before you started to clean it off? Computer: THE GREEN PYRAMID. Person: There were five blocks to the left of the box then. Computer: NO, ONLY FOUR OF THEM: THE RED CUBE, TWO LARGE GREEN CUBES AND A LARGE RED BLOCK. Person: Put the blue pyramid on the block in the box. Computer: OK. Person: Is there anything which is bigger than every pyramid but is not as wide as the thing that supports it? Computer: YES, THE BLUE BLOCK. Person: Does a steeple Computer (interrupting): SORRY, I DON'T KNOW THE WORD "STEEPLE". Person: A "steeple" is a stack which contains two green cubes and a pyramid. Computer: I UNDERSTAND. Person: Are there any steeples now? Computer: NO. Person: Build one. Computer: OK. Person: Call the biggest block "superblock". Computer: OK. Person: Have you picked superblock up since we began? Computer: YES. Person: Why did you drop it? Computer: BY "IT", I ASSUME YOU MEAN THE SUPERBLOCK. I CAN'T EXPLAIN A NON-EXISTENT EVENT. Person: Is there anything to the right of the red pyramid? Computer: YES, FOUR THINGS: THE BOX, THE RED CUBE, THE BLUE PYRAMID AND THE BLOCK NAMED SUPERBLOCK Person: Thank you. Computer: YOU'RE WELCOME! == Consequences == SHRDLU was considered a tremendously successful demonstration of artificial intelligence (AI). This led other AI researchers to excessive optimism which was soon lost when later systems attempted to deal with situations with a more realistic level of ambiguity and complexity. Subsequent efforts of the SHRDLU type, such as Cyc, have tended to focus on providing the program with considerably more information from which it can draw conclusions. In a 1991 interview, Winograd said about SHRDLU: [...] the famous dialogue with SHRDLU where you could pick up a block, and so on, I very carefully worked through, line by line. If you sat down in front of it, and asked it a question that wasn't in the dialogue, there was some probability it would answer it. I mean, if it was reasonably close to one of the questions that was there in form and in content, it would probably get it. But there was no attempt to get it to the point where you could actually hand it to somebody and they could use it to move blocks around. And there was no pressure for that whatsoever. Pressure was for something you could demo. Take a recent example, Negroponte's Media Lab, where instead of "perish or publish" it's "demo or die." I think that's a problem. I think AI suffered from that a lot, because it led to "Potemkin villages", things which - for the things they actually did in the demo looked good, but when you looked behind that there wasn't enough structure to make it really work more generally. Though not intentionally developed as such, SHRDLU is considered the first known formal example of interactive fiction, as the user interacts with simple commands to move objects around a virtual environment, though lacking the distinct story-telling normally present in the interactive fiction genre. The 1976-1977 game Colossal Cave Adventure is broadly considered to be the first true work of interactive fiction.

    Read more →
  • BabelNet

    BabelNet

    BabelNet is a multilingual lexical-semantic knowledge graph, ontology and encyclopedic dictionary developed at the NLP group of the Sapienza University of Rome under the supervision of Roberto Navigli. BabelNet was automatically created by linking Wikipedia to the most popular computational lexicon of the English language, WordNet. The integration is done using an automatic mapping and by filling in lexical gaps in resource-poor languages by using statistical machine translation. The result is an encyclopedic dictionary that provides concepts and named entities lexicalized in many languages and connected with large amounts of semantic relations. Additional lexicalizations and definitions are added by linking to free-license wordnets, OmegaWiki, the English Wiktionary, Wikidata, FrameNet, VerbNet and others. Similarly to WordNet, BabelNet groups words in different languages into sets of synonyms, called Babel synsets. For each Babel synset, BabelNet provides short definitions (called glosses) in many languages harvested from both WordNet and Wikipedia. == Statistics of BabelNet == As of December 2023, BabelNet (version 5.3) covers 600 languages. It contains almost 23 million synsets and around 1.7 billion word senses (regardless of their language). Each Babel synset contains 2 synonyms per language, i.e., word senses, on average. The semantic network includes all the lexico-semantic relations from WordNet (hypernymy and hyponymy, meronymy and holonymy, antonymy and synonymy, etc., totaling around 364,000 relation edges) as well as an underspecified relatedness relation from Wikipedia (totaling around 1.9 billion edges). Version 5.3 also associates around 61 million images with Babel synsets and provides a Lemon RDF encoding of the resource, available via a SPARQL endpoint. 2.67 million synsets are assigned domain labels. == Applications == BabelNet has been shown to enable multilingual natural language processing applications. The lexicalized knowledge available in BabelNet has been shown to obtain state-of-the-art results in: Semantic relatedness, Multilingual word-sense disambiguation and entity linking, with the Babelfy system, Video games with a purpose. == Prizes and acknowledgments == BabelNet received the META prize 2015 for "groundbreaking work in overcoming language barriers through a multilingual lexicalised semantic network and ontology making use of heterogeneous data sources". The Artificial Intelligence Journal paper that describes BabelNet won the Prominent Paper Award in 2017. BabelNet featured prominently in a Time magazine article about the new age of innovative and up-to-date lexical knowledge resources available on the Web.

    Read more →
  • Non-separable wavelet

    Non-separable wavelet

    Non-separable wavelets are multi-dimensional wavelets that are not directly implemented as tensor products of wavelets on some lower-dimensional space. They have been studied since 1992. They offer a few important advantages. Notably, using non-separable filters leads to more parameters in design, and consequently better filters. The main difference, when compared to the one-dimensional wavelets, is that multi-dimensional sampling requires the use of lattices (e.g., the quincunx lattice). The wavelet filters themselves can be separable or non-separable regardless of the sampling lattice. Thus, in some cases, the non-separable wavelets can be implemented in a separable fashion. Unlike separable wavelet, the non-separable wavelets are capable of detecting structures that are not only horizontal, vertical or diagonal (show less anisotropy). == Examples == Red-black wavelets Contourlets Shearlets Directionlets Steerable pyramids Non-separable schemes for tensor-product wavelets

    Read more →
  • Tag (metadata)

    Tag (metadata)

    In information systems, a tag is a keyword or term assigned to a piece of information (such as an Internet bookmark, multimedia, database record, or computer file). This kind of metadata helps describe an item and allows it to be found again by browsing or searching. Tags are generally chosen informally and personally by the item's creator or by its viewer, depending on the system, although they may also be chosen from a controlled vocabulary. Tagging was popularized by websites associated with Web 2.0 and is an important feature of many Web 2.0 services. It is now also part of other database systems, desktop applications, and operating systems. == Overview == People use tags to aid classification, mark ownership, note boundaries, and indicate online identity. Tags may take the form of words, images, or other identifying marks. An analogous example of tags in the physical world is museum object tagging. People were using textual keywords to classify information and objects long before computers. Computer based search algorithms made the use of such keywords a rapid way of exploring records. Tagging gained popularity due to the growth of social bookmarking, image sharing, and social networking websites. These sites allow users to create and manage labels (or "tags") that categorize content using simple keywords. Websites that include tags often display collections of tags as tag clouds, as do some desktop applications. On websites that aggregate the tags of all users, an individual user's tags can be useful both to them and to the larger community of the website's users. Tagging systems have sometimes been classified into two kinds: top-down and bottom-up. Top-down taxonomies are created by an authorized group of designers (sometimes in the form of a controlled vocabulary), whereas bottom-up taxonomies (called folksonomies) are created by all users. This definition of "top down" and "bottom up" should not be confused with the distinction between a single hierarchical tree structure (in which there is one correct way to classify each item) versus multiple non-hierarchical sets (in which there are multiple ways to classify an item); the structure of both top-down and bottom-up taxonomies may be either hierarchical, non-hierarchical, or a combination of both. Some researchers and applications have experimented with combining hierarchical and non-hierarchical tagging to aid in information retrieval. Others are combining top-down and bottom-up tagging, including in some large library catalogs (OPACs) such as WorldCat. When tags or other taxonomies have further properties (or semantics) such as relationships and attributes, they constitute an ontology. In folder system a file cannot exist in two or more folders so tag system has been thought more convenient. But transitioning to tag system requires awareness of difference between properties of two systems. In folder system the information of classification is put outside of the file and we can change folder at once. In tag system the information of classification is put inside the file so changing its tag means changing the file and it needs to be saved again and takes time. Metadata tags as described in this article should not be confused with the use of the word "tag" in some software to refer to an automatically generated cross-reference; examples of the latter are tags tables in Emacs and smart tags in Microsoft Office. == History == The use of keywords as part of an identification and classification system long predates computers. Paper data storage devices, notably edge-notched cards, that permitted classification and sorting by multiple criteria were already in use prior to the twentieth century, and faceted classification has been used by libraries since the 1930s. In the late 1970s and early 1980s, Emacs, the text editor for Unix systems, offered a companion software program called Tags that could automatically build a table of cross-references called a tags table that Emacs could use to jump between a function call and that function's definition. This use of the word "tag" did not refer to metadata tags, but was an early use of the word "tag" in software to refer to a word index. Online databases and early websites deployed keyword tags as a way for publishers to help users find content. In the early days of the World Wide Web, the keywords meta element was used by web designers to tell web search engines what the web page was about, but these keywords were only visible in a web page's source code and were not modifiable by users. In 1997, the collaborative portal "A Description of the Equator and Some ØtherLands" produced by documenta X, Germany, used the folksonomic term Tag for its co-authors and guest authors on its Upload page. In "The Equator" the term Tag for user-input was described as an abstract literal or keyword to aid the user. However, users defined singular Tags, and did not share Tags at that point. In 2003, the social bookmarking website Delicious provided a way for its users to add "tags" to their bookmarks (as a way to help find them later); Delicious also provided browseable aggregated views of the bookmarks of all users featuring a particular tag. Within a couple of years, the photo sharing website Flickr allowed its users to add their own text tags to each of their pictures, constructing flexible and easy metadata that made the pictures highly searchable. The success of Flickr and the influence of Delicious popularized the concept, and other social software websites—such as YouTube, Technorati, and Last.fm—also implemented tagging. In 2005, the Atom web syndication standard provided a "category" element for inserting subject categories into web feeds, and in 2007 Tim Bray proposed a "tag" URN. == Examples == === Within a blog === Many systems (and other web content management systems) allow authors to add free-form tags to a post, along with (or instead of) placing the post into a predetermined category. For example, a post may display that it has been tagged with baseball and tickets. Each of those tags is usually a web link leading to an index page listing all of the posts associated with that tag. The blog may have a sidebar listing all the tags in use on that blog, with each tag leading to an index page. To reclassify a post, an author edits its list of tags. All connections between posts are automatically tracked and updated by the blog software; there is no need to relocate the page within a complex hierarchy of categories. === Within application software === Some desktop applications and web applications feature their own tagging systems, such as email tagging in Gmail and Mozilla Thunderbird, bookmark tagging in Firefox, audio tagging in iTunes or Winamp, and photo tagging in various applications. Some of these applications display collections of tags as tag clouds. === Assigned to computer files === There are various systems for applying tags to the files in a computer's file system. In Apple's Mac System 7, released in 1991, users could assign one of seven editable colored labels (with editable names such as "Essential", "Hot", and "In Progress") to each file and folder. In later iterations of the Mac operating system ever since OS X 10.9 was released in 2013, users could assign multiple arbitrary tags as extended file attributes to any file or folder, and before that time the open-source OpenMeta standard provided similar tagging functionality for Mac OS X. Several semantic file systems that implement tags are available for the Linux kernel, including Tagsistant. Microsoft Windows allows users to set tags only on Microsoft Office documents and some kinds of picture files. Cross-platform file tagging standards include Extensible Metadata Platform (XMP), an ISO standard for embedding metadata into popular image, video and document file formats, such as JPEG and PDF, without breaking their readability by applications that do not support XMP. XMP largely supersedes the earlier IPTC Information Interchange Model. Exif is a standard that specifies the image and audio file formats used by digital cameras, including some metadata tags. TagSpaces is an open-source cross-platform application for tagging files; it inserts tags into the filename. === For an event === An official tag is a keyword adopted by events and conferences for participants to use in their web publications, such as blog entries, photos of the event, and presentation slides. Search engines can then index them to make relevant materials related to the event searchable in a uniform way. In this case, the tag is part of a controlled vocabulary. === In research === A researcher may work with a large collection of items (e.g. press quotes, a bibliography, images) in digital form. If he/she wishes to associate each with a small number of themes (e.g. to chapters of a book, or to sub-themes of the overall subject), then a group of tags for these themes can be attached to each of the items in

    Read more →
  • Retrieval-based Voice Conversion

    Retrieval-based Voice Conversion

    Retrieval-based Voice Conversion (RVC) is an open source voice conversion AI algorithm that enables realistic speech-to-speech transformations, accurately preserving the intonation and audio characteristics of the original speaker. == Overview == In contrast to text-to-speech systems such as ElevenLabs, RVC differs by providing speech-to-speech outputs instead. It maintains the modulation, timbre and vocal attributes of the original speaker, making it suitable for applications where emotional tone is crucial. The algorithm enables both pre-processed and real-time voice conversion with low latency. This real-time capability marks a significant advancement over previous AI voice conversion technologies, such as So-vits SVC. Its speed and accuracy have led many to note that its generated voices sound near-indistinguishable from "real life", provided that sufficient computational specifications and resources (e.g., a powerful GPU and ample RAM) are available when running it locally and that a high-quality voice model is used. == Technical foundation == Retrieval-based Voice Conversion (RVC) utilizes a hybrid approach that integrates feature extraction with retrieval-based synthesis. Instead of directly mapping source speaker features to the target speaker using statistical models, RVC retrieves relevant segments from a target speech database, aiming to enhance the naturalness and speaker fidelity of the converted speech. At a high level, the RVC system typically comprises three main components: (1) a content feature extractor, such as a phonetic posteriorgram (PPG) encoder or self-supervised models like HuBERT; (2) a vector retrieval module that searches a target voice database for the most similar speech units; and (3) a vocoder or neural decoder that synthesizes waveform output from the retrieved representations. The retrieval-based paradigm aims to mitigate the oversmoothing effect commonly observed in fully neural sequence-to-sequence models, potentially leading to more expressive and natural-sounding speech. Furthermore, with the incorporation of high-dimensional embeddings and k-nearest-neighbor search algorithms, the model can perform efficient matching across large-scale databases without significant computational overhead. Recent RVC frameworks have incorporated adversarial learning strategies and GAN-based vocoders, such as HiFi-GAN, to enhance synthesis quality. These integrations have been shown to produce clearer harmonics and reduce reconstruction errors. == Research developments == Research on RVC has recently explored the use of self-supervised learning (SSL) encoders such as wav2vec 2.0 and HuBERT to replace hand-engineered features like MFCCs. These encoders improve content preservation, especially when source and target speakers have dissimilar speaking styles or accents. Moreover, modern RVC models leverage vector quantization methods to discretize the acoustic space, improving both synthesis accuracy and generalization across unseen speakers. For example, retrieval-augmented VQ models can condition the synthesis stage on quantized speech tokens, which enhances controllability and style transfer. Despite its strengths, RVC still faces limitations related to database coverage, especially in real-time or few-shot settings. Inadequate diversity in the target voice corpus may lead to suboptimal retrieval or unnatural prosody. These advances demonstrate the viability of RVC as a strong alternative to conventional deep learning VC systems, balancing both flexibility and efficiency in diverse voice synthesis applications. == Training process == The training pipeline for retrieval-based voice conversion typically includes a preprocessing step where the target speaker's dataset is segmented and normalized. A pitch extractor such as librosa or DDSP-DDC may be used to obtain fundamental frequency (F0) features. During training, the model learns to map content features from the source speaker to the acoustic representation of the target speaker while maintaining pitch and prosody. The training objective often combines reconstruction loss with feature consistency loss across intermediate layers, and may incorporate cycle consistency loss to preserve speaker identity. Fine-tuning on small datasets is feasible due to the use of pre-trained models, particularly for the SSL encoder and content extractor components. This approach allows transfer learning to be applied effectively, enabling the model to converge faster and generalize better to unseen inputs. Most open implementations support batch training, gradient accumulation, and mixed-precision acceleration (e.g., FP16), especially when utilizing NVIDIA CUDA-enabled GPUs. == Real-time deployment == RVC systems can be deployed in real-time scenarios through WebUI interfaces and streaming audio frameworks. Optimizations include converting the inference graph to ONNX or TensorRT formats, reducing latency. Audio buffers are typically processed in chunks of 0.2–0.5 seconds to ensure minimal delay and seamless conversion. Cross-platform compatibility with tools such as OBS Studio and Voicemeeter enables integration into live streaming, video production, or virtual avatar environments. == Applications and concerns == The technology enables voice changing and mimicry, allowing users to create accurate models of others using only a negligible amount of minutes of clear audio samples. These voice models can be saved as .pth (PyTorch) files. While this capability facilitates numerous creative applications, it has also raised concerns about potential misuse as deepfake software for identity theft and malicious impersonation through voice calls. == Ethical and legal considerations == As with other deep generative models, the rise of RVC technology has led to increasing debate about copyright, consent, and authorship. While some jurisdictions may allow parody or fair use in creative contexts, impersonating living individuals without permission may infringe upon privacy and likeness rights. As a result, some platforms have begun issuing takedown notices against AI-generated voice content that closely mimics celebrities or musicians. === In pop culture === RVC inference has been used to create realistic depictions of song covers, such as replacing original vocals with characters like Twilight Sparkle and Mordecai to have them sing duets of popular music like "Airplanes" and "Somebody That I Used to Know." These AI-generated covers, which can sound strikingly similar to the voice imitated, have gained popularity on platforms like YouTube as humorous memes.

    Read more →