AI Generator Outfit

AI Generator Outfit — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Open Data Center Alliance

    Open Data Center Alliance

    opendatacenteralliance.org appears to have been closed down. The Open Data Center Alliance is an independent organization created in Oct. 2010 with the assistance of Intel to coordinate the development of standards for cloud computing. Approximately 100 companies, which account for more than $50bn of IT spending, have joined the Alliance, including BMW, Royal Dutch Shell and Marriott Hotels. "The Alliance's Cloud 2015 vision is aimed at creating a federated cloud where common standards will be laid down for those in the hardware and software arena." == Usage Model Roadmap == The organization sees a growing need for solutions developed in an open, industry-standard and multivendor fashion, and has thus created a usage model roadmap featuring 19 prioritized usage models. The usage models provide detailed requirements for data center and cloud solutions, and will include detailed technical documentation discussing the requirements for technology deployments. To further its roadmap development, the steering committee established five initial technical workgroups in the areas of infrastructure, management, regulation & ecosystem, security and services. The organization delivered a 0.50 usage model roadmap to Open Data Center Alliance technical workgroups in Oct. 2010, and delivered a full 1.0 roadmap for public use in June 2011. == Membership == The steering committee consists of BMW, Capgemini, China Life, China Unicom Group, Deutsche Bank, JPMorgan Chase, Lockheed Martin, Marriott International, Inc., National Australia Bank, Royal Dutch Shell, Terremark and UBS. Other members include AT&T, CERN, eBay, Logica, Motorola Mobility Inc. and Nokia. "The demands on the IT organisations are coming at such an alarming rate that there are many, many different solutions being developed today that maybe don't work with each other. We need one voice, one road map, so that companies are able to say to manufacturers here is a clear vision of what they should be developing their product to do." says Marvin Wheeler, of Terremark, chairman of the Alliance. "While it's unclear how successful this alliance will be, it is at least shedding the spotlight on cloud interoperability, a big emerging issue," said Larry Dignan of ZDNet.

    Read more →
  • Mittens (chess)

    Mittens (chess)

    Mittens is a chess engine developed by Chess.com. It was released on January 1, 2023, alongside four other engines, all of them given cat-related names. The engine became a viral sensation in the chess community due to exposure through content made by chess streamers and a social media marketing campaign, later contributing to record levels of traffic to the Chess.com website and causing issues with database scalability. Mittens was given a rating of one point by Chess.com, although it was evidently stronger than that. Various chess masters played matches against the engine, with players such as Hikaru Nakamura and Levy Rozman drawing and losing their games respectively. A month after its release, Mittens was removed from the website on February 1, as expected through Chess.com's monthly bot cycles. In December 2023, Mittens was brought back in a group of Chess.com's most popular bots of 2023. In January 2024, Mittens was removed again. == Release == Mittens was released on January 1, 2023, as part of a New Year event on Chess.com. It was one of five engines released, all with names related to cats. The other engines released were named Scaredy Cat, rated 800; Angry Cat, rated 1000; Mr. Grumpers, rated 1200 and Catspurrov (a pun on Garry Kasparov), rated 1400. As part of the announcement, a picture of each engine was accompanied by a short description of its character. The description given for Mittens suggested that the engine was hiding something, reading: Mittens likes chess… But how good is she? Of the five engines released, Mittens was by far the most popular. In December 2023, Chess.com re-released Mittens as part of a "best of 2023" group of chess bots made to showcase their most popular bots of the year. == Design == Mittens was conceptualized by Chess.com employee Will Whalen. Appearing as a kitten, Mittens trash talked its opponents with a selection of voice lines: these lines included quotes from J. Robert Oppenheimer, Vincent van Gogh and Friedrich Nietzsche, as well as the 1967 film Le Samouraï. The engine's "personality" was devised by a writing team headed by Sean Becker, and Marija Casic provided the engine's graphics. Chess.com did not disclose any information about the software running the engine. It may be based on Chess.com's Komodo Dragon 3 engine. Mittens' strategy was to slowly grind down an opponent, a tactic likened to the playing style of Anatoly Karpov. Becker stated that the design team believed it would be "way more demoralizing and funny" for the engine to play this way. According to Hikaru Nakamura, Mittens sometimes missed the best move (or winning positions). == Rating == On Chess.com, Mittens had a rating of one point. However, the engine's playing style and tactics showed that it was stronger than that; Mittens was able to beat or draw against many top human players. In an interview with CNN Business, Whalen stated that the idea behind giving Mittens a rating of one was to surprise its opponents, giving it the upper hand psychologically. Estimates of Mittens' true rating range from an Elo of 3200 to 3500, because of its ability to beat other engines of around that level. An upper bound of the engine's rating was found after Levy Rozman made Mittens play against Stockfish 15, a 3700 rated engine. Mittens lost the two games that the engines played. The range of Mittens' possible ratings was summarized by Dot Esports, who stated: It seems like she’s around the 3200–3500 rating range (in Chess.com terms, where the best human players, like Magnus Carlsen and Hikaru Nakamura, sport a 3000–3100 rating in the faster formats), as evidenced by her victories over the site’s otherwise strongest, 3200-rated bots, and her defeat to Stockfish 15, which is currently rated around 3700. == Games == Against human players, Mittens won over 99 percent of the millions of games it played. Chess players such as Hikaru Nakamura, Benjamin Bok, Levy Rozman and Eric Rosen struggled against Mittens; while Rozman and Rosen both lost against the engine, Nakamura and Bok were both able to make a draw. In particular, Nakamura's game against the engine lasted 166 moves; he was playing as White. Bok, Benjamin Finegold and Rozman later went on to win against Mittens, the latter with engine assistance from Stockfish. Magnus Carlsen publicly refused to play the engine, calling it a "transparent marketing trick" and "a soulless computer". Against other chess engines, Mittens participated in the Chess.com Computer Chess Championship as a side act. In the competition, Mittens played 150 games against an engine named after the film M3GAN and won overall with a score of 81.5 to 68.5. This equated to 54 percent of the games played. During the event, an estimate of Mittens' rating was made at 3515 points. == Impact == Mittens went viral in the chess community due to its concept and design: according to an announcement by Chess.com, a combined total of 120 million games were played against the cat engines over the course of January, with around 40 million played against Mittens. The popularity of the engine was helped by the social media exposure created by Chess.com. This included creating an official Twitter account to promote the engine. Chess streamers like Rozman and Nakamura helped cultivate this by creating content around the engine. A video by Nakamura entitled "Mittens the chess bot will make you quit chess" gained over 3.5 million views on YouTube. On January 11, Chess.com reported issues with database scalability due to record levels of traffic: 40 percent more games had been played on Chess.com in January 2023 than any other month since the website's release. According to The Wall Street Journal, the popularity spike was more than the similar surge following the release of Netflix's The Queen's Gambit. The popularity of Mittens was cited by Chess.com as a reason for this instability. The problems continued throughout January; Chess.com stated that they would have to upgrade their servers and invest more in cloud computing to solve the problems caused by the website's popularity surge. On February 1, 2023, Mittens and the other cat engines were removed from the computer section of Chess.com. They were replaced with five new engines themed around artificial intelligence. A tweet was posted on the Mittens's Twitter account after the engine's removal, reading "This is just the beginning. Goodbye for now."

    Read more →
  • SHRDLU

    SHRDLU

    SHRDLU is an early natural-language understanding computer program that was developed by Terry Winograd at MIT in 1968–1970. In the program, the user carries on a conversation with the computer, moving objects, naming collections and querying the state of a simplified "blocks world", essentially a virtual box filled with different blocks. SHRDLU was written in the Micro Planner and Lisp programming language on the DEC PDP-6 computer and a DEC graphics terminal. Later additions were made at the computer graphics labs at the University of Utah, adding a full 3D rendering of SHRDLU's "world". The name SHRDLU was derived from ETAOIN SHRDLU, the arrangement of the letter keys on a Linotype machine, arranged in descending order of usage frequency in English. == Functionality == SHRDLU is primarily a language parser that allows user interaction using English terms. The user instructs SHRDLU to move various objects around in the "blocks world" containing various basic objects such as blocks, cones and balls. SHRDLU combined four simple ideas to make the simulation of "understanding" more convincing. One was that SHRDLU's world is so simple that the entire set of objects and locations could be described by including as few as 50 words: nouns like "block" and "cone", verbs like "place on" and "move to", and adjectives like "big" and "blue". The possible combinations of these basic language building blocks are quite simple. SHRDLU also includes a basic memory to supply context. One could ask SHRDLU to "put the green cone on the red block" and then "take the cone off"; "the cone" would be taken to mean the green cone one had just talked about. SHRDLU can search back further through the interactions to find the proper context in most cases when additional adjectives were supplied. One could also ask questions about the history; for instance, one could ask "did you pick up anything before the cone?" A side effect of this memory, and the original rules SHRDLU was supplied with, is that the program can answer questions about what was possible in the world and what was not. For instance, SHRDLU can deduce that blocks could be stacked by looking for examples, but also realize that triangles could not be stacked, after having tried it. The "world" contains basic physics to make blocks fall over, independent of the language parser. Finally, SHRDLU can also remember names given to objects, or arrangements of them. For instance one could say "a steeple is a small triangle on top of a tall rectangle"; SHRDLU can then answer questions about steeples in the blocks world, and build new ones. This is an example of a SHRDLU session: Person: Pick up a big red block. Computer: OK. Person: Grasp the pyramid. Computer: I DON'T UNDERSTAND WHICH PYRAMID YOU MEAN. Person (changing their mind): Find a block which is taller than the one you are holding and put it into the box. Computer: BY "IT", I ASSUME YOU MEAN THE BLOCK WHICH IS TALLER THAN THE ONE I AM HOLDING. Computer: OK. Person: What does the box contain? Computer: THE BLUE PYRAMID AND THE BLUE BLOCK. Person: What is the pyramid supported by? Computer: THE BOX. Person: How many blocks are not in the box? Computer: FOUR OF THEM. Person: Is at least one of them narrower than the one which i told you to pick up? Computer: YES, THE RED CUBE. Person: Is it supported? Computer: YES, BY THE TABLE. Person: Can the table pick up blocks? Computer: NO. Person: Can a pyramid be supported by a block? Computer: YES. Person: Can a pyramid support a pyramid? Computer: I DON'T KNOW. Person: Stack up two pyramids. Computer: I CAN'T. Person: Will you please stack up both of the red blocks and either a green cube or a pyramid? Computer: OK. Person: Which cube is sitting on the table? Computer: THE LARGE GREEN ONE WHICH SUPPORTS THE RED PYRAMID. Person: Is there a large block behind a pyramid? Computer: YES, THREE OF THEM: A LARGE RED ONE, A LARGE GREEN CUBE, AND THE BLUE ONE. Person: Put a small one onto the green cube which supports a pyramid. Computer: OK. Person: Put the littlest pyramid on top of it. Computer: OK. Person: Does the shortest thing the tallest pyramid's support supports support anything green? Computer: YES, THE GREEN PYRAMID. Person: What color is it? Computer: BY "IT", I ASSUME YOU MEAN THE SHORTEST THING THE TALLEST PYRAMID'S SUPPORT SUPPORTS. Computer: RED Person: How many things are on top of green cubes? Computer: I'M NOT SURE WHAT YOU MEAN BY "ON TOP OF" IN THE PHRASE "ON TOP OF GREEN CUBES" . DO YOU MEAN: 1 - DIRECTLY ON THE SURFACE 2 - ANWHERE ON TOP OF ? Person: 2 Computer: THREE OF THEM. Person: Had you touched any pyramid before you put the green one on the little cube? Computer: YES, THE GREEN ONE. Person: When did you pick it up? Computer: WHILE I WAS STACKING UP THE RED CUBE, A LARGE RED BLOCK AND A LARGE GREEN CUBE. Person: Why? Computer: TO GET RID OF IT. Person: Why did you do that? Computer: TO CLEAN OFF THE RED CUBE. Person: Why did you clear off that cube? Computer: TO PUT IT ON A LARGE GREEN CUBE. Person: Why did you do that? Computer: TO STACK UP THE RED CUBE , A LARGE RED BLOCK AND A LARGE GREEN CUBE. Person: Why did you do that? Computer: BECAUSE YOU ASKED ME TO. Person: How did you do it? Computer: BY PUTTING A LARGE RED BLOCK ON THE TABLE ; THEN LETTING GO OF IT ; THEN PUTTING A LARGE GREEN CUBE ON IT ; THEN LETTING GO OF THAT CUBE ; THEN PUTTING THE RED CUBE ON THAT CUBE; THEN LETTING GO OF THAT CUBE. Person: How many objects did you touch while you were doing it? Computer: FOUR OF THEM. Person: What did the red cube support before you started to clean it off? Computer: THE GREEN PYRAMID. Person: There were five blocks to the left of the box then. Computer: NO, ONLY FOUR OF THEM: THE RED CUBE, TWO LARGE GREEN CUBES AND A LARGE RED BLOCK. Person: Put the blue pyramid on the block in the box. Computer: OK. Person: Is there anything which is bigger than every pyramid but is not as wide as the thing that supports it? Computer: YES, THE BLUE BLOCK. Person: Does a steeple Computer (interrupting): SORRY, I DON'T KNOW THE WORD "STEEPLE". Person: A "steeple" is a stack which contains two green cubes and a pyramid. Computer: I UNDERSTAND. Person: Are there any steeples now? Computer: NO. Person: Build one. Computer: OK. Person: Call the biggest block "superblock". Computer: OK. Person: Have you picked superblock up since we began? Computer: YES. Person: Why did you drop it? Computer: BY "IT", I ASSUME YOU MEAN THE SUPERBLOCK. I CAN'T EXPLAIN A NON-EXISTENT EVENT. Person: Is there anything to the right of the red pyramid? Computer: YES, FOUR THINGS: THE BOX, THE RED CUBE, THE BLUE PYRAMID AND THE BLOCK NAMED SUPERBLOCK Person: Thank you. Computer: YOU'RE WELCOME! == Consequences == SHRDLU was considered a tremendously successful demonstration of artificial intelligence (AI). This led other AI researchers to excessive optimism which was soon lost when later systems attempted to deal with situations with a more realistic level of ambiguity and complexity. Subsequent efforts of the SHRDLU type, such as Cyc, have tended to focus on providing the program with considerably more information from which it can draw conclusions. In a 1991 interview, Winograd said about SHRDLU: [...] the famous dialogue with SHRDLU where you could pick up a block, and so on, I very carefully worked through, line by line. If you sat down in front of it, and asked it a question that wasn't in the dialogue, there was some probability it would answer it. I mean, if it was reasonably close to one of the questions that was there in form and in content, it would probably get it. But there was no attempt to get it to the point where you could actually hand it to somebody and they could use it to move blocks around. And there was no pressure for that whatsoever. Pressure was for something you could demo. Take a recent example, Negroponte's Media Lab, where instead of "perish or publish" it's "demo or die." I think that's a problem. I think AI suffered from that a lot, because it led to "Potemkin villages", things which - for the things they actually did in the demo looked good, but when you looked behind that there wasn't enough structure to make it really work more generally. Though not intentionally developed as such, SHRDLU is considered the first known formal example of interactive fiction, as the user interacts with simple commands to move objects around a virtual environment, though lacking the distinct story-telling normally present in the interactive fiction genre. The 1976-1977 game Colossal Cave Adventure is broadly considered to be the first true work of interactive fiction.

    Read more →
  • Freddy II

    Freddy II

    Freddy (1969–1971) and Freddy II (1973–1976) were experimental robots built in the Department of Machine Intelligence and Perception (later Department of Artificial Intelligence, now part of the School of Informatics at the University of Edinburgh). == Technology == Technical innovations involving Freddy were at the forefront of the 70s robotics field. Freddy was one of the earliest robots to integrate vision, manipulation and intelligent systems as well as having versatility in the system and ease in retraining and reprogramming for new tasks. The idea of moving the table instead of the arm simplified the construction. Freddy also used a method of recognising the parts visually by using graph matching on the detected features. The system used an innovative collection of high level procedures for programming the arm movements which could be reused for each new task. == Lighthill controversy == In the mid 1970s there was controversy about the utility of pursuing a general purpose robotics programme in both the USA and the UK. A BBC TV programme in 1973, referred to as the "Lighthill Debate", pitched James Lighthill, who had written a critical report for the science and engineering research funding agencies in the UK, against Donald Michie from the University of Edinburgh and John McCarthy from Stanford University. The Edinburgh Freddy II and Stanford/SRI Shakey robots were used to illustrate the state-of-the-art at the time in intelligent robotics systems. == Freddy I and II == Freddy Mark I (1969–1971) was an experimental prototype, with 3 degrees-of-freedom created by a rotating platform driven by a pair of independent wheels. The other main components were a video camera and bump sensors connected to a computer. The computer moved the platform so that the camera could see and then recognise the objects. Freddy II (1973–1976) was a 5 degrees of freedom manipulator with a large vertical 'hand' that could move up and down, rotate about the vertical axis and rotate objects held in its gripper around one horizontal axis. Two remaining translational degrees of freedom were generated by a work surface that moved beneath the gripper. The gripper was a two finger pinch gripper. A video camera was added as well as later a light stripe generator. The Freddy and Freddy II projects were initiated and overseen by Donald Michie. The mechanical hardware and analogue electronics were designed and built by Stephen Salter (who also pioneered renewable energy from waves (see Salter's Duck)), and the digital electronics and computer interfacing were designed by Harry Barrow and Gregan Crawford. The software was developed by a team led by Rod Burstall, Robin Popplestone and Harry Barrow which used the POP-2 programming language, one of the world's first functional programming languages. The computing hardware was an Elliot 4130 computer with 384KB (128K 24-bit words) RAM and a hard disk linked to a small Honeywell H316 computer with 16KB of RAM which directly performed sensing and control. Freddy was a versatile system which could be trained and reprogrammed to perform a new task in a day or two. The tasks included putting rings on pegs and assembling simple model toys consisting of wooden blocks of different shapes, a boat with a mast and a car with axles and wheels. Information about part locations was obtained using the video camera, and then matched to previously stored models of the parts. It was soon realised in the Freddy project that the 'move here, do this, move there' style of robot behavior programming (actuator or joint level programming) is tedious and also did not allow for the robot to cope with variations in part position, part shape and sensor noise. Consequently, the RAPT robot programming language was developed by Pat Ambler and Robin Popplestone, in which robot behavior was specified at the object level. This meant that robot goals were specified in terms of desired position relationships between the robot, objects and the scene, leaving the details of how to achieve the goals to the underlying software system. Although developed in the 1970s RAPT is still considerably more advanced than most commercial robot programming languages. The team of people who contributed to the project were leaders in the field at the time and included Pat Ambler, Harry Barrow, Ilona Bellos, Chris Brown, Rod Burstall, Gregan Crawford, Jim Howe, Donald Michie, Robin Popplestone, Stephen Salter, Austin Tate and Ken Turner. Also of interest in the project was the use of a structured-light 3D scanner to obtain the 3D shape and position of the parts being manipulated. The Freddy II robot is currently on display at the Royal Museum in Edinburgh, Scotland, with a segment of the assembly video shown in a continuous loop.

    Read more →
  • BERT (language model)

    BERT (language model)

    Bidirectional encoder representations from transformers (BERT) is a language model introduced in October 2018 by researchers at Google. It learns to represent text as a sequence of vectors using self-supervised learning. It uses the encoder-only transformer architecture. BERT dramatically improved the state of the art for large language models. As of 2020, BERT is a ubiquitous baseline in natural language processing (NLP) experiments. BERT is trained by masked token prediction and next sentence prediction. With this training, BERT learns contextual, latent representations of tokens in their context, similar to ELMo and GPT-2. It found applications for many natural language processing tasks, such as coreference resolution and polysemy resolution. It improved on ELMo and spawned the study of "BERTology", which attempts to interpret what is learned by BERT. BERT was originally implemented in the English language at two model sizes, BERTBASE (110 million parameters) and BERTLARGE (340 million parameters). Both were trained on the Toronto BookCorpus (800M words) and English Wikipedia (2,500M words). The weights were released on GitHub. On March 11, 2020, 24 smaller models were released, the smallest being BERTTINY with just 4 million parameters. == Architecture == BERT is an "encoder-only" transformer architecture. At a high level, BERT consists of 4 modules: Tokenizer: This module converts a piece of English text into a sequence of integers ("tokens"). Embedding: This module converts the sequence of tokens into an array of real-valued vectors representing the tokens. It represents the conversion of discrete token types into a lower-dimensional Euclidean space. Encoder: a stack of Transformer blocks with self-attention, but without causal masking. Task head: This module converts the final representation vectors into one-shot encoded tokens again by producing a predicted probability distribution over the token types. It can be viewed as a simple decoder, decoding the latent representation into token types, or as an "un-embedding layer". The task head is necessary for pre-training, but it is often unnecessary for so-called "downstream tasks," such as question answering or sentiment classification. Instead, one removes the task head and replaces it with a newly initialized module suited for the task, and finetune the new module. The latent vector representation of the model is directly fed into this new module, allowing for sample-efficient transfer learning. === Embedding === This section describes the embedding used by BERTBASE. The other one, BERTLARGE, is similar, just larger. The tokenizer of BERT is WordPiece, which is a sub-word strategy like byte-pair encoding. Its vocabulary size is 30,000, and any token not appearing in its vocabulary is replaced by [UNK] ("unknown"). The first layer is the embedding layer, which contains three components: token type embeddings, position embeddings, and segment type embeddings. Token type: The token type is a standard embedding layer, translating a one-hot vector into a dense vector based on its token type. Position: The position embeddings are based on a token's position in the sequence. BERT uses absolute position embeddings, where each position in a sequence is mapped to a real-valued vector. Each dimension of the vector consists of a sinusoidal function that takes the position in the sequence as input. Segment type: Using a vocabulary of just 0 or 1, this embedding layer produces a dense vector based on whether the token belongs to the first or second text segment in that input. In other words, type-1 tokens are all tokens that appear after the [SEP] special token. All prior tokens are type-0. The three embedding vectors are added together representing the initial token representation as a function of these three pieces of information. After embedding, the vector representation is normalized using a LayerNorm operation, outputting a 768-dimensional vector for each input token. After this, the representation vectors are passed forward through 12 Transformer encoder blocks, and are decoded back to 30,000-dimensional vocabulary space using a basic affine transformation layer. === Architectural family === The encoder stack of BERT has 2 free parameters: L {\displaystyle L} , the number of layers, and H {\displaystyle H} , the hidden size. There are always H / 64 {\displaystyle H/64} self-attention heads, and the feed-forward/filter size is always 4 H {\displaystyle 4H} . By varying these two numbers, one obtains an entire family of BERT models. For BERT: the feed-forward size and filter size are synonymous. Both of them denote the number of dimensions in the middle layer of the feed-forward network. the hidden size and embedding size are synonymous. Both of them denote the number of real numbers used to represent a token. The notation for encoder stack is written as L/H. For example, BERTBASE is written as 12L/768H, BERTLARGE as 24L/1024H, and BERTTINY as 2L/128H. == Training == === Pre-training === BERT was pre-trained simultaneously on two tasks: Masked language modeling (MLM): In this task, BERT ingests a sequence of words, where one word may be randomly changed ("masked"), and BERT tries to predict the original words that had been changed. For example, in the sentence "The cat sat on the [MASK]," BERT would need to predict "mat." This helps BERT learn bidirectional context, meaning it understands the relationships between words not just from left to right or right to left but from both directions at the same time. Next sentence prediction (NSP): In this task, BERT is trained to predict whether one sentence logically follows another. For example, given two sentences, "The cat sat on the mat" and "It was a sunny day", BERT has to decide if the second sentence is a valid continuation of the first one. This helps BERT understand relationships between sentences, which is important for tasks like question answering or document classification. ==== Masked language modeling ==== In masked language modeling, 15% of tokens would be randomly selected for masked-prediction task, and the training objective was to predict the masked token given its context. In more detail, the selected token is: replaced with a [MASK] token with probability 80%, replaced with a random word token with probability 10%, not replaced with probability 10%. The reason not all selected tokens are masked is to avoid the dataset shift problem. The dataset shift problem arises when the distribution of inputs seen during training differs significantly from the distribution encountered during inference. A trained BERT model might be applied to word representation (like Word2Vec), where it would be run over sentences not containing any [MASK] tokens. It is later found that more diverse training objectives are generally better. As an illustrative example, consider the sentence "my dog is cute". It would first be divided into tokens like "my1 dog2 is3 cute4". Then a random token in the sentence would be picked. Let it be the 4th one "cute4". Next, there would be three possibilities: with probability 80%, the chosen token is masked, resulting in "my1 dog2 is3 [MASK]4"; with probability 10%, the chosen token is replaced by a uniformly sampled random token, such as "happy", resulting in "my1 dog2 is3 happy4"; with probability 10%, nothing is done, resulting in "my1 dog2 is3 cute4". After processing the input text, the model's 4th output vector is passed to its decoder layer, which outputs a probability distribution over its 30,000-dimensional vocabulary space. ==== Next sentence prediction ==== Given two sentences, the model predicts if they appear sequentially in the training corpus, outputting either [IsNext] or [NotNext]. During training, the algorithm sometimes samples two sentences from a single continuous span in the training corpus, while at other times, it samples two sentences from two discontinuous spans. The first sentence starts with a special token, [CLS] (for "classify"). The two sentences are separated by another special token, [SEP] (for "separate"). After processing the two sentences, the final vector for the [CLS] token is passed to a linear layer for binary classification into [IsNext] and [NotNext]. For example: Given "[CLS] my dog is cute [SEP] he likes playing [SEP]", the model should predict [IsNext]. Given "[CLS] my dog is cute [SEP] how do magnets work [SEP]", the model should predict [NotNext]. === Fine-tuning === BERT is meant as a general pretrained model for various applications in natural language processing. That is, after pre-training, BERT can be fine-tuned with fewer resources on smaller datasets to optimize its performance on specific tasks such as natural language inference and text classification, and sequence-to-sequence-based language generation tasks such as question answering and conversational response generation. The original BERT paper published results demonstrating that a small amount of fine

    Read more →
  • Civitai

    Civitai

    Civitai is an online platform and marketplace for generative artificial intelligence (Gen AI) content, primarily focused on AI-generated images and models, and AI-generated videos. == History == Civitai was founded in 2022 by Justin Maier. By January 2023, the site reached 100,000 registered users and 3 million by November. In November 2023, Civitai secured funding from venture capital firm Andreessen Horowitz. By April 2024, Civitai had 23.2 million monthly accesses. The company is headquartered in Boise, Idaho. == Platform == Civitai allows users to share and download AI models, particularly those used for image generation. The platform supports various AI models, including Stable Diffusion and Flux, and provides a space for users to showcase and monetize their AI-generated content. Users have profile pages and can comment on other users' models and images. The website also features a virtual currency called Buzz that can be used to generate images on Civitai's servers. Buzz can be bought or earned by engaging with the site. The platform is open source. == Controversies == In 2023, 404 Media reported that Civitai began a "Bounties" marketplace where users could commission deepfakes, of real or fake people. Users are rewarded with Buzz for completing Bounties. In December 2023, AI provider OctoML announced it had ended its business relationship with Civitai after concerns were raised users were generating images that “could be categorized as child pornography.”

    Read more →
  • LipNet

    LipNet

    LipNet is a deep neural network for audio-visual speech recognition (ASVR). It was created by University of Oxford researchers Yannis Assael, Brendan Shillingford, Shimon Whiteson, and Nando de Freitas. The researchers stated that could match mouth movements to text with 93 percent accuracy, though it was criticized for its test using a limited dataset of words and grammar. It was used in Nvidia's autonomous "backseat driver" prototype Co-Pilot.

    Read more →
  • RealSense

    RealSense

    RealSense is an American technology company that develops depth cameras and computer-vision systems used in robotics, access control, industrial automation and healthcare. The company’s stereoscopic 3D cameras and software are marketed as a perception platform for “physical AI”, particularly for humanoid robots and autonomous mobile robots (AMRs). RealSense was incubated for more than a decade inside Intel’s perceptual computing and depth-sensing group before being spun out as an independent company in July 2025 with a US$50 million Series A round backed by a semiconductor-focused private equity firm and strategic investors including Intel Capital and the MediaTek Innovation Fund. Following the spin-out, RealSense announced a strategic collaboration with Nvidia to integrate its AI depth cameras with the Nvidia Jetson Thor robotics platform, the Isaac Sim simulation environment and the Holoscan Sensor Bridge for low-latency sensor fusion. In November 2025, Swiss access-solutions provider dormakaba acquired a minority stake in RealSense and formed a partnership to develop AI-powered biometric access-control and security systems for data centres, airports and other critical infrastructure. == History == === Origins in Intel Perceptual Computing === Intel began developing depth-sensing and perceptual-computing technologies in the early 2010s under the Perceptual Computing brand, with research spanning gesture control, facial recognition and eye-tracking systems. The work led to a series of 3D cameras and developer challenge programmes intended to stimulate software ecosystems for natural-user interfaces. In 2014 Intel rebranded the effort as Intel RealSense, positioning the technology as a family of depth cameras and vision processors for PCs, mobile devices and embedded systems. Early devices such as the F200 and R200 were integrated into laptops and tablets from OEMs including Asus, HP, Dell, Lenovo and Acer, and were also sold as standalone webcams by partners such as Razer and Creative. === Refocus on robotics and near-closure === By the late 2010s Intel had steered RealSense away from mainstream PC peripherals toward robotics, industrial and embedded applications, adding stereo and lidar-based depth cameras to the portfolio. In August 2021, trade publication CRN reported that Intel planned to wind down the RealSense business as part of a broader restructuring, raising questions about the future of the product line. Despite that announcement, Intel continued to invest in new custom silicon for depth cameras, and RealSense remained widely used in mobile robots and automation projects. === Spin-out as RealSense Inc. (2025) === On 11 July 2025, Intel completed the spin-out of its RealSense 3D-camera business into a new privately held company, RealSense Inc., and the new entity announced a US$50 million Series A funding round. The round was led by a semiconductor-focused private equity investor with participation from Intel Capital, MediaTek Innovation Fund and other strategics. Independent coverage described RealSense as serving more than 3,000 active customers and supplying depth cameras to a large share of global AMR and humanoid robot platforms. The company stated that it would continue to support the existing Intel RealSense product roadmap while accelerating development of AI-enabled cameras and perception software. === Strategic partnerships and investments === In October 2025 RealSense and Nvidia announced a strategic collaboration centered on integrating RealSense AI depth cameras with Nvidia’s Jetson Thor robotics compute modules, the Isaac Sim simulation environment and the Holoscan Sensor Bridge for multi-sensor streaming. The collaboration is positioned as enabling “physical AI” workloads such as whole-body humanoid control, real-time mapping and safety-critical human–robot interaction. On 19 November 2025, dormakaba announced that it had acquired a minority stake in RealSense and entered into a partnership to co-develop intelligent access-control solutions, including biometric gates for airports and enterprise facilities. The partnership aims to combine RealSense’s depth and facial-authentication technology with dormakaba’s installed base of sensors, doors and turnstiles. == Products == === Depth-camera families === RealSense’s products are sold as modular components (depth modules, vision processors and complete cameras) and as integrated systems with on-device AI. The company continues to offer and support the Intel RealSense D400 family of active-stereo depth cameras (including the D415, D435 and D455), which are widely used in robotics and automation. These devices combine a RealSense Vision Processor from the D4 family with dual infrared imagers and, on some models, an RGB camera. Earlier generations of Intel RealSense cameras, including the F200, R200, SR300 and the L515 lidar camera, remain in use in niche and legacy applications but are no longer the focus of the independent company’s roadmap. === D555 PoE depth camera === The first new hardware platform announced after the spin-out was the RealSense Depth Camera D555, a ruggedised stereo-depth device aimed at industrial and robotics deployments. The D555 uses the longer-range D450 optical module with a global shutter and integrates RealSense’s Vision SoC V5, a new generation of vision processor optimised for neural-network inference and depth computation. Key features highlighted in technical coverage include: Power over Ethernet (PoE), allowing power and data to be delivered over a single cable and supporting both RJ45 and ruggedised M12 connections; an IP-rated enclosure designed for harsh indoor and outdoor environments; a built-in inertial measurement unit (IMU) to support simultaneous localisation and mapping (SLAM) and motion tracking; native support for ROS 2 and integration with the open-source RealSense SDK. According to independent reporting, the D555 is used in AI-enabled embedded-vision applications in mobile robots and fixed industrial systems, and was among the first RealSense products to be tightly integrated with Nvidia’s Jetson Thor and Holoscan platforms for low-latency sensor fusion. === Software and SDK === RealSense cameras are supported by a cross-platform, open-source software stack historically branded as Intel RealSense SDK 2.0. The SDK provides device drivers, depth and point-cloud processing, tracking and calibration tools, and bindings for languages such as C++, Python and C#. The independent company has continued to maintain and extend the SDK for new hardware, including D555 and other Vision SoC V5-based devices, and publishes reference integrations for ROS 2 and industrial-automation frameworks. === Biometrics and access-control products === In addition to general-purpose depth cameras, RealSense offers facial-authentication hardware and software, commonly referred to as RealSense ID, for biometric access control and identity verification. These products combine an active depth sensor with a dedicated neural-network pipeline running on embedded processors, aimed at applications such as secure doors, turnstiles and kiosks. Use-case material published by partners describes deployments of RealSense-based biometric readers in school lunch programmes, agricultural biosecurity checkpoints and enterprise facilities. The dormakaba partnership announced in 2025 extends this portfolio to integrated biometric gates and sensor-equipped doors in airports and data centres. == Applications == === Robotics and automation === RealSense depth cameras are used in autonomous mobile robots, humanoid robots, drones and industrial automation systems for tasks such as obstacle avoidance, navigation and manipulation. Reuters reported in 2025 that RealSense cameras were embedded in around 60 percent of the world’s AMRs and humanoid robots, citing customers including Unitree Robotics and ANYbotics. Developers and integrators use RealSense systems with platforms such as Nvidia Jetson, ROS and proprietary motion-planning stacks. === Biometrics and security === RealSense technology is also applied in biometric access control and surveillance, where depth and infrared imaging are used to improve anti-spoofing performance for facial recognition. The dormakaba investment and collaboration is aimed at integrating these capabilities into boarding gates, staff entrances and secure facilities, with RealSense providing perception hardware and algorithms and dormakaba providing access-control infrastructure and global distribution. == Reception == Early coverage of Intel RealSense for consumer PCs noted that the technology’s impact would depend on the availability of compelling software and use cases for depth-sensing cameras. Later reporting on the spin-out has characterised the new company as part of a broader wave of investment in robotics and physical AI, with some analysts suggesting that RealSense’s installed base and patent portfolio give it an advantage as dep

    Read more →
  • Granular computing

    Granular computing

    Granular computing is an emerging computing paradigm of information processing that concerns the processing of complex information entities called "information granules", which arise in the process of data abstraction and derivation of knowledge from information or data. Generally speaking, information granules are collections of entities that usually originate at the numeric level and are arranged together due to their similarity, functional or physical adjacency, indistinguishability, coherency, or the like. At present, granular computing is more a theoretical perspective than a coherent set of methods or principles. As a theoretical perspective, it encourages an approach to data that recognizes and exploits the knowledge present in data at various levels of resolution or scales. In this sense, it encompasses all methods which provide flexibility and adaptability in the resolution at which knowledge or information is extracted and represented. == Types of granulation == As mentioned above, granular computing is not an algorithm or process; there is no particular method that is called "granular computing". It is rather an approach to looking at data that recognizes how different and interesting regularities in the data can appear at different levels of granularity, much as different features become salient in satellite images of greater or lesser resolution. On a low-resolution satellite image, for example, one might notice interesting cloud patterns representing cyclones or other large-scale weather phenomena, while in a higher-resolution image, one misses these large-scale atmospheric phenomena but instead notices smaller-scale phenomena, such as the interesting pattern that is the streets of Manhattan. The same is generally true of all data: At different resolutions or granularities, different features and relationships emerge. The aim of granular computing is to try to take advantage of this fact in designing more effective machine-learning and reasoning systems. There are several types of granularity that are often encountered in data mining and machine learning, and we review them below: === Value granulation (discretization/quantization) === One type of granulation is the quantization of variables. It is very common that in data mining or machine-learning applications the resolution of variables needs to be decreased in order to extract meaningful regularities. An example of this would be a variable such as "outside temperature" (temp), which in a given application might be recorded to several decimal places of precision (depending on the sensing apparatus). However, for purposes of extracting relationships between "outside temperature" and, say, "number of health-club applications" (club), it will generally be advantageous to quantize "outside temperature" into a smaller number of intervals. ==== Motivations ==== There are several interrelated reasons for granulating variables in this fashion: Based on prior domain knowledge, there is no expectation that minute variations in temperature (e.g., the difference between 80–80.7 °F (26.7–27.1 °C)) could have an influence on behaviors driving the number of health-club applications. For this reason, any "regularity" which our learning algorithms might detect at this level of resolution would have to be spurious, as an artifact of overfitting. By coarsening the temperature variable into intervals the difference between which we do anticipate (based on prior domain knowledge) might influence number of health-club applications, we eliminate the possibility of detecting these spurious patterns. Thus, in this case, reducing resolution is a method of controlling overfitting. By reducing the number of intervals in the temperature variable (i.e., increasing its grain size), we increase the amount of sample data indexed by each interval designation. Thus, by coarsening the variable, we increase sample sizes and achieve better statistical estimation. In this sense, increasing granularity provides an antidote to the so-called curse of dimensionality, which relates to the exponential decrease in statistical power with increase in number of dimensions or variable cardinality. Independent of prior domain knowledge, it is often the case that meaningful regularities (i.e., which can be detected by a given learning methodology, representational language, etc.) may exist at one level of resolution and not at another. For example, a simple learner or pattern recognition system may seek to extract regularities satisfying a conditional probability threshold such as p ( Y = y j | X = x i ) ≥ α . {\displaystyle p(Y=y_{j}|X=x_{i})\geq \alpha .} In the special case where α = 1 , {\displaystyle \alpha =1,} this recognition system is essentially detecting logical implication of the form X = x i → Y = y j {\displaystyle X=x_{i}\rightarrow Y=y_{j}} or, in words, "if X = x i , {\displaystyle X=x_{i},} then Y = y j {\displaystyle Y=y_{j}} ". The system's ability to recognize such implications (or, in general, conditional probabilities exceeding threshold) is partially contingent on the resolution with which the system analyzes the variables. As an example of this last point, consider the feature space shown to the right. The variables may each be regarded at two different resolutions. Variable X {\displaystyle X} may be regarded at a high (quaternary) resolution wherein it takes on the four values { x 1 , x 2 , x 3 , x 4 } {\displaystyle \{x_{1},x_{2},x_{3},x_{4}\}} or at a lower (binary) resolution wherein it takes on the two values { X 1 , X 2 } . {\displaystyle \{X_{1},X_{2}\}.} Similarly, variable Y {\displaystyle Y} may be regarded at a high (quaternary) resolution or at a lower (binary) resolution, where it takes on the values { y 1 , y 2 , y 3 , y 4 } {\displaystyle \{y_{1},y_{2},y_{3},y_{4}\}} or { Y 1 , Y 2 } , {\displaystyle \{Y_{1},Y_{2}\},} respectively. At the high resolution, there are no detectable implications of the form X = x i → Y = y j , {\displaystyle X=x_{i}\rightarrow Y=y_{j},} since every x i {\displaystyle x_{i}} is associated with more than one y j , {\displaystyle y_{j},} and thus, for all x i , {\displaystyle x_{i},} p ( Y = y j | X = x i ) < 1. {\displaystyle p(Y=y_{j}|X=x_{i})<1.} However, at the low (binary) variable resolution, two bilateral implications become detectable: X = X 1 ↔ Y = Y 1 {\displaystyle X=X_{1}\leftrightarrow Y=Y_{1}} and X = X 2 ↔ Y = Y 2 {\displaystyle X=X_{2}\leftrightarrow Y=Y_{2}} , since every X 1 {\displaystyle X_{1}} occurs iff Y 1 {\displaystyle Y_{1}} and X 2 {\displaystyle X_{2}} occurs iff Y 2 . {\displaystyle Y_{2}.} Thus, a pattern recognition system scanning for implications of this kind would find them at the binary variable resolution, but would fail to find them at the higher quaternary variable resolution. ==== Issues and methods ==== It is not feasible to exhaustively test all possible discretization resolutions on all variables in order to see which combination of resolutions yields interesting or significant results. Instead, the feature space must be preprocessed (often by an entropy analysis of some kind) so that some guidance can be given as to how the discretization process should proceed. Moreover, one cannot generally achieve good results by naively analyzing and discretizing each variable independently, since this may obliterate the very interactions that we had hoped to discover. A sample of papers that address the problem of variable discretization in general, and multiple-variable discretization in particular, is as follows: Chiu, Wong & Cheung (1991), Bay (2001), Liu et al. (2002), Wang & Liu (1998), Zighed, Rabaséda & Rakotomalala (1998), Catlett (1991), Dougherty, Kohavi & Sahami (1995), Monti & Cooper (1999), Fayyad & Irani (1993), Chiu, Cheung & Wong (1990), Nguyen & Nguyen (1998), Grzymala-Busse & Stefanowski (2001), Ting (1994), Ludl & Widmer (2000), Pfahringer (1995), An & Cercone (1999), Chiu & Cheung (1989), Chmielewski & Grzymala-Busse (1996), Lee & Shin (1994), Liu & Wellman (2002), Liu & Wellman (2004). === Variable granulation (clustering/aggregation/transformation) === Variable granulation is a term that could describe a variety of techniques, most of which are aimed at reducing dimensionality, redundancy, and storage requirements. We briefly describe some of the ideas here, and present pointers to the literature. ==== Variable transformation ==== A number of classical methods, such as principal component analysis, multidimensional scaling, factor analysis, and structural equation modeling, and their relatives, fall under the genus of "variable transformation." Also in this category are more modern areas of study such as dimensionality reduction, projection pursuit, and independent component analysis. The common goal of these methods in general is to find a representation of the data in terms of new variables, which are a linear or nonlinear transformation of the original variables, and in which important stati

    Read more →
  • Hallin's spheres

    Hallin's spheres

    Hallin's spheres is a theory of news reporting and its rhetorical framing posited by journalism historian Daniel C. Hallin in his 1986 book The Uncensored War to explain the news coverage of the Vietnam War. Hallin divides the world of political discourse into three concentric spheres: consensus, legitimate controversy, and deviance. In the sphere of consensus, journalists assume everyone agrees. The sphere of legitimate controversy includes the standard political debates, and journalists are expected to remain neutral. The sphere of deviance falls outside the bounds of legitimate debate, and journalists can ignore it. These boundaries shift, as public opinion shifts. Hallin's spheres, which deals with the media, are similar to the Overton window, which deals with public opinion generally, and posits a sliding scale of public opinion on any given issue ranging from conventional wisdom to unacceptable. Hallin used the concept of framing to describe the presentation and reception of issues in public. For example, framing the use of drugs as criminal activity can encourage the public to consider that behavior anti-social. Hallin's work was later referred to in the controversial formulation of the concept of an opinion corridor, in which the range of acceptable public opinion narrows, and opinion outside that corridor moves from legitimate controversy into deviance. == Description == === Sphere of consensus === This sphere contains those topics on which there is widespread agreement, or at least the perception thereof. Within the sphere of consensus, "journalists feel free to invoke a generalized 'we' and to take for granted shared values and shared assumptions". Examples include such things as motherhood and apple pie. For topics in this sphere, journalists feel free to be advocating cheerleaders without having to be neutral or present any opposing view point and be disinterested observers." === Sphere of legitimate controversy === For topics in this sphere rational and informed people hold differing views within limited range. These topics are therefore the most important to cover, and also ones upon which journalists are seemingly obliged to remain disinterested reporters, rather than advocating for or against a particular view. Schudson notes that Hallin, in his influential study of the US media during the Vietnam War, argues that journalism's commitment to objectivity has always been compartmentalized. That is, within a certain sphere—the sphere of legitimate controversy—journalists seek conscientiously to be balanced and objective. The work of Walter Williams professor at the University of Missouri, Rod Petersen, advanced the idea that priming—controlling the narratives that media covers—can be the tool that media use to get deviant news subjects into the legitimate controversial circles of new coverage. === Sphere of deviance === Topics in this sphere are rejected by journalists as being unworthy of general consideration. Such views are perceived as being out of hand, unfounded, taboo, or of such minor consequence that they are not newsworthy. Hallin argues that in the sphere of deviance, "journalists also depart from standard norms of objective reporting and feel authorized to treat as marginal, laughable, dangerous". They either avoid mentioning or ridicule the controversial subject as outside the bounds of acceptable controversy; and they censor the individuals and groups who are associated with it. A simple example: a person claiming that aliens are manipulating college basketball scores might have difficulty finding sports media coverage for such a claim. A more political example: the US media regulator FCC's "Fairness Doctrine" aimed at radio stations, advocated balance between right and left political news and opinions, yet specified that broadcasters did not have to reserve any space or time for Communist viewpoints. == Uses of the terms == Craig Watkins (2001, pp. 92–94) makes use of the Hallin's spheres in a paper examining ABC, CBS, and NBC television network television news coverage of the Million Man March, a demonstration that took place in Washington, D.C., on October 16, 1995. Watkins analyzes the dominant framing practices—problem definition, rhetorical devices, use of sources, and images—employed by journalists to make sense of this particular expression of political protest. He argues that Hallin's three spheres are a way for media framing practices to develop specific reportorial contexts, and each sphere develops its own distinct style of news reporting resources by different rhetorical tropes and discourses. Piers Robinson (2001, p. 536) uses the concept in relation to debates that have emerged over the extent to which the mass media serves elite interests or, alternatively, plays a powerful role in shaping political outcomes. His article reviews Hallin's spheres as an example of media-state relations, that highlights theoretical and empirical shortcomings in the 'manufacturing consent' thesis (Chomsky, McChesney). Robinson argues that a more nuanced and bi-directional understanding is needed of the direction of influence between media and the state that builds upon, rather than rejecting, existing theoretical accounts. Hallin's theory assumed a relatively homogenized media environment, where most producers were trying to reach most consumers. A more fractured media landscape can challenge this assumption because different audiences may place topics in different spheres, a concept related to the filter bubble, which posits that many members of the public choose to limit their media consumption to the areas of consensus and deviance that they personally prefer.

    Read more →
  • Knowledge processing for robots

    Knowledge processing for robots

    KnowRob (Knowledge processing for robots) is a system which combines knowledge representation and reasoning methods to acquire and ground knowledge. This system is the backbone of openEASE. both under developing at the Institute for Artificial Intelligence at the University of Bremen, Germany. == The framework == KnowRob can serve as a common sense framework for the integration of knowledge. This knowledge can be static encyclopedic knowledge, common sense knowledge, task descriptions, environment models, object information, observed actions, etc., which can come from different sources, like manually axiomatized, derived from observations, or imported from the web. KnowRob has been used by different research groups, as the Rice University using the ontological knowledge base in a robotic platform. As well by the Eindhoven University of Technology research group competing in the RoboCup league, in the "at Home" category, with the RoboEarth project. As well, KnowRob is mentioned in the work of some research groups from the Lucian Blaga University of Sibiu, Middle East Technical University in their combination of different knowledge bases, Keio University as related work because of the ontology service, University of Texas at Austin as related work as well because of the relation with the work presented, Hanyang University as related work as an OWL based knowledge processing framework. == Representations == To represent the knowledge, KnowRob uses the OWL ontology language and an extended first-order logic knowledge representation with computable predicates. To give the order of subactions, KnowRob includes a pair-wise ordering constrain, which gives a partial ordering. KnowRob adopts the closed-world assumption Prolog, and an open-world assumption by the use of computables. To include reasoning rules into Prolog, KnowRob uses an inference procedure beyond the capabilities of OWL to extract information about tasks executions. In its second version, KnowRob provides a logic interface to the hybrid reasoning kernel as a logic based language. This language presents the hybrid reasoning kernel as if everything were entities retrievable by providing partial descriptions for them. This entities descriptions include objects, their parts, and articulation models, environments composed of objects, software components, actions, and events. === Episodic memories === Episodic memory is related to the experience information, which is organized temporally and spatially, alongside combined with context information. In KnowRob, an episodic memory is understood as a recording that the agent makes of the ongoing activity, which includes very detailed information about the actions, motions, their purposes, effects and the behavior they generate, it also includes the images captured during execution, etc. == Usage == The knowledge is computed by external methods using Prolog queries. In the second version of the KnowRob system, is included a better structure of the packages and documentations. Which includes some extensions from the previous version, as well as a logic based language. For example, a cup description from perception can be represented in this language as: entity(Cup,[an, object, [type, cup], [shape, cylinder], [color, orange]]) As well, a controller could represent the same object as: entity(Cup, [an, object, [type, cup], [proper_physical_parts, [an, object, [type, handle], [grasp−pose, G−pose]]]]) The interface language is comparable to other query languages for symbolic knowledge bases. KnowRob's query language integrates reasoning methods, such as the simulation-based reasoning. == Goals == The goal of the KnowRob framework is to make semantic knowledge available for service robots. It is able to answer queries about missing information in vague instructions for tasks. This is possible with the actions hierarchical representation and information about objects which can be included in certain action.

    Read more →
  • OpenCog

    OpenCog

    OpenCog is a project that aims to build an open source artificial intelligence framework. OpenCog Prime is an architecture for robot and virtual embodied cognition that defines a set of interacting components designed to give rise to human-equivalent artificial general intelligence (AGI) as an emergent phenomenon of the whole system. OpenCog Prime's design is primarily the work of Ben Goertzel while the OpenCog framework is intended as a generic framework for broad-based AGI research. Research utilizing OpenCog has been published in journals and presented at conferences and workshops including the annual Conference on Artificial General Intelligence. OpenCog is released under the terms of the GNU Affero General Public License. OpenCog is in use by more than 50 companies, including Huawei and Cisco. == Origin == OpenCog was originally based on the release in 2008 of the source code of the proprietary "Novamente Cognition Engine" (NCE) of Novamente LLC. The original NCE code is discussed in the PLN book (ref below). Ongoing development of OpenCog is supported by Artificial General Intelligence Research Institute (AGIRI), the Google Summer of Code project, Hanson Robotics, SingularityNET and others. == Components == OpenCog consists of: A graph database, dubbed the AtomSpace, that holds "atoms" (that is, terms, atomic formulas, sentences and relationships) together with their "values" (valuations or interpretations, which can be thought of as per-atom key-value databases). An example of a value would be a truth value. Atoms are globally unique, immutable and are indexed (searchable); values are fleeting and changeable. A collection of pre-defined atoms, termed Atomese, used for generic knowledge representation, such as conceptual graphs and semantic networks, as well as to represent and store the rules (in the sense of term rewriting) needed to manipulate such graphs. A collection of pre-defined atoms that encode a type subsystem, including type constructors and function types. These are used to specify the types of variables, terms and expressions, and are used to specify the structure of generic graphs containing variables. A collection of pre-defined atoms that encode both functional and imperative programming styles. These include the lambda abstraction for binding free variables into bound variables, as well as for performing beta reduction. A collection of pre-defined atoms that encode a satisfiability modulo theories solver, built in as a part of a generic graph query engine, for performing graph and hypergraph pattern matching (isomorphic subgraph discovery). This generalizes the idea of a structured query language (SQL) to the domain of generic graphical queries; it is an extended form of a graph query language. A generic rule engine, including a forward chainer and a backward chainer, that is able to chain together rules. The rules are exactly the graph queries of the graph query subsystem, and so the rule engine vaguely resembles a query planner. It is designed so as to allow different kinds of inference engines and reasoning systems to be implemented, such as Bayesian inference or fuzzy logic, or practical tasks, such as constraint solvers or motion planners. An attention allocation subsystem based on economic theory, termed ECAN. This subsystem is used to control the combinatorial explosion of search possibilities that are met during inference and chaining. An implementation of a probabilistic reasoning engine based on probabilistic logic networks. The current implementation uses the rule engine to chain together specific rules of logical inference (such as modus ponens), together with some very specific mathematical formulas assigning a probability and a confidence to each deduction. This subsystem can be thought of as a certain kind of proof assistant that works with a modified form of Bayesian inference. A probabilistic genetic program evolver called Meta-Optimizing Semantic Evolutionary Search, or MOSES. This is used to discover collections of short Atomese programs that accomplish tasks; these can be thought of as performing a kind of decision tree learning, resulting in a kind of decision forest, or rather, a generalization thereof. A natural language input system consisting of Link Grammar, and partly inspired by both Meaning-Text Theory as well as Dick Hudson's Word Grammar, which encodes semantic and syntactic relations in Atomese. A natural language generation system. An implementation of Psi-Theory for handling emotional states, drives and urges, dubbed OpenPsi. Interfaces to Hanson Robotics robots, including emotion modelling via OpenPsi. This includes the Loving AI project, used to demonstrate meditation techniques. == Organization and funding == In 2008, the Machine Intelligence Research Institute (MIRI), formerly called Singularity Institute for Artificial Intelligence (SIAI), sponsored several researchers and engineers. Many contributions from the open source community have been made since OpenCog's involvement in the Google Summer of Code in 2008 and 2009. Currently MIRI no longer supports OpenCog. OpenCog has received funding and support from several sources, including the Hong Kong government, Hong Kong Polytechnic University, the Jeffrey Epstein VI Foundation and Hanson Robotics. In 2013, OpenCog began providing AI solutions to Hanson Robotics, and in 2017, OpenCog became a founding member of SingularityNET. == Applications == Similar to other cognitive architectures, the main purpose is to create virtual humans, which are three dimensional avatar characters. The goal is to mimic behaviors like emotions, gestures and learning. For example, the emotion module in the software was only programmed because humans have emotions. Artificial General Intelligence can be realized if it simulates intelligence of humans. The self-description of the OpenCog project provides additional possible applications which are going into the direction of natural language processing and the simulation of a dog.

    Read more →
  • LanguageWare

    LanguageWare

    LanguageWare is a natural language processing (NLP) technology developed by IBM, which allows applications to process natural language text. It comprises a set of Java libraries that provide a range of NLP functions: language identification, text segmentation/tokenization, normalization, entity and relationship extraction, and semantic analysis and disambiguation. The analysis engine uses a finite-state machine approach at multiple levels, which aids its performance characteristics while maintaining a reasonably small footprint. The behaviour of the system is driven by a set of configurable lexico-semantic resources which describe the characteristics and domain of the processed language. A default set of resources comes as part of LanguageWare and these describe the native language characteristics, such as morphology, and the basic vocabulary for the language. Supplemental resources have been created that capture additional vocabularies, terminologies, rules and grammars, which may be generic to the language or specific to one or more domains. A set of Eclipse-based customization tooling, LanguageWare Resource Workbench, is available on IBM's alphaWorks site, and allows domain knowledge to be compiled into these resources and thereby incorporated into the analysis process. LanguageWare can be deployed as a set of UIMA-compliant annotators, Eclipse plug-ins or Web Services.

    Read more →
  • Quantum Artificial Intelligence Lab

    Quantum Artificial Intelligence Lab

    The Quantum Artificial Intelligence Lab (also called the Quantum AI Lab or QuAIL) is a joint initiative of NASA, Universities Space Research Association, and Google (specifically, Google Research) whose goal is to pioneer research on how quantum computing might help with machine learning and other difficult computer science problems. The lab is hosted at NASA's Ames Research Center. == History == The Quantum AI Lab was announced by Google Research in a blog post on May 16, 2013. At the time of launch, the Lab was using the most advanced commercially available quantum computer, D-Wave Two from D-Wave Systems. On October 10, 2013, Google released a short film describing the current state of the Quantum AI Lab. On October 18, 2013, Google announced that it had incorporated quantum physics into Minecraft. In January 2014, Google reported results comparing the performance of the D-Wave Two in the lab with that of classical computers. The results were ambiguous and provoked heated discussion on the Internet. On 2 September 2014, it was announced that the Google Quantum AI Lab, in partnership with UC Santa Barbara, would be launching an initiative to create quantum information processors based on superconducting electronics. On the 23rd of October 2019, the Quantum AI Lab announced in a paper that it had achieved quantum supremacy with their Sycamore processor. The claim of quantum supremacy achievement has since been debated, with a far more accurate simulation on a classical computer being possible in 2.5 days as a conservative estimate. == Present == On December 9, 2024, Google introduced the Willow processor, describing it as a "state-of-the-art quantum chip". Google claims that this new chip takes just five minutes to solve a problem that takes traditional supercomputers ten septillion years. However, experts say Willow is, for now, a largely experimental device.

    Read more →
  • DeepSeek

    DeepSeek

    Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., doing business as DeepSeek, is a Chinese artificial intelligence (AI) company that develops large language models (LLMs). Based in Hangzhou, Zhejiang, DeepSeek is owned and funded by High-Flyer, a Chinese hedge fund. DeepSeek was founded in July 2023 by Liang Wenfeng, the co-founder of High-Flyer, who also serves as the CEO for both of the companies. The company launched an eponymous chatbot alongside its DeepSeek-R1 model in January 2025. DeepSeek-R1 provided responses comparable to other contemporary large language models, such as OpenAI's GPT-4 and o1. Its training cost was reported to be significantly lower than other LLMs. The company claims that it trained its V3 model for US$6 million—far less than the US$100 million cost for OpenAI's GPT-4 in 2023—and using approximately one-tenth the computing power consumed by Meta's comparable model, Llama 3.1. DeepSeek's success against larger and more established rivals has been described as "upending AI". DeepSeek's models are described as "open-weight", meaning the exact parameters are openly shared, but the training data is not openly licensed. Since the January 2025 debut of DeepSeek-R1, the company has made its new models available under free and open-source software licenses, primarily the MIT License. The company reportedly recruits AI researchers from top Chinese universities and also hires from outside traditional computer science fields to broaden its models' knowledge and capabilities. DeepSeek significantly reduced training expenses for their R1 model by incorporating techniques such as mixture of experts (MoE) layers. The company also trained its models during ongoing trade restrictions on AI chip exports to China, using weaker AI chips intended for export and employing fewer units overall. Observers say this breakthrough sent "shock waves" through the industry which were described as triggering a "Sputnik moment" for the US in the field of artificial intelligence, particularly due to its open-source, cost-effective, and high-performing AI models. This threatened established AI hardware leaders such as Nvidia; Nvidia's share price dropped sharply, losing US$600 billion in market value, the largest single-company decline in U.S. stock market history. == History == === Founding and early years (2016–2023) === In February 2016, High-Flyer was co-founded by AI enthusiast Liang Wenfeng, who had been trading since the 2008 financial crisis while attending Zhejiang University. The company began stock trading using a GPU-dependent deep learning model on 21 October 2016; before then, it had used CPU-based linear models. By the end of 2017, most of its trading was driven by AI. Liang established High-Flyer as a hedge fund focused on developing and using AI trading algorithms, and by 2021 the firm was using AI exclusively, often using Nvidia chips. In 2019, the company began constructing its first computing cluster, Fire-Flyer, at a cost of 200 million yuan; it contained 1,100 GPUs interconnected at 200 Gbit/s and was retired after 1.5 years in operation. By 2021, Liang had started buying large quantities of Nvidia GPUs for an AI project, reportedly obtaining 10,000 Nvidia A100 GPUs before the United States restricted chip sales to China. Computing cluster Fire-Flyer 2 began construction in 2021 with a budget of 1 billion yuan. It was reported that in 2022, Fire-Flyer 2's capacity had been used at over 96%, totaling 56.74 million GPU hours. 27% was used to support scientific computing outside the company. During 2022, Fire-Flyer 2 had 5,000 PCIe A100 GPUs in 625 nodes, each containing 8 GPUs. At the time, it exclusively used PCIe instead of the DGX version of A100, since at the time the models it trained could fit within a single 40 GB GPU VRAM and so there was no need for the higher bandwidth of DGX (i.e., it required only data parallelism but not model parallelism). Later, it incorporated NVLinks and NCCL (Nvidia Collective Communications Library) to train larger models that required model parallelism. On 14 April 2023, High-Flyer announced the launch of an artificial general intelligence (AGI) research lab, stating that the new lab would focus on developing AI tools unrelated to the firm's financial business. Two months later, on 17 July 2023, that lab was spun off into an independent company, DeepSeek, with High-Flyer as its principal investor and backer. Venture capital investors were reluctant to provide funding, as they considered it unlikely that the venture would be able to quickly generate an "exit". === Model releases since 2023 === DeepSeek released its first model, DeepSeek Coder, on 2 November 2023, followed by the DeepSeek-LLM series on 29 November 2023. In January 2024, it released two DeepSeek-MoE models (Base and Chat), and in April 3 DeepSeek-Math models (Base, Instruct, and RL). DeepSeek-V2 was released in May 2024, followed a month later by the DeepSeek-Coder V2 series. In September 2024, DeepSeek V2.5 was introduced and revised in December. On 20 November 2024, the preview of DeepSeek-R1-Lite became available via chat. In December, DeepSeek-V3-Base and DeepSeek-V3 (chat) were released. On 20 January 2025, DeepSeek launched the DeepSeek chatbot—based on the DeepSeek-R1 model—free for iOS and Android. By 27 January, DeepSeek surpassed ChatGPT as the most downloaded freeware app on the iOS App Store in the United States, triggering an 18% drop in Nvidia's share price. On 24 March 2025, DeepSeek released DeepSeek-V3-0324 under the MIT License. On 28 May 2025, DeepSeek released DeepSeek-R1-0528 under the MIT License. The model has been noted for more tightly following official Chinese Communist Party ideology and censorship in its answers to questions than prior models. On 21 August 2025, DeepSeek released DeepSeek V3.1 under the MIT License. This model features a hybrid architecture with thinking and non-thinking modes. It also surpasses prior models like V3 and R1, by over 40% on certain benchmarks like SWE-bench and Terminal-bench. It was updated to V3.1-Terminus on 22 September 2025. V3.2-Exp was released on 29 September 2025. It uses DeepSeek Sparse Attention, a more efficient attention mechanism based on previous research published in February. DeepSeek-V3.2 was released on 1 December 2025, alongside a DeepSeek-V3.2-Speciale variant that focused on reasoning. In February 2026, Anthropic accused DeepSeek of using thousands of fraudulent accounts to generate millions of conversations with Claude to train its own large language models. In April 2026, investors began speaking with DeepSeek for a $300 million funding round, which would bring DeepSeek to a total valuation of $10 billion. On April 24, 2026, DeepSeek released a preview of its V4 series, including the 1.6-trillion parameter DeepSeek-V4-Pro and the 284-billion parameter DeepSeek-V4-Flash, both featuring a 1-million token context window, under the MIT License. DeepSeek's V4 LLM has been adopted by key semiconductor manufacturers and artificial intelligence chipmakers such as Huawei and Cambricon. == Company operation == DeepSeek is headquartered in Hangzhou, Zhejiang, and is owned and funded by High-Flyer. Its co-founder, Liang Wenfeng, serves as CEO. As of May 2024, Liang personally held an 84% stake in DeepSeek through two shell corporations. === Strategy === DeepSeek has stated that it focuses on research and does not have immediate plans for commercialization. This posture also means it can skirt certain provisions of China's AI regulations aimed at consumer-facing technologies. DeepSeek's hiring approach emphasizes skills over lengthy work experience, resulting in many hires fresh out of university. The company likewise recruits individuals without computer science backgrounds to expand the range of expertise incorporated into the models, for instance in poetry or advanced mathematics. According to The New York Times, dozens of DeepSeek researchers have or have previously had affiliations with People's Liberation Army laboratories and the Seven Sons of National Defence. Due to the impact of United States restrictions on chips, DeepSeek refined its algorithms to maximise computational efficiency and thereby leveraged older hardware and reduced energy consumption. DeepSeek also expanded on the African continent as it offers more affordable and less power-hungry AI solutions. The company has bolstered African language models and generated a number of startups, for example in Nairobi. Along with Huawei's storage and cloud computing services, the impact on the tech scene in sub-saharan Africa is considerable. DeepSeek offers local data sovereignty and more flexibility compared to Western AI platforms. == Training framework == High-Flyer/DeepSeek had operated at least two primary computing clusters: Fire-Flyer (萤火一号) and Fire-Flyer 2 (萤火二号). Fire-Flyer 1 was constructed in 2019 and was retired after 1.5 years of operation. Fi

    Read more →