Lübke English

Lübke English

The term Lübke English (or, in German, Lübke-Englisch) refers to nonsensical English created by literal word-by-word translation of German phrases, disregarding differences between the languages in syntax and meaning. Lübke English is named after Heinrich Lübke, a president of Germany in the 1960s, whose limited English made him a target of German humorists. In 2006, the German magazine konkret revealed that most of the statements ascribed to Lübke were in fact invented by the editorship of Der Spiegel, mainly by staff writer Ernst Goyke and subsequent letters to the editor. In the 1980s, comedian Otto Waalkes had a routine called "English for Runaways", which is a nonsensical literal translation of Englisch für Fortgeschrittene (actually an idiom for 'English for advanced speakers' in German – note that fortschreiten divides into fort, meaning "away" or "forward", and schreiten, meaning "to walk in steps"). In this mock "course", he translates every sentence back or forth between English and German at least once (usually from German literally into English). Though there are also other, more complex language puns, the title of this routine has gradually replaced the term Lübke English when a German speaker wants to point out naive literal translations.

Starlight Information Visualization System

Starlight is a software product originally developed at Pacific Northwest National Laboratory and now by Future Point Systems. It is an advanced visual analysis environment. In addition to using information visualization to show the importance of individual pieces of data by showing how they relate to one another, it also contains a small suite of tools useful for collaboration and data sharing, as well as data conversion, processing, augmentation and loading. The software, originally developed for the intelligence community, allows users to load data from XML files, databases, RSS feeds, web services, HTML files, Microsoft Word, PowerPoint, Excel, CSV, Adobe PDF, TXT files, etc. and analyze it with a variety of visualizations and tools. The system integrates structured, unstructured, geospatial, and multimedia data, offering comparisons of information at multiple levels of abstraction, simultaneously and in near real-time. In addition Starlight allows users to build their own named entity-extractors using a combination of algorithms, targeted normalization lists and regular expressions in the Starlight Data Engineer (SDE). As an example, Starlight might be used to look for correlations in a database containing records about chemical spills. An analyst could begin by grouping records according to the cause of the spill to reveal general trends. Sorting the data a second time, they could apply different colors based on related details such as the company responsible, age of equipment or geographic location. Maps and photographs could be integrated into the display, making it even easier to recognize connections among multiple variables. Starlight has been deployed to both the Iraq and Afghanistan wars and used on a number of large-scale projects. PNNL began developing Starlight in the mid-1990s, with funding from the Land Information Warfare Agency, a part of the Army Intelligence and Security Command and continued developed at the laboratory with funding from the NSA and the CIA. Starlight integrates visual representations of reports, radio transcripts, radar signals, maps and other information. The software system was recently honored with an R&D 100 Award for technical innovation. In 2006 Future Point Systems, a Silicon Valley startup, acquired rights to jointly develop and distribute the Starlight product in cooperation with the Pacific Northwest National Laboratory. The software is now also used outside of the military/intelligence communities in a number of commercial environments.

ChromaDB

Chroma or ChromaDB is open-source data infrastructure tailored to applications with large language models. Its headquarters are in San Francisco. In April 2023, it raised 18 million US dollars as seed funding. ChromaDB has been used in academic studies on artificial intelligence, particularly as part of the tech stack for retrieval-augmented generation.

Chatbot

A chatbot (originally chatterbot) is a software application or web interface designed to converse through text or speech. Modern chatbots are typically online and use generative artificial intelligence systems that are capable of maintaining a conversation with a user in natural language and simulating the way a human would behave as a conversational partner. Such chatbots often use deep learning and natural language processing. Simpler chatbots have existed for decades. Chatbots have gained popularity during the AI boom of the 2020s, with the releases of generative AI chatbots such as ChatGPT, Gemini, Claude, and Grok. These chatbots typically use fine-tuned large language models to generate text. A major area where chatbots have long been used is customer service and support, with various sorts of virtual assistants. == History == === Turing test === In 1950, Alan Turing published an article entitled "Computing Machinery and Intelligence" in which he proposed what is now called the Turing test as a criterion of intelligence. This criterion depends on the ability of a computer program to impersonate a human in a real-time written conversation with a human judge, to the extent that the judge is incapable of reliably distinguishing, on the basis of the conversational content alone, between the program and a real human. === Early chatbots === Joseph Weizenbaum's program ELIZA was first published in 1966. Weizenbaum did not claim that ELIZA was genuinely intelligent, and the introduction to his paper presented it more as a debunking exercise:In artificial intelligence, machines are made to behave in wondrous ways, often sufficient to dazzle even the most experienced observer. But once a particular program is unmasked, once its inner workings are explained, its magic crumbles away; it stands revealed as a mere collection of procedures. The observer says to himself "I could have written that". With that thought, he moves the program in question from the shelf marked "intelligent", to that reserved for curios. The object of this paper is to cause just such a re-evaluation of the program about to be "explained". Few programs ever needed it more. ELIZA's key method of operation involves the recognition of clue words or phrases in the input, and the output of the corresponding pre-prepared or pre-programmed responses that can move the conversation forward in an apparently meaningful way (e.g. by responding to any input that contains the word 'MOTHER' with 'TELL ME MORE ABOUT YOUR FAMILY'). Thus an illusion of understanding is generated, even though the processing involved has been merely superficial. ELIZA showed that such an illusion is surprisingly easy to generate because human judges are ready to give the benefit of the doubt when conversational responses are capable of being interpreted as "intelligent". Following ELIZA, psychiatrist Kenneth Colby developed PARRY in 1972. From 1978 to some time after 1983, the CYRUS project led by Janet Kolodner constructed a chatbot simulating Cyrus Vance (57th United States Secretary of State). It used case-based reasoning, and updated its database daily by parsing wire news from United Press International. The program was unable to process the news items subsequent to the surprise resignation of Cyrus Vance in April 1980, and the team constructed another chatbot simulating his successor, Edmund Muskie. In 1984, an interactive version of the program Racter was released which acted as a chatbot. A.L.I.C.E. was released in 1995. This uses a markup language called AIML, which is specific to its function as a conversational agent, and has since been adopted by various other developers of, so-called, Alicebots. A.L.I.C.E. is a weak AI without any reasoning capabilities. It is based on a similar pattern matching technique as ELIZA in 1966. This is not strong AI, which would require sapience and logical reasoning abilities. Jabberwacky, released in 1997, learns new responses and context based on real-time user interactions, rather than being driven from a static database. Chatbot competitions focus on the Turing test or more specific goals. Two such annual contests are the Loebner Prize and The Chatterbox Challenge (the latter has been offline since 2015, however, materials can still be found from web archives). Pre-dating the current generation of large language models, Gavagai, a Swedish language technology startup, created a Twitter-based bot in 2015 and DBpedia created a chatbot during the 2017 Google Summer of Code that communicated through Facebook Messenger. === Modern chatbots based on large language models === Modern chatbots like ChatGPT are often based on foundational large language models called generative pre-trained transformers (GPT). They are based on a deep learning architecture called the transformer, which contains artificial neural networks. They generate text after being trained on a large text corpus, and have emergent abilities that they are not specifically trained for. Chatbots integrated into apps and websites can call image-generation models or search the web. Some platforms also enable users to interact with conversational interfaces directly through web-based chat environments, allowing real-time assistance, content generation, and task automation without requiring software installation. == Application == === Messaging apps === Many companies' chatbots run on messaging apps or simply via SMS. They are used for B2C customer service, sales and marketing. In 2016, Facebook Messenger allowed developers to place chatbots on their platform. There were 30,000 bots created for Messenger in the first six months, rising to 100,000 by September 2017. Since September 2017, this has also been as part of a pilot program on WhatsApp. Airlines KLM and Aeroméxico both announced their participation in the testing; both airlines had previously launched customer services on the Facebook Messenger platform. The bots usually appear as one of the user's contacts, but can sometimes act as participants in a group chat. Many banks, insurers, media companies, e-commerce companies, airlines, hotel chains, retailers, health care providers, government entities, and restaurant chains have used chatbots to answer simple questions, increase customer engagement, for promotion, and to offer additional ways to order from them. Chatbots are also used in market research to collect short survey responses. A 2017 study showed 4% of companies used chatbots. In a 2016 study, 80% of businesses said they intended to have one by 2020. ==== As part of company apps and websites ==== Previous generations of chatbots were present on company websites, e.g. Ask Jenn from Alaska Airlines which debuted in 2008 or Expedia's virtual customer service agent which launched in 2011. The newer generation of chatbots includes IBM Watson-powered "Rocky", introduced in February 2017 by the New York City-based e-commerce company Rare Carat to provide information to prospective diamond buyers. ==== Chatbot sequences ==== Used by marketers to script sequences of messages, very similar to an autoresponder sequence. Such sequences can be triggered by user opt-in or the use of keywords within user interactions. After a trigger occurs a sequence of messages is delivered until the next anticipated user response. Each user response is used in the decision tree to help the chatbot navigate the response sequences to deliver the correct response message. === Company internal platforms === Companies have used chatbots for customer support, human resources, or in Internet-of-Things (IoT) projects. Overstock.com, for one, has reportedly launched a chatbot named Mila to attempt to automate certain processes when customer service employees request sick leave. Other large companies such as Lloyds Banking Group, Royal Bank of Scotland, Renault and Citroën are now using chatbots instead of call centres with humans to provide a first point of contact. In large companies, like in hospitals and aviation organizations, chatbots are also used to share information within organizations, and to assist and replace service desks. === Customer service === Chatbots have been proposed as a replacement for customer service departments. In 2026, The Financial Times reported on agentic chatbots that could do shopping for customers once given instructions. In 2016, Russia-based Tochka Bank launched a chatbot on Facebook for a range of financial services, including a possibility of making payments. In July 2016, Barclays Africa also launched a Facebook chatbot. === Healthcare === Chatbots are also appearing in the healthcare industry. A study suggested that physicians in the United States believed that chatbots would be most beneficial for scheduling doctor appointments, locating health clinics, or providing medication information. A 2025 review found that participants often rated chatbot responses as more empathic than those from clinicians. In 2020, WhatsApp worked with th

Text-to-video model

A text-to-video model is a form of generative artificial intelligence that uses a natural language description as input to produce a video relevant to the input text. Advancements during the 2020s in the generation of high-quality, text-conditioned videos have largely been driven by the development of video diffusion models. == Models == There are different models, including open source models. Chinese-language input CogVideo is the earliest text-to-video model "of 9.4 billion parameters" to be developed, with its demo version of open source codes first presented on GitHub in 2022. That year, Meta Platforms released a partial text-to-video model called "Make-A-Video", and Google's Brain (later Google DeepMind) introduced Imagen Video, a text-to-video model with 3D U-Net. === 2023 === In February 2023, Runway released Gen-1 and Gen-2, among the first commercially available text-to-video and video-to-video models accessible to the public through a web interface. Gen-1, initially released as a video-to-video model, allowed users to transform existing video footage using text or image prompts. Gen-2, introduced in March 2023 and made publicly available in June 2023, added text-to-video capabilities, enabling users to generate videos from text prompts alone. In March 2023, a research paper titled "VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation" was published, presenting a novel approach to video generation. The VideoFusion model decomposes the diffusion process into two components: base noise and residual noise, which are shared across frames to ensure temporal coherence. By utilizing a pre-trained image diffusion model as a base generator, the model efficiently generated high-quality and coherent videos. Fine-tuning the pre-trained model on video data addressed the domain gap between image and video data, enhancing the model's ability to produce realistic and consistent video sequences. In the same month, Adobe introduced Firefly AI as part of its features. === 2024 === In January 2024, Google announced development of a text-to-video model named Lumiere which is anticipated to integrate advanced video editing capabilities. Matthias Niessner and Lourdes Agapito at AI company Synthesia work on developing 3D neural rendering techniques that can synthesise realistic video by using 2D and 3D neural representations of shape, appearances, and motion for controllable video synthesis of avatars. In June 2024, Luma Labs launched its Dream Machine video tool. That same month, Kuaishou extended its Kling AI text-to-video model to international users. In July 2024, TikTok owner ByteDance released Jimeng AI in China, through its subsidiary, Faceu Technology. By September 2024, the Chinese AI company MiniMax debuted its video-01 model, joining other established AI model companies like Zhipu AI, Baichuan, and Moonshot AI, which contribute to China's involvement in AI technology. In December 2024 Lightricks launched LTX Video as an open source model. === 2025 === Alternative approaches to text-to-video models include Google's Phenaki, Hour One, Colossyan, Runway's Gen-3 Alpha, and OpenAI's Sora, Several additional text-to-video models, such as Plug-and-Play, Text2LIVE, and TuneAVideo, have emerged. FLUX.1 developer Black Forest Labs has announced its text-to-video model SOTA. Google was preparing to launch a video generation tool named Veo for YouTube Shorts in 2025. In May 2025, Google launched the Veo 3 iteration of the model. It was noted for its impressive audio generation capabilities, which were a previous limitation for text-to-video models. In July 2025 Lightricks released an update to LTX Video capable of generating clips reaching 60 seconds, and in October 2025 it released LTX-2, with audio capabilities built in. === 2026 === In February 2026, ByteDance released Seedance 2.0, it was noted for its impressive realistic generation, motion and camera control and 15 second generation, however the model faced huge critiscism from Motion Picture Association for copyright infringement. After viewing a viral clip of a fight between actors Brad Pitt and Tom Cruise, Rhett Reese, who is the co-writer of Deadpool & Wolverine and Zombieland announced that on social media "I hate to say it. It’s likely over for us," further stating that "In next to no time, one person is going to be able to sit at a computer and create a movie indistinguishable from what Hollywood now releases." == Architecture and training == There are several architectures that have been used to create text-to-video models. Similar to text-to-image models, these models can be trained using Recurrent Neural Networks (RNNs) such as long short-term memory (LSTM) networks, which has been used for Pixel Transformation Models and Stochastic Video Generation Models, which aid in consistency and realism respectively. An alternative for these include transformer models. Generative adversarial networks (GANs), Variational autoencoders (VAEs), — which can aid in the prediction of human motion — and diffusion models have also been used to develop the image generation aspects of the model. Text-video datasets used to train models include, but are not limited to, WebVid-10M, HDVILA-100M, CCV, ActivityNet, and Panda-70M. These datasets contain millions of original videos of interest, generated videos, captioned-videos, and textual information that help train models for accuracy. Text-video datasets used to train models include, but are not limited to PromptSource, DiffusionDB, and VidProM. These datasets provide the range of text inputs needed to teach models how to interpret a variety of textual prompts. The video generation process involves synchronizing the text inputs with video frames, ensuring alignment and consistency throughout the sequence. This predictive process is subject to decline in quality as the length of the video increases due to resource limitations. The Will Smith Eating Spaghetti test is a benchmark for models. == Limitations == Despite the rapid evolution of text-to-video models in their performance, a primary limitation is that they are very computationally heavy which limits its capacity to provide high quality and lengthy outputs. Additionally, these models require a large amount of specific training data to be able to generate high quality and coherent outputs, which brings about the issue of accessibility. Moreover, models may misinterpret textual prompts, resulting in video outputs that deviate from the intended meaning. This can occur due to limitations in capturing semantic context embedded in text, which affects the model's ability to align generated video with the user's intended message. Various models, including Make-A-Video, Imagen Video, Phenaki, CogVideo, GODIVA, and NUWA, are currently being tested and refined to enhance their alignment capabilities and overall performance in text-to-video generation. Another issue with the outputs is that text or fine details in AI-generated videos often appear garbled, a problem that stable diffusion models also struggle with. Examples include distorted hands and unreadable text. == Ethics == The deployment of text-to-video models raises ethical considerations related to content generation. These models have the potential to create inappropriate or unauthorized content, including explicit material, graphic violence, misinformation, and likenesses of real individuals without consent. Ensuring that AI-generated content complies with established standards for safe and ethical usage is essential, as content generated by these models may not always be easily identified as harmful or misleading. The ability of AI to recognize and filter out NSFW or copyrighted content remains an ongoing challenge, with implications for both creators and audiences. == Impacts and applications == Text-to-video models offer a broad range of applications that may benefit various fields, from educational and promotional to creative industries. These models can streamline content creation for training videos, movie previews, gaming assets, and visualizations, making it easier to generate content. During the Russo-Ukrainian war, fake videos made with artificial intelligence were created as part of a propaganda war against Ukraine and shared in social media. These included depictions of children in the Ukrainian Armed Forces, fake ads targeting children encouraging them to denounce critics of the Ukrainian government, or fictitious statements by Ukrainian President Volodymyr Zelenskyy about the country's surrender, among others. === Movies === Kaur vs Kore is the first Indian feature film made using generative AI which features dual role for the AI character of Sunny Leone, set to release in 2026. Chiranjeevi Hanuman – The Eternal is an Indian movie made entirely using Generative AI created by Vijay Subramaniam which is set for theatrical release in 2026. The movie was widely criticised by the Film makers in the Bollywood industr

Knowledge integration

Knowledge integration is the process of synthesizing multiple knowledge models (or representations) into a common model (representation). Compared to information integration, which involves merging information having different schemas and representation models, knowledge integration focuses more on synthesizing the understanding of a given subject from different perspectives. For example, multiple interpretations are possible of a set of student grades, typically each from a certain perspective. An overall, integrated view and understanding of this information can be achieved if these interpretations can be put under a common model, say, a student performance index. The Web-based Inquiry Science Environment (WISE), from the University of California at Berkeley has been developed along the lines of knowledge integration theory. Knowledge integration has also been studied as the process of incorporating new information into a body of existing knowledge with an interdisciplinary approach. This process involves determining how the new information and the existing knowledge interact, how existing knowledge should be modified to accommodate the new information, and how the new information should be modified in light of the existing knowledge. A learning agent that actively investigates the consequences of new information can detect and exploit a variety of learning opportunities; e.g., to resolve knowledge conflicts and to fill knowledge gaps. By exploiting these learning opportunities the learning agent is able to learn beyond the explicit content of the new information. The machine learning program KI, developed by Murray and Porter at the University of Texas at Austin, was created to study the use of automated and semi-automated knowledge integration to assist knowledge engineers constructing a large knowledge base. A possible technique which can be used is semantic matching. More recently, a technique useful to minimize the effort in mapping validation and visualization has been presented which is based on Minimal Mappings. Minimal mappings are high quality mappings such that i) all the other mappings can be computed from them in time linear in the size of the input graphs, and ii) none of them can be dropped without losing property i). The University of Waterloo operates a Bachelor of Knowledge Integration undergraduate degree program as an academic major or minor. The program started in 2008.

Powerset (company)

Powerset was an American company based in San Francisco, California, that, in 2006, was developing a natural language search engine for the Internet. On July 1, 2008, Powerset was acquired by Microsoft for an estimated $100 million (~$143 million in 2024). Powerset was working on building a natural language search engine that could find targeted answers to user questions (as opposed to keyword based search). For example, when confronted with a question like "Which U.S. state has the highest income tax?", conventional search engines ignore the question phrasing and instead do a search on the keywords "state", "highest", "income", and "tax". Powerset on the other hand, attempts to use natural language processing to understand the nature of the question and return pages containing the answer. The company was in the process of "building a natural language search engine that reads and understands every sentence on the Web". The company has licensed natural language technology from PARC, the former Xerox Palo Alto Research Center. On May 11, 2008, the company unveiled a tool for searching a fixed subset of English Wikipedia using conversational phrases rather than keywords. Acquisition by Microsoft: One significant milestone in Powerset's history was its acquisition by Microsoft on July 1, 2008, for an estimated $100 million. This acquisition was part of Microsoft's broader strategy to enhance its search capabilities and compete more effectively with other search engine providers, particularly Google. Natural Language Search Engine: Powerset's primary focus was on developing a natural language search engine capable of understanding and interpreting user queries in a more human-like manner. Instead of simply matching keywords, Powerset aimed to comprehend the meaning behind the words, allowing for more accurate and contextually relevant search results. Technology and Partnerships: Powerset had licensed natural language technology from PARC, the Xerox Palo Alto Research Center. This technology likely played a crucial role in the development of Powerset's NLP capabilities. Wikipedia Search Tool: In May 2008, Powerset unveiled a search tool that allowed users to search a fixed subset of English Wikipedia using conversational phrases rather than traditional keywords. This demonstrated the potential of Powerset's NLP technology in providing more precise and relevant search results. == Powerlabs == In a form of beta testing, Powerset opened an online community called Powerlabs on September 17, 2007. Business Week said: "The company hopes the site will marshal thousands of people to help build and improve its search engine before it goes public next year." Said The New York Times: "[Powerset Labs] goes far beyond the 'alpha' or 'beta' testing involved in most software projects, when users put a new product through rigorous testing to find its flaws. Powerset doesn’t have a product yet, but rather a collection of promising natural language technologies, which are the fruit of years of research at Xerox PARC." Powerlabs' initial search results are taken from Wikipedia. == Notable people == Barney Pell (born March 18, 1968, in Hollywood, California) was co-founder and CEO of Powerset. Pell received his Bachelor of Science degree in symbolic systems from Stanford University in 1989, where he graduated Phi Beta Kappa and was a National Merit Scholar. Pell received a PhD in computer science from Cambridge University in 1993, where he was a Marshall Scholar. He has worked at NASA, as chief strategist and vice president of business development at StockMaster.com (acquired by Red Herring in March, 2000) and at Whizbang! Labs. Prior to joining Powerset, Pell was an Entrepreneur-in-Residence at Mayfield Fund, a venture capital firm in Silicon Valley. Pell is also a founder of Moon Express, Inc., a U.S. company awarded a $10M commercial lunar contract by NASA and a competitor in the Google Lunar X PRIZE. Steve Newcomb was the COO and co-founder of Powerset. Prior to joining Powerset, he was a co-founder of Loudfire, General Manager at Promptu, and was on the board of directors at Jaxtr. He left Powerset in October 2007 to form Virgance, a social startup incubator. Lorenzo Thione (born in Como, Italy) was the product architect and co-founder of Powerset. Prior to joining Powerset, he worked at FXPAL in natural language processing and related research fields. Thione earned his master's degree in software engineering from the University of Texas at Austin. Ronald Kaplan, former manager of research in Natural Language Theory and Technology at PARC, served as the company's CTO and CSO. Ryan Ferrier is a member of the founding team of Powerset. He managed personnel and internal operations. After 2008 he went on to co-found Serious Business, which made Facebook applications and was later bought by Zynga. Another Powerset alumnus, Alex Le, became CTO of Serious Business and went on to become an executive producer at Zynga when it bought the company. Siqi Chen founded a stealth startup in mobile computing after leaving Powerset. Tom Preston-Werner worked at Powerset and left after the acquisition to found GitHub. == Investors == Powerset attracted a wide range of investors, many of whom had considerable experience in the venture capital field. The company received $12.5 million (~$18.2 million in 2024) in Series A funding during November 2007, co-led by the venture capital firms Foundation Capital and The Founders Fund. Among the better-known investors: Esther Dyson, founding chairman of ICANN, founder of the newsletter Release 1.0 and editor at Cnet Peter Thiel, founder and former CEO of PayPal Luke Nosek, founder of PayPal Todd Parker. Managing Partner, Hidden River Ventures Reid Hoffman, executive vice president of PayPal and founder of LinkedIn First Round Capital, seed-stage venture firm