AI Data Visualization Tools

AI Data Visualization Tools — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Alexander Y. Tetelbaum

    Alexander Y. Tetelbaum

    Alexander Y. Tetelbaum (born August 16, 1948) is a Ukrainian American computer scientist, inventor, and academic who has contributed to electronic design automation (EDA) and artificial intelligence (AI) since the late 1960s; and holds 46 U.S. patents in EDA and related fields. Tetelbaum is the founding president of International Solomon University, the first Jewish university in Ukraine, established during a period of renewed efforts to address antisemitism in Ukraine. == Early life and education == He graduated from a Kyiv mathematical high school with a silver medal in 1966. Tetelbaum enrolled at the Kyiv Polytechnic Institute (KPI), now National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" in 1966, graduating in 1972 with an MS in Electronics with honors. He earned his PhD in Electrical and Computer Engineering from KPI in 1975, with a dissertation on electronic design automation, and his Doctor of Engineering Science in 1986. == Academic career == Tetelbaum began his academic career at KPI in 1973 as a junior scientist, becoming a professor in the Computer and Electrical Engineering Department in 1980. Later, he founded and served as president of International Solomon University in Kyiv from 1991 to 1996, the first Jewish university in Ukraine. The university became a major academic center for computer science and Jewish studies in the post-Soviet era. He was a visiting and adjunct professor at Michigan State University from 1993 to 1996. == Professional career == Tetelbaum worked as an engineer at the Kiev Institute of Cybernetics from 1972 to 1973, and later, he led the Design Automation Lab at Kyiv Polytechnic Institute from 1975 to 1987. In the United States, he served as EDA manager at Silicon Graphics Corporation from 1996 to 1998 and principal engineer at LSI Corporation from 1998 to 2012. He founded and served as CEO of Abelite Design Automation, Inc., from 2012 to 2022. == Contributions in computer science == Tetelbaum has contributed to electronic design automation (EDA) and artificial intelligence (AI) since the 1960s. His early work included methods for EDA, particularly physical design automation and mathematical optimization; and he developed force-directed placement and topological routing methods. Tetelbaum generalized Rent's rule for hierarchical systems and large blocks, proposing a graph-based framework that extends applicability to arbitrary partition sizes with improved accuracy. Additional IEEE and related conference contributions from the mid-1990s include: "Path Search for Complicated Function", 1995 IEEE International Symposium on Circuits and Systems "A Performance-driven Placement Approach of Standard Cells" (International Conference on Intelligent Systems, 1995) "Framework of a New Methodology for Behavioral to Physical Design Linkage" (38th Midwest Symposium on Circuits and Systems, 1996) Statistical timing design and variations Test Methodologies These and other works and patents contributed to timing-driven placement, crosstalk reduction, clock tree synthesis, and interconnect optimization in VLSI design. == Patents == Tetelbaum holds 46 U.S. patents in EDA and related fields. Notable examples include: For the full list of patents, see Justia Patents or Google Patents. == Publications == === Early publications in the Soviet Union === Before the appearance of American books on electronic design automation (EDA), Tetelbaum published several scientific books and monographs on the subject in Russian/Ukrainian. Electronic Design Automation, Kiev: Znanie Publisher, 1975. Planar Design of Electronic Circuits, Kiev: Znanie Publisher, 1977. Formal Design of Computer Systems, Moscow: Sovetskoe Radio, 1979. CAD of Electronic Equipment: Topological Approach, Kiev: Vyssha Shkola, 1980; 2nd ed. 1981. Automated Design of Electronic Circuits (1981) CAD of VLSI Circuits, Kiev: Vyssha Shkola, 1983. Topological Algorithms of Multilayer Printed Circuit Boards Routing, Moscow: Radio i Svyaz, 1983. CAD of VLSI Circuits on Master Slice Chips, Moscow: Radio i Svyaz, 1988. Increasing the Effectiveness of CAD Systems, Kiev: UMKVO, 1991. === Scientific Monographs (English) === Minimum Number of Timing Signoff Corners (2022) Interviewing AI (2026) The AI Debate (2026) New Nostradamus Predictions: 2026: The Next Decade & Beyond (2035–2050+) (2026) For a consolidated record of Tetelbaum's publications, see Alexander Y. Tetelbaum, Wikidata Q4720205. === Other publications === Tetelbaum also published educational books on problem-solving methods: Yes-No Puzzles-Games Puzzle Games for Kids Solving Non-Standard Problems Solving Non-Standard Very Hard Problems Additionally, Tetelbaum published three thrillers: Omerta Operations Executive Director Eruption Yacht Finally, he published his memoir and an entertaining book: Unfinished Equations Artificially Intelligent Humor

    Read more →
  • Mistral Vibe

    Mistral Vibe

    Mistral Vibe or Vibe (Le Chat until May 2026), is a chatbot that uses generative artificial intelligence developed in France by Mistral AI. Mistral Vibe is available in iOS and Android. Its services are operated on a freemium model. == History == In February 2024, Mistral AI released Le Chat. In January 2025, Mistral AI made a content deal with Agence France-Presse (AFP) that lets Le Chat query AFP's entire archive dating back to 1983. On 6 February 2025, a mobile app for Le Chat was released for iOS and Android, and a subscription tier, Pro, was introduced at a cost of $14.99 per month. In July 2025, Mistral AI released Voxtral, an open-source language model that understands and generates audio. Mistral introduced a voice mode for chatting that uses Voxtral, and projects, which allows grouping chats and files. In September 2025, Le Chat introduced the capability to remember previous conversations. In May 2026, Mistral AI announced the rebrand from Le Chat to Mistral Vibe and new features were introduced at the same time.

    Read more →
  • Predictive text

    Predictive text

    Predictive text is an input technology used where one key or button represents many letters, such as on the physical numeric keypads of mobile phones and in accessibility technologies. Each key press results in a prediction rather than repeatedly sequencing through the same group of "letters" it represents, in the same, invariable order. Predictive text could allow for an entire word to be input by a single keypress. Predictive text makes efficient use of fewer device keys to input writing into a text message, an e-mail, an address book, a calendar, and the like. The most widely used, general, predictive text systems are T9, iTap, eZiText, and LetterWise/WordWise. There are many ways to build a device that predicts text, but all predictive text systems have initial linguistic settings that offer predictions that are re-prioritized to adapt to each user. This learning adapts, by way of the device memory, to a user's disambiguating feedback that results in corrective key presses, such as pressing a "next" key to get to the intention. Most predictive text systems have a user database to facilitate this process. Theoretically the number of keystrokes required per desired character in the finished writing is, on average, comparable to using a keyboard. This is approximately true provided that all words used are in its database, punctuation is ignored, and no input mistakes are made when typing or spelling. The theoretical keystrokes per character, KSPC, of a keyboard is KSPC=1.00, and of multi-tap is KSPC=2.03. Eatoni's LetterWise is a predictive multi-tap hybrid, which when operating on a standard telephone keypad achieves KSPC=1.15 for English. The choice of which predictive text system is the best to use involves matching the user's preferred interface style, the user's level of learned ability to operate predictive text software, and the user's efficiency goal. There are various levels of risk in predictive text systems, versus multi-tap systems, because the predicted text that is automatically written provides the speed and mechanical efficiency benefit, which, if the user is not careful to review, results in transmitting misinformation. Predictive text systems take time to learn to use well, and so generally, a device's system has user options to set up the choice of multi-tap or any one of several schools of predictive text methods. == Background == Short message service (SMS) permits a mobile phone user to send text messages (also called messages, SMSes, texts, and txts) as a short message. The most common system of SMS text input is referred to as "multi-tap". Using multi-tap, a key is pressed multiple times to access the list of letters on that key. For instance, pressing the "2" key once displays an "a", twice displays a "b" and three times displays a "c". To enter two successive letters that are on the same key, the user must either pause or hit a "next" button. A user can type by pressing an alphanumeric keypad without looking at the electronic equipment display. Thus, multi-tap is easy to understand and can be used without any visual feedback. However, multi-tap is not very efficient, requiring potentially many keystrokes to enter a single letter. In ideal predictive text entry, all words used are in the dictionary, punctuation is ignored, no spelling mistakes are made, and no typing mistakes are made. The ideal dictionary would include all slang, proper nouns, abbreviations, URLs, foreign-language words and other user-unique words. This ideal circumstance gives predictive text software a reduction in the number of key strokes a user is required to enter a word. The user presses the number corresponding to each letter. As long as the word exists in the predictive text dictionary or is correctly disambiguated by non-dictionary systems, it will appear. For instance, pressing "4663" will typically be interpreted as the word good, provided that a linguistic database in English is currently in use, though alternatives such as home, hood and hoof are also valid interpretations of the sequence of key strokes. The most widely used systems of predictive text are Tegic's T9, Motorola's iTap, and the Eatoni Ergonomics' LetterWise and WordWise. T9 and iTap use dictionaries, but Eatoni Ergonomics' products use a disambiguation process, a set of statistical rules to recreate words from keystroke sequences. All predictive text systems require a linguistic database for every supported input language. == Dictionary vs. non-dictionary systems == Traditional disambiguation works by referencing a dictionary of commonly used words, though Eatoni offers a dictionaryless disambiguation system. In dictionary-based systems, as the user presses the number buttons, an algorithm searches the dictionary for a list of possible words that match the keypress combination and offers up the most probable choice. The user can then confirm the selection and move on, or use a key to cycle through the possible combinations. A non-dictionary system constructs words and other sequences of letters from the statistics of word parts. To attempt predictions of the intended result of keystrokes not yet entered, disambiguation may be combined with a word completion facility. Either system (disambiguation or predictive) may include a user database, which can be further classified as a "learning" system when words or phrases are entered into the user database without direct user intervention. The user database is for storing words or phrases that are not well disambiguated by the pre-supplied database. Some disambiguation systems further attempt to correct spelling, format text or perform other automatic rewrites, with the risky effect of either enhancing or frustrating user efforts to enter text. == History == The predictive text and autocomplete technology was invented out of necessities by Chinese scientists and linguists in the 1950s to solve the input inefficiency of the Chinese typewriter, as the typing process involved finding and selecting thousands of logographic characters on a tray, drastically slowing down the word processing speed. The actuating keys of the Chinese typewriter created by Lin Yutang in the 1940s included suggestions for the characters following the one selected. In 1951, the Chinese typesetter Zhang Jiying arranged Chinese characters in associative clusters, a precursor of modern predictive text entry, and broke speed records by doing so. Predictive entry of text from a telephone keypad has been known at least since the 1970s (Smith and Goodwin, 1971). Predictive text was mainly used to look up names in directories over the phone until mobile phone text messaging came into widespread use. == Example == On a typical phone keypad, if users wished to type the in a "multi-tap" keypad entry system, they would need to: Press 8 (tuv) once to select t. Press 4 (ghi) twice to select h. Press 3 (def) twice to select e. Meanwhile, in a phone with predictive text, they need only: Press 8 once to select the (tuv) group for the first character. Press 4 once to select the (ghi) group for the second character. Press 3 once to select the (def) group for the third character. The system updates the display as each keypress is entered, to show the most probable entry. In this example, prediction reduced the number of button presses from five to three. The effect is even greater with longer words and those composed of letters later in each key's sequence. A dictionary-based predictive system is based on the hope that the desired word is in the dictionary. That hope may be misplaced if the word differs in any way from common usage—in particular, if the word is not spelled or typed correctly, is slang, or is a proper noun. In these cases, some other mechanism must be used to enter the word. Furthermore, the simple dictionary approach fails with agglutinative languages, where a single word does not necessarily represent a single semantic entity. == Companies and products == Predictive text is developed and marketed in a variety of competing products, such as Nuance Communications's T9. Other products include Motorola's iTap; Eatoni Ergonomic's LetterWise (character, rather than word-based prediction); WordWise (word-based prediction without a dictionary); EQ3 (a QWERTY-like layout compatible with regular telephone keypads); Prevalent Devices's Phraze-It; Xrgomics' TenGO (a six-key reduced QWERTY keyboard system); Adaptxt (considers language, context, grammar and semantics); Lightkey (a predictive typing software for Windows); Clevertexting (statistical nature of the language, dictionaryless, dynamic key allocation); and Oizea Type (temporal ambiguity); Intelab's Tauto; WordLogic's Intelligent Input Platform™ (patented, layer-based advanced text prediction, includes multi-language dictionary, spell-check, built-in Web search); Google's Gboard. == Textonyms == Words produced by the same combination of keypresses have been called "textonyms"; also "txtonyms"; or "T9o

    Read more →
  • Text-to-video model

    Text-to-video model

    A text-to-video model is a form of generative artificial intelligence that uses a natural language description as input to produce a video relevant to the input text. Advancements during the 2020s in the generation of high-quality, text-conditioned videos have largely been driven by the development of video diffusion models. == Models == There are different models, including open source models. Chinese-language input CogVideo is the earliest text-to-video model "of 9.4 billion parameters" to be developed, with its demo version of open source codes first presented on GitHub in 2022. That year, Meta Platforms released a partial text-to-video model called "Make-A-Video", and Google's Brain (later Google DeepMind) introduced Imagen Video, a text-to-video model with 3D U-Net. === 2023 === In February 2023, Runway released Gen-1 and Gen-2, among the first commercially available text-to-video and video-to-video models accessible to the public through a web interface. Gen-1, initially released as a video-to-video model, allowed users to transform existing video footage using text or image prompts. Gen-2, introduced in March 2023 and made publicly available in June 2023, added text-to-video capabilities, enabling users to generate videos from text prompts alone. In March 2023, a research paper titled "VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation" was published, presenting a novel approach to video generation. The VideoFusion model decomposes the diffusion process into two components: base noise and residual noise, which are shared across frames to ensure temporal coherence. By utilizing a pre-trained image diffusion model as a base generator, the model efficiently generated high-quality and coherent videos. Fine-tuning the pre-trained model on video data addressed the domain gap between image and video data, enhancing the model's ability to produce realistic and consistent video sequences. In the same month, Adobe introduced Firefly AI as part of its features. === 2024 === In January 2024, Google announced development of a text-to-video model named Lumiere which is anticipated to integrate advanced video editing capabilities. Matthias Niessner and Lourdes Agapito at AI company Synthesia work on developing 3D neural rendering techniques that can synthesise realistic video by using 2D and 3D neural representations of shape, appearances, and motion for controllable video synthesis of avatars. In June 2024, Luma Labs launched its Dream Machine video tool. That same month, Kuaishou extended its Kling AI text-to-video model to international users. In July 2024, TikTok owner ByteDance released Jimeng AI in China, through its subsidiary, Faceu Technology. By September 2024, the Chinese AI company MiniMax debuted its video-01 model, joining other established AI model companies like Zhipu AI, Baichuan, and Moonshot AI, which contribute to China's involvement in AI technology. In December 2024 Lightricks launched LTX Video as an open source model. === 2025 === Alternative approaches to text-to-video models include Google's Phenaki, Hour One, Colossyan, Runway's Gen-3 Alpha, and OpenAI's Sora, Several additional text-to-video models, such as Plug-and-Play, Text2LIVE, and TuneAVideo, have emerged. FLUX.1 developer Black Forest Labs has announced its text-to-video model SOTA. Google was preparing to launch a video generation tool named Veo for YouTube Shorts in 2025. In May 2025, Google launched the Veo 3 iteration of the model. It was noted for its impressive audio generation capabilities, which were a previous limitation for text-to-video models. In July 2025 Lightricks released an update to LTX Video capable of generating clips reaching 60 seconds, and in October 2025 it released LTX-2, with audio capabilities built in. === 2026 === In February 2026, ByteDance released Seedance 2.0, it was noted for its impressive realistic generation, motion and camera control and 15 second generation, however the model faced huge critiscism from Motion Picture Association for copyright infringement. After viewing a viral clip of a fight between actors Brad Pitt and Tom Cruise, Rhett Reese, who is the co-writer of Deadpool & Wolverine and Zombieland announced that on social media "I hate to say it. It’s likely over for us," further stating that "In next to no time, one person is going to be able to sit at a computer and create a movie indistinguishable from what Hollywood now releases." == Architecture and training == There are several architectures that have been used to create text-to-video models. Similar to text-to-image models, these models can be trained using Recurrent Neural Networks (RNNs) such as long short-term memory (LSTM) networks, which has been used for Pixel Transformation Models and Stochastic Video Generation Models, which aid in consistency and realism respectively. An alternative for these include transformer models. Generative adversarial networks (GANs), Variational autoencoders (VAEs), — which can aid in the prediction of human motion — and diffusion models have also been used to develop the image generation aspects of the model. Text-video datasets used to train models include, but are not limited to, WebVid-10M, HDVILA-100M, CCV, ActivityNet, and Panda-70M. These datasets contain millions of original videos of interest, generated videos, captioned-videos, and textual information that help train models for accuracy. Text-video datasets used to train models include, but are not limited to PromptSource, DiffusionDB, and VidProM. These datasets provide the range of text inputs needed to teach models how to interpret a variety of textual prompts. The video generation process involves synchronizing the text inputs with video frames, ensuring alignment and consistency throughout the sequence. This predictive process is subject to decline in quality as the length of the video increases due to resource limitations. The Will Smith Eating Spaghetti test is a benchmark for models. == Limitations == Despite the rapid evolution of text-to-video models in their performance, a primary limitation is that they are very computationally heavy which limits its capacity to provide high quality and lengthy outputs. Additionally, these models require a large amount of specific training data to be able to generate high quality and coherent outputs, which brings about the issue of accessibility. Moreover, models may misinterpret textual prompts, resulting in video outputs that deviate from the intended meaning. This can occur due to limitations in capturing semantic context embedded in text, which affects the model's ability to align generated video with the user's intended message. Various models, including Make-A-Video, Imagen Video, Phenaki, CogVideo, GODIVA, and NUWA, are currently being tested and refined to enhance their alignment capabilities and overall performance in text-to-video generation. Another issue with the outputs is that text or fine details in AI-generated videos often appear garbled, a problem that stable diffusion models also struggle with. Examples include distorted hands and unreadable text. == Ethics == The deployment of text-to-video models raises ethical considerations related to content generation. These models have the potential to create inappropriate or unauthorized content, including explicit material, graphic violence, misinformation, and likenesses of real individuals without consent. Ensuring that AI-generated content complies with established standards for safe and ethical usage is essential, as content generated by these models may not always be easily identified as harmful or misleading. The ability of AI to recognize and filter out NSFW or copyrighted content remains an ongoing challenge, with implications for both creators and audiences. == Impacts and applications == Text-to-video models offer a broad range of applications that may benefit various fields, from educational and promotional to creative industries. These models can streamline content creation for training videos, movie previews, gaming assets, and visualizations, making it easier to generate content. During the Russo-Ukrainian war, fake videos made with artificial intelligence were created as part of a propaganda war against Ukraine and shared in social media. These included depictions of children in the Ukrainian Armed Forces, fake ads targeting children encouraging them to denounce critics of the Ukrainian government, or fictitious statements by Ukrainian President Volodymyr Zelenskyy about the country's surrender, among others. === Movies === Kaur vs Kore is the first Indian feature film made using generative AI which features dual role for the AI character of Sunny Leone, set to release in 2026. Chiranjeevi Hanuman – The Eternal is an Indian movie made entirely using Generative AI created by Vijay Subramaniam which is set for theatrical release in 2026. The movie was widely criticised by the Film makers in the Bollywood industr

    Read more →
  • Software durability

    Software durability

    In software engineering, software durability means the solution ability of serviceability of software and to meet user's needs for a relatively long time. Software durability is important for user's satisfaction. For a software security to be durable, it must allow an organization to adjust the software to business needs that are constantly evolving, often in impulsive ways. Durability of software depends on four characteristics mainly; i.e. software trustworthiness, Human Trust for Serviceability, software dependability and software usability.

    Read more →
  • Deep image prior

    Deep image prior

    Deep image prior is a type of convolutional neural network used to enhance a given image with no prior training data other than the image itself. A neural network is randomly initialized and used as prior to solve inverse problems such as noise reduction, super-resolution, and inpainting. Image statistics are captured by the structure of a convolutional image generator rather than by any previously learned capabilities. == Method == === Background === Inverse problems such as noise reduction, super-resolution, and inpainting can be formulated as the optimization task x ∗ = m i n x E ( x ; x 0 ) + R ( x ) {\displaystyle x^{}=min_{x}E(x;x_{0})+R(x)} , where x {\displaystyle x} is an image, x 0 {\displaystyle x_{0}} a corrupted representation of that image, E ( x ; x 0 ) {\displaystyle E(x;x_{0})} is a task-dependent data term, and R(x) is the regularizer. Deep neural networks learn a generator/decoder x = f θ ( z ) {\displaystyle x=f_{\theta }(z)} which maps a random code vector z {\displaystyle z} to an image x {\displaystyle x} . The image corruption method used to generate x 0 {\displaystyle x_{0}} is selected for the specific application. === Specifics === In this approach, the R ( x ) {\displaystyle R(x)} prior is replaced with the implicit prior captured by the neural network (where R ( x ) = 0 {\displaystyle R(x)=0} for images that can be produced by a deep neural networks and R ( x ) = + ∞ {\displaystyle R(x)=+\infty } otherwise). This yields the equation for the minimizer θ ∗ = a r g m i n θ E ( f θ ( z ) ; x 0 ) {\displaystyle \theta ^{}=argmin_{\theta }E(f_{\theta }(z);x_{0})} and the result of the optimization process x ∗ = f θ ∗ ( z ) {\displaystyle x^{}=f_{\theta ^{}}(z)} . The minimizer θ ∗ {\displaystyle \theta ^{}} (typically a gradient descent) starts from a randomly initialized parameters and descends into a local best result to yield the x ∗ {\displaystyle x^{}} restoration function. ==== Overfitting ==== A parameter θ may be used to recover any image, including its noise. However, the network is reluctant to pick up noise because it contains high impedance while useful signal offers low impedance. This results in the θ parameter approaching a good-looking local optimum so long as the number of iterations in the optimization process remains low enough not to overfit data. === Deep Neural Network Model === Typically, the deep neural network model for deep image prior uses a U-Net like model without the skip connections that connect the encoder blocks with the decoder blocks. The authors in their paper mention that "Our findings here (and in other similar comparisons) seem to suggest that having deeper architecture is beneficial, and that having skip-connections that work so well for recognition tasks (such as semantic segmentation) is highly detrimental." == Applications == === Denoising === The principle of denoising is to recover an image x {\displaystyle x} from a noisy observation x 0 {\displaystyle x_{0}} , where x 0 = x + ϵ {\displaystyle x_{0}=x+\epsilon } . The distribution ϵ {\displaystyle \epsilon } is sometimes known (e.g.: profiling sensor and photon noise) and may optionally be incorporated into the model, though this process works well in blind denoising. The quadratic energy function E ( x , x 0 ) = | | x − x 0 | | 2 {\displaystyle E(x,x_{0})=||x-x_{0}||^{2}} is used as the data term, plugging it into the equation for θ ∗ {\displaystyle \theta ^{}} yields the optimization problem m i n θ | | f θ ( z ) − x 0 | | 2 {\displaystyle min_{\theta }||f_{\theta }(z)-x_{0}||^{2}} . === Super-resolution === Super-resolution is used to generate a higher resolution version of image x. The data term is set to E ( x ; x 0 ) = | | d ( x ) − x 0 | | 2 {\displaystyle E(x;x_{0})=||d(x)-x_{0}||^{2}} where d(·) is a downsampling operator such as Lanczos that decimates the image by a factor t. === Inpainting === Inpainting is used to reconstruct a missing area in an image x 0 {\displaystyle x_{0}} . These missing pixels are defined as the binary mask m ∈ { 0 , 1 } H × V {\displaystyle m\in \{0,1\}^{H\times V}} . The data term is defined as E ( x ; x 0 ) = | | ( x − x 0 ) ⊙ m | | 2 {\displaystyle E(x;x_{0})=||(x-x_{0})\odot m||^{2}} (where ⊙ {\displaystyle \odot } is the Hadamard product). The intuition behind this is that the loss is computed only on the known pixels in the image, and the network is going to learn enough about the image to fill in unknown parts of the image even though the computed loss doesn't include those pixels. This strategy is used to remove image watermarks by treating the watermark as missing pixels in the image. === Flash–no-flash reconstruction === This approach may be extended to multiple images. A straightforward example mentioned by the author is the reconstruction of an image to obtain natural light and clarity from a flash–no-flash pair. Video reconstruction is possible but it requires optimizations to take into account the spatial differences. == Implementations == A reference implementation rewritten in Python 3.6 with the PyTorch 0.4.0 library was released by the author under the Apache 2.0 license: deep-image-prior A TensorFlow-based implementation written in Python 2 and released under the CC-SA 3.0 license: deep-image-prior-tensorflow A Keras-based implementation written in Python 2 and released under the GPLv3: machine_learning_denoising == Example == See Astronomy Picture of the Day (APOD) of 2024-02-18

    Read more →
  • NetMiner

    NetMiner

    NetMiner is an all-in-one software platform for analyzing and visualizing complex network data, based on Social Network Analysis (SNA). Originally released in 2001, it supports research and education in a wide range of domains through interactive and visual data exploration. This tool allows researchers to explore their network data visually and interactively, and helps them to detect underlying patterns and structures of the network. It has also been recognized for its comprehensive features and user-friendly interface in comparative reviews of SNA software packages. == Features == === Integrated Data Environment === NetMiner supports unified management of diverse data types—including network (nodes and links), tabular, and unstructured text data—within a single platform. This enables users to perform the entire analysis workflow seamlessly without switching between tools. NetMiner also supports a wide range of analytical methods, allowing users to derive new insights by combining multiple approaches. Analytical results can be saved and reused across workflows(Add to Dataset) Graph and Network Analysis: Includes Centrality, Community Detection, Blockmodeling, and Similarity Measures. Machine learning: Provides algorithms for regression, classification, clustering, ensemble modeling and XAI(Explainable AI) Graph Neural Networks (GNNs): Supports models such as GraphSAGE, GCN, and GAT to learn from both node attributes and graph structure. Natural language processing (NLP): Uses pretrained deep learning models to analyze unstructured text, including named entity recognition and keyword extraction. Text mining and Text network analysis: Supports construction of word co-occurrence networks and topic modeling using LDA, BERTopic, enabling identification of thematic patterns and semantic structures in text data. Data Visualization: Offers advanced network visualization features, supporting multiple layout algorithms. Analytical outcomes such as centrality or community detection can be directly reflected in the network map via node size, color, and position, enhancing intuitive understanding. === AI Assistant === NetMiner integrates with external large language models such as OpenAI GPT and Google Gemini to interpret complex analysis results in natural language, summarize key findings, and suggest next steps for exploration. === Workflow and Usability === Designed to follow the structure of real-world data analysis workflows, NetMiner adopts a hierarchical data organization (Project → Workspace → Dataset → Data Item). Its web-based user interface improves clarity and reduces complexity. NetMiner 5 supports Windows 10 or higher and macOS 11 or later with M1 chip. Both academic and commercial licenses are available. == Extension == NetMiner Extension is small program to extend the functionality of NetMiner. In other words, it enables you to customize NetMiner according to your needs. By adding ‘NetMiner Extension’, you can expand your research. === Web Data Collection === NetMiner allows users to collect data from services such as YouTube, OpenAlex, Springer, and KCI via Open APIs. Collected data is automatically preprocessed and transformed to fit NetMiner’s internal structure, requiring no additional coding or external tools. SNS Data Collector: It collects social media data from YouTube, which has a large number of social media users worldwide. Biblio Data Collector: It collects the bibliographic data from Springer, OpenAlex, and KCI essential for research trend analysis. == File formats == === NetMiner data file format === .NMF === Importable/exportable formats === Plain text data: .TXT, .CSV Microsoft Excel data: .XLS, .XLSX Unstructured text data: .TXT, .CSV, .XLS(X) ※ NetMiner 4 only NetMiner 2 data: .NTF UCINet data: .DL, .DAT Pajek data: .NET, .VEC, .CLU, .PER StOCNET data file: .DAT Graph Modelling Language data: .GML(importing only) Related software UCINET Pajek Gephi StoCNET == Data structure == === Hierarchy of NetMiner data structure === NetMiner 5 supports not only graph data composed of nodes and links, but also tabular and unstructured data without fixed schema or identifiers. This enables users to easily import a wide variety of raw and unstructured data suitable for machine learning applications. Within a single workspace, users can manage node sets, link sets, and structured/unstructured data simultaneously. Multiple graph layers under a node set can be organized in a tree structure, allowing for intuitive understanding of the data currently being analyzed. == Release history == The first version of NetMiner was released on Dec 21, 2001. There have been five major updates from 2001. === NetMiner 5 === Released on June 9, 2025. NetMiner 5 retains the core features and no-code concept of NetMiner 4, but has evolved by integrating cutting-edge AI technologies. AI Assistant, Personal Analytics Tutor Support for Graph, Structured, and Unstructured Data Graph Analytics / Social Network Analysis Machine Learning(M/L) & XAI Graph Machine Learning(GML): Graph Neural Network Text Mining: Natural Language Processing(NLP), Text Network, Topic Modeling Data Visualization === NetMiner 4 (2011) === Latest version is 4.5.1. Introduced Python scripting, encrypted NMF format, semantic analysis tools (word cloud, topic modeling), and Extension - Data Collector. === NetMiner 3 (2007) === Enhanced scalability, integrated analysis-visualization modules, and DB import from Oracle, MS SQL. === NetMiner 2 (2003) === Improved statistical and network measures, visualization algorithms, and external data import modules.

    Read more →
  • Huroof

    Huroof

    Huroof (Arabic: حروف, lit. 'letters') is an Android kids application produced by the Islamic State, specifically the Islamic States' Al-Himmah Library, which is targeted towards kids in order to teach kids the Arabic alphabet, and to also get kids to support the Islamic State and its practices. == Application == Huroof uses child-like appearances on the main menu, and throughout multiple of Huroof's in-game games for learning the alphabet, a lot of the games reference jihadist concepts, including imagery of weapons (such as missile, tank, cannon, sword,...), 'violent' images, as well as Islamic State imagery, including the flag of the Islamic State, Huroof uses nasheeds from Ajnad Media Foundation for audio production in the app. Reportedly, Huroof was released via Telegram channels of the Islamic State, as well as other file sharing websites. It is not the first moblie app released by Islamic State, but it is the first time they released a moblie application targeting children. === Nasheed game === In the Huroof app, there's a game where you listen to a radio, with the Al-Bayan logo on it, and learn the Arabic alphabet while the nasheed plays. === Writing game === In Huroof, there's a game where you can write out letters of the Arabic alphabet, as well as numbers while a small child tells you what they are. === Letter choosing game === In the app, there's a game they shows you images, and you choose which letter that image/item starts with.

    Read more →
  • NNDB

    NNDB

    The Notable Names Database (NNDB) is an online database of biographical details of over 40,000 people. Soylent Communications, a sole proprietorship that also hosted the later defunct Rotten.com, describes NNDB as an "intelligence aggregator" of noteworthy persons, highlighting their interpersonal connections. The Rotten.com domain was registered in 1996 by former Apple and Netscape software engineer Thomas E. Dell, who was also known by his internet alias, "Soylent". == Entries == Each entry has an executive summary followed by a brief narrative about their life. It also lists date and cause of death if deceased. Businesspeople and government officials are listed with chronologies of their posts, positions, and board memberships. As of 2022, the site is no longer updated. == NNDB Mapper == The NNDB Mapper, a visual tool for exploring connections between people, was made available in May 2008. It required Adobe Flash 7.

    Read more →
  • Natural language understanding

    Natural language understanding

    Natural language understanding (NLU) or natural language interpretation (NLI) is a subset of natural language processing in artificial intelligence that deals with machine reading comprehension. NLU has been considered an AI-hard problem. There is considerable commercial interest in the field because of its application to automated reasoning, machine translation, question answering, news-gathering, text categorization, voice-activation, archiving, and large-scale content analysis. == History == The program STUDENT, written in 1964 by Daniel Bobrow for his PhD dissertation at MIT, is one of the earliest known attempts at NLU by a computer. Eight years after John McCarthy coined the term artificial intelligence, Bobrow's dissertation (titled Natural Language Input for a Computer Problem Solving System) showed how a computer could understand simple natural language input to solve algebra word problems. A year later, in 1965, Joseph Weizenbaum at MIT wrote ELIZA, an interactive program that carried on a dialogue in English on any topic, the most popular being psychotherapy. ELIZA worked by simple parsing and substitution of key words into canned phrases and Weizenbaum sidestepped the problem of giving the program a database of real-world knowledge or a rich lexicon. Yet ELIZA gained surprising popularity as a toy project and can be seen as a very early precursor to current commercial systems such as those used by Ask.com. In 1969, Roger Schank at Stanford University introduced the conceptual dependency theory for NLU. This model, partially influenced by the work of Sydney Lamb, was extensively used by Schank's students at Yale University, such as Robert Wilensky, Wendy Lehnert, and Janet Kolodner. In 1970, William A. Woods introduced the augmented transition network (ATN) to represent natural language input. Instead of phrase structure rules ATNs used an equivalent set of finite-state automata that were called recursively. ATNs and their more general format called "generalized ATNs" continued to be used for a number of years. In 1971, Terry Winograd finished writing SHRDLU for his PhD thesis at MIT. SHRDLU could understand simple English sentences in a restricted world of children's blocks to direct a robotic arm to move items. The successful demonstration of SHRDLU provided significant momentum for continued research in the field. Winograd continued to be a major influence in the field with the publication of his book Language as a Cognitive Process. At Stanford, Winograd would later advise Larry Page, who co-founded Google. In the 1970s and 1980s, the natural language processing group at SRI International continued research and development in the field. A number of commercial efforts based on the research were undertaken, e.g., in 1982 Gary Hendrix formed Symantec Corporation originally as a company for developing a natural language interface for database queries on personal computers. However, with the advent of mouse-driven graphical user interfaces, Symantec changed direction. A number of other commercial efforts were started around the same time, e.g., Larry R. Harris at the Artificial Intelligence Corporation and Roger Schank and his students at Cognitive Systems Corp. In 1983, Michael Dyer developed the BORIS system at Yale which bore similarities to the work of Roger Schank and W. G. Lehnert. The third millennium saw the introduction of systems using machine learning for text classification, such as the IBM Watson. However, experts debate how much "understanding" such systems demonstrate: e.g., according to John Searle, Watson did not even understand the questions. John Ball, cognitive scientist and inventor of the Patom Theory, supports this assessment. Natural language processing has made inroads for applications to support human productivity in service and e-commerce, but this has largely been made possible by narrowing the scope of the application. There are thousands of ways to request something in a human language that still defies conventional natural language processing. According to Wibe Wagemans, "To have a meaningful conversation with machines is only possible when we match every word to the correct meaning based on the meanings of the other words in the sentence – just like a 3-year-old does without guesswork." == Scope and context == The umbrella term "natural language understanding" can be applied to a diverse set of computer applications, ranging from small, relatively simple tasks such as short commands issued to robots, to highly complex endeavors such as the full comprehension of newspaper articles or poetry passages. Many real-world applications fall between the two extremes, for instance text classification for the automatic analysis of emails and their routing to a suitable department in a corporation does not require an in-depth understanding of the text, but needs to deal with a much larger vocabulary and more diverse syntax than the management of simple queries to database tables with fixed schemata. Throughout the years various attempts at processing natural language or English-like sentences presented to computers have taken place at varying degrees of complexity. Some attempts have not resulted in systems with deep understanding, but have helped overall system usability. For example, Wayne Ratliff originally developed the Vulcan program with an English-like syntax to mimic the English speaking computer in Star Trek. Vulcan later became the dBase system whose easy-to-use syntax effectively launched the personal computer database industry. Systems with an easy-to-use or English-like syntax are, however, quite distinct from systems that use a rich lexicon and include an internal representation (often as first order logic) of the semantics of natural language sentences. Hence the breadth and depth of "understanding" aimed at by a system determine both the complexity of the system (and the implied challenges) and the types of applications it can deal with. The "breadth" of a system is measured by the sizes of its vocabulary and grammar. The "depth" is measured by the degree to which its understanding approximates that of a fluent native speaker. At the narrowest and shallowest, English-like command interpreters require minimal complexity, but have a small range of applications. Narrow but deep systems explore and model mechanisms of understanding, but they still have limited application. Systems that attempt to understand the contents of a document such as a news release beyond simple keyword matching and to judge its suitability for a user are broader and require significant complexity, but they are still somewhat shallow. Systems that are both very broad and very deep are beyond the current state of the art. == Components and architecture == Regardless of the approach used, most NLU systems share some common components. The system needs a lexicon of the language and a parser and grammar rules to break sentences into an internal representation. The construction of a rich lexicon with a suitable ontology requires significant effort, e.g., the Wordnet lexicon required many person-years of effort. The system also needs theory from semantics to guide the comprehension. The interpretation capabilities of a language-understanding system depend on the semantic theory it uses. Competing semantic theories of language have specific trade-offs in their suitability as the basis of computer-automated semantic interpretation. These range from naive semantics or stochastic semantic analysis to the use of pragmatics to derive meaning from context. Semantic parsers convert natural-language texts into formal meaning representations. Advanced applications of NLU also attempt to incorporate logical inference within their framework. This is generally achieved by mapping the derived meaning into a set of assertions in predicate logic, then using logical deduction to arrive at conclusions. Therefore, systems based on functional languages such as Lisp need to include a subsystem to represent logical assertions, while logic-oriented systems such as those using the language Prolog generally rely on an extension of the built-in logical representation framework. The management of context in NLU can present special challenges. A large variety of examples and counter examples have resulted in multiple approaches to the formal modeling of context, each with specific strengths and weaknesses.

    Read more →
  • Live Transcribe

    Live Transcribe

    Live Transcribe is a mobile app for real-time captioning, developed by Google for the Android operating system. Development on the application began in partnership with Gallaudet University. It was publicly released as a free beta for Android 5.0+ on the Google Play Store on February 4, 2019. As of early 2023 it had been downloaded over 500 million times. == Development == Researchers Dimitri Kanevsky, Sagar Savla and Chet Gnegy at Google developed the app in collaboration with researchers at Gallaudet University, an American university for the education of the deaf and hard of hearing. The app uses machine learning to generate captions, similar to YouTube's auto-generated captions. In August 2019, Google made Live Transcribe an open-source project. == Features == The app uses speech recognition to generate live captions in over 80 languages with varying accuracy. The app, which requires connection to the Internet to function, is available to download on the Google Play Store. A later update to the app displayed information on sounds such as clapping, laughter, music, applause, and whistling. In May 2020, the app started supporting transcription in Albanian, Burmese, Estonian, Macedonian, Mongolian, Punjabi, and Uzbek, supporting 70 languages. In March 2022, the app was updated with support to transcribe offline, without Internet connection, so long as the appropriate language pack has been installed. The offline mode is only available for devices with 6GB of RAM and certain Google Pixel devices.

    Read more →
  • Deep image prior

    Deep image prior

    Deep image prior is a type of convolutional neural network used to enhance a given image with no prior training data other than the image itself. A neural network is randomly initialized and used as prior to solve inverse problems such as noise reduction, super-resolution, and inpainting. Image statistics are captured by the structure of a convolutional image generator rather than by any previously learned capabilities. == Method == === Background === Inverse problems such as noise reduction, super-resolution, and inpainting can be formulated as the optimization task x ∗ = m i n x E ( x ; x 0 ) + R ( x ) {\displaystyle x^{}=min_{x}E(x;x_{0})+R(x)} , where x {\displaystyle x} is an image, x 0 {\displaystyle x_{0}} a corrupted representation of that image, E ( x ; x 0 ) {\displaystyle E(x;x_{0})} is a task-dependent data term, and R(x) is the regularizer. Deep neural networks learn a generator/decoder x = f θ ( z ) {\displaystyle x=f_{\theta }(z)} which maps a random code vector z {\displaystyle z} to an image x {\displaystyle x} . The image corruption method used to generate x 0 {\displaystyle x_{0}} is selected for the specific application. === Specifics === In this approach, the R ( x ) {\displaystyle R(x)} prior is replaced with the implicit prior captured by the neural network (where R ( x ) = 0 {\displaystyle R(x)=0} for images that can be produced by a deep neural networks and R ( x ) = + ∞ {\displaystyle R(x)=+\infty } otherwise). This yields the equation for the minimizer θ ∗ = a r g m i n θ E ( f θ ( z ) ; x 0 ) {\displaystyle \theta ^{}=argmin_{\theta }E(f_{\theta }(z);x_{0})} and the result of the optimization process x ∗ = f θ ∗ ( z ) {\displaystyle x^{}=f_{\theta ^{}}(z)} . The minimizer θ ∗ {\displaystyle \theta ^{}} (typically a gradient descent) starts from a randomly initialized parameters and descends into a local best result to yield the x ∗ {\displaystyle x^{}} restoration function. ==== Overfitting ==== A parameter θ may be used to recover any image, including its noise. However, the network is reluctant to pick up noise because it contains high impedance while useful signal offers low impedance. This results in the θ parameter approaching a good-looking local optimum so long as the number of iterations in the optimization process remains low enough not to overfit data. === Deep Neural Network Model === Typically, the deep neural network model for deep image prior uses a U-Net like model without the skip connections that connect the encoder blocks with the decoder blocks. The authors in their paper mention that "Our findings here (and in other similar comparisons) seem to suggest that having deeper architecture is beneficial, and that having skip-connections that work so well for recognition tasks (such as semantic segmentation) is highly detrimental." == Applications == === Denoising === The principle of denoising is to recover an image x {\displaystyle x} from a noisy observation x 0 {\displaystyle x_{0}} , where x 0 = x + ϵ {\displaystyle x_{0}=x+\epsilon } . The distribution ϵ {\displaystyle \epsilon } is sometimes known (e.g.: profiling sensor and photon noise) and may optionally be incorporated into the model, though this process works well in blind denoising. The quadratic energy function E ( x , x 0 ) = | | x − x 0 | | 2 {\displaystyle E(x,x_{0})=||x-x_{0}||^{2}} is used as the data term, plugging it into the equation for θ ∗ {\displaystyle \theta ^{}} yields the optimization problem m i n θ | | f θ ( z ) − x 0 | | 2 {\displaystyle min_{\theta }||f_{\theta }(z)-x_{0}||^{2}} . === Super-resolution === Super-resolution is used to generate a higher resolution version of image x. The data term is set to E ( x ; x 0 ) = | | d ( x ) − x 0 | | 2 {\displaystyle E(x;x_{0})=||d(x)-x_{0}||^{2}} where d(·) is a downsampling operator such as Lanczos that decimates the image by a factor t. === Inpainting === Inpainting is used to reconstruct a missing area in an image x 0 {\displaystyle x_{0}} . These missing pixels are defined as the binary mask m ∈ { 0 , 1 } H × V {\displaystyle m\in \{0,1\}^{H\times V}} . The data term is defined as E ( x ; x 0 ) = | | ( x − x 0 ) ⊙ m | | 2 {\displaystyle E(x;x_{0})=||(x-x_{0})\odot m||^{2}} (where ⊙ {\displaystyle \odot } is the Hadamard product). The intuition behind this is that the loss is computed only on the known pixels in the image, and the network is going to learn enough about the image to fill in unknown parts of the image even though the computed loss doesn't include those pixels. This strategy is used to remove image watermarks by treating the watermark as missing pixels in the image. === Flash–no-flash reconstruction === This approach may be extended to multiple images. A straightforward example mentioned by the author is the reconstruction of an image to obtain natural light and clarity from a flash–no-flash pair. Video reconstruction is possible but it requires optimizations to take into account the spatial differences. == Implementations == A reference implementation rewritten in Python 3.6 with the PyTorch 0.4.0 library was released by the author under the Apache 2.0 license: deep-image-prior A TensorFlow-based implementation written in Python 2 and released under the CC-SA 3.0 license: deep-image-prior-tensorflow A Keras-based implementation written in Python 2 and released under the GPLv3: machine_learning_denoising == Example == See Astronomy Picture of the Day (APOD) of 2024-02-18

    Read more →
  • EditDV

    EditDV

    EditDV was a video editing software released by Radius, Inc. in late 1997 as an evolution of their earlier Radius Edit product. EditDV was one of the first products providing professional-quality editing of the then new DV format at a relatively affordable cost ($999 including Radius FireWire capture card) and was named "The Best Video Tool of 1998". Originally EditDV was available for Macintosh only but in February 2000 EditDV 2.0 for Windows was released. With version 3.0 EditDV's name was changed to CineStream. == Features == Originally bundled with a FireWire card, EditDV 1.5 got updated into a less expensive software only package for use with the newer PowerMac G3 that came with a FireWire interface. Later, a scaled down version named EditDV 1.6.1 Unplugged was released as a freeware version next to EditDV 2.0. Unlike many other applications at the time which transcoded video to M-JPEG for editing, EditDV provided lossless native editing of the DV format. Only transitions (such as dissolves or wipes), effects (such as rotating or scaling the video, adjusting the audio level, or adding titles) and filters (such as changing the brightness or color balance) needed to be rendered. This also had the disadvantage to not work with analogue video capture. EditDV was built on top of QuickTime and supported QuickTime filters as well as its own built-in effects and transitions. Effects could be animated using keyframes. EditDV 2.0 worked natively with Quicktime MOV format. For Microsoft Windows users, where the standard was AVI, this required the use of a provided external conversion tool afterwards when AVI was wanted. The user interface had a Project window for organising clips into bins, a Sequence window with a multi-track timeline for arranging clips into a program using three-point editing, and Source and Program monitor windows. A finished program could either be exported as a QuickTime movie or written back to DV tape using the "print to video" command. Version 3.0, then renamed CineStream, shifted towards web designers who wanted to add video streaming interactivity to a website. The new feature called EventStream allowed setting clickable hot spots to link to another location, either to another page with a URL or to another video. This feature distinguished CineStream from the rest of the competition. == Products == The EditDV product family included a number of related products, all sharing a similar name: EditDV Video editing software (Mac and Windows) SoftDV A QuickTime software codec for playing DV media, included as part of EditDV (Mac and Windows) MotoDV PCI-based FireWire interface with DV capture software (Mac and Windows) PhotoDV Software to capture high-quality stills from a DV tape using MotoDV hardware (Mac and Windows) RotoDV Software for rotoscoping (painting over video), released in Sept 1999 (Macintosh only) == Name changes and eventual demise == In 1999, the company Radius Inc. changed its name to Digital Origin. In 2000, Digital Origin Inc (and EditDV) was bought by Media 100. In early 2001, Media 100 released an updated version of EditDV under the new name CineStream 3.0. Later that year (October 2001) Media 100 was bought by Autodesk's Discreet Division. CineStream for Macintosh required classic Mac OS. It was never ported to Mac OS X and faced increasing competition on that platform from Apple's own Final Cut Pro application. Development of EditDV/Cinestream was officially discontinued in 2002.

    Read more →
  • Neural operators

    Neural operators

    Neural operators are a class of deep learning architectures designed to learn maps between infinite-dimensional function spaces. Neural operators represent an extension of traditional artificial neural networks, marking a departure from the typical focus on learning mappings between finite-dimensional Euclidean spaces or finite sets. Neural operators directly learn operators between function spaces; they can receive input functions, and the output function can be evaluated at any discretization. The primary application of neural operators is in learning surrogate maps for the solution operators of partial differential equations (PDEs), which are critical tools in modeling the natural environment. Standard PDE solvers can be time-consuming and computationally intensive, especially for complex systems. Neural operators have demonstrated improved performance in solving PDEs compared to existing machine learning methodologies while being significantly faster than numerical solvers. Neural operators have also been applied to various scientific and engineering disciplines such as turbulent flow modeling, computational mechanics, graph-structured data, and the geosciences. In particular, they have been applied to learning stress-strain fields in materials, classifying complex data like spatial transcriptomics, predicting multiphase flow in porous media, and carbon dioxide migration simulations. Finally, the operator learning paradigm allows learning maps between function spaces, and is different from parallel ideas of learning maps from finite-dimensional spaces to function spaces, and subsumes these settings as special cases when limited to a fixed input resolution. == Operator learning == Understanding and mapping relationships between function spaces has many applications in engineering and the sciences. In particular, one can cast the problem of solving partial differential equations as identifying a map between function spaces, such as from an initial condition to a time-evolved state. In other PDEs this map takes an input coefficient function and outputs a solution function. Operator learning is a machine learning paradigm to learn solution operators mapping the input function to the output function . Using traditional machine learning methods, addressing this problem would involve discretizing the infinite-dimensional input and output function spaces into finite-dimensional grids and applying standard learning models, such as neural networks. This approach reduces the operator learning to finite-dimensional function learning and has some limitations, such as generalizing to discretizations beyond the grid used in training. The primary properties of neural operators that differentiate them from traditional neural networks is discretization invariance and discretization convergence. Unlike conventional neural networks, which are fixed on the discretization of training data, neural operators can adapt to various discretizations without re-training. This property improves the robustness and applicability of neural operators in different scenarios, providing consistent performance across different resolutions and grids. == Definition and formulation == Architecturally, neural operators are similar to feed-forward neural networks in the sense that they are composed of alternating linear maps and non-linearities. Since neural operators act on and output functions, neural operators have been instead formulated as a sequence of alternating linear integral operators on function spaces and point-wise non-linearities. Using an analogous architecture to finite-dimensional neural networks, similar universal approximation theorems have been proven for neural operators. In particular, it has been shown that neural operators can approximate any continuous operator on a compact set. Neural operators seek to approximate some operator G : A → U {\displaystyle {\mathcal {G}}:{\mathcal {A}}\to {\mathcal {U}}} between function spaces A {\displaystyle {\mathcal {A}}} and U {\displaystyle {\mathcal {U}}} by building a parametric map G ϕ : A → U {\displaystyle {\mathcal {G}}_{\phi }:{\mathcal {A}}\to {\mathcal {U}}} . Such parametric maps G ϕ {\displaystyle {\mathcal {G}}_{\phi }} can generally be defined in the form G ϕ := Q ∘ σ ( W T + K T + b T ) ∘ ⋯ ∘ σ ( W 1 + K 1 + b 1 ) ∘ P , {\displaystyle {\mathcal {G}}_{\phi }:={\mathcal {Q}}\circ \sigma (W_{T}+{\mathcal {K}}_{T}+b_{T})\circ \cdots \circ \sigma (W_{1}+{\mathcal {K}}_{1}+b_{1})\circ {\mathcal {P}},} where P , Q {\displaystyle {\mathcal {P}},{\mathcal {Q}}} are the lifting (lifting the codomain of the input function to a higher dimensional space) and projection (projecting the codomain of the intermediate function to the output dimension) operators, respectively. These operators act pointwise on functions and are typically parametrized as multilayer perceptrons. σ {\displaystyle \sigma } is a pointwise nonlinearity, such as a rectified linear unit (ReLU), or a Gaussian error linear unit (GeLU). Each layer t = 1 , … , T {\displaystyle t=1,\dots ,T} has a respective local operator W t {\displaystyle W_{t}} (usually parameterized by a pointwise neural network), a kernel integral operator K t {\displaystyle {\mathcal {K}}_{t}} , and a bias function b t {\displaystyle b_{t}} . Given some intermediate functional representation v t {\displaystyle v_{t}} with domain D {\displaystyle D} in the t {\displaystyle t} -th hidden layer, a kernel integral operator K ϕ {\displaystyle {\mathcal {K}}_{\phi }} is defined as ( K ϕ v t ) ( x ) := ∫ D κ ϕ ( x , y , v t ( x ) , v t ( y ) ) v t ( y ) d y , {\displaystyle ({\mathcal {K}}_{\phi }v_{t})(x):=\int _{D}\kappa _{\phi }(x,y,v_{t}(x),v_{t}(y))v_{t}(y)dy,} where the kernel κ ϕ {\displaystyle \kappa _{\phi }} is a learnable implicit neural network, parametrized by ϕ {\displaystyle \phi } . In practice, one is often given the input function to the neural operator at a specific resolution. For instance, consider the setting where one is given the evaluation of v t {\displaystyle v_{t}} at n {\displaystyle n} points { y j } j n {\displaystyle \{y_{j}\}_{j}^{n}} . Borrowing from Nyström integral approximation methods such as Riemann sum integration and Gaussian quadrature, the above integral operation can be computed as follows: ∫ D κ ϕ ( x , y , v t ( x ) , v t ( y ) ) v t ( y ) d y ≈ ∑ j n κ ϕ ( x , y j , v t ( x ) , v t ( y j ) ) v t ( y j ) Δ y j , {\displaystyle \int _{D}\kappa _{\phi }(x,y,v_{t}(x),v_{t}(y))v_{t}(y)dy\approx \sum _{j}^{n}\kappa _{\phi }(x,y_{j},v_{t}(x),v_{t}(y_{j}))v_{t}(y_{j})\Delta _{y_{j}},} where Δ y j {\displaystyle \Delta _{y_{j}}} is the sub-area volume or quadrature weight associated to the point y j {\displaystyle y_{j}} . Thus, a simplified layer can be computed as v t + 1 ( x ) ≈ σ ( ∑ j n κ ϕ ( x , y j , v t ( x ) , v t ( y j ) ) v t ( y j ) Δ y j + W t ( v t ( y j ) ) + b t ( x ) ) . {\displaystyle v_{t+1}(x)\approx \sigma \left(\sum _{j}^{n}\kappa _{\phi }(x,y_{j},v_{t}(x),v_{t}(y_{j}))v_{t}(y_{j})\Delta _{y_{j}}+W_{t}(v_{t}(y_{j}))+b_{t}(x)\right).} The above approximation, along with parametrizing κ ϕ {\displaystyle \kappa _{\phi }} as an implicit neural network, results in the graph neural operator (GNO). There have been various parameterizations of neural operators for different applications. These typically differ in their parameterization of κ {\displaystyle \kappa } . The most popular instantiation is the Fourier neural operator (FNO). FNO takes κ ϕ ( x , y , v t ( x ) , v t ( y ) ) := κ ϕ ( x − y ) {\displaystyle \kappa _{\phi }(x,y,v_{t}(x),v_{t}(y)):=\kappa _{\phi }(x-y)} and by applying the convolution theorem, arrives at the following parameterization of the kernel integral operator: ( K ϕ v t ) ( x ) = F − 1 ( R ϕ ⋅ ( F v t ) ) ( x ) , {\displaystyle ({\mathcal {K}}_{\phi }v_{t})(x)={\mathcal {F}}^{-1}(R_{\phi }\cdot ({\mathcal {F}}v_{t}))(x),} where F {\displaystyle {\mathcal {F}}} represents the Fourier transform and R ϕ {\displaystyle R_{\phi }} represents the Fourier transform of some periodic function κ ϕ {\displaystyle \kappa _{\phi }} . That is, FNO parameterizes the kernel integration directly in Fourier space, using a prescribed number of Fourier modes. When the grid at which the input function is presented is uniform, the Fourier transform can be approximated using the discrete Fourier transform (DFT) with frequencies below some specified threshold. The discrete Fourier transform can be computed using a fast Fourier transform (FFT) implementation. == Training == Training neural operators is similar to the training process for a traditional neural network. Neural operators are typically trained in some Lp norm or Sobolev norm. In particular, for a dataset { ( a i , u i ) } i = 1 N {\displaystyle \{(a_{i},u_{i})\}_{i=1}^{N}} of size N {\displaystyle N} , neural operators minimize (a discretization of) L U ( { ( a i , u i ) } i = 1 N ) := ∑ i = 1 N ‖ u i − G θ ( a i ) ‖ U 2 {\displaystyle {\mathcal {L}}_{\mathca

    Read more →
  • Computational semantics

    Computational semantics

    Computational semantics is a subfield of computational linguistics. Its goal is to elucidate the cognitive mechanisms supporting the generation and interpretation of meaning in humans. It usually involves the creation of computational models that simulate particular semantic phenomena, and the evaluation of those models against data from human participants. While computational semantics is a scientific field, it has many applications in real-world settings and substantially overlaps with Artificial Intelligence. Broadly speaking, the discipline can be subdivided into areas that mirror the internal organization of linguistics. For example, lexical semantics and frame semantics have active research communities within computational linguistics. Some popular methodologies are also strongly inspired by traditional linguistics. Most prominently, the area of distributional semantics, which underpins investigations into embeddings and the internals of Large Language Models, has roots in the work of Zellig Harris. Some traditional topics of interest in computational semantics are: construction of meaning representations, semantic underspecification, anaphora resolution, presupposition projection, and quantifier scope resolution. Methods employed usually draw from formal semantics or statistical semantics. Computational semantics has points of contact with the areas of lexical semantics (word-sense disambiguation and semantic role labeling), discourse semantics, knowledge representation and automated reasoning (in particular, automated theorem proving). Since 1999 there has been an ACL special interest group on computational semantics, SIGSEM.

    Read more →