AI Coding Meta

AI Coding Meta — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Feature engineering

    Feature engineering

    Feature engineering is a preprocessing step in supervised machine learning and statistical modeling which transforms raw data into a more effective set of inputs. Each input comprises several attributes, known as features. By providing models with relevant information, feature engineering significantly enhances their predictive accuracy and decision-making capability. Beyond machine learning, the principles of feature engineering are applied in various scientific fields, including physics. For example, physicists construct dimensionless numbers such as the Reynolds number in fluid dynamics, the Nusselt number in heat transfer, and the Archimedes number in sedimentation. They also develop first approximations of solutions, such as analytical solutions for the strength of materials in mechanics. == Clustering == One of the applications of feature engineering has been clustering of feature-objects or sample-objects in a dataset. Especially, feature engineering based on matrix decomposition has been extensively used for data clustering under non-negativity constraints on the feature coefficients. These include Non-Negative Matrix Factorization (NMF), Non-Negative Matrix-Tri Factorization (NMTF), Non-Negative Tensor Decomposition/Factorization (NTF/NTD), etc. The non-negativity constraints on coefficients of the feature vectors mined by the above-stated algorithms yields a part-based representation, and different factor matrices exhibit natural clustering properties. Several extensions of the above-stated feature engineering methods have been reported in literature, including orthogonality-constrained factorization for hard clustering, and manifold learning to overcome inherent issues with these algorithms. Other classes of feature engineering algorithms include leveraging a common hidden structure across multiple inter-related datasets to obtain a consensus (common) clustering scheme. An example is Multi-view Classification based on Consensus Matrix Decomposition (MCMD), which mines a common clustering scheme across multiple datasets. MCMD is designed to output two types of class labels (scale-variant and scale-invariant clustering), and: is computationally robust to missing information, can obtain shape- and scale-based outliers, and can handle high-dimensional data effectively. Coupled matrix and tensor decompositions are popular in multi-view feature engineering. == Predictive modelling == Feature engineering in machine learning and statistical modeling involves selecting, creating, transforming, and extracting data features. Key components include feature creation from existing data, transforming and imputing missing or invalid features, reducing data dimensionality through methods like Principal Components Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA), and selecting the most relevant features for model training based on importance scores and correlation matrices. Features vary in significance. Even relatively insignificant features may contribute to a model. Feature selection can reduce the number of features to prevent a model from becoming too specific to the training data set (overfitting). Feature explosion occurs when the number of identified features is too large for effective model estimation or optimization. Common causes include: Feature templates - implementing feature templates instead of coding new features Feature combinations - combinations that cannot be represented by a linear system Feature explosion can be limited via techniques such as regularization, kernel methods, and feature selection. == Automation == Automation of feature engineering is a research topic that dates back to the 1990s. Machine learning software that incorporates automated feature engineering has been commercially available since 2016. Related academic literature can be roughly separated into two types: Multi-relational Decision Tree Learning (MRDTL) uses a supervised algorithm that is similar to a decision tree. Deep Feature Synthesis uses simpler methods. === Multi-relational Decision Tree Learning (MRDTL) === Multi-relational Decision Tree Learning (MRDTL) extends traditional decision tree methods to relational databases, handling complex data relationships across tables. It innovatively uses selection graphs as decision nodes, refined systematically until a specific termination criterion is reached. Most MRDTL studies base implementations on relational databases, which results in many redundant operations. These redundancies can be reduced by using techniques such as tuple id propagation. === Open-source implementations === There are a number of open-source libraries and tools that automate feature engineering on relational data and time series: featuretools is a Python library for transforming time series and relational data into feature matrices for machine learning. MCMD: An open-source feature engineering algorithm for joint clustering of multiple datasets. OneBM or One-Button Machine combines feature transformations and feature selection on relational data with feature selection techniques. OneBM helps data scientists reduce data exploration time allowing them to try and error many ideas in short time. On the other hand, it enables non-experts, who are not familiar with data science, to quickly extract value from their data with a little effort, time, and cost. getML community is an open source tool for automated feature engineering on time series and relational data. It is implemented in C/C++ with a Python interface. It has been shown to be at least 60 times faster than tsflex, tsfresh, tsfel, featuretools or kats. tsfresh is a Python library for feature extraction on time series data. It evaluates the quality of the features using hypothesis testing. tsflex is an open source Python library for extracting features from time series data. Despite being 100% written in Python, it has been shown to be faster and more memory efficient than tsfresh, seglearn or tsfel. seglearn is an extension for multivariate, sequential time series data to the scikit-learn Python library. tsfel is a Python package for feature extraction on time series data. kats is a Python toolkit for analyzing time series data. === Deep feature synthesis === The deep feature synthesis (DFS) algorithm beat 615 of 906 human teams in a competition. == Feature stores == The feature store is where the features are stored and organized for the explicit purpose of being used to either train models (by data scientists) or make predictions (by applications that have a trained model). It is a central location where you can either create or update groups of features created from multiple different data sources, or create and update new datasets from those feature groups for training models or for use in applications that do not want to compute the features but just retrieve them when it needs them to make predictions. A feature store includes the ability to store code used to generate features, apply the code to raw data, and serve those features to models upon request. Useful capabilities include feature versioning and policies governing the circumstances under which features can be used. Feature stores can be standalone software tools or built into machine learning platforms. == Alternatives == Feature engineering can be a time-consuming and error-prone process, as it requires domain expertise and often involves trial and error. Deep learning algorithms may be used to process a large raw dataset without having to resort to feature engineering. However, deep learning algorithms still require careful preprocessing and cleaning of the input data. In addition, choosing the right architecture, hyperparameters, and optimization algorithm for a deep neural network can be a challenging and iterative process.

    Read more →
  • Fragment (computer graphics)

    Fragment (computer graphics)

    In computer graphics, a fragment is the data necessary to generate a single pixel's worth of a drawing primitive in the frame buffer. These data may include, but are not limited to: raster position depth interpolated attributes (color, texture coordinates, etc.) stencil alpha window ID As a scene is drawn, drawing primitives (the basic elements of graphics output, such as points, lines, circles, text etc.) are rasterized into fragments which are textured and combined with the existing frame buffer. How a fragment is combined with the data already in the frame buffer depends on various settings. In a typical case, a fragment may be discarded if it is further away than the pixel which is already at that location (according to the depth buffer). If it is nearer than the existing pixel, it may replace what is already there, or, if alpha blending is in use, the pixel's color may be replaced with a mixture of the fragment's color and the pixel's existing color, as in the case of drawing a translucent object. In general, a fragment can be thought of as the data needed to shade the pixel, plus the data needed to test whether the fragment survives to become a pixel (depth, alpha, stencil, scissor, window ID, etc.). Shading a fragment is done through a fragment shader (or pixel shaders in Direct3D). In computer graphics, a fragment is not necessarily opaque, and could contain an alpha value specifying its degree of transparency. The alpha is typically normalized to the range of [0, 1], with 0 denotes totally transparent and 1 denotes totally opaque. If the fragment is not totally opaque, then part of its background object could show through, which is known as alpha blending.

    Read more →
  • Network eavesdropping

    Network eavesdropping

    Network eavesdropping, also known as eavesdropping attack, sniffing attack, or snooping attack, is a method that retrieves user information through the internet. This attack happens on electronic devices like computers and smartphones. This network attack typically happens under the usage of unsecured networks, such as public wifi connections or shared electronic devices. Eavesdropping attacks through the network is considered one of the most urgent threats in industries that rely on collecting and storing data. Internet users use eavesdropping via the Internet to improve information security. A typical network eavesdropper may be called a Black-hat hacker and is considered a low-level hacker as it is simple to network eavesdrop successfully. The threat of network eavesdroppers is a growing concern. Research and discussions are brought up in the public's eye, for instance, types of eavesdropping, open-source tools, and commercial tools to prevent eavesdropping. Models against network eavesdropping attempts are built and developed as privacy is increasingly valued. Sections on cases of successful network eavesdropping attempts and its laws and policies in the National Security Agency are mentioned. Some laws include the Electronic Communications Privacy Act and the Foreign Intelligence Surveillance Act. == Types of attacks == Types of network eavesdropping include intervening in the process of decryption of messages on communication systems, attempting to access documents stored in a network system, and listening on electronic devices. Types include electronic performance monitoring and control systems, keystroke logging, man-in-the-middle attacks, observing exit nodes on a network, and Skype & Type. === Electronic performance monitoring and control systems (EPMCSs) === Electronic performance monitoring and control systems are used by employees or companies and organizations to collect, store, analyze, and report actions or performances of employers when they are working. The beginning of this system is used to increase the efficiency of workers, but instances of unintentional eavesdropping can occur, for example, when employees' casual phone calls or conversations would be recorded. === Keystroke logging === Keystroke logging is a program that can oversee the writing process of the user. It can be used to analyze the user's typing activities, as keystroke logging provides detailed information on activities like typing speed, pausing, deletion of texts, and more behaviors. By monitoring the activities and sounds of the keyboard strikes, the message typed by the user can be translated. Although keystroke logging systems do not explain reasons for pauses or deletion of texts, it allows attackers to analyze text information. Keystroke logging can also be used with eye-tracking devices which monitor the movements of the user's eyes to determine patterns of the user's typing actions which can be used to explain the reasons for pauses or deletion of texts. === Man-in-the-middle attack (MitM) === A Man-in-the-middle attack is an active eavesdropping method that intrudes on the network system. It can retrieve and alter the information sent between two parties without anyone noticing. The attacker hijacks the communication systems and gains control over the transport of data, but cannot insert voice messages that sound or act like the actual users. Attackers also create independent communications through the system with the users acting as if the conversation between users is private. The "man-in-the-middle" can also be referred to as lurkers in a social context. A lurker is a person who rarely or never posts anything online, but the person stays online and observes other users' actions. Lurking can be valuable as it lets people gain knowledge from other users. However, like eavesdropping, lurking into other users' private information violates privacy and social norms. === Observing exit nodes === Distributed networks including communication networks are usually designed so that nodes can enter and exit the network freely. However, this poses a danger in which attacks can easily access the system and may cause serious consequences, for example, leakage of the user's phone number or credit card number. In many anonymous network pathways, the last node before exiting the network may contain actual information sent by users. Tor exit nodes are an example. Tor is an anonymous communication system that allows users to hide their IP addresses. It also has layers of encryption that protect information sent between users from eavesdropping attempts trying to observe the network traffic. However, Tor exit nodes are used to eavesdrop at the end of the network traffic. The last node in the network path flowing through the traffic, for instance, Tor exit nodes, can acquire original information or messages that were transmitted between different users. === Skype & Type (S&T) === Skype & Type (S&T) is a new keyboard acoustic eavesdropping attack that takes advantage of Voice-over IP (VoIP). S&T is practical and can be used in many applications in the real world, as it does not require attackers to be close to the victim and it can work with only some leaked keystrokes instead of every keystroke. With some knowledge of the victim's typing patterns, attackers can gain a 91.7% accuracy typed by the victim. Different recording devices including laptop microphones, smartphones, and headset microphones can be used for attackers to eavesdrop on the victim's style and speed of typing. It is especially dangerous when attackers know what language the victim is typing in. == Tools to prevent eavesdropping attacks == Computer programs where the source code of the system is shared with the public for free or for commercial use can be used to prevent network eavesdropping. They are often modified to cater to different network systems, and the tools are specific in what task it performs. In this case, Advanced Encryption Standard-256, Bro, Chaosreader, CommView, Firewalls, Security Agencies, Snort, Tcptrace, and Wireshark are tools that address network security and network eavesdropping. === Advanced encryption standard-256 (AES-256) === It is a cipher block chaining (CBC) mode for ciphered messages and hash-based message codes. The AES-256 contains 256 keys for identifying the actual user, and it represents the standard used for securing many layers on the internet. AES-256 is used by Zoom Phone apps that help encrypt chat messages sent by Zoom users. If this feature is used in the app, users will only see encrypted chats when they use the app, and notifications of an encrypted chat will be sent with no content involved. === Bro === Bro is a system that detects network attackers and abnormal traffic on the internet. It emerged at the University of California, Berkeley that detects invading network systems. The system does not apply to the detection of eavesdropping by default, but can be modified to an offline analyzing tool for eavesdropping attacks. Bro runs under Digital Unix, FreeBSD, IRIX, SunOS, and Solaris operating systems, with the implementation of approximately 22,000 lines of C++ and 1,900 lines of Bro. It is still in the process of development for real-world applications. === Chaosreader === Chaosreader is a simplified version of many open-source eavesdropping tools. It creates HTML pages on the content of when a network intrusion is detected. No actions are taken when an attack occurs and only information such as time, network location on which system or wall the user is trying to attack will be recorded. === CommView === CommView is specific to Windows systems which limits real-world applications because of its specific system usage. It captures network traffic and eavesdropping attempts by using packet analyzing and decoding. === Firewalls === Firewall technology filters network traffic and blocks malicious users from attacking the network system. It prevents users from intruding into private networks. Having a firewall in the entrance to a network system requires user authentications before allowing actions performed by users. There are different types of firewall technologies that can be applied to different types of networks. === Security agencies === A Secure Node Identification Agent is a mobile agent used to distinguish secure neighbor nodes and informs the Node Monitoring System (NMOA). The NMOA stays within nodes and monitors the energy exerted, and receives information about nodes including node ID, location, signal strength, hop counts, and more. It detects nodes nearby that are moving out of range by comparing signal strengths. The NMOA signals the Secure Node Identification Agent (SNIA) and updates each other on neighboring node information. The Node BlackBoard is a knowledge base that reads and updates the agents, acting as the brain of the security system. The Node Key Management agent is created when an encryption key is inserted to th

    Read more →
  • CrocBITE

    CrocBITE

    CrocBITE (currently CrocAttack) was an online database of wild crocodilian attacks reported on humans in the world. The non-profit online research tool helped to scientifically analyze crocodilian behavior via complex models. Users were encouraged to feed information in a crowdsourcing manner. This website excludes captive crocodilian attacks, as well as non-fatal bites on professional handlers, rangers, staff, or researchers, and crocodilian attacks on pets and livestock, because its primary goal is to analyze natural human-crocodilian conflict in the wild for conservation and management purposes, and that these incidents do are not considered indicative of natural species behavior or typical human-wildlife conflict, as well as not providing enough useful data and helping researchers understand wild population behavior or typical human-wildlife conflict dynamics and helps create safety strategies for people living or working near wild crocodilians, rather than tracking workplace accidents in zoos or farms. While fatal incidents involving handlers are sometimes included on the website, typical captive incidents (such as handlers being bitten by them in zoos) are excluded because they are considered manageable professional risks rather than general public safety threats. == About == The online database was established in 2013 (2013) by Dr Adam Britton, a researcher at Charles Darwin University, his student Brandon Sideleau and Erin Britton. It was a compilation of government records, individual reports, registered contributors and historical data. Dr Simon Pooley, Junior Research fellow, Imperial College London joined hands to further the studies. The collaboration culminated when Dr Pooley met Dr Britton at the IUCN Crocodile Specialist Group, in Louisiana in 2014. The program received funds from Economic and Social Research Council, United Kingdom to the tune of A$30,000 and unspecified resourced plus amount from Big Gecko Crocodilian Research, Crocodillian.com and Charles Darwin University. The research yielded pertinent observations that provide inside into crocodile attacks. It was observed that most attacks on humans occur from bites of Saltwater crocodile as against the popular understanding of Nile crocodiles taking the top spot. This is not, however, believed to be the actual case, as most attacks by the Nile crocodile are believed to go unreported or only reported on a local level. The broad category of Nile crocodile attacks were segmented into West African crocodile and Crocodylus niloticus (the Nile Crocodile) species to get a clear understanding of their respective attack zones. The objective was that the information would be used by communities and conservation managers to help inform and educate people about how to keep safe. The information was vital for Australia and Africa where such attacks are more likely than in other parts of the world. This was the only database of its kind with such comprehensive collection of information made available online. The database is no longer online, and its founder Adam Britton is in custody having pleaded guilty to charges of bestiality on September 25, 2023. It has been rebranded and renamed CrocAttack, and serves as a updated database focusing on human-crocodilian conflict and records over 8,500 incidents from the past decades.

    Read more →
  • Software construction

    Software construction

    Software construction is the process of creating working software via coding and integration. The process includes unit and integration testing although does not include higher level testing such as system testing. Construction is an aspect of the software development lifecycle and is integrated in the various software development process models with varying focus on construction as an activity separate from other activities. In the waterfall model, a software development effort consists of sequential phases including requirements analysis, design, and planning which are prerequisites for starting construction. In an iterative model such as scrum, evolutionary prototyping, or extreme programming, construction as an activity that occurs concurrently or overlapping other activities. Construction planning may include defining the order in which components are created and integrated, the software quality management processes, and the allocation of tasks to teams and developers. To facilitate project management, numerous construction aspects can be measured; these include the amount of code developed, modified, reused, and destroyed, code complexity, code inspection statistics, faults-fixed and faults-found rates, and effort expended. These measurements can be useful for aspects such as ensuring quality and improving the process. == Activities == Construction includes many activities. === Coding === The following are a few of the key aspects of the coding activity: Naming Choice of name for each identifier. One study showed that the effort required to debug a program is minimized when variable names are between 10 and 16 characters. Logic Organization into statements and routines Highly cohesive routines proved to be less error prone than routines with lower cohesion. A study of 450 routines found that 50 percent of the highly cohesive routines were fault free compared to only 18 percent of routines with low cohesion. Another study of a different 450 routines found that routines with the highest coupling-to-cohesion ratios had 7 times as many errors as those with the lowest coupling-to-cohesion ratios and were 20 times as costly to fix. Although studies showed inconclusive results regarding the correlation between routine sizes and the rate of errors in them, but one study found that routines with fewer than 143 lines of code were 2.4 times less expensive to fix than larger routines. Another study showed that the code needed to be changed least when routines averaged 100 to 150 lines of code. Another study found that structural complexity and amount of data in a routine were correlated with errors regardless of its size. Interfaces between routines are some of the most error-prone areas of a program. One study showed that 39 percent of all errors were errors in communication between routines. Unused parameters are correlated with an increased error rate. In one study, only 17 to 29 percent of routines with more than one unreferenced variable had no errors, compared to 46 percent in routines with no unused variables. The number of parameters of a routine should be 7 at maximum as research has found that people generally cannot keep track of more than about seven chunks of information at once. One experiment showed that designs which access arrays sequentially, rather than randomly, result in fewer variables and fewer variable references. One experiment found that loops-with-exit are more comprehensible than other kinds of loops. Regarding the level of nesting in loops and conditionals, studies have shown that programmers have difficulty comprehending more than three levels of nesting. Control flow complexity has been shown to correlate with low reliability and frequent errors. Modularity Structuring and refactoring the code into classes, packages and other structures. When considering containment, the maximum number of data members in a class shouldn't exceed 7±2. Research has shown that this number is the number of discrete items a person can remember while performing other tasks. When considering inheritance, the number of levels in the inheritance tree should be limited. Deep inheritance trees have been found to be significantly associated with increased fault rates. When considering the number of routines in a class, it should be kept as small as possible. A study on C++ programs has found an association between the number of routines and the number of faults. A study by NASA showed that the putting the code into well-factored classes can double the code reusability compared to the code developed using functional design. Error handling Encoding logic to handle both planned and unplanned errors and exceptions. Resource management Managing computational resource use via exclusion mechanisms and discipline in accessing serially reusable resources, including threads or database locks. Security Prevention of code-level security breaches such as buffer overrun and array index overflow. Optimization Optimization while avoiding premature optimization. Documentation Both embedded in the code as comments and as external documents. === Integration === Integration is about combining separately constructed parts. Concerns include planning the sequence in which components will be integrated, creating scaffolding to support interim versions of the software, determining the degree of testing and quality work performed on components before they are integrated, and determining points in the project at which interim versions are tested. === Testing === Testing can reduce the time between when faulty logic is inserted in the code and when it is detected. In some cases, testing is performed after code has been written, but in test-first programming, test cases are created before code is written. Construction includes at least two forms of testing, often performed by the developer who wrote the code: unit testing and integration testing. === Reuse === Software reuse entails more than creating and using libraries. It requires formalizing the practice of reuse by integrating reuse processes and activities into the software life cycle. The tasks related to reuse in software construction during coding and testing may include: selection of the reusable code, evaluation of code or test re-usability, reporting reuse metrics. === Quality assurance === Techniques for ensuring quality as software is constructed include: Testing One study found that the average defect detection rates of Unit testing and integration testing are 30% and 35% respectively. Software inspection With respect to software inspection, one study found that the average defect detection rate of formal code inspections is 60%. Regarding the cost of finding defects, a study found that code reading detected 80% more faults per hour than testing. Another study shown that it costs six times more to detect design defects by using testing than by using inspections. A study by IBM showed that only 3.5 hours were needed to find a defect through code inspections versus 15–25 hours through testing. Microsoft has found that it takes 3 hours to find and fix a defect by using code inspections and 12 hours to find and fix a defect by using testing. In a 700 thousand lines program, it was reported that code reviews were several times as cost-effective as testing. Studies found that inspections result in 20% - 30% fewer defects per 1000 lines of code than less formal review practices and that they increase productivity by about 20%. Formal inspections will usually take 10% - 15% of the project budget and will reduce overall project cost. Researchers found that having more than 2 - 3 reviewers on a formal inspection doesn't increase the number of defects found, although the results seem to vary depending on the kind of material being inspected. Technical review With respect to technical review, one study found that the average defect detection rates of informal code reviews and desk checking are 25% and 40% respectively. Walkthroughs were found to have a defect detection rate of 20% - 40%, but were found also to be expensive especially when project pressures increase. Code reading was found by NASA to detect 3.3 defects per hour of effort versus 1.8 defects per hour for testing. It also finds 20% - 60% more errors over the life of the project than different kinds of testing. A study of 13 reviews about review meetings, found that 90% of the defects were found in preparation for the review meeting while only around 10% were found during the meeting. Static analysis With respect to Static analysis (IEEE1028), studies have shown that a combination of these techniques needs to be used to achieve a high defect detection rate. Other studies showed that different people tend to find different defects. One study found that the extreme programming practices of pair programming, desk checking, unit testing, integration testing, and regression testing can achieve a 90% defect detection rate. An experiment involving exper

    Read more →
  • MetaMask

    MetaMask

    MetaMask is a software cryptocurrency wallet developed by ConsenSys for interacting with the Ethereum blockchain and other EVM-compatible networks. It enables users to manage Ethereum accounts and connect to decentralized applications (dApps) via a browser extension or mobile app. As of early 2026, MetaMask reports over 100 million users worldwide. == Overview == MetaMask allows users to store and manage private keys, send and receive Ethereum-based cryptocurrencies and tokens (including ERC-20 and ERC-721 standards), broadcast transactions, and interact with dApps. dApps connect to the wallet via JavaScript interfaces, prompting users to approve signatures or transactions. The wallet features MetaMask Swaps, an in-app token swap aggregator sourcing liquidity from multiple decentralized exchanges (DEXs), with a service fee of 0.875%. In 2025, MetaMask introduced the MetaMask Rewards program (initially mobile-only), where users earn points for activities such as swaps, bridging, and referrals. Season 1 (October 2025 – January 2026) distributed over $30 million in Linea tokens and other perks to participants. == History == MetaMask launched in 2016 as open-source software under the MIT license. It initially supported browser extensions for Chrome and Firefox. Mobile versions were in closed beta from 2019 and publicly released for iOS and Android in September 2020. In August 2020, the license changed to a custom proprietary one. MetaMask Swaps launched on desktop in October 2020 and on mobile in March 2021. The Rewards program launched in late 2025 with Linea integration. == Criticism == MetaMask has faced criticism over privacy, including default analytics settings that share some user data (which can be disabled). Its reliance on Infura (acquired by ConsenSys in 2019) has raised concerns about centralization in Ethereum infrastructure. The wallet regularly issues warnings about phishing scams and fake airdrops impersonating MetaMask.

    Read more →
  • MultiValue database

    MultiValue database

    A MultiValue database is a type of NoSQL and multidimensional database. It is typically considered synonymous with PICK, a database originally developed as the Pick operating system. MultiValue databases include commercial products from Rocket Software, Revelation, InterSystems, Northgate Information Solutions, ONgroup, and other companies. These databases differ from a relational database in that they have features that support and encourage the use of attributes which can take a list of values, rather than all attributes being single-valued. They are often categorized with MUMPS within the category of post-relational databases, although the data model actually pre-dates the relational model. Unlike SQL-DBMS tools, most MultiValue databases can be accessed both with or without SQL. == History == Don Nelson designed the MultiValue data model in the early to mid-1960s. Dick Pick, a developer at TRW, worked on the first implementation of this model for the US Army in 1965. Pick considered the software to be in the public domain because it was written for the military, this was but the first dispute regarding MultiValue databases that was addressed by the courts. Ken Simms wrote DataBASIC, sometimes known as S-BASIC, in the mid-1970s. It was based on Dartmouth BASIC, but had enhanced features for data management. Simms played a lot of Star Trek (a text-based early computer game originally written in Dartmouth BASIC) while developing the language, to ensure that DataBASIC functioned to his satisfaction. Three of the implementations of MultiValue - PICK version R77, Microdata Reality 3.x, and Prime Information 1.0 - were very similar. In spite of attempts to standardize, particularly by International Spectrum and the Spectrum Manufacturers Association, who designed a logo for all to use, there are no standards across MultiValue implementations. Subsequently, these flavors diverged, although with some cross-over. These streams of MultiValue database development could be classified as one stemming from PICK R83, one from Microdata Reality, and one from Prime Information. Because of the differences, some implementations have provisions for supporting several flavors of the languages. An attempt to document the similarities and differences can be found at the Post-Relational Database Reference (PRDB). One reasonable hypothesis for this data model lasting 50 years, with new database implementations of the model even in the 21st century is that it provides inexpensive database solutions. == Data model example == In a MultiValue database system: a database or schema is called an "account" a table or collection is called a "file" a column or field is called a field or an "attribute", which is composed of "multi-value attributes" and "sub-value attributes" to store multiple values in the same attribute. a row or document is called a "record" or "item" Data is stored using two separate files: a "file" to store raw data and a "dictionary" to store the format for displaying the raw data. For example, assume there's a file (table) called "PERSON". In this file, there is an attribute called "eMailAddress". The eMailAddress field can store a variable number of email address values in a single record. The list [[email protected], [email protected], [email protected]] can be stored and accessed via a single query when accessing the associated record. Achieving the same (one-to-many) relationship within a traditional relational database system would include creating an additional table to store the variable number of email addresses associated with a single "PERSON" record. However, modern relational database systems support this multi-value data model too. For example, in PostgreSQL, a column can be an array of any base type. == MultiValue Basic Language == Multivalue Basic (now commonly styled as mvBasic) is a family of programming languages more or less common (and portable) to all the multivalue databases derived from the original Pick Operating System. The variations between implementations are known as flavours. The language originates from Dartmouth Basic and the earliest implementation of PickBASIC (now D3 FlashBasic). Over time various customisations and extensions have been added to take advantage of capabilities added to the different flavours while staying mainly in sync. mvBasic statements and functions are designed to access and take advantage of the multivalue database model and providing the usual capabilities of most modern languages. For example, cryptography and communications. mvBasic is typeless and lends itself to structured programming techniques. Example code is available but limited. Whilst there are commercial applications and tools available, the multivalue database community has not embraced the open source library/package model to the degree seen with other languages. The typical mvBasic compiler compiles program source to a P-code executable object and runs in an interpreter, with D3 FlashBasic and jBASE being notable exceptions. == MultiValue Query Language == Known as ENGLISH, ACCESS, AQL, UniQuery, Retrieve, CMQL, and by many other names over the years, corresponding to the different MultiValue implementations, the MultiValue query language differs from SQL in several respects. Each query is issued against a single dictionary within the schema, which could be understood as a virtual file or a portal to the database through which to view the data. LIST PEOPLE LAST_NAME FIRST_NAME EMAIL_ADDRESSES WITH LAST_NAME LIKE "Van..." The above statement would list all e-mail addresses for each person whose last name starts with "Van". A single entry would be output for each person, with multiple lines showing the multiple e-mail addresses (without repeating other data about the person).

    Read more →
  • Local coordinates

    Local coordinates

    Local coordinates are the ones used in a local coordinate system or a local coordinate space. Simple examples: Houses. In order to work in a house construction, the measurements are referred to a control arbitrary point that will allow to check it: stick/sticks on the ground, steel bar, nails... Addresses. Using house numbers to locate a house on a street; the street is a local coordinate system within a larger system composed of city townships, states, countries, postal codes, etc. Local systems exist for convenience. On ancient times, every work was made on relative bases as there was no conception of global systems. Practically, it is better to use local systems for small works as houses, buildings... For most of the applications, it is desired the position of one element relative to one building or location, and in a more local way, relative to one furniture or person. In a regular way, you will not give your position by geographical coordinates rather than "I am 15 meters away of the entry to the building". So it is a pretty common way to locate things. It is possible to bring latitude and longitude for all terrestrial locations, but unless one has a highly precise GPS device or you make astronomical observations, this is impractical. It is much simpler to use a tape, a rope, a chain... The position information (global) should be transformed into a location. Position refers to a numeric or symbolic description within a spatial reference system, whereas location refers to information about surrounding objects and their interrelationships. (Topological space) == Use == In computer graphics and computer animation, local coordinate spaces are also useful for their ability to model independently transformable aspects of geometrical scene graphs. When modeling a car, for example, it is desirable to describe the center of each wheel with respect to the car's coordinate system, but then specify the shape of each wheel in separate local spaces centered about these points. This way, the information describing each wheel can be simply duplicated four times, and independent transformations (e.g., steering rotation) can be similarly effected. Bounding volumes of objects may be described more accurately using extents in the local coordinates, (i.e. an object oriented bounding box, contrasted with the simpler axis aligned bounding box). The trade-off for this flexibility is additional computational cost: the rendering system must access the higher-level coordinate system of the car and combine it with the space of each wheel in order to draw everything in its proper place. Local coordinates also afford digital designers a means around the finite limits of numerical representation. The tread marks on a tire, for example, can be described using millimeters by allowing the whole tire to occupy the entire range of numeric precision available. The larger aspects of the car, such as its frame, might be described in centimeters, and the terrain that the car travels on could be specified in meters. In differential topology, local coordinates on a manifold are defined by means of an atlas of charts. The basic idea behind coordinate charts is that each small patch of a manifold can be endowed with a set of local coordinates. These are collected together into an atlas, and stitched together in such a way that they are self-consistent on the manifold. In Cartography and Maps, the traditional way of works are local datum. With a local datum the land can be mapped on relative small areas as a country. With the need of global systems, the transformations on between datum became a problem, so geodetic datum have been created. More than 150 local datum have been used in the world.

    Read more →
  • DeepRoute.ai

    DeepRoute.ai

    DeepRoute.ai (Chinese: 元戎启行) is a Chinese autonomous driving company founded in 2019 and headquartered in Shenzhen, China. The company develops full-stack self-driving solutions including perception, decision-making, and control systems. == History == DeepRoute.ai was founded in February 2019 in Shenzhen, China, by Zhou Guang (周光), who serves as the company's CEO. In September 2019, the company collaborated with Dongfeng for a live-streamed autonomous driving demonstration. In October 2019, during the 7th Military World Games, DeepRoute.ai conducted Robotaxi demonstration operations. In November 2019, it obtained an intelligent connected vehicle road test permit for public roads in Shenzhen. In October 2020, DeepRoute.ai signed an "Autonomous Driving Leadership Project" with Dongfeng to build one of China's largest autonomous fleets. In August 2020, DeepRoute.ai announced its partnership with Cao Cao Mobility, a Geely-backed ride-hailing company, to test Robotaxis in Hangzhou for daily operations, planning to provide Robotaxis during the 2022 Asian Games. In September 2021, DeepRoute.ai secured US$300 million in a Series B funding round led by Alibaba. In December 2021, the company unveiled its DeepRoute-Driver 2.0, an L4-level autonomous driving solution comprising five solid-state lidar sensors, eight cameras, a proprietary computing system and an optional millimeter-wave radar. with a production cost of under US$10,000. In June 2022, it partnered with Deppon Express to provide autonomous light truck freight transfer services. In March 2023, the company launched its high-precision map-free intelligent driving solution, DeepRoute-Driver 3.0. In November 2024, Great Wall Motor announced a $100 million Series C funding round for Deeproute. With this, Deeproute has completed five rounds of financing, raising a cumulative total of over $500 million. Its shareholders include Fosun RZ Capital, Yunqi Partners, Alibaba, Vision Plus Capital, and Dongfeng, among others. In the same month, Deeproute.ai emphasised that they were in "deep cooperation" with Nvidia and spoke on being part of the first batch of companies in China to get a hold of Nvidia's newer Thor chip for cars which will be used in a new system released next year. This new system will help manage more complex driving scenarios through visual cues. == Products == === VLA Model === VLA Model is a Vision–language–action model designed for autonomous driving systems. It integrates visual perception, semantic understanding, and action decision-making into a unified framework, aiming to enhance the safety and adaptability of advanced driver-assistance systems (ADAS) in complex road environments. The model was officially launched on August 26, 2025, as the core of DeepRoute.ai's DeepRoute IO 2.0 platform. The VLA model is characterized by its "visual-language-action" architecture, which incorporates a chain-of-thought (CoT) reasoning capability inspired by large language models. This design is intended to address the "black box" limitations of traditional end-to-end autonomous driving systems by enabling the model to analyze information, infer causality, and make decisions in a more transparent and interpretable manner. === Appliance === The company has partnered with several automakers including Dongfeng Motor Corporation and Geely to develop and test autonomous vehicles.

    Read more →
  • Dark data

    Dark data

    Dark data is data which is acquired through various computer network operations but not used in any manner to derive insights or for decision making. The ability of an organisation to collect data can exceed the throughput at which it can analyse the data. In some cases the organisation may not even be aware that the data is being collected. IBM estimate that roughly 90 percent of data generated by sensors and analog-to-digital conversions never get used. In an industrial context, dark data can include information gathered by sensors and telematics. Organizations retain dark data for a multitude of reasons, and it is estimated that most companies are only analyzing 1% of their data. Often it is stored for regulatory compliance and record keeping. Some organizations believe that dark data could be useful to them in the future, once they have acquired better analytic and business intelligence technology to process the information. Because storage is inexpensive, storing data is easy. However, storing and securing the data usually entails greater expenses (or even risk) than the potential return profit. In academic discourse, the term dark data was essentially coined by Bryan P. Heidorn. He uses it to describe research data, especially from the long tail of science (the many, small research projects), which are not or no longer available for research because they disappear in a drawer without adequate data management. Without this, the data become dark, and further reasons for this are e.g. missing metadata annotation, missing data management plans and data curators. == Analysis == The term "dark data" very often refers to data that is not amenable to computer processing. For example, a company might have a great deal of data that exists only as scanned page-images. Even the bare text in such documents is not available without something like Optical character recognition, which can vary greatly in accuracy. Even with OCR, the significance of each part of the data is unavailable. An obvious examples is whether a capitalized word is a name or not, and if so, whether it represents a person, place, organization, or even a work of art. Bibliographic and other references, data within tables (that may be labeled quite adequately for humans, but not for processing), and countless assertions represented with the full complexity and ambiguity of human language. A lot of unused data is very valuable, and would be used if it could be; but is blocked because it is in formats that are difficult to process, categorise, identify, and analyse. Often the reason that business does not use their dark data is because of the amount of resources it would take and the difficulty of having that data analysed. In other words, the data is "dark" not because it is not used, but because it cannot (feasibly or affordably) be used, given its poor representation. There are many data representations that can make data much more accessible for automation. However, a great deal of information lacks any such identification of information items or relationships; and much more loses it during "downhill" conversion such as saving to page-oriented representations, printing, scanning, or faxing. The journey back "uphill" can be costly. According to Computer Weekly, 60% of organisations believe that their own business intelligence reporting capability is "inadequate" and 65% say that they have "somewhat disorganised content management approaches". == Relevance == Useful data may become dark data after it becomes irrelevant, as it is not processed fast enough. This is called "perishable insights" in "live flowing data". For example, if the geolocation of a customer is known to a business, the business can make offer based on the location, however if this data is not processed immediately, it may be irrelevant in the future. According to IBM, about 60 percent of data loses its value immediately. == Storage == According to the New York Times, 90% of energy used by data centres is wasted. If data was not stored, energy costs could be saved. Furthermore, there are costs associated with the underutilisation of information and thus missed opportunities. According to Datamation, "the storage environments of EMEA organizations consist of 54 percent dark data, 32 percent redundant, obsolete and trivial data and 14 percent business-critical data. By 2020, this can add up to $891 billion in storage and management costs that can otherwise be avoided." The continuous storage of dark data can put an organisation at risk, especially if this data is sensitive. In the case of a breach, this can result in serious repercussions. These can be financial, legal and can seriously hurt an organisation's reputation. For example, a breach of private records of customers could result in the stealing of sensitive information, which could result in identity theft. Another example could be the breach of the company's own sensitive information, for example relating to research and development. These risks can be mitigated by assessing and auditing whether this data is useful to the organisation, employing strong encryption and security and finally, if it is determined to be discarded, then it should be discarded in a way that it becomes unretrievable. == Future == It is generally considered that as more advanced computing systems for analysis of data are built, the higher the value of dark data will be. It has been noted that "data and analytics will be the foundation of the modern industrial revolution". Of course, this includes data that is currently considered "dark data" since there are not enough resources to process it. All this data that is being collected can be used in the future to bring maximum productivity and an ability for organisations to meet consumers' demand. Technology advancements are helping to leverage this dark data affordably. Furthermore, many organisations do not realise the value of dark data right now, for example in healthcare and education organisations deal with large amounts of data that could create a significant "potential to service students and patients in the manner in which the consumer and financial services pursue their target population".

    Read more →
  • Retained mode

    Retained mode

    Retained mode in computer graphics is a major pattern of API design in graphics libraries, in which the graphics library, instead of the client, retains the scene (complete object model of the rendering primitives) to be rendered and the client calls into the graphics library do not directly cause actual rendering, but make use of extensive indirection to resources, managed – thus retained – by the graphics library. It does not preclude the use of double-buffering. Immediate mode is an alternative approach. Historically, retained mode has been the dominant style in GUI libraries; however, both can coexist in the same library and are not necessarily exclusionary in practice. == Overview == In retained mode the client calls do not directly cause actual rendering, but instead update an abstract internal model (typically a list of objects) which is maintained within the library's data space. This allows the library to optimize when actual rendering takes place along with the processing of related objects. Some techniques to optimize rendering include: managing double buffering treatment of hidden surfaces by backface culling/occlusion culling (Z-buffering) only transferring data that has changed from one frame to the next from the application to the library Example of coexistence with immediate mode in the same library is OpenGL. OpenGL has immediate mode functions that can use previously defined server side objects (textures, vertex buffers and index buffers, shaders, etc.) without resending unchanged data. Examples of retained mode rendering systems include Windows Presentation Foundation, SceneKit on macOS, and PHIGS.

    Read more →
  • Micro stuttering

    Micro stuttering

    Micro stuttering is a visual artifact in real-time computer graphics in which the time intervals between consecutively displayed frames are uneven, even though the average frame rate reported by benchmarking software appears adequate. Tools such as 3DMark typically compute frame rates over intervals of one second or more, which can conceal momentary drops in the instantaneous frame rate that the viewer perceives as hitching or jerking of on-screen motion. At low frame rates the effect is visible as a stutter in moving images, degrading the experience in interactive applications such as video games. In severe cases a lower but more consistent frame rate can appear smoother than a higher but more erratic one. The term gained prominence in the late 2000s in discussions of multi-GPU rendering (see History), but micro stuttering also affects single-GPU systems. Common causes on modern hardware include real-time shader compilation, asset streaming from storage, VRAM exhaustion, and driver bugs. == Causes == === Shader compilation === A common cause of micro stuttering on modern PCs is real-time shader compilation. Shaders are small programs that instruct the GPU on how to render visual effects such as lighting, shadows, and reflections. On consoles, developers can pre-compile all shaders for the known, fixed hardware. On PCs, the variety of GPU architectures means shaders must often be compiled at run time, either when the game launches or during gameplay itself. When the rendering engine encounters a shader that has not yet been compiled, the CPU must finish the compilation before the GPU can draw the affected object. This causes a spike in frame time that the player perceives as a hitch. The problem has been particularly associated with games built on Unreal Engine 4 running under DirectX 12, because DX12 shifts more shader management responsibility to the application. Several techniques exist to reduce shader compilation stutter. Pipeline State Object (PSO) pre-caching records the shader permutations used at runtime so that they can be compiled in advance on subsequent launches. Asynchronous shader compilation moves the work to background CPU threads to avoid blocking the main rendering thread. Platform-level services such as Steam's shader pre-caching distribute previously compiled shaders to users with matching GPU hardware. The Steam Deck, which contains a single fixed GPU, benefits from pre-compiled shader caches because all units share the same hardware configuration. === Other causes === Micro stuttering on single-GPU systems can have several additional causes. CPU bottlenecks or scheduling interruptions from background tasks can prevent the processor from preparing frames at regular intervals. Asset streaming during gameplay (loading textures, geometry, or audio from storage) can produce hitches sometimes called traversal stutter; the use of solid-state drives and technologies such as DirectStorage has reduced but not eliminated this. VRAM exhaustion forces data to be swapped between video memory and system memory over the PCI Express bus, which is slower. Graphics driver bugs can also introduce stutter; Nvidia released hotfix driver 551.46 in February 2024 to correct intermittent micro stuttering when V-Sync was enabled. == Measurement == Micro stuttering drew attention to the limitations of average frame rate as a performance metric. In 2013, Scott Wasson at The Tech Report published a series of articles advocating frame time analysis, in which the delivery time of every individual frame is recorded and plotted rather than collapsed into a single frames-per-second figure. This approach was adopted by other hardware review publications in the following years. GPU reviews now routinely report 1% low and 0.1% low frame rates alongside the average. The 1% low is the average frame rate of the slowest 1% of frames in a sample; it serves as an indicator of worst-case smoothness. A large gap between the average and the 1% low suggests poor frame pacing. Tools for capturing per-frame timing data include FRAPS, PresentMon, OCAT, CapFrameX, and MSI Afterburner with RivaTuner Statistics Server. == Mitigation == === Frame pacing === Frame pacing is a software technique that regulates the timing of frame delivery to produce even intervals between displayed frames. Game engines, GPU drivers, and platform libraries all implement frame pacing strategies to varying degrees. On mobile platforms, Google provides the Android Frame Pacing library (Swappy) as part of the Android Game Development Kit. In December 2025, the Khronos Group published the VK_EXT_present_timing Vulkan extension, giving developers explicit control over presentation timing in a cross-platform graphics API for the first time. === Variable refresh rate === Variable refresh rate (VRR) display technologies allow a monitor's refresh rate to change to match the GPU's frame output. Implementations include Nvidia G-Sync (2013), AMD FreeSync (2015), and the VESA Adaptive-Sync standard built into DisplayPort 1.2a and later. VRR eliminates the screen tearing that results from a mismatch between frame rate and refresh rate, and avoids the frame-holding behaviour of V-Sync that can itself cause stutter. It is effective at smoothing moderate frame rate fluctuations but cannot compensate for large sudden spikes in frame time such as those caused by shader compilation or heavy asset streaming. VRR support has become standard in gaming monitors, televisions (via HDMI 2.1), and the Xbox Series X/S and PlayStation 5 consoles. === Frame generation === Beginning with DLSS 3 on the GeForce RTX 40 series in 2022, Nvidia introduced AI-based frame generation, which uses dedicated optical flow hardware and a neural network to create new frames between traditionally rendered ones. AMD followed with FSR 3 in 2023, using an algorithmic approach, and the AI-based FSR 4 for the Radeon RX 9000 series in 2025. DLSS 4, released in January 2025 for the GeForce RTX 50 series, can generate up to three frames per rendered frame using a technique called Multi Frame Generation. Frame generation increases the displayed frame rate but introduces its own frame pacing concerns. If the underlying rendered frames are unevenly timed, the interpolated frames can make the unevenness more apparent rather than less. DLSS 4 addresses this with hardware-level flip metering on the GPU's display engine, which controls the timing of frame presentation more precisely than the CPU-based pacing used in DLSS 3. Both vendors pair frame generation with latency-reduction features (Nvidia Reflex and AMD Anti-Lag+) to offset the additional input latency that results from inserting synthetic frames into the pipeline. === Frame rate limiters === Capping the frame rate below the display's maximum refresh rate, using tools such as RivaTuner Statistics Server, in-game limiters, or driver-level settings, is a common way to improve frame pacing. Preventing the GPU from running ahead of the display reduces variability in frame delivery times and can produce a smoother result than an uncapped but more irregular frame rate. == History == === Multi-GPU configurations === Micro stuttering was first widely documented in the late 2000s as a side effect of multi-GPU configurations using Alternate Frame Rendering (AFR), in which consecutive frames are assigned to alternating GPUs. Because each GPU may take a different amount of time to complete its assigned frame — due to varying scene complexity, driver scheduling, or inter-GPU communication overhead — the resulting frame delivery is irregular even when the average frame rate is high. Both Nvidia SLI and AMD CrossFireX were affected, with dual-GPU setups exhibiting the worst frame pacing irregularities. In 2012 benchmarks using Battlefield 3, dual Radeon HD 7970 cards in CrossFire showed 85% variation in frame delivery times compared with 7% for a single card, while dual GeForce GTX 680 cards in SLI showed only 7% variation compared with 5% for a single card. Multi-GPU micro stuttering became a significant factor in the eventual decline and discontinuation of consumer multi-GPU gaming. Nvidia restricted SLI to a handful of enthusiast-class cards from the GeForce 10 series onward, then replaced it with NVLink on the GeForce RTX 20 series, which saw limited gaming adoption. AMD ceased active CrossFire development around 2017. By the mid-2020s, neither vendor's current consumer GPUs support multi-GPU rendering for games. Other factors that contributed to the decline include DirectX 12 placing multi-GPU support in the hands of game developers rather than driver authors, the incompatibility of temporal anti-aliasing and other temporal rendering techniques with AFR, and the increasing size, power draw, and cost of individual GPUs. The third-party utility RadeonPro could reduce CrossFire micro stuttering through dynamic V-Sync and frame pacing adjustments, and AMD later introduced a driver-level frame paci

    Read more →
  • AI Mode

    AI Mode

    AI Mode is a search feature used within Google Search. In March 2025, Google introduced an experimental "AI Mode" within its search platform, enabling users to input complex, multi-part queries and receive comprehensive, AI-generated responses. This feature uses Google's Gemini model, which enhances the system's reasoning capabilities and supports multimodal inputs, including text, images, and voice. Users need to be signed in to be able to use the image generation features. Initially, AI Mode was available to Google One AI Premium subscribers in the United States, who could access it through the Search Labs platform. This phased rollout allowed Google to gather user feedback and refine the feature before a broader release.

    Read more →
  • Imieliński–Lipski algebra

    Imieliński–Lipski algebra

    In database theory, Imieliński–Lipski algebra is an extension of relational algebra onto tables with different types of null values. It is used to operate on relations with incomplete information. Imieliński–Lipski algebras are defined to satisfy precise conditions for semantically meaningful extension of the usual relational operators, such as projection, selection, union, and join, from operators on relations to operators on relations with various kinds of "null values". These conditions require that the system be safe in the sense that no incorrect conclusion is derivable by using a specified subset F of the relational operators; and that it be complete in the sense that all valid conclusions expressible by relational expressions using operators in F are in fact derivable in this system. For example, it is well known that the three-valued logic approach to deal with null values, supported treatment of nulls values by SQL is not complete, see Ullman book. To show this, let T be: Take SQL query Q SQL query Q will return empty set (no results) under 3-valued semantics currently adopted by all variants of SQL. This is the case because in SQL, NULL is never equal to any constant – in this case, neither to “Spring” nor “Fall” nor “Winter” (if there is Winter semester in this school). NULL='Spring' will evaluate to MAYBE and so will NULL='Fall'. The disjunction MAYBE OR MAYBE evaluates to MAYBE (not TRUE). Thus Igor will not be part of the answer (and of course neither will Rohit). But Igor should be returned as the answer. Indeed, regardless what semester Igor took the Networks class (no matter what was the unknown value of NULL), the selection condition will be true. This “Igor” will be missed by SQL and the SQL answer would be incomplete according to completeness requirements specified in Tomasz Imieliński, Witold Lipski, 'Incomplete Information in Relational Databases'. It is also argued there that 3-valued logic (TRUE, FALSE, MAYBE) can never provide guarantee of complete answer for tables with incomplete information. Three algebras which satisfy conditions of safety and completeness are defined as Imielinski–Lipski algebras: the Codd-Tables algebra, the V-tables algebra and the Conditional tables (C-tables) algebra. == Codd-tables algebra == Codd-tables algebra is based on the usual Codd's single NULL values. The table T above is an example of Codd-table. Codd-table algebra supports projection and positive selections only. It is also demonstrated in [IL84 that it is not possible to correctly extend more relational operators over Codd-Tables. For example, such basic operation as join is not extendable over Codd-tables. It is not possible to define selections with Boolean conditions involving negation and preserve completeness. For example, queries like the above query Q cannot be supported. In order to be able to extend more relational operators, more expressive form of null value representation is needed in tables which are called V-table. == V-tables algebra == V-tables algebra is based on many different ("marked") null values or variables allowed to appear in a table. V-tables allow to show that a value may be unknown but the same for different tuples. For example, in the table below Gaurav and Igor order the same (but unknown) beer in two unknown bars (which may, or may not be different – but remain unknown). Gaurav and Jane frequent the same unknown bar (Y1). Thus, instead one NULL value, we use indexed variables, or Skolem constants . V-tables algebra is shown to correctly support projection, positive selection (with no negation occurring in the selection condition), union, and renaming of attributes, which allows for processing arbitrary conjunctive queries. A very desirable property enjoyed by the V-table algebra is that all relational operators on tables are performed in exactly the same way as in the case of the usual relations. === Conditional tables (c-tables) algebra === Example of conditional table (c-table) is shown below. It has additional column “con” which is a Boolean condition involving variables, null values – same as in V-tables. over the following table c-table Conditional tables algebra, mainly of theoretical interest, supports projection, selection, union, join, and renaming. Under closed-world assumption, it can also handle the operator of difference, thus it can support all relational operators. == History == Imieliński–Lipski algebras were introduced by Tomasz Imieliński and Witold Lipski Jr. in Incomplete Information in Relational Databases.

    Read more →
  • Starlight Information Visualization System

    Starlight Information Visualization System

    Starlight is a software product originally developed at Pacific Northwest National Laboratory and now by Future Point Systems. It is an advanced visual analysis environment. In addition to using information visualization to show the importance of individual pieces of data by showing how they relate to one another, it also contains a small suite of tools useful for collaboration and data sharing, as well as data conversion, processing, augmentation and loading. The software, originally developed for the intelligence community, allows users to load data from XML files, databases, RSS feeds, web services, HTML files, Microsoft Word, PowerPoint, Excel, CSV, Adobe PDF, TXT files, etc. and analyze it with a variety of visualizations and tools. The system integrates structured, unstructured, geospatial, and multimedia data, offering comparisons of information at multiple levels of abstraction, simultaneously and in near real-time. In addition Starlight allows users to build their own named entity-extractors using a combination of algorithms, targeted normalization lists and regular expressions in the Starlight Data Engineer (SDE). As an example, Starlight might be used to look for correlations in a database containing records about chemical spills. An analyst could begin by grouping records according to the cause of the spill to reveal general trends. Sorting the data a second time, they could apply different colors based on related details such as the company responsible, age of equipment or geographic location. Maps and photographs could be integrated into the display, making it even easier to recognize connections among multiple variables. Starlight has been deployed to both the Iraq and Afghanistan wars and used on a number of large-scale projects. PNNL began developing Starlight in the mid-1990s, with funding from the Land Information Warfare Agency, a part of the Army Intelligence and Security Command and continued developed at the laboratory with funding from the NSA and the CIA. Starlight integrates visual representations of reports, radio transcripts, radar signals, maps and other information. The software system was recently honored with an R&D 100 Award for technical innovation. In 2006 Future Point Systems, a Silicon Valley startup, acquired rights to jointly develop and distribute the Starlight product in cooperation with the Pacific Northwest National Laboratory. The software is now also used outside of the military/intelligence communities in a number of commercial environments.

    Read more →