AI Data Visualization Tools

AI Data Visualization Tools — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Digital on-screen graphic

    Digital on-screen graphic

    A digital on-screen graphic, digitally originated graphic (DOG, bug, network bug, on-screen bug or screenbug) is a watermark-like station logo that most television broadcasters overlay over a portion of the screen area of their programs to identify the channel. They are thus a form of permanent visual station identification, increasing brand recognition and asserting ownership of the video signal. The graphic identifies the source of programming, even if it has been time-shifted or recorded. Many of these technologies allow viewers to skip or omit traditional between-programming station identification; thus the use of a DOG enables the station or network to enforce brand identification even when standard commercials are skipped. DOG watermarking helps to reduce off-the-air copyright infringement—for example, the distribution of a current series' episodes on DVD: the watermarked content is easily differentiated from "official" DVD releases, and can help identify not only the station from which the broadcast was captured, but usually the actual date of the broadcast as well. Graphics may be used to identify if the correct subscription is being used for a type of venue. For example, showing Sky Sports within a pub in the United Kingdom requires a more expensive subscription; a channel authorized under this subscription adds a pint glass graphic to the bottom of the screen for inspectors to see. The graphic changes at certain times, making it harder to counterfeit. On the other hand, watermarks pollute the picture, distract viewers' attention and may cover an important piece of information presented in the television program. Extremely bright watermarks may cause screen burn-in or image persistence on some types of television sets such as the now mostly discontinued and rarely used plasma and CRT displays, and currently commonly used OLED and LCD displays. Usage of visually perceptible embedded watermarks requires the program author to have a separate clean copy for archival purposes, but this practice was not common decades ago when watermarking became popular among broadcasters. Watermarks present an issue when archival videos are used for a documentary that strives to create a coherent story. In some cases, watermarks are blurred or digitally removed if possible to clean up the picture. In the absence of visually perceptible watermarks, content control can be ensured with visually imperceptible digital watermarks. In some cases, the graphic also shows the name of the current program. Some television networks may place additional logos or text alongside their DOG to advertise significant upcoming programs. For example, broadcasters of the Olympic Games (most notably United States broadcaster NBC) often add the Olympic rings to their DOG for a period of time leading up to and during the Games. == Usage == == Connections with sponsor tags == Another graphic on television usually connected with sports (particularly in North America, though not in Europe) is the sponsor tag. It shows the logos of certain sponsors, accompanied by some background relevant to the game, the network logo, announcement and music of some kind. == Usage in ham radio and television == In most countries, the ham station is required to periodically identify their amateur-television transmission. Such stations frequently overlay their callsign on the signal instead of placing a card in the background. Most hams use homebuilt devices or old consumer character generators to generate such identifications rather than using graphical superimposes of high cost to do so. Only rarely one can see real graphics, as the callsign is usually written in the "OSD font". == Live DOGs by hobbyists == One of the easiest and most sought-after devices used to generate DOGs by hobbyists is the 1980s vintage Sony XV-T500 video superimposer. This device can luma-key a signal, capture a still frame into memory and then overlay the keyed graphic in one of eight colors onto any CVBS signal. Another method commonly used by hobbyists and even low-budgeted television stations was Amiga computers with genlock interfaces.

    Read more →
  • Information school

    Information school

    Information school (sometimes abbreviated I-school or iSchool) is a university-level institution committed to understanding the role of information in nature and human endeavors. Synonyms include school of information, department of information studies, or information department. Information schools faculty conduct research into the fundamental aspects of information and related technologies. In addition to granting academic degrees, information schools educate information professionals, researchers, and scholars for an increasingly information-driven world. Information school can also refer, in a more restricted sense, to the members of the iSchools organization (formerly the "iSchools Project"), as governed by the iCaucus. Members of this group share a fundamental interest in the relationships between people, information, technology, and science. These schools, colleges, and departments have been either newly established or have evolved from programs focused on information systems, library science, informatics, computer science, library and information science and information science. Information schools promote an interdisciplinary approach to understanding the opportunities and challenges of information management, with a core commitment to concepts like universal access and user-centered organization of information. The field is concerned broadly with questions of design and preservation across information spaces, from digital and virtual spaces like online communities, the World Wide Web, and databases to physical spaces such as libraries, museums, archives, and other repositories. Information school degree programs include course offerings in areas such as data science, information architecture, design, economics, policy, retrieval, security, and telecommunications; knowledge management, user experience design, and usability; conservation and preservation, including digital preservation; librarianship and library administration; the sociology of information; and human–computer interaction.

    Read more →
  • Artificial intelligence in Indonesia

    Artificial intelligence in Indonesia

    Artificial intelligence in Indonesia refers to development, use and governance of artificial intelligence in Indonesia. Indonesia has treated AI as a national policy area through the Strategi Nasional Kecerdasan Artifisial or National Artificial Intelligence Strategy for 2020–2045. Public discussion has focused on the role of AI in sectors such as health, agriculture, education, mobile technology and e-commerce. Recent developments include AI ethics guidance issued by the communications ministry. Proposals for a national AI roadmap and sovereign AI fund, investment in cloud and AI infrastructure, and local-language AI initiatives for Bahasa Indonesia and regional Indonesian languages. == National strategy == Indonesia's National Artificial Intelligence Strategy is known in Indonesian as Strategi Nasional Kecerdasan Artifisial or Stranas KA. The strategy was published as a long-term framework for the development and use of AI between 2020 and 2045. It is intended to guide ministries, government agencies, regional governments and other stakeholders. The strategy identifies five priority sectors: health services, bureaucratic reform, education and research, food security, and mobility and smart cities. OECD lists the Ministry of Research and Technology and the National Research and Innovation Agency as organisations associated with the strategy. The strategy was developed through consultation with public and private stakeholders. == Institutions == The Indonesian Artificial Intelligence Industry Research and Innovation Collaboration, known as KORIKA is the nodal agency for the national AI strategy. KORIKA describes its vision as creating a collaborative ecosystem to accelerate implementation of the national AI strategy towards Vision Indonesia 2045. The Ministry of Communication and Digital Affairs has also been involved in AI governance, digital policy and public communication. In 2025, Reuters reported that the ministry was preparing a national AI roadmap to give investors and developers a clearer view of Indonesia's market, infrastructure and computing capacity. == AI Governance == Indonesia has introduced policy guidance on the ethical use of artificial intelligence. The policy sets out ethical values for the development and use of AI. These include humanity, security, transparency, credibility and accountability, personal data protection, sustainable development and intellectual property protection. A UNESCO country profile on Indonesia noted that Indonesia had adopted a national AI strategy and had policy frameworks. It also identified gaps in internet access, gender inclusion, language datasets, digital talent and cybersecurity. UNESCO recommended that Indonesia update its AI standards, invest in ethical AI, strengthen research coordination and consider establishing a national agency for artificial intelligence. In May 2026, Antara News reported comments by Deputy Minister of Communication and Digital Affairs Nezar Patria. Who said that AI safety requires partnerships, shared standards and continuing dialogue. == Sectors == AI policy discussions in Indonesia have identified health, agriculture, education, government services, mobility and smart cities as areas where AI could be applied. Mobile technology and e-commerce have been discussed as important areas of AI adoption in Indonesia. Research on AI adoption in Indonesia by Siddhartha Paul Tiwari and Adi Fahrudin has also examined mobile and e-commerce sectors. UNESCO has also noted that Indonesia's large digital economy and startup ecosystem have supported AI adoption, while also pointing to challenges in talent, research capacity and cybersecurity. Indonesia is one of the developing-country markets attracting AI infrastructure investment, including data centres. == Challenges == Indonesia faces several challenges in developing and governing AI. These include gaps in computing infrastructure, uneven connectivity outside major cities, shortages of skilled workers, limited research funding, cybersecurity risks, misinformation, data leaks and the underrepresentation of Indonesian and indigenous languages in AI datasets. UNESCO noted that Bahasa is spoken by around 200 million people but remains underrepresented in AI. It also noted that Indonesia has more than 700 indigenous languages, many of which face the risk of extinction. UNESCO recommended stronger coordination in AI research and a more unified strategy for using AI in language preservation.

    Read more →
  • BioCreative

    BioCreative

    BioCreAtIvE (A critical assessment of text mining methods in molecular biology) consists in a community-wide effort for evaluating information extraction and text mining developments in the biological domain. It was preceded by the Knowledge Discovery and Data Mining (KDD) Challenge Cup for detection of gene mentions. == Community Challenges == === First edition (2004-2005) === Three main tasks were posed at the first BioCreAtIvE challenge: the entity extraction task, the gene name normalization task, and the functional annotation of gene products task. The data sets produced by this contest serve as a Gold Standard training and test set to evaluate and train Bio-NER tools and annotation extraction tools. === Second edition (2006-2007) === The second BioCreAtIvE challenge (2006-2007) had also 3 tasks: detection of gene mentions, extraction of unique idenfiers for genes and extraction information related to physical protein-protein interactions. It counted with participation of 44 teams from 13 countries. === Third edition (2011-2012) === The third edition of BioCreative included for the first time the InterActive Task (IAT), designed to evaluate the practical usability of text mining tools in real-world biocuration tasks. === Fifth edition (2016) === BioCreative V had 5 different tracks, including an interactive task (IAT) for usability of text mining systems and a track using the BioC format for curating information for BioGRID.

    Read more →
  • Highway network

    Highway network

    In machine learning, the Highway Network was the first working very deep feedforward neural network with hundreds of layers, much deeper than previous neural networks. It uses skip connections modulated by learned gating mechanisms to regulate information flow, inspired by long short-term memory (LSTM) recurrent neural networks. The advantage of the Highway Network over other deep learning architectures is its ability to overcome or partially prevent the vanishing gradient problem, thus improving its optimization. Gating mechanisms are used to facilitate information flow across the many layers ("information highways"). Highway Networks have found use in text sequence labeling and speech recognition tasks. In 2014, the state of the art was training deep neural networks with 20 to 30 layers. Stacking too many layers led to a steep reduction in training accuracy, known as the "degradation" problem. In 2015, two techniques were developed to train such networks: the Highway Network (published in May), and the residual neural network, or ResNet (December). ResNet behaves like an open-gated Highway Net. == Model == The model has two gates in addition to the H ( W H , x ) {\displaystyle H(W_{H},x)} gate: the transform gate T ( W T , x ) {\displaystyle T(W_{T},x)} and the carry gate C ( W C , x ) {\displaystyle C(W_{C},x)} . The latter two gates are non-linear transfer functions (specifically sigmoid by convention). The function H {\displaystyle H} can be any desired transfer function. The carry gate is defined as: C ( W C , x ) = 1 − T ( W T , x ) {\displaystyle C(W_{C},x)=1-T(W_{T},x)} while the transform gate is just a gate with a sigmoid transfer function. == Structure == The structure of a hidden layer in the Highway Network follows the equation: y = H ( x , W H ) ⋅ T ( x , W T ) + x ⋅ C ( x , W C ) = H ( x , W H ) ⋅ T ( x , W T ) + x ⋅ ( 1 − T ( x , W T ) ) {\displaystyle {\begin{aligned}y=H(x,W_{H})\cdot T(x,W_{T})+x\cdot C(x,W_{C})\\=H(x,W_{H})\cdot T(x,W_{T})+x\cdot (1-T(x,W_{T}))\end{aligned}}} == Related work == Sepp Hochreiter analyzed the vanishing gradient problem in 1991 and attributed to it the reason why deep learning did not work well. To overcome this problem, Long Short-Term Memory (LSTM) recurrent neural networks have residual connections with a weight of 1.0 in every LSTM cell (called the constant error carrousel) to compute y t + 1 = F ( x t ) + x t {\textstyle y_{t+1}=F(x_{t})+x_{t}} . During backpropagation through time, this becomes the residual formula y = F ( x ) + x {\textstyle y=F(x)+x} for feedforward neural networks. This enables training very deep recurrent neural networks with a very long time span t. A later LSTM version published in 2000 modulates the identity LSTM connections by so-called "forget gates" such that their weights are not fixed to 1.0 but can be learned. In experiments, the forget gates were initialized with positive bias weights, thus being opened, addressing the vanishing gradient problem. As long as the forget gates of the 2000 LSTM are open, it behaves like the 1997 LSTM. The Highway Network of May 2015 applies these principles to feedforward neural networks. It was reported to be "the first very deep feedforward network with hundreds of layers". It is like a 2000 LSTM with forget gates unfolded in time, while the later Residual Nets have no equivalent of forget gates and are like the unfolded original 1997 LSTM. If the skip connections in Highway Networks are "without gates," or if their gates are kept open (activation 1.0), they become Residual Networks. The residual connection is a special case of the "short-cut connection" or "skip connection" by Rosenblatt (1961) and Lang & Witbrock (1988) which has the form x ↦ F ( x ) + A x {\displaystyle x\mapsto F(x)+Ax} . Here the randomly initialized weight matrix A does not have to be the identity mapping. Every residual connection is a skip connection, but almost all skip connections are not residual connections. The original Highway Network paper not only introduced the basic principle for very deep feedforward networks, but also included experimental results with 20, 50, and 100 layers networks, and mentioned ongoing experiments with up to 900 layers. Networks with 50 or 100 layers had lower training error than their plain network counterparts, but no lower training error than their 20 layers counterpart (on the MNIST dataset, Figure 1 in ). No improvement on test accuracy was reported with networks deeper than 19 layers (on the CIFAR-10 dataset; Table 1 in ). The ResNet paper, however, provided strong experimental evidence of the benefits of going deeper than 20 layers. It argued that the identity mapping without modulation is crucial and mentioned that modulation in the skip connection can still lead to vanishing signals in forward and backward propagation (Section 3 in ). This is also why the forget gates of the 2000 LSTM were initially opened through positive bias weights: as long as the gates are open, it behaves like the 1997 LSTM. Similarly, a Highway Net whose gates are opened through strongly positive bias weights behaves like a ResNet. The skip connections used in modern neural networks (e.g., Transformers) are dominantly identity mappings.

    Read more →
  • Algorithmic Puzzles

    Algorithmic Puzzles

    Algorithmic Puzzles is a book of puzzles based on computational thinking. It was written by computer scientists Anany and Maria Levitin, and published in 2011 by Oxford University Press. == Topics == The book begins with a "tutorial" introducing classical algorithm design techniques including backtracking, divide-and-conquer algorithms, and dynamic programming, methods for the analysis of algorithms, and their application in example puzzles. The puzzles themselves are grouped into three sets of 50 puzzles, in increasing order of difficulty. A final two chapters provide brief hints and more detailed solutions to the puzzles, with the solutions forming the majority of pages of the book. Some of the puzzles are well known classics, some are variations of known puzzles making them more algorithmic, and some are new. They include: Puzzles involving chessboards, including the eight queens puzzle, knight's tours, and the mutilated chessboard problem Balance puzzles River crossing puzzles The Tower of Hanoi Finding the missing element in a data stream The geometric median problem for Manhattan distance == Audience and reception == The puzzles in the book cover a wide range of difficulty, and in general do not require more than a high school level of mathematical background. William Gasarch notes that grouping the puzzles only by their difficulty and not by their themes is actually an advantage, as it provides readers with fewer clues about their solutions. Reviewer Narayanan Narayanan recommends the book to any puzzle aficionado, or to anyone who wants to develop their powers of algorithmic thinking. Reviewer Martin Griffiths suggests another group of readers, schoolteachers and university instructors in search of examples to illustrate the power of algorithmic thinking. Gasarch recommends the book to any computer scientist, evaluating it as "a delight".

    Read more →
  • DPVweb

    DPVweb

    DPVweb is a database for virologists working on plant viruses combining taxonomic, bioinformatic and symptom data. == Description == DPVweb is a central web-based source of information about viruses, viroids and satellites of plants, fungi and protozoa. It provides comprehensive taxonomic information, including brief descriptions of each family and genus, and classified lists of virus sequences. It makes use of a large database that also holds detailed, curated, information for all sequences of viruses, viroids and satellites of plants, fungi and protozoa that are complete or that contain at least one complete gene. There are currently about 10,000 such sequences. For comparative purposes, DPVweb also contains a representative sequence of all other fully sequenced virus species with an RNA or single-stranded DNA genome. For each curated sequence the database contains the start and end positions of each feature (gene, non-translated region, etc.), and these have been checked for accuracy. As far as possible, the nomenclature for genes and proteins are standardized within genera and families. Sequences of features (either as DNA or amino acid sequences) can be directly downloaded from the website in FASTA format. The sequence information can also be accessed via client software for personal computers. == History == The Descriptions of Plant Viruses (DPVs) were first published by the Association of Applied Biologists in 1970 as a series of leaflets, each one written by an expert describing a particular plant virus. In 1998 all of the 354 DPVs published in paper were scanned, and converted into an electronic format in a database and distributed on CDROM. In 2001 the descriptions were made available on the new DPVweb site, providing open access to the now 400+ DPVs (currently 415) as well as taxonomic and sequence data on all plant viruses. == Uses == DPVweb is an aid to researchers in the field of plant virology as well as an educational resource for students of virology and molecular biology. The site provides a single point of access for all known plant virus genome sequences making it easy to collect these sequences together for further analysis and comparison. Sequence data from the DPVweb database have proved valuable for a number of projects: survey of codon usage bias amongst all plant viruses, two-way comparisons between comprehensive sets of sequences from the families Flexiviridae and Potyviridae that have helped inform taxonomy and clarify genus and species discrimination criteria, a survey and verification of the polyprotein cleavage sites within the family Potyviridae.

    Read more →
  • Seismological Facility for the Advancement of Geoscience

    Seismological Facility for the Advancement of Geoscience

    The U.S. National Science Foundation's Seismological Facility for the Advancement of Geoscience (NSF SAGE) is a distributed, multi-user national facility that provides support for state of-the-art seismic research. It is operated by EarthScope Consortium. Its previous operator was the Incorporated Research Institutions for Seismology (IRIS), until its merger with UNAVCO to become EarthScope Consortium. NSF SAGE is one of the two premier geophysical facilities in support of geoscience and geoscience education of the National Science Foundation. The other premiere geophysical facility is NSF GAGE, the Geodetic Facility for the Advancement of Geoscience. The services of the facility include support for the Global Seismographic Network (GSN), Data Services, and instrument support via the EarthScope Primary Instrument Center (EPIC), including magnetotelluric (MT) geophysical research. == Global Seismographic Network (GSN) == NSF SAGE manages 40 stations of the 152-station Global Seismographic Network (GSN) for basic global seismicity and Earth structure research. The GSN also enables earthquake hazard mission-related data operations such as: Earthquake location and characterization Tsunami warning Nuclear explosion monitoring == Data Services == SAGE Data Services (DS) is the largest facility for the archiving, curation, and distribution of seismological and other geophysical data in the world. == EarthScope Primary Instrument Center (EPIC) == The EPIC facility maintains the largest open access, shared-use pool of portable seismic sensors in the world. It is located on the campus of New Mexico Tech. == MT == NSF SAGE provides instruments for magnetotelluric (MT) or electromagnetic geophysical research for the recording of our planet's ambient electric and magnetic fields, which allow for the characterization of the conductivity of the area consisting of the shallow crust to upper mantle. This helps with analysis of results obtained from seismic imaging methodologies. The NSF SAGE facility is: Developing open source MT data formatting and processing software. Providing access to proprietary software products.

    Read more →
  • Ayoba

    Ayoba

    Ayoba is an African communication platform developed in South Africa. It is owned by Progressive Tech Holdings in Mauritius and managed by SIMFY Africa. Launched on May 4, 2019, as of April 2024, it has over 35 million active users. == History == Ayoba was first published on Google Play in February 2019. Its first marketing campaign and brand launch took place in Cameroon on May 4, 2019. In June 2019, the platform introduced its first eight channels. In November 2019, the platform reached one million active users, which increased to two million by June 2020. Subsequently, ayoba expanded its services, including the launch of games for Android in February 2020, Momo (Mobile Money) in Cameroon in May 2020, and MicroApps in May 2020. It also launched music and voice and video calling features in 12 territories in August 2020. The first version of ayoba for iOS was released in September 2020. In December of the same year, games and Messaging 2.0 were launched on the platform. In November 2020, it won Best Mobile Application at the African Digital Awards. In 2021, it won OTT Brand of the Year at the Marketing World Awards in Ghana. In December 2022, it received Top Innovative Technology and Telecom Product of the Year at the National Communications Awards in December 2022. In June 2023 ayoba partnered with BoomPlay and as of April 2024, it had 35 million monthly active users. Ayoba has partnered with Jumia Ghana to offer exclusive deals to users. Ayoba users can get a 10% discount on selected Jumia purchases through the app, with no data charges for MTN users. This partnership aims to make online shopping more affordable and accessible by integrating Jumia's offers into the ayoba app. Ayoba supports over 35 million users across Africa and provides services in 22 languages. To access the deals, users can download the ayoba app from the Google Play Store, iOS Store, or the official website. == Platform features == Chat, Call and Share: ayoba enables instant messaging, voice notes, picture sharing, and file sharing with contacts, even if they do not have the app installed. The app supports voice and video calls on both Android and iOS, as well as group chats, help channel and SMS continuity (non ayoba users receive messages as SMS, their responses appear in the ayoba app). Music: ayoba offers a free music player with daily updates on international and African music. Users can find playlists for different genres. Games: ayoba provides a selection of interactive games, including action, adventure, and children's games available on both Android and iOS. Mobile Money Transfers: In certain territories, ayoba supports mobile money transfers using MTN Mobile Money (MoMo) for transactions within the app. MicroApps: ayoba features individual MicroApps within the platform that offer content and services, including streaming channels, podcasts, and specialized apps. The availability of these apps may vary by country. == Operations == ayoba primarily focuses on the following territories: Nigeria, Cameroon, South Africa, Ghana, Côte d'Ivoire, Uganda, Republic of Congo, Benin, Zambia, Tanzania, Kenya, Senegal, Togo, Guinea Bissau, Guinea Conakry, Sudan, South Sudan, and Liberia. The company operates from its offices in Cape Town and Johannesburg, South Africa. David Gillaranz served as the CEO from 2019 to 2021, and Burak Akinci has been the CEO since 2021.

    Read more →
  • Algorithmic mechanism design

    Algorithmic mechanism design

    Algorithmic mechanism design (AMD) lies at the intersection of economic game theory, optimization, and computer science. The prototypical problem in mechanism design is to design a system for multiple self-interested participants, such that the participants' self-interested actions at equilibrium lead to good system performance. Typical objectives studied include revenue maximization and social welfare maximization. Algorithmic mechanism design differs from classical economic mechanism design in several respects. It typically employs the analytic tools of theoretical computer science, such as worst case analysis and approximation ratios, in contrast to classical mechanism design in economics which often makes distributional assumptions about the agents. It also considers computational constraints to be of central importance: mechanisms that cannot be efficiently implemented in polynomial time are not considered to be viable solutions to a mechanism design problem. This often, for example, rules out the classic economic mechanism, the Vickrey–Clarke–Groves auction. == History == Noam Nisan and Amir Ronen first coined "Algorithmic mechanism design" in a research paper published in 1999.

    Read more →
  • Operational system

    Operational system

    An operational system is a term used in data warehousing to refer to a system that is used to process the day-to-day transactions of an organization. These systems are designed in a manner that processing of day-to-day transactions is performed efficiently and the integrity of the transactional data is preserved. == Synonyms == Sometimes operational systems are referred to as operational databases, transaction processing systems, or online transaction processing systems (OLTP). However, the use of the last two terms as synonyms may be confusing, because operational systems can be batch processing systems as well. Any enterprise must necessarily maintain a lot of data about its operation.

    Read more →
  • Synthetic data

    Synthetic data

    Synthetic data are artificially generated data not produced by real-world events. Typically created using algorithms, synthetic data can be deployed to validate mathematical models and to train machine learning models. Data generated by a computer simulation can be seen as synthetic data. This encompasses most applications of physical modeling, such as music synthesizers or flight simulators. The output of such systems approximates the real thing, but is fully algorithmically generated. Synthetic data is used in a variety of fields as a filter for information that would otherwise compromise the confidentiality of particular aspects of the data. In many sensitive applications, datasets theoretically exist but cannot be released to the general public; synthetic data sidesteps the privacy issues that arise from using real consumer information without permission or compensation. == Usefulness == Synthetic data is generated to meet specific needs or certain conditions that may not be found in the original, real data. One of the hurdles in applying up-to-date machine learning approaches for complex scientific tasks is the scarcity of labeled data, a gap effectively bridged by the use of synthetic data, which closely replicates real experimental data. This can be useful when designing many systems, from simulations based on theoretical value, to database processors, etc. This helps detect and solve unexpected issues such as information processing limitations. Synthetic data are often generated to represent the authentic data and allows a baseline to be set. Another benefit of synthetic data is to protect the privacy and confidentiality of authentic data, while still allowing for use in testing systems. Computer security experts claim generated synthetic data "... enables us to create realistic behavior profiles for users and attackers. The data is used to train the fraud detection system itself, thus creating the necessary adaptation of the system to a specific environment." In defense and military contexts, synthetic data is seen as a potentially valuable tool to develop and improve complex AI systems, particularly in contexts where high-quality real-world data is scarce. At the same time, synthetic data together with the testing approach can give the ability to model real-world scenarios. == History == Scientific modelling of physical systems has a long history that runs concurrent with the history of physics. For example, research into synthesis of audio and voice can be traced back to the 1930s and before, driven forward by the developments of the telephone and audio recording technologies. Digitization gave rise to software synthesizers from the 1970s onwards. In the context of privacy-preserving statistical analysis, in 1993, the idea of original fully synthetic data was created by Donald Rubin. Rubin originally designed this to synthesize the Decennial Census long form responses for the short form households. He then released samples that did not include any actual long form records - in this he preserved anonymity of the household. Later that year, the idea of original partially synthetic data was created by Little. Little used this idea to synthesize the sensitive values on the public use file. A 1993 work fitted a statistical model to 60,000 MNIST digits, then it was used to generate over 1 million examples. Those were used to train a LeNet-4 to reach state of the art performance. In 1994, Stephen Fienberg introduced 'critical refinement', in which a parametric posterior predictive distribution (instead of a Bayes bootstrap) is used to do the sampling. Later, other important contributors to the development of synthetic data generation were Trivellore Raghunathan, Jerry Reiter, Donald Rubin, John M. Abowd, and Jim Woodcock. Collectively they came up with a solution for how to treat partially synthetic data with missing data. Similarly, they developed the technique of Sequential Regression Multivariate Imputation. == Calculations == Researchers test the framework on synthetic data, which is "the only source of ground truth on which they can objectively assess the performance of their algorithms". Synthetic data can be generated through the use of random lines, having different orientations and starting positions. Datasets can get fairly complicated. A more complicated dataset can be generated by using a synthesizer build. To create a synthesizer build, first use the original data to create a model or equation that fits the data the best. This model or equation will be called a synthesizer build. This build can be used to generate more data. Constructing a synthesizer build involves constructing a statistical model. In a linear regression line example, the original data can be plotted, and a best fit linear line can be created from the data. This line is a synthesizer created from the original data. The next step will be generating more synthetic data from the synthesizer build or from this linear line equation. In this way, the new data can be used for studies and research, and it protects the confidentiality of the original data. David Jensen from the Knowledge Discovery Laboratory explains how to generate synthetic data: "Researchers frequently need to explore the effects of certain data characteristics on their data model." To help construct datasets exhibiting specific properties, such as auto-correlation or degree disparity, proximity can generate synthetic data having one of several types of graph structure: random graphs that are generated by some random process; lattice graphs having a ring structure; lattice graphs having a grid structure, etc. In all cases, the data generation process follows the same process: Generate the empty graph structure. Generate attribute values based on user-supplied prior probabilities. Since the attribute values of one object may depend on the attribute values of related objects, the attribute generation process assigns values collectively. == Applications == === Fraud detection and confidentiality systems === Testing and training fraud detection and confidentiality systems are devised using synthetic data. Specific algorithms and generators are designed to create realistic data, which then assists in teaching a system how to react to certain situations or criteria. For example, intrusion detection software is tested using synthetic data. This data is a representation of the authentic data and may include intrusion instances that are not found in the authentic data. The synthetic data allows the software to recognize these situations and react accordingly. If synthetic data was not used, the software would only be trained to react to the situations provided by the authentic data and it may not recognize another type of intrusion. === Scientific research === Researchers doing clinical trials or any other research may generate synthetic data to aid in creating a baseline for future studies and testing. Real data can contain information that researchers may not want released, so synthetic data is sometimes used to protect the privacy and confidentiality of a dataset. Using synthetic data reduces confidentiality and privacy issues since it holds no personal information and cannot be traced back to any individual. Beyond privacy protection, synthetic data is also being explored for methodological innovation in drug development. For instance, synthetic data may be used to construct synthetic control arms as an alternative to conventional external control arms based on real-world data (RWD) or randomized controlled trials (RCTs). Collectively, regulatory agencies such as the FDA and EMA appear to be at various stages of recognizing and integrating AI-generated synthetic data into their methodologies. While there is growing consensus on the potential of such data to support model development and the broader lifecycle of medicinal products, to date no drug or medical device has been approved using solely or predominantly synthetic data—particularly not as a comparator arm generated entirely via data-driven algorithms. The quality and statistical handling of synthetic data are expected to become more prominent in future regulatory discussions, particularly in contexts such as predictive modeling (e.g., digital twins), where innovative approaches have already been referenced. === Machine learning === Synthetic data is increasingly being used for machine learning applications: a model is trained on a synthetically generated dataset with the intention of transfer learning to real data. Efforts have been made to enable more data science experiments via the construction of general-purpose synthetic data generators, such as the Synthetic Data Vault. In general, synthetic data has several natural advantages: once the synthetic environment is ready, it is fast and cheap to produce as much data as needed; synthetic data can have perfectly accurate labels, including labeling that may be very expensive or impo

    Read more →
  • T-pose

    T-pose

    In computer animation, a T-pose is a default posing for a humanoid 3D model's skeleton before it is animated. It is called so because of its shape: the straight legs and arms of a humanoid model combine to form a capital letter T. When the arms are angled downwards, the pose is sometimes referred to as an A-pose instead. Likewise, if the arms are angled upward, it is called a Y-pose. Generic terms encompassing all these (especially for non-humanoid models) include bind pose, blind pose, and reference pose. == Usage == The T-pose is primarily used as the default armature pose for skeletal animation in 3D software, which is then manipulated to create animation. The purpose of the T-pose relates to the important elements of the body being axis-aligned, thereby making it easier to rig the model for animation, physics, and other controls. Depending on the exact geometry of the model, other poses such as the A-pose may be more suitable for vertex deformation around areas such as the shoulders. Outside of being default poses in animation software, T-poses are typically used as placeholders for animation not yet completed, particularly in 3D animated video games. In some motion capture software, a T-pose must be assumed by the actor in the motion capture suit before motion capturing can begin. There are other poses used, but the T-pose is the most common one. == As an Internet meme == Starting in 2016 and resurfacing in 2017, the T-pose has become a widespread Internet meme due to its bizarre and somewhat comedic appearance, especially in video game glitches where a character's animation is unexpectedly supplanted by a T-pose. In a prerelease video of the game NBA Elite 11, the demo was filled with glitches, notably one unintentionally showing a T-pose in place of the proper animation for the model of player Andrew Bynum. The glitch later gained fame as the "Jesus Bynum glitch". Publisher EA eventually cancelled the game as they found it unsatisfactory. A similar occurrence happened with Cyberpunk 2077. In the 2023 Formula One season, driver George Russell performed a T-pose in the opening credits of the series' TV broadcasts. This quickly became a meme within the motorsports community. Russell repeated the pose after claiming pole position at the 2024 Canadian Grand Prix and winning the 2024 Austrian Grand Prix.

    Read more →
  • Information

    Information

    Information is an abstract concept that refers to something which has the power to inform. At the most fundamental level, it pertains to the interpretation (perhaps formally) of that which may be sensed, or their abstractions. Any natural process that is not completely random and any observable pattern in any medium can be said to convey some amount of information. Whereas digital signals and other data use discrete signs to convey information, other phenomena and artifacts such as analogue signals, poems, pictures, music or other sounds, and currents convey information in a more continuous form. Information is not knowledge itself, but the meaning that may be derived from a representation through interpretation. The concept of information is relevant to and connected with various concepts, including constraint, communication, control, data, form, education, knowledge, meaning, understanding, mental stimuli, pattern, perception, proposition, representation, and entropy. Information is often processed iteratively: Data available at one step are processed into information to be interpreted and processed at the next step. For example, in written text each symbol or letter conveys information relevant to the word it is part of, each word conveys information relevant to the phrase it is part of, each phrase conveys information relevant to the sentence it is part of, and so on until at the final step information is interpreted and becomes knowledge in a given domain. In a digital signal, bits may be interpreted into the symbols, letters, numbers, or structures that convey the information available at the next level up. The key characteristic of information is that it is subject to interpretation and processing. The derivation of information from a signal or message may be thought of as the resolution of ambiguity or uncertainty that arises during the interpretation of patterns within the signal or message. Information may be structured as data. Redundant data can be compressed up to an optimal size, which is the theoretical limit of compression. The information available through a collection of data may be derived by analysis. For example, a restaurant collects data from every customer order. That information may be analyzed to produce knowledge that is put to use when the business subsequently wants to identify the most popular or least popular dish. Information can be transmitted in time, via data storage, and space, via communication and telecommunication. Information is expressed either as the content of a message or through direct or indirect observation. That which is perceived can be construed as a message in its own right, and in that sense, all information is always conveyed as the content of a message. Information can be encoded into various forms for transmission and interpretation (for example, information may be encoded into a sequence of signs, or transmitted via a signal). It can also be encrypted for safe storage and communication. The uncertainty of an event is measured by its probability of occurrence. Uncertainty is proportional to the negative logarithm of the probability of occurrence. Information theory takes advantage of this by concluding that more uncertain events require more information to resolve their uncertainty. The bit is the standard unit of information. It is 'that which reduces uncertainty by half'. Other units such as the nat may be used. For example, the information encoded in one "fair" coin flip is log2(2/1) = 1 bit, and in two fair coin flips is log2(4/1) = 2 bits. A 2011 Science article estimates that 97% of technologically stored information was already in digital bits in 2007 and that the year 2002 was the beginning of the digital age for information storage (with digital storage capacity bypassing analogue for the first time). == Etymology and history of the concept == The English word "information" comes from Middle French enformacion/informacion/information 'a criminal investigation' and its etymon, Latin informatiō(n) 'conception, teaching, creation'. In English, "information" is an uncountable mass noun. References on "formation or molding of the mind or character, training, instruction, teaching" date from the 14th century in both English (according to Oxford English Dictionary) and other European languages. In the transition from Middle Ages to Modernity the use of the concept of information reflected a fundamental turn in epistemological basis – from "giving a (substantial) form to matter" to "communicating something to someone". Peters (1988, pp. 12–13) concludes: Information was readily deployed in empiricist psychology (though it played a less important role than other words such as impression or idea) because it seemed to describe the mechanics of sensation: objects in the world inform the senses. But sensation is entirely different from "form" – the one is sensual, the other intellectual; the one is subjective, the other objective. My sensation of things is fleeting, elusive, and idiosyncratic. For Hume, especially, sensory experience is a swirl of impressions cut off from any sure link to the real world... In any case, the empiricist problematic was how the mind is informed by sensations of the world. At first informed meant shaped by; later it came to mean received reports from. As its site of action drifted from cosmos to consciousness, the term's sense shifted from unities (Aristotle's forms) to units (of sensation). Information came less and less to refer to internal ordering or formation, since empiricism allowed for no preexisting intellectual forms outside of sensation itself. Instead, information came to refer to the fragmentary, fluctuating, haphazard stuff of sense. Information, like the early modern worldview in general, shifted from a divinely ordered cosmos to a system governed by the motion of corpuscles. Under the tutelage of empiricism, information gradually moved from structure to stuff, from form to substance, from intellectual order to sensory impulses. In the modern era, the most important influence on the concept of information is derived from the Information theory developed by Claude Shannon and others. This theory, however, reflects a fundamental contradiction. Northrup (1993) wrote: Thus, actually two conflicting metaphors are being used: The well-known metaphor of information as a quantity, like water in the water-pipe, is at work, but so is a second metaphor, that of information as a choice, a choice made by :an information provider, and a forced choice made by an :information receiver. Actually, the second metaphor implies that the information sent isn't necessarily equal to the information received, because any choice implies a comparison with a list of possibilities, i.e., a list of possible meanings. Here, meaning is involved, thus spoiling the idea of information as a pure "Ding an sich." Thus, much of the confusion regarding the concept of information seems to be related to the basic confusion of metaphors in Shannon's theory: is information an autonomous quantity, or is information always per SE information to an observer? Actually, I don't think that Shannon himself chose one of the two definitions. Logically speaking, his theory implied information as a subjective phenomenon. But this had so wide-ranging epistemological impacts that Shannon didn't seem to fully realize this logical fact. Consequently, he continued to use metaphors about information as if it were an objective substance. This is the basic, inherent contradiction in Shannon's information theory." (Northrup, 1993, p. 5). In their seminal book The Study of Information: Interdisciplinary Messages, Almach and Mansfield (1983) collected key views on the interdisciplinary controversy in computer science, artificial intelligence, library and information science, linguistics, psychology, and physics, as well as in the social sciences. Almach (1983, p. 660) himself disagrees with the use of the concept of information in the context of signal transmission, the basic senses of information in his view all referring "to telling something or to the something that is being told. Information is addressed to human minds and is received by human minds." All other senses, including its use with regard to nonhuman organisms as well to society as a whole, are, according to Machlup, metaphoric and, as in the case of cybernetics, anthropomorphic. Hjørland (2007) describes the fundamental difference between objective and subjective views of information and argues that the subjective view has been supported by, among others, Bateson, Yovits, Span-Hansen, Brier, Buckland, Goguen, and Hjørland. Hjørland provided the following example: A stone on a field could contain different information for different people (or from one situation to another). It is not possible for information systems to map all the stone's possible information for every individual. Nor is any one mapping the one "true" mapping. But peop

    Read more →
  • Online public access catalog

    Online public access catalog

    The online public access catalog (OPAC), now frequently synonymous with library catalog, is an online database of materials held by a library or group of libraries. Online catalogs have largely replaced the analog card catalogs previously used in libraries. == History == === Early online === Although a handful of experimental systems existed as early as the 1960s, the first large-scale online catalogs were developed at Ohio State University in 1975 and the Dallas Public Library in 1978. These and other early online catalog systems tended to closely reflect the card catalogs that they were intended to replace. Using a dedicated terminal or telnet client, users could search a handful of pre-coordinate indexes and browse the resulting display in much the same way they had previously navigated the card catalog. Throughout the 1980s, the number and sophistication of online catalogs grew. The first commercial systems appeared, and would by the end of the decade largely replace systems built by libraries themselves. Library catalogs began providing improved search mechanisms, including Boolean and keyword searching, as well as ancillary functions, such as the ability to place holds on items that had been checked-out. At the same time, libraries began to develop applications to automate the purchase, cataloging, and circulation of books and other library materials. These applications, collectively known as an integrated library system (ILS) or library management system, included an online catalog as the public interface to the system's inventory. Most library catalogs are closely tied to their underlying ILS system. === Stagnation and dissatisfaction === The 1990s saw a relative stagnation in the development of online catalogs. Although the earlier character-based interfaces were replaced with ones for the Web, both the design and the underlying search technology of most systems did not advance much beyond that developed in the late 1980s. At the same time, organizations outside of libraries began developing more sophisticated information retrieval systems. Web search engines like Google and popular e-commerce websites such as Amazon.com provided simpler to use (yet more powerful) systems that could provide relevancy ranked search results using probabilistic and vector-based queries. Prior to the widespread use of the Internet, the online catalog was often the first information retrieval system library users ever encountered. Now accustomed to web search engines, newer generations of library users have grown increasingly dissatisfied with the complex (and often arcane) search mechanisms of older online catalog systems. This has, in turn, led to vocal criticisms of these systems within the library community itself, and in recent years to the development of newer (often termed 'next-generation') catalogs. === Next-generation catalogs === Newer generations of library catalog systems, typically called discovery systems (or a discovery layer), are distinguished from earlier OPACs by their use of more sophisticated search technologies, including relevancy ranking and faceted search, as well as features aimed at greater user interaction and participation with the system, including tagging and reviews. These new features rely heavily on existing metadata which may be poor or inconsistent, particularly for older records. Newer catalog platforms may be independent of the organization's integrated library system (ILS), instead providing drivers that allow for the synchronization of data between the two systems. While the original online catalog interfaces were almost exclusively built by ILS vendors, libraries have increasingly sought next-generation catalogs built by enterprise search companies and open-source software projects, often led by libraries themselves. == Union catalogs == Although library catalogs typically reflect the holdings of a single library, they can also contain the holdings of a group or consortium of libraries. These systems, known as union catalogs, are usually designed to aid the borrowing of books and other materials among the member institutions via interlibrary loan. Examples of this type of catalogs include COPAC, SUNCAT, NLA Trove, and WorldCat—the last catalogs the collections of libraries worldwide. == Related systems == There are a number of systems that share much in common with library catalogs, but have traditionally been distinguished from them. Libraries utilize these systems to search for items not traditionally covered by a library catalog, although these systems are sometimes integrated into a more comprehensive discovery system. Bibliographic databases—such as Medline, ERIC, PsycINFO, Scopus, Web of Science, and many others—index journal articles and other research data. There are also a number of applications aimed at managing documents, photographs, and other digitized or born-digital items such as Digital Commons and DSpace. Particularly in academic libraries, these systems (often known as digital library systems or institutional repository systems) assist with efforts to preserve documents created by faculty and students. Electronic resource management helps librarians to track selection, acquisition, and licensing of a library's electronic information resources.

    Read more →