AI Data Explorer Servicenow

AI Data Explorer Servicenow — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • VoxForge

    VoxForge

    VoxForge is a free speech corpus and acoustic model repository for open source speech recognition engines. VoxForge was set up to collect transcribed speech to create a free GPL speech corpus in order to be uses with open source speech recognition engines. The speech audio files will be 'compiled' into acoustic models for use with open source speech recognition engines such as Julius, ISIP, and Sphinx and HTK (note: HTK has distribution restrictions). VoxForge has used LibriVox as a source of audio data since 2007.

    Read more →
  • ARMA International

    ARMA International

    ARMA International (formerly the Association of Records Managers and Administrators) is an American not-for-profit professional association for information professionals – primarily information management (including records management) and information governance, and related industry practitioners and vendors. The association provides educational opportunities and publications covering aspects of information management broadly. == History == The Association was founded in 1955. In 1975, the Association of Records Executives and Administrators (AREA) and the American Records Management Association merged to form ARMA International. The headquarters for ARMA International is located in Overland Park, Kansas. == Operations == ARMA International services professionals in the United States, Canada, Japan, and the United Kingdom. Its members include records managers, attorneys, information technology professionals, consultants, and archivists involved in various aspects of managing records and information assets. ARMA hosts an annual conference with the goal of bringing together record and information management professionals from around the world – In 2023, ARMA hosted conferences in both the United States and Canada. Topics addressed in the 120+ educational sessions include advanced technology, creating information structure, ediscovery and information law, information management fundamentals, information project management, and reducing organizational information risk. The expo features exhibitors displaying records and information technologies, products, and services.

    Read more →
  • Library and information scientist

    Library and information scientist

    A library and information scientist, also known as a library scholar, is a researcher or academic who specializes in the field of library and information science and often participates in scholarly writing about and related to library and information science. A library and information scientist is neither limited to any one subfield of library and information science nor any one particular type of library. These scientists come from all information-related sectors including library and book history. == University of Chicago Graduate Library School == The University of Chicago Graduate Library School was established in 1928 to grant a graduate degree in librarianship with an emphasis on research. The program expanded the concept of librarianship, focused on scientific inquiry and established it as a domain for scientific study. In The Spirit of Inquiry: The Graduate Library School at Chicago, 1921-51 Richardson reviewed the history of the School and its impact on the discipline. == Bibliometric mappings == Bibliometric methods have been used to create maps of library and information science, thus identifying the most important researchers as well as their relative connections (or distances) and identifying emerging trends related to LIS publications within the field. White and McCain (1998) made a map of information science and Åström (2002), Chen, Ibekwe-SanJuan, and Hou (2010), Janssens, Leta, Glanzel, and De Moor (2006), and Zhao and Strotmann (2008) constructed some later maps of library and information science. Jabeen, Yun, Rafiq, and Jabeen (2015) mapped the growth and trends of LIS publications. == Notable library and information scientists == See also Beta Phi Mu Award, Award of Merit - Association for Information Science and Technology, Justin Winsor Prize (library)

    Read more →
  • Artificial intelligence in marketing

    Artificial intelligence in marketing

    Artificial intelligence marketing (AI marketing) is a form of marketing that uses artificial intelligence concepts and models such as machine learning, natural language processing, and computer vision to achieve marketing goals. The main difference between AI marketing and traditional forms of marketing reside in the reasoning, which is performed through a computer algorithm rather than a human. Each form of marketing has a different technique to the core of the marketing theory. Traditional marketing directly focuses on the needs of consumers; meanwhile some believe the shift AI may cause will lead marketing agencies to manage consumer needs instead. AI is used in various digital marketing spaces, such as content marketing, email marketing, online advertisement (in combination with machine learning), social media marketing, affiliate marketing, and beyond. == Historical development == AI in marketing has a long history, which goes all the way back to the 1980s. At this time, AI research was focusing on expert systems and robotics. Despite the initial research and the studies that were carried out, AI adoption remained limited. Research on it came to a stop for a while, until research was revived two decades later with the advancement in technology, the rise of big data, and a significant increase in computational power. Eventually, AI became very popular in the marketing world, and caught the eyes of many researchers as well as professionals. A large‐scale bibliometric study covering 1,580 peer‑reviewed papers published between 1982 and 2020 confirms that scholarly output on AI in marketing has surged since 2017, with Expert Systems with Applications emerging as the most prolific outlet. Prior to the application of artificial Intelligence in marketing, there was something called "collaborative filtering". This was used as early as 1998 by Amazon, and one of the first ways companies predicted consumer behavior, which enabled millions of recommendations to different customers. Personalized recommender systems are now widely used, for example to suggest music on Spotify, or TV shows on Netflix. A big milestone in AI marketing happened in 2014, when programmatic ad buying gained much greater popularity. Marketing consists of numerous manual tasks such as researching target markets, insertion orders, and managing high budgets as well as prices. In order to cut costs, and remove the need for these tedious tasks, many companies started to automate the marketing process with AI. In 2015, Google introduced RankBrain, a machine learning component of its search algorithm designed to interpret the intent behind user queries. RankBrain was followed by further AI-based search updates, including BERT in 2019, which improved the understanding of conversational queries, and the Multitask Unified Model (MUM) in 2021, which is multimodal and processes information across 75 languages. These advances shifted search engine optimization practice away from keyword matching toward content that satisfies user intent. Artificial intelligence is increasingly used in marketing to personalize user experiences and automate decision-making. For example, Netflix uses AI algorithms to recommend content based on viewing history, while Sephora employs chatbots to assist customers with product selection and availability. Programmatic advertising platforms like Google Ads leverage machine learning to optimize bidding strategies and target audiences more effectively. These applications demonstrate how AI enhances efficiency, engagement, and conversion rates across digital channels. === Artificial neural networks === An artificial neural network is a form of computer program modeled on the brain and nervous system of humans. Neural networks are composed of a series of interconnected processing neurons that function in unison to achieve certain outcomes. Using “human-like trial and error learning methods neural networks detect patterns existing within a data set ignoring data that is not significant while emphasizing the data which is most influential”. From a marketing perspective, neural networks are a form of software tool used to assist in decision making. Neural networks are effective in gathering and extracting information from large data sources and have the ability to identify cause and effect within tha data. These neural nets through the process of learning, identify relationships and connections between databases. Once knowledge has been accumulated, neural networks can be relied on to provide generalizations and can apply past knowledge and learning to a variety of situations. Neural networks help fulfill the role of marketing companies through effectively aiding in market segmentation and measurement of performance while reducing costs and improving accuracy. Due to their learning ability, flexibility, adaption, and knowledge discovery, neural networks offer many advantages over traditional models. Neural networks can be used to assist in pattern classification, forecasting and marketing analysis. == Tools and uses == Classification of customers can be facilitated through the neural network approach allowing companies to make informed marketing decisions. An example of this was employed by Spiegel Inc., a firm dealing in direct-mail operations that used neural networks to improve efficiencies. Using software developed by NeuralWare Inc., Spiegel identified the demographics of customers who had made a single purchase and those customers who had made repeat purchases. Neural networks where then able to identify the key patterns and consequently identify the customers that were most likely to repeat purchase. Understanding this information allowed Spiegel to streamline marketing efforts, and reduced costs. Sales forecasting “is the process of estimating future events with the goal of providing benchmarks for monitoring actual performance and reducing uncertainty". Artificial intelligence techniques have emerged to facilitate the process of forecasting through increasing accuracy in the areas of demand for products, distribution, employee turnover, performance measurement, and inventory control. An example of forecasting using neural networks is the Airline Marketing Assistant/Tactician; an application developed by BehabHeuristics which allows for the forecasting of passenger demand and consequent seat allocation through neural networks. This system has been used by National air Canada and USAir. Neural networks provide a useful alternative to traditional statistical models due to their reliability, time-saving characteristics and ability to recognize patterns from incomplete or noisy data. Examples of marketing analysis systems includes the Target Marketing System developed by Churchull Systems for Veratex Corporation. This support system scans a market database to identify dormant customers allowing management to make decisions regarding which key customers to target. When performing marketing analysis, neural networks can assist in the gathering and processing of information ranging from consumer demographics and credit history to the purchase patterns of consumers. Predictive analytics is a form of analytics involving the use of historical data and artificial intelligence algorithms to predict future trends and outcomes. It serves as a tool for anticipating and understanding user behavior based on patterns found in data. Predictive analytics uses artificial intelligence machine learning algorithms to recognize and predict patterns within data. Machine learning algorithms analyze the data, recognize patterns, and make predictions through continuous learning and adaptation. Predictive analytics is widely used across businesses and industries as a way to identify opportunities, avoid risks, and anticipate customer needs based on information derived from the analysis of user data. By analyzing historical customer data, artificial intelligence algorithms can deliver relevant and targeted marketing content. Recent systematic reviews show that generative large‑language models such as GPT‑3 and GPT‑4 are now routinely embedded in predictive‑analytics pipelines to mine unstructured market data and anticipate customer intent with greater precision. Personalization engines use artificial intelligence and machine learning to provide content or advertisements that are relevant to the user. User data is gathered, which then gets processed with machine learning, and patterns and trends among the users are identified. Users with shared characteristics or behaviors are then segmented into groups, and the personalization engine adjusts content and advertisements to match each segment's preferences. By processing a large amount of data, personalization engines are able to match users to advertisements and recommendations that align with their interests or preferences. Field evidence from consumer‑goods and electronics firms indicates that AI‑driven personalization can raise

    Read more →
  • JAX (software)

    JAX (software)

    JAX is a Python library for accelerator-oriented array computation and program transformation, designed for high-performance numerical computing and large-scale machine learning. It is developed by Google with contributions from Nvidia and other community contributors. It is described as bringing together a modified version of the automatic differentiation system autograd and OpenXLA's XLA (Accelerated Linear Algebra). It is designed to follow the structure and workflow of NumPy as closely as possible and works with various existing frameworks such as TensorFlow and PyTorch. The primary features of JAX are: Providing a unified NumPy-like interface to computations that run on CPU, GPU, or TPU, in local or distributed settings. Built-in Just-In-Time (JIT) compilation via OpenXLA, an open-source machine learning compiler ecosystem. Efficient evaluation of gradients via its automatic differentiation transformations. Automatic vectorization to efficiently map functions over arrays representing batches of inputs. == Libraries using Jax == Flax Equinox Optax

    Read more →
  • Information behavior

    Information behavior

    Information behavior is a field of information science research that seeks to understand the way people search for and use information in various contexts. It can include information seeking and information retrieval, but it also aims to understand why people seek information and how they use it. The term 'information behavior' was coined by Thomas D. Wilson in 1982 and sparked controversy upon its introduction. The term has now been adopted and Wilson's model of information behavior is widely cited in information behavior literature. In 2000, Wilson defined information behavior as "the totality of human behavior in relation to sources and channels of information". A variety of theories of information behavior seek to understand the processes that surround information seeking. An analysis of the most cited publications on information behavior during the early 21st century shows its theoretical nature. Information behavior research can employ various research methodologies grounded in broader research paradigms from psychology, sociology and education. In 2003, a framework for information-seeking studies was introduced that aims to guide the production of clear, structured descriptions of research objects and positions information-seeking as a concept within information behavior. == Concepts of information behavior == === Information need === Information need is a concept introduced by Wilson. Understanding the information need of an individual involved three elements: Why the individual decides to look for information, What purpose the information they find will serve, and How the information is used once it is retrieved === Information-seeking behavior === Information-seeking behavior is a more specific concept of information behavior. It specifically focuses on searching, finding, and retrieving information. Information-seeking behavior research can focus on improving information systems or, if it includes information need, can also focus on why the user behaves the way they do. A review study on information search behavior of users highlighted that behavioral factors, personal factors, product/service factors and situational factors affect information search behavior. Information-seeking behavior can be more or less explicit on the part of users: users might seek to solve some task or to establish some piece of knowledge which can be found in the data in question, or alternatively the search process itself is part of the objective of the user, in use cases for exploring visual content or for familiarising oneself with the content of an information service. In the general case, information-seeking needs to be understood and analysed as a session rather than as a one-off transaction with a search engine, and in a broader context which includes user high-level intentions in addition to the immediate information need. === Information use === An information need is the recognition that a gap exists in one’s knowledge, prompting a desire to seek information to fill that gap. It often arises when a person encounters a problem or question they cannot resolve with their current understanding. === Information poverty and barriers === Introduced by Elfreda Chatman in 1987, information poverty is informed by the understanding that information is not equally accessible to all people. Information poverty does not describe a lack of information, but rather a worldview in which one's own experiences inside their own small world may create a distrust in the information provided by those outside their own lived experiences. == Metatheories == In Library and Information Science (LIS), a metatheory is described "a set of assumptions that orient and direct theorizing about a given phenomenon". Library and information science researchers have adopted a number of different metatheories in their research. A common concern among LIS researchers, and a prominent discussion in the field, is the broad spectrum of theories that inform the study of information behavior, information users, or information use. This variation has been noted as a cause of concern because it makes individual studies difficult to compare or synthesize if they are not guided by the same theory. This sentiment has been expressed in studies of information behavior literature from the early 1980s and more recent literature reviews have declared it necessary to refine their reviews to specific contexts or situations due to the sheer breadth of information behavior research available. Below are descriptions of some, but not all, metatheories that have guided LIS research. === Cognitivist approach === A cognitive approach to understanding information behavior is grounded in psychology. It holds the assumption that a person's thinking influences how they seek, retrieve, and use information. Researchers that approach information behavior with the assumption that it is influenced by cognition, seek to understand what someone is thinking while they engage in information behavior and how those thoughts influence their behavior. Wilson's attempt to understand information-seeking behavior by defining information need includes a cognitive approach. Wilson theorizes that information behavior is influenced by the cognitive need of an individual. By understanding the cognitive information need of an individual, we may gain insight into their information behavior. Nigel Ford takes a cognitive approach to information-seeking, focusing on the intellectual processes of information-seeking. In 2004, Ford proposed an information-seeking model using a cognitive approach that focuses on how to improve information retrieval systems and serves to establish information-seeking and information behavior as concepts in and of themselves, rather than synonymous terms. === Constructionist approach === The constructionist approach to information behavior has roots in the humanities and social sciences. It relies on social constructionism, which assumes that a person's information behavior is influenced by their experiences in society. In order to understand information behavior, constructionist researchers must first understand the social discourse that surrounds the behavior. The most popular thinker referenced in constructionist information behavior research is Michel Foucault, who famously rejected the concept of a universal human nature. The constructionist approach to information behavior research creates space for contextualizing the behavior based on the social experiences of the individual. One study that approaches information behavior research through the social constructionist approach is a study of the information behavior of a public library knitting group. The authors use a collectivist theory to frame their research, which denies the universality of information behavior and focuses on "understanding the ways that discourse communities collectively construct information needs, seeking, sources, and uses". === Constructivist approach === The constructivist approach is born out of education and sociology in which, "individuals are seen as actively constructing an understanding of their worlds, heavily influenced by the social world(s) in which they are operating". Constructivist approaches to information behavior research generally treat the individual's reality as constructed within their own mind rather than built by the society in which they live. The constructivist metatheory makes space for the influence of society and culture with social constructivism, "which argues that, while the mind constructs reality in its relationship to the world, this mental process is significantly informed by influences received from societal conventions, history and interaction with significant others". == Theories == A common concern among LIS researchers, and a prominent discussion in the field, is the broad spectrum of theories that inform LIS research. This variation has been noted as a cause of concern because it makes individual studies difficult to compare if they are not guided by the same theory. Recent studies have shown that the impact of these theories and theoretical models is very limited. LIS researchers have applied concepts and theories from many disciplines, including sociology, psychology, communication, organizational behavior, and computer science. === Wilson's theory of information behavior (1981) === The term was coined by Thomas D. Wilson in his 1981 paper, on the grounds that the current term, 'information needs' was unhelpful since 'need' could not be directly observed, while how people behaved in seeking information could be observed and investigated. However, there is increasing work in the information-searching field that is relating behaviors to underlying needs. In 2000, Wilson described information behavior as the totality of human behavior in relation to sources and channels of information, including both active and passive information-seeking, and information use. He described info

    Read more →
  • Artificial intelligence in India

    Artificial intelligence in India

    The artificial intelligence (AI) market in India is projected to reach $8 billion by 2025, growing at 40% CAGR from 2020 to 2025. This growth is part of the broader AI boom, a global period of rapid technological advancements with India being pioneer starting in the early 2010s with NLP based Chatbots from Haptik, Corover.ai, Niki.ai and then gaining prominence in the early 2020s based on reinforcement learning, marked by breakthroughs such as generative AI models from Krutrim, Sarvam, CoRover, OpenAI and Alphafold by Google DeepMind. In India, the development of AI has been similarly transformative, with applications in healthcare, finance, and education, bolstered by government initiatives like NITI Aayog's 2018 National Strategy for Artificial Intelligence. Institutions such as the Indian Statistical Institute and the Indian Institute of Science published breakthrough AI research papers and patents. India's transformation to AI is primarily being driven by startups and government initiatives & policies like Digital India. By fostering technological trust through digital public infrastructure, India is tackling socioeconomic issues by taking a bottom-up approach to AI. NASSCOM and Boston Consulting Group estimate that by 2027, India's AI services might be valued at $17 billion. According to 2025 Technology and Innovation Report, by UN Trade and Development, India ranks 10th globally for private sector investments in AI. According to Mary Meeker, India has emerged as a key market for AI platforms, accounting for the largest share of ChatGPT's mobile app users and having the third-largest user base for DeepSeek in 2025. While AI presents significant opportunities for economic growth and social development in India, challenges such as data privacy concerns, skill shortages, and ethical considerations need to be addressed for responsible AI deployment. The growth of AI in India has also led to an increase in the number of cyberattacks that use AI to target organizations. == History == === Early days (1960s-1980s) === The TIFRAC (Tata Institute of Fundamental Research Automatic Calculator) was designed and developed by a team led by Rangaswamy Narasimhan between 1954 and 1960. He worked on pattern recognition from 1961 to 1964 at the University of Illinois Urbana-Champaign's Digital Computer Laboratory. In order to conduct research on database technology, computer networking, computer graphics, and systems software, he and M. G. K. Menon founded the National Centre for Software Development and Computing Techniques. In 1965, he established the Computer Society of India and supervised the initial research work on AI at Tata Institute of Fundamental Research. Jagdish Lal launched the first computer science program in 1976 at Motilal Nehru Regional Engineering College. H. K. Kesavan from the University of Waterloo and Vaidyeswaran Rajaraman from the University of Wisconsin–Madison joined the IIT Kanpur Electrical Engineering Department in 1963–1964 as Assistant Professor and Head of Department, respectively. H.N. Mahabala, who was employed at Bendix Corporation's Computer Division, joined the department in 1965. He previously worked with Marvin Minsky. The IIT Kanpur Computer Center was led by H. K. Kesavan, with Vaidyeswaran Rajaraman serving as his deputy. Kesavan informally permitted Rajaraman and Mahabala to introduce artificial intelligence into computer science classes. The computer science program was approved by IIT Kanpur in 1971 and split out from the electrical engineering department. In 1973, an IBM System/370 Model 155 was installed at IIT Madras. John McCarthy, head of the Artificial Intelligence Laboratory at Stanford University visited IIT Kanpur in 1971. He donated PDP-1 with a time-sharing operating system. During the 1970s, the balance of payments deficit in India restricted import of computers. The Department of Computer Science and Automation at the Indian Institute of Science established in 1969, played an important role in nurturing the development of data science and artificial intelligence in India. First course on AI was introduced in the 1970s by G. Krishna. B. L. Deekshatulu introduced the first course on pattern recognition in the early 1970s. === Foundation phase === ==== 1980s ==== In the 1980s, the Indian Statistical Institute's Optical Character Recognition Project was one of the country's first attempts at studying artificial intelligence and machine learning. OCR technology has benefited greatly from the work of ISI's Computer Vision and Pattern Recognition Unit, which is headed by Bidyut Baran Chaudhuri. He also contributed in the development of computer vision and digital image processing. As part of the Indian Fifth Generation Computer Systems Research Programme, the Department of Electronics, with support from the United Nations Development Programme, initiated the Knowledge Based Computer Systems Project in 1986, marking the beginning of India's first major AI research program. Prime Minister Rajiv Gandhi requested that the Department of Electronics and IISc to initiate the Parallel Processing Project in 1986–1987. The Center for Development of Advanced Computing eventually joined those efforts. IIT Madras was selected to develop system diagnosis, ISI for image processing, National Centre for Software Technology for natural language processing and TIFR for speech processing. In 1987, the proposal of N. Seshagiri, Director General of the National Informatics Centre for the prototype development of supercomputer was cleared. Negotiations for a Cray supercomputer were underway between the Reagan administration and the Rajiv Gandhi government. US Defense Secretaries Frank Carlucci and Caspar Weinberger visited New Delhi after the US approved the transfer in 1988. The sale of a lower-end XMP-14 supercomputer was permitted in lieu of the Cray XMP-24 supercomputer due to security concerns. The Center for Development of Advanced Computing was formally established in March 1988 by the Ministry of Communications and Information Technology (previously the Ministry of IT) within the Department of Information Technology (formerly the Department of Electronics) in response to a recommendation made to the Prime Minister by the Scientific Advisory Council. The National Initiative in Supercomputing, which produced the PARAM series, was led by Vijay P. Bhatkar. For the first ten years, supercomputing and Indian language computing were the two main focus areas. C-DAC has expanded its operations in order to meet the needs in a number of domains, including network and internet software, real-time systems, artificial intelligence, and NLP. Under the direction of Professor KV Ramakrishnamacharyulu from National Sanskrit University and Professor Rajeev Sangal from the International Institute of Information Technology, Hyderabad, the Akshar Bharati Research Group was established in 1984 with support from IIT Kanpur and the University of Hyderabad for computational processing of Indian languages. They focused on computational linguistics, NLP with ontological database systems, and Indian language/translation theories with linguistic tradition. ==== 1990s ==== From IIT Kanpur, Mohan Tambe joined C-DAC in the 1990s to work on Graphics and Intelligence based Script Technology (GIST), which addressed the challenge of adapting personal computer software based on Latin script to Devanagiri and a number of other Indian language scripts. He was previously working on the Machine Translation for Indian languages Project. Within C-DAC, he established the GIST group. The technology was expanded to encompass NLP, artificial intelligence-based machine-aided language learning and translation, multimedia and multilingual computing solutions, and more. GIST resulted in the creation of G-CLASS (GIST cross language search plug-ins suite), a cross-language search engine. The Applied Artificial Intelligence Group at C-DAC has developed some basic and novel applications in the field of NLP, including machine translation, information extraction/retrieval, automatic summarization, speech recognition, text-to-speech synthesis, intelligent language teaching, and natural language-based document management with Decision Support Systems. These applications are the result of the foundation laid by previous language technology activities. Software firms in the Indian private sector began looking into AI applications, mostly in the area of business process automation. In order to allow machines to read, comprehend, and interpret human languages, the Language Technologies Research Center was founded in October 1999 at the International Institute of Information Technology, Hyderabad. It focused on the advancements in semantic parsing, information extraction, natural language generation, sentiment analysis, and dialogue systems. Some of the early AI research in India was driven by societal needs. For example; Eklavya, a knowledge-based program created by I

    Read more →
  • Two-phase commit protocol

    Two-phase commit protocol

    In transaction processing, databases, and computer networking, the two-phase commit protocol (2PC, tupac) is a type of atomic commitment protocol (ACP). It is a distributed algorithm that coordinates all the processes that participate in a distributed atomic transaction on whether to commit or abort (roll back) the transaction. This protocol (a specialised type of consensus protocol) achieves its goal even in many cases of temporary system failure (involving either process, network node, communication, etc. failures), and is thus widely used. However, it is not resilient to all possible failure configurations, and in rare cases, manual intervention is needed to remedy an outcome. To accommodate recovery from failure (automatic in most cases) the protocol's participants use logging of the protocol's states. Log records, which are typically slow to generate but survive failures, are used by the protocol's recovery procedures. Many protocol variants exist that primarily differ in logging strategies and recovery mechanisms. Though usually intended to be used infrequently, recovery procedures compose a substantial portion of the protocol, due to many possible failure scenarios to be considered and supported by the protocol. In a "normal execution" of any single distributed transaction (i.e., when no failure occurs, which is typically the most frequent situation), the protocol consists of two phases: The commit-request phase (or voting phase), in which a coordinator process attempts to prepare all the transaction's participating processes (named participants, cohorts, or workers) to take the necessary steps for either committing or aborting the transaction and to vote, either "Yes": commit (if the transaction participant's local portion execution has ended properly), or "No": abort (if a problem has been detected with the local portion), and The commit phase, in which, based on voting of the participants, the coordinator decides whether to commit (only if all have voted "Yes") or abort the transaction (otherwise), and notifies the result to all the participants. The participants then follow with the needed actions (commit or abort) with their local transactional resources (also called recoverable resources; e.g., database data) and their respective portions in the transaction's other output (if applicable). The two-phase commit (2PC) protocol should not be confused with the two-phase locking (2PL) protocol, a concurrency control protocol. == Assumptions == The protocol works in the following manner: one node is a designated coordinator, which is the master site, and the rest of the nodes in the network are designated the participants. The protocol assumes that: there is stable storage at each node with a write-ahead log, no node crashes forever, the data in the write-ahead log is never lost or corrupted in a crash, and any two nodes can communicate with each other. The last assumption is not too restrictive, as network communication can typically be rerouted. The first two assumptions are much stronger; if a node is totally destroyed then data can be lost. The protocol is initiated by the coordinator after the last step of the transaction has been reached. The participants then respond with an agreement message or an abort message depending on whether the transaction has been processed successfully at the participant. == Basic algorithm == === Commit request (or voting) phase === The coordinator sends a query to commit message to all participants and waits until it has received a reply from all participants. The participants execute the transaction up to the point where they will be asked to commit. They each write an entry to their undo log and an entry to their redo log. Each participant replies with: either an agreement message (participant votes Yes to commit), if the participant's actions succeeded; or an abort message (participant votes No to commit), if the participant experiences a failure that will make it impossible to commit. === Commit (or completion) phase === ==== Success ==== If the coordinator received an agreement message from all participants during the commit-request phase: The coordinator sends a commit message to all the participants. Each participant completes the operation, and releases all the locks and resources held during the transaction. Each participant sends an acknowledgement to the coordinator. The coordinator completes the transaction when all acknowledgements have been received. ==== Failure ==== If any participant votes No during the commit-request phase (or the coordinator's timeout expires): The coordinator sends a rollback message to all the participants. Each participant undoes the transaction using the undo log, and releases the resources and locks held during the transaction. Each participant sends an acknowledgement to the coordinator. The coordinator undoes the transaction when all acknowledgements have been received. ==== Message flow ==== Coordinator Participant QUERY TO COMMIT --------------------------------> VOTE YES/NO prepare/abort <------------------------------- commit/abort COMMIT/ROLLBACK --------------------------------> ACKNOWLEDGEMENT commit/abort <-------------------------------- end An next to the record type means that the record is forced to stable storage. == Disadvantages == The greatest disadvantage of the two-phase commit protocol is that it is a blocking protocol. If the coordinator fails permanently, some participants will never resolve their transactions: After a participant has sent an agreement message as a response to the commit-request message from the coordinator, it will block until a commit or rollback is received. A two-phase commit protocol cannot dependably recover from a failure of both the coordinator and a cohort member during the commit phase. If only the coordinator had failed, and no cohort members had received a commit message, it could safely be inferred that no commit had happened. If, however, both the coordinator and a cohort member failed, it is possible that the failed cohort member was the first to be notified, and had actually done the commit. Even if a new coordinator is selected, it cannot confidently proceed with the operation until it has received an agreement from all cohort members, and hence must block until all cohort members respond. == Implementing the two-phase commit protocol == === Common architecture === In many cases the 2PC protocol is distributed in a computer network. It is easily distributed by implementing multiple dedicated 2PC components similar to each other, typically named transaction managers (TMs; also referred to as 2PC agents or Transaction Processing Monitors), that carry out the protocol's execution for each transaction (e.g., The Open Group's X/Open XA). The databases involved with a distributed transaction, the participants, both the coordinator and participants, register to close TMs (typically residing on respective same network nodes as the participants) for terminating that transaction using 2PC. Each distributed transaction has an ad hoc set of TMs, the TMs to which the transaction participants register. A leader, the coordinator TM, exists for each transaction to coordinate 2PC for it, typically the TM of the coordinator database. However, the coordinator role can be transferred to another TM for performance or reliability reasons. Rather than exchanging 2PC messages among themselves, the participants exchange the messages with their respective TMs. The relevant TMs communicate among themselves to execute the 2PC protocol schema above, "representing" the respective participants, for terminating that transaction. With this architecture the protocol is fully distributed (does not need any central processing component or data structure), and scales up with number of network nodes (network size) effectively. This common architecture is also effective for the distribution of other atomic commitment protocols besides 2PC, since all such protocols use the same voting mechanism and outcome propagation to protocol participants. === Protocol optimizations === Database research has been done on ways to get most of the benefits of the two-phase commit protocol while reducing costs by protocol optimizations and protocol operations saving under certain system's behavior assumptions. ==== Presumed abort and presumed commit ==== Presumed abort or Presumed commit are common such optimizations. An assumption about the outcome of transactions, either commit, or abort, can save both messages and logging operations by the participants during the 2PC protocol's execution. For example, when presumed abort, if during system recovery from failure no logged evidence for commit of some transaction is found by the recovery procedure, then it assumes that the transaction has been aborted, and acts accordingly. This means that it does not matter if aborts are logged at all, and such logging can be saved under this assumption. Typical

    Read more →
  • Rclone

    Rclone

    Rclone is an open source, multi threaded, command line computer program to manage or migrate content on cloud and other high latency storage. Its capabilities include sync, transfer, crypt, cache, union, compress and mount. The rclone website lists supported backends including S3 and Google Drive. Descriptions of rclone often carry the strapline "Rclone syncs your files to cloud storage". Those prior to 2020 include the alternative "Rsync for Cloud Storage". Rclone is well known for its rclone sync and rclone mount commands. It provides further management functions analogous to those ordinarily used for files on local disks, but which tolerate some intermittent and unreliable service. Rclone is commonly used with media servers such as Plex, Emby or Jellyfin to stream content direct from consumer file storage services. Official Ubuntu, Debian, Fedora, Gentoo, Arch, Brew, Chocolatey, and other package managers include rclone. == History == Nick Craig-Wood was inspired by rsync. Concerns about the noise and power costs arising from home computer servers prompted him to embrace cloud storage and he began developing rclone as open source software in 2012 under the name swiftsync. Rclone was promoted to stable version 1.00 in July 2014. In May 2017, Amazon Drive barred new users of rclone and other upload utilities, citing security concerns. Amazon Drive had been advertised as offering unlimited storage for £55 per year. Amazon's AWS S3 service continues to support new rclone users. The original rclone logo was updated in September 2018. In March 2020, Nick Craig-Wood resigned from Memset Ltd, a cloud hosting company he founded, to focus on open source software. Amazon's AWS April 2020 public sector blog explained how the Fred Hutch Cancer Research Center were using rclone in their Motuz tool to migrate very large biomedical research datasets in and out of AWS S3 object stores. In November 2020, rclone was updated to correct a weakness in the way it generated passwords. Passwords for encrypted remotes can be generated randomly by rclone or supplied by the user. In all versions of rclone from 1.49.0 to 1.53.2 the seed value for generated passwords was based on the number of seconds elapsed in the day, and therefore not truly random. CVE-2020-28924 recommended users upgrade to the latest version of rclone and check the passwords protecting their encrypted remotes. Release 1.55 of rclone in March 2021 included features sponsored by CERN and their CS3MESH4EOSC project. The work was EU funded to promote vendor-neutral application programming interfaces and protocols for synchronisation and sharing of academic data on cloud storage. == Backends and commands == Rclone supports the following services as backends. There are others, built on standard protocols such as WebDAV or S3, that work. WebDAV backends do not support rclone functionality dependent on server side checksum or modtime. Remotes are usually defined interactively from these backends, local disk, or memory (as S3), with rclone config. Rclone can further wrap those remotes with one or more of alias, chunk, compress, crypt or union, remotes. Once defined, the remotes are referenced by other rclone commands interchangeably with the local drive. Remote names are followed by a colon to distinguish them from local drives. For example, a remote example_remote containing a folder, or pseudofolder, myfolder is referred to within a command as a path example_remote:/myfolder. Rclone commands directly apply to remotes, or mount them for file access or streaming. With appropriate cache options the mount can be addressed as if a conventional, block level disk. Commands are provided to serve remotes over SFTP, HTTP, WebDAV, FTP and DLNA. Commands can have sub-commands and flags. Filters determine which files on a remote that rclone commands are applied to. rclone rc passes commands or new parameters to existing rclone sessions and has an experimental web browser interface. === Crypt remotes === Rclone's crypt implements encryption of files at rest in cloud storage. It layers an encrypted remote over a pre-existing, cloud or other remote. Crypt is commonly used to encrypt / decrypt media, for streaming, on consumer storage services such as Google Drive. Rclone's configuration file contains the crypt password. The password can be lightly obfuscated, or the whole rclone.conf file can be encrypted. Crypt can either encrypt file content and name, or additionally full paths. In the latter case there is a potential clash with encryption for cloud backends, such as Microsoft OneDrive, having limited path lengths. Crypt remotes do not encrypt object modification time or size. The encryption mechanism for content, name and path is available, for scrutiny, on the rclone website. Key derivation is with scrypt. === Example syntax (Linux) === These examples describe paths and file names but object keys behave similarly. To recursively copy files from directory remote_stuff, at the remote xmpl, to directory stuff in the home folder:- -v enables logging and -P, progress information. By default rclone checks the file integrity (hash) after copy; can retry each file up to three times if the operation is interrupted; uses up to four parallel transfer threads, and does not apply bandwidth throttling. Running the above command again copies any new or changed files at the remote to the local folder but, like default rsync behaviour, will not delete from the local directory, files which have been removed from the remote. To additionally delete files from the local folder which have been removed from the remote - more like the behaviour of rsync with a --delete flag:- And to delete files from the source after they have been transferred to the local directory - more like the behaviour of rsync with a --remove-source-file flag:- To mount the remote directory at a mountpoint in the pre-existing, empty stuff directory in the home directory (the ampersand at the end makes the mount command run as a background process):- Default rclone syntax can be modified. Alternative transfer, filter, conflict and backend specific flags are available. Performance choices include number of concurrent transfer threads; chunk size; bandwidth limit profiling, and cache aggression. == Academic evaluation == In 2018, University of Kentucky researchers published a conference paper comparing use of rclone and other command line, cloud data transfer agents for big data. The paper was published as a result of funding by the National Science Foundation. Later that year, University of Utah's Center for High Performance Computing examined the impact of rclone options on data transfer rates. == Rclone use at HPC research sites == Examples are University of Maryland, Iowa State University, Trinity College Dublin, NYU, BYU, Indiana University, CSC Finland, Utrecht University, University of Nebraska, University of Utah, North Carolina State University, Stony Brook, Tulane University, Washington State University, Georgia Tech, National Institutes of Health, Wharton, Yale, Harvard, Minnesota, Michigan State, Case Western Reserve University, University of South Dakota, Northern Arizona University, University of Pennsylvania, Stanford, University of Southern California, UC Santa Barbara, UC Irvine, UC Berkeley, and SURFnet. == Rclone and cybercrime == May 2020 reports stated rclone had been used by hackers to exploit Diebold Nixdorf ATMs with ProLock ransomware. The FBI issued a Flash Alert MI-000125-MW on May 4, 2020, in relation to the compromise. They issued a further, related alert 20200901–001 in September 2020. Attackers had exfiltrated / encrypted data from organisations involved in healthcare, construction, finance, and legal services. Multiple US government agencies, and industrial entities were affected. Researchers established the hackers spent about a month exploring the breached networks, using rclone to archive stolen data to cloud storage, before encrypting the target system. Reported targets included LaSalle County, and the city of Novi Sad. The FBI warned January 2021, in Private Industry Notification 20210106–001, of extortion activity using Egregor ransomware and rclone. Organisations worldwide had been threatened with public release of exfiltrated data. In some cases rclone had been disguised under the name svchost. Bookseller Barnes & Noble, US retailer Kmart, games developer Ubisoft and the Vancouver metro system have been reported as victims. An April 2021, cybersecurity investigation into SonicWall VPN zero-day vulnerability SNWLID-2021-0001 by FireEye's Mandiant team established attackers UNC2447 used rclone for reconnaissance and exfiltration of victims' files. Cybersecurity and Infrastructure Security Agency Analysis Report AR21-126A confirmed this use of rclone in FiveHands ransomware attacks. A June 2021, Microsoft Security Intelligence Twitter post identified use of rclone in BazaCall cyber attacks. The attackers sent emails e

    Read more →
  • Hall circles

    Hall circles

    Hall circles (also known as M-circles and N-circles) are a graphical tool in control theory used to obtain values of a closed-loop transfer function from the Nyquist plot (or the Nichols plot) of the associated open-loop transfer function. Hall circles have been introduced in control theory by Albert C. Hall in his thesis. == Construction == Consider a closed-loop linear control system with open-loop transfer function given by transfer function G ( s ) {\displaystyle G(s)} and with a unit gain in the feedback loop. The closed-loop transfer function is given by T ( s ) = G ( s ) 1 + G ( s ) {\textstyle T(s)={\frac {G(s)}{1+G(s)}}} . To check the stability of T(s), it is possible to use the Nyquist stability criterion with the Nyquist plot of the open-loop transfer function G(s). Note, however, that the Nyquist plot of G(s) does not give the actual values of T(s). To get this information from the G(s)-plane, Hall proposed to construct the locus of points in the G(s)-plane such that T(s) has constant magnitude and also the locus of points in the G(s)-plane such that T(s) has constant phase angle. Given a positive real value M representing a fixed magnitude, and denoting G(s) by z, the points satisfying M = | T ( s ) | = | G ( s ) | | 1 + G ( s ) | = | z | | 1 + z | {\displaystyle M=|T(s)|={\frac {|G(s)|}{|1+G(s)|}}={\frac {|z|}{|1+z|}}} are given by the points z in the G(s)-plane such that the ratio of the distance between z and 0 and the distance between z and -1 is equal to M. The points z satisfying this locus condition are circles of Apollonius, and this locus is known in the context of control systems as M-circles. Given a positive real value N representing a phase angle, the points satisfying N = arg ⁡ [ G ( s ) 1 + G ( s ) ] = arg ⁡ [ G ( s ) ] − arg ⁡ [ 1 + G ( s ) ] = arg ⁡ [ z ] − arg ⁡ [ 1 + z ] {\displaystyle N=\arg \left[{\frac {G(s)}{1+G(s)}}\right]=\arg[G(s)]-\arg[1+G(s)]=\arg[z]-\arg[1+z]} are given by the points z in the G(s)-plane such that the angle between -1 and z and the angle between 0 and z is constant. In other words, the angle opposed to the line segment between -1 and 0 must be constant. This implies that the points z satisfying this locus condition are arcs of circles, and this locus is known in the context of control systems as N-circles. == Usage == To use the Hall circles, a plot of M and N circles is done over the Nyquist plot of the open-loop transfer function. The points of the intersection between these graphics give the corresponding value of the closed-loop transfer function. Hall circles are also used with the Nichols plot and in this setting, are also known as Nichols chart. Rather than overlaying directly the Hall circles over the Nichols plot, the points of the circles are transferred to a new coordinate system where the ordinate is given by 20 log 10 ⁡ ( | G ( s ) | ) {\displaystyle 20\log _{10}(|G(s)|)} and the abscissa is given by arg ⁡ ( G ( s ) ) {\displaystyle \arg(G(s))} . The advantage of using Nichols chart is that adjusting the gain of the open loop transfer function directly reflects in up and down translation of the Nichols plot in the chart.

    Read more →
  • Ontology engineering

    Ontology engineering

    In computer science, information science and systems engineering, ontology engineering is a field which studies the methods and methodologies for building ontologies, which encompasses a representation, formal naming and definition of the categories, properties and relations between the concepts, data and entities of a given domain of interest. In a broader sense, this field also includes a knowledge construction of the domain using formal ontology representations such as OWL/RDF. A large-scale representation of abstract concepts such as actions, time, physical objects and beliefs would be an example of ontological engineering. Ontology engineering is one of the areas of applied ontology, and can be seen as an application of philosophical ontology. Core ideas and objectives of ontology engineering are also central in conceptual modeling. Ontology engineering aims at making explicit the knowledge contained within software applications, and within enterprises and business procedures for a particular domain. Ontology engineering offers a direction towards solving the inter-operability problems brought about by semantic obstacles, i.e. the obstacles related to the definitions of business terms and software classes. Ontology engineering is a set of tasks related to the development of ontologies for a particular domain. Automated processing of information not interpretable by software agents can be improved by adding rich semantics to the corresponding resources, such as video files. One of the approaches for the formal conceptualization of represented knowledge domains is the use of machine-interpretable ontologies, which provide structured data in, or based on, RDF, RDFS, and OWL. Ontology engineering is the design and creation of such ontologies, which can contain more than just the list of terms (controlled vocabulary); they contain terminological, assertional, and relational axioms to define concepts (classes), individuals, and roles (properties) (TBox, ABox, and RBox, respectively). Ontology engineering is a relatively new field of study concerning the ontology development process, the ontology life cycle, the methods and methodologies for building ontologies, and the tool suites and languages that support them. A common way to provide the logical underpinning of ontologies is to formalize the axioms with description logics, which can then be translated to any serialization of RDF, such as RDF/XML or Turtle. Beyond the description logic axioms, ontologies might also contain SWRL rules. The concept definitions can be mapped to any kind of resource or resource segment in RDF, such as images, videos, and regions of interest, to annotate objects, persons, etc., and interlink them with related resources across knowledge bases, ontologies, and LOD datasets. This information, based on human experience and knowledge, is valuable for reasoners for the automated interpretation of sophisticated and ambiguous contents, such as the visual content of multimedia resources. Application areas of ontology-based reasoning include, but are not limited to, information retrieval, automated scene interpretation, and knowledge discovery. == Languages == An ontology language is a formal language used to encode the ontology. There are a number of such languages for ontologies, both proprietary and standards-based: Common logic is ISO standard 24707, a specification for a family of ontology languages that can be accurately translated into each other. The Cyc project has its own ontology language called CycL, based on first-order predicate calculus with some higher-order extensions. The Gellish language includes rules for its own extension and thus integrates an ontology with an ontology language. IDEF5 is a software engineering method to develop and maintain usable, accurate, domain ontologies. KIF is a syntax for first-order logic that is based on S-expressions. Rule Interchange Format (RIF), F-Logic and its successor ObjectLogic combine ontologies and rules. OWL is a language for making ontological statements, developed as a follow-on from RDF and RDFS, as well as earlier ontology language projects including OIL, DAML and DAML+OIL. OWL is intended to be used over the World Wide Web, and all its elements (classes, properties and individuals) are defined as RDF resources, and identified by URIs. OntoUML is a well-founded language for specifying reference ontologies. SHACL (RDF SHapes Constraints Language) is a language for describing structure of RDF data. It can be used together with RDFS and OWL or it can be used independently from them. XBRL (Extensible Business Reporting Language) is a syntax for expressing business semantics. == Methodologies and tools == DOGMA KAON OntoClean HOZO Protégé (software) Large language models == In life sciences == Life sciences is flourishing with ontologies that biologists use to make sense of their experiments. For inferring correct conclusions from experiments, ontologies have to be structured optimally against the knowledge base they represent. The structure of an ontology needs to be changed continuously so that it is an accurate representation of the underlying domain. Recently, an automated method was introduced for engineering ontologies in life sciences such as Gene Ontology (GO), one of the most successful and widely used biomedical ontology. Based on information theory, it restructures ontologies so that the levels represent the desired specificity of the concepts. Similar information theoretic approaches have also been used for optimal partition of Gene Ontology. Given the mathematical nature of such engineering algorithms, these optimizations can be automated to produce a principled and scalable architecture to restructure ontologies such as GO. Open Biomedical Ontologies (OBO), a 2006 initiative of the U.S. National Center for Biomedical Ontology, provides a common 'foundry' for various ontology initiatives, amongst which are: The Generic Model Organism Project (GMOD) Gene Ontology Consortium Sequence Ontology Ontology Lookup Service The Plant Ontology Consortium Standards and Ontologies for Functional Genomics and more

    Read more →
  • UI data binding

    UI data binding

    UI data binding is a software design pattern to simplify development of GUI applications. UI data binding binds UI elements to an application domain model. Most frameworks employ the Observer pattern as the underlying binding mechanism. To work efficiently, UI data binding has to address input validation and data type mapping. A bound control is a widget whose value is tied or bound to a field in a recordset (e.g., a column in a row of a table). Changes made to data within the control are automatically saved to the database when the control's exit event triggers. == Example == == Data binding frameworks and tools == === Delphi === DSharp third-party data binding tool OpenWire Visual Live Binding - third-party visual data binding tool === Java === JFace Data Binding JavaFX Property === .NET === Windows Forms data binding overview WPF data binding overview Avalonia Unity 3D data binding framework (available in modifications for NGUI, iGUI and EZGUI libraries) === JavaScript === Angular AngularJS Backbone.js Ember.js Datum.js knockout.js Meteor, via its Blaze live update engine OpenUI5 React Vue.js

    Read more →
  • Gas (app)

    Gas (app)

    Gas (sometimes stylized in all caps), formerly known as Melt as well as Crush, was an American anonymous social media app. Launched in August 2022, the app is oriented towards high schoolers. The app was developed by Nikita Bier, Isaiah Turner, and former Facebook engineer Dave Schatz. Gas was largely based upon the prior tbh app developed by co-founder Nikita Bier, along with Erik Hazzard, Kyle Zaragoza, and Nicolas Ducdodon in September 2017. tbh was acquired by Facebook inc. (now Meta Platforms) on October 16, 2017, and nearly a year later in July 2018 was dissolved, owing to low usage. Gas follows a similar purpose to tbh in being a social media app oriented towards high schoolers. In the app, users participate in anonymous polls regarding pre-written complimentary statements to their peers, such as "I'd say yes if (blank) asked me out on a date," "I think (blank) is the coolest kid in school," or "would make an ugly face and still look pretty." Winners of said polls receive a "flame." The name of the app is derived from this, with "gassing someone up" being Gen Z slang for complimenting someone. Users can pay a $6.99 subscription that enables "God Mode," which shows hints regarding who voted for them in a poll. Gas overtook TikTok and BeReal as the most downloaded app on the Apple App Store in October 2022 (the app is currently not available for Android). The app has over 5.1 million downloads as of early November 2022, over a million active users and 300 thousand daily downloads as of October 2022. Currently, the app is available in Canada and the majority of the United States. On January 17, 2023, Gas was acquired by Discord, however it would remain a standalone app and its developers became Discord staff members. On October 18, 2023, Discord announced that service for Gas would be permanently ending effective November 7, 2023, due to a steep decline in users. Effective November 7, the app became completely unusable. == Controversy regarding human-trafficking == Beginning in October 2022, rumors spread largely throughout TikTok and Snapchat alleged that the app was linked to human trafficking (in particular sex trafficking). According to Bier, the rumor originated with a single user review from China on October 5, and then was disseminated through TikTok accounts with "few to no US teen followers." Although largely dismissed as a hoax by experts, who cite how the app doesn't log user locations and general anonymity, the hoax became pervasive to the extent that various police departments, school systems, and local news outlets began issuing warnings regarding the app. For instance, on October 31, 2022, the police department of Piedmont, Oklahoma issued a warning to parents, encouraging them to check their children's phones, while on November 3, the Oklahoma Oktaha Public School system stated in a Facebook post that "Children are being kidnapped in other towns and this new app is thought to be the source of predators finding their location." (both statements have since been retracted by Police Chief Scott Singer and Superintendent Jerry Needham respectively). Additionally, local medial outlets such as KOCO in Oklahoma City ran stories making similar statements. The rumor had a negative impact on the app, with downloads plateauing for a two-week period in late October and with 3% of users in a single day reportedly uninstalling the app. Revenue and ratings have also reportedly dropped and the company's social media accounts have been bombarded with comments labeling them as sex-traffickers. Additionally, the four-person development team has reportedly been bombarded with various death threats as a result.

    Read more →
  • Very large database

    Very large database

    A very large database, (originally written very large data base) or VLDB, is a database that contains a very large amount of data, so much that it can require specialized architectural, management, processing and maintenance methodologies. == Definition == The vague adjectives of very and large allow for a broad and subjective interpretation, but attempts at defining a metric and threshold have been made. Early metrics were the size of the database in a canonical form via database normalization or the time for a full database operation like a backup. Technology improvements have continually changed what is considered very large. One definition has suggested that a database has become a VLDB when it is "too large to be maintained within the window of opportunity… the time when the database is quiet". == Sizes of a VLDB database == There is no absolute amount of data that can be cited. For example, one cannot say that any database with more than 1 TB of data is considered a VLDB. This absolute amount of data has varied over time as computer processing, storage and backup methods have become better able to handle larger amounts of data. That said, VLDB issues may start to appear when 1 TB is approached, and are more than likely to have appeared as 30 TB or so is exceeded. == VLDB challenges == Key areas where a VLDB may present challenges include configuration, storage, performance, maintenance, administration, availability and server resources. === Configuration === Careful configuration of databases that lie in the VLDB realm is necessary to alleviate or reduce issues raised by VLDB databases. === Administration === The complexities of managing a VLDB can increase exponentially for the database administrator as database size increases. === Availability and maintenance === When dealing with VLDB operations relating to maintenance and recovery such as database reorganizations and file copies which were quite practical on a non-VLDB take very significant amounts of time and resources for a VLDB database. In particular it typically infeasible to meet a typical recovery time objective (RTO), the maximum expected time a database is expected to be unavailable due to interruption, by methods which involve copying files from disk or other storage archives. To overcome these issues techniques such as clustering, cloned/replicated/standby databases, file-snapshots, storage snapshots or a backup manager may help achieve the RTO and availability, although individual methods may have limitations, caveats, license, and infrastructure requirements while some may risk data loss and not meet the recovery point objective (RPO). For many systems only geographically remote solutions may be acceptable. ==== Backup and recovery ==== Best practice is for backup and recovery to be architectured in terms of the overall availability and business continuity solution. === Performance === Given the same infrastructure there may typically be a decrease in performance, that is increase in response time as database size increases. Some accesses will simply have more data to process (scan) which will take proportionally longer (linear time); while the indexes used to access data may grow slightly in height requiring perhaps an extra storage access to reach the data (sub-linear time). Other effects can be caching becoming less efficient because proportionally less data can be cached and while some indexes such as the B+ automatically sustain well with growth others such as a hash table may need to be rebuilt. Should an increase in database size cause the number of accessors of the database to increase then more server and network resources may be consumed, and the risk of contention will increase. Some solutions to regaining performance include partitioning, clustering, possibly with sharding, or use of a database machine. ==== Partitioning ==== Partitioning may be able assist the performance of bulk operations on a VLDB including backup and recovery., bulk movements due to information lifecycle management (ILM), reducing contention as well as allowing optimization of some query processing. === Storage === In order to satisfy needs of a VLDB the database storage needs to have low access latency and contention, high throughput, and high availability. === Server resources === The increasing size of a VLDB may put pressure on server and network resources and a bottleneck may appear that may require infrastructure investment to resolve. == Relationship to big data == VLDB is not the same as big data, but the storage aspect of big data may involve a VLDB database. That said some of the storage solutions supporting big data were designed from the start to support large volumes of data, so database administrators may not encounter VLDB issues that older versions of traditional RDBMS's might encounter.

    Read more →
  • MarkLogic Server

    MarkLogic Server

    MarkLogic Server is a document-oriented database developed by MarkLogic. It is a NoSQL multi-model database that evolved from an XML database to natively store JSON documents and RDF triples, the data model for semantics. MarkLogic is designed to be a data hub for operational and analytical data. == History == MarkLogic Server was built to address shortcomings with existing search and data products. The product first focused on using XML as the document markup standard and XQuery as the query standard for accessing collections of documents up to hundreds of terabytes in size. Currently the MarkLogic platform is widely used in publishing, government, finance and other sectors. MarkLogic's customers are mostly Global 2000 companies. == Technology == MarkLogic uses documents without upfront schemas to maintain a flexible data model. In addition to having a flexible data model, MarkLogic uses a distributed, scale-out architecture that can handle hundreds of billions of documents and hundreds of terabytes of data. It has received Common Criteria certification, and has high availability and disaster recovery. MarkLogic is designed to run on-premises and within public or private cloud environments like Amazon Web Services. == Features == Indexing MarkLogic indexes the content and structure of documents including words, phrases, relationships, and values in over 200 languages with tokenization, collation, and stemming for core languages. Functionality includes the ability to toggle range indexes, geospatial indexes, the RDF triple index, and reverse indexes on or off based on your data, the kinds of queries that you will run, and your desired performance. Full-text search MarkLogic supports search across its data and metadata using a word or phrase and incorporates Boolean logic, stemming, wildcards, case sensitivity, punctuation sensitivity, diacritic sensitivity, and search term weighting. Data can be searched using JavaScript, XQuery, SPARQL, and SQL. Semantics MarkLogic uses RDF triples to provide semantics for ease of storing metadata and querying. ACID Unlike other NoSQL databases, MarkLogic maintains ACID consistency for transactions. Replication MarkLogic provides high availability with replica sets. Scalability MarkLogic scales horizontally using sharding. MarkLogic can run over multiple servers, balancing the load or replicating data to keep the system up and running in the event of hardware failure. Security MarkLogic has built in security features such as element-level permissions and data redaction. Optic API for Relational Operations An API that lets developers view their data as documents, graphs or rows. Security MarkLogic provides redaction, encryption, and element-level security (allowing for control on read and write rights on parts of a document). == Applications == Banking Big Data Fraud prevention Insurance Claims Management and Underwriting Master data management Recommendation engines == Licensing == MarkLogic is available under various licensing and delivery models, namely a free Developer or an Essential Enterprise license.[3] Licenses are available from MarkLogic or directly from cloud marketplaces such as Amazon Web Services and Microsoft Azure. == Releases == 2001 – Cerisent XQE 1: ACID transactions, Full-text search, XML Storage, XQuery, Role-based security 2004 – Cerisent XQE 2: Scale-out architecture, Enhanced search (stemming, thesaurus, wildcard), Backup and restore 2005 – MarkLogic Server 3: Continuing search improvements, Content Processing Framework (including PDF, Word, Excel, PPT), Failover 2008 – MarkLogic Server 4: Geospatial search, entity extraction, advanced XQuery, performance, scalability enhancements, auditing 2011 – MarkLogic Server 5: Flexible replication / DDIL, real-time indexing, advanced search, improved analytics, concurrency enhancements 2012 – MarkLogic Server 6: REST and Java APIs, App Builder, enhanced UI, improved search 2013 – MarkLogic Server 7: Semantic graph, bitemporal data, tiered storage, improved search, better management 2015 – MarkLogic Server 8: A Native JSON storage, Server-side JavaScript, Bitemporal, Node.js client API, Incremental backup, Flexible replication[16] 2017 – MarkLogic Server 9: Data integration across Relational and Non-Relational data, Advanced Encryption, Element Level Security, Redaction 2019 – MarkLogic Server 10: Enhanced Data Hub, improved SQL, security, analytics performance, cloud support 2022 – MarkLogic Server 11: MarkLogic Ops Director (Monitoring and Administration Improvements), expanded PKI 2025 – MarkLogic Server 12: Generative AI and Native Vector Search, Graph Algorithm Support, Virtual TDEs (relational views on the fly)

    Read more →