AI Chatbot Companies

AI Chatbot Companies — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Cache language model

    Cache language model

    A cache language model is a type of statistical language model. These occur in the natural language processing subfield of computer science and assign probabilities to given sequences of words by means of a probability distribution. Statistical language models are key components of speech recognition systems and of many machine translation systems: they tell such systems which possible output word sequences are probable and which are improbable. The particular characteristic of a cache language model is that it contains a cache component and assigns relatively high probabilities to words or word sequences that occur elsewhere in a given text. The primary, but by no means sole, use of cache language models is in speech recognition systems. To understand why it is a good idea for a statistical language model to contain a cache component one might consider someone who is dictating a letter about elephants to a speech recognition system. Standard (non-cache) N-gram language models will assign a very low probability to the word "elephant" because it is a very rare word in English. If the speech recognition system does not contain a cache component, the person dictating the letter may be annoyed: each time the word "elephant" is spoken another sequence of words with a higher probability according to the N-gram language model may be recognized (e.g., "tell a plan"). These erroneous sequences will have to be deleted manually and replaced in the text by "elephant" each time "elephant" is spoken. If the system has a cache language model, "elephant" will still probably be misrecognized the first time it is spoken and will have to be entered into the text manually; however, from this point on the system is aware that "elephant" is likely to occur again – the estimated probability of occurrence of "elephant" has been increased, making it more likely that if it is spoken it will be recognized correctly. Once "elephant" has occurred several times, the system is likely to recognize it correctly every time it is spoken until the letter has been completely dictated. This increase in the probability assigned to the occurrence of "elephant" is an example of a consequence of machine learning and more specifically of pattern recognition. There exist variants of the cache language model in which not only single words but also multi-word sequences that have occurred previously are assigned higher probabilities (e.g., if "San Francisco" occurred near the beginning of the text subsequent instances of it would be assigned a higher probability). The cache language model was first proposed in a paper published in 1990, after which the IBM speech-recognition group experimented with the concept. The group found that implementation of a form of cache language model yielded a 24% drop in word-error rates once the first few hundred words of a document had been dictated. A detailed survey of language modeling techniques concluded that the cache language model was one of the few new language modeling techniques that yielded improvements over the standard N-gram approach: "Our caching results show that caching is by far the most useful technique for perplexity reduction at small and medium training data sizes". The development of the cache language model has generated considerable interest among those concerned with computational linguistics in general and statistical natural language processing in particular: recently, there has been interest in applying the cache language model in the field of statistical machine translation. The success of the cache language model in improving word prediction rests on the human tendency to use words in a "bursty" fashion: when one is discussing a certain topic in a certain context, the frequency with which one uses certain words will be quite different from their frequencies when one is discussing other topics in other contexts. The traditional N-gram language models, which rely entirely on information from a very small number (four, three, or two) of words preceding the word to which a probability is to be assigned, do not adequately model this "burstiness". Recently, the cache language model concept – originally conceived for the N-gram statistical language model paradigm – has been adapted for use in the neural paradigm. For instance, recent work on continuous cache language models in the recurrent neural network (RNN) setting has applied the cache concept to much larger contexts than before, yielding significant reductions in perplexity. Another recent line of research involves incorporating a cache component in a feed-forward neural language model (FN-LM) to achieve rapid domain adaptation.

    Read more →
  • Automated penetration testing

    Automated penetration testing

    Automated penetration testing (also known as autonomous penetration testing or automated offensive security) is the application of software-driven workflows and orchestration to simulate cyberattack techniques. These methods are used to identify, validate, and exploit security vulnerabilities in IT assets such as networks, applications, and cloud infrastructure. Automated penetration testing is the use of software to simulate cyberattacks in order to rapidly identify exploitable vulnerabilities across systems without relying solely on human testers. In technical literature, the term describes a spectrum of activities ranging from scripted exploit orchestration to experimental systems designed for fully autonomous attack planning. Automated Penetration Testing falls short of testing using manual experts in terms of discovery of deep complex vulnerabilities and contextual business logic vulnerabilities. == Terminology and scope == The label “automated penetration testing” appears frequently in vendor and practitioner writing but lacks a single, neutral, standards-based definition. In the literature the term’s scope varies: some authors use it to mean automation of specific penetration-testing tasks (scanning, exploitation attempts, evidence collection), others to describe integrated, repeatable assessment pipelines, and a smaller body of work investigates autonomous decision-making agents that select attack steps algorithmically. To avoid implying consensus, this article describes common techniques and architectures reported in the literature and industry, and it notes where claims are primarily found in practitioner publications or early-stage research. Its important to note the differences between automated penetration testing and traditional penetration testing using human skill. The most important difference is scope and speed. Automated penetration testing generally fails at discovering exposures and weakness associated with business logic due to a lack of contextual understanding. The benefit of Automated Penetration testing is speed at which it can be conducted. Traditional penetration testing also is expected to be accurate and contain no false positives. This is due to the human validation aspect of the test. Automated approaches are expected to contain mistakes and false positives which need to be validated upon completion of the test. == History == Automated offensive techniques build on decades of tools and scripting that aided vulnerability discovery and exploitation. Early vulnerability scanners and community scripting in the 1990s and 2000s created the first layers of automation. Later, modular exploitation frameworks (notably Metasploit) integrated scanning and exploitation modules and made automated proof-of-concept attacks more accessible. Over the 2010s–2020s, as cloud platforms, APIs and continuous delivery practices increased the need for frequent validation, academic and industry interest in formalizing automated approaches also grew. == Methodologies and architectures == Descriptions in the literature and technical reports cluster automated capabilities into several overlapping models: Scripted/engineered playbooks (task automation): Predefined workflows or playbooks encode common attack paths (for example, web application exploit sequences or lateral-movement chains). These playbooks are designed to reproduce known techniques in a controlled way to validate exploitability and reduce manual repetition. Exploit-oriented orchestration: Automation orchestrates exploitation modules from established frameworks to perform controlled proof-of-concept attacks that confirm exploitability rather than simply flagging potential weaknesses. This approach can reduce false positives versus passive scanning when tests are run in an appropriately controlled environment. Orchestrated multi-tool pipelines: A coordinated toolchain integrates reconnaissance, vulnerability scanning, credential testing, exploitation modules and reporting. Data and state persist across stages so that multi-step workflows (e.g., discover → escalate → pivot) can be executed repeatably, approximating manual penetration-test methodologies at larger scale. Continuous / CI-integrated testing: Automation embedded in build or deployment pipelines (CI/CD) triggers assessments automatically on new builds, configuration changes, or on a schedule, supporting frequent, repeatable validation aligned with DevOps practices. Academic theses and experimental work describe CI/CD-integrated proof-of-concept systems for web applications and internal networks. Research on autonomous planning and learning: Recent academic work explores machine learning and reinforcement-learning approaches to select or prioritise attack steps, generate attack sequences, or optimize the testing path; these approaches are largely experimental and raise distinct validation and safety questions. == Tools and vendors == Automated penetration testing is provided by a mix of open-source projects, commercial platforms, and professional services. These often follow the penetration testing as a service (PTaaS) model, which integrates automated scanning with manual validation by security analysts. Examples of widely known tools and vendors in the space include exploitation frameworks such as Metasploit, commercial automated platforms and PTaaS providers, and specialist vendors that offer breach-and-attack simulation (BAS) or continuous testing capabilities. == Applications and deployment models == In industry practice, some organizations deploy automated techniques through dedicated security validation platforms rather than bespoke toolchains. These platforms are typically used for continuous or scheduled validation in pre-production or controlled environments and are often positioned alongside, rather than in place of, human-led penetration testing. Examples discussed in secondary literature include platforms such as Pentera, which are commonly classified under breach-and-attack simulation or automated security validation rather than as standalone penetration-testing methodologies.

    Read more →
  • Channel (digital image)

    Channel (digital image)

    Color digital images are made of pixels, and pixels are made of combinations of primary colors represented by a series of code. A channel in this context is the grayscale image of the same size as a color image, made of just one of these primary colors. For instance, an image from a standard digital camera will have a red, green and blue channel. A grayscale image has just one channel. In geographic information systems, channels are often referred to as raster bands. Another closely related concept is feature maps, which are used in convolutional neural networks. == Overview == In the digital realm, there can be any number of conventional primary colors making up an image; a channel in this case is extended to be the grayscale image based on any such conventional primary color. By extension, a channel is any grayscale image of the same dimension as and associated with the original image. Channel is a conventional term used to refer to a certain component of an image. In reality, any image format can use any algorithm internally to store images. For instance, GIF images actually refer to the color in each pixel by an index number, which refers to a table where three color components are stored. However, regardless of how a specific format stores the images, discrete color channels can always be determined, as long as a final color image can be rendered. The concept of channels is extended beyond the visible spectrum in multispectral and hyperspectral imaging. In that context, each channel corresponds to a range of wavelengths and contains spectroscopic information. The channels can have multiple widths and ranges. Three main channel types (or color models) exist, and have respective strengths and weaknesses. === RGB images === An RGB image has three channels: red, green, and blue. RGB channels roughly follow the color receptors in the human eye, and are used in computer displays and image scanners. If the RGB image is 24-bit (the industry standard as of 2005), each channel has 8 bits, for red, green, and blue—in other words, the image is composed of three images (one for each channel), where each image can store discrete pixels with conventional brightness intensities between 0 and 255. If the RGB image is 48-bit (very high color-depth), each channel has 16-bit per pixel color, that is 16-bit red, green, and blue for each per pixel. ==== RGB color sample ==== Notice how the grey trees have similar brightness in all channels, the red dress is much brighter in the red channel than in the other two, and how the green part of the picture is shown much brighter in the green channel. === YUV === YUV images are an affine transformation of the RGB colorspace, originated in broadcasting. The Y channel correlates approximately with perceived intensity, whilst the U and V channels provide colour information. === CMYK === A CMYK image has four channels: cyan, magenta, yellow, and key (black). CMYK is the standard for print, where subtractive coloring is used. A 32-bit CMYK image (the industry standard as of 2005) is made of four 8-bit channels, one for cyan, one for magenta, one for yellow, and one for key color (typically is black). 64-bit storage for CMYK images (16-bit per channel) is not common, since CMYK is usually device-dependent, whereas RGB is the generic standard for device-independent storage. ==== CMYK color sample ==== === HSV === HSV, or hue saturation value, stores color information in three channels, just like RGB, but one channel is devoted to brightness (value), and the other two convey colour information. The value channel is similar to (but not exactly the same as) the CMYK black channel, or its negative. HSV is especially useful in lossy video compression, where loss of color information is less noticeable to the human eye. == Alpha channel == The alpha channel stores transparency information—the higher the value, the more opaque that pixel is. No camera or scanner measures transparency, although physical objects certainly can possess transparency, but the alpha channel is extremely useful for compositing digital images together. Bluescreen technology involves filming actors in front of a primary color background, then setting that color to transparent, and compositing it with a background. The GIF and PNG image formats use alpha channels on the World Wide Web to merge images on web pages so that they appear to have an arbitrary shape even on a non-uniform background. == Other channels == In 3D computer graphics, multiple channels are used for additional control over material rendering; e.g., controlling specularity and so on. == Bit depth == In digitizing images, the color channels are converted to numbers. Since images contain thousands of pixels, each with multiple channels, channels are usually encoded in as few bits as possible. Typical values are 8 bits per channel or 16 bits per channel. Indexed color effectively gets rid of channels altogether to get, for instance, 3 channels into 8 bits (GIF) or 16 bits. == Optimized channel sizes == Since the brain does not necessarily perceive distinctions in each channel to the same degree as in other channels, it is possible that differing the number of bits allocated to each channel will result in more optimal storage; in particular, for RGB images, compressing the blue channel the most and the red channel the least may be better than giving equal space to each. Among other techniques, lossy video compression uses chroma subsampling to reduce the bit depth in color channels (hue and saturation), while keeping all brightness information (value in HSV). 16-bit HiColor stores red and blue in 5 bits, and green in 6 bits.

    Read more →
  • Biometric device

    Biometric device

    A biometric device is a security identification and authentication device. Such devices use automated methods of verifying or recognising the identity of a living person based on a physiological or behavioral characteristic. These characteristics include fingerprints, facial images, iris and voice recognition. == History == Biometric devices have been in use for thousands of years. Non-automated biometric devices have been in use since 500 BC, when ancient Babylonians would sign their business transactions by pressing their fingertips into clay tablets. Automation in biometric devices was first seen in the 1960s. The Federal Bureau of Investigation (FBI) in the 1960s, introduced the Indentimat, which started checking for fingerprints to maintain criminal records. The first systems measured the shape of the hand and the length of the fingers. Although discontinued in the 1980s, the system set a precedent for future Biometric Devices. == Subgroups == The characteristic of the human body is used to access information by the users. According to these characteristics, the sub-divided groups are Chemical biometric devices: Analyses the segments of the DNA to grant access to the users. Visual biometric devices: Analyses the visual features of the humans to grant access which includes iris recognition, face recognition, Finger recognition, and Retina Recognition. Behavioral biometric devices: Analyses the Walking Ability and Signatures (velocity of sign, width of sign, pressure of sign) distinct to every human. Olfactory biometric devices: Analyses the odor to distinguish between varied users. Auditory biometric devices: Analyses the voice to determine the identity of a speaker for accessing control. == Uses == === Workplace === Biometrics are being used to establish better and accessible records of the hour's employee's work. With the increase in "Buddy Punching" (a case where employees clocked out coworkers and fraudulently inflated their work hours) employers have looked towards new technology like fingerprint recognition to reduce such fraud. Additionally, employers are also faced with the task of proper collection of data such as entry and exit times. Biometric devices make for largely fool proof and reliable ways of enabling to collect data as employees have to be present to enter biometric details which are unique to them. === Immigration === As the demand for air travel grows and more people travel, modern-day airports have to implement technology in such a way that there are no long queues. Biometrics are being implemented in more and more airports as they enable quick recognition of passengers and hence lead to lower volume of people standing in queues. One such example is of the Dubai International Airport which plans to make immigration counters a relic of the past as they implement IRIS on the move technology (IOM) which should help the seamless departures and arrivals of passengers at the airport. === Handheld and personal devices === Fingerprint sensors can be found on mobile devices. The fingerprint sensor is used to unlock the device and authorize actions, like money and file transfers, for example. It can be used to prevent a device from being used by an unauthorized person. It is also used in attendance in number of colleges and universities. == Present day biometric devices == === Personal signature verification systems === This is one of the most highly recognised and acceptable biometrics in corporate surroundings. This verification has been taken one step further by capturing the signature while taking into account many parameters revolving around this like the pressure applied while signing, the speed of the hand movement and the angle made between the surface and the pen used to make the signature. This system also has the ability to learn from users as signature styles vary for the same user. Hence by taking a sample of data, this system is able to increase its own accuracy. === Iris recognition system === Iris recognition involves the device scanning the pupil of the subject and then cross referencing that to data stored on the database. It is one of the most secure forms of authentication, as while fingerprints can be left behind on surfaces, iris prints are extremely hard to be stolen. Iris recognition is widely applied by organisations dealing with the masses, one being the Aadhaar identification system issued by the Government of India to keep records of its population. The reason for this is that iris recognition makes use of iris prints of humans, which change little over the course of one's lifetime. == Problems with present day biometric devices == === Biometric spoofing === Biometric spoofing is a method of fooling a biometric identification management system, where a counterfeit mold is presented in front of the biometric scanner. This counterfeit mold emulates the unique biometric attributes of an individual so as to confuse the system between the artifact and the real biological target and gain access to sensitive data/materials. One such high-profile case of Biometric spoofing came to the limelight when it was found that German Defence Minister, Ursula von der Leyen's fingerprint had been successfully replicated by Chaos Computer Club. The group used high quality camera lenses and shot images from 6 feet away. They used a professional finger software and mapped the contours of the Ministers thumbprint. Although progress has been made to stop spoofing. Using the principle of pulse oximetry — the liveliness of the test subject is taken into account by measure of blood oxygenation and the heart rate. This reduces attacks like the ones mentioned above, although these methods aren't commercially applicable as costs of implementation are high. This reduces their real world application and hence makes biometrics insecure until these methods are commercially viable. === Accuracy === Accuracy is a major issue with biometric recognition. Passwords are still extremely popular, because a password is static in nature, while biometric data can be subject to change (such as one's voice becoming heavier due to puberty, or an accident to the face, which could lead to improper reading of facial scan data). When testing voice recognition as a substitute to PIN-based systems, Barclays reported that their voice recognition system is 95 percent accurate. This statistic means that many of its customers' voices might still not be recognised even when correct. This uncertainty revolving around the system could lead to slower adoption of biometric devices, continuing the reliance of traditional password-based methods. == Benefits of biometric devices over traditional methods of authentication == Biometric data cannot be lent and hacking of Biometric data is complicated hence it makes it safer to use than traditional methods of authentication like passwords which can be lent and shared. Passwords do not have the ability to judge the user but rely only on the data provided by the user, which can easily be stolen while Biometrics work on the uniqueness of each individual. Passwords can be forgotten and recovering them can take time, whereas Biometric devices rely on biometric data which tends to be unique to a person, hence there is no risk of forgetting the authentication data. A study conducted among Yahoo! users found that at least 1.5 percent of Yahoo users forgot their passwords every month, hence this makes accessing services more lengthy for consumers as the process of recovering passwords is lengthy. These shortcomings make Biometric devices more efficient and reduces effort for the end user. == Future == Researchers are targeting the drawbacks of present-day biometric devices and developing to reduce problems like biometric spoofing and inaccurate intake of data. Technologies which are being developed are- The United States Military Academy are developing an algorithm that allows identification through the ways each individual interacts with their own computers; this algorithm considers unique traits like typing speed, rhythm of writing and common spelling mistakes. This data allows the algorithm to create a unique profile for each user by combining their multiple behavioral and stylometric information. This can be very difficult to replicate collectively. A recent innovation by Kenneth Okereafor and, presented an optimized and secure design of applying biometric liveness detection technique using a trait randomization approach. This novel concept potentially opens up new ways of mitigating biometric spoofing more accurately, and making impostor predictions intractable or very difficult in future biometric devices. A simulation of Kenneth Okereafor's biometric liveness detection algorithm using a 3D multi-biometric framework consisting of 15 liveness parameters from facial print, finger print and iris pattern traits resulted in a system efficiency of the 99.2% over a cardinality of 125 distinct randomization combinat

    Read more →
  • Lesk algorithm

    Lesk algorithm

    The Lesk algorithm is a classical algorithm for word sense disambiguation introduced by Michael E. Lesk in 1986. It operates on the premise that words within a given context are likely to share a common meaning. This algorithm compares the dictionary definitions of an ambiguous word with the words in its surrounding context to determine the most appropriate sense. Variations, such as the Simplified Lesk algorithm, have demonstrated improved precision and efficiency. However, the Lesk algorithm has faced criticism for its sensitivity to definition wording and its reliance on brief glosses. Researchers have sought to enhance its accuracy by incorporating additional resources like thesauruses and syntactic models. == Overview == The Lesk algorithm is based on the assumption that words in a given "neighborhood" (section of text) will tend to share a common topic. A simplified version of the Lesk algorithm is to compare the dictionary definition of an ambiguous word with the terms contained in its neighborhood. Versions have been adapted to use WordNet. An implementation might look like this: for every sense of the word being disambiguated one should count the number of words that are in both the neighborhood of that word and in the dictionary definition of that sense the sense that is to be chosen is the sense that has the largest number of this count. A frequently used example illustrating this algorithm is for the context "pine cone". The following dictionary definitions are used: PINE 1. kinds of evergreen tree with needle-shaped leaves 2. waste away through sorrow or illness CONE 1. solid body which narrows to a point 2. something of this shape whether solid or hollow 3. fruit of certain evergreen trees As can be seen, the best intersection is Pine #1 ⋂ Cone #3 = 2. == Simplified Lesk algorithm == In Simplified Lesk algorithm, the correct meaning of each word in a given context is determined individually by locating the sense that overlaps the most between its dictionary definition and the given context. Rather than simultaneously determining the meanings of all words in a given context, this approach tackles each word individually, independent of the meaning of the other words occurring in the same context. "A comparative evaluation performed by Vasilescu et al. (2004) has shown that the simplified Lesk algorithm can significantly outperform the original definition of the algorithm, both in terms of precision and efficiency. By evaluating the disambiguation algorithms on the Senseval-2 English all words data, they measure a 58% precision using the simplified Lesk algorithm compared to the only 42% under the original algorithm. Note: Vasilescu et al. implementation considers a back-off strategy for words not covered by the algorithm, consisting of the most frequent sense defined in WordNet. This means that words for which all their possible meanings lead to zero overlap with current context or with other word definitions are by default assigned sense number one in WordNet." Simplified LESK Algorithm with smart default word sense (Vasilescu et al., 2004) The COMPUTEOVERLAP function returns the number of words in common between two sets, ignoring function words or other words on a stop list. The original Lesk algorithm defines the context in a more complex way. == Criticisms == Unfortunately, Lesk’s approach is very sensitive to the exact wording of definitions, so the absence of a certain word can radically change the results. Further, the algorithm determines overlaps only among the glosses of the senses being considered. This is a significant limitation in that dictionary glosses tend to be fairly short and do not provide sufficient vocabulary to relate fine-grained sense distinctions. A lot of work has appeared offering different modifications of this algorithm. These works use other resources for analysis (thesauruses, synonyms dictionaries or morphological and syntactic models): for instance, it may use such information as synonyms, different derivatives, or words from definitions of words from definitions. == Lesk variants == Original Lesk (Lesk, 1986) Adapted/Extended Lesk (Banerjee and Pederson, 2002/2003): In the adaptive lesk algorithm, a word vector is created corresponds to every content word in the wordnet gloss. Concatenating glosses of related concepts in WordNet can be used to augment this vector. The vector contains the co-occurrence counts of words co-occurring with w in a large corpus. Adding all the word vectors for all the content words in its gloss creates the Gloss vector g for a concept. Relatedness is determined by comparing the gloss vector using the Cosine similarity measure. There are a lot of studies concerning Lesk and its extensions: Wilks and Stevenson, 1998, 1999; Mahesh et al., 1997; Cowie et al., 1992; Yarowsky, 1992; Pook and Catlett, 1988; Kilgarriff and Rosensweig, 2000; Kwong, 2001; Nastase and Szpakowicz, 2001; Gelbukh and Sidorov, 2004.

    Read more →
  • Gooch shading

    Gooch shading

    Gooch shading is a non-photorealistic rendering technique for shading objects. It is also known as "cool to warm" shading, and is widely used in technical illustration. == History == Gooch shading was developed by Amy Gooch et al. at the University of Utah School of Computing and first presented at the 1998 SIGGRAPH conference. It has since been implemented in shader libraries, software, and games released by Autodesk, Nvidia, and Valve. == Process == Gooch shading defines an additional two colors in conjunction with the original model color: a warm color (such as yellow) and a cool color (such as blue). The warm color indicates surfaces that are facing toward the light source while the cool color indicates surfaces facing away. This allows shading to occur only in mid-tones so that edge lines and highlights remain visually prominent. The Gooch shader is typically implemented in two passes: all objects in the scene are first drawn with the "cool to warm" shading, and in the second pass the object's edges are rendered in black.

    Read more →
  • Patch management

    Patch management

    Patch management (or patch management policy or patch policy or patch management process) is concerned with the identification, acquisition, distribution, testing and installation of patches to systems. Proper patch management can be a net productivity boost for an organization. Patches can be used to defend against and eliminate potential vulnerabilities of a system, so that no threats may exploit them. Problems can arise during patch management, including buggy patches that either fail to fix their problem or introduce new issues. Patch management tools help orchestrate all of the procedures involved in patch management. == Description == Patch management is defined as a sub-practice of various disciplines including vulnerability management (part of security management), lifecycle management (with further possible sub-classification into application lifecycle management and release management), change management, and systems management. The practice is broadly concerned with the identification, acquisition, distribution, and installation of patches to systems. Some definitions of patch management are as a software-level practice, while others are as a systems-level process: software, drivers, and firmware. == Cost–benefit analysis == While reserving time for patching takes up enterprise resources, there are balancing factors which can make proper patch management into a net productivity boost for an organization. Up-to-date systems often perform more efficiently, less costly, with less errors, less security risks, and better user workflow. Additionally, compliance with changing local and federal regulations are more likely to be satisfied. Patching security vulnerabilities has been one among many competing priorities for organizations, leading to longer periods before patching for some organizations. Equifax was too slow to implement its 2015 patch management plan to be able to mitigate or prevent the 2017 Equifax data breach, leading to scrutiny from regulators. == Relation to security management == Patches can be used to defend against and eliminate potential vulnerabilities of a system, so that no threats may exploit them; therefore, patch management can be considered a sub-discipline of vulnerability management. Every patchable device in a system presents an attack surface that must be secured. === Time plan === Automatic updates are where the patch is applied automatically with little to know actions or planning required. This approach is recommended for many individuals and organizations. Some organizations also have to prioritize which patches to prioritize given limited resources. Patch Tuesday is the most common process when major companies like Microsoft and Adobe release patches on a known date so that companies can plan resources around implementing the patches more quickly. Linux is open-sourced and patches can be released at any time, leading some to rely on mailing lists or other ways to be alerted to updates. === Inventory === Taking an inventory of software and hardware, including versions can make it easier to correlate with bugs or patches as they become known. Taking stock of how much education and support others in an organization need to install their patches can also help for planning how to implement the patch or design systems to begin with. Streamlining the process by using tools that can communicate with each other can also help to reduce the time of exposure to known vulnerabilities. == Challenges == There are a multitude of problems that can arise during patch management. A common issue is buggy patches, which either fail to fix their problem or introduce new issues. Another issue is deployment synchronization, since various subsystems may receive instructions to update at different times. Similarly, the difficulty of patch management across many devices may grow at an uncontrollable rate depending on organizational size. One prominent demonstration of the challenges facing proper patch management was the buggy Falcon Sensor patch by CrowdStrike which caused one of the worst IT outages of all time. == Implementations == A patch management tool (alternatively patch manager, patch management system, patch management software, or centralized patch management) help orchestrate all of the procedures involved in patch management. Tools can be in-house (applied locally by local administrators), or external, as with managed service providers (applied externally by a provider). === Patch management software === Windows Update for Business, System Center Configuration Manager, and Windows Server Update Services offer control over patch deployment, with features enabling testing, scheduling updates, and setting custom configurations on Windows platforms. === Managed service providers === == Regulatory requirements (United States) == Timely patching of software vulnerabilities is a requirement under multiple regulatory frameworks in the United States. The Health Insurance Portability and Accountability Act (HIPAA) Security Rule requires covered entities to protect electronic protected health information by implementing security measures sufficient to reduce risks to a reasonable and appropriate level, which industry guidance has long interpreted to include timely patch management. A proposed new HIPAA Security Rule would make patch management requirements explicit, mandating that covered entities and business associates deploy security patches and updates within a defined risk-based timeline and maintain written procedures for prioritizing, testing, and applying patches to systems that store, process, or transmit ePHI. The 2025 proposal continues to receive industry pushback as of December 2025. HIPAA was last updated in 2013. The Payment Card Industry Data Security Standard (PCI DSS) requires organizations to protect system components from known vulnerabilities by installing applicable security patches within one month of release for critical patches. The Cybersecurity and Infrastructure Security Agency (CISA) maintains a Known Exploited Vulnerabilities (KEV) catalog that compels U.S. federal agencies to remediate listed vulnerabilities within specified timelines. Agencies are typically required to patch within 3 weeks, though some vulnerabilities must be fixed within 24 hours.

    Read more →
  • Ware report

    Ware report

    Security Controls for Computer Systems, commonly called the Ware report, is a 1970 text by Willis Ware that was foundational in the field of computer security. == Development == A defense contractor in St. Louis, Missouri, had bought an IBM mainframe computer, which it was using for classified work on a fighter aircraft. To provide additional income, the contractor asked the Department of Defense (DoD) for permission to sell computer time on the mainframe to local businesses via remote terminals, while the classified work continued. At the time, the DoD did not have a policy to cover this. The DoD's Advanced Research Projects Agency (DARPA) asked Ware - a RAND employee - to chair a committee to examine and report on the feasibility of security controls for computer systems. The committee's report was a classified document given in January 1970 to the Defense Science Board (DSB), which had taken over the project from ARPA. After declassification, the report was published by RAND in October 1979. == Influence == The IEEE Computer Society said the report was widely circulated, and the IEEE Annals of the History of Computing said that it, together with Ware's 1967 Spring Joint Computer Conference session, marked the start of the field of computer security. The report influenced security certification standards and processes, especially in the banking and defense industries, where the report was instrumental in creating the Orange Book.

    Read more →
  • Relational data mining

    Relational data mining

    Relational data mining is the data mining technique for relational databases. Unlike traditional data mining algorithms, which look for patterns in a single table (propositional patterns), relational data mining algorithms look for patterns among multiple tables (relational patterns). For most types of propositional patterns, there are corresponding relational patterns. For example, there are relational classification rules (relational classification), relational regression tree, and relational association rules. There are several approaches to relational data mining: Inductive Logic Programming (ILP) Statistical Relational Learning (SRL) Graph Mining Propositionalization Multi-view learning == Algorithms == Multi-Relation Association Rules: Multi-Relation Association Rules (MRAR) is a new class of association rules which in contrast to primitive, simple and even multi-relational association rules (that are usually extracted from multi-relational databases), each rule item consists of one entity but several relations. These relations indicate indirect relationship between the entities. Consider the following MRAR where the first item consists of three relations live in, nearby and humid: “Those who live in a place which is near by a city with humid climate type and also are younger than 20 -> their health condition is good”. Such association rules are extractable from RDBMS data or semantic web data. == Software == Safarii: a Data Mining environment for analysing large relational databases based on a multi-relational data mining engine. Dataconda: a software, free for research and teaching purposes, that helps mining relational databases without the use of SQL. == Datasets == Relational dataset repository: a collection of publicly available relational datasets.

    Read more →
  • VieON

    VieON

    VieON is an mobile application for television and video on demand provided by VieON Joint Stock Company (formerly Dzones), a subsidiary of DatVietVAC Media and Entertainment Group in Vietnam. The app was launched in 2020, featuring over 140 domestic and international television channels, original series, popular entertainment programs known nationwide, top-tier sports events and live streaming of major events. Additionally, VieON provides animated films, television series and television programs from various countries such as South Korea and China. == History == The application was planned for development in 2016, with the cooperation of strategic consulting partner BCG Digital Ventures from the United States. Prior to 2020, VieON was a rebranded version of VTVcab ON, a product managed by Vietnam Cable Television Corporation (VTVCab) and DatVietVAC. On June 15, 2020, after four years of research and testing, the new version of VieON was officially released by DatVietVAC Group, with Vie Channel Joint Stock Company as the business entity and service provider. This is considered the official launch date of the application. On July 21, 2023, VieON transitioned its business operations and service provision to VieON Joint Stock Company. In January 2024, VieON officially launched its global version, VieON Global, targeting Vietnamese users living abroad. == Background == According to Kantar Media Vietnam, up to 84% of Vietnamese people aged 15–54 use social media daily, and in a similar survey by Nielsen, 90% of respondents said they watch live TV weekly. Additionally, according to research organization Muvi, Southeast Asia's OTT market revenue could reach $650 million annually starting next year. Understanding this, DatVietVAC Group has planned to research and develop an OTT application, even though the Vietnamese market already has some major players such as FPT Play and the international giant Netflix. Additionally, DatVietVAC does not hide its ambition to make this application the number one entertainment channel for Vietnamese people.

    Read more →
  • Touch 'n Go eWallet

    Touch 'n Go eWallet

    Touch 'n Go eWallet is a Malaysian digital wallet and online payment platform, established in Kuala Lumpur, Malaysia, in July 2017 as a joint venture between Touch 'n Go and Ant Financial. It allows users to make payments at over 280,000 merchant touch points via QR code, as well as perform peer-to-peer (P2P) money transfers. Since then, the e-wallet further diversified for users to pay for tolls via RFID or PayDirect, street parking and various online payment spanning e-hailing, car-sharing apps or taxis, various overhead bills; top-up for mobile prepaid or in-game currencies; purchases on e-commerce websites; food delivery; renewing motor insurance and other insurance/takaful plans; and even movie, bus, trains or airline tickets. == Background == Prior to the launch of the e-wallet service, Touch 'n Go provided stored-value physical all-in-one contactless card (namely Touch 'n Go cards or "TnG cards") that users can use to pay for toll fares, public transportation and parking lots as well as purchases in some retail stores. In 1999, Touch 'n Go also markets SmartTag devices that allow road users to pass through certain toll booths without the need to unwind the car window. The high entry cost of the device (around RM 100 each) also meant that only few can enjoy the seamless experience. In 2009, Touch 'n Go partnered with Maxis to launch FastTap, a new mobile payment service that utilised Near-Field Communication (NFC). Maxis customers can make payments by placing the phone near the card readers (that also supports physical bank cards and Touch ’N Go cards). However, the venture featured only one phone model, Nokia 6212, which greatly limited the public reach. In July 2012, Touch 'n Go announced another collaboration with CIMB and Maxis to create similar NFC-based online transaction service that runs on compatible smartphones. Touch 'n Go Wallet was launched in February 2017 as an QR code-based e-wallet application, to compete with Samsung Pay that utilizes NFC modules. In the controlled pilot test in Taman Tun Dr Ismail, the correspondents can experience basic functionalities (prepaid mobile service reload, bills payment, movie tickets and flight tickets purchase, transfer of money with another user, and payments at participating stores and restaurants). While the deployed version of the app was generally well-received, the existing process to transfer the balance to the physical TnG card stored value from the app garnered unanimous backlash. Test groups felt that the need to head to a self-service terminal named "Pick Up Device" in person within 24 hours for completion, along with the failure to do so (the balance would be credited back to the wallet after 24 hours), was not divulged clearly and also defeated the purpose of convenience, not to mention there were only 2 such terminals. The feature was eventually suspended. On 15 November 2017, Touch 'n Go was granted permission by the Central Bank of Malaysia to form a joint venture with Ant Financial, a Chinese-based financial company that operates Alipay. The partnership allowed the local e-wallet to learn from and build upon the operational model pioneered by Alipay. In June 2018, it was reported that Touch 'n Go was pilot testing the uses of the Touch 'n Go eWallet in Rapid Transit, as the ticketing system was enabled on the Kelana Jaya line in the Klang Valley. Pilot testing only applied to stations in Kelana Jaya, KL Gateway–Universiti, Kerinchi, KL Sentral, Dang Wangi, KLCC, and Ampang Park. The test was reported to be successful in February 2020 and was planned to be fully deployed on the LRT and MRT. Due to unforeseen circumstances, this feature did not come into fruition, the app merely adds in-app purchase of monthly concession cards called "My50". In August 2018, Touch 'n Go announced that selected drivers may experience first-hand a new RFID-based payment (later rebranded as "myRFID") that serves to replace SmartTag devices on closed toll roads with during pilot testing phase commencing on 3 September 2018. On 2 November 2018, participation in the ongoing pilot programme was expanded, allowing more drivers to sign up ahead of the public rollout of the RFID system. During the same period, Touch 'n Go has discontinued the sales of SmartTAG devices in favor of the RFID-based payment system. Initially, the installation of the RFID chip onto the car could only be done by Touch 'n Go staff at the RFID fitment centers, at no cost. As the pilot testing concluded on 15 February 2020, a self-installation kit are being offered to the public on Lazada and Shopee. Support for taxi-hailing mobile apps was added in November 2018 when Touch 'n Go partnered with EzCab and Public Cab, allowing users to make payments via QR code. This was later expanded to support MULA on 7 January 2020, and later MyCar on 4 April 2020. Touch 'n Go eWallet was also the first eWallet to convert Kuala Lumpur's most famous Ramadan bazaar in Kampong Bahru into "Kampong Kashless", a venue that can accept cashless QR payments. It welcomed more than 250,000 Malaysians including local celebrities and government officials. On 1 October 2019, some e-commerce websites owned by the Alibaba Group (TMall and Taobao) began to support Touch 'n Go eWallet payments, Lazada joined the list on 29 October 2019. Touch 'n Go eWallet was one of the three e-wallet services in Malaysia (the other being Boost and GrabPay) that was eligible for its users to receive an RM 30 credit in conjunction of E-Tunai Rakyat program under the Budget 2020 plan, that further normalizes adoption of cashless and mobile payment among Malaysians. Unlike Boost and GrabPay, whose P2P transfers were completely disabled until users have exhausted the RM 30 first, Touch 'n Go eWallet did not impose such measures. in 2020, Touch 'n Go eWallet joined DuitNow, an electronic transaction ecosystem in Malaysia which allows the funds from Touch 'n Go eWallet to be transferred to other competing services and vice versa, by implementing a standard DuitNow QR code deisgn. Japan become the first country outside Malaysia to support Touch 'n Go eWallet payment via Alipay Connect. During the COVID-19 pandemic and the enforcement of the movement control order, use of eWallets (including Touch 'n Go eWallet) increased tremendously among citizens due to its contactless nature of the payment and increased take-out orders at home; which in turn helped small and medium-sized enterprises to thrive. Touch 'n Go eWallet launched its loyalty programme – The Goal Hunter – in October 2020 where on monthly basis, users collect stamps by paying with the app in exchange for rewards that include lucky draws and other vouchers. == Services == Touch 'n Go eWallet app is available for download on both Google Play and Apple Appstore. It utilizes QR code technology for local in-store payments. The Touch 'n Go eWallet app also diversifies payment types, including but not limited to Utility bills Purchase of motor insurance policy Pay Later facility Prepaid reload and Postpaid payment to telecommunications companies loan repayments for courts, MBSJ payments, zakat and PTPTN payment for car parking P2P transfer airline ticket bookings; movie tickets from TGV Cinemas RFID refuelling at Shell stations (defunct after Shell launched its own payment app in 2024) User can reload the eWallet credit by setting up auto-reload, purchasing reload pins from convenience stores (such as 7-Eleven, KK Super Mart, MyNews, Family Mart etc.), reloading by FPX and credit/debit card. The PayDirect feature allows users to link their physical Touch 'n Go cards into the eWallet, where the toll fare can be debited from the eWallet balance when flashing the card near the sensor. In the circumstance of insufficient balance in the app, the toll fare will be deducted from the physical card's balance instead. This also conveniently allows users to view the card's remaining balance. Touch 'n Go eWallet is the first and only eWallet to offer a money-back guarantee when an unauthorised transaction is made on the user’s eWallet account, subject to Terms & Conditions. Payment via QR code scanning, including Touch 'n Go eWallet, becomes a norm in most of the shops/restaurants across Malaysia, including roadside hawkers/stall owners and automatic vending machines. The merchants usually display their owner's individual QR or Business account that they can apply for in-app. The popularity attributes to the low merchant onboarding cost (Unlike NFC payment and debit/credit card that requires purchase or rental of a payment terminal device at a yearly fee.) The app is also one of the few ewallet that supports bidirectional liquidity (alongside MAE developed by Maybank), where funds can be transferred two-way with bank accounts. This is not possible with the other major ewallets (GrabPay, Boost, ShopeePay etc.) where the money that is reloaded to the wallet cannot be transferred to another bank account, unless through manual req

    Read more →
  • Alerts.in.ua

    Alerts.in.ua

    alerts.in.ua is an online service that visualizes information about air alerts and other threats on the map of Ukraine. == History == The idea of the site appeared in the first weeks of the 2022 Russian invasion of Ukraine, during the development of other projects related to alerting the population about alarms. So, on March 2, 2022, the "Lviv Siren" bot was created, which reported on air alarms in Lviv on Twitter. Later, the idea arose to monitor alarms all over Ukraine and display them on a map. However, the lack of a single official source reporting alarms made this task much more difficult. On March 15, 2022, the Ajax Systems company announced the creation of the official Telegram channel "Air Alarm". This channel receives signals from the "Air Alarm" application and instantly publishes messages about the start and end of alarms in different regions of Ukraine. This immediately solved the problem with the source of information and gave impetus to the further implementation of the project. On March 22, 2022, the first version of the "Air Alarm Map" website was published, located on the war.ukrzen.in.ua domain. The map quickly gained popularity in social networks. It, like several other similar projects, began to be widely distributed by the mass media: Suspilne, Novyi Kanal, UNIAN, DW, Fakty ICTV, Vikna TV, Ukrainian Radio, STB, Espresso, dev.ua, itc.ua and state bodies: Center for Countering Disinformation at the National Security and Defense Council of Ukraine, Verkhovna Rada of Ukraine, Khmelnytska OVA, etc. On April 8, 2022, the site moved to the alerts.in.ua domain, where it is still available today. On August 25, 2022, the service began monitoring local official channels in addition to the main "Air Alarm". On September 11, 2022, the English version of the site was published. On March 22, 2023, its own Android application was published. The project is actively developing and has its own community. == Description == The main part of the site is a map of Ukraine, on which the regions where an air alert or other threats have been declared are highlighted in real time. As of October 16, 2022, 5 types of threats are supported: Air alarm. The threat of artillery fire. The threat of street fighting. Chemical threat. Nuclear threat. Additionally, based on media reports, information is published about other dangerous events, such as explosions, demining, etc. On the site, you can view the history of announced alarms with links to sources. Alarm statistics for different time periods are also available. For developers, there is an API that allows you to develop your own services based on information about declared alarms. The site is available in Ukrainian, English, Polish and Japanese. == Use == The map is used by: To monitor the situation in the country and the region. To illustrate the alarms announced in the mass media: TSN, Ukrainian truth, Channel 24, Suspilne, RBC Ukraine, Gromadske, Glavkom. As a map of alarms in mobile applications, there is Alarm and AirAlert. As an API for its services, including alternative alarm maps, Telegram, Viber channels, Discord bots, IoT projects, etc. == Statistics == 89.5% of users use the map from a mobile phone, 10% from a PC and 1% from a tablet. Top 6 countries by visit: Ukraine, United States, Poland, Germany, Great Britain and Japan . == Alternative projects == eMap was created by the developer Vadym Klymenko. AlarmMap is an online from the Ukrainian office of Agroprep. The official map of air alarms was developed by Ajax Systems together with the developer Artem Lemeshev, Stfalcon with the support of the Ministry of Statistics.

    Read more →
  • Software construction

    Software construction

    Software construction is the process of creating working software via coding and integration. The process includes unit and integration testing although does not include higher level testing such as system testing. Construction is an aspect of the software development lifecycle and is integrated in the various software development process models with varying focus on construction as an activity separate from other activities. In the waterfall model, a software development effort consists of sequential phases including requirements analysis, design, and planning which are prerequisites for starting construction. In an iterative model such as scrum, evolutionary prototyping, or extreme programming, construction as an activity that occurs concurrently or overlapping other activities. Construction planning may include defining the order in which components are created and integrated, the software quality management processes, and the allocation of tasks to teams and developers. To facilitate project management, numerous construction aspects can be measured; these include the amount of code developed, modified, reused, and destroyed, code complexity, code inspection statistics, faults-fixed and faults-found rates, and effort expended. These measurements can be useful for aspects such as ensuring quality and improving the process. == Activities == Construction includes many activities. === Coding === The following are a few of the key aspects of the coding activity: Naming Choice of name for each identifier. One study showed that the effort required to debug a program is minimized when variable names are between 10 and 16 characters. Logic Organization into statements and routines Highly cohesive routines proved to be less error prone than routines with lower cohesion. A study of 450 routines found that 50 percent of the highly cohesive routines were fault free compared to only 18 percent of routines with low cohesion. Another study of a different 450 routines found that routines with the highest coupling-to-cohesion ratios had 7 times as many errors as those with the lowest coupling-to-cohesion ratios and were 20 times as costly to fix. Although studies showed inconclusive results regarding the correlation between routine sizes and the rate of errors in them, but one study found that routines with fewer than 143 lines of code were 2.4 times less expensive to fix than larger routines. Another study showed that the code needed to be changed least when routines averaged 100 to 150 lines of code. Another study found that structural complexity and amount of data in a routine were correlated with errors regardless of its size. Interfaces between routines are some of the most error-prone areas of a program. One study showed that 39 percent of all errors were errors in communication between routines. Unused parameters are correlated with an increased error rate. In one study, only 17 to 29 percent of routines with more than one unreferenced variable had no errors, compared to 46 percent in routines with no unused variables. The number of parameters of a routine should be 7 at maximum as research has found that people generally cannot keep track of more than about seven chunks of information at once. One experiment showed that designs which access arrays sequentially, rather than randomly, result in fewer variables and fewer variable references. One experiment found that loops-with-exit are more comprehensible than other kinds of loops. Regarding the level of nesting in loops and conditionals, studies have shown that programmers have difficulty comprehending more than three levels of nesting. Control flow complexity has been shown to correlate with low reliability and frequent errors. Modularity Structuring and refactoring the code into classes, packages and other structures. When considering containment, the maximum number of data members in a class shouldn't exceed 7±2. Research has shown that this number is the number of discrete items a person can remember while performing other tasks. When considering inheritance, the number of levels in the inheritance tree should be limited. Deep inheritance trees have been found to be significantly associated with increased fault rates. When considering the number of routines in a class, it should be kept as small as possible. A study on C++ programs has found an association between the number of routines and the number of faults. A study by NASA showed that the putting the code into well-factored classes can double the code reusability compared to the code developed using functional design. Error handling Encoding logic to handle both planned and unplanned errors and exceptions. Resource management Managing computational resource use via exclusion mechanisms and discipline in accessing serially reusable resources, including threads or database locks. Security Prevention of code-level security breaches such as buffer overrun and array index overflow. Optimization Optimization while avoiding premature optimization. Documentation Both embedded in the code as comments and as external documents. === Integration === Integration is about combining separately constructed parts. Concerns include planning the sequence in which components will be integrated, creating scaffolding to support interim versions of the software, determining the degree of testing and quality work performed on components before they are integrated, and determining points in the project at which interim versions are tested. === Testing === Testing can reduce the time between when faulty logic is inserted in the code and when it is detected. In some cases, testing is performed after code has been written, but in test-first programming, test cases are created before code is written. Construction includes at least two forms of testing, often performed by the developer who wrote the code: unit testing and integration testing. === Reuse === Software reuse entails more than creating and using libraries. It requires formalizing the practice of reuse by integrating reuse processes and activities into the software life cycle. The tasks related to reuse in software construction during coding and testing may include: selection of the reusable code, evaluation of code or test re-usability, reporting reuse metrics. === Quality assurance === Techniques for ensuring quality as software is constructed include: Testing One study found that the average defect detection rates of Unit testing and integration testing are 30% and 35% respectively. Software inspection With respect to software inspection, one study found that the average defect detection rate of formal code inspections is 60%. Regarding the cost of finding defects, a study found that code reading detected 80% more faults per hour than testing. Another study shown that it costs six times more to detect design defects by using testing than by using inspections. A study by IBM showed that only 3.5 hours were needed to find a defect through code inspections versus 15–25 hours through testing. Microsoft has found that it takes 3 hours to find and fix a defect by using code inspections and 12 hours to find and fix a defect by using testing. In a 700 thousand lines program, it was reported that code reviews were several times as cost-effective as testing. Studies found that inspections result in 20% - 30% fewer defects per 1000 lines of code than less formal review practices and that they increase productivity by about 20%. Formal inspections will usually take 10% - 15% of the project budget and will reduce overall project cost. Researchers found that having more than 2 - 3 reviewers on a formal inspection doesn't increase the number of defects found, although the results seem to vary depending on the kind of material being inspected. Technical review With respect to technical review, one study found that the average defect detection rates of informal code reviews and desk checking are 25% and 40% respectively. Walkthroughs were found to have a defect detection rate of 20% - 40%, but were found also to be expensive especially when project pressures increase. Code reading was found by NASA to detect 3.3 defects per hour of effort versus 1.8 defects per hour for testing. It also finds 20% - 60% more errors over the life of the project than different kinds of testing. A study of 13 reviews about review meetings, found that 90% of the defects were found in preparation for the review meeting while only around 10% were found during the meeting. Static analysis With respect to Static analysis (IEEE1028), studies have shown that a combination of these techniques needs to be used to achieve a high defect detection rate. Other studies showed that different people tend to find different defects. One study found that the extreme programming practices of pair programming, desk checking, unit testing, integration testing, and regression testing can achieve a 90% defect detection rate. An experiment involving exper

    Read more →
  • Color gradient

    Color gradient

    In color science, a color gradient (also known as a color ramp or a color progression) specifies a range of position-dependent colors, usually used to fill a region. In assigning colors to a set of values, a gradient is a continuous colormap, a type of color scheme. In computer graphics, the term swatch has come to mean a palette of active colors. == Definitions == Color gradient is a set of colors arranged in a linear order (ordered) A continuous colormap is a curve through a colorspace === Strict definition === A colormap is a function which associate a real value r with point c in color space C {\displaystyle C} f : [ r m i n , r m a x ] ⊂ R → C {\displaystyle f:[r_{min},r_{max}]\subset \mathbf {R} \to C} which is defined by: a colorspace C an increasing sequence of sampling points r 0 < . . . < r m ∈ [ r m i n , r m a x ] {\displaystyle r_{0}<... Read more →

  • Key–value database

    Key–value database

    A key-value database, or key-value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary. Dictionaries contain a collection of objects, or records, which in turn have many different fields within them. These records are stored and retrieved using a key that uniquely identifies the record, and is used to find the data within the database. Key-value databases differ from the better known relational databases (RDB). RDBs pre-define the data structure in the database as a series of tables containing fields with well-defined data types. Exposing the data types to the database program allows it to apply various optimizations. In contrast, key-value systems treat the value as opaque to the database itself, and typically support only simple operations such as storing, retrieving, updating, and deleting a value by its key. This offers considerable flexibility and makes such systems well suited to low-latency, high-throughput workloads dominated by direct key lookups, but less suitable for applications that require complex queries or explicit relationships among records. A lack of standardization, limited transaction support, and relatively simple query interfaces long restricted many key-value systems to specialized uses, but the rapid move to cloud computing after 2010 helped drive renewed interest in them as part of the broader NoSQL movement. Some graph databases, such as ArangoDB, are also key–value databases internally, adding the concept of relationships (pointers) between records as a first-class data type. == Types and examples == Key–value systems span a wide consistency spectrum, from eventually consistent designs to strongly consistent or serializable ones, and some allow the consistency level to be configured as part of the trade-off against latency and availability. Renewed interest in key–value and other NoSQL systems was driven in part by the demands of big data, distributed, and cloud applications. Their scalability and availability made them attractive for cloud data management, although limited transaction support, low-level query interfaces, and the lack of standardization remained obstacles to wider adoption. Some maintain data in memory (RAM), while others employ solid-state drives or rotating disks. Some key–value systems add additional structure to their keys. For example, Oracle NoSQL Database organizes records using composite keys with "major" and "minor" components, an arrangement that Oracle compares to a directory-path structure in a file system. More generally, however, key–value stores are defined by their use of unique keys associated with opaque values and by their emphasis on simple key-based operations. Unix included dbm (database manager), a minimal database library written by Ken Thompson for managing associative arrays with a single key and hash-based access. Later implementations and related libraries included sdbm, GNU dbm (gdbm), and Berkeley DB. A more recent example is RocksDB, a persistent key–value storage engine developed at Facebook and designed for large-scale applications. Other examples include in-memory systems such as Memcached and Redis, and persistent systems such as Berkeley DB, Riak, and Voldemort.

    Read more →