Knowledge integration

Knowledge integration

Knowledge integration is the process of synthesizing multiple knowledge models (or representations) into a common model (representation). Compared to information integration, which involves merging information having different schemas and representation models, knowledge integration focuses more on synthesizing the understanding of a given subject from different perspectives. For example, multiple interpretations are possible of a set of student grades, typically each from a certain perspective. An overall, integrated view and understanding of this information can be achieved if these interpretations can be put under a common model, say, a student performance index. The Web-based Inquiry Science Environment (WISE), from the University of California at Berkeley has been developed along the lines of knowledge integration theory. Knowledge integration has also been studied as the process of incorporating new information into a body of existing knowledge with an interdisciplinary approach. This process involves determining how the new information and the existing knowledge interact, how existing knowledge should be modified to accommodate the new information, and how the new information should be modified in light of the existing knowledge. A learning agent that actively investigates the consequences of new information can detect and exploit a variety of learning opportunities; e.g., to resolve knowledge conflicts and to fill knowledge gaps. By exploiting these learning opportunities the learning agent is able to learn beyond the explicit content of the new information. The machine learning program KI, developed by Murray and Porter at the University of Texas at Austin, was created to study the use of automated and semi-automated knowledge integration to assist knowledge engineers constructing a large knowledge base. A possible technique which can be used is semantic matching. More recently, a technique useful to minimize the effort in mapping validation and visualization has been presented which is based on Minimal Mappings. Minimal mappings are high quality mappings such that i) all the other mappings can be computed from them in time linear in the size of the input graphs, and ii) none of them can be dropped without losing property i). The University of Waterloo operates a Bachelor of Knowledge Integration undergraduate degree program as an academic major or minor. The program started in 2008.

Color space

A color space is a specific organization of colors. In combination with color profiling supported by various physical devices, it supports reproducible representations of color – whether such representation entails an analog or a digital representation. A color space may be arbitrary, i.e. with physically realized colors assigned to a set of physical color swatches with corresponding assigned color names (including discrete numbers in – for example – the Pantone collection), or structured with mathematical rigor (as with the NCS System, Adobe RGB and sRGB). A "color space" is a useful conceptual tool for understanding the color capabilities of a particular device or digital file. When trying to reproduce color on another device, color spaces can show whether shadow/highlight detail and color saturation can be retained, and by how much either will be compromised. A "color model" is an abstract mathematical model describing the way colors can be represented as tuples of numbers (e.g. triples in RGB or quadruples in CMYK); however, a color model with no associated mapping function to an absolute color space is a more or less arbitrary color system with no connection to any globally understood system of color interpretation. Adding a specific mapping function between a color model and a reference color space establishes within the reference color space a definite "footprint", known as a gamut, and for a given color model, this defines a color space. For example, Adobe RGB and sRGB are two different absolute color spaces, both based on the RGB color model. When defining a color space, the usual reference standard is the CIELAB or CIEXYZ color spaces, which were specifically designed to encompass all colors the average human can see. Since "color space" identifies a particular combination of the color model and the mapping function, the word is often used informally to identify a color model. However, even though identifying a color space automatically identifies the associated color model, this usage is incorrect in a strict sense. For example, although several specific color spaces are based on the RGB color model, there is no such thing as the singular RGB color space. == History == In 1802, Thomas Young postulated the existence of three types of photoreceptors (now known as cone cells) in the eye, each of which was sensitive to a particular range of visible light. Hermann von Helmholtz developed the Young–Helmholtz theory further in 1850: that the three types of cone photoreceptors could be classified as short-preferring (blue), middle-preferring (green), and long-preferring (red), according to their response to the wavelengths of light striking the retina. The relative strengths of the signals detected by the three types of cones are interpreted by the brain as a visible color. But it is not clear that they thought of colors as being points in color space. The color-space concept was likely due to Hermann Grassmann, who developed it in two stages. First, he developed the idea of vector space, which allowed the algebraic representation of geometric concepts in n-dimensional space. Fearnley-Sander (1979) describes Grassmann's foundation of linear algebra as follows: The definition of a linear space (vector space)... became widely known around 1920, when Hermann Weyl and others published formal definitions. In fact, such a definition had been given thirty years previously by Peano, who was thoroughly acquainted with Grassmann's mathematical work. Grassmann did not put down a formal definition—the language was not available—but there is no doubt that he had the concept. With this conceptual background, in 1853, Grassmann published a theory of how colors mix; it and its three color laws are still taught, as Grassmann's law. As noted first by Grassmann... the light set has the structure of a cone in the infinite-dimensional linear space. As a result, a quotient set (with respect to metamerism) of the light cone inherits the conical structure, which allows color to be represented as a convex cone in the 3- D linear space, which is referred to as the color cone. == Examples == Colors can be created in printing with color spaces based on the CMYK color model, using the subtractive primary colors of pigment (cyan, magenta, yellow, and key [black]). To create a three-dimensional representation of a given color space, we can assign the amount of magenta color to the representation's X axis, the amount of cyan to its Y axis, and the amount of yellow to its Z axis. The resulting 3-D space provides a unique position for every possible color that can be created by combining those three pigments. Colors can be created on computer monitors with color spaces based on the RGB color model, using the additive primary colors (red, green, and blue). A three-dimensional representation would assign each of the three colors to the X, Y, and Z axes. Colors generated on a given monitor will be limited by the reproduction medium, such as the phosphor (in a CRT monitor) or filters and backlight (LCD monitor). Another way of creating colors on a monitor is with an HSL or HSV color model, based on hue, saturation, brightness (value/lightness). With such a model, the variables are assigned to cylindrical coordinates. Many color spaces can be represented as three-dimensional values in this manner, but some have more, or fewer dimensions, and some, such as Pantone, cannot be represented in this way at all. == Conversion == Color space conversion is the translation of the representation of a color from one basis to another. This typically occurs in the context of converting an image that is represented in one color space to another color space, the goal being to make the translated image look as similar as possible to the original. == RGB density == The RGB color model is implemented in different ways, depending on the capabilities of the system used. The most common incarnation in general use as of 2021 is the 24-bit implementation, with 8 bits, or 256 discrete levels of color per channel. Any color space based on such a 24-bit RGB model is thus limited to a range of 256×256×256 ≈ 16.7 million colors. Some implementations use 16 bits per component for 48 bits total, resulting in the same gamut with a larger number of distinct colors. This is especially important when working with wide-gamut color spaces (where most of the more common colors are located relatively close together), or when a large number of digital filtering algorithms are used consecutively. The same principle applies for any color space based on the same color model, but implemented at different bit depths. == Lists == CIE 1931 XYZ color space was one of the first attempts to produce a color space based on measurements of human color perception (earlier efforts were by James Clerk Maxwell, König & Dieterici, and Abney at Imperial College) and it is the basis for almost all other color spaces. The CIERGB color space is a linearly-related companion of CIE XYZ. Additional derivatives of CIE XYZ include the CIELUV, CIEUVW, and CIELAB. === Generic === RGB uses additive color mixing, because it describes what kind of light needs to be emitted to produce a given color. RGB stores individual values for red, green and blue. RGBA is RGB with an additional channel, alpha, to indicate transparency. Common color spaces based on the RGB model include sRGB, Adobe RGB, ProPhoto RGB, scRGB, and CIE RGB. CMYK uses subtractive color mixing used in the printing process, because it describes what kind of inks need to be applied so the light reflected from the substrate and through the inks produces a given color. One starts with a white substrate (canvas, page, etc.), and uses ink to subtract color from white to create an image. CMYK stores ink values for cyan, magenta, yellow and black. There are many CMYK color spaces for different sets of inks, substrates, and press characteristics (which change the dot gain or transfer function for each ink and thus change the appearance). YIQ was formerly used in NTSC (North America, Japan and elsewhere) television broadcasts for historical reasons. This system stores a luma value roughly analogous to (and sometimes incorrectly identified as) luminance, along with two chroma values as approximate representations of the relative amounts of blue and red in the color. It is similar to the YUV scheme used in most video capture systems and in PAL (Australia, Europe, except France, which uses SECAM) television, except that the YIQ color space is rotated 33° with respect to the YUV color space and the color axes are swapped. The YDbDr scheme used by SECAM television is rotated in another way. YPbPr is a scaled version of YUV. It is most commonly seen in its digital form, YCbCr, used widely in video and image compression schemes such as MPEG and JPEG. xvYCC is an international digital video color space standard published by the IEC (IEC 61966-2-4). It is based on the ITU BT.601 and BT.709

Cryptosystem

In cryptography, a cryptosystem is a suite of cryptographic algorithms needed to implement a particular security service, such as confidentiality (encryption). Typically, a cryptosystem consists of three algorithms: one for key generation, one for encryption, and one for decryption. The term cipher (sometimes cypher) is often used to refer to a pair of algorithms, one for encryption and one for decryption. Therefore, the term cryptosystem is most often used when the key generation algorithm is important. For this reason, the term cryptosystem is commonly used to refer to public key techniques; however both "cipher" and "cryptosystem" are used for symmetric key techniques. == Formal definition == Mathematically, a cryptosystem or encryption scheme can be defined as a tuple ( P , C , K , E , D ) {\displaystyle ({\mathcal {P}},{\mathcal {C}},{\mathcal {K}},{\mathcal {E}},{\mathcal {D}})} with the following properties. P {\displaystyle {\mathcal {P}}} is a set called the "plaintext space". Its elements are called plaintexts. C {\displaystyle {\mathcal {C}}} is a set called the "ciphertext space". Its elements are called ciphertexts. K {\displaystyle {\mathcal {K}}} is a set called the "key space". Its elements are called keys. E = { E k : k ∈ K } {\displaystyle {\mathcal {E}}=\{E_{k}:k\in {\mathcal {K}}\}} is a set of functions E k : P → C {\displaystyle E_{k}:{\mathcal {P}}\rightarrow {\mathcal {C}}} . Its elements are called "encryption functions". D = { D k : k ∈ K } {\displaystyle {\mathcal {D}}=\{D_{k}:k\in {\mathcal {K}}\}} is a set of functions D k : C → P {\displaystyle D_{k}:{\mathcal {C}}\rightarrow {\mathcal {P}}} . Its elements are called "decryption functions". For each e ∈ K {\displaystyle e\in {\mathcal {K}}} , there is d ∈ K {\displaystyle d\in {\mathcal {K}}} such that D d ( E e ( p ) ) = p {\displaystyle D_{d}(E_{e}(p))=p} for all p ∈ P {\displaystyle p\in {\mathcal {P}}} . Note; typically this definition is modified in order to distinguish an encryption scheme as being either a symmetric-key or public-key type of cryptosystem. == Examples == A classical example of a cryptosystem is the Caesar cipher. A more contemporary example is the RSA cryptosystem. Another example of a cryptosystem is the Advanced Encryption Standard (AES). AES is a widely used symmetric encryption algorithm that has become the standard for securing data in various applications. Paillier cryptosystem is another example used to preserve and maintain privacy and sensitive information. It is featured in electronic voting, electronic lotteries and electronic auctions.

Data stream management system

A data stream management system (DSMS) is a computer software system to manage continuous data streams. It is similar to a database management system (DBMS), which is, however, designed for static data in conventional databases. A DBMS also offers a flexible query processing so that the information needed can be expressed using queries. However, in contrast to a DBMS, a DSMS executes a continuous query that is not only performed once, but is permanently installed. Therefore, the query is continuously executed until it is explicitly uninstalled. Since most DSMS are data-driven, a continuous query produces new results as long as new data arrive at the system. This basic concept is similar to complex event processing so that both technologies are partially coalescing. == Functional principle == One important feature of a DSMS is the possibility to handle potentially infinite and rapidly changing data streams by offering flexible processing at the same time, although there are only limited resources such as main memory. The following table provides various principles of DSMS and compares them to traditional DBMS. == Processing and streaming models == One of the biggest challenges for a DSMS is to handle potentially infinite data streams using a fixed amount of memory and no random access to the data. There are different approaches to limit the amount of data in one pass, which can be divided into two classes. For the one hand, there are compression techniques that try to summarize the data and for the other hand there are window techniques that try to portion the data into (finite) parts. === Synopses === The idea behind compression techniques is to maintain only a synopsis of the data, but not all (raw) data points of the data stream. The algorithms range from selecting random data points called sampling to summarization using histograms, wavelets or sketching. One simple example of a compression is the continuous calculation of an average. Instead of memorizing each data point, the synopsis only holds the sum and the number of items. The average can be calculated by dividing the sum by the number. However, it should be mentioned that synopses cannot reflect the data accurately. Thus, a processing that is based on synopses may produce inaccurate results. === Windows === Instead of using synopses to compress the characteristics of the whole data streams, window techniques only look on a portion of the data. This approach is motivated by the idea that only the most recent data are relevant. Therefore, a window continuously cuts out a part of the data stream, e.g. the last ten data stream elements, and only considers these elements during the processing. There are different kinds of such windows like sliding windows that are similar to FIFO lists or tumbling windows that cut out disjoint parts. Furthermore, the windows can also be differentiated into element-based windows, e.g., to consider the last ten elements, or time-based windows, e.g., to consider the last ten seconds of data. There are also different approaches to implementing windows. There are, for example, approaches that use timestamps or time intervals for system-wide windows or buffer-based windows for each single processing step. Sliding-window query processing is also suitable to being implemented in parallel processors by exploiting parallelism between different windows and/or within each window extent. == Query processing == Since there are a lot of prototypes, there is no standardized architecture. However, most DSMS are based on the query processing in DBMS by using declarative languages to express queries, which are translated into a plan of operators. These plans can be optimized and executed. A query processing often consists of the following steps. === Formulation of continuous queries === The formulation of queries is mostly done using declarative languages like SQL in DBMS. Since there are no standardized query languages to express continuous queries, there are a lot of languages and variations. However, most of them are based on SQL, such as the Continuous Query Language (CQL), StreamSQL and ESP. There are also graphical approaches where each processing step is a box and the processing flow is expressed by arrows between the boxes. The language strongly depends on the processing model. For example, if windows are used for the processing, the definition of a window has to be expressed. In StreamSQL, a query with a sliding window for the last 10 elements looks like follows: This stream continuously calculates the average value of "price" of the last 10 tuples, but only considers those tuples whose prices are greater than 100.0. In the next step, the declarative query is translated into a logical query plan. A query plan is a directed graph where the nodes are operators and the edges describe the processing flow. Each operator in the query plan encapsulates the semantic of a specific operation, such as filtering or aggregation. In DSMSs that process relational data streams, the operators are equal or similar to the operators of the Relational algebra, so that there are operators for selection, projection, join, and set operations. This operator concept allows the very flexible and versatile processing of a DSMS. === Optimization of queries === The logical query plan can be optimized, which strongly depends on the streaming model. The basic concepts for optimizing continuous queries are equal to those from database systems. If there are relational data streams and the logical query plan is based on relational operators from the Relational algebra, a query optimizer can use the algebraic equivalences to optimize the plan. These may be, for example, to push selection operators down to the sources, because they are not so computationally intensive like join operators. Furthermore, there are also cost-based optimization techniques like in DBMS, where a query plan with the lowest costs is chosen from different equivalent query plans. One example is to choose the order of two successive join operators. In DBMS this decision is mostly done by certain statistics of the involved databases. But, since the data of a data streams is unknown in advance, there are no such statistics in a DSMS. However, it is possible to observe a data stream for a certain time to obtain some statistics. Using these statistics, the query can also be optimized later. So, in contrast to a DBMS, some DSMS allows to optimize the query even during runtime. Therefore, a DSMS needs some plan migration strategies to replace a running query plan with a new one. === Transformation of queries === Since a logical operator is only responsible for the semantics of an operation but does not consist of any algorithms, the logical query plan must be transformed into an executable counterpart. This is called a physical query plan. The distinction between a logical and a physical operator plan allows more than one implementation for the same logical operator. The join, for example, is logically the same, although it can be implemented by different algorithms like a Nested loop join or a Sort-merge join. Notice, these algorithms also strongly depend on the used stream and processing model. Finally, the query is available as a physical query plan. === Execution of queries === Since the physical query plan consists of executable algorithms, it can be directly executed. For this, the physical query plan is installed into the system. The bottom of the graph (of the query plan) is connected to the incoming sources, which can be everything like connectors to sensors. The top of the graph is connected to the outgoing sinks, which may be for example a visualization. Since most DSMSs are data-driven, a query is executed by pushing the incoming data elements from the source through the query plan to the sink. Each time when a data element passes an operator, the operator performs its specific operation on the data element and forwards the result to all successive operators. == Examples == AURORA, StreamBase Systems, Inc. Archived 23 March 2009 at the Wayback Machine Hortonworks DataFlow IBM Streams NIAGARA Query Engine NiagaraST: A Research Data Stream Management System at Portland State University Odysseus, an open source Java-based framework for Data Stream Management Systems Pipeline DB PIPES Archived 24 December 2016 at the Wayback Machine, webMethods Business Events QStream SAS Event Stream Processing SQLstream STREAM StreamGlobe StreamInsight TelegraphCQ WSO2 Stream Processor

Microsoft Security Development Lifecycle

The Microsoft Security Development Lifecycle (SDL) is the approach Microsoft uses to integrate security into DevOps processes (sometimes called a DevSecOps approach). You can use this SDL guidance and documentation to adapt this approach and practices to your organization. == Overview == The practices outlined in the SDL approach are applicable to all types of software development and across all platforms, ranging from traditional waterfall methodologies to modern DevOps approaches. They can generally be applied to the following: Software – whether you are developing software code for firmware, AI applications, operating systems, drivers, IoT Devices, mobile device apps, web services, plug-ins or applets, hardware microcode, low-code/no-code apps, or other software formats. Note that most practices in the SDL are applicable to secure computer hardware development as well. Platforms – whether the software is running on a ‘serverless’ platform approach, on an on-premises server, a mobile device, a cloud hosted VM, a user endpoint, as part of a Software as a Service (SaaS) application, a cloud edge device, an IoT device, or anywhere else. == Practices == The SDL recommends 10 security practices to incorporate into your development workflows. Applying the 10 security practices of SDL is an ongoing process of improvement so a key recommendation is to begin from some point and keep enhancing as you proceed. This continuous process involves changes to culture, strategy, processes, and technical controls as you embed security skills and practices into DevOps workflows. The 10 SDL practices are: Establish security standards, metrics, and governance Require use of proven security features, languages, and frameworks Perform security design review and threat modeling Define and use cryptography standards Secure the software supply chain Secure the engineering environment Perform security testing Ensure operational platform security Implement security monitoring and response Provide security training == Versions ==

Kuaishou

Kuaishou Technology is a Chinese publicly traded partly state-owned holding company based in Haidian District, Beijing, that was founded in 2011 by Hua Su (Chinese: 宿华) and Cheng Yixiao (Chinese: 程一笑). The company, listed on the Hong Kong Stock Exchange, is known for developing a mobile app for sharing users' short videos, a social network, and video special effects editor. The app is known as Kwai in many countries outside of China. It is also known as Snack Video in India, Pakistan and Indonesia. == Ownership and governance == Kuaishou's overseas team is led by the former CEO of the application 99, and staff from Google, Facebook, Netflix, and TikTok were recruited to lead the company's international expansion. The China Internet Investment Fund, a state-owned enterprise controlled by the Cyberspace Administration of China, holds a golden share ownership stake in Kuaishou. == History == Kuaishou is China's first short video platform that was developed in 2011 by engineer Hua Su and Cheng Yixiao. Prior to co-founding Kuaishou, Su Hua had worked for both Google and Baidu as a software engineer. The company is headquartered in Haidian District, Beijing. Kuaishou's predecessor "GIF Kuaishou" was founded in March 2011. GIF Kuaishou was a mobile app with which users could make and share GIF pictures. In 2013, Kuaishou became a short-video social platform. By 2013, the app had reached 100 million daily users. By 2019, it had exceeded 200 million active daily users. In March 2017, Kuaishou closed a US$350 million investment round that was led by Tencent. In January 2018, Forbes estimated the company's valuation to be US$18 billion. In April 2018, Kuaishou's app was briefly banned from Chinese app stores after China Central Television (CCTV) reported on the platform popularizing videos of teenage mothers. In 2019, the company announced a partnership with the People's Daily, an official newspaper of the Central Committee of the Chinese Communist Party, to help it experiment with the use of artificial intelligence in news. In June 2020, following the start of the 2020–2021 China–India skirmishes, the Government of India banned Kwai along with 58 other apps, citing "data and privacy issues". In January 2021, Kuaishou announced it was planning an initial public offering (IPO) to raise approximately US$5 billion. Kuaishou's stock completed its first day of trading at $300 Hong Kong dollars (HKD) (US$38.70), more than doubling its initial offer price, and causing its market value to rise to over $1 trillion HKD (US$159 billion). In February 2021, Kuaishou made a debut on the Hong Kong Stock Exchange, with its shares soaring by 194% at the opening. The company subsequently encountered major setbacks as a result of heightened regulatory restrictions on Chinese internet firms, which contributed to its share price falling by nearly 80% from its post-IPO peak. By December 2021, Kuaishou announced a major reorganization, including the layoff of 30% of its staff, primarily targeting mid-level employees earning an annual salary of $157,000 or more. This restructuring aimed to cut costs and mitigate financial losses. In October 2022, state-owned Beijing Radio and Television Station took a minority ownership stake in Kuaishou. In April 2024, a Financial Times article citing current and former Kuaishou employees stated that the company has been running an ageist redundancy programme known internally as "Limestone", culling workers in their mid-30s. In June 2024, Kuaishou and the Sichuan international communication center launched a branch center in São Paulo, Brazil. In June 2024, Kuaishou released its diffusion transformer text-to-video model, Kling, which they claimed could generate two minutes of video at 30 frames per second and in 1080p resolution. The model has been compared to that of OpenAI's Sora text-to-video model. It is accessible to the public on Kuaishou's video editing app KwaiCut via signing up for a waitlist with a Chinese phone number. In December 2025, Kuaishou came under a cyberattack which led to a temporary influx of violent and pornographic content. == Popularity == As of 2019, it had a worldwide user base of over 200 million, leading the "Most Downloaded" lists of the Google Play and Apple App Store in eight countries, such as Brazil, where it was introduced in 2019. Its main short-video platform competitor was Douyin, which is known as TikTok outside China. Compared to Douyin, Kuaishou is more popular with older users living outside China's Tier 1 cities. Its initial popularity came from videos of Chinese rural life. The app is particularly well known for its "rustic" aesthetic and is popular among rural people. Kuaishou also relied more on e-commerce revenue than on advertising revenue compared to its main competitor. == Reception == Kwai (as the app is called outside of China) was banned in India in 2020 along with other short video apps like TikTok. Kuaishou then released the clone SnackVideo, which was subsequently also banned. The app is one of the most popular social media platforms in Brazil, where Kuaishou partnered with creators to make telenovela style content, and appeals to football fans by working with football teams CR Flamengo and Santos FC and sponsoring the tournament Copa América. Kwai was notable in Brazil for spreading information (and misinformation) about the COVID-19 vaccine and political misinformation. === Manjiao Wenhua === "Manjiao wenhua" (慢脚文化) is a sarcasm term on Chinese internet on the unethical or illegal contents on Kuaishou. State broadcaster China Central Television (CCTV) reported that many contents are about child pregnancy. "Dating, pregnancy, bearing a child...these are strictly prohibited in the real time by a minor, but these contents can easily shown to audiences here." In addition, many students from primary or secondary schools make a pose of smoking. Wang Zhenhui (王贞会) from CUPSL stated that these kinds of bad values will give negative effects to the minors.

Tokenization (data security)

Tokenization, when applied to data security, is the process of substituting a sensitive data element with a non-sensitive equivalent, referred to as a token, that has no intrinsic or exploitable meaning or value. The token is a reference (i.e. identifier) that maps back to the sensitive data through a tokenization system. The mapping from original data to a token uses methods that render tokens infeasible to reverse in the absence of the tokenization system, for example using tokens created from random numbers. A one-way cryptographic function is used to convert the original data into tokens, making it difficult to recreate the original data without obtaining entry to the tokenization system's resources. To deliver such services, the system maintains a vault database of tokens that are connected to the corresponding sensitive data. Protecting the system vault is vital to the system, and improved processes must be put in place to offer database integrity and physical security. The tokenization system must be secured and validated using security best practices applicable to sensitive data protection, secure storage, audit, authentication and authorization. The tokenization system provides data processing applications with the authority and interfaces to request tokens, or detokenize back to sensitive data. The security and risk reduction benefits of tokenization require that the tokenization system is logically isolated and segmented from data processing systems and applications that previously processed or stored sensitive data replaced by tokens. Only the tokenization system can tokenize data to create tokens, or detokenize back to redeem sensitive data under strict security controls. The token generation method must be proven to have the property that there is no feasible means through direct attack, cryptanalysis, side channel analysis, token mapping table exposure or brute force techniques to reverse tokens back to live data. Replacing live data with tokens in systems is intended to minimize exposure of sensitive data to those applications, stores, people and processes, reducing risk of compromise or accidental exposure and unauthorized access to sensitive data. Applications can operate using tokens instead of live data, with the exception of a small number of trusted applications explicitly permitted to detokenize when strictly necessary for an approved business purpose. Tokenization systems may be operated in-house within a secure isolated segment of the data center, or as a service from a secure service provider. Tokenization may be used to safeguard sensitive data involving, for example, bank accounts, financial statements, medical records, criminal records, driver's licenses, loan applications, stock trades, voter registrations, and other types of personally identifiable information (PII). Tokenization is often used in credit card processing. The PCI Council defines tokenization as "a process by which the primary account number (PAN) is replaced with a surrogate value called a token. A PAN may be linked to a reference number through the tokenization process. In this case, the merchant simply has to retain the token and a reliable third party controls the relationship and holds the PAN. The token may be created independently of the PAN, or the PAN can be used as part of the data input to the tokenization technique. The communication between the merchant and the third-party supplier must be secure to prevent an attacker from intercepting to gain the PAN and the token. De-tokenization is the reverse process of redeeming a token for its associated PAN value. The security of an individual token relies predominantly on the infeasibility of determining the original PAN knowing only the surrogate value". The choice of tokenization as an alternative to other techniques such as encryption will depend on varying regulatory requirements, interpretation, and acceptance by respective auditing or assessment entities. This is in addition to any technical, architectural or operational constraint that tokenization imposes in practical use. == Concepts and origins == The concept of tokenization, as adopted by the industry today, has existed since the first currency systems emerged centuries ago as a means to reduce risk in handling high value financial instruments by replacing them with surrogate equivalents. In the physical world, coin tokens have a long history of use replacing the financial instrument of minted coins and banknotes. In more recent history, subway tokens and casino chips found adoption for their respective systems to replace physical currency and cash handling risks such as theft. Exonumia and scrip are terms synonymous with such tokens. In the digital world, similar substitution techniques have been used since the 1970s as a means to isolate real data elements from exposure to other data systems. In databases for example, surrogate key values have been used since 1976 to isolate data associated with the internal mechanisms of databases and their external equivalents for a variety of uses in data processing. More recently, these concepts have been extended to consider this isolation tactic to provide a security mechanism for the purposes of data protection. In the payment card industry, tokenization is one means of protecting sensitive cardholder data in order to comply with industry standards and government regulations. Tokenization was applied to payment card data by Shift4 Corporation and released to the public during an industry Security Summit in Las Vegas, Nevada in 2005. The technology is meant to prevent the theft of the credit card information in storage. Shift4 defines tokenization as: "The concept of using a non-decryptable piece of data to represent, by reference, sensitive or secret data. In payment card industry (PCI) context, tokens are used to reference cardholder data that is managed in a tokenization system, application or off-site secure facility." To protect data over its full lifecycle, tokenization is often combined with end-to-end encryption to secure data in transit to the tokenization system or service, with a token replacing the original data on return. For example, to avoid the risks of malware stealing data from low-trust systems such as point of sale (POS) systems, as in the Target breach of 2013, cardholder data encryption must take place prior to card data entering the POS and not after. Encryption takes place within the confines of a security hardened and validated card reading device and data remains encrypted until received by the processing host, an approach pioneered by Heartland Payment Systems as a means to secure payment data from advanced threats, now widely adopted by industry payment processing companies and technology companies. The PCI Council has also specified end-to-end encryption (certified point-to-point encryption—P2PE) for various service implementations in various PCI Council Point-to-point Encryption documents. == The tokenization process == The process of tokenization consists of the following steps: The application sends the tokenization data and authentication information to the tokenization system. It is stopped if authentication fails and the data is delivered to an event management system. As a result, administrators can discover problems and effectively manage the system. The system moves on to the next phase if authentication is successful. Using one-way cryptographic or random generation techniques, a token is generated and kept in a highly secure data vault. The new token is provided to the application for further use, replacing the sensitive data for processing and storage. Tokenization systems share several components according to established standards. Token generation is the process of producing a token using any means, such as one-way nonreversible cryptographic functions (e.g., a hash function with a strong, secret salt) or assignment via a randomly generated number. Random number generator (RNG) techniques are often the best choice for generating token values. Token mapping – this is the process of assigning the created token value to its original value. To enable permitted look-ups of the original value using the token as the index, a secure cross-reference database must be constructed. Token data store – this is a central repository for the token mapping process that holds the original sensitive values and their related token values. Sensitive data and token values must be securely kept in an encrypted format. Management of cryptographic keys. Strong key management procedures are required for sensitive data encryption on token data stores. == Difference from encryption == Tokenization and "classic" encryption effectively protect data if implemented properly, and a computer security system may use both. While similar in certain regards, tokenization and classic encryption differ in a few key aspects. Both are cryptographic data security methods and the