AI Data Journalism

AI Data Journalism — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Mentimeter

    Mentimeter

    Mentimeter (or Menti for short) is a Swedish company based in Stockholm that develops and maintains an eponymous app used to create presentations with real-time feedback. == Foundation and background == Based in Stockholm, Sweden, the Mentimeter app was started by Swedish entrepreneur Johnny Warström and Niklas Ingvar as a response to unproductive meetings. The initial start-up budget was $500,000 raised by a group of prominent investors, including Per Appelgren in 2014, following the market's tendency to invest in Scandinavia. The app also focuses on online collaboration for the education sector, allowing students or public members to answer questions anonymously. The app enables users to share knowledge and real-time feedback on mobile devices with presentations, polls or brainstorming sessions in classes, meetings, gatherings, conferences and other group activities. == Achievements == By 2021, Mentimeter had over 270 million users and was one of Sweden's fastest-growing startups. The company also ranked #10 on 20 Fastest Growing 500 Startups Batch 16 Companies. It was ranked Stockholm's fastest growing company of the 2018 edition of the DI Gasell Award. Mentimeter has a freemium business model.

    Read more →
  • Format-preserving encryption

    Format-preserving encryption

    In cryptography, format-preserving encryption (FPE), refers to encrypting in such a way that the output (the ciphertext) is in the same format as the input (the plaintext). The meaning of "format" varies. Typically only finite sets of characters are used; numeric, alphabetic or alphanumeric. For example: Encrypting a 16-digit credit card number so that the ciphertext is another 16-digit number. Encrypting an English word so that the ciphertext is another English word. Encrypting an n-bit number so that the ciphertext is another n-bit number (this is the definition of an n-bit block cipher). For such finite domains, and for the purposes of the discussion below, the cipher is equivalent to a permutation of N integers {0, ... , N−1} where N is the size of the domain. == Motivation == === Restricted field lengths or formats === One motivation for using FPE comes from the problems associated with integrating encryption into existing applications, with well-defined data models. A typical example would be a credit card number, such as 1234567812345670 (16 bytes long, digits only). Adding encryption to such applications might be challenging if data models are to be changed, as it usually involves changing field length limits or data types. For example, output from a typical block cipher would turn credit card number into a hexadecimal (e.g.0x96a45cbcf9c2a9425cde9e274948cb67, 34 bytes, hexadecimal digits) or Base64 value (e.g. lqRcvPnCqUJc3p4nSUjLZw==, 24 bytes, alphanumeric and special characters), which will break any existing applications expecting the credit card number to be a 16-digit number. Apart from simple formatting problems, using AES-128-CBC, this credit card number might get encrypted to the hexadecimal value 0xde015724b081ea7003de4593d792fd8b695b39e095c98f3a220ff43522a2df02. In addition to the problems caused by creating invalid characters and increasing the size of the data, data encrypted using the CBC mode of an encryption algorithm also changes its value when it is decrypted and encrypted again. This happens because the random seed value that is used to initialize the encryption algorithm and is included as part of the encrypted value is different for each encryption operation. Because of this, it is impossible to use data that has been encrypted with the CBC mode as a unique key to identify a row in a database. FPE attempts to simplify the transition process by preserving the formatting and length of the original data, allowing a drop-in replacement of plaintext values with their ciphertexts in legacy applications. == Comparison to truly random permutations == Although a truly random permutation is the ideal FPE cipher, for large domains it is infeasible to pre-generate and remember a truly random permutation. So the problem of FPE is to generate a pseudorandom permutation from a secret key, in such a way that the computation time for a single value is small (ideally constant, but most importantly smaller than O(N)). == Comparison to block ciphers == An n-bit block cipher technically is a FPE on the set {0, ..., 2n-1}. If an FPE is needed on one of these standard sized sets (for example, n = 64 for DES and n = 128 for AES) a block cipher of the right size can be used. However, in typical usage, a block cipher is used in a mode of operation that allows it to encrypt arbitrarily long messages, and with an initialization vector as discussed above. In this mode, a block cipher is not an FPE. == Definition of security == In cryptographic literature (see most of the references below), the measure of a "good" FPE is whether an attacker can distinguish the FPE from a truly random permutation. Various types of attackers are postulated, depending on whether they have access to oracles or known ciphertext/plaintext pairs. == Algorithms == In most of the approaches listed here, a well-understood block cipher (such as AES) is used as a primitive to take the place of an ideal random function. This has the advantage that incorporation of a secret key into the algorithm is easy. Where AES is mentioned in the following discussion, any other good block cipher would work as well. === The FPE constructions of Black and Rogaway === Implementing FPE with security provably related to that of the underlying block cipher was first undertaken in a paper by cryptographers John Black and Phillip Rogaway, which described three ways to do this. They proved that each of these techniques is as secure as the block cipher that is used to construct it. This means that if the AES algorithm is used to create an FPE algorithm, then the resulting FPE algorithm is as secure as AES because an adversary capable of defeating the FPE algorithm can also defeat the AES algorithm. Therefore, if AES is secure, then the FPE algorithms constructed from it are also secure. In all of the following, E denotes the AES encryption operation that is used to construct an FPE algorithm and F denotes the FPE encryption operation. ==== FPE from a prefix cipher ==== One simple way to create an FPE algorithm on {0, ..., N-1} is to assign a pseudorandom weight to each integer, then sort by weight. The weights are defined by applying an existing block cipher to each integer. Black and Rogaway call this technique a "prefix cipher" and showed it was provably as good as the block cipher used. Thus, to create an FPE on the domain {0,1,2,3}, given a key K apply AES(K) to each integer, giving, for example, weight(0) = 0x56c644080098fc5570f2b329323dbf62 weight(1) = 0x08ee98c0d05e3dad3eb3d6236f23e7b7 weight(2) = 0x47d2e1bf72264fa01fb274465e56ba20 weight(3) = 0x077de40941c93774857961a8a772650d Sorting [0,1,2,3] by weight gives [3,1,2,0], so the cipher is F(0) = 3 F(1) = 1 F(2) = 2 F(3) = 0 This method is only useful for small values of N. For larger values, the size of the lookup table and the required number of encryptions to initialize the table gets too big to be practical. ==== FPE from cycle walking ==== If there is a set M of allowed values within the domain of a pseudorandom permutation P (for example P can be a block cipher like AES), an FPE algorithm can be created from the block cipher by repeatedly applying the block cipher until the result is one of the allowed values (within M). CycleWalkingFPE(x) { if P(x) is an element of M then return P(x) else return CycleWalkingFPE(P(x)) } The recursion is guaranteed to terminate. (Because P is one-to-one and the domain is finite, repeated application of P forms a cycle, so starting with a point in M the cycle will eventually terminate in M.) This has the advantage that the elements of M do not have to be mapped to a consecutive sequence {0,...,N-1} of integers. It has the disadvantage, when M is much smaller than P's domain, that too many iterations might be required for each operation. If P is a block cipher of a fixed size, such as AES, this is a severe restriction on the sizes of M for which this method is efficient. For example, an application may want to encrypt 100-bit values with AES in a way that creates another 100-bit value. With this technique, AES-128-ECB encryption can be applied until it reaches a value which has all of its 28 highest bits set to 0, which will take an average of 228 iterations to happen. ==== FPE from a Feistel network ==== It is also possible to make a FPE algorithm using a Feistel network. A Feistel network needs a source of pseudo-random values for the sub-keys for each round, and the output of the AES algorithm can be used as these pseudo-random values. When this is done, the resulting Feistel construction is good if enough rounds are used. One way to implement an FPE algorithm using AES and a Feistel network is to use as many bits of AES output as are needed to equal the length of the left or right halves of the Feistel network. If a 24-bit value is needed as a sub-key, for example, it is possible to use the lowest 24 bits of the output of AES for this value. This may not result in the output of the Feistel network preserving the format of the input, but it is possible to iterate the Feistel network in the same way that the cycle-walking technique does to ensure that format can be preserved. Because it is possible to adjust the size of the inputs to a Feistel network, it is possible to make it very likely that this iteration ends very quickly on average. In the case of credit card numbers, for example, there are 1015 possible 16-digit credit card numbers (accounting for the redundant check digit), and because the 1015 ≈ 249.8, using a 50-bit wide Feistel network along with cycle walking will create an FPE algorithm that encrypts fairly quickly on average. === The Thorp shuffle === A Thorp shuffle is like an idealized card-shuffle, or equivalently a maximally-unbalanced Feistel cipher where one side is a single bit. It is easier to prove security for unbalanced Feistel ciphers than for balanced ones. === VIL mode === For domain sizes that are a power of two, and an existing block cipher with a smaller bl

    Read more →
  • Dynamic knowledge repository

    Dynamic knowledge repository

    The dynamic knowledge repository (DKR) is a concept developed by Douglas C. Engelbart as a primary strategic focus for allowing humans to address complex problems. He has proposed that a DKR will enable us to develop a collective IQ greater than any individual's IQ. References and discussion of Engelbart's DKR concept are available at the Doug Engelbart Institute. == Definition == A knowledge repository is a computerized system that systematically captures, organizes and categorizes an organization's knowledge. The repository can be searched and data can be quickly retrieved. The effective knowledge repositories include factual, conceptual, procedural and meta-cognitive techniques. The key features of knowledge repositories include communication forums. A knowledge repository can take many forms to "contain" the knowledge it holds. A customer database is a knowledge repository of customer information and insights – or electronic explicit knowledge. A Library is a knowledge repository of books – physical explicit knowledge. A community of experts is a knowledge repository of tacit knowledge or experience. The nature of the repository only changes to contain/manage the type of knowledge it holds. A repository (as opposed to an archive) is designed to get knowledge out. It should therefore have some rules of structure, classification, taxonomy, record management, etc., to facilitate user engagement.

    Read more →
  • NRENum.net

    NRENum.net

    The NRENum.net service is an end-user ENUM service run by TERENA and the participating national research and education networking organisations (NRENs), primarily for academia. NRENum.net is considered as a complementary service and a valid alternative to the Golden ENUM tree. The domain nrenum.net is being populated in order to provide the infrastructure in DNS for storage of E.164 numbers. The NRENum.net service includes the operation of the Tier-0 root Domain Name Server(s) and the delegation of county codes to NRENum.net Registries. NRENum.net is a registered community trademark of TERENA. == Service description == E.164 Telephone Number Mapping (ENUM) is a standard protocol that is the result of work of the Internet Engineering Task Force's Telephone Number Mapping working group. ENUM translates a telephone number into a domain name. This allows users to continue to use the existing phone number formats they are familiar with, while allowing the call to be routed using DNS. This makes ENUM a quick, stable and cheap link between telecommunications systems and the Internet. RFC 3761 discusses the use of the Domain Name System for storage of E.164 numbers. More specifically, how DNS can be used for identifying available services connected to one E.164 number. The RIPE NCC provides DNS operations for e164.arpa (known as Golden ENUM tree) in accordance with the instructions from the Internet Architecture Board. The NRENum.net service is an end-user ENUM service run by TERENA and the participating NRENs primarily for academia. NRENum.net is considered as a complementary service and a valid alternative to the Golden ENUM tree. The domain nrenum.net is being populated in order to provide the infrastructure in DNS for storage of E.164 numbers. The NRENum.net service includes the operation of the Tier-0 root Domain Name Servers and the delegation of county codes to NRENum.net Registries. NRENum.net is a registered community trademark of TERENA. NRENum.net facilitates services such as Voice over IP and videoconferencing. NRENum.net tree refers to the tree structure where: Tier-0 root Domain Name Servers (technically one master and several secondary servers ensuring resilience) are run by the hosting organisations and coordinated by the NRENum.net Operations Team. Tier-1 Domain Name Servers are run by the NRENum.net (national or regional) Registries responsible for the country code(s) delegated. Tier-2 and lower DNS sub-delegations may be implemented, regulated by the national service policies. An NRENum.net Registry is an entity that is authorised by the NRENum.net Operations Team to operate the national or regional Tier-1 Domain Name Server and be responsible for the county code(s) delegated. In many countries there is a National Research and Education Networking organisation (NREN) that acts as the Registry of the country. An NRENum.net Registrar is responsible for the number/block registration in the Tier-1 DNS and a Number Validation Entity is responsible for the validation of the E.164 telephone numbers to be registered. The NREN may at the same time have the role of the NRENum.net Registry, Registrar and Validation Entity for the country code(s) delegated. A Registrant (end user) is an E.164 telephone number holder. Holders of E.164 numbers who want to be listed in the service must contact the appropriate NRENum.net Registrar. Number (block) delegation is the technical process of assigning country codes to national registries, or number blocks under country codes to end users. Number (block) registration is the technical process of configuring DNS and populating it with the appropriate ENUM records (i.e., adding NAPTR records to DNS) via registrars. The ITU-T strictly regulates the number structure of valid E.164 telephone numbers and assigns number blocks to national authorities (telecom regulators) or recently to global entities directly. The national authorities can further delegate the number ranges to local operators within the country or region. A virtual number has either a non-valid E.164 number structure (e.g., longer than 15 digits) or has a valid structure but is not assigned to any national authorities or operators. The number Validation Entity is responsible for checking the numbers to be registered to NRENum.net. == History == The idea for the NRENum.net service was conceived in 2006. NRENum.net became operational in August 2006, and was run by Bernie Höneisen, a staff member of SWITCH, and Kewin Stöckigt, a staff member of AARNet, as a private service, with technical support from SWITCH and the participants in the TERENA Task Force on Enhanced Communication Services (TF-ECS). When that task force completed its activities in 2008, TERENA agreed to take over the coordination of the NRENum.net service. By that time, nine NRENs had joined NRENum.net. The service continued to grow during the next years, and in March 2012 NRENum.net went global when RNP from Brazil joined the service as its 14th partificpant and the first outside Europe. In 2011, the participants decided to migrate the operation of the service's master Domain Name Server to NIIF and the operation of the two secondary DNSs to CARNET and SWITCH. In 2013, Internet2, AARNet and NORDUnet set up additional secondary Domain Name Servers for their regions, thereby completing the global distribution of DNS slaves and bringing the resilience of the NRENum.net infrastructure to a high level. == Governance == TERENA has established a lightweight global governance structure. The Global NRENum.net Governance Committee (GNGC) is the highest-level strategic body responsible for overall NRENum.net service definition, sustainability and long-term strategy. This includes formulating and recommending service governance principles and policies. Its members are nominated by the NRENum.net Registries in the various world regions, and are appointed by TERENA. The GNGC is composed of two members representing Europe, two representing the Asia-Pacific region, and two representing the Americas. The NRENum.net Operations Team is responsible for the day-to-day operations of the Tier-0 root DNSs and the handling of country code delegation requests. It may escalate technical or policy issues to the GNGC for discussion. TERENA is responsible for ensuring the correct and secure operations of the NRENum.net service performed by the NRENum.net Operations Team and governance by the GNGC. TERENA also supports the development of technical improvements to the NRENum.net service and promotes the deployment of NRENum.net worldwide. == Geographical deployment == Thirty-two county codes are delegated in the NRENum.net service. Below these are listed per world region. === Europe === === Asia-Pacific === === North America === +1 United States (Internet2) === Latin America === === Caribbean === === Africa === +262 Réunion, Mayotte (RENATER)

    Read more →
  • GeneTalk

    GeneTalk

    GeneTalk is a web-based platform, tool, and database for filtering, reduction and prioritization of human sequence variants from next-generation sequencing (NGS) data. GeneTalk allows editing annotation about sequence variants and build up a crowd sourced database with clinically relevant information for diagnostics of genetic disorders. GeneTalk allows searching for information about specific sequence variants and connects to experts on variants that are potentially disease-relevant. == Application to diagnostics == Users can upload NGS data in Variant Call Format (VCF) onto the GeneTalk server into their accounts. All entries of the file are preprocessed and shown in the integrated VCF viewer. Filtering tools are set by the user to reduce the number of clinically non-relevant variants. After filtering and prioritization users can interpret relevant variants by retrieving information (annotations) about variants from the GeneTalk database. The communication platform allow users to contact experts about specific variants, genes, or genetic disorders, to exchange knowledge and expertise. === Analysis procedure === Steps required to analyze VCF files Upload VCF file Edit pedigree and phenotype information for segregation filtering Filter VCF file by editing the filtering options View results and annotations Add annotations === Filtering tools === The following filtering options may be used to reduce the non-relevant sequence variants in VCF files. Functional – filter out variants that have effects on protein level Linkage – filter out variants that are on specified chromosomes Gene panel – filter variants by genes or gene panels, subscribe to publicly available gene panels or create own ones Frequency – show only variants with a genotype frequency lower than specified Inheritance – filter out variants by presumed mode of inheritance Annotation – show only variants with a score for medical relevance and scientific evidence == Communication platform and expert network == Users can share VCF files with colleagues and coworkers. The integrated mailing systems allows users to contact experts easily. Users can create annotations and comments and rate annotations regarding medical relevance and scientific evidence, that is helpful for the community of users for diagnosis of genetic disorders. Registered users provide information about their field of knowledge in their profile and can be contacted by other users. == Potential applications == Developing diagnostics Genetic analysis Capturing data generated by community Communication and exchange of knowledge and expertise

    Read more →
  • Tokenization (data security)

    Tokenization (data security)

    Tokenization, when applied to data security, is the process of substituting a sensitive data element with a non-sensitive equivalent, referred to as a token, that has no intrinsic or exploitable meaning or value. The token is a reference (i.e. identifier) that maps back to the sensitive data through a tokenization system. The mapping from original data to a token uses methods that render tokens infeasible to reverse in the absence of the tokenization system, for example using tokens created from random numbers. A one-way cryptographic function is used to convert the original data into tokens, making it difficult to recreate the original data without obtaining entry to the tokenization system's resources. To deliver such services, the system maintains a vault database of tokens that are connected to the corresponding sensitive data. Protecting the system vault is vital to the system, and improved processes must be put in place to offer database integrity and physical security. The tokenization system must be secured and validated using security best practices applicable to sensitive data protection, secure storage, audit, authentication and authorization. The tokenization system provides data processing applications with the authority and interfaces to request tokens, or detokenize back to sensitive data. The security and risk reduction benefits of tokenization require that the tokenization system is logically isolated and segmented from data processing systems and applications that previously processed or stored sensitive data replaced by tokens. Only the tokenization system can tokenize data to create tokens, or detokenize back to redeem sensitive data under strict security controls. The token generation method must be proven to have the property that there is no feasible means through direct attack, cryptanalysis, side channel analysis, token mapping table exposure or brute force techniques to reverse tokens back to live data. Replacing live data with tokens in systems is intended to minimize exposure of sensitive data to those applications, stores, people and processes, reducing risk of compromise or accidental exposure and unauthorized access to sensitive data. Applications can operate using tokens instead of live data, with the exception of a small number of trusted applications explicitly permitted to detokenize when strictly necessary for an approved business purpose. Tokenization systems may be operated in-house within a secure isolated segment of the data center, or as a service from a secure service provider. Tokenization may be used to safeguard sensitive data involving, for example, bank accounts, financial statements, medical records, criminal records, driver's licenses, loan applications, stock trades, voter registrations, and other types of personally identifiable information (PII). Tokenization is often used in credit card processing. The PCI Council defines tokenization as "a process by which the primary account number (PAN) is replaced with a surrogate value called a token. A PAN may be linked to a reference number through the tokenization process. In this case, the merchant simply has to retain the token and a reliable third party controls the relationship and holds the PAN. The token may be created independently of the PAN, or the PAN can be used as part of the data input to the tokenization technique. The communication between the merchant and the third-party supplier must be secure to prevent an attacker from intercepting to gain the PAN and the token. De-tokenization is the reverse process of redeeming a token for its associated PAN value. The security of an individual token relies predominantly on the infeasibility of determining the original PAN knowing only the surrogate value". The choice of tokenization as an alternative to other techniques such as encryption will depend on varying regulatory requirements, interpretation, and acceptance by respective auditing or assessment entities. This is in addition to any technical, architectural or operational constraint that tokenization imposes in practical use. == Concepts and origins == The concept of tokenization, as adopted by the industry today, has existed since the first currency systems emerged centuries ago as a means to reduce risk in handling high value financial instruments by replacing them with surrogate equivalents. In the physical world, coin tokens have a long history of use replacing the financial instrument of minted coins and banknotes. In more recent history, subway tokens and casino chips found adoption for their respective systems to replace physical currency and cash handling risks such as theft. Exonumia and scrip are terms synonymous with such tokens. In the digital world, similar substitution techniques have been used since the 1970s as a means to isolate real data elements from exposure to other data systems. In databases for example, surrogate key values have been used since 1976 to isolate data associated with the internal mechanisms of databases and their external equivalents for a variety of uses in data processing. More recently, these concepts have been extended to consider this isolation tactic to provide a security mechanism for the purposes of data protection. In the payment card industry, tokenization is one means of protecting sensitive cardholder data in order to comply with industry standards and government regulations. Tokenization was applied to payment card data by Shift4 Corporation and released to the public during an industry Security Summit in Las Vegas, Nevada in 2005. The technology is meant to prevent the theft of the credit card information in storage. Shift4 defines tokenization as: "The concept of using a non-decryptable piece of data to represent, by reference, sensitive or secret data. In payment card industry (PCI) context, tokens are used to reference cardholder data that is managed in a tokenization system, application or off-site secure facility." To protect data over its full lifecycle, tokenization is often combined with end-to-end encryption to secure data in transit to the tokenization system or service, with a token replacing the original data on return. For example, to avoid the risks of malware stealing data from low-trust systems such as point of sale (POS) systems, as in the Target breach of 2013, cardholder data encryption must take place prior to card data entering the POS and not after. Encryption takes place within the confines of a security hardened and validated card reading device and data remains encrypted until received by the processing host, an approach pioneered by Heartland Payment Systems as a means to secure payment data from advanced threats, now widely adopted by industry payment processing companies and technology companies. The PCI Council has also specified end-to-end encryption (certified point-to-point encryption—P2PE) for various service implementations in various PCI Council Point-to-point Encryption documents. == The tokenization process == The process of tokenization consists of the following steps: The application sends the tokenization data and authentication information to the tokenization system. It is stopped if authentication fails and the data is delivered to an event management system. As a result, administrators can discover problems and effectively manage the system. The system moves on to the next phase if authentication is successful. Using one-way cryptographic or random generation techniques, a token is generated and kept in a highly secure data vault. The new token is provided to the application for further use, replacing the sensitive data for processing and storage. Tokenization systems share several components according to established standards. Token generation is the process of producing a token using any means, such as one-way nonreversible cryptographic functions (e.g., a hash function with a strong, secret salt) or assignment via a randomly generated number. Random number generator (RNG) techniques are often the best choice for generating token values. Token mapping – this is the process of assigning the created token value to its original value. To enable permitted look-ups of the original value using the token as the index, a secure cross-reference database must be constructed. Token data store – this is a central repository for the token mapping process that holds the original sensitive values and their related token values. Sensitive data and token values must be securely kept in an encrypted format. Management of cryptographic keys. Strong key management procedures are required for sensitive data encryption on token data stores. == Difference from encryption == Tokenization and "classic" encryption effectively protect data if implemented properly, and a computer security system may use both. While similar in certain regards, tokenization and classic encryption differ in a few key aspects. Both are cryptographic data security methods and the

    Read more →
  • Tokenization (data security)

    Tokenization (data security)

    Tokenization, when applied to data security, is the process of substituting a sensitive data element with a non-sensitive equivalent, referred to as a token, that has no intrinsic or exploitable meaning or value. The token is a reference (i.e. identifier) that maps back to the sensitive data through a tokenization system. The mapping from original data to a token uses methods that render tokens infeasible to reverse in the absence of the tokenization system, for example using tokens created from random numbers. A one-way cryptographic function is used to convert the original data into tokens, making it difficult to recreate the original data without obtaining entry to the tokenization system's resources. To deliver such services, the system maintains a vault database of tokens that are connected to the corresponding sensitive data. Protecting the system vault is vital to the system, and improved processes must be put in place to offer database integrity and physical security. The tokenization system must be secured and validated using security best practices applicable to sensitive data protection, secure storage, audit, authentication and authorization. The tokenization system provides data processing applications with the authority and interfaces to request tokens, or detokenize back to sensitive data. The security and risk reduction benefits of tokenization require that the tokenization system is logically isolated and segmented from data processing systems and applications that previously processed or stored sensitive data replaced by tokens. Only the tokenization system can tokenize data to create tokens, or detokenize back to redeem sensitive data under strict security controls. The token generation method must be proven to have the property that there is no feasible means through direct attack, cryptanalysis, side channel analysis, token mapping table exposure or brute force techniques to reverse tokens back to live data. Replacing live data with tokens in systems is intended to minimize exposure of sensitive data to those applications, stores, people and processes, reducing risk of compromise or accidental exposure and unauthorized access to sensitive data. Applications can operate using tokens instead of live data, with the exception of a small number of trusted applications explicitly permitted to detokenize when strictly necessary for an approved business purpose. Tokenization systems may be operated in-house within a secure isolated segment of the data center, or as a service from a secure service provider. Tokenization may be used to safeguard sensitive data involving, for example, bank accounts, financial statements, medical records, criminal records, driver's licenses, loan applications, stock trades, voter registrations, and other types of personally identifiable information (PII). Tokenization is often used in credit card processing. The PCI Council defines tokenization as "a process by which the primary account number (PAN) is replaced with a surrogate value called a token. A PAN may be linked to a reference number through the tokenization process. In this case, the merchant simply has to retain the token and a reliable third party controls the relationship and holds the PAN. The token may be created independently of the PAN, or the PAN can be used as part of the data input to the tokenization technique. The communication between the merchant and the third-party supplier must be secure to prevent an attacker from intercepting to gain the PAN and the token. De-tokenization is the reverse process of redeeming a token for its associated PAN value. The security of an individual token relies predominantly on the infeasibility of determining the original PAN knowing only the surrogate value". The choice of tokenization as an alternative to other techniques such as encryption will depend on varying regulatory requirements, interpretation, and acceptance by respective auditing or assessment entities. This is in addition to any technical, architectural or operational constraint that tokenization imposes in practical use. == Concepts and origins == The concept of tokenization, as adopted by the industry today, has existed since the first currency systems emerged centuries ago as a means to reduce risk in handling high value financial instruments by replacing them with surrogate equivalents. In the physical world, coin tokens have a long history of use replacing the financial instrument of minted coins and banknotes. In more recent history, subway tokens and casino chips found adoption for their respective systems to replace physical currency and cash handling risks such as theft. Exonumia and scrip are terms synonymous with such tokens. In the digital world, similar substitution techniques have been used since the 1970s as a means to isolate real data elements from exposure to other data systems. In databases for example, surrogate key values have been used since 1976 to isolate data associated with the internal mechanisms of databases and their external equivalents for a variety of uses in data processing. More recently, these concepts have been extended to consider this isolation tactic to provide a security mechanism for the purposes of data protection. In the payment card industry, tokenization is one means of protecting sensitive cardholder data in order to comply with industry standards and government regulations. Tokenization was applied to payment card data by Shift4 Corporation and released to the public during an industry Security Summit in Las Vegas, Nevada in 2005. The technology is meant to prevent the theft of the credit card information in storage. Shift4 defines tokenization as: "The concept of using a non-decryptable piece of data to represent, by reference, sensitive or secret data. In payment card industry (PCI) context, tokens are used to reference cardholder data that is managed in a tokenization system, application or off-site secure facility." To protect data over its full lifecycle, tokenization is often combined with end-to-end encryption to secure data in transit to the tokenization system or service, with a token replacing the original data on return. For example, to avoid the risks of malware stealing data from low-trust systems such as point of sale (POS) systems, as in the Target breach of 2013, cardholder data encryption must take place prior to card data entering the POS and not after. Encryption takes place within the confines of a security hardened and validated card reading device and data remains encrypted until received by the processing host, an approach pioneered by Heartland Payment Systems as a means to secure payment data from advanced threats, now widely adopted by industry payment processing companies and technology companies. The PCI Council has also specified end-to-end encryption (certified point-to-point encryption—P2PE) for various service implementations in various PCI Council Point-to-point Encryption documents. == The tokenization process == The process of tokenization consists of the following steps: The application sends the tokenization data and authentication information to the tokenization system. It is stopped if authentication fails and the data is delivered to an event management system. As a result, administrators can discover problems and effectively manage the system. The system moves on to the next phase if authentication is successful. Using one-way cryptographic or random generation techniques, a token is generated and kept in a highly secure data vault. The new token is provided to the application for further use, replacing the sensitive data for processing and storage. Tokenization systems share several components according to established standards. Token generation is the process of producing a token using any means, such as one-way nonreversible cryptographic functions (e.g., a hash function with a strong, secret salt) or assignment via a randomly generated number. Random number generator (RNG) techniques are often the best choice for generating token values. Token mapping – this is the process of assigning the created token value to its original value. To enable permitted look-ups of the original value using the token as the index, a secure cross-reference database must be constructed. Token data store – this is a central repository for the token mapping process that holds the original sensitive values and their related token values. Sensitive data and token values must be securely kept in an encrypted format. Management of cryptographic keys. Strong key management procedures are required for sensitive data encryption on token data stores. == Difference from encryption == Tokenization and "classic" encryption effectively protect data if implemented properly, and a computer security system may use both. While similar in certain regards, tokenization and classic encryption differ in a few key aspects. Both are cryptographic data security methods and the

    Read more →
  • Social media optimization

    Social media optimization

    Social media optimization (SMO) is the use of online platforms to generate income or publicity to increase the awareness of a brand, event, product or service. Types of social media involved include RSS feeds, blogging sites, social bookmarking sites, social news websites, video sharing websites such as YouTube and social networking sites such as Facebook, Instagram, TikTok and X (Twitter). SMO is similar to search engine optimization (SEO) in that the goal is to drive web traffic, and draw attention to a company or creator. SMO's focal point is on gaining organic links to social media content. In contrast, SEO's core is about reaching the top of the search engine hierarchy. In general, social media optimization refers to optimizing a website and its content to encourage more users to use and share links to the website across social media and networking sites. SMO is used to strategically create online content ranging from well-written text to eye-catching digital photos or video clips that encourages and entices people to engage with a website. Users share this content, via its weblink, with social media contacts and friends. Common examples of social media engagement are "liking and commenting on posts, retweeting, embedding, sharing, and promoting content". Social media optimization is also an effective way of implementing online reputation management (ORM), meaning that if someone posts bad reviews of a business, an SMO strategy can ensure that the negative feedback is not the first link to come up in a list of search engine results. In the 2010s, with social media sites overtaking TV as a source for news for young people, news organizations have become increasingly reliant on social media platforms for generating web traffic. Publishers such as The Economist employ large social media teams to optimize their online posts and maximize traffic, while other major publishers now use advanced artificial intelligence (AI) technology to generate higher volumes of web traffic. == Relationship with search engine optimization == Social media optimization is an increasingly important factor in search engine optimization, which is the process of designing a website in a way so that it has as high a ranking as possible on search engines. Search engines are increasingly utilizing the recommendations of users of social networks such as Reddit, Facebook, Tumblr, Twitter, YouTube, LinkedIn, Pinterest and Instagram to rank pages in the search engine result pages. The implication is that when a webpage is shared or "liked" by a user on a social network, it counts as a "vote" for that webpage's quality. Thus, search engines can use such votes accordingly to properly ranked websites in search engine results pages. Furthermore, since it is more difficult to tip the scales or influence the search engines in this way, search engines are putting more stock into social search. This, coupled with increasingly personalized search based on interests and location, has significantly increased the importance of a social media presence in search engine optimization. Due to personalized search results, location-based social media presences on websites such as Yelp, Google Places, Foursquare, and Yahoo! Local have become increasingly important. While social media optimization is related to search engine marketing, it differs in several ways. Primarily, SMO focuses on driving web traffic from sources other than search engines, though improved search engine ranking is also a benefit of successful social media optimization. Further, SMO is helpful to target particular geographic regions in order to target and reach potential customers. This helps in lead generation (finding new customers) and contributes to high conversion rates (i.e., converting previously uninterested individuals into people who are interested in a brand or organization). == Relationship with viral marketing == Social media optimization is in many ways connected to the technique of viral marketing or "viral seeding" where word of mouth is created through the use of networking in social bookmarking, video and photo sharing websites. An effective SMO campaign can harness the power of viral marketing; for example, 80% of activity on Pinterest is generated through "repinning." Furthermore, by following social trends and utilizing alternative social networks, websites can retain existing followers while also attracting new ones. This allows businesses to build an online following and presence, all linking back to the company's website for increased traffic. For example, with an effective social bookmarking campaign, not only can website traffic be increased, but a site's rankings can also be increased. In a similar way, the engagement with blogs creates a similar result by sharing content through the use of RSS in the blogosphere. Social media optimization is considered an integral part of an online reputation management (ORM) or search engine reputation management (SERM) strategy for organizations or individuals who care about their online presence. SMO is one of six key influencers that affect Social Commerce Construct (SCC). Online activities such as consumers' evaluations and advices on products and services constitute part of what creates a Social Commerce Construct (SCC). Social media optimization is not limited to marketing and brand building. Increasingly, smart businesses are integrating social media participation as part of their knowledge management strategy (i.e., product/service development, recruiting, employee engagement and turnover, brand building, customer satisfaction and relations, business development and more). Additionally, social media optimization can be implemented to foster a community of the associated site, allowing for a healthy business-to-consumer (B2C) relationship. == Origins and implementation == According to technologist Danny Sullivan, the term "social media optimization" was first used and described by marketer Rohit Bhargava on his marketing blog in August 2006. In the same post, Bhargava established the five important rules of social media optimization. Bhargava believed that by following his rules, anyone could influence the levels of traffic and engagement on their site, increase popularity, and ensure that it ranks highly in search engine results. An additional 11 SMO rules have since been added to the list by other marketing contributors. The 16 rules of SMO, according to one source, are as follows: Increase your linkability Make tagging and bookmarking easy Reward inbound links Help your content to "travel" via sharing Encourage the mashup, where users are allowed to remix content Be a user resource, even if it doesn't help you (e.g., provide resources and information for users) Reward helpful and valuable users Participate (join the online conversation) Know how to target your audience Create new, quality content ("web scraping" of existing online content is ignored by good search engines) Be "real" in the tone and style of the posts Don't forget your roots; be humble Don't be afraid to experiment, innovate, try new things and "stay fresh" Develop an SMO strategy Choose your SMO tactics wisely Make SMO a key part of your marketing process and develop company best practices Bhargava's initial five rules were more specifically designed to SMO, while the list is now much broader and addresses everything that can be done across different social media platforms. According to author and CEO of TopRank Online Marketing, Lee Odden, a Social Media Strategy is also necessary to ensure optimization. This is a similar concept to Bhargava's list of rules for SMO. The Social Media Strategy may consider: Objectives e.g. creating brand awareness and using social media for external communications. Listening e.g. monitoring conversations relating to customers and business objectives. Audience e.g. finding out who the customers are, what they do, who they are influenced by, and what they frequently talk about. It is important to work out what customers want in exchange for their online engagement and attention. Participation and content e.g. establishing a presence and community online and engaging with users by sharing useful and interesting information. Measurement e.g. keeping a record of likes and comments on posts, and the number of sales to monitor growth and determine which tactics are most useful in optimizing social media. According to Lon Safko and David K. Brake in The Social Media Bible, it is also important to act like a publisher by maintaining an effective organizational strategy, to have an original concept and unique "edge" that differentiates one's approach from competitors, and to experiment with new ideas if things do not work the first time. If a business is blog-based, an effective method of SMO is using widgets that allow users to share content to their personal social media platforms. This will ultimately reach a wider target audience and drive mor

    Read more →
  • Knowledge integration

    Knowledge integration

    Knowledge integration is the process of synthesizing multiple knowledge models (or representations) into a common model (representation). Compared to information integration, which involves merging information having different schemas and representation models, knowledge integration focuses more on synthesizing the understanding of a given subject from different perspectives. For example, multiple interpretations are possible of a set of student grades, typically each from a certain perspective. An overall, integrated view and understanding of this information can be achieved if these interpretations can be put under a common model, say, a student performance index. The Web-based Inquiry Science Environment (WISE), from the University of California at Berkeley has been developed along the lines of knowledge integration theory. Knowledge integration has also been studied as the process of incorporating new information into a body of existing knowledge with an interdisciplinary approach. This process involves determining how the new information and the existing knowledge interact, how existing knowledge should be modified to accommodate the new information, and how the new information should be modified in light of the existing knowledge. A learning agent that actively investigates the consequences of new information can detect and exploit a variety of learning opportunities; e.g., to resolve knowledge conflicts and to fill knowledge gaps. By exploiting these learning opportunities the learning agent is able to learn beyond the explicit content of the new information. The machine learning program KI, developed by Murray and Porter at the University of Texas at Austin, was created to study the use of automated and semi-automated knowledge integration to assist knowledge engineers constructing a large knowledge base. A possible technique which can be used is semantic matching. More recently, a technique useful to minimize the effort in mapping validation and visualization has been presented which is based on Minimal Mappings. Minimal mappings are high quality mappings such that i) all the other mappings can be computed from them in time linear in the size of the input graphs, and ii) none of them can be dropped without losing property i). The University of Waterloo operates a Bachelor of Knowledge Integration undergraduate degree program as an academic major or minor. The program started in 2008.

    Read more →
  • Plaintext

    Plaintext

    In cryptography, plaintext usually means unencrypted information pending input into cryptographic algorithms, usually encryption algorithms. This usually refers to data that is transmitted or stored unencrypted. == Overview == With the advent of computing, the term plaintext expanded beyond human-readable documents to mean any data, including binary files, in a form that can be viewed or used without requiring a key or other decryption device. Information—a message, document, file, etc.—if to be communicated or stored in an unencrypted form is referred to as plaintext. Plaintext is used as input to an encryption algorithm; the output is usually termed ciphertext, particularly when the algorithm is a cipher. Codetext is less often used, and almost always only when the algorithm involved is actually a code. Some systems use multiple layers of encryption, with the output of one encryption algorithm becoming "plaintext" input for the next. == Secure handling == Insecure handling of plaintext can introduce weaknesses into a cryptosystem by letting an attacker bypass the cryptography altogether. Plaintext is vulnerable in use and in storage, whether in electronic or paper format. Physical security means the securing of information and its storage media from physical, attack—for instance by someone entering a building to access papers, storage media, or computers. Discarded material, if not disposed of securely, may be a security risk. Even shredded documents and erased magnetic media might be reconstructed with sufficient effort. If plaintext is stored in a computer file, the storage media, the computer and its components, and all backups must be secure. Sensitive data is sometimes processed on computers whose mass storage is removable, in which case physical security of the removed disk is vital. In the case of securing a computer, useful (as opposed to handwaving) security must be physical (e.g., against burglary, brazen removal under cover of supposed repair, installation of covert monitoring devices, etc.), as well as virtual (e.g., operating system modification, illicit network access, Trojan programs). Wide availability of keydrives, which can plug into most modern computers and store large quantities of data, poses another severe security headache. A spy (perhaps posing as a cleaning person) could easily conceal one, and even swallow it if necessary. Discarded computers, disk drives and media are also a potential source of plaintexts. Most operating systems do not actually erase anything— they simply mark the disk space occupied by a deleted file as 'available for use', and remove its entry from the file system directory. The information in a file deleted in this way remains fully present until overwritten at some later time when the operating system reuses the disk space. With even low-end computers commonly sold with many gigabytes of disk space and rising monthly, this 'later time' may be months later, or never. Even overwriting the portion of a disk surface occupied by a deleted file is insufficient in many cases. Peter Gutmann of the University of Auckland wrote a celebrated 1996 paper on the recovery of overwritten information from magnetic disks; areal storage densities have gotten much higher since then, so this sort of recovery is likely to be more difficult than it was when Gutmann wrote. Modern hard drives automatically remap failing sectors, moving data to good sectors. This process makes information on those failing, excluded sectors invisible to the file system and normal applications. Special software, however, can still extract information from them. Some government agencies (e.g., US NSA) require that personnel physically pulverize discarded disk drives and, in some cases, treat them with chemical corrosives. This practice is not widespread outside government, however. Garfinkel and Shelat (2003) analyzed 158 second-hand hard drives they acquired at garage sales and the like, and found that less than 10% had been sufficiently sanitized. The others contained a wide variety of readable personal and confidential information. See data remanence. Physical loss is a serious problem. The US State Department, Department of Defense, and the British Secret Service have all had laptops with secret information, including in plaintext, lost or stolen. Appropriate disk encryption techniques can safeguard data on misappropriated computers or media. On occasion, even when data on host systems is encrypted, media that personnel use to transfer data between systems is plaintext because of poorly designed data policy. For example, in October 2007, HM Revenue and Customs lost CDs that contained the unencrypted records of 25 million child benefit recipients in the United Kingdom. Modern cryptographic systems resist known plaintext or even chosen plaintext attacks, and so may not be entirely compromised when plaintext is lost or stolen. Older systems resisted the effects of plaintext data loss on security with less effective techniques—such as padding and Russian copulation to obscure information in plaintext that could be easily guessed.

    Read more →
  • Honey encryption

    Honey encryption

    Honey encryption is a type of data encryption that "produces a ciphertext, which, when decrypted with an incorrect key as guessed by the attacker, presents a plausible-looking yet incorrect plaintext." == Creators == Ari Juels and Thomas Ristenpart of the University of Wisconsin, the developers of the encryption system, presented a paper on honey encryption at the 2014 Eurocrypt cryptography conference. == Method of protection == A brute-force attack involves repeated decryption with random keys; this is equivalent to picking random plaintexts from the space of all possible plaintexts with a uniform distribution. This is effective because even though the attacker is equally likely to see any given plaintext, most plaintexts are extremely unlikely to be legitimate i.e. the distribution of legitimate plaintexts is non-uniform. Honey encryption defeats such attacks by first transforming the plaintext into a space such that the distribution of legitimate plaintexts is uniform. Thus an attacker guessing keys will see legitimate-looking plaintexts frequently and random-looking plaintexts infrequently. This makes it difficult to determine when the correct key has been guessed. In effect, honey encryption "[serves] up fake data in response to every incorrect guess of the password or encryption key." The security of honey encryption relies on the fact that the probability of an attacker judging a plaintext to be legitimate can be calculated (by the encrypting party) at the time of encryption. This makes honey encryption difficult to apply in certain applications e.g. where the space of plaintexts is very large or the distribution of plaintexts is unknown. It also means that honey encryption can be vulnerable to brute-force attacks if this probability is miscalculated. For example, it is vulnerable to known-plaintext attacks: if the attacker has a crib that a plaintext must match to be legitimate, they will be able to brute-force even Honey Encrypted data if the encryption did not take the crib into account. == Example == An encrypted credit card number is susceptible to brute-force attacks because not every string of digits is equally likely. The number of digits can range from 13 to 19, though 16 is the most common. Additionally, it must have a valid IIN and the last digit must match the checksum. An attacker can also take into account the popularity of various services: an IIN from MasterCard is probably more likely than an IIN from Diners Club Carte Blanche. Honey encryption can protect against these attacks by first mapping credit card numbers to a larger space where they match their likelihood of legitimacy. Numbers with invalid IINs and checksums are not mapped at all (i.e. have probability 0 of legitimacy). Numbers from large brands like MasterCard and Visa map to large regions of this space, while less popular brands map to smaller regions, etc. An attacker brute-forcing such an encryption scheme would only see legitimate-looking credit card numbers when they brute-force, and the numbers would appear with the frequency the attacker would expect from the real world. == Application == Juels and Ristenpart aim to use honey encryption to protect data stored on password manager services. Juels stated that "password managers are a tasty target for criminals," and worries that "if criminals get a hold of a large collection of encrypted password vaults they could probably unlock many of them without too much trouble." Hristo Bojinov, CEO and founder of Anfacto, noted that "Honey Encryption could help reduce their vulnerability. But he notes that not every type of data will be easy to protect this way. … Not all authentication or encryption system yield themselves to being honeyed."

    Read more →
  • Pepper (cryptography)

    Pepper (cryptography)

    In cryptography, a pepper is a secret added to an input such as a password during hashing with a cryptographic hash function. This value differs from a salt in that it is not stored alongside a password hash, but rather the pepper is kept separate using another meachanism, such as a Hardware Security Module. Note that the National Institute of Standards and Technology refers to this value as a secret key rather than a pepper. A pepper is similar in concept to a salt or an encryption key. It is like a salt in that it is a randomized value that is added to a password hash, and it is similar to an encryption key in that it should be kept secret. A pepper performs a comparable role to a salt or an encryption key, but while a salt is not secret (merely unique) and can be stored alongside the hashed output, a pepper is secret and must not be stored with the output. The hash and salt are usually stored in a database, but, if stored, a pepper must be stored separately to prevent it from being obtained by the attacker in case of a database breach. == History == The idea of a site- or service-specific salt (in addition to a per-user salt) has a long history, with Steven M. Bellovin proposing a local parameter in a Bugtraq post in 1995. In 1996 Udi Manber also described the advantages of such a scheme, terming it a secret salt. However, he suggested not storing the value of the secret salt, but instead rediscovering it by trial and error at password verification time. The term pepper has been used, by analogy to salt, but with a variety of meanings. For example, when discussing a challenge-response scheme, pepper has been used for a salt-like quantity, though not used for password storage; it has been used for a data transmission technique where a pepper must be guessed; and even as a part of jokes. The term pepper was proposed for a secret or local parameter stored separately from the password in a discussion of protecting passwords from rainbow table attacks. This usage did not immediately catch on: for example, Fred Wenzel added support to Django password hashing for storage based on a combination of bcrypt and HMAC with separately stored nonces, without using the term. Usage has since become more common. == Types == There are multiple different types of pepper: A shared secret that is common to all users. A randomly-selected number that must be re-discovered on every password input. These mechanisms could be combined with password salting, iterated hashing or even one another. == Shared-secret pepper == Bellovin and Webster suggest prepend a shared secret to the password before hashing, which allows easy use of existing hash functions. For example, consider two users to be added to a database. This table contains two combinations of username and password. The password is not saved, and the 8-byte (64-bit) 44534C70C6883DE2 pepper is saved in a safe place separate from the output values of the hash, in this case SHA256. Unlike the salt, the pepper does not provide protection to users who use the same password, but protects against dictionary attacks, unless the attacker has the pepper value available. Since the same pepper is not shared between different applications, an attacker is unable to reuse the hashes of one compromised database to another. A complete scheme for saving passwords may include both salt and pepper use. For example, it has been suggested to combine the pepper by encrypting salted password hashes, which allows rotation of the pepper. In the case of a shared-secret pepper, a single compromised password (via password reuse or other attack) along with a user's salt can lead to an attack to discover the pepper, rendering it ineffective. If an attacker knows a plaintext password and a user's salt, as well as the algorithm used to hash the password, then discovering the pepper can be a matter of brute forcing the values of the pepper. This is why NIST recommends the secret value be at least 112 bits, so that discovering it by exhaustive search is prohibitively expensive. The pepper must be generated anew for every application it is deployed in, otherwise a breach of one application would result in lowered security of another application. Without knowledge of the pepper, other passwords in the database will be far more difficult to extract from their hashed values, as the attacker would need to guess the password as well as the pepper. A pepper adds security to a database of salts and hashes because unless the attacker is able to obtain the pepper, cracking even a single hash is intractable, no matter how weak the original password. Even with a list of (salt, hash) pairs, an attacker must also guess the secret pepper in order to find the password which produces the hash. The NIST specification for a secret salt suggests using a Password-Based Key Derivation Function (PBKDF) with an approved Pseudorandom Function such as HMAC with SHA-3 as the hash function of the HMAC. The NIST recommendation is also to perform at least 1000 iterations of the PBKDF, and a further minimum 1000 iterations using the secret salt in place of the non-secret salt. == Randomly-selected pepper that must be re-discovered == The aim of this mechanism is to slow down password the password verification step, thus slowing attacks. The aim is similar increasing the iteration count on bcrypt or Argon2, but the mechanism is different. The secret salt or pepper must be rediscovered by the verifier or attacker each time by guessing. In this situation, the password hashing function is calculated using both the password and the pepper. At password storage time, the pepper is chosen randomly from a range between 1 and R, the hash output is calculated using the password and the pepper. The hash output is stored with the username. The pepper is then discarded. At password verification time, the verifier is provided with a username and password to verify. The originally calculated hash is retrieved for the given username, and then the hash of the password and each value between 1 and R is calculated. If any of these hash values match the stored password hash, the password is considered valid. Note, the possible values of the pepper should not be tested in a fixed order known to an attacker, otherwise a timing attack may reveal the pepper. If the password is correct, the correct pepper will be found in R/2 hash evaluations on average. If the password is incorrect, all R values must be tested before the password can be rejected.

    Read more →
  • Deep image prior

    Deep image prior

    Deep image prior is a type of convolutional neural network used to enhance a given image with no prior training data other than the image itself. A neural network is randomly initialized and used as prior to solve inverse problems such as noise reduction, super-resolution, and inpainting. Image statistics are captured by the structure of a convolutional image generator rather than by any previously learned capabilities. == Method == === Background === Inverse problems such as noise reduction, super-resolution, and inpainting can be formulated as the optimization task x ∗ = m i n x E ( x ; x 0 ) + R ( x ) {\displaystyle x^{}=min_{x}E(x;x_{0})+R(x)} , where x {\displaystyle x} is an image, x 0 {\displaystyle x_{0}} a corrupted representation of that image, E ( x ; x 0 ) {\displaystyle E(x;x_{0})} is a task-dependent data term, and R(x) is the regularizer. Deep neural networks learn a generator/decoder x = f θ ( z ) {\displaystyle x=f_{\theta }(z)} which maps a random code vector z {\displaystyle z} to an image x {\displaystyle x} . The image corruption method used to generate x 0 {\displaystyle x_{0}} is selected for the specific application. === Specifics === In this approach, the R ( x ) {\displaystyle R(x)} prior is replaced with the implicit prior captured by the neural network (where R ( x ) = 0 {\displaystyle R(x)=0} for images that can be produced by a deep neural networks and R ( x ) = + ∞ {\displaystyle R(x)=+\infty } otherwise). This yields the equation for the minimizer θ ∗ = a r g m i n θ E ( f θ ( z ) ; x 0 ) {\displaystyle \theta ^{}=argmin_{\theta }E(f_{\theta }(z);x_{0})} and the result of the optimization process x ∗ = f θ ∗ ( z ) {\displaystyle x^{}=f_{\theta ^{}}(z)} . The minimizer θ ∗ {\displaystyle \theta ^{}} (typically a gradient descent) starts from a randomly initialized parameters and descends into a local best result to yield the x ∗ {\displaystyle x^{}} restoration function. ==== Overfitting ==== A parameter θ may be used to recover any image, including its noise. However, the network is reluctant to pick up noise because it contains high impedance while useful signal offers low impedance. This results in the θ parameter approaching a good-looking local optimum so long as the number of iterations in the optimization process remains low enough not to overfit data. === Deep Neural Network Model === Typically, the deep neural network model for deep image prior uses a U-Net like model without the skip connections that connect the encoder blocks with the decoder blocks. The authors in their paper mention that "Our findings here (and in other similar comparisons) seem to suggest that having deeper architecture is beneficial, and that having skip-connections that work so well for recognition tasks (such as semantic segmentation) is highly detrimental." == Applications == === Denoising === The principle of denoising is to recover an image x {\displaystyle x} from a noisy observation x 0 {\displaystyle x_{0}} , where x 0 = x + ϵ {\displaystyle x_{0}=x+\epsilon } . The distribution ϵ {\displaystyle \epsilon } is sometimes known (e.g.: profiling sensor and photon noise) and may optionally be incorporated into the model, though this process works well in blind denoising. The quadratic energy function E ( x , x 0 ) = | | x − x 0 | | 2 {\displaystyle E(x,x_{0})=||x-x_{0}||^{2}} is used as the data term, plugging it into the equation for θ ∗ {\displaystyle \theta ^{}} yields the optimization problem m i n θ | | f θ ( z ) − x 0 | | 2 {\displaystyle min_{\theta }||f_{\theta }(z)-x_{0}||^{2}} . === Super-resolution === Super-resolution is used to generate a higher resolution version of image x. The data term is set to E ( x ; x 0 ) = | | d ( x ) − x 0 | | 2 {\displaystyle E(x;x_{0})=||d(x)-x_{0}||^{2}} where d(·) is a downsampling operator such as Lanczos that decimates the image by a factor t. === Inpainting === Inpainting is used to reconstruct a missing area in an image x 0 {\displaystyle x_{0}} . These missing pixels are defined as the binary mask m ∈ { 0 , 1 } H × V {\displaystyle m\in \{0,1\}^{H\times V}} . The data term is defined as E ( x ; x 0 ) = | | ( x − x 0 ) ⊙ m | | 2 {\displaystyle E(x;x_{0})=||(x-x_{0})\odot m||^{2}} (where ⊙ {\displaystyle \odot } is the Hadamard product). The intuition behind this is that the loss is computed only on the known pixels in the image, and the network is going to learn enough about the image to fill in unknown parts of the image even though the computed loss doesn't include those pixels. This strategy is used to remove image watermarks by treating the watermark as missing pixels in the image. === Flash–no-flash reconstruction === This approach may be extended to multiple images. A straightforward example mentioned by the author is the reconstruction of an image to obtain natural light and clarity from a flash–no-flash pair. Video reconstruction is possible but it requires optimizations to take into account the spatial differences. == Implementations == A reference implementation rewritten in Python 3.6 with the PyTorch 0.4.0 library was released by the author under the Apache 2.0 license: deep-image-prior A TensorFlow-based implementation written in Python 2 and released under the CC-SA 3.0 license: deep-image-prior-tensorflow A Keras-based implementation written in Python 2 and released under the GPLv3: machine_learning_denoising == Example == See Astronomy Picture of the Day (APOD) of 2024-02-18

    Read more →
  • Social media as a public utility

    Social media as a public utility

    Social media as a public utility is a theory postulating that social networking sites (such as Meta - ie:Facebook & Instagram or Alphabet - ie: YouTube & Google, but also independent sites such as Twitter, Tumblr, Snapchat etc.) are essential public services that should be regulated by the government, in a manner similar to how electric and phone utilities are typically government regulated. It is based on the notion that social media platforms have monopoly power and broad social influence. == Background == === Definitions === Social media is defined as "a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of User Generated Content." Furthermore, the New Zealand Government of Internal Affairs describes it as "a set of online technologies, sites, and practices which are used to share opinions, experiences and perspectives. Fundamentally it is about the conversation. In contrast with traditional media, the nature of social media is to be highly interactive." Moreover, the term social media is described as online tools that let people interact and communicate with each other. This has become a standard word for online cultural exchange and a dominant way for individuals to engage on the internet. By using social media individuals become more closely and strongly connected than ever before. The traditional definition of the term public utility is "an infrastructural necessity for the general public where the supply conditions are such that the public may not be provided with a reasonable service at reasonable prices because of monopoly in the area." Conventional public utilities include water, natural gas, and electricity. In order to secure the interests of the public, utilities are regulated. Public utilities can also be seen as natural monopolies implying that the highest degree of efficiency is accomplished under one operator in the marketplace. Public utility regulation for social media has been largely criticized because people believe it would produce undesirable and indirect effects. However, others say that truly effective government regulation would produce valuable results. Social media as a public utility is a crucial debate because utilities get regulated, so marking social media websites as utilities would require government regulation of various social media websites and platforms such as Facebook, Google, and Twitter. Applying the term public utility to social media implies that social media websites are public necessities, and, consequently, should be regulated by the government. While social media are not as essential for survival as traditional public utilities such as electricity, water, and natural gas, many people believe it has become vital for living in an interconnected world and without it, living a successful life would be difficult. Therefore, many people believe that social media has reached utility status and should be treated as a public utility. However, others believe that this is not true because social media are constantly revolutionizing and giving such platforms "utility status" would result in government regulation, which would consequently hinder innovation. Over the past decade many have debated and questioned whether or not "Internet service providers should be considered essential facilities or natural monopolies and regulated as public utilities." === Monopoly === A monopoly is defined as "a firm that is the only seller of a product or service having no close substitutes." A natural monopoly is when the entire demand within a relevant market can be satisfied at lowest cost by one firm rather than by two or more, and if such a market contains more than one firm then the firms will "quickly shake down to one through mergers or failures, or production will continue to consume more resources than necessary." In a monopoly competition is said to be short-lived, and in a natural monopoly it is said to produce inefficient results." Public utility companies can be regulated to prevent them from gaining monopolistic control. In November 2011 AT&T's proposal for merging with T-Mobile was rejected because it would have "diminished competition," and have led to the company having monopolistic power within the telephone industry. Such regulation is permitted because the telephone industry is a public utility. Similarly, Microsoft has also been prevented from taking various business actions that could result in the company gaining monopolistic power. If social media were a public utility then regulation of Google and Facebook would similarly dictate what they could and could not do. The possibility was raised in 2018 by U.S. Representative Steve King during a House Judiciary hearing on social media filtering practices. == Arguments == Advocates of this theory believe that social media websites already act like public utilities, and therefore regulation is needed. Additionally, advocates say that in the 21st century, using such websites are as necessary for communication as using traditional public utilities such as telephone, water, electricity, and natural gas are for other everyday uses. Specifically, advocates note that Google search should be treated as a public utility and needs to be regulated because it dominates the search engine market and no website can afford to ignore it. There is the position that a social media website such as Google "is a common carrier and should be regulated as such (Newman 2011)." These are reinforced by a perception that social media companies fail to properly maintain fair platforms for discourse. === Individual level === Advocates of regulating social media as a public utility believe that having an Internet presence using social media websites is imperative for individuals to adequately take part in the 21st century. Consequently, they argue that these sites are public utilities that need to be regulated to ensure that the constitutional rights of users are protected. For example, regulation may be needed to protect freedom of speech against risks such as Internet censorship and deplatforming. Social media affects people's behavior. For instance, it plays an important role in shaping its users' decisions and actions pertaining to health. This is demonstrated in a Pew Research Center research, which showed that 72 percent of American adults turned to social media for health information in 2011. Around 70 percent of people with chronic illnesses also use the platform to find cure, diagnoses, and other health answers. This development becomes a public issue as social media are likely to provide wrong medical information. Additionally, social media sites can also facilitate deleterious health behavior such as smoking, drug use, and harmful sexual behavior. === Business level === Advocates of social media as a public utility maintain that social media services dominate the Internet and are mainly owned by three or four companies that have unparalleled power to shape user interaction, and because of this power such businesses need to be regulated as public utilities. Zeynep Tufekci, University of North Carolina Chapel Hill, claims that services on the Internet such as Google, eBay, Facebook, Amazon.com, are all natural monopolies. She has stated that these services "benefit greatly from network externalities[,] which means that the more people on the service, the more useful it is for everyone," and thus it is difficult to replace the market leader. === Government level === Advocates of social media as a public utility believe that the government should impose restrictions on social media websites, such as Google, that are designed to benefit its rivals. Due to the recent substantial growth of social media websites such as Google, advocates claim that such a website "might need search neutrality regulation modeled after net neutrality regulation and that a Federal Search Commission might be needed to enforce such a regime." danah boyd expresses a future issue which the government may have to deal with in her research: Facebook is becoming an international social media website, specifically prevalent in Canada and Europe which are "two regions that love to regulate their utilities." Furthermore, recent books by New America Foundation Senior Fellow Rebecca MacKinnon and law professor Lori Andrews advise society to start considering Facebook and Google as nation-states or the "sovereigns of cyberspace." Overall, advocates of social media as a public utility believe that due to the immense popularity and necessity of social media websites, it is imperative that the Government imposes regulations in the same manner they do for electricity, water, and natural gas. == Counterarguments == Opponents of this theory say that social media websites should not be treated as public utilities because these platforms are changing every year, and because they are not essential services for s

    Read more →
  • Comparison of OLAP servers

    Comparison of OLAP servers

    The following tables compare general and technical information for a number of online analytical processing (OLAP) servers. Please see the individual products articles for further information. == General information == == Data storage modes == == APIs and query languages == APIs and query languages OLAP servers support. == OLAP distinctive features == A list of OLAP features that are not supported by all vendors. All vendors support features such as parent-child, multilevel hierarchy, drilldown. == System limits == == Security == == Operating systems == The OLAP servers can run on the following operating systems: Note (1):The server availability depends on Java Virtual Machine not on the operating system == Support information ==

    Read more →