Convergent encryption

Convergent encryption

Convergent encryption, also known as content hash keying, is a cryptosystem that produces identical ciphertext from identical plaintext files. This has applications in cloud computing to remove duplicate files from storage without the provider having access to the encryption keys. The combination of deduplication and convergent encryption was described in a backup system patent filed by Stac Electronics in 1995. This combination has been used by Farsite, Permabit, Freenet, MojoNation, GNUnet, flud, and the Tahoe Least-Authority File Store. The system gained additional visibility in 2011 when cloud storage provider Bitcasa announced they were using convergent encryption to enable de-duplication of data in their cloud storage service. == Overview == The system computes a cryptographic hash of the plaintext in question. The system then encrypts the plaintext by using the hash as a key. Finally, the hash itself is stored, encrypted with a key chosen by the user. == Known Attacks == Convergent encryption is open to a "confirmation of a file attack" in which an attacker can effectively confirm whether a target possesses a certain file by encrypting an unencrypted, or plain-text, version and then simply comparing the output with files possessed by the target. This attack poses a problem for a user storing information that is non-unique, i.e. also either publicly available or already held by the adversary - for example: banned books or files that cause copyright infringement. An argument could be made that a confirmation of a file attack is rendered less effective by adding a unique piece of data such as a few random characters to the plain text before encryption; this causes the uploaded file to be unique and therefore results in a unique encrypted file. However, some implementations of convergent encryption where the plain-text is broken down into blocks based on file content, and each block then independently convergently encrypted may inadvertently defeat attempts at making the file unique by adding bytes at the beginning or end. Even more alarming than the confirmation attack is the "learn the remaining information attack" described by Drew Perttula in 2008. This type of attack applies to the encryption of files that are only slight variations of a public document. For example, if the defender encrypts a bank form including a ten digit bank account number, an attacker that is aware of generic bank form format may extract defender's bank account number by producing bank forms for all possible bank account numbers, encrypt them and then by comparing those encryptions with defender's encrypted file deduce the bank account number. Note that this attack can be extended to attack a large number of targets at once (all spelling variations of a target bank customer in the example above, or even all potential bank customers), and the presence of this problem extends to any type of form document: tax returns, financial documents, healthcare forms, employment forms, etc. Also note that there is no known method for decreasing the severity of this attack -- adding a few random bytes to files as they are stored does not help, since those bytes can likewise be attacked with the "learn the remaining information" approach. The only effective approach to mitigating this attack is to encrypt the contents of files with a non-convergent secret before storing (negating any benefit from convergent encryption), or to simply not use convergent encryption in the first place.

Mooky (app)

Mooky was a location-based social and dating application, designed to help its users to find the perfect match by providing a large scale of filters. Mooky was free of charge. The app made use of mobile devices' geolocation, a feature of smart phones and other devices which allows users to locate other users who are nearby. == History == Mooky was published on Google Play on April 17, 2016, by Mooky BV. The latest version of this application was version 1.0.6. == Overview == === How it works === Mooky used Facebook to build a user profile with photos and basic information, like the user's surname and age. From there on the user had to fill in their Mooky profile, which contains information about the user's height, posture, hair color, eye color, ethnicity and religion. After this the user could select its preferences to find matches nearby. === User verification === Mooky asked their users to take a selfie holding a piece of paper saying 'Mooky'. Mooky would then manually accept or decline the user verification.

Unrestricted algorithm

An unrestricted algorithm is an algorithm for the computation of a mathematical function that puts no restrictions on the range of the argument or on the precision that may be demanded in the result. The idea of such an algorithm was put forward by C. W. Clenshaw and F. W. J. Olver in a paper published in 1980. In the problem of developing algorithms for computing, as regards the values of a real-valued function of a real variable (e.g., g[x] in "restricted" algorithms), the error that can be tolerated in the result is specified in advance. An interval on the real line would also be specified for values when the values of a function are to be evaluated. Different algorithms may have to be applied for evaluating functions outside the interval. An unrestricted algorithm envisages a situation in which a user may stipulate the value of x and also the precision required in g(x) quite arbitrarily. The algorithm should then produce an acceptable result without failure.

Knowledge organization system

Knowledge organization system (KOS), concept system, or concept scheme is the generic term used in knowledge organization (KO) for the selection of concepts with an indication of selected semantic relations. Despite their differences in type, coverage, and application, all KOS aim to support the organization of knowledge and information to facilitate their management and retrieval. KOS vary in complexity from simple sorted lists to complex relational networks. They represent both structural and functional features, and serve to eliminate ambiguity, control synonyms, establish relationships, and present properties. From their origins in library and information science (LIS), KOS have been applied to other domains and disciplines within science and industry, although scholarly research and debate remain primarily within the KO field. Challenges of KOS include ambiguity of terminology, repercussions of biased systems, and potential obsolescence. KOS can be expressed in RDF and RDFS as per the Simple Knowledge Organization System (SKOS) recommendation by W3C, which aims to enable the sharing and linking of KOS via the Web. One of the largest collections of KOS is the BARTOC registry. == Types == While different schema of KOS have been proposed, most are generally arranged in terms of the complexity of their construction and maintenance. Some scholars argue that organizing KOS on a spectrum oversimplifies the shared characteristics among them, and may even result in a non-ideal structure being chosen. The following types are not exhaustive, and are often not mutually-exclusive in practice. === Term lists === Term lists are the least structured form of KOS. They include lists, glossaries, dictionaries, and synonym rings. Authority files and gazetteers may also be considered term lists, however other scholars categorize them and directories as "metadata-like models". Examples include the Union List of Artist Names name authority file and the GeoNames gazetteer. === Categorization and classification === KOS that emphasize specific (and often hierarchical) structures include subject headings, taxonomies, categorization schema, and classification schema & systems. Despite inconsistent use of the terms "categorization" and "classification" in some literature, categorization is generally loosely-assembled grouping schema and may include attributes that are not mutually exclusive (or having fuzzy boundaries), while classification is related to the arrangement of non-overlapping and mutually-exclusive classes. Classification schema may be universal (such as Dewey Decimal Classification and Information Coding Classification) or domain-specific (such as the National Library of Medicine Classification). === Relationship models === The types of KOS with greatest complexity and which utilize connections between concepts include thesauri, semantic networks, and ontologies. One of the most prominent examples of a semantic network is WordNet. === Others === Certain structures proposed to be considered types of KOS—but are not consistently included in schema—include folksonomies, topic maps, web directory structures, publication organization systems, and bibliometric maps. Some KOS organize other KOS themselves—for instance, PeriodO is a gazetteer of periodization categories. == Applications == Some early KOS were developed as a support system for abstracting and indexing services to be used by specially-trained searchers. With the growth of information digitization, usability became increasingly accessible, and more complex structures were developed. Prominent examples of KOS outside of LIS include organism taxonomy in biology, the periodic table of elements in chemistry, SIC and NAICS classification systems for industry & business, and AGROVOC agricultural controlled vocabulary. == Challenges == The study and design of KOS is an ongoing topic of discussion among KO scholars. === Terminology === [There is] a serious lack of vocabulary control in the literature on controlled vocabulary. Inconsistency of terminology within the study of KOS is a common issue. For instance, "ontology" is used for both a specific type of KOS as well as a generic term for any KOS. The terms "taxonomy", "classification", and "categorization" are also sometimes used interchangeably. === Bias === As knowledge can be historically and culturally biased, scholars have also discussed how KOS themselves can perpetuate harmful practices or stereotypes. For example, a number of concerns and criticisms about the classification of mental disorders in the Diagnostic and Statistical Manual of Mental Disorders have been raised, contributing to ongoing revisions. Ethical and intentional design approaches have been proposed for multi-perspective KOS in efforts to mitigate bias and other harmful practices. === Obsolescence === The possible obsolescence of the thesaurus and other simpler KOS has been the topic of debate, especially in the face of increasingly complex ontologies, the growing usage of "Google-like retrieval systems", and the move of KO theory and research away from LIS and toward computer science. Supporters of thesauri argue its continued usefulness for metadata enrichment, vocabulary mapping, and web services, as well as its usage in specific domains such as corporate intranets and digital image libraries.

Harold Borko

Harold Borko (1922-2012) was an American psychologist and researcher working primarily in the field of information science. == Biography == Borko was born in 1922 in New York City, New York. After serving in the US Army from 1942 to 1946 he obtained a BA in Psychology from the University of California, Los Angeles in 1948 and both his MA and PhD from the University of Southern California in Psychology in 1952. He returned to the army as a psychologist until 1956 after which he began a career working in and teaching information science. He died in California in 2012. == Information Science Career == After leaving the military Borko began working at the RAND Corporation as a Systems Training Specialist in 1956 and moved to the Systems Development Corporation a year later working in the Language Processing and Retrieval department. Alongside this work he taught Psychology at USC from 1957-65 and then moved into teaching Library Science at UCLA from 1965. In 1967 Borko left his role at the Systems Development Corporation and continued as a full-time professor at UCLA until his retirement in 1993.. From 1961 to 1995 Borko authored and co-authored over 100 articles on new developments in the field as well as the historiography of information science. He served as an editor of the Journal of Educational Data Processing from 1963-1975 and as President of the American Society for Information Science in 1966 == Partial list of works == Borko, H. (1962, May). The construction of an empirically based mathematically derived classification system. In Proceedings of the May 1-3, 1962, spring joint computer conference (pp. 279-289). Borko, H., & Bernick, M. (1963). Automatic document classification. Journal of the ACM (JACM), 10(2), 151-162. Borko, H. (1964). The Storage and Retrieval of Educational Information. Journal of Teacher Education, 15(4), 449-452. Borko, H. (1964). Measuring the reliability of subject classification by men and machines. American Documentation, 15(4), 268-273. Borko, H. (1965). The conceptual foundations of information systems. Borko, H. (1968), Information science: What is it?†. Amer. Doc., 19: 3-5. https://doi.org/10.1002/asi.5090190103 Borko, H. (1970). Experiments in book indexing by computer. Information storage and retrieval, 6(1), 5-16. Borko, H. (1985). An introduction to computer-based library systems (Lucy A. Tedd). Education for Information, 3(1), 61.

Color quantization

In computer graphics, color quantization or color image quantization is quantization applied to color spaces; it is a process that reduces the number of distinct colors used in an image, usually with the intention that the new image should be as visually similar as possible to the original image. Computer algorithms to perform color quantization on bitmaps have been studied since the 1970s. Color quantization is critical for displaying images with many colors on devices that can only display a limited number of colors, usually due to memory limitations, and enables efficient compression of certain types of images. The name "color quantization" is primarily used in computer graphics research literature; in applications, terms such as optimized palette generation, optimal palette generation, or decreasing color depth are used. Some of these are misleading, as the palettes generated by standard algorithms are not necessarily the best possible. == Algorithms == Most standard techniques treat color quantization as a problem of clustering points in three-dimensional space, where the points represent colors found in the original image and the three axes represent the three color channels. Almost any three-dimensional clustering algorithm can be applied to color quantization, and vice versa. After the clusters are located, typically the points in each cluster are averaged to obtain the representative color that all colors in that cluster are mapped to. The three color channels are usually red, green, and blue, but another popular choice is the Lab color space, in which Euclidean distance is more consistent with perceptual difference. The most popular algorithm by far for color quantization, invented by Paul Heckbert in 1979, is the median cut algorithm. Many variations on this scheme are in use. Before this time, most color quantization was done using the population algorithm or population method, which essentially constructs a histogram of equal-sized ranges and assigns colors to the ranges containing the most points. A more modern popular method is clustering using octrees, first conceived by Gervautz and Purgathofer and improved by Xerox PARC researcher Dan Bloomberg. If the palette is fixed, as is often the case in real-time color quantization systems such as those used in operating systems, color quantization is usually done using the "straight-line distance" or "nearest color" algorithm, which simply takes each color in the original image and finds the closest palette entry, where distance is determined by the distance between the two corresponding points in three-dimensional space. In other words, if the colors are ( r 1 , g 1 , b 1 ) {\displaystyle (r_{1},g_{1},b_{1})} and ( r 2 , g 2 , b 2 ) {\displaystyle (r_{2},g_{2},b_{2})} , we want to minimize the Euclidean distance: ( r 1 − r 2 ) 2 + ( g 1 − g 2 ) 2 + ( b 1 − b 2 ) 2 . {\displaystyle {\sqrt {(r_{1}-r_{2})^{2}+(g_{1}-g_{2})^{2}+(b_{1}-b_{2})^{2}}}.} This effectively decomposes the color cube into a Voronoi diagram, where the palette entries are the points and a cell contains all colors mapping to a single palette entry. There are efficient algorithms from computational geometry for computing Voronoi diagrams and determining which region a given point falls in; in practice, indexed palettes are so small that these are usually overkill. Color quantization is frequently combined with dithering, which can eliminate unpleasant artifacts such as banding that appear when quantizing smooth gradients and give the appearance of a larger number of colors. Some modern schemes for color quantization attempt to combine palette selection with dithering in one stage, rather than perform them independently. A number of other much less frequently used methods have been invented that use entirely different approaches. The Local K-means algorithm, conceived by Oleg Verevka in 1995, is designed for use in windowing systems where a core set of "reserved colors" is fixed for use by the system and many images with different color schemes might be displayed simultaneously. It is a post-clustering scheme that makes an initial guess at the palette and then iteratively refines it. In the early days of color quantization, the k-means clustering algorithm was deemed unsuitable because of its high computational requirements and sensitivity to initialization. In 2011, M. Emre Celebi reinvestigated the performance of k-means as a color quantizer. He demonstrated that an efficient implementation of k-means outperforms a large number of color quantization methods. The high-quality but slow NeuQuant algorithm reduces images to 256 colors by training a Kohonen neural network "which self-organises through learning to match the distribution of colours in an input image. Taking the position in RGB-space of each neuron gives a high-quality colour map in which adjacent colours are similar." It is particularly advantageous for images with gradients. Finally, one of the newer methods is spatial color quantization, conceived by Puzicha, Held, Ketterer, Buhmann, and Fellner of the University of Bonn, which combines dithering with palette generation and a simplified model of human perception to produce visually impressive results even for very small numbers of colors. It does not treat palette selection strictly as a clustering problem, in that the colors of nearby pixels in the original image also affect the color of a pixel. See sample images. == History and applications == In the early days of PCs, it was common for video adapters to support only 2, 4, 16, or (eventually) 256 colors due to video memory limitations; they preferred to dedicate the video memory to having more pixels (higher resolution) rather than more colors. Color quantization helped to justify this tradeoff by making it possible to display many high color images in 16- and 256-color modes with limited visual degradation. Many operating systems automatically perform quantization and dithering when viewing high color images in a 256 color video mode, which was important when video devices limited to 256 color modes were dominant. Modern computers can now display millions of colors at once, far more than can be distinguished by the human eye, limiting this application primarily to mobile devices and legacy hardware. Nowadays, color quantization is mainly used in GIF and PNG images. GIF, for a long time the most popular lossless and animated bitmap format on the World Wide Web, only supports up to 256 colors, necessitating quantization for many images. Some early web browsers constrained images to use a specific palette known as the web colors, leading to severe degradation in quality compared to optimized palettes. PNG images support 24-bit color, but can often be made much smaller in filesize without much visual degradation by application of color quantization, since PNG files use fewer bits per pixel for palettized images. The infinite number of colors available through the lens of a camera is impossible to display on a computer screen; thus converting any photograph to a digital representation necessarily involves some quantization. Practically speaking, 24-bit color is sufficiently rich to represent almost all colors perceivable by humans with sufficiently small error as to be visually identical (if presented faithfully), within the available color space. However, the digitization of color, either in a camera detector or on a screen, necessarily limits the available color space. Consequently there are many colors that may be impossible to reproduce, regardless of how many bits are used to represent the color. For example, it is impossible in typical RGB color spaces (common on computer monitors) to reproduce the full range of green colors that the human eye is capable of perceiving. With the few colors available on early computers, different quantization algorithms produced very different-looking output images. As a result, a lot of time was spent on writing sophisticated algorithms to be more lifelike. === Quantization for image compression === Many image file formats support indexed color. A whole-image palette typically selects 256 "representative" colors for the entire image, where each pixel references any one of the colors in the palette, as in the GIF and PNG file formats. A block palette typically selects 2 or 4 colors for each block of 4x4 pixels, used in BTC, CCC, S2TC, and S3TC. === Editor support === Many bitmap graphics editors contain built-in support for color quantization, and will automatically perform it when converting an image with many colors to an image format with fewer colors. Most of these implementations allow the user to set exactly the number of desired colors. Examples of such support include: Photoshop's Mode→Indexed Color function supplies a number of quantization algorithms ranging from the fixed Windows system and Web palettes to the proprietary Local and Global algorithms for generating palettes suited to a particu

Irish logarithm

The Irish logarithm was a system of number manipulation invented by Percy Ludgate for machine multiplication. The system used a combination of mechanical cams as lookup tables and mechanical addition to sum pseudo-logarithmic indices to produce partial products, which were then added to produce results. The technique is similar to Zech logarithms (also known as Jacobi logarithms), but uses a system of indices original to Ludgate. == Concept == Ludgate's algorithm compresses the multiplication of two single decimal numbers into two table lookups (to convert the digits into indices), the addition of the two indices to create a new index which is input to a second lookup table that generates the output product. Because both lookup tables are one-dimensional, and the addition of linear movements is simple to implement mechanically, this allows a less complex mechanism than would be needed to implement a two-dimensional 10×10 multiplication lookup table. Ludgate stated that he deliberately chose the values in his tables to be as small as he could make them; given this, Ludgate's tables can be simply constructed from first principles, either via pen-and-paper methods, or a systematic search using only a few tens of lines of program code. They do not correspond to either Zech logarithms, Remak indexes or Korn indexes. == Pseudocode == The following is an implementation of Ludgate's Irish logarithm algorithm in the Python programming language: Table 1 is taken from Ludgate's original paper; given the first table, the contents of Table 2 can be trivially derived from Table 1 and the definition of the algorithm. Note since that the last third of the second table is entirely zeros, this could be exploited to further simplify a mechanical implementation of the algorithm.