A cryptosystem is a set of cryptographic algorithms that map ciphertexts and plaintexts to each other. == Private-key cryptosystems == Private-key cryptosystems use the same key for encryption and decryption. Caesar cipher Substitution cipher Enigma machine Data Encryption Standard Twofish Serpent Camellia Salsa20 ChaCha20 Blowfish CAST5 Kuznyechik RC4 3DES Skipjack Safer IDEA Advanced Encryption Standard, also known as AES and Rijndael. == Public-key cryptosystems == Public-key cryptosystems use a public key for encryption and a private key for decryption. Diffie–Hellman key exchange RSA encryption Rabin cryptosystem Schnorr signature ElGamal encryption Elliptic-curve cryptography Lattice-based cryptography McEliece cryptosystem Multivariate cryptography Isogeny-based cryptography
Connected-component labeling
Connected-component labeling (CCL), connected-component analysis (CCA), blob extraction, region labeling, blob discovery, or region extraction is an algorithmic application of graph theory, where subsets of connected components are uniquely labeled based on a given heuristic. Connected-component labeling is not to be confused with segmentation. Connected-component labeling is used in computer vision to detect connected regions in binary digital images, although color images and data with higher dimensionality can also be processed. When integrated into an image recognition system or human-computer interaction interface, connected component labeling can operate on a variety of information. Blob extraction is generally performed on the resulting binary image from a thresholding step, but it can be applicable to gray-scale and color images as well. Blobs may be counted, filtered, and tracked. Blob extraction is related to but distinct from blob detection. == Overview == A graph, containing vertices and connecting edges, is constructed from relevant input data. The vertices contain information required by the comparison heuristic, while the edges indicate connected 'neighbors'. An algorithm traverses the graph, labeling the vertices based on the connectivity and relative values of their neighbors. Connectivity is determined by the medium; image graphs, for example, can be 4-connected neighborhood or 8-connected neighborhood. Following the labeling stage, the graph may be partitioned into subsets, after which the original information can be recovered and processed . == Definition == The usage of the term connected-component labeling (CCL) and its definition is quite consistent in the academic literature, whereas connected-component analysis (CCA) varies both in terminology and in its definition of the problem. Rosenfeld et al. define connected components labeling as the “[c]reation of a labeled image in which the positions associated with the same connected component of the binary input image have a unique label.” Shapiro et al. define CCL as an operator whose “input is a binary image and [...] output is a symbolic image in which the label assigned to each pixel is an integer uniquely identifying the connected component to which that pixel belongs.” There is no consensus on the definition of CCA in the academic literature. It is often used interchangeably with CCL. A more extensive definition is given by Shapiro et al.: “Connected component analysis consists of connected component labeling of the black pixels followed by property measurement of the component regions and decision making.” The definition for connected-component analysis presented here is more general, taking the thoughts expressed in into account. == Algorithms == The algorithms discussed can be generalised to arbitrary dimensions, albeit with increased time and space complexity. === One component at a time === This is a fast and very simple method to implement and understand. It is based on graph traversal methods in graph theory. In short, once the first pixel of a connected component is found, all the connected pixels of that connected component are labelled before going onto the next pixel in the image. This algorithm is part of Vincent and Soille's watershed segmentation algorithm, other implementations also exist. In order to do that a linked list is formed that will keep the indexes of the pixels that are connected to each other, steps (2) and (3) below. The method of defining the linked list specifies the use of a depth or a breadth first search. For this particular application, there is no difference which strategy to use. The simplest kind of a last in first out queue implemented as a singly linked list will result in a depth first search strategy. It is assumed that the input image is a binary image, with pixels being either background or foreground and that the connected components in the foreground pixels are desired. The algorithm steps can be written as: Start from the first pixel in the image. Set current label to 1. Go to (2). If this pixel is a foreground pixel and it is not already labelled, give it the current label and add it as the first element in a queue, then go to (3). If it is a background pixel or it was already labelled, then repeat (2) for the next pixel in the image. Pop out an element from the queue, and look at its neighbours (based on any type of connectivity). If a neighbour is a foreground pixel and is not already labelled, give it the current label and add it to the queue. Repeat (3) until there are no more elements in the queue. Go to (2) for the next pixel in the image and increment current label by 1. Note that the pixels are labelled before being put into the queue. The queue will only keep a pixel to check its neighbours and add them to the queue if necessary. This algorithm only needs to check the neighbours of each foreground pixel once and doesn't check the neighbours of background pixels. The pseudocode is: algorithm OneComponentAtATime(data) input : imageData[xDim][yDim] initialization : label = 0, labelArray[xDim][yDim] = 0, statusArray[xDim][yDim] = false, queue1, queue2; for i = 0 to xDim do for j = 0 to yDim do if imageData[i][j] has not been processed do if imageData[i][j] is a foreground pixel do check its four neighbors(north, south, east, west) : if neighbor is not processed do if neighbor is a foreground pixel do add it to queue1 else update its status to processed end if labelArray[i][j] = label (give label) statusArray[i][j] = true (update status) while queue1 is not empty do For each pixel in the queue do : check its four neighbors if neighbor is not processed do if neighbor is a foreground pixel do add it to queue2 else update its status to processed end if give it the current label update its status to processed remove the current element from queue1 copy queue2 into queue1 end While increase the label end if else update its status to processed end if end if end if end for end for === Two-pass === Relatively simple to implement and understand, the two-pass algorithm, (also known as the Hoshen–Kopelman algorithm) iterates through 2-dimensional binary data. The algorithm makes two passes over the image: the first pass to assign temporary labels and record equivalences, and the second pass to replace each temporary label by the smallest label of its equivalence class. The input data can be modified in situ (which carries the risk of data corruption), or labeling information can be maintained in an additional data structure. Connectivity checks are carried out by checking neighbor pixels' labels (neighbor elements whose labels are not assigned yet are ignored), or say, the north-east, the north, the north-west and the west of the current pixel (assuming 8-connectivity). 4-connectivity uses only north and west neighbors of the current pixel. The following conditions are checked to determine the value of the label to be assigned to the current pixel (4-connectivity is assumed) Conditions to check: Does the pixel to the left (west) have the same value as the current pixel? Yes – We are in the same region. Assign the same label to the current pixel No – Check next condition Do both pixels to the north and west of the current pixel have the same value as the current pixel but not the same label? Yes – We know that the north and west pixels belong to the same region and must be merged. Assign the current pixel the minimum of the north and west labels, and record their equivalence relationship No – Check next condition Does the pixel to the left (west) have a different value and the one to the north the same value as the current pixel? Yes – Assign the label of the north pixel to the current pixel No – Check next condition Do the pixel's north and west neighbors have different pixel values than current pixel? Yes – Create a new label id and assign it to the current pixel The algorithm continues this way, and creates new region labels whenever necessary. The key to a fast algorithm, however, is how this merging is done. This algorithm uses the union-find data structure which provides excellent performance for keeping track of equivalence relationships. Union-find essentially stores labels which correspond to the same blob in a disjoint-set data structure, making it easy to remember the equivalence of two labels by the use of an interface method E.g.: findSet(l). findSet(l) returns the minimum label value that is equivalent to the function argument 'l'. Once the initial labeling and equivalence recording is completed, the second pass merely replaces each pixel label with its equivalent disjoint-set representative element. A faster-scanning algorithm for connected-region extraction is presented below. On the first pass: Iterate through each element of the data by column, then by row (Raster Scanning) If the element is not the background Get the neighboring elements of the current element If there are no neighbors, uniquely
Reverse data management
Reverse data management describes a branch and set of research questions in relational database theory that aim to reverse the common focus of standard data management. Instead of focusing on the "forward" transformation of an input databases (a set of relational tables) to an output table, which is the main focus of standard query evaluation, reverse data management reverses that focus and studies the possible input database transformations that would achieve a desired output. Usually the objective is to find an intervention (a deletion, addition, or change of tuples) of minimal size, in order to achieve a particular change in the output. The problem has been studied at least since the 1980s, but has received renewed attention due to an influential paper in the early 2000s that made a connection between provenance and view propagation. The term was coined in a VLDB 2011 vision paper. The problem has been receiving significant attention in recent years due to its connection to computational fairness. == Topics in reverse data management problems == Example topics in reverse data management include: Deletion propagation with source side-effects: Find a minimal number of tuples to delete in the database in order to delete a particular tuple in the output. Deletion propagation with view side-effects: Find a set of tuples to delete in the database in order to delete a particular tuple in the output, while removing the minimal number of other output tuples. Causal responsibility: Find a minimal number of tuples to delete in the database in order to make a particular input tuple counterfactual. This notion is inspired by the notions of actual cause and causal responsibility from the work of Halpern and Pearl. Resilience: Find a minimal number of tuples to delete in the database in order to make a Boolean query false. The complexity of this problem is identical to the problem of deletion propagation with source-side effects over a different database. Smallest witness problem: Find a minimal number of tuples to keep in the a database (or equivalently, delete a maximal number of tuples) while keeping a particular tuple in the output. Minimum repair: Given a database that violates certain integrity constraints, find a minimal number of tuples to delete in the database in order to fulfill all constraints (also called to "repair" the database).
Controlled vocabulary
A controlled vocabulary provides a way to organize knowledge for subsequent retrieval. Controlled vocabularies are used in subject indexing schemes, subject headings, thesauri, taxonomies and other knowledge organization systems. Controlled vocabulary schemes mandate the use of predefined, preferred terms that have been preselected by the designers of the schemes, in contrast to natural language vocabularies, which have no such restriction. == In library and information science == In library and information science, controlled vocabulary is a carefully selected list of words and phrases, which are used to tag units of information (document or work) so that they may be more easily retrieved by a search. Controlled vocabularies solve the problems of homographs, synonyms and polysemes by a bijection between concepts and preferred terms. In short, controlled vocabularies reduce unwanted ambiguity inherent in normal human languages where the same concept can be given different names and ensure consistency. For example, in the Library of Congress Subject Headings (a subject heading system that uses a controlled vocabulary), preferred terms—subject headings in this case—have to be chosen to handle choices between variant spellings of the same word (American versus British), choice among scientific and popular terms (cockroach versus Periplaneta americana), and choices between synonyms (automobile versus car), among other difficult issues. Choices of preferred terms are based on the principles of user warrant (what terms users are likely to use), literary warrant (what terms are generally used in the literature and documents), and structural warrant (terms chosen by considering the structure, scope of the controlled vocabulary). Controlled vocabularies also typically handle the problem of homographs with qualifiers. For example, the term pool has to be qualified to refer to either swimming pool or the game pool to ensure that each preferred term or heading refers to only one concept. === Types used in libraries === There are two main kinds of controlled vocabulary tools used in libraries: subject headings and thesauri. While the differences between the two are diminishing, there are still some minor differences: Historically, subject headings were designed to describe books in library catalogs by catalogers while thesauri were used by indexers to apply index terms to documents and articles. Subject headings tend to be broader in scope describing whole books, while thesauri tend to be more specialized covering very specific disciplines. Because of the card catalog system, subject headings tend to have terms that are in indirect order (though with the rise of automated systems this is being removed), while thesaurus terms are always in direct order. Subject headings tend to use more pre-coordination of terms such that the designer of the controlled vocabulary will combine various concepts together to form one preferred subject heading. (e.g., children and terrorism) while thesauri tend to use singular direct terms. Thesauri list not only equivalent terms but also narrower, broader terms and related terms among various preferred and non-preferred (but potentially synonymous) terms, while historically most subject headings did not. For example, the Library of Congress Subject Heading itself did not have much syndetic structure until 1943, and it was not until 1985 when it began to adopt the thesauri type term "Broader term" and "Narrow term". The terms are chosen and organized by trained professionals (including librarians and information scientists) who possess expertise in the subject area. Controlled vocabulary terms can accurately describe what a given document is actually about, even if the terms themselves do not occur within the document's text. Well known subject heading systems include the Library of Congress system, Medical Subject Headings (MeSH) created by the United States National Library of Medicine, and Sears. Well known thesauri include the Art and Architecture Thesaurus and the ERIC Thesaurus. When selecting terms for a controlled vocabulary, the designer has to consider the specificity of the term chosen, whether to use direct entry, inter consistency and stability of the language. Lastly the amount of pre-coordination (in which case the degree of enumeration versus synthesis becomes an issue) and post-coordination in the system is another important issue. Controlled vocabulary elements (terms/phrases) employed as tags, to aid in the content identification process of documents, or other information system entities (e.g. DBMS, Web Services) qualifies as metadata. == Indexing languages == There are three main types of indexing languages. Controlled indexing language – only approved terms can be used by the indexer to describe the document Natural language indexing language – any term from the document in question can be used to describe the document Free indexing language – any term (not only from the document) can be used to describe the document When indexing a document, the indexer also has to choose the level of indexing exhaustivity, the level of detail in which the document is described. For example, using low indexing exhaustivity, minor aspects of the work will not be described with index terms. In general the higher the indexing exhaustivity, the more terms indexed for each document. In recent years free text search as a means of access to documents has become popular. This involves using natural language indexing with an indexing exhaustively set to maximum (every word in the text is indexed). These methods have been compared in some studies, such as the 2007 article, "A Comparative Evaluation of Full-text, Concept-based, and Context-sensitive Search". === Advantages === Controlled vocabularies are often claimed to improve the accuracy of free text searching, such as to reduce irrelevant items in the retrieval list. These irrelevant items (false positives) are often caused by the inherent ambiguity of natural language. Take the English word football for example. Football is the name given to a number of different team sports. Worldwide the most popular of these team sports is association football, which also happens to be called soccer in several countries. The word football is also applied to rugby football (rugby union and rugby league), American football, Australian rules football, Gaelic football, and Canadian football. A search for football therefore will retrieve documents that are about several completely different sports. Controlled vocabulary solves this problem by tagging the documents in such a way that the ambiguities are eliminated. Compared to free text searching, the use of a controlled vocabulary can dramatically increase the performance of an information retrieval system, if performance is measured by precision (the percentage of documents in the retrieval list that are actually relevant to the search topic). In some cases controlled vocabulary can enhance recall as well, because unlike natural language schemes, once the correct preferred term is searched, there is no need to search for other terms that might be synonyms of that term. === Disadvantages === A controlled vocabulary search may lead to unsatisfactory recall, in that it will fail to retrieve some documents that are actually relevant to the search question. This is particularly problematic when the search question involves terms that are sufficiently tangential to the subject area such that the indexer might have decided to tag it using a different term (but the searcher might consider the same). Essentially, this can be avoided only by an experienced user of controlled vocabulary whose understanding of the vocabulary coincides with that of the indexer. Another possibility is that the article is just not tagged by the indexer because indexing exhaustivity is low. For example, an article might mention football as a secondary focus, and the indexer might decide not to tag it with "football" because it is not important enough compared to the main focus. But it turns out that for the searcher that article is relevant and hence recall fails. A free text search would automatically pick up that article regardless. On the other hand, free text searches have high exhaustivity (every word is searched) so although it has much lower precision, it has potential for high recall as long as the searcher overcome the problem of synonyms by entering every combination. Controlled vocabularies may become outdated rapidly in fast developing fields of knowledge, unless the preferred terms are updated regularly. Even in an ideal scenario, a controlled vocabulary is often less specific than the words of the text itself. Indexers trying to choose the appropriate index terms might misinterpret the author, while this precise problem is not a factor in a free text, as it uses the author's own words. The use of controlled vocabularies can be costly compared to free
Materials informatics
Materials informatics is a field of study that applies the principles of informatics and data science to materials science and engineering to improve the understanding, use, selection, development, and discovery of materials. The term "materials informatics" is frequently used interchangeably with "data science", "machine learning", and "artificial intelligence" by the community. This is an emerging field, with a goal to achieve high-speed and robust acquisition, management, analysis, and dissemination of diverse materials data with the goal of greatly reducing the time and risk required to develop, produce, and deploy new materials, which generally takes longer than 20 years. This field of endeavor is not limited to some traditional understandings of the relationship between materials and information. Some more narrow interpretations include combinatorial chemistry, process modeling, materials databases, materials data management, and product life cycle management. Materials informatics is at the convergence of these concepts, but also transcends them and has the potential to achieve greater insights and deeper understanding by applying lessons learned from data gathered on one type of material to others. By gathering appropriate meta data, the value of each individual data point can be greatly expanded. == Databases == Databases are essential for any informatics research and applications. In material informatics many databases exist containing both empirical data obtained experimentally, and theoretical data obtained computationally. Big data that can be used for machine learning is particularly difficult to obtain for experimental data due to the lack of a standard for reporting data and the variability in the experimental environment. This lack of big data has led to growing effort in developing machine learning techniques that utilize data extremely data sets. On the other hand, large uniform database of theoretical density functional theory (DFT) calculations exists. These databases have proven their utility in high-throughput material screening and discovery. Some common DFT databases and high throughput tools are listed below: Databases: MaterialsProject.org, MaterialsWeb.org (University of Florida) HT software: Pymatgen, MPInterfaces, Matminer == Beyond computational methods? == The concept of materials informatics is addressed by the Materials Research Society. For example, materials informatics was the theme of the December 2006 issue of the MRS Bulletin. The issue was guest-edited by John Rodgers of Innovative Materials, Inc., and David Cebon of Cambridge University, who described the "high payoff for developing methodologies that will accelerate the insertion of materials, thereby saving millions of investment dollars." The editors focused on the limited definition of materials informatics as primarily focused on computational methods to process and interpret data. They stated that "specialized informatics tools for data capture, management, analysis, and dissemination" and "advances in computing power, coupled with computational modeling and simulation and materials properties databases" will enable such accelerated insertion of materials. A broader definition of materials informatics goes beyond the use of computational methods to carry out the same experimentation, viewing materials informatics as a framework in which a measurement or computation is one step in an information-based learning process that uses the power of a collective to achieve greater efficiency in exploration. When properly organized, this framework crosses materials boundaries to uncover fundamental knowledge of the basis of physical, mechanical, and engineering properties. == Challenges == While there are many who believe in the future of informatics in the materials development and scaling process, many challenges remain. Hill, et al., write that "Today, the materials community faces serious challenges to bringing about this data-accelerated research paradigm, including diversity of research areas within materials, lack of data standards, and missing incentives for sharing, among others. Nonetheless, the landscape is rapidly changing in ways that should benefit the entire materials research enterprise." This remaining tension between traditional materials development methodologies and the use of more computationally, machine learning, and analytics approaches will likely exist for some time as the materials industry overcomes some of the cultural barriers necessary to fully embrace such new ways of thinking. == Analogy from Biology == The overarching goals of bioinformatics and systems biology may provide a useful analogy. Andrew Murray of Harvard University expresses the hope that such an approach "will save us from the era of "one graduate student, one gene, one PhD". Similarly, the goal of materials informatics is to save us from one graduate student, one alloy, one PhD. Such goals will require more sophisticated strategies and research paradigms than applying data-science methods to the same tasks set currently undertaken by students.
Anna Becker
Anna Becker is an Israeli researcher known in the field of artificial intelligence and computer science within the financial field. == Early life and education == Becker was born in Russia and immigrated to Israel at 16 after graduating from a school in Moscow. At 17, she began her studies at Technion – Israel Institute of Technology. During her master's degree in computer science, she taught first-year students of the same course, and at 27, Becker completed her PhD in Computer Science and Artificial Intelligence. == Career == While pursuing her PhD, Becker resolved an NP-complete approximation algorithm that had been unresolved for over twenty years. This made her a recognized scholar in the field. After completing her PhD, she developed an approximation technique by a factor of two. This technique is widely used today in operating systems, database systems, and VLSI chip designs. She then founded and sold Strategy Runner, a fintech software. After this, she founded EndoTech, an algorithmic trading platform based on artificial intelligence and machine learning. EndoTech's trading strategies have been operating in live cryptocurrency markets since 2017. The platform's BTC Alpha strategy has reported an average annual return of 163% on fixed capital over eight years of live operation, with a maximum drawdown of 14% and a trade accuracy rate of approximately 83%. In 2026, EndoTech entered a partnership with Bit1 Exchange to make its BTC Alpha and ETH Alpha copy trading strategies accessible to retail investors with no minimum deposit requirement, through a full-custody model in which user funds remain in their own exchange wallets at all times.As of 2023, Becker is working on Fianchetto Fund, an AI-based investing analysis platform. Becker has also co-authored a book on Bayesian networks, which has been published widely in the field of computer science and artificial intelligence.
Object storage
Object storage (also known as object-based storage or blob storage) is a computer data storage approach that manages data as "blobs" or "objects", as opposed to other storage architectures like file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks. Each object is typically associated with a variable amount of metadata, and a globally unique identifier. Object storage can be implemented at multiple levels, including the device level (object-storage device), the system level, and the interface level. In each case, object storage seeks to enable capabilities not addressed by other storage architectures, like interfaces that are directly programmable by the application, a namespace that can span multiple instances of physical hardware, and data-management functions like data replication and data distribution at object-level granularity. Object storage systems allow retention of massive amounts of unstructured data in which data is written once and read once (or many times). Object storage is used for purposes such as storing objects like videos and photos on Facebook, songs on Spotify, or files in online collaboration services, such as Dropbox. One of the limitations with object storage is that it is not intended for transactional data, as object storage was not designed to replace NAS file access and sharing; it does not support the locking and sharing mechanisms needed to maintain a single, accurately updated version of a file. == History == === Origins === Jim Starkey coined the term blob working at Digital Equipment Corporation to refer to opaque data entities. The terminology was adopted for Rdb/VMS. Blob is often humorously explained to be an abbreviation for binary large object. According to Starkey, this backronym arose when Terry McKiever, working in marketing at Apollo Computer felt that the term needed to be an abbreviation. McKiever began using the expansion basic large object. This was later eclipsed by the retroactive explanation of blobs as binary large objects. According to Starkey, "Blob don't stand for nothin'." Rejecting the acronym, he explained his motivation behind the coinage, saying, "A blob is the thing that ate Cincinnatti [sic], Cleveland, or whatever", referring to the 1958 science fiction film The Blob. In 1995, research led by Garth Gibson on Network-Attached Secure Disks first promoted the concept of splitting less common operations, like namespace manipulations, from common operations, like reads and writes, to optimize the performance and scale of both. In the same year, a Belgian company – FilePool – was established to build the basis for archiving functions. Object storage was proposed at Gibson's Carnegie Mellon University lab as a research project in 1996. Another key concept was abstracting the writes and reads of data to more flexible data containers (objects). Fine grained access control through object storage architecture was further described by one of the NASD team, Howard Gobioff, who later was one of the inventors of the Google File System. Other related work includes the Coda filesystem project at Carnegie Mellon, which started in 1987, and spawned the Lustre file system. There is also the OceanStore project at UC Berkeley, which started in 1999 and the Logistical Networking project at the University of Tennessee Knoxville, which started in 1998. In 1999, Gibson founded Panasas to commercialize the concepts developed by the NASD team. === Development === Seagate Technology played a central role in the development of object storage. According to the Storage Networking Industry Association (SNIA), "Object storage originated in the late 1990s: Seagate specifications from 1999 Introduced some of the first commands and how operating system effectively removed from consumption of the storage." A preliminary version of the "OBJECT BASED STORAGE DEVICES Command Set Proposal" dated 10/25/1999 was submitted by Seagate as edited by Seagate's Dave Anderson and was the product of work by the National Storage Industry Consortium (NSIC) including contributions by Carnegie Mellon University, Seagate, IBM, Quantum, and StorageTek. This paper was proposed to INCITS T-10 (International Committee for Information Technology Standards) with a goal to form a committee and design a specification based on the SCSI interface protocol. This defined objects as abstracted data, with unique identifiers and metadata, how objects related to file systems, along with many other innovative concepts. Anderson presented many of these ideas at the SNIA conference in October 1999. The presentation revealed an IP Agreement that had been signed in February 1997 between the original collaborators (with Seagate represented by Anderson and Chris Malakapalli) and covered the benefits of object storage, scalable computing, platform independence, and storage management. == Architecture == === Abstraction of storage === One of the design principles of object storage is to abstract some of the lower layers of storage away from the administrators and applications. Thus, data is exposed and managed as objects instead of blocks or (exclusively) files. Objects contain additional descriptive properties which can be used for better indexing or management. Administrators do not have to perform lower-level storage functions like constructing and managing logical volumes to utilize disk capacity or setting RAID levels to deal with disk failure. Object storage also allows the addressing and identification of individual objects by more than just file name and file path. Object storage adds a unique identifier within a bucket, or across the entire system, to support much larger namespaces and eliminate name collisions. === Inclusion of rich custom metadata within the object === Object storage explicitly separates file metadata from data to support additional capabilities. As opposed to fixed metadata in file systems (filename, creation date, type, etc.), object storage provides for full function, custom, object-level metadata in order to: Capture application-specific or user-specific information for better indexing purposes Support data-management policies (e.g. a policy to drive object movement from one storage tier to another) Centralize management of storage across many individual nodes and clusters Optimize metadata storage (e.g. encapsulated, database or key value storage) and caching/indexing (when authoritative metadata is encapsulated with the metadata inside the object) independently from the data storage (e.g. unstructured binary storage) Additionally, in some object-based file-system implementations: The file system clients only contact metadata servers once when the file is opened and then get content directly via object-storage servers (vs. block-based file systems which would require constant metadata access) Data objects can be configured on a per-file basis to allow adaptive stripe width, even across multiple object-storage servers, supporting optimizations in bandwidth and I/O Object-based storage devices (OSD) as well as some software implementations (e.g., DataCore Swarm) manage metadata and data at the storage device level: Instead of providing a block-oriented interface that reads and writes fixed sized blocks of data, data is organized into flexible-sized data containers, called objects Each object has both data (an uninterpreted sequence of bytes) and metadata (an extensible set of attributes describing the object); physically encapsulating both together benefits recoverability. The command interface includes commands to create and delete objects, write bytes and read bytes to and from individual objects, and to set and get attributes on objects Security mechanisms provide per-object and per-command access control === Programmatic data management === Object storage provides programmatic interfaces to allow applications to manipulate data. At the base level, this includes Create, read, update and delete (CRUD) functions for basic read, write and delete operations. Some object storage implementations go further, supporting additional functionality like object/file versioning, object replication, life-cycle management and movement of objects between different tiers and types of storage. Most API implementations are REST-based, allowing the use of many standard HTTP calls. == Implementation == === Cloud storage === The vast majority of cloud storage available in the market leverages an object-storage architecture. Some notable examples are Amazon S3, which debuted in March 2006, Microsoft Azure Blob Storage, IBM Cloud Object Storage, Rackspace Cloud Files (whose code was donated in 2010 to Openstack project and released as OpenStack Swift), and Google Cloud Storage released in May 2010. === Object-based file systems === Some distributed file systems use an object-based architecture, where file metadata is stored in metadata servers and file data is stored i