AI Face Judge

AI Face Judge — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Language model benchmark

    Language model benchmark

    A language model benchmark is a standardized test designed to evaluate the performance of language models on various natural language processing tasks. These tests are intended for comparing different models' capabilities in areas such as language understanding, generation, and reasoning. Benchmarks generally consist of a dataset and corresponding evaluation metrics. The dataset provides text samples and annotations, while the metrics measure a model's performance on tasks like answering questions, text classification, and machine translation. These benchmarks are developed and maintained by academic institutions, research organizations, and industry players to track progress in the field. In addition to accuracy, the metrics can include throughput, energy efficiency, bias, trust, and sustainability. == Overview == === Types === Benchmarks may be described by the following adjectives, not mutually exclusive: Classical: These tasks are studied in natural language processing, even before the advent of deep learning. Examples include the Penn Treebank for testing syntactic and semantic parsing, as well as bilingual translation benchmarked by BLEU scores. Question answering: These tasks have a text question and a text answer, often multiple-choice. They can be open-book or closed-book. Open-book QA resembles reading comprehension questions, with relevant passages included as annotation in the question, in which the answer appears. Closed-book QA includes no relevant passages. Closed-book QA is also called open-domain question-answering. Before the era of large language models, open-book QA was more common, and understood as testing information retrieval methods. Closed-book QA became common since GPT-2 as a method to measure knowledge stored within model parameters. Omnibus: An omnibus benchmark combines many benchmarks, often previously published. It is intended as an all-in-one benchmarking solution. Reasoning: These tasks are usually in the question-answering format, but are intended to be more difficult than standard question answering. Multimodal: These tasks require processing not only text, but also other modalities, such as images and sound. Examples include OCR and transcription. Agency: These tasks are for a language-model–based software agent that operates a computer for a user, such as editing images, browsing the web, etc. Adversarial: A benchmark is "adversarial" if the items in the benchmark are picked specifically so that certain models do badly on them. Adversarial benchmarks are often constructed after state of the art (SOTA) models have saturated (achieved 100% performance) a benchmark, to renew the benchmark. A benchmark is "adversarial" only at a certain moment in time, since what is adversarial may cease to be adversarial as newer SOTA models appear. Public/Private: A benchmark might be partly or entirely private, meaning that some or all of the questions are not publicly available. The idea is that if a question is publicly available, then it might be used for training, which would be "training on the test set" and invalidate the result of the benchmark. Usually, only the guardians of the benchmark have access to the private subsets, and to score a model on such a benchmark, one must send the model weights, or provide API access, to the guardians. The boundary between a benchmark and a dataset is not sharp. Generally, a dataset contains three "splits": training, test, and validation. Both the test and validation splits are essentially benchmarks. In general, a benchmark is distinguished from a test/validation dataset in that a benchmark is typically intended to be used to measure the performance of many different models that are not trained specifically for doing well on the benchmark, while a test/validation set is intended to be used to measure the performance of models trained specifically on the corresponding training set. In other words, a benchmark may be thought of as a test/validation set without a corresponding training set. Conversely, certain benchmarks may be used as a training set, such as the English Gigaword or the One Billion Word Benchmark, which in modern language is just the negative log-likelihood loss on a pretraining set with 1 billion words. Indeed, the distinction between benchmark and dataset in language models became sharper after the rise of the pretraining paradigm, whereby a model is first trained on massive, unlabeled datasets to learn general language patterns, syntax, and knowledge (pretraining), and the base model is then adapted to specific, downstream tasks using smaller, labeled datasets (fine-tuning). === Lifecycle === Generally, the life cycle of a benchmark consists of the following steps: Inception: A benchmark is published. It can be simply given as a demonstration of the power of a new model (implicitly) that others then picked up as a benchmark, or as a benchmark that others are encouraged to use (explicitly). Growth: More papers and models use the benchmark, and the performance on the benchmark grows. Maturity, degeneration or deprecation: A benchmark may be saturated, after which researchers move on to other benchmarks. Progress on the benchmark may also be neglected as the field moves to focus on other benchmarks. Renewal: A saturated benchmark can be upgraded to make it no longer saturated, allowing further progress. === Construction === Like datasets, benchmarks are typically constructed by several methods, individually or in combination: Web scraping: Ready-made question-answer pairs may be scraped online, such as from websites that teach mathematics and programming. Conversion: Items may be constructed programmatically from scraped web content, such as by blanking out named entities from sentences, and asking the model to fill in the blank. This was used for making the CNN/Daily Mail Reading Comprehension Task. Crowd sourcing: Items may be constructed by paying people to write them, such as on Amazon Mechanical Turk. This was used for making the MCTest. === Evaluation === Generally, benchmarks are fully automated. This limits the questions that can be asked. For example, with mathematical questions, "proving a claim" would be difficult to automatically check, while "calculate an answer with a unique integer answer" would be automatically checkable. With programming tasks, the answer can generally be checked by running unit tests, with an upper limit on runtime. The benchmark scores are of the following kinds: For multiple choice or cloze questions, common scores are accuracy (frequency of correct answer), precision, recall, F1 score, etc. pass@n: The model is given n {\displaystyle n} attempts to solve each problem. If any attempt is correct, the model earns a point. The pass@n score is the model's average score over all problems. k@n: The model makes n {\displaystyle n} attempts to solve each problem, but only k {\displaystyle k} attempts out of them are selected for submission. If any submission is correct, the model earns a point. The k@n score is the model's average score over all problems. cons@n: The model is given n {\displaystyle n} attempts to solve each problem. If the most common answer is correct, the model earns a point. The cons@n score is the model's average score over all problems. Here "cons" stands for "consensus" or "majority voting". The pass@n score can be estimated more accurately by making N > n {\displaystyle N>n} attempts, and use the unbiased estimator 1 − ( N − c n ) ( N n ) {\displaystyle 1-{\frac {\binom {N-c}{n}}{\binom {N}{n}}}} , where c {\displaystyle c} is the number of correct attempts. For less well-formed tasks, where the output can be any sentence, there are the following commonly used scores including BLEU ROUGE, METEOR, NIST, word error rate, LEPOR, CIDEr, and SPICE. === Issues === error: Some benchmark answers may be wrong. ambiguity: Some benchmark questions may be ambiguously worded. subjective: Some benchmark questions may not have an objective answer at all. This problem generally prevents creative writing benchmarks. Similarly, this prevents benchmarking writing proofs in natural language, though benchmarking proofs in a formal language is possible. open-ended: Some benchmark questions may not have a single answer of a fixed size. This problem generally prevents programming benchmarks from using more natural tasks such as "write a program for X", and instead uses tasks such as "write a function that implements specification X". inter-annotator agreement: Some benchmark questions may be not fully objective, such that even people would not agree with 100% on what the answer should be. This is common in natural language processing tasks, such as syntactic annotation. shortcut: Some benchmark questions may be easily solved by an "unintended" shortcut. For example, in the SNLI benchmark, having a negative word like "not" in the second sentence is a strong signal for the "Contradiction" category, regardless of what the se

    Read more →
  • Concordance (publishing)

    Concordance (publishing)

    A concordance is an alphabetical list of the principal words used in a book or body of work, listing every instance of each word with its immediate context. Historically, concordances have been compiled only for works of special importance, such as the Vedas, Bible, Qur'an or the works of Shakespeare, James Joyce or classical Latin and Greek authors, because of the time, difficulty, and expense involved in creating a concordance in the pre-computer era. A concordance is more than an index, with additional material such as commentary, definitions and topical cross-indexing which makes producing one a labor-intensive process even when assisted by computers. In the precomputing era, search technology was unavailable, and a concordance offered readers of long works such as the Bible something comparable to search results for every word that they would have been likely to search for. Today, the ability to combine the result of queries concerning multiple terms (such as searching for words near other words) has reduced interest in concordance publishing. In addition, mathematical techniques such as latent semantic indexing have been proposed as a means of automatically identifying linguistic information based on word context. A bilingual concordance is a concordance based on aligned parallel text. A topical concordance is a list of subjects that a book covers (usually The Bible), with the immediate context of the coverage of those subjects. Unlike a traditional concordance, the indexed word does not have to appear in the verse. The best-known topical concordance is Nave's Topical Bible. The first Bible concordance was compiled for the Vulgate Bible by Hugh of St Cher (d.1262), who employed 500 friars to assist him. In 1448, Rabbi Mordecai Nathan completed a concordance to the Hebrew Bible. It took him ten years. A concordance to the Greek New Testament was published in 1546 by Sixt Birck, and the Septuagint was done a by Conrad Kircher in 1602. The first concordance to the English Bible was published in 1550 by John Merbecke. According to Cruden, it did not employ the verse numbers devised by Robert Stephens in 1545, but "the pretty large concordance" of Mr Cotton did. Then followed Cruden's Concordance and Strong's Concordance. == Use in linguistics == Concordances are frequently used in linguistics, when studying a text. For example: comparing different usages of the same word analysing keywords analysing word frequencies finding and analysing phrases and idioms finding translations of subsentential elements, e.g. terminology, in bitexts and translation memories creating indexes and word lists (also useful for publishing) Concordancing techniques are widely used in national text corpora such as American National Corpus (ANC), British National Corpus (BNC), and Corpus of Contemporary American English (COCA) available on-line. Stand-alone applications that employ concordancing techniques are known as concordancers or more advanced corpus managers. Some of them have integrated part-of-speech taggers (POS taggers) and enable the user to create their own POS-annotated corpora to conduct various types of searches adopted in corpus linguistics. == Inversion == The reconstruction of the text of some of the Dead Sea Scrolls involved a concordance. Access to some of the scrolls was governed by a "secrecy rule" that allowed only the original International Team or their designates to view the original materials. After the death of Roland de Vaux in 1971, his successors repeatedly refused to even allow the publication of photographs to other scholars. This restriction was circumvented by Martin Abegg in 1991, who used a computer to "invert" a concordance of the missing documents made in the 1950s which had come into the hands of scholars outside of the International Team, to obtain an approximate reconstruction of the original text of 17 of the documents. This was soon followed by the release of the original text of the scrolls.

    Read more →
  • Enterprise data planning

    Enterprise data planning

    Enterprise data planning is the starting point for enterprise wide change. It states the destination and describes how you will get there. It defines benefits, costs and potential risks. It provides measures to be used along the way to judge progress and adjust the journey according to changing circumstances. Data is fundamental to investment enterprises. Effective, economic management of data underpins operations and enables transformations needed to satisfy customer demands, competition and regulation. Data warehouse(s) and other aspects of the overall data architecture are critical to the enterprise. EDMworks has created a strategic data planning approach for the Investment Sector. It consists of a planning process, planning intranets, templates and training materials. EDMworks planning process is based on the belief that extensive domain knowledge significantly shortens planning iterations and enables progressively higher quality plans to be produced and implemented. This approach drives the development of an effective and economic enterprise data architecture. Enterprise data planning is based on proven business disciplines. Key architectural layers for data and applications are then added in order to provide an enterprise wide understanding of the uses and interdependencies of data. This enables the definition of the core components of the EDM plan: Industry structure and business objectives Assessment of systems and services Target architecture for applications, data and infrastructure Target organization structures Systems, database, infrastructure and organizational plans Business case, costs, benefits, results and risks. EDMworks uses several components from the Open Systems Group TOGAF enterprise systems planning process. TOGAF acts as an extension to good business planning methods to provide a framework for the development of the systems and data architectural components. == History == James Martin was one of the pathfinders in data planning methodologies. He was one of the first to identify data as being an enterprise wide asset that required management. He developed a series of tools and methods to support that process. Most of the large consulting firms developed their own methods to address the same basic issue. Frequently, their approaches were incorporated into their own branded system development methodologies that encompassed the complete systems development life-cycle. Others, such as Ed Tozer, developed more focused offerings that dealt with the complexities of extracting key business needs from senior management and then defining relevant architectural visions for the specific enterprise. From these various sources, the concepts of Business, Data, Applications and Technology Architectures emerged. The Open Group Architectural Framework (TOGAF) has taken this work forward and has established a sound method in TOGAF version 9. EDMworks approach is to adopt these planning and architectural practices as a basis and then add two additional dimensions to the planning and implementation focus: Domain knowledge of the Investments sector. Investments is a complex global industry with a common set of characteristics about clients, information vendors, competition and regulation. Domain knowledge significantly improves the quality of the planning and implementation processes Development of people and teams. Change is a major feature of in any Enterprise Data Management program and people and teams both need development in order to make EDM effective throughout an organization.

    Read more →
  • Run-time algorithm specialization

    Run-time algorithm specialization

    In computer science, run-time algorithm specialization is a methodology for creating efficient algorithms for costly computation tasks of certain kinds. The methodology originates in the field of automated theorem proving and, more specifically, in the Vampire theorem prover project. The idea is inspired by the use of partial evaluation in optimising program translation. Many core operations in theorem provers exhibit the following pattern. Suppose that we need to execute some algorithm a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} in a situation where a value of A {\displaystyle A} is fixed for potentially many different values of B {\displaystyle B} . In order to do this efficiently, we can try to find a specialization of a l g {\displaystyle {\mathit {alg}}} for every fixed A {\displaystyle A} , i.e., such an algorithm a l g A {\displaystyle {\mathit {alg}}_{A}} , that executing a l g A ( B ) {\displaystyle {\mathit {alg}}_{A}(B)} is equivalent to executing a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} . The specialized algorithm may be more efficient than the generic one, since it can exploit some particular properties of the fixed value A {\displaystyle A} . Typically, a l g A ( B ) {\displaystyle {\mathit {alg}}_{A}(B)} can avoid some operations that a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} would have to perform, if they are known to be redundant for this particular parameter A {\displaystyle A} . In particular, we can often identify some tests that are true or false for A {\displaystyle A} , unroll loops and recursion, etc. == Difference from partial evaluation == The key difference between run-time specialization and partial evaluation is that the values of A {\displaystyle A} on which a l g {\displaystyle {\mathit {alg}}} is specialised are not known statically, so the specialization takes place at run-time. There is also an important technical difference. Partial evaluation is applied to algorithms explicitly represented as codes in some programming language. At run-time, we do not need any concrete representation of a l g {\displaystyle {\mathit {alg}}} . We only have to imagine a l g {\displaystyle {\mathit {alg}}} when we program the specialization procedure. All we need is a concrete representation of the specialized version a l g A {\displaystyle {\mathit {alg}}_{A}} . This also means that we cannot use any universal methods for specializing algorithms, which is usually the case with partial evaluation. Instead, we have to program a specialization procedure for every particular algorithm a l g {\displaystyle {\mathit {alg}}} . An important advantage of doing so is that we can use some powerful ad hoc tricks exploiting peculiarities of a l g {\displaystyle {\mathit {alg}}} and the representation of A {\displaystyle A} and B {\displaystyle B} , which are beyond the reach of any universal specialization methods. == Specialization with compilation == The specialized algorithm has to be represented in a form that can be interpreted. In many situations, usually when a l g A ( B ) {\displaystyle {\mathit {alg}}_{A}(B)} is to be computed on many values of B {\displaystyle B} in a row, a l g A {\displaystyle {\mathit {alg}}_{A}} can be written as machine code instructions for a special abstract machine, and it is typically said that A {\displaystyle A} is compiled. The code itself can then be additionally optimized by answer-preserving transformations that rely only on the semantics of instructions of the abstract machine. The instructions of the abstract machine can usually be represented as records. One field of such a record, an instruction identifier (or instruction tag), would identify the instruction type, e.g. an integer field may be used, with particular integer values corresponding to particular instructions. Other fields may be used for storing additional parameters of the instruction, e.g. a pointer field may point to another instruction representing a label, if the semantics of the instruction require a jump. All instructions of the code can be stored in a traversable data structure such as an array, linked list, or tree. Interpretation (or execution) proceeds by fetching instructions in some order, identifying their type, and executing the actions associated with said type. In many programming languages, such as C and C++, a simple switch statement may be used to associate actions with different instruction identifiers. Modern compilers usually compile a switch statement with constant (e.g. integer) labels from a narrow range by storing the address of the statement corresponding to a value i {\displaystyle i} in the i {\displaystyle i} -th cell of a special array, as a means of efficient optimisation. This can be exploited by taking values for instruction identifiers from a small interval of values. == Data-and-algorithm specialization == There are situations when many instances of A {\displaystyle A} are intended for long-term storage and the calls of a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} occur with different B {\displaystyle B} in an unpredictable order. For example, we may have to check a l g ( A 1 , B 1 ) {\displaystyle {\mathit {alg}}(A_{1},B_{1})} first, then a l g ( A 2 , B 2 ) {\displaystyle {\mathit {alg}}(A_{2},B_{2})} , then a l g ( A 1 , B 3 ) {\displaystyle {\mathit {alg}}(A_{1},B_{3})} , and so on. In such circumstances, full-scale specialization with compilation may not be suitable due to excessive memory usage. However, we can sometimes find a compact specialized representation A ′ {\displaystyle A^{\prime }} for every A {\displaystyle A} , that can be stored with, or instead of, A {\displaystyle A} . We also define a variant a l g ′ {\displaystyle {\mathit {alg}}^{\prime }} that works on this representation and any call to a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} is replaced by a l g ′ ( A ′ , B ) {\displaystyle {\mathit {alg}}^{\prime }(A^{\prime },B)} , intended to do the same job faster.

    Read more →
  • Packed pixel

    Packed pixel

    In packed pixel or chunky framebuffer organization, the bits defining each pixel are clustered and stored consecutively. For example, if there are 16 bits per pixel, each pixel is represented in two consecutive (contiguous) 8-bit bytes in the framebuffer. If there are 4 bits per pixel, each framebuffer byte defines two pixels, one in each nibble. The latter example is as opposed to storing a single 4-bit pixel in a byte, leaving 4 bits of the byte unused. If a pixel has more than one channel, the channels are interleaved when using packed pixel organization. Packed pixel displays were common on early microcomputer system that shared a single main memory for both the central processing unit (CPU) and display driver. In such systems, memory was normally accessed a byte at a time, so by packing the pixels, the display system could read out several pixels worth of data in a single read operation. Packed pixel is one of two major ways to organize graphics data in memory, the other being planar organization, where each pixel is made of individual bits stored in their own plane. For a 4-bit color value, memory would be organized as four screen-sized planes of one bit each and a single pixel's value built up by selecting the appropriate bit from each plane. Planar organization has the advantage that the data can be accessed in parallel, and is used when memory bandwidth is an issue.

    Read more →
  • Sequential algorithm

    Sequential algorithm

    In computer science, a sequential algorithm or serial algorithm is an algorithm that is executed sequentially – once through, from start to finish, without other processing executing – as opposed to concurrently or in parallel. The term is primarily used to contrast with concurrent algorithm or parallel algorithm; most standard computer algorithms are sequential algorithms, and not specifically identified as such, as sequentialness is a background assumption. Concurrency and parallelism are in general distinct concepts, but they often overlap – many distributed algorithms are both concurrent and parallel – and thus "sequential" is used to contrast with both, without distinguishing which one. If these need to be distinguished, the opposing pairs sequential/concurrent and serial/parallel may be used. "Sequential algorithm" may also refer specifically to an algorithm for decoding a convolutional code.

    Read more →
  • Storage area network

    Storage area network

    A storage area network (SAN) or storage network is a computer network which provides access to consolidated, block-level data storage. SANs are primarily used to access data storage devices, such as disk arrays and tape libraries from servers so that the devices appear to the operating system as direct-attached storage. A SAN typically is a dedicated network of storage devices not accessible through the local area network (LAN). Although a SAN provides only block-level access, file systems built on top of SANs do provide file-level access and are known as shared-disk file systems. Newer SAN configurations enable hybrid SAN and allow traditional block storage that appears as local storage but also object storage for web services through APIs. == Storage architectures == Storage area networks (SANs) are sometimes referred to as network behind the servers and historically developed out of a centralized data storage model, but with its own data network. A SAN is, at its simplest, a dedicated network for data storage. In addition to storing data, SANs allow for the automatic backup of data, and the monitoring of the storage as well as the backup process. A SAN is a combination of hardware and software. It grew out of data-centric mainframe architectures, where clients in a network can connect to several servers that store different types of data. To scale storage capacities as the volumes of data grew, direct-attached storage (DAS) was developed, where disk arrays or just a bunch of disks (JBODs) were attached to servers. In this architecture, storage devices can be added to increase storage capacity. However, the server through which the storage devices are accessed is a single point of failure, and a large part of the LAN network bandwidth is used for accessing, storing and backing up data. To solve the single point of failure issue, a direct-attached shared storage architecture was implemented, where several servers could access the same storage device. DAS was the first network storage system and is still widely used where data storage requirements are not very high. Out of it developed the network-attached storage (NAS) architecture, where one or more dedicated file server or storage devices are made available in a LAN. Therefore, the transfer of data, particularly for backup, still takes place over the existing LAN. If more than a terabyte of data was stored at any one time, LAN bandwidth became a bottleneck. Therefore, SANs were developed, where a dedicated storage network was attached to the LAN, and terabytes of data are transferred over a dedicated high speed and bandwidth network. Within the SAN, storage devices are interconnected. Transfer of data between storage devices, such as for backup, happens behind the servers and is meant to be transparent. In a NAS architecture data is transferred using the TCP and IP protocols over Ethernet. Distinct protocols were developed for SANs, such as Fibre Channel, iSCSI, Infiniband. Therefore, SANs often have their own network and storage devices, which have to be bought, installed, and configured. This makes SANs inherently more expensive than NAS architectures. == Components == SANs have their own networking devices, such as SAN switches. To access the SAN, so-called SAN servers are used, which in turn connect to SAN host adapters. Within the SAN, a range of data storage devices may be interconnected, such as SAN-capable disk arrays, JBODs and tape libraries. === Host layer === Servers that allow access to the SAN and its storage devices are said to form the host layer of the SAN. Such servers have host adapters, which are cards that attach to slots on the server motherboard (usually PCI slots) and run with a corresponding firmware and device driver. Through the host adapters the operating system of the server can communicate with the storage devices in the SAN. In Fibre channel deployments, a cable connects to the host adapter through the gigabit interface converter (GBIC). GBICs are also used on switches and storage devices within the SAN, and they convert digital bits into light impulses that can then be transmitted over the Fibre Channel cables. Conversely, the GBIC converts incoming light impulses back into digital bits. The predecessor of the GBIC was called gigabit link module (GLM). === Fabric layer === The fabric layer consists of SAN networking devices that include SAN switches, routers, protocol bridges, gateway devices, and cables. SAN network devices move data within the SAN, or between an initiator, such as an HBA port of a server, and a target, such as the port of a storage device. When SANs were first built, hubs were the only devices that were Fibre Channel capable, but Fibre Channel switches were developed and hubs are now rarely found in SANs. Switches have the advantage over hubs that they allow all attached devices to communicate simultaneously, as a switch provides a dedicated link to connect all its ports with one another. When SANs were first built, Fibre Channel had to be implemented over copper cables, these days multimode optical fibre cables are used in SANs. SANs are usually built with redundancy, so SAN switches are connected with redundant links. SAN switches connect the servers with the storage devices and are typically non-blocking allowing transmission of data across all attached wires at the same time. SAN switches are for redundancy purposes set up in a meshed topology. A single SAN switch can have as few as 8 ports and up to 32 ports with modular extensions. So-called director-class switches can have as many as 128 ports. In switched SANs, the Fibre Channel switched fabric protocol FC-SW-6 is used under which every device in the SAN has a hardcoded World Wide Name (WWN) address in the host bus adapter (HBA). If a device is connected to the SAN its WWN is registered in the SAN switch name server. In place of a WWN, or worldwide port name (WWPN), SAN Fibre Channel storage device vendors may also hardcode a worldwide node name (WWNN). The ports of storage devices often have a WWN starting with 5, while the bus adapters of servers start with 10 or 21. === Storage layer === The serialized Small Computer Systems Interface (SCSI) protocol is often used on top of the Fibre Channel switched fabric protocol in servers and SAN storage devices. The Internet Small Computer Systems Interface (iSCSI) over Ethernet and the Infiniband protocols may also be found implemented in SANs, but are often bridged into the Fibre Channel SAN. However, Infiniband and iSCSI storage devices, in particular, disk arrays, are available. The various storage devices in a SAN are said to form the storage layer. It can include a variety of hard disk and magnetic tape devices that store data. In SANs, disk arrays are joined through a RAID which makes a lot of hard disks look and perform like one big storage device. Every storage device, or even partition on that storage device, has a logical unit number (LUN) assigned to it. This is a unique number within the SAN. Every node in the SAN, be it a server or another storage device, can access the storage by referencing the LUN. The LUNs allow for the storage capacity of a SAN to be segmented and for the implementation of access controls. A particular server, or a group of servers, may, for example, be only given access to a particular part of the SAN storage layer, in the form of LUNs. When a storage device receives a request to read or write data, it will check its access list to establish whether the node, identified by its LUN, is allowed to access the storage area, also identified by a LUN. LUN masking is a technique whereby the host bus adapter and the SAN software of a server restrict the LUNs for which commands are accepted. In doing so LUNs that should never be accessed by the server are masked. Another method to restrict server access to particular SAN storage devices is fabric-based access control, or zoning, which is enforced by the SAN networking devices and servers. Under zoning, server access is restricted to storage devices that are in a particular SAN zone. == Network protocols == A mapping layer to other protocols is used to form a network: ATA over Ethernet (AoE), mapping of AT Attachment (ATA) over Ethernet Fibre Channel Protocol (FCP), a mapping of SCSI over Fibre Channel Fibre Channel over Ethernet (FCoE) ESCON over Fibre Channel (FICON), used by mainframe computers HyperSCSI, mapping of SCSI over Ethernet iFCP or SANoIP mapping of FCP over IP iSCSI, mapping of SCSI over TCP/IP iSCSI Extensions for RDMA (iSER), mapping of iSCSI over InfiniBand Network block device, mapping device node requests on UNIX-like systems over stream sockets like TCP/IP SCSI RDMA Protocol (SRP), another SCSI implementation for remote direct memory access (RDMA) transports Storage networks may also be built using Serial Attached SCSI (SAS) and Serial ATA (SATA) technologies. SAS evolved from SCSI direct-attached storage. SATA evolved from Para

    Read more →
  • Object storage

    Object storage

    Object storage (also known as object-based storage or blob storage) is a computer data storage approach that manages data as "blobs" or "objects", as opposed to other storage architectures like file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks. Each object is typically associated with a variable amount of metadata, and a globally unique identifier. Object storage can be implemented at multiple levels, including the device level (object-storage device), the system level, and the interface level. In each case, object storage seeks to enable capabilities not addressed by other storage architectures, like interfaces that are directly programmable by the application, a namespace that can span multiple instances of physical hardware, and data-management functions like data replication and data distribution at object-level granularity. Object storage systems allow retention of massive amounts of unstructured data in which data is written once and read once (or many times). Object storage is used for purposes such as storing objects like videos and photos on Facebook, songs on Spotify, or files in online collaboration services, such as Dropbox. One of the limitations with object storage is that it is not intended for transactional data, as object storage was not designed to replace NAS file access and sharing; it does not support the locking and sharing mechanisms needed to maintain a single, accurately updated version of a file. == History == === Origins === Jim Starkey coined the term blob working at Digital Equipment Corporation to refer to opaque data entities. The terminology was adopted for Rdb/VMS. Blob is often humorously explained to be an abbreviation for binary large object. According to Starkey, this backronym arose when Terry McKiever, working in marketing at Apollo Computer felt that the term needed to be an abbreviation. McKiever began using the expansion basic large object. This was later eclipsed by the retroactive explanation of blobs as binary large objects. According to Starkey, "Blob don't stand for nothin'." Rejecting the acronym, he explained his motivation behind the coinage, saying, "A blob is the thing that ate Cincinnatti [sic], Cleveland, or whatever", referring to the 1958 science fiction film The Blob. In 1995, research led by Garth Gibson on Network-Attached Secure Disks first promoted the concept of splitting less common operations, like namespace manipulations, from common operations, like reads and writes, to optimize the performance and scale of both. In the same year, a Belgian company – FilePool – was established to build the basis for archiving functions. Object storage was proposed at Gibson's Carnegie Mellon University lab as a research project in 1996. Another key concept was abstracting the writes and reads of data to more flexible data containers (objects). Fine grained access control through object storage architecture was further described by one of the NASD team, Howard Gobioff, who later was one of the inventors of the Google File System. Other related work includes the Coda filesystem project at Carnegie Mellon, which started in 1987, and spawned the Lustre file system. There is also the OceanStore project at UC Berkeley, which started in 1999 and the Logistical Networking project at the University of Tennessee Knoxville, which started in 1998. In 1999, Gibson founded Panasas to commercialize the concepts developed by the NASD team. === Development === Seagate Technology played a central role in the development of object storage. According to the Storage Networking Industry Association (SNIA), "Object storage originated in the late 1990s: Seagate specifications from 1999 Introduced some of the first commands and how operating system effectively removed from consumption of the storage." A preliminary version of the "OBJECT BASED STORAGE DEVICES Command Set Proposal" dated 10/25/1999 was submitted by Seagate as edited by Seagate's Dave Anderson and was the product of work by the National Storage Industry Consortium (NSIC) including contributions by Carnegie Mellon University, Seagate, IBM, Quantum, and StorageTek. This paper was proposed to INCITS T-10 (International Committee for Information Technology Standards) with a goal to form a committee and design a specification based on the SCSI interface protocol. This defined objects as abstracted data, with unique identifiers and metadata, how objects related to file systems, along with many other innovative concepts. Anderson presented many of these ideas at the SNIA conference in October 1999. The presentation revealed an IP Agreement that had been signed in February 1997 between the original collaborators (with Seagate represented by Anderson and Chris Malakapalli) and covered the benefits of object storage, scalable computing, platform independence, and storage management. == Architecture == === Abstraction of storage === One of the design principles of object storage is to abstract some of the lower layers of storage away from the administrators and applications. Thus, data is exposed and managed as objects instead of blocks or (exclusively) files. Objects contain additional descriptive properties which can be used for better indexing or management. Administrators do not have to perform lower-level storage functions like constructing and managing logical volumes to utilize disk capacity or setting RAID levels to deal with disk failure. Object storage also allows the addressing and identification of individual objects by more than just file name and file path. Object storage adds a unique identifier within a bucket, or across the entire system, to support much larger namespaces and eliminate name collisions. === Inclusion of rich custom metadata within the object === Object storage explicitly separates file metadata from data to support additional capabilities. As opposed to fixed metadata in file systems (filename, creation date, type, etc.), object storage provides for full function, custom, object-level metadata in order to: Capture application-specific or user-specific information for better indexing purposes Support data-management policies (e.g. a policy to drive object movement from one storage tier to another) Centralize management of storage across many individual nodes and clusters Optimize metadata storage (e.g. encapsulated, database or key value storage) and caching/indexing (when authoritative metadata is encapsulated with the metadata inside the object) independently from the data storage (e.g. unstructured binary storage) Additionally, in some object-based file-system implementations: The file system clients only contact metadata servers once when the file is opened and then get content directly via object-storage servers (vs. block-based file systems which would require constant metadata access) Data objects can be configured on a per-file basis to allow adaptive stripe width, even across multiple object-storage servers, supporting optimizations in bandwidth and I/O Object-based storage devices (OSD) as well as some software implementations (e.g., DataCore Swarm) manage metadata and data at the storage device level: Instead of providing a block-oriented interface that reads and writes fixed sized blocks of data, data is organized into flexible-sized data containers, called objects Each object has both data (an uninterpreted sequence of bytes) and metadata (an extensible set of attributes describing the object); physically encapsulating both together benefits recoverability. The command interface includes commands to create and delete objects, write bytes and read bytes to and from individual objects, and to set and get attributes on objects Security mechanisms provide per-object and per-command access control === Programmatic data management === Object storage provides programmatic interfaces to allow applications to manipulate data. At the base level, this includes Create, read, update and delete (CRUD) functions for basic read, write and delete operations. Some object storage implementations go further, supporting additional functionality like object/file versioning, object replication, life-cycle management and movement of objects between different tiers and types of storage. Most API implementations are REST-based, allowing the use of many standard HTTP calls. == Implementation == === Cloud storage === The vast majority of cloud storage available in the market leverages an object-storage architecture. Some notable examples are Amazon S3, which debuted in March 2006, Microsoft Azure Blob Storage, IBM Cloud Object Storage, Rackspace Cloud Files (whose code was donated in 2010 to Openstack project and released as OpenStack Swift), and Google Cloud Storage released in May 2010. === Object-based file systems === Some distributed file systems use an object-based architecture, where file metadata is stored in metadata servers and file data is stored i

    Read more →
  • Cloudflare

    Cloudflare

    Cloudflare, Inc., is an American technology company headquartered in San Francisco, California, that provides a range of internet services, including content delivery network (CDN) services, cloud cybersecurity, DDoS mitigation, and ICANN-accredited domain registration. The company's services act primarily as a reverse proxy between website visitors and a customer's hosting provider, improving performance and protecting against malicious traffic. Cloudflare was founded in 2009 by Matthew Prince, Lee Holloway, and Michelle Zatlyn. The company went public on the New York Stock Exchange in 2019 under the ticker symbol NET. Cloudflare has since expanded its offerings to include edge computing through its Workers platform, a public DNS resolver (1.1.1.1), and a VPN-like service known as WARP. In recent years, the company has integrated artificial intelligence into its infrastructure, acquiring companies such as Replicate and launching tools to manage AI bots and scrapers. According to W3Techs, Cloudflare is used by approximately 21.3% of all websites on the Internet as of January 2026. The company has been the subject of controversy regarding its policy of content neutrality. While Cloudflare executives have historically advocated for remaining a neutral infrastructure provider, the company has terminated services for specific high-profile websites associated with hate speech and violence, including The Daily Stormer, 8chan, and Kiwi Farms, following significant public pressure. Cloudflare has also faced criticism and litigation regarding copyright infringement by websites using its services, notably losing a lawsuit against Japanese publishers in 2025. The company experienced significant global outages in late 2025 which disrupted services for major platforms internationally. == History == Cloudflare was founded on July 26, 2009, by Matthew Prince, Lee Holloway, and Michelle Zatlyn. Prince and Holloway had previously collaborated on Project Honey Pot, a product of Unspam Technologies that partly inspired the basis of Cloudflare. In 2009, the company was venture-capital funded. On August 15, 2019, Cloudflare submitted its S-1 filing for an initial public offering on the New York Stock Exchange under the stock ticker NET. It opened for public trading on September 13, 2019, at $15 per share. According to the company, the name 'Cloudflare' was chosen, over the initial 'WebWall', because it best described what they were trying to do: build a "firewall in the cloud." In 2020, Cloudflare co-founder and COO Michelle Zatlyn was named president. Cloudflare has acquired web-services and security companies, including StopTheHacker (February 2014), CryptoSeal (June 2014), Eager Platform Co. (December 2016), Neumob (November 2017), S2 Systems (January 2020), Linc (December 2020), Zaraz (December 2021), Vectrix (February 2022), Area 1 Security (February 2022), Nefeli Networks (March 2024), BastionZero (May 2024), and Kivera (October 2024). Replicate (November 2025), and Human Native (January 2026). Since at least 2017, Cloudflare has used a wall of lava lamps at its San Francisco headquarters as a source of randomness for encryption keys, alongside double pendulums at its London offices and a Geiger counter at its Singapore offices. The lava lamp installation implements the Lavarand method, where a camera transforms the unpredictable shapes of the "lava" blobs into a digital image. In Q4 2022, Cloudflare provided paid services to 162,086 customers. In October 2024, Cloudflare won a lawsuit against patent troll Sable Networks. Sable paid Cloudflare $225,000, granted it a royalty-free license to its patent portfolio, and dedicated its patents to the public by abandoning its patent rights. In November 2025, it was announced Cloudflare had agreed to acquire Replicate, a San Francisco–based platform that enables software developers to run, fine-tune, and deploy open-source machine-learning models via an API without managing infrastructure. In January 2026, Cloudflare released an analysis regarding BGP routing leaks observed from the Venezuelan state-owned ISP CANTV (AS8048), which occurred on January 2 coincides with the arrest of Nicolás Maduro. While some security researchers had speculated that the outages were linked to U.S. cyber operations, Cloudflare's data indicated that the anomalies were consistent with a pattern of "insufficient routing export and import policies" by the ISP rather than malicious external interference. In January 2026, Cloudflare acquired Human Native, an AI data marketplace that brokers transactions between developers and content creators, for an undisclosed amount. On January 16, 2026, Cloudflare acquired The Astro Technology Company, the developers behind the open-source web framework Astro. In May 2026, Cloudflare announced the elimination of approximately 1,100 positions, around 20 percent of its workforce, in a restructuring the company attributed to the rapid adoption of artificial intelligence tools. The announcement coincided with the company's first-quarter 2026 earnings, which reported a record $639.8 million in quarterly revenue, a 34 percent year-over-year increase. CEO Matthew Prince stated the cuts were not driven by performance concerns but reflected roles made obsolete by AI, and that Cloudflare expected to employ more people by the end of 2027 than at any point during 2026. == Products == Cloudflare provides network and security products for consumers and businesses, utilizing edge computing, reverse proxies for web traffic, data center interconnects, and a content distribution network to serve content across its network of servers. It supports transport layer protocols TCP, UDP, QUIC, and many application layer protocols such as DNS over HTTPS, SMTP, and HTTP/2 with support for HTTP/2 Server Push. As of 2023, Cloudflare handles an average of 45 million HTTP requests per second. As of 2024, Cloudflare servers are powered by AMD EPYC 9684X processors. Cloudflare also provides analysis and reports on large-scale outages, including Verizon's October 2024 outage. === Artificial intelligence === In 2023, Cloudflare launched "Workers AI", a framework allowing for use of Nvidia GPU's within Cloudflare's network. In 2024, Cloudflare launched a tool that prevents bots from scraping websites. To build automatic bot detector models, the company analyzed "AI" bots and crawler traffic. The company also launched an "AI" assistant to generate charts based on queries by leveraging "Workers AI". Cloudflare announced plans in September 2024 to launch a marketplace where website owners can sell "AI" model providers access to scrape their site's content. In March 2025, Cloudflare announced a new feature called "AI Labyrinth", which combats unauthorized "AI" data scraping by serving fake "AI"-generated content to LLM bots. In July, the company rolled out a permission-based setting to allow websites to automatically block online bots from scraping data and content. Cloudflare released AutoRAG into beta in 2025. AutoRAG (retrieval augmented generation) creates a vector database of a website's unstructured content to identify relationships between concepts. It is part of an initiative with Microsoft, alongside their NLWeb standard, to make websites easier for people and automated systems to query. Cloudflare and GoDaddy partnered in April 2026 to enable AI Crawl Control features on GoDaddy hosted websites. This would allow site owners to decide how AI bot crawlers interact with their content. === DDoS mitigation === Cloudflare provides free and paid DDoS mitigation services that protect customers from distributed denial of service (DDoS) attacks. Cloudflare received media attention in June 2011 for providing DDoS mitigation for the website of LulzSec, a black hat hacking group. In March 2013, The Spamhaus Project was targeted by a DDoS attack that Cloudflare reported exceeded 300 gigabits per second (Gbit/s). Patrick Gilmore, of Akamai, stated that at the time it was "the largest publicly announced DDoS attack in the history of the Internet". While trying to defend Spamhaus against the DDoS attacks, Cloudflare ended up being attacked as well; Google and other companies eventually came to Spamhaus' defense and helped it to absorb the unprecedented amount of attack traffic. In 2014, Cloudflare began providing free DDoS mitigation for artists, activists, journalists, and human rights groups under the name "Project Galileo". In 2017, it extended the service to electoral infrastructure and political campaigns under the name "Athenian Project". By 2025, more than 2,900 users and organizations were participating in Project Galileo, including 31 US states. In February 2014, Cloudflare claimed to have mitigated an NTP reflection attack against an unnamed European customer, which it stated peaked at 400 Gbit/s. In November 2014, it reported a 500 Gbit/s DDoS attack in Hong Kong. In July 2021, the company claimed to have absorbed a DDoS atta

    Read more →
  • KiSAO

    KiSAO

    The Kinetic Simulation Algorithm Ontology (KiSAO) supplies information about existing algorithms available for the simulation of systems biology models, their characterization and interrelationships. KiSAO is part of the BioModels.net project and of the COMBINE initiative. == Structure == KiSAO consists of three main branches: simulation algorithm simulation algorithm characteristic simulation algorithm parameter The elements of each algorithm branch are linked to characteristic and parameter branches using has characteristic and has parameter relationships accordingly. The algorithm branch itself is hierarchically structured using relationships which denote that the descendant algorithms were derived from, or specify, more general ancestors.

    Read more →
  • Algorithm engineering

    Algorithm engineering

    Algorithm engineering focuses on the design, analysis, implementation, optimization, profiling and experimental evaluation of computer algorithms, bridging the gap between algorithmics theory and practical applications of algorithms in software engineering. It is a general methodology for algorithmic research. == Origins == In 1995, a report from an NSF-sponsored workshop "with the purpose of assessing the current goals and directions of the Theory of Computing (TOC) community" identified the slow speed of adoption of theoretical insights by practitioners as an important issue and suggested measures to reduce the uncertainty by practitioners whether a certain theoretical breakthrough will translate into practical gains in their field of work, and tackle the lack of ready-to-use algorithm libraries, which provide stable, bug-free and well-tested implementations for algorithmic problems and expose an easy-to-use interface for library consumers. But also, promising algorithmic approaches have been neglected due to difficulties in mathematical analysis. The term "algorithm engineering" was first used with specificity in 1997, with the first Workshop on Algorithm Engineering (WAE97), organized by Giuseppe F. Italiano. == Difference from algorithm theory == Algorithm engineering does not intend to replace or compete with algorithm theory, but tries to enrich, refine and reinforce its formal approaches with experimental algorithmics (also called empirical algorithmics). This way it can provide new insights into the efficiency and performance of algorithms in cases where the algorithm at hand is less amenable to algorithm theoretic analysis, formal analysis pessimistically suggests bounds which are unlikely to appear on inputs of practical interest, the algorithm relies on the intricacies of modern hardware architectures like data locality, branch prediction, instruction stalls, instruction latencies which the machine model used in Algorithm Theory is unable to capture in the required detail, the crossover between competing algorithms with different constant costs and asymptotic behaviors needs to be determined. == Methodology == Some researchers describe algorithm engineering's methodology as a cycle consisting of algorithm design, analysis, implementation and experimental evaluation, joined by further aspects like machine models or realistic inputs. They argue that equating algorithm engineering with experimental algorithmics is too limited, because viewing design and analysis, implementation and experimentation as separate activities ignores the crucial feedback loop between those elements of algorithm engineering. === Realistic models and real inputs === While specific applications are outside the methodology of algorithm engineering, they play an important role in shaping realistic models of the problem and the underlying machine, and supply real inputs and other design parameters for experiments. === Design === Compared to algorithm theory, which usually focuses on the asymptotic behavior of algorithms, algorithm engineers need to keep further requirements in mind: Simplicity of the algorithm, implementability in programming languages on real hardware, and allowing code reuse. Additionally, constant factors of algorithms have such a considerable impact on real-world inputs that sometimes an algorithm with worse asymptotic behavior performs better in practice due to lower constant factors. === Analysis === Some problems can be solved with heuristics and randomized algorithms in a simpler and more efficient fashion than with deterministic algorithms. Unfortunately, this makes even simple randomized algorithms difficult to analyze because there are subtle dependencies to be taken into account. === Implementation === Huge semantic gaps between theoretical insights, formulated algorithms, programming languages and hardware pose a challenge to efficient implementations of even simple algorithms, because small implementation details can have rippling effects on execution behavior. The only reliable way to compare several implementations of an algorithm is to spend an considerable amount of time on tuning and profiling, running those algorithms on multiple architectures, and looking at the generated machine code. === Experiments === See: Experimental algorithmics === Application engineering === Implementations of algorithms used for experiments differ in significant ways from code usable in applications. While the former prioritizes fast prototyping, performance and instrumentation for measurements during experiments, the latter requires thorough testing, maintainability, simplicity, and tuning for particular classes of inputs. === Algorithm libraries === Stable, well-tested algorithm libraries like LEDA play an important role in technology transfer by speeding up the adoption of new algorithms in applications. Such libraries reduce the required investment and risk for practitioners, because it removes the burden of understanding and implementing the results of academic research. == Conferences == Two main conferences on Algorithm Engineering are organized annually, namely: Symposium on Experimental Algorithms (SEA), established in 1997 (formerly known as WEA). SIAM Meeting on Algorithm Engineering and Experiments (ALENEX), established in 1999. The 1997 Workshop on Algorithm Engineering (WAE'97) was held in Venice (Italy) on September 11–13, 1997. The Third International Workshop on Algorithm Engineering (WAE'99) was held in London, UK in July 1999. The first Workshop on Algorithm Engineering and Experimentation (ALENEX99) was held in Baltimore, Maryland on January 15–16, 1999. It was sponsored by DIMACS, the Center for Discrete Mathematics and Theoretical Computer Science (at Rutgers University), with additional support from SIGACT, the ACM Special Interest Group on Algorithms and Computation Theory, and SIAM, the Society for Industrial and Applied Mathematics.

    Read more →
  • World Congress of Universal Documentation

    World Congress of Universal Documentation

    The World Congress of Universal Documentation was held from 16 to 21 August 1937 in Paris, France. Delegates from 45 countries met to discuss means by which all of the world's information, in print, in manuscript, and in other forms, could be efficiently organized and made accessible. == The Congress in the history of information science == The Congress, held at the Trocadéro under "the auspices" of the Institut International de Bibliographie, was "the apotheosis" of a general movement in the 1930s towards the classification of the growing mass of information and the improvement of access to that information. For the first time in the history of information science, technological means were beginning to catch up with theoretical ends, and the discussions at the conference reflected that fact. Its participation in the Congress was one of the first projects of the American Documentation Institute (ADI). Participants in the conference discussed what has been more recently called "a continuously updated hypertext encyclopedia." Joseph Reagle sees many of the ideas considered at the conference as forerunners of some of the key goals and norms of Wikipedia. == Microfilm == The main resolution adopted by the congress proposed that microfilm be used to make information universally available. Watson Davis, chairman of the American delegation and president of the ADI, stated that the volume of information being produced created difficult problems of access and preservation, but that these could be solved by the use of microfilm. In his address to the Congress, Davis said: Most immediate and practical to put into operation is the microfilming of material in libraries upon demand. It will become fashionable and economical to send a potential book borrower a little strip of microfilm for his permanent possession instead of the book and then badgering him to return it before he has had a chance to use it effectively. I believe that reading machines for microfilm will become as common as typewriters in studies and laboratories. If the principal libraries and information centers of the world will cooperate in such "bibliofilm services," as they are called, if they exchange orders and have essentially uniform methods, forms for ordering, standard microfilm format and production methods and comparable if not uniform prices, the resources of any library will be placed at the disposal of any scholar or scientist anywhere in the world. All the libraries cooperating will merge into one world library without loss of identity or individuality. The world's documentation will become available to even the most isolated and individualistic scholar. The Congress included two separate exhibits on microfilm. One was of the equipment used at the Bibliothèque nationale de France and the other, coordinated by Herman H. Fussler of the University of Chicago, consisting of "an entire microfilm laboratory," complete with cameras, a darkroom, and various kinds of reading machines. Emanuel Goldberg presented a paper on an early copying camera he had invented. Other resolutions passed by the Congress concerned uniform standards for the preparation of articles, for classifying books and other documents, for indexing newspapers and periodicals, and for cooperation between libraries. == H. G. Wells == In his address to the Congress, H. G. Wells said that he thought that his idea of the "world brain" was a precursor to the ideas other delegates were proposing, and explicitly linked the projects being discussed to the work of the encyclopédistes: I am speaking of a process of mental organization throughout the world which I believe to be as inevitable as anything can be in human affairs. All the distresses and horrors of the present time are fundamentally intellectual. The world has to pull its mind together, and this [Congress] is the beginning of its efforts. Civilization is a Phoenix. It perishes in flames and even as it dies it is born again. This synthesis of knowledge upon which you are working is the necessary beginning of a new world. It is good to be meeting here in Paris where the first encyclopedia of power was made. It would be impossible to overrate our debt to Diderot and his associates. == Other participants == Participants in the Congress included authors, librarians, scholars, archivists, scientists, and editors. Some of the notable people in attendance not mentioned above were:

    Read more →
  • Microsoft Teams

    Microsoft Teams

    Microsoft Teams is a team collaboration platform developed by Microsoft as part of the Microsoft 365 suite. It offers features such as workspace chat, video conferencing, file storage, and integration with both Microsoft and third-party applications and services. Teams gradually replaced earlier Microsoft messaging and collaboration platforms, including Skype for Business, Skype, Flip, and Microsoft Classroom. The platform saw significant growth during the COVID-19 pandemic, alongside competitors such as Zoom, Slack, and Google Meet, as organizations shifted to remote work and virtual meetings. As of January 2023, Microsoft reported approximately 280 million monthly active users. == History == On August 29, 2007, Microsoft acquired Parlano, the developer of the persistent group chat tool MindAlign. Years later, on March 4, 2016, Microsoft considered acquiring Slack for $8 billion. However, the proposal was reportedly opposed by Bill Gates, who advocated for focusing on enhancing Skype for Business instead. Lu Qi, then executive vice president of Applications and Services, had led the initiative to pursue the Slack acquisition. Following Lu's departure later that year, Microsoft announced Microsoft Teams on November 2, 2016, at an event in New York City, positioning it as a direct competitor to Slack. Teams launched worldwide on March 14, 2017. The service was initially led by corporate vice president Brian MacDonald. In response to the launch, Slack published a full-page advertisement in The New York Times welcoming the competition and outlining its product philosophy. Although Slack was used by 28 companies in the Fortune 100, The Verge wrote that executives would question paying for the service if Teams provides a similar function in their company's existing Office 365 subscription. However, ZDNET noted that the platforms initially served different markets, as Teams did not support external users, making it less appealing to small businesses and freelancers, a limitation Microsoft later addressed. In response to Teams' announcement, Slack deepened in-product integration with Google services. In May 2017, Microsoft announced that Teams would replace Microsoft Classroom in Office 365 Education. A free version of Teams was released on July 12, 2018, offering most core features at no cost, albeit with limits on users and storage. In January 2019, Microsoft introduced updates targeting "Firstline Workers" to improve Teams’ performance across shared or limited-access devices. In September 2019, Microsoft announced the retirement of Skype for Business in favor of Teams, which took effect on July 31, 2021. In early 2020, Microsoft introduced a push-to-talk "Walkie Talkie" feature aimed at firstline workers using smartphones and tablets over Wi-Fi or cellular networks. The COVID-19 pandemic significantly boosted usage of Teams. On March 19, 2020, Microsoft reported 44 million daily active users. In April, the platform logged 4.1 billion meeting minutes in a single day. A public preview of Microsoft Teams for Linux was released in December 2019, but the Linux client was discontinued in 2022. In July 2020, Microsoft shut down its video game livestreaming platform Mixer, and announced that some of its technologies would be repurposed for use in Teams. On February 28, 2025, Microsoft announced that Skype would be fully retired on May 5, 2025, with users given options to export their data or transition to Microsoft Teams. In October 2025, together with other Microsoft 365 suite apps, Teams had its logo updated. == Usage == == Underlying software == Microsoft Teams, as part of the Microsoft 365 suite, utilizes SharePoint and Exchange Online. Each Team, Shared Channel, and Private Channel has its own Microsoft 365 Group and SharePoint Site used for file storage. Messages are stored in Cosmos DB and are journaled to Exchange Online mailboxes. Private messages, including messages in Private Channels, are journaled to the sender and recipients' mailboxes. Public Channel messages are journaled to their corresponding Team's group mailbox, whereas, messages from Shared Channels are journaled to their own mailboxes. Contacts and voicemail are stored in Exchange Online. Microsoft Teams client is a web-based desktop app, originally developed on top of the Electron framework which combines the Chromium rendering engine and the Node.js JavaScript platform. Version 2.0 client was rebuilt using the Evergreen version of Microsoft Edge WebView2 in place of Electron. == Features == === Chats === Teams allows users to communicate in two-way persistent chats with one or multiple participants. Participants can message using text, emojis, stickers and gifs, as well as sharing links and files. In August 2022, the chat feature was updated for "chat with yourself"; allowing for the organization of files, notes, comments, images, and videos within a private chat tab. === Teams === Teams allows communities, groups, or teams to contribute in a shared workspace where messages and digital content on a specific topic are shared. Team members can join through an invitation sent by a team administrator or owner or sharing of a specific URL. Teams for Education allows admins and teachers to set up groups for classes, professional learning communities (PLCs), staff members, and everyone. === Channels === Channels allow team members to communicate without the use of email or group SMS (texting). Users can reply to posts with text, images, GIFs, and image macros. Direct messages send private messages to designated users rather than the entire channel. Connectors can be used within a channel to submit information contacted through a third-party service. Connectors include Mailchimp, Facebook Pages, Twitter, Power BI and Bing News. === Group conversations === Ad-hoc groups can be created to share instant messaging, audio calls (VoIP), and video calls inside the client software. === Telephone replacement === A feature on one of the higher cost licencing tiers allows connectivity to the public switched telephone network (PSTN) telephone system. This allows users to use Teams as if it were a telephone, making and receiving calls over the PSTN, including the ability to host "conference calls" with multiple participants. === Meeting === Meetings can be scheduled with multiple participants able to share audio, video, chat and presented content with all participants. Multiple users can connect via a meeting link. Automated minutes are possible using the recording and transcript features. Teams has a plugin for Microsoft Outlook to schedule a Teams Meeting in Outlook for a specific date and time and invite others to attend. If a meeting is scheduled within a channel, users visiting the channel are able to see if a meeting is in progress. ==== Teams Live Events ==== Teams Live Events replaces Skype Meeting Broadcast for users to broadcast to 10,000 participants on Teams, Yammer, or Microsoft Stream. ==== Breakout Rooms ==== Breakout rooms split a meeting into small groups. This is often utilized for collaboration during trainings or any environment where having all participants speak at once could be disruptive or unfeasible. Breakout rooms can be set by the hosts to a certain length of time, after which all participants will automatically rejoin the main meeting room. ==== Front Row ==== Front Row adjusts the layout of the viewer's screen, placing the speaker or content in the center of the gallery with other meeting participant's video feeds reduced in size and located below the speaker. === Education === Microsoft Teams for Education allows teachers to distribute, provide feedback, and grade student assignments turned in via Teams using the Assignments tab through Office 365 for Education subscribers. Quizzes can also be assigned to students through an integration with Office Forms. === Protocols === Microsoft Teams is based on a number of Microsoft-specific protocols. Video conferences are realized over the protocol MNP24, known from the Skype consumer version. VoIP and video conference clients based on SIP and H.323 need special gateways to connect to Microsoft Teams servers. With the help of Interactive Connectivity Establishment (ICE), clients behind Network address translation routers and restrictive firewalls are also able to connect, if peer-to-peer is not possible. === Integrations === Microsoft Teams has integrations through Microsoft AppSource, its integration marketplace. In 2020, Microsoft partnered with KUDO, a cloud-based solution with language interpretation, to allow integrated language meeting controls. In June 2022, an update was released using AI to improve call audio through the elimination of background feedback loops and cancelling non-vocal audio. == Anti-trust controversy == In July 2023, the European Commission opened an anti-trust investigation into the possibility that Microsoft unfairly used its office suite market power to increase sales of Teams and hurt

    Read more →
  • Emotion Markup Language

    Emotion Markup Language

    An Emotion Markup Language (EML or EmotionML) has first been defined by the W3C Emotion Incubator Group (EmoXG) as a general-purpose emotion annotation and representation language, which should be usable in a large variety of technological contexts where emotions need to be represented. Emotion-oriented computing (or "affective computing") is gaining importance as interactive technological systems become more sophisticated. Representing the emotional states of a user or the emotional states to be simulated by a user interface requires a suitable representation format; in this case a markup language is used. EmotionML version 1.0 was published by the group in May 2014. == Example == Here is an example of an EmotionML document describing emotions expressed in a video recording of the interaction between a teacher, Alice, and a student, Bob. == History == In 2006, a first W3C Incubator Group, the Emotion Incubator Group (EmoXG), was set up "to investigate a language to represent the emotional states of users and the emotional states simulated by user interfaces" with the final Report published on 10 July 2007. In 2007, the Emotion Markup Language Incubator Group (EmotionML XG) was set up as a follow-up to the Emotion Incubator Group, "to propose a specification draft for an Emotion Markup Language, to document it in a way accessible to non-experts, and to illustrate its use in conjunction with a number of existing markups." The final report of the Emotion Markup Language Incubator Group, Elements of an EmotionML 1.0, was published on 20 November 2008. The work then was continued in 2009 in the frame of the W3C's Multimodal Interaction Activity, with the First Public Working Draft of "Emotion Markup Language (EmotionML) 1.0" being published on 29 October 2009. The Last Call Working Draft of "Emotion Markup Language 1.0", was published on 7 April 2011. The Last Call Working Draft addressed all open issues that arose from feedback of the community on the First Call Working Draft as well as results of a workshop held in Paris in October 2010. Along with the Last Call Working Draft, a list of vocabularies for EmotionML has been published to aid developers using common vocabularies for annotating or representing emotions. Annual draft updates were published until the 1.0 version was finished in 2014. == Reasons for defining an emotion markup language == A standard for an emotion markup language would be useful for the following purposes: To enhance computer-mediated human-human or human-machine communication. Emotions are a basic part of human communication and should therefore be taken into account, e.g. in emotional Chat systems or emphatic voice boxes. This involves specification, analysis and display of emotion related states. To enhance systems' processing efficiency. Emotion and intelligence are strongly interconnected. The modeling of human emotions in computer processing can help to build more efficient systems, e.g. using emotional models for time-critical decision enforcement. To allow the analysis of non-verbal behavior, emotion, mental states that can be provided using web services to enable data collection, analysis, and reporting. Concrete examples of existing technology that could apply EmotionML include: Opinion mining / sentiment analysis in Web 2.0, to automatically track customer's attitude regarding a product across blogs; Affective monitoring, such as ambient assisted living applications, fear detection for surveillance purposes, or using wearable sensors to test customer satisfaction; Wellness technologies that provide assistance according to a person's emotional state with the goal to improve the person's well-being; Character design and control for games and virtual worlds; Building web services to capture, analysis, and report data of non-verbal behavior, emotion and mental states of an individual or group across the internet using standard web technologies such as HTML5 and JSON. Social robots, such as guide robots engaging with visitors; Expressive speech synthesis, generating synthetic speech with different emotions, such as happy or sad, friendly or apologetic; expressive synthetic speech would for example make more information available to blind and partially sighted people, and enrich their experience of the content; Emotion recognition (e.g., for spotting angry customers in speech dialog systems, to improve computer games or e-Learning applications); Support for people with disabilities, such as educational programs for people with autism. EmotionML can be used to make the emotional intent of content explicit. This would enable people with learning disabilities (such as Asperger syndrome) to realise the emotional context of the content; EmotionML can be used for media transcripts and captions. Where emotions are marked up to help deaf or hearing impaired people who cannot hear the soundtrack, more information is made available to enrich their experience of the content. The Emotion Incubator Group has listed 39 individual use cases for an Emotion markup language. A standardised way to mark up the data needed by such "emotion-oriented systems" has the potential to boost development primarily because data that was annotated in a standardised way can be interchanged between systems more easily, thereby simplifying a market for emotional databases, and the standard can be used to ease a market of providers for sub-modules of emotion processing systems, e.g. a web service for the recognition of emotion from text, speech or multi-modal input. == The challenge of defining a generally usable emotion markup language == Any attempt to standardize the description of emotions using a finite set of fixed descriptors is doomed to failure, as there is no consensus on the number of relevant emotions, on the names that should be given to them or how else best to describe them. For example, the difference between ":)" and "(:" is small, but using a standardized markup it would make one invalid. Even more basically, the list of emotion-related states that should be distinguished varies depending on the application domain and the aspect of emotions to be focused. Basically, the vocabulary needed depends on the context of use. On the other hand, the basic structure of concepts is less controversial: it is generally agreed that emotions involve triggers, appraisals, feelings, expressive behavior including physiological changes, and action tendencies; emotions in their entirety can be described in terms of categories or a small number of dimensions; emotions have an intensity, and so on. For details, see the Scientific Descriptions of Emotions in the Final Report of the Emotion Incubator Group. Given this lack of agreement on descriptors in the field, the only practical way of defining an emotion markup language is the definition of possible structural elements and to allow users to "plug in" vocabularies that they consider appropriate for their work. An additional challenge lies in the aim to provide a markup language that is generally usable. The requirements that arise from different use cases are rather different. Whereas manual annotation tends to require all the fine-grained distinctions considered in the scientific literature, automatic recognition systems can usually distinguish only a very small number of different states and affective avatars need yet another level of detail for expressing emotions in an appropriate way. For the reasons outlined here, it is clear that there is an inevitable tension between flexibility and interoperability, which need to be weighed in the formulation of an EmotionML. The guiding principle in the following specification has been to provide a choice only where it is needed, and to propose reasonable default options for every choice. == Applications and web services benefiting from an emotion markup language == There are a range of existing projects and applications to which an emotion markup language will enable the building of webservices to measure capture data of individuals non-verbal behavior, mental states, and emotions and allowing results to be reported and rendered in a standardized format using standard web technologies such as JSON and HTML5. One such project is measuring affect data across the Internet using EyesWeb.

    Read more →
  • Iteration

    Iteration

    Iteration means repeating a process to generate a (possibly unbounded) sequence of outcomes. Each repetition of the process is a single iteration, and the outcome of each iteration is the starting point of the next iteration. In mathematics and computer science, iteration (along with the related technique of recursion) is a standard element of algorithms. == Mathematics == In mathematics, iteration may refer to the process of iterating a function, i.e. applying a function repeatedly, using the output from one iteration as the input to the next. Iteration of apparently simple functions can produce complex behaviors and difficult problems – for examples, see the Collatz conjecture and juggler sequences. Another use of iteration in mathematics is in iterative methods which are used to produce approximate numerical solutions to certain mathematical problems. Newton's method is an example of an iterative method. Manual calculation of a number's square root is a common use and a well-known example. == Computing == In computing, iteration is a technique that marks out of a block of statements within a computer program for a defined number of repetitions. That block of statements is said to be iterated. A computer programmer might also refer to that block of statements as an iteration. === Implementations === Loops constitute the most common language constructs for performing iterations. The following pseudocode "iterates" three times the line of code between begin & end through a for loop, and uses the values of i as increments. It is permissible, and often necessary, to use values from other parts of the program outside the bracketed block of statements, to perform the desired function. Iterators constitute alternative language constructs to loops, which ensure consistent iterations over specific data structures. They can eventually save time and effort in later coding attempts. In particular, an iterator allows one to repeat the same kind of operation at each node of such a data structure, often in some pre-defined order. Iteratees are purely functional language constructs, which accept or reject data during the iterations. === Relation with recursion === Recursions and iterations have different algorithmic definitions, even though they can generate identical results. The primary difference is that recursion can be a solution without prior knowledge as to how many times the action must repeat, while a successful iteration requires that foreknowledge. Some types of programming languages, known as functional programming languages, are designed such that they do not set up a block of statements for explicit repetition, as with the for loop. Instead, those programming languages exclusively use recursion. Rather than call out a block of code to repeate a pre-defined number of times, the executing code block instead "divides" the work into a number of separate pieces, after which the code block executes itself on each individual piece. Each piece of work is divided repeatedly until the "amount" of work is as small as possible, at which point the algorithm does that work very quickly. The algorithm then "reverses" and reassembles the pieces into a complete whole. The classic example of recursion is in list-sorting algorithms, such as merge sort. The merge sort recursive algorithm first repeatedly divides the list into consecutive pairs. Each pair is then ordered, then each consecutive pair of pairs, and so forth until the elements of the list are in the desired order. The code below is an example of a recursive algorithm in the Scheme programming language that outputs the same result as the pseudocode under the previous heading. == Education == In some schools of pedagogy, iterations are used to describe the process of teaching or guiding students to repeat experiments, assessments, or projects, until more accurate results are found, or the student has mastered the technical skill. This idea is found in the old adage, "Practice makes perfect." In particular, "iterative" is defined as the "process of learning and development that involves cyclical inquiry, enabling multiple opportunities for people to revisit ideas and critically reflect on their implication." Unlike computing and math, educational iterations are not predetermined; instead, the task is repeated until success according to some external criteria (often a test) is achieved.

    Read more →