Collective operations are building blocks for interaction patterns, that are often used in SPMD algorithms in the parallel programming context. Hence, there is an interest in efficient realizations of these operations. A realization of the collective operations is provided by the Message Passing Interface (MPI). == Definitions == In all asymptotic runtime functions, we denote the latency α {\displaystyle \alpha } (or startup time per message, independent of message size), the communication cost per word β {\displaystyle \beta } , the number of processing units p {\displaystyle p} and the input size per node n {\displaystyle n} . In cases where we have initial messages on more than one node we assume that all local messages are of the same size. To address individual processing units we use p i ∈ { p 0 , p 1 , … , p p − 1 } {\displaystyle p_{i}\in \{p_{0},p_{1},\dots ,p_{p-1}\}} . If we do not have an equal distribution, i.e. node p i {\displaystyle p_{i}} has a message of size n i {\displaystyle n_{i}} , we get an upper bound for the runtime by setting n = max ( n 0 , n 1 , … , n p − 1 ) {\displaystyle n=\max(n_{0},n_{1},\dots ,n_{p-1})} . A distributed memory model is assumed. The concepts are similar for the shared memory model. However, shared memory systems can provide hardware support for some operations like broadcast (§ Broadcast) for example, which allows convenient concurrent read. Thus, new algorithmic possibilities can become available. == Broadcast == The broadcast pattern is used to distribute data from one processing unit to all processing units, which is often needed in SPMD parallel programs to dispense input or global values. Broadcast can be interpreted as an inverse version of the reduce pattern (§ Reduce). Initially only root r {\displaystyle r} with i d {\displaystyle id} 0 {\displaystyle 0} stores message m {\displaystyle m} . During broadcast m {\displaystyle m} is sent to the remaining processing units, so that eventually m {\displaystyle m} is available to all processing units. Since an implementation by means of a sequential for-loop with p − 1 {\displaystyle p-1} iterations becomes a bottleneck, divide-and-conquer approaches are common. One possibility is to utilize a binomial tree structure with the requirement that p {\displaystyle p} has to be a power of two. When a processing unit is responsible for sending m {\displaystyle m} to processing units i . . j {\displaystyle i..j} , it sends m {\displaystyle m} to processing unit ⌈ ( i + j ) / 2 ⌉ {\displaystyle \left\lceil (i+j)/2\right\rceil } and delegates responsibility for the processing units ⌈ ( i + j ) / 2 ⌉ . . j {\displaystyle \left\lceil (i+j)/2\right\rceil ..j} to it, while its own responsibility is cut down to i . . ⌈ ( i + j ) / 2 ⌉ − 1 {\displaystyle i..\left\lceil (i+j)/2\right\rceil -1} . Binomial trees have a problem with long messages m {\displaystyle m} . The receiving unit of m {\displaystyle m} can only propagate the message to other units, after it received the whole message. In the meantime, the communication network is not utilized. Therefore pipelining on binary trees is used, where m {\displaystyle m} is split into an array of k {\displaystyle k} packets of size ⌈ n / k ⌉ {\displaystyle \left\lceil n/k\right\rceil } . The packets are then broadcast one after another, so that data is distributed fast in the communication network. Pipelined broadcast on balanced binary tree is possible in O ( α log p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} , whereas for the non-pipelined case it takes O ( ( α + β n ) log p ) {\displaystyle {\mathcal {O}}((\alpha +\beta n)\log p)} cost. == Reduce == The reduce pattern is used to collect data or partial results from different processing units and to combine them into a global result by a chosen operator. Given p {\displaystyle p} processing units, message m i {\displaystyle m_{i}} is on processing unit p i {\displaystyle p_{i}} initially. All m i {\displaystyle m_{i}} are aggregated by ⊗ {\displaystyle \otimes } and the result is eventually stored on p 0 {\displaystyle p_{0}} . The reduction operator ⊗ {\displaystyle \otimes } must be associative at least. Some algorithms require a commutative operator with a neutral element. Operators like s u m {\displaystyle sum} , m i n {\displaystyle min} , m a x {\displaystyle max} are common. Implementation considerations are similar to broadcast (§ Broadcast). For pipelining on binary trees the message must be representable as a vector of smaller object for component-wise reduction. Pipelined reduce on a balanced binary tree is possible in O ( α log p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} . == All-Reduce == The all-reduce pattern (also called allreduce) is used if the result of a reduce operation (§ Reduce) must be distributed to all processing units. Given p {\displaystyle p} processing units, message m i {\displaystyle m_{i}} is on processing unit p i {\displaystyle p_{i}} initially. All m i {\displaystyle m_{i}} are aggregated by an operator ⊗ {\displaystyle \otimes } and the result is eventually stored on all p i {\displaystyle p_{i}} . Analog to the reduce operation, the operator ⊗ {\displaystyle \otimes } must be at least associative. All-reduce can be interpreted as a reduce operation with a subsequent broadcast (§ Broadcast). For long messages a corresponding implementation is suitable, whereas for short messages, the latency can be reduced by using a hypercube (Hypercube (communication pattern) § All-Gather/ All-Reduce) topology, if p {\displaystyle p} is a power of two. All-reduce can also be implemented with a butterfly algorithm and achieve optimal latency and bandwidth. All-reduce is possible in O ( α log p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} , since reduce and broadcast are possible in O ( α log p + β n ) {\displaystyle {\mathcal {O}}(\alpha \log p+\beta n)} with pipelining on balanced binary trees. All-reduce implemented with a butterfly algorithm achieves the same asymptotic runtime. == Prefix-Sum/Scan == The prefix-sum or scan operation is used to collect data or partial results from different processing units and to compute intermediate results by an operator, which are stored on those processing units. It can be seen as a generalization of the reduce operation (§ Reduce). Given p {\displaystyle p} processing units, message m i {\displaystyle m_{i}} is on processing unit p i {\displaystyle p_{i}} . The operator ⊗ {\displaystyle \otimes } must be at least associative, whereas some algorithms require also a commutative operator and a neutral element. Common operators are s u m {\displaystyle sum} , m i n {\displaystyle min} and m a x {\displaystyle max} . Eventually processing unit p i {\displaystyle p_{i}} stores the prefix sum ⊗ i ′ <= i {\displaystyle \otimes _{i'<=i}} m i ′ {\displaystyle m_{i'}} . In the case of the so-called exclusive prefix sum, processing unit p i {\displaystyle p_{i}} stores the prefix sum ⊗ i ′ < i {\displaystyle \otimes _{i'
Cloud Native Computing Foundation
The Cloud Native Computing Foundation (CNCF) is a subsidiary of the Linux Foundation founded in 2015 to support cloud-native computing. == History == It was announced alongside Kubernetes 1.0, an open source container cluster manager, which was contributed to the Linux Foundation by Google as a seed technology. Founding members include Google, CoreOS, Mesosphere, Red Hat, Twitter, Huawei, Intel, RX-M, Cisco, IBM, Docker, Univa, and VMware. Today, CNCF is supported by over 450 members. In August 2018 Google announced that it was handing over operational control of Kubernetes to the community. == Projects == Argo is a collection of tools for getting work done with Kubernetes. Among its main features are Workflows and Events. It was accepted to CNCF on March 26, 2020 at the Incubating maturity level and then moved to the Graduated maturity level on December 6, 2022. cert-manager provisions and manages TLS certificates in Kubernetes. It was accepted to CNCF on November 10, 2020, moved to the Incubating maturity level on September 19, 2022, and then moved to the Graduated maturity level on September 29, 2024. Cilium provides networking, security, and observability for Kubernetes deployments using eBPF technology. It joined the CNCF at incubation level in October 2021 and the CNCF announced its graduation in October 2023. containerd is an industry-standard core container runtime. It is currently available as a daemon for Linux and Windows, which can manage the complete container lifecycle of its host system. In 2015, Docker donated the OCI Specification to The Linux Foundation with a reference implementation called runc. Since February 28, 2019 it is an official CNCF project. Its general availability and intention to donate the project to CNCF was announced by Docker in 2017. CoreDNS is a DNS server that chains plugins. Its graduation was announced in 2019. Dapr, the distributed application runtime, provides APIs for building secure and reliable microservices and agentic AI systems. Dapr was donated to the CNCF in November 2021 and joined at incubation level. The CNCF announced its graduation in November 2024. Envoy: Originally built at Lyft to move their architecture away from a monolith, Envoy is a high-performance open source edge and service proxy that makes the network transparent to applications. Lyft contributed Envoy to Cloud Native Computing Foundation in September 2017. etcd is a distributed key value store, providing a method of storing data across a cluster of machines. It became a CNCF incubating project in 2018 at KubeCon+CloudNativeCon North America in Seattle that year. Falco is an open source and cloud native runtime security initiative. It is the "de facto Kubernetes threat detection engine". It became an incubating project in January 2020 and graduated in February 2024. Flux is an open source project for powering GitOps in Kubernetes clusters. It provides the GitOps Toolkit, a set of Kubernetes APIs that allow you to define how configuration source code is securely pulled into your cluster and deployed by popular Kubernetes manifests rendering engines like Kustomize and Helm. The most recommended source mechanism is the OCIRepository API, which provides enhanced security and benefits from container image tooling out there. Flux has also notification integrations with popular services like Prometheus Alertmanager, PagerDuty, Slack and so on. Flux has graduated in CNCF in 2022. Harbor is an "open source trusted cloud native registry project that stores, signs, and scans content." It became an incubating project in September 2019 and graduated in June 2020. Helm is a package manager that helps developers "easily manage and deploy applications onto the Kubernetes cluster." It joined the incubating level in June 2018 and graduated in April 2020. Istio is a service mesh technology. It was accepted by CNCF in September 2022 and graduated on July 12, 2023. Jaeger, Created by Uber Engineering, Jaeger is an open source distributed tracing system inspired by Google Dapper paper and OpenZipkin community. It can be used for tracing microservice-based architectures, including distributed context propagation, distributed transaction monitoring, root cause analysis, service dependency analysis, and performance/latency optimization. The Cloud Native Computing Foundation Technical Oversight Committee voted to accept Jaeger as the 12th hosted project in September 2017 and became a graduated project in 2019. In 2020 it became an approved and fully integrated part of the CNCF ecosystem. Kubernetes is an open source framework for automating deployment and managing applications in a containerized and clustered environment. "It aims to provide better ways of managing related, distributed components across the varied infrastructure." It was originally designed by Google and donated to The Linux Foundation to form the Cloud Native Computing Foundation with Kubernetes as the seed technology. The "large and diverse" community supporting the project has made its staying power more robust than other, older technologies of the same ilk. In January 2020, the CNCF annual report showed significant growth in interest, training, event attendance and investment related to Kubernetes. Linkerd is CNCF's fifth member project, and the project that coined the term "service mesh". Linkerd adds observability, security, and reliability features to applications by adding them to the platform rather than the application layer, and features a "micro-proxy" to maximize speed and security of its data plane. Linkerd graduated from CNCF in July 2021. Open Policy Agent (OPA) is "an open source general-purpose policy engine and language for cloud infrastructure." It became a CNCF incubating project in April 2019. OPA graduated from CNCF in February 2021. Prometheus is a cloud monitoring tool sponsored by SoundCloud in early iterations. In August 2018, the tool was designated a graduated project by the Cloud Native Computing Foundation. It is now a Cloud Native Computing Foundation member project. Rook is CNCF's first cloud native storage project. It became an incubation level project in 2018 and graduated in October 2020. SPIFFE is an open standard and framework for workload identity, much the same way that OAuth is an open standard and framework for human identity. It is built from the ground up to accommodate modern computing environments, which operate with systems scale and velocity (as opposed to human scale and velocity), while still maintaining interoperability with existing technologies like OAuth and X.509 Public key infrastructure. Unlike other identity standards, SPIFFE supports multiple credential types for a single identity, ensuring that the highly varied needs of production environments are consistently met without compromise. SPIFFE joined the CNCF as a sandbox project in 2018, was accepted to incubation in 2020, and graduated in 2022. SPIRE is an open source identity provider for workloads based on the SPIFFE framework. It is highly pluggable, and fills the attestation and issuance needs required by any workload identity solution. The plugin interfaces it exposes allows users to write integrations with in-house systems, build internal self-service portals, and more. It is a very powerful building block for issuing short-lived identity credentials to dynamic cloud workloads. SPIRE became a CNCF Graduated project in 2022. The Update Framework (TUF) helps developers to secure new or existing software update systems, which are often found to be vulnerable to many known attacks. TUF addresses this widespread problem by providing a comprehensive, flexible security framework that developers can integrate with any software update system. TUF was CNCF's first security-focused project and the ninth project overall to graduate from the foundation's hosting program. TiKV provides a distributed key–value database. Vitess is a database clustering system for horizontal scaling of MySQL, first created for internal use by YouTube. It became a CNCF project in 2018 and graduated in November 2019. Contour is a management server for Envoy that can direct the management of Kubernetes' traffic. Contour also provides routing features that are more advanced than Kubernetes' out-of-the-box Ingress specification. VMWare contributed the project to CNCF in July 2020. Cortex offers horizontally scalable, multi-tenant, long-term storage for Prometheus and works alongside Amazon DynamoDB, Google Bigtable, Cassandra, S3, GCS, and Microsoft Azure. It was introduced into the ecosystem incubator alongside Thanos in August 2020. CRI-O is an Open Container Initiative (OCI) based "implementation of Kubernetes Container Runtime Interface". CRI-O allows Kubernetes to be container runtime-agnostic. It became an incubating project in 2019. gRPC is a "modern open source high performance RPC framework that can run in any environment." The project was formed in 2015 when Google decided to open sou
Style sheet (web development)
A web style sheet is a form of separation of content and presentation for web design in which the markup (i.e., HTML or XHTML) of a webpage contains the page's semantic content and structure, but does not define its visual layout (style). Instead, the style is defined in an external style sheet file using a style sheet language such as CSS or XSLT. This design approach is identified as a "separation" because it largely supersedes the antecedent methodology in which a page's markup defined both style and structure. The philosophy underlying this methodology is a specific case of separation of concerns. == Benefits == Separation of style and content has advantages, but has only become practical after improvements in popular web browsers' CSS implementations. === Speed === Overall, users experience of a site utilising style sheets will generally be quicker than sites that do not use the technology. ‘Overall’ as the first page will probably load more slowly – because the style sheet AND the content will need to be transferred. Subsequent pages will load faster because no style information will need to be downloaded – the CSS file will already be in the browser’s cache. === Maintainability === Holding all the presentation styles in one file can reduce the maintenance time and reduces the chance of error, thereby improving presentation consistency. For example, the font color associated with a type of text element may be specified — and therefore easily modified — throughout an entire website simply by changing one short string of characters in a single file. The alternative approach, using styles embedded in each individual page, would require a cumbersome, time consuming, and error-prone edit of every file. === Accessibility === Sites that use CSS with either XHTML or HTML are easier to tweak so that they appear similar in different browsers (Chrome, Internet Explorer, Mozilla Firefox, Opera, Safari, etc.). Sites using CSS "degrade gracefully" in browsers unable to display graphical content, such as Lynx, or those so very old that they cannot use CSS. Browsers ignore CSS that they do not understand, such as CSS 3 statements. This enables a wide variety of user agents to be able to access the content of a site even if they cannot render the style sheet or are not designed with graphical capability in mind. For example, a browser using a refreshable braille display for output could disregard layout information entirely, and the user would still have access to all page content. === Customization === If a page's layout information is stored externally, a user can decide to disable the layout information entirely, leaving the site's bare content still in a readable form. Site authors may also offer multiple style sheets, which can be used to completely change the appearance of the site without altering any of its content. Most modern web browsers also allow the user to define their own style sheet, which can include rules that override the author's layout rules. This allows users, for example, to bold every hyperlink on every page they visit. Browser extensions like Stylish and Stylus have been created to facilitate management of such user style sheets. === Consistency === Because the semantic file contains only the meanings an author intends to convey, the styling of the various elements of the document's content is very consistent. For example, headings, emphasized text, lists and mathematical expressions all receive consistently applied style properties from the external style sheet. Authors need not concern themselves with the style properties at the time of composition. These presentational details can be deferred until the moment of presentation. === Portability === The deferment of presentational details until the time of presentation means that a document can be easily re-purposed for an entirely different presentation medium with merely the application of a new style sheet already prepared for the new medium and consistent with elemental or structural vocabulary of the semantic document. A carefully authored document for a web page can easily be printed to a hard-bound volume complete with headers and footers, page numbers and a generated table of contents simply by applying a new style sheet.
BitClout
BitClout was an open source blockchain-based social media platform. On the platform, users could post short-form writings and photos, award money to posts they particularly like by clicking a diamond icon, as well as buy and sell "creator coins" (personalized tokens whose value depends on people's reputations). BitClout ran on a custom proof of work blockchain, and was a prototype of what can be built on DeSo (short for "Decentralized Social"). BitClout's founder and primary leader is Nader al-Naji, known pseudonymously as "Diamondhands". Under development since 2019, BitClout's blockchain created its first block in January 2021, and BitClout itself launched publicly in March 2021. The platform launched with 15,000 "reserved" accounts — a move intended to prevent impersonation, but which backfired as some people with reserved accounts tried to actively distance themselves. Later, in September 2021, BitClout was revealed to be the flagship product of the DeSo blockchain. == History == === Origins (2019 - March 2021) === In early 2019, Nader al-Naji became interested in "mixing investing and social media". He started creating a custom blockchain in May 2019, but didn't tell anyone else until November 2020. However, in the fall of 2020, al-Naji pitched BitClout's own investors under his real name and began posting job listings for a "new operation". Although BitClout was not originally intended to launch until mid-2021, its development was sped up due to "zeitgeist about decentralized social media" in January 2021. BitClout's first block was mined on 18 January 2021. Its next block was mined on 1 March 2021. === As BitClout (March - September 2021) === In early March 2021, about fifty investors received links to a password-protected website with the BitClout white paper. They were encouraged to explore the site and send the same link to "two or three other 'trusted contacts'". Within weeks users were spending millions of dollars per day on the platform. The platform's founders said they were "completely unprepared", having planned to have a "soft-launch". The leader went by the name "diamondhands" on the platform. On 24 March 2021, BitClout launched out of private beta. Investors include Sequoia Capital, Andreessen Horowitz, the venture capital firm Social Capital, Coinbase Ventures, Winklevoss Capital Management, Alexis Ohanian, Polychain, Pantera, and Digital Currency Group (CoinDesk's parent company). During its initial launch, BitClout's currency could be bought with bitcoin, but not sold except on Discord servers or Twitter threads. A single bitcoin wallet related to BitClout received more than $165M worth of deposits. In March 2021, law firm Anderson Kill P.C. sent Nader al-Naji, the presumed leader of the BitClout platform, a cease-and-desist letter, demanding the removal of Brandon Curtis's account and alleging that BitClout violated sections 1798 and 3344 of the California Civil Code by using Curtis's name and likeness without his consent. Curtis also tweeted, "Adopting Bitcoin's aesthetic to raise VC funding to carry out unethical and blatantly illegal schemes like BitClout: not cool". (However, Curtis's coin, despite not being listed on the official website, can still be bought by users searching for the original username.) Additionally, in April 2021, Lee Hsien Loong asked for his name and photograph to be removed from the site, stating that he has "nothing to do with the platform" and that "it is misleading and done without [his] permission". On 18 May 2021, diamondhands announced that 100% of the BitClout code went public. On 12 June 2021, the supply of BitClout was capped at around 11 million coins. On 18 July 2021, BitClout added the ability for users to mint and purchase NFTs within the platform. === As part of DeSo (September 2021 - July 2024) === On 21 September 2021, it was revealed that BitClout was a prototype built on DeSo, short for "Decentralized Social". As a part of this revelation, diamondhands confirmed his identity as Nader al-Naji. (As early as April 2021, it had been believed that diamondhands indeed was that person.)The Bitclout project raised $200M in funding, which went to setting up the DeSo Foundation. === End and aftermath (July 2024 - present) === In July 2024, al-Naji was arrested by the FBI and charged with wire fraud involving BitClout. He also faced civil charges of securities fraud and unregistered offers and sales of securities from the Securities and Exchange Commission. In response, the official "deso" account posted that al-Naji was "safe and at home" and "that this experience has only reinforced [his] commitment to DeSo". In February 2025, the Justice Department dropped its case against al-Naji. In March 2026, the SEC voluntarily dismissed the enforcement case with prejudice. == Design == BitClout is a social media platform. Its users can post short-form writings and photos (similarly to Twitter). They can award money to posts they particularly like by clicking a diamond icon (similarly to Twitch Bits). The prices of each account's "creator coin" goes up and down with the popularity of the celebrity behind it. For example, if someone says something negative, the value of their corresponding account may go down. This price is computed automatically according to the formula p r i c e _ i n _ b i t c l o u t = .003 ∗ c r e a t o r _ c o i n s _ i n _ c i r c u l a t i o n 2 {\displaystyle price\_in\_bitclout=.003creator\_coins\_in\_circulation^{2}} . At launch time, BitClout scraped 15,000 profiles of celebrities from Twitter to create "reserved" accounts in their names. To claim a reserved account, the account holder would need to tweet about it (which also serves as a marketing strategy). At least 80 such reserved profiles have been claimed. Proof of stake was introduced in March 2024.
Influence-for-hire
Influence-for-hire or collective influence, refers to the economy that has emerged around buying and selling influence on social media platforms. == Overview == Companies that engage in the influence-for-hire industry range from content farms to high-end public relations agencies. Traditionally influence operations have largely been confined to public sector actors like intelligence agencies, in the influence-for-hire industry the groups conduction the operations are private with commerce being their primary consideration. However many of the clients in the influence-for-hire industry are countries or countries acting through proxies. They are often located in countries with less expensive digital labor. == History == In May 2021, Facebook took a Ukrainian influence-for-hire network offline. Facebook attributed the network to organizations and consultants linked to Ukrainian politicians including Andriy Derkach. During the COVID-19 pandemic state sponsored misinformation was spread through influence-for-hire networks. In August 2021, a report published by the Australian Strategic Policy Institute implicated the Chinese government and the ruling Chinese Communist Party in campaigns of online manipulation conducted against Australia and Taiwan using influence-for-hire.
Active learning (machine learning)
Active learning is a special case of machine learning in which a learning algorithm can interactively query a human user (or some other information source) to label new data points with the desired outputs. The human user must possess expertise in the problem domain, including the ability to consult authoritative sources when necessary. In statistics literature, it is sometimes also called optimal experimental design. The information source is also called teacher or oracle. There are situations in which unlabeled data is abundant but manual labeling is expensive. In such a scenario, learning algorithms can actively query the teacher for labels. Since the learner chooses the examples, the number of examples to learn a concept can often be much lower than the number required in normal supervised learning. However, there is a risk that the algorithm is overwhelmed by uninformative examples. Recent developments are dedicated to multi-label active learning, hybrid active learning and active learning in a single-pass (on-line) context, combining concepts from the field of machine learning (e.g. conflict and ignorance) with adaptive, incremental learning policies in the field of online machine learning. Using active learning allows for faster development of a machine learning algorithm, when comparative updates would require a quantum or super computer. Large-scale active learning projects may benefit from crowdsourcing frameworks such as Amazon Mechanical Turk that include many humans in the active learning loop. == Definitions == Let T be the total set of all data under consideration. For example, in a protein engineering problem, T would include all proteins that are known to have a certain interesting activity and all additional proteins that one might want to test for that activity. During each iteration, i, T is broken up into three subsets T K , i {\displaystyle \mathbf {T} _{K,i}} : Data points where the label is known. T U , i {\displaystyle \mathbf {T} _{U,i}} : Data points where the label is unknown. T C , i {\displaystyle \mathbf {T} _{C,i}} : A subset of TU,i that is chosen to be labeled. Most of the current research in active learning involves the best method to choose the data points for TC,i. == Scenarios == Pool-based sampling: In this approach, which is the most well known scenario, the learning algorithm attempts to evaluate the entire dataset before selecting data points (instances) for labeling. It is often initially trained on a fully labeled subset of the data using a machine-learning method such as logistic regression or SVM that yields class-membership probabilities for individual data instances. The candidate instances are those for which the prediction is most ambiguous. Instances are drawn from the entire data pool and assigned a confidence score, a measurement of how well the learner "understands" the data. The system then selects the instances for which it is the least confident and queries the teacher for the labels. The theoretical drawback of pool-based sampling is that it is memory-intensive and is therefore limited in its capacity to handle enormous datasets, but in practice, the rate-limiting factor is that the teacher is typically a (fatiguable) human expert who must be paid for their effort, rather than computer memory. Stream-based selective sampling: Here, each consecutive unlabeled instance is examined one at a time with the machine evaluating the informativeness of each item against its query parameters. The learner decides for itself whether to assign a label or query the teacher for each datapoint. As contrasted with Pool-based sampling, the obvious drawback of stream-based methods is that the learning algorithm does not have sufficient information, early in the process, to make a sound assign-label-vs ask-teacher decision, and it does not capitalize as efficiently on the presence of already labeled data. Therefore, the teacher is likely to spend more effort in supplying labels than with the pool-based approach. Membership query synthesis: This is where the learner generates synthetic data from an underlying natural distribution. For example, if the dataset are pictures of humans and animals, the learner could send a clipped image of a leg to the teacher and query if this appendage belongs to an animal or human. This is particularly useful if the dataset is small. The challenge here, as with all synthetic-data-generation efforts, is in ensuring that the synthetic data is consistent in terms of meeting the constraints on real data. As the number of variables/features in the input data increase, and strong dependencies between variables exist, it becomes increasingly difficult to generate synthetic data with sufficient fidelity. For example, to create a synthetic data set for human laboratory-test values, the sum of the various white blood cell (WBC) components in a white blood cell differential must equal 100, since the component numbers are really percentages. Similarly, the enzymes alanine transaminase (ALT) and aspartate transaminase (AST) measure liver function (though AST is also produced by other tissues, e.g., lung, pancreas) A synthetic data point with AST at the lower limit of normal range (8–33 units/L) with an ALT several times above normal range (4–35 units/L) in a simulated chronically ill patient would be physiologically impossible. == Query strategies == Algorithms for determining which data points should be labeled can be organized into a number of different categories, based upon their purpose: Balance exploration and exploitation: the choice of examples to label is seen as a dilemma between the exploration and the exploitation over the data space representation. This strategy manages this compromise by modelling the active learning problem as a contextual bandit problem. For example, Bouneffouf et al. propose a sequential algorithm named Active Thompson Sampling (ATS), which, in each round, assigns a sampling distribution on the pool, samples one point from this distribution, and queries the oracle for this sample point label. Expected model change: label those points that would most change the current model. Expected error reduction: label those points that would most reduce the model's generalization error. Exponentiated Gradient Exploration for Active Learning: In this paper, the author proposes a sequential algorithm named exponentiated gradient (EG)-active that can improve any active learning algorithm by an optimal random exploration. Uncertainty sampling: label those points for which the current model is least certain as to what the correct output should be. Query by committee: a variety of models are trained on the current labeled data, and vote on the output for unlabeled data; label those points for which the "committee" disagrees the most Querying from diverse subspaces or partitions: When the underlying model is a forest of trees, the leaf nodes might represent (overlapping) partitions of the original feature space. This offers the possibility of selecting instances from non-overlapping or minimally overlapping partitions for labeling. Variance reduction: label those points that would minimize output variance, which is one of the components of error. Conformal prediction: predicts that a new data point will have a label similar to old data points in some specified way and degree of the similarity within the old examples is used to estimate the confidence in the prediction. Mismatch-first farthest-traversal: The primary selection criterion is the prediction mismatch between the current model and nearest-neighbour prediction. It targets on wrongly predicted data points. The second selection criterion is the distance to previously selected data, the farthest first. It aims at optimizing the diversity of selected data. User-centered labeling strategies: Learning is accomplished by applying dimensionality reduction to graphs and figures like scatter plots. Then the user is asked to label the compiled data (categorical, numerical, relevance scores, relation between two instances). A wide variety of algorithms have been studied that fall into these categories. While the traditional AL strategies can achieve remarkable performance, it is often challenging to predict in advance which strategy is the most suitable in a particular situation. In recent years, meta-learning algorithms have been gaining in popularity. Some of them have been proposed to tackle the problem of learning AL strategies instead of relying on manually designed strategies. A benchmark which compares 'meta-learning approaches to active learning' to 'traditional heuristic-based Active Learning' may give intuitions if 'Learning active learning' is at the crossroads == Minimum marginal hyperplane == Some active learning algorithms are built upon support-vector machines (SVMs) and exploit the structure of the SVM to determine which data points to label. Such methods usually calculate the margin, W, of each u
Online exhibition
An online exhibition, also referred to as a virtual exhibition, online gallery, cyber-exhibition, is an exhibition whose venue is cyberspace. Museums and other organizations create online exhibitions for many reasons. For example, an online exhibition may: expand on material presented at, or generate interest in, or create a durable online record of, a physical exhibition; save production costs (insurance, shipping, installation); solve conservation/preservation problems (e.g., handling of fragile or rare objects); reach lots more people: "Access to information is no longer restricted to those who can afford travel and museum visits, but is available to anyone who has access to a computer with an Internet connection. Unlike physical exhibitions, online exhibitions are not restricted by time; they are not forced to open and close but may be available 24 hours a day. In the nonprofit world, many museums, libraries, archives, universities, and other cultural organizations create online exhibitions. A database of such exhibitions is Library and Archival Exhibitions on the Web. Online exhibition organizers may use techniques such as marquee text, display advertisements, and in-event emails to engage patrons. Various guides have been published to help organizations create effective online exhibitions. The earliest museum with a physical existence to create a programme of substantial online exhibitions with high resolution images of artefacts was the Museum of the History of Science in Oxford, the first of which, The Measurers: a Flemish Image of Mathematics in the Sixteenth Century and an exhibition of early photographs, were published on 21 August 1995. == Examples of online exhibitions == International Museum of Women is an online-only museum that does not have a physical building and instead offers online exhibitions about women's issues globally as well as an online community. Online exhibitions include "Imagining Ourselves" (launched 2006) about women's identity, "Women, Power and Politics" (2008), and "Economica: Women and the Global Economy" (2009). Tucson LGBTQ Museum is an online-only museum that does not have a physical building and instead offers online exhibitions about LGBTQ history. The online photographic, audio, video, text, and other historical exhibitions include exhibits from the 1700s to the present day. The effort began in the summer of 1967 and spanned almost 50 years. International New Media Gallery (INMG) is an online museum specialising in moving image and screen-based art. The INMG is dedicated to exploring current debates and topics in art history: touching on areas such as migration, war, environmental activism and the internet itself. The gallery publishes extensive academic catalogues alongside its exhibitions. It also hosts spaces for discussion and debate, both online and offline. Virtual Museum of Modern Nigerian Art – the VMMNA is the first of its kind in Africa. Hosted by the Pan-African University, Lagos, Nigeria this virtual museum offers a good view of the development on Nigerian Art in the past fifty years.