AI Detector Spanish

AI Detector Spanish — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Learnable function class

    Learnable function class

    In statistical learning theory, a learnable function class is a set of functions for which an algorithm can be devised to asymptotically minimize the expected risk, uniformly over all probability distributions. The concept of learnable classes are closely related to regularization in machine learning, and provides large sample justifications for certain learning algorithms. == Definition == === Background === Let Ω = X × Y = { ( x , y ) } {\displaystyle \Omega ={\mathcal {X}}\times {\mathcal {Y}}=\{(x,y)\}} be the sample space, where y {\displaystyle y} are the labels and x {\displaystyle x} are the covariates (predictors). F = { f : X ↦ Y } {\displaystyle {\mathcal {F}}=\{f:{\mathcal {X}}\mapsto {\mathcal {Y}}\}} is a collection of mappings (functions) under consideration to link x {\displaystyle x} to y {\displaystyle y} . L : Y × Y ↦ R {\displaystyle L:{\mathcal {Y}}\times {\mathcal {Y}}\mapsto \mathbb {R} } is a pre-given loss function (usually non-negative). Given a probability distribution P ( x , y ) {\displaystyle P(x,y)} on Ω {\displaystyle \Omega } , define the expected risk I P ( f ) {\displaystyle I_{P}(f)} to be: I P ( f ) = ∫ L ( f ( x ) , y ) d P ( x , y ) {\displaystyle I_{P}(f)=\int L(f(x),y)dP(x,y)} The general goal in statistical learning is to find the function in F {\displaystyle {\mathcal {F}}} that minimizes the expected risk. That is, to find solutions to the following problem: f ^ = arg ⁡ min f ∈ F I P ( f ) {\displaystyle {\hat {f}}=\arg \min _{f\in {\mathcal {F}}}I_{P}(f)} But in practice the distribution P {\displaystyle P} is unknown, and any learning task can only be based on finite samples. Thus we seek instead to find an algorithm that asymptotically minimizes the empirical risk, i.e., to find a sequence of functions { f ^ n } n = 1 ∞ {\displaystyle \{{\hat {f}}_{n}\}_{n=1}^{\infty }} that satisfies lim n → ∞ P ( I P ( f ^ n ) − inf f ∈ F I P ( f ) > ϵ ) = 0 {\displaystyle \lim _{n\rightarrow \infty }\mathbb {P} (I_{P}({\hat {f}}_{n})-\inf _{f\in {\mathcal {F}}}I_{P}(f)>\epsilon )=0} One usual algorithm to find such a sequence is through empirical risk minimization. === Learnable function class === We can make the condition given in the above equation stronger by requiring that the convergence is uniform for all probability distributions. That is: The intuition behind the more strict requirement is as such: the rate at which sequence { f ^ n } {\displaystyle \{{\hat {f}}_{n}\}} converges to the minimizer of the expected risk can be very different for different P ( x , y ) {\displaystyle P(x,y)} . Because in real world the true distribution P {\displaystyle P} is always unknown, we would want to select a sequence that performs well under all cases. However, by the no free lunch theorem, such a sequence that satisfies (1) does not exist if F {\displaystyle {\mathcal {F}}} is too complex. This means we need to be careful and not allow too "many" functions in F {\displaystyle {\mathcal {F}}} if we want (1) to be a meaningful requirement. Specifically, function classes that ensure the existence of a sequence { f ^ n } {\displaystyle \{{\hat {f}}_{n}\}} that satisfies (1) are known as learnable classes. It is worth noting that at least for supervised classification and regression problems, if a function class is learnable, then the empirical risk minimization automatically satisfies (1). Thus in these settings not only do we know that the problem posed by (1) is solvable, we also immediately have an algorithm that gives the solution. == Interpretations == If the true relationship between y {\displaystyle y} and x {\displaystyle x} is y ∼ f ∗ ( x ) {\displaystyle y\sim f^{}(x)} , then by selecting the appropriate loss function, f ∗ {\displaystyle f^{}} can always be expressed as the minimizer of the expected loss across all possible functions. That is, f ∗ = arg ⁡ min f ∈ F ∗ I P ( f ) {\displaystyle f^{}=\arg \min _{f\in {\mathcal {F}}^{}}I_{P}(f)} Here we let F ∗ {\displaystyle {\mathcal {F}}^{}} be the collection of all possible functions mapping X {\displaystyle {\mathcal {X}}} onto Y {\displaystyle {\mathcal {Y}}} . f ∗ {\displaystyle f^{}} can be interpreted as the actual data generating mechanism. However, the no free lunch theorem tells us that in practice, with finite samples we cannot hope to search for the expected risk minimizer over F ∗ {\displaystyle {\mathcal {F}}^{}} . Thus we often consider a subset of F ∗ {\displaystyle {\mathcal {F}}^{}} , F {\displaystyle {\mathcal {F}}} , to carry out searches on. By doing so, we risk that f ∗ {\displaystyle f^{}} might not be an element of F {\displaystyle {\mathcal {F}}} . This tradeoff can be mathematically expressed as In the above decomposition, part ( b ) {\displaystyle (b)} does not depend on the data and is non-stochastic. It describes how far away our assumptions ( F {\displaystyle {\mathcal {F}}} ) are from the truth ( F ∗ {\displaystyle {\mathcal {F}}^{}} ). ( b ) {\displaystyle (b)} will be strictly greater than 0 if we make assumptions that are too strong ( F {\displaystyle {\mathcal {F}}} too small). On the other hand, failing to put enough restrictions on F {\displaystyle {\mathcal {F}}} will cause it to be not learnable, and part ( a ) {\displaystyle (a)} will not stochastically converge to 0. This is the well-known overfitting problem in statistics and machine learning literature. == Example: Tikhonov regularization == A good example where learnable classes are used is the so-called Tikhonov regularization in reproducing kernel Hilbert space (RKHS). Specifically, let F ∗ {\displaystyle {\mathcal {F^{}}}} be an RKHS, and | | ⋅ | | 2 {\displaystyle ||\cdot ||_{2}} be the norm on F ∗ {\displaystyle {\mathcal {F^{}}}} given by its inner product. It is shown in that F = { f : | | f | | 2 ≤ γ } {\displaystyle {\mathcal {F}}=\{f:||f||_{2}\leq \gamma \}} is a learnable class for any finite, positive γ {\displaystyle \gamma } . The empirical minimization algorithm to the dual form of this problem is arg ⁡ min f ∈ F ∗ { ∑ i = 1 n L ( f ( x i ) , y i ) + λ | | f | | 2 } {\displaystyle \arg \min _{f\in {\mathcal {F}}^{}}\left\{\sum _{i=1}^{n}L(f(x_{i}),y_{i})+\lambda ||f||_{2}\right\}} This was first introduced by Tikhonov to solve ill-posed problems. Many statistical learning algorithms can be expressed in such a form (for example, the well-known ridge regression). The tradeoff between ( a ) {\displaystyle (a)} and ( b ) {\displaystyle (b)} in (2) is geometrically more intuitive with Tikhonov regularization in RKHS. We can consider a sequence of { F γ } {\displaystyle \{{\mathcal {F}}_{\gamma }\}} , which are essentially balls in F ∗ {\displaystyle {\mathcal {F^{}}}} with centers at 0. As γ {\displaystyle \gamma } gets larger, F γ {\displaystyle {\mathcal {F}}_{\gamma }} gets closer to the entire space, and ( b ) {\displaystyle (b)} is likely to become smaller. However we will also suffer smaller convergence rates in ( a ) {\displaystyle (a)} . The way to choose an optimal γ {\displaystyle \gamma } in finite sample settings is usually through cross-validation. == Relationship to empirical process theory == Part ( a ) {\displaystyle (a)} in (2) is closely linked to empirical process theory in statistics, where the empirical risk { ∑ i = 1 n L ( y i , f ( x i ) ) , f ∈ F } {\displaystyle \{\sum _{i=1}^{n}L(y_{i},f(x_{i})),f\in {\mathcal {F}}\}} are known as empirical processes. In this field, the function class F {\displaystyle {\mathcal {F}}} that satisfies the stochastic convergence are known as uniform Glivenko–Cantelli classes. It has been shown that under certain regularity conditions, learnable classes and uniformly Glivenko-Cantelli classes are equivalent. Interplay between ( a ) {\displaystyle (a)} and ( b ) {\displaystyle (b)} in statistics literature is often known as the bias-variance tradeoff. However, note that in the authors gave an example of stochastic convex optimization for General Setting of Learning where learnability is not equivalent with uniform convergence.

    Read more →
  • Emotion Markup Language

    Emotion Markup Language

    An Emotion Markup Language (EML or EmotionML) has first been defined by the W3C Emotion Incubator Group (EmoXG) as a general-purpose emotion annotation and representation language, which should be usable in a large variety of technological contexts where emotions need to be represented. Emotion-oriented computing (or "affective computing") is gaining importance as interactive technological systems become more sophisticated. Representing the emotional states of a user or the emotional states to be simulated by a user interface requires a suitable representation format; in this case a markup language is used. EmotionML version 1.0 was published by the group in May 2014. == Example == Here is an example of an EmotionML document describing emotions expressed in a video recording of the interaction between a teacher, Alice, and a student, Bob. == History == In 2006, a first W3C Incubator Group, the Emotion Incubator Group (EmoXG), was set up "to investigate a language to represent the emotional states of users and the emotional states simulated by user interfaces" with the final Report published on 10 July 2007. In 2007, the Emotion Markup Language Incubator Group (EmotionML XG) was set up as a follow-up to the Emotion Incubator Group, "to propose a specification draft for an Emotion Markup Language, to document it in a way accessible to non-experts, and to illustrate its use in conjunction with a number of existing markups." The final report of the Emotion Markup Language Incubator Group, Elements of an EmotionML 1.0, was published on 20 November 2008. The work then was continued in 2009 in the frame of the W3C's Multimodal Interaction Activity, with the First Public Working Draft of "Emotion Markup Language (EmotionML) 1.0" being published on 29 October 2009. The Last Call Working Draft of "Emotion Markup Language 1.0", was published on 7 April 2011. The Last Call Working Draft addressed all open issues that arose from feedback of the community on the First Call Working Draft as well as results of a workshop held in Paris in October 2010. Along with the Last Call Working Draft, a list of vocabularies for EmotionML has been published to aid developers using common vocabularies for annotating or representing emotions. Annual draft updates were published until the 1.0 version was finished in 2014. == Reasons for defining an emotion markup language == A standard for an emotion markup language would be useful for the following purposes: To enhance computer-mediated human-human or human-machine communication. Emotions are a basic part of human communication and should therefore be taken into account, e.g. in emotional Chat systems or emphatic voice boxes. This involves specification, analysis and display of emotion related states. To enhance systems' processing efficiency. Emotion and intelligence are strongly interconnected. The modeling of human emotions in computer processing can help to build more efficient systems, e.g. using emotional models for time-critical decision enforcement. To allow the analysis of non-verbal behavior, emotion, mental states that can be provided using web services to enable data collection, analysis, and reporting. Concrete examples of existing technology that could apply EmotionML include: Opinion mining / sentiment analysis in Web 2.0, to automatically track customer's attitude regarding a product across blogs; Affective monitoring, such as ambient assisted living applications, fear detection for surveillance purposes, or using wearable sensors to test customer satisfaction; Wellness technologies that provide assistance according to a person's emotional state with the goal to improve the person's well-being; Character design and control for games and virtual worlds; Building web services to capture, analysis, and report data of non-verbal behavior, emotion and mental states of an individual or group across the internet using standard web technologies such as HTML5 and JSON. Social robots, such as guide robots engaging with visitors; Expressive speech synthesis, generating synthetic speech with different emotions, such as happy or sad, friendly or apologetic; expressive synthetic speech would for example make more information available to blind and partially sighted people, and enrich their experience of the content; Emotion recognition (e.g., for spotting angry customers in speech dialog systems, to improve computer games or e-Learning applications); Support for people with disabilities, such as educational programs for people with autism. EmotionML can be used to make the emotional intent of content explicit. This would enable people with learning disabilities (such as Asperger syndrome) to realise the emotional context of the content; EmotionML can be used for media transcripts and captions. Where emotions are marked up to help deaf or hearing impaired people who cannot hear the soundtrack, more information is made available to enrich their experience of the content. The Emotion Incubator Group has listed 39 individual use cases for an Emotion markup language. A standardised way to mark up the data needed by such "emotion-oriented systems" has the potential to boost development primarily because data that was annotated in a standardised way can be interchanged between systems more easily, thereby simplifying a market for emotional databases, and the standard can be used to ease a market of providers for sub-modules of emotion processing systems, e.g. a web service for the recognition of emotion from text, speech or multi-modal input. == The challenge of defining a generally usable emotion markup language == Any attempt to standardize the description of emotions using a finite set of fixed descriptors is doomed to failure, as there is no consensus on the number of relevant emotions, on the names that should be given to them or how else best to describe them. For example, the difference between ":)" and "(:" is small, but using a standardized markup it would make one invalid. Even more basically, the list of emotion-related states that should be distinguished varies depending on the application domain and the aspect of emotions to be focused. Basically, the vocabulary needed depends on the context of use. On the other hand, the basic structure of concepts is less controversial: it is generally agreed that emotions involve triggers, appraisals, feelings, expressive behavior including physiological changes, and action tendencies; emotions in their entirety can be described in terms of categories or a small number of dimensions; emotions have an intensity, and so on. For details, see the Scientific Descriptions of Emotions in the Final Report of the Emotion Incubator Group. Given this lack of agreement on descriptors in the field, the only practical way of defining an emotion markup language is the definition of possible structural elements and to allow users to "plug in" vocabularies that they consider appropriate for their work. An additional challenge lies in the aim to provide a markup language that is generally usable. The requirements that arise from different use cases are rather different. Whereas manual annotation tends to require all the fine-grained distinctions considered in the scientific literature, automatic recognition systems can usually distinguish only a very small number of different states and affective avatars need yet another level of detail for expressing emotions in an appropriate way. For the reasons outlined here, it is clear that there is an inevitable tension between flexibility and interoperability, which need to be weighed in the formulation of an EmotionML. The guiding principle in the following specification has been to provide a choice only where it is needed, and to propose reasonable default options for every choice. == Applications and web services benefiting from an emotion markup language == There are a range of existing projects and applications to which an emotion markup language will enable the building of webservices to measure capture data of individuals non-verbal behavior, mental states, and emotions and allowing results to be reported and rendered in a standardized format using standard web technologies such as JSON and HTML5. One such project is measuring affect data across the Internet using EyesWeb.

    Read more →
  • Enumeration algorithm

    Enumeration algorithm

    In computer science, an enumeration algorithm is an algorithm that enumerates the answers to a computational problem. Formally, such an algorithm applies to problems that take an input and produce a list of solutions, similarly to function problems. For each input, the enumeration algorithm must produce the list of all solutions, without duplicates, and then halt. The performance of an enumeration algorithm is measured in terms of the time required to produce the solutions, either in terms of the total time required to produce all solutions, or in terms of the maximal delay between two consecutive solutions and in terms of a preprocessing time, counted as the time before outputting the first solution. This complexity can be expressed in terms of the size of the input, the size of each individual output, or the total size of the set of all outputs, similarly to what is done with output-sensitive algorithms. == Formal definitions == An enumeration problem P {\displaystyle P} is defined as a relation R {\displaystyle R} over strings of an arbitrary alphabet Σ {\displaystyle \Sigma } : R ⊆ Σ ∗ × Σ ∗ {\displaystyle R\subseteq \Sigma ^{}\times \Sigma ^{}} An algorithm solves P {\displaystyle P} if for every input x {\displaystyle x} the algorithm produces the (possibly infinite) sequence y {\displaystyle y} such that y {\displaystyle y} has no duplicate and z ∈ y {\displaystyle z\in y} if and only if ( x , z ) ∈ R {\displaystyle (x,z)\in R} . The algorithm should halt if the sequence y {\displaystyle y} is finite. == Common complexity classes == Enumeration problems have been studied in the context of computational complexity theory, and several complexity classes have been introduced for such problems. A very general such class is EnumP, the class of problems for which the correctness of a possible output can be checked in polynomial time in the input and output. Formally, for such a problem, there must exist an algorithm A which takes as input the problem input x, the candidate output y, and solves the decision problem of whether y is a correct output for the input x, in polynomial time in x and y. For instance, this class contains all problems that amount to enumerating the witnesses of a problem in the class NP. Other classes that have been defined include the following. In the case of problems that are also in EnumP, these problems are ordered from least to most specific: Output polynomial, the class of problems whose complete output can be computed in polynomial time. Incremental polynomial time, the class of problems where, for all i, the i-th output can be produced in polynomial time in the input size and in the number i. Polynomial delay, the class of problems where the delay between two consecutive outputs is polynomial in the input (and independent from the output). Strongly polynomial delay, the class of problems where the delay before each output is polynomial in the size of this specific output (and independent from the input or from the other outputs). The preprocessing is generally assumed to be polynomial. Constant delay, the class of problems where the delay before each output is constant, i.e., independent from the input and output. The preprocessing phase is generally assumed to be polynomial in the input. == Common techniques == Backtracking: The simplest way to enumerate all solutions is by systematically exploring the space of possible results (partitioning it at each successive step). However, performing this may not give good guarantees on the delay, i.e., a backtracking algorithm may spend a long time exploring parts of the space of possible results that do not give rise to a full solution. Flashlight search: This technique improves on backtracking by exploring the space of all possible solutions but solving at each step the problem of whether the current partial solution can be extended to a partial solution. If the answer is no, then the algorithm can immediately backtrack and avoid wasting time, which makes it easier to show guarantees on the delay between any two complete solutions. In particular, this technique applies well to self-reducible problems. Closure under set operations: If we wish to enumerate the disjoint union of two sets, then we can solve the problem by enumerating the first set and then the second set. If the union is non disjoint but the sets can be enumerated in sorted order, then the enumeration can be performed in parallel on both sets while eliminating duplicates on the fly. If the union is not disjoint and both sets are not sorted then duplicates can be eliminated at the expense of a higher memory usage, e.g., using a hash table. Likewise, the cartesian product of two sets can be enumerated efficiently by enumerating one set and joining each result with all results obtained when enumerating the second step. == Examples of enumeration problems == The vertex enumeration problem, where we are given a polytope described as a system of linear inequalities and we must enumerate the vertices of the polytope. Enumerating the minimal transversals of a hypergraph. This problem is related to monotone dualization and is connected to many applications in database theory and graph theory. Enumerating the answers to a database query, for instance a conjunctive query or a query expressed in monadic second-order. There have been characterizations in database theory of which conjunctive queries could be enumerated with linear preprocessing and constant delay. The problem of enumerating maximal cliques in an input graph, e.g., with the Bron–Kerbosch algorithm Listing all elements of structures such as matroids and greedoids Several problems on graphs, e.g., enumerating independent sets, paths, cuts, etc. Enumerating the satisfying assignments of representations of Boolean functions, e.g., a Boolean formula written in conjunctive normal form or disjunctive normal form, a binary decision diagram such as an OBDD, or a Boolean circuit in restricted classes studied in knowledge compilation, e.g., NNF. == Connection to computability theory == The notion of enumeration algorithms is also used in the field of computability theory to define some high complexity classes such as RE, the class of all recursively enumerable problems. This is the class of sets for which there exist an enumeration algorithm that will produce all elements of the set: the algorithm may run forever if the set is infinite, but each solution must be produced by the algorithm after a finite time.

    Read more →
  • Single source of truth

    Single source of truth

    In information science and information technology, single source of truth (SSOT) architecture, or single point of truth (SPOT) architecture, for information systems is the practice of structuring information models and associated data schemas such that every data element is mastered (or edited) in only one place, providing data normalization to a canonical form (for example, in database normalization or content transclusion). There are several scenarios with respect to copies and updates: The master data is never copied and instead only references to it are made; this means that all reads and updates go directly to the SSOT. The master data is copied but the copies are only read and only the master data is updated; if requests to read data are only made on copies, this is an instance of CQRS. The master data is copied and the copies are updated; this needs a reconciliation mechanism when there are concurrent updates. Updates on copies can be thrown out whenever a concurrent update is made on the master, so they are not considered fully committed until propagated to the master. (many blockchains work that way.) Concurrent updates are merged. (if an automatic merge fails, it could fall back on another strategy, which could be the previous strategy or something else like manual intervention, which most source version control systems do.) The advantages of SSOT architectures include easier prevention of mistaken inconsistencies (such as a duplicate value/copy somewhere being forgotten), and greatly simplified version control. Without a SSOT, dealing with inconsistencies implies either complex and error-prone consensus algorithms, or using a simpler architecture that's liable to lose data in the face of inconsistency (the latter may seem unacceptable but it is sometimes a very good choice; it is how most blockchains operate: a transaction is actually final only if it was included in the next block that is mined). Ideally, SSOT systems provide data that are authentic (and authenticatable), relevant, and referable. Deployment of an SSOT architecture is becoming increasingly important in enterprise settings where incorrectly linked duplicate or de-normalized data elements (a direct consequence of intentional or unintentional denormalization of any explicit data model) pose a risk for retrieval of outdated, and therefore incorrect, information. Common examples (i.e., example classes of implementation) are as follows: In electronic health records (EHRs), it is imperative to accurately validate patient identity against a single referential repository, which serves as the SSOT. Duplicate representations of data within the enterprise would be implemented by the use of pointers rather than duplicate database tables, rows, or cells. This ensures that data updates to elements in the authoritative location are comprehensively distributed to all federated database constituencies in the larger overall enterprise architecture. EHRs are an excellent class for exemplifying how SSOT architecture is both poignantly necessary and challenging to achieve: it is challenging because inter-organization health information exchange is inherently a cybersecurity competence hurdle, and nonetheless it is necessary, to prevent medical errors, to prevent the wasted costs of inefficiency (such as duplicated work or rework), and to make the primary care and medical home concepts feasible (to achieve competent care transitions). Single-source publishing as a general principle or ideal in content management relies on having SSOTs, via transclusion or (otherwise, at least) substitution. Substitution happens via libraries of objects that can be propagated as static copies which are later refreshed when necessary (that is, when refreshing of the copy-paste or import is triggered by a larger updating event). Component content management systems are a class of content management systems that aim to provide competence on this level. == Implementation == === Ontologic interactions === An acknowledged prerequisite (of the notion that any given single source of truth can exist) is that it depends on the ontologic condition that no more than a single truth (about any particular fact or idea) exists, an assertion that is ontologic in both the IT sense and the general sense of that word. In many instances, this presents no problem (for example, within particular namespaces, or even across them, as long as naming collisions or broader name conflicts are adequately handled). The broadest contexts (and thus thorniest, regarding ontologic discrepancies) require adequate epistemic regime comparison and reconciliation (or at least negotiation or transactional exchanges). An archetypal example of this class of reconciliation is that two theological seminary libraries, from two different religions (X and Y), could exchange information with an SSOT architecture, but the unification of truth would reside on the level of the statement that "religion X asserts that God is purple whereas religion Y asserts that God is green", rather than on the level of "God is purple" or "God is green". === Architectures or architectural features === An ideal implementation of SSOT is rarely possible in most enterprises. This is because many organisations have multiple information systems, each of which needs access to data relating to the same entities (e.g., customer). Often these systems are purchased as commercial off-the-shelf products from vendors and cannot be modified in trivial ways. Each of these various systems therefore needs to store its own version of common data or entities, and therefore each system must retain its own copy of a record (hence immediately violating the SSOT approach defined above). For example, an enterprise resource planning (ERP) system (such as SAP or Oracle e-Business Suite) may store a customer record; the customer relationship management (CRM) system also needs a copy of the customer record (or part of it) and the warehouse dispatch system might also need a copy of some or all of the customer data (e.g., shipping address). In cases where vendors do not support such modifications, it is not always possible to replace these records with pointers to the SSOT. For organisations (with more than one information system) wishing to implement a Single Source of Truth (without modifying all but one master system to store pointers to other systems for all entities), some supporting architectures are: Master data management (MDM) Event store and event sourcing (ES) ==== Master data management (MDM) ==== A master data management system typically serves as the source of truth for an organization's metadata, helping to ensure accuracy and consistency throughout that organizations multiple data sources. Typically the MDM acts as a hub for multiple systems, many of which could allow (be the source of truth for) updates to different aspects of information on a given entity. For example, the CRM system may be the "source of truth" for most aspects of the customer, and is updated by a call centre operator. However, a customer may (for example) also update their address via a customer service web site, with a different back-end database from the CRM system. The MDM application receives updates from multiple sources, acts as a broker to determine which updates are to be regarded as authoritative (the golden record) and then syndicates this updated data to all subscribing systems. The MDM application normally requires an ESB to syndicate its data to multiple subscribing systems. ==== Event store and event sourcing (ES) ==== In event oriented architectures, it has become increasingly common to find an implementation of the Event Sourcing pattern which stores the system state as an ordered sequence of state changes. To do this, you need an Event Store, a particular type of database designed to hold all the events that change the state of the system. The event store in an Event Sourcing + Command Query Responsibility Separation + Domain Driven Design + Messaging architecture is in fact a "single source of truth", with the additional advantage that it can also act as an Enterprise Service Bus as it can listen directly to the event store for status changes as everything passes by. In addition, by saving all the events, it also plays the role of Data Warehouse. One last advantage is that through this system the Shared Database pattern can be implemented, another technique not mentioned to obtain a single source of truth. ==== Data warehouse (DW) ==== While the primary purpose of a data warehouse is to support reporting and analysis of data that has been combined from multiple sources, the fact that such data has been combined (according to business logic embedded in the data transformation and integration processes) means that the data warehouse is often used as a de facto SSOT. Generally, however, the data available from the data warehouse are not used to update other systems; rather the DW becomes

    Read more →
  • Universal IR Evaluation

    Universal IR Evaluation

    In computer science, Universal IR Evaluation (information retrieval evaluation) aims to develop measures of database retrieval performance that shall be comparable across all information retrieval tasks. == Measures of "relevance" == IR (information retrieval) evaluation begins whenever a user submits a query (search term) to a database. If the user is able to determine the relevance of each document in the database (relevant or not relevant), then for each query, the complete set of documents is naturally divided into four distinct (mutually exclusive) subsets: relevant documents that are retrieved, not relevant documents that are retrieved, relevant documents that are not retrieved, and not relevant documents that are not retrieved. These four subsets (of documents) are denoted by the letters a, b, c, d respectively and are called Swets variables, named after their inventor. In addition to the Swets definitions, four relevance metrics have also been defined: Recall refers to the fraction of relevant documents that are retrieved (a/(a+b)), and Precision refers to the fraction of retrieved documents that are relevant (a/(a+c)). These are the most commonly used and well-known relevance metrics found in the IR evaluation literature. Two less commonly used metrics include the Fallout, i.e., the fraction of not relevant documents that are retrieved (b/(b+d)), and the Miss, which refers to the fraction of relevant documents that are not retrieved (c/(c+d)) during any given search. == Universal IR evaluation techniques == Universal IR evaluation addresses the mathematical possibilities and relationships among the four relevance metrics Precision, Recall, Fallout and Miss, denoted by P, R, F and M, respectively. One aspect of the problem involves finding a mathematical derivation of a complete set of universal IR evaluation points. The complete set of 16 points, each one a quadruple of the form (P, R, F, M), describes all the possible universal IR outcomes. For example, many of us have had the experience of querying a database and not retrieving any documents at all. In this case, the Precision would take on the undetermined form 0/0, the Recall and Fallout would both be zero, and the Miss would be any value greater than zero and less than one (assuming a mix of relevant and not relevant documents were in the database, none of which were retrieved). This universal IR evaluation point would thus be denoted by (0/0, 0, 0, M), which represents only one of the 16 possible universal IR outcomes. The mathematics of universal IR evaluation is a fairly new subject since the relevance metrics P, R, F, M were not analyzed collectively until recently (within the past decade). A lot of the theoretical groundwork has already been formulated, but new insights in this area await discovery.

    Read more →
  • Microsoft Office PerformancePoint Server

    Microsoft Office PerformancePoint Server

    Microsoft Office PerformancePoint Server is a business intelligence software product released in 2007 by Microsoft. The product was generally an integration of the acquisitions from ProClarity - the Planning Server and Monitoring Server - into Microsoft's SharePoint server product line. Although discontinued in 2009, the dashboard, scorecard, and analytics capabilities of PerformancePoint Server were incorporated into SharePoint 2010 and later versions. PerformancePoint Server also provided a planning and budgeting component directly integrated with Excel. == History == Microsoft offered preview releases of PerformancePoint Server starting in mid-2006. Previews of the product were formed from Business Scorecard Manager 2005 and the Planning Server component. Acquisitions ProClarity and Great Plains brought additional analytics and planning/reporting capabilities, as well as companion products ProClarity 6.3 and FRx. PerformancePoint Server was officially released in November 2007. Microsoft discontinued PerformancePoint Server as an independent product in 2009 and folded its dashboard, scorecard and analytics capabilities into PerformancePoint Services in SharePoint Server 2010. == Monitoring Server Component == Business monitoring capabilities, including dashboards, scorecards & key performance indicators, navigable reports for deeper analysis, strategy maps, and linked filtering, are provided by PerformancePoint's Monitoring Server component. A Dashboard Designer application that is distributed from Monitoring Server enables business analysts or IT Administrators to: create & test data source connections create views that use those data connections assemble the views into a dashboard deploy the dashboard as a SharePoint page Dashboard Designer saved content and security information back to the Monitoring Server. Data source connections, such as OLAP cubes or relational tables, were also made through Monitoring Server. After a dashboard has been published to the Monitoring Server database, it would be deployed as a SharePoint page and shared with other users as such. When the pages were opened in a web browser, Monitoring Server updated the data in the views by connecting back to the original data sources. == Planning Server Component == PerformancePoint's Planning Server component supported maintenance of logical business models, budget & approval workflows, enterprise data sources, and it followed Generally Accepted Accounting Principles. Planning Server made use of Excel for input and line-of-business reporting, as well as SQL Server for storing and processing business models. == Management Reporter Component == The Management Reporter component was designed to perform financial reporting and can read PerformancePoint Planning models directly. A development kit was also available to allow this component to read other models.

    Read more →
  • Operational historian

    Operational historian

    In manufacturing, an operational historian is a time-series database application that is developed for operational process data. Historian software is often embedded or used in conjunction with standard DCS and PLC control systems to provide enhanced data capture, validation, compression, and aggregation capabilities. Historians have been deployed in almost every industry and contribute to functions such as supervisory control, performance monitoring, quality assurance, and, more recently, machine learning applications which can learn from vast quantities of historical data. These systems were originally developed to capture instrumentation and control data, which led many to use the term "tag" for a stream of process data, referring to the physical "tags" which had been placed on instrumentation for manually capturing data. Raw data may be accessed via OPC HDA, SQL, or REST API interfaces. == Operational Support == Operational historians are typically used within the manufacturing facility by engineers and operators for supervisory functions and analysis. An operational historian will typically capture all instrumentation and control data, whereas an enterprise historian that is deployed to support business functions will capture only a subset of the plant data. Typically, these applications offer data access through dedicated APIs (Application Programming Interfaces) and SDKs (Software Development Kits) which offer high-performance read and write operations. These operate through vendor-specific or custom applications. Front-end tools for trending process data over time are the most common interfaces to these databases. Because these applications are typically deployed next to or near the source of their process data, they are often marketed and sold as 'real-time database systems.' This distinction varies among vendors, who often have to make tradeoffs in performance between data capture and presentation, and application and analysis functionality. The following is a list of typical challenges for operational historians: data collection from instrumentation and controls storage and archiving of very large volumes of data organization of data in the form of "tags" or "points" limiting of monitoring (alarms) and validation aggregation and interpolation manual data entry (MDE) == Data access == As opposed to enterprise historians, the data access layer in the operational historian is designed to offer sophisticated data fetching modes without complex information analysis facilities. The following settings are typically available for data access operations: Data scope (single point or tag, history based on time range, history based on sample count) Request modes (raw data, last-known value, aggregation, interpolation) Sampling (single point, all points without sampling, all points with interval sampling) Data omission (based on the sample quality, based on the sample value, based on the count) Even though the operational historians are rarely relational database management systems, they often offer SQL-based interfaces to query the database. In most of such implementations, the dialect does not follow the SQL standard in order to provide syntax for specifying data access operations parameters.

    Read more →
  • Algorithmic logic

    Algorithmic logic

    Algorithmic logic is a calculus of programs that allows the expression of semantic properties of programs by appropriate logical formulas. It provides a framework that enables proving the formulas from the axioms of program constructs such as assignment, iteration and composition instructions and from the axioms of the data structures in question see Mirkowska & Salwicki (1987), Banachowski et al. (1977). The following diagram helps to locate algorithmic logic among other logics. [ P r o p o s i t i o n a l l o g i c o r S e n t e n t i a l c a l c u l u s ] ⊂ [ P r e d i c a t e c a l c u l u s o r F i r s t o r d e r l o g i c ] ⊂ [ C a l c u l u s o f p r o g r a m s o r Algorithmic logic ] {\displaystyle \qquad \left[{\begin{array}{l}\mathrm {Propositional\ logic} \\or\\\mathrm {Sentential\ calculus} \end{array}}\right]\subset \left[{\begin{array}{l}\mathrm {Predicate\ calculus} \\or\\\mathrm {First\ order\ logic} \end{array}}\right]\subset \left[{\begin{array}{l}\mathrm {Calculus\ of\ programs} \\or\\{\mbox{Algorithmic logic}}\end{array}}\right]} The formalized language of algorithmic logic (and of algorithmic theories of various data structures) contains three types of well formed expressions: Terms - i.e. expressions denoting operations on elements of data structures, formulas - i.e. expressions denoting the relations among elements of data structures, programs - i.e. algorithms - these expressions describe the computations. For semantics of terms and formulas consult pages on first-order logic and Tarski's semantics. The meaning of a program K {\displaystyle K} is the set of possible computations of the program. Algorithmic logic is one of many logics of programs. Another logic of programs is dynamic logic, see dynamic logic, Harel, Kozen & Tiuryn (2000).

    Read more →
  • Bright Computing

    Bright Computing

    Bright Computing, Inc. was a developer of software for deploying and managing high-performance (HPC) clusters, Kubernetes clusters, and OpenStack private clouds in on-premises data centers as well as in the public cloud. In 2022, it was acquired by Nvidia. == History == Bright Computing was founded by Matthijs van Leeuwen in 2009, who spun the company out of ClusterVision, which he had co-founded with Alex Ninaber and Arijan Sauer. Alex and Matthijs had worked together at UK’s Compusys, which was one of the first companies to commercially build HPC clusters. They left Compusys in 2002 to start ClusterVision in the Netherlands, after determining there was a growing market for building and managing supercomputer clusters using off-the-shelf hardware components and open source software, tied together with their own customized scripts. ClusterVision also provided delivery and installation support services for HPC clusters at universities and government entities. In 2004, Martijn de Vries joined ClusterVision and began development of cluster management software. The software was made available to customers in 2008, under the name ClusterVisionOS v4. In 2009, Bright Computing was spun out of ClusterVision. ClusterVisionOS was renamed Bright Cluster Manager, and van Leeuwen was named Bright Computing’s CEO. In February 2016, Bright appointed Bill Wagner as chief executive officer. Matthijs van Leeuwen became chief strategy officer, and then left the company and board of directors in 2018. In January 2022 Bright was acquired by Nvidia. Nvidia cited using Bright's Amsterdam facility as a development center. The acquisition occurred after several layoffs under Bill Wagner. == Customers == Early customers included Boeing, Sandia National Laboratories, Virginia Tech, Hewlett Packard, NSA, and Drexel University. Many early customers were introduced through resellers, including SICORP, Cray, Dell, and Advanced HPC. As of 2019, the company had more than 700 customers, including more than fifty Fortune 500 Companies. == Products and services == Bright Cluster Manager for HPC lets customers deploy and manage complete clusters. It provides management for the hardware, the operating system, the HPC software, and users. In 2014, the company announced Bright OpenStack, software to deploy, provision, and manage OpenStack-based private cloud infrastructures. In 2016, Bright started bundling several machine learning frameworks and associated tools and libraries with the product, to make it very easy to get machine learning workload up and running on a Bright cluster. In December 2018, version 8.2 was released, which introduced support for the ARM64 architecture, edge capabilities to build clusters spread out over many different geographical locations, improved workload accounting & reporting features, as well as many improvements to Bright's integration with Kubernetes. Bright Cluster Manager software was frequently sold through original equipment manufacturer (OEM) resellers, including Dell and HPE. In version 10, Bright Cluster Manager was merged into the NVIDIA Base Command Manager. Bright Computing was covered by Software Magazine and Yahoo! Finance, among other publications. == Awards == In 2016, Bright Computing was awarded a €1.5M Horizon 2020 SME Instrument grant from the European Commission. Bright Computing was one of only 33 grant recipients from 960 submitted proposals. In its category only 5 out of 260 grants were awarded. 2015 HPCwire Editor’s Choice Award for “Best HPC Cluster Solution or Technology" Main Software 50 “Highest Growth” award winner, 2013 Deloitte Technology Fast50 “Rising Star 2013” award winner Bio-IT World Conference & Expo ‘13, Boston, MA, winner of “IT Hardware & Infrastructure” category of the “Best of Show Award” program Red Herring Top 100 Global Award, 2013

    Read more →
  • List of library and information science journals

    List of library and information science journals

    This list covers the journals, magazines, periodicals already published and continuing in the discipline of library and information science (LIS). It doesn't include ceased titles or predatory journals. Titles listed were taken from various scholarly sources, UGC Care and Wikipedia articles. == LIS journal prestige as assessed by LIS faculty == In a 2013 article by Laura Manzari, 232 LIS faculty members from ALA-accredited information science programs ranked the most prestigious journals in library and information science. The following journals were ranked in the top ten most prestigious: Journal of the Association for Information Science and Technology The Library Quarterly Annual Review of Information Science and Technology Journal of Documentation Library Trends Library and Information Science Research Information Processing and Management Journal of Education for Library and Information Science Education College & Research Libraries First Monday (journal) A subsequent study by Safón and Docampo in 2023 identified impactful LIS journals based on their influence on papers published in other LIS publications. Journals listed in the top ten in this study that did not appear in Manzari's list include: Scientometrics International Journal of Information Management Quantitative Science Studies MIS Quarterly Information and Management Journal of the Association for Information Systems Journal of Informetrics The Journal of Academic Librarianship == India == Annals of Library and Information Studies. (Pub: CSIR-NIScPR ), Formerly: Annals of Library Science. ISSN 0003-4835. (1954-) OPEN ACCESS Collnet Journal of Scientrometrics and Information Management (Pub: Taru Publications, Online through Taylor and Francis) ISSN: 0973-7766 Online 2168-930X. College Libraries (Pub: West Bengal College Librarians’ Association (WBCLA) ISSN 0972-1975, Quarterly DESIDOC Journal of Library and Information Technology (DJLIT) (Formerly: DESIDOC Bulletin 0970-8154, DESIDOC Bulletin of Information Technology. 0971-4383/0974-0643) (Pub: Defence Scientific Information & Documentation Centre) ISSN: 0974-0643, ISSN: 0976-4658 (O), Bi-monthly, OPEN ACCESS. Grandhalaya Sarvaswam (Bilingual: Telugu & English) [Pub: Andhra Pradesh Library Association, Vijayawada, Andhra Pradesh, India] (1915–) Gyankosh: Journal of Library and Information Management. (Pub: Integrated Academy Of Management And Technology. Through: Indian Journals.Com). ISSN: 2229-4023 (P), 2249-3182. Half yearly. IASLIC Bulletin (Pub: Indian Association of Special Libraries and Information Centres) ISSN: 0018-8411. Quarterly (1956-) IASLIC Newsletter (Pub: Indian Association of Special Libraries and Information Centres. (Pub: Indian Association of Special Libraries and Information Centres) ISSN 0018-845X. Monthly. (1966-) INFLIBNET Newsletter. (Pub: INFLIBNET). Monthly. Informatics Studies. (Pub: Centre For Informatics Research And Development). Quarterly. Through: Indian journals.com. ISSN: 2583-8994 (Online), 2320-530X (Print) ISST Journal of Advances in Librarianship (Pub:Intellectuals Society for Socio-Techno Welfare) ISSN: 0976-9021. Semiannual. Journal of Advanced Research in Library and Information Science. (JALIS Publishers). 4/year. ISSN 2277-2219. Journal of Indian Library Association (Pub: Indian Library Association). ISSN (P) 2277-5145 O) 2456-513X. Quarterly. (1965-). Journal of Scientometric Research. (Pub: Phcog.Net). ISSN (P) 2321-6654, (O) 2320-0057]; Frequency : Triannual. KELPRO Bulletin (Pub: Kerala Library Professionals' Organisation - KELPRO). ISSN 0975-4911( Print),2582-497X (O).(1993-) KIIT Journal of Library and Information Management (Pub: KIIT University, online through Indian Journals.com) Half yearly. ISSN: 2348-0858. Library Herald. (Pub: Delhi Library Association - DLA). Quarterly. ISSN: 0024-2292. Library Progress (International). (Pub: Bpas Publications, Through: ). Half yearly. ISSN: 0970-1052. (O) ISSN: 2320-317X. (1981-) Pearl: A Journal of Library and Information Science. (Pub: University Library Teacher's Association of Andhra Pradesh, Hyderabad), ISSN: 0973-7081 (print), 0975-6922 (online). Quarterly. RBU Journal of Library and Information Science. (Pub: Rabindra Bharati University).ISSN: 0972-2750. Annual. SALIS Journal of Information Management and Technology - SJIMT. (Pub: Society for the Advancement of Library and Information Science). Half-yearly. ISSN 0975-4105. SALIS Journal of Library and Information Science - SJLIS: an International Journal. (Pub: Society for the Advancement of Library and Information Science). Half-yearly. ISSN: 0973-3108. SRELS journal of Information and Knowledge (Formerly: Library Science with a Slant to Documentation, ISSN: 0024-2543; Library Science with a Slant to Documentation and Information Studies ISSN: 0970-6089; SRELS Journal of Information Management ISSN: ). Quarterly. ISSN: 2583-9314 (O) World Digital Libraries. Half yearly. ISSN: 0974-567X (P), 0975-7597 (O). == Other countries == African Journal of Library, Archives and Information Science Art Libraries Journal (Cambridge University Press) Bibliothèque de l'École des Chartes Canadian Journal of Information and Library Science Cataloging & Classification Quarterly Communications in Information Literacy Cataloging & Classification Quarterly Catholic Library Association Children and Libraries Code4Lib Journal College & Research Libraries Communications in Information Literacy Disability in Library and Information Studies Electronic Journal of Academic and Special Librarianship El Profesional de la Información (es) (EPI) (Formerly Information World en Español) Evidence Based Library and Information Practice (journal) Faslname-ye Ketab Florida Libraries. Florida Library Association. Georgia Library Quarterly. Quarterly. (Pub: Georgia Library Association). Hipertext.net IFLA Journal In the Library with the Lead Pipe Information & Culture International Journal of Information Retrieval Research (IJIRR) Information Processing and Management Information Research Information Sciences (journal) Information Visualization (journal) Information, Communication & Society International Journal of Geographical Information Science Information Research: An International Electronic Journal (IR) Internet Research (journal) Issues in Science and Technology Librarianship Italian Journal of Library and Information Studies (JLIS.it) JLIS.it Journal of Documentation (JDoc) Journal of Information Ethics Journal of Information Science (JIS) Journal of Information Technology Journal of Informetrics Journal of Librarianship and Information Science Journal of Library & Information Studies - JLIS. (Pub: National Taiwan University) Journal of Library Administration Journal of Religious & Theological Information Journal of the Association for Information Science and Technology (Formerly Journal of the American Society for Information Science and Technology) (JASIST) Journal of the Medical Library Association Journal of the Canadian Health Libraries Association (Pub: Canadian Health Libraries Association). Knowledge Organization (journal) Knowledge Quest. (Pub: American Association of School Librarians) Library and Information Science Abstracts Library Literature and Information Science Library, Information Science & Technology Abstracts Library Literature and Information Science Retrospective Library Review (journal) Library Trends Libri (journal) Malaysian Journal of Library and Information Science MLA Forum New Century Library New Review of Children's Literature and Librarianship Notes (journal) Portal – Libraries and the Academy Progressive Librarian, Progressive Librarians Guild Reference and User Services Quarterly Reference Services Review Research Evaluation (journal) Scientometrics (journal) Serials Review South African Journal of Libraries and Information Science The Charleston Advisor The Christian Librarian, from the Association of Christian Librarians The Journal of Academic Librarianship The Library Quarterly (LQ) The Public-Access Computer Systems Review TripleC Webolog

    Read more →
  • Best arm identification

    Best arm identification

    Best arm identification (BAI) is a sequential one-player game where the player has to find the best action (arm) among a list of actions (arms) by collecting information in the most efficient way. It is a multi-armed bandit game as a player only gets information about an arm by playing it. The most common objective in multi-armed bandit games is to minimize the regret (i.e., play the best action as much as possible), but in BAI, the goal is to find the best arm as efficiently as possible. This problem naturally arises in scenarios such as adaptive clinical trials where the number of patients is limited and the quantification of the confidence in a treatment is important. It also arises in hyperparameter optimization where the goal is to find the optimal choice of hyperparameters for an algorithm with the smallest possible number of experiments, as it can be costly in terms of time, energy, or money. == Stochastic multi-armed bandit == The stochastic multi-armed bandit (MAB) is a sequential game with one player and K {\displaystyle K} actions (arms). Each arm has an unknown probability distribution associated with it. At each turn, the player has to choose one action and receive an observation from the probability distribution associated with the arm. The more you play an arm, the more you get information on its probability distribution. === Best arm identification === In BAI the goal is to find the arm that has the probability distribution with the highest mean. BAI may be either fixed confidence or fixed horizon. In a fixed-confidence game, a confidence level δ {\displaystyle \delta } is fixed at the beginning of the game and the goal is to find the best arm with this confidence level in as few turns as possible. In a fixed horizon game, the number of turns T {\displaystyle T} is fixed, and the goal is to find the best arm with the highest possible confidence in T {\displaystyle T} turns. === Math formalisation === We have one player and K {\displaystyle K} actions (arms). Behind each arm k ∈ { 1 , … , K } {\displaystyle k\in \{1,\ldots ,K\}} lies an unknown distribution ν k {\displaystyle \nu _{k}} with mean μ k {\displaystyle \mu _{k}} . Each distribution ν k {\displaystyle \nu _{k}} belongs to a known family D {\displaystyle {\mathcal {D}}} (such as the set of Gaussian distributions or Bernoulli distributions). At each time step t {\displaystyle t} , the player selects an arm a t {\displaystyle a_{t}} and observes an independent sample X t ∼ ν a t {\displaystyle X_{t}\sim \nu _{a_{t}}} from the corresponding distribution. We will note μ ∗ := max μ a {\displaystyle \mu ^{}:=\max \mu _{a}} the highest mean. An arm a {\displaystyle a} that satisfies μ a = μ ∗ {\displaystyle \mu _{a}=\mu ^{}} is called an optimal arm; otherwise it is called suboptimal arm. In best arm identification (BAI) the objective is to identify an optimal arm. Two main settings for BAI appear in the literature: Fixed confidence: In this setting, one typically assumes that there exists a unique optimal arm. A confidence level δ ∈ ( 0 , 1 ) {\displaystyle \delta \in (0,1)} is specified at the beginning. The algorithm must stop at some finite stopping time τ δ < + ∞ {\displaystyle \tau _{\delta }<+\infty } and return an arm a ^ τ δ {\displaystyle {\hat {a}}_{\tau _{\delta }}} such that the probability of error is bounded: P ( a ^ τ δ ≠ a ∗ ) ≤ δ {\displaystyle \mathbb {P} ({\hat {a}}_{\tau _{\delta }}\neq a^{})\leq \delta } . The objective is to minimize the expected sample complexity E [ τ δ ] {\displaystyle \mathbb {E} [\tau _{\delta }]} . Such a setting appears, for example, when a constraint on the confidence is required (for example, if we require a confidence level of 95%, so δ = 1 − 0.95 = 0.05 {\displaystyle \delta =1-0.95=0.05} ). Fixed horizon: In this setting, the number of samples T {\displaystyle T} is fixed in advance. The goal is to design an algorithm that minimizes the probability of misidentifying the optimal arm: P ( a ^ T ≠ a ∗ ) {\displaystyle \mathbb {P} ({\hat {a}}_{T}\neq a^{})} . This setting appears when the number of experiments is limited (for drug tests, the number of patients can be fixed in advance). === Example of simple modelling === In the case where we have K {\displaystyle K} treatments and we want to be sure with a confidence level of 95% which treatment is the best to heal a specific disease. Each treatment heals or does not heal the disease with a probability μ k {\displaystyle \mu _{k}} , which means that each distribution is a Bernoulli distribution, so D {\displaystyle {\mathcal {D}}} is the set of Bernoulli distributions. We can use a BAI algorithm to minimize E [ τ 0.05 ] {\displaystyle \mathbb {E} [\tau _{0.05}]} , the number of patients required to find the best treatment with probability 95%. == Applications == Best arm identification naturally arises in several practical domains: Adaptive clinical trials: The objective is to identify the most effective treatment based on sequentially collected patient data. Each treatment can be modeled as having an underlying distribution of outcomes. The goal is to identify the treatment with the highest expected outcome with high confidence (fixed confidence setting δ {\displaystyle \delta } ) while minimizing the number of drug test patients (minimise E [ τ δ ] {\displaystyle \mathbb {E} [\tau _{\delta }]} ), as it costs to pay patients for this and we would like to use as little as possible less effective drugs. Hyperparameter tuning: Selecting the best configuration for machine learning models efficiently by treating each hyperparameter setting as an arm. The goal is to find the best hyperparameter with as few experiments possible as experiments are costly in time and in energy == Fixed confidence level == In the fixed-confidence setting, the goal is to design an algorithm that identifies the best arm with a prescribed confidence level δ {\displaystyle \delta } while minimizing the expected number of samples. Any such algorithm requires two key components: Stopping rule: A decision criterion that determines when to stop sampling. Formally, this defines a stopping time τ δ {\displaystyle \tau _{\delta }} and returns an arm a ^ τ δ {\displaystyle {\hat {a}}_{\tau _{\delta }}} such that P ( a ^ τ δ ≠ a ⋆ ) ≤ δ {\displaystyle \mathbb {P} ({\hat {a}}_{\tau _{\delta }}\neq a^{\star })\leq \delta } and P ( τ δ < + ∞ ) = 1 {\displaystyle \mathbb {P} (\tau _{\delta }<+\infty )=1} . Sampling rule: A policy π {\displaystyle \pi } that, at each round t {\displaystyle t} , selects the next arm to sample a t {\displaystyle a_{t}} based on all previous observations ( a s , X s ) s < t {\displaystyle (a_{s},X_{s})_{s Read more →

  • World Congress of Universal Documentation

    World Congress of Universal Documentation

    The World Congress of Universal Documentation was held from 16 to 21 August 1937 in Paris, France. Delegates from 45 countries met to discuss means by which all of the world's information, in print, in manuscript, and in other forms, could be efficiently organized and made accessible. == The Congress in the history of information science == The Congress, held at the Trocadéro under "the auspices" of the Institut International de Bibliographie, was "the apotheosis" of a general movement in the 1930s towards the classification of the growing mass of information and the improvement of access to that information. For the first time in the history of information science, technological means were beginning to catch up with theoretical ends, and the discussions at the conference reflected that fact. Its participation in the Congress was one of the first projects of the American Documentation Institute (ADI). Participants in the conference discussed what has been more recently called "a continuously updated hypertext encyclopedia." Joseph Reagle sees many of the ideas considered at the conference as forerunners of some of the key goals and norms of Wikipedia. == Microfilm == The main resolution adopted by the congress proposed that microfilm be used to make information universally available. Watson Davis, chairman of the American delegation and president of the ADI, stated that the volume of information being produced created difficult problems of access and preservation, but that these could be solved by the use of microfilm. In his address to the Congress, Davis said: Most immediate and practical to put into operation is the microfilming of material in libraries upon demand. It will become fashionable and economical to send a potential book borrower a little strip of microfilm for his permanent possession instead of the book and then badgering him to return it before he has had a chance to use it effectively. I believe that reading machines for microfilm will become as common as typewriters in studies and laboratories. If the principal libraries and information centers of the world will cooperate in such "bibliofilm services," as they are called, if they exchange orders and have essentially uniform methods, forms for ordering, standard microfilm format and production methods and comparable if not uniform prices, the resources of any library will be placed at the disposal of any scholar or scientist anywhere in the world. All the libraries cooperating will merge into one world library without loss of identity or individuality. The world's documentation will become available to even the most isolated and individualistic scholar. The Congress included two separate exhibits on microfilm. One was of the equipment used at the Bibliothèque nationale de France and the other, coordinated by Herman H. Fussler of the University of Chicago, consisting of "an entire microfilm laboratory," complete with cameras, a darkroom, and various kinds of reading machines. Emanuel Goldberg presented a paper on an early copying camera he had invented. Other resolutions passed by the Congress concerned uniform standards for the preparation of articles, for classifying books and other documents, for indexing newspapers and periodicals, and for cooperation between libraries. == H. G. Wells == In his address to the Congress, H. G. Wells said that he thought that his idea of the "world brain" was a precursor to the ideas other delegates were proposing, and explicitly linked the projects being discussed to the work of the encyclopédistes: I am speaking of a process of mental organization throughout the world which I believe to be as inevitable as anything can be in human affairs. All the distresses and horrors of the present time are fundamentally intellectual. The world has to pull its mind together, and this [Congress] is the beginning of its efforts. Civilization is a Phoenix. It perishes in flames and even as it dies it is born again. This synthesis of knowledge upon which you are working is the necessary beginning of a new world. It is good to be meeting here in Paris where the first encyclopedia of power was made. It would be impossible to overrate our debt to Diderot and his associates. == Other participants == Participants in the Congress included authors, librarians, scholars, archivists, scientists, and editors. Some of the notable people in attendance not mentioned above were:

    Read more →
  • Electronic business

    Electronic business

    Electronic business (also known as online business or e-business) is any kind of business or commercial activity that includes sharing information across the internet. Commerce constitutes the exchange of products and services between businesses, groups, and individuals; and can be seen as one of the essential activities of any business. E-commerce focuses on the use of ICT to enable the external activities and relationships of the business with individuals, groups, and other organizations, while e-business does not only deal with online commercial operations of enterprises, but also deals with their other organizational matters such as human resource management and production. The term "e-business" was coined by IBM's marketing and Internet team in 1996. == Market participants == Electronic business can take place between a very large number of market participants; it can be between business and consumer, private individuals, public administrations, or any other organizations such as non-governmental organizations (NGOs). These various market participants can be divided into three main groups: Business (B) Consumer (C) Administration (A) All of them can be either buyers or service providers within the market. There are nine possible combinations for electronic business relationships. B2C and B2B belong to E-commerce, while A2B and A2A belong to the E-government sector which is also a part of the electronic business. == History == One of the founding pillars of electronic business was the development of the Electronic Data Interchange (EDI) electronic data interchange. This system replaced traditional mailing and faxing of documents with a digital transfer of data from one computer to another, without any human intervention. Michael Aldrich is considered the developer of the predecessor to online shopping. In 1979, the entrepreneur connected a television set to a transaction processing computer with a telephone line and called it "teleshopping", meaning shopping at distance. From the mid-nineties, major advancements were made in the commercial use of the Internet. Amazon, which launched in 1995, started as an online bookstore and grew to become nowadays the largest online retailer worldwide, selling food, toys, electronics, apparel and more. Other successful stories of online marketplaces include eBay or Etsy. In 1994, IBM, with its agency Ogilvy & Mather, began to use its foundation in IT solutions and expertise to market itself as a leader of conducting business on the Internet through the term "e-business." Then CEO Louis V. Gerstner, Jr. was prepared to invest $1 billion to market this new brand. After conducting worldwide market research in October 1997, IBM began with an eight-page piece in The Wall Street Journal that would introduce the concept of "e-business" and advertise IBM's expertise in the new field. IBM decided not to trademark the term "e-business" in the hopes that other companies would use the term and create an entirely new industry. However, this proved to be too successful and by 2000, to differentiate itself, IBM launched a $300 million campaign about its "e-business infrastructure" capabilities. Since that time, the terms, "e-business" and "e-commerce" have been loosely interchangeable and have become a part of the common vernacular. According to the U.S. Department Of Commerce, the estimated retail e-commerce sales in Q1 2020 were representing almost 12% of total U.S. retail sales, against 4% for Q1 2010. == Business model == The transformation toward e-business is complex and in order for it to succeed, there is a need to balance between strategy, an adapted business model (e-intermediary, marketplaces), right processes (sales, marketing) and technology (Supply Chain Management, Customer Relationship Management). When organizations go online, they have to decide which e-business models best suit their goals. A business model is defined as the organization of product, service and information flows, and the source of revenues and benefits for suppliers and customers. The concept of the e-business model is the same but used in online presence. === Revenue model === A key component of the business model is the revenue model or profit model, which is a framework for generating revenues. It identifies which revenue source to pursue, what value to offer, how to price the value, and who pays for the value. It is a key component of a company's business model. It primarily identifies what product or service will be created in order to generate revenues and the ways in which the product or service will be sold. Without a well-defined revenue model, that is, a clear plan of how to generate revenues, new businesses will more likely struggle due to costs that they will not be able to sustain. By having a revenue model, a business can focus on a target audience, fund development plans for a product or service, establish marketing plans, begin a line of credit and raise capital. ==== E-commerce ==== E-commerce (short for "electronic commerce") is trading in products or services using computer networks, such as the Internet. Electronic commerce draws on technologies such as mobile commerce, electronic funds transfer, supply chain management, Internet marketing, online transaction processing, electronic data interchange (EDI), inventory management systems, and automated data collection. Modern electronic commerce typically uses the World Wide Web for at least one part of the transaction's life cycle, although it may also use other technologies such as e-mail. == Concerns == While much has been written of the economic advantages of Internet-enabled commerce, there is also evidence that some aspects of the internet such as maps and location-aware services may serve to reinforce economic inequality and the digital divide. Electronic commerce may be responsible for consolidation and the decline of mom-and-pop, brick and mortar businesses resulting in increases in income inequality. === Security === E-business systems naturally have greater security risks than traditional business systems, therefore it is important for e-business systems to be fully protected against these risks. A far greater number of people have access to e-businesses through the internet than would have access to a traditional business. Customers, suppliers, employees, and numerous other people use any particular e-business system daily and expect their confidential information to stay secure. Hackers are one of the great threats to the security of e-businesses. Some common security concerns for e-Businesses include keeping business and customer information private and confidential, the authenticity of data, and data integrity. Some of the methods of protecting e-business security and keeping information secure include physical security measures as well as data storage, data transmission, anti-virus software, firewalls, and encryption to list a few. ==== Privacy and confidentiality ==== Confidentiality is the extent to which businesses makes personal information available to other businesses and individuals. With any business, confidential information must remain secure and only be accessible to the intended recipient. However, this becomes even more difficult when dealing with e-businesses specifically. To keep such information secure means protecting any electronic records and files from unauthorized access, as well as ensuring safe transmission and data storage of such information. Tools such as encryption and firewalls manage this specific concern within e-business. ==== Authenticity ==== E-business transactions pose greater challenges for establishing authenticity due to the ease with which electronic information may be altered and copied. Both parties in an e-business transaction want to have the assurance that the other party is who they claim to be, especially when a customer places an order and then submits a payment electronically. One common way to ensure this is to limit access to a network or trusted parties by using a virtual private network (VPN) technology. The establishment of authenticity is even greater when a combination of techniques are used, and such techniques involve checking "something you know" (i.e. password or PIN), "something you need" (i.e. credit card), or "something you are" (i.e. digital signatures or voice recognition methods). Many times in e-business, however, "something you are" is pretty strongly verified by checking the purchaser's "something you have" (i.e. credit card) and "something you know" (i.e. card number). ==== Data integrity ==== Data integrity answers the question "Can the information be changed or corrupted in any way?" This leads to the assurance that the message received is identical to the message sent. A business needs to be confident that data is not changed in transit, whether deliberately or by accident. To help with data integrity, firewalls protect stored data against unauthorized access, while

    Read more →
  • Information strategist

    Information strategist

    An information strategist analyses the information flow within an organisation and directs its information resources to better serve the organisation's strategic goals. They work with information technology or within a corporate library to direct high quality information from a variety of sources to users, based upon their profiles and needs. In warfare, information strategists not only seek to improve information flows for their own side but also try to disrupt the information flows of the enemy in order to demoralize and deceive them.

    Read more →
  • Friendly artificial intelligence

    Friendly artificial intelligence

    Friendly artificial intelligence (friendly AI or FAI) is hypothetical artificial general intelligence (AGI) that would have a positive (benign) effect on humanity or at least align with human interests such as fostering the improvement of the human species. It is a part of the ethics of artificial intelligence and is closely related to machine ethics. While machine ethics is concerned with how an artificially intelligent agent should behave, friendly artificial intelligence research is focused on how to practically bring about this behavior and ensuring it is adequately constrained. == Etymology and usage == The term was coined by Eliezer Yudkowsky, who is best known for popularizing the idea, to discuss superintelligent artificial agents that reliably implement human values. Stuart J. Russell and Peter Norvig's leading artificial intelligence textbook, Artificial Intelligence: A Modern Approach, describes the idea: Yudkowsky (2008) goes into more detail about how to design a Friendly AI. He asserts that friendliness (a desire not to harm humans) should be designed in from the start, but that the designers should recognize both that their own designs may be flawed, and that the robot will learn and evolve over time. Thus the challenge is one of mechanism design—to define a mechanism for evolving AI systems under a system of checks and balances, and to give the systems utility functions that will remain friendly in the face of such changes. "Friendly" is used in this context as technical terminology, and picks out agents that are safe and useful, not necessarily ones that are "friendly" in the colloquial sense. The concept is primarily invoked in the context of discussions of recursively self-improving artificial agents that rapidly explode in intelligence, on the grounds that this hypothetical technology would have a large, rapid, and difficult-to-control impact on human society. == Risks of unfriendly AI == The roots of concern about artificial intelligence are very old. Kevin LaGrandeur showed that the dangers specific to AI can be seen in ancient literature concerning artificial humanoid servants such as the golem, or the proto-robots of Gerbert of Aurillac and Roger Bacon. In those stories, the extreme intelligence and power of these humanoid creations clash with their status as slaves (which by nature are seen as sub-human), and cause disastrous conflict. By 1942 these themes prompted Isaac Asimov to create the "Three Laws of Robotics"—principles hard-wired into all the robots in his fiction, intended to prevent them from turning on their creators, or allowing them to come to harm. In modern times as the prospect of superintelligent AI looms nearer, philosopher Nick Bostrom has said that superintelligent AI systems with goals that are not aligned with human ethics are intrinsically dangerous unless extreme measures are taken to ensure the safety of humanity. He put it this way: Basically we should assume that a 'superintelligence' would be able to achieve whatever goals it has. Therefore, it is extremely important that the goals we endow it with, and its entire motivation system, is 'human friendly.' In 2008, Eliezer Yudkowsky called for the creation of "friendly AI" to mitigate existential risk from advanced artificial intelligence. He explains: "The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else." Steve Omohundro says that a sufficiently advanced AI system will, unless explicitly counteracted, exhibit a number of basic "drives", such as resource acquisition, self-preservation, and continuous self-improvement, because of the intrinsic nature of any goal-driven systems and that these drives will, "without special precautions", cause the AI to exhibit undesired behavior. Alexander Wissner-Gross says that AIs driven to maximize their future freedom of action (or causal path entropy) might be considered friendly if their planning horizon is longer than a certain threshold, and unfriendly if their planning horizon is shorter than that threshold. Luke Muehlhauser, writing for the Machine Intelligence Research Institute, recommends that machine ethics researchers adopt what Bruce Schneier has called the "security mindset": Rather than thinking about how a system will work, imagine how it could fail. For instance, he suggests even an AI that only makes accurate predictions and communicates via a text interface might cause unintended harm. In 2014, Luke Muehlhauser and Nick Bostrom underlined the need for 'friendly AI'; nonetheless, the difficulties in designing a 'friendly' superintelligence, for instance via programming counterfactual moral thinking, are considerable. == Coherent extrapolated volition == Yudkowsky advances the Coherent Extrapolated Volition (CEV) model. According to him, our coherent extrapolated volition is "our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted". Rather than a Friendly AI being designed directly by human programmers, it is to be designed by a "seed AI" programmed to first study human nature and then produce the AI that humanity would want, given sufficient time and insight, to arrive at a satisfactory answer. The appeal to an objective through contingent human nature (perhaps expressed, for mathematical purposes, in the form of a utility function or other decision-theoretic formalism), as providing the ultimate criterion of "Friendliness", is an answer to the meta-ethical problem of defining an objective morality; extrapolated volition is intended to be what humanity objectively would want, all things considered, but it can only be defined relative to the psychological and cognitive qualities of present-day, unextrapolated humanity. == Other approaches == Steve Omohundro has proposed a "scaffolding" approach to AI safety, in which one provably safe AI generation helps build the next provably safe generation. Seth Baum argues that the development of safe, socially beneficial artificial intelligence or artificial general intelligence is a function of the social psychology of AI research communities and so can be constrained by extrinsic measures and motivated by intrinsic measures. Intrinsic motivations can be strengthened when messages resonate with AI developers; Baum argues that, in contrast, "existing messages about beneficial AI are not always framed well". Baum advocates for "cooperative relationships, and positive framing of AI researchers" and cautions against characterizing AI researchers as "not want(ing) to pursue beneficial designs". In his book Human Compatible, AI researcher Stuart J. Russell lists three principles to guide the development of beneficial machines. He emphasizes that these principles are not meant to be explicitly coded into the machines; rather, they are intended for the human developers. The principles are as follows: The machine's only objective is to maximize the realization of human preferences. The machine is initially uncertain about what those preferences are. The ultimate source of information about human preferences is human behavior. The "preferences" Russell refers to "are all-encompassing; they cover everything you might care about, arbitrarily far into the future." Similarly, "behavior" includes any choice between options, and the uncertainty is such that some probability, which may be quite small, must be assigned to every logically possible human preference. == Public policy == James Barrat, author of Our Final Invention, suggested that "a public-private partnership has to be created to bring A.I.-makers together to share ideas about security—something like the International Atomic Energy Agency, but in partnership with corporations." He urges AI researchers to convene a meeting similar to the Asilomar Conference on Recombinant DNA, which discussed risks of biotechnology. John McGinnis encourages governments to accelerate friendly AI research. Because the goalposts of friendly AI are not necessarily eminent, he suggests a model similar to the National Institutes of Health, where "Peer review panels of computer and cognitive scientists would sift through projects and choose those that are designed both to advance AI and assure that such advances would be accompanied by appropriate safeguards." McGinnis feels that peer review is better "than regulation to address technical issues that are not possible to capture through bureaucratic mandates". McGinnis notes that his proposal stands in contrast to that of the Machine Intelligence Research Institute, which generally aims to avoid government involvement in friendly AI. == Criticism == Some critics believe that both human-level AI and superintelligence are unlikely and that, therefore, friendly AI is unlik

    Read more →