AI Artists On Spotify

AI Artists On Spotify — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • AI agent

    In the context of generative artificial intelligence, AI agents (also referred to as compound AI systems or agentic AI) are a class of intelligent agents that can pursue goals, use tools, and take actions with varying degrees of autonomy. In practice, they usually operate within human-defined objectives, constraints, and available tools.

    == Overview ==

    AI agents possess several key attributes, including goal-directed behavior, natural language interfaces, the capacity to use external tools, and the ability to perform multi-step tasks. Their control flow is frequently driven by large language models (LLMs). Agent systems may also include memory components, planning logic, tool interfaces, and orchestration software for coordinating agent components. AI agents do not have a standard definition. NIST describes agentic AI as an emerging area requiring standards for secure operation, interoperability, and reliable interaction with external systems.

    A common application of AI agents is task automation: for example, booking travel plans based on a user's prompted request. Companies such as Google, Microsoft and Amazon Web Services have offered platforms for deploying pre-built AI agents. Several protocols have been proposed for standardizing inter-agent communication, including the Model Context Protocol and Gibberlink, among many others. Some of these protocols are also used for connecting agents to external applications. In December 2025, the Linux Foundation announced the formation of the Agentic AI Foundation (AAIF), with the goal of ensuring agentic AI evolves transparently and collaboratively.

    == History ==

    AI agents have been traced back to research from the 1990s, with Harvard professor Milind Tambe noting that the definition of an AI agent was not clear at the time. Researcher Andrew Ng has been credited with spreading the term "agentic" to a wider audience in 2024.
    == Training and testing ==

    Researchers have attempted to build world models and reinforcement learning environments to train or evaluate AI agents. For example, video games such as Minecraft and No Man's Sky, as well as replicas of company websites, have been used for training such agents.

    == Autonomous capabilities ==

    The Financial Times compared the autonomy of AI agents to the SAE classification of self-driving cars, likening most applications to level 2 or level 3, with some achieving level 4 in highly specialized circumstances, and level 5 being theoretical.

    == Cognitive architecture ==

    The following are some internal design options for reasoning within an agent:

      • Retrieval-augmented generation.
      • The ReAct (Reason + Act) pattern, an iterative process in which an AI agent alternates between reasoning and taking actions, receives observations from the environment or external tools, and integrates these observations into subsequent reasoning steps.
      • Reflexion, which uses an LLM to create feedback on the agent's plan of action and stores that feedback in a memory cache.
      • A tool/agent registry, for organizing software functions or other agents that the agent can use.
      • One-shot model querying, which queries the model once to create the plan of action.

    === Reference architecture ===

    Ken Huang proposed an AI agent reference architecture, which consists of seven interconnected layers, with each layer building on the functionality of the layers beneath it:

      • Layer 1: Foundation models. Provide the core AI engines to power agent capabilities.
      • Layer 2: Data operations. Manage the complex data infrastructure required for AI agent operations, including vector databases, data loaders, and RAG.
      • Layer 3: Agent frameworks. Sophisticated software and tools that simplify the development and management of AI agents.
      • Layer 4: Deployment and infrastructure. Provide the robust technical foundation for running AI agents.
      • Layer 5: Evaluation and observability. Focus on assessing the safety and performance of AI agents.
      • Layer 6: Security and compliance. A crucial protective framework ensuring AI agents operate safely, securely, and within regulatory boundaries. At this layer, the security and compliance features embedded in all the other layers of the AI agent stack are integrated together.
      • Layer 7: Agent ecosystem. Represents the AI agents' interface with real-world applications and users.

    == Orchestration patterns ==

    To execute complex tasks, autonomous agents are often integrated with other agents or specialized tools. These configurations, known as orchestration patterns or workflows, include the following:

      • Prompt chaining: A sequence where the output of one step serves as the input for the next.
      • Routing: The classification of an input to direct it to a specialized downstream task or tool.
      • Parallelization: The simultaneous execution of multiple tasks.
      • Sequential processing: A fixed, linear progression of tasks through a predefined pipeline.
      • Planner-critic: An iterative pattern where one agent generates a proposal and another evaluates it to provide feedback for refinement.

    == Multimodal AI agents ==

    In addition to large language models (LLMs), vision-language models (VLMs) and multimodal foundation models can be used as the basis for agents. In September 2024, the Allen Institute for AI released an open-source vision-language model. Nvidia released a framework for developers to use VLMs, LLMs and retrieval-augmented generation for building AI agents that can analyze images and videos, including video search and video summarization. Microsoft released a multimodal agent model (trained on images, video, software user interface interactions, and robotics data) that the company claimed can manipulate software and robots.

    == Applications ==

    As of April 2025, per the Associated Press, there were few real-world applications of AI agents.
    As of June 2025, per Fortune, many companies were primarily experimenting with AI agents. The Information divided AI agents into seven archetypes: business-task agents, for acting within enterprise software; conversational agents, which act as chatbots for customer support; research agents, for querying and analyzing information (such as OpenAI Deep Research); analytics agents, for analyzing data to create reports; software developer or coding agents (such as Cursor); domain-specific agents, which include specific subject matter knowledge; and web browser agents (such as OpenAI Operator). By mid-2025, AI agents had been used in video game development, gambling (including sports betting), cryptocurrency wallets (including cryptocurrency trading and meme coins) and social media. In August 2025, New York Magazine described software development as the most definitive use case of AI agents. Likewise, in October 2025, The Information, noting a decline in expectations, identified AI coding agents and customer support as the primary use cases for businesses. In November 2025, The Wall Street Journal reported that few companies that had deployed AI agents had received a return on investment.

    === Applications in government ===

    Several government bodies in the United States and United Kingdom have deployed or announced the deployment of agents, at the local and national level. The city of Kyle, Texas deployed an AI agent from Salesforce in March 2025 for 311 customer service. In November 2025, the Internal Revenue Service stated that it would use Agentforce, AI agents from Salesforce, for the Office of Chief Counsel, Taxpayer Advocate Services and the Office of Appeals. That same month, Staffordshire Police announced that they would trial Agentforce agents for handling non-emergency 101 calls in the United Kingdom starting in 2026.
In December 2025, the Department of Neighborhoods in Detroit, Michigan, in partnership with a local business, deployed a pilot project in two Detroit districts for an AI agent to be used for customer service calls. In February 2025, Thomas Shedd, the director of the Technology Transformation Services, proposed using AI coding agents across the United States federal government. A recruiter for the Department of Government Efficiency proposed in April 2025 to use AI agents to automate the work of about 70,000 United States federal government employees, as part of a startup with funding from OpenAI and a partnership agreement with Palantir. This proposal was criticized by experts for its impracticality, if not impossibility, and the lack of corresponding widespread adoption by businesses. In December 2025, the Food and Drug Administration announced that it would offer "agentic AI capabilities" to its staff for "meeting management, pre-market reviews, review validation, post-market surveillance, inspections and compliance and administrative functions." That same month, the United States Department of Defense launched GenAI.mil, an internal platform for American military personnel to use generative AI-based applications based on Google Gemini, including "intelligent agentic workflows". Defense Secretary Pete Hegseth listed applications such as "[conducting] deep r
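Returning to the ReAct pattern listed under cognitive architecture, the reason, act, observe loop can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the tool registry `TOOLS`, the function names, and especially `scripted_policy` (a hard-coded stand-in for the LLM call) are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tool registry: tool name -> function taking one string argument.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(part) for part in arg.split("+"))),
    "upper": lambda arg: arg.upper(),
}

Step = Tuple[str, str, str]  # (thought, action, observation)


def scripted_policy(goal: str, history: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for the LLM: returns (thought, action, argument).

    A real agent would prompt a model with the goal and the observations so
    far; this stub makes one tool call and then finishes.
    """
    if not history:
        return ("I need the sum first.", "add", goal)
    last_observation = history[-1][2]
    return ("I have the result.", "finish", last_observation)


def react_loop(goal: str, policy, tools, max_steps: int = 5) -> Tuple[Optional[str], List[Step]]:
    """Alternate reasoning and acting until the policy chooses 'finish'."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, argument = policy(goal, history)
        if action == "finish":
            return argument, history
        tool = tools.get(action)
        observation = tool(argument) if tool else f"unknown tool: {action}"
        # The observation is fed back into the next reasoning step.
        history.append((thought, action, observation))
    return None, history  # step budget exhausted without an answer


answer, trace = react_loop("2+3+4", scripted_policy, TOOLS)
```

The `max_steps` cap reflects the point made earlier that agents usually operate within human-defined constraints; without it, a policy that never emits `finish` would loop indefinitely.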

  • Information seeking

    Information seeking is the process or activity of attempting to obtain information in both human and technological contexts. Information seeking is related to, but different from, information retrieval (IR).

    == Compared to information retrieval ==

    Traditionally, IR tools have been designed for IR professionals to enable them to effectively and efficiently retrieve information from a source. It is assumed that the information exists in the source and that a well-formed query will retrieve it (and nothing else). It has been argued that laypersons' information seeking on the internet is very different from information retrieval as performed within the IR discourse. Yet, internet search engines are built on IR principles. Since the late 1990s a body of research on how casual users interact with internet search engines has been forming, but the topic is far from fully understood. IR can be said to be technology-oriented, focusing on algorithms and issues such as precision and recall. Information seeking may be understood as a more human-oriented and open-ended process than information retrieval. In information seeking, one does not know whether there exists an answer to one's query, so the process of seeking may provide the learning required to satisfy one's information need.

    == In different contexts ==

    Much library and information science (LIS) research has focused on the information-seeking practices of practitioners within various fields of professional work. Studies have been carried out into the information-seeking behaviors of librarians, academics, medical professionals, engineers, lawyers and mini-publics (among others). Much of this research has drawn on the work done by Leckie, Pettigrew (now Fisher) and Sylvain, who in 1996 conducted an extensive review of the LIS literature (as well as the literature of other academic fields) on professionals' information seeking.
    The authors proposed an analytic model of professionals' information seeking behaviour, intended to be generalizable across the professions, thus providing a platform for future research in the area. The model was intended to "prompt new insights... and give rise to more refined and applicable theories of information seeking" (1996, p. 188). The model was adapted by Wilkinson (2001), who proposed a model of the information seeking of lawyers. Recent studies on this topic address the concept of information-gathering, which "provides a broader perspective that adheres better to professionals' work-related reality and desired skills" (Solomon & Bronstein, 2021).

    == Theories of information-seeking behavior ==

    A variety of theories of information behavior (e.g. Zipf's Principle of Least Effort, Brenda Dervin's Sense Making, Elfreda Chatman's Life in the Round) seek to understand the processes that surround information seeking. In addition, many theories from other disciplines have been applied in investigating an aspect or the whole process of information seeking behavior. A review of the literature on information seeking behavior shows that information seeking has generally been accepted as dynamic and non-linear (Foster, 2005; Kuhlthau, 2006). People experience the information search process as an interplay of thoughts, feelings and actions (Kuhlthau, 2006). Donald O. Case (2007) also published a book-length review of the literature. Information seeking has been found to be linked to a variety of interpersonal communication behaviors beyond question-asking, including strategies such as candidate answers. Robinson's (2010) research suggests that when seeking information at work, people rely on both other people and information repositories (e.g., documents and databases), and spend similar amounts of time consulting each (7.8% and 6.4% of work time, respectively; 14.2% in total).
    However, the distribution of time among the constituent information seeking stages differs depending on the source. When consulting other people, people spend less time locating the information source and the information within that source, similar time understanding the information, and more time problem solving and decision making, than when consulting information repositories. Furthermore, the research found that people spend substantially more time receiving information passively (i.e., information that they have not requested) than actively (i.e., information that they have requested), and this pattern is also reflected when they provide others with information.

    == Wilson's nested model of conceptual areas ==

    The concepts of information seeking, information retrieval, and information behaviour are objects of investigation in information science. Within this discipline, a variety of studies have analyzed the interaction of an individual with information sources in the case of a specific information need, task, and context. The research models developed in these studies vary in scope. Wilson (1999) therefore developed a nested model of conceptual areas, which visualizes the interrelation of the central concepts mentioned here. Wilson defines models of information behavior to be "statements, often in the form of diagrams, that attempt to describe an information-seeking activity, the causes and consequences of that activity, or the relationships among stages in information-seeking behaviour" (1999: 250).

  • Least-squares spectral analysis

    Least-squares spectral analysis (LSSA) is a class of methods for estimating a frequency spectrum by fitting sinusoids to data using a least-squares fit. Unlike Fourier analysis, the most widely used spectral method in science, LSSA does not require the data to be equally spaced. Furthermore, while Fourier analysis generally amplifies long-period noise in long or gapped records, LSSA mitigates such problems. The first strictly least-squares LSSA method was developed in 1969 and 1971, and is known as the Vaníček method or the Gauss–Vaníček method, after its inventor Petr Vaníček and Carl Friedrich Gauss, the inventor of the least-squares method for error minimization. A widely known LSSA variant is the Lomb method or the Lomb–Scargle periodogram, based on now-dated computational simplifications of the Vaníček method introduced in the 1970s and 1980s, first by Nicholas R. Lomb and later by Jeffrey D. Scargle. Other LSSA variants have subsequently been developed.

    == Historical background ==

    The close connections between Fourier analysis, the periodogram, and the least-squares fitting of sinusoids have been known for a long time. However, most developments are restricted to complete data sets of equally spaced samples. In 1963, Freek J. M. Barning of Mathematisch Centrum, Amsterdam, handled unequally spaced data by similar techniques, including both a periodogram analysis equivalent to what is nowadays called the Lomb method and least-squares fitting of selected frequencies of sinusoids determined from such periodograms, connected by a procedure known today as matching pursuit with post-backfitting or orthogonal matching pursuit. In 1969, Petr Vaníček, a Canadian geophysicist and geodesist of the University of New Brunswick, also proposed the matching-pursuit approach for equally and unequally spaced data, which he called "successive spectral analysis", with the result a "least-squares periodogram".
    In 1971, he generalized this method to account for any systematic components beyond a simple mean, such as a "predicted linear (quadratic, exponential, ...) secular trend of unknown magnitude", and applied it to a variety of samples. Vaníček's strictly least-squares method was then simplified in 1976 by Nicholas R. Lomb of the University of Sydney, who pointed out its close connection to periodogram analysis. Subsequently, the definition of a periodogram of unequally spaced data was modified and analyzed by Jeffrey D. Scargle of NASA Ames Research Center, who showed that, with minor changes, it becomes identical to Lomb's least-squares formula for fitting individual sinusoid frequencies. Scargle states that his paper "does not introduce a new detection technique, but instead studies the reliability and efficiency of detection with the most commonly used technique, the periodogram, in the case where the observation times are unevenly spaced," and further points out, regarding least-squares fitting of sinusoids compared to periodogram analysis, that his paper "establishes, apparently for the first time, that (with the proposed modifications) these two methods are exactly equivalent."

    Press summarizes the development this way: "A completely different method of spectral analysis for unevenly sampled data, one that mitigates these difficulties and has some other very desirable properties, was developed by Lomb, based in part on earlier work by Barning and Vanicek, and additionally elaborated by Scargle."

    In 1989, Michael J. Korenberg of Queen's University in Kingston, Ontario, developed the "fast orthogonal search" method of more quickly finding a near-optimal decomposition of spectra or other problems, similar to the technique that later became known as orthogonal matching pursuit.
    == Development of LSSA and variants ==

    === The Vaníček method ===

    In the Vaníček method, a discrete data set is approximated by a weighted sum of sinusoids of progressively determined frequencies using a standard linear regression or least-squares fit. The frequencies are chosen using a method similar to Barning's, but going further in optimizing the choice of each successive new frequency by picking the frequency that minimizes the residual after least-squares fitting (equivalent to the fitting technique now known as matching pursuit with pre-backfitting). The number of sinusoids must be less than or equal to the number of data samples (counting sines and cosines of the same frequency as separate sinusoids). The relationship between the DFT and the approximation of trigonometric functions using the least-squares method is explained by Strutz (2017).

    A data vector $\phi$ is represented as a weighted sum of sinusoidal basis functions, tabulated in a matrix $\mathbf{A}$ by evaluating each function at the sample times, with weight vector $x$:

        $$\phi \approx \mathbf{A}x,$$

    where the weight vector $x$ is chosen to minimize the sum of squared errors in approximating $\phi$. The solution for $x$ is closed-form, using standard linear regression:

        $$x = (\mathbf{A}^{\mathrm{T}}\mathbf{A})^{-1}\mathbf{A}^{\mathrm{T}}\phi.$$

    Here the matrix $\mathbf{A}$ can be based on any set of functions that are mutually independent (not necessarily orthogonal) when evaluated at the sample times; functions used for spectral analysis are typically sines and cosines evenly distributed over the frequency range of interest. If too many frequencies are chosen in too narrow a frequency range, the functions will be insufficiently independent, the matrix ill-conditioned, and the resulting spectrum meaningless.
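The closed-form solution above can be made concrete for the smallest useful case: one frequency, so that the matrix has just two columns (a cosine and a sine evaluated at the sample times). The sketch below is an illustration under that assumption, not Vaníček's full successive procedure; the function name `fit_sinusoid` and the sample data are made up for the example.

```python
import math


def fit_sinusoid(times, values, omega):
    """Least-squares fit of values ~ a*cos(omega*t) + b*sin(omega*t).

    Forms the entries of A^T A and A^T phi for the two-column matrix A and
    solves the 2x2 normal equations by Cramer's rule.
    """
    c = [math.cos(omega * t) for t in times]
    s = [math.sin(omega * t) for t in times]
    cc = sum(ci * ci for ci in c)              # entries of A^T A
    ss = sum(si * si for si in s)
    cs = sum(ci * si for ci, si in zip(c, s))
    cy = sum(ci * y for ci, y in zip(c, values))   # entries of A^T phi
    sy = sum(si * y for si, y in zip(s, values))
    det = cc * ss - cs * cs
    if abs(det) < 1e-12:
        raise ValueError("basis functions are not independent at the sample times")
    a = (cy * ss - sy * cs) / det
    b = (sy * cc - cy * cs) / det
    return a, b


# Unevenly spaced sample times and a noise-free test signal (illustrative data).
times = [0.0, 0.7, 1.1, 2.9, 3.4, 5.2, 6.0, 7.3]
omega = 1.3
values = [2.0 * math.cos(omega * t) - 0.5 * math.sin(omega * t) for t in times]

a, b = fit_sinusoid(times, values, omega)
```

Because the sample times are uneven, the cross term `cs` is generally nonzero, which is exactly why the full inverse of $\mathbf{A}^{\mathrm{T}}\mathbf{A}$ is needed here. The `det` check corresponds to the ill-conditioning warning in the text.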
    When the basis functions in $\mathbf{A}$ are orthogonal (that is, not correlated, meaning the columns have zero pair-wise dot products), the matrix $\mathbf{A}^{\mathrm{T}}\mathbf{A}$ is diagonal; when the columns all have the same power (sum of squares of elements), then that matrix is an identity matrix times a constant, so the inversion is trivial. The latter is the case when the sample times are equally spaced and the sinusoids are chosen as sines and cosines equally spaced in pairs on the frequency interval 0 to a half cycle per sample (spaced by 1/N cycles per sample, omitting the sine phases at 0 and maximum frequency where they are identically zero). This case is known as the discrete Fourier transform, slightly rewritten in terms of measurements and coefficients:

        $$x = \mathbf{A}^{\mathrm{T}}\phi$$

    (the DFT case for N equally spaced samples and frequencies, within a scalar factor).

    === The Lomb method ===

    Trying to lower the computational burden of the Vaníček method in 1976 (no longer an issue), Lomb proposed using the above simplification in general, except for pair-wise correlations between sine and cosine bases of the same frequency, since the correlations between pairs of sinusoids are often small, at least when they are not tightly spaced. This formulation is essentially that of the traditional periodogram but adapted for use with unevenly spaced samples. The vector $x$ is a reasonably good estimate of an underlying spectrum, but since any correlations are ignored, $\mathbf{A}x$ is no longer a good approximation to the signal, and the method is no longer a least-squares method, although the literature continues to refer to it as such.
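The orthogonality condition for equally spaced samples, which the Lomb simplification assumes approximately in the general case, can be checked numerically. The sketch below builds the cosine and sine columns for N = 8 equally spaced samples and computes the Gram matrix $\mathbf{A}^{\mathrm{T}}\mathbf{A}$; the helper name `gram` and the choice N = 8 are illustrative.

```python
import math


def gram(columns):
    """Return A^T A for a matrix given as a list of its columns."""
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]


N = 8
samples = range(N)
columns = []
for k in range(0, N // 2 + 1):          # frequencies k/N cycles per sample
    w = 2 * math.pi * k / N
    columns.append([math.cos(w * n) for n in samples])
    if 0 < k < N // 2:                  # sine is identically zero at k = 0 and k = N/2
        columns.append([math.sin(w * n) for n in samples])

G = gram(columns)
size = len(G)
off_diag = max(abs(G[i][j]) for i in range(size) for j in range(size) if i != j)
diagonal = [G[i][i] for i in range(size)]
```

Here `off_diag` is zero up to rounding, so the Gram matrix is diagonal. In this unnormalized construction the zero-frequency and half-cycle columns have power N while the others have power N/2, so the "scalar factor" is per column unless those two columns are rescaled.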
    Rather than just taking dot products of the data with sine and cosine waveforms directly, Scargle modified the standard periodogram formula so as to first find a time delay $\tau$ such that this pair of sinusoids would be mutually orthogonal at sample times $t_j$, and also adjusted for the potentially unequal powers of these two basis functions, to obtain a better estimate of the power at a frequency. This procedure made his modified periodogram method exactly equivalent to Lomb's method. The time delay $\tau$ is defined by

        $$\tan 2\omega\tau = \frac{\sum_j \sin 2\omega t_j}{\sum_j \cos 2\omega t_j}.$$

    Then the periodogram at frequency $\omega$ is estimated as

        $$P_x(\omega) = \frac{1}{2}\left[\frac{\left[\sum_j X_j \cos\omega(t_j-\tau)\right]^2}{\sum_j \cos^2\omega(t_j-\tau)} + \frac{\left[\sum_j X_j \sin\omega(t_j-\tau)\right]^2}{\sum_j \sin^2\omega(t_j-\tau)}\right],$$

    which, as Scargle reports, has the same statistical distribution as the periodogram in the evenly sampled case.

    At any individual frequency $\omega$, this method gives the same power as does a least-squares fit to sinusoids of that frequency and of the form

        $$\phi(t) = A\sin\omega t + B\cos\omega t.$$

    In practice, it is always difficult to judge if a given Lomb peak is significant or not, especially when the nature of the noise is unknown, so for example a false-alarm spectr
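The periodogram formula in this section translates almost line by line into code. The following is a minimal sketch using only the standard library; the function name `lomb_scargle_power` and the sample data are illustrative, and the usual preprocessing step of subtracting the mean from the data is skipped because the test signal here is a pure sinusoid.

```python
import math


def lomb_scargle_power(times, values, omega):
    """Scargle's modified periodogram at angular frequency omega."""
    # Time delay tau from tan(2*omega*tau) = sum(sin 2wt) / sum(cos 2wt).
    sin2 = sum(math.sin(2 * omega * t) for t in times)
    cos2 = sum(math.cos(2 * omega * t) for t in times)
    tau = math.atan2(sin2, cos2) / (2 * omega)

    # Shifted cosine and sine are mutually orthogonal at the sample times.
    c = [math.cos(omega * (t - tau)) for t in times]
    s = [math.sin(omega * (t - tau)) for t in times]
    xc = sum(x * ci for x, ci in zip(values, c))
    xs = sum(x * si for x, si in zip(values, s))
    cc = sum(ci * ci for ci in c)
    ss = sum(si * si for si in s)
    return 0.5 * (xc * xc / cc + xs * xs / ss)


# Unevenly spaced sample times and a noise-free sinusoid (illustrative data).
times = [0.0, 0.7, 1.1, 2.9, 3.4, 5.2, 6.0, 7.3]
true_omega = 1.3
values = [2.0 * math.cos(true_omega * t) - 0.5 * math.sin(true_omega * t) for t in times]

power_true = lomb_scargle_power(times, values, true_omega)
power_other = lomb_scargle_power(times, values, 2.9)
half_energy = 0.5 * sum(x * x for x in values)
```

Since the test signal lies exactly in the span of the sine and cosine at `true_omega`, a least-squares fit there leaves no residual and `power_true` equals half the sum of squares of the data, which illustrates the stated equivalence with a single-frequency least-squares fit. At any other frequency the power is smaller.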

  • Enumeration algorithm

    In computer science, an enumeration algorithm is an algorithm that enumerates the answers to a computational problem. Formally, such an algorithm applies to problems that take an input and produce a list of solutions, similarly to function problems. For each input, the enumeration algorithm must produce the list of all solutions, without duplicates, and then halt. The performance of an enumeration algorithm is measured in terms of the time required to produce the solutions: either the total time required to produce all solutions, or the maximal delay between two consecutive solutions together with a preprocessing time, counted as the time before outputting the first solution. This complexity can be expressed in terms of the size of the input, the size of each individual output, or the total size of the set of all outputs, similarly to what is done with output-sensitive algorithms.

    == Formal definitions ==

    An enumeration problem $P$ is defined as a relation $R$ over strings of an arbitrary alphabet $\Sigma$:

        $$R \subseteq \Sigma^{*} \times \Sigma^{*}$$

    An algorithm solves $P$ if for every input $x$ the algorithm produces the (possibly infinite) sequence $y$ such that $y$ has no duplicates and $z \in y$ if and only if $(x, z) \in R$. The algorithm should halt if the sequence $y$ is finite.

    == Common complexity classes ==

    Enumeration problems have been studied in the context of computational complexity theory, and several complexity classes have been introduced for such problems. A very general such class is EnumP, the class of problems for which the correctness of a possible output can be checked in polynomial time in the input and output.
    Formally, for such a problem, there must exist an algorithm A which takes as input the problem input x and the candidate output y, and solves the decision problem of whether y is a correct output for the input x, in polynomial time in x and y. For instance, this class contains all problems that amount to enumerating the witnesses of a problem in the class NP.

    Other classes that have been defined include the following. In the case of problems that are also in EnumP, these classes are ordered from least to most specific:

      • Output polynomial, the class of problems whose complete output can be computed in polynomial time.
      • Incremental polynomial time, the class of problems where, for all i, the i-th output can be produced in polynomial time in the input size and in the number i.
      • Polynomial delay, the class of problems where the delay between two consecutive outputs is polynomial in the input (and independent from the output).
      • Strongly polynomial delay, the class of problems where the delay before each output is polynomial in the size of this specific output (and independent from the input or from the other outputs). The preprocessing is generally assumed to be polynomial.
      • Constant delay, the class of problems where the delay before each output is constant, i.e., independent from the input and output. The preprocessing phase is generally assumed to be polynomial in the input.

    == Common techniques ==

      • Backtracking: The simplest way to enumerate all solutions is by systematically exploring the space of possible results (partitioning it at each successive step). However, this may not give good guarantees on the delay, i.e., a backtracking algorithm may spend a long time exploring parts of the space of possible results that do not give rise to a full solution.
      • Flashlight search: This technique improves on backtracking by exploring the space of all possible solutions but solving at each step the problem of whether the current partial solution can be extended to a full solution. If the answer is no, then the algorithm can immediately backtrack and avoid wasting time, which makes it easier to show guarantees on the delay between any two complete solutions. In particular, this technique applies well to self-reducible problems.
      • Closure under set operations: If we wish to enumerate the disjoint union of two sets, then we can solve the problem by enumerating the first set and then the second set. If the union is not disjoint but the sets can be enumerated in sorted order, then the enumeration can be performed in parallel on both sets while eliminating duplicates on the fly. If the union is not disjoint and the sets are not sorted, then duplicates can be eliminated at the expense of higher memory usage, e.g., using a hash table. Likewise, the Cartesian product of two sets can be enumerated efficiently by enumerating one set and joining each result with all results obtained when enumerating the second set.

    == Examples of enumeration problems ==

      • The vertex enumeration problem, where we are given a polytope described as a system of linear inequalities and we must enumerate the vertices of the polytope.
      • Enumerating the minimal transversals of a hypergraph. This problem is related to monotone dualization and is connected to many applications in database theory and graph theory.
      • Enumerating the answers to a database query, for instance a conjunctive query or a query expressed in monadic second-order logic. There have been characterizations in database theory of which conjunctive queries can be enumerated with linear preprocessing and constant delay.
      • Enumerating maximal cliques in an input graph, e.g., with the Bron–Kerbosch algorithm.
      • Listing all elements of structures such as matroids and greedoids.
      • Several problems on graphs, e.g., enumerating independent sets, paths, cuts, etc.
      • Enumerating the satisfying assignments of representations of Boolean functions, e.g., a Boolean formula written in conjunctive normal form or disjunctive normal form, a binary decision diagram such as an OBDD, or a Boolean circuit in restricted classes studied in knowledge compilation, e.g., NNF.

    == Connection to computability theory ==

    The notion of enumeration algorithms is also used in the field of computability theory to define some high complexity classes such as RE, the class of all recursively enumerable problems. This is the class of sets for which there exists an enumeration algorithm that will produce all elements of the set: the algorithm may run forever if the set is infinite, but each solution must be produced by the algorithm after a finite time.
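To make the flashlight search technique from the list of common techniques concrete, here is a small sketch for one of the graph problems mentioned among the examples: enumerating all paths from a source to a target. It assumes the input graph is a directed acyclic graph given as an adjacency dictionary; the function names and the example graph are hypothetical. The "flashlight" check asks whether the target is still reachable from the next vertex, which a single preprocessing pass answers for every vertex.

```python
from typing import Dict, Iterator, List, Set


def can_reach(graph: Dict[str, List[str]], target: str) -> Set[str]:
    """Preprocessing: the set of vertices from which target is reachable."""
    reverse: Dict[str, List[str]] = {}
    for u, successors in graph.items():
        for v in successors:
            reverse.setdefault(v, []).append(u)
    seen = {target}
    stack = [target]
    while stack:
        v = stack.pop()
        for u in reverse.get(v, []):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


def enumerate_paths(graph: Dict[str, List[str]], source: str, target: str) -> Iterator[List[str]]:
    """Yield every source-to-target path of a DAG, without duplicates."""
    good = can_reach(graph, target)

    def extend(path: List[str]) -> Iterator[List[str]]:
        last = path[-1]
        if last == target:
            yield list(path)     # complete solution (no cycles, so stop here)
            return
        for nxt in graph.get(last, []):
            if nxt in good:      # flashlight check: this branch leads to a solution
                path.append(nxt)
                yield from extend(path)
                path.pop()

    if source in good:
        yield from extend([source])


# Example DAG: the branch through "x" is a dead end and is never explored.
graph = {"s": ["a", "b", "x"], "a": ["t", "b"], "b": ["t"], "x": ["y"], "y": [], "t": []}
paths = list(enumerate_paths(graph, "s", "t"))
```

Plain backtracking would also walk into `x` and `y` before giving up. With the check, every branch taken ends in a solution, so the work between two consecutive outputs is bounded by the path length times the maximum out-degree, independent of how many dead-end vertices the graph contains.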

  • Automaton

    An automaton (pl.: automata or automatons) is a relatively self-operating machine or control mechanism designed to automatically follow a sequence of operations or respond to predetermined instructions. Some automata, such as bellstrikers in mechanical clocks, are designed to give the illusion to the casual observer that they are operating under their own power or will, like a mechanical robot. The term has long been commonly associated with automated puppets that resemble moving humans or animals, built to impress and/or to entertain people. Animatronics are a modern type of automata with electronics, often used for the portrayal of characters or creatures in films and in theme park attractions.

    == Etymology ==

    The word automaton is the latinization of the Ancient Greek automaton (αὐτόματον), which means "acting of one's own will". It was first used by Homer to describe an automatic door opening, or automatic movement of wheeled tripods. It is more often used to describe non-electronic moving machines, especially those that have been made to resemble human or animal actions, such as the jacks on old public striking clocks, or the cuckoo and any other animated figures on a cuckoo clock.

    == History ==

    === Ancient ===

    There are many examples of automata in Greek mythology: Hephaestus created automata for his workshop; Talos was an artificial man of bronze; King Alkinous of the Phaiakians employed gold and silver watchdogs. According to Aristotle, Daedalus used quicksilver to make his wooden statue of Aphrodite move. In other Greek legends he used quicksilver to install voice in his moving statues.

    The automata in the Hellenistic world were intended as tools, toys, religious spectacles, or prototypes for demonstrating basic scientific principles. Numerous water-powered automata were built by Ktesibios, a Greek inventor and the first head of the Great Library of Alexandria; for example, he "used water to sound a whistle and make a model owl move.
He had invented the world's first 'cuckoo clock'". This tradition continued in Alexandria with inventors such as the Greek mathematician Hero of Alexandria (sometimes known as Heron), whose writings on hydraulics, pneumatics, and mechanics described siphons, a fire engine, a water organ, the aeolipile, and a programmable cart. Philo of Byzantium was famous for his inventions. Complex mechanical devices are known to have existed in Hellenistic Greece, though the only surviving example is the Antikythera mechanism, the earliest known analog computer. The clockwork is thought to have come originally from Rhodes, where there was apparently a tradition of mechanical engineering; the island was renowned for its automata; to quote Pindar's seventh Olympic Ode: "The animated figures stand / Adorning every public street / And seem to breathe in stone, or move their marble feet." However, the information gleaned from recent scans of the fragments indicates that it may have come from the colonies of Corinth in Sicily and implies a connection with Archimedes. According to Jewish legend, King Solomon used his wisdom to design a throne with mechanical animals which hailed him as king when he ascended it; upon sitting down an eagle would place a crown upon his head, and a dove would bring him a Torah scroll. It is also said that when King Solomon stepped upon the throne, a mechanism was set in motion. As soon as he stepped upon the first step, a golden ox and a golden lion each stretched out one foot to support him and help him rise to the next step. On each side, the animals helped the King up until he was comfortably seated upon the throne. In ancient China, a curious account of automata is found in the Lie Zi text, believed to have originated around 400 BCE and compiled around the fourth century CE. Within it there is a description of a much earlier encounter between King Mu of Zhou (1023–957 BCE) and a mechanical engineer known as Yan Shi, an 'artificer'. 
The latter proudly presented the king with a very realistic and detailed life-size, human-shaped figure of his mechanical handiwork: The king stared at the figure in astonishment. It walked with rapid strides, moving its head up and down, so that anyone would have taken it for a live human being. The artificer touched its chin, and it began singing, perfectly in tune. He touched its hand, and it began posturing, keeping perfect time...As the performance was drawing to an end, the robot winked its eye and made advances to the ladies in attendance, whereupon the king became incensed and would have had Yen Shih [Yan Shi] executed on the spot had not the latter, in mortal fear, instantly taken the robot to pieces to let him see what it really was. And, indeed, it turned out to be only a construction of leather, wood, glue and lacquer, variously coloured white, black, red and blue. Examining it closely, the king found all the internal organs complete—liver, gall, heart, lungs, spleen, kidneys, stomach and intestines; and over these again, muscles, bones and limbs with their joints, skin, teeth and hair, all of them artificial...The king tried the effect of taking away the heart, and found that the mouth could no longer speak; he took away the liver and the eyes could no longer see; he took away the kidneys and the legs lost their power of locomotion. The king was delighted. Other notable examples of automata include Archytas' dove, mentioned by Aulus Gellius. Similar Chinese accounts of flying automata are written of the 5th century BC Mohist philosopher Mozi and his contemporary Lu Ban, who made artificial wooden birds (ma yuan) that could successfully fly according to the Han Fei Zi and other texts. === Medieval === The manufacturing tradition of automata continued in the Greek world well into the Middle Ages. 
On his visit to Constantinople in 949 ambassador Liutprand of Cremona described automata in the emperor Theophilos' palace, including "lions, made either of bronze or wood covered with gold, which struck the ground with their tails and roared with open mouth and quivering tongue," "a tree of gilded bronze, its branches filled with birds, likewise made of bronze gilded over, and these emitted cries appropriate to their species" and "the emperor's throne" itself, which "was made in such a cunning manner that at one moment it was down on the ground, while at another it rose higher and was to be seen up in the air." Similar automata in the throne room (singing birds, roaring and moving lions) were described by Liutprand's contemporary the Byzantine emperor Constantine Porphyrogenitus, in his book De Ceremoniis (Perì tês Basileíou Tákseōs). In the mid-8th century, the first wind-powered automata were built: "statues that turned with the wind over the domes of the four gates and the palace complex of the Round City of Baghdad". The "public spectacle of wind-powered statues had its private counterpart in the 'Abbasid palaces where automata of various types were predominantly displayed." Also in the 8th century, the Muslim alchemist Jābir ibn Hayyān (Geber) included recipes for constructing artificial snakes, scorpions, and humans that would be subject to their creator's control in his coded Book of Stones. In 827, Abbasid caliph al-Ma'mun had a silver and golden tree in his palace in Baghdad, which had the features of an automatic machine. There were metal birds that sang automatically on the swinging branches of this tree built by Muslim inventors and engineers. The Abbasid caliph al-Muqtadir also had a silver and golden tree in his palace in Baghdad in 917, with birds on it flapping their wings and singing. In the 9th century, the Banū Mūsā brothers invented a programmable automatic flute player, which they described in their Book of Ingenious Devices. 
Al-Jazari described complex programmable humanoid automata amongst other machines he designed and constructed in the Book of Knowledge of Ingenious Mechanical Devices in 1206. His automaton was a boat with four automatic musicians that floated on a lake to entertain guests at royal drinking parties. His mechanism had a programmable drum machine with pegs (cams) that bump into little levers that operate the percussion. The drummer could be made to play different rhythms and drum patterns if the pegs were moved around. Al-Jazari constructed a hand washing automaton first employing the flush mechanism now used in modern toilets. It features a female automaton standing by a basin filled with water. When the user pulls the lever, the water drains and the automaton refills the basin. His "peacock fountain" was another more sophisticated hand washing device featuring humanoid automata as servants who offer soap and towels. Mark E. Rosheim describes it as follows: "Pulling a plug on the peacock's tail releases water out of the beak; as the dirty water from the basin fills the hollow base a float rises and actuates a linkage which makes a servant figure appear from behind a door under the peacock and offer soap.

  • Vector-field consistency

    Vector-field consistency

    Vector-Field Consistency is a consistency model for replicated data (for example, objects), initially described in a paper which was awarded the best-paper prize in the ACM/IFIP/Usenix Middleware Conference 2007. It has since been enhanced for increased scalability and fault-tolerance in a later paper. == Description == This consistency model was initially designed for replicated data management in ad hoc gaming in order to minimize bandwidth usage without sacrificing playability. Intuitively, it captures the notion that although players require, wish, and take advantage of information regarding the whole of the game world (as opposed to a restricted view of rooms, arenas, etc. of limited size employed in many multiplayer video games), they need information with greater freshness, frequency, and accuracy the closer other game entities are to the player's position. It prescribes a multidimensional divergence bounding scheme, based on a vector field that employs consistency vectors k=(θ,σ,ν), whose components stand for the maximum allowed time (replica staleness), sequence (missing updates), and value (user-defined measured replica divergence), applied to all space coordinates in the game scenario or world. The consistency vector-fields emanate from field-generators designated as pivots (for example, players), and field intensity attenuates as distance grows from these pivots in concentric or square-like regions. This consistency model unifies locality-awareness techniques employed in message routing and consistency enforcement for multiplayer games with divergence bounding techniques traditionally employed in replicated database and web scenarios.
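As a rough illustration of the divergence-bounding idea described above, the following Python sketch attaches a consistency vector k=(θ,σ,ν) to concentric zones around a pivot and checks whether a replica has drifted beyond its bounds. The zone radii, the numeric limits, and the names `ConsistencyVector`, `ZONES`, `vector_at` and `needs_refresh` are all assumptions made for this example; they are not taken from the original paper, and only the concentric (not the square-like) region shape is shown.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsistencyVector:
    theta: float  # maximum allowed staleness, in seconds
    sigma: int    # maximum number of missed updates
    nu: float     # maximum divergence under a user-defined metric


# Hypothetical zones as (outer radius, vector); bounds loosen with distance.
ZONES = [
    (10.0, ConsistencyVector(theta=0.1, sigma=0, nu=0.0)),
    (50.0, ConsistencyVector(theta=1.0, sigma=5, nu=0.1)),
    (math.inf, ConsistencyVector(theta=10.0, sigma=50, nu=0.5)),
]


def vector_at(pivot, position):
    """Return the consistency vector that applies at `position` for `pivot`."""
    distance = math.dist(pivot, position)
    for radius, vector in ZONES:
        if distance <= radius:
            return vector
    return ZONES[-1][1]


def needs_refresh(vector, staleness, missed_updates, divergence):
    """A replica must be refreshed as soon as any one bound is exceeded."""
    return (
        staleness > vector.theta
        or missed_updates > vector.sigma
        or divergence > vector.nu
    )
```

With these assumed zones, an object 5 units from the pivot falls under the strictest vector, while one 500 units away tolerates up to 50 missed updates before a refresh is forced.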

  • Umbrella review

    Umbrella review

    In medical research, an umbrella review is a review of systematic reviews or meta-analyses. They may also be called overviews of reviews, reviews of reviews, summaries of systematic reviews, or syntheses of reviews. Umbrella reviews are among the highest levels of evidence currently available in medicine. By summarizing information from multiple overview articles, umbrella reviews make it easier to review the evidence and allow for comparison of results between each of the individual reviews. Umbrella reviews may address a broader question than a typical review, such as discussing multiple different treatment comparisons instead of only one. They are especially useful for developing guidelines and clinical practice, and when comparing competing interventions.

  • Physical schema

    Physical schema

    A physical data model (or database design) is a representation of a data design as implemented, or intended to be implemented, in a database management system. In the lifecycle of a project it typically derives from a logical data model, though it may be reverse-engineered from a given database implementation. A complete physical data model will include all the database artifacts required to create relationships between tables or to achieve performance goals, such as indexes, constraint definitions, linking tables, partitioned tables or clusters. Analysts can usually use a physical data model to calculate storage estimates; it may include specific storage allocation details for a given database system. As of 2012, seven main databases dominated the commercial marketplace: Informix, Oracle, Postgres, SQL Server, Sybase, IBM Db2 and MySQL. Other RDBMS systems tend either to be legacy databases or to be used within academia, such as universities or further education colleges. Physical data models for each implementation would differ significantly, not least due to underlying operating-system requirements that may sit underneath them. For example, SQL Server historically ran only on Microsoft Windows operating systems (starting with SQL Server 2017, it also runs on Linux, with the same database engine and many similar features and services regardless of operating system), while Oracle and MySQL can run on Solaris, Linux and other UNIX-based operating systems as well as on Windows. This means that the disk requirements, security requirements and many other aspects of a physical data model will be influenced by the RDBMS that a database administrator (or an organization) chooses to use. == Physical schema == Physical schema is a term used in data management to describe how data is to be represented and stored (files, indices, etc.) in secondary storage using a particular database management system (DBMS) (e.g., Oracle RDBMS, Sybase SQL Server, etc.). 
In the ANSI/SPARC Architecture three-schema approach, the internal schema is the view of data that involves data management technology. This is as opposed to an external schema that reflects an individual's view of the data, or the conceptual schema that is the integration of a set of external schemas. The logical schema was the way data were represented to conform to the constraints of a particular approach to database management. At that time the choices were hierarchical and network. Describing the logical schema, however, still did not describe how data would be physically stored on disk drives. That is the domain of the physical schema. Now logical schemas describe data in terms of relational tables and columns, object-oriented classes, and XML tags. A single set of tables, for example, can be implemented in numerous ways, up to and including an architecture where table rows are maintained on computers in different countries.
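The distinction between the logical and physical levels can be made concrete with a small sketch. The Python example below uses the standard-library `sqlite3` module and an in-memory database; the `orders` table, its columns, and the index name `idx_orders_customer` are hypothetical and chosen only for illustration. SQLite is used because it ships with Python; it is not one of the systems named above, and other systems expose physical design through their own mechanisms. The logical table stays the same, while adding an index (a physical artifact) changes how the engine plans the same query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: a table and its columns.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("ann", 10.0), ("bob", 25.5), ("ann", 7.25)],
)

QUERY = "SELECT total FROM orders WHERE customer = 'ann'"


def plan(sql):
    """Return the human-readable query plan (the last column of each row)."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


rows_before = sorted(conn.execute(QUERY).fetchall())
plan_before = plan(QUERY)

# Physical level: an index is a storage and performance artifact.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

rows_after = sorted(conn.execute(QUERY).fetchall())
plan_after = plan(QUERY)
index_names = [row[1] for row in conn.execute("PRAGMA index_list('orders')")]

print("before:", plan_before)
print("after: ", plan_after)
```

The query returns the same rows before and after the index is created; only the access path reported by the plan differs, which is the kind of detail a physical data model records.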

  • ReRites

    ReRites

    ReRites (also known as RERITES, ReadingRites, Big Data Poetry) is a literary work of "Human + A.I. poetry" by David Jhave Johnston that used neural network models trained to generate poetry which the author then edited. ReRites won the Robert Coover Award for a Work of Electronic Literature in 2022. == About the project == The ReRites project began as a daily rite of writing with a neural network, expanded into a series of performances from which video documentation has been published online, and concluded with a set of 12 books and an accompanying book of essays published by Anteism Books in 2019. In Electronic Literature, Scott Rettberg describes the early phases of the project in 2016, when it bore the preliminary name Big Data Poetry. Jhave (the artist name that David Jhave Johnston goes by) describes the process of writing ReRites as a rite: "Every morning for 2 hours (normally 6:30–8:30am) I get up and edit the poetic output of a neural net. Deleting, weaving, conjugating, lineating, cohering. Re-writing. Re-wiring authorship: hybrid augmented enhanced evolutionary". There is video documentation of the writing process. The human editing of the neural network's output is fundamental to this project, and Jhave gives examples of both unedited text extracts and his edited versions in publications about the project. Kyle Booten describes ReRites as "simultaneously dusty and outrageously verdant, monotonously sublime and speckled with beautiful and rare specimens". === Performances === ReRites was first shared with an audience through a series of performances where audience members and poets would participate in reading the automatically generated texts, which appeared on screen so fast that human readers could barely keep up. This has been described as allowing participants to "re-discover[..] 
the peculiar pleasures of being embodied", or, in Jhave's own words, as a space where human participants were "playing their wits and voices against an evocative infinite deep-learning muse". The first performance was at Brown University's Interrupt Festival in 2019. It has been performed many times since, including at the Barbican Centre in London and Anteism Books. === Print publications === For a single year Jhave published one book of poetry from the ReRites project each month. These twelve volumes are accompanied by a book of essays, all published by Anteism Books. The accompanying essays provide critical responses to the project from poets and scholars including Allison Parrish, Johanna Drucker, Kyle Booten, Stephanie Strickland, John Cayley, Lai-Tze Fan, Nick Montfort, Mairéad Byrne, and Chris Funkhouser. Allison Parrish notes elsewhere that these paratexts to ReRites serve a legitimising function for a genre of poetry that is not yet institutionally acknowledged. === Technical details === Starting in 2016 under the name Big Data Poetry, Jhave generated poems using, in his own words, "neural network code (..) adapted from three corporate github-hosted machine-learning libraries: TensorFlow (Google), PyTorch (Facebook), and AWD-LSTM (SalesForce)". He explains that the "models were trained on a customised corpus of 600,000 lines of poetry ranging from the romantic epoch to the 20th century avant garde". Jhave maintains a GitHub repository with some of the code supporting ReRites. == Reception == ReRites is described by John Cayley as "one of the most thorough and beautiful" poetic responses to machine learning. The work's influence on the field of electronic literature was acknowledged in 2022, when the work won the Electronic Literature Organization's Robert Coover Award for a Work of Electronic Literature. 
The jury described ReRites as particularly poignant in the time of the pandemic, as it was "a documentation of the performance of the private ritual of writing and the obsessive-compulsive need for writers to communicate — even when no one else is reading". The question of authorship and voice in ReRites has been raised by several critics. Although generated poetry is an established genre in electronic literature, Cayley notes that unlike the combinatory poems created by authors like Nick Montfort, where the author explicitly defines which words and phrases will be recombined, ReRites has "not been directed by literary preconceptions inscribed in the program itself, but only by patterns and rhythms pre-existing in the corpora". In an essay for the Australian journal TEXT, David Thomas Henry Wright asks how to understand authorship and authority in ReRites: "Who or what is the authority of the work? The original data fed into the machine, that is not currently retrievable or discernible from the final works? The code that was taken and adapted for his purposes? Or Jhave, the human editor?" Wright concludes that Jhave is the only actor with any intentionality and therefore the authority of the work. The centrality of the human editor is also emphasised by other scholars. In a chapter analysing ReRites Malthe Stavning Erslev argues that the machine learning misrepresents the dataset it is trained on. While ReRites uses 21st century neural networks, it has been compared to earlier literary traditions. Poet Victoria Stanton, who read at one of the ReRites performances, has compared ReRites to found poetry, while David Thomas Henry Wright compares it to the Oulipo movement and Mark Amerika to the cut-up technique. 
Scholars also position ReRites firmly within the long tradition of generative poetry both in electronic literature and print, stretching from the I Ching, Queneau's Cent Mille Milliards de Poemes and Nabokov's Pale Fire to computer-generated poems like Christopher Strachey's Love Letter Generator (1952) and more contemporary examples. Jhave describes the process of working with the output from the neural network as "carving". In his book My Life as an Artificial Creative Intelligence, Mark Amerika writes that the "method of carving the digital outputs provided by the language model as part of a collaborative remix jam session with GPT-2, where the language artist and the language model play off each other’s unexpected outputs as if caught in a live postproduction set, is one I share with electronic literature composer David Jhave Johnston, whose AI poetry experiments precede my own investigations."

  • Retention period

    Retention period

    A retention period (associated with a retention schedule or retention program) is an aspect of records and information management (RIM) and the records life cycle that identifies the duration of time for which the information should be maintained or "retained", irrespective of format (paper, electronic, or other). Retention periods vary with different types of information, based on content and a variety of other factors, including internal organizational need, regulatory requirements for inspection or audit, legal statutes of limitation, involvement in litigation, and taxation and financial reporting needs, as well as other factors as defined by local, regional, state, national, and/or international governing entities. Once an applicable retention period has elapsed for a given type or series of information, and all holds/moratoriums have been released, the information is typically destroyed using an approved and effective destruction method, which renders the information completely and irreversibly unusable via any means. Alternatively, it may be converted from one form to another (e.g. from paper to electronic), depending on the defined retention period per format. Information with historical value beyond its "usable value" may be accessioned to the custody of an archive organization for permanent or extended long-term preservation. == Defensible disposition == Defensible disposition refers to the ability of an identified and applied retention period to effectively provide for the defense of the record, and its eventual destruction or accessioning when scrutinized within a court of law or by other review. 
Records and information management (RIM) professionals commonly advise that all retention periods applied to organizational information be reviewed and approved by competent legal counsel who represents the organization and is familiar with its specific business needs and its legal and regulatory requirements. Additionally, a practical approach to information assessment and classification, proper documentation of the disposition program, and strategic review of disposition policy over time for efficacy are required for proper defensible disposition. == Guidance and education organizations == ARMA International; Information and Records Management Society; filerskeepers records retention FAQ
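The disposition rule described earlier in this entry (a record may be destroyed or accessioned only once its retention period has elapsed and all holds have been released) can be sketched in a few lines of Python. The schedule values, the record series names, and the names `Record`, `SCHEDULE_DAYS` and `eligible_for_disposition` are assumptions invented for this example; real retention periods must come from an approved retention schedule.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical retention schedule: record series -> retention period in days.
SCHEDULE_DAYS = {"invoice": 7 * 365, "visitor_log": 90}


@dataclass
class Record:
    series: str
    created: date
    holds: set = field(default_factory=set)  # e.g. litigation or audit holds


def eligible_for_disposition(record, today):
    """True only if the retention period has elapsed and no hold is active."""
    expires = record.created + timedelta(days=SCHEDULE_DAYS[record.series])
    return today >= expires and not record.holds
```

For instance, a visitor log created on 1 January 2024 with a 90-day period becomes eligible on 31 March 2024, unless a hold has been placed on it in the meantime.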

  • Ubiquitous computing

    Ubiquitous computing

    Ubiquitous computing (or "ubicomp") is a concept in software engineering, hardware engineering and computer science where computing is made to appear seamlessly anytime and everywhere. In contrast to desktop computing, ubiquitous computing implies use on any device, in any location, and in any format. A user interacts with the computer, which can exist in many different forms, including laptop computers, tablets, smart phones and terminals in everyday objects such as a refrigerator or a pair of glasses. The underlying technologies to support ubiquitous computing include the Internet, advanced middleware, kernels, operating systems, mobile codes, sensors, microprocessors, new I/Os and user interfaces, computer networks, mobile protocols, global navigational systems, and new materials. This paradigm is also described as pervasive computing, ambient intelligence, or "everyware". Each term emphasizes slightly different aspects. When primarily concerning the objects involved, it is also known as physical computing, the Internet of Things, haptic computing, and "things that think". Rather than propose a single definition for ubiquitous computing and for these related terms, a taxonomy of properties for ubiquitous computing has been proposed, from which different kinds or flavors of ubiquitous systems and applications can be described. Ubiquitous computing themes include: distributed computing, mobile computing, location computing, mobile networking, sensor networks, human–computer interaction, context-aware smart home technologies, and artificial intelligence. == Core concepts == Ubiquitous computing is the concept of using small internet connected and inexpensive computers to help with everyday functions in an automated fashion. 
Mark Weiser proposed three basic forms for ubiquitous computing devices: Tabs, wearable devices approximately a centimeter in size; Pads, hand-held devices approximately a decimeter in size; and Boards, larger interactive display devices approximately a meter in size. The ubiquitous computing devices proposed by Mark Weiser are all based around flat devices of different sizes with a visual display. These conceptual device categories were later implemented at Xerox PARC in experimental systems including the PARCTab, PARCPad, and LiveBoard, which served as early prototypes of handheld, tablet-style, and large interactive display computing environments. Expanding beyond those concepts there is a large array of other ubiquitous computing devices that could exist. == History == Mark Weiser coined the phrase "ubiquitous computing" around 1988, during his tenure as Chief Technologist of the Xerox Palo Alto Research Center (PARC). Both alone and with PARC Director and Chief Scientist John Seely Brown, Weiser wrote some of the earliest papers on the subject, largely defining it and sketching out its major concerns. == Recognizing the effects of extending processing power == Recognizing that the extension of processing power into everyday scenarios would necessitate understandings of social, cultural and psychological phenomena beyond its proper ambit, Weiser was influenced by many fields outside computer science, including "philosophy, phenomenology, anthropology, psychology, post-Modernism, sociology of science and feminist criticism". He was explicit about "the humanistic origins of the 'invisible ideal in post-modernist thought'", referencing as well the ironically dystopian Philip K. Dick novel Ubik. Andy Hopper from Cambridge University UK proposed and demonstrated the concept of "Teleporting", where applications follow the user wherever they move. 
Roy Want (now at Google), while at Olivetti Research Ltd, designed the first "Active Badge System", which is an advanced location computing system where personal mobility is merged with computing. Later at Xerox PARC, he designed and built the "PARCTab" or simply "Tab", widely recognized as the world's first Context-Aware computer, which has great similarity to the modern smartphone. Bill Schilit (now at Google) also did some earlier work in this topic, and participated in the early Mobile Computing workshop held in Santa Cruz in 1996. Ken Sakamura of the University of Tokyo, Japan leads the Ubiquitous Networking Laboratory (UNL), Tokyo as well as the T-Engine Forum. The joint goal of Sakamura's Ubiquitous Networking specification and the T-Engine forum, is to enable any everyday device to broadcast and receive information. MIT has also contributed significant research in this field, notably Things That Think consortium (directed by Hiroshi Ishii, Joseph A. Paradiso and Rosalind Picard) at the Media Lab and the CSAIL effort known as Project Oxygen. Other major contributors include University of Washington (Shwetak Patel, Anind Dey and James Landay), Dartmouth College's HealthX Lab (directed by Andrew Campbell), Georgia Tech's College of Computing (Gregory Abowd and Thad Starner), Cornell Tech's People Aware Computing Lab (directed by Tanzeem Choudhury), NYU's Interactive Telecommunications Program, UC Irvine's Department of Informatics, Microsoft Research, Intel Research and Equator, Ajou University UCRi & CUS. == Examples == One of the earliest ubiquitous systems was artist Natalie Jeremijenko's "Live Wire", also known as "Dangling String", installed at Xerox PARC during Mark Weiser's time there. This was a piece of string attached to a stepper motor and controlled by a LAN connection; network activity caused the string to twitch, yielding a peripherally noticeable indication of traffic. Weiser called this an example of calm technology. 
A present manifestation of this trend is the widespread diffusion of mobile phones. Many mobile phones support high speed data transmission, video services, and other services with powerful computational ability. Although these mobile devices are not necessarily manifestations of ubiquitous computing, there are examples, such as Japan's Yaoyorozu ("Eight Million Gods") Project in which mobile devices, coupled with radio frequency identification tags demonstrate that ubiquitous computing is already present in some form. Ambient Devices has produced an "orb", a "dashboard", and a "weather beacon": these decorative devices receive data from a wireless network and report current events, such as stock prices and the weather, like the Nabaztag, which was invented by Rafi Haladjian and Olivier Mével, and manufactured by the company Violet. The Australian futurist Mark Pesce has produced a highly configurable 52-LED LAMP enabled lamp which uses Wi-Fi named MooresCloud after Gordon Moore. The Unified Computer Intelligence Corporation launched a device called Ubi – The Ubiquitous Computer designed to allow voice interaction with the home and provide constant access to information. Ubiquitous computing research has focused on building an environment in which computers allow humans to focus attention on select aspects of the environment and operate in supervisory and policy-making roles. Ubiquitous computing emphasizes the creation of a human computer interface that can interpret and support a user's intentions. For example, MIT's Project Oxygen seeks to create a system in which computation is as pervasive as air: In the future, computation will be human centered. It will be freely available everywhere, like batteries and power sockets, or oxygen in the air we breathe...We will not need to carry our own devices around with us. 
Instead, configurable generic devices, either handheld or embedded in the environment, will bring computation to us, whenever we need it and wherever we might be. As we interact with these "anonymous" devices, they will adopt our information personalities. They will respect our desires for privacy and security. We won't have to type, click, or learn new computer jargon. Instead, we'll communicate naturally, using speech and gestures that describe our intent... This is a fundamental transition that does not seek to escape the physical world and "enter some metallic, gigabyte-infested cyberspace" but rather brings computers and communications to us, making them "synonymous with the useful tasks they perform". Network robots link ubiquitous networks with robots, contributing to the creation of new lifestyles and solutions to address a variety of social problems including the aging of population and nursing care. The "Continuity" set of features, introduced by Apple in OS X Yosemite, can be seen as an example of ubiquitous computing. == Issues == Privacy is easily the most often-cited criticism of ubiquitous computing (ubicomp), and may be the greatest barrier to its long-term success. == Research centres == This is a list of notable institutions who claim to have a focus on Ubiquitous computing sorted by country: Canada Topological Media Lab, Concordia University, Canada Finland Community Imaging Group, University of Oulu, Finland Germany Telecooperation Office (TECO), Karlsruhe Institute of Technology, Ger

  • NCSA Brown Dog

    NCSA Brown Dog

    NCSA Brown Dog is a research project to develop a method for easily accessing historic research data stored in order to maintain the long-term viability of large bodies of scientific research. It is supported by the National Center for Supercomputing Applications (NCSA) that is funded by the National Science Foundation (NSF). == History == Brown Dog is part of the DataNet partners program funded by NSF in 2008. DataNet was conceived to address the increasingly digital and data-intensive nature of science, engineering and education. Brown Dog is part of a follow-on effort called Data Infrastructure Building Blocks (DIBBs), focused on building software to support DataNet. The project was proposed by researchers at NCSA and the University of Illinois Urbana-Champaign as well as researchers from Boston University and the University of North Carolina at Chapel Hill. == Unstructured, uncurated, long tail data == Much scientific data is smaller, unstructured and uncurated and thus not easily shared. Such data is sometimes referred to as "long tail" data. This borrows a term from statistics and refers to the tail of the distribution of project sizes. The majority of smaller projects lack the resources to properly steward the data they produce. This so-called "long tail" data, both past and present, has the potential to inform future research in many study areas. Much of this data has become inaccessible due to obsolete software and file formats. The resulting impossibility of reviewing data from older research disrupts the overall scientific research project. == Approach == Brown Dog describes itself as the "super mutt" of software (thus the name "Brown Dog"), serving as a low-level data infrastructure to interface digital data content across the internet. Its approach is to use every possible source of automated help (i.e., software) in existence in a robust and provenance-preserving manner to create a service that can deal with as much of this data as possible. 
    The project sees the broader impact of its work in its potential to serve the general public as a sort of "DNS for data", with the goal of making all data and all file formats as accessible as webpages are today. == Technology == Brown Dog seeks to address problems involving the use of uncurated and unstructured data collections through the development of two services: the Data Access Proxy (DAP) to aid in the conversion of file formats and the Data Tilling Service (DTS) for the automatic extraction of metadata from file contents. Once the services are developed, researchers and members of the general public will be able to download browser plugins and other tools from the Brown Dog tool catalog. === Data Tilling Service === Data Tilling Service (DTS) will allow users to search data collections using an existing file to discover other similar files in a collection. A DTS search field will be appended to configured browsers where example files can be dropped. This tells DTS to search all the files under a given URL for files similar to the dropped file. For example, while browsing an online image collection, a user could drop an image of three people into the search field, and the DTS would return all images in the collection that also contain three people. If DTS encounters a foreign file format, it will use DAP to make the file accessible. DTS also indexes the data, and extracts and appends metadata to files and collections, enabling users to gain some sense of the type of data they are encountering. This service runs on port 9443. === Data Access Proxy === Data Access Proxy (DAP) allows users to access data files that would otherwise be unreadable. Similar to an internet gateway or Domain Name Service, the DAP configuration would be entered into a user's machine and browser settings. Data requests over HTTP would first be examined by DAP to determine if the native file format is readable on the client device. 
    If not, DAP converts the file into the best available format readable by the client machine. Alternatively, the user could specify the desired format themselves. This service runs on port 8184. == Use cases == Brown Dog targets three use cases proposed by groups within the EarthCube research communities. Developers and researchers from these communities will work together on use cases that span geoscience, engineering, biology and social science. === Long tail vegetation data in ecology and global change biology === This use case is led by Michael Dietze, Boston University. Data on the abundance, species composition, and size structure of vegetation is critically important for a wide array of sub-disciplines in ecology, conservation, natural resource management, and global change biology. However, addressing many of the pressing questions in these disciplines will require that terrestrial biosphere and hydrologic models are able to assimilate the large amount of long-tail data that exists but is largely inaccessible. The Brown Dog team, in cooperation with researchers from Dietze's lab, will facilitate the capture of a huge body of smaller research-oriented vegetation data sets collected over many decades and historical vegetation data embedded in Public Land Survey data dating back to 1785. This data will be used as initial conditions for models, to make sense of other large data sets and for model calibration and validation. === Designing green infrastructure considering storm water and human requirements === This use case is led by Barbara Minsker, University of Illinois at Urbana-Champaign; William Sullivan, University of Illinois at Urbana-Champaign; Arthur Schmidt, University of Illinois at Urbana-Champaign. This case study involves developing novel green infrastructure design criteria and models that integrate requirements for storm water management and ecosystem and human health and well-being. 
    Data accessibility and availability are a major challenge in addressing the scientific and social problems associated with the design of green spaces. This study will focus on identified areas of the Green Healthy Neighborhood Planning region within the City of Chicago where existing local sewer performance is most deficient and where changes in impervious area through green infrastructure would be beneficial to underserved neighborhoods. Brown Dog will be used to extract long-tail experimental data on human landscape preferences and health impacts. This data will be used to develop a human health impacts model that will then be linked together with a terrestrial biosphere model and a storm water model using Brown Dog technology. === Development and application for critical zone studies === This use case is led by Praveen Kumar, University of Illinois at Urbana-Champaign. The Critical Zone (CZ) is the "skin" of the earth that extends from the treetops to the bedrock and is created by life processes working at scales from microbes to biomes. The Critical Zone supports all terrestrial living systems. Its upper part is the bio-mantle. This is where terrestrial biota live, reproduce, use and expend energy, and where their wastes and remains accumulate and decompose. It encompasses the soil, which acts as a geomembrane through which water and solutes, energy, gases, solids, and organisms interact with the atmosphere, biosphere, hydrosphere, and lithosphere. A variety of drivers affect this bio-dynamic zone, ranging from climate and deforestation to agriculture, grazing and human development. Understanding and predicting these effects is central to managing and sustaining vital ecosystem services such as soil fertility, water purification, and production of food resources, and, at larger scales, global carbon cycling and carbon sequestration. 
    The CZ provides a unifying framework for integrating terrestrial surface and near-surface environments, and reflects an intricate web of biological and chemical processes and human impacts occurring at vastly different temporal and spatial scales. The nature of these data creates significant challenges for inter-disciplinary studies of the CZ because integration of the variety and number of data products and models has been a barrier. On the other hand, CZ data provides an excellent opportunity for defining, testing and implementing Brown Dog technologies. In this context "unstructured" data is viewed broadly as consisting of a collection of heterogeneous data with formats that reflect temporal and disciplinary legacies, data from emerging low-cost, open-hardware-based sensors and embedded sensor networks that lack well-defined metadata and sensor characteristics, as well as data that are available as maps, images and text. == NSF Award == CIF21 DIBBs: Brown Dog was awarded in the winter of 2013 with a start date of October 1, 2013. Estimated expiration date is September 30, 2018. The award amount was $10,519,716.00, the largest DIBB award. The principal investigator is Kenton McHenry of NCSA at the University of Illinois at Urbana-Champaign. Coleaders are Jong Lee NCSA/UIU

  • Confidential computing

    Confidential computing

    Confidential computing is a security and privacy-enhancing computational technique focused on protecting data in use. Confidential computing can be used in conjunction with storage and network encryption, which protect data at rest and data in transit respectively. It is designed to address software, protocol, cryptographic, and basic physical and supply-chain attacks, although some critics have demonstrated architectural and side-channel attacks effective against the technology. The technology protects data in use by performing computations in a hardware-based trusted execution environment (TEE). Confidential data is released to the TEE only once it is assessed to be trustworthy. Different types of confidential computing define the level of data isolation used, whether virtual machine, application, or function, and the technology can be deployed in on-premise data centers, edge locations, or the public cloud. It is often compared with other privacy-enhancing computational techniques such as fully homomorphic encryption, secure multi-party computation, and Trusted Computing. Confidential computing is promoted by the Confidential Computing Consortium (CCC) industry group, whose membership includes major providers of the technology. == Properties == Trusted execution environments (TEEs) "prevent unauthorized access or modification of applications and data while they are in use, thereby increasing the security level of organizations that manage sensitive and regulated data". Trusted execution environments can be instantiated on a computer's processing components such as a central processing unit (CPU) or a graphics processing unit (GPU). In their various implementations, TEEs can provide different levels of isolation including virtual machine, individual application, or compute functions. 
Typically, data in use in a computer's compute components and memory exists in a decrypted state and can be vulnerable to examination or tampering by unauthorized software or administrators. According to the CCC, confidential computing protects data in use through a minimum of three properties: Data confidentiality: "Unauthorized entities cannot view data while it is in use within the TEE". Data integrity: "Unauthorized entities cannot add, remove, or alter data while it is in use within the TEE". Code integrity: "Unauthorized entities cannot add, remove, or alter code executing in the TEE". In addition to trusted execution environments, remote cryptographic attestation is an essential part of confidential computing. The attestation process assesses the trustworthiness of a system and helps ensure that confidential data is released to a TEE only after it presents verifiable evidence that it is genuine and operating with an acceptable security posture. It allows the verifying party to assess the trustworthiness of a confidential computing environment through an "authentic, accurate, and timely report about the software and data state" of that environment. "Hardware-based attestation schemes rely on a trusted hardware component and associated firmware to execute attestation routines in a secure environment". Without attestation, a compromised system could deceive others into trusting it, claim it is running certain software in a TEE, and potentially compromise the confidentiality or integrity of the data being processed or the integrity of the trusted code. == Technical approaches == Technical approaches to confidential computing may vary in which software, infrastructure and administrator elements are allowed to access confidential data. The "trust boundary," which circumscribes a trusted computing base (TCB), defines which elements have the potential to access confidential data, whether they are acting benignly or maliciously. 
    Confidential computing implementations enforce the defined trust boundary at a specific level of data isolation. The three main types of confidential computing are: virtual machine isolation; application isolation, also known as process isolation; and function isolation, also known as library isolation. Virtual machine isolation removes the elements controlled by the computer infrastructure or cloud provider, but allows potential data access by elements inside a virtual machine running on the infrastructure. Application or process isolation permits data access only by authorized software applications or processes. Function or library isolation is designed to permit data access only by authorized subroutines or modules within a larger application, blocking access by any other system element, including unauthorized code in the larger application. == Threat model == As confidential computing is concerned with the protection of data in use, only certain threat models can be addressed by this technique. Other types of attacks are better addressed by other privacy-enhancing technologies. === In scope === The following threat vectors are generally considered in scope for confidential computing: Software attacks: including attacks on the host’s software and firmware. This may include the operating system, hypervisor, BIOS, other software and workloads. Protocol attacks: including "attacks on protocols associated with attestation as well as workload and data transport". This includes vulnerabilities in the "provisioning or placement of the workload" or data that could cause a compromise. Cryptographic attacks: including "vulnerabilities found in ciphers and algorithms due to a number of factors, including mathematical breakthroughs, availability of computing power and new computing approaches such as quantum computing". 
The CCC notes several caveats in this threat vector, including relative difficulty of upgrading cryptographic algorithms in hardware and recommendations that software and firmware be kept up-to-date. A multi-faceted, defense-in-depth strategy is recommended as a best practice. Basic physical attacks: including cold boot attacks, bus and cache snooping and plugging attack devices into an existing port, such as a PCI Express slot or USB port. Basic upstream supply-chain attacks: including attacks that would compromise TEEs through changes such as added debugging ports. The degree and mechanism of protection against these threats varies with specific confidential computing implementations. === Out of scope === Threats generally defined as out of scope for confidential computing include: Sophisticated physical attacks: including physical attacks that "require long-term and/or invasive access to hardware" such as chip scraping techniques and electron microscope probes. Upstream hardware supply-chain attacks: including attacks on the CPU manufacturing process, CPU supply chain in key injection/generation during manufacture. Attacks on components of a host system that are not directly providing the capabilities of the trusted execution environment are also generally out-of-scope. Availability attacks: confidential computing is designed to protect the confidentiality and integrity of protected data and code. It does not address availability attacks such as Denial of Service or Distributed Denial of Service attacks. == Use cases == Confidential computing can be deployed in the public cloud, on-premise data centers, or distributed "edge" locations, including network nodes, branch offices, industrial systems and others. 
    === Data privacy and security === Confidential computing protects the confidentiality and integrity of data and code from the infrastructure provider, unauthorized or malicious software and system administrators, and other cloud tenants, which may be a concern for organizations seeking control over sensitive or regulated data. The additional security capabilities offered by confidential computing can help accelerate the transition of more sensitive workloads to the cloud or edge locations. === Multi-party analytics === Confidential computing can enable multiple parties to engage in joint analysis using confidential or regulated data inside a TEE while preserving privacy and regulatory compliance. In this case, all parties benefit from the shared analysis, but no party's sensitive data or confidential code is exposed to the other parties or system host. Examples include multiple healthcare organizations contributing data to medical research, or multiple banks collaborating to identify financial fraud or money laundering. Oxford University researchers proposed the alternative paradigm called "Confidential Remote Computing" (CRC), which supports confidential operations in Trusted Execution Environments across endpoint computers considering multiple stakeholders as mutually distrustful data, algorithm and hardware providers. === Confidential generative AI === Confidential computing technologies can be applied to various stages of a generative AI deployment to help increase data or model privacy, security, and regulatory compliance. TEEs and remote attestation can protect the integrity of data during AI model training, keep

  • Ontology for Biomedical Investigations

    Ontology for Biomedical Investigations

    The Ontology for Biomedical Investigations (OBI) is an open-access, integrated ontology for the description of biological and clinical investigations. OBI provides a model for the design of an investigation, the protocols and instrumentation used, the materials used, the data generated and the type of analysis performed on it. The project is being developed as part of the OBO Foundry and as such adheres to all the principles therein such as orthogonal coverage (i.e. clear delineation from other foundry member ontologies) and the use of a common formal language. In OBI the common formal language used is the Web Ontology Language (OWL). As of March 2008, a pre-release version of the ontology was made available at the project's SVN repository. == Scope == The Ontology for Biomedical Investigations (OBI) addresses the need for controlled vocabularies to support integration and joint ("cross-omics") analysis of experimental data, a need originally identified in the transcriptomics domain by the FGED Society, which developed the MGED Ontology as an annotation resource for microarray data. [Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, et al. (November 2007). "The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration". Nature Biotechnology. 25 (11): 1251–5. doi:10.1038/nbt1346. PMC 2814061. PMID 17989687.] OBI uses the Basic Formal Ontology (BFO) upper-level ontology as a means of describing general entities that do not belong to a specific problem domain. As such, every OBI class is a subclass of some BFO class. 
    The ontology has the scope of modeling all biomedical investigations and as such contains ontology terms for aspects such as: biological material (for example, blood plasma); instrument, and parts of an instrument therein (for example, DNA microarray, centrifuge); information content (such as an image, or a digital information entity such as an electronic medical record); design and execution of an investigation, and individual experiments therein (for example, study design, electrophoresis material separation); and data transformation, incorporating aspects such as data normalization and data analysis (for example, principal components analysis dimensionality reduction, mean calculation). Less 'concrete' aspects such as the role a given entity may play in a particular scenario (for example the role of a chemical compound in an experiment) and the function of an entity (for example the digestive function of the stomach to nourish the body) are also covered in the ontology. == OBI consortium == The MGED Ontology was originally identified in the transcriptomics domain by the FGED Society and was developed to address the needs of data integration. Following a mutual decision to collaborate, this effort later became a wider collaboration between groups such as FGED, PSI and MSI in response to the needs of areas such as transcriptomics, proteomics and metabolomics, and the FuGO (Functional Genomics Investigation Ontology) was created. This later became the OBI, covering the wider scope of all biomedical investigations. As an international, cross-domain initiative, the OBI consortium draws upon a pool of experts from a variety of fields, not limited to biology. The current list of OBI consortium members is available at the OBI consortium website. 
    The consortium is made up of a coordinating committee which is a combination of two subgroups, the Community Representatives (those representing a particular biomedical community) and the Core Developers (ontology developers who may or may not be members of any single community). Separate from the coordinating committee is the Developers Working Group, which consists of developers within the communities collaborating in the development of OBI at the discretion of current OBI Consortium members.

  • Operational historian

    Operational historian

    In manufacturing, an operational historian is a time-series database application that is developed for operational process data. Historian software is often embedded or used in conjunction with standard DCS and PLC control systems to provide enhanced data capture, validation, compression, and aggregation capabilities. Historians have been deployed in almost every industry and contribute to functions such as supervisory control, performance monitoring, quality assurance, and, more recently, machine learning applications which can learn from vast quantities of historical data. These systems were originally developed to capture instrumentation and control data, which led many to use the term "tag" for a stream of process data, referring to the physical "tags" which had been placed on instrumentation for manually capturing data. Raw data may be accessed via OPC HDA, SQL, or REST API interfaces. == Operational Support == Operational historians are typically used within the manufacturing facility by engineers and operators for supervisory functions and analysis. An operational historian will typically capture all instrumentation and control data, whereas an enterprise historian that is deployed to support business functions will capture only a subset of the plant data. Typically, these applications offer data access through dedicated APIs (Application Programming Interfaces) and SDKs (Software Development Kits) which offer high-performance read and write operations. These operate through vendor-specific or custom applications. Front-end tools for trending process data over time are the most common interfaces to these databases. Because these applications are typically deployed next to or near the source of their process data, they are often marketed and sold as 'real-time database systems.' This distinction varies among vendors, who often have to make tradeoffs in performance between data capture and presentation, and application and analysis functionality. 
    The following is a list of typical challenges for operational historians: data collection from instrumentation and controls; storage and archiving of very large volumes of data; organization of data in the form of "tags" or "points"; limit monitoring (alarms) and validation; aggregation and interpolation; and manual data entry (MDE). == Data access == As opposed to enterprise historians, the data access layer in the operational historian is designed to offer sophisticated data fetching modes without complex information analysis facilities. The following settings are typically available for data access operations: data scope (single point or tag, history based on time range, history based on sample count); request modes (raw data, last-known value, aggregation, interpolation); sampling (single point, all points without sampling, all points with interval sampling); and data omission (based on the sample quality, the sample value, or the count). Even though operational historians are rarely relational database management systems, they often offer SQL-based interfaces to query the database. In most such implementations, the dialect does not follow the SQL standard, in order to provide syntax for specifying data access operation parameters.
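    The request modes and interval sampling described above can be illustrated with a small in-memory sketch. This is not any vendor's API: the `TagHistory` class, its method names, and the choice of linear interpolation and a plain mean as the aggregate are all assumptions made for illustration, and real historians typically also carry a quality flag per sample.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


@dataclass
class TagHistory:
    """Hypothetical store for one tag; samples must be sorted by timestamp."""
    samples: List[Sample]

    def _index_after(self, t: float) -> int:
        # Index of the first sample whose timestamp is strictly greater than t.
        return bisect_right([ts for ts, _ in self.samples], t)

    def raw(self, start: float, end: float) -> List[Sample]:
        """Raw mode: every stored sample in the closed time range."""
        return [s for s in self.samples if start <= s[0] <= end]

    def last_known(self, t: float) -> Optional[float]:
        """Last-known-value mode: most recent sample at or before t."""
        i = self._index_after(t)
        return self.samples[i - 1][1] if i > 0 else None

    def interpolate(self, t: float) -> Optional[float]:
        """Interpolation mode (linear, by assumption) between neighbours."""
        i = self._index_after(t)
        if i == 0:
            return None  # t is before the first sample
        t0, v0 = self.samples[i - 1]
        if i == len(self.samples):
            # No later sample: only an exact hit on the last timestamp counts.
            return v0 if t0 == t else None
        t1, v1 = self.samples[i]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

    def average(self, start: float, end: float) -> Optional[float]:
        """Aggregation mode: plain mean of the raw samples in the range."""
        values = [v for _, v in self.raw(start, end)]
        return sum(values) / len(values) if values else None

    def sampled(self, start: float, end: float, interval: float) -> List[Sample]:
        """Interval sampling: interpolated values at evenly spaced times."""
        steps = int((end - start) // interval)
        times = [start + k * interval for k in range(steps + 1)]
        return [(t, v) for t in times if (v := self.interpolate(t)) is not None]


history = TagHistory([(0.0, 10.0), (10.0, 20.0), (20.0, 40.0)])
print(history.raw(5.0, 20.0))
print(history.last_known(15.0))
print(history.interpolate(15.0))
print(history.sampled(0.0, 20.0, 5.0))
```

    In this sketch a request at time 15 returns 20.0 in last-known-value mode but 30.0 in interpolation mode, which is the kind of difference a caller selects between when choosing a request mode.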
