AI Essay Writer

AI Essay Writer — hands-on reviews, top picks, pricing, pros and cons and a practical how-to guide on Aizhi.

  • Automation

    Automation describes a wide range of technologies that reduce human intervention in processes, mainly by predetermining decision criteria, subprocess relationships, and related actions, as well as embodying those predeterminations in machines. Automation has been achieved by various means including mechanical, hydraulic, pneumatic, electrical, and electronic devices and computers, usually in combination. Complicated systems, such as modern factories, airplanes, and ships, typically use combinations of all of these techniques. The benefits of automation include labor savings, reduced waste, savings in electricity and material costs, and improvements to quality, accuracy, and precision. Automation includes the use of various equipment and control systems such as machinery, processes in factories, boilers, and heat-treating ovens, switching on telephone networks, and the steering and stabilization of ships, aircraft, and other applications and vehicles with reduced human intervention. Examples range from a household thermostat controlling a boiler to a large industrial control system with tens of thousands of input measurements and output control signals. In the simplest type of automatic control loop, a controller compares a measured value of a process with a desired set value and processes the resulting error signal to change some input to the process, in such a way that the process stays at its set point despite disturbances. This closed-loop control is an application of negative feedback to a system. The mathematical basis of control theory began in the 18th century and advanced rapidly in the 20th. The term automation, inspired by the earlier word automatic (coming from automaton), was not widely used before 1947, when Ford established an automation department. It was during this time that industry was rapidly adopting feedback controllers. Technological advancements introduced in the 1930s revolutionized various industries.
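The closed-loop behaviour described above (compare a measurement with a set value, act on the error, hold the set point despite disturbances) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the proportional-integral (PI) control law, the toy process model, the gains and the function name `simulate_closed_loop` are all hypothetical choices made for illustration and do not come from the text.

```python
def simulate_closed_loop(set_point, initial_value, kp=0.5, ki=0.1,
                         disturbance=0.0, steps=100):
    """Simulate a discrete-time negative-feedback loop.

    Assumed toy process: at every step the process value changes by the
    controller output plus a constant disturbance. The controller is a
    proportional-integral (PI) law, chosen here because the integral term
    removes the steady-state offset a constant disturbance would cause.
    """
    value = initial_value
    integral = 0.0
    history = [value]
    for _ in range(steps):
        error = set_point - value                 # compare measurement with set value
        integral += error                         # accumulated error (integral action)
        control = kp * error + ki * integral      # process the error signal
        value = value + control + disturbance     # process reacts to input and disturbance
        history.append(value)
    return history


if __name__ == "__main__":
    # A thermostat-like example: hold 20.0 while a constant heat loss pulls the value down.
    trace = simulate_closed_loop(set_point=20.0, initial_value=15.0,
                                 disturbance=-0.3, steps=200)
    print(round(trace[-1], 3))
```

With these assumed gains the loop is stable and the simulated value settles at the set point even though the disturbance never goes away. A purely proportional controller (setting `ki=0`) would instead settle with a constant offset.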
The World Bank's World Development Report of 2019 shows evidence that the new industries and jobs in the technology sector outweigh the economic effects of workers being displaced by automation. Job losses and downward mobility blamed on automation have been cited as one of many factors in the resurgence of nationalist, protectionist and populist politics in the US, UK and France, among other countries since the 2010s. == History == === Early history === It was a preoccupation of the Greeks and Arabs (in the period between about 300 BC and about 1200 AD) to keep an accurate track of time. In Ptolemaic Egypt, about 270 BC, Ctesibius described a float regulator for a water clock, a device not unlike the ball and cock in a modern flush toilet. This was the earliest feedback-controlled mechanism. The appearance of the mechanical clock in the 14th century made the water clock and its feedback control system obsolete. The Persian Banū Mūsā brothers, in their Book of Ingenious Devices (850 AD), described a number of automatic controls. Two-step level controls for fluids, a form of discontinuous variable structure controls, were developed by the Banu Musa brothers. They also described a feedback controller. The design of feedback control systems up through the Industrial Revolution was by trial-and-error, together with a great deal of engineering intuition. It was not until the mid-19th century that the stability of feedback control systems was analyzed using mathematics, the formal language of automatic control theory. The centrifugal governor was invented by Christiaan Huygens in the seventeenth century, and used to adjust the gap between millstones. 
=== Industrial Revolution in Western Europe === The introduction of prime movers, or self-driven machines, advanced grain mills, furnaces, boilers, and the steam engine, and created a new requirement for automatic control systems including temperature regulators (invented in 1624; see Cornelius Drebbel), pressure regulators (1681), float regulators (1700) and speed control devices. Another control mechanism was used to tent the sails of windmills. It was patented by Edmund Lee in 1745. Also in 1745, Jacques de Vaucanson invented the first automated loom. Around 1800, Joseph Marie Jacquard created a punch-card system to program looms. In 1771 Richard Arkwright invented the first fully automated spinning mill driven by water power, known at the time as the water frame. An automatic flour mill was developed by Oliver Evans in 1785, making it the first completely automated industrial process. A centrifugal governor was used by Mr. Bunce of England in 1784 as part of a model steam crane. The centrifugal governor was adopted by James Watt for use on a steam engine in 1788 after Watt's partner Boulton saw one at a flour mill Boulton & Watt were building. The governor could not actually hold a set speed; the engine would assume a new constant speed in response to load changes. The governor was able to handle smaller variations such as those caused by fluctuating heat load to the boiler. Also, there was a tendency for oscillation whenever there was a speed change. As a consequence, engines equipped with this governor were not suitable for operations requiring constant speed, such as cotton spinning. Several improvements to the governor, plus improvements to valve cut-off timing on the steam engine, made the engine suitable for most industrial uses before the end of the 19th century. Advances in the steam engine stayed well ahead of science, both thermodynamics and control theory.
The governor received relatively little scientific attention until James Clerk Maxwell published a paper that established the beginning of a theoretical basis for understanding control theory. === 20th century === Relay logic was introduced with factory electrification, which underwent rapid adoption from 1900 through the 1920s. Central electric power stations were also undergoing rapid growth, and the operation of new high-pressure boilers, steam turbines and electrical substations created a great demand for instruments and controls. Central control rooms became common in the 1920s, but as late as the early 1930s, most process controls were on-off. Operators typically monitored charts drawn by recorders that plotted data from instruments. To make corrections, operators manually opened or closed valves or turned switches on or off. Control rooms also used color-coded lights to send signals to workers in the plant to manually make certain changes. The development of the electronic amplifier during the 1920s, which was important for long-distance telephony, required a higher signal-to-noise ratio, which was solved by negative feedback noise cancellation. This and other telephony applications contributed to control theory. In the 1940s and 1950s, German mathematician Irmgard Flügge-Lotz developed the theory of discontinuous automatic controls, which found military applications during the Second World War in fire control systems and aircraft navigation systems. Controllers, which were able to make calculated changes in response to deviations from a set point rather than on-off control, began to be introduced in the 1930s. Controllers allowed manufacturing to continue showing productivity gains to offset the declining influence of factory electrification. Factory productivity was greatly increased by electrification in the 1920s. U.S. manufacturing productivity growth fell from 5.2%/yr 1919–29 to 2.76%/yr 1929–41.
Alexander Field notes that spending on non-medical instruments increased significantly from 1929 to 1933 and remained strong thereafter. The First and Second World Wars saw major advancements in the field of mass communication and signal processing. Other key advances in automatic controls include differential equations, stability theory and system theory (1938), frequency domain analysis (1940), ship control (1950), and stochastic analysis (1941). Starting in 1958, various systems based on solid-state digital logic modules for hard-wired programmed logic controllers (the predecessors of programmable logic controllers [PLC]) emerged to replace electro-mechanical relay logic in industrial control systems for process control and automation, including early Telefunken/AEG Logistat, Siemens Simatic, Philips/Mullard/Valvo Norbit, BBC Sigmatronic, ACEC Logacec, Akkord Estacord, Krone Mibakron, Bistat, Datapac, Norlog, SSR, or Procontic systems. In 1959 Texaco's Port Arthur Refinery became the first chemical plant to use digital control. Conversion of factories to digital control began to spread rapidly in the 1970s as the price of computer hardware fell. === Significant applications === The automatic telephone switchboard was introduced in 1892 along with dial telephones. By 1929, 31.9% of the Bell system was automatic. Automatic telephone switching originally used vacuum tube amplifiers and electro-mechanical switches, which consumed a large amount of electricity. Call volume eve

  • Information scientist

    The term information scientist was developed in the latter part of the twentieth century by Wm. Hovey Smith to describe an individual, usually with a relevant subject degree (such as one in Information and Computer Science, CIS) or a high level of subject knowledge, providing focused information to scientific and technical research staff in industry. It is a role quite distinct from and complementary to that of a librarian. Developments in end-user searching, together with some convergence between the roles of librarian and information scientist, have led to a diminution in its use in this context, and the terms information officer and information professional (information specialist) are also now used. The term was, and is, also used for an individual carrying out research in information science. Brian C. Vickery mentions that the Institute of Information Scientists (IIS) was established in London during 1958 and lists the criteria put forward by this institute, "Criteria for Information Science" (appendix 1), as well as his own "Areas of study in information science" (appendix 2). The IIS merged with the Library Association in 2002 to form the Chartered Institute of Library and Information Professionals (CILIP). == Notable Information Scientists == See also: Award of Merit - Association for Information Science and Technology. Marcia Bates, David Blair (information technologist), Samuel C. Bradford, Michael Buckland, John M. Carroll, Blaise Cronin, Emilia Currás, Brenda Dervin, Eugene Garfield, Paul B. Kantor, Frederick Wilfrid Lancaster, Calvin Mooers, Tefko Saracevic, Linda C. Smith, Robert Saxton Taylor, Brian Campbell Vickery, Thomas D. Wilson. == Additional reading == Ellis, David and Merete Haugan (1997). "Modelling the information seeking patterns of engineers and research scientists in an industrial environment". Journal of Documentation. 53(4): 384–403. Poole, Alex H. (2024).
"'There's a big difference between going through life with the wind at your back, and going through life leaning into the wind': Feminism in Post-World War II Information Science". Proceedings of the Association for Information Science and Technology. 61: 300–313. doi:10.1002/pra2.1029. Vickery, Brian Campbell (1988). "Essays presented to B. C. Vickery". Journal of Documentation. 44: 199–283. Vickery, B. & Vickery, A. (1987). Information Science in theory and practice. London: Bowker-Saur, pp. 361–369.

  • Reservoir sampling

    Reservoir sampling is a family of randomized algorithms for choosing a simple random sample, without replacement, of k items from a population of unknown size n in a single pass over the items. The size of the population n is not known to the algorithm and is typically too large for all n items to fit into main memory. The population is revealed to the algorithm over time, and the algorithm cannot look back at previous items. At any point, the current state of the algorithm must permit extraction of a simple random sample without replacement of size k over the part of the population seen so far. == Motivation == Suppose we see a sequence of items, one at a time. We want to keep 10 items in memory, and we want them to be selected at random from the sequence. If we know the total number of items n and can access the items arbitrarily, then the solution is easy: select 10 distinct indices i between 1 and n with equal probability, and keep the i-th elements. The problem is that we do not always know the exact n in advance. == Simple: Algorithm R == A simple and popular but slow algorithm, Algorithm R, was created by Jeffrey Vitter. Initialize an array $R$ indexed from $1$ to $k$, containing the first k items of the input $x_1, ..., x_k$. This is the reservoir. For each new input $x_i$, generate a random number j uniformly in $\{1, ..., i\}$. If $j \in \{1, ..., k\}$, then set $R[j] := x_i$. Otherwise, discard $x_i$. Return $R$ after all inputs are processed. This algorithm works by induction on $i \geq k$. While conceptually simple and easy to understand, this algorithm needs to generate a random number for each item of the input, including the items that are discarded.
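The steps of Algorithm R described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `reservoir_sample_r` and the optional `rng` parameter are assumptions introduced here, and the reservoir is stored 0-indexed while the description uses indices from 1 to k.

```python
import random


def reservoir_sample_r(stream, k, rng=None):
    """Return k items chosen uniformly without replacement from an iterable.

    Follows the description of Algorithm R: fill the reservoir with the first
    k items, then for the i-th item draw j uniformly from {1, ..., i} and
    overwrite slot j when j <= k.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            reservoir.append(item)       # the first k inputs form the reservoir
        else:
            j = rng.randint(1, i)        # uniform on {1, ..., i}, both ends inclusive
            if j <= k:
                reservoir[j - 1] = item  # replace slot j (shifted to 0-based indexing)
            # otherwise the item is discarded
    return reservoir


if __name__ == "__main__":
    # The stream length is never passed to the function.
    print(reservoir_sample_r(iter(range(1_000_000)), 10))
```

If the stream holds fewer than k items, this sketch simply returns all of them.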
The algorithm's asymptotic running time is thus $O(n)$. Generating this amount of randomness and the linear run time cause the algorithm to be unnecessarily slow if the input population is large. == Optimal: Algorithm L == If we generate $n$ random numbers $u_1, ..., u_n \sim U[0,1]$ independently, then the indices of the smallest $k$ of them are a uniform sample of the $k$-subsets of $\{1, ..., n\}$. The process can be done without knowing $n$: keep the smallest $k$ of $u_1, ..., u_i$ that have been seen so far, as well as $w_i$, the index of the largest among them. For each new $u_{i+1}$, compare it with $u_{w_i}$. If $u_{i+1} < u_{w_i}$
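The random-key idea that this section starts from (give every item an independent uniform number and keep the k items with the smallest numbers) can be sketched directly. This is not Algorithm L itself: the sketch still draws one random number per item, and it only illustrates the premise stated above. The function name `reservoir_sample_by_keys` and the use of a heap to track the largest retained key are assumptions made for illustration.

```python
import heapq
import random


def reservoir_sample_by_keys(stream, k, rng=None):
    """Keep the k items whose independent U[0, 1) keys are smallest.

    A max-heap (implemented by negating keys) exposes the largest retained
    key, so each new key is compared with it in O(1) and swapped in O(log k).
    """
    if k <= 0:
        return []
    rng = rng or random.Random()
    heap = []  # entries are (-key, position, item); position breaks ties safely
    for position, item in enumerate(stream):
        u = rng.random()
        if len(heap) < k:
            heapq.heappush(heap, (-u, position, item))
        elif u < -heap[0][0]:            # smaller than the largest retained key
            heapq.heapreplace(heap, (-u, position, item))
    return [item for _, _, item in heap]


if __name__ == "__main__":
    print(reservoir_sample_by_keys(iter(range(1_000_000)), 10))
```

Because the retained set depends only on the ranking of the keys, every k-subset of the stream is equally likely, which matches the claim in the paragraph above.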

  • ARMA International

    ARMA International (formerly the Association of Records Managers and Administrators) is an American not-for-profit professional association for information professionals, primarily in information management (including records management) and information governance, and for related industry practitioners and vendors. The association provides educational opportunities and publications covering aspects of information management broadly. == History == The Association was founded in 1955. In 1975, the Association of Records Executives and Administrators (AREA) and the American Records Management Association merged to form ARMA International. The headquarters for ARMA International is located in Overland Park, Kansas. == Operations == ARMA International serves professionals in the United States, Canada, Japan, and the United Kingdom. Its members include records managers, attorneys, information technology professionals, consultants, and archivists involved in various aspects of managing records and information assets. ARMA hosts an annual conference with the goal of bringing together records and information management professionals from around the world. In 2023, ARMA hosted conferences in both the United States and Canada. Topics addressed in the 120+ educational sessions include advanced technology, creating information structure, ediscovery and information law, information management fundamentals, information project management, and reducing organizational information risk. The expo features exhibitors displaying records and information technologies, products, and services.

  • Supermind AI

    Supermind is a state-funded Chinese artificial intelligence platform that tracks scientists and researchers internationally. The platform is the flagship project of Shenzhen's International Science and Technology Information Center. It mines data from science and technology databases such as Springer, Wiley, Clarivate and Elsevier. It is intended to detect technological breakthroughs and to identify possible sources of talent as part of China's efforts to advance technologically. The platform also uses government data security and security intelligence organizations such as Peng Cheng Laboratory, the China National GeneBank, BGI Group and the Key Laboratory of New Technologies of Security Intelligence. According to Hong Kong-based Asia Times, the platform, "While not an overt espionage tool...may be used to identify key personnel who could be bribed, deceived or manipulated into divulging classified information". The Organisation for Economic Co-operation and Development (OECD) flagged the project as an incident, meaning it may be of interest to policymakers and other stakeholders. US technology group American Edge Project criticized the project, citing a global risk that China's security services could use the platform to place agents in jobs with access to important information, recruit technical personnel, and identify targets for hacking operations.

  • Car–Parrinello molecular dynamics

    Car–Parrinello molecular dynamics (CPMD) refers to either a method used in molecular dynamics (also known as the Car–Parrinello method) or the computational chemistry software package used to implement this method. The CPMD method is one of the major methods for calculating ab initio molecular dynamics (ab initio MD or AIMD). Ab initio molecular dynamics (AIMD) is a computational method that uses first principles through quantum mechanics to simulate the motion of atoms in a system. It is a type of molecular dynamics (MD) simulation that does not rely on empirical potentials or force fields to describe the interactions between atoms, but rather calculates these interactions entirely from the electronic structure of the system using quantum mechanics. In an ab initio MD simulation, the total energy of the system is calculated at each time step using density functional theory (DFT), Hartree-Fock (HF), or other electronic structure calculation methods. The forces acting on each atom are then determined from the gradient of the energy with respect to the atomic coordinates, and the equations of motion are solved to predict the trajectory of the atoms. AIMD permits chemical bond breaking and forming events to occur and accounts for electronic polarization effects. Therefore, ab initio MD simulations can be used to study a wide range of phenomena, including the structural, thermodynamic, and dynamic properties of materials and chemical reactions. They are particularly useful for systems that are not well described by empirical potentials or force fields, such as systems with strong electronic correlation or systems with many degrees of freedom. However, ab initio MD simulations are computationally demanding and require significant computational resources.
The CPMD method is related to the more common Born–Oppenheimer molecular dynamics (BOMD) method in that the quantum mechanical effect of the electrons is included in the calculation of energy and forces for the classical motion of the nuclei. CPMD and BOMD are different types of AIMD. However, whereas BOMD treats the electronic structure problem within the time-independent Schrödinger equation, CPMD explicitly includes the electrons as active degrees of freedom, via (fictitious) dynamical variables. The software is a parallelized plane wave / pseudopotential implementation of density functional theory, particularly designed for ab initio molecular dynamics. == Car–Parrinello method == The Car–Parrinello method is a type of molecular dynamics, usually employing periodic boundary conditions, planewave basis sets, and density functional theory. It was proposed in 1985 by Roberto Car and Michele Parrinello, then working at SISSA, who were subsequently awarded the Dirac Medal by ICTP in 2009. In contrast to Born–Oppenheimer molecular dynamics, in which the nuclear (ionic) degrees of freedom are propagated using ionic forces that are calculated at each iteration by approximately solving the electronic problem with conventional matrix diagonalization methods, the Car–Parrinello method explicitly introduces the electronic degrees of freedom as (fictitious) dynamical variables, writing an extended Lagrangian for the system which leads to a system of coupled equations of motion for both ions and electrons. In this way, an explicit electronic minimization at each time step, as done in Born–Oppenheimer MD, is not needed: after an initial standard electronic minimization, the fictitious dynamics of the electrons keeps them on the electronic ground state corresponding to each new ionic configuration visited along the dynamics, thus yielding accurate ionic forces.
In order to maintain this adiabaticity condition, it is necessary that the fictitious mass of the electrons is chosen small enough to avoid a significant energy transfer from the ionic to the electronic degrees of freedom. This small fictitious mass in turn requires that the equations of motion are integrated using a smaller time step than the one (1–10 fs) commonly used in Born–Oppenheimer molecular dynamics. Currently, the CPMD method can be applied to systems that consist of a few tens or hundreds of atoms and access timescales on the order of tens of picoseconds. == General approach == In CPMD the core electrons are usually described by a pseudopotential and the wavefunctions of the valence electrons are approximated by a plane wave basis set. The ground state electronic density (for fixed nuclei) is calculated self-consistently, usually using the density functional theory method. Kohn-Sham equations are often used to calculate the electronic structure, where electronic orbitals are expanded in a plane-wave basis set. Then, using that density, forces on the nuclei can be computed to update the trajectories (using, e.g., the Verlet integration algorithm). In addition, however, the coefficients used to obtain the electronic orbital functions can be treated as a set of extra spatial dimensions, and trajectories for the orbitals can be calculated in this context. == Fictitious dynamics == CPMD is an approximation of the Born–Oppenheimer MD (BOMD) method. In BOMD, the electrons' wave function must be minimized via matrix diagonalization at every step in the trajectory. CPMD uses fictitious dynamics to keep the electrons close to the ground state, preventing the need for a costly self-consistent iterative minimization at each time step. The fictitious dynamics relies on the use of a fictitious electron mass (usually in the range of 400–800 a.u.) to ensure that there is very little energy transfer from nuclei to electrons, i.e. to ensure adiabaticity.
Any increase in the fictitious electron mass resulting in energy transfer would cause the system to leave the ground-state BOMD surface. === Lagrangian === $\mathcal{L} = \frac{1}{2}\left(\sum_{I}^{\mathrm{nuclei}} M_I \dot{\mathbf{R}}_I^2 + \mu \sum_{i}^{\mathrm{orbitals}} \int d\mathbf{r}\, |\dot{\psi}_i(\mathbf{r},t)|^2\right) - E\left[\{\psi_i\},\{\mathbf{R}_I\}\right] + \sum_{ij} \Lambda_{ij}\left(\int d\mathbf{r}\, \psi_i \psi_j - \delta_{ij}\right),$ where $\mu$ is the fictitious mass parameter; $E[\{\psi_i\},\{\mathbf{R}_I\}]$ is the Kohn–Sham energy density functional, which outputs energy values when given Kohn–Sham orbitals and nuclear positions. === Orthogonality constraint === $\int d\mathbf{r}\, \psi_i^{*}(\mathbf{r},t)\, \psi_j(\mathbf{r},t) = \delta_{ij},$ where $\delta_{ij}$ is the Kronecker delta. === Equations of motion === The equations of motion are obtained by finding the stationary point of the Lagrangian under variations of $\psi_i$ and $\mathbf{R}_I$, with the orthogonality constraint. $M_I \ddot{\mathbf{R}}_I = -\nabla_I\, E\left[\{\psi_i\},\{\mathbf{R}_I\}\right]$ and $\mu \ddot{\psi}_i(\mathbf{r},t) = -\frac{\delta E}{\delta \psi_i^{*}(\mathbf{r},t)} + \sum_j \Lambda_{ij}\, \psi_j(\mathbf{r},t),$ where $\Lambda_{ij}$ is a Lagrangian multiplier matrix to comply with the orthonormality constraint. === Born–Oppenheimer limit === In the formal limit where $\mu \to 0$, the equations of motion approach Born–Oppenheimer molecular dynamics. == Software packages == There are a number of software packages available for performing AIMD simulations.
Some of the most widely used packages include: CP2K: an open-source software package for AIMD. Quantum Espresso: an open-source package for performing DFT calculations. It includes a module for AIMD. VASP: a commercial software package for performing DFT calculations. It includes a module for AIMD. Gaussian: a commercial software package that can perform AIMD. NWChem: an open-source software package for AIMD. LAMMPS: an open-source software package for performing classical and ab initio MD simulations. SIESTA: an open-source software package for AIMD. ORCA: a general-purpose quantum chemistry package. == Applications == Studying the behavior of water across different environments, such as near a hydrophobic graphene sheet. Investigating the structure and dynamics of liquid water at ambient temperature. Solving the heat transfer problems (heat conduction and thermal radiation), such as in Si/Ge superlattices. Probing the proton transfer along hydrogen-bonds in different environments, such as in 1D water chains inside carbon nanotubes. Evaluating the critical point of crystals, composites, and solid-state materials, such as aluminum. Predicting and modelling different phases and phase transitions, such as in the amorphous phase of the phase-change memory material GeSbTe. Studying the combustion of combustibles, such as lignite-water systems. Measuring th
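The nuclear equation of motion given earlier, $M_I \ddot{\mathbf{R}}_I = -\nabla_I E$, is the part of the scheme that a Verlet-type integrator advances. The sketch below shows only that step, using the velocity Verlet variant on a one-dimensional toy force. It is an illustration under loud assumptions: the harmonic force stands in for the gradient of the Kohn–Sham energy, no electronic (fictitious) dynamics is modelled, and the names `velocity_verlet` and `harmonic_force` are hypothetical.

```python
def velocity_verlet(position, velocity, mass, force, dt, steps):
    """Integrate M * x'' = force(x) with the velocity Verlet scheme.

    Returns the list of positions visited and the final velocity.
    """
    trajectory = [position]
    f = force(position)
    for _ in range(steps):
        velocity_half = velocity + 0.5 * dt * f / mass       # half-step velocity update
        position = position + dt * velocity_half             # full-step position update
        f = force(position)                                  # force at the new position
        velocity = velocity_half + 0.5 * dt * f / mass       # second half-step velocity update
        trajectory.append(position)
    return trajectory, velocity


def harmonic_force(x, spring_constant=1.0):
    """Toy stand-in for -dE/dx: a harmonic well with E = 0.5 * k * x**2."""
    return -spring_constant * x


if __name__ == "__main__":
    path, v = velocity_verlet(1.0, 0.0, 1.0, harmonic_force, dt=0.01, steps=1000)
    energy = 0.5 * v ** 2 + 0.5 * path[-1] ** 2
    print(path[-1], energy)  # energy should stay close to its initial value of 0.5
```

In an actual ab initio run the call to `force` is where the electronic structure calculation would sit, which is why the cost per time step dominates the simulation.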

  • Ecoinformatics

    Ecoinformatics, or ecological informatics, is the science of information in ecology and environmental science. It integrates environmental and information sciences to define entities and natural processes with language common to both humans and computers. However, this is a rapidly developing area in ecology and there are alternative perspectives on what constitutes ecoinformatics. A few definitions have been circulating, mostly centered on the creation of tools to access and analyze natural system data. However, the scope and aims of ecoinformatics are certainly broader than the development of metadata standards to be used in documenting datasets. Ecoinformatics aims to facilitate environmental research and management by developing ways to access and integrate databases of environmental information, and by developing new algorithms that enable different environmental datasets to be combined to test ecological hypotheses. Ecoinformatics is related to the concept of ecosystem services. Ecoinformatics characterizes the semantics of natural system knowledge. For this reason, much of today's ecoinformatics research relates to the branch of computer science known as knowledge representation, and active ecoinformatics projects are developing links to activities such as the Semantic Web. Current initiatives to effectively manage, share, and reuse ecological data are indicative of the increasing importance of fields like ecoinformatics to develop the foundations for effectively managing ecological information. Examples of these initiatives are National Science Foundation Datanet projects, DataONE, Data Conservancy, and Artificial Intelligence for Environment & Sustainability. == Software Development Lifecycle == Central to the concept of ecoinformatics is the Software Development Lifecycle (SDLC), a systematic framework for writing, implementing, and maintaining software products.
Typically in ecoinformatics projects, the development pipeline includes data collection, usually from several different environmental data sources, then integrating these data sources, and then analyzing the data. Here, each step of the SDLC is described in the context of ecoinformatics, per Michener et al. It is important to note that the plan, collect, assure, describe and preserve steps refer to the data collection entity, which can be individual researchers or large data-collection networks, while the discover, integrate, and analyze steps typically refer to the individual researcher. Plan: Ecoinformatics projects require data from several databases. Each database holds different data, and therefore researchers should identify what types of environmental or ecological data they will need to answer their research question. Collect: Data is collected in several different ways. In ecoinformatics, this is usually restricted to manually entering data into a spreadsheet or parsing data from an existing database. The growth of relational databases has made it easier for ecologists to download relevant data and integrate datasets. Assure: Data entries should be checked thoroughly to validate their accuracy and usability, for example by checking for outliers and erroneous points. The same principle applies to data downloaded from datasets. This responsibility falls on both the ecologist downloading the data and the entity that sets up the data collection system. Describe: An accurate description of the metadata of a dataset that is used in a study should include enough information to deduce the data collection and processing methodology, when the data were collected, why the data were collected, and how the data were stored. This is important for reproducibility, especially for projects that build on each other and may recycle data. Preserve: After data is collected by an institutional entity, it should be archived such that it is easily accessible.
Ideally, this is in databases that are maintained and not at risk of deprecation. Discover: While there are good practices for discovering data to start a research project, this process is often marred by a lack of usable, published data, as researchers may collect data specific to their study but not publish it for wider use. On the data collection end, this can be addressed by better data-sharing practices, such as by linking datasets when publishing papers or studies. On the data procurement end, this can be addressed by more precise data searching, such as using key words to find relevant datasets. Integrate: Synthesizing datasets can be difficult and labor-intensive, largely due to the methodological differences in data collection. There are several approaches to this, but the best practices typically involve computational approaches, namely using R or Python, to automate the processes and prevent errors. Analyze: Data analysis can take several forms, and should be tailored to the specific ecological project. However, all data analysis methods should be well-documented, including the procedure for analysis, justification for analysis methods, and any shortcomings in a specific approach. == Applications of Ecoinformatics Across Ecology == === Ecosystem Ecology === Ecosystem studies, by definition, encompass interactions across the entire life sciences spectrum, from microscopic biochemical reactions to large-scale geological phenomena. As a result, big databases may not be designed specifically for any particular research question, but should be inclusive enough to support most studies. Since ecosystem-level questions require a broad perspective, data-related ecosystem projects would likely incorporate data from several databases.
A common framework for incorporating data into ecosystem-level studies is the network science model, in which data collection mechanisms and resources are treated like a large, interconnected network instead of individual entities. The network may include several data collection stations within one databases, or may span across multiple databases. Currently there are several large-scale networks, but they do not generate data on the scale to consider ecology as a big data science. A current challenge for ecoinformatics in ecosystem ecology is that most funding is prioritized for generating new data rather than maintaining existing data infrastructures. Integrating data across the different spatial scales can also be difficult, since each dataset may hold different types of data. === Urban Ecology === Source: The current push for smart cities, and sensor network integration into infrastructure, has positioned as a major source of data for ecological studies. Typical urban ecology questions address the effects of urbanization on the local ecosystem, and how to drive future development to promote urban biodiversity. While sensor networks in cities typically collect environmental data to optimize city processes, they may also be used for ecological initiatives, especially with respect to understanding the complex, multi-layered relationship between cities and their local ecosystem. It can also be used to better understand the current landscape of cities, and identify avenues for rewinding of cities. For example, analyzing mobility patterns can identify areas that may lend themselves well to building parks and green spaces. Bird watching data can also be used to identify the types of bird species in a local area. === Infectious Disease === Source: Like other disciplines of ecology, emerging infectious disease and epidemiology span multiple scales, from understanding the genetics that drive disease trends to large-scale spatiotemporal analyses. 
As a result, infectious disease studies can incorporate everything from bioinformatics, genetic sequences, amino acid sequences, and environmental observation data. On the micro-scale, these data can then be used to predict infectivity/transmissibility, drug resistance, drug candidates, and mutation sites. On the macro-scale, it can be used to identify societal trends or environmental factors that lend themselves to spillover, locations of infection, and practices that cause disease transmission. == Databases == Source: USGS National Streamflow sensor network GBIF Neotoma Paleobiology database European Vegetation Archive USDA Forest Inventory Analysis TRY BIEN AmeriFlux TEAM iNaturalist NEON GLEON LTER CZO TERN SAEON
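The Assure and Integrate steps of the data life cycle described earlier can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two record sets, the field names (`temp_f`, `count`), the helper names `flag_outliers` and `integrate`, and the interquartile-range rule used to flag outliers are all hypothetical choices, not part of any real ecoinformatics database or of the workflow described by Michener et al.

```python
from statistics import quantiles

# Hypothetical records from two sources; all names and values are invented.
stream_temps = [  # source A: water temperature in degrees Fahrenheit
    {"site": "S1", "date": "2020-06-01", "temp_f": 59.0},
    {"site": "S1", "date": "2020-06-02", "temp_f": 60.8},
    {"site": "S2", "date": "2020-06-01", "temp_f": 62.6},
    {"site": "S2", "date": "2020-06-02", "temp_f": 61.7},
    {"site": "S3", "date": "2020-06-01", "temp_f": 212.0},  # implausible entry
    {"site": "S3", "date": "2020-06-02", "temp_f": 60.0},
]
bird_counts = [  # source B: bird observations per site and day
    {"site": "S1", "date": "2020-06-01", "count": 12},
    {"site": "S1", "date": "2020-06-02", "count": 9},
    {"site": "S2", "date": "2020-06-01", "count": 20},
    {"site": "S3", "date": "2020-06-01", "count": 7},
    {"site": "S3", "date": "2020-06-02", "count": 15},
]


def flag_outliers(values, k=1.5):
    """Assure: return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [i for i, v in enumerate(values) if v < low or v > high]


def integrate(temps, counts):
    """Integrate: join both sources on (site, date), converting to Celsius."""
    counts_by_key = {(r["site"], r["date"]): r["count"] for r in counts}
    merged = []
    for r in temps:
        key = (r["site"], r["date"])
        if key in counts_by_key:  # keep only records present in both sources
            merged.append({
                "site": r["site"],
                "date": r["date"],
                "temp_c": round((r["temp_f"] - 32) * 5 / 9, 2),
                "count": counts_by_key[key],
            })
    return merged


outliers = flag_outliers([r["temp_f"] for r in stream_temps])
clean = [r for i, r in enumerate(stream_temps) if i not in outliers]
merged = integrate(clean, bird_counts)
for row in merged:
    print(row)
```

Scripting both steps, rather than editing a spreadsheet by hand, keeps the outlier rule and the unit conversion visible and repeatable, which also supports the documentation called for in the Analyze step.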

  • E-Science librarianship

    E-Science librarianship

    E-Science librarianship refers to a role for librarians in e-Science.

    == Early scholars ==

    Early references to e-Science and librarianship involve information studies scholars researching cyberinfrastructure and emerging networked information and knowledge communities. Notably, Christine Borgman, Professor and Presidential Chair in Information Studies at the University of California, Los Angeles (UCLA), was a key player in bringing e-Science, and the idea of networked knowledge communities, to the attention of the library profession. In 2004, as a visiting fellow at the Oxford Internet Institute, she conducted research and lectured publicly on e-Science, Digital Libraries, and Knowledge Communities. In 2007 Anna K. Gold, formerly of MIT and Cal Poly, San Luis Obispo, authored a series of articles in D-Lib Magazine that opened the door for academic libraries to begin exploring roles, skills, and strategies for engaging in e-Science: "Cyberinfrastructure, Data, and Libraries, Part 1: A Cyberinfrastructure Primer for Librarians" and "Cyberinfrastructure, Data, and Libraries, Part 2: Libraries and the Data Challenge: Roles and Actions for Libraries".

    == Academic research and health sciences libraries ==

    In 2007, the Association of Research Libraries (ARL) e-Science task force issued its report on e-Science and librarianship. The ARL's report encouraged its member libraries to position themselves to engage with researchers involved in e-Science (eScience) by cultivating new research support strategies and developing their digital scholarship infrastructure. E-Science has multiple attributes; Tony and Jessie Hey framed e-Science for the library community by characterizing it as a research methodology: "e-Science is not a new scientific discipline in its own right: e-Science is shorthand for the set of tools and technologies required to support collaborative, networked science".

    In addition to academic libraries' interest in supporting their researchers engaging in e-Science, the health sciences library community also emerged as a major proponent of creating librarian positions to support the information needs of large-scale, networked research collaborations on their campuses. Neil Rambo, current director of NYU's Health Sciences Library and former director of University of Washington Health Sciences Library, was the first to use the term in the Journal of the Medical Library Association, in his 2009 editorial "e-Science and the Biomedical Library". Rambo's definition of e-Science highlighted the potential e-Science held for creating data as a research product: "E-science is a new research methodology, fueled by networked capabilities and the practical possibility of gathering and storing vast amounts of data." In response to this article, the University of Massachusetts Medical School Lamar Soutter Library and the National Network of Libraries of Medicine, New England Region encouraged health sciences libraries to cooperate to identify skills and develop a program for training e-Science librarians.

    Then, in 2013, Shannon Bohle, an archivist who was employed in the library at Cold Spring Harbor Laboratory, an NCI-designated basic cancer research facility, used experience gained there and previous papers and presentations about preserving scientific archival materials to expand the traditional definition of e-Science by including the terms, principles, and practices used in archival science. The expanded definition included the "long-term storage and accessibility of all materials generated through the scientific process," together with examples of material types traditionally preserved in archives, like "electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints," as well as library materials ("print and/or electronic publications").

    == Roles ==

    Many areas of science are about to be transformed by the availability of vast amounts of new scientific data that can potentially provide insights at a level of detail never before envisaged. However, this new data-dominant era brings new challenges for scientists, who will need the skills and technologies of both computer scientists and the library community to manage, search, and curate these new data resources. Libraries will not be immune from change in this new world of research. Karen Williams identifies roles in the following areas for librarians in the developing world of e-Science:

    Campus Engagement
    Content/Collection Development and Management
    Teaching and Learning
    Scholarly Communication
    E-Scholarship and Digital Tools
    Reference/Help Services
    Outreach
    Fund Raising
    Exhibit and Event Planning
    Leadership

    == Challenges for research libraries ==

    E-science tends toward inter- and multidisciplinary approaches that depend on computation and computer science. Research libraries have traditionally been discipline focused and, although increasingly technologically sophisticated, do not have systems of the scale or complexity of the e-science environment. E-science is data intensive, but research libraries have not typically been responsible for scientific data. E-science is frequently conducted in a team context, often distributed across multiple institutions and on a global scale. The primary constituency of libraries generally comprises those affiliated with the local institution. Licenses for electronic content are typically restricted to a particular institutional community, and the infrastructure to move institutional licenses into a multi-institutional environment is not well developed. E-science challenges all these traditional paradigms of research library organization and services.

    == Skills ==

    Garritano & Carlson were among the first to outline a skill set for librarians seeking to support the data needs of e-Science; they identified five skill categories that librarians new to this area should expect to adapt or develop when participating in such projects:

    Library and information science expertise
    Subject expertise
    Partnerships and outreach (both internal and external)
    Participating in sponsored research
    Balancing workload

    An example of librarians reconfiguring traditional librarian skills to meet the needs of researchers engaging in e-Science is Witt & Carlson's adaptation of the traditional reference interview into a "data interview" in order to provide effective data management and e-Science services. This interview consists of ten practical queries necessary for understanding the provenance of, and expectations for the preservation of, datasets typical of e-Science. The queries also help illustrate some of the educational tools and skills needed by a librarian new to e-Science. "What is the story of the data? What form and format are the data in? What is the expected lifespan of the dataset? How could the data be used, reused, and repurposed? How large is the dataset, and what is its rate of growth? Who are the potential audiences for the data? Who owns the data? Does the dataset include any sensitive information? What publications or discoveries have resulted from the data? How should the data be made accessible?"

    == Resources ==

    In 2009 the Lamar Soutter Library at the University of Massachusetts Medical School (UMMS) and the National Network of Libraries of Medicine, New England Region (NN/LM NER) funded an e-Science program for building the skills highlighted above for librarians. Elaine Russo Martin, Director of Library Services at the Lamar Soutter Library and Director of the NN/LM NER, developed this comprehensive e-Science program to build librarians' subject expertise in the sciences, their data management skills, and their familiarity with cyberinfrastructure and e-Science. Three major products of this program are the e-Science web portal for librarians, the E-Science Symposium, and the New England Collaborative Data Management Curriculum (NECDMC). The portal includes educational resources for specific tools and subject/discipline tutorials and modules to assist librarians new to e-Science. UMMS and NN/LM NER also publish an open access journal called the Journal of eScience Librarianship.

  • Artificial consciousness

    Artificial consciousness

    Artificial consciousness, also known as machine consciousness, synthetic consciousness, or digital consciousness, is consciousness hypothesized to be possible for artificial intelligence. It is also the corresponding field of study, which draws insights from philosophy of mind, philosophy of artificial intelligence, cognitive science and neuroscience. The term "sentience" can be used when specifically designating ethical considerations stemming from a form of phenomenal consciousness (P-consciousness, or the ability to feel qualia). Since sentience involves the ability to experience ethically positive or negative (i.e., valenced) mental states, it may justify welfare concerns and legal protection, as with non-human animals. Some scholars believe that consciousness is generated by the interoperation of various parts of the brain; these mechanisms are labeled the neural correlates of consciousness (NCC). Some further believe that constructing a system (e.g., a computer system) that can emulate this NCC interoperation would result in a system that is conscious. Some scholars reject the possibility of non-biological conscious beings.

    == Philosophical views ==

    As there are many hypothesized types of consciousness, there are many potential implementations of artificial consciousness. In the philosophical literature, perhaps the most common taxonomy of consciousness is into "access" and "phenomenal" variants. Access consciousness concerns those aspects of experience that can be apprehended, while phenomenal consciousness concerns those aspects of experience that seemingly cannot be apprehended, instead being characterized qualitatively in terms of "raw feels", "what it is like" or qualia.

    === Plausibility debate ===

    Type-identity theorists and other skeptics hold the view that consciousness can be realized only in particular physical systems because consciousness has properties that necessarily depend on physical constitution. In his 2001 article "Artificial Consciousness: Utopia or Real Possibility," Giorgio Buttazzo says that a common objection to artificial consciousness is that, "Working in a fully automated mode, they [the computers] cannot exhibit creativity, unreprogrammation (which means can 'no longer be reprogrammed', from rethinking), emotions, or free will. A computer, like a washing machine, is a slave operated by its components." For other theorists (e.g., functionalists), who define mental states in terms of causal roles, any system that can instantiate the same pattern of causal roles, regardless of physical constitution, will instantiate the same mental states, including consciousness.

    ==== Thought experiments ====

    David Chalmers proposed two thought experiments intending to demonstrate that "functionally isomorphic" systems (those with the same "fine-grained functional organization", i.e., the same information processing) will have qualitatively identical conscious experiences, regardless of whether they are based on biological neurons or digital hardware.

    The "fading qualia" is a reductio ad absurdum thought experiment. It involves replacing, one by one, the neurons of a brain with a functionally identical component, for example based on a silicon chip. Chalmers makes the hypothesis, knowing it in advance to be absurd, that "the qualia fade or disappear" when neurons are replaced one-by-one with identical silicon equivalents. Since the original neurons and their silicon counterparts are functionally identical, the brain's information processing should remain unchanged, and the subject's behaviour and introspective reports would stay exactly the same. Chalmers argues that this leads to an absurd conclusion: the subject would continue to report normal conscious experiences even as their actual qualia fade away. He concludes that the subject's qualia actually don't fade, and that the resulting robotic brain, once every neuron is replaced, would remain just as sentient as the original biological brain.

    Similarly, the "dancing qualia" thought experiment is another reductio ad absurdum argument. It supposes that two functionally isomorphic systems could have different perceptions (for instance, seeing the same object in different colors, like red and blue). It involves a switch that alternates between a chunk of brain that causes the perception of red and a functionally isomorphic silicon chip that causes the perception of blue. Since both perform the same function within the brain, the subject would not notice any change during the switch. Chalmers argues that this would be highly implausible if the qualia were truly switching between red and blue, hence the contradiction. Therefore, he concludes that the equivalent digital system would not only experience qualia, but it would perceive the same qualia as the biological system (e.g., seeing the same color). Greg Egan's short story Learning To Be Me (mentioned in §In fiction) illustrates how undetectable duplication of the brain and its functionality could be from a first-person perspective.

    Critics object that Chalmers' proposal begs the question in assuming that all mental properties and external connections are already sufficiently captured by abstract causal organization. Van Heuveln et al. argue that the dancing qualia argument contains an equivocation fallacy, conflating a "change in experience" between two systems with an "experience of change" within a single system. Mogensen argues that the fading qualia argument can be resisted by appealing to vagueness at the boundaries of consciousness and the holistic structure of conscious neural activity, which suggests consciousness may require specific biological substrates rather than being substrate-independent. Anil Seth argues that the complexity of brain neurons intrinsically matters in addition to their function and that it is not possible to replace any part of the brain with a perfect silicon equivalent. He points out that some biological neurons exhibit activity aimed at cleaning up metabolic waste products, and writes that a perfect silicon replacement would require a silicon-based metabolism, but silicon is not suitable for creating such artificial metabolism.

    ==== In large language models ====

    In 2022, Google engineer Blake Lemoine made a viral claim that Google's LaMDA chatbot was sentient. Lemoine supplied as evidence the chatbot's humanlike answers to many of his questions; however, the chatbot's behavior was judged by the scientific community as likely a consequence of mimicry, rather than machine sentience. Lemoine's claim was widely derided as ridiculous. Moreover, attributing consciousness solely on the basis of LLM outputs or the immersive experience created by an algorithm is considered a fallacy. However, while philosopher Nick Bostrom states that LaMDA is unlikely to be conscious, he additionally poses the question of "what grounds would a person have for being sure about it?" One would have to have access to unpublished information about LaMDA's architecture, and also would have to understand how consciousness works, and then figure out how to map the philosophy onto the machine: "(In the absence of these steps), it seems like one should be maybe a little bit uncertain. [...] there could well be other systems now, or in the relatively near future, that would start to satisfy the criteria."

    David Chalmers argued in 2023 that LLMs today display impressive conversational and general intelligence abilities, but are likely not conscious yet, as they lack some features that may be necessary, such as recurrent processing, a global workspace, and unified agency. Nonetheless, he considers that non-biological systems can be conscious, and suggested that future, extended models (LLM+s) incorporating these elements might eventually meet the criteria for consciousness, raising both profound scientific questions and significant ethical challenges. However, the view that consciousness can exist without biological phenomena is controversial and some reject it.

    Kristina Šekrst cautions that anthropomorphic terms such as "hallucination" can obscure important ontological differences between artificial and human cognition. While LLMs may produce human-like outputs, she argues that this does not justify ascribing mental states or consciousness to them. Instead, she advocates for an epistemological framework (such as reliabilism) that recognizes the distinct nature of AI knowledge production. She suggests that apparent understanding in LLMs may be a sophisticated form of AI hallucination. She also questions what would happen if an LLM were trained without any mention of consciousness.

    === Testing ===

    Sentience is an inherently first-person phenomenon. Because of that, and due to the lack of an empirical definition of sentience, directly measuring it may be impossible. Although systems may display numerous behaviors correlated with sentience, determining whether a system is sentient is known as the hard pr

  • Data (word)

    Data (word)

    The word data is most often used as a singular collective mass noun in educated everyday usage. However, due to the history and etymology of the word, considerable controversy has existed on whether it should be considered a mass noun used with verbs conjugated in the singular, or should be treated as the plural of the now-rarely-used datum.

    == Usage in English ==

    In one sense, data is the plural form of datum. Datum actually can also be a count noun with the plural datums (see usage in datum article) that can be used with cardinal numbers (e.g., "80 datums"); data (originally a Latin plural) is not used like a normal count noun with cardinal numbers and can be plural with plural determiners such as these and many, or it can be used as a mass noun with a verb in the singular form. Even when a very small quantity of data is referenced (one number, for example), the phrase piece of data is often used, as opposed to datum. The debate over appropriate usage continues, but "data" as a singular form is far more common.

    In English, the word datum is still used in the general sense of "an item given". In cartography, geography, nuclear magnetic resonance and technical drawing, it is often used to refer to a single specific reference datum from which distances to all other data are measured. Any measurement or result is a datum, though data point is now far more common.

    Data is indeed most often used as a singular mass noun in educated everyday usage. Some major newspapers, such as The New York Times, use it either in the singular or plural. In The New York Times, the phrases "the survey data are still being analyzed" and "the first year for which data is available" have appeared within one day. The Wall Street Journal explicitly allows this usage in its style guide. The Associated Press style guide classifies data as a collective noun that takes the singular when treated as a unit but the plural when referring to individual items (e.g., "The data is sound" and "The data have been carefully collected").

    In scientific writing, data is often treated as a plural, as in "These data do not support the conclusions", but the word is also used as a singular mass entity like information (e.g., in computing and related disciplines). British usage now widely accepts treating data as singular in standard English, including everyday newspaper usage at least in non-scientific use. UK scientific publishing still prefers treating it as a plural. Some UK university style guides recommend using data for both singular and plural use, and others recommend treating it only as a singular in connection with computers. The IEEE Computer Society allows usage of data as either a mass noun or plural based on author preference, while IEEE in the editorial style manual indicates to always use the plural form. Some professional organizations and style guides require that authors treat data as a plural noun. For example, the Air Force Flight Test Center once stated that the word data is always plural, never singular.

  • Knowledge organization system

    Knowledge organization system

    Knowledge organization system (KOS), concept system, or concept scheme is the generic term used in knowledge organization (KO) for the selection of concepts with an indication of selected semantic relations. Despite their differences in type, coverage, and application, all KOS aim to support the organization of knowledge and information to facilitate their management and retrieval. KOS vary in complexity from simple sorted lists to complex relational networks. They represent both structural and functional features, and serve to eliminate ambiguity, control synonyms, establish relationships, and present properties. From their origins in library and information science (LIS), KOS have been applied to other domains and disciplines within science and industry, although scholarly research and debate remain primarily within the KO field. Challenges of KOS include ambiguity of terminology, repercussions of biased systems, and potential obsolescence. KOS can be expressed in RDF and RDFS as per the Simple Knowledge Organization System (SKOS) recommendation by W3C, which aims to enable the sharing and linking of KOS via the Web. One of the largest collections of KOS is the BARTOC registry.

    == Types ==

    While different schema of KOS have been proposed, most are generally arranged in terms of the complexity of their construction and maintenance. Some scholars argue that organizing KOS on a spectrum oversimplifies the shared characteristics among them, and may even result in a non-ideal structure being chosen. The following types are not exhaustive, and are often not mutually exclusive in practice.

    === Term lists ===

    Term lists are the least structured form of KOS. They include lists, glossaries, dictionaries, and synonym rings. Authority files and gazetteers may also be considered term lists; however, other scholars categorize them, along with directories, as "metadata-like models". Examples include the Union List of Artist Names name authority file and the GeoNames gazetteer.

    === Categorization and classification ===

    KOS that emphasize specific (and often hierarchical) structures include subject headings, taxonomies, categorization schema, and classification schema & systems. Despite inconsistent use of the terms "categorization" and "classification" in some literature, categorization generally refers to loosely assembled grouping schema that may include attributes that are not mutually exclusive (or that have fuzzy boundaries), while classification refers to the arrangement of non-overlapping and mutually exclusive classes. Classification schema may be universal (such as Dewey Decimal Classification and Information Coding Classification) or domain-specific (such as the National Library of Medicine Classification).

    === Relationship models ===

    The most complex types of KOS, which make use of connections between concepts, include thesauri, semantic networks, and ontologies. One of the most prominent examples of a semantic network is WordNet.

    === Others ===

    Certain structures that have been proposed as types of KOS, but are not consistently included in schema, include folksonomies, topic maps, web directory structures, publication organization systems, and bibliometric maps. Some KOS organize other KOS themselves; for instance, PeriodO is a gazetteer of periodization categories.

    == Applications ==

    Some early KOS were developed as a support system for abstracting and indexing services to be used by specially trained searchers. With the growth of information digitization, KOS became increasingly accessible and usable, and more complex structures were developed. Prominent examples of KOS outside of LIS include organism taxonomy in biology, the periodic table of elements in chemistry, SIC and NAICS classification systems for industry & business, and the AGROVOC agricultural controlled vocabulary.

    == Challenges ==

    The study and design of KOS is an ongoing topic of discussion among KO scholars.

    === Terminology ===

    "[There is] a serious lack of vocabulary control in the literature on controlled vocabulary."

    Inconsistency of terminology within the study of KOS is a common issue. For instance, "ontology" is used both for a specific type of KOS and as a generic term for any KOS. The terms "taxonomy", "classification", and "categorization" are also sometimes used interchangeably.

    === Bias ===

    As knowledge can be historically and culturally biased, scholars have also discussed how KOS themselves can perpetuate harmful practices or stereotypes. For example, a number of concerns and criticisms about the classification of mental disorders in the Diagnostic and Statistical Manual of Mental Disorders have been raised, contributing to ongoing revisions. Ethical and intentional design approaches have been proposed for multi-perspective KOS in efforts to mitigate bias and other harmful practices.

    === Obsolescence ===

    The possible obsolescence of the thesaurus and other simpler KOS has been the topic of debate, especially in the face of increasingly complex ontologies, the growing usage of "Google-like retrieval systems", and the move of KO theory and research away from LIS and toward computer science. Supporters of thesauri argue for their continued usefulness for metadata enrichment, vocabulary mapping, and web services, as well as for their usage in specific domains such as corporate intranets and digital image libraries.
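The progression described under Types, from term lists through hierarchical classification to relationship models, can be illustrated with a small data structure. The sketch below is a toy thesaurus, not an implementation of any real KOS: the `Concept` class, the sample concepts, and the helper functions `preferred` and `ancestors` are hypothetical. Only the field names echo the SKOS properties `prefLabel`, `altLabel`, and `broader`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    pref_label: str                                       # preferred term (cf. SKOS prefLabel)
    alt_labels: List[str] = field(default_factory=list)   # synonyms (cf. SKOS altLabel)
    broader: Optional[str] = None                         # parent concept (cf. SKOS broader)


# Hypothetical sample vocabulary, invented for illustration.
concepts = [
    Concept("animals"),
    Concept("birds", alt_labels=["aves"], broader="animals"),
    Concept("owls", alt_labels=["strigiformes"], broader="birds"),
]
index = {c.pref_label: c for c in concepts}

# A plain term list is just the sorted labels, with no relations at all.
term_list = sorted(index)


def preferred(term):
    """Synonym control: map a term or synonym to its preferred label."""
    term = term.lower()
    for c in concepts:
        if term == c.pref_label or term in c.alt_labels:
            return c.pref_label
    return None


def ancestors(term):
    """Hierarchy: follow 'broader' links upward from a term."""
    label = preferred(term)
    chain = []
    while label is not None and index[label].broader is not None:
        label = index[label].broader
        chain.append(label)
    return chain


print(term_list)
print(preferred("Aves"))
print(ancestors("strigiformes"))
```

Each added field corresponds to one of the functions attributed to KOS above: the alternative labels control synonyms, and the broader links establish relationships that a flat term list cannot express.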

  • Sparse identification of non-linear dynamics

    Sparse identification of non-linear dynamics

    Sparse identification of nonlinear dynamics (SINDy) is a data-driven algorithm for obtaining dynamical systems from data. Given a series of snapshots of a dynamical system and its corresponding time derivatives, SINDy performs a sparsity-promoting regression (such as LASSO and sparse Bayesian inference) on a library of nonlinear candidate functions of the snapshots against the derivatives to find the governing equations. This procedure relies on the assumption that most physical systems only have a few dominant terms which dictate the dynamics, given an appropriately selected coordinate system and quality training data. It has been applied to identify the dynamics of fluids, based on proper orthogonal decomposition, as well as other complex dynamical systems, such as biological networks.

    == Mathematical Overview ==

    First, consider a dynamical system of the form

    $$\dot{\mathbf{x}} = \frac{d}{dt}\mathbf{x}(t) = \mathbf{f}(\mathbf{x}(t)),$$

    where $\mathbf{x}(t) \in \mathbb{R}^{n}$ is a state vector (snapshot) of the system at time $t$ and the function $\mathbf{f}(\mathbf{x}(t))$ defines the equations of motion and constraints of the system. The time derivative may be either prescribed or numerically approximated from the snapshots.

    With $\mathbf{x}$ and $\dot{\mathbf{x}}$ sampled at $m$ equidistant points in time ($t_{1}, t_{2}, \cdots, t_{m}$), these can be arranged into matrices of the form

    $$\mathbf{X} = \begin{bmatrix} \mathbf{x}^{\mathsf{T}}(t_{1}) \\ \mathbf{x}^{\mathsf{T}}(t_{2}) \\ \vdots \\ \mathbf{x}^{\mathsf{T}}(t_{m}) \end{bmatrix} = \begin{bmatrix} x_{1}(t_{1}) & x_{2}(t_{1}) & \cdots & x_{n}(t_{1}) \\ x_{1}(t_{2}) & x_{2}(t_{2}) & \cdots & x_{n}(t_{2}) \\ \vdots & \vdots & \ddots & \vdots \\ x_{1}(t_{m}) & x_{2}(t_{m}) & \cdots & x_{n}(t_{m}) \end{bmatrix},$$

    and similarly for $\dot{\mathbf{X}}$.

    Next, a library $\mathbf{\Theta}(\mathbf{X})$ of nonlinear candidate functions of the columns of $\mathbf{X}$ is constructed, which may be constant, polynomial, or more exotic functions (like trigonometric and rational terms, and so on):

    $$\mathbf{\Theta}(\mathbf{X}) = \begin{bmatrix} \vert & \vert & \vert & \vert & & \vert & \vert & \\ 1 & \mathbf{X} & \mathbf{X}^{2} & \mathbf{X}^{3} & \cdots & \sin(\mathbf{X}) & \cos(\mathbf{X}) & \cdots \\ \vert & \vert & \vert & \vert & & \vert & \vert & \end{bmatrix}$$

    The number of possible model structures from this library is combinatorially high. $\mathbf{f}(\mathbf{x}(t))$ is then substituted by $\mathbf{\Theta}(\mathbf{X})$ and a vector of coefficients $\mathbf{\Xi} = \left[\mathbf{\xi}_{1}\ \mathbf{\xi}_{2}\ \cdots\ \mathbf{\xi}_{n}\right]$ determining the active terms in $\mathbf{f}(\mathbf{x}(t))$:

    $$\dot{\mathbf{X}} = \mathbf{\Theta}(\mathbf{X})\,\mathbf{\Xi}$$

    Because only a few terms are expected to be active at each point in time, an assumption is made that $\mathbf{f}(\mathbf{x}(t))$ admits a sparse representation in $\mathbf{\Theta}(\mathbf{X})$. This then becomes an optimization problem in finding a sparse $\mathbf{\Xi}$ which optimally embeds $\dot{\mathbf{X}}$. In other words, a parsimonious model is obtained by performing least squares regression on the system above with sparsity-promoting ($L_{1}$) regularization

    $$\mathbf{\xi}_{k} = \underset{\mathbf{\xi}'_{k}}{\arg\min} \left\| \dot{\mathbf{X}}_{k} - \mathbf{\Theta}(\mathbf{X})\,\mathbf{\xi}'_{k} \right\|_{2} + \lambda \left\| \mathbf{\xi}'_{k} \right\|_{1},$$

    where $\lambda$ is a regularization parameter. Finally, the sparse set of $\mathbf{\xi}_{k}$ can be used to reconstruct the dynamical system:

    $$\dot{x}_{k} = \mathbf{\Theta}(\mathbf{x})\,\mathbf{\xi}_{k}$$
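The procedure above can be sketched in plain Python on a small synthetic example. Several choices here are assumptions made for illustration and are not taken from the text: the test system (a damped linear oscillator with a known analytic solution), the polynomial library up to degree two, the threshold value, and all function names. The sparse regression also swaps in sequentially thresholded least squares, a simple sparsity-promoting scheme often used with SINDy, in place of the $L_{1}$-regularized form written above. The derivatives are prescribed analytically rather than approximated numerically.

```python
import math


def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def least_squares(theta, y, active):
    """Fit y with the library columns in `active` via the normal equations."""
    A = [[sum(row[i] * row[j] for row in theta) for j in active] for i in active]
    b = [sum(row[i] * yk for row, yk in zip(theta, y)) for i in active]
    return solve_linear(A, b)


def stlsq(theta, y, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares for one state variable."""
    p = len(theta[0])
    active = list(range(p))
    xi = [0.0] * p
    for _ in range(n_iter):
        coef = least_squares(theta, y, active)
        xi = [0.0] * p
        for idx, c in zip(active, coef):
            xi[idx] = c if abs(c) >= threshold else 0.0  # prune small terms
        kept = [i for i in active if xi[i] != 0.0]
        if kept == active or not kept:
            break
        active = kept  # refit using only the surviving terms
    return xi


# Synthetic snapshots of x' = -0.1 x + 2 y, y' = -2 x - 0.1 y (assumed system).
ts = [0.05 * k for k in range(200)]
X = [(math.exp(-0.1 * t) * math.cos(2 * t),
      -math.exp(-0.1 * t) * math.sin(2 * t)) for t in ts]
dX = [(-0.1 * x + 2 * y, -2 * x - 0.1 * y) for x, y in X]

# Candidate library Theta(X): constant, linear, and quadratic terms.
names = ["1", "x", "y", "x^2", "xy", "y^2"]
theta = [[1.0, x, y, x * x, x * y, y * y] for x, y in X]

# One sparse coefficient vector xi_k per state variable.
Xi = [stlsq(theta, [d[k] for d in dX]) for k in range(2)]
for var, xi in zip(["x'", "y'"], Xi):
    terms = [f"{c:+.3f} {name}" for c, name in zip(xi, names) if c != 0.0]
    print(var, "=", " ".join(terms))
```

With noise-free data the regression should recover only the linear terms of the assumed system. Noisy or numerically differentiated data generally call for more careful tuning of the threshold (or of $\lambda$ in the $L_{1}$ formulation).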

    Read more →
  • Pharmacy automation

    Pharmacy automation

    Pharmacy automation involves the mechanical processes of handling and distributing medications. Any pharmacy task may be involved, including counting small objects (e.g., tablets, capsules); measuring and mixing powders and liquids for compounding; tracking and updating customer information in databases (e.g., personally identifiable information (PII), medical history, drug interaction risk detection); and inventory management. This article focuses on the changes that have taken place in the local, or community, pharmacy since the 1960s. == History == Dispensing medications in a community pharmacy before the 1970s was a time-consuming operation. The pharmacist dispensed prescriptions in tablet or capsule form with a simple tray and spatula. Pharmaceutical manufacturers were developing new medications at an ever-increasing pace, and medication prices were rising steeply. A typical community pharmacist was working longer hours and was often forced to hire staff to handle the increased workload, which left less time to focus on safety issues. These factors led to the use of a machine to count medications. The original electronic portable digital tablet counting technology was invented in Manchester, England between 1967 and 1970 by the brothers John and Frank Kirby. "I had the original idea of how the machine would work and it was my patent, but it was a joint effort getting it to work in a saleable form. It was 3 years of very hard work. I had originally studied heavy electrical engineering before changing over to Medical School and qualifying as a Medical Doctor in 1968. In fact I was Senior House (Casualty) Officer (A&E or ER) in 1970 at North Manchester General Hospital when I filed the patent. I must have been the only hospital doctor in Britain with an oscilloscope, a soldering iron and a drawing board in his room in the Doctors' Residence. The housekeepers were bemused by all the wires.
Frank originally trained as a Banker but quit to take a job with a local electronics firm during the development. He died in 1987, a terrible loss." [Extract from personal communication received in March 2010 from John Kirby.] Frank and John Kirby and their associate Rodney Lester were pioneers in pharmacy automation and small-object counting technology. In 1967, the Kirbys invented a portable digital tablet counter to count tablets and capsules. With Lester they formed a limited company. In 1970, their invention was patented and put into production in Oldham, England. The tablet counter relieved pharmacies of the time-consuming manual counting of prescription drugs. A counting machine counted medications consistently, accurately, and quickly. This aspect of pharmacy automation was quickly adopted, and innovations emerged every decade to help the pharmacy industry deliver medications quickly, safely, and economically. Modern pharmacies can choose from many options for improving their workflow with this technology. === Chronology === On 1 January 1971, commercial production of the first portable digital tablet counters in the world began. John Kirby had filed U.K. patent number GB1358378(A) on 8 September 1970 and U.S. patent number 3789194 on 9 August 1971. These early electronic counters were designed to help pharmacies replace the common (but often inaccurate) practice of counting medications by hand. In 1975, the digital technology was exported to America. In early 1980, a dedicated research, development and production facility was built in Oldham, England at a cost of £500,000. Between 1982 and 1983, two separate development facilities were created: one in America, overseen by Rodney Lester, and one in England, overseen by the Kirby brothers. In 1987, Frank Kirby died. In 1989, John Kirby moved his UK facility to Devon, England. 
A simple-to-operate machine had been developed to count prescription medications accurately and quickly. Technology improvements soon resulted in a more compact model. The price of such equipment in 1980 was around £1,300. This was a substantial investment and a major financial consideration, but the pharmacy community considered a counting machine superior to hand-counting medications. These early devices became known as tablet counters, capsule counters, pill counters, or drug counters. The new counting technology replaced manual methods in many industries, such as vitamin and diet supplement manufacturing. Technicians needed a small, affordable device to count and bottle medications. In England and America, the 1980s and 1990s saw the development of new high-speed machines for counting and bottle filling. Like their pharmacy-based counterparts, these industrial units were designed to be fast and simple to operate, yet remain small and cost-effective. In America, in the late 1990s and early 2000s, a new type of tablet counter appeared. It was simple to use, compact, inexpensive, and had good counting accuracy. At the turn of the millennium, technical advances allowed the design of counters with a software verification system. An onboard computer displayed photo images of medications to help the pharmacist or pharmacy technician verify that the correct medication was being dispensed, and a database stored all prescriptions that were counted on the device. Between September 2005 and May 2007, American Capital made a major financial investment in Kirby Lester, which then relocated to a larger facility to expand its research and development capabilities. The move added extra space for product research and development (R&D). 
It allowed the opportunity to develop new advanced technology products that met the pharmacy's needs for simple, accurate, and cost-effective ways to dispense prescriptions safely. An early American type of integrated counter and packaging device was a third-generation step in the evolution of pharmacy automated devices. Later models held pre-counted containers of commonly prescribed medications. == Global variations == In the EU member states, legislation introduced in 1998 had a major effect on UK pharmacy operations. It effectively prohibited the use of tablet counters for counting and dispensing bulk-packaged tablets. Both usage and sales of the machines in the UK declined rapidly as a result of the introduction of blister packaging for medicines. == Current state of the industry == A tablet counter has become standard equipment at more than 30,000 sites in 35 countries (as of 2010), including many non-pharmacy sites, such as manufacturing facilities that use a counting machine as a check for small items. From the 1990s through 2012, numerous new pharmacy automation products came to market. During this time, counting technologies, robotics, workflow management software, and interactive voice response (IVR) systems were all introduced for retail (both chain and independent), outpatient, government, and closed-door pharmacies (mail order and central fill). Additionally, the concept of scalability, that is, migrating from an entry-level product to the next level of automation (e.g., from counting technology to robotics), was introduced, and a new product line based on it was launched in 1997. Many pharmacists are switching to automation for its increased speed, greater accuracy, and better security. As the industry evolves and customer expectations grow, automation is becoming less of a luxury and more of a necessity. 
For independent pharmacies especially, automation is now a means of keeping up with competition from large chain pharmacies. == Technological changes and design improvements == Constant developments in technology make the dispensing of prescription medications safer, more accurate and more efficient. In America, in 2008, "next-generation" counting and verification systems were introduced. Building on the counting technology employed in preceding models, these machines were designed to help the pharmacy operate more effectively. They were equipped with a new computer interface to a pharmacy management system, with workflow and inventory software. They also included "checks and balances" to ensure the technician and pharmacist were dispensing the correct medication for each patient. Accurate records of this kind are especially important when dealing with controlled substances such as narcotics. This was a step toward verifying 100% of the prescriptions dispensed by pharmacy staff. In America, in 2009, more advanced counters were designed that could dispense hands-free, a feature that many operators had desired. This allowed pharmacies to automate their most commonly dispensed medications via calibrated cassettes. Thirty of a pharmacy's common medications would now be dispensed automatically. Another new model doubled that throughput via an enclosed robotic mechanism. Robo

    Read more →
  • Flajolet–Martin algorithm

    Flajolet–Martin algorithm

    The Flajolet–Martin algorithm is an algorithm for approximating the number of distinct elements in a stream with a single pass and space consumption logarithmic in the maximal number of possible distinct elements in the stream (the count-distinct problem). The algorithm was introduced by Philippe Flajolet and G. Nigel Martin in their 1984 article "Probabilistic Counting Algorithms for Data Base Applications". It was later refined in "LogLog counting of large cardinalities" by Marianne Durand and Philippe Flajolet, and "HyperLogLog: The analysis of a near-optimal cardinality estimation algorithm" by Philippe Flajolet et al. In their 2010 article "An optimal algorithm for the distinct elements problem", Daniel M. Kane, Jelani Nelson and David P. Woodruff give an improved algorithm, which uses nearly optimal space and has optimal O(1) update and reporting times. == The algorithm == Assume that we are given a hash function $\mathrm{hash}(x)$ that maps input $x$ to integers in the range $[0; 2^L - 1]$, and where the outputs are sufficiently uniformly distributed. Note that the set of integers from 0 to $2^L - 1$ corresponds to the set of binary strings of length $L$. For any non-negative integer $y$, define $\mathrm{bit}(y, k)$ to be the $k$-th bit in the binary representation of $y$, such that:

$$y = \sum_{k \geq 0} \mathrm{bit}(y, k)\, 2^k.$$

We then define a function $\rho(y)$ that outputs the position of the least-significant set bit in the binary representation of $y$, and $L$ if no such set bit can be found as all bits are zero:

$$\rho(y) = \begin{cases} \min\{k \geq 0 \mid \mathrm{bit}(y, k) \neq 0\} & y > 0 \\ L & y = 0 \end{cases}$$

Note that with the above definition we are using 0-indexing for the positions, starting from the least significant bit. For example, $\rho(13) = \rho(1101_2) = 0$, since the least significant bit is a 1 (0th position), and $\rho(8) = \rho(1000_2) = 3$, since the least significant set bit is at the 3rd position. At this point, note that under the assumption that the output of our hash function is uniformly distributed, the probability of observing a hash output ending with $2^k$ (a one, followed by $k$ zeroes) is $2^{-(k+1)}$, since this corresponds to flipping $k$ heads and then a tail with a fair coin. Now the Flajolet–Martin algorithm for estimating the cardinality of a multiset $M$ is as follows:

1. Initialize a bit-vector BITMAP to be of length $L$ and contain all 0s.
2. For each element $x$ in $M$:
   1. Calculate the index $i = \rho(\mathrm{hash}(x))$.
   2. Set $\mathrm{BITMAP}[i] = 1$.
3. Let $R$ denote the smallest index $i$ such that $\mathrm{BITMAP}[i] = 0$.
4. Estimate the cardinality of $M$ as $2^R / \phi$, where $\phi \approx 0.77351$.
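The steps above can be sketched in a few lines of Python. This is an illustrative sketch rather than a tuned implementation: the 32-bit hash built from a truncated SHA-256 digest, the choice $L = 32$, and the function names `hash32`, `rho` and `fm_estimate` are assumptions made for the example. The bitmap is given $L + 1$ slots so that the index $\rho(0) = L$ stays in range.

```python
import hashlib

PHI = 0.77351  # correction factor quoted in the text
L = 32         # assumed hash width in bits


def hash32(x):
    """Assumed stand-in for hash(x): first 4 bytes of SHA-256, as an integer."""
    digest = hashlib.sha256(repr(x).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def rho(y):
    """0-indexed position of the least-significant set bit of y, or L if y == 0."""
    if y == 0:
        return L
    return (y & -y).bit_length() - 1


def fm_estimate(stream):
    """Single-pass Flajolet-Martin estimate of the number of distinct elements."""
    bitmap = [0] * (L + 1)  # one extra slot so that index rho(0) == L is valid
    for x in stream:
        bitmap[rho(hash32(x))] = 1
    # R is the smallest index whose bit is still 0.
    r = next((i for i, b in enumerate(bitmap) if b == 0), len(bitmap))
    return 2 ** r / PHI


# Usage: 10,000 items drawn from 500 distinct values.
print(fm_estimate(f"user{i % 500}" for i in range(10_000)))
```

Because only the bitmap is stored, repeated elements never change the result, and a single run can only return values of the form $2^R/\phi$.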
The idea is that if $n$ is the number of distinct elements in the multiset $M$, then $\mathrm{BITMAP}[0]$ is accessed approximately $n/2$ times, $\mathrm{BITMAP}[1]$ is accessed approximately $n/4$ times, and so on. Consequently, if $i \gg \log_2 n$, then $\mathrm{BITMAP}[i]$ is almost certainly 0, and if $i \ll \log_2 n$, then $\mathrm{BITMAP}[i]$ is almost certainly 1. If $i \approx \log_2 n$, then $\mathrm{BITMAP}[i]$ can be expected to be either 1 or 0. The correction factor $\phi \approx 0.77351$ (OEIS: A244256) is found by calculations, which can be found in the original article. == Improving accuracy == A problem with the Flajolet–Martin algorithm in the above form is that the results vary significantly. A common solution has been to run the algorithm multiple times with $k$ different hash functions and combine the results from the different runs. One idea is to take the mean of the $k$ results, one from each hash function, obtaining a single estimate of the cardinality. The problem with this is that averaging is very susceptible to outliers (which are likely here). A different idea is to use the median, which is less prone to be influenced by outliers. The problem with this is that the results can only take the form $2^R/\phi$, where $R$ is an integer. A common solution is to combine both the mean and the median: create $k \cdot l$ hash functions and split them into $k$ distinct groups (each of size $l$).
Within each group, use the mean to aggregate the $l$ results, and finally take the median of the $k$ group estimates as the final estimate. The 2007 HyperLogLog algorithm splits the multiset into subsets and estimates their cardinalities, then uses the harmonic mean to combine them into an estimate for the original cardinality.
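The mean-then-median combination can be illustrated as follows. This is a sketch under assumptions: the $k \cdot l$ "different hash functions" are simulated by prefixing an integer seed to the input of SHA-256, and the names `fm_single`, `median_of_means` and `fm_combined`, as well as the defaults `k=5` and `l=4`, are hypothetical.

```python
import hashlib
from statistics import mean, median

PHI = 0.77351  # correction factor quoted in the text
L = 32         # assumed hash width in bits


def fm_single(items, seed):
    """One Flajolet-Martin run using a seeded (assumed) 32-bit hash."""
    bitmap = [0] * (L + 1)
    for x in items:
        digest = hashlib.sha256(f"{seed}:{x!r}".encode()).digest()
        h = int.from_bytes(digest[:4], "big")
        i = (h & -h).bit_length() - 1 if h else L  # rho(h)
        bitmap[i] = 1
    r = next((i for i, b in enumerate(bitmap) if b == 0), len(bitmap))
    return 2 ** r / PHI


def median_of_means(estimates, k, l):
    """Split k*l estimates into k groups of size l; mean per group, then median."""
    if len(estimates) != k * l:
        raise ValueError("expected exactly k * l estimates")
    groups = [estimates[g * l:(g + 1) * l] for g in range(k)]
    return median(mean(group) for group in groups)


def fm_combined(items, k=5, l=4):
    """Run k*l seeded estimators over the same items and combine the results."""
    items = list(items)  # the stream is replayed once per hash function here
    estimates = [fm_single(items, seed) for seed in range(k * l)]
    return median_of_means(estimates, k, l)


# A single wild outlier (900) moves one group mean but not the median of means.
print(median_of_means([1, 2, 3, 4, 5, 6, 7, 8, 900], k=3, l=3))
print(fm_combined(range(1000)))
```

Replaying the stream once per hash function keeps the sketch short. A streaming implementation would instead update all $k \cdot l$ bitmaps during one pass.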

    Read more →
  • Magic state distillation

    Magic state distillation

    Magic state distillation is a method for creating more accurate quantum states from multiple noisy ones, which is important for building fault-tolerant quantum computers. It has also been linked to quantum contextuality, a concept thought to contribute to quantum computers' power. The technique was first proposed by Emanuel Knill in 2004, and further analyzed by Sergey Bravyi and Alexei Kitaev the same year. Thanks to the Gottesman–Knill theorem, it is known that some quantum operations (operations in the Clifford group) can be perfectly simulated in polynomial time on a classical computer. In order to achieve universal quantum computation, a quantum computer must be able to perform operations outside this set. Magic state distillation achieves this, in principle, by concentrating the usefulness of imperfect resources, represented by mixed states, into states that are conducive to performing operations that are difficult to simulate classically. A variety of magic state distillation routines for qubits, with various advantages, have been proposed. == Stabilizer formalism == The Clifford group consists of a set of $n$-qubit operations generated by the gates {H, S, CNOT} (where H is Hadamard and S is $\begin{bmatrix} 1 & 0 \\ 0 & i \end{bmatrix}$), called Clifford gates. The Clifford group generates stabilizer states, which can be efficiently simulated classically, as shown by the Gottesman–Knill theorem. Together with a non-Clifford operation, this set of gates is universal for quantum computation. == Magic states == Magic states are purified from $n$ copies of a mixed state $\rho$. These states are typically provided via an ancilla to the circuit. 
A magic state for the $\pi/6$ rotation operator is $|M\rangle = \cos(\beta/2)|0\rangle + e^{i\frac{\pi}{4}} \sin(\beta/2)|1\rangle$, where $\beta = \arccos\left(\frac{1}{\sqrt{3}}\right)$. A non-Clifford gate can be generated by combining (copies of) magic states with Clifford gates. Since a set of Clifford gates combined with a non-Clifford gate is universal for quantum computation, magic states combined with Clifford gates are also universal. == Purification algorithm for distilling |M⟩ == The first magic state distillation algorithm, invented by Sergey Bravyi and Alexei Kitaev, is as follows.

Input: Prepare 5 imperfect states.
Output: An almost pure state having a small error probability.
repeat
    Apply the decoding operation of the five-qubit error correcting code and measure the syndrome.
    If the measured syndrome is $|0000\rangle$, the distillation attempt is successful.
    else Get rid of the resulting state and restart the algorithm.
until The states have been distilled to the desired purity.
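As a small numerical check of the definition of $|M\rangle$ given above, the sketch below builds the state vector with NumPy and evaluates the expectation values of the three Pauli operators. The variable and function names are illustrative. The observation that all three expectation values equal $1/\sqrt{3}$ follows from the formula for $|M\rangle$ and is computed here rather than quoted from the text.

```python
import numpy as np

# Amplitudes of |M> = cos(beta/2)|0> + exp(i*pi/4) sin(beta/2)|1>.
beta = np.arccos(1.0 / np.sqrt(3.0))
M = np.array([np.cos(beta / 2), np.exp(1j * np.pi / 4) * np.sin(beta / 2)])

# Pauli operators.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def expectation(state, op):
    """Return <state|op|state> for a normalised state and Hermitian op."""
    return float(np.vdot(state, op @ state).real)


norm = float(np.linalg.norm(M))
bloch = [expectation(M, P) for P in (X, Y, Z)]
print(norm, bloch)
```

The three components of `bloch` coincide, so the state points along the (1, 1, 1) diagonal of the Bloch sphere, away from the axes that the Pauli eigenstates occupy.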

    Read more →