AI Assistant Maker

AI Assistant Maker — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Army Chief Information Officer/G-6

    Army Chief Information Officer/G-6

    In September 2020, the Army realigned the previously consolidated CIO/G-6 function into two separate roles, Office of the Chief Information Officer and Deputy Chief of Staff, G-6, that report to the secretary of the Army and chief of staff of the Army, respectively. The realignment came after several months of planning and coordination. Lt. Gen. John Morrison was nominated to the Senate for promotion and assignment as the G-6 and confirmed, assuming that position in August 2020. Subsequently, the Secretary of the Army, Ryan McCarthy appointed Dr. Raj G. Iyer as the first civilian Chief Information Officer, a career Senior Executive Service position in November 2020. == G-6 == Advise chief of staff of the Army and the Chief Information Officer on planning, fielding, and execution of C4IT worldwide Army operations Develop and execute the plan for the Unified Network Implement Army information assurance Supervise C4IT, Signal support, Information security, Force structure and equipping activities in support of warfighting operations Oversee management of the Signal forces == Planned realignment == On June 11, 2020, the Army announced that the two roles of CIO and Deputy Chief of Staff, G-6 (DCS, G-6) would be realigned no later than August 31, 2020, with separate individuals responsible for each position. With the realignment: CIO core functions will be policy, governance, and oversight. Focus areas include: Information Environment, Cybersecurity, Enterprise Architecture, and Data Policy/Oversight/Governance, Enterprise Architecture, Enterprise Cloud Management and IT Spend/Category Management. DCS, G-6 core functions will be planning, strategy, and implementation. Focus areas include: Information Environment/Network, Planning and Integration, Theater Synchronization, Architecture Integration, Enterprise Information Environment (EIE) Mission Area Portfolio Management and Mission Decision Packet Management. In order to support multi-domain operations, the Army will have to connect Enterprise networks and tactical networks. —LTG Morrison, DCS, G-6 DCS G-6 released the Army Unified Network Plan under the Army Digital Transformation Strategy, to help the Army to establish a Multi-Domain Operations capable force by 2028. The Unified Network will enable Army formations, as part of the Joint Force, to operate in highly contested and congested operational environments with the speed and global range to achieve decision dominance and maintain overmatch. The plan shapes, synchronizes, integrates and governs Unified Network efforts and aligns the personnel, organizational structure and capabilities required to enable MDO at all echelons. == Chief signal officers and their successors == Chief signal officers (1860–1964) Maj. Albert J. Myer 1860–1863 Lt. Col. William J. L. Nicodemus 1863–1864 Col. Benjamin F. Fisher 1864–1866 Col. Albert J. Myer 1866–1880 (promoted to brigadier general 16 June 1880) Brig. Gen. William B. Hazen 1880–1887 Brig. Gen. Adolphus W. Greely 1887–1906 Brig. Gen. James Allen 1906–1913 Brig. Gen. George P. Scriven 1913–1917 Brig. Gen. George O. Squier 1917–1923 (promoted to major general 6 October 1917) Maj. Gen. Charles McK. Saltzman 1924–1928 Maj. Gen. George Sabin Gibbs 1928–1931 Maj. Gen. Irving J. Carr 1931–1934 Maj. Gen. James B. Allison 1935–1937 Maj. Gen. Joseph O. Mauborgne 1937–1941 Maj. Gen. Dawson Olmstead 1941–1943 Maj. Gen. Harry C. Ingles 1943–1947 Maj. Gen. Spencer B. Akin 1947–1951 Maj. Gen. George I. Back 1951–1955 Lt. Gen. James D. O’Connell 1955–1959 Maj. Gen. Ralph T. Nelson 1959–1962 Maj. Gen. Earle F. Cook 1962–1963 Maj. Gen. David Parker Gibbs 1963–1964 Chiefs of communications-electronics (1964–1967) Maj. Gen. David Parker Gibbs 1964–1966 Maj. Gen. Walter E. Lotz, Jr. 1966–1967 Assistant chiefs of staff for communications-electronics (1967–1974) Maj. Gen. Walter E. Lotz, Jr. 1967–1968 Maj. Gen. George E. Pickett 1968–1972 Lt. Gen. Thomas Rienzi 1972–1974 Directors of telecommunications and command and control (1974–1978) (a directorate of ODCSOPS) Lt. Gen. Thomas Rienzi 1974–1977 Lt. Gen. Charles R. Myer 1977–1978 Assistant chiefs of staff for automation and communications (1978–1981) Lt. Gen. Charles R. Myer 1978–1979 Maj. Gen. Clay T. Buckingham 1979–1981 Assistant deputy chiefs of staff for operations and plans (command, control, communications, and computers) (1981–1984) Maj. Gen. Clay T. Buckingham 1981–1982 Maj. Gen. James M. Rockwell 1982–1984 Assistant chiefs of staff for information management (1984–1987) Lt. Gen. David K. Doyle 1984–1986 Lt. Gen. Thurman D. Rodgers 1986–1987 Directors of information systems for command, control, communications, and computers Lt. Gen. Thurman D. Rodgers 1987–1988 Lt. Gen. Bruce R. Harris 1988–1990 Lt. Gen. Jerome B. Hilmes 1990–1992 Lt. Gen. Peter A. Kind 1992–1994 Lt. Gen. Otto J. Guenther 1995–1997 Lt. Gen. William H. Campbell Chief Information Officer, Military Deputy to the Army Acquisition Executive, and Director of Information Systems for Command, Control, Communications and Computers Lt. Gen. William H. Campbell 1997–2000

    Read more →
  • Knowledge spillover

    Knowledge spillover

    Knowledge spillover is an exchange of ideas among individuals. Knowledge spillover is usually replaced by terminations of technology spillover, R&D spillover and/or spillover (economics) when the concept is specific to technology management and innovation economics. In knowledge management economics, knowledge spillovers are non-rival knowledge market costs incurred by a party not agreeing to assume the costs that has a spillover effect of stimulating technological improvements in a neighbor through one's own innovation. Such innovations often come from specialization within an industry. There are two kinds of knowledge spillovers: internal and external. Internal knowledge spillover occurs if there is a positive impact of knowledge between individuals within an organization that produces goods and/or services. An external knowledge spillover occurs when the positive impact of knowledge is between individuals outside of a production organization. Marshall–Arrow–Romer (MAR) spillovers, Porter spillovers and Jacobs spillovers are three types of spillovers. == Conceptualizations == === Marshall–Arrow–Romer === Marshall–Arrow–Romer (MAR) spillover has its origins in 1890, where the English economist Alfred Marshall developed a theory of knowledge spillovers. Knowledge spillovers later were extended by economists Kenneth Arrow (1962) and Paul Romer (1986). In 1992, Edward Glaeser, Hedi Kallal, José Scheinkman, and Andrei Shleifer pulled together the Marshall–Arrow–Romer views on knowledge spillovers and accordingly named the view MAR spillover in 1992. Under the Marshall–Arrow–Romer (MAR) spillover view, the proximity of firms within a common industry often affects how well knowledge travels among firms to facilitate innovation and growth. The closer the firms are to one another, the greater the MAR spillover. The exchange of ideas is largely from employee to employee, in that employees from different firms in an industry exchange ideas about new products and new ways to produce goods. The opportunity to exchange ideas that lead to innovations key to new products and improved production methods. Research on the Cambridge IT Cluster (UK) suggests that technological knowledge spillovers might only happen rarely and are less important than other cluster benefits such as labour market pooling. === Porter === Porter (1990), like MAR, argues that knowledge spillovers in specialized, geographically concentrated industries stimulate growth. He insists, however, that local competition, as opposed to local monopoly, fosters the pursuit and rapid adoption of innovation. He gives examples of Italian ceramics and gold jewellery industries, in which hundreds of firms are located together and fiercely compete to innovate since the alternative to innovation is demise. Porter's externalities are maximized in cities with geographically specialized, competitive industries. === Jacobs === Under the Jacobs spillover view, the proximity of firms from different industries affect how well knowledge travels among firms to facilitate innovation and growth. This is in contrast to MAR spillovers, which focus on firms in a common industry. The diverse proximity of a Jacobs spillover brings together ideas among individuals with different perspectives to encourage an exchange of ideas and foster innovation in an industrially diverse environment. Developed in 1969 by urbanist Jane Jacobs and John Jackson the concept that Detroit’s shipbuilding industry from the 1830s was the critical antecedent leading to the 1890s development of the auto industry in Detroit since the gasoline engine firms easily transitioned from building gasoline engines for ships to building them for automobiles. == Incoming and outgoing spillovers == Knowledge spillover has asymmetric directions. The focal entity and receives or outflows know-how to others, creating incoming and outgoing spillovers. Cassiman and Veugelers (2002) use survey data and estimate incoming and outgoing spillover and study the economic impacts. Incoming spillover increases growth opportunity and productivity improvements of receivers, while outgoing spillover leads to free rider problem in the technology competition. Chen et al. (2013) use econometric method to gauge incoming spillover, a way that applies for all companies without survey. They find that incoming spillover explains R&D profits of industrial firms. == Policy implications == As information is largely non-rival in nature, certain measures must be taken to ensure that, for the originator, the information remains a private asset. As the market cannot do this efficiently, public regulations have been implemented to facilitate a more appropriate equilibrium. As a result, the concept of intellectual property rights have developed and ensure the ability of entrepreneurs to temporarily hold on to the profitability of their ideas through patents, copyrights, trade secrets, and other governmental safeguards. Conversely, such barriers to entry prevent the exploitation of informational developments by rival firms within an industry. For example, Wang (2023) indicates that technology spillovers are reduced by 27% to 51% when trade secrets laws are implemented by the Uniform Trade Secrets Act in the US. On the other hand, when the research and development of a private firm results in a social benefit, unaccounted for within the market price, often greater than the private return of the firm's research, then a subsidy to offset the underproduction of that benefit might be offered to the firm in return for its continued output of that benefit. Government subsidies are often controversial, and while they might often result in a more appropriate social equilibrium, they could also lead to undesirable political repercussions as such a subsidy must come from taxpayers, some of whom may not directly benefit from the researching firm's subsidized knowledge spillover. The concept of knowledge spillover is also used to justify subsidies to foreign direct investment, as foreign investors help diffuse technology among local firms. == Examples == Business parks are a good specific example of concentrated businesses that may benefit from MAR spillover. Many semiconductor firms intentionally located their research and development facilities in Silicon Valley to take advantage of MAR spillover. In addition, the film industry in Los Angeles, California, and elsewhere relies on a geographic concentration of specialists (directors, producers, scriptwriters, and set designers) to bring together narrow aspects of movie-making into a final product. A general example of a knowledge spillover could be the collective growth associated with the research and development of online social networking tools like Facebook, YouTube, and Twitter. Such tools have not only created a positive feedback loop, and a host of originally unintended benefits for their users, but have also created an explosion of new software, programming platforms, and conceptual breakthroughs that have perpetuated the development of the industry as a whole. The advent of online marketplaces, the utilization of user profiles, the widespread democratization of information, and the interconnectivity between tools within the industry have all been products of each tool's individual developments. These developments have since spread outside the industry into the mainstream media as news and entertainment firms have developed their own market feedback applications within the tools themselves, and their own versions of online networking tools (e.g. CNN’s iReport).

    Read more →
  • Lai–Robbins lower bound

    Lai–Robbins lower bound

    The Lai–Robbins lower bound gives an asymptotic lower bound on the regret that any uniformly good algorithm must incur in the stochastic multi-armed bandit problem. The original result was proved by Tze Leung Lai and Herbert Robbins in 1985 for parametric exponential families. Later work extended the statement to more general classes of distributions. == Multi-armed bandit problem == The multi-armed bandit problem (MAB) is a sequential game in which the player must trade off exploration (to learn) and exploitation (to earn). The player chooses among K {\displaystyle K} actions (arms) with unknown distributions ν = ( ν 1 , … , ν K ) {\displaystyle \nu =(\nu _{1},\dots ,\nu _{K})} . The player is assumed to know a class of distributions D {\displaystyle {\mathcal {D}}} such that for every k {\displaystyle k} one has ν k ∈ D {\displaystyle \nu _{k}\in {\mathcal {D}}} (for example, D {\displaystyle {\mathcal {D}}} may be the family of Gaussian or Bernoulli distributions). At each round t = 1 , … , T {\displaystyle t=1,\dots ,T} the player selects (pulls) an arm a t {\displaystyle a_{t}} and observes a reward X t ∼ ν a t {\displaystyle X_{t}\sim \nu _{a_{t}}} . We denote N a ( t ) := ∑ s = 1 t 1 { a s = a } {\displaystyle N_{a}(t):=\sum _{s=1}^{t}\mathbf {1} _{\{a_{s}=a\}}} the number of times arm a {\displaystyle a} has been pulled in the first t {\displaystyle t} rounds, μ ( ν ) := ( μ 1 , … , μ K ) {\displaystyle \mu (\nu ):=(\mu _{1},\dots ,\mu _{K})} the vector of arm means, where μ k = E X ∼ ν k [ X ] {\displaystyle \mu _{k}=\mathbb {E} _{X\sim \nu _{k}}[X]} , μ ∗ := max a μ a {\displaystyle \mu ^{}:=\max _{a}\mu _{a}} the highest mean Δ a := μ ∗ − μ a ≥ 0 {\displaystyle \Delta _{a}:=\mu ^{}-\mu _{a}\geq 0} the gap of arm a {\displaystyle a} . An arm a {\displaystyle a} with μ a = μ ∗ {\displaystyle \mu _{a}=\mu ^{}} is called an optimal arm; otherwise it is a suboptimal arm. The goal is to minimize the regret at horizon T {\displaystyle T} , defined by R T := ∑ a = 1 K Δ a E [ N a ( T ) ] . {\displaystyle R_{T}:=\sum _{a=1}^{K}\Delta _{a}\,\mathbb {E} [N_{a}(T)].} Intuitively, the regret is the (expected) total loss compared to always playing an optimal arm: regret = ∑ a ( cost of playing a ) × ( times a is played ) . {\displaystyle {\text{regret}}=\sum _{a}\ ({\text{cost of playing }}a)\times ({\text{times }}a{\text{ is played}}).} An MAB algorithm is a (possibly randomized) policy that, at each round t {\displaystyle t} , choose an arm a_t by using the observations received from previous turns. === Intuitive example === Suppose a farmer must choose, each year, one of K {\displaystyle K} seed varieties to plant. Each variety k {\displaystyle k} has an unknown average yield μ k {\displaystyle \mu _{k}} . If the farmer knew the best variety (with mean μ ∗ {\displaystyle \mu ^{}} ) he would plant it every year; in reality he must try varieties to learn which is best. The cumulative regret after T {\displaystyle T} years measures the total expected loss in yield due to imperfect knowledge. Remarks The model above is the stochastic MAB; there also exist adversarial variants. One may consider a fixed-horizon setting (known T {\displaystyle T} ) or an anytime setting (unknown T {\displaystyle T} ). == Lai–Robbins lower bound == The theorem gives the right amount of time we should pull a suboptimal arm k {\displaystyle k} to distinguish whether we are in the instance with ν k {\displaystyle \nu _{k}} or with ν ~ k {\displaystyle {\tilde {\nu }}_{k}} where ν ~ k {\displaystyle {\tilde {\nu }}_{k}} is such that μ ~ k > μ ∗ {\displaystyle {\tilde {\mu }}_{k}>\mu ^{}} . Knowning a lower bound on the number of pull of every suboptimal arm gives a lower bound on the regret as only suboptimal arms contribute to the regret. Before stating the formal theorem we need to define what is a consistent algorithm. === Consistency (uniformly good algorithms) === Let D {\displaystyle {\mathcal {D}}} be a class of probability distributions and consider K {\displaystyle K} arms with reward distributions ν = ( ν 1 , … , ν K ) ∈ D K {\displaystyle \nu =(\nu _{1},\dots ,\nu _{K})\in {\mathcal {D}}^{K}} . An algorithm is said to be consistent (also called uniformly good) on D K {\displaystyle {\mathcal {D}}^{K}} if, for every instance ν ∈ D K {\displaystyle \nu \in {\mathcal {D}}^{K}} , the expected regret R T ( ν ) {\displaystyle R_{T}(\nu )} grows subpolynomially: ∀ α > 0 , R T ( ν ) = o ( T α ) as T → ∞ {\displaystyle \forall \alpha >0,\qquad R_{T}(\nu )=o(T^{\alpha })\quad {\text{as }}T\to \infty } This assumption excludes algorithms that perform well on some instances but incur linear regret on others. === Formal lower bound === For any suboptimal arm a {\displaystyle a} . For a distribution ν a ∈ D {\displaystyle \nu _{a}\in {\mathcal {D}}} and a threshold x {\displaystyle x} , define K inf ( ν a , x , D ) := inf { KL ⁡ ( ν a , ν ′ ) : ν ′ ∈ D , μ ′ > x } {\displaystyle {\mathcal {K}}_{\inf }(\nu _{a},x,{\mathcal {D}}):=\inf {\Bigl \{}\operatorname {KL} (\nu _{a},\nu '):\nu '\in {\mathcal {D}},\ \mu '>x{\Bigr \}}} where KL ⁡ ( ⋅ , ⋅ ) {\displaystyle \operatorname {KL} (\cdot ,\cdot )} denotes the Kullback-Leibler divergence. Then, for any algorithm consistent on D K {\displaystyle {\mathcal {D}}^{K}} and for every instance ν ∈ D K {\displaystyle \nu \in {\mathcal {D}}^{K}} , every suboptimal arm a {\displaystyle a} satisfies E ν [ N a ( T ) ] ≥ ln ⁡ T K inf ( ν a , μ ∗ , D ) + o ( ln ⁡ T ) {\displaystyle \mathbb {E} _{\nu }[N_{a}(T)]\geq {\frac {\ln T}{{\mathcal {K}}_{\inf }(\nu _{a},\mu ^{},{\mathcal {D}})}}+o(\ln T)} Consequently, the regret satisfies R T ( ν ) ≥ ( ∑ a : μ a < μ ∗ Δ a K inf ( ν a , μ ∗ , D ) ) ln ⁡ T + o ( ln ⁡ T ) {\displaystyle R_{T}(\nu )\geq \left(\sum _{a:\,\mu _{a}<\mu ^{}}{\frac {\Delta _{a}}{{\mathcal {K}}_{\inf }(\nu _{a},\mu ^{},{\mathcal {D}})}}\right)\ln T+o(\ln T)} The original 1985 paper established this result for exponential families; later work showed that the bound holds under much weaker assumptions on D {\displaystyle {\mathcal {D}}} . === Intuition === Consistency imposes that, for every ν {\displaystyle \nu } , the number of pulls of an optimal arm must be large. This means that μ ∗ {\displaystyle \mu ^{}} is estimated very accurately. The goal is to determine, for a suboptimal arm k {\displaystyle k} , how many samples are needed to be confident, with the appropriate level of confidence, that μ k < μ ∗ {\displaystyle \mu _{k}<\mu ^{}} . To do so, we use what is called the most confusing instance: an instance close to ν {\displaystyle \nu } such that arm k {\displaystyle k} is optimal. We define it as ν ~ {\displaystyle {\tilde {\nu }}} such that, for all a ≠ k {\displaystyle a\neq k} , ν ~ a = ν a {\displaystyle {\tilde {\nu }}_{a}=\nu _{a}} , and ν ~ k {\displaystyle {\tilde {\nu }}_{k}} is chosen so that μ ~ k > μ ∗ {\displaystyle {\tilde {\mu }}_{k}>\mu ^{}} . The objective is to determine how many samples of arm k {\displaystyle k} are required to distinguish whether we are in the instance with ν k {\displaystyle \nu _{k}} or with ν ~ k {\displaystyle {\tilde {\nu }}_{k}} in terms of KL {\displaystyle \operatorname {KL} } distance. == Algorithms achieving the Lai–Robbins lower bound == Several algorithms are known to achieve the Lai–Robbins asymptotic lower bound under specific assumptions on the reward distribution class D {\displaystyle {\mathcal {D}}} . The following list summarizes a non-exhaustive list of algorithms matching the lower bound. == Extension to other problems == === Structured bandit === A more complexe is structured bandit where we know that the mean of each arm is in a set with some restriction. In this case we can prove a smaller lower bound that use the knowledge of this set. === Best arm identification (BAI) === A similar result has been proved for best arm identification, which is the same game except that, instead of minimizing the regret, the goal is to identify the best arm with probability 1 − δ {\displaystyle 1-\delta } using as few rounds as possible. === Reinforcement Learning (RL) === Similar results have been proved for regret minimization in average-reward reinforcement learning. The order is also ln ⁡ T {\displaystyle \ln T} , with a constant that depends on the problem.

    Read more →
  • Long division

    Long division

    In arithmetic, long division is a standard division algorithm suitable for dividing multi-digit numbers that is simple enough to perform by hand. It breaks down a division problem into a series of easier steps. As in all division problems, one number, called the dividend, is divided by another, called the divisor, producing a result called the quotient. It enables computations involving arbitrarily large numbers to be performed by following a series of simple steps. The abbreviated form of long division is called short division, which is almost always used instead of long division when the divisor has only one digit. == History == Related algorithms have existed since the 12th century. Al-Samawal al-Maghribi (1125–1174) performed calculations with decimal numbers that essentially require long division, leading to infinite decimal results, but without formalizing the algorithm. Caldrini (1491) is the earliest printed example of long division, known as the Danda method in medieval Italy, and it became more practical with the introduction of decimal notation for fractions by Pitiscus (1608). The specific algorithm in modern use was introduced by Henry Briggs c. 1600. == Education == Inexpensive calculators and computers have become the most common tools for performing division in educational and professional contexts worldwide, reducing reliance on traditional paper-and-pencil techniques. Internally, these devices implement various division algorithms, many of which rely on iterative approximations and multiplication to improve computational efficiency. Educational approaches to teaching division vary across countries and regions, reflecting differing curricular priorities. In North America, long division has been de-emphasized or, in some cases, removed from portions of the curriculum as part of reform mathematics, which emphasizes conceptual understanding and the use of technology. In contrast, many education systems in Europe and Asia continue to emphasize mastery of standard algorithms, including long division, as a foundational arithmetic skill. For example, curricula in countries such as Japan and Germany typically introduce and reinforce long division during primary education, often alongside mental arithmetic strategies and problem-solving techniques. International assessments such as the Trends in International Mathematics and Science Study (TIMSS) highlight these differences, showing variation in how procedural fluency and conceptual understanding are balanced across educational systems. These differing approaches reflect broader educational philosophies regarding the balance between procedural fluency, conceptual understanding, and the role of technology in mathematics education. == Method == In English-speaking countries, long division does not use the division slash ⟨∕⟩ or division sign ⟨÷⟩ symbols but instead constructs a tableau. The divisor is separated from the dividend by a right parenthesis ⟨)⟩ or vertical bar ⟨|⟩; the dividend is separated from the quotient by a vinculum (i.e., an overbar). The combination of these two symbols is sometimes known as a long division symbol, division bracket, or even a bus stop. It developed in the 18th century from an earlier single-line notation separating the dividend from the quotient by a left parenthesis. The process is begun by dividing the left-most digit of the dividend by the divisor. The quotient (rounded down to an integer) becomes the first digit of the result, and the remainder is calculated (this step is notated as a subtraction). This remainder carries forward when the process is repeated on the following digit of the dividend (notated as 'bringing down' the next digit to the remainder). When all digits have been processed and no remainder is left, the process is complete. An example is shown below, representing the division of 500 by 4 (with a result of 125). 125 (Explanations) 4)500 4 ( 4 × 1 = 4) 10 ( 5 - 4 = 1) 8 ( 4 × 2 = 8) 20 (10 - 8 = 2) 20 ( 4 × 5 = 20) 0 (20 - 20 = 0) A more detailed breakdown of the steps goes as follows: Find the shortest sequence of digits starting from the left end of the dividend, 500, that the divisor 4 goes into at least once. In this case, this is simply the first digit, 5. The largest number that the divisor 4 can be multiplied by without exceeding 5 is 1, so the digit 1 is put above the 5 to start constructing the quotient. Next, the 1 is multiplied by the divisor 4, to obtain the largest whole number that is a multiple of the divisor 4 without exceeding the 5 (4 in this case). This 4 is then placed under and subtracted from the 5 to get the remainder, 1, which is placed under the 4 under the 5. Afterwards, the first as-yet unused digit in the dividend, in this case the first digit 0 after the 5, is copied directly underneath itself and next to the remainder 1, to form the number 10. At this point the process is repeated enough times to reach a stopping point: The largest number by which the divisor 4 can be multiplied without exceeding 10 is 2, so 2 is written above as the second leftmost quotient digit. This 2 is then multiplied by the divisor 4 to get 8, which is the largest multiple of 4 that does not exceed 10; so 8 is written below 10, and the subtraction 10 minus 8 is performed to get the remainder 2, which is placed below the 8. The next digit of the dividend (the last 0 in 500) is copied directly below itself and next to the remainder 2 to form 20. Then the largest number by which the divisor 4 can be multiplied without exceeding 20, which is 5, is placed above as the third leftmost quotient digit. This 5 is multiplied by the divisor 4 to get 20, which is written below and subtracted from the existing 20 to yield the remainder 0, which is then written below the second 20. At this point, since there are no more digits to bring down from the dividend and the last subtraction result was 0, we can be assured that the process finished. If the last remainder when we ran out of dividend digits had been something other than 0, there would have been two possible courses of action: We could just stop there and say that the dividend divided by the divisor is the quotient written at the top with the remainder written at the bottom, and write the answer as the quotient followed by a fraction that is the remainder divided by the divisor. We could extend the dividend by writing it as, say, 500.000... and continue the process (using a decimal point in the quotient directly above the decimal point in the dividend), in order to get a decimal answer, as in the following example. 31.75 4)127.00 12 (12 ÷ 4 = 3) 07 (0 remainder, bring down next figure) 4 (7 ÷ 4 = 1 r 3) 3.0 (bring down 0 and the decimal point) 2.8 (7 × 4 = 28, 30 ÷ 4 = 7 r 2) 20 (an additional zero is brought down) 20 (5 × 4 = 20) 0 In this example, the decimal part of the result is calculated by continuing the process beyond the units digit, "bringing down" zeros as being the decimal part of the dividend. This example also illustrates that, at the beginning of the process, a step that produces a zero can be omitted. Since the first digit 1 is less than the divisor 4, the first step is instead performed on the first two digits 12. Similarly, if the divisor were 13, one would perform the first step on 127 rather than 12 or 1. === Basic procedure for long division of n ÷ m === Find the location of all decimal points in the dividend n and divisor m. If necessary, simplify the long division problem by moving the decimals of the divisor and dividend by the same number of decimal places, to the right (or to the left), so that the decimal of the divisor is to the right of the last digit. When doing long division, keep the numbers lined up straight from top to bottom under the tableau. After each step, be sure the remainder for that step is less than the divisor. If it is not, there are three possible problems: the multiplication is wrong, the subtraction is wrong, or a greater quotient is needed. In the end, the remainder, r, is added to the growing quotient as a fraction, r⁄m. === Invariant property and correctness === The basic presentation of the steps of the process (above) focuses on what steps are to be performed, rather than the properties of those steps that ensure the result will be correct (specifically, that q × m + r = n, where q is the final quotient and r the final remainder). A slight variation of presentation requires more writing, and requires that we change, rather than just update, digits of the quotient, but can shed more light on why these steps actually produce the right answer by allowing evaluation of q × m + r at intermediate points in the process. This illustrates the key property used in the derivation of the algorithm (below). Specifically, we amend the above basic procedure so that we fill the space after the digits of the quotient under construction with 0's, to at least the 1's place, and include those 0's in the numbers we write below the division bra

    Read more →
  • Neural scaling law

    Neural scaling law

    In machine learning, a neural scaling law is an empirical scaling law that describes how neural network performance changes as key factors are scaled up or down. These factors typically include the number of parameters, training dataset size, and training cost. Some models also exhibit performance gains by scaling inference through increased test-time compute (TTC), extending neural scaling laws beyond training to the deployment phase. == Introduction == In general, a deep learning model can be characterized by four parameters: model size, training dataset size, training cost, and the post-training error rate (e.g., the test set error rate). Each of these variables can be defined as a real number, usually written as N , D , C , L {\displaystyle N,D,C,L} (respectively: parameter count, dataset size, computing cost, and loss). A neural scaling law is a theoretical or empirical statistical law between these parameters. There are also other parameters with other scaling laws. === Size of the model === In most cases, the model's size is simply the number of parameters. However, one complication arises with the use of sparse models, such as mixture-of-expert models. With sparse models, during inference, only a fraction of their parameters are used. In comparison, most other kinds of neural networks, such as transformer models, always use all their parameters during inference. === Size of the training dataset === The size of the training dataset is usually quantified by the number of data points within it. Larger training datasets are typically preferred, as they provide a richer and more diverse source of information from which the model can learn. This can lead to improved generalization performance when the model is applied to new, unseen data. However, increasing the size of the training dataset also increases the computational resources and time required for model training. With the "pretrain, then finetune" method used for most large language models, there are two kinds of training dataset: the pretraining dataset and the finetuning dataset. Their sizes have different effects on model performance. Generally, the finetuning dataset is less than 1% the size of pretraining dataset. In some cases, a small amount of high quality data suffices for finetuning, and more data does not necessarily improve performance. Many scaling laws, due to their inherent diminishing returns nature, value data based on a submodular set function which was shown in a paper on this topic. === Cost of training === Training cost is typically measured in terms of time (how long it takes to train the model) and computational resources (how much processing power and memory are required). It is important to note that the cost of training can be significantly reduced with efficient training algorithms, optimized software libraries, and parallel computing on specialized hardware such as GPUs or TPUs. The cost of training a neural network model is a function of several factors, including model size, training dataset size, the training algorithm complexity, and the computational resources available. In particular, doubling the training dataset size does not necessarily double the cost of training, because one may train the model for several times over the same dataset (each being an "epoch"). === Performance === The performance of a neural network model is evaluated based on its ability to accurately predict the output given some input data. Common metrics for evaluating model performance include: Negative log-likelihood per token (logarithm of perplexity) for language modeling; Accuracy, precision, recall, and F1 score for classification tasks; Mean squared error (MSE) or mean absolute error (MAE) for regression tasks; Elo rating in a competition against other models, such as gameplay or preference by a human judge. Performance can be improved by using more data, larger models, different training algorithms, regularizing the model to prevent overfitting, and early stopping using a validation set. When the performance is a number bounded within the range of [ 0 , 1 ] {\displaystyle [0,1]} , such as accuracy, precision, etc., it often scales as a sigmoid function of cost, as seen in the figures. == Examples == === (Hestness, Narang, et al, 2017) === The 2017 paper is a common reference point for neural scaling laws fitted by statistical analysis on experimental data. Previous works before the 2000s, as cited in the paper, were either theoretical or orders of magnitude smaller in scale. Whereas previous works generally found the scaling exponent to scale like L ∝ D − α {\displaystyle L\propto D^{-\alpha }} , with α ∈ { 0.5 , 1 , 2 } {\displaystyle \alpha \in \{0.5,1,2\}} , the paper found that α ∈ [ 0.07 , 0.35 ] {\displaystyle \alpha \in [0.07,0.35]} . Of the factors they varied, only task can change the exponent α {\displaystyle \alpha } . Changing the architecture optimizers, regularizers, and loss functions, would only change the proportionality factor, not the exponent. For example, for the same task, one architecture might have L = 1000 D − 0.3 {\displaystyle L=1000D^{-0.3}} while another might have L = 500 D − 0.3 {\displaystyle L=500D^{-0.3}} . They also found that for a given architecture, the number of parameters necessary to reach lowest levels of loss, given a fixed dataset size, grows like N ∝ D β {\displaystyle N\propto D^{\beta }} for another exponent β {\displaystyle \beta } . They studied machine translation with LSTM ( α ∼ 0.13 {\displaystyle \alpha \sim 0.13} ), generative language modelling with LSTM ( α ∈ [ 0.06 , 0.09 ] , β ≈ 0.7 {\displaystyle \alpha \in [0.06,0.09],\beta \approx 0.7} ), ImageNet classification with ResNet ( α ∈ [ 0.3 , 0.5 ] , β ≈ 0.6 {\displaystyle \alpha \in [0.3,0.5],\beta \approx 0.6} ), and speech recognition with two hybrid (LSTMs complemented by either CNNs or an attention decoder) architectures ( α ≈ 0.3 {\displaystyle \alpha \approx 0.3} ). === (Henighan, Kaplan, et al, 2020) === A 2020 analysis studied statistical relations between C , N , D , L {\displaystyle C,N,D,L} over a wide range of values and found similar scaling laws, over the range of N ∈ [ 10 3 , 10 9 ] {\displaystyle N\in [10^{3},10^{9}]} , C ∈ [ 10 12 , 10 21 ] {\displaystyle C\in [10^{12},10^{21}]} , and over multiple modalities (text, video, image, text to image, etc.). In particular, the scaling laws it found are (Table 1 of ): For each modality, they fixed one of the two C , N {\displaystyle C,N} , and varying the other one ( D {\displaystyle D} is varied along using D = C / 6 N {\displaystyle D=C/6N} ), the achievable test loss satisfies L = L 0 + ( x 0 x ) α {\displaystyle L=L_{0}+\left({\frac {x_{0}}{x}}\right)^{\alpha }} where x {\displaystyle x} is the varied variable, and L 0 , x 0 , α {\displaystyle L_{0},x_{0},\alpha } are parameters to be found by statistical fitting. The parameter α {\displaystyle \alpha } is the most important one. When N {\displaystyle N} is the varied variable, α {\displaystyle \alpha } ranges from 0.037 {\displaystyle 0.037} to 0.24 {\displaystyle 0.24} depending on the model modality. This corresponds to the α = 0.34 {\displaystyle \alpha =0.34} from the Chinchilla scaling paper. When C {\displaystyle C} is the varied variable, α {\displaystyle \alpha } ranges from 0.048 {\displaystyle 0.048} to 0.19 {\displaystyle 0.19} depending on the model modality. This corresponds to the β = 0.28 {\displaystyle \beta =0.28} from the Chinchilla scaling paper. Given fixed computing budget, optimal model parameter count is consistently around N o p t ( C ) = ( C 5 × 10 − 12 petaFLOP-day ) 0.7 = 9.0 × 10 − 7 C 0.7 {\displaystyle N_{opt}(C)=\left({\frac {C}{5\times 10^{-12}{\text{petaFLOP-day}}}}\right)^{0.7}=9.0\times 10^{-7}C^{0.7}} The parameter 9.0 × 10 − 7 {\displaystyle 9.0\times 10^{-7}} varies by a factor of up to 10 for different modalities. The exponent parameter 0.7 {\displaystyle 0.7} varies from 0.64 {\displaystyle 0.64} to 0.75 {\displaystyle 0.75} for different modalities. This exponent corresponds to the ≈ 0.5 {\displaystyle \approx 0.5} from the Chinchilla scaling paper. It's "strongly suggested" (but not statistically checked) that D o p t ( C ) ∝ N o p t ( C ) 0.4 ∝ C 0.28 {\displaystyle D_{opt}(C)\propto N_{opt}(C)^{0.4}\propto C^{0.28}} . This exponent corresponds to the ≈ 0.5 {\displaystyle \approx 0.5} from the Chinchilla scaling paper. The scaling law of L = L 0 + ( C 0 / C ) 0.048 {\displaystyle L=L_{0}+(C_{0}/C)^{0.048}} was confirmed during the training of GPT-3 (Figure 3.1 ). === Chinchilla scaling (Hoffmann, et al, 2022) === One particular scaling law ("Chinchilla scaling") states that, for a large language model (LLM) autoregressively trained for one epoch, with a cosine learning rate schedule, we have: { C = C 0 N D L = A N α + B D β + L 0 {\displaystyle {\begin{cases}C=C_{0}ND\\L={\frac {A}{N^{\alpha }}}+{\frac {B}{D^{\beta }}}+L_{0}\end{cases}}} where the variables are C {\displaystyle C} is the cost o

    Read more →
  • Run-time algorithm specialization

    Run-time algorithm specialization

    In computer science, run-time algorithm specialization is a methodology for creating efficient algorithms for costly computation tasks of certain kinds. The methodology originates in the field of automated theorem proving and, more specifically, in the Vampire theorem prover project. The idea is inspired by the use of partial evaluation in optimising program translation. Many core operations in theorem provers exhibit the following pattern. Suppose that we need to execute some algorithm a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} in a situation where a value of A {\displaystyle A} is fixed for potentially many different values of B {\displaystyle B} . In order to do this efficiently, we can try to find a specialization of a l g {\displaystyle {\mathit {alg}}} for every fixed A {\displaystyle A} , i.e., such an algorithm a l g A {\displaystyle {\mathit {alg}}_{A}} , that executing a l g A ( B ) {\displaystyle {\mathit {alg}}_{A}(B)} is equivalent to executing a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} . The specialized algorithm may be more efficient than the generic one, since it can exploit some particular properties of the fixed value A {\displaystyle A} . Typically, a l g A ( B ) {\displaystyle {\mathit {alg}}_{A}(B)} can avoid some operations that a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} would have to perform, if they are known to be redundant for this particular parameter A {\displaystyle A} . In particular, we can often identify some tests that are true or false for A {\displaystyle A} , unroll loops and recursion, etc. == Difference from partial evaluation == The key difference between run-time specialization and partial evaluation is that the values of A {\displaystyle A} on which a l g {\displaystyle {\mathit {alg}}} is specialised are not known statically, so the specialization takes place at run-time. There is also an important technical difference. Partial evaluation is applied to algorithms explicitly represented as codes in some programming language. At run-time, we do not need any concrete representation of a l g {\displaystyle {\mathit {alg}}} . We only have to imagine a l g {\displaystyle {\mathit {alg}}} when we program the specialization procedure. All we need is a concrete representation of the specialized version a l g A {\displaystyle {\mathit {alg}}_{A}} . This also means that we cannot use any universal methods for specializing algorithms, which is usually the case with partial evaluation. Instead, we have to program a specialization procedure for every particular algorithm a l g {\displaystyle {\mathit {alg}}} . An important advantage of doing so is that we can use some powerful ad hoc tricks exploiting peculiarities of a l g {\displaystyle {\mathit {alg}}} and the representation of A {\displaystyle A} and B {\displaystyle B} , which are beyond the reach of any universal specialization methods. == Specialization with compilation == The specialized algorithm has to be represented in a form that can be interpreted. In many situations, usually when a l g A ( B ) {\displaystyle {\mathit {alg}}_{A}(B)} is to be computed on many values of B {\displaystyle B} in a row, a l g A {\displaystyle {\mathit {alg}}_{A}} can be written as machine code instructions for a special abstract machine, and it is typically said that A {\displaystyle A} is compiled. The code itself can then be additionally optimized by answer-preserving transformations that rely only on the semantics of instructions of the abstract machine. The instructions of the abstract machine can usually be represented as records. One field of such a record, an instruction identifier (or instruction tag), would identify the instruction type, e.g. an integer field may be used, with particular integer values corresponding to particular instructions. Other fields may be used for storing additional parameters of the instruction, e.g. a pointer field may point to another instruction representing a label, if the semantics of the instruction require a jump. All instructions of the code can be stored in a traversable data structure such as an array, linked list, or tree. Interpretation (or execution) proceeds by fetching instructions in some order, identifying their type, and executing the actions associated with said type. In many programming languages, such as C and C++, a simple switch statement may be used to associate actions with different instruction identifiers. Modern compilers usually compile a switch statement with constant (e.g. integer) labels from a narrow range by storing the address of the statement corresponding to a value i {\displaystyle i} in the i {\displaystyle i} -th cell of a special array, as a means of efficient optimisation. This can be exploited by taking values for instruction identifiers from a small interval of values. == Data-and-algorithm specialization == There are situations when many instances of A {\displaystyle A} are intended for long-term storage and the calls of a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} occur with different B {\displaystyle B} in an unpredictable order. For example, we may have to check a l g ( A 1 , B 1 ) {\displaystyle {\mathit {alg}}(A_{1},B_{1})} first, then a l g ( A 2 , B 2 ) {\displaystyle {\mathit {alg}}(A_{2},B_{2})} , then a l g ( A 1 , B 3 ) {\displaystyle {\mathit {alg}}(A_{1},B_{3})} , and so on. In such circumstances, full-scale specialization with compilation may not be suitable due to excessive memory usage. However, we can sometimes find a compact specialized representation A ′ {\displaystyle A^{\prime }} for every A {\displaystyle A} , that can be stored with, or instead of, A {\displaystyle A} . We also define a variant a l g ′ {\displaystyle {\mathit {alg}}^{\prime }} that works on this representation and any call to a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} is replaced by a l g ′ ( A ′ , B ) {\displaystyle {\mathit {alg}}^{\prime }(A^{\prime },B)} , intended to do the same job faster.

    Read more →
  • Least-squares spectral analysis

    Least-squares spectral analysis

    Least-squares spectral analysis (LSSA) is a class of methods for estimating a frequency spectrum by fitting sinusoids to data using a least-squares fit. Unlike Fourier analysis, the most widely used spectral method in science, data need not be equally spaced to use LSSA. Furthermore, while Fourier analysis generally amplifies long-period noise in long or gapped records, LSSA mitigates such problems. The first strictly least-squares LSSA method was developed in 1969 and 1971, and is known as the Vaníček method or the Gauss–Vaniček method, after its inventor Petr Vaníček and Carl Friedrich Gauss, the inventor of the least-squares method for error minimization. A widely known LSSA variant is the Lomb method or the Lomb–Scargle periodogram, based on dated computational simplifications of the Vaníček method introduced in the 1970s and 1980s, first by Nicholas R. Lomb and later by Jeffrey D. Scargle. Other LSSA variants have been subsequently developed. == Historical background == The close connections between Fourier analysis, the periodogram, and the least-squares fitting of sinusoids have been known for a long time. However, most developments are restricted to complete data sets of equally spaced samples. In 1963, Freek J. M. Barning of Mathematisch Centrum, Amsterdam, handled unequally spaced data by similar techniques, including both a periodogram analysis equivalent to what nowadays is called the Lomb method and least-squares fitting of selected frequencies of sinusoids determined from such periodograms — and connected by a procedure known today as the matching pursuit with post-back fitting or the orthogonal matching pursuit. Petr Vaníček, a Canadian geophysicist and geodesist of the University of New Brunswick, proposed in 1969 also the matching-pursuit approach for equally and unequally spaced data, which he called "successive spectral analysis" and the result a "least-squares periodogram". He generalized this method to account for any systematic components beyond a simple mean, such as a "predicted linear (quadratic, exponential, ...) secular trend of unknown magnitude", and applied it to a variety of samples, in 1971. Vaníček's strictly least-squares method was then simplified in 1976 by Nicholas R. Lomb of the University of Sydney, who pointed out its close connection to periodogram analysis. Subsequently, the definition of a periodogram of unequally spaced data was modified and analyzed by Jeffrey D. Scargle of NASA Ames Research Center, who showed that, with minor changes, it becomes identical to Lomb's least-squares formula for fitting individual sinusoid frequencies. Scargle states that his paper "does not introduce a new detection technique, but instead studies the reliability and efficiency of detection with the most commonly used technique, the periodogram, in the case where the observation times are unevenly spaced," and further points out regarding least-squares fitting of sinusoids compared to periodogram analysis, that his paper "establishes, apparently for the first time, that (with the proposed modifications) these two methods are exactly equivalent." Press summarizes the development this way: A completely different method of spectral analysis for unevenly sampled data, one that mitigates these difficulties and has some other very desirable properties, was developed by Lomb, based in part on earlier work by Barning and Vanicek, and additionally elaborated by Scargle. In 1989, Michael J. Korenberg of Queen's University in Kingston, Ontario, developed the "fast orthogonal search" method of more quickly finding a near-optimal decomposition of spectra or other problems, similar to the technique that later became known as the orthogonal matching pursuit. == Development of LSSA and variants == === The Vaníček method === In the Vaníček method, a discrete data set is approximated by a weighted sum of sinusoids of progressively determined frequencies using a standard linear regression or least-squares fit. The frequencies are chosen using a method similar to Barning's, but going further in optimizing the choice of each successive new frequency by picking the frequency that minimizes the residual after least-squares fitting (equivalent to the fitting technique now known as matching pursuit with pre-backfitting). The number of sinusoids must be less than or equal to the number of data samples (counting sines and cosines of the same frequency as separate sinusoids). The relationship between the DFT and the approximation of trigonometric functions using the least-squares method is well explained in (Strutz, 2017). A data vector Φ is represented as a weighted sum of sinusoidal basis functions, tabulated in a matrix A by evaluating each function at the sample times, with weight vector x: ϕ ≈ A x , {\displaystyle \phi \approx {\textbf {A}}x,} where the weights vector x is chosen to minimize the sum of squared errors in approximating Φ. The solution for x is closed-form, using standard linear regression: x = ( A T A ) − 1 A T ϕ . {\displaystyle x=({\textbf {A}}^{\mathrm {T} }{\textbf {A}})^{-1}{\textbf {A}}^{\mathrm {T} }\phi .} Here the matrix A can be based on any set of functions mutually independent (not necessarily orthogonal) when evaluated at the sample times; functions used for spectral analysis are typically sines and cosines evenly distributed over the frequency range of interest. If we choose too many frequencies in a too-narrow frequency range, the functions will be insufficiently independent, the matrix ill-conditioned, and the resulting spectrum meaningless. When the basis functions in A are orthogonal (that is, not correlated, meaning the columns have zero pair-wise dot products), the matrix ATA is diagonal; when the columns all have the same power (sum of squares of elements), then that matrix is an identity matrix times a constant, so the inversion is trivial. The latter is the case when the sample times are equally spaced and sinusoids chosen as sines and cosines equally spaced in pairs on the frequency interval 0 to a half cycle per sample (spaced by 1/N cycles per sample, omitting the sine phases at 0 and maximum frequency where they are identically zero). This case is known as the discrete Fourier transform, slightly rewritten in terms of measurements and coefficients. x = A T ϕ {\displaystyle x={\textbf {A}}^{\mathrm {T} }\phi } — DFT case for N equally spaced samples and frequencies, within a scalar factor. === The Lomb method === Trying to lower the computational burden of the Vaníček method in 1976 (no longer an issue), Lomb proposed using the above simplification in general, except for pair-wise correlations between sine and cosine bases of the same frequency, since the correlations between pairs of sinusoids are often small, at least when they are not tightly spaced. This formulation is essentially that of the traditional periodogram but adapted for use with unevenly spaced samples. The vector x is a reasonably good estimate of an underlying spectrum, but since we ignore any correlations, Ax is no longer a good approximation to the signal, and the method is no longer a least-squares method — yet in the literature continues to be referred to as such. Rather than just taking dot products of the data with sine and cosine waveforms directly, Scargle modified the standard periodogram formula so to find a time delay τ {\displaystyle \tau } first, such that this pair of sinusoids would be mutually orthogonal at sample times t j {\displaystyle t_{j}} and also adjusted for the potentially unequal powers of these two basis functions, to obtain a better estimate of the power at a frequency. This procedure made his modified periodogram method exactly equivalent to Lomb's method. Time delay τ {\displaystyle \tau } by definition equals to tan ⁡ 2 ω τ = ∑ j sin ⁡ 2 ω t j ∑ j cos ⁡ 2 ω t j . {\displaystyle \tan {2\omega \tau }={\frac {\sum _{j}\sin 2\omega t_{j}}{\sum _{j}\cos 2\omega t_{j}}}.} Then the periodogram at frequency ω {\displaystyle \omega } is estimated as: P x ( ω ) = 1 2 [ [ ∑ j X j cos ⁡ ω ( t j − τ ) ] 2 ∑ j cos 2 ⁡ ω ( t j − τ ) + [ ∑ j X j sin ⁡ ω ( t j − τ ) ] 2 ∑ j sin 2 ⁡ ω ( t j − τ ) ] , {\displaystyle P_{x}(\omega )={\frac {1}{2}}\left[{\frac {\left[\sum _{j}X_{j}\cos \omega (t_{j}-\tau )\right]^{2}}{\sum _{j}\cos ^{2}\omega (t_{j}-\tau )}}+{\frac {\left[\sum _{j}X_{j}\sin \omega (t_{j}-\tau )\right]^{2}}{\sum _{j}\sin ^{2}\omega (t_{j}-\tau )}}\right],} which, as Scargle reports, has the same statistical distribution as the periodogram in the evenly sampled case. At any individual frequency ω {\displaystyle \omega } , this method gives the same power as does a least-squares fit to sinusoids of that frequency and of the form: ϕ ( t ) = A sin ⁡ ω t + B cos ⁡ ω t . {\displaystyle \phi (t)=A\sin \omega t+B\cos \omega t.} In practice, it is always difficult to judge if a given Lomb peak is significant or not, especially when the nature of the noise is unknown, so for example a false-alarm spectr

    Read more →
  • List of artificial intelligence projects

    List of artificial intelligence projects

    The following is a list of current and past, non-classified notable artificial intelligence projects. == Specialized projects == === Brain-inspired === Blue Brain Project, an attempt to create a synthetic brain by reverse-engineering the mammalian brain down to the molecular level. Google Brain, a deep learning project part of Google X attempting to have intelligence similar or equal to human-level. Human Brain Project, ten-year scientific research project, based on exascale supercomputers. === Cognitive architectures === 4CAPS, developed at Carnegie Mellon University under Marcel A. Just ACT-R, developed at Carnegie Mellon University under John R. Anderson. AIXI, Universal Artificial Intelligence developed by Marcus Hutter at IDSIA and ANU. CALO, a DARPA-funded, 25-institution effort to integrate many artificial intelligence approaches (natural language processing, speech recognition, machine vision, probabilistic logic, planning, reasoning, many forms of machine learning) into an AI assistant that learns to help manage your office environment. CHREST, developed under Fernand Gobet at Brunel University and Peter C. Lane at the University of Hertfordshire. CLARION, developed under Ron Sun at Rensselaer Polytechnic Institute and University of Missouri. CoJACK, an ACT-R inspired extension to the JACK multi-agent system that adds a cognitive architecture to the agents for eliciting more realistic (human-like) behaviors in virtual environments. Copycat, by Douglas Hofstadter and Melanie Mitchell at the Indiana University. DUAL, developed at the New Bulgarian University under Boicho Kokinov. FORR developed by Susan L. Epstein at The City University of New York. IDA and LIDA, implementing Global Workspace Theory, developed under Stan Franklin at the University of Memphis. OpenCog Prime, developed using the OpenCog Framework. Procedural Reasoning System (PRS), developed by Michael Georgeff and Amy L. Lansky at SRI International. Psi-Theory developed under Dietrich Dörner at the Otto-Friedrich University in Bamberg, Germany. Soar, developed under Allen Newell and John Laird at Carnegie Mellon University and the University of Michigan. Society of Mind and its successor The Emotion Machine proposed by Marvin Minsky. Subsumption architectures, developed e.g. by Rodney Brooks (though it could be argued whether they are cognitive). === Games === AlphaGo, software developed by Google that plays the Chinese board game Go. Chinook, a computer program that plays English draughts; the first to win the world champion title in the competition against humans. Deep Blue, a chess-playing computer developed by IBM which beat Garry Kasparov in 1997. Halite, an artificial intelligence programming competition created by Two Sigma in 2016. Libratus, a poker AI that beat world-class poker players in 2017, intended to be generalisable to other applications. The Matchbox Educable Noughts and Crosses Engine (sometimes called the Machine Educable Noughts and Crosses Engine or MENACE) was a mechanical computer made from 304 matchboxes designed and built by artificial intelligence researcher Donald Michie in 1961. Quick, Draw!, an online game developed by Google that challenges players to draw a picture of an object or idea and then uses a neural network to guess what the drawing is. The Samuel Checkers-playing Program (1959) was among the world's first successful self-learning programs, and as such a very early demonstration of the fundamental concept of artificial intelligence (AI). Stockfish AI, an open source chess engine currently ranked the highest in many computer chess rankings. TD-Gammon, a program that learned to play world-class backgammon partly by playing against itself (temporal difference learning with neural networks). === Internet activism === Serenata de Amor, project for the analysis of public expenditures and detect discrepancies. === Knowledge and reasoning === Alice (Microsoft), a project from Microsoft Research Lab aimed at improving decision-making in Economics Braina, an intelligent personal assistant application with a voice interface for Windows OS. Cyc, an attempt to assemble an ontology and database of everyday knowledge, enabling human-like reasoning. Eurisko, a language by Douglas Lenat for solving problems which consists of heuristics, including some for how to use and change its heuristics. Google Now, an intelligent personal assistant with a voice interface in Google's Android and Apple Inc.'s iOS, as well as Google Chrome web browser on personal computers. Holmes a new AI created by Wipro. Microsoft Cortana, an intelligent personal assistant with a voice interface in Microsoft's various Windows 10 editions. MindsDB, is an AI automation platform for building AI/ML powered features and applications. Mycin, an early medical expert system. Open Mind Common Sense, a project based at the MIT Media Lab to build a large common sense knowledge base from online contributions. Siri, an intelligent personal assistant and knowledge navigator with a voice-interface in Apple Inc.'s iOS and macOS. SNePS, simultaneously a logic-based, frame-based, and network-based knowledge representation, reasoning, and acting system. Viv (software), a new AI by the creators of Siri. Wolfram Alpha, an online service that answers queries by computing the answer from structured data. === Motion and manipulation === AIBO, the robot pet for the home, grew out of Sony's Computer Science Laboratory (CSL). Cog, a robot developed by MIT to study theories of cognitive science and artificial intelligence, now discontinued. === Music === Melomics, a bioinspired technology for music composition and synthesization of music, where computers develop their own style, rather than mimic musicians. === Natural language processing === AIML, an XML dialect for creating natural language software agents. Apache Lucene, a high-performance, full-featured text search engine library written entirely in Java. Apache OpenNLP, a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking and parsing. Artificial Linguistic Internet Computer Entity (A.L.I.C.E.), a natural language processing chatterbot. ChatGPT, a chatbot built on top of OpenAI's GPT-3.5 and GPT-4 family of large language models. Claude, a family of large language models developed by Anthropic and launched in 2023. Claude LLMs achieved high coding scores in several recognized LLM benchmarks. Cleverbot, successor to Jabberwacky, now with 170m lines of conversation, Deep Context, fuzziness and parallel processing. Cleverbot learns from around 2 million user interactions per month. DeepSeek: Chinese chatbot funded by hedge fund High-Flyer. DBRX, 136 billion parameter open sourced large language model developed by Mosaic ML and Databricks. ELIZA, a famous 1966 computer program by Joseph Weizenbaum, which parodied person-centered therapy. FreeHAL, a self-learning conversation simulator (chatterbot) which uses semantic nets to organize its knowledge to imitate a very close human behavior within conversations. Gemini, a family of multimodal large language model developed by Google's DeepMind. Drives the Gemini chatbot, formerly known as Bard. GigaChat, a chatbot by Russian Sberbank. GPT-3, a 2020 language model developed by OpenAI that can produce text difficult to distinguish from that written by a human. Jabberwacky, a chatbot by Rollo Carpenter, aiming to simulate natural human chat. LaMDA, a family of conversational neural language models developed by Google. LLaMA, a 2023 language model family developed by Meta that includes 7, 13, 33 and 65 billion parameter models.[1] Mycroft, a free and open-source intelligent personal assistant that uses a natural language user interface. PARRY, another early chatterbot, written in 1972 by Kenneth Colby, attempting to simulate a paranoid schizophrenic. SHRDLU, an early natural language processing computer program developed by Terry Winograd at MIT from 1968 to 1970. SYSTRAN, a machine translation technology by the company of the same name, used by Yahoo!, AltaVista and Google, among others. === Speech recognition === CMU Sphinx, a group of speech recognition systems developed at Carnegie Mellon University. DeepSpeech, an open-source Speech-To-Text engine based on Baidu's deep speech research paper. Whisper, an open-source speech recognition system developed at OpenAI. === Speech synthesis === 15.ai, a real-time artificial intelligence text-to-speech tool developed by an anonymous researcher from MIT. Amazon Polly, a speech synthesis software by Amazon. Festival Speech Synthesis System, a general multi-lingual speech synthesis system developed at the Centre for Speech Technology Research (CSTR) at the University of Edinburgh. WaveNet, a deep neural network for generating raw audio. === Video === CapCut is a video editor tool, developed

    Read more →
  • Site reliability engineering

    Site reliability engineering

    Site reliability engineering (SRE) is a discipline in the field of software engineering and IT infrastructure support that monitors and improves the availability and performance of deployed software systems and large software services (which are expected to deliver reliable response times across events such as new software deployments, hardware failures, and cybersecurity attacks). There is typically a focus on automation and an infrastructure as code methodology. SRE uses elements of software engineering, IT infrastructure, web development, and operations to assist with reliability. It is similar to DevOps as they both aim to improve the reliability and availability of deployed software systems. == History == Site Reliability Engineering originated at Google with Benjamin Treynor Sloss, who founded SRE team in 2003. The concept expanded within the software development industry, leading various companies to employ site reliability engineers. By March 2016, Google had more than 1,000 site reliability engineers on staff. Dedicated SRE teams are common at larger web development companies. In middle-sized and smaller companies, DevOps teams sometimes perform SRE, as well. Organizations that have adopted the concept include Airbnb, Dropbox, IBM, LinkedIn, Netflix, and Wikimedia. == Definition == Site reliability engineers (SREs) are responsible for a combination of system availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. SREs often have backgrounds in software engineering, systems engineering, and/or system administration. The focuses of SRE include automation, system design, and improvements to system resilience. SRE is considered a specific implementation of DevOps; focusing specifically on building reliable systems, whereas DevOps covers a broader scope of operations. Despite having different focuses, some companies have rebranded their operations teams to SRE teams. == Principles and practices == Common definitions of the practices include (but are not limited to): Automation of repetitive tasks for cost-effectiveness. Defining reliability goals to prevent endless effort. Design of systems with a goal to reduce risks to availability, latency, and efficiency. Observability, the ability to ask arbitrary questions about a system without having to know ahead of time what to ask. Common definitions of the principles include (but are not limited to): Toil management, the implementation of the first principle outlined above. Defining and measuring reliability goals—SLIs, SLOs, and error budgets. Non-Abstract Large Scale Systems Design (NALSD) with a focus on reliability. Designing for and implementing observability. Defining, testing, and running an incident management process. Capacity planning. Change and release management, including CI/CD. Chaos engineering. == Deployment == SRE teams collaborate with other departments within organizations to guide the implementation of the mentioned principles. Below is an overview of common practices: === Kitchen Sink === Kitchen Sink refers to the expansive and often unbounded scope of services and workflows that SRE teams oversee. Unlike traditional roles with clearly defined boundaries, SREs are tasked with various responsibilities, including system performance optimization, incident management, and automation. This approach allows SREs to address multiple challenges, ensuring that systems run efficiently and evolve in response to changing demands and complexities. === Infrastructure === Infrastructure SRE teams focus on maintaining and improving the reliability of systems that support other teams' workflows. While they sometimes collaborate with platform engineering teams, their primary responsibility is ensuring up-time, performance, and efficiency. Platform teams, on the other hand, primarily develop the software and systems used across the organization. While reliability is a goal for both, platform teams prioritize creating and maintaining the tools and services used by internal stakeholders, whereas Infrastructure SRE teams are tasked with ensuring those systems run smoothly and meet reliability standards. === Tools === SRE teams utilize a variety of tools with the aim of measuring, maintaining, and enhancing system reliability. These tools play a role in monitoring performance, identifying issues, and facilitating proactive maintenance. For instance, Nagios Core is commonly employed for system monitoring and alerting, while Prometheus (software) is frequently used for collecting and querying metrics in cloud-native environments. === Product or Application === SRE teams dedicated to specific products or applications are common in large organizations. These teams are responsible for ensuring the reliability, scalability, and performance of key services. In larger companies, it's typical to have multiple SRE teams, each focusing on different products or applications, ensuring that each area receives specialized attention to meet performance and availability targets. === Embedded === In an embedded model, individual SREs or small SRE pairs are integrated within software engineering teams. These SREs collaborate with developers, applying core SRE principles—such as automation, monitoring, and incident response—directly to the software development lifecycle. This approach aims to enhance reliability, performance, and collaboration between SREs and developers. === Consulting === Consulting SRE teams specialize in advising organizations on the implementation of SRE principles and practices. Typically composed of seasoned SREs with a history across various implementations, these teams provide insights and guidance for specific organizational needs. When working directly with clients, these SREs are often referred to as 'Customer Reliability Engineers.' In large organizations that have adopted SRE, a hybrid model is common. This model includes various implementations, such as multiple Product/Application SRE teams dedicated to addressing the specific reliability needs of different products. An Infrastructure SRE team may collaborate with a Platform engineering group to achieve shared reliability goals for a unified platform that supports all products and applications. == Industry == Since 2014, the USENIX organization has hosted the annual SREcon conference, bringing together site reliability engineers from various industries. This conference is a platform for professionals to share knowledge, explore effective practices, and discuss trends in site reliability engineering.

    Read more →
  • Algorithmic logic

    Algorithmic logic

    Algorithmic logic is a calculus of programs that allows the expression of semantic properties of programs by appropriate logical formulas. It provides a framework that enables proving the formulas from the axioms of program constructs such as assignment, iteration and composition instructions and from the axioms of the data structures in question see Mirkowska & Salwicki (1987), Banachowski et al. (1977). The following diagram helps to locate algorithmic logic among other logics. [ P r o p o s i t i o n a l l o g i c o r S e n t e n t i a l c a l c u l u s ] ⊂ [ P r e d i c a t e c a l c u l u s o r F i r s t o r d e r l o g i c ] ⊂ [ C a l c u l u s o f p r o g r a m s o r Algorithmic logic ] {\displaystyle \qquad \left[{\begin{array}{l}\mathrm {Propositional\ logic} \\or\\\mathrm {Sentential\ calculus} \end{array}}\right]\subset \left[{\begin{array}{l}\mathrm {Predicate\ calculus} \\or\\\mathrm {First\ order\ logic} \end{array}}\right]\subset \left[{\begin{array}{l}\mathrm {Calculus\ of\ programs} \\or\\{\mbox{Algorithmic logic}}\end{array}}\right]} The formalized language of algorithmic logic (and of algorithmic theories of various data structures) contains three types of well formed expressions: Terms - i.e. expressions denoting operations on elements of data structures, formulas - i.e. expressions denoting the relations among elements of data structures, programs - i.e. algorithms - these expressions describe the computations. For semantics of terms and formulas consult pages on first-order logic and Tarski's semantics. The meaning of a program K {\displaystyle K} is the set of possible computations of the program. Algorithmic logic is one of many logics of programs. Another logic of programs is dynamic logic, see dynamic logic, Harel, Kozen & Tiuryn (2000).

    Read more →
  • Run-time algorithm specialization

    Run-time algorithm specialization

    In computer science, run-time algorithm specialization is a methodology for creating efficient algorithms for costly computation tasks of certain kinds. The methodology originates in the field of automated theorem proving and, more specifically, in the Vampire theorem prover project. The idea is inspired by the use of partial evaluation in optimising program translation. Many core operations in theorem provers exhibit the following pattern. Suppose that we need to execute some algorithm a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} in a situation where a value of A {\displaystyle A} is fixed for potentially many different values of B {\displaystyle B} . In order to do this efficiently, we can try to find a specialization of a l g {\displaystyle {\mathit {alg}}} for every fixed A {\displaystyle A} , i.e., such an algorithm a l g A {\displaystyle {\mathit {alg}}_{A}} , that executing a l g A ( B ) {\displaystyle {\mathit {alg}}_{A}(B)} is equivalent to executing a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} . The specialized algorithm may be more efficient than the generic one, since it can exploit some particular properties of the fixed value A {\displaystyle A} . Typically, a l g A ( B ) {\displaystyle {\mathit {alg}}_{A}(B)} can avoid some operations that a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} would have to perform, if they are known to be redundant for this particular parameter A {\displaystyle A} . In particular, we can often identify some tests that are true or false for A {\displaystyle A} , unroll loops and recursion, etc. == Difference from partial evaluation == The key difference between run-time specialization and partial evaluation is that the values of A {\displaystyle A} on which a l g {\displaystyle {\mathit {alg}}} is specialised are not known statically, so the specialization takes place at run-time. There is also an important technical difference. Partial evaluation is applied to algorithms explicitly represented as codes in some programming language. At run-time, we do not need any concrete representation of a l g {\displaystyle {\mathit {alg}}} . We only have to imagine a l g {\displaystyle {\mathit {alg}}} when we program the specialization procedure. All we need is a concrete representation of the specialized version a l g A {\displaystyle {\mathit {alg}}_{A}} . This also means that we cannot use any universal methods for specializing algorithms, which is usually the case with partial evaluation. Instead, we have to program a specialization procedure for every particular algorithm a l g {\displaystyle {\mathit {alg}}} . An important advantage of doing so is that we can use some powerful ad hoc tricks exploiting peculiarities of a l g {\displaystyle {\mathit {alg}}} and the representation of A {\displaystyle A} and B {\displaystyle B} , which are beyond the reach of any universal specialization methods. == Specialization with compilation == The specialized algorithm has to be represented in a form that can be interpreted. In many situations, usually when a l g A ( B ) {\displaystyle {\mathit {alg}}_{A}(B)} is to be computed on many values of B {\displaystyle B} in a row, a l g A {\displaystyle {\mathit {alg}}_{A}} can be written as machine code instructions for a special abstract machine, and it is typically said that A {\displaystyle A} is compiled. The code itself can then be additionally optimized by answer-preserving transformations that rely only on the semantics of instructions of the abstract machine. The instructions of the abstract machine can usually be represented as records. One field of such a record, an instruction identifier (or instruction tag), would identify the instruction type, e.g. an integer field may be used, with particular integer values corresponding to particular instructions. Other fields may be used for storing additional parameters of the instruction, e.g. a pointer field may point to another instruction representing a label, if the semantics of the instruction require a jump. All instructions of the code can be stored in a traversable data structure such as an array, linked list, or tree. Interpretation (or execution) proceeds by fetching instructions in some order, identifying their type, and executing the actions associated with said type. In many programming languages, such as C and C++, a simple switch statement may be used to associate actions with different instruction identifiers. Modern compilers usually compile a switch statement with constant (e.g. integer) labels from a narrow range by storing the address of the statement corresponding to a value i {\displaystyle i} in the i {\displaystyle i} -th cell of a special array, as a means of efficient optimisation. This can be exploited by taking values for instruction identifiers from a small interval of values. == Data-and-algorithm specialization == There are situations when many instances of A {\displaystyle A} are intended for long-term storage and the calls of a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} occur with different B {\displaystyle B} in an unpredictable order. For example, we may have to check a l g ( A 1 , B 1 ) {\displaystyle {\mathit {alg}}(A_{1},B_{1})} first, then a l g ( A 2 , B 2 ) {\displaystyle {\mathit {alg}}(A_{2},B_{2})} , then a l g ( A 1 , B 3 ) {\displaystyle {\mathit {alg}}(A_{1},B_{3})} , and so on. In such circumstances, full-scale specialization with compilation may not be suitable due to excessive memory usage. However, we can sometimes find a compact specialized representation A ′ {\displaystyle A^{\prime }} for every A {\displaystyle A} , that can be stored with, or instead of, A {\displaystyle A} . We also define a variant a l g ′ {\displaystyle {\mathit {alg}}^{\prime }} that works on this representation and any call to a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} is replaced by a l g ′ ( A ′ , B ) {\displaystyle {\mathit {alg}}^{\prime }(A^{\prime },B)} , intended to do the same job faster.

    Read more →
  • Magic state distillation

    Magic state distillation

    Magic state distillation is a method for creating more accurate quantum states from multiple noisy ones, which is important for building fault tolerant quantum computers. It has also been linked to quantum contextuality, a concept thought to contribute to quantum computers' power. The technique was first proposed by Emanuel Knill in 2004, and further analyzed by Sergey Bravyi and Alexei Kitaev the same year. Thanks to the Gottesman–Knill theorem, it is known that some quantum operations (operations in the Clifford group) can be perfectly simulated in polynomial time on a classical computer. In order to achieve universal quantum computation, a quantum computer must be able to perform operations outside this set. Magic state distillation achieves this, in principle, by concentrating the usefulness of imperfect resources, represented by mixed states, into states that are conducive for performing operations that are difficult to simulate classically. A variety of qubit magic state distillation routines and distillation routines for qubits with various advantages have been proposed. == Stabilizer formalism == The Clifford group consists of a set of n {\displaystyle n} -qubit operations generated by the gates {H, S, CNOT} (where H is Hadamard and S is [ 1 0 0 i ] {\displaystyle {\begin{bmatrix}1&0\\0&i\end{bmatrix}}} ) called Clifford gates. The Clifford group generates stabilizer states which can be efficiently simulated classically, as shown by the Gottesman–Knill theorem. This set of gates with a non-Clifford operation is universal for quantum computation. == Magic states == Magic states are purified from n {\displaystyle n} copies of a mixed state ρ {\displaystyle \rho } . These states are typically provided via an ancilla to the circuit. A magic state for the π / 6 {\displaystyle \pi /6} rotation operator is | M ⟩ = cos ⁡ ( β / 2 ) | 0 ⟩ + e i π 4 sin ⁡ ( β / 2 ) | 1 ⟩ {\displaystyle |M\rangle =\cos(\beta /2)|0\rangle +e^{i{\frac {\pi }{4}}}\sin(\beta /2)|1\rangle } where β = arccos ⁡ ( 1 3 ) {\displaystyle \beta =\arccos \left({\frac {1}{\sqrt {3}}}\right)} . A non-Clifford gate can be generated by combining (copies of) magic states with Clifford gates. Since a set of Clifford gates combined with a non-Clifford gate is universal for quantum computation, magic states combined with Clifford gates are also universal. == Purification algorithm for distilling |M〉 == The first magic state distillation algorithm, invented by Sergey Bravyi and Alexei Kitaev, is as follows. Input: Prepare 5 imperfect states. Output: An almost pure state having a small error probability. repeat Apply the decoding operation of the five-qubit error correcting code and measure the syndrome. If the measured syndrome is | 0000 ⟩ {\displaystyle |0000\rangle } , the distillation attempt is successful. else Get rid of the resulting state and restart the algorithm. until The states have been distilled to the desired purity.

    Read more →
  • Elowan

    Elowan

    Elowan is a plant-robot cyborg. Using its own internal bioelectrical signals, The plant has a robotic extension that makes it move towards light sources. Electrodes are inserted into the leaves, stem, and ground to detect the faint bioelectrical signals the plant produces. Then they are amplified so the robot can read them. So when the plant "wants" to go to light, the cyborg automatically goes to the nearest light source. Future extensions of the robot could provide: Protection, growth frameworks, and nutrients. Other factors that could make the cyborg move are temperature, soil, and gravity conditions Elowan is one in a series of plant-electronic hybrid experiments.

    Read more →
  • DONE

    DONE

    The Data-based Online Nonlinear Extremumseeker (DONE) algorithm is a black-box optimization algorithm. DONE models the unknown cost function and attempts to find an optimum of the underlying function. The DONE algorithm is suitable for optimizing costly and noisy functions and does not require derivatives. An advantage of DONE over similar algorithms, such as Bayesian optimization, is that the computational cost per iteration is independent of the number of function evaluations. == Methods == The DONE algorithm was first proposed by Hans Verstraete and Sander Wahls in 2015. The algorithm fits a surrogate model based on random Fourier features and then uses a well-known L-BFGS algorithm to find an optimum of the surrogate model. == Applications == DONE was first demonstrated for maximizing the signal in optical coherence tomography measurements, but has since then been applied to various other applications. For example, it was used to help extending the field of view in light sheet fluorescence microscopy.

    Read more →
  • Informationist

    Informationist

    An informationist (or information specialist in context) provides research and knowledge management services in the context of clinical care or biomedical research. Although there is no one educational pathway or formalized set of skills or knowledge for informationists, one way to think of the informationist is as one who possesses the knowledge and skill of a medical librarian with extensive research specialization and some formal clinical or public health education that goes beyond on-the-job osmosis. Medical librarians and other biomedical professional organizations have been exploring the possibilities for evaluating how informationists are being used and whether their activities supplement or replace medical library activity. More generally, an informationist is a professional who works with information within a particular business, analytic or scientific context to drive toward outcomes based on evidence, analysis, prediction and execution. For example, an extension of the term is increasingly emerging in financial services, life sciences and health care industries. Though still nascently in use, its adoption applies to individuals with extensive industry expertise, acute familiarity with organizational structures and processes, deep domain level information mastery and information systems technical savvy. Informationists in this context support transformational initiatives within and across functional areas of an enterprise as architects, governance experts, continuous improvement advocates and strategists. == Background == The term was proposed in 2000 by Davidoff & Florance. Their editorial suggested that physicians should be delegating their information needs to informationists, just as they currently order CT scans from radiologists or cardiac catheterizations from cardiologists. They conceived of an information professional who was embedded in (and indeed, supported by) the clinical departments. Supporters of the concept see it as a means for librarians to reinvigorate connections with the faculty/clinicians, as well as provide superior service by dint of informationists' biomedical training. Critics complained that the idea is nothing new; librarians already provide in-depth, high quality information services and clinical medical librarians have been working alongside physicians, nurses and other clinicians for years. Large informationist programs in the U.S. exist at the National Institutes of Health and at Vanderbilt University. Welch Medical Library at Johns Hopkins University (JHU) is developing an informationist service model in which its 10 clinical and public health librarians are moving from serving as liaison librarians for assigned departments toward becoming embedded informationists within their departments. To prepare for the embedded informationist role, librarians are undertaking education as needed to supplement their backgrounds. For example, librarians bring experience in clinical behavior counseling, public health, nursing, and more. Informationist training can then focus upon filling gaps in research methods knowledge more so than on gaining additional knowledge in the librarian's area of expertise. Courses, seminars and workshops being undertaken include those covering systematic reviews, evidence-based medicine, critical appraisal, medical language, anatomy and physiology, biostatistics, and clinical research. The term informationist is related to that of informatician—also informaticist—and many informationists do possess skills in clinical topics, bioinformatics, and biomedical informatics. Harvard University, the University of Pittsburgh, and Washington University in St. Louis are examples of institutional libraries which have hired PhD-level scientists (who may or may not have library degrees) to provide informatics support for biomedical research.

    Read more →