AI Grammar Maker Free

AI Grammar Maker Free — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Color vision

    Color vision

    Color vision (CV), a feature of visual perception, is an ability to perceive differences between light composed of different frequencies independently of light intensity. Color perception is a part of the larger visual system and is mediated by a complex process between neurons that begins with differential stimulation of different types of photoreceptors by light entering the eye. Those photoreceptors then emit outputs that are propagated through many layers of neurons ultimately leading to higher cognitive functions in the brain. Color vision is found in many animals and is mediated by similar underlying mechanisms with common types of biological molecules and a complex history of the evolution of color vision within different animal taxa. In primates, color vision may have evolved under selective pressure for a variety of visual tasks including the foraging for nutritious young leaves, ripe fruit, and flowers, as well as detecting predator camouflage and emotional states in other primates. == Wavelength == Isaac Newton discovered that white light after being split into its component colors when passed through a dispersive prism could be recombined to make white light by passing them through a different prism. The visible light spectrum ranges from about 380 to 740 nanometers. Spectral colors (colors that are produced by a narrow band of wavelengths) such as red, orange, yellow, green, cyan, blue, and violet can be found in this range. These spectral colors do not refer to a single wavelength, but rather to a set of wavelengths: red, 625–740 nm; orange, 590–625 nm; yellow, 565–590 nm; green, 500–565 nm; cyan, 485–500 nm; blue, 450–485 nm; violet, 380–450 nm. Wavelengths longer or shorter than this range are called infrared or ultraviolet, respectively. Humans cannot generally see these wavelengths, but other animals may. === Hue detection === Sufficient differences in wavelength cause a difference in the perceived hue; the just-noticeable difference in wavelength varies from about 1 nm in the blue-green and yellow wavelengths to 10 nm and more in the longer red and shorter blue wavelengths. Although the human eye can distinguish up to a few hundred hues, when those pure spectral colors are mixed together or diluted with white light, the number of distinguishable chromaticities can be much higher. In very low light levels, vision is scotopic: light is detected by rod cells of the retina. Rods are maximally sensitive to wavelengths near 500 nm and play little, if any, role in color vision. In brighter light, such as daylight, vision is photopic: light is detected by cone cells which are responsible for color vision. Cones are sensitive to a range of wavelengths, but are most sensitive to wavelengths near 555 nm. Between these regions, mesopic vision comes into play and both rods and cones provide signals to the retinal ganglion cells. The shift in color perception from dim light to daylight gives rise to differences known as the Purkinje effect. The perception of "white" is formed by the entire spectrum of visible light, or by mixing colors of just a few wavelengths in animals with few types of color receptors. In humans, white light can be perceived by combining wavelengths such as red, green, and blue, or just a pair of complementary colors such as blue and yellow. === Non-spectral colors === There are a variety of colors in addition to spectral colors and their hues. These include grayscale colors, shades of colors obtained by mixing grayscale colors with spectral colors, violet-red colors, impossible colors, and metallic colors. Grayscale colors include white, gray, and black. Rods contain rhodopsin, which reacts to light intensity, providing grayscale coloring. Shades include colors such as pink or brown. Pink is obtained from mixing red and white. Brown may be obtained from mixing orange with gray or black. Navy is obtained from mixing blue and black. Violet-red colors include hues and shades of magenta. The light spectrum is a line on which violet is one end and the other is red, and yet we see hues of purple that connect those two colors. Impossible colors are a combination of cone responses that cannot be naturally produced. For example, medium cones cannot be activated completely on their own; if they were, we would see a 'hyper-green' color. == Dimensionality == Color vision is categorized foremost according to the dimensionality of the color gamut, which is defined by the number of primaries required to represent the color vision. This is generally equal to the number of photopsins expressed: a correlation that holds for vertebrates but not invertebrates. The common vertebrate ancestor possessed four photopsins (expressed in cones) plus rhodopsin (expressed in rods), so was tetrachromatic. However, many vertebrate lineages have lost one or many photopsin genes, leading to lower-dimension color vision. The dimensions of color vision range from 1-dimensional and up: == Physiology of color perception == Perception of color begins with specialized retinal cells known as cone cells. Cone cells contain different forms of opsin – a pigment protein – that have different spectral sensitivities. Humans contain three types, resulting in trichromatic color vision. Each individual cone contains pigments composed of opsin apoprotein covalently linked to a light-absorbing prosthetic group: either 11-cis-hydroretinal or, more rarely, 11-cis-dehydroretinal. The cones are conventionally labeled according to the ordering of the wavelengths of the peaks of their spectral sensitivities: short (S), medium (M), and long (L) cone types. These three types do not correspond well to particular colors as we know them. Rather, the perception of color is achieved by a complex process that starts with the differential output of these cells in the retina and which is finalized in the visual cortex and associative areas of the brain. For example, while the L cones have been referred to simply as red receptors, microspectrophotometry has shown that their peak sensitivity is in the greenish-yellow region of the spectrum. Similarly, the S cones and M cones do not directly correspond to blue and green, although they are often described as such. The RGB color model, therefore, is a convenient means for representing color but is not directly based on the types of cones in the human eye. The peak response of human cone cells varies, even among individuals with typical color vision; in some non-human species this polymorphic variation is even greater, and it may well be adaptive. === Theories === Two complementary theories of color vision are the trichromatic theory and the opponent process theory. The trichromatic theory, or Young–Helmholtz theory, proposed in the 19th century by Thomas Young and Hermann von Helmholtz, posits three types of cones preferentially sensitive to blue, green, and red, respectively. Others have suggested that the trichromatic theory is not specifically a theory of color vision but a theory of receptors for all vision, including color but not specific or limited to it. Equally, it has been suggested that the relationship between the phenomenal opponency described by Ewald Hering and the physiological opponent processes are not straightforward (see below), making of physiological opponency a mechanism that is relevant to the whole of vision, and not just to color vision alone. Hering proposed the opponent process theory in 1872. It states that the visual system interprets color in an antagonistic way: red vs. green, blue vs. yellow, black vs. white. Both theories are generally accepted as valid, describing different stages in visual physiology, visualized in the adjacent diagram. Green–magenta and blue–yellow are scales with mutually exclusive boundaries. In the same way that there cannot exist a "slightly negative" positive number, a single eye cannot perceive a bluish-yellow or a reddish-green. Although these two theories are both currently widely accepted theories, past and more recent work has led to criticism of the opponent process theory, stemming from a number of what are presented as discrepancies in the standard opponent process theory. For example, the phenomenon of an after-image of complementary color can be induced by fatiguing the cells responsible for color perception, by staring at a vibrant color for a length of time, and then looking at a white surface. This phenomenon of complementary colors shows that cyan, rather than green, is the complement of red, and that magenta, rather than red, is the complement of green. It therefore also shows that the reddish-green color supposed to be impossible by opponent process theory is actually the color yellow. Although this phenomenon is more readily explained by the trichromatic theory, explanations for the discrepancy may include alterations to the opponent process theory, such as redefining the opponent colors as red vs. cyan, to reflect this effect. Despite such criticis

    Read more →
  • NoSQL

    NoSQL

    NoSQL (originally meaning "not only SQL" or "non-relational") refers to a type of database design that stores and retrieves data differently from the traditional table-based structure of relational databases. Unlike relational databases, which organize data into rows and columns like a spreadsheet, NoSQL databases use a single data structure—such as key–value pairs, wide columns, graphs, or documents—to hold information. Since this non-relational design does not require a fixed schema, it scales easily to manage large, often unstructured datasets. NoSQL systems are sometimes called "Not only SQL" because they can support SQL-like query languages or work alongside SQL databases in polyglot-persistent setups, where multiple database types are combined. Non-relational databases date back to the late 1960s, but the term "NoSQL" emerged in the early 2000s, spurred by the needs of Web 2.0 companies like social media platforms. NoSQL databases are popular in big data and real-time web applications due to their simple design, ability to scale across clusters of machines (called horizontal scaling), and precise control over data availability. These structures can speed up certain tasks and are often considered more adaptable than fixed database tables. However, many NoSQL systems prioritize speed and availability over strict consistency (per the CAP theorem), using eventual consistency—where updates reach all nodes eventually, typically within milliseconds, but may cause brief delays in accessing the latest data, known as stale reads. While most lack full ACID transaction support, some, like MongoDB, include it as a key feature. == Barriers to adoption == Barriers to wider NoSQL adoption include their use of low-level query languages instead of SQL, inability to perform ad hoc joins across tables, lack of standardized interfaces, and significant investments already made in relational databases. Some NoSQL systems risk losing data through lost writes or other forms, though features like write-ahead logging—a method to record changes before they’re applied—can help prevent this. For distributed transaction processing across multiple databases, keeping data consistent is a challenge for both NoSQL and relational systems, as relational databases cannot enforce rules linking separate databases, and few systems support both ACID transactions and X/Open XA standards for managing distributed updates. Limitations within the interface environment are overcome using semantic virtualization protocols, such that NoSQL services are accessible to most operating systems. == History == The term NoSQL was used by Carlo Strozzi in 1998 to name his lightweight Strozzi NoSQL open-source relational database that did not expose the standard Structured Query Language (SQL) interface, but was still relational. His NoSQL RDBMS is distinct from the around-2009 general concept of NoSQL databases. Strozzi suggests that, because the current NoSQL movement "departs from the relational model altogether, it should therefore have been called more appropriately 'NoREL'", referring to "not relational". Johan Oskarsson, then a developer at Last.fm, reintroduced the term NoSQL in early 2009 when he organized an event to discuss "open-source distributed, non-relational databases". The name attempted to label the emergence of an increasing number of non-relational, distributed data stores, including open source clones of Google's Bigtable/MapReduce and Amazon's DynamoDB. == Types and examples == There are various ways to classify NoSQL databases, with different categories and subcategories, some of which overlap. What follows is a non-exhaustive classification by data model, with examples: === Key–value store === Key–value (KV) stores use the associative array (also called a map or dictionary) as their fundamental data model. In this model, data is represented as a collection of key–value pairs, such that each possible key appears at most once in the collection. The key–value model is one of the simplest non-trivial data models, and richer data models are often implemented as an extension of it. The key–value model can be extended to a discretely ordered model that maintains keys in lexicographic order. This extension is computationally powerful, in that it can efficiently retrieve selective key ranges. Key–value stores can use consistency models ranging from eventual consistency to serializability. Some databases support ordering of keys. There are various hardware implementations, and some users store data in memory (RAM), while others on solid-state drives (SSD) or rotating disks (aka hard disk drive (HDD)). === Document store === The central concept of a document store is that of a "document". While the details of this definition differ among document-oriented databases, they all assume that documents encapsulate and encode data (or information) in some standard formats or encodings. Encodings in use include XML, YAML, and JSON and binary forms like BSON. Documents are addressed in the database via a unique key that represents that document. Another defining characteristic of a document-oriented database is an API or query language to retrieve documents based on their contents. Different implementations offer different ways of organizing and/or grouping documents: Collections Tags Non-visible metadata Directory hierarchies Compared to relational databases, collections could be considered analogous to tables and documents analogous to records. But they are different – every record in a table has the same sequence of fields, while documents in a collection may have fields that are completely different. === Graph === Graph databases are designed for data whose relations are well represented as a graph consisting of elements connected by a finite number of relations. Examples of data include social relations, public transport links, road maps, network topologies, etc. Graph databases and their query language == Performance == The performance of NoSQL databases is usually evaluated using the metric of throughput, which is measured as operations per second. Performance evaluation must pay attention to the right benchmarks such as production configurations, parameters of the databases, anticipated data volume, and concurrent user workloads. Ben Scofield rated different categories of NoSQL databases as follows: Performance and scalability comparisons are most commonly done using the YCSB benchmark. == Handling relational data == Since most NoSQL databases lack ability for joins in queries, the database schema generally needs to be designed differently. There are three main techniques for handling relational data in a NoSQL database. (See table join and ACID support for NoSQL databases that support joins.) === Multiple queries === Instead of retrieving all the data with one query, it is common to do several queries to get the desired data. NoSQL queries are often faster than traditional SQL queries, so the cost of additional queries may be acceptable. If an excessive number of queries would be necessary, one of the other two approaches is more appropriate. === Caching, replication and non-normalized data === Instead of only storing foreign keys, it is common to store actual foreign values along with the model's data. For example, each blog comment might include the username in addition to a user id, thus providing easy access to the username without requiring another lookup. When a username changes, however, this will now need to be changed in many places in the database. Thus this approach works better when reads are much more common than writes. === Nesting data === With document databases like MongoDB it is common to put more data in a smaller number of collections. For example, in a blogging application, one might choose to store comments within the blog post document, so that with a single retrieval one gets all the comments. Thus in this approach a single document contains all the data needed for a specific task. == ACID and join support == A database is marked as supporting ACID properties (atomicity, consistency, isolation, durability) or join operations if the documentation for the database makes that claim. However, this doesn't necessarily mean that the capability is fully supported in a manner similar to most SQL databases. == Query optimization and indexing in NoSQL databases == Different NoSQL databases, such as DynamoDB, MongoDB, Cassandra, Couchbase, HBase, and Redis, exhibit varying behaviors when querying non-indexed fields. Many perform full-table or collection scans for such queries, applying filtering operations after retrieving data. However, modern NoSQL databases often incorporate advanced features to optimize query performance. For example, MongoDB supports compound indexes and query-optimization strategies, Cassandra offers secondary indexes and materialized views, and Redis employs custom indexing mechanisms tailored to specific use cases. Systems like El

    Read more →
  • Online analytical processing

    Online analytical processing

    In computing, online analytical processing (OLAP) (), is an approach to quickly answer multi-dimensional analytical (MDA) queries. The term OLAP was created as a slight modification of the traditional database term online transaction processing (OLTP). OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture. OLAP tools enable users to analyse multidimensional data interactively from multiple perspectives. OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, and slicing and dicing. Consolidation involves the aggregation of data that can be accumulated and computed in one or more dimensions. For example, all sales offices are rolled up to the sales department or sales division to anticipate sales trends. By contrast, the drill-down is a technique that allows users to navigate through the details. For instance, users can view the sales by individual products that make up a region's sales. Slicing and dicing is a feature whereby users can take out (slicing) a specific set of data of the OLAP cube and view (dicing) the slices from different viewpoints. These viewpoints are sometimes called dimensions (such as looking at the same sales by salesperson, or by date, or by customer, or by product, or by region, etc.). Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad hoc queries with a rapid execution time. They borrow aspects of navigational databases, hierarchical databases and relational databases. OLAP is typically contrasted to OLTP (online transaction processing), which is generally characterized by much less complex queries, in a larger volume, to process transactions rather than for the purpose of business intelligence or reporting. Whereas OLAP systems are mostly optimized for read, OLTP has to process all kinds of queries (read, insert, update and delete). == Overview of OLAP systems == At the core of any OLAP system is an OLAP cube (also called a 'multidimensional cube' or a hypercube). It consists of numeric facts called measures that are categorized by dimensions. The measures are placed at the intersections of the hypercube, which is spanned by the dimensions as a vector space. The usual interface to manipulate an OLAP cube is a matrix interface, like Pivot tables in a spreadsheet program, which performs projection operations along the dimensions, such as aggregation or averaging. The cube metadata is typically created from a star schema or snowflake schema or fact constellation of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables. Each measure can be thought of as having a set of labels, or meta-data associated with it. A dimension is what describes these labels; it provides information about the measure. A simple example would be a cube that contains a store's sales as a measure, and Date/Time as a dimension. Each Sale has a Date/Time label that describes more about that sale. For example: Sales Fact Table +-------------+----------+ | sale_amount | time_id | +-------------+----------+ Time Dimension | 930.10| 1234 |----+ +---------+-------------------+ +-------------+----------+ | | time_id | timestamp | | +---------+-------------------+ +---->| 1234 | 20080902 12:35:43 | +---------+-------------------+ === Multidimensional databases === Multidimensional structure is defined as "a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data". The structure is broken into cubes and the cubes are able to store and access data within the confines of each cube. "Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions". Even when data is manipulated it remains easy to access and continues to constitute a compact database format. The data still remains interrelated. Multidimensional structure is quite popular for analytical databases that use online analytical processing (OLAP) applications. Analytical databases use these databases because of their ability to deliver answers to complex business queries swiftly. Data can be viewed from different angles, which gives a broader perspective of a problem unlike other models. === Aggregations === It has been claimed that for complex queries OLAP cubes can produce an answer in around 0.1% of the time required for the same query on OLTP relational data. The most important mechanism in OLAP which allows it to achieve such performance is the use of aggregations. Aggregations are built from the fact table by changing the granularity on specific dimensions and aggregating up data along these dimensions, using an aggregate function (or aggregation function). The number of possible aggregations is determined by every possible combination of dimension granularities. The combination of all possible aggregations and the base data contains the answers to every query which can be answered from the data. Because usually there are many aggregations that can be calculated, often only a predetermined number are fully calculated; the remainder are solved on demand. The problem of deciding which aggregations (views) to calculate is known as the view selection problem. View selection can be constrained by the total size of the selected set of aggregations, the time to update them from changes in the base data, or both. The objective of view selection is typically to minimize the average time to answer OLAP queries, although some studies also minimize the update time. View selection is NP-complete. Many approaches to the problem have been explored, including greedy algorithms, randomized search, genetic algorithms and A search algorithm. Some aggregation functions can be computed for the entire OLAP cube by precomputing values for each cell, and then computing the aggregation for a roll-up of cells by aggregating these aggregates, applying a divide and conquer algorithm to the multidimensional problem to compute them efficiently. For example, the overall sum of a roll-up is just the sum of the sub-sums in each cell. Functions that can be decomposed in this way are called decomposable aggregation functions, and include COUNT, MAX, MIN, and SUM, which can be computed for each cell and then directly aggregated; these are known as self-decomposable aggregation functions. In other cases, the aggregate function can be computed by computing auxiliary numbers for cells, aggregating these auxiliary numbers, and finally computing the overall number at the end; examples include AVERAGE (tracking sum and count, dividing at the end) and RANGE (tracking max and min, subtracting at the end). In other cases, the aggregate function cannot be computed without analyzing the entire set at once, though in some cases approximations can be computed; examples include DISTINCT COUNT, MEDIAN, and MODE; for example, the median of a set is not the median of medians of subsets. These latter are difficult to implement efficiently in OLAP, as they require computing the aggregate function on the base data, either computing them online (slow) or precomputing them for possible rollouts (large space). == Types == OLAP systems have been traditionally categorized using the following taxonomy. === Multidimensional OLAP (MOLAP) === MOLAP (multi-dimensional online analytical processing) is the classic form of OLAP and is sometimes referred to as just OLAP. MOLAP stores this data in an optimized multi-dimensional array storage, rather than in a relational database. Some MOLAP tools require the pre-computation and storage of derived data, such as consolidations – the operation known as processing. Such MOLAP tools generally utilize a pre-calculated data set referred to as a data cube. The data cube contains all the possible answers to a given range of questions. As a result, they have a very fast response to queries. On the other hand, updating can take a long time depending on the degree of pre-computation. Pre-computation can also lead to what is known as data explosion. Other MOLAP tools, particularly those that implement the functional database model do not pre-compute derived data but make all calculations on demand other than those that were previously requested and stored in a cache. Advantages of MOLAP Fast query performance due to optimized storage, multidimensional indexing and caching. Smaller on-disk size of data compared to data stored in relational database due to compression techniques. Automated computation of higher-level aggregates of the data. It is very compact for low dimension data se

    Read more →
  • Jump-and-Walk algorithm

    Jump-and-Walk algorithm

    Jump-and-Walk is an algorithm for point location in triangulations (though most of the theoretical analysis were performed in 2D and 3D random Delaunay triangulations). Surprisingly, the algorithm does not need any preprocessing or complex data structures except some simple representation of the triangulation itself. The predecessor of Jump-and-Walk was due to Lawson (1977) and Green and Sibson (1978), which picks a random starting point S and then walks from S toward the query point Q one triangle at a time. But no theoretical analysis was known for these predecessors until after mid-1990s. Jump-and-Walk picks a small group of sample points and starts the walk from the sample point which is the closest to Q until the simplex containing Q is found. The algorithm was a folklore in practice for some time, and the formal presentation of the algorithm and the analysis of its performance on 2D random Delaunay triangulation was done by Devroye, Mucke and Zhu in mid-1990s (the paper appeared in Algorithmica, 1998). The analysis on 3D random Delaunay triangulation was done by Mucke, Saias and Zhu (ACM Symposium of Computational Geometry, 1996). In both cases, a boundary condition was assumed, namely, Q must be slightly away from the boundary of the convex domain where the vertices of the random Delaunay triangulation are drawn. In 2004, Devroye, Lemaire and Moreau showed that in 2D the boundary condition can be withdrawn (the paper appeared in Computational Geometry: Theory and Applications, 2004). Jump-and-Walk has been used in many famous software packages, e.g., QHULL, Triangle and CGAL.

    Read more →
  • Learning rate

    Learning rate

    In machine learning and statistics, the learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function. Since it influences to what extent newly acquired information overrides old information, it metaphorically represents the speed at which a machine learning model "learns". In the adaptive control literature, the learning rate is commonly referred to as gain. In setting a learning rate, there is a trade-off between the rate of convergence and overshooting. While the descent direction is usually determined from the gradient of the loss function, the learning rate determines how big a step is taken in that direction. Too high a learning rate will make the learning jump over minima, but too low a learning rate will either take too long to converge or get stuck in an undesirable local minimum. In order to achieve faster convergence, prevent oscillations and getting stuck in undesirable local minima the learning rate is often varied during training either in accordance to a learning rate schedule or by using an adaptive learning rate. The learning rate and its adjustments may also differ per parameter, in which case it is a diagonal matrix that can be interpreted as an approximation to the inverse of the Hessian matrix in Newton's method. The learning rate is related to the step length determined by inexact line search in quasi-Newton methods and related optimization algorithms. == Learning rate schedule == Initial rate can be left as system default or can be selected using a range of techniques. A learning rate schedule changes the learning rate during learning and is most often changed between epochs/iterations. This is mainly done with two parameters: decay and momentum. There are many different learning rate schedules but the most common are time-based, step-based and exponential. Decay serves to settle the learning in a nice place and avoid oscillations, a situation that may arise when too high a constant learning rate makes the learning jump back and forth over a minimum, and is controlled by a hyperparameter. Momentum is analogous to a ball rolling down a hill; we want the ball to settle at the lowest point of the hill (corresponding to the lowest error). Momentum both speeds up the learning (increasing the learning rate) when the error cost gradient is heading in the same direction for a long time and also avoids local minima by 'rolling over' small bumps. Momentum is controlled by a hyperparameter analogous to a ball's mass which must be chosen manually—too high and the ball will roll over minima which we wish to find, too low and it will not fulfil its purpose. The formula for factoring in the momentum is more complex than for decay but is most often built in with deep learning libraries such as Keras. Time-based learning schedules alter the learning rate depending on the learning rate of the previous time iteration. Factoring in the decay the mathematical formula for the learning rate is: η n + 1 = η 0 1 + d n {\displaystyle \eta _{n+1}={\frac {\eta _{0}}{1+dn}}} where η {\displaystyle \eta } is the learning rate, η 0 {\displaystyle \eta _{0}} is the original learning rate, d {\displaystyle d} is a decay parameter and n {\displaystyle n} is the iteration step. Step-based learning schedules changes the learning rate according to some predefined steps. The decay application formula is here defined as: η n = η 0 d ⌊ 1 + n r ⌋ {\displaystyle \eta _{n}=\eta _{0}d^{\left\lfloor {\frac {1+n}{r}}\right\rfloor }} where η n {\displaystyle \eta _{n}} is the learning rate at iteration n {\displaystyle n} , η 0 {\displaystyle \eta _{0}} is the initial learning rate, d {\displaystyle d} is how much the learning rate should change at each drop (0.5 corresponds to a halving) and r {\displaystyle r} corresponds to the drop rate, or how often the rate should be dropped (10 corresponds to a drop every 10 iterations). The floor function ( ⌊ … ⌋ {\displaystyle \lfloor \dots \rfloor } ) here drops the value of its input to 0 for all values smaller than 1. Exponential learning schedules are similar to step-based, but instead of steps, a decreasing exponential function is used. The mathematical formula for factoring in the decay is: η n = η 0 e − d n {\displaystyle \eta _{n}=\eta _{0}e^{-dn}} where d {\displaystyle d} is a decay parameter. == Adaptive learning rate == The issue with learning rate schedules is that they all depend on hyperparameters that must be manually chosen for each given learning session and may vary greatly depending on the problem at hand or the model used. To combat this, there are many different types of adaptive gradient descent algorithms such as Adagrad, Adadelta, RMSprop, and Adam which are generally built into deep learning libraries such as Keras.

    Read more →
  • Enterprise architecture

    Enterprise architecture

    Enterprise architecture (EA) is a business function concerned with the structures and behaviours of a business, especially business roles and processes that create and use business data. The international definition according to the Federation of Enterprise Architecture Professional Organizations is "a well-defined practice for conducting enterprise analysis, design, planning, and implementation, using a comprehensive approach at all times, for the successful development and execution of strategy. Enterprise architecture applies architecture principles and practices to guide organizations through the business, information, process, and technology changes necessary to execute their strategies. These practices utilize the various aspects of an enterprise to identify, motivate, and achieve these changes." The United States Federal Government is an example of an organization that practices EA, in this case with its Capital Planning and Investment Control processes. Companies such as Independence Blue Cross, Intel, Volkswagen AG, and InterContinental Hotels Group also use EA to improve their business architectures as well as to improve business performance and productivity. Additionally, the Federal Enterprise Architecture's reference guide aids federal agencies in the development of their architectures. == Introduction == As a discipline, EA "proactively and holistically lead[s] enterprise responses to disruptive forces by identifying and analyzing the execution of change" towards organizational goals. EA gives business and IT leaders recommendations for policy adjustments and provides best strategies to support and enable business development and change within the information systems the business depends on. EA provides a guide for decision making towards these objectives. The National Computing Centre's EA best practice guidance states that an EA typically "takes the form of a comprehensive set of cohesive models that describe the structure and functions of an enterprise. The individual models in an EA are arranged in a logical manner that provides an ever-increasing level of detail about the enterprise." Important players within EA include enterprise architects and solutions architects. Enterprise architects are at the top level of the architect hierarchy, meaning they have more responsibilities than solutions architects. While solutions architects focus on their own relevant solutions, enterprise architects focus on solutions for and the impact on the whole organization. Enterprise architects oversee many solution architects and business functions. As practitioners of EA, enterprise architects support an organization's strategic vision by acting to align people, process, and technology decisions with actionable goals and objectives that result in quantifiable improvements toward achieving that vision. The practice of EA "analyzes areas of common activity within or between organizations, where information and other resources are exchanged to guide future states from an integrated viewpoint of strategy, business, and technology." === Definitions === The term enterprise can be defined as an organizational unit, organization, or collection of organizations that share a set of common goals and collaborate to provide specific products or services to customers. In that sense, the term enterprise covers various types of organizations, regardless of their size, ownership model, operational model, or geographical distribution. It includes those organizations' complete sociotechnical system, including people, information, processes, and technologies. Enterprise as a sociotechnical system defines the scope of EA. The term architecture refers to fundamental concepts or properties of a system in its environment; and embodied in its elements, relationships, and in the principles of its design and evolution. A methodology for developing and using architecture to guide the transformation of a business from a baseline state to a target state, sometimes through several transition states, is usually known as an enterprise architecture framework. A framework provides a structured collection of processes, techniques, artifact descriptions, reference models, and guidance for the production and use of an enterprise-specific architecture description. Open-source tools supporting EA practice, such as the Essential Project, have also been evaluated for suitability in academic and commercial training contexts. Paramount to changing the EA is the identification of a sponsor. Their mission, vision, strategy, and the governance framework define all roles, responsibilities, and relationships involved in the anticipated transformation. Changes considered by enterprise architects typically include innovations in the structure or processes of an organization; innovations in the use of information systems or technologies; the integration and/or standardization of business processes; and improvement of the quality and timeliness of business information. According to the standard ISO/IEC/IEEE 42010, the product used to describe the architecture of a system is called an architectural description. In practice, an architectural description contains a variety of lists, tables, and diagrams. These are models known as views. In the case of EA, these models describe the logical business functions or capabilities, business processes, human roles and actors, the physical organization structure, data flows and data stores, business applications and platform applications, hardware, and communications infrastructure. The first use of the term "enterprise architecture" is often incorrectly attributed to John Zachman's 1987 A framework for information systems architecture. The first publication to use it was instead a National Institute of Standards (NIST) Special Publication on the challenges of information system integration. The NIST article describes EA as consisting of several levels. Business unit architecture is the top level and might be a total corporate entity or a sub-unit. It establishes for the whole organization necessary frameworks for "satisfying both internal information needs" as well as the needs of external entities, which include cooperating organizations, customers, and federal agencies. The lower levels of the EA that provide information to higher levels are more attentive to detail on behalf of their superiors. In addition to this structure, business unit architecture establishes standards, policies, and procedures that either enhance or stymie the organization's mission. The main difference between these two definitions is that Zachman's concept was the creation of individual information systems optimized for business, while NIST's described the management of all information systems within a business unit. The definitions in both publications, however, agreed that due to the "increasing size and complexity of the [i]mplementations of [i]nformation systems... logical construct[s] (or architecture) for defining and controlling the interfaces and... [i]ntegration of all the components of a system" is necessary. Zachman in particular urged for a "strategic planning methodology." == Overview == === Schools of thought === Within the field of enterprise architecture, there are three overarching schools: Enterprise IT Design, Enterprise Integrating, and Enterprise Ecosystem Adaption. Which school one subscribes to will impact how they see the EA's purpose and scope, as well as the means of achieving it, the skills needed to conduct it, and the locus of responsibility for conducting it. Under Enterprise IT Design, the main purpose of EA is to guide the process of planning and designing an enterprise's IT/IS capabilities to meet the desired organizational objectives, often by greater alignment between IT/IS and business concerns. Architecture proposals and decisions are limited to the IT/IS aspects of the enterprise and other aspects service only as inputs. The Enterprise Integrating school believes that the purpose of EA is to create a greater coherency between the various concerns of an enterprise (HR, IT, Operations, etc.), including the link between strategy formulation and execution. Architecture proposals and decisions here encompass all aspects of the enterprise. The Enterprise Ecosystem Adaption school states that the purpose of EA is to foster and maintain the learning capabilities of enterprises so they may be sustainable. Consequently, a great deal of emphasis is put on improving the capabilities of the enterprise to improve itself, to innovate, and to coevolve with its environment. Typically, proposals and decisions encompass both the enterprise and its environment. === Benefits, challenges, and criticisms === The benefits of EA are achieved through its direct and indirect contributions to organizational goals. Notable benefits include support in the areas related to design and re-design of the organizational structures during mergers, acquisitions, or

    Read more →
  • StoredIQ

    StoredIQ

    StoredIQ was a company founded for information lifecycle management (ILM) of unstructured data. Founded in 2001 as Deepfile in Austin, Texas by Jeff Erramouspe, Jeff Bone, Russell Turpin, Rudy Rouhana, Laura Arbilla and Brett Funderburg, the company changed its name in 2005 to StoredIQ. It continued to operate successfully for over a decade until it was acquired in 2012 by IBM. It now serves as a platform for IBM's information life cycle governance, big data governance and enterprise content management technologies. StoredIQ was awarded five patents by the USPTO. The first, originally filed in 2003, enabled unstructured data in file systems to be manipulated in a similar way to information stored in databases. Subsequent patents built upon the patented actionable file system with further enhancements specific to Enterprise Policy Management and expanding the reach of StoredIQ's management capability all the way to individual desktops. In 2008 StoredIQ was recognized as "Best in Compliance" by Network Products Guide. At the same time, StoredIQ was being recognized as a "Top 5 Provider" by the prestigious Socha-Gelbmann eDiscovery survey. There were takeover negotiations with EMC Corporation, initially a strategic investor in StoredIQ, however, the company rejected the approach, leaving EMC to acquire a competitor. The company published a whitepaper titled The Truth About Big Data. This promotion combined with StoredIQ's patented technology led to IBM selecting StoredIQ as the basis for some products.

    Read more →
  • Subject indexing

    Subject indexing

    Subject indexing is the act of describing or classifying a document by index terms, keywords, or other symbols in order to indicate what different documents are about, to summarize their contents or to increase findability. In other words, the objective is to identify and describe the subject of documents. Indexes are constructed, separately, on three distinct levels: terms in a document, such as a book; objects in a collection, such as a library; and documents (such as books and articles) within a field of knowledge. Subject indexing is used in information retrieval especially to create bibliographic indexes to retrieve documents on a particular subject. Examples of academic indexing services are Zentralblatt MATH, Chemical Abstracts, and PubMed. The index terms were mostly assigned by experts but author keywords are also common. The process of indexing begins with any analysis of the subject of the document. The indexer must then identify terms that appropriately identify the subject, either by extracting words directly from the document or assigning words from a controlled vocabulary. The terms in the index are then presented in a systematic order. Indexers must decide how many terms to include and how specific the terms should be. Together this gives a depth of indexing. == Subject analysis == The first step in indexing is to decide on the subject matter of the document. In manual indexing, the indexer would consider the subject matter in terms of answer to a set of questions such as "Does the document deal with a specific product, condition or phenomenon?". As the analysis is influenced by the knowledge and experience of the indexer, it follows that two indexers may analyze the content differently and so come up with different index terms. This will impact on the success of retrieval. === Automatic vs. manual subject analysis === Automatic indexing follows set processes of analyzing frequencies of word patterns and comparing results to other documents in order to assign to subject categories. This requires no understanding of the material being indexed. This leads to more uniform indexing but at the expense of the true meaning being interpreted. A computer program will not understand the meaning of statements and may therefore fail to assign some relevant terms or assign incorrectly. Human indexers focus their attention on certain parts of the document such as the title, abstract, summary and conclusions, as analyzing the full text in depth is costly and time-consuming. An automated system takes away the time limit and allows the entire document to be analyzed, but also has the option to be directed to particular parts of the document. == Term selection == The second stage of indexing involves the translation of the subject analysis into a set of index terms. This can involve extracting from the document or assigning from a controlled vocabulary. With the ability to conduct a full text search widely available, many people have come to rely on their own expertise in conducting information searches and full text search has become very popular. Subject indexing and its experts, professional indexers, catalogers, and librarians, remains crucial to information organization and retrieval. These experts understand controlled vocabularies and are able to find information that cannot be located by full text search. The cost of expert analysis to create subject indexing is not easily compared to the cost of hardware, software and labor to manufacture a comparable set of full-text, fully searchable materials. With new web applications that allow every user to annotate documents, social tagging has gained popularity especially in the Web. One application of indexing, the book index, remains relatively unchanged despite the information revolution. === Extraction/Derived indexing === Extraction indexing involves taking words directly from the document. It uses natural language and lends itself well to automated techniques where word frequencies are calculated and those with a frequency over a pre-determined threshold are used as index terms. A stop-list containing common words (such as "the", "and") would be referred to and such stop words would be excluded as index terms. Automated extraction indexing may lead to loss of meaning of terms by indexing single words as opposed to phrases. Although it is possible to extract commonly occurring phrases, it becomes more difficult if key concepts are inconsistently worded in phrases. Automated extraction indexing also has the problem that, even with use of a stop-list to remove common words, some frequent words may not be useful for allowing discrimination between documents. For example, the term glucose is likely to occur frequently in any document related to diabetes. Therefore, use of this term would likely return most or all the documents in the database. Post-coordinated indexing where terms are combined at the time of searching would reduce this effect but the onus would be on the searcher to link appropriate terms as opposed to the information professional. In addition terms that occur infrequently may be highly significant for example a new drug may be mentioned infrequently but the novelty of the subject makes any reference significant. One method for allowing rarer terms to be included and common words to be excluded by automated techniques would be a relative frequency approach where frequency of a word in a document is compared to frequency in the database as a whole. Therefore, a term that occurs more often in a document than might be expected based on the rest of the database could then be used as an index term, and terms that occur equally frequently throughout will be excluded. Another problem with automated extraction is that it does not recognize when a concept is discussed but is not identified in the text by an indexable keyword. Since this process is based on simple string matching and involves no intellectual analysis, the resulting product is more appropriately known as a concordance than an index. === Assignment indexing === An alternative is assignment indexing where index terms are taken from a controlled vocabulary. This has the advantage of controlling for synonyms as the preferred term is indexed and synonyms or related terms direct the user to the preferred term. This means the user can find articles regardless of the specific term used by the author and saves the user from having to know and check all possible synonyms. It also removes any confusion caused by homographs by inclusion of a qualifying term. A third advantage is that it allows the linking of related terms whether they are linked by hierarchy or association, e.g. an index entry for an oral medication may list other oral medications as related terms on the same level of the hierarchy but would also link to broader terms such as treatment. Assignment indexing is used in manual indexing to improve inter-indexer consistency as different indexers will have a controlled set of terms to choose from. Controlled vocabularies do not completely remove inconsistencies as two indexers may still interpret the subject differently. == Index presentation == The final phase of indexing is to present the entries in a systematic order. This may involve linking entries. In a pre-coordinated index the indexer determines the order in which terms are linked in an entry by considering how a user may formulate their search. In a post-coordinated index, the entries are presented singly and the user can link the entries through searches, most commonly carried out by computer software. Post-coordination results in a loss of precision in comparison to pre-coordination. == Depth of indexing == Indexers must make decisions about what entries should be included and how many entries an index should incorporate. The depth of indexing describes the thoroughness of the indexing process with reference to exhaustivity and specificity. === Exhaustivity === An exhaustive index is one which lists all possible index terms. Greater exhaustivity gives a higher recall, or more likelihood of all the relevant articles being retrieved, however, this occurs at the expense of precision. This means that the user may retrieve a larger number of irrelevant documents or documents which only deal with the subject in little depth. In a manual system a greater level of exhaustivity brings with it a greater cost as more man-hours are required. The additional time taken in an automated system would be much less significant. At the other end of the scale, in a selective index only the most important aspects are covered. Recall is reduced in a selective index as if an indexer does not include enough terms, a highly relevant article may be overlooked. Therefore, indexers should strive for a balance and consider what the document may be used. They may also have to consider the implications of time and expense. === Specificity === The specificity describes how closel

    Read more →
  • Right to explanation

    Right to explanation

    In the regulation of algorithms, particularly artificial intelligence and its subfield of machine learning, a right to [an] explanation is a right to be given an explanation for an output of the algorithm. Such rights primarily refer to individual rights to be given an explanation for decisions that significantly affect an individual, particularly legally or financially. For example, a person who applies for a loan and is denied may ask for an explanation, which could be "Credit bureau X reports that you declared bankruptcy last year; this is the main factor in considering you too likely to default, and thus we will not give you the loan you applied for." Some such legal rights already exist, while the scope of a general "right to explanation" is a matter of ongoing debate. There have been arguments made that a "social right to explanation" is a crucial foundation for an information society, particularly as the institutions of that society will need to use digital technologies, artificial intelligence, machine learning. In other words, that the related automated decision making systems that use explainability would be more trustworthy and transparent. Without this right, which could be constituted both legally and through professional standards, the public will be left without much recourse to challenge the decisions of automated systems. == Examples == === Credit scoring in the United States === Under the Equal Credit Opportunity Act (Regulation B of the Code of Federal Regulations), Title 12, Chapter X, Part 1002, §1002.9, creditors are required to notify applicants who are denied credit with specific reasons for the detail. As detailed in §1002.9(b)(2): (2) Statement of specific reasons. The statement of reasons for adverse action required by paragraph (a)(2)(i) of this section must be specific and indicate the principal reason(s) for the adverse action. Statements that the adverse action was based on the creditor's internal standards or policies or that the applicant, joint applicant, or similar party failed to achieve a qualifying score on the creditor's credit scoring system are insufficient. The official interpretation of this section details what types of statements are acceptable. Creditors comply with this regulation by providing a list of reasons (generally at most 4, per interpretation of regulations), consisting of a numeric reason code (as identifier) and an associated explanation, identifying the main factors affecting a credit score. An example might be: 32: Balances on bankcard or revolving accounts too high compared to credit limits === European Union === The European Union General Data Protection Regulation (GDPR, enacted 2016, taking effect 2018) extends the automated decision-making rights in the 1995 Data Protection Directive to provide a legally disputed form of a right to an explanation, stated as such in Recital 71: "[the data subject should have] the right ... to obtain an explanation of the decision reached". In full: The data subject should have the right not to be subject to a decision, which may include a measure, evaluating personal aspects relating to him or her which is based solely on automated processing and which produces legal effects concerning him or her or similarly significantly affects him or her, such as automatic refusal of an online credit application or e-recruiting practices without any human intervention. ... In any case, such processing should be subject to suitable safeguards, which should include specific information to the data subject and the right to obtain human intervention, to express his or her point of view, to obtain an explanation of the decision reached after such assessment and to challenge the decision. However, the extent to which the regulations themselves provide a "right to explanation" is heavily debated. There are two main strands of criticism. There are significant legal issues with the right as found in Article 22 — as recitals are not binding, and the right to an explanation is not mentioned in the binding articles of the text, having been removed during the legislative process. In addition, there are significant restrictions on the types of automated decisions that are covered — which must be both "solely" based on automated processing, and have legal or similarly significant effects — which significantly limits the range of automated systems and decisions to which the right would apply. In particular, the right is unlikely to apply in many of the cases of algorithmic controversy that have been picked up in the media. The UK has also recently amended its implementation of Article 22. A second potential source of such a right has been pointed to in Article 15, the "right of access by the data subject". This restates a similar provision from the 1995 Data Protection Directive, allowing the data subject access to "meaningful information about the logic involved" in the same significant, solely automated decision-making, found in Article 22. Yet this too suffers from alleged challenges that relate to the timing of when this right can be drawn upon, as well as practical challenges that mean it may not be binding in many cases of public concern. Other EU legislative instruments contain explanation rights. The European Union's Artificial Intelligence Act provides in Article 86 a "[r]ight to explanation of individual decision-making" of certain high risk systems which produce significant, adverse effects to an individual's health, safety or fundamental rights. The right provides for "clear and meaningful explanations of the role of the AI system in the decision-making procedure and the main elements of the decision taken", although only applies to the extent other law does not provide such a right. The Digital Services Act in Article 27, and the Platform to Business Regulation in Article 5, both contain rights to have the main parameters of certain recommender systems to be made clear, although these provisions have been criticised as not matching the way that such systems work. The Platform Work Directive, which provides for regulation of automation in gig economy work as an extension of data protection law, further contains explanation provisions in Article 11, using the specific language of "explanation" in a binding article rather than a recital as is the case in the GDPR. Scholars note that remains uncertainty as to whether these provisions imply sufficiently tailored explanation in practice which will need to be resolved by courts. === France === In France the 2016 Loi pour une République numérique (Digital Republic Act or loi numérique) amends the country's administrative code to introduce a new provision for the explanation of decisions made by public sector bodies about individuals. It notes that where there is "a decision taken on the basis of an algorithmic treatment", the rules that define that treatment and its "principal characteristics" must be communicated to the citizen upon request, where there is not an exclusion (e.g. for national security or defence). These should include the following: the degree and the mode of contribution of the algorithmic processing to the decision- making; the data processed and its source; the treatment parameters, and where appropriate, their weighting, applied to the situation of the person concerned; the operations carried out by the treatment. Scholars have noted that this right, while limited to administrative decisions, goes beyond the GDPR right to explicitly apply to decision support rather than decisions "solely" based on automated processing, as well as provides a framework for explaining specific decisions. Indeed, the GDPR automated decision-making rights in the European Union, one of the places a "right to an explanation" has been sought within, find their origins in French law in the late 1970s. == Criticism == Some argue that a "right to explanation" is at best unnecessary, at worst harmful, and threatens to stifle innovation. Specific criticisms include: favoring human decisions over machine decisions, being redundant with existing laws, and focusing on process over outcome. Authors of study "Slave to the Algorithm? Why a 'Right to an Explanation' Is Probably Not the Remedy You Are Looking For" Lilian Edwards and Michael Veale argue that a right to explanation is not the solution to harms caused to stakeholders by algorithmic decisions. They also state that the right of explanation in the GDPR is narrowly defined, and is not compatible with how modern machine learning technologies are being developed. With these limitations, defining transparency within the context of algorithmic accountability remains a problem. For example, providing the source code of algorithms may not be sufficient and may create other problems in terms of privacy disclosures and the gaming of technical systems. To mitigate this issue, Edwards and Veale argue that an auditing system could be more effective, to allow auditors to loo

    Read more →
  • Interviewer effect

    Interviewer effect

    The interviewer effect (also called interviewer variance or interviewer error) is the distortion of response to an interviewer-administered data collection effort which results from differential reactions to the social style and personality of interviewers or to their presentation of particular questions. The use of fixed-wording questions is one method of reducing interviewer bias. Anthropological research and case-studies are also affected by the problem, which is exacerbated by the self-fulfilling prophecy, when the researcher is also the interviewer it is also any effect on data gathered from interviewing people that is caused by the behavior or characteristics (real or perceived) of the interviewer. Interviewer effects can also be associated with the characteristics of the interviewer, such as race. Whether black respondents are interviewed by white interviewers or black interviewers has a strong impact on their responses to both attitude questions and behavioral ones. In the latter case, for example, if black respondents are interviewed by black interviewers in pre-election surveys, they are more likely to actually vote in the upcoming election than if they are interviewed by white interviewers. Furthermore, the race of the interviewer can also affect answers to factual questions that might take the form of a test of how informed the respondent is. Black respondents in a survey of political knowledge, for example, get fewer correct answers to factual questions about politics when interviewed by white interviewers than when interviewed by black interviewers. This is consistent with the research literature on stereotype threat, which finds diminished test performance of potentially stigmatised groups when the interviewer or test supervisor is from a perceived higher status group. Interviewer effects can be mitigated somewhat by randomly assigning subjects to different interviewers, or by using tools such as computer-assisted telephone interviewing (CATI).

    Read more →
  • Algorithm

    Algorithm

    In mathematics and computer science, an algorithm ( ) is a finite sequence of mathematically rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing calculations and data processing. More advanced algorithms can use conditionals to divert the code execution through various routes (referred to as automated decision-making) and deduce valid inferences (referred to as automated reasoning). In contrast, a heuristic is an approach to solving problems without well-defined correct or optimal results. For example, although social media recommender systems are commonly called "algorithms", they actually rely on heuristics as there is no truly "correct" recommendation. As an effective method, an algorithm can be expressed within a finite amount of space and time and in a well-defined formal language for calculating a function. Starting from an initial state and input, a computation occurs at each step, eventually producing output and terminating. The transition between states can be non-deterministic; randomized algorithms incorporate random input. == Etymology == Around 825 AD, Persian scientist and polymath Muḥammad ibn Mūsā al-Khwārizmī wrote kitāb al-ḥisāb al-hindī ("Book of Indian computation") and kitab al-jam' wa'l-tafriq al-ḥisāb al-hindī ("Addition and subtraction in Indian arithmetic"). In the early 12th century, Latin translations of these texts involving the Hindu–Arabic numeral system and arithmetic appeared, for example Liber Alghoarismi de practica arismetrice, attributed to John of Seville, and Liber Algoritmi de numero Indorum, attributed to Adelard of Bath. Here, alghoarismi or algoritmi is the Latinization of Al-Khwarizmi's name; the text starts with the phrase Dixit Algoritmi, or "Thus spoke Al-Khwarizmi". The word algorism in English came to mean the use of place-value notation in calculations; it occurs in the Ancrene Wisse from circa 1225. By the time Geoffrey Chaucer wrote The Canterbury Tales in the late 14th century, he used a variant of the same word in describing augrym stones, stones used for place-value calculation. In the 15th century, under the influence of the Greek word ἀριθμός (arithmos, "number"; cf. "arithmetic"), the Latin word was altered to algorithmus. By 1596, this form of the word was used in English, as algorithm, by Thomas Hood. == Definition == One informal definition is "a set of rules that precisely defines a sequence of operations", which would include all computer programs, and any bureaucratic procedure or cook-book recipe. In general, a program is an algorithm only if it stops eventually. Formally, algorithm is an explicit set of instructions to produce an output, that can be followed by a computer or a human performing specific operations on symbols.. == History == === Ancient algorithms === Step-by-step procedures for solving mathematical problems have been recorded since antiquity. This includes in Babylonian mathematics (around 2500 BC), Egyptian mathematics (around 1550 BC), Indian mathematics (around 800 BC and later), the Ifa Oracle (around 500 BC), Greek mathematics (around 240 BC), Chinese mathematics (around 200 BC and later), and Arabic mathematics (around 800 AD). The earliest evidence of algorithms is found in ancient Mesopotamian mathematics. A Sumerian clay tablet found in Shuruppak near Baghdad and dated to c. 2500 BC describes the earliest division algorithm. During the Hammurabi dynasty c. 1800 – c. 1600 BC, Babylonian clay tablets described algorithms for computing formulas. Algorithms were also used in Babylonian astronomy. Babylonian clay tablets describe and employ algorithmic procedures to compute the time and place of significant astronomical events. Algorithms for arithmetic are also found in ancient Egyptian mathematics, dating back to the Rhind Mathematical Papyrus c. 1550 BC. Algorithms were later used in ancient Hellenistic mathematics. Two examples are the Sieve of Eratosthenes, which was described in the Introduction to Arithmetic by Nicomachus, and the Euclidean algorithm, which was first described in Euclid's Elements (c. 300 BC).Examples of ancient Indian mathematics included the Shulba Sutras, the Kerala School, and the Brāhmasphuṭasiddhānta. In the 9th century, Muḥammad ibn Mūsā al-Khwārizmī revolutionized the field by establishing the algorithm as a systematic, finite sequence of logical steps to solve mathematical problems. In his influential work, The Compendious Book on Calculation by Completion and Balancing, he moved beyond specific numerical solutions to introduce general procedures for algebraic reduction and balancing. This transformed mathematics into a 'mechanical' process of well-defined rules—a fundamental shift that laid the groundwork for modern algorithmic theory. The Latin translation of his arithmetic treatise, titled Algoritmi de numero Indorum, led to the term algorithm being derived from the Latinization of his name, Algoritmi, specifically to describe this new rule-based approach to mathematics. The first cryptographic algorithm for deciphering encrypted code was developed by Al-Kindi, a 9th-century Arab mathematician, in A Manuscript On Deciphering Cryptographic Messages. He gave the first description of cryptanalysis by frequency analysis, the earliest codebreaking algorithm. === Computers === ==== Weight-driven clocks ==== Weight-driven clocks were a key European invention in Middle Ages, specifically the verge escapement mechanism producing the tick of mechanical clocks. Accurate automatic machines led to mechanical automata in the 13th century and computational machines—the difference and analytical engines of Charles Babbage and Ada Lovelace in the mid-19th century. Lovelace designed the first algorithm intended for a computer, Babbage's analytical engine, the first real Turing-complete computer, more than the mechanical calculators of the time. Although the full implementation of Babbage's second device was only built decades after her lifetime, Lovelace has been called "history's first programmer". ==== Electromechanical relay ==== The Jacquard loom, a precursor to punch cards, and telephone switching machines led to the development of the first computers. By the mid-19th century, the telegraph, was in use throughout the world. By the late 19th century, ticker tape (c. 1870s) and punch cards (c. 1890) were developed. Then came the teleprinter (c. 1910) with its punched-paper use of Baudot code on tape. Telephone-switching networks of electromechanical relays were invented in 1835. These led to the invention of the digital adding device by George Stibitz in 1937. While working in Bell Laboratories, he observed the "burdensome" use of mechanical calculators with gears, prompting him to experiment create an experimental digital adder at home. === Formalization === In 1928, a partial formalization of the modern concept of algorithms began with attempts to solve David Hilbert's Entscheidungsproblem (decision problem). Later formalizations were framed as attempts to define "effective calculability" or "effective method". Those formalizations included the Gödel–Herbrand–Kleene recursive functions of 1930, 1934 and 1935, Alonzo Church's lambda calculus of 1936, Emil Post's Formulation 1 of 1936, and Alan Turing's Turing machines of 1936–37 and 1939. === Modern Algorithms === For decades, it was assumed that algorithm evolution progresses from heuristics to formal algorithms. A Symbolic integration provides a classic illustration. In 1961, James Slagle’s program SAINT used heuristics to solve 52 of 54 freshman calculus exercises from an MIT textbook (≈96%). In 1967, Larry Moses’s SIN refined the heuristics and achieved 100% success, though it remained heuristic. Finally, in 1969, Robert Risch introduced the Risch Algorithm with formal guarantees. This trajectory defined the traditional path: heuristics evolving until a definitive, guaranteed algorithm emerged. However, the rise of transformer-based AI has inverted this sequence — classical algorithms are now being displaced by heuristics once again. Algorithms have evolved and improved in many ways as time goes on. Common uses of algorithms today include social media apps like Instagram and YouTube. Algorithms are used as a way to analyze what people like and push more of those things to the people who interact with them. Quantum computing uses quantum algorithm procedures to solve problems faster. More recently, in 2024, NIST updated their post-quantum encryption standards, which includes new encryption algorithms to enhance defenses against attacks using quantum computing. == Representations == Algorithms can be expressed in many kinds of notation, including natural languages, pseudocode, flowcharts, drakon-charts, programming languages or control tables. Natural language expressions of algorithms tend to be verbose and ambiguous and are rarely used for complex or technical algor

    Read more →
  • KiSAO

    KiSAO

    The Kinetic Simulation Algorithm Ontology (KiSAO) supplies information about existing algorithms available for the simulation of systems biology models, their characterization and interrelationships. KiSAO is part of the BioModels.net project and of the COMBINE initiative. == Structure == KiSAO consists of three main branches: simulation algorithm simulation algorithm characteristic simulation algorithm parameter The elements of each algorithm branch are linked to characteristic and parameter branches using has characteristic and has parameter relationships accordingly. The algorithm branch itself is hierarchically structured using relationships which denote that the descendant algorithms were derived from, or specify, more general ancestors.

    Read more →
  • Jive (software)

    Jive (software)

    Jive (formerly known as Clearspace, then Jive SBS, then Jive Engage) is a commercial Java EE-based Enterprise 2.0 collaboration and knowledge management tool produced by Jive Software. It was first released as "Clearspace" in 2006, then renamed SBS (for "Social Business Software") in March 2009, then renamed "Jive Engage" in 2011, and renamed simply to "Jive" in 2012. Jive integrates the functionality of online communities, microblogging, social networking, discussion forums, blogs, wikis, and IM under one unified user interface. Content placed into any of the systems (blog, wiki, documentation, etc.) can be found through a common search interface. Other features include RSS capability, email integration, a reputation and reward system for participation, personal user profiles, JAX-WS web service interoperability, and integration with the Spring Framework. The product is a pure-Java server-side web application and will run on any platform where Java (JDK 1.5 or higher) is installed. It does not require a dedicated server - users have reported successful deployment in both shared environments and multiple machine clusters. As of Jive 8, released March 30, 2015, there is a Jive-n version which is for internal use (hosted by the consumer or hosted by Jive as a service) and a Jive-x version which is an external version hosted as a service. Jive no longer supports wiki markup language. == Server requirements for Jive 8-n == The following are the server requirements for Jive 8-n Operating systems: RHEL version 6 or 7 for x86_64, CentOS version 6 or 7 for x86_64 or SuSE Enterprise Linux Server (SLES) 11 and 12 for x86_64 Application Servers: Jive ships with its own embedded Apache HTTPD and Tomcat servers as part of the install package. It is not possible to deploy the application onto other appservers. Databases: MySQL (5.1, 5.5, 5.6) Oracle (11gR2, 12c) Postgres (9.0, 9.1, 9.2, 9.3, 9.4 - 9.2 or higher recommended) Microsoft SQL Server (2008R2, 2012, 2014) Environment: Jive recommends a server with at least 4GB of RAM and a dual-core 2 GHz processor with x86_64 architecture The product integrates with an LDAP repository or Active Directory For optimal deployment with a large community Jive Software recommends: using dedicated cache and document-conversion servers hosting the application and database servers separately == Releases == Jive 8, released on March 30, 2015 Jive 7, released in October 2013 Jive 9.0.x, released in November 2016 Jive 9, released in November 2016, supported now

    Read more →
  • Interviewer effect

    Interviewer effect

    The interviewer effect (also called interviewer variance or interviewer error) is the distortion of response to an interviewer-administered data collection effort which results from differential reactions to the social style and personality of interviewers or to their presentation of particular questions. The use of fixed-wording questions is one method of reducing interviewer bias. Anthropological research and case-studies are also affected by the problem, which is exacerbated by the self-fulfilling prophecy, when the researcher is also the interviewer it is also any effect on data gathered from interviewing people that is caused by the behavior or characteristics (real or perceived) of the interviewer. Interviewer effects can also be associated with the characteristics of the interviewer, such as race. Whether black respondents are interviewed by white interviewers or black interviewers has a strong impact on their responses to both attitude questions and behavioral ones. In the latter case, for example, if black respondents are interviewed by black interviewers in pre-election surveys, they are more likely to actually vote in the upcoming election than if they are interviewed by white interviewers. Furthermore, the race of the interviewer can also affect answers to factual questions that might take the form of a test of how informed the respondent is. Black respondents in a survey of political knowledge, for example, get fewer correct answers to factual questions about politics when interviewed by white interviewers than when interviewed by black interviewers. This is consistent with the research literature on stereotype threat, which finds diminished test performance of potentially stigmatised groups when the interviewer or test supervisor is from a perceived higher status group. Interviewer effects can be mitigated somewhat by randomly assigning subjects to different interviewers, or by using tools such as computer-assisted telephone interviewing (CATI).

    Read more →
  • Semantic integration

    Semantic integration

    Semantic integration is the process of interrelating information from diverse sources, for example calendars and to do lists, email archives, presence information (physical, psychological, and social), documents of all sorts, contacts (including social graphs), search results, and advertising and marketing relevance derived from them. In this regard, semantics focuses on the organization of and action upon information by acting as an intermediary between heterogeneous data sources, which may conflict not only by structure but also context or value. == Applications and methods == In enterprise application integration (EAI), semantic integration can facilitate or even automate the communication between computer systems using metadata publishing. Metadata publishing potentially offers the ability to automatically link ontologies. One approach to (semi-)automated ontology mapping requires the definition of a semantic distance or its inverse, semantic similarity and appropriate rules. Other approaches include so-called lexical methods, as well as methodologies that rely on exploiting the structures of the ontologies. For explicitly stating similarity/equality, there exist special properties or relationships in most ontology languages. OWL, for example has "owl:equivalentClass", "owl:equivalentProperty" and "owl:sameAs". Eventually system designs may see the advent of composable architectures where published semantic-based interfaces are joined together to enable new and meaningful capabilities. These could predominately be described by means of design-time declarative specifications, that could ultimately be rendered and executed at run-time. Semantic integration can also be used to facilitate design-time activities of interface design and mapping. In this model, semantics are only explicitly applied to design and the run-time systems work at the syntax level. This "early semantic binding" approach can improve overall system performance while retaining the benefits of semantic driven design. == Semantic integration situations == From the industry use case, it has been observed that the semantic mappings were performed only within the scope of the ontology class or the datatype property. These identified semantic integrations are (1) integration of ontology class instances into another ontology class without any constraint, (2) integration of selected instances in one ontology class into another ontology class by the range constraint of the property value and (3) integration of ontology class instances into another ontology class with the value transformation of the instance property. Each of them requires a particular mapping relationship, which is respectively: (1) equivalent or subsumption mapping relationship, (2) conditional mapping relationship that constraints the value of property (data range) and (3) transformation mapping relationship that transforms the value of property (unit transformation). Each identified mapping relationship can be defined as either (1) direct mapping type, (2) data range mapping type or (3) unit transformation mapping type. == KG vs. RDB approaches == In the case of integrating supplemental data source, KG(Knowledge graph) formally represents the meaning involved in information by describing concepts, relationships between things, and categories of things. These embedded semantics with the data offer significant advantages such as reasoning over data and dealing with heterogeneous data sources. The rules can be applied on KG more efficiently using graph query. For example, the graph query does the data inference through the connected relations, instead of repeated full search of the tables in relational database. KG facilitates the integration of new heterogeneous data by just adding new relationships between existing information and new entities. This facilitation is emphasized for the integration with existing popular linked open data source such as Wikidata.org. SQL query is tightly coupled and rigidly constrained by datatype within the specific database and can join tables and extract data from tables, and the result is generally a table, and a query can join tables by any columns which match by datatype. SPARQL query is the standard query language and protocol for Linked Open Data on the web and loosely coupled with the database so that it facilitates the reusability and can extract data through the relations free from the datatype, and not only extract but also generate additional knowledge graph with more sophisticated operations(logic: transitive/symmetric/inverseOf/functional). The inference based query (query on the existing asserted facts without the generation of new facts by logic) can be fast comparing to the reasoning based query (query on the existing plus the generated/discovered facts based on logic). The information integration of heterogeneous data sources in traditional database is intricate, which requires the redesign of the database table such as changing the structure and/or addition of new data. In the case of semantic query, SPARQL query reflects the relationships between entities in a way that aligned with human's understanding of the domain, so the semantic intention of the query can be seen on the query itself. Unlike SPARQL, SQL query, which reflects the specific structure of the database and derived from matching the relevant primary and foreign keys of tables, loses the semantics of the query by missing the relationships between entities. Below is the example that compares SPARQL and SQL queries for medications that treats "TB of vertebra". SELECT ?medication WHERE { ?diagnosis a example:Diagnosis . ?diagnosis example:name “TB of vertebra” . ?medication example:canTreat ?diagnosis . } SELECT DRUG.medID FROM DIAGNOSIS, DRUG, DRUG_DIAGNOSIS WHERE DIAGNOSIS.diagnosisID=DRUG_DIAGNOSIS.diagnosisID AND DRUG.medID=DRUG_DIAGNOSIS.medID AND DIAGNOSIS.name=”TB of vertebra” == Examples == The Pacific Symposium on Biocomputing has been a venue for the popularization of the ontology mapping task in the biomedical domain, and a number of papers on the subject can be found in its proceedings.

    Read more →