AI Generator With No Limits

AI Generator With No Limits — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • TAChart

    TAChart

    TAChart is a component for the Lazarus IDE that provides charting services. Similar to Tchart and Teechart for Delphi it supports a collection of different chart types including bar charts, pie charts, line charts and point series. Apart from a screen canvas, output is possible in form of SVG, OpenGL, printer, WMF, and other formats. TAChart is bundled with the Lazarus Component Library. Although not intended to be a TChart clone, why its usage differs in certain points, its basic functionality is very similar and some source code written for TeeChart may be reused. == History == The first version of TAChart was developed by Philippe Martinole for the TeleAuto project, a program for automation of astronomic observations. Later functionality was introduced by Luis Rodrigues while porting the Epanet application from Delphi to Lazarus. In the ensuing years the code has extensively rewritten, expanded and is now maintained by Alexander Klenin. == Data sources == TAChart is able to use input from various sources. Examples include lists of real values, user defined buffers in the computer's memory, vectors of random values, fields in databases, calculated values provided by pre-defined functions and results of embedded code written in Pascal Script

    Read more →
  • Birkhoff algorithm

    Birkhoff algorithm

    Birkhoff's algorithm (also called Birkhoff-von-Neumann algorithm) is an algorithm for decomposing a bistochastic matrix into a convex combination of permutation matrices. It was published by Garrett Birkhoff in 1946. It has many applications. One such application is for the problem of fair random assignment: given a randomized allocation of items, Birkhoff's algorithm can decompose it into a lottery on deterministic allocations. == Terminology == A bistochastic matrix (also called: doubly-stochastic) is a matrix in which all elements are greater than or equal to 0 and the sum of the elements in each row and column equals 1. An example is the following 3-by-3 matrix: ( 0.2 0.3 0.5 0.6 0.2 0.2 0.2 0.5 0.3 ) {\displaystyle {\begin{pmatrix}0.2&0.3&0.5\\0.6&0.2&0.2\\0.2&0.5&0.3\end{pmatrix}}} A permutation matrix is a special case of a bistochastic matrix, in which each element is either 0 or 1 (so there is exactly one "1" in each row and each column). An example is the following 3-by-3 matrix: ( 0 1 0 0 0 1 1 0 0 ) {\displaystyle {\begin{pmatrix}0&1&0\\0&0&1\\1&0&0\end{pmatrix}}} A Birkhoff decomposition (also called: Birkhoff-von-Neumann decomposition) of a bistochastic matrix is a presentation of it as a sum of permutation matrices with non-negative weights. For example, the above matrix can be presented as the following sum: 0.2 ( 0 1 0 0 0 1 1 0 0 ) + 0.2 ( 1 0 0 0 1 0 0 0 1 ) + 0.1 ( 0 1 0 1 0 0 0 0 1 ) + 0.5 ( 0 0 1 1 0 0 0 1 0 ) {\displaystyle 0.2{\begin{pmatrix}0&1&0\\0&0&1\\1&0&0\end{pmatrix}}+0.2{\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}}+0.1{\begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix}}+0.5{\begin{pmatrix}0&0&1\\1&0&0\\0&1&0\end{pmatrix}}} Birkhoff's algorithm receives as input a bistochastic matrix and returns as output a Birkhoff decomposition. == Tools == A permutation set of an n-by-n matrix X is a set of n entries of X containing exactly one entry from each row and from each column. A theorem by Dénes Kőnig says that: Every bistochastic matrix has a permutation-set in which all entries are positive.The positivity graph of an n-by-n matrix X is a bipartite graph with 2n vertices, in which the vertices on one side are n rows and the vertices on the other side are the n columns, and there is an edge between a row and a column if the entry at that row and column is positive. A permutation set with positive entries is equivalent to a perfect matching in the positivity graph. A perfect matching in a bipartite graph can be found in polynomial time, e.g. using any algorithm for maximum cardinality matching. Kőnig's theorem is equivalent to the following:The positivity graph of any bistochastic matrix admits a perfect matching.A matrix is called scaled-bistochastic if all elements are non-negative, and the sum of each row and column equals c, where c is some positive constant. In other words, it is c times a bistochastic matrix. Since the positivity graph is not affected by scaling:The positivity graph of any scaled-bistochastic matrix admits a perfect matching. == Algorithm == Birkhoff's algorithm is a greedy algorithm: it greedily finds perfect matchings and removes them from the fractional matching. It works as follows. Let i = 1. Construct the positivity graph GX of X. Find a perfect matching in GX, corresponding to a positive permutation set in X. Let z[i] > 0 be the smallest entry in the permutation set. Let P[i] be a permutation matrix with 1 in the positive permutation set. Let X := X − z[i] P[i]. If X contains nonzero elements, Let i = i + 1 and go back to step 2. Otherwise, return the sum: z[1] P[1] + ... + z[2] P[2] + ... + z[i] P[i]. The algorithm is correct because, after step 6, the sum in each row and each column drops by z[i]. Therefore, the matrix X remains scaled-bistochastic. Therefore, in step 3, a perfect matching always exists. == Run-time complexity == By the selection of z[i] in step 4, in each iteration at least one element of X becomes 0. Therefore, the algorithm must end after at most n2 steps. However, the last step must simultaneously make n elements 0, so the algorithm ends after at most n2 − n + 1 steps, which implies O ( n 2 ) {\displaystyle O(n^{2})} . In 1960, Joshnson, Dulmage and Mendelsohn showed that Birkhoff's algorithm actually ends after at most n2 − 2n + 2 steps, which is tight in general (that is, in some cases n2 − 2n + 2 permutation matrices may be required). == Application in fair division == In the fair random assignment problem, there are n objects and n people with different preferences over the objects. It is required to give an object to each person. To attain fairness, the allocation is randomized: for each (person, object) pair, a probability is calculated, such that the sum of probabilities for each person and for each object is 1. The probabilistic-serial procedure can compute the probabilities such that each agent, looking at the matrix of probabilities, prefers his row of probabilities over the rows of all other people (this property is called envy-freeness). This raises the question of how to implement this randomized allocation in practice? One cannot just randomize for each object separately, since this may result in allocations in which some people get many objects while other people get no objects. Here, Birkhoff's algorithm is useful. The matrix of probabilities, calculated by the probabilistic-serial algorithm, is bistochastic. Birkhoff's algorithm can decompose it into a convex combination of permutation matrices. Each permutation matrix represents a deterministic assignment, in which every agent receives exactly one object. The coefficient of each such matrix is interpreted as a probability; based on the calculated probabilities, it is possible to pick one assignment at random and implement it. == Extensions == The problem of computing the Birkhoff decomposition with the minimum number of terms has been shown to be NP-hard, but some heuristics for computing it are known. This theorem can be extended for the general stochastic matrix with deterministic transition matrices. Budish, Che, Kojima and Milgrom generalize Birkhoff's algorithm to non-square matrices, with some constraints on the feasible assignments. They also present a decomposition algorithm that minimizes the variance in the expected values. Vazirani generalizes Birkhoff's algorithm to non-bipartite graphs. Valls et al. showed that it is possible to obtain an ϵ {\displaystyle \epsilon } -approximate decomposition with O ( log ⁡ ( 1 / ϵ 2 ) ) {\displaystyle O(\log(1/\epsilon ^{2}))} permutations.

    Read more →
  • Outline of artificial intelligence

    Outline of artificial intelligence

    The following outline is provided as an overview of and topical guide to artificial intelligence: Artificial intelligence (AI) is intelligence exhibited by machines or software. It is also the name of the scientific field which studies how to create computers and computer software that are capable of intelligent behavior. == AI terminology == Glossary of artificial intelligence == Goals and applications == === General intelligence === Artificial general intelligence AI-complete === Reasoning and problem solving === Automated reasoning Mathematics Automated theorem prover Computer-assisted proof – Computer algebra General Problem Solver Expert system – Decision support system – Clinical decision support system – === Knowledge representation === Knowledge representation Knowledge management Cyc === Planning === Automated planning and scheduling Strategic planning Sussman anomaly – === Learning === Machine learning – Constrained Conditional Models – Deep learning – Neural modeling fields – Supervised learning – Weak supervision (semi-supervised learning) – Unsupervised learning – === Natural language processing === Natural language processing (outline) – Chatterbots – Language identification – Large language model – Retrieval-augmented generation – Natural language user interface – Natural language understanding – Machine translation – Statistical semantics – Question answering – Semantic translation – Concept mining – Data mining – Text mining – Process mining – E-mail spam filtering – Information extraction – Named-entity extraction – Coreference resolution – Named-entity recognition – Relationship extraction – Terminology extraction – === Perception === Machine perception Pattern recognition – Computer Audition – Speech recognition – Speaker recognition – Computer vision (outline) – Image processing Intelligent word recognition – Object recognition – Optical mark recognition – Handwriting recognition – Optical character recognition – Automatic number plate recognition – Information extraction – Image retrieval – Automatic image annotation – Facial recognition systems – Silent speech interface – Activity recognition – Percept (artificial intelligence) === Robotics === Robotics – Behavior-based robotics – Cognitive – Cybernetics – Developmental robotics – Evolutionary robotics – === Control === Intelligent control Self-management (computer science) – Autonomic Computing – Autonomic Networking – === Social intelligence === Affective computing Kismet === Game playing === Game artificial intelligence – Computer game bot – computer replacement for human players. Video game AI – Computer chess – Computer Go – General game playing – General video game playing – === Creativity, art and entertainment === Artificial creativity Artificial life Artificial intelligence art AI anthropomorphism AI agent AI web browser AI boom AI slop Creative computing Generative artificial intelligence Generative pre trained transformer Uncanny valley Music and artificial intelligence Computational humor Chatbot === Integrated AI systems === AIBO – Sony's robot dog. It integrates vision, hearing and motorskills. Asimo (2000 to present) – humanoid robot developed by Honda, capable of walking, running, negotiating through pedestrian traffic, climbing and descending stairs, recognizing speech commands and the faces of specific individuals, among a growing set of capabilities. MIRAGE – A.I. embodied humanoid in an augmented reality environment. Cog – M.I.T. humanoid robot project under the direction of Rodney Brooks. QRIO – Sony's version of a humanoid robot. TOPIO, TOSY's humanoid robot that can play ping-pong with humans. Watson (2011) – computer developed by IBM that played and won the game show Jeopardy! It is now being used to guide nurses in medical procedures. Purpose: Open domain question answering Technologies employed: Natural language processing Information retrieval Knowledge representation Automated reasoning Machine learning Project Debater (2018) – artificially intelligent computer system, designed to make coherent arguments, developed at IBM's lab in Haifa, Israel. === Intelligent personal assistants === Intelligent personal assistant – Amazon Alexa – Assistant – Braina – Cortana – Google Assistant – Google Now – Mycroft – Siri – Viv – === Other applications === Artificial life – simulation of natural life through the means of computers, robotics, or biochemistry. Automatic target recognition – Diagnosis (artificial intelligence) – Speech generating device – Vehicle infrastructure integration – Virtual Intelligence – == History == History of artificial intelligence Progress in artificial intelligence Timeline of artificial intelligence AI effect – as soon as AI successfully solves a problem, the problem is no longer considered by the public to be a part of AI. This phenomenon has occurred in relation to every AI application produced, so far, throughout the history of development of AI. AI winter – a period of disappointment and funding reductions occurring after a wave of high expectations and funding in AI. Such funding cuts occurred in the 1970s, for instance. Moore's law === History by period === 2017 in artificial intelligence 2018 in artificial intelligence 2019 in artificial intelligence 2020 in artificial intelligence 2021 in artificial intelligence 2022 in artificial intelligence 2023 in artificial intelligence 2024 in artificial intelligence 2025 in artificial intelligence 2026 in artificial intelligence 2027 in artificial intelligence 2028 in artificial intelligence 2029 in artificial intelligence === History by subject === History of logic (formal reasoning is an important precursor of AI) History of machine learning (timeline) History of machine translation (timeline) History of natural language processing History of optical character recognition (timeline) == AI algorithms and techniques == === Search === Discrete search algorithms Uninformed search Brute force search – Problem-solving technique and algorithmic paradigmPages displaying short descriptions of redirect targets Search tree – Data structure in tree form sorted for fast lookup Breadth-first search – Algorithm to search the nodes of a graph Depth-first search – Algorithm to search the nodes of a graph State space search – Class of search algorithmsPages displaying short descriptions of redirect targets Informed search Best-first search – Graph exploring search algorithm A search algorithm – Algorithm used for pathfinding and graph traversal Heuristics – Problem-solving methodPages displaying short descriptions of redirect targets Pruning (algorithm) – Data compression techniquePages displaying short descriptions of redirect targets Adversarial search Minmax algorithm – Decision rule used for minimizing the possible loss for a worst-case scenarioPages displaying short descriptions of redirect targets Logic as search Production system (computer science) – Computer program used to provide artificial intelligence Rule based system – Type of computer systemPages displaying short descriptions of redirect targets Production rule – Computer program used to provide artificial intelligence Inference rule – Method of deriving conclusionsPages displaying short descriptions of redirect targets Horn clause – Type of logical formula Forward chaining – Inference engine in an expert system Backward chaining – Method of forming inferences Planning as search State space search – Class of search algorithmsPages displaying short descriptions of redirect targets Means–ends analysis – Problem solving technique === Optimization search === Optimization (mathematics) algorithms Hill climbing – Optimization algorithm Simulated annealing – Probabilistic optimization technique and metaheuristic Beam search – Heuristic search algorithm Random optimization – Optimization technique in mathematics Evolutionary computation Genetic algorithms – Competitive algorithm for searching a problem spacePages displaying short descriptions of redirect targets Gene expression programming – Evolutionary algorithm Genetic programming – Evolving computer programs with techniques analogous to natural genetic processes Differential evolution – Method of mathematical optimization Society based learning algorithms. Swarm intelligence – Collective behavior of decentralized, self-organized systems Particle swarm optimization – Iterative simulation method Ant colony optimization – Optimization algorithmPages displaying short descriptions of redirect targets Metaheuristic – Optimization technique === Logic === Logic and automated reasoning Programming using logic Logic programming – Programming paradigm based on formal logic See "Logic as search" above. Forms of Logic Propositional logic First-order logic First-order logic with equality Constraint satisfaction – Process in artificial intelligence and operations research Fuzzy logic Fuzzy set theory – Sets whose elements have degrees of membershipPages displaying short descriptions

    Read more →
  • AI-driven design automation

    AI-driven design automation

    AI-driven design automation is the use of artificial intelligence (AI) to automate and improve different parts of the electronic design automation (EDA) process. It is particularly important in the design of integrated circuits (chips) and complex electronic systems, where it can potentially increase productivity, decrease costs, and speed up design cycles. AI Driven Design Automation uses several methods, including machine learning, expert systems, and reinforcement learning. These are used for many tasks, from planning a chip's architecture and logic synthesis to its physical design and final verification. == History == === 1980s–1990s: Expert systems and early experiments === The use of AI for design automation originated in the 1980s and 1990s, mainly with the creation of expert systems. These systems tried to capture the knowledge and practical rules used by human design experts, and used these rules, along with reasoning engines, to direct the design process. A notable early project was the ULYSSES system from Carnegie Mellon University. ULYSSES was a CAD tool integration environment that let expert designers turn their design methods into scripts that could be run automatically. It treated design tools as sources of knowledge that a scheduler could manage. Another example was the ADAM (Advanced Design AutoMation) system at the University of Southern California, which used an expert system called the Design Planning Engine. This engine figured out design strategies on the fly and handled different design jobs by organizing specialized knowledge into structured formats called frames. Other systems like DAA (Design Automation Assistant) used a rule-based approach for specific jobs, such as register transfer level (RTL) design for systems like the IBM 370. Researchers at Carnegie Mellon University also created TALIB, an expert system for mask layout that used over 1200 rules, and EMUCS/DAA for CPU architectural design which used about 70 rules. These projects showed that AI worked better for problems where relatively few rules were required to describe much larger amounts of data. At the same time, there was a surge of tools called silicon compilers like MacPitts, Arsenic, and Palladio. They used algorithms and search techniques to explore different design paradigms. This was another way to automate design, even if it was not always based on expert systems. Early tests with neural networks in VLSI design also happened during this time, although they were not as common as systems based on rules. === 2000s: Introduction of machine learning === In the 2000s, interest in AI for design automation increased. This was mostly because of better machine learning (ML) algorithms and more available data from design and manufacturing. For example, they were used to model and reduce the effects of small manufacturing differences in semiconductor devices. This became very important as the size of components on chips became smaller. The large amount of data created during chip design provided the foundation needed to train smarter ML models. This allowed for predicting outcomes and optimizing in areas that were hard to automate before. === 2016–2020: Reinforcement learning and large scale initiatives === A major turning point happened in the mid to late 2010s, sparked by successes in other areas of AI. The success of DeepMind's AlphaGo in mastering the game of Go inspired researchers. They began to apply reinforcement learning (RL) to difficult EDA problems. These problems often require searching through many options and making a series of decisions. In 2018, the U.S. DARPA started the Intelligent Design of Electronic Assets (IDEA) program. A main goal of IDEA was to create a fully automated layout generator that required no human intervention, able to produce a chip design ready for manufacturing from RTL specifications in 24 hours. Another big initiative was the OpenROAD project, a large effort under IDEA led by UC San Diego with industry and university partners, aimed to build an open source, independent toolchain. It used machine learning, parallelization and divide and conquer approaches. A much-publicized but controversial demonstration of RL's potential came from Google researchers between 2020 and 2021. They created a deep reinforcement learning method for planning the layout of a chip, known as floorplanning. They reported that this method created layouts that were as good as or better than those made by human experts, and it did so in less than six hours. This method used a type of network called a graph convolutional neural network. It showed that it could learn general patterns that could be applied to new problems, getting better as it saw more chip designs. The technology was later used to design Google's Tensor Processing Unit (TPU) accelerators. However, in the original paper, the improvement (if any) from AI was not demonstrated. There was no comparison with existing non-AI tools performing the same task, and since the data is proprietary, no ability for anyone else to perform this comparison. Various efforts to reproduce the AI algorithm, and compare its results with various commercial and academic tools, have yielded mixed results with no conclusive advantage to AI. === 2020s: Autonomous systems and agents === Entering the 2020s, the industry saw the commercial launch of autonomous AI driven EDA systems. For example, Synopsys launched DSO.ai (Design Space Optimization AI) in early 2020, calling it the first autonomous artificial intelligence application for chip design in the industry. This system uses reinforcement learning to search for the best ways to optimize a design within the huge number of possible solutions, trying to improve power, performance, and area (PPA). By 2023, DSO.ai had been used to produce over 100 commercial chips, showing mainstream adoption. Synopsys later grew its AI tools into a suite called Synopsys.ai. The goal was to use AI in the entire EDA workflow, including verification and testing. These advancements, which combine modern AI methods with cloud computing and large data resources, have led to talks about a new phase in EDA. Industry experts and participants sometimes call this 'EDA 4.0'. This new era is defined by the widespread use of AI and machine learning to deal with growing design complexity, automate more of the design process, and help engineers handle the huge amounts of data that EDA tools create. The purpose of EDA 4.0 is to optimize product performance, get products to market faster and make development and manufacturing smoother through intelligent automation. == Applications == Artificial intelligence (AI) is now used in many stages of the electronic design workflow. It aims to improve productivity, get better results, and handle the growing complexity of modern integrated circuits. AI helps designers from the very first ideas about architecture all the way to manufacturing and testing. === High level synthesis and architectural exploration === In the first phases of chip design, AI helps with High Level Synthesis (HLS) and exploring different system level design options (DSE). These processes are key for turning general ideas into detailed hardware plans. AI algorithms, often using supervised learning, are used to build simpler, substitute models. These models can quickly guess important design measurements like area, performance, and power for many different architectural options or HLS settings. For example, the Ithemal tool uses deep neural networks to estimate how fast basic code blocks will run, which helps in making processor architecture decisions. Similarly, PRIMAL uses machine learning estimate power use at the register transfer level (RTL), giving early information about how much power the chip will use. Reinforcement learning (RL) and Bayesian optimization are also used to guide the DSE process. They help search through the many parameters to find the best HLS settings or architectural details like cache sizes. LLMs are also being tested for creating architectural plans or initial C code for HLS, as seen with GPT4AIGChip. === Logic synthesis and optimization === Logic synthesis starts from a high level hardware description and generates an optimized list of electronic gates, known as a gate level netlist, that is ready for placement, routing, and then construction in a specific manufacturing process. AI methods help with different parts of this process, including logic optimization, technology mapping, and making improvements after mapping. Supervised learning, especially with Graph Neural Networks (GNNs), is good at handling data or problems that can be represented as graphs. Since circuit diagrams are instances of directed graphs, supervised learning can help create models that predict design properties like power or error rates in circuits. In logic synthesis and optimization reinforcement learning is used to perform logic optimization directly. In some cases ag

    Read more →
  • INDIAai

    INDIAai

    INDIAai is a web portal launched by the Government of India on 07 March 2024 for artificial intelligence-related developments in India. It is known as the National AI Portal of India, which was jointly started by the Ministry of Electronics and Information Technology (MeitY), the National e-Governance Division (NeGD) and the National Association of Software and Service Companies (NASSCOM) with support from the Department of School Education and Literacy (DoSE&L) and Ministry of Human Resource Development. == History == The portal was launched on 30 May 2020, by Ravi Shankar Prasad, the Union Minister for Electronics and IT, Law and Justice and Communications, on the first anniversary of the second tenure of Prime Minister Narendra Modi-led government. A national program for the youth, 'Responsible AI for Youth', was also launched on the same day. As of 2022, the website was visited by more than 4.5 lakh users with 1.2 million page views. It has 1151 articles on artificial intelligence, 701 news stories, 98 reports, 95 case studies and 213 videos on its portal. It maintains a database on AI ecosystem of India featuring 121 government initiatives and 281 startups. In May 2022, INDIAai released a book titled 'AI for Everyone' that covers the basics of AI. Cabinet chaired by the Prime Minister Narendra Modi has approved the comprehensive national-level IndiaAI mission with a budget outlay of Rs.10,371.92 crore. The Mission will be implemented by ‘IndiaAI’ Independent Business Division (IBD) under Digital India Corporation (DIC). == Objective and features == It aims to function as a one-stop portal for all AI-related development in India. The platform publishes resources such as articles, news, interviews, and investment funding news and events for AI startups, AI companies, and educational firms related to artificial intelligence in India. It also distributes documents, case studies, and research reports. Additionally, the platform provides education and employment opportunities related to AI. It offers AI courses, both free and paid.

    Read more →
  • BioCreative

    BioCreative

    BioCreAtIvE (A critical assessment of text mining methods in molecular biology) consists in a community-wide effort for evaluating information extraction and text mining developments in the biological domain. It was preceded by the Knowledge Discovery and Data Mining (KDD) Challenge Cup for detection of gene mentions. == Community Challenges == === First edition (2004-2005) === Three main tasks were posed at the first BioCreAtIvE challenge: the entity extraction task, the gene name normalization task, and the functional annotation of gene products task. The data sets produced by this contest serve as a Gold Standard training and test set to evaluate and train Bio-NER tools and annotation extraction tools. === Second edition (2006-2007) === The second BioCreAtIvE challenge (2006-2007) had also 3 tasks: detection of gene mentions, extraction of unique idenfiers for genes and extraction information related to physical protein-protein interactions. It counted with participation of 44 teams from 13 countries. === Third edition (2011-2012) === The third edition of BioCreative included for the first time the InterActive Task (IAT), designed to evaluate the practical usability of text mining tools in real-world biocuration tasks. === Fifth edition (2016) === BioCreative V had 5 different tracks, including an interactive task (IAT) for usability of text mining systems and a track using the BioC format for curating information for BioGRID.

    Read more →
  • Data management plan

    Data management plan

    A data management plan or DMP is a formal document that outlines how data are to be handled both during a research project, and after the project is completed. The goal of a data management plan is to consider the many aspects of data management, metadata generation, data preservation, and analysis before the project begins; this may lead to data being well-managed in the present, and prepared for preservation in the future. DMPs were originally used in 1966 to manage aeronautical and engineering projects' data collection and analysis, and expanded across engineering and scientific disciplines in the 1970s and 1980s. Up until the early 2000s, DMPs were used "for projects of great technical complexity, and for limited mid-study data collection and processing purposes". In the 2000s and later, E-research and economic policies drove the development and uptake of DMPs. == Importance == Preparing a data management plan before data are collected is claimed to ensure that data are in the correct format, organized well, and better annotated. This could arguably save time in the long term because there is no need to re-organize, re-format, or try to remember details about data. It is also claimed to increase research efficiency since both the data collector and other researchers might be able to understand and use well-annotated data in the future. One component of a data management plan is data archiving and preservation. By deciding on an archive ahead of time, the data collector can format data during collection to make its future submission to a database easier. If data are preserved, they are more relevant since they can be re-used by other researchers. It also allows the data collector to direct requests for data to the database, rather than address requests individually. A frequent argument in favor of preservation is that data that are preserved have the potential to lead to new, unanticipated discoveries, and they prevent duplication of scientific studies that have already been conducted. Data archiving also provides insurance against loss by the data collector. In the 2010s, funding agencies increasingly required data management plans as part of the proposal and evaluation process, despite little or no evidence of their efficacy. == Major components == "There is no general and definitive list of topics that should be covered in a DMP for a research project", and researchers are often left to their own devices as to how to fill out a DMP. === Information about data and data format === A description of data to be produced by the project. This might include (but is not limited to) data that are: Experimental Observational Raw or derived Physical collections Models Simulations Curriculum materials Software Images How will the data be acquired? When and where will they be acquired? After collection, how will the data be processed? Include information about Software used Algorithms Scientific workflows File formats that will be used, justify those formats, and describe the naming conventions used. Quality assurance & quality control measures that will be taken during sample collection, analysis, and processing. If existing data are used, what are their origins? How will the data collected be combined with existing data? What is the relationship between the data collected and existing data? How will the data be managed in the short-term? Consider the following: Version control for files Backing up data and data products Security & protection of data and data products Who will be responsible for management === Metadata content and format === Metadata are the contextual details, including any information important for using data. This may include descriptions of temporal and spatial details, instruments, parameters, units, files, etc. Metadata is commonly referred to as "data about data". Issues to be considered include: How detailed has the metadata to be in order to make the data meaningful? How will the metadata be created and/or captured? Examples include lab notebooks, GPS hand-held units, Auto-saved files on instruments, etc. What format will be used for the metadata? What are the metadata standards commonly used in the respective scientific discipline? There should be justification for the format chosen. === Policies for access, sharing, and re-use === Describe any obligations that exist for sharing data collected. These may include obligations from funding agencies, institutions, other professional organizations, and legal requirements. Include information about how data will be shared, including when the data will be accessible, how long the data will be available, how access can be gained, and any rights that the data collector reserves for using data. Address any ethical or privacy issues with data sharing Address intellectual property & copyright issues. Who owns the copyright? What are the institutional, publisher, and/or funding agency policies associated with intellectual property? Are there embargoes for political, commercial, or patent reasons? Describe the intended future uses/users for the data Indicate how the data should be cited by others. How will the issue of persistent citation be addressed? For example, if the data will be deposited in a public archive, will the dataset have a persistent identifier (e.g., ARK, DOI, Handle, PURL, URN) assigned to it? === Long-term storage and data management === Researchers should identify an appropriate archive for the long-term preservation of their data. By identifying the archive early in the project, the data can be formatted, transformed, and documented appropriately to meet the requirements of the archive. Researchers should consult colleagues and professional societies in their discipline to determine the most appropriate database, and include a backup archive in their data management plan in case their first choice goes out of existence. Early in the project, the primary researcher should identify what data will be preserved in an archive. Usually, preserving the data in its most raw form is desirable, although data derivatives and products can also be preserved. An individual should be identified as the primary contact person for archived data, and ensure contact information is always kept up-to-date in case there are requests for data or information about data. === Budget === Data management and preservation costs may be considerable, depending on the nature of the project. By anticipating costs ahead of time, researchers ensure that the data will be properly managed and archived. Potential expenses that should be considered are Human resources and staff as they handle data preparation, management, documentation, and preservation Hardware and/or software needed for data management, backing up, security, documentation, and preservation Costs associated with submitting the data to an archive The data management plan should include how these costs will be paid. == NSF Data Management Plan == All grant proposals submitted to National Science Foundation (NSF) must include a Data Management Plan that is no more than two pages. This is a supplement (not part of the 15-page proposal) and should describe how the proposal will conform to the Award and Administration Guide policy (see below). It may include the following: The types of data The standards to be used for data and metadata format and content Policies for access and sharing Policies and provisions for re-use Plans for archiving data Policy summarized from the NSF Award and Administration Guide, Section 4 (Dissemination and Sharing of Research Results): Promptly publish with appropriate authorship Share data, samples, physical collections, and supporting materials with others, within a reasonable time frame Share software and inventions Investigators can keep their legal rights over their intellectual property, but they still have to make their results, data, and collections available to others Policies will be implemented via Proposal review Award negotiations and conditions Support/incentives == ESRC Data Management Plan == Since 1995, the UK's Economic and Social Research Council (ESRC) have had a research data policy in place. The current ESRC Research Data Policy states that research data created as a result of ESRC-funded research should be openly available to the scientific community to the maximum extent possible, through long-term preservation and high-quality data management. ESRC requires a data management plan for all research award applications where new data are being created. Such plans are designed to promote a structured approach to data management throughout the data lifecycle, resulting in better quality data that is ready to archive for sharing and re-use. The UK Data Service, the ESRC's flagship data service, provides practical guidance on research data management planning suitable for social science researchers in the UK and around the world. ESRC has a longstanding arrangement with the UK Data A

    Read more →
  • Computer and information science

    Computer and information science

    Computer and information science (CIS; also known as information and computer science) is a field that emphasizes both computing and informatics, upholding the strong association between the fields of information sciences and computer sciences and treating computers as a tool rather than a field. Information science is one with a long history, unlike the relatively very young field of computer science, and is primarily concerned with gathering, storing, disseminating, sharing and protecting any and all forms of information. It is a broad field, covering a myriad of different areas but is often referenced alongside computer science because of the incredibly useful nature of computers and computer programs in helping those studying and doing research in the field – particularly in helping to analyse data and in spotting patterns too broad for a human to intuitively perceive. While information science is sometimes confused with information theory, the two have vastly different subject matter. Information theory focuses on one particular mathematical concept of information while information science is focused on all aspects of the processes and techniques of information. Computer science, in contrast, is less focused on information and its different states, but more, in a very broad sense, on the use of computers – both in theory and practice – to design and implement algorithms in order to aid the processing of information during the different states described above. It has strong foundations in the field of mathematics, as the very first recognised practitioners of the field were renowned mathematicians such as Alan Turing. Information science and computing began to converge in the 1950s and 1960s, as information scientists started to realize the many ways computers would improve information storage and retrieval. == Terminology == Due to the distinction between computers and computing, some of the research groups refer to computing or datalogy. The French refer to computer science as the term informatique. The term information and communications technology (ICT), refers to how humans communicate with using machines and computers, making a distinction from information and computer science, which is how computers use and gain information. Informatics is also distinct from computer science, which encompasses the study of logic and low-level computing issues. == Education == Universities may confer degrees with a major in computer and information science, not to be confused with a more specific Bachelor of Computer Science or respective graduate computer science degrees. The QS World University Rankings is one of the most widely recognised and distinguished university comparisons. They ranked the top 10 universities for computer science and information systems in 2015. They are: Massachusetts Institute of Technology (MIT) Stanford University University of Oxford Carnegie Mellon University Harvard University University of California, Berkeley (UCB) University of Cambridge The Hong Kong University of Science and Technology Swiss Federal Institute of Technology (ETH Zurich) Princeton University A Computer Information Science degree gives students both network and computing knowledge which is needed to design, develop, and assist information systems which helps to solve business problems and to support business problems and to support business operations and decision making at a managerial level also. == Areas of information and computer science == Due to the nature of this field, many topics are also shared with computer science and information systems. The discipline of Information and Computer Science spans a vast range of areas from basic computer science theory (algorithms and computational logic) to in depth analysis of data manipulation and use within technology. === Programming theory === The process of taking a given algorithm and encoding it into a language that can be understood and executed by a computer. There are many different types of programming languages and various different types of computers, however, they all have the same goal: to turn algorithms into machine code. Popular programming languages used within the academic study of CIS include, but are not limited to: Java, Python, C#, C++, Perl, Ruby, Pascal, Swift, Visual Basic. === Information and information systems === The academic study of software and hardware systems that process large quantities and data, support large scale data management and how data can be used. This is where the field is unique from the standard study of computer science. The area of information systems focuses on the networks of hardware and software that are required to process, manipulate and distribute such data. === Computer systems and organisations === The process of analysing computer architecture and various logic circuits. This involves looking at low level computer processes at bit level computation. This is an in-depth look into the hardware processing of a computational system, involving looking at the basic structure of a computer and designing such systems. This can also involve evaluating complex circuit diagrams, and being able to construct these to solve a main problem. The main purpose behind this area of study is to achieve an understanding of how computers function on a basic level, often through tracing machine operations. === Machines, languages, and computation === This is the study into fundamental computer algorithms, which are the basis to computer programs. Without algorithms, no computer programs would exist. This also involves the process of looking into various mathematical functions behind computational algorithms, basic theory and functional (low level) programming. In an academic setting, this area would introduce the fundamental mathematical theorems and functions behind theoretical computer science which are the building blocks for other areas in the field. Complex topics such as; proofs, algebraic functions and sets will be introduced during studies of CIS. == Developments == Information and computer science is a field that is rapidly developing with job prospects for students being extremely promising with 75.7% of graduates gaining employment. Also the IT industry employs one in twenty of the workforce with it predicted to increase nearly five times faster than the average of the UK and between 2012 and 2017 more than half a million people will be needed within the industry and the fact that nine out of ten tech firms are suffering from candidate shortages which is having a negative impact on their business as it delays the creation and development of new products, and it's predicted in the US that in the next decade there will be more than one million jobs in the technology sector than computer science graduates to fill them. Because of this programming is now being taught at an earlier age with an aim to interest students from a young age into computer and information science hopefully leading more children to study this at a higher level. For example, children in England will now be exposed to computer programming at the age of 5 due to an updated national curriculum. == Employment == Due to the wide variety of jobs that now involve computer and information science related tasks, it is difficult to provide a comprehensive list of possible jobs in this area, but some of the key areas are artificial intelligence, software engineering and computer networking and communication. Work in this area also tends to require sufficient understanding of mathematics and science. Moreover, jobs that having a CIS degree can lead to, include: systems analyst, network administrator, system architect, information systems developer, web programmer, or software developer. The earning potential for CIS graduates is quite promising. A 2013 survey from the National Association of Colleges and Employers (NACE) found that the average starting salary for graduates who earned a degree in a computer related field was $59,977, up 4.3% from the prior year. This is higher than other popular degrees such as business ($54,234), education ($40,480) and math and sciences ($42,724). Furthermore, Payscale ranked 129 college degrees based on their graduates earning potential with engineering, math, science, and technology fields dominating the ranking. With eight computer related degrees appearing among the top 30. With the lowest starting salary for these jobs being $49,900. A Rasmussen College article describes various jobs CIS graduates may obtain with software applications developers at the top making a median income of $98,260. According to the National Careers Service an Information Scientist can expect to earn £24,000+ per year as a starting salary.

    Read more →
  • Inbox by Gmail

    Inbox by Gmail

    Inbox by Gmail was an email service developed by Google. Announced on a limited invitation-only basis on October 22, 2014, it was officially released to the public on May 28, 2015. Inbox was shut down by Google on April 2, 2019. Available on the web, and through mobile apps for Android and iOS, Inbox by Gmail aimed to improve email productivity and organization through several key features. Bundles gathered emails on the same topic together; highlighted surface key details from messages, reminders and assists; and a "snooze" functionality enabled users to control when specific information would appear. Updates to the service enabled an "undo send" feature; a "Smart Reply" feature that automatically generated short reply examples for certain emails; integration with Google Calendar for event organization, previews of newsletters; and a "Save to Inbox" feature that let users save links for later use. Inbox by Gmail received generally positive reviews. At its launch, it was called "minimalist and lovely, full of layers and easy to navigate", with features deemed helpful in finding the right messages—one reviewer noted that the service felt "a lot like the future of email". However, it also received criticism, particularly for a low density of information, algorithms that needed tweaking, and because the service required users to "give up the control" of organizing their own email, meaning that "Anyone who already has a system for organizing their emails will likely find themselves fighting Google's system". Google noted in March 2016 that 10% of all replies on mobile originated from Inbox's Smart Reply feature. Google announced it would discontinue Inbox by Gmail in March 2019, with many of its features integrated into Gmail proper. == Features == Inbox by Gmail scanned the user's incoming Gmail messages for information. It gathered email messages related to the same overall topic into an organized bundle, with a title describing the bundle's content. For example, flight tickets, car rentals, and hotel reservations were grouped under "Travel", giving the user an easier overview of emails. Users could also group emails together manually, to "teach" the Inbox how the user worked. The service highlighted key details and important information in messages, such as flight itineraries, event information, photos and documents. Inbox could retrieve updated information from the Internet, including the real-time status of flights and package deliveries. Users could set reminders to bring up important messages later. When a user needed particular information, Inbox could assist the user by displaying the necessary details. Where Inbox highlights information was not needed immediately, users could "snooze" a message or reminder, with options to make the information reappear at a later time or specific location. In June 2015, Google added an "Undo Send" feature to Inbox, giving the user 10 seconds to undo sending a message. In November 2015, Google added "Smart Reply" functionality to the mobile apps. With Smart Reply, Inbox determined which emails could be answered with a short reply, generating three example responses from which the user could select one with a single tap. Smart Reply (initially available only on the Android and iOS mobile apps) was added to the Inbox website in March 2016, Google announcing that "10% of all your replies on mobile already use Smart Reply". By May 2017, Google said Smart Reply was driving about 12% of replies in inbox on mobile. In April 2016, Google updated Inbox with three new features; Google Calendar event organization, newsletter previews, and a "Save to Inbox" functionality that let the user save links for later use, rather than having to email links to themselves. In December 2017, Google introduced an "Unsubscribe" card that let users easily unsubscribe from mailing lists. The card appeared for email messages (from specific senders) that the user had not opened for a month. A few popular Inbox by Gmail features were subsequently added to Gmail: "Snoozing" of emails Nudges: Gmail could move old messages back to the top of the inbox when it thought a follow up or reply might be required. Hover actions: Placing the mouse cursor over a certain part of the message could quickly effect an action, such as archiving, without its being opened. Smart reply: This feature employed boilerplate text to suggest appropriate replies. Google reportedly wished, at a time then to be decided, to add the "bundles" feature to Gmail, which at the time was available only in Inbox for Gmail. By March 2020, many Inbox features were still missing from Gmail. == Platforms == Inbox by Gmail was announced on a limited invitation-only basis on October 22, 2014, available on the web, and through the Android and iOS mobile operating systems. It was officially released to the public on May 28, 2015. == Reception == David Pierce of The Verge praised the service, writing that it was "minimalist and lovely, full of layers and easy to navigate. It's remarkably fast and smooth on all platforms, and far better on iOS than the Gmail app". However, he criticized the app's low density of information, with only a few emails visible on the screen at a time, making it "a bit of a challenge" for users who need to go through "hundreds of emails" every day. Although positive that "Inbox feels a lot like the future of email", Pierce wrote that there was "plenty of algorithm tweaking and design condensing to do", with particular attention needed on a "compact view" for denser view of information on the screen. Sarah Mitroff of CNET also praised Inbox, writing, "Not only is it visually appealing, it's also full of features that help you find every message you need, when you need it". She added that users must "give up the control" to organize their email, and that it "won't vibe with everyone", but admitted that "if you're willing ... the app will reward you with a smarter and cleaner inbox." Mitroff noted that, initially, users had to coach the app about which bundle was appropriate for certain emails, writing, "It's a tedious process at first, by [sic] in just a few days Inbox starts to get it right." Regarding any downsides of the service, Mitroff wrote that "Inbox has a built-in strategy for managing your emails that works best on its own. Anyone who already has a system for organizing their emails will likely find themselves fighting Google's system". == Discontinuation and legacy == Google ended the service in March 2019. Google called Inbox "a great place to experiment with new ideas" and noted that many of those ideas had been migrated to Gmail. The company wanted, going forward, to focus its resources on a single email system. Several services, like Shortwave, attempted to resurrect some of the features of Inbox by Gmail to attract its old users. Similarly, Inbox Reborn, an actively maintained browser extension developed by a team of volunteer developers from around the world since 2018, aims to recreate the core features and visual style of Inbox by Gmail within the standard Gmail interface. The project continues to focus on preserving functionalities such as email bundling and streamlined workflows to provide users with a familiar productivity experience. Afterwards, most people moved to Spark, Spike, or Newton. According to a product manager at Google, a "more focused approach" regarding email was the companies goal. This is likely the reason they moved away from Inbox.

    Read more →
  • Vector-field consistency

    Vector-field consistency

    Vector-Field Consistency is a consistency model for replicated data (for example, objects), initially described in a paper which was awarded the best-paper prize in the ACM/IFIP/Usenix Middleware Conference 2007. It has since been enhanced for increased scalability and fault-tolerance in a recent paper. == Description == This consistency model was initially designed for replicated data management in ad hoc gaming in order to minimize bandwidth usage without sacrificing playability. Intuitively, it captures the notion that although players require, wish, and take advantage of information regarding the whole of the game world (as opposed to a restricted view to rooms, arenas, etc. of limited size employed in many multiplayer video games), they need to know information with greater freshness, frequency, and accuracy as other game entities are located closer and closer to the player's position. It prescribes a multidimensional divergence bounding scheme, based on a vector field that employs consistency vectors k=(θ,σ,ν), standing for maximum allowed time - or replica staleness, sequence - or missing updates, and value - or user-defined measured replica divergence, applied to all space coordinates in game scenario or world. The consistency vector-fields emanate from field-generators designated as pivots (for example, players) and field intensity attenuates as distance grows from these pivots in concentric or square-like regions. This consistency model unifies locality-awareness techniques employed in message routing and consistency enforcement for multiplayer games, with divergence bounding techniques traditionally employed in replicated database and web scenarios.

    Read more →
  • Retention period

    Retention period

    A retention period (associated with a retention schedule or retention program) is an aspect of records and information management (RIM) and the records life cycle that identifies the duration of time for which the information should be maintained or "retained", irrespective of format (paper, electronic, or other). Retention periods vary with different types of information, based on content and a variety of other factors, including internal organizational need, regulatory requirements for inspection or audit, legal statutes of limitation, involvement in litigation, and taxation and financial reporting needs, as well as other factors as defined by local, regional, state, national, and/or international governing entities. Once an applicable retention period has elapsed for a given type or series of information, and all holds/moratoriums have been released, the information is typically destroyed using an approved and effective destruction method, which renders the information completely and irreversibly unusable via any means. Alternatively, it may be converted from one form to another (e.g. from paper to electronic), depending on the defined retention period per format. Information with historical value beyond its "usable value" may be accessioned to the custody of an archive organization for permanent or extended long-term preservation. == Defensible disposition == Defensible disposition refers to the ability of an identified and applied retention period to effectively provide for the defense of the record, and its eventual destruction or accessioning when scrutinized within a court of law or by other review. It is commonly advised by records and information management (RIM) professionals that any and all retention periods applied to organizational information should be reviewed and approved for use by competent legal counsel, which represents the organization, and is familiar with the specific business needs and legal and regulatory requirements of the organization. Additionally, a practical approach to information assessment/classification, proper documentation of the disposition program, strategic review of disposition policy over time for efficacy are required for proper defensible disposition. == Guidance and education organizations == ARMA International Information and Records Management Society filerskeepers records retention FAQ

    Read more →
  • Sieve of Pritchard

    Sieve of Pritchard

    In mathematics, the sieve of Pritchard is an algorithm for finding all prime numbers up to a specified bound. Like the ancient sieve of Eratosthenes, it has a simple conceptual basis in number theory. It is especially suited to quick hand computation for small bounds. Whereas the sieve of Eratosthenes marks off each non-prime for each of its prime factors, the sieve of Pritchard avoids considering almost all non-prime numbers by building progressively larger wheels, which represent the pattern of numbers not divisible by any of the primes processed thus far. It thereby achieves a better asymptotic complexity, and was the first sieve with a running time sublinear in the specified bound. Its asymptotic running-time has not been improved on, and it deletes fewer composites than any other known sieve. It was created in 1979 by Paul Pritchard. Since Pritchard has created a number of other sieve algorithms for finding prime numbers, the sieve of Pritchard is sometimes singled out by being called the wheel sieve (by Pritchard himself) or the dynamic wheel sieve. == Overview == A prime number is a natural number that has no natural number divisors other than the number 1 and itself. To find all the prime numbers less than or equal to a given integer N, a sieve algorithm examines a set of candidates in the range 2, 3, …, N, and eliminates those that are not prime, leaving the primes at the end. The sieve of Eratosthenes examines all of the range, first removing all multiples of the first prime 2, then of the next prime 3, and so on. The sieve of Pritchard instead examines a subset of the range consisting of numbers that occur on successive wheels, which represent the pattern of numbers left after each successive prime is processed by the sieve of Eratosthenes. For i > 0, the ith wheel Wi represents this pattern. It is the set of numbers between 1 and the product Pi = p1 · p2 ⋯ pi of the first i prime numbers that are not divisible by any of these prime numbers (and is said to have an associated length Pi). This is because adding Pi to a number does not change whether it is divisible by one of the first i prime numbers, since the remainder on division by any one of these primes is unchanged. So W1 = {1} with length P1 = 2 represents the pattern of odd numbers; W2 = {1,5} with length P2 = 6 represents the pattern of numbers not divisible by 2 or 3; etc. Wheels are so-called because Wi can be usefully visualized as a circle of circumference Pi with its members marked at their corresponding distances from an origin. Then rolling the wheel along the number line marks points corresponding to successive numbers not divisible by one of the first i prime numbers. The animation shows W2 being rolled up to 30. It is useful to define Wi → n for n > 0 to be the result of rolling Wi up to n. Then the animation generates W2 → 30 = {1,5,7,11,13,17,19,23,25,29}. Note that up to 52 − 1 = 24, this consists only of 1 and the primes between 5 and 25. The sieve of Pritchard is derived from the observation that this holds generally: for all i > 0, the values in Wi → (p2i+1 − 1) are 1 and the primes between pi+1 and p2i+1. It even holds for i = 0, where the wheel has length 1 and contains just 1 (representing all the natural numbers). So the sieve of Pritchard starts with the trivial wheel W0 and builds successive wheels until the square of the wheel's first member after 1 is at least N. Wheels grow very quickly, but only their values up to N are needed and generated. It remains to find a method for generating the next wheel. Note in the animation that W3 = {1,5,7,11,13,17,19,23,25,29} − {5 · 1 , 5 · 5} can be obtained by rolling W2 up to 30 and then removing 5 times each member of W2.This also holds generally: for all i ≥ 0, Wi+1 = (Wi → Pi+1) − {pi+1 · w | w ∈ Wi}. Rolling Wi past Pi just adds values to Wi, so the current wheel is first extended by getting each successive member starting with w = 1, adding Pi to it, and inserting the result in the set. Then the multiples of pi+1 are deleted. Care must be taken to avoid a number being deleted that itself needs to be multiplied by pi+1. The sieve of Pritchard as originally presented does so by first skipping past successive members until finding the maximum one needed, and then doing the deletions in reverse order by working back through the set. This is the method used in the first animation above. A simpler approach is just to gather the multiples of pi+1 in a list, and then delete them. Another approach is given by Gries and Misra. If the main loop terminates with a wheel whose length is less than N, it is extended up to N to generate the remaining primes. The algorithm, for finding all primes up to N, is therefore as follows: Start with a set W = {1} and length = 1 representing wheel 0, and prime p = 2. As long as p2 ≤ N, do the following: if length < N, then extend W by repeatedly getting successive members w of W starting with 1 and inserting length + w into W as long as it does not exceed p · length or N; increase length to the minimum of p · length and N. repeatedly delete p times each member of W by first finding the largest ≤ length and then working backwards. note the prime p, then set p to the next member of W after 1 (or 3 if p was 2). if length < N, then extend W to N by repeatedly getting successive members w of W starting with 1 and inserting length + w into W as long as it does not exceed N; On termination, the rest of the primes up to N are the members of W after 1. === Example === To find all the prime numbers less than or equal to 150, proceed as follows. Start with wheel 0 with length 1, representing all natural numbers 1, 2, 3...: 1 The first number after 1 for wheel 0 (when rolled) is 2; note it as a prime. Now form wheel 1 with length 2 × 1 = 2 by first extending wheel 0 up to 2 and then deleting 2 times each number in wheel 0, to get: 1 2 The first number after 1 for wheel 1 (when rolled) is 3; note it as a prime. Now form wheel 2 with length 3 × 2 = 6 by first extending wheel 1 up to 6 and then deleting 3 times each number in wheel 1, to get 1 2 3 5 The first number after 1 for wheel 2 is 5; note it as a prime. Now form wheel 3 with length 5 × 6 = 30 by first extending wheel 2 up to 30 and then deleting 5 times each number in wheel 2 (in reverse order), to get 1 2 3 5 7 11 13 17 19 23 25 29 The first number after 1 for wheel 3 is 7; note it as a prime. Now wheel 4 has length 7 × 30 = 210, so we only extend wheel 3 up to our limit 150. (No further extending will be done now that the limit has been reached.) We then delete 7 times each number in wheel 3 until we exceed our limit 150, to get the elements in wheel 4 up to 150: 1 2 3 5 7 11 13 17 19 23 25 29 31 37 41 43 47 49 53 59 61 67 71 73 77 79 83 89 91 97 101 103 107 109 113 119 121 127 131 133 137 139 143 149 The first number after 1 for this partial wheel 4 is 11; note it as a prime. Since we have finished with rolling, we delete 11 times each number in the partial wheel 4 until we exceed our limit 150, to get the elements in wheel 5 up to 150: 1 2 3 5 7 11 13 17 19 23 25 29 31 37 41 43 47 49 53 59 61 67 71 73 77 79 83 89 91 97 101 103 107 109 113 119 121 127 131 133 137 139 143 149 The first number after 1 for this partial wheel 5 is 13. Since 13 squared is at least our limit 150, we stop. The remaining numbers (other than 1) are the rest of the primes up to our limit 150. Just 8 composite numbers are removed, once each. The rest of the numbers considered (other than 1) are prime. In comparison, the natural version of Eratosthenes sieve (stopping at the same point) removes composite numbers 184 times. == Pseudocode == The sieve of Pritchard can be expressed in pseudocode, as follows: algorithm Sieve of Pritchard is input: an integer N >= 2. output: the set of prime numbers in {1,2,...,N}. let W and Pr be sets of integer values, and all other variables integer values. k, W, length, p, Pr := 1, {1}, 2, 3, {2}; {invariant: p = pk+1 and W = Wk ∩ {\displaystyle \cap } {1,2,...,N} and length = minimum of Pk,N and Pr = the primes up to pk} while p2 <= N do if (length < N) then Extend W,length to minimum of plength,N; Delete multiples of p from W; Insert p into Pr; k, p := k+1, next(W, 1) if (length < N) then Extend W,length to N; return Pr ∪ {\displaystyle \cup } W - {1}; where next(W, w) is the next value in the ordered set W after w. procedure Extend W,length to n is {in: W = Wk and length = Pk and n > length} {out: W = Wk → {\displaystyle \rightarrow } n and length = n} integer w, x; w, x := 1, length+1; while x <= n do Insert x into W; w := next(W,w); x := length + w; length := n; procedure Delete multiples of p from W,length is integer w; w := p; while pw <= length do w := next(W,w); while w > 1 do w := prev(W,w); Remove pw from W; where prev(W, w) is the previous value in the ordered set W before w. The algorithm can be initialized with W0 instead of W1 at the minor complication of making next(W, 1) a special case when k = 0. This a

    Read more →
  • Tensor (machine learning)

    Tensor (machine learning)

    In machine learning, the term tensor informally refers to two different concepts: (i) a way of organizing data and (ii) a multilinear (tensor) transformation. Data may be organized in a multidimensional array (M-way array), informally referred to as a "data tensor"; however, in the strict mathematical sense, a tensor is a multilinear mapping over a set of domain vector spaces to a range vector space. Observations, such as images, movies, volumes, sounds, and relationships among words and concepts, stored in an M-way array ("data tensor"), may be analyzed either by artificial neural networks or tensor methods. Tensor decomposition factors data tensors into smaller tensors. Operations on data tensors can be expressed in terms of matrix multiplication and the Kronecker product. The computation of gradients, a crucial aspect of backpropagation, can be performed using software libraries such as PyTorch and TensorFlow. Computations are often performed on graphics processing units (GPUs) using CUDA, and on dedicated hardware such as Google's Tensor Processing Unit or Nvidia's Tensor core. These developments have greatly accelerated neural network architectures, and increased the size and complexity of models that can be trained. == History == A tensor is by definition a multilinear map. In mathematics, this may express a multilinear relationship between sets of algebraic objects. In physics, tensor fields, considered as tensors at each point in space, are useful in expressing mechanics such as stress or elasticity. In machine learning, the exact use of tensors depends on the statistical approach being used. In 2001, the field of signal processing and statistics were making use of tensor methods. Pierre Comon surveys the early adoption of tensor methods in the fields of telecommunications, radio surveillance, chemometrics and sensor processing. Linear tensor rank methods (such as, Parafac/CANDECOMP) analyzed M-way arrays ("data tensors") composed of higher order statistics that were employed in blind source separation problems to compute a linear model of the data. He noted several early limitations in determining the tensor rank and efficient tensor rank decomposition. In the early 2000s, multilinear tensor methods crossed over into computer vision, computer graphics and machine learning with papers by Vasilescu or in collaboration with Terzopoulos, such as Human Motion Signatures, TensorFaces TensorTextures and Multilinear Projection. Multilinear algebra, the algebra of higher-order tensors, is a suitable and transparent framework for analyzing the multifactor structure of an ensemble of observations and for addressing the difficult problem of disentangling the causal factors based on second order or higher order statistics associated with each causal factor. Tensor (multilinear) factor analysis disentangles and reduces the influence of different causal factors with multilinear subspace learning. When treating an image or a video as a 2- or 3-way array, i.e., "data matrix/tensor", tensor methods reduce spatial or time redundancies as demonstrated by Wang and Ahuja. Yoshua Bengio, Geoff Hinton and their collaborators briefly discuss the relationship between deep neural networks and tensor factor analysis beyond the use of M-way arrays ("data tensors") as inputs. One of the early uses of tensors for neural networks appeared in natural language processing. A single word can be expressed as a vector via Word2vec. Thus a relationship between two words can be encoded in a matrix. However, for more complex relationships such as subject-object-verb, it is necessary to build higher-dimensional networks. In 2009, the work of Sutskever introduced Bayesian Clustered Tensor Factorization to model relational concepts while reducing the parameter space. From 2014 to 2015, tensor methods become more common in convolutional neural networks (CNNs). Tensor methods organize neural network weights in a "data tensor", analyze and reduce the number of neural network weights. Lebedev et al. accelerated CNN networks for character classification (the recognition of letters and digits in images) by using 4D kernel tensors. == Definition == Let F {\displaystyle \mathbb {F} } be a field (such as the real numbers R {\displaystyle \mathbb {R} } or the complex numbers C {\displaystyle \mathbb {C} } ). A tensor T ∈ F I 1 × I 2 × … × I C {\displaystyle {\mathcal {T}}\in {\mathbb {F} }^{I_{1}\times I_{2}\times \ldots \times I_{C}}} is a multilinear transformation from a set of domain vector spaces to a range vector space: T : { F I 1 × F I 2 × … F I C } ↦ F I 0 {\displaystyle {\mathcal {T}}:\{{\mathbb {F} }^{I_{1}}\times {\mathbb {F} }^{I_{2}}\times \ldots {\mathbb {F} }^{I_{C}}\}\mapsto {\mathbb {F} }^{I_{0}}} Here, C {\displaystyle C} and I 0 , I 1 , … , I C {\displaystyle I_{0},I_{1},\ldots ,I_{C}} are positive integers, and ( C + 1 ) {\displaystyle (C+1)} is the number of modes of a tensor (also known as the number of ways of a multi-way array). The dimensionality of mode c {\displaystyle c} is I c {\displaystyle I_{c}} , for 0 ≤ c ≤ C {\displaystyle 0\leq c\leq C} . In statistics and machine learning, an image is vectorized when viewed as a single observation, and a collection of vectorized images is organized as a "data tensor". For example, a set of facial images { d i p , i e , i l , i v ∈ R I X } {\displaystyle \{{\mathbb {d} }_{i_{p},i_{e},i_{l},i_{v}}\in {\mathbb {R} }^{I_{X}}\}} with I X {\displaystyle I_{X}} pixels that are the consequences of multiple causal factors, such as a facial geometry i p ( 1 ≤ i p ≤ I P ) {\displaystyle i_{p}(1\leq i_{p}\leq I_{P})} , an expression i e ( 1 ≤ i e ≤ I E ) {\displaystyle i_{e}(1\leq i_{e}\leq I_{E})} , an illumination condition i l ( 1 ≤ i l ≤ I L ) {\displaystyle i_{l}(1\leq i_{l}\leq I_{L})} , and a viewing condition i v ( 1 ≤ i v ≤ I V ) {\displaystyle i_{v}(1\leq i_{v}\leq I_{V})} may be organized into a data tensor (ie. multiway array) D ∈ R I X × I P × I E × I L × V {\displaystyle {\mathcal {D}}\in {\mathbb {R} }^{I_{X}\times I_{P}\times I_{E}\times I_{L}\times V}} where I P {\displaystyle I_{P}} are the total number of facial geometries, I E {\displaystyle I_{E}} are the total number of expressions, I L {\displaystyle I_{L}} are the total number of illumination conditions, and I V {\displaystyle I_{V}} are the total number of viewing conditions. Tensor factorizations methods such as TensorFaces and multilinear (tensor) independent component analysis factorizes the data tensor into a set of vector spaces that span the causal factor representations, where an image is the result of tensor transformation T {\displaystyle {\mathcal {T}}} that maps a set of causal factor representations to the pixel space. Another approach to using tensors in machine learning is to embed various data types directly. For example, a grayscale image, commonly represented as a discrete 2-way array D ∈ R I R X × I C X {\displaystyle {\mathbf {D} }\in {\mathbb {R} }^{I_{RX}\times I_{CX}}} with dimensionality I R X × I C X {\displaystyle I_{RX}\times I_{CX}} where I R X {\displaystyle I_{RX}} are the number of rows and I C X {\displaystyle I_{CX}} are the number of columns. When an image is treated as 2-way array or 2nd order tensor (i.e. as a collection of column/row observations), tensor factorization methods compute the image column space, the image row space and the normalized PCA coefficients or the ICA coefficients. Similarly, a color image with RGB channels, D ∈ R N × M × 3 . {\displaystyle {\mathcal {D}}\in \mathbb {R} ^{N\times M\times 3}.} may be viewed as a 3rd order data tensor or 3-way array.-------- In natural language processing, a word might be expressed as a vector v {\displaystyle v} via the Word2vec algorithm. Thus v {\displaystyle v} becomes a mode-1 tensor v ↦ A ∈ R N . {\displaystyle v\mapsto {\mathcal {A}}\in \mathbb {R} ^{N}.} The embedding of subject-object-verb semantics requires embedding relationships among three words. Because a word is itself a vector, subject-object-verb semantics could be expressed using mode-3 tensors v a × v b × v c ↦ A ∈ R N × N × N . {\displaystyle v_{a}\times v_{b}\times v_{c}\mapsto {\mathcal {A}}\in \mathbb {R} ^{N\times N\times N}.} In practice the neural network designer is primarily concerned with the specification of embeddings, the connection of tensor layers, and the operations performed on them in a network. Modern machine learning frameworks manage the optimization, tensor factorization and backpropagation automatically. === As unit values === Tensors may be used as the unit values of neural networks which extend the concept of scalar, vector and matrix values to multiple dimensions. The output value of single layer unit y m {\displaystyle y_{m}} is the sum-product of its input units and the connection weights filtered through the activation function f {\displaystyle f} : y m = f ( ∑ n x n u m , n ) , {\displaystyle y_{m}=f\left(\sum _{n}x_{n}u_{m,n}\right),} where y m ∈ R .

    Read more →
  • XOR swap algorithm

    XOR swap algorithm

    In computer programming, the exclusive or swap (sometimes shortened to XOR swap) is an algorithm that uses the exclusive or bitwise operation to swap the values of two variables without using the temporary variable which is normally required. The algorithm is primarily a novelty and a way of demonstrating properties of the exclusive or operation. It is sometimes discussed as a program optimization, but there are almost no cases where swapping via exclusive or provides benefit over the standard, obvious technique. == The algorithm == Conventional swapping requires the use of a temporary storage variable. Using the XOR swap algorithm, however, no temporary storage is needed. The algorithm is as follows: Since XOR is a commutative operation, either X XOR Y or Y XOR X can be used interchangeably in any of the foregoing three lines. Note that on some architectures the first operand of the XOR instruction specifies the target location at which the result of the operation is stored, preventing this interchangeability. The algorithm typically corresponds to three machine-code instructions, represented by corresponding pseudocode and assembly instructions in the three rows of the following table: In the above System/370 assembly code sample, R1 and R2 are distinct registers, and each XR operation leaves its result in the register named in the first argument. Using x86 assembly, values X and Y are in registers eax and ebx (respectively), and xor places the result of the operation in the first register (Note: x86 supports XCHG instruction so using triple XOR do not make sense on this architecture). In RISC-V assembly, value X and Y are in registers x10 and x11, and xor places the result of the operation in the first operand. However, in the pseudocode or high-level language version or implementation, the algorithm fails if x and y use the same storage location, since the value stored in that location will be zeroed out by the first XOR instruction, and then remain zero; it will not be "swapped with itself". This is not the same as if x and y have the same values. The trouble only comes when x and y use the same storage location, in which case their values must already be equal. That is, if x and y use the same storage location, then the line: sets x to zero (because x = y so X XOR Y is zero) and sets y to zero (since it uses the same storage location), causing x and y to lose their original values. == Proof of correctness == The binary operation XOR over bit strings of length N {\displaystyle N} exhibits the following properties (where ⊕ {\displaystyle \oplus } denotes XOR): L1. Commutativity: A ⊕ B = B ⊕ A {\displaystyle A\oplus B=B\oplus A} L2. Associativity: ( A ⊕ B ) ⊕ C = A ⊕ ( B ⊕ C ) {\displaystyle (A\oplus B)\oplus C=A\oplus (B\oplus C)} L3. Identity exists: there is a bit string, 0, (of length N) such that A ⊕ 0 = A {\displaystyle A\oplus 0=A} for any A {\displaystyle A} L4. Each element is its own inverse: for each A {\displaystyle A} , A ⊕ A = 0 {\displaystyle A\oplus A=0} . Suppose that we have two distinct registers R1 and R2 as in the table below, with initial values A and B respectively. We perform the operations below in sequence, and reduce our results using the properties listed above. === Linear algebra interpretation === As XOR can be interpreted as binary addition and a pair of bits can be interpreted as a vector in a two-dimensional vector space over the field with two elements, the steps in the algorithm can be interpreted as multiplication by 2×2 matrices over the field with two elements. For simplicity, assume initially that x and y are each single bits, not bit vectors. For example, the step: which also has the implicit: corresponds to the matrix ( 1 1 0 1 ) {\displaystyle \left({\begin{smallmatrix}1&1\\0&1\end{smallmatrix}}\right)} as ( 1 1 0 1 ) ( x y ) = ( x + y y ) . {\displaystyle {\begin{pmatrix}1&1\\0&1\end{pmatrix}}{\begin{pmatrix}x\\y\end{pmatrix}}={\begin{pmatrix}x+y\\y\end{pmatrix}}.} The sequence of operations is then expressed as: ( 1 1 0 1 ) ( 1 0 1 1 ) ( 1 1 0 1 ) = ( 0 1 1 0 ) {\displaystyle {\begin{pmatrix}1&1\\0&1\end{pmatrix}}{\begin{pmatrix}1&0\\1&1\end{pmatrix}}{\begin{pmatrix}1&1\\0&1\end{pmatrix}}={\begin{pmatrix}0&1\\1&0\end{pmatrix}}} (working with binary values, so 1 + 1 = 0 {\displaystyle 1+1=0} ), which expresses the elementary matrix of switching two rows (or columns) in terms of the transvections (shears) of adding one element to the other. To generalize to where X and Y are not single bits, but instead bit vectors of length n, these 2×2 matrices are replaced by 2n×2n block matrices such as ( I n I n 0 I n ) . {\displaystyle \left({\begin{smallmatrix}I_{n}&I_{n}\\0&I_{n}\end{smallmatrix}}\right).} These matrices are operating on values, not on variables (with storage locations), hence this interpretation abstracts away from issues of storage location and the problem of both variables sharing the same storage location. == Code example == A C function that implements the XOR swap algorithm: The code first checks if the addresses are distinct and uses a guard clause to exit the function early if they are equal. Without that check, if they were equal, the algorithm would fold to a triple x ^= x resulting in zero. == Reasons for avoidance in practice == On modern CPU architectures, the XOR technique can be slower than using a temporary variable to do swapping. At least on recent x86 CPUs, both by AMD and Intel, moving between registers regularly incurs zero latency. (This is called MOV-elimination.) Even if there is not any architectural register available to use, the XCHG instruction will be at least as fast as the three XORs taken together. Another reason is that modern CPUs strive to execute instructions in parallel via instruction pipelines. In the XOR technique, the inputs to each operation depend on the results of the previous operation, so they must be executed in strictly sequential order, negating any benefits of instruction-level parallelism. === Aliasing === The XOR swap is also complicated in practice by aliasing. If an attempt is made to XOR-swap the contents of some location with itself, the result is that the location is zeroed out and its value lost. Therefore, XOR swapping must not be used blindly in a high-level language if aliasing is possible. This issue does not apply if the technique is used in assembly to swap the contents of two registers. Similar problems occur with call by name, as in Jensen's Device, where swapping i and A[i] via a temporary variable yields incorrect results due to the arguments being related: swapping via temp = i; i = A[i]; A[i] = temp changes the value for i in the second statement, which then results in the incorrect i value for A[i] in the third statement. == Variations == The underlying principle of the XOR swap algorithm can be applied to any operation meeting criteria L1 through L4 above. Replacing XOR by addition and subtraction gives various slightly different, but largely equivalent, formulations. For example: Unlike the XOR swap, this variation requires that the underlying processor or programming language uses a method such as modular arithmetic or bignums to guarantee that the computation of X + Y cannot cause an error due to integer overflow. Therefore, it is seen even more rarely in practice than the XOR swap. However, the implementation of AddSwap above in the C programming language always works even in case of integer overflow, since, according to the C standard, addition and subtraction of unsigned integers follow the rules of modular arithmetic, i. e. are done in the cyclic group Z / 2 s Z {\displaystyle \mathbb {Z} /2^{s}\mathbb {Z} } where s {\displaystyle s} is the number of bits of unsigned int. Indeed, the correctness of the algorithm follows from the fact that the formulas ( x + y ) − y = x {\displaystyle (x+y)-y=x} and ( x + y ) − ( ( x + y ) − y ) = y {\displaystyle (x+y)-((x+y)-y)=y} hold in any abelian group. This generalizes the proof for the XOR swap algorithm: XOR is both the addition and subtraction in the abelian group ( Z / 2 Z ) s {\displaystyle (\mathbb {Z} /2\mathbb {Z} )^{s}} (which is the direct sum of s copies of Z / 2 Z {\displaystyle \mathbb {Z} /2\mathbb {Z} } ). This doesn't hold when dealing with the signed int type (the default for int). Signed integer overflow is an undefined behavior in C and thus modular arithmetic is not guaranteed by the standard, which may lead to incorrect results. The sequence of operations in AddSwap can be expressed via matrix multiplication as: ( 1 − 1 0 1 ) ( 1 0 1 − 1 ) ( 1 1 0 1 ) = ( 0 1 1 0 ) {\displaystyle {\begin{pmatrix}1&-1\\0&1\end{pmatrix}}{\begin{pmatrix}1&0\\1&-1\end{pmatrix}}{\begin{pmatrix}1&1\\0&1\end{pmatrix}}={\begin{pmatrix}0&1\\1&0\end{pmatrix}}} == Application to register allocation == On architectures lacking a dedicated swap instruction, because it avoids the extra temporary register, the XOR swap algorithm is required for optimal register allocatio

    Read more →
  • Scriptella

    Scriptella

    Scriptella is an open source extract transform load (ETL) and script execution tool written in Java. It allows the use of SQL or another scripting language suitable for the data source to perform required transformations. Scriptella does not offer any graphical user interface. == Typical use == Database migration. Database creation/update scripts. Cross-database ETL operations, import/export. Alternative for Ant task. Automated database schema upgrade. == Features == Simple XML syntax for scripts. Add dynamics to your existing SQL scripts by creating a thin wrapper XML file: Support for multiple datasources (or multiple connections to a single database) in an ETL file. Support for many useful JDBC features, e.g. parameters in SQL including file blobs and JDBC escaping. Performance and low memory usage are one of the primary goals. Support for evaluated expressions and properties (JEXL syntax) Support for cross-database ETL scripts by using elements Transactional execution Error handling via elements Conditional scripts/queries execution (similar to Ant if/unless attributes but more powerful) Easy-to-Use as a standalone tool or Ant task, without deployment or installation. Easy-To-Run ETL files directly from Java code. Built-in adapters for popular databases for a tight integration. Support for any database with JDBC/ODBC compliant driver. Service Provider Interface (SPI) for interoperability with non-JDBC DataSources and integration with scripting languages. Out of the box support for JSR 223 (Scripting for the Java Platform) compatible languages. Built-in CSV, TEXT, XML, LDAP, Lucene, Velocity, JEXL and Janino providers. Integration with Java EE, Spring Framework, JMX and JNDI for enterprise ready scripts.

    Read more →