AI Chat Free Online

AI Chat Free Online — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Image stitching

    Image stitching

    Image stitching or photo stitching is the process of combining multiple photographic images with overlapping fields of view to produce a segmented panorama or high-resolution image. Commonly performed through the use of computer software, most approaches to image stitching require nearly exact overlaps between images and identical exposures to produce seamless results, although some stitching algorithms actually benefit from differently exposed images by doing high-dynamic-range imaging in regions of overlap. Some digital cameras can stitch their photos internally. == Applications == Image stitching is widely used in modern applications, such as the following: Document mosaicing Image stabilization feature in camcorders that use frame-rate image alignment High-resolution image mosaics in digital maps and satellite imagery Medical imaging Multiple-image super-resolution imaging Video stitching Object insertion == Process == The image stitching process can be divided into three main components: image registration, calibration, and blending. === Image stitching algorithms === In order to estimate image alignment, algorithms are needed to determine the appropriate mathematical model relating pixel coordinates in one image to pixel coordinates in another. Algorithms that combine direct pixel-to-pixel comparisons with gradient descent (and other optimization techniques) can be used to estimate these parameters. Distinctive features can be found in each image and then efficiently matched to rapidly establish correspondences between pairs of images. When multiple images exist in a panorama, techniques have been developed to compute a globally consistent set of alignments and to efficiently discover which images overlap one another. A final compositing surface onto which to warp or projectively transform and place all of the aligned images is needed, as are algorithms to seamlessly blend the overlapping images, even in the presence of parallax, lens distortion, scene motion, and exposure differences. === Image stitching issues === Since the illumination in two views cannot be guaranteed to be identical, stitching two images could create a visible seam. Other reasons for seams could be the background changing between two images for the same continuous foreground. Other major issues to deal with are the presence of parallax, lens distortion, scene motion, and exposure differences. In a non-ideal real-life case, the intensity varies across the whole scene, and so does the contrast and intensity across frames. Additionally, the aspect ratio of a panorama image needs to be taken into account to create a visually pleasing composite. For panoramic stitching, the ideal set of images will have a reasonable amount of overlap (at least 15–30%) to overcome lens distortion and have enough detectable features. The set of images will have consistent exposure between frames to minimize the probability of seams occurring. === Keypoint detection === Feature detection is necessary to automatically find correspondences between images. Robust correspondences are required in order to estimate the necessary transformation to align an image with the image it is being composited on. Corners, blobs, Harris corners, and differences of Gaussians of Harris corners are good features since they are repeatable and distinct. One of the first operators for interest point detection was developed by Hans Moravec in 1977 for his research involving the automatic navigation of a robot through a clustered environment. Moravec also defined the concept of "points of interest" in an image and concluded these interest points could be used to find matching regions in different images. The Moravec operator is considered to be a corner detector because it defines interest points as points where there are large intensity variations in all directions. This often is the case at corners. However, Moravec was not specifically interested in finding corners, just distinct regions in an image that could be used to register consecutive image frames. Harris and Stephens improved upon Moravec's corner detector by considering the differential of the corner score with respect to direction directly. They needed it as a processing step to build interpretations of a robot's environment based on image sequences. Like Moravec, they needed a method to match corresponding points in consecutive image frames, but were interested in tracking both corners and edges between frames. SIFT and SURF are recent key-point or interest point detector algorithms but a point to note is that SURF is patented and its commercial usage restricted. Once a feature has been detected, a descriptor method like SIFT descriptor can be applied to later match them. === Registration === Image registration involves matching features in a set of images or using direct alignment methods to search for image alignments that minimize the sum of absolute differences between overlapping pixels. When using direct alignment methods one might first calibrate one's images to get better results. Additionally, users may input a rough model of the panorama to help the feature matching stage, so that e.g. only neighboring images are searched for matching features. Since there are smaller group of features for matching, the result of the search is more accurate and execution of the comparison is faster. To estimate a robust model from the data, a common method used is known as RANSAC. The name RANSAC is an abbreviation for "RANdom SAmple Consensus". It is an iterative method for robust parameter estimation to fit mathematical models from sets of observed data points which may contain outliers. The algorithm is non-deterministic in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are performed. It being a probabilistic method means that different results will be obtained for every time the algorithm is run. The RANSAC algorithm has found many applications in computer vision, including the simultaneous solving of the correspondence problem and the estimation of the fundamental matrix related to a pair of stereo cameras. The basic assumption of the method is that the data consists of "inliers", i.e., data whose distribution can be explained by some mathematical model, and "outliers" which are data that do not fit the model. Outliers are considered points which come from noise, erroneous measurements, or simply incorrect data. For the problem of homography estimation, RANSAC works by trying to fit several models using some of the point pairs and then checking if the models were able to relate most of the points. The best model – the homography, which produces the highest number of correct matches – is then chosen as the answer for the problem; thus, if the ratio of number of outliers to data points is very low, the RANSAC outputs a decent model fitting the data. === Calibration === Image calibration aims to minimize differences between an ideal lens models and the camera-lens combination that was used, optical defects such as distortions, exposure differences between images, vignetting, camera response and chromatic aberrations. If feature detection methods were used to register images and absolute positions of the features were recorded and saved, stitching software may use the data for geometric optimization of the images in addition to placing the images on the panosphere. Panotools and its various derivative programs use this method. ==== Alignment ==== Alignment may be necessary to transform an image to match the view point of the image it is being composited with. Alignment, in simple terms, is a change in the coordinates system so that it adopts a new coordinate system which outputs image matching the required viewpoint. The types of transformations an image may go through are pure translation, pure rotation, a similarity transform which includes translation, rotation and scaling of the image which needs to be transformed, Affine or projective transform. Projective transformation is the farthest an image can transform (in the set of two dimensional planar transformations), where only visible features that are preserved in the transformed image are straight lines whereas parallelism is maintained in an affine transform. Projective transformation can be mathematically described as x ′ = H ⋅ x , {\displaystyle x'=H\cdot x,} where x {\displaystyle x} is points in the old coordinate system, x ′ {\displaystyle x'} is the corresponding points in the transformed image and H {\displaystyle H} is the homography matrix. Expressing the points x {\displaystyle x} and x ′ {\displaystyle x'} using the camera intrinsics ( K {\displaystyle K} and K ′ {\displaystyle K'} ) and its rotation and translation [ R t ] {\displaystyle [R\,t]} to the real-world coordinates X {\displaystyle X} and < m a t h > x {\displaystyle x} and x ′ {\displaystyle x'} ', we get Using the abo

    Read more →
  • Vinelink.com

    Vinelink.com

    Vinelink.com (VINE) is a national website in the United States that allows victims of crime, and the general public, to track the movements of prisoners held by the various states and territories. The first four letters in the websites name, "vine", are an acronym for "Victim Information and Notification Everyday". Vinelink.com displays information, based on the information provided by the various states' departments of correction and other law enforcement agencies, on whether an inmate is in custody, has been released, has been granted parole or probation, or has escaped from custody. In some cases, the website will reveal whether a defendant has been granted parole or probation, but then subsequently violated conditions of their release and become a fugitive. Information provided on Vinelink.com represents metadata, in that the website lists a defendant's custody status; but does not list what the individual is charged with, their criminal history, or the amount of their bail, if applicable. Internet users accessing the Vinelink.com website choose from a map of states and provinces within the United States where they wish to perform a search for an inmate. The user may then search for an individual using the inmate's or parolee's name, or by entering the inmate's specific department of corrections inmate number, if known. When the inmate's custody status changes, users who have registered to be notified of such changes will be notified via email, phone or both. This information is currently released upon request, without the website requesting reasons for the users search or requiring payment, as public records available to the general public. Inmate information is available for most states, and for Puerto Rico, on the website. The states of Arizona, Georgia, Massachusetts, Montana, New Hampshire and West Virginia provide very limited information on the site. In March of 2025, The Maine Sheriff's Association entered into a contract to pilot the use of the VINE system in three counties in the state as well as a regional jail, therefore making South Dakota the only state that does not participate in the VINE system to any degree. The website does not provide data on prisoners detained by the Federal Bureau of Prisons which has its own inmate locator web site nor for inmates of the U.S. military prisons.

    Read more →
  • MaPS S.A.

    MaPS S.A.

    MaPS S.A. is a software editor founded in 2011 by Thierry Muller. The company is headquartered in Luxembourg. Its platform, called MaPS System, provides Data Management software for Multichannel Marketing. == History and funding == The first version of MaPS System was released under the agency Prem1um S.A. in 2005 in the partnership with Pingroom agency. In combination with MaPS System, Prem1um also provided various consulting services in Marketing, Publishing and Sales. This is where MaPS System takes its names (M stands for Marketing, P for Publishing and S for Sales). In 2011, after being successful, Prem1um S.A. decided to enable the software MaPS System to operate independently under MaPS S.A., as a separate company and editor of the software. The first financial supports were provided by Malta ICI, a Venture Capital firm, and the local partner Chameleon Invest, a seed-capital fund led by Business Angels, who invested €900,000. In a second investment round in 2014 led by Newion Investments, a Venture Capital firm, €1.4 Million were raised, thus amounting to total assets of €2.2 Million. In 2016, the company was taken over by three private investors. In 2018, after two years of continuous growth and European expansion in Belgium, Germany and Switzerland, MaPS S.A acquired Awevo, an e-commerce web agency. == Products == The services included in MaPS System range from the data centralization, Data Governance to an optimized Multichannel Marketing. The software currently includes more than 35 modules for Master Data Management, Product Information Management, Digital Asset Management, Business Process Management including catalogue Publishing features. == Certifications == In 2019, MaPS System and Awevo received "Made in Luxembourg" label, given to the companies whose services are entirely designed in Luxembourg, without any production or development offshoring. MaPS System is a member of ICT Cluster by Luxinnovation.

    Read more →
  • BRS/Search

    BRS/Search

    BRS/Search is a full-text database and information retrieval system. BRS/Search uses a fully inverted indexing system to store, locate, and retrieve unstructured data. It was the search engine that in 1977 powered Bibliographic Retrieval Services (BRS) commercial operations with 20 databases (including the first national commercial availability of MEDLINE); it has changed ownership several times during its development and is currently sold as Livelink ECM Discovery Server by Open Text Corporation. == Early development == Development on what was to become BRS began as Biomedical Communications Network (BCN) at the State University of New York at Albany (SUNY). BCN, which went online in 1968, provided on-line access to nine databases, including MEDLINE and BIOSIS Previews, to large universities and medical schools primarily in the Northeast of the USA. State funding for the project was withdrawn in 1975, and Bibliographic Retrieval Services (BRS) was formed as a non-profit concern the following year. It was incorporated in May 1976 as a for-profit corporation with Ron Quake as president, Jan Egeland as vice president in charge of marketing and training, and Lloyd Palmer as vice president of systems. == BRS commercial operations == In December 1976, the First BRS User Meeting was held in Syracuse, New York, and by January 1977 BRS started commercial operations with 20 databases (including the first national commercial availability of MEDLINE) and 9 million records, using modified IBM STAIRS (STorage And Information Retrieval System) software, Telenet for telecommunications, and timesharing mainframe computers of Carrier Corporation. In October 1980 BRS was sold by Egeland and Quake to Indian Head, Inc., a subsidiary of the Dutch company Thyssen-Bornemisza Group. == 1989–1993 == In 1989 Robert Maxwell acquired BRS and the BRS/Search software; he announced the planned incorporation of the ORBIT Search Service and BRS Information Technologies and renamed the whole group Maxwell Online, Inc. At that time BRS Information Technologies was serving the medical and academic library marketplace with over 150 databases. Maxwell later bought the publishing company Macmillan and put Maxwell Online under Macmillan. In the same year BRS/LINK (hypertext connection of databases; first application delivering full text) was announced. The initial BRS/LINK application "relates the citation in a bibliographic database to its full-text article in a second database," and "eliminates the need to re-execute a search strategy in the second database in order to find the corresponding full-text article." Initially BRS/LINK supported linking only selected bibliographic databases: MEDLINE, Health Planning and Administration, and MEDLINE References on AIDS to the full-text Comprehensive Core Medical Library. At the time of Robert Maxwell’s death in 1991, Macmillan brought in Andrew Gregory to represent the company during the 2 years that Maxwell’s affairs were being settled and to prepare Maxwell Online to be able to sell the components. Maxwell Online shortly thereafter underwent yet another name change, this time to InfoPro Technologies. == Dataware Technologies ownership of BRS/SEARCH == Early in 1994, InfoPro Technologies, a subsidiary of MHC Inc. (holding company for Macmillan Inc.), the former Maxwell Online service, sold off all its subsidiaries. ORBIT Search Services went to the French-owned Questel, the dial-up BRS Search Services to CD Plus Technologies (later to become OVID), and BRS Software Products (including BRS/SEARCH) to Dataware Technologies. Almost up to the end of InfoPro Technologies, BRS Software had been the fastest growing segment of the company. At the 14th BRS North American Users Group Conference in 1999, Dave Schubmehl of Dataware Technologies presented a paper in which he stated "The purpose of this presentation is to update BRS users on upcoming releases of BRS/Search, NetAnswer, and other Dataware products. BRS/Search 7.0 will include features specifically requested by customers, as well as other enhancements. Earlier this year, Dataware acquired Sovereign Hill Software, makers of InQuery. In light of that acquisition, and Dataware's other development projects, we'll look at Dataware's plans for all products, including BRS/Search and NetAnswer." == Open Text acquisition of BRS/Search == In 2001 BRS/Search was acquired by Open Text and became LiveLink ECM Discovery Server. It is now referred to as Open Text Discovery Server. Open Text still supports both BRS/Search and NetAnswer. The core BRS/Search technology in the Open Text portfolio was augmented with other capabilities through various acquisitions. For example, Dataware's acquisition of Sovereign-Hill brought InQuery, “a probabilistic information retrieval system using an inference network”, which was developed by the University of Massachusetts Amherst Center for Intelligent Information Retrieval] out of the UMass CIIR and into the marketplace. A product re-branding table shows the range of products, their old names and their new names. InQuery is a concept search engine that uses noun phrases, parts of speech and other co-occurrence relationships in overlapping passages of text rather than single term inverted indexes of single words in documents. Open Text's portfolio has grown to include Hummingbird Content Management, and has always included BASIS. == 2003 == BRS/Search North America User's Group (BRSNAUG) website with a June 8, 2003 date listed the following features for BRS/Search. The BRSNAUG also disincorporated in 2003. Cross-references to BRS/Search on the World Wide Web point to Open Text Livelink. Engine features include: Rapid query response time. Numerical data handling and elementary statistical processing (sum, avg, min, max) Search results weighting and relevancy ranking Left- and right-truncation and expansion of search terms Superior data compression – loaded databases typically use only about 1.5 times the input stream size in disk space Large capacity databases – up to 100 million documents, each with up to 65,000 paragraphs Fine control of indexing and searching – right down to the word, sentence, and paragraph level Fine control over data security. Document access can be controlled at the database, document, and paragraph level International language support for all 7/8 bit characters sets and customizable language tables Flexible and customizable stop word lists ANSI-compatible thesauri Hypertext links within and between documents and databases (R6.x) Support for natural language parsing of queries Automatic document summarization tools Client/Server development Programming interfaces for World-Wide Web (HTTP, HTML) access to databases

    Read more →
  • Machine-learned interatomic potential

    Machine-learned interatomic potential

    Machine-learned interatomic potentials (MLIPs), or simply machine learning potentials (MLPs), are interatomic potentials constructed using machine learning. Beginning in the 1990s, researchers have employed such programs to construct interatomic potentials by mapping atomic structures to their potential energies. These potentials are referred to as MLIPs or MLPs. Such machine learning potentials promised to fill the gap between density functional theory, a highly accurate but computationally intensive modelling method, and empirically derived or intuitively-approximated potentials, which were far lighter computationally but substantially less accurate. Improvements in artificial intelligence technology heightened the accuracy of MLPs while lowering their computational cost, increasing the role of machine learning in fitting potentials. Machine learning potentials began by using neural networks to tackle low-dimensional systems. While promising, these models could not systematically account for interatomic energy interactions; they could be applied to small molecules in a vacuum, or molecules interacting with frozen surfaces, but not much else – and even in these applications, the models often relied on force fields or potentials derived empirically or with simulations. These models thus remained confined to academia. Modern neural networks construct highly accurate and computationally light potentials, as theoretical understanding of materials science was increasingly built into their architectures and preprocessing. Almost all are local, accounting for all interactions between an atom and its neighbor up to some cutoff radius. There exist some nonlocal models, but these have been experimental for almost a decade. For most systems, reasonable cutoff radii enable highly accurate results. Almost all neural networks intake atomic coordinates and output potential energies. For some, these atomic coordinates are converted into atom-centered symmetry functions. From this data, a separate atomic neural network is trained for each element; each atomic network is evaluated whenever that element occurs in the given structure, and then the results are pooled together at the end. This process – in particular, the atom-centered symmetry functions which convey translational, rotational, and permutational invariances – has greatly improved machine learning potentials by significantly constraining the neural network search space. Other models use a similar process but emphasize bonds over atoms, using pair symmetry functions and training one network per atom pair. Other models to learn their own descriptors rather than using predetermined symmetry-dictating functions. These models, called message-passing neural networks (MPNNs), are graph neural networks. Treating molecules as three-dimensional graphs (where atoms are nodes and bonds are edges), the model takes feature vectors describing the atoms as input, and iteratively updates these vectors as information about neighboring atoms is processed through message functions and convolutions. These feature vectors are then used to predict the final potentials. The flexibility of this method often results in stronger, more generalizable models. In 2017, the first-ever MPNN model (a deep tensor neural network) was used to calculate the properties of small organic molecules. == Gaussian Approximation Potential (GAP) == One popular class of machine-learned interatomic potential is the Gaussian Approximation Potential (GAP), which combines compact descriptors of local atomic environments with Gaussian process regression to machine learn the potential energy surface of a given system. To date, the GAP framework has been used to successfully develop a number of MLIPs for various systems, including for elemental systems such as carbon, silicon, phosphorus, and tungsten, as well as for multicomponent systems such as Ge2Sb2Te5 and austenitic stainless steel, Fe7Cr2Ni. == Equivariant graph neural networks == A significant limitation of early MPNNs was that they were not inherently equivariant to rotations and reflections of atomic structures — meaning predictions could change depending on how a molecule was oriented in space. Beginning around 2021, a new class of models addressed this by incorporating equivariance directly into the message-passing layers using spherical harmonics and irreducible representations. Notable examples include NequIP (2021), MACE (2022), and GemNet-OC (2022). These equivariant architectures proved substantially more data-efficient and accurate than their predecessors, and became the dominant paradigm for high-accuracy MLIPs. == Universal MLIPs and large-scale datasets == Early MLIPs were system-specific, trained on a few thousand structures of a single material. A major shift occurred with the creation of large, chemically diverse datasets enabling models that generalize across many elements, bonding environments, and application domains — so-called universal MLIPs. A key driver was the Open Catalyst Project (OC20, OC22), a collaboration between Meta AI (FAIR) and Carnegie Mellon University launched in 2020. OC20 comprises approximately 1.3 million DFT relaxations across 82 elements, designed to accelerate the discovery of catalysts for renewable energy applications. It was among the first datasets large enough to train GNNs that generalize across diverse chemical systems, and established a widely-used benchmark for the field. A subsequent dataset, Open Direct Air Capture (OpenDAC 2023 and OpenDAC 2025), applied the same approach to carbon capture, providing a large computational database of metal-organic frameworks and sorbent candidates evaluated for CO₂ capture, generated using nearly 400 million CPU hours of quantum chemistry calculations in collaboration with Georgia Tech. These datasets revealed a new challenge: the GNN architectures most effective for atomic simulations were memory-intensive, as they model higher-order interactions between triplets or quadruplets of atoms, making it difficult to scale model size. Graph Parallelism, introduced by Sriram et al. (ICLR 2022), addressed this by distributing a single input graph across multiple GPUs — a distinct strategy from data parallelism (which distributes training examples) or model parallelism (which distributes layers). This enabled training GNNs with hundreds of millions to billions of parameters for the first time. Building on these foundations, Meta FAIR released the Universal Model for Atoms (UMA) in 2025, trained on approximately 500 million unique 3D atomic structures spanning molecules, materials, and catalysts — the largest training run to date for an MLIP. UMA introduced a Mixture of Linear Experts (MoLE) architecture, enabling one model to learn from datasets generated by different DFT codes and settings without significant inference overhead. It matches or surpasses specialized models across catalysis, materials, and molecular benchmarks without task-specific fine-tuning, and has been described as marking a "pre/post-UMA" divide in the field. == Applications == Catalyst discovery: MLIPs have significantly accelerated the computational screening of heterogeneous catalysts by replacing expensive DFT relaxations with fast neural network surrogates. The Open Catalyst Project explicitly targets this application, aiming to identify new catalysts for green hydrogen production and other renewable energy reactions. Carbon capture: The OpenDAC project applies universal MLIPs to screening sorbent materials for direct air capture of CO₂, a key technology for climate change mitigation. AI-accelerated screening allows evaluation of orders of magnitude more candidate materials than traditional DFT workflows. Drug discovery and molecular design: MLIPs are increasingly used in pharmaceutical research to model molecular conformations and binding energies. The Open Molecules 2025 (OMol25) dataset, released by Meta FAIR in 2025, provides high-accuracy calculations for a large set of molecular systems to support this use case. Materials discovery: Universal MLIPs enable high-throughput screening of novel inorganic materials, including battery electrolytes, semiconductors, and superconductors, by rapidly estimating stability and properties across large chemical spaces.

    Read more →
  • Reservoir sampling

    Reservoir sampling

    Reservoir sampling is a family of randomized algorithms for choosing a simple random sample, without replacement, of k items from a population of unknown size n in a single pass over the items. The size of the population n is not known to the algorithm and is typically too large for all n items to fit into main memory. The population is revealed to the algorithm over time, and the algorithm cannot look back at previous items. At any point, the current state of the algorithm must permit extraction of a simple random sample without replacement of size k over the part of the population seen so far. == Motivation == Suppose we see a sequence of items, one at a time. We want to keep 10 items in memory, and we want them to be selected at random from the sequence. If we know the total number of items n and can access the items arbitrarily, then the solution is easy: select 10 distinct indices i between 1 and n with equal probability, and keep the i-th elements. The problem is that we do not always know the exact n in advance. == Simple: Algorithm R == A simple and popular but slow algorithm, Algorithm R, was created by Jeffrey Vitter. Initialize an array R {\displaystyle R} indexed from 1 {\displaystyle 1} to k {\displaystyle k} , containing the first k items of the input x 1 , . . . , x k {\displaystyle x_{1},...,x_{k}} . This is the reservoir. For each new input x i {\displaystyle x_{i}} , generate a random number j uniformly in { 1 , . . . , i } {\displaystyle \{1,...,i\}} . If j ∈ { 1 , . . . , k } {\displaystyle j\in \{1,...,k\}} , then set R [ j ] := x i . {\displaystyle R[j]:=x_{i}.} Otherwise, discard x i {\displaystyle x_{i}} . Return R {\displaystyle R} after all inputs are processed. This algorithm works by induction on i ≥ k {\displaystyle i\geq k} . While conceptually simple and easy to understand, this algorithm needs to generate a random number for each item of the input, including the items that are discarded. The algorithm's asymptotic running time is thus O ( n ) {\displaystyle O(n)} . Generating this amount of randomness and the linear run time causes the algorithm to be unnecessarily slow if the input population is large. This is Algorithm R, implemented as follows: == Optimal: Algorithm L == If we generate n {\displaystyle n} random numbers u 1 , . . . , u n ∼ U [ 0 , 1 ] {\displaystyle u_{1},...,u_{n}\sim U[0,1]} independently, then the indices of the smallest k {\displaystyle k} of them is a uniform sample of the k {\displaystyle k} -subsets of { 1 , . . . , n } {\displaystyle \{1,...,n\}} . The process can be done without knowing n {\displaystyle n} : Keep the smallest k {\displaystyle k} of u 1 , . . . , u i {\displaystyle u_{1},...,u_{i}} that has been seen so far, as well as w i {\displaystyle w_{i}} , the index of the largest among them. For each new u i + 1 {\displaystyle u_{i+1}} , compare it with u w i {\displaystyle u_{w_{i}}} . If u i + 1 < u w i {\displaystyle u_{i+1} Read more →

  • Schema crosswalk

    Schema crosswalk

    A schema crosswalk is a table that shows equivalent elements (or "fields") in more than one database schema. It maps the elements in one schema to the equivalent elements in another. Crosswalk tables are often employed within or in parallel to enterprise systems, especially when multiple systems are interfaced or when the system includes legacy system data. In the context of Interfaces, they function as an internal extract, transform, load (ETL) mechanism. For example, this is a metadata crosswalk from MARC standards to Dublin Core: Crosswalks show people where to put the data from one scheme into a different scheme. They are often used by libraries, archives, museums, and other cultural institutions to translate data to or from MARC standards, Dublin Core, Text Encoding Initiative (TEI), and other metadata schemes. For example, an archive has a MARC record in its catalog describing a manuscript. Suppose the archive makes a digital copy of that manuscript and wants to display it on the web along with the information from the catalog. In that case, it will have to translate the data from the MARC catalog record into a different format, such as Metadata Object Description Schema, that is viewable on a webpage. Because MARC has various fields than MODS, decisions must be made about where to put the data into MODS. This type of "translating" from one format to another is often called "metadata mapping" or "field mapping," and is related to "data mapping", and "semantic mapping". Crosswalks also have several technical capabilities. They help databases using different metadata schemes to share information. They help metadata harvesters create union catalogs. They enable search engines to search multiple databases simultaneously with a single query. == Challenges for crosswalks == One of the biggest challenges for crosswalks is that no two metadata schemes are 100% equivalent. One scheme may have a field that doesn't exist in another scheme or a field that is split into two different fields in another scheme; this is why data is often lost when mapping from a complex scheme to a simpler one. For example, when mapping from MARC to Simple Dublin Core, the distinction between types of titles is lost: Simple Dublin Core only has one "Title" element, so all of the different types of MARC titles get lumped together without further distinctions. A future attempt to convert the metadata back into MARC would enter the information in the basic MARC 245 Title Statement field, with none of the original distinctions. This is why crosswalks are said to be "lateral" (one-way) mappings from one scheme to another. Separate crosswalks would be required to map from scheme A to scheme B and from scheme B to scheme A. === Difficulties in mapping === Other mapping problems arise when: One scheme has one element that needs to be split up with different parts of it placed in multiple other elements in the second scheme ("one-to-many" mapping) One scheme allows an element to be repeated more than once while another only allows that element to appear once with multiple terms in it Schemes have different data formats (e.g. John Doe or Doe, John) An element in one scheme is indexed, but the equivalent element in the other scheme is not Schemes may use different controlled vocabularies Schemes change their standards over time Some of these problems are not fixable. As Karen Coyle says in "Crosswalking Citation Metadata: The University of California's Experience," "The more metadata experience we have, the more it becomes clear that metadata perfection is not attainable, and anyone who attempts it will be sorely disappointed. When metadata is crosswalked between two or more unrelated sources, there will be data elements that cannot be reconciled in an ideal manner. The key to a successful metadata crosswalk is intelligent flexibility. It is essential to focus on the important goals and be willing to compromise to reach a practical conclusion to projects."

    Read more →
  • Semantic heterogeneity

    Semantic heterogeneity

    Semantic heterogeneity is when database schema or datasets for the same domain are developed by independent parties, resulting in differences in meaning and interpretation of data values. Beyond structured data, the problem of semantic heterogeneity is compounded due to the flexibility of semi-structured data and various tagging methods applied to documents or unstructured data. Semantic heterogeneity is one of the more important sources of differences in heterogeneous datasets. Yet, for multiple data sources to interoperate with one another, it is essential to reconcile these semantic differences. Decomposing the various sources of semantic heterogeneities provides a basis for understanding how to map and transform data to overcome these differences. == Classification == One of the first known classification schemes applied to data semantics is from William Kent in the late 80s. Kent's approach dealt more with structural mapping issues than differences in meaning, which he pointed to data dictionaries as potentially solving. One of the most comprehensive classifications is from Pluempitiwiriyawej and Hammer, "Classification Scheme for Semantic and Schematic Heterogeneities in XML Data Sources". They classify heterogeneities into three broad classes: Structural conflicts arise when the schema of the sources representing related or overlapping data exhibit discrepancies. Structural conflicts can be detected when comparing the underlying schema. The class of structural conflicts includes generalization conflicts, aggregation conflicts, internal path discrepancy, missing items, element ordering, constraint and type mismatch, and naming conflicts between the element types and attribute names. Domain conflicts arise when the semantics of the data sources that will be integrated exhibit discrepancies. Domain conflicts can be detected by looking at the information contained in the schema and using knowledge about the underlying data domains. The class of domain conflicts includes schematic discrepancy, scale or unit, precision, and data representation conflicts. Data conflicts refer to discrepancies among similar or related data values across multiple sources. Data conflicts can only be detected by comparing the underlying sources. The class of data conflicts includes ID-value, missing data, incorrect spelling, and naming conflicts between the element contents and the attribute values. Moreover, mismatches or conflicts can occur between set elements (a "population" mismatch) or attributes (a "description" mismatch). Michael Bergman expanded upon this schema by adding a fourth major explicit category of language, and also added some examples of each kind of semantic heterogeneity, resulting in about 40 distinct potential categories . This table shows the combined 40 possible sources of semantic heterogeneities across sources: A different approach toward classifying semantics and integration approaches is taken by Sheth et al. Under their concept, they split semantics into three forms: implicit, formal and powerful. Implicit semantics are what is either largely present or can easily be extracted; formal languages, though relatively scarce, occur in the form of ontologies or other description logics; and powerful (soft) semantics are fuzzy and not limited to rigid set-based assignments. Sheth et al.'s main point is that first-order logic (FOL) or description logic is inadequate alone to properly capture the needed semantics. == Relevant applications == Besides data interoperability, relevant areas in information technology that depend on reconciling semantic heterogeneities include data mapping, semantic integration, and enterprise information integration, among many others. From the conceptual to actual data, there are differences in perspective, vocabularies, measures and conventions once any two data sources are brought together. Explicit attention to these semantic heterogeneities is one means to get the information to integrate or interoperate. A mere twenty years ago, information technology systems expressed and stored data in a multitude of formats and systems. The Internet and Web protocols have done much to overcome these sources of differences. While there is a large number of categories of semantic heterogeneity, these categories are also patterned and can be anticipated and corrected. These patterned sources inform what kind of work must be done to overcome semantic differences where they still reside.

    Read more →
  • Artificial intelligence controversies

    Artificial intelligence controversies

    The controversies surrounding artificial intelligence encompass a broad range of public, academic, and political debates regarding the societal effects of artificial intelligence (AI). These debates intensified particularly in the late 2010s and 2020s, coinciding with an accelerated period of development known as the AI boom. While advocates emphasize the technology's potential to solve complex problems and enhance human quality of life, detractors highlight a wide array of dangers and challenges. These include concerns over ethics, plagiarism and theft, fraud, safety and alignment, environmental impacts, technological unemployment, and the spread of misinformation. It also covers severe future or theoretical challenges, such as the emergence of artificial superintelligence and existential risks. == 2016 == === Microsoft Tay chatbot (2016) === On March 23, 2016, Microsoft released Tay, a chatbot designed to mimic the language patterns of a 19-year-old American girl and learn from interactions with Twitter users. Soon after its launch, Tay began posting racist, sexist, and otherwise inflammatory tweets after Twitter users deliberately taught it offensive phrases and exploited its "repeat after me" capability. Examples of controversial outputs included Holocaust denial and calls for genocide using racial slurs. Within 16 hours of its release, Microsoft suspended the Twitter account, deleted the offensive tweets, and stated that Tay had suffered from a "coordinated attack by a subset of people" that "exploited a vulnerability." Tay was briefly and accidentally re-released on March 30 during testing, after which it was permanently shut down. Microsoft CEO Satya Nadella later stated that Tay "has had a great influence on how Microsoft is approaching AI" and taught the company the importance of taking accountability. == 2022 == === Voiceverse NFT plagiarism scandal (2022) === On January 14, 2022, voice actor Troy Baker announced a partnership with Voiceverse, a blockchain-based company that marketed proprietary AI voice cloning technology as non-fungible tokens (NFT), triggering immediate backlash over environmental concerns, fears that AI could displace human voice actors, and concerns about fraud. Later that same day, the pseudonymous creator of 15.ai—a free, non-commercial AI voice synthesis research project—revealed through server logs that Voiceverse had used 15.ai to generate voice samples, pitch-shifted them to make them unrecognizable, and falsely marketed them as their own proprietary technology before selling them as NFTs; the developer of 15.ai had previously stated that they had no interest in incorporating NFTs into their work. Voiceverse confessed within an hour and stated that their marketing team had used 15.ai without attribution while rushing to create a demo. News publications and AI watchdog groups universally characterized the incident as theft stemming from generative artificial intelligence. === Théâtre D'opéra Spatial (2022) === On August 29, 2022, Jason Michael Allen won first place in the "emerging artist" (non-professional) division of the "Digital Arts/Digitally-Manipulated Photography" category of the Colorado State Fair's fine arts competition with Théâtre D'opéra Spatial, a digital artwork created using the AI image generator Midjourney, Adobe Photoshop, and AI upscaling tools, becoming one of the first images made using generative AI to win such a prize. Allen disclosed his use of Midjourney when submitting, though the judges did not know it was an AI tool but stated they would have awarded him first place regardless. While there was little contention about the image at the fair, reactions to the win on social media were negative. On September 5, 2023, the United States Copyright Office ruled that the work was not eligible for copyright protection as the human creative input was de minimis and that copyright rules "exclude works produced by non-humans." == 2023 == === Statements on AI risk (2023) === On March 22, 2023, the Future of Life Institute published an open letter calling on "all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4", citing risks such as AI-generated propaganda, extreme automation of jobs, human obsolescence, and a society-wide loss of control. The letter, published a week after the release of OpenAI's GPT-4, asserted that current large language models were "becoming human-competitive at general tasks". It received more than 30,000 signatures, including academic AI researchers and industry CEOs such as Yoshua Bengio, Stuart Russell, Elon Musk, Steve Wozniak and Yuval Noah Harari. The letter was criticized for diverting attention from more immediate societal risks such as algorithmic biases, with Timnit Gebru and others arguing that it amplified "some futuristic, dystopian sci-fi scenario" instead of current problems with AI. On May 30, 2023, the Center for AI Safety released a one-sentence statement signed by hundreds of artificial intelligence experts and other notable figures: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war." Signatories included Turing laureates Geoffrey Hinton and Yoshua Bengio, as well as the scientific and executive leaders of several major AI companies, including Sam Altman, Demis Hassabis, and Bill Gates. The statement prompted responses from political leaders, including UK Prime Minister Rishi Sunak, who retweeted it with a statement that the UK government would look carefully into it, and White House Press Secretary Karine Jean-Pierre, who commented that AI "is one of the most powerful technologies that we see currently in our time." Skeptics, including from Human Rights Watch, argued that scientists should focus on known risks of AI instead of speculative future risks. === Removal of Sam Altman from OpenAI (2023) === On November 17, 2023, OpenAI's board of directors ousted co-founder and chief executive Sam Altman, stating that "the board no longer has confidence in his ability to continue leading OpenAI." The removal was precipitated by employee concerns about his handling of artificial intelligence safety and allegations of abusive behavior. Altman was reinstated on November 22 after pressure from employees and investors, including a letter signed by 745 of OpenAI's 770 employees threatening mass resignations if the board did not resign. The removal and subsequent reinstatement caused widespread reactions, including Microsoft's stock falling nearly three percent following the initial announcement and then rising over two percent to an all-time high after Altman was hired to lead a Microsoft AI research team before his reinstatement. The incident also prompted investigations from the Competition and Markets Authority and the Federal Trade Commission into Microsoft's relationship with OpenAI. == 2024 == === Taylor Swift deepfake pornography controversy (2024) === In late January 2024, sexually explicit AI-generated deepfake images of Taylor Swift were proliferated on X, with one post reported to have been seen over 47 million times before its removal. Disinformation research firm Graphika traced the images back to 4chan, while members of a Telegram group had discussed ways to circumvent censorship safeguards of AI image generators to create pornographic images of celebrities. The images prompted responses from anti-sexual assault advocacy groups, US politicians, and Swifties. Microsoft CEO Satya Nadella called the incident "alarming and terrible." X briefly blocked searches of Swift's name on January 27, 2024, and Microsoft enhanced its text-to-image model safeguards to prevent future abuse. On January 30, US senators Dick Durbin, Lindsey Graham, Amy Klobuchar, and Josh Hawley introduced a bipartisan bill that would allow victims to sue individuals who produced or possessed "digital forgeries" with intent to distribute, or those who received the material knowing it was made without consent. === Google Gemini image generation controversy (2024) === In February 2024, social media users reported that Google's Gemini chatbot was generating images that featured people of color and women in historically inaccurate contexts—such as Vikings, Nazi soldiers, and the Founding Fathers—and refusing prompts to generate images of white people. The images were derided on social media, including by conservatives who cited them as evidence of Google's "wokeness", and criticized by Elon Musk, who denounced Google's products as biased and racist. In response, Google paused Gemini's ability to generate images of people. Google executive Prabhakar Raghavan released a statement explaining that Gemini had "overcompensate[d]" in its efforts to strive for diversity and acknowledging that the images were "embarrassing and wrong". Google CEO Sundar Pichai called the incident offensive and unacceptable in an internal memo, promising struc

    Read more →
  • List of information schools

    List of information schools

    This list of information schools, sometimes abbreviated to iSchools, includes members of the iSchools organization. The iSchools organization reflects a consortium of over 130 information schools across the globe. == History == The first iSchools Caucus was formed in 1988 by Syracuse, Pittsburgh, and Drexel and was called the Gang of Three (sometimes gang of four with Rutgers). Syracuse renamed the School of Library Science as the School of Information Studies in 1974, and is considered as the first “iSchool” in history. The group was formally named "the iSchools Caucus" or more casually, the iCaucus. By 2003, the group expanded to include the Universities of Michigan, Washington, Illinois, UNC, Florida State, Indiana, and Texas, and was called the Gang of Ten. The current iSchools Caucus organization was formalized by 2005, with additions of UC Berkeley, UC Irvine, UCLA, Penn State, Georgia Tech, Maryland, Toronto, Carnegie Mellon and Singapore Management University. == iSchools organization == The iSchools promote an interdisciplinary approach to understanding the opportunities and challenges of information management, with a core commitment to concepts like universal access and user-centered organization of information. The field is concerned broadly with questions of design and preservation across information spaces, from digital and virtual spaces such as online communities, social networking, the World Wide Web, and databases to physical spaces such as libraries, museums, collections, and other repositories. "School of Information", "Department of Information Studies", or "Information Department" are often the names of the participating organizations. Degree programs at iSchools include course offerings in areas such as information architecture, design, policy, and economics; knowledge management, user experience design, and usability; preservation and conservation; librarianship and library administration; the sociology of information; and human-computer interaction and computer science. === Leadership === The executive committee of the iSchools is made up of the current chair (Ina Fourie, University of Pretoria, South Africa), past chair (Gillian Oliver, Monash University, Australia) and the chair elect (Javed Mostafa, University of Toronto Canada), plus representatives from the three regions (North America, Europe, and Asia-Pacific). The current executive director is Slava Sterzer. == Member institutions == Between 2010 and 2026, the organization expanded globally beyond North America, growing to 133 member schools as of March 2026. For an updated and complete list of member schools, please visit the member database of the iSchools. == iConferences == Members of the iSchools organize a regular academic conference, known as the iConference, hosted by a different member institution each year. September 2005: Pennsylvania State University October 2006: University of Michigan February 2008: University of California, Los Angeles February 2009: University of North Carolina February 2010: University of Illinois at Urbana-Champaign February 2011: University of Washington, Seattle February 2012: University of Toronto February 2013: University of North Texas March 2014: Humboldt-Universität zu Berlin March 2015: University of California, Irvine March 2016: Drexel University March 2017: Wuhan University March 2018: University of Sheffield and Northumbria University March 2019: University of Maryland March 2020: University of Borås (virtual only) March 2021: Renmin University of China (virtual only) February/March 2022: University of Texas at Austin, University College Dublin & Kyushu University (virtual only) March 2023: Universitat Oberta de Catalunya March 2024: Jilin University March 2025: Indiana University March/April 2026: Edinburgh Napier University 2027: Victoria University of Wellington == Other schools of information == Other information schools and programs include: Documentation Research and Training Centre, Indian Statistical Institute, Bangalore San Jose State University, School of Information University of Southern California Library Science Degree Ankara University, Department of Information and Records Management, Ankara/Turkey Marmara University, Department of Information and Records Management, Istanbul/Turkey University of Kelaniya, Department of Library and Information Science, Kelaniya/Sri Lanka University of Colombo, National Institute of Library and Information Science (NILIS), Colombo/Sri Lanka Chicago State University, Department of Information Studies

    Read more →
  • Metadata

    Metadata

    Metadata (or metainformation) is data (or information) that defines and describes the characteristics of other data. It often helps to describe, explain, locate, or otherwise make data easier to retrieve, use, or manage. For example, the title, author, and publication date of a book are metadata about the book. But, while a data asset is finite, its metadata is infinite. As such, efforts to define, classify types, or structure metadata are expressed as examples in the context of its use. The term "metadata" has a history dating to the 1960s where it occurred in computer science and in popular culture. Different types of metadata serve different functions. For example, descriptive metadata for a document might include the author, creation date, file size and keywords. Metadata has various purposes. It can help users find relevant information and discover resources. It can also help organize electronic resources, provide digital identification, and archive and preserve resources. Metadata allows users to access resources by "allowing resources to be found by relevant criteria, identifying resources, bringing similar resources together, distinguishing dissimilar resources, and giving location information". Metadata of telecommunication activities including Internet traffic is very widely collected by various national governmental organizations. This data is used for the purposes of traffic analysis and can be used for mass surveillance. Unique metadata standards exist for different disciplines (e.g., museum collections, digital audio files, websites, etc.). Describing the contents and context of data or data files increases its usefulness. For example, a web page may include metadata specifying what software language the page is written in (e.g., HTML), what tools were used to create it, what subjects the page is about, and where to find more information about the subject. This metadata can automatically improve the reader's experience and make it easier for users to find the web page online. A CD may include metadata providing information about the musicians, singers, and songwriters whose work appears on the disc. In many countries, government organizations routinely store metadata about emails, telephone calls, web pages, video traffic, IP connections, and cell phone locations. == Types == There are many distinct types of metadata, including: Descriptive metadata – the descriptive information about a resource. It is used for discovery and identification. It includes elements such as title, abstract, author, and keywords. Structural metadata – metadata about containers of data and indicates how compound objects are put together, for example, how pages are ordered to form chapters. It describes the types, versions, relationships, and other characteristics of digital materials. Administrative metadata – the information to help manage a resource, like resource type, and permissions, and when and how it was created. Reference metadata – the information about the contents and quality of statistical data. Statistical metadata – also called process data, may describe processes that collect, process, or produce statistical data. Legal metadata – provides information about the creator, copyright holder, and public licensing, if provided. Metadata is not strictly bound to one of these categories, as it can describe a piece of data in many other ways. While the metadata application is manifold, covering a large variety of fields, there are specialized and well-accepted models to specify types of metadata. Bretherton & Singley (1994) distinguish between two distinct classes: structural/control metadata and guide metadata. Structural metadata describes the structure of database objects such as tables, columns, keys and indexes. Guide metadata helps humans find specific items and is usually expressed as a set of keywords in a natural language. According to Ralph Kimball, metadata can be divided into three categories: technical metadata (or internal metadata), business metadata (or external metadata), and process metadata. Dan Linstedt, creator of the data vault methodology, says business metadata "...provide[s] definition of the functionality, definition of the data, definition of the elements, and definition of how the data is used within business...business metadata includes business requirements, time-lines, business metrics, business process flows, and business terminology." Business metadata is important because it can greatly facilitate the usefulness of the data to business people. A simple example of business metadata is a glossary entry. Hover functionality in an application or web form can enable a glossary definition to be shown when cursor is on a field or term. Other examples of business metadata include annotation ability within applications. For example, a business user may be viewing a business intelligence (BI) report and notice a trend in the data. The user may have background knowledge as to why this trend occurs. Some business intelligence tools enable the user to create an annotation within the report that explains the trend. Such an annotation can enhance other users' understanding of the data. This example is especially powerful because it is created by a business user for the use of other business people. NISO distinguishes three types of metadata: descriptive, structural, and administrative. Descriptive metadata is typically used for discovery and identification, as information to search and locate an object, such as title, authors, subjects, keywords, and publisher. Structural metadata describes how the components of an object are organized. An example of structural metadata would be how pages are ordered to form chapters of a book. Finally, administrative metadata gives information to help manage the source. Administrative metadata refers to the technical information, such as file type, or when and how the file was created. Two sub-types of administrative metadata are rights management metadata and preservation metadata. Rights management metadata explains intellectual property rights, while preservation metadata contains information to preserve and save a resource. Statistical data repositories have their own requirements for metadata in order to describe not only the source and quality of the data but also what statistical processes were used to create the data, which is of particular importance to the statistical community in order to both validate and improve the process of statistical data production. An additional type of metadata beginning to be more developed is accessibility metadata. Accessibility metadata is not a new concept to libraries; however, advances in universal design have raised its profile. Projects like Cloud4All and GPII identified the lack of common terminologies and models to describe the needs and preferences of users and information that fits those needs as a major gap in providing universal access solutions. Those types of information are accessibility metadata. The Schema.org website has incorporated several accessibility properties based on IMS Global Access for All Information Model Data Element Specification. While the efforts to describe and standardize the varied accessibility needs of information seekers are beginning to become more robust, their adoption into established metadata schemas has not been as developed. For example, while Dublin Core (DC)'s "audience" and MARC 21's "reading level" could be used to identify resources suitable for users with dyslexia and DC's "format" could be used to identify resources available in braille, audio, or large print formats, there is more work to be done. == History == Metadata was traditionally used in the card catalogs of libraries until the 1980s when libraries converted their catalog data to digital databases. In the 2000s, as data and information were increasingly stored digitally, this digital data was described using metadata standards. An early description of "meta data" for computer systems was written by David Griffel and Stuart McIntosh at the MIT Center for International Studies in 1967: "In summary then, we have statements in an object language about subject descriptions of data and token codes for the data. We also have statements in a meta language describing the data relationships and transformations, and ought/is relations between norm and data." == Definition == Metadata means "data about data". Metadata is defined as the data providing information about one or more aspects of the data; it is used to summarize basic information about data that can make tracking and working with specific data easier. Some examples include: Means of creation of the data Source of the data Time and date of creation Creator or author of the data Location on a computer network where the data was created Standards used Data quality For example, a digital image may include metadata that describes the size of the image, its color depth, resolution,

    Read more →
  • HAKMEM

    HAKMEM

    HAKMEM, alternatively known as AI Memo 239, is a February 1972 "memo" (technical report) of the MIT AI Lab containing a wide variety of hacks, including useful and clever algorithms for mathematical computation, some number theory and schematic diagrams for hardware – in Guy L. Steele's words, "a bizarre and eclectic potpourri of technical trivia". Contributors included about two dozen members and associates of the AI Lab. The title of the report is short for "hacks memo", abbreviated to six upper case characters that would fit in a single PDP-10 machine word (using a six-bit character set). == History == HAKMEM is notable as an early compendium of algorithmic technique, particularly for its practical bent, and as an illustration of the wide-ranging interests of AI Lab people of the time, which included almost anything other than AI research. HAKMEM contains original work in some fields, notably continued fractions. == Introduction == Compiled with the hope that a record of the random things people do around here can save some duplication of effort -- except for fun. Here is some little known data which may be of interest to computer hackers. The items and examples are so sketchy that to decipher them may require more sincerity and curiosity than a non-hacker can muster. Doubtless, little of this is new, but nowadays it's hard to tell. So we must be content to give you an insight, or save you some cycles, and to welcome further contributions of items, new or used.

    Read more →
  • Ideonomy

    Ideonomy

    Ideonomy is a combinatorial "science of ideas" developed by American independent scholar Patrick M. Gunkel (1947–2017). Specifically, Ideonomy is concerned with the systematic organization of ideas and the discovery of the rules behind how ideas combine, diverge, and transform. Gunkel defined ideonomy as "the science of the laws of ideas and of the application of such laws to the generation of all possible ideas in connection with any subject, idea, or thing." In his 1992 book A History of Knowledge, Charles Van Doren compared ideonomy to a "mining operation" that excavates meanings and thought to discover treasures hidden deep within language. Sources from the 1980s and 1990s demonstrate that ideonomy was useful to academic researchers in fields including biology, toxicology, and nursing/patient care. Beginning in the 2010s, academics in a wide range of fields including machine learning, marketing, computational modeling, and cybersecurity have relied on materials generated for ideonomy to provide methodological support for their research. == Etymology and definition == The word "ideonomy" combines the Greek roots ideo- (from idea, meaning pattern or form) and -nomy (from nomos, meaning law or custom). The suffix -nomy suggests the laws concerning or the totality of knowledge about a given subject, as in astronomy or taxonomy. In a note posted on the MIT ideonomy website, Gunkel states that the word was supposedly first coined by the French Encyclopedists to refer to a science of ideas. No evidence is provided for this statement, however. The concept bears some relationship to Antoine Destutt de Tracy's "ideology" (1796), which originally meant a systematic science of ideas before acquiring its modern political connotations. Gunkel provided several metaphorical descriptions of ideonomy: An "idea bank": a computer network enabling systematic exploration of infinite possible ideas A "kaleidoscope" that can exhibit all possible combinations and transformations of ideas A "prism" capable of diffracting any idea into its cognitive components A "gigantic microscope for magnifying the ideocosm" == History and development == In 1984, Gunkel received a five-year unsolicited grant from the Richard Lounsbery Foundation of New York to develop ideonomy. A June 1, 1987 article on the front page of The Wall Street Journal brought Gunkel and ideonomy to wider public attention. Some academics were interested in using ideonomy's techniques, including biologist Betsey Dyer, who published several contemporaneous peer-reviewed studies citing ideonomy. Academic researchers in the field of toxicology and nursing/patient care also used ideonomy. However, ideonomy's broadest contribution to date came beginning in the 2010s, as a list of personality traits generated for combinatorial matching was used by researchers in artificial intelligence to code human emotions for machine-learning tasks, develop computational models related to personality, develop a measurement framework for influencer-brand recommender systems, and aid information awareness/cybersecurity assessment. == Methodology == The foundational empirical method of ideonomy involves the systematic creation of extensive lists. Gunkel's apartment reportedly contained thousands of lists on every conceivable topic. Gunkel termed each list an "organon," which he described as expanding through "combination, permutation, transformation, generalization, specialization, intersection, interaction, reapplication, recursive use, etc. of existing organons." The ideonomic process follows a progressive structure. The ideonomist begins with a simple list of examples of a particular idea, concept, or thing. The list need not be exhaustive. By studying this list, the ideonomist isolates and identifies types. This categorical analysis then reveals missing items, allowing the primary list to be improved and refined. Gunkel emphasized that list items must not only cover genuine categories of nature but also be formulated in ways that yield the largest possible number of syntactically coherent possibilities when combined. The core technique of ideonomy is "ideocombinatorics"—the systematic intersection and combination of items from different lists to generate novel composite concepts. Gunkel developed computer programs to automate this process. For example, combining a list of 230 Universal Elementary Shapes (pits, pyramids, trenches, hemispheres, needles) with a list of 74 Types of Order (recurrence, identity, likeness of parts) yields 17,020 possible "shapes of order." These combinations, when phrased as questions ("Can there be pits of recurrence?"), could suggest new categories of phenomena worthy of investigation. The computer-generated output is typically repetitive and often meaningless. However, with sufficient frequency, the combinations yield results that are unexpectedly interesting and fruitful. In one documented case, Gunkel's programs generated 45,540 questions about toxins for microbiologist David Bermudes. One question—"Can hierarchies of cell process be used as a basis for classifying toxic action?"—prompted Bermudes to develop a novel approach to classifying biological toxins by the type of molecule they attack, rather than by chemical structure or physiological system affected. According to one contemporaneous account of ideonomy, "Gunkel takes for his field all fields and all ideas about anything. He uses a computer to generate lists of words and phrases and by juxtaposition reviews the resultant patterns for novel ideas. The computer is ideal for this task because the mind would rebel at the formidable processing task ideonomy involves. What we have here is computer generated originality." == Applications == Gunkel and his supporters identified several practical applications for ideonomic methods: Scientific research: Biologist Betsey Dyer of Wheaton College published research crediting ideonomy for helping to generate ideas. Medical science: When Austin pathologist Michael T. O'Brien was presented with the ideonomically-generated question "Can arteries have rashes?", he initially dismissed it as nonsense. Upon reflection, he realized that large arteries are supplied with blood by tiny vessels that might become inflamed and dilated, analogous to skin vessels in a rash—a phenomenon potentially worth researching. Analogical thinking: Harvard law professor Robert Clark used ideonomic analogies to write a research paper comparing plant structure with human hierarchies. Artificial intelligence: Douglas Lenat, a researcher at Microelectronics and Computer Technology Corporation (MCC) in Austin, suggested that Gunkel's lists enumerating types of human mistakes could help design AI systems capable of recognizing and correcting their own errors. == Reception and criticism == Ideonomy received mixed reactions from the academic and scientific communities. Prominent supporters included: Edward Fredkin, former director of MIT's computer science laboratory, who praised Gunkel's "provocative ideas on artificial intelligence." Marvin Minsky, AI scientist and MIT professor, who described ideonomy as "perhaps the most extensive study of ways to generate ideas." Frederick Seitz, president emeritus of Rockefeller University, who noted Gunkel's "encyclopedic scope" Robert C. Clark, Harvard law professor, who called Gunkel "the most intelligent person I ever met" However, skeptics questioned whether ideonomy constituted a genuine science. Fredkin himself noted that Gunkel "pours out about 60 ideas a minute, and 59 of them are bad," though he added that "even with one good idea out of 60, it's still an amazing accomplishment." Douglas Lenat observed that brainstorming with Gunkel was "a bit like being hit over the head by the muse with a sledgehammer" and that "he puts people off." Gunkel himself acknowledged that ideonomy was in its infancy and might seem "absurdly utopian." His planned magnum opus on ideonomy remained incomplete, and was posted on an MIT website thanks to faculty advisor Whitman Richards. Gunkel wrote: "Pioneering in a completely new field, yes in a new science, is almost unreal. It is heartbreaking, it is pitiable, it is almost inhuman. Honestly, it is a hell. There is nothing heroic about it." == Related concepts == Gunkel identified several historical precedents for ideonomic thinking: Gottfried Wilhelm Leibniz (1646–1716): The philosopher's work on a universal characteristic (characteristica universalis) and calculus of reasoning Peter Mark Roget (1779–1869): Creator of Roget's Thesaurus, which organized concepts into a systematic taxonomy Dmitri Mendeleev (1834–1907): Developer of the periodic table, demonstrating how combining lists of element families could reveal previously unseen connections Fritz Zwicky (1898–1974): The Caltech astrophysicist whom Gunkel called the "grandfather of ideonomy" for his development of "morphological research"—systematic exploration of all possible solutions t

    Read more →
  • Algorithm

    Algorithm

    In mathematics and computer science, an algorithm ( ) is a finite sequence of mathematically rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing calculations and data processing. More advanced algorithms can use conditionals to divert the code execution through various routes (referred to as automated decision-making) and deduce valid inferences (referred to as automated reasoning). In contrast, a heuristic is an approach to solving problems without well-defined correct or optimal results. For example, although social media recommender systems are commonly called "algorithms", they actually rely on heuristics as there is no truly "correct" recommendation. As an effective method, an algorithm can be expressed within a finite amount of space and time and in a well-defined formal language for calculating a function. Starting from an initial state and input, a computation occurs at each step, eventually producing output and terminating. The transition between states can be non-deterministic; randomized algorithms incorporate random input. == Etymology == Around 825 AD, Persian scientist and polymath Muḥammad ibn Mūsā al-Khwārizmī wrote kitāb al-ḥisāb al-hindī ("Book of Indian computation") and kitab al-jam' wa'l-tafriq al-ḥisāb al-hindī ("Addition and subtraction in Indian arithmetic"). In the early 12th century, Latin translations of these texts involving the Hindu–Arabic numeral system and arithmetic appeared, for example Liber Alghoarismi de practica arismetrice, attributed to John of Seville, and Liber Algoritmi de numero Indorum, attributed to Adelard of Bath. Here, alghoarismi or algoritmi is the Latinization of Al-Khwarizmi's name; the text starts with the phrase Dixit Algoritmi, or "Thus spoke Al-Khwarizmi". The word algorism in English came to mean the use of place-value notation in calculations; it occurs in the Ancrene Wisse from circa 1225. By the time Geoffrey Chaucer wrote The Canterbury Tales in the late 14th century, he used a variant of the same word in describing augrym stones, stones used for place-value calculation. In the 15th century, under the influence of the Greek word ἀριθμός (arithmos, "number"; cf. "arithmetic"), the Latin word was altered to algorithmus. By 1596, this form of the word was used in English, as algorithm, by Thomas Hood. == Definition == One informal definition is "a set of rules that precisely defines a sequence of operations", which would include all computer programs, and any bureaucratic procedure or cook-book recipe. In general, a program is an algorithm only if it stops eventually. Formally, algorithm is an explicit set of instructions to produce an output, that can be followed by a computer or a human performing specific operations on symbols.. == History == === Ancient algorithms === Step-by-step procedures for solving mathematical problems have been recorded since antiquity. This includes in Babylonian mathematics (around 2500 BC), Egyptian mathematics (around 1550 BC), Indian mathematics (around 800 BC and later), the Ifa Oracle (around 500 BC), Greek mathematics (around 240 BC), Chinese mathematics (around 200 BC and later), and Arabic mathematics (around 800 AD). The earliest evidence of algorithms is found in ancient Mesopotamian mathematics. A Sumerian clay tablet found in Shuruppak near Baghdad and dated to c. 2500 BC describes the earliest division algorithm. During the Hammurabi dynasty c. 1800 – c. 1600 BC, Babylonian clay tablets described algorithms for computing formulas. Algorithms were also used in Babylonian astronomy. Babylonian clay tablets describe and employ algorithmic procedures to compute the time and place of significant astronomical events. Algorithms for arithmetic are also found in ancient Egyptian mathematics, dating back to the Rhind Mathematical Papyrus c. 1550 BC. Algorithms were later used in ancient Hellenistic mathematics. Two examples are the Sieve of Eratosthenes, which was described in the Introduction to Arithmetic by Nicomachus, and the Euclidean algorithm, which was first described in Euclid's Elements (c. 300 BC).Examples of ancient Indian mathematics included the Shulba Sutras, the Kerala School, and the Brāhmasphuṭasiddhānta. In the 9th century, Muḥammad ibn Mūsā al-Khwārizmī revolutionized the field by establishing the algorithm as a systematic, finite sequence of logical steps to solve mathematical problems. In his influential work, The Compendious Book on Calculation by Completion and Balancing, he moved beyond specific numerical solutions to introduce general procedures for algebraic reduction and balancing. This transformed mathematics into a 'mechanical' process of well-defined rules—a fundamental shift that laid the groundwork for modern algorithmic theory. The Latin translation of his arithmetic treatise, titled Algoritmi de numero Indorum, led to the term algorithm being derived from the Latinization of his name, Algoritmi, specifically to describe this new rule-based approach to mathematics. The first cryptographic algorithm for deciphering encrypted code was developed by Al-Kindi, a 9th-century Arab mathematician, in A Manuscript On Deciphering Cryptographic Messages. He gave the first description of cryptanalysis by frequency analysis, the earliest codebreaking algorithm. === Computers === ==== Weight-driven clocks ==== Weight-driven clocks were a key European invention in Middle Ages, specifically the verge escapement mechanism producing the tick of mechanical clocks. Accurate automatic machines led to mechanical automata in the 13th century and computational machines—the difference and analytical engines of Charles Babbage and Ada Lovelace in the mid-19th century. Lovelace designed the first algorithm intended for a computer, Babbage's analytical engine, the first real Turing-complete computer, more than the mechanical calculators of the time. Although the full implementation of Babbage's second device was only built decades after her lifetime, Lovelace has been called "history's first programmer". ==== Electromechanical relay ==== The Jacquard loom, a precursor to punch cards, and telephone switching machines led to the development of the first computers. By the mid-19th century, the telegraph, was in use throughout the world. By the late 19th century, ticker tape (c. 1870s) and punch cards (c. 1890) were developed. Then came the teleprinter (c. 1910) with its punched-paper use of Baudot code on tape. Telephone-switching networks of electromechanical relays were invented in 1835. These led to the invention of the digital adding device by George Stibitz in 1937. While working in Bell Laboratories, he observed the "burdensome" use of mechanical calculators with gears, prompting him to experiment create an experimental digital adder at home. === Formalization === In 1928, a partial formalization of the modern concept of algorithms began with attempts to solve David Hilbert's Entscheidungsproblem (decision problem). Later formalizations were framed as attempts to define "effective calculability" or "effective method". Those formalizations included the Gödel–Herbrand–Kleene recursive functions of 1930, 1934 and 1935, Alonzo Church's lambda calculus of 1936, Emil Post's Formulation 1 of 1936, and Alan Turing's Turing machines of 1936–37 and 1939. === Modern Algorithms === For decades, it was assumed that algorithm evolution progresses from heuristics to formal algorithms. A Symbolic integration provides a classic illustration. In 1961, James Slagle’s program SAINT used heuristics to solve 52 of 54 freshman calculus exercises from an MIT textbook (≈96%). In 1967, Larry Moses’s SIN refined the heuristics and achieved 100% success, though it remained heuristic. Finally, in 1969, Robert Risch introduced the Risch Algorithm with formal guarantees. This trajectory defined the traditional path: heuristics evolving until a definitive, guaranteed algorithm emerged. However, the rise of transformer-based AI has inverted this sequence — classical algorithms are now being displaced by heuristics once again. Algorithms have evolved and improved in many ways as time goes on. Common uses of algorithms today include social media apps like Instagram and YouTube. Algorithms are used as a way to analyze what people like and push more of those things to the people who interact with them. Quantum computing uses quantum algorithm procedures to solve problems faster. More recently, in 2024, NIST updated their post-quantum encryption standards, which includes new encryption algorithms to enhance defenses against attacks using quantum computing. == Representations == Algorithms can be expressed in many kinds of notation, including natural languages, pseudocode, flowcharts, drakon-charts, programming languages or control tables. Natural language expressions of algorithms tend to be verbose and ambiguous and are rarely used for complex or technical algor

    Read more →
  • Enterprise bus matrix

    Enterprise bus matrix

    The enterprise bus matrix is a data warehouse planning tool and model created by Ralph Kimball, and is part of the data warehouse bus architecture. The matrix is the logical definition of one of the core concepts of Kimball's approach to dimensional modeling conformed dimension. The bus matrix defines part of the data warehouse bus architecture and is an output of the business requirements phase in the Kimball lifecycle. It is applied in the following phases of dimensional modeling and development of the data warehouse. The matrix can be categorized as a hybrid model, being part technical design tool, part project management tool and part communication tool == Background == The need for an enterprise bus matrix stems from the way one goes about creating the overall data warehouse environment. Historically there have been two approaches: a structured, centralized and planned approach and a more loosely defined, department specific approach, in which solutions are developed in a more independent matter. Autonomous projects can result in a range of isolated stove pipe data marts. Naturally each approach has its issues; the visionary approach often struggles with long delivery cycles and lack of reaction time as needs emerge and scope issues arise. On the other hand, the development of isolated data marts leads to stovepipe systems that lack synergy in development. Over time this approach will lead to a so-called data-mart-in-a-box architecture where interoperability and lack of cohesion is apparent, and can hinder the realization of an overall enterprise data warehouse. As an attempt to handle this issue, Ralph Kimball introduced the enterprise bus. == Description == The bus matrix purpose is one of high abstraction and visionary planning on the data warehouse architectural level. By dictating coherency in the development and implementation of an overall data warehouse the bus architecture approach enables an overall vision of the broader enterprise integration and consistency while at the same time dividing the problem into more manageable parts – all in a technology and software independent manner. The bus matrix and architecture builds upon the concept of conformed dimensions, creating a structure of common dimensions that ideally can be used across the enterprise by all business processes related to the data warehouse and the corresponding fact tables from which they derive their context. According to Kimball and Margy Ross's article “Differences of Opinion” "The Enterprise Data warehouse built on the bus architecture ”identifies and enforces the relationship between business process metrics (facts) and descriptive attributes (dimensions)”. The concept of a bus is well known in the language of information technology, and is what reflects the conformed dimension concept in the data warehouse, creating the skeletal structure where all parts of a system connect, ensuring interoperability and consistency of data, and at the same time considers future expansion. This makes the conformed dimensions act as the integration ‘glue’, creating a robust backbone of the enterprise Data Warehouse.

    Read more →