AI Headshot Business

AI Headshot Business — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Tesla Dojo

    Tesla Dojo is a series of supercomputers designed and built by Tesla for computer vision video processing and recognition. It was used to train Tesla's machine learning models to improve its Full Self-Driving (FSD) advanced driver-assistance system. It went into production in July 2023. Dojo's goal was to efficiently process millions of terabytes of video data captured in real-life driving situations by Tesla's more than 4 million cars. This goal led to an architecture considerably different from conventional supercomputer designs. In August 2025, Bloomberg News reported that the Dojo project had been disbanded, though it was restarted in January 2026.

== History ==

Tesla operates several massively parallel computing clusters for developing its Autopilot advanced driver assistance system. Its primary unnamed cluster uses 5,760 Nvidia A100 graphics processing units (GPUs). Andrej Karpathy touted it in 2021 at the fourth International Joint Conference on Computer Vision and Pattern Recognition (CCVPR 2021) as "roughly the number five supercomputer in the world", at approximately 81.6 petaflops. That figure was based on scaling the performance of the Nvidia Selene supercomputer, which uses similar components. However, the figure has been disputed, because it was not clear whether it was measured using single-precision or double-precision floating-point numbers (FP32 or FP64). Tesla also operates a second cluster of 4,032 GPUs for training and a third cluster of 1,752 GPUs for automatic labeling of objects.

The primary cluster has been used to process one million ten-second video clips. The clips were taken at 36 frames per second from Autopilot cameras in Tesla cars operating in the real world. Collectively, these clips contained six billion object labels, with depth and velocity data; the total size of the data set was 1.5 petabytes.
This data set was used to train a neural network intended to help Autopilot computers in Tesla cars understand roads. By August 2022, Tesla had upgraded the primary GPU cluster to 7,360 GPUs.

Dojo was first mentioned by Elon Musk in April 2019 during Tesla's "Autonomy Investor Day". In August 2020, Musk stated it was "about a year away" due to power and thermal issues. Dojo was officially announced at Tesla's Artificial Intelligence (AI) Day on August 19, 2021. There, Tesla revealed details of the D1 chip and its plans for "Project Dojo", a datacenter that would house 3,000 D1 chips; the first "Training Tile" had been completed and delivered the week before. In October 2021, Tesla released a "Dojo Technology" whitepaper. It describes the Configurable Float8 (CFloat8) and Configurable Float16 (CFloat16) floating-point formats and arithmetic operations as an extension of Institute of Electrical and Electronics Engineers (IEEE) standard 754.

At the follow-up AI Day in September 2022, Tesla announced it had built several System Trays and one Cabinet. The company stated that during a test, Project Dojo drew 2.3 megawatts (MW) of power before tripping a local power substation in San Jose, California. At the time, Tesla was assembling one Training Tile per day. In August 2023, Tesla powered on Dojo for production use, as well as a new training cluster configured with 10,000 Nvidia H100 GPUs. In January 2024, Musk described Dojo as "a long shot worth taking because the payoff is potentially very high. But it's not something that is a high probability." In June 2024, Musk explained that ongoing construction work at Gigafactory Texas was for a new computing cluster. He said it would comprise an even mix of "Tesla AI" and Nvidia/other hardware, with a total thermal design power of 130 MW at first, eventually exceeding 500 MW.
In August 2025, Bloomberg News reported that the Dojo project had been disbanded, though Musk announced it would be restarted in January 2026 with a new chip iteration.

== Technical architecture ==

The fundamental unit of the Dojo supercomputer is the D1 chip. It was designed by a Tesla team led by ex-AMD CPU designer Ganesh Venkataramanan, including Emil Talpes, Debjit Das Sarma, Douglas Williams, Bill Chang, and Rajiv Kurian. The D1 chip is manufactured by the Taiwan Semiconductor Manufacturing Company (TSMC) on a 7 nanometer (nm) process node. It has 50 billion transistors and a large die size of 645 mm² (1.0 square inch).

In an update at AI Day in 2022, Tesla announced that Dojo would scale by deploying multiple ExaPODs, each organized as follows:

    • 10 Cabinets per ExaPOD (1,062,000 cores, 3,000 D1 chips)
    • 2 System Trays per Cabinet (106,200 cores, 300 D1 chips)
    • 6 Training Tiles per System Tray (53,100 cores, along with host interface hardware)
    • 25 D1 chips per Training Tile (8,850 cores)
    • 354 computing cores per D1 chip

According to Venkataramanan, Tesla's senior director of Autopilot hardware, Dojo will have more than an exaflop (a million teraflops) of computing power. For comparison, Nvidia reported that in August 2021 the (pre-Dojo) Tesla AI-training center used 720 nodes, each with eight Nvidia A100 Tensor Core GPUs. That made 5,760 GPUs in total, providing up to 1.8 exaflops of performance.

=== D1 chip ===

Each node (computing core) of the D1 processing chip is a general-purpose 64-bit CPU with a superscalar core. It supports internal instruction-level parallelism and includes simultaneous multithreading (SMT). It does not support virtual memory and uses limited memory protection mechanisms; Dojo software and applications manage chip resources. The D1 instruction set supports both 64-bit scalar and 64-byte single instruction, multiple data (SIMD) vector instructions.
The integer unit mixes RISC-V (reduced instruction set computer) instructions with custom instructions, supporting 8-, 16-, 32-, or 64-bit integers. The custom vector math unit is optimized for machine learning kernels. It supports multiple data formats with a mix of precisions and numerical ranges, many of which are compiler-composable. Up to 16 vector formats can be used simultaneously.

==== Node ====

Each D1 node uses a 32-byte fetch window holding up to eight instructions. These instructions are fed to an eight-wide decoder that supports two threads per cycle. The decoder is followed by a four-wide, four-way SMT scalar scheduler with two integer units, two address units, and one register file per thread. Vector instructions are passed further down the pipeline to a dedicated vector scheduler with two-way SMT. That scheduler feeds either a 64-byte SIMD unit or four 8×8×4 matrix multiplication units.

The network-on-chip (NOC) router links cores into a two-dimensional mesh network. Per clock cycle, it can send one packet in and one packet out in all four directions to and from each neighbor node, along with one 64-byte read and one 64-byte write to local SRAM. Hardware-native operations transfer data, semaphores, and barrier constraints across memories and CPUs. System-wide double data rate 4 (DDR4) synchronous dynamic random-access memory (SDRAM) works like bulk storage.

==== Memory ====

Each core has 1.25 megabytes (MB) of SRAM main memory. Load and store speeds reach 400 gigabytes (GB) per second and 270 GB/sec, respectively. The chip has explicit core-to-core data transfer instructions. Each SRAM has a unique list parser that feeds a pair of decoders, and a gather engine that feeds the vector register file; together these can directly transfer information across nodes.

==== Die ====

Twelve nodes (cores) are grouped into a local block. Nodes are arranged in an 18×20 array (360 cores) on a single die, of which 354 cores are available for applications.
The die runs at 2 gigahertz (GHz) and has about 440 MB of SRAM in total (roughly 354 usable cores × 1.25 MB/core). It reaches 376 teraflops using either 16-bit brain floating-point (BF16) numbers or configurable 8-bit floating-point (CFloat8) numbers, a format proposed by Tesla. At FP32 it reaches 22 teraflops. Each die has 576 bidirectional serializer/deserializer (SerDes) channels along its perimeter to link to other dies, and moves 8 TB/sec across all four die edges. Each D1 chip has a thermal design power of approximately 400 watts.

=== Training Tile ===

The water-cooled Training Tile packages 25 D1 chips into a 5×5 array. Each tile supports 36 TB/sec of aggregate bandwidth via 40 input/output (I/O) chips, half the bandwidth of the chip mesh network. It also supports 10 TB/sec of on-tile bandwidth. Each tile has 11 GB of SRAM (25 D1 chips × 360 cores/D1 × 1.25 MB/core). Each tile achieves roughly 9 petaflops at BF16/CFloat8 precision (25 D1 chips × 376 TFLOP/D1). Each tile consumes 15 kilowatts (288 amperes at 52 volts).

=== System Tray ===

Six tiles are aggregated into a System Tray, which is integrated with a host interface. Each host interface includes 512 x86 cores, providing a Linux-based user environment. Previously, the Dojo System Tray was known as the Training Matrix, which includes six Training Tiles, 20 Dojo Interface Processor cards across four host servers, and Ethernet-l

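The ExaPOD hierarchy and the Training Tile figures quoted above should multiply out consistently. The minimal Python sketch below re-derives them from the stated per-level counts as a check. The constants come from the text; the names `DOJO_LEVELS`, `hierarchy_totals`, and `tile_power_watts` are hypothetical helpers for illustration, not part of any Tesla software.

```python
# Constants come from the text above; the helper names are hypothetical.
CORES_PER_D1 = 354         # usable computing cores per D1 chip
SRAM_MB_PER_CORE = 1.25    # SRAM per core, in MB

# Each level paired with how many units of the level below it contains.
DOJO_LEVELS = [
    ("Training Tile", 25),  # D1 chips per tile
    ("System Tray", 6),     # tiles per tray
    ("Cabinet", 2),         # trays per cabinet
    ("ExaPOD", 10),         # cabinets per ExaPOD
]


def hierarchy_totals() -> dict:
    """Return {level: (D1 chips, cores)} by multiplying up from a single D1."""
    totals = {}
    chips = 1
    for name, fanout in DOJO_LEVELS:
        chips *= fanout
        totals[name] = (chips, chips * CORES_PER_D1)
    return totals


def tile_power_watts(amps: float = 288, volts: float = 52) -> float:
    """Electrical power of one Training Tile, P = I * V."""
    return amps * volts


if __name__ == "__main__":
    for name, (chips, cores) in hierarchy_totals().items():
        print(f"{name:<14} {chips:>5} D1 chips {cores:>10,} cores")
    print(f"SRAM per D1: {CORES_PER_D1 * SRAM_MB_PER_CORE} MB")  # 442.5 MB
    print(f"Tile power: {tile_power_watts():,} W")               # 14,976 W
    print(f"Pre-Dojo A100 count: {720 * 8:,}")                   # 5,760
```

The tile result of 14,976 W agrees with the quoted 15 kilowatts. The per-die figure of 354 × 1.25 MB = 442.5 MB is close to the quoted total of about 440 MB of SRAM.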
  • Vinelink.com

    Vinelink.com (VINE) is a national website in the United States that allows victims of crime, and the general public, to track the movements of prisoners held by the various states and territories. The first four letters of the website's name, "VINE", are an acronym for "Victim Information and Notification Everyday". The site relies on information provided by the states' departments of correction and other law enforcement agencies. It shows whether an inmate is in custody, has been released, has been granted parole or probation, or has escaped from custody. In some cases, the website will reveal that a defendant who was granted parole or probation subsequently violated the conditions of their release and became a fugitive. Information on Vinelink.com is effectively metadata: the website lists a defendant's custody status, but not what the individual is charged with, their criminal history, or the amount of their bail, if applicable. Users accessing the Vinelink.com website first choose, from a map of U.S. states and territories, where they wish to search for an inmate. They may then search for an individual by the inmate's or parolee's name, or by the inmate's department of corrections inmate number, if known. When an inmate's custody status changes, users who have registered for notifications are alerted by email, phone, or both. The information is released on request as a public record available to the general public. The website does not ask for the user's reasons for the search or require payment. Inmate information is available on the website for most states and for Puerto Rico. The states of Arizona, Georgia, Massachusetts, Montana, New Hampshire and West Virginia provide very limited information on the site.
In March 2025, the Maine Sheriffs' Association entered into a contract to pilot the VINE system in three counties in the state as well as a regional jail. This left South Dakota as the only state that does not participate in the VINE system to any degree. The website does not provide data on prisoners held by the Federal Bureau of Prisons, which has its own inmate locator website, or on inmates of U.S. military prisons.

  • Two-phase commit protocol

    In transaction processing, databases, and computer networking, the two-phase commit protocol (2PC, tupac) is a type of atomic commitment protocol (ACP). It is a distributed algorithm that coordinates all the processes participating in a distributed atomic transaction on whether to commit or abort (roll back) the transaction. This protocol (a specialised type of consensus protocol) achieves its goal even in many cases of temporary system failure, such as process, network node, or communication failures, and is thus widely used. However, it is not resilient to all possible failure configurations, and in rare cases manual intervention is needed to remedy an outcome. To accommodate recovery from failure (automatic in most cases), the protocol's participants log the protocol's states. The protocol's recovery procedures use these log records, which are typically slow to generate but survive failures. Many protocol variants exist that differ primarily in their logging strategies and recovery mechanisms. Recovery procedures are usually intended to run infrequently. Even so, they make up a substantial portion of the protocol, because it must consider and support many possible failure scenarios.
In a "normal execution" of any single distributed transaction (i.e., when no failure occurs, which is typically the most frequent situation), the protocol consists of two phases:

    • The commit-request phase (or voting phase). A coordinator process attempts to prepare all the transaction's participating processes (named participants, cohorts, or workers) to take the necessary steps for either committing or aborting the transaction. Each participant then votes either "Yes": commit (if its local portion of the transaction executed properly), or "No": abort (if a problem has been detected with the local portion).
    • The commit phase. Based on the participants' votes, the coordinator decides whether to commit (only if all have voted "Yes") or abort the transaction (otherwise), and notifies all the participants of the result. The participants then carry out the needed actions (commit or abort) on their local transactional resources (also called recoverable resources; e.g., database data) and on their respective portions of the transaction's other output (if applicable).

The two-phase commit (2PC) protocol should not be confused with the two-phase locking (2PL) protocol, a concurrency control protocol.

== Assumptions ==

The protocol works in the following manner: one node is a designated coordinator, which is the master site, and the rest of the nodes in the network are designated the participants. The protocol assumes that:

    • there is stable storage at each node with a write-ahead log,
    • no node crashes forever,
    • the data in the write-ahead log is never lost or corrupted in a crash, and
    • any two nodes can communicate with each other.

The last assumption is not too restrictive, as network communication can typically be rerouted. The first two assumptions are much stronger; if a node is totally destroyed, data can be lost.

The protocol is initiated by the coordinator after the last step of the transaction has been reached.
The participants then respond with an agreement message or an abort message, depending on whether the transaction has been processed successfully at the participant.

== Basic algorithm ==

=== Commit request (or voting) phase ===

    • The coordinator sends a query-to-commit message to all participants and waits until it has received a reply from all of them.
    • The participants execute the transaction up to the point where they will be asked to commit. They each write an entry to their undo log and an entry to their redo log.
    • If its actions succeeded, each participant replies with an agreement message (it votes Yes to commit). If it experiences a failure that will make it impossible to commit, it replies with an abort message (it votes No).

=== Commit (or completion) phase ===

==== Success ====

If the coordinator received an agreement message from all participants during the commit-request phase:

    • The coordinator sends a commit message to all the participants.
    • Each participant completes the operation and releases all the locks and resources held during the transaction.
    • Each participant sends an acknowledgement to the coordinator.
    • The coordinator completes the transaction when all acknowledgements have been received.

==== Failure ====

If any participant votes No during the commit-request phase (or the coordinator's timeout expires):

    • The coordinator sends a rollback message to all the participants.
    • Each participant undoes the transaction using the undo log and releases the resources and locks held during the transaction.
    • Each participant sends an acknowledgement to the coordinator.
    • The coordinator undoes the transaction when all acknowledgements have been received.
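The two phases above can be simulated in a single process. The following is a minimal Python sketch that replaces network messages with method calls and stable storage with in-memory lists. The class and function names (`Participant`, `prepare`, `run_two_phase_commit`) are hypothetical and are not taken from X/Open XA or any real transaction manager.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """One cohort in the simulation; `can_commit` decides how it will vote."""
    name: str
    can_commit: bool = True
    log: list = field(default_factory=list)  # stands in for a stable-storage log
    state: str = "working"

    def prepare(self) -> bool:
        """Voting phase: write undo/redo entries, then vote Yes (True) or No (False)."""
        if not self.can_commit:
            self.log.append("abort")
            self.state = "aborted"  # a No voter may abort on its own
            return False
        self.log.extend(["undo", "redo", "prepared"])
        self.state = "prepared"  # now blocked until the coordinator decides
        return True

    def commit(self) -> None:
        """Completion phase, success path: finish and release locks."""
        self.log.append("commit")
        self.state = "committed"

    def rollback(self) -> None:
        """Completion phase, failure path: undo using the undo log."""
        if self.state != "aborted":
            self.log.append("abort")
        self.state = "aborted"


def run_two_phase_commit(participants: list[Participant]) -> str:
    """Coordinator: gather every vote, then send one outcome to everyone."""
    # Phase 1: query every participant (no short-circuit; all must reply).
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if every vote was Yes, otherwise roll back.
    decision = "committed" if all(votes) else "aborted"
    for p in participants:
        # Each call models a COMMIT/ROLLBACK message and its acknowledgement.
        if decision == "committed":
            p.commit()
        else:
            p.rollback()
    return decision


if __name__ == "__main__":
    print(run_two_phase_commit([Participant("db1"), Participant("db2")]))  # committed
    print(run_two_phase_commit([Participant("db1"), Participant("db2", can_commit=False)]))  # aborted
```

Holding the outcome in a single `decision` variable reflects the rule that only the coordinator decides. A real coordinator would also force that decision to its log before sending it.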
==== Message flow ====

    Coordinator                                          Participant
                              QUERY TO COMMIT
                -------------------------------->
                              VOTE YES/NO             prepare*/abort*
                <-------------------------------
    commit*/abort*            COMMIT/ROLLBACK
                -------------------------------->
                              ACKNOWLEDGEMENT         commit*/abort*
                <--------------------------------
    end

An * next to the record type means that the record is forced to stable storage.

== Disadvantages ==

The greatest disadvantage of the two-phase commit protocol is that it is a blocking protocol. If the coordinator fails permanently, some participants will never resolve their transactions. After a participant has sent an agreement message in response to the coordinator's commit-request message, it will block until a commit or rollback is received.

A two-phase commit protocol cannot dependably recover from a failure of both the coordinator and a cohort member during the commit phase. If only the coordinator had failed, and no cohort members had received a commit message, it could safely be inferred that no commit had happened. If, however, both the coordinator and a cohort member failed, the failed cohort member may have been the first to be notified and may actually have committed. Even if a new coordinator is selected, it cannot confidently proceed until it has received an agreement from all cohort members. It must therefore block until all cohort members respond.

== Implementing the two-phase commit protocol ==

=== Common architecture ===

In many cases the 2PC protocol is distributed in a computer network. It is easily distributed by implementing multiple dedicated 2PC components similar to each other, typically named transaction managers (TMs; also referred to as 2PC agents or Transaction Processing Monitors). These components carry out the protocol's execution for each transaction (e.g., The Open Group's X/Open XA).
The databases involved in a distributed transaction, both the coordinator and the participants, register with nearby TMs to terminate that transaction using 2PC. These TMs typically reside on the same network nodes as the participants. Each distributed transaction has an ad hoc set of TMs: those with which its participants register. Each transaction has a leader, the coordinator TM, which coordinates 2PC for it; this is typically the TM of the coordinator database. However, the coordinator role can be transferred to another TM for performance or reliability reasons. Rather than exchanging 2PC messages among themselves, the participants exchange messages with their respective TMs. The relevant TMs communicate among themselves to execute the 2PC protocol schema above, "representing" the respective participants, to terminate that transaction. With this architecture the protocol is fully distributed (it does not need any central processing component or data structure). It also scales effectively with the number of network nodes (network size). This common architecture is also effective for distributing other atomic commitment protocols besides 2PC, since all such protocols use the same voting mechanism and outcome propagation to protocol participants.

=== Protocol optimizations ===

Database research has explored ways to retain most of the benefits of the two-phase commit protocol while reducing its costs. These protocol optimizations save protocol operations under certain assumptions about system behavior.

==== Presumed abort and presumed commit ====

Presumed abort and presumed commit are common optimizations of this kind. Assuming a default outcome for transactions, either commit or abort, can save participants both messages and logging operations during the protocol's execution.
For example, under presumed abort, suppose the recovery procedure finds no logged evidence that a transaction committed during system recovery from failure. It then assumes that the transaction was aborted and acts accordingly. It therefore does not matter whether aborts are logged at all, and such logging can be saved under this assumption. Typical

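The presumed-abort rule described above can be reduced to two small functions. The minimal Python sketch below models the coordinator's stable log as a dict from transaction ID to a logged outcome. The function names and the log format are hypothetical illustrations, not the logging scheme of any particular database.

```python
def log_decision(log: dict[str, str], txid: str, decision: str) -> None:
    """Coordinator logging under presumed abort: only commits are recorded."""
    if decision == "commit":
        log[txid] = "commit"  # must reach stable storage before COMMIT is sent
    # An abort decision is deliberately not logged; recovery will presume it.


def recover_outcome(log: dict[str, str], txid: str) -> str:
    """Recovery: no logged evidence of a commit means the transaction aborted."""
    return "commit" if log.get(txid) == "commit" else "abort"


if __name__ == "__main__":
    stable_log: dict[str, str] = {}
    log_decision(stable_log, "T1", "commit")
    log_decision(stable_log, "T2", "abort")
    print(stable_log)                          # {'T1': 'commit'}
    print(recover_outcome(stable_log, "T2"))   # abort
```

The abort of T2 costs no log write at all, yet recovery still reports the correct outcome. That saving is the point of the optimization.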
  • Knowledge organization system

    Knowledge organization system (KOS), concept system, or concept scheme is the generic term used in knowledge organization (KO) for a selection of concepts with an indication of selected semantic relations. Despite their differences in type, coverage, and application, all KOS aim to support the organization of knowledge and information to facilitate their management and retrieval. KOS vary in complexity from simple sorted lists to complex relational networks. They represent both structural and functional features, and serve to eliminate ambiguity, control synonyms, establish relationships, and present properties. KOS originated in library and information science (LIS) and have since been applied to other domains and disciplines within science and industry. Scholarly research and debate, however, remain primarily within the KO field. Challenges of KOS include ambiguity of terminology, repercussions of biased systems, and potential obsolescence. KOS can be expressed in RDF and RDFS following the W3C's Simple Knowledge Organization System (SKOS) recommendation, which aims to enable the sharing and linking of KOS via the Web. One of the largest collections of KOS is the BARTOC registry.

== Types ==

Different schemes for categorizing KOS have been proposed, but most arrange them by the complexity of their construction and maintenance. Some scholars argue that placing KOS on a spectrum oversimplifies the characteristics they share, and may even lead to a non-ideal structure being chosen. The following types are not exhaustive, and are often not mutually exclusive in practice.

=== Term lists ===

Term lists are the least structured form of KOS. They include lists, glossaries, dictionaries, and synonym rings. Authority files and gazetteers may also be considered term lists. Other scholars, however, categorize them, along with directories, as "metadata-like models". Examples include the Union List of Artist Names name authority file and the GeoNames gazetteer.
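Synonym control and semantic relations are easier to see in a tiny example. The minimal Python sketch below models a small thesaurus loosely on the SKOS properties `prefLabel`, `altLabel`, and `broader`, with plain dictionaries standing in for an RDF graph. The concept IDs and labels are invented for illustration.

```python
# Hypothetical mini-thesaurus; IDs and labels are invented for illustration.
CONCEPTS = {
    "ex:vehicles": {"prefLabel": "Vehicles", "altLabel": [], "broader": []},
    "ex:automobiles": {
        "prefLabel": "Automobiles",
        "altLabel": ["Cars", "Motor cars"],
        "broader": ["ex:vehicles"],
    },
    "ex:bicycles": {
        "prefLabel": "Bicycles",
        "altLabel": ["Bikes"],
        "broader": ["ex:vehicles"],
    },
}


def build_label_index(concepts: dict) -> dict:
    """Map every preferred and alternative label (case-folded) to its concept ID."""
    index = {}
    for cid, concept in concepts.items():
        for label in [concept["prefLabel"], *concept["altLabel"]]:
            index[label.casefold()] = cid
    return index


def resolve(term: str, concepts: dict, index: dict):
    """Synonym control: return the preferred label for any entry term, else None."""
    cid = index.get(term.casefold())
    return concepts[cid]["prefLabel"] if cid is not None else None


def all_broader(cid: str, concepts: dict) -> list:
    """Transitive closure of `broader` (akin to skos:broaderTransitive)."""
    seen, stack = [], list(concepts[cid]["broader"])
    while stack:
        parent = stack.pop()
        if parent not in seen:  # this check also prevents endless loops on cyclic data
            seen.append(parent)
            stack.extend(concepts[parent]["broader"])
    return seen


if __name__ == "__main__":
    index = build_label_index(CONCEPTS)
    print(resolve("motor cars", CONCEPTS, index))    # Automobiles
    print(all_broader("ex:automobiles", CONCEPTS))   # ['ex:vehicles']
```

Routing every entry term through one preferred label is what a synonym ring or thesaurus adds over a plain term list. The `broader` links add the hierarchy that richer KOS types build on.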
=== Categorization and classification ===

KOS that emphasize specific (and often hierarchical) structures include subject headings, taxonomies, categorization schemes, and classification schemes and systems. The terms "categorization" and "classification" are used inconsistently in some literature. Categorization generally refers to loosely assembled grouping schemes whose categories may overlap or have fuzzy boundaries. Classification refers to the arrangement of non-overlapping, mutually exclusive classes. Classification schemes may be universal (such as the Dewey Decimal Classification and the Information Coding Classification) or domain-specific (such as the National Library of Medicine Classification).

=== Relationship models ===

The most complex types of KOS, which make use of connections between concepts, include thesauri, semantic networks, and ontologies. One of the most prominent examples of a semantic network is WordNet.

=== Others ===

Several other structures have been proposed as types of KOS but are not consistently included in typologies. They include folksonomies, topic maps, web directory structures, publication organization systems, and bibliometric maps. Some KOS organize other KOS; for instance, PeriodO is a gazetteer of periodization categories.

== Applications ==

Some early KOS were developed as support systems for abstracting and indexing services, to be used by specially trained searchers. As information was increasingly digitized, KOS became usable by a broader audience, and more complex structures were developed. Prominent examples of KOS outside of LIS include organism taxonomy in biology and the periodic table of elements in chemistry. Others are the SIC and NAICS classification systems for industry and business, and the AGROVOC agricultural controlled vocabulary.

== Challenges ==

The study and design of KOS is an ongoing topic of discussion among KO scholars.
=== Terminology ===

"[There is] a serious lack of vocabulary control in the literature on controlled vocabulary."

Inconsistency of terminology within the study of KOS is a common issue. For instance, "ontology" is used both for a specific type of KOS and as a generic term for any KOS. The terms "taxonomy", "classification", and "categorization" are also sometimes used interchangeably.

=== Bias ===

As knowledge can be historically and culturally biased, scholars have also discussed how KOS themselves can perpetuate harmful practices or stereotypes. For example, a number of concerns and criticisms have been raised about the classification of mental disorders in the Diagnostic and Statistical Manual of Mental Disorders, contributing to ongoing revisions. Ethical and intentional design approaches have been proposed for multi-perspective KOS in an effort to mitigate bias and other harmful practices.

=== Obsolescence ===

The possible obsolescence of the thesaurus and other simpler KOS has been a topic of debate. The debate has been driven by increasingly complex ontologies, the growing use of "Google-like retrieval systems", and the shift of KO theory and research away from LIS and toward computer science. Supporters of thesauri argue for their continued usefulness in metadata enrichment, vocabulary mapping, and web services. They also point to thesauri's use in specific domains such as corporate intranets and digital image libraries.

  • NAPLPS

    NAPLPS (North American Presentation Layer Protocol Syntax) is a graphics language originally intended for use with videotex and teletext services. NAPLPS was derived from the Telidon system developed in Canada, with a small number of additions from AT&T Corporation. NAPLPS later served as the basis for several other microcomputer-based graphics systems.

== History ==

The Canadian Communications Research Centre (CRC), based in Ottawa, had been working on various graphics systems since the late 1960s, much of it led by Herb Bown. Through the 1970s they turned their attention to building a system of "picture description instructions", which encoded graphics commands as a text stream. Graphics were encoded as a series of instructions (graphics primitives), each represented by a single ASCII character. Graphic coordinates were encoded as multiple 6-bit groups of XY coordinate data. These groups were flagged to place them in the printable ASCII range, so that they could be transmitted with conventional text transmission techniques. The ASCII SI/SO (shift in/shift out) characters separated the text portions of a transmitted "page" from the graphic portions. These instructions were decoded by separate programs to produce graphics output, for instance on a plotter. Other work produced a fully interactive version. In 1975, the CRC gave Norpak a contract to develop an interactive graphics terminal that could decode the instructions and show the result on a color display.

During this period, a number of companies were developing the first teletext systems, notably the BBC's Ceefax system. Ceefax encoded character data into the lines of the vertical blanking interval of normal television signals, where it could not be seen on-screen. A buffer and decoder in the user's television then converted this data into "pages" of text on the display.
The Independent Broadcasting Authority quickly introduced its own ORACLE system. The two organizations subsequently agreed to use a single standard, the "Broadcast Teletext Specification", which later became World System Teletext. At about the same time, other organizations were developing videotex systems, which were similar to teletext except that they used modems rather than television signals to transmit their data. This was potentially slower and tied up a telephone line, but had the major advantage of allowing the user to send data back to the sender. The UK's General Post Office developed a system using the Ceefax/ORACLE standard, launching it as Prestel. France, meanwhile, took the first steps toward its ultimately very successful Minitel system, using a rival display standard called Antiope.

By 1977, the Norpak system was running, and building on this work the CRC decided to create its own teletext/videotex system. Unlike the systems being rolled out in Europe, the CRC system was designed from the start to run over any combination of communications links. For instance, it could use the vertical blanking interval to send data to the user, and a modem to return selections to the servers. It could be used as a one-way or two-way system. In teletext mode, character codes were sent to users' televisions by encoding them as dot patterns in the vertical blanking interval of the video signal. Various technical "tweaks" and details of the NTSC signals used by North American televisions allowed the downstream videotex channel to increase to 600 bit/s, about twice that used in the European systems. In videotex mode, Bell 202 modems were typical, offering a 1,200 bit/s download rate. A set-top box attached to the TV decoded these signals back into text and graphics pages, which the user could select among. The system was publicly launched as Telidon on August 15, 1978.
Compared to the European standards, the CRC system was faster, bidirectional, and offered real graphics as opposed to simple character graphics. The downside was that it required much more advanced decoders, typically featuring Zilog Z80 or Motorola 6809 processors with RGB and/or RF output. Innovation, Science and Economic Development Canada (then the Department of Communications) launched a four-year plan to fund public roll-outs of the technology. The aim was to spur the development of a commercial Telidon system.

AT&T Corporation was so impressed by Telidon that it decided to join the project. AT&T added a number of useful extensions, notably the ability to define original graphics commands (macros) and character sets (DRCS). It also proposed algorithms for proportionally spaced text, which greatly improved the quality of the displayed pages. A joint CSA/ANSI working group (X3L2.1) revised the specifications, which were submitted for standardization. In 1983, they became CSA T500 and ANSI X3.110, or NAPLPS. The data encoding system was also standardized as the NABTS (North American Broadcast Teletext Specification) protocol.

Business models for Telidon services were poorly developed. In the UK, teletext was supported by one of only two large companies, each with a whole revenue model based on a read-only medium (television). In North America, by contrast, Telidon was offered by companies that worked on a subscriber basis.

== One-way systems ==

Telidon-based teletext was tested in a few North American trials in the early 1980s, including CBC IRIS, TVOntario, and the MTS-sponsored Project IDA. NAPLPS was also part of the NABTS teletext standard, for the encoding and display of teletext pages. In the late 1980s and early 1990s, affiliates of the regional sports network group SportsChannel ran a service called Sports Plus Network. It carried sports news and scores while SportsChannel was not otherwise on the air.
The screens, which frequently featured team logos or likenesses of players in addition to text, were drawn entirely with NAPLPS graphics. They resembled Prodigy pages loading over a modem, though slightly faster.

== Two-way systems ==

Various two-way systems using NAPLPS appeared in North America in the early 1980s. The biggest North American examples were Knight Ridder's Viewtron (based in Miami) and the Los Angeles Times' Gateway service (based in Orange County). Both used the Sceptre NAPLPS terminal from AT&T, which contained a slow modem that connected to host computers over the consumer's telephone line. The Sceptre was expensive whether purchased or rented. Despite huge investments by their parent companies, neither Viewtron nor Gateway lasted into the second half of the decade.

Another system, Keyfax, was developed by Keycom Electronic Publishing. Keycom was a joint venture of Honeywell, Centel (since acquired by Sprint) and Field Enterprises, then-owner of the Chicago Sun-Times newspaper. Keyfax had originally been a WST teletext service. It was broadcast overnight on Field's Chicago television station WFLD-32 and through the VBI of both WFLD and national superstation WTBS. In a last-ditch effort to save the service, Keyfax was converted into a subscription service using a proprietary NAPLPS terminal device. It did not work, and Keyfax had ceased operations by the end of 1986.

Other early-1980s NAPLPS technology was deployed in Canada, both as a way for rural Canadians to get news and weather information and as the platform for touchscreen information kiosks. In Vancouver these were featured at Expo 86. The kiosks became ubiquitous in Toronto under the name Teleguide, and were deployed in many shopping centres and at major tourist attractions. Toronto was the North American nexus of NAPLPS and the home of Norpak, the most successful of the NAPLPS-oriented developers.
Norpak created and sold hardware and software for NAPLPS development and display. TVOntario also developed NAPLPS content creation software. London, Ontario-based Cableshare used NAPLPS as the basis of touch-screen information kiosks for shopping malls, the flagship of which was deployed at Toronto's Eaton Centre. The system relied on an 8085-based microcomputer that drove several NAPLPS terminals fitted with touch screens, all communicating via Datapac with a back-end database. The system offered news, weather and sports information along with shopping mall guides and coupons. Cableshare also developed and sold a leading NAPLPS page creation utility called the "Picture Painter." In the late 1980s, Tribune Media Services (TMS) and the Associated Press operated a cable television channel called AP News Plus that provided NAPLPS-based news screens to cable television subscribers in many U.S. cities. The news pages were created and edited by TMS staffers working on an Atex editing system in Orlando, Florida, and sent by satellite to NAPLPS decoder devices located at the local cable television companies. Among the firms providing technology to TMS and the Associated Press for the AP News Plus channel was Minneapolis-based Electronic Publishers Inc. (1985–1988). In 1981, two amateur radio operators (VE3FTT and VE3GQW) received special permission from the Canad

  • List of algorithms

    List of algorithms

    An algorithm is a fundamental set of rules or defined procedures, typically designed to solve a specific problem or a broad class of problems. Put simply, algorithms define the processes, rules, or methodologies to be followed in calculations, data processing, data mining, pattern recognition, automated reasoning, and other problem-solving operations. With the increasing automation of services, more and more decisions are being made by algorithms; general examples include risk assessments, anticipatory policing, and pattern recognition technology. The following is a list of well-known algorithms.
    == Automated planning ==
    == Combinatorial algorithms ==
    === General combinatorial algorithms ===
    Brent's algorithm: finds a cycle in function value iterations using only two iterators
    Floyd's cycle-finding algorithm: finds a cycle in function value iterations
    Gale–Shapley algorithm: solves the stable matching problem
    Pseudorandom number generators (uniformly distributed; see also List of pseudorandom number generators for other PRNGs with varying degrees of convergence and varying statistical quality):
      ACORN generator
      Blum Blum Shub
      Lagged Fibonacci generator
      Linear congruential generator
      Mersenne Twister
    === Graph algorithms ===
    Blossom algorithm: constructs a maximum-cardinality matching in a general graph
    Coloring algorithm: algorithms for graph (vertex or edge) coloring, subject to constraints (e.g. proper coloring or list coloring)
    Hopcroft–Karp algorithm: finds a maximum-cardinality matching in a bipartite graph
    Hungarian algorithm: finds a minimum-cost perfect matching in a weighted bipartite graph (the assignment problem)
    Prüfer coding: converts between a labeled tree and its Prüfer sequence
    Tarjan's off-line lowest common ancestors algorithm: computes lowest common ancestors for pairs of nodes in a tree
    Topological sort: finds a linear order of the nodes (e.g. jobs) of a directed acyclic graph that respects their dependencies
    ==== Graph drawing ====
    Coin graph drawing algorithms for finite connected planar graphs (approximately computing the circle packing given by the Koebe–Andreev–Thurston theorem; see also Fáry's theorem on straight-line drawings of planar graphs)
    Force-based algorithms (also known as force-directed algorithms or spring-based algorithms)
    Spectral layout
    ==== Network theory ====
    Network analysis
      Link analysis
        Girvan–Newman algorithm: detects communities in complex systems
        Web link analysis
          Hyperlink-Induced Topic Search (HITS) (also known as hubs and authorities)
          PageRank
          TrustRank
    Flow networks
      Dinic's algorithm: a strongly polynomial algorithm for computing the maximum flow in a flow network
      Edmonds–Karp algorithm: an implementation of Ford–Fulkerson
      Ford–Fulkerson algorithm: computes the maximum flow in a graph
      Karger's algorithm: a Monte Carlo method to compute the minimum cut of a connected graph
      Push–relabel algorithm: computes a maximum flow in a graph
    ==== Routing for graphs ====
    Edmonds' algorithm (also known as Chu–Liu/Edmonds' algorithm): finds maximum or minimum branchings
    Euclidean minimum spanning tree: algorithms for computing the minimum spanning tree of a set of points in the plane
    Longest path problem: find a simple path of maximum length in a given graph
    Minimum spanning tree
      Borůvka's algorithm
      Kruskal's algorithm
      Prim's algorithm
      Reverse-delete algorithm
    Nonblocking minimal spanning switch (e.g. for a telephone exchange)
    Shortest path problem
      Bellman–Ford algorithm: computes shortest paths in a weighted graph (where some of the edge weights may be negative)
      Dijkstra's algorithm: computes shortest paths in a graph with non-negative edge weights
      Floyd–Warshall algorithm: solves the all-pairs shortest path problem in a weighted, directed graph
      Johnson's algorithm: solves the all-pairs shortest path problem in a sparse, weighted, directed graph
    Transitive closure problem: find the transitive closure of a given binary relation
    Traveling salesman problem
      Christofides algorithm
      Nearest neighbour algorithm
    Vehicle routing problem
      Clarke and Wright Saving algorithm
    Warnsdorff's rule: a heuristic method for solving the Knight's tour problem
    ==== Graph search ====
    A*: special case of best-first search that uses heuristics to improve speed
    B*: a best-first graph search algorithm that finds the least-cost path from a given initial node to any goal node (out of one or more possible goals)
    Backtracking: abandons partial solutions as soon as they are found unable to lead to a valid complete solution
    Beam search: a heuristic search algorithm that optimizes best-first search by reducing its memory requirement
    Beam stack search: integrates backtracking with beam search
    Best-first search: traverses a graph in the order of likely importance using a priority queue
    Bidirectional search: finds the shortest path from an initial vertex to a goal vertex in a directed graph
    Breadth-first search: traverses a graph level by level
    Brute-force search: an exhaustive and reliable search method, but computationally inefficient in many applications
    D*: an incremental heuristic search algorithm
    Depth-first search: traverses a graph branch by branch
    Dijkstra's algorithm: a special case of A* for which no heuristic function is used
    General Problem Solver: a seminal theorem-proving algorithm intended to work as a universal problem solver machine
    Iterative deepening depth-first search (IDDFS): a state space search strategy
    Jump point search: an optimization to A* which may reduce computation time by an order of magnitude using further heuristics
    Lexicographic breadth-first search (also known as Lex-BFS): a linear time algorithm for ordering the vertices of a graph
    SSS*: state space search traversing a game tree in a best-first fashion similar to that of the A* search algorithm
    Uniform-cost search: a tree search that finds the lowest-cost route where costs vary
    ==== Subgraphs ====
    Cliques
      Bron–Kerbosch algorithm: a technique for finding maximal cliques in an undirected graph
      MaxCliqueDyn maximum clique algorithm: finds a maximum clique in an undirected graph
    Strongly connected components
      Kosaraju's algorithm
      Path-based strong component algorithm
      Tarjan's strongly connected components algorithm
    Subgraph isomorphism problem
    === Sequence algorithms ===
    ==== Approximate sequence matching ====
    Bitap algorithm: fuzzy algorithm that determines if strings are approximately equal
    Phonetic algorithms
      Daitch–Mokotoff Soundex: a Soundex refinement which allows matching of Slavic and Germanic surnames
      Double Metaphone: an improvement on Metaphone
      Match rating approach: a phonetic algorithm developed by Western Airlines
      Metaphone: an algorithm for indexing words by their sound, when pronounced in English
      NYSIIS: a phonetic algorithm that improves on Soundex
      Soundex: a phonetic algorithm for indexing names by sound, as pronounced in English
    String metrics: compute a similarity or dissimilarity (distance) score between two text strings
      Damerau–Levenshtein distance: computes a distance measure between two strings, improving on Levenshtein distance
      Dice's coefficient (also known as the Dice coefficient): a similarity measure related to the Jaccard index
      Hamming distance: the number of positions at which two equal-length strings differ
      Jaro–Winkler distance: a measure of similarity between two strings
      Levenshtein edit distance: computes a metric for the amount of difference between two sequences
    Trigram search: search for text when the exact syntax or spelling of the target object is not precisely known
    ==== Selection algorithms ====
    Introselect
    Quickselect
    ==== Sequence search ====
    Linear search: locates an item in an unsorted sequence
    Selection algorithm: finds the kth largest item in a sequence
    Sorted lists
      Binary search algorithm: locates an item in a sorted sequence
      Eytzinger binary search: cache-friendly binary search algorithm
      Fibonacci search technique: searches a sorted sequence using a divide and conquer algorithm that narrows down possible locations with the aid of Fibonacci numbers
      Jump search (or block search): linear search on a smaller subset of the sequence
      Predictive search: binary-like search that factors in the magnitude of the search term relative to the high and low values in the search; sometimes called dictionary search or interpolation search
      Uniform binary search: an optimization of the classic binary search algorithm
    Ternary search: a technique for finding the minimum or maximum of a function that is either strictly increasing and then strictly decreasing or vice versa
    ==== Sequence merging ====
    k-way merge algorithm
    Simple merge algorithm
    Union (merge, with elements on the output not repeated)
    ==== Sequence permutations ====
    Fisher–Yates shuffle (also known as the Knuth shuffle): randomly shuffles a finite sequence
    Heap's permutation generation algorithm: interchanges elements to generate the next permutation
    Schensted algorithm: constructs a pair of Young tableaux from a permutation
    Steinhaus–Johnson–Trotter algorithm (also known as the Johnson–Trotter algorithm):
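    To make a few entries in the list concrete, here is a minimal Python sketch of three of them: Floyd's cycle-finding algorithm, Levenshtein edit distance (computed as a two-row dynamic program), and binary search on a sorted sequence. The function names and the small example function are our own choices; these are textbook formulations written for illustration, not reference implementations.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def floyd_cycle(f: Callable[[T], T], x0: T) -> tuple[int, int]:
    """Floyd's "tortoise and hare" cycle finding.

    Returns (mu, lam): the index where the cycle starts in the sequence
    x0, f(x0), f(f(x0)), ... and the length of the cycle.
    """
    # Phase 1: the hare moves two steps per turn until both meet in the cycle.
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(f(hare))
    # Phase 2: restart the tortoise at x0; stepping both once per turn,
    # they meet at the first element of the cycle.
    mu, tortoise = 0, x0
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(hare)
        mu += 1
    # Phase 3: walk once around the cycle to measure its length.
    lam, hare = 1, f(tortoise)
    while tortoise != hare:
        hare = f(hare)
        lam += 1
    return mu, lam


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions."""
    # At the start of row i, prev[j] is the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def binary_search(items: Sequence[T], target: T) -> int:
    """Return an index of target in the sorted sequence items, or -1."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1
```

    For example, `floyd_cycle(lambda x: [6, 6, 0, 1, 4, 3, 3, 4, 0][x], 2)` follows 2, 0, 6, 3, 1, 6, ... and returns `(2, 3)`: the cycle 6, 3, 1 starts at index 2 and has length 3.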

  • Artificial intelligence in government

    Artificial intelligence in government

    Artificial intelligence (AI) has a range of uses in government. It can be used to further public policy objectives (in areas such as emergency services, health and welfare), as well as to help the public interact with the government (through the use of virtual assistants, for example). According to the Harvard Business Review, "Applications of artificial intelligence to the public sector are broad and growing, with early experiments taking place around the world." Hila Mehr from the Ash Center for Democratic Governance and Innovation at Harvard University notes that AI in government is not new: postal services were using machine methods in the late 1990s to recognise handwriting on envelopes and route letters automatically. The use of AI in government comes with significant benefits, including efficiencies resulting in cost savings (for instance, by reducing the number of front-office staff) and reduced opportunities for corruption. However, it also carries risks (described below). == Uses of AI in government == The potential uses of AI in government are wide and varied, with Deloitte stating that "Cognitive technologies could eventually revolutionize every facet of government operations". Mehr suggests that six types of government problems are appropriate for AI applications: resource allocation, where administrative support is required to complete tasks more quickly; large datasets, where these are too large for employees to work with efficiently and multiple datasets could be combined to provide greater insights; expert shortage, where basic questions could be answered and niche issues can be learned; predictable scenarios, where historical data makes the situation predictable; procedural tasks, which are repetitive and whose inputs or outputs have binary answers; and diverse data, where data takes various forms (such as visual and linguistic) and needs to be summarized regularly. 
Mehr states that "While applications of AI in government work have not kept pace with the rapid expansion of AI in the private sector, the potential use cases in the public sector mirror common applications in the private sector." Potential and actual uses of AI in government can be divided into three broad categories: those that contribute to public policy objectives, those that assist public interactions with the government, and other uses. === Contributing to public policy objectives === There is a range of examples of how AI can contribute to public policy objectives. These include: receiving benefits at job loss, retirement, bereavement and childbirth almost immediately, in an automated way (without requiring any action from citizens); social insurance service provision; classifying emergency calls by urgency (like the system used by the Cincinnati Fire Department in the United States); detecting and preventing the spread of diseases; assisting public servants in making welfare payments and immigration decisions; adjudicating bail hearings; triaging health care cases; monitoring social media for public feedback on policies; monitoring social media to identify emergency situations; identifying fraudulent benefits claims; predicting crime and recommending optimal police presence; predicting traffic congestion and car accidents; anticipating road maintenance requirements; identifying breaches of health regulations; providing personalised education to students; marking exam papers; and assisting with defence and national security (see Artificial intelligence § Military and Applications of artificial intelligence § Other fields in which AI methods are implemented, respectively). Artificial intelligence in China has been used to drive both political and economic markets. In 2019, Shanghai's government rolled out 100 billion yuan to assist in funding enterprises that used AI to introduce 22 new policy agendas. 
Shanghai invested in these enterprises to attract top international talent in order to set up the Shanghai Municipal Big Data Center. City Brain AI is an urban management platform made by Alibaba. China uses City Brain AI to maintain a significant share of capital investment through public and state-owned enterprises. With City Brain AI, the synergy between the public and private sectors is more than capital-driven: the blend of public and private shareholding operates through the role of provincial and sub-provincial governments, both of which hold control over the social and economic direction that City Brain AI takes. === Assisting public interactions with government === AI can be used to help members of the public interact with government and access government services, for example by answering questions using virtual assistants or chatbots (see below), directing requests to the appropriate area within government, filling out forms, assisting with searching documents (e.g. IP Australia's trade mark search), and scheduling appointments. Various governments, including those of Australia and Estonia, have implemented virtual assistants to help citizens navigate services, with applications ranging from tax inquiries to life-event registrations. === Gerrymandering === Gerrymandering is a method of influencing the political process by drawing district boundaries in favor of incumbent parties. Academic researchers Wendy Tam Cho and Bruce Cain have proposed partially automating the map-drawing process with an AI system to reduce partisan gerrymandering. Even with such a system, the process may still be manipulated to favor partisan interests, so the researchers emphasized the importance of transparency and human involvement. === Other uses === Other uses of AI in government include translation, language interpretation (pioneered by the European Commission's Directorate General for Interpretation and Florika Fink-Hooijer), and drafting documents. == Potential benefits == AI offers potential efficiencies and cost savings for government. For example, Deloitte has estimated that automation could save US Government employees between 96.7 million and 1.2 billion hours a year, resulting in potential savings of between $3.3 billion and $41.1 billion a year. The Harvard Business Review has stated that while this may lead a government to reduce employee numbers, "Governments could instead choose to invest in the quality of its services. They can re-employ workers' time towards more rewarding work that requires lateral thinking, empathy, and creativity—all things at which humans continue to outperform even the most sophisticated AI program." == Risks == Risks associated with the use of AI in government include susceptibility to bias, a lack of transparency in how an AI application makes decisions, and questions of accountability for any such decisions. For example, a 2026 lawsuit alleged that the U.S. Department of Government Efficiency used ChatGPT to flag and cancel federal humanities grants, including projects on Jewish history and Israeli culture, over some objections from NEH officials, illustrating how automated decision-making could affect funding outcomes.

  • Automatic image annotation

    Automatic image annotation

    Automatic image annotation (also known as automatic image tagging or linguistic indexing) is the process by which a computer system automatically assigns metadata in the form of captions or keywords to a digital image. This application of computer vision techniques is used in image retrieval systems to organize and locate images of interest in a database. The method can be regarded as a type of multi-class image classification with a very large number of classes, potentially as many as the size of the vocabulary. Typically, image analysis in the form of extracted feature vectors, together with the training annotation words, is used by machine learning techniques to automatically apply annotations to new images. The first methods learned the correlations between image features and training annotations. Subsequently, techniques were developed that used machine translation to try to translate the textual vocabulary into a 'visual vocabulary' of clustered regions known as blobs. Later work has included classification approaches, relevance models, and other related methods. The advantage of automatic image annotation over content-based image retrieval (CBIR) is that users can specify queries more naturally. CBIR generally requires users to search by image concepts such as color and texture, or by finding example queries. However, certain features of an example image may override the concept that the user is actually focusing on. Traditional methods of image retrieval, such as those used by libraries, have relied on manually annotated images, which are expensive and time-consuming to produce, especially given the large and constantly growing image databases in existence.
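    The text frames annotation as classification over extracted feature vectors. One simple baseline of that kind, which is not one of the specific methods named above, is nearest-neighbour label transfer: a new image borrows the most frequent keywords of the training images whose feature vectors lie closest to it. The sketch below assumes features have already been extracted; the three-number vectors, keywords, and the `annotate` function are hypothetical toy material for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data: (feature vector, keywords). Real systems extract far
# longer vectors (colour, texture, learned embeddings) from the pixels.
TRAINING = [
    ((0.90, 0.10, 0.10), ["sky", "clouds"]),
    ((0.80, 0.20, 0.10), ["sky", "sea"]),
    ((0.85, 0.15, 0.20), ["sky", "beach"]),
    ((0.10, 0.90, 0.60), ["grass", "field"]),
    ((0.20, 0.80, 0.70), ["grass", "tree"]),
]


def annotate(query, training, k=3, n_words=2):
    """Return up to n_words keywords for `query`, voted on by the k training
    images whose feature vectors are closest in Euclidean distance."""
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(word for _, words in nearest for word in words)
    return [word for word, _ in votes.most_common(n_words)]


print(annotate((0.88, 0.12, 0.12), TRAINING, k=3, n_words=1))  # ['sky']
```

    A query close to the "sky" examples inherits "sky" because all three of its nearest neighbours carry that keyword; ties between less frequent keywords are broken by `Counter` insertion order, which a real system would replace with a better scoring rule.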

  • Deep Instinct

    Deep Instinct

    Deep Instinct is a company that applies deep learning to cybersecurity, using artificial intelligence to prevent and detect malware. The company was named a Technology Pioneer by the World Economic Forum in 2017. Lane Bess has been CEO of the company since 2022. == Overview == Deep Instinct was founded in 2015 by Guy Caspi, Dr. Eli David, and Nadav Maman. The company is headquartered in New York City. In July 2017, NVIDIA became an investor. According to Tom's Hardware, NVIDIA's investment gave the company access to a GPU-based neural network and the CUDA platform, which it was using to achieve maximum vulnerability detection rates. As of February 2020, the company had raised $43 million in a Series C funding round. In April 2021, Deep Instinct raised $100 million in Series D funding to accelerate growth. == Partnerships == In April 2019, Deep Instinct partnered with Chinese artist Guo O. Dong on an art project titled The Persistence of Chaos, consisting of a laptop infected with six pieces of malware that represented $95 billion in damages. The artwork was auctioned with a final bid of $1,345,000. In the same year, Globes reported that HP Inc. had partnered with Deep Instinct to launch its security solution HP SureSense, which has been applied to the EliteBook and ZBook devices.

  • Algorithms and Combinatorics

    Algorithms and Combinatorics

    Algorithms and Combinatorics (ISSN 0937-5511) is a book series in mathematics, particularly in combinatorics and the design and analysis of algorithms. It is published by Springer Science+Business Media and was founded in 1987.
    == Books ==
    The books published in this series include:
    The Simplex Method: A Probabilistic Analysis (Karl Heinz Borgwardt, 1987, vol. 1)
    Geometric Algorithms and Combinatorial Optimization (Martin Grötschel, László Lovász, and Alexander Schrijver, 1988, vol. 2; 2nd ed., 1993)
    Systems Analysis by Graphs and Matroids (Kazuo Murota, 1987, vol. 3)
    Greedoids (Bernhard Korte, László Lovász, and Rainer Schrader, 1991, vol. 4)
    Mathematics of Ramsey Theory (Jaroslav Nešetřil and Vojtěch Rödl, eds., 1990, vol. 5)
    Matroid Theory and its Applications in Electric Network Theory and in Statics (András Recski, 1989, vol. 6)
    Irregularities of Partitions: Papers from the meeting held in Fertőd, July 7–11, 1986 (Gábor Halász and Vera T. Sós, eds., 1989, vol. 8)
    Paths, Flows, and VLSI-Layout: Papers from the meeting held at the University of Bonn, Bonn, June 20–July 1, 1988 (Bernhard Korte, László Lovász, Hans Jürgen Prömel, and Alexander Schrijver, eds., 1990, vol. 9)
    New Trends in Discrete and Computational Geometry (János Pach, ed., 1993, vol. 10)
    Discrete Images, Objects, and Functions in ℤⁿ (Klaus Voss, 1993, vol. 11)
    Linear Optimization and Extensions (Manfred Padberg, 1999, vol. 12)
    The Mathematics of Paul Erdős I (Ronald Graham and Jaroslav Nešetřil, eds., 1997, vol. 13)
    The Mathematics of Paul Erdős II (Ronald Graham and Jaroslav Nešetřil, eds., 1997, vol. 14)
    Geometry of Cuts and Metrics (Michel Deza and Monique Laurent, 1997, vol. 15)
    Probabilistic Methods for Algorithmic Discrete Mathematics (M. Habib, C. McDiarmid, J. Ramirez-Alfonsin, and B. Reed, 1998, vol. 16)
    Modern Cryptography, Probabilistic Proofs and Pseudorandomness (Oded Goldreich, 1999, vol. 17)
    Geometric Discrepancy: An Illustrated Guide (Jiří Matoušek, 1999, vol. 18)
    Applied Finite Group Actions (Adalbert Kerber, 1999, vol. 19)
    Matrices and Matroids for Systems Analysis (Kazuo Murota, 2000, vol. 20; corrected ed., 2010)
    Combinatorial Optimization (Bernhard Korte and Jens Vygen, 2000, vol. 21; 5th ed., 2012)
    The Strange Logic of Random Graphs (Joel Spencer, 2001, vol. 22)
    Graph Colouring and the Probabilistic Method (Michael Molloy and Bruce Reed, 2002, vol. 23)
    Combinatorial Optimization: Polyhedra and Efficiency (Alexander Schrijver, 2003, vol. 24; in three volumes: A. Paths, flows, matchings; B. Matroids, trees, stable sets; C. Disjoint paths, hypergraphs)
    Discrete and Computational Geometry: The Goodman–Pollack Festschrift (B. Aronov, S. Basu, J. Pach, and M. Sharir, eds., 2003, vol. 25)
    Topics in Discrete Mathematics: Dedicated to Jarik Nešetřil on the Occasion of his 60th Birthday (M. Klazar, J. Kratochvíl, M. Loebl, J. Matoušek, R. Thomas, and P. Valtr, eds., 2006, vol. 26)
    Boolean Function Complexity: Advances and Frontiers (Stasys Jukna, 2012, vol. 27)
    Sparsity: Graphs, Structures, and Algorithms (Jaroslav Nešetřil and Patrice Ossona de Mendez, 2012, vol. 28)
    Optimal Interconnection Trees in the Plane (Marcus Brazil and Martin Zachariasen, 2015, vol. 29)
    Combinatorics and Complexity of Partition Functions (Alexander Barvinok, 2016, vol. 30)

  • Materials informatics

    Materials informatics

    Materials informatics is a field of study that applies the principles of informatics and data science to materials science and engineering to improve the understanding, use, selection, development, and discovery of materials. The community frequently uses the term "materials informatics" interchangeably with "data science", "machine learning", and "artificial intelligence". It is an emerging field that aims at high-speed, robust acquisition, management, analysis, and dissemination of diverse materials data, with the goal of greatly reducing the time and risk required to develop, produce, and deploy new materials, a process that generally takes longer than 20 years. The field is not limited to some traditional understandings of the relationship between materials and information. Narrower interpretations include combinatorial chemistry, process modeling, materials databases, materials data management, and product life cycle management. Materials informatics sits at the convergence of these concepts, but also transcends them, and has the potential to achieve greater insights and deeper understanding by applying lessons learned from data gathered on one type of material to others. By gathering appropriate metadata, the value of each individual data point can be greatly expanded. == Databases == Databases are essential for any informatics research and applications. In materials informatics, many databases exist containing both empirical data obtained experimentally and theoretical data obtained computationally. Big data suitable for machine learning is particularly difficult to obtain experimentally, owing to the lack of a standard for reporting data and the variability of experimental environments. This lack of big data has led to growing effort in developing machine learning techniques that work with extremely small data sets. 
On the other hand, large, uniform databases of theoretical density functional theory (DFT) calculations exist. These databases have proven their utility in high-throughput materials screening and discovery. Some common DFT databases and high-throughput (HT) tools are listed below. Databases: MaterialsProject.org, MaterialsWeb.org (University of Florida). HT software: Pymatgen, MPInterfaces, Matminer. == Beyond computational methods? == The concept of materials informatics is addressed by the Materials Research Society. For example, materials informatics was the theme of the December 2006 issue of the MRS Bulletin. The issue was guest-edited by John Rodgers of Innovative Materials, Inc., and David Cebon of Cambridge University, who described the "high payoff for developing methodologies that will accelerate the insertion of materials, thereby saving millions of investment dollars." The editors adopted a limited definition of materials informatics, centered on computational methods to process and interpret data. They stated that "specialized informatics tools for data capture, management, analysis, and dissemination" and "advances in computing power, coupled with computational modeling and simulation and materials properties databases" will enable such accelerated insertion of materials. A broader definition of materials informatics goes beyond the use of computational methods to carry out the same experimentation, viewing materials informatics as a framework in which a measurement or computation is one step in an information-based learning process that uses the power of a collective to achieve greater efficiency in exploration. When properly organized, this framework crosses materials boundaries to uncover fundamental knowledge of the basis of physical, mechanical, and engineering properties. == Challenges == While many believe in the future of informatics in materials development and scale-up, many challenges remain. 
Hill et al. write that "Today, the materials community faces serious challenges to bringing about this data-accelerated research paradigm, including diversity of research areas within materials, lack of data standards, and missing incentives for sharing, among others. Nonetheless, the landscape is rapidly changing in ways that should benefit the entire materials research enterprise." This tension between traditional materials development methodologies and more computational, machine learning, and analytics approaches will likely persist for some time as the materials industry overcomes the cultural barriers to fully embracing such new ways of thinking. == Analogy from Biology == The overarching goals of bioinformatics and systems biology may provide a useful analogy. Andrew Murray of Harvard University expresses the hope that such an approach "will save us from the era of 'one graduate student, one gene, one PhD'." Similarly, the goal of materials informatics is to save us from one graduate student, one alloy, one PhD. Such goals will require more sophisticated strategies and research paradigms than applying data-science methods to the same set of tasks currently undertaken by students.

  • Iteration

    Iteration

    Iteration means repeating a process to generate a (possibly unbounded) sequence of outcomes. Each repetition of the process is a single iteration, and the outcome of each iteration is the starting point of the next iteration. In mathematics and computer science, iteration (along with the related technique of recursion) is a standard element of algorithms. == Mathematics == In mathematics, iteration may refer to the process of iterating a function, i.e. applying a function repeatedly, using the output from one iteration as the input to the next. Iteration of apparently simple functions can produce complex behaviors and difficult problems; for examples, see the Collatz conjecture and juggler sequences. Another use of iteration in mathematics is in iterative methods, which are used to produce approximate numerical solutions to certain mathematical problems. Newton's method is an example of an iterative method. Manual calculation of a number's square root is a common use and a well-known example. == Computing == In computing, iteration is a technique that marks out a block of statements within a computer program for a defined number of repetitions. That block of statements is said to be iterated. A computer programmer might also refer to that block of statements as an iteration. === Implementations === Loops are the most common language constructs for performing iterations. The following pseudocode uses a for loop to "iterate" the line of code between begin and end three times, using the values of i as increments. It is permissible, and often necessary, to use values from other parts of the program outside the bracketed block of statements to perform the desired function. Iterators are an alternative language construct to loops that ensures consistent iteration over specific data structures. They can eventually save time and effort in later coding attempts. 
In particular, an iterator allows one to repeat the same kind of operation at each node of such a data structure, often in some predefined order. Iteratees are purely functional language constructs that accept or reject data during the iterations. === Relation with recursion === Recursion and iteration have different algorithmic definitions, even though they can generate identical results. The primary difference is that recursion can be a solution without prior knowledge of how many times the action must repeat, while a successful iteration requires that foreknowledge. Some programming languages, known as functional programming languages, are designed so that they do not set up a block of statements for explicit repetition, as with the for loop. Instead, those languages exclusively use recursion. Rather than calling a block of code to repeat a predefined number of times, the executing code block instead "divides" the work into a number of separate pieces, after which the code block executes itself on each individual piece. Each piece of work is divided repeatedly until the "amount" of work is as small as possible, at which point the algorithm does that work very quickly. The algorithm then "reverses" and reassembles the pieces into a complete whole. A classic example of recursion is in list-sorting algorithms, such as merge sort. The recursive merge sort algorithm first repeatedly divides the list into consecutive pairs. Each pair is then ordered, then each consecutive pair of pairs, and so forth until the elements of the list are in the desired order. The code below is an example of a recursive algorithm in the Scheme programming language that outputs the same result as the pseudocode under the previous heading. 
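The sketch below is a Python analogue of that comparison, not the pseudocode or Scheme code the text refers to. It produces the values 1, 2, 3 once with a for loop and once with a recursive function, then adds a top-down recursive merge sort that halves the list until each piece has at most one element and merges the sorted pieces back together (so, in effect, pairs are ordered first, then pairs of pairs). All function names are hypothetical.

```python
def loop_three_times() -> list[int]:
    """Iteration: a for loop runs its body a fixed number of times."""
    seen = []
    for i in range(1, 4):  # i takes the values 1, 2, 3
        seen.append(i)
    return seen


def recurse_up_to(i: int, n: int = 3) -> list[int]:
    """Recursion: the same values, produced by a function that calls
    itself until it reaches a base case."""
    if i > n:  # base case: nothing left to produce
        return []
    return [i] + recurse_up_to(i + 1, n)


def merge_sort(xs: list[int]) -> list[int]:
    """Top-down recursive merge sort; returns a new sorted list."""
    if len(xs) <= 1:  # 0 or 1 items: already sorted
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    # Merge step: repeatedly take the smaller head of the two sorted halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One half is exhausted; the rest of the other is already in order.
    return merged + left[i:] + right[j:]
```

Both `loop_three_times()` and `recurse_up_to(1)` return `[1, 2, 3]`; the loop states the repetition count up front, while the recursion stops only when its base case is reached.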
== Education == In some schools of pedagogy, iterations are used to describe the process of teaching or guiding students to repeat experiments, assessments, or projects until more accurate results are found or the student has mastered the technical skill. This idea is found in the old adage, "Practice makes perfect." In particular, "iterative" is defined as the "process of learning and development that involves cyclical inquiry, enabling multiple opportunities for people to revisit ideas and critically reflect on their implication." Unlike iteration in computing and mathematics, iteration in education is not predetermined; instead, the task is repeated until success according to some external criterion (often a test) is achieved.

  • Mixed raster content

    Mixed raster content (MRC) is a method for compressing images that contain both binary-compressible text and continuous-tone components, using image segmentation methods to improve the level of compression and the quality of the rendered image. By separating the image into components with different compressibility characteristics, the most efficient and accurate compression algorithm for each component can be applied. MRC-compressed images are typically packaged into a hybrid file format such as DjVu and sometimes PDF. This allows multiple images, and the instructions to properly render and reassemble them, to be stored within a single file. Some image scanners optionally support MRC when scanning to PDF. A typical manual states that without MRC, the image is generated in a single process, with text and graphics not distinguished. With MRC, separate processes are used for text, graphics, and other elements, producing clearer graphics and sharper text, at the price of slightly slower processing. MRC is recommended to optimise the scanning of documents with harder-to-read text or lower-quality graphics. MRC can also reduce the size of the scanned file, though higher compression using JBIG2 can sometimes lead to character substitution errors in scanned documents. == File format == A form of MRC is defined by international standard bodies as ISO/IEC 16485, or ITU recommendation T.44 (accessible free of charge). It defines a file format with bilevel masks and two data layers in each "stripe" of the image. The mask can be encoded in ITU T.4, JBIG1, or JBIG2, while the images can be JPEG, JBIG1, or run-length encoded color. The format is loosely based on JPEG, with an APP13 segment registered for this purpose. It is not known whether this file format is actually used, as formats like DjVu and PDF have their own ways of defining layers and masks.
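The layer decomposition described above (a bilevel mask plus two data layers) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not ITU T.44 or any real codec: the page is a grayscale image stored as nested lists, a fixed darkness threshold stands in for real segmentation, and the "don't care" pixels of each data layer are filled with that layer's mean value so the layer becomes flatter and easier to compress. The function names and the threshold are hypothetical.

```python
Image = list[list[int]]  # grayscale rows: 0 = black, 255 = white


def mean(values: list[int], default: int) -> int:
    """Rounded mean of values, or default if there are none."""
    return round(sum(values) / len(values)) if values else default


def mrc_split(img: Image, threshold: int = 128) -> tuple[Image, Image, Image]:
    """Split a page into a bilevel mask plus foreground and background layers."""
    # Toy segmentation: dark pixels are treated as text (mask = 1).
    mask = [[1 if p < threshold else 0 for p in row] for row in img]
    pairs = [(p, m) for row, mrow in zip(img, mask) for p, m in zip(row, mrow)]
    fg_fill = mean([p for p, m in pairs if m], 0)
    bg_fill = mean([p for p, m in pairs if not m], 255)
    # Each layer keeps its own pixels and fills the positions it does not own.
    fg = [[p if m else fg_fill for p, m in zip(row, mrow)] for row, mrow in zip(img, mask)]
    bg = [[bg_fill if m else p for p, m in zip(row, mrow)] for row, mrow in zip(img, mask)]
    return mask, fg, bg


def mrc_compose(mask: Image, fg: Image, bg: Image) -> Image:
    """Rebuild the page: foreground pixel where the mask is 1, background elsewhere."""
    return [[f if m else b for m, f, b in zip(mrow, frow, brow)]
            for mrow, frow, brow in zip(mask, fg, bg)]


page = [
    [250, 248, 20, 252],
    [249, 15, 18, 251],
    [200, 210, 205, 30],
]
mask, fg, bg = mrc_split(page)
print(mask)                              # [[0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
print(mrc_compose(mask, fg, bg) == page)  # True
```

In an actual MRC pipeline the compression gain comes after this step: the mask would be coded with a bilevel method such as JBIG2 and the two smoothed layers with a continuous-tone method such as JPEG. This sketch stops at the decomposition.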

  • The Algorithm Auction

    The Algorithm Auction is the world's first auction of computer algorithms. Created by Ruse Laboratories, the initial auction featured seven lots and was held at the Cooper Hewitt, Smithsonian Design Museum on March 27, 2015. Five lots were physical representations of famous code or algorithms, including a signed, handwritten copy of the original Hello, World! C program by its creator Brian Kernighan on dot-matrix printer paper; a printed copy of 5,000 lines of assembly code comprising the earliest known version of Turtle Graphics, signed by its creator Hal Abelson; a necktie containing the six-line qrpff algorithm, capable of decrypting content on a commercially produced DVD video disc; and a pair of drawings representing OkCupid's original Compatibility Calculation algorithm, signed by the company founders. The qrpff lot sold for $2,500. The two other lots were “living algorithms”: one a set of JavaScript tools for building applications that are accessible to the visually impaired, the other a program that converts lines of software code into music. Winning bidders received, along with artifacts related to the algorithms, a full intellectual property license to use, modify, or open-source the code. All lots were sold, with Hello World receiving the most bids. Exhibited alongside the auction lots were a facsimile of the Plimpton 322 tablet on loan from Columbia University, and Nigella, an art-world-facing computer virus named after Nigella Lawson and created by cypherpunk and hacktivist Richard Jones. Sebastian Chan, Director of Digital & Emerging Media at the Cooper–Hewitt, attended the event remotely from Milan, Italy, via a Beam Pro telepresence robot. == Effects == Following the auction, the Museum of Modern Art held a salon titled The Way of the Algorithm, highlighting algorithms as "a ubiquitous and indispensable component of our lives."

  • Time Warp Edit Distance

    In the data analysis of time series, Time Warp Edit Distance (TWED) is a measure of similarity (or dissimilarity) between pairs of discrete time series that controls the relative distortion of the time units of the two series using the physical notion of elasticity. Unlike other distance measures such as DTW (dynamic time warping) or LCS (longest common subsequence), TWED is a metric. Its computational time complexity is $O(n^2)$, but it can be drastically reduced in some specific situations by using a corridor to restrict the search space. Its memory space complexity can be reduced to $O(n)$. It was first proposed in 2009 by P.-F. Marteau.

    == Definition ==

$$
\delta_{\lambda,\nu}(A_1^p, B_1^q) = \min
\begin{cases}
\delta_{\lambda,\nu}(A_1^{p-1}, B_1^{q}) + \Gamma(a'_p \to \Lambda) & \text{delete in } A\\
\delta_{\lambda,\nu}(A_1^{p-1}, B_1^{q-1}) + \Gamma(a'_p \to b'_q) & \text{match or substitution}\\
\delta_{\lambda,\nu}(A_1^{p}, B_1^{q-1}) + \Gamma(\Lambda \to b'_q) & \text{delete in } B
\end{cases}
$$

where

$$
\begin{aligned}
\Gamma(a'_p \to \Lambda) &= d_{LP}(a'_p, a'_{p-1}) + \nu \cdot (t_{a_p} - t_{a_{p-1}}) + \lambda\\
\Gamma(a'_p \to b'_q) &= d_{LP}(a'_p, b'_q) + d_{LP}(a'_{p-1}, b'_{q-1}) + \nu \cdot \left(|t_{a_p} - t_{b_q}| + |t_{a_{p-1}} - t_{b_{q-1}}|\right)\\
\Gamma(\Lambda \to b'_q) &= d_{LP}(b'_q, b'_{q-1}) + \nu \cdot (t_{b_q} - t_{b_{q-1}}) + \lambda
\end{aligned}
$$

Here $d_{LP}$ is the $L_p$ distance between samples, $t_{a_p}$ is the timestamp of sample $a'_p$, $\nu$ is the stiffness parameter and $\lambda$ is the constant penalty charged for each deletion. The recursion is initialized as

$$
\begin{aligned}
\delta_{\lambda,\nu}(A_1^0, B_1^0) &= 0,\\
\delta_{\lambda,\nu}(A_1^0, B_1^j) &= \infty \quad \text{for } j \ge 1,\\
\delta_{\lambda,\nu}(A_1^i, B_1^0) &= \infty \quad \text{for } i \ge 1,
\end{aligned}
$$

with $a'_0 = b'_0 = 0$.

    === Implementations === An implementation of the TWED algorithm in C with a Python wrapper is also available. TWED is also implemented in the Time Series Subsequence Search Python package (TSSEARCH for short), available at [1]. An R implementation of TWED has been integrated into TraMineR, an R package for mining, describing and visualizing sequences of states or events, and more generally discrete sequence data. Additionally, cuTWED is a CUDA-accelerated implementation of TWED which uses an improved algorithm due to G. Wright (2020). This method is linear in memory and massively parallelized. cuTWED is written in CUDA C/C++, comes with Python bindings, and also includes Python bindings for Marteau's reference C implementation.
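The recursion and initialization above translate directly into a dynamic-programming table. The following is a minimal Python sketch, not Marteau's reference implementation or any of the packages listed above. It assumes one-dimensional samples, so that $d_{LP}(x, y) = |x - y|$, and pads both the samples and the timestamps with a leading 0 (the zero timestamp is an assumption; the definition only fixes $a'_0 = b'_0 = 0$). The function name `twed` and its signature are hypothetical. Each cell also records which of the three cases produced its minimum, so backtracking recovers the most cost-efficient path.

```python
import math


def twed(a, ta, b, tb, nu, lam):
    """Time Warp Edit Distance between two 1-D series, with backtracking.

    a, b: sample values (lists); ta, tb: their increasing timestamps.
    nu: stiffness parameter; lam: constant deletion penalty.
    Returns (distance, path), where path lists the (i, j) cells of the
    cheapest alignment, ending at (len(a), len(b)).
    """
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both series must be non-empty")
    if len(a) != len(ta) or len(b) != len(tb):
        raise ValueError("each sample needs exactly one timestamp")
    # Pad with a'_0 = b'_0 = 0 as in the definition; padding the
    # timestamps with t_0 = 0 is an assumption of this sketch.
    a, ta = [0.0] + list(a), [0.0] + list(ta)
    b, tb = [0.0] + list(b), [0.0] + list(tb)
    n, m = len(a) - 1, len(b) - 1

    # d[i][j] holds delta(A_1^i, B_1^j): row 0 and column 0 are infinite
    # except d[0][0] = 0. prev[i][j] is the cell the minimum came from.
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    prev = [[None] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Scalar samples: the L_p distance reduces to |x - y|.
            delete_a = (d[i - 1][j] + abs(a[i] - a[i - 1])
                        + nu * (ta[i] - ta[i - 1]) + lam)
            match = (d[i - 1][j - 1] + abs(a[i] - b[j]) + abs(a[i - 1] - b[j - 1])
                     + nu * (abs(ta[i] - tb[j]) + abs(ta[i - 1] - tb[j - 1])))
            delete_b = (d[i][j - 1] + abs(b[j] - b[j - 1])
                        + nu * (tb[j] - tb[j - 1]) + lam)
            d[i][j], prev[i][j] = min((delete_a, (i - 1, j)),
                                      (match, (i - 1, j - 1)),
                                      (delete_b, (i, j - 1)))

    # Backtrack from (n, m) to (0, 0) along the recorded predecessors.
    path = []
    i, j = n, m
    while (i, j) != (0, 0):
        path.append((i, j))
        i, j = prev[i][j]
    path.reverse()
    return d[n][m], path


dist, path = twed([1, 2], [1, 2], [1, 3], [1, 2], nu=1.0, lam=1.0)
print(dist, path)  # 1.0 [(1, 1), (2, 2)]
```

A vertical step in the path, from (i - 1, j) to (i, j), is a deletion in A; a horizontal step is a deletion in B; a diagonal step is a match or substitution. Storing the full table keeps backtracking simple but costs $O(nm)$ memory; when only the distance is needed, the previous row suffices, which is the basis of the linear-memory variants mentioned above.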
