Gerrit (software)

Gerrit (software)

Gerrit ( GERR-it) is a free, web-based team code collaboration tool. Software developers in a team can review each other's modifications on their source code using a Web browser and approve or reject those changes. It integrates closely with Git, a distributed version control system. Gerrit is a fork of Rietveld, a code review tool for Subversion. Both are named after Dutch designer Gerrit Rietveld. == History == Originally written in Python like Rietveld, it is now written in Java (Java EE Servlet) with SQL since version 2 and a custom-made Git-based database (NoteDb) since version 3. In versions 2.0–2.16 Gerrit used Google Web Toolkit for its browser-based front-end. After being developed and used in parallel with GWT for versions 2.14–2.16, a new Polymer web UI replaced the GWT UI in version 3.0.

Data annotation

Data annotation is the process of labeling or tagging relevant metadata within a dataset to enable machines to interpret the data accurately. The dataset can take various forms, including images, audio files, video footage, or text. == Applications == Data is a fundamental component in the development of artificial intelligence (AI). Training AI models, particularly in computer vision and natural language processing, requires large volumes of annotated data. Proper annotation ensures that machine learning algorithms can recognize patterns and make accurate predictions. Common types of data annotation include classification, bounding boxes, semantic segmentation, and keypoint annotation. Data annotation is used in AI-driven fields, including healthcare, autonomous vehicles, retail, security, and entertainment. By accurately labeling data, machine learning models can perform complex tasks such as object detection, sentiment analysis, and speech recognition with greater precision. This growing demand has led to the emergence of specialized sectors and platforms dedicated to AI training and human-in-the-loop workflows, which often utilize Reinforcement Learning from Human Feedback (RLHF) to refine model behavior. == In computer vision == === Image classification === Image classification, also known as image categorization, involves assigning predefined labels to images. Machine learning algorithms trained on classified images can later recognize objects and differentiate between categories. For instance, an AI model trained to recognize furniture styles can distinguish between Georgian and Rococo armchairs. === Semantic segmentation === Semantic segmentation assigns each pixel in an image to a specific class, such as trees, vehicles, humans, or buildings. This type of annotation enables machine learning models to differentiate objects by grouping similar pixels, allowing for a detailed understanding of an image. === Bounding boxes === Bounding box annotation involves drawing rectangular boxes around objects in an image. This technique is commonly used in autonomous driving, security surveillance, and retail analytics to detect and classify objects such as pedestrians, vehicles, and products on store shelves. === 3D cuboids === 3D cuboid annotation enhances traditional bounding boxes by adding depth, enabling models to predict an object's spatial orientation, movement, and size. This method is particularly useful for autonomous vehicles and robotics, where understanding object dimensions and depth is critical. === Polygonal annotation === For objects with irregular shapes, such as curved or multi-sided items, polygonal annotation provides more precise labeling than bounding boxes. This technique is often used in applications that require detailed object recognition, such as medical imaging or aerial mapping. === Keypoint annotation === Keypoint annotation marks specific points on an object, such as facial landmarks or body joints, to enable tracking and motion analysis. This method is widely used in facial recognition, emotion detection, sports analytics, and augmented reality applications.

Source criticism

Source criticism (or information evaluation) is the process of evaluating an information source, i.e.: a document, a person, a speech, a fingerprint, a photo, an observation, or anything used in order to obtain knowledge. In relation to a given purpose, a given information source may be more or less valid, reliable or relevant. Broadly, "source criticism" is the interdisciplinary study of how information sources are evaluated for given tasks. == Meaning == Problems in translation: The Danish word kildekritik, like the Norwegian word kildekritikk and the Swedish word källkritik, derived from the German Quellenkritik and is closely associated with the German historian Leopold von Ranke (1795–1886). Historian Wolfgang Hardtwig wrote: His [Ranke's] first work Geschichte der romanischen und germanischen Völker von 1494–1514 (History of the Latin and Teutonic Nations from 1494 to 1514) (1824) was a great success. It already showed some of the basic characteristics of his conception of Europe, and was of historiographical importance particularly because Ranke made an exemplary critical analysis of his sources in a separate volume, Zur Kritik neuerer Geschichtsschreiber (On the Critical Methods of Recent Historians). In this work he raised the method of textual criticism used in the late eighteenth century, particularly in classical philology to the standard method of scientific historical writing. (Hardtwig, 2001, p. 12739) Historical theorist Chris Lorenz wrote: The larger part of the nineteenth and twentieth centuries would be dominated by the research-oriented conception of historical method of the so-called Historical School in Germany, led by historians as Leopold Ranke and Berthold Niebuhr. Their conception of history, long been regarded as the beginning of modern, 'scientific' history, harked back to the 'narrow' conception of historical method, limiting the methodical character of history to source criticism. (Lorenz, 2001) In the early 21st century, source criticism is a growing field in, among other fields, library and information science. In this context source criticism is studied from a broader perspective than just, for example, history, classical philology, or biblical studies (but there, too, it has more recently received new attention). == Principles == The following principles are from two Scandinavian textbooks on source criticism, written by the historians Olden-Jørgensen (1998) and Thurén (1997): Human sources may be relics (e.g. a fingerprint) or narratives (e.g. a statement or a letter). Relics are more credible sources than narratives. A given source may be forged or corrupted; strong indications of the originality of the source increases its reliability. The closer a source is to the event which it purports to describe, the more one can trust it to give an accurate description of what really happened A primary source is more reliable than a secondary source, which in turn is more reliable than a tertiary source and so on. If a number of independent sources contain the same message, the credibility of the message is strongly increased. The tendency of a source is its motivation for providing some kind of bias. Tendencies should be minimized or supplemented with opposite motivations. If it can be demonstrated that the witness (or source) has no direct interest in creating bias, the credibility of the message is increased. Two other principles are: Knowledge of source criticism cannot substitute for subject knowledge: "Because each source teaches you more and more about your subject, you will be able to judge with ever-increasing precision the usefulness and value of any prospective source. In other words, the more you know about the subject, the more precisely you can identify what you must still find out". (Bazerman, 1995, p. 304). The reliability of a given source is relative to the questions put to it. "The empirical case study showed that most people find it difficult to assess questions of cognitive authority and media credibility in a general sense, for example, by comparing the overall credibility of newspapers and the Internet. Thus these assessments tend to be situationally sensitive. Newspapers, television and the Internet were frequently used as sources of orienting information, but their credibility varied depending on the actual topic at hand" (Savolainen, 2007). The following questions are often good ones to ask about any source according to the American Library Association (1994) and Engeldinger (1988): How was the source located? What type of source is it? Who is the author and what are the qualifications of the author in regard to the topic that is discussed? When was the information published? In which country was it published? What is the reputation of the publisher? Does the source show a particular cultural or political bias? For literary sources complementing criteria are: Does the source contain a bibliography? Has the material been reviewed by a group of peers, or has it been edited? How does the article/book compare with similar articles/books? == Levels of generality == Some principles of source criticism are universal, other principles are specific for certain kinds of information sources. There is today no consensus about the similarities and differences between source criticism in the natural science and humanities. Logical positivism claimed that all fields of knowledge were based on the same principles. Much of the criticism of logical positivism claimed that positivism is the basis of the sciences, whereas hermeneutics is the basis of the humanities. This was, for example, the position of Jürgen Habermas. A newer position, in accordance with, among others, Hans-Georg Gadamer and Thomas Kuhn, understands both science and humanities as determined by researchers' preunderstanding and paradigms. Hermeneutics is thus a universal theory. The difference is, however, that the sources of the humanities are themselves products of human interests and preunderstanding, whereas the sources of the natural sciences are not. Humanities are thus "doubly hermeneutic". Natural scientists, however, are also using human products (such as scientific papers) which are products of preunderstanding (and can lead to, for example, academic fraud). == Contributing fields == === Epistemology === Epistemological theories are the basic theories about how knowledge is obtained and are thus the most general theories about how to evaluate information sources. Empiricism evaluates sources by considering the observations (or sensations) on which they are based. Sources without basis in experience are not seen as valid. Rationalism provides low priority to sources based on observations. In order to be meaningful, observations must be explained by clear ideas or concepts. It is the logical structure and the well definedness that is in focus in evaluating information sources from the rationalist point of view. Historicism evaluates information sources on the basis of their reflection of their sociocultural context and their theoretical development. Pragmatism evaluate sources on the basis of how their values and usefulness to accomplish certain outcomes. Pragmatism is skeptical about claimed neutral information sources. The evaluation of knowledge or information sources cannot be more certain than is the construction of knowledge. If one accepts the principle of fallibilism then one also has to accept that source criticism can never 100% verify knowledge claims. As discussed in the next section, source criticism is intimately linked to scientific methods. The presence of fallacies of argument in sources is another kind of philosophical criterion for evaluating sources. Fallacies are presented by Walton (1998). Among the fallacies are the ad hominem fallacy (the use of personal attack to try to undermine or refute a person's argument) and the straw man fallacy (when one arguer misrepresents another's position to make it appear less plausible than it really is, in order more easily to criticize or refute it.) === Research methodology === Research methods are methods used to produce scholarly knowledge. The methods that are relevant for producing knowledge are also relevant for evaluating knowledge. An example of a book that turns methodology upside-down and uses it to evaluate produced knowledge is Katzer; Cook & Crouch (1998). === Science studies === Studies of quality evaluation processes such as peer review, book reviews and of the normative criteria used in evaluation of scientific and scholarly research. Another field is the study of scientific misconduct. Harris (1979) provides a case study of how a famous experiment in psychology, Little Albert, has been distorted throughout the history of psychology, starting with the author (Watson) himself, general textbook authors, behavior therapists, and a prominent learning theorist. Harris proposes possible causes for these distortions and analyzes the Albert study as an ex

VMDS

VMDS abbreviates the relational database technology called Version Managed Data Store provided by GE Energy as part of its Smallworld technology platform and was designed from the outset to store and analyse the highly complex spatial and topological networks typically used by enterprise utilities such as power distribution and telecommunications. VMDS was originally introduced in 1990 as has been improved and updated over the years. Its current version is 6.0. VMDS has been designed as a spatial database. This gives VMDS a number of distinctive characteristics when compared to conventional attribute only relational databases. == Distributed server processing == VMDS is composed of two parts: a simple, highly scalable data block server called SWMFS (Smallworld Master File Server) and an intelligent client API written in C and Magik. Spatial and attribute data are stored in data blocks that reside in special files called data store files on the server. When the client application requests data it has sufficient intelligence to work out the optimum set of data blocks that are required. This request is then made to SWMFS which returns the data to the client via the network for processing. This approach is particularly efficient and scalable when dealing with spatial and topological data which tends to flow in larger volumes and require more processing then plain attribute data (for example during a map redraw operation). This approach makes VMDS well suited to enterprise deployment that might involve hundreds or even thousands of concurrent clients. == Support for long transactions == Relational databases support short transactions in which changes to data are relatively small and are brief in terms in duration (the maximum period between the start and the end of a transaction is typically a few seconds or less). VMDS supports long transactions in which the volume of data involved in the transaction can be substantial and the duration of the transaction can be significant (days, weeks or even months). These types of transaction are common in advanced network applications used by, for example, power distribution utilities. Due to the time span of a long transaction in this context the amount of change can be significant (not only within the scope of the transaction, but also within the context of the database as a whole). Accordingly, it is likely that the same record might be changed more than once. To cope with this scenario VMDS has inbuilt support for automatically managing such conflicts and allows applications to review changes and accept only those edits that are correct. == Spatial and topological capabilities == As well as conventional relational database features such as attribute querying, join fields, triggers and calculated fields, VMDS has numerous spatial and topological capabilities. This allows spatial data such as points, texts, polylines, polygons and raster data to be stored and analysed. Spatial functions include: find all features within a polygon, calculate the Voronoi polygons of a set of sites and perform a cluster analysis on a set of points. Vector spatial data such as points, polylines and polygons can be given topological attributes that allow complex networks to be modelled. Network analysis engines are provided to answer questions such as find the shortest path between two nodes or how to optimize a delivery route (the travelling salesman problem). A topology engine can be configured with a set of rules that define how topological entities interact with each other when new data is added or existing data edited. == Data abstraction == In VMDS all data is presented to the application as objects. This is different from many relational databases that present the data as rows from a table or query result using say JDBC. VMDS provides a data modelling tool and underlying infrastructure as part of the Smallworld technology platform that allows administrators to associate a table in the database with a Magik exemplar (or class). Magik get and set methods for the Magik exemplar can be automatically generated that expose a table's field (or column). Each VMDS row manifests itself to the application as an instance of a Magik object and is known as an RWO (or real world object). Tables are known as collections in Smallworld parlance. # all_rwos hold all the rwos in the database and is heterogeneous all_rwos << my_application.rwo_set() # valve_collection holds the valve collection valves << all_rwos.select(:collection, {:valve}) number_of_valves << valves.size Queries are built up using predicate objects: # find 'open' valves. open_valves << valves.select(predicate.eq(:operating_status, "open")) number_of_open_valves << open_valves.size _for valve _over open_valves.elements() _loop write(valve.id) _endloop Joins are implemented as methods on the parent RWO. For example, a manager might have several employees who report to him: # get the employee collection. employees << my_application.database.collection(:gis, :employees) # find a manager called 'Steve' and get the first matching element steve << employees.select(predicate.eq(:name, "Steve").and(predicate.eq(:role, "manager")).an_element() # display the names of his direct reports. name is a field (or column) # on the employee collection (or table) _for employee _over steve.direct_reports.elements() _loop write(employee.name) _endloop Performing a transaction: # each key in the hash table corresponds to the name of the field (or column) in # the collection (or table) valve_data << hash_table.new_with( :asset_id, 57648576, :material, "Iron") # get the valve collection directly valve_collection << my_application.database.collection(:gis, :valve) # create an insert transaction to insert a new valve record into the collection a # comment can be provide that describes the transaction transaction << record_transaction.new_insert(valve_collection, valve_data, "Inserted a new valve") transaction.run()

Pol.is

Polis (or Pol.is) is wiki survey software designed for large group collaborations. As a civic technology, Polis allows people to share their opinions and ideas, and its algorithm is intended to elevate ideas that can facilitate better decision-making, especially when there are lots of participants. Polis has been credited for assisting the passage of legislation in Taiwan. Pol.is has been used by governments in the United States, Canada, Singapore, Philippines, Finland, Spain and elsewhere. == History == Pol.is was founded by Colin Megill, Christopher Small, and Michael Bjorkegren after the Occupy Wall Street and Arab Spring movements. In Taiwan, pol.is has been "one of the key parts" of vTaiwan's suite of open-source tools for its citizen engagement efforts arising out of the Sunflower Student Movement. vTaiwan claims that of the 26 national issues related to technology discussed on the platform, 80% led to government action. Pol.is is also utilized by "Join," a national platform for online deliberation run by the Taiwanese government. In 2022, Wired reported that Polis was an influence on the Community Notes project at Twitter. In 2023, Megill advised OpenAI on how to facilitate deliberation at scale in a way that was more efficient than Polis, which still required significant human labor and analysis at the time. He helped to award $1 million in grants to teams working on solving the problem of deliberation at scale. In 2023, Anthropic was also exploring steering model behavior using Polis. In 2025, it helped the county that includes Bowling Green, Kentucky make a 25 year plan by facilitating the collection and review of ideas from thousands of residents, representing 10% of the county. 2,370 of 3,940 unique ideas were agreed-upon by over 80% of survey respondents. Ideas were screened by volunteers if they were redundant to an existing idea, off-topic or obscene. == How it works == Pol.is participants are anonymous and cannot reply directly to others posts, in an effort to avoid personal attacks for users. Its algorithms are designed not for engagement and scrolling, but to find areas of agreement to better understand the nuances of a wide range of opinions. Participants are prompted for ideas and vote on other participants' ideas. == Reception == Andrew Leonard, The Financial Times, and VentureBeat describe Pol.is as a possible antidote to the divisiveness of traditional internet discourse by gamifying consensus. Audrey Tang agreed saying, "Polis is quite well known in that it's a kind of social media that instead of polarizing people to drive so called engagement or addiction or attention, it automatically drives bridge making narratives and statements. So only the ideas that speak to both sides or to multiple sides will gain prominence in Polis." Niall Ferguson argues that the approach to utilize tools like Pol.is and Join in Taiwan empowers ordinary people instead of the elite and protects individual freedoms, providing a contrast to the AI-enhanced panopticon model seen in China. Carl Miller praised the technology as having "gamified finding consensus." Darshana Narayanan, in an op-ed in the Economist, argues that open-source machine-learning-based tools like Polis can help to bypass the influence of special interests or experts. Jamie Susskind cited polis and vTaiwan as a model for democracies, particularly around digital policy issues.

Spanner (database)

Spanner is a distributed SQL database management and storage service developed by Google. It provides features such as global transactions, strongly consistent reads, and automatic multi-site replication and failover. Spanner is used in Google F1, the database for its advertising business Google Ads, as well as Gmail and Google Photos. == Features == Spanner stores large amounts of mutable structured data. Spanner allows users to perform arbitrary queries using SQL with relational data while maintaining strong consistency and high availability for that data with synchronous replication. Key features of Spanner: Transactions can be applied across rows, columns, tables, and databases within a Spanner universe. Clients can control the replication and placement of data using automatic multi-site replication and failover. Replication is synchronous and strongly consistent. Reads are strongly consistent and data is versioned to allow for stale reads: clients can read previous versions of data, subject to garbage collection windows. Supports a native SQL interface for reading and writing data. Support for Graph Query Language == History == Spanner was first described in 2012 for internal Google data centers. Spanner's SQL capability was added in 2017 and documented in a SIGMOD 2017 paper. It became available as part of Google Cloud Platform in 2017, under the name "Cloud Spanner". == Architecture == Spanner uses the Paxos algorithm as part of its operation to shard (partition) data across up to hundreds of servers. It makes heavy use of hardware-assisted clock synchronization using GPS clocks and atomic clocks to ensure global consistency. TrueTime is the brand name for Google's distributed cloud infrastructure, which provides Spanner with the ability to generate monotonically increasing timestamps in data centers around the world. Google's F1 SQL database management system (DBMS) is built on top of Spanner, replacing Google's custom MySQL variant.

Operational data store

An operational data store (ODS) is used for operational reporting and as a source of data for the enterprise data warehouse (EDW). It is a complementary element to an EDW in a decision support environment, and is used for operational reporting, controls, and decision making, as opposed to the EDW, which is used for tactical and strategic decision support. An ODS is a database designed to integrate data from multiple sources for additional operations on the data, for reporting, controls and operational decision support. Unlike a production master data store, the data is not passed back to operational systems. It may be passed for further operations and to the data warehouse for reporting. An ODS should not be confused with an enterprise data hub (EDH). An operational data store will take transactional data from one or more production systems and loosely integrate it, in some respects it is still subject oriented, integrated and time variant, but without the volatility constraints. This integration is mainly achieved through the use of EDW structures and content. An ODS is not an intrinsic part of an EDH solution, although an EDH may be used to subsume some of the processing performed by an ODS and the EDW. An EDH is a broker of data. An ODS is certainly not. Because the data originates from multiple sources, the integration often involves cleaning, resolving redundancy and checking against business rules for integrity. An ODS is usually designed to contain low-level or atomic (indivisible) data (such as transactions and prices) with limited history that is captured "real time" or "near real time" as opposed to the much greater volumes of data stored in the data warehouse generally on a less-frequent basis. == General use == The general purpose of an ODS is to integrate data from disparate source systems in a single structure, using data integration technologies like data virtualization, data federation, or extract, transform, and load (ETL). This will allow operational access to the data for operational reporting, master data or reference data management. An ODS is not a replacement or substitute for a data warehouse or for a data hub but in turn could become a source.