Ontology-based data integration

Ontology-based data integration

Ontology-based data integration involves the use of one or more ontologies to effectively combine data or information from multiple heterogeneous sources. It is one of the multiple data integration approaches and may be classified as Global-As-View (GAV). The effectiveness of ontology‑based data integration is closely tied to the consistency and expressivity of the ontology used in the integration process. == Background == Data from multiple sources are characterized by multiple types of heterogeneity. The following hierarchy is often used: Syntactic heterogeneity: is a result of differences in representation format of data Schematic or structural heterogeneity: the native model or structure to store data differ in data sources leading to structural heterogeneity. Schematic heterogeneity that particularly appears in structured databases is also an aspect of structural heterogeneity. Semantic heterogeneity: differences in interpretation of the 'meaning' of data are source of semantic heterogeneity System heterogeneity: use of different operating system, hardware platforms lead to system heterogeneity Ontologies, as formal models of representation with explicitly defined concepts and named relationships linking them, are used to address the issue of semantic heterogeneity in data sources. In domains like bioinformatics and biomedicine, the rapid development, adoption and public availability of ontologies [1] Archived 2007-06-16 at the Wayback Machine has made it possible for the data integration community to leverage them for semantic integration of data and information. == The role of ontologies == Ontologies enable the unambiguous identification of entities in heterogeneous information systems and assertion of applicable named relationships that connect these entities together. Specifically, ontologies play the following roles: Content Explication The ontology enables accurate interpretation of data from multiple sources through the explicit definition of terms and relationships in the ontology. Query Model In some systems like SIMS, the query is formulated using the ontology as a global query schema. Verification The ontology verifies the mappings used to integrate data from multiple sources. These mappings may either be user specified or generated by a system. == Approaches using ontologies for data integration == There are three main architectures that are implemented in ontology‑based data integration applications, namely, Single ontology approach A single ontology is used as a global reference model in the system. This is the simplest approach as it can be simulated by other approaches. SIMS is a prominent example of this approach. The Structured Knowledge Source Integration component of Research Cyc is another prominent example of this approach. (Title = Harnessing Cyc to Answer Clinical Researchers' Ad Hoc Queries). The Gellish Taxonomic Dictionary-Ontology follows this approach as well. Multiple ontologies Multiple ontologies, each modeling an individual data source, are used in combination for integration. Though, this approach is more flexible than the single ontology approach, it requires creation of mappings between the multiple ontologies. Ontology mapping is a challenging issue and is focus of large number of research efforts in computer science [2]. The OBSERVER system is an example of this approach. Hybrid approaches The hybrid approach involves the use of multiple ontologies that subscribe to a common, top-level vocabulary. The top-level vocabulary defines the basic terms of the domain. Thus, the hybrid approach makes it easier to use multiple ontologies for integration in presence of the common vocabulary.

YrWall

YrWall is a Digital Graffiti Wall developed by event company Luma, where designs are created on a large wall using a modified spray paint can. The can contains no paint, instead it has an IR light which is tracked by a computer vision system and the image immediately back-projected onto the wall. The inbuilt YrWall software has much of the functionality of a typical computer paint program, with a pop-out interface which enables users to change colour, spray width, opacity, work with stencils and use animated items such as swirls, stars, drips and splats. Recent additions to YrWall include options to email a JPEG of the completed design and create personalised stickers and T-shirts. == Dragons' Den == The inventor of YrWall, Tom Hogan, and his business partner, Tim Williams, appeared on Episode 4 of Series 8 of the BBC show Dragons' Den. Seeking investment in YrWall, the entrepreneurs were successful in gaining £50,000 for 40% of the YrWall parent company Lumacoustics from Dragons Deborah Meaden and Peter Jones. == World's Largest Interactive Graffiti Wall == In September 2009 YrWall was used to create the 'World's Largest Interactive Graffiti Wall' at the Bristol Festival, UK. Artists used the standard 3.5 m2 YrWall to produce artwork which was in turn projected live onto a 26m x 10m space on the side of the iconic Lloyds amphitheatre building.

Evntlive

Evntlive was an interactive digital concert venue that allowed music fans worldwide to stream concerts to their computer, tablet, or phone. Based in Redwood City, CA, EVNTLIVE Beta launched on April 15, 2013. EVNTLIVE provided users with the ability to switch camera angles, view All Access interviews and clips from artists, buy music, and chat with other online concert-goers in the in-app feature. Users could watch live and on-demand concerts with both free and pay-per-view concerts offered. In its first two months, EVNTLIVE streamed live performances of popular artists ranging from Bon Jovi to Wale, as well as music festivals such as Taste of Country and Mountain Jam; including performances by The Lumineers, Gary Clark Jr., Phil Lesh & Friends, Primus, and more. On December 6, 2013, Evntlive was acquired and absorbed by Yahoo!. The site ceased operations and redirected viewers to Yahoo! Music and Yahoo! Screen promptly afterwards. == About the Platform == EvntLive is an HTML5, web-based platform available on laptops, iPads, and mobile devices. Users must register for a free account on Evntlive’s website in order to reserve tickets and access live and on-demand content. Once they reserve tickets, they can view All Access features from their favorite artists or bands, purchase music, and interact with other online audience members using Buzz. Users can also switch between alternate camera angles as though they are on the concert floor - sharing the experience with their friends online in real-time. EvntLive was acquired by Yahoo in December 2013 == Artists == Bon Jovi Wale Escape the Fate The Parlotones === Taste of Country Music Festival === Trace Adkins Willie Nelson Justin Moore Montgomery Gentry Craig Campbell Blackberry Smoke Gloriana Dustin Lynch LoCash Cowboys Rachel Farley Parmalee Joe Nichols === Mountain Jam Music Festival === Source: The Lumineers Primus Widespread Panic Gov't Mule Phil Lesh The Avett Brothers Dispatch Rubblebucket Michael Franti Jackie Greene Deer Tick Gary Clark Jr. ALO The London Souls Nicki Bluhm Amy Helm The Lone Bellow The Revivalists Swear and Shake Roadkill Ghost Choir Michael Bernard Fitzgerald Michele Clark 's Sunset Sessions Semi Precious Weapons Dale Earnhardt Jr. Jr. DigiTour Media Pentatonix Allstar Weekend Tyler Ward === Launch Music Festival ===

D4Science

D4Science is a Data Infrastructure offering services by community-driven virtual research environments. In particular, it supports communities of practice willing to implement open science practices, thus it is an Open Science Infrastructure. The infrastructure follows the system of systems approach, where the constituent systems (Service providers) offer "resources" (namely services and by them data, computing, storage) assembled together to implement the overall set of D4Science services. In particular, D4Science aggregates "domain agnostic" service providers as well as community-specific ones to build a unifying space where the aggregated resources can be exploited via Virtual research Environments and their services. It is spread across several sites, the primary one is hosted by the Istituto di Scienza e Tecnologie dell'Informazione of National Research Council (Italy). At the earth of this infrastructure there is an Open Source Software named gCube system. == Services == D4Science offers: Virtual Research Environment as a Service providing any community of practice with a dedicated working environment supporting any knowledge production process in a collaborative way, in fact every VRE enables computer-supported cooperative work by design. D4Science-based VREs are web-based, community-oriented, collaborative, user-friendly, open-science-enabler working environments for scientists and practitioners willing to work together to perform a set of (research) task. From the end-user perspective, each VRE manifests in a unifying web application (and a set of application programming interfaces (APIs)): (a) comprising several applications organised in specific menu items and (b) running in a plain web browser. Every application is providing VRE users with facilities implemented by relying on one or more services provisioned by diverse providers. Among the basic services every VRE is equipped with there are a Social Networking area enabling collaborative and open discussions on any topic and disseminating information of interest for the community, for example, the availability of a research outcome; a Workspace for storing, organizing and sharing any version of a research artifact, including dataset and model implementation; a User Management dashboard for managing membership and roles; a Catalogue Service recording the assets worth being published thus to make it possible for others to be informed and make use of these assets. Science Gateway as a Service providing a community of practice with a dedicated science gateway hosting a selected set of virtual research environments. Data Analytics at scale for data analytics including: a proprietary data analytics platform (DataMiner) to execute analytics tasks either by relying on methods provided by the user or by others. It is endowed with importing and sharing facilities for analytics methods implemented in heterogeneous forms including R, Java, Python, and KNIME. The platform enacts tasks execution by a distributed and hybrid computing infrastructure. Moreover, one of the worth highlighting feature of this platform is its open science-friendliness. All the analytics methods integrated in it are exposed by a standard protocol (the OGC WPS protocol) clients can use to get informed on available methods as well as to start processes, monitor their execution and access results. Every analytics task performed by the platform automatically produces a provenance record catering for the reproducibility of the task; an RStudio-based development environment for R enabling to perform statistical computing tasks in the cloud. This RStudio environment is (i) preconfigured with libraries and packages to ease the execution of common data analytics tasks, and (ii) provides seamless access to the VRE Workspace enabling sharing of resources with other members of the same working environment. a Jupyter-based notebook environment for developing and executing interactive computing by JupyterLab instances. Each JupyterLab is (i) preconfigured with libraries and packages to ease the execution of common data analytics tasks, and (ii) provides access to the VRE Workspace enabling sharing of resources with other members of the same working environment. == Community == The D4Science Infrastructure serves more than 24,000 registered users (August 2024) through 177 active VREs offered via 20 Science gateways. This extensive infrastructure not only supports a diverse range of scientific communities but also fosters significant engagement and collaboration among researchers worldwide. Engagement within the D4Science community is robust, with users benefiting from user-friendly application environments tailored to their specific needs. The platform allows users to securely preserve, access, and share their data from anywhere, fostering a collaborative and inclusive research environment. Additionally, groups of users can create their own virtual environments and customise them with the applications they need, further enhancing the platform's flexibility and usability. Supported communities and cases range from Agri-food to Social Data Science, Earth Science and Marine Science. These diverse applications demonstrate the versatility and broad applicability of the D4Science Infrastructure, making it an invaluable resource for researchers across various scientific domains. == History == The D4Science development has been supported by several European-funded projects. DILIGENT (2004-2007) in the Sixth Framework Programme for Research and Technological Development was the forerunner where a testbed infrastructure built by integrating digital library and grid computing technologies and resources was conceived and developed to serve the needs of communities of practice involved in knowledge development. In the context of the Seventh Framework Programme for research, technological development and demonstration the development of the D4Science initiative. In this period the infrastructure was established and developed to serve communities of practices from domains ranging from Earth Science to Marine Science with worldwide scope In the context of the H2020 research and innovation programme the maturity level of the D4Science infrastructure was high enough to allow a large and very diverse set of communities of practice to benefit from it and its services and further contribute to its development. Moreover, the services offered by the infrastructure have been developed to support open science practices. The operation and improvement of the D4Science infrastructure facilities are still ongoing while its exploitation is progressively growing.

Coda (document editor)

Coda is a cloud-based multi-user document editor. == Features == Coda is a document editor that provides features from spreadsheets, presentation documents, word processor files, and apps. Possible uses for Coda documents include using them as a wiki, database, or project management tool. Coda has built a formula system, much like spreadsheets commonly have, but in Coda documents, formulas can be used anywhere within the document, and can link to things that aren't just cells, including other documents, calendars or graphs. Coda also has the ability to integrate with custom third-party services, and has automations. It has offered $1 million in grants for developers that create such integrations. == Development == Coda Project, Inc. was founded by Shishir Mehrotra and Alex DeNeui in June 2014. Having met at MIT, they developed the project mostly privately before announcing a public beta in October 2017. The company was named Coda, which is an anadrome for “a doc”. Coda raised $60 million in venture capital funding over two rounds by 2017. The Coda software came out of beta in February 2019. Version 1.0 had an improved user interface, new features for folders and workspaces, and permission levels for accessing files. Coda raised another $80 million in 2020, and $100 million in 2021. The 2021 funding brought Coda's valuation to $1.4 billion, making it a unicorn. In December 2024, Coda was acquired by Grammarly in an all-stock deal for an undisclosed amount. In October 2025, Grammarly rebranded as Superhuman, incorporating Coda as a core product within the new Superhuman productivity suite alongside Grammarly's writing tools, Superhuman Mail, and a new AI assistant called Superhuman Go.

Singularity studies

Singularity studies is an interdisciplinary academic field which examines the idea of technological singularity — the hypothesised point at which artificial intelligence may surpass human intelligence, might be attained by artificial intelligence (AI), robotics, and other technologies and sciences, and its social impacts. In this academic field, the study and research are conducted across a broad array of terrains such as information science, robotics, social informatics, economics, philosophy, and ethics. The primary aim of singularity studies is to gain an integrative understanding of the transformation of social systems occurring in tandem with the explosive evolution of AI and also the changes to be effected by such transformation in the view of humans, ethics, and legal systems. == History == An academic work on technological singurality has appeared in computer science, philosophy, sociology, and law since the early 1990s. Early discussions of an intelligence explosion were popularised by science-fiction writer Vernor Vinge in 1993 and later systematised by futurist Ray Kurzweil. Since the 2010s, universities such as Oxford, Stanford, and Keio have established dedicated programmes, while peer-reviewed journals have begun to publish scenario analyses and policy studies. Ongoing debates question the predictive value of singularity scenarios and warn against a deterministic view of technology. == Characteristics of research == Singularity studies extends beyond mere future predictions and offer an intellectual foundation for proactively designing and creating a desirable future. Principal research themes in this realm include: Ethics of AI; Social implications of technologies; Possibility of harmonious coexistence of humans and AI; Communication with AI; and Redesign of social systems. == Technologists and academics == Vernor Vinge: Propounded the concept of singularity in 1993, making a massive impact on the academic and science-fiction spheres. Ray Kurzweil: Predicted the advent around 2045 of the technological singularity in his 2005 book The Singularity Is Near. Nick Bostrom: Offered philosophical reflections on superintelligence and the risks posed by AI. He is the founding director of the now-dissolved Future of Humanity Institute at the University of Oxford. === Japan === Kento Sasano: A social informatician, AI educator, and inventor. He is the president of the Japan Society of Singularity Studies. == Challenges and outlook == Singularity studies is still evolving as an academic field, and quite a few challenges remain unresolved in regard to the systematization of their theories, research methods, and educational curricula. That said, in this day and age of accelerating technological and societal shifts, interdisciplinary approaches have gained in importance and are drawing much attention in the arenas of scholarly research, intercorporate collaboration, and policy planning.

Dominant resource fairness

Dominant resource fairness (DRF) is a rule for fair division. It is particularly useful for dividing computing resources in among users in cloud computing environments, where each user may require a different combination of resources. DRF was presented by Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker and Ion Stoica in 2011. == Motivation == In an environment with a single resource, a widely used criterion is max-min fairness, which aims to maximize the minimum amount of resource given to a user. But in cloud computing, it is required to share different types of resource, such as: memory, CPU, bandwidth and disk-space. Previous fair schedulers, such as in Apache Hadoop, reduced the multi-resource setting to a single-resource setting by defining nodes with a fixed amount of each resource (e.g. 4 CPU, 32 MB memory, etc.), and dividing slots which are fractions of nodes. But this method is inefficient, since not all users need the same ratio of resources. For example, some users need more CPU whereas other users need more memory. As a result, most tasks either under-utilize or over-utilize their resources. DRF solves the problem by maximizing the minimum amount of the dominant resource given to a user (then the second-minimum etc., in a leximin order). The dominant resource may be different for different users. For example, if user A runs CPU-heavy tasks and user B runs memory-heavy tasks, DRF will try to equalize the CPU share given to user A and the memory share given to user B. == Definition == There are m resources. The total capacities of the resources are r1,...,rm. There are n users. Each users runs individual tasks. Each task has a demand-vector (d1,..,dm), representing the amount it needs of each resource. It is implicitly assumed that the utility of a user equals the number of tasks he can perform. For example, if user A runs tasks with demand-vector [1 CPU, 4 GB RAM], and receives 3 CPU and 8 GB RAM, then his utility is 2, since he can perform only 2 tasks. More generally, the utility of a user receiving x1,...,xm resources is minj(xj/dj), that is, the users have Leontief utilities. The demand-vectors are normalized to fractions of the capacities. For example, if the system has 9 CPUs and 18 GB RAM, then the above demand-vector is normalized to [1/9 CPU, 2/9 GB]. For each user, the resource with the highest demand-fraction is called the dominant resource. In the above example, the dominant resource is memory, as 2/9 is the largest fraction. If user B runs a task with demand-vector [3 CPU, 1 GB], which is normalized to [1/3 CPU, 1/18 GB], then his dominant resource is CPU. DRF aims to find the maximum x such that all agents can receive at least x of their dominant resource. In the above example, this maximum x is 2/3: User A gets 3 tasks, which require 3/9 CPU and 2/3 GB. User B gets 2 tasks, which require 2/3 CPU and 1/9 GB. The maximum x can be found by solving a linear program; see Lexicographic max-min optimization. Alternatively, the DRF can be computed sequentially. The algorithm tracks the amount of dominant resource used by each user. At each round, it finds a user with the smallest allocated dominant resource so far, and allocates the next task of this user. Note that this procedure allows the same user to run tasks with different demand vectors. == Properties == DRF has several advantages over other policies for resource allocation. Proportionality: each user receives at least as much resources as they could get in a system in which all resources are partitioned equally among users (the authors call this condition "sharing incentive"). Strategyproofness: a user cannot get a larger allocation by lying about his needs. Strategyproofness is important, as evidence from cloud operators show that users try to manipulate the servers in order to get better allocations. Envy-freeness: no user would prefer the allocation of another user. Pareto efficiency: no other allocation is better for some users and not worse for anyone. Population monotonicity: when a user leaves the system, the allocations of remaining users do not decrease. When there is a single resource that is a bottleneck resource (highly demanded by all users), DRF reduces to max-min fairness. However, DRF violates resource monotonicity: when resources are added to the system, some allocations might decrease. == Extensions == Weighted DRF is an extension of DRF to settings in which different users have different weights (representing their different entitlements). Parkes, Procaccia and Shah formally extend weighted DRF to a setting in which some users do not need all resources (that is, they may have demand 0 to some resource). They prove that the extended version still satisfies proportionality, Pareto-efficiency, envy-freeness, strategyproofness, and even Group strategyproofness. On the other hand, they show that DRF may yield poor utilitarian social welfare, that is, the sum of utilities may be only 1/m of the optimum. However, they prove that any mechanism satisfying one of proportionality, envy-freeness or strategyproofness may suffers from the same low utilitarian welfare. They also extend DRF to the setting in which the users' demands are indivisible (as in fair item allocation). For the indivisible setting, they relax envy-freeness to EF1. They show that strategyproofness is incompatible with PO+EF1 or with PO+proportionality. However, a mechanism called SequentialMinMax satisfies efficiency, proportionality and EF1. Wang, Li and Liang present DRFH - an extension of DRF to a system with several heterogeneous servers. == Implementation == DRF was first implemented in Apache Mesos - a cluster resource manager, and it led to better throughput and fairness than previously used fair-sharing schemes.