Data custodian

Data custodian

In data governance groups, responsibilities for data management are increasingly divided between the business process owners and information technology (IT) departments. Two functional titles commonly used for these roles are data steward and data custodian. Data Stewards are commonly responsible for data content, context, and associated business rules. Data custodians are responsible for the safe custody, transport, storage of the data and implementation of business rules. Simply put, Data Stewards are responsible for what is stored in a data field, while data custodians are responsible for the technical environment and database structure. Common job titles for data custodians are database administrator (DBA), data modeler, ETL developer and data engineer. == Data custodian responsibilities == A data custodian ensures: Access to the data is authorized and controlled Data stewards are identified for each data set Technical processes sustain data integrity Processes exist for data quality issue resolution in partnership with data stewards Technical controls safeguard data Data added to data sets are consistent with the common data model Versions of master data are maintained along with the history of changes Change management practices are applied in maintenance of the database Data content and changes can be audited

Artificial intelligence and elections

As artificial intelligence (AI) has become more mainstream, there is growing concern about how this will influence elections. Potential targets of AI include election processes, election offices, election officials and election vendors. There are also global efforts to improve elections using AI. == Tactics == Generative AI capabilities allow creation of misleading content. Examples of this include text-to-video, deepfake videos, text-to-image, AI-altered images, text-to-speech, voice cloning, and text-to-text. In the context of an election, a deepfake video of a candidate may propagate information that the candidate does not endorse. Chatbots could spread misinformation related to election locations, times or voting methods. In contrast to malicious actors in the past, these techniques require little technical skill and can spread rapidly. LLM-generated messages have the capacity to persuade humans on political issues. Researchers have begun to investigate how people rate messages that LLMs generate for how persuasive they are. When it came to policy issues, the LLM-generated messages received a 2.91 compared to a 2.80 when it came to smartness between the AI and humans. The LLM-generated messages were often more technical and analytical than human-generated messages. Generative AI has been used to micro-target people during tight political elections. The generation of targeted large language models has triggered concern that they will be used to leverage readily scale microtargeting. Rephrasing inputs have been used to generate fraudulent emails and phishing websites. Rephrasing inputs in a microtargeting does not violate the terms of OpenAI usage. There are no safeguards to prevent the use of rephrasing and creation of fraudulent emails. Political campaign managers have access to this allowing for them to create targeted content. == Usage by country == === Argentina === ==== 2023 elections ==== During the 2023 Argentine primary elections, Javier Milei's team distributed AI generated images including a fabricated image of his rival Sergio Massa and drew 3 million views. The team also created an unofficial Instagram account entitled "AI for the Homeland." Sergio Massa's team also distributed AI generated images and videos. === Bangladesh === ==== 2024 elections ==== In the run up to the 2024 Bangladeshi general election, deepfake videos of female opposition politicians appeared. Rumin Farhana was pictured in a bikini while Nipun Ray was shown in a swimming pool. === Canada === ==== 2025 elections ==== In the run up to the 2025 Canadian federal election, the use of AI tools is likely to figure prominently. India, Pakistan and Iran are all expected to make efforts to subvert the national vote using disinformation campaigns to deceive voters and sway diaspora communities. In a report by the Canadian Centre for Cyber Security called "Cyber Threats to Canada's Democratic Process: 2025 Update", it states that malicious actors including China and Russia: "are most likely to use generative AI as a means of creating and spreading disinformation, designed to sow division among Canadians and push narratives conducive to the interests of foreign states". === France === ==== 2024 elections ==== In the 2024 French legislative election, deepfake videos appeared claiming: i) That they showed the family of Marine le Pen. In the videos, young women, supposedly Le Pen's nieces, are seen skiing, dancing and at the beach "while making fun of France’s racial minorities": However, the family members don't exist. On social media there were over 2 million views. ii) In a video seen on social media, a deepfake video of a France24 broadcast appeared to report that the Ukrainian leadership had "tried to lure French president Emmanuel Macron to Ukraine to assassinate him and then blame his death on Russia". === Ghana === ==== 2024 elections ==== During the months before the December 2024 Ghanaian general election, a network of at least 171 fake accounts has been used to spam social media. Posts have been used by a group identified as "@TheTPatriots" to promote the New Patriotic Party, although it is not known whether the two are connected. All the networks' posts were "highly likely" to have been generated by ChatGPT and appear to be the "first secretly partisan network using AI to influence elections in Ghana". The opposition National Democratic Congress was also criticized with its leader John Mahama being called a drunkard. === India === ==== 2024 elections ==== In the 2024 Indian general election, politicians used deepfakes in their campaign materials. These deepfakes included politicians who had died prior to the election. Mathuvel Karunanidhi's party posted with his likeness even though he had died 2018. A video The All-India Anna Dravidian Progressive Federation party posted showed an audio clip of Jayaram Jayalalithaa even though she had died in 2016. The Deepfakes Analysis Unit (DAU) is an open source platform created in March 2024 for the public to share misleading content and assess if it had been AI-generated. AI was also used to translate political speeches in real time. This translating ability was widely used to reach more voters. === Indonesia === ==== 2024 elections ==== In the 2024 Indonesian presidential election, Prabowo Subianto made extensive use of AI-generated art in his campaign, which ranged from images of himself as an adorable child to various child portrayals in his advertisements. The Indonesian Children's Protection Commission condemned these ads, labeling them as a form of misuse. Other candidates, Anies Baswedan and Ganjar Pranowo, also incorporated AI art into their campaigns. Throughout the election period, all presidential candidates faced attacks from deepfakes, both in video and audio formats. === Ireland === ==== 2024 elections ==== In the last weeks of the 2024 Irish general election a spoof election poster appeared in Dublin featuring "an AI-generated candidate with three arms". The candidate is called Aidan Irwin, but no-one stood in the election with that name. A slogan on the poster says "put matters into artificial intelligence’s hands". The convincing election poster shows a man that "has six fingers on one hand, three arms, and a distorted thumb". === New Zealand === ==== 2023 elections ==== In May 2023, ahead of the 2023 New Zealand general election in October 2023, the New Zealand National Party published a "series of AI-generated political advertisements" on its Instagram account. After confirming that the images were faked, a party spokesperson said that it was "an innovative way to drive our social media". === Pakistan === ==== 2024 elections ==== AI has been used by the imprisoned ex-Prime Minister Imran Khan and his media team in the 2024 Pakistani general election: i) An AI generated audio of his voice was added to a video clip and was broadcast at a virtual rally. ii) An op-ed in The Economist written by Khan was later claimed by himself to have been written by AI which was later denied by his team. The article was liked and shared on social media by thousands of users. === South Africa === ==== 2024 elections ==== In the 2024 South African general election, there were several uses of AI content: i) A deepfaked video of Joe Biden emerged on social media showing him saying that "The U.S. would place sanctions on SA and declare it an enemy state if the African National Congress (ANC) won". ii) In a deepfake video, Donald Trump was shown endorsing the uMkhonto weSizwe party. It was posted to social media and was viewed more than 158,000 times. iii) Less than 3 months before the elections, a deepfake video showed U.S. rapper Eminem endorsing the Economic Freedom Fighters party while criticizing the ANC. The deepfake was viewed on social media more than 173,000 times. === South Korea === ==== 2022 elections ==== In the 2022 South Korean presidential election, a committee for one presidential candidate Yoon Suk Yeol released an AI avatar 'Al Yoon Seok-yeol' that would campaign in places the candidate could not go. The other presidential candidate Lee Jae-myung introduced a chatbot that provided information about the candidate's pledges. ==== 2024 elections ==== Deepfakes were used to spread misinformation before the 2024 South Korean legislative election with one source reporting 129 deepfake violations of election laws within a two week period. Seoul hosted the 2024 Summit for Democracy, a virtual gathering of world leaders initiated by US President Joe Biden in 2021. The focus of the summit was on digital threats to democracy including artificial intelligence and deepfakes. === Taiwan === ==== 2024 elections ==== AI-generated content was used during the 2024 Taiwanese presidential election. Among the media were: i) A deepfake video of General Secretary of the Chinese Communist Party Xi Jinping which showed him supporting the presidential elections. Created on social media, the video was "widely circulated

Uncertain database

An uncertain database is a kind of database studied in database theory. The goal of uncertain databases is to manage information on which there is some uncertainty. Uncertain databases make it possible to explicitly represent and manage uncertainty on the data, usually in a succinct way. == Formal definition == At the basis of uncertain databases is the notion of possible world. Specifically, a possible world of an uncertain database is a (certain) database which is one of the possible realizations of the uncertain database. A given uncertain database typically has more than one, and potentially infinitely many, possible worlds. A formalism to represent uncertain databases then explains how to succinctly represent a set of possible worlds into one uncertain database. == Types of uncertain databases == Uncertain database models differ in how they represent and quantify these possible worlds: Incomplete databases are a compact representation of the set of possible worlds – the use of NULL in SQL, arguably the most commonplace instantiation of uncertain databases, is an example of incomplete database model. Probabilistic databases are a compact representation of a probability distribution over the set of possible worlds. Fuzzy databases are a compact representation of a fuzzy set of the possible worlds. Though mostly studied in the relational setting, uncertain database models can also be defined in other relational models such as graph databases or XML databases. === Incomplete database === The most common database model is the relational model. Multiple incomplete database models have been defined over the relational model, that form extensions to the relational algebra. These have been called Imieliński–Lipski algebras: Relations with NULL values, also called Codd tables c-tables v-tables === Example === The following table is a relation of an incomplete database, described in the formalism of NULL values: There are infinitely many possible worlds for this incomplete database, obtained by replacing the "NULL" values with concrete values. For instance, the following relation is a possible world:

Aidoc

Aidoc Medical is an Israeli technology company that develops computer-aided simple triage and notification systems. Aidoc has obtained U.S. Food and Drug Administration and CE mark approval for its stroke, pulmonary embolism, cervical fracture, intracranial hemorrhage, intra-abdominal free gas, and incidental pulmonary embolism algorithms. Aidoc algorithms are in use in more than 900 hospitals and imaging centers, including Montefiore Nyack Hospital, LifeBridge Health, LucidHealth, Yale New Haven Hospital, Cedars-Sinai Medical Center, University of Rochester Medical Center, and Sheba Medical Center. == History == Aidoc was founded in 2016 by Elad Walach as the CEO, Michael Braginsky as the CTO and Guy Reiner as the VP. In April 2017, the company raised $7M, led by TLV Partners, and in April 2019, the company raised another $27M, led by Square Peg capital. There have been several additional rounds of funding as well, bringing Aidoc's total investment to $370M as of July 2025. In August 2018, Aidoc gained FDA clearance for its intracranial hemorrhage system, and in May 2019 it received clearance for the pulmonary embolism system. In January 2020, the system for detecting large-vessel occlusions (LVOs) in head CTA examinations obtained FDA clearance. In October 2024, it was reported that Aidoc is working with NVIDIA to develop a framework for deployment and integration of artificial intelligence tools in healthcare. The Blueprint for Resilient Integration and Deployment of Guided Excellence (BRIDGE) is a guideline to facilitate AI adoption in the healthcare industry. == Products and market == Aidoc has developed a suite of artificial intelligence products that flag both time-sensitive and time-consuming (for the radiologist) abnormalities across the body. The algorithms are developed with large quantities of data to provide diagnostic aid for a broad set of pathologies. The company offers an array of algorithms that span across the body, including for intracranial hemorrhage, spine fractures (C, T & L), free air in the abdomen, pulmonary embolism, and more. It developed "Always-on AI", a term coined by Elad Walach that refers to a type of artificial intelligence that is "Always-on—constantly running in the background and automatically analyzing medical imaging data, identifying urgent findings, and sparing radiologists from "drowning" in vast amounts of irrelevant data. Aidoc's solutions cover medical conditions prevalent in all settings (ED/inpatient/outpatient), including level 1 trauma centers, outpatient imaging centers, teleradiology groups and, are set up in over 200 medical centers worldwide. Notable customers include the University of Rochester Medical Center and Global Diagnostics Australia. Aidoc announced in 2024 that its new Clinical AI Reasoning Engine (CARE1) had been submitted for FDA approval. In September 2025 Aidoc received a "Breakthrough Device Designation" from the FDA for a new multi-triage solution that spans numerous acute findings in CT scans. Aidoc's CARE1 foundation model was the basis of the workflow on which the designation was made, enabling simultaneous coverage of multiple pathologies. This new designation allows parallel FDA review of multiple indications under a single submission. In April 2026, Aidoc raised million in a Series E funding round led by Growth Equity at Goldman Sachs Alternatives, with participation from General Catalyst and NVentures. The financing brought the company's total funding to over million. == Clinical Research == A clinical study on Aidoc’ accuracy of deep convolutional neural networks for the detection of pulmonary embolism (PE) on CT pulmonary angiograms (CTPAs) was performed by the University Hospital of Basel and presented at the European Congress of Radiology, showing that the Aidoc algorithm reached 93% sensitivity and 95% specificity. Clinical research has also been performed to test the diagnostic performance of Aidoc's deep learning-based triage system for the flagging of acute findings in abdominal computed tomography (CT) examinations. Overall, the algorithm achieved 93% sensitivity (91/98, 7 false negatives) and 97% specificity (93/96, 3 false-positive) in the detection of acute abdominal findings. Additional clinical research on Aidoc's Intracranial hemorrhage algorithm accuracy was presented at the European Congress of Radiology by Antwerp University Hospital, evaluating the use of its deep learning algorithm for the detection of intracranial hemorrhage on non-contrast enhanced CT of the brain. The University of Washington completed a study on the accuracy of Aidoc's intracranial hemorrhage algorithm.

FAIR data

FAIR data is data which meets the 2016 FAIR principles of findability, accessibility, interoperability, and reusability (FAIR). The FAIR principles emphasize machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with none or minimal human intervention) because humans increasingly rely on computational support to deal with data as a result of the increase in the volume, complexity, and rate of production of data. The abbreviation FAIR/O data is sometimes used to indicate that the dataset or database in question complies with the FAIR principles and also carries an explicit data‑capable open license. == FAIR principles published by GO FAIR == Findable The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process. F1. (Meta)data are assigned a globally unique and persistent identifier F2. Data are described with rich metadata (defined by R1 below) F3. Metadata clearly and explicitly include the identifier of the data they describe F4. (Meta)data are registered or indexed in a searchable resource Accessible Once the user finds the required data, they need to know how they can be accessed, possibly including authentication and authorisation. A1. (Meta)data are retrievable by their identifier using a standardised communications protocol A1.1 The protocol is open, free, and universally implementable A1.2 The protocol allows for an authentication and authorisation procedure, where necessary A2. Metadata are accessible, even when the data are no longer available Interoperable The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing. I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation I2. (Meta)data use vocabularies that follow FAIR principles I3. (Meta)data include qualified references to other (meta)data Reusable The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings. R1. (Meta)data are richly described with a plurality of accurate and relevant attributes R1.1. (Meta)data are released with a clear and accessible data usage license R1.2. (Meta)data are associated with detailed provenance R1.3. (Meta)data meet domain-relevant community standards The principles refer to three types of entities: data (or any digital object), metadata (information about that digital object), and infrastructure. For instance, principle F4 defines that both metadata and data are registered or indexed in a searchable resource (the infrastructure component). === Acceptance and implementation === Before FAIR, a 2007 OECD report was the most influential paper discussing similar ideas related to data accessibility. In January 2014, the Lorentz Centre at Leiden University hosted a workshop entitled "Jointly designing a data FAIRPORT" where the participants first formulated the FAIR principles. After further discussions, they were published in the March 2016 issue of Scientific Data. At the 2016 G20 Hangzhou summit, the G20 leaders issued a statement endorsing the application of FAIR principles to research. Also in 2016, a group of Australian organisations developed a Statement on FAIR Access to Australia's Research Outputs, which aimed to extend the principles to research outputs more generally. In 2017, Germany, Netherlands and France agreed to establish an international office to support the FAIR initiative, the GO FAIR International Support and Coordination Office. Other international organisations active in the research data ecosystem, such as CODATA or Research Data Alliance (RDA) also support FAIR implementations by their communities. FAIR principles implementation assessment is being explored by FAIR Data Maturity Model Working Group of RDA, CODATA's strategic Decadal Programme "Data for Planet: Making data work for cross-domain challenges" mentions FAIR data principles as a fundamental enabler of data driven science. The Association of European Research Libraries recommends the use of FAIR principles. A 2017 paper by advocates of FAIR data reported that awareness of the FAIR concept was increasing among various researchers and institutes, but also, understanding of the concept was becoming confused as different people apply their own differing perspectives to it. Guides on implementing FAIR data practices state that the cost of a data management plan in compliance with FAIR data practices should be 5% of the total research budget. In 2019 the Global Indigenous Data Alliance (GIDA) released the CARE Principles for Indigenous Data Governance as a complementary guide. The CARE principles extend principles outlined in FAIR data to include Collective benefit, Authority to control, Responsibility, and Ethics to ensure data guidelines address historical contexts and power differentials. The CARE Principles for Indigenous Data Governance were drafted at the International Data Week and Research Data Alliance Plenary co-hosted event, "Indigenous Data Sovereignty Principles for the Governance of Indigenous Data Workshop", held 8 November 2018, in Gaborone, Botswana. The lack of information on how to implement the guidelines have led to inconsistent interpretations of them. In January 2020, representatives of nine groups of universities around the world produced the Sorbonne declaration on research data rights, which included a commitment to FAIR data, and called on governments to provide support to enable it. In 2021, researchers identified the FAIR principles as a conceptual component of data catalog software tools, with the other components being metadata management, business context and data responsibility roles. In April 2022, Matthias Scheffler and colleagues argued in Nature that FAIR principles are "a must" so that data mining and artificial intelligence can extract useful scientific information from the data. There have been moves in the geosciences to establish FAIR data by use of decimal georeferencing However, making data (and research outcomes) FAIR is a challenging task, and it is challenging to assess the FAIRness. In 2020, the FAIR Data Maturity Model Working Group published a set of guidelines for assessing "FAIRness".

Software engine

A software engine is a core component of a complex software system. The word "engine" is a metaphor of a car's engine. Thus a software engine is a complex subsystem; not unlike how a car engine functions. Software engines work in conjunction with other components of a process or system. They typically have an input and an output, and the productivity is usually linear to running speed. There is no formal guideline for what should be called an engine, but the term has become widespread in the software industry. == Notable examples == === Multi-engine systems === Mainstream web browsers have both a browser engine and a JavaScript engine. Video games are often based on a game engine. Some of these also have specialized physics or graphics engines.

Ecoinformatics

Ecoinformatics, or ecological informatics, is the science of information in ecology and environmental science. It integrates environmental and information sciences to define entities and natural processes with language common to both humans and computers. However, this is a rapidly developing area in ecology and there are alternative perspectives on what constitutes ecoinformatics. A few definitions have been circulating, mostly centered on the creation of tools to access and analyze natural system data. However, the scope and aims of ecoinformatics are certainly broader than the development of metadata standards to be used in documenting datasets. Ecoinformatics aims to facilitate environmental research and management by developing ways to access, integrate databases of environmental information, and develop new algorithms enabling different environmental datasets to be combined to test ecological hypotheses. Ecoinformatics is related to the concept of ecosystem services. Ecoinformatics characterize the semantics of natural system knowledge. For this reason, much of today's ecoinformatics research relates to the branch of computer science known as knowledge representation, and active ecoinformatics projects are developing links to activities such as the Semantic Web. Current initiatives to effectively manage, share, and reuse ecological data are indicative of the increasing importance of fields like ecoinformatics to develop the foundations for effectively managing ecological information. Examples of these initiatives are National Science Foundation Datanet projects, DataONE, Data Conservancy, and Artificial Intelligence for Environment & Sustainability. == Software Development Lifecycle == Central to the concept of ecoinformatics is the Software Development Lifecycle (SDLC), a systematic framework for writing, implementing, and maintaining software products. Typically in Ecoinformatics projects, the development pipeline includes data collection, usually from several different environmental data sources, then integrating these data sources together, and then analyzing the data. Here, each step of the SDLC is described in the context of ecoinformatics, per Michener et al. It is important to note that the plan, collect, assure, describes and preserve steps refer to the data collection entity, which can be individual researchers or large data-collection networks, while the discover, integrate, and analyze steps typically refer to the individual researcher. Plan: Ecoinformatics projects require data from several databases. Each database holds different data, and therefore researchers should identify what types of environmental or ecological data they will need to answer their research question. Collect: Data is collected in several different ways. In ecoinformatics, this is usually restricted to manually entering data into a spreadsheet, and parsing data from an existing database. The growth of relational databases has made it easier for ecologists to download relevant data and integrate datasets together Assure: Data entries should be checked thoroughly to validate their accuracy and usability, such as to check for outliers and erroneous points. The same principle applies to data downloaded from datasets. This responsibility falls on both the ecologist downloading the data, and the entity that sets up the data collection system. Describe: An accurate description of the metadata of a dataset that is used in a study should include enough information to deduce the data collection and processing methodology, when the data were collected, why the data were collected, and how the data were stored. This is important for reproducibility, especially for projects that build on each other and may recycle data Preserve: After data is collected by an institutional entity, it should be archived such that it is easily accessible. Ideally, this is in databases that are maintained and not at risk of deprecation Discover: While there are good practices for discovering data to start a research project, this process is often marred by a lack of usable, published data, as researchers may collect data specific to their study, but may not publish this data for wider use. On the data collection end, this can be addressed by better data-sharing practices, such as by linking datasets when publishing papers or studies. On the data procurement end, this can be addressed by more precise data searching, such as using key words to find relevant datasets. Integrate: Synthesizing datasets together can be difficult and labor-intensive, largely due to the methodological differences in data collection. There are several approaches to this, but the best practices typically involve computational approaches, namely using R or Python, to automate the processes and prevent errors Analyze: Data analysis can take several forms, and should be tailored to the specific ecological project. However, all data analysis methods should be well-documented, including the procedure for analysis, justification for analysis methods, and any shortcomings in a specific approach. == Applications of Ecoinformatics Across Ecology == === Ecosystem Ecology === Source: Ecosystem studies, by definition, encompass interactions across the entire life sciences spectrum, from microscopic biochemical reactions to large-scale geological phenomena. As a result, big databases may not be designed specifically for any particular research question, but should be inclusive enough to support most studies. Since ecosystem-level questions require a broad perspective, data-related ecosystem projects would likely incorporate data from several databases. A common framework for incorporating data into ecosystem-level studies is the network science model, in which data collection mechanisms and resources are treated like a large, interconnected network instead of individual entities. The network may include several data collection stations within one databases, or may span across multiple databases. Currently there are several large-scale networks, but they do not generate data on the scale to consider ecology as a big data science. A current challenge for ecoinformatics in ecosystem ecology is that most funding is prioritized for generating new data rather than maintaining existing data infrastructures. Integrating data across the different spatial scales can also be difficult, since each dataset may hold different types of data. === Urban Ecology === Source: The current push for smart cities, and sensor network integration into infrastructure, has positioned as a major source of data for ecological studies. Typical urban ecology questions address the effects of urbanization on the local ecosystem, and how to drive future development to promote urban biodiversity. While sensor networks in cities typically collect environmental data to optimize city processes, they may also be used for ecological initiatives, especially with respect to understanding the complex, multi-layered relationship between cities and their local ecosystem. It can also be used to better understand the current landscape of cities, and identify avenues for rewinding of cities. For example, analyzing mobility patterns can identify areas that may lend themselves well to building parks and green spaces. Bird watching data can also be used to identify the types of bird species in a local area. === Infectious Disease === Source: Like other disciplines of ecology, emerging infectious disease and epidemiology span multiple scales, from understanding the genetics that drive disease trends to large-scale spatiotemporal analyses. As a result, infectious disease studies can incorporate everything from bioinformatics, genetic sequences, amino acid sequences, and environmental observation data. On the micro-scale, these data can then be used to predict infectivity/transmissibility, drug resistance, drug candidates, and mutation sites. On the macro-scale, it can be used to identify societal trends or environmental factors that lend themselves to spillover, locations of infection, and practices that cause disease transmission. == Databases == Source: USGS National Streamflow sensor network GBIF Neotoma Paleobiology database European Vegetation Archive USDA Forest Inventory Analysis TRY BIEN AmeriFlux TEAM iNaturalist NEON GLEON LTER CZO TERN SAEON