Digital curation

Digital curation

Digital curation is the selection, preservation, maintenance, collection, and archiving of digital assets. It is a process that establishes, maintains, and adds value to repositories of digital data for present and future use. The implementation of digital curation is often carried out by archivists, librarians, scientists, historians, and scholars to ensure users have access to reliable, high-quality resources. Enterprises are also starting to adopt digital curation as a means to improve the quality of information and data within their operational and strategic processes. A successful digital curation initiative will help to mitigate digital obsolescence, keeping the information accessible to users indefinitely. Digital curation includes various aspects, including digital asset management, data curation, digital preservation, and electronic records management. == Word History == Much like the word archive has layered meanings and uses, the word curation is both a noun and a verb, used originally in the field of museology to represent a wide range of activities, most often associated with collection care, long-term preservation, and exhibition design. Curation can be a reference to physical repositories that store cultural heritage or natural resource collections (e.g., a curatorial repository) or a representation of varied policies and processes involved with the long-term care and management of heritage collections, digital archives, and research data (e.g, curatorial/collections management plans, curation life-cycle, and data curation). Yet curation is also associated with short-term objectives and processes of selection and interpretation for the purposes of presentation, such as for gallery exhibitions and websites, which contribute to knowledge creation. It has also been applied to interaction with social media including compiling digital images, web links, and movie files. The term curation entered the legal framework through federal historic preservation laws, starting with the National Historic Preservation Act of 1966, and was further defined and coded into federal regulations through 36 CFR Part 79: Curation of Federally-owned and Administered Archaeological Collections. Curation has since permeated into an array of disciplines but remains closely tied to heritage and information management. == Core Principles and Activities == The term "digital curation" was first used in the e-science and biological science fields as a means of differentiating the additional suite of activities ordinarily employed by library and museum curators to add value to their collections and enable its reuse from the smaller subtask of simply preserving the data, a significantly more concise archival task. Additionally, the historical understanding of the term "curator" demands more than simple care of the collection. A curator is expected to command academic mastery of the subject matter as a requisite part of appraisal and selection of assets and any subsequent adding of value to the collection through application of metadata. === Principles === There are five commonly accepted principles that govern the occupation of digital curation: Manage the complete birth-to-retirement life cycle of the digital asset. Evaluate and cull assets for inclusion in the collection. Apply preservation methods to strengthen the asset’s integrity and reusability for future users. Act proactively throughout the asset life cycle to add value to both the digital asset and the collection. Facilitate the appropriate degree of access to users. === Methodology === The Digital Curation Center offers the following step-by-step life cycle procedures for putting the above principles into practice: Sequential Actions: Conceptualize: Consider what digital material you will be creating and develop storage options. Take into account websites, publications, email, among other types of digital output. Create: Produce digital material and attach all relevant metadata, typically the more metadata the more accessible the information. Appraise and select: Consult the mission statement of the institution or private collection and determine what digital data is relevant. There may also be legal guidelines in place that will guide the decision process for a particular collection. Ingest: Send digital material to the predetermined storage solution. This may be an archive, repository or other facility. Preservation action: Employ measures to maintain the integrity of the digital material. Store: Secure data within the predetermined storage facility. Access, use, and reuse: Determine the level of accessibility for the range of digital material created. Some material may be accessible only by password and other material may be freely accessible to the public. Routinely check that material is still accessible for the intended audience and that the material has not been compromised through multiple uses. Transform: If desirable or necessary the material may be transferred into a different digital format. Occasional Actions: Dispose: Discard any digital material that is not deemed necessary to the institution. Reappraise: Reevaluate material to ensure that is it still relevant and is true to its original form. Migrate: Migrate data to another format in order to protect data for using better in the future. == Related terms == The term "digital curation" is sometimes used interchangeably with terms such as "digital preservation" and "digital archiving." While digital preservation does focus a significant degree of energy on optimizing reusability, preservation remains a subtask to the concept of digital archiving, which is in turn a subtask of digital curation. For example, archiving is a part of curation, but so are subsequent tasks such as themed collection-building, which is not considered an archival task. Similarly, preservation is a part of archiving, as are the tasks of selection and appraisal that are not necessarily part of preservation. Data curation is another term that is often used interchangeably with digital curation, however common usage of the two terms differs. While "data" is a more all-encompassing term that can be used generally to indicate anything recorded in binary form, the term "data curation" is most common in scientific parlance and usually refers to accumulating and managing information relative to the process of research. Data-driven research of education request the role of information professional gradually develop tradition of digital service to data curation particularly at the management of digital research data. So, while documents and other discrete digital assets are technically a subset of the broader concept of data, in the context of scientific vernacular digital curation represents a broader purview of responsibilities than data curation due to its interest in preserving and adding value to digital assets of any kind. == Challenges == === Rate of creation of new data and data sets === The ever lowering cost and increasing prevalence of entirely new categories of technology has led to a quickly growing flow of new data sets. These come from well established sources such as business and government, but the trend is also driven by new styles of sensors becoming embedded in more areas of modern life. This is particularly true of consumers, whose production of digital assets is no longer relegated strictly to work. Consumers now create wider ranges of digital assets, including videos, photos, location data, purchases, and fitness tracking data, just to name a few, and share them in wider ranges of social platforms. Additionally, the advance of technology has introduced new ways of working with data. Some examples of this are international partnerships that leverage astronomical data to create "virtual observatories," and similar partnerships have also leveraged data resulting from research at the Large Hadron Collider at CERN and the database of protein structures at the Protein Data Bank. === Storage format evolution and obsolescence === By comparison, archiving of analog assets is notably passive in nature, often limited to simply ensuring a suitable storage environment. Digital preservation requires a more proactive approach. Today’s artifacts of cultural significance are notably transient in nature and prone to obsolescence when social trends or dependent technologies change. This rapid progression of technology occasionally makes it necessary to migrate digital asset holdings from one file format to another in order to mitigate the dangers of hardware and software obsolescence which would render the asset unusable. === Underestimation of human labor costs === Modern tools for program planning often underestimate the amount of human labor costs required for adequate digital curation of large collections. As a result cost-benefit assessments often paint an inaccurate picture of both the amount of work involved and the true cost to the institution for bot

Digital on-screen graphic

A digital on-screen graphic, digitally originated graphic (DOG, bug, network bug, on-screen bug or screenbug) is a watermark-like station logo that most television broadcasters overlay over a portion of the screen area of their programs to identify the channel. They are thus a form of permanent visual station identification, increasing brand recognition and asserting ownership of the video signal. The graphic identifies the source of programming, even if it has been time-shifted or recorded. Many of these technologies allow viewers to skip or omit traditional between-programming station identification; thus the use of a DOG enables the station or network to enforce brand identification even when standard commercials are skipped. DOG watermarking helps to reduce off-the-air copyright infringement—for example, the distribution of a current series' episodes on DVD: the watermarked content is easily differentiated from "official" DVD releases, and can help identify not only the station from which the broadcast was captured, but usually the actual date of the broadcast as well. Graphics may be used to identify if the correct subscription is being used for a type of venue. For example, showing Sky Sports within a pub in the United Kingdom requires a more expensive subscription; a channel authorized under this subscription adds a pint glass graphic to the bottom of the screen for inspectors to see. The graphic changes at certain times, making it harder to counterfeit. On the other hand, watermarks pollute the picture, distract viewers' attention and may cover an important piece of information presented in the television program. Extremely bright watermarks may cause screen burn-in or image persistence on some types of television sets such as the now mostly discontinued and rarely used plasma and CRT displays, and currently commonly used OLED and LCD displays. Usage of visually perceptible embedded watermarks requires the program author to have a separate clean copy for archival purposes, but this practice was not common decades ago when watermarking became popular among broadcasters. Watermarks present an issue when archival videos are used for a documentary that strives to create a coherent story. In some cases, watermarks are blurred or digitally removed if possible to clean up the picture. In the absence of visually perceptible watermarks, content control can be ensured with visually imperceptible digital watermarks. In some cases, the graphic also shows the name of the current program. Some television networks may place additional logos or text alongside their DOG to advertise significant upcoming programs. For example, broadcasters of the Olympic Games (most notably United States broadcaster NBC) often add the Olympic rings to their DOG for a period of time leading up to and during the Games. == Usage == == Connections with sponsor tags == Another graphic on television usually connected with sports (particularly in North America, though not in Europe) is the sponsor tag. It shows the logos of certain sponsors, accompanied by some background relevant to the game, the network logo, announcement and music of some kind. == Usage in ham radio and television == In most countries, the ham station is required to periodically identify their amateur-television transmission. Such stations frequently overlay their callsign on the signal instead of placing a card in the background. Most hams use homebuilt devices or old consumer character generators to generate such identifications rather than using graphical superimposes of high cost to do so. Only rarely one can see real graphics, as the callsign is usually written in the "OSD font". == Live DOGs by hobbyists == One of the easiest and most sought-after devices used to generate DOGs by hobbyists is the 1980s vintage Sony XV-T500 video superimposer. This device can luma-key a signal, capture a still frame into memory and then overlay the keyed graphic in one of eight colors onto any CVBS signal. Another method commonly used by hobbyists and even low-budgeted television stations was Amiga computers with genlock interfaces.

Intelligent character recognition

Intelligent character recognition (ICR) is a method of extracting handwritten text from images. It is a more sophisticated type of OCR technology that recognizes different handwriting styles and fonts to intelligently interpret data from physical documents. ICR is used to organize paper-based unstructured data by scanning documents, extracting information, and adapting extracted data for database storage. ICR algorithms collaborate with OCR to automate data entry from forms by removing the need for keystrokes. It has a high degree of accuracy and is a dependable method for processing various handwritten media quickly. == Capabilities == Most ICR software has a self-learning neural network-based algorithms, which automatically update the recognition database for new handwriting patterns. It extends the usefulness of scanning devices for the purpose of document processing, from printed character recognition (a function of OCR) to hand-written matter recognition. Because this process is involved in recognizing hand writing, accuracy levels may, in some circumstances, not be very good but can achieve 97%+ accuracy rates in reading handwriting in structured forms. Often to achieve these high recognition rates several read engines are used within the software and each is given elective voting rights to determine the true reading of characters. In numeric fields, engines which are designed to read numbers take preference, while in alpha fields, engines designed to read hand written letters have higher elective rights. When used in conjunction with a bespoke interface hub, hand-written data can be automatically populated into a back office system avoiding laborious manual keying and can be more accurate than traditional human data entry. === Automated forms processing === An important development of ICR was the invention of automated forms processing in 1993 by Joseph Corcoran who was awarded a patent on the invention. This involved a three-stage process of capturing the image of the form to be processed by ICR and preparing it to enable the ICR engine to give best results, then capturing the information using the ICR engine and finally processing the results to automatically validate the output from the ICR engine. This application of ICR increased the usefulness of the technology and made it applicable for use with real world forms in normal business applications. Modern software applications use ICR as a technology of recognizing text in forms filled in by hand (hand-printed). == Differences between ICR and OCR == === OCR === Optical character recognition (OCR) is commonly considered to apply to any recognition technique that reads machine printed text. An example of a traditional OCR use case would be to translate the characters from an image of a printed document, such as a book page, newspaper clipping, or legal contract, into a separate file that could be searched and updated with a word processor or document viewer. It's also quite helpful for automating the processing of forms. Information can be swiftly extracted from form fields and entered into another application, like a spreadsheet or database, by zonally applying the OCR engine to those fields. Yet, data is typically manually input rather than typed into form fields. Character identification becomes even more challenging while reading handwritten material. The diversity of more than 700,000 printed font variants is tiny compared to the near unlimited variations in hand-printed characters. The recognition program must take into account not just stylistic differences but also the kind of writing implement used, the standard of the paper, errors, hand stability, and smudges or running ink. === ICR === Intelligent character recognition (ICR) makes use of continuously improving algorithms to collect more information about the variances in hand-printed characters and more precisely identify them. ICR, which was created in the early 1990s to aid in the automation of forms processing, enables the conversion of manually entered data into text that is simple to read, search for, and change. When used to read characters that are obviously divided into distinct areas or zones, such as fixed fields seen on many structured forms, it works best. Both OCR and ICR can be configured to read a variety of languages; however, limiting the expected character set to a smaller number of languages will produce better recognition outcomes. ICR cannot read cursive handwriting since it must still be able to assess each character individually. While writing in cursive, it might be difficult to tell where one character ends and another one begins, and there are more differences across samples than when hand-printing text. A more recent method called intelligent word recognition (IWR) focuses on reading a word in context rather than recognizing individual characters. == Intelligent word recognition == Intelligent word recognition (IWR) can recognize and extract not only printed-handwritten information, cursive handwriting as well. ICR recognizes on the character-level, whereas IWR works with full words or phrases. Capable of capturing unstructured information from every day pages, IWR is said to be more evolved than hand print ICR. Not meant to replace conventional ICR and OCR systems, IWR is optimized for processing real-world documents that contain mostly free-form, hard-to-recognize data fields that are inherently unsuitable for ICR. This means that the highest and best use of IWR is to eliminate a high percentage of the manual entry of handwritten data and run-on hand print fields on documents that otherwise could be keyed only by humans.

Yaron Singer

Yaron Singer is a computer scientist and entrepreneur whose work has focused on algorithms, machine learning, optimization, and artificial intelligence security. He was the Gordon McKay Professor of Computer Science and Applied Mathematics at Harvard University and co-founded Robust Intelligence, an artificial intelligence security company acquired by Cisco Systems in 2024. == Education == Singer received a PhD in computer science from the University of California, Berkeley under the supervision of Christos Papadimitriou. == Academic career == Singer was a postdoctoral research scientist at Google Research. Singer joined the computer science faculty at Harvard John A. Paulson School of Engineering and Applied Sciences in 2013 and became a full professor in 2019. == Research == Singer's research has focused on algorithms and machine learning, including optimization, algorithmic mechanism design, and adversarial machine learning. His doctoral work studied computational limits in algorithmic mechanism design, including truthful mechanisms and budget-feasible mechanisms. In optimization, Singer co-authored work on submodular optimization and parallel algorithms for large-scale data processing. Singer has also worked on adversarial machine learning, including attacks that use small perturbations or noise to affect the behavior of machine learning systems. == Entrepreneurship == In 2020, Singer co-founded Robust Intelligence Kojin Oshiba. Harvard SEAS reported that the company raised $14 million that year, and TechCrunch reported in 2021 that the company raised a $30 million Series B round led by Tiger Global. The company developed tools for testing AI models and detecting failures before or during deployment. TechCrunch described its RIME product as using an "AI firewall" to stress-test models. In 2024, Cisco Systems acquired Robust Intelligence. CTech reported that Cisco had not disclosed the purchase amount when the acquisition was announced, and later reported the deal value as $400 million. In 2025, Cisco launched Foundation AI, a Cisco team focused on AI for cybersecurity. Techzine reported that Singer led the team and was Cisco's VP of AI and Security. == Recognition == Singer has received a Sloan Research Fellowship, an NSF CAREER Award, a Google Faculty Research Award, and a Facebook Faculty Award. As a graduate student, he received Microsoft Research and Facebook fellowships. In 2012, he received the Best Student Paper Award at the ACM International Conference on Web Search and Data Mining for "How to Win Friends and Influence People, Truthfully: Influence Maximization Mechanisms for Social Networks."

AI Marketing Tools: Free vs Paid (2026)

Shopping for the best AI marketing tool? An AI marketing tool is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI marketing tool slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

Google Vids

Google Vids (not to be confused with Google Video) is an online timeline-based video editing application included as part of the Google Workspace suite. It is designed to help users create informational videos for work-related purposes. The app uses Google's Gemini technology to enable users to create video storyboards manually or with AI assistance using simple prompts. Features include uploading media, choosing stock videos, images, background music, and a voiceover feature with script generation using AI. The app is currently in testing with select Google Workspace Labs users. Like Kapwing and Capcut, Google Vids is primarily for creating work-related content like sales training, onboarding videos, vendor outreach, and project updates. It offers various styles and templates, collaborative features, and is not limited to videos without the short integration at the moment. Google Vids was announced on April 9, 2024. In September 2025, Google began to roll out a basic version of the application to Google Workspace users.

Best AI Code Generators in 2026

Comparing the best AI code generator? An AI code generator is software that uses machine learning to help you get more done — it lowers the barrier so anyone can produce professional output. Privacy matters too: check whether your data trains the model and whether a no-log or enterprise tier is available. Whether you are a beginner or a pro, the right AI code generator slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.