AI Data Trainer/annotator

AI Data Trainer/annotator — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Document-oriented database

    Document-oriented database

    A document-oriented database, or document store, is a computer program and data storage system designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Document-oriented databases are one of the main categories of NoSQL databases, and the popularity of the term "document-oriented database" has grown alongside the adoption of NoSQL itself. XML databases are a subclass of document-oriented databases optimized for XML documents. Graph databases are similar, but add another layer, the relationship, which allows them to link documents for rapid traversal. Document-oriented databases are conceptually an extension of the key–value store, another type of NoSQL database. In key-value stores, data is treated as opaque by the database, whereas document-oriented systems exploit the internal structure of documents to extract metadata and optimize storage and queries. Although in practice the distinction can be minimal due to modern tooling, document stores are designed to provide a richer programming experience with modern programming techniques. Document databases differ significantly from traditional relational databases (RDBs). Relational databases store data in predefined tables, often requiring an object to be split across multiple tables. In contrast, document databases store all information for a given object in a single document, with each document potentially having a unique structure. This design eliminates the need for object-relational mapping when loading data into the database. == Documents == The central concept of a document-oriented database is the notion of a document. Although implementations vary in their specific definitions, document-oriented databases generally treat documents as self-contained units that encapsulate and encode data in a standardized format. Common encoding formats include XML, YAML, JSON, as well as binary representations such as BSON. Documents in a document store are equivalent to the programming concept of an object. They are not required to adhere to a fixed schema, and documents within the same collection may contain different fields or structures. Fields may be optional, and documents of the same logical type may differ in composition. For example, the following illustrates a document encoded in JSON: A second document might be encoded in XML as: The two example documents share some structural elements but also contain unique fields. The structure, text, and other data within each document are collectively referred to as the document's content and can be accessed or modified using retrieval or editing operations. Unlike relational databases, in which each record contains the same fields and unused fields are left empty, document-oriented databases do not require uniform fields across documents. This design allows new information to be added to some documents without affecting the structure of others. Document databases often support the storage of additional metadata alongside the document content. Such metadata may relate to organizational features, security, indexing, or other implementation-specific features. === CRUD operations === The core operations supported by a document-oriented database for manipulating documents are similar to those in other databases. Although terminology is not perfectly standardized, these operations are generally recognized as Create, Read, Update, and Delete (CRUD). Creation (C): Adds a new document to the database. Retrieval (R): Retrieves documents or fields based on queries. Update (U): Modifies the contents of existing documents. Deletion (D): Removes documents from the database. === Keys === Documents in a document-oriented database are addressed via a unique identifier. This identifier, often a string, URI, or path, can be used to retrieve the document from the database. Most document stores maintain an index on the key to optimize retrieval, and in some implementations the key is required when creating or inserting a new document. === Retrieval === In addition to key-based access, document-oriented databases typically provide an API or query language that enables retrieval based on document content or associated metadata. For example, a query may return all documents with a specific field matching a given value. The available query features, indexing options, and performance characteristics vary across implementations. Document stores differ from key-value stores in that they exploit the internal structure and metadata of stored documents. In many key-value stores, values are treated as opaque or "black-box" data, meaning the database system does not interpret their internal structure. By contrast, document-oriented databases can classify and interpret document content. This enables queries that distinguish between types of data––for example, retrieving all phone numbers containing "555" without also matching a postal code such as "55555." === Editing === Document databases typically provide mechanisms for updating or editing the content or metadata of a document. Updates may involve replacing the entire document or modifying individual elements or fields within the document. === Organization === Document database implementations support a variety of methods for organizing documents, including: Collections: Groups of documents. Depending on the implementation, a document may be required to belong to a single collection or may be allowed in multiple collections. Tags and non-visible metadata: Additional data stored outside the main document content. Directory hierarchies: Documents organized in a tree-like structure, often based on path or URI. These organizational structures may differ between logical and physical representations (e.g. on disk or in memory). == Relationship to other databases == === Relationship to key-value stores === A document-oriented database can be viewed as a specialized form of key-value store, which is itself a category of NoSQL database. In a basic key-value store, the stored value is typically treated as opaque by the database system. By contrast, a document-oriented database provides APIs or a query and update language that allows queries and modifications based on the internal structure of the document. For users who do not require advanced query, retrieval, or update capabilities, the distinction between document-oriented databases and key-value stores may be minimal. === Relationship to search engines === Some search engine and information retrieval systems, such as Apache Solr and Elasticsearch, provide document storage and support core document operations. As a result, they may meet certain functional definitions of a document-oriented database, although their primary design goals differ. === Relationship to relational databases === In a relational database, data is organized into predefined types represented as tables. Each table contains rows (records) with a fixed set of columns (fields), so all records in a table share the same structure. Administrators typically define indexes on selected fields to improve query performance. A central principle of relational database design is database normalization, in which data that might otherwise be repeated is stored in separate tables and linked using keys. When records in different tables are related, a foreign key is used to associate them. For example, an address book application may store a contact's name, image, phone numbers, mailing addresses, and email addresses. In a normalized relational design, separate tables might be created for contacts, phone numbers, and email addresses. The phone number table would include a foreign key referencing the associated contact. To reconstruct a complete contact record, the database retrieves related information from each table using the foreign keys and combines it into a single record. In contrast, a document-oriented database stores all data related to an object within a single document, and stored in the database as a single entry. In the address book example,the contact's name, image, and contact information may be stored together in one document. The document is retrieved using a unique key, and all related information is returned together, without needing to look up multiple tables. A key difference between the document-oriented and relational models is that the data formats are not predefined in the document case. In most cases, any sort of document can be stored in a database, and documents can change in type and form over time. For example, a new field such as COUNTRY_FLAG can be added to new documents as they are inserted without affecting existing documents. To aid retrieval, document-oriented systems generally allow the administrator to provide hints to the database for locating certain types of information. These hints work in a similar fashion to indexes in relational databases. Many systems also allow additional metadata outside the content of the document itself

    Read more →
  • RemObjects Software

    RemObjects Software

    RemObjects Software is an American software company founded in 2002 by Alessandro Federici and Marc Hoffman. It develops and offers tools and libraries for software developers on a variety of development platforms, including Embarcadero Delphi, Microsoft .NET, Mono, and Apple's Xcode. == History == RemObjects Software was founded in the summer of 2002. Its first product was RemObjects SDK 1.0 for Delphi, the company's remoting solution which is now in its 6th version. In late 2003 RemObjects expanded its product portfolio to add Data Abstract for Delphi, a multi-tier database framework built on top of the SDK. In 2004, Carlo Kok, who would eventually become Chief Compiler Architect for Oxygene, joined the company, adding the open source Pascal Script library for Delphi to the company's portfolio. Initial development began on Oxygene (which was then named Chrome) based on Carlo's experience from writing the widely used Pascal Script scripting engine. Towards the end of 2004, RemObjects SDK for .NET was released, expanding the remoting framework to its second platform. Chrome 1.0 was released in mid-2005, providing support for .NET 1.1 and .NET 2.0, which was still in beta at the time - making Chrome the first shipping language for .NET that supported features such as generics. It was followed by Chrome 1.5 when .NET 2.0 shipped in November of the same year. 2005 also saw the expansion of Data Abstract to .NET as a second platform. Data Abstract for .NET was the first RemObjects product (besides Oxygene itself) to be written in Oxygene. Hydra 3.0, was released for .NET in December 2006, bringing a paradigm shift to the product, away from a regular plugin framework, and focusing on interoperability between plugins and host applications written in either .NET or Delphi/Win32, essentially enabling the use of both managed and unmanaged code in the same project. In Summer 2007, RemObjects released Chrome 'Joyride' which added official support for .NET 3.0 and 3.5. Chrome once again was the first language to ship release level support for new .NET framework features supported by that runtime - most importantly Sequences and Queries (aka LINQ). Development continued and in May 2008 Oxygene 3.0 was released, dropping the "Chrome" moniker. Oxygene once again brought major language enhancements, including extensive support for concurrency and parallel programming as part of the language syntax. In October 2008, RemObjects Software and Embarcadero Technologies announced plans to collaborate and ship future versions of Oxygene under the Delphi Prism moniker, later changed to Embarcadero Prism. The first of these releases of Prism became available in December 2008. Over the course of 2009, RemObjects software completed the expansion of its Data Abstract and RemObjects SDK product combo to a third development platform - Xcode and Cocoa, for both Mac OS X and iPhone SDK client development. RemObjects SDK for OS X shipped in the spring of 2009, followed by Data Abstract for OS X in the fall. In 2011, Oxygene was expanded to add support for the Java platform, in addition to NET. In 2014, RemObjects introduced a C# compiler which runs as a Visual Studio 2013 plugin, that can output code for iOS, MacOS (Cocoa) and Android, in addition to .NET compatible code. In addition, an IDE called Fire was introduced for macOS which works with their C# and Oxygene compilers. Together, the compiler supporting both Oxygene and C# was rebranded as the Elements Compiler, with CE# having the Code name "Hydrogene". In February 2015, RemObjects introduced a beta version of a Swift compiler called Silver as part of its Elements effort. Silver, too, could create code that will execute on Android, the JVM, .NET platform and also create native Cocoa code. Silver added new features to the Swift language, such as exceptions and has a few differences and limitations compared to Apple's Swift. In February 2020, support for the Go programming language was introduced with RemObjects Gold, including the ability to compile Go language code for all Elements platforms, and a port of the extensive Go Base Library available to all Elements languages. In 2021, Mercury was added to the Elements compiler as the sixth language, providing a future for the Visual Basic .NET language recently deprecated by Microsoft. Mercury supports building and maintaining existing VB.NET projects, as well as using the language for new projects both on .NET and the other platforms. == Commercial products == Elements is a development toolchain that targets .NET runtime, Java/Android virtual machines, the Apple ecosystem (macOS, iOS, tvOS), WebAssembly and native and Windows/Linux/Android NDK processor-native machine code in conjunction with a runtime library that does automatic garbage collection on non-ARC environments and ARC on ARC-based environments, such as iOS and MacOS. Because Java, C#, Swift, and Oxygene all can import each other's APIs, Elements effectively functions as Java bonded together with C# bonded together with Swift bonded together with Oxygene as a confederation of languages cooperating together quite intimately. Oxygene, a unique programming language based on Object Pascal, which can import Java, C#, and Swift APIs from the runtime of the target operating system; RemObjects C#, an implementation of C# programming language, which can import Java, Swift, and Oxygene APIs from the runtime of the target operating system and which is intended as a competitor of Xamarin, but Hydrogene's C# targets JVM bytecode instead of Xamarin's C# compiling to only Common Language Infrastructure byte code and needing the accompanying Mono Common Language Runtime to be present in such JVM-centric environments as Android; Silver, a free implementation of the Swift programming language, which can import Java, C#, and Oxygene APIs from the runtime of the target operating system; Iodine, an implementation of the Java programming language. Gold, an implementation of the Go programming language. Mercury, an implementation of the Visual Basic .NET programming language. Fire an integrated development environment for macOS. Water an integrated development environment for Windows. Data Abstract Remoting SDK, a.k.a. RemObjects SDK Hydra Oxfuscator Oxidizer, an automatic translator from Java, C#, Objective-C, and Delphi to Oxygene, from Java, Objective-C, and C# to Swift, and from Java and Objective-C to C#. == Open source projects == Train is an open-source JavaScript-based tool for building and running build scripts and automation. Internet Pack for .NET is a free, open source library for building network clients and servers using TCP and higher level protocols such as HTTP or FTP, using the .NET or Mono platforms. It includes a range of ready to use protocol implementations, as well as base classes that allow the creation of custom implementations. RemObjects Script for .NET is a fully managed ECMAScript implementation for .NET and Mono. Pascal Script for Delphi is a widely used implementation of Pascal as scripting language. == Involvement of other projects == The Oxygene Compiler Oxygene is a language based on Object Pascal and designed to efficiently target the Microsoft .NET and Mono managed runtimes; it expands Object Pascal with a range of additional language features, such as Aspect Oriented Programming, Class Contracts and support for Parallelism. It integrates with the Microsoft Visual Studio and MonoDevelop IDEs.

    Read more →
  • TinEye

    TinEye

    TinEye is a reverse image search engine developed and offered by Idée, Inc., a company based in Toronto, Ontario, Canada. It was the first image search engine on the web to use image identification technology rather than keywords, metadata or watermarks. TinEye allows users to search not using keywords but with images. Upon submitting an image, TinEye creates a "unique and compact digital signature or fingerprint" of the image and matches it with other indexed images. This procedure is able to match even heavily edited versions of the submitted image, but will not usually return similar images in the results. == History == Idée, Inc. was founded by Leila Boujnane and Paul Bloore in 1999. Idée launched the service on May 6, 2008 and went into open beta in August that year. While computer vision and image identification research projects began as early as the 1980s, the company claims that TinEye is the first web-based image search engine to use image identification technology. The service was created with copyright owners and brand marketers as the intended user base, to look up unauthorized use and track where the brands are showing up respectively. In June 2014, TinEye claimed to have indexed more than five billion images for comparisons. However, this is a relatively small proportion of the total number of images available on the World Wide Web. As of September 2025, TinEye's search results claim to have over 77.6 billion images indexed for comparison. == Technology == A user uploads an image to the search engine (the upload size is limited to 20 MB) or provides a URL for an image or for a page containing the image. The search engine will look up other usage of the image in the internet, including modified images based upon that image, and report the date and time at which they were posted. TinEye does not recognize outlines of objects or perform facial recognition, but recognizes the entire image, and some altered versions of that image. This includes smaller, larger, and cropped versions of the image. TinEye has shown itself capable of retrieving different images from its database of the same subject, such as famous landmarks. TinEye is capable of searching for images in JPEG, PNG, WebP, GIF, BMP and TIFF format. Results generated from TinEye include the total number of matches in their database, a preview image, and the URL to each match. TinEye can sort results by best match, most changed, biggest image, newest, and oldest. User registration is optional and offers storage of the user's previous queries. Other features include embeddable widgets and bookmarklets. TinEye has also released their commercial API. == Usage == TinEye's ability to search the web for specific images (and modifications of those images) makes it a potential tool for the copyright holders of visual works to locate infringements on their copyright. It also creates a possible avenue for people who are looking to make use of imagery under orphan works to find the copyright holders of that imagery. Being that orphan works can be defined as "copyrighted works whose owners are difficult or impossible to identify and/or locate," the use of TinEye could potentially remove the orphan work status from online images that can be found in its database. === Fact-checking === It has been recommended by fact-checkers as a useful resource in attempts to verify the origin of images. As of 2019, TinEye specialized in copyright violations and finding exact versions of images online.

    Read more →
  • PlantUML

    PlantUML

    PlantUML is an open-source tool allowing users to create diagrams from a plain text language. Besides various UML diagrams, PlantUML has support for various other software development related formats (such as Archimate, Block diagram, BPMN, C4, Computer network diagram, ERD, Gantt chart, Mind map, and WBD), as well as visualisation of JSON and YAML files. The language of PlantUML is an example of a domain-specific language. Besides its own DSL, PlantUML also understands AsciiMath, Creole, DOT, and LaTeX. It uses Graphviz software to lay out its diagrams and Tikz for LaTeX support. Images can be output as PNG, SVG, LaTeX and even ASCII art. PlantUML has also been used to allow blind people to design and read UML diagrams. == Applications that use PlantUML == There are various extensions or add-ons that incorporate PlantUML. Atom has a community maintained PlantUML syntax highlighter and viewer. Confluence wiki has a PlantUML plug-in for Confluence Server, which renders diagrams on-the-fly during a page reload. There is an additional PlantUML plug-in for Confluence Cloud. Doxygen integrates diagrams for which sources are provided after the startuml command. Eclipse has a PlantUML plug-in. Google Docs has an add-on called PlantUML Gizmo that works with the PlantUML.com server. IntelliJ IDEA can create and display diagrams embedded into Markdown (built-in) or in standalone files (using a plugin). LaTeX using the Tikz package has limited support for PlantUML. LibreOffice has Libo_PlantUML extension to use PlantUML diagrams. MediaWiki has a PlantUML plug-in which renders diagrams in pages as SVG or PNG. Microsoft Word can use PlantUML diagrams via a Word Template Add-in. There is an additional Visual Studio Tools for Office add-in called PlantUML Gizmo that works in a similar fashion. NetBeans has a PlantUML plug-in. Notepad++ has a PlantUML plug-in. Obsidian has a PlantUML plug-in. Org-mode has a PlantUML org-babel support. Rider has a PlantUML plug-in. Sublime Text has a PlantUML package called PlantUmlDiagrams for Sublime Text 2 and 3. Visual Studio Code has various PlantUML extensions on its marketplace, most popular being PlantUML by jebbs. Vnote open source notetaking markdown application has built in PlantUML support. Xcode has a community maintained Source Editor Extension to generate and view PlantUML class diagrams from Swift source code. == Text format to communicate UML at source code level == PlantUML uses well-formed and human-readable code to render the diagrams. There are other text formats for UML modelling, but PlantUML supports many diagram types, and does not need an explicit layout, though it is possible to tweak the diagrams if necessary. +--------------------------------------+ | TEDx Talks Recommendation | | System | +--------------------------------------+ | +----------------------------------+ | | | Visitor | | | +----------------------------------+ | | | + View Recommended Talks | | | | + Search Talks | | | +----------------------------------+ | +--------------------------------------+ | | V +--------------------------------------+ | Authenticated User | +--------------------------------------+ | +----------------------------------+ | | | User | | | +----------------------------------+ | | | + View Recommended Talks | | | | + Search Talks | | | | + Save Favorite Talks | | | +----------------------------------+ | +--------------------------------------+ | | V +--------------------------------------+ | Admin | +--------------------------------------+ | +----------------------------------+ | | | Admin | | | +----------------------------------+ | | | + CRUD Talks | | | | + Manage Users | | | +----------------------------------+ | +--------------------------------------+

    Read more →
  • Gerrit (software)

    Gerrit (software)

    Gerrit ( GERR-it) is a free, web-based team code collaboration tool. Software developers in a team can review each other's modifications on their source code using a Web browser and approve or reject those changes. It integrates closely with Git, a distributed version control system. Gerrit is a fork of Rietveld, a code review tool for Subversion. Both are named after Dutch designer Gerrit Rietveld. == History == Originally written in Python like Rietveld, it is now written in Java (Java EE Servlet) with SQL since version 2 and a custom-made Git-based database (NoteDb) since version 3. In versions 2.0–2.16 Gerrit used Google Web Toolkit for its browser-based front-end. After being developed and used in parallel with GWT for versions 2.14–2.16, a new Polymer web UI replaced the GWT UI in version 3.0.

    Read more →
  • Cobocards

    Cobocards

    CoboCards is a web application for creation, study and sharing of flashcards. They also provide mobile application for Android and iOS mobile devices, to help study of flashcards on the move. Based on the freemium model, CoboCards provides users a free account with two card sets compared to paid subscription with premium features such as unlimited card sets, Leitner system based trainer and collaborative learning. == History == CoboCards is a project of Jamil Soufan and Tamim Swaid. Tamim Swaid has developed the concept and interface of a collaboratively usable e-learning platform in his diploma thesis at the University of Applied Sciences in February 2007. In January 2010 they founded the CoboCards GmbH (limited company) together with Ali Yildirim. CoboCards is supported by its strategic partners Prof. Schroeder (RWTH Aachen University), Prof. Oliver Wrede (University for Applied Sciences Aachen) and Prof. Klaus Gasteier (University of Arts Berlin). With the idea of creating and studying flashcards online and offering an active control of learning progress they won the start2grow business idea competition in September 2009 (€25.000 ). Additionally CoboCards was funded by German Authorities with approximately €100.000 .

    Read more →
  • JDoodle

    JDoodle

    JDoodle is a cloud-based online integrated development environment and compiler platform that supports execution of source code in 70+ programming languages including Java, Python, C/C++, PHP, Ruby, Perl, HTML, and more. It provides zero‑setup code for compilation, execution, and sharing via a web browser interface. == Features == Provides real‑time collaboration and code embedding via shareable URLs and APIs Offers an integrated terminal interface supporting database engines such as MySQL and MongoDB. JDroid — AI‑assistant to generate code snippets, optimize code, and assist debugging. == Languages and frameworks supported ==

    Read more →
  • Clara.io

    Clara.io

    Clara.io is web-based freemium 3D computer graphics software developed by Exocortex, a Canadian software company. The free or "Basic" component of their freemium offering, however, places severe restrictions, such as on saving models and importing texture maps, which are undisclosed in the company's own descriptions of their plans.vf TMN == History == Clara.io was announced in July 2013, and first presented as part of the official SIGGRAPH 2013 program later that month. By November 2013, when the open beta period started, Clara.io had 14,000 registered users. Clara.io claimed to have 26,000 registered users in January 2014, which grew to 85,000 by December 2014. Clara.io was permanently shut down on December 31, 2022, but the site is currently still partially functional to logged-in users. == Features == Polygonal modeling Constructive solid geometry Key frame animation Skeletal animation Hierarchical scene graph Texture mapping Photorealistic rendering (streaming cloud rendering using V-Ray Cloud) Scene publishing via HTML iframe embedding FBX, Collada, OBJ, STL and Three.js import/export Collaborative real-time editing Revision control (versioning & history) Scripting, Plugins & REST APIs 3D model library Unlisted and Private scenes (paid subscriptions only). == Technology == Clara.io is developed using HTML5, JavaScript, WebGL and Three.js. Clara.io does not rely on any browser plugins and thus runs on any platform that has a modern standards compliant browser. == Screenshots ==

    Read more →
  • Environmental impact of AI

    Environmental impact of AI

    The environmental impact of the design, training, deployment and use of artificial intelligence includes the greenhouse gas emissions from generating electricity for data centres and computing hardware, operational and upstream water use, and material impacts from hardware manufacturing, mining and electronic waste. Estimating AI's environmental effects can be difficult because results depend on how impacts are measured, including whether accounting includes only model computation or also data-centre overhead, idle capacity, hardware manufacture, and local electricity supply. As these issues have received greater attention, governments and regulators have increasingly considered data-centre reporting requirements, energy-efficiency standards, and broader transparency measures for AI-related resource use. == Carbon footprint and energy use == AI-related energy use arises at multiple stages, including model training, fine-tuning, inference, storage, networking, and supporting infrastructure such as cooling and power conversion. === Individual level === Published estimates of energy use per AI request vary widely across models, tasks and measurement methods. A benchmark study presented at the 2024 ACM Conference on Fairness, Accountability, and Transparency found substantial differences between task types, with lower energy use for some text tasks and much higher energy use for image generation in the study's test conditions. In that benchmark, simple classification tasks consumed about 0.002–0.007 Wh per prompt on average (about 9% of a smartphone charge for 1,000 prompts), while text generation and text summarisation each used about 0.05 Wh per prompt; image generation averaged 2.91 Wh per prompt, and the least efficient image model in the study used 11.49 Wh per image (roughly equivalent to half a smartphone charge). First-party measurements in production environments have also been published. A 2025 Google study on Gemini assistant serving reported median per-prompt energy, emissions, and water-use estimates under the authors' accounting framework, while noting that different system boundaries can produce substantially different results. The study reported a median text-prompt estimate of about 0.24 Wh, which is roughly as much energy as watching nine seconds of television. The study also stated that software and infrastructure improvements reduced energy use by a factor of 33 and carbon emissions by a factor of 44 for a typical prompt over one year within the authors' framework. Researchers at the University of Michigan measured the energy consumption of various Meta Llama 3.1 models released in 2024 and found that smaller language models (8 billion parameters) use about 114 joules (0.03167 Wh) per response, while larger models (405 billion parameters) require up to 6,700 joules (1.861 Wh) per response. This corresponds to the energy needed to run a microwave oven for roughly one-tenth of a second and eight seconds, respectively. Comparisons between AI systems and human labour for specific tasks have produced mixed results and remain sensitive to assumptions about output quality, workload and system boundaries. A 2024 study in Scientific Reports reported 130 to 2900 times lower estimated carbon emissions for selected AI systems than for human writers and illustrators under its assumptions. A later Scientific Reports paper reported a counterexample for programming tasks under its assumptions, finding 5 to 19 times higher estimated emissions for the evaluated AI system than for human programmers on the benchmark used in that study. === System level === ==== Energy use and efficiency ==== AI electricity intensity depends not only on model architecture but also on hardware and facility efficiency. Data-centre operators commonly report Power usage effectiveness (PUE), which measures the ratio of total facility energy to IT equipment energy; a lower PUE indicates less overhead energy for cooling and other supporting infrastructure. Operators may also publish metrics and case studies on hardware efficiency, cooling systems and power sourcing. In its 2024 environmental report, Google stated that its 2023 total greenhouse gas emissions increased 13% year over year, primarily because of increased data-centre energy consumption and supply-chain emissions, while also reporting lower PUE than industry averages for its own facilities. The International Energy Agency has also reported that data centres remain a relatively small share of global electricity use overall, but that their local effects can be much more pronounced because demand is geographically concentrated. ==== Carbon footprint ==== At system level, AI contributes to rising electricity demand in data centres and related infrastructure. The International Energy Agency estimated that data centres used about 415 TWh of electricity in 2024, or around 1.5% of global electricity consumption, and projected that data-centre electricity use could rise to about 945 TWh by 2030, with AI identified as the main driver of that growth alongside other digital services. The carbon footprint of AI systems depends strongly on electricity sources, hardware efficiency, utilisation rates, and what stages are included in the accounting. Training large models can require substantial electricity, while total lifecycle impacts also depend on deployment scale and the amount of inference performed after training. Early analyses of frontier-model development reported rapid historical growth in training compute for selected systems, although later trends have depended on changes in model design, hardware and efficiency gains. Accounting methods that include upstream or embodied impacts, such as hardware manufacture and facilities construction, can materially affect estimates of AI-related emissions. === Decisions and strategies by individual companies === Large technology companies have reported that the expansion of AI and cloud infrastructure affects their sustainability targets, electricity demand, and resource use. Google, for example, attributed part of its emissions growth in 2023 to increased data-centre energy consumption and supply-chain emissions in its 2024 environmental report. Cloud and AI companies have also announced measures intended to reduce environmental impacts, including investment in more efficient hardware, low-carbon electricity procurement, alternative cooling systems, and water stewardship programmes. The extent, comparability, and third-party verification of such disclosures vary between firms and jurisdictions. == Water usage == Data centres can use water directly for cooling and indirectly through the water used in electricity generation, depending on the local energy mix. Public reporting on data-centre water use has often been inconsistent, making comparisons between operators and regions difficult. To standardise operational reporting, The Green Grid proposed the metric water usage effectiveness (WUE), defined as annual site water use divided by IT equipment energy use. WUE does not by itself measure local water stress, source sustainability, or all upstream water impacts. Studies of AI water use also distinguish between water withdrawal and water consumption. Research on AI-specific water use has argued that the water footprint of AI systems can be difficult to observe and may vary substantially by location, cooling design, and electricity source. A 2025 Communications of the ACM article summarised methods for estimating AI water footprints and emphasised the distinction between water withdrawal and water consumption. Li and colleagues estimated that global AI water withdrawal could reach 4.2–6.6 billion cubic metres in 2027 under the scenarios examined in their article. Using GPT-3, released by OpenAI in 2020, as an example, they estimated that training the model in Microsoft's U.S. data centres could consume about 700,000 litres of onsite water and about 5.4 million litres in total when offsite electricity-related water use was included; they also estimated that 10–50 medium-length GPT-3 responses could consume about 500 mL of water, depending on when and where the model was deployed. Published prompt-level estimates have also varied by system and accounting framework: the 2025 Google study on Gemini assistant serving reported a median text-prompt estimate of about 0.26 mL under its framework. Location can materially affect the significance of data-centre water use. Research on U.S. data centres found that one-fifth of servers' direct water footprint came from moderately to highly water-stressed watersheds, while nearly half of servers were fully or partially powered by plants located in water-stressed regions. A 2025 Reuters report, citing data from Verisk Maplecroft and NatureFinance, said that an average mid-sized data centre uses about 1.4 million litres of water per day for cooling and that Phoenix would experience a 32% increase in annual water stress if currently pl

    Read more →
  • Open Threat Exchange

    Open Threat Exchange

    Open Threat Exchange (OTX) is a crowd-sourced computer-security platform. It has more than 180,000 participants in 140 countries who share more than 19 million potential threats daily. It is free to use. Founded in 2012, OTX was created and is run by AlienVault (now AT&T Cybersecurity), a developer of commercial and open source solutions to manage cyber attacks. The collaborative threat exchange was created partly as a counterweight to criminal hackers successfully working together and sharing information about viruses, malware and other cyber attacks. == Components == OTX is cloud-hosted. Information sharing covers a wide range of security-related issues, including viruses, malware, intrusion detection and firewalls. Its automated tools cleanse, aggregate, validate and publish data shared by participants. The OTX platform validates the data, then strips the information identifying the participating contributor. In 2015, OTX 2.0 added a social network, enabling members to share, discuss and research security threats, including via a real-time threat feed. Users can share the IP addresses or websites from where attacks originated or look up specific threats to see if anyone has already left such information. Users can subscribe to a “Pulse,” an analysis of a specific threat, including data on IoC, impact, and the targeted software. Pulses can be exported as STIX, JSON, OpenloC, MAEC and CSV, and can be used to update local security products automatically. Users can up-vote and comment on specific pulses to assist others in identifying the most important threats. OTX combines social contributions with automated machine-to-machine tools that integrate with major security products such as firewalls and perimeter security hardware. The platform can read security reports in .pdf, .csv, .json and other open formats. Relevant information is extracted automatically, assisting IT professionals in analyzing data more readily. Specific OTX components include a dashboard with details about the top malicious IPs around the world and to check the status of specific IPs; notifications should an organization's IP or domain be found in a hacker forum, blacklist or be listed by OTX; and a feature to review log files to determine if there has been communication with known malicious IPs. In 2016, AlienVault released a new version of OTX, allowing participants to create private communities and discussion groups to share information on threats only within the group. The feature is intended to facilitate more in-depth discussions on specific threats, particular industries, and different regions worldwide. Threat data from groups can also be distributed to subscribers of managed service providers using OTX." == Technology == OTX is a large data platform that integrates natural language processing and machine learning. It uses these features to facilitate the collection and correlation of data from many sources, including third-party threat feeds, websites, external APIs and local agents. == Partners == In 2015, AlienVault partnered with Intel to coordinate real-time threat information on OTX. A similar deal with Hewlett Packard was announced the same year. == Competitors == Both Facebook and IBM have threat exchange platforms. The Facebook ThreatExchange is in beta and requires an application or invitation to join. IBM launched IBM X-Force Exchange in April 2015.

    Read more →
  • SeaTable

    SeaTable

    SeaTable is a no-code platform that allows users to develop and implement business processes. The cloud collaboration service SeaTable is marketed by the GmbH of the same name with headquarters in Mainz and additional offices in Berlin and Beijing, and developed by the same company as Seafile. == History == SeaTable is a collaborative database and low-code application platform developed as part of a joint venture between Seafile Ltd., a software company based in Guangzhou, China, and SeaTable GmbH, a German firm headquartered in Mainz. Founded in 2020, the project represents the international expansion of Seafile, a Chinese developer originally known for its file synchronization and sharing software. While SeaTable's cloud services and European client operations are managed by the German entity, the platform itself is developed in China by Seafile's engineering team. This cross-border structure, described by TechCrunch as an “unconventional path” for a Chinese startup expanding abroad, reflects Seafile's effort to maintain its product development in China while addressing growing scrutiny in Western markets over data governance and corporate control. In 2021, an innovation project led by the Cyber Innovation Hub at the IT School of the German Armed Forces started to evaluate the possibilities of a large-scale deployment at the German Armed Forces. The evaluation project is currently still ongoing. In 2022, SeaTable is optimizing its database backend to allow millions of records within one base in the future. The focus of development is increasingly on automation and visualization. In 2025, SeaTable introduced AI-powered automations with version 6. The update enabled the integration of large language models (LLMs) for text analysis and automated decision-making. SeaTable operates a self-hosted LLM on servers provided by Hetzner (Germany), while self-hosted deployments can connect to any compatible model. == Features == SeaTable combines the traditional capabilities of a spreadsheet such as Excel and supplements them with a wide range of functions for process automation and visualization as well as a fully comprehensive API. SeaTable is not a pure cloud solution, but can alternatively be installed on a private server and operated completely autonomously. In this way, the owner retains full control over their own data. The installation is done via Docker on a Linux server. == Security and privacy == While most no-code platforms exist only as SaaS solutions, SeaTable describes itself as a data-sparse European solution. While initially the SeaTable Cloud was hosted on Amazon AWS, the move to the German data centers of Swiss provider Exoscale then took place in May 2021. This was followed by the replacement of the Freshdesk cloud ticketing system with a self-hosted Zammad instance, and since April 2022 SeaTable has completely dispensed with all tracking cookies on its website.

    Read more →
  • Apache Kudu

    Apache Kudu

    Apache Kudu is a free and open source column-oriented data store of the Apache Hadoop ecosystem. It is compatible with most of the data processing frameworks in the Hadoop environment. It provides completeness to Hadoop's storage layer to enable fast analytics on fast data. The open source project to build Apache Kudu began as internal project at Cloudera. The first version Apache Kudu 1.0 was released 19 September 2016. == Comparison with other storage engines == Kudu was designed and optimized for OLAP workloads. Like HBase, it is a real-time store that supports key-indexed record lookup and mutation. Kudu differs from HBase since Kudu's datamodel is a more traditional relational model, while HBase is schemaless. Kudu's "on-disk representation is truly columnar and follows an entirely different storage design than HBase/Bigtable".

    Read more →
  • E-gree (app)

    E-gree (app)

    E-gree is a legal app that became well known in 2020. It was the first app of its kind to protect users against a number of dating-related issues, including revenge porn. == Background == The app was co-founded by Araz Mamet, Keith Fraser and Ilya Flaks. The app focuses on privacy, with users being able to set up various contracts to protect themselves following a breakup, or while dating. This notably included signing an NDA when sexting. The app received investment from a number of notable people and companies, including Natalia Vodianova.

    Read more →
  • Real-Time UML

    Real-Time UML

    Real-Time UML (RTUML) refers to the application of the Unified Modelling Language (UML) for the analysis, design, and implementation of real-time and embedded systems, where timing constraints, concurrency, and resource management are critical. It extends standard UML with profiles, notations, and semantics to handle hard and soft real-time requirements, such as modelling predictable response times and fault tolerance. RTUML is not a separate language but a methodology leveraging UML diagrams (e.g., statecharts, sequence diagrams) for time-sensitive applications like automotive controls, avionics, and medical devices. The term is closely associated with Bruce Powel Douglass, who popularised it through his books and the Harmony process for embedded software development. As of 2025, RTUML remains relevant in industries requiring certified systems, though its adoption varies with agile methodologies and model-driven engineering tools. == Background == Real-Time UML emerged in the late 1990s as UML was standardized by the Object Management Group (OMG) in 1997, addressing the need for object-oriented modeling in real-time systems previously dominated by procedural languages like C. Traditional real-time development relied on "bare metal" programming or theoretical models, but RTUML introduced visual notations for object structure, behaviour, and timing. Bruce Powel Douglass’s 1999 book, Real-Time UML: Developing Efficient Objects for Embedded Systems, formalised the approach, emphasising statecharts for concurrency and timing constraints. Later editions (2004, 2006) incorporated UML 2.0 features like activity and timing diagrams, aligning with OMG’s Real-Time Profile (now part of MARTE—Modelling and Analysis of Real-Time and Embedded Systems). The Harmony process integrates RTUML with executable models for simulation and code generation. RTUML addresses hard real-time systems (e.g., strict deadlines in avionics) versus soft real-time (e.g., media streaming), using UML extensions for schedulability analysis. == Key concepts == RTUML adapts UML diagrams and techniques for real-time needs: Statecharts and Behaviour Modelling: Extended state machines model reactive behaviour, using and-states for concurrency, pseudostates for transitions, and timing constraints (e.g., {duration < 10ms}). Examples include cardiac pacemaker models. Sequence and Interaction Diagrams: Capture message timing, priorities, and resource allocation in multi-threaded systems. Architectural Patterns: Define logical and physical architectures with active objects for concurrency and patterns like observer or publisher-subscriber. Timing and Constraints: Use Object Constraint Language (OCL) for specifying deadlines and priorities. Profiles and Extensions: OMG’s UML Profile for Schedulability, Performance, and Time (SPT) and MARTE add stereotypes like RT::ActiveObject. These support iterative development, from requirements to deployment, often with tools like IBM Rhapsody or Enterprise Architect. == Applications == RTUML is used in: Embedded Systems: Modelling automotive ECUs or UAV controls. Avionics and Defence: DO-178C-compliant designs for fault tolerance. Medical Devices: Pacemakers or ventilators with precise timing. Industrial Automation: RTOS task visualisation via sequence diagrams. Tools like IBM Rhapsody support RTUML for model-based development and code generation in C/C++. == Criticism and adoption == RTUML’s complexity can overwhelm simple systems, and its use in agile environments is limited, where lightweight diagrams are preferred. Surveys indicate UML (including RTUML) is used in 30–50% of embedded projects, often for documentation rather than full model-driven engineering. It remains standard in academia and certified industries like aerospace.

    Read more →
  • Cloud-Based Secure File Transfer

    Cloud-Based Secure File Transfer

    Cloud-Based Secure File Transfer is a managed or hosted file transfer service that provides cloud storage that can be accessed via SSH File Transfer Protocol (SFTP). These services allow secure, reliable file transfers while offering the scalability, redundancy, and high availability of cloud infrastructure. == Technical overview == The evolution of file transfer protocols began with File Transfer Protocol (FTP) and SSH File Transfer Protocol (SFTP). SFTP offered enhanced security through the use of SSH (Secure Shell) encryption, which addressed many of the security concerns associated with traditional FTP. Over time, as businesses increasingly adopted cloud infrastructure, the demand for services that integrate secure file transfer with cloud storage led to the rise of Cloud-Based Secure File Transfer services. These services combine the benefits of secure, encrypted file transfer with the scalability and flexibility of cloud-based storage systems. Traditional on-premises SFTP typically involves setting up and managing physical or virtual servers to handle file transfers. In contrast, Cloud-Based Secure File Transfer utilizes managed cloud infrastructure, such as AWS EC2, Azure VMs, or Google Cloud, to automate scaling, ensure redundancy, and provide high availability. These cloud environments can be configured to automatically scale with demand, enabling businesses to handle large volumes of data transfers without the need for extensive physical hardware. == Features == Scalability and availability: Cloud-Based Secure File Transfer services are inherently scalable, with features like load balancing, multi-region deployments, and auto-scaling groups that adjust resources in response to traffic spikes. This ensures that the system can handle varying workloads and provides continuous availability, even during high-demand periods. Cost-effectiveness: By eliminating the need for physical infrastructure and reducing ongoing server maintenance costs, Cloud-Based Secure File Transfer services offer significant cost savings compared to traditional on-premises services. Cloud providers typically offer pay-as-you-go pricing models, where users only pay for the resources they use, further optimizing costs. Security and compliance: Cloud-Based Secure File Transfer products offer strong security measures, including end-to-end encryption, key management, detailed logging, and auditing. These services are often compliant with industry regulations such as HIPAA (Health Insurance Portability and Accountability Act), GDPR (General Data Protection Regulation), and SOC 2 (System and Organization Controls), ensuring that data transfers meet necessary security and privacy standards. == Cloud-Based Secure File Transfer providers == == Uses == Cloud-Based Secure File Transfer is used across various industries to securely transfer sensitive data and integrate into business workflows. In healthcare, Cloud-Based Secure File Transfer is essential for securely transferring electronic Protected Health Information (ePHI), ensuring compliance with regulations like HIPAA. In financial institutions, it is used to protect sensitive financial data during transfer, maintaining privacy and security. Data analytics also benefits from Cloud-Based Secure File Transfer, offering a secure and efficient method for transferring large datasets between systems or partners. Technically, Cloud-Based Secure File Transfer is often integrated into enterprise workflows through automated file transfers, using scripting or APIs. It also plays a key role in cloud backup and disaster recovery, ensuring that files are securely transferred and stored in cloud environments, which supports business continuity. However, businesses must address certain implementation challenges. Despite its secure design, Cloud-Based Secure File Transfer is not immune to risks such as misconfigured SSH keys, improper access control, or inadequate encryption. Regular security audits and careful configuration management are necessary to minimize the risk of data breaches. Additionally, integrating Cloud-Based Secure File Transfer with legacy systems can present challenges, such as incompatible APIs or outdated authentication methods. == Comparisons with related technologies == Cloud-Based Secure File Transfer differs from traditional SFTP primarily in its deployment and management model. Traditional SFTP services are typically hosted on-premises or on virtual servers, requiring manual configuration, ongoing infrastructure maintenance, and security management by in-house IT teams. In contrast, Cloud-Based Secure File Transfer is offered as a Software-as-a-Service (SaaS) service, reducing infrastructure overhead by eliminating the need for dedicated hardware or virtual machines. This model simplifies management through centralized web-based interfaces, automated updates, and built-in scalability. While Cloud-Based Secure File Transfer is focused on providing secure file transfers over the SFTP protocol, Managed File Transfer (MFT) platforms generally support a broader range of protocols, including FTP, FTPS, HTTP/S, and AS2. MFT services often include advanced features such as end-to-end encryption, extensive automation, compliance reporting, and integration with enterprise systems. Cloud-Based Secure File Transfer services may offer some of these features but are typically more lightweight and streamlined, targeting organizations seeking a secure and scalable alternative to traditional SFTP without the full suite of MFT capabilities. As such, Cloud-Based Secure File Transfer can be seen as a specialized subset within the broader managed file transfer ecosystem.

    Read more →