Round-trip engineering

Round-trip engineering

Round-trip engineering (RTE) in the context of model-driven architecture is a functionality of software development tools that synchronizes two or more related software artifacts, such as, source code, models, configuration files, documentation, etc. between each other. The need for round-trip engineering arises when the same information is present in multiple artifacts and when an inconsistency may arise in case some artifacts are updated. For example, some piece of information was added to/changed in only one artifact (source code) and, as a result, it became missing in/inconsistent with the other artifacts (in models). == Overview == Round-trip engineering is closely related to traditional software engineering disciplines: forward engineering (creating software from specifications), reverse engineering (creating specifications from existing software), and reengineering (understanding existing software and modifying it). Round-trip engineering is often wrongly defined as simply supporting both forward and reverse engineering. In fact, the key characteristic of round-trip engineering that distinguishes it from forward and reverse engineering is the ability to synchronize existing artifacts that evolved concurrently by incrementally updating each artifact to reflect changes made to the other artifacts. Furthermore, forward engineering can be seen as a special instance of RTE in which only the specification is present and reverse engineering can be seen as a special instance of RTE in which only the software is present. Many reengineering activities can also be understood as RTE when the software is updated to reflect changes made to the previously reverse engineered specification. === Types === Various books describe two types of RTE: partial or uni-directional RTE: changes made to a higher level representation of a code and model are reflected in lower level, but not otherwise; the latter might be allowed, but with limitations that may not affect higher-level abstractions full or bi-directional RTE: regardless of changes, both higher and lower-level code and model representations are synchronized if any of them altered === Auto synchronization === Another characteristic of round-trip engineering is automatic update of the artifacts in response to automatically detected inconsistencies. In that sense, it is different from forward- and reverse engineering which can be both manual (traditionally) and automatic (via automatic generation or analysis of the artifacts). The automatic update can be either instantaneous or on-demand. In instantaneous RTE, all related artifacts are immediately updated after each change made to one of them. In on-demand RTE, authors of the artifacts may concurrently update the artifacts (even in a distributed setting) and at some point choose to execute matching to identify inconsistencies and choose to propagate some of them and reconcile potential conflicts. === Iterative approach === Round trip engineering may involve an iterative development process. After you have synchronized your model with revised code, you are still free to choose the best way to work – make further modifications to the code or make changes to your model. You can synchronize in either direction at any time and you can repeat the cycle as many times as necessary. == Software == Many commercial tools and research prototypes support this form of RTE; a 2007 book lists Rational Rose, Together, ESS-Model, BlueJ, and Fujaba among those capable, with Fujaba said to be capable to also identify design patterns. == Limitations == A 2005 book on Visual Studio notes for instance that a common problem in RTE tools is that the model reversed is not the same as the original one, unless the tools are aided by leaving laborious annotations in the source code. The behavioral parts of UML impose even more challenges for RTE. Usually, UML class diagrams are supported to some degree; however, certain UML concepts, such as associations and containment do not have straightforward representations in many programming languages which limits the usability of the created code and accuracy of code analysis/reverse engineering (e.g., containment is hard to recognize in the code). A more tractable form of round-trip engineering is implemented in the context of framework application programming interfaces (APIs), whereby a model describing the usage of a framework API by an application is synchronized with that application's code. In this setting, the API prescribes all correct ways the framework can be used in applications, which allows precise and complete detection of API usages in the code as well as creation of useful code implementing correct API usages. Two prominent RTE implementations in this category are framework-specific modeling languages and Spring Roo (Java). Round-trip engineering is critical for maintaining consistency among multiple models and between the models and the code in Object Management Group's (OMG) Model-driven architecture. OMG proposed the QVT (query/view/transformation) standard to handle model transformations required for MDA. To date, a few implementations of the standard have been created. (Need to present practical experiences with MDA in relation to RTE). == Controversies == === Code generation controversy === Code generation (forward-engineering) from models means that the user abstractly models solutions, which are connoted by some model data, and then an automated tool derives from the models parts or all of the source code for the software system. In some tools, the user can provide a skeleton of the program source code, in the form of a source code template where predefined tokens are then replaced with program source code parts during the code generation process. UML (if used for MDA) diagrams specification was criticized for lack the detail which is needed to contain the same information as is covered with the program source. Some developers even claim that "the Code is the design". == Disadvantages == There is a serious risk that the generated code will rapidly differ from the model or that the reverse-engineered model will lose its reflection on the code or a mix of these two problems as result of cycled reengineering efforts. Regarding behavioral/dynamic part of UML for features like statechart diagram there is no equivalents in programming languages. Their translation during code-generation will result in common programming statement (.e.g if,switch,enum) being either missing or misinterpreted. If edited and imported back may result in different or incomplete model. The same goes for code snippets used for code generation stage for the pattern-implementation and user-specific logic: intermixed they may not be easily reverse-engineered back. There is also general lack of advanced tooling for modelling that are comparable to that of modern IDEs (for testing, debugging, navigation, etc.) for general-purpose programming languages and domain-specific languages. == Examples in software engineering == Perhaps the most common form of round-trip engineering is synchronization between UML (Unified Modeling Language) models and the corresponding source code and entity–relationship diagrams in data modelling and database modelling. Round-trip engineering based on Unified Modeling Language (UML) needs three basic tools for software development: Source Code Editor; UML Editor for the Attributes and Methods; Visualisation of UML structure

H2O (software)

H2O is an open-source, in-memory, distributed machine learning and predictive analytics platform developed by the company H2O.ai (previously 0xdata). The software uses a distributed architecture for parallel processing on standard hardware. It supports algorithms for large-scale data analysis and model deployment. H2O is primarily used by data scientists and developers for statistical modeling and data-driven decision-making. The platform is designed to handle in-memory computations across a distributed computing environment. It offers implementations for numerous statistical and machine learning algorithms, which are accessible through various programming interfaces. The software is released under the Apache License 2.0. == Functionality and features == H2O provides a suite of supervised and unsupervised machine learning algorithms. Its core functions include: Supervised learning: algorithms in the field of statistics, data mining and machine learning such as generalized linear models, random forests, gradient boosting and deep learning are implemented for classification and regression tasks. Unsupervised learning: including K-Means clustering and principal component analysis. Automated machine learning: a features designed to automate the processes of model selection, tuning, and ensemble creation. The software can ingest data from various sources, including the Hadoop Distributed File System, Amazon S3, SQL databases, as well as local file systems. It operates natively on Apache Spark clusters through Sparkling Water. Proponents claim that improved performance is achieved compared to other analysis tools. The software is distributed free of charge, under a business model based on the development of individual applications and support. == Architecture == H2O is primarily written in Java. It uses a distributed architecture that allows the platform to cluster nodes for parallel processing and in-memory storage of data and models. Users interact with the H2O platform through several primary interfaces: Programming language interfaces: APIs are provided for the R and Python programming languages, and various Apache offerings (Apache Hadoop and Spark, as well as Maven). H2O Flow: a graphical web-based interactive computational environment that functions as a notebook interface for data exploration, model building, and scripting. REST-API: allows for integration with other applications and frameworks such as Microsoft Excel or RStudio. With the H2O Machine Learning Integration Nodes, KNIME offers algorithmic workflows. While the algorithm executes, approximate results are displayed, so that users can track the progress and intervene if needed. == History, influences, and extensions == The software project was initiated by the company 0xdata, which later changed its name to H2O.ai. The three Stanford professors Stephen P. Boyd, Robert Tibshirani and Trevor Hastie form a panel that advises H2O on scientific issues. Since its inception, H2O provides open-source machine learning libraries for enterprise use. The core H2O platform is often complemented by offerings from H2O.ai, such as H2O Driverless AI. == Reception == H2O is referenced in peer-reviewed literature regarding automated machine learning (AutoML). The platform has been categorized as a "Leader" and a "Strong Performer" in industry reports by Forrester Research. H2O (the open-source platform) and the associated commercial platform Driverless AI have been recurring winners of InfoWorld's most prestigious awards, including both the Best of Open Source Software ("Bossies") and the Technology of the Year awards.

VoID

The Vocabulary of Interlinked Datasets (VoID) is a vocabulary for providing concise summaries (metadata) of Resource Description Framework (RDF) datasets—meaningful collections of semantic triples—using the syntax of RDF Schema. It can be used for general metadata (such as information about the license of the dataset), access metadata (information about how to access the dataset), structural metadata (information about how the dataset is structured), and linking metadata (information about links between datasets). A linked dataset is a collection of data, published and maintained by a single provider, available as RDF on the Web, where at least some of the resources in the dataset are identified by dereferencable Uniform Resource Identifiers (URIs). VoID is used to provide metadata on RDF datasets to facilitate query processing on a graph of interlinked datasets in the Semantic Web.

AI Futures Project

The AI Futures Project is a nonprofit research organization based in the United States that specializes in forecasting the development and societal impact of advanced artificial intelligence. The organization is best known for its 2025 scenario forecast, AI 2027, which examines the potential near-term emergence of artificial general intelligence (AGI) and its possible global consequences. == History == The AI Futures Project was founded in 2025 by Daniel Kokotajlo, a former researcher in the governance division of OpenAI. Kokotajlo resigned from OpenAI in April 2024, expressing concerns that the company prioritized rapid product development over AI safety and was advancing without sufficient safeguards. He founded the nonprofit to conduct independent forecasting and policy research. The organization is registered as a 501(c)(3) nonprofit in the United States and is funded through donations. It operates with a small research staff and network of advisors drawn from fields including AI policy, forecasting, and risk analysis. == Activities == The mission of the AI Futures Project is to develop detailed scenario forecasts of the trajectory of advanced AI systems to inform policymakers, researchers, and the public. In addition to written reports, the group has conducted tabletop exercises and workshops based on its scenarios, involving participants from academia, technology, and public policy. == AI 2027 == In April 2025, the AI Futures Project released AI 2027, a detailed scenario forecast describing possible developments in AI between 2025 and 2027. The report was authored by Daniel Kokotajlo along with Eli Lifland, Thomas Larsen, and Romeo Dean, with editing assistance from blogger Scott Alexander. The scenario depicts very rapid progress in AI capabilities, including the development of autonomous AI systems capable of recursive self-improvement. AI 2027 presents two alternative endings: one in which international competition over advanced AI leads to catastrophic loss of human control, and another in which coordinated global action slows down development and averts imminent disaster. The authors emphasize that the narratives are hypothetical and intended as planning tools rather than literal forecasts. == Reception == AI 2027 attracted attention from technology journalists and AI researchers. Some commentators praised the report for its level of detail and its usefulness as a strategic planning exercise, while others criticized the scenario as implausibly aggressive in its timelines. The report was cited in policy discussions about AI governance. U.S. Vice President JD Vance reportedly read AI 2027 and referenced its warnings in conversations about international AI coordination. More recent reporting noted that the authors of AI 2027 had publicly revised some of their timelines. According to Kokotajlo, developments since the report's original publication suggested a slower path toward fully autonomous AI research systems than initially forecasted.

Spark NLP

Spark NLP is an open-source text processing library for advanced natural language processing for the Python, Java and Scala programming languages. The library is built on top of Apache Spark and its Spark ML library. Its purpose is to provide an API for natural language processing pipelines that implement recent academic research results as production-grade, scalable, and trainable software. The library offers pre-trained neural network models, pipelines, and embeddings, as well as support for training custom models. == Features == The design of the library makes use of the concept of a pipeline which is an ordered set of text annotators. Out of the box annotators include, tokenizer, normalizer, stemming, lemmatizer, regular expression, TextMatcher, chunker, DateMatcher, SentenceDetector, DeepSentenceDetector, POS tagger, ViveknSentimentDetector, sentiment analysis, named entity recognition, conditional random field annotator, deep learning annotator, spell checking and correction, dependency parser, typed dependency parser, document classification, and language detection. The Models Hub is a platform for sharing open-source as well as licensed pre-trained models and pipelines. It includes pre-trained pipelines with tokenization, lemmatization, part-of-speech tagging, and named entity recognition that exist for more than thirteen languages; word embeddings including GloVe, ELMo, BERT, ALBERT, XLNet, Small BERT, and ELECTRA; sentence embeddings including Universal Sentence Embeddings (USE) and Language Agnostic BERT Sentence Embeddings (LaBSE). It also includes resources and pre-trained models for more than two hundred languages. Spark NLP base code includes support for East Asian languages such as tokenizers for Chinese, Japanese, Korean; for right-to-left languages such as Urdu, Farsi, Arabic, Hebrew and pre-trained multilingual word and sentence embeddings such as LaUSE and a translation annotator. == Usage in healthcare == Spark NLP for Healthcare is a commercial extension of Spark NLP for clinical and biomedical text mining. It provides healthcare-specific annotators, pipelines, models, and embeddings for clinical entity recognition, clinical entity linking, entity normalization, assertion status detection, de-identification, relation extraction, and spell checking and correction. The library offers access to several clinical and biomedical transformers: JSL-BERT-Clinical, BioBERT, ClinicalBERT, GloVe-Med, GloVe-ICD-O. It also includes over 50 pre-trained healthcare models, that can recognize the entities such as clinical, drugs, risk factors, anatomy, demographics, and sensitive data. == Spark OCR == Spark OCR is another commercial extension of Spark NLP for optical character recognition (OCR) from images, scanned PDF documents, and DICOM files. It is a software library built on top of Apache Spark. It provides several image pre-processing features for improving text recognition results such as adaptive thresholding and denoising, skew detection & correction, adaptive scaling, layout analysis and region detection, image cropping, removing background objects. Due to the tight coupling between Spark OCR and Spark NLP, users can combine NLP and OCR pipelines for tasks such as extracting text from images, extracting data from tables, recognizing and highlighting named entities in PDF documents or masking sensitive text in order to de-identify images. Several output formats are supported by Spark OCR such as PDF, images, or DICOM files with annotated or masked entities, digital text for downstream processing in Spark NLP or other libraries, structured data formats (JSON and CSV), as files or Spark data frames. Users can also distribute the OCR jobs across multiple nodes in a Spark cluster. == License and availability == Spark NLP is licensed under the Apache 2.0 license. The source code is publicly available on GitHub as well as documentation and a tutorial. Prebuilt versions of Spark NLP are available in PyPi and Anaconda Repository for Python development, in Maven Central for Java & Scala development, and in Spark Packages for Spark development. == Award == In March 2019, Spark NLP received Open Source Award for its contributions in natural language processing in Python, Java, and Scala.

The Outliner of Giants

The Outliner of Giants was commercial outlining software. Like other outliners, it allowed the user to create a document consisting of a series of nested lists. It was one of a number of browser-based outliners that are delivered as a web application, used through a web browser, rather than being installed as a stand-alone application. The Outliner of Giants was released in 2009. The service was shut down on December 31, 2017 and only exports are allowed at this time. == Feature set == Unlike most other browser-based outliners - which often focus on providing a minimum viable product - the Outliner of Giants had much of the functionality typically associated with a desktop outliner, such as the ability to use of columns to structure information. However, The Outliner of Giants did not support offline editing, requiring an active internet connection in order to make changes to an outline document. === Outlining === Like all outliners, The Outliner of Giants supported the creation of a hierarchy of items, with users modifying the parent-child relationship between items in order to structure a document. This included the ability to promote or demote items up or down the hierarchy, or move an item up or down a list of siblings on the same level. The Outliner of Giants did not support the true cloning of items (where an item can appear to be in multiple places within the hierarchy at the same time), although it did support the copying of single or multiple nodes. === Import === The Outliner of Giants could import both plain text and the OPML XML format, which is commonly used to transfer data between outlining applications. === Editing === Outline documents could be edited using a WYSIWYG editor, as well as the Markdown, and Textile markup languages. === Annotation === The Outliner of Giants supported functions to annotate an outline, such as the ability to add colored labels, highlights and text, as well as tags and hashtags. === Collaboration === The Outliner of Giants supported real-time collaboration, where multiple users could edit the same document, and can see the changes made by another user as they happened. === Publication === Outlines created through The Outliner of Giants could be published directly online through the service, either as outlines, pages or in a blog format. === Export === The Outliner of Giants can export outline data as plain text, HTML, as well as directly to the Google Docs word processor.

AlphaEvolve

AlphaEvolve is an evolutionary coding agent for designing advanced algorithms based on large language models such as Gemini. It was developed by Google DeepMind and unveiled in May 2025. == Design == AlphaEvolve aims to autonomously discover and refine algorithms through a combination of large language models (LLMs) and evolutionary computation. AlphaEvolve needs an evaluation function with metrics to optimize, and an initial algorithm. At each step, AlphaEvolve uses the LLM to produce variants of the existing algorithms, and then selects the most effective ones. Unlike domain-specific predecessors like AlphaFold or AlphaTensor, AlphaEvolve is designed as a general-purpose system. It can operate across a wide array of scientific and engineering tasks by automatically modifying code and optimizing for multiple objectives. Its architecture allows it to evaluate code programmatically, reducing reliance on human input and mitigating risks such as hallucinations common in standard LLM outputs. == Achievements == According to Google, across a selection of 50 open mathematical problems, the model was able to rediscover state-of-the-art solutions 75% of the time and discovered improved solutions 20% of the time, for example advancing the kissing number problem. AlphaEvolve was also used to optimize Google's computing ecosystem. Improved data center scheduling heuristics, enabled the recovery of 0.7% of stranded resources. It was also used to optimize TPU circuit design and Gemini's training matrix multiplication kernel. == Open source implementations == Following the publication of AlphaEvolve, several open source implementations have been developed by the research community. One such implementation is OpenEvolve, which implements distributed evolutionary algorithms, multi-language support, integration with various large language model providers, and automated discovery of high-performance GPU kernels that outperform expert-engineered baselines.