Existing Workflow Solutions: Analyzing Jupyter, CWL, Galaxy, and FMI for Reproducibility

2 Jun 2025

Authors:

(1) Pavan L. Veluvali, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstr. 1, 39106 Magdeburg;

(2) Jan Heiland, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstr. 1, 39106 Magdeburg;

(3) Peter Benner, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstr. 1, 39106 Magdeburg.

Abstract and 1. Introduction

Existing Solutions

2. MaRDIFlow

Minimum working examples

Spinodal decomposition in a binary A-B alloy

Summary and Outlook, Acknowledgments, Data Availability, and References

Existing Solutions

Reproducibility poses a multifaceted challenge which often demands a comprehensive examination. In this work, we focus on the specific aspect of ensuring reproducibility of computational workflows through integrated software solutions.

Because of the popularity and universality of Jupyter notebooks, we begin by highlighting the capabilities of the Jupyter environment, which significantly boost productivity in computational science and mathematics while also promoting reproducibility [BTK+21]. In general, Jupyter notebooks [KRKP+16b] are accessed through a modern web browser and are typically designed to support interactive exploration and the publication of records of a scientific computation. Through text and code blocks, a notebook performs a specific computation and explains it in detail. The code within a given notebook is organized into cells, which users can modify and execute individually. The output of each cell appears directly below it and is stored as part of the document. This approach facilitates a combined display of code, data, and model descriptions and naturally ensures replicability of the experiments [FHHS16]. In this respect, the side-by-side appearance of text (for documentation) and code (for execution) is in line with the redundant or multi-layered representation of workflows that we want to achieve, but it adds to the complexity of such a notebook realization.

As for reproducibility in the strict sense, namely that an experiment can be reproduced solely from the information provided in the notebook, certain design decisions in Jupyter notebooks may stand in the way, such as undocumented versions of the imported libraries, let alone of the underlying libraries in the backend. From a more practical perspective, with their linear design of consecutive cells, Jupyter notebooks are not well suited to handling larger projects. And although calling third-party code is certainly possible through direct Julia/Python/R interfaces or through the system’s shell, the embedding of external tools is not a primary and, thus, not a well-defined feature of Jupyter notebooks. Finally, the reproducibility of a workflow in a Jupyter notebook hinges on the code only, whereas the description is commonly seen as an add-on. Thus, there is no built-in mechanism that ensures that the documentation is complete enough to function as an equivalent or fully valid complement to the code base. In fact, recent studies have found that a Jupyter notebook per se is not a strong guarantee of reproducibility; see Refs. [SM24, PMBF21].
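To illustrate the point about undocumented library versions, the following is a minimal sketch of how a notebook cell could record the versions it was run with. It is not part of the original work; the function name `environment_snapshot` and the package list are hypothetical, and only the Python standard library is assumed.

```python
import importlib.metadata as md
import json
import platform
import sys


def environment_snapshot(packages):
    """Collect interpreter, platform, and installed versions of the named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = md.version(name)
        except md.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical package list; replace it with what the notebook imports.
    print(json.dumps(environment_snapshot(["numpy", "scipy"]), indent=2))
```

Storing such a snapshot next to the notebook output documents the direct imports only. It does not capture compiled backend libraries, which is the harder part of the problem described above.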

Whereas Jupyter notebooks have become a popular tool for implementing, documenting, and publishing workflows and experiments of moderate complexity, several more advanced tools in computational science and engineering address the challenge of managing workflows composed of various simulation codes. Domain-specific workflow managers, such as CWL [CAI+22a] and Galaxy [Com22], are applicable to high-performance computing in general. CWL [CAI+22a] is widely known for the description of command-line tools and of the workflows made from these tools. It includes many features, such as software containers, resource requirements, and workflow-level conditional branching. While most of the CWL development began in the field of bioinformatics, the CWL standards have been used since 2016 in other fields, including hydrology, radio astronomy, geospatial analysis, and high-energy physics. Galaxy [Com22], on the other hand, serves as an accessible browser-based platform for scientific computing. It facilitates data sharing, analysis, and visualization for scientists with minimal technical barriers. In addition, integration with third-party tools like noWorkflow [PMBF17] allows for effective tracking of provenance, elucidating the relationship between inputs, code, and generated files.
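The notion of provenance as a relationship between inputs, code, and generated files can be sketched in a few lines. The example below is an illustration under stated assumptions, not the interface of noWorkflow or Galaxy: the function names and the record layout are hypothetical, and files are identified by their SHA-256 digests.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def provenance_record(inputs, code, outputs):
    """Map every input, code, and output file to a content digest."""
    def digest_all(paths):
        return {str(p): sha256_of(p) for p in paths}

    return {
        "inputs": digest_all(inputs),
        "code": digest_all(code),
        "outputs": digest_all(outputs),
    }


if __name__ == "__main__":
    # Hypothetical files, created in a temporary directory for the demo only.
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "input.csv"
        script = Path(tmp) / "run.py"
        result = Path(tmp) / "result.txt"
        data.write_text("x,y\n1,2\n")
        script.write_text("print('hypothetical analysis step')\n")
        result.write_text("3\n")
        record = provenance_record([data], [script], [result])
        print(json.dumps(record, indent=2))
```

A record of this kind makes it possible to check later whether a generated file still corresponds to the inputs and code that are claimed to have produced it.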

A container-based approach to modelling workflow components has been followed in the development of the Functional Mock-up Interface (FMI) [BOA+11]. Here, arbitrary simulation models are encapsulated in a complete virtual computational environment (a container) and made accessible through interfaces. This facilitates an easy exchange of simulation tools (even without disclosing the source) and can be integrated into workflow designs, as specified in the FMI using XML syntax.

Most of the aforementioned systems have improved the reproducibility of computational workflows in recent years and have become de facto standards for the syntactic interoperability of workflow management systems. However, each of the systems discussed comes with its own limitations. For instance, the CWL syntax often fails to address user-defined construction of, and interaction with, other command-line tools once its execution is finished [CAI+22b]. Likewise, one of the disadvantages of document-based workflow definitions is their static nature, as the exact flow of the workflow must be known before execution. This mechanism inherently constrains the programming structures that can be used, generally confining the options to directed acyclic graphs (DAGs), or to directed cyclic graphs (DCGs) in cases where loops are accommodated by the markup language [UHY+21]. Moreover, all the aforementioned frameworks store the data produced, but none focuses on explicitly recording provenance in detail; in particular, the workflow components are not expressed in a multi-level framework or as abstract objects. Furthermore, customizing a system for specific requirements may present a significant challenge for end users, requiring additional effort and resources.
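The static nature of a DAG-based workflow definition can be made concrete with a small sketch. It assumes a hypothetical three-step workflow and uses `graphlib` from the Python standard library (Python 3.9 or later); it does not reproduce how CWL or any other of the discussed systems schedules its steps.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return a valid execution order for steps mapped to their prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical workflow: each step maps to the set of steps it depends on.
workflow = {
    "mesh": set(),
    "simulate": {"mesh"},
    "postprocess": {"simulate"},
}

# A loop (for example, iterating until convergence) cannot be expressed as a DAG.
workflow_with_loop = {
    "simulate": {"check"},
    "check": {"simulate"},
}

if __name__ == "__main__":
    print(execution_order(workflow))
    try:
        execution_order(workflow_with_loop)
    except CycleError:
        print("cyclic dependency: the order cannot be fixed before execution")
```

The complete order is computed before any step runs, which is the constraint described above: a flow that depends on intermediate results has to be unrolled in advance or handled by a language that supports cycles.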

Container-based implementations, as in FMI, can mitigate the issues with data provenance or with setting up the environment as far as mere replication is concerned. However, with a view to reproducibility in different environments, or to the reusability and adaptation of workflow components, these containers would need to be equipped with just the same metadata that any other workflow design tool would require.

Given the aforementioned challenges, it becomes evident that the existing research landscape lacks a comprehensive workflow description tailored to handling mathematical data effectively within a multi-component framework. The present study aims to bridge this gap.

This paper is available on arXiv under the CC BY 4.0 DEED license.