Data Logistics Service in eFlows4HPC
| Published in | 2024 47th MIPRO ICT and Electronics Convention (MIPRO), pp. 892–897 |
|---|---|
| Format | Conference Proceeding |
| Language | English |
| Publisher | IEEE, 20.05.2024 |
| ISSN | 2623-8764 |
| DOI | 10.1109/MIPRO60963.2024.10569664 |
Summary: Modern scientific endeavors often require complex, data-intensive workflows leveraging distributed and heterogeneous computing and data resources. Such workflows often include multiple steps of classical simulations, but increasingly also ML and AI components. As a result, they use not only HPC, but also Cloud-like resources. Efficient and user-friendly execution and management of such workflows pose many challenges. In this paper, we share our experience in implementing three such workflows in the eFlows4HPC project. We focus, however, on the data management dimension of the workflows: how to ensure the timely availability of the required data, how to move data to and from compute resources, and how to make the workflows complete and portable. To this end, we implemented the Data Logistics Service, integrated it with the workflow execution engine, and defined multiple data movement pipelines to cater for specific scientific needs. We share our experience from the implementation and operation of the service, including building a solution for continuous deployment and access management in a federated environment. On a more abstract level, we also explore how the presented approach fits into the vision of the FAIR paradigm.
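The summary describes data movement pipelines that stage data to and from compute resources. As a purely illustrative sketch (not the paper's actual Data Logistics Service; the function names and layout here are hypothetical), a minimal stage-in step that copies a dataset into a compute resource's staging area and verifies its integrity might look like:

```python
import hashlib
import shutil
from pathlib import Path


def sha256sum(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_in(source: Path, staging_dir: Path) -> Path:
    """Copy a dataset file into the staging area of a compute resource
    and verify the transfer by comparing source and target checksums."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    target = staging_dir / source.name
    shutil.copy2(source, target)
    if sha256sum(source) != sha256sum(target):
        raise IOError(f"checksum mismatch while staging {source}")
    return target
```

A real pipeline of this kind would chain several such steps (transfer, verification, registration of metadata) and would typically be orchestrated by a workflow engine rather than called directly; the sketch only shows the integrity-checked transfer step.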