Data Logistics Service in eFlows4HPC

Modern scientific endeavors often require complex, data-intensive workflows leveraging distributed and heterogeneous computing and data resources. Such workflows often include multiple steps of classical simulations, but increasingly also ML and AI components. As a result, they use not only HPC, but...

Bibliographic Details
Published in 2024 47th MIPRO ICT and Electronics Convention (MIPRO), pp. 892-897
Main Authors Rybicki, Jedrzej; Bottcher, Christian
Format Conference Proceeding
Language English
Published IEEE 20.05.2024
ISSN 2623-8764
DOI 10.1109/MIPRO60963.2024.10569664

Abstract Modern scientific endeavors often require complex, data-intensive workflows leveraging distributed and heterogeneous computing and data resources. Such workflows often include multiple steps of classical simulations, but increasingly also ML and AI components. As a result, they use not only HPC, but also Cloud-like resources. Efficient and user-friendly execution and management of such workflows pose many challenges. In this paper, we share our experience in implementing three such workflows in the eFlows4HPC project. We focus, however, on the data management dimension of the workflows: how to ensure the timely availability of the required data, how to move data to and from compute resources, and how to make the workflows complete and portable. To this end, we implemented the Data Logistics Service, integrated it with the workflow execution engine, and defined multiple data movement pipelines to cater for specific scientific needs. We will share our experience from the implementation and operation of the service. This will include building a solution for continuous deployment and access management in a federated environment. On a more abstract level, we also explore how the presented approach fits into the vision of the FAIR paradigm.
Author Bottcher, Christian
Rybicki, Jedrzej
Author_xml – sequence: 1
  givenname: Jedrzej
  surname: Rybicki
  fullname: Rybicki, Jedrzej
  email: j.rybicki@fz-juelich.de
  organization: Juelich Supercomputing Center, Juelich, Germany
– sequence: 2
  givenname: Christian
  surname: Bottcher
  fullname: Bottcher, Christian
  email: c.boettcher@fz-juelich.de
  organization: Juelich Supercomputing Center, Juelich, Germany
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/MIPRO60963.2024.10569664
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798350382501
9798350382495
EISSN 2623-8764
EndPage 897
ExternalDocumentID 10569664
Genre orig-research
GrantInformation_xml – fundername: Ministry of Education
  funderid: 10.13039/501100002701
GroupedDBID 6IE
6IL
ALMA_UNASSIGNED_HOLDINGS
CBEJK
M43
RIE
RIL
ID FETCH-LOGICAL-i119t-100f31c9c1cabc0c1df10e014f3df51b9b9f3382c555302826cad8949c529cb63
IEDL.DBID RIE
IngestDate Wed Aug 27 02:06:46 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i119t-100f31c9c1cabc0c1df10e014f3df51b9b9f3382c555302826cad8949c529cb63
PageCount 6
ParticipantIDs ieee_primary_10569664
PublicationCentury 2000
PublicationDate 2024-May-20
PublicationDateYYYYMMDD 2024-05-20
PublicationDate_xml – month: 05
  year: 2024
  text: 2024-May-20
  day: 20
PublicationDecade 2020
PublicationTitle 2024 47th MIPRO ICT and Electronics Convention (MIPRO)
PublicationTitleAbbrev MIPRO
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssib042470096
Score 1.8837678
Snippet Modern scientific endeavors often require complex, data-intensive workflows leveraging distributed and heterogeneous computing and data resources. Such...
SourceID ieee
SourceType Publisher
StartPage 892
SubjectTerms Buildings
Cloud
Computational modeling
Data transfer
Distributed Data
Distributed databases
Full stack
High Performance Computing
Pipelines
Reproducibility of results
Title Data Logistics Service in eFlows4HPC
URI https://ieeexplore.ieee.org/document/10569664
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE