Closing the HPC-Cloud Convergence Gap: Multi-Tenant Slingshot RDMA for Kubernetes
| Published in | Proceedings / IEEE International Conference on Cluster Computing pp. 1 - 10 |
|---|---|
| Main Authors | Friese, Philipp A.; Eleliemy, Ahmed; Haus, Utz-Uwe; Schulz, Martin |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 02.09.2025 |
| ISSN | 2168-9253 |
| DOI | 10.1109/CLUSTER59342.2025.11186471 |
| Abstract | Converged HPC-Cloud computing is an emerging computing paradigm that aims to support increasingly complex and multi-tenant scientific workflows. These systems require reconciliation of the isolation requirements of native cloud workloads and the performance demands of HPC applications. In this context, networking hardware is a critical boundary component: it is the conduit for high-throughput, low-latency communication and enables isolation across tenants. HPE Slingshot is a high-speed network interconnect that provides up to 200 Gbps of throughput per port and targets high-performance computing (HPC) systems. The Slingshot host software, including hardware drivers and network middleware libraries, is designed to meet HPC deployments, which predominantly use single-tenant access modes. Hence, the Slingshot stack is not suited for secure use in multi-tenant deployments, such as converged HPC-Cloud deployments. In this paper, we design and implement an extension to the Slingshot stack targeting converged deployments on the basis of Kubernetes. Our integration provides secure, container-granular, and multi-tenant access to Slingshot RDMA networking capabilities at minimal overhead. |
| Author | Haus, Utz-Uwe; Schulz, Martin; Eleliemy, Ahmed; Friese, Philipp A. |
| Author_xml | – sequence: 1 givenname: Philipp A. surname: Friese fullname: Friese, Philipp A. email: philipp.friese@cit.tum.de organization: Technical University of Munich, Garching, Germany – sequence: 2 givenname: Ahmed surname: Eleliemy fullname: Eleliemy, Ahmed email: ahmed.eleliemy@hpe.com organization: HPE HPC/AI EMEA Research Lab, Basel, BS, Switzerland – sequence: 3 givenname: Utz-Uwe surname: Haus fullname: Haus, Utz-Uwe email: uhaus@hpe.com organization: HPE HPC/AI EMEA Research Lab, Basel, BS, Switzerland – sequence: 4 givenname: Martin surname: Schulz fullname: Schulz, Martin email: schulzm@in.tum.de organization: Technical University of Munich, Garching, Germany |
| ContentType | Conference Proceeding |
| DOI | 10.1109/CLUSTER59342.2025.11186471 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
| Discipline | Computer Science |
| EISBN | 9798331530198 |
| EISSN | 2168-9253 |
| EndPage | 10 |
| ExternalDocumentID | 11186471 |
| Genre | orig-research |
| IngestDate | Wed Oct 15 14:21:20 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PublicationCentury | 2000 |
| PublicationDate | 2025-Sept.-2 |
| PublicationDateYYYYMMDD | 2025-09-02 |
| PublicationDate_xml | – month: 09 year: 2025 text: 2025-Sept.-2 day: 02 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings / IEEE International Conference on Cluster Computing |
| PublicationTitleAbbrev | CLUSTER |
| PublicationYear | 2025 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Converged HPC-Cloud; Hardware; High performance computing; High-speed networks; HPE Slingshot; Kubernetes; Libraries; Low latency communication; Middleware; RDMA; Software measurement; Source coding; Systematics; Throughput |
| Title | Closing the HPC-Cloud Convergence Gap: Multi-Tenant Slingshot RDMA for Kubernetes |
| URI | https://ieeexplore.ieee.org/document/11186471 |