Improving SQL Join Algorithms for Distributed Systems: A Case Study of CXL-based Multi-Host Shared Memory

Bibliographic Details
Published in: IEEE MICRO, pp. 1-9
Main Authors: Jun, JaeYung; Ahn, HyunWoong; Lee, Joohee; Choi, Jungmin; Koh, Byungil; Moon, Donguk
Format: Journal Article
Language: English
Published: IEEE, 2025
ISSN: 0272-1732, 1937-4143
DOI: 10.1109/MM.2025.3574357

Summary: The advent of Compute Express Link (CXL) has introduced the possibility of multi-host shared memory architectures. Despite this advancement, there has been limited exploration of shared memory at the application layer. Traditional distributed systems typically partition data across multiple servers, enabling independent processing. However, cross-partition operations, such as joins, require data repartitioning, leading to significant communication overhead. To address this challenge, we propose Merge Hash Join (MHJ), a novel SQL join algorithm that leverages shared memory to eliminate the need for repartitioning. By storing the joining table in shared memory and making it directly accessible to all servers, MHJ significantly reduces communication overhead. To validate our approach, we implemented MHJ and the necessary shared memory functionalities on a CXL-based shared memory prototype. Extensive evaluations using the industry-standard TPC-DS benchmark demonstrate that MHJ achieves up to 1.5× performance improvement compared to conventional join algorithms.
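
Illustrative note: the summary does not describe MHJ's internals, so the following is not the authors' algorithm. It is only a minimal Python sketch of the general contrast the summary draws, under stated assumptions: servers are modeled as lists of (key, value) rows, a plain dictionary stands in for a table held in CXL shared memory, and the `rows_moved` counter models network shuffle traffic only (it ignores the cost of writing into shared memory). The function names `repartition_join` and `shared_table_join` are hypothetical.

```python
from collections import defaultdict


def repartition_join(build_parts, probe_parts, n_servers):
    """Conventional sketch: shuffle both tables by join key, then join locally.

    Returns (sorted join rows, number of rows sent to a different server).
    """
    build_shuffled = [[] for _ in range(n_servers)]
    probe_shuffled = [[] for _ in range(n_servers)]
    rows_moved = 0
    for parts, shuffled in ((build_parts, build_shuffled),
                            (probe_parts, probe_shuffled)):
        for src, part in enumerate(parts):
            for key, val in part:
                dst = hash(key) % n_servers  # target server for this key
                shuffled[dst].append((key, val))
                if dst != src:
                    rows_moved += 1  # row crosses the network
    out = []
    for build, probe in zip(build_shuffled, probe_shuffled):
        table = defaultdict(list)  # per-server local hash table
        for key, val in build:
            table[key].append(val)
        for key, val in probe:
            out.extend((key, bval, val) for bval in table.get(key, []))
    return sorted(out), rows_moved


def shared_table_join(build_parts, probe_parts):
    """Shared-memory sketch: one hash table visible to every server.

    ASSUMPTION: the dict below stands in for a table in shared memory;
    no rows are shuffled between servers, so rows_moved is always 0.
    """
    shared = defaultdict(list)
    for part in build_parts:
        for key, val in part:
            shared[key].append(val)
    out = []
    for part in probe_parts:  # each server probes with its own local rows
        for key, val in part:
            out.extend((key, bval, val) for bval in shared.get(key, []))
    return sorted(out), 0


# Hypothetical data: two servers, each holding one partition of each table.
build_parts = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")]]
probe_parts = [[(3, "x"), (2, "y")], [(1, "z"), (5, "w")]]

conventional_rows, conventional_moved = repartition_join(build_parts, probe_parts, 2)
shared_rows, shared_moved = shared_table_join(build_parts, probe_parts)
```

In this toy setup both functions return the same join rows, while only the repartitioning variant moves rows between servers. Real costs on CXL hardware depend on factors the sketch leaves out, such as shared-memory access latency and coherence between hosts.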