Improving SQL Join Algorithms for Distributed Systems: A Case Study of CXL-based Multi-Host Shared Memory
| Published in | IEEE MICRO pp. 1 - 9 |
|---|---|
| Main Authors | , , , , , |
| Format | Journal Article |
| Language | English |
| Published | IEEE, 2025 |
| ISSN | 0272-1732 1937-4143 |
| DOI | 10.1109/MM.2025.3574357 |
| Summary: | The advent of Compute Express Link (CXL) has introduced the possibility of multi-host shared memory architectures. Despite this advancement, there has been limited exploration of shared memory at the application layer. Traditional distributed systems typically partition data across multiple servers, enabling independent processing. However, cross-partition operations, such as joins, require data repartitioning, leading to significant communication overhead. To address this challenge, we propose Merge Hash Join (MHJ), a novel SQL join algorithm that leverages shared memory to eliminate the need for repartitioning. By storing the joining table in shared memory and making it directly accessible to all servers, MHJ significantly reduces communication overhead. To validate our approach, we implemented MHJ and the necessary shared memory functionalities on a CXL-based shared memory prototype. Extensive evaluations using the industry-standard TPC-DS benchmark demonstrate that MHJ achieves up to 1.5× performance improvement compared to conventional join algorithms. |
|---|---|
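
The contrast drawn in the summary, repartitioning versus probing a table held in shared memory, can be illustrated with a toy model. The sketch below is not the paper's MHJ implementation, which this record does not describe. It is a single-process simulation under stated assumptions: each "server" is a Python list of `(key, payload)` rows, an ordinary dict stands in for CXL shared memory, and the function names (`shuffle`, `repartition_join`, `shared_memory_join`) are hypothetical.

```python
from collections import defaultdict


def shuffle(parts):
    """Repartition rows by join key; count rows that change server."""
    n = len(parts)
    dest = [[] for _ in range(n)]
    moved = 0
    for src, rows in enumerate(parts):
        for row in rows:
            dst = row[0] % n  # integer join key is in column 0
            if dst != src:
                moved += 1
            dest[dst].append(row)
    return dest, moved


def hash_join(build_rows, probe_rows):
    """Plain in-memory hash join on the first column."""
    table = defaultdict(list)
    for key, payload in build_rows:
        table[key].append(payload)
    return [(key, probe_payload, build_payload)
            for key, probe_payload in probe_rows
            for build_payload in table.get(key, [])]


def repartition_join(fact_parts, dim_parts):
    """Conventional approach: shuffle both tables, then join per server."""
    fact_dest, moved_fact = shuffle(fact_parts)
    dim_dest, moved_dim = shuffle(dim_parts)
    out = []
    for fact_rows, dim_rows in zip(fact_dest, dim_dest):
        out.extend(hash_join(dim_rows, fact_rows))
    return sorted(out), moved_fact + moved_dim


def shared_memory_join(fact_parts, dim_parts):
    """Assumed shared-memory approach: one table visible to every server."""
    shared = defaultdict(list)  # stand-in for a table in shared memory
    for rows in dim_parts:
        for key, payload in rows:
            shared[key].append(payload)
    out = []
    for rows in fact_parts:  # each server probes with its local rows only
        for key, fact_payload in rows:
            for dim_payload in shared.get(key, []):
                out.append((key, fact_payload, dim_payload))
    # No rows are exchanged between servers; the cost of writing the
    # table into shared memory is deliberately not modelled here.
    return sorted(out), 0


fact_parts = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")]]
dim_parts = [[(1, "x")], [(2, "y")]]
print(repartition_join(fact_parts, dim_parts))
print(shared_memory_join(fact_parts, dim_parts))
```

Both functions return the same joined rows; only the count of rows moved between servers differs. The model ignores shared-memory access latency, coherence, and concurrency, all of which matter in a real CXL system.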