Improving SQL Join Algorithms for Distributed Systems: A Case Study of CXL-based Multi-Host Shared Memory

Bibliographic Details
Published in IEEE MICRO, pp. 1-9
Main Authors Jun, JaeYung; Ahn, HyunWoong; Lee, Joohee; Choi, Jungmin; Koh, Byungil; Moon, Donguk
Format Journal Article
Language English
Published IEEE 2025
ISSN 0272-1732
EISSN 1937-4143
DOI 10.1109/MM.2025.3574357

Abstract The advent of Compute Express Link (CXL) has introduced the possibility of multi-host shared memory architectures. Despite this advancement, there has been limited exploration of shared memory at the application layer. Traditional distributed systems typically partition data across multiple servers, enabling independent processing. However, cross-partition operations, such as joins, require data repartitioning, leading to significant communication overhead. To address this challenge, we propose Merge Hash Join (MHJ), a novel SQL join algorithm that leverages shared memory to eliminate the need for repartitioning. By storing the joining table in shared memory and making it directly accessible to all servers, MHJ significantly reduces communication overhead. To validate our approach, we implemented MHJ and the necessary shared memory functionalities on a CXL-based shared memory prototype. Extensive evaluations using the industry-standard TPC-DS benchmark demonstrate that MHJ achieves up to 1.5× performance improvement compared to conventional join algorithms.
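The contrast drawn in the abstract, repartitioning versus probing a table held in shared memory, can be illustrated with a toy model. This is only a sketch and not the paper's MHJ implementation, whose details are not given in this record. The function names, the row-movement counter, and the use of an ordinary Python dict as a stand-in for CXL shared memory are all illustrative assumptions.

```python
from collections import defaultdict


def repartition_join(fact_parts, dim_parts, num_servers):
    """Toy conventional join: rows of both tables are re-hashed on the join key.

    Returns the joined rows and the number of rows that had to leave the
    server they started on (a crude proxy for communication overhead).
    """
    moved = 0
    fact_buckets = [[] for _ in range(num_servers)]
    dim_buckets = [[] for _ in range(num_servers)]
    for parts, buckets in ((fact_parts, fact_buckets), (dim_parts, dim_buckets)):
        for src, part in enumerate(parts):
            for key, val in part:
                dst = hash(key) % num_servers
                if dst != src:
                    moved += 1  # row crosses servers during the shuffle
                buckets[dst].append((key, val))

    results = []
    for server in range(num_servers):
        table = defaultdict(list)  # per-server hash table on the join key
        for key, val in dim_buckets[server]:
            table[key].append(val)
        for key, val in fact_buckets[server]:
            for dim_val in table.get(key, ()):
                results.append((key, val, dim_val))
    return sorted(results), moved


def shared_table_join(fact_parts, dim_parts):
    """Toy shared-memory join: one hash table is visible to every server.

    The dict below is an assumed stand-in for a table placed in shared
    memory; each server probes it with its local rows, so no fact rows move.
    """
    shared_table = defaultdict(list)
    for part in dim_parts:
        for key, val in part:
            shared_table[key].append(val)

    results = []
    for part in fact_parts:  # each server scans only its own partition
        for key, val in part:
            for dim_val in shared_table.get(key, ()):
                results.append((key, val, dim_val))
    return sorted(results), 0


# Two servers, each holding one partition of each table (made-up rows).
fact_parts = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")]]
dim_parts = [[(2, "x")], [(1, "y"), (3, "z")]]

shuffled_rows, shuffled_moves = repartition_join(fact_parts, dim_parts, 2)
shared_rows, shared_moves = shared_table_join(fact_parts, dim_parts)
```

Both functions return the same joined rows; only the first counts rows that crossed servers. The model ignores the cost of writing to and reading from shared memory, so it says nothing about the speedup reported in the abstract.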
Author_xml – sequence: 1
  givenname: JaeYung
  surname: Jun
  fullname: Jun, JaeYung
  email: jaeyung.jun@sk.com
  organization: SK hynix Inc., Icheon, South Korea
– sequence: 2
  givenname: HyunWoong
  surname: Ahn
  fullname: Ahn, HyunWoong
  email: hyungwoong.ahn@sk.com
  organization: SK hynix Inc., Icheon, South Korea
– sequence: 3
  givenname: Joohee
  surname: Lee
  fullname: Lee, Joohee
  email: joohee.lee@sk.com
  organization: SK hynix Inc., Icheon, South Korea
– sequence: 4
  givenname: Jungmin
  surname: Choi
  fullname: Choi, Jungmin
  email: jungmin.choi@sk.com
  organization: SK hynix Inc., Icheon, South Korea
– sequence: 5
  givenname: Byungil
  surname: Koh
  fullname: Koh, Byungil
  email: byungil.koh@sk.com
  organization: SK hynix Inc., Icheon, South Korea
– sequence: 6
  givenname: Donguk
  surname: Moon
  fullname: Moon, Donguk
  email: donguk.moon@sk.com
  organization: SK hynix Inc., Icheon, South Korea
CODEN IEMIDZ
Discipline Computer Science
Genre orig-research
IsPeerReviewed true
IsScholarly true
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
SubjectTerms Buildings
Clustering algorithms
Distributed databases
Hash functions
Partitioning algorithms
Random access memory
Servers
Sparks
Structured Query Language
Training
Title Improving SQL Join Algorithms for Distributed Systems: A Case Study of CXL-based Multi-Host Shared Memory
URI https://ieeexplore.ieee.org/document/11018275