Improving SQL Join Algorithms for Distributed Systems: A Case Study of CXL-based Multi-Host Shared Memory
| Published in | IEEE MICRO, pp. 1-9 |
|---|---|
| Main Authors | Jun, JaeYung; Ahn, HyunWoong; Lee, Joohee; Choi, Jungmin; Koh, Byungil; Moon, Donguk |
| Format | Journal Article |
| Language | English |
| Published | IEEE, 2025 |
| Online Access | https://ieeexplore.ieee.org/document/11018275 |
| ISSN | 0272-1732 (print); 1937-4143 (electronic) |
| DOI | 10.1109/MM.2025.3574357 |
| Abstract | The advent of Compute Express Link (CXL) has introduced the possibility of multi-host shared memory architectures. Despite this advancement, there has been limited exploration of shared memory at the application layer. Traditional distributed systems typically partition data across multiple servers, enabling independent processing. However, cross-partition operations, such as joins, require data repartitioning, leading to significant communication overhead. To address this challenge, we propose Merge Hash Join (MHJ), a novel SQL join algorithm that leverages shared memory to eliminate the need for repartitioning. By storing the joining table in shared memory and making it directly accessible to all servers, MHJ significantly reduces communication overhead. To validate our approach, we implemented MHJ and the necessary shared memory functionalities on a CXL-based shared memory prototype. Extensive evaluations using the industry-standard TPC-DS benchmark demonstrate that MHJ achieves up to 1.5× performance improvement compared to conventional join algorithms. | 
    
|---|---|
| Author | Choi, Jungmin; Moon, Donguk; Ahn, HyunWoong; Lee, Joohee; Jun, JaeYung; Koh, Byungil |
    
| Author_xml | 1. JaeYung Jun (jaeyung.jun@sk.com), SK hynix Inc., Icheon, South Korea; 2. HyunWoong Ahn (hyungwoong.ahn@sk.com), SK hynix Inc., Icheon, South Korea; 3. Joohee Lee (joohee.lee@sk.com), SK hynix Inc., Icheon, South Korea; 4. Jungmin Choi (jungmin.choi@sk.com), SK hynix Inc., Icheon, South Korea; 5. Byungil Koh (byungil.koh@sk.com), SK hynix Inc., Icheon, South Korea; 6. Donguk Moon (donguk.moon@sk.com), SK hynix Inc., Icheon, South Korea |
    
| CODEN | IEMIDZ | 
    
| ContentType | Journal Article | 
    
| Discipline | Computer Science | 
    
| EISSN | 1937-4143 | 
    
| EndPage | 9 | 
    
| Genre | orig-research | 
    
| IsPeerReviewed | true | 
    
| IsScholarly | true | 
    
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037  | 
    
| PageCount | 9 | 
    
| PublicationCentury | 2000 | 
    
| PublicationDate | 2025-00-00 | 
    
| PublicationDateYYYYMMDD | 2025-01-01 | 
    
| PublicationDate_xml | – year: 2025 text: 2025-00-00  | 
    
| PublicationDecade | 2020 | 
    
| PublicationTitle | IEEE MICRO | 
    
| PublicationTitleAbbrev | MM | 
    
| PublicationYear | 2025 | 
    
| Publisher | IEEE | 
    
| Publisher_xml | – name: IEEE | 
    
| StartPage | 1 | 
    
| SubjectTerms | Buildings; Clustering algorithms; Distributed databases; Hash functions; Partitioning algorithms; Random access memory; Servers; Sparks; Structured Query Language; Training |
    
| URI | https://ieeexplore.ieee.org/document/11018275 | 
    