HitGNN: High-Throughput GNN Training Framework on CPU+Multi-FPGA Heterogeneous Platform
| Published in | IEEE Transactions on Parallel and Distributed Systems, Vol. 35, No. 5, pp. 707–719 |
|---|---|
| Main Authors | Yi-Chien Lin; Bingyi Zhang; Viktor K. Prasanna |
| Format | Journal Article |
| Language | English |
| Published | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.05.2024 |
| ISSN | 1045-9219 |
| EISSN | 1558-2183 |
| DOI | 10.1109/TPDS.2024.3371332 |
| Abstract | As the size of real-world graphs increases, training Graph Neural Networks (GNNs) has become time-consuming and requires acceleration. While previous works have demonstrated the potential of utilizing FPGA for accelerating GNN training, few works have been carried out to accelerate GNN training with multiple FPGAs due to the necessity of hardware expertise and substantial development effort. To this end, we propose HitGNN, a framework that enables users to effortlessly map GNN training workloads onto a CPU+Multi-FPGA platform for acceleration. In particular, HitGNN takes the user-defined synchronous GNN training algorithm, GNN model, and platform metadata as input, determines the design parameters based on the platform metadata, and performs hardware mapping onto the CPU+Multi-FPGA platform, automatically. HitGNN consists of the following building blocks: (1) high-level application programming interfaces (APIs) that allow users to specify various synchronous GNN training algorithms and GNN models with only a handful of lines of code; (2) a software generator that generates a host program that performs mini-batch sampling, manages CPU-FPGA communication, and handles workload balancing among the FPGAs; (3) an accelerator generator that generates GNN kernels with optimized datapath and memory organization. We show that existing synchronous GNN training algorithms such as DistDGL and PaGraph can be easily deployed on a CPU+Multi-FPGA platform using our framework, while achieving high training throughput. Compared with the state-of-the-art frameworks that accelerate synchronous GNN training on a multi-GPU platform, HitGNN achieves up to 27.21× bandwidth efficiency, and up to 4.26× speedup using much less compute power and memory bandwidth than GPUs. In addition, HitGNN demonstrates good scalability to 16 FPGAs on a CPU+Multi-FPGA platform. | 
    
|---|---|
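The abstract says the generated host program performs mini-batch sampling on the CPU before dispatching work to the FPGAs. As an illustration only, the sketch below shows what fixed-fanout neighbor sampling for one mini-batch looks like in plain Python; the function name, signatures, and toy graph are hypothetical and are not HitGNN's actual API, which this record does not show.

```python
# Hypothetical sketch of mini-batch neighbor sampling, the CPU-side step the
# abstract attributes to HitGNN's generated host program. All names here are
# illustrative; HitGNN's real APIs are not reproduced in this record.
import random


def sample_neighbors(adj, seeds, fanout, rng=None):
    """For each seed vertex, keep at most `fanout` randomly chosen neighbors.

    adj    : dict mapping vertex -> list of neighbor vertices
    seeds  : target vertices of the mini-batch
    fanout : maximum neighbors retained per seed
    Returns a dict seed -> sampled neighbor list.
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    batch = {}
    for v in seeds:
        nbrs = adj.get(v, [])
        # Low-degree vertices keep all neighbors; high-degree ones are
        # down-sampled, which bounds per-batch work on the accelerators.
        batch[v] = list(nbrs) if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
    return batch


# Toy undirected-ish adjacency for demonstration.
adj = {0: [1, 2, 3, 4], 1: [2], 2: [0, 3], 3: [], 4: [0]}
mini_batch = sample_neighbors(adj, seeds=[0, 3], fanout=2)
assert len(mini_batch[0]) == 2  # high-degree seed is down-sampled to fanout
assert mini_batch[3] == []      # isolated seed keeps an empty neighbor list
```

In a multi-FPGA setting such as the one the abstract describes, the host would then partition these sampled subgraphs across accelerators, which is where the workload-balancing logic mentioned in building block (2) would apply.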
| Authors | Yi-Chien Lin (yichienl@usc.edu, ORCID 0000-0002-1710-1532); Bingyi Zhang (bingyizh@usc.edu, ORCID 0000-0002-8115-0814); Viktor K. Prasanna (prasanna@usc.edu, ORCID 0000-0002-1609-8589), all of the Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA, USA |
|---|---|
    
| CODEN | ITDSEO | 
    
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 | 
    
| Genre | orig-research | 
    
| Funding | National Science Foundation (grants CCF-1919289/SPX-2333009, CNS-2009057, OAC-2209563); Semiconductor Research Corporation |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037  | 
    
| Subjects | Algorithms; Application programming interface; Bandwidths; Central processing units; Computational modeling; CPU+Multi-FPGA; CPUs; Design parameters; Field programmable gate arrays; Graph neural network; Graph neural networks; Graphics processing units; Hardware; Hardware acceleration; Memory management; Metadata; Partitioning algorithms; Training; Vectors; Workload |
| URI | https://ieeexplore.ieee.org/document/10452846 https://www.proquest.com/docview/2969048109  | 
    