HitGNN: High-Throughput GNN Training Framework on CPU+Multi-FPGA Heterogeneous Platform

Bibliographic Details
Published in IEEE Transactions on Parallel and Distributed Systems, Vol. 35, No. 5, pp. 707-719
Main Authors Lin, Yi-Chien; Zhang, Bingyi; Prasanna, Viktor K.
Format Journal Article
Language English
Published New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.05.2024
ISSN 1045-9219
EISSN 1558-2183
DOI 10.1109/TPDS.2024.3371332

Abstract As the size of real-world graphs increases, training Graph Neural Networks (GNNs) has become time-consuming and requires acceleration. While previous works have demonstrated the potential of FPGAs for accelerating GNN training, few have targeted GNN training on multiple FPGAs, because doing so requires hardware expertise and substantial development effort. To this end, we propose HitGNN, a framework that enables users to effortlessly map GNN training workloads onto a CPU+Multi-FPGA platform for acceleration. HitGNN takes a user-defined synchronous GNN training algorithm, a GNN model, and platform metadata as input, determines the design parameters from the platform metadata, and automatically performs hardware mapping onto the CPU+Multi-FPGA platform. HitGNN consists of the following building blocks: (1) high-level application programming interfaces (APIs) that allow users to specify various synchronous GNN training algorithms and GNN models in only a handful of lines of code; (2) a software generator that produces a host program which performs mini-batch sampling, manages CPU-FPGA communication, and balances the workload among the FPGAs; (3) an accelerator generator that produces GNN kernels with optimized datapaths and memory organization. We show that existing synchronous GNN training algorithms such as DistDGL and PaGraph can be easily deployed on a CPU+Multi-FPGA platform using our framework while achieving high training throughput. Compared with state-of-the-art frameworks that accelerate synchronous GNN training on multi-GPU platforms, HitGNN achieves up to 27.21× higher bandwidth efficiency and up to 4.26× speedup, while using far less compute power and memory bandwidth than the GPUs. In addition, HitGNN demonstrates good scalability to 16 FPGAs.
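To make the host-program duties named in the abstract concrete, below is a minimal, hypothetical sketch of the flow it describes: a GNN model declared in a few lines, CPU-side mini-batch sampling, and greedy balancing of batches across FPGAs. All names here (GNNModel, sample_minibatch, balance) are illustrative stand-ins; this record does not show HitGNN's actual API, and in the real framework the host program is generated automatically rather than hand-written.

```python
# Hypothetical sketch (NOT the actual HitGNN API): a user-declared GNN model,
# CPU-side mini-batch sampling, and greedy workload balancing across FPGAs,
# mirroring the three host-program duties named in the abstract.
import random
from dataclasses import dataclass

@dataclass
class GNNModel:
    """Model metadata a framework like this would consume."""
    layers: int = 2
    hidden_dim: int = 128
    aggregator: str = "mean"  # e.g., GraphSAGE-style mean aggregation

def sample_minibatch(num_nodes: int, batch_size: int) -> list:
    """Stand-in for the mini-batch (neighbor) sampling done on the CPU host."""
    return random.sample(range(num_nodes), batch_size)

def balance(batches: list, num_fpgas: int) -> list:
    """Greedy balancing: assign each batch to the least-loaded FPGA queue,
    approximating load by the number of nodes assigned so far."""
    queues = [[] for _ in range(num_fpgas)]
    loads = [0] * num_fpgas
    for batch in sorted(batches, key=len, reverse=True):
        i = loads.index(min(loads))  # pick the least-loaded FPGA
        queues[i].append(batch)
        loads[i] += len(batch)
    return queues

if __name__ == "__main__":
    model = GNNModel(layers=2, hidden_dim=256)  # "handful of lines" model spec
    batches = [sample_minibatch(100_000, 1024) for _ in range(8)]
    per_fpga = balance(batches, num_fpgas=4)
    # In the real system each queue would be streamed to one FPGA's GNN kernel;
    # a synchronous algorithm would barrier and aggregate gradients each step.
    print([sum(len(b) for b in q) for q in per_fpga])
```

Since every batch in this toy run has the same size, the greedy policy degenerates to round-robin; with variable-size neighborhood samples, as in real GNN mini-batching, it keeps per-FPGA node counts close to even.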
Author details
– Lin, Yi-Chien (ORCID 0000-0002-1710-1532; yichienl@usc.edu; Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA, USA)
– Zhang, Bingyi (ORCID 0000-0002-8115-0814; bingyizh@usc.edu; Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA, USA)
– Prasanna, Viktor K. (ORCID 0000-0002-1609-8589; prasanna@usc.edu; Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA, USA)
CODEN ITDSEO
ContentType Journal Article
Copyright © The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
Discipline Engineering
Computer Science
Genre orig-research
GrantInformation – funder: U.S. National Science Foundation (10.13039/100000001); grants CCF-1919289/SPX-2333009, CNS-2009057, OAC-2209563
– funder: Semiconductor Research Corporation (10.13039/100000028)
IsPeerReviewed true
IsScholarly true
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
PageCount 13
PublicationDate 2024-05-01
PublicationPlace New York
PublicationTitle IEEE Transactions on Parallel and Distributed Systems
PublicationTitleAbbrev TPDS
Publisher IEEE (The Institute of Electrical and Electronics Engineers, Inc.)
SubjectTerms Algorithms
Application programming interface
Bandwidths
Central processing units
Computational modeling
CPU+Multi-FPGA
CPUs
Design parameters
Field programmable gate arrays
graph neural network
Graph neural networks
Graphics processing units
Hardware
hardware acceleration
Memory management
Metadata
Partitioning algorithms
Training
Vectors
Workload
URI https://ieeexplore.ieee.org/document/10452846
https://www.proquest.com/docview/2969048109