HitGNN: High-Throughput GNN Training Framework on CPU+Multi-FPGA Heterogeneous Platform

Bibliographic Details
Published in IEEE Transactions on Parallel and Distributed Systems, Vol. 35, No. 5, pp. 707-719
Main Authors Lin, Yi-Chien; Zhang, Bingyi; Prasanna, Viktor K.
Format Journal Article
Language English
Published New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.05.2024
ISSN 1045-9219
EISSN 1558-2183
DOI 10.1109/TPDS.2024.3371332

Abstract As the size of real-world graphs increases, training Graph Neural Networks (GNNs) has become time-consuming and requires acceleration. While previous works have demonstrated the potential of FPGAs for accelerating GNN training, few have targeted GNN training on multiple FPGAs, because doing so requires hardware expertise and substantial development effort. To this end, we propose HitGNN, a framework that enables users to effortlessly map GNN training workloads onto a CPU+Multi-FPGA platform for acceleration. HitGNN takes a user-defined synchronous GNN training algorithm, a GNN model, and platform metadata as input, determines the design parameters from the platform metadata, and automatically performs hardware mapping onto the CPU+Multi-FPGA platform. HitGNN consists of the following building blocks: (1) high-level application programming interfaces (APIs) that allow users to specify various synchronous GNN training algorithms and GNN models in only a handful of lines of code; (2) a software generator that produces a host program which performs mini-batch sampling, manages CPU-FPGA communication, and balances the workload among the FPGAs; (3) an accelerator generator that produces GNN kernels with optimized datapaths and memory organization. We show that existing synchronous GNN training algorithms such as DistDGL and PaGraph can be easily deployed on a CPU+Multi-FPGA platform using our framework while achieving high training throughput. Compared with state-of-the-art frameworks that accelerate synchronous GNN training on multi-GPU platforms, HitGNN achieves up to 27.21× higher bandwidth efficiency and up to 4.26× speedup, while using far less compute power and memory bandwidth than the GPUs. In addition, HitGNN demonstrates good scalability to 16 FPGAs.
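To make the host-program duties named in the abstract concrete, below is a minimal, hypothetical sketch of the flow it describes: a GNN model declared in a few lines, CPU-side mini-batch sampling, and greedy balancing of batches across FPGAs. All names here (GNNModel, sample_minibatch, balance) are illustrative stand-ins; this record does not show HitGNN's actual API, and in the real framework the host program is generated automatically rather than hand-written.

```python
# Hypothetical sketch (NOT the actual HitGNN API): a user-declared GNN model,
# CPU-side mini-batch sampling, and greedy workload balancing across FPGAs,
# mirroring the three host-program duties named in the abstract.
import random
from dataclasses import dataclass

@dataclass
class GNNModel:
    """Model metadata a framework like this would consume."""
    layers: int = 2
    hidden_dim: int = 128
    aggregator: str = "mean"  # e.g., GraphSAGE-style mean aggregation

def sample_minibatch(num_nodes: int, batch_size: int) -> list:
    """Stand-in for the mini-batch (neighbor) sampling done on the CPU host."""
    return random.sample(range(num_nodes), batch_size)

def balance(batches: list, num_fpgas: int) -> list:
    """Greedy balancing: assign each batch to the least-loaded FPGA queue,
    approximating load by the number of nodes assigned so far."""
    queues = [[] for _ in range(num_fpgas)]
    loads = [0] * num_fpgas
    for batch in sorted(batches, key=len, reverse=True):
        i = loads.index(min(loads))  # pick the least-loaded FPGA
        queues[i].append(batch)
        loads[i] += len(batch)
    return queues

if __name__ == "__main__":
    model = GNNModel(layers=2, hidden_dim=256)  # "handful of lines" model spec
    batches = [sample_minibatch(100_000, 1024) for _ in range(8)]
    per_fpga = balance(batches, num_fpgas=4)
    # In the real system each queue would be streamed to one FPGA's GNN kernel;
    # a synchronous algorithm would barrier and aggregate gradients each step.
    print([sum(len(b) for b in q) for q in per_fpga])
```

Since every batch in this toy run has the same size, the greedy policy degenerates to round-robin; with variable-size neighborhood samples, as in real GNN mini-batching, it keeps per-FPGA node counts close to even.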
Author details
– Lin, Yi-Chien (ORCID 0000-0002-1710-1532; yichienl@usc.edu; Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA, USA)
– Zhang, Bingyi (ORCID 0000-0002-8115-0814; bingyizh@usc.edu; Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA, USA)
– Prasanna, Viktor K. (ORCID 0000-0002-1609-8589; prasanna@usc.edu; Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA, USA)
CODEN ITDSEO
ContentType Journal Article
Copyright © The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
Discipline Engineering
Computer Science
Genre orig-research
GrantInformation – funder: U.S. National Science Foundation (10.13039/100000001); grants CCF-1919289/SPX-2333009, CNS-2009057, OAC-2209563
– funder: Semiconductor Research Corporation (10.13039/100000028)
IsPeerReviewed true
IsScholarly true
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
PageCount 13
PublicationDate 2024-05-01
PublicationPlace New York
PublicationTitle IEEE Transactions on Parallel and Distributed Systems
PublicationTitleAbbrev TPDS
Publisher IEEE (The Institute of Electrical and Electronics Engineers, Inc.)
SubjectTerms Algorithms
Application programming interface
Bandwidths
Central processing units
Computational modeling
CPU+Multi-FPGA
CPUs
Design parameters
Field programmable gate arrays
graph neural network
Graph neural networks
Graphics processing units
Hardware
hardware acceleration
Memory management
Metadata
Partitioning algorithms
Training
Vectors
Workload
URI https://ieeexplore.ieee.org/document/10452846
https://www.proquest.com/docview/2969048109