IP Cores for Graph Kernels on FPGAs

Bibliographic Details
Published in IEEE Conference on High Performance Extreme Computing (Online), pp. 1-7
Main Authors Kuppannagari, Sanmukh R.; Rajat, Rachit; Kannan, Rajgopal; Dasu, Aravind; Prasanna, Viktor K.
Format Conference Proceeding
Language English
Published IEEE 01.09.2019
ISSN 2643-1971
DOI 10.1109/HPEC.2019.8916363

Abstract Graphs are a powerful abstraction for representing networked data in many real-world applications. The need for large-scale graph analytics has led to widespread adoption of dedicated hardware accelerators such as FPGAs. In this work, we develop IP cores for several key graph kernels. Our IP cores use the graph processing over partitions (GPOP) programming paradigm to perform computations over graph partitions. Partitioning the input graph into non-overlapping partitions improves on-chip data reuse. We also discuss additional optimizations that exploit intra- and inter-partition parallelism and reduce external memory accesses. We generate FPGA designs for general graph algorithms with various vertex attributes and update propagation functions, such as Sparse Matrix Vector Multiplication (SpMV), PageRank (PR), Single Source Shortest Path (SSSP), and Weakly Connected Component (WCC). We target a platform consisting of a large external DDR4 memory to store the graph data and an Intel Stratix FPGA to accelerate the processing. Experimental results show that our accelerators sustain a throughput of up to 2250, 2300, 3378, and 2178 Million Traversed Edges Per Second (MTEPS) for SpMV, PR, SSSP, and WCC, respectively. Compared with several highly optimized multi-core designs, our FPGA framework achieves up to 20.5× speedup for SpMV, 16.4× for PR, 3.5× for SSSP, and 35.1× for WCC. Compared with two state-of-the-art FPGA frameworks, our designs demonstrate up to 5.3× speedup for SpMV, 1.64× for PR, and 1.8× for WCC. We develop a performance model for our GPOP paradigm. We then predict the performance of our designs assuming the graph is stored in HBM2 instead of DRAM. We further discuss extensions to our optimizations to improve the throughput.
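
The abstract describes partition-based processing only at a high level. As a rough software illustration (not the authors' hardware design), the following Python sketch shows one way a single SpMV step could be organized over non-overlapping vertex partitions, using a scatter phase that bins updates by destination partition and a gather phase that applies one bin at a time. The function names, the contiguous-range partitioning, and the bin layout are all assumptions made for this sketch; the helper `mteps` simply restates the throughput metric as edges traversed per second divided by one million.

```python
def spmv_over_partitions(num_vertices, edges, x, part_size):
    """Compute y[dst] += weight * x[src] for every edge, partition by partition.

    Hypothetical sketch: vertices are split into contiguous, non-overlapping
    ranges of `part_size` vertices. `edges` is a list of (src, dst, weight).
    """
    num_parts = (num_vertices + part_size - 1) // part_size

    # bins[p][q] collects updates produced by source partition p
    # that are destined for vertices in partition q.
    bins = [[[] for _ in range(num_parts)] for _ in range(num_parts)]

    # Scatter phase: each edge emits one update into the bin of its
    # (source partition, destination partition) pair.
    for src, dst, weight in edges:
        p = src // part_size
        q = dst // part_size
        bins[p][q].append((dst, weight * x[src]))

    # Gather phase: process one destination partition at a time, so all
    # writes in the inner loops fall inside one small vertex range. This
    # locality is what would let that range stay in on-chip memory.
    y = [0.0] * num_vertices
    for q in range(num_parts):
        for p in range(num_parts):
            for dst, value in bins[p][q]:
                y[dst] += value
    return y


def mteps(edges_traversed, seconds):
    """Throughput in Million Traversed Edges Per Second."""
    return edges_traversed / seconds / 1e6


# Small usage example on a 4-vertex graph split into two partitions.
example_edges = [(0, 1, 2.0), (1, 2, 3.0), (3, 0, 1.0), (2, 3, 4.0)]
example_y = spmv_over_partitions(4, example_edges, [1.0, 2.0, 3.0, 4.0], 2)
```

In this sketch the gather loop over source partitions `p` for a fixed destination partition `q` is sequential; different destination partitions touch disjoint vertex ranges, which is the kind of independence that inter-partition parallelism could exploit.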
Author affiliations
Kuppannagari, Sanmukh R.: University of Southern California, Ming Hsieh Department of Electrical and Computer Engineering, Los Angeles, California, 90089
Rajat, Rachit: University of Southern California, Ming Hsieh Department of Electrical and Computer Engineering, Los Angeles, California, 90089
Kannan, Rajgopal: University of Southern California, Ming Hsieh Department of Electrical and Computer Engineering, Los Angeles, California, 90089
Dasu, Aravind: Programmable Solutions Group, Intel Corporation, San Jose, California
Prasanna, Viktor K.: University of Southern California, Ming Hsieh Department of Electrical and Computer Engineering, Los Angeles, California, 90089
Discipline Computer Science
EISBN 1728150205; 9781728150208
Subjects Bandwidth; Field programmable gate arrays; IP networks; Kernel; Optimization; Parallel processing; System-on-chip
URI https://ieeexplore.ieee.org/document/8916363