IP Cores for Graph Kernels on FPGAs

Bibliographic Details
Published in	IEEE Conference on High Performance Extreme Computing (Online) pp. 1 - 7
Main Authors	Kuppannagari, Sanmukh R.; Rajat, Rachit; Kannan, Rajgopal; Dasu, Aravind; Prasanna, Viktor K.
Format	Conference Proceeding
Language	English
Published	IEEE 01.09.2019
Subjects	Bandwidth; Field programmable gate arrays; IP networks; Kernel; Optimization; Parallel processing; System-on-chip
ISSN	2643-1971
DOI	10.1109/HPEC.2019.8916363

Abstract	Graphs are a powerful abstraction for representing networked data in many real-world applications. The need for large-scale graph analytics has led to widespread adoption of dedicated hardware accelerators such as FPGAs. In this work, we develop IP cores for several key graph kernels. Our IP cores use the graph processing over partitions (GPOP) programming paradigm to perform computations over graph partitions. Partitioning the input graph into non-overlapping partitions improves on-chip data reuse. Additional optimizations to exploit intra- and inter-partition parallelism and to reduce external memory accesses are also discussed. We generate FPGA designs for general graph algorithms with various vertex attributes and update propagation functions, such as Sparse Matrix Vector Multiplication (SpMV), PageRank (PR), Single Source Shortest Path (SSSP), and Weakly Connected Component (WCC). We target a platform consisting of large external DDR4 memory to store the graph data and an Intel Stratix FPGA to accelerate the processing. Experimental results show that our accelerators sustain a high throughput of up to 2250, 2300, 3378, and 2178 Million Traversed Edges Per Second (MTEPS) for SpMV, PR, SSSP, and WCC, respectively. Compared with several highly optimized multi-core designs, our FPGA framework achieves up to 20.5× speedup for SpMV, 16.4× for PR, 3.5× for SSSP, and 35.1× for WCC. Compared with two state-of-the-art FPGA frameworks, our designs demonstrate up to 5.3× speedup for SpMV, 1.64× for PR, and 1.8× for WCC. We develop a performance model for our GPOP paradigm. We then predict the performance of our designs assuming the graph is stored in HBM2 instead of DRAM. We further discuss extensions to our optimizations to improve the throughput.
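Illustrative note	The abstract describes computing over non-overlapping graph partitions and reports throughput in MTEPS. The sketch below is a minimal software illustration of that idea for SpMV, not the authors' FPGA design. It assumes equal-sized partitions of contiguous vertex IDs and a two-step scatter and gather structure; the names `partition_of`, `spmv_over_partitions`, and `mteps` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def partition_of(vertex, partition_size):
    """Map a vertex ID to its partition (assumed: contiguous, equal-sized ranges)."""
    return vertex // partition_size


def spmv_over_partitions(num_vertices, edges, x, partition_size):
    """Compute y[dst] += weight * x[src], one destination partition at a time.

    edges is a list of (src, dst, weight) tuples.
    """
    # Scatter step: bin each edge's contribution by destination partition.
    bins = defaultdict(list)
    for src, dst, weight in edges:
        bins[partition_of(dst, partition_size)].append((dst, weight * x[src]))

    # Gather step: each partition only touches its own vertex range, so the
    # working set per partition stays small (the data-reuse argument).
    y = [0.0] * num_vertices
    for p in sorted(bins):
        for dst, value in bins[p]:
            y[dst] += value
    return y


def mteps(edges_traversed, seconds):
    """Million Traversed Edges Per Second."""
    return edges_traversed / seconds / 1e6


edges = [(0, 1, 2.0), (1, 2, 3.0), (2, 0, 1.0), (3, 1, 4.0)]
x = [1.0, 2.0, 3.0, 4.0]
y = spmv_over_partitions(4, edges, x, partition_size=2)
```

Because the partitions do not overlap, the gather loops for different partitions write to disjoint ranges of `y`, which is what would allow them to run in parallel in a hardware or multi-threaded setting.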
Author_xml	1. Kuppannagari, Sanmukh R.: University of Southern California, Ming Hsieh Department of Electrical and Computer Engineering, Los Angeles, California, 90089
	2. Rajat, Rachit: University of Southern California, Ming Hsieh Department of Electrical and Computer Engineering, Los Angeles, California, 90089
	3. Kannan, Rajgopal: University of Southern California, Ming Hsieh Department of Electrical and Computer Engineering, Los Angeles, California, 90089
	4. Dasu, Aravind: Programmable Solutions Group Intel Corporation, San Jose, California
	5. Prasanna, Viktor K.: University of Southern California, Ming Hsieh Department of Electrical and Computer Engineering, Los Angeles, California, 90089
Discipline	Computer Science
EISBN	1728150205 9781728150208
Genre	orig-research
IsPeerReviewed	false
IsScholarly	false
PublicationTitleAbbrev	HPEC
URI	https://ieeexplore.ieee.org/document/8916363