IP Cores for Graph Kernels on FPGAs
          | Published in | IEEE Conference on High Performance Extreme Computing (Online) pp. 1 - 7 | 
|---|---|
| Main Authors | Kuppannagari, Sanmukh R.; Rajat, Rachit; Kannan, Rajgopal; Dasu, Aravind; Prasanna, Viktor K. | 
| Format | Conference Proceeding | 
| Language | English | 
| Published | IEEE, 01.09.2019 | 
| Subjects | |
| Online Access | Get full text | 
| ISSN | 2643-1971 | 
| DOI | 10.1109/HPEC.2019.8916363 | 
| Abstract | Graphs are a powerful abstraction for representing networked data in many real-world applications. The need for large-scale graph analytics has led to widespread adoption of dedicated hardware accelerators such as FPGAs. In this work, we develop IP cores for several key graph kernels. Our IP cores use the graph processing over partitions (GPOP) programming paradigm to perform computations over graph partitions. Partitioning the input graph into nonoverlapping partitions improves on-chip data reuse. Additional optimizations to exploit intra- and inter-partition parallelism and to reduce external memory accesses are also discussed. We generate FPGA designs for general graph algorithms with various vertex attributes and update propagation functions, such as Sparse Matrix Vector Multiplication (SpMV), PageRank (PR), Single Source Shortest Path (SSSP), and Weakly Connected Component (WCC). We target a platform consisting of a large external DDR4 memory to store the graph data and an Intel Stratix FPGA to accelerate the processing. Experimental results show that our accelerators sustain a high throughput of up to 2250, 2300, 3378, and 2178 Million Traversed Edges Per Second (MTEPS) for SpMV, PR, SSSP and WCC, respectively. Compared with several highly-optimized multi-core designs, our FPGA framework achieves up to 20.5× speedup for SpMV, 16.4× speedup for PR, 3.5× speedup for SSSP, and 35.1× speedup for WCC. Compared with two state-of-the-art FPGA frameworks, our designs demonstrate up to 5.3× speedup for SpMV, 1.64× speedup for PR, and 1.8× speedup for WCC. We develop a performance model for our GPOP paradigm and use it to predict the performance of our designs assuming the graph is stored in HBM2 instead of DRAM. We further discuss extensions to our optimizations to improve the throughput. | 
    
|---|---|
| Author | Kuppannagari, Sanmukh R.; Dasu, Aravind; Prasanna, Viktor K.; Rajat, Rachit; Kannan, Rajgopal | 
    
| Author_xml | – sequence: 1 givenname: Sanmukh R. surname: Kuppannagari fullname: Kuppannagari, Sanmukh R. organization: University of Southern California,Ming Hsieh Department of Electrical and Computer Engineering,Los Angeles,California,90089 – sequence: 2 givenname: Rachit surname: Rajat fullname: Rajat, Rachit organization: University of Southern California,Ming Hsieh Department of Electrical and Computer Engineering,Los Angeles,California,90089 – sequence: 3 givenname: Rajgopal surname: Kannan fullname: Kannan, Rajgopal organization: University of Southern California,Ming Hsieh Department of Electrical and Computer Engineering,Los Angeles,California,90089 – sequence: 4 givenname: Aravind surname: Dasu fullname: Dasu, Aravind organization: Programmable Solutions Group Intel Corporation,San Jose,California – sequence: 5 givenname: Viktor K. surname: Prasanna fullname: Prasanna, Viktor K. organization: University of Southern California,Ming Hsieh Department of Electrical and Computer Engineering,Los Angeles,California,90089  | 
    
| ContentType | Conference Proceeding | 
    
| DBID | 6IE 6IL CBEJK RIE RIL  | 
    
| DOI | 10.1109/HPEC.2019.8916363 | 
    
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present | 
    
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher  | 
    
| DeliveryMethod | fulltext_linktorsrc | 
    
| Discipline | Computer Science | 
    
| EISBN | 1728150205; 9781728150208 | 
    
| EISSN | 2643-1971 | 
    
| EndPage | 7 | 
    
| ExternalDocumentID | 8916363 | 
    
| Genre | orig-research | 
    
| GroupedDBID | 6IE 6IL 6IN ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK OCL RIE RIL  | 
    
| ID | FETCH-LOGICAL-i203t-b5981f2b9d89c7ef104685f2549b50eb6494acf4b8eb1056180750d5628b75fb3 | 
    
| IEDL.DBID | RIE | 
    
| IngestDate | Wed Aug 27 02:42:49 EDT 2025 | 
    
| IsPeerReviewed | false | 
    
| IsScholarly | false | 
    
| Language | English | 
    
| LinkModel | DirectLink | 
    
| MergedId | FETCHMERGED-LOGICAL-i203t-b5981f2b9d89c7ef104685f2549b50eb6494acf4b8eb1056180750d5628b75fb3 | 
    
| PageCount | 7 | 
    
| ParticipantIDs | ieee_primary_8916363 | 
    
| PublicationCentury | 2000 | 
    
| PublicationDate | 2019-Sept. | 
    
| PublicationDateYYYYMMDD | 2019-09-01 | 
    
| PublicationDate_xml | – month: 09 year: 2019 text: 2019-Sept.  | 
    
| PublicationDecade | 2010 | 
    
| PublicationTitle | IEEE Conference on High Performance Extreme Computing (Online) | 
    
| PublicationTitleAbbrev | HPEC | 
    
| PublicationYear | 2019 | 
    
| Publisher | IEEE | 
    
| Publisher_xml | – name: IEEE | 
    
| SSID | ssj0003320733 | 
    
| Score | 1.727894 | 
    
| SourceID | ieee | 
    
| SourceType | Publisher | 
    
| StartPage | 1 | 
    
| SubjectTerms | Bandwidth; Field programmable gate arrays; IP networks; Kernel; Optimization; Parallel processing; System-on-chip | 
    
| Title | IP Cores for Graph Kernels on FPGAs | 
    
| URI | https://ieeexplore.ieee.org/document/8916363 | 
    
| hasFullText | 1 | 
    
| inHoldings | 1 | 
    
| isFullTextHit | |
| isPrint | |