Accelerating recurrent neural networks in analytics servers: Comparison of FPGA, CPU, GPU, and ASIC
| Published in | International Conference on Field-Programmable Logic and Applications (FPL), pp. 1-4 |
|---|---|
| Main Authors | Nurvitadhi, Eriko; Sim, Jaewoong; Sheffield, David; Mishra, Asit; Krishnan, Srivatsan; Marr, Debbie |
| Format | Conference Proceeding |
| Language | English |
| Published | EPFL, 01.08.2016 |
| Subjects | Classification algorithms; Field programmable gate arrays; Graphics processing units; Logic gates; Random access memory; Recurrent neural networks; Runtime |
| Online Access | https://ieeexplore.ieee.org/document/7577314 |
| EISSN | 1946-1488 |
| EISBN | 9782839918442; 2839918447 |
| DOI | 10.1109/FPL.2016.7577314 |
| Abstract | Recurrent neural networks (RNNs) provide state-of-the-art accuracy for analytics on sequential datasets (e.g., language models). This paper studies a state-of-the-art RNN variant, the Gated Recurrent Unit (GRU). We first propose a memoization optimization that avoids 3 of the 6 dense matrix-vector multiplications (SGEMVs) that account for the majority of the computation in a GRU. We then study opportunities to accelerate the remaining SGEMVs on FPGAs, in comparison to a 14-nm ASIC, a GPU, and a multi-core CPU. Results show that the FPGA delivers superior performance/Watt over the CPU and GPU because its on-chip BRAMs, hard DSPs, and reconfigurable fabric allow fine-grained parallelism to be extracted efficiently from the small/medium-sized matrices used by the GRU. Moreover, newer FPGAs with more DSPs, more on-chip BRAM, and higher clock frequencies have the potential to narrow the FPGA-ASIC efficiency gap. |
|---|---|
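For readers unfamiliar with the GRU workload described in the abstract, the sketch below illustrates the six per-timestep dense matrix-vector multiplications (SGEMVs) of a standard GRU cell and one plausible memoization that removes the three input-side SGEMVs. The one-hot-input assumption, the sizes `H` and `V`, and names such as `gru_step_memoized` and `Wz_lut` are illustrative assumptions, not details taken from the paper; the abstract states only that 3 of the 6 SGEMVs are avoided.

```python
import numpy as np

# Standard GRU cell (illustrative sizes; not taken from the paper).
H, V = 256, 10000            # hidden size, one-hot input (vocabulary) size

rng = np.random.default_rng(0)
W_z, W_r, W_h = (rng.standard_normal((H, V)) * 0.01 for _ in range(3))  # input-side weights
U_z, U_r, U_h = (rng.standard_normal((H, H)) * 0.01 for _ in range(3))  # hidden-side weights

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x, h):
    """Plain GRU step: six SGEMVs per time step (three on x, three on h)."""
    z = sigmoid(W_z @ x + U_z @ h)               # update gate
    r = sigmoid(W_r @ x + U_r @ h)               # reset gate
    h_cand = np.tanh(W_h @ x + U_h @ (r * h))    # candidate state
    return (1.0 - z) * h + z * h_cand

# Hypothetical memoization: when x_t is a one-hot token (as in a language model),
# W_z x, W_r x, and W_h x are simply columns of the weight matrices, so they can be
# looked up instead of recomputed. Only the three hidden-side SGEMVs remain.
Wz_lut, Wr_lut, Wh_lut = W_z.T.copy(), W_r.T.copy(), W_h.T.copy()  # (V, H) lookup tables

def gru_step_memoized(token_id, h):
    """GRU step with the three input-side SGEMVs replaced by table lookups."""
    z = sigmoid(Wz_lut[token_id] + U_z @ h)
    r = sigmoid(Wr_lut[token_id] + U_r @ h)
    h_cand = np.tanh(Wh_lut[token_id] + U_h @ (r * h))
    return (1.0 - z) * h + z * h_cand

# Quick check that the two formulations agree on a one-hot input.
h0 = np.zeros(H)
tok = 42
x_onehot = np.zeros(V)
x_onehot[tok] = 1.0
assert np.allclose(gru_step(x_onehot, h0), gru_step_memoized(tok, h0))
```

Under this reading, only the three hidden-state SGEMVs (with H x H matrices) remain per time step, which is consistent with the abstract's observation that small/medium-sized matrices dominate the GRU workload and map well onto FPGA on-chip BRAMs and hard DSPs.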