POLAR: A Pipelined/Overlapped FPGA-Based LSTM Accelerator
| Published in | IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 28, no. 3, pp. 838–842 |
|---|---|
| Main Authors | Bank-Tavakoli, Erfan; Ghasemzadeh, Seyed Abolfazl; Kamal, Mehdi; Afzali-Kusha, Ali; Pedram, Massoud |
| Format | Journal Article |
| Language | English |
| Published | New York, IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.03.2020 |
| ISSN | 1063-8210 1557-9999 |
| DOI | 10.1109/TVLSI.2019.2947639 |
| Abstract | In this brief, a low resource utilization field-programmable gate array (FPGA)-based long short-term memory (LSTM) network architecture for accelerating the inference phase is presented. The architecture has low-power and high-speed features that are achieved through overlapping the timing of the operations and pipelining the datapath. Moreover, this architecture requires negligible internal memory size for storing the intermediate data, leading to low resource utilization and simple routing, which provides lower interconnect delay (higher operating frequency). A designer may adjust the resource utilization (as well as the latency) of the proposed architecture readily at the register-transfer level (RTL) design by adjusting the amount of parallelization. This makes the process of mapping the architecture onto different types of FPGAs, subject to defined constraints, a simple one. The efficacy of the proposed architecture is assessed by implementing an LSTM network on different types of FPGAs. Compared with recent works, the proposed architecture provides up to about 1.6×, 43.6×, 21.9×, and 114.5× improvements in frequency, power efficiency, GOP/s, and GOP/s/W, respectively. Finally, our proposed architecture operates at 17.64 GOP/s, which is 2.31× faster than the best previously reported results. |
| Author | Kamal, Mehdi Afzali-Kusha, Ali Bank-Tavakoli, Erfan Pedram, Massoud Ghasemzadeh, Seyed Abolfazl |
| Author_xml | 1. Erfan Bank-Tavakoli (btavakoli@ut.ac.ir), School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran; 2. Seyed Abolfazl Ghasemzadeh (a.ghasemzadeh@ut.ac.ir), School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran; 3. Mehdi Kamal (mehdikamal@ut.ac.ir, ORCID 0000-0001-7098-6440), School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran; 4. Ali Afzali-Kusha (afzali@ut.ac.ir, ORCID 0000-0001-8614-2007), School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran; 5. Massoud Pedram (pedram@usc.edu, ORCID 0000-0002-2677-7307), Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA |
| CODEN | IEVSE9 |
| CitedBy_id | crossref_primary_10_1109_ACCESS_2023_3329048 crossref_primary_10_1145_3564929 crossref_primary_10_1109_TPDS_2021_3063670 crossref_primary_10_1109_TVLSI_2021_3135353 crossref_primary_10_1145_3534969 crossref_primary_10_1109_TC_2022_3207137 crossref_primary_10_1109_TCSI_2024_3464687 crossref_primary_10_3390_technologies8030046 crossref_primary_10_1109_TCAD_2021_3093832 crossref_primary_10_1145_3629979 crossref_primary_10_1145_3699512 crossref_primary_10_1109_TCDS_2022_3147253 crossref_primary_10_1109_ACCESS_2024_3488033 crossref_primary_10_1109_TETC_2022_3230961 crossref_primary_10_1109_TBCAS_2021_3064841 crossref_primary_10_1016_j_sysarc_2024_103181 crossref_primary_10_1109_TC_2024_3500368 crossref_primary_10_1007_s00034_023_02456_6 crossref_primary_10_1016_j_micpro_2021_104374 crossref_primary_10_1007_s00034_023_02412_4 |
| Cites_doi | 10.1162/neco.1997.9.8.1735 10.1109/ReConFig.2016.7857151 10.1049/cp:19991218 10.1016/S0167-8655(99)00077-X 10.1109/FPL.2018.00015 10.1109/TNNLS.2016.2582924 10.1109/IJCNN.2000.861302 10.1109/ICIP.2018.8451053 10.1109/CVPR.2014.223 10.1109/TASLP.2014.2303296 10.1109/ASPDAC.2017.7858394 10.1109/ISCAS.2017.8050816 10.1109/ICASSP.2013.6638947 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present; IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE Electronic Library (IEL); CrossRef; Electronics & Communications Abstracts; Technology Research Database; Advanced Technologies Database with Aerospace |
| DatabaseTitle | CrossRef; Technology Research Database; Advanced Technologies Database with Aerospace; Electronics & Communications Abstracts |
| Discipline | Engineering |
| EISSN | 1557-9999 |
| EndPage | 842 |
| ExternalDocumentID | 10_1109_TVLSI_2019_2947639 8889770 |
| Genre | orig-research |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 3 |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0000-0001-7098-6440 0000-0001-8614-2007 0000-0002-2677-7307 |
| PageCount | 5 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-03-01 |
| PublicationDateYYYYMMDD | 2020-03-01 |
| PublicationDate_xml | – month: 03 year: 2020 text: 2020-03-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | New York |
| PublicationPlace_xml | – name: New York |
| PublicationTitle | IEEE Transactions on Very Large Scale Integration (VLSI) Systems |
| PublicationTitleAbbrev | TVLSI |
| PublicationYear | 2020 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 838 |
| SubjectTerms | Computer architecture; Field programmable gate arrays; Field-programmable gate array (FPGA); Hardware; high speed; Logic gates; long short-term memory (LSTM) accelerator; low power; low resource utilization; Mapping; Power demand; Power efficiency; Power management; Resource management; Resource utilization; Timing |
| Title | POLAR: A Pipelined/Overlapped FPGA-Based LSTM Accelerator |
| URI | https://ieeexplore.ieee.org/document/8889770 https://www.proquest.com/docview/2368204958 |
| Volume | 28 |