POLAR: A Pipelined/Overlapped FPGA-Based LSTM Accelerator

Bibliographic Details
Published in	IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 28, no. 3, pp. 838-842
Main Authors	Bank-Tavakoli, Erfan; Ghasemzadeh, Seyed Abolfazl; Kamal, Mehdi; Afzali-Kusha, Ali; Pedram, Massoud
Format	Journal Article
Language	English
Published	New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.03.2020
Subjects	Computer architecture; Field programmable gate arrays; Field-programmable gate array (FPGA); Hardware; high speed; Logic gates; long short-term memory (LSTM) accelerator; low power; low resource utilization; Mapping; Power demand; Power efficiency; Power management; Resource management; Resource utilization; Timing
ISSN	1063-8210 (print); 1557-9999 (electronic)
DOI	10.1109/TVLSI.2019.2947639

Abstract	In this brief, a field-programmable gate array (FPGA)-based long short-term memory (LSTM) network architecture with low resource utilization is presented for accelerating the inference phase. The architecture achieves low power and high speed by overlapping the timing of the operations and pipelining the datapath. Moreover, the architecture requires negligible internal memory for storing intermediate data, which leads to low resource utilization and simple routing and, in turn, lower interconnect delay (higher operating frequency). A designer may readily adjust the resource utilization (as well as the latency) of the proposed architecture at the register-transfer level (RTL) by adjusting the amount of parallelization. This simplifies mapping the architecture onto different types of FPGAs subject to defined constraints. The efficacy of the proposed architecture is assessed by implementing an LSTM network on different types of FPGAs. Compared with recent works, the proposed architecture provides up to about 1.6×, 43.6×, 21.9×, and 114.5× improvements in frequency, power efficiency, GOP/s, and GOP/s/W, respectively. Finally, the proposed architecture operates at 17.64 GOP/s, which is 2.31× faster than the best previously reported results.
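Illustrative note	The abstract relates the amount of parallelization to latency, and reports results in GOP/s and GOP/s/W. The sketch below is only a rough first-order model of those relationships and is not taken from the article: the function names, the operation-counting convention (2 operations per multiply-accumulate, element-wise operations ignored), and all numeric inputs are assumptions chosen for illustration, and the effects of overlapping and pipelining on real cycle counts are not modeled.

```python
import math


def lstm_macs_per_step(input_size: int, hidden_size: int) -> int:
    """Multiply-accumulates for one LSTM time step (assumed model).

    Four gates each compute a hidden_size x (input_size + hidden_size)
    matrix-vector product. Activations and element-wise products are ignored.
    """
    return 4 * hidden_size * (input_size + hidden_size)


def cycles_per_step(input_size: int, hidden_size: int, parallel_macs: int) -> int:
    """Cycles per time step if `parallel_macs` MAC units are busy every cycle."""
    return math.ceil(lstm_macs_per_step(input_size, hidden_size) / parallel_macs)


def throughput_gops(input_size: int, hidden_size: int,
                    parallel_macs: int, freq_mhz: float) -> float:
    """Throughput in GOP/s, counting each MAC as 2 operations."""
    ops = 2 * lstm_macs_per_step(input_size, hidden_size)
    seconds = cycles_per_step(input_size, hidden_size, parallel_macs) / (freq_mhz * 1e6)
    return ops / seconds / 1e9


def efficiency_gops_per_watt(gops: float, power_w: float) -> float:
    """Energy efficiency in GOP/s/W."""
    return gops / power_w


if __name__ == "__main__":
    # Hypothetical network size, clock, and power; not values from the article.
    for p in (8, 16):
        gops = throughput_gops(12, 20, parallel_macs=p, freq_mhz=100.0)
        print(p, cycles_per_step(12, 20, p), gops, efficiency_gops_per_watt(gops, 0.5))
```

In this simplified model, doubling the hypothetical `parallel_macs` parameter halves the cycle count per time step and doubles the throughput, which is the kind of resource-versus-latency trade-off the abstract says can be tuned at the RTL.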
Author details	1. Bank-Tavakoli, Erfan (btavakoli@ut.ac.ir); School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
	2. Ghasemzadeh, Seyed Abolfazl (a.ghasemzadeh@ut.ac.ir); School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
	3. Kamal, Mehdi (mehdikamal@ut.ac.ir; ORCID 0000-0001-7098-6440); School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
	4. Afzali-Kusha, Ali (afzali@ut.ac.ir; ORCID 0000-0001-8614-2007); School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
	5. Pedram, Massoud (pedram@usc.edu; ORCID 0000-0002-2677-7307); Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA
CODEN	IEVSE9
Copyright	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
Genre	Original research
Peer Reviewed	Yes
License	https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037
URI	https://ieeexplore.ieee.org/document/8889770 https://www.proquest.com/docview/2368204958