POLAR: A Pipelined/Overlapped FPGA-Based LSTM Accelerator

Bibliographic Details
Published in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 28, no. 3, pp. 838-842
Main Authors Bank-Tavakoli, Erfan; Ghasemzadeh, Seyed Abolfazl; Kamal, Mehdi; Afzali-Kusha, Ali; Pedram, Massoud
Format Journal Article
Language English
Published New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 1 March 2020
ISSN 1063-8210
EISSN 1557-9999
DOI 10.1109/TVLSI.2019.2947639

Abstract In this brief, a field-programmable gate array (FPGA)-based long short-term memory (LSTM) network architecture with low resource utilization is presented for accelerating the inference phase. The architecture achieves low power and high speed by overlapping the timing of the operations and pipelining the datapath. Moreover, it requires negligible internal memory for storing intermediate data, which leads to low resource utilization and simple routing and, in turn, to lower interconnect delay (higher operating frequency). A designer may readily adjust the resource utilization (as well as the latency) of the proposed architecture at the register-transfer level (RTL) by adjusting the amount of parallelization. This makes mapping the architecture onto different types of FPGAs, subject to defined constraints, a simple process. The efficacy of the proposed architecture is assessed by implementing an LSTM network on different types of FPGAs. Compared with recent works, the proposed architecture provides up to about 1.6×, 43.6×, 21.9×, and 114.5× improvements in frequency, power efficiency, GOP/s, and GOP/s/W, respectively. Finally, the proposed architecture operates at 17.64 GOP/s, which is 2.31× faster than the best previously reported results.
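As a rough illustration of the workload the abstract refers to, the sketch below computes one inference step of a textbook LSTM cell and estimates how the multiply-accumulate (MAC) cycle count shrinks as the amount of parallelization grows. It is not the POLAR datapath: the record does not describe the design's internals, so the function names (`lstm_step`, `mac_cycles`), the gate labels, the `parallel_macs` parameter, and the layer sizes used below are all assumptions made for illustration, and the cycle model ignores pipelining, operation overlap, and memory access.

```python
import math


def sigmoid(x):
    """Logistic function used by the input, forget, and output gates."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Plain matrix-vector product; each row costs len(vector) MACs."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One inference step of a standard LSTM cell (textbook equations).

    W, U, and b are dicts keyed by the hypothetical gate labels
    'i' (input), 'f' (forget), 'g' (candidate), and 'o' (output).
    """
    pre = {}
    for gate in ("i", "f", "g", "o"):
        wx = matvec(W[gate], x)
        uh = matvec(U[gate], h_prev)
        pre[gate] = [p + q + r for p, q, r in zip(wx, uh, b[gate])]
    i = [sigmoid(v) for v in pre["i"]]
    f = [sigmoid(v) for v in pre["f"]]
    g = [math.tanh(v) for v in pre["g"]]
    o = [sigmoid(v) for v in pre["o"]]
    # New cell state: keep part of the old state, add gated candidate.
    c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed cell state.
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c


def mac_cycles(input_size, hidden_size, parallel_macs):
    """Toy estimate of cycles spent on gate MACs for one time step.

    Assumes parallel_macs identical MAC units that are always busy;
    this is an idealized model, not a figure from the paper.
    """
    total_macs = 4 * hidden_size * (input_size + hidden_size)
    return math.ceil(total_macs / parallel_macs)
```

Under this idealized model, doubling `parallel_macs` roughly halves the MAC cycles per time step at the cost of twice as many multipliers, which is the kind of resource-versus-latency trade-off the abstract says a designer can adjust at the RTL.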
Author Details
1. Bank-Tavakoli, Erfan (btavakoli@ut.ac.ir), School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
2. Ghasemzadeh, Seyed Abolfazl (a.ghasemzadeh@ut.ac.ir), School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
3. Kamal, Mehdi (mehdikamal@ut.ac.ir), ORCID 0000-0001-7098-6440, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
4. Afzali-Kusha, Ali (afzali@ut.ac.ir), ORCID 0000-0001-8614-2007, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
5. Pedram, Massoud (pedram@usc.edu), ORCID 0000-0002-2677-7307, Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
Subjects Computer architecture
Field programmable gate arrays
Field-programmable gate array (FPGA)
Hardware
High speed
Logic gates
Long short-term memory (LSTM) accelerator
Low power
Low resource utilization
Mapping
Power demand
Power efficiency
Power management
Resource management
Resource utilization
Timing