7.7 LNPU: A 25.3TFLOPS/W Sparse Deep-Neural-Network Learning Processor with Fine-Grained Mixed Precision of FP8-FP16

Bibliographic Details
Published in: Digest of Technical Papers - IEEE International Solid-State Circuits Conference, pp. 142-144
Main Authors: Lee, Jinsu; Lee, Juhyoung; Han, Donghyeon; Lee, Jinmook; Park, Gwangtae; Yoo, Hoi-Jun (all KAIST, Daejeon, Korea)
Format: Conference Proceeding
Language: English
Published: IEEE, 01.02.2019
Subjects: Acceleration; Deep learning; Neural networks; Registers; Throughput; Training
ISSN: 2376-8606
EISBN: 9781538685310
DOI: 10.1109/ISSCC.2019.8662302
Online Access: https://ieeexplore.ieee.org/document/8662302

Abstract: Recently, deep neural network (DNN) hardware accelerators have been reported for energy-efficient deep-learning (DL) acceleration [1-6]. Most prior DNN inference accelerators run networks trained in the cloud on public datasets; the parameters are then downloaded to the device to implement AI [1-5]. However, local DNN learning with domain-specific and private data is required to meet the preferences of individual users on edge or mobile devices. Since edge and mobile devices offer only limited computation capability and run on battery power, an energy-efficient DNN learning processor is necessary. Only [6] supported on-chip DNN learning, but it was not energy-efficient, as it did not exploit sparsity, which accounts for 37%-61% of the inputs of various CNNs such as VGG16, AlexNet, and ResNet-18, as shown in Fig. 7.7.1. Although [3-5] exploited sparsity, they considered only the inference phase with inter-channel accumulation (Fig. 7.7.1) and did not support the intra-channel accumulation needed for the weight-gradient generation (WG) step of the learning phase. Also, [6] adopted FP16, which is not energy-optimal, because FP8 suffices for many input operands at roughly 4x lower energy than FP16.
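The WG step is the part of backpropagation where intra-channel accumulation matters: each nonzero input activation is multiplied against a window of output gradients and accumulated into the weight-gradient taps of its own channel. The minimal NumPy sketch below (illustrative only; `weight_grad_sparse`, the shapes, and the stride-1, unpadded layout are assumptions, not the LNPU datapath) shows how skipping zero inputs makes the WG work scale with the 37-61% nonzero fraction cited in the abstract.

```python
import numpy as np

def weight_grad_sparse(x_in, dout, kh, kw):
    """Weight-gradient generation (WG) for one stride-1, unpadded conv layer:
    dW[oc, ic, i, j] = sum_{oy, ox} dout[oc, oy, ox] * x_in[ic, oy + i, ox + j].
    Zero input activations are skipped, so the work scales with the number
    of nonzero inputs rather than the full activation volume."""
    ic, H, W = x_in.shape
    oc, OH, OW = dout.shape
    dW = np.zeros((oc, ic, kh, kw))
    # Visit only nonzero input operands (zero-skipping).
    for c, y, x in zip(*np.nonzero(x_in)):
        v = x_in[c, y, x]
        # One input pixel feeds every kernel tap (i, j) whose receptive
        # field covers it; the running sums per tap are the intra-channel
        # accumulation that inference-only sparse datapaths lack.
        for i in range(kh):
            for j in range(kw):
                oy, ox = y - i, x - j
                if 0 <= oy < OH and 0 <= ox < OW:
                    dW[:, c, i, j] += v * dout[:, oy, ox]
    return dW

# Example: a ReLU-sparse 3-channel input and a 4-channel output gradient.
x_in = np.maximum(np.random.randn(3, 8, 8), 0)   # roughly half zeros after ReLU
dout = np.random.randn(4, 6, 6)                  # OH = OW = 8 - 3 + 1
dW = weight_grad_sparse(x_in, dout, kh=3, kw=3)  # shape (4, 3, 3, 3)
```

Inference accelerators [3-5] instead accumulate across input channels into a single output pixel (inter-channel accumulation), which is why, per the abstract, their sparse datapaths do not cover this loop structure.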
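The FP8-versus-FP16 point can be illustrated with a software rounding model. The rough sketch below assumes a 1/4/3 sign/exponent/mantissa FP8 layout and a hypothetical error threshold; the paper's exact FP8 format and the hardware's promotion rule are not specified here. It rounds operands to 3 mantissa bits and flags the minority that would need FP16, which is the intuition behind fine-grained mixed precision.

```python
import numpy as np

def to_fp8(x, exp_bits=4, man_bits=3):
    """Round to an assumed 1/4/3 sign/exponent/mantissa FP8 format
    (sketch: saturates the exponent range, flushes tiny values to zero)."""
    x = np.asarray(x, dtype=np.float64)
    out = np.zeros_like(x)
    nz = x != 0
    mag = np.abs(x[nz])
    e = np.floor(np.log2(mag))                 # per-element exponent
    bias = 2 ** (exp_bits - 1) - 1
    e = np.clip(e, 1 - bias, bias)             # normal-number exponent range
    step = 2.0 ** (e - man_bits)               # quantization step at that exponent
    out[nz] = np.sign(x[nz]) * np.round(mag / step) * step
    return out

# Fine-grained mixed precision: operands whose FP8 rounding error is small
# stay in FP8 (cheap MACs); the rest are promoted to FP16.
vals = np.random.randn(1000)
q = to_fp8(vals)
rel_err = np.abs(q - vals) / np.abs(vals)
fp8_ok = rel_err < 2.0 ** -4                   # illustrative threshold
print(f"{fp8_ok.mean():.0%} of operands representable in FP8 at this tolerance")
```

Since the abstract puts FP8 at roughly 4x lower energy than FP16, keeping the majority of operands in FP8 and promoting only the flagged ones is what makes the mixed-precision datapath pay off.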