7.7 LNPU: A 25.3TFLOPS/W Sparse Deep-Neural-Network Learning Processor with Fine-Grained Mixed Precision of FP8-FP16

Bibliographic Details
Published in: Digest of Technical Papers - IEEE International Solid-State Circuits Conference, pp. 142-144
Main Authors: Lee, Jinsu; Lee, Juhyoung; Han, Donghyeon; Lee, Jinmook; Park, Gwangtae; Yoo, Hoi-Jun (all KAIST, Daejeon, Korea)
Format: Conference Proceeding
Language: English
Published: IEEE, 01.02.2019
Subjects: Acceleration; Deep learning; Neural networks; Registers; Throughput; Training
ISSN: 2376-8606
EISBN: 9781538685310
DOI: 10.1109/ISSCC.2019.8662302
Online Access: https://ieeexplore.ieee.org/document/8662302

Abstract: Recently, deep neural network (DNN) hardware accelerators have been reported for energy-efficient deep-learning (DL) acceleration [1-6]. Most prior DNN inference accelerators run networks trained in the cloud on public datasets; the parameters are then downloaded to the device to implement AI [1-5]. However, local DNN learning with domain-specific and private data is required to meet the preferences of individual users on edge or mobile devices. Since edge and mobile devices offer only limited computation capability and run on battery power, an energy-efficient DNN learning processor is necessary. Only [6] supported on-chip DNN learning, but it was not energy-efficient, as it did not exploit sparsity, which accounts for 37%-61% of the inputs of various CNNs such as VGG16, AlexNet, and ResNet-18, as shown in Fig. 7.7.1. Although [3-5] exploited sparsity, they considered only the inference phase with inter-channel accumulation (Fig. 7.7.1) and did not support the intra-channel accumulation needed for the weight-gradient generation (WG) step of the learning phase. Also, [6] adopted FP16, which is not energy-optimal, because FP8 suffices for many input operands at roughly 4x lower energy than FP16.
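The WG step is the part of backpropagation where intra-channel accumulation matters: each nonzero input activation is multiplied against a window of output gradients and accumulated into the weight-gradient taps of its own channel. The minimal NumPy sketch below (illustrative only; `weight_grad_sparse`, the shapes, and the stride-1, unpadded layout are assumptions, not the LNPU datapath) shows how skipping zero inputs makes the WG work scale with the 37-61% nonzero fraction cited in the abstract.

```python
import numpy as np

def weight_grad_sparse(x_in, dout, kh, kw):
    """Weight-gradient generation (WG) for one stride-1, unpadded conv layer:
    dW[oc, ic, i, j] = sum_{oy, ox} dout[oc, oy, ox] * x_in[ic, oy + i, ox + j].
    Zero input activations are skipped, so the work scales with the number
    of nonzero inputs rather than the full activation volume."""
    ic, H, W = x_in.shape
    oc, OH, OW = dout.shape
    dW = np.zeros((oc, ic, kh, kw))
    # Visit only nonzero input operands (zero-skipping).
    for c, y, x in zip(*np.nonzero(x_in)):
        v = x_in[c, y, x]
        # One input pixel feeds every kernel tap (i, j) whose receptive
        # field covers it; the running sums per tap are the intra-channel
        # accumulation that inference-only sparse datapaths lack.
        for i in range(kh):
            for j in range(kw):
                oy, ox = y - i, x - j
                if 0 <= oy < OH and 0 <= ox < OW:
                    dW[:, c, i, j] += v * dout[:, oy, ox]
    return dW

# Example: a ReLU-sparse 3-channel input and a 4-channel output gradient.
x_in = np.maximum(np.random.randn(3, 8, 8), 0)   # roughly half zeros after ReLU
dout = np.random.randn(4, 6, 6)                  # OH = OW = 8 - 3 + 1
dW = weight_grad_sparse(x_in, dout, kh=3, kw=3)  # shape (4, 3, 3, 3)
```

Inference accelerators [3-5] instead accumulate across input channels into a single output pixel (inter-channel accumulation), which is why, per the abstract, their sparse datapaths do not cover this loop structure.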
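The FP8-versus-FP16 point can be illustrated with a software rounding model. The rough sketch below assumes a 1/4/3 sign/exponent/mantissa FP8 layout and a hypothetical error threshold; the paper's exact FP8 format and the hardware's promotion rule are not specified here. It rounds operands to 3 mantissa bits and flags the minority that would need FP16, which is the intuition behind fine-grained mixed precision.

```python
import numpy as np

def to_fp8(x, exp_bits=4, man_bits=3):
    """Round to an assumed 1/4/3 sign/exponent/mantissa FP8 format
    (sketch: saturates the exponent range, flushes tiny values to zero)."""
    x = np.asarray(x, dtype=np.float64)
    out = np.zeros_like(x)
    nz = x != 0
    mag = np.abs(x[nz])
    e = np.floor(np.log2(mag))                 # per-element exponent
    bias = 2 ** (exp_bits - 1) - 1
    e = np.clip(e, 1 - bias, bias)             # normal-number exponent range
    step = 2.0 ** (e - man_bits)               # quantization step at that exponent
    out[nz] = np.sign(x[nz]) * np.round(mag / step) * step
    return out

# Fine-grained mixed precision: operands whose FP8 rounding error is small
# stay in FP8 (cheap MACs); the rest are promoted to FP16.
vals = np.random.randn(1000)
q = to_fp8(vals)
rel_err = np.abs(q - vals) / np.abs(vals)
fp8_ok = rel_err < 2.0 ** -4                   # illustrative threshold
print(f"{fp8_ok.mean():.0%} of operands representable in FP8 at this tolerance")
```

Since the abstract puts FP8 at roughly 4x lower energy than FP16, keeping the majority of operands in FP8 and promoting only the flagged ones is what makes the mixed-precision datapath pay off.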