7.7 LNPU: A 25.3TFLOPS/W Sparse Deep-Neural-Network Learning Processor with Fine-Grained Mixed Precision of FP8-FP16
| Published in | Digest of Technical Papers - IEEE International Solid-State Circuits Conference (ISSCC), pp. 142-144 |
|---|---|
| Main Authors | Lee, Jinsu; Lee, Juhyoung; Han, Donghyeon; Lee, Jinmook; Park, Gwangtae; Yoo, Hoi-Jun |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.02.2019 |
| Subjects | Acceleration; Deep learning; Neural networks; Registers; Throughput; Training |
| Online Access | https://ieeexplore.ieee.org/document/8662302 |
| ISSN | 2376-8606 |
| DOI | 10.1109/ISSCC.2019.8662302 |
| Abstract | Recently, deep neural network (DNN) hardware accelerators have been reported for energy-efficient deep learning (DL) acceleration [1-6]. Most prior DNN inference accelerators are trained in the cloud using public datasets; parameters are then downloaded to implement AI [1-5]. However, local DNN learning with domain-specific and private data is required to meet various user preferences on edge or mobile devices. Since edge and mobile devices have only limited computation capability and run on battery power, an energy-efficient DNN learning processor is necessary. Only [6] supported on-chip DNN learning, but it was not energy-efficient, as it did not utilize sparsity, which represents 37%-61% of the inputs for various CNNs, such as VGG16, AlexNet and ResNet-18, as shown in Fig. 7.7.1. Although [3-5] utilized sparsity, they only considered the inference phase with inter-channel accumulation in Fig. 7.7.1, and did not support intra-channel accumulation for the weight-gradient generation (WG) step of the learning phase. Also, [6] adopted FP16, which is not energy optimal because FP8 suffices for many input operands at 4× less energy than FP16. |
    
|---|---|
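The abstract's two efficiency ideas - skipping zero operands entirely, and using FP8 per operand where it suffices while falling back to FP16 otherwise - can be illustrated with a minimal Python sketch. This is a hypothetical model, not LNPU's implementation; the `tol` error threshold and the 1-4-3 FP8 layout are assumptions for illustration only.

```python
import math

def quantize_fp8(x, exp_bits=4, man_bits=3):
    """Crude FP8 (sign + 4-bit exponent + 3-bit mantissa) round-trip model:
    clamp the exponent and truncate the mantissa. Hypothetical illustration,
    not LNPU's actual number format."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    e = math.floor(math.log2(abs(x)))
    bias = 2 ** (exp_bits - 1) - 1
    e = max(1 - bias, min(bias, e))                    # clamp exponent range
    m = abs(x) / 2.0 ** e                              # mantissa in [1, 2)
    m = math.floor(m * 2 ** man_bits) / 2 ** man_bits  # truncate mantissa bits
    return sign * m * 2.0 ** e

def sparse_mixed_mac(activations, weights, tol=0.05):
    """Dot product that (a) skips zero activations entirely (sparsity) and
    (b) uses the FP8 value of an operand when its quantization error is
    within `tol`, falling back to the full-precision (FP16-like) value
    otherwise -- a per-operand, i.e. fine-grained, precision choice."""
    acc = 0.0
    for a, w in zip(activations, weights):
        if a == 0.0:
            continue                      # zero-skipping: no MAC issued
        a8 = quantize_fp8(a)
        a_used = a8 if abs(a8 - a) <= tol * abs(a) else a
        acc += a_used * w
    return acc
```

In hardware the zero-skip saves both the multiply and the accumulate; here the `continue` models that, and powers of two pass through `quantize_fp8` losslessly.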
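The abstract's inter- vs intra-channel distinction can also be sketched: inference accumulates across input channels for each output pixel, while the weight-gradient generation (WG) step accumulates across spatial positions within a single channel. Below is a simplified 1×1-convolution model in plain Python; the shapes and function names are illustrative assumptions, not the paper's dataflow.

```python
def conv_inference(acts, weights):
    """Inference with inter-channel accumulation: each output pixel sums
    activation * weight products across the C input channels.
    acts: [C][H][W] activations; weights: [C] (1x1 conv for brevity)."""
    C, H, W = len(acts), len(acts[0]), len(acts[0][0])
    return [[sum(acts[c][y][x] * weights[c] for c in range(C))
             for x in range(W)] for y in range(H)]

def conv_weight_grad(acts, out_grad):
    """WG step with intra-channel accumulation: each weight's gradient sums
    activation * output-gradient products across all (H, W) positions
    within its own channel. out_grad: [H][W]."""
    C, H, W = len(acts), len(acts[0]), len(acts[0][0])
    return [sum(acts[c][y][x] * out_grad[y][x]
                for y in range(H) for x in range(W)) for c in range(C)]
```

The loops make the hardware implication visible: inference reduces over `c` per pixel, WG reduces over `(y, x)` per channel, so an accelerator datapath built only for the first reduction order cannot directly serve the second.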
| Author | Lee, Jinsu; Lee, Juhyoung; Han, Donghyeon; Lee, Jinmook; Park, Gwangtae; Yoo, Hoi-Jun (all KAIST, Daejeon, Korea) |
| Discipline | Engineering | 
    
| EISBN | 9781538685310 1538685310  | 
    
| Genre | orig-research | 
    
| SubjectTerms | Acceleration; Deep learning; Neural networks; Registers; Throughput; Training |
| URI | https://ieeexplore.ieee.org/document/8662302 | 
    