7.1 An 11.5TOPS/W 1024-MAC Butterfly Structure Dual-Core Sparsity-Aware Neural Processing Unit in 8nm Flagship Mobile SoC

Bibliographic Details
Published in: Digest of technical papers - IEEE International Solid-State Circuits Conference, pp. 130 - 132
Main Authors: Song, Jinook; Cho, Yunkyo; Park, Jun-Seok; Jang, Jun-Woo; Lee, Sehwan; Song, Joon-Ho; Lee, Jae-Gon; Kang, Inyup
Format: Conference Proceeding
Language: English
Published: IEEE, 01.02.2019
ISSN: 2376-8606
DOI: 10.1109/ISSCC.2019.8662476

Abstract: Deep learning has been widely applied for image and speech recognition. Response time, connectivity, privacy and security drive applications towards mobile platforms rather than cloud. For mobile systems-on-a-chip (SoCs), energy-efficient neural processing units (NPU) have been studied for performing the convolutional layers (CLs) and fully-connected layers (FCLs) [2-5] in deep neural networks. Moreover, considering that neural networks are getting deeper, the NPU needs to integrate 1K or even more multiply/accumulate (MAC) units. For energy efficiency, compression of neural networks has been studied by pruning neural connections and quantizing weights and features with 8b or even lower fixed-point precision without accuracy loss [1]. A hardware accelerator exploited network sparsity for high utilization of MAC units [3]. However, since it is challenging to predict where pruning is possible, the accelerator needed complex circuitry for selecting an array of features corresponding to an array of non-zero weights. For reducing the power of MAC operations, bit-serial multipliers have been applied [5]. Generally, extremely low- or variable-bit-precision neural networks need to be carefully trained.
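The abstract centers on two ideas: quantizing weights and features to 8b (or lower) fixed point, and exploiting sparsity so that multiplications by pruned, zero-valued weights are skipped rather than wasted. The short Python sketch below models these ideas purely in software as an illustration; the function names, the symmetric scale, and the toy vectors are assumptions for this example and do not reflect the paper's butterfly-structure hardware.

    import numpy as np

    def quantize_8b(x, scale):
        # Round to the nearest step and clamp to the signed 8-bit range [-128, 127].
        # (Illustrative helper; not taken from the paper.)
        return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

    def sparse_mac(weights_q, features_q):
        # Accumulate in 32 bits, skipping every product whose weight is zero
        # (a pruned connection) - the software analogue of a sparsity-aware
        # datapath keeping its multipliers busy with useful products only.
        acc = np.int32(0)
        for w, f in zip(weights_q, features_q):
            if w != 0:
                acc += np.int32(w) * np.int32(f)
        return acc

    # Toy example: a pruned 8-element weight vector and an 8b feature vector.
    w = quantize_8b(np.array([0.5, 0.0, -0.25, 0.0, 0.0, 0.75, 0.0, -0.5]), scale=1 / 64)
    f = quantize_8b(np.array([0.1, 0.9, 0.3, 0.2, 0.4, 0.6, 0.8, 0.7]), scale=1 / 64)
    print(sparse_mac(w, f))  # only 4 of the 8 products are actually computed

In hardware, the same zero-skipping idea is what lets a sparsity-aware design raise the utilization of its MAC array, as the abstract notes for [3].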
Authors and Affiliations:
– Jinook Song, Samsung Electronics, Hwaseong, Korea
– Yunkyo Cho, Samsung Electronics, Hwaseong, Korea
– Jun-Seok Park, Samsung Electronics, Hwaseong, Korea
– Jun-Woo Jang, Samsung Advanced Institute of Technology, Suwon, Korea
– Sehwan Lee, Samsung Advanced Institute of Technology, Suwon, Korea
– Joon-Ho Song, Samsung Advanced Institute of Technology, Suwon, Korea
– Jae-Gon Lee, Samsung Electronics, Hwaseong, Korea
– Inyup Kang, Samsung Electronics, Hwaseong, Korea
Discipline Engineering
EISBN 9781538685310, 1538685310
EISSN 2376-8606
EndPage 132
ExternalDocumentID 8662476
Genre orig-research
PageCount 3
PublicationDate 2019-Feb.
PublicationTitle Digest of technical papers - IEEE International Solid-State Circuits Conference
PublicationTitleAbbrev ISSCC
PublicationYear 2019
Publisher IEEE
StartPage 130
SubjectTerms Bandwidth
Central Processing Unit
Clocks
Kernel
Neural networks
Parallel processing
Semiconductor device measurement
URI https://ieeexplore.ieee.org/document/8662476