7.1 An 11.5TOPS/W 1024-MAC Butterfly Structure Dual-Core Sparsity-Aware Neural Processing Unit in 8nm Flagship Mobile SoC

Bibliographic Details
Published in: Digest of technical papers - IEEE International Solid-State Circuits Conference, pp. 130 - 132
Main Authors: Song, Jinook; Cho, Yunkyo; Park, Jun-Seok; Jang, Jun-Woo; Lee, Sehwan; Song, Joon-Ho; Lee, Jae-Gon; Kang, Inyup
Format: Conference Proceeding
Language: English
Published: IEEE, 01.02.2019
ISSN: 2376-8606
DOI: 10.1109/ISSCC.2019.8662476

Abstract: Deep learning has been widely applied for image and speech recognition. Response time, connectivity, privacy and security drive applications towards mobile platforms rather than cloud. For mobile systems-on-a-chip (SoCs), energy-efficient neural processing units (NPU) have been studied for performing the convolutional layers (CLs) and fully-connected layers (FCLs) [2-5] in deep neural networks. Moreover, considering that neural networks are getting deeper, the NPU needs to integrate 1K or even more multiply/accumulate (MAC) units. For energy efficiency, compression of neural networks has been studied by pruning neural connections and quantizing weights and features with 8b or even lower fixed-point precision without accuracy loss [1]. A hardware accelerator exploited network sparsity for high utilization of MAC units [3]. However, since it is challenging to predict where pruning is possible, the accelerator needed complex circuitry for selecting an array of features corresponding to an array of non-zero weights. For reducing the power of MAC operations, bit-serial multipliers have been applied [5]. Generally, extremely low- or variable-bit-precision neural networks need to be carefully trained.
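The abstract centers on two ideas: quantizing weights and features to 8b (or lower) fixed point, and exploiting sparsity so that multiplications by pruned, zero-valued weights are skipped rather than wasted. The short Python sketch below models these ideas purely in software as an illustration; the function names, the symmetric scale, and the toy vectors are assumptions for this example and do not reflect the paper's butterfly-structure hardware.

    import numpy as np

    def quantize_8b(x, scale):
        # Round to the nearest step and clamp to the signed 8-bit range [-128, 127].
        # (Illustrative helper; not taken from the paper.)
        return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

    def sparse_mac(weights_q, features_q):
        # Accumulate in 32 bits, skipping every product whose weight is zero
        # (a pruned connection) - the software analogue of a sparsity-aware
        # datapath keeping its multipliers busy with useful products only.
        acc = np.int32(0)
        for w, f in zip(weights_q, features_q):
            if w != 0:
                acc += np.int32(w) * np.int32(f)
        return acc

    # Toy example: a pruned 8-element weight vector and an 8b feature vector.
    w = quantize_8b(np.array([0.5, 0.0, -0.25, 0.0, 0.0, 0.75, 0.0, -0.5]), scale=1 / 64)
    f = quantize_8b(np.array([0.1, 0.9, 0.3, 0.2, 0.4, 0.6, 0.8, 0.7]), scale=1 / 64)
    print(sparse_mac(w, f))  # only 4 of the 8 products are actually computed

In hardware, the same zero-skipping idea is what lets a sparsity-aware design raise the utilization of its MAC array, as the abstract notes for [3].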
Authors and Affiliations:
– Jinook Song, Samsung Electronics, Hwaseong, Korea
– Yunkyo Cho, Samsung Electronics, Hwaseong, Korea
– Jun-Seok Park, Samsung Electronics, Hwaseong, Korea
– Jun-Woo Jang, Samsung Advanced Institute of Technology, Suwon, Korea
– Sehwan Lee, Samsung Advanced Institute of Technology, Suwon, Korea
– Joon-Ho Song, Samsung Advanced Institute of Technology, Suwon, Korea
– Jae-Gon Lee, Samsung Electronics, Hwaseong, Korea
– Inyup Kang, Samsung Electronics, Hwaseong, Korea
Discipline Engineering
EISBN 9781538685310, 1538685310
EISSN 2376-8606
EndPage 132
ExternalDocumentID 8662476
Genre orig-research
PageCount 3
PublicationDate 2019-Feb.
PublicationTitle Digest of technical papers - IEEE International Solid-State Circuits Conference
PublicationTitleAbbrev ISSCC
PublicationYear 2019
Publisher IEEE
StartPage 130
SubjectTerms Bandwidth
Central Processing Unit
Clocks
Kernel
Neural networks
Parallel processing
Semiconductor device measurement
URI https://ieeexplore.ieee.org/document/8662476