7.1 An 11.5TOPS/W 1024-MAC Butterfly Structure Dual-Core Sparsity-Aware Neural Processing Unit in 8nm Flagship Mobile SoC
Published in | Digest of technical papers - IEEE International Solid-State Circuits Conference pp. 130 - 132 |
---|---|
Main Authors | Song, Jinook; Cho, Yunkyo; Park, Jun-Seok; Jang, Jun-Woo; Lee, Sehwan; Song, Joon-Ho; Lee, Jae-Gon; Kang, Inyup |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.02.2019 |
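Subjects | Bandwidth; Central Processing Unit; Clocks; Kernel; Neural networks; Parallel processing; Semiconductor device measurement |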
ISSN | 2376-8606 |
DOI | 10.1109/ISSCC.2019.8662476 |
Abstract | Deep learning has been widely applied for image and speech recognition. Response time, connectivity, privacy and security drive applications towards mobile platforms rather than the cloud. For mobile systems-on-a-chip (SoCs), energy-efficient neural processing units (NPUs) have been studied for performing the convolutional layers (CLs) and fully-connected layers (FCLs) [2-5] in deep neural networks. Moreover, considering that neural networks are getting deeper, the NPU needs to integrate 1K or even more multiply/accumulate (MAC) units. For energy efficiency, compression of neural networks has been studied by pruning neural connections and quantizing weights and features with 8b or even lower fixed-point precision without accuracy loss [1]. A hardware accelerator exploited network sparsity for high utilization of MAC units [3]. However, since it is challenging to predict where pruning is possible, the accelerator needed complex circuitry for selecting an array of features corresponding to an array of non-zero weights. For reducing the power of MAC operations, bit-serial multipliers have been applied [5]. Generally, extremely low- or variable-bit-precision neural networks need to be carefully trained. |
---|---|
Author | Lee, Sehwan; Song, Jinook; Cho, Yunkyo; Song, Joon-Ho; Lee, Jae-Gon; Kang, Inyup; Jang, Jun-Woo; Park, Jun-Seok |
Author_xml | 1. Song, Jinook (Samsung Electronics, Hwaseong, Korea); 2. Cho, Yunkyo (Samsung Electronics, Hwaseong, Korea); 3. Park, Jun-Seok (Samsung Electronics, Hwaseong, Korea); 4. Jang, Jun-Woo (Samsung Advanced Institute of Technology, Suwon, Korea); 5. Lee, Sehwan (Samsung Advanced Institute of Technology, Suwon, Korea); 6. Song, Joon-Ho (Samsung Advanced Institute of Technology, Suwon, Korea); 7. Lee, Jae-Gon (Samsung Electronics, Hwaseong, Korea); 8. Kang, Inyup (Samsung Electronics, Hwaseong, Korea) |
ContentType | Conference Proceeding |
DBID | 6IE 6IH CBEJK RIE RIO |
DOI | 10.1109/ISSCC.2019.8662476 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering |
EISBN | 9781538685310 1538685310 |
EISSN | 2376-8606 |
EndPage | 132 |
ExternalDocumentID | 8662476 |
Genre | orig-research |
IEDL.DBID | RIE |
IngestDate | Wed Aug 27 02:44:40 EDT 2025 |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
LinkModel | DirectLink |
PageCount | 3 |
ParticipantIDs | ieee_primary_8662476 |
PublicationCentury | 2000 |
PublicationDate | 2019-Feb. |
PublicationDateYYYYMMDD | 2019-02-01 |
PublicationDate_xml | – month: 02 year: 2019 text: 2019-Feb. |
PublicationDecade | 2010 |
PublicationTitle | Digest of technical papers - IEEE International Solid-State Circuits Conference |
PublicationTitleAbbrev | ISSCC |
PublicationYear | 2019 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0005968 |
Score | 2.478625 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 130 |
SubjectTerms | Bandwidth; Central Processing Unit; Clocks; Kernel; Neural networks; Parallel processing; Semiconductor device measurement |
Title | 7.1 An 11.5TOPS/W 1024-MAC Butterfly Structure Dual-Core Sparsity-Aware Neural Processing Unit in 8nm Flagship Mobile SoC |
URI | https://ieeexplore.ieee.org/document/8662476 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | IEEE |
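
The abstract mentions three compression ideas: pruning connections, quantizing weights and features to 8b fixed point, and selecting the features that correspond to non-zero weights so that MAC units are not spent on zeros. The following is a minimal sketch of those ideas in plain Python. It is not the NPU design described in the paper; every function name, the pruning threshold, and the scale factors are assumptions chosen only for illustration.

```python
# Illustrative sketch only: magnitude pruning, symmetric 8-bit quantization,
# and a dot product that skips zero weights. All names and constants are
# hypothetical and are not taken from the paper.

def prune(weights, threshold):
    """Magnitude pruning: zero out weights whose absolute value is below threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]

def quantize_int8(values, scale):
    """Map real values to signed 8-bit integers using one shared scale factor."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def compress(weights_q):
    """Keep only the non-zero weights, each paired with its original index."""
    return [(i, w) for i, w in enumerate(weights_q) if w != 0]

def sparse_dot(compressed_weights, features_q):
    """Multiply-accumulate over non-zero weights only; return (sum, MAC count)."""
    acc = 0
    macs = 0
    for i, w in compressed_weights:
        acc += w * features_q[i]  # pick the feature that matches this weight
        macs += 1
    return acc, macs

weights = [0.50, -0.02, 0.00, 0.25, 0.01, -0.75, 0.03, 0.10]
features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

w_q = quantize_int8(prune(weights, threshold=0.05), scale=0.01)
f_q = quantize_int8(features, scale=0.1)

nonzero = compress(w_q)
result, macs_used = sparse_dot(nonzero, f_q)

# Dense reference: one MAC per weight, including the zeros.
dense = sum(w * f for w, f in zip(w_q, f_q))

print(result == dense, macs_used, len(w_q))
```

In this toy input, four of the eight weights survive pruning, so the sparse path performs half as many MACs as the dense path while producing the same integer sum. The index lookup in `sparse_dot` is the software analogue of the feature-selection step that the abstract describes as requiring complex circuitry in hardware.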