7.5 A 65nm 0.39-to-140.3TOPS/W 1-to-12b Unified Neural Network Processor Using Block-Circulant-Enabled Transpose-Domain Acceleration with 8.1× Higher TOPS/mm² and 6T HBST-TRAM-Based 2D Data-Reuse Architecture

Bibliographic Details
Published in: Digest of Technical Papers - IEEE International Solid-State Circuits Conference, pp. 138-140
Main Authors: Yue, Jinshan; Liu, Ruoyang; Sun, Wenyu; Yuan, Zhe; Wang, Zhibo; Tu, Yung-Ning; Chen, Yi-Ju; Ren, Ao; Wang, Yanzhi; Chang, Meng-Fan; Li, Xueqing; Yang, Huazhong; Liu, Yongpan
Format: Conference Proceeding
Language: English
Published: IEEE, 01.02.2019
ISSN: 2376-8606
DOI: 10.1109/ISSCC.2019.8662360


More Information
Summary: Energy-efficient neural-network (NN) processors have been proposed for battery-powered deep-learning applications, where convolutional (CNN), fully-connected (FC) and recurrent NNs (RNN) are three major workloads. To support all of them, previous solutions [1-3] use either area-inefficient heterogeneous architectures, including CNN and RNN cores, or an energy-inefficient reconfigurable architecture. A block-circulant algorithm [4] can unify CNN/FC/RNN workloads with transpose-domain acceleration, as shown in Fig. 7.5.1. Once NN weights are trained using the block-circulant pattern, all workloads are transformed into consistent matrix-vector multiplications (MVM), which can potentially achieve 8-to-128× storage savings and an O(n²)-to-O(n log n) computation complexity reduction.
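As a worked illustration of the block-circulant MVM described in the summary, the sketch below multiplies an input vector by a weight matrix partitioned into b×b circulant blocks, where each block is stored as a single length-b defining vector (its first column) and applied in the frequency domain via the FFT, giving O(b log b) work and b-fold storage compression per block. This is only a minimal NumPy sketch of the algorithmic idea attributed to [4]; the function name block_circulant_mvm, the array shapes, and the first-column convention are illustrative assumptions, and nothing here models the paper's transpose-domain hardware or 6T HBST-TRAM data-reuse scheme.

```python
import numpy as np

def block_circulant_mvm(w_blocks, x):
    """Hypothetical sketch: y = W @ x, with W block-circulant.

    w_blocks : (p, q, b) array; w_blocks[i, j] is the first column of the
               b-by-b circulant block at block-row i, block-column j, so W
               has p*b rows and q*b columns but only p*q*b stored weights.
    x        : input vector of length q*b.
    """
    p, q, b = w_blocks.shape
    X = np.fft.fft(x.reshape(q, b), axis=-1)   # FFT of each input block
    W = np.fft.fft(w_blocks, axis=-1)          # FFT of each defining vector
    # A circulant-block multiply is an element-wise product in the frequency
    # domain; accumulate over the q input blocks, then transform back.
    Y = (W * X[np.newaxis, :, :]).sum(axis=1)  # (p, b)
    return np.real(np.fft.ifft(Y, axis=-1)).reshape(p * b)

# Usage: check against the explicitly expanded block-circulant matrix.
rng = np.random.default_rng(0)
p, q, b = 2, 3, 4
w_blocks = rng.standard_normal((p, q, b))
x = rng.standard_normal(q * b)
W_dense = np.block(
    [[np.column_stack([np.roll(w_blocks[i, j], k) for k in range(b)])
      for j in range(q)] for i in range(p)])
assert np.allclose(block_circulant_mvm(w_blocks, x), W_dense @ x)
```

In this illustration each b×b block holds b parameters instead of b², which is the source of the storage-compression range quoted in the summary (larger block sizes give larger compression, at the cost of a more constrained weight structure).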