7.5 A 65nm 0.39-to-140.3TOPS/W 1-to-12b Unified Neural Network Processor Using Block-Circulant-Enabled Transpose-Domain Acceleration with 8.1× Higher TOPS/mm² and 6T HBST-TRAM-Based 2D Data-Reuse Architecture

Bibliographic Details
Published in: Digest of Technical Papers - IEEE International Solid-State Circuits Conference, pp. 138-140
Main Authors: Yue, Jinshan; Liu, Ruoyang; Sun, Wenyu; Yuan, Zhe; Wang, Zhibo; Tu, Yung-Ning; Chen, Yi-Ju; Ren, Ao; Wang, Yanzhi; Chang, Meng-Fan; Li, Xueqing; Yang, Huazhong; Liu, Yongpan
Format: Conference Proceeding
Language: English
Published: IEEE, 01.02.2019
ISSN: 2376-8606
DOI: 10.1109/ISSCC.2019.8662360


More Information
Summary: Energy-efficient neural-network (NN) processors have been proposed for battery-powered deep-learning applications, where convolutional (CNN), fully-connected (FC) and recurrent NNs (RNN) are three major workloads. To support all of them, previous solutions [1-3] use either area-inefficient heterogeneous architectures, including CNN and RNN cores, or an energy-inefficient reconfigurable architecture. A block-circulant algorithm [4] can unify CNN/FC/RNN workloads with transpose-domain acceleration, as shown in Fig. 7.5.1. Once NN weights are trained using the block-circulant pattern, all workloads are transformed into consistent matrix-vector multiplications (MVM), which can potentially achieve 8-to-128× storage savings and an O(n²)-to-O(n log n) computation complexity reduction.
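As a worked illustration of the block-circulant MVM described in the summary, the sketch below multiplies an input vector by a weight matrix partitioned into b×b circulant blocks, where each block is stored as a single length-b defining vector (its first column) and applied in the frequency domain via the FFT, giving O(b log b) work and b-fold storage compression per block. This is only a minimal NumPy sketch of the algorithmic idea attributed to [4]; the function name block_circulant_mvm, the array shapes, and the first-column convention are illustrative assumptions, and nothing here models the paper's transpose-domain hardware or 6T HBST-TRAM data-reuse scheme.

```python
import numpy as np

def block_circulant_mvm(w_blocks, x):
    """Hypothetical sketch: y = W @ x, with W block-circulant.

    w_blocks : (p, q, b) array; w_blocks[i, j] is the first column of the
               b-by-b circulant block at block-row i, block-column j, so W
               has p*b rows and q*b columns but only p*q*b stored weights.
    x        : input vector of length q*b.
    """
    p, q, b = w_blocks.shape
    X = np.fft.fft(x.reshape(q, b), axis=-1)   # FFT of each input block
    W = np.fft.fft(w_blocks, axis=-1)          # FFT of each defining vector
    # A circulant-block multiply is an element-wise product in the frequency
    # domain; accumulate over the q input blocks, then transform back.
    Y = (W * X[np.newaxis, :, :]).sum(axis=1)  # (p, b)
    return np.real(np.fft.ifft(Y, axis=-1)).reshape(p * b)

# Usage: check against the explicitly expanded block-circulant matrix.
rng = np.random.default_rng(0)
p, q, b = 2, 3, 4
w_blocks = rng.standard_normal((p, q, b))
x = rng.standard_normal(q * b)
W_dense = np.block(
    [[np.column_stack([np.roll(w_blocks[i, j], k) for k in range(b)])
      for j in range(q)] for i in range(p)])
assert np.allclose(block_circulant_mvm(w_blocks, x), W_dense @ x)
```

In this illustration each b×b block holds b parameters instead of b², which is the source of the storage-compression range quoted in the summary (larger block sizes give larger compression, at the cost of a more constrained weight structure).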