7.5 A 65nm 0.39-to-140.3TOPS/W 1-to-12b Unified Neural Network Processor Using Block-Circulant-Enabled Transpose-Domain Acceleration with 8.1× Higher TOPS/mm² and 6T HBST-TRAM-Based 2D Data-Reuse Architecture
Energy-efficient neural-network (NN) processors have been proposed for battery-powered deep-learning applications, where convolutional (CNN), fully-connected (FC) and recurrent NNs (RNN) are the three major workloads. To support all of them, previous solutions [1-3] use either area-inefficient heterogeneous architectures, including separate CNN and RNN cores, or an energy-inefficient reconfigurable architecture. A block-circulant algorithm [4] can unify CNN/FC/RNN workloads with transpose-domain acceleration, as shown in Fig. 7.5.1. Once NN weights are trained using the block-circulant pattern, all workloads are transformed into consistent matrix-vector multiplications (MVM), which can potentially achieve 8-to-128× storage savings and an O(n²)-to-O(n log n) computation-complexity reduction.
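For illustration, the unified MVM described above can be sketched in NumPy: a block-circulant weight matrix is stored as one defining vector per b×b block, and each block's product is computed with FFTs, which is where the per-block storage saving and the O(n²)-to-O(n log n) reduction come from. The function names, shapes, and block size below are assumptions made for this sketch, not details taken from the paper or the processor's dataflow.

```python
import numpy as np

def block_circulant_mvm(weight_vecs, x):
    """Block-circulant MVM via FFT (illustrative sketch).

    weight_vecs : (p, q, b) array; entry [i, j] is the defining (first-column)
                  vector of the b x b circulant block in block-row i, block-column j.
                  Storage is p*q*b values instead of p*q*b*b for a dense matrix.
    x           : input vector of length q*b.
    Returns the output vector of length p*b.
    """
    p, q, b = weight_vecs.shape
    x_blocks = x.reshape(q, b)
    # Each circulant-block product is a circular convolution, so it costs
    # O(b log b) with FFTs instead of O(b^2) with a dense matvec.
    Wf = np.fft.fft(weight_vecs, axis=-1)            # (p, q, b)
    xf = np.fft.fft(x_blocks, axis=-1)               # (q, b)
    yf = (Wf * xf[np.newaxis, :, :]).sum(axis=1)     # accumulate over block-columns
    return np.fft.ifft(yf, axis=-1).real.reshape(p * b)

def expand_to_dense(weight_vecs):
    """Reference: expand the compressed representation to the full dense matrix."""
    p, q, b = weight_vecs.shape
    W = np.zeros((p * b, q * b))
    for i in range(p):
        for j in range(q):
            for k in range(b):
                # Column k of a circulant block is its defining vector rolled by k.
                W[i*b:(i+1)*b, j*b + k] = np.roll(weight_vecs[i, j], k)
    return W

rng = np.random.default_rng(0)
weight_vecs = rng.standard_normal((2, 3, 8))          # 2x3 grid of 8x8 circulant blocks
x = rng.standard_normal(3 * 8)
assert np.allclose(block_circulant_mvm(weight_vecs, x), expand_to_dense(weight_vecs) @ x)
```

With the assumed block size b = 8, the compressed representation holds 2·3·8 = 48 values versus 16·24 = 384 for the dense matrix, an 8× saving consistent with the lower end of the 8-to-128× range quoted above.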
| Published in | Digest of Technical Papers - IEEE International Solid-State Circuits Conference, pp. 138-140 |
|---|---|
| Main Authors | , , , , , , , , , , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.02.2019 |
| ISSN | 2376-8606 |
| DOI | 10.1109/ISSCC.2019.8662360 |