Accelerating recurrent neural networks in analytics servers: Comparison of FPGA, CPU, GPU, and ASIC

Bibliographic Details
Published in: International Conference on Field-Programmable Logic and Applications, pp. 1 - 4
Main Authors: Nurvitadhi, Eriko; Sim, Jaewoong; Sheffield, David; Mishra, Asit; Krishnan, Srivatsan; Marr, Debbie
Format: Conference Proceeding
Language: English
Published: EPFL, 01.08.2016
ISSN: 1946-1488
DOI: 10.1109/FPL.2016.7577314


Summary: Recurrent neural networks (RNNs) provide state-of-the-art accuracy for performing analytics on datasets with sequential structure (e.g., language modeling). This paper studies a state-of-the-art RNN variant, the Gated Recurrent Unit (GRU). We first propose a memoization optimization to avoid 3 of the 6 dense matrix-vector multiplications (SGEMVs) that constitute the majority of the computation in a GRU. We then study opportunities to accelerate the remaining SGEMVs using FPGAs, in comparison with a 14-nm ASIC, a GPU, and a multi-core CPU. Results show that the FPGA provides superior performance/Watt over the CPU and GPU because the FPGA's on-chip BRAMs, hard DSPs, and reconfigurable fabric allow it to efficiently extract fine-grained parallelism from the small/medium-sized matrices used by the GRU. Moreover, newer FPGAs with more DSPs, more on-chip BRAMs, and higher frequencies have the potential to narrow the FPGA-ASIC efficiency gap.
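For context, the six SGEMVs per timestep follow from the standard GRU formulation (the paper's exact notation may differ):

    z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)
    r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)
    \tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)
    h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t

That is, three input-side products W_* x_t and three recurrent products U_* h_{t-1} (with h_{t-1} gated by r_t in the candidate). A plausible reading of the memoization is that, in a language-model setting, x_t is drawn from a fixed vocabulary, so the three input-side products can be precomputed once per token and looked up at inference time, leaving only the three recurrent SGEMVs. A minimal sketch of that idea, with hypothetical names not taken from the paper:

    import numpy as np

    def precompute_input_products(W_z, W_r, W_h, embeddings):
        # Memoize the three input-side SGEMVs for every vocabulary token.
        # (Hypothetical illustration; the paper targets FPGA/ASIC hardware.)
        # W_z, W_r, W_h: (hidden, embed) input weight matrices.
        # embeddings:    (vocab, embed) token embedding table.
        # Returns a (vocab, 3, hidden) lookup table; each inference timestep
        # then needs only the three recurrent products U_* h_{t-1}.
        return np.stack([embeddings @ W_z.T,
                         embeddings @ W_r.T,
                         embeddings @ W_h.T], axis=1)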