A Low-Power Reconfigurable DNN Accelerator for Instruction-Extended RISC-V

Bibliographic Details
Published in	IPSJ Transactions on System and LSI Design Methodology, Vol. 17, pp. 55-66
Main Authors	Li, Dongju; Wang, Hansen; Isshiki, Tsuyoshi
Format	Journal Article
Language	English
Published	Tokyo: Information Processing Society of Japan, 01.01.2024; Japan Science and Technology Agency
Subjects	Accuracy; application-specific integrated circuit; Artificial neural networks; deep neural network; direct memory access; dynamic fixed-point; Face recognition; Field programmable gate arrays; Graphics processing units; hardware/software co-design; Image classification; Power management; Reconfiguration; RISC; RISC-V; Speech recognition
ISSN	1882-6687
DOI	10.2197/ipsjtsldm.17.55

Summary:	Deep neural networks (DNNs) find extensive applications across diverse domains, including speech recognition, face detection, and image classification. While the conventional approach relies on Graphics Processing Units (GPUs) for DNN implementation, it prioritizes speed at the expense of efficiency. In the pursuit of reduced power consumption and enhanced efficiency, we advocate for the adoption of application-specific hardware computing. This paper introduces a run-time reconfigurable DNN accelerator SoC (DNN-AS) architecture, integrated into the instruction-extended RISC-V platform. The application-specific extension instruction set is tailored to expedite high-frequency DNN operations. To optimize circuit structure, we have devised an 8-bit dynamic fixed-point (DFP) scheme within the DNN-AS. Furthermore, we conduct a comparative accuracy analysis between DFP and the PyTorch float implementation. Our results demonstrate that DNN-AS exhibits minimal accuracy loss, with Top-1 accuracy deviations of only up to 0.53%, 0.31%, and 0.68% for RESNET34, RESNET50, and RESNET101, respectively. Finally, we compare the overall simulated results with other platforms, revealing that our design achieves improvements in throughput per joule (GOP/J) ranging from 8.4x to 1897x compared to Field-Programmable Gate Arrays (FPGAs) and GPUs.
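
Note on the 8-bit dynamic fixed-point (DFP) scheme: this record does not describe how the paper's DFP format is defined, so the following is only a minimal sketch of one common formulation, in which all values of a tensor share a single fractional length chosen from the tensor's largest magnitude. The function names `dfp_quantize` and `dfp_dequantize`, the per-tensor granularity, and the round-to-nearest and saturation policy are assumptions for illustration, not the authors' implementation.

```python
import math

def dfp_quantize(values, bits=8):
    """Quantize floats to signed `bits`-bit integers that share one
    fractional length (assumed per-tensor dynamic fixed-point)."""
    qmax = 2 ** (bits - 1) - 1   # 127 for 8 bits
    qmin = -2 ** (bits - 1)      # -128 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * len(values), 0
    # Integer bits needed to hold the largest magnitude (sign bit excluded).
    int_bits = math.floor(math.log2(max_abs)) + 1
    # Remaining bits go to the fraction; may be negative for large values.
    frac_len = (bits - 1) - int_bits
    scale = 2.0 ** frac_len
    # Round to nearest, then saturate to the representable range.
    q = [min(qmax, max(qmin, round(v * scale))) for v in values]
    return q, frac_len

def dfp_dequantize(q, frac_len):
    """Map the integers back to floats using the shared fractional length."""
    return [x * 2.0 ** -frac_len for x in q]

if __name__ == "__main__":
    q, frac_len = dfp_quantize([0.5, -0.25, 0.1])
    print(q, frac_len)                   # [64, -32, 13] 7
    print(dfp_dequantize(q, frac_len))   # [0.5, -0.25, 0.1015625]
```

In this sketch a tensor with small values gets more fractional bits and a tensor with large values gets fewer, which is the sense in which the fixed-point position is "dynamic"; hardware can then use plain 8-bit integer arithmetic plus a shift per tensor.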