An FPGA-Based High-Throughput Dataflow Accelerator for Lightweight Neural Network
| Published in | IEEE International Symposium on Circuits and Systems proceedings pp. 1 - 5 |
|---|---|
| Main Authors | , , , , , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 19.05.2024 |
| Subjects | |
| ISSN | 2158-1525 |
| DOI | 10.1109/ISCAS58744.2024.10558315 |
| Summary: | Lightweight neural networks (LWNNs) have recently drawn significant attention for their compact architectures and acceptable accuracy. Despite substantial reductions in computational complexity and model size, the extensive use of depthwise separable convolutions (DSCs) and skip-connection blocks (SCBs) increases memory access demands, which makes it difficult to achieve the anticipated performance. To process LWNNs efficiently, an FPGA-based dataflow accelerator is proposed in this paper. First, a pixel-based streaming strategy is introduced to reduce off-chip memory access while minimizing on-chip memory overhead. Furthermore, an adaptive-bandwidth computing engine (CE) is designed to increase computational efficiency in a multi-CE architecture. Finally, based on the scalable CE, a dynamic parallelism allocation algorithm is proposed to avoid underutilization of on-chip computing resources. ShuffleNetV2 is implemented on the Xilinx ZC706 platform, and the results show the proposed accelerator achieves a state-of-the-art performance of 1771.2 FPS and a computational efficiency of 0.64 GOPS/DSP, which is 5.3× that of the reference design. |
|---|---|
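The summary's "dynamic parallelism allocation" can be illustrated in spirit with a minimal sketch: distributing a fixed DSP budget across per-layer compute engines in proportion to each layer's operation count. This is not the paper's algorithm; the function name, the largest-remainder rounding, and all parameters are assumptions for illustration only.

```python
def allocate_parallelism(layer_ops, total_dsp):
    """Hypothetical sketch: split a DSP budget across compute engines
    proportionally to each layer's operation count, using
    largest-remainder rounding so the whole budget is used."""
    total_ops = sum(layer_ops)
    # Initial proportional shares (integer floor division).
    shares = [total_dsp * ops // total_ops for ops in layer_ops]
    # Hand leftover DSPs to the layers with the largest remainders,
    # so busier layers get the extra parallelism first.
    leftover = total_dsp - sum(shares)
    by_remainder = sorted(range(len(layer_ops)),
                          key=lambda i: (layer_ops[i] * total_dsp) % total_ops,
                          reverse=True)
    for i in by_remainder[:max(0, leftover)]:
        shares[i] += 1
    return shares

# Example: 90 DSPs over layers with 100/200/700 operations
# yields shares proportional to the workload.
print(allocate_parallelism([100, 200, 700], 90))  # → [9, 18, 63]
```

A static proportional split like this is only a baseline; the paper's point is that allocation is driven by its scalable CE design to keep on-chip resources from sitting idle.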