Optimizing the Deep Neural Networks by Layer-Wise Refined Pruning and the Acceleration on FPGA

Bibliographic Details
Published in: Computational Intelligence and Neuroscience, Vol. 2022, pp. 1-22
Main Authors: Li, Hengyi; Yue, Xuebin; Wang, Zhichen; Chai, Zhilei; Wang, Wenwen; Tomiyama, Hiroyuki; Meng, Lin
Format: Journal Article
Language: English
Published: United States: Hindawi, 01.06.2022 (Hindawi Limited; John Wiley & Sons, Inc)
ISSN: 1687-5265, 1687-5273
DOI: 10.1155/2022/8039281

More Information
Summary: To accelerate practical applications of artificial intelligence, this paper proposes a highly efficient layer-wise refined pruning method for deep neural networks at the software level and accelerates the inference process at the hardware level on a field-programmable gate array (FPGA). The refined pruning operation is based on the channel-wise importance indices of each layer and the layer-wise input sparsity of the convolutional layers. The method exploits the characteristics of the native networks without adding any extra workload to the training phase, and it extends readily to various state-of-the-art deep neural networks. Its effectiveness is verified on the ResNet and VGG architectures with the CIFAR10, CIFAR100, and ImageNet100 datasets. Experimental results show that for ResNet50 on CIFAR10 and ResNet101 on CIFAR100, more than 85% of parameters and floating-point operations (FLOPs) are pruned with only 0.35% and 0.40% accuracy loss, respectively. For the VGG network, 87.05% of parameters and 75.78% of FLOPs are pruned with only 0.74% accuracy loss for VGG13BN on CIFAR10. Furthermore, the networks are accelerated at the hardware level on an FPGA platform using the Vitis AI toolchain. In two-thread mode on the FPGA, the pruned VGG13BN and ResNet101 achieve throughputs of 151.99 fps and 124.31 fps, respectively, corresponding to speedups of about 4.3x and 1.8x over the original networks on the FPGA.
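As a rough illustration of the pruning criterion described in the summary, the PyTorch sketch below scores each output channel of a convolutional layer by the L1 norm of its filter as a channel-wise importance index and ties the fraction of channels kept to the measured input sparsity of the layer. This is a minimal sketch under assumed choices: the L1 criterion, the sparsity-to-ratio mapping, and all function names are illustrative, not the authors' implementation.

    # Illustrative sketch of channel-wise importance scoring coupled with
    # layer-wise input sparsity; all names and thresholds are assumptions,
    # not the paper's actual implementation.
    import torch
    import torch.nn as nn

    def channel_importance(conv: nn.Conv2d) -> torch.Tensor:
        # L1 norm of each output filter as a channel-wise importance index.
        return conv.weight.detach().abs().sum(dim=(1, 2, 3))

    def input_sparsity(activations: torch.Tensor) -> float:
        # Fraction of zero activations feeding a conv layer (post-ReLU inputs).
        return (activations == 0).float().mean().item()

    def select_channels(conv: nn.Conv2d, keep_ratio: float) -> torch.Tensor:
        # Keep the indices of the top-k most important output channels.
        scores = channel_importance(conv)
        k = max(1, int(keep_ratio * scores.numel()))
        return torch.topk(scores, k).indices.sort().values

    # Example: score one layer of a small model and pick channels to keep.
    model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(64, 64, 3, padding=1))
    x = torch.randn(1, 3, 32, 32)
    feats = model[1](model[0](x))                  # post-ReLU activations
    ratio = 1.0 - min(0.9, input_sparsity(feats))  # sparser input -> prune more
    keep = select_channels(model[0], keep_ratio=ratio)
    print(f"keeping {keep.numel()}/64 channels of layer 0")

In practice the kept indices would be used to rebuild each conv layer (and the matching input channels of the next layer) before fine-tuning; the hardware-side deployment with Vitis AI is a separate quantize-and-compile step not shown here.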
Academic Editor: M. Hassaballah