A fast and memory saved GPU acceleration algorithm of convolutional neural networks for target detection

Bibliographic Details
Published in	Neurocomputing (Amsterdam) Vol. 230; pp. 48 - 59
Main Authors	Li, Shijie, Dou, Yong, Niu, Xin, Lv, Qi, Wang, Qiang
Format	Journal Article
Language	English
Published	Elsevier B.V. 22.03.2017
Subjects	Convolutional neural networks; GPU; Target detection
ISSN	0925-2312 (print); 1872-8286 (online)
DOI	10.1016/j.neucom.2016.11.046

More Information
Summary:	Target detection is a hard real-time task in video and image processing. It has recently been accomplished through the feedforward process of convolutional neural networks (CNNs), which is usually accelerated by general-purpose graphics processing units (GPUs). However, this task faces two challenges. First, the running speed still needs to be improved. Second, a deeper and larger CNN model may be desirable, but such a model may not be trainable because GPU memory is insufficient. In this paper, we present two scheduling algorithms that address these challenges and improve overall system performance. The first is an efficient image combination algorithm that accelerates the feedforward process of a CNN. The second is a light-memory-cost algorithm that trains an arbitrarily large CNN model on a GPU device with limited memory. We run our experiments on a GTX980 card and use a CNN model with 8 GB of parameters, which exceeds the global memory of the GPU. Compared with cuDNNv3, a speedup of 6.97x is obtained in the detection task.
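The summary does not say how the light-memory-cost algorithm works. As a rough illustration of the constraint it targets only (this is not the authors' algorithm), the sketch below greedily splits a model's layers into consecutive groups whose parameters each fit a device memory budget, so each group could in principle be copied to the GPU, used, and evicted in turn. The function name `partition_layers`, the eight 1 GiB layers, and the 3 GiB budget (a 4 GB card such as the GTX980, leaving headroom for activations) are all hypothetical.

```python
from typing import List

GIB = 1024 ** 3  # bytes per GiB


def partition_layers(layer_bytes: List[int], budget_bytes: int) -> List[List[int]]:
    """Hypothetical helper: greedily group consecutive layers so that the
    parameters of each group fit within budget_bytes.

    Returns groups of layer indices in forward order. Raises ValueError if a
    single layer is larger than the budget, since it could never be resident.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    used = 0
    for i, size in enumerate(layer_bytes):
        if size > budget_bytes:
            raise ValueError(f"layer {i} ({size} bytes) exceeds the budget")
        if used + size > budget_bytes:
            # Close the current group; it would overflow with this layer.
            groups.append(current)
            current, used = [], 0
        current.append(i)
        used += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Hypothetical 8 GiB model: eight layers of 1 GiB each.
    layers = [GIB] * 8
    print(partition_layers(layers, 3 * GIB))  # -> [[0, 1, 2], [3, 4, 5], [6, 7]]
```

A greedy pass over consecutive layers keeps the forward order intact, which matters if groups are streamed to the device one after another; a real scheduler would also have to budget activations, gradients, and transfer overlap.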