A Fast Algorithm for Convolutional Neural Networks Using Tile-based Fast Fourier Transforms


Bibliographic Details
Published in: Neural Processing Letters, Vol. 50, No. 2, pp. 1951–1967
Main Authors: Lin, Jinhua; Yao, Yu
Format: Journal Article
Language: English
Published: New York: Springer US (Springer Nature B.V.), 01.10.2019
ISSN: 1370-4621, 1573-773X
DOI: 10.1007/s11063-019-09981-z

More Information
Summary: State-of-the-art convolution algorithms accelerate the training of convolutional neural networks (CNNs) by decomposing convolutions in the time or Fourier domain; these decomposition implementations are designed for small filters or large inputs, respectively. We take both aspects into account, develop a novel decomposition strategy in the Fourier domain, and propose a conceptually useful algorithm for accelerating CNNs. We extend classical Fast Fourier Transform theory to meet the requirement of convolving large inputs with small filters in a faster manner. The tile-based decomposition strategy is introduced into Fourier transforms to yield a fast convolution algorithm. The algorithm, called tFFT, is simple to program, implementing tile-sized transformations in the Fourier domain to minimize convolution time for modern CNNs. tFFT reduces the arithmetic complexity of CNNs by over a factor of 3 compared to FFT-based convolution algorithms. We evaluate the performance of tFFT by implementing it on a set of state-of-the-art CNNs; the experiments show good results at batch sizes from 1 to 128.
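The general idea behind tile-based Fourier convolution, splitting a large input into small tiles, convolving each tile with the small filter in the frequency domain, and overlap-adding the partial results, can be sketched as below. This is a minimal illustration of the overlap-add technique under assumed names and interfaces, not the paper's actual tFFT implementation:

```python
import numpy as np

def tiled_fft_conv(signal, kernel, tile=8):
    """Full 2D linear convolution via tile-wise FFTs (overlap-add sketch).

    Each `tile x tile` block of `signal` is zero-padded to the linear
    convolution size, multiplied with the filter's FFT, and the inverse
    transforms are overlap-added into the output. The function name and
    tile size are illustrative assumptions.
    """
    kh, kw = kernel.shape
    out = np.zeros((signal.shape[0] + kh - 1, signal.shape[1] + kw - 1))
    # FFT size large enough for the linear convolution of one tile.
    fft_h, fft_w = tile + kh - 1, tile + kw - 1
    Fk = np.fft.rfft2(kernel, s=(fft_h, fft_w))  # filter transform, reused per tile
    for i in range(0, signal.shape[0], tile):
        for j in range(0, signal.shape[1], tile):
            block = signal[i:i + tile, j:j + tile]
            Fb = np.fft.rfft2(block, s=(fft_h, fft_w))
            part = np.fft.irfft2(Fb * Fk, s=(fft_h, fft_w))
            bh, bw = block.shape  # edge tiles may be smaller
            out[i:i + bh + kh - 1, j:j + bw + kw - 1] += part[:bh + kh - 1, :bw + kw - 1]
    return out
```

Because the filter is small, each per-tile FFT is only slightly larger than the tile itself, which is the regime the abstract targets (large inputs, small filters); the single whole-input FFT of a conventional FFT convolution is replaced by many small ones.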