FTA-GAN: A Computation-Efficient Accelerator for GANs With Fast Transformation Algorithm

Nowadays, generative adversarial network (GAN) is making continuous breakthroughs in many machine learning tasks. The popular GANs usually involve computation-intensive deconvolution operations, leading to limited real-time applications. Prior works have brought several accelerators for deconvolutio...

Full description

Saved in:
Bibliographic Details
Published inIEEE transaction on neural networks and learning systems Vol. 34; no. 6; pp. 2978 - 2992
Main Authors Mao, Wendong, Yang, Peixiang, Wang, Zhongfeng
Format Journal Article
LanguageEnglish
Published United States IEEE 01.06.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online AccessGet full text
ISSN2162-237X
2162-2388
2162-2388
DOI10.1109/TNNLS.2021.3110728

Cover

More Information
Summary:Nowadays, generative adversarial network (GAN) is making continuous breakthroughs in many machine learning tasks. The popular GANs usually involve computation-intensive deconvolution operations, leading to limited real-time applications. Prior works have brought several accelerators for deconvolution, but all of them suffer from severe problems, such as computation imbalance and large memory requirements. In this article, we first introduce a novel fast transformation algorithm (FTA) for deconvolution computation, which well solves the computation imbalance problem and removes the extra memory requirement for overlapped partial sums. Besides, it can reduce the computation complexity for various types of deconvolutions significantly. Based on FTA, we develop a fast computing core (FCC) and the corresponding computing array so that the deconvolution can be efficiently computed. We next optimize the dataflow and storage scheme to further reuse on-chip memory and improve the computation efficiency. Finally, we present a computation-efficient hardware architecture for GANs and validate it on several GAN benchmarks, such as deep convolutional GAN (DCGAN), energy-based GAN (EBGAN), and Wasserstein GAN (WGAN). The experimental results show that our design can reach 2211 GOPS under 185-MHz working frequency on Intel Stratix 10SX field-programmable gate array (FPGA) board with satisfactory visual results. In brief, the proposed design can achieve more than <inline-formula> <tex-math notation="LaTeX">2\times </tex-math></inline-formula> hardware efficiency improvement over previous designs, and it can reduce the storage requirement drastically.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:2162-237X
2162-2388
2162-2388
DOI:10.1109/TNNLS.2021.3110728