Data Reuse for Accelerated Approximate Warps

Bibliographic Details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 39, No. 12, pp. 4623-4634
Main Authors: Peroni, Daniel; Imani, Mohsen; Nejatollahi, Hamid; Dutt, Nikil; Rosing, Tajana
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2020
ISSN: 0278-0070, 1937-4151
DOI: 10.1109/TCAD.2020.2986128

Summary: Many data-driven applications, including computer vision, machine learning, speech recognition, and medical diagnostics, tolerate computation error. These applications are often accelerated on GPUs, but the performance improvements come at the cost of high energy usage. In this article, we present DRAAW, an approximate computing technique capable of accelerating GPGPU applications at the warp level. In GPUs, warps are groups of threads that are issued together across multiple cores. The slowest thread dictates the pace of the warp, so DRAAW identifies these bottlenecks and avoids them during approximation. We reduce computation costs by using an approximate lookup table that tracks recent operations and reuses their results to exploit temporal locality within applications. To improve neural network performance, we propose neuron-aware approximation, a technique that profiles operations within network layers and automatically configures DRAAW so that computations with more impact on output accuracy are subject to less approximation. We evaluate our design by placing DRAAW within each core of an Nvidia Kepler architecture Titan. DRAAW improves throughput by up to 2.8× and improves energy-delay product (EDP) by 5.6× for six GPGPU applications while maintaining less than 5% output error. We show that neuron-aware approximation accelerates the inference of six neural networks by 2.9× and improves EDP by 6.2× with less than 1% impact on prediction accuracy.
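The approximate-reuse idea described in the summary can be sketched in software. The snippet below is an illustrative model only, not the paper's hardware design: the table capacity, the operand-quantization width, and the FIFO eviction policy are all assumptions chosen for demonstration. The key point it shows is that quantizing operands before the lookup lets nearby inputs hit the same cached result, trading a small output error for skipped computation.

```python
# Illustrative sketch of approximate value reuse via a lookup table.
# Operands are quantized (low-order bits dropped) to form a key; if a
# recent computation with a matching quantized key exists, its stored
# result is reused instead of recomputing. Capacity, quantization
# width, and FIFO eviction are hypothetical parameters, not DRAAW's.
import math
from collections import OrderedDict

class ApproxLUT:
    def __init__(self, capacity=64, drop_bits=6):
        self.capacity = capacity      # max cached entries
        self.drop_bits = drop_bits    # low-order bits discarded per operand
        self.table = OrderedDict()    # quantized key -> cached result
        self.hits = 0
        self.misses = 0

    def _key(self, operands):
        # Fixed-point quantization: scale to 16 fractional bits, then
        # drop low-order bits so nearby inputs map to the same key.
        return tuple(int(x * 65536) >> self.drop_bits for x in operands)

    def compute(self, fn, *operands):
        key = (fn.__name__,) + self._key(operands)
        if key in self.table:
            self.hits += 1            # reuse: skip the real computation
            return self.table[key]
        self.misses += 1
        result = fn(*operands)
        if len(self.table) >= self.capacity:
            self.table.popitem(last=False)  # evict oldest entry (FIFO)
        self.table[key] = result
        return result

# Usage: the second call has a slightly different operand, but it
# quantizes to the same key, so the cached sine value is reused.
lut = ApproxLUT(capacity=16, drop_bits=6)
r1 = lut.compute(math.sin, 0.5)
r2 = lut.compute(math.sin, 0.5001)
```

In this sketch, `drop_bits` controls the accuracy/reuse trade-off: dropping more bits widens each key's input range, raising the hit rate at the cost of larger output error, which mirrors the per-layer tuning knob that neuron-aware approximation adjusts.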