ReAAP: A Reconfigurable and Algorithm-Oriented Array Processor with Compiler-Architecture Co-Design

Bibliographic Details
Published in	IEEE transactions on computers Vol. 71; no. 12; pp. 1 - 14
Main Authors	Zheng, Jianwei; Liu, Yu; Liu, Xuejiao; Liang, Luhong; Chen, Deming; Cheng, Kwang-Ting
Format	Journal Article
Language	English
Published	New York IEEE 01.01.2022 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Algorithms; Array processors; Arrays; Artificial neural networks; Co-design; compiler-architecture co-design; Compilers; Computer architecture; Data management; Deep learning; diverse layer-level workloads; Domain-specific processor; Machine learning; Microprocessors; Optimization; Parallel processing; polyhedral modeling; reconfigurable computing; Reconfigurable hardware; Reconfiguration; Software; System-on-chip; Systolic arrays; Workload; Workloads
ISSN	0018-9340 1557-9956 2326-3814
DOI	10.1109/TC.2022.3213177

Summary:	Parallelism and data reuse are the most critical issues in designing hardware acceleration for a deep learning processor. In addition, abundant on-chip memories and precise data management are intrinsic design requirements because most deep learning algorithms are data-driven and memory-bound. In this paper, we propose a compiler-architecture co-design scheme targeting a reconfigurable and algorithm-oriented array processor named ReAAP. Given specific deep neural networks, the proposed co-design scheme optimizes parallelism and data reuse on compute-intensive layers to guide reconfigurable computing in hardware. In particular, our proposed domain-specific compiler performs systemic optimization to handle the intrinsic tension between parallelism and data locality, so that diverse layer-level workloads are automatically mapped onto our proposed reconfigurable array architecture. In this architecture, the abundant on-chip memories are software-controlled, and their massive data accesses are handled precisely by compiler-generated instructions. In our experiments, ReAAP is implemented on an embedded FPGA platform. Experimental results demonstrate that the proposed co-design scheme effectively integrates software flexibility with hardware parallelism to accelerate diverse deep learning workloads. As a whole system, ReAAP achieves consistently high hardware resource utilization when accelerating all the diverse compute-intensive layers in ResNet, MobileNet, and BERT.
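The summary's point that a reconfigurable array can keep utilization high across diverse layer shapes can be illustrated with a toy model. The sketch below is not ReAAP's compiler or architecture (the record lists polyhedral modeling as the actual technique); it is a simple tiling estimate under assumed conditions. The function names, the 256-PE budget, the candidate array shapes, and the example workload size are all hypothetical.

```python
from math import ceil

def utilization(m, n, rows, cols):
    """Fraction of PE slots doing useful work when an m x n iteration
    space is tiled onto a rows x cols array (edge tiles are padded)."""
    passes = ceil(m / rows) * ceil(n / cols)  # tiles needed to cover the space
    return (m * n) / (passes * rows * cols)

def best_shape(m, n, shapes):
    """Pick the candidate array shape with the highest estimated utilization."""
    return max(shapes, key=lambda shape: utilization(m, n, *shape))

# Hypothetical configurations of one 256-PE array (assumption, not from the paper).
SHAPES = [(16, 16), (8, 32), (32, 8), (4, 64), (64, 4)]

if __name__ == "__main__":
    m, n = 24, 196  # hypothetical layer: 24 output channels, 14 x 14 outputs
    for shape in SHAPES:
        print(shape, round(utilization(m, n, *shape), 3))
    print("best:", best_shape(m, n, SHAPES))
```

In this toy model a fixed 16 x 16 array wastes padded slots on the example workload, while an 8 x 32 configuration of the same PE budget covers it more tightly, which is the kind of per-layer choice a compiler could make for reconfigurable hardware.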