A Lightweight and Efficient GPU for NDP Utilizing Data Access Pattern of Image Processing

Bibliographic Details
Published in	IEEE Transactions on Computers, Vol. 71, no. 1, pp. 13-26
Main Authors	Choi, Jungwoo; Kim, Boyeal; Jeon, Ji-Ye; Lee, Hyuk-Jae; Lim, Euicheol; Rhee, Chae Eun
Format	Journal Article
Language	English
Published	New York: IEEE, 01.01.2022; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Computer architecture; Data processing; Data transfer (computers); Graphics processing units; Hardware; Image processing; Image resolution; Instruction sets; Lightweight; Near-data processing; Optimization; Performance enhancement; Processing-in-memory; Random access memory; Resource management
ISSN	0018-9340; 1557-9956
DOI	10.1109/TC.2020.3035826

More Information
Summary:	As demand for high-resolution image applications increases, image processing systems are becoming more important. Graphics processing units (GPUs) can increase computational capacity through massive parallelism, but remain subject to limited memory bandwidth. Near-data-processing (NDP) is expected to mitigate the performance and energy overhead caused by data transfer by performing computations on the logic die of 3D-stacked memory. Although prior studies have demonstrated the advantages of NDP, an NDP solution focused on image processing has not yet been developed. This article proposes a GPU-based NDP architecture and well-matched optimization strategies that consider both the characteristics of image applications and NDP constraints. First, data allocation to the processing units is addressed to maintain data locality and the data access pattern. Second, a lightweight yet efficient NDP GPU architecture is proposed. By applying a prefetcher that leverages the pattern-aware data allocation, the number of active warps and the on-chip SRAM size of the NDP are significantly reduced. This enables the NDP constraints to be satisfied and a greater number of processing units to be integrated on a logic die. The evaluation results show that the proposed NDP GPU improves performance by 1.85× and consumes 82.7 percent of the energy of the baseline NDP GPU.