Accelerating GPU-Based Out-of-Core Stencil Computation with On-the-Fly Compression

Bibliographic Details
Published in	Parallel and Distributed Computing, Applications and Technologies Vol. 13148; pp. 3 - 14
Main Authors	Shen, Jingcheng; Wu, Yifan; Okita, Masao; Ino, Fumihiko
Format	Book Chapter
Language	English
Published	Switzerland: Springer International Publishing AG, 2022
Series	Lecture Notes in Computer Science
Subjects	GPGPU; High performance computing; On-the-fly compression; Simulation; Stencil computation
ISBN	9783030967710 3030967719
ISSN	0302-9743 1611-3349
DOI	10.1007/978-3-030-96772-7_1

More Information
Summary:	Stencil computation is an important class of scientific applications that can be efficiently executed by graphics processing units (GPUs). Out-of-core approaches help run large-scale stencil codes that process data larger than the limited capacity of GPU memory. Nevertheless, the performance of out-of-core approaches is always limited by the data transfer between the CPU and GPU. Many optimizations have been explored to reduce such data transfer; however, published results on the use of on-the-fly compression are insufficient. In this study, we propose a method that accelerates GPU-based out-of-core stencil computation with on-the-fly compression, introducing a novel data compression scheme that resolves the data dependency between contiguous decomposed data blocks. We also modify a widely used GPU-based compression library to support pipelining that overlaps data transfer with computation. Experimental results show that the proposed method achieved a speedup of 1.2× compared with a method that involves no compression. Moreover, although the precision loss caused by compression increased with the number of time steps, it was trivial up to 4,320 time steps, demonstrating the usefulness of the proposed method.
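Illustration:	The data dependency between contiguous blocks mentioned in the summary can be pictured with a minimal sketch. It is not the authors' method: it uses no GPU and no compression, and the function names and the 3-point averaging stencil are assumptions chosen only for illustration. It shows that a grid processed block by block gives the same result as the whole grid only if each block is copied together with a one-cell halo from its neighbors.

```python
def stencil_step(u):
    """One 3-point averaging step over the whole grid; end values stay fixed."""
    out = list(u)
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def stencil_step_blocked(u, block_size):
    """Same step, processed one block at a time (hypothetical helper).

    Each block is copied with a one-cell halo on each side. The copy stands
    in for the chunk that an out-of-core code would transfer to the GPU.
    """
    n = len(u)
    out = list(u)
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        lo = max(start - 1, 0)   # left halo cell, if one exists
        hi = min(stop + 1, n)    # right halo cell, if one exists
        chunk = u[lo:hi]         # the only data this block is allowed to read
        for i in range(max(start, 1), min(stop, n - 1)):
            j = i - lo           # index of cell i inside the chunk
            out[i] = (chunk[j - 1] + chunk[j] + chunk[j + 1]) / 3.0
    return out


if __name__ == "__main__":
    grid = [float(i * i % 7) for i in range(20)]
    print(stencil_step_blocked(grid, 6) == stencil_step(grid))
```

Because neighboring chunks share halo cells, a compression scheme applied per block has to keep those shared cells consistent, which is the kind of dependency the summary refers to.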