A highly efficient I/O-based out-of-core stencil algorithm with globally optimized temporal blocking

Bibliographic Details
Published in	2017 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM), pp. 1-6
Main Authors	Midorikawa, Hiroko; Tan, Hideyuki
Format	Conference Proceeding
Language	English
Published	IEEE 01.08.2017
Subjects	access locality; algorithm; Arrays; asynchronous I/O; auto-tuning; Data models; Flash memories; flash memory; Layout; Non-volatile memory; out-of-core stencil; temporal blocking
DOI	10.1109/PACRIM.2017.8121909

Summary:	This paper proposes a highly efficient I/O-based out-of-core stencil algorithm for large-capacity non-volatile memory (NVM) such as flash, which the authors present as the most efficient of its kind. The paper evaluates the performance of various out-of-core stencil algorithms and implementations designed for flash. Algorithms for flash differ substantially from existing algorithms designed for memory-and-cache, host-and-GPU, and local-and-remote-node hierarchies in their schemes, in the data structures used in the stencil computation, and in how they apply blocking to increase data access locality and thereby improve performance. The proposed algorithm achieves 80% of the performance of in-core computing (where main memory is large enough to hold all the data), even when the available memory is limited to 6.3% of the data size that the stencil problem requires. In other words, for a stencil problem that requires 2 TiB of data, the algorithm keeps the performance degradation within 20% while using only 128 GiB of main memory together with flash SSDs, whose access latency is much higher than that of DRAM.
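
The summary names temporal blocking but does not describe how it works. The sketch below illustrates only the general idea and is not the authors' algorithm: a block is loaded once together with a halo, advanced several time steps without further I/O, and written back once. Everything in it is an assumption made for illustration: the 1D 3-point averaging stencil, the Python list standing in for data on flash, and the names `step`, `naive`, `temporally_blocked`, `block`, and `tb`.

```python
def step(u):
    """One sweep of a 3-point averaging stencil; the two end cells stay fixed."""
    inner = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]


def naive(u, steps):
    """Reference: sweep the whole array once per time step (in-core style)."""
    for _ in range(steps):
        u = step(u)
    return u


def temporally_blocked(u, steps, block, tb):
    """Advance `u` by `steps` sweeps, processing `block` cells at a time.

    `u` stands in for data on slow storage (assumption); each `chunk` stands
    in for the part that fits in main memory. A halo of `t` cells per side is
    loaded so the block interior is still exact after `t` local sweeps.
    """
    n = len(u)
    done = 0
    while done < steps:
        t = min(tb, steps - done)          # time steps fused in this pass
        out = list(u)
        for start in range(0, n, block):
            end = min(start + block, n)
            lo, hi = max(0, start - t), min(n, end + t)
            chunk = u[lo:hi]               # one "read": block plus halo
            for _ in range(t):
                chunk = step(chunk)        # t sweeps with no further I/O
            # Halo cells are stale after t sweeps; keep only the interior.
            out[start:end] = chunk[start - lo:end - lo]   # one "write"
        u = out
        done += t
    return u


if __name__ == "__main__":
    u0 = [float((i * 7) % 11) for i in range(64)]
    blocked = temporally_blocked(u0, steps=10, block=16, tb=4)
    reference = naive(u0, 10)
    print(max(abs(a - b) for a, b in zip(blocked, reference)))
    # Memory-to-data ratio quoted in the summary: 128 GiB of 2 TiB.
    print(128 / (2 * 1024))  # 0.0625, which the summary rounds to 6.3%
```

In this sketch a larger `tb` means fewer passes over the slow storage but a wider halo, and therefore more redundant computation and a larger in-memory chunk per block. The summary's keyword list mentions auto-tuning, which suggests such parameters are tuned in the paper, but the record gives no details.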