A highly efficient I/O-based out-of-core stencil algorithm with globally optimized temporal blocking
| Published in | 2017 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM) pp. 1 - 6 |
|---|---|
| Main Authors | , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.08.2017 |
| DOI | 10.1109/PACRIM.2017.8121909 |
| Summary: | This paper proposes the most efficient I/O-based out-of-core stencil algorithm for large-capacity non-volatile memory (NVM) such as flash, and evaluates the performance of various out-of-core stencil algorithms and implementations designed for flash. Algorithms for flash differ substantially from existing algorithms designed for memory-and-cache, host-and-GPU, and local-and-remote-node hierarchies: in their schemes, in the data structures used for the stencil computation, and in how they apply blocking to increase data access locality and improve performance. The proposed algorithm achieves 80% of the performance of in-core computation with sufficient main memory, even when the available memory is limited to 6.3% of the data size the stencil problem requires. In other words, for a stencil problem that requires 2 TiB of data, performance degradation stays within 20% when using only 128 GiB of main memory together with flash SSDs, whose access latency is much higher than that of DRAM. |
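
The summary refers to blocking as the way to increase data access locality in out-of-core stencil computation. As a rough illustration only, the sketch below applies temporal blocking to a 1D three-point averaging stencil: each spatial block is read once with a halo of `k` cells per side, advanced `k` time steps in memory, and written back once. This is not the paper's algorithm; the stencil, the function names, and the parameters `block` and `k` are all assumptions made for the example, and plain NumPy arrays stand in for flash-resident data (an `np.memmap` could be substituted).

```python
import numpy as np

def stencil_step(u):
    """One 3-point averaging step; the two end cells are left unchanged."""
    v = u.copy()
    v[1:-1] = (u[:-2] + u[1:-1] + u[2:]) / 3.0
    return v

def in_core(u, steps):
    """Reference: advance the whole array in memory, one step at a time."""
    for _ in range(steps):
        u = stencil_step(u)
    return u

def out_of_core(src, dst, steps, block, k):
    """Hypothetical temporally blocked sweep (illustrative, not the paper's method).

    src, dst: same-shaped 1D arrays standing in for slow storage.
    block:    number of cells written back per spatial block.
    k:        time steps fused per pass (temporal block size).
    Both src and dst are overwritten; the array holding the result is returned.
    """
    n = src.shape[0]
    done = 0
    while done < steps:
        kk = min(k, steps - done)            # last pass may be shorter
        for s in range(0, n, block):
            e = min(s + block, n)
            lo = max(s - kk, 0)              # halo of kk cells, clipped
            hi = min(e + kk, n)              # at the domain boundary
            chunk = np.array(src[lo:hi])     # one read per block per pass
            for _ in range(kk):
                chunk = stencil_step(chunk)  # halo cells go stale, core stays valid
            dst[s:e] = chunk[s - lo:e - lo]  # one write per block per pass
        src, dst = dst, src                  # swap the two storage buffers
        done += kk
    return src

# Usage: compare the blocked sweep against the in-memory reference.
rng = np.random.default_rng(0)
u0 = rng.random(1000)
ref = in_core(u0.copy(), 10)
out = out_of_core(u0.copy(), np.empty_like(u0), steps=10, block=64, k=4)
print(np.allclose(out, ref))
```

In this sketch, each pass reads and writes every cell once but advances `k` time steps, so storage traffic per time step falls roughly by a factor of `k`, at the cost of recomputing the halo cells.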