Minimizing communication in sparse matrix solvers
| Published in | Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp. 1-12 |
|---|---|
| Main Authors | Marghoob Mohiyuddin, Mark Hoemmen, James Demmel, Katherine Yelick |
| Format | Conference Proceeding |
| Language | English |
| Published | New York, NY, USA: ACM, 14.11.2009 |
| Series | ACM Conferences |
| ISBN | 1605587443 9781605587448 |
| ISSN | 2167-4329 |
| DOI | 10.1145/1654059.1654096 |
| Summary: | Data communication within the memory system of a single processor node, and between multiple nodes of a parallel system, is the bottleneck in many iterative sparse matrix solvers such as CG and GMRES: k iterations of a conventional implementation perform k sparse matrix-vector multiplications and Ω(k) vector operations such as dot products, so communication grows by a factor of Ω(k) in both the memory system and the network. By reorganizing the sparse-matrix kernel to compute a set of matrix-vector products at once, and reorganizing the rest of the algorithm accordingly, we can perform k iterations by sending O(log P) messages instead of O(k · log P) messages on a parallel machine, and by reading the matrix A from DRAM to cache just once instead of k times on a sequential machine. This reduces communication to the minimum possible. We combine these techniques to form a new variant of GMRES. Our shared-memory implementation on an 8-core Intel Clovertown gets speedups of up to 4.3x over standard GMRES, without sacrificing convergence rate or numerical stability. |
|---|---|
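The reorganized sparse-matrix kernel described in the summary produces the Krylov basis [x, Ax, A^2x, ..., A^kx] consumed by k iterations of GMRES. The sketch below is a hypothetical Python/SciPy illustration (the function name `krylov_basis` and the random test matrix are assumptions, not the paper's shared-memory C implementation): the naive loop shown streams A from memory on every one of its k products, which is exactly the communication the paper's reorganized kernel avoids by computing the same basis columns while reading A, plus redundant boundary rows per cache block or processor partition, only once.

```python
# Minimal sketch of the Krylov basis that a communication-avoiding kernel
# computes. This naive version re-reads the sparse matrix A from memory in
# every iteration; it shows the mathematical output, not the blocked,
# communication-optimal schedule described in the paper.
import numpy as np
import scipy.sparse as sp

def krylov_basis(A, x, k):
    """Return an n-by-(k+1) array whose columns are x, A@x, ..., A^k @ x."""
    n = x.shape[0]
    V = np.empty((n, k + 1))
    V[:, 0] = x
    for j in range(k):               # k sparse matrix-vector products:
        V[:, j + 1] = A @ V[:, j]    # each product streams all of A again
    return V

if __name__ == "__main__":
    n, k = 1000, 8
    A = sp.random(n, n, density=0.01, format="csr") + sp.eye(n)  # toy matrix
    x = np.random.rand(n)
    V = krylov_basis(A, x, k)
    print(V.shape)  # (1000, 9): the k+1 basis vectors used by k iterations
```

In the conventional solver this loop is interleaved with dot products and vector updates, forcing O(k · log P) messages on P processors; computing the whole basis up front, as above, is what lets the paper's GMRES variant replace them with O(log P) messages and a single pass over A per k iterations.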