Strassen's Matrix Multiplication on GPUs
Published in | 2011 IEEE 17th International Conference on Parallel and Distributed Systems, pp. 157-164 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.12.2011 |
ISBN | 1457718758 9781457718755 |
ISSN | 1521-9097 |
DOI | 10.1109/ICPADS.2011.130 |
Summary: | We provide efficient single-precision and integer GPU implementations of Strassen's algorithm as well as of Winograd's variant. On an NVIDIA C1060 GPU, a speedup of 32% (35%) is obtained for Strassen's 4-level implementation and 33% (36%) for Winograd's variant relative to the sgemm (integer version of sgemm) code in CUBLAS 3.0 when multiplying 16384×16384 matrices. The maximum numerical error for the single-precision implementations is about 2 orders of magnitude higher than that for sgemm when n = 16384 and is zero for the integer implementations. |
---|---|