Strassen's Matrix Multiplication on GPUs


Bibliographic Details
Published in: 2011 IEEE 17th International Conference on Parallel and Distributed Systems, pp. 157-164
Main Authors: Li, Junjie; Ranka, S.; Sahni, S.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2011
ISBN: 1457718758, 9781457718755
ISSN: 1521-9097
DOI: 10.1109/ICPADS.2011.130

Summary: We provide efficient single-precision and integer GPU implementations of Strassen's algorithm as well as of Winograd's variant. On an NVIDIA C1060 GPU, a speedup of 32% (35%) is obtained for Strassen's 4-level implementation and 33% (36%) for Winograd's variant relative to the sgemm (integer version of sgemm) code in CUBLAS 3.0 when multiplying 16384×16384 matrices. The maximum numerical error for the single-precision implementations is about 2 orders of magnitude higher than that for sgemm when n = 16384 and is zero for the integer implementations.
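
For readers unfamiliar with the algorithm named in the summary, the following is a minimal CPU sketch of the textbook Strassen recursion in plain Python. It is not the authors' GPU code and says nothing about their kernel design. The function names and the `levels` parameter are hypothetical; `levels` is read here as recursion depth, which is only an assumption about what "4-level" means in the summary. With integer inputs the result matches the classical product exactly, which illustrates the zero-error claim for the integer case.

```python
# Illustrative sketch only: textbook Strassen on square lists-of-lists.
# All names here are hypothetical and not taken from the paper.

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def naive_mul(A, B):
    # Classical O(n^3) product, used as the base case and as a reference.
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]

def split(M):
    # Split an even-sized square matrix into four quadrants.
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def strassen(A, B, levels):
    # Recurse `levels` times (assumed meaning of "4-level"), then fall back
    # to the classical product. Odd sizes also fall back.
    n = len(A)
    if levels == 0 or n % 2 != 0:
        return naive_mul(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven half-size products instead of eight.
    M1 = strassen(mat_add(A11, A22), mat_add(B11, B22), levels - 1)
    M2 = strassen(mat_add(A21, A22), B11, levels - 1)
    M3 = strassen(A11, mat_sub(B12, B22), levels - 1)
    M4 = strassen(A22, mat_sub(B21, B11), levels - 1)
    M5 = strassen(mat_add(A11, A12), B22, levels - 1)
    M6 = strassen(mat_sub(A21, A11), mat_add(B11, B12), levels - 1)
    M7 = strassen(mat_sub(A12, A22), mat_add(B21, B22), levels - 1)
    C11 = mat_add(mat_sub(mat_add(M1, M4), M5), M7)
    C12 = mat_add(M3, M5)
    C21 = mat_add(M2, M4)
    C22 = mat_add(mat_add(mat_sub(M1, M2), M3), M6)
    # Stitch the quadrants back together.
    top = [l + r for l, r in zip(C11, C12)]
    bottom = [l + r for l, r in zip(C21, C22)]
    return top + bottom
```

Calling `strassen(A, B, 2)` on two 8×8 integer matrices performs two levels of recursion and then multiplies the resulting 2×2 blocks classically. In floating point the extra additions and subtractions are where the additional rounding error mentioned in the summary would come from; this sketch does not attempt to measure it.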