The MOMMS Family of Matrix Multiplication Algorithms

Bibliographic Details
Main Authors Smith, Tyler M.; van de Geijn, Robert A.
Format Journal Article
Language English
Published 11.04.2019
DOI 10.48550/arxiv.1904.05717

Summary: As the ratio between the rate of computation and the rate at which data can be retrieved from the various layers of memory continues to deteriorate, a question arises: will the current best algorithms for computing matrix-matrix multiplication continue to be (near) optimal on future CPUs? This paper provides compelling analytical and empirical evidence that the answer is "no". The analytical results guide us to a new family of algorithms, of which the current state-of-the-art "Goto's algorithm" is but one member. The empirical results, obtained on architectures that were custom built to reduce the amount of bandwidth to main memory, show that under different circumstances, different members of the family become superior. Thus, this family will likely start playing a prominent role going forward.