A Tight I/O Lower Bound for Matrix Multiplication
A tight lower bound for required I/O when computing an ordinary matrix-matrix multiplication on a processor with two layers of memory is established. Prior work obtained weaker lower bounds by reasoning about the number of segments needed to perform $C := AB$, for distinct matrices $A$, $B$, and $C$,...
| Main Authors | Smith, Tyler Michael; Lowery, Bradley; Langou, Julien; van de Geijn, Robert A |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | 2017-02-03 |
| Subjects | Computer Science - Computational Complexity |
| Online Access | https://arxiv.org/abs/1702.02017 |
| DOI | 10.48550/arxiv.1702.02017 |
| Abstract | A tight lower bound for required I/O when computing an ordinary matrix-matrix multiplication (MMM) on a processor with two layers of memory is established. Prior work obtained weaker lower bounds by reasoning about the number of segments needed to perform $C := AB$, for distinct matrices $A$, $B$, and $C$, where each segment is a series of operations involving $M$ reads and writes to and from fast memory, and $M$ is the size of fast memory. A lower bound on the number of segments was then determined by obtaining an upper bound on the number of elementary multiplications performed per segment. This paper follows the same high-level approach, but improves the lower bound by (1) transforming algorithms for MMM so that they perform all computation via fused multiply-add instructions (FMAs) and using this to reason about only the cost associated with reading the matrices, and (2) decoupling the per-segment I/O cost from the size of fast memory. For $n \times n$ matrices, the lower bound's leading-order term is $2n^3/\sqrt{M}$. A theoretical algorithm whose leading-order term attains this bound is introduced. The extent to which the state-of-the-art Goto's Algorithm attains the lower bound is discussed. |
|---|---|
| Author | Langou, Julien; Smith, Tyler Michael; Lowery, Bradley; van de Geijn, Robert A |
| BackLink | https://doi.org/10.48550/arXiv.1702.02017 (View paper in arXiv) |
| Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
| IsPeerReviewed | false |
| OpenAccessLink | https://arxiv.org/abs/1702.02017 |
| PublicationDate | 2017-02-03 |
| SecondaryResourceType | preprint |
| SubjectTerms | Computer Science - Computational Complexity |
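
Illustrative note on the abstract's leading-order term $2n^3/\sqrt{M}$: the sketch below compares that term with the number of element reads made by a simple blocked schedule. The schedule is an assumption chosen for illustration and is not taken from the paper: it keeps a $b \times b$ block of $C$ resident in fast memory and, for each index $k$, streams in one length-$b$ piece of a column of $A$ and one length-$b$ piece of a row of $B$. Only reads of $A$ and $B$ are counted, and the function names `lower_bound_leading_term` and `resident_c_block_reads` are hypothetical.

```python
import math


def lower_bound_leading_term(n: int, M: int) -> float:
    """Leading-order term 2*n^3/sqrt(M) quoted in the abstract (lower-order terms omitted)."""
    return 2 * n**3 / math.sqrt(M)


def resident_c_block_reads(n: int, M: int) -> int:
    """Count reads of A and B for an assumed schedule that keeps a b x b block of C in fast memory.

    Assumption (not from the paper): fast memory must hold the C block plus one
    column piece of A and one row piece of B, i.e. b*b + 2*b <= M.
    """
    b = math.isqrt(M + 1) - 1  # largest b with b*b + 2*b <= M
    if b < 1:
        raise ValueError("fast memory too small for even a 1 x 1 block of C")
    reads = 0
    for i0 in range(0, n, b):
        bi = min(b, n - i0)  # rows in this block of C
        for j0 in range(0, n, b):
            bj = min(b, n - j0)  # columns in this block of C
            # For each of the n values of k: bi elements of A and bj elements of B.
            reads += n * (bi + bj)
    return reads


if __name__ == "__main__":
    n, M = 64, 288  # M = 16*16 + 2*16, so b = 16 divides n evenly
    reads = resident_c_block_reads(n, M)
    bound = lower_bound_leading_term(n, M)
    print(f"reads of A and B: {reads}")
    print(f"leading-order term: {bound:.1f}")
    print(f"ratio: {reads / bound:.4f}")
```

Under these assumptions the read count is $2n^3/b$ with $b$ slightly below $\sqrt{M}$, so the ratio to the leading-order term approaches 1 as $M$ grows.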