SPMSD: A Partitioning Strategy for Parallel General Sparse Matrix-Matrix Multiplication on GPU
| Published in | Parallel Processing Letters, Vol. 34, No. 2 |
|---|---|
| Main Authors | Cui, Huanyu; Wang, Nianbin; Han, Qilong; Wang, Ye |
| Format | Journal Article |
| Language | English |
| Published | Singapore: World Scientific Publishing Company (World Scientific Publishing Co. Pte., Ltd), 01.06.2024 |
| Subjects | Algorithms; Atomic properties; Grid method; Insertion; Linear equations; Multiplication; Nonuniformity; Process controls; Sparse matrices; Sparsity; Standard deviation |
| Online Access | http://www.worldscientific.com/doi/abs/10.1142/S012962642450004X |
| ISSN | 0129-6264 (print); 1793-642X (electronic) |
| DOI | 10.1142/S012962642450004X |
| Abstract | SpGEMM (General Sparse Matrix-Matrix Multiplication) is one of the key kernels in algebraic multigrid methods, graph algorithms, and the solution of linear equations. Due to the non-uniformity of some sparse matrices, existing parallel SpGEMM algorithms suffer from load imbalance, leading to a decrease in computational efficiency. This paper proposes a new algorithm, SPMSD (SpGEMM Based on Minimum Standard Deviation), built on a hash table and a partition strategy. First, the intermediate results of the matrix product are divided into multiple blocks by a new partition strategy that minimizes the standard deviation among blocks. Second, the input matrix is transformed according to the result of the partition strategy. Finally, SPMSD performs the parallel computation of SpGEMM, exploiting the fast insertion and fast access of the hash table, while the calculation controls the insertion and merging of intermediate results according to offsets to avoid the drawbacks of atomic operations. Experiments indicate that SPMSD executes 7.4x faster than the existing cuSPARSE library. Compared with the Out of Core method, SPMSD improves computational performance by 1.2x and decreases memory utilization by 0.19x. |
|---|---|
| Copyright | 2024, World Scientific Publishing Company |
| Discipline | Computer Science |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | load imbalance; hash table; parallel efficiency; non-uniformity; GPU; SpGEMM |
| ORCID | 0000-0001-7996-7545 0000-0002-5185-8387 0000-0002-0223-8181 0000-0003-1738-7937 |
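The abstract describes a partition strategy that splits the intermediate results of C = A x B into blocks whose workloads have minimal standard deviation. The paper's exact SPMSD algorithm is not reproduced in this record, so the following is only a minimal host-side sketch of the underlying idea, assuming CSR operands; the names CsrMatrix, row_work, and partition_rows, and the greedy contiguous split, are assumptions made for this example, not the authors' code.

```cpp
// Minimal sketch (not the authors' SPMSD implementation): estimate the number
// of intermediate products per row of C = A * B for CSR operands, then split
// the rows into contiguous blocks whose workloads are as even as possible,
// which keeps the standard deviation among block workloads low.
#include <cstdio>
#include <vector>

struct CsrMatrix {                    // minimal CSR container (assumed layout)
    int rows = 0, cols = 0;
    std::vector<int> row_ptr;         // size rows + 1
    std::vector<int> col_idx;         // size nnz
    std::vector<double> val;          // size nnz
};

// Upper bound on intermediate products for row i of C = A * B:
// sum over nonzeros A(i,k) of nnz(row k of B).
std::vector<long long> row_work(const CsrMatrix& A, const CsrMatrix& B) {
    std::vector<long long> work(A.rows, 0);
    for (int i = 0; i < A.rows; ++i)
        for (int p = A.row_ptr[i]; p < A.row_ptr[i + 1]; ++p) {
            int k = A.col_idx[p];
            work[i] += B.row_ptr[k + 1] - B.row_ptr[k];
        }
    return work;
}

// Greedy contiguous split into at most num_blocks blocks, each aiming for the
// average workload; returns row-index boundaries [b0, b1, ..., bP].
std::vector<int> partition_rows(const std::vector<long long>& work, int num_blocks) {
    long long total = 0;
    for (long long w : work) total += w;
    double target = static_cast<double>(total) / num_blocks;

    std::vector<int> bounds{0};
    long long acc = 0;
    const int n = static_cast<int>(work.size());
    for (int i = 0; i < n; ++i) {
        acc += work[i];
        if (acc >= target && static_cast<int>(bounds.size()) < num_blocks && i + 1 < n) {
            bounds.push_back(i + 1);  // close the current block after row i
            acc = 0;
        }
    }
    bounds.push_back(n);
    return bounds;
}

int main() {
    // Toy 4x4 lower-triangular matrix used as both A and B.
    CsrMatrix M;
    M.rows = M.cols = 4;
    M.row_ptr = {0, 1, 3, 6, 10};
    M.col_idx = {0, 0, 1, 0, 1, 2, 0, 1, 2, 3};
    M.val.assign(M.col_idx.size(), 1.0);

    std::vector<long long> work = row_work(M, M);
    std::vector<int> bounds = partition_rows(work, 2);
    for (std::size_t b = 0; b + 1 < bounds.size(); ++b)
        std::printf("block %zu: rows [%d, %d)\n", b, bounds[b], bounds[b + 1]);
    return 0;
}
```

On the toy input above the two blocks each carry 10 intermediate products, so the standard deviation among blocks is zero; real inputs will not balance exactly, which is where the paper's minimum-standard-deviation criterion comes in.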
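The abstract also attributes SPMSD's speed to fast insertion and access in a hash table that merges intermediate results. As a rough single-threaded illustration only (the paper's GPU kernel works per thread block and uses offsets instead of global atomics, and is not shown here), the sketch below accumulates the partial products of one output row in an open-addressing table keyed by column index; HashAccumulator and its methods are assumed names for this example.

```cpp
// Illustrative sketch of the hash-table accumulator idea used by hash-based
// SpGEMM codes: each partial product a_ik * b_kj for one output row is
// inserted under key j, and hits on the same column are merged immediately.
// This is the core data structure only, not the SPMSD GPU kernel.
#include <cstdint>
#include <cstdio>
#include <vector>

class HashAccumulator {
public:
    // capacity must be a power of two (and exceed the number of distinct
    // output columns) so that "& (capacity - 1)" works as the modulo.
    explicit HashAccumulator(std::size_t capacity)
        : key_(capacity, -1), val_(capacity, 0.0) {}

    // Insert a partial product destined for column `col`, merging on a hit.
    void insert(std::int32_t col, double product) {
        const std::size_t mask = key_.size() - 1;
        std::size_t slot = (static_cast<std::size_t>(col) * 2654435761u) & mask;
        while (true) {
            if (key_[slot] == col) { val_[slot] += product; return; }  // merge
            if (key_[slot] == -1)  { key_[slot] = col; val_[slot] = product; return; }
            slot = (slot + 1) & mask;                                  // linear probe
        }
    }

    // Emit the accumulated (column, value) pairs of the finished output row.
    void dump() const {
        for (std::size_t s = 0; s < key_.size(); ++s)
            if (key_[s] != -1)
                std::printf("col %d -> %.1f\n", static_cast<int>(key_[s]), val_[s]);
    }

private:
    std::vector<std::int32_t> key_;   // column index, -1 marks an empty slot
    std::vector<double> val_;         // accumulated value for that column
};

int main() {
    HashAccumulator acc(8);           // one small table for a single output row
    acc.insert(3, 1.5);
    acc.insert(7, 2.0);
    acc.insert(3, 0.5);               // merges into column 3
    acc.dump();                       // prints col 3 -> 2.0 and col 7 -> 2.0
    return 0;
}
```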