SPMSD: A Partitioning Strategy for Parallel General Sparse Matrix-Matrix Multiplication on GPU
| Published in | Parallel Processing Letters, Vol. 34, No. 2 |
|---|---|
| Main Authors | Cui, Huanyu; Wang, Nianbin; Han, Qilong; Wang, Ye |
| Format | Journal Article |
| Language | English |
| Published | Singapore: World Scientific Publishing Company (World Scientific Publishing Co. Pte., Ltd), 01.06.2024 |
| Subjects | Algorithms; Atomic properties; Grid method; Insertion; Linear equations; Multiplication; Nonuniformity; Process controls; Sparse matrices; Sparsity; Standard deviation |
| Online Access | http://www.worldscientific.com/doi/abs/10.1142/S012962642450004X |
| ISSN | 0129-6264 (print); 1793-642X (electronic) |
| DOI | 10.1142/S012962642450004X |
| Abstract | SpGEMM (General Sparse Matrix-Matrix Multiplication) is one of the key kernels in algebraic multigrid methods, graph algorithms, and the solution of linear equations. Due to the non-uniformity of some sparse matrices, existing parallel SpGEMM algorithms suffer from load imbalance, leading to a decrease in computational efficiency. This paper proposes a new algorithm, SPMSD (SpGEMM Based on Minimum Standard Deviation), built on a hash table and a partition strategy. First, the intermediate results of the matrix product are divided into multiple blocks by a new partition strategy that minimizes the standard deviation among blocks. Second, the input matrix is transformed according to the result of the partition strategy. Finally, SPMSD performs the parallel computation of SpGEMM, exploiting the fast insertion and fast access of the hash table, while the calculation controls the insertion and merging of intermediate results according to offsets to avoid the drawbacks of atomic operations. Experiments indicate that SPMSD executes 7.4x faster than the existing cuSPARSE library. Compared with the Out of Core method, SPMSD improves computational performance by 1.2x and decreases memory utilization by 0.19x. |
|---|---|
| Copyright | 2024, World Scientific Publishing Company |
| Discipline | Computer Science |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | load imbalance; hash table; parallel efficiency; non-uniformity; GPU; SpGEMM |
| ORCID | 0000-0001-7996-7545 0000-0002-5185-8387 0000-0002-0223-8181 0000-0003-1738-7937 |
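The abstract describes a partition strategy that splits the intermediate results of C = A x B into blocks whose workloads have minimal standard deviation. The paper's exact SPMSD algorithm is not reproduced in this record, so the following is only a minimal host-side sketch of the underlying idea, assuming CSR operands; the names CsrMatrix, row_work, and partition_rows, and the greedy contiguous split, are assumptions made for this example, not the authors' code.

```cpp
// Minimal sketch (not the authors' SPMSD implementation): estimate the number
// of intermediate products per row of C = A * B for CSR operands, then split
// the rows into contiguous blocks whose workloads are as even as possible,
// which keeps the standard deviation among block workloads low.
#include <cstdio>
#include <vector>

struct CsrMatrix {                    // minimal CSR container (assumed layout)
    int rows = 0, cols = 0;
    std::vector<int> row_ptr;         // size rows + 1
    std::vector<int> col_idx;         // size nnz
    std::vector<double> val;          // size nnz
};

// Upper bound on intermediate products for row i of C = A * B:
// sum over nonzeros A(i,k) of nnz(row k of B).
std::vector<long long> row_work(const CsrMatrix& A, const CsrMatrix& B) {
    std::vector<long long> work(A.rows, 0);
    for (int i = 0; i < A.rows; ++i)
        for (int p = A.row_ptr[i]; p < A.row_ptr[i + 1]; ++p) {
            int k = A.col_idx[p];
            work[i] += B.row_ptr[k + 1] - B.row_ptr[k];
        }
    return work;
}

// Greedy contiguous split into at most num_blocks blocks, each aiming for the
// average workload; returns row-index boundaries [b0, b1, ..., bP].
std::vector<int> partition_rows(const std::vector<long long>& work, int num_blocks) {
    long long total = 0;
    for (long long w : work) total += w;
    double target = static_cast<double>(total) / num_blocks;

    std::vector<int> bounds{0};
    long long acc = 0;
    const int n = static_cast<int>(work.size());
    for (int i = 0; i < n; ++i) {
        acc += work[i];
        if (acc >= target && static_cast<int>(bounds.size()) < num_blocks && i + 1 < n) {
            bounds.push_back(i + 1);  // close the current block after row i
            acc = 0;
        }
    }
    bounds.push_back(n);
    return bounds;
}

int main() {
    // Toy 4x4 lower-triangular matrix used as both A and B.
    CsrMatrix M;
    M.rows = M.cols = 4;
    M.row_ptr = {0, 1, 3, 6, 10};
    M.col_idx = {0, 0, 1, 0, 1, 2, 0, 1, 2, 3};
    M.val.assign(M.col_idx.size(), 1.0);

    std::vector<long long> work = row_work(M, M);
    std::vector<int> bounds = partition_rows(work, 2);
    for (std::size_t b = 0; b + 1 < bounds.size(); ++b)
        std::printf("block %zu: rows [%d, %d)\n", b, bounds[b], bounds[b + 1]);
    return 0;
}
```

On the toy input above the two blocks each carry 10 intermediate products, so the standard deviation among blocks is zero; real inputs will not balance exactly, which is where the paper's minimum-standard-deviation criterion comes in.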
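The abstract also attributes SPMSD's speed to fast insertion and access in a hash table that merges intermediate results. As a rough single-threaded illustration only (the paper's GPU kernel works per thread block and uses offsets instead of global atomics, and is not shown here), the sketch below accumulates the partial products of one output row in an open-addressing table keyed by column index; HashAccumulator and its methods are assumed names for this example.

```cpp
// Illustrative sketch of the hash-table accumulator idea used by hash-based
// SpGEMM codes: each partial product a_ik * b_kj for one output row is
// inserted under key j, and hits on the same column are merged immediately.
// This is the core data structure only, not the SPMSD GPU kernel.
#include <cstdint>
#include <cstdio>
#include <vector>

class HashAccumulator {
public:
    // capacity must be a power of two (and exceed the number of distinct
    // output columns) so that "& (capacity - 1)" works as the modulo.
    explicit HashAccumulator(std::size_t capacity)
        : key_(capacity, -1), val_(capacity, 0.0) {}

    // Insert a partial product destined for column `col`, merging on a hit.
    void insert(std::int32_t col, double product) {
        const std::size_t mask = key_.size() - 1;
        std::size_t slot = (static_cast<std::size_t>(col) * 2654435761u) & mask;
        while (true) {
            if (key_[slot] == col) { val_[slot] += product; return; }  // merge
            if (key_[slot] == -1)  { key_[slot] = col; val_[slot] = product; return; }
            slot = (slot + 1) & mask;                                  // linear probe
        }
    }

    // Emit the accumulated (column, value) pairs of the finished output row.
    void dump() const {
        for (std::size_t s = 0; s < key_.size(); ++s)
            if (key_[s] != -1)
                std::printf("col %d -> %.1f\n", static_cast<int>(key_[s]), val_[s]);
    }

private:
    std::vector<std::int32_t> key_;   // column index, -1 marks an empty slot
    std::vector<double> val_;         // accumulated value for that column
};

int main() {
    HashAccumulator acc(8);           // one small table for a single output row
    acc.insert(3, 1.5);
    acc.insert(7, 2.0);
    acc.insert(3, 0.5);               // merges into column 3
    acc.dump();                       // prints col 3 -> 2.0 and col 7 -> 2.0
    return 0;
}
```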