Parallel Methods for Finding k-Mismatch Shortest Unique Substrings Using GPU
k-mismatch shortest unique substring (SUS) queries have been proposed and studied very recently due to their useful applications in computational biology. The k-mismatch SUS query over a given position of a string asks for a shortest substring that covers the given position and does...
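To make the query definition concrete, the following is a minimal brute-force sketch in Python. It is an illustration only, not the GPU method of the article: the function names (`hamming`, `is_k_unique`, `k_mismatch_sus`) are hypothetical, positions are assumed to be 0-based, and ties between equally short answers are broken by taking the leftmost one. It assumes the usual definition in which a substring is k-mismatch unique when no other substring of the same length lies within Hamming distance k of it.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def is_k_unique(s: str, start: int, length: int, k: int) -> bool:
    """True if s[start:start+length] has no other occurrence within Hamming distance k."""
    target = s[start:start + length]
    for other in range(len(s) - length + 1):
        if other != start and hamming(target, s[other:other + length]) <= k:
            return False
    return True


def k_mismatch_sus(s: str, pos: int, k: int = 0) -> tuple[int, int]:
    """Leftmost shortest k-mismatch unique substring covering pos.

    Returns inclusive 0-based (start, end) indices. Hypothetical helper
    written for illustration; its running time is far too high for the
    massively long strings the article targets.
    """
    n = len(s)
    if not 0 <= pos < n:
        raise ValueError("pos must be a valid index into s")
    for length in range(1, n + 1):
        # Candidate windows of this length that cover pos and fit in s.
        first = max(0, pos - length + 1)
        last = min(pos, n - length)
        for start in range(first, last + 1):
            if is_k_unique(s, start, length, k):
                return start, start + length - 1
    # The whole string has no other same-length substring, so it is always unique.
    raise AssertionError("unreachable")
```

For example, on the string `banana` with k = 0, position 3 is covered by the unique substring `nan` at indices 2 to 4, because every shorter substring covering that position occurs elsewhere.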
| Published in | IEEE/ACM transactions on computational biology and bioinformatics Vol. 18; no. 1; pp. 386 - 395 |
|---|---|
| Main Authors | Schultz, Daniel W.; Xu, Bojian |
| Format | Journal Article |
| Language | English |
| Published | United States, 01.01.2021 |
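| Subjects | Algorithms; Computational Biology - methods; Computer Graphics; Image Processing, Computer-Assisted; Sequence Analysis, DNA - methods |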
| ISSN | 1545-5963, 1557-9964 |
| DOI | 10.1109/TCBB.2019.2935061 |
Cover
| Abstract | k-mismatch shortest unique substring (SUS) queries have been proposed and studied very recently due to their useful applications in computational biology. The k-mismatch SUS query over a given position of a string asks for a shortest substring that covers the given position and does not have a duplicate (within a Hamming distance of k) elsewhere in the string. The challenge in SUS queries is to collectively find the SUS for every position of a massively long string in a manner that is both time- and space-efficient. All known efforts and results have focused on improving the time and space efficiency of SUS computation in the sequential CPU model. In this work, we propose the first parallel approach for k-mismatch SUS queries, leveraging the massively multi-threaded architecture of the graphics processing unit (GPU). An experimental study performed on a mid-range GPU using real-world biological data shows that our proposal is consistently faster than the fastest CPU solution by a factor of at least 6 for exact SUS queries (k = 0) and at least 23 for approximate SUS queries over DNA sequences (k > 0), while maintaining nearly the same peak memory usage as the most memory-efficient sequential CPU proposal. Our work provides practitioners with a faster tool for SUS finding on massively long strings, and indeed provides the first practical tool for approximate SUS computation, because the any-case quadratic time cost of the state-of-the-art sequential CPU method for approximate SUS queries does not scale well even to modestly long strings. |
|---|---|
| Author | Xu, Bojian; Schultz, Daniel W. |
| Author_xml | 1. Schultz, Daniel W. (ORCID 0000-0003-3912-2841), Department of Electrical Engineering and Computer Science, The University of Tennessee, Knoxville, TN, USA; 2. Xu, Bojian (ORCID 0000-0001-5642-6826), Department of Computer Science, Eastern Washington University, Cheney, WA, USA |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/31425048$$D View this record in MEDLINE/PubMed |
| CitedBy_id | crossref_primary_10_3390_a13090224 crossref_primary_10_3390_a13090234 crossref_primary_10_1371_journal_pone_0251047 |
| Cites_doi | 10.1007/978-3-319-07566-2_18 10.1016/j.tcs.2014.11.004 10.1007/3-540-48194-X_17 10.1007/978-3-319-94968-0_18 10.1007/978-3-319-04298-5_44 10.1007/978-3-319-11918-2_16 10.1007/978-3-319-18120-2_19 10.1007/978-3-662-48971-0_63 10.1186/1471-2105-6-123 10.1016/j.tcs.2017.05.032 |
| ContentType | Journal Article |
| DBID | AAYXX CITATION CGR CUY CVF ECM EIF NPM 7X8 |
| DOI | 10.1109/TCBB.2019.2935061 |
| DatabaseName | CrossRef Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed MEDLINE - Academic |
| DatabaseTitle | CrossRef MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic MEDLINE |
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: EIF name: MEDLINE url: https://proxy.k.utb.cz/login?url=https://www.webofscience.com/wos/medline/basic-search sourceTypes: Index Database |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Biology |
| EISSN | 1557-9964 |
| EndPage | 395 |
| ExternalDocumentID | 31425048 10_1109_TCBB_2019_2935061 |
| Genre | Journal Article |
| GroupedDBID | 0R~ 29I 4.4 53G 5GY 5VS 6IK 8US 97E AAJGR AAKMM AALFJ AASAJ AAWTH AAWTV AAYXX ABAZT ABQJQ ABVLG ACGFO ACGFS ACIWK ACM ACPRK ADBCU ADL AEBYY AEFXT AEJOY AENEX AENSD AFRAH AFWIH AFWXC AGQYO AHBIQ AIKLT AKJIK AKQYR AKRVB ALMA_UNASSIGNED_HOLDINGS ASPBG ATWAV AVWKF BDXCO BEFXN BFFAM BGNUA BKEBE BPEOZ CCLIF CITATION CS3 DU5 EBS EJD FEDTE GUFHI HGAVV HZ~ I07 IEDLZ IFIPE IPLJI JAVBF LAI LHSKQ M43 O9- OCL P1C P2P PQQKQ RIA RIE RNS ROL TN5 AAYOK ADPZR AETIX AGSQL AIBXA CGR CUY CVF ECM EIF NPM RIG RNI RZB W7O XOL 7X8 |
| ID | FETCH-LOGICAL-c297t-853b3fa4f8e42fb5432b448250417668b3c04c58e7dedea7877c2392ee7e1ac3 |
| ISSN | 1545-5963 1557-9964 |
| IngestDate | Sun Sep 28 10:00:34 EDT 2025 Thu Apr 03 07:07:55 EDT 2025 Sat Oct 25 04:05:12 EDT 2025 Thu Apr 24 22:57:18 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-c297t-853b3fa4f8e42fb5432b448250417668b3c04c58e7dedea7877c2392ee7e1ac3 |
| ORCID | 0000-0001-5642-6826 0000-0003-3912-2841 |
| PMID | 31425048 |
| PQID | 2336982402 |
| PQPubID | 23479 |
| PageCount | 10 |
| ParticipantIDs | proquest_miscellaneous_2336982402 pubmed_primary_31425048 crossref_primary_10_1109_TCBB_2019_2935061 crossref_citationtrail_10_1109_TCBB_2019_2935061 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-1-1 2021 Jan-Feb 20210101 |
| PublicationDateYYYYMMDD | 2021-01-01 |
| PublicationDate_xml | – month: 01 year: 2021 text: 2021-1-1 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | United States |
| PublicationPlace_xml | – name: United States |
| PublicationTitle | IEEE/ACM transactions on computational biology and bioinformatics |
| PublicationTitleAlternate | IEEE/ACM Trans Comput Biol Bioinform |
| PublicationYear | 2021 |
| SSID | ssj0024904 |
| Score | 2.2611654 |
| Snippet | k-mismatch shortest unique substring (SUS) queries have been proposed and studied very recently due to their useful applications in the subfield of computational... |
| SourceID | proquest pubmed crossref |
| SourceType | Aggregation Database Index Database Enrichment Source |
| StartPage | 386 |
| SubjectTerms | Algorithms Computational Biology - methods Computer Graphics Image Processing, Computer-Assisted Sequence Analysis, DNA - methods |
| Title | Parallel Methods for Finding k-Mismatch Shortest Unique Substrings Using GPU |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/31425048 https://www.proquest.com/docview/2336982402 |
| Volume | 18 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVIEE databaseName: IEEE Electronic Library (IEL) customDbUrl: eissn: 1557-9964 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0024904 issn: 1545-5963 databaseCode: RIE dateStart: 20040101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE |
| linkProvider | IEEE |