High-performance genome sorting program
This paper is devoted to the practical application of parallel sorting algorithms and parallel input-output methods for the problem of genome alignment. The paper considers different approaches to the implementation of such algorithms, taking into account the capabilities of high-performance systems...
| Published in | Procedia computer science Vol. 193; pp. 464 - 473 |
|---|---|
| Main Authors | Kasilov, Vasily; Drobintsev, Pavel; Voinov, Nikita |
| Format | Journal Article |
| Language | English |
| Published | Elsevier B.V, 2021 |
| Subjects | alignment; BAM; genome; HPC; OpenMP; SAM files; sorting algorithm |
| ISSN | 1877-0509 |
| DOI | 10.1016/j.procs.2021.10.048 |
Cover
| Abstract | This paper addresses the practical application of parallel sorting algorithms and parallel input/output methods to the problem of genome alignment. It considers different approaches to implementing such algorithms, taking into account the capabilities of high-performance systems. The main purpose of the work is to develop a genome sorting program whose efficiency significantly exceeds that of free software analogues. The program is implemented for a supercomputer in C++ using OpenMP and OpenMPI. It runs up to 10 times faster than free software analogues owing to massively parallel data input and output. The approaches to parallelizing data input/output and data processing considered in the paper can also be applied in other subject areas. |
|---|---|
| Author | Kasilov, Vasily; Voinov, Nikita; Drobintsev, Pavel |
| Author_xml | – sequence: 1 givenname: Vasily surname: Kasilov fullname: Kasilov, Vasily organization: Peter the Great St.Petersburg Polytechnic University, Polytechnicheskaya, 29, St.Petersburg, 195251, Russia – sequence: 2 givenname: Pavel surname: Drobintsev fullname: Drobintsev, Pavel organization: Peter the Great St.Petersburg Polytechnic University, Polytechnicheskaya, 29, St.Petersburg, 195251, Russia – sequence: 3 givenname: Nikita surname: Voinov fullname: Voinov, Nikita email: voinov@ics2.ecd.spbstu.ru organization: Peter the Great St.Petersburg Polytechnic University, Polytechnicheskaya, 29, St.Petersburg, 195251, Russia |
| CitedBy_id | crossref_primary_10_38126_JSPG210305 |
| Cites_doi | 10.15690/vramn1108 10.1093/nar/gkp1137 10.1038/nrd.2017.226 10.1007/s12575-009-9004-1 |
| ContentType | Journal Article |
| Copyright | 2021 |
| Copyright_xml | – notice: 2021 |
| DBID | 6I. AAFTH AAYXX CITATION ADTOC UNPAY |
| DOI | 10.1016/j.procs.2021.10.048 |
| DatabaseName | ScienceDirect Open Access Titles Elsevier:ScienceDirect:Open Access CrossRef Unpaywall for CDI: Periodical Content Unpaywall |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: UNPAY name: Unpaywall url: https://proxy.k.utb.cz/login?url=https://unpaywall.org/ sourceTypes: Open Access Repository |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 1877-0509 |
| EndPage | 473 |
| ExternalDocumentID | 10.1016/j.procs.2021.10.048 10_1016_j_procs_2021_10_048 S1877050921020895 |
| ISSN | 1877-0509 |
| IngestDate | Tue Aug 19 19:04:36 EDT 2025 Wed Oct 01 02:35:52 EDT 2025 Thu Apr 24 23:08:10 EDT 2025 Wed May 17 00:10:39 EDT 2023 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | SAM files; genome; OpenMP; HPC; sorting algorithm; alignment; BAM |
| Language | English |
| License | This is an open access article under the CC BY-NC-ND license. cc-by-nc-nd |
| LinkModel | DirectLink |
| OpenAccessLink | https://proxy.k.utb.cz/login?url=https://doi.org/10.1016/j.procs.2021.10.048 |
| PageCount | 10 |
| PublicationCentury | 2000 |
| PublicationDate | 2021 2021-00-00 |
| PublicationDateYYYYMMDD | 2021-01-01 |
| PublicationDate_xml | – year: 2021 text: 2021 |
| PublicationDecade | 2020 |
| PublicationTitle | Procedia computer science |
| PublicationYear | 2021 |
| Publisher | Elsevier B.V |
| Publisher_xml | – name: Elsevier B.V |
| SourceID | unpaywall crossref elsevier |
| SourceType | Open Access Repository Enrichment Source Index Database Publisher |
| StartPage | 464 |
| SubjectTerms | alignment; BAM; genome; HPC; OpenMP; SAM files; sorting algorithm |
| Title | High-performance genome sorting program |
| URI | https://dx.doi.org/10.1016/j.procs.2021.10.048 https://doi.org/10.1016/j.procs.2021.10.048 |
| UnpaywallVersion | publishedVersion |
| Volume | 193 |