High-performance genome sorting program

Bibliographic Details
Published in Procedia Computer Science, Vol. 193, pp. 464-473
Main Authors Kasilov, Vasily; Drobintsev, Pavel; Voinov, Nikita
Format Journal Article
Language English
Published Elsevier B.V. 2021
ISSN 1877-0509
DOI 10.1016/j.procs.2021.10.048

Abstract This paper addresses the practical application of parallel sorting algorithms and parallel input/output methods to the problem of genome alignment. It considers different approaches to implementing such algorithms, taking into account the capabilities of high-performance systems. The main purpose of the work is to develop a genome sorting program whose efficiency significantly exceeds that of free software analogues. The program is implemented for a supercomputer in C++ using OpenMP and OpenMPI. It runs up to 10 times faster than free software analogues, owing to massively parallel data input and output. The approaches to parallelizing data input/output and data processing considered in the paper can also be applied in other subject areas.
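
The abstract does not describe the sorting scheme itself, so the following is only a minimal shared-memory sketch of the general idea of sorting alignment records in parallel with C++ and OpenMP. It is not the authors' implementation. The names `AlignmentRecord`, `by_coordinate` and `parallel_coordinate_sort` are hypothetical, the record layout is a simplified assumption rather than the SAM/BAM format, and the MPI distribution and parallel file input/output mentioned in the abstract are left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, simplified stand-in for one alignment record.
struct AlignmentRecord {
    std::int32_t ref_id;    // index of the reference sequence
    std::int32_t position;  // leftmost mapping position on that reference
    std::string read_name;  // query name, kept only for illustration
};

// Coordinate order: by reference sequence first, then by position.
inline bool by_coordinate(const AlignmentRecord& a, const AlignmentRecord& b) {
    return std::tie(a.ref_id, a.position) < std::tie(b.ref_id, b.position);
}

// Sorts `records` by splitting them into contiguous chunks, sorting the
// chunks independently, then merging neighbouring chunks pairwise.
// The OpenMP pragmas are ignored when OpenMP is disabled, in which case
// the function runs serially with the same result.
void parallel_coordinate_sort(std::vector<AlignmentRecord>& records,
                              std::size_t num_chunks) {
    assert(num_chunks > 0);
    const std::size_t n = records.size();
    if (n < 2) return;
    num_chunks = std::min(num_chunks, n);

    // Chunk i covers the half-open index range [bounds[i], bounds[i + 1]).
    std::vector<std::ptrdiff_t> bounds(num_chunks + 1);
    for (std::size_t i = 0; i <= num_chunks; ++i) {
        bounds[i] = static_cast<std::ptrdiff_t>(i * n / num_chunks);
    }

    // Phase 1: every chunk is sorted independently of the others.
    const int chunks = static_cast<int>(num_chunks);
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < chunks; ++i) {
        std::stable_sort(records.begin() + bounds[i],
                         records.begin() + bounds[i + 1], by_coordinate);
    }

    // Phase 2: merge runs of `width` chunks pairwise; width doubles per round.
    for (int width = 1; width < chunks; width *= 2) {
        const int pairs = (chunks + 2 * width - 1) / (2 * width);
        #pragma omp parallel for schedule(dynamic)
        for (int p = 0; p < pairs; ++p) {
            const int lo = p * 2 * width;
            const int mid = std::min(lo + width, chunks);
            const int hi = std::min(lo + 2 * width, chunks);
            std::inplace_merge(records.begin() + bounds[lo],
                               records.begin() + bounds[mid],
                               records.begin() + bounds[hi], by_coordinate);
        }
    }
}
```

When built with an OpenMP flag such as `-fopenmp` on GCC or Clang, the chunk sorts and the merges within each round can run concurrently. Because `std::stable_sort` and `std::inplace_merge` are both stable, records with equal coordinates keep their input order, which is a design choice of this sketch and not something the abstract states.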
Author_xml – sequence: 1
  givenname: Vasily
  surname: Kasilov
  fullname: Kasilov, Vasily
  organization: Peter the Great St.Petersburg Polytechnic University, Polytechnicheskaya, 29, St.Petersburg, 195251, Russia
– sequence: 2
  givenname: Pavel
  surname: Drobintsev
  fullname: Drobintsev, Pavel
  organization: Peter the Great St.Petersburg Polytechnic University, Polytechnicheskaya, 29, St.Petersburg, 195251, Russia
– sequence: 3
  givenname: Nikita
  surname: Voinov
  fullname: Voinov, Nikita
  email: voinov@ics2.ecd.spbstu.ru
  organization: Peter the Great St.Petersburg Polytechnic University, Polytechnicheskaya, 29, St.Petersburg, 195251, Russia
ContentType Journal Article
Copyright 2021
Copyright_xml – notice: 2021
Discipline Computer Science
EISSN 1877-0509
EndPage 473
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords SAM files
genome
OpenMP
HPC
sorting algorithm
alignment
BAM
Language English
License This is an open access article under the CC BY-NC-ND license.
cc-by-nc-nd
PageCount 10
PublicationTitle Procedia computer science
PublicationYear 2021
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
StartPage 464
Title High-performance genome sorting program
URI https://dx.doi.org/10.1016/j.procs.2021.10.048
https://doi.org/10.1016/j.procs.2021.10.048
UnpaywallVersion publishedVersion
Volume 193