A fast algorithm for constructing suffix arrays for DNA alphabets

Bibliographic Details
Published in Journal of King Saud University. Computer and information sciences Vol. 34; no. 7; pp. 4659-4668
Main Authors Rabea, Zeinab; El-Metwally, Sara; Elmougy, Samir; Zakaria, Magdi
Format Journal Article
Language English
Published Elsevier B.V 01.07.2022
Elsevier
Subjects Burrows-Wheeler transform; DNA alphabets; Longest Common Prefix; Suffix arrays; Suffixes
ISSN 1319-1578
2213-1248
DOI 10.1016/j.jksuci.2022.04.015

Abstract The continuous improvement of sequencing technologies has been paralleled by the development of efficient algorithms and data structures for analyzing and processing sequencing data. The suffix array is one of the data structures used to construct the Burrows-Wheeler transform (BWT) of long genomes. Building a suffix array is itself resource-intensive because the computation is dominated by sorting the suffixes in lexicographic order. Most suffix array construction algorithms target general and integer alphabets and do not exploit the special case of fixed-size alphabets such as DNA. In this paper, we exploit the four-letter DNA alphabet and its predefined lexicographic ordering to construct suffix arrays for genomic data correctly and efficiently. The proposed suffix array construction algorithm for DNA alphabets is evaluated on three real data sets of different lengths, ranging from the small E. coli genome to the long Homo sapiens GRCh38.p13 chromosomes. For a long genome, the sequence is divided into parts (i.e., reads) with a minimum overlap length, the suffix array is computed for each part separately, and finally all the partial arrays are merged into a single one. We studied the effect of varying the read and overlap lengths on the running time of the proposed algorithm and conclude that the minimum overlap length should equal the average length of the longest common prefix between adjacent parts.
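The divide, sort, and merge workflow described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify; it is a naive illustration under stated assumptions. The names `RANK`, `encode`, `naive_suffix_array`, and `chunked_suffix_array`, the `$` sentinel, and the parameters `part_len` and `overlap` are all hypothetical. Each part is sorted using only the bases inside its window (the part plus the overlap), which gives the correct order only when the overlap is at least as long as the longest common prefix shared by two suffixes starting in the same part; the final merge compares whole suffixes of the full text.

```python
import heapq

# Fixed lexicographic ranks for the DNA alphabet. "$" is an assumed
# end-of-text sentinel that sorts before every base.
RANK = {"$": 0, "A": 1, "C": 2, "G": 3, "T": 4}


def encode(seq):
    """Map a DNA string to a tuple of small integer ranks."""
    return tuple(RANK[base] for base in seq)


def naive_suffix_array(codes, starts, end):
    """Sort the given start positions by the truncated suffix codes[i:end]."""
    return sorted(starts, key=lambda i: codes[i:end])


def chunked_suffix_array(seq, part_len, overlap):
    """Build a suffix array part by part, then merge the partial arrays.

    Assumes `overlap` is at least the longest common prefix of any two
    suffixes that start in the same part; otherwise a partial array may
    be mis-ordered and the merged result would be wrong.
    """
    codes = encode(seq + "$")
    n = len(codes)
    partial = []
    for start in range(0, n, part_len):
        stop = min(start + part_len, n)
        window_end = min(stop + overlap, n)  # part plus its overlap
        partial.append(naive_suffix_array(codes, range(start, stop), window_end))
    # k-way merge of the sorted partial arrays. The key is the whole suffix
    # of the full text, which is simple but memory-hungry (sketch only).
    return list(heapq.merge(*partial, key=lambda i: codes[i:]))
```

For example, `chunked_suffix_array("GATTACAGATTACA", part_len=5, overlap=3)` returns the same ordering as sorting all suffixes of `"GATTACAGATTACA$"` directly. Slicing tuples for every comparison is quadratic in the worst case, so this sketch is only suitable for short inputs.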
Author Rabea, Zeinab
El-Metwally, Sara (sarah_almetwally4@mans.edu.eg)
Elmougy, Samir
Zakaria, Magdi
CitedBy_id crossref_primary_10_1016_j_ipm_2024_103777
ContentType Journal Article
Copyright 2022 The Author(s)
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 2213-1248
EndPage 4668
ExternalDocumentID oai_doaj_org_article_5577105319204f16ba33e039e5b67e2b
10.1016/j.jksuci.2022.04.015
10_1016_j_jksuci_2022_04_015
S1319157822001434
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 7
Keywords Suffixes
Longest Common Prefix
Suffix arrays
DNA alphabets
Burrows-Wheeler transform
Language English
License This is an open access article under the CC BY-NC-ND license.
cc-by-nc-nd
OpenAccessLink https://doaj.org/article/5577105319204f16ba33e039e5b67e2b
PageCount 10
PublicationCentury 2000
PublicationDate 2022-07-00
2022-07-01
PublicationDateYYYYMMDD 2022-07-01
PublicationDate_xml – month: 07
  year: 2022
  text: 2022-07-00
PublicationDecade 2020
PublicationTitle Journal of King Saud University. Computer and information sciences
PublicationYear 2022
Publisher Elsevier B.V
Elsevier
StartPage 4659
SubjectTerms Burrows-Wheeler transform
DNA alphabets
Longest Common Prefix
Suffix arrays
Suffixes
Title A fast algorithm for constructing suffix arrays for DNA alphabets
URI https://dx.doi.org/10.1016/j.jksuci.2022.04.015
https://doi.org/10.1016/j.jksuci.2022.04.015
https://doaj.org/article/5577105319204f16ba33e039e5b67e2b
UnpaywallVersion publishedVersion
Volume 34