A fast algorithm for constructing suffix arrays for DNA alphabets
          | Published in | Journal of King Saud University. Computer and information sciences Vol. 34; no. 7; pp. 4659 - 4668 | 
|---|---|
| Main Authors | Rabea, Zeinab; El-Metwally, Sara; Elmougy, Samir; Zakaria, Magdi |
| Format | Journal Article | 
| Language | English | 
| Published | Elsevier B.V, 01.07.2022 |
| Subjects | Burrows-Wheeler transform; DNA alphabets; Longest Common Prefix; Suffix arrays; Suffixes |
| ISSN | 1319-1578; EISSN 2213-1248 |
| DOI | 10.1016/j.jksuci.2022.04.015 | 
| Abstract | The continuous improvement of sequencing technologies has been paralleled by the development of efficient algorithms and data structures for analyzing and processing sequencing data. The suffix array is one of the data structures used to construct the Burrows-Wheeler transform (BWT) for long genomes. Building a suffix array is itself resource-intensive, since the computation is dominated by sorting suffixes in lexicographic order. Most suffix array construction algorithms target general and integer alphabets and do not exploit the special case of fixed-size alphabets such as DNA. In this paper, we exploit the four-letter DNA alphabet and its predefined lexicographic ordering to construct suffix arrays for genomic data correctly and efficiently. The suffix array construction algorithm for DNA alphabets is evaluated on three real data sets of different lengths, ranging from the small E. coli genome to long Homo sapiens GRCh38.p13 chromosomes. For long genomes, the sequence is divided into parts (i.e., reads) with a minimum overlap length, the suffix array is computed for each part separately, and finally all partial arrays are merged into a single one. We studied the effect of varying the read and overlap lengths on the running time of the proposed algorithm and conclude that the minimum overlap length should equal the average length of the longest common prefix between adjacent parts. |
| Author | Zakaria, Magdi; Elmougy, Samir; El-Metwally, Sara; Rabea, Zeinab |
    
| Copyright | 2022 The Author(s) | 
    
| Discipline | Computer Science | 
    
| Keywords | Suffixes; Longest Common Prefix; Suffix arrays; DNA alphabets; Burrows-Wheeler transform |
    
| License | This is an open access article under the CC BY-NC-ND license. |
    
| OpenAccessLink | https://doaj.org/article/5577105319204f16ba33e039e5b67e2b | 
    
| PageCount | 10 | 
    
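
The abstract refers to sorting suffixes under the fixed ordering of the DNA alphabet, deriving the BWT from the suffix array, and dividing a long sequence into overlapping parts. The following Python sketch illustrates those three ideas only. It is not the algorithm proposed in the article: it uses a naive comparison sort, and the `$` sentinel, the `DNA_RANK` table, and all function names are assumptions introduced here for illustration. Merging the per-part arrays, which the article addresses, is deliberately left out.

```python
# Illustrative sketch only: naive suffix sorting, NOT the article's algorithm.
# Assumption: a '$' sentinel that sorts before A < C < G < T.
DNA_RANK = {"$": 0, "A": 1, "C": 2, "G": 3, "T": 4}


def suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of text + '$' in lexicographic order."""
    s = text + "$"
    codes = [DNA_RANK[c] for c in s]  # KeyError on symbols outside A, C, G, T
    # Naive approach: compare suffixes as lists of ranks (quadratic memory use).
    return sorted(range(len(s)), key=lambda i: codes[i:])


def bwt_from_suffix_array(text: str, sa: list[int]) -> str:
    """BWT character for each suffix is the symbol preceding it in text + '$'."""
    s = text + "$"
    return "".join(s[i - 1] for i in sa)  # i == 0 wraps around to the sentinel


def longest_common_prefix(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def split_with_overlap(text: str, part_len: int, overlap: int) -> list[tuple[int, str]]:
    """Cut text into (offset, part) pairs; consecutive parts share `overlap` symbols."""
    if not 0 <= overlap < part_len:
        raise ValueError("overlap must satisfy 0 <= overlap < part_len")
    step = part_len - overlap
    parts = []
    start = 0
    while True:
        parts.append((start, text[start:start + part_len]))
        if start + part_len >= len(text):
            break
        start += step
    return parts


if __name__ == "__main__":
    sa = suffix_array("GATTACA")
    print(sa)                                    # [7, 6, 4, 1, 5, 0, 3, 2]
    print(bwt_from_suffix_array("GATTACA", sa))  # ACTGA$TA
    print(split_with_overlap("GATTACA", 4, 2))   # [(0, 'GATT'), (2, 'TTAC'), (4, 'ACA')]
```

In this sketch each part keeps its offset into the original sequence, so positions from a per-part suffix array could be mapped back to global positions before any merge step. The `longest_common_prefix` helper is the kind of measurement the abstract's overlap recommendation relies on; how the article actually computes and averages it is not stated in this record.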