Fast, parallel, and cache-friendly suffix array construction
| Published in | Algorithms for Molecular Biology, Vol. 19, no. 1, p. 16 |
|---|---|
| Main Authors | , , , , |
| Format | Journal Article |
| Language | English |
| Published | London: BioMed Central, 28 April 2024 (BioMed Central Ltd; Springer Nature B.V.; BMC) |
| ISSN | 1748-7188 |
| DOI | 10.1186/s13015-024-00263-5 |

Summary

Purpose: String indexes such as the suffix array (SA) and the closely related longest common prefix (LCP) array are fundamental objects in bioinformatics and have a wide variety of applications. Despite their importance in practice, few scalable parallel algorithms for constructing them are known, and the existing algorithms can be highly non-trivial to implement and parallelize.
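
To make the two objects concrete, here is a minimal Python sketch (not CaPS-SA) that builds the SA by naively sorting suffix start positions and then derives the LCP array with the linear-time algorithm of Kasai et al. The function names are hypothetical, and `lcp[r]` is taken to mean the LCP of the suffixes at ranks `r - 1` and `r`, with `lcp[0] = 0`, which is one common convention.

```python
def suffix_array(text: str) -> list[int]:
    """Naive SA: sort start positions by the suffix they begin (Python's string
    order already ranks a proper prefix before its extensions, so no sentinel)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: list[int]) -> list[int]:
    """Kasai et al.: lcp[r] = LCP(suffix sa[r - 1], suffix sa[r]); lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0  # current match length; drops by at most 1 per step of i
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix immediately before suffix i in SA order
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp
```

For `"banana"` this gives SA `[5, 3, 1, 0, 4, 2]` and LCP `[0, 1, 3, 0, 0, 2]`. The naive sort materializes every suffix and can take O(n^2 log n) time, which is exactly why scalable construction is a research problem.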

Methods: In this paper we present CaPS-SA, a simple and scalable parallel algorithm for constructing these string indexes, inspired by samplesort and utilizing an LCP-informed mergesort. Due to its design, CaPS-SA has excellent memory locality and thus incurs fewer cache misses and achieves strong performance on modern multicore systems with deep cache hierarchies.
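
One ingredient named above is an LCP-informed mergesort. The sequential Python sketch below illustrates the general LCP-aware merging idea under our own assumptions; it is not the CaPS-SA implementation, and it omits the samplesort partitioning and all parallelism. While merging two sorted suffix lists, it tracks each head's LCP with the last suffix emitted. If those two values differ, the head with the larger value must be the smaller suffix, so no characters are compared; only on a tie are characters examined, starting after the prefix already known to match. All function names are hypothetical.

```python
def _extend(text: str, a: int, b: int, h: int) -> int:
    """LCP of suffixes a and b, given that their first h characters are known to match."""
    n = len(text)
    while a + h < n and b + h < n and text[a + h] == text[b + h]:
        h += 1
    return h


def _less(text: str, a: int, b: int, h: int) -> bool:
    """True if suffix a < suffix b, given that their LCP is exactly h (a != b)."""
    n = len(text)
    if a + h == n:  # suffix a is a proper prefix of suffix b
        return True
    if b + h == n:  # suffix b is a proper prefix of suffix a
        return False
    return text[a + h] < text[b + h]


def lcp_merge(text, sa_a, lcp_a, sa_b, lcp_b):
    """Merge two sorted suffix lists and their LCP arrays (lcp[r] = LCP with rank r - 1)."""
    out_sa, out_lcp = [], []
    i = j = 0
    h_a = h_b = 0  # LCP of heads A[i], B[j] with the last emitted suffix
    while i < len(sa_a) and j < len(sa_b):
        if h_a > h_b:  # A[i] agrees longer with the last output, so A[i] < B[j]
            out_sa.append(sa_a[i])
            out_lcp.append(h_a)
            i += 1
            h_a = lcp_a[i] if i < len(sa_a) else 0
        elif h_b > h_a:  # symmetric case: B[j] < A[i]
            out_sa.append(sa_b[j])
            out_lcp.append(h_b)
            j += 1
            h_b = lcp_b[j] if j < len(sa_b) else 0
        else:  # tie: compare characters, skipping the h known-equal ones
            h = h_a
            k = _extend(text, sa_a[i], sa_b[j], h)
            if _less(text, sa_a[i], sa_b[j], k):
                out_sa.append(sa_a[i])
                out_lcp.append(h)
                i += 1
                h_a = lcp_a[i] if i < len(sa_a) else 0
                h_b = k  # B[j] now shares exactly k characters with the last output
            else:
                out_sa.append(sa_b[j])
                out_lcp.append(h)
                j += 1
                h_b = lcp_b[j] if j < len(sa_b) else 0
                h_a = k
    # At most one list has leftovers; its head's LCP with the last output is tracked.
    if i < len(sa_a):
        out_sa.append(sa_a[i])
        out_lcp.append(h_a)
        out_sa.extend(sa_a[i + 1:])
        out_lcp.extend(lcp_a[i + 1:])
    elif j < len(sa_b):
        out_sa.append(sa_b[j])
        out_lcp.append(h_b)
        out_sa.extend(sa_b[j + 1:])
        out_lcp.extend(lcp_b[j + 1:])
    return out_sa, out_lcp


def lcp_mergesort(text, positions):
    """Sort suffix start positions, returning (SA, LCP) for that subset of suffixes."""
    if len(positions) <= 1:
        return list(positions), [0] * len(positions)
    mid = len(positions) // 2
    sa_a, lcp_a = lcp_mergesort(text, positions[:mid])
    sa_b, lcp_b = lcp_mergesort(text, positions[mid:])
    return lcp_merge(text, sa_a, lcp_a, sa_b, lcp_b)
```

Because every merge also emits the LCP values of its output, a single recursion such as `lcp_mergesort("banana", list(range(6)))` yields the SA and the LCP array together, and characters that are already known to match are never compared again.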

Results: We show that despite its simple design, CaPS-SA outperforms existing state-of-the-art parallel SA and LCP-array construction algorithms on modern hardware. Finally, motivated by applications in modern aligners where the query strings have bounded lengths, we introduce the notion of a bounded-context SA and show that CaPS-SA can easily be extended to exploit this structure to obtain further speedups. We make our code publicly available at https://github.com/jamshed/CaPS-SA.
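
The abstract does not spell out the definition of a bounded-context SA, so the Python sketch below encodes a hypothetical reading: suffixes are ordered only by their first `k` characters, and suffixes sharing the same `k`-character context are left in start-position order (that tie-breaking rule is our assumption). Under such an ordering, the suffixes beginning with any query of length at most `k` still form one contiguous rank range, which two binary searches locate. Both function names are hypothetical.

```python
def bounded_context_sa(text: str, k: int) -> list[int]:
    """Hypothetical bounded-context SA: order suffixes by their first k characters only.

    Ties (suffixes sharing the same k-character context) are broken by start
    position; that tie-breaking rule is an assumption made for this sketch.
    """
    return sorted(range(len(text)), key=lambda i: (text[i:i + k], i))


def pattern_range(text: str, bsa: list[int], pattern: str) -> tuple[int, int]:
    """Half-open rank range [lo, hi) of suffixes starting with pattern.

    Valid only when len(pattern) <= k: ordering by the first k characters
    implies ordering by the first len(pattern) characters, but not beyond k.
    """
    m = len(pattern)

    def head(r: int) -> str:
        # First m characters of the suffix at rank r (shorter near the end of text).
        return text[bsa[r]:bsa[r] + m]

    lo, hi = 0, len(bsa)
    while lo < hi:  # first rank whose head is >= pattern
        mid = (lo + hi) // 2
        if head(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(bsa)
    while lo < hi:  # first rank whose head is > pattern
        mid = (lo + hi) // 2
        if head(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo
```

For `text = "banana"` and `k = 2`, the result is `[5, 1, 3, 0, 2, 4]`: the suffixes `"anana"` (position 1) and `"ana"` (position 3) share the context `"an"` and stay in position order, whereas the full SA places `"ana"` first. Skipping comparisons beyond `k` characters is the kind of saving that bounded-length queries permit.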