Succinct suffix sorting in external memory
| Published in | Information processing & management Vol. 58; no. 1; p. 102378 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | Oxford: Elsevier Ltd, 01.01.2021; Elsevier Science Ltd |
| Subjects | |
| ISSN | 0306-4573 1873-5371 |
| DOI | 10.1016/j.ipm.2020.102378 |
| Summary: | • This research puts forward a novel algorithm called nSAIS for inducing the suffix array using external memory. • The time, space and I/O complexities of nSAIS are linear in the input size. • The constant factor for the space complexity of nSAIS is not more than 6.82. • A program implementing nSAIS is engineered for performance evaluation. • nSAIS applies to datasets of different sizes and characteristics.
Given a size-N input string X, a number of algorithms have been proposed to sort the suffixes of X into the output suffix array using inducing methods. While the existing algorithms eSAIS, DSAIS, and fSAIS achieve remarkable time and space results for suffix sorting in external memory, there is still room for improvement. We propose a new algorithm called nSAIS that reinvents the core inducing procedure of DSAIS with a new set of data structures in order to run faster and use less space. The suffix array is computed recursively, and the inducing procedure on each recursion level is performed block by block to facilitate sequential I/O. If X has a byte alphabet and N = O(M^2/B), where M and B are the sizes of internal memory and of an I/O block, respectively, nSAIS guarantees a workspace of less than N bytes besides input and output while keeping the linear I/O volume O(N), which is the best known so far for external-memory inducing methods. Our experiments on typical settings show that our program for nSAIS with 40-bit integers not only runs faster than the existing representative external-memory algorithms as N grows, but also always uses the least disk space, around 6.1 bytes on average. The techniques proposed in this study can be used to develop fast and succinct suffix sorters in external memory. |
|---|---|
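
For readers unfamiliar with the terms in the summary, the following is a minimal sketch of what a suffix array is and of how large its output is when entries are stored as 40-bit integers. It is not nSAIS and not an inducing method: it is a naive in-memory sort, shown only to make the input and output concrete. The function names `naive_suffix_array` and `entry_bytes` are hypothetical and do not come from the article.

```python
def naive_suffix_array(x: bytes) -> list[int]:
    """Return the start positions of all suffixes of x in lexicographic order.

    Illustration only: this compares whole suffixes in memory, so it is far
    slower than the linear-time inducing methods the summary describes.
    """
    return sorted(range(len(x)), key=lambda i: x[i:])


def entry_bytes(bits: int = 40) -> int:
    """Bytes needed to store one suffix array entry of the given bit width."""
    return (bits + 7) // 8


def output_bytes(n: int, bits: int = 40) -> int:
    """Size in bytes of a suffix array with n entries of the given bit width."""
    return n * entry_bytes(bits)


if __name__ == "__main__":
    print(naive_suffix_array(b"banana"))  # [5, 3, 1, 0, 4, 2]
    print(entry_bytes(40))                # 5
    print(2 ** 40)                        # number of positions a 40-bit entry can index
```

With 40-bit entries each suffix position takes 5 bytes, so the output alone takes 5N bytes for a byte-alphabet input of N bytes, and entries can index inputs of up to 2^40 positions.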