Space–time trade-offs for finding shortest unique substrings and maximal unique matches

Bibliographic Details
Published in: Theoretical Computer Science, Vol. 700, pp. 75–88
Main Authors: Ganguly, Arnab; Hon, Wing-Kai; Shah, Rahul; Thankachan, Sharma V.
Format: Journal Article
Language: English
Published: Elsevier B.V., 14.11.2017
ISSN: 0304-3975
1879-2294
DOI: 10.1016/j.tcs.2017.08.002

Summary: Given a string X[1,n] and a position k ∈ [1,n], a Shortest Unique Substring of X covering k, denoted by S_k, is a substring X[i,j] of X that satisfies the following conditions: (i) i ≤ k ≤ j, (ii) i is the only position where there is an occurrence of X[i,j], and (iii) j − i is minimized. Current best-known algorithms for finding S_k require Θ(n) words of working space and O(n) time. Let τ be a given parameter. We present the following new results.
• Given a k ∈ [1,n], we can compute S_k in O(nτ² log(n/τ)) time using X and an additional O(n/τ) words of working space.
• For every k ∈ [1,n], we can compute S_k in O(nτ² log n) time using X, and an additional O(n/τ) words and 4n+o(n) bits of working space.
• We present an O(nτ log^{c+1} n)-time randomized algorithm that uses n/log^c n words in addition to that mentioned above, where c ≥ 0 is an arbitrary constant. In this case, the reported string is unique and covers k but, with probability at most n^{−O(1)}, may not be the shortest.
By choosing τ = ω(1), our results imply the first sub-linear space (in addition to the input string) solution to these problems. We also present the following two results.
• An algorithm that finds S_k for every k ∈ [1,n] using O(n log σ) bits of working space in O(n log n) time, where σ ≤ n is the number of distinct symbols in X.
• A 4n+o(n)-bit index that can report S_k for any k in O(1) time.
As a consequence of our techniques, we also obtain similar space-and-time tradeoffs for a related problem of finding Maximal Unique Matches of two strings (Delcher et al., 1999 [14]).
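
To make conditions (i) to (iii) of the definition concrete, here is a minimal brute-force sketch in Python. It is not the algorithm of the article and achieves none of the stated space or time bounds; it simply tries substrings in order of increasing length. The function names `count_occurrences` and `shortest_unique_substring` are hypothetical, positions are 1-indexed as in the summary, and ties between equally short answers are broken by the leftmost start, which is an arbitrary choice made for this illustration.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def shortest_unique_substring(x, k):
    """Return (i, j), 1-indexed and inclusive, such that X[i,j] covers k,
    occurs exactly once in X, and j - i is minimal.

    Naive illustration only: ties are broken by the leftmost i
    (an assumption of this sketch, not part of the definition).
    """
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("k must lie in [1, n]")
    for length in range(1, n + 1):
        # Start positions i with i <= k <= i + length - 1, clipped to X.
        lo = max(1, k - length + 1)
        hi = min(k, n - length + 1)
        for i in range(lo, hi + 1):
            j = i + length - 1
            if count_occurrences(x, x[i - 1:j]) == 1:
                return i, j
    # Not reached for valid input: X[1,n] itself occurs exactly once.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    i, j = shortest_unique_substring("banana", 4)
    print(i, j, "banana"[i - 1:j])
```

For X = "banana" and k = 4, no substring of length 1 or 2 covering position 4 is unique ("a", "na" and "an" all repeat), and "ana" occurs twice, so the sketch returns (3, 5), the substring "nan".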