Efficient Computation of Sequence Mappability
Sequence mappability is an important task in genome resequencing. In the ( k , m )-mappability problem, for a given sequence T of length n , the goal is to compute a table whose i th entry is the number of indices j ≠ i such that the length- m substrings of T starting at positions i and j have at m...
Saved in:
| Published in | Algorithmica Vol. 84; no. 5; pp. 1418 - 1440 |
|---|---|
| Main Authors | , , , , , |
| Format | Journal Article |
| Language | English |
| Published |
New York
Springer US
01.05.2022
Springer Nature B.V Springer Verlag |
| Subjects | |
| Online Access | Get full text |
| ISSN | 0178-4617 1432-0541 1432-0541 |
| DOI | 10.1007/s00453-022-00934-y |
Cover
| Summary: | Sequence mappability is an important task in genome resequencing. In the (
k
,
m
)-mappability problem, for a given sequence
T
of length
n
, the goal is to compute a table whose
i
th entry is the number of indices
j
≠
i
such that the length-
m
substrings of
T
starting at positions
i
and
j
have at most
k
mismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of
k
=
1
. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that, for
k
=
O
(
1
)
, works in
O
(
n
)
space and, with high probability, in
O
(
n
·
min
{
m
k
,
log
k
n
}
)
time. Our algorithm requires a careful adaptation of the
k
-errata trees of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. Our technique can also be applied to solve the all-pairs Hamming distance problem introduced by Crochemore et al. [WABI 2017]. We further develop
O
(
n
2
)
-time algorithms to compute
all
(
k
,
m
)-mappability tables for a fixed
m
and all
k
∈
{
0
,
…
,
m
}
or a fixed
k
and all
m
∈
{
k
,
…
,
n
}
. Finally, we show that, for
k
,
m
=
Θ
(
log
n
)
, the (
k
,
m
)-mappability problem cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis fails. This is an improved and extended version of a paper presented at SPIRE 2018. |
|---|---|
| Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ISSN: | 0178-4617 1432-0541 1432-0541 |
| DOI: | 10.1007/s00453-022-00934-y |