A safe and complete algorithm for metagenomic assembly
| Published in | Algorithms for molecular biology Vol. 13; no. 1; pp. 3 - 12 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | London: BioMed Central, 07.02.2018; BioMed Central Ltd; BMC |
| Subjects | |
| ISSN | 1748-7188 1748-7188 |
| DOI | 10.1186/s13015-018-0122-7 |
| Summary: | Background: Reconstructing the genome of a species from short fragments is one of the oldest bioinformatics problems. Metagenomic assembly is a variant of the problem asking to reconstruct the circular genomes of all bacterial species present in a sequencing sample. This problem can be naturally formulated as finding a collection of circular walks of a directed graph $G$ that together cover all nodes, or edges, of $G$. Approach: We address this problem with the “safe and complete” framework of Tomescu and Medvedev (Research in Computational Molecular Biology, 20th Annual Conference, RECOMB 9649:152–163, 2016). An algorithm is called *safe* if it returns only those walks (also called *safe*) that appear as a subwalk in all metagenomic assembly solutions for $G$. A safe algorithm is called *complete* if it returns all safe walks of $G$. Results: We give graph-theoretic characterizations of the safe walks of $G$, and a safe and complete algorithm finding *all* safe walks of $G$. In the node-covering case, our algorithm runs in time $O(m^2 + n^3)$, and in the edge-covering case it runs in time $O(m^2 n)$; $n$ and $m$ denote the number of nodes and edges, respectively, of $G$. This algorithm constitutes the first theoretical tight upper bound on what can be safely assembled from metagenomic reads using this problem formulation. |
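The problem formulation in the summary (a collection of circular walks of a directed graph that together cover all nodes, or all edges) can be illustrated with a small checker. The following is only a sketch of that definition, not the article's safe and complete algorithm; the function names, the adjacency-set graph representation, and the convention that a circular walk is a node list with an implied closing edge are all assumptions made here for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Node = Hashable
Graph = Dict[Node, Set[Node]]  # assumed representation: node -> set of successors


def is_circular_walk(graph: Graph, walk: List[Node]) -> bool:
    """True if consecutive nodes are joined by edges, including last -> first."""
    if not walk:
        return False
    k = len(walk)
    return all(walk[(i + 1) % k] in graph.get(walk[i], set()) for i in range(k))


def walk_edges(walk: List[Node]) -> Set[Tuple[Node, Node]]:
    """Edges traversed by a circular walk, with the closing edge included."""
    k = len(walk)
    return {(walk[i], walk[(i + 1) % k]) for i in range(k)}


def covers_nodes(graph: Graph, walks: Iterable[List[Node]]) -> bool:
    """Node-covering case: every node of the graph lies on some circular walk."""
    walks = list(walks)
    if not all(is_circular_walk(graph, w) for w in walks):
        return False
    all_nodes = set(graph) | {v for succ in graph.values() for v in succ}
    visited = {v for w in walks for v in w}
    return all_nodes <= visited


def covers_edges(graph: Graph, walks: Iterable[List[Node]]) -> bool:
    """Edge-covering case: every edge of the graph is used by some circular walk."""
    walks = list(walks)
    if not all(is_circular_walk(graph, w) for w in walks):
        return False
    all_edges = {(u, v) for u, succ in graph.items() for v in succ}
    used = set().union(*(walk_edges(w) for w in walks)) if walks else set()
    return all_edges <= used


# Toy graph with edges a->b, b->c, b->a, c->a (hypothetical example data).
toy = {"a": {"b"}, "b": {"c", "a"}, "c": {"a"}}
node_cover = [["a", "b", "c"]]              # visits every node, skips edge b->a
edge_cover = [["a", "b", "c"], ["a", "b"]]  # second walk adds edge b->a
```

On this toy graph, the single walk `a, b, c` is a node-covering solution but not an edge-covering one, which shows why the two cases in the summary are treated separately.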