A safe and complete algorithm for metagenomic assembly
| Published in | Algorithms for molecular biology Vol. 13; no. 1; pp. 3 - 12 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | London: BioMed Central (BioMed Central Ltd, BMC), 07.02.2018 |
| ISSN | 1748-7188 |
| DOI | 10.1186/s13015-018-0122-7 |
| Summary: | Background
Reconstructing the genome of a species from short fragments is one of the oldest bioinformatics problems. Metagenomic assembly is a variant of the problem asking to reconstruct the circular genomes of all bacterial species present in a sequencing sample. This problem can be naturally formulated as finding a collection of circular walks of a directed graph $G$ that together cover all nodes, or edges, of $G$.
Approach
We address this problem with the “safe and complete” framework of Tomescu and Medvedev (Research in Computational Molecular Biology, 20th Annual Conference, RECOMB 9649:152–163, 2016). An algorithm is called safe if it returns only those walks (also called safe) that appear as a subwalk in all metagenomic assembly solutions for $G$. A safe algorithm is called complete if it returns all safe walks of $G$.
Results
We give graph-theoretic characterizations of the safe walks of $G$, and a safe and complete algorithm finding all safe walks of $G$. In the node-covering case, our algorithm runs in time $O(m^2 + n^3)$, and in the edge-covering case it runs in time $O(m^2 n)$; $n$ and $m$ denote the number of nodes and edges, respectively, of $G$. This algorithm constitutes the first theoretical tight upper bound on what can be safely assembled from metagenomic reads using this problem formulation. | 
|---|---|
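
The problem formulation in the summary (a collection of circular walks of a directed graph that together cover all nodes, or all edges) can be made concrete with a small sketch. The Python below is only an illustration of those definitions, not the algorithm of the article: the function names, the adjacency-set graph representation, and the toy graph are all assumptions introduced here. It checks whether a collection of walks is a node-covering or edge-covering solution, and whether a given walk appears as a subwalk of a circular walk.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: each node maps to the set of its out-neighbours.
Graph = Dict[str, Set[str]]


def is_circular_walk(graph: Graph, walk: List[str]) -> bool:
    """A walk v0, ..., v(k-1) is circular if every consecutive pair,
    including the wrap-around pair (v(k-1), v0), is an edge of the graph."""
    if not walk:
        return False
    k = len(walk)
    return all(walk[(i + 1) % k] in graph.get(walk[i], set()) for i in range(k))


def walk_edges(walk: List[str]) -> Set[Tuple[str, str]]:
    """Edges traversed by a circular walk, including the closing edge."""
    k = len(walk)
    return {(walk[i], walk[(i + 1) % k]) for i in range(k)}


def is_node_covering(graph: Graph, walks: List[List[str]]) -> bool:
    """True if all walks are circular and together visit every node."""
    nodes = set(graph) | {v for outs in graph.values() for v in outs}
    covered = {v for w in walks for v in w}
    return all(is_circular_walk(graph, w) for w in walks) and covered == nodes


def is_edge_covering(graph: Graph, walks: List[List[str]]) -> bool:
    """True if all walks are circular and together traverse every edge."""
    edges = {(u, v) for u, outs in graph.items() for v in outs}
    covered = set().union(*(walk_edges(w) for w in walks))
    return all(is_circular_walk(graph, w) for w in walks) and covered == edges


def appears_as_subwalk(sub: List[str], circular: List[str]) -> bool:
    """True if the node sequence `sub` can be read along `circular`,
    starting anywhere and wrapping around the end if needed."""
    k = len(circular)
    return any(
        all(circular[(s + j) % k] == sub[j] for j in range(len(sub)))
        for s in range(k)
    )


# Toy graph (an assumption for illustration): two cycles sharing node "b".
toy_graph: Graph = {
    "a": {"b"},
    "b": {"c", "d"},
    "c": {"a"},
    "d": {"b"},
}

one_walk = [["a", "b", "d", "b", "c"]]     # a single circular walk
two_walks = [["a", "b", "c"], ["b", "d"]]  # two circular walks

print(is_node_covering(toy_graph, one_walk))
print(is_edge_covering(toy_graph, two_walks))
print(appears_as_subwalk(["d", "b", "c"], one_walk[0]))
```

In this toy graph both collections cover all nodes and all edges, yet the walk `d, b, c` is a subwalk of the single long walk and of neither walk in the second collection. A walk is safe only if it appears in every solution, which is the property the article's characterizations decide without enumerating solutions.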