A safe and complete algorithm for metagenomic assembly

Bibliographic Details
Published in Algorithms for molecular biology Vol. 13; no. 1; pp. 3 - 12
Main Authors Obscura Acosta, Nidia, Mäkinen, Veli, Tomescu, Alexandru I.
Format Journal Article
Language English
Published London BioMed Central 07.02.2018
BioMed Central Ltd
BMC
ISSN 1748-7188
DOI 10.1186/s13015-018-0122-7

Summary:
Background: Reconstructing the genome of a species from short fragments is one of the oldest bioinformatics problems. Metagenomic assembly is a variant of the problem that asks to reconstruct the circular genomes of all bacterial species present in a sequencing sample. This problem can be naturally formulated as finding a collection of circular walks of a directed graph $G$ that together cover all nodes, or all edges, of $G$.
Approach: We address this problem with the "safe and complete" framework of Tomescu and Medvedev (Research in Computational Molecular Biology, 20th Annual Conference, RECOMB 9649:152–163, 2016). An algorithm is called safe if it returns only those walks (also called safe) that appear as a subwalk in all metagenomic assembly solutions for $G$. A safe algorithm is called complete if it returns all safe walks of $G$.
Results: We give graph-theoretic characterizations of the safe walks of $G$, and a safe and complete algorithm finding all safe walks of $G$. In the node-covering case, our algorithm runs in time $O(m^2 + n^3)$, and in the edge-covering case it runs in time $O(m^2 n)$; $n$ and $m$ denote the number of nodes and edges, respectively, of $G$. This algorithm constitutes the first theoretical tight upper bound on what can be safely assembled from metagenomic reads using this problem formulation.
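
The definitions in the summary can be made concrete with a small sketch. The Python below is an illustration only, not the authors' algorithm: it checks whether a collection of circular walks is a node-covering solution for a directed graph, and whether a walk appears as a subwalk of a given circular walk. The function names, the list-of-nodes encoding of a circular walk (the closing edge from the last node back to the first is implied), and the choice to let a subwalk wrap around the cycle are all assumptions made for this example.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def is_circular_walk(walk: Sequence[Node], edges: Set[Edge]) -> bool:
    """True if consecutive nodes, including last -> first, are joined by edges."""
    if not walk:
        return False
    k = len(walk)
    return all((walk[i], walk[(i + 1) % k]) in edges for i in range(k))


def is_node_covering_solution(
    walks: Iterable[Sequence[Node]], nodes: Set[Node], edges: Set[Edge]
) -> bool:
    """True if every walk is circular in the graph and together they visit all nodes."""
    walks = list(walks)
    if not all(is_circular_walk(w, edges) for w in walks):
        return False
    visited = {v for w in walks for v in w}
    return nodes <= visited


def is_subwalk_of_circular(walk: Sequence[Node], circular: Sequence[Node]) -> bool:
    """True if `walk` can be read off `circular` from some start, wrapping around.

    Assumption for this sketch: the walk may wrap around the cycle, even
    more than once, which the modulo indexing allows.
    """
    k = len(circular)
    if k == 0:
        return False
    return any(
        all(circular[(start + i) % k] == v for i, v in enumerate(walk))
        for start in range(k)
    )


# Hypothetical toy graph: a 3-cycle a -> b -> c -> a plus a back edge b -> a.
nodes = {"a", "b", "c"}
edges = {("a", "b"), ("b", "c"), ("c", "a"), ("b", "a")}

covers_all = is_node_covering_solution([["a", "b", "c"]], nodes, edges)
misses_c = is_node_covering_solution([["a", "b"]], nodes, edges)
wraps = is_subwalk_of_circular(["c", "a", "b"], ["a", "b", "c"])
```

In these terms, a walk is safe when `is_subwalk_of_circular` holds for at least one circular walk in every node-covering solution. Enumerating all solutions is not feasible in general, which is why the graph-theoretic characterizations mentioned in the summary matter.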