Efficient assembly consensus algorithms for divergent contig sets

Bibliographic Details
Published in	Computational Biology and Chemistry, Vol. 93, p. 107516
Main Authors	Chateau, Annie, Davot, Tom, Lafond, Manuel
Format	Journal Article
Language	English
Published	Elsevier Ltd, 01.08.2021
Subjects	B.2.4.a Algorithms; Bioinformatics; Computational Complexity; Computer Science; F.2 Analysis of algorithms and problem complexity; L.3.0.j. Molecular biology
ISSN	1476-9271, 1476-928X
DOI	10.1016/j.compbiolchem.2021.107516

More Information
Summary:	Assembly is a fundamental task in genome sequencing, and many assemblers have been made available in the last decade. Because of the wide range of possible choices, it can be hard to determine which tool or parameter setting to use for a specific genome sequencing project. In this paper, we propose a consensus approach that takes the best parts of several contig datasets produced by different methods and combines them into a better assembly. This amounts to orienting and ordering sets of contigs, which can be framed as an optimization problem: find an alignment of two fragmented strings that maximizes an arbitrary scoring function between matched characters. In this work, we investigate the computational complexity of this problem. We first show that it is NP-hard, even over an alphabet of only two symbols and with every score equal to 0 or 1. On the positive side, we propose an efficient, quadratic-time algorithm that achieves an approximation factor of 3.
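To make the optimization problem in the summary concrete, here is a minimal brute-force sketch in Python. It is not the authors' quadratic-time 3-approximation, which this record does not describe. It rests on several assumptions of our own: an alignment is an order-preserving matching of characters with no gap penalties (unmatched characters score nothing), "orienting" a fragment means reversing it (for DNA one would use the reverse complement instead), and the names `align_score`, `layouts` and `best_consensus_score` are hypothetical. Enumerating every order and orientation takes k! * 2^k layouts per side for k fragments, which is only feasible for toy inputs and hints at why the general problem is hard.

```python
from itertools import permutations, product
from typing import Callable, Iterator, Sequence

# Hypothetical scoring function type: score(c, d) for matching characters c and d.
Score = Callable[[str, str], int]


def align_score(s: str, t: str, score: Score) -> int:
    """Best total score of an order-preserving matching between s and t.

    Assumption: unmatched characters cost nothing (no gap penalties).
    Runs in O(len(s) * len(t)) time with a standard dynamic program.
    """
    n, m = len(s), len(t)
    # dp[i][j] = best score when aligning the prefixes s[:i] and t[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j],  # leave s[i-1] unmatched
                dp[i][j - 1],  # leave t[j-1] unmatched
                dp[i - 1][j - 1] + score(s[i - 1], t[j - 1]),  # match them
            )
    return dp[n][m]


def layouts(fragments: Sequence[str]) -> Iterator[str]:
    """Yield every string obtained by ordering and orienting the fragments.

    Assumption: orienting a fragment means keeping it or reversing it.
    """
    for order in permutations(fragments):
        for flips in product((False, True), repeat=len(order)):
            yield "".join(f[::-1] if flip else f for f, flip in zip(order, flips))


def best_consensus_score(a: Sequence[str], b: Sequence[str], score: Score) -> int:
    """Exact optimum by exhaustive search over all layouts of both fragment sets."""
    return max(align_score(x, y, score) for x in layouts(a) for y in layouts(b))


def unit(c: str, d: str) -> int:
    """0/1 scoring over a binary alphabet, as in the hardness result."""
    return 1 if c == d else 0


if __name__ == "__main__":
    print(align_score("011", "110", unit))                 # fixed layouts only
    print(best_consensus_score(["01", "1"], ["110"], unit))  # reorder and reorient
```

With the fragments `["01", "1"]` laid out as given, their concatenation `"011"` scores only 2 against `"110"`, whereas reversing `"01"` and placing it after `"1"` yields `"110"` and a perfect score of 3. Choosing these layouts well is exactly what the consensus problem asks for.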