SOF: An Efficient String Graph Construction Algorithm

Bibliographic Details
Published in: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 283-287
Main Authors: Morshed, S. M. Iqbal; Yooseph, Shibu
Format: Conference Proceeding
Language: English
Published: IEEE, 01.11.2019
DOI: 10.1109/BIBM47256.2019.8983393

Summary: In contrast to genome assemblers that use de Bruijn graphs, those based on string graphs are able to losslessly retain information from sequence data. However, despite the advantages provided by a string graph framework in repeat detection and in maintaining read coherence, the high computational cost for constructing a string graph hinders its usability for genome assembly. Even though different algorithms have been proposed over the last decade for string graph construction, efficiency is still a challenge due to the demand for processing a large amount of sequence data generated by Next-Generation Sequencing technologies. In this paper, we provide a novel, linear time and alphabet-size-independent algorithm SOF which uses the property of irreducible edges and transitive edges to efficiently construct a string graph from an overlap graph. Experimental results show that SOF is at least 2.3 times faster than the string graph construction algorithm provided in SGA (one of the most popular string graph-based assemblers), while maintaining almost the same memory footprint as SGA. Moreover, the implementation of SOF as a subprogram in the SGA assembly pipeline will allow a user easy access to the preprocessing and postprocessing steps for genome assembly provided in SGA.

Implementation: https://github.com/iqbalmorshed/sof
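The summary names the idea SOF relies on (separating irreducible edges from transitive edges when turning an overlap graph into a string graph) without showing it. As a hedged illustration of that idea only, and not of SOF's own linear-time algorithm, the Python sketch below builds a toy overlap graph from exact suffix-prefix overlaps and drops transitive edges with a brute-force check, in the spirit of the transitive reduction used in Myers' original string graph formulation. The function names `overlap_graph`, `reduce_transitive` and `spell`, the overhang edge label, and the four toy reads are assumptions made for this example; reverse complements, contained reads and sequencing errors are ignored.

```python
from collections import defaultdict
from itertools import permutations


def overlap_graph(reads, min_olap):
    """Directed overlap graph from exact suffix-prefix overlaps (toy model).

    An edge (u, v) is added when a suffix of reads[u] of length >= min_olap
    equals a prefix of reads[v]. Only the longest proper overlap per ordered
    pair is kept, and the edge is labelled with its overhang: the number of
    bases of reads[v] that extend past the end of reads[u]. Reverse
    complements, contained reads and sequencing errors are ignored.
    """
    edges = {}
    for u, v in permutations(range(len(reads)), 2):
        a, b = reads[u], reads[v]
        # Longest proper overlap first; stop at the first exact match.
        for olap in range(min(len(a), len(b)) - 1, min_olap - 1, -1):
            if a[-olap:] == b[:olap]:
                edges[(u, v)] = len(b) - olap
                break
    return edges


def reduce_transitive(edges):
    """Drop every edge u->w implied by a two-edge path u->v->w.

    The path must be position-consistent: the overhangs along u->v->w must
    add up to the overhang of u->w. Every edge is tested against the original
    graph, so the result does not depend on iteration order. This brute-force
    check costs O(E * max out-degree); it is NOT the linear-time SOF method.
    """
    out = defaultdict(dict)
    for (u, v), hang in edges.items():
        out[u][v] = hang
    kept = {}
    for (u, w), hang_uw in edges.items():
        transitive = any(
            w in out.get(v, {}) and hang_uv + out[v][w] == hang_uw
            for v, hang_uv in out[u].items()
        )
        if not transitive:
            kept[(u, w)] = hang_uw
    return kept


def spell(reads, path, edges):
    """Concatenate reads along a path by appending each edge's overhang."""
    seq = reads[path[0]]
    for u, v in zip(path, path[1:]):
        seq += reads[v][-edges[(u, v)]:]
    return seq


if __name__ == "__main__":
    # Hypothetical error-free reads tiling ATCGGATTCAGC with a step of 2.
    reads = ["ATCGGA", "CGGATT", "GATTCA", "TTCAGC"]
    g = overlap_graph(reads, min_olap=2)
    sg = reduce_transitive(g)
    print(g)   # {(0, 1): 2, (0, 2): 4, (1, 2): 2, (1, 3): 4, (2, 3): 2}
    print(sg)  # {(0, 1): 2, (1, 2): 2, (2, 3): 2}
    print(spell(reads, [0, 1, 2, 3], sg))  # ATCGGATTCAGC
```

On these reads the overlap graph contains two transitive edges, 0->2 and 1->3, each explained by a consistent two-edge path. Removing them leaves only irreducible edges forming the single path 0->1->2->3, and spelling that path recovers the sequence the reads were drawn from. Real assemblers such as SGA must also handle both read orientations and inexact overlaps, which this sketch deliberately leaves out.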