Fulcrum: condensing redundant reads from high-throughput sequencing studies


Bibliographic Details
Published in: Bioinformatics (Oxford, England), Vol. 28, no. 10, pp. 1324-1327
Main Authors: Burriesci, Matthew S.; Lehnert, Erik M.; Pringle, John R.
Format: Journal Article
Language: English
Published: Oxford, Oxford University Press, 15.05.2012
ISSN: 1367-4803, 1367-4811
DOI: 10.1093/bioinformatics/bts123

Summary:
Motivation: Ultra-high-throughput sequencing produces duplicate and near-duplicate reads, which can consume computational resources in downstream applications. A tool that collapses such reads should reduce storage and assembly complications and costs.
Results: We developed Fulcrum to collapse identical and near-identical Illumina and 454 reads (such as those from PCR clones) into single error-corrected sequences; it can process paired-end as well as single-end reads. Fulcrum is customizable and can be deployed on a single machine, a local network or a commercially available MapReduce cluster, and it has been optimized to maximize ease-of-use, cross-platform compatibility and future scalability. Sequence datasets have been collapsed by up to 71%, and the reduced number and improved quality of the resulting sequences allow assemblers to produce longer contigs while using less memory.
Availability and implementation: Source code and a tutorial are available at http://pringlelab.stanford.edu/protocols.html under a BSD-like license. Fulcrum was written and tested in Python 2.6, and the single-machine and local-network modes depend on a modified version of the Parallel Python library (provided).
Contact: erik.m.lehnert@gmail.com
Supplementary information: Supplementary information is available at Bioinformatics online.
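Illustration: The general idea of collapsing identical and near-identical reads into consensus sequences can be sketched as follows. This is not Fulcrum's algorithm, which this record does not describe; it is a minimal toy that assumes equal-length, single-end reads, a simple mismatch threshold and a per-position majority vote. The names `hamming`, `collapse_reads`, `percent_collapsed` and `max_mismatches` are hypothetical, and the sketch targets Python 3 rather than the Python 2.6 in which Fulcrum was written.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collapse_reads(reads, max_mismatches=1):
    """Greedily group reads and return one consensus sequence per group.

    Illustrative assumption only: a production tool would also use base
    quality scores and would avoid comparing every read with every group.
    """
    groups = []  # each group is a list of reads; its first read is the seed
    # Visit the most abundant sequences first so that they become seeds.
    for seq, count in Counter(reads).most_common():
        for group in groups:
            seed = group[0]
            if len(seed) == len(seq) and hamming(seed, seq) <= max_mismatches:
                group.extend([seq] * count)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([seq] * count)
    # A majority vote at each position gives the consensus for the group.
    return [
        "".join(Counter(column).most_common(1)[0][0] for column in zip(*group))
        for group in groups
    ]


def percent_collapsed(n_before, n_after):
    """Percentage reduction in the number of sequences."""
    return 100.0 * (n_before - n_after) / n_before
```

For example, calling `collapse_reads` on three copies of `ACGTACGT`, one copy of `ACGTACGA` and two copies of `TTTTCCCC` merges the single-mismatch read into the first group, leaving two sequences from six reads; `percent_collapsed(6, 2)` then reports the reduction in the same sense as the "up to 71%" figure quoted in the summary.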
Associate Editor: Alex Bateman