Fulcrum: condensing redundant reads from high-throughput sequencing studies
| Published in | Bioinformatics (Oxford, England), Vol. 28, no. 10, pp. 1324-1327 |
|---|---|
| Main Authors | , , | 
| Format | Journal Article | 
| Language | English | 
| Published | Oxford, Oxford University Press, 15.05.2012 |
| Subjects | |
| ISSN | 1367-4803 1367-4811 |
| DOI | 10.1093/bioinformatics/bts123 | 
| Summary: | Motivation: Ultra-high-throughput sequencing produces duplicate and near-duplicate reads, which can consume computational resources in downstream applications. A tool that collapses such reads should reduce storage and assembly complications and costs. Results: We developed Fulcrum to collapse identical and near-identical Illumina and 454 reads (such as those from PCR clones) into single error-corrected sequences; it can process paired-end as well as single-end reads. Fulcrum is customizable and can be deployed on a single machine, a local network or a commercially available MapReduce cluster, and it has been optimized to maximize ease-of-use, cross-platform compatibility and future scalability. Sequence datasets have been collapsed by up to 71%, and the reduced number and improved quality of the resulting sequences allow assemblers to produce longer contigs while using less memory. Availability and implementation: Source code and a tutorial are available at http://pringlelab.stanford.edu/protocols.html under a BSD-like license. Fulcrum was written and tested in Python 2.6, and the single-machine and local-network modes depend on a modified version of the Parallel Python library (provided). Contact: erik.m.lehnert@gmail.com Supplementary information: Supplementary information is available at Bioinformatics online. |
| Bibliography: | Associate Editor: Alex Bateman |
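
The summary describes collapsing identical and near-identical reads into single error-corrected sequences. The sketch below illustrates that general idea only; it is not Fulcrum's actual algorithm, which this record does not describe. It assumes equal-length reads, a greedy clustering around the most abundant sequence, a Hamming-distance threshold and a per-position majority vote. The names `hamming`, `collapse_reads` and `max_mismatches` are hypothetical and chosen for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collapse_reads(reads, max_mismatches=1):
    """Greedily cluster reads and return (consensus, read_count) pairs.

    Hypothetical helper for illustration; not Fulcrum's actual method.
    """
    clusters = []  # each cluster is a list of (sequence, count) pairs
    # Visit the most abundant sequences first so that they seed the clusters.
    for seq, n in Counter(reads).most_common():
        for cluster in clusters:
            seed = cluster[0][0]
            # Only reads of the seed's length are compared (an assumption).
            if len(seed) == len(seq) and hamming(seed, seq) <= max_mismatches:
                cluster.append((seq, n))
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([(seq, n)])

    collapsed = []
    for cluster in clusters:
        consensus = []
        for i in range(len(cluster[0][0])):
            # Majority vote at position i, weighted by read counts.
            votes = Counter()
            for seq, n in cluster:
                votes[seq[i]] += n
            consensus.append(votes.most_common(1)[0][0])
        collapsed.append(("".join(consensus), sum(n for _, n in cluster)))
    return collapsed
```

For example, `collapse_reads(["ACGT", "ACGT", "ACGT", "ACGA", "TTTT"])` groups the single-mismatch read `ACGA` with the three copies of `ACGT` and returns `[("ACGT", 4), ("TTTT", 1)]`. A real tool would also need to handle quality scores, paired-end reads and reads of differing length, none of which this sketch attempts.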