BAMQL: a query language for extracting reads from BAM files
| Published in | BMC Bioinformatics Vol. 17; no. 1; p. 305 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | London: BioMed Central (BioMed Central Ltd; Springer Nature B.V), 11.08.2016 |
| ISSN | 1471-2105 |
| DOI | 10.1186/s12859-016-1162-y |
| Summary: | Background
It is extremely common to need to select a subset of reads from a BAM file based on their specific properties. Typically, a user unpacks the BAM file to a text stream using SAMtools, parses and filters the lines using AWK, then repacks them using SAMtools. This process is tedious and error-prone. In particular, when working with many columns of data, mix-ups are common and the bit field containing the flags is unintuitive. There are several libraries for reading BAM files, such as Bio-SamTools for Perl and pysam for Python. Both allow access to the BAM’s read information and can filter reads, but require substantial boilerplate code; this is high overhead for mostly ad hoc filtering.
Results
We have created a query language that gathers reads using a collection of predicates and common logical connectives. Queries run faster than equivalents and can be compiled to native code for embedding in larger programs.
Conclusions
BAMQL provides a user-friendly, powerful and performant way to extract subsets of BAM files for ad hoc analyses or integration into applications. The query language provides a collection of predicates beyond those in SAMtools, and more flexible connectives. | 
|---|---|
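
The summary describes queries built from predicates joined by logical connectives, and notes that the SAM flag bit field is unintuitive to work with directly. The following is a minimal Python sketch of that general idea only. It is not BAMQL syntax and not the authors' implementation: the `Read` record, the helper names (`has_flag`, `mapq_at_least`, `on_chromosome`, `all_of`, `any_of`, `negate`, `select`) and the sample reads are all hypothetical, chosen for illustration. The flag bit values are the ones defined in the SAM format specification.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Flag bits from the SAM format specification.
FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_DUPLICATE = 0x400


@dataclass(frozen=True)
class Read:
    """Hypothetical, simplified stand-in for one alignment record."""
    name: str
    flag: int
    chrom: str
    mapq: int


# A predicate is any function that accepts a read and returns True or False.
Predicate = Callable[[Read], bool]


def has_flag(bit: int) -> Predicate:
    """True when the given bit is set in the read's flag field."""
    return lambda read: bool(read.flag & bit)


def mapq_at_least(threshold: int) -> Predicate:
    """True when the read's mapping quality is at least the threshold."""
    return lambda read: read.mapq >= threshold


def on_chromosome(name: str) -> Predicate:
    """True when the read is aligned to the named reference sequence."""
    return lambda read: read.chrom == name


def all_of(*preds: Predicate) -> Predicate:
    """Logical AND over any number of predicates."""
    return lambda read: all(p(read) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Logical OR over any number of predicates."""
    return lambda read: any(p(read) for p in preds)


def negate(pred: Predicate) -> Predicate:
    """Logical NOT of a predicate."""
    return lambda read: not pred(read)


def select(reads: Iterable[Read], pred: Predicate) -> List[Read]:
    """Keep only the reads that satisfy the predicate."""
    return [read for read in reads if pred(read)]


# Made-up sample reads, for illustration only.
reads = [
    Read("r1", FLAG_PAIRED, "chr1", 60),
    Read("r2", FLAG_PAIRED | FLAG_DUPLICATE, "chr1", 60),
    Read("r3", FLAG_UNMAPPED, "*", 0),
    Read("r4", FLAG_PAIRED | FLAG_REVERSE, "chr2", 10),
]

# Paired, not a duplicate, and either on chr1 or confidently mapped.
query = all_of(
    has_flag(FLAG_PAIRED),
    negate(has_flag(FLAG_DUPLICATE)),
    any_of(on_chromosome("chr1"), mapq_at_least(30)),
)

selected = select(reads, query)
print([read.name for read in selected])
```

Naming each flag bit once and composing small predicates keeps the filter readable, in contrast to testing raw integers inside an AWK one-liner. A real filter would read records through a BAM library such as pysam rather than from an in-memory list.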