BAMQL: a query language for extracting reads from BAM files

Bibliographic Details
Published in BMC Bioinformatics Vol. 17; no. 1; p. 305
Main Authors Masella, Andre P., Lalansingh, Christopher M., Sivasundaram, Pragash, Fraser, Michael, Bristow, Robert G., Boutros, Paul C.
Format Journal Article
Language English
Published London BioMed Central 11.08.2016
BioMed Central Ltd
Springer Nature B.V
ISSN 1471-2105
DOI 10.1186/s12859-016-1162-y

Summary:
Background: It is extremely common to need to select a subset of reads from a BAM file based on their specific properties. Typically, a user unpacks the BAM file to a text stream using SAMtools, parses and filters the lines using AWK, then repacks them using SAMtools. This process is tedious and error-prone. In particular, when working with many columns of data, mix-ups are common and the bit field containing the flags is unintuitive. There are several libraries for reading BAM files, such as Bio-SamTools for Perl and pysam for Python. Both allow access to the BAM’s read information and can filter reads, but require substantial boilerplate code; this is high overhead for mostly ad hoc filtering.

Results: We have created a query language that gathers reads using a collection of predicates and common logical connectives. Queries run faster than equivalents and can be compiled to native code for embedding in larger programs.

Conclusions: BAMQL provides a user-friendly, powerful and performant way to extract subsets of BAM files for ad hoc analyses or integration into applications. The query language provides a collection of predicates beyond those in SAMtools, and more flexible connectives.
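As an illustration of why the summary calls the flag bit field unintuitive, and of what filtering with predicates and logical connectives can look like, here is a minimal Python sketch that works on SAM text lines. It is not BAMQL syntax and not the authors' implementation; the bit values follow the SAM specification, while every name in it (`SAM_FLAGS`, `decode_flags`, `has_flag`, `on_chromosome`, `all_of`, `any_of`, `negate`, `filter_sam`) and the sample records are hypothetical and chosen only for this example.

```python
from typing import Callable, Iterable, Iterator, List

# Bit values of the SAM FLAG column (column 2), per the SAM specification.
# The dictionary and its key names are illustrative, not part of any library.
SAM_FLAGS = {
    "paired": 0x1,
    "proper_pair": 0x2,
    "unmapped": 0x4,
    "mate_unmapped": 0x8,
    "reverse": 0x10,
    "mate_reverse": 0x20,
    "first_in_pair": 0x40,
    "second_in_pair": 0x80,
    "secondary": 0x100,
    "qc_fail": 0x200,
    "duplicate": 0x400,
    "supplementary": 0x800,
}

Predicate = Callable[[List[str]], bool]


def decode_flags(flag: int) -> List[str]:
    """Return the names of all bits set in a SAM flag value."""
    return [name for name, bit in SAM_FLAGS.items() if flag & bit]


def has_flag(name: str) -> Predicate:
    """Predicate: the named flag bit is set on the read."""
    return lambda fields: bool(int(fields[1]) & SAM_FLAGS[name])


def on_chromosome(name: str) -> Predicate:
    """Predicate: the read's reference name (column 3) equals `name`."""
    return lambda fields: fields[2] == name


def all_of(*preds: Predicate) -> Predicate:
    """Logical AND of predicates."""
    return lambda fields: all(p(fields) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Logical OR of predicates."""
    return lambda fields: any(p(fields) for p in preds)


def negate(pred: Predicate) -> Predicate:
    """Logical NOT of a predicate."""
    return lambda fields: not pred(fields)


def filter_sam(lines: Iterable[str], pred: Predicate) -> Iterator[str]:
    """Yield header lines unchanged and alignment lines that satisfy `pred`."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        if pred(line.rstrip("\n").split("\t")):
            yield line


# Made-up sample records for demonstration only.
records = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t1040\tchrY\t50\t30\t4M\t*\t0\t0\tACGT\tIIII",
]

# Reads on chrY that are not part of a pair.
query = all_of(on_chromosome("chrY"), negate(has_flag("paired")))
kept = [l.split("\t")[0] for l in filter_sam(records, query) if not l.startswith("@")]
print(decode_flags(99))
print(kept)
```

A bare flag value such as 99 says little until it is decoded bit by bit, which is the kind of mix-up the summary describes; naming each bit and combining named predicates keeps the intent of a filter readable.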