BAMQL: a query language for extracting reads from BAM files
| Published in | BMC bioinformatics Vol. 17; no. 1; p. 305 |
|---|---|
| Main Authors | , , , , , |
| Format | Journal Article |
| Language | English |
| Published | London; BioMed Central; 11.08.2016; BioMed Central Ltd; Springer Nature B.V |
| ISSN | 1471-2105 |
| DOI | 10.1186/s12859-016-1162-y |
| Summary: | Background: It is extremely common to need to select a subset of reads from a BAM file based on their specific properties. Typically, a user unpacks the BAM file to a text stream using SAMtools, parses and filters the lines using AWK, then repacks them using SAMtools. This process is tedious and error-prone. In particular, when working with many columns of data, mix-ups are common and the bit field containing the flags is unintuitive. There are several libraries for reading BAM files, such as Bio-SamTools for Perl and pysam for Python. Both allow access to the BAM’s read information and can filter reads, but require substantial boilerplate code; this is high overhead for mostly ad hoc filtering. Results: We have created a query language that gathers reads using a collection of predicates and common logical connectives. Queries run faster than equivalents and can be compiled to native code for embedding in larger programs. Conclusions: BAMQL provides a user-friendly, powerful and performant way to extract subsets of BAM files for ad hoc analyses or integration into applications. The query language provides a collection of predicates beyond those in SAMtools, and more flexible connectives. |
|---|---|
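
The summary notes that the flag bit field is unintuitive and that reads can be gathered with predicates joined by logical connectives. The sketch below illustrates that idea in plain Python only; it is not BAMQL syntax and does not use pysam. The flag values are those defined in the SAM specification, while the `Read` record, the predicate helpers (`has_flag`, `chromosome`, `min_mapq`) and the connectives (`all_of`, `any_of`, `negate`) are hypothetical names introduced for illustration.

```python
from typing import Callable, NamedTuple

# Flag bits as defined in the SAM specification.
FLAG_PAIRED = 0x1
FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_DUPLICATE = 0x400


class Read(NamedTuple):
    """Hypothetical minimal read record; not a pysam or BAMQL type."""
    name: str
    flag: int
    chrom: str
    mapq: int


Predicate = Callable[[Read], bool]


def has_flag(bit: int) -> Predicate:
    """True when the given bit is set in the read's flag field."""
    return lambda read: bool(read.flag & bit)


def chromosome(name: str) -> Predicate:
    """True when the read is placed on the named reference sequence."""
    return lambda read: read.chrom == name


def min_mapq(threshold: int) -> Predicate:
    """True when the mapping quality is at least the threshold."""
    return lambda read: read.mapq >= threshold


def all_of(*preds: Predicate) -> Predicate:
    """Logical AND over predicates."""
    return lambda read: all(p(read) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Logical OR over predicates."""
    return lambda read: any(p(read) for p in preds)


def negate(pred: Predicate) -> Predicate:
    """Logical NOT of a predicate."""
    return lambda read: not pred(read)


# Made-up example reads, for illustration only.
reads = [
    Read("r1", 99, "chr1", 60),    # 99 = 0x1 + 0x2 + 0x20 + 0x40
    Read("r2", 4, "chr1", 0),      # 4 = 0x4, unmapped
    Read("r3", 1040, "chr2", 30),  # 1040 = 0x400 + 0x10, duplicate on reverse strand
    Read("r4", 16, "chr1", 42),    # 16 = 0x10, reverse strand
]

# Mapped reads on chr1 with mapping quality of at least 30.
query = all_of(chromosome("chr1"), negate(has_flag(FLAG_UNMAPPED)), min_mapq(30))
selected = [r.name for r in reads if query(r)]
print(selected)
```

Naming each flag bit and composing small predicates avoids the column and bit mix-ups that the summary attributes to hand-written AWK filters, at the cost of the boilerplate shown here.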