A Python-based optimization framework for high-performance genomics
| Published in | bioRxiv |
|---|---|
| Format | Paper |
| Language | English |
| Published | Cold Spring Harbor: Cold Spring Harbor Laboratory Press / Cold Spring Harbor Laboratory, 30 October 2020 |
| Edition | 1.1 |
| ISSN | 2692-8205 |
| DOI | 10.1101/2020.10.29.361402 |
| Summary: | Exponentially growing next-generation sequencing data requires high-performance tools and algorithms. Nevertheless, the implementation of high-performance computational genomics software is inaccessible to many scientists because it requires extensive knowledge of low-level software optimization techniques, forcing scientists to resort to high-level software alternatives that are less efficient. Here, we introduce Seq, a Python-based optimization framework that combines the power and usability of high-level languages like Python with the performance of low-level languages like C or C++. Seq allows for shorter, simpler code, is readily usable by a novice programmer, and obtains significant performance improvements over existing languages and frameworks. We showcase and evaluate Seq by implementing seven standard, widely used applications from all stages of the genomics analysis pipeline, including genome index construction, finding maximal exact matches, long-read alignment, and haplotype phasing, and demonstrate that its implementations are up to an order of magnitude faster than existing hand-optimized implementations, with just a fraction of the code. By enabling researchers of all backgrounds to easily implement high-performance analysis tools, Seq further opens the door to the democratization and scalability of computational genomics. Competing Interest Statement: The authors have declared no competing interest. Project links: https://github.com/seq-lang/seq and https://seq-lang.org |
|---|---|