A Python-based optimization framework for high-performance genomics

Bibliographic Details
Published in	bioRxiv
Main Authors	Shajii, Ariya; Numanagić, Ibrahim; Leighton, Alexander T; Greenyer, Haley; Amarasinghe, Saman; Berger, Bonnie
Format	Paper
Language	English
Published	Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 30.10.2020
Edition	1.1
Subjects	Bioinformatics; Computer applications; Computer programs; Genomics; Haplotypes; Next-generation sequencing; Optimization techniques; Software; high-performance; domain-specific languages; Computational genomics; sequencing
ISSN	2692-8205
DOI	10.1101/2020.10.29.361402

More Information
Summary:	Exponentially growing next-generation sequencing data requires high-performance tools and algorithms. Nevertheless, the implementation of high-performance computational genomics software is inaccessible to many scientists because it requires extensive knowledge of low-level software optimization techniques, forcing scientists to resort to high-level software alternatives that are less efficient. Here, we introduce Seq, a Python-based optimization framework that combines the power and usability of high-level languages like Python with the performance of low-level languages like C or C++. Seq allows for shorter, simpler code, is readily usable by a novice programmer, and obtains significant performance improvements over existing languages and frameworks. We showcase and evaluate Seq by implementing seven standard, widely used applications from all stages of the genomics analysis pipeline, including genome index construction, finding maximal exact matches, long-read alignment and haplotype phasing, and demonstrate that its implementations are up to an order of magnitude faster than existing hand-optimized implementations, with just a fraction of the code. By enabling researchers of all backgrounds to easily implement high-performance analysis tools, Seq further opens the door to the democratization and scalability of computational genomics.
Competing Interest Statement:	The authors have declared no competing interest.
Footnotes:	https://github.com/seq-lang/seq; https://seq-lang.org