BLESS: Bandwidth and Locality Enhanced SMEM Seeding Acceleration for DNA Sequencing

Bibliographic Details
Published in: 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 582-596
Main Authors: Han, Seunghee; Moon, Seungjae; Suh, Teokkyu; Heo, JaeHoon; Kim, Joo-Young
Format: Conference Proceeding
Language: English
Published: IEEE, 29.06.2024
DOI: 10.1109/ISCA59077.2024.00049

Summary: In an era marked by the pervasive spread of harmful viruses such as COVID-19, the importance of DNA sequencing has grown significantly, given its crucial role in devising effective countermeasures. The seeding process, which aims to find the locations of super-maximal exact matches (SMEMs) between DNA samples and a reference genome for comparative analysis, has emerged as a major bottleneck due to its memory-intensive characteristics. A learned index approach has been developed that uses a machine learning model to partially predict the locations of the exact matches, which has effectively reduced memory accesses. However, the lack of locality in the current indexing structure and the runtime randomness of the seeding workload have constrained memory bandwidth usage and limited further performance gains. In this paper, we propose BLESS, a bandwidth and locality enhanced SMEM seeding accelerator for learned-index-based DNA sequence alignment. BLESS is the first domain-specific seeding accelerator to maximize the potential hardware advantage of the learned index approach. We introduce the coarse-fine (CF) block data structure, a novel memory mapping of seeding parameters that exploits spatial locality and increases effective bandwidth usage for any memory type, including high bandwidth memory (HBM). We also develop the guaranteed search range update (GSRU) algorithm, a method that exploits caching in the search procedure to enable temporal locality and data reuse. Utilizing the CF block and the GSRU algorithm, we develop a multi-core seeding accelerator using HBM with context switching and runtime scheduling for maximum core and memory bandwidth utilization. With these improvements, BLESS achieves 35.65× and 15.49× speedups over the state-of-the-art seeding systems BWA-MEME and ERT-ASIC, respectively, in raw system performance.