RabbitSketch: a high-performance sketching library for genome analysis

Bibliographic Details
Published in: Bioinformatics (Oxford, England), Vol. 41, no. 5
Main Authors: Zhang, Tong; Yin, Zekun; Xu, Xiaoming; Yan, Lifeng; Zhu, Fangjin; Duan, Xiaohui; Schmidt, Bertil; Liu, Weiguo
Format: Journal Article
Language: English
Published: England, Oxford University Press, 06.05.2025
ISSN: 1367-4811, 1367-4803
DOI: 10.1093/bioinformatics/btaf249

More Information
Summary: We present RabbitSketch, a highly optimized library of sketching algorithms, including MinHash, OrderMinHash, and HyperLogLog, that exploits the power of modern multi-core CPUs. It achieves speedups of 2.30× to 49.55× over existing implementations and provides flexible, easy-to-use interfaces for both Python and C++. As a result, the similarity analysis of 455 GB of genomic data can be completed in only 5 minutes using RabbitSketch with merely 20 lines of Python code. As a case study, we enhanced RabbitTClust by integrating RabbitSketch's Kssd algorithm, resulting in a 1.54× speedup with no loss in accuracy. RabbitSketch is available at https://github.com/RabbitBio/RabbitSketch with an archived version at Zenodo: https://doi.org/10.5281/zenodo.14903962. Detailed API documentation is available at https://rabbitsketch.readthedocs.io/en/latest.
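
Illustration: for readers unfamiliar with sketching, the following is a minimal pure-Python sketch of the general bottom-s MinHash idea named in the summary. It is not RabbitSketch's API or implementation. The function names (kmers, hash64, minhash_sketch, jaccard_estimate), the BLAKE2b hash, and the default k and sketch size are all assumptions chosen for illustration only.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Map a k-mer to a 64-bit integer (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(seq, k=5, size=64):
    """Bottom-s sketch: the `size` smallest distinct k-mer hashes, sorted."""
    return sorted({hash64(km) for km in kmers(seq, k)})[:size]


def jaccard_estimate(a, b, size=64):
    """Estimate the Jaccard similarity of two sets from their sketches.

    Takes the `size` smallest hashes of the merged sketches and reports
    the fraction of them that occur in both sketches.
    """
    union_bottom = sorted(set(a) | set(b))[:size]
    if not union_bottom:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in union_bottom if h in shared) / len(union_bottom)
```

A hypothetical call would be jaccard_estimate(minhash_sketch(x), minhash_sketch(y)) for two sequences x and y. When a sequence has fewer distinct k-mers than the sketch size, the sketch holds every hash and the estimate equals the exact k-mer Jaccard similarity (barring hash collisions). An optimized library would be expected to replace each of these steps with vectorized, multi-threaded code.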