RabbitSketch: a high-performance sketching library for genome analysis
| Published in | Bioinformatics (Oxford, England) Vol. 41; no. 5 |
|---|---|
| Main Authors | , , , , , , , |
| Format | Journal Article |
| Language | English |
| Published | England, Oxford University Press, 06.05.2025 |
| ISSN | 1367-4803, 1367-4811 |
| DOI | 10.1093/bioinformatics/btaf249 |
| Summary: | We present RabbitSketch, a highly optimized library of sketching algorithms such as MinHash, OrderMinHash, and HyperLogLog that can exploit the power of modern multi-core CPUs. It provides significant speedups compared to existing implementations, ranging from 2.30× to 49.55×, as well as flexible and easy-to-use interfaces for both Python and C++. As a result, the similarity analysis of 455 GB of genomic data can be completed in only 5 minutes using RabbitSketch with merely 20 lines of Python code. As a case study, we enhanced RabbitTClust by integrating RabbitSketch's Kssd algorithm, resulting in a 1.54× speedup with no loss in accuracy. RabbitSketch is available at https://github.com/RabbitBio/RabbitSketch with an archived version at Zenodo: https://doi.org/10.5281/zenodo.14903962. Detailed API documentation is available at https://rabbitsketch.readthedocs.io/en/latest. |
|---|---|
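
For readers unfamiliar with the sketching idea the summary refers to, the following is a minimal pure-Python illustration of a bottom-s MinHash sketch and the Jaccard estimate derived from it. It does not use RabbitSketch and does not reflect its API or its optimizations; the function names (`kmers`, `hash64`, `bottom_sketch`, `jaccard_estimate`), the BLAKE2b hash, and the default k-mer and sketch sizes are all assumptions chosen for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Map a k-mer to a stable 64-bit integer (BLAKE2b truncated to 8 bytes)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(seq, k=5, size=64):
    """Bottom-s MinHash sketch: the `size` smallest distinct k-mer hashes, sorted."""
    return sorted({hash64(km) for km in kmers(seq, k)})[:size]


def jaccard_estimate(sketch_a, sketch_b, size=64):
    """Estimate Jaccard similarity of two k-mer sets from their bottom-s sketches."""
    # Take the `size` smallest hashes of the combined sketches ...
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    # ... and count how many of them occur in both sketches.
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    a = "ACGTACGTTGCAACGTTAGC"  # toy sequences, not real genomic data
    b = "ACGTACGTTGCAACGTTAGG"
    print(jaccard_estimate(bottom_sketch(a), bottom_sketch(b)))
```

Because each sequence is reduced to at most `size` integers, comparing two sketches costs the same regardless of genome length, which is what makes sketch-based similarity analysis practical at the data volumes the summary mentions.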