PROTAX-GPU: a scalable probabilistic taxonomic classification system for DNA barcodes
| Published in | Philosophical transactions of the Royal Society of London. Series B. Biological sciences Vol. 379; no. 1904; p. 20230124 |
|---|---|
| Main Authors | , , , , |
| Format | Journal Article |
| Language | English |
| Published | England: The Royal Society, 24.06.2024 |
| ISSN | 0962-8436, 1471-2970 |
| DOI | 10.1098/rstb.2023.0124 |
| Summary: | DNA-based identification is vital for classifying biological specimens, yet methods to quantify the uncertainty of sequence-based taxonomic assignments are scarce. Challenges arise from noisy reference databases, including mislabelled entries and missing taxa. PROTAX addresses these issues with a probabilistic approach to taxonomic classification, advancing on methods that rely solely on sequence similarity. It provides calibrated probabilistic assignments to a partially populated taxonomic hierarchy, accounting for taxa that lack references and incorrect taxonomic annotation. While effective on smaller scales, global application of PROTAX necessitates substantially larger reference libraries, a goal previously hindered by computational barriers. We introduce PROTAX-GPU, a scalable algorithm capable of leveraging the global Barcode of Life Data System (>14 million specimens) as a reference database. Using graphics processing units (GPU) to accelerate similarity and nearest-neighbour operations and the JAX library for Python integration, we achieve over a 1000× speedup compared with the central processing unit (CPU)-based implementation without compromising PROTAX’s key benefits. PROTAX-GPU marks a significant stride towards real-time DNA barcoding, enabling quicker and more efficient species identification in environmental assessments. This capability opens up new avenues for real-time monitoring and analysis of biodiversity, advancing our ability to understand and respond to ecological dynamics. This article is part of the theme issue ‘Towards a toolkit for global insect biodiversity monitoring’. |
|---|---|
| Bibliography: | Electronic supplementary material is available online at https://doi.org/10.6084/m9.figshare.c.7159016. One contribution of 23 to a theme issue ‘Towards a toolkit for global insect biodiversity monitoring’. |
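
The summary mentions GPU-accelerated similarity and nearest-neighbour operations written with JAX. The sketch below is only a rough illustration of that general idea, not the PROTAX-GPU implementation. The integer encoding, the match-fraction similarity and the function names `encode`, `similarity` and `nearest_references` are all assumptions made for this example.

```python
import jax
import jax.numpy as jnp

# Hypothetical encoding: A, C, G, T -> 0..3; 4 marks a gap or unknown base.
_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq, length):
    """Encode a DNA string as a fixed-length list of ints, padded with 4."""
    codes = [_CODES.get(base, 4) for base in seq[:length]]
    return codes + [4] * (length - len(codes))


@jax.jit
def similarity(queries, refs):
    """Fraction of matching bases over positions that are valid in both.

    queries: (Q, L) int array, refs: (R, L) int array -> (Q, R) float array.
    """
    q = queries[:, None, :]  # (Q, 1, L)
    r = refs[None, :, :]     # (1, R, L)
    valid = (q < 4) & (r < 4)
    matches = jnp.sum((q == r) & valid, axis=-1)
    compared = jnp.maximum(jnp.sum(valid, axis=-1), 1)  # avoid division by zero
    return matches / compared


def nearest_references(queries, refs, k):
    """Return (similarities, indices) of the k most similar references."""
    return jax.lax.top_k(similarity(queries, refs), k)


refs = jnp.array([encode(s, 4) for s in ["ACGT", "ACGA", "TTTT"]])
queries = jnp.array([encode("ACGT", 4)])
values, indices = nearest_references(queries, refs, 2)
print(indices.tolist())  # [[0, 1]]
```

The broadcasted comparison materialises a (Q, R, L) array, so a reference library with millions of sequences would need to be processed in batches; the sketch leaves that out for brevity.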