PROTAX-GPU: a scalable probabilistic taxonomic classification system for DNA barcodes

Bibliographic Details
Published in: Philosophical transactions of the Royal Society of London. Series B. Biological sciences Vol. 379; no. 1904; p. 20230124
Main Authors: Li, Roy; Ratnasingham, Sujeevan; Zarubiieva, Iuliia; Somervuo, Panu; Taylor, Graham W.
Format: Journal Article
Language: English
Published: England, The Royal Society, 24.06.2024
ISSN: 0962-8436, 1471-2970
DOI: 10.1098/rstb.2023.0124

Summary: DNA-based identification is vital for classifying biological specimens, yet methods to quantify the uncertainty of sequence-based taxonomic assignments are scarce. Challenges arise from noisy reference databases, including mislabelled entries and missing taxa. PROTAX addresses these issues with a probabilistic approach to taxonomic classification, advancing on methods that rely solely on sequence similarity. It provides calibrated probabilistic assignments to a partially populated taxonomic hierarchy, accounting for taxa that lack references and incorrect taxonomic annotation. While effective on smaller scales, global application of PROTAX necessitates substantially larger reference libraries, a goal previously hindered by computational barriers. We introduce PROTAX-GPU, a scalable algorithm capable of leveraging the global Barcode of Life Data System (>14 million specimens) as a reference database. Using graphics processing units (GPU) to accelerate similarity and nearest-neighbour operations and the JAX library for Python integration, we achieve over a 1000 × speedup compared with the central processing unit (CPU)-based implementation without compromising PROTAX’s key benefits. PROTAX-GPU marks a significant stride towards real-time DNA barcoding, enabling quicker and more efficient species identification in environmental assessments. This capability opens up new avenues for real-time monitoring and analysis of biodiversity, advancing our ability to understand and respond to ecological dynamics. This article is part of the theme issue ‘Towards a toolkit for global insect biodiversity monitoring’.
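
Illustrative note: the summary refers to similarity and nearest-neighbour operations accelerated with JAX. The sketch below is not taken from the article and is not the authors' implementation; it only shows, under stated assumptions, what such an operation can look like. The integer encoding, the similarity measure (fraction of matching bases over positions known in both sequences), the pre-aligned equal-length sequences, and the names `encode`, `similarity` and `nearest_references` are all hypothetical.

```python
import jax
import jax.numpy as jnp

# Hypothetical encoding: A, C, G, T -> 0..3; anything else (gap, N) -> 4.
_BASES = {"A": 0, "C": 1, "G": 2, "T": 3}
UNKNOWN = 4


def encode(seq):
    """Turn an aligned DNA string into an int8 vector of base codes."""
    return jnp.array([_BASES.get(ch, UNKNOWN) for ch in seq.upper()],
                     dtype=jnp.int8)


@jax.jit
def similarity(query, refs):
    """Similarity of one query (L,) to every reference in refs (N, L).

    Assumed measure: matching bases divided by the number of positions
    where both the query and the reference hold a known base.
    """
    known = (query != UNKNOWN) & (refs != UNKNOWN)   # broadcasts to (N, L)
    matches = (query == refs) & known
    n_known = jnp.maximum(known.sum(axis=1), 1)      # avoid division by zero
    return matches.sum(axis=1) / n_known


def nearest_references(query, refs, k):
    """Return (similarities, indices) of the k most similar references."""
    return jax.lax.top_k(similarity(query, refs), k)


# Toy reference set of three aligned sequences (made-up data).
refs = jnp.stack([encode(s) for s in ["ACGTAC", "ACGTTT", "TTTTTT"]])
query = encode("ACGTAA")

values, indices = nearest_references(query, refs, k=2)
print(indices)  # the two closest references are rows 0 and 1
```

Comparing the query against all references in a single array operation is what lets the same code run unchanged on a CPU or a GPU under JAX; a reference library of millions of sequences would additionally need batching, which this sketch omits.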
Electronic supplementary material is available online at https://doi.org/10.6084/m9.figshare.c.7159016.
One contribution of 23 to a theme issue ‘Towards a toolkit for global insect biodiversity monitoring’.