PathoLive - Real-time pathogen identification from metagenomic Illumina datasets

Bibliographic Details
Published in bioRxiv
Main Authors Tausch, Simon H, Loka, Tobias P, Schulze, Jakob M, Andrusch, Andreas, Klenner, Jeanette, Dabrowski, Piotr Wojciech, Lindner, Martin S, Nitsche, Andreas, Renard, Bernhard Y
Format Paper
Language English
Published Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 14.04.2020
Edition 1.2
ISSN 2692-8205
DOI 10.1101/402370

Summary:
Motivation: Over the past years, NGS has become a crucial workhorse for open-view pathogen diagnostics. However, with massively parallel high-throughput technologies the analysis can only be performed after sequencing has finished, which leads to long turnaround times. The interpretation of results is further complicated by contamination, clinically irrelevant sequences, and the sheer amount and complexity of the data.
Results: We implemented PathoLive, a real-time diagnostics pipeline that detects pathogens in clinical samples hours before sequencing has finished. Based on real-time alignment with HiLive2, mappings are scored with respect to common contaminants, low-entropy regions, and sequences of widespread, non-pathogenic organisms. The results are visualized in an interactive taxonomic tree that provides an easily interpretable overview of the relevance of hits. For a human plasma sample spiked in vitro with six pathogenic viruses, all agents were clearly detected after only 40 of 200 sequencing cycles. For a real-world sample from Sudan, the results correctly indicated the presence of Crimean-Congo hemorrhagic fever virus. In a second real-world dataset, from the 2019 SARS-CoV-2 outbreak in Wuhan, a SARS coronavirus was reported as the most relevant hit even though the reference genome of the novel virus was not included in the database. For all samples, clinically irrelevant hits were correctly deemphasized. Our approach yields fast and accurate NGS-based pathogen identifications and correctly prioritizes and visualizes them based on their clinical significance.
Availability: PathoLive is open source and available on GitLab (https://gitlab.com/rki_bioinformatics/PathoLive) and BioConda (conda install -c bioconda patholive).
Competing Interest Statement: The authors have declared no competing interest.