Advancing microbial diagnostics: a universal phylogeny guided computational algorithm to find unique sequences for precise microorganism detection

Bibliographic Details
Published in	Briefings in bioinformatics Vol. 25; no. 6
Main Authors	Sharma, Gulshan Kumar; Sharma, Rakesh; Joshi, Kavita; Qureshi, Sameer; Mathur, Shubhita; Sinha, Sharad; Chatterjee, Samit; Nunia, Vandana
Format	Journal Article
Language	English
Published	England: Oxford University Press, 23.09.2024; Oxford Publishing Limited (England)
Subjects	Algorithms; Bacteria - genetics; Bacteria - isolation & purification; Biotechnology; Computational Biology - methods; Evolutionary algorithms; Humans; Microorganisms; Monkeypox virus - isolation & purification; Mpox; Mycobacterium tuberculosis - isolation & purification; Neisseria gonorrhoeae - isolation & purification; Nodes; Phylogenetics; Phylogeny; Problem Solving Protocol; Search algorithms; Sequence Alignment; Sequence Analysis, DNA - methods; Sequences; Software; Taxonomy; Uniqueness; k-mer; phylogenetic tree; taxonomy; unique sequences; hash map; depth first search
ISSN	1467-5463, 1477-4054
DOI	10.1093/bib/bbae545

Summary:	Sequences derived from organisms that share a common evolutionary origin are similar, whereas unique sequences, absent from related organisms, are good diagnostic marker candidates. However, identifying dissimilar regions among closely related organisms is challenging because it requires complex multiple sequence alignments, which make computation and parsing difficult. To address this, we developed NAUniSeq, a biologically inspired universal algorithm that finds unique sequences for microorganism diagnosis by traversing the phylogeny of life. Mapping through a phylogenetic tree keeps cross-contamination and false positives low. We downloaded complete taxonomy data from the Taxadb database and sequence data from the National Center for Biotechnology Information Reference Sequence Database (NCBI RefSeq) and built a phylogenetic tree with NetworkX. Sequences were assigned to the graph nodes, k-mers were generated for target and non-target nodes, and the graph was searched using the depth-first search algorithm. In a memory-efficient alternative NoSQL approach, we created a collection of RefSeq sequences in a MongoDB database using the tax-id and the path of each FASTA file, and queried this collection for the target and non-target sequences. In both approaches, we used an alignment-free, sliding-window, k-mer-based procedure that quickly compares the k-mers of target and non-target sequences and returns the unique sequences that are not present in the non-target set. We validated the algorithm with the target nodes Mycobacterium tuberculosis, Neisseria gonorrhoeae, and Monkeypox and generated unique sequences. This universal algorithm is a powerful tool for generating diagnostic sequences, enabling accurate identification of microbial strains with high phylogenetic precision.
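
The procedure summarized above (depth-first search over a taxonomy tree, then an alignment-free sliding-window k-mer comparison of target against non-target sequences) can be illustrated with a minimal Python sketch. This is not the NAUniSeq implementation: the tree is a plain dictionary rather than a NetworkX graph or a MongoDB collection, the function names, node names, and sequences are hypothetical, and the sketch neither merges adjacent unique k-mers into longer regions nor considers reverse complements.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (sliding window, step 1)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def descendants_dfs(tree, root):
    """Iterative depth-first search: collect root and every node below it."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(tree.get(node, []))
    return seen


def collect_kmers(nodes, sequences, k):
    """Union of the k-mer sets of every sequence assigned to the given nodes."""
    found = set()
    for node in nodes:
        for seq in sequences.get(node, []):
            found |= kmers(seq, k)
    return found


def unique_kmers(tree, sequences, target, scope_root, k):
    """k-mers present in the target clade but in no other node under scope_root."""
    target_nodes = descendants_dfs(tree, target)
    non_target_nodes = descendants_dfs(tree, scope_root) - target_nodes
    target_set = collect_kmers(target_nodes, sequences, k)
    non_target_set = collect_kmers(non_target_nodes, sequences, k)
    return target_set - non_target_set


# Toy data (hypothetical): one parent node with two child taxa.
tree = {"genus": ["species_a", "species_b"], "species_a": [], "species_b": []}
sequences = {"species_a": ["ACGTACGGA"], "species_b": ["ACGTACGTT"]}

result = unique_kmers(tree, sequences, "species_a", "genus", k=4)
print(sorted(result))
```

Storing k-mers in hash-based sets makes each membership check constant time on average, which is what allows the comparison to avoid a multiple sequence alignment. The choice of scope_root controls how far up the tree the non-target set extends.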