Identifying key soil characteristics for Francisella tularensis classification with optimized Machine learning models

Francisella tularensis (Ft) poses a significant threat to both animal and human populations, given its potential as a bioweapon. Current research on the classification of this pathogen and its relationship with soil physical–chemical characteristics often relies on traditional statistical methods. I...

Full description

Saved in:
Bibliographic Details
Published inScientific reports Vol. 14; no. 1; pp. 1743 - 18
Main Authors Ahmad, Fareed, Javed, Kashif, Tahir, Ahsen, Khan, Muhammad Usman Ghani, Abbas, Mateen, Rabbani, Masood, Shabbir, Muhammad Zubair
Format Journal Article
LanguageEnglish
Published London Nature Publishing Group UK 19.01.2024
Nature Publishing Group
Nature Portfolio
Subjects
Online AccessGet full text
ISSN2045-2322
2045-2322
DOI10.1038/s41598-024-51502-z

Cover

More Information
Summary:Francisella tularensis (Ft) poses a significant threat to both animal and human populations, given its potential as a bioweapon. Current research on the classification of this pathogen and its relationship with soil physical–chemical characteristics often relies on traditional statistical methods. In this study, we leverage advanced machine learning models to enhance the prediction of epidemiological models for soil-based microbes. Our model employs a two-stage feature ranking process to identify crucial soil attributes and hyperparameter optimization for accurate pathogen classification using a unique soil attribute dataset. Optimization involves various classification algorithms, including Support Vector Machines (SVM), Ensemble Models (EM), and Neural Networks (NN), utilizing Bayesian and Random search techniques. Results indicate the significance of soil features such as clay, nitrogen, soluble salts, silt, organic matter, and zinc , while identifying the least significant ones as potassium, calcium, copper, sodium, iron, and phosphorus. Bayesian optimization yields the best results, achieving an accuracy of 86.5% for SVM, 81.8% for EM, and 83.8% for NN. Notably, SVM emerges as the top-performing classifier, with an accuracy of 86.5% for both Bayesian and Random Search optimizations. The insights gained from employing machine learning techniques enhance our understanding of the environmental factors influencing Ft’s persistence in soil. This, in turn, reduces the risk of false classifications, contributing to better pandemic control and mitigating socio-economic impacts on communities.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:2045-2322
2045-2322
DOI:10.1038/s41598-024-51502-z