A retrospective study using machine learning to develop predictive model to identify rotavirus-associated acute gastroenteritis in children

Bibliographic Details
Published in	PeerJ (San Francisco, CA) Vol. 13; p. e19025
Main Authors	Paul, Sourav; Rahman, Minhazur; Dolley, Anutee; Saikia, Kasturi; Shyamsunder Singh, Chongtham; Mohammed, Arifullah; Muteeb, Ghazala; Sarmah, Rosy; Namsa, Nima D.
Format	Journal Article
Language	English
Published	PeerJ Inc., United States, 14 April 2025
Subjects	Accuracy; Artificial intelligence; Bayes Theorem; Child health; Child, Preschool; Children; Collinearity; Computational Biology; Computer-aided medical diagnosis; Correlation analysis; Data Mining and Machine Learning; Datasets; Decision Trees; Dehydration; Developing countries; Diagnosis; Diarrhea; Diarrhea - virology; Disease diagnosis; Female; Gastroenteritis; Gastroenteritis - diagnosis; Gastroenteritis - virology; Gastroenterology and Hepatology; Health aspects; Humans; Immunization; India; Infant; Infectious Diseases; LDCs; Learning algorithms; Machine Learning; Male; Medical science; Methods; Mortality; Pediatrics; Prediction models; Recall; Retrospective Studies; Rotavirus; Rotavirus infections; Rotavirus Infections - complications; Rotavirus Infections - diagnosis; South Africa; Supervised learning; Support Vector Machine; Support vector machines; Variance analysis; Vomiting
ISSN	2167-8359; 2376-5992
DOI	10.7717/peerj.19025

Summary:	Rotavirus is the leading cause of severe dehydrating diarrhea in children under 5 years worldwide. Timely diagnosis is critical, but access to confirmatory testing is limited in hospital settings. Machine learning (ML) models have shown promise in supporting symptom-based diagnosis of several diseases in resource-limited settings. This study aims to develop an ML predictive model that integrates multiple sources of clinical parameters specific to rotavirus infection without relying on laboratory tests. A clinical dataset of 509 children was collected in collaboration with the Regional Institute of Medical Sciences, Imphal, India. The clinical symptoms included diarrhea and its duration, number of stool episodes per day, fever, vomiting and its duration, number of vomiting episodes per day, temperature, and dehydration. Correlation analysis was performed to check feature-feature and feature-outcome collinearity. Feature selection using the ANOVA test was then carried out to obtain feature importance values and a reduced feature subset. Seven supervised learning models were tested: support vector machine (SVM), K-nearest neighbor (KNN), naive Bayes (NB), logistic regression (Log_R), random forest (RF), decision tree (DT), and XGBoost (XGB). The seven models were compared on eight evaluation scores computed from their classification results: accuracy, precision, recall, specificity, F1 score, F2 score, macro F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC).
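The summary does not spell out how the ANOVA-based feature ranking was computed. A minimal sketch, assuming a one-way ANOVA F-test of each numeric feature against the binary rotavirus outcome (the same statistic scikit-learn's `f_classif` reports), is shown here in pure Python. The feature names, the toy values, and the `anova_f` and `select_top_k` helpers are hypothetical and are not the study's data or code.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of one numeric feature across outcome classes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)  # number of classes, number of samples
    if k < 2:
        raise ValueError("ANOVA needs at least two outcome classes")
    grand = mean(values)
    ss_between = 0.0  # spread of class means around the grand mean
    ss_within = 0.0   # spread of samples around their own class mean
    for g in groups.values():
        m = mean(g)
        ss_between += len(g) * (m - grand) ** 2
        ss_within += sum((v - m) ** 2 for v in g)
    if ss_within == 0:
        # Zero within-class spread: infinite F if class means differ, 0 for a constant feature
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(features, labels, k):
    """Rank features by ANOVA F-score and keep the k highest-scoring names."""
    scores = {name: anova_f(col, labels) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: 1 = rotavirus-positive, 0 = negative (values are made up)
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "vomiting_episodes_per_day": [5, 6, 4, 5, 1, 2, 1, 0],
    "stool_episodes_per_day": [8, 7, 9, 6, 5, 7, 6, 8],
    "temperature_c": [38.5, 38.9, 38.2, 38.7, 37.9, 38.4, 37.5, 38.0],
}
top, scores = select_top_k(features, labels, k=2)
```

A higher F-score means the class means differ more relative to the spread within each class, so the feature separates the outcomes better. Categorical inputs such as dehydration grade would need a numeric encoding before this test applies.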
Of the seven ML models, RF performed best, with an accuracy of 81.4%, F1 score of 86.9%, macro F1 score of 77.3%, F2 score of 86.5%, and AUC of 89%. Machine learning models can support symptom-based diagnosis of rotavirus-associated acute gastroenteritis in children, especially in resource-limited settings. Further validation of the models on a larger dataset is needed before they can make predictions for pediatric diarrheal populations with optimal sensitivity and specificity.
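The summary reports these scores without formulas. Under the standard definitions, and treating rotavirus-positive as the positive class (an assumption), all of them except AUC follow from a 2×2 confusion matrix; AUC is computed here from predicted scores using its rank (Mann-Whitney) formulation. The `Confusion` class, the helper names, and the example counts are hypothetical and are not the study's results.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical 2x2 confusion matrix; positive = rotavirus-positive (assumed)."""
    tp: int  # positives predicted positive
    fp: int  # negatives predicted positive
    fn: int  # positives predicted negative
    tn: int  # negatives predicted negative


def _safe_div(num, den):
    # Return 0.0 instead of raising when a denominator is zero
    return num / den if den else 0.0


def f_beta(precision, recall, beta):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta ** 2
    return _safe_div((1 + b2) * precision * recall, b2 * precision + recall)


def metrics(c: Confusion) -> dict:
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)        # sensitivity
    specificity = _safe_div(c.tn, c.tn + c.fp)   # recall of the negative class
    npv = _safe_div(c.tn, c.tn + c.fn)           # precision of the negative class
    f1_pos = f_beta(precision, recall, 1)
    f1_neg = f_beta(npv, specificity, 1)
    return {
        "accuracy": _safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1_pos,
        "f2": f_beta(precision, recall, 2),
        "macro_f1": (f1_pos + f1_neg) / 2,  # unweighted mean over both classes
    }


def roc_auc(scores, labels):
    """AUC = probability a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _safe_div(wins, len(pos) * len(neg))


# Hypothetical counts and scores, not the study's results
m = metrics(Confusion(tp=6, fp=2, fn=3, tn=9))
auc = roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
```

For a binary task, macro F1 averages the F1 of the positive and negative classes. If the reported 86.9% F1 refers to the positive class, the 77.3% macro F1 would imply a negative-class F1 of roughly 2 × 77.3 − 86.9 ≈ 67.7%.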