Utilizing machine learning to classify persistent organic pollutants in the serum of pregnant women: a predictive modeling approach

Bibliographic Details
Published in Environmental science and pollution research international Vol. 31; no. 40; pp. 52980 - 52995
Main Authors Mahfouz, Maya, Mahfouz, Yara, Harmouche-Karaki, Mireille, Matta, Joseph, Younes, Hassan, Helou, Khalil, Finan, Ramzi, Abi-Tayeh, Georges, Meslimani, Mohamad, Moussa, Ghada, Chahrour, Nada, Osseiran, Camille, Skaiki, Farouk, Narbonne, Jean-François
Format Journal Article
Language English
Published Berlin/Heidelberg Springer Berlin Heidelberg 01.08.2024
Springer Nature B.V
Springer Verlag
ISSN 1614-7499
0944-1344
DOI 10.1007/s11356-024-34684-x

Summary: Polychlorinated biphenyls (PCBs), organochlorine pesticides (OCPs), polychlorinated dibenzo-p-dioxins and polychlorinated dibenzofurans (PCDD/Fs), and per- and poly-fluoroalkyl substances (PFAS) are persistent organic pollutants (POPs) that remain detrimental to critical subpopulations, namely pregnant women. The tests required for biomonitoring are expensive. Moreover, statistical models that aim to uncover relationships between pollutant levels and human characteristics have their limitations. The objective of this study is therefore to use machine learning predictive models to further examine the predictors of each pollutant and to compare the models. Levels of 33 congeners were measured in the serum of 269 pregnant women, from whom data were collected on sociodemographic, dietary, environmental, and anthropometric characteristics. Several machine learning algorithms were compared in Python for each pollutant: support vector machine (SVM), random forest, XGBoost, and neural networks. The aforementioned characteristics were included in the models as features. Prediction, accuracy, precision, recall, F1-score, area under the ROC curve (AUC), sensitivity, and specificity were retrieved to compare the models with one another and across pollutants. Random forest was the highest-performing model for all pollutants. Results showed moderate to acceptable performance and discriminative power across all POPs, with the OCP model performing slightly better than the others. The top features for each model were also presented using SHAP analysis, detailing each predictor’s negative or positive impact on the model. In conclusion, developing such a tool is of major importance in a context of limited financial and research resources. Nevertheless, machine learning models should always be interpreted with caution by exploring all evaluation metrics.
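
The evaluation metrics named in the summary can be illustrated with a minimal sketch for a binary classifier. This is not the authors' code: the labels, scores, and the 0.5 decision threshold below are hypothetical, and the function names are chosen here for illustration only. Note that recall and sensitivity are the same quantity for the positive class, so both appear under one formula.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall (= sensitivity), specificity, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,            # identical to sensitivity
        "sensitivity": recall,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = pollutant level above a cut-off, 0 = below it.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]  # made-up model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"auc: {auc:.3f}")
```

On imbalanced data such as this toy example, accuracy and specificity can look better than precision and recall, which is one reason to inspect all metrics together rather than any single one.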