Source identification of mine water inrush based on GBDT-RS-SHAP

Bibliographic Details
Published in Environmental Earth Sciences Vol. 84; no. 4; p. 114
Main Authors Yang, Zhenwei; Li, Han; Wang, Xinyi; Meng, Hongwei; Xi, Tong; Hou, Zhenhuan
Format Journal Article
Language English
Published Berlin/Heidelberg Springer Berlin Heidelberg 01.02.2025
Springer Nature B.V
ISSN 1866-6280
1866-6299
DOI 10.1007/s12665-025-12107-5

Summary: An interpretable water source identification model that combines gradient boosting decision trees (GBDT) with SHapley Additive exPlanations (SHAP) has been developed to improve safety in coal mining operations. To limit the effect of outliers on model accuracy during training, box plots and multivariate distribution matrix plots were used to detect outlying samples, which were then removed. The cleaned dataset was used to train a GBDT classifier, in which each new tree is fitted to the gradient of the residuals. The model's hyperparameters (number of trees, tree depth, and learning rate) were tuned with a random search algorithm to improve predictive performance. Cross-validation on measured water samples from the Pingdingshan Coalfield yielded a maximum precision of 0.857 and an average precision of 0.602. Applied to 24 unknown water samples, the optimized GBDT model achieved an accuracy of 95.8%, with a single misclassification, and a root mean square error (RMSE) of 0.183. This indicates that random search optimization improves the model's stability and robustness, addresses the inefficiency and inaccuracy of existing coal mine water source identification, and supports water hazard prevention and control in coal mining. To make the model's output transparent, the study uses SHAP, a Python-based model interpretation package designed to explain the predictions of machine learning models. The results show that changes in Ca²⁺ concentration have a substantial effect on the discrimination outcome, whereas the contribution of SO₄²⁻ is negligible and can be disregarded. The work provides a reference for water source studies in mine water emergencies.
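
The kind of attribution SHAP reports can be illustrated with a minimal sketch that computes exact Shapley values for a toy scoring function using only the Python standard library. This is not the authors' model, data, or the `shap` package itself: the function `toy_score`, the feature names, the baseline, and all concentrations below are hypothetical and chosen only so that one feature dominates and one has no influence, mirroring the qualitative pattern described for Ca²⁺ and SO₄²⁻.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to baseline.

    Features outside a coalition are held at their baseline value.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = {f: (x[f] if f in subset else baseline[f]) for f in names}
                with_feature = dict(without)
                with_feature[name] = x[name]
                total += weight * (predict(with_feature) - predict(without))
        phi[name] = total
    return phi


def toy_score(sample):
    # Hypothetical score: depends on Ca and a Ca-HCO3 interaction, ignores SO4.
    return 0.02 * sample["Ca"] + 0.00005 * sample["Ca"] * sample["HCO3"]


# Hypothetical ion concentrations (mg/L); not taken from the paper.
baseline = {"Ca": 50.0, "HCO3": 200.0, "SO4": 100.0}
sample = {"Ca": 120.0, "HCO3": 300.0, "SO4": 400.0}

phi = shapley_values(toy_score, sample, baseline)
for ion, value in phi.items():
    print(f"{ion}: {value:.3f}")
```

The contributions sum to the difference between the score for the sample and the score for the baseline, and a feature the function never uses receives a contribution of zero. For tree ensembles such as GBDT, the `shap` package provides faster tree-specific estimators than this brute-force enumeration.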