Forecasting maximal and minimal air temperatures using explainable machine learning: Shapley additive explanation versus local interpretable model-agnostic explanations

Bibliographic Details
Published in	Stochastic Environmental Research and Risk Assessment, Vol. 39, no. 6, pp. 2551-2581
Main Authors	Daif, Noureddine; Di Nunno, Fabio; Granata, Francesco; Difi, Salah; Kisi, Ozgur; Heddam, Salim; Kim, Sungwon; Adnan, Rana Muhammad; Zounemat-Kermani, Mohammad
Format	Journal Article
Language	English
Published	Berlin/Heidelberg: Springer Berlin Heidelberg, 01.06.2025; Springer Nature B.V.
Subjects	Accuracy; Air temperature; Algorithms; Aquatic Pollution; Artificial intelligence; Chemistry and Earth Sciences; Climate change; Climate models; Climatic data; Computational Intelligence; Computer Science; Correlation coefficient; Correlation coefficients; Datasets; Decision making; Deep learning; Earth and Environmental Science; Earth Sciences; Environment; Environmental risk; Explainable artificial intelligence; Forecasting; Lead time; Learning algorithms; Machine learning; Math. Appl. in Environmental Science; Neural networks; Original Paper; Physics; Probability Theory and Stochastic Processes; Public health; Risk management; Root-mean-square errors; Statistics for Engineering; Support vector machines; Temperature; Waste Water Technology; Water Management; Water Pollution Control; Weather stations; Pakistan; AdaBoost; SHAP; T; XGBoost; LightGBM; LIME; CatBoost
ISSN	1436-3240; 1436-3259
DOI	10.1007/s00477-025-02984-4

Summary:	This study investigates the performance of four boosting machine learning models, AdaBoost, XGBoost, CatBoost, and LightGBM, for forecasting maximal (Tmax) and minimal (Tmin) air temperatures at six lead times: the same day and 1, 7, 15, 21, and 30 days ahead. Daily temperature data from the USGS 02187010 weather station (South Carolina, USA) were used for model training and validation. To address the challenges posed by the non-linearity and complexity of climate data, the models were integrated with explainable artificial intelligence (XAI) techniques, specifically SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME), which provide insights into the role of input features in shaping predictions. Results indicate that forecasting accuracy declines with increasing lead time. Among the tested models, CatBoost1 consistently exhibited the best performance. For Tmax forecasting on the validation set, CatBoost1 yielded a correlation coefficient (R) of 0.900, Nash–Sutcliffe efficiency (NSE) of 0.810, root mean squared error (RMSE) of 3.447 °C, mean absolute error (MAE) of 2.571 °C, Willmott’s Index (WI) of 0.947, Legates and McCabe Index (LM) of 0.615, explained variance score (EVS) of 0.810, and absolute percentage bias (APB) of 15.415%. For Tmin, CatBoost1 achieved R = 0.941, NSE = 0.885, RMSE = 2.618 °C, MAE = 1.952 °C, WI = 0.969, LM = 0.709, EVS = 0.885, and APB = 54.360%. These findings demonstrate that boosting models, when combined with explainable AI techniques, offer a robust and transparent framework for temperature forecasting, supporting their application in climate risk management, agriculture, and energy planning.
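The eight scores named in the summary can be illustrated with a minimal sketch. The function name `forecast_metrics` and the toy values are hypothetical. The formulas are the commonly used textbook definitions and are not confirmed by this record as the exact ones the authors applied. In particular, APB is assumed here to be 100 × Σ|obs − pred| / Σ|obs|, and WI is assumed to be Willmott's original squared-error index of agreement.

```python
import numpy as np


def forecast_metrics(observed, predicted):
    """Return R, NSE, RMSE, MAE, WI, LM, EVS and APB for paired series.

    Hypothetical helper: the definitions below are common textbook forms,
    not necessarily the ones used in the paper.
    """
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = obs - pred
    obs_mean = obs.mean()
    sq_err = np.sum(err ** 2)        # sum of squared errors
    abs_err = np.sum(np.abs(err))    # sum of absolute errors

    return {
        # Pearson correlation coefficient
        "R": float(np.corrcoef(obs, pred)[0, 1]),
        # Nash-Sutcliffe efficiency
        "NSE": float(1 - sq_err / np.sum((obs - obs_mean) ** 2)),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        # Willmott's index of agreement (assumed: original squared form)
        "WI": float(
            1 - sq_err
            / np.sum((np.abs(pred - obs_mean) + np.abs(obs - obs_mean)) ** 2)
        ),
        # Legates and McCabe index (absolute-error analogue of NSE)
        "LM": float(1 - abs_err / np.sum(np.abs(obs - obs_mean))),
        # Explained variance score
        "EVS": float(1 - np.var(err) / np.var(obs)),
        # Absolute percentage bias (assumed definition, in percent)
        "APB": float(100 * abs_err / np.sum(np.abs(obs))),
    }


# Toy usage with made-up values, not data from the study
toy_scores = forecast_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

Under the assumed APB definition the denominator is the sum of absolute observed values, so a series whose values lie close to 0 °C produces a large percentage even when RMSE and MAE are small. This may be one reason the reported APB for Tmin is much larger than for Tmax despite the lower errors.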