Machine Learning Algorithm for Estimating Surface PM2.5 in Thailand

Bibliographic Details
Published in Aerosol and Air Quality Research, Vol. 21, no. 11, pp. 210105 - 13
Main Authors Gupta, Pawan; Zhan, Shanshan; Mishra, Vikalp; Aekakkararungroj, Aekkapol; Markert, Amanda; Paibong, Sarawut; Chishtie, Farrukh
Format Journal Article
Language English
Published Cham Springer International Publishing 01.11.2021
Taiwan Association of Aerosol Research
Springer
ISSN 1680-8584
2071-1409
DOI 10.4209/aaqr.210105

Summary: We have used NASA’s Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA2) reanalysis data of aerosols and meteorology as input to a machine learning algorithm (MLA) to estimate surface PM2.5 concentration in Thailand. One year of hourly data from 51 ground monitoring stations in Thailand was spatiotemporally collocated with MERRA2 fields. The integrated data were then used to train and validate a supervised MLA, a random forest, to estimate hourly and daily PM2.5 concentrations. The MLA is cross-validated using a 10-fold random sampling approach. The trained MLA can estimate PM2.5 with close to zero mean bias across the country. A correlation coefficient of 0.95, with slope and intercept values of 0.95 and 0.88, is achieved between observed and estimated PM2.5. The MLA also shows underestimation at the hourly scale under very clean conditions (PM2.5 < 10 µg m⁻³) and overestimation during high loading (PM2.5 > 80 µg m⁻³). The hourly estimates also demonstrate high skill in following the diurnal cycle during different seasons of the year. The daily mean (24-hour) PM2.5 values follow day-to-day variability very well (correlation coefficient of 0.98, RMSE = 3.14 µg m⁻³), showing high values during the winter months (November–February) and lower values during other seasons. The trained MLA has the potential to reprocess the MERRA2 time series for the region, and the bias-corrected data can be used in other applications such as long-term trend analysis and health exposure studies. The MLA can also be applied to GEOS forecast fields to generate bias-corrected air quality forecasts for the region.
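The validation scheme named in the summary (10-fold random sampling, then mean bias, RMSE, correlation, slope and intercept between observed and estimated values) can be sketched as below. This is an illustration under stated assumptions, not the authors' code. A simple least-squares line stands in for the random forest so the sketch needs only the Python standard library. The data are synthetic, all function and variable names are hypothetical, and the slope and intercept are taken here as estimated regressed on observed, which the summary does not specify.

```python
import math
import random


def kfold_indices(n, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def cross_validate(x, y, k=10, seed=0):
    """Return out-of-fold estimates: each sample is predicted by a model
    trained only on the other k - 1 folds."""
    est = [0.0] * len(y)
    for held_out in kfold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        # Stand-in model (assumption): a straight line, not a random forest.
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            est[i] = slope * x[i] + intercept
    return est


def evaluate(obs, est):
    """Mean bias, RMSE, Pearson r, and slope/intercept of est regressed on obs."""
    n = len(obs)
    bias = sum(e - o for o, e in zip(obs, est)) / n
    rmse = math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / n)
    slope, intercept = fit_line(obs, est)
    mo, me = sum(obs) / n, sum(est) / n
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    se = math.sqrt(sum((e - me) ** 2 for e in est))
    r = sum((o - mo) * (e - me) for o, e in zip(obs, est)) / (so * se)
    return {"bias": bias, "rmse": rmse, "r": r,
            "slope": slope, "intercept": intercept}


# Synthetic stand-ins (assumption): one predictor and noisy "observations".
rng = random.Random(42)
predictor = [rng.uniform(5, 100) for _ in range(500)]
observed = [0.8 * p + 3 + rng.gauss(0, 4) for p in predictor]

estimated = cross_validate(predictor, observed, k=10)
stats = evaluate(observed, estimated)
print({name: round(value, 3) for name, value in stats.items()})
```

Because every estimate comes from a model that never saw that sample, the statistics describe out-of-sample skill. Swapping the stand-in line for a random forest regressor would change only the two lines inside the fold loop.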