Spatial modelling of particulate matter air pollution sensor measurements collected by community scientists while cycling, land use regression with spatial cross-validation, and applications of machine learning for data correction

Bibliographic Details
Published in: Atmospheric Environment (1994), Vol. 230, p. 117479
Main Authors: Adams, Matthew D.; Massey, Felix; Chastko, Karl; Cupini, Calvin
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.06.2020
ISSN: 1352-2310, 1873-2844
DOI: 10.1016/j.atmosenv.2020.117479

Summary: Fine particulate matter air pollution is a global issue; cycling is a global activity. In our paper, particulate matter less than 2.5 μm (PM2.5) air pollution data obtained by community scientists while cycling are used to develop high-resolution spatial air pollution maps. Mapping is completed using a land use regression model for Charlotte, North Carolina. The air pollution observations were obtained with a low-cost sensor. We evaluated the accuracy of the sensor through a collocation study lasting 3203 h, which showed that the sensor had a mean bias of 7.25 μg/m³ and a correlation of r = 0.77 with a US EPA Federal Equivalent Monitor. A machine learning model was developed to adjust the sensor observations, which showed their highest errors during periods of high humidity. The adjustment reduced the root mean squared error from 12 μg/m³ to 3.8 μg/m³ and the mean bias to −0.5 μg/m³. Cycling times were not balanced across the day or the year. We applied a temporal adjustment algorithm to account for this imbalance in observation periods, with the intention of producing long-term estimates representing the sampling period of 2016 and 2017. The long-term air pollution surface for the city was generated with a land use regression model. Both linear regression and machine learning approaches were applied. The linear regression model performed poorly, with a training R² of 0.15 and a cross-validation R² of 0.15. A stacked ensemble model was developed using machine learning, which had a training 5-fold cross-validation mean residual deviance of 3.82 μg/m³, a root mean squared error of 1.95 μg/m³, and a mean absolute error of 0.95 μg/m³. Performance remained strong during cross-validation, which included both a random sample approach (RMSE = 1.52 μg/m³) and a spatial blocking cross-validation method (RMSE = 2.8 μg/m³).
• Air pollution maps were developed using low-cost sensors.
• Spatially varying data were obtained by community scientists while cycling.
• Sensor measurement errors demonstrate a strong correlation with humidity.
• Spatial blocking cross-validation is compared with a random sample approach.
• Automated machine learning is applied for model development.
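
The idea behind spatial blocking cross-validation can be sketched as follows. The record does not describe how the paper defines its blocks, so the regular square grid, the round-robin assignment of blocks to folds, and the names `spatial_block_folds` and `train_test_splits` are all assumptions made for illustration. The point of the technique is that observations in the same block always land in the same fold, so a model is never tested on points that sit next to its training points, as can happen with a random split.

```python
import math


def spatial_block_folds(coords, block_size, n_folds):
    """Assign each (x, y) point to a fold via its grid block.

    Hypothetical sketch: square blocks of side `block_size`, with blocks
    dealt to folds round-robin in sorted order (an assumed scheme).
    """
    block_ids = [
        (math.floor(x / block_size), math.floor(y / block_size))
        for x, y in coords
    ]
    unique_blocks = sorted(set(block_ids))
    fold_of_block = {b: i % n_folds for i, b in enumerate(unique_blocks)}
    return [fold_of_block[b] for b in block_ids]


def train_test_splits(folds, n_folds):
    """Yield (train_indices, test_indices) for each fold in turn."""
    for k in range(n_folds):
        test = [i for i, f in enumerate(folds) if f == k]
        train = [i for i, f in enumerate(folds) if f != k]
        yield train, test


# Made-up coordinates in arbitrary units; points 0-1 share one block,
# points 2-3 share another.
coords = [(0.1, 0.1), (0.2, 0.3), (1.5, 0.2), (1.7, 0.4), (0.3, 1.6), (1.2, 1.9)]
folds = spatial_block_folds(coords, block_size=1.0, n_folds=2)
splits = list(train_test_splits(folds, n_folds=2))
```

Because whole blocks are held out, errors from this scheme are typically higher than from a random split, which is consistent with the larger spatial-blocking RMSE reported in the summary.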