Spatial modelling of particulate matter air pollution sensor measurements collected by community scientists while cycling, land use regression with spatial cross-validation, and applications of machine learning for data correction

Bibliographic Details
Published in	Atmospheric environment (1994) Vol. 230; p. 117479
Main Authors	Adams, Matthew D., Massey, Felix, Chastko, Karl, Cupini, Calvin
Format	Journal Article
Language	English
Published	Elsevier Ltd 01.06.2020
Subjects	Air pollution; algorithms; artificial intelligence; atmospheric chemistry; Citizen science; Community science; Cross-validation; Cycling; humidity; land use; Land use regression; Machine learning; North Carolina; Particulate matter; particulates; regression analysis; United States Environmental Protection Agency
ISSN	1352-2310 1873-2844
DOI	10.1016/j.atmosenv.2020.117479

Summary:	Fine particulate matter air pollution is a global issue; cycling is a global activity. In our paper, particulate matter less than 2.5 μm (PM2.5) air pollution data obtained by community scientists while cycling are used to develop high-resolution spatial air pollution maps. Mapping is completed using a land use regression model for Charlotte, North Carolina. The air pollution observations were obtained with a low-cost sensor. We evaluated the accuracy of the sensor through a collocation study lasting 3203 h, which showed that the sensor had a mean bias of 7.25 μg/m³ and a correlation of r = 0.77 with a US EPA Federal Equivalent Monitor. A machine learning model was developed to adjust the sensor observations, whose errors were highest during periods of high humidity. The adjustment reduced the root mean squared error from 12 μg/m³ to 3.8 μg/m³ and the mean bias to −0.5 μg/m³. Cycling times were not balanced throughout the day or the year. We applied a temporal adjustment algorithm to account for this imbalance in observation periods, with the intention of producing long-term estimates representing the sampling period of 2016 and 2017. The long-term air pollution surface for the city was generated with a land use regression model. Both linear regression and machine learning approaches were applied. The linear regression model performed poorly, with a training R² of 0.15 and a cross-validation R² of 0.15. A stacked ensemble model was developed using machine learning, which had a training 5-fold cross-validation mean residual deviance of 3.82 μg/m³, a root mean squared error of 1.95 μg/m³, and a mean absolute error of 0.95 μg/m³. Performance remained strong during cross-validation, which included both a random sample approach (RMSE = 1.52 μg/m³) and a spatial blocking cross-validation method (RMSE = 2.8 μg/m³).
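The collocation statistics quoted in the summary (mean bias, root mean squared error, and correlation) can be illustrated with a minimal Python sketch. The function name `collocation_metrics` and the sample readings are hypothetical and chosen only for illustration; this is not the authors' code or data.

```python
import math


def collocation_metrics(sensor, reference):
    """Return mean bias, RMSE, and Pearson r for paired readings.

    `sensor` and `reference` are equal-length sequences of concentrations
    (for example hourly PM2.5 in ug/m3) measured side by side.
    """
    if len(sensor) != len(reference) or len(sensor) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    n = len(sensor)

    # Error is defined as sensor minus reference, so a positive mean bias
    # means the low-cost sensor reads high.
    errors = [s - r for s, r in zip(sensor, reference)]
    mean_bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Pearson correlation computed from deviations about each mean.
    mean_s = sum(sensor) / n
    mean_r = sum(reference) / n
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(sensor, reference))
    var_s = sum((s - mean_s) ** 2 for s in sensor)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    pearson_r = cov / math.sqrt(var_s * var_r)

    return {"mean_bias": mean_bias, "rmse": rmse, "r": pearson_r}


# Made-up illustrative values, not measurements from the study.
sensor = [12.0, 20.0, 31.0, 18.0]
reference = [8.0, 12.0, 20.0, 11.0]
metrics = collocation_metrics(sensor, reference)
```

Computing the same three numbers before and after a correction model is applied gives a like-for-like view of how much the adjustment helps.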
• Air pollution maps were developed using low-cost sensors.
• Spatially varying data were obtained by community scientists while cycling.
• Sensor measurement errors demonstrate a strong correlation with humidity.
• Spatial blocking cross-validation is compared with a random sample approach.
• Automated machine learning is applied for model development.
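The contrast between random-sample and spatial blocking cross-validation can be sketched in Python as follows. This record does not say how the blocks were defined, so the square grid, the block size, and the function names `spatial_block_folds` and `random_folds` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def spatial_block_folds(coords, block_size, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs that hold out whole grid blocks.

    Assumption: blocks are square cells of side `block_size` in the same
    units as the (x, y) coordinates.
    """
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        blocks[(int(x // block_size), int(y // block_size))].append(i)

    block_ids = sorted(blocks)
    if len(block_ids) < n_folds:
        raise ValueError("fewer blocks than folds")
    random.Random(seed).shuffle(block_ids)

    for k in range(n_folds):
        # Every point in a held-out block goes to the test set together,
        # so test points have no training neighbours inside their block.
        test_blocks = block_ids[k::n_folds]
        test_idx = sorted(i for b in test_blocks for i in blocks[b])
        test_set = set(test_idx)
        train_idx = [i for i in range(len(coords)) if i not in test_set]
        yield train_idx, test_idx


def random_folds(n, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs from a plain random split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    for k in range(n_folds):
        test_idx = sorted(idx[k::n_folds])
        test_set = set(test_idx)
        yield [i for i in range(n) if i not in test_set], test_idx


# Hypothetical 6 x 6 lattice of sample locations; 2-unit blocks give 9 blocks.
coords = [(x + 0.5, y + 0.5) for x in range(6) for y in range(6)]
spatial = list(spatial_block_folds(coords, block_size=2.0, n_folds=3))
rand = list(random_folds(len(coords), n_folds=3))
```

With a random split, test points usually sit next to training points, so spatially autocorrelated data can make errors look smaller than they would be at unsampled locations. Holding out whole blocks is one common way to get a more conservative estimate, which is consistent with the higher spatial-blocking RMSE reported in the summary.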