P-0006 - Ensemble averaging based high resolution PM2.5 exposure assessment in two major Indian cities over 2010 to 2016

Abstract Body
Aim: We retrospectively assessed daily average PM2.5 exposure on a 1 km × 1 km grid in two major Indian cities, Delhi and Chennai, from 2010 to 2016, using multiple data sources and ensemble averaging approaches that combine machine learning algorithms.
Methods: We implemented a multi-stage modeling exercise involving satellite data, land use variables, reanalysis-based meteorological variables and population density. The relationship between PM2.5 and spatiotemporal predictors was modeled using six base learners: generalized additive models, elastic net, support vector regression, neural networks, random forests and extreme gradient boosting. Predictions from the base learners were combined under a generalized additive model framework with penalized splines and tensor product smoothing. Prediction accuracy was assessed using cross-validated (CV) R², root mean squared error and bias.
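The combination step can be illustrated with a minimal sketch. It is not the model used in this work: it assumes synthetic data, only two toy base learners (least squares and a k-nearest-neighbour mean) in place of the six listed above, and a linear meta-learner as a simpler stand-in for the penalized-spline generalized additive model. All function names are hypothetical. Bias is taken here as the mean of predicted minus observed values, which is also an assumption.

```python
import numpy as np

def kfold_indices(n, k, rng):
    """Split shuffled row indices 0..n-1 into k roughly equal folds."""
    return np.array_split(rng.permutation(n), k)

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def fit_knn(X, y, k=5):
    """Mean of the k nearest training targets; returns a predict function."""
    def predict(Xn):
        d = ((Xn[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(d, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

def out_of_fold(fitters, X, y, folds):
    """Column j holds held-out predictions from base learner j."""
    oof = np.zeros((len(y), len(fitters)))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        for j, fit in enumerate(fitters):
            oof[test, j] = fit(X[train], y[train])(X[test])
    return oof

def cv_metrics(obs, pred):
    """R2, root mean squared error and bias (mean of pred - obs)."""
    resid = pred - obs
    r2 = 1.0 - (resid ** 2).sum() / ((obs - obs.mean()) ** 2).sum()
    return {"r2": float(r2),
            "rmse": float(np.sqrt((resid ** 2).mean())),
            "bias": float(resid.mean())}

# Synthetic stand-in for monitor data: two predictors, one nonlinear effect.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 2))
y = 100 + 10 * X[:, 0] + 15 * np.sin(2 * X[:, 1]) + rng.normal(0, 3, 300)

folds = kfold_indices(len(y), 5, rng)
oof = out_of_fold([fit_linear, fit_knn], X, y, folds)

# Meta-learner: fit on the base-learner predictions of the training folds,
# then predict the held-out fold.
ensemble = np.zeros(len(y))
for test in folds:
    train = np.setdiff1d(np.arange(len(y)), test)
    ensemble[test] = fit_linear(oof[train], y[train])(oof[test])

scores = cv_metrics(y, ensemble)
print(scores)
```

Reusing the same folds for both levels keeps the sketch short, but a strictly unbiased performance estimate would need a nested cross-validation scheme.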
Results: Predicted average annual PM2.5 concentrations in Delhi ranged from 87 to 138 μg/m³ over 2010 to 2016. Average CV-R² for the ensemble-averaged (EA) model ranged from 0.69 to 0.92 across the years, with annual average concentrations ranging from 104 to 139 μg/m³. Predictions showed higher bias and root mean squared error in the fall and winter than in the summer and monsoon seasons. Spatial CV-R² (yearly average) varied between 0.91 and 0.99, while temporal CV-R² (daily variability) ranged from 0.65 to 0.90, indicating adequate model performance. The model outputs revealed important seasonal and geographical differences in PM2.5 concentrations. Modeling for Chennai is ongoing and preliminary predictions will be shown.
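One common way to separate spatial from temporal performance is sketched below. The definitions are assumptions, since they are not spelled out here: spatial R² compares per-monitor averages of observed and predicted values, temporal R² compares daily deviations from those per-monitor averages, and R² is taken as the squared Pearson correlation. The data are synthetic and the function names are hypothetical.

```python
import numpy as np

def r2(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(obs, pred)[0, 1] ** 2)

def spatial_temporal_r2(site, obs, pred):
    """Return (spatial R2, temporal R2) for held-out predictions."""
    sites = np.unique(site)
    # Per-monitor averages over the period (the spatial component).
    obs_mean = np.array([obs[site == s].mean() for s in sites])
    pred_mean = np.array([pred[site == s].mean() for s in sites])
    # Daily deviations from each monitor's average (the temporal component).
    idx = np.searchsorted(sites, site)
    obs_dev = obs - obs_mean[idx]
    pred_dev = pred - pred_mean[idx]
    return r2(obs_mean, pred_mean), r2(obs_dev, pred_dev)

# Synthetic example: 8 monitors, 100 days, shared daily signal plus noise.
rng = np.random.default_rng(1)
n_sites, n_days = 8, 100
site = np.repeat(np.arange(n_sites), n_days)
site_level = rng.normal(110, 20, n_sites)
daily = rng.normal(0, 30, n_days)
obs = site_level[site] + np.tile(daily, n_sites) + rng.normal(0, 5, site.size)
pred = obs + rng.normal(0, 5, site.size)  # stand-in for CV predictions

spatial, temporal = spatial_temporal_r2(site, obs, pred)
print(spatial, temporal)
```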
Conclusion: We have developed a detailed exposure assessment for ambient air pollution in two distinct Indian cities with large differences in PM2.5 levels; such assessments are critical for estimating health effects. We also demonstrate the advantages of ensemble averaging and machine learning based hybrid modeling, which can be used to scale up this exercise to the national level.