Application of Machine Learning Algorithms for Estimating Daily PM2.5 Concentrations Open Access
Huo, Runing (Spring 2023)
Abstract
Background: The detrimental impact of PM2.5 air pollution is widespread, as it has been linked to premature mortality and a diverse range of health concerns such as cardiovascular and respiratory illnesses. Machine learning approaches offer several advantages for predicting PM2.5 levels at locations without monitoring data. These include the ability to handle complex and large datasets, detect nonlinear associations, and provide accurate and adaptable solutions.
Objectives: To compare the predictive ability of four machine learning algorithms under three types of cross-validation experiments, using 2018 data from California.
Methods: Four machine learning algorithms were applied in this analysis: random forest, Bayesian additive regression trees (BART), gradient boosting, and soft Bayesian additive regression trees (SoftBART). We performed three types of 10-fold cross-validation (ordinary, spatial, and temporal) and evaluated performance using R-squared, mean absolute error (MAE), and root-mean-square error (RMSE). We also obtained average predictions of PM2.5 concentrations at 1 km spatial resolution for January, April, July, and October 2018.
Results: In the cross-validation analysis, random forest performed best, with the highest R-squared and the smallest RMSE and MAE values. Random forest was also the least computationally intensive approach. Gradient boosting and BART with a larger number of trees were the second-best models. With a small number of trees, SoftBART behaved similarly to BART.
Conclusions: In this study, we demonstrated the superior predictive performance of random forest, which is a commonly used method for predicting daily PM2.5 concentrations.
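Illustrative sketch (not taken from the thesis): the three cross-validation designs named in the Methods differ only in which unit is held out together. The Python below assumes that ordinary CV holds out individual records, spatial CV holds out whole monitors, and temporal CV holds out whole days. The thesis may define these differently. The function names, the synthetic data, and the training-mean placeholder model are all hypothetical. R-squared is computed here as 1 - SSE/SST, which is also an assumption.

```python
import math
import random


def assign_folds(records, unit, k=10, seed=0):
    """Give every record a fold index so that records sharing a unit stay together."""
    units = sorted({unit(r) for r in records})
    random.Random(seed).shuffle(units)
    fold_of = {u: i % k for i, u in enumerate(units)}
    return [fold_of[unit(r)] for r in records]


def metrics(obs, pred):
    """R-squared (as 1 - SSE/SST), MAE, and RMSE for paired observations and predictions."""
    n = len(obs)
    mean_obs = sum(obs) / n
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return {
        "r2": 1 - sse / sst,
        "mae": sum(abs(o - p) for o, p in zip(obs, pred)) / n,
        "rmse": math.sqrt(sse / n),
    }


def cross_validate(records, unit, k=10):
    """Pool out-of-fold predictions over k folds and score them once."""
    folds = assign_folds(records, unit, k)
    obs, pred = [], []
    for f in range(k):
        train = [r["pm25"] for r, g in zip(records, folds) if g != f]
        test = [r["pm25"] for r, g in zip(records, folds) if g == f]
        # Placeholder model: predict the training mean. A real analysis
        # would fit random forest, BART, etc. on the training folds here.
        baseline = sum(train) / len(train)
        obs.extend(test)
        pred.extend([baseline] * len(test))
    return metrics(obs, pred)


# Synthetic stand-in data: 20 hypothetical monitors observed on 30 days.
rng = random.Random(1)
records = [
    {"row": m * 30 + d, "monitor": f"site{m:02d}", "day": d,
     "pm25": 10 + 0.5 * m + 0.1 * d + rng.gauss(0, 2)}
    for m in range(20) for d in range(30)
]

# The held-out unit is the only thing that changes between designs.
designs = {
    "ordinary": lambda r: r["row"],
    "spatial": lambda r: r["monitor"],
    "temporal": lambda r: r["day"],
}
results = {name: cross_validate(records, unit) for name, unit in designs.items()}
for name, scores in results.items():
    print(name, {key: round(value, 3) for key, value in scores.items()})
```

Holding out whole monitors or whole days tests how well a model extrapolates to unmonitored locations or unobserved dates, which record-level folds cannot show on their own.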
Table of Contents
1. Introduction
2. Materials and Methods
2.1 Motivating Datasets
2.2 Statistical Analysis
2.2.1 Machine Learning Models
2.2.3 Prediction Performance Comparison
3. Results
3.1 Results of Cross-Validation Experiments
3.1.1 Traditional 10-fold Cross-Validation
3.1.2 Spatial 10-fold Cross-Validation
3.1.3 Temporal 10-fold Cross-Validation
3.2 Results of Predictions
4. Discussion
References
About this Master's Thesis
| School | |
|---|---|
| Department | |
| Subfield / Discipline | |
| Degree | |
| Submission | |
| Language | |
| Research Field | |
| Keyword | |
| Committee Chair / Thesis Advisor | |
| Committee Members | |
Primary PDF
| Title | Date Uploaded |
|---|---|
| Application of Machine Learning Algorithms for Estimating Daily PM2.5 Concentrations | 2023-04-13 23:43:09 -0400 |