Satellite-Based Daily Ground Ozone Estimates in California, Using Machine Learning Methods Open Access

Wang, Wenhao (Spring 2020)

Exposure to ground-level ozone can trigger a variety of health problems and has ecological impacts. Satellite-based machine learning models have seldom been used to estimate ground-level ozone concentrations over large spatial and temporal extents, owing to the lack of adequate satellite products. The Tropospheric Monitoring Instrument (TROPOMI) on board the Sentinel-5 Precursor provides high-quality, relatively high-resolution data on gaseous pollutants that can be used to predict ozone. We aim to develop a high-performance, TROPOMI-driven machine learning model to estimate the daily maximum 8-hour average ground-level ozone concentration at a spatial resolution of 10 × 10 kilometers in the state of California from May 2018 to April 2019, combining the satellite data with predictors including meteorological fields and land-use variables. All predictor data and ground ozone measurements were re-gridded to a 10 × 10 kilometer grid that we created, and a random forest model was built with the daily ground-level ozone concentration in each grid cell as the outcome. Our model achieved an overall 10-fold cross-validation (CV) R² of 0.83 with a root mean square error (RMSE) of 5.91 ppb, indicating good agreement between model predictions and observations. These results support the feasibility and advantage of applying the TROPOMI satellite product and machine learning methods to the prediction of ground-level ozone concentration in California. The model's estimates can be applied in future epidemiological studies and in studies of strategies for controlling ground-level ozone pollution.
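The 10-fold CV R² and RMSE reported above can be illustrated with a minimal Python sketch. It is not the thesis code: the function names, the toy data, and the mean-predictor stand-in for the random forest are all assumptions made for illustration, and R² is computed here as 1 - SS_res / SS_tot, which is one common definition and may differ from the one used in the thesis.

```python
import math
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rmse(observed, predicted):
    """Root mean square error, in the units of the inputs (e.g. ppb)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def cross_validate(features, targets, fit, predict, k=10, seed=0):
    """Return out-of-fold predictions: each sample is predicted by a
    model that was trained without it."""
    out_of_fold = [None] * len(targets)
    for held_out in kfold_indices(len(targets), k, seed):
        held = set(held_out)
        train = [i for i in range(len(targets)) if i not in held]
        model = fit([features[i] for i in train], [targets[i] for i in train])
        for i in held_out:
            out_of_fold[i] = predict(model, features[i])
    return out_of_fold


# Toy data (hypothetical, not from the thesis): ozone in ppb vs. one predictor.
features = [[float(x)] for x in range(40)]
targets = [30.0 + 0.5 * x for x in range(40)]


def fit_mean(train_features, train_targets):
    """Stand-in model: the training-set mean. A random forest would go here."""
    return sum(train_targets) / len(train_targets)


def predict_mean(model, feature_row):
    """Predict the stored mean regardless of the input features."""
    return model


cv_predictions = cross_validate(features, targets, fit_mean, predict_mean, k=10)
print("CV RMSE (ppb):", round(rmse(targets, cv_predictions), 2))
print("CV R2:", round(r_squared(targets, cv_predictions), 2))
```

Pooling the out-of-fold predictions before scoring gives a single overall R² and RMSE, which matches the "overall" wording of the abstract; averaging per-fold scores would be an alternative design choice.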

Table of Contents

1.    Introduction

2.    Data and Method

2.1 Ground-Level Ozone Measurements

2.2 TROPOMI Satellite Data

2.3 Meteorological Field

2.4 Land-Use Variables

2.5 Data Process

2.6 Random Forest Model

3. Result

3.1 Descriptive Statistics

3.2 Result of Model Validation

3.3 Importance of Variables

3.4 Model Estimate of Ozone Concentration

4. Discussion

4.1 Model Analysis

4.2 Importance Rank Analysis

4.3 Prediction Analysis

4.4 Limitation and Future Plan

5. Conclusion

6. Tables and Figures

7. Reference

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
  • English