Statistical Methods for Spatial Data in Public Health

Open Access

Kianian, Behzad (Fall 2020)

Data in public health often contain a spatial component that is relevant to understanding the underlying relationships of interest. Accounting for the different ways spatial components manifest in statistical analyses is often hindered by a lack of developed methodology or by high computational costs. First, we consider the problem of estimating treatment effects from observational data using propensity score matching, allowing for the presence of spatial and multi-level confounding. We build on the recently developed distance-adjusted propensity score matching (DAPSm) method and propose a two-stage approach that first matches within clusters (WC) and then uses DAPSm to match the remaining subjects (WC+DAPSm). We demonstrate the benefits and robustness of our approach through an extensive simulation study. We apply our method to a population of patients in Georgia who have recently started dialysis, where both the treatment (being informed of transplant options) and the outcome (referral for transplant within 1 year) may plausibly be affected by individual, facility, and area-level factors.
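The two-stage idea described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function name `two_stage_match`, the greedy 1:1 matching without replacement, the propensity score caliper in the first stage, and the form of the distance-adjusted score (assumed here to be a weighted combination of the propensity score difference and a rescaled geographic distance, with weight `w`) are all assumptions made for illustration.

```python
import numpy as np

def two_stage_match(ps, treated, cluster, coords, w=0.5, caliper=0.1):
    """Hypothetical greedy 1:1 matching: within cluster first, then a
    distance-adjusted score for treated units left unmatched."""
    ps = np.asarray(ps, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    cluster = np.asarray(cluster)
    coords = np.asarray(coords, dtype=float)

    # Pairwise geographic distance, rescaled to [0, 1] (an assumed standardization).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    if dist.max() > 0:
        dist = dist / dist.max()
    ps_gap = np.abs(ps[:, None] - ps[None, :])

    free_controls = set(int(i) for i in np.flatnonzero(~treated))
    pairs, unmatched = [], []

    # Stage 1: closest propensity score among controls in the same cluster.
    for t in (int(i) for i in np.flatnonzero(treated)):
        cands = [c for c in free_controls
                 if cluster[c] == cluster[t] and ps_gap[t, c] <= caliper]
        if cands:
            best = min(cands, key=lambda c: ps_gap[t, c])
            pairs.append((t, best, "within_cluster"))
            free_controls.remove(best)
        else:
            unmatched.append(t)

    # Stage 2: smallest distance-adjusted score among the remaining controls.
    daps = w * ps_gap + (1 - w) * dist
    for t in unmatched:
        cands = [c for c in free_controls if daps[t, c] <= caliper]
        if cands:
            best = min(cands, key=lambda c: daps[t, c])
            pairs.append((t, best, "daps"))
            free_controls.remove(best)
    return pairs
```

Each returned tuple holds a treated index, a control index, and the stage that produced the pair. A greedy pass is used only to keep the sketch short; an optimal matching algorithm could be substituted in either stage.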

Next, we consider the task of using satellite-derived aerosol optical depth (AOD) as a predictor of particulate matter (PM2.5) concentrations, which allows broader spatial coverage than the network of air pollution monitors provides. However, AOD contains large contiguous areas of missing data due to cloud cover. We propose imputing missing AOD data using lattice kriging, a large-scale spatial statistical method, and random forest, a regression tree-based machine learning method, as well as a distance-based ensemble that combines the two methods. Throughout our application, we construct cross-validation folds and testing data from spatially clustered holdouts, which mimic the observed missing-data patterns more closely than traditional random holdouts. Our results show that the proposed distance-based ensemble outperforms each individual method.
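One simple way to build spatially clustered holdouts is to assign whole blocks of neighboring locations to the same fold, so that held-out data form contiguous gaps rather than scattered points. The sketch below is an illustration under assumptions, not the procedure used in the dissertation: the function name `spatial_block_folds`, the square blocks, and the `block_size` parameter are hypothetical.

```python
import numpy as np

def spatial_block_folds(coords, block_size, n_folds, seed=0):
    """Assign each point a fold label so that all points in the same
    square block share a fold (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float)
    # Row/column index of the square block containing each point.
    block_ij = np.floor((coords - coords.min(axis=0)) / block_size).astype(int)
    # One integer label per distinct block.
    _, block_id = np.unique(block_ij, axis=0, return_inverse=True)
    block_id = block_id.reshape(-1)
    n_blocks = int(block_id.max()) + 1
    # Shuffle the blocks, then deal them to folds round-robin.
    rng = np.random.default_rng(seed)
    fold_of_block = np.empty(n_blocks, dtype=int)
    fold_of_block[rng.permutation(n_blocks)] = np.arange(n_blocks) % n_folds
    return fold_of_block[block_id]
```

Holding out one fold at a time then removes entire blocks, which is closer to a cloud-shaped gap than a random sample of pixels would be.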

For the third topic, we discuss ongoing work assessing the equity of access to COVID-19 testing sites in the Atlanta area. We adapt methods from the environmental justice literature that use empirical cumulative distribution functions to compare access to testing sites across demographic subgroups. We consider different measures of access, and we conduct Monte Carlo simulations of test site placements under different sampling schemes to assess factors associated with site placement.
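As a rough illustration of the empirical cumulative distribution function comparison, the sketch below computes a population-weighted ECDF of an access measure for one subgroup. The setup is assumed rather than taken from the dissertation: `values` is taken to be an access measure per areal unit (for example, distance to the nearest site), `weights` the subgroup's population in each unit, and `weighted_ecdf` is a hypothetical helper name.

```python
import numpy as np

def weighted_ecdf(values, weights, grid):
    """Share of a subgroup's population whose access measure is <= each
    grid value (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    grid = np.asarray(grid, dtype=float)
    order = np.argsort(values)
    sorted_values = values[order]
    # Cumulative population share, ordered by the access measure.
    cum_share = np.cumsum(weights[order]) / weights.sum()
    # Number of units with value <= each grid point.
    idx = np.searchsorted(sorted_values, grid, side="right")
    return np.where(idx > 0, cum_share[np.maximum(idx - 1, 0)], 0.0)
```

Evaluating this function on a common grid for two subgroups gives two curves that can be compared directly: where one curve lies above the other, a larger share of that subgroup falls within the given distance of a testing site.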

Table of Contents

Propensity score matching for multi-level and spatial data

Imputing satellite-derived aerosol optical depth using a multi-resolution spatial model and random forest for PM2.5 prediction

A framework for assessing COVID-19 testing site spatial access

Appendix A. Supplemental Materials to “Propensity score matching for multi-level and spatial data”

Appendix B. Supplemental Materials to “Imputing satellite-derived aerosol optical depth using a multi-resolution spatial model and random forest for PM2.5 prediction”

Appendix C. Supplemental Materials to “A framework for assessing COVID-19 testing site spatial access”

Bibliography

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English