Bayesian Tree-Based Methods for Environmental Health Research

Englert, Jacob (Spring 2025)

Permanent URL: https://etd.library.emory.edu/concern/etds/ns064745j?locale=es
Published

Abstract

Bayesian nonparametric models are widely used for estimating complex relationships and functional forms among predictors in regression settings. Within this class of models, Bayesian Additive Regression Trees (BART) is frequently cited for its strong performance and flexibility in a wide variety of statistical problems. This dissertation extends BART to three modeling frameworks commonly used to measure associations between environmental exposures and health outcomes.

In the first aim, a varying coefficient BART model is introduced to estimate heterogeneous short-term associations between acute exposure and health outcomes within the case-crossover design. This approach is applied to examine trends in emergency department visits among patients with Alzheimer’s disease during heat waves in California. The proposed method allows individual responses to heat waves to vary based on chronic comorbid conditions such as chronic kidney disease and hypertension, thus providing a more nuanced understanding of heat-related vulnerability in this population.
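As a rough illustration of the case-crossover setup described above, the sketch below evaluates a conditional logistic log-likelihood in which each case day is compared only with its own referent days. The function name `conditional_loglik`, its arguments, and the assumption of exactly one case day per stratum are illustrative choices, not taken from the dissertation. In a varying coefficient model, `beta` would be supplied per individual as a function of covariates (for example, from a sum of trees) rather than held fixed.

```python
import numpy as np

def conditional_loglik(z, is_case, stratum, beta):
    """Conditional logistic log-likelihood for a case-crossover data set.

    z       : exposure indicator or value for each person-day, shape (n,)
    is_case : True on the case day, False on referent days, shape (n,)
    stratum : integer id linking each person-day to one individual, shape (n,)
    beta    : log odds ratio; a scalar, or an array of shape (n,) that is
              constant within stratum (hypothetical heterogeneous effect)
    Assumes exactly one case day per stratum.
    """
    z = np.asarray(z, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    stratum = np.asarray(stratum)
    eta = z * beta  # linear predictor for every person-day

    loglik = 0.0
    for s in np.unique(stratum):
        in_s = stratum == s
        e = eta[in_s]
        # numerically stable log of the sum over the case and referent days
        log_denominator = e.max() + np.log(np.exp(e - e.max()).sum())
        loglik += eta[in_s & is_case].sum() - log_denominator
    return loglik
```

With `beta = 0` every day in a stratum is equally likely to be the case day, so each stratum of size m contributes -log(m); this gives a quick sanity check on any data set.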

For the second aim, a soft version of BART is applied to model count-based health outcomes in environmental mixtures studies. The approach approximates a smooth exposure-risk surface for daily asthma-related emergency department visits in the Metropolitan Atlanta area, modeling risk as a function of temperature and an air pollution mixture consisting of ozone, fine particulate matter, nitrogen dioxide, and carbon monoxide. Existing BART implementations for count outcomes require complex prior specifications, making it difficult to incorporate other useful model components such as spatial random effects and population offsets. To address this, we use latent random variables to model the risk surface. We further describe the use of accumulated local effects for summarizing exposure-risk surfaces composed of correlated continuous exposures.
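To make the idea of accumulated local effects concrete, here is a minimal first-order ALE sketch for a single exposure. It assumes a generic prediction function `predict` (a stand-in for any fitted model; in a Bayesian setting it could be applied to each posterior draw) and the hypothetical name `ale_1d`. It is not the dissertation's implementation. The key point is that predictions are only differenced within narrow bins of the exposure, using the observations that actually fall in each bin, which avoids extrapolating to unrealistic combinations of correlated exposures.

```python
import numpy as np

def ale_1d(predict, X, j, n_bins=10):
    """First-order accumulated local effect of column j of X.

    predict : callable mapping an (n, p) array to (n,) predictions
    X       : observed exposures/covariates, shape (n, p)
    Returns the bin edges and the centered ALE value at each edge.
    """
    x = X[:, j]
    # quantile-based edges so bins hold roughly equal numbers of points
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    n_intervals = len(edges) - 1
    # bin index in 0..n_intervals-1 for every observation
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_intervals - 1)

    local = np.zeros(n_intervals)
    counts = np.zeros(n_intervals)
    for k in range(n_intervals):
        rows = X[idx == k]
        if len(rows) == 0:
            continue
        lower, upper = rows.copy(), rows.copy()
        lower[:, j] = edges[k]
        upper[:, j] = edges[k + 1]
        # average change in prediction across the bin, other columns untouched
        local[k] = np.mean(predict(upper) - predict(lower))
        counts[k] = len(rows)

    ale = np.concatenate([[0.0], np.cumsum(local)])
    # center so the count-weighted average effect over the data is zero
    midpoints = 0.5 * (ale[:-1] + ale[1:])
    ale -= np.sum(midpoints * counts) / np.sum(counts)
    return edges, ale
```

For a model that is additive and linear in the chosen exposure, the returned curve is a straight line with the same slope, which is a convenient check before applying the function to a flexible model.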

In the third aim, an extension of the quantile g-computation framework for studying heterogeneous effects of environmental mixtures is proposed. When data arise from a large geographical study region, it may be unreasonable to expect a common mixture effect due to variation in the composition of the mixture or nonlinearity in the true exposure-response function. The proposed method leverages a recently developed varying coefficient BART model to explore spatially varying mixture effects describing the association between air pollution mixtures and reduced birth weight in Georgia.
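For readers unfamiliar with quantile g-computation, the sketch below shows the standard non-spatial, linear version under simple assumptions: each exposure is replaced by its quantile score, a linear model is fit, and the overall mixture effect is the sum of the exposure coefficients, interpreted as the change in the outcome when every exposure increases by one quantile at once. The function names `quantize` and `qgcomp_linear` are hypothetical, ordinary least squares stands in for the model fit, and covariate adjustment is omitted. The spatially varying extension proposed in this aim would let the mixture effect differ by location instead of being a single number.

```python
import numpy as np

def quantize(X, q=4):
    """Replace each exposure column by integer quantile scores 0..q-1."""
    Xq = np.empty_like(X, dtype=float)
    interior = np.linspace(0, 1, q + 1)[1:-1]  # e.g. 0.25, 0.5, 0.75 for q=4
    for j in range(X.shape[1]):
        cuts = np.quantile(X[:, j], interior)
        Xq[:, j] = np.searchsorted(cuts, X[:, j], side="right")
    return Xq

def qgcomp_linear(X, y, q=4):
    """Linear quantile g-computation without covariates (illustrative only).

    Returns (psi, beta): psi is the overall mixture effect, i.e. the sum of
    the per-exposure coefficients beta on the quantile-score scale.
    """
    Xq = quantize(X, q)
    design = np.column_stack([np.ones(len(y)), Xq])  # intercept + scores
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    beta = coef[1:]
    return beta.sum(), beta
```

The per-exposure coefficients can also be rescaled into weights describing each component's share of the positive or negative part of the mixture effect, although that step is not shown here.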

Table of Contents

1 Introduction

1.1 Organization

1.2 Description of Data Sources

1.2.1 Health Data Sources

1.2.2 Exposure Data Sources

1.2.3 Linking Exposures to Health Data

1.3 Notation

1.3.1 Data

1.3.2 Densities and Distributions

1.3.3 Parameters

2 Review of Bayesian Additive Regression Trees

2.1 Bayesian Additive Regression Trees

2.1.1 The Original BART Model

2.1.2 Parameter Estimation

2.1.3 Extensions

2.2 Interpretable BART

2.2.1 Variable Importance

2.2.2 Partial Dependence

2.2.3 Accumulated Local Effects

3 Estimating Heterogeneous Exposure Effects in the Case-Crossover Design using BART

3.1 Introduction

3.2 Data

3.2.1 Health Data

3.2.2 Exposure Data

3.3 Methods

3.3.1 Model Development

3.3.2 Estimation

3.3.3 Posterior Inference

3.3.4 Model Diagnostics

3.4 Simulation Study

3.4.1 CART Simulation

3.4.2 Friedman Simulation

3.5 Application: Alzheimer’s Disease and Heat Waves in California

3.5.1 Descriptive Statistics

3.5.2 Model Considerations

3.5.3 Results

3.6 Discussion

3.7 Supplementary Materials

3.7.1 CL-BART Algorithm Details

3.7.2 Additional Simulation Materials

3.7.3 Additional Application Materials

4 Modeling Joint Health Effects of Environmental Exposure Mixtures using BART

4.1 Introduction

4.2 Data

4.2.1 Health Data

4.2.2 Air Pollution Data

4.2.3 Other Data

4.3 Methods

4.3.1 Soft BART

4.3.2 Negative Binomial Regression with BART

4.3.3 Model Estimation

4.3.4 Model Interpretation via Accumulated Local Effects

4.4 Simulation Study

4.5 Application: Asthma and Air Pollution in Atlanta, Georgia

4.5.1 Descriptive Statistics

4.5.2 Model Considerations

4.5.3 Results

4.6 Discussion

4.7 Supplementary Materials

4.7.1 Soft BART Negative Binomial Algorithm Details

4.7.2 Implementation Details

4.7.3 Additional Simulation Materials

4.7.4 Additional Application Materials

5 Spatially Varying Coefficient Models for Estimating Heterogeneous Mixture Effects

5.1 Introduction

5.2 Data

5.2.1 Air Pollution Data

5.2.2 Health Data

5.3 Methods

5.3.1 Review of Quantile g-Computation for Mixture Modeling

5.3.2 Review of Spatially Varying Coefficient Models

5.3.3 Spatially Varying Quantile g-Computation with BART

5.4 Simulation Study

5.5 Application: Reduced Birth Weight and Air Pollution in Georgia

5.5.1 Descriptive Statistics

5.5.2 Model Considerations

5.5.3 Results

5.6 Discussion

5.7 Supplementary Material

5.7.1 Additional Simulation Materials

5.7.2 Additional Application Materials

6 Conclusion

6.1 Summary of Contributions

6.2 Future Directions

6.2.1 Exposure Modeling

6.2.2 Causal Inference for Environmental Mixtures

6.3 Software

A Appendix

A.1 Selection of Controls in the Case-Crossover Design

A.2 Bayesian Computation

A.2.1 Metropolis-Hastings Algorithm

A.2.2 Reversible Jump Metropolis-Hastings Algorithm

A.2.3 Gibbs Sampling

A.2.4 Adaptive Rejection Sampling

A.3 Bayesian Model Comparison

A.3.1 Widely Applicable Information Criterion

A.4 Conditional Autoregressive Models for Spatial Data

A.5 Updating the BART Terminal Node Prior Variance

A.5.1 Inverse-Gamma Approach

A.5.2 Half-Cauchy Approach

A.5.3 Marginal Half-Cauchy Approach

A.5.4 Horseshoe Approach

A.5.5 Inverse-Gamma Approach (k)

A.5.6 Marginal Half-Cauchy Approach (k)

Bibliography

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English