Methods for Improving Doubly Robust Estimators of Treatment Effects for Observational Studies and Randomized Trials

Restricted; Files Only

Schader, Lindsey (Fall 2023)

Permanent URL: https://etd.library.emory.edu/concern/etds/z603qz936?locale=it
Published

Abstract

Estimating the causal effect of an intervention helps clinicians and policymakers determine whether its benefits outweigh its costs. The field of causal inference has established assumptions under which causal effects are identifiable from the observed data distribution. This dissertation addresses three issues that arise when estimating treatment effects with machine learning-based causal inference methods.

  In the first section, we develop a doubly robust targeted minimum loss-based estimator (DRTMLE) for the average treatment effect on the treated (ATT) when outcome data are missing at random. When nuisance regressions converge more slowly than the standard parametric n^{-1/2} rate, standard estimators of the ATT yield theoretically valid statistical inference only if all nuisance regressions involved in estimation are consistently estimated. If this requirement fails, confidence intervals may undercover and type I error may be inflated. Our proposed estimator weakens these assumptions: it requires only one set of nuisance regressions to be correctly specified to yield theoretically valid statistical inference.
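For orientation, the target of the first section can be written as an identifying functional. This is a sketch in notation introduced here, not necessarily the dissertation's: baseline covariates $W$, binary treatment $A$, outcome $Y$, and an indicator $\Delta$ equal to 1 when $Y$ is observed. It assumes consistency, no unmeasured confounding, positivity, and outcome missingness at random given $(A, W)$.

```latex
\psi_{\mathrm{ATT}}
  = E\{Y(1) - Y(0) \mid A = 1\}
  = E\{\bar{Q}(1, W) - \bar{Q}(0, W) \mid A = 1\},
\qquad
\bar{Q}(a, w) = E(Y \mid A = a, \Delta = 1, W = w).
```

A doubly robust estimator of this functional typically combines an outcome regression such as $\bar{Q}$ with the propensity score $P(A = 1 \mid W)$ and the observation probability $P(\Delta = 1 \mid A, W)$.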

  The second section is motivated by the Prepared, Protected, emPowered (P3) study, a randomized clinical trial designed to assess the efficacy of a social networking gamification application at increasing pre-exposure prophylaxis (PrEP) use among young men who have sex with men and young transgender women who have sex with men. Because of the COVID-19 pandemic, the primary outcome of this study had substantial missingness, which may reduce power for the analysis. We develop a novel estimator for the average treatment effect (ATE) in this setting that incorporates post-baseline auxiliary covariates in an attempt to recover power to detect treatment effects.
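One way to see how post-baseline information can enter estimation is an iterated-expectation identification of each treatment-specific mean. The following is a sketch under assumptions stated here, not necessarily the dissertation's identifying functional: treatment $A$ is randomized given baseline covariates $W$, $M$ denotes post-baseline auxiliary covariates, $\Delta$ indicates that $Y$ is observed, and $Y$ is missing at random given $(A, W, M)$. The symbol $\bar{Q}_M$ is illustrative notation.

```latex
E\{Y(a)\} = E_W\Big[\, E\big\{ \bar{Q}_M(a, M, W) \,\big|\, A = a, W \big\} \Big],
\qquad
\bar{Q}_M(a, m, w) = E(Y \mid A = a, M = m, W = w, \Delta = 1),
\qquad
\psi_{\mathrm{ATE}} = E\{Y(1)\} - E\{Y(0)\}.
```

Conditioning on $M$ lets the missingness mechanism depend on post-baseline information, and the inner regression $\bar{Q}_M$ can use that information when predicting the outcomes that were not observed.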

  In the third section, we examine how robust statistical results are to the choice of random seed when the ATE is estimated with common doubly robust estimators combined with flexible machine learning regression techniques. Such techniques often involve random steps, such as sample splitting for cross-validation. We demonstrate that these random steps may lead to conflicting inferential results given the same dataset and statistical analysis plan. We propose two potential solutions for stabilizing both point estimates and inferential results in this setting and demonstrate their effectiveness through a simulation study.
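The seed dependence described above can be reproduced in miniature. The sketch below rests on assumptions of our own: simulated data with a known ATE of 1, a cross-fitted augmented inverse probability of treatment weighted (AIPTW) estimator with simple parametric nuisance fits standing in for flexible machine learning, and, as one possible stabilization, the median-over-repeated-splits aggregation used in the double/debiased machine learning literature. It is not presented as either of the dissertation's proposed solutions, and all function names are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Unpenalized logistic regression fit by Newton-Raphson; returns a predictor."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        hessian = X1.T @ (X1 * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, X1.T @ (y - p))
    return lambda Xn: 1.0 / (1.0 + np.exp(-np.column_stack([np.ones(len(Xn)), Xn]) @ beta))


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a predictor."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta


def crossfit_aiptw(W, A, Y, seed, n_folds=5):
    """Cross-fitted AIPTW estimate of the ATE and its influence-function SE.

    The seed only controls the random assignment of observations to folds."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(Y)) % n_folds
    phi = np.empty(len(Y))
    for k in range(n_folds):
        train, test = folds != k, folds == k
        g = fit_logistic(W[train], A[train])  # propensity score P(A=1 | W)
        q1 = fit_linear(W[train & (A == 1)], Y[train & (A == 1)])  # E(Y | A=1, W)
        q0 = fit_linear(W[train & (A == 0)], Y[train & (A == 0)])  # E(Y | A=0, W)
        gk = np.clip(g(W[test]), 0.01, 0.99)  # truncate to guard positivity
        q1k, q0k = q1(W[test]), q0(W[test])
        a, y = A[test], Y[test]
        phi[test] = q1k - q0k + a / gk * (y - q1k) - (1 - a) / (1 - gk) * (y - q0k)
    return phi.mean(), phi.std(ddof=1) / np.sqrt(len(Y))


def median_over_seeds(W, A, Y, seeds):
    """Aggregate cross-fitted estimates over many fold-assignment seeds.

    Point estimate: median of the per-seed estimates. SE: square root of the
    median of (per-seed variance + squared deviation from the median), which
    widens the SE to reflect between-split variability."""
    res = np.array([crossfit_aiptw(W, A, Y, seed=s) for s in seeds])
    est = np.median(res[:, 0])
    var = np.median(res[:, 1] ** 2 + (res[:, 0] - est) ** 2)
    return est, np.sqrt(var)


# Simulated data (assumption): the true ATE equals 1.
rng = np.random.default_rng(2023)
n = 500
W = rng.normal(size=(n, 3))
A = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 * W[:, 0] - 0.5 * W[:, 1])))).astype(float)
Y = A + W @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

# Same data and analysis plan, different seeds: the estimates differ.
single = [crossfit_aiptw(W, A, Y, seed=s)[0] for s in range(5)]
print(np.round(single, 3))

# Aggregating over many seeds makes the result less sensitive to any single seed.
stable = median_over_seeds(W, A, Y, seeds=range(25))
print(np.round(stable, 3))
```

In this toy setting the per-seed differences are small because the nuisance fits are simple parametric models; with highly adaptive learners and many covariates they can be large enough to change the conclusion of a hypothesis test.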

Table of Contents

1 Introduction

2 Nonparametric Doubly Robust Inference for Average Treatment Effect on the Treated with Missing Outcomes
2.1 Introduction
2.2 Background
2.2.1 Notation and Estimand
2.2.2 Plug-In Estimator
2.2.3 Standard TMLE Estimator for the ATT
2.3 Proposed Doubly Robust Estimator for Average Treatment Effect Among the Treated
2.3.1 Remainder Term Under Regression Misspecification
2.3.2 General Strategy
2.3.3 Example analysis of a single component of the remainder under a single pattern of misspecification
2.3.4 Results of full analysis of the remainder term under general misspecification
2.3.5 Asymptotic properties of DRTMLE
2.4 Simulation
2.4.1 Data-generating mechanism and set-up
2.4.2 Analysis
2.4.3 Simulation Hypotheses
2.4.4 Simulation Results
2.5 Real Data Analysis
2.5.1 Data and Methods
2.5.2 Results
2.6 Discussion

3 Incorporating Auxiliary Covariates into Estimation of the Average Treatment Effect with Targeted Maximum Likelihood Estimation
3.1 Introduction
3.2 Background
3.2.1 Notation, Model, and Definition of Average Treatment Effect
3.2.2 Estimating the ATE
3.3 Proposed Estimator
3.3.1 Identifying Functional
3.3.2 Proposed Targeted Maximum Likelihood Estimator
3.3.3 Theoretical Results for the Proposed TMLE
3.4 Simulation Study
3.4.1 Methods
3.4.2 Results
3.5 Real Data Analysis
3.5.1 Methods
3.5.2 Results
3.6 Discussion

4 Don’t let your analysis go to seed: on the impact of random seed on machine learning-based causal inference
4.1 Introduction
4.2 Methods
4.2.1 Background
4.2.2 Dependence of Doubly-Robust Estimators on Random Seed
4.2.3 Proposed Solutions
4.3 Simulation Study
4.3.1 Simulation Study Methods
4.4 Simulation Study Results
4.5 Real Data Analysis
4.6 Discussion
4.7 Conclusion

Appendix A Appendix for Chapter 2
A.1 On “Convergence”, Rates, and “Sufficient” Rates
A.2 Linear Expansion
A.2.1 Negligibility of the Extra Term in the Remainder
A.3 On Compatibility of $\bar{g}_{n,A}$ and $\Psi_{\mathrm{alt}}$ as an Alternative Functional
A.4 Derivation of DRTMLE Estimator
A.4.1 Expansion for $R_1(\eta_n, \eta_0)$
A.4.2 Expansion for $R_2(\eta_n, \eta_0)$
A.4.3 Expansion for $R_3(\eta_n, \eta_0)$
A.5 Assumptions of DRTMLE
A.6 Simulation Study Details
A.6.1 Data Generating Mechanism
A.6.2 Variance Estimation
A.7 Real Data Analysis

Appendix B Appendix for Chapter 3
B.1 Standard TMLE for the ATE
B.2 Identifiability Proof
B.3 Estimation of $\bar{Q}_M$
B.4 Theorem Proofs
B.4.1 Bounding the Remainder Term
B.4.2 Double Robustness
B.4.3 Asymptotic Normality
B.5 Data Generating Mechanism for Simulation
B.6 Real Data Analysis
B.6.1 Assessing Assumptions
B.6.2 Missing Data
B.6.3 Algorithms and Software

Appendix C Appendix for Chapter 4
C.1 Data Generation
C.1.1 Illustration of Random Seed Dependence
C.1.2 High-Dimensional Data Generating Mechanism for the Simulation Study
C.2 Justification Sketch for the Proposed Solutions
C.3 Additional Doubly-robust Estimators
C.3.1 Targeted Maximum Likelihood Estimation (TMLE)
C.3.2 Doubly-Robust TMLE (DRTMLE)
C.3.3 Cross-Fit TMLE and DRTMLE
C.4 Additional Simulation Results
C.4.1 Additional AIPTW Results
C.4.2 TMLE Simulation Results
C.4.3 DRTMLE Simulation Results
C.5 Real Data Analysis Details

Bibliography

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English