Measurement error methods for unmeasured confounding and pooling

Van Domelen, Dane (Summer 2018)

Epidemiologists increasingly use existing datasets to explore exposure-disease relationships. A common problem is that one or more covariates may not be available. In Chapter 1, we compare methods for handling unmeasured confounding when validation data can be obtained. We consider propensity score calibration, as well as maximum likelihood and regression calibration from the measurement error literature; the latter two require specifying a model for the unmeasured confounder given exposure, disease model covariates, and perhaps additional covariates. We apply the methods to assess whether low Vitamin D is associated with fecundity after controlling for age, overweight status, and caloric intake, by combining a primary dataset missing caloric intake with a smaller validation dataset. We propose several modifications to propensity score calibration to relax a critical surrogacy assumption, leading to improved performance but nullifying an appealing identifiability property of the original method.
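
As a rough illustration of the regression calibration idea mentioned above, the sketch below simulates a primary dataset that lacks a confounder `c` and a smaller validation dataset that contains it. It is not the dissertation's code. The linear outcome model, every coefficient value, the sample sizes, and the extra covariate `d` (a predictor of `c` that is left out of the outcome model) are assumptions chosen only to keep the example short and self-contained.

```python
import random

def ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def simulate(n, rng):
    """Rows of (y, x, z, d, c); all coefficients are hypothetical."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)   # exposure
        z = rng.gauss(0, 1)   # measured covariate in the outcome model
        d = rng.gauss(0, 1)   # assumed extra predictor of the confounder
        c = 0.8 * x + 0.5 * z + 1.0 * d + rng.gauss(0, 1)  # confounder
        y = 0.5 * x + 1.0 * c + 0.3 * z + rng.gauss(0, 1)  # true exposure effect 0.5
        rows.append((y, x, z, d, c))
    return rows

rng = random.Random(2018)
main = simulate(20000, rng)    # c is treated as unobserved here
valid = simulate(2000, rng)    # y is treated as unobserved here

# Step 1: fit the calibration model for c given (x, z, d) in the validation data.
a = ols([(x, z, d) for _, x, z, d, _ in valid], [c for *_, c in valid])

# Step 2: impute E[c | x, z, d] for every row of the primary data.
c_hat = [a[0] + a[1] * x + a[2] * z + a[3] * d for _, x, z, d, _ in main]

# Step 3: fit the outcome model with the imputed confounder in place of c.
y = [row[0] for row in main]
rc = ols([(x, ch, z) for (_, x, z, _, _), ch in zip(main, c_hat)], y)

# Comparison: outcome model that simply omits the confounder.
naive = ols([(x, z) for _, x, z, _, _ in main], y)

print("exposure coefficient, regression calibration:", round(rc[1], 3))
print("exposure coefficient, confounder omitted:   ", round(naive[1], 3))
```

In this setup the omitted-confounder fit should land near 0.5 + 1.0 × 0.8 = 1.3, while the calibrated fit should land near the true value of 0.5. The covariate `d` matters: if `c` were predicted from `x` and `z` alone, the imputed values would be an exact linear combination of regressors already in the outcome model. Standard errors from the final fit ignore the uncertainty in the calibration step, so a real analysis would need to account for it (for example by bootstrapping both datasets).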


In the logistic regression setting, measuring biomarkers in combined samples (“pools”) from multiple cases or controls can lead to large gains in statistical efficiency. Two types of error threaten validity: assay-related measurement error, and processing error caused by forming pools. In Chapter 2, we present a likelihood approach to correct for both errors. We assume that the biomarker level given covariates is normally distributed, and that measurement and processing errors are independent, normally distributed, and not dependent on pool size. Our approach accommodates replicate measurements, which are not required for identifiability but improve stability. We apply our methods to a reproductive health dataset that includes pools of size 1 and 2 as well as replicates, and we assess validity and efficiency via simulations.
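
To make the additive error structure described above concrete, here is a minimal simulation sketch. It assumes, for illustration only, that there are no covariates, that processing error is present only when specimens are actually combined (pool size greater than 1), and that the standard deviations `sd_x`, `sd_p`, and `sd_m` take the arbitrary values shown. Under those assumptions the variance of a single measurement of a pool of size g is sd_x²/g + sd_p² + sd_m², with the sd_p² term dropped when g = 1.

```python
import random
import statistics

def measure_pool(g, rng, sd_x=1.0, sd_p=0.5, sd_m=0.3):
    """One error-prone measurement of the mean biomarker level in a pool of g specimens."""
    pool_mean = sum(rng.gauss(0.0, sd_x) for _ in range(g)) / g
    # Assumption: processing error arises only when a pool is physically formed.
    processing = rng.gauss(0.0, sd_p) if g > 1 else 0.0
    measurement = rng.gauss(0.0, sd_m)  # assay error, same variance for every pool size
    return pool_mean + processing + measurement

def expected_variance(g, sd_x=1.0, sd_p=0.5, sd_m=0.3):
    """Variance implied by independent additive normal errors."""
    return sd_x ** 2 / g + (sd_p ** 2 if g > 1 else 0.0) + sd_m ** 2

rng = random.Random(2)
empirical = {}
for g in (1, 2):
    draws = [measure_pool(g, rng) for _ in range(50000)]
    empirical[g] = statistics.variance(draws)
    print(g, round(empirical[g], 3), round(expected_variance(g), 3))
```

With these hypothetical values the implied variances are 1.09 for single specimens and 0.84 for pools of size 2. Because the two error variances enter the total differently for pooled and unpooled specimens, a design that mixes pool sizes carries some information about them even without replicates.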


In Chapter 3, we present a logistic regression approach and a discriminant function approach for estimating the covariate-adjusted odds ratio relating a binary outcome to a right-skewed biomarker measured in homogeneous pools. Both assume multiplicative lognormal (rather than additive normal) measurement and processing errors acting on the poolwise mean and utilize constant-scale Gamma models for the biomarker level. In the motivating example, AIC favors these models over their normal counterparts from Chapter 2, although substantive results are similar. Our methods are implemented in the R package pooling.
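
The sketch below illustrates why a constant-scale Gamma model is convenient for pooled data, together with multiplicative lognormal errors acting on the poolwise mean. It relies on a standard fact: the sum of independent Gamma variables that share a scale is itself Gamma, with shape equal to the sum of the shapes. The shape values, scale, error standard deviations, the choice to centre each error so that it has mean 1, and the choice to skip processing error for pools of size 1 are all assumptions for illustration, not details taken from the dissertation or from the pooling package.

```python
import math
import random

def lognormal_error(sd, rng):
    """Multiplicative error exp(e) with e ~ N(-sd^2/2, sd^2), so the error has mean 1."""
    return math.exp(rng.gauss(-sd ** 2 / 2, sd))

def observe_pool(shapes, scale, sd_p, sd_m, rng):
    """Error-prone measurement of the mean of Gamma(shape_i, scale) specimens."""
    g = len(shapes)
    # random.gammavariate takes (shape, scale), so each specimen has mean shape * scale.
    pool_mean = sum(rng.gammavariate(a, scale) for a in shapes) / g
    # Assumption: processing error applies only when specimens are combined.
    processing = lognormal_error(sd_p, rng) if g > 1 else 1.0
    return pool_mean * processing * lognormal_error(sd_m, rng)

shapes, scale = (2.0, 3.0), 1.5          # hypothetical member-specific shapes, shared scale
expected_mean = sum(shapes) * scale / len(shapes)

rng = random.Random(3)
draws = [observe_pool(shapes, scale, 0.3, 0.2, rng) for _ in range(50000)]
print(round(sum(draws) / len(draws), 3), expected_mean)
```

Here the true poolwise mean follows a Gamma distribution with shape 2 + 3 = 5 and scale 1.5 / 2, so its expectation is 3.75. Because the simulated errors are independent of the pool mean and have mean 1, the error-prone measurement has the same expectation but a larger variance.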

Table of Contents

Chapter 1: Measurement error methods in the unmeasured confounding setting

1.1 Introduction

1.1.1 Confounding

1.2 Methods

1.2.1 Notation

1.2.2 Data types

1.2.3 Maximum likelihood

1.2.4 Regression calibration

1.2.5 Propensity score calibration

1.3 Results

1.3.1 Motivating example: low Vitamin D and fecundity

1.3.2 Simulations

1.4 Discussion


Chapter 2: Estimating the covariate-adjusted log-odds ratio for a continuous exposure measured in pools and subject to errors

2.1 Introduction

2.2 Methods

2.2.1 Poolwise logistic regression

2.2.2 ML for handling errors in Xi*

2.2.3 Approximate ML

2.2.4 Discriminant function approach

2.2.5 Incorporating replicates

2.2.6 Implementation

2.3 Collaborative Perinatal Project

2.4 Simulation studies

2.4.1 Validity of error-correction methods

2.4.2 Robustness to non-normality of errors

2.4.3 Efficiency of traditional vs. pooling designs

2.5 Discussion


Chapter 3: Gamma models to accommodate a skewed exposure measured in pools and subject to multiplicative errors

3.1 Introduction

3.2 Methods

3.2.1 Scenario

3.2.2 Logistic regression methods

3.2.3 Discriminant function methods

3.2.4 Implementation

3.3 Results

3.3.1 Motivating example

3.3.2 Simulations

3.4 Discussion


Chapter 4: Future work

4.1 Propensity score calibration with multiple confounders

4.2 Conditional logistic regression with pooling

4.3 Paired t-test designs

4.4 Expanding suite of pooling functions

4.5 Tools for designing pooling studies


Appendix: R code for motivating examples

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Subfield / Discipline
  • English