Analysis of Data with Complex Misclassification in Response or Predictor Variables by Incorporating Validation Subsampling Open Access
Tang, Li (2012)
Abstract
Misclassification is common in epidemiological and clinical research. It may affect an exposure variable, an outcome variable, or both. It is well known that the validity of analytic results (e.g., estimates of odds ratios of interest) can be questionable when no correction is made. Valid and accessible methods for dealing with these issues therefore remain in high demand.
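To make the concern about uncorrected odds ratios concrete, the following minimal sketch computes the expected observed 2×2 table when a binary exposure is recorded with imperfect sensitivity and specificity. The counts, the sensitivity (0.8), and the specificity (0.9) are hypothetical values chosen for illustration; they are not taken from the dissertation or from the HERS data, and the sketch covers only the simplest non-differential case.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b: exposed and unexposed counts among cases
    c, d: exposed and unexposed counts among controls
    """
    return (a * d) / (b * c)


def misclassify_exposure(exposed, unexposed, se, sp):
    """Expected observed counts when exposure is recorded with
    sensitivity `se` and specificity `sp`."""
    obs_exposed = se * exposed + (1.0 - sp) * unexposed
    obs_unexposed = (1.0 - se) * exposed + sp * unexposed
    return obs_exposed, obs_unexposed


# Hypothetical true counts (exposed, unexposed); illustration only.
true_cases = (200.0, 100.0)
true_controls = (100.0, 200.0)

# Hypothetical, non-differential error rates: the same in cases and controls.
se, sp = 0.8, 0.9

obs_cases = misclassify_exposure(*true_cases, se, sp)
obs_controls = misclassify_exposure(*true_controls, se, sp)

true_or = odds_ratio(*true_cases, *true_controls)
naive_or = odds_ratio(*obs_cases, *obs_controls)

print(f"true OR  = {true_or:.3f}")
print(f"naive OR = {naive_or:.3f}")
```

Under these assumed values the naive odds ratio is pulled toward 1 relative to the true one. Differential or dependent misclassification, the main subject of the dissertation, can bias the estimate in either direction.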
In this dissertation, we first consider the situation when correlated binary response variables are subject to misclassification. Building upon prior work that extended McNemar's test to correct paired-data odds ratio estimation, we propose a nonlinear mixed model-based approach to adjust for potentially complex differential misclassification in correlated binary responses via internal validation sampling.
For the second topic, we turn to predictor misclassification, for which we develop likelihood-based approaches built on generalized linear and generalized linear mixed models that can efficiently incorporate internal validation data in univariate and multivariate settings, respectively. We discuss the use of the approach both when a baseline predictor is misclassified and when a time-dependent predictor is misclassified.
In the final topic, we elucidate extensions of well-studied methods in order to facilitate misclassification adjustment when a binary outcome and binary exposure variable are both subject to complex differential misclassification in the 2-by-2 table scenario. We develop maximum likelihood approaches to accommodate a broad range of complexity in the joint misclassification process while incorporating various types of internal validation observations. We then generalize the method to a more standard binary regression setting, allowing the incorporation of covariates both in the main health effects model of interest and in misclassification models for both the binary outcome and exposure variable. Throughout, illustrative examples are presented via detailed analyses of bacterial vaginosis and trichomoniasis data from the HIV Research Epidemiology Study (HERS).
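As a rough illustration of how validation observations can feed a correction in the 2×2 setting, the sketch below applies the classical matrix-method idea to a misclassified exposure: sensitivity and specificity are estimated from a validation subsample and then used to back-correct the observed counts. This is a simplified, non-differential stand-in and not the maximum likelihood or generalized matrix approach developed in the dissertation; all counts and function names are hypothetical.

```python
def estimate_se_sp(validation):
    """Estimate sensitivity and specificity from validation counts.

    `validation` maps (true_status, observed_status) pairs, each 0 or 1,
    to the number of validated subjects in that cell.
    """
    se = validation[(1, 1)] / (validation[(1, 1)] + validation[(1, 0)])
    sp = validation[(0, 0)] / (validation[(0, 0)] + validation[(0, 1)])
    return se, sp


def correct_counts(obs_exposed, obs_unexposed, se, sp):
    """Back-correct observed exposure counts by inverting the
    2x2 misclassification matrix (requires se + sp > 1)."""
    denom = se + sp - 1.0
    if denom <= 0.0:
        raise ValueError("se + sp must exceed 1 for the correction to be defined")
    n = obs_exposed + obs_unexposed
    true_exposed = (obs_exposed - (1.0 - sp) * n) / denom
    return true_exposed, n - true_exposed


# Hypothetical validation subsample: (true, observed) -> count.
validation = {(1, 1): 80, (1, 0): 20, (0, 0): 90, (0, 1): 10}
se_hat, sp_hat = estimate_se_sp(validation)

# Hypothetical observed main-study counts (exposed, unexposed).
obs_cases = (170.0, 130.0)
obs_controls = (100.0, 200.0)

adj_cases = correct_counts(*obs_cases, se_hat, sp_hat)
adj_controls = correct_counts(*obs_controls, se_hat, sp_hat)

naive_or = (obs_cases[0] * obs_controls[1]) / (obs_cases[1] * obs_controls[0])
adjusted_or = (adj_cases[0] * adj_controls[1]) / (adj_cases[1] * adj_controls[0])

print(f"naive OR    = {naive_or:.3f}")
print(f"adjusted OR = {adjusted_or:.3f}")
```

A likelihood-based treatment would additionally account for the sampling variability in the estimated error rates and could let them depend on outcome status or covariates, which this sketch does not attempt.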
Key Words: Differential; Misclassification; Internal Validation; Likelihood
Table of Contents
Chapter 1 Introduction
1.1 Overview
1.2 Misclassification in Correlated Binary Responses
1.3 Misclassification in Predictors
1.4 Misclassification in Response and Predictor Variables in 2×2 Tables
1.5 Misclassification in Response and Predictor Variables in Regression
1.6 Motivating Example
Chapter 2 Regression Analysis for Differentially Misclassified Correlated Binary Responses
2.1 Methods
2.1.1 Notation
2.1.2 Validation Sampling Scheme
2.1.3 Non-differential Misclassification with External Validation
2.1.4 Differential Misclassification
2.1.5 Main-study Only and Sensitivity Analysis
2.1.6 Estimation
2.1.7 Correlation in Misclassification Processes
2.2 Simulation Studies
2.2.1 Non-differential Misclassification
2.2.2 Differential Misclassification
2.2.3 Importance of Correctly Specifying SE/SP Model
2.2.4 A Note About Correlated Misclassification
2.3 Example
2.3.1 HERS Example
2.3.2 Example 1: Pairwise No-covariate case
2.3.3 Example 2: Pairwise Covariate-adjusted case
2.3.4 Example 3: Longitudinal Analysis with >2 Time Points
2.4 Discussion
Chapter 3 Regression Analysis for Differentially Misclassified Binary Covariates
3.1 Univariate Case
3.1.1 Model Specification
3.1.2 External Validation: Non-differential Misclassification
3.1.3 Internal Validation: Differential Misclassification
3.1.4 Note on Impact of Mis-specifying X|C Model
3.2 Extension to Repeated Measures
3.2.1 Model Specification
3.2.2 External Validation: Non-Differential Misclassification
3.2.3 Internal Validation: Differential
3.2.4 Estimation
3.3 Simulation Studies
3.3.1 External Validation in Univariate Case: Non-Differential Misclassification
3.3.2 Internal Validation in Univariate Case: Differential Misclassification
3.3.3 External Validation in Longitudinal Case: Non-Differential Misclassification
3.3.4 Internal Validation in Longitudinal Case: Differential Misclassification
3.4 Example
3.4.1 HERS Example
3.4.2 Example 1: Univariate Analysis with Visit 4
3.4.3 Example 2: Longitudinal Analysis
3.5 Discussion
Chapter 4 Misclassification in Response and Predictor Variables in 2×2 Tables
4.1 Methods
4.1.1 Notations and Terminology
4.1.2 Maximum Likelihood (ML) Approach
4.1.3 Generalized Matrix Method
4.1.4 Generalized Inverse Matrix Method
4.1.5 Estimation of Misclassification Probabilities and Variance
4.1.6 Notes on Case-Control Studies
4.1.7 Model Selection
4.1.8 Comments Regarding Null Testing
4.2 Simulation Studies
4.2.1 Study I: Mimicking Real-data Example
4.2.2 Study II: Different Types of Misclassification
4.2.3 Study III: Performance of Model Selection
4.2.4 Study IV: Misclassification in Case-control studies
4.3 Example
4.4 Discussion
Chapter 5 Misclassification in Response and Predictor Variables in Logistic Regression
5.1 Methods
5.1.1 Notation
5.1.2 Independent Nondifferential Misclassification
5.1.3 Independent Differential Misclassification
5.1.4 Dependent and Differential Misclassification
5.1.5 Other Types of Misclassification
5.2 Example
5.3 Simulation Studies
5.4 Discussion
References
Primary PDF
Title | Date Uploaded |
---|---|
Analysis of Data with Complex Misclassification in Response or Predictor Variables by Incorporating Validation Subsampling | 2018-08-28 12:40:47 -0400 |