Multiple Imputation with Multivariate Models: An Evaluation of Two Case Studies Open Access

Bounthavong, Mark (2013)


The purpose of this thesis was to compare different methods for handling missing data, using two observational studies as case studies, to determine whether the choice of method influenced the study results and conclusions.

Both case studies used multivariate models to test a specific hypothesis. The first retrospective cohort study (Case study 1) constructed logistic regression models to investigate the association between adherence and achievement of a ≥25% reduction in lipid panel values. The second retrospective cohort study (Case study 2) constructed a multiple linear regression model to investigate the association between drug (exenatide or liraglutide) and change in hemoglobin A1c (HbA1c) level. For each case study, multiple imputation (MI) was compared with complete-case analysis (CCA) to determine the direction and magnitude of the parameter estimates.
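The CCA versus MI comparison described above can be illustrated with a small sketch. This is not the thesis's SAS code: it is a minimal Python example on simulated data, assuming a linear outcome model loosely analogous to Case study 2. The variable names (x1, x2, y), the sample size, and the missingness rule are all hypothetical. The imputation step uses a bootstrap-based stochastic regression imputation, which is one simple way to approximate a proper multiple imputation, and the m = 5 estimates are pooled with Rubin's rules.

```python
import numpy as np

def ols(X, y):
    """Fit OLS; return coefficients, their variances, and the residual variance."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = float(resid @ resid) / (X.shape[0] - X.shape[1])
    return beta, sigma2 * np.diag(xtx_inv), sigma2

def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances with Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()            # pooled point estimate
    within = variances.mean()           # average within-imputation variance
    between = estimates.var(ddof=1)     # between-imputation variance
    total_var = within + (1 + 1 / m) * between
    return q_bar, total_var

def impute_once(x1, x2_obs, y, missing, rng):
    """Draw one completed copy of x2 (bootstrap + stochastic regression)."""
    obs_idx = np.flatnonzero(~missing)
    # Resample complete cases so the imputation-model parameters vary per draw.
    boot = rng.choice(obs_idx, size=obs_idx.size, replace=True)
    Z = np.column_stack([np.ones(boot.size), x1[boot], y[boot]])
    gamma, _, sigma2 = ols(Z, x2_obs[boot])
    n_mis = int(missing.sum())
    Z_mis = np.column_stack([np.ones(n_mis), x1[missing], y[missing]])
    x2_imp = x2_obs.copy()
    # Prediction plus random residual noise, not just the conditional mean.
    x2_imp[missing] = Z_mis @ gamma + rng.normal(scale=np.sqrt(sigma2), size=n_mis)
    return x2_imp

# Simulated (hypothetical) data: true coefficient of x2 is -1.0.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Missing at random: the chance that x2 is missing depends only on observed x1.
missing = rng.random(n) < 1 / (1 + np.exp(-(x1 - 1.0)))
x2_obs = np.where(missing, np.nan, x2)

# Complete-case analysis: drop every row with a missing x2.
obs = ~missing
X_cc = np.column_stack([np.ones(int(obs.sum())), x1[obs], x2_obs[obs]])
beta_cc, var_cc, _ = ols(X_cc, y[obs])

# Multiple imputation with m = 5 completed datasets.
m = 5
est, var = [], []
for _ in range(m):
    x2_imp = impute_once(x1, x2_obs, y, missing, rng)
    X_mi = np.column_stack([np.ones(n), x1, x2_imp])
    b, v, _ = ols(X_mi, y)
    est.append(b[2])
    var.append(v[2])
beta_mi, var_mi = pool_rubin(est, var)

print(f"CCA: beta_x2 = {beta_cc[2]:.3f}, SE = {np.sqrt(var_cc[2]):.3f}")
print(f"MI:  beta_x2 = {beta_mi:.3f}, SE = {np.sqrt(var_mi):.3f}")
```

Comparing the two printed lines mirrors the comparison made in the thesis: if the direction and magnitude of the coefficient are similar under both methods, the study conclusion does not depend on how the missing data were handled.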

In Case study 1, the regression results for the crude, CCA, and MI methods were similar and did not vary significantly for achieving a ≥25% reduction from baseline in LDL, HDL, and TC. In Case study 2, the results for the crude, CCA, and MI methods were similar and did not vary significantly for HbA1c reduction from baseline.

Based on the results of this study, multiple imputation may not be beneficial, since the conclusions remained unchanged. Researchers who work with multivariate models may consider using multiple imputation to address missing data. Its results could be presented alongside those of the complete-case analysis, but this may seem redundant if the study conclusions do not differ.

Table of Contents



1.1. Introduction and Rationale (Page 1)

1.2. Problem Statement (Page 3)

1.1.2. Missing data mechanisms (Page 5)

Missing completely at random (MCAR) (Page 6)

Missing at random (MAR) (Page 6)

Not missing at random (NMAR) (Page 7)

1.2.2. Consequences of missing data mechanisms (Page 8)

1.3. Theoretical Framework (Page 8)

1.3.1. Simple methods (ad hoc) (Page 8)

Complete-case analysis (Page 8)

Available-case analysis (Page 9)

Single imputation (Page 10)

1.3.2. Complex methods (Page 12)

Multiple imputations (Page 13)

1.4. Purpose Statement (Page 17)

1.5. Research Question (Page 17)

1.6. Significance Statement (Page 18)

1.7. Definition of Terms (Page 19)

CHAPTER 2: REVIEW OF LITERATURE (Page 20)

2.1. Systematic Review of the Literature (Page 20)

2.2. Results of the Systematic Review (Page 21)

2.2.1. Summary of results (Page 22)

2.3. Summary of Current Problem and Study Relevance (Page 30)

CHAPTER 3: METHODS (Page 33)

3.1. Introduction (Page 33)

3.2. Population and Sample (Page 34)

3.2.1. Case 1 - Regional level (Page 34)

3.2.2. Case 2 - National level (Page 36)

3.3. Research Design and Procedures (Page 37)

3.3.1. Case Study 1 (Page 37)

3.3.2. Case Study 2 (Page 38)

3.4. Instruments (Page 39)

3.5. Plans for Data Analysis (Page 41)

3.6. Limitations and Delimitations (Page 43)

CHAPTER 4: RESULTS (Page 45)

4.1. Introduction (Page 45)

4.2. Findings (Page 45)

4.2.1. Case study 1 (Page 45)

4.2.2. Case study 2 (Page 50)

4.3. Summary (Page 53)


5.1. Introduction (Page 56)

5.2. Summary of Study (Page 56)

5.2.1. Case study 1 (Page 57)

5.2.2. Case study 2 (Page 58)

5.3. Conclusions (Page 58)

5.3.1. Case study 1 (Page 58)

5.3.2. Case study 2 (Page 59)

5.5.3. Limitations (Page 60)

5.4. Implications (Page 61)

5.5. Recommendations (Page 62)

REFERENCES (Page 64)

APPENDIX (Page 69)

Appendix 1: SAS codes for Case study 1 (Page 69)

Appendix 2: SAS codes for Case study 2 (Page 80)


Table 1. Efficiency of multiple imputations relative to proportion of missing values (Page 15)

Table 2. Definition of terms (Page 19)

Table 3. ICD-9-CM diagnosis codes for dyslipidemia used for Case study 1 (Page 35)

Table 4. Baseline demographics between adherent and non-adherent subjects (Page 46)

Table 5. Missing data pattern for Case study 1 (Page 47)

Table 6. Univariate analysis with lipid outcomes, Case study 1 (Page 48)

Table 7. Odds of achieving a ≥25% reduction in lipid panel levels for adherent versus non-adherent patients on a statin in the VASDHS, Case study 1 (Page 48)

Table 8. Baseline demographics for exenatide and liraglutide groups, Case study 2 (Page 49)

Table 9. Number of missing data, Case study 2 (Page 51)

Table 10. Percent change in HbA1c at 2 years from baseline for exenatide relative to liraglutide, Case study 2 (Page 52)


Figure 1. Multiple imputation of m = 5 datasets (Page 14)

Figure 2. Flow diagram of the literature search (Page 22)

Figure 3. Schematic of the VA Regional Data Warehouse (Page 40)

Figure 4. Schematic of the VA Corporate Data Warehouse (Page 40)

Figure 5. Recommended guideline for validating study conclusions (Page 63)

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
  • English