Likelihood Methods for Logistic Regression with Missing Data

Lin, Ji (2012)

Permanent URL: https://etd.library.emory.edu/concern/etds/mp48sd751?locale=en
Published

Abstract

Missing data are often encountered in biometric research. This
dissertation explores methods for handling missing data in logistic regression
analysis. The disease status and the risk exposure may be subject to missingness
separately or together. The interest is in identifying the covariate-adjusted association
between the disease status and the risk exposure while accounting for the potential
impact of the missing data.
The first research topic focused on providing an intuitive and computationally
accessible approach under the assumption that data are missing at random (MAR). We
proposed a weighting method that uses an expanded dataset, with two approaches to
estimation for different situations.
The assumption of MAR is usually imposed in practice but is often not testable. It is
then important to assess how sensitive the results are to violations of this assumption.
In the second research topic, a framework of sensitivity analysis was proposed by
specification of alternative missing data mechanisms. The result from each specified
scenario is compared with that under MAR in order to assess the magnitude of change in
parameter estimates relative to the deviation from MAR. Examples and simulation results
suggest that the proposed method succeeds in detecting the direction and magnitude of
bias in parameter estimates even if the specification of the alternative missing data
mechanism is not completely correct.
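As a rough illustration of what such a sensitivity analysis can look like in the no-covariate case, the sketch below uses a simple pattern-mixture style adjustment. This is an assumed, generic technique chosen for illustration and is not necessarily the specification used in the dissertation. The counts, the function name adjusted_log_or, and the sensitivity parameter delta (the assumed log odds ratio of disease between subjects with missing and observed outcomes, within each exposure group) are all hypothetical. Setting delta = 0 reproduces the MAR (complete-case) estimate, and other values show how far the log odds ratio moves as the assumed mechanism departs from MAR.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def adjusted_log_or(counts, delta):
    """Exposure-outcome log odds ratio under an assumed missingness scenario.

    counts[x] = (observed Y=1, observed Y=0, Y missing) for exposure x in {0, 1}.
    delta is a hypothetical sensitivity parameter: within each exposure group,
    the odds of Y=1 among subjects with missing Y are assumed to be
    exp(delta) times the odds among subjects with observed Y.
    delta = 0 corresponds to MAR given exposure.
    """
    p_full = {}
    for x, (n1, n0, n_mis) in counts.items():
        n_obs = n1 + n0
        p_obs = n1 / n_obs                      # P(Y=1 | X=x, Y observed)
        p_mis = expit(logit(p_obs) + delta)     # assumed P(Y=1 | X=x, Y missing)
        r = n_obs / (n_obs + n_mis)             # P(Y observed | X=x)
        p_full[x] = r * p_obs + (1.0 - r) * p_mis
    return logit(p_full[1]) - logit(p_full[0])


# Hypothetical counts, for illustration only.
counts = {1: (40, 60, 50), 0: (20, 80, 20)}

mar_estimate = adjusted_log_or(counts, 0.0)
for delta in (-1.0, -0.5, 0.0, 0.5, 1.0):
    estimate = adjusted_log_or(counts, delta)
    print(f"delta={delta:+.1f}  log OR={estimate:.3f}  "
          f"change from MAR={estimate - mar_estimate:+.3f}")
```

Tabulating the change from the MAR estimate over a grid of delta values is one simple way to judge both the direction and the size of the potential bias.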
In the third research topic, we explore the reassessment design, where a second wave
of sampling is made in an attempt to recover some portion of the missing data in the
original data collection. We construct a joint likelihood based on the original model of
interest and a model for the missing data mechanism, with emphasis upon "non-ignorable"
missingness. The estimation is carried out by numerical maximization of the joint
likelihood and standard errors are estimated via a close approximation of the Hessian
matrix. We show how likelihood ratio tests can be used for model selection and how they
facilitate hypothesis testing for whether missingness is at random, which is an assumption
that can be suspect in many practical applications. Examples and simulations are
presented to demonstrate the performance of the proposed method.
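One way such a joint likelihood could be written, for the case where only the outcome is subject to missingness, is sketched below. The notation is assumed for illustration and is not taken from the dissertation: Y is the binary outcome, X the exposure, C the covariates, R = 1 if Y was observed in the original collection, and S = 1 if a subject with missing Y was recovered at reassessment. The sketch further assumes that selection for reassessment does not depend on Y given (X, C), so that it factors out of the likelihood.

```latex
% Model of interest and assumed missingness model (notation is illustrative)
\operatorname{logit} P(Y_i = 1 \mid x_i, c_i; \beta)
  = \beta_0 + \beta_1 x_i + \beta_2^{\top} c_i ,
\qquad
\operatorname{logit} \pi_i(y)
  = \operatorname{logit} P(R_i = 1 \mid Y_i = y, x_i, c_i; \gamma)
  = \gamma_0 + \gamma_1 y + \gamma_2 x_i + \gamma_3^{\top} c_i .

% Joint likelihood: originally observed, recovered at reassessment, never observed
L(\beta, \gamma)
  = \prod_{i:\, R_i = 1} P(y_i \mid x_i, c_i; \beta)\, \pi_i(y_i)
    \prod_{i:\, R_i = 0,\, S_i = 1} P(y_i \mid x_i, c_i; \beta)\, \{1 - \pi_i(y_i)\}
    \prod_{i:\, R_i = 0,\, S_i = 0} \sum_{y = 0}^{1} P(y \mid x_i, c_i; \beta)\, \{1 - \pi_i(y)\} .

% Likelihood ratio test of MAR (gamma_1 = 0) against non-ignorable missingness
\Lambda = 2 \left\{ \ell(\hat\beta, \hat\gamma)
          - \ell(\tilde\beta, \tilde\gamma) \big|_{\gamma_1 = 0} \right\}
  \;\overset{\text{approx.}}{\sim}\; \chi^2_1
  \quad \text{under } H_0 : \gamma_1 = 0 .
```

In this sketch, gamma_1 = 0 means that missingness depends only on the fully observed X and C, which is MAR. The subjects recovered at reassessment are the ones that supply direct information about gamma_1.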

Table of Contents

Chapter 1. INTRODUCTION AND BACKGROUND
1.1. Introduction
1.2. Background
1.2.1. Missing-Data Mechanisms
1.2.2. Complete-Case Analysis
1.2.3. Maximum Likelihood Method
1.2.4. Inverse Propensity Weighting
1.2.5. Weighted Estimating Equations
1.2.6. Multiple Imputation
1.2.7. Predictive Probability Weighting
1.2.8. Jackknife Resampling Method
1.2.9. Sensitivity Analysis with Data Missing Not-At-Random
1.2.10. Reassessment Data in Missing Data Problems
Chapter 2. A WEIGHTING METHOD FOR LOGISTIC REGRESSION WITH DATA MISSING-AT-RANDOM
2.1. Introduction
2.2. Methods
2.2.1. Outcome Missing
2.2.2. Predictor Variable Missing
2.2.3. More Than One Variable Missing
2.3. Simulation Results
2.3.1. With Categorical Covariate C
2.3.2. With Continuous Covariates C
2.4. Discussions
2.4.1. Comparison of the Methods
2.4.2. Connection between the IPW and the "Flipped-Around" Model
Chapter 3. SENSITIVITY ANALYSIS FOR DATA NOT MISSING AT RANDOM IN LOGISTIC REGRESSION
3.1. Introduction
3.2. Methods
3.2.1. The No-Covariate Case: Basic Sensitivity Analysis
3.2.2. The Covariate Case
3.2.3. Standard Error Estimation
3.3. Examples
3.3.1. No Covariate Case
3.3.2. Covariate Case
3.3.3. Monte Carlo Sensitivity Analysis
3.4. Simulations
3.5. Discussions
3.5.1. Connection Between the Three Ways to Specify Alternative Missing Mechanism
3.5.2. Extensions
Chapter 4. JOINT MODEL FOR LOGISTIC REGRESSION WITH MISSING DATA AND REASSESSMENT DESIGN
4.1. Introduction
4.2. Methods
4.2.1. Outcome or Exposure Missing in Logistic Regression with Reassessment Data
4.2.2. Outcome and Exposure Missing in Logistic Regression with Reassessment Data
4.2.3. Estimation
4.2.4. Model Selection and Testing Not-Missing-At-Random
4.3. Simulations
4.3.1. Comparison of Methods with NMAR X
4.3.2. Comparison of Methods with MAR X
4.3.3. Both Outcome and Exposure NMAR with Reassessment
4.4. Example
4.5. Discussion
Chapter 5. SUMMARY AND FUTURE RESEARCH
5.1. Summary
5.2. Future Research
BIBLIOGRAPHY

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English
