Quantile Regression for Complex Censored Data Open Access

Ji, Shuang (2012)

Permanent URL: https://etd.library.emory.edu/concern/etds/8s45q8830?locale=en
Published

Abstract

Survival data subject to complex censoring schemes are frequently encountered in biomedical research. For such data, naive application of classical approaches built for the random censoring case may lead to substantial estimation bias. In this dissertation, we focus on two scenarios that involve complex censoring mechanisms: dependent censoring and double censoring. We develop appropriate methods under the quantile regression framework (Koenker and Bassett, 1978); these methods are expected to accommodate a more dynamic relationship between covariates and survival time than traditional regression models in survival analysis.
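As background on the quantile regression framework cited above, the following minimal sketch illustrates the check-function idea for fully observed (uncensored) data only. It is not the estimation procedure developed in this dissertation; the function names `check_loss` and `sample_quantile_by_check_loss` and the simulated exponential sample are assumptions introduced purely for illustration.

```python
import numpy as np


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0}) (hypothetical helper name)."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def sample_quantile_by_check_loss(y, tau):
    """Minimize sum_i rho_tau(y_i - q) over q.

    The objective is piecewise linear and convex in q, so a minimizer can
    always be found among the observed data points; we simply scan them.
    """
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - q, tau).sum() for q in y]
    return y[int(np.argmin(losses))]


# Simulated, uncensored "survival times" (illustrative data, not from the thesis).
rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=501)

# With tau = 0.5 the minimizer is the sample median.
q_hat = sample_quantile_by_check_loss(y, 0.5)
print(q_hat, np.median(y))
```

Replacing the constant `q` with a linear predictor in the covariates gives the usual uncensored quantile regression objective; handling dependent or double censoring requires the modified estimating equations developed in the chapters listed below.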

The first part of this dissertation is motivated by the Warfarin-Aspirin Symptomatic Intracranial Disease (WASID) study, in which dependent censoring is posed by informative withdrawal. One scientific interest is the analysis of the time to a study endpoint defined as ischemic stroke, brain hemorrhage, or death from vascular causes, whichever happens first, corresponding to the setting where subjects do not withdraw. We propose a quantile regression procedure for such dependently censored data, along with an efficient and stable algorithm. We establish the uniform consistency and weak convergence of the resulting estimators. Extensive simulation studies demonstrate good finite-sample performance of the proposed inferential procedures. We illustrate the practical utility of our method via an application to the WASID study.

The second part of this dissertation is motivated by the US Cystic Fibrosis Foundation Patient Registry (CFFPR) study, in which double censoring is present while the left censoring variable is always observed. It is of interest to investigate the association between age at the first Pseudomonas aeruginosa (PA) infection, an important landmark event of CF pathology, and a set of risk factors. We propose a new analysis strategy for such doubly censored data and develop computationally simple estimation and inference procedures. Moreover, we propose conditional inference to address the special identifiability issues attached to the double censoring setting. Asymptotic properties are established for the resulting estimators, and the finite-sample performance is assessed by simulation studies. Analysis of the CFFPR study is also conducted based on our method.

In the third part, we study a double censoring data structure with unobservable left censoring times. We develop a self-consistent estimating equation along with an iterative algorithm. Our simulation studies demonstrate good finite-sample properties of the proposed method. We also apply the proposed method to the CFFPR study.

In summary, this dissertation work provides useful quantile regression tools for analyzing complex survival data, which have broad applications in medical and public health research.

Table of Contents

1 Introduction
1.1 Background
1.2 Motivating Examples
1.2.1 The Warfarin-Aspirin Symptomatic Intracranial Disease (WASID) Study
1.2.2 The Cystic Fibrosis Foundation Patient Registry (CFFPR) Study
1.3 Literature Review
1.3.1 Existing Work on Dealing with Dependent Censoring
1.3.2 Existing Work on Dealing with Double Censoring
1.4 Quantile Regression for Survival Data
1.5 Outline
2 Quantile Regression for Dependently Censored Data
2.1 Quantile Regression Procedures
2.1.1 Data and Model
2.1.2 Copula Functions
2.1.3 Estimation Equations
2.1.4 Computing Algorithms
2.1.5 Asymptotic Results
2.1.6 Inferences
2.2 Simulation Studies
2.3 The WASID Study Example
2.4 Remarks
2.5 Proofs
2.5.1 Regularity Conditions
2.5.2 Proof of Theorem 2.1.1
2.5.3 Proof of Theorem 2.1.2
2.6 Convergence Criteria
2.6.1 Convergence Criteria for Computing Algorithms
3 Quantile Regression for Doubly Censored Data with Known Left Censoring Times
3.1 Quantile Regression Procedures
3.1.1 Data and Model
3.1.2 Estimation Procedure
3.1.3 Asymptotic Results
3.1.4 Inferences
3.2 A Conditional Version of Quantile Regression
3.3 Extension to Handle Left Truncation
3.4 Simulation Studies
3.5 The CFFPR Data Example
3.6 Remarks
3.7 Proofs
3.7.1 Proof of Theorem 3.1.2
3.7.2 Justification for E{M(t)|Z} = 0
4 Quantile Regression for Doubly Censored Data
4.1 Quantile Regression Procedures
4.1.1 Data and Model
4.1.2 Estimation Procedure
4.1.3 Computing Algorithm
4.1.4 Inferences
4.2 Numerical Studies
4.2.1 Simulation Studies
4.2.2 Data Analysis
5 Summary and Future Work
5.1 Summary
5.2 Future Work
Bibliography

List of Figures
2.1 Upper Panel: Comparison among True Coefficients β0(τ) (Bold Solid Lines), Mean Estimates for β0(τ) from the Proposed Method (Solid Lines) under a Correctly Specified Clayton Copula, and Mean Estimates for β0(τ) from the Naive Approach (Dotted Lines); Lower Panel: Comparison among True Coefficients α0(τ) (Bold Solid Lines), Mean Estimates for α0(τ) from the Proposed Method (Solid Lines) under a Correctly Specified Clayton Copula, and Mean Estimates for α0(τ) from the Naive Approach (Dotted Lines).

2.2 Estimates for β0(τ) under the Correctly Specified Clayton Copula with Misspecified Association Parameters: Kendall's tau = 0.79 (Dashed Lines), Kendall's tau = 0.33 (Dotted Lines), Kendall's tau = 0.16 (Dot-Dash Lines), and the True Association Parameter: Kendall's tau = 0.58 (Solid Lines); and the True Coefficients β0(τ) (Bold Solid Lines).

2.3 Estimates for β0(τ) under the Correctly Specified Frank Copula with Misspecified Association Parameters: Kendall's tau = 0.26 (Dashed Lines), Kendall's tau = -0.12 (Dotted Lines), Kendall's tau = -0.33 (Dot-Dash Lines), and the True Association Parameter: Kendall's tau = 0.58 (Solid Lines); and the True Coefficients β0(τ) (Bold Solid Lines).

2.4 Point Estimates of Regression Coefficients for Time to the Primary Endpoint (Ischemic Stroke, Brain Hemorrhage, or Death) under the Clayton Copula with Kendall's tau = 0, 0.2, 0.4, 0.6 and 0.8.

2.5 Estimated Quantiles of Time to the Primary Endpoint (Ischemic Stroke, Brain Hemorrhage, or Death) under the Clayton Copula with Kendall's tau = 0, 0.2, 0.4, 0.6 and 0.8, with the Stenosis Percentage Fixed at Its Mean (63.7%).

2.6 Estimated Quantiles of Time to Early Termination of Study Medication under the Clayton Copula with Kendall's tau = 0, 0.2, 0.4, 0.6 and 0.8, with the Stenosis Percentage Fixed at Its Mean (63.7%).

3.1 Comparison among True Coefficients (Bold Solid Lines), Mean Estimated Coefficients from the Proposed Method (Solid Lines), and Mean Estimated Coefficients from the Naive Approach (Dotted Lines).

3.2 Coefficient Estimates (Bold Solid Lines) and 95% Pointwise Confidence Intervals (Bold Dashed Lines) from the Proposed Method, in Contrast with Coefficient Estimates (Dot-Dash Lines) and 95% Pointwise Confidence Intervals (Dotted Lines) from the Naive Method.

4.1 Coefficient Estimates (Bold Solid Lines) and 95% Pointwise Confidence Intervals (Shaded Areas) from the Proposed Method, in Contrast with Naive Coefficient Estimates Ignoring Left Censored Observations (Dotted Lines).

4.2 Estimated Quantiles of Age at First PA Infection (Bold Solid Lines) and 95% Pointwise Confidence Intervals (Shaded Areas) from the Proposed Method, in Contrast with Estimated Quantiles from Chang and Yang (1987) (Dot-Dash Lines).

List of Tables

2.1 Simulation Results on Parameter Estimation under the Clayton Copula. Bias: biases; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations; Cov95: coverage rates of 95% Wald confidence intervals.

2.2 Simulation Results on Parameter Estimation under the Frank Copula. Bias: biases; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations; Cov95: coverage rates of 95% Wald confidence intervals.

2.3 Simulation Results on Parameter Estimation when the Copula Function is Misspecified. Bias: biases; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations; Cov95: coverage rates of 95% Wald confidence intervals.

2.4 The WASID Example: Standard Errors under the Clayton Copula with Kendall's tau = 0, 0.2, 0.4, 0.6 and 0.8. β(1), β(2) and β(3): estimated coefficients of Treatment, Diabetes and Stenosis Percentage on T, respectively.

3.1 Simulation Results under AFT Models. Bias: absolute biases; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations; Cov95: coverage rates of 95% Wald confidence intervals.

3.2 Simulation Results on Hypothesis Testing and Second-Stage Inference under AFT Models. ERR: empirical rejection rates; AvgEst: estimated average effects; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations.

3.3 Simulation Results under Log-Linear Models with Heteroscedastic Errors. Bias: absolute biases; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations; Cov95: coverage rates of 95% Wald confidence intervals.

3.4 Simulation Results on Hypothesis Testing and Second-Stage Inference under Log-Linear Models with Heteroscedastic Errors. ERR: empirical rejection rates; AvgEst: estimated average effects; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations.

3.5 Simulation Results on Parameter Estimation for Doubly Censored Data with Left Truncation. Bias: absolute biases; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations; Cov95: coverage rates of 95% Wald confidence intervals.

3.6 Simulation Results on Hypothesis Testing and Second-Stage Inference for Doubly Censored Data with Left Truncation. ERR: empirical rejection rates; AvgEst: estimated average effects; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations.

4.1 Simulation Results under Models with Constant Covariate Effects. Bias: absolute biases; EmpSD: empirical standard deviations; AvgSD: average estimated resampling-based standard deviations; Cov95: coverage rates of 95% Wald confidence intervals.

4.2 Simulation Results under Model with Varying Covariate Effects. Bias: absolute biases; EmpSD: empirical standard deviations; AvgSD: average estimated resampling-based standard deviations; Cov95: coverage rates of 95% Wald confidence intervals.

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English