Quantile Regression for Complex Censored Data
Ji, Shuang (2012)
Abstract
Survival data subject to complex censoring schemes are frequently encountered in biomedical research. For such data, naive application of classical approaches built for the random censoring case may lead to substantial estimation bias. In this dissertation, we focus on two scenarios that involve complex censoring mechanisms: dependent censoring and double censoring. We develop appropriate methods under the quantile regression framework (Koenker and Bassett, 1978), which is expected to accommodate a more dynamic relationship between covariates and survival time than traditional regression models in survival analysis.
The first part of this dissertation is motivated by the Warfarin-Aspirin Symptomatic Intracranial Disease (WASID) study, in which informative withdrawal induces dependent censoring. One scientific question concerns the time to a study endpoint, defined as ischemic stroke, brain hemorrhage, or death from vascular causes, whichever happens first, in the setting where subjects do not withdraw. We propose a quantile regression procedure for such dependently censored data, along with an efficient and stable algorithm. We establish the uniform consistency and weak convergence of the resulting estimators. Extensive simulation studies demonstrate good finite-sample performance of the proposed inferential procedures. We illustrate the practical utility of our method via an application to the WASID study.
The second part of this dissertation is motivated by the US Cystic Fibrosis Foundation Patient Registry (CFFPR) study, in which double censoring is present and the left censoring variable is always observed. It is of interest to investigate the association between age at the first Pseudomonas aeruginosa (PA) infection, an important landmark event in CF pathology, and a set of risk factors. We propose a new analysis strategy for such doubly censored data and develop computationally simple estimation and inference procedures. Moreover, we propose conditional inference to address the special identifiability issues attached to the double censoring setting. Asymptotic properties are established for the resulting estimators, and the finite-sample performance is assessed by simulation studies. Analysis of the CFFPR study is also conducted based on our method.
In the third part, we study a double censoring data structure with unobservable left censoring times. We develop a self-consistent estimating equation along with an iterative algorithm. Our simulation studies demonstrate good finite-sample properties of the proposed method. We also apply the proposed method to the CFFPR study.
In summary, this dissertation work provides useful quantile regression tools for analyzing complex survival data, which have broad applications in medical and public health research.
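As a point of reference for the quantile regression framework (Koenker and Bassett, 1978) cited in the abstract, the following minimal sketch shows how a quantile can be characterized as the minimizer of the check loss when the data are fully observed. It is not the dissertation's estimator: the adjustments for dependent or double censoring are not reproduced here, and the function names and sample times are hypothetical, chosen only for illustration.

```python
def check_loss(u, tau):
    """Koenker-Bassett check function: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(b, y, tau):
    """Sum of check losses of the residuals y_i - b."""
    return sum(check_loss(yi - b, tau) for yi in y)


def quantile_by_check_loss(y, tau):
    """Return a tau-th sample quantile of y.

    The objective is piecewise linear and convex in b, so a minimizer
    is attained at one of the observed data points.
    """
    return min(sorted(y), key=lambda b: total_loss(b, y, tau))


# Hypothetical, uncensored survival times (illustration only).
times = [2.0, 3.5, 1.0, 7.0, 4.0, 9.5, 5.5]
median = quantile_by_check_loss(times, 0.5)
q25 = quantile_by_check_loss(times, 0.25)
```

With covariates, the constant `b` would be replaced by a linear predictor and the same loss minimized over its coefficients; handling censored observations requires the additional machinery the dissertation develops.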
Table of Contents
1 Introduction
1.1 Background
1.2 Motivating Examples
1.2.1 The Warfarin-Aspirin Symptomatic Intracranial Disease (WASID) Study
1.2.2 The Cystic Fibrosis Foundation Patient Registry (CFFPR) Study
1.3 Literature Review
1.3.1 Existing Work on Dealing with Dependent Censoring
1.3.2 Existing Work on Dealing with Double Censoring
1.4 Quantile Regression for Survival Data
1.5 Outline
2 Quantile Regression for Dependently Censored Data
2.1 Quantile Regression Procedures
2.1.1 Data and Model
2.1.2 Copula Functions
2.1.3 Estimation Equations
2.1.4 Computing Algorithms
2.1.5 Asymptotic Results
2.1.6 Inferences
2.2 Simulation Studies
2.3 The WASID Study Example
2.4 Remarks
2.5 Proofs
2.5.1 Regularity Conditions
2.5.2 Proof of Theorem 2.1.1
2.5.3 Proof of Theorem 2.1.2
2.6 Convergence Criteria
2.6.1 Convergence Criteria for Computing Algorithms
3 Quantile Regression for Doubly Censored Data with Known Left Censoring Times
3.1 Quantile Regression Procedures
3.1.1 Data and Model
3.1.2 Estimation Procedure
3.1.3 Asymptotic Results
3.1.4 Inferences
3.2 A Conditional Version of Quantile Regression
3.3 Extension to Handle Left Truncation
3.4 Simulation Studies
3.5 The CFFPR Data Example
3.6 Remarks
3.7 Proofs
3.7.1 Proof of Theorem 3.1.2
3.7.2 Justification for E{M(t)|Z} = 0
4 Quantile Regression for Doubly Censored Data
4.1 Quantile Regression Procedures
4.1.1 Data and Model
4.1.2 Estimation Procedure
4.1.3 Computing Algorithm
4.1.4 Inferences
4.2 Numerical Studies
4.2.1 Simulation Studies
4.2.2 Data Analysis
5 Summary and Future Work
5.1 Summary
5.2 Future Work
Bibliography
List of Figures
2.1 Upper Panel: Comparison among True Coefficients β0(τ) (Bold Solid Lines), Mean Estimates for β0(τ) from the Proposed Method (Solid Lines) under a Correctly Specified Clayton Copula, and Mean Estimates for β0(τ) from the Naive Approach (Dotted Lines); Lower Panel: Comparison among True Coefficients α0(τ) (Bold Solid Lines), Mean Estimates for α0(τ) from the Proposed Method (Solid Lines) under a Correctly Specified Clayton Copula, and Mean Estimates for α0(τ) from the Naive Approach (Dotted Lines).
2.2 Estimates for β0(τ) under the Correctly Specified Clayton Copula with Misspecified Association Parameters: Kendall's tau = 0.79 (Dashed Lines), Kendall's tau = 0.33 (Dotted Lines), Kendall's tau = 0.16 (Dot-dash Lines), and the True Association Parameter: Kendall's tau = 0.58 (Solid Lines); and the True Coefficients β0(τ) (Bold Solid Lines).
2.3 Estimates for β0(τ) under the Correctly Specified Frank Copula with Misspecified Association Parameters: Kendall's tau = 0.26 (Dashed Lines), Kendall's tau = -0.12 (Dotted Lines), Kendall's tau = -0.33 (Dot-dash Lines), and the True Association Parameter: Kendall's tau = 0.58 (Solid Lines); and the True Coefficients β0(τ) (Bold Solid Lines).
2.4 Point Estimates of Regression Coefficients for Time to the Primary Endpoint (Ischemic Stroke, Brain Hemorrhage, or Death) under the Clayton Copula with Kendall's tau = 0, 0.2, 0.4, 0.6 and 0.8.
2.5 Estimated Quantiles of Time to the Primary Endpoint (Ischemic Stroke, Brain Hemorrhage, or Death) under the Clayton Copula with Kendall's tau = 0, 0.2, 0.4, 0.6 and 0.8, with the Stenosis Percentage Fixed at Its Mean (63.7%).
2.6 Estimated Quantiles of Time to Early Termination of Study Medication under the Clayton Copula with Kendall's tau = 0, 0.2, 0.4, 0.6 and 0.8, with the Stenosis Percentage Fixed at Its Mean (63.7%).
3.1 Comparison among True Coefficients (Bold Solid Lines), Mean Estimated Coefficients from the Proposed Method (Solid Lines), and Mean Estimated Coefficients from the Naive Approach (Dotted Lines).
3.2 Coefficient Estimates (Bold Solid Lines) and 95% Pointwise Confidence Intervals (Bold Dashed Lines) from the Proposed Method, in Contrast with Coefficient Estimates (Dot-Dash Lines) and 95% Pointwise Confidence Intervals (Dotted Lines) from the Naive Method.
4.1 Coefficient Estimates (Bold Solid Lines) and 95% Pointwise Confidence Intervals (Shaded Areas) from the Proposed Method, in Contrast with Naive Coefficient Estimates Ignoring Left Censored Observations (Dotted Lines).
4.2 Estimated Quantiles of Age at First PA Infection (Bold Solid Lines) and 95% Pointwise Confidence Intervals (Shaded Areas) from the Proposed Method, in Contrast with Estimated Quantiles from Chang and Yang (1987) (Dot-Dash Lines).
List of Tables
2.1 Simulation Results on Parameter Estimation under the Clayton Copula. Bias: biases; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations; Cov95: coverage rates of 95% Wald confidence intervals.
2.2 Simulation Results on Parameter Estimation under the Frank Copula. Bias: biases; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations; Cov95: coverage rates of 95% Wald confidence intervals.
2.3 Simulation Results on Parameter Estimation when the Copula Function is Misspecified. Bias: biases; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations; Cov95: coverage rates of 95% Wald confidence intervals.
2.4 The WASID Example: Standard Errors under the Clayton Copula with Kendall's tau = 0, 0.2, 0.4, 0.6 and 0.8. β(1), β(2) and β(3): estimated coefficients of Treatment, Diabetes and Stenosis Percentage on T, respectively.
3.1 Simulation Results under AFT Models. Bias: absolute biases; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations; Cov95: coverage rates of 95% Wald confidence intervals.
3.2 Simulation Results on Hypothesis Testing and Second-Stage Inference under AFT Models. ERR: empirical rejection rates; AvgEst: estimated average effects; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations.
3.3 Simulation Results under Log-Linear Models with Heteroscedastic Errors. Bias: absolute biases; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations; Cov95: coverage rates of 95% Wald confidence intervals.
3.4 Simulation Results on Hypothesis Testing and Second-Stage Inference under Log-Linear Models with Heteroscedastic Errors. ERR: empirical rejection rates; AvgEst: estimated average effects; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations.
3.5 Simulation Results on Parameter Estimation for Doubly Censored Data with Left Truncation. Bias: absolute biases; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations; Cov95: coverage rates of 95% Wald confidence intervals.
3.6 Simulation Results on Hypothesis Testing and Second-Stage Inference for Doubly Censored Data with Left Truncation. ERR: empirical rejection rates; AvgEst: estimated average effects; AvgSD: average estimated resampling-based standard deviations; EmpSD: empirical standard deviations.
4.1 Simulation Results under Models with Constant Covariate Effects. Bias: absolute biases; EmpSD: empirical standard deviations; AvgSD: average estimated resampling-based standard deviations; Cov95: coverage rates of 95% Wald confidence intervals.
4.2 Simulation Results under Model with Varying Covariate Effects. Bias: absolute biases; EmpSD: empirical standard deviations; AvgSD: average estimated resampling-based standard deviations; Cov95: coverage rates of 95% Wald confidence intervals.
Primary PDF
Title | Date Uploaded |
---|---|
Quantile Regression for Complex Censored Data | 2018-08-28 10:12:36 -0400 |