Latent Class Methods for Complex Chronic Disease Data Open Access
Fei, Teng (Summer 2021)
Abstract
Latent class analysis (LCA) is a powerful yet intuitive data-driven tool for characterizing the heterogeneity of chronic disease phenotypes. Motivated by research questions on neurodegenerative disease, we develop novel latent class methods in this dissertation that aim to overcome limitations of existing methods, such as estimation bias, restrictive parametric model assumptions, and expensive computation. We apply our methods to the Uniform Data Set (UDS) for a cohort with mild cognitive impairment (MCI).
In the first topic, we propose a novel structural time-dependent competing risks model, formulated to assess the association between latent classes of baseline cognitive performance in MCI patients and their subsequent neuropathological features. We develop a two-step estimation procedure that circumvents latent class membership assignment and rigorously accounts for the uncertainty in classifying individuals into latent classes. The new method also properly addresses complications of competing risks outcomes, such as censoring and missing failure types. Our application to the UDS uncovers a detailed picture of the neuropathological relevance of the baseline MCI subgroups.
Next, we develop a semi-parametric LCA framework with a proportional hazards submodel to investigate the heterogeneity of baseline patient characteristics and its implications for survival. We use the non-parametric maximum likelihood estimator (NPMLE) to derive the estimation procedure and asymptotic theory, which addresses the considerable complications caused by an infinite-dimensional baseline hazard component in the finite mixture framework. The framework also flexibly accommodates class-specific covariate effects on both class membership and hazard. Applying the method to the UDS data reveals MCI subgroups in which baseline factors have distinctive effects on class-specific survival, and further helps to improve the prediction of survival based on baseline covariates.
In the third topic, we study a finite mixture framework for joint longitudinal and survival data, which incorporates semi-parametric generalized estimating equation (GEE) and proportional hazards submodels. Critically, we account for the within-class correlation between longitudinal trajectories and time-to-event by treating longitudinal outcomes as time-dependent internal covariates in the survival submodel. We derive an unbiased estimator that properly addresses challenging data characteristics, including time-dependent internal covariates and informative censoring of longitudinal observations due to a terminal event. Our application to the UDS data identifies multiple latent MCI subgroups with distinguishable neurodegeneration trajectories and survival probability curves.
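For readers unfamiliar with the finite mixture structure that underlies all three topics, the following is a minimal sketch of a standard latent class model for binary indicators, fit with a textbook EM algorithm. It is not one of the dissertation's methods: it has no covariates, no survival or longitudinal submodel, and no handling of censoring. The function name `fit_lca`, its parameters, and the simulated data are assumptions made only for illustration.

```python
import numpy as np


def fit_lca(Y, n_classes, n_iter=200, seed=0):
    """Textbook EM for a latent class model with binary indicators.

    Y is an (n, J) array of 0/1 responses. Returns the class proportions
    pi (K,), the class-specific item probabilities theta (K, J), and the
    posterior membership probabilities from the final E-step (n, K).
    Illustrative sketch only, not the dissertation's estimator.
    """
    rng = np.random.default_rng(seed)
    n, J = Y.shape
    pi = np.full(n_classes, 1.0 / n_classes)
    # Random starting values break the symmetry between classes.
    theta = rng.uniform(0.25, 0.75, size=(n_classes, J))

    for _ in range(n_iter):
        # E-step: posterior probability of each class given the responses,
        # computed on the log scale for numerical stability.
        log_joint = (
            np.log(pi)[None, :]
            + Y @ np.log(theta).T
            + (1.0 - Y) @ np.log1p(-theta).T
        )
        log_joint -= log_joint.max(axis=1, keepdims=True)
        post = np.exp(log_joint)
        post /= post.sum(axis=1, keepdims=True)

        # M-step: posterior-weighted proportions and item probabilities.
        nk = post.sum(axis=0)
        pi = np.clip(nk / n, 1e-12, None)
        pi = pi / pi.sum()
        theta = np.clip((post.T @ Y) / np.maximum(nk, 1e-12)[:, None],
                        1e-6, 1.0 - 1e-6)

    return pi, theta, post


# Simulated example (hypothetical data): two well-separated classes,
# six binary indicators, 500 subjects.
rng = np.random.default_rng(1)
true_theta = np.array([[0.9] * 6, [0.1] * 6])
z = rng.integers(0, 2, size=500)
Y = (rng.uniform(size=(500, 6)) < true_theta[z]).astype(float)

pi, theta, post = fit_lca(Y, n_classes=2)
```

The posterior matrix `post` gives soft class memberships. Assigning each subject to the most probable class and then analyzing outcomes by assigned class ignores this classification uncertainty, which is the kind of issue the first topic's two-step procedure is designed to address.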
Table of Contents
1 Introduction
1.1 Overview
1.2 Motivating example
1.3 Outline
2 A Time-Dependent Structural Model Between Latent Classes and Competing Risks Outcomes
2.1 Introduction
2.2 Data, notation and models
2.2.1 Latent class model
2.2.2 Structural competing risks model
2.2.3 Missing failure type model
2.3 Estimation and inference
2.3.1 Estimation for the latent class model
2.3.2 Estimation for the model for missing failure types
2.3.3 Estimation for the structural competing risks model
2.3.4 Asymptotic properties of the proposed estimator
2.3.5 Inference procedures
2.4 Simulations
2.4.1 Data generation and analysis procedures
2.4.2 Simulation scenarios
2.4.3 Convergence of algorithm
2.4.4 Simulation results
2.5 An application to the MCI data from UDS
2.5.1 Latent class model and missing failure type model
2.5.2 Structural competing risks model
2.6 Discussion
2.7 Appendices
2.7.1 Notations
2.7.2 Proof of Equation (2.3.2)
2.7.3 Proof of Equation (2.3.4)
2.7.4 Proof of Theorem 2.3.1
2.7.5 Proof of Theorem 2.3.2
2.7.6 Further simulation about selecting the number of latent classes
2.7.7 Additional tables for simulation results
2.7.8 Simulation under severe overlapping plus severely imbalanced class proportion
2.7.9 Discussions about the independent censoring assumption
3 Latent Class Analysis for Time-to-event Data Based on Semi-parametric Proportional Hazards Submodel
3.1 Introduction
3.2 Data, notation and models
3.2.1 Data and notations
3.2.2 The assumed models
3.3 Estimation and inference
3.3.1 Observed data likelihood
3.3.2 EM algorithm for point estimation
3.3.3 Asymptotic properties and variance estimation
3.3.4 Selecting the number of latent classes
3.3.5 Assessing the prediction performance
3.4 Simulation study
3.4.1 Estimation of parameters
3.4.2 Determining the number of latent classes
3.4.3 Goodness-of-fit and prediction
3.5 Real data example
3.5.1 Summary statistics of the obtained two latent classes
3.5.2 Parameter estimation and interpretation
3.5.3 Assessment of goodness-of-fit and prediction performances
3.6 Discussion
3.7 Appendices
3.7.1 Proof of Theorem 3.3.1
3.7.2 Proof of Theorem 3.3.2
3.7.3 Analytical variance estimator
3.7.4 Additional tables and figures for simulation results
4 Semi-parametric Latent Class Analysis for Joint Longitudinal and Survival Data
4.1 Introduction
4.2 Data, notation and models
4.2.1 Latent class probability submodel
4.2.2 Class-specific generalized estimating equation submodel
4.2.3 Class-specific Cox regression submodel
4.3 Estimation
4.3.1 Latent class probability submodel
4.3.2 Class-specific Cox regression submodel
4.3.3 Class-specific GEE submodel
4.3.4 Posterior class membership probability
4.3.5 Algorithm
4.4 Selecting the number of latent classes
4.5 Simulation study
4.5.1 Data generation procedure
4.5.2 Simulation scenarios
4.5.3 Point estimation
4.5.4 Selecting the number of latent classes
4.6 Real data application
4.7 Discussion
4.8 Appendices
4.8.1 Proof of Equation (4.3.7)
Primary PDF
Title | Date Uploaded |
---|---|
Latent Class Methods for Complex Chronic Disease Data | 2021-07-20 14:22:47 -0400 |