Robust Latent Class Analysis for Longitudinal Data
Hart, Kari Rebecca (2011)
Abstract
Latent class analysis is a likelihood-based approach that is designed to elucidate the structure underlying population heterogeneity. More specifically, in latent class analysis, researchers study the patterns of interrelationship among a set of observed feature variables in order to understand and characterize underlying population subtypes or classes. While, typically, these underlying classes cannot be observed directly, they often have meaningful physical interpretations. As such, latent class analysis is useful in many health applications, where it is a powerful statistical tool for detecting disease subtypes and diagnostic subcategories.
Existing latent class methods do not offer a robust and efficient approach applicable to longitudinal data. Most existing methods for latent class analysis apply only to cross-sectional data, while likelihood-based extensions for longitudinal data tend to be computationally intensive and sensitive to modeling assumptions. Thus, we propose a novel robust artificial-likelihood-based approach to longitudinal latent class analysis. In particular, we consider a finite mixture of latent-class-specific generalized estimating equations in which the class mixing proportions can be influenced by a set of covariates. The proposed model is fit under the assumption that the number of latent classes is fixed and known. However, since the number of classes is typically not known a priori, we explore novel model diagnostics for assessing the number of latent classes. The diagnostics rely on longitudinal extensions of information criteria, which account for how well the model fits the data, model complexity, and class membership uncertainty.
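As a rough illustration of the model class described above, and not the dissertation's exact formulation, a finite mixture of class-specific estimating equations with covariate-dependent mixing proportions is often written along the following lines. All notation here is assumed for illustration: the symbols $K$, $x_i$, $\gamma_k$, $\beta_k$, $\mu_{ik}$, $D_{ik}$, $V_{ik}$, and $\tau_{ik}$, as well as the multinomial-logit form for the mixing proportions.

```latex
% Assumed notation (not taken from the dissertation):
%   i = 1, ..., n indexes subjects; k = 1, ..., K indexes latent classes (K fixed and known)
%   x_i = covariates that may influence class membership
%   y_i = vector of repeated measurements for subject i
\begin{align}
  % Mixing proportions modeled with a multinomial logit; class K is the reference class
  \pi_k(x_i) &= \frac{\exp(x_i^\top \gamma_k)}{\sum_{j=1}^{K} \exp(x_i^\top \gamma_j)},
  \qquad \gamma_K = 0, \\
  % Class-specific generalized estimating equation, weighted by class membership
  \sum_{i=1}^{n} \tau_{ik}\, D_{ik}^\top V_{ik}^{-1}
  \bigl( y_i - \mu_{ik}(\beta_k) \bigr) &= 0,
  \qquad k = 1, \dots, K.
\end{align}
% D_{ik} = \partial \mu_{ik} / \partial \beta_k^\top is the derivative of the class-k mean,
% V_{ik} is a working covariance matrix (for example, AR(1)),
% \tau_{ik} is a weight reflecting how strongly subject i is assigned to class k.
```

Under this sketch, the covariates enter only through $\pi_k(x_i)$, while each class keeps its own mean parameters $\beta_k$ and working correlation structure.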
A major application of this research is in modeling latent trajectories based on the clinical presentation of diseases. We applied the proposed methods to a longitudinal data set from the National Alzheimer's Coordinating Center comprising patients with a baseline consensus diagnosis of mild cognitive impairment (MCI). The proposed methods were used to statistically validate the presence of MCI subtypes and to model the progression of MCI within each subtype over time. Cognitive, functional, and neuropsychiatric assessments served as the feature variables used to conceptualize MCI subtypes, while an indicator of cerebrovascular disease was incorporated as a risk factor for MCI subtype membership.
Table of Contents
1 Introduction
1.1 Overview
1.2 Motivating Example
1.3 Limitations of Existing Methodology
1.4 Outline and Objectives
2 Literature Review
2.1 Cross-Sectional Finite Mixture Models
2.1.1 Overview
2.1.2 Maximum Likelihood Estimation
2.1.3 Model Identifiability and Boundary Solutions
2.1.4 Bayesian Estimation of Finite Mixture Models
2.1.5 Assessing the Number of Components: Information Criteria
2.1.6 Assessing the Number of Components: Hypothesis Testing
2.2 Artificial Likelihood
2.2.1 Overview
2.2.2 Quasi-likelihood and Extended Quasi-likelihood
2.2.3 Generalized Estimating Equations (GEEs)
2.2.4 Projection-Based Approach
2.2.5 Empirical Likelihood Approach
2.2.6 Quadratic Inference Function
2.3 Model Selection Diagnostics for Longitudinal Data
2.3.1 Overview
2.3.2 Quasi-Likelihood Under the Independence Model Criterion (QIC)
2.3.3 Empirical Information Criterion (EIC)
2.3.4 Bayesian Information Quadratic Inference Function (BIQIF)
2.3.5 Expected Predictive Bias (EBP)
2.4 Generating Correlated Discrete Data
2.4.1 Correlated Count Data
2.4.2 Correlated Binary Data
3 A Latent Trajectory Model for Longitudinal Data
3.1 Overview
3.2 The Proposed Latent Trajectory Model
3.3 Asymptotic Standard Error
3.4 A Simulation Study to Assess the Performance of the Proposed Latent Trajectory Model
3.4.1 Identifying the Mean Structure of Normally Distributed Feature Variables
3.4.2 Identifying the Intercept and Slope of Normal and Discrete Feature Variables
3.5 Discussion
4 Diagnostics for Latent Trajectory Models
4.1 Overview
4.2 Assessing the Number of Components in a Finite Mixture of Generalized Estimating Equations
4.2.1 Cross-sectional Background
4.2.2 Mixture Classification Quasi-Likelihood Approach
4.2.3 A Cross-Validation Approach to Mixture Classification Quasi-Likelihood
4.3 Simulation Studies
4.3.1 Normally Distributed Feature Variables with Zero Slope
4.3.2 Discrete Feature Variables with Non-zero Slope
4.4 Discussion
5 Identifying Subtypes of Mild Cognitive Impairment via a Latent Trajectory Model
5.1 Overview
5.2 National Alzheimer's Coordinating Center - Uniform Data Set
5.3 A Latent Trajectory Model for Mild Cognitive Impairment
5.4 Discussion
6 Summary and Future Research
6.1 Summary
6.2 Future Research
6.2.1 Empirical Likelihood
6.2.2 Model Formulation
6.2.3 Model Diagnostics
6.2.4 Improvements in Computational Efficiency and Numerical Issues
6.2.5 Local Dependence
Bibliography
List of Figures
5.1 Latent class trajectories associated with cognitive, functional, and neuropsychiatric assessments for stable and declining MCI subtypes based on 2,348 MCI patients from the uniform data set
List of Tables
2.1 Quasi-likelihood for a single observation y_i associated with some simple variance functions
3.1 Class-specific intercepts of five normally distributed feature variables simulated under an AR(1) correlation structure with a slope of 0, a correlation coefficient of 0.3, and a standard deviation of 5.
3.2 Summary of simulation results for parameter estimates generated for five normally distributed feature variables with equal probabilities of class membership and with unequal probabilities of class membership
3.3 Detailed simulation results for parameter estimates generated for five normally distributed feature variables with equal probabilities of class membership between two latent classes
3.4 Detailed simulation results for parameter estimates generated for five normally distributed feature variables with unequal probabilities of class membership between two latent classes
3.5 Class-specific intercepts and slopes of six feature variables simulated under an AR(1) correlation structure with a correlation coefficient of 0.3.
3.6 Summary of simulation results for parameter estimates generated for six feature variables with equal probabilities of class membership and with unequal probabilities of class membership
3.7 Detailed simulation results for parameter estimates generated for six feature variables with equal probabilities of class membership between two latent classes
3.8 Detailed simulation results for parameter estimates generated for six feature variables with unequal probabilities of class membership between two latent classes
4.1 Simulation results for selecting the appropriate number of latent classes based on normal data generated under the assumption of two latent classes with equal mixing proportions
4.2 Simulation results for selecting the appropriate number of latent classes based on normal data generated under the assumption of two latent classes with unequal mixing proportions
4.3 Intercepts of five normally distributed feature variables simulated under an AR(1) correlation structure with a slope of 0, a correlation coefficient of 0.3, and a standard deviation of 5.
4.4 Simulation results for selecting the appropriate number of latent classes based on normal data generated under the assumption of one latent class
4.5 Simulation results for selecting the appropriate number of latent classes based on discrete and normal data generated under the assumption of two latent classes with equal mixing proportions
4.6 Simulation results for selecting the appropriate number of latent classes based on discrete and normal data generated under the assumption of two latent classes with unequal mixing proportions
4.7 Intercepts and slopes of six feature variables simulated under an AR(1) correlation structure with a correlation coefficient of 0.3.
4.8 Simulation results for selecting the appropriate number of latent classes based on discrete and normal data generated under the assumption of one latent class
5.1 Baseline demographic and clinical characteristics of 2,348 MCI participants from the uniform data set
5.2 Parameter estimates associated with cognitive, functional, and neuropsychiatric assessments for the two-class latent trajectory model based on 2,348 MCI patients from the uniform data set
Primary PDF
Title | Date Uploaded |
---|---|
Robust Latent Class Analysis for Longitudinal Data | 2018-08-28 12:54:34 -0400 |