Robust Latent Class Analysis for Longitudinal Data

Hart, Kari Rebecca (2011)

Permanent URL: https://etd.library.emory.edu/concern/etds/gt54kn46p?locale=zh
Published

Abstract

Latent class analysis is a likelihood-based approach that is designed to elucidate the structure underlying population heterogeneity. More specifically, in latent class analysis, researchers study the patterns of interrelationship among a set of observed feature variables in order to understand and characterize underlying population subtypes or classes. While, typically, these underlying classes cannot be observed directly, they often have meaningful physical interpretations. As such, latent class analysis is useful in many health applications, where it is a powerful statistical tool for detecting disease subtypes and diagnostic subcategories.

Existing latent class methods do not offer a robust and efficient approach applicable to longitudinal data. Most existing methods for latent class analysis apply only to cross-sectional data, while likelihood-based extensions for longitudinal data tend to be computationally intensive and sensitive to modeling assumptions. Thus, we propose a novel robust artificial-likelihood-based approach to longitudinal latent class analysis. In particular, we consider a finite mixture of latent-class-specific generalized estimating equations in which the class mixing proportions can be influenced by a set of covariates. The proposed model is fit under the assumption that the number of latent classes is fixed and known. However, since the number of classes is typically not known a priori, we explore novel model diagnostics for assessing the number of latent classes. The diagnostics rely on longitudinal extensions of information criteria, which account for how well the model fits the data, model complexity, and class membership uncertainty.
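As a rough illustration of how covariates can influence the class mixing proportions, the sketch below uses a multinomial logistic (softmax) link and then combines the proportions with class-specific log quasi-likelihood contributions to obtain membership probabilities. The link function, the function names, and all numeric values are assumptions made for illustration. The abstract does not specify them, and this is not the dissertation's estimation procedure.

```python
import numpy as np

def mixing_proportions(X, gamma):
    """Covariate-dependent class proportions under an assumed softmax link.

    X     : (n, p) covariate matrix, including an intercept column
    gamma : (p, K-1) coefficients; the last class is the reference class
    """
    X = np.asarray(X, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    # Linear predictor for K-1 classes, plus a zero column for the reference class
    eta = np.hstack([X @ gamma, np.zeros((X.shape[0], 1))])
    eta -= eta.max(axis=1, keepdims=True)  # stabilize the exponentials
    w = np.exp(eta)
    return w / w.sum(axis=1, keepdims=True)

def posterior_membership(pi, class_loglik):
    """Membership probabilities from proportions and class-specific log contributions.

    pi           : (n, K) mixing proportions
    class_loglik : (n, K) hypothetical per-subject log (quasi-)likelihood values
    """
    log_post = np.log(pi) + np.asarray(class_loglik, dtype=float)
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def membership_entropy(post):
    """Total entropy of the membership probabilities (larger means less certain)."""
    return -np.sum(post * np.log(np.clip(post, 1e-300, None)))

# Hypothetical example: intercept plus a binary risk-factor indicator, two classes
X = np.array([[1, 0],
              [1, 1]])
gamma = np.array([[0.0],
                  [1.0]])
pi = mixing_proportions(X, gamma)

# Made-up class-specific log contributions for the two subjects
class_loglik = np.array([[-1.0, -3.0],
                         [-2.0, -2.0]])
post = posterior_membership(pi, class_loglik)
print(pi)
print(post)
print(membership_entropy(post))
```

An entropy term of this kind is one common way to quantify class membership uncertainty in cross-sectional mixture diagnostics. Whether the proposed longitudinal criteria use this exact form is not stated in the abstract.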

A major application of this research is in modeling latent trajectories based on the clinical presentation of diseases. In this research, we applied the proposed methods to a longitudinal data set from the National Alzheimer's Coordinating Center comprising patients with a baseline consensus diagnosis of mild cognitive impairment (MCI). The proposed methods were used to statistically validate the presence of MCI subtypes and to model the progression of MCI within each subtype over time. Cognitive, functional, and neuropsychiatric assessments were considered as feature variables involved in the conceptualization of MCI subtypes, while an indicator of cerebrovascular disease was incorporated as a risk factor for MCI subtype membership.

Table of Contents

1 Introduction
1.1 Overview
1.2 Motivating Example
1.3 Limitations of Existing Methodology
1.4 Outline and Objectives


2 Literature Review
2.1 Cross-Sectional Finite Mixture Models
2.1.1 Overview
2.1.2 Maximum Likelihood Estimation
2.1.3 Model Identifiability and Boundary Solutions
2.1.4 Bayesian Estimation of Finite Mixture Models
2.1.5 Assessing the Number of Components: Information Criteria
2.1.6 Assessing the Number of Components: Hypothesis Testing
2.2 Artificial Likelihood
2.2.1 Overview
2.2.2 Quasi-likelihood and Extended Quasi-likelihood
2.2.3 Generalized Estimating Equations (GEEs)
2.2.4 Projection-Based Approach
2.2.5 Empirical Likelihood Approach
2.2.6 Quadratic Inference Function
2.3 Model Selection Diagnostics for Longitudinal Data
2.3.1 Overview
2.3.2 Quasi-Likelihood Under the Independence Model Criterion (QIC)
2.3.3 Empirical Information Criterion (EIC)
2.3.4 Bayesian Information Quadratic Inference Function (BIQIF)
2.3.5 Expected Predictive Bias (EPB)
2.4 Generating Correlated Discrete Data
2.4.1 Correlated Count Data
2.4.2 Correlated Binary Data


3 A Latent Trajectory Model for Longitudinal Data
3.1 Overview
3.2 The Proposed Latent Trajectory Model
3.3 Asymptotic Standard Error
3.4 A Simulation Study to Assess the Performance of the Proposed Latent Trajectory Model
3.4.1 Identifying the Mean Structure of Normally Distributed Feature Variables
3.4.2 Identifying the Intercept and Slope of Normal and Discrete Feature Variables
3.5 Discussion


4 Diagnostics for Latent Trajectory Models
4.1 Overview
4.2 Assessing the Number of Components in a Finite Mixture of Generalized Estimating Equations
4.2.1 Cross-sectional Background
4.2.2 Mixture Classification Quasi-Likelihood Approach
4.2.3 A Cross-Validation Approach to Mixture Classification Quasi-Likelihood
4.3 Simulation Studies
4.3.1 Normally Distributed Feature Variables with Zero Slope
4.3.2 Discrete Feature Variables with Non-zero Slope
4.4 Discussion


5 Identifying Subtypes of Mild Cognitive Impairment via a Latent Trajectory Model
5.1 Overview
5.2 National Alzheimer's Coordinating Center - Uniform Data Set
5.3 A Latent Trajectory Model for Mild Cognitive Impairment
5.4 Discussion


6 Summary and Future Research
6.1 Summary
6.2 Future Research
6.2.1 Empirical Likelihood
6.2.2 Model Formulation
6.2.3 Model Diagnostics
6.2.4 Improvements in Computational Efficiency and Numerical Issues
6.2.5 Local Dependence


Bibliography

List of Figures
5.1 Latent class trajectories associated with cognitive, functional, and neuropsychiatric assessments for stable and declining MCI subtypes based on 2,348 MCI patients from the uniform data set

List of Tables
2.1 Quasi-likelihood for a single observation y_i associated with some simple variance functions
3.1 Class-specific intercepts of five normally distributed feature variables simulated under an AR(1) correlation structure with a slope of 0, a correlation coefficient of 0.3, and a standard deviation of 5
3.2 Summary of simulation results for parameter estimates generated for five normally distributed feature variables with equal probabilities of class membership and with unequal probabilities of class membership
3.3 Detailed simulation results for parameter estimates generated for five normally distributed feature variables with equal probabilities of class membership between two latent classes
3.4 Detailed simulation results for parameter estimates generated for five normally distributed feature variables with unequal probabilities of class membership between two latent classes
3.5 Class-specific intercepts and slopes of six feature variables simulated under an AR(1) correlation structure with a correlation coefficient of 0.3
3.6 Summary of simulation results for parameter estimates generated for six feature variables with equal probabilities of class membership and with unequal probabilities of class membership
3.7 Detailed simulation results for parameter estimates generated for six feature variables with equal probabilities of class membership between two latent classes
3.8 Detailed simulation results for parameter estimates generated for six feature variables with unequal probabilities of class membership between two latent classes
4.1 Simulation results for selecting the appropriate number of latent classes based on normal data generated under the assumption of two latent classes with equal mixing proportions
4.2 Simulation results for selecting the appropriate number of latent classes based on normal data generated under the assumption of two latent classes with unequal mixing proportions
4.3 Intercepts of five normally distributed feature variables simulated under an AR(1) correlation structure with a slope of 0, a correlation coefficient of 0.3, and a standard deviation of 5
4.4 Simulation results for selecting the appropriate number of latent classes based on normal data generated under the assumption of one latent class
4.5 Simulation results for selecting the appropriate number of latent classes based on discrete and normal data generated under the assumption of two latent classes with equal mixing proportions
4.6 Simulation results for selecting the appropriate number of latent classes based on discrete and normal data generated under the assumption of two latent classes with unequal mixing proportions
4.7 Intercepts and slopes of six feature variables simulated under an AR(1) correlation structure with a correlation coefficient of 0.3
4.8 Simulation results for selecting the appropriate number of latent classes based on discrete and normal data generated under the assumption of one latent class
5.1 Baseline demographic and clinical characteristics of 2,348 MCI participants from the uniform data set
5.2 Parameter estimates associated with cognitive, functional, and neuropsychiatric assessments for the two-class latent trajectory model based on 2,348 MCI patients from the uniform data set

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English