Statistical Methods for Handling Missing Data in Functional Data Analysis
Zhu, Wanzhe (Spring 2018)
Abstract
Statistical analysis of functional data has drawn increased attention in recent years, yet handling missing data remains a notable obstacle in functional data analysis. This work is motivated by a renal study on the detection of kidney obstruction, in which up to two imaging scans, namely a baseline scan and a scan after furosemide administration, are available for each kidney, resulting in two curves. In some cases, the kidney is judged to be non-obstructed and the patient does not receive furosemide, so the second scan is missing.
Our first objective is to develop a method that imputes the second curve from the first, assuming that the first curve is informative about the missing second curve (Chapter 2). We model the curves for each individual using a set of potential basis functions and posit a sparse latent factor model for the basis coefficients, in which a shrinkage prior is assigned to the loadings to induce basis selection. We employ a Bayesian data augmentation algorithm to simultaneously estimate the model parameters and impute the missing curves. The method is evaluated and compared with existing methods through a simulation study. We illustrate it using data from the renal study, imputing the second curve for kidneys in which it is missing, which can aid the interpretation of kidney obstruction.
In the same setting, with the second curve missing for some kidneys, we consider an analysis of the relationship between functional covariates and a binary outcome. We employ a Bayesian hierarchical model that jointly models the curves, which are measured with error, and the association between the noise-free curves and the binary outcome in the presence of missing data. We consider two approaches to selecting basis functions for modeling the curves and for parameterizing the functional coefficients in the functional generalized linear model used to model the association. In the first approach (Chapter 3), we use cubic B-spline basis functions and use the deviance information criterion to select the number of basis functions.
To overcome the difficulty of selecting basis functions, we alternatively use functional principal component analysis (FPCA) to derive a more parsimonious model within the same framework, selecting the functional principal components that explain a large percentage of the variation in the curves (Chapter 4). We conduct simulation studies to assess the performance of the proposed methods in the presence of missing functional data, and we illustrate the methods with an application to the renal study.
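The component-selection idea in the last paragraph can be illustrated with a minimal sketch. It is not the dissertation's Bayesian procedure: it assumes fully observed curves on a common grid, ignores measurement error and missing second curves, and uses a plain singular value decomposition. The function name `fpca_select`, the 95% threshold, and the simulated curves are all assumptions made for illustration only.

```python
import numpy as np


def fpca_select(curves, threshold=0.95):
    """Plain FPCA on complete curves sampled on a common grid (illustrative only).

    curves: array of shape (n_curves, n_gridpoints).
    threshold: hypothetical cutoff for cumulative proportion of variance explained.
    """
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of vt are the discretized principal component functions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = s**2 / (curves.shape[0] - 1)
    pve = np.cumsum(eigvals) / eigvals.sum()
    # Smallest number of components whose cumulative share reaches the threshold.
    k = min(int(np.searchsorted(pve, threshold)) + 1, len(pve))
    scores = centered @ vt[:k].T
    return mean_curve, vt[:k], scores, pve[:k]


# Simulated curves (an assumption, not the renal data): two smooth modes plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
basis = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
true_scores = rng.normal(size=(100, 2)) * np.array([3.0, 1.0])
curves = true_scores @ basis + rng.normal(scale=0.05, size=(100, 50))

mean_curve, components, scores, pve = fpca_select(curves, threshold=0.95)
print("components kept:", components.shape[0])
print("cumulative variance explained:", np.round(pve, 3))
```

The retained scores would then serve as a low-dimensional summary of each curve; how they enter the outcome model and how missing curves are handled is specific to the dissertation and is not reproduced here.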
Table of Contents
1 Introduction
1.1 Background
1.2 Motivating Data
1.3 Notations
1.4 Literature Review
1.4.1 Functional Data Analysis
1.4.2 Missing Data
1.4.3 Missing Data in Functional Data Analysis
1.5 Statistical Problems
2 Multiple imputation of functional data with application to renal studies
2.1 Introduction
2.2 Methodology
2.2.1 FK
2.2.2 SLF
2.3 Simulation Studies
2.4 Renal Study
2.5 Discussion
3 Handling missing data in generalized functional linear models with application to renal studies
3.1 Introduction
3.2 Methodology
3.2.1 Data Structure
3.2.2 Model
3.2.3 Likelihood
3.2.4 MCMC
3.2.5 Model Selection
3.3 Simulation Study
3.4 Renal Study
3.5 Discussion
4 Handling missing data in generalized functional linear models through functional principal component analysis with application to renal studies
4.1 Introduction
4.2 Methodology
4.2.1 Data Structure and Model
4.3 Simulation Study
4.4 Renal Study
4.5 Discussion
5 Future work
A Appendix for Chapter 3
B Appendix for Chapter 4
Bibliography
Primary PDF
Title | Date Uploaded |
---|---|
Statistical Methods for Handling Missing Data in Functional Data Analysis | 2018-03-09 12:58:27 -0500 |