Flexible Association Analysis and Prediction Methods With Biomedical Data

Restricted; Files Only

Qi Yu (Summer 2025)

Permanent URL: https://etd.library.emory.edu/concern/etds/vt150k70h?locale=en
Published

Abstract

Continuous glucose monitoring (CGM), which measures interstitial glucose every 1-5 minutes, is increasingly used in US hospitals to manage diabetes. Time in Range (TIR), defined as the percentage of time that glucose readings fall within a target glycemic range (e.g., 70-180 mg/dL) over a specified period (e.g., 7 days), is a key CGM-based metric for assessing glycemic control. However, routine analyses of TIR ignore the missing-data features inherent in inpatient CGM studies, such as insufficient CGM sampling due to short hospital stays or random device errors. Failing to account for such missing data appropriately can lead to inaccurate estimation and prediction of TIR.
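To make the TIR definition concrete, the following is a minimal sketch of how TIR might be computed from a single patient's readings. It is an illustration only and makes several assumptions not stated in the abstract: readings are equally spaced (so the share of readings equals the share of time), the range bounds are inclusive, and missing readings are simply skipped. The function names `time_in_range` and `observed_fraction` are hypothetical.

```python
from typing import Optional, Sequence


def time_in_range(
    readings: Sequence[Optional[float]],
    low: float = 70.0,
    high: float = 180.0,
) -> Optional[float]:
    """Percent of observed readings with low <= glucose <= high (mg/dL).

    Assumes equally spaced readings and inclusive bounds (both assumptions).
    Missing readings (None) are skipped; returns None if nothing was observed.
    """
    observed = [g for g in readings if g is not None]
    if not observed:
        return None
    in_range = sum(1 for g in observed if low <= g <= high)
    return 100.0 * in_range / len(observed)


def observed_fraction(readings: Sequence[Optional[float]]) -> float:
    """Share of scheduled readings that were actually recorded."""
    if not readings:
        return 0.0
    return sum(1 for g in readings if g is not None) / len(readings)


# Made-up example: eight scheduled readings, two lost to device errors.
example = [95, 140, None, 210, 65, 180, None, 120]
tir = time_in_range(example)
coverage = observed_fraction(example)
```

Here four of the six observed readings fall in range, so the naive TIR is about 66.7% with 75% of scheduled readings observed. Skipping missing readings in this way is exactly the kind of routine analysis the abstract cautions against when missingness is substantial.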

This dissertation addresses the problems of estimating and predicting TIR from inpatient CGM through three projects. In the first project, we propose a rigorous statistical framework for estimating mean TIR when CGM glucose trajectories are not completely observed, as occurs in many inpatient studies. The proposed method effectively mitigates the biases caused by intermittent missingness (due to random device errors) or monotone missingness (due to early hospital discharge), yielding unbiased and consistent estimates of mean TIR and valid inference for comparing mean TIR among groups. Simulation studies demonstrate satisfactory finite-sample performance of the proposed method across realistic scenarios.
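A toy example can show why incomplete trajectories matter for mean TIR. The sketch below compares three simple summaries on made-up data in which two of three patients have shortened follow-up. None of these is the estimator proposed in the dissertation, whose details are not given in the abstract; the function names and the data are hypothetical and serve only to show that simple summaries can disagree when follow-up lengths differ.

```python
from typing import List, Optional

# 1 = reading in range, 0 = out of range, None = not observed
Trajectory = List[Optional[int]]


def per_patient_mean_tir(data: List[Trajectory]) -> float:
    """Average of each patient's own TIR (%), from that patient's observed readings."""
    tirs = []
    for traj in data:
        obs = [x for x in traj if x is not None]
        if obs:
            tirs.append(100.0 * sum(obs) / len(obs))
    return sum(tirs) / len(tirs)


def pooled_tir(data: List[Trajectory]) -> float:
    """In-range share (%) of all observed readings, pooled across patients."""
    obs = [x for traj in data for x in traj if x is not None]
    return 100.0 * sum(obs) / len(obs)


def pointwise_mean_tir(data: List[Trajectory]) -> float:
    """Average over time points of the in-range share among patients observed then."""
    n_times = max(len(traj) for traj in data)
    props = []
    for t in range(n_times):
        obs = [traj[t] for traj in data if t < len(traj) and traj[t] is not None]
        if obs:
            props.append(sum(obs) / len(obs))
    return 100.0 * sum(props) / len(props)


# Made-up data: patient A is fully observed, B and C leave early.
toy = [
    [1, 1, 1, 1],        # A: full stay, always in range
    [0, 0, None, None],  # B: discharged after two time points
    [1, 0, 1, None],     # C: discharged after three time points
]

naive_by_patient = per_patient_mean_tir(toy)  # about 55.6
naive_pooled = pooled_tir(toy)                # about 66.7
naive_pointwise = pointwise_mean_tir(toy)     # 75.0
```

The three summaries give roughly 55.6%, 66.7%, and 75.0% on the same data, which illustrates why a principled treatment of intermittent and monotone missingness is needed before mean TIR can be compared across groups.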

In the second project, we develop a random forest (RF) based procedure to provide individualized TIR prediction. The proposed procedure is well equipped to handle missing data while accommodating complex relationships between TIR and individual characteristics (e.g., clinical features, treatments). The new procedure also supports dynamic prediction, which allows TIR predictions to be updated according to recent CGM history. Results from simulations and real data show improved prediction accuracy and variable selection performance of our RF procedure relative to several benchmark approaches.

The third project develops an R package, including a web-based Shiny application, that integrates and implements the estimation and prediction tools developed in the first two projects. The application provides tools for data processing, mean TIR estimation, group comparisons, and individualized TIR prediction. It also offers intuitive visualizations and summary reports of inpatient glucose profiles. The interface is user-friendly and designed for researchers and clinicians, requiring no coding expertise.

In summary, the three projects offer a comprehensive set of practical statistical tools for TIR analysis based on inpatient CGM. The methods are also broadly applicable to other clinical monitoring systems, such as real-time cardiac or temperature monitoring.

Table of Contents

1 Introduction

1.1 Background

1.2 Literature Review

1.2.1 Existing Methods for Mean TIR Estimation with Incomplete CGM Data

1.2.2 Existing Methods for TIR Prediction with CGM Data

1.2.3 Software and Reproducibility for Inpatient CGM Analysis

1.3 Dissertation Outline

2 Estimation of Mean Time in Range with Incomplete Inpatient CGM Data

2.1 Introduction

2.2 The Proposed Method

2.2.1 Data and notation

2.2.2 Estimation of Mean TIR

2.3 Asymptotic Theory and Inference

2.4 Simulation Studies

2.4.1 Data generation and simulation set-ups

2.4.2 Simulation results

2.4.3 Simulation studies under different configurations

2.4.4 Simulation studies on sensitivity analysis

2.5 A Real Data Application

2.6 Remarks

2.7 Appendix A: Proof of Theorems 1-2

2.7.1 Proof of Theorem 1: Asymptotic Properties of $\hat{p}_G(t)$

2.7.2 Proof of Theorem 2: Asymptotic Properties of $\hat{\mu}_W$

2.8 Appendix B: Additional Figures and Tables

2.8.1 Section B.1: Simulation results for the case with non-informative $C_i$'s

2.8.2 Section B.2: Details on additional simulation studies

2.8.3 Section B.3: Additional results for the real data application

3 Individualized Time in Range Prediction Using Random Forests for Inpatient CGM Data

3.1 Introduction

3.2 The Proposed Method

3.2.1 Data and Notation

3.2.2 Unbiased Estimation of Mean TIR

3.2.3 Random Forest Framework

3.2.4 Proposed Splitting Rule

3.2.5 Random Forest TIR Prediction

3.2.6 Hyperparameter Tuning

3.2.7 Dynamic Prediction of Next-Day TIR

3.2.8 Variable Selection and Importance

3.2.9 Prediction Error Evaluation

3.3 Simulation Study

3.3.1 Data Generation and Simulation Setup

3.3.2 Alternative Predictor Structures

3.3.3 Parameter Tuning and Benchmark Methods

3.3.4 Model Evaluation in Simulation Studies

3.3.5 Simulation on Dynamic Prediction

3.3.6 Simulation Results

3.4 Real Data Application

3.5 Remarks

3.6 Appendix

3.6.1 Appendix 1: Simulation Settings

3.6.2 Appendix 2: MSE Results Across Sample Sizes

3.6.3 Appendix 3: Variable Selection Accuracy

3.6.4 Appendix 4: Dynamic Prediction Results

4 InpatCGM: An R Package and Shiny Application for Inpatient CGM Data Analysis

4.1 Introduction

4.2 Installation and Access

4.3 User Interface Overview

4.4 Primary Functions and Documentation

4.5 Reproducibility and Deployment

4.6 Conclusion

5 Conclusion and Future Directions

5.1 Summary of Contributions

5.2 Limitations

5.3 Future Directions

5.4 Closing Remarks

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English