Temporal Irregular Tensor Factorization and Prediction for Health Data Analysis

Ren, Yifei (Fall 2021)

Permanent URL: https://etd.library.emory.edu/concern/etds/mw22v6615?locale=fr
Published

Abstract

Tensors are a popular algebraic structure in a wide range of applications because of their ability to model multidimensional relationships in data. Regular tensors, whose dimensions are aligned across all modes, have been studied extensively, and various factorization structures have been proposed for them depending on the application. However, regular tensor decomposition cannot handle many real-world cases involving time, because the time dimension is often irregular. Electronic health records (EHRs) are generated and collected across a large number of patients with distinct medical conditions and clinical progress over long periods of time, which results in records that are unaligned along the time dimension. PARAFAC2 has been re-popularized for successfully extracting meaningful medical concepts (phenotypes) from EHRs through irregular tensor factorization. However, further work is needed to overcome the limitations of the current PARAFAC2 model, including its lack of robustness against missing values, lack of modeling of non-linear temporal dependencies, and lack of consideration of downstream tasks. We propose 1) a robust temporal PARAFAC2 for irregular tensor factorization and completion with potentially missing and erroneous values; 2) a generalized, low-rank recurrent neural network (RNN) regularized robust irregular tensor factorization for more accurate temporal modeling, which is flexible enough to choose from a variety of losses to best suit different types of data in practice; 3) a supervised irregular tensor factorization framework with multi-task learning for both phenotype extraction and predictive learning, which enables information sharing between different prediction tasks and further improves downstream prediction performance.
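To make the notion of an irregular tensor concrete, the following is a minimal sketch, not the dissertation's code. It assumes the standard PARAFAC2 form X_k ≈ Q_k H diag(s_k) V^T, in which each patient k has a slice with its own number of visits (rows) but all patients share the feature factor V. The function name, the visit counts, and the random data are illustrative assumptions only.

```python
import numpy as np

def make_parafac2_slices(visit_counts, n_features, rank, seed=0):
    """Generate synthetic irregular slices X_k = Q_k H diag(s_k) V^T.

    Hypothetical helper for illustration; each n_visits must be >= rank.
    """
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((rank, rank))        # shared across all patients
    V = rng.standard_normal((n_features, rank))  # shared feature factor (phenotypes)
    slices, temporal_factors = [], []
    for n_visits in visit_counts:
        # Q_k has orthonormal columns (reduced QR of a random matrix).
        Q, _ = np.linalg.qr(rng.standard_normal((n_visits, rank)))
        s = rng.uniform(0.5, 1.5, size=rank)     # per-patient phenotype weights
        U = Q @ H                                # patient-specific temporal factor
        slices.append(U @ np.diag(s) @ V.T)
        temporal_factors.append(U)
    return slices, temporal_factors, H, V

# Three patients with 5, 9, and 7 visits, 4 clinical features, rank 3.
slices, temporal_factors, H, V = make_parafac2_slices([5, 9, 7], n_features=4, rank=3)
shapes = [X.shape for X in slices]

# PARAFAC2 constraint: U_k^T U_k is the same matrix (H^T H) for every patient.
cross_products_match = all(
    np.allclose(U.T @ U, H.T @ H) for U in temporal_factors
)
```

The slices cannot be stacked into a regular three-way array because their row counts differ, which is the irregularity the abstract refers to. The shared cross-product constraint is what keeps the decomposition identifiable despite the unaligned time dimension.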

Table of Contents

Sampled EHRs data.

Symbols and notations used in chapter 2.

Additional symbols for REPAIR optimization.

Parameters for CMS and MIMIC-III.

Basis number for CMS and MIMIC-III.

Summary of average mortality risk of the higher-risk cluster, lower-risk cluster, and their difference.

Symbols and notations used in chapter 3.

Examples of tensor data types and loss functions.

Feature discretion ranges for the 6 features in PhysioNet sepsis dataset.

Temperature measurements of temperature trajectory groups.

MIMIC-EXTRACT phenotypes discovered by REBAR.

MIMIC-III phenotypes discovered by REBAR.

Symbols and notations used in chapter 4.

Comparison of existing PARAFAC2-based models.

Experiment result of PR-AUC and convergence epochs when m varies.

MIMIC-EXTRACT phenotypes discovered by MULTIPAR.

MIMIC-EXTRACT phenotypes discovered by SinglePAR incorporating in-hospital mortality prediction.

MIMIC-EXTRACT phenotypes discovered by SinglePAR incorporating ICU mortality prediction.

MIMIC-EXTRACT phenotypes discovered by SinglePAR incorporating readmission prediction.

MIMIC-EXTRACT phenotypes discovered by SinglePAR incorporating ventilation prediction.

MIMIC-EXTRACT phenotypes discovered by COPA.

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Degree
Submission
Language
  • English
Research Field
Keyword
Committee Chair / Thesis Advisor
Committee Members
Last modified
