Predicting Time-to-Event and Clinical Outcomes from High-Dimensional Unstructured Data
Mobadersany, Pooya (Spring 2021)
Abstract
This dissertation addresses challenges in learning to predict time-to-event outcomes, such as survival and treatment response, from high-dimensional data, including the whole slide images and genomic profiles produced in modern pathology labs. Learning from these data requires integrating disparate data types and attending to important signals within the vast amounts of irrelevant data present in each sample. Furthermore, clinical translation of machine learning models for prognostication requires communicating the degree and types of uncertainty to the clinical end users who will rely on inferences from these models.
This dissertation addresses each of these challenges. To validate our data fusion technique, we selected cancer histology data because it reflects underlying molecular processes and disease progression and contains rich phenotypic information predictive of patient outcomes. We present a computational approach for learning patient outcomes from digital pathology images that uses deep learning to combine the power of adaptive machine learning algorithms with survival models. We illustrate how these survival convolutional neural networks (SCNNs) integrate information from both histology images and genomic biomarkers into a single unified framework to predict time-to-event outcomes, and we show prediction accuracy that surpasses the current clinical paradigm for predicting the overall survival of patients diagnosed with glioma. Next, to handle the volume of data and the heterogeneity within histology images, we developed GestAltNet, which emulates human attention to high-yield areas and aggregation across regions. By incorporating the human-like behaviors of attention and gestalt formation across massive whole slide images, GestAltNet points toward a future of genuinely whole slide digital pathology. We used GestAltNet to estimate gestational age from whole slide images of placental tissue and compared it with networks lacking attention and aggregation capabilities. To address the challenge of representing uncertainty during inference, we developed a Bayesian survival neural network that captures both aleatoric and epistemic uncertainty when predicting clinical outcomes. These networks represent a next generation of machine learning models for predicting time-to-event outcomes, in which the degree and source of uncertainty are communicated to clinical end users.
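The abstract does not state which training objective couples the network to the survival model. One common choice for this kind of pairing is the negative Cox partial log-likelihood, and the sketch below assumes it purely for illustration. The function name `cox_neg_partial_log_likelihood` and its arguments are hypothetical and are not taken from the dissertation. In a real model the risk scores would be the network's outputs; here they are plain Python floats.

```python
import math


def cox_neg_partial_log_likelihood(risk_scores, times, events):
    """Negative Cox partial log-likelihood, averaged over observed events.

    Assumption: this is an illustrative objective, not the dissertation's code.
    risk_scores: predicted log-risk per patient (higher means higher hazard)
    times:       observed follow-up time per patient
    events:      1 if the event was observed, 0 if the patient was censored
    """
    total = 0.0
    n_events = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            # Censored patients contribute only through others' risk sets.
            continue
        # Risk set: everyone still under observation at time t_i.
        at_risk = [risk_scores[j] for j, t_j in enumerate(times) if t_j >= t_i]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(at_risk)
        log_sum = m + math.log(sum(math.exp(r - m) for r in at_risk))
        total += risk_scores[i] - log_sum
        n_events += 1
    return -total / max(n_events, 1)


# Three patients, all with observed events, ordered by follow-up time.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
concordant = cox_neg_partial_log_likelihood([2.0, 1.0, 0.0], times, events)
discordant = cox_neg_partial_log_likelihood([0.0, 1.0, 2.0], times, events)
```

Under this objective, assigning the highest risk to the patient with the shortest survival (`concordant`) yields a lower loss than the reversed ordering (`discordant`), which is the behavior a survival network would be trained toward.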
Table of Contents
1 Introduction
1.1 Motivation
1.2 Research Contributions
1.2.1 Survival Prediction based on Convolutional Neural Network (Chapter 3)
1.2.2 Architectures for Aggregate Learning (Chapter 4)
1.2.3 Bayesian Neural Networks for Survival Prediction (Chapter 5)
2 Background & Related Work
2.1 Survival Analysis
2.2 Survival models
2.2.1 Parametric survival models
2.2.2 Non-parametric survival models
2.2.3 Semi-parametric survival models
2.3 Survival Convolutional Neural Networks
2.4 Learning from Sets
3 Predicting Cancer Outcomes From Histology and Genomics Using Convolutional Networks
3.1 Abstract
3.2 Introduction
3.3 Learning patient outcomes with deep survival convolutional neural networks
3.4 Methods
3.4.1 Data and image curation
3.4.2 Network architecture and training procedures
3.4.3 Training resampling
3.4.4 Testing resampling and model averaging
3.4.5 Validation procedures
3.4.6 Statistical analyses
3.4.7 Hardware and software
3.5 Assessing the prognostic accuracy of SCNN
3.6 SCNN predictions correlate with genomic subtypes and manual histologic grade
3.7 Improving prognostic accuracy by integrating genomic biomarkers
3.8 Visualizing prognosis with SCNN heatmaps
3.9 Discussion
3.10 Limitations and future work
4 Architectures for Aggregate Learning
4.1 Abstract
4.2 Introduction
4.2.1 Changes over time
4.2.2 The placenta and digital pathology
4.3 Materials, subjects, and methods
4.3.1 Patients and materials
4.3.2 Baseline model
4.3.3 GestAltNet - input & glimpsing mechanism
4.3.4 GestAltNet - pipeline & attention and aggregation
4.3.5 Evaluation metrics
4.3.6 Attention and whole slide estimation of GA
4.4 Results
4.4.1 Interobserver variability
4.4.2 Deep learning model performance
4.4.3 Attention and estimation of GA across whole slides
4.5 Discussion
4.6 Limitations and future work
4.7 Conclusion
5 Bayesian Survival Neural Networks
5.1 Abstract
5.2 Introduction
5.2.1 Uncertainty analysis
5.3 Methods
5.3.1 Bayesian neural network
5.3.2 Quantifying uncertainties in Bayesian survival neural networks
5.3.3 Generating synthetic survival data
5.4 Results
5.4.1 Synthetic data
5.4.2 Survival prediction for glioma patients
5.5 Conclusion and future work
Bibliography
About this Dissertation
School | |
---|---|
Department | |
Degree | |
Submission | |
Language | |
Research Field | |
Keyword | |
Committee Chair / Thesis Advisor | |
Committee Members | |
Primary PDF
Title | Date Uploaded |
---|---|
Predicting Time-to-Event and Clinical Outcomes from High-Dimensional Unstructured Data | 2021-04-19 14:14:30 -0400 |