Neural Networks for Cancer Survival Analysis Using High-Dimensional Data
Yousefi, Safoora (Summer 2019)
Abstract
Since the emergence of high-throughput experiments such as next-generation sequencing, the volume of genomic data produced has been increasing exponentially. This data holds the key to accurate prediction of clinical outcomes and to matching patients with the optimal treatment. However, the analysis of genomic data is challenged by its high dimensionality. Many prediction methods face limitations in learning from high-dimensional data generated by these platforms, and rely on experts to hand-select a small number of features for training prediction models. In this thesis, we demonstrate how the latest advances in neural network methods, which have been remarkably successful in general high-dimensional prediction tasks, can be applied to the problem of predicting cancer outcomes. We perform an extensive comparison of deep survival models and other state-of-the-art machine learning methods for survival analysis. We recognize that interpretability is of great importance in adopting neural networks in bioinformatics, and propose a framework for interpreting deep survival models using a risk back-propagation technique that can lead to new understanding of diseases. Finally, we illustrate that deep survival models can successfully transfer information across heterogeneous data sources to improve prognostic accuracy, and describe an adversarial multi-task learning approach that outperforms traditional multi-task learning methods. We provide an open-source software implementation of these frameworks that enables automatic training, evaluation, and interpretation of deep survival models.
Table of Contents
1 Introduction
1.1 Survival Analysis
1.1.1 Censoring
1.1.2 Survival and Hazard Functions
1.1.3 Cox’s Proportional Hazards Model
1.2 Machine Learning and Genomic Data: Challenges
1.2.1 Genomic Data, Dimensionality, and Heterogeneity
1.2.2 Interpretability
1.3 Model Selection and Evaluation
1.3.1 Performance Metrics
1.3.2 Hyper-parameter Optimization
2 Previous Work
2.1 Machine Learning for Survival Analysis
2.2 Deep Learning
2.2.1 Background
2.2.2 Representation Learning
2.2.3 Convolutional Networks
2.2.4 Adversarial Learning
2.2.5 Interpretable Deep Learning
2.3 Multi-task Learning
3 Learning Genomic Representations to Predict Clinical Outcomes in Cancer
3.1 Learning Genomic Representations to Predict Clinical Outcomes in Cancer, ICLR-workshop, 2016
3.2 Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models, Nature Scientific Reports, 2017
3.3 Predicting cancer outcomes from histology and genomics using convolutional networks, PNAS, 2018
3.4 Multi-faceted computational assessment of risk and progression in oligodendroglioma implicates NOTCH and PI3K pathways, NPJ Precision Oncology, 2018
4 Learning Clinical Outcomes from Heterogeneous Data Sources
4.1 Learning Clinical Outcomes from Heterogeneous Genomic Data Sources: An Adversarial Multi-task Learning Approach, ICML, 2019 Adaptive and Multitask Learning Workshop
4.2 Learning clinical outcomes from heterogeneous genomic datasets using adversarial and multi-task learning, Manuscript in Progress
5 Transfer Learning From Nucleus Detection To Classification In Histopathology Images, ISBI 2019
Bibliography
About this Dissertation
School | |
---|---|
Department | |
Degree | |
Submission | |
Language | |
Research Field | |
Keyword | |
Committee Chair / Thesis Advisor | |
Committee Members | |
Primary PDF
Title | Date Uploaded |
---|---|
Neural Networks for Cancer Survival Analysis Using High-Dimensional Data | 2019-07-16 02:53:30 -0400 |