Neural Networks for Cancer Survival Analysis Using High-Dimensional Data

Open Access

Yousefi, Safoora (Summer 2019)

Permanent URL: https://etd.library.emory.edu/concern/etds/7s75dd52j?locale=en
Published

Abstract

Since the emergence of high-throughput experiments such as next-generation sequencing, the volume of genomic data produced has been increasing exponentially. This data holds the key to accurately predicting clinical outcomes and matching patients to the optimal treatment. However, analysis of genomic data is hampered by its high dimensionality. Many prediction methods struggle to learn from the high-dimensional data generated by these platforms, and rely on experts to hand-select a small number of features for training prediction models. In this thesis, we demonstrate how recent advances in neural network methods, which have been remarkably successful in general high-dimensional prediction tasks, can be applied to the problem of predicting cancer outcomes. We perform an extensive comparison of deep survival models and other state-of-the-art machine learning methods for survival analysis. Because interpretability is of great importance for the adoption of neural networks in bioinformatics, we propose a framework for interpreting deep survival models using a risk back-propagation technique that can lead to new understanding of diseases. Finally, we illustrate that deep survival models can successfully transfer information across heterogeneous data sources to improve prognostic accuracy, and describe an adversarial multi-task learning approach that outperforms traditional multi-task learning methods. We provide an open-source software implementation of these frameworks that enables automatic training, evaluation, and interpretation of deep survival models.

Table of Contents

1 Introduction

1.1 Survival Analysis

1.1.1 Censoring

1.1.2 Survival and Hazard Functions

1.1.3 Cox’s Proportional Hazards Model

1.2 Machine Learning and Genomic Data: Challenges

1.2.1 Genomic Data, Dimensionality, and Heterogeneity

1.2.2 Interpretability

1.3 Model Selection and Evaluation

1.3.1 Performance Metrics

1.3.2 Hyper-parameter Optimization

2 Previous Work

2.1 Machine Learning for Survival Analysis

2.2 Deep Learning

2.2.1 Background

2.2.2 Representation Learning

2.2.3 Convolutional Networks

2.2.4 Adversarial Learning

2.2.5 Interpretable Deep Learning

2.3 Multi-task Learning

3 Learning Genomic Representations to Predict Clinical Outcomes in Cancer

3.1 Learning Genomic Representations to Predict Clinical Outcomes in Cancer, ICLR Workshop, 2016

3.2 Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models, Nature Scientific Reports, 2017

3.3 Predicting cancer outcomes from histology and genomics using convolutional networks, PNAS, 2018

3.4 Multi-faceted computational assessment of risk and progression in oligodendroglioma implicates NOTCH and PI3K pathways, NPJ Precision Oncology, 2018

4 Learning Clinical Outcomes from Heterogeneous Data Sources

4.1 Learning Clinical Outcomes from Heterogeneous Genomic Data Sources: An Adversarial Multi-task Learning Approach, ICML 2019 Adaptive and Multitask Learning Workshop

4.2 Learning clinical outcomes from heterogeneous genomic datasets using adversarial and multi-task learning, Manuscript in Progress

5 Transfer Learning From Nucleus Detection To Classification In Histopathology Images, ISBI 2019

Bibliography

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English