Deep Models for Gene Regulation
Denas, Olgert (2014)
Abstract
The recent increase in the pace at which functional genomics data
are produced has created new opportunities for understanding gene
regulation. Advances range from the identification of new
regulatory elements to the prediction of gene expression from
genomic and epigenomic features. At the same time, this data-rich
environment has raised challenges in retrieving and interpreting
the information it contains.
Building on recent algorithmic developments, deep artificial neural
networks (ANNs) have been used to build representations of the input
that preserve only the information needed for the task at hand.
Prediction models based on these representations have achieved
excellent results in machine learning competitions. The deep
learning paradigm describes how to build these representations and
train the prediction models in a single learning exercise.
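A minimal NumPy sketch can make "a single learning exercise" concrete: a tanh hidden layer (the learned representation) and a logistic output (the prediction model) are updated together by backpropagation from one loss. The architecture, the synthetic XOR-like data, and every name here (`train_two_layer`, `predict_proba`) are illustrative assumptions, not the software developed in this work.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_two_layer(X, y, hidden=16, lr=0.3, epochs=4000, seed=0):
    """Jointly train a tanh hidden layer (representation) and a logistic
    output (predictor) by full-batch gradient descent on mean cross-entropy."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0.0, 1.0, (d, hidden)), np.zeros(hidden)
    w2, b2 = rng.normal(0.0, 1.0, hidden), 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)             # learned representation
        p = sigmoid(H @ w2 + b2)             # P(class 1 | x)
        g = (p - y) / n                      # d(loss)/d(logit) per example
        gH = np.outer(g, w2) * (1.0 - H**2)  # backpropagate through tanh
        w2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gH)
        b1 -= lr * gH.sum(axis=0)
    return W1, b1, w2, b2

def predict_proba(params, X):
    W1, b1, w2, b2 = params
    return sigmoid(np.tanh(X @ W1 + b1) @ w2 + b2)

# Hypothetical XOR-like data: no linear classifier separates it,
# so a useful hidden representation has to be learned.
rng = np.random.default_rng(1)
centers = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
labels = np.array([0, 0, 1, 1], dtype=float)
idx = rng.integers(0, 4, size=400)
X = centers[idx] + rng.normal(0.0, 0.25, (400, 2))
y = labels[idx]
params = train_two_layer(X, y)
acc = np.mean((predict_proba(params, X) > 0.5) == (y == 1.0))
print(f"training accuracy: {acc:.2f}")
```

Because the representation and the classifier share one gradient signal, the hidden units are shaped only by what helps the prediction, which is the property the paragraph describes.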
In this work, we propose ANNs as tools for modeling gene regulation,
together with a novel technique for interpreting what the model has
learned.
We implement software for designing ANNs and for applying training
practices to functional genomics data. As a proof of concept, we use
our software to model differential gene expression during cell
differentiation. To show the versatility of ANNs, we also train a
regression model on measurements of protein-DNA interaction to
predict gene expression levels.
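The abstract does not spell out which training practices are used. One common practice, assumed here for illustration only, is holding out a validation set and stopping early when validation error stops improving. The sketch fits a one-hidden-layer regression network that maps hypothetical per-gene TF binding signals to a hypothetical expression level; `fit_regressor`, the 80/20 split, and the `patience` parameter are all illustrative choices, not the thesis software.

```python
import numpy as np

def fit_regressor(X, y, hidden=16, lr=0.1, max_epochs=5000, patience=200, seed=0):
    """Train a one-hidden-layer regression network on 80% of the rows, keep the
    parameters with the lowest validation MSE, and stop after `patience`
    epochs without improvement."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = len(X) // 5                       # hold out 20% for validation
    val, tr = idx[:n_val], idx[n_val:]
    params = [rng.normal(0.0, 0.5, (X.shape[1], hidden)), np.zeros(hidden),
              rng.normal(0.0, 0.5, hidden), np.zeros(())]

    def forward(p, A):
        H = np.tanh(A @ p[0] + p[1])
        return H, H @ p[2] + p[3]             # linear output for regression

    best_mse, best_params, wait = np.inf, params, 0
    for _ in range(max_epochs):
        H, yhat = forward(params, X[tr])
        r = (yhat - y[tr]) / len(tr)          # gradient of half the mean squared error
        gH = np.outer(r, params[2]) * (1.0 - H**2)
        grads = [X[tr].T @ gH, gH.sum(axis=0), H.T @ r, r.sum()]
        params = [p - lr * g for p, g in zip(params, grads)]  # new arrays each step
        val_mse = np.mean((forward(params, X[val])[1] - y[val]) ** 2)
        if val_mse < best_mse:
            best_mse, best_params, wait = val_mse, params, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_params, best_mse

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                                # hypothetical TF signals
y = np.tanh(X[:, 0] - X[:, 1]) + 0.1 * rng.normal(size=500)  # hypothetical expression
params, val_mse = fit_regressor(X, y)
print(f"validation MSE: {val_mse:.3f} (variance of y: {np.var(y):.3f})")
```

Reporting the error on held-out genes, rather than on the training genes, is what makes the comparison against the variance of `y` meaningful.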
Typically, input feature extraction from a trained ANN is
formulated as an optimization problem whose solution is slow to
obtain and not unique. We propose a new, efficient feature
extraction technique for classification problems that provides
guarantees on both the class probability of the extracted features
and their norm. We apply this technique to identify features
associated with differential gene expression; the identified
features agree with previous empirical studies.
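For contrast, the usual optimization formulation mentioned above (often called activation maximization) can be sketched as gradient ascent on the input x to maximize log P(c | x) - lambda * ||x||^2 for a trained network. This is the conventional baseline, not the new technique proposed in this work, whose details are not given in the abstract. The tiny random-weight network stands in for a trained classifier, and `objective`, `maximize_input`, `lam`, and `lr` are hypothetical names and settings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(x, net, lam):
    """log P(c=1 | x) minus an L2 penalty on the input."""
    W1, b1, w2, b2 = net
    p = sigmoid(np.tanh(x @ W1 + b1) @ w2 + b2)
    return np.log(p) - lam * np.dot(x, x)

def maximize_input(net, x0, lam=0.1, lr=0.05, steps=1000):
    """Gradient ascent on objective(x) starting from x0."""
    W1, b1, w2, b2 = net
    x = np.array(x0, dtype=float)             # copy so x0 is left unchanged
    for _ in range(steps):
        h = np.tanh(x @ W1 + b1)
        p = sigmoid(h @ w2 + b2)
        # d log p / dx = (1 - p) * W1 @ (w2 * (1 - h^2)); the penalty adds -2 * lam * x
        grad = (1.0 - p) * (W1 @ (w2 * (1.0 - h**2))) - 2.0 * lam * x
        x += lr * grad
    return x

rng = np.random.default_rng(0)
# Hypothetical stand-in for a trained classifier: 4 inputs, 6 tanh hidden units.
net = (rng.normal(0.0, 0.5, (4, 6)), np.zeros(6), rng.normal(0.0, 0.5, 6), 0.0)
x0 = np.zeros(4)
x_star = maximize_input(net, x0)
print("objective:", objective(x0, net, 0.1), "->", objective(x_star, net, 0.1))
```

Each extracted input costs many gradient steps, and because the objective is generally nonconvex in x, different starting points may converge to different inputs. These are the two drawbacks (speed and non-uniqueness) that the paragraph attributes to this formulation.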
Finally, we propose building representations of functional features
from protein-DNA interaction measurements using a deep stack of
nonlinear transformations. We show that these reduced
representations are informative and can be used to label parts of
the gene, regulatory elements, and quiescent regions.
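The abstract does not state how the deep stack is trained. One common way to build such reduced representations without labels, assumed here purely for illustration, is a stacked autoencoder trained greedily layer by layer: each tanh layer is fit to reconstruct the output of the layer below. The synthetic "signal tracks", the layer sizes, and the helper names are hypothetical.

```python
import numpy as np

def train_ae_layer(X, k, lr=0.005, epochs=5000, seed=0):
    """Fit a tanh encoder and a linear decoder by gradient descent on half the
    per-row squared reconstruction error, averaged over rows."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    We, be = rng.normal(0.0, 0.1, (d, k)), np.zeros(k)
    Wd, bd = rng.normal(0.0, 0.1, (k, d)), np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ We + be)
        R = (H @ Wd + bd - X) / n             # d(loss)/d(reconstruction)
        gH = (R @ Wd.T) * (1.0 - H**2)        # backpropagate into the encoder
        Wd -= lr * (H.T @ R)
        bd -= lr * R.sum(axis=0)
        We -= lr * (X.T @ gH)
        be -= lr * gH.sum(axis=0)
    return (We, be), (Wd, bd)

def encode(layer, X):
    We, be = layer
    return np.tanh(X @ We + be)

def stacked_representation(X, sizes, seed=0):
    """Greedy layer-wise training: each layer reconstructs the previous layer's output."""
    layers, H = [], X
    for i, k in enumerate(sizes):
        enc, _ = train_ae_layer(H, k, seed=seed + i)
        layers.append(enc)
        H = encode(enc, H)
    return layers, H

rng = np.random.default_rng(0)
# Hypothetical data: 10 signal tracks per genomic bin, driven by 2 latent factors.
Z = rng.normal(size=(300, 2))
X = Z @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(300, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)
enc, dec = train_ae_layer(X, 3)
recon = encode(enc, X) @ dec[0] + dec[1]
mse = np.mean((recon - X) ** 2)
layers, H = stacked_representation(X, [6, 3])
print(f"layer-1 reconstruction MSE: {mse:.3f}; final representation shape: {H.shape}")
```

The rows of `H` are the reduced per-bin representations; grouping them (for example by clustering) is one possible way to assign labels such as gene parts, regulatory elements, or quiescent regions, though the abstract does not specify the labeling procedure.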
While widely successful, deep ANNs are considered to be hard to use
and interpret. We hope that this work will help increase the
adoption of such models in the genomics community.
Table of Contents
1 Introduction
  1.1 Introduction
  1.2 Summary of remaining chapters
  1.3 Contributions of this thesis
2 Background
  2.1 Importance of regulation
  2.2 Mechanisms of gene regulation
    2.2.1 Transcriptional regulation
    2.2.2 Post-transcriptional regulation
    2.2.3 Epigenetic regulation
  2.3 Computational methods for TF binding analysis
    2.3.1 From HGP to ENCODE
    2.3.2 Next Generation Sequencing
    2.3.3 From read counts to signal
  2.4 Artificial neural networks
    2.4.1 Introduction
    2.4.2 Feed forward networks
    2.4.3 Convolutional Neural Networks
    2.4.4 Modern ANNs and Representation learning
3 Feature extraction
  3.1 Introduction
  3.2 Relevant feature extraction from ANNs
  3.3 Convex optimization based method
  3.4 Conclusion
4 Deep models for regulation
  4.1 Introduction
  4.2 Differential gene expression modeling
    4.2.1 The G1E biological model and data
    4.2.2 Feature extraction
  4.3 Gene expression prediction from TFos
    4.3.1 Data
    4.3.2 Regression model
  4.4 Conclusion
5 Unsupervised modeling of functional genomics data
  5.1 Introduction
  5.2 Deep representations of the genome
    5.2.1 Experimental setting
    5.2.2 Data and Model
  5.3 TF composition analysis of timing replication domains
    5.3.1 Introduction
    5.3.2 Data and Model
    5.3.3 Results
  5.4 Conclusion
Bibliography
Primary PDF
Title | Date Uploaded
---|---
Deep Models for Gene Regulation | 2018-08-28 11:18:42 -0400

Supplemental Files
Title | Date Uploaded
---|---
gertidenas-dimer-5aaec5e2be35.zip | 2018-08-28 11:19:00 -0400