Employing machine learning and high-throughput sequencing to uncover complex genetic and epigenetic contributors in neurological phenotypes
Li, Ronnie (Fall 2024)
Abstract
Determining the contribution of genetic and epigenetic components to complex brain phenotypes is a fundamental undertaking in neuroscience. Many computational approaches, ranging from inferential statistics to machine learning methods, have been developed to derive insights from the ever-growing amount of available high-throughput data. Such bioinformatic frameworks have both recapitulated known associations between genotype and phenotype and predicted new ones, making them valuable tools in neuroscience. The goal of this dissertation is both to develop novel methods for analyzing high-throughput data and to employ existing methods to identify new targets. First, we developed a machine learning model to conduct multivariate association studies between gene expression and genotype, highlighting the importance of common genetic variants that influence gene expression across somatic tissues, including multiple regions of the brain. Second, we took advantage of tissue- and cancer-specific DNA methylation markers to predict a tumor’s tissue of origin in patients, and we highlighted the usefulness of our tool for brain cancers like glioblastoma. Third, we leveraged existing data and standard analytical methods to identify cell-type-specific enhancer sequences in the mouse spinal cord in the hope of elucidating alpha motor neuron lethality in amyotrophic lateral sclerosis. These three studies are diverse in nature, but they reaffirm both the complexity of genetic and epigenetic regulation and the importance of furthering our understanding of this field. By combining experimental knowledge of biological systems with the capabilities of computation, we hope to begin unveiling a more comprehensive picture of human brain health.
Table of Contents
Chapter 1. Introduction
1.1. A brief history of genetic regulation
1.2. The genome-wide association study (GWAS)
1.3. Expression quantitative trait loci (eQTLs)
1.4. The era of high-throughput sequencing
1.5. Machine learning and applications in biology
1.6. Beyond genetics: epigenetic regulation
1.7. Research Goals
Chapter 2. MTClass: Identification and annotation of multi-phenotype cis-eQTLs using machine learning
2.1. Abstract
2.2. Introduction
2.3. Results
2.4. Discussion
2.5. Data availability
2.6. Code availability
2.7. Methods
2.8. Supplementary figures
Chapter 3. LRmeth: a logistic regression approach for inferring tissue of origin from tumor DNA methylation profiles
3.1. Abstract
3.2. Introduction
3.3. Results
3.4. Discussion
3.5. Conclusion
3.6. Methods
3.7. Data availability
3.8. Code availability
3.9. Supplementary figures
Chapter 4. Identification of putative enhancer regions in alpha motor neurons using multi-omic analysis
4.1. Abstract
4.2. Introduction
4.3. Results
4.4. Discussion
4.5. Conclusion
4.6. Methods
4.7. Supplementary figures
Chapter 5. Concluding remarks
5.1. Summary of research findings
5.2. Limitations and other considerations
5.3. Future directions
5.4. Conclusions
References
Figures and Tables
Chapter 1. Introduction
Figure 1.1
Figure 1.2
Chapter 2. MTClass: Identification and annotation of multi-phenotype cis-eQTLs using machine learning
Figure 2.1
Figure 2.2
Figure 2.3
Figure 2.4
Figure 2.5
Figure 2.6
Supplementary Figure S2.1
Supplementary Figure S2.2
Supplementary Figure S2.3
Supplementary Figure S2.4
Supplementary Figure S2.5
Supplementary Figure S2.6
Supplementary Figure S2.7
Supplementary Figure S2.8
Supplementary Figure S2.9
Supplementary Table S2.1
Supplementary Table S2.2
Chapter 3. LRmeth: a logistic regression approach for inferring tissue of origin from tumor DNA methylation profiles
Figure 3.1
Table 3.1
Figure 3.2
Figure 3.3
Table 3.2
Figure 3.4
Supplementary Figure S3.1
Supplementary Figure S3.2
Supplementary Figure S3.3
Supplementary Figure S3.4
Supplementary Figure S3.5
Chapter 4. Identification of putative enhancer regions in alpha motor neurons using multi-omic analysis
Figure 4.1
Figure 4.2
Table 4.1
Figure 4.3
Figure 4.4
Table 4.2
Table 4.3
Supplementary Figure S4.1
Supplementary Figure S4.2
Supplementary Figure S4.3
Primary PDF
File download under embargo until 09 January 2027 (uploaded 2024-10-28 13:17:39 -0400)