Employing machine learning and high-throughput sequencing to uncover complex genetic and epigenetic contributors in neurological phenotypes

Restricted; Files Only

Li, Ronnie (Fall 2024)

Permanent URL: https://etd.library.emory.edu/concern/etds/cr56n271w?locale=en
Published

Abstract

Determining the contribution of genetic and epigenetic components to complex brain phenotypes is a fundamental undertaking in neuroscience. Numerous computational approaches, ranging from inferential statistics to machine learning methods, have been developed to derive insights from the ever-growing amount of available high-throughput data. Such bioinformatic frameworks have both recapitulated known associations between genotype and phenotype and predicted new ones, making them valuable tools in neuroscience. The goal of this dissertation is both to develop novel methods for analyzing high-throughput data and to employ existing methods to identify new targets. First, we developed a machine learning model to conduct multivariate association studies between gene expression and genotype, highlighting the importance of common genetic variants that influence gene expression across somatic tissues, including multiple regions of the brain. Second, we took advantage of tissue- and cancer-specific DNA methylation markers to predict a tumor’s tissue of origin in patients, and we highlighted the usefulness of our tool for brain cancers such as glioblastoma. Third, we leveraged existing data and standard analytical methods to identify cell-type-specific enhancer sequences in the mouse spinal cord in the hope of elucidating alpha motor neuron lethality in amyotrophic lateral sclerosis. These three studies are diverse in nature, but they reaffirm both the complexity of genetic and epigenetic regulation and the importance of furthering our understanding of this field. By combining experimental knowledge of biological systems with the capabilities of computation, we hope to begin unveiling a more comprehensive picture of human brain health.

Table of Contents

Chapter 1. Introduction

1.1. A brief history of genetic regulation

1.2. The genome-wide association study (GWAS)

1.3. Expression quantitative trait loci (eQTLs)

1.4. The era of high-throughput sequencing

1.5. Machine learning and applications in biology

1.6. Beyond genetics: epigenetic regulation

1.7. Research Goals

Chapter 2. MTClass: Identification and annotation of multi-phenotype cis-eQTLs using machine learning

2.1. Abstract

2.2. Introduction

2.3. Results

2.4. Discussion

2.5. Data availability

2.6. Code availability

2.7. Methods

2.8. Supplementary figures

Chapter 3. LRmeth: a logistic regression approach for inferring tissue of origin from tumor DNA methylation profiles

3.1. Abstract

3.2. Introduction

3.3. Results

3.4. Discussion

3.5. Conclusion

3.6. Methods

3.7. Data availability

3.8. Code availability

3.9. Supplementary figures

Chapter 4. Identification of putative enhancer regions in alpha motor neurons using multi-omic analysis

4.1. Abstract

4.2. Introduction

4.3. Results

4.4. Discussion

4.5. Conclusion

4.6. Methods

4.7. Supplementary figures

Chapter 5. Concluding remarks

5.1. Summary of research findings

5.2. Limitations and other considerations

5.3. Future directions

5.4. Conclusions

References

Figures and Tables

Chapter 1. Introduction

Figure 1.1

Figure 1.2

Chapter 2. MTClass: Identification and annotation of multi-phenotype cis-eQTLs using machine learning

Figure 2.1

Figure 2.2

Figure 2.3

Figure 2.4

Figure 2.5

Figure 2.6

Supplementary Figure S2.1

Supplementary Figure S2.2

Supplementary Figure S2.3

Supplementary Figure S2.4

Supplementary Figure S2.5

Supplementary Figure S2.6

Supplementary Figure S2.7

Supplementary Figure S2.8

Supplementary Figure S2.9

Supplementary Table S2.1

Supplementary Table S2.2

Chapter 3. LRmeth: a logistic regression approach for inferring tissue of origin from tumor DNA methylation profiles

Figure 3.1

Table 3.1

Figure 3.2

Figure 3.3

Table 3.2

Figure 3.4

Supplementary Figure S3.1

Supplementary Figure S3.2

Supplementary Figure S3.3

Supplementary Figure S3.4

Supplementary Figure S3.5

Chapter 4. Identification of putative enhancer regions in alpha motor neurons using multi-omic analysis

Figure 4.1

Figure 4.2

Table 4.1

Figure 4.3

Figure 4.4

Table 4.2

Table 4.3

Supplementary Figure S4.1

Supplementary Figure S4.2

Supplementary Figure S4.3

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English