Applications of Machine learning in Imaging genetics

Restricted; Files Only

Yu, Shaojun (Spring 2024)

Permanent URL: https://etd.library.emory.edu/concern/etds/mc87pr536?locale=fr
Published

Abstract

Imaging genetics is a growing field that explores the connections between genetic factors and imaging measures. It has successfully identified numerous genotype-phenotype associations in various diseases and traits, offering insights into individual differences in behavior, disease development, and body function. However, traditional methods for conducting genome-wide association studies (GWAS) with imaging data have limitations. These include the need for prior knowledge to select imaging phenotypes (IPs), the loss of complex relationships between different IPs, and a focus on linear associations. To address these limitations, we introduce a pioneering framework that employs a classification-based GWAS approach, harnessing the power of machine learning algorithms such as convolutional neural networks, vision transformers, and imaging autoencoders. This framework autonomously extracts IPs and uncovers non-linear associations, bridging the gap between traditional GWAS and the complexity of biological data. This framework encompasses two research projects spanning from proof-of-concept analyses on 2D imaging data to real applications on 3D imaging data.
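To make the classification-based idea concrete, the following minimal sketch tests a single variant by asking whether a classifier can predict carrier status from imaging-derived features better than chance, using a permutation test on held-out accuracy. It is an illustration under stated assumptions, not the dissertation's implementation: the synthetic feature matrix, the binary carrier coding, the nearest-centroid classifier (a simple stand-in for the neural networks named above), and the function names `centroid_accuracy` and `permutation_pvalue` are all hypothetical.

```python
import numpy as np

def centroid_accuracy(X, y, train_idx, test_idx):
    """Fit a nearest-centroid classifier on the training rows; return test accuracy."""
    classes = np.unique(y[train_idx])
    centroids = np.stack(
        [X[train_idx][y[train_idx] == c].mean(axis=0) for c in classes]
    )
    # Euclidean distance from every test sample to every class centroid.
    dists = np.linalg.norm(X[test_idx][:, None, :] - centroids[None, :, :], axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y[test_idx]).mean())

def permutation_pvalue(X, y, n_perm=200, seed=0):
    """Compare held-out accuracy against a null built by shuffling the labels."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    train_idx, test_idx = idx[:half], idx[half:]
    observed = centroid_accuracy(X, y, train_idx, test_idx)
    null = np.array([
        centroid_accuracy(X, rng.permutation(y), train_idx, test_idx)
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the empirical p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, float(p_value)

# Synthetic stand-in data (assumption): 200 subjects, 10 imaging features.
rng = np.random.default_rng(42)
n_subjects, n_features = 200, 10
features = rng.normal(size=(n_subjects, n_features))
carrier = rng.integers(0, 2, size=n_subjects)    # variant that shifts feature 0
features[:, 0] += 2.0 * carrier
unrelated = rng.integers(0, 2, size=n_subjects)  # variant with no imaging effect

acc_assoc, p_assoc = permutation_pvalue(features, carrier)
acc_null, p_null = permutation_pvalue(features, unrelated)
print(f"associated variant: accuracy={acc_assoc:.2f}, p={p_assoc:.3f}")
print(f"unrelated variant:  accuracy={acc_null:.2f}, p={p_null:.3f}")
```

A genome-wide analysis would repeat this test for every variant and would need a multiple-testing correction, which this sketch leaves out.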

Furthermore, we extend our scope to include epigenetic modifications, particularly DNA methylation, which play a pivotal role in genomic loci outcomes. To facilitate epigenome-wide association studies (EWAS) with imaging data, we present a novel DNA methylation imputation method. This method, termed AutoMethy, effectively imputes DNA methylation levels across diverse array platforms, overcoming the barrier of platform disparities and enhancing the integration of EWAS with imaging data.
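The cross-platform imputation problem can be pictured as matrix completion: reference samples have every probe measured, while target samples from a sparser array lack a block of probes. The sketch below fills that block with a masked non-negative matrix factorization fitted by standard multiplicative updates. It is a generic illustration and not AutoMethy itself; the synthetic low-rank beta values, the matrix sizes, the chosen rank, and the function name `masked_nmf_impute` are assumptions made for the example.

```python
import numpy as np

def masked_nmf_impute(X, observed, rank=3, n_iter=2000, seed=0, eps=1e-9):
    """Fill unobserved entries of a non-negative matrix using a masked NMF fit.

    X        : samples x probes matrix; unobserved entries may hold any value.
    observed : boolean mask, True where X was actually measured.
    """
    rng = np.random.default_rng(seed)
    M = observed.astype(float)
    Xo = np.where(observed, X, 0.0)  # zero out unobserved entries (NaN-safe)
    W = rng.uniform(0.1, 1.0, size=(X.shape[0], rank))
    H = rng.uniform(0.1, 1.0, size=(rank, X.shape[1]))
    for _ in range(n_iter):
        # Multiplicative updates for the masked squared error ||M * (X - WH)||^2.
        W *= (Xo @ H.T) / ((M * (W @ H)) @ H.T + eps)
        H *= (W.T @ Xo) / (W.T @ (M * (W @ H)) + eps)
    fitted = np.clip(W @ H, 0.0, 1.0)  # methylation beta values lie in [0, 1]
    return np.where(observed, X, fitted)

# Synthetic stand-in data (assumption): exactly rank-3 beta values.
rng = np.random.default_rng(1)
n_ref, n_target, n_shared, n_extra = 40, 20, 30, 10
truth = (
    rng.uniform(size=(n_ref + n_target, 3))
    @ rng.uniform(size=(3, n_shared + n_extra))
) / 3.0

# Target samples lack the last n_extra probes, as on a sparser array.
observed = np.ones_like(truth, dtype=bool)
observed[n_ref:, n_shared:] = False
X = np.where(observed, truth, np.nan)

imputed = masked_nmf_impute(X, observed)
hidden = ~observed
rmse_nmf = float(np.sqrt(np.mean((imputed[hidden] - truth[hidden]) ** 2)))

# Baseline: fill each missing probe with its mean over the reference samples.
ref_mean = truth[:n_ref, n_shared:].mean(axis=0)
baseline = np.tile(ref_mean, (n_target, 1))
rmse_mean = float(np.sqrt(np.mean((baseline - truth[n_ref:, n_shared:]) ** 2)))
print(f"masked NMF RMSE: {rmse_nmf:.4f}, reference-mean RMSE: {rmse_mean:.4f}")
```

Comparing against the reference-mean baseline on held-out entries is one simple way to check whether an imputation model adds information beyond per-probe averages.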

We primarily developed our approach based on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, which includes brain magnetic resonance imaging (MRI), genetic data (single nucleotide polymorphisms), and epigenetic data (DNA methylation array). Our findings underscore the potency of machine learning methods in unveiling new brain imaging phenotypes and identifying novel associations, thereby contributing to a more comprehensive understanding of the genetic and epigenetic basis of brain structure.

Table of Contents

LIST OF FIGURES

LIST OF TABLES

1. INTRODUCTION

1.1 Aims

2. BACKGROUND

2.1 Genome-wide association study

2.2 Magnetic Resonance Imaging

2.3 Imaging Genetics

2.4 Epigenetics and DNA Methylation

3. GENOME-WIDE ASSOCIATION STUDY OF WHOLE BRAIN MRI IMAGES USING DEEP LEARNING

3.1 Materials

3.2 Methods

3.2.1 Data Preprocessing

3.2.2 Overview of the Method

3.2.3 Simulation Study

3.2.4 Genotyping data and processing

3.2.5 CNN Model architecture and model training

3.2.6 Permutation test of the classification performance

3.2.7 Functional Annotation

3.2.8 Saliency map construction

3.3 Results

3.3.1 Simulation Results

3.3.2 GWAS results

3.3.3 Saliency maps

3.4 Discussion

4. NEW PHENOTYPE DISCOVERY WITH 3D BRAIN MRI IMAGES

4.1 Materials

4.2 Methods

4.2.1 Data Preprocessing

4.2.2 3D ViT autoencoder

4.2.3 Visualization of endo-phenotypes

4.2.4 Classification-based GWAS

4.2.5 Integrated t-maps

4.3 Results

4.3.1 Overview of the framework

4.3.2 Sex classification results

4.3.3 New brain Phenotype visualization

4.3.4 GWAS Results

4.4 Discussion

5. METHYLATION IMPUTATION FOR IMAGING EPIGENETICS

5.1 Materials

5.2 Methods

5.2.1 Data Preprocessing

5.2.2 Method Overview

5.2.3 Autoencoder

5.2.4 NMF

5.2.5 Model training and chromosome encoding

5.2.6 Implementation of other imputation methods

5.2.7 Performance measurement

5.2.8 Shiny webserver implementation

5.3 Results

5.3.1 Imputation performance on two independent datasets

5.3.2 Cross-dataset validation

5.3.3 Imputation confidence

5.3.4 Methylation webserver

5.4 Discussion

REFERENCE

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Degree
Submission
Language
  • English
Research Field
Keyword
Committee Chair / Thesis Advisor
Committee Members
Last modified