Methylation Imputation from HM450K Array to EPIC Array with Autoencoder and Nonnegative Matrix Factorization
Open Access
Shen, Yang (Spring 2023)
Abstract
DNA methylation is an essential epigenetic modification that plays a crucial role in regulating gene expression and cellular differentiation. DNA methylation profiling has been widely used to study the development of various human diseases, including cancer, cardiovascular disease, and neurological disorders. The Infinium HumanMethylation450 (HM450K) array and the Infinium MethylationEPIC (EPIC) array are two commonly used high-throughput technologies for genome-wide DNA methylation profiling. The HM450K array covers approximately 450,000 CpG sites and the EPIC array covers more than 850,000, with around 440,000 CpG sites shared between the two. In this study, our goal is to impute methylation levels at EPIC array sites from HM450K array data, avoiding expensive re-measurement on the EPIC array when HM450K data are already available. Convolutional autoencoders and nonnegative matrix factorization (NMF) are machine-learning techniques commonly used to analyze large-scale genomic data. Our approach uses a convolutional autoencoder and an NMF model to capture the latent structure in the DNA methylation data and generate imputed values for all CpG sites on the EPIC array. We focused mainly on chromosome 18 to simplify the model. The overall root-mean-square error (RMSE) was 0.0196, lower than the 0.04 obtained from a simple linear regression model using nearby CpG sites. The model is readily adaptable to other chromosomes, since the dimensions of the autoencoder output can be adjusted to accommodate different chromosome sizes.
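To make the NMF imputation idea and the RMSE metric concrete, the following is a minimal sketch on synthetic data. It is not the thesis pipeline: it omits the convolutional autoencoder entirely, and it uses a standard masked multiplicative-update NMF as a stand-in. The function names (`masked_nmf`, `rmse`), the toy matrix sizes, and the rank are all assumptions chosen for illustration. Rows stand for samples and columns for CpG sites; the masked block plays the role of EPIC-only sites that are missing for HM450K-only samples.

```python
import numpy as np


def masked_nmf(X, mask, rank, n_iter=500, eps=1e-9, seed=0):
    """Factor X ~ W @ H using only entries where mask is True.

    Uses multiplicative updates for the masked (weighted) squared error,
    which keep W and H nonnegative. Hypothetical helper, not from the thesis.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    M = mask.astype(float)
    MX = M * X  # observed entries only; hidden entries contribute zero
    for _ in range(n_iter):
        H *= (W.T @ MX) / (W.T @ (M * (W @ H)) + eps)
        W *= (MX @ H.T) / ((M * (W @ H)) @ H.T + eps)
    return W, H


def rmse(a, b):
    """Root-mean-square error between two arrays of equal shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic low-rank "beta value" matrix scaled into [0, 1] (toy data).
rng = np.random.default_rng(1)
n_samples, n_sites, rank = 40, 30, 3
X = rng.random((n_samples, rank)) @ rng.random((rank, n_sites))
X /= X.max()

# The last 10 samples lack the last 10 sites, mimicking HM450K-only samples.
mask = np.ones_like(X, dtype=bool)
mask[30:, 20:] = False

W, H = masked_nmf(X, mask, rank)
X_hat = W @ H  # imputed values are read off the hidden block of X_hat
hidden_rmse = rmse(X_hat[~mask], X[~mask])
```

Here `hidden_rmse` is computed only on the entries withheld from fitting, which is the analogue of evaluating imputed EPIC-only sites against measured EPIC values. The rank and iteration count would need tuning on real data.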
Table of Contents
1. Introduction
2. Method
2.1 Data Pre-processing
2.2 Dimension reduction: Convolutional Autoencoder
2.3 Nonnegative Matrix Factorization
3. Results
4. Discussions
References
About this Master's Thesis
Primary PDF
Title | Date Uploaded |
---|---|
Methylation Imputation from HM450K Array to EPIC Array with Autoencoder and Nonnegative Matrix Factorization | 2023-04-08 17:25:04 -0400 |