Bayesian Functional Genome-wide Association Study using Standardized Individual-level and Summary-level GWAS Data
Wang, Lei (Spring 2022)
Abstract
Background: Genome-wide association studies (GWAS) associate specific genetic variants with complex human traits and diseases. The Bayesian Functional Genome-wide Association Study (BFGWAS) method integrates functional annotation with GWAS data, based on a multivariate Bayesian regression model for the variants in a locus. The current method requires individual-level data, which prevents it from being applied to publicly available GWAS summary data. The main bottleneck is implementing the MCMC algorithm using only GWAS summary data and reference linkage disequilibrium (LD) information. My thesis project therefore adapts the BFGWAS method to standardized genotype and phenotype data so that it can be applied to summary data.
Methods and Materials: In this project, I derived the MCMC algorithm using standardized genotype and phenotype data obtained from either individual-level or summary-level data. A simulation study was conducted to test this novel method. I used the odds ratios from real GWAS summary data for age-related macular degeneration, and then simulated quantitative phenotype data to test our tools with standardized individual-level and summary-level GWAS data.
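To illustrate why standardization lets summary-level data stand in for individual-level data, the sketch below uses a generic identity rather than the derivation from the thesis itself: when every genotype column and the phenotype are centered and scaled to unit variance, the cross-products that a regression likelihood depends on can be rebuilt from the sample size, the LD (correlation) matrix, and the single-variant effect estimates. The function names, the toy genotype matrix, and the effect size are all hypothetical and chosen only for illustration.

```python
import numpy as np

def standardize(a):
    """Center each column and scale it to unit variance (population sd, ddof=0)."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)

def summary_from_individual(genotypes, phenotype):
    """Return (marginal effects, LD matrix, sample size) from individual-level data."""
    X = standardize(genotypes)      # n x p standardized genotypes
    y = standardize(phenotype)      # length-n standardized phenotype
    n = X.shape[0]
    ld = X.T @ X / n                # p x p correlation (LD) matrix
    beta_marginal = X.T @ y / n     # single-variant regression slopes
    return beta_marginal, ld, n

# Toy data, NOT from the thesis: 500 samples, 5 variants, one causal variant.
rng = np.random.default_rng(0)
n, p = 500, 5
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
y = 0.5 * G[:, 0] + rng.normal(size=n)

beta, ld, n_obs = summary_from_individual(G, y)

# Cross-products rebuilt from summary-level quantities only.
xtx_from_summary = n_obs * ld
xty_from_summary = n_obs * beta

# The same cross-products computed directly from individual-level data.
X_std, y_std = standardize(G), standardize(y)
print(np.allclose(xtx_from_summary, X_std.T @ X_std))
print(np.allclose(xty_from_summary, X_std.T @ y_std))
```

In practice the LD matrix would come from a reference panel rather than from the study samples, so the equality would hold only approximately.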
Results: In the simulation, the tools ran much faster with summary statistics than with individual-level data: the total run time was 2.1505 min for individual-level data and 0.0905 min for summary statistics, roughly 24 times faster. Since the summary-level data was generated from the individual-level data, using summary-level data showed similar performance to using individual-level data. Taking variants with a posterior causal probability larger than 0.1 as potential causal variants, the potential causal SNPs detected from individual-level data and from summary statistics were consistent with each other.
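The comparison described above can be sketched in a few lines: threshold the posterior causal probabilities (PCPs) at 0.1 for each analysis, then measure how much the two sets of flagged variants overlap. The PCP values below are made-up placeholders, not results from the thesis, and the helper names and the choice of the Jaccard index as an overlap measure are assumptions made for illustration.

```python
import numpy as np

def potential_causal(pcp, threshold=0.1):
    """Indices of variants whose posterior causal probability exceeds the threshold."""
    return set(np.flatnonzero(np.asarray(pcp) > threshold).tolist())

def jaccard(a, b):
    """Share of flagged variants common to both sets (1.0 if both are empty)."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)

# Made-up placeholder PCPs for five variants, NOT results from the thesis.
pcp_individual = [0.92, 0.03, 0.15, 0.00, 0.08]
pcp_summary = [0.90, 0.02, 0.12, 0.01, 0.11]

hits_individual = potential_causal(pcp_individual)
hits_summary = potential_causal(pcp_summary)
overlap = jaccard(hits_individual, hits_summary)

# Ratio of the two reported run times, both in minutes.
speedup = 2.1505 / 0.0905

print(sorted(hits_individual), sorted(hits_summary))
print(round(overlap, 3), round(speedup, 1))
```

A variant whose PCP sits close to 0.1 can fall on different sides of the threshold in the two analyses, so an overlap below 1 does not by itself indicate a disagreement between the methods.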
Conclusion: In this thesis, I propose extending the BFGWAS method to summary-level GWAS data through an MCMC algorithm based on standardized genotype and phenotype data. The usefulness of summary statistics was demonstrated in a simulation. However, there are some limitations. In real data the reference LD matrix may have missing values, which can cause computation errors in the MCMC algorithm and lead to unreliable conclusions. Real data will therefore be considered and tested in future work.
Table of Contents
Introduction
Material And Method
Bayesian Variable Selection Model
Posterior Distribution of Estimators
The Conditional Posterior Distribution
Markov Chain Monte Carlo Algorithm
EM Update
Simulation Study Results
Test Data
Individual Data with Standardization
Use Summary Statistics with Standardization
Efficiency Comparison
Discussion
References
About this Master's Thesis
School | |
---|---|
Department | |
Subfield / Discipline | |
Degree | |
Submission | |
Language | |
Research Field | |
Keyword | |
Committee Chair / Thesis Advisor | |
Committee Members | |
Primary PDF
Thumbnail | Title | Date Uploaded | Actions |
---|---|---|---|
 | Bayesian Functional Genome-wide Association Study using Standardized Individual-level and Summary-level GWAS Data | 2022-04-11 02:49:50 -0400 | |