Bayesian Functional Genome-wide Association Study using Standardized Individual-level and Summary-level GWAS Data

Wang, Lei (Spring 2022)

Permanent URL: https://etd.library.emory.edu/concern/etds/hh63sx225?locale=en

Abstract

Background: Genome-wide association studies (GWAS) associate specific genetic variants with complex human traits and diseases. The Bayesian Functional Genome-wide Association Study (BFGWAS) method integrates functional annotations with GWAS data through a multivariate Bayesian regression model of the variants within a locus. The current method requires individual-level data, which prevents BFGWAS from being applied to publicly available GWAS summary data. The main bottleneck is implementing the MCMC algorithm using only GWAS summary statistics and reference linkage disequilibrium (LD) information. My thesis project therefore adapts the BFGWAS method to standardized genotype and phenotype data so that it can also be applied to summary data.
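To see why summary statistics plus reference LD can be enough, consider a Bayesian linear regression on one locus in which the residual variance is fixed at 1 and each included effect has prior N(0, tau2). The marginal likelihood of an inclusion vector gamma, relative to the null model, then depends on the data only through X'X and X'y. When X and y are standardized, these equal n*R and sqrt(n)*z, with R = X'X/n the LD matrix and z = X'y/sqrt(n). The Python sketch below checks that the individual-level and summary-level routes give the same log Bayes factor. The prior, the fixed residual variance, the definition of z and the function names are illustrative assumptions, not the exact BFGWAS model or code.

```python
import numpy as np


def standardize(a):
    """Center each column to mean 0 and scale it to variance 1 (population SD)."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)


def log_bf_individual(X, y, gamma, tau2):
    """Log Bayes factor of inclusion vector `gamma` vs. the null model, assuming
    y ~ N(X_g b, I) with b ~ N(0, tau2 * I); uses individual-level X and y."""
    idx = np.flatnonzero(gamma)
    if idx.size == 0:
        return 0.0
    Xg = X[:, idx]
    A = Xg.T @ Xg                      # X_g' X_g
    b = Xg.T @ y                       # X_g' y
    k = idx.size
    _, logdet = np.linalg.slogdet(np.eye(k) + tau2 * A)
    quad = b @ np.linalg.solve(A + np.eye(k) / tau2, b)
    return -0.5 * logdet + 0.5 * quad


def log_bf_summary(z, R, n, gamma, tau2):
    """Same log Bayes factor from summary data only: z = X'y / sqrt(n), R = X'X / n."""
    idx = np.flatnonzero(gamma)
    if idx.size == 0:
        return 0.0
    Rg = R[np.ix_(idx, idx)]
    zg = z[idx]
    k = idx.size
    _, logdet = np.linalg.slogdet(np.eye(k) + n * tau2 * Rg)
    quad = zg @ np.linalg.solve(Rg + np.eye(k) / (n * tau2), zg)
    return -0.5 * logdet + 0.5 * quad


# Toy check on simulated standardized data.
rng = np.random.default_rng(1)
n, p = 500, 6
X = standardize(rng.normal(size=(n, p)))
y = standardize(0.3 * X[:, 1] + rng.normal(size=n))
R = X.T @ X / n                        # in-sample LD matrix
z = X.T @ y / np.sqrt(n)               # z-scores implied by the standardized data
gamma = np.array([0, 1, 0, 1, 0, 0])
print(np.isclose(log_bf_individual(X, y, gamma, 0.01),
                 log_bf_summary(z, R, n, gamma, 0.01)))  # True
```

Because the two functions are algebraically identical (matrix determinant lemma plus Woodbury identity), a sampler built on `log_bf_summary` never needs the n-by-p genotype matrix, only the p-by-p LD matrix.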

Methods and Materials: In this project, I derived the MCMC algorithm using standardized genotypes and phenotypes from either individual-level or summary-level data, and I tested this novel method in a simulation study. I used the odds ratios from real GWAS summary data for age-related macular degeneration (AMD) and then simulated quantitative phenotype data to test our tools with standardized individual-level and summary-level GWAS data.
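Standardization is what makes the two data types interchangeable: if every genotype column and the phenotype have mean 0 and variance 1, then X'y/n is the vector of marginal genotype-phenotype correlations and X'X/n is the LD matrix. The sketch below assumes a per-variant regression with an intercept and no covariates. Under that assumption the correlations can be recovered exactly from per-variant t statistics via r = t / sqrt(n - 2 + t^2), which for large n is close to the familiar z / sqrt(n). The helper names and the toy genotype frequencies are hypothetical.

```python
import numpy as np


def standardize(a):
    """Center columns to mean 0 and scale to variance 1 (population SD)."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)


def marginal_t_stats(X, y):
    """OLS t statistic for each variant, regressing y on an intercept plus one column of X."""
    n, p = X.shape
    t = np.empty(p)
    for j in range(p):
        D = np.column_stack([np.ones(n), X[:, j]])
        coef, rss, *_ = np.linalg.lstsq(D, y, rcond=None)
        sigma2 = rss[0] / (n - 2)                      # residual variance estimate
        se = np.sqrt(sigma2 * np.linalg.inv(D.T @ D)[1, 1])
        t[j] = coef[1] / se
    return t


def t_to_r(t, n):
    """Exact inverse of t = r * sqrt(n - 2) / sqrt(1 - r^2) for simple regression."""
    return t / np.sqrt(n - 2 + t ** 2)


rng = np.random.default_rng(7)
n = 300
G = rng.binomial(2, [0.1, 0.3, 0.5], size=(n, 3))    # toy genotypes coded 0/1/2
X = standardize(G)
y = standardize(0.4 * G[:, 0] + rng.normal(size=n))

r_individual = X.T @ y / n                      # marginal correlations, individual-level data
r_summary = t_to_r(marginal_t_stats(X, y), n)   # same quantity from summary t statistics
print(np.allclose(r_individual, r_summary))     # True
print(np.round(X.T @ X / n, 2))                 # LD matrix; its diagonal equals 1
```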

Results: In the simulation, using summary statistics greatly improved computational efficiency compared with using individual-level data: the total run time was 2.1505 min with individual-level data and 0.0905 min with summary statistics, roughly a 24-fold reduction. Because the summary-level data were generated from the individual-level data, the two inputs gave similar performance. Taking variants with a posterior causal probability greater than 0.1 as potential causal variants, the SNPs detected from individual-level data and from summary statistics were largely consistent.
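The thresholding step can be illustrated with a toy sampler. The sketch below is a plain Metropolis sampler over inclusion indicators with single-flip proposals and a fixed prior inclusion probability. Its log Bayes factor is computed from z-scores and LD only, assuming a residual variance of 1 and effects distributed as N(0, tau2). Posterior inclusion probabilities (PIPs) are estimated as the fraction of post-burn-in draws that include each variant, and variants with PIP > 0.1 are flagged. This is a simplified stand-in, not the BFGWAS sampler, which additionally uses functional annotations and EM updates. The values of `tau2`, `prior_pi`, `n_iter` and `burn_in` are arbitrary assumptions.

```python
import numpy as np


def log_bf_summary(z, R, n, gamma, tau2):
    """Log Bayes factor of inclusion vector gamma vs. the null, from z-scores and LD only
    (assumes residual variance 1 and effects b ~ N(0, tau2))."""
    idx = np.flatnonzero(gamma)
    if idx.size == 0:
        return 0.0
    Rg = R[np.ix_(idx, idx)]
    k = idx.size
    _, logdet = np.linalg.slogdet(np.eye(k) + n * tau2 * Rg)
    quad = z[idx] @ np.linalg.solve(Rg + np.eye(k) / (n * tau2), z[idx])
    return -0.5 * logdet + 0.5 * quad


def mcmc_pip(z, R, n, tau2=0.01, prior_pi=0.01, n_iter=5000, burn_in=1000, seed=0):
    """Metropolis sampler over inclusion indicators with single-flip proposals;
    returns PIPs estimated from the post-burn-in draws."""
    rng = np.random.default_rng(seed)
    p = len(z)
    log_odds = np.log(prior_pi) - np.log1p(-prior_pi)   # per-variant prior log odds

    def log_post(g):
        # Unnormalized log posterior: log BF plus independent Bernoulli(prior_pi) prior.
        return log_bf_summary(z, R, n, g, tau2) + g.sum() * log_odds

    gamma = np.zeros(p, dtype=int)
    current = log_post(gamma)
    counts = np.zeros(p)
    for it in range(n_iter):
        proposal = gamma.copy()
        j = rng.integers(p)
        proposal[j] = 1 - proposal[j]          # flip one indicator (symmetric proposal)
        cand = log_post(proposal)
        if np.log(rng.random()) < cand - current:
            gamma, current = proposal, cand
        if it >= burn_in:
            counts += gamma
    return counts / (n_iter - burn_in)


# Toy locus: 10 independent variants, only variant 0 carries a strong signal.
z = np.zeros(10)
z[0] = 8.0
R = np.eye(10)
pip = mcmc_pip(z, R, n=5000)
causal = np.flatnonzero(pip > 0.1)            # potential causal variants at PIP > 0.1
print(causal)                                 # [0]
```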

Conclusion: In this thesis, I extend the BFGWAS method to summary-level GWAS data through an MCMC algorithm based on standardized genotypes and phenotypes. A simulation study demonstrated the usefulness of summary statistics. The approach also has limitations: in real data, the reference LD matrix may have missing values, which cause computational errors in the MCMC algorithm and can lead to unreliable conclusions. Future work will therefore test the method on real data.
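Missing entries in a reference LD matrix propagate NaN values into the matrix determinants and solves that a summary-statistic sampler relies on. One common workaround, not part of the thesis, is to drop variants with missing LD entries and then shrink the remaining matrix toward the identity so that it is positive definite before it is inverted. The sketch below greedily removes the variant with the most missing entries until none remain. The function names and the shrinkage weight of 0.05 are illustrative assumptions.

```python
import numpy as np


def drop_missing_ld(R):
    """Greedily drop the variant with the most missing LD entries until none remain.
    Returns the indices of the retained variants."""
    R = np.asarray(R, dtype=float)
    keep = np.arange(R.shape[0])
    while True:
        miss = np.isnan(R[np.ix_(keep, keep)]).sum(axis=1)   # missing entries per variant
        if keep.size == 0 or miss.max() == 0:
            return keep
        keep = np.delete(keep, np.argmax(miss))


def shrink_ld(R, weight=0.05):
    """Shrink R toward the identity; if R is positive semidefinite, the result is
    positive definite with smallest eigenvalue at least `weight`."""
    return (1.0 - weight) * R + weight * np.eye(R.shape[0])


# Toy reference LD: variant 2 is absent from the panel, so its row and column are NaN.
R = np.array([[1.0, 0.6, np.nan, 0.1],
              [0.6, 1.0, np.nan, 0.3],
              [np.nan, np.nan, np.nan, np.nan],
              [0.1, 0.3, np.nan, 1.0]])
z = np.array([4.1, 3.5, 2.0, 0.4])

keep = drop_missing_ld(R)
R_clean = shrink_ld(R[np.ix_(keep, keep)])
z_clean = z[keep]                 # keep z-scores aligned with the retained variants
np.linalg.cholesky(R_clean)       # raises LinAlgError if R_clean is not positive definite
print(keep)                       # [0 1 3]
```

Dropping variants loses any signal they carry, so this trades completeness for numerical stability; shrinkage alone does not repair a matrix that is not positive semidefinite to begin with.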

Table of Contents

Introduction

Material And Method

Bayesian Variable Selection Model

Posterior Distribution of Estimators

The Conditional Posterior Distribution

Markov Chain Monte Carlo Algorithm

EM Update

Simulation Study Results

Test Data

Individual Data with Standardization

Use Summary Statistics with Standardization

Efficiency Comparison

Discussion

References

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English