Development of Statistical Tool TIGAR for Transcriptome-Integrated Genetic Association Resource
Meng, Xiaoran (Spring 2019)
Abstract
Transcriptome-wide association studies (TWAS) leverage reference data that have both transcriptomic and genetic profiles for the same samples to conduct gene-based association tests in genome-wide association studies (GWAS). An imputation model for genetically regulated gene expression (GReX) is fitted per gene and per tissue type by applying regression models to the reference data; the estimated effect sizes of cis-expression quantitative trait loci (cis-eQTL) on expression levels are then used as variant weights in gene-based association tests. Many statistical tools have been developed for TWAS, such as PrediXcan, based on the Elastic-Net regression model, and FUSION, based on the Bayesian sparse linear mixed model (BSLMM). However, existing tools implement only parametric regression models for GReX imputation, which are limited in their ability to capture the complex genetic architecture of transcriptome profiles. A recently proposed nonparametric Bayesian Dirichlet process regression (DPR) model has been shown to improve GReX imputation accuracy over parametric regression models. My thesis therefore focuses on developing a statistical tool that implements both the parametric Elastic-Net model and the nonparametric DPR model for fitting GReX imputation models, and that enables follow-up TWAS with both individual-level and summary-level GWAS data. To make the tool computationally efficient, I used multi-threading for parallel computation and TABIX for memory-efficient loading of genotype data. The tool is referred to as Transcriptome-Integrated Genetic Association Resource (TIGAR).
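The two-stage workflow described above (train variant weights on reference data, then impute GReX and test it in a GWAS cohort) can be illustrated with a minimal sketch. This is not TIGAR's code: the data are simulated, the SNPs are independent, the coordinate-descent solver and the names `fit_elastic_net`, `simulate_dosages`, `alpha`, and `l1_ratio` are assumptions made for illustration, and the association test is a plain correlation-based t-statistic.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink a scalar toward zero by `threshold` (the lasso proximal step)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def fit_elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2.
    Assumes X and y are centered (no intercept). Hypothetical helper."""
    n, p = X.shape
    w = np.zeros(p)
    residual = y.astype(float).copy()        # equals y - X @ w while w is zero
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                      # monomorphic SNP carries no signal
            # covariance of SNP j with the residual that excludes SNP j
            rho = X[:, j] @ (residual + X[:, j] * w[j]) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1.0 - l1_ratio))
            residual += X[:, j] * (w[j] - new_w)
            w[j] = new_w
    return w

rng = np.random.default_rng(0)
n_ref, n_gwas, n_snps = 300, 500, 50
true_effects = np.zeros(n_snps)
true_effects[:3] = [1.0, 0.8, -0.8]           # three simulated cis-eQTL

def simulate_dosages(n):
    """Simulated genotype dosages (0/1/2) with allele frequency 0.3."""
    return rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)

# Stage 1: train the imputation model on reference data (genotypes + expression).
G_ref = simulate_dosages(n_ref)
snp_means = G_ref.mean(axis=0)
X_ref = G_ref - snp_means
expression = X_ref @ true_effects + rng.normal(size=n_ref)
expression -= expression.mean()
weights = fit_elastic_net(X_ref, expression)

# Stage 2: impute GReX in a GWAS cohort and test it against the phenotype.
X_gwas = simulate_dosages(n_gwas) - snp_means
imputed_grex = X_gwas @ weights
true_grex_gwas = X_gwas @ true_effects
phenotype = 0.5 * true_grex_gwas + rng.normal(size=n_gwas)

r = np.corrcoef(imputed_grex, phenotype)[0, 1]
t_stat = r * np.sqrt((n_gwas - 2) / (1.0 - r ** 2))
print(f"non-zero weights: {np.count_nonzero(weights)} of {n_snps}")
print(f"association t-statistic: {t_stat:.2f}")
```

In this sketch the fitted `weights` play the role of the cis-eQTL effect sizes: they are estimated once on the reference panel and reused as variant weights for any cohort that has genotype data only.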
In addition, to illustrate the advantages of TIGAR, I applied the tool to the GTEx reference dataset to train GReX imputation models for brain frontal cortex tissue, and then conducted TWAS on ROS/MAP GWAS data for four complex traits related to Alzheimer's disease (AD): neurofibrillary tangle density, β-amyloid load, global AD pathology burden, and final consensus cognitive diagnosis. The results show that the DPR model obtained higher R² in both training and prediction data, and that the Elastic-Net model identified 3 potentially significant genes (with FDR 0.077) that might be associated with β-amyloid load. Overall, TIGAR is expected to provide a user-friendly, flexible, and computationally efficient tool for implementing TWAS.
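The abstract reports gene-level significance in terms of a false discovery rate (FDR) but does not say which procedure was used. As an illustration only, the sketch below assumes the common Benjamini-Hochberg adjustment; the function name `benjamini_hochberg` and the example p-values are hypothetical and are not results from the thesis.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values).
    Assumed procedure; the thesis abstract does not name its FDR method."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices of ascending p-values
    scaled = p[order] * m / np.arange(1, m + 1)    # p_(i) * m / i
    # enforce monotonicity, working down from the largest p-value
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

# Hypothetical gene-level TWAS p-values, for illustration only.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
print(example_q)
```

A gene would then be called significant at a chosen FDR level when its adjusted value falls at or below that level.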
Table of Contents
1 Introduction
2 TIGAR
2.1 Effect-Sizes Calculation
2.1 Dirichlet Process Regression (DPR)
2.2 TWAS
2.2.1 Univariate Phenotype
2.2.2 Multivariate Phenotype
2.3 Computational Advantages
3 Data Description
3.1 ROS/MAP Data
3.2 GTEx Data
4 Application of TIGAR
4.1 Application Analysis Steps
4.2 Application Result
5 Discussion
Reference
Website
About this Master's Thesis
School | |
---|---|
Department | |
Subfield / Discipline | |
Degree | |
Submission | |
Language | |
Research Field | |
Keyword | |
Committee Chair / Thesis Advisor | |
Committee Members | |
Primary PDF
Title | Date Uploaded |
---|---|
Development of Statistical Tool TIGAR for Transcriptome-Integrated Genetic Association Resource (Thesis Original File) | 2019-04-09 00:44:32 -0400 |