Development of Statistical Tool TIGAR for Transcriptome-Integrated Genetic Association Resource

Meng, Xiaoran (Spring 2019)

Permanent URL: https://etd.library.emory.edu/concern/etds/2z10wr30v?locale=it

Abstract

Transcriptome-wide association studies (TWAS) have been used to leverage reference data that contain both transcriptomic and genetic profiles for the same samples in gene-based genome-wide association studies (GWAS). An imputation model for genetically regulated gene expression (GReX) is fitted per gene per tissue type by applying a regression model to the reference data. The estimated effect sizes of cis-expression quantitative trait loci (cis-eQTL) on expression levels are then used as variant weights in gene-based association studies. Many statistical tools have been developed for implementing TWAS, such as PrediXcan, based on the Elastic-Net regression model, and FUSION, based on the Bayesian sparse linear mixed model (BSLMM). However, existing tools implement only parametric regression models for fitting GReX imputation models, which limits their ability to fully model the complex genetic architecture of transcriptome profiles. The recently proposed nonparametric Bayesian Dirichlet process regression (DPR) model has been shown to improve GReX imputation accuracy over parametric regression models. My thesis therefore focuses on developing a statistical tool that implements both the parametric Elastic-Net model and the nonparametric DPR model for fitting GReX imputation models, and that enables follow-up TWAS with both individual-level and summary-level GWAS data. To make the tool computationally efficient, I used techniques such as multi-threading for parallel computation and TABIX for memory-efficient loading of genotype data. The tool is named the Transcriptome-Integrated Genetic Association Resource (TIGAR).
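As a rough illustration of the two TWAS stages described above (fit cis-eQTL weights on reference data, then test imputed GReX for association), here is a minimal pure-Python sketch. It is not TIGAR's code or API: all function names are hypothetical, the data are simulated, the Elastic-Net uses a glmnet-style objective with fixed penalties (lam, alpha) instead of tuned ones, and the summary-level statistic is the FUSION-style z-score w'z / sqrt(w' LD w). DPR is not shown.

```python
import math
import random


def standardize_columns(G):
    """Center each SNP column of G (rows = samples) to mean 0 and scale it to unit
    variance (population SD), which the coordinate-descent update below assumes."""
    n = len(G)
    cols = []
    for col in zip(*G):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        cols.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*cols)]


def soft_threshold(z, gamma):
    """Lasso shrinkage operator: sign(z) * max(|z| - gamma, 0)."""
    return math.copysign(max(abs(z) - gamma, 0.0), z)


def fit_elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=100):
    """Cyclic coordinate descent for the glmnet-style Elastic-Net objective
    (1/2n)||y - Xw||^2 + lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||^2),
    with standardized columns in X and a centered expression vector y."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw; w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            # Inner product of SNP j with the partial residual that excludes SNP j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + w[j]
            new_wj = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def predict_grex(X, w):
    """Imputed GReX per sample: weighted sum of its standardized genotypes."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in X]


def individual_level_test(grex, phenotype):
    """Simple regression of phenotype on GReX. Returns the slope t-statistic and
    a two-sided p-value from the normal approximation (adequate for large n)."""
    n = len(grex)
    mx, my = sum(grex) / n, sum(phenotype) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(grex, phenotype))
    sxx = sum((a - mx) ** 2 for a in grex)
    syy = sum((b - my) ** 2 for b in phenotype)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return t, math.erfc(abs(t) / math.sqrt(2.0))


def summary_level_z(w, z, ld):
    """FUSION-style gene z-score from GWAS SNP z-scores and a reference LD
    (SNP correlation) matrix: w'z / sqrt(w' LD w). Requires some w_j != 0."""
    num = sum(wi * zi for wi, zi in zip(w, z))
    p = len(w)
    var = sum(w[i] * ld[i][k] * w[k] for i in range(p) for k in range(p))
    return num / math.sqrt(var)


# Hypothetical simulated data: 8 cis-SNPs, of which SNPs 0 and 3 are true cis-eQTL.
rng = random.Random(2019)


def simulate_genotypes(n, p=8, maf=0.3):
    """Allele dosages (0/1/2) for n samples at p independent SNPs."""
    return [[sum(rng.random() < maf for _ in range(2)) for _ in range(p)] for _ in range(n)]


G_ref = simulate_genotypes(400)  # reference panel with expression
expr = [0.8 * g[0] - 0.5 * g[3] + rng.gauss(0.0, 1.0) for g in G_ref]
mean_expr = sum(expr) / len(expr)
weights = fit_elastic_net(standardize_columns(G_ref), [e - mean_expr for e in expr])

G_gwas = simulate_genotypes(500)  # GWAS cohort with a phenotype
pheno = [0.8 * g[0] - 0.5 * g[3] + rng.gauss(0.0, 2.0) for g in G_gwas]
grex = predict_grex(standardize_columns(G_gwas), weights)
t_stat, p_val = individual_level_test(grex, pheno)
```

In practice the penalty parameters would be tuned (for example by cross-validation) rather than fixed. Note that `summary_level_z` needs only the weights, GWAS SNP z-scores and an LD matrix from a reference panel, which is why summary-level TWAS can run without individual-level GWAS genotypes.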
In addition, to illustrate the advantages of TIGAR, I applied the tool to the GTEx reference dataset to train GReX imputation models for brain frontal cortex tissue, and then conducted TWAS on ROS/MAP GWAS data for four complex traits related to Alzheimer's disease (AD): neurofibrillary tangle density, β-amyloid load, global AD pathology burden, and final consensus cognitive diagnosis. The results show that the DPR model achieved higher imputation accuracy in both the training and prediction data, and that the Elastic-Net model identified 3 potentially significant genes (FDR 0.077) that might be associated with β-amyloid load. Overall, TIGAR is expected to provide a user-friendly, flexible, and computationally efficient tool for implementing TWAS.
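The abstract reports an FDR value for the top genes but does not name the procedure. Assuming the common Benjamini-Hochberg step-up procedure applied across all genes tested for a trait, gene-level adjusted p-values (q-values) could be computed as in this sketch; the function name is hypothetical and this is not taken from TIGAR.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    q_(k) = min over j >= k of p_(j) * m / j, where p_(1) <= ... <= p_(m) are the
    sorted p-values; the running minimum (capped at 1) keeps q monotone."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical gene-level TWAS p-values for one trait.
gene_p = [0.01, 0.04, 0.03, 0.005]
gene_q = benjamini_hochberg(gene_p)
```

Genes whose q-value falls below a chosen threshold would then be reported as significant at that FDR level.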

Table of Contents

1 Introduction

2 TIGAR

2.1 Effect-Sizes Calculation

2.1 Dirichlet Process Regression (DPR)

2.2 TWAS

2.2.1 Univariate Phenotype

2.2.2 Multivariate Phenotype

2.3 Computational Advantages

3 Data Description

3.1 ROS/MAP Data

3.2 GTEx Data

4 Application of TIGAR

4.1 Application Analysis Steps

4.2 Application Result

5 Discussion

Reference

Website

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English