TIGAR-V2 with nonparametric Bayesian eQTL weights estimated from GTEx V8 & Leveraging multiple reference panels to improve TWAS power by ensemble machine learning Open Access

Parrish, Randy (Spring 2021)

Permanent URL: https://etd.library.emory.edu/concern/etds/5x21tg67s?locale=en
Published

Abstract

Background: Transcriptome-wide association study (TWAS) is a popular technique for integrating reference transcriptomic data with data from genome-wide association studies (GWAS) to conduct gene-based association studies. The standard two-stage TWAS methods train gene expression prediction models on reference data, and then test the association between the predicted genetically regulated gene expression (GReX) and the phenotype of interest in test data. Limitations of existing TWAS tools make it difficult for users to train GReX prediction models using their own data, and no methods currently exist for leveraging multiple reference panels to improve TWAS power.
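The two-stage procedure described above can be sketched roughly as follows. This is a minimal illustration on simulated data, not the TIGAR-V2 implementation: ridge regression is an assumed stand-in for the expression prediction model, and the function names (`fit_ridge`, `association_z`), sample sizes, and effect sizes are all hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Stage 1: estimate eQTL weights by ridge regression (assumed stand-in model)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def association_z(grex, pheno):
    """Stage 2: z-statistic for the slope of phenotype regressed on predicted GReX."""
    g = grex - grex.mean()
    y = pheno - pheno.mean()
    beta = (g @ y) / (g @ g)
    resid = y - beta * g
    se = np.sqrt((resid @ resid) / (len(y) - 2) / (g @ g))
    return beta / se

rng = np.random.default_rng(0)
n_ref, n_test, n_snps = 300, 2000, 20
true_w = np.zeros(n_snps)
true_w[:3] = [0.5, -0.4, 0.3]  # simulated causal eQTL effects (hypothetical)

# Reference panel: genotypes plus measured expression for one gene.
X_ref = rng.binomial(2, 0.3, size=(n_ref, n_snps)).astype(float)
expr_ref = X_ref @ true_w + rng.normal(0.0, 1.0, n_ref)

# Test cohort: genotypes plus phenotype, with no measured expression.
X_test = rng.binomial(2, 0.3, size=(n_test, n_snps)).astype(float)
pheno = 0.5 * (X_test @ true_w) + rng.normal(0.0, 1.0, n_test)

# Stage 1: train the prediction model on centered reference data.
mu = X_ref.mean(axis=0)
w_hat = fit_ridge(X_ref - mu, expr_ref - expr_ref.mean())

# Stage 2: predict GReX in the test cohort and test its association.
grex_hat = (X_test - mu) @ w_hat
z = association_z(grex_hat, pheno)
print(f"TWAS z-statistic for the simulated gene: {z:.2f}")
```

In practice the second stage is often run from GWAS summary statistics rather than individual-level data; the individual-level form is used here only because it is the shortest to write down.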

Methods: In part one, we develop a new version of the Transcriptome-Integrated Genetic Association Resource (TIGAR-V2), train nonparametric Bayesian Dirichlet process regression (DPR) gene expression prediction models for 49 tissues from the Genotype-Tissue Expression (GTEx) project V8 reference panel, and validate the TIGAR-V2 method using application TWAS of breast and ovarian cancer. In part two, we develop a novel Stacked Regression based TWAS (SR-TWAS) method for leveraging multiple reference panels using ensemble machine learning, and validate our method using simulation studies and real TWAS leveraging two reference panels of brain frontal cortex tissue.
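To illustrate the stacking idea in part two, the sketch below combines predictions from two base models with a single weight, here called zeta after the zeta values named in the figure list. It assumes the weights are constrained to lie in [0, 1] and sum to 1, which is a common stacked regression setup but is an assumption here rather than a statement of the SR-TWAS implementation; the data, the panel labels A and B, and the helper names are hypothetical.

```python
import numpy as np

def stack_two(pred_a, pred_b, y):
    """Return zeta in [0, 1] minimizing ||y - (zeta*pred_a + (1 - zeta)*pred_b)||^2."""
    d = pred_a - pred_b
    denom = d @ d
    if denom == 0.0:
        return 0.5  # identical predictions: every weight gives the same fit
    # Unconstrained least-squares solution; clipping gives the constrained
    # optimum because the objective is a convex quadratic in one variable.
    zeta = (d @ (y - pred_b)) / denom
    return float(np.clip(zeta, 0.0, 1.0))

def mse(pred, y):
    """Mean squared error between predictions and observed values."""
    return float(np.mean((y - pred) ** 2))

rng = np.random.default_rng(1)
n = 500
grex = rng.normal(0.0, 1.0, n)          # simulated true genetic component
y = grex + rng.normal(0.0, 1.0, n)      # observed expression in a validation sample

# Predictions from two base models trained on different (hypothetical) panels.
pred_a = grex + rng.normal(0.0, 0.8, n)
pred_b = grex + rng.normal(0.0, 1.2, n)

zeta = stack_two(pred_a, pred_b, y)
stacked = zeta * pred_a + (1.0 - zeta) * pred_b
print(f"zeta for panel A: {zeta:.3f}")
print(f"MSE A: {mse(pred_a, y):.3f}  MSE B: {mse(pred_b, y):.3f}  stacked: {mse(stacked, y):.3f}")
```

Because zeta = 0 and zeta = 1 are both feasible, the stacked prediction can never fit the validation sample worse than either base model alone. Whether that carries over to independent data depends on how the weights are validated.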

Results: TIGAR-V2 identified 88 TWAS risk genes for breast cancer, most of which are known or near previously identified GWAS (84; 95%) or TWAS (35; 40%) risk genes. TIGAR-V2 identified 37 TWAS risk genes for ovarian cancer, most of which are known or near previously identified GWAS (35; 95%) or TWAS (13; 35%) risk genes. TIGAR-V2 identified 1 novel independent risk gene for breast cancer with known biological functions involved in carcinogenesis, and 2 novel independent risk genes for both breast and ovarian cancer that are near such genes. SR-TWAS models had higher gene expression prediction accuracy and TWAS power than the models trained on single cohorts in all simulation scenarios, and outperformed both single-cohort models in GReX prediction in the real-data application.

Conclusions: We believe our improved TIGAR-V2 and SR-TWAS tools will provide a useful resource for mapping risk genes of complex diseases by TWAS.

Table of Contents

1 Introduction...............................................................................1

2 TIGAR-V2 with nonparametric Bayesian eQTL weights estimated from GTEx V8...................5

2.1 Methods..................................................................................5

2.1.1 TIGAR-V2 ..............................................................................5

2.1.2 GTEx V8 Data ..........................................................................9

2.1.3 Training gene expression prediction models with GTEx V8...............................10

2.1.4 Application TWAS of Breast and Ovarian Cancer.........................................11

2.2 Results.................................................................................11

2.2.1 Model Training Results................................................................11

2.2.2 Application TWAS Results..............................................................14

2.3 Discussion..............................................................................20

3 Leveraging multiple reference panels to improve TWAS power by ensemble machine learning...22

3.1 Methods.................................................................................22

3.1.1 Stacked Regression ...................................................................22

3.1.2 SR-TWAS Tool Framework................................................................23

3.1.3 ROS/MAP Data..........................................................................25

3.1.4 Simulation Study Design...............................................................26

3.1.5 Application Studies Leveraging GTEx V8 and ROS/MAP Reference Panels...................27

3.2 Results.................................................................................28

3.2.1 Simulation Study Results .............................................................28

3.2.2 Application Studies Leveraging GTEx V8 and ROS/MAP Reference Panels...................33

3.3 Discussion..............................................................................37

4 Conclusion................................................................................39

Appendix A Downsampled Study Results........................................................41

Appendix B Application Breast and Ovarian Cancer TWAS Results...............................43

B.1 TIGAR Results...........................................................................43

B.2 PrediXcan Results ......................................................................44

B.3 Genes Significant in Multiple TWAS Results..............................................46

Appendix C SR-TWAS Code.....................................................................51

References..................................................................................55

List of Figures

2.1 TIGAR-V2 framework....................................................................9

2.2 Results of GReX prediction model training with GTEx V8 data by TIGAR-V2..............13

2.3 Computation costs for GReX prediction model training with GTEx V8 data by TIGAR-V2...14

2.4 Manhattan plots of TWAS results by TIGAR-V2..........................................18

3.1 Training results for SR-TWAS simulations.............................................30

3.2 Density plot of ROS/MAP zeta values for SR-TWAS simulations..........................31

3.3 Average expression prediction R^2 results for SR-TWAS simulations....................32

3.4 TWAS power results for SR-TWAS simulations...........................................33

3.5 Computation costs for SR-TWAS model training from ROS and GTEx V8 models.............35

3.6 Density plot of ROS zeta values for SR-TWAS applied to real data.....................35

3.7 Scaled density plot of prediction R^2 results for SR-TWAS vs single cohort models....36

3.8 Prediction R^2 results for SR-TWAS vs single cohort models...........................37

A.1 Density plots of training R^2 for downsampled study..................................41

A.2 Density plots of CV R^2 for downsampled study........................................42

B.1 Manhattan plots of TWAS results by PrediXcan.........................................44

B.2 Venn diagram of number of TWAS risk genes identified.................................49

B.3 QQ-Plots of TWAS Results.............................................................50

List of Tables

2.1 Independent TWAS risk genes of BC identified by TIGAR-V2.....................19

2.2 Independent TWAS risk genes of OC identified by TIGAR-V2.....................20

3.1 Prediction R^2 results for SR-TWAS vs single cohort models...................36

3.2 Prediction R^2 results for SR-TWAS vs single cohort models...................37

3.3 Pairwise comparison of model prediction R^2 by number of genes...............37

B.1 TWAS risk genes of both BC and OC identified by TIGAR-V2.....................43

B.2 Independent TWAS risk genes of BC identified by PrediXcan....................45

B.3 Independent TWAS risk genes of OC identified by PrediXcan....................46

B.4 TWAS risk genes of both BC and OC identified by PrediXcan....................46

B.5 Total number of TWAS risk genes identified by model, cancer type.............46

B.6 TWAS risk genes identified by multiple models or for multiple cancer types...47

B.7 TWAS risk genes not previously identified in BC, OC GWAS.....................48

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English