Novel Model-based Methods for High-throughput Genomics Data Analysis Open Access

Li, Ben (Fall 2017)


In this dissertation, I propose three model-based methods for improving genomics data analysis by utilizing existing external datasets (“Historical Data”).

In the first topic, I propose a Bayesian inference framework that uses informative priors derived from historical data to improve the detection of differentially expressed (DE) genes. To evaluate the feasibility and effectiveness of this framework, I apply a normal-inverse-chi-square model to gene expression microarray data and calculate Bayes factors (BF) to rank the top DE genes. Extensive real data-based simulations and real data analyses illustrate the advantages of the proposed method.
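
As a rough illustration of how a historical data-based prior can enter such a model, the sketch below applies the textbook conjugate update for a normal-inverse-chi-square prior on one gene's mean and variance. It is not the dissertation's implementation: the function name `nix_posterior` and the parameter names (`mu0`, `kappa0`, `nu0`, `sigma0_sq`) are assumptions chosen for this example, and the prior values would in practice be estimated from historical samples.

```python
def nix_posterior(x, mu0, kappa0, nu0, sigma0_sq):
    """Textbook conjugate update for a normal-inverse-chi-square prior.

    x          : new expression values for one gene
    mu0        : prior mean (hypothetical: estimated from historical data)
    kappa0     : prior pseudo-sample size for the mean
    nu0        : prior degrees of freedom for the variance
    sigma0_sq  : prior variance scale (hypothetical: historical variance)
    """
    n = len(x)
    xbar = sum(x) / n
    # Sum of squared deviations around the sample mean
    ss = sum((xi - xbar) ** 2 for xi in x)

    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    nu_n = nu0 + n
    # Prior scale + data scatter + penalty for prior/sample mean disagreement
    sigma_n_sq = (
        nu0 * sigma0_sq
        + ss
        + (n * kappa0 / kappa_n) * (mu0 - xbar) ** 2
    ) / nu_n
    return mu_n, kappa_n, nu_n, sigma_n_sq


# Example: three new measurements combined with a weak prior
mu_n, kappa_n, nu_n, sigma_n_sq = nix_posterior(
    [1.0, 2.0, 3.0], mu0=0.0, kappa0=1.0, nu0=1.0, sigma0_sq=1.0
)
```

With few new replicates, the posterior variance scale leans on `sigma0_sq`, which is the general sense in which an informative prior stabilizes per-gene variance estimates.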

In the second topic, I propose rank-based strategies for incorporating historical information into new experimental datasets. Ranks from historical data are used to determine groups or windows for new experimental datasets. I also propose a group dividing metric (GDM) to determine the optimal number of groups or size of windows. Through real data-based simulations and real data analysis, I demonstrate that the proposed strategies can be easily applied to gene expression microarray data and methylation array data. I also show the method's potential for borrowing information across platforms by applying the new strategies to BS-Seq data.
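
The rank-based grouping idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's stHM or swHM procedure: genes are ranked by a historical summary (here, a hypothetical per-gene historical standard deviation), split into equal-sized rank groups, and a quantity from the new experiment is then pooled within each group. The function names and the simple within-group averaging are choices made for this example only, and the number of groups is fixed by hand rather than chosen by the GDM.

```python
def assign_groups_by_rank(historical_values, n_groups):
    """Assign each gene to a group based on its rank in historical data."""
    n = len(historical_values)
    # Gene indices ordered from smallest to largest historical value
    order = sorted(range(n), key=lambda i: historical_values[i])
    groups = [0] * n
    for rank, gene in enumerate(order):
        # Equal-sized rank bins: ranks 0..n-1 map to groups 0..n_groups-1
        groups[gene] = rank * n_groups // n
    return groups


def pooled_group_means(new_values, groups, n_groups):
    """Average a new-experiment quantity within each historical rank group."""
    sums = [0.0] * n_groups
    counts = [0] * n_groups
    for value, g in zip(new_values, groups):
        sums[g] += value
        counts[g] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Example: four genes, two groups defined by historical SD ranks
historical_sd = [0.5, 0.1, 0.9, 0.3]
new_variance = [2.0, 1.0, 4.0, 3.0]
groups = assign_groups_by_rank(historical_sd, n_groups=2)
pooled = pooled_group_means(new_variance, groups, n_groups=2)
```

Because only ranks are taken from the historical data, the grouping does not depend on the historical and new measurements sharing a scale, which is consistent with the cross-platform use described above.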

In the third topic, I propose a two-step strategy to summarize and borrow information from historical data through “gene panels”. In the first step, I use a penalized EM algorithm to define gene panels, which summarize information about a target gene, from historical data. In the second step, incorporating the gene panels allows existing tasks to be accomplished with better accuracy and makes some previously impossible tasks feasible. Through simulation studies and real data examples, I demonstrate that the use of gene panels improves the detection of DE genes, especially when extremely few or no replicates are available.

Table of Contents

Introduction

1.1 Overview

1.2 Literature Review

1.2.1 Gene Expression

1.2.2 DNA methylation

1.2.3 Hierarchical Models

1.3 Outline

Bayesian inference with historical data-based informative priors improves detection of differentially expressed genes

2.1 Methods

2.1.1 Motivation

2.1.2 Informative prior Bayesian test (IPBT)

2.1.3 Inference and Testing

2.1.4 Informative Priors

2.2 Simulation Study

2.2.1 Simulation Study I: Alleviation of Over-shrinkage

2.2.2 Simulation Study II: DE Gene Detection Performances

2.2.3 Simulation Study III: Impact of Inaccurate Historical Data

2.3 Real Data Analysis

2.3.1 Real Data Study I: Global Gene Expression Map

2.3.2 Real Data Study II: Latin Square Hgu133a Spike-in Experiment Data

2.4 Discussion and Conclusion

Improving hierarchical models using rank information from historical data with applications in high throughput genomics data analysis

3.1 Methods

3.1.1 Motivation

3.1.2 stHM and swHM

3.2 Simulation Study

3.2.1 Simulation Study I: SD Estimate and Group Dividing

3.2.2 Simulation Study II: DE Gene Detection Performances

3.3 Real Data Analysis

3.3.1 Real Data Study I: Global Gene Expression Map

3.3.2 Real Data Study II: DNA Methylation Data

3.4 Discussion and Conclusion

3.5 Appendices

Using historical data inferred gene panels to improve statistical inference on high throughput genomics data

4.1 Methods

4.1.1 Motivation

4.1.2 Overview of IPBTSeq

4.1.3 Identify gene panels

4.1.4 Distance and Imputation Score

4.2 Simulation Study

4.2.1 Validation of gene panels

4.2.2 Detect DE Genes

4.3 Real Data Analysis

4.3.1 Landscape for gene panels

4.3.2 Detect DE Genes

4.4 Discussion and Conclusion

Summary and Future Work

Bibliography



About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
  • English