Novel Model-based Methods for High-throughput Genomics Data Analysis
Li, Ben (Fall 2017)
Abstract
In this dissertation, I propose three model-based methods for improving genomics data analysis by utilizing existing external datasets (“Historical Data”).
In the first topic, I propose a Bayesian inference framework with historical data-based informative priors to improve detection of differentially expressed (DE) genes. To evaluate the feasibility and effectiveness of this framework, I apply a normal-inverse-chi-square model to gene expression microarray data and calculate Bayes factors (BF) to rank the top DE genes. Extensive real data-based simulations and real data analyses illustrate the advantages of the proposed method.
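As a rough illustration of how an informative prior can enter such a model, the sketch below applies the textbook conjugate update for a normal-inverse-chi-square prior to one gene's measurements. It is not the dissertation's IPBT procedure: the class `NIX2`, the function `update`, and the idea of setting the prior hyperparameters from a gene's historical mean and variance are assumptions made for illustration, and the Bayes factor computation is omitted.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class NIX2:
    """Normal-inverse-chi-square parameters (hypothetical container).

    mu | sigma^2 ~ Normal(mu, sigma^2 / kappa)
    sigma^2      ~ Scaled-Inv-chi^2(nu, sigma2)
    """
    mu: float      # prior mean, e.g. the gene's historical mean (assumption)
    kappa: float   # prior "sample size" behind the mean
    nu: float      # prior degrees of freedom behind the variance
    sigma2: float  # prior variance scale, e.g. the historical variance (assumption)


def update(prior: NIX2, y: list[float]) -> NIX2:
    """Standard conjugate posterior update given new observations y."""
    n = len(y)
    ybar = fmean(y)
    ss = sum((v - ybar) ** 2 for v in y)  # within-sample sum of squares
    kappa_n = prior.kappa + n
    mu_n = (prior.kappa * prior.mu + n * ybar) / kappa_n
    nu_n = prior.nu + n
    sigma2_n = (
        prior.nu * prior.sigma2
        + ss
        + (prior.kappa * n / kappa_n) * (ybar - prior.mu) ** 2
    ) / nu_n
    return NIX2(mu_n, kappa_n, nu_n, sigma2_n)


# Usage: a weak prior combined with two new measurements for one gene.
posterior = update(NIX2(mu=0.0, kappa=1.0, nu=1.0, sigma2=1.0), [1.0, 3.0])
```

With few replicates, the posterior variance scale is pulled toward the gene's own prior value rather than toward a value shared by all genes, which is one way to picture why gene-specific historical priors might help.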
In the second topic, I propose rank-based strategies for incorporating historical information into the analysis of new experimental datasets. Ranks from historical data are used to determine groups or windows for the new experimental datasets. I also propose a group dividing metric (GDM) to determine the optimal number of groups or the window size. Through real data-based simulations and real data analyses, I demonstrate that the proposed strategies can be easily applied to gene expression microarray data and methylation array data. I also show the method's potential for borrowing information across platforms by applying the new strategies to BS-Seq data.
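The grouping idea can be sketched as follows, under simplifying assumptions: genes are ranked by a historical standard deviation, split into `k` near-equal groups, and the new-data variances are averaged within each group. The function names `rank_groups` and `pooled_variance_by_group` are hypothetical, plain averaging stands in for the hierarchical model, and the GDM used to choose `k` is not reproduced here.

```python
from statistics import fmean, variance


def rank_groups(hist_sd: list[float], k: int) -> list[int]:
    """Assign each gene a group id in 0..k-1 by the rank of its historical SD."""
    n = len(hist_sd)
    order = sorted(range(n), key=lambda g: hist_sd[g])  # gene indices, low SD first
    groups = [0] * n
    for rank, gene in enumerate(order):
        groups[gene] = rank * k // n
    return groups


def pooled_variance_by_group(
    new_data: list[list[float]], groups: list[int]
) -> dict[int, float]:
    """Average the per-gene sample variances of the new data within each group."""
    per_gene = [variance(row) for row in new_data]
    return {
        gid: fmean([v for v, g in zip(per_gene, groups) if g == gid])
        for gid in sorted(set(groups))
    }


# Usage: four genes, two groups defined by historical SD ranks.
groups = rank_groups([0.5, 2.0, 0.1, 1.0], k=2)
pooled = pooled_variance_by_group(
    [[1, 2, 3], [0, 4, 8], [1, 1, 1], [2, 4, 6]], groups
)
```

Only the ranks of the historical values are used, not their magnitudes, which is what would let information be borrowed from a platform whose measurement scale differs from that of the new experiment.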
In the third topic, I propose a two-step strategy to summarize and borrow information from historical data through “gene panels”. In the first step, I use a penalized EM algorithm to define gene panels, which summarize information about a target gene, from historical data. In the second step, incorporating gene panels allows tasks to be accomplished with better accuracy and makes previously impossible tasks feasible. Through simulation studies and real data examples, I demonstrate that gene panels improve the detection of DE genes, especially when extremely few or no replicates are available.
Table of Contents
Introduction
1.1 Overview
1.2 Literature Review
1.2.1 Gene Expression
1.2.2 DNA methylation
1.2.3 Hierarchical Models
1.3 Outline
Bayesian inference with historical data-based informative priors improves detection of differentially expressed genes
2.1 Methods
2.1.1 Motivation
2.1.2 Informative prior Bayesian test (IPBT)
2.1.3 Inference and Testing
2.1.4 Informative Priors
2.2 Simulation Study
2.2.1 Simulation Study I: Alleviation of Over-shrinkage
2.2.2 Simulation Study II: DE Gene Detection Performances
2.2.3 Simulation Study III: Impact of Inaccurate Historical Data
2.3 Real Data Analysis
2.3.1 Real Data Study I: Global Gene Expression Map
2.3.2 Real Data Study II: Latin Square Hgu133a Spike-in Experiment Data
2.4 Discussion and Conclusion
Improving hierarchical models using rank information from historical data with applications in high throughput genomics data analysis
3.1 Methods
3.1.1 Motivation
3.1.2 stHM and swHM
3.2 Simulation Study
3.2.1 Simulation Study I: SD Estimate and Group Dividing
3.2.2 Simulation Study II: DE Gene Detection Performances
3.3 Real Data Analysis
3.3.1 Real Data Study I: Global Gene Expression Map
3.3.2 Real Data Study II: DNA Methylation Data
3.4 Discussion and Conclusion
3.5 Appendices
Using historical data inferred gene panels to improve statistical inference on high throughput genomics data
4.1 Methods
4.1.1 Motivation
4.1.2 Overview of IPBTSeq
4.1.3 Identify gene panels
4.1.4 Distance and Imputation Score
4.2 Simulation Study
4.2.1 Validation of gene panels
4.2.2 Detect DE Genes
4.3 Real Data Analysis
4.3.1 Landscape for gene panels
4.3.2 Detect DE Genes
4.4 Discussion and Conclusion
Summary and Future Work
Bibliography
About this Dissertation
| School | |
|---|---|
| Department | |
| Degree | |
| Submission | |
| Language | |
| Research Field | |
| Keyword | |
| Committee Chair / Thesis Advisor | |
| Committee Members | |
Primary PDF
| Title | Date Uploaded |
|---|---|
| Novel Model-based Methods for High-throughput Genomics Data Analysis | 2017-10-03 15:58:07 -0400 |