Background: As high-throughput technologies such as microarrays and next-generation sequencing have become essential in biological studies, the number of publicly accessible datasets has increased dramatically. Next-generation sequencing technologies lead to the ‘large p, small n’ problem, in which far more genes are measured than there are samples; we focus on the extreme case in which there are no replicates when detecting differentially expressed genes. Hierarchical models are well suited to combining historical data with current studies.
Methods: The key idea of our method is to borrow information from highly correlated “relative” genes when conducting inference on a single gene. We use historical data to identify the correlation structure and to specify an informative prior distribution, and then perform Bayesian inference under this prior. We use the resulting posterior distribution both to make statistical inference and to rank genes by their posterior probability of being differentially expressed.
Results: In simulation studies, our proposed strategy makes accurate and robust inference on gene expression levels. It also outperforms GFOLD in detecting differentially expressed genes, with a lower false discovery rate and a larger area under the receiver operating characteristic (ROC) curve.
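In a simulation the true status of every gene is known, so both evaluation criteria can be computed directly from a vector of posterior probabilities. This is a minimal sketch, assuming `labels` marks truly differentially expressed genes with 1 and null genes with 0. The function names and the fixed calling threshold are illustrative and not taken from the thesis. AUC is computed through its Mann-Whitney interpretation: the probability that a randomly chosen differentially expressed gene scores above a randomly chosen null gene, with ties counted as one half.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def empirical_fdr(scores, labels, threshold):
    """Fraction of genes called differentially expressed (score >= threshold) that are truly null."""
    called = [y for s, y in zip(scores, labels) if s >= threshold]
    if not called:
        return 0.0  # no discoveries, so no false discoveries
    return sum(1 for y in called if not y) / len(called)
```

The pairwise loop costs O(n_pos × n_neg), which is fine for simulation-sized gene sets; a rank-based formula scales better for genome-wide data.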
Conclusion: We illustrated the feasibility and effectiveness of using informative priors from historical data to help detect differentially expressed genes when there are no replicates.
Table of Contents
1. Introduction
2. Method
2.1 Motivation
2.2 Model
2.3 Statistical Inference and Testing
3. Results
3.1 Statistical Inference using Gibbs Sampler
3.2 Differentially Expressed Genes Detection
4. Discussion
5. Conclusion
About this Master's Thesis
2019-04-09
File download under embargo until 20 May 2020