Bayesian Feature Selection Methods for Complex Biomedical Data
Open Access

Zhao, Yize (2014)

Permanent URL: https://etd.library.emory.edu/concern/etds/cj82k7573?locale=en
Published

Abstract

Motivated by three different biomedical studies, this dissertation investigates novel Bayesian feature selection methods to analyze complex biomedical data.

In the first project, motivated by a colorectal cancer study, we propose a unified Bayesian approach for hierarchical feature selection of structured functional predictors in Generalized Functional Linear Models (GFLMs). Feature selection here is inherently hierarchical, involving selection of functional predictors and selection of regions within them. To achieve hierarchical feature selection, we construct a class of mixture priors for functional coefficients based on Gaussian processes. In addition, we use Ising priors on the model space to incorporate hierarchical structural information. Applying our approach to the motivating study, we find that one functional biomarker and its expression level in the transitional region between the proliferation and differentiation zones are associated with the risk of colorectal cancer.

In the second project, motivated by the Autism Brain Imaging Data Exchange (ABIDE) study, we aim to identify important biomarkers for early detection of autism spectrum disorder (ASD) from high-resolution brain imaging data. We propose a novel multiresolution variable selection procedure under a Bayesian probit regression framework that recursively uses posterior samples for variable selection at a lower resolution to guide variable selection at a higher resolution. The proposed algorithms are computationally feasible for ultra-high dimensional data. In addition, we incorporate two levels of structural information into variable selection. Applied to the resting-state functional magnetic resonance imaging (R-fMRI) data in the ABIDE study, our methods identify imaging biomarkers predictive of ASD in several brain regions that are biologically meaningful and interpretable.


Finally, with the goal of selecting genes and gene subnetworks with periodic behavior in a microarray dataset, we propose a nonparametric Bayesian model incorporating network information. In addition to identifying genes that have a strong association with a clinical outcome, our model can select genes with particular expression behavior. We show that our proposed model is equivalent to an infinite mixture model, for which we develop a posterior computation algorithm. We also propose two fast computing algorithms that approximate the posterior simulation with good gene selection accuracy at low computational cost.

Table of Contents

1 Introduction
1.1 Background
1.1.1 A Colorectal Adenoma Study
1.1.2 Autism Brain Imaging Data Exchange (ABIDE)
1.1.3 Spellman Yeast Cell Cycle Microarray Data
1.2 Literature Review
1.2.1 Variable Selection in High-Dimensional Feature Space
1.2.2 Incorporating Biological Information
1.3 Outline
2 Bayesian Hierarchical Feature Selection of Structured Functional Predictors Measured with Error
2.1 Introduction
2.1.1 Functional Data Analysis
2.1.2 Feature Selection in GFLMs
2.2 Model Formulation
2.2.1 Basic Structure
2.2.2 Priors for Functional Coefficients: Feature Selection
2.2.3 Hyperpriors: Incorporating Structural Information
2.2.4 Model Approximation
2.3 Posterior Inference
2.3.1 Posterior inference for C under SBPM
2.3.2 Posterior Inference for C and γ under HFSM
2.4 Simulation Studies
2.5 Application to the Colorectal Adenoma Data
2.6 Discussion
2.7 Appendix
2.7.1 Marginalizing the Likelihood with respect to Θ3, Θ1, β and α
2.7.2 MCMC Algorithm
3 Bayesian Spatial Variable Selection for Ultra-High Dimensional Neuroimaging Data: A Multiresolution Approach
3.1 Introduction
3.1.1 Variable Selection in Ultra-high dimensionality
3.1.2 Multiresolution Approach
3.2 Model Formulation
3.2.1 A Probit Regression Model for Variable Selection
3.2.2 Prior Specifications
3.2.3 Standard Posterior Computation
3.3 Multiresolution Approach
3.3.1 Partition and Auxiliary Models
3.3.2 Sequential Resolution Sampling
3.3.3 Fast Sequential Resolution Sampling
3.4 Application
3.5 Simulation Studies
3.5.1 Simulation 1
3.5.2 Simulation 2
3.5.3 Simulation 3
3.6 Discussion
3.7 Appendix
3.7.1 Standard Posterior Computation Algorithm
3.7.2 SRS-MCMC Algorithm
3.7.3 fastSRS-MCMC Algorithm
4 A Bayesian nonparametric mixture model for selecting genes and gene sub-networks
4.1 Introduction
4.2 The Model
4.2.1 A network based DPM model for gene selection
4.2.2 Model Representations
4.2.3 Posterior Computation
4.2.4 Fast Computation Algorithms
4.2.5 The choice of hyper-parameters
4.3 Application
4.4 Simulation Studies
4.4.1 Simulation 1
4.4.2 Simulation 2
4.5 Discussion
4.6 Appendix
4.6.1 Derivations
4.6.2 Algorithms
4.6.3 Hyper-parameters
4.6.4 Sensitivity Analysis
5 Summary and future research

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English