Statistical Methods for Omics Data Integration

Jin, Zhuxuan (Fall 2017)


In the first topic, we propose a statistical model that integrates gene expression profiles with a gene network for feature classification. Existing methods do not allow flexible modeling of gene subtypes, and they ignore network nodes without observed expression. To address these limitations, we propose a Bayesian nonparametric method for gene classification. We develop a new prior for the class indicators that incorporates the network dependencies, and we handle gene nodes with missing expression by imputation. In simulations, our method achieves improved classification accuracy. We illustrate the method with a survival analysis of the cutaneous melanoma dataset from The Cancer Genome Atlas (TCGA) and obtain meaningful results.
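The first topic places a network-aware prior on the class indicators. The abstract does not give the form of that prior, so the sketch below uses a standard Potts-type Markov random field prior as a stand-in (an assumption, not the thesis's prior): the full conditional probability that one gene belongs to a class rises with the number of its network neighbors already assigned to that class. The function name and the parameters `alpha` (per-class baseline log-weights) and `beta` (smoothing strength) are hypothetical.

```python
import math


def conditional_class_probs(node, labels, edges, alpha, beta):
    """Full conditional p(z_node = k | z_-node) under a Potts-type MRF prior (stand-in).

    labels: dict mapping each other node to its current class index.
    edges: iterable of undirected (u, v) pairs in the gene network.
    alpha: list of per-class baseline log-weights (hypothetical).
    beta: smoothing strength; beta > 0 favors neighbors sharing a class.
    """
    # Collect the neighbors of `node` from the undirected edge list.
    neighbors = [v for u, v in edges if u == node] + [u for u, v in edges if v == node]
    # Count how many neighbors currently sit in each class.
    counts = [0] * len(alpha)
    for nb in neighbors:
        counts[labels[nb]] += 1
    # Unnormalized log-probabilities, normalized with a max-shifted softmax.
    logits = [a + beta * c for a, c in zip(alpha, counts)]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Toy star network: gene g0 is linked to g1, g2 (class 1) and g3 (class 0).
edges = [("g0", "g1"), ("g0", "g2"), ("g0", "g3")]
labels = {"g1": 1, "g2": 1, "g3": 0}
probs = conditional_class_probs("g0", labels, edges, alpha=[0.0, 0.0], beta=1.0)
# probs is approximately [0.269, 0.731]
```

With `beta = 0` the network is ignored and the probabilities reduce to a softmax of `alpha`; a larger `beta` pulls a gene toward the majority class of its neighbors. In a Gibbs sampler, a conditional of this kind would be evaluated once per gene per sweep.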

In the second topic, we propose a computational method that integrates LC-MS metabolomics data with the metabolic network and adduct ion relations for missing value imputation. Existing methods are mostly borrowed from microarray studies and do not consider feature relations or network information. Our algorithm combines the metabolic network, adduct ion relations, and linear and nonlinear associations between features to build a feature-level network. It then uses support vector regression to impute each missing value from the features in its neighborhood on this network. In simulations based on real data, the method achieves a smaller normalized root mean squared error (NRMSE) than the compared methods.
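The second topic imputes a missing feature from its neighbors on the feature-level network. The sketch below shows that idea and the NRMSE criterion under stated assumptions: ordinary least squares replaces the support vector regression used in the thesis so that the example runs with NumPy alone, and NRMSE is taken as the RMSE over the masked entries divided by their standard deviation (one common definition; the thesis may normalize differently). The function names and the toy data are hypothetical.

```python
import numpy as np


def impute_from_neighbors(X, target, neighbors):
    """Fill missing entries of column `target` by regressing it on its network neighbors.

    X: (n_samples, n_features) array with np.nan marking missing values.
    target: index of the feature (column) to impute.
    neighbors: column indices adjacent to `target` in the feature-level network.
    Ordinary least squares stands in here for support vector regression.
    """
    X = np.array(X, dtype=float)            # work on a copy of the input
    y = X[:, target]
    P = X[:, neighbors]
    predictors_ok = ~np.isnan(P).any(axis=1)
    train = ~np.isnan(y) & predictors_ok    # rows used to fit the regression
    fill = np.isnan(y) & predictors_ok      # rows whose target value is imputed
    # Fit y ~ intercept + neighbors on the fully observed rows.
    A = np.column_stack([np.ones(train.sum()), P[train]])
    coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
    # Predict the missing target values from the observed neighbors.
    B = np.column_stack([np.ones(fill.sum()), P[fill]])
    X[fill, target] = B @ coef
    return X


def nrmse(true_vals, imputed_vals):
    """RMSE over the masked entries divided by the standard deviation of their true values."""
    true_vals = np.asarray(true_vals, dtype=float)
    imputed_vals = np.asarray(imputed_vals, dtype=float)
    rmse = np.sqrt(np.mean((true_vals - imputed_vals) ** 2))
    return rmse / np.std(true_vals)


# Toy data: feature 0 is an exact linear function of its neighbors 1 and 2.
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=20), rng.normal(size=20)
X_full = np.column_stack([2 * x1 - x2 + 1, x1, x2])
X_obs = X_full.copy()
missing = [3, 7, 11]
X_obs[missing, 0] = np.nan                 # mask entries, as in a simulation study
X_imp = impute_from_neighbors(X_obs, target=0, neighbors=[1, 2])
print(nrmse(X_full[missing, 0], X_imp[missing, 0]))  # close to 0: the relation is exactly linear
```

Masking observed values and scoring the imputations against the held-out truth mirrors the real data-based simulation design; with real metabolomics data the neighbor relation is not exactly linear, which is where a nonlinear regressor such as SVR would matter.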

In the third topic, we propose a statistical model that integrates genotypes with brain imaging phenotypes for activation shape estimation and gene discovery in Alzheimer’s disease. There is a lack of statistical methods for the genetic dissection of brain activation phenotypes such as shape and intensity. We propose a two-level Bayesian hierarchical model. At level 1, a Bayesian nonparametric level set model characterizes the activation shape. At level 2, a regression model selects genetic variants that are strongly associated with the activation intensity, using a spike-and-slab prior and a Gaussian process prior for feature selection. We illustrate the advantages of the method through simulations and an analysis of imaging genetics data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI).
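For the level-2 variable selection in the third topic, a spike-and-slab prior either sets a coefficient exactly to zero (spike) or draws it from a diffuse distribution (slab). The sketch below computes the posterior inclusion probability of one variant in the simplest single-predictor Gaussian case with known noise variance. It illustrates the prior only and omits the Gaussian process component and the level set model; the genotype coding, hyperparameter values, and function name are assumptions.

```python
import math


def inclusion_probability(x, y, prior_incl, tau2, sigma2):
    """Posterior P(gamma = 1 | y) for y = x * beta + noise under a spike-and-slab prior.

    gamma = 0 (spike): beta = 0 exactly.
    gamma = 1 (slab):  beta ~ N(0, tau2).
    Noise is i.i.d. N(0, sigma2) with sigma2 treated as known.
    """
    s = sum(xi * yi for xi, yi in zip(x, y))   # x^T y
    n_x = sum(xi * xi for xi in x)             # ||x||^2
    # Log Bayes factor (slab vs. spike) from the two Gaussian marginal likelihoods of y.
    log_bf = (-0.5 * math.log(1.0 + tau2 * n_x / sigma2)
              + 0.5 * tau2 * s * s / (sigma2 * (sigma2 + tau2 * n_x)))
    log_odds = math.log(prior_incl / (1.0 - prior_incl)) + log_bf
    # Numerically stable logistic transform of the posterior log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Hypothetical minor-allele counts for one variant across eight subjects.
x = [0, 1, 2, 1, 0, 2, 1, 1]
y_signal = [2.0 * xi for xi in x]                    # intensity tracks the genotype
y_null = [1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0]  # chosen so that x^T y = 0
p_signal = inclusion_probability(x, y_signal, prior_incl=0.1, tau2=1.0, sigma2=1.0)
p_null = inclusion_probability(x, y_null, prior_incl=0.1, tau2=1.0, sigma2=1.0)
```

Because the Bayes factor depends on the data only through x^T y and ||x||^2, a variant with no association (x^T y = 0) ends with an inclusion probability below its prior value, while a strongly associated variant is pushed toward 1.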

Table of Contents

1 Introduction 

1.1 Omics Data

1.2 Applications

1.2.1 Feature Classification

1.2.2 Missing Values in Omics Data Research

1.2.3 Alzheimer’s Disease

1.3 Omics Data Sources 

1.4 Outline

2 Integrate Gene Expression Profiles with Gene Network for Feature Classification 

2.1 Introduction

2.2 Bayesian Nonparametric Feature Classification

2.2.1 The Model

2.2.2 Prior Specifications

2.2.3 Missing Data Imputation

2.3 Posterior Computation

2.4 Simulation Studies 

2.4.1 Complete Data Cases

2.4.2 Missing Data Cases

2.5 Survival Analysis of Cutaneous Melanoma

2.6 Discussion

3 Integrate the LC-MS Metabolomics Data with Metabolic Network and Adduct Ion Relations for Missing Value Imputation

3.1 Introduction

3.2 Methods

3.2.1 Building the predictor network

3.2.2 The imputation procedure

3.2.3 Performance Comparison

3.3 Results

3.3.1 Datasets and Simulation Setup

3.3.2 Computation

3.3.3 Simulation Results

3.4 Discussion and Conclusion

4 Integrate Genotypes with Imaging Phenotypes for Shape Analysis and Gene Discovery for Alzheimer’s Disease

4.1 Introduction

4.2 The Model

4.2.1 Two-Level Model 

4.2.2 Prior Specifications

4.2.3 Model Representation

4.2.4 Posterior Computation

4.2.5 Non-sparse Bayesian Variable Selection Model

4.3 Simulation Studies

Single Subject with 2D Image and No Variable Selection

Multi-subjects with 3D Image and No Variable Selection

Multi-subjects with 3D Image and Variable Selection

4.4 Real Data Application

4.5 Conclusion and Discussion

A Appendix for Chapter 2 

B Appendix for Chapter 3 

C Appendix for Chapter 4

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
  • Language: English
