High-dimensional Universal Dependence Discovery

Peng, Hesen (2012)

Permanent URL: https://etd.library.emory.edu/concern/etds/pv63g1088?locale=pt-BR
Published

Abstract

The emergence of high-throughput data in biological science and computer networks has generated novel challenges for statistical methods. Nonlinear relationships and multivariate interactions are abundant. The sheer volume of high-throughput data limits the applicability of traditional case-by-case analysis methods, whose model assumptions, such as linearity, often do not hold in high-throughput settings.

To meet these challenges, we developed the Mira score, a novel probabilistic association statistic that accounts for high-dimensional universal dependence. The Mira score is defined as a function of the observation graph, and thus circumvents the curse of dimensionality in high-dimensional data. The favorable statistical properties of the Mira score led us to develop an efficient network reverse-engineering procedure for multivariate dependence. As an example, the procedure has been applied to celiac disease and lung cancer pathway interaction analysis, where it yielded interesting findings.

Furthermore, in the supervised machine learning setting, we propose the SeMira procedure, an efficient variable selection procedure that accounts for high-dimensional universal dependence. The SeMira procedure is capable of identifying universal probabilistic association between multivariate response variables and high-dimensional predictors. The desirable statistical properties of the SeMira procedure are discussed, and numerical studies are conducted using both simulated and real genetic pathway data.

Table of Contents

1 High-dimensional Universal Dependence Statistic
1.1 Introduction
1.2 Theory
1.2.1 Permutation test of association
1.3 Numerical Study
1.3.1 Comparison with linear regression test of significance
1.3.2 High-dimensional Power comparison
1.3.3 Specificity study under null hypothesis
1.4 Differential pathway interaction network discovery
1.4.1 Celiac Disease Pathway Interaction
1.4.2 Lung Cancer Pathway Interaction
1.5 Discussion
2 High-dimensional Universal Dependence Variable Selection
2.1 Introduction
2.2 SeMira procedure for variable selection
2.2.1 Geometry of Minimum Mira score Estimate
2.2.2 Parameter tuning
2.3 Numerical study
2.3.1 SeMira procedure performance
2.3.2 SeMira performance with known s
2.4 Clinical outcome pathway interaction analysis
2.5 Discussion
3 Quantification and Deconvolution Of Asymmetric LC-MS Peaks Using The Bi-Gaussian Mixture Model And Statistical Model Selection
3.1 Introduction
3.2 Methods
3.2.1 The bi-Gaussian peak model
3.2.2 Likelihood-based estimation method
3.3 Numerical Simulation
3.4 Results
3.5 Discussion
4 Summary
Appendices

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English