High-dimensional Universal Dependence Discovery
Peng, Hesen (2012)
Abstract
The emergence of high-throughput data in biological science and computer networks has created novel challenges for statistical methods. Nonlinear relationships and multivariate interactions are abundant. The sheer volume of high-throughput data limits the applicability of traditional case-by-case analysis methods, whose model assumptions, such as linearity, are often not supported in high-throughput scenarios.
To meet these challenges, we developed the Mira score, a novel probabilistic association statistic that accounts for high-dimensional universal dependence. The Mira score is defined as a function of the observation graph and thus circumvents the curse of dimensionality in high-dimensional data. Its favorable statistical properties led us to develop an efficient network reverse-engineering procedure for multivariate dependence. As an example, we applied the procedure to pathway interaction analysis in celiac disease and lung cancer, where it yielded interesting findings.
Furthermore, for the supervised machine learning setting, we proposed the SeMira procedure, an efficient variable selection procedure that accounts for high-dimensional universal dependence. The SeMira procedure can identify universal probabilistic association between multivariate response variables and high-dimensional predictors. We discuss the statistical properties of the SeMira procedure and conduct numerical studies using both simulated and real genetic pathway data.
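The abstract does not give the definition of the Mira score, so the following is only a rough sketch of the general idea of testing dependence with a statistic computed on a graph over the observations, combined with a permutation test of association (the topic of Section 1.2.1). The statistic used here, the mean log nearest-neighbor distance among the joint observations, is a stand-in chosen for illustration and is an assumption, not the author's definition. The function names `nn_graph_statistic` and `permutation_pvalue` are hypothetical.

```python
import numpy as np


def nn_graph_statistic(x, y):
    """Mean log nearest-neighbor distance among joint observations (x_i, y_i).

    Stand-in statistic for illustration only (NOT the Mira score itself).
    x and y are 2-D arrays with one row per observation.
    """
    z = np.hstack([x, y]).astype(float)
    # Put every coordinate on a common scale (assumes no constant column).
    z = (z - z.mean(axis=0)) / z.std(axis=0)
    # Pairwise Euclidean distances between all observations.
    diff = z[:, None, :] - z[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # an observation is not its own neighbor
    nearest = dist.min(axis=1)
    return float(np.mean(np.log(nearest + 1e-12)))


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Permutation test: shuffling the rows of y breaks any x-y association.

    Dependence concentrates the joint observations, which shrinks
    nearest-neighbor distances, so small values of the statistic are
    treated as evidence of association.
    """
    rng = np.random.default_rng(seed)
    observed = nn_graph_statistic(x, y)
    null = np.array([
        nn_graph_statistic(x, y[rng.permutation(len(y))])
        for _ in range(n_perm)
    ])
    return float((1 + np.sum(null <= observed)) / (1 + n_perm))


# Usage: a nonlinear (quadratic) relationship with little linear correlation.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(200, 1))
y = x ** 2 + 0.02 * rng.normal(size=(200, 1))
p_value = permutation_pvalue(x, y)
```

Because the statistic depends only on distances between observations, the same code applies unchanged when `x` and `y` have several columns each, which is the sense in which a graph-based statistic avoids modeling the joint density directly.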
Table of Contents
1 High-dimensional Universal Dependence Statistic
1.1 Introduction
1.2 Theory
1.2.1 Permutation test of association
1.3 Numerical Study
1.3.1 Comparison with linear regression test of significance
1.3.2 High-dimensional Power comparison
1.3.3 Specificity study under null hypothesis
1.4 Differential pathway interaction network discovery
1.4.1 Celiac Disease Pathway Interaction
1.4.2 Lung Cancer Pathway Interaction
1.5 Discussion
2 High-dimensional Universal Dependence Variable Selection
2.1 Introduction
2.2 SeMira procedure for variable selection
2.2.1 Geometry of Minimum Mira score Estimate
2.2.2 Parameter tuning
2.3 Numerical study
2.3.1 SeMira procedure performance
2.3.2 SeMira performance with known s
2.4 Clinical outcome pathway interaction analysis
2.5 Discussion
3 Quantification and Deconvolution of Asymmetric LC-MS Peaks Using the Bi-Gaussian Mixture Model and Statistical Model Selection
3.1 Introduction
3.2 Methods
3.2.1 The bi-Gaussian peak model
3.2.2 Likelihood-based estimation method
3.3 Numerical Simulation
3.4 Results
3.5 Discussion
4 Summary
Appendices
Primary PDF
Title | Date Uploaded |
---|---|
High-dimensional Universal Dependence Discovery | 2018-08-28 16:14:01 -0400 |