Robust Statistical Methods for Multiplex Classification in Disease Diagnosis
Restricted: Files & ToC
Chen, Yuxuan (Fall 2025)
Abstract
Multiple biomarkers are often combined to improve diagnostic accuracy. Developing such a combination comprises two critical stages: learning a predictive model and evaluating the performance of the resulting model. For model learning, many statistical and machine-learning approaches have been developed, yet issues of computation, statistical inference, or clinical relevance remain outstanding. For performance estimation, apparent estimates (those computed on the training data itself) are known to be overly optimistic, and common remedies such as cross-validation and the bootstrap have limitations of their own. This dissertation introduces robust statistical methodologies for both model learning and performance evaluation through three interrelated projects.
The first project combines biomarkers by empirically maximizing the area under the receiver operating characteristic curve (AUC), with the aim of addressing both computation and inference. Two challenges exist. First, the empirical AUC is piecewise constant, rendering gradient methods ineffective. Second, the AUC is invariant to the scale of the coefficients, complicating computation and asymptotics. A novel and effective algorithm is proposed for both point estimation and variance estimation of the combination coefficients. Simulations demonstrate strong computational and statistical performance, and a clinical application illustrates the practical utility.
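The two challenges named above can be made concrete with a minimal sketch. This is not the dissertation's algorithm; the helper name `empirical_auc` and the toy data are assumptions chosen for illustration. It computes the empirical AUC of a linear combination and shows that the value is unchanged both when the coefficients are rescaled and when they are perturbed slightly.

```python
import numpy as np

def empirical_auc(beta, x_cases, x_controls):
    """Empirical AUC of the linear score x @ beta (hypothetical helper).

    This is the fraction of (case, control) pairs in which the case has
    the higher score, with ties counted as one half.
    """
    s_cases = x_cases @ beta
    s_controls = x_controls @ beta
    diff = s_cases[:, None] - s_controls[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Toy data, assumed for illustration: two biomarkers, 3 cases, 3 controls.
x_cases = np.array([[2.0, 1.0], [1.5, 0.5], [0.5, 1.0]])
x_controls = np.array([[1.0, 1.0], [0.0, 0.5], [1.0, -0.5]])

beta = np.array([1.0, 0.5])
auc = empirical_auc(beta, x_cases, x_controls)

# Scale invariance: multiplying beta by a positive constant preserves
# the ordering of the scores, so the AUC is identical.
auc_scaled = empirical_auc(10.0 * beta, x_cases, x_controls)

# Piecewise constancy: a tiny perturbation changes no pairwise ordering,
# so a finite-difference gradient is exactly zero here.
auc_nudged = empirical_auc(beta + 1e-6, x_cases, x_controls)

print(auc, auc_scaled, auc_nudged)
```

Because the objective only changes when some case-control pair swaps order, a gradient-based optimizer sees zero slope almost everywhere, and the coefficients are identifiable only up to a positive scale factor.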
The second project focuses on simultaneous performance evaluation, that is, performance evaluation using the training data itself for the biomarker combination derived from empirical AUC maximization. A higher-order asymptotic analysis of the apparent empirical AUC is conducted to theoretically understand the over-optimism. The result motivates a sample-based optimism correction method. Interval estimation procedures are also devised. Simulation studies demonstrate that the proposed methods outperform existing methods both computationally and statistically.
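The over-optimism of the apparent AUC can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not the proposed correction method: the biomarkers are pure noise, so every linear combination has a true AUC of exactly 0.5, and a crude grid search over directions stands in for a real optimizer. The seed, sample sizes, and grid resolution are arbitrary choices.

```python
import numpy as np

def empirical_auc(scores_cases, scores_controls):
    """Fraction of (case, control) pairs ranked correctly; ties count 1/2."""
    diff = scores_cases[:, None] - scores_controls[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n_cases, n_controls, n_markers = 30, 30, 2

# Pure-noise biomarkers: cases and controls share the same distribution,
# so the true AUC of any linear combination is exactly 0.5.
x_cases = rng.standard_normal((n_cases, n_markers))
x_controls = rng.standard_normal((n_controls, n_markers))

# Crude grid search over unit directions (a stand-in for an optimizer).
# Each direction b is paired with -b, whose AUC is 1 - AUC(b).
angles = np.linspace(0.0, np.pi, 180, endpoint=False)
half = np.column_stack([np.cos(angles), np.sin(angles)])
directions = np.vstack([half, -half])

# Apparent AUC: the maximized empirical AUC, evaluated on the same data
# that was used to choose the direction.
apparent_auc = max(
    empirical_auc(x_cases @ b, x_controls @ b) for b in directions
)
true_auc = 0.5
optimism = apparent_auc - true_auc

print(f"apparent AUC = {apparent_auc:.3f}, optimism = {optimism:.3f}")
```

Since the search includes both each direction and its negative, the apparent AUC can never fall below 0.5 in this setup, while the true AUC is 0.5 by construction; any excess is optimism introduced by reusing the training data.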
The third project shifts the focus to a linear classifier that maximizes a weighted average of sensitivity and specificity (WSS). With an appropriate choice of weight, WSS can be tailored to reflect specific clinical priorities. We develop a classifier estimation method by maximizing the smoothed empirical WSS. Building on the asymptotic properties of the estimator and a higher-order asymptotic analysis, we propose a novel, computationally efficient bias correction and interval estimation procedure for simultaneous performance evaluation. Simulations show improved performance estimation compared to existing methods, and a cancer-screening case study illustrates the practical application.
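As a rough illustration of the WSS objective, the sketch below computes the empirical WSS of a linear classifier and a smoothed counterpart. The abstract does not specify the smoothing, so the logistic (sigmoid) surrogate, the `bandwidth` parameter, the function names, and the toy data are all assumptions made for this example.

```python
import numpy as np

def wss(beta, threshold, x_cases, x_controls, weight):
    """Empirical WSS: weight * sensitivity + (1 - weight) * specificity.

    A subject is classified positive when x @ beta > threshold.
    """
    sens = np.mean(x_cases @ beta > threshold)
    spec = np.mean(x_controls @ beta <= threshold)
    return float(weight * sens + (1.0 - weight) * spec)

def smoothed_wss(beta, threshold, x_cases, x_controls, weight, bandwidth):
    """Smoothed WSS with indicators replaced by a sigmoid (assumed choice)."""
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))
    sens = np.mean(sigmoid((x_cases @ beta - threshold) / bandwidth))
    spec = np.mean(sigmoid((threshold - x_controls @ beta) / bandwidth))
    return float(weight * sens + (1.0 - weight) * spec)

# Toy data, assumed for illustration: two biomarkers, 3 cases, 3 controls.
x_cases = np.array([[2.0, 1.0], [1.5, 0.5], [0.5, 1.0]])
x_controls = np.array([[1.0, 1.0], [0.0, 0.5], [1.0, -0.5]])

beta = np.array([1.0, 0.5])
threshold = 0.9
weight = 0.7  # weights sensitivity more heavily, e.g. for screening

exact = wss(beta, threshold, x_cases, x_controls, weight)
smooth = smoothed_wss(beta, threshold, x_cases, x_controls, weight,
                      bandwidth=0.01)

print(exact, smooth)
```

Unlike the indicator-based version, the smoothed objective is differentiable in `beta` and `threshold`, and it approaches the empirical WSS as the bandwidth shrinks.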
Table of Contents
This table of contents is under embargo until 12 January 2028
About this Dissertation
| School | |
|---|---|
| Department | |
| Subfield / Discipline | |
| Degree | |
| Submission | |
| Language | |
| Research Field | |
| Keyword | |
| Committee Chair / Thesis Advisor | |
| Committee Members | |
Primary PDF
| Thumbnail | Title | Date Uploaded | Actions |
|---|---|---|---|
| | File download under embargo until 12 January 2028 | 2025-12-01 03:02:39 -0500 | File download under embargo until 12 January 2028 |
Supplemental Files
| Thumbnail | Title | Date Uploaded | Actions |
|---|---|---|---|