Robust Statistical Methods for Multiplex Classification in Disease Diagnosis
Restricted: Files & ToC

Chen, Yuxuan (Fall 2025)

Permanent URL: https://etd.library.emory.edu/concern/etds/pc289k71s?locale=en
Published

Abstract

Multiple biomarkers are often combined to improve diagnostic accuracy. Developing such a combination comprises two critical stages: learning a predictive model and evaluating the performance of the resulting model. For model learning, many statistical and machine-learning approaches have been developed; nevertheless, various issues regarding computation, statistical inference, and/or clinical relevance remain outstanding. For performance estimation, apparent estimates (those computed on the same data used to fit the model) are known to be overly optimistic, and common remedies such as cross-validation and the bootstrap have limitations of their own. This dissertation introduces robust statistical methodologies for both model learning and performance evaluation through three interrelated projects.

The first project considers combining biomarkers through empirical maximization of the area under the receiver operating characteristic curve (AUC), with a focus on computation and inference. Two challenges exist. First, the empirical AUC is piecewise constant in the combination coefficients, rendering gradient-based methods ineffective. Second, the AUC is invariant to the scale of the coefficients, complicating both computation and asymptotics. A novel and effective algorithm is proposed for both point estimation and variance estimation of the combination coefficients. Simulations demonstrate strong computational and statistical performance, and a clinical application illustrates the practical utility.
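
The two challenges can be seen on a small example. The sketch below is an illustration only, not the algorithm proposed in the dissertation: the function name `empirical_auc` and the toy data are assumptions made up for this purpose. It computes the empirical AUC of a linear combination and shows that rescaling or slightly nudging the coefficients leaves the value unchanged.

```python
import numpy as np

def empirical_auc(beta, x_case, x_control):
    """Empirical AUC of the linear score x @ beta; tied pairs count 1/2."""
    diff = (x_case @ beta)[:, None] - (x_control @ beta)[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Made-up toy data: rows are subjects, columns are two biomarkers.
x_case = np.array([[2.0, 1.0], [1.0, 0.0], [0.0, 2.0]])
x_control = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5], [2.0, -1.0]])

beta = np.array([1.0, 0.5])
# 8 of the 12 case-control pairs are correctly ordered, so the AUC is 8/12.
auc = empirical_auc(beta, x_case, x_control)

# Scale invariance: multiplying beta by a positive constant keeps every
# pairwise ordering, so the AUC is identical.
auc_rescaled = empirical_auc(3.0 * beta, x_case, x_control)

# Piecewise constancy: a tiny change in beta flips no pairwise ordering,
# so the AUC does not move and its gradient is zero almost everywhere.
auc_nudged = empirical_auc(beta + np.array([0.0, 1e-9]), x_case, x_control)

print(auc, auc_rescaled, auc_nudged)
```

Because the objective only changes when some case-control pair swaps order, it takes values on a grid of multiples of 1/(2 x number of pairs), which is why a gradient step carries no information here.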

The second project focuses on simultaneous performance evaluation, that is, evaluating the biomarker combination derived from empirical AUC maximization on the same training data used to fit it. A higher-order asymptotic analysis of the apparent empirical AUC is conducted to explain the over-optimism theoretically. The result motivates a sample-based optimism correction method. Interval estimation procedures are also devised. Simulation studies demonstrate that the proposed methods outperform existing methods both computationally and statistically.
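
The over-optimism of the apparent AUC can be illustrated with a small simulation. The sketch below is not the correction method of the dissertation: it uses a crude angle search over two biomarkers as a stand-in optimizer, and all names (`fit_by_angle_search`, the sample sizes, the seed) are assumptions chosen for illustration. The biomarkers are pure noise, so the true AUC of any combination is 0.5, yet the apparent AUC of the fitted combination is at least 0.5 by construction and typically noticeably higher.

```python
import numpy as np

def empirical_auc(beta, x_case, x_control):
    """Empirical AUC of the linear score x @ beta; tied pairs count 1/2."""
    diff = (x_case @ beta)[:, None] - (x_control @ beta)[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def fit_by_angle_search(x_case, x_control, n_angles=180):
    """Stand-in optimizer for two biomarkers: scan unit vectors +/-(cos t, sin t).

    Scale invariance of the AUC means only the direction of beta matters.
    """
    best_auc, best_beta = -1.0, None
    for t in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        direction = np.array([np.cos(t), np.sin(t)])
        for beta in (direction, -direction):
            auc = empirical_auc(beta, x_case, x_control)
            if auc > best_auc:
                best_auc, best_beta = auc, beta
    return best_beta, best_auc

rng = np.random.default_rng(1)

# Training data: cases and controls share one distribution (no signal).
train_case = rng.normal(size=(25, 2))
train_control = rng.normal(size=(25, 2))
beta_hat, apparent_auc = fit_by_angle_search(train_case, train_control)

# Large independent sample from the same distribution, used only to
# approximate the true AUC of the fitted combination.
test_case = rng.normal(size=(1000, 2))
test_control = rng.normal(size=(1000, 2))
independent_auc = empirical_auc(beta_hat, test_case, test_control)

print("apparent AUC:   ", apparent_auc)
print("independent AUC:", independent_auc)
print("optimism:       ", apparent_auc - independent_auc)
```

An independent sample of this kind is rarely available in practice, which is the motivation for correcting the apparent estimate using the training data alone.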

The third project shifts the focus to a linear classifier that maximizes a weighted average of sensitivity and specificity (WSS). With an appropriate choice of weight, WSS can be tailored to reflect specific clinical priorities. We develop a classifier estimation method that maximizes the smoothed empirical WSS. Building on the asymptotic properties of the estimator and a higher-order asymptotic analysis, we propose a novel, computationally efficient bias correction and interval estimation procedure for simultaneous performance evaluation. Simulations show improved performance estimation compared to existing methods, and a cancer-screening case study illustrates the practical application.
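
To make the WSS criterion concrete, the sketch below computes the empirical WSS of a linear classifier and a smoothed version of it. It is a minimal illustration under stated assumptions, not the estimator of the dissertation: the decision rule (classify as diseased when x @ beta > c), the logistic smoothing function, the bandwidth `h`, and the toy data are all hypothetical choices, since the abstract does not specify them.

```python
import numpy as np

def empirical_wss(beta, c, x_case, x_control, w):
    """w * sensitivity + (1 - w) * specificity for the rule x @ beta > c."""
    sensitivity = np.mean(x_case @ beta > c)
    specificity = np.mean(x_control @ beta <= c)
    return float(w * sensitivity + (1.0 - w) * specificity)

def smoothed_wss(beta, c, x_case, x_control, w, h=0.01):
    """Same criterion with the indicator replaced by a logistic curve.

    Assumption: logistic smoothing with bandwidth h, written via tanh for
    numerical stability. Smaller h tracks the step function more closely.
    """
    def smooth_step(t):
        return 0.5 * (1.0 + np.tanh(t / (2.0 * h)))
    sensitivity = np.mean(smooth_step(x_case @ beta - c))
    specificity = np.mean(smooth_step(c - x_control @ beta))
    return float(w * sensitivity + (1.0 - w) * specificity)

# Made-up toy data: rows are subjects, columns are two biomarkers.
x_case = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
x_control = np.array([[0.0, 0.0], [-1.0, 0.0], [0.0, 0.2], [1.0, 2.0]])

beta = np.array([1.0, 1.0])
c = 0.5
w = 0.7  # weight above 0.5 prioritizes sensitivity, e.g. in screening

# Sensitivity is 2/3 and specificity is 3/4 on this data.
wss = empirical_wss(beta, c, x_case, x_control, w)
wss_smooth = smoothed_wss(beta, c, x_case, x_control, w)

print(wss, wss_smooth)
```

Unlike the step-function version, the smoothed criterion is differentiable in `beta` and `c`, so standard gradient-based optimizers can be applied to it.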

Table of Contents

This table of contents is under embargo until 12 January 2028

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Subfield / Discipline
Degree
Submission
Language
  • English
Research Field
Keyword
Committee Chair / Thesis Advisor
Committee Members
Last modified

Primary PDF

Supplemental Files