Robust Statistical Methods for Multiplex Classification in Disease Diagnosis

Chen, Yuxuan (Fall 2025)

Permanent URL: https://etd.library.emory.edu/concern/etds/pc289k71s?locale=en

Published

Abstract

Multiple biomarkers are often combined to improve diagnostic accuracy. Developing such a combination involves two critical stages: learning a predictive model and evaluating the performance of the resulting model. For model learning, many statistical and machine-learning approaches have been developed; nevertheless, issues concerning computation, statistical inference, and clinical relevance remain outstanding. For performance evaluation, apparent estimates (computed on the same data used to fit the model) are known to be overly optimistic, and common alternatives such as cross-validation and the bootstrap have limitations of their own. This dissertation introduces robust statistical methods for both model learning and performance evaluation through three interrelated projects.

The first project combines biomarkers by empirically maximizing the area under the receiver operating characteristic curve (AUC), addressing both computation and inference. Two challenges arise. First, the empirical AUC is piecewise constant in the combination coefficients, so gradient-based methods are ineffective. Second, the AUC is invariant to positive rescaling of the coefficients, which complicates both computation and asymptotic theory. A novel algorithm is proposed for both point and variance estimation of the combination coefficients. Simulations demonstrate strong computational and statistical performance, and a clinical application illustrates the method's practical utility.
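A small sketch may make the two challenges concrete. Assuming the standard Mann-Whitney form of the empirical AUC for a linear score β'x (ties counted as 1/2), the toy Python below uses made-up data for three cases and three controls to show that rescaling the coefficients leaves the empirical AUC unchanged and that a small perturbation does not change it either, so its gradient is zero almost everywhere. The helper names (`linear_scores`, `empirical_auc`, `best_direction`) and the brute-force search over unit vectors are illustrative assumptions; this is not the dissertation's proposed algorithm.

```python
import math

def linear_scores(beta, rows):
    # Score each subject with the linear combination beta'x
    return [sum(b * x for b, x in zip(beta, row)) for row in rows]

def empirical_auc(beta, cases, controls):
    # Mann-Whitney form: fraction of (case, control) pairs ordered correctly; ties count 1/2
    s1, s0 = linear_scores(beta, cases), linear_scores(beta, controls)
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in s1 for b in s0)
    return wins / (len(s1) * len(s0))

def best_direction(cases, controls, n_grid=360):
    # Brute-force search over unit vectors (cos t, sin t): the unit-norm constraint
    # removes the scale indeterminacy; only practical for two markers
    best_auc, best_beta = -1.0, None
    for k in range(n_grid):
        t = 2.0 * math.pi * k / n_grid
        beta = (math.cos(t), math.sin(t))
        auc = empirical_auc(beta, cases, controls)
        if auc > best_auc:
            best_auc, best_beta = auc, beta
    return best_auc, best_beta

# Hypothetical toy data: two markers measured on 3 cases and 3 controls
cases = [(1.0, 2.0), (2.0, 1.5), (0.5, 3.0)]
controls = [(0.0, 1.0), (1.5, 0.5), (0.2, 0.3)]

base = empirical_auc((1.0, 0.0), cases, controls)      # 7 of 9 pairs ordered correctly
scaled = empirical_auc((3.0, 0.0), cases, controls)    # same direction, 3x the scale
nudged = empirical_auc((1.0, 0.001), cases, controls)  # small perturbation
print(base, scaled, nudged)  # all three equal 7/9, about 0.778
print(best_direction(cases, controls))
```

Restricting β to the unit circle is one common way to handle the scale invariance; a grid over directions, however, quickly becomes impractical beyond two or three markers, which is why a dedicated algorithm is needed.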

The second project focuses on simultaneous performance evaluation, that is, evaluating the biomarker combination derived from empirical AUC maximization on the same training data used to derive it. A higher-order asymptotic analysis of the apparent empirical AUC is conducted to characterize its over-optimism theoretically. The result motivates a sample-based optimism correction method, and interval estimation procedures are also devised. Simulation studies demonstrate that the proposed methods outperform existing methods both computationally and statistically.
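The over-optimism of the apparent AUC can be seen in a quick simulation. The sketch below assumes two pure-noise markers (so every combination has a true AUC of 0.5), fits a combination by brute-force search over directions, and then compares its apparent AUC on the training data with its empirical AUC on an independent sample of the same size. Function names, sample sizes, grid size, and the seed are hypothetical choices for illustration; the code only demonstrates the bias and does not implement the dissertation's optimism correction.

```python
import math
import random

def empirical_auc(beta, cases, controls):
    # Mann-Whitney estimate of AUC for the linear score beta'x; ties count 1/2
    s1 = [sum(b * x for b, x in zip(beta, row)) for row in cases]
    s0 = [sum(b * x for b, x in zip(beta, row)) for row in controls]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in s1 for b in s0)
    return wins / (len(s1) * len(s0))

def fit_by_grid(cases, controls, n_grid=120):
    # Brute-force maximization over unit directions (two markers only)
    best_auc, best_beta = -1.0, None
    for k in range(n_grid):
        theta = 2.0 * math.pi * k / n_grid
        beta = (math.cos(theta), math.sin(theta))
        auc = empirical_auc(beta, cases, controls)
        if auc > best_auc:
            best_auc, best_beta = auc, beta
    return best_beta, best_auc

def simulate_noise(n, rng):
    # Two independent N(0, 1) markers with no signal: true AUC is 0.5 for any beta
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]

def optimism_demo(n_per_group=15, n_reps=20, seed=2025):
    rng = random.Random(seed)
    apparent, fresh = [], []
    for _ in range(n_reps):
        train_cases = simulate_noise(n_per_group, rng)
        train_controls = simulate_noise(n_per_group, rng)
        beta, app_auc = fit_by_grid(train_cases, train_controls)
        apparent.append(app_auc)
        # Evaluate the same fitted combination on an independent sample
        fresh.append(empirical_auc(beta, simulate_noise(n_per_group, rng),
                                   simulate_noise(n_per_group, rng)))
    return sum(apparent) / n_reps, sum(fresh) / n_reps

app, fresh = optimism_demo()
print(f"mean apparent AUC = {app:.3f}, mean AUC on new data = {fresh:.3f}")
```

Because the search includes each direction's negation, every apparent AUC here is at least 0.5, while the AUC on new data fluctuates around 0.5; the gap between the two averages is the optimism that a correction method has to remove.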

The third project shifts the focus to a linear classifier that maximizes a weighted average of sensitivity and specificity (WSS). With an appropriate choice of weight, WSS can be tailored to reflect specific clinical priorities. We estimate the classifier by maximizing a smoothed empirical WSS. Building on the asymptotic properties of this estimator and a higher-order asymptotic analysis, we propose a novel, computationally efficient procedure for bias correction and interval estimation in simultaneous performance evaluation. Simulations show improved performance estimation compared with existing methods, and a cancer-screening case study illustrates the practical application.
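One way to make the smoothing concrete: assuming a rule that calls a subject positive when β'x > c, the empirical WSS is w·(sensitivity) + (1 - w)·(specificity), and replacing each indicator with a normal CDF Φ(·/h) gives a differentiable surrogate. The sketch below is a hypothetical illustration, not the author's estimator: the Gaussian kernel, the bandwidth `h`, the unit-norm constraint on β (used here because the decision rule is unchanged by positive rescaling of (β, c)), the backtracking projected gradient ascent, and the toy data are all assumptions chosen for simplicity.

```python
import math

def phi_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [vi / n for vi in v]

def empirical_wss(beta, c, cases, controls, w):
    # Indicator version: a subject is called positive when beta'x > c
    sens = sum(dot(beta, x) > c for x in cases) / len(cases)
    spec = sum(dot(beta, x) <= c for x in controls) / len(controls)
    return w * sens + (1.0 - w) * spec

def smoothed_wss(beta, c, cases, controls, w, h):
    # Each indicator replaced by a Gaussian-kernel CDF with bandwidth h
    sens = sum(phi_cdf((dot(beta, x) - c) / h) for x in cases) / len(cases)
    spec = sum(phi_cdf((c - dot(beta, x)) / h) for x in controls) / len(controls)
    return w * sens + (1.0 - w) * spec

def smoothed_wss_grad(beta, c, cases, controls, w, h):
    # Analytic gradient of smoothed_wss with respect to (beta, c)
    g_beta, g_c = [0.0] * len(beta), 0.0
    for x in cases:
        k = w * phi_pdf((dot(beta, x) - c) / h) / (h * len(cases))
        g_beta = [g + k * xj for g, xj in zip(g_beta, x)]
        g_c -= k
    for x in controls:
        k = (1.0 - w) * phi_pdf((c - dot(beta, x)) / h) / (h * len(controls))
        g_beta = [g - k * xj for g, xj in zip(g_beta, x)]
        g_c += k
    return g_beta, g_c

def fit_wss(cases, controls, w=0.5, h=0.1, c0=0.0, step=1.0, n_iter=200):
    # Projected gradient ascent on the unit sphere for beta; a step is accepted
    # only if it increases the smoothed WSS, otherwise the step size is halved
    beta, c = normalize([1.0] * len(cases[0])), c0
    best = smoothed_wss(beta, c, cases, controls, w, h)
    for _ in range(n_iter):
        g_beta, g_c = smoothed_wss_grad(beta, c, cases, controls, w, h)
        t = step
        while t > 1e-8:
            cand_beta = normalize([b + t * g for b, g in zip(beta, g_beta)])
            cand_c = c + t * g_c
            val = smoothed_wss(cand_beta, cand_c, cases, controls, w, h)
            if val > best:
                beta, c, best = cand_beta, cand_c, val
                break
            t /= 2.0
        else:
            break  # no improving step found; stop
    return beta, c, best

# Hypothetical toy data: two markers on 4 cases and 4 controls; w = 0.7 favors sensitivity
cases = [(1.0, 2.0), (2.0, 1.5), (0.5, 3.0), (1.2, 0.4)]
controls = [(0.0, 1.0), (1.5, 0.5), (0.2, 0.3), (0.8, 1.1)]
beta, c, value = fit_wss(cases, controls, w=0.7, h=0.2)
print(beta, c, value, empirical_wss(beta, c, cases, controls, w=0.7))
```

Because the smoothed objective is differentiable, gradient information is available; as h shrinks, the smoothed WSS approaches the empirical (indicator) WSS at a fixed (β, c), but the surface also becomes steeper and harder to optimize, so the bandwidth trades bias against numerical tractability.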

This table of contents is under embargo until 12 January 2028

About this Dissertation

Rights statement

Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.

School	Laney Graduate School
Department	Biostatistics
Subfield / Discipline	Biostatistics - MPH & MSPH
Degree	Ph.D.
Submission	Dissertation
Language	English
Research Field	Biology, Biostatistics
Keyword	kernel smoothing, Receiver Operating Characteristic curve, optimization, performance evaluation, Classification
Committee Chair / Thesis Advisor	Eugene Huang, Emory University
Committee Members	Rakesh Shiradkar, Indiana University; Robert Lyles, Emory University; Yi-An Ko, Emory University

Primary PDF

File download under embargo until 12 January 2028 (uploaded 2025-12-01 03:02:39 -0500)
