Methods for Improving the Interpretability and Evaluation of Machine Learning Models and Decision Making Systems
Nizam, Sohail (Spring 2023)
Abstract
The ability to interpret and evaluate machine learning models is of great importance, particularly when such models serve as the foundation for decision-making systems. Practitioners must be highly certain of the level of performance their models will achieve and must be able to explain the decisions those models render. This work aims to improve capabilities in both of these areas.
In the first chapter, we address the problem of model evaluation in binary classification settings. In such settings, the Area Under the Precision-Recall Curve (AUPRC) is often the metric of interest when class imbalance is extreme, and its estimation is frequently paired with cross-validation. We formally define the Cross-Validated AUPRC (CVAUPRC), a data-adaptive target parameter, and provide closed-form inference for its non-parametric maximum likelihood estimator. Additionally, we propose a more efficient estimation strategy based on nested cross-validation that offers dramatic improvements under extreme class imbalance.
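To make the target parameter concrete, the sketch below computes a cross-validated AUPRC point estimate. It is a minimal illustration, assuming that CVAUPRC is estimated as the average over V validation folds of the empirical (step-wise) AUPRC, where each fold's scores come from a model fit on the remaining folds. The function names `empirical_auprc` and `cv_auprc` are hypothetical, and the sketch reproduces neither the closed-form variance nor the nested cross-validation estimator described above.

```python
def empirical_auprc(scores, labels):
    """Step-wise AUPRC: sum over thresholds of (change in recall) * precision.

    Observations with tied scores are treated as a single threshold.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AUPRC is undefined when a fold has no positives")
    # Sort observations by score, highest first.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    prev_recall = 0.0
    area = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every observation tied at this threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


def cv_auprc(folds):
    """Average the fold-specific AUPRCs.

    `folds` is a list of (scores, labels) pairs, where the scores for each
    validation fold come from a model trained on the other folds.
    """
    estimates = [empirical_auprc(s, y) for s, y in folds]
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    folds = [
        ([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]),  # fold AUPRC = 5/6
        ([0.9, 0.1], [1, 0]),                  # fold AUPRC = 1
    ]
    print(cv_auprc(folds))
```

Averaging over folds, rather than pooling all out-of-fold scores into a single curve, matches the assumption that the parameter is defined fold by fold. Under extreme imbalance, a validation fold may contain very few positives, which makes each fold-level estimate noisy.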
In the second chapter, we introduce a method for building an interpretable representation of the Highly Adaptive Lasso (HAL). HAL is a machine learning method that has been shown to have predictive performance on par with state-of-the-art algorithms, and it can be represented as a non-recursive partitioning of the feature space. We propose a method for mapping the partitioning implied by HAL to a recursive partitioning, which in turn allows HAL to be represented as a decision tree. We refer to this post-hoc interpretability method as Highly Adaptive Regression Trees (HART).
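The non-recursive partitioning can be seen in the basis HAL is commonly described as using: zero-order indicator functions of the form 1{x_j ≥ c_j for all j in a covariate subset s}, with knots c taken from the observed data, followed by a lasso fit over all of them. The sketch below builds that basis and evaluates a fitted function from given coefficients. It is a minimal illustration, assuming this zero-order formulation; the lasso fitting step is omitted, the coefficients in the usage example are hypothetical, and nothing here reproduces the HAL-to-tree mapping proposed in the chapter.

```python
from itertools import combinations


def hal_basis(X, max_degree=None):
    """List the (subset, knot) pairs defining zero-order HAL-style basis functions.

    For every covariate subset up to `max_degree` and every observed row,
    the knot is that row's values on the subset's covariates.
    """
    d = len(X[0])
    if max_degree is None:
        max_degree = d
    basis, seen = [], set()
    for size in range(1, max_degree + 1):
        for subset in combinations(range(d), size):
            for row in X:
                key = (subset, tuple(row[j] for j in subset))
                if key not in seen:  # drop duplicate knots
                    seen.add(key)
                    basis.append(key)
    return basis


def eval_basis(x, subset, knot):
    """Indicator that x lies in the upper orthant anchored at the knot."""
    return 1.0 if all(x[j] >= c for j, c in zip(subset, knot)) else 0.0


def hal_predict(x, intercept, coefs):
    """Evaluate intercept + sum of beta * basis(x); `coefs` maps (subset, knot) to beta."""
    return intercept + sum(
        beta * eval_basis(x, subset, knot) for (subset, knot), beta in coefs.items()
    )


if __name__ == "__main__":
    X = [[0, 0], [1, 2]]
    print(len(hal_basis(X)))  # 2 main-term knots per covariate + 2 interaction knots
    # Hypothetical coefficients standing in for a lasso fit.
    coefs = {((0,), (1,)): 2.0, ((0, 1), (1, 2)): -1.0}
    print(hal_predict([1, 1], 0.5, coefs))
```

Because each basis function is an indicator of an axis-aligned orthant, any weighted sum of them is piecewise constant on rectangles whose boundaries come from the knots. Those overlapping orthants form a partition that is not produced by sequential splits, which is why a separate step is needed to express the fit as a tree.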
In the third chapter, we address the problem of interpretable estimation of the Conditional Average Treatment Effect (CATE) and, by extension, the Optimal Treatment Policy (OTP). Many machine learning-based frameworks for CATE estimation have been proposed. However, few of these methods are interpretable, and those that are interpretable often sacrifice performance. We extend HART and build on existing meta-learning algorithms to produce CATE and OTP estimates that can be represented as trees. The method accommodates an arbitrary number of treatment arms. We provide regret rates for the proposed methods and show that they outperform popular methods, both interpretable and non-interpretable.
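As a point of reference for the meta-learning step, the sketch below shows a T-learner, one standard meta-learner, for a multi-arm setting. It fits a separate outcome model for each treatment arm, estimates the CATE of arm a relative to a control arm as the difference of the fitted outcome models, and defines the policy as the arm with the largest predicted outcome. The per-arm "model" is a simple mean within each level of a single discrete covariate, a hypothetical stand-in for a flexible learner. The chapter's specific meta-learners and its tree representation are not reproduced here.

```python
from collections import defaultdict


def fit_stratum_means(xs, ys):
    """Hypothetical outcome model: mean outcome within each covariate level."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(xs, ys):
        sums[x] += y
        counts[x] += 1
    return {x: sums[x] / counts[x] for x in sums}


def t_learner(xs, arms, ys):
    """Fit one outcome model per treatment arm (T-learner)."""
    models = {}
    for a in sorted(set(arms)):
        idx = [i for i, t in enumerate(arms) if t == a]
        models[a] = fit_stratum_means([xs[i] for i in idx], [ys[i] for i in idx])
    return models


def cate(models, x, arm, control=0):
    """Estimated effect of `arm` versus `control` at covariate value x."""
    return models[arm][x] - models[control][x]


def optimal_policy(models, x):
    """Recommend the arm with the largest predicted outcome (larger is better)."""
    return max(models, key=lambda a: models[a][x])


if __name__ == "__main__":
    xs = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
    arms = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
    ys = [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 4, 4]
    models = t_learner(xs, arms, ys)
    print(cate(models, 0, arm=1), cate(models, 1, arm=1))
    print(optimal_policy(models, 0), optimal_policy(models, 1))
```

With three arms, the policy is no longer "treat if the CATE is positive". It must compare all arms at each covariate value, which is why multi-arm settings call for an argmax over arm-specific predictions.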
Table of Contents
1. Estimation and Inference for Cross-Validated Area Under the Precision Recall Curve
2. Highly Adaptive Regression Trees: A Post-hoc Method for Interpreting the Highly Adaptive Lasso
3. Highly Adaptive Treatment Trees: Interpretable Estimation of Heterogeneous Treatment Effects and Treatment Policies
Primary PDF
| Title | Date Uploaded |
|---|---|
| Methods for Improving the Interpretability and Evaluation of Machine Learning Models and Decision Making Systems | 2023-04-11 23:38:42 -0400 |