Methods for Improving the Interpretability and Evaluation of Machine Learning Models and Decision Making Systems

Open Access

Nizam, Sohail (Spring 2023)

Permanent URL: https://etd.library.emory.edu/concern/etds/ms35tb194?locale=en
Published

Abstract

The ability to interpret and evaluate machine learning models is of great importance, particularly when such models will be used as the foundation for decision making systems. Practitioners must have a high degree of certainty about the level of performance that their models will achieve and be able to explain decisions rendered by them. This work aims to improve capabilities in both of these areas. 

In the first chapter, we address the problem of model evaluation in binary classification settings. When class imbalance is extreme, the Area Under the Precision Recall Curve (AUPRC) is often the metric of interest, and estimation of metrics like AUPRC is often paired with cross-validation. We formally define the Cross-Validated AUPRC (CVAUPRC), a data-adaptive target parameter, and provide closed-form inference for its non-parametric maximum likelihood estimator. Additionally, we propose a more efficient estimation strategy based on nested cross-validation that offers dramatic improvement in situations with extreme class imbalance.

In the second chapter, we introduce a method for building an interpretable representation of the Highly Adaptive Lasso (HAL). HAL is a machine learning method that has been shown to have predictive performance on par with state-of-the-art algorithms and can be represented as a non-recursive partitioning of the feature space. We propose a method for mapping the partitioning implied by HAL to a recursive partitioning, which then allows for the representation of HAL as a decision tree. We refer to this post-hoc method for interpretability as Highly Adaptive Regression Trees (HART).

In the third chapter, we address the problem of interpretable Conditional Average Treatment Effect (CATE) estimation and, by extension, Optimal Treatment Policy (OTP) estimation. Many machine learning-based frameworks for CATE estimation have been proposed. However, few of these methods are interpretable, and those that are often suffer in terms of performance. We extend HART’s capabilities and build on existing meta-learning algorithms to produce CATE and OTP estimates that can be represented as trees. We introduce this method for settings with an arbitrary number of treatment arms. We provide regret rates for the proposed methods and show that they outperform popular methods, both interpretable and not.

Table of Contents

  • Estimation and Inference for Cross-Validated Area Under the Precision Recall Curve
  • Highly Adaptive Regression Trees: A Post-hoc Method for Interpreting the Highly Adaptive Lasso
  • Highly Adaptive Treatment Trees: Interpretable Estimation of Heterogeneous Treatment Effects and Treatment Policies

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English
