Detecting Training Data Biases: MLR And Graphical LASSO Based Methods Open Access
Luo, Shuxuan (Spring 2023)
Abstract
As the use of algorithms for automated decision-making has become increasingly prevalent, many have pointed out the discriminatory results these algorithms produce. This paper aims to extract and evaluate one source of such discrimination: the unintentional biases captured in the training data through high correlations between the predictors and the protected characteristics. To determine whether a predictor systematically excludes qualified members of a protected group, we examine the "direct" correlation between that predictor and the protected characteristic, controlling for all other predictors in the training data. We first propose a Multivariable Linear Regression (MLR) test, adapted from the "Input Accountability Test" (IAT). We also propose a test based on the Graphical LASSO (GLASSO). We applied all three tests (IAT, MLR, and GLASSO) to detect biases in our simulated datasets and found that GLASSO works best. Finally, we discuss the limitations of GLASSO and where it can be improved.
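The two "direct" tests described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation, and it rests on stated assumptions. First, the data are a hypothetical simulation: a binary protected attribute `a`, a predictor `x1` that depends on `a` directly, a predictor `x2` that depends on `a` only through `x1`, and an unrelated predictor `x3`. Second, the MLR test is assumed to regress each predictor on `a` plus all other predictors and t-test the coefficient on `a`; the thesis's exact specification (for example, whether it also controls for the outcome) may differ. Third, the GLASSO test uses scikit-learn's `GraphicalLasso` with an arbitrarily chosen penalty `alpha=0.05` and reads partial correlations off the estimated precision matrix Θ via ρ_ij = -Θ_ij / sqrt(Θ_ii Θ_jj). The helper names `mlr_direct_test` and `glasso_partial_corr` are hypothetical.

```python
import numpy as np
from scipy import stats
from sklearn.covariance import GraphicalLasso

# Hypothetical simulated data (an assumption, not the thesis's dataset).
rng = np.random.default_rng(0)
n = 2000
a = rng.integers(0, 2, size=n).astype(float)  # binary protected characteristic
x1 = 1.0 * a + rng.normal(size=n)              # depends on a directly
x2 = 0.8 * x1 + rng.normal(size=n)             # depends on a only through x1
x3 = rng.normal(size=n)                        # unrelated to a
X = np.column_stack([x1, x2, x3])
names = ["x1", "x2", "x3"]


def mlr_direct_test(X, a):
    """Hypothetical MLR test: for each predictor x_j, regress x_j on
    [1, a, all other predictors] and return (t, two-sided p) for a's coefficient."""
    n, p = X.shape
    results = []
    for j in range(p):
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(n), a, others])  # intercept, a, controls
        beta, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
        resid = X[:, j] - Z @ beta
        df = n - Z.shape[1]
        sigma2 = resid @ resid / df                    # residual variance
        se = np.sqrt(sigma2 * np.linalg.inv(Z.T @ Z)[1, 1])
        t = beta[1] / se
        results.append((t, 2 * stats.t.sf(abs(t), df)))
    return results


def glasso_partial_corr(X, a, alpha=0.05):
    """Hypothetical GLASSO test: estimate a sparse precision matrix for
    [a, X] and return the partial correlation of a with each predictor."""
    D = np.column_stack([a, X])
    D = (D - D.mean(axis=0)) / D.std(axis=0)           # standardize columns
    theta = GraphicalLasso(alpha=alpha).fit(D).precision_
    d = np.sqrt(np.diag(theta))
    pcorr = -theta / np.outer(d, d)  # rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)
    return pcorr[0, 1:]


# Marginal correlation, the kind of "indirect" association a marginal test sees.
marginal = [np.corrcoef(a, X[:, j])[0, 1] for j in range(X.shape[1])]
mlr = mlr_direct_test(X, a)
pcorr = glasso_partial_corr(X, a)

for name, r, (t, pval), rho in zip(names, marginal, mlr, pcorr):
    print(f"{name}: marginal r={r:+.3f}  MLR t={t:+.2f} (p={pval:.3g})  GLASSO rho={rho:+.3f}")
```

In this simulation `x2` is marginally correlated with `a`, yet both its MLR t-statistic and its GLASSO partial correlation should be near zero, while `x1` should stand out under both tests. Penalty selection and decision thresholds in the thesis may differ from the arbitrary choices made here.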
Table of Contents
- Introduction
- Problem Statement
- Methods
  - Input Accountability Test
    - "Indirect" Relationships
    - Significance Testing
  - The "Direct" Relationship
    - Multiple Linear Regression
    - Graphical LASSO
      - Precision Matrix and Conditional Independence
      - LASSO
- Data
  - Variables
  - Biased Data Simulation
  - Unbiased Data Simulation
- Results
  - IAT
    - Biased Dataset
    - Unbiased Dataset
  - MLR
    - Biased Dataset
    - Unbiased Dataset
  - GLASSO
    - Biased Dataset
    - Unbiased Dataset
- Conclusion
- Future Works
- Discussion
- Appendix A, B
- Bibliography
About this Honors Thesis
Primary PDF
Title | Date Uploaded |
---|---|
Detecting Training Data Biases: MLR And Graphical LASSO Based Methods | 2023-04-06 13:39:58 -0400 |
Supplemental Files
Title | Date Uploaded |
---|---|
Biased Dataset (Simulation and Analysis) | 2023-04-06 13:40:04 -0400 |
Unbiased Large Dataset (To demonstrate unreliability of p-test) | 2023-04-06 13:40:14 -0400 |
Unbiased Dataset (Simulation and Analysis) | 2023-04-06 13:40:25 -0400 |