Detecting Training Data Biases: MLR And Graphical LASSO Based Methods Open Access

Luo, Shuxuan (Spring 2023)


As algorithms for automated decision-making have become increasingly prevalent, many have pointed out the discriminatory results they can produce. This paper aims to extract and evaluate one source of such discrimination: the unintentional biases captured in the training data through high correlations between the predictors and the protected characteristics. To see whether a predictor systematically excludes qualified members of a protected group, we examine the “direct” correlation between that predictor and the protected characteristic, controlling for all other predictors in the training data. We first propose a Multivariable Linear Regression (MLR) test, adapted from the “Input Accountability Test” (IAT). We also propose a Graphical LASSO (GLASSO) based test. We applied all three tests (IAT, MLR, and GLASSO) to detect biases in our simulated datasets and found that GLASSO worked best. Finally, we discuss the limitations of GLASSO and where it can be improved.
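The idea of a “direct” correlation that controls for all other predictors can be illustrated with partial correlations read off the inverse covariance (precision) matrix. The sketch below is not the thesis's implementation: the variable names, coefficients, and sample size are assumptions chosen for illustration, and it uses the plain (unpenalized) inverse of the sample covariance rather than Graphical LASSO, which additionally places an L1 penalty on the precision matrix to shrink weak entries to zero.

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation of each pair of columns, given all other columns.

    Uses the identity rho_ij = -P_ij / sqrt(P_ii * P_jj), where P is the
    inverse of the sample covariance matrix (the precision matrix).
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


# Hypothetical simulated data (all coefficients are illustrative assumptions).
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)               # protected characteristic
x1 = 0.8 * z + rng.normal(size=n)    # predictor tied directly to z
x2 = 0.8 * x1 + rng.normal(size=n)   # predictor tied to z only through x1

data = np.column_stack([z, x1, x2])
marginal = np.corrcoef(data, rowvar=False)   # ordinary pairwise correlations
partial = partial_correlations(data)         # "direct" correlations

print("marginal corr(z, x2):", round(marginal[0, 2], 3))
print("partial  corr(z, x2 | x1):", round(partial[0, 2], 3))
print("partial  corr(z, x1 | x2):", round(partial[0, 1], 3))
```

In this setup x2 is clearly correlated with z when viewed on its own, but its partial correlation with z is close to zero once x1 is controlled for, while x1 keeps a sizeable partial correlation with z. That is the distinction between an “indirect” and a “direct” relationship that the tests described above are meant to draw.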

Table of Contents

Introduction
Problem Statement
Methods
  Input Accountability Test
    "Indirect" Relationships
    Significance Testing
  The "Direct" Relationship
    Multiple Linear Regression
    Graphical LASSO
      Precision Matrix and Conditional Independence
      LASSO
Data
  Variables
  Biased Data Simulation
  Unbiased Data Simulation
Results
  IAT
    Biased Dataset
    Unbiased Dataset
  MLR
    Biased Dataset
    Unbiased Dataset
  GLASSO
    Biased Dataset
    Unbiased Dataset
Conclusion
Future Works
Discussion
Appendix A, B
Bibliography

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
  • English