Amplification of Demographic Bias in Epidemiological Forecasting

Restricted: Files & ToC

Cai, Lupin (Spring 2025)

Permanent URL: https://etd.library.emory.edu/concern/etds/3b591b276?locale=zh
Published

Abstract

Forecasting infectious disease outbreaks is critical for timely public health responses, yet predictive models are often trained on biased data that reflect real-world disparities in data collection and reporting. This thesis investigates how such bias can be amplified across different forecasting models, including traditional time series models (ARIMA), graph-based deep learning models (CoLA-GNN), and epidemiologically structured neural networks (SIR-NN). We evaluate model performance on both a synthetic simulation dataset and real-world COVID-19 case data from Georgia, introducing synthetic underreporting bias based on demographic features such as age, income, gender, and education.
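As an illustration of what introducing synthetic underreporting bias could look like, here is a minimal sketch. The abstract does not specify the functional form, so the linear mapping from a normalized demographic feature to an underreporting rate, the `max_rate` cap, and the function names are all assumptions made for this example, not the thesis's method.

```python
def underreporting_rate(feature_value, max_rate=0.5):
    """Map a normalized demographic feature in [0, 1] to an underreporting rate.

    Assumption: the rate grows linearly with the feature and is capped at
    max_rate. The thesis may use a different mapping.
    """
    if not 0.0 <= feature_value <= 1.0:
        raise ValueError("feature_value must be normalized to [0, 1]")
    return max_rate * feature_value


def apply_underreporting(true_cases, rate):
    """Return reported case counts after dropping a fraction `rate` of true cases."""
    return [round(cases * (1.0 - rate)) for cases in true_cases]


# Hypothetical region: daily ground-truth cases and a normalized feature value.
true_cases = [100, 200, 400]
rate = underreporting_rate(0.8)  # 0.5 * 0.8 = 0.4
reported_cases = apply_underreporting(true_cases, rate)
print(reported_cases)
```

A model trained on `reported_cases` never sees the missing counts, which is the setting in which forecast error against the ground truth can be compared across demographic groups.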

We apply clustering to group regions by demographic attributes and compare model performance using absolute and relative error metrics. Results show that while absolute error tends to be higher in less-biased clusters (which have higher total case counts), relative error consistently rises in clusters with greater underreporting, validating the hypothesis that models trained on biased data amplify disparities in forecasting accuracy. Notably, even advanced models like CoLA-GNN, which incorporate spatio-temporal dependencies, are not immune to this amplification. Across both datasets, the gap between reported and ground-truth data correlates with poorer model performance in affected regions, highlighting a compounding effect of demographic bias over multi-step forecasting.
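The contrast between absolute and relative error can be made concrete with a small sketch. The cluster names, case counts, and forecasts below are hypothetical, and mean absolute error and mean relative error are assumed stand-ins for the thesis's metrics, which the abstract does not define precisely.

```python
from statistics import mean


def mean_absolute_error(truth, forecast):
    """Average absolute gap between ground-truth and forecast counts."""
    return mean([abs(t - f) for t, f in zip(truth, forecast)])


def mean_relative_error(truth, forecast):
    """Average absolute gap as a fraction of the ground truth (zero-truth days skipped)."""
    return mean([abs(t - f) / abs(t) for t, f in zip(truth, forecast) if t != 0])


# Hypothetical clusters: one with many cases and little underreporting,
# one with few cases and heavy underreporting.
clusters = {
    "low_underreporting": {"truth": [1000, 1100, 1200], "forecast": [950, 1045, 1140]},
    "high_underreporting": {"truth": [100, 110, 120], "forecast": [70, 77, 84]},
}

errors = {
    name: {
        "mae": mean_absolute_error(data["truth"], data["forecast"]),
        "mre": mean_relative_error(data["truth"], data["forecast"]),
    }
    for name, data in clusters.items()
}

for name, err in errors.items():
    print(f"{name}: MAE={err['mae']:.1f}, relative error={err['mre']:.2f}")
```

In this toy data the high-case cluster has the larger absolute error while the heavily underreported cluster has the larger relative error, which is why an aggregate absolute metric alone can hide the disparity.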

The study reveals that demographic-aware bias evaluation is essential for responsible epidemiological modeling. It emphasizes the importance of using relative error and daily error progression, rather than just aggregate absolute metrics, to uncover latent disparities in model performance. This work contributes to the understanding of fairness in epidemic forecasting and provides a foundation for developing bias-mitigation strategies in future modeling efforts.

Table of Contents

This table of contents is under embargo until 22 November 2025

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Degree
Submission
Language
  • English
Keywords
Committee Chair / Thesis Advisor
Committee Members
Last modified
Preview image embargoed

Primary PDF

Supplemental Files