Amplification of Demographic Bias in Epidemiological Forecasting (Public)

Cai, Lupin (Spring 2025)

Permanent URL: https://etd.library.emory.edu/concern/etds/3b591b276?locale=zh

Published

Abstract

Forecasting infectious disease outbreaks is critical for timely public health responses, yet predictive models are often trained on biased data that reflect real-world disparities in data collection and reporting. This thesis investigates how such bias can be amplified across different forecasting models, including traditional time series models (ARIMA), graph-based deep learning models (CoLA-GNN), and epidemiologically structured neural networks (SIR-NN). We evaluate model performance on both a synthetic simulation dataset and real-world COVID-19 case data from Georgia, introducing synthetic underreporting bias based on demographic features such as age, income, gender, and education.
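As a rough illustration of the synthetic underreporting described above, the sketch below scales a region's true case counts by a reporting rate derived from a single demographic feature. The function names, the income thresholds, and the linear mapping are assumptions made for this example; the thesis's actual bias-injection procedure may differ.

```python
# Minimal sketch of demographic-driven underreporting (hypothetical parameters).

def reporting_rate(median_income, low=20_000, high=100_000, min_rate=0.5):
    """Assumed mapping: lower income gives a lower reporting rate in [min_rate, 1]."""
    frac = (median_income - low) / (high - low)
    frac = max(0.0, min(1.0, frac))  # clamp to [0, 1]
    return min_rate + (1.0 - min_rate) * frac


def underreport(true_cases, rate):
    """Scale each daily true count by the reporting rate and round to whole cases."""
    return [round(c * rate) for c in true_cases]


# Two made-up regions with identical true case counts but different incomes.
regions = {
    "A": {"median_income": 100_000, "true_cases": [100, 120, 150]},
    "B": {"median_income": 20_000, "true_cases": [100, 120, 150]},
}

reported = {
    name: underreport(r["true_cases"], reporting_rate(r["median_income"]))
    for name, r in regions.items()
}

for name, cases in reported.items():
    print(name, cases)
```

Under these assumed settings, region A is reported in full while region B reports only half of its true cases, so a model trained on the reported series sees a systematically smaller outbreak in region B.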

We apply clustering to group regions by demographic attributes and compare model performance using absolute and relative error metrics. Results show that while absolute error tends to be higher in less-biased clusters (with higher total cases), relative error consistently rises in clusters with greater underreporting, validating the hypothesis that models trained on biased data amplify disparities in forecasting accuracy. Notably, even advanced models like CoLA-GNN, which incorporate spatio-temporal dependencies, are not immune to this amplification. Across both datasets, the gap between reported and ground truth data correlates with poorer model performance in affected regions, highlighting a compounding effect of demographic bias over multi-step forecasting.
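The contrast between absolute and relative error can be sketched with a small example. The cluster labels, case counts, and forecasts below are invented for illustration and are not results from the thesis, and the relative error used here (total absolute error divided by total true cases) is one plausible definition, assumed for this sketch.

```python
# Sketch: absolute vs. relative error per cluster (all numbers are made up).

def mean_absolute_error(truth, forecast):
    """Average absolute difference between true and forecast counts."""
    return sum(abs(t - f) for t, f in zip(truth, forecast)) / len(truth)


def relative_error(truth, forecast):
    """Total absolute error divided by total true cases (assumed definition)."""
    return sum(abs(t - f) for t, f in zip(truth, forecast)) / sum(truth)


# Hypothetical clusters: one with many cases and little underreporting,
# one with few cases and heavy underreporting in its training data.
clusters = {
    "low_bias": {"truth": [1000, 1100, 1200], "forecast": [950, 1050, 1140]},
    "high_bias": {"truth": [100, 110, 120], "forecast": [70, 75, 80]},
}

metrics = {
    name: (
        mean_absolute_error(c["truth"], c["forecast"]),
        relative_error(c["truth"], c["forecast"]),
    )
    for name, c in clusters.items()
}

for name, (mae, rel) in metrics.items():
    print(f"{name}: MAE={mae:.1f}, relative error={rel:.3f}")
```

In this toy setup the low-bias cluster has the larger absolute error simply because its case counts are larger, while the high-bias cluster has the larger relative error, which is the pattern the abstract describes.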

The study reveals that demographic-aware bias evaluation is essential for responsible epidemiological modeling. It emphasizes the importance of using relative error and daily error progression, rather than just aggregate absolute metrics, to uncover latent disparities in model performance. This work contributes to the understanding of fairness in epidemic forecasting and provides a foundation for developing bias-mitigation strategies in future modeling efforts.
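Daily error progression can be examined by computing the relative error separately for each day of a multi-step forecast instead of averaging over the horizon. The following is a minimal sketch under assumed inputs; the function name and the numbers are hypothetical.

```python
# Sketch: per-day relative error over a multi-step forecast (made-up numbers).

def daily_relative_error(truth, forecast):
    """Relative error for each forecast day; true counts are assumed positive."""
    return [abs(t - f) / t for t, f in zip(truth, forecast)]


truth = [100, 110, 121]    # hypothetical true cases on forecast days 1 to 3
forecast = [95, 99, 100]   # hypothetical forecast trained on underreported data

progression = daily_relative_error(truth, forecast)

for day, err in enumerate(progression, start=1):
    print(f"day {day}: relative error = {err:.3f}")
```

An error that grows from one forecast day to the next, as in this toy series, is the kind of compounding effect that a single aggregate metric would hide.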

Table of Contents

Introduction
Background
Approach
Experiment
Analysis
Conclusion

About this Honors Thesis

Rights statement

Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.

School	Emory College
Department	Computer Science
Degree	B.S.
Submission	Honors Thesis
Language	English
Keywords	Machine Learning; Epidemiological Forecasting
Committee Chair / Thesis Advisor	Li Xiong, Emory University
Committee Members	Andreas Züfle, Emory University; Max Lau, Emory University

Primary PDF

Title	Date Uploaded
Amplification of Demographic Bias in Epidemiological Forecasting	2025-04-14 17:48:04 -0400
