Predictive value of cellphone geolocated mobility, vaccination, and social factors on COVID-19 mortality to provide foundational framework for predicting outcomes of future pandemics

Johnson, Erica (Spring 2023)

Permanent URL: https://etd.library.emory.edu/concern/etds/4f16c4230?locale=es
Published

Abstract

Relevance

Current research into the factors associated with COVID-19 mortality has shown that social distancing has a direct effect on the level of COVID-19 deaths [1]. To expand on this research, this study aims to determine the predictive value of cellphone geolocated mobility, vaccination, and social factors for COVID-19 mortality. Because these variables are most likely not linearly associated with mortality, a predictive model that allows for nonlinear relationships and can handle missing data and outliers should improve predictive performance.

Variables

This study assessed COVID-19 mortality as the main outcome. Mean movement aggregated to four categories (visits to K-12 schools, food service locations, points of public transportation, and grocery stores) in each county for each week was considered the exposure of interest in our models. We then added covariates for vaccination, population density, GDP level, level of urbanicity, household size, age, and political affiliation to address confounding of the effect of human movement on COVID-19 mortality.
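
The county-by-week aggregation described above can be sketched as follows. This is a minimal illustration, not the thesis's processing code: the record layout, the category labels, and the visit counts are all hypothetical, and the real SafeGraph data may be structured quite differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical visit records: (county, week, category, visit_count).
# The values are made up for illustration only.
records = [
    ("Fulton", "2020-W12", "grocery", 120),
    ("Fulton", "2020-W12", "grocery", 80),
    ("Fulton", "2020-W12", "k12_school", 10),
    ("DeKalb", "2020-W12", "grocery", 60),
]

def weekly_mean_movement(rows):
    """Average visit counts within each (county, week, category) group."""
    grouped = defaultdict(list)
    for county, week, category, visits in rows:
        grouped[(county, week, category)].append(visits)
    return {key: mean(values) for key, values in grouped.items()}

movement = weekly_mean_movement(records)
for key in sorted(movement):
    print(key, movement[key])
```

Each resulting value is one exposure measurement for one county, week, and location category, which is the unit the models above are described as using.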

Design

Data were gathered from all 159 counties in Georgia for dates ranging from March 2020 to March 2022 using SafeGraph, the CDC, GA state databases, and US Census data. After processing, the data were visualized using correlation graphs, histograms, and scatter plots to check for collinearity and possible associations between variables. These variables were then evaluated using both simple and expanded linear and Gradient Boosted Trees (GBT) models. Model statistics were examined to assess the models' performance and predictive ability.
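
To illustrate why a boosted-tree model can outperform a linear model when the relationship is nonlinear, here is a toy sketch. It is not the thesis's implementation: the data are synthetic (a U-shaped curve), the model uses a single feature, and the helper names (`fit_linear`, `fit_stump`, `fit_gbt`, `predict_gbt`, `mse`) as well as the round count and learning rate are assumptions chosen for illustration. It fits depth-1 trees (stumps) to residuals under squared-error loss and compares in-sample error only.

```python
from statistics import mean

def fit_linear(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def fit_stump(x, residuals):
    """Find the single split on x that best fits the residuals (squared error)."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip max so the right side is non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]

def fit_gbt(x, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with stumps: each round fits the current residuals."""
    base = mean(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        threshold, lv, rv = fit_stump(x, residuals)
        stumps.append((threshold, lv, rv))
        preds = [p + learning_rate * (lv if xi <= threshold else rv)
                 for xi, p in zip(x, preds)]
    return base, stumps, learning_rate

def predict_gbt(model, xi):
    """Sum the base value and the shrunken contribution of every stump."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(lv if xi <= t else rv for t, lv, rv in stumps)

def mse(actual, predicted):
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))

# Synthetic, clearly nonlinear data (U-shaped); not drawn from the study.
x = list(range(20))
y = [(xi - 10) ** 2 for xi in x]

slope, intercept = fit_linear(x, y)
linear_mse = mse(y, [slope * xi + intercept for xi in x])

model = fit_gbt(x, y)
gbt_mse = mse(y, [predict_gbt(model, xi) for xi in x])

print(f"linear MSE: {linear_mse:.1f}, GBT MSE: {gbt_mse:.1f}")
```

A real analysis would use many features, held-out data for evaluation, and an established GBT library rather than a hand-rolled one.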

Main Findings

We evaluated multiple models, each representing movement to locations within the categories in a different way, to identify the best way to include these data in future disease prediction models. We found little difference among the three ways of representing the geolocated data in the models' ability to accurately predict COVID-19 deaths. The GBT models significantly outperformed the linear regression models. Expanded GBT models, which considered all covariates and exposures, achieved a good R² value of around 0.6 with low MAE and RMAE values, indicating the high precision of this model.
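
For readers unfamiliar with the evaluation statistics mentioned above, the sketch below computes R² and MAE from paired actual and predicted values. It also reports RMSE, which is an assumption on our part: the abstract's "RMAE" is not defined on this page, and RMSE is simply a metric commonly reported alongside MAE. The function name `regression_metrics` and the sample numbers are hypothetical.

```python
from math import sqrt

def regression_metrics(actual, predicted):
    """Return R^2, mean absolute error, and root mean squared error."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,
        "rmse": sqrt(ss_res / n),  # assumption: stands in for the undefined "RMAE"
    }

# Made-up values for illustration; not results from the study.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

An R² near 1 means the predictions explain most of the variance in the outcome, while MAE and RMSE are in the outcome's own units, so lower values indicate predictions closer to the observed values.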

Table of Contents

Introduction

Burden of Disease

Natural History of COVID-19

Known Covariates of Infectious Diseases

Mitigating Measures of the COVID-19 Pandemic

COVID-19 Vaccination

Mobility & SafeGraph

Machine Learning in Epidemiology

Impact and Objectives

Methods

Exploration Framework and Context

Data Source & Processing

Outcome

Exposure

Other Covariates

Covariate Selection and Data Visualization

Model Selection and Training

Model Evaluations

Results:

Data Description & Visualization

Linear Models:

Gradient Boosted Trees (GBT) Model:

Discussion

Main Findings

Limitations

Strengths

Implications

Public Health Implications and Future Directions:

References:

Tables:

Table 1: COVID-19 mortality was used as an outcome variable

Table 2: SafeGraph mobility data to four defined location types provided exposure values

Table 3: Three types of movement variables

Table 4: Covariates which were identified as potential confounders

Table 5: Linear Models Evaluation Statistics

Table 6: GBT Model performance of different movement variable data types

Figures:

Figure 1: Study Roadmap

Figure 2: Assessment of the effect of lagging the movement in GBT models

Figure 3: Histogram of COVID-19 deaths per 1000 individuals in the top 9 Georgia Counties

Figure 4: Histogram of COVID-19 deaths per 1000 individuals in the bottom 9 Georgia Counties

Figure 5: Comprehensive exploratory graph on all variables

Figure 6: Variable Pearson correlation heat map

Figure 7: Frame captured from animated maps

Figure 8: Absolute difference between predictive and actual for each model

Appendix 1: Online locations of data

Appendix 2: GitHub Repository

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Subfield / Discipline
Degree
Submission
Language
  • English
Research Field
Keyword
Committee Chair / Thesis Advisor
Committee Members
Last modified

Primary PDF

Supplemental Files