Predictive value of cellphone geolocated mobility, vaccination, and social factors on COVID-19 mortality to provide foundational framework for predicting outcomes of future pandemics Open Access
Johnson, Erica (Spring 2023)
Abstract
Relevance
Current research into the factors associated with COVID-19 mortality has shown that social distancing has a direct effect on the level of COVID-19 deaths [1]. To expand on this research, this study aims to determine the predictive value of cellphone geolocated mobility, vaccination, and social factors for COVID-19 mortality. Because the effects of these variables are most likely not linearly associated with mortality, using a predictive model that allows for nonlinear relationships and can handle missing data and outliers should improve the model's predictive performance.
Variables
This study assessed COVID-19 mortality as the main outcome. Mean movement, aggregated into four categories (visits to K-12 schools, visits to food service locations, visits to points of public transportation, and visits to grocery stores) in each county for each week, was considered the exposure of interest in our models. We then added covariates for vaccination, population density, GDP level, level of urbanicity, household size, age, and political affiliation to address confounding of the effect of human movement on COVID-19 mortality.
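As an illustration of the exposure described above, the following minimal sketch averages visit counts per county, week, and location category. The record layout, the category labels, and every number are hypothetical placeholders; they are not taken from the SafeGraph data or from the thesis code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical visit records: (county, week, category, visit_count).
# Field names and values are invented for illustration only.
records = [
    ("Fulton", "2020-W12", "grocery", 120),
    ("Fulton", "2020-W12", "grocery", 80),
    ("Fulton", "2020-W12", "k12_school", 10),
    ("DeKalb", "2020-W12", "grocery", 60),
    ("DeKalb", "2020-W13", "food_service", 40),
    ("DeKalb", "2020-W13", "food_service", 20),
]


def mean_movement(rows):
    """Return {(county, week, category): mean visit count}."""
    groups = defaultdict(list)
    for county, week, category, visits in rows:
        groups[(county, week, category)].append(visits)
    return {key: mean(values) for key, values in groups.items()}


weekly = mean_movement(records)
for key in sorted(weekly):
    print(key, weekly[key])
```

Each resulting key identifies one county-week-category cell, which is the granularity at which the abstract describes the exposure.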
Design
Data were gathered from all 159 counties in Georgia for dates ranging from March 2020 to March 2022 using SafeGraph, the CDC, GA state databases, and US Census data. After processing, the data were visualized using correlation graphs, histograms, and scatter plots to check for collinearity and possible associations between variables. These variables were then evaluated using both simple and expanded versions of linear and Gradient Boosted Trees (GBT) models. Model statistics were examined to assess the models' performance and predictive ability.
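To make the model comparison concrete, the sketch below fits a one-variable linear regression and a small gradient-boosted ensemble of decision stumps to synthetic nonlinear data, then reports R², MAE, and RMSE on held-out points. It is an illustration only: the data, the hyperparameters (`n_trees`, `learning_rate`), and the hand-written booster are assumptions, and the thesis does not state which GBT implementation or settings were used.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_stump(xs, residuals):
    """Best single-split regression stump under squared error."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    sx = [xs[i] for i in order]
    sr = [residuals[i] for i in order]
    total, n = sum(sr), len(sr)
    best, left_sum = None, 0.0
    for k in range(1, n):
        left_sum += sr[k - 1]
        if sx[k] == sx[k - 1]:
            continue  # cannot split between identical feature values
        right_sum = total - left_sum
        # Maximising this gain is equivalent to minimising squared error.
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if best is None or gain > best[0]:
            threshold = (sx[k - 1] + sx[k]) / 2
            best = (gain, threshold, left_sum / k, right_sum / (n - k))
    return best[1], best[2], best[3]


def fit_gbt(xs, ys, n_trees=200, learning_rate=0.2):
    """Gradient boosting for squared loss: each stump fits the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + learning_rate * (lv if x <= t else rv)
                 for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(
            lv if x <= t else rv for t, lv, rv in stumps)
    return predict


def metrics(ys, preds):
    """R^2, mean absolute error, and root mean squared error."""
    n = len(ys)
    my = sum(ys) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    mae = sum(abs(y - p) for y, p in zip(ys, preds)) / n
    return {"r2": 1 - ss_res / ss_tot, "mae": mae,
            "rmse": math.sqrt(ss_res / n)}


# Synthetic, clearly nonlinear data (not the thesis data).
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(300)]
ys = [(x - 5) ** 2 + rng.gauss(0, 1) for x in xs]
x_train, y_train, x_test, y_test = xs[:200], ys[:200], xs[200:], ys[200:]

linear = fit_line(x_train, y_train)
gbt = fit_gbt(x_train, y_train)
lin_scores = metrics(y_test, [linear(x) for x in x_test])
gbt_scores = metrics(y_test, [gbt(x) for x in x_test])
print("linear:", lin_scores)
print("gbt:   ", gbt_scores)
```

On data like this, a straight line cannot follow the curved relationship, while the boosted stumps approximate it piecewise, which is the kind of gap the evaluation statistics are meant to expose.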
Main Findings
Multiple models, each using a different way of representing movement to locations within the categories, were evaluated to identify the best way to include these data in future disease prediction models. We found little difference among the three ways we represented geolocated data in their effect on the model’s ability to accurately predict COVID-19 deaths. The GBT models significantly outperformed the linear regression models. Expanded GBT models, which considered all covariates and exposures, showed a good R² value of around 0.6 with low MAE and RMAE values, indicating the high precision of this model.
Table of Contents
Introduction
Burden of Disease
Natural History of COVID-19
Known Covariates of Infectious Diseases
Mitigating Measures of the COVID-19 Pandemic
COVID-19 Vaccination
Mobility & SafeGraph
Machine Learning in Epidemiology
Impact and Objectives
Methods
Exploration Framework and Context
Data Source & Processing
Outcome
Exposure
Other Covariates
Covariate Selection and Data Visualization
Model Selection and Training
Model Evaluations
Results
Data Description & Visualization
Linear Models
Gradient Boosted Trees (GBT) Model
Discussion
Main Findings
Limitations
Strengths
Implications
Public Health Implications and Future Directions
References
Tables:
Table 1: COVID-19 mortality was used as an outcome variable
Table 2: SafeGraph mobility data to four defined location types provided exposure values
Table 3: Three types of movement variables
Table 4: Covariates which were identified as potential confounders
Table 5: Linear Models Evaluation Statistics
Table 6: GBT Model performance of different movement variable data types
Figures:
Figure 1: Study Roadmap
Figure 2: Assessment of the effect of lagging the movement in GBT models
Figure 3: Histogram of COVID-19 deaths per 1000 individuals in the top 9 Georgia Counties
Figure 4: Histogram of COVID-19 deaths per 1000 individuals in the bottom 9 Georgia Counties
Figure 5: Comprehensive exploratory graph on all variables
Figure 6: Variable Pearson correlation heat map
Figure 7: Frame captured from animated maps
Figure 8: Absolute difference between predictive and actual for each model
Appendix 1: Online locations of data
Appendix 2: GitHub Repository
About this Master's Thesis
Primary PDF
Title | Date Uploaded |
---|---|
Predictive value of cellphone geolocated mobility, vaccination, and social factors on COVID-19 mortality to provide foundational framework for predicting outcomes of future pandemics | 2023-04-20 14:16:41 -0400 |