Application and Interpretation of Dengue Fever Diagnostic and Prognostic Modeling in Yucatan, Mexico With Random Forest and Logistic Regression

Corbett, Patrick (Spring 2021)

Permanent URL: https://etd.library.emory.edu/concern/etds/w6634480s?locale=en%5D
Published

Abstract

Early identification of patients with dengue, and of patients at risk of progressing to severe forms of the disease, is important for the timely application of potentially life-saving therapies. The current WHO 2009 clinical classification is highly sensitive for detecting dengue but not very specific, which could lead to oversaturation of hospitals in outbreak scenarios. Machine learning methods such as random forest have the potential to supplement clinical classification systems for detecting dengue. In this paper, we apply both random forest and logistic regression to diagnostic and prognostic modeling of dengue disease in Yucatan, Mexico. We also describe the study population with a short descriptive analysis of its demographic and geospatial characteristics. Our results indicate that both models perform relatively well when modeling severe dengue versus non-severe dengue and severe dengue versus all other febrile illnesses, but they do not perform well when modeling dengue versus other febrile illnesses. We found that logistic regression performed slightly better than random forest for the severe model groups, but results were mixed for the dengue versus non-dengue models. Furthermore, we found that when the severe versus non-severe model is applied to novel years and to other Mexican states, model performance decreases, which challenges the applicability of the model to external populations. We conclude with a discussion of the potential applications and interpretability of random forest models in the clinical setting.
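The trade-off between sensitivity and specificity mentioned in the abstract, and the decision-threshold tuning listed in the table of contents, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names, the toy labels and probabilities, and the use of Youden's J statistic as the selection criterion are hypothetical and are not taken from the thesis, which may tune its thresholds differently.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when predicting 1 for p >= threshold."""
    tp = fn = tn = fp = 0
    for label, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1  # true positive: case correctly flagged
        elif label == 1:
            fn += 1  # false negative: case missed
        elif predicted == 0:
            tn += 1  # true negative: non-case correctly cleared
        else:
            fp += 1  # false positive: non-case flagged
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def best_threshold(
    y_true: Sequence[int], y_prob: Sequence[float], candidates: List[float]
) -> float:
    """Pick the candidate maximizing Youden's J = sensitivity + specificity - 1.

    Youden's J is an assumed criterion for this sketch only.
    """
    return max(
        candidates,
        key=lambda t: sum(sensitivity_specificity(y_true, y_prob, t)) - 1.0,
    )


# Toy validation data (hypothetical): 1 = severe dengue, 0 = non-severe.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

low = sensitivity_specificity(y_true, y_prob, 0.2)   # sensitive, not specific
mid = sensitivity_specificity(y_true, y_prob, 0.5)
chosen = best_threshold(y_true, y_prob, [0.2, 0.5, 0.7])
```

On this toy data, a low threshold catches every case at the cost of many false positives, which mirrors the concern about a highly sensitive but non-specific classification saturating hospitals during an outbreak.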

Table of Contents

Introduction
Background on the Disease
Epidemiology and Burden of Disease
Table 1: Cases, Hospitalizations, and Deaths Across Years in Mexico
Background on WHO and Modeling Classification
Table 2: WHO Classification Validation Study Results
Table 3: WHO Classification Validation Study Results
Methods
Data Source
Descriptive Statistics
Data Preparation and Statistical Analysis
Figure 1: Data Processing and Modeling Structure
Model Framework
Modeling Groups
Model Agnostic Explanations
Results
Descriptive Statistics
Table 4: Cases, Hospitalizations, and Deaths Across Years in Yucatan, Mexico
Figure 2: Confirmed Dengue Diagnosis Across All Months (2008-2019)
Figure 3: Confirmed Dengue From 2016-2019 Disease According to 2009 WHO Classification
Figure 4: Space-Time Spatial Distribution and Clustering of Dengue in Yucatan, Mexico (2008-2019)
Severe Versus Non-Severe Dengue
Severe Dengue Versus All Else
Figure 6: Decision Threshold and Hyperparameter Tuning Against Validation Sets
Dengue Versus OFI (2016-2019)
Figure 7: Feature Importance Using Boruta Analysis and Final Tuned Random Forest
Dengue Versus OFI (2008-2019)
Table 4: Final Assessment Metrics Among Model Groups
Severe Versus Non-Severe Validation
Table 5: Assessment Metrics for Severe Vs Non-Severe Temporal and External Validation
LIME Analysis
Figure 8: LIME Estimation of Variable Feature Weights for Prediction of “Patient A”
Figure 9: LIME Feature Weights Across 13 Randomly Selected Patients in the Target Population
Discussion
Epidemiological and Clinical Features
Modeling Results
Interpretation of Random Forest in the Clinical Setting
Figure 10: Individual Severe Versus Non-Severe Random Forest Tree For “Patient A”
Conclusion
Limitations
Supplemental Section
Supplemental Table 1: Model Comparison Metrics
Supplemental Table 2: Distribution and Univariate Risk for Dengue Across Variables From 2008-2019
Supplemental Table 3: Distribution of Dengue Across Dengue Prognosis Outcomes From 2016-2019
Citations

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English