Application and Interpretation of Dengue Fever Diagnostic and Prognostic Modeling in Yucatan, Mexico With Random Forest and Logistic Regression

Corbett, Patrick (Spring 2021)

Early identification of patients with dengue, and of patients at risk of progressing to severe forms of the disease, is important for the timely application of potentially life-saving therapies. The current WHO 2009 clinical classification is highly sensitive for detecting dengue but has low specificity, which could lead to oversaturation of hospitals during outbreaks. Machine learning methods such as random forest have the potential to supplement clinical classification systems for detecting dengue. In this paper, we apply both random forest and logistic regression to diagnostic and prognostic modeling of dengue in Yucatan, Mexico. We also describe the study population with a short descriptive analysis of its demographic and geospatial characteristics. Our results indicate that both models perform relatively well when modeling severe dengue versus non-severe dengue and severe dengue versus all other febrile illnesses, but they do not perform well when modeling dengue versus other febrile illnesses. Logistic regression performed slightly better than random forest for the severe model groups, but results were mixed for the dengue versus non-dengue models. Furthermore, when we applied our severe versus non-severe model to years not used for model development and to other Mexican states, model performance decreased, which challenges the applicability of the model to external populations. We conclude with a discussion of the potential applications and interpretability of random forest models in the clinical setting.
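
The abstract's point about sensitivity and specificity can be made concrete with a small worked example. The sketch below is a hypothetical illustration, not part of the thesis analysis: the cohort size, dengue prevalence, and both classifiers' sensitivity and specificity are invented numbers, not values from the WHO 2009 validation studies or the Yucatan data, and the function name screen is introduced only for this example. It shows how a highly sensitive but poorly specific rule inflates the number of patients flagged for care.

```python
# Hypothetical illustration: how a highly sensitive but poorly specific
# classifier inflates the number of patients flagged during an outbreak.
# All counts and rates below are invented for illustration only.


def screen(n_dengue: int, n_other: int, sensitivity: float, specificity: float) -> dict:
    """Apply a classifier with the given sensitivity/specificity to a cohort.

    n_dengue: patients who truly have dengue
    n_other:  patients with other febrile illnesses (OFI)
    """
    tp = round(n_dengue * sensitivity)  # dengue cases correctly flagged
    fn = n_dengue - tp                  # dengue cases missed
    tn = round(n_other * specificity)   # OFI patients correctly cleared
    fp = n_other - tn                   # OFI patients wrongly flagged
    return {
        "true_positives": tp,
        "false_negatives": fn,
        "false_positives": fp,
        "flagged": tp + fp,             # everyone the rule sends onward for care
        "ppv": tp / (tp + fp),          # share of flagged patients who have dengue
    }


if __name__ == "__main__":
    # Hypothetical outbreak cohort: 1,000 febrile patients, 200 with dengue.
    sensitive = screen(200, 800, sensitivity=0.95, specificity=0.40)
    balanced = screen(200, 800, sensitivity=0.90, specificity=0.80)
    for name, result in [("high sensitivity, low specificity", sensitive),
                         ("more balanced", balanced)]:
        print(f"{name}: flagged={result['flagged']}, "
              f"missed={result['false_negatives']}, PPV={result['ppv']:.2f}")
```

With these invented numbers, the high-sensitivity rule flags 670 of 1,000 patients, of whom only about 28% have dengue, while the more balanced rule flags 340 (about 53% with dengue) at the cost of 10 additional missed cases. This trade-off illustrates why a more specific supplementary model could ease hospital load during outbreaks.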

Table of Contents

Introduction
  …on the Disease
  …and Burden of Disease
  Table 1: Cases, Hospitalizations, and Deaths Across Years in Mexico
  …on WHO and Modeling Classification
  Table 2: WHO Classification Validation Study Results
  Table 3: WHO Classification Validation Study Results
Methods
  Data Source
  …Statistics
  …Preparation and Statistical Analysis
  Figure 1: Data Processing and Modeling Structure
  …Framework
  …Groups
  …Agnostic Explanations
Results
  …Statistics
  Table 4: Cases, Hospitalizations, and Deaths Across Years in Yucatan, Mexico
  Figure 2: Confirmed Dengue Diagnosis Across All Months (2008-2019)
  Figure 3: Confirmed Dengue From 2016-2019 Disease According to 2009 WHO Classification
  Figure 4: Space-Time Spatial Distribution and Clustering of Dengue in Yucatan, Mexico (2008-2019)
  …Versus Non-Severe Dengue
  …Dengue Versus All Else
  Figure 6: Decision Threshold and Hyperparameter Tuning Against Validation Sets
  …Versus OFI (2016-2019)
  Figure 7: Feature Importance Using Boruta Analysis and Final Tuned Random Forest
  …Versus OFI (2008-2019)
  Table 4: Final Assessment Metrics Among Model Groups
  …Versus Non-Severe Validation
  Table 5: Assessment Metrics for Severe Vs Non-Severe Temporal and External Validation
  …Analysis
  Figure 8: LIME Estimation of Variable Feature Weights for Prediction of “Patient A”
  Figure 9: LIME Feature Weights Across 13 Randomly Selected Patients in the Target Population
Discussion
  …and clinical features
  …Results
  …of Random Forest in The Clinical Setting
  Figure 10: Individual Severe Versus Non-Severe Random Forest Tree For “Patient A”
Conclusion
Limitations
…Section
  Table 1: Model Comparison Metrics
  Table 2: Distribution and Univariate Risk for Dengue Across Variables From 2008-2019
  Table 3: Distribution of Dengue Across Dengue Prognosis Outcomes From 2016-2019
Citations

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Subfield / Discipline
  • English