Application and Interpretation of Dengue Fever Diagnostic and Prognostic Modeling in Yucatan, Mexico With Random Forest and Logistic Regression Open Access
Corbett, Patrick (Spring 2021)
Abstract
Early identification of patients with dengue, and of patients at risk of progressing to severe forms of the disease, is important for the timely application of potentially life-saving therapies. The current WHO 2009 clinical classification is highly sensitive for detecting dengue but not very specific, which could lead to oversaturation of hospitals in outbreak scenarios. Machine learning methods such as random forest have the potential to supplement clinical classification systems for detecting dengue. In this paper, we apply both random forest and logistic regression to diagnostic and prognostic modeling of dengue disease in Yucatan, Mexico. We also describe the study population with a short descriptive analysis of its demographic and geospatial characteristics within Yucatan, Mexico. Our results indicate that both models perform relatively well when modeling severe dengue versus non-severe dengue and severe dengue versus all other febrile illnesses, but they do not perform well when modeling dengue versus other febrile illnesses. We found that logistic regression performed slightly better than random forest for the severe model groups, but results were mixed for the dengue versus non-dengue models. Furthermore, we found that when our severe versus non-severe model is applied to novel years and to other Mexican states, model performance decreases, which challenges the applicability of the model to external populations. We conclude with a discussion of the potential applications and interpretability of random forest models in the clinical setting.
Table of Contents
Introduction
Background on the Disease
Epidemiology and Burden of Disease
Table 1: Cases, Hospitalizations, and Deaths Across Years in Mexico
Background on WHO and Modeling Classification
Table 2: WHO Classification Validation Study Results
Table 3: WHO Classification Validation Study Results
Methods
Data Source
Descriptive Statistics
Data Preparation and Statistical Analysis
Figure 1: Data Processing and Modeling Structure
Model Framework
Modeling Groups
Model Agnostic Explanations
Results
Descriptive Statistics
Table 4: Cases, Hospitalizations, and Deaths Across Years in Yucatan, Mexico
Figure 2: Confirmed Dengue Diagnosis Across All Months (2008-2019)
Figure 3: Confirmed Dengue From 2016-2019 Disease According to 2009 WHO Classification
Figure 4: Space-Time Spatial Distribution and Clustering of Dengue in Yucatan, Mexico (2008-2019)
Severe Versus Non-Severe Dengue
Severe Dengue Versus All Else
Figure 6: Decision Threshold and Hyperparameter Tuning Against Validation Sets
Dengue Versus OFI (2016-2019)
Figure 7: Feature Importance Using Boruta Analysis and Final Tuned Random Forest
Dengue Versus OFI (2008-2019)
Table 4: Final Assessment Metrics Among Model Groups
Severe Versus Non-Severe Validation
Table 5: Assessment Metrics for Severe Vs Non-Severe Temporal and External Validation
LIME Analysis
Figure 8: LIME Estimation of Variable Feature Weights for Prediction of “Patient A”
Figure 9: LIME Feature Weights Across 13 Randomly Selected Patients in the Target Population
Discussion
Epidemiological and Clinical Features
Modeling Results
Interpretation of Random Forest in the Clinical Setting
Figure 10: Individual Severe Versus Non-Severe Random Forest Tree For “Patient A”
Conclusion
Limitations
Supplemental Section
Supplemental Table 1: Model Comparison Metrics
Supplemental Table 2: Distribution and Univariate Risk for Dengue Across Variables From 2008-2019
Supplemental Table 3: Distribution of Dengue Across Dengue Prognosis Outcomes From 2016-2019
Citations
About this Master's Thesis
School | |
---|---|
Department | |
Subfield / Discipline | |
Degree | |
Submission | |
Language | |
Research Field | |
Keyword | |
Committee Chair / Thesis Advisor | |
Committee Members | |
Primary PDF
Title | Date Uploaded |
---|---|
Application and Interpretation of Dengue Fever Diagnostic and Prognostic Modeling in Yucatan, Mexico With Random Forest and Logistic Regression | 2021-05-03 19:45:28 -0400 |