Early Detection of Neonatal Infection in NICU Using Machine Learning Models
Chi, Zhuohan (Spring 2025)
Abstract
Neonatal infections remain a critical threat in intensive care settings, often progressing rapidly and silently within the first hours of admission. This study develops and evaluates an explainable machine learning framework for the early prediction of infection risk in neonates, using high-resolution data from the MIMIC-III database. Two time windows were explored, 30 and 120 minutes after ICU admission, and within each window physiological and hematological variables were aggregated and preprocessed. Missing data were systematically analyzed and imputed using iterative imputation, and a comprehensive set of classification models was compared using stratified five-fold cross-validation. CatBoost achieved the highest F1-score (0.7634) in the 30-minute window, while Gradient Boosting performed best in the 120-minute window (F1 ≈ 0.7983), reflecting the effect of data availability on predictive performance. Feature importance and SHAP analyses identified heart rate, white blood cell count, and temperature as key contributors to the predictions. These findings support a two-stage decision-support system that adapts to the clinical data available early and later after admission, potentially enabling more timely diagnosis and reducing neonatal morbidity and mortality.
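The modelling steps summarised above (iterative imputation of missing values, then a stratified five-fold comparison of classifiers by F1-score) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the thesis code: synthetic data from `make_classification` stands in for the aggregated MIMIC-III window features, the helpers `make_window_dataset` and `compare_models` are hypothetical names, and the model list omits CatBoost (a separate package) so the example stays self-contained.

```python
"""Sketch: iterative imputation + stratified five-fold model comparison by F1.

Assumptions (not taken from the thesis): synthetic data replaces the
MIMIC-III window features; the model list and settings are illustrative.
"""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (exposes IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_window_dataset(n_samples=300, n_features=8, missing_rate=0.2, seed=0):
    """Hypothetical stand-in for one time window's features, with values set to NaN at random."""
    X, y = make_classification(n_samples=n_samples, n_features=n_features,
                               n_informative=5, random_state=seed)
    rng = np.random.default_rng(seed)
    X = X.copy()
    X[rng.random(X.shape) < missing_rate] = np.nan  # simulate missing measurements
    return X, y


def compare_models(X, y, seed=0):
    """Return the mean F1-score of each model under stratified five-fold CV.

    The imputer is part of each pipeline, so it is refit on the training
    folds only and the held-out fold never influences the imputed values.
    """
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    # Stratification keeps the infected/non-infected ratio similar in every fold.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = {}
    for name, model in models.items():
        pipe = make_pipeline(IterativeImputer(random_state=seed),
                             StandardScaler(), model)
        scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="f1").mean()
    return scores


if __name__ == "__main__":
    X, y = make_window_dataset()
    results = compare_models(X, y)
    # Rank models from best to worst mean F1.
    for name, f1 in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: mean F1 = {f1:.4f}")
```

Fitting the imputer inside the cross-validation loop, rather than imputing the full dataset once beforehand, is the usual way to avoid optimistic scores; running the same comparison on each window's dataset would mirror the 30- and 120-minute analyses described above.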
Table of Contents
CHAPTER 1: INTRODUCTION
CHAPTER 2: METHOD
2.1 Data Source
2.2 Data Preprocessing
2.3 Missing Value Imputation
2.4 Model Selection
2.5 Hyperparameter Optimization
2.6 Model Selection
CHAPTER 3: RESULTS
3.1 Row Removal Threshold Result
3.2 Best Imputation Method
3.3 Best Machine Learning Model
3.4 Incremental Coverage Analysis
CHAPTER 4: DISCUSSION
4.1 Overview of Findings
4.2 Interpretations and Clinical Implications
4.3 Comparison with Existing Literature
4.4 Limitations
APPENDIX
Table 1: Baseline Characteristics of the Study Cohort
Table 2: Comparison between Different Infection Detection Methods
Figure 1: Research Pipeline
Figure 2: Missing Value Distribution
Figure 3: Top Five ROC Curves in Model Selection for the 30-Minute Dataset
Figure 4: Top Five ROC Curves in Model Selection for the 120-Minute Dataset
REFERENCES
About this Master's Thesis
Primary PDF: Early Detection of Neonatal Infection in NICU Using Machine Learning Models (uploaded 2025-04-25 09:57:18 -0400)