Early Detection of Neonatal Infection in NICU Using Machine Learning Models

Chi, Zhuohan (Spring 2025)

Permanent URL: https://etd.library.emory.edu/concern/etds/cn69m5672?locale=fr

Published

Abstract

Neonatal infections remain a critical threat in intensive care settings, often progressing rapidly and silently within the first hours of admission. This study develops and evaluates an explainable machine learning framework for early prediction of infection risk in neonates, using high-resolution data from the MIMIC-III database. Two time windows were explored, 30 and 120 minutes after ICU admission, during which physiological and hematological variables were aggregated and preprocessed. Missing data were systematically analyzed and imputed using Iterative Imputation, and a comprehensive set of classification models was compared using stratified five-fold cross-validation. CatBoost achieved the highest F1-score (0.7634) in the 30-minute window, while Gradient Boosting outperformed the other models in the 120-minute window (F1-score ≈ 0.7983), reflecting the impact of data availability on predictive performance. Feature importance and SHAP analysis identified heart rate, white blood cell count, and temperature as key contributors. These findings support a two-stage decision-support system that adapts to early and later clinical data, potentially improving timely diagnosis and reducing neonatal morbidity and mortality.
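The evaluation protocol named in the abstract (stratified five-fold cross-validation scored by F1) can be illustrated with a minimal, standard-library-only sketch. This is not the thesis code: the function names `stratified_folds`, `f1_score`, and `mean_fold_f1` are hypothetical, the labels are synthetic, and the abstract does not say which software the author used.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class ratios.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_fold_f1(y_true, y_pred, folds):
    """Average the F1-score computed separately on each held-out fold."""
    scores = [
        f1_score([y_true[i] for i in fold], [y_pred[i] for i in fold])
        for fold in folds
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Synthetic labels (assumption): 20 infected, 80 not infected.
    labels = [1] * 20 + [0] * 80
    folds = stratified_folds(labels, k=5, seed=1)
    for n, fold in enumerate(folds):
        positives = sum(labels[i] for i in fold)
        print(f"fold {n}: {len(fold)} samples, {positives} positive")
```

In a real pipeline, a model would be refit on the other four folds before each held-out fold is scored; the sketch only shows how the folds preserve the class ratio and how the per-fold F1 values are averaged.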

CHAPTER 1: INTRODUCTION

CHAPTER 2: METHOD

2.1 Data Source

2.2 Data Preprocessing

2.3 Missing Value Imputation

2.4 Model Selection

2.5 Hyperparameter Optimization

2.6 Model Selection

CHAPTER 3: RESULT

3.1 Row Removal Threshold Result

3.2 Best Imputation Method

3.3 Best Machine Learning Model

3.4 Incremental Coverage Analysis

CHAPTER 4: DISCUSSION

4.1 Overview of Findings

4.2 Interpretations and Clinical Implications

4.3 Comparison with Existing Literature

4.4 Limitation

APPENDIX

Table 1: Baseline characteristics of the study cohort

Table 2: Comparison between Different Infection Detection Methods

Figure 1: Research Pipeline

Figure 2: Missing Value Distribution

Figure 3: Top Five ROC Curve in Model Selection of 120 Minutes Dataset

Figure 4: Top Five ROC Curve in Model Selection of 120 Minutes Dataset

REFERENCE

About this Master's Thesis

Rights statement

Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.

School	Rollins School of Public Health
Department	Biostatistics
Subfield / Discipline	Biostatistics - MPH & MSPH
Degree	M.P.H.
Submission	Master's Thesis
Language	English
Research Field	Computer Science; Statistics; Health Sciences, General
Keyword	Machine Learning; MIMIC-III; Neonatal Infection; Early Prediction; CatBoost; NICU; Gradient Boosting
Committee Chair / Thesis Advisor	Zhaohui Qin, Emory University
Committee Members	Steve Pittard, Emory University

Last modified

Primary PDF

Title	Date Uploaded
Early Detection of Neonatal Infection in NICU Using Machine Learning Models	2025-04-25 09:57:18 -0400
