Predicting Rare Clinical Events in Complex and Dynamic Environments

Tabaie, Azade (Fall 2021)

Permanent URL: https://etd.library.emory.edu/concern/etds/8w32r692d?locale=fr

Abstract

Traditional machine learning classification algorithms assume a balanced proportion of classes in the data. However, class-imbalanced data pose a challenge for training predictive models in many fields, including the medical domain. Although adverse patient outcomes occur rarely, they are worth predicting to improve the quality of care that patients receive; monitoring systems are therefore needed in the hospital setting to capture rare adverse events and improve patient health outcomes.
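To make the imbalance problem concrete, here is a minimal sketch that is not taken from the thesis. The 1% event rate, the cohort size, and the always-negative classifier are illustrative assumptions. It shows how a model that never predicts the rare event can still reach high accuracy while detecting no events at all.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 marks the rare event."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(y_true, y_pred):
    """Fraction of true events that were detected (recall)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical cohort (assumption): 1,000 encounters, 10 with the adverse event.
y_true = [1] * 10 + [0] * 990
# Degenerate classifier (assumption): always predicts "no event".
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
sens = sensitivity(y_true, y_pred)
print(f"accuracy={acc:.2f}, sensitivity={sens:.2f}")
```

Under these assumptions accuracy looks excellent even though every event is missed, which is why metrics such as sensitivity and precision are usually more informative for rare outcomes.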

To that end, machine learning and natural language processing (NLP) techniques were used along with clinical expert knowledge to address the issue of rare event classification in a complex environment such as a hospital setting. In particular, two different patient cohorts with distinct characteristics and objectives were investigated. 

First, strategies were proposed to predict a rare type of infection among hospitalized children with central venous lines (CVLs). This cohort of pediatric patients is at high risk of morbidity and mortality from hospital-acquired infections. Many serious infections in hospitalized children are likely preventable through interventions that prevent the infection or identify it early so that antimicrobial therapy can be initiated. Besides bloodstream infection being a rare clinical event, the definitions that have been proposed for it commonly have inadequate sensitivity for clinically important infections and may be difficult to generalize across electronic health record (EHR) platforms. To infer the onset of the infection from the EHR and eliminate the need for extensive chart reviews, a surrogate definition for bloodstream infection was proposed and validated. Then, two study designs were tested to improve the prediction accuracy of the onset of the infection during hospitalization. Finally, a data fusion approach was undertaken to integrate structured and unstructured information from the EHR to boost prediction performance. Incremental but meaningful improvements in the predictions were observed after each step.

Second, an algorithm was proposed to monitor visits to an emergency department (ED) to detect intimate partner violence (IPV). IPV is a pervasive social challenge with severe health and demographic consequences. People experiencing IPV may seek care in emergency settings. Despite the urgency of this critical public health issue, IPV continues to be profoundly underdiagnosed and is considered a persistent hidden epidemic. IPV is frequently undercoded, undetected without appropriate screening tools, and underreported, rendering it a rare encounter in EHRs. The early and appropriate detection of and response to such cases is critical in disrupting the cycle of abuse, including IPV-related morbidity and mortality. Our proposed algorithm draws on NLP techniques and domain expert knowledge. It can identify victims of IPV with high precision by analyzing recorded provider notes and patient narratives.
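As a rough illustration of term-based note screening, the sketch below flags notes that contain any phrase from a term list and then computes precision against known labels. It is not the thesis's algorithm: the term list, the example notes, and the labels are all invented for illustration, and the actual situational terms are not reproduced here.

```python
import re

# Hypothetical term list (assumption), for illustration only.
ILLUSTRATIVE_TERMS = [
    "assaulted by partner",
    "hit by boyfriend",
    "domestic violence",
]


def build_pattern(terms):
    """Compile one case-insensitive pattern matching any whole-phrase term."""
    escaped = [re.escape(term) for term in terms]
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def flag_notes(notes, pattern):
    """Return one boolean per note: True if any term appears in the note."""
    return [bool(pattern.search(note)) for note in notes]


def precision(flags, labels):
    """Fraction of flagged notes that are true cases."""
    tp = sum(1 for f, l in zip(flags, labels) if f and l)
    fp = sum(1 for f, l in zip(flags, labels) if f and not l)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Synthetic notes and labels (assumptions), not real patient data.
notes = [
    "Patient reports she was hit by boyfriend last night.",
    "Fall from ladder at work, no other concerns.",
    "Screening negative for domestic violence.",
    "Pt states assaulted by partner, requests resources.",
]
labels = [True, False, False, True]

pattern = build_pattern(ILLUSTRATIVE_TERMS)
flags = flag_notes(notes, pattern)
prec = precision(flags, labels)
print(flags, round(prec, 2))
```

The third synthetic note is flagged even though it describes a negative screen. Plain phrase matching ignores negation, which is one reason expert-curated terms and additional rules matter when high precision is the goal.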

We argue that all the techniques incorporated in this thesis are transferable to identify other rare clinical events with the ultimate goal of improving the level of care.

Table of Contents

1 Introduction

1.1 Motivation

1.2 Aim of this thesis

1.3 Thesis outline

1.4 List of publications

2 Predicting Presumed Serious Infection among Hospitalized Children on Central Venous Lines with Machine Learning

2.1 Abstract

2.2 Introduction

2.3 Methods

2.4 Results

2.5 Discussion

3 Deep Learning Model to Predict Serious Infection among Children with Central Venous Lines

3.1 Abstract

3.2 Introduction

3.3 Material and Methods

3.3.1 Study Design

3.3.2 Outcome Definition

3.3.3 Feature Extraction

3.3.4 Window-Wise Study Design

3.3.5 Models

3.3.6 Performance Metrics

3.3.7 Model Explainability

3.3.8 Clinical Benchmark

3.4 Results

3.5 Discussion

4 A Machine Learning Pipeline for Integrating Structured and Unstructured Data for Timely Prediction of Bloodstream Infection among Children with Central Venous Lines

4.1 Abstract

4.2 Introduction

4.3 Material and Methods

4.3.1 Study Population

4.3.2 Outcome Definition

4.3.3 Data Preprocessing

4.3.4 Feature Extraction from Clinical Notes

4.3.5 Predictive Models

4.3.6 Statistical Analysis

4.4 Results

4.5 Discussion

5 A Novel Technique for Developing a Natural Language Processing Algorithm to Identify Intimate Partner Violence in a Hospital Setting

5.1 Abstract

5.2 Introduction

5.3 Methods

5.3.1 Study Population

5.3.2 Detecting IPV Cases

5.3.3 Data Preprocessing

5.3.4 NLP Algorithm Application

5.4 Results

5.4.1 Approach 1: ICD-9/ICD-10 Codes

5.4.2 Approaches 2 and 3: IPV Situational Terms and Extended Situational Terms

5.5 Discussion

6 Conclusion

6.1 Summary and Contributions

6.1.1 Serious Infection Prediction among Hospitalized Children

6.1.2 Identifying IPV Cases among the Visits to ED

6.2 Limitations

6.2.1 Serious Infection Prediction among Hospitalized Children

6.2.2 Identifying IPV Cases among the Visits to ED

6.3 Future work

Appendix A

A.1 Features

A.2 Inclusion Flowchart

A.3 Data Preprocessing

A.4 The predictive models

A.5 PRISM-III Score

A.6 PELOD-2 Score

A.7 Sensitivity Analysis

Appendix B

B.1 Data Preprocessing

B.2 Training/Validation/Testing Data Splits

B.3 The Input Structure

B.4 Model Specifications

B.5 Confidence Interval of Performance Metrics

B.6 PELOD-2 Score

B.7 PRISM-III score's components

B.8 Model performance on different patient race categories

B.9 Model performance on different patient insurance categories

B.10 Sensitivity Analysis

Appendix C

C.1 Clinical Notes

C.2 Data Preprocessing

C.2.1 Structured Data

C.2.2 Unstructured Data

C.3 Training/Validation/Testing Subsets of Data

C.4 The Input Structure

C.5 Model Specifications

C.6 Confidence Interval of Performance Metrics

C.7 PELOD-2 Score

C.8 PRISM-III Score

Appendix D

D.1 List of Abbreviations

Bibliography

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Degree
Submission
Language
  • English
Research Field
Keyword
Committee Chair / Thesis Advisor
Committee Members
Last modified
