A Predictive Random Forest Model on Hospital 30-Day Readmission using Electronic Health Records
Lin, Xia (2012)
Abstract
BACKGROUND: Previous studies have employed logistic regression to
predict readmission rates and to identify risk factors for
readmissions at hospitals. Hospital readmission rates remain
high.
OBJECTIVE: To classify patients from 10 diverse subpopulations at
Emory hospitals into groups with different 30-day readmission risks
using 5 years of electronic health records, and to validate the
applicability of Random Forest to hospital readmission
prediction.
METHODS: Information from 5 years of electronic health
records at all three Emory hospitals was aggregated into
categorical variables, and new variables capturing temporal features
were derived. Random Forest models with 10, 50, or 100
trees were constructed. Patients were ranked by the readmission
probabilities predicted by the Random Forest models and classified
into groups of different readmission
risks.
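The ranking step described in METHODS can be illustrated with a minimal sketch. It assumes the predicted readmission probabilities have already been produced by a fitted model, and it reuses the percentile cut points reported in RESULTS. The names `BANDS`, `percentile_ranks`, and `assign_risk_bands`, the band labels, and the tie-breaking rule are hypothetical choices made for illustration, not the thesis's implementation.

```python
from typing import List, Sequence

# Lower percentile bounds and labels for each risk band. The cut points
# follow the abstract; the label strings are illustrative assumptions.
BANDS = [
    (0.90, ">=90%"),
    (0.75, "75%-90%"),
    (0.50, "50%-75%"),
    (0.25, "25%-50%"),
    (0.10, "10%-25%"),
    (0.00, "<=10%"),
]


def percentile_ranks(probs: Sequence[float]) -> List[float]:
    """Return each patient's rank as a fraction in (0, 1].

    The patient with the highest predicted probability gets 1.0.
    Ties are broken by input order (an assumption of this sketch).
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    ranks = [0.0] * len(probs)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position / len(probs)
    return ranks


def assign_risk_bands(probs: Sequence[float]) -> List[str]:
    """Map each predicted probability to a risk band by percentile rank."""
    labels = []
    for rank in percentile_ranks(probs):
        for lower, label in BANDS:
            # A patient falls in the first band whose lower bound it exceeds.
            if rank > lower:
                labels.append(label)
                break
    return labels


if __name__ == "__main__":
    # Made-up probabilities for 20 hypothetical patients.
    demo_probs = [i / 100 for i in range(1, 21)]
    print(assign_risk_bands(demo_probs))
```

Only the relative order of the probabilities matters here, so the same banding applies unchanged to the output of a 10-, 50-, or 100-tree model.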
RESULTS: The risk ranking strategy using Random Forest models
successfully separated patients into different risk groups for all
10 subpopulations: cancer, chronic kidney disease, chronic
obstructive pulmonary disease, diabetes, heart failure, acute
myocardial infarction, pulmonary hypertension, sickle cell anemia,
stroke, and history of transplant. Misclassification rates were also
calculated for the top 10% (predicted as "readmitted") and bottom 10%
(predicted as "not readmitted") of patients in each subpopulation by
risk ranking. The models appear to be most effective for stroke
patients and least effective for transplant patients. For stroke
patients, the readmission rates of patients ranked in the
≥90%, 75%-90%, 50%-75%, 25%-50%, 10%-25%, and ≤10% bands were
55%, 13%, 11%, 5%, 3%, and 1%, respectively, compared to the
baseline readmission rate of 12%. For transplant patients, the
readmission rates for the same bands were 43%, 32%,
24%, 18%, 12%, and 15%, respectively, compared to the baseline
readmission rate of 23%.
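The two evaluation quantities reported in RESULTS, the observed readmission rate per risk band and the misclassification rate in the top and bottom 10%, can be sketched as follows. This is one plausible reading of the abstract rather than the thesis's code: it assumes a top-tail patient counts as misclassified when not readmitted and a bottom-tail patient counts as misclassified when readmitted. The function names and the example data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def band_readmission_rates(bands: Sequence[str],
                           readmitted: Sequence[int]) -> Dict[str, float]:
    """Observed readmission rate (0..1) within each risk band."""
    totals: Dict[str, int] = defaultdict(int)
    events: Dict[str, int] = defaultdict(int)
    for band, outcome in zip(bands, readmitted):
        totals[band] += 1
        events[band] += int(outcome)
    return {band: events[band] / totals[band] for band in totals}


def tail_misclassification(probs: Sequence[float],
                           readmitted: Sequence[int],
                           fraction: float = 0.10) -> Tuple[float, float]:
    """Return (top_error, bottom_error) for the ranked tails.

    Assumed definitions: the top `fraction` of patients by predicted
    probability is treated as predicted "readmitted", the bottom
    `fraction` as predicted "not readmitted".
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    k = max(1, int(len(probs) * fraction))  # at least one patient per tail
    bottom, top = order[:k], order[-k:]
    top_error = sum(1 for i in top if not readmitted[i]) / k
    bottom_error = sum(1 for i in bottom if readmitted[i]) / k
    return top_error, bottom_error


if __name__ == "__main__":
    # Made-up probabilities and outcomes for 20 hypothetical patients.
    demo_probs = [i / 100 for i in range(1, 21)]
    demo_outcomes = [0, 1] + [0] * 16 + [1, 0]
    print(tail_misclassification(demo_probs, demo_outcomes))
    print(band_readmission_rates(["low", "low", "high", "high"], [0, 1, 1, 1]))
```

Comparing each band's rate against the subpopulation's baseline rate, as the abstract does, shows how well the ranking separates high-risk from low-risk patients.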
Table of Contents
Introduction
Methods
Data Description
Outcome Variable
Risk Factors
Patients
Random Forest
Model evaluation and predictions
Results
Data collection
Derived variables
Descriptive statistics
Random Forest ranking and predictions
Discussion
Summary
References
Tables
Table 1. Entire Cohort Patient Characteristics
Table 2. Readmission Rate Distribution of Emory Hospitals
Table 3. Ranking of Readmission Risk by Random Forest (100 trees) Model, part I
Table 4. Ranking of Readmission Risk by Random Forest (100 trees) Model, part II
Table 5. Misclassification Rate by Random Forest with 10, 50, or 100 trees
Appendix
Table A.1 Derived variables used in readmissions analyses (all definitions are for inpatient encounter data)
Table A.2 Cancer Patient Characteristics
Table A.3 Chronic Kidney Disease Patient Characteristics
Table A.4 Chronic Obstructive Pulmonary Disease Patient Characteristics
Table A.5 Diabetes Patient Characteristics
Table A.6 Heart Failure Patient Characteristics
Table A.7 Acute Myocardial Infarction Patient Characteristics
Table A.8 Pulmonary Hypertension Patient Characteristics
Table A.9 Sickle Cell Anemia Patient Characteristics
Table A.10 Stroke Patient Characteristics
Table A.11 Transplant Patient Characteristics
Table A.12 Ranking of Readmission Risk by Random Forest (10 trees) Model, part I
Table A.13 Ranking of Readmission Risk by Random Forest (10 trees) Model, part II
Table A.14 Ranking of Readmission Risk by Random Forest (50 trees) Model, part I
Table A.15 Ranking of Readmission Risk by Random Forest (50 trees) Model, part II
Figure A.1 List of important variables in the RF models for patient subpopulations, part I
Figure A.2 List of important variables in the RF models for patient subpopulations, part II
Figure A.3 List of important variables in the RF models for patient subpopulations, part III
About this Master's Thesis