A Predictive Random Forest Model on Hospital 30-Day Readmission using Electronic Health Records

Lin, Xia (2012)

Permanent URL: https://etd.library.emory.edu/concern/etds/3t945r094?locale=es
Published

Abstract

BACKGROUND: Previous studies have employed logistic regression to predict readmission rates and to identify risk factors for readmissions at hospitals. Hospital readmission rates remain high.


OBJECTIVE: To classify patients from 10 diverse subpopulations at Emory hospitals into groups with different 30-day readmission risks using five years of electronic health records, and to validate the applicability of Random Forest to hospital readmission prediction.
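To make the outcome concrete, the following minimal Python sketch flags a stay as a 30-day readmission case when the same patient is admitted again within 30 days of discharge. The abstract does not give the thesis's exact outcome definition, so this rule, the function name `label_30_day_readmissions`, and the `(patient_id, admit_date, discharge_date)` layout are all assumptions made for illustration.

```python
from datetime import date


def label_30_day_readmissions(encounters, window_days=30):
    """Flag each inpatient stay that is followed by another admission
    for the same patient within `window_days` of discharge.

    `encounters` is a list of (patient_id, admit_date, discharge_date)
    tuples. This layout is hypothetical, not the thesis's schema.
    """
    by_patient = {}
    for patient_id, admit, discharge in encounters:
        by_patient.setdefault(patient_id, []).append((admit, discharge))

    labels = {}
    for patient_id, stays in by_patient.items():
        stays.sort()  # chronological order by admission date
        for i, (admit, discharge) in enumerate(stays):
            readmitted = False
            if i + 1 < len(stays):
                next_admit = stays[i + 1][0]
                gap = (next_admit - discharge).days
                readmitted = 0 <= gap <= window_days
            labels[(patient_id, admit)] = readmitted
    return labels


# Made-up encounters, for illustration only.
encounters = [
    ("p1", date(2010, 1, 1), date(2010, 1, 5)),
    ("p1", date(2010, 1, 20), date(2010, 1, 25)),  # 15 days after discharge
    ("p1", date(2010, 6, 1), date(2010, 6, 3)),    # well outside the window
    ("p2", date(2010, 3, 1), date(2010, 3, 4)),
]
labels = label_30_day_readmissions(encounters)
```

In this toy data only the first stay of patient `p1` is labeled as readmitted; the last stay of any patient is always labeled as not readmitted because no later admission is observed.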

METHODS: Five years of electronic health records from all three Emory hospitals were aggregated into categorical variables, and new variables capturing temporal features were derived. Random Forest models with 10, 50, or 100 trees were constructed. Patients were then ranked by the readmission probabilities predicted by the Random Forest models and classified into groups of different readmission risk.
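The ranking step can be sketched in Python as follows. This is not the thesis's code: it assumes predicted probabilities are already available from some fitted model, and the function names `rank_into_bands` and `tail_misclassification`, the band edges, and the toy data are illustrative choices. The band edges follow the rank bands reported in the results, and the tail errors follow the description of the top 10% as predicted "readmitted" and the bottom 10% as predicted "not readmitted".

```python
def rank_into_bands(probabilities, outcomes):
    """Group patients into rank bands by predicted risk and return the
    observed readmission rate in each band.

    probabilities: predicted readmission probability per patient
    outcomes: 1 if the patient was actually readmitted, else 0
    """
    # Band edges in percent of the cohort, from lowest to highest rank.
    edges = [0, 10, 25, 50, 75, 90, 100]
    names = ["<=10%", "10%-25%", "25%-50%", "50%-75%", "75%-90%", ">=90%"]

    # Patient indices sorted from lowest to highest predicted probability.
    order = sorted(range(len(probabilities)), key=lambda i: probabilities[i])
    n = len(order)

    rates = {}
    for name, lo, hi in zip(names, edges, edges[1:]):
        members = order[lo * n // 100:hi * n // 100]
        if members:
            rates[name] = sum(outcomes[i] for i in members) / len(members)
    return rates


def tail_misclassification(probabilities, outcomes, fraction_pct=10):
    """Error rates when the top tail is predicted 'readmitted' and the
    bottom tail is predicted 'not readmitted'."""
    order = sorted(range(len(probabilities)), key=lambda i: probabilities[i])
    k = max(1, len(order) * fraction_pct // 100)
    bottom, top = order[:k], order[-k:]
    # Top tail errors: patients who were not readmitted.
    top_error = sum(1 - outcomes[i] for i in top) / k
    # Bottom tail errors: patients who were readmitted.
    bottom_error = sum(outcomes[i] for i in bottom) / k
    return top_error, bottom_error


# Toy cohort of 20 patients with made-up probabilities and outcomes.
probabilities = [i / 20 for i in range(20)]
outcomes = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

rates = rank_into_bands(probabilities, outcomes)
top_error, bottom_error = tail_misclassification(probabilities, outcomes)
```

A model that separates risk well should show band rates that rise from the lowest band to the highest, with low error in both tails.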

RESULTS: The risk ranking strategy using Random Forest models successfully separated patients into different risk groups for all 10 subpopulations: cancer, chronic kidney disease, chronic obstructive pulmonary disease, diabetes, heart failure, acute myocardial infarction, pulmonary hypertension, sickle cell anemia, stroke, and history of transplant. Misclassification rates were also calculated for the top 10% of patients by risk ranking (predicted as "readmitted") and the bottom 10% (predicted as "not readmitted"). The models appear to be most effective for stroke patients and least effective for transplant patients. For stroke patients, the readmission rates of patients ranked at ≥90%, 75%-90%, 50%-75%, 25%-50%, 10%-25%, and ≤10% are 55%, 13%, 11%, 5%, 3%, and 1%, respectively, compared to the baseline readmission rate of 12%. For transplant patients, the readmission rates of patients ranked at ≥90%, 75%-90%, 50%-75%, 25%-50%, 10%-25%, and ≤10% are 43%, 32%, 24%, 18%, 12%, and 15%, respectively, compared to the baseline readmission rate of 23%.
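The reported band rates can be checked against the baselines with a little arithmetic, sketched below in Python. It assumes each rank band holds the share of the cohort implied by its label (10%, 15%, 25%, 25%, 15%, 10%), which the abstract does not state explicitly. The names `band_share` and `summarize` are illustrative.

```python
# Share of the cohort in each rank band (percent), highest band first.
# Assumption: band widths follow directly from the band labels.
band_share = [10, 15, 25, 25, 15, 10]

# Observed readmission rates per band (percent), as reported in the abstract.
reported = {
    "stroke": {"rates": [55, 13, 11, 5, 3, 1], "baseline": 12},
    "transplant": {"rates": [43, 32, 24, 18, 12, 15], "baseline": 23},
}


def summarize(rates, baseline):
    # The share-weighted mean of band rates should roughly recover
    # the baseline readmission rate of the subpopulation.
    implied = sum(s * r for s, r in zip(band_share, rates)) / 100
    # Lift: top-band readmission rate as a multiple of the baseline.
    top_lift = rates[0] / baseline
    return implied, top_lift


summary = {name: summarize(v["rates"], v["baseline"])
           for name, v in reported.items()}

for name, (implied, top_lift) in summary.items():
    print(f"{name}: implied baseline {implied:.1f}%, top-band lift {top_lift:.2f}x")
```

Under that assumption the weighted means come to 12.0% for stroke and 22.9% for transplant, in line with the reported baselines of 12% and 23%. The top band reaches about 4.6 times the baseline for stroke but only about 1.9 times for transplant, and the transplant rates are not monotonic at the low end (15% in the lowest band versus 12% in the band above it), which is consistent with the statement that the models are least effective for transplant patients.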

Table of Contents


Introduction
Methods
  Data Description
  Outcome Variable
  Risk Factors
  Patients
  Random Forest
  Model evaluation and predictions
Results
  Data collection
  Derived variables
  Descriptive statistics
  Random Forest ranking and predictions
Discussion
Summary
References
Tables
  Table 1. Entire Cohort Patient Characteristics
  Table 2. Readmission Rate Distribution of Emory Hospitals
  Table 3. Ranking of Readmission Risk by Random Forest (100 trees) Model, part I
  Table 4. Ranking of Readmission Risk by Random Forest (100 trees) Model, part II
  Table 5. Misclassification Rate by Random Forest with 10, 50, or 100 trees
Appendix
  Table A.1 Derived variables used in readmissions analyses (all definitions are for inpatient encounter data)
  Table A.2 Cancer Patient Characteristics
  Table A.3 Chronic Kidney Disease Patient Characteristics
  Table A.4 Chronic Obstructive Pulmonary Disease Patient Characteristics
  Table A.5 Diabetes Patient Characteristics
  Table A.6 Heart Failure Patient Characteristics
  Table A.7 Acute Myocardial Infarction Patient Characteristics
  Table A.8 Pulmonary Hypertension Patient Characteristics
  Table A.9 Sickle Cell Anemia Patient Characteristics
  Table A.10 Stroke Patient Characteristics
  Table A.11 Transplant Patient Characteristics
  Table A.12 Ranking of Readmission Risk by Random Forest (10 trees) Model, part I
  Table A.13 Ranking of Readmission Risk by Random Forest (10 trees) Model, part II
  Table A.14 Ranking of Readmission Risk by Random Forest (50 trees) Model, part I
  Table A.15 Ranking of Readmission Risk by Random Forest (50 trees) Model, part II
  Figure A.1 List of important variables in the RF models for patient subpopulations, part I
  Figure A.2 List of important variables in the RF models for patient subpopulations, part II
  Figure A.3 List of important variables in the RF models for patient subpopulations, part III

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Subfield / Discipline
Degree
Submission
Language
  • English
Research Field
Keyword
Committee Chair / Thesis Advisor
Partnering Agencies
Last modified
