Identification of Kidney Transplant Recipients at High-Risk for Post-Transplant Hospitalization using Natural Language Processing
Arenson, Michael (Spring 2019)
Abstract
Rehospitalization after kidney transplant is a common and preventable problem that is costly to patients and healthcare systems and is associated with poor outcomes. Epidemiological evidence suggests that up to 50% of surgical readmissions may be preventable (e.g., through discharge planning, patient education, and/or follow-up communication). Predictive analytics have previously been used to identify patients at risk of rehospitalization, with limited success.
The vast amount of free-text data in clinical notes in the electronic medical record (EMR) remains untapped in the field of kidney transplantation. To date, EMR free-text clinical notes have not been included in predictive models of 30-day rehospitalization (30DR) after kidney transplant. Free text is a form of unstructured data, meaning any source of data that is not easily placed in a traditional numeric dataset. Analyzing free text requires natural language processing (NLP), a subfield of artificial intelligence that uses computer algorithms to analyze human language. Here, NLP was used to analyze EMR free-text documentation of kidney transplant recipients, with the ultimate goal of reducing readmission after kidney transplant.
This was a retrospective observational analysis of first-time kidney transplant recipients at a large institution in the Southeast between January 2005 and December 2015. Both structured data and unstructured data, in the form of clinical notes written in the EMR, were analyzed. Eight clinical notes were characterized and mined for possible new features that might improve the predictive accuracy of models of 30DR after kidney transplant. Predictive models using unstructured, free-text clinical notes were built using unsupervised machine-learning approaches. These models did not meaningfully improve predictive accuracy over structured data alone. However, the results generated a number of new hypotheses regarding potentially novel predictors to be examined in future research applying more human-driven approaches.
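As a rough illustration of how free-text notes can be turned into numeric features, the sketch below computes TF-IDF weights (the weighting named in the Figure 6 title) in plain Python. It is not the thesis's implementation: the abstract does not specify the tooling, the helper names `tokenize` and `tf_idf` are hypothetical, and the three toy notes are invented for the example and do not come from the study data.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in tokens if token]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(number of documents / documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Count how many documents each term appears in.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, made-up snippets standing in for clinical notes.
toy_notes = [
    "patient has strong family support at home",
    "patient reports limited support and missed clinic visits",
    "operative course uncomplicated, patient stable",
]

scores = tf_idf(toy_notes)

# Three highest-weighted terms per note.
top_terms = [sorted(s, key=s.get, reverse=True)[:3] for s in scores]
```

With this weighting, a word that appears in every note (here, "patient") receives a weight of zero, while words concentrated in a single note are weighted most heavily. Weights like these could then serve as candidate predictors alongside structured variables.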
Table of Contents
INTRODUCTION
BACKGROUND
METHODS
RESULTS
DISCUSSION
CONCLUSIONS
REFERENCES
TABLES / FIGURES
FIGURE 1: CONCEPTUAL MOCK-UP OF CLINICAL DASHBOARD IDENTIFYING KIDNEY TRANSPLANT PATIENTS AT HIGH-RISK OF 30-DAY READMISSION
FIGURE 2: LIST OF ALL STRUCTURED DATA VARIABLES AND ALL UNSTRUCTURED DATA SOURCES BY TIME COLLECTED IN TRANSPLANT PROCESS
FIGURE 3: TRADITIONAL CONCATENATION VS. ENSEMBLE LOGISTIC REGRESSION METHOD
FIGURE 4: INCLUSION AND EXCLUSION FLOWCHART
TABLE 1: BASELINE CHARACTERISTICS OF KIDNEY TRANSPLANT RECIPIENTS FROM EMORY TRANSPLANT CENTER, STRATIFIED BY READMISSION WITHIN 30 DAYS POST-TRANSPLANT, 2005–2015
FIGURE 5: FREQUENCY OF WORDS IN THREE TYPES OF NOTES GRAPHED BY 30DR VS. NON-30DR PATIENTS
TABLE 2: MOST COMMON WORDS THAT PRECEDE THE WORD "SUPPORT" AMONGST ALL NOTES AS IDENTIFIED BY TERM FREQUENCY
FIGURE 6: TOP 30 TF-IDF FEATURES FOR OPERATIVE, SELECTION CONFERENCE, AND SOCIAL WORK NOTES
FIGURE 7: TOP 20 TERMS IN TOPIC MODEL USING LDA WITH K=8 TOPICS
FIGURE 8: GAMMA FOR EACH TOPIC BY NOTE TYPE
TABLE 3: INDIVIDUAL CLINICAL NOTES ADDED TO STRUCTURED VARIABLES TO CREATE PREDICTIVE MODEL
TABLE 4: ADDING MULTIPLE NOTE TYPES TO PREDICTIVE MODELS FOR HOSPITAL READMISSION AFTER KIDNEY TRANSPLANTS
TABLE 5: RANKING TOP PREDICTIVE FEATURES FOR HIGHER READMISSION OF KIDNEY TRANSPLANT RECIPIENTS (2005–2015) IN HIGHEST PERFORMING PREDICTIVE MODEL FROM TABLE 4
About this Master's Thesis
School | |
---|---|
Department | |
Degree | |
Submission | |
Language | |
Research Field | |
Keyword | |
Committee Chair / Thesis Advisor | |
Primary PDF
Thumbnail | Title | Date Uploaded | Actions |
---|---|---|---|
| Identification of Kidney Transplant Recipients at High-Risk for Post-Transplant Hospitalization using Natural Language Processing | 2019-05-02 07:31:45 -0400 | |