Online Learning Based Clinical Information Extraction and Classification

Zheng, Shuai (2015)

Permanent URL: https://etd.library.emory.edu/concern/etds/zw12z5859?locale=fr

Abstract

To enable the research use of clinical reports, pertinent data must be extracted from narrative medical reports. Traditional automated approaches are brittle and cannot use user interactions as feedback to improve the extraction algorithm in real time. In this dissertation, we present IDEAL-X, an interactive system based on online machine learning that addresses some key shortcomings of existing systems. IDEAL-X provides a standard interface that can be used for simple data extraction and data entry. It is unique, however, in its ability to transparently analyze users' interactions with a small number of reports and quickly learn from them the desired values for the data fields. Additional user feedback (accepting or editing system-generated values) incrementally refines the decision model in real time, which further reduces the user's burden in processing subsequent reports. Extensive experiments in multiple use cases show that the system achieves high accuracy in data extraction with minimal effort from users. The system also accepts predefined domain knowledge, in the form of controlled vocabularies, to improve the efficiency and accuracy of data extraction. It also contains components for standardizing and querying extracted values. Moreover, an online learning based classification module can be used to support clinical decision making. We report successful applications of IDEAL-X to extract data at Emory Cardiology and Pathology and at the Centers for Disease Control and Prevention.
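The abstract describes a loop in which the system proposes a value, the user accepts or edits it, and each confirmed answer immediately updates the model. As an illustration only, that loop can be sketched in Python with a multinomial Naïve Bayes classifier, a model whose word counts can be updated one example at a time. This is not IDEAL-X's actual extraction or classification engine: the class name `OnlineNaiveBayes`, the whitespace tokenizer, the Laplace smoothing constant `alpha`, and the two toy reports and labels are all hypothetical assumptions made for the sketch.

```python
import math
from collections import Counter, defaultdict


class OnlineNaiveBayes:
    """Hypothetical multinomial Naive Bayes updated one example at a time.

    Illustrates online learning from user feedback; not the IDEAL-X code.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant (assumed)
        self.class_counts = Counter()            # number of confirmed reports per label
        self.word_counts = defaultdict(Counter)  # per-label token frequencies
        self.vocab = set()                       # all tokens seen so far

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase, then split on whitespace.
        return text.lower().split()

    def update(self, text, label):
        """Learn from one user-accepted or user-corrected example."""
        tokens = self.tokenize(text)
        self.class_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def predict(self, text):
        """Return the most probable label, or None before any feedback."""
        if not self.class_counts:
            return None
        tokens = self.tokenize(text)
        total = sum(self.class_counts.values())
        v = len(self.vocab)  # smoothing uses only the vocabulary seen so far
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + self.alpha * v
            score = math.log(n / total)  # log prior
            for t in tokens:
                # Smoothed log likelihood; unseen tokens get count 0 + alpha.
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Simulated feedback loop with made-up reports and labels.
model = OnlineNaiveBayes()
feedback = [
    ("no significant stenosis in the lad", "negative"),
    ("severe stenosis of the proximal lad", "positive"),
]
for report, user_label in feedback:
    suggestion = model.predict(report)  # value the system would propose
    # Whether the user accepts or edits the suggestion, the confirmed label is learned.
    model.update(report, user_label)

print(model.predict("severe stenosis noted"))  # → positive
```

Because each update only increments counts, learning from a new report costs time proportional to its length and never requires retraining on earlier reports, which is the property that makes real-time refinement from user feedback feasible in this kind of model.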

Table of Contents

1 Introduction

1.1 Motivation

1.1.1 Clinical Data

1.1.2 Structured Reporting

1.1.3 Clinical Data Processing

1.1.4 Traditional Approaches

1.1.5 Online Learning Based System

1.2 Research Contribution

1.3 Potential Usages

2 Background and Related Work

2.1 Background

2.2 Important Problems of Clinical NLP

2.3 Related Work

2.3.1 Clinical Information Extraction

2.3.2 Clinical Decision Support

2.3.3 Online Machine Learning

2.3.4 Contextual Feature Extraction

3 IDEAL-X: The User Perspective

3.1 Goals

3.2 Human-Computer Interaction

3.2.1 Interface Design

3.2.2 Basic Operations

3.2.3 Interactions

3.2.4 Query Interface and Engine

4 Online Learning Based Clinical Information Extraction

4.1 Online Machine Learning

4.1.1 Online Learning Overview

4.1.2 Online Learning Based Information Extraction and Classification

4.2 System Design Overview

4.3 Document Preprocessing

4.4 The Data Extraction Engine

4.4.1 Answer Prediction

4.4.2 Learning

5 Controlled Vocabulary Supported Clinical Information Extraction

5.1 Construction of Controlled Vocabularies

5.2 Supporting Information Extraction

5.3 Supporting Standardization

5.4 Generating Vocabulary

5.4.1 Adaptive Vocabulary

5.4.2 Vocabulary Generating Tool

5.4.3 Discussion

6 Clinical Information Extraction Use Cases

6.1 Use Case 1: Large Scale Cohort Identification for Cardiology Research

6.1.1 Background and Motivation

6.1.2 Experiment Setup

6.1.3 Performance Evaluation

6.1.4 Discussion

6.2 Use Case 2: Support Patient Search on Pathology Reports

6.2.1 Background and Motivation

6.2.2 Experiment Setup

6.2.3 Performance Evaluation

6.2.4 Discussion

6.3 Use Case 3: Information Extraction Supported Disease Surveillance

6.3.1 Background and Motivation

6.3.2 Experiment Setup

6.3.3 Performance Evaluation

6.3.4 Discussion

6.4 Summary

7 Online Learning Based Clinical Information Classification

7.1 Motivations and Goals

7.2 Challenges

7.3 System Architecture

7.4 Algorithms

7.4.1 Naïve Bayesian

7.4.2 Neural Network

7.4.3 k-Nearest Neighbors

7.5 Results

7.6 Conclusion

8 Conclusion and Future Work

8.1 Conclusion

8.2 Future Work

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English