Predicting Disease Comorbidity by Mining Large Text Corpora
Askew IV, Walter Scott (2009)
Abstract
Natural language processing techniques have a variety of applications in public health. This paper discusses a method for predicting whether two diseases are frequently comorbid. It presents a system that applies previous work on computing word similarity from textual information to the prediction of disease comorbidity. The work rests on the assumption that the rate of comorbidity between two diseases is reflected in the linguistic similarity of their cooccurrences. Perhaps most excitingly, the paper demonstrates that corpora such as web forums provide useful data for training the system. The ability to mine web-based sources for new medical information has many exciting implications for public health: the web could be used to monitor disease trends and epidemic outbreaks, and to uncover new medical knowledge directly from disease sufferers. The evaluation shows that the system performs above baseline levels in predicting the frequency of comorbidity between diseases.
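The assumption stated in the abstract can be illustrated with a minimal sketch. This is not the thesis's system: the window-based context counts, the cosine measure, the function names `context_counts` and `cosine`, and the toy sentences are all assumptions chosen only to show how linguistic similarity between two disease terms might be computed from the words they cooccur with.

```python
import math
from collections import Counter


def context_counts(docs, term, window=5):
    """Count the words that appear within `window` tokens of `term`.

    Hypothetical helper: whitespace tokenization and a fixed window
    are simplifying assumptions, not the method used in the thesis.
    """
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            if tok != term:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Add every neighbor in the window except the term itself.
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy sentences invented for illustration only; they are not from any corpus.
docs = [
    "my anxiety gets worse with insomnia at night",
    "depression and insomnia at night are exhausting",
    "asthma flares up during exercise",
]

anxiety = context_counts(docs, "anxiety")
depression = context_counts(docs, "depression")
asthma = context_counts(docs, "asthma")

print(cosine(anxiety, depression))  # terms sharing context words score higher
print(cosine(anxiety, asthma))      # terms with disjoint contexts score zero
```

In a setup like this, the similarity score for a disease pair would serve as one input feature for a classifier that predicts whether the pair is frequently comorbid; how the thesis actually builds and tunes its features is described in its Methodology chapter.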
Table of Contents
1 Introduction
1.1 Rationale
2 Methodology
2.1 Indexing
2.2 Counting Cooccurrence
2.3 Similarity Calculation
2.4 Tuning
2.5 Machine Learning
2.6 Classifier Training
3 Evaluation
3.1 Evaluation Metrics
3.2 Corpora
3.3 Ground Truth
4 Results
4.1 Medline Corpus Cross-Validation
4.2 Psych Forums Corpus Cross-Validation
4.3 Classifiers Trained On NCSR Truth Data and Validated on CHIS 2005 Truth Data
5 Discussion
6 Conclusion
About this Honors Thesis
School | |
---|---|
Department | |
Degree | |
Submission | |
Language | |
Research Field | |
Keyword | |
Committee Chair / Thesis Advisor | |
Committee Members | |
Primary PDF
Title | Date Uploaded |
---|---|
Predicting Disease Comorbidity by Mining Large Text Corpora | 2018-08-28 13:33:46 -0400 |
Supplemental Files
Title | Date Uploaded |
---|---|
tuning.pdf | 2018-08-28 13:33:59 -0400 |
mybib.bib | 2018-08-28 13:34:08 -0400 |
flow.pdf | 2018-08-28 13:34:14 -0400 |
front.doc | 2018-08-28 13:34:20 -0400 |