Machine Learning Methods for Biomedical Keyphrase Extraction Public
Gero, Zelalem (Fall 2021)
Abstract
Due to the increased generation and digitization of text documents on the Internet and in digital libraries, automated methods that can improve search, discovery, and mining of the vast body of literature are essential. Efficient automated methods that extract keywords to retrieve the salient concepts of a document have been shown to be of paramount importance in text analysis, document summarization, topic detection, and recommendation systems, among others. Various machine learning approaches have been proposed to solve the problem of keyword extraction, but the results still lag behind those of other tasks such as document classification. The task of keyword extraction in the biomedical domain is even more daunting, since the literature is highly domain-specific and general methods do not translate well. To deal with these problems, we propose: 1) an unsupervised extraction method based on phrase embeddings and a modified PageRank algorithm, which converges faster and performs better than related baseline methods; 2) a deep learning method that pays more attention to words that are central to the document's semantics; 3) a semi-supervised deep learning approach that harnesses vastly available unannotated biomedical data to improve keyword extraction based on uncertainty estimation; and 4) an encoder-decoder based extraction method for Medical Subject Heading (MeSH) indexing.
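The first contribution combines phrase embeddings with a PageRank-style ranking. The abstract does not specify the modification or the embedding model, so the following is only a minimal sketch of the general idea, assuming standard weighted PageRank over a cosine-similarity graph of candidate phrases. The function names, the damping value, and the three-dimensional toy vectors are all hypothetical and are not taken from the dissertation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_phrases(embeddings, damping=0.85, tol=1e-8, max_iter=200):
    """Rank candidate phrases with standard weighted PageRank.

    Assumption: edge weights are cosine similarities clipped at zero.
    This is the textbook algorithm, not the modified variant the
    abstract refers to.
    """
    phrases = list(embeddings)
    n = len(phrases)
    if n == 0:
        return []
    # Dense similarity graph without self-loops.
    w = [[0.0 if i == j else max(0.0, cosine(embeddings[p], embeddings[q]))
          for j, q in enumerate(phrases)]
         for i, p in enumerate(phrases)]
    out = [sum(row) for row in w]
    score = [1.0 / n] * n
    for _ in range(max_iter):
        # Mass held by nodes with no outgoing weight is spread uniformly.
        dangling = sum(score[i] for i in range(n) if out[i] == 0)
        new = []
        for j in range(n):
            incoming = sum(score[i] * w[i][j] / out[i]
                           for i in range(n) if out[i] > 0)
            new.append((1 - damping) / n + damping * (incoming + dangling / n))
        delta = sum(abs(a - b) for a, b in zip(new, score))
        score = new
        if delta < tol:
            break
    return sorted(zip(phrases, score), key=lambda item: -item[1])

# Hypothetical toy embeddings; real phrase vectors would come from a trained model.
toy = {
    "gene expression": [1.0, 0.2, 0.0],
    "protein expression": [0.9, 0.3, 0.1],
    "rna sequencing": [0.7, 0.6, 0.1],
    "hospital parking": [0.0, 0.1, 1.0],
}
ranked = rank_phrases(toy)
for phrase, value in ranked:
    print(f"{phrase}: {value:.3f}")
```

In this toy graph the phrase whose vector is dissimilar to the others receives the lowest score, which is the behavior a centrality-based ranker is meant to produce: phrases that are semantically close to many other candidates are treated as more representative of the document.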
Table of Contents
1 Introduction
1.1 What Constitutes a Keyphrase
1.2 Contributions
1.3 Outline
2 Unsupervised Keyphrase Extraction
2.1 Introduction
2.1.1 Related Work
2.1.2 Graph-based Methods
2.2 Proposed Model: NamedKeys
2.2.1 Candidate Keyphrase Generation
2.2.2 Phrase Embedding: PMCVec
2.2.3 Phrase Quality
2.2.4 Candidate Clustering and Ranking
2.3 Experiments
2.3.1 Dataset
2.3.2 Baseline Methods
2.3.3 Conclusion
3 Supervised Keyphrase Extraction
3.1 Introduction
3.2 Related Work
3.3 Methodology
3.3.1 Word Embedding Layer
3.3.2 BiLSTM Layer
3.3.3 Centrality Weighting Layer
3.3.4 Conditional Random Fields (CRF)
3.4 Experiments
3.4.1 Datasets
3.4.2 Experimental Settings
3.4.3 Results
3.4.4 Conclusion
4 Semi-Supervised Keyphrase Extraction
4.1 Introduction
4.2 Related Work
4.3 Methodology
4.3.1 BiLSTM-CRF Architecture
4.3.2 Self-training and Uncertainty Estimation
4.4 Experiments
4.4.1 Datasets
4.4.2 Experimental Settings
4.4.3 Evaluation Results
5 MeSH Indexing: Keyphrase Extraction from Controlled Vocabulary
5.1 Introduction
5.2 Related Work
5.3 Proposed Model: Encoder-Decoder with RL for MeSH Indexing
5.3.1 Encoder
5.3.2 Decoder
5.3.3 Reinforcement Learning for seq2seq Training
5.3.4 Conditional Random Fields (CRF)
5.4 Experimental Results
5.4.1 Dataset
5.4.2 Evaluation and Results
5.5 Conclusion
6 Conclusion and Future Work
7 Bibliography
About this Dissertation
School | |
---|---|
Department | |
Degree | |
Submission | |
Language | |
Research Field | |
Keywords | |
Committee Chair / Thesis Advisor | |
Committee Members | |
Primary PDF
Title | Date Uploaded |
---|---|
Machine Learning Methods for Biomedical Keyphrase Extraction | 2021-10-19 11:49:19 -0400 |