Improving Biomedical Abstract Screening Using Contrastive Learning (Public)
Li, Tiantian (Spring 2023)
Abstract
Systematic reviews are a crucial tool for evidence-based medicine: they identify and synthesize published medical literature to inform prevention and intervention strategies. However, identifying the relevant articles to include is labor-intensive and time-consuming. Although automating the screening process using abstracts has been proposed, performance remains suboptimal. Contrastive learning has achieved great success in computer vision but has not been used to expedite the systematic review process. In this thesis, we propose a new method that uses an autoencoder trained with a contrastive loss to generate vector representations of abstracts. We apply data augmentation techniques to the abstracts and train the autoencoder so that the representations of an anchor and its positive sample are closer in vector space than those of the anchor and a negative sample. Our experiments suggest that contrastive learning can help filter out irrelevant articles during the abstract screening phase.
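The anchor/positive/negative objective described in the abstract can be illustrated with a triplet-style margin loss. The following is only a minimal sketch under stated assumptions: the abstract does not specify the exact loss, distance function, or margin used in the thesis, so the Euclidean distance, the `margin` value, the function names, and the toy vectors below are all hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss (assumed form, not taken from the thesis).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin`; otherwise it grows linearly.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D "abstract representations" (hypothetical values).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # e.g. an augmented view of the anchor abstract
negative = [3.0, 4.0]   # e.g. an abstract with a different label

# Positive is close and negative is far: the constraint is already met.
loss_satisfied = triplet_loss(anchor, positive, negative)

# Swapping the roles violates the constraint and yields a positive loss.
loss_violated = triplet_loss(anchor, negative, positive)
```

In a training loop, a loss of this kind would be minimized over the encoder's parameters, which pulls anchor and positive representations together and pushes negatives away.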
Table of Contents
1 Introduction
2 Background
2.1 Automating Systematic Reviews
2.1.1 Word Embeddings
2.1.2 Deep Learning Models
2.2 Contrastive Representation Learning
3 CTRL-Screener
3.1 Problem Formulation
3.2 Model Architecture
3.2.1 Abstract Representation
3.2.2 Contrastive Autoencoder
3.2.3 Classification
3.3 Implementation Details
3.3.1 Data augmentation
4 Experiment setup
4.1 Datasets
4.2 Evaluation
4.2.1 Baseline Methods
4.2.2 Metrics
4.2.3 Evaluation Strategy
4.2.4 Hyperparameter tuning of CTRL-Screener
5 Results
5.1 Results
5.2 Effect of Relative Order
5.3 Effect of Sampling Percentage
6 Conclusion
6.1 Conclusion
6.2 Future work
6.2.1 Additional datasets
6.2.2 Comparison to other data augmentation techniques
6.2.3 Different levels of automation
6.2.4 Continual learning
Bibliography
About this Honors Thesis
School | |
---|---|
Department | |
Degree | |
Submission | |
Language | |
Research Field | |
Keywords | |
Committee Chair / Thesis Advisor | |
Committee Members | |
Primary PDF
Thumbnail | Title | Date Uploaded | Actions |
---|---|---|---|
 | Improving Biomedical Abstract Screening Using Contrastive Learning | 2023-04-18 01:19:39 -0400 | |