Medaboost — An Improved Ensemble Learning Algorithm in Classification with Multiple Annotations
Dong, Yilin (Spring 2020)
Abstract
Classification algorithms build models that classify new observations based on their features. These algorithms require a training set of sample features and labels, but in practice many datasets do not meet this requirement. Because having experts assign labels manually is costly, many industries have adopted crowdsourcing, which enables a group of people to contribute to the same labeling task. However, annotations from multiple annotators cannot be used directly by classification algorithms, which assume that each sample has a single, consensus label. In this paper, we use truth inference methods to estimate a single label for each sample from the differing annotations of multiple annotators. While the Expectation-Maximization method provides the best accuracy, our empirical results suggest that better predictive performance can be achieved by accounting for disagreements. Thus, we propose Medaboost, a new predictive model that considers the degree of disagreement between annotators to improve predictive performance. Medaboost outperforms AdaBoost on both a synthetic dataset and the MIMIC-III dataset under different sets of simulated nurses.
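As an illustration only, reducing several annotations to one label can be sketched with majority voting, the simplest of the truth inference methods listed in the table of contents. The function name `majority_vote`, the disagreement score (the fraction of annotators who differ from the majority label), and the toy data are assumptions made for this sketch; they are not taken from the thesis and do not reproduce Medaboost.

```python
from collections import Counter


def majority_vote(annotations):
    """Return (label, disagreement) for one sample's list of annotator labels.

    disagreement is a hypothetical score: the fraction of annotators
    whose label differs from the majority label (0.0 = full agreement).
    """
    counts = Counter(annotations)
    # most_common(1) gives the most frequent label; ties go to the
    # label that was encountered first.
    label, votes = counts.most_common(1)[0]
    disagreement = 1.0 - votes / len(annotations)
    return label, disagreement


# Toy data (assumed): each sample has binary labels from several annotators.
samples = {
    "s1": [1, 1, 1],
    "s2": [1, 0, 1],
    "s3": [0, 0, 1, 0],
}

results = {sid: majority_vote(labels) for sid, labels in samples.items()}
for sid, (label, disagreement) in results.items():
    print(sid, label, round(disagreement, 3))
```

A score of 0 means that all annotators agree. A downstream model could, for example, use such a score to treat contested samples differently, but how Medaboost itself uses disagreement is defined in the thesis, not here.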
Table of Contents
1 Introduction
1.1 A Case Study on Pressure Ulcers
2 Background
2.1 Truth Inference
2.1.1 Problem Formulation
2.1.2 Majority Voting (MV)
2.1.3 Weighted Majority Algorithm (WM)
2.1.4 Expectation-Maximization (EM)
2.2 Adaboost: An Ensemble Classifier
3 Medaboost
3.1 Estimation of Sample Labels
3.2 A Robust Prediction Model
4 Experiment Setup
4.1 Data Description
4.1.1 Synthetic Dataset
4.1.2 MIMIC-III Critical Care Database
4.2 Evaluation
4.2.1 Training, Validation, and Test Data
4.2.2 Truth Inference Methods
4.2.3 Prediction Models
4.2.4 Evaluation Metrics
5 Result
5.1 Performance of truth inference methods
5.1.1 Synthetic Dataset
5.1.2 MIMIC-III Critical Care Database
5.2 Robustness and advantage of Medaboost
5.2.1 Synthetic Dataset
5.2.2 MIMIC-III Critical Care Database
6 Conclusions
Bibliography
About this Honors Thesis
School | |
---|---|
Department | |
Degree | |
Submission | |
Language | |
Research Field | |
Keyword | |
Committee Chair / Thesis Advisor | |
Committee Members | |
Primary PDF
Title | Date Uploaded |
---|---|
Medaboost — An Improved Ensemble Learning Algorithm in Classification with Multiple Annotations | 2020-04-07 17:09:35 -0400 |