Medaboost — An Improved Ensemble Learning Algorithm in Classification with Multiple Annotations

Dong, Yilin (Spring 2020)

Permanent URL: https://etd.library.emory.edu/concern/etds/1831ck968?locale=it

Published

Abstract

Classification algorithms build models that classify new observations based on their features. These algorithms require a training set of sample features and labels, but in practice many datasets do not meet this requirement. Because manual labeling by experts is costly, many industries have adopted crowdsourcing, which lets a group of people contribute to the same labeling task. However, annotations from multiple annotators cannot be fed directly to classification algorithms, which assume that each sample has a single, consensus label. In this paper, we use truth inference methods to estimate a single label for each sample from the differing annotations of multiple annotators. While the Expectation-Maximization method provides the best accuracy, our empirical results suggest that better predictive performance can be achieved by accounting for disagreements. Thus, we propose Medaboost, a new predictive model that considers the degree of disagreement between annotators to improve predictive performance. Medaboost outperforms AdaBoost on both a synthetic dataset and the MIMIC-III dataset under different sets of simulated nurses.
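As a rough illustration of what truth inference means here, the sketch below applies majority voting (one of the methods listed in the thesis outline) to a few samples with several annotations each. It is not the thesis's implementation and not Medaboost itself: the function name `majority_vote`, the example data, and the `disagreement` score (the fraction of annotators who did not pick the winning label) are assumptions made for this example only.

```python
from collections import Counter


def majority_vote(annotations):
    """Infer one label from a list of annotator labels.

    Returns (label, disagreement), where disagreement is the share of
    annotators who did not choose the winning label. This score is a
    hypothetical measure, not the definition used in the thesis.
    Ties go to the label that appears first in the list.
    """
    counts = Counter(annotations)
    label, top_count = counts.most_common(1)[0]
    disagreement = 1 - top_count / len(annotations)
    return label, disagreement


# Hypothetical data: each inner list holds the labels that different
# annotators gave to the same sample.
samples = [[1, 1, 1], [1, 0, 1], [0, 1, 0, 1]]

for annotations in samples:
    label, disagreement = majority_vote(annotations)
    print(annotations, "->", label, f"(disagreement {disagreement:.2f})")
```

A downstream classifier could, in principle, use a score like this to treat contested samples differently from unanimous ones, which is the general idea the abstract describes.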

Table of Contents

1 Introduction
1.1 A Case Study on Pressure Ulcers
2 Background
2.1 Truth Inference
2.1.1 Problem Formulation
2.1.2 Majority Voting (MV)
2.1.3 Weighted Majority Algorithm (WM)
2.1.4 Expectation-Maximization (EM)
2.2 Adaboost: An Ensemble Classifier
3 Medaboost
3.1 Estimation of Sample Labels
3.2 A Robust Prediction Model
4 Experiment Setup
4.1 Data Description
4.1.1 Synthetic Dataset
4.1.2 MIMIC-III Critical Care Database
4.2 Evaluation
4.2.1 Training, Validation, and Test Data
4.2.2 Truth Inference Methods
4.2.3 Prediction Models
4.2.4 Evaluation Metrics
5 Result
5.1 Performance of truth inference methods
5.1.1 Synthetic Dataset
5.1.2 MIMIC-III Critical Care Database
5.2 Robustness and advantage of Medaboost
5.2.1 Synthetic Dataset
5.2.2 MIMIC-III Critical Care Database
6 Conclusions
Bibliography

About this Honors Thesis

Rights statement

Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.

School	Emory College
Department	Computer Science
Degree	B.A.
Submission	Honors Thesis
Language	English
Research Field	Computer Science
Keyword	Machine Learning; Data Mining; Classification
Committee Chair / Thesis Advisor	Joyce Ho, Emory University
Committee Members	Roberto Franzosi, Emory University; Davide Fossati, Emory University

Last modified

Primary PDF

Title	Date Uploaded
Medaboost — An Improved Ensemble Learning Algorithm in Classification with Multiple Annotations	2020-04-07 17:09:35 -0400
