FriendsQA: Open-Domain Question Answering Dataset on TV Show Transcripts

Yang, Zhengzhe (Spring 2019)

Permanent URL: https://etd.library.emory.edu/concern/etds/4q77fs51r?locale=de

Abstract

This thesis presents FriendsQA, a challenging question answering dataset containing 1,222 dialogues and 10,610 open-domain questions, designed to advance machine comprehension of everyday conversations. Each dialogue, involving multiple speakers, is annotated with six types of questions {what, when, why, where, who, how} about the dialogue context, and each answer is annotated as a contiguous span in the dialogue. A series of crowdsourcing tasks is conducted to ensure high annotation quality, resulting in an inter-annotator agreement of 81.82%. A comprehensive annotation analysis is provided for a deeper understanding of this dataset. Three state-of-the-art QA systems, R-Net, QANet, and BERT, are adapted and evaluated on this dataset. BERT in particular shows promising results, with an accuracy of 74.2% for answer utterance selection and an F1 score of 64.2% for answer span selection, suggesting that the FriendsQA task is challenging yet has great potential to elevate QA research on multiparty dialogue to another level.
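For concreteness, below is a minimal sketch of the two evaluation measures named above, assuming the standard SQuAD-style token-level F1 for answer span selection and exact-match accuracy for answer utterance selection; the record in the demo is hypothetical, and the names used (token_f1, evaluate, utterance ids such as u3) are illustrative, not the thesis's actual code.

    # A minimal sketch, not the thesis's implementation: SQuAD-style scoring
    # for FriendsQA-style predictions of (utterance id, answer span) pairs.
    from collections import Counter

    def token_f1(prediction: str, gold: str) -> float:
        # Token-level F1 between a predicted and a gold answer span.
        pred_tokens = prediction.lower().split()
        gold_tokens = gold.lower().split()
        common = Counter(pred_tokens) & Counter(gold_tokens)
        num_same = sum(common.values())
        if num_same == 0:
            return 0.0
        precision = num_same / len(pred_tokens)
        recall = num_same / len(gold_tokens)
        return 2 * precision * recall / (precision + recall)

    def evaluate(predictions, golds):
        # predictions/golds: parallel lists of (utterance_id, answer_text).
        # Utterance accuracy: predicted utterance id matches the gold one.
        # Span F1: token overlap between predicted and gold answer spans.
        pairs = list(zip(predictions, golds))
        correct = sum(p_uid == g_uid for (p_uid, _), (g_uid, _) in pairs)
        f1_sum = sum(token_f1(p_span, g_span) for (_, p_span), (_, g_span) in pairs)
        n = len(golds)
        return {"utterance_accuracy": correct / n, "span_f1": f1_sum / n}

    # Hypothetical example: a "who" question answered with a partial span.
    gold = [("u3", "Rachel taught Joey how to sail")]
    pred = [("u3", "Rachel taught Joey")]
    print(evaluate(pred, gold))  # {'utterance_accuracy': 1.0, 'span_f1': 0.666...}

Under this scheme a model can select the correct utterance yet still receive only partial span F1, which is why the abstract reports the two numbers separately.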

Table of Contents

1 Introduction

2 Background
  2.1 QA Datasets
  2.2 QA Systems
  2.3 Character Mining
  2.4 FriendsQA vs. Other Dialogue QA

3 The Corpus
  3.1 FriendsQA Dataset
  3.2 Crowdsourcing
  3.3 Phase 1: Question-Answer Generation
  3.4 Quality Assurance
  3.5 Phase 2: Verification and Paraphrasing
  3.6 Four Rounds of Annotation
  3.7 Question/Answer Pruning
  3.8 Inter-annotator Agreement
  3.9 Question Types vs. Answer Categories

4 Approach
  4.1 R-Net
  4.2 QANet
  4.3 BERT

5 Experiments
  5.1 Model Development
  5.2 Evaluation Metrics
  5.3 Results
  5.4 Error Analysis

6 Conclusion

7 Appendix: Complete Results

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Degree
Submission
Language
  • English
Research Field
Keyword
Committee Chair / Thesis Advisor
Committee Members
Last modified
