Character Identification on Multi-party Dialogues Open Access

Chen, Yu-Hsin (Henry) (2017)

Permanent URL: https://etd.library.emory.edu/concern/etds/qr46r1039?locale=en

Published

Abstract

This thesis introduces a subtask of entity linking, called character identification, that maps mentions in multiparty conversation to their referent characters. Transcripts of TV shows are collected as the source of our corpus and automatically annotated with mentions by linguistically motivated rules. These mentions are manually linked to their referents and disambiguated with abstract referent labels through crowdsourcing. Our annotated corpus comprises 448 scenes from 46 episodes across 2 seasons of the TV show Friends, and shows an inter-annotator agreement of κ = 79.96. For statistical modeling, this task is reformulated as coreference resolution, and two state-of-the-art systems are evaluated on our corpus. A novel mention-to-mention ranking model is proposed to provide better mention and mention-pair representations learned from feature groupings of dialogue-specific features. After linking coreferent clusters to their referent entities with our proposed rule-based remapping algorithm, the best model gives a purity score of 57.27% on average, which is promising given the challenging nature of this task and our corpus.
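
As an illustration of the purity score mentioned in the abstract, the sketch below computes the standard cluster purity measure: each predicted cluster is credited with the size of its largest gold-entity group, and the credits are divided by the total number of mentions. This is the common textbook definition and may differ from the exact formulation used in the thesis. The function name `cluster_purity`, the mention ids, and the toy labels are all hypothetical.

```python
from collections import Counter


def cluster_purity(clusters, gold):
    """Standard cluster purity (assumed definition, not taken from the thesis).

    clusters: list of lists of mention ids (predicted coreferent clusters)
    gold:     dict mapping each mention id to its gold entity label
    """
    total = sum(len(cluster) for cluster in clusters)
    if total == 0:
        return 0.0
    majority_sum = 0
    for cluster in clusters:
        if not cluster:
            continue  # an empty cluster contributes nothing
        label_counts = Counter(gold[mention] for mention in cluster)
        # Credit the cluster with the size of its largest gold-entity group.
        majority_sum += label_counts.most_common(1)[0][1]
    return majority_sum / total


# Toy example with made-up mention ids and character labels.
example_clusters = [["m1", "m2", "m3"], ["m4", "m5"]]
example_gold = {"m1": "Ross", "m2": "Ross", "m3": "Monica",
                "m4": "Rachel", "m5": "Rachel"}
example_purity = cluster_purity(example_clusters, example_gold)
```

In this toy case the first cluster contributes 2 of its 3 mentions and the second contributes both of its mentions, so the purity is 4/5.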

Table of Contents

1 Introduction
  1.1 Task definition
  1.2 Motivation
  1.3 Objectives
2 Background
  2.1 Entity linking
  2.2 Speaker identification
  2.3 Conversation Corpora
  2.4 Neural network
    2.4.1 Word2Vec and Word Embeddings
    2.4.2 Convolutional Neural Network
  2.5 Coreference resolution
    2.5.1 Evaluation Metrics
    2.5.2 Stanford Sieve System
    2.5.3 Stanford Neural System
    2.5.4 Harvard Neural System
3 Corpus
  3.1 Corpus creation
    3.1.1 Data collection
    3.1.2 Mention detection
    3.1.3 Corpus annotation
    3.1.4 Corpus adjudication
    3.1.5 Corpus disambiguation
  3.2 Corpus analysis
    3.2.1 Annotation results
    3.2.2 Mention Detection Error
    3.2.3 Annotation Disagreement
4 Approaches
  4.1 Data Formulation
    4.1.1 Data Split
    4.1.2 Data formats
  4.2 Coreference resolution
    4.2.1 CNN mention-to-mention model
  4.3 Character identification
    4.3.1 Rule-based entity linker
5 Experiments
  5.1 Task analysis
    5.1.1 Task feasibility
    5.1.2 Rule-based vs. statistical model
    5.1.3 Episode-delim vs. scene-delim documents
    5.1.4 Learning past vs. future conversations
  5.2 Coreference resolution
    5.2.1 Stanford neural system
    5.2.2 Harvard neural system
    5.2.3 CNN mention-to-mention ranking model
  5.3 Character identification
    5.3.1 Linking rules
    5.3.2 Cluster remapping
6 Conclusion

About this Honors Thesis

Rights statement

Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.

School	Emory College
Department	Computer Science
Degree	BS
Submission	Honors Thesis
Language	English
Research Field	Computer Science; Artificial Intelligence
Keyword	Entity Linking; Dialogue Processing; Character Identification; Corpus Creation; Natural Language Processing; Coreference Resolution
Committee Chair / Thesis Advisor	Choi, Jinho, Emory University
Committee Members	Lu, James, Emory University; Julien, Heather F, Emory University

Primary PDF

Title	Date Uploaded
Character Identification on Multi-party Dialogues	2018-08-28 11:20:32 -0400
