An Analysis of Causal Language Constructions in Diverse Discourse Data

Cao, Angela (Spring 2022)

Permanent URL: https://etd.library.emory.edu/concern/etds/zg64tn472?locale=zh
Published

Abstract

Creating datasets of manually annotated texts for relations such as causality has been of interest to computational linguists. The purpose of this thesis is to provide the annotated Constructions of CAUSE, ENABLE, and PREVENT (CCEP) corpus, to contribute to the field by systematizing the nuanced CAUSE, ENABLE, and PREVENT roles, and to enable annotation of a wide variety of causal construction types. The corpus uses constructions as the basic unit of causal language, following the linguistic paradigm of Construction Grammar (CxG) as realized through the surface construction labeling (SCL) approach. The project adapts a pre-identified bank of causal connectives (the Constructicon) from Dunietz (2018), which serve as triggers for annotation instances. Drawing on the high inter-annotator agreement demonstrated on a corpus of 150 doubly-annotated documents based on the CCEP guidelines, I (1) support the causal aspectualization of Wolff et al. (2005) as psychologically real, since annotators agree strongly when distinguishing its categories, (2) build upon previous annotation work that aims to embed this model of causation, and (3) provide a high-quality dataset for understanding textual causality.

Table of Contents

(1) Overview

(1.1) Introduction

(1.2) Motivation

(1.3) Thesis statement

(1.4) Summary of contributions

(2) Background

(2.1) Linguistic analysis of causal language

(2.2) Annotation of causal language

(2.2.1) The BECauSE corpus of causal language

(2.3) A preliminary study

(2.4) Advancements

(3) The CCEP Annotation Scheme

(3.1) Our working definition of “causal language”

(3.2) Parts of a causal instance in annotation

(3.3) The Constructicon

(3.4) Types of causation

(3.4.1) CAUSE

(3.4.2) ENABLE

(3.4.3) PREVENT

(3.4.4) Differentiating CEP while annotating

(3.5) CAUSE vs. ENABLE

(3.6) The annotation tool

(4) The CCEP for Causal Language

(4.1) Methodology

(4.2) Overview of the CCEP Corpus

(4.2.1) Inter-Annotator Agreement

(4.2.2) Statistics

(4.3) Key findings

(5) Future Outlook

(5.1) Lessons learned

(5.2) Summary of contributions

(5.3) Future directions

(5.3.1) Linguistic extensions

(5.3.2) Annotation extensions

(Appendix A) The CCEP Annotation Guidelines

(A.1) Overview of causal linguistic constructions

(A.2) Annotatable units

(A.3) Causation classification

(A.3.1) Overview of categories

(A.3.2) Decision tree for causation classification

(A.4) Edge cases

(A.4.1) Special cases of connectives

(A.4.2) Special cases of arguments

(A.4.3) Specifications for Reddit posts

(A.5) Suggestions for the annotation process

(A.6) Example annotations

(Appendix B) Sample of the Constructicon

(Appendix C) Training quiz sample: Training Quiz 2

Bibliography

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Degree
Submission
Language
  • English
Research Field
Keywords
Committee Chair / Thesis Advisor
Committee Members
Last modified
