An Analysis of Causal Language Constructions in Diverse Discourse Data Public
Cao, Angela (Spring 2022)
Abstract
Creating datasets of manually annotated texts for relations such as causality has been of interest to computational linguists. This thesis presents the annotated Constructions of CAUSE, ENABLE, and PREVENT (CCEP) corpus, systematizes the nuanced CAUSE, ENABLE, and PREVENT roles, and enables annotation of a wide variety of causal construction types. The corpus treats constructions as the basic unit of causal language, an approach grounded in the linguistic paradigm of Construction Grammar (CxG) and realized through surface construction labeling (SCL). The project adapts a pre-identified bank of causal connectives (the Constructicon) from Dunietz (2018), whose entries serve as triggers for annotation instances. Drawing on the high inter-annotator agreement achieved on a corpus of 150 doubly annotated documents based on the CCEP guidelines, I (1) support the psychological reality of Wolff et al.'s (2005) causal aspectualization, since annotators reliably distinguished its categories, (2) build upon previous annotation work that aims to embed this model of causation, and (3) provide a high-quality dataset for understanding textual causality.
Table of Contents
(1) Overview
(1.1) Introduction
(1.2) Motivation
(1.3) Thesis statement
(1.4) Summary of contributions
(2) Background
(2.1) Linguistic analysis of causal language
(2.2) Annotation of causal language
(2.2.1) The BECauSE corpus of causal language
(2.3) A preliminary study
(2.4) Advancements
(3) The CCEP Annotation Scheme
(3.1) Our working definition of “causal language”
(3.2) Parts of a causal instance in annotation
(3.3) The Constructicon
(3.4) Types of causation
(3.4.1) CAUSE
(3.4.2) ENABLE
(3.4.3) PREVENT
(3.4.4) Differentiating CEP while annotating
(3.5) CAUSE vs. ENABLE
(3.6) The annotation tool
(4) The CCEP for Causal Language
(4.1) Methodology
(4.2) Overview of the CCEP Corpus
(4.2.1) Inter-Annotator Agreement
(4.2.2) Statistics
(4.3) Key findings
(5) Future Outlook
(5.1) Lessons learned
(5.2) Summary of contributions
(5.3) Future directions
(5.3.1) Linguistic extensions
(5.3.2) Annotation extensions
(Appendix A) The CCEP Annotation Guidelines
(A.1) Overview of causal linguistic constructions
(A.2) Annotatable units
(A.3) Causation classification
(A.3.1) Overview of categories
(A.3.2) Decision tree for causation classification
(A.4) Edge cases
(A.4.1) Special cases of connectives
(A.4.2) Special cases of arguments
(A.4.3) Specifications for Reddit posts
(A.5) Suggestions for the annotation process
(A.6) Example annotations
(Appendix B) Sample of the Constructicon
(Appendix C) Training quiz sample: Training Quiz 2
Bibliography
About this Honors Thesis
School | |
---|---|
Department | |
Degree | |
Submission | |
Language | |
Research Field | |
Keywords | |
Committee Chair / Thesis Advisor | |
Committee Members | |
Primary PDF
Title | Date Uploaded |
---|---|
An Analysis of Causal Language Constructions in Diverse Discourse Data | 2022-04-09 15:36:14 -0400 |