Heuristic Based Extraction of Causal Relations from Annotated Causal Cue Phrases

Hausknecht, Matthew John (2009)

Permanent URL: https://etd.library.emory.edu/concern/etds/sn009x79d?locale=pt-BR

Abstract

This work focuses on the detection and extraction of causal relations from open-domain text, starting from annotated Causal Cue Phrases (CCPs). It argues that causality extraction should be decomposed into two distinct subtasks. First, the CCPs within a body of text must be identified. Second, using these CCPs, the cause and effect phrases of each causal relation must be extracted. To show that CCPs are an essential part of causality extraction, it is experimentally demonstrated that the accuracy of cause and effect phrase extraction increases dramatically when CCP knowledge is used. When two otherwise equivalent CRF (conditional random field) models are compared, extraction accuracy rises by 31% once simple, word-based knowledge of CCPs is taken into account. Furthermore, it is shown that cause and effect phrases can be extracted accurately and robustly without complex machine learning techniques. A simple, heuristic-based extraction algorithm, centered on three distinct classes of CCPs, is introduced. It achieves an accuracy of 87% on the task of extracting cause and effect phrases. Although the problem of identifying CCPs in open-domain text is not addressed, it is hypothesized that this task is far easier than identifying cause and effect phrases alone, because the space of all possible CCPs is far smaller than the space of all causal relations. Finally, this work contributes a free, publicly accessible corpus explicitly annotated with both intra-sentential causal relations and their corresponding CCPs. It is our hope that this resource will see future use as a standard corpus for the task of causality extraction.
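The second subtask described above is extracting cause and effect phrases once a CCP has been identified. It can be illustrated with a toy sketch. The sketch assumes the CCP span is already annotated. It uses a small, hypothetical cue table (`CUE_DIRECTION`) that records on which side of each cue the cause appears. The names `extract` and `CausalRelation` are likewise illustrative. This is not the thesis's algorithm, which Section 7 builds around three classes of CCPs (eager, verbal, non-verbal) and reversal keywords that are not detailed on this page.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cue table (an assumption, not taken from the thesis).
# "forward":  the cause precedes the cue, as in "X causes Y".
# "backward": the cause follows the cue, as in "Y because X".
CUE_DIRECTION = {
    "causes": "forward",
    "leads to": "forward",
    "results in": "forward",
    "because": "backward",
    "because of": "backward",
    "due to": "backward",
}


@dataclass
class CausalRelation:
    cause: str
    effect: str
    cue: str


def extract(sentence: str, cue_start: int, cue_end: int) -> Optional[CausalRelation]:
    """Split a sentence around an annotated CCP span into cause and effect."""
    cue = sentence[cue_start:cue_end]
    direction = CUE_DIRECTION.get(cue.lower())
    if direction is None:
        return None  # cue not covered by the toy table

    # Text on either side of the cue, with surrounding spaces/punctuation trimmed.
    left = sentence[:cue_start].strip(" ,;.")
    right = sentence[cue_end:].strip(" ,;.")

    # Sentence-initial backward cue: "Because X, Y." -> cause X, effect Y.
    if not left and direction == "backward" and "," in right:
        cause, effect = right.split(",", 1)
        return CausalRelation(cause.strip(), effect.strip(), cue)

    if not left or not right:
        return None  # one argument is missing; give up rather than guess

    if direction == "forward":
        return CausalRelation(cause=left, effect=right, cue=cue)
    return CausalRelation(cause=right, effect=left, cue=cue)


if __name__ == "__main__":
    for sentence, cue in [
        ("Heavy rain causes flooding.", "causes"),
        ("The game was cancelled because of the storm.", "because of"),
        ("Because it rained, the match was cancelled.", "Because"),
    ]:
        start = sentence.find(cue)
        print(extract(sentence, start, start + len(cue)))
```

Splitting on the cue position alone ignores clause boundaries, passive constructions such as "was caused by", and sentences containing several relations. A fuller heuristic would need to handle these cases.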

Table of Contents

1 Introduction

1.1 Problem Statement

2 Related Work

3 Annotation Guidelines

4 Causal Cue Phrases

5 Causal Relations

5.1 Scope

5.2 Explicit and Implicit

5.3 Ambiguity

6 Corpus Creation

7 Method

7.1 Background

7.2 Eager CCPs

7.3 Verbal CCPs

7.4 Non-Verbal CCPs

7.5 Reversal Keywords

7.6 Extraction Algorithm

8 Experimentation

9 Results

10 Future Work

11 Conclusions

12 References

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English
