Hierarchical Entity Extraction and Ranking with Unsupervised Graph Convolutions

Liu, Zhexiong (Spring 2020)

Permanent URL: https://etd.library.emory.edu/concern/etds/b8515p57p?locale=en


Entity extraction, the task of identifying entities in text using natural language processing (NLP), has been studied extensively. Most research trains learnable models on large corpora to extract entities and determine their salience. Typically, such systems retrieve a ranked list of entities from a set of documents given a query, ranking them mainly by the relevance between the query and each entity. This thesis instead leverages semantic and syntactic information within the documents themselves to perform both entity extraction and entity ranking. Specifically, given a document corpus, constituency parse trees are constructed to extract entity mentions (phrases) from each article. Dependency parse trees and entity coreference clusters are then used to build a relation graph whose nodes denote entity mentions and whose edges denote relations between mentions. Graph convolution is performed on this relation graph to normalize each mention's representation over the embeddings of the mentions it is connected to. A hierarchical density-based clustering and ranking mechanism is then applied to compute entity priors. To evaluate this work, three models are proposed and evaluated on 60 annotated articles. Preliminary results indicate that using parse trees together with entity coreference relations improves the effectiveness of entity extraction and ranking. The main contributions of this thesis are the hierarchical trees used for entity extraction, the principles for graph construction, and the overall system architecture.
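The abstract does not state the exact propagation rule, so the following is only a minimal sketch of the relation-graph and graph-convolution steps under stated assumptions: mentions are nodes, every pair of mentions in the same coreference cluster is linked, dependency relations contribute extra undirected edges, and the "unsupervised" convolution is taken to be parameter-free smoothing X' = D^(-1/2) (A + I) D^(-1/2) X, where A is the adjacency matrix, I adds self-loops, and D is the degree matrix of A + I (the symmetric normalization used in standard graph convolutional networks, with the trainable weight matrix dropped). The function names `coref_edges`, `build_relation_graph` and `graph_convolve`, the example mentions, and the toy 2-d embeddings are hypothetical. The clustering and ranking stage that follows is not shown.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Edge = Tuple[int, int]


def coref_edges(clusters: Sequence[Sequence[int]]) -> List[Edge]:
    """Hypothetical helper: link every pair of mentions sharing a coreference cluster."""
    edges: List[Edge] = []
    for cluster in clusters:
        edges.extend(itertools.combinations(cluster, 2))
    return edges


def build_relation_graph(num_mentions: int, edges: Sequence[Edge]) -> Matrix:
    """Undirected adjacency matrix over mentions, with self-loops (A + I)."""
    adj = [[0.0] * num_mentions for _ in range(num_mentions)]
    for i in range(num_mentions):
        adj[i][i] = 1.0
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    return adj


def graph_convolve(adj: Matrix, features: Matrix, steps: int = 1) -> Matrix:
    """Assumed parameter-free propagation X <- D^-1/2 (A + I) D^-1/2 X, repeated `steps` times.

    `adj` must already contain self-loops, so every degree is at least 1.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    dim = len(features[0]) if features else 0
    out = [list(row) for row in features]
    for _ in range(steps):
        # Each mention's new vector is a degree-normalized sum over its neighbors.
        out = [
            [sum(norm[i][k] * out[k][d] for k in range(n)) for d in range(dim)]
            for i in range(n)
        ]
    return out


if __name__ == "__main__":
    # Hypothetical mentions: 0 "Barack Obama", 1 "he", 2 "the president",
    # 3 "Congress", 4 "the weather" (unrelated to the others).
    clusters = [[0, 1, 2]]          # one coreference chain
    dependency_edges = [(1, 3)]     # e.g. "he addressed Congress"
    adj = build_relation_graph(5, coref_edges(clusters) + dependency_edges)

    # Toy 2-d mention embeddings; real ones would come from an embedding model.
    embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [0.5, 0.5]]
    for mention, vec in enumerate(graph_convolve(adj, embeddings)):
        print(mention, [round(x, 3) for x in vec])
```

Because coreferent mentions share edges, their smoothed vectors move toward one another, while a mention with no relations keeps its original embedding. Under this assumption, a density-based clusterer run on the smoothed vectors would then see each entity's mentions as a tighter group.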

Table of Contents


1 Introduction

1.1 Motivation

1.2 Research Questions

1.3 Contribution

1.4 Organization

2 Background

2.1 Word Embeddings

2.2 Keyphrase Extraction

2.3 Entity Ranking

2.4 Graph-based Approaches

3 Approaches

3.1 Constituency Parsing

3.2 Dependency Parsing

3.3 Coreference Resolution

3.4 Embedding Normalization

3.5 Graph Convolutions

3.6 Clustering

3.7 Models

3.7.1 Baseline Model

3.7.2 Coreference Model

3.7.3 Convolutional Model

4 Experiments

4.1 Experimental Setup

4.2 Data Exploration

4.3 Evaluation Metrics

4.4 Model Evaluation

5 Analysis


About this Master's Thesis

Rights statement: Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.

Language: English