Improving Question Answering by Bridging Linguistic Structures and Statistical Learning

Jurczyk, Tomasz (Fall 2017)

Permanent URL: https://etd.library.emory.edu/concern/etds/xg94hp534?locale=es


Abstract

Question answering (QA) has recently attracted considerable interest from both academic and industrial research. Regardless of the question, search engine users expect machines to answer instantly, without having to search through relevant websites themselves. While a significant portion of these questions ask for concise, well-known facts, more complex questions also exist, and they often require dedicated approaches to build robust and accurate systems.

This thesis explores linguistically oriented approaches to both factoid and non-factoid question answering and to cross-genre text applications. The contributions include new annotation schemes for QA-oriented corpora, methods for extracting and matching linguistic structures, and an early exploration of applications to conversational dialog text.

For sentence-based factoid question answering, a multi-stage crowdsourcing annotation scheme is presented. Next, a subtree matching algorithm that measures the semantic similarity between two sentences in open-domain text is introduced and combined with a neural network architecture. Then, several factoid question answering corpora are analyzed in depth and cross-tested to improve the performance of QA systems.

This thesis also explores two complex scenarios of non-factoid question answering. In the first, a semantics-based knowledge graph built on top of linguistic structures is presented and applied to arithmetic questions using verb polarity classification. In the second, a system that combines lexical, syntactic, and semantic text representations with statistical learning is presented and evaluated on event-based question answering.

The last part of this thesis focuses on the cross-genre aspect of text, where the main challenge is the misalignment between dialog and formal writing. First, an approach that combines semantic structure extraction with statistical learning is presented and used to improve performance on the document retrieval task. Next, the passage completion task is explored: a crowdsourcing annotation scheme is carried out to create a new corpus, and a multi-gram convolutional neural network with attention is compared against several state-of-the-art approaches for reading comprehension.
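The abstract above mentions applying verb polarity classification to arithmetic questions. As a purely illustrative sketch, and not the dissertation's actual method, the core intuition can be shown with a hand-written lexicon that maps each verb to +1 (the subject gains the quantity) or -1 (the subject loses it). The tuple format (owner, verb, quantity) and the names `VERB_POLARITY` and `solve` are hypothetical assumptions for this example; a real system would predict polarity with a trained classifier and extract the tuples from parsed sentences.

```python
from collections import defaultdict

# Hypothetical polarity lexicon (an assumption, not from the dissertation):
# +1 = the subject gains the quantity, -1 = the subject loses it.
VERB_POLARITY = {
    "has": 1, "buys": 1, "finds": 1, "receives": 1,
    "gives": -1, "loses": -1, "eats": -1, "sells": -1,
}

def solve(events, owner):
    """Sum signed quantities per owner and return the owner's final count.

    events: list of hypothetical (owner, verb, quantity) tuples.
    """
    totals = defaultdict(int)
    for subject, verb, quantity in events:
        polarity = VERB_POLARITY.get(verb)
        if polarity is None:
            raise ValueError(f"unknown verb: {verb}")
        # The verb's polarity decides whether the quantity is added or subtracted.
        totals[subject] += polarity * quantity
    return totals[owner]

# "Tom has 5 apples. He buys 3 more and gives 2 away. How many does Tom have?"
events = [("Tom", "has", 5), ("Tom", "buys", 3), ("Tom", "gives", 2)]
print(solve(events, "Tom"))  # 6
```

Here 5 + 3 - 2 yields 6. The sketch deliberately ignores transfers between two owners (for example, "Tom gives Ann 2 apples" should also add 2 to Ann), which a fuller treatment would have to model.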

Table of Contents

1 Introduction

1.1 Motivation

1.2 Research Challenges

1.3 Contributions

2 Related Work

2.1 Sentence-based Factoid Question Answering

2.1.1 Text-based Question Answering Tasks

2.1.2 Question Answering Corpora

2.1.3 Syntactic and Semantic Matching

2.2 Neural Architectures

2.3 Non-factoid Question Answering

2.4 Applications to Cross-genre Tasks

3 Sentence-based Factoid Question Answering

3.1 Multi-stage Annotation Scheme for Question Answering

3.1.1 Data Collection

3.1.2 Annotation Scheme

3.1.3 Corpus Analysis

3.2 Subtree Matching with Statistical Learning

3.2.1 Subtree Matching Algorithm

3.2.2 Convolutional Neural Networks

3.2.3 Experiments

3.3 Cross-evaluation of Factoid Question Answering Corpora

3.3.1 Intrinsic Analysis

3.3.2 Answer Passage Retrieval

3.3.3 Extrinsic Analysis

3.4 Summary

4 Non-factoid Question Answering

4.1 Semantics-based Graph Approach to Complex Question Answering

4.1.1 Semantics-based Knowledge Approach

4.1.2 Graph Construction

4.1.3 Arithmetic Questions

4.1.4 Experiments

4.2 Multi-field Structural Decomposition for Event-based Question Answering

4.2.1 Approach

4.2.2 Experiments

4.3 Summary

5 Applications to Cross-genre Tasks

5.1 Cross-genre Document Retrieval

5.1.1 Data

5.1.2 Structure Reranking

5.1.3 Experiments

5.2 Cross-genre Passage Completion

5.2.1 Data

5.2.2 Approach

5.2.3 Experiments

5.3 Summary

6 Conclusions and Future Work

6.1 Summary

6.2 Limitations

6.3 Future Work

Bibliography

About this Dissertation

Rights statement

Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.

School	Laney Graduate School
Department	Computer Science and Informatics
Degree	Ph.D.
Submission	Dissertation
Language	English
Research Field	Computer Science
Keywords	Natural language processing, Information retrieval, Machine learning, Question answering
Committee Chair / Thesis Advisor	Choi, Jinho, Emory University
Committee Members	Agichtein, Eugene, Emory University; Gavalda, Marsal, Square, Inc.; Xiong, Li, Emory University


Primary PDF

Improving Question Answering by Bridging Linguistic Structures and Statistical Learning (uploaded 2017-11-08 12:40:42 -0500)
