Improving Question Answering by Bridging Linguistic Structures and Statistical Learning
Jurczyk, Tomasz (Fall 2017)
Abstract
Question answering (QA) has recently attracted considerable interest from both academic and industrial research. Whatever the question, search engine users expect machines to provide answers instantly, without having to search through relevant websites themselves. While a significant portion of these questions ask for concise, well-known facts, more complex questions also arise, and they often require dedicated approaches to build robust and accurate systems.
This thesis explores linguistically oriented approaches to factoid and non-factoid question answering and to cross-genre text applications. The contributions include new annotation schemes for question answering corpora, methods for extracting and matching linguistic structures, and an early exploration of applications to conversational dialog text.
For sentence-based factoid question answering, a multi-stage crowdsourcing annotation scheme is presented. Next, a subtree matching algorithm that aims to capture the semantic similarity between two sentences in open-domain text is introduced and combined with a neural network architecture. Then, various factoid question answering corpora are thoroughly analyzed and cross-tested to improve the performance of QA systems.
This thesis also explores two complex scenarios of non-factoid question answering. In the first, a semantics-based knowledge graph built on top of linguistic structures is presented and applied to arithmetic questions using verb polarity classification. In the second, a system that combines lexical, syntactic, and semantic text representations with statistical learning is presented and evaluated on event-based question answering.
The last part of this thesis focuses on the cross-genre aspect of text, in which the misalignment between dialog and formal writing is the main challenge. First, an approach that combines semantic structure extraction with statistical learning is presented and used to improve performance on the document retrieval task. Next, an exploration of the passage completion task is presented: a crowdsourcing annotation scheme is carried out and a new corpus is created. A multi-gram convolutional neural network with attention is compared to several state-of-the-art approaches for reading comprehension applications.
Table of Contents
1 Introduction
1.1 Motivation
1.2 Research Challenges
1.3 Contributions
2 Related Work
2.1 Sentence-based Factoid Question Answering
2.1.1 Text-based Question Answering Tasks
2.1.2 Question Answering Corpora
2.1.3 Syntactic and Semantic Matching
2.2 Neural Architectures
2.3 Non-factoid Question Answering
2.4 Applications to Cross-genre Tasks
3 Sentence-based Factoid Question Answering
3.1 Multi-stage Annotation Scheme for Question Answering
3.1.1 Data Collection
3.1.2 Annotation Scheme
3.1.3 Corpus Analysis
3.2 Subtree Matching with Statistical Learning
3.2.1 Subtree Matching Algorithm
3.2.2 Convolutional Neural Networks
3.2.3 Experiments
3.3 Cross-evaluation of Factoid Question Answering Corpora
3.3.1 Intrinsic Analysis
3.3.2 Answer Passage Retrieval
3.3.3 Extrinsic Analysis
3.4 Summary
4 Non-factoid Question Answering
4.1 Semantics-based Graph Approach to Complex Question Answering
4.1.1 Semantics-based Knowledge Approach
4.1.2 Graph Construction
4.1.3 Arithmetic Questions
4.1.4 Experiments
4.2 Multi-field Structural Decomposition for Event-based Question Answering
4.2.1 Approach
4.2.2 Experiments
4.3 Summary
5 Applications to Cross-genre Tasks
5.1 Cross-genre Document Retrieval
5.1.1 Data
5.1.2 Structure Reranking
5.1.3 Experiments
5.2 Cross-genre Passage Completion
5.2.1 Data
5.2.2 Approach
5.2.3 Experiments
5.3 Summary
6 Conclusions and Future Work
6.1 Summary
6.2 Limitations
6.3 Future Work
Bibliography
Primary PDF
Title | Date Uploaded |
---|---|
Improving Question Answering by Bridging Linguistic Structures and Statistical Learning | 2017-11-08 12:40:42 -0500 |