Leveraging Diverse Data Generation for Domain-Adaptable Dialogue State Tracking
Finch, James (Spring 2024)
Abstract
This work investigates improving domain adaptability in Dialogue State Tracking (DST), a crucial
task for integrating conversational AI into real-world software applications. DST produces structured
state representations that track important information in a dialogue, which can serve as an interface
to external software components and as a means of controlling dialogue model behavior. However,
obtaining DST models that robustly adapt to new application domains remains an ongoing research
challenge. The proposed work aims to improve the utility of DST by making the domain adaptation
of DST models more effective and cost-efficient. To achieve this, a new task called Dialogue State
Generation (DSG) is proposed. The goal of DSG is to infer both the schema and the values of the
dialogue state in unseen dialogue domains, and experimental results demonstrate the effectiveness of
the presented DSG approach for addressing the challenge of domain generalizability. The DSG approach
is then extended to Slot Schema Induction, where it is shown to be the first practical method for
discovering a consistent set of new slot types from unlabeled data. Finally, the novel DSG and
Schema Induction approaches are leveraged to generate a synthetic DST dataset with silver dialogue
state labels covering 1,000 different domains, an order of magnitude more than any existing
dataset. An evaluation of few- and zero-shot DST models trained on this domain-diverse synthetic
data demonstrates a substantial positive impact on DST domain adaptation. These contributions
improve the feasibility of integrating conversational AI into real-world applications, taking a step
toward broadly improving the efficacy and ease of use of software applications.
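As a minimal illustration of the structured state representations described above (an editorial sketch, not an excerpt from the dissertation; the hotel domain, slot names, and helper function are hypothetical), a tracked dialogue state can be viewed as slot-value pairs under a schema, which downstream software can consume directly:

```python
# Hypothetical example (not from the dissertation): a dialogue state for a
# hotel-booking conversation, represented as slot -> value pairs under a schema.
dialogue_state = {
    "hotel-area": "downtown",
    "hotel-check_in": "2024-06-01",
    "hotel-nights": "3",
    "hotel-price_range": "moderate",
}

def state_to_query_params(state: dict) -> dict:
    """Translate the tracked state into parameters for an external booking backend."""
    return {
        "area": state.get("hotel-area"),
        "check_in": state.get("hotel-check_in"),
        "nights": int(state.get("hotel-nights", "1")),
        "price_range": state.get("hotel-price_range"),
    }

if __name__ == "__main__":
    print(state_to_query_params(dialogue_state))
```

In this view, domain adaptation amounts to handling new schemas (new slot names and value types) that were unseen during training, which is the challenge the dissertation targets.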
Table of Contents
1 Introduction 1
1.1 Application Utility of Conversational AI . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Software Integration of Conversational AI . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Challenges of Dialogue State Tracking . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Research Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 Background 9
2.1 Task-Oriented Dialogue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Data for Task-Oriented Dialogue . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Dialogue State Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Slot Schema Induction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 Emora: A Socialbot Built from Custom State and Policy Rules 19
3.1 Application Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2 Motivations for a Structured-State-Based Socialbot . . . . . . . . . . . . . . . . . 22
3.3 Emora-STDM: Controllable Human-Computer Dialogue Using Structured Dialogue State . . . 23
3.3.1 NATEX: Natural Language Expression . . . . . . . . . . . . . . . . . . . . 23
3.3.2 Dialogue Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.4 Design and Deployment of Emora . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.4.1 Conversation Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.4.2 Results and Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5 Experimental Evaluation of Emora . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.5.1 Chatbot Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.5.2 Evaluation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.5.3 Evaluation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.5.4 Chatbot Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4 Dialogue State Generation 43
4.1 Dialogue State Generation (DSG) . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.3 GPTPipe: DSG with Zero-Shot Pipeline . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3.1 Question-Answer Pair (QA) Generation . . . . . . . . . . . . . . . . . . . 48
4.3.2 Slot-Value Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.4 DSG5K: New Diverse DSG Dataset . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.4.1 Scenario Derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.4.2 Information Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.4.3 Dialogue Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.5 End-to-End (E2E) DSG Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.6 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.6.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.6.2 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.6.3 Human Evaluation: Metrics . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.6.4 Human Evaluation: Results . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.6.5 Automatic Evaluation: Metrics . . . . . . . . . . . . . . . . . . . . . . . . 56
4.6.6 Automatic Evaluation: Results . . . . . . . . . . . . . . . . . . . . . . . . 58
4.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5 Slot Induction 61
5.1 DSG-I: Inducing Slots from DSG Inferences . . . . . . . . . . . . . . . . . . . . . 62
5.1.1 Encoding Slot Value Candidates . . . . . . . . . . . . . . . . . . . . . . . 63
5.2 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2.2 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.2.3 Pilot Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.2.4 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.2.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.2.6 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6 Domain-Adaptable Dialogue State Tracking 77
6.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.2 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.3 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.4 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.6 Error Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7 Conclusion 90
7.1 Limitations and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.2 Impact . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Bibliography 94