Leveraging Diverse Data Generation for Domain-Adaptable Dialogue State Tracking
Open Access

Finch, James (Spring 2024)

Permanent URL: https://etd.library.emory.edu/concern/etds/st74cr95q?locale=en%255D
Published

Abstract

This work investigates improving domain adaptability in Dialogue State Tracking (DST), a crucial task for integrating conversational AI into real-world software applications. DST produces structured state representations that track important information in dialogue, which can be used as an interface to external software components and for controlling dialogue model behavior. However, obtaining DST models that can robustly adapt to new application domains is an ongoing research challenge. The proposed work aims to improve the utility of DST by making the domain adaptation of DST models more effective and cost-efficient. To achieve this, a new task called Dialogue State Generation (DSG) is proposed. The goal of DSG is to infer both the schema and values of dialogue state in unseen dialogue domains, and experimental results demonstrate the effectiveness of the presented DSG approach for tackling the challenge of domain generalizability. The DSG approach is then extended for Slot Schema Induction, which is shown to be the first practical method for discovering a consistent set of new slot types from unlabeled data. Finally, the novel DSG and Schema Induction approaches are leveraged to generate a synthetic DST dataset with silver dialogue state labels that covers 1,000 different domains, an order of magnitude more than any existing dataset. An evaluation of few- and zero-shot DST models trained on the domain-diverse synthetic data demonstrates a substantial positive impact on DST domain adaptation. These contributions improve the feasibility of integrating conversational AI in real-world applications, taking steps towards the global improvement of software applications’ efficacy and ease of use.

Table of Contents

1 Introduction
1.1 Application Utility of Conversational AI
1.2 Software Integration of Conversational AI
1.3 Challenges of Dialogue State Tracking
1.4 Research Contributions
1.5 Organization

2 Background
2.1 Task-Oriented Dialogue
2.2 Data for Task-Oriented Dialogue
2.3 Dialogue State Tracking
2.4 Slot Schema Induction

3 Emora: A Socialbot Built from Custom State and Policy Rules
3.1 Application Setting
3.2 Motivations for a Structured-State-Based Socialbot
3.3 Emora-STDM: Controllable Human-Computer Dialogue Using Structured Dialogue State
3.3.1 NATEX: Natural Language Expression
3.3.2 Dialogue Management
3.4 Design and Deployment of Emora
3.4.1 Conversation Design
3.4.2 Results and Analysis
3.5 Experimental Evaluation of Emora
3.5.1 Chatbot Selection
3.5.2 Evaluation Methods
3.5.3 Evaluation Study
3.5.4 Chatbot Evaluation
3.6 Discussion

4 Dialogue State Generation
4.1 Dialogue State Generation (DSG)
4.2 Related Work
4.3 GPTPipe: DSG with Zeroshot Pipeline
4.3.1 Question-Answer Pair (QA) Generation
4.3.2 Slot-Value Translation
4.4 DSG5K: New Diverse DSG Dataset
4.4.1 Scenario Derivation
4.4.2 Information Composition
4.4.3 Dialogue Generation
4.5 End-to-End (E2E) DSG Model
4.6 Experiments
4.6.1 Datasets
4.6.2 Models
4.6.3 Human Evaluation: Metrics
4.6.4 Human Evaluation: Results
4.6.5 Automatic Evaluation: Metrics
4.6.6 Automatic Evaluation: Results
4.7 Discussion

5 Slot Induction
5.1 DSG-I: Inducing Slots from DSG Inferences
5.1.1 Encoding Slot Value Candidates
5.2 Experiments
5.2.1 Data
5.2.2 Evaluation Metrics
5.2.3 Pilot Evaluation
5.2.4 Models
5.2.5 Results
5.2.6 Analysis
5.3 Discussion

6 Domain-Adaptable Dialogue State Tracking
6.1 Related Work
6.2 Approach
6.3 Data
6.4 Models
6.5 Results
6.6 Error Analysis
6.7 Discussion

7 Conclusion
7.1 Limitations and Future Work
7.2 Impact

Bibliography

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English