Enriching Open-Domain Dialogue Models with Predictive Social Commonsense
Finch (Fillwock), Sarah (Spring 2024)
Abstract
Advancing open-domain dialogue systems is a significant goal of artificial intelligence, one that aims to create more engaging and human-like interactions between machines and users. A key challenge in this domain is equipping these systems with a deep understanding of human experiences, whose nuances are often implied rather than explicitly stated in conversation. Social commonsense resources aim to support the comprehension of human experiences by capturing commonsense knowledge about people's motivations, the causes of events, emotions, and more. However, existing datasets and methods for integrating social commonsense into dialogue applications suffer from low coverage, sparse detail, and contextual redundancy, which limits their ability to promote meaningful dialogue interactions. Recognizing these limitations, this dissertation explores how open-domain dialogue systems can be enhanced through improved integration of social commonsense knowledge.
This dissertation is structured around three core objectives: developing a reliable evaluation framework for assessing the commonsense capabilities of dialogue models, creating a new dataset of contextually novel commonsense inferences tailored for dialogue, and integrating these inferences into dialogue models to enhance their conversational abilities. The first objective is addressed through Annotation of Behaviors in Chat Evaluation (ABC-Eval), a binary behavior-based evaluation framework that offers a more objective and grounded assessment of dialogue models' commonsense reasoning capabilities. The second objective is achieved with ConvoSense, the largest dataset of its kind to provide novel commonsense inferences designed specifically for dialogue contexts. Finally, the third objective is addressed by Commonsense Inference Generate-Select-Respond (CSI-GSR), a novel approach that leverages the rich pool of commonsense inferences from ConvoSense to generate dialogue responses.
The findings of this dissertation shed light on the current capabilities of LLM-based dialogue models and demonstrate the benefits of incorporating predictive commonsense inferences to guide response generation. The work on ABC-Eval reveals that commonsense errors are highly prevalent in neural dialogue systems, underscoring the importance of improving the commonsense capabilities of dialogue models. The work on ConvoSense produces powerful resources and models for capturing multi-faceted and predictive social commonsense inferences for dialogue. The work on CSI-GSR showcases the utility of these multi-faceted and predictive inferences for making responses more specific to their dialogue context. Collectively, this body of work supports the pursuit of more nuanced, contextually aware, and intelligent human-computer interaction.
Table of Contents
1 Introduction
1.1 Motivation
1.2 Central Thesis and Research Questions
1.3 Research Contributions
1.3.1 Commonsense Evaluation for Dialogue Models (Ch. 3)
1.3.2 Commonsense Dataset for Dialogue Models (Ch. 4)
1.3.3 Commonsense-Augmented Dialogue Model (Ch. 5)
1.4 Organization
2 Background
2.1 Large Language Models
2.2 Commonsense Resources for Dialogue
2.3 Commonsense Augmentation for Dialogue Models
2.4 Commonsense Evaluation for Dialogue Models
3 Annotation of Behaviors in Chat
3.1 Introduction
3.2 ABC-Eval Development
3.2.1 Collecting Behavior Label Candidates
3.2.2 Pilots and Development
3.2.3 Final ABC-Eval Design
3.3 Existing Evaluation Methods
3.4 Evaluation Collection
3.5 Evaluation Analysis
3.5.1 Interpretability
3.5.2 Importance
3.5.3 Sensitivity
3.5.4 Coverage & Distinctness
3.6 Cost
3.7 Dialogue Model Insights
3.8 Conclusion
4 ConvoSense: Generating Commonsense for Dialogue
4.1 Introduction
4.2 ChatGPT Dialogue Commonsense Generation
4.2.1 Prompt Engineering
4.2.2 Evaluating ChatGPT-generated Commonsense
4.2.3 Results
4.3 Constructing Commonsense Datasets
4.3.1 ConvoSense: New ChatGPT-generated Dataset
4.3.2 HumanGen: Human-generated Datasets
4.3.3 Dataset Statistics
4.4 Generative Commonsense Models
4.4.1 Training and Decoding Strategies
4.4.2 Model Configuration
4.5 Generative Model Evaluation
4.5.1 Automatic Reference Metrics
4.5.2 Human Evaluations
4.6 Conclusion
5 Generate-Select-Respond: Commonsense-Augmented Dialogue Model
5.1 Introduction
5.2 Approach
5.3 Approach Development
5.3.1 Commonsense Inference Generate-Select-Respond: CSI-GSR
5.3.2 Pilot Study
5.3.3 Approach Improvement
5.3.4 Few-shot Learning
5.4 Evaluation
5.4.1 Models
5.4.2 Test Data
5.4.3 Metrics
5.4.4 Metric Reliability
5.4.5 Results
5.5 Conclusion
6 Conclusion
6.1 Research Contributions
6.2 Future Work
6.2.1 Behavioral Evaluation of LLM-based Dialogue Models
6.2.2 Commonsense-augmented Dialogue Models “In-the-Wild”
6.2.3 Improving Social Commonsense Integration for Open-Domain Dialogue
6.2.4 Alternative Commonsense Integrations
6.2.5 Improving Open-source LLMs with Commonsense
6.2.6 Enhancing Topical Salience in Dialogue Models
6.2.7 Refining Response Content Control
Primary PDF
| Title | Date Uploaded |
|---|---|
| Enriching Open-Domain Dialogue Models with Predictive Social Commonsense | 2024-04-05 21:58:23 -0400 |