Emora Assistant Bot: Revolutionizing Task Automation and Chatbot Evaluation Open Access

Paek, Ellie (Spring 2024)

Permanent URL: https://etd.library.emory.edu/concern/etds/79407z70z?locale=en
Published

Abstract

The Emora Assistant Bot, part of the Emora Chat: College Companion project, was developed by members of the Emory NLP Research Lab. Motivated by the call for efficiency in executing tasks, Emora manages both general administrative tasks and those specific to classroom settings, all while facilitating seamless communication among multiple users. Leveraging the Emora State Transition Dialogue Manager (STDM) framework and OpenAI's GPT-3.5 Turbo API, the Assistant Bot executes seventeen different tasks through natural language interaction, offering users a conversational and efficient experience. An innovative automated approach based on the GPT-3.5 language model is used to evaluate this task-oriented chatbot, providing valuable insights into Emora's performance and highlighting areas for improvement.

Automatic evaluations revealed limitations in the STDM framework, yet Emora demonstrated successes in information extraction and task categorization, underscoring her capability for seamless task execution. Moreover, despite occasional inconsistencies, the GPT simulation emerged as a promising method for evaluating task-oriented chatbots. Across two professor profiles and 20 student profiles, Emora had an average success rate of 94.3% for task execution and 94% for natural language understanding. The GPT simulation displayed an average success rate of about 81%.

Through this research, the Emora Assistant Bot project emerges as a pioneering solution for automating administrative tasks, showcasing the potential of large language models in both task execution and evaluation within the realm of chatbots.

Table of Contents

1 Introduction

1.1 Motivation

1.2 Merits

1.3 Thesis Statement

2 Background

2.1 Task-Oriented Chatbots

2.1.1 One-on-One Chatbots

2.1.2 Multiple User Chatbots

2.2 Automated Evaluations

2.2.1 GPT Evaluations in NLP Tasks

2.2.2 Automated Chatbot Evaluations

2.3 Preliminary Work

2.3.1 Emora Course Assistant and Previous Assistant Bot

2.3.2 Dialogue Evaluation with GPT

3 Approach

3.1 Framework

3.2 Database

3.3 Features

3.3.1 Names and Hub

3.3.2 Appointments

3.3.3 Messages

3.3.4 Groups

4 Experiments

4.1 GPT Evaluation

4.1.1 Evaluation Framework

4.1.2 User Profiles

4.1.3 Simulation Process

4.2 Evaluation Criterion and Metrics

4.2.1 Emora Assistant Chatbot

4.2.2 GPT Simulation

4.3 Results and Discussion

4.3.1 Results from Version 1

4.3.2 Results from Version 2

5 Analysis

5.1 Analysis of Emora Errors

5.1.1 Inappropriate Responses

5.1.2 Improper Input Extraction

5.1.3 STDM Limitations

5.2 Analysis of GPT Simulation Errors

5.3 Possible Explanations and Recommendations

6 Conclusion

6.1 Future Directions

6.2 Research Insights

A Appendix

A.1 GPT Function Used for Emora’s Input Extraction

A.2 GPT Prompts Used for GPT Simulation

Bibliography

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English