Unsupervised, Context-Aware Emotion Classification of College-Related Reddit Posts Open Access

Huang, Xiaoyuan (Spring 2022)

Permanent URL: https://etd.library.emory.edu/concern/etds/12579t60m?locale=en


As emotion plays an important role in conversation, empathetic dialogue systems have been developed for fields such as business and healthcare. However, such chatbots are lacking in the higher education sector. Emotion detection is the most important step in developing these dialogue systems. Sentiment analysis and emotion detection on social media have been meaningful ways to diagnose emotions, understand user behavior, and help improve empathetic agents. Current work has focused on machine learning and rule-based approaches, but many existing models support only a limited number of emotion labels. Therefore, motivated by the gap between higher education and emotion-related tasks in Natural Language Processing, this thesis aims to develop a novel, well-performing emotion classifier that specifically targets college-related social media content and produces more detailed emotion labels than existing emotion classifiers. The thesis achieves this goal in three main steps. The first step was to generate a task-specific dataset for model development. The second step was to develop baseline models for basic emotion detection using Transformer models trained on Empathetic Dialogues. The third step was to improve on these baselines by developing unsupervised models that overcome the baselines' difficulty in detecting neutrality and that target higher education content with better performance. This work provides a meaningful tool for more fine-grained emotion detection in college-related textual data and supports future chatbot development in higher education as an innovative solution for institutions.

Table of Contents

1 Introduction

1.1 Thesis Statement

2 Background

2.1 Alexa Prize Socialbot Grand Challenge IV

2.2 Collaboration Work

2.3 Emotion Analysis

2.3.1 Theories of Emotion

2.3.2 Textual Emotion Detection

2.3.3 Emotion-Cause Pair Extraction

2.3.4 Empathetic Dialogue Systems

2.4 Sentiment Analysis in Social Media

2.5 Transformer Models

3 Dataset

3.1 Empathetic Dialogues (ED-32)

3.2 College-Related Reddit Posts

3.3 Data Annotation

3.3.1 Merged Empathetic Dialogues (ED-8) and Neutrality

3.3.2 MTurk Tasks

3.3.3 Self-Annotated Reddit Posts

4 Emotion Classification

4.1 Emotion Distributions in Datasets

4.2 Transformer Baseline Models

4.2.1 32To8 Single-Label Approach

4.2.2 Merged-8 Single-Label Approach

4.2.3 Experiments and Evaluation

4.2.4 Results and Analysis

4.3 Unsupervised Models

4.3.1 Single-Label Approach

4.3.2 Two-Label Approach

4.3.3 Experiments and Evaluation

4.3.4 Results and Analysis

5 Conclusion and Future Work

5.1 Future Work

5.1.1 Emotion Analysis on Reddit

5.1.2 Applications in Dialogue Systems

Bibliography

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
  • English