Sentiment-based Open-domain Event Detection for COVID-19 (Public)

Hu, Xinyi (Spring 2021)

Permanent URL: https://etd.library.emory.edu/concern/etds/9c67wn89g?locale=zh

Published

Abstract

As one of the most popular social media platforms in recent years, Twitter has provided a database of abundant information reflecting the public's reactions to various events and discussions. Many sociological researchers and news agencies have become accustomed to collecting and processing Twitter data for opinion mining or to detect significant social events. Event detection has become even more important during the global pandemic, when it is crucial to keep the public informed in a timely way about matters such as policy changes and disease-prevention strategies.

The main goal of this research is to extract the major social events that occurred during the early stage of the coronavirus outbreak in the United States. Its central focus is to examine carefully whether sentiment-based event detection can be implemented successfully when the focal event is essentially negative. Here, the pandemic is a worldwide public health emergency, which heavily skews the emotional polarity of tweets. This study uses a data set of more than a million English tweets containing COVID-19 keywords, posted over a one-month span. Sentiment analysis tools such as Stanford CoreNLP and Hedonometer calculate an emotion score for each tweet. These scores let the researcher apply mathematical models that define an emotion spike and thereby determine whether an event occurred on a given day. Sentiment-based event detection, especially with Stanford CoreNLP, proved able to discern hot-spot occurrences efficiently. Building on this, the research uses topic modeling and named-entity recognition (NER) to extract words and phrases that help summarize the possible social events.
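The abstract does not spell out the spike model. The following is therefore only a minimal sketch under stated assumptions, not the thesis's implementation. It makes two assumptions:

- **Scoring:** a Hedonometer-style score, in which a day's happiness is the frequency-weighted average of per-word scores from a lexicon. The real Hedonometer uses the labMT word list on a 1 to 9 scale.
- **Spike rule:** a hypothetical rule under which a day counts as a spike when its score differs from the mean of the previous `window` days by more than `k` standard deviations.

`TOY_LEXICON`, `window`, `k`, and all function names are invented for illustration. The lexicon values are not real labMT scores.

```python
import re
from collections import Counter
from statistics import mean, pstdev

# HYPOTHETICAL toy lexicon on a 1-9 happiness scale (labMT-style).
# Values are illustrative only; they are NOT the real labMT scores.
TOY_LEXICON = {
    "love": 8.4, "hope": 7.5, "safe": 7.0,
    "virus": 2.5, "death": 1.5, "lockdown": 3.0, "sick": 2.3,
}


def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def average_happiness(texts, lexicon=TOY_LEXICON):
    """Frequency-weighted mean lexicon score over all scored words in `texts`.

    Returns None when no word in `texts` appears in the lexicon.
    """
    counts = Counter(w for t in texts for w in tokenize(t) if w in lexicon)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(lexicon[w] * n for w, n in counts.items()) / total


def detect_spikes(daily_scores, window=3, k=2.0):
    """Return indices of days whose score deviates from the mean of the
    previous `window` days by more than `k` population standard deviations
    (a rolling z-score; an ASSUMED spike rule, not the thesis's model).
    Days with a flat history (zero deviation) are never flagged.
    """
    spikes = []
    for i in range(window, len(daily_scores)):
        history = daily_scores[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma > 0 and abs(daily_scores[i] - mu) > k * sigma:
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    # Toy tweets grouped by day (invented for illustration).
    tweets_by_day = [
        ["hope everyone stays safe", "love my neighbours"],
        ["stay safe and hope for the best"],
        ["love and hope", "feeling safe"],
        ["virus cases rising", "lockdown again, so sick of this"],
    ]
    # Days where average_happiness returns None would need to be dropped
    # or imputed before calling detect_spikes.
    scores = [average_happiness(day) for day in tweets_by_day]
    print(scores)
    print(detect_spikes(scores, window=3, k=2.0))
```

This sketch measures each day against a rolling baseline rather than a fixed neutral midpoint. That is one way to cope with the overall negative skew described above: a persistently gloomy corpus sets its own baseline, and only sharp departures from it are flagged. Scoring with Stanford CoreNLP would replace `average_happiness` with per-sentence sentiment labels and is not shown here.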

Table of Contents

1 Introduction

2 Background

2.1 Social Media Data Mining for Event Detection

2.2 Sentiment-based Event Detection with Token and Hashtag

2.3 Topic Clustering on Social Media Posts

2.4 Location-specific Event Detection

2.5 Sentiment-based Event Detection for Tokens with Negative Connotation

3 The Corpus

3.1 Twitter Dataset

3.2 Choice of Time Span

3.3 Language Selecting and Filtering

3.4 Particular Content Removal

3.5 Quality Assurance

4 Approach

4.1 Sentiment Analysis

4.1.1 Sentiment Analysis with Stanford CoreNLP

4.1.2 Sentiment Analysis with Hedonometer

4.1.3 Location-Specific Sentiment Analysis with Stanford CoreNLP

4.2 URL Link Groupings

4.2.1 News Article Scraping

4.3 Key Word Extraction

4.3.1 Topic Modeling with Gensim

4.3.2 Named-Entity Recognition

5 Experiments

5.1 Sentiment Analysis

5.1.1 Sentiment Analysis with Stanford CoreNLP

5.1.2 Sentiment Analysis with Hedonometer

5.1.3 Location-Specific Sentiment Analysis with Stanford CoreNLP

5.1.4 Location-Specific Sentiment Analysis with Hedonometer

5.2 URL Link Groupings

5.2.1 News Article Scraping

5.3 Key Word Extraction

5.3.1 Topic Modeling with Gensim on Tweets

5.3.2 Topic Modeling with Gensim on News Articles

5.3.3 Named-Entity Recognition on Tweets

5.3.4 Named-Entity Recognition on News Articles

5.4 Results

5.5 Error Analysis

6 Conclusion

Appendix 7 - Complete Results

About this Honors Thesis

Rights statement

Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.

School	Emory College
Department	Quantitative Science
Degree	B.S.
Submission	Honors Thesis
Language	English
Research Field	Statistics; Computer Science; Sociology, Theory and Methods
Keywords	Social Media; Twitter; NLP
Committee Chair / Thesis Advisor	Glynn, Adam, Emory University; Franzosi, Roberto, Emory University
Committee Members	Sinykin, Daniel, Emory University

Primary PDF

Title	Date Uploaded
Sentiment-based Open-domain Event Detection for COVID-19	2021-03-31 16:16:18 -0400
