Sentiment-based Open-domain Event Detection for COVID-19
Hu, Xinyi (Spring 2021)
Abstract
As one of the most popular social media platforms in recent years, Twitter provides a rich record of the
public's reactions to a wide range of events and discussions. Many sociological researchers and news agencies
routinely collect and process Twitter data for opinion mining or to detect significant social events. Event
detection has become even more important during the global pandemic, when it is crucial to keep the public
informed in a timely manner about subjects such as policy changes and disease-prevention strategies.
The main goal of this research is to extract major social events that occurred during the early stage of the
coronavirus outbreak in the United States. Its central question is whether sentiment-based event detection
can succeed when the focal event is essentially negative. The pandemic is a worldwide public health
emergency, so the emotional polarity of related tweets is heavily skewed toward the negative. This study uses
a data set of more than a million English tweets containing COVID-19 keywords, posted over a one-month
span. Sentiment analysis tools such as Stanford CoreNLP and Hedonometer calculate an emotion score for each
tweet, allowing the researcher to apply mathematical models that define an emotion spike and thereby
determine whether an event occurred on a given day. After finding that sentiment-based event detection,
especially with Stanford CoreNLP, can efficiently identify hot-spot occurrences, this research applies topic
modeling and named-entity recognition (NER) to extract words and phrases that help summarize the likely
social events.
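The abstract does not spell out how an emotion spike is defined. As a minimal sketch only, assume each tweet has already been scored on Stanford CoreNLP's five-class scale (0 = very negative, 4 = very positive), and assume a hypothetical rule that flags a day whose mean score deviates from the trailing seven-day baseline by more than k sample standard deviations. The function names `daily_means` and `detect_spikes`, the window size, and k are illustrative assumptions, not the model used in the thesis.

```python
from statistics import mean, stdev


def daily_means(scored_tweets):
    """Average per-tweet sentiment scores by day.

    scored_tweets: iterable of (day, score) pairs, e.g. ("03-08", 1).
    Returns a dict of day -> mean score, ordered by day.
    """
    by_day = {}
    for day, score in scored_tweets:
        by_day.setdefault(day, []).append(score)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


def detect_spikes(daily, window=7, k=2.0):
    """Flag days whose mean sentiment deviates from the trailing baseline.

    Hypothetical rule (an assumption, not the thesis's model): a day is a
    spike if |value - baseline mean| > k * baseline sample stdev, where the
    baseline is the previous `window` days. `window` must be at least 2.
    """
    days = list(daily)
    values = [daily[d] for d in days]
    spikes = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # A perfectly flat baseline has no spread; skip it rather than
        # flag every tiny change as a spike.
        if sigma > 0 and abs(values[i] - mu) > k * sigma:
            spikes.append(days[i])
    return spikes


if __name__ == "__main__":
    # Made-up daily means for illustration; they are not thesis data.
    example = {
        f"03-{d:02d}": v
        for d, v in zip(range(1, 11),
                        [1.4, 1.6, 1.5, 1.4, 1.6, 1.5, 1.4, 0.6, 1.5, 1.6])
    }
    print(detect_spikes(example))  # ['03-08']
```

Measuring deviation from a trailing baseline rather than absolute polarity is one way to cope with the negative skew described above: a day stands out only if it is unusual relative to neighboring days that are already mostly negative.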
Table of Contents
1 Introduction
2 Background
2.1 Social Media Data Mining for Event Detection
2.2 Sentiment-based Event Detection with Token and Hashtag
2.3 Topic Clustering on Social Media Posts
2.4 Location-specific Event Detection
2.5 Sentiment-based Event Detection for Tokens with Negative Connotation
3 The Corpus
3.1 Twitter Dataset
3.2 Choice of Time Span
3.3 Language Selecting and Filtering
3.4 Particular Content Removal
3.5 Quality Assurance
4 Approach
4.1 Sentiment Analysis
4.1.1 Sentiment Analysis with Stanford CoreNLP
4.1.2 Sentiment Analysis with Hedonometer
4.1.3 Location-Specific Sentiment Analysis with Stanford CoreNLP
4.2 URL Link Groupings
4.2.1 News Article Scraping
4.3 Key Word Extraction
4.3.1 Topic Modeling with Gensim
4.3.2 Named-Entity Recognition
5 Experiments
5.1 Sentiment Analysis
5.1.1 Sentiment Analysis with Stanford CoreNLP
5.1.2 Sentiment Analysis with Hedonometer
5.1.3 Location-Specific Sentiment Analysis with Stanford CoreNLP
5.1.4 Location-Specific Sentiment Analysis with Hedonometer
5.2 URL Link Groupings
5.2.1 News Article Scraping
5.3 Key Word Extraction
5.3.1 Topic Modeling with Gensim on Tweets
5.3.2 Topic Modeling with Gensim on News Articles
5.3.3 Named-Entity Recognition on Tweets
5.3.4 Named-Entity Recognition on News Articles
5.4 Results
5.5 Error Analysis
6 Conclusion
Appendix 7 - Complete Results
About this Honors Thesis
Primary PDF
Title | Date Uploaded |
---|---|
Sentiment-based Open-domain Event Detection for COVID-19 | 2021-03-31 16:16:18 -0400 |