Modeling Temporal Dynamics in User Generated Content

Wang, Yu (2014)

Permanent URL: https://etd.library.emory.edu/concern/etds/zc77sq31q?locale=it
Published

Abstract

The evolving nature of user generated content (UGC) is a key characteristic of Web 2.0. The evolution process in UGC offers valuable evidence for explaining past content dynamics and predicting future trends. In this thesis, we design models to analyze content evolution patterns of UGC at three granularities: words, topics, and sentiment. More specifically, this thesis investigates content evolution in the following aspects: (1) word-level dynamics: analyzing word frequency change in collaboratively generated content and using historical word frequencies to better weight words in ranking functions; (2) topic-level dynamics: learning temporal transition patterns of topics in microblog streams and predicting future topics from historical posts; (3) sentiment-level dynamics: estimating and understanding the different sentiment change patterns of popular political topics across user groups. We show that the developed models enable new applications in UGC, such as improving content-based ranking, anticipating future popular topics, and visualizing and interpreting sentiment dynamics.

Table of Contents

1 Introduction

1.1 Definition of User Generated Content (UGC)

1.2 Dynamics in User Generated Content

1.3 Motivation and Approach

1.3.1 Word Dynamics

1.3.2 Topic Dynamics

1.3.3 Sentiment Dynamics

1.3.4 Temporal-Dependent User Behavior

1.4 Applications

1.5 Dissertation Contributions

2 Background and related work

2.1 Ranking and Term Weighting for Search

2.2 Topic Modeling

2.3 Sentiment Analysis

2.4 Twitter User Classification and Attributes Inference

3 Modeling Word Dynamics in Collaboratively Generated Content

3.1 Word Dynamics in Wikipedia

3.2 Revision History Analysis

3.2.1 Global Revision History Analysis

3.2.2 Revision History Burst Analysis

3.2.3 Edit History Burst Detection

3.2.4 Incorporating Word Dynamics in Retrieval Models

3.2.5 RHA in BM25

3.2.6 RHA for Statistical Language Models

3.3 Experiments

3.3.1 Results on the INEX data

3.3.2 Results on the TREC data

3.4 Word Frequency Change in Collaboratively Generated Content

3.4.1 Definition

3.4.2 Dataset

3.4.3 Analysis of Word Frequency Change

3.5 Summary

4 Modeling Topic Dynamics in Microblogging Content

4.1 Topic Dynamics in Tweet Streams

4.2 Modeling Topic Transitions

4.2.1 Temporal Latent Dirichlet Allocation (TM-LDA)

4.2.2 Error Function of TM-LDA

4.2.3 Iterative Minimization of the Error Function

4.2.4 Direct Minimization of the Error Function

4.3 TM-LDA for Twitter Stream

4.4 Updating Transition Parameters

4.5 Experiments

4.5.1 Dataset

4.5.2 Using Perplexity as Evaluation Metric

4.5.3 Predicting Future Tweets

4.5.4 Efficiency of Updating Transition Parameters

4.6 Visualization and Sensemaking of Topic Transitions

4.6.1 Global Topic Transition Patterns

4.6.2 Various Topic Transition Patterns by Cities

4.7 Summary

5 Modeling and Analyzing Sentiment Dynamics in Microblogging Content

5.1 Sentiment and Sentiment Dynamics in Twitter

5.1.1 Political Sentiment Classification

5.1.2 Classifier Feature Groups

5.2 Case Study: Defense of Marriage Act

5.3 Inferring Latent User Characteristics for Analyzing Sentiment

5.3.1 Motivation

5.3.2 Methodology

5.3.3 Evaluation Setup

5.3.4 Intrinsic Analysis

5.3.5 Extrinsic Analysis

5.3.6 Group Homogeneity

5.4 Implications

5.5 Summary

6 Modeling Temporal-Dependent User Behavior in Microblogging Content

6.1 Modeling Time of Users' Posts to Improve User Characteristic Inference

6.1.1 Motivation

6.1.2 UserTime Model

6.1.3 Experiments and Results

6.1.4 Discussion

6.2 Modeling Temporal Order of Users' Followings to Measure Link Importance

6.2.1 Retweets as A Measure of Link Importance

6.2.2 Temporal Following Preference across User Characteristics

6.2.3 Retweeting Preference for Users with Different Characteristics

6.2.4 Correlations between Retweets and Followings

6.2.5 Correlations between Early Followings and Overall Followings

6.2.6 Implications for Sentiment Dynamics

6.3 Summary

7 Visualizing Temporal Dynamics in Social Media

7.1 Motivation

7.2 Goals and Features

7.3 System Design

7.4 Summary

8 Conclusions and Future Work

8.1 Contributions of Models

8.2 Contributions of Applications

8.3 Limitations and Challenges

8.4 Future Research

8.5 Overall Summary

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Degree
Submission
Language
  • English
Research Field
Keyword
Committee Chair / Thesis Advisor
Committee Members
Last modified
