Two Essays on Content Engineering with Unstructured Data: Business Insights from User-Generated Content
Ko, Eun Hee (Spring 2019)
Abstract
A primary driver behind the topics of my dissertation essays is the desire to address the challenges that marketing practitioners face in a market environment where consumer behaviors are changing quickly as platforms expand into new media native to computers and mobile devices, an expansion that has prompted continuous growth in marketing expenditures. While a wide range of research studies user-generated content (UGC) and its impact on marketing or consumer purchasing behavior, few studies highlight content characteristics using large-scale data from the field. Moreover, most existing empirical research on the semantic content of UGC pays limited attention to content beyond text. To fill this gap, during my Ph.D. program I have initiated and advanced several projects that investigate content features drawn not only from text but also from images. In doing so, I bring a variety of methodological approaches to my research (natural language processing, machine learning, and image processing techniques) and merge public and proprietary datasets, both longitudinal and cross-sectional. The first essay of my dissertation examines consumer engagement, measured as the number of likes and comments on a brand-themed social media post on Instagram. I study consumer engagement with brand-themed user-generated content, that is, image-based social media posts tagged with #brandname, an increasingly common way for consumers to engage with brands. I describe consumer engagement using four characteristics of a post's image and text (visual sentiment, visual complexity, text sentiment, and text complexity), which I construct using deep convolutional neural networks (Deep CNNs), a computer vision application programming interface (API), and natural language processing (NLP).
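As a rough illustration of what constructing text variables from a post caption can look like, the sketch below scores sentiment against a small word list and uses simple counts as complexity proxies. The lexicon sets `POSITIVE` and `NEGATIVE`, the function name `text_features`, and the choice of word, distinct-word, and hashtag counts as complexity proxies are all hypothetical assumptions made for illustration; they are not the measures used in the dissertation, which relies on NLP techniques not detailed in this abstract.

```python
import re

# Hypothetical toy lexicon (assumption); a real study would use a validated sentiment resource.
POSITIVE = {"love", "great", "happy", "beautiful", "best"}
NEGATIVE = {"hate", "bad", "sad", "worst", "ugly"}


def text_features(caption: str) -> dict:
    """Return toy sentiment and complexity proxies for one post caption.

    Illustrative assumptions only: sentiment is (positive hits - negative hits)
    divided by the number of word tokens; complexity is proxied by the number
    of word tokens, distinct word tokens, and hashtags.
    """
    hashtags = re.findall(r"#\w+", caption)
    # Remove hashtags before tokenizing so "#brandname" is not counted as a word.
    words = re.findall(r"[a-z']+", re.sub(r"#\w+", " ", caption.lower()))
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    n = len(words)
    return {
        "sentiment": (pos - neg) / n if n else 0.0,
        "n_words": n,
        "n_unique_words": len(set(words)),
        "n_hashtags": len(hashtags),
    }


if __name__ == "__main__":
    print(text_features("I love my new sneakers, best gift ever! #brandname #ootd"))
    # prints {'sentiment': 0.25, 'n_words': 8, 'n_unique_words': 8, 'n_hashtags': 2}
```

A word-list score is used here only because it is transparent and dependency-free; it ignores negation, emoji, and context, which is one reason richer NLP models are typically preferred for real captions.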
Using data from over 86,000 Instagram posts collectively hashtagged with 86 product brand names, I find that visual sentiment and text sentiment are positively associated with consumer engagement. Visual complexity and text complexity both have a positive effect on consumer engagement at low and moderate levels, but the effect turns negative at high levels: too much information, whether from images or from text, attenuates consumer engagement. Around the middle of the range of visual complexity lies an optimal level that makes a post rich and engaging. The second essay of my dissertation investigates the factors that characterize manipulated reviews, focusing on unstructured text data and on brand strength as a factor associated with the incidence of suspicious online reviews. Studying over 270,000 Amazon.com reviews from 16 product categories, I find that approximately 3% of reviews are ones consumers would be suspicious of. Extreme emotions (e.g., fear, joy) explain whether a review is viewed as suspicious better than mixed emotions (e.g., anticipation, surprise) or low-arousal emotions (e.g., sadness) do. I argue that weaker brands have an incentive to manipulate reviews. I find that weak brand status, as indicated by lower advertising effort, is associated with suspicious reviews that are promotional (positive) in nature. However, the effect fades for suspicious reviews that are denigrating (negative).
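The inverted-U pattern reported for complexity (a positive effect at low and moderate levels, a negative one at high levels, and an interior optimum) is commonly formalized with a quadratic term. The sketch below assumes a specification engagement = b0 + b1*x + b2*x^2 with b2 < 0, whose peak lies at x* = -b1 / (2*b2), and fits it by ordinary least squares on made-up, noise-free numbers. The quadratic form, the data, and the function names `fit_quadratic` and `turning_point` are hypothetical assumptions for illustration, not the dissertation's model or estimates.

```python
def fit_quadratic(xs, ys):
    """Ordinary least squares fit of y = b0 + b1*x + b2*x**2.

    Solves the 3x3 normal equations (X'X) b = X'y with Gaussian elimination
    and partial pivoting, so only the standard library is needed.
    """
    # Power sums of x fill X'X; cross-products with y fill X'y.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix [X'X | X'y]; entry (i, j) of X'X is sum of x^(i+j).
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    # Back substitution on the upper-triangular system.
    b = [0.0] * n
    for i in reversed(range(n)):
        b[i] = (a[i][n] - sum(a[i][j] * b[j] for j in range(i + 1, n))) / a[i][i]
    return b  # [b0, b1, b2]


def turning_point(b1, b2):
    """Complexity level that maximizes predicted engagement (requires b2 < 0)."""
    if b2 >= 0:
        raise ValueError("no interior maximum unless the quadratic term is negative")
    return -b1 / (2 * b2)


if __name__ == "__main__":
    # Hypothetical noise-free data generated from y = 1 + 4x - 0.8x^2 on x = 0..5.
    xs = [0, 1, 2, 3, 4, 5]
    ys = [1 + 4 * x - 0.8 * x ** 2 for x in xs]
    b0, b1, b2 = fit_quadratic(xs, ys)
    print(round(turning_point(b1, b2), 6))  # prints 2.5, the middle of the x range
```

With real data the coefficients would carry sampling error, so a turning point estimated this way should be reported with a confidence interval and checked to fall inside the observed range of complexity.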
Table of Contents
Chapter 1. Overview
1.1. Introduction
1.2. User-Generated Content
1.3. Artificial Intelligence in Marketing
1.4. Agenda of the Dissertation
Chapter 2. Content Engineering of Images: The Effect of Sentiment and Complexity on Consumer Engagement with Brand-Themed User-Generated Content
2.1. Introduction
2.2. Background
2.2.1. Consumer Engagement in Social Media
2.2.2. Content Marketing
2.2.3. Machine Learning and Social Media
2.3. Conceptual Framework
2.3.1. Characteristics of the Post
2.3.2. Brand Characteristics
2.3.3. User Characteristics
2.4. Data
2.4.1. Raw Data and Sample Selection Criteria
2.4.2. Variable Crafting of User Post Data
2.4.2.1. Deep CNNs for Visual Sentiment: DeepSentiBank
2.4.2.2. Computer Vision API, NLP, and Clustering for Visual Complexity and Object Types
2.4.2.3. Text Variables
2.4.3. Descriptive Statistics
2.5. Empirical Analysis
2.5.1. Results
2.5.1.1. Consumer Engagement and Image Content
2.5.1.2. Consumer Engagement and Text Content
2.5.1.3. Consumer Engagement and Brand Characteristics
2.5.1.4. Consumer Engagement and User Characteristics
2.5.2. Simulation
2.5.3. Robustness Check
2.5.4. Accounting for Commercial Posts and Multiple Brands
2.5.4.1. Data Cleaning Process
2.5.4.2. Descriptive Statistics
2.5.4.3. Empirical Strategy and Analysis Results
2.5.4.4. Robustness Check
2.6. Managerial Implications and Conclusions
Chapter 3. Suspicious Online Product Reviews and Brand Advertising Effort
3.1. Introduction
3.2. Background
3.3. Create the Analysis Sample
3.3.1. Initial Processing
3.3.2. Merging Advertising Expenditure Data
3.3.3. Final Preprocessing
3.4. Classify and Label Reviews as Suspicious (or Not)
3.4.1. Selecting a Training Set for Use by Human Evaluators
3.4.2. Coding the Training Set as Suspicious (or Not) Using Human Evaluators
3.4.3. Coding the Full Dataset as Suspicious (or Not) Using Semi-Supervised Classification
3.4.4. Results from Semi-Supervised Classification and Comparison with Supervised Classifiers
3.5. Characterize Suspicious Reviews Using Semantic Features
3.5.1. Semantic Characteristics of Suspicious Reviews
3.5.2. Robustness Check with Holdout Sample
3.5.2.1. Diagnostic Analysis
3.5.2.2. Predicted Power: Accuracy and ROC Curve
3.6. Predicted Modeling with Alternative Machine Learning Classifiers
3.7. Explore Word2vec Model
3.7.1. Labelling Texts
3.7.2. Fine-Tuning Learned Word Embedding from Word2vec
3.7.3. Naïve Bayes
3.7.4. Random Forest
3.7.5. Result
3.7.6. Conclusions
3.8. Merge Suspicious Reviews and Brand Advertising Effort
3.8.1. Determining the Cutoff
3.8.2. Validation and Implementation of RD Design
3.8.3. Robustness
3.9. Beta Regression Model and Category Effect
3.10. Conclusions
Chapter 4. Conclusions
Bibliography
Appendix 1. Brand List
Appendix 2. Instructions for Students (Instagram Data)
Appendix 3. Objects Detected
Appendix 4. Robustness Check
Appendix 5. Spam Review Detection Algorithms
Appendix 6. Instructions Given to Human Evaluators
About this Dissertation
Primary PDF
Title | Date Uploaded |
---|---|
Two Essays on Content Engineering with Unstructured Data: Business Insights from User-Generated Content | 2019-04-25 10:37:05 -0400 |