Modeling Search Trends and Search Interests for Health (Public)
Lin, Chen (Fall 2024)
Abstract
Tracking exposure to pollution and infectious disease is a significant public health challenge. While online search trends offer valuable insights for monitoring health-related issues, accurately capturing and predicting users' health concerns remains difficult. This challenge is compounded by the fact that users' search behavior changes frequently and varies across sessions. Integrating additional data sources, such as geographical and environmental data, allows models to better account for external factors that complement search data, providing a more comprehensive understanding of public health trends. Existing methods, although useful, neither fully leverage advanced neural networks nor integrate multifaceted data for symptom prediction and intent categorization. To address this, my research develops a fundamental approach that leverages multimodal data integration and contextual learning across sequential, graph-based, and large language models, each tailored to specific health-related forecasting and intent categorization tasks.
The primary research questions addressed here are as follows:
RQ1: How can multimodal data sources, including online search trends, be effectively leveraged to forecast health symptoms related to environmental factors (e.g., air pollution) and infectious diseases?
RQ2: To enhance understanding of search trends and deliver responsive health information services, how can search intent for health-related queries be identified and anticipated by analyzing user interaction data, such as user click logs?
These research questions are addressed by processing various data sources with multimodal sequential learning, combining online search trends with other modalities to build stronger forecasting models. Additionally, I employ unsupervised learning with limited annotated data to identify health-related search intents and apply supervised learning to anticipate health service seekers’ needs by modeling queries with both consistent and varying intents across sessions. To achieve this, the dissertation introduces key contributions such as integrating semantic information from search queries with search trends, utilizing cross-location information for improved pandemic forecasting, and presenting a novel fine-tuning method for adapting large language models to interpret health-related queries.
The proposed methods, validated on diverse real-world datasets, demonstrate significant advancements in health search modeling. These innovations directly contribute to better pollution exposure symptom monitoring, pandemic forecasting, and the accurate interpretation of complex user search behaviors. As a result, this work offers practical solutions to real-world challenges in public health surveillance and health information systems, ultimately paving the way for more responsive, data-driven health search services that can better meet users' evolving healthcare needs.
Table of Contents
1 Introduction
1.1 Online Health Monitoring
1.2 Challenges in Health Search and Trend Modeling
1.2.1 Challenges in Modeling Search Trend for Health
1.2.2 Challenges in Health Search Intent Recognition
1.3 Research Questions
1.4 Contributions and Dissertation Structure
1.4.1 Cross-modal Memory Fusion Network for Multimodal Sequential Learning with Missing Values
1.4.2 Detecting Elevated Air Pollution Levels from Web Search Data
1.4.3 Modeling of Web Search Activity for Real-time Pandemic Forecasting using Graph Neural Network
1.4.4 Enhancing Healthcare Search Intent Recognition with Query Representation Learning and Session Context
2 Related Work
2.1 Online Health Monitoring
2.2 Health Search Trends Modeling
2.2.1 Google Flu Trends (GFT)
2.2.2 Regularized Autoregressive Models
2.2.3 Recurrent Neural Network (RNN) Models
2.2.4 Graph Neural Network (GNN) Models
2.2.5 Time Series Forecasting with Noise and Missingness
2.2.6 Time Series Forecasting with Foundation Models
2.3 Search Query Understanding
2.3.1 Large Language Models for Health
2.3.2 Search Query Clustering and Classification
2.3.3 Search Intent Prediction in Web Search and Conversational Agents
3 Modeling Search Trend for Air Pollution Detection
3.1 Cross-modal Memory Fusion Network for Multi-modal Sequential Learning with Missing Values
3.1.1 Problem Statement
3.1.2 Methodology
3.1.3 Experimental Setting and Results
3.1.4 Analysis & Discussion
3.2 Detecting Elevated Air Pollution Using Web Search Queries
3.2.1 Problem Statement
3.2.2 Methodology
3.2.3 Experimental Setting
3.2.4 Results
3.2.5 Analysis & Discussion
4 Modeling Search Trend for Infectious Disease Forecasting
4.1 Problem Statement
4.2 Methodology
4.2.1 Problem Formulation
4.2.2 Model Designs
4.3 Experimental Setting
4.3.1 Dataset
4.3.2 Experimental Setup
4.3.3 Baselines
4.4 Results
4.5 Analysis & Discussion
5 Search Intent Understanding
5.1 Problem Statement
5.2 Methodology
5.2.1 Problem Formulation
5.2.2 Proposed Approach
5.2.3 Pairwise Loss Function
5.2.4 Multiset Loss Function
5.2.5 Multi-Label Search Intent Classification
5.3 Experimental Setting
5.3.1 Datasets
5.3.2 Experimental Setup for Intent Classification
5.3.3 Experimental Setup for Session-based Intent Classification
5.3.4 Baseline Models
5.3.5 Evaluation Metrics
5.4 Results
5.4.1 Performance Comparison
5.5 Analysis & Discussion
6 Conclusion
6.0.1 Modeling Search Trend for Air Pollution Detection
6.0.2 Modeling Search Trend for Infectious Disease Forecasting
6.0.3 Search Intent Understanding
6.0.4 Limitations
6.0.5 Future Work
Bibliography
Primary PDF
Title | Date Uploaded |
---|---|
Modeling Search Trends and Search Interests for Health | 2024-11-15 23:28:10 -0500 |