Using Deep Recurrent Neural Networks to Estimate Influenza Prevalence from Mobile Phone Records
Song, Congzheng (2016)
Abstract
Early detection of influenza could save millions of people from suffering and death, yet detection itself remains challenging. Previous influenza surveillance systems require either clinical data from infected individuals or search queries about influenza, and the latter depend heavily on Internet usage. Mobile phones, on the other hand, remain popular globally, which makes mobile phone data an alternative data source. The hypothesis of our project is that mobile phone data can be used to model individual behavioral changes, and that these models can be applied to a larger population with unknown prevalence to accurately detect new diseases or epidemics in their early stages. In this thesis, we focus on a more specific question toward validating that hypothesis: can we detect human behavioral changes using mobile phone data and use them to build an appropriate individual sickness prediction model? To answer this question, we first define several metrics that can be extracted from mobile phone data and show that they reflect behavioral changes when people are sick. Next, we set up a supervised learning task in which we predict when a mobile phone user will be sick, given the set of metrics we define. We then develop a novel deep learning model for this task and show that it outperforms other machine learning models.
Table of Contents
1 Introduction
1.1 Background
1.2 Motivation
1.3 Outline
2 Related Work
2.1 Human Behavior Study
2.2 Classification Using CDR Data
2.3 Differences in Our Work
3 Dataset Description
3.1 Icelandic CDR Data
3.2 Onset Data
3.3 Privacy and Approvals
3.4 Grouping Data
4 Human Behavior Metrics
4.1 Notation
4.2 Past Top Information
4.3 Call Pattern Metrics
4.3.1 Basic Phone Usage
4.3.2 Social Behavior
4.4 Mobility Metrics
4.4.1 Traveling Diversity
4.4.2 Movement Behavior
4.5 GPRS Data Usage
4.6 Categorical Variables
4.7 Analysis of Metrics
4.7.1 Box-Plots
4.7.2 Observations
5 Model
5.1 Preliminaries
5.1.1 Supervised Learning
5.1.2 Sequence Classification
5.1.3 Gradient Based Learning
5.2 Problem Formalization
5.2.1 Sick Days Classification
5.2.2 Labeling Sick Days
5.2.3 Binning Records
5.2.4 Sequential Data
5.3 Proposed Methods
5.3.1 Hierarchical LSTM
5.3.2 Attention Mechanism
6 Experiments
6.1 Preprocessing Steps
6.1.1 Data Imputation
6.1.2 One-Hot Encoding
6.1.3 Data Normalization
6.1.4 Data Filtering
6.2 Model Configuration
6.2.1 Classifier
6.2.2 Optimization
6.2.3 Regularization
6.2.4 Model Parameters
6.3 Models for Comparison
6.3.1 Non-Sequential Models
6.3.2 Sequential Models
6.4 Results and Analysis
6.4.1 Receiver Operating Characteristic
6.4.2 Analysis and Discussion
7 Conclusion
7.1 Summary of Work
7.2 Future Directions
Appendix A
Primary PDF
Title | Date Uploaded |
---|---|
Using Deep Recurrent Neural Networks to Estimate Influenza Prevalence from Mobile Phone Records | 2018-08-28 15:22:35 -0400 |