Using Deep Recurrent Neural Networks to Estimate Influenza Prevalence from Mobile Phone Records

Song, Congzheng (2016)

Permanent URL: https://etd.library.emory.edu/concern/etds/z603qz20p?locale=fr
Published

Abstract

Early detection of influenza would save millions of people from suffering and death, but detection itself remains challenging. Previous influenza surveillance systems require either clinical data from infected individuals or search queries about influenza, which depend heavily on Internet usage. Mobile phone data, by contrast, is a data source that remains popular globally. The hypothesis of our project is that we can use mobile phone data to model individual behavioral changes and apply the resulting model to a larger population with unknown prevalence, in order to accurately detect new diseases or epidemics in their early stages. In this thesis, we focus on a more specific question toward validating that hypothesis: can we detect human behavioral changes using mobile phone data and use them to build an appropriate individual sickness prediction model? To answer this question, we first define several metrics that can be extracted from mobile phone data and show that they exhibit behavioral changes when people are sick. Next, we set up a supervised learning task in which we predict when a mobile phone user will be sick, given the set of metrics we define. We then develop a novel deep learning model for this task and show that it outperforms other machine learning models.
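
As a rough illustration of the supervised setup described in the abstract, the sketch below bins call records per user and day, computes a few simple usage metrics, and pairs each day with a binary sick label. The record layout, the three metrics (`n_calls`, `total_duration`, `n_contacts`), and the sample data are all hypothetical assumptions made for illustration; they are not the thesis's actual schema, metrics, or model.

```python
from collections import defaultdict
from datetime import date

# Hypothetical call records: (user_id, day, contact_id, duration_seconds).
# Field names and values are illustrative assumptions, not the thesis's data.
records = [
    ("u1", date(2016, 1, 4), "c1", 120),
    ("u1", date(2016, 1, 4), "c2", 30),
    ("u1", date(2016, 1, 5), "c1", 45),
    ("u2", date(2016, 1, 4), "c3", 300),
]

# Hypothetical sick-day labels as a set of (user_id, day) pairs.
sick_days = {("u1", date(2016, 1, 5))}


def daily_metrics(records):
    """Bin records by (user, day) and compute simple usage metrics."""
    bins = defaultdict(list)
    for user, day, contact, duration in records:
        bins[(user, day)].append((contact, duration))
    metrics = {}
    for key, calls in bins.items():
        metrics[key] = {
            "n_calls": len(calls),
            "total_duration": sum(d for _, d in calls),
            "n_contacts": len({c for c, _ in calls}),
        }
    return metrics


def build_examples(records, sick_days):
    """Pair each (user, day) feature vector with a binary sick label."""
    examples = []
    for key, m in sorted(daily_metrics(records).items()):
        features = [m["n_calls"], m["total_duration"], m["n_contacts"]]
        label = 1 if key in sick_days else 0
        examples.append((key, features, label))
    return examples


examples = build_examples(records, sick_days)
for key, features, label in examples:
    print(key[0], key[1].isoformat(), features, label)
```

Each resulting (features, label) pair could then be fed to any classifier; ordering a user's days chronologically would give the kind of sequential input a recurrent model expects.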

Table of Contents

1 Introduction

1.1 Background

1.2 Motivation

1.3 Outline

2 Related Work

2.1 Human Behavior Study

2.2 Classification Using CDR Data

2.3 Differences in Our Work

3 Dataset Description

3.1 Icelandic CDR Data

3.2 Onset Data

3.3 Privacy and Approvals

3.4 Grouping Data

4 Human Behavior Metrics

4.1 Notation

4.2 Past Top Information

4.3 Call Pattern Metrics

4.3.1 Basic Phone Usage

4.3.2 Social Behavior

4.4 Mobility Metrics

4.4.1 Traveling Diversity

4.4.2 Movement Behavior

4.5 GPRS Data Usage

4.6 Categorical Variables

4.7 Analysis of Metrics

4.7.1 Box-Plots

4.7.2 Observations

5 Model

5.1 Preliminaries

5.1.1 Supervised Learning

5.1.2 Sequence Classification

5.1.3 Gradient Based Learning

5.2 Problem Formalization

5.2.1 Sick Days Classification

5.2.2 Labeling Sick Days

5.2.3 Binning Records

5.2.4 Sequential Data

5.3 Proposed Methods

5.3.1 Hierarchical LSTM

5.3.2 Attention Mechanism

6 Experiments

6.1 Preprocessing Steps

6.1.1 Data Imputation

6.1.2 One-Hot Encoding

6.1.3 Data Normalization

6.1.4 Data Filtering

6.2 Model Configuration

6.2.1 Classifier

6.2.2 Optimization

6.2.3 Regularization

6.2.4 Model Parameters

6.3 Models for Comparison

6.3.1 Non-Sequential Models

6.3.2 Sequential Models

6.4 Results and Analysis

6.4.1 Receiver Operating Characteristic

6.4.2 Analysis and Discussion

7 Conclusion

7.1 Summary of Work

7.2 Future Directions

Appendix A

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English