Incorporating Social Relationships from Call Detail Records into Infectious Disease Spread Simulators

Shats, Ilya (2016)

Permanent URL: https://etd.library.emory.edu/concern/etds/5h73pw77s?locale=zh
Published

Abstract

Traditionally, the spread of infectious diseases has been studied mainly through mathematical models and surveys. In this work, we used an anonymized call detail record (CDR) dataset, which contains metadata about phone calls, text messages, and data transmissions, as the foundation for predicting the spread of influenza-like illness (ILI) during the 2009 flu pandemic in Iceland. CDR data capture a population's mobility patterns and support basic contact tracing. They can also be used to infer people's social networks. We show that social strength has an impact on disease spread, supporting the perhaps intuitive idea that an infected individual is likely to transmit the disease to the people socially closest to him or her. To simulate ILI spread through populations, we built several discrete event simulators in Python, which are described in the second part of the thesis. Although the models' accuracy in predicting the spread still needs improvement, this work is a step forward in the novel area of using cell phone metadata to model infectious disease dynamics.
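The following Python sketch shows one way pairwise social strength could be derived from CDR records and then used to weight transmission in a simple spread simulation. It is not the thesis's implementation. The CDR row layout (caller_id, callee_id, duration_seconds), the functions build_social_strength, contact_weights, and simulate, and the parameters beta and infectious_days are all hypothetical. The sketch also uses a simplified day-by-day SIR-style loop rather than a full discrete event scheduler. Social strength here is measured as each contact's share of a user's total call duration. Call frequency is computed but not used.

```python
import random
from collections import defaultdict


def build_social_strength(cdr_rows):
    """Aggregate total call duration and call count per unordered user pair.

    Assumes each row is a hypothetical (caller_id, callee_id, duration_seconds) tuple.
    """
    duration = defaultdict(int)
    frequency = defaultdict(int)
    for caller, callee, seconds in cdr_rows:
        pair = tuple(sorted((caller, callee)))  # treat A->B and B->A as one tie
        duration[pair] += seconds
        frequency[pair] += 1
    return dict(duration), dict(frequency)


def contact_weights(duration):
    """Map each user to {contact: share of that user's total call duration}."""
    totals = defaultdict(int)
    for (a, b), seconds in duration.items():
        totals[a] += seconds
        totals[b] += seconds
    weights = defaultdict(dict)
    for (a, b), seconds in duration.items():
        if totals[a]:
            weights[a][b] = seconds / totals[a]
        if totals[b]:
            weights[b][a] = seconds / totals[b]
    return dict(weights)


def simulate(weights, seeds, beta=0.5, infectious_days=3, days=10, rng=None):
    """Day-by-day SIR-style spread with hypothetical parameters.

    A user infected on day d is infectious on days d+1 .. d+infectious_days
    and recovered afterwards. On each infectious day, user u infects a
    susceptible contact v with probability beta * weights[u][v].
    """
    if rng is None:
        rng = random.Random(0)  # seeded for reproducibility
    onset = {u: 0 for u in seeds}  # user -> day of infection (seeds on day 0)
    for day in range(1, days + 1):
        infectious = [u for u, d in onset.items() if 1 <= day - d <= infectious_days]
        new_cases = {}
        for u in infectious:
            for v, w in weights.get(u, {}).items():
                if v not in onset and v not in new_cases and rng.random() < beta * w:
                    new_cases[v] = day
        onset.update(new_cases)  # infections today become infectious tomorrow
    return onset


if __name__ == "__main__":
    rows = [("A", "B", 600), ("A", "C", 60), ("B", "C", 300), ("B", "A", 300)]
    duration, frequency = build_social_strength(rows)
    weights = contact_weights(duration)
    print(simulate(weights, seeds=["A"], beta=0.9))
```

With beta set to 0, nothing spreads beyond the seeds. As beta grows, the contacts an infected user talks to most become the most likely next cases. Weighting by call duration is only one choice. Call frequency or co-occurrence could be substituted in contact_weights.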

Table of Contents

Section 1. Introduction 1
1.1. Motivation 1
1.2. Contribution 3
Section 2. Understanding the impact of social strength on disease spread 4
2.1. Description of data 4
2.2. Data Limitations 7
2.3. Data preprocessing 9
2.3.1. Building hashmaps from datasets 10
2.3.2. Dependency between variables 15
2.4. The impact of social strength 18
Section 3. Predicting disease spread with social strength 30
3.1. Seedset 31
3.2. Proposed Frameworks 31
3.2.1. Baseline 31
3.2.2. Disease Base Model 34
3.2.3. Disease Model 2 36
3.2.4. Social Network Base Model 37
3.2.5. Augmented Social Network Model 38
3.3. Evaluation 41
3.4. Results and Future Work 42
Section 4. Conclusion 47
References 48

Tables and Figures

Table 1. A fragment of the CDR dataset. 5
Table 2. A fragment of the ILI-onset dataset. 5
Table 3. CDR dataset statistics 6
Table 4. ILI-Onset dataset statistics 6
Figure 1. Daily disease onset curve 6
Figure 2. Distribution of disease onset offsets among all infected pairs 7
Figure 3. Distribution of call duration 12
Figure 4. Distribution of call frequency 12
Figure 5. Distribution of co-occurrence 13
Table 5. Call Duration and Call Frequency statistics. 14
Table 6. Co-occurrence statistics. 15
Figure 6. Call Frequency vs Call Duration 16
Figure 7. Co-occurrence vs Call Duration 17
Figure 8. Co-occurrence vs Call Frequency 17
Figure 9. Call Duration vs Days Offset of Disease Onset 19
Figure 10. Number of pairs in each bucket in Figure 9 19
Figure 11. Call Frequency vs Days Offset of Disease Onset 20
Figure 12. Number of pairs in each bucket in Figure 11 20
Figure 13. Co-occurrence vs Days Offset of Disease Onset 21
Figure 14. Number of pairs in each bucket in Figure 13 21
Figure 15. Percentage of Contacts Infected within 7 days vs Call Duration 22
Figure 16. Number of pairs in each bucket in Figure 15 22
Figure 17. Percentage of Contacts Infected within 7 days vs Call Frequency 23
Figure 18. Number of pairs in each bucket in Figure 17 24
Figure 19. Percentage of Contacts Infected within 7 days vs Co-occurrence 24
Figure 20. Number of pairs in each bucket in Figure 19 25
Figure 21 (25 users). Top subplot: Percentage of Top 5 Contacts Infected vs Offset of Disease Onset; Bottom subplot (Control): Percentage of Random Users Infected vs Offset of Disease Onset 26
Figure 22 (100 users). Top subplot: Percentage of Top 5 Contacts Infected vs Offset of Disease Onset; Bottom subplot (Control): Percentage of Random Users Infected vs Offset of Disease Onset 26
Figure 23 (2,125 users). Top subplot: Percentage of Top 5 Contacts Infected vs Offset of Disease Onset; Bottom subplot (Control): Percentage of Random Users Infected vs Offset of Disease Onset 27
Figure 24. Percentage of top contacts infected at offsets 0 and 1 vs. number of top contacts 28
Table 7. UID to Family ID relation. 29
Table 8. Data describing the Family Dataset. 29
Figure 25. Simulation Start 31
Figure 26. Baseline Pseudocode 34
Figure 27. Susceptible, Infected, and Recovered sets: the components of the SIR model 35
Figure 28. Augmented Social Model Pseudocode 40
Figure 29. GetIndividualProbabilityOfInfection Pseudocode 40
Figure 30. Cumulative number of people infected 41
Figure 31. Performance of baseline model 43
Figure 32. Performance of disease model 44
Figure 33. Performance of social network base model 45
Figure 34. Performance of augmented social network model 46

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English