Preserving Individual Privacy in Spatio-Temporal Data Analytics Open Access
Fan, Liyue (2014)
Abstract
We live in the age of big data. With an increasing number of people, devices, and sensors connected with digital networks, individual data now can be largely collected and analyzed by data mining applications for social good as well as for commercial interests. However, the data generated by individual users exhibit unique behavioral patterns and sensitive information, and therefore must be transformed prior to the release for analysis. The AOL search log release in 2006 is an example of privacy catastrophe, where the searches of an innocent citizen were quickly re-identified by a newspaper journalist. In this dissertation, I present a novel framework to release continuous aggregation of private data for an important class of real-time data mining tasks, such as disease outbreak detection and web mining, to name a few. The key innovation is that the proposed framework captures the underlying dynamics of the continual aggregate statistics with time series state-space models, and simultaneously adopts filtering techniques to correct the observed, noisy data. It can be shown that the new framework provides a rigorous, provable privacy guarantee to individual data contributors without compromising the output analysis results. Extensive empirical studies confirm that it will enable privacy-preserving data analytical tasks in a broad range of application domains.
Table of Contents
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . 1
1.2 Research Contributions . . . . . . . . . . . . . . . . . . . .
. . 4
1.2.1 Real-Time Aggregate Monitoring with Differential Privacy
(Chapter 3) . . . . . . . . . . . . . . . . . . . . . 4
1.2.2 Estimating Individual Privacy Risk with Domain Statistics
(Chapter 4) . . . . . . . . . . . . . . . . . . . . . . 5
1.2.3 Modeling Spatio-Temporal Correlation of Multi-Variate
Aggregate Time Series (Chapter 5) . . . . . . . . . . . 6
2 Related Works 8
2.1 Differential Privacy . . . . . . . . . . . . . . . . . . . . .
. . . 8
2.2 Time Series Analysis and Privacy . . . . . . . . . . . . . . .
. 9
2.3 Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . 10
2.4 Anomaly Detection . . . . . . . . . . . . . . . . . . . . . . .
. 11
3 Real-Time Aggregate Monitoring with Differential Privacy 13
3.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . .
. . 13
3.1.1 Problem Statement . . . . . . . . . . . . . . . . . . . .
13
3.1.2 Differential Privacy . . . . . . . . . . . . . . . . . . . .
14
3.1.3 Existing Solutions . . . . . . . . . . . . . . . . . . . . .
16
3.2 FAST . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . 17
3.2.1 Filtering . . . . . . . . . . . . . . . . . . . . . . . . . .
20
3.2.2 Sampling . . . . . . . . . . . . . . . . . . . . . . . . .
28
3.3 Experimental Results . . . . . . . . . . . . . . . . . . . . .
. . 33
3.3.1 Effects of Filtering . . . . . . . . . . . . . . . . . . . .
35
3.3.2 Effects of Sampling . . . . . . . . . . . . . . . . . . . .
38
3.3.3 Utility vs. Privacy . . . . . . . . . . . . . . . . . . . .
41
3.3.4 Detection and Correlation . . . . . . . . . . . . . . . .
43
4 Analyzing Individual Privacy Risk for Releasing Long Aggregate
Series 45
4.1 Differentially Private Anomaly Detection . . . . . . . . . . .
. 45
4.1.1 Privacy Risk Definition . . . . . . . . . . . . . . . . . .
45
4.1.2 Epidemic Outbreak Detection . . . . . . . . . . . . . .
48
4.1.3 Sensitivity Analysis. . . . . . . . . . . . . . . . . . . .
49
4.1.4 Laplace Perturbation . . . . . . . . . . . . . . . . . . .
50
4.1.5 Filtering . . . . . . . . . . . . . . . . . . . . . . . . . .
51
4.2 Experimental Results . . . . . . . . . . . . . . . . . . . . .
. . 52
4.2.1 Accuracy of Released Data . . . . . . . . . . . . . . . .
54
4.2.2 Outbreak Detection Evaluation . . . . . . . . . . . . .
56
5 Modeling Spatio-Temporal Correlation of Multi-Variate Aggregate
Time Series 59
5.1 Differentially Private Web Monitoring . . . . . . . . . . . . .
. 59
5.1.1 Problem Statement . . . . . . . . . . . . . . . . . . . .
59
5.2 Proposed Solution . . . . . . . . . . . . . . . . . . . . . . .
. . 62
5.2.1 Univariate Time Series Approach . . . . . . . . . . . .
63
5.2.2 Multivariate Time Series Approach . . . . . . . . . . .
65
5.3 Experimental Results . . . . . . . . . . . . . . . . . . . . .
. . 68
5.3.1 Simulating Dynamic Browsing Sessions . . . . . . . . .
69
5.3.2 Learning Models . . . . . . . . . . . . . . . . . . . . .
71
5.3.3 Utility Evaluation . . . . . . . . . . . . . . . . . . . . .
72
5.3.4 Runtime . . . . . . . . . . . . . . . . . . . . . . . . . .
78
5.4 Differentially Private Traffic Monitoring . . . . . . . . . . .
. 79
5.4.1 Problem Statement . . . . . . . . . . . . . . . . . . . .
79
5.4.2 Baseline Solution - LPA Algorithm . . . . . . . . . . .
80
5.5 Proposed Solutions . . . . . . . . . . . . . . . . . . . . . .
. . 81
5.5.1 Temporal Estimation . . . . . . . . . . . . . . . . . . .
82
5.5.2 Spatial Estimation . . . . . . . . . . . . . . . . . . . .
83
5.6 Experimental Results . . . . . . . . . . . . . . . . . . . . .
. . 86
5.6.1 Parameter Impacts . . . . . . . . . . . . . . . . . . . .
89
5.6.2 Utility Performance . . . . . . . . . . . . . . . . . . . .
91
5.6.3 Runtime Performance . . . . . . . . . . . . . . . . . . .
93
6 Conclusion and Future Work 95
6.1 Summary of Dissertation . . . . . . . . . . . . . . . . . . . .
. 96
6.2 Recommendations for Future Work . . . . . . . . . . . . . . .
99
Appendix 101
Bibliography 106
About this Dissertation
School | |
---|---|
Department | |
Degree | |
Submission | |
Language |
|
Research Field | |
Keyword | |
Committee Chair / Thesis Advisor | |
Committee Members |
Primary PDF
Thumbnail | Title | Date Uploaded | Actions |
---|---|---|---|
Preserving Individual Privacy in Spatio-Temporal Data Analytics () | 2018-08-28 12:56:16 -0400 |
|
Supplemental Files
Thumbnail | Title | Date Uploaded | Actions |
---|