Preserving Individual Privacy in Spatio-Temporal Data Analytics Public

Fan, Liyue (2014)

Permanent URL: https://etd.library.emory.edu/concern/etds/p8418n73m?locale=fr
Published

Abstract

We live in the age of big data. With an increasing number of people, devices, and sensors connected with digital networks, individual data now can be largely collected and analyzed by data mining applications for social good as well as for commercial interests. However, the data generated by individual users exhibit unique behavioral patterns and sensitive information, and therefore must be transformed prior to the release for analysis. The AOL search log release in 2006 is an example of privacy catastrophe, where the searches of an innocent citizen were quickly re-identified by a newspaper journalist. In this dissertation, I present a novel framework to release continuous aggregation of private data for an important class of real-time data mining tasks, such as disease outbreak detection and web mining, to name a few. The key innovation is that the proposed framework captures the underlying dynamics of the continual aggregate statistics with time series state-space models, and simultaneously adopts filtering techniques to correct the observed, noisy data. It can be shown that the new framework provides a rigorous, provable privacy guarantee to individual data contributors without compromising the output analysis results. Extensive empirical studies confirm that it will enable privacy-preserving data analytical tasks in a broad range of application domains.

Table of Contents

1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Research Contributions . . . . . . . . . . . . . . . . . . . . . . 4
1.2.1 Real-Time Aggregate Monitoring with Differential Privacy (Chapter 3) . . . . . . . . . . . . . . . . . . . . . 4
1.2.2 Estimating Individual Privacy Risk with Domain Statistics (Chapter 4) . . . . . . . . . . . . . . . . . . . . . . 5
1.2.3 Modeling Spatio-Temporal Correlation of Multi-Variate Aggregate Time Series (Chapter 5) . . . . . . . . . . . 6
2 Related Works 8
2.1 Differential Privacy . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Time Series Analysis and Privacy . . . . . . . . . . . . . . . . 9
2.3 Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4 Anomaly Detection . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Real-Time Aggregate Monitoring with Differential Privacy 13
3.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.1.1 Problem Statement . . . . . . . . . . . . . . . . . . . . 13
3.1.2 Differential Privacy . . . . . . . . . . . . . . . . . . . . 14
3.1.3 Existing Solutions . . . . . . . . . . . . . . . . . . . . . 16
3.2 FAST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2.1 Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2.2 Sampling . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . 33
3.3.1 Effects of Filtering . . . . . . . . . . . . . . . . . . . . 35
3.3.2 Effects of Sampling . . . . . . . . . . . . . . . . . . . . 38
3.3.3 Utility vs. Privacy . . . . . . . . . . . . . . . . . . . . 41
3.3.4 Detection and Correlation . . . . . . . . . . . . . . . . 43
4 Analyzing Individual Privacy Risk for Releasing Long Aggregate Series 45
4.1 Differentially Private Anomaly Detection . . . . . . . . . . . . 45
4.1.1 Privacy Risk Definition . . . . . . . . . . . . . . . . . . 45
4.1.2 Epidemic Outbreak Detection . . . . . . . . . . . . . . 48
4.1.3 Sensitivity Analysis. . . . . . . . . . . . . . . . . . . . 49
4.1.4 Laplace Perturbation . . . . . . . . . . . . . . . . . . . 50
4.1.5 Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . 52
4.2.1 Accuracy of Released Data . . . . . . . . . . . . . . . . 54
4.2.2 Outbreak Detection Evaluation . . . . . . . . . . . . . 56
5 Modeling Spatio-Temporal Correlation of Multi-Variate Aggregate Time Series 59
5.1 Differentially Private Web Monitoring . . . . . . . . . . . . . . 59
5.1.1 Problem Statement . . . . . . . . . . . . . . . . . . . . 59
5.2 Proposed Solution . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.2.1 Univariate Time Series Approach . . . . . . . . . . . . 63
5.2.2 Multivariate Time Series Approach . . . . . . . . . . . 65
5.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . 68
5.3.1 Simulating Dynamic Browsing Sessions . . . . . . . . . 69
5.3.2 Learning Models . . . . . . . . . . . . . . . . . . . . . 71
5.3.3 Utility Evaluation . . . . . . . . . . . . . . . . . . . . . 72
5.3.4 Runtime . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.4 Differentially Private Traffic Monitoring . . . . . . . . . . . . 79
5.4.1 Problem Statement . . . . . . . . . . . . . . . . . . . . 79
5.4.2 Baseline Solution - LPA Algorithm . . . . . . . . . . . 80
5.5 Proposed Solutions . . . . . . . . . . . . . . . . . . . . . . . . 81
5.5.1 Temporal Estimation . . . . . . . . . . . . . . . . . . . 82
5.5.2 Spatial Estimation . . . . . . . . . . . . . . . . . . . . 83
5.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . 86
5.6.1 Parameter Impacts . . . . . . . . . . . . . . . . . . . . 89
5.6.2 Utility Performance . . . . . . . . . . . . . . . . . . . . 91
5.6.3 Runtime Performance . . . . . . . . . . . . . . . . . . . 93
6 Conclusion and Future Work 95
6.1 Summary of Dissertation . . . . . . . . . . . . . . . . . . . . . 96
6.2 Recommendations for Future Work . . . . . . . . . . . . . . . 99
Appendix 101
Bibliography 106

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Degree
Submission
Language
  • English
Research Field
Mot-clé
Committee Chair / Thesis Advisor
Committee Members
Dernière modification

Primary PDF

Supplemental Files