Towards Scalable and Privacy-Preserving Integration of Distributed Heterogeneous Data Open Access

Jurczyk, Pawel Czeslaw (2010)

Permanent URL: https://etd.library.emory.edu/concern/etds/1z40kt13b?locale=en
Published

Abstract

With the trend of cloud computing, data and computing are moved away from desktop and are instead provided as a service from the cloud. Data-as-a-service enables access to a wealth of data across distributed and heterogeneous data sources in the cloud. It remains a challenge, however, to ensure the privacy, interoperability, and scalability for such services.
We designed and developed DObjects, a general-purpose P2P-based query and data operations infrastructure that can be deployed in the cloud and provides access to heterogeneous data sources. The system builds on top of a distributed mediator-wrapper architecture where individual system nodes serve as mediators and/or wrappers and interact with each other in a P2P fashion. As an analogy, the system nodes can be considered as droplets, small elements that provide similar functionality in the cloud. Just as thousands or millions of droplets form a single drop in nature, in cloud computing, groups of droplets that provide similar functionality can form a micro-cloud. Micro-clouds are an integral part of the whole cloud computing system.
The dissertation also discusses the novel dynamic query execution engine within the data query infrastructure that dynamically adapts to network and node conditions. The query processing is capable of fully benefiting from all the distributed resources to minimize the query response time and maximize system throughput. In addition to leveraging the traditional distributed query optimization techniques, the (sub)queries are deployed on droplets in a dynamic and iterative manner in order to guarantee the best reaction to network and resource dynamics.
Finally, the dissertation presents an extension to the basic DObjects model that enables access to private data that is distributed and needs anonymization. The extension enables droplets to form virtual groups to addresses two privacy issues for the sensitive data: privacy of data subjects and confidentiality of data providers. The dissertation discusses decentralized protocols that enable data sharing for horizontally partitioned databases given these constraints. Concretely, given a query spanning multiple databases, the query results do not contain individually identifiable information. In addition, institutions do not reveal their databases to each other apart from the query results.

Table of Contents

1 Introduction 1
1.1 Motivation ................................ 1

1.1.1 ApplicationScenarios ...................... 1

1.1.2 ResearchChallenges ....................... 3
1.2 Contributions............................... 5

1.3 Organization ............................... 7
2 Related Work 8
2.1 Introduction................................ 8

2.2 DistributedDatabasesandQueryProcessing . . . . . . . . . . . . . 8

2.3 DataStreamsandContinuousQueries ................. 11

2.4 DistributedSystemsArchitectures ................... 11

2.5 LoadBalancing.............................. 13

2.6 PrivacyPreservingDataPublishing .................. 13

2.7 SecureMulti-partyComputation .................... 14
3 DObjects Framework 16
3.1 Introduction................................ 16

3.2 DObjectsOverview............................ 19

3.2.1 DataModel............................ 19

3.2.2 DataOperationsandQueryLanguage . . . . . . . . . . . . . 23
4 Dynamic Query Processing in DObjects 29
4.1 Introduction................................ 29

4.2 QueryExecutionandOptimization................... 30

4.2.1 Execution and Optimization of Operators . . . . . . . . . . . 33
4.2.2 Query Migration . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2.3 Cost Metric Components . . . . . . . . . . . . . . . . . . . . 39
4.3 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.3.1 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . 42
4.3.2 Testing of a Real Implementation - Small Scale . . . . . . . . 51
4.3.3 Testing of a Real Implementation - PlanetLab . . . . . . . . . 53
4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5 Access to Private Data in DObjects 56
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.2 Private Data in DObjects . . . . . . . . . . . . . . . . . . . . . . . . 59
5.2.1 Wrapper and Distributed Anonymization . . . . . . . . . . . 61
5.2.2 Mediator and Secure Query Processing . . . . . . . . . . . . . 63
6 Distributed Anonymization 64
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
6.2 Privacy Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.3 Distributed Anonymization Protocol . . . . . . . . . . . . . . . . . . 71

6.3.1 Selection of Anonymization Algorithm . . . . . . . . . . . . . 72
6.3.2 Distributed Anonymization Protocol . . . . . . . . . . . . . . 73
6.3.3 Security Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.3.4 Protocol Overhead . . . . . . . . . . . . . . . . . . . . . . . . 83
6.4 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.4.1 Distributed Anonymization vs. Centralized and Independent
Anonymization . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.4.2 Achieving Anonymity for Data Providers . . . . . . . . . . . 86
6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7 Secure Union 91
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.2 Existing Protocols for Secure Union . . . . . . . . . . . . . . . . . . 93
7.2.1 Secure Union Problem Denition . . . . . . . . . . . . . . . . 93
7.2.2 Circuit-Based Secure Union . . . . . . . . . . . . . . . . . . . 93
7.2.3 Commutative Cryptography-Based Secure Union . . . . . . . 94
7.2.4 Probabilistic Secure Union . . . . . . . . . . . . . . . . . . . . 96
7.2.5 Anonymous Communication-Based Secure Union . . . . . . . 97
7.3 Random Shares-Based Union Protocol . . . . . . . . . . . . . . . . . 99
7.3.1 Secure Sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.3.2 Random Shares-Based Secure Union . . . . . . . . . . . . . . 99
7.4 Analysis of Random Shares Secure Union . . . . . . . . . . . . . . . 103
7.4.1 Security Metric and Attack Models . . . . . . . . . . . . . . . 103
7.4.2 Security Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 105
7.4.3 Comparison with Probabilistic Secure Union . . . . . . . . . 115

7.4.4 Generating Random Items . . . . . . . . . . . . . . . . . . . . 115
7.4.5 Cost Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.5 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.5.1 Security of Random Shares Union Protocol . . . . . . . . . . 116
7.5.2 Cost of Secure Union Protocols . . . . . . . . . . . . . . . . . 120
7.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
8 Conclusions and Future Work 123
8.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
8.2 System Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
8.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Degree
Submission
Language
  • English
Research Field
Keyword
Committee Chair / Thesis Advisor
Committee Members
Last modified

Primary PDF

Supplemental Files