Towards Scalable and Privacy-Preserving Integration of Distributed Heterogeneous Data Pubblico
Jurczyk, Pawel Czeslaw (2010)
Abstract
With the trend of cloud computing, data and computing are moved
away from desktop and are instead provided as a service from the
cloud. Data-as-a-service enables access to a wealth of data across
distributed and heterogeneous data sources in the cloud. It remains
a challenge, however, to ensure the privacy, interoperability, and
scalability for such services.
We designed and developed DObjects, a general-purpose P2P-based
query and data operations infrastructure that can be deployed in
the cloud and provides access to heterogeneous data sources. The
system builds on top of a distributed mediator-wrapper architecture
where individual system nodes serve as mediators and/or wrappers
and interact with each other in a P2P fashion. As an analogy, the
system nodes can be considered as droplets, small elements that
provide similar functionality in the cloud. Just as thousands or
millions of droplets form a single drop in nature, in cloud
computing, groups of droplets that provide similar functionality
can form a micro-cloud. Micro-clouds are an integral part of the
whole cloud computing system.
The dissertation also discusses the novel dynamic query execution
engine within the data query infrastructure that dynamically adapts
to network and node conditions. The query processing is capable of
fully benefiting from all the distributed resources to minimize the
query response time and maximize system throughput. In addition to
leveraging the traditional distributed query optimization
techniques, the (sub)queries are deployed on droplets in a dynamic
and iterative manner in order to guarantee the best reaction to
network and resource dynamics.
Finally, the dissertation presents an extension to the basic
DObjects model that enables access to private data that is
distributed and needs anonymization. The extension enables droplets
to form virtual groups to addresses two privacy issues for the
sensitive data: privacy of data subjects and confidentiality of
data providers. The dissertation discusses decentralized protocols
that enable data sharing for horizontally partitioned databases
given these constraints. Concretely, given a query spanning
multiple databases, the query results do not contain individually
identifiable information. In addition, institutions do not reveal
their databases to each other apart from the query results.
Table of Contents
1 Introduction 1
1.1 Motivation ................................ 1
1.1.1 ApplicationScenarios ...................... 1
1.1.2 ResearchChallenges ....................... 3
1.2 Contributions............................... 5
1.3 Organization ............................... 7
2 Related Work 8
2.1 Introduction................................ 8
2.2 DistributedDatabasesandQueryProcessing . . . . . . . . . . . . . 8
2.3 DataStreamsandContinuousQueries ................. 11
2.4 DistributedSystemsArchitectures ................... 11
2.5 LoadBalancing.............................. 13
2.6 PrivacyPreservingDataPublishing .................. 13
2.7 SecureMulti-partyComputation .................... 14
3 DObjects Framework 16
3.1 Introduction................................ 16
3.2 DObjectsOverview............................ 19
3.2.1 DataModel............................ 19
3.2.2 DataOperationsandQueryLanguage . . . . . . . . . . . . .
23
4 Dynamic Query Processing in DObjects 29
4.1 Introduction................................ 29
4.2 QueryExecutionandOptimization................... 30
4.2.1 Execution and Optimization of Operators . . . . . . . . .
. . 33
4.2.2 Query Migration . . . . . . . . . . . . . . . . . . . . . . .
. . 34
4.2.3 Cost Metric Components . . . . . . . . . . . . . . . . . . .
. 39
4.3 Experimental Evaluation . . . . . . . . . . . . . . . . . . . .
. . . . . 42
4.3.1 Simulation Results . . . . . . . . . . . . . . . . . . . . .
. . . 42
4.3.2 Testing of a Real Implementation - Small Scale . . . . . . .
. 51
4.3.3 Testing of a Real Implementation - PlanetLab . . . . . . . .
. 53
4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . 54
5 Access to Private Data in DObjects 56
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . 56
5.2 Private Data in DObjects . . . . . . . . . . . . . . . . . . .
. . . . . 59
5.2.1 Wrapper and Distributed Anonymization . . . . . . . . . . .
61
5.2.2 Mediator and Secure Query Processing . . . . . . . . . . . .
. 63
6 Distributed Anonymization 64
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . 64
6.2 Privacy Model . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . 66
6.3 Distributed Anonymization Protocol . . . . . . . . . . . . . .
. . . . 71
6.3.1 Selection of Anonymization Algorithm . . . . . . . . . . .
. . 72
6.3.2 Distributed Anonymization Protocol . . . . . . . . . . . . .
. 73
6.3.3 Security Analysis . . . . . . . . . . . . . . . . . . . . . .
. . . 78
6.3.4 Protocol Overhead . . . . . . . . . . . . . . . . . . . . . .
. . 83
6.4 Experimental Evaluation . . . . . . . . . . . . . . . . . . . .
. . . . . 84
6.4.1 Distributed Anonymization vs. Centralized and
Independent
Anonymization . . . . . . . . . . . . . . . . . . . . . . . . . .
85
6.4.2 Achieving Anonymity for Data Providers . . . . . . . . . . .
86
6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . 89
7 Secure Union 91
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . 91
7.2 Existing Protocols for Secure Union . . . . . . . . . . . . . .
. . . . 93
7.2.1 Secure Union Problem Denition . . . . . . . . . . . . . . . .
93
7.2.2 Circuit-Based Secure Union . . . . . . . . . . . . . . . . .
. . 93
7.2.3 Commutative Cryptography-Based Secure Union . . . . . . .
94
7.2.4 Probabilistic Secure Union . . . . . . . . . . . . . . . . .
. . . 96
7.2.5 Anonymous Communication-Based Secure Union . . . . . . .
97
7.3 Random Shares-Based Union Protocol . . . . . . . . . . . . . .
. . . 99
7.3.1 Secure Sum . . . . . . . . . . . . . . . . . . . . . . . . .
. . . 99
7.3.2 Random Shares-Based Secure Union . . . . . . . . . . . . . .
99
7.4 Analysis of Random Shares Secure Union . . . . . . . . . . . .
. . . 103
7.4.1 Security Metric and Attack Models . . . . . . . . . . . . . .
. 103
7.4.2 Security Analysis . . . . . . . . . . . . . . . . . . . . . .
. . . 105
7.4.3 Comparison with Probabilistic Secure Union . . . . . . . . .
115
7.4.4 Generating Random Items . . . . . . . . . . . . . . . . .
. . . 115
7.4.5 Cost Analysis . . . . . . . . . . . . . . . . . . . . . . . .
. . . 116
7.5 Experimental Evaluation . . . . . . . . . . . . . . . . . . . .
. . . . . 116
7.5.1 Security of Random Shares Union Protocol . . . . . . . . . .
116
7.5.2 Cost of Secure Union Protocols . . . . . . . . . . . . . . .
. . 120
7.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . 122
8 Conclusions and Future Work 123
8.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . 123
8.2 System Limitations . . . . . . . . . . . . . . . . . . . . . .
. . . . . . 125
8.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . 126
About this Dissertation
School | |
---|---|
Department | |
Degree | |
Submission | |
Language |
|
Research Field | |
Parola chiave | |
Committee Chair / Thesis Advisor | |
Committee Members |
Primary PDF
Thumbnail | Title | Date Uploaded | Actions |
---|---|---|---|
Towards Scalable and Privacy-Preserving Integration of Distributed Heterogeneous Data () | 2018-08-28 12:14:23 -0400 |
|
Supplemental Files
Thumbnail | Title | Date Uploaded | Actions |
---|