Cost Analysis of Joins in RDF Query Processing Using the TripleT Index
Li, Kanwei (2009)
Abstract
The Semantic Web movement has made RDF and its query languages increasingly popular. Good query performance is essential for retrieving information quickly from RDF datasets that keep growing in size. We use the TripleT indexing scheme for RDF data as a framework to examine the cost of join operations over RDF. We analyze strategies for efficient join processing across a variety of query patterns. For queries that involve multiple join conditions, we introduce a model that predicts the number of I/Os required, in order to choose the best ordering of the join conditions. Experimental results on three real RDF datasets validate the model.
Table of Contents
1 Introduction
  1.1 Research Objective
  1.2 Prior Work
2 RDF and SPARQL
  2.1 Background on RDF
    2.1.1 Representation Formats
  2.2 RDF Datasets
  2.3 Datasets Used in the Thesis
    2.3.1 Dataset Statistics
    2.3.2 Dataset Discussion
  2.4 Background on SPARQL
3 Indexing Techniques
  3.1 B+ Trees
  3.2 RDF Indexing Schemes
    3.2.1 MAP and HexTree
    3.2.2 TripleT
4 Join Algorithms
  4.1 Nested-loop join
  4.2 Hash join
  4.3 Sort-Merge join
  4.4 Measuring Join Performance
    4.4.1 Join CPU Performance on Synthetic Data
    4.4.2 Join I/O and CPU Performance on Datasets
5 Query Optimization
  5.1 Join Ordering
  5.2 Processing SPARQL Queries with TripleT
  5.3 Discussion
6 Models and Experiments for All-Variable SAPs
  6.1 DBpedia Results
  6.2 Uniprot Results
  6.3 SP2Bench Results
  6.4 Variant Query Forms
  6.5 Discussion
7 Conclusion
About this Dissertation
School | |
---|---|
Department | |
Degree | |
Submission | |
Language | |
Research Field | |
Keywords | |
Committee Chair / Thesis Advisor | |
Committee Members | |
Primary PDF
Thumbnail | Title | Date Uploaded | Actions |
---|---|---|---|
| | Cost Analysis of Joins in RDF Query Processing Using the TripleT Index | 2018-08-28 12:55:02 -0400 | |