Cost Analysis of Joins in RDF Query Processing Using the TripleT Index

Open Access

Li, Kanwei (2009)

Permanent URL:


The Semantic Web movement has led to a growing popularity of RDF and its query languages. Good query performance is essential for retrieving information quickly from RDF datasets that are ever-increasing in size. We use the TripleT indexing scheme for RDF data as a framework to examine the cost of join operations for RDF. We analyze strategies for efficient join processing for a variety of query patterns. For queries that involve multiple join conditions, we introduce a model that predicts the number of I/Os required, so that the join conditions can be placed in the best order. Experimental results on three real RDF datasets validate the model.

Table of Contents

1 Introduction
1.1 Research Objective
1.2 Prior Work
2.1 Background on RDF
2.1.1 Representation Formats
2.2 RDF Datasets
2.3 Datasets Used in the Thesis
2.3.1 Dataset Statistics
2.3.2 Dataset Discussion
2.4 Background on SPARQL
3 Indexing Techniques
3.1 B+ Trees
3.2 RDF Indexing Schemes
3.2.1 MAP and HexTree
3.2.2 TripleT
4 Join Algorithms
4.1 Nested-loop join
4.2 Hash join
4.3 Sort-Merge join
4.4 Measuring Join Performance
4.4.1 Join CPU Performance on Synthetic Data
4.4.2 Join I/O and CPU Performance on Datasets
5 Query Optimization
5.1 Join Ordering
5.2 Processing SPARQL Queries with TripleT
5.3 Discussion
6 Models and Experiments for All-Variable SAPs
6.1 DBpedia Results
6.2 Uniprot Results
6.3 SP2Bench Results
6.4 Variant Query Forms
6.5 Discussion

7 Conclusion

About this thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
  • English
