Cost Analysis of Joins in RDF Query Processing Using the TripleT Index

Open Access

Li, Kanwei (2009)

Permanent URL: https://etd.library.emory.edu/concern/etds/tb09j615c?locale=en

Published

Abstract

The Semantic Web movement has led to the growing popularity of RDF and its query languages. Good query performance is important for retrieving information quickly from RDF datasets that are ever-increasing in size. We use the TripleT indexing scheme for RDF data as a framework to examine the cost of join operations for RDF. We analyze strategies for efficient join processing for a variety of query patterns. For queries that involve multiple join conditions, we introduce a model that predicts the number of I/Os required, so that the join conditions can be ordered to best effect. Experimental results on three real RDF datasets validate the model.

Table of Contents

1 Introduction
  1.1 Research Objective
  1.2 Prior Work
2 RDF and SPARQL
  2.1 Background on RDF
    2.1.1 Representation Formats
  2.2 RDF Datasets
  2.3 Datasets Used in the Thesis
    2.3.1 Dataset Statistics
    2.3.2 Dataset Discussion
  2.4 Background on SPARQL
3 Indexing Techniques
  3.1 B+ Trees
  3.2 RDF Indexing Schemes
    3.2.1 MAP and HexTree
    3.2.2 TripleT
4 Join Algorithms
  4.1 Nested-loop join
  4.2 Hash join
  4.3 Sort-Merge join
  4.4 Measuring Join Performance
    4.4.1 Join CPU Performance on Synthetic Data
    4.4.2 Join I/O and CPU Performance on Datasets
5 Query Optimization
  5.1 Join Ordering
  5.2 Processing SPARQL Queries with TripleT
  5.3 Discussion
6 Models and Experiments for All-Variable SAPs
  6.1 DBpedia Results
  6.2 Uniprot Results
  6.3 SP2Bench Results
  6.4 Variant Query Forms
  6.5 Discussion
7 Conclusion

About this Dissertation

Rights statement

Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.

School	Laney Graduate School
Department	Math and Computer Science
Degree	MS
Submission	Dissertation
Language	English
Research Field	Computer Science
Keyword	query optimization; semantic web; computer science; rdf; sparql
Committee Chair / Thesis Advisor	Lu, James, Emory University
Committee Members	Xiong, Li Shirley, Emory University; Hutto, Phillip Ward, Emory University
