2012 IEEE Fifth International Conference on Cloud Computing (2012)
Honolulu, HI, USA
June 24, 2012 to June 29, 2012
ISSN: 2159-6182
ISBN: 978-1-4673-2892-0
pp: 139-146
Recently, the number and size of RDF data collections have increased rapidly, making scalable processing techniques crucial. The MapReduce model has become a de facto standard for large-scale data processing using a cluster of machines in the cloud. Generally, RDF query processing creates join-intensive workloads, resulting in lengthy MapReduce workflows with expensive I/O, data transfer, and sorting costs. However, the MapReduce computation model provides only limited support for the static optimization techniques used in relational databases (e.g., indexing and cost-based optimization). Consequently, dynamic optimization techniques for such join-intensive tasks on MapReduce need to be investigated. In previous work, we proposed a Nested Triple Group data model and Algebra (NTGA) for efficient graph pattern query processing in the cloud. Here, we extend this work with a scan-sharing technique that optimizes the processing of graph patterns with repeated properties. Specifically, our scan-sharing technique eliminates the need to scan input relations repeatedly when the same property is used more than once in a graph pattern. We discuss the formal foundation underlying this scan-sharing technique and present an implementation strategy that has been integrated into the Apache Pig framework. We also present a comprehensive evaluation demonstrating the performance benefits of our NTGA plus scan-sharing approach.
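The scan-sharing idea described in the abstract can be illustrated with a minimal in-memory sketch. This is not the authors' NTGA operators or their Apache Pig integration; the function names, the `(name, property)` pattern encoding, and the sample triples are all hypothetical and chosen only to show why a repeated property need not trigger a repeated scan.

```python
from collections import defaultdict

def naive_scans(triples, patterns):
    """Baseline: one full scan of the input per triple pattern."""
    matches, rows_read = {}, 0
    for name, prop in patterns:
        matches[name] = []
        for s, p, o in triples:
            rows_read += 1
            if p == prop:
                matches[name].append((s, o))
    return matches, rows_read

def shared_scan(triples, patterns):
    """Scan sharing: a single pass feeds every pattern that uses a property."""
    # Group pattern names by property so repeated properties share one lookup.
    by_prop = defaultdict(list)
    for name, prop in patterns:
        by_prop[prop].append(name)
    matches = {name: [] for name, _ in patterns}
    rows_read = 0
    for s, p, o in triples:
        rows_read += 1
        for name in by_prop.get(p, ()):
            matches[name].append((s, o))
    return matches, rows_read

# Hypothetical data: the property "knows" appears in two triple patterns.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
]
PATTERNS = [("tp1", "knows"), ("tp2", "knows"), ("tp3", "worksAt")]

naive_matches, naive_rows = naive_scans(TRIPLES, PATTERNS)
shared_matches, shared_rows = shared_scan(TRIPLES, PATTERNS)
```

Both functions return the same matches, but the baseline reads the input once per pattern while the shared version reads it once in total. In a MapReduce setting the saving would show up as fewer passes over the input relation rather than as a smaller in-memory loop count.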
Resource description framework, Data models, Cloning, Algebra, Optimization, Pattern matching, Context, Optimization Techniques, cloud computing, MapReduce, SPARQL, RDF Graph Pattern Matching

H. Kim, P. Ravindra and K. Anyanwu, "Scan-Sharing for Optimizing RDF Graph Pattern Matching on MapReduce," 2012 IEEE Fifth International Conference on Cloud Computing (CLOUD), Honolulu, HI, USA, 2012, pp. 139-146.