Honolulu, HI, USA
June 24, 2012 to June 29, 2012
ISBN: 978-1-4673-2892-0
pp: 139-146
ABSTRACT
Recently, the number and size of RDF data collections have increased rapidly, making scalable processing techniques crucial. The MapReduce model has become a de facto standard for large-scale data processing on clusters of machines in the cloud. RDF query processing generally creates join-intensive workloads, resulting in lengthy MapReduce workflows with expensive I/O, data transfer, and sorting costs. However, the MapReduce computation model offers only limited support for the static optimization techniques used in relational databases (e.g., indexing and cost-based optimization). Consequently, dynamic optimization techniques for such join-intensive tasks on MapReduce need to be investigated. In previous work, we proposed a Nested Triple Group data model and Algebra (NTGA) for efficient graph pattern query processing in the cloud. Here, we extend that work with a scan-sharing technique that optimizes the processing of graph patterns with repeated properties. Specifically, our scan-sharing technique eliminates the need for repeated scanning of input relations when properties are used repeatedly in graph patterns. We discuss the formal foundation underlying this scan-sharing technique and present an implementation strategy that has been integrated into the Apache Pig framework. We also present a comprehensive evaluation demonstrating the performance benefits of our NTGA plus scan-sharing approach.
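The scan-sharing idea can be illustrated with a small sketch. This is not the paper's NTGA operators or its Apache Pig integration; it is an in-memory toy, assuming a two-hop pattern (?a p ?b . ?b p ?c) in which the same property p appears twice. The names ScanCounter, two_hop_naive, and two_hop_shared, as well as the sample triples, are hypothetical and introduced only for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, property, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "dave"),
    ("alice", "age", "30"),
]


class ScanCounter:
    """Wraps the input relation and counts full passes over it."""

    def __init__(self, triples):
        self.triples = triples
        self.scans = 0

    def scan(self):
        self.scans += 1
        return iter(self.triples)


def two_hop_naive(source, prop):
    """One scan per triple pattern: ?a prop ?b and ?b prop ?c."""
    left = [(s, o) for s, p, o in source.scan() if p == prop]
    right = [(s, o) for s, p, o in source.scan() if p == prop]
    return sorted((a, b, c) for a, b in left for b2, c in right if b == b2)


def two_hop_shared(source, prop):
    """Single scan; the matched triples serve both triple patterns."""
    matched = []
    by_subject = defaultdict(list)
    for s, p, o in source.scan():
        if p == prop:
            matched.append((s, o))
            by_subject[s].append(o)
    return sorted((a, b, c) for a, b in matched for c in by_subject.get(b, []))
```

Both functions return the same bindings, but the shared version reads the input once instead of once per occurrence of the repeated property. On MapReduce, where each scan means reading the input relation from distributed storage, that saving is what the abstract refers to.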
INDEX TERMS
Resource description framework, Data models, Cloning, Algebra, Optimization, Pattern matching, Context, Optimization Techniques, cloud computing, MapReduce, SPARQL, RDF Graph Pattern Matching
CITATION
HyeongSik Kim, Padmashree Ravindra, Kemafor Anyanwu, "Scan-Sharing for Optimizing RDF Graph Pattern Matching on MapReduce", 2012 IEEE Fifth International Conference on Cloud Computing (CLOUD), 2012, pp. 139-146, doi:10.1109/CLOUD.2012.14