2013 IEEE 5th International Conference on Cloud Computing Technology and Science (2013)
Bristol, United Kingdom
Dec. 2, 2013 to Dec. 5, 2013
In recent times, it has been widely recognized that, due to their inherent scalability, frameworks based on MapReduce are indispensable for so-called "Big Data" applications. However, for Semantic Web applications using SPARQL, there is still a demand for sophisticated MapReduce join techniques for processing basic graph patterns (BGPs), which are at the core of SPARQL. Renowned for their stable and efficient performance, sort-merge joins have become widely used in DBMSs. In this paper, we demonstrate the adaptation of merge joins for SPARQL BGP processing with MapReduce. Our technique supports both n-way joins and sequences of join operations by applying merge joins within the map phase of MapReduce, while the reduce phase is used only to fulfill the preconditions of a subsequent join iteration. Our experiments with the LUBM benchmark show an average performance benefit of between 15% and 48% compared to other MapReduce-based approaches, while at the same time scaling linearly with the RDF dataset size.
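The abstract gives no code or data layout, so the following is only a minimal in-memory Python sketch of the general technique it names, not the authors' implementation. It shows an n-way merge join of triple-pattern matches on one shared join variable, as might run inside a single map task over inputs already sorted on that variable. A re-keying step stands in for the reduce phase, which re-sorts results so that the next join in a sequence again sees sorted input. No Hadoop API is used. The names `n_way_merge_join`, `compatible_merge`, `rekey_sorted` and the toy data are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Binding = Dict[str, str]        # SPARQL variable name -> RDF term (hypothetical encoding)
Keyed = Tuple[str, Binding]     # (value of the join variable, binding)


def compatible_merge(bindings: Tuple[Binding, ...]) -> Optional[Binding]:
    """Merge solution mappings; return None if any shared variable disagrees."""
    merged: Binding = {}
    for b in bindings:
        for var, val in b.items():
            if merged.get(var, val) != val:
                return None
            merged[var] = val
    return merged


def n_way_merge_join(inputs: List[List[Keyed]]) -> List[Binding]:
    """Join n inputs on one shared variable; every input must be sorted by key."""
    pos = [0] * len(inputs)
    out: List[Binding] = []
    while all(p < len(rows) for p, rows in zip(pos, inputs)):
        heads = [rows[p][0] for p, rows in zip(pos, inputs)]
        target = max(heads)
        if all(h == target for h in heads):
            # Every input has rows with this key: collect each input's group.
            groups = []
            for i, rows in enumerate(inputs):
                start = pos[i]
                while pos[i] < len(rows) and rows[pos[i]][0] == target:
                    pos[i] += 1
                groups.append([b for _, b in rows[start:pos[i]]])
            # Emit the cross product of the groups, dropping incompatible combinations.
            for combo in product(*groups):
                merged = compatible_merge(combo)
                if merged is not None:
                    out.append(merged)
        else:
            # Advance every input whose head key is smaller than the current maximum.
            for i, rows in enumerate(inputs):
                while pos[i] < len(rows) and rows[pos[i]][0] < target:
                    pos[i] += 1
    return out


def rekey_sorted(bindings: List[Binding], var: str) -> List[Keyed]:
    """Re-establish the merge-join precondition (sorted by `var`) for a next join."""
    return sorted(((b[var], b) for b in bindings), key=lambda kb: kb[0])


if __name__ == "__main__":
    # Hypothetical toy matches for three triple patterns sharing ?x.
    tp1 = [("s1", {"x": "s1"}), ("s3", {"x": "s3"})]
    tp2 = [("s1", {"x": "s1", "c": "c1"}), ("s1", {"x": "s1", "c": "c2"}),
           ("s2", {"x": "s2", "c": "c1"})]
    tp3 = [("s1", {"x": "s1", "d": "d1"}), ("s3", {"x": "s3", "d": "d2"})]
    joined = n_way_merge_join([tp1, tp2, tp3])
    print(joined)  # [{'x': 's1', 'c': 'c1', 'd': 'd1'}, {'x': 's1', 'c': 'c2', 'd': 'd1'}]
    print(rekey_sorted(joined, "c")[0][0])  # c1
```

Each input is scanned once in key order, so apart from the combinations produced within groups of equal keys, the cost grows linearly with the total input size. That behavior is what makes sorted, aligned inputs a worthwhile precondition to restore between join iterations.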
Resource description framework, Sorting, Layout, Pattern matching, Educational institutions, Information management
M. Przyjaciel-Zablocki, A. Schaetzle, E. Skaley, T. Hornung and G. Lausen, "Map-Side Merge Joins for Scalable SPARQL BGP Processing," 2013 IEEE 5th International Conference on Cloud Computing Technology and Science (CLOUDCOM), Bristol, United Kingdom, 2013, pp. 631-638.