Issue No. 02 - March-April (2013 vol. 17)
pp. 52-61
Kemafor Anyanwu , North Carolina State University
HyeongSik Kim , North Carolina State University
Padmashree Ravindra , North Carolina State University
ABSTRACT
MapReduce platforms such as Hadoop are now the de facto standard for large-scale data processing, but they have significant limitations for the join-intensive workloads typical of Semantic Web processing. This article gives an overview of an algebraic optimization approach based on a Nested TripleGroup Data Model and Algebra (NTGA) that minimizes overall processing costs by reducing the number of MapReduce cycles. It also describes an approach for integrating NTGA-based processing of graph pattern queries into Apache Pig and compares it with execution plans that use relational-style algebra operators.
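To make the triplegroup idea more concrete, here is a minimal in-memory Python sketch, assuming triples are plain (subject, property, object) tuples. It is not the authors' NTGA implementation or their Pig integration. It simulates one MapReduce cycle that groups triples by subject and then keeps each subject's group only if it contains every property of a star-shaped subpattern. In this way, all star subpatterns are matched in a single grouping pass instead of being assembled from separate joins of individual triple patterns. The function names (`map_phase`, `shuffle`, `reduce_phase`, `join_stars`), the `STAR_PATTERNS` dictionary, and the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)
TripleGroup = List[Triple]     # triples sharing one subject


def map_phase(triples: Iterable[Triple]) -> List[Tuple[str, Tuple[str, str]]]:
    """Mapper: key every triple by its subject."""
    return [(s, (p, o)) for s, p, o in triples]


def shuffle(pairs: Iterable[Tuple[str, Tuple[str, str]]]) -> Dict[str, List[Tuple[str, str]]]:
    """Simulated shuffle: collect all (property, object) pairs per subject."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, po in pairs:
        groups[subject].append(po)
    return groups


def reduce_phase(
    groups: Dict[str, List[Tuple[str, str]]],
    stars: Dict[str, FrozenSet[str]],
) -> Dict[str, List[TripleGroup]]:
    """Reducer: for each star subpattern, keep a subject's triplegroup
    (restricted to that star's properties) if it has every required property."""
    matches: Dict[str, List[TripleGroup]] = defaultdict(list)
    for subject, pos in groups.items():
        present = {p for p, _ in pos}
        for name, required in stars.items():
            if required <= present:
                matches[name].append([(subject, p, o) for p, o in pos if p in required])
    return matches


def join_stars(left: List[TripleGroup], right: List[TripleGroup], prop: str) -> List[TripleGroup]:
    """Join left triplegroups whose `prop` object equals a right triplegroup's subject.
    On a cluster this would need a further MapReduce cycle; here it runs in memory."""
    right_by_subject = {tg[0][0]: tg for tg in right if tg}
    joined: List[TripleGroup] = []
    for tg in left:
        for _, p, o in tg:
            if p == prop and o in right_by_subject:
                joined.append(tg + right_by_subject[o])
    return joined


# Hypothetical sample data: two star subpatterns linked through "vendor".
SAMPLE_TRIPLES: List[Triple] = [
    ("prod1", "label", "Widget"),
    ("prod1", "vendor", "v1"),
    ("prod1", "price", "10"),
    ("prod2", "label", "Gadget"),   # no vendor, so it fails the product star
    ("prod2", "price", "20"),
    ("v1", "country", "US"),
    ("v1", "homepage", "http://v1.example"),
]
STAR_PATTERNS: Dict[str, FrozenSet[str]] = {
    "product": frozenset({"label", "vendor", "price"}),
    "vendor": frozenset({"country", "homepage"}),
}

stars_matched = reduce_phase(shuffle(map_phase(SAMPLE_TRIPLES)), STAR_PATTERNS)
answers = join_stars(stars_matched["product"], stars_matched["vendor"], "vendor")
print(answers)
```

In this sketch, both star subpatterns are evaluated in the same grouping pass, and only the link between them needs an explicit join. That is the intuition behind reducing the number of MapReduce cycles, although the actual NTGA operators and their cost behavior are defined in the article itself.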
INDEX TERMS
Resource description framework, optimization, query processing, data processing, query languages, database management, information technology and systems
CITATION
Kemafor Anyanwu, HyeongSik Kim, Padmashree Ravindra, "Algebraic Optimization for Processing Graph Pattern Queries in the Cloud", IEEE Internet Computing, vol. 17, no. 2, pp. 52-61, March-April 2013, doi:10.1109/MIC.2012.22