Issue No.09 - September (2011 vol.23)
pp: 1312-1327
James McGlothlin, University of Texas at Dallas, Richardson
Mohammad Mehedy Masud, University of Texas at Dallas, Richardson
Latifur R. Khan, University of Texas at Dallas, Richardson
Mohammad Farhan Husain, University of Texas at Dallas, Richardson
The semantic web is an emerging area that aims to augment human reasoning. Various technologies developed in this area have been standardized by the World Wide Web Consortium (W3C). One such standard is the Resource Description Framework (RDF). Semantic web technologies can be utilized to build efficient and scalable systems for cloud computing. With the explosion of semantic web technologies, large RDF graphs are commonplace. This poses significant challenges for the storage and retrieval of RDF graphs. Current frameworks do not scale to large RDF graphs and, as a result, do not address these challenges. In this paper, we describe a framework that we built using Hadoop to store and retrieve large numbers of RDF triples by exploiting the cloud computing paradigm. We describe a scheme to store RDF data in the Hadoop Distributed File System. More than one Hadoop job (the smallest unit of execution in Hadoop) may be needed to answer a query, because a single triple pattern in a query cannot simultaneously take part in more than one join in a single Hadoop job. To determine the jobs, we present a greedy algorithm that generates a query plan, with bounded worst-case cost, for answering a SPARQL Protocol and RDF Query Language (SPARQL) query. We use Hadoop's MapReduce framework to answer the queries. Our results show that we can store large RDF graphs in Hadoop clusters built with cheap commodity-class hardware. Furthermore, we show that our framework is scalable and efficient and can handle large amounts of RDF data, unlike traditional approaches.
Hadoop, RDF, SPARQL, MapReduce.
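
The constraint stated in the abstract, that a triple pattern cannot take part in more than one join within a single Hadoop job, can be illustrated with a small sketch. The code below is not the algorithm from the paper; it is a simplified greedy grouping written for illustration only. The function names (variables, plan_jobs), the tie-breaking rule, and the example predicates are all assumptions, and no cost model or actual MapReduce execution is included.

```python
from collections import defaultdict


def variables(pattern):
    """Return the set of variables (terms starting with '?') in a triple pattern."""
    return frozenset(term for term in pattern if term.startswith("?"))


def plan_jobs(patterns):
    """Greedily group joins into jobs (illustrative only, not the paper's algorithm).

    Each item is tracked as the set of variables it binds. Within one job an
    item may take part in at most one join; joined items are merged and carried
    into the next job. Returns a list of jobs, each a list of
    (join_variable, item_indices) pairs, with indices local to that job's input.
    """
    items = [variables(p) for p in patterns]
    jobs = []
    while len(items) > 1:
        # Map each variable to the items that mention it.
        by_var = defaultdict(list)
        for i, vs in enumerate(items):
            for v in vs:
                by_var[v].append(i)
        # Candidate joins: variables shared by two or more items.
        # Assumed heuristic: prefer the variable shared by the most items.
        candidates = sorted(
            ((v, idxs) for v, idxs in by_var.items() if len(idxs) > 1),
            key=lambda c: (-len(c[1]), c[0]),
        )
        if not candidates:
            break  # remaining items share no variable; cross products are not handled
        used, job = set(), []
        for v, idxs in candidates:
            free = [i for i in idxs if i not in used]
            if len(free) > 1:  # each item joins at most once per job
                job.append((v, free))
                used.update(free)
        # Merge joined items; carry untouched items over to the next job.
        merged = [frozenset().union(*(items[i] for i in idxs)) for _, idxs in job]
        items = merged + [vs for i, vs in enumerate(items) if i not in used]
        jobs.append(job)
    return jobs


# Hypothetical query: the middle pattern shares ?x with the first pattern
# and ?y with the third, so it cannot complete both joins in one job.
query = [
    ("?x", "rdf:type", "ex:Student"),
    ("?x", "ex:advisor", "?y"),
    ("?y", "rdf:type", "ex:Professor"),
]
for number, job in enumerate(plan_jobs(query), start=1):
    print(f"job {number}: {job}")
```

For this query the sketch produces two jobs: the first joins on ?x, and the second joins that intermediate result with the remaining pattern on ?y. A query whose patterns all share one variable would need only a single job.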
James McGlothlin, Mohammad Mehedy Masud, Latifur R. Khan, Mohammad Farhan Husain, "Heuristics-Based Query Processing for Large RDF Graphs Using Cloud Computing", IEEE Transactions on Knowledge & Data Engineering, vol.23, no. 9, pp. 1312-1327, September 2011, doi:10.1109/TKDE.2011.103