Issue No.07 - July (2011 vol.23)
pp: 1006-1021
Jiefeng Cheng , The University of Hong Kong, Hong Kong
Jeffrey Xu Yu , The Chinese University of Hong Kong, Hong Kong
Philip S. Yu , University of Illinois at Chicago, Chicago
ABSTRACT
The rapid growth of the Internet and new scientific and technological advances have produced many applications that model data as graphs, because graphs are expressive enough to model complicated structures. The dominance of graphs in real-world applications demands new graph processing techniques that access large data graphs effectively and efficiently. In this paper, we study a graph pattern matching problem: finding all patterns in a large data graph that match a user-given graph pattern. We propose new two-step R-join (reachability join) algorithms, consisting of a filter step (R-semijoin) and a fetch step (R-join), that use a new cluster-based join index with graph codes in a relational database context. We also propose two approaches to further optimize sequences of R-joins and R-semijoins. The first selects an R-join order and then enhances it with R-semijoins; the second interleaves R-joins with R-semijoins. We conducted extensive performance studies, which confirm the efficiency of the proposed approaches.
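To make the filter-then-fetch idea concrete, the following is a minimal sketch and not the algorithm or index of the paper. It assumes a deliberately naive 2-hop-style labeling in which a node u reaches a node v exactly when `l_out[u]` and `l_in[v]` share a center; the paper's cluster-based join index and graph codes are not reproduced. All function names, the toy graph, and the node names are hypothetical.

```python
from collections import defaultdict


def reachable_from(adj, start):
    """Return every node reachable from start (start included) by DFS."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def build_labels(nodes, edges):
    """Naive labeling (an assumption for illustration, not a compact 2-hop cover):
    l_out[u] holds all nodes u reaches, l_in[v] holds only v itself.
    Then u reaches v  <=>  l_out[u] & l_in[v] is non-empty."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    l_out = {u: reachable_from(adj, u) for u in nodes}
    l_in = {v: {v} for v in nodes}
    return l_out, l_in


def r_semijoin(xs, ys, l_out, l_in):
    """Filter step: keep each x in xs that reaches at least one y in ys."""
    centers_of_ys = set().union(*(l_in[y] for y in ys))
    return [x for x in xs if l_out[x] & centers_of_ys]


def r_join(xs, ys, l_out, l_in):
    """Fetch step: return all pairs (x, y) such that x reaches y."""
    by_center = defaultdict(set)  # center -> the y nodes labeled with it
    for y in ys:
        for w in l_in[y]:
            by_center[w].add(y)
    pairs = set()
    for x in xs:
        for w in l_out[x]:
            for y in by_center.get(w, ()):
                pairs.add((x, y))
    return sorted(pairs)


# Toy data graph: a1 -> c1 -> b1, a2 -> c2, and b2 is isolated.
nodes = ["a1", "a2", "b1", "b2", "c1", "c2"]
edges = [("a1", "c1"), ("c1", "b1"), ("a2", "c2")]
l_out, l_in = build_labels(nodes, edges)

a_nodes, b_nodes = ["a1", "a2"], ["b1", "b2"]
survivors = r_semijoin(a_nodes, b_nodes, l_out, l_in)  # filter
matches = r_join(survivors, b_nodes, l_out, l_in)      # fetch
print(survivors, matches)
```

In this sketch the filter step discards nodes that cannot contribute to any match before the fetch step enumerates pairs, which is the general motivation for placing a semijoin ahead of a join.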
INDEX TERMS
Graph matching, 2-hop labeling, reachability joins, join/semijoin processing.
CITATION
Jiefeng Cheng, Jeffrey Xu Yu, Philip S. Yu, "Graph Pattern Matching: A Join/Semijoin Approach", IEEE Transactions on Knowledge & Data Engineering, vol.23, no. 7, pp. 1006-1021, July 2011, doi:10.1109/TKDE.2010.169