Graph Pattern Matching: A Join/Semijoin Approach
July 2011 (vol. 23 no. 7)
pp. 1006-1021
Jiefeng Cheng, The University of Hong Kong, Hong Kong
Jeffrey Xu Yu, The Chinese University of Hong Kong, Hong Kong
Philip S. Yu, University of Illinois at Chicago, Chicago
The rapid growth of the Internet and new scientific and technological advances have produced many applications that model data as graphs, because graphs are expressive enough to model complicated structures. The dominance of graphs in real-world applications demands new graph processing techniques to access large data graphs effectively and efficiently. In this paper, we study the graph pattern matching problem: finding all patterns in a large data graph that match a user-given graph pattern. We propose new two-step R-join (reachability join) algorithms, consisting of a filter step (R-semijoin) and a fetch step (R-join), that use a new cluster-based join index with graph codes in a relational database context. We also propose two approaches to optimize sequences of R-joins/R-semijoins. The first is based on R-join order selection followed by R-semijoin enhancement; the second interleaves R-joins with R-semijoins. We conducted extensive performance studies, which confirm the efficiency of the proposed approaches.
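The filter-then-fetch idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: a naive per-node reachability search stands in for the cluster-based join index and graph codes, and the function names (`reachable_sets`, `r_semijoin`, `r_join`) and the toy graph are assumptions made only for illustration.

```python
from collections import defaultdict


def reachable_sets(nodes, edges):
    """Map each node to the set of nodes it can reach (itself included).

    A naive depth-first search per node; a real system would use a
    compact reachability labeling instead of materializing these sets.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
    reach = {}
    for start in nodes:
        seen = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        reach[start] = seen
    return reach


def r_semijoin(left, right, reach):
    """Filter step: keep nodes in `left` that reach at least one node in `right`."""
    right_set = set(right)
    return [u for u in left if reach[u] & right_set]


def r_join(left, right, reach):
    """Fetch step: all pairs (u, v), u in `left`, v in `right`, where u reaches v."""
    return [(u, v) for u in left for v in right if v in reach[u]]


# Toy data graph (hypothetical): each node carries a label.
labels = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
edges = [("a1", "c1"), ("c1", "b1"), ("b2", "a2")]

reach = reachable_sets(labels, edges)
a_nodes = sorted(n for n, lab in labels.items() if lab == "A")
b_nodes = sorted(n for n, lab in labels.items() if lab == "B")

# Pattern: an A-labeled node that reaches a B-labeled node.
candidates = r_semijoin(a_nodes, b_nodes, reach)  # filter
matches = r_join(candidates, b_nodes, reach)      # fetch
print(candidates, matches)
```

Running the filter first shrinks the left input before pairs are produced, which is the motivation for placing a semijoin ahead of a join; how much it helps depends on how selective the reachability condition is.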

Index Terms:
Graph matching, 2-hop labeling, reachability joins, join/semijoin processing.
Citation:
Jiefeng Cheng, Jeffrey Xu Yu, Philip S. Yu, "Graph Pattern Matching: A Join/Semijoin Approach," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1006-1021, July 2011, doi:10.1109/TKDE.2010.169