Issue No.09 - September (2011 vol.23)

pp: 1388-1405

Jianzhong Li, Harbin Institute of Technology, Harbin

Yong Liu, Harbin Institute of Technology, Harbin

Hong Gao, Harbin Institute of Technology, Harbin

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.249

ABSTRACT

We investigate the problem of summarizing frequent subgraphs by a smaller set of representative patterns. We show that some special graph patterns, called $\delta$-jump patterns in this paper, must be representative patterns. Based on this fact, we devise two algorithms, RP-FP and RP-GD, to mine a representative set that summarizes frequent subgraphs. RP-FP derives a representative set from frequent closed subgraphs, whereas RP-GD mines a representative set from graph databases directly. Three novel heuristic strategies, Last-Succeed-First-Check, Reverse-Path-Trace, and Nephew-Representative-Based-Cover, are proposed to further improve the efficiency of RP-GD. RP-FP provides a tight ratio bound but incurs a heavy computational cost. RP-GD cannot provide a ratio bound guarantee but is more efficient than RP-FP. We also exploit the similarity between sibling branches in the graph pattern space to devise another, much more efficient algorithm, RP-Leap, for mining a representative set that approximately summarizes frequent subgraphs. Our extensive experiments on both real and synthetic data sets verify the summarization quality and efficiency of our algorithms. To further demonstrate the interestingness of representative patterns, we study an application of representative patterns to classification. We demonstrate that the classification accuracy achieved by a representative pattern-based model is no less than that achieved by a closed graph pattern-based model.

INDEX TERMS

Data mining, graph mining, pattern summarization.

CITATION

Jianzhong Li, Yong Liu, Hong Gao, "Efficient Algorithms for Summarizing Graph Patterns",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 23, no. 9, pp. 1388-1405, September 2011, doi:10.1109/TKDE.2010.249