Discovering Frequent Graph Patterns Using Disjoint Paths
November 2006 (vol. 18 no. 11)
pp. 1441-1456
Ehud Gudes, IEEE Computer Society
Solomon Eyal Shimony, IEEE Computer Society
Natalia Vanetik
Whereas data mining in structured data focuses on frequent data values, semistructured and graph data mining is concerned with frequent labels and common topologies. Here, the structure of the data is just as important as its content. We study the problem of discovering typical patterns in graph data, a task made difficult by the complexity of the required subtasks, especially subgraph isomorphism. In this paper, we propose a new Apriori-based algorithm for mining graph data, in which the basic building blocks are relatively large, disjoint paths. The algorithm is proven to be sound and complete. Empirical evidence shows practical advantages of our approach for certain categories of graphs.
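To make the idea of disjoint paths as building blocks more concrete, here is a minimal sketch that splits an undirected graph's edges into edge-disjoint simple paths. It assumes that "disjoint" means edge-disjoint. The function name `edge_disjoint_paths` and its greedy strategy are hypothetical illustrations, not the authors' decomposition or mining algorithm. The result is always a valid decomposition, but it is not guaranteed to use the fewest possible paths.

```python
from collections import defaultdict


def edge_disjoint_paths(edges):
    """Greedily split an undirected graph's edges into edge-disjoint simple paths.

    Hypothetical illustration only (not the paper's method): start at a vertex
    with an odd number of unused edges when one exists (such vertices must be
    path endpoints), extend the path along unused edges without revisiting a
    vertex, and repeat until every edge has been used exactly once.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    paths = []
    while any(adj.values()):  # some edge is still unused
        candidates = [x for x in adj if adj[x]]
        odd = [x for x in candidates if len(adj[x]) % 2 == 1]
        start = min(odd) if odd else min(candidates)

        path, visited, cur = [start], {start}, start
        while True:
            # Take the smallest unused neighbor that keeps the path simple.
            nxt = next((w for w in sorted(adj[cur]) if w not in visited), None)
            if nxt is None:
                break
            adj[cur].discard(nxt)  # mark edge cur-nxt as used
            adj[nxt].discard(cur)
            path.append(nxt)
            visited.add(nxt)
            cur = nxt
        paths.append(path)
    return paths


# A triangle 1-2-3 with a pendant edge 3-4.
print(edge_disjoint_paths([(1, 2), (2, 3), (3, 1), (3, 4)]))
# prints [[3, 1, 2], [2, 3, 4]]
```

Each loop iteration consumes at least one edge, so the procedure always terminates. In this example no single simple path can cover all four edges, so two paths is the best possible result.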


Index Terms:
Database applications, data mining, mining methods and algorithms, Web mining, graph mining.
Citation:
Ehud Gudes, Solomon Eyal Shimony, Natalia Vanetik, "Discovering Frequent Graph Patterns Using Disjoint Paths," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 11, pp. 1441-1456, Nov. 2006, doi:10.1109/TKDE.2006.173