Mining Closed and Maximal Frequent Subtrees from Databases of Labeled Rooted Trees
February 2005 (vol. 17 no. 2)
pp. 190-202
Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, and computer networks. One important problem in mining databases of trees is finding frequently occurring subtrees. Because of combinatorial explosion, the number of frequent subtrees usually grows exponentially with subtree size, so mining all frequent subtrees becomes infeasible for large trees. In this paper, we present CMTreeMiner, a computationally efficient algorithm that discovers only closed and maximal frequent subtrees in a database of labeled rooted trees, where the rooted trees can be either ordered or unordered. The algorithm mines both closed and maximal frequent subtrees by traversing an enumeration tree that systematically enumerates all frequent subtrees. Several techniques are proposed to prune the branches of the enumeration tree that do not correspond to closed or maximal frequent subtrees. Heuristic techniques are used to order the computation so that relatively expensive steps are avoided as much as possible. We study the performance of our algorithm through extensive experiments, using both synthetic data and data sets from real applications. The experimental results show that our algorithm is efficient at reducing the search space and quickly discovers all closed and maximal frequent subtrees.
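
To make the terms concrete, the following is a minimal sketch of what "closed" and "maximal" mean. It assumes the standard definitions: a frequent subtree is closed if no proper supertree has the same support, and maximal if no proper supertree is frequent. It is not CMTreeMiner itself. It enumerates every frequent pattern first and filters afterwards, which is exactly the cost the paper's pruning is meant to avoid. As a further simplifying assumption, trees are restricted to chains (each node has at most one child) with single-character labels, so a tree can be written as a string and subtree containment reduces to a substring test. The names `frequent_chains`, `chain_subpattern`, and `closed_and_maximal` are hypothetical and chosen for illustration only.

```python
from typing import Callable, Dict, List, Tuple


def frequent_chains(db: List[str], minsup: int) -> Dict[str, int]:
    """Return every chain pattern whose support is at least minsup.

    Assumption: each tree is a chain, written as a string of one-character
    labels from root to leaf. Support counts trees, not occurrences.
    """
    candidates = {
        t[i:j] for t in db for i in range(len(t)) for j in range(i + 1, len(t) + 1)
    }
    support = {p: sum(p in t for t in db) for p in candidates}
    return {p: s for p, s in support.items() if s >= minsup}


def chain_subpattern(p: str, q: str) -> bool:
    """True if chain p is a proper subtree of chain q (toy containment test)."""
    return len(p) < len(q) and p in q


def closed_and_maximal(
    support: Dict[str, int],
    is_proper_subpattern: Callable[[str, str], bool],
) -> Tuple[List[str], List[str]]:
    """Naive post-filter over a complete set of frequent patterns."""
    closed, maximal = [], []
    for p, sup_p in support.items():
        supers = [q for q in support if is_proper_subpattern(p, q)]
        if not supers:
            # No frequent proper supertree exists: p is maximal.
            maximal.append(p)
        if all(support[q] < sup_p for q in supers):
            # No frequent proper supertree shares p's support: p is closed.
            closed.append(p)
    return sorted(closed), sorted(maximal)


# Three chain-shaped trees; a pattern is frequent if it occurs in at least two.
db = ["ABC", "ABC", "AB"]
frequent = frequent_chains(db, minsup=2)
closed, maximal = closed_and_maximal(frequent, chain_subpattern)
print(len(frequent), closed, maximal)  # 6 ['AB', 'ABC'] ['ABC']
```

In this toy database six patterns are frequent, two are closed, and one is maximal. Every maximal pattern is also closed, and the closed patterns with their supports are enough to recover the support of every frequent pattern, which is why reporting only these sets loses little information while producing far fewer results.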

Index Terms:
Trees, graph algorithms, data mining, mining methods and algorithms, frequent subtree, closed frequent subtree, maximal frequent subtree.
Citation:
Yun Chi, Yi Xia, Yirong Yang, Richard R. Muntz, "Mining Closed and Maximal Frequent Subtrees from Databases of Labeled Rooted Trees," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 2, pp. 190-202, Feb. 2005, doi:10.1109/TKDE.2005.30