Issue No.03 - March (2008 vol.20)

pp: 300-320

ABSTRACT

In this paper, we present a new tree mining algorithm, DryadeParent, based on the hooking principle first introduced in Dryade. In the experiments, we demonstrate that the branching factor and depth of the frequent patterns to be found are key factors of complexity for tree mining algorithms, even though they have often been overlooked in previous work. We show that DryadeParent outperforms the current fastest algorithm, CMTreeMiner, by orders of magnitude on datasets where the frequent patterns have a high branching factor.

INDEX TERMS

Data mining, Mining methods and algorithms, Mining tree structured data

CITATION

Alexandre Termier, Marie-Christine Rousset, Michèle Sebag, Kouzou Ohara, Takashi Washio, Hiroshi Motoda, "DryadeParent, An Efficient and Robust Closed Attribute Tree Mining Algorithm",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 20, no. 3, pp. 300-320, March 2008, doi:10.1109/TKDE.2007.190695