Issue No.03 - March (2008 vol.20)
pp: 300-320
ABSTRACT
In this paper, we present a new tree mining algorithm, DryadeParent, based on the hooking principle first introduced in Dryade. In our experiments, we demonstrate that the branching factor and depth of the frequent patterns to be found are key complexity factors for tree mining algorithms, although they were often overlooked in previous work. We show that DryadeParent outperforms the current fastest algorithm, CMTreeMiner, by orders of magnitude on datasets where the frequent patterns have a high branching factor.
INDEX TERMS
Data mining, Mining methods and algorithms, Mining tree structured data
CITATION
Alexandre Termier, Marie-Christine Rousset, Michèle Sebag, Kouzou Ohara, Takashi Washio, Hiroshi Motoda, "DryadeParent, An Efficient and Robust Closed Attribute Tree Mining Algorithm", IEEE Transactions on Knowledge & Data Engineering, vol.20, no. 3, pp. 300-320, March 2008, doi:10.1109/TKDE.2007.190695