Third IEEE International Conference on Data Mining (ICDM'03)
Efficient Data Mining for Maximal Frequent Subtrees
Melbourne, Florida
November 19-22, 2003
ISBN: 0-7695-1978-4
Yongqiao Xiao, Georgia College and State University, Milledgeville, GA
Jenq-Foung Yao, Georgia College and State University, Milledgeville, GA
Zhigang Li, Southern Methodist University, Dallas, TX
Margaret H. Dunham, Southern Methodist University, Dallas, TX
This paper defines a new type of tree mining: uncovering maximal frequent induced subtrees from a database of unordered labeled trees. It proposes a novel algorithm, PathJoin, built on a compact data structure, FST-Forest, which compresses the trees while preserving their original structure. PathJoin generates candidate subtrees by joining the frequent paths in FST-Forest. Because this candidate generation is localized, it substantially reduces the number of candidate subtrees. Experiments with synthetic data sets show that the algorithm is effective and efficient.
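To make the notion of "frequent paths" concrete, the following is a minimal sketch in Python. It is not the authors' PathJoin algorithm and does not build an FST-Forest or perform the joining step. It only counts, for a toy database of unordered labeled trees, how many trees contain each downward label path, and keeps those meeting a minimum support. The tree encoding, the function names `downward_paths` and `frequent_paths`, and the example data are all assumptions made for illustration.

```python
from collections import Counter

# Assumed encoding: a tree is (label, [child trees]).
# Child order carries no meaning, since the trees are unordered.


def downward_paths(tree):
    """Return the set of label paths that start at any node and
    follow parent-to-child edges downward."""
    found = set()

    def visit(node, open_paths):
        label, children = node
        # Extend every path ending at the parent, and start a new path here.
        here = [p + (label,) for p in open_paths] + [(label,)]
        found.update(here)
        for child in children:
            visit(child, here)

    visit(tree, [])
    return found


def frequent_paths(database, min_support):
    """Map each path to the number of trees containing it,
    keeping only paths found in at least min_support trees."""
    counts = Counter()
    for tree in database:
        # downward_paths returns a set, so each tree counts once per path.
        counts.update(downward_paths(tree))
    return {p: c for p, c in counts.items() if c >= min_support}


# Hypothetical toy database of three trees.
database = [
    ("a", [("b", [("c", [])]), ("d", [])]),
    ("a", [("d", []), ("b", [("c", [])])]),  # first tree with children reordered
    ("a", [("b", []), ("e", [])]),
]

result = frequent_paths(database, min_support=3)
```

With a minimum support of 3, only the paths `a`, `a-b`, and `b` survive in this toy database, since the third tree lacks `c` and `d`. An algorithm in the spirit of the abstract would then combine such frequent paths into candidate subtrees rather than enumerating all subtrees directly.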
Citation:
Yongqiao Xiao, Jenq-Foung Yao, Zhigang Li, Margaret H. Dunham, "Efficient Data Mining for Maximal Frequent Subtrees," Third IEEE International Conference on Data Mining (ICDM'03), p. 379, 2003.