Issue No. 02 - February (2005 vol. 17)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2005.30
Yun Chi, IEEE
Richard R. Muntz, IEEE
Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, and computer networks. One important problem in mining databases of trees is to find frequently occurring subtrees. Because of combinatorial explosion, the number of frequent subtrees usually grows exponentially with subtree size and, therefore, mining all frequent subtrees becomes infeasible for large trees. In this paper, we present CMTreeMiner, a computationally efficient algorithm that discovers only closed and maximal frequent subtrees in a database of labeled rooted trees, where the rooted trees can be either ordered or unordered. The algorithm mines both closed and maximal frequent subtrees by traversing an enumeration tree that systematically enumerates all frequent subtrees. Several techniques are proposed to prune the branches of the enumeration tree that do not correspond to closed or maximal frequent subtrees. Heuristic techniques are used to order the computation so that relatively expensive steps are avoided as much as possible. We study the performance of our algorithm through extensive experiments, using both synthetic data and data sets from real applications. The experimental results show that our algorithm is very efficient at reducing the search space and quickly discovers all closed and maximal frequent subtrees.
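The abstract does not spell out what "closed" and "maximal" mean, so the following is a minimal sketch under the standard definitions from frequent-pattern mining: a frequent subtree is closed if no proper supertree has the same support, and maximal if no proper supertree is frequent. It is a brute-force post-filter over an already enumerated set of frequent patterns, not CMTreeMiner's pruning strategy. The function names, the support table, and the restriction to path-shaped trees (label tuples, where an induced subtree is a contiguous sub-path) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Pattern = Tuple[str, ...]  # hypothetical encoding: a path-shaped tree as a tuple of labels


def is_proper_subpath(p: Pattern, q: Pattern) -> bool:
    """True if path p occurs as a contiguous, strictly shorter segment of path q."""
    if len(p) >= len(q):
        return False
    return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))


def closed_and_maximal(
    support: Dict[Pattern, int],
    is_proper_subtree: Callable[[Pattern, Pattern], bool],
) -> Tuple[List[Pattern], List[Pattern]]:
    """Filter the complete set of frequent patterns down to closed and maximal ones.

    `support` must contain every frequent pattern. A supertree with the same
    support as a frequent pattern is itself frequent, so checking only the
    frequent supertrees is enough for the closedness test.
    """
    closed, maximal = [], []
    for p, sup in support.items():
        supers = [q for q in support if is_proper_subtree(p, q)]
        if not supers:
            maximal.append(p)  # no frequent proper supertree
        if all(support[q] < sup for q in supers):
            closed.append(p)   # every proper supertree has strictly lower support
    return closed, maximal


# Hypothetical support counts for the database {A-B-C, A-B-C, A-B} with minimum support 2.
support = {
    ("A",): 3,
    ("B",): 3,
    ("C",): 2,
    ("A", "B"): 3,
    ("B", "C"): 2,
    ("A", "B", "C"): 2,
}

closed, maximal = closed_and_maximal(support, is_proper_subpath)
print("closed: ", closed)
print("maximal:", maximal)
```

In this toy input, six frequent patterns reduce to two closed patterns and one maximal pattern, which illustrates why mining only closed or maximal subtrees yields a much smaller result set. Every maximal pattern is also closed, so the maximal set is always a subset of the closed set.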
Trees, graph algorithms, data mining, mining methods and algorithms, frequent subtree, closed frequent subtree, maximal frequent subtree.
R. R. Muntz, Y. Yang, Y. Xia and Y. Chi, "Mining Closed and Maximal Frequent Subtrees from Databases of Labeled Rooted Trees," in IEEE Transactions on Knowledge & Data Engineering, vol. 17, no. 2, pp. 190-202, 2005.