Issue No. 08 - August (2005 vol. 17)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2005.125
Mohammed J. Zaki, IEEE
Mining frequent trees is useful in domains such as bioinformatics, Web mining, and the mining of semistructured data. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TreeMiner, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called a scope-list. We contrast TreeMiner with a pattern-matching tree mining algorithm (PatternMatcher), and we also compare it with TreeMinerD, which counts only distinct occurrences of a pattern. We conduct detailed experiments to test the performance and scalability of these methods. We also use tree mining to analyze RNA structure and phylogenetics data sets from the bioinformatics domain.
Index Terms: Frequent tree mining, rooted, ordered, labeled trees, subtree enumeration, pattern matching, RNA structure, phylogenetic trees, data mining.
M. J. Zaki, "Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications," in IEEE Transactions on Knowledge & Data Engineering, vol. 17, no. 8, pp. 1021-1035, 2005.