TFP: An Efficient Algorithm for Mining Top-K Frequent Closed Itemsets
May 2005 (vol. 17 no. 5)
pp. 652-664
Frequent itemset mining has been studied extensively in the literature. Most previous studies require the specification of a min_support threshold and aim at mining a complete set of frequent itemsets satisfying min_support. However, in practice, it is difficult for users to provide an appropriate min_support threshold. In addition, a complete set of frequent itemsets is much less compact than a set of frequent closed itemsets. In this paper, we propose an alternative mining task: mining top-k frequent closed itemsets of length no less than min_l, where k is the desired number of frequent closed itemsets to be mined, and min_l is the minimal length of each itemset. An efficient algorithm, called TFP, is developed for mining such itemsets without a user-specified min_support. Starting at min_support = 0 and making use of the length constraint and the properties of top-k frequent closed itemsets, min_support can be raised effectively and the FP-Tree can be pruned dynamically, both during and after the construction of the tree, using our two proposed methods: the closed node count and descendant_sum. Moreover, mining is further sped up by employing a combined top-down and bottom-up FP-Tree traversal strategy, a set of search space pruning methods, a fast 2-level hash-indexed result tree, and a novel closed itemset verification scheme. Our extensive performance study shows that TFP has high performance and linear scalability in terms of the database size.
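To make the mining task concrete, the following is a minimal brute-force sketch in Python. It is not the TFP algorithm: it uses no FP-Tree, no closed node count, and no descendant_sum. It only illustrates the task definition (top-k closed itemsets of length at least min_l) and the idea of starting at min_support = 0 and raising it as results accumulate. The function names, the toy database, and the tie-breaking rule (ties on support are broken by item order, so exactly k itemsets are returned) are assumptions made for illustration.

```python
import heapq

def closed_itemsets(transactions):
    """Return {closed itemset: support}.

    Every closed itemset with support >= 1 is an intersection of
    transactions, so closing the transactions under pairwise
    intersection enumerates them all (exponential in the worst case).
    """
    rows = [frozenset(t) for t in transactions]
    closed = set(rows)
    frontier = set(rows)
    while frontier:
        new = {a & b for a in frontier for b in closed} - closed
        closed |= new
        frontier = new
    # Drop the empty itemset and count supporting transactions.
    return {c: sum(1 for r in rows if c <= r) for c in closed if c}

def top_k_closed(transactions, k, min_l):
    """Top-k closed itemsets of length >= min_l, plus the final threshold."""
    min_support = 0   # no user-supplied threshold; start at 0
    heap = []         # min-heap of (support, items), at most k entries
    for itemset, support in closed_itemsets(transactions).items():
        if len(itemset) < min_l or support < min_support:
            continue  # too short, or cannot enter the top k
        entry = (support, tuple(sorted(itemset)))
        if len(heap) < k:
            heapq.heappush(heap, entry)
        else:
            heapq.heappushpop(heap, entry)
        if len(heap) == k:
            min_support = heap[0][0]  # raise threshold to the k-th best support
    return sorted(heap, key=lambda e: (-e[0], e[1])), min_support

# Hypothetical toy database of five transactions.
db = [{"a", "b", "c"}, {"a", "b", "c", "d"}, {"a", "b"}, {"b", "c"}, {"a", "d"}]
result, threshold = top_k_closed(db, k=2, min_l=2)
print(result, threshold)  # [(3, ('a', 'b')), (3, ('b', 'c'))] 3
```

In this toy database, {a, c} is frequent but not closed, because every transaction containing it also contains b; its closure {a, b, c} has the same support. The single items a and b have the highest support (4) but are excluded by min_l = 2.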

Index Terms:
Data mining, frequent itemset, association rules, mining methods and algorithms.
Citation:
Jianyong Wang, Jiawei Han, Ying Lu, Petre Tzvetkov, "TFP: An Efficient Algorithm for Mining Top-K Frequent Closed Itemsets," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 5, pp. 652-664, May 2005, doi:10.1109/TKDE.2005.81