Fast and Memory Efficient Mining of Frequent Closed Itemsets
January 2006 (vol. 18 no. 1)
pp. 21-36
This paper presents a new scalable algorithm for discovering closed frequent itemsets, a lossless and condensed representation of all the frequent itemsets that can be mined from a transactional database. Our algorithm combines a divide-and-conquer approach with a bitwise vertical representation of the database, and it visits and partitions the search space according to an original theoretical framework that formalizes the problem of closed itemset mining in detail. The algorithm adopts several optimizations aimed at saving both space and time when computing itemset closures and their supports. One of the main problems in this type of algorithm is that the same closed itemset can be generated multiple times. To address it, we propose a new, effective, and memory-efficient pruning technique which, unlike previous proposals, does not require the whole set of closed patterns mined so far to be kept in main memory. This technique also permits each visited partition of the search space to be mined independently, in any order and thus also in parallel. Tests conducted on many publicly available data sets show that our algorithm is scalable and outperforms other state-of-the-art algorithms such as Closet+ and FP-Close, in some cases by more than one order of magnitude. More importantly, the performance improvements become increasingly significant as the support threshold decreases.
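
To make the terms in the abstract concrete, the following is a minimal Python sketch of a bitwise vertical representation and of closure and support computation. It is not the paper's algorithm: the function names and the brute-force enumeration are assumptions made for illustration only. In particular, the sketch removes duplicate closed itemsets by keeping every result in a dictionary, which is exactly the in-memory bookkeeping the paper's pruning technique is said to avoid.

```python
from itertools import combinations


def vertical_bitmaps(transactions):
    """Map each item to an int whose bit t is set when transaction t contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def tidset(itemset, bitmaps, n_transactions):
    """Bitmap of the transactions containing every item of itemset (AND of columns)."""
    mask = (1 << n_transactions) - 1
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return mask


def support(itemset, bitmaps, n_transactions):
    """Number of transactions that contain itemset."""
    return bin(tidset(itemset, bitmaps, n_transactions)).count("1")


def closure(itemset, bitmaps, n_transactions):
    """All items present in every transaction that contains itemset."""
    mask = tidset(itemset, bitmaps, n_transactions)
    return frozenset(i for i, col in bitmaps.items() if col & mask == mask)


def closed_frequent_itemsets(transactions, min_support):
    """Brute force (illustration only): close every frequent itemset, dedupe in a dict."""
    bitmaps = vertical_bitmaps(transactions)
    n = len(transactions)
    items = sorted(bitmaps)
    closed = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            sup = support(candidate, bitmaps, n)
            if sup >= min_support:
                # An itemset and its closure always have the same support.
                closed[closure(candidate, bitmaps, n)] = sup
    return closed


if __name__ == "__main__":
    # Hypothetical toy database of four transactions.
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}]
    for itemset, sup in closed_frequent_itemsets(db, min_support=2).items():
        print(sorted(itemset), sup)
```

On this toy database the frequent itemsets {a}, {b}, and {a, b} all share the closure {a, b}, which illustrates why a closed itemset miner needs some strategy for detecting or avoiding repeated generation of the same closed itemset.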

Index Terms:
Data mining, association rules, frequent itemsets, condensed representations, closed itemsets, high-performance algorithms.
Citation:
Claudio Lucchese, Salvatore Orlando, Raffaele Perego, "Fast and Memory Efficient Mining of Frequent Closed Itemsets," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 1, pp. 21-36, Jan. 2006, doi:10.1109/TKDE.2006.10