Efficient Algorithms for Mining Closed Itemsets and Their Lattice Structure
April 2005 (vol. 17 no. 4)
pp. 462-478
The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper, we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, with an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally, it uses a fast hash-based approach to remove any "nonclosed" sets found during computation. We also present CHARM-L, an algorithm that outputs the closed itemset lattice, which is very useful for rule generation and visualization. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM is a state-of-the-art algorithm that outperforms previous methods. Further, CHARM-L explicitly generates the frequent closed itemset lattice.
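
To make the terms in the abstract concrete, the following is a minimal brute-force sketch, not CHARM itself. It assumes a small hypothetical transaction database (`DB`) and uses illustrative helper names (`tidset`, `frequent_closed`, `support_from_closed`) that do not come from the paper. It shows what a tidset is, when an itemset is closed, how the support of a frequent itemset can be read off the closed sets, and how a diffset yields a support count without storing the full tidset.

```python
from itertools import combinations

# Hypothetical toy database (an assumption for illustration):
# transaction id -> set of items.
DB = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}


def tidset(itemset, db):
    """Ids of the transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    return frozenset(t for t, items in db.items() if wanted <= items)


def frequent_closed(db, min_sup):
    """Brute-force enumeration of frequent closed itemsets (min_sup >= 1).

    An itemset is closed when it equals the intersection of all the
    transactions that contain it. This exhaustive scan is exponential
    and only meant for tiny examples; it is NOT the CHARM search.
    """
    items = sorted(set().union(*db.values()))
    closed = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            tids = tidset(combo, db)
            if len(tids) < min_sup:
                continue
            common = frozenset(set.intersection(*(db[t] for t in tids)))
            if common == frozenset(combo):
                closed[frozenset(combo)] = len(tids)
    return closed


def support_from_closed(itemset, closed):
    """Support of `itemset` derived from the closed sets alone.

    It is the largest support among closed supersets; None means the
    itemset has no frequent closed superset, so it is not frequent.
    """
    wanted = set(itemset)
    supports = [s for c, s in closed.items() if wanted <= c]
    return max(supports) if supports else None


if __name__ == "__main__":
    closed = frequent_closed(DB, min_sup=3)
    for itemset, sup in sorted(closed.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        print("".join(sorted(itemset)), sup)

    # Support of {A, T} recovered without rescanning the database.
    print(support_from_closed({"A", "T"}, closed))

    # Diffset idea: keep only the tids that A has and T lacks, then
    # support(AT) = support(A) - |diffset|.
    t_a, t_t = tidset({"A"}, DB), tidset({"T"}, DB)
    d_at = t_a - t_t
    print(len(t_a) - len(d_at) == len(tidset({"A", "T"}, DB)))
```

The exhaustive scan is chosen only for readability. The point of the dual itemset-tidset search described above is to avoid enumerating every candidate in this way.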

Index Terms:
Closed itemsets, frequent itemsets, closed itemset lattice, association rules, data mining.
Citation:
Mohammed J. Zaki, Ching-Jui Hsiao, "Efficient Algorithms for Mining Closed Itemsets and Their Lattice Structure," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 462-478, April 2005, doi:10.1109/TKDE.2005.60