| | This Article | |
| |
| |
| | Share | |
| |
| |
| | Bibliographic References | |
| |
| |
| | Add to: | |
| |
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
| |
| | Search | |
| |
| |
| | |
Fast Algorithms for Frequent Itemset Mining Using FP-Trees
October 2005 (vol. 17 no. 10)
pp. 1347-1362
Efficient algorithms for mining frequent itemsets are crucial for mining association rules as well as for many other data mining tasks. Methods for mining frequent itemsets have been implemented using a prefix-tree structure, known as an FP-tree, for storing compressed information about frequent itemsets. Numerous experimental results have demonstrated that these algorithms perform extremely well. In this paper, we present a novel FP-array technique that greatly reduces the need to traverse FP-trees, thus obtaining significantly improved performance for FP-tree-based algorithms. Our technique works especially well for sparse data sets. Furthermore, we present new algorithms for mining all, maximal, and closed frequent itemsets. Our algorithms use the FP-tree data structure in combination with the FP-array technique efficiently and incorporate various optimization techniques. We also present experimental results comparing our methods with existing algorithms. The results show that our methods are the fastest for many cases. Even though the algorithms consume much memory when the data sets are sparse, they are still the fastest ones when the minimum support is low. Moreover, they are always among the fastest algorithms and consume less memory than other methods when the data sets are dense.
[1] 1347 http://www.almaden. ibm.com/cs/questsyndata.html , 2003.[2] http://www.almaden.ibm.com/cs/people/bayardo resources. html, 2003.[3] http:/fimi.cs.helsinki.fi, 2003.[4] http://www-sal.cs.uiuc.edu/~hanj/pubssoftware.htm , 2004.[5] R. Agrawal , T. Imielinski , and A. Swami , “Mining Association Rules between Sets of Items in Large Databases,” Proc. ACM-SIGMOD Int'l Conf. Management of Data, pp. 207-216, May 1993. [6] R. Agrawal and R. Srikant , “Fast Algorithms for Mining Association Rules,” Proc. Int'l Conf. Very Large Data Bases, pp. 487-499, Sept. 1994.[7] R. Agrawal and R. Srikant , “Mining Sequential Patterns,” Proc. Int'l Conf. Data Eng., pp. 3-14, Mar. 1995. [8] R.J. Bayardo , “Efficiently Mining Long Patterns from Databases,” Proc. ACM-SIGMOD Int'l Conf. Management of Data, pp. 85-93, 1998. [9] C. Borgelt , “Efficient Implementations of Apriori and Eclat,” Proc. IEEE ICDM Workshop Frequent Itemset Mining Implementations, CEUR Workshop Proc., vol. 80, Nov. 2003.[10] D. Burdick , M. Calimlim , and J. Gehrke , “MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases,” Proc. Int'l Conf. Data Eng., pp. 443-452, Apr. 2001. [11] K. Gouda and M.J. Zaki , “Efficiently Mining Maximal Frequent Itemsets,” Proc. IEEE Int'l Conf. Data Mining, pp. 163-170, 2001. [12] G. Grahne and J. Zhu , “High Performance Mining of Maximal Frequent Itemsets,” Proc. SIAM Workshop High Performance Data Mining: Pervasive and Data Stream Mining, May 2003.[13] G. Grahne and J. Zhu , “Mining Frequent Itemsets from Secondary Memory,” Proc. IEEE Int'l Conf. Data Mining, pp. 91-98, Nov. 2004. [14] J. Han , J. Pei , and Y. Yin , “Mining Frequent Patterns without Candidate Generation,” Proc. ACM-SIGMOD Int'l Conf. Management of Data, pp. 1-12, May 2000.[15] J. Han , J. Pei , Y. Yin , and R. Mao , “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53-87, 2004. [16] J. Han , J. Wang , Y. Lu , and P. Tzvetkov , “Mining Top-K Frequent Closed Patterns without Minimum Support,” Proc. Int'l Conf. Data Mining, pp. 211-218, Dec. 2002. [17] M. Kamber , J. Han , and J. Chiang , “Metarule-Guided Mining of Multi-Dimensional Association Rules Using Data Cubes,” Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 207-210, Aug. 1997.[18] D. Knuth , Sorting and Searching. Reading, Mass.: Addison Wesley, 1973.[19] G. Liu , H. Lu , J.X. Yu , W. Wei , and X. Xiao , “AFOPT: An Efficient Implementation of Pattern Growth Approach,” Proc. IEEE ICDM Workshop Frequent Itemset Mining Implementations, CEUR Workshop Proc., vol. 80, Nov. 2003.[20] H. Mannila and H. Toivonen , “Levelwise Search and Borders of Theories in Knowledge Discovery,” Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 241-258, 1997.[21] H. Mannila , H. Toivonen , and I. Verkamo , “Efficient Algorithms for Discovering Association Rules,” Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 181-192, July 1994.[22] H. Mannila , H. Toivonen , and I. Verkamo , “Discovery of Frequent Episodes in Event Sequences,” Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 259-289, 1997.[23] S. Orlando , C. Lucchese , P. Palmerini , R. Perego , and F. Silvestri , “kDCI: A Multi-Strategy Algorithm for Mining Frequent Sets,” Proc. IEEE ICDM Workshop Frequent Itemset Mining Implementations, CEUR Workshop Proc., vol. 80, Nov. 2003.[24] N. Pasquier , Y. Bastide , R. Taouil , and L. Lakhal , “Discovering Frequent Closed Itemsets for Association Rules,” Proc. Int'l Conf. Database Theory, pp. 398-416, Jan. 1999.[25] J. Park , M. Chen , and P. Yu , “Using a Hash-Based Method with Transaction Trimming for Mining Association Rules,” IEEE Trans. Knowledge and Data Eng. vol. 9, no. 5, pp. 813-825, 1997. [26] J. Pei , J. Han , and R. Mao , “CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets,” Proc. ACM SIGMOD Workshop Research Issues in Data Mining and Knowledge Discovery, pp. 21-30, May 2000.[27] A. Pietracaprina and D. Zandolin , “Mining Frequent Itemsets Using Patricia Tries,” Proc. IEEE ICDM Workshop Frequent Itemset Mining Implementations, CEUR Workshop Proc., vol. 80, Nov. 2003.[28] A. Savasere , E. Omiecinski , and S. Navathe , “An Efficient Algorithm for Mining Association Rules in Large Databases,” Proc. Int'l Conf. Very Large Data Bases, pp. 432-443, Sept. 1995.[29] H. Toivonen , “Sampling Large Databases for Association Rules,” Proc. Int'l Conf. Very Large Data Bases, pp. 134-145, Sept. 1996.[30] J. Wang , J. Han , and J. Pei , “CLOSET+: Searching for the Best Strategies for Mining Frequent Closed Itemsets,” Proc. Int'l Conf. Knowledge Discovery and Data Mining, pp. 236-245, Aug. 2003.[31] D. Xin , J. Han , X. Li , and B.W. Wah , “Star-Cubing: Computing Iceberg Cubes by Top-Down and Bottom-Up Integration,” Proc. Int'l Conf. Very Large Data Bases, pp. 476-487, Sept. 2003.[32] Proc. IEEE ICDM Workshop Frequent Itemset Mining Implementations, B. Goethals and M.J. Zaki, eds., CEUR Workshop Proc., vol. 80, Nov. 2003, http://CEUR-WS.orgVol-90.[33] M.J. Zaki , “Scalable Algorithms for Association Mining,” IEEE Trans. Knowledge and Data Mining, vol. 12, no. 3, pp. 372-390, 2000. [34] M.J. Zaki and C. Hsiao , “CHARM: An Efficient Algorithm for Closed Itemset Mining,” Proc. SIAM Int'l Conf. Data Mining, pp. 457-473, Apr. 2002.[35] M.J. Zaki and K. Gouda , “Fast Vertical Mining Using Diffsets,” Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 326-335, Aug. 2003. [36] Q. Zou , W.W. Chu , and B. Lu , “SmartMiner: A Depth First Algorithm Guided by Tail Information for Mining Maximal Frequent Itemsets,” Proc. IEEE Int'l Conf. Data Mining, Dec. 2002.
Index Terms:
Index Terms- Data mining, association rules.
Citation:
G?sta Grahne, Jianfei Zhu, "Fast Algorithms for Frequent Itemset Mining Using FP-Trees," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 10, pp. 1347-1362, Oct. 2005, doi:10.1109/TKDE.2005.166