A Support-Ordered Trie for Fast Frequent Itemset Discovery
July 2004 (vol. 16 no. 7)
pp. 875-879
Wee-Keong Ng, IEEE Computer Society

Abstract: The importance of data mining is apparent with the advent of powerful data collection and storage tools; raw data is now so abundant that manual analysis is no longer possible. Unfortunately, data mining problems are difficult to solve, which has prompted the introduction of several novel data structures to improve mining efficiency. Here, we critically examine existing preprocessing data structures used to enhance the performance of association rule mining, in an attempt to understand their strengths and weaknesses. Our analyses culminate in a practical structure called the SOTrieIT (Support-Ordered Trie Itemset) and two synergistic algorithms that accompany it for the fast discovery of frequent itemsets. Experiments involving a wide range of synthetic data sets reveal that its algorithms outperform FP-growth, a recent association rule mining algorithm with excellent performance, by up to two orders of magnitude, thus verifying its efficiency and viability.

Index Terms:
Data mining, association rule mining, data structures.
Yew-Kwong Woon, Wee-Keong Ng, Ee-Peng Lim, "A Support-Ordered Trie for Fast Frequent Itemset Discovery," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 7, pp. 875-879, July 2004, doi:10.1109/TKDE.2004.1318569