| | This Article | |
| |
| |
| | Share | |
| |
| |
| | Bibliographic References | |
| |
| |
| | Add to: | |
| |
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
| |
| | Search | |
| |
| |
| | |
Mining Frequent Itemsets without Support Threshold: With and without Item Constraints
September 2004 (vol. 16 no. 9)
pp. 1052-1069
In classical association rules mining, a minimum support threshold is assumed to be available for mining frequent itemsets. However, setting such a threshold is typically hard. In this paper, we handle a more practical problem; roughly speaking, it is to mine N k-itemsets with the highest supports for k up to a certain k_{max} value. We call the results the N-most interesting itemsets. Generally, it is more straightforward for users to determine N and k_{max}. We propose two new algorithms, LOOPBACK and BOMO. Experiments show that our methods outperform the previously proposed Itemset-Loop algorithm, and the performance of BOMO can be an order of magnitude better than the original FP-tree algorithm, even with the assumption of an optimally chosen support threshold. We also propose the mining of "N-most interesting k-itemsets with item constraints.” This allows user to specify different degrees of interestingness for different itemsets. Experiments show that our proposed Double FP-trees algorithm, which is based on BOMO, is highly efficient in solving this problem.
[1] C.C. Aggarwal and P.S. Yu, Mining Large Itemsets for Association Rules Bull. IEEE Computer Soc. Technical Committee on Data Eng., pp. 23-31, Mar. 1998.
[2] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications Proc. ACM SIGMOD Conf. Management of Data, 1998.
[3] R. Agrawal and R. Srikant, Fast Algorithms for Mining Association Rules Proc. 20th Int'l Conf. Very Large Databases (VLDB), pp. 487-499, 1994.
[4] S. Brin, R. Motwani, J.D. Ullman, and S. Tsur, Dynamic Itemset Counting and Implication Rules for Market Basket Data Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 255-264, 1997.
[5] IBM Almaden Research Center, Synthetic Data Generation Code for Associations and Sequential Patterns,http://www.almaden.ibm.com/softwarequest /.
[6] Y.L. Cheung and A. Fu, An FP-Tree Approach for Mining N-Most Interesting Itemsets Proc. SPIE Conf. Data Mining, 2002.
[7] A.W.-C. Fu, R.W.-W. Kwong, and J. Tang, Mining N-Most Interesting Itemsets Proc. Int'l Symp. Methodologies for Intelligent Systems (ISMIS), 2000.
[8] G. Grahne, L. Lakshmanan, and X. Wang, Efficient Mining of Constrained Correlated Sets Proc. Int'l Conf. Data Eng., pp. 512-521, Feb. 2000.
[9] J. Han, J. Pei, and Y. Yin, Mining Frequent Patterns without Candidate Generation Proc. ACM SIGMOD Int'l Conf. Management of Data, 2000.
[10] Minnesota Population Center in Univ. of Minnesota IPUMS-98,http://www.ipums.umn.edu/usasamples.html .
[11] M. Kamber, J. Han, and J.Y. Chiang, Mining Partial Periodicity Using Frequent Pattern Trees Proc. Conf. Knowledge Discovery and Data Mining (KDD), pp. 207-210, 1997.
[12] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A.I. Verkamo, Finding Interesting Rules from Large Sets of Discovered Associated Rules Proc. Int'l Conf. Information and Knowledge Management (CIKM), pp. 401-408, 1994.
[13] B. Lent, A. Swami, and J. Widom, “Clustering Association Rules,” Proc. 1997 Int'l Conf. Data Eng., pp. 220-231, Apr. 1997.
[14] C. Leung, L. Lakshmanan, and R. Ng, Exploiting Succinct Constraints Using FP-Trees ACM SIGKDD Explorations (special issue on constraints in data mining), vol. 4, no. 1, pp. 40-49, June 2002.
[15] J.S. Park, M.S. Chen, and P.S. Yu, An Effective Hash-Based Algorithm for Mining Association Rules Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 175-186, 1995.
[16] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, Discovering Frequent Closed Itemsets for Association Rules Proc. Seventh Int'l Conf. Database Theory (ICDT), 1999.
[17] J. Pei and J. Han, Constrained Frequent Pattern Mining: A Pattern-Growth View Proc. ACM SIGKDD Explorations (special issue on constraints in data mining), vol. 4, no. 1, pp. 31-39, June 2002.
[18] J. Pei, J. Han, and R. Mao, Closet: An Efficient Algorithm for Mining Frequent Closed Itemsets Proc. ACM SIGMOD Workshop Research Issues in Data Mining and Knowledge Discovery (DMKD '00), May 2000.
[19] J. Pei, J. Han, and L.V.S. Lakshmanan, Mining Frequent Itemsets with Convertible Constraints Proc. Int'l Conf. Data Eng., 2001.
[20] A. Savasere, E. Omiecinski, and S. Navathe, An Efficient Algorithm for Mining Association Rules in Large Database Proc. 21st Int'l Conf. Very Large Databases (VLDB), pp. 432-443, 1995.
[21] P. Shenoy, J.R. Haritsa, S. Sudarshan, G. Bhalotia, M. Bawa, and D. Shah, Turbo-Charging Vertical Mining of Large Databases Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 1-12, 2000.
[22] C. Silverstein, S. Brin, R. Motwani, and J Ullman, Scalable Techniques for Mining Causal Structures Proc. 24th Int'l Conf. Very Large Databases (VLDB), pp. 594-605, 1998.
[23] R. Srikant, Q. Vu, and R. Agrawal, Mining Association Rules with Item Constraints Proc. Conf. Knowledge Discovery and Data Mining (KDD), pp. 67-73, 1997.
[24] K. Wang, Y. He, and J. Han, Pushing Support Constraints into Frequent Itemset Mining School of Computing, National Univ. of Singapore, 2000.
[25] K. Wang, Y. He, and J. Han, Mining Frequent Itemsets Using Support Constraints Proc. 26th Very Large Databases (VLDB) Conf., 2000.
[26] Z. Zheng, R. Kohavi, and L. Mason, Real World Performance of Association Rule Algorithms Proc. Knowledge Discovery and Data Mining (KDD), pp. 401-406, 2001.
Index Terms:
Association rules, N-most interesting itemsets, FP-tree, item constraints.
Citation:
Yin-Ling Cheung, Ada Wai-Chee Fu, "Mining Frequent Itemsets without Support Threshold: With and without Item Constraints," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 9, pp. 1052-1069, Sept. 2004, doi:10.1109/TKDE.2004.44