Issue No.12 - December (2009 vol.21)

pp: 1708-1721

Syed Khairuzzaman Tanbeer, Kyung Hee University, Youngin-si

Byeong-Soo Jeong, Kyung Hee University, Youngin-si

Young-Koo Lee, Kyung Hee University, Youngin-si

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.46

ABSTRACT

High utility pattern (HUP) mining has recently become one of the most important research issues in data mining because it can consider the nonbinary frequency values of items in transactions and a different profit value for each item. Incremental and interactive data mining, in turn, make it possible to reuse previous data structures and mining results, which reduces unnecessary calculations when a database is updated or when the minimum threshold is changed. In this paper, we propose three novel tree structures to efficiently perform incremental and interactive HUP mining. The first tree structure, the Incremental HUP Lexicographic Tree ($\mathrm{IHUP}_{\mathrm{L}}$-Tree), is arranged according to the lexicographic order of items. It can capture incremental data without any restructuring operation. The second tree structure, the IHUP Transaction Frequency Tree ($\mathrm{IHUP}_{\mathrm{TF}}$-Tree), obtains a compact size by arranging items in descending order of their transaction frequency. To reduce the mining time, the third tree, the IHUP Transaction-Weighted Utilization Tree ($\mathrm{IHUP}_{\mathrm{TWU}}$-Tree), is designed based on the TWU values of items in descending order. Extensive performance analyses show that our tree structures are very efficient and scalable for incremental and interactive HUP mining.
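The three item orderings named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not describe how the trees are built, so the code only computes the three sort orders that would decide where each item sits in a tree. The toy transactions, the profit table, and the function names `item_statistics` and `item_orders` are hypothetical. TWU is computed under the usual definition from the utility mining literature (the sum of the utilities of all transactions that contain the item), and ties are broken lexicographically as an arbitrary choice.

```python
from collections import defaultdict

# Hypothetical toy data (not from the paper): item -> purchased quantity.
transactions = [
    {"a": 2, "b": 1},
    {"b": 3, "c": 1},
    {"a": 1, "c": 2, "d": 1},
    {"c": 1, "d": 1},
]
# Hypothetical per-unit profit of each item.
profit = {"a": 5, "b": 2, "c": 1, "d": 10}


def item_statistics(transactions, profit):
    """Return (transaction frequency, TWU) dictionaries keyed by item."""
    tf = defaultdict(int)
    twu = defaultdict(int)
    for t in transactions:
        # Transaction utility: quantity times profit, summed over the items.
        tu = sum(qty * profit[item] for item, qty in t.items())
        for item in t:
            tf[item] += 1      # number of transactions containing the item
            twu[item] += tu    # assumed TWU definition: sum of those utilities
    return dict(tf), dict(twu)


def item_orders(tf, twu):
    """Return the three item orderings; ties are broken lexicographically."""
    items = sorted(tf)
    return {
        "lexicographic": items,
        "transaction_frequency": sorted(items, key=lambda i: (-tf[i], i)),
        "twu": sorted(items, key=lambda i: (-twu[i], i)),
    }


tf, twu = item_statistics(transactions, profit)
orders = item_orders(tf, twu)
for name, order in orders.items():
    print(name, order)
```

On this toy data the three orderings all differ. The lexicographic order is fixed in advance and never changes when new transactions arrive, which fits the abstract's remark that the first tree needs no restructuring, whereas the other two orders depend on statistics that shift as the database grows.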

INDEX TERMS

Data mining, frequent pattern mining, high utility pattern mining, incremental mining, interactive mining.

CITATION

Syed Khairuzzaman Tanbeer, Byeong-Soo Jeong, Young-Koo Lee, "Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 21, no. 12, pp. 1708-1721, December 2009, doi:10.1109/TKDE.2009.46