Issue No.10 - October (2009 vol.21)
pp: 1418-1431
Ho Jin Woo, Yonsei University, Seoul
Won Suk Lee, Yonsei University, Seoul
Frequent item set mining is one of the most challenging issues in descriptive data mining. In general, it produces a large number of frequent item sets. To represent them more compactly, closed or maximal frequent item sets are often used, but finding such item sets over online transactional data streams is not easy due to the requirements of a data stream. For this purpose, this paper proposes a method of tracing the set of maximal frequent item sets (MFIs) instantly over an online data stream. The method, named estMax, maintains the set of frequent item sets in a prefix tree and extracts all MFIs without any additional superset/subset checking mechanism. Upon processing a new transaction, those frequent item sets that are matched maximally by the transaction are newly marked in their corresponding nodes of the prefix tree as candidates for MFIs. At the same time, if any subset of a newly marked item set was already marked as a candidate MFI by a previous transaction, that mark is cleared. With this additional step, the set of MFIs can be extracted at any moment. The performance of the estMax method is analyzed comparatively through a series of experiments to identify its various characteristics.
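The mark-and-clear step described in the abstract can be illustrated with a small sketch. This is not the estMax algorithm itself. It assumes a flat dictionary of counts in place of the prefix tree, enumerates every subset of each transaction (feasible only for tiny examples), and uses hypothetical names (`MarkingSketch`, `add`, `maximal_frequent`). The filter in `maximal_frequent` is a safeguard added by this sketch; the abstract states that estMax needs no such superset/subset check.

```python
from itertools import combinations


class MarkingSketch:
    """Toy illustration of marking candidate MFIs per transaction (assumed design)."""

    def __init__(self, min_support):
        self.min_support = min_support  # relative support threshold in (0, 1]
        self.total = 0                  # number of transactions seen so far
        self.counts = {}                # frozenset -> occurrence count (stands in for the prefix tree)
        self.marked = set()             # item sets currently marked as candidate MFIs

    def _is_frequent(self, itemset):
        return self.counts.get(itemset, 0) >= self.min_support * self.total

    @staticmethod
    def _proper_subsets(itemset):
        items = sorted(itemset)
        for size in range(1, len(items)):
            for combo in combinations(items, size):
                yield frozenset(combo)

    def add(self, transaction):
        items = sorted(set(transaction))
        self.total += 1
        # Count every non-empty subset of the transaction (exponential; demo only).
        subsets = [frozenset(c)
                   for size in range(1, len(items) + 1)
                   for c in combinations(items, size)]
        for s in subsets:
            self.counts[s] = self.counts.get(s, 0) + 1
        # Frequent item sets matched maximally by this transaction:
        # frequent subsets with no frequent proper superset inside the transaction.
        frequent = [s for s in subsets if self._is_frequent(s)]
        maximal = [s for s in frequent if not any(s < t for t in frequent)]
        for m in maximal:
            self.marked.add(m)
            # Clear marks left on proper subsets by earlier transactions.
            for sub in self._proper_subsets(m):
                self.marked.discard(sub)

    def maximal_frequent(self):
        # Safeguard added by this sketch: keep marked sets that are still
        # frequent and not covered by a larger marked frequent set.
        live = [s for s in self.marked if self._is_frequent(s)]
        return {s for s in live if not any(s < t for t in live)}


sketch = MarkingSketch(min_support=0.5)
for tx in (["a", "b", "c"], ["a", "b"], ["a", "b", "d"], ["c", "d"]):
    sketch.add(tx)
result = sorted(sorted(s) for s in sketch.maximal_frequent())
print(result)  # [['a', 'b'], ['c'], ['d']]
```

Because a cleared subset is only re-marked when a later transaction contains it, this sketch can lag behind the true set of MFIs after a marked item set becomes infrequent.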
Data mining, maximal frequent item sets, transactional data streams.
Ho Jin Woo, Won Suk Lee, "estMax: Tracing Maximal Frequent Item Sets Instantly over Online Transactional Data Streams", IEEE Transactions on Knowledge & Data Engineering, vol.21, no. 10, pp. 1418-1431, October 2009, doi:10.1109/TKDE.2008.233