The Community for Technology Leaders
RSS Icon
Subscribe
Issue No.09 - September (2008 vol.20)
pp: 1153-1167
ABSTRACT
To capture the dynamic nature of data addition and deletion, we propose a general model of sequential pattern mining with a progressive database while the data in the database may be static, inserted or deleted. In addition, we present a progressive algorithm Pisa, standing for Progressive mIning of Sequential pAtterns, to progressively discover sequential patterns in defined time period of interest. The period of interest is a sliding window continuously advancing as the time goes by. Pisa utilizes a progressive sequential tree to efficiently maintain the latest data sequences, discover the complete set of up-to-date sequential patterns, and delete obsolete data and patterns accordingly. The height of the sequential pattern tree proposed is bounded by the length of period of interest, thereby effectively limiting the memory space required by Pisa that is significantly smaller than the memory needed by alternative methods. Note that the sequential pattern mining with a static database and with an incremental database are special cases of the progressive sequential pattern mining. By changing Start time and End time of the period of interest, Pisa can easily deal with a static database or an incremental database as well. Complexity of algorithms proposed is analyzed.
INDEX TERMS
Sequential Pattern, data mining, progressive databases
CITATION
Chi-Yao Tseng, Jian-Chih Ou, Ming-Syan Chen, "A General Model for Sequential Pattern Mining with a Progressive Database", IEEE Transactions on Knowledge & Data Engineering, vol.20, no. 9, pp. 1153-1167, September 2008, doi:10.1109/TKDE.2008.37
REFERENCES
[1] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. 20th Int'l Conf. Very Large Data Bases (VLDB '94), pp. 478-499, Sept. 1994.
[2] R. Agrawal and R. Srikant, “Mining Sequential Patterns,” Proc. 11th Int'l Conf. Data Eng. (ICDE '95), pp. 3-14, Feb. 1995.
[3] S. Aseervatham, A. Osmani, and E. Viennet, “Bitspade: A Lattice-Based Sequential Pattern Mining Algorithm Using Bitmap Representation,” Proc. Sixth Int'l Conf. Data Mining (ICDM), 2006.
[4] J. Ayres, J. Gehrke, T. Yiu, and J. Flannick, “Sequential Pattern Mining Using a Bitmap Representation,” Proc. Eighth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '02), pp. 429-435, July 2002.
[5] A. Balachandran, G.M. Voelker, P. Bahl, and P.V. Rangan, “Characterizing User Behavior and Network Performance in a Public Wireless LAN,” Proc. ACM SIGMETRICS Int'l Conf. Measurement and Modeling of Computer Systems (SIGMETRICS '02), June 2002.
[6] H. Cao, N. Mamoulis, and D.W. Cheung, “Mining Frequent Spatio-Temporal Sequential Patterns,” Proc. Fifth Int'l Conf. Data Mining (ICDM '05), pp. 82-89, Nov. 2005.
[7] G. Chen, X. Wu, and X. Zhu, “Sequential Pattern Mining in Multiple Streams,” Proc. Fifth Int'l Conf. Data Mining (ICDM '05), pp. 585-588, Nov. 2005.
[8] M.-S. Chen, J. Han, and P.S. Yu, “Data Mining: An Overview from Database Perspective,” IEEE Trans. Knowledge and Data Eng., vol. 8, no. 6, pp. 866-883, Dec. 1996.
[9] M.-S. Chen, J.-S. Park, and P.S. Yu, “Efficient Data Mining for Path Traversal Patterns,” IEEE Trans. Knowledge and Data Eng., vol. 10, no. 2, pp. 209-221, Mar./Apr. 1998.
[10] H. Cheng, X. Yan, and J. Han, “INCSPAN: Incremental Mining of Sequential Patterns in Large Database,” Proc. 10th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '04), pp. 527-532, 2004.
[11] S. Cong, J. Han, and D. Padua, “Parallel Mining of Closed Sequential Patterns,” Proc. 11th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '05), pp. 562-567, Aug. 2005.
[12] U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurasamy, Advances in Knowledge Discovery and Data Mining. MIT Press, 1996.
[13] M.N. Garofalakis, R. Rastogi, and K. Shim, “Spirit: Sequential Pattern Mining with Regular Expression Constraints,” Proc. 25th Int'l Conf. Very Large Data Bases (VLDB '99), pp. 223-234, 1999.
[14] J.-K. Guo, B.-J. Ruan, and Y.-Y. Zhu, “A Top-Down Algorithm for Web Log Sequential Pattern Mining,” Proc. Ninth Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD), 2005.
[15] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000.
[16] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.C. Hsu, “Freespan: Frequent Pattern-Projected Sequential Pattern Mining,” Proc. Sixth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '00), pp. 355-359, 2000.
[17] Y. Hirate and H. Yamana, “Sequential Pattern Mining with Time Interval,” Proc. 10th Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD '06), pp. 775-779, 2006.
[18] C.-C. Ho, H.-F. Li, F.-F. Kuo, and S.-Y. Lee, “Incremental Mining of Sequential Patterns over a Stream Sliding Window,” Proc. IEEE Int'l Workshop Mining Evolving and Streaming Data (IWMESD '06), Dec. 2006.
[19] KDD CUP 2007, http://www.cs.uic.edu/~liubNetflix-KDD-Cup-2007.html , 2007.
[20] C.-R. Lin, C.-H. Yun, and M.-S. Chen, “Utilizing Slice Scan and Selective Hash for Episode Mining,” Proc. Seventh ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '01), Aug. 2001.
[21] M.-Y. Lin and S.-Y. Lee, “Incremental Update on Sequential Patterns in Large Databases by Implicit Merging and Efficient Counting,” Information System, vol. 29, no. 5, pp. 385-404, July 2004.
[22] C. Luo and S.M. Chung, “Efficient Mining of Maximal Sequential Patterns Using Multiple Samples,” Proc. Fifth SIAM Int'l Conf. Data Mining (SDM), 2005.
[23] H. Mannila, H. Toivonen, and A.I. Verkamo, “Discovery of Frequent Episodes in Event Sequences,” Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 259-289, 1997.
[24] A. Marascu and F. Masseglia, “Mining Sequential Patterns from Temporal Streaming Data,” Proc. First ECML/PKDD Workshop Mining Spatio-Temporal Data (MSTD '05), Oct. 2005.
[25] A. Marascu and F. Masseglia, “Mining Sequential Patterns from Data Streams: A Centroid Approach,” J. Intelligent Information Systems, vol. 27, no. 3, pp. 291-307, Nov. 2006.
[26] F. Masseglia, P. Poncelet, and M. Teisseire, “Incremental Mining of Sequential Patterns in Large Databases,” Data and Knowledge Eng., vol. 46, pp. 97-121, July 2003.
[27] S. Nguyen, X. Sun, and M. Orlowska, “Improvements of INCSPAN: Incremental Mining of Sequential Patterns in Large Database,” Proc. Ninth Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD), 2005.
[28] J.-Z. Ouh, P. Wu, and M.-S. Chen, “Constrained Based Sequential Pattern Mining,” Proc. Int'l Workshop Web Technology, Dec. 2001.
[29] A.B. Pandey, J. Srivastava, and S. Shekhar, “Web Proxy Server with Intelligent Prefetcher for Dynamic Pages Using Association Rules,” Technical Report 01-004, Univ. of Minnesota, Jan. 2001.
[30] A.B. Pandey, R.R. Vatsavai, X. Ma, J. Srivastava, and S. Shekhar, “Data Mining for Intelligent Web Prefetching,” Proc. Workshop Mining Data Across Multiple Customer Touchpoints for CRM (MDCRM '02), May 2002.
[31] S. Parthasarathy, M.J. Zaki, M. Ogihara, and S. Dwarkadas, “Incremental and Interactive Sequence Mining,” Proc. Eighth ACM Int'l Conf. Information and Knowledge Management (CIKM '99), pp.251-258, 1999.
[32] J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.C. Hsu, “Prefixspan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth,” Proc. 17th Int'l Conf. Data Eng. (ICDE '01), pp. 215-224, 2001.
[33] J. Pei, J. Han, and W. Wang, “Mining Sequential Patterns with Constraints in Large Databases,” Proc. 11th ACM Int'l Conf. Information and Knowledge Management (CIKM), 2002.
[34] C. Romero, S. Ventura, J.A. Delgado, and P.D. Bra, “Personalized Links Recommendation Based on Data Mining. In Adaptive Educational Hypermedia Systems,” Proc. Second European Conf. Technology Enhanced Learning (EC-TEL '07), Sept. 2007.
[35] R. Srikant and R. Agrawal, “Mining Sequential Patterns: Generalizations and Performance Improvements,” Proc. Fifth Int'l Conf. Extending Database Technology (EDBT '96), Mar. 1996.
[36] J. Wang and J. Han, “Bide: Efficient Mining of Frequent Closed Sequences,” Proc. 20th Int'l Conf. Data Eng. (ICDE '04), pp. 79-91, 2004.
[37] K. Wang, Y. Xu, and J.X. Yu, “Scalable Sequential Pattern Mining for Biological Sequences,” Proc. 13th ACM Int'l Conf. Information and Knowledge Management (CIKM '04), pp. 178-187, Nov. 2004.
[38] X. Yan, J. Han, and R. Afshar, “Clospan: Mining Closed Sequential Patterns in Large Datasets,” Proc. Third SIAM Int'l Conf. Data Mining (SDM '03), pp. 166-177, May 2003.
[39] Q. Yang and H.H. Zhang, “Web-Log Mining for Predictive Web Caching,” IEEE Trans. Knowledge and Data Eng., vol. 15, no. 4, pp.1050-1053, July/Aug. 2003.
[40] Q. Yang, H.H. Zhang, and T. Li, “Mining Web Logs for Prediction Models in www Caching and Prefetching,” Proc. Seventh ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '01), pp. 473-478, Aug. 2001.
[41] M.J. Zaki, “Efficient Enumeration of Frequent Sequences,” Proc. Seventh ACM Int'l Conf. Information and Knowledge Management (CIKM '98), Nov. 1998.
[42] M. Zhang, B. Kao, D.W.-L. Cheung, and C.L. Yip, “Efficient Algorithms for Incremental Update of Frequent Sequences,” Proc. Sixth Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD '02), pp. 186-197, 2002.
44 ms
(Ver 2.0)

Marketing Automation Platform Marketing Automation Tool