Issue No.09 - September (2008 vol.20)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2008.37
To capture the dynamic nature of data addition and deletion, we propose a general model of sequential pattern mining with a progressive database, in which data may be static, inserted, or deleted. In addition, we present a progressive algorithm, Pisa (Progressive mIning of Sequential pAtterns), to progressively discover sequential patterns within a defined period of interest. The period of interest is a sliding window that advances continuously as time goes by. Pisa uses a progressive sequential tree to efficiently maintain the latest data sequences, discover the complete set of up-to-date sequential patterns, and delete obsolete data and patterns accordingly. The height of the proposed sequential pattern tree is bounded by the length of the period of interest, which limits the memory required by Pisa to significantly less than that needed by alternative methods. Note that sequential pattern mining with a static database and with an incremental database are special cases of progressive sequential pattern mining. By changing the Start time and End time of the period of interest, Pisa can also handle a static database or an incremental database. The complexity of the proposed algorithms is analyzed.
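The idea of a period of interest that slides forward while obsolete data is dropped can be sketched with a naive baseline. The code below is not Pisa and does not build the progressive sequential tree. It is a minimal illustration under stated assumptions: the class name `SlidingWindowMiner`, the methods `advance` and `frequent_pairs`, integer timestamps that never decrease, and support measured as the fraction of sequences currently in the window are all hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations


class SlidingWindowMiner:
    """Naive sliding-window baseline (hypothetical, not the Pisa algorithm)."""

    def __init__(self, poi):
        # poi: length of the period of interest, in timestamps (assumed integer)
        self.poi = poi
        # sequence id -> list of (timestamp, item), kept in arrival order
        self.events = defaultdict(list)

    def advance(self, now, arrivals):
        """Insert the (seq_id, item) pairs arriving at time `now`,
        then delete every element that has fallen out of the window."""
        for seq_id, item in arrivals:
            self.events[seq_id].append((now, item))
        oldest = now - self.poi + 1  # first timestamp still inside the window
        for seq_id in list(self.events):
            kept = [(t, x) for t, x in self.events[seq_id] if t >= oldest]
            if kept:
                self.events[seq_id] = kept
            else:
                del self.events[seq_id]  # the whole sequence is obsolete

    def frequent_pairs(self, min_support):
        """Return length-2 patterns (a, b), a strictly before b, whose
        support (fraction of sequences in the window) is >= min_support."""
        counts = defaultdict(int)
        for evs in self.events.values():
            seen = set()
            for (t1, a), (t2, b) in combinations(evs, 2):
                if t1 < t2:
                    seen.add((a, b))
            for pair in seen:  # count each pattern once per sequence
                counts[pair] += 1
        n = len(self.events)
        return {p for p, c in counts.items() if n and c / n >= min_support}
```

In this sketch a pattern disappears as soon as its earliest element leaves the window, which mirrors the deletion of obsolete data and patterns described above. It recomputes all patterns on every query, so it has none of the efficiency the abstract attributes to the tree-based approach.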
Sequential Pattern, data mining, progressive databases
Chi-Yao Tseng, Jian-Chih Ou, Ming-Syan Chen, "A General Model for Sequential Pattern Mining with a Progressive Database", IEEE Transactions on Knowledge & Data Engineering, vol. 20, no. 9, pp. 1153-1167, September 2008, doi:10.1109/TKDE.2008.37