Database Engineering and Applications Symposium, International (2006)
Dec. 11, 2006 to Dec. 14, 2006
DOI: http://doi.ieeecomputersociety.org/10.1109/IDEAS.2006.34
Yitong Wang, Fudan University
Masaru Kitsuregawa, University of Tokyo
Zhenglu Yang, University of Tokyo
Sequential pattern mining is important because it is the basis of many applications. Yet mining efficiently is difficult because of an inherent characteristic of the problem: the large size of the dataset. Although there has been a great deal of effort on sequential pattern mining in recent years, its performance is still far from satisfactory. In this paper, we propose a new algorithm called PAssed Item Deduced sequential pattern mining (abbreviated as PAID), which efficiently discovers all frequent sequential patterns in a large database. The main difference between our strategy and existing work is that other algorithms accumulate candidate support from scratch in each iteration, whereas PAID reuses the temporary results (support values) of k-length frequent patterns when discovering (k+1)-length patterns, which greatly reduces the search space. Our experimental results and performance studies show that PAID outperforms previous work by meaningful margins on large datasets.
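The abstract does not spell out PAID's deduction rule, so the sketch below is not PAID. It is a generic, hypothetical Python illustration of the broader idea the abstract contrasts: counting candidate support from scratch versus reusing level-k results. Here each frequent k-pattern keeps, for every sequence that supports it, the end position of its leftmost match, and each (k+1)-candidate is counted only inside those sequences instead of rescanning the whole database. Assumptions: every sequence element is a single item (no itemsets), support is the number of sequences containing the pattern as a (not necessarily contiguous) subsequence, and the names `contains`, `support_from_scratch`, and `mine` are illustrative only.

```python
from collections import defaultdict

def contains(seq, pattern):
    """True if pattern is a (not necessarily contiguous) subsequence of seq."""
    it = iter(seq)  # 'in' consumes the iterator, so matches stay in order
    return all(item in it for item in pattern)

def support_from_scratch(db, pattern):
    """Baseline: rescan every sequence for this one candidate."""
    return sum(contains(seq, pattern) for seq in db)

def mine(db, min_sup):
    """Level-wise mining that reuses per-pattern occurrence info from level k.

    Hypothetical sketch (not PAID): occ maps sequence id -> end position of
    the leftmost match of the pattern in that sequence.
    """
    # Level 1: earliest position of each item in each sequence.
    first = defaultdict(dict)
    for sid, seq in enumerate(db):
        for pos, item in enumerate(seq):
            first[item].setdefault(sid, pos)
    level = {(item,): occ for item, occ in first.items() if len(occ) >= min_sup}
    # Only frequent items can appear in frequent longer patterns.
    frequent_items = sorted(item for (item,) in level)
    result = {p: len(occ) for p, occ in level.items()}

    while level:
        next_level = {}
        for pattern, occ in level.items():
            for x in frequent_items:
                new_occ = {}
                # Only sequences that already support `pattern` are examined.
                for sid, end in occ.items():
                    seq = db[sid]
                    # First occurrence of x strictly after the leftmost match end.
                    for pos in range(end + 1, len(seq)):
                        if seq[pos] == x:
                            new_occ[sid] = pos
                            break
                if len(new_occ) >= min_sup:
                    next_level[pattern + (x,)] = new_occ
        result.update({p: len(o) for p, o in next_level.items()})
        level = next_level
    return result

if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["b", "c", "a"], ["a", "b"]]
    for pattern, sup in sorted(mine(db, min_sup=2).items()):
        print("".join(pattern), sup)
```

Keeping the leftmost match end is enough for correctness: if any match of the extended pattern exists in a sequence, one also exists after the leftmost end of the prefix, so no supporting sequence is lost while the rest of the database is skipped.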
Yitong Wang, Masaru Kitsuregawa, Zhenglu Yang, "PAID: Mining Sequential Patterns by Passed Item Deduction in Large Databases," Database Engineering and Applications Symposium, International, pp. 113-120, 2006, doi:10.1109/IDEAS.2006.34