Proceedings of the 36th Annual Hawaii International Conference on System Sciences (2003)
Big Island, Hawaii
Jan. 6, 2003 to Jan. 9, 2003
Ming-Yen Lin, National Chiao Tung University
Suh-Yin Lee, National Chiao Tung University
The discovery of sequential patterns, which extends beyond the frequent item-set finding of association rule mining, is a challenging task because of its complexity. Essentially, a user specifies a minimum support threshold with respect to the database to find the desired patterns. The mining process is usually iterative, since the user must try various thresholds to obtain a satisfactory result, so the time-consuming process has to be repeated several times. However, current approaches are inadequate for such a process because of the long execution time required for each trial. To minimize the total execution time and the response time for each trial, we propose a knowledge base assisted algorithm for interactive sequence discovery, called KISP. KISP constructs a knowledge base that accumulates the pattern information from each individual mining run, eliminates a considerable number of potential patterns to facilitate efficient support counting, and speeds up the whole process. In addition, we further optimize the algorithm by directly generating the reduced candidate sets and concurrently counting candidates of variable size. For some queries, KISP may eliminate database access completely. The conducted experiments show that KISP outperforms GSP, a state-of-the-art sequence mining algorithm, by several orders of magnitude for interactive sequence discovery.
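The general idea of reusing earlier mining results across threshold trials can be illustrated with a minimal sketch. This is not the authors' KISP algorithm: it assumes sequences of single items rather than item-sets, an absolute support count of at least 1 as the threshold, and a simple level-wise miner. The names `KnowledgeBase`, `query`, and `scans` are hypothetical and chosen only for this illustration.

```python
def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


class KnowledgeBase:
    """Hypothetical cache of support counts shared across threshold trials."""

    def __init__(self, database):
        self.database = database
        self.counts = {}          # pattern (tuple) -> support count
        self.mined_minsup = None  # lowest threshold mined so far
        self.scans = 0            # number of patterns counted against the database

    def _count(self, pattern):
        # Only touch the database for patterns the knowledge base has not seen.
        if pattern not in self.counts:
            self.scans += 1
            self.counts[pattern] = sum(
                is_subsequence(pattern, s) for s in self.database
            )
        return self.counts[pattern]

    def _mine(self, minsup):
        # Simplified level-wise mining: join frequent k-patterns p and q
        # when p without its first item equals q without its last item.
        items = sorted({x for s in self.database for x in s})
        level = [(x,) for x in items if self._count((x,)) >= minsup]
        while level:
            candidates = {
                p + (q[-1],) for p in level for q in level if p[1:] == q[:-1]
            }
            level = sorted(c for c in candidates if self._count(c) >= minsup)

    def query(self, minsup):
        """Return {pattern: count} for all patterns with count >= minsup (minsup >= 1)."""
        if self.mined_minsup is None or minsup < self.mined_minsup:
            self._mine(minsup)
            self.mined_minsup = minsup
        # A threshold at or above the lowest one already mined is answered
        # entirely from the cached counts.
        return {p: c for p, c in self.counts.items() if c >= minsup}


db = [("a", "b", "c"), ("a", "c"), ("a", "b"), ("b", "c")]
kb = KnowledgeBase(db)
first = kb.query(2)        # mines the database
scans_after_first = kb.scans
second = kb.query(3)       # higher threshold: answered from the cache
scans_after_second = kb.scans
```

In this sketch, the second query raises the threshold and so performs no further counting against the database, which mirrors the abstract's remark that some queries need no database access. A lower threshold triggers another mining pass, but counts already in the cache are not recomputed.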
M. Lin and S. Lee, "Improving the Efficiency of Interactive Sequential Pattern Mining by Incremental Pattern Discovery," Proceedings of the 36th Annual Hawaii International Conference on System Sciences (HICSS), Big Island, Hawaii, 2003, p. 68b.