Issue No.07 - July (2010 vol.22)
pp: 913-928
Jinlin Chen, Queens College, City University of New York, Flushing
ABSTRACT
Traditional pattern growth-based approaches for sequential pattern mining recursively derive length-(k+1) patterns from the projected databases of length-k patterns. At each level of recursion, they unidirectionally grow the length of detected patterns by one along the suffix, so finding a length-k pattern takes k levels of recursion. In this paper, a novel data structure, the UpDown Directed Acyclic Graph (UDDAG), is introduced for efficient sequential pattern mining. UDDAG allows bidirectional pattern growth along both ends of detected patterns. Thus, a length-k pattern can be detected in $\lfloor \log_{2} k + 1 \rfloor$ levels of recursion at best, which results in fewer levels of recursion and faster pattern growth. When minSup is so large that the average pattern length is close to 1, UDDAG and PrefixSpan perform similarly because the problem degrades into a frequent item counting problem. However, UDDAG scales up much better: it often outperforms PrefixSpan by almost one order of magnitude in scalability tests. UDDAG is also considerably faster than Spade and LapinSpam. Except in extreme cases, UDDAG uses memory comparable to that of PrefixSpan and less memory than Spade and LapinSpam. Additionally, the bidirectional growth of UDDAG enables its extension toward applications that involve searching in large spaces.
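To make the recursion-depth claim concrete, the short Python sketch below simply evaluates the two level counts stated in the abstract: k levels for unidirectional suffix growth, and floor(log2(k) + 1) levels in the best case for bidirectional growth. It is only arithmetic on those formulas, not an implementation of UDDAG or PrefixSpan, and the function names are hypothetical.

```python
def unidirectional_levels(k: int) -> int:
    """Levels of recursion to reach a length-k pattern when each level
    grows the pattern by one along the suffix (as stated in the abstract)."""
    if k < 1:
        raise ValueError("pattern length must be at least 1")
    return k


def bidirectional_levels_best_case(k: int) -> int:
    """Best-case levels of recursion quoted in the abstract:
    floor(log2(k) + 1). For an integer k >= 1 this equals k.bit_length(),
    which avoids floating-point rounding."""
    if k < 1:
        raise ValueError("pattern length must be at least 1")
    return k.bit_length()


if __name__ == "__main__":
    # Compare the two counts for a few illustrative pattern lengths.
    for k in (1, 2, 3, 7, 8, 100):
        print(k, unidirectional_levels(k), bidirectional_levels_best_case(k))
```

For example, a length-8 pattern needs 8 levels under unidirectional growth but at best 4 under the bidirectional bound, and a length-100 pattern needs 100 versus at best 7.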
INDEX TERMS
Data mining algorithm, directed acyclic graph, performance analysis, sequential pattern, transaction database.
CITATION
Jinlin Chen, "An UpDown Directed Acyclic Graph Approach for Sequential Pattern Mining", IEEE Transactions on Knowledge & Data Engineering, vol.22, no. 7, pp. 913-928, July 2010, doi:10.1109/TKDE.2009.135