Discovering Frequent Closed Partial Orders from Strings
November 2006 (vol. 18 no. 11)
pp. 1467-1481
Haixun Wang, IEEE Computer Society
Jianyong Wang, IEEE Computer Society
Mining knowledge about ordering from sequence data is an important problem with many applications, such as bioinformatics, Web mining, network management, and intrusion detection. For example, if many customers follow a partial order in their purchases of a series of products, the partial order can be used to predict other related customers' future purchases and develop marketing campaigns. Moreover, some biological sequences (e.g., microarray data) can be clustered based on the partial orders shared by the sequences. Given a set of items, a total order of a subset of items can be represented as a string. A string database is a multiset of strings. In this paper, we identify a novel problem of mining frequent closed partial orders from strings. Frequent closed partial orders capture the nonredundant and interesting ordering information from string databases. Importantly, mining frequent closed partial orders can discover meaningful knowledge that cannot be disclosed by previous data mining techniques. However, the problem of mining frequent closed partial orders is challenging. To tackle the problem, we develop Frecpo (for Frequent closed partial order), a practically efficient algorithm for mining the complete set of frequent closed partial orders from large string databases. Several interesting pruning techniques are devised to speed up the search. We report an extensive performance study on both real data sets and synthetic data sets to illustrate the effectiveness and the efficiency of our approach.
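
The notions in the abstract can be made concrete with a small sketch. It is not the Frecpo algorithm, whose details the abstract does not give. It assumes the usual definitions: a string supports a partial order if every ordered pair in the partial order appears in the same relative order in the string, and a partial order is closed if no strictly larger partial order has the same support. The function names (`order_pairs`, `supports`, `support`, `closure`) and the toy database are hypothetical, and each item is assumed to occur at most once per string.

```python
from itertools import combinations


def order_pairs(s):
    """Return every (a, b) with a before b in s.

    This is the transitive closure of the total order that the string
    represents. Assumes each item occurs at most once in s.
    """
    return set(combinations(s, 2))


def supports(s, partial_order):
    """True if string s respects every ordered pair in partial_order."""
    return partial_order <= order_pairs(s)


def support(db, partial_order):
    """Count the strings in the multiset db that support partial_order."""
    return sum(1 for s in db if supports(s, partial_order))


def closure(db, partial_order):
    """Largest set of ordered pairs shared by all supporting strings.

    Under the assumed definition, partial_order is closed exactly when
    it equals its own closure.
    """
    supporting = [order_pairs(s) for s in db if supports(s, partial_order)]
    if not supporting:
        return set(partial_order)
    return set.intersection(*supporting)


# A toy string database; it is a multiset, so "abcd" may appear twice.
db = ["abcd", "acbd", "abdc", "abcd"]

a_before_b = {("a", "b")}
print(support(db, a_before_b))   # number of strings with a before b
print(closure(db, a_before_b))   # pairs implied by those same strings
```

In this toy database every string places `a` before `b`, but the same strings also agree on `a` before `c`, `a` before `d`, and `b` before `d`. The single-pair order is therefore not closed; its closure carries the same support with strictly more ordering information, which is the kind of redundancy that closed patterns are meant to eliminate.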

Index Terms:
Frequent patterns, closed patterns, partial orders, strings, data mining.
Citation:
Jian Pei, Haixun Wang, Jian Liu, Ke Wang, Jianyong Wang, Philip S. Yu, "Discovering Frequent Closed Partial Orders from Strings," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 11, pp. 1467-1481, Nov. 2006, doi:10.1109/TKDE.2006.172