Issue No.02 - Feb. (2013 vol.25)

pp: 402-418

Tias Guns , Katholieke Universiteit Leuven, Leuven

Siegfried Nijssen , Katholieke Universiteit Leuven, Leuven

Luc De Raedt , Katholieke Universiteit Leuven, Leuven

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2011.204

ABSTRACT

We introduce the problem of k-pattern set mining, concerned with finding a set of k related patterns under constraints. This contrasts with regular pattern mining, where one searches for many individual patterns. The k-pattern set mining problem is a very general problem that can be instantiated to a wide variety of well-known mining tasks including concept-learning, rule-learning, redescription mining, conceptual clustering and tiling. To this end, we formulate a large number of constraints for use in k-pattern set mining, both at the local level, that is, on individual patterns, and at the global level, that is, on the overall pattern set. Building general solvers for the pattern set mining problem remains a challenge. Here, we investigate to what extent constraint programming (CP) can be used as a general solution strategy. We present a mapping of pattern set constraints to constraints currently available in CP. This allows us to investigate a large number of settings within a unified framework and to gain insight into the possibilities and limitations of these solvers. This is important as it allows us to create guidelines on how to model new problems successfully and how to model existing problems more efficiently. It also opens up the way for other solver technologies.
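To make the distinction between local and global constraints concrete, the following is a minimal sketch of one instantiation mentioned in the abstract, tiling with k patterns. It is not the paper's constraint programming encoding: it uses plain exhaustive enumeration, and the toy database and the names `cover`, `local_patterns`, `tiled_area`, and `best_k_pattern_set` are assumptions made purely for illustration.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
TRANSACTIONS = [
    {"a", "b"},
    {"a", "b", "c"},
    {"c", "d"},
    {"c", "d"},
    {"a", "d"},
]

def cover(itemset, transactions):
    """Indices of the transactions that contain every item of the itemset."""
    return frozenset(i for i, t in enumerate(transactions) if itemset <= t)

def local_patterns(transactions, min_support):
    """Local constraint: keep itemsets covering at least min_support transactions."""
    items = sorted(set().union(*transactions))
    patterns = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            if len(cover(itemset, transactions)) >= min_support:
                patterns.append(itemset)
    return patterns

def tiled_area(pattern_set, transactions):
    """Global measure: number of distinct (transaction, item) cells covered."""
    cells = set()
    for itemset in pattern_set:
        for i in cover(itemset, transactions):
            cells.update((i, item) for item in itemset)
    return len(cells)

def best_k_pattern_set(transactions, k, min_support):
    """Exhaustively pick the k patterns whose joint tiled area is largest."""
    candidates = local_patterns(transactions, min_support)
    return max(
        combinations(candidates, k),
        key=lambda ps: tiled_area(ps, transactions),
    )

best = best_k_pattern_set(TRANSACTIONS, k=2, min_support=2)
area = tiled_area(best, TRANSACTIONS)
```

On this toy database the two selected patterns are {a, b} and {c, d}, which together cover 8 of the 11 item occurrences. Enumerating every candidate set of size k grows combinatorially, which is one reason a general solver with constraint propagation is attractive for this kind of problem.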

INDEX TERMS

Itemsets, Data mining, Optimization, Accuracy, Redundancy, Tiles, Size measurement, Constraint programming, Pattern set mining, Constraints

CITATION

Tias Guns, Siegfried Nijssen, Luc De Raedt, "k-Pattern Set Mining under Constraints",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 25, no. 2, pp. 402-418, Feb. 2013, doi:10.1109/TKDE.2011.204