Specifying Mining Algorithms with Iterative User-Defined Aggregates
October 2004 (vol. 16 no. 10)
pp. 1232-1246
We present a way of exploiting domain knowledge in the design and implementation of data mining algorithms, with special attention to frequent pattern discovery, within a deductive framework. In our framework, domain knowledge is represented by deductive rules, and data mining algorithms are specified by means of iterative user-defined aggregates and implemented by means of user-defined predicates. This choice allows us to exploit the full expressive power of deductive rules without losing performance. Iterative user-defined aggregates have a fixed scheme into which user-defined predicates are plugged. This feature allows the modularization of data mining algorithms, thus providing a way to exploit the appropriate domain knowledge at the right point. As a case study, the paper shows how user-defined aggregates can be used to specify and implement a version of the Apriori algorithm. Performance analyses and comparisons are discussed to show the effectiveness of the approach.
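
The idea of a fixed iterative scheme with pluggable predicates can be illustrated with a small Python sketch. It is not the paper's deductive formulation: the names `iterative_aggregate`, `make_apriori_hooks`, `apriori`, and the `allowed` filter are hypothetical, chosen only for this illustration. The filter stands in, very loosely, for domain knowledge injected at one point of the scheme.

```python
from itertools import combinations


def iterative_aggregate(transactions, init, update, next_state, finish):
    """Fixed scheme: scan the data, collect results, then decide whether to iterate."""
    state = init()
    results = []
    while state is not None:
        for transaction in transactions:
            update(state, transaction)
        results.extend(finish(state))
        state = next_state(state)  # None stops the iteration
    return results


def make_apriori_hooks(min_support, allowed=lambda item: True):
    """Hypothetical plug-ins that turn the scheme into a level-wise Apriori."""

    def init():
        # Level 1: candidates are discovered while scanning.
        return {"k": 1, "counts": {}}

    def update(state, transaction):
        # 'allowed' is an illustrative stand-in for a domain-knowledge constraint.
        items = frozenset(i for i in transaction if allowed(i))
        counts = state["counts"]
        if state["k"] == 1:
            for item in items:
                key = frozenset([item])
                counts[key] = counts.get(key, 0) + 1
        else:
            for candidate in counts:
                if candidate <= items:
                    counts[candidate] += 1

    def finish(state):
        return [(c, n) for c, n in state["counts"].items() if n >= min_support]

    def next_state(state):
        frequent = {c for c, n in state["counts"].items() if n >= min_support}
        k = state["k"] + 1
        candidates = set()
        for a in frequent:
            for b in frequent:
                union = a | b
                # Keep a candidate only if all its (k-1)-subsets are frequent.
                if len(union) == k and all(
                    frozenset(s) in frequent for s in combinations(union, k - 1)
                ):
                    candidates.add(union)
        if not candidates:
            return None
        return {"k": k, "counts": {c: 0 for c in candidates}}

    return init, update, next_state, finish


def apriori(transactions, min_support, allowed=lambda item: True):
    """Return a dict mapping each frequent itemset to its support count."""
    hooks = make_apriori_hooks(min_support, allowed)
    return dict(iterative_aggregate(transactions, *hooks))
```

In this sketch the loop in `iterative_aggregate` never changes; only the four hooks do. Passing a different `allowed` predicate restricts the search to the items of interest without touching the scheme itself, which is the kind of modularity the abstract describes.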


Index Terms:
Data mining, query languages, constraint and logic languages, rule-based databases, user-defined aggregates, association rules.
Citation:
Fosca Giannotti, Giuseppe Manco, Franco Turini, "Specifying Mining Algorithms with Iterative User-Defined Aggregates," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 10, pp. 1232-1246, Oct. 2004, doi:10.1109/TKDE.2004.64