Issue No.04 - April (2009 vol.21)
pp: 493-506
Elena Baralis, Politecnico di Torino, Torino
Tania Cerquitelli, Politecnico di Torino, Torino
Silvia Chiusano, Politecnico di Torino, Torino
ABSTRACT
This paper presents the IMine index, a general and compact structure that provides tight integration of itemset extraction in a relational DBMS. Since no constraint is enforced during the index creation phase, IMine provides a complete representation of the original database. To reduce the I/O cost, data accessed together during the same extraction phase are clustered on the same disk block. The IMine index structure can be efficiently exploited by different itemset extraction algorithms. In particular, IMine data access methods currently support the FP-growth and LCM v.2 algorithms, but they can straightforwardly support the enforcement of various constraint categories. The IMine index has been integrated into the PostgreSQL DBMS and exploits its physical-level access methods. Experiments, run for both sparse and dense data distributions, show the efficiency of the proposed index and its linear scalability even for large datasets. Itemset mining supported by the IMine index shows performance always comparable with, and sometimes better than, state-of-the-art algorithms accessing data on flat files.
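As an illustration of what itemset extraction computes, the following is a minimal brute-force sketch in Python. It is not the IMine index, FP-growth, or LCM v.2, and it does not reflect how the paper stores or accesses data. The function name `frequent_itemsets`, the absolute support threshold `min_support`, and the toy in-memory transaction list are all assumptions introduced here for illustration only.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions.

    Hypothetical helper: enumerates every subset of every transaction, so it is
    only practical for tiny inputs. Real algorithms avoid this enumeration.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # drop duplicate items within a transaction
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Toy data, assumed for illustration.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
]
result = frequent_itemsets(transactions, min_support=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

On this toy input, every single item and every pair meets the threshold, while the triple {a, b, c} occurs in only one transaction and is dropped. The exponential enumeration is exactly the cost that dedicated algorithms and supporting index structures are designed to avoid.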
INDEX TERMS
Data Mining, Itemset Extraction, Indexing
CITATION
Elena Baralis, Tania Cerquitelli, Silvia Chiusano, "IMine: Index Support for Item Set Mining", IEEE Transactions on Knowledge & Data Engineering, vol. 21, no. 4, pp. 493-506, April 2009, doi:10.1109/TKDE.2008.180
REFERENCES
[1] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. 20th Int'l Conf. Very Large Data Bases (VLDB '94), Sept. 1994.
[2] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules between Sets of Items in Large Databases,” Proc. ACM SIGMOD '93, May 1993.
[3] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. ACM SIGMOD, 2000.
[4] H. Mannila, H. Toivonen, and A.I. Verkamo, “Efficient Algorithms for Discovering Association Rules,” Proc. AAAI Workshop Knowledge Discovery in Databases (KDD '94), pp. 181-192, 1994.
[5] A. Savasere, E. Omiecinski, and S.B. Navathe, “An Efficient Algorithm for Mining Association Rules in Large Databases,” Proc. 21st Int'l Conf. Very Large Data Bases (VLDB '95), pp. 432-444, 1995.
[6] H. Toivonen, “Sampling Large Databases for Association Rules,” Proc. 22nd Int'l Conf. Very Large Data Bases (VLDB '96), pp. 134-145, 1996.
[7] M. El-Hajj and O.R. Zaiane, “Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining,” Proc. Ninth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (SIGKDD), 2003.
[8] G. Grahne and J. Zhu, “Mining Frequent Itemsets from Secondary Memory,” Proc. IEEE Int'l Conf. Data Mining (ICDM '04), pp. 91-98, 2004.
[9] G. Ramesh, W. Maniatty, and M. Zaki, “Indexing and Data Access Methods for Database Mining,” Proc. ACM SIGMOD Workshop Data Mining and Knowledge Discovery (DMKD), 2002.
[10] Y.-L. Cheung, “Mining Frequent Itemsets without Support Threshold: With and without Item Constraints,” IEEE Trans. Knowledge and Data Eng., vol. 16, no. 9, pp. 1052-1069, Sept. 2004.
[11] G. Cong and B. Liu, “Speed-Up Iterative Frequent Itemset Mining with Constraint Changes,” Proc. IEEE Int'l Conf. Data Mining (ICDM '02), pp. 107-114, 2002.
[12] C.K.-S. Leung, L.V.S. Lakshmanan, and R.T. Ng, “Exploiting Succinct Constraints Using FP-Trees,” SIGKDD Explorations Newsletter, vol. 4, no. 1, pp. 40-49, 2002.
[13] R. Srikant, Q. Vu, and R. Agrawal, “Mining Association Rules with Item Constraints,” Proc. Third Int'l Conf. Knowledge Discovery and Data Mining (KDD '97), pp. 67-73, 1997.
[14] T. Uno, M. Kiyomi, and H. Arimura, “LCM ver. 2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets,” Proc. IEEE ICDM Workshop Frequent Itemset Mining Implementations (FIMI), 2004.
[15] J. Pei, J. Han, and L.V.S. Lakshmanan, “Pushing Convertible Constraints in Frequent Itemset Mining,” Data Mining and Knowledge Discovery, vol. 8, no. 3, pp. 227-252, 2004.
[16] PostgreSQL, http://www.postgresql.org, 2008.
[17] G. Grahne and J. Zhu, “Efficiently Using Prefix-Trees in Mining Frequent Itemsets,” Proc. IEEE ICDM Workshop Frequent Itemset Mining Implementations (FIMI '03), Nov. 2003.
[18] G. Moerkotte, “Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing,” Proc. 24th Int'l Conf. Very Large Data Bases (VLDB '98), pp. 476-487, 1998.
[19] K. Chakrabarti and S. Mehrotra, “The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces,” Proc. 15th Int'l Conf. Data Eng. (ICDE '99), pp. 440-447, 1999.
[20] A. Pietracaprina and D. Zandolin, “Mining Frequent Itemsets Using Patricia Tries,” Proc. IEEE ICDM Workshop Frequent Itemset Mining Implementations (FIMI), 2003.
[21] R. Bayer and E.M. McCreight, “Organization and Maintenance of Large Ordered Indices,” Acta Informatica, vol. 1, pp. 173-189, 1972.
[22] FIMI, http://fimi.cs.helsinki.fi/, 2008.
[23] R. Agrawal, T. Imielinski, and A. Swami, “Database Mining: A Performance Perspective,” IEEE Trans. Knowledge and Data Eng., vol. 5, no. 6, Dec. 1993.
[24] S. Nestorov and S. Tsur, “Integrating Data Mining with Relational DBMS: A Tightly-Coupled Approach,” Proc. Fourth Workshop Next Generation Information Technologies and Systems (NGITS), 1999.
[25] R. Agrawal and K. Shim, “Developing Tightly-Coupled Data Mining Applications on a Relational Database System,” Proc. Second Int'l Conf. Knowledge Discovery in Databases and Data Mining (KDD), 1996.
[26] S. Sarawagi, S. Thomas, and R. Agrawal, “Integrating Mining with Relational Database Systems: Alternatives and Implications,” Proc. ACM SIGMOD, 1998.
[27] J. Han, Y. Fu, W. Wang, K. Koperski, and O. Zaiane, “DMQL: A Data Mining Query Language for Relational Databases,” Proc. ACM SIGMOD Workshop Data Mining and Knowledge Discovery (DMKD), 1996.
[28] R. Meo, G. Psaila, and S. Ceri, “A New SQL-Like Operator for Mining Association Rules,” Proc. 22nd Int'l Conf. Very Large Data Bases (VLDB), 1996.
[29] M. Botta, J.-F. Boulicaut, C. Masson, and R. Meo, “A Comparison between Query Languages for the Extraction of Association Rules,” Proc. Fourth Int'l Conf. Data Warehousing and Knowledge Discovery (DaWaK), 2002.
[30] S. Chaudhuri, V. Narasayya, and S. Sarawagi, “Efficient Evaluation of Queries with Mining Predicates,” Proc. 18th Int'l Conf. Data Eng. (ICDE), 2002.
[31] M. Zaki, “Scalable Algorithms for Association Mining,” IEEE Trans. Knowledge and Data Eng., vol. 12, no. 3, pp. 372-390, May/June 2000.
[32] B. Lan, B. Ooi, and K.-L. Tan, “Efficient Indexing Structures for Mining Frequent Patterns,” Proc. 18th Int'l Conf. Data Eng. (ICDE), 2002.
[33] E. Baralis, T. Cerquitelli, and S. Chiusano, “Index Support for Frequent Itemset Mining in a Relational DBMS,” Proc. 21st Int'l Conf. Data Eng. (ICDE), 2005.
[34] G. Liu, H. Lu, W. Lou, and J.X. Yu, “On Computing, Storing and Querying Frequent Patterns,” Proc. Ninth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (SIGKDD), 2003.