Issue No.04 - April (2009 vol.21)

pp: 493-506

Elena Baralis, Politecnico di Torino, Torino

Tania Cerquitelli, Politecnico di Torino, Torino

Silvia Chiusano, Politecnico di Torino, Torino

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2008.180

ABSTRACT

This paper presents the IMine index, a general and compact structure that provides tight integration of itemset extraction in a relational DBMS. Since no constraint is enforced during the index creation phase, IMine provides a complete representation of the original database. To reduce the I/O cost, data accessed together during the same extraction phase are clustered on the same disk block. The IMine index structure can be efficiently exploited by different itemset extraction algorithms. In particular, IMine data access methods currently support the FP-growth and LCM v.2 algorithms, but they can straightforwardly support the enforcement of various constraint categories. The IMine index has been integrated into the PostgreSQL DBMS and exploits its physical-level access methods. Experiments, run for both sparse and dense data distributions, show the efficiency of the proposed index and its linear scalability, also for large datasets. Itemset mining supported by the IMine index shows performance always comparable with, and sometimes better than, state-of-the-art algorithms accessing data on flat files.
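
For readers unfamiliar with the task the abstract refers to, the following minimal sketch illustrates what "itemset extraction" under a support threshold means. It is a brute-force enumeration written only for illustration: it is not the IMine index, nor FP-growth or LCM v.2, and the function name `frequent_itemsets`, the parameter `min_support`, and the toy transactions are all assumptions introduced here.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions.

    Hypothetical helper for illustration only: it enumerates every subset of
    every transaction, which is exponential and only suitable for tiny inputs.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order, no duplicates
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    # Keep only the itemsets that meet the support threshold.
    return {s: c for s, c in counts.items() if c >= min_support}


# Toy transactional dataset (assumed data, not from the paper).
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
]
result = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(result.items()):
    print(itemset, count)
```

Algorithms such as FP-growth avoid this exhaustive enumeration by compressing transactions into a prefix-tree and exploring it recursively; according to the abstract, the index persists a complete representation of the database so that such algorithms can read it through the DBMS rather than from a flat file.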

INDEX TERMS

Data Mining, Itemset Extraction, Indexing

CITATION

Elena Baralis, Tania Cerquitelli, Silvia Chiusano, "IMine: Index Support for Item Set Mining",

*IEEE Transactions on Knowledge & Data Engineering*, vol.21, no. 4, pp. 493-506, April 2009, doi:10.1109/TKDE.2008.180