Issue No. 04 - April (2009 vol. 21)
ISSN: 1041-4347
pp: 493-506
Silvia Chiusano , Politecnico di Torino, Torino
Elena Baralis , Politecnico di Torino, Torino
Tania Cerquitelli , Politecnico di Torino, Torino
ABSTRACT
This paper presents the IMine index, a general and compact structure that tightly integrates itemset extraction into a relational DBMS. Since no constraint is enforced during the index creation phase, IMine provides a complete representation of the original database. To reduce I/O cost, data accessed together during the same extraction phase are clustered on the same disk block. The IMine index structure can be efficiently exploited by different itemset extraction algorithms. In particular, IMine data access methods currently support the FP-growth and LCM v.2 algorithms, and they can be straightforwardly extended to enforce various categories of constraints. The IMine index has been integrated into the PostgreSQL DBMS and exploits its physical-level access methods. Experiments on both sparse and dense data distributions show the efficiency of the proposed index and its linear scalability, even for large datasets. Itemset mining supported by the IMine index consistently performs comparably to, and sometimes better than, state-of-the-art algorithms that access data in flat files.
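As a rough illustration of the itemset extraction task that the index is meant to support, the sketch below counts the support of every itemset in a tiny in-memory dataset and keeps those that meet a minimum support threshold. It is a brute-force enumeration, not the IMine structure and not FP-growth or LCM v.2; the function name `frequent_itemsets`, the `min_support` parameter, and the sample transactions are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for itemsets found in at least
    min_support transactions.

    Brute-force illustration only: it enumerates every subset of every
    transaction, so it is exponential in transaction length.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicate items
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical toy dataset: each inner list is one transaction.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
]

result = frequent_itemsets(transactions, min_support=2)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Practical algorithms such as FP-growth avoid this exhaustive enumeration, which is why a disk-resident index that clusters co-accessed data can matter for large datasets.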
INDEX TERMS
Data Mining, Itemset Extraction, Indexing
CITATION
Silvia Chiusano, Elena Baralis, Tania Cerquitelli, "IMine: Index Support for Item Set Mining", IEEE Transactions on Knowledge & Data Engineering, vol. 21, no. 4, pp. 493-506, April 2009, doi:10.1109/TKDE.2008.180