Issue No. 01 - January (2006 vol. 18)
DOI: http://doi.ieeecomputersociety.org/10.1109/TKDE.2006.10
This paper presents a new scalable algorithm for discovering closed frequent itemsets, a lossless and condensed representation of all the frequent itemsets that can be mined from a transactional database. Our algorithm exploits a divide-and-conquer approach and a bitwise vertical representation of the database. It also adopts a particular strategy for visiting and partitioning the search space, based on an original theoretical framework that formalizes the problem of closed itemset mining in detail. The algorithm adopts several optimizations aimed at saving both space and time when computing itemset closures and their supports. In particular, one of the main problems in this type of algorithm is the multiple generation of the same closed itemset. To address it, we propose a new, effective, and memory-efficient pruning technique that, unlike previous proposals, does not require the whole set of closed patterns mined so far to be kept in main memory. This technique also allows each visited partition of the search space to be mined independently, in any order, and thus also in parallel. Tests conducted on many publicly available data sets show that our algorithm is scalable and outperforms state-of-the-art algorithms such as Closet+ and FP-Close, in some cases by more than one order of magnitude. More importantly, the performance improvements become increasingly significant as the support threshold decreases.
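To make the abstract's ingredients concrete, here is a minimal Python sketch. It assumes transactions are sets of string items and that items are visited in a fixed lexicographic order; both choices are ours and are not taken from the paper. Each item's tid-list is stored as a Python integer used as a bitset (the "bitwise vertical representation"). Support is a popcount, and closures are computed with bitwise AND and subset tests. Duplicate generation is avoided by checking a candidate's tid-list only against the tid-lists of items already explored at the current level (a hypothetical `pre` list), so no previously mined closed itemsets are stored. The names `build_vertical`, `mine_closed`, and `_extend` are hypothetical. This is an illustration of the general idea, not the authors' implementation.

```python
from typing import Dict, FrozenSet, List, Set


def build_vertical(transactions: List[Set[str]]) -> Dict[str, int]:
    """Vertical bitwise layout: bit t of tids[item] is set iff transaction t contains item."""
    tids: Dict[str, int] = {}
    for t, items in enumerate(transactions):
        for item in items:
            tids[item] = tids.get(item, 0) | (1 << t)
    return tids


def support(bits: int) -> int:
    # Support of an itemset = number of set bits in its tid-list.
    return bin(bits).count("1")


def is_subset(a: int, b: int) -> bool:
    # True if every transaction in tid-list a also appears in tid-list b.
    return a & b == a


def _extend(closed: List[str], closed_tids: int, pre: List[str], post: List[str],
            tids: Dict[str, int], minsup: int,
            result: Dict[FrozenSet[str], int]) -> None:
    pre = list(pre)  # local copy: items already explored at this level
    for k, i in enumerate(post):
        gen_tids = closed_tids & tids[i]  # tid-list of generator closed + {i}
        # Duplicate check (assumed scheme): if an already-explored item j covers the
        # generator's tid-list, its closure contains j and was produced in an earlier branch.
        if support(gen_tids) >= minsup and not any(is_subset(gen_tids, tids[j]) for j in pre):
            new_closed = closed + [i]
            new_post: List[str] = []
            for j in post[k + 1:]:
                if is_subset(gen_tids, tids[j]):
                    new_closed.append(j)  # j occurs in every supporting transaction
                else:
                    new_post.append(j)    # j remains a candidate extension
            result[frozenset(new_closed)] = support(gen_tids)
            # Each call depends only on its arguments, so partitions could run independently.
            _extend(new_closed, gen_tids, pre, new_post, tids, minsup, result)
        pre.append(i)


def mine_closed(transactions: List[Set[str]], minsup: int) -> Dict[FrozenSet[str], int]:
    """Return every non-empty closed itemset with support >= minsup, mapped to its support."""
    tids = build_vertical(transactions)
    all_tids = (1 << len(transactions)) - 1
    frequent = sorted(i for i, b in tids.items() if support(b) >= minsup)
    result: Dict[FrozenSet[str], int] = {}
    # Closure of the empty itemset: the items present in every transaction.
    root = [i for i in frequent if tids[i] == all_tids]
    post = [i for i in frequent if tids[i] != all_tids]
    if root:
        result[frozenset(root)] = len(transactions)
    _extend(root, all_tids, [], post, tids, minsup, result)
    return result
```

For example, `mine_closed([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}], 2)` yields `{a}: 3`, `{a, b}: 2`, `{a, c}: 2`, and `{b}: 3`. The candidate `{c}` is pruned because its tid-list is contained in that of the already-explored item `a`, so its closure `{a, c}` is not generated a second time. The check needs only the tid-lists of the `pre` items, not a global store of mined patterns.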
Index Terms: Data mining, association rules, frequent itemsets, condensed representations, closed itemsets, high-performance algorithms.
Raffaele Perego, Claudio Lucchese, Salvatore Orlando, "Fast and Memory Efficient Mining of Frequent Closed Itemsets," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 1, pp. 21-36, January 2006, doi:10.1109/TKDE.2006.10