2015 IEEE 31st International Conference on Data Engineering (ICDE) (2015)
Seoul, South Korea
April 13, 2015 to April 17, 2015
ISBN: 978-1-4799-7964-6
pp: 1071-1082
Gregory Buehrer, Microsoft, USA
David Fuhry, The Ohio State University, USA
Srinivasan Parthasarathy, The Ohio State University, USA
Extracting interesting patterns from large data stores efficiently is a challenging problem in many domains. In the data mining literature, pattern frequency has often been touted as a proxy for interestingness and has been leveraged as a pruning criterion to realize scalable solutions. However, while many frequent pattern algorithms exist in the literature, all scale exponentially in the worst case, restricting their utility on very large data sets. Furthermore, as we theoretically argue in this article, the problem is very hard to approximate within a reasonable factor with a polynomial-time algorithm. As a counterpoint to this theoretical result, we present a practical algorithm called Localized Approximate Miner (LAM) that scales linearithmically with the input data. Instead of fully exploring the top of the search lattice to a user-defined point, as traditional mining algorithms do, we efficiently explore different parts of the complete lattice. The key to this efficient exploration is the reliance on min-wise independent permutations to collect the data into highly similar subsets of a partition. LAM is straightforward to implement and scales to very large data sets. We illustrate its utility on a range of data sets, and demonstrate that the algorithm finds more patterns of higher utility in much less time than several state-of-the-art algorithms. Moreover, we realize a natural multi-level parallelization of LAM that further reduces runtimes by up to 193-fold when leveraging 256 CMP cores spanning 32 machines.
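To make the role of min-wise hashing concrete, the following is a minimal Python sketch of grouping transactions by a min-hash signature. It is not the LAM algorithm itself, whose details the abstract does not give. The function names, the parameter `num_hashes`, the choice of prime, and the use of linear hash functions `(a*x + b) mod p` as a stand-in for min-wise independent permutations are all assumptions made for illustration. Items are assumed to be non-negative integers smaller than the prime, and transactions are assumed to be non-empty.

```python
import random

PRIME = 2_147_483_647  # assumed modulus; item ids must be in [0, PRIME)

def minhash_signature(items, hash_params):
    """Return one minimum hash value per (a, b) pair for a non-empty item set."""
    return tuple(min((a * x + b) % PRIME for x in items) for a, b in hash_params)

def group_by_signature(transactions, num_hashes=2, seed=0):
    """Sort transactions by min-hash signature and group equal signatures.

    Sets with high Jaccard similarity are likely to share a signature, so
    each group is a candidate subset in which to look for shared patterns.
    The dominant cost is the sort, which is O(n log n) in the number of
    transactions for a fixed number of hash functions.
    """
    rng = random.Random(seed)
    hash_params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
                   for _ in range(num_hashes)]
    keyed = sorted((minhash_signature(t, hash_params), i)
                   for i, t in enumerate(transactions))
    groups = {}
    for signature, index in keyed:
        groups.setdefault(signature, []).append(index)
    return list(groups.values())

if __name__ == "__main__":
    data = [{1, 2, 3}, {1, 2, 3}, {7, 8, 9}]
    print(group_by_signature(data))
```

Identical transactions always receive the same signature and so fall into the same group. For two different transactions, the probability that a single ideal min-hash value agrees equals their Jaccard similarity, so using more hash functions makes the groups stricter.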
Integrated circuits, Itemsets
Gregory Buehrer, Roberto L. de Oliveira, David Fuhry, Srinivasan Parthasarathy, "Towards a parameter-free and parallel itemset mining algorithm in linearithmic time", 2015 IEEE 31st International Conference on Data Engineering (ICDE), pp. 1071-1082, 2015, doi:10.1109/ICDE.2015.7113357