Issue No.10 - Oct. (2011 vol.22)
pp: 1632-1640
Bora Uçar, LIP, ENS Lyon, Lyon
Cevdet Aykanat, Bilkent University, Ankara
ABSTRACT
We introduce a transaction database distribution scheme that divides the frequent item set mining task in a top-down fashion. Our method operates on a graph where vertices correspond to frequent items and edges correspond to frequent item sets of size two. We show that partitioning this graph by a vertex separator is sufficient to decide a distribution of the items such that the subdatabases determined by the item distribution can be mined independently. This distribution entails an amount of data replication, which may be reduced by setting appropriate weights to vertices. The data distribution scheme is used in the design of two new parallel frequent item set mining algorithms. Both algorithms replicate the items that correspond to the separator. NoClique replicates the work induced by the separator and NoClique2 computes the same work collectively. Computational load balancing and minimization of redundant or collective work may be achieved by assigning appropriate load estimates to vertices. The experiments show favorable speedups on a system with a small-to-medium number of processors for synthetic and real-world databases.
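The distribution idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy database, and the hand-picked separator are all assumptions made for illustration, and the vertex weights and load estimates mentioned above are not modeled. The sketch builds the graph of frequent items and frequent pairs, checks that a given vertex set separates two parts, and forms the two subdatabases with the separator items replicated in both.

```python
from collections import Counter
from itertools import combinations


def build_item_graph(transactions, min_support):
    """Vertices are frequent items; edges are frequent item sets of size two."""
    item_counts = Counter(i for t in transactions for i in set(t))
    vertices = {i for i, c in item_counts.items() if c >= min_support}
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t) & vertices)
        pair_counts.update(combinations(items, 2))
    edges = {p for p, c in pair_counts.items() if c >= min_support}
    return vertices, edges


def is_vertex_separator(edges, part_a, part_b):
    """True if no edge joins part_a directly to part_b."""
    return not any(
        (u in part_a and v in part_b) or (u in part_b and v in part_a)
        for u, v in edges
    )


def project(transactions, items):
    """Restrict each transaction to the given items, dropping empty results."""
    projected = [sorted(set(t) & items) for t in transactions]
    return [t for t in projected if t]


# Toy database (hypothetical), mined with an absolute support threshold of 2.
transactions = [
    ["a", "b", "s"],
    ["a", "b", "s"],
    ["s", "c", "d"],
    ["s", "c", "d"],
    ["a", "d"],
]
vertices, edges = build_item_graph(transactions, min_support=2)

# Hand-picked partition (assumption): {s} separates {a, b} from {c, d}.
separator, part_a, part_b = {"s"}, {"a", "b"}, {"c", "d"}
separates = is_vertex_separator(edges, part_a, part_b)

# Each subdatabase keeps its own items plus the replicated separator items.
sub_a = project(transactions, part_a | separator)
sub_b = project(transactions, part_b | separator)
```

Every frequent item set forms a clique in this graph, so once the separator is removed no frequent item set can contain items from both parts. Each one therefore lies entirely within one part plus the separator, which is why the two subdatabases can be mined independently. Item sets made up only of separator items appear in both subdatabases; this is the work the abstract describes as either replicated or computed collectively.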
INDEX TERMS
Parallel data mining, frequent item set mining, mining methods and algorithms, selective data replication, graph partitioning by vertex separator.
CITATION
Bora Uçar, Cevdet Aykanat, "Parallel Frequent Item Set Mining with Selective Item Replication", IEEE Transactions on Parallel & Distributed Systems, vol. 22, no. 10, pp. 1632-1640, Oct. 2011, doi:10.1109/TPDS.2011.32