Scalable Parallel Data Mining for Association Rules
May/June 2000 (vol. 12 no. 3)
pp. 337-352

Abstract—In this paper, we propose two new parallel formulations of the Apriori algorithm that is used for computing association rules. These new formulations, IDD and HD, address the shortcomings of two previously proposed parallel formulations CD and DD. Unlike the CD algorithm, the IDD algorithm partitions the candidate set intelligently among processors to efficiently parallelize the step of building the hash tree. The IDD algorithm also eliminates the redundant work inherent in DD, and requires substantially smaller communication overhead than DD. But IDD suffers from the added cost due to communication of transactions among processors. HD is a hybrid algorithm that combines the advantages of CD and DD. Experimental results on a 128-processor Cray T3E show that HD scales just as well as the CD algorithm with respect to the number of transactions, and scales as well as IDD with respect to increasing candidate set size.
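The contrast the abstract draws between replicating the candidate set (CD) and partitioning it (the direction IDD takes) can be illustrated with a small single-process simulation. This is a minimal sketch and not the authors' implementation. The function names, the round-robin split of transactions, and the choice to group candidates by their first item with a greedy bin-packing step are assumptions made for illustration, since the abstract only says the candidate set is partitioned "intelligently". A plain scan stands in for the hash tree, and summing counters stands in for inter-processor communication.

```python
from collections import Counter


def count_support(candidates, transactions):
    """Count how many transactions contain each candidate itemset."""
    counts = Counter({c: 0 for c in candidates})
    for t in transactions:
        items = set(t)
        for c in candidates:
            if items.issuperset(c):
                counts[c] += 1
    return counts


def count_distribution(candidates, transactions, num_procs):
    """CD-style: every simulated processor holds ALL candidates and scans
    only its own slice of the transactions; the local counts are then
    summed (a stand-in for a global reduction)."""
    total = Counter({c: 0 for c in candidates})
    for p in range(num_procs):
        local_transactions = transactions[p::num_procs]  # assumed round-robin split
        total.update(count_support(candidates, local_transactions))
    return total


def partition_by_first_item(candidates, num_procs):
    """Assumed partitioning heuristic: group candidates by first item, then
    give each group (largest first) to the least-loaded processor."""
    groups = {}
    for c in candidates:
        groups.setdefault(c[0], []).append(c)
    parts = [[] for _ in range(num_procs)]
    for _, group in sorted(groups.items(), key=lambda kv: -len(kv[1])):
        min(parts, key=len).extend(group)
    return parts


def candidate_partitioned(candidates, transactions, num_procs):
    """Partitioned-candidate style: each simulated processor holds only its
    share of the candidates but must see every transaction."""
    total = Counter()
    for part in partition_by_first_item(candidates, num_procs):
        total.update(count_support(part, transactions))
    return total


# Hypothetical toy data: itemsets are sorted tuples of item names.
transactions = [
    ("a", "b", "c"), ("a", "c"), ("b", "c"),
    ("a", "b"), ("a", "b", "c"), ("c", "d"),
]
candidates = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

if __name__ == "__main__":
    print(dict(count_distribution(candidates, transactions, 2)))
    print(dict(candidate_partitioned(candidates, transactions, 2)))
```

Both strategies return the same support counts. The difference lies in what each simulated processor must hold and see: under the CD-style scheme the full candidate set is replicated on every processor, while under the partitioned scheme every processor needs access to every transaction, which corresponds to the transaction-communication cost the abstract attributes to IDD.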

Index Terms:
Data mining, parallel processing, association rules, load balance, scalability.
Citation:
Eui-Hong (Sam) Han, George Karypis, Vipin Kumar, "Scalable Parallel Data Mining for Association Rules," IEEE Transactions on Knowledge and Data Engineering, vol. 12, no. 3, pp. 337-352, May-June 2000, doi:10.1109/69.846289