A Super-Programming Approach for Mining Association Rules in Parallel on PC Clusters
September 2004 (vol. 15 no. 9)
pp. 783-794

Abstract: PC clusters have become popular in parallel processing. Because they do not involve specialized interprocessor networks, the latency of data communications is rather long. The programming models for PC clusters therefore often differ from those for parallel machines or supercomputers containing sophisticated interprocessor communication networks. For PC clusters, load balancing among the nodes becomes a more critical issue in attempts to yield high performance. We introduce a new model for program development on PC clusters, namely, the Super-Programming Model (SPM). The workload is modeled as a collection of Super-Instructions (SIs). We propose that a set of SIs be designed for each application domain. They should constitute an orthogonal set of frequently used high-level operations in the corresponding application domain. Each SI should normally be implemented as a high-level language routine that can execute on any PC. Application programs are modeled as Super-Programs (SPs), which are coded using SIs. SIs are dynamically assigned to available PCs at runtime. Because the granularity of SIs is known, an upper bound on their execution time can be estimated statically, before execution. Therefore, dynamic load balancing becomes an easier task. Our motivation is to support dynamic load balancing and code porting, especially for applications with diverse sets of inputs, such as data mining. Here, we apply SPM to the implementation of an Apriori-like algorithm for mining association rules. Our experiments show that the average idle time per node is kept very low.
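The idea of assigning bounded-size Super-Instructions to whichever node is free can be illustrated with a small sketch. It is not the authors' implementation: threads stand in for PCs, a shared queue stands in for the runtime dispatcher, and the names `count_support_si`, `run_super_program`, `frequent_itemsets`, `batch_size`, and `n_workers` are all hypothetical, chosen only for this illustration. The mining loop is a plain Apriori-style level-wise search.

```python
from collections import Counter
from itertools import combinations
from queue import Empty, Queue
from threading import Lock, Thread


def count_support_si(transactions, candidates):
    """Hypothetical super-instruction: count support for one bounded batch.

    Its cost is at most len(transactions) * len(candidates) subset tests,
    so an upper bound on its running time is known before it starts.
    """
    counts = Counter()
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def run_super_program(transactions, candidates, batch_size=2, n_workers=3):
    """Split the work into SIs; idle workers pull the next SI from a queue."""
    pending = Queue()
    for i in range(0, len(candidates), batch_size):
        pending.put(candidates[i:i + batch_size])
    total, lock = Counter(), Lock()

    def worker():
        # Each worker (a stand-in for one PC) takes work until none is left.
        while True:
            try:
                batch = pending.get_nowait()
            except Empty:
                return
            partial = count_support_si(transactions, batch)
            with lock:
                total.update(partial)

    threads = [Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return total


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search built on the SI dispatcher above."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent, k = {}, 1
    while candidates:
        counts = run_super_program(transactions, candidates)
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets, then prune by the Apriori property.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))]
    return frequent


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(frequent_itemsets(data, 3).items(),
                                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

Because every batch has a bounded cost, a slow worker simply completes fewer batches rather than holding up the others, which is the load-balancing property the abstract attributes to SIs of known granularity.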


Index Terms:
Mining association rules, cluster computing, load balancing, parallel processing.
Citation:
Dejiang Jin, Sotirios G. Ziavras, "A Super-Programming Approach for Mining Association Rules in Parallel on PC Clusters," IEEE Transactions on Parallel and Distributed Systems, vol. 15, no. 9, pp. 783-794, Sept. 2004, doi:10.1109/TPDS.2004.37