
D.W. Cheung, S.D. Lee, and V. Xiao, “Effect of Data Skewness and Workload Balance in Parallel Data Mining,” IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 3, pp. 498–514, May/June 2002.
@article{10.1109/TKDE.2002.1000339,
  author = {D.W. Cheung and S.D. Lee and V. Xiao},
  title = {Effect of Data Skewness and Workload Balance in Parallel Data Mining},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  volume = {14},
  number = {3},
  issn = {1041-4347},
  year = {2002},
  pages = {498--514},
  doi = {10.1109/TKDE.2002.1000339},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
Abstract—To mine association rules efficiently, we have developed a new parallel mining algorithm, FPM, for a distributed shared-nothing parallel system in which the data are partitioned across the processors. FPM is an enhancement of FDM, an algorithm we previously proposed for distributed mining of association rules. FPM requires fewer rounds of message exchange than FDM and, hence, has a better response time in a parallel environment. Experiments show that the algorithm outperforms CD, a representative parallel algorithm for the same task. The efficiency of FPM is attributed to two powerful candidate-set pruning techniques: distributed pruning and global pruning. Both techniques are sensitive to two characteristics of the data distribution: data skewness and workload balance. Entropy-based metrics are proposed for these two characteristics. The pruning techniques are very effective when both skewness and balance are high. To further increase the efficiency of FPM, we have developed methods for partitioning a database so that the resulting partitions have high balance and high skewness. Experiments show that our partitioning algorithms achieve these aims very well; in particular, their results are consistently better than those of random partitioning. Moreover, the partitioning algorithms incur little overhead. Therefore, by using our partitioning algorithms together with FPM, association rules can be mined from a database efficiently.

Keywords: Association rules, data mining, data skewness, workload balance, parallel mining, partitioning.
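The abstract states that data skewness and workload balance are measured with entropy-based metrics but does not give the formulas. The following is a minimal Python sketch that assumes one plausible normalized-entropy formulation: an itemset's skewness is 1 - H(p)/log n, where p_i is the fraction of its support count that falls in partition i of n partitions, and workload balance is the normalized entropy of the per-partition workload shares, with each itemset weighted by its share of the total support. These formulas, the weighting scheme, and the function names (`normalized_entropy`, `itemset_skewness`, `skewness_and_balance`) are assumptions for illustration, not the paper's exact definitions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalized_entropy(dist: Sequence[float]) -> float:
    """Entropy of a probability distribution divided by log(n); lies in [0, 1]."""
    n = len(dist)
    if n < 2:
        raise ValueError("need at least two partitions")
    h = -sum(p * math.log(p) for p in dist if p > 0)  # 0 * log 0 is treated as 0
    return h / math.log(n)


def itemset_skewness(local_counts: Sequence[int]) -> float:
    """Assumed S(X) = 1 - H(X)/log n, with p_i = share of X's support in partition i.

    Returns 0 when the support is spread evenly, 1 when it sits in one partition.
    """
    total = sum(local_counts)
    if total == 0:
        raise ValueError("itemset has zero support")
    return 1.0 - normalized_entropy([c / total for c in local_counts])


def skewness_and_balance(counts: Dict[str, List[int]]) -> Tuple[float, float]:
    """Aggregate skewness and workload balance over a set of itemsets.

    counts maps a (hypothetical) itemset label to its per-partition support counts.
    Assumption: each itemset is weighted by its share of the total support.
    """
    if not counts:
        raise ValueError("no itemsets given")
    grand_total = sum(sum(c) for c in counts.values())
    if grand_total == 0:
        raise ValueError("all itemsets have zero support")
    n = len(next(iter(counts.values())))
    skewness = 0.0
    workload = [0.0] * n
    for local in counts.values():
        total = sum(local)
        w = total / grand_total  # weight w_X of this itemset
        skewness += w * itemset_skewness(local)
        for i, c in enumerate(local):
            workload[i] += w * (c / total)  # W_i = sum over X of w_X * p_i(X)
    return skewness, normalized_entropy(workload)


if __name__ == "__main__":
    # Each itemset is concentrated in a different partition.
    ts, tb = skewness_and_balance({"AB": [90, 5, 5], "CD": [5, 90, 5], "EF": [5, 5, 90]})
    print(f"skewness={ts:.2f} balance={tb:.2f}")
```

In the usage example, each itemset is concentrated in one partition, so skewness is high (about 0.64), while the concentrations fall in different partitions, so the workload balance is 1.0. Under these assumed definitions, this is the high-skewness, high-balance case that the abstract identifies as favourable for pruning.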
[1] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules Between Sets of Items in Large Databases,” Proc. 1993 ACM SIGMOD Int'l Conf. Management of Data, pp. 207–216, May 1993.
[2] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. 1994 Int'l Conf. Very Large Data Bases, pp. 487–499, Sept. 1994.
[3] R. Agrawal and J.C. Shafer, “Parallel Mining of Association Rules: Design, Implementation and Experience,” Technical Report TJ10004, IBM Research Division, Almaden Research Center, 1996.
[4] S. Brin, R. Motwani, J. Ullman, and S. Tsur, “Dynamic Itemset Counting and Implication Rules for Market Basket Data,” ACM SIGMOD Conf. Management of Data, May 1997.
[5] T.M. Cover and J.A. Thomas, Elements of Information Theory. John Wiley & Sons, 1991.
[6] D. Cheung, J. Han, V. Ng, and C.Y. Wong, “Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique,” Proc. 1996 Int'l Conf. Data Eng., pp. 106–114, Feb. 1996.
[7] D. Cheung, J. Han, V. Ng, A. Fu, and Y. Fu, “A Fast Distributed Algorithm for Mining Association Rules,” Fourth Int'l Conf. Parallel and Distributed Information Systems, Dec. 1996.
[8] D.W. Cheung, V.T. Ng, W. Fu, and Y. Fu, “Efficient Mining Association Rules in Distributed Databases,” IEEE Trans. Knowledge and Data Eng., vol. 8, no. 6, pp. 911–922, Dec. 1996.
[9] S.K. Gupta, Linear Programming and Network Models. New Delhi: Affiliated East-West Press, 1985.
[10] E.H. Han, G. Karypis, and V. Kumar, “Scalable Parallel Data Mining for Association Rules,” ACM SIGMOD Conf. Management of Data, May 1997.
[11] J. Han and Y. Fu, “Discovery of Multiple-Level Association Rules from Large Databases,” Proc. 1995 Int'l Conf. Very Large Data Bases, pp. 420–431, Sept. 1995.
[12] M. Houtsma and A. Swami, “Set-Oriented Mining of Association Rules in Relational Databases,” 11th Int'l Conf. Data Eng., 1995.
[13] International Business Machines, Scalable POWERparallel Systems, GA23-2475-02 ed., Feb. 1995.
[14] L. Kaufman and P.J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley and Sons, 1990.
[15] H. Mannila, H. Toivonen, and A.I. Verkamo, “Efficient Algorithms for Discovering Association Rules,” AAAI Workshop Knowledge Discovery in Databases (KDD-94), July 1994.
[16] Message Passing Interface Forum, MPI: A Message-Passing Interface Standard. May 1994.
[17] J.S. Park, M.S. Chen, and P.S. Yu, “An Effective Hash-Based Algorithm for Mining Association Rules,” Proc. 1995 ACM SIGMOD Int'l Conf. Management of Data, pp. 175–186, May 1995.
[18] J. Park, M. Chen, and P. Yu, “Efficient Parallel Data Mining for Association Rules,” Proc. Fourth Int'l Conf. Information and Knowledge Management, pp. 31–36, 1995.
[19] A. Savasere, E. Omiecinski, and S. Navathe, “An Efficient Algorithm for Mining Association Rules in Large Databases,” Proc. 1995 Int'l Conf. Very Large Data Bases, pp. 432–443, Sept. 1995.
[20] R. Srikant and R. Agrawal, “Mining Generalized Association Rules,” Proc. 1995 Int'l Conf. Very Large Data Bases, pp. 407–419, Sept. 1995.
[21] R. Srikant and R. Agrawal, “Mining Sequential Patterns: Generalizations and Performance Improvements,” Proc. Fifth Int'l Conf. Extending Database Technology (EDBT), pp. 3–17, 1996.
[22] R. Srikant and R. Agrawal, “Mining Quantitative Association Rules in Large Relational Tables,” Proc. 1996 ACM SIGMOD Int'l Conf. Management of Data, pp. 1–12, June 1996.
[23] T. Shintani and M. Kitsuregawa, “Hash Based Parallel Algorithms for Mining Association Rules,” Proc. Conf. Parallel and Distributed Information Systems, pp. 19–30, 1996.
[24] H. Toivonen, “Sampling Large Databases for Association Rules,” Proc. 1996 Int'l Conf. Very Large Data Bases, pp. 134–145, Sept. 1996.
[25] M.J. Zaki, M. Ogihara, S. Parthasarathy, and W. Li, “Parallel Data Mining for Association Rules on Shared-Memory Multi-Processors,” Technical Report 618, Computer Science Dept., The Univ. of Rochester, May 1996.