Efficient Mining of Association Rules in Distributed Databases
December 1996 (vol. 8 no. 6)
pp. 911-922

Abstract—Many sequential algorithms have been proposed for mining association rules. However, very little work has been done on mining association rules in distributed databases. A direct application of sequential algorithms to distributed databases is not effective, because it incurs a large communication overhead. In this study, an efficient algorithm, DMA, is proposed. It generates a small number of candidate sets and requires only O(n) messages for support count exchange for each candidate set, where n is the number of sites in a distributed database. The algorithm has been implemented on an experimental test bed and its performance has been studied. The results show that DMA outperforms the direct application of a popular sequential algorithm in distributed databases.
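
To make the O(n) message bound concrete, the following minimal sketch counts the messages needed to obtain the global support count of one candidate set under two exchange schemes. It is an illustration under stated assumptions, not the DMA algorithm itself: the abstract does not describe the exchange protocol, so the coordinator role and the function names `local_support`, `exchange_via_coordinator`, and `exchange_via_broadcast` are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]
SiteDB = List[Set[str]]  # one site's local transactions


def local_support(transactions: SiteDB, candidate: Itemset) -> int:
    """Count the local transactions that contain every item of the candidate."""
    return sum(1 for t in transactions if candidate <= t)


def exchange_via_coordinator(
    sites: List[SiteDB], candidate: Itemset, coordinator: int = 0
) -> Tuple[int, int]:
    """Assumed scheme: one site collects the local counts and returns the total.

    Returns (global_support_count, messages_sent).
    """
    messages = 0
    total = 0
    for i, db in enumerate(sites):
        total += local_support(db, candidate)
        if i != coordinator:
            messages += 1  # site i sends its local count to the coordinator
    messages += len(sites) - 1  # coordinator sends the total to every other site
    return total, messages  # 2 * (n - 1) messages, i.e. O(n)


def exchange_via_broadcast(
    sites: List[SiteDB], candidate: Itemset
) -> Tuple[int, int]:
    """Naive baseline: every site sends its local count to every other site."""
    n = len(sites)
    total = sum(local_support(db, candidate) for db in sites)
    return total, n * (n - 1)  # O(n^2) messages


if __name__ == "__main__":
    sites = [
        [{"a", "b"}, {"a"}],
        [{"a", "b", "c"}],
        [{"b"}],
        [{"a", "b"}, {"a", "b", "d"}],
    ]
    candidate = frozenset({"a", "b"})
    print(exchange_via_coordinator(sites, candidate))
    print(exchange_via_broadcast(sites, candidate))
```

With n sites, the coordinator scheme uses 2(n - 1) messages per candidate set, while the all-to-all baseline uses n(n - 1); both arrive at the same global support count.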

Index Terms:
Data mining, knowledge discovery, distributed data mining, association rule, distributed database, distributed algorithm, partitioned database.
Citation:
David W. Cheung, Vincent T. Ng, Ada W. Fu, Yongjian Fu, "Efficient Mining of Association Rules in Distributed Databases," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. 911-922, Dec. 1996, doi:10.1109/69.553158