Synthesizing High-Frequency Rules from Different Data Sources
March/April 2003 (vol. 15 no. 2)
pp. 353-367

Abstract: Many large organizations have multiple data sources, such as the different branches of an interstate company. Pooling all the data from the different sources would create a huge database for centralized processing. Mining association rules at each data source and forwarding the rules (rather than the original raw data) to the company's central headquarters provides a feasible way to handle multiple data sources. Each data source may also need its own association rules in the first place, so association analysis at each data source is important and useful in its own right. However, the headquarters may receive too many forwarded rules from the different data sources to use them effectively. This paper presents a weighting model for synthesizing high-frequency association rules from different data sources. There are two reasons to focus on high-frequency rules. First, the headquarters is interested in high-frequency rules for corporate profitability, because these rules are supported by most of its branches. Second, high-frequency rules have a greater chance of being valid rules in the union of all data sources. To extract high-frequency rules efficiently, a rule-selection procedure is also constructed that enhances the weighting model by coping with low-frequency rules. Experimental results show that the proposed weighting model is efficient and effective.
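The abstract does not state the weighting formulas, so the following is only a minimal Python sketch of one plausible frequency-based scheme in the spirit it describes, not the paper's exact model. It assumes that (i) each data source forwards its rules together with their local supports, (ii) a rule's weight is proportional to the number of sources that report it, (iii) a source's weight grows with the high-frequency rules it reports, and (iv) rule selection drops any rule reported by fewer than a fraction `min_frequency` of the sources before weighting. The function `synthesize`, the parameter `min_frequency`, and the string encoding of rules (`"A->B"`) are hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical encoding: a rule is a string such as "A->B"; each data source
# forwards a dict mapping its local rules to their local support values.
Rules = Dict[str, float]


def synthesize(sources: List[Rules], min_frequency: float = 0.5) -> Rules:
    """Synthesize forwarded rules with frequency-based weights (illustrative sketch).

    min_frequency is a hypothetical rule-selection threshold: a rule reported
    by fewer than this fraction of the sources is treated as low-frequency
    and discarded before any weights are computed.
    """
    n = len(sources)
    if n == 0:
        return {}

    # Num(R): the number of data sources that report rule R.
    num: Dict[str, int] = defaultdict(int)
    for rules in sources:
        for rule in rules:
            num[rule] += 1

    # Rule selection: keep only the high-frequency rules.
    kept = {rule: count for rule, count in num.items() if count / n >= min_frequency}
    if not kept:
        return {}

    # Rule weight: proportional to how many sources report the rule.
    total = sum(kept.values())
    rule_weight = {rule: count / total for rule, count in kept.items()}

    # Source weight: a source that reports more (and more widely shared)
    # high-frequency rules receives a larger share of the total weight.
    raw = [sum(num[r] * rule_weight[r] for r in rules if r in rule_weight)
           for rules in sources]
    norm = sum(raw)  # > 0, since every kept rule occurs in at least one source
    source_weight = [x / norm for x in raw]

    # Synthesized support: weighted sum of the local supports, counting 0
    # for a source that did not report the rule.
    return {
        rule: sum(w * rules.get(rule, 0.0) for w, rules in zip(source_weight, sources))
        for rule in rule_weight
    }


if __name__ == "__main__":
    # Three hypothetical branches forwarding their local rules.
    branches = [
        {"A->B": 0.4, "C->D": 0.3},
        {"A->B": 0.5, "E->F": 0.2},
        {"A->B": 0.3, "C->D": 0.25},
    ]
    print(synthesize(branches))
```

With the sample branches, `E->F` is reported by only one of the three sources and is removed by the selection step, while `A->B` and `C->D` receive synthesized supports of about 0.389 and 0.204. Filtering low-frequency rules before computing any weights keeps rarely reported rules from shifting the source weights, which is one way to read the abstract's remark that rule selection "enhances the weighting model by coping with low-frequency rules."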

Index Terms:
Large databases, multiple data sources, association rules, synthesizing, weighting, rule selection.
Citation:
Xindong Wu, Shichao Zhang, "Synthesizing High-Frequency Rules from Different Data Sources," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 2, pp. 353-367, March-April 2003, doi:10.1109/TKDE.2003.1185839