Forecasting Association Rules Using Existing Data Sets
November/December 2003 (vol. 15 no. 6)
pp. 1448-1459
Sam Y. Sung, IEEE Computer Society

Abstract: An important issue when using data mining tools is the validity of the rules outside the data set from which they were generated. Rules are typically derived from the patterns in a particular data set. When a new situation occurs, the set of rules obtained from the new data set could change significantly. In this paper, we provide a novel model for understanding how the differences between two situations affect the changes in the rules, based on the concept of fine-partitioned groups that we call caucuses. Using this model, we provide a simple technique, called Combination Data Set, to obtain a good estimate of the set of rules for a new situation. Our approach works independently of the core mining process and can be easily implemented with all variations of rule mining techniques. Through experiments with real-life and synthetic data sets, we show the effectiveness of our technique in finding the correct set of rules under different situations.
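
The abstract does not spell out how a Combination Data Set is built, so the following is only a rough sketch of one plausible reading, based on the index terms "fine partition" and "proportionate sampling": existing records are grouped into caucuses, and each caucus is resampled in proportion to its expected share in the new situation. The function name `combination_data_set`, the `caucus_of` callback, the `target_proportions` mapping, and the toy records are all assumptions made for illustration, not the paper's actual procedure.

```python
import random
from collections import defaultdict

def combination_data_set(records, caucus_of, target_proportions, size, seed=0):
    """Resample existing records so caucus shares match the new situation.

    Hypothetical helper: `caucus_of` maps a record to its caucus label and
    `target_proportions` maps each caucus label to its assumed share (0..1)
    in the new situation.
    """
    rng = random.Random(seed)

    # Partition the existing data set into caucuses.
    groups = defaultdict(list)
    for rec in records:
        groups[caucus_of(rec)].append(rec)

    combined = []
    for caucus, share in target_proportions.items():
        pool = groups.get(caucus, [])
        if not pool:
            continue  # no existing data for this caucus, so it cannot be represented
        k = round(share * size)
        # Sample with replacement so a small caucus can still be scaled up.
        combined.extend(rng.choices(pool, k=k))
    return combined

# Toy existing data: caucus "A" dominates (80%), caucus "B" is rare (20%).
existing = (
    [{"region": "A", "items": {"bread", "milk"}}] * 80
    + [{"region": "B", "items": {"bread", "beer"}}] * 20
)

# Assumed new situation: the shares are reversed to 30% "A" and 70% "B".
combined = combination_data_set(
    existing, lambda r: r["region"], {"A": 0.3, "B": 0.7}, size=100
)
```

Under this reading, the resampled `combined` list could be passed unchanged to any association rule miner, which is consistent with the abstract's claim that the approach works independently of the core mining process.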


Index Terms:
Combination data set, data mining, extending association rule, fine partition, proportionate sampling.
Citation:
Sam Y. Sung, Zhao Li, Chew L. Tan, Peter A. Ng, "Forecasting Association Rules Using Existing Data Sets," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 6, pp. 1448-1459, Nov.-Dec. 2003, doi:10.1109/TKDE.2003.1245284