Mining Optimized Gain Rules for Numeric Attributes
March/April 2003 (vol. 15 no. 2)
pp. 324-338

Abstract: Association rules are useful for determining correlations between attributes of a relation and have applications in the marketing, financial, and retail sectors. Furthermore, optimized association rules are an effective way to focus on the most interesting characteristics involving certain attributes. Optimized association rules are permitted to contain uninstantiated attributes, and the problem is to determine instantiations such that the support, confidence, or gain of the rule is maximized. In this paper, we generalize the optimized gain association rule problem by permitting rules to contain disjunctions over uninstantiated numeric attributes. Our generalized association rules enable us to extract more useful information about seasonal and local patterns involving the uninstantiated attribute. For rules containing a single numeric attribute, we present an algorithm with linear complexity for computing optimized gain rules. Furthermore, we propose a bucketing technique that can significantly reduce input size by coalescing contiguous values without sacrificing optimality. We also present an approximation algorithm based on dynamic programming for two numeric attributes. Using recent results on binary space partitioning trees, we show that the approximations are within a constant factor of the optimal optimized gain rules. Our experimental results with synthetic data sets for a single numeric attribute demonstrate that our algorithm scales linearly with the attribute's domain size as well as with the number of disjunctions. In addition, we show that applying our optimized rule framework to a real-life population survey data set enables us to discover interesting underlying correlations among the attributes.
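To make the single-attribute problem concrete, here is a small Python sketch built on stated assumptions rather than on the paper's own code. It takes the gain of a rule C1 → C2 to be sup(C1 ∧ C2) − minConf · sup(C1), the definition used in the earlier optimized-rule literature. It computes a per-value gain from hypothetical (count, count-with-C2) pairs, then coalesces contiguous values whose gains share a sign, which is one plausible reading of the bucketing step. The best k disjoint intervals are found with a textbook O(nk) dynamic program. The final function, `bsp_gain`, is a brute-force guillotine-cut dynamic program for two attributes, included only to show how binary space partitions restrict the search for rectangles; it should not be read as the authors' approximation algorithm. All function names are illustrative.

```python
from functools import lru_cache
from itertools import groupby


def value_gains(counts, min_conf):
    """Per-value gain, assuming gain = sup(v and C2) - minConf * sup(v).

    counts: hypothetical list of (n_v, n_v_and_c2) pairs, one per value, in sorted order.
    """
    return [hits - min_conf * total for total, hits in counts]


def bucket(gains):
    """Coalesce runs of contiguous values whose gains share a sign (zero counts as non-positive)."""
    return [sum(run) for _, run in groupby(gains, key=lambda g: g > 0)]


def optimized_gain(gains, k):
    """Maximum total gain of at most k non-overlapping intervals (O(n * k) dynamic program)."""
    neg_inf = float("-inf")
    best = [0.0] * (k + 1)        # best[j]: at most j intervals within the prefix seen so far
    open_ = [neg_inf] * (k + 1)   # open_[j]: j-th interval ends exactly at the current value
    for g in gains:
        for j in range(k, 0, -1):  # descend so best[j - 1] still refers to the previous prefix
            # Either extend the interval that ended at the previous value or start a new one.
            open_[j] = max(open_[j], best[j - 1]) + g
            best[j] = max(best[j], open_[j])
    return best[k]


def bsp_gain(grid, k):
    """Best total gain of at most k disjoint rectangles obtainable through guillotine cuts."""
    rows, cols = len(grid), len(grid[0])
    # 2-D prefix sums give the gain of any rectangle in O(1).
    pre = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            pre[r + 1][c + 1] = grid[r][c] + pre[r][c + 1] + pre[r + 1][c] - pre[r][c]

    def rect_sum(r0, r1, c0, c1):  # half-open region [r0, r1) x [c0, c1)
        return pre[r1][c1] - pre[r0][c1] - pre[r1][c0] + pre[r0][c0]

    @lru_cache(maxsize=None)
    def opt(r0, r1, c0, c1, j):
        if j == 0:
            return 0.0
        # Take nothing, or take the whole region as one rectangle.
        best = max(0.0, rect_sum(r0, r1, c0, c1))
        for r in range(r0 + 1, r1):  # horizontal cuts, splitting the rectangle budget j
            for j1 in range(j + 1):
                best = max(best, opt(r0, r, c0, c1, j1) + opt(r, r1, c0, c1, j - j1))
        for c in range(c0 + 1, c1):  # vertical cuts
            for j1 in range(j + 1):
                best = max(best, opt(r0, r1, c0, c, j1) + opt(r0, r1, c, c1, j - j1))
        return best

    return opt(0, rows, 0, cols, k)


counts = [(10, 8), (6, 5), (4, 1), (8, 2), (5, 4)]
gains = value_gains(counts, min_conf=0.5)  # [3.0, 2.0, -1.0, -2.0, 1.5]
print(bucket(gains))                       # [5.0, -3.0, 1.5]
print(optimized_gain(gains, 2), optimized_gain(bucket(gains), 2))  # 6.5 6.5
print(bsp_gain([[3, -1], [-1, 2]], 2))     # 5.0
```

A run of positive-gain values is always worth taking whole, and a run of non-positive values never helps at the edge of an interval. For that reason the bucketed and unbucketed inputs give the same optimum in this example, while the bucketed input is shorter.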


Index Terms:
Association rules, support, confidence, gain, dynamic programming, region bucketing, binary space partitioning.
Citation:
Sergey Brin, Rajeev Rastogi, Kyuseok Shim, "Mining Optimized Gain Rules for Numeric Attributes," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 2, pp. 324-338, March-April 2003, doi:10.1109/TKDE.2003.1185837