Divide-and-Approximate: A Novel Constraint Push Strategy for Iceberg Cube Mining
March 2005 (vol. 17 no. 3)
pp. 354-368
Iceberg cube mining computes all cells $v$, each corresponding to a GROUP BY partition, that satisfy a given constraint on the aggregated behavior of the tuples in that partition. The number of cells is often so large that the result cannot realistically be searched unless the constraint is pushed into the search. Previous work has pushed antimonotone and monotone constraints. However, many useful constraints are neither antimonotone nor monotone. We consider a general class of aggregate constraints of the form $f(v)\ \theta\ \sigma$, where $f$ is an arithmetic function of SQL-like aggregates and $\theta$ is one of $<, \leq, \geq, >$. We propose a novel pushing technique, called Divide-and-Approximate, to push such constraints. The idea is to recursively divide the search space and, within each subspace, approximate the given constraint by antimonotone or monotone constraints. This technique applies to a class called separable constraints, which properly contains all constraints built by an arithmetic function $f$ of all SQL aggregates.
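
As a rough illustration of approximating a constraint that is neither antimonotone nor monotone by an antimonotone one, consider $\mathrm{sum}(v) \geq \sigma$ over a measure that can be negative. The sum of the positive values alone can only shrink as a cell is specialized, so it bounds the sum of every more specific cell from above. The Python sketch below uses that bound to prune a simple bottom-up enumeration. It is not the Divide-and-Approximate algorithm of the paper; the function name `iceberg_cells`, the toy table, and the choice of bound are assumptions made for this example only.

```python
def iceberg_cells(rows, dims, measure, sigma):
    """Return ({cell: sum}, visited) for all cells with sum(measure) >= sigma.

    A cell is a tuple of (dimension, value) pairs; () is the all-tuples cell.
    Illustrative sketch only: it prunes with one antimonotone upper bound.
    """
    results = {}
    visited = 0

    def expand(cell, part, start):
        nonlocal visited
        visited += 1
        total = sum(r[measure] for r in part)
        # Sum of positive values only: never increases when tuples are removed.
        psum = sum(r[measure] for r in part if r[measure] > 0)
        if total >= sigma:
            results[cell] = total
        # Every more specific cell w satisfies sum(w) <= psum(w) <= psum(cell),
        # so if psum(cell) < sigma no cell below this one can qualify.
        if psum < sigma:
            return
        # Specialize on the remaining dimensions, in a fixed order so that
        # each cell is generated exactly once.
        for i in range(start, len(dims)):
            groups = {}
            for r in part:
                groups.setdefault(r[dims[i]], []).append(r)
            for value, sub in groups.items():
                expand(cell + ((dims[i], value),), sub, i + 1)

    expand((), rows, 0)
    return results, visited


# Hypothetical toy table; the values are made up for this example.
rows = [
    {"region": "east", "item": "a", "profit": 5},
    {"region": "east", "item": "b", "profit": -4},
    {"region": "west", "item": "a", "profit": -3},
    {"region": "west", "item": "b", "profit": 2},
]
result, visited = iceberg_cells(rows, ["region", "item"], "profit", sigma=3)
print(result, visited)
```

On this table the cell (region=east) fails the constraint while its specialization (region=east, item=a) satisfies it, so the constraint itself cannot be used for pruning; the positive-sum bound still lets the search skip everything below (region=west).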

Index Terms:
Aggregate constraint, constrained data mining, data cube, iceberg cube mining, iceberg query.
Citation:
Ke Wang, Yuelong Jiang, Jeffrey Xu Yu, Guozhu Dong, Jiawei Han, "Divide-and-Approximate: A Novel Constraint Push Strategy for Iceberg Cube Mining," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 3, pp. 354-368, March 2005, doi:10.1109/TKDE.2005.45