Mining Constrained Gradients in Large Databases
August 2004 (vol. 16 no. 8)
pp. 922-938
Jiawei Han, IEEE
Jian Pei, IEEE Computer Society

Abstract: Many data analysis tasks can be viewed as search or mining in a multidimensional space (MDS). In such MDSs, dimensions capture potentially important factors for given applications, and cells represent combinations of values for the factors. To systematically analyze data in MDS, an interesting notion called "cubegrade" was recently introduced by Imielinski et al. [14]. It focuses on the notable changes in measures in MDS by comparing a cell (which we refer to as the probe cell) with its gradient cells, namely, its ancestors, descendants, and siblings. We call such queries gradient analysis queries (GQs). Since an MDS can contain billions of cells, it is important to answer GQs efficiently. In this study, we focus on developing efficient methods for mining GQs constrained by certain (weakly) antimonotone constraints. Conducting an independent gradient-cell search once per probe cell is inefficient because much of the work is repeated. Instead, we propose an efficient algorithm, LiveSet-Driven, which finds all good gradient-probe cell pairs in one search pass. It utilizes measure-value analysis and dimension-match analysis in a set-oriented manner to achieve bidirectional pruning between the set of hopeful probe cells and the set of hopeful gradient cells. Moreover, it adopts a hypertree structure and an H-cubing method to compress data and to maximize sharing of computation. Our performance study shows that this algorithm is efficient and scalable. In addition to data cubes, we extend our study to another important scenario: mining constrained gradients in transactional databases where each item is associated with some measures such as price. Such transactional databases can be viewed as sparse MDSs where items represent dimensions, although their characteristics differ significantly from those of data cubes. We outline efficient mining methods for this problem in this paper.
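To make the terms in the abstract concrete, the following is a minimal Python sketch of the per-probe-cell baseline that the abstract describes as inefficient. It is not the LiveSet-Driven algorithm. Several details are assumptions made only for illustration and are not stated in the abstract: cells are tuples with `"*"` marking an aggregated dimension, a sibling differs from the probe cell in exactly one non-aggregated dimension, the significance constraint is a minimum count, and the gradient constraint is a minimum ratio of average measures. All function names are hypothetical.

```python
from itertools import product

ANY = "*"  # assumed marker for an aggregated (rolled-up) dimension


def is_ancestor(a, b):
    """True if cell a generalizes cell b (a has ANY wherever it differs from b)."""
    return a != b and all(x == y or x == ANY for x, y in zip(a, b))


def is_sibling(a, b):
    """True if a and b differ in exactly one dimension and neither value there is ANY."""
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    return len(diffs) == 1 and ANY not in diffs[0]


def is_gradient_cell(g, p):
    """g is a gradient cell of probe cell p: an ancestor, descendant, or sibling."""
    return is_ancestor(g, p) or is_ancestor(p, g) or is_sibling(g, p)


def build_cube(rows):
    """Aggregate (dim_1, ..., dim_n, measure) rows into every cell: cell -> (count, sum)."""
    cube = {}
    for *dims, measure in rows:
        for mask in product((False, True), repeat=len(dims)):
            cell = tuple(ANY if hide else d for d, hide in zip(dims, mask))
            count, total = cube.get(cell, (0, 0.0))
            cube[cell] = (count + 1, total + measure)
    return cube


def naive_gradients(cube, min_count, min_ratio):
    """Baseline: compare every significant probe cell with every significant cell.

    Assumed constraints (illustrative only): count >= min_count for both cells,
    and avg(gradient) >= min_ratio * avg(probe).
    """
    avg = {c: s / n for c, (n, s) in cube.items() if n >= min_count}
    return sorted(
        (g, p)
        for p in avg
        for g in avg
        if is_gradient_cell(g, p) and avg[g] >= min_ratio * avg[p]
    )


# Toy data: (city, customer group, price); values are made up for the example.
rows = [
    ("Van", "bus", 100.0),
    ("Van", "pro", 300.0),
    ("Tor", "bus", 200.0),
    ("Tor", "pro", 200.0),
]
pairs = naive_gradients(build_cube(rows), min_count=2, min_ratio=1.5)
print(pairs)
```

On this toy input the only reported pair is the sibling pair `(("*", "pro"), ("*", "bus"))`, whose averages are 250 and 150. The nested loop over all cell pairs is the repeated work the abstract refers to; a minimum-count constraint is antimonotone in the sense that a cell failing it cannot have a descendant that satisfies it, which is the kind of property that allows pruning.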

[1] S. Agarwal, R. Agrawal, P.M. Deshpande, A. Gupta, J.F. Naughton, R. Ramakrishnan, and S. Sarawagi, "On the Computation of Multidimensional Aggregates," Proc. Int'l Conf. Very Large Data Bases, pp. 506-521, Sept. 1996.
[2] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules between Sets of Items in Large Databases," Proc. ACM-SIGMOD Int'l Conf. Management of Data, pp. 207-216, May 1993.
[3] Y. Aumann and Y. Lindell, "A Statistical Theory for Quantitative Association Rules," Proc. Int'l Conf. Knowledge Discovery and Data Mining, Aug. 1999.
[4] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. Int'l Conf. Very Large Data Bases, pp. 487-499, Sept. 1994.
[5] K. Beyer and R. Ramakrishnan, "Bottom-Up Computation of Sparse and Iceberg Cubes," Proc. ACM-SIGMOD Int'l Conf. Management of Data, pp. 359-370, June 1999.
[6] S. Chaudhuri and U. Dayal, "An Overview of Data Warehousing and OLAP Technology," SIGMOD Record, vol. 26, pp. 65-74, 1997.
[7] G. Dong and J. Li, "Efficient Mining of Emerging Patterns: Discovering Trends and Differences," Proc. Int'l Conf. Knowledge Discovery and Data Mining, pp. 43-52, Aug. 1999.
[8] M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani, and J.D. Ullman, "Computing Iceberg Queries Efficiently," Proc. Int'l Conf. Very Large Data Bases, pp. 299-310, Aug. 1998.
[9] J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh, "Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-Tab and Sub-Totals," Data Mining and Knowledge Discovery, vol. 1, pp. 29-54, 1997.
[10] G. Grahne, L. Lakshmanan, and X. Wang, "Efficient Mining of Constrained Correlated Sets," Proc. Int'l Conf. Data Eng., pp. 512-521, Feb. 2000.
[11] J. Han, J. Pei, G. Dong, and K. Wang, "Efficient Computation of Iceberg Cubes with Complex Measures," Proc. ACM-SIGMOD Int'l Conf. Management of Data, pp. 1-12, May 2001.
[12] J. Han, J. Pei, and Y. Yin, "Mining Frequent Patterns without Candidate Generation," Proc. ACM-SIGMOD Int'l Conf. Management of Data, pp. 1-12, May 2000.
[13] V. Harinarayan, A. Rajaraman, and J.D. Ullman, "Implementing Data Cubes Efficiently," Proc. ACM-SIGMOD Int'l Conf. Management of Data, pp. 205-216, June 1996.
[14] T. Imielinski, L. Khachiyan, and A. Abdulghani, "Cubegrades: Generalizing Association Rules," Data Mining and Knowledge Discovery, vol. 6, pp. 219-258, 2002.
[15] R. Ng, L.V.S. Lakshmanan, J. Han, and A. Pang, "Exploratory Mining and Pruning Optimizations of Constrained Associations Rules," Proc. ACM-SIGMOD Int'l Conf. Management of Data, pp. 13-24, June 1998.
[16] J. Pei, J. Han, and L.V.S. Lakshmanan, "Mining Frequent Itemsets with Convertible Constraints," Proc. Int'l Conf. Data Eng., 2001.
[17] K. Ross and D. Srivastava, "Fast Computation of Sparse Datacubes," Proc. Int'l Conf. Very Large Data Bases, pp. 116-125, Aug. 1997.
[18] S. Sarawagi, R. Agrawal, and N. Megiddo, "Discovery-Driven Exploration of OLAP Data Cubes," Proc. Int'l Conf. Extending Database Technology, pp. 168-182, Mar. 1998.
[19] G. Sathe and S. Sarawagi, "Intelligent Rollups in Multidimensional OLAP Data," Proc. Int'l Conf. Very Large Data Bases, pp. 531-540, Sept. 2001.
[20] R. Srikant, Q. Vu, and R. Agrawal, "Mining Association Rules with Item Constraints," Proc. Int'l Conf. Knowledge Discovery and Data Mining, pp. 67-73, Aug. 1997.
[21] Y. Zhao, P.M. Deshpande, and J.F. Naughton, "An Array-Based Algorithm for Simultaneous Multidimensional Aggregates," Proc. ACM-SIGMOD Int'l Conf. Management of Data, pp. 159-170, May 1997.

Index Terms:
Data cube, data mining, gradient analysis, iceberg query, antimonotonicity, dimension-based pruning, constraint-based pruning, complex measures.
Citation:
Guozhu Dong, Jiawei Han, Joyce M.W. Lam, Jian Pei, Ke Wang, Wei Zou, "Mining Constrained Gradients in Large Databases," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 8, pp. 922-938, Aug. 2004, doi:10.1109/TKDE.2004.28