This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Computing Iceberg Cubes by Top-Down and Bottom-Up Integration: The StarCubing Approach
January 2007 (vol. 19 no. 1)
pp. 111-126
Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down versus bottom-up. The former, represented by the MultiWay Array Cube (called the MultiWay) algorithm [30], aggregates simultaneously on multiple dimensions; however, it cannot take advantage of a priori pruning [2] when computing iceberg cubes (cubes that contain only aggregate cells whose measure values satisfy a threshold, called the iceberg condition). The latter, represented by BUC [6] , computes the iceberg cube bottom-up and facilitates a priori pruning. BUC explores fast sorting and partitioning techniques; however, it does not fully explore multidimensional simultaneous aggregation. In this paper, we present a new method, Star-Cubing, that integrates the strengths of the previous two algorithms and performs aggregations on multiple dimensions simultaneously. It utilizes a star-tree structure, extends the simultaneous aggregation methods, and enables the pruning of the group-bys that do not satisfy the iceberg condition. Our performance study shows that Star-Cubing is highly efficient and outperforms the previous methods.

[1] S. Agarwal, R. Agrawal, P.M. Deshpande, A. Gupta, J.F. Naughton, R. Ramakrishnan, and S. Sarawagi, “On the Computation of Multidimensional Aggregates,” Proc. Int'l Conf. Very Large Data Bases (VLDB '96), pp. 506-521, Sept. 1996.
[2] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. Int'l Conf. Very Large Data Bases (VLDB '94), pp. 487-499, Sept. 1994.
[3] E. Baralis, S. Paraboschi, and E. Teniente, “Materialized View Selection in a Multidimensional Database,” Proc. Int'l Conf. Very Large Data Bases (VLDB '97), pp. 98-12, Aug. 1997.
[4] D. Barbara and M. Sullivan, “Quasi-Cubes: Exploiting Approximation in Multidimensional Databases,” SIGMOD Record, vol. 26, pp. 12-17, 1997.
[5] D. Barbará and X. Wu, “Using Loglinear Models to Compress Datacube,” Proc. First Int'l Conf. Web-Age Information Management (WAIM '00), pp. 311-322, 2000.
[6] K. Beyer and R. Ramakrishnan, “Bottom-Up Computationn of Sparse and Iceberg Cubes,” Proc. ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '99), pp. 359-370, June 1999.
[7] Y. Chen, G. Dong, J. Han, B.W. Wah, and J. Wang, “Multidimensional Regression Analysis of Time-Series Data Streams,” Proc. 2002 Int'l Conf. Very Large Data Bases (VLDB '02), pp. 323-334, Aug. 2002.
[8] Z. Chen and V. Narasayya, “Efficient Computation of Multiple Group by Queries,” Proc. ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '05), pp. 263-274, June 2005.
[9] J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh, “Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab and SubTotals,” Data Mining and Knowledge Discovery, vol. 1, pp. 29-54, 1997.
[10] H. Gupta, “Selection of Views to Materialize in a Data Warehouse,” Proc. Seventh Int'l Conf. Database Theory (ICDT '97), pp. 98-112, Jan. 1997.
[11] H. Gupta, V. Harinarayan, A. Rajaraman, and J.D. Ullman, “Index Selection for OLAP,” Proc. Int'l Conf. Data Eng. (ICDE '97), pp. 208-219, Apr. 1997.
[12] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001.
[13] J. Han, J. Pei, G. Dong, and K. Wang, “Efficient Computation of Iceberg Cubes with Complex Measures,” Proc. ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '01), pp. 1-12, May 2001.
[14] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '00), pp. 1-12, May 2000.
[15] V. Harinarayan, A. Rajaraman, and J.D. Ullman, “Implementing Data Cubes Efficiently,” Proc. ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '96), pp. 205-216, June 1996.
[16] T. Imielinski, L. Khachiyan, and A. Abdulghani, “Cubegrades: Generalizing Association Rules,” Data Mining and Knowledge Discovery, vol. 6, pp. 219-258, 2002.
[17] L.V.S. Lakshmanan, J. Pei, and J. Han, “Quotient Cube: How to Summarize the Semantics of a Data Cube,” Proc. Int'l Conf. Very Large Data Bases (VLDB '02), pp. 778-789, Aug. 2002.
[18] L.V. S. Lakshmanan, J. Pei, and Y. Zhao, “QC-Trees: An Efficient Summary Structure for Semantic OLAP,” Proc. ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '03), pp. 64-75, June 2003.
[19] X. Li, J. Han, and H. Gonzalez, “High-Dimensional OLAP: A Minimal Cubing Approach,” Proc. Int'l Conf. Very Large Data Bases (VLDB '04), pp. 528-539, Aug. 2004.
[20] R. Ng, L.V.S. Lakshmanan, J. Han, and A. Pang, “Exploratory Mining and Pruning Optimizations of Constrained Associations Rules,” Proc. ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '98), pp. 13-24, June 1998.
[21] K. Ross and D. Srivastava, “Fast Computation of Sparse Datacubes,” Proc. Int'l Conf. Very Large Data Bases (VLDB '97), pp. 116-125, Aug. 1997.
[22] S. Sarawagi, R. Agrawal, and N. Megiddo, “Discovery-Driven Exploration of OLAP Data Cubes,” Proc. Int'l Conf. Extending Database Technology (EDBT '98), pp. 168-182, Mar. 1998.
[23] J. Shanmugasundaram, U.M. Fayyad, and P.S. Bradley, “Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions,” Proc. Int'l Conf. Knowledge Discovery and Data Mining (KDD '99), pp. 223-232, Aug. 1999.
[24] A. Shukla, P.M. Deshpande, and J.F. Naughton, “Materialized View Selection for Multidimensional Datasets,” Proc. Int'l Conf. Very Large Data Bases (VLDB '98), pp. 488-499, Aug. 1998.
[25] Y. Sismanis and N. Roussopoulos, “The Complexity of Fully Materialized Coalesced Cubes,” Proc. Int'l Conf. Very Large Data Bases (VLDB '04), pp. 540-551, Aug. 2004.
[26] Y. Sismanis, N. Roussopoulos, A. Deligianannakis, and Y. Kotidis, “Dwarf: Shrinking the Petacube,” Proc. ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '02), pp. 464-475, June 2002.
[27] J.S. Vitter, M. Wang, and B.R. Iyer, “Data Cube Approximation and Histograms Via Wavelets,” Proc. Int'l Conf. Information and Knowledge Management (CIKM '98), pp. 96-104, Nov. 1998.
[28] W. Wang, H. Lu, J. Feng, and J.X. Yu, “Condensed Cube: An Effective Approach to Reducing Data Cube Size,” Proc. Int'l Conf. Data Eng. (ICDE '02), pp. 155-165, Apr. 2002.
[29] D. Xin, Z. Shao, J. Han, and H. Liu, “C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking,” Proc. Int'l Conf. Data Eng. (ICDE '06), p. 4, Apr. 2006.
[30] Y. Zhao, P.M. Deshpande, and J.F. Naughton, “An Array-Based Algorithm for Simultaneous Multidimensional Aggregates,” Proc. ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '97), pp.159-170, May 1997.

Index Terms:
Data warehouse, data mining, online analytical processing (OLAP).
Citation:
Dong Xin, Jiawei Han, Xiaolei Li, Zheng Shao, Benjamin W. Wah, "Computing Iceberg Cubes by Top-Down and Bottom-Up Integration: The StarCubing Approach," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 1, pp. 111-126, Jan. 2007, doi:10.1109/TKDE.2007.4
Usage of this product signifies your acceptance of the Terms of Use.