| | This Article | |
| |
| |
| | Share | |
| |
| |
| | Bibliographic References | |
| |
| |
| | Add to: | |
| |
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
| |
| | Search | |
| |
| |
| | |
Computing Iceberg Cubes by Top-Down and Bottom-Up Integration: The StarCubing Approach
January 2007 (vol. 19 no. 1)
pp. 111-126
Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down versus bottom-up. The former, represented by the MultiWay Array Cube (called the MultiWay) algorithm [30], aggregates simultaneously on multiple dimensions; however, it cannot take advantage of a priori pruning [2] when computing iceberg cubes (cubes that contain only aggregate cells whose measure values satisfy a threshold, called the iceberg condition). The latter, represented by BUC [6] , computes the iceberg cube bottom-up and facilitates a priori pruning. BUC explores fast sorting and partitioning techniques; however, it does not fully explore multidimensional simultaneous aggregation. In this paper, we present a new method, Star-Cubing, that integrates the strengths of the previous two algorithms and performs aggregations on multiple dimensions simultaneously. It utilizes a star-tree structure, extends the simultaneous aggregation methods, and enables the pruning of the group-bys that do not satisfy the iceberg condition. Our performance study shows that Star-Cubing is highly efficient and outperforms the previous methods.
[1] 111 S. Agarwal, R. Agrawal, P.M. Deshpande, A. Gupta, J.F. Naughton, R. Ramakrishnan, and S. Sarawagi, “On the Computation of Multidimensional Aggregates,” Proc. Int'l Conf. Very Large Data Bases (VLDB '96), pp. 506-521, Sept. 1996.[2] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. Int'l Conf. Very Large Data Bases (VLDB '94), pp. 487-499, Sept. 1994.[3] E. Baralis, S. Paraboschi, and E. Teniente, “Materialized View Selection in a Multidimensional Database,” Proc. Int'l Conf. Very Large Data Bases (VLDB '97), pp. 98-12, Aug. 1997.[4] D. Barbara and M. Sullivan, “Quasi-Cubes: Exploiting Approximation in Multidimensional Databases,” SIGMOD Record, vol. 26, pp. 12-17, 1997.[5] D. Barbará and X. Wu, “Using Loglinear Models to Compress Datacube,” Proc. First Int'l Conf. Web-Age Information Management (WAIM '00), pp. 311-322, 2000.[6] K. Beyer and R. Ramakrishnan, “Bottom-Up Computationn of Sparse and Iceberg Cubes,” Proc. ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '99), pp. 359-370, June 1999.[7] Y. Chen, G. Dong, J. Han, B.W. Wah, and J. Wang, “Multidimensional Regression Analysis of Time-Series Data Streams,” Proc. 2002 Int'l Conf. Very Large Data Bases (VLDB '02), pp. 323-334, Aug. 2002.[8] Z. Chen and V. Narasayya, “Efficient Computation of Multiple Group by Queries,” Proc. ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '05), pp. 263-274, June 2005.[9] J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh, “Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab and SubTotals,” Data Mining and Knowledge Discovery, vol. 1, pp. 29-54, 1997.[10] H. Gupta, “Selection of Views to Materialize in a Data Warehouse,” Proc. Seventh Int'l Conf. Database Theory (ICDT '97), pp. 98-112, Jan. 1997.[11] H. Gupta, V. Harinarayan, A. Rajaraman, and J.D. Ullman, “Index Selection for OLAP,” Proc. Int'l Conf. Data Eng. (ICDE '97), pp. 208-219, Apr. 1997.[12] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001.[13] J. Han, J. Pei, G. Dong, and K. Wang, “Efficient Computation of Iceberg Cubes with Complex Measures,” Proc. ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '01), pp. 1-12, May 2001.[14] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '00), pp. 1-12, May 2000.[15] V. Harinarayan, A. Rajaraman, and J.D. Ullman, “Implementing Data Cubes Efficiently,” Proc. ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '96), pp. 205-216, June 1996.[16] T. Imielinski, L. Khachiyan, and A. Abdulghani, “Cubegrades: Generalizing Association Rules,” Data Mining and Knowledge Discovery, vol. 6, pp. 219-258, 2002.[17] L.V.S. Lakshmanan, J. Pei, and J. Han, “Quotient Cube: How to Summarize the Semantics of a Data Cube,” Proc. Int'l Conf. Very Large Data Bases (VLDB '02), pp. 778-789, Aug. 2002.[18] L.V. S. Lakshmanan, J. Pei, and Y. Zhao, “QC-Trees: An Efficient Summary Structure for Semantic OLAP,” Proc. ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '03), pp. 64-75, June 2003.[19] X. Li, J. Han, and H. Gonzalez, “High-Dimensional OLAP: A Minimal Cubing Approach,” Proc. Int'l Conf. Very Large Data Bases (VLDB '04), pp. 528-539, Aug. 2004.[20] R. Ng, L.V.S. Lakshmanan, J. Han, and A. Pang, “Exploratory Mining and Pruning Optimizations of Constrained Associations Rules,” Proc. ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '98), pp. 13-24, June 1998.[21] K. Ross and D. Srivastava, “Fast Computation of Sparse Datacubes,” Proc. Int'l Conf. Very Large Data Bases (VLDB '97), pp. 116-125, Aug. 1997.[22] S. Sarawagi, R. Agrawal, and N. Megiddo, “Discovery-Driven Exploration of OLAP Data Cubes,” Proc. Int'l Conf. Extending Database Technology (EDBT '98), pp. 168-182, Mar. 1998.[23] J. Shanmugasundaram, U.M. Fayyad, and P.S. Bradley, “Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions,” Proc. Int'l Conf. Knowledge Discovery and Data Mining (KDD '99), pp. 223-232, Aug. 1999.[24] A. Shukla, P.M. Deshpande, and J.F. Naughton, “Materialized View Selection for Multidimensional Datasets,” Proc. Int'l Conf. Very Large Data Bases (VLDB '98), pp. 488-499, Aug. 1998.[25] Y. Sismanis and N. Roussopoulos, “The Complexity of Fully Materialized Coalesced Cubes,” Proc. Int'l Conf. Very Large Data Bases (VLDB '04), pp. 540-551, Aug. 2004.[26] Y. Sismanis, N. Roussopoulos, A. Deligianannakis, and Y. Kotidis, “Dwarf: Shrinking the Petacube,” Proc. ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '02), pp. 464-475, June 2002.[27] J.S. Vitter, M. Wang, and B.R. Iyer, “Data Cube Approximation and Histograms Via Wavelets,” Proc. Int'l Conf. Information and Knowledge Management (CIKM '98), pp. 96-104, Nov. 1998.[28] W. Wang, H. Lu, J. Feng, and J.X. Yu, “Condensed Cube: An Effective Approach to Reducing Data Cube Size,” Proc. Int'l Conf. Data Eng. (ICDE '02), pp. 155-165, Apr. 2002.[29] D. Xin, Z. Shao, J. Han, and H. Liu, “C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking,” Proc. Int'l Conf. Data Eng. (ICDE '06), p. 4, Apr. 2006.[30] Y. Zhao, P.M. Deshpande, and J.F. Naughton, “An Array-Based Algorithm for Simultaneous Multidimensional Aggregates,” Proc. ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '97), pp.159-170, May 1997.
Index Terms:
Data warehouse, data mining, online analytical processing (OLAP).
Citation:
Dong Xin, Jiawei Han, Xiaolei Li, Zheng Shao, Benjamin W. Wah, "Computing Iceberg Cubes by Top-Down and Bottom-Up Integration: The StarCubing Approach," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 1, pp. 111-126, Jan. 2007, doi:10.1109/TKDE.2007.4