New Algorithm for Computing Cube on Very Large Compressed Data Sets
December 2006 (vol. 18 no. 12)
pp. 1667-1680
Data compression is an effective technique for improving the performance of data warehouses. Because the cube operation is the core of online analytical processing (OLAP) in data warehouses, developing efficient algorithms for computing cubes on compressed data warehouses is a major challenge. To our knowledge, very few cube computation techniques for compressed data warehouses have been proposed in the literature to date. This paper presents a novel algorithm for computing cubes on compressed data warehouses. The algorithm operates directly on compressed data sets, without first decompressing them, and is applicable to a large class of mapping-complete data compression methods. The complexity of the algorithm is analyzed in detail. The analytical and experimental results show that the algorithm is more efficient than all other existing cube algorithms. In addition, a heuristic algorithm that generates an optimal plan for computing the cube is proposed.
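
For readers unfamiliar with the cube operation, the following is a minimal sketch of its textbook definition: one aggregation (here, SUM) for every subset of the dimension attributes. It is not the algorithm proposed in the paper and it works on plain, uncompressed rows; the function name `compute_cube` and the sample columns `region`, `product`, and `sales` are assumptions made only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def compute_cube(rows, dims, measure):
    """Naive cube: compute a SUM group-by for every subset of dims.

    Illustrative only; for d dimensions it scans the rows 2**d times.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                # The group-by key is the tuple of values of the chosen dimensions.
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            cube[group] = dict(totals)
    return cube


# Hypothetical sample data with two dimensions and one measure.
rows = [
    {"region": "east", "product": "pen", "sales": 3},
    {"region": "east", "product": "ink", "sales": 5},
    {"region": "west", "product": "pen", "sales": 2},
]
cube = compute_cube(rows, ("region", "product"), "sales")
```

With two dimensions the result holds four group-bys: the grand total (empty group), one per single dimension, and one over both. The repeated full scans in this sketch hint at why cube computation is costly on very large data sets.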

Index Terms:
Data warehouses, data compression, cube operation, OLAP.
Citation:
Weili Wu, Hong Gao, Jianzhong Li, "New Algorithm for Computing Cube on Very Large Compressed Data Sets," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 12, pp. 1667-1680, Dec. 2006, doi:10.1109/TKDE.2006.195