Efficient Aggregation Algorithms for Compressed Data Warehouses
May/June 2002 (vol. 14 no. 3)
pp. 515-529

Abstract: Aggregation and cube are important operations for online analytical processing (OLAP). Many efficient algorithms have been developed to compute aggregation and cube for relational OLAP, and some work has addressed efficient cube computation for multidimensional data warehouses, which store data sets in multidimensional arrays rather than in tables. However, to our knowledge, nothing in the literature to date describes aggregation algorithms on compressed data warehouses for multidimensional OLAP. This paper presents a set of such algorithms. They operate directly on data sets compressed with mapping-complete compression methods, without first decompressing them. The algorithms perform differently depending on the data set parameters, the output sizes, and the available main memory. The paper describes the algorithms, presents their I/O and CPU cost functions, and proposes a decision procedure for selecting the most efficient algorithm for a given aggregation request. Analysis and experimental results show that the algorithms outperform previous aggregation algorithms on sparse data.
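To make the idea of aggregating without decompressing concrete, the following is a minimal Python sketch, not one of the paper's algorithms. It assumes a very simple compressed layout: only the non-empty cells of a multidimensional array are stored, each as a pair of row-major offset and measure value, and the mapping between offsets and coordinates can be computed in both directions. The names `compress`, `coords_of`, and `aggregate_sum`, and the sample data, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumed layout: (row-major offset, measure) for each non-empty cell.
Compressed = List[Tuple[int, float]]


def compress(cells: Dict[Tuple[int, ...], float], shape: Sequence[int]) -> Compressed:
    """Keep only non-empty cells, keyed by row-major offset (forward mapping)."""
    packed = []
    for coords, value in cells.items():
        offset = 0
        for c, size in zip(coords, shape):
            offset = offset * size + c
        packed.append((offset, value))
    return sorted(packed)


def coords_of(offset: int, shape: Sequence[int]) -> Tuple[int, ...]:
    """Backward mapping: recover the coordinates from a row-major offset."""
    coords = []
    for size in reversed(shape):
        offset, c = divmod(offset, size)
        coords.append(c)
    return tuple(reversed(coords))


def aggregate_sum(
    data: Compressed, shape: Sequence[int], group_by: Sequence[int]
) -> Dict[Tuple[int, ...], float]:
    """SUM grouped by the given dimensions, scanning only the stored cells.

    The dense array is never rebuilt; empty cells are never visited.
    """
    totals: Dict[Tuple[int, ...], float] = defaultdict(float)
    for offset, value in data:
        coords = coords_of(offset, shape)
        totals[tuple(coords[d] for d in group_by)] += value
    return dict(totals)


# Hypothetical 3 x 4 x 2 array with four non-empty cells out of 24.
shape = (3, 4, 2)
cells = {(0, 1, 0): 5.0, (0, 3, 1): 2.0, (2, 1, 0): 7.0, (2, 1, 1): 1.0}
packed = compress(cells, shape)

by_first_dim = aggregate_sum(packed, shape, group_by=[0])
by_second_dim = aggregate_sum(packed, shape, group_by=[1])
```

In this sketch the work is proportional to the number of stored cells rather than to the size of the full array, which is one plausible reason why operating on the compressed form would pay off on sparse data. The paper's own algorithms, cost functions, and compression methods are not reproduced here.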

Index Terms:
Data warehouse, multidimensional array, OLAP, aggregation, aggregation on compressed data warehouses
J. Li, J. Srivastava, "Efficient Aggregation Algorithms for Compressed Data Warehouses," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 3, pp. 515-529, May-June 2002, doi:10.1109/TKDE.2002.1000340