Issue No.09 - Sept. (2012 vol.24)
pp: 1570-1583
Bin He, IBM Almaden Research Center, San Jose
Hui-I Hsiao, IBM Almaden Research Center, San Jose
Ziyang Liu, NEC Laboratories America, Inc., Cupertino
Yu Huang, Arizona State University, Tempe
Yi Chen, Arizona State University, Tempe
Decision support and knowledge discovery systems often compute aggregate values of interesting attributes by processing huge amounts of data in very large databases and/or warehouses. In particular, an iceberg query is a special type of aggregation query that computes aggregate values above a user-provided threshold. Usually, only a small number of results satisfy the threshold constraint, yet these results often carry important and valuable business insights. Because the result set is small, iceberg queries offer many opportunities for deep query optimization. However, most existing iceberg query processing algorithms do not take advantage of the small-result-set property and rely heavily on a tuple-scan-based approach. This incurs intensive disk access and computation, resulting in long processing times, especially when the data size is large. The bitmap index, which builds one bitmap vector for each attribute value, has been gaining popularity in both column-oriented and row-oriented databases in recent years. It occupies less space than the raw data and gives opportunities for more efficient query processing. In this paper, we exploit the properties of the bitmap index and develop a very effective bitmap pruning strategy for processing iceberg queries. Our index-pruning-based approach eliminates the need to scan and process the entire data set (table) and thus speeds up iceberg query processing significantly. Experiments show that our approach is much more efficient than existing algorithms commonly used in row-oriented and column-oriented databases.
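To make the terms in the abstract concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. It assumes an uncompressed bitmap index in which each Python integer serves as one bit vector (bit i is set when row i holds the value), and it answers a two-column COUNT iceberg query. The function names (`build_bitmap_index`, `iceberg_count`, `iceberg_count_scan`) are hypothetical. The only pruning rule used is the basic observation that the AND of two vectors cannot have more 1-bits than either operand, so any vector with fewer 1-bits than the threshold can be discarded before pairs are examined. The paper's compressed bitmaps and its actual pruning strategy are not reproduced here.

```python
from collections import Counter, defaultdict
from itertools import product


def build_bitmap_index(column):
    """Return {value: bit vector}; bit i is set when row i holds that value."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def popcount(bits):
    """Number of 1-bits in a non-negative integer."""
    return bin(bits).count("1")


def iceberg_count(col_a, col_b, threshold):
    """GROUP BY (a, b) HAVING COUNT(*) >= threshold, answered from bitmaps.

    Assumption for this sketch: plain (uncompressed) bit vectors and a
    simple per-vector pruning rule, not the paper's strategy.
    """
    idx_a = build_bitmap_index(col_a)
    idx_b = build_bitmap_index(col_b)

    # Prune: popcount(x & y) <= min(popcount(x), popcount(y)), so a vector
    # below the threshold cannot be part of any qualifying group.
    cand_a = {v: b for v, b in idx_a.items() if popcount(b) >= threshold}
    cand_b = {v: b for v, b in idx_b.items() if popcount(b) >= threshold}

    results = {}
    for (va, bits_a), (vb, bits_b) in product(cand_a.items(), cand_b.items()):
        count = popcount(bits_a & bits_b)  # rows where a == va and b == vb
        if count >= threshold:
            results[(va, vb)] = count
    return results


def iceberg_count_scan(col_a, col_b, threshold):
    """Tuple-scan baseline: count every group, then filter by threshold."""
    counts = Counter(zip(col_a, col_b))
    return {group: n for group, n in counts.items() if n >= threshold}
```

On a small table, both functions return the same groups, but the bitmap version only intersects the vectors that survive pruning, whereas the scan baseline counts every group before filtering.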
Heuristic algorithms, Indexes, Aggregates, Query processing, Business, column-oriented database, Iceberg query, bitmap index
Bin He, Hui-I Hsiao, Ziyang Liu, Yu Huang, Yi Chen, "Efficient Iceberg Query Evaluation Using Compressed Bitmap Index", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 9, pp. 1570-1583, Sept. 2012, doi:10.1109/TKDE.2011.73