Approximate Query Processing in Cube Streams
November 2007 (vol. 19 no. 11)
pp. 1557-1570

Abstract: Data cubes have become important components in most data warehouse systems and decision support systems. In such systems, users usually pose very complex queries to the Online Analytical Processing (OLAP) system, and the systems usually have to deal with huge amounts of data because of the high dimensionality of the data sets; thus, approximate query processing has emerged as a viable solution. Specifically, applications of cube streams handle multidimensional data sets in a continuous manner, in contrast to traditional cube approximation. Such an application collects data events for cube streams online, generates snapshots with limited resources, and keeps the approximated information in a synopsis memory for further analysis. Compared to OLAP applications, applications of cube streams are subject to many more resource constraints on both processing time and memory, and, because of these limited resources, they cannot be handled by existing methods. In this paper, we propose the DAWA algorithm, a hybrid of the discrete cosine transform (DCT) for Data and the discrete WAvelet transform (DWT), to approximate cube streams. Our algorithm combines the high compression rate of DWT with the low memory cost of DCT. Consequently, DAWA requires a much smaller working buffer and outperforms both DWT-based and DCT-based methods in execution efficiency. It is also shown that DAWA provides a good solution for approximate query processing of cube streams with a small working buffer and a short execution time. The optimality of the DAWA algorithm is theoretically proved and empirically demonstrated by our experiments.
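
The abstract names the two transforms that DAWA combines but does not describe how they are combined. The sketch below is therefore not the DAWA algorithm; it is only a generic illustration, under our own assumptions, of how a synopsis built from a few DCT or Haar wavelet coefficients can answer an approximate range-sum query. All function names (`haar_forward`, `dct_forward`, `keep_top_k`, `approx_range_sum`) are hypothetical, and the one-dimensional setting and top-k coefficient selection are simplifications chosen for brevity.

```python
import math

def haar_forward(values):
    """Orthonormal Haar DWT; len(values) must be a power of two."""
    coeffs = list(values)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        avg = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        diff = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:n] = avg + diff  # averages first, then detail coefficients
        n = half
    return coeffs

def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged.append((a + d) / math.sqrt(2))
            merged.append((a - d) / math.sqrt(2))
        out[:2 * n] = merged
        n *= 2
    return out

def _dct_scale(k, n):
    # Normalization that makes the DCT-II basis orthonormal.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct_forward(values):
    """Orthonormal DCT-II, computed directly in O(n^2) for clarity."""
    n = len(values)
    return [
        _dct_scale(k, n)
        * sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]

def dct_inverse(coeffs):
    """Inverse of dct_forward (orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(_dct_scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]

def keep_top_k(coeffs, k):
    """Assumed synopsis rule: keep the k largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:k])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]

def approx_range_sum(synopsis, inverse, lo, hi):
    """Approximate sum of cells lo..hi (inclusive) from a coefficient synopsis."""
    return sum(inverse(synopsis)[lo:hi + 1])

if __name__ == "__main__":
    cells = [12.0, 14.0, 13.0, 15.0, 40.0, 42.0, 41.0, 39.0]  # made-up cell values
    exact = sum(cells[2:6])
    for name, fwd, inv in (("Haar", haar_forward, haar_inverse),
                           ("DCT", dct_forward, dct_inverse)):
        synopsis = keep_top_k(fwd(cells), 3)
        print(name, "exact:", exact, "approx:", approx_range_sum(synopsis, inv, 2, 5))
```

Both transforms here are orthonormal, so discarding the smallest coefficients minimizes the squared reconstruction error for a given synopsis size; a real cube-stream system would also have to bound the working buffer used while the coefficients are being computed, which this sketch does not attempt.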

Index Terms:
Cube Streams, OLAP, Data Cubes, Data Streams
Ming-Jyh Hsieh, Ming-Syan Chen, Philip S. Yu, "Approximate Query Processing in Cube Streams," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 11, pp. 1557-1570, Nov. 2007, doi:10.1109/TKDE.2007.190622