Efficient Process of Top-k Range-Sum Queries over Multiple Streams with Minimized Global Error
October 2007 (vol. 19 no. 10)
pp. 1404-1419
Because resources are limited in data stream environments, answering user queries from the wavelet synopsis of a stream has been reported to be an essential capability of a Data Stream Management System (DSMS). Recent research has concentrated on minimizing the local error metric of an individual stream. However, many emerging applications, such as stock marketing and sensor detection, also require a commercial DSMS to record multiple streams. Our analysis and experimental studies show that minimizing the global error in multiple-stream environments lets a DSMS answer queries reliably, whereas minimizing only the local error may cause a significant loss of query accuracy. In this paper, we therefore first study the problem of maintaining the wavelet coefficients of multiple streams within a collective memory so that a predetermined global error metric is minimized. We then examine a promising application in the multistream environment, namely top-k range-sum queries. We address efficient top-k query processing with minimized global error by developing a general framework, and we use several algorithms to optimize the performance of each primary component of this framework, covering both the maintenance of the wavelet coefficients and the processing of top-k queries. Finally, we evaluate the proposed algorithms empirically on real and simulated data streams and show that our framework can process top-k queries accurately and efficiently.
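To make the contrast between local and global error concrete, the following minimal sketch may help. It is not the algorithm proposed in the paper. It assumes an orthonormal Haar transform, takes the global error to be the summed squared error over all streams (the abstract does not specify the metric), and uses hypothetical helper names such as `synopsis_global` and `top_k_range_sum`. Under these assumptions, keeping the largest coefficients across all streams can never give a larger total error than splitting the same budget evenly per stream.

```python
import math

def haar(signal):
    """Orthonormal Haar transform of a list whose length is a power of two."""
    out = [float(x) for x in signal]
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        det = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + det
        n = half
    return out

def inverse_haar(coeffs):
    """Invert haar(); dropped coefficients are simply zeros."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
        out[:2 * n] = merged
        n *= 2
    return out

def keep_largest(coeff_sets, budget):
    """Keep the `budget` largest-magnitude coefficients over all sets."""
    ranked = sorted(
        ((abs(c), s, i) for s, cs in enumerate(coeff_sets) for i, c in enumerate(cs)),
        reverse=True,
    )
    kept = [[0.0] * len(cs) for cs in coeff_sets]
    for _, s, i in ranked[:budget]:
        kept[s][i] = coeff_sets[s][i]
    return kept

def synopsis_global(streams, budget):
    """One shared budget: coefficients of all streams compete for memory."""
    return keep_largest([haar(s) for s in streams], budget)

def synopsis_local(streams, budget):
    """Equal per-stream budget: each stream is summarized on its own."""
    share = budget // len(streams)
    return [keep_largest([haar(s)], share)[0] for s in streams]

def total_sse(streams, synopses):
    """Assumed global error metric: squared error summed over all streams."""
    return sum(
        (x - y) ** 2
        for s, syn in zip(streams, synopses)
        for x, y in zip(s, inverse_haar(syn))
    )

def top_k_range_sum(synopses, lo, hi, k):
    """Indices of the k streams with the largest approximate sum over [lo, hi]."""
    sums = [(sum(inverse_haar(syn)[lo:hi + 1]), idx) for idx, syn in enumerate(synopses)]
    return [idx for _, idx in sorted(sums, reverse=True)[:k]]

# Illustrative data only: a flat, a volatile, and a step-shaped stream.
streams = [
    [4, 4, 4, 4, 4, 4, 4, 4],
    [1, 9, 2, 8, 3, 7, 4, 6],
    [2, 2, 2, 2, 6, 6, 6, 6],
]
budget = 6
global_err = total_sse(streams, synopsis_global(streams, budget))
local_err = total_sse(streams, synopsis_local(streams, budget))
print(global_err, local_err)
print(top_k_range_sum(synopsis_global(streams, budget), 0, 3, 2))
```

The sketch reconstructs every stream in full to answer a query, which is simple but wasteful. A practical system would compute range sums directly from the retained coefficients and would maintain the synopsis incrementally as new values arrive.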

Index Terms:
Data Stream Management System, Top-K Queries, Wavelet Synopses
Citation:
Hao-Ping Hung, Kun-Ta Chuang, Ming-Syan Chen, "Efficient Process of Top-k Range-Sum Queries over Multiple Streams with Minimized Global Error," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 10, pp. 1404-1419, Oct. 2007, doi:10.1109/TKDE.2007.1070