Incremental Evaluation of Sliding-Window Queries over Data Streams
January 2007 (vol. 19 no. 1)
pp. 57-72
Two research efforts have been conducted to realize sliding-window queries in data stream management systems, namely, query reevaluation and incremental evaluation. In the query reevaluation method, two consecutive windows are processed independently of each other. In contrast, in the incremental evaluation method, the query answer for a window is obtained incrementally from the answer of the preceding window. In this paper, we focus on the incremental evaluation method. Two approaches have been adopted for the incremental evaluation of sliding-window queries, namely, the input-triggered approach and the negative tuples approach. In the input-triggered approach, only the newly inserted tuples flow through the query pipeline, and tuple expiration is based on the timestamps of the newly inserted tuples. In the negative tuples approach, tuple expiration is separated from tuple insertion: a tuple flows through the pipeline for every inserted or expired tuple. The negative tuples approach avoids the unpredictable output delays that result from the input-triggered approach. However, negative tuples double the number of tuples flowing through the query pipeline, thus reducing the pipeline bandwidth. Based on a detailed study of the incremental evaluation pipeline, we classify the incremental query operators into two classes according to whether or not an operator can avoid processing negative tuples. Based on this classification, we present several optimization techniques over the negative tuples approach that aim to reduce the overhead of processing negative tuples while avoiding the output delay of the query answer. A detailed experimental study, based on a prototype system implementation, shows the performance gains of the negative tuples approach over the input-triggered approach when accompanied by the proposed optimizations.
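
As an illustration of the negative tuples idea described in the abstract, the following is a minimal Python sketch, not the paper's implementation. It assumes a time-based window of fixed length, arrivals sorted by timestamp, and a single downstream SUM operator. The names `negative_tuple_stream` and `incremental_sum` are hypothetical. Because this is an offline simulation, each negative tuple is simply stamped with its expiry time (arrival time plus window length); a real system would presumably emit it when the clock reaches that time.

```python
from collections import deque

def negative_tuple_stream(arrivals, window):
    """Turn (timestamp, value) arrivals into signed events.

    Yields (time, sign, value): sign is +1 when a tuple enters the
    window and -1 (a "negative tuple") when it expires.
    Assumes arrivals are sorted by timestamp (illustrative assumption).
    """
    pending = deque()  # tuples currently in the window, oldest first
    for ts, value in arrivals:
        # Emit a negative tuple for every tuple that expired by time ts.
        while pending and pending[0][0] + window <= ts:
            old_ts, old_value = pending.popleft()
            yield (old_ts + window, -1, old_value)
        pending.append((ts, value))
        yield (ts, +1, value)
    # No more arrivals: the remaining tuples still expire eventually.
    while pending:
        old_ts, old_value = pending.popleft()
        yield (old_ts + window, -1, old_value)

def incremental_sum(events):
    """Maintain SUM incrementally from signed events."""
    total = 0
    for time, sign, value in events:
        total += sign * value  # add on insertion, subtract on expiry
        yield (time, total)

if __name__ == "__main__":
    arrivals = [(0, 5), (1, 3), (10, 4)]
    for time, total in incremental_sum(negative_tuple_stream(arrivals, window=4)):
        print(time, total)
```

In this sketch, three arrivals produce six events, which mirrors the doubling of pipeline traffic that the abstract mentions. The sum is corrected at times 4 and 5, when the first two tuples expire, rather than only when the next tuple arrives at time 10, which is the kind of output delay the input-triggered approach would incur.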

Index Terms:
Data stream management systems, pipelined query execution, negative tuples.
Citation:
Thanaa M. Ghanem, Moustafa A. Hammad, Mohamed F. Mokbel, Walid G. Aref, Ahmed K. Elmagarmid, "Incremental Evaluation of Sliding-Window Queries over Data Streams," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 1, pp. 57-72, Jan. 2007, doi:10.1109/TKDE.2007.12