Exploiting Punctuation Semantics in Continuous Data Streams
May/June 2003 (vol. 15 no. 3)
pp. 555-568

Abstract: As most current query processing architectures are already pipelined, it seems logical to apply them to data streams. However, two classes of query operators are impractical for processing long or infinite data streams. Unbounded stateful operators maintain state with no upper bound in size and so run out of memory. Blocking operators read an entire input before emitting a single output and so might never produce a result. We believe that a priori knowledge of a data stream can permit the use of such operators in some cases. We discuss a kind of stream semantics called punctuated streams. Punctuations in a stream mark the end of substreams, allowing us to view an infinite stream as a mixture of finite streams. We introduce three kinds of invariants to specify the proper behavior of operators in the presence of punctuation. Pass invariants define when results can be passed on. Keep invariants define what must be kept in local state to continue successful operation. Propagation invariants define when punctuation can be passed on. We report on our initial implementation and show a strategy for proving that implementations of these invariants are faithful to their relational counterparts.
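
The three invariants can be illustrated with a small sketch. It is not the authors' implementation: the `Punctuation` class, the `punctuated_count` operator, and the `demo` stream are hypothetical names chosen for illustration, and a punctuation is simplified here to "no later tuple will carry this key". Under that assumption, a normally blocking group-by count can emit a group as soon as its punctuation arrives.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple, Union


@dataclass(frozen=True)
class Punctuation:
    """Hypothetical marker: no later tuple in the stream will have this key."""
    key: str


# A stream item is either a (key, value) data tuple or a punctuation.
Item = Union[Tuple[str, int], Punctuation]


def punctuated_count(stream: Iterable[Item]) -> Iterator[Item]:
    """Count tuples per key, emitting a group once punctuation closes it."""
    counts: Dict[str, int] = {}
    for item in stream:
        if isinstance(item, Punctuation):
            # Pass: the group is complete, so its count can be emitted now.
            # Keep: its state is no longer needed, so pop() discards it.
            if item.key in counts:
                yield (item.key, counts.pop(item.key))
            # Propagate: no further output for this key will follow.
            yield item
        else:
            key, _value = item
            counts[key] = counts.get(key, 0) + 1


def demo() -> list:
    """Run the operator over a short, finite example stream."""
    stream = [
        ("a", 1), ("b", 7), ("a", 3),
        Punctuation("a"),
        ("b", 2),
        Punctuation("b"),
    ]
    return list(punctuated_count(stream))
```

In this sketch, state grows only with the number of keys that have not yet been punctuated. A key that never receives punctuation is never emitted, which is the blocking behavior the abstract describes for unpunctuated streams.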

Index Terms:
Continuous queries, stream semantics, continuous data streams, query operators, stream iterators.
Citation:
Peter A. Tucker, David Maier, Tim Sheard, Leonidas Fegaras, "Exploiting Punctuation Semantics in Continuous Data Streams," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 3, pp. 555-568, May-June 2003, doi:10.1109/TKDE.2003.1198390