This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
GrubJoin: An Adaptive, Multi-Way, Windowed Stream Join with Time Correlation-Aware CPU Load Shedding
October 2007 (vol. 19 no. 10)
pp. 1363-1380
Ling Liu, IEEE
Tuple dropping, though commonly used for loadshedding in most data stream operations, is generally inadequatefor multi-way, windowed stream joins. The join output rate canbe unnecessarily reduced because tuple dropping fails to exploitthe time correlations likely to exist among interrelated streams.In this paper, we introduce GrubJoin - an adaptive, multi-way,windowed stream join that effectively performs time correlationawareCPU load shedding. GrubJoin maximizes the output rateby achieving near-optimal window harvesting, which picks onlythe most profitable segments of individual windows for the join.Due mainly to the combinatorial explosion of possible multi-wayjoin sequences involving different window segments, GrubJoinfaces unique challenges that do not exist for binary joins, suchas determining the optimal window harvesting configurationin a time efficient manner and learning the time correlationsamong the streams without introducing overhead. To tacklethese challenges, we formalize window harvesting as an optimizationproblem, develop greedy heuristics to determine nearoptimalwindow harvesting configurations and use approximationtechniques to capture the time correlations. Our experimentalresults show that GrubJoin is vastly superior to tuple droppingwhen time correlations exist and is equally effective when timecorrelations are nonexistent.

[1] A. Arasu, B. Babcock, S. Babu, M. Datar, K. Ito, R. Motwani, I. Nishizawa, U. Srivastava, D. Thomas, R. Varma, and J. Widom, “STREAM: The Stanford Stream Data Manager,” IEEE Data Eng. Bull., vol. 26, 2003.
[2] H. Balakrishnan, M. Balazinska, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, E. Galvez, J. Salz, M. Stonebraker, N. Tatbul, R. Tibbetts, and S. Zdonik, “Retrospective on Aurora,” VLDB J., special issue on data stream processing, 2004.
[3] S. Chandrasekaran, O. Cooper, A. Deshpande, M.J. Franklin, J.M. Hellerstein, W. Hong, S. Krishnamurthy, S.R. Madden, V. Raman, F. Reiss, and M.A. Shah, “TelegraphCQ: Continuous Dataflow Processing for an Uncertain World,” Proc. First Biennial Conf. Innovative Data Systems Research (CIDR '03), 2003.
[4] Streambase Systems, http:/www.streambase.com/, May 2005.
[5] N. Jain, L. Amini, H. Andrade, R. King, Y. Park, P. Selo, and C. Venkatramani, “Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core,” Proc. 25th ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '06), 2006.
[6] K.-L. Wu, P.S. Yu, B. Gedik, K.W. Hildrum, C. Aggarwal, E. Bouillet, W. Fan, D.A. George, X. Gu, G. Luo, and H. Wang, “Challenges and Experience in Prototyping a Multi-Modal Stream Analytic and Monitoring Application on System S,” Technical Report RC24199, IBM T.J. Watson Research, 2007.
[7] M.A. Hammad, W.G. Aref, and A.K. Elmagarmid, “Stream Window Join: Tracking Moving Objects in Sensor-Network Databases,” Proc. 15th Int'l Conf. Scientific and Statistical Database Management (SSDBM '03), 2003.
[8] B. Gedik, K.-L. Wu, P.S. Yu, and L. Liu, “Adaptive Load Shedding for Windowed Stream Joins,” Proc. 14th ACM Conf. Information and Knowledge Management (CIKM '05), 2005.
[9] A.M. Ayad and J.F. Naughton, “Static Optimization of Conjunctive Queries with Sliding Windows over Infinite Streams,” Proc. 23rd ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '04), 2004.
[10] N. Tatbul, U. Cetintemel, S. Zdonik, M. Cherniack, and M. Stonebraker, “Load Shedding in a Data Stream Manager,” Proc. 29th Int'l Conf. Very Large Data Bases (VLDB '03), 2003.
[11] B. Gedik, K.-L. Wu, P.S. Yu, and L. Liu, “A Load Shedding Framework and Optimizations for m-Way Windowed Stream Joins,” Proc. 23rd IEEE Int'l Conf. Data Eng. (ICDE '07), 2007.
[12] U. Srivastava and J. Widom, “Memory-Limited Execution of Windowed Stream Joins,” Proc. 30th Int'l Conf. Very Large Data Bases (VLDB '04), 2004.
[13] F. Li, C. Chang, G. Kollios, and A. Bestavros, “Characterizing and Exploiting Reference Locality in Data Stream Applications,” Proc. 22nd IEEE Int'l Conf. Data Eng. (ICDE '06), 2006.
[14] S. Helmer, T. Westmann, and G. Moerkotte, “Diag-Join: An Opportunistic Join Algorithm for 1:N Relationships,” Proc. 24th Int'l Conf. Very Large Data Bases (VLDB '98), 1998.
[15] S.D. Viglas, J.F. Naughton, and J. Burger, “Maximizing the Output Rate of m-Way Join Queries over Streaming Information Sources,” Proc. 29th Int'l Conf. Very Large Data Bases (VLDB '03), 2003.
[16] N. Tatbul and S. Zdonik, “A Subset-Based Load Shedding Approach for Aggregation Queries over Data Streams,” Proc. 32nd Int'l Conf. Very Large Data Bases (VLDB '06), 2006.
[17] C. Pu and L. Singaravelu, “Fine-Grain Adaptive Compression in Dynamically Variable Networks,” Proc. 25th IEEE Int'l Conf. Distributed Computing Systems (ICDCS '05), 2005.
[18] L. Golab, S. Garg, and M.T. Ozsu, “On Indexing Sliding Windows over Online Data Streams,” Proc. Ninth Int'l Conf. Extending Database Technology (EDBT '04), 2004.
[19] J. Kang, J. Naughton, and S. Viglas, “Evaluating Window Joins over Unbounded Streams,” Proc. 19th IEEE Int'l Conf. Data Eng. (ICDE '03), 2003.
[20] S. Chaudhuri, R. Motwani, and V. Narasayya, “On Random Sampling over Joins,” Proc. 19th ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '99), 1999.
[21] T. Ibaraki and T. Kameda, “On the Optimal Nesting Order for Computing N-Relational Joins,” ACM Trans. Database Systems, vol. 9, no. 3, pp. 482-502, 1984.
[22] A. Das, J. Gehrke, and M. Riedewald, “Approximate Join Processing over Data Streams,” Proc. 22nd ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '03), 2003.
[23] J. Xie, J. Yang, and Y. Chen, “On Joining and Caching Stochastic Streams,” Proc. 24th ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '05), 2005.
[24] S. Chandrasekaran and M.J. Franklin, “Remembrance of Streams Past: Overload-Sensitive Management of Archived Streams,” Proc. 30th Int'l Conf. Very Large Data Bases (VLDB '04), 2004.

Index Terms:
Stream Joins, Query processing, Load Shedding
Citation:
Buğra Gedik, Kun-Lung Wu, Philip S. Yu, Ling Liu, "GrubJoin: An Adaptive, Multi-Way, Windowed Stream Join with Time Correlation-Aware CPU Load Shedding," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 10, pp. 1363-1380, Oct. 2007, doi:10.1109/TKDE.2007.190630
Usage of this product signifies your acceptance of the Terms of Use.