Issue No.10 - October (2007 vol.19)
pp: 1363-1380
Ling Liu, IEEE
ABSTRACT
Tuple dropping, though commonly used for load shedding in most data stream operations, is generally inadequate for multi-way, windowed stream joins. The join output rate can be unnecessarily reduced because tuple dropping fails to exploit the time correlations likely to exist among interrelated streams. In this paper, we introduce GrubJoin - an adaptive, multi-way, windowed stream join that effectively performs time correlation-aware CPU load shedding. GrubJoin maximizes the output rate by achieving near-optimal window harvesting, which picks only the most profitable segments of individual windows for the join. Due mainly to the combinatorial explosion of possible multi-way join sequences involving different window segments, GrubJoin faces unique challenges that do not exist for binary joins, such as determining the optimal window harvesting configuration in a time efficient manner and learning the time correlations among the streams without introducing overhead. To tackle these challenges, we formalize window harvesting as an optimization problem, develop greedy heuristics to determine near-optimal window harvesting configurations and use approximation techniques to capture the time correlations. Our experimental results show that GrubJoin is vastly superior to tuple dropping when time correlations exist and is equally effective when time correlations are nonexistent.
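As a rough illustration of the window harvesting idea described above, the following toy sketch splits each stream's window into segments, assumes a per-segment "profitability" score is already available, and greedily keeps the highest-scoring segments within a processing budget. This is not the paper's algorithm: the function name `greedy_harvest`, the score inputs, and the flat budget are all hypothetical, and the sketch ignores join orders and per-segment costs, which the abstract names as the source of the real difficulty.

```python
from typing import Dict, List


def greedy_harvest(
    segment_scores: Dict[str, List[float]], budget: int
) -> Dict[str, List[int]]:
    """Toy greedy selection of window segments (illustrative assumption only).

    segment_scores maps a stream name to hypothetical profitability scores,
    one per window segment. budget is the total number of segments that can
    be processed across all windows.
    """
    # Flatten to (score, stream, segment index) candidates.
    candidates = [
        (score, stream, idx)
        for stream, scores in segment_scores.items()
        for idx, score in enumerate(scores)
    ]
    # Highest score first; ties broken by stream name, then segment index.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    chosen: Dict[str, List[int]] = {stream: [] for stream in segment_scores}
    for _score, stream, idx in candidates[:budget]:
        chosen[stream].append(idx)
    for indices in chosen.values():
        indices.sort()
    return chosen


# Hypothetical scores: matches for S1 cluster in its middle segment,
# matches for S2 cluster in its most recent segment.
scores = {"S1": [0.1, 0.7, 0.2], "S2": [0.6, 0.05, 0.05]}
selection = greedy_harvest(scores, budget=3)
```

With these made-up scores and a budget of three segments, the selection keeps segments 1 and 2 of S1 and segment 0 of S2. Under the same budget, dropping tuples uniformly would spend effort on every segment, including the low-scoring ones.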
INDEX TERMS
Stream Joins, Query processing, Load Shedding
CITATION
Kun-Lung Wu, Philip S. Yu, Ling Liu, "GrubJoin: An Adaptive, Multi-Way, Windowed Stream Join with Time Correlation-Aware CPU Load Shedding", IEEE Transactions on Knowledge & Data Engineering, vol.19, no. 10, pp. 1363-1380, October 2007, doi:10.1109/TKDE.2007.190630