GrubJoin: An Adaptive, Multi-Way, Windowed Stream Join with Time Correlation-Aware CPU Load Shedding
Issue No. 10 - October (2007 vol. 19)
Buğra Gedik, IEEE
Kun-Lung Wu, IEEE
Philip S. Yu, IEEE
Ling Liu, IEEE
Tuple dropping, though commonly used for load shedding in most data stream operations, is generally inadequate for multi-way, windowed stream joins. The join output rate can be unnecessarily reduced because tuple dropping fails to exploit the time correlations likely to exist among interrelated streams. In this paper, we introduce GrubJoin, an adaptive, multi-way, windowed stream join that effectively performs time correlation-aware CPU load shedding. GrubJoin maximizes the output rate by achieving near-optimal window harvesting, which picks only the most profitable segments of individual windows for the join. Due mainly to the combinatorial explosion of possible multi-way join sequences involving different window segments, GrubJoin faces unique challenges that do not exist for binary joins. These include determining the optimal window harvesting configuration in a time-efficient manner and learning the time correlations among the streams without introducing overhead. To tackle these challenges, we formalize window harvesting as an optimization problem, develop greedy heuristics to determine near-optimal window harvesting configurations, and use approximation techniques to capture the time correlations. Our experimental results show that GrubJoin is vastly superior to tuple dropping when time correlations exist and is equally effective when time correlations are nonexistent.
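The abstract describes window harvesting only at a high level. The Python sketch below illustrates the idea under simplifying assumptions that are ours, not the paper's. Each window is split into equal-size segments with known match densities; GrubJoin itself learns these time correlations approximately. Join cost is modeled as the product of the scanned fractions of each window, and matches are assumed independent across windows. The function `harvest` and its greedy rule are hypothetical illustrations, not GrubJoin's actual heuristic. The rule repeatedly drops the least profitable segment whose removal costs the least output (as a log ratio) per unit of CPU saved (also as a log ratio).

```python
import math
from typing import List, Tuple


def harvest(densities: List[List[float]], budget: float) -> Tuple[List[List[int]], float, float]:
    """Hypothetical greedy window harvesting under a CPU budget (0 < budget <= 1).

    densities[w][j] is the assumed fraction of join matches falling in segment j
    of window w (each inner list sums to 1). Returns the selected segment indices
    per window, the modeled cost fraction, and the modeled output fraction.
    """
    selected = [list(range(len(d))) for d in densities]

    def cost() -> float:
        # Nested-loop model: work scales with the product of scanned fractions.
        return math.prod(len(s) / len(d) for s, d in zip(selected, densities))

    def output() -> float:
        # Independence assumption: retained matches multiply across windows.
        return math.prod(sum(d[j] for j in s) for s, d in zip(selected, densities))

    while cost() > budget:
        best = None  # (score, window index, segment index)
        for w, (s, d) in enumerate(zip(selected, densities)):
            if len(s) <= 1:
                continue  # keep at least one segment per window
            j = min(s, key=lambda idx: d[idx])  # least profitable selected segment
            kept = sum(d[i] for i in s)
            # Output lost vs. cost saved, both measured as log ratios.
            if kept == 0:
                loss = 0.0
            elif kept - d[j] <= 0:
                loss = math.inf
            else:
                loss = math.log(kept / (kept - d[j]))
            saving = math.log(len(s) / (len(s) - 1))
            score = loss / saving
            if best is None or score < best[0]:
                best = (score, w, j)
        if best is None:
            break  # every window is down to one segment; budget unreachable
        _, w, j = best
        selected[w].remove(j)
    return selected, cost(), output()


if __name__ == "__main__":
    # Time-correlated streams: most matches sit in the most recent segments.
    correlated = [[0.7, 0.2, 0.05, 0.05], [0.6, 0.3, 0.05, 0.05]]
    sel, c, out = harvest(correlated, budget=0.25)
    print(sel, c, round(out, 2))  # [[0, 1], [0, 1]] 0.25 0.81
```

Under the same toy model, random tuple dropping would retain roughly the budget fraction of the output (about 0.25 here). Harvesting the dense segments retains about 0.81 in the correlated example. With uniform densities such as `[[0.25] * 4, [0.25] * 4]`, the sketch retains an output fraction equal to its cost fraction, matching tuple dropping. This mirrors, only qualitatively, the abstract's claim about correlated and uncorrelated streams.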
Stream joins, query processing, load shedding
B. Gedik, K.-L. Wu, P. S. Yu and L. Liu, "GrubJoin: An Adaptive, Multi-Way, Windowed Stream Join with Time Correlation-Aware CPU Load Shedding," in IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 10, pp. 1363-1380, Oct. 2007.