Issue No. 03 - May/June (2003 vol. 15)

ISSN: 1041-4347

pp: 569-572

Divesh Srivastava, IEEE Computer Society

Flip Korn, IEEE Computer Society

Johannes Gehrke, IEEE Computer Society

ABSTRACT

<p>In many applications such as IP network management, data arrives in streams and queries over those streams need to be processed online using limited storage. Correlated-sum (CS) aggregates are a natural class of queries formed by composing basic aggregates on <tmath>(x,y)</tmath> pairs and are of the form <tmath>SUM\{g(y) : x \leq f(AGG(x))\}</tmath>, where <tmath>AGG(x)</tmath> can be any basic aggregate and <tmath>f()</tmath>, <tmath>g()</tmath> are user-specified functions. CS-aggregates cannot be computed exactly in one pass through a data stream using limited storage; hence, we study the problem of computing approximate CS-aggregates. We guarantee a priori error bounds when <tmath>AGG(x)</tmath> can be computed in limited space (e.g., <b>MIN</b>, <b>MAX</b>, <b>AVG</b>), using two variants of Greenwald and Khanna's summary structure for the approximate computation of quantiles. Using real data sets, we experimentally demonstrate that an adaptation of the quantile summary structure uses much less space, and is significantly faster, than a more direct use of the quantile summary structure, for the same a posteriori error bounds. Finally, we prove that, when <tmath>AGG(x)</tmath> is a quantile (which cannot be computed over a data stream in limited space), the error of a CS-aggregate can be arbitrarily large.</p>
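
To make the query form in the abstract concrete, the following is a minimal sketch of the exact, two-pass evaluation of a CS-aggregate over data held fully in memory. It is not the paper's streaming method, which relies on quantile summaries; it only illustrates the definition. The function name `correlated_sum`, its parameters, and the sample data are assumptions made for illustration.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

Pair = Tuple[float, float]  # one (x, y) pair


def correlated_sum(
    pairs: Sequence[Pair],
    agg: Callable[[Sequence[float]], float],
    f: Callable[[float], float] = lambda a: a,
    g: Callable[[float], float] = lambda y: y,
) -> float:
    """Exact SUM{ g(y) : x <= f(AGG(x)) } using two passes (hypothetical helper)."""
    # Pass 1: the basic aggregate over all x values fixes the threshold.
    threshold = f(agg([x for x, _ in pairs]))
    # Pass 2: sum g(y) over the pairs whose x falls at or below the threshold.
    return sum(g(y) for x, y in pairs if x <= threshold)


# Illustrative data only.
pairs = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (6.0, 40.0)]

# AGG = AVG, f = identity, g = identity: AVG(x) = 3.0, so the first three pairs qualify.
result_avg = correlated_sum(pairs, agg=mean)

# AGG = MIN, f(a) = 2a: the threshold is 2.0, so the first two pairs qualify.
result_min = correlated_sum(pairs, agg=min, f=lambda a: 2 * a)

# g(y) = 1 turns the sum into a count; with AGG = MAX every pair qualifies.
count_all = correlated_sum(pairs, agg=max, g=lambda y: 1)
```

The second pass needs the threshold from the first, and that threshold depends on every x in the input. This is why a single pass with limited storage cannot give the exact answer and the paper turns to approximation.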

INDEX TERMS

Correlated aggregates, data streams, approximation, summary structures, a priori error bounds, IP network management.

CITATION

S. Muthukrishnan, Divesh Srivastava, Flip Korn, Johannes Gehrke, Rohit Ananthakrishna, Abhinandan Das, "Efficient Approximation of Correlated Sums on Data Streams",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 15, no. 3, pp. 569-572, May/June 2003, doi:10.1109/TKDE.2003.1198391