18th International Conference on Data Engineering (ICDE'02)
Approximating a Data Stream for Querying and Estimation: Algorithms and Performance Evaluation
San Jose, California
February 26-March 01
ISBN: 0-7695-1531-2
Obtaining fast and good quality approximations to data distributions is a problem of central interest to database management. A variety of popular database applications including, approximate querying, similarity searching and data mining in most application domains, rely on such good quality approximations. Histogram based approximation is a very popular method in database theory and practice to succinctly represent a data distribution in a space efficient manner.In this paper, we place the problem of histogram construction into perspective and we generalize it by raising the requirement of a finite data set and/or known data set size. We consider the case of an infinite data set on which data arrive continuously forming an infinite data stream. In this context, we present the first single pass algorithms capable of constructing histograms of provable good quality. We present algorithms for the fixed window variant of the basic histogram construction problem, supporting incremental maintenance of the histograms. The proposed algorithms trade accuracy for speed and allow for a graceful tradeoff between the two, based on application requirements.In the case of approximate queries on infinite data streams, we present a detailed experimental evaluation comparing our algorithms with other applicable techniques using real data sets, demonstrating the superiority of our proposal.
Index Terms:
Data Streams, Histograms, Queries, Incremental Algorithms
Citation:
Sudipto Guha, Nick Koudas, "Approximating a Data Stream for Querying and Estimation: Algorithms and Performance Evaluation," icde, pp.0567, 18th International Conference on Data Engineering (ICDE'02), 2002
Usage of this product signifies your acceptance of the
Terms of Use.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||