2009 10th ACIS International Conference on Software Engineering, Artificial Intelligences, Networking and Parallel/Distributed Computing
A New Method for Estimating the Number of Distinct Values over Data Streams
Catholic University of Daegu, Daegu, Korea
May 27-May 29
ISBN: 978-0-7695-3642-2
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/SNPD.2009.39
Virtually all query optimization methods in a data stream management system (DSMS) require a means of estimating the number of distinct values of an attribute in a data stream. An accurate estimate of the number of distinct values can be crucial for selecting a good query plan. Because data streams are continuous, real-time, and unbounded, they cannot be stored in full in limited memory, which makes estimating the number of distinct values over data streams a harder problem. In this paper, drawing on the properties of data streams and an analysis of the BloomFilter, we present a new estimation method based on a circular BloomFilter that uses limited space. Storing the distinct values in the circular BloomFilter effectively solves the problem that data streams cannot be stored in limited memory. Theoretical analysis and experimental results indicate that the estimation method is feasible and highly effective.
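To make the underlying idea concrete, the following is a minimal sketch of counting distinct values in a stream with an ordinary Bloom filter in fixed memory. It is not the circular BloomFilter of the paper, whose construction the abstract does not describe. The class name `BloomDistinctCounter`, the parameters `num_bits` and `num_hashes`, and the double-hashing scheme built on SHA-256 are all assumptions made for illustration.

```python
import hashlib


class BloomDistinctCounter:
    """Illustrative distinct-value counter backed by a plain Bloom filter.

    Hypothetical helper, not the paper's circular BloomFilter.
    """

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits        # size of the bit array (assumed value)
        self.num_hashes = num_hashes    # bit positions per value (assumed value)
        self.bits = bytearray((num_bits + 7) // 8)
        self.estimate = 0               # running estimate of distinct values

    def _positions(self, value):
        # Derive num_hashes bit positions from one SHA-256 digest
        # using double hashing: h1 + i * h2 (mod num_bits).
        digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value):
        """Record a stream item; return True if it was counted as new."""
        seen = True
        for pos in self._positions(value):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                seen = False            # at least one bit was unset: value is new
                self.bits[byte] |= mask
        if not seen:
            self.estimate += 1
        return not seen


if __name__ == "__main__":
    counter = BloomDistinctCounter()
    for item in [3, 1, 3, 7, 1, 9]:
        counter.add(item)
    print(counter.estimate)
```

Because a Bloom filter produces false positives but never false negatives, a counter of this kind can only undercount, and the error grows as the bit array fills. A fixed-size filter therefore saturates on an unbounded stream, which is the limitation that a scheme for reusing the filter space has to address.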
Index Terms:
BloomFilter, Data Streams, the Number of Distinct Values, circular BloomFilter
Citation:
Longjiang Guo, Yingshu Li, Meirui Ren, Zhongzhao Zhang, "A New Method for Estimating the Number of Distinct Values over Data Streams," snpd, pp.71-76, 2009 10th ACIS International Conference on Software Engineering, Artificial Intelligences, Networking and Parallel/Distributed Computing, 2009