18th International Conference on Data Engineering (ICDE) (2002)
San Jose, California
Feb. 26, 2002 to Mar. 1, 2002
ISBN: 0-7695-1531-2
pp: 0628
Domenico Saccà, DEIS-UNICAL & ISI-CNR
Francesco Buccafurri, University of Reggio Calabria
Luigi Pontieri, DEIS-UNICAL & ISI-CNR
Domenico Rosaci, University of Reggio Calabria
Histograms summarize the contents of relations in a small number of buckets and are used to estimate query result sizes. Several techniques (e.g., MaxDiff and V-Optimal) have been proposed for determining bucket boundaries that yield better estimates. This paper proposes storing 32 bits of additional information per bucket (a 4-level tree index) that encode approximate cumulative frequencies at 7 internal intervals of the bucket. Both theoretical analysis and experimental results show that the 4-level tree index provides the best frequency estimation inside a bucket. The index is then added to two well-known techniques for constructing histograms, MaxDiff and V-Optimal, yielding substantial improvements over the original methods in frequency estimation over inter-bucket ranges.
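The idea of a 32-bit tree index per bucket can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the bucket is split into 8 equal-width sub-intervals, that each of the 7 internal tree nodes stores the share of its range falling in its left half, and that nodes at the three internal levels use 6, 5 and 4 bits respectively (1*6 + 2*5 + 4*4 = 32). The abstract does not state this bit allocation, and the function names are hypothetical.

```python
from typing import List

# Assumed bits per node at the three internal tree levels: 6 + 2*5 + 4*4 = 32.
BITS = (6, 5, 4)


def _quantize(part: float, whole: float, bits: int) -> int:
    """Encode the ratio part/whole as an integer code in [0, 2**bits - 1]."""
    if whole == 0:
        return 0
    return round(part / whole * ((1 << bits) - 1))


def _dequantize(code: int, whole: float, bits: int) -> float:
    """Turn a code back into an approximate share of `whole`."""
    return code / ((1 << bits) - 1) * whole


def encode(freqs: List[int]) -> int:
    """Pack the frequencies of 8 sub-intervals into one 32-bit word."""
    assert len(freqs) == 8
    word = 0
    for level, bits in enumerate(BITS):
        width = 8 >> level  # sub-intervals covered by a node at this level
        for lo in range(0, 8, width):
            whole = sum(freqs[lo:lo + width])
            left = sum(freqs[lo:lo + width // 2])
            word = (word << bits) | _quantize(left, whole, bits)
    return word


def decode(word: int, total: float) -> List[float]:
    """Approximate the 8 sub-interval frequencies from the word and bucket total."""
    # Unpack codes from the least significant end, i.e. deepest level first.
    codes: List[List[int]] = []
    for level in reversed(range(len(BITS))):
        bits = BITS[level]
        level_codes = []
        for _ in range(1 << level):
            level_codes.append(word & ((1 << bits) - 1))
            word >>= bits
        codes.insert(0, level_codes[::-1])
    # Walk the tree top-down, splitting each range into left and right parts.
    sums = [float(total)]
    for level, bits in enumerate(BITS):
        nxt: List[float] = []
        for whole, code in zip(sums, codes[level]):
            left = _dequantize(code, whole, bits)
            nxt += [left, whole - left]
        sums = nxt
    return sums


def estimate_prefix(word: int, total: float, k: int) -> float:
    """Approximate cumulative frequency of the first k sub-intervals."""
    return sum(decode(word, total)[:k])
```

With a skewed bucket such as `[10, 0, 0, 0, 0, 0, 0, 30]`, the decoded estimate for the first sub-interval stays close to 10, whereas assuming a uniform spread of the bucket total would give 40 / 8 = 5.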
histograms, range query estimation, OLAP queries
doi:10.1109/ICDE.2002.994780
