loading...
 This Article 
   
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
16th International Conference on Data Engineering (ICDE'00)
San Diego, California
February 28-March 03
ISBN: 0-7695-0506-6
Caetano Traina Jr., University of Sao Paulo at San Carlos
Agma J.M. Traina, University of Sao Paulo at San Carlos
Christos Faloutsos, Carnegie Mellon University
This paper discusses the problem of selectivity estimation for range queries in metric datasets, which include vector, or dimensional, datasets as a special case. The main contribution of this paper is that, surprisingly, many different real datasets follow a "power law". From this observation we derive an analysis for the distance distribution of metric datasets. This is the first analysis of distance distributions for real metric datasets.We called the exponent of our power law as "distance exponent". We show that it plays a relevant role for the analysis of real, metric datasets. Specifically, we show (a) how to exploit the distance exponent to derive formulas for selectivity estimation of range queries and (b) how to compute it quickly from a metric index tree.We performed several experiments on many real datasets (road intersections of U.S. counties, vectors characteristics extracted from face matching systems, sets of words, distance matrixes) and synthetic datasets (Sierpinsky triangle, a 2-dimensional uniform distribution and a 2-dimensional line). Our selectivity estimation formulas are accurate, within relative error from 4% to 17%, and always within one standard deviation from the analytical results. Moreover, we present also a quick algorithm to estimate the "distance exponent", which gives good accuracy and saves orders of magnitude in computation time.
Index Terms:
Selectivity estimation, metric access methods (MAM), metric trees, multimedia databases
Citation:
Caetano Traina Jr., Agma J.M. Traina, Christos Faloutsos, "Distance Exponent: A New Concept for Selectivity Estimation in Metric Trees," icde, pp.195, 16th International Conference on Data Engineering (ICDE'00), 2000
Usage of this product signifies your acceptance of the Terms of Use.