Scientific and Statistical Database Management, International Conference on (1997)

Olympia, WA

Aug. 11, 1997 to Aug. 13, 1997

ISBN: 0-8186-7952-2

pp: 30

Yun-Wu Huang, IBM T. J. Watson Research Center

Ning Jing, Changsha Institute of Technology

Elke A. Rundensteiner, Worcester Polytechnic Institute

ABSTRACT

The development of a cost model for predicting the performance of spatial joins has been identified in the literature as an important and difficult problem. In this paper, we present the first cost model that can predict the performance of spatial joins using R-trees. Based on two existing R-trees (join targets), our model first estimates the number of expected I/Os for the join process by assuming a zero buffer size. Our method for this estimation extends the cost model for R-tree window queries (developed by Kamel and Faloutsos and by Pagel et al.) to also handle spatial joins (which are more complex). In the context of spatial join processing, this number of zero-buffer expected I/Os is not practical for performance prediction in a buffered environment. To model the buffer impact, we use an (exponential) distribution function to measure the probability that a buffer-less I/O would cause a page fault in a buffered environment. Based on this probability and the zero-buffer expected I/O cost, the estimated number of I/Os for an R-tree join can then be computed. The comparisons between the predictions from our cost model and the actual results from our experiments based on real GIS maps show that the average relative error ratio is about 10% with a maximum of about 20% for a wide range of buffer sizes. Therefore, our model is a useful tool for the query optimization of spatial join queries.
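As a rough illustration of the two stages described in the abstract, the sketch below first computes the classical expected node-access count for an R-tree window query (the well-known sum of (w + qx)(h + qy) over node rectangles in the unit square, ignoring boundary effects) and then scales a zero-buffer I/O count by an exponential page-fault probability. The abstract does not give the paper's join formula or the exact form of its exponential function, so `buffered_io`, its `decay` parameter, and the ratio `buffer_pages / total_pages` are purely hypothetical placeholders, not the authors' model.

```python
import math

def window_query_accesses(node_extents, qx, qy):
    """Expected R-tree node accesses for a qx-by-qy window query.

    node_extents: iterable of (width, height) of node MBRs, with all
    coordinates normalized to the unit square. Boundary effects are ignored.
    """
    return sum((w + qx) * (h + qy) for w, h in node_extents)

def buffered_io(zero_buffer_io, buffer_pages, total_pages, decay=1.0):
    """Scale a zero-buffer I/O estimate by a page-fault probability.

    ASSUMPTION: the fault probability exp(-decay * buffer_pages / total_pages)
    is an illustrative stand-in; the paper's actual function is not stated
    in the abstract.
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    fault_prob = math.exp(-decay * buffer_pages / total_pages)
    return zero_buffer_io * fault_prob

if __name__ == "__main__":
    nodes = [(0.1, 0.2), (0.3, 0.1)]  # hypothetical node MBR extents
    print(window_query_accesses(nodes, 0.1, 0.1))
    print(buffered_io(100.0, buffer_pages=50, total_pages=50))
```

With a buffer of zero pages the placeholder fault probability is 1, so the estimate reduces to the zero-buffer count, which matches the qualitative behaviour the abstract describes.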

CITATION

E. A. Rundensteiner, Y. Huang and N. Jing, "A Cost Model for Estimating the Performance of Spatial Joins Using R-trees," *Scientific and Statistical Database Management, International Conference on (SSDBM)*, Olympia, WA, 1997, pp. 30.

doi:10.1109/SSDM.1997.621148
