Analysis of Range Queries and Self-Spatial Join Queries on Real Region Datasets Stored Using an R-Tree
September/October 2000 (vol. 12 no. 5)
pp. 751-762

Abstract: In this paper, we study the node distribution of an R-tree storing region data, such as islands, lakes, or human-inhabited areas. We show that real region datasets are packed in an R-tree into minimum bounding rectangles (MBRs) whose area distribution follows the same power law, named REGAL (REGion Area Law), as that of the regions themselves. Moreover, these MBRs are in turn packed into MBRs following the same law, and so on iteratively, up to the root of the R-tree. Based on this observation, we are able to accurately estimate the search effort for range queries, using a small number of easy-to-retrieve parameters. Furthermore, since our analysis exploits, through a realistic mathematical model, the proximity relations existing among the regions in the dataset, we show how to use our model to predict the selectivity of a self-spatial join query posed on the dataset. Experiments on a variety of real datasets (islands, lakes, human-inhabited areas) show that our estimations are accurate, with a geometric average relative error ranging from 22 percent to 32 percent for the search effort of a range query, and from 14 percent to 34 percent for the selectivity of a self-spatial join query. This is significantly better than a naive model based on a uniformity assumption, which gives rise to a geometric average relative error of up to 270 percent and up to 85 percent for the two problems, respectively.
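
The abstract does not spell out the naive uniformity-based model it compares against. As a rough illustration only, the sketch below uses a well-known estimate from the R-tree analysis literature: if the query window's position is uniformly distributed over the unit square, a node whose MBR has width w and height h is visited with probability about (w + qx)(h + qy), so the expected search effort is the sum of these terms over all nodes. The function name, the input layout, and the choice to ignore boundary effects are assumptions made for this example, not details taken from the paper.

```python
from typing import Iterable, Tuple


def expected_node_accesses(
    mbr_sizes: Iterable[Tuple[float, float]], qx: float, qy: float
) -> float:
    """Uniformity-based estimate of R-tree nodes visited by a range query.

    mbr_sizes: (width, height) of every node MBR, normalized to the unit square.
    qx, qy:    width and height of the query window, in the same units.

    Assumes the query position is uniform over the unit square and
    ignores boundary effects (a hypothetical simplification).
    """
    return sum((w + qx) * (h + qy) for w, h in mbr_sizes)


# Two node MBRs and a 0.1 x 0.1 query window.
estimate = expected_node_accesses([(0.1, 0.2), (0.3, 0.1)], qx=0.1, qy=0.1)
print(estimate)
```

An estimate of this kind needs the extent of every MBR in the tree, whereas the abstract states that the proposed model relies on a small number of easy-to-retrieve parameters.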

Index Terms:
R-tree, region dataset, I/O complexity, selectivity, window query, self-spatial join.
Guido Proietti, Christos Faloutsos, "Analysis of Range Queries and Self-Spatial Join Queries on Real Region Datasets Stored Using an R-Tree," IEEE Transactions on Knowledge and Data Engineering, vol. 12, no. 5, pp. 751-762, Sept.-Oct. 2000, doi:10.1109/69.877506