Toward an Accurate Analysis of Range Queries on Spatial Data
March/April 2003 (vol. 15 no. 2)
pp. 305-323

Abstract: Analysis of range queries on spatial (multidimensional) data is both important and challenging. Most previous analyses have made simplifying assumptions about the data sets and/or queries to keep the analysis tractable and, as a result, may not be universally applicable. This paper proposes a set of five analysis techniques to estimate the selectivity of a range query and the number of index nodes accessed in serving it. The underlying philosophy behind these techniques is to maintain an auxiliary data structure, called a density file, which is created at a one-time cost and can be quickly consulted when a query is given. The schemes differ in what information is kept in the density file, how it is maintained, and how it is looked up. One of the proposed schemes, called Cumulative Density (CD), is shown to give very accurate results (usually less than 5 percent error) on a diverse suite of point and rectangular data sets, both uniform and skewed, and over a wide range of query window parameters. The estimation takes a constant amount of time, typically less than 1 percent of the time it would take to execute the query, regardless of data set or query window parameters.
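The abstract does not give the layout of the density file, so the following is only a rough sketch of the general idea of a precomputed cumulative table that is consulted in constant time. It assumes 2-D point data in a square extent and a uniform grid, and it snaps the query window to cell boundaries. The function names `build_density_file` and `estimate_selectivity`, the grid resolution, and the snapping rule are all illustrative assumptions, not the paper's CD scheme.

```python
def build_density_file(points, cells, extent=1.0):
    """One-time cost: return a (cells+1) x (cells+1) table of cumulative counts.

    cum[i][j] holds the number of points in grid cells [0, i) x [0, j).
    Assumption: points lie in the square [0, extent] x [0, extent].
    """
    counts = [[0] * cells for _ in range(cells)]
    for x, y in points:
        i = min(int(x / extent * cells), cells - 1)
        j = min(int(y / extent * cells), cells - 1)
        counts[i][j] += 1

    cum = [[0] * (cells + 1) for _ in range(cells + 1)]
    for i in range(cells):
        for j in range(cells):
            cum[i + 1][j + 1] = (counts[i][j] + cum[i][j + 1]
                                 + cum[i + 1][j] - cum[i][j])
    return cum


def estimate_selectivity(cum, x_lo, y_lo, x_hi, y_hi, extent=1.0):
    """Estimate the number of points in a query window with four table lookups.

    Assumption: window edges are snapped to the nearest cell boundary,
    so the estimate is exact only for cell-aligned windows.
    """
    cells = len(cum) - 1

    def snap(v):
        return max(0, min(cells, int(v / extent * cells + 0.5)))

    i0, j0, i1, j1 = snap(x_lo), snap(y_lo), snap(x_hi), snap(y_hi)
    return cum[i1][j1] - cum[i0][j1] - cum[i1][j0] + cum[i0][j0]


points = [(0.1, 0.1), (0.2, 0.3), (0.6, 0.6), (0.9, 0.1), (0.7, 0.8)]
density = build_density_file(points, cells=4)
print(estimate_selectivity(density, 0.0, 0.0, 0.5, 0.5))
```

The lookup cost is independent of both the data set size and the window size, which is the property the abstract attributes to the estimation step; the accuracy of a sketch like this depends on the grid resolution and on how the window edges fall relative to cell boundaries.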

Index Terms:
Spatial data, range query, selectivity estimation, node access estimation, histogram-based estimation, R-trees.
Ning An, Ji Jin, Anand Sivasubramaniam, "Toward an Accurate Analysis of Range Queries on Spatial Data," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 2, pp. 305-323, March-April 2003, doi:10.1109/TKDE.2003.1185836