Performance Analysis of R*-Trees with Arbitrary Node Extents
June 2004 (vol. 16 no. 6)
pp. 653-668

Abstract—Existing analysis for R-trees is inadequate for several traditional and emerging applications including, for example, temporal, spatio-temporal, and multimedia databases because it is based on the assumption that the extents of a node are identical on all dimensions, which is not satisfied in these domains. In this paper, we propose analytical models that can accurately predict R*-tree performance without this assumption. Our derivation is based on the novel concept of extent regression function, which computes the node extents as a function of the number of node splits. Detailed experimental evaluation reveals that the proposed models are accurate, even in cases where previous methods fail completely.

Index Terms:
Database, spatial database, R-tree, cost model.
Citation:
Yufei Tao, Dimitris Papadias, "Performance Analysis of R*-Trees with Arbitrary Node Extents," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 6, pp. 653-668, June 2004, doi:10.1109/TKDE.2004.13