
C. Traina, Jr., A. Traina, C. Faloutsos, B. Seeger, "Fast Indexing and Visualization of Metric Data Sets using Slim-Trees," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 2, pp. 244-260, March/April 2002.  
Publisher: IEEE Computer Society, Los Alamitos, CA, USA  
ISSN: 1041-4347  
DOI: http://doi.ieeecomputersociety.org/10.1109/69.991715  
Keywords: metric databases, metric access methods, index structures, multimedia databases, selectivity estimation, similarity search  

Many recent database applications must deal with similarity queries. For such applications, it is important to measure the similarity between two objects using the distance between them. Focusing on this problem, this paper proposes the Slim-tree, a new dynamic tree for organizing metric data sets in pages of fixed size. The Slim-tree uses the triangle inequality to prune distance calculations needed to answer similarity queries over objects in metric spaces. The proposed insertion algorithm uses new policies to select the nodes where incoming objects are stored. When a node overflows, the Slim-tree uses a Minimal Spanning Tree to help with the split. The new insertion algorithm leads to a tree with high storage utilization and improved query performance. The Slim-tree is the first metric access method to tackle the problem of overlap between nodes in metric spaces and to propose a technique to minimize it. The proposed "fat-factor" is a way to quantify whether a given tree can be improved and also to compare two trees. We show how to use the fat-factor to achieve accurate estimates of the search performance and also how to improve the performance of a metric tree through the proposed "Slim-down" algorithm. This paper also presents a new tool for visualizing the Slim-tree. Visualization is a powerful tool for interactive data mining and for the visual tracking of the behavior of a tree under updates. Finally, we present a formula to estimate the number of disk accesses in range queries. Results from experiments with real and synthetic data sets show that the new algorithms of the Slim-tree lead to performance improvements. These results show that the Slim-tree outperforms the M-tree by up to 200 percent for range queries. For insertion and split, the Minimal-Spanning-Tree-based algorithm achieves up to 40 times faster insertions. We observed improvements of up to 40 percent in range queries after applying the Slim-down algorithm.
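
The pruning idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the Slim-tree itself: it models a single node with one representative object, assumes Euclidean distance as the metric, and assumes each entry stores its precomputed distance to the representative. The function name `range_query` and the sample points are hypothetical, chosen only for illustration.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def range_query(query, radius, representative, entries):
    """Range query over one node, pruning with the triangle inequality.

    entries is a list of (obj, d_obj_rep) pairs, where d_obj_rep is the
    distance from obj to the representative, stored at insertion time.
    Returns (matches, number_of_distance_computations).
    """
    d_q_rep = dist(query, representative)
    computations = 1  # the distance from the query to the representative
    matches = []
    for obj, d_obj_rep in entries:
        # Triangle inequality: d(query, obj) >= |d(query, rep) - d(obj, rep)|.
        # If this lower bound already exceeds the radius, obj cannot qualify,
        # so the distance d(query, obj) is never computed.
        if abs(d_q_rep - d_obj_rep) > radius:
            continue
        computations += 1
        if dist(query, obj) <= radius:
            matches.append(obj)
    return matches, computations


# Hypothetical sample data: four 2-D points around a representative at the origin.
representative = (0.0, 0.0)
points = [(1.0, 0.0), (0.0, 2.0), (5.0, 5.0), (9.0, 0.0)]
entries = [(p, dist(p, representative)) for p in points]

matches, computations = range_query((1.0, 1.0), 1.5, representative, entries)
print(matches, computations)
```

In this sample, the two distant points are discarded using only the stored distances, so three distance computations are performed instead of the five a linear scan with the same bookkeeping would need. Any function satisfying the metric axioms could replace `dist`, since the pruning relies only on the triangle inequality.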