Multiple Similarity Queries: A Basic DBMS Operation for Mining in Metric Databases
January/February 2001 (vol. 13 no. 1)
pp. 79-95

Abstract: Metric databases are databases in which a metric distance function is defined for pairs of database objects. In such databases, similarity queries in the form of range queries or k-nearest-neighbor queries are the most important query types. In traditional query processing, single queries are issued independently by different users. In many data mining applications, however, the database is typically explored by iteratively issuing similarity queries for the answers of previous similarity queries. In this paper, we introduce a generic scheme for such data mining algorithms and investigate two orthogonal approaches, reducing I/O cost as well as CPU cost, to speed up the processing of multiple similarity queries. The proposed techniques apply to any type of similarity query and to implementations based on either an index or a sequential scan. Parallelization yields an additional impressive speed-up. An extensive performance evaluation confirms the efficiency of our approach.
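As a rough illustration of the idea in the abstract, the following Python sketch answers several range queries over a sequential scan. The abstract does not give the algorithmic details, so everything here is an assumption: the function names (`multiple_range_queries`, `euclidean`), the use of a single shared scan to save I/O, and the use of the triangle inequality on precomputed query-to-query distances to save distance computations (CPU cost). It is a minimal sketch of one plausible realization, not the authors' method.

```python
import math
from itertools import combinations


def euclidean(a, b):
    # Example metric; any function satisfying the triangle inequality works.
    return math.dist(a, b)


def multiple_range_queries(data, queries, eps, dist=euclidean):
    """Answer all range queries in one pass over `data`.

    Returns (answers, computed): answers[j] lists the objects within `eps`
    of queries[j]; computed counts object-to-query distance calculations.
    Hypothetical helper, written for illustration only.
    """
    m = len(queries)

    # Query-to-query distances are computed once, before the scan.
    qq = [[0.0] * m for _ in range(m)]
    for i, j in combinations(range(m), 2):
        qq[i][j] = qq[j][i] = dist(queries[i], queries[j])

    answers = [[] for _ in range(m)]
    computed = 0

    for obj in data:  # one scan shared by all queries (assumed I/O saving)
        known = {}  # query index -> distance already computed for this object
        for j in range(m):
            # Triangle inequality: |d(o,qi) - d(qi,qj)| <= d(o,qj) <= d(o,qi) + d(qi,qj)
            lower = max((abs(d - qq[i][j]) for i, d in known.items()), default=0.0)
            if lower > eps:
                continue  # obj cannot be in the range of query j
            upper = min((d + qq[i][j] for i, d in known.items()), default=math.inf)
            if upper <= eps:
                answers[j].append(obj)  # obj is certainly in the range of query j
                continue
            d = dist(obj, queries[j])  # bounds are inconclusive: compute the distance
            computed += 1
            known[j] = d
            if d <= eps:
                answers[j].append(obj)
    return answers, computed
```

For example, `multiple_range_queries(points, [q1, q2, q3], eps)` returns the same answer sets as running three independent scans, while reading `points` once and skipping every distance calculation that the bounds already decide.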


Index Terms:
Knowledge discovery in databases, data mining, similarity search, efficient query processing, high-dimensional indexing.
Citation:
Bernhard Braunmüller, Martin Ester, Hans-Peter Kriegel, Jörg Sander, "Multiple Similarity Queries: A Basic DBMS Operation for Mining in Metric Databases," IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 1, pp. 79-95, Jan.-Feb. 2001, doi:10.1109/69.908982