Multilevel Filtering for High-Dimensional Image Data: Why and How
November/December 1999 (vol. 11 no. 6)
pp. 916-928

Abstract—It has been shown that filtering is a promising way to support efficient content-based retrievals from image data. However, all existing studies on filtering restrict their attention to two levels. In this paper, we consider filtering structures that have at least three levels. In the first half of the paper, by analyzing the CPU and I/O costs of various structures, we provide analytic evidence on why three-level structures can often outperform corresponding two-level ones. We provide further experimental results showing that the three-level structures are typically the best, and can beat the two-level ones by a wide margin. In the second half of the paper, we study how to find the (near-) optimal three-level structure for a given dataset. We develop an optimizer that can handle this task effectively and efficiently. Experimental results indicate that in tens of seconds of CPU time, the optimizer can find a filtering structure whose total runtime per query exceeds that of the real optimal structure by only 2-3 percent.
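The abstract does not spell out what each filtering level computes. Below is a minimal sketch, assuming that each level compares only a prefix of the feature vector, for example the leading components after an orthonormal transform such as principal component analysis. Under Euclidean distance a prefix distance can never exceed the full distance, so every level is a safe filter and only the last, full-dimensional level decides the exact answers. All function names are hypothetical. The `work` counter is a crude stand-in for CPU cost only, since the I/O cost the paper also analyzes is not modeled. `best_three_level` is a plain exhaustive search over a few candidate dimensions on sample queries and is not the authors' optimizer.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def partial_sq_dist(a: Vector, b: Vector, k: int) -> float:
    """Squared Euclidean distance over the first k coordinates only.

    For any k this is <= the full squared distance, so it never
    discards a true answer of a range query (no false dismissals).
    """
    return sum((x - y) ** 2 for x, y in zip(a[:k], b[:k]))


def multilevel_range_query(
    data: Sequence[Vector], query: Vector, radius: float, levels: Sequence[int]
) -> Tuple[List[int], int]:
    """Run a range query through a chain of prefix filters (hypothetical sketch).

    levels: strictly increasing prefix lengths; the last one must be the
    full dimension so that the surviving candidates are the exact answers.
    Returns (answer indices, work), where work counts the coordinate terms
    evaluated, used here as a rough proxy for CPU cost (I/O is ignored).
    """
    assert list(levels) == sorted(set(levels)), "levels must strictly increase"
    assert levels[-1] == len(query), "last level must use all dimensions"
    r2 = radius * radius
    candidates = list(range(len(data)))
    work = 0
    for k in levels:
        # For clarity, each level recomputes its prefix distance from scratch.
        work += k * len(candidates)
        candidates = [i for i in candidates if partial_sq_dist(data[i], query, k) <= r2]
    return candidates, work


def best_three_level(
    data: Sequence[Vector],
    sample_queries: Sequence[Vector],
    radius: float,
    choices: Sequence[int],
) -> Optional[Tuple[Tuple[int, int], int]]:
    """Exhaustively pick (k1, k2) with k1 < k2 < d minimizing total work."""
    d = len(sample_queries[0])
    best = None
    for k1 in choices:
        for k2 in choices:
            if not 0 < k1 < k2 < d:
                continue
            total = sum(
                multilevel_range_query(data, q, radius, [k1, k2, d])[1]
                for q in sample_queries
            )
            if best is None or total < best[1]:
                best = ((k1, k2), total)
    return best


if __name__ == "__main__":
    random.seed(1)
    d = 16
    data = [[random.random() for _ in range(d)] for _ in range(300)]
    queries = [[random.random() for _ in range(d)] for _ in range(5)]
    q, r = queries[0], 1.2
    # One level (plain scan), two levels, and three levels.
    for levels in ([d], [4, d], [2, 8, d]):
        answers, work = multilevel_range_query(data, q, r, levels)
        print(levels, len(answers), work)
    print(best_three_level(data, queries, r, [1, 2, 4, 8, 12]))
```

Whether a three-level chain beats a two-level one in this sketch depends on how quickly the short prefixes shrink the candidate set. On uniform random data they prune little. On data whose variance is concentrated in the leading components, the early levels can discard most candidates cheaply.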

Index Terms:
Image databases, cost analysis, optimizer issues.
Raymond T. Ng, Dominic Tam, "Multilevel Filtering for High-Dimensional Image Data: Why and How," IEEE Transactions on Knowledge and Data Engineering, vol. 11, no. 6, pp. 916-928, Nov.-Dec. 1999, doi:10.1109/69.824605