Toward Efficient Multifeature Query Processing
March 2006 (vol. 18 no. 3)
pp. 350-362
In many advanced applications, data are described by multiple high-dimensional features. Moreover, different queries may weight these features differently; some may not even specify all the features. In this paper, we propose our solution to support efficient query processing in these applications. We devise a novel representation that compactly captures f features into two components: the first component is a 2D vector that reflects a distance range (minimum and maximum values) of the f features with respect to a reference point (the center of the space) in a metric space, and the second component is a bit signature, with two bits per dimension, obtained by analyzing each feature's descending energy histogram. This representation enables two levels of filtering: the first component prunes away points that do not share similar distance ranges, while the bit signature filters away points based on the dimensions of the relevant features. Moreover, the representation facilitates the use of a single index structure to further speed up processing. We employ the classical B+-tree for this purpose. We also propose a KNN search algorithm that exploits the access orders of critical dimensions of highly selective features and partial distances to prune the search space more effectively. Our extensive experiments on both real-life and synthetic data sets show that the proposed solution offers significant performance advantages over sequential scan and retrieval methods using single and multiple VA-files.
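The first level of filtering can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names `distance_range` and `may_match`, the use of Euclidean distance, and the pruning rule (every feature of a match must lie within a radius `r` of the corresponding query feature) are assumptions made for illustration. Under that rule, the triangle inequality implies that the minimum and maximum distances of a matching point can each differ from the query's by at most `r`, so any point violating this can be discarded without examining its features.

```python
import math

def distance_range(features, centers):
    """Return (min, max) over features of the distance to each feature's reference point.

    `features` and `centers` are parallel sequences of coordinate tuples.
    Euclidean distance is an assumption of this sketch.
    """
    dists = [math.dist(f, c) for f, c in zip(features, centers)]
    return min(dists), max(dists)

def may_match(point_range, query_range, r):
    """Necessary condition for a point to match the query (assumed rule).

    If every feature of the point is within r of the same query feature,
    then both ends of the two distance ranges differ by at most r.
    Returning False means the point can be pruned safely.
    """
    lo_p, hi_p = point_range
    lo_q, hi_q = query_range
    return abs(lo_p - lo_q) <= r and abs(hi_p - hi_q) <= r

# Hypothetical data: two 2D features per object, both centered at the origin.
centers = [(0, 0), (0, 0)]
query = distance_range([(3, 4), (0, 1)], centers)     # distances 5 and 1
near = distance_range([(3, 4.5), (0, 1.5)], centers)  # similar range, kept
far = distance_range([(9, 12), (6, 8)], centers)      # distances 15 and 10, pruned

print(may_match(near, query, r=1.0))
print(may_match(far, query, r=1.0))
```

Points that pass this test are only candidates; they would still need the second-level check and an exact distance computation.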


Index Terms:
Multifeature, indexing, query processing, high-dimensional, weighted query.
Citation:
H.V. Jagadish, Beng Chin Ooi, Heng Tao Shen, Kian-Lee Tan, "Toward Efficient Multifeature Query Processing," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 3, pp. 350-362, March 2006, doi:10.1109/TKDE.2006.51