
Ömer Egecioglu, Hakan Ferhatosmanoglu, and Umit Ogras, "Dimensionality Reduction and Similarity Computation by Inner-Product Approximations," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 6, pp. 714-726, June 2004.  
DOI: 10.1109/TKDE.2004.9. ISSN: 1041-4347. Publisher: IEEE Computer Society, Los Alamitos, CA, USA.  
Keywords: inner-product approximation, dimensionality reduction, p-norms, similarity search, high-dimensional data.  
Abstract: As databases increasingly integrate different types of information such as multimedia, spatial, time-series, and scientific data, it becomes necessary to support efficient retrieval of multidimensional data. Both the dimensionality and the amount of data that need to be processed are increasing rapidly. Reducing the dimension of the feature vectors to enhance the performance of the underlying technique is a popular solution to the infamous curse of dimensionality. We expect the techniques to have good quality of distance measures when the similarity distance between two feature vectors is approximated by some notion of distance between two lower-dimensional transformed vectors. Thus, it is desirable to develop techniques resulting in accurate approximations to the original similarity distance. In this paper, we investigate dimensionality reduction techniques that