IDR/QR: An Incremental Dimension Reduction Algorithm via QR Decomposition
September 2005 (vol. 17 no. 9)
pp. 1208-1222
Dimension reduction is a critical data preprocessing step for many database and data mining applications, such as efficient storage and retrieval of high-dimensional data. A well-known dimension reduction algorithm in the literature is Linear Discriminant Analysis (LDA). Previously proposed LDA-based algorithms share a common feature: the use of Singular Value Decomposition (SVD). Because it is difficult to design an incremental solution for the eigenvalue problem on the product of scatter matrices in LDA, there has been little work on incremental LDA algorithms that can efficiently incorporate new data items as they become available. In this paper, we propose an LDA-based incremental dimension reduction algorithm, called IDR/QR, which applies QR decomposition rather than SVD. Unlike other LDA-based algorithms, it does not require the whole data matrix in main memory, which is desirable for large data sets. More importantly, when new data items are inserted, the IDR/QR algorithm keeps the computational cost low by applying efficient QR-updating techniques. Finally, we evaluate the effectiveness of the IDR/QR algorithm in terms of classification error rate in the reduced-dimensional space. Our experiments on several real-world data sets reveal that the classification error rate achieved by the IDR/QR algorithm is very close to the best achieved by other LDA-based algorithms, while the IDR/QR algorithm incurs much lower computational cost, especially when new data items are inserted dynamically.
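As a rough illustration of the QR-based projection idea described above, the following is a minimal batch sketch in Python/NumPy. It is an assumption-laden simplification, not the authors' exact IDR/QR algorithm: the incremental QR-updating step and the subsequent refinement of the transformation within the centroid space are omitted, and the function name `lda_qr_sketch` is hypothetical.

```python
import numpy as np

def lda_qr_sketch(X, y):
    """Simplified batch sketch of QR-based discriminant projection.

    X : (n, d) data matrix, one sample per row.
    y : (n,) class labels.
    Returns the projected data (n, k) and the basis Q (d, k),
    where k is the number of classes.
    """
    classes = np.unique(y)
    # Stack the class centroids as columns: C is d x k.
    C = np.column_stack([X[y == c].mean(axis=0) for c in classes])
    # Thin QR of the centroid matrix yields an orthonormal basis Q (d x k)
    # for the space spanned by the centroids. This replaces the SVD-based
    # eigendecomposition used by classical LDA variants.
    Q, R = np.linalg.qr(C)
    # Project the data onto that low-dimensional space.
    return X @ Q, Q
```

The point of the sketch is that the reduced dimension comes from a QR factorization of a small d-by-k centroid matrix, which is cheap to compute and, in the full algorithm, cheap to update as new rows arrive.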


Index Terms:
Dimension reduction, linear discriminant analysis, incremental learning, QR decomposition, Singular Value Decomposition (SVD).
Citation:
Jieping Ye, Qi Li, Hui Xiong, Haesun Park, Ravi Janardan, Vipin Kumar, "IDR/QR: An Incremental Dimension Reduction Algorithm via QR Decomposition," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 9, pp. 1208-1222, Sept. 2005, doi:10.1109/TKDE.2005.148