Hierarchical Discriminant Regression
November 2000 (vol. 22 no. 11)
pp. 1277-1293

Abstract—The main motivation of this paper is to propose a new classification and regression method for challenging high-dimensional data. The proposed technique casts classification problems (class labels as output) and regression problems (numeric values as output) into a unified regression problem. This unified view enables classification problems to use numeric information in the output space that is available for regression problems but is traditionally not readily available for classification problems—a distance metric among clustered class labels for coarse and fine classification. A doubly clustered subspace-based hierarchical discriminating regression (HDR) method is proposed in this work. Its major characteristics include: 1) Clustering is performed in both the output space and the input space at each internal node, termed “doubly clustered.” Clustering in the output space provides virtual labels for computing clusters in the input space. 2) Discriminants in the input space are automatically derived from the clusters in the input space. These discriminants span the discriminating subspace at each internal node of the tree. 3) A hierarchical probability distribution model is applied to the resulting discriminating subspace at each internal node. This realizes a coarse-to-fine approximation of the probability distribution of the input samples in the hierarchical discriminating subspaces. No global distribution models are assumed. 4) To relax the per-class sample requirement of traditional discriminant analysis techniques, a sample-size dependent negative-log-likelihood (NLL) is introduced. This new technique is designed to deal automatically with small-sample, large-sample, and unbalanced-sample applications. 5) The execution of the HDR method is fast, due to the empirical logarithmic time complexity of the HDR algorithm. Although the method is applicable to any data, we report experimental results for three types of data: synthetic data for examining the near-optimal performance, large raw face-image databases, and traditional databases with manually selected features, along with a comparison with major existing methods such as CART, C5.0, and OC1.
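
To illustrate the "doubly clustered" idea described in the abstract, the following is a minimal sketch of one HDR internal node: cluster the output vectors to obtain virtual labels, then derive LDA-style discriminant directions in the input space from those labels. It assumes k-means for the output-space clustering and a pseudo-inverse scatter solve; the function name build_node and its parameters are illustrative, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans


def build_node(X, Y, n_clusters=4, n_discriminants=3):
    """One hypothetical HDR internal node: output-space clustering provides
    virtual labels, which drive a discriminant analysis in the input space."""
    # 1) Cluster the output vectors; cluster ids act as virtual class labels.
    virtual_labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Y)

    # 2) Per-cluster scatter statistics in the input space.
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))   # within-cluster scatter
    Sb = np.zeros((d, d))   # between-cluster scatter
    for c in range(n_clusters):
        Xc = X[virtual_labels == c]
        if len(Xc) == 0:
            continue
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)

    # 3) Discriminant directions: leading eigenvectors of pinv(Sw) @ Sb.
    #    (When samples per cluster are scarce, a regularized or subspace-based
    #    solve would be needed; this sketch simply uses a pseudo-inverse.)
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-eigvals.real)[:n_discriminants]
    W = eigvecs[:, order].real            # basis of the discriminating subspace

    # 4) Project the samples; child nodes would be built recursively on the
    #    samples falling into each input-space cluster.
    return virtual_labels, W, X @ W
```

Building such nodes recursively, with each child receiving only the samples of one cluster, yields the coarse-to-fine hierarchy and the empirically logarithmic retrieval time mentioned in the abstract.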

Index Terms:
Discriminant analysis, classification and regression, decision trees, high-dimensional data, image retrieval.
Citation:
Wey-Shiuan Hwang, Juyang Weng, "Hierarchical Discriminant Regression," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1277-1293, Nov. 2000, doi:10.1109/34.888712