Probabilistic Visual Learning for Object Representation
July 1997 (vol. 19 no. 7)
pp. 696-710

Abstract—We present an unsupervised technique for visual learning, which is based on density estimation in high-dimensional spaces using an eigenspace decomposition. Two types of density estimates are derived for modeling the training data: a multivariate Gaussian (for unimodal distributions) and a Mixture-of-Gaussians model (for multimodal distributions). These probability densities are then used to formulate a maximum-likelihood estimation framework for visual search and target detection for automatic object recognition and coding. Our learning technique is applied to the probabilistic visual modeling, detection, recognition, and coding of human faces and nonrigid objects, such as hands.
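
As a rough illustration of the quantities named in the abstract, the two densities and the detection rule can be written as below. The notation is assumed here and is not taken from the abstract: x is an image vector, \bar{x} is the training mean, \Phi_M holds the first M eigenvectors of the training covariance with eigenvalues \lambda_i, and y is the projection of x onto them. This is a sketch of a Gaussian restricted to the principal subspace, not necessarily the exact estimator derived in the paper.

```latex
% Projection onto the first M principal components (assumed notation)
\mathbf{y} = \Phi_M^{\top}(\mathbf{x} - \bar{\mathbf{x}})

% Unimodal case: Gaussian density over the M principal components
p(\mathbf{y}) = \frac{\exp\left(-\tfrac{1}{2}\sum_{i=1}^{M} y_i^{2}/\lambda_i\right)}
                     {(2\pi)^{M/2}\prod_{i=1}^{M}\lambda_i^{1/2}}

% Multimodal case: mixture of K Gaussians with weights \pi_k that sum to 1
p(\mathbf{y}) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mathbf{y};\, \boldsymbol{\mu}_k, \Sigma_k)

% Maximum-likelihood detection over candidate image windows \mathbf{x}_j
\hat{\jmath} = \arg\max_{j}\; p\left(\Phi_M^{\top}(\mathbf{x}_j - \bar{\mathbf{x}})\right)
```

Under these assumptions, each candidate window is scored by projecting it into the eigenspace and evaluating the density, and the highest-scoring window is taken as the detected target.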

Index Terms:
Face recognition, gesture recognition, target detection, subspace methods, maximum-likelihood, density estimation, principal component analysis, Eigenfaces.
Baback Moghaddam, Alex Pentland, "Probabilistic Visual Learning for Object Representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 696-710, July 1997, doi:10.1109/34.598227