Issue No.08 - August (2008 vol.30)
pp: 1357-1370
ABSTRACT
Model-based techniques have proven to be successful in interpreting the large amount of information contained in images. Associated fitting algorithms search for the global optimum of an objective function, which should correspond to the best model fit in a given image. Although fitting algorithms have been the subject of intensive research and evaluation, the objective function is usually designed ad hoc, based on implicit and domain-dependent knowledge. In this article, we address the root of the problem by learning more robust objective functions. First, we formulate a set of desirable properties for objective functions and give a concrete example of an ideal objective function that has these properties. Then, we propose a novel approach that learns an objective function from training data generated by manual image annotations and this ideal objective function. In this approach, critical decisions such as feature selection are automated, and the remaining manual steps hardly require domain-dependent knowledge. Furthermore, an extensive empirical evaluation demonstrates that the learned objective functions are more robust: they enable fitting algorithms to determine the best model fit more accurately than designed objective functions do.
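The training-data idea summarized in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the ideal objective is taken to be the Euclidean distance to the manually annotated point, the features are raw patch intensities (`patch_features` is a hypothetical helper), and a plain least-squares linear model stands in for whatever learner and feature selection the authors actually use.

```python
import numpy as np


def ideal_objective(candidate, annotated):
    """Assumed ideal objective: Euclidean distance to the annotated point."""
    diff = np.asarray(candidate, dtype=float) - np.asarray(annotated, dtype=float)
    return float(np.linalg.norm(diff))


def patch_features(image, point, half=2):
    """Hypothetical features: intensities of a (2*half+1)^2 patch around point."""
    r, c = int(point[0]), int(point[1])
    return image[r - half:r + half + 1, c - half:c + half + 1].ravel()


def make_training_set(image, annotated, offsets):
    """Displace the annotation and label each displaced point with the ideal value."""
    X, y = [], []
    for dr, dc in offsets:
        p = (annotated[0] + dr, annotated[1] + dc)
        X.append(patch_features(image, p))
        y.append(ideal_objective(p, annotated))
    return np.array(X, dtype=float), np.array(y, dtype=float)


def fit_linear(X, y):
    """Least-squares fit of a linear model with a bias term (stand-in learner)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def learned_objective(w, image, point):
    """Evaluate the learned objective from image features only (no annotation)."""
    f = patch_features(image, point)
    return float(np.append(f, 1.0) @ w)
```

At fitting time only `learned_objective` would be evaluated, since the annotation is unavailable for new images; the ideal objective is used solely to label the training samples.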
INDEX TERMS
Computer vision, Pattern matching, Image Processing and Computer Vision, Model-based coding, Object recognition, Vision and Scene Understanding, Modeling and recovery of physical attributes, Shape, Texture, Computational models of vision, Face and gesture recognition, Real-time systems
CITATION
Freek Stulp, Sylvia Pietzsch, Bernd Radig, "Learning Local Objective Functions for Robust Face Model Fitting", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.30, no. 8, pp. 1357-1370, August 2008, doi:10.1109/TPAMI.2007.70793