Issue No.02 - February (2010 vol.32)
pp: 348-363
Tim K. Marks, Mitsubishi Electric Research Laboratories, Cambridge
John R. Hershey, IBM T. J. Watson Research Center, Yorktown Heights
Javier R. Movellan, University of California San Diego, La Jolla
ABSTRACT
We present a generative model and inference algorithm for 3D nonrigid object tracking. The model, which we call G-flow, enables the joint inference of 3D position, orientation, and nonrigid deformations, as well as object texture and background texture. Optimal inference under G-flow reduces to a conditionally Gaussian stochastic filtering problem. The optimal solution to this problem reveals a new space of computer vision algorithms, of which classic approaches such as optic flow and template matching are special cases that are optimal only under special circumstances. We evaluate G-flow on the problem of tracking facial expressions and head motion in 3D from single-camera video. Previously, the lack of realistic video data with ground truth nonrigid position information has hampered the rigorous evaluation of nonrigid tracking. We introduce a practical method of obtaining such ground truth data and present a new face video data set that was created using this technique. Results on this data set show that G-flow is much more robust and accurate than current deterministic optic-flow-based approaches.
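The phrase "conditionally Gaussian stochastic filtering" can be illustrated with a toy example. The sketch below is not the G-flow algorithm. It is a generic Rao-Blackwellized particle filter on a hypothetical scalar model, in which a "pose" variable is sampled with particles and a "texture" variable, which is Gaussian once the pose is fixed, is updated exactly with a Kalman step per particle. The model (cosine observation gain, random-walk pose), the noise levels, and all function and variable names are assumptions made for illustration only.

```python
import numpy as np


def rbpf_step(poses, means, variances, y, rng, pose_noise=0.1, obs_noise=0.05):
    """One step of a generic Rao-Blackwellized particle filter (toy scalar model).

    Assumed model (hypothetical, not taken from the paper):
      pose_t = pose_{t-1} + pose_noise * N(0, 1)          (sampled by particles)
      y_t    = cos(pose_t) * texture + obs_noise * N(0, 1)
    Given a pose sample, the texture posterior is Gaussian (mean, variance).
    """
    n = poses.shape[0]
    # 1. Propagate the non-Gaussian variable by sampling its dynamics.
    poses = poses + pose_noise * rng.standard_normal(n)
    # 2. Conditioned on each pose sample, the observation is linear in the
    #    texture, so a scalar Kalman update is exact for every particle.
    gain = np.cos(poses)
    pred_var = gain**2 * variances + obs_noise**2   # predictive variance of y
    innov = y - gain * means                        # innovation (uses prior mean)
    k = variances * gain / pred_var                 # Kalman gain
    means = means + k * innov
    variances = (1.0 - k * gain) * variances
    # 3. Weight each particle by the predictive likelihood of y, with the
    #    texture integrated out analytically, then resample.
    log_w = -0.5 * (innov**2 / pred_var + np.log(pred_var))
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    idx = rng.choice(n, size=n, p=w)
    return poses[idx], means[idx], variances[idx]


def run_demo(seed=0, steps=50, n=500, true_texture=2.0):
    """Simulate the toy model and return the texture estimate and variances."""
    rng = np.random.default_rng(seed)
    poses = np.zeros(n)               # all particles start at pose 0
    means = np.zeros(n)               # texture prior mean
    variances = np.full(n, 4.0)       # texture prior variance
    true_pose = 0.0
    for _ in range(steps):
        true_pose += 0.1 * rng.standard_normal()
        y = np.cos(true_pose) * true_texture + 0.05 * rng.standard_normal()
        poses, means, variances = rbpf_step(poses, means, variances, y, rng)
    return means.mean(), variances
```

Calling `run_demo()` returns the particle-averaged texture estimate and the per-particle posterior variances. The design point is that only the pose needs sampling; the texture uncertainty is carried in closed form, which is the general idea behind conditionally Gaussian filters.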
INDEX TERMS
Computer vision, generative models, motion, shape, texture, video analysis, face tracking.
CITATION
Tim K. Marks, John R. Hershey, Javier R. Movellan, "Tracking Motion, Deformation, and Texture Using Conditionally Gaussian Processes", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 32, no. 2, pp. 348-363, February 2010, doi:10.1109/TPAMI.2008.278