Nonrigid Structure-from-Motion: Estimating Shape and Motion with Hierarchical Priors
May 2008 (vol. 30 no. 5)
pp. 878-892
This paper describes methods for recovering time-varying shape and motion of non-rigid 3D objects from uncalibrated 2D point tracks. For example, given a video recording of a talking person, we would like to estimate the 3D shape of the face at each instant and learn a model of facial deformation. Time-varying shape is modeled as a rigid transformation combined with a non-rigid deformation. Reconstruction is ill-posed if arbitrary deformations are allowed, so additional assumptions about deformations are required. We first suggest restricting shapes to lie within a low-dimensional subspace, and describe estimation algorithms. However, this restriction alone is insufficient to constrain reconstruction. To address this problem, we propose a reconstruction method using a Probabilistic Principal Components Analysis (PPCA) shape model, and an estimation algorithm that simultaneously estimates 3D shape and motion for each instant, learns the PPCA model parameters, and robustly fills in missing data points. We then extend the model to capture temporal dynamics in object shape, allowing the algorithm to robustly handle severe cases of missing data.
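
The shape model described above can be illustrated with a small forward simulation. The following is a minimal sketch of the generative side only, not the paper's estimation algorithm: each frame's 3D shape is a mean shape plus a weighted sum of basis deformations (the low-dimensional subspace), the weights are drawn from a unit Gaussian in the spirit of a PPCA prior, and the deformed shape is rotated and projected to 2D. The orthographic camera, the rotation about the y-axis, and all function and parameter names (`deform`, `project`, `sample_tracks`, `noise_std`) are assumptions made for illustration; the abstract does not specify them.

```python
import math
import random


def rotation_y(theta):
    """3x3 rotation about the y-axis by theta radians (illustrative rigid motion)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def deform(mean_shape, basis, z):
    """Non-rigid part: shape = mean + sum_k z[k] * basis[k].

    mean_shape is a list of (x, y, z) points; basis is a list of K
    deformation shapes with the same layout; z holds K weights.
    """
    shape = []
    for j, point in enumerate(mean_shape):
        shape.append(tuple(
            point[a] + sum(z[k] * basis[k][j][a] for k in range(len(basis)))
            for a in range(3)
        ))
    return shape


def project(shape, R, t):
    """Rigid part plus camera: rotate, drop depth (assumed orthographic), translate in 2D."""
    points_2d = []
    for x, y, zc in shape:
        u = R[0][0] * x + R[0][1] * y + R[0][2] * zc + t[0]
        v = R[1][0] * x + R[1][1] * y + R[1][2] * zc + t[1]
        points_2d.append((u, v))
    return points_2d


def sample_tracks(mean_shape, basis, num_frames, noise_std=0.0, seed=0):
    """Simulate 2D point tracks from the assumed generative model."""
    rng = random.Random(seed)
    tracks = []
    for frame in range(num_frames):
        # Latent deformation weights with a unit Gaussian prior (PPCA-style assumption).
        z = [rng.gauss(0.0, 1.0) for _ in basis]
        shape = deform(mean_shape, basis, z)
        R = rotation_y(0.1 * frame)  # slowly rotating object, chosen arbitrarily
        clean = project(shape, R, (0.0, 0.0))
        # Add isotropic Gaussian measurement noise to each 2D observation.
        tracks.append([
            (u + rng.gauss(0.0, noise_std), v + rng.gauss(0.0, noise_std))
            for u, v in clean
        ])
    return tracks


if __name__ == "__main__":
    mean = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    basis = [[(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.1)]]
    for frame_points in sample_tracks(mean, basis, num_frames=3, noise_std=0.01):
        print(frame_points)
```

Reconstruction runs this model in reverse: given only the 2D tracks, it must recover the rotations, the per-frame weights, the mean shape, and the basis. Without a prior on the weights, many combinations of these quantities explain the same tracks, which is the ambiguity the abstract refers to.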


Index Terms:
Motion, Shape, Machine learning, 3D/stereo scene analysis
Citation:
Lorenzo Torresani, Aaron Hertzmann, Chris Bregler, "Nonrigid Structure-from-Motion: Estimating Shape and Motion with Hierarchical Priors," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 5, pp. 878-892, May 2008, doi:10.1109/TPAMI.2007.70752