BM³E: Discriminative Density Propagation for Visual Tracking
November 2007 (vol. 29 no. 11)
pp. 2030-2044
We introduce BM³E, a Conditional Bayesian Mixture of Experts Markov Model, for consistent probabilistic estimates in discriminative visual tracking. The model applies to problems of temporal and uncertain inference and represents the unexplored bottom-up counterpart of pervasive generative models estimated with Kalman filtering or particle filtering. Instead of inverting a non-linear generative observation model at run time, we learn to cooperatively predict complex state distributions directly from descriptors that encode image observations, typically bag-of-feature global image histograms or descriptors computed over regular spatial grids. These predictors are integrated in a conditional graphical model in order to enforce temporal smoothness constraints and allow a principled management of uncertainty. The algorithms combine sparsity, mixture modeling, and non-linear dimensionality reduction for efficient computation in high-dimensional continuous state spaces. The combined system automatically self-initializes and recovers from failure. The research makes three contributions: (1) we establish the density propagation rules for discriminative inference in continuous, temporal chain models; (2) we propose flexible supervised and unsupervised algorithms for learning feedforward, multivalued contextual mappings (multimodal state distributions) based on compact, conditional Bayesian mixture of experts models; (3) we validate the framework empirically for the reconstruction of 3D human motion in monocular video sequences. Our tests on both real and motion-capture-based sequences show significant performance gains with respect to competing nearest-neighbor, regression, and structured prediction methods.
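The abstract does not spell out the propagation rule, so the following is only an illustrative sketch, not the authors' implementation. It assumes a recursion of the form p(x_t | R_t) = ∫ p(x_t | x_{t-1}, r_t) p(x_{t-1} | R_{t-1}) dx_{t-1}, where the local conditional is a mixture of linear-Gaussian experts with softmax gates and the previous state estimate is a Gaussian mixture. All names (`propagate`, `gate_w`, `expert_w`, `expert_var`) are hypothetical, the state is scalar for brevity, and evaluating the gates at each prior component's mean is a simplification chosen here to keep the step in closed form.

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax over a 1D array."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

def propagate(prior, r, gate_w, expert_w, expert_var):
    """One discriminative filtering step for a scalar state (illustrative).

    prior:      list of (weight, mean, var) Gaussian components
                approximating p(x_{t-1} | R_{t-1})
    r:          scalar image descriptor observed at time t
    gate_w:     (M, 3) gate parameters over z = [x_prev, r, 1]
    expert_w:   (M, 3) linear expert parameters over the same z
    expert_var: (M,) output noise variance of each expert
    Returns a list of (weight, mean, var) components for p(x_t | R_t).
    """
    posterior = []
    for w, mu, var in prior:
        z = np.array([mu, r, 1.0])
        # Assumption: gates are evaluated at the component mean only.
        gates = softmax(gate_w @ z)
        for g, a, s2 in zip(gates, expert_w, expert_var):
            mean = a @ z
            # Marginalizing a linear-Gaussian expert over N(mu, var)
            # adds a[0]^2 * var to the expert's own noise variance.
            posterior.append((w * g, mean, s2 + a[0] ** 2 * var))
    return posterior

# Toy example: two prior modes, two experts, uniform gates.
prior = [(0.5, 0.0, 1.0), (0.5, 2.0, 0.5)]
gate_w = np.zeros((2, 3))
expert_w = np.array([[1.0, 0.0, 0.0],   # expert 1 copies the previous state
                     [0.0, 1.0, 0.0]])  # expert 2 reads the state off r
expert_var = np.array([0.1, 0.2])
posterior = propagate(prior, 3.0, gate_w, expert_w, expert_var)
```

Each step multiplies the number of mixture components by the number of experts, so a practical tracker would prune or merge components after every step; that part is omitted here.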

Index Terms:
computer vision, statistical models, video analysis, motion, tracking
Citation:
Cristian Sminchisescu, Atul Kanaujia, Dimitris N. Metaxas, "BM³E: Discriminative Density Propagation for Visual Tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 11, pp. 2030-2044, Nov. 2007, doi:10.1109/TPAMI.2007.1111