Issue No.07 - July (2008 vol.30)
pp: 1212-1229
We define and address the problem of finding the visual focus of attention for a varying number of wandering people (VFOA-W), where the people's movement is unconstrained. VFOA-W estimation is a new and important problem with implications for behavior understanding and cognitive science, as well as real-world applications. One such application, which we present in this article, monitors the attention passers-by pay to an outdoor advertisement. Our approach to the VFOA-W problem proposes a multi-person tracking solution based on a dynamic Bayesian network that simultaneously infers the (variable) number of people in a scene, their body locations, their head locations, and their head pose. For efficient inference in the resulting large variable-dimensional state-space, we propose a Reversible Jump Markov Chain Monte Carlo (RJMCMC) sampling scheme, as well as a novel global observation model that determines the number of people in the scene and localizes them. We propose a Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM)-based VFOA-W model that uses head pose and location information to determine people's focus state. Our models are evaluated for tracking performance and ability to recognize people looking at an outdoor advertisement, with results indicating good performance on sequences where a moderate number of people pass in front of an advertisement.
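The focus-recognition idea mentioned in the abstract (an HMM over head pose that labels each frame as focused or unfocused) can be illustrated with a minimal sketch. This is not the authors' model: it assumes a single pan angle measured relative to the advertisement, one Gaussian per state, and hand-picked values for `MEANS`, `STDS`, `LOG_TRANS`, and `LOG_PRIOR`, all of which are hypothetical. The function names are likewise illustrative.

```python
import numpy as np

# Hypothetical parameters (not from the paper). State 0 = unfocused, 1 = focused.
# Pan angle is in degrees, with 0 meaning the head points at the advertisement.
MEANS = np.array([0.0, 0.0])
STDS = np.array([60.0, 10.0])          # unfocused is broad, focused is narrow
LOG_TRANS = np.log(np.array([[0.9, 0.1],
                             [0.1, 0.9]]))  # "sticky" transitions smooth the labels
LOG_PRIOR = np.log(np.array([0.5, 0.5]))


def gaussian_log_pdf(x, mean, std):
    """Log density of a 1-D Gaussian, evaluated elementwise."""
    return -0.5 * np.log(2 * np.pi * std ** 2) - 0.5 * ((x - mean) / std) ** 2


def viterbi_focus(pan_deg, means, stds, log_trans, log_prior):
    """Most likely focus-state sequence for a sequence of pan angles."""
    pan_deg = np.asarray(pan_deg, dtype=float)
    n_steps, n_states = len(pan_deg), len(means)
    # log_emit[t, s] = log p(pan_t | state s)
    log_emit = gaussian_log_pdf(pan_deg[:, None], means[None, :], stds[None, :])

    score = np.empty((n_steps, n_states))
    back = np.zeros((n_steps, n_states), dtype=int)
    score[0] = log_prior + log_emit[0]
    for t in range(1, n_steps):
        # cand[i, j] = best score ending in state i at t-1, then moving to j
        cand = score[t - 1][:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], np.arange(n_states)] + log_emit[t]

    # Backtrack from the best final state.
    states = np.empty(n_steps, dtype=int)
    states[-1] = int(np.argmax(score[-1]))
    for t in range(n_steps - 1, 0, -1):
        states[t - 1] = back[t, states[t]]
    return states


if __name__ == "__main__":
    pan = [70, 65, 60, 5, -3, 2, 4, 55, 62]
    print(viterbi_focus(pan, MEANS, STDS, LOG_TRANS, LOG_PRIOR))
```

With these assumed parameters, a sustained run of near-zero pan angles is labeled focused, while a single isolated frontal frame is not, because the transition penalty outweighs one frame of evidence. A fuller treatment along the lines of the abstract would also condition on head location and tilt.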
Image Processing and Computer Vision, Tracking, Scene Analysis, Computer vision, Marketing
Sileye O. Ba, Kevin Smith, Daniel Gatica-Perez, "Tracking the Visual Focus of Attention for a Varying Number of Wandering People", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.30, no. 7, pp. 1212-1229, July 2008, doi:10.1109/TPAMI.2007.70773