Building Models of Animals from Video
August 2006 (vol. 28 no. 8)
pp. 1319-1334
This paper argues that tracking, object detection, and model building are all similar activities. We describe a fully automatic system that builds 2D articulated models, known as pictorial structures, from videos of animals. The learned model can be used to detect the animal in the original video; in this sense, the system can be viewed as a generalized tracker (one that is capable of modeling objects while tracking them). The learned model can be matched to a visual library; here, the system can be viewed as a video recognition algorithm. The learned model can also be used to detect the animal in novel images; in this case, the system can be seen as a method for learning models for object recognition. We find that we can significantly improve the pictorial structures by augmenting them with a discriminative texture model learned from a texture library. We develop a novel texture descriptor that outperforms the state of the art for animal textures. We demonstrate the entire system on real video sequences of three different animals and show that we can automatically track and identify the given animal. We use the learned models to recognize animals in two data sets: images taken by professional photographers from the Corel collection, and assorted images from the Web returned by Google. We demonstrate good performance on both data sets. Comparing our results with simple baselines, we show that, for the Google set, we can detect and localize animals and recover their part articulations in a collection that is demonstrably hard for object recognition.

Index Terms:
Tracking, video analysis, object recognition, texture, shape.
Citation:
Deva Ramanan, David A. Forsyth, Kobus Barnard, "Building Models of Animals from Video," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 8, pp. 1319-1334, Aug. 2006, doi:10.1109/TPAMI.2006.155