Issue No.03 - March (2011 vol.33)
pp: 514-530
Quan Yuan, Sony Electronics Inc., San Jose
Ashwin Thangali, Boston University, Boston
Vitaly Ablavsky, Boston University, Boston
Stan Sclaroff, Boston University, Boston
ABSTRACT
Object detection is challenging when the object class exhibits large within-class variations. In this work, we show that foreground-background classification (detection) and within-class classification of the foreground class (pose estimation) can be jointly learned in a multiplicative form of two kernel functions. Model training is accomplished via standard SVM learning. When the foreground object masks are provided in training, the detectors can also produce object segmentations. A tracking-by-detection framework to recover foreground state in video sequences is also proposed with our model. The advantages of our method are demonstrated on tasks of object detection, view angle estimation, and tracking. Our approach compares favorably to existing methods on hand and vehicle detection tasks. Quantitative tracking results are given on sequences of moving vehicles and human faces.
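The multiplicative form mentioned in the abstract can be illustrated with a small sketch. The abstract does not state which kernels are used, so the code below assumes Gaussian (RBF) kernels for both factors; the function names, the gamma values, and the toy coefficients are hypothetical and chosen only for illustration. It shows how a product of a feature kernel and a within-class (pose) kernel yields a Gram matrix that a standard SVM solver could consume, and how fixing the pose parameter in the decision function selects one detector from the family.

```python
import math


def rbf(u, v, gamma):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def multiplicative_kernel(x1, theta1, x2, theta2, gamma_x=0.5, gamma_theta=2.0):
    """Product of a feature kernel k_x and a within-class kernel k_theta.

    Both factors are assumed to be RBF kernels; the gamma values are
    illustrative defaults, not values taken from the paper.
    """
    return rbf(x1, x2, gamma_x) * rbf(theta1, theta2, gamma_theta)


def gram_matrix(samples):
    """Gram matrix over (x, theta) pairs, e.g. for an SVM with a precomputed kernel."""
    return [
        [multiplicative_kernel(xi, ti, xj, tj) for (xj, tj) in samples]
        for (xi, ti) in samples
    ]


def detector_score(x, theta, support, alphas_y, bias):
    """Score of the detector indexed by theta.

    Computes bias + sum_i (alpha_i * y_i) * k_theta(theta_i, theta) * k_x(x_i, x).
    Holding theta fixed gives one member of the detector family.
    """
    return bias + sum(
        ay * multiplicative_kernel(xi, ti, x, theta)
        for ay, (xi, ti) in zip(alphas_y, support)
    )


# Toy data: each sample is (feature vector, pose parameter vector).
samples = [([0.0, 1.0], [0.0]), ([1.0, 0.0], [1.0])]
K = gram_matrix(samples)

# Hypothetical signed dual coefficients (alpha_i * y_i); in practice an SVM
# solver would produce these from the Gram matrix and the labels.
alphas_y = [1.0, -1.0]
score_pos = detector_score([0.0, 1.0], [0.0], samples, alphas_y, bias=0.0)
score_neg = detector_score([1.0, 0.0], [1.0], samples, alphas_y, bias=0.0)
```

Because the product of two positive semidefinite kernels is itself positive semidefinite, the combined kernel can be passed to an off-the-shelf SVM trainer, which is consistent with the abstract's statement that training uses standard SVM learning.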
INDEX TERMS
Object recognition, object detection, object tracking, pose estimation, kernel methods.
CITATION
Quan Yuan, Ashwin Thangali, Vitaly Ablavsky, Stan Sclaroff, "Learning a Family of Detectors via Multiplicative Kernels", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 33, no. 3, pp. 514-530, March 2011, doi:10.1109/TPAMI.2010.117