Issue No. 11 - Nov. 2013 (vol. 35)
pp. 2608-2623
M. Zeeshan Zia, Photogrammetry & Remote Sensing Lab., ETH Zurich, Zurich, Switzerland
M. Stark, Dept. of Comput. Sci., Stanford Univ., Stanford, CA, USA
B. Schiele, Comput. Vision & Multimodal Comput. Lab., Max-Planck-Inst. für Inf., Saarbrücken, Germany
K. Schindler, Photogrammetry & Remote Sensing Lab., ETH Zurich, Zurich, Switzerland
ABSTRACT
Geometric 3D reasoning at the level of objects has received renewed attention recently in the context of visual scene understanding. The level of geometric detail, however, is typically limited to qualitative representations or coarse boxes. This is linked to the fact that today's object class detectors are tuned toward robust 2D matching rather than accurate 3D geometry, encouraged by bounding-box-based benchmarks such as Pascal VOC. In this paper, we revisit ideas from the early days of computer vision, namely, detailed, 3D geometric object class representations for recognition. These representations can recover geometrically far more accurate object hypotheses than just bounding boxes, including continuous estimates of object pose and 3D wireframes with relative 3D positions of object parts. In combination with robust techniques for shape description and inference, we outperform state-of-the-art results in monocular 3D pose estimation. In a series of experiments, we analyze our approach in detail and demonstrate novel applications enabled by such an object class representation, such as fine-grained categorization of cars and bicycles, according to their 3D geometry, and ultrawide baseline matching.
INDEX TERMS
Three-dimensional displays, Solid modeling, Geometry, Shape, Computational modeling, Detectors, Design automation, ultrawide baseline matching, 3D representation, recognition, single image 3D reconstruction, scene understanding
CITATION
M. Zeeshan Zia, M. Stark, B. Schiele, K. Schindler, "Detailed 3D Representations for Object Recognition and Modeling", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 35, no. 11, pp. 2608-2623, Nov. 2013, doi:10.1109/TPAMI.2013.87