Computer Vision, IEEE International Conference on (2013)
Sydney, Australia
Dec. 1, 2013 to Dec. 8, 2013
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICCV.2013.53
The appearance of an object changes profoundly with pose, camera view and interactions of the object with other objects in the scene. This makes it challenging to learn detectors based on an object-level label (e.g., "car"). We postulate that having a richer set of labelings (at different levels of granularity) for an object can significantly alleviate these problems. Such labelings include finer-grained subcategories, consistent in appearance and view, and higher-order composites: contextual groupings of objects consistent in their spatial layout and appearance. However, obtaining such a rich set of annotations, including annotation of an exponentially growing set of object groupings, is simply not feasible. We propose a weakly-supervised framework for object detection where we discover the subcategories and the composites automatically with only traditional object-level category labels as input. To this end, we first propose an exemplar-SVM-based clustering approach, with latent SVM refinement, that discovers a variable-length set of discriminative subcategories for each object class. We then develop a structured model for object detection that captures interactions among object subcategories and automatically discovers semantically meaningful and discriminatively relevant visual composites. We show that this model produces state-of-the-art performance on the UIUC phrase object detection benchmark.
Visualization, Labeling, Support vector machines, Object detection, Training, Context modeling, Detectors
T. Lan, M. Raptis, L. Sigal and G. Mori, "From Subcategories to Visual Composites: A Multi-level Framework for Object Detection," 2013 IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 2013, pp. 369-376.