CVPR 2011
Providence, RI, USA
June 20-25, 2011
ISBN: 978-1-4577-0394-2
pp: 2729-2736
G. W. Taylor , Dept. of Comput. Sci., New York Univ., New York, NY, USA
I. Spiro , Dept. of Comput. Sci., New York Univ., New York, NY, USA
C. Bregler , Dept. of Comput. Sci., New York Univ., New York, NY, USA
R. Fergus , Dept. of Comput. Sci., New York Univ., New York, NY, USA
ABSTRACT
Supervised methods for learning an embedding aim to map high-dimensional images to a space in which perceptually similar observations have high measurable similarity. Most approaches rely on binary similarity, typically defined by class membership where labels are expensive to obtain and/or difficult to define. In this paper we propose crowd-sourcing similar images by soliciting human imitations. We exploit temporal coherence in video to generate additional pairwise graded similarities between the user-contributed imitations. We introduce two methods for learning nonlinear, invariant mappings that exploit graded similarities. We learn a model that is highly effective at matching people in similar pose. It exhibits remarkable invariance to identity, clothing, background, lighting, shift and scale.
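The abstract describes learning a nonlinear embedding from pairwise graded similarities (imitation pairs plus temporal coherence in video). As a rough illustration of that idea, below is a minimal sketch of a siamese-style encoder trained with a contrastive objective weighted by a graded similarity in [0, 1]. This is not the authors' implementation: the network architecture, the function names (EmbeddingNet, graded_contrastive_loss), the margin value, and the exact form of the loss are all illustrative assumptions, and the paper's actual objective may differ.

```python
# Sketch only: a DrLIM-style contrastive loss extended to graded similarity labels,
# not the method published in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EmbeddingNet(nn.Module):
    """Small convolutional encoder mapping images to a low-dimensional embedding."""

    def __init__(self, embed_dim=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4, embed_dim),
        )

    def forward(self, x):
        return self.features(x)


def graded_contrastive_loss(z1, z2, similarity, margin=1.0):
    """Contrastive loss weighted by a graded similarity in [0, 1].

    similarity = 1 pulls a pair together, similarity = 0 pushes it apart by at
    least `margin`; intermediate values (e.g. derived from temporal proximity
    in video) interpolate between the attractive and repulsive terms.
    """
    d = F.pairwise_distance(z1, z2)
    attract = similarity * d.pow(2)
    repel = (1.0 - similarity) * torch.clamp(margin - d, min=0.0).pow(2)
    return (attract + repel).mean()


# Toy usage with random image pairs and graded similarity labels.
net = EmbeddingNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x1, x2 = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)
sim = torch.rand(8)  # placeholder for imitation-pair / temporal-coherence grades
loss = graded_contrastive_loss(net(x1), net(x2), sim)
opt.zero_grad()
loss.backward()
opt.step()
```

Note the design intent: binary same/different labels become a continuous weight, so pairs generated from nearby video frames of an imitation can contribute a softer attraction than exact matches.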
INDEX TERMS
pose, invariance learning, supervised learning methods, high-dimensional images, binary similarity, crowd-sourcing, human imitations, temporal coherence, invariant mappings
CITATION
G. W. Taylor, I. Spiro, C. Bregler and R. Fergus, "Learning invariance through imitation," CVPR 2011 (CVPR), Providence, RI, 2011, pp. 2729-2736.
doi:10.1109/CVPR.2011.5995538