CVPR 2011
June 20–25, 2011
G. W. Taylor , Dept. of Comput. Sci., New York Univ., New York, NY, USA
I. Spiro , Dept. of Comput. Sci., New York Univ., New York, NY, USA
C. Bregler , Dept. of Comput. Sci., New York Univ., New York, NY, USA
R. Fergus , Dept. of Comput. Sci., New York Univ., New York, NY, USA
Supervised methods for learning an embedding aim to map high-dimensional images to a space in which perceptually similar observations have high measurable similarity. Most approaches rely on binary similarity, typically defined by class membership, where labels are expensive to obtain and/or difficult to define. In this paper we propose crowd-sourcing similar images by soliciting human imitations. We exploit temporal coherence in video to generate additional pairwise graded similarities between the user-contributed imitations. We introduce two methods for learning nonlinear, invariant mappings that exploit these graded similarities. The resulting model is highly effective at matching people in similar pose, and exhibits remarkable invariance to identity, clothing, background, lighting, shift and scale.
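The abstract does not spell out the two learning methods. As a purely illustrative sketch (not the paper's actual formulation), a contrastive embedding loss in the style of Hadsell et al.'s DrLIM can be generalized from binary to graded similarity by letting a grade s in [0, 1] weight the pull/push trade-off for each pair:

```python
import numpy as np

def graded_contrastive_loss(za, zb, s, margin=1.0):
    """Contrastive loss generalized to graded similarity (illustrative sketch).

    za, zb : embedding vectors of an image pair (e.g. two pose images)
    s      : similarity grade in [0, 1]; 1 = highly similar, 0 = dissimilar
    margin : distance beyond which dissimilar pairs incur no penalty
    """
    d = np.linalg.norm(za - zb)
    # Pull similar pairs together in proportion to their grade s,
    # and push dissimilar pairs apart (up to the margin) with weight 1 - s.
    return s * d**2 + (1.0 - s) * max(margin - d, 0.0)**2
```

With s restricted to {0, 1} this reduces to the standard binary contrastive loss; intermediate grades, such as those derived from temporal proximity in video, interpolate smoothly between the two regimes.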
pose, invariance learning, supervised learning methods, high-dimensional images, binary similarity, crowd-sourcing, human imitations, temporal coherence, invariant mappings
G. W. Taylor, I. Spiro, C. Bregler and R. Fergus, "Learning invariance through imitation," CVPR 2011, Colorado Springs, CO, USA, 2011, pp. 2729-2736.