One-Shot Learning of Object Categories
April 2006 (vol. 28 no. 4)
pp. 594-611
Li Fei-Fei
Rob Fergus
Pietro Perona
Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn a great deal about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density function on the parameters of these models. The posterior model for an object category is obtained by updating the prior in the light of one or more observations. We test a simple implementation of our algorithm on a database of 101 diverse object categories. We compare category models learned by an implementation of our Bayesian approach to models learned by Maximum Likelihood (ML) and Maximum A Posteriori (MAP) methods. We find that on this database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.
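
The difference between the three estimators named above can be seen in a deliberately tiny setting. The sketch below assumes a single 1-D Gaussian feature with a conjugate Normal-Gamma prior over its mean and precision; the paper's own category models are considerably richer, and its index terms point to variational inference rather than this closed-form update. The class `NormalGamma`, the helper `ml_estimate`, and all hyperparameter values are hypothetical, chosen only for illustration. With one training example the ML variance collapses to zero, while the prior keeps both the MAP point estimate and the Bayesian posterior predictive (which integrates over the parameters instead of picking one value) well defined.

```python
import math
from dataclasses import dataclass


@dataclass
class NormalGamma:
    """Hypothetical toy prior/posterior over (mean mu, precision tau) of a 1-D Gaussian:
    mu | tau ~ N(m, 1 / (kappa * tau)),  tau ~ Gamma(shape=alpha, rate=beta).
    Not the paper's model; an assumption for illustration only."""
    m: float
    kappa: float
    alpha: float
    beta: float

    def update(self, xs):
        """Conjugate posterior after observing the samples xs (prior -> posterior)."""
        n = len(xs)
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)
        kappa_n = self.kappa + n
        m_n = (self.kappa * self.m + n * xbar) / kappa_n
        alpha_n = self.alpha + n / 2
        beta_n = (self.beta + 0.5 * ss
                  + self.kappa * n * (xbar - self.m) ** 2 / (2 * kappa_n))
        return NormalGamma(m_n, kappa_n, alpha_n, beta_n)

    def map_estimate(self):
        """Joint mode of the density: (mu, variance). Requires alpha > 1/2."""
        tau_mode = (self.alpha - 0.5) / self.beta
        return self.m, 1.0 / tau_mode

    def predictive_logpdf(self, x):
        """Log density of the posterior predictive: the Gaussian likelihood
        integrated over (mu, tau), which is a Student-t distribution."""
        nu = 2 * self.alpha
        scale2 = self.beta * (self.kappa + 1) / (self.alpha * self.kappa)
        z2 = (x - self.m) ** 2 / scale2
        return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi * scale2)
                - (nu + 1) / 2 * math.log1p(z2 / nu))


def ml_estimate(xs):
    """Maximum-likelihood mean and (biased, 1/n) variance; no prior involved."""
    n = len(xs)
    mu = sum(xs) / n
    return mu, sum((x - mu) ** 2 for x in xs) / n


if __name__ == "__main__":
    # Made-up prior standing in for knowledge from previously learned categories.
    prior = NormalGamma(m=0.0, kappa=1.0, alpha=2.0, beta=2.0)
    one_shot = [1.5]  # a single training example

    print(ml_estimate(one_shot))        # (1.5, 0.0): the ML variance collapses
    posterior = prior.update(one_shot)
    print(posterior.map_estimate())     # finite variance thanks to the prior
    print(posterior.predictive_logpdf(1.0))
```

The conjugate prior is chosen only because it makes the posterior update exact and short; it says nothing about which of MAP or the full Bayesian predictive performs better, which is what the paper evaluates empirically.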


Index Terms:
Recognition, object categories, learning, few images, unsupervised, variational inference, priors.
Citation:
Li Fei-Fei, Rob Fergus, Pietro Perona, "One-Shot Learning of Object Categories," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp. 594-611, April 2006, doi:10.1109/TPAMI.2006.79