Learning Viewpoint Invariant Perceptual Representations from Cluttered Images
May 2005 (vol. 27 no. 5)
pp. 753-761
In order to perform object recognition, it is necessary to form perceptual representations that are sufficiently specific to distinguish between objects, but that are also sufficiently flexible to generalize across changes in location, rotation, and scale. A standard method for learning perceptual representations that are invariant to viewpoint is to form temporal associations across image sequences showing object transformations. However, this method requires that individual stimuli be presented in isolation and is therefore unlikely to succeed in real-world applications where multiple objects can co-occur in the visual input. This paper proposes a simple modification to the learning method that can overcome this limitation and results in more robust learning of invariant representations.
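
The temporal-association idea described above can be illustrated with a minimal sketch of a trace learning rule in the style of Földiák's "Learning Invariance from Transformation Sequences" (1991). This is only an illustration of the standard method, not the modification this paper proposes, which the abstract does not specify. The function names, the single-unit setup, the one-hot "views", and the parameter values `delta` and `alpha` are all assumptions made for the example.

```python
def trace_update(weights, trace, x, delta=0.2, alpha=0.1):
    """One step of a trace rule for a single unit (illustrative values)."""
    # Instantaneous response: weighted sum of the current input frame.
    y = sum(w * xi for w, xi in zip(weights, x))
    # The trace is a running average of the unit's recent activity, so it
    # stays elevated across consecutive frames of one transformation sequence.
    trace = (1.0 - delta) * trace + delta * y
    # Hebbian-style update gated by the trace rather than by y alone:
    # inputs seen shortly after the unit was active are also strengthened.
    weights = [w + alpha * trace * (xi - w) for w, xi in zip(weights, x)]
    return weights, trace


def run_sequence(frames, weights, epochs=50):
    """Present the frames in temporal order for a number of epochs."""
    trace = 0.0
    for _ in range(epochs):
        for x in frames:
            weights, trace = trace_update(weights, trace, x)
    return weights, trace


# Hypothetical stimulus: one object appearing at four successive positions,
# each view encoded as a one-hot input vector.
frames = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
weights, trace = run_sequence(frames, [0.25, 0.25, 0.25, 0.25])
print(weights, trace)
```

Because the trace carries over from one frame to the next, the unit keeps weight on every view in the sequence instead of committing to a single one. The same property suggests why isolated presentation matters: if two objects appeared in every frame, a rule of this kind would have no signal indicating which inputs belong to which object.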

Index Terms:
Computational models of vision, neural nets.
Michael W. Spratling, "Learning Viewpoint Invariant Perceptual Representations from Cluttered Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 753-761, May 2005, doi:10.1109/TPAMI.2005.105